The generics implementation of Go 1.18

Last updated on Dec 1, 2022

After years of demand, Go 1.18 finally introduced generics. Generics is a shorthand for a function or type that takes types as parameters. Go already supported a form of polymorphism using interfaces, and one of the propositions for generics is that it will eliminate the runtime overhead for interface method lookups. But this may not be true in every scenario with current Go Generics implementation.

In this blog, we will look at how Go implements parametric polymorphism, and why it may hurt performance if not used properly.

Ways to implement type-parametric polymorphism

There are many ways for a programming language to implement generic reification (a.k.a. parametric polymorphism), but we would mainly focus on two major solutions — "Monomorphisation" and "Boxing".

Monomorphisation is used by programming languages like C++, Rust and D. One of the most straightforward approaches to doing monomorphisation, is copying the code multiple times for the different reification of types — converting polymorphic to a concrete function. This can improve the runtime performance but incurs a compile-time cost and can produce a bloated binary.

In Boxing, "values" are boxed and passed around as references of the polymorphic type, generally using a pointer table, commonly called virtual table. This is how Go's interface are implemented. This generally produces a smaller binary and requires short compile-times but may have a toll on the runtime performance.

Go generics combines concepts from "monomorphisation" (stenciling) and "boxing" (dynamic dispatch) and is implemented using GCshape stenciling and dictionaries. This allows Go to have fast compile times and smaller binaries while having generics.

The Go way

Go does monomorphisation (stenciling) for objects with the same GCshape. It's also how the object appears to the garbage collector, or as the Go design document states "two concrete types are in the same GCshape grouping if and only if they have the same underlying type or they are both pointer types".

This means that an int is different from a string and will generate other code. We can verify this behavior by compiling the following snippet and inspecting the binary in lensm.

package main

func foo[T int | string](data T) {
  println(data)
}

func main() {
  foo(123)
  foo("string")
}

Screenshot of lensm showing the compiled code for the above snippet

It can be seen that foo(123) calls main.foo[go.shape.int_0] and foo("string") calls main.foo[go.shape.string_0].

All shapes are put into a built-in package called go.shape. The compiler also includes the type parameter index into the shape. The underlying type for the shape of an int may be go.shape.int_0 or go.shape.int_1, depending on whether the type parameter is used as the first or second argument.

Also, all pointer types share the same GCshape, i.e., both *int and *string generate the same code and are not monomorphised. This is also true for interfaces. Consider the following snippet:

package main

type Foo interface {
    Stuff()
}

type Bar struct{}

func (Bar) Stuff() {}

func foo[T *int | *string](data T) {
    println(data)
}

func baz[T Foo](data T) {
    println(data)
}

func main() {
    a := 123
    b := "string"

    foo(&a)
    foo(&b)
    baz(&Bar{})
}

Screenshot of lensm showing the compiled code for the above snippet

This time both foo(&a) and foo(&b) call main.foo[go.shape.*uint8_0] and baz(&Bar{}) calls main.baz[go.shape.*uint8_0]. You can also see that main.baz was not monomorphised for Bar as all interface types share the same GCshape.

Resolving methods for objects of the same `GCshape`

You might wonder how Go does the dynamic dispatch, i.e., resolve the methods for objects for the same GCshape. This is where the "dictionary" part of the spec comes in. Take the following snippet as an example.

package main

type Foo interface {
    Stuff() string
}

type Bar struct{}

func (Bar) Stuff() string {
    return "bar"
}

type Baz struct {
    a string // different "shape" than Bar
}

func (b Baz) Stuff() string {
    return "baz" + b.a
}

func stuff[T Foo](data T) {
    s := data.Stuff()
    println(s)
}

func main() {
    stuff(&Bar{})
    stuff(&Baz{})
}

Screenshot of lensm showing the compiled code for the above snippet

For x86_64, you can see that a dictionary is passed into the AX register. Go passes a dictionary to the callee function with the current generics implementation. The dictionary includes all the metadata to resolve methods and convert them to and from interfaces.

Screenshot of lensm showing interface resolution using dictionaries

You can see that before calling data.Stuff(), the runtime resolves the function address using the dictionary stored in the register AX. This is similar to how the interface's "virtual" method calls work.

Dictionaries are statically defined at compile time and named after the fully-qualified name of the generic function and the name of the type parameters. Dictionaries with the same name are de-duplicated by the compiler.

Performance

Re-writing interface-based APIs with generics can be more performant but might even hurt the performance due to method resolution using dictionaries. Consider the following snippet:

package main

import "io"

type DummyWriter struct {
    buf []byte
}

func (d *DummyWriter) Write(data []byte) (int, error) {
    d.buf = append(d.buf, data...)
    return len(data), nil
}

func FooIFace(w io.Writer, data byte) {
    w.Write([]byte{data})
}

func FooGeneric[T io.Writer](w T, data byte) {
    w.Write([]byte{data})
}

func main() {
    w := &DummyWriter{}
    FooIFace(w, 42)
    FooGeneric(w, 42)
}

Benchmarking FooIFace and FooGeneric shows that FooIFace is faster.

BenchmarkFooIFace-8             207985495                5.386 ns/op
BenchmarkFooGeneric-8           71622663                 14.33 ns/op

Using benchstat, we can compute and compare statistics about benchmarks, and we can see that FooGeneric is ~2.6x slower than FooIFace.

name   old time/op  new time/op   delta
Foo-8  5.22ns ±12%  15.67ns ± 5%  +199.87%  (p=0.000 n=10+9)

To understand the difference, one can inspect the binary in lensm and see the difference in the underlying implementations.

Screenshot of lensm showing compiled code for the generic function

Here, you can see that before calling the actual method (*main.DummyWriter).Write it resolves the interface type for T from the AX register (MOVQ AX, 0X38(SP)), and then resolves the concrete type for the interface io.Writer using LEAQ runtime.types+2030(SB), AX. This adds a lot of runtime overhead as the type is looked up twice before calling the method, once using the type parameter dictionary, and once using the interface's virtual table.

To understand why the interface version is faster, we need to know how some Go optimization passes work.

Inlining

Screenshot of lensm showing the assembly code for the interface function call

You can see that the code corresponding to FooIFace(w, 42) is just a single NOPL instruction, while the actual implementation for FooIFace is inlined into the main.main function.

The Go compiler inlines short functions at the call site itself. You can think of this as copy-pasting the function implementation at the call site. This improves performance and opens doors for other optimizations that are impossible otherwise. One catch is that a function call through an interface prevents inlining. This is what we call "function inlining."

The generated code calls main.(*DummyWriter).Write(SB) directly bypassing interface virtual table method lookup. This is possible due to a compiler optimization that bypasses v-table lookup for inline functions.

At the time of writing, such optimizations are impossible with generic functions due to how generics are implemented in Go. Hence re-writing interface-based APIs with generics may hurt performance.

Conclusion

Go generics are a great addition when it comes to writing parametric polymorphic code. Although one should keep in mind, passing interfaces as type parameters to generic functions will increase the number of lookups involved to resolve the concrete type, affecting performance adversely. A better use-case for generics is to directly write generalised data structures. Removing the need for interface types, and opening up the gates for many optimisations.