The generics implementation of Go 1.18
How Go generics work under the hood
- By Akshit
- ·
- Go
- Engineering
After years of demand, Go 1.18 finally introduced generics. Generics is a shorthand for a function or type that takes types as parameters. Go already supported a form of polymorphism using interfaces, and one of the propositions for generics is that it will eliminate the runtime overhead for interface
method lookups. But this may not be true in every scenario with current Go Generics implementation.
In this blog, we will look at how Go implements parametric polymorphism, and why it may hurt performance if not used properly.
Ways to implement type-parametric polymorphism
There are many ways for a programming language to implement generic reification (a.k.a. parametric polymorphism), but we would mainly focus on two major solutions — "Monomorphisation" and "Boxing".
Monomorphisation is used by programming languages like C++, Rust and D. One of the most straightforward approaches to doing monomorphisation, is copying the code multiple times for the different reification of types — converting polymorphic to a concrete function. This can improve the runtime performance but incurs a compile-time cost and can produce a bloated binary.
In Boxing, "values" are boxed and passed around as references of the polymorphic type, generally using a pointer table, commonly called virtual table. This is how Go's interface
are implemented. This generally produces a smaller binary and requires short compile-times but may have a toll on the runtime performance.
Go generics combines concepts from "monomorphisation" (stenciling) and "boxing" (dynamic dispatch) and is implemented using GCshape
stenciling and dictionaries. This allows Go to have fast compile times and smaller binaries while having generics.
The Go way
Go does monomorphisation (stenciling) for objects with the same GCshape
. It's also how the object appears to the garbage collector, or as the Go design document states "two concrete types are in the same GCshape
grouping if and only if they have the same underlying type or they are both pointer types".
This means that an int
is different from a string
and will generate other code. We can verify this behavior by compiling the following snippet and inspecting the binary in lensm.
package main
func foo[T int | string](data T) {
println(data)
}
func main() {
foo(123)
foo("string")
}
It can be seen that foo(123)
calls main.foo[go.shape.int_0]
and foo("string")
calls main.foo[go.shape.string_0]
.
All shapes are put into a built-in package called go.shape
. The compiler also includes the type parameter index into the shape. The underlying type for the shape of an int
may be go.shape.int_0
or go.shape.int_1
, depending on whether the type parameter is used as the first or second argument.
Also, all pointer types share the same GCshape
, i.e., both *int
and *string
generate the same code and are not monomorphised. This is also true for interfaces. Consider the following snippet:
package main
type Foo interface {
Stuff()
}
type Bar struct{}
func (Bar) Stuff() {}
func foo[T *int | *string](data T) {
println(data)
}
func baz[T Foo](data T) {
println(data)
}
func main() {
a := 123
b := "string"
foo(&a)
foo(&b)
baz(&Bar{})
}
This time both foo(&a)
and foo(&b)
call main.foo[go.shape.*uint8_0]
and baz(&Bar{})
calls main.baz[go.shape.*uint8_0]
. You can also see that main.baz
was not monomorphised for Bar
as all interface
types share the same GCshape
.
Resolving methods for objects of the same GCshape
You might wonder how Go does the dynamic dispatch, i.e., resolve the methods for objects for the same GCshape
. This is where the "dictionary" part of the spec comes in. Take the following snippet as an example.
package main
type Foo interface {
Stuff() string
}
type Bar struct{}
func (Bar) Stuff() string {
return "bar"
}
type Baz struct {
a string // different "shape" than Bar
}
func (b Baz) Stuff() string {
return "baz" + b.a
}
func stuff[T Foo](data T) {
s := data.Stuff()
println(s)
}
func main() {
stuff(&Bar{})
stuff(&Baz{})
}
For x86_64
, you can see that a dictionary is passed into the AX
register. Go passes a dictionary to the callee function with the current generics implementation. The dictionary includes all the metadata to resolve methods and convert them to and from interfaces.
You can see that before calling data.Stuff()
, the runtime resolves the
function address using the dictionary stored in the register AX
. This is
similar to how the interface's "virtual" method calls work.
Dictionaries are statically defined at compile time and named after the fully-qualified name of the generic function and the name of the type parameters. Dictionaries with the same name are de-duplicated by the compiler.
Performance
Re-writing interface-based APIs with generics can be more performant but might even hurt the performance due to method resolution using dictionaries. Consider the following snippet:
package main
import "io"
type DummyWriter struct {
buf []byte
}
func (d *DummyWriter) Write(data []byte) (int, error) {
d.buf = append(d.buf, data...)
return len(data), nil
}
func FooIFace(w io.Writer, data byte) {
w.Write([]byte{data})
}
func FooGeneric[T io.Writer](w T, data byte) {
w.Write([]byte{data})
}
func main() {
w := &DummyWriter{}
FooIFace(w, 42)
FooGeneric(w, 42)
}
Benchmarking FooIFace
and FooGeneric
shows that FooIFace
is faster.
BenchmarkFooIFace-8 207985495 5.386 ns/op
BenchmarkFooGeneric-8 71622663 14.33 ns/op
Using benchstat, we can compute and compare statistics about benchmarks, and we can see that FooGeneric
is ~2.6x slower than FooIFace
.
name old time/op new time/op delta
Foo-8 5.22ns ±12% 15.67ns ± 5% +199.87% (p=0.000 n=10+9)
To understand the difference, one can inspect the binary in lensm
and see the difference in the underlying implementations.
Here, you can see that before calling the actual method (*main.DummyWriter).Write
it resolves the interface type for T
from the AX
register (MOVQ AX, 0X38(SP)
), and then resolves the concrete type for the interface io.Writer
using LEAQ runtime.types+2030(SB), AX
. This adds a lot of runtime overhead as the type is looked up twice before calling the method, once using the type parameter dictionary, and once using the interface's virtual table.
To understand why the interface version is faster, we need to know how some Go optimization passes work.
Inlining
You can see that the code corresponding to FooIFace(w, 42)
is just a single NOPL
instruction, while the actual implementation for FooIFace
is inlined into the main.main
function.
The Go compiler inlines short functions at the call site itself. You can think of this as copy-pasting the function implementation at the call site. This improves performance and opens doors for other optimizations that are impossible otherwise. One catch is that a function call through an interface prevents inlining. This is what we call "function inlining."
The generated code calls main.(*DummyWriter).Write(SB)
directly bypassing interface virtual table method lookup. This is possible due to a compiler optimization that bypasses v-table lookup for inline functions.
At the time of writing, such optimizations are impossible with generic functions due to how generics are implemented in Go. Hence re-writing interface-based APIs with generics may hurt performance.
Conclusion
Go generics are a great addition when it comes to writing parametric polymorphic code. Although one should keep in mind, passing interfaces as type parameters to generic functions will increase the number of lookups involved to resolve the concrete type, affecting performance adversely. A better use-case for generics is to directly write generalised data structures. Removing the need for interface types, and opening up the gates for many optimisations.