Starlight’s new default GC: MiGC

This post begins with a brief background about GC (garbage collection). Then it describes the MiGC algorithm in detail on.

Introduction

Garbage collection is known to be expensive. In the worst case, for the collector to free a single object, it needs to scan the entire heap to ensure that no objects have any references to the one it wants to free. To solve this Starlight at the beginning had Immix GC algorithm which allowed to collect large heaps in a few milliseconds.

The problem with Immix approach is that it is very hard to parallelize. To solve this there is a few approaches: incremental or concurrent collectors which use series of write barriers to keep track of objects when marking but these write barriers introduce overhead and UB if not used correctly. To not solve all these problems with write barriers or bunch of locks MiGC was developed. Its name comes from mimalloc library which is used internally for memory management.

The goal of MiGC is to be fast and easy to use. Because MiGC will be default GC, we also want it to be as efficent as Immix in terms of cache locality, speed and memory footprint.

MiGC algorithm

The MiGC collector combines:

Marking: The collector marks objects as it finds references to them. Objects not marked are deleted. Most of the collector’s time is spent visiting objects to find references to other objects.
Constraints: The collector allows the runtime to supply additional constraints on when objects should be marked, to support custom object lifetime rules.
Parallelism: Marking is parallelized on up to 4 logical CPUs. (Higher thread count works too but it does not give reasonable performance boost)
Conservatism: The collector scans the stack and registers conservatively, that is, checking each word to see if it is in the bounds of some object and then marking it if it is. This means that all of the Rust and just-in-time (JIT) compiler-generated code in our system can store heap pointers in local variables without any hassles.

Efficent Mark-Sweep

This section gives an overview of how our mark-sweep collector works: MiGC uses a mimalloc for heap. Bytecode compiler, runtime and JIT compiler all introduce custom marking constraints, which the GC executes before marking. Marking is done in parallel to maximize throughput. The collector uses conservative stack scanning to easily use it with Rust and JIT compiled code.

mimalloc for heap

mimalloc was choosen for heap because it is: fast, easy to use, small, and has all necessary API for conservative stack scanning and sweeping.

free list sharding: instead of one big free list (per size class) mimalloc has many smaller list per “mimalloc page” which reduce fragmentation and increases locality – things that are allocated close in time get allocated close in memory. (A mimalloc page contains blocks of one size class and is usually 64KiB on a 64-bit system).
eager page reset: when a “page” becomes empty (with increased chance due to free list sharding) the memory is marked to the OS as unused (“reset” or “purged”) reducing (real) memory pressure and fragmentation, especially in long running programs.
secure: mimalloc can be built in secure mode, adding guard pages, randomized allocation, encrypted free lists, etc. to protect against various heap vulnerabilities.
first-class heaps: efficiently create and use multiple heaps to allocate across different regions. A heap can be destroyed at once instead of deallocating each object separately.

Sweeping is performed right after mark using mi_heap_visit_blocks function which allows us to iterate all blocks in mimalloc heap and free all unmarked objects.

Constraint-Based Marking

Garbage collection is ordinarily a graph search problem and the heap is ordinarily just a graph: the roots are the local variables, their values are directional edges that point to objects, and those objects have fields that each create edges to some other objects. Starlight’s garbage collector also allows the users of API, compiler, and other parts of the runtime to install constraint callbacks. These constraints are allowd to query which objects are marked and they are allowed to mark objects. All constraints is executed right before marking phase. Running constraints is usually the fastest part in GC cycle. Most of the time in GC is spent in performing a depth-first search over marked objects.

Parallel Marking

Marking takes up most of the GC’s time. The collector has queue of objects to mark on each marking thread. Each marking thread has its own worklist of objects to visit, and ordinarily it runs a graph search algorithm that only sees this worklist. Using local worklist means avoiding synchronizations most of the time. Each marking thread will check global worklist under these condition:

It runs out of work. When thread runs out of work it will try to steal work from other marker threads or from global worklist.
Every 256 objects visited, the marker thread will push half of local worklist to global worklist.

Marking in parallel means having to synchronize marking. Our marking algorithm uses lock-free atomic compare-and-swap instruction to set mark color (white,grey or black).

Conservative roots

Garbage collection begins by looking at global runtime and local variables state to figure out initial set of marked objects. Introspecting the values of local variables is tricky. We could use special API to handle objects on stack correctly but this would require us to write clippy or rustc plugin lint pass like Servo did. This is not only harder to debug but also makes all accesses to GC objects through double indirection. So Starlight just uses Rust local variables for pointers to the garbage collector’s heap, but C-like languages provide no facility for precisely introspeccting values of specific variables of arbitrary stack frames. Starlight solves this problem by marking heap objects conservatively when scanning roots. We use mimalloc extended API to check if pointer is indeed allocated by mimalloc and is in GC heap.

We view this as very important optimization. Without conservative root scanning, Rust code would have to use some API to notify the collector about what objects it points to. Conservative root scanning means not having to do any of that work.

Mark-Sweep Summary

MiGC implements complex notions of reachability via arbitrary constraint callbacks and allows Rust,C, and JITed code to manipulate objects directly without indirection. For performance, it parallelizes marking.

Algorithm Summary

MiGC is an improvement to Starlight’s previous GC and retains most of the things of Immix GC. MiGC combines simple tri-color mark&sweep and parallel marking to get a high throughput GC. MiGC can be used with any Rust type that implements Trace and GcCell trait which makes it competitor to other Rust GC libraries like rust-gc.