One of the bottlenecks in the current mark and sweep GC is the sweep portion. We have to walk through every memory pool in every arena to find every dead GCable. The more GCables in the system, the longer this process takes. In the worst case, marking requires us to touch every GCable in the system, that's slightly worse than O(n) (plus the overhead of recursing through the object graph and pruning out cycles). In this case, all GCables are live, but we still have to sweep all of the pools in another slightly worse than O(n) operation -- to say nothing of the memory cache performance.
We can avoid that sweep if we have some mechanism to segregate live from dead GCables. (This also gets us further along to a generational system.)
Every time we allocate a new pool, allocate additional space to store three pointers:
- a pointer to the GCable
- a previous pointer
- a next pointer
... that is, a doubly-linked list.
The GC structure needs to contain four additional pointers:
- a pointer to the head of the LIVE list
- a pointer to the head of the FRESH list
- a pointer to the head of the DEAD list
- a pointer to the head of the MAYBE list
Every time we grab a fresh new GCable, we add it to the the FRESH list.
At the start of the mark phase, we move the FRESH list to the MAYBE list and start a new FRESH list. This avoids the freshly-allocated-but-not-marked problem.
During the mark phase, we append every live GCable to the LIVE list, if it's in the MAYBE list. (We'll have to manage the existing flags carefully to avoid adding and re-adding; perhaps some toggle will help. "Last time, 0 meant live. This time, 1 means live.")
By the end of the mark phase, all live GCables should be in the LIVE list. Everything in the MAYBE list should now be DEAD, so we can append it to that list. We can walk the DEAD list and perform any custom destruction then.
The benefit to this system is that the O(n) operation of the sweep (which tends to hew to the full extent of n) of sweeping becomes much more efficient. The cost of allocating an object is slightly higher due to the need to manipulate pointers in the linked lists, but the cost of freeing dead objects is much cheaper.
We may be able to find smaller and cheaper data structures than the doubly linked list, though the cheapness of splicing elements and lists together is compelling.
- Create branch
- Copy mark/sweep implementation into a new file (gc_sf.c, perhaps)
- Change function names, especially in GC struct, in the new file
- Add the pointers for the new lists to the SF GC struct
- Change arena allocation to allocate space for each list in each new arena (because a GCable can only be on one list at a time, we need two pointers worth of data for each GCable -- allocating this in one chunk at the start of the arena makes them relatively contiguous in memory, so it should fit in the cache much better). Let's call this a GC_LL struct.
- Add a function to calculate the correct GC_LL struct for any given GCable, given a pointer to the GCable (this isn't trivial, but it can be cheap)
- Change the get_free_header function in gc_sf.c to put the freshly-allocated header on the MAYBE list
- Change the mark functions to splice a given GCable out of its current list and into the LIVE list
- Change the beginning of mark function in gc_sf.c to point to a new MAYBE list
- Change the end of mark function in gc_sf.c to append the MAYBE list to the FREE list (or prepend?)
- Change the "get a new free header" function to perform any active destruction (though for an optimization, if a PMC header already has sufficient PMC_data space allocated, it's unnecessary to free it)
- Add a configuration option to enable the new sweep-free GC
- Fix bugs