Copying Garbage Collector

(see TT #616)

The basic concept of a Copying Collector is that the heap is divided into two equal semi-spaces, one of which contains the current data and the other obsolete data. The Garbage Collection phase starts by flipping the roles of the two semi-spaces. The collector then scans the active data in the old semi-space, Fromspace, copying each live cell into the new semi-space, Tospace. On completion of the collection phase, Tospace contains all the active data and Fromspace is effectively abandoned.

One major advantage of this is that the active data is compacted at the bottom of Tospace. Compacting collectors can allocate space much more efficiently than collectors in which fragmentation is a problem.

Allocation costs are extremely low. The out-of-space check is a simple pointer comparison. New memory is acquired by simply incrementing the free-space pointer. Fragmentation is eliminated by simply copying the active data into the bottom of Tospace.

The chief drawback of copying collectors is the need to divide the available memory into two semi-spaces. As the residency of a program increases the performance of the collector degrades, as less free space is recovered and collections become more frequent.

The basic Copying algorithm:

init() =
    Tospace = Heap_bottom
    space_size = Heap_size / 2
    top_of_space = Tospace + space_size
    Fromspace = top_of_space + 1
    free = Tospace

New(n) =
    if free + n > top_of_spece
    if free + n > top_of_space
        abort "Memory exhausted"
    newcell = free
    free = free + n
    return newcell

flip() =
    Fromspace, Tospace = Tospace, Fromspace
    top_of_space = Tospace + space_size
    free = Tospace
    for R in roots
        R = copy(R)

-- P points to a word not a cell
copy(P) =
    if atomic(P) or P == nil        -- P is not a pointer
        return P

    if not forwarded(P)
        n = size(P)
        P' = free                   -- reserve space in Topace
        free = free + n
        temp = P[0]                 -- field 0 will hold the forwarding address
        forwardingaddress(P) = P'
        P'[0] = copy(temp)
        for i = 1 to n-1            -- copy each field of P into P'
            P'[i] = copy(P[i])

    return forwarding_address(P)

First, the roles of Tospace and Fromspace ar swapped by a flip, which resets the variables Tospace, Fromspace and top_of_space. Each cell reachable from a root is then copied from Fromspace to Tospace. For clarity, a simple recursive algorithm is used, (more elegant iterative algorithms are available see CheneysCopyingCollector). copy(P) scavenges the fields of the cell pointed to by P. Care has to be taken when copying data structures to ensure that the topology of the shared data structures is preserved. Failure to do this would lead to multiple copies of shared objects, which at best would increase heap residency of the program but may also break the user program (for example, if it updated one copy of a cell, but then read the value from another). Copying cyclic data structures without preserving sharing would also require a lot of room!

Copying collectors preserve sharing by leaving a forwarding address in the Fromspace object when it is copied. The forwarding address is the address of the copy in Tospace. When a cell in Fromspace is visited, copy checks to see if it has already been copied, if it has the forwarding address is returned, otherwise memory is reserved for the copy in Tospace. In this recursive copying algorithm, the forwarding address is set to point to this reserved memory before the constituent fields of the object are copied -- this ensures termination and that sharing is preserved.

The forwarding address might be held in its own field in the cell. More generally it can be written over the first word in the cell provided that the original value of the cell is saved beforehand. In the above it is assumed that the forwarding address field in cell P is P[0], and forwarding address(P) and P[0] are used interchangeably.

extracted from:
Garbage Collection: Algorithms for Automatic Dynamic Memory Management
by Richard Jones and Rafael Sims