Ticket #616 (new roadmap)

Opened 6 years ago

Last modified 4 years ago

compacting/copying gc

Reported by: allison Owned by:
Priority: normal Milestone: 3.0
Component: GC Version:
Severity: medium Keywords:
Cc: Language:
Patch status: Platform:

Description

Implement a compacting/copying garbage collection module for Parrot. This will require some changes to the Parrot core to make it possible to move objects.

Change History

Changed 5 years ago by mikehh

Copying Garbage Collector

The basic concept of a Copying Collector is that the heap is divided into two equal semi-spaces, one of which contains the current data and the other obsolete data. The Garbage Collection phase starts by flipping the roles of the two semi-spaces. The collector then scans the active data in the old semi-space, Fromspace, copying each live cell into the new semi-space, Tospace. On completion of the collection phase, Tospace contains all the active data and Fromspace is effectively abandoned.

One major advantage of this is that the active data is compacted at the bottom of Tospace. Compacting collectors can allocate space much more efficiently than collectors in which fragmentation is a problem.

Allocation costs are extremely low. The out-of-space check is a simple pointer comparison. New memory is acquired by simply incrementing the free-space pointer. Fragmentation is eliminated by simply copying the active data into the bottom of Tospace.

The chief drawback of copying collectors is the need to divide the available memory into two semi-spaces. As the residency of a program increases the performance of the collector degrades, as less free space is recovered and collections become more frequent.

The basic algorithm:

init() =
    Tospace = Heap_bottom
    space_size - Heap_size / 2
    top_of_space = Tospace + space_size
    Fromspace = top_of_space + 1
    free = Tospace

New(n) =
    if free + n > top_of_spece
        flip()
    if free + n > top_of_space
        abort "Memory exhausted"
    newcell = free
    free = free + n
    return newcell

flip() =
    Fromspace, Tospace = Tospace, Fromspace
    top_of_space = Tospace + space_size
    free = Tospace
    for R in roots
        R = copy(R)


-- P points to a word not a cell
copy(P) =
    if atomic(P) or P == nil        -- P is not a pointer
        return P

    if not forwarded(P)
        n = size(P)
        P' = free                   -- reserve space in Topace
        free = free + n
        temp = P[0]                 -- field 0 will hold the forwarding address
        forwardingaddress[P] = P'
        P'[0] = copy(temp)
        for i = 1 to n-1            -- copy each field of P into P'
            P'[i] = copy(P[i])

    return forwarding_address(P)

First, the roles of Tospace and Fromspace ar swapped by a flip, which resets the variables Tospace, Fromspace and top_of_space. Each cell reachable from a root is then copied from Fromspace to Tospace. For clarity, a simple recursive algorithm is used, (more elegant iterative algorithms are available). Copy(P) scavenges the fields of the cell pointed to by P. Care has to be taken when copying data structures to ensure that the topology of the shared data structures is preserved. Failure to do this would lead to multiple copies of shared objects, which at best would increase heap residency of the program but may also break the user program (for example, if it updated one copy of a cell, but then read the value from another). Copying cyclic data structures without preserving sharing would also require a lot of room!

Copying collectors preserve sharing by leaving a forwarding address in the Fromspace object when it is copied. The forwarding address is the address of the copy in Tospace. When a cell in Fromspace is visited, copy checks to see if it has already been copied, if it has the forwarding address is returned, otherwise memory is reserved for the copy in Tospace. In this recursive copying algorithm, the forwarding address is set to point to this reserved memory before the constituent fields of the object are copied -- this ensures termination and that sharing is preserved.

The forwarding address might be held in its own field in the cell. More generally it can be written over the first word in the cell provided that the original value of the cell is saved beforehand. In the above it is assumed that the forwarding address field in cell P is P[0], and forwarding address(P) and P[0] are used interchangeably.

extracted from: Garbage Collection: Algotithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Sims  http://www.cs.kent.ac.uk/people/staff/rej/gc.html

Changed 5 years ago by mikehh

the second line of init() should be

space_size = Heap_size / 2

Changed 5 years ago by whiteknight

I have some misgivings about the utility of a copying collector in Parrot. In a "normal" memory allocation system objects in memory can be allocated in random-sized chunks. Think about the way a program uses malloc(), to get various chunks that it needs for different purposes. However in Parrot, all our PMC objects are fixed-size, and each PMC type contains a fixed-size attributes structure. Some PMC types (Arrays) and our strings use arbitrary-sized buffers, but most do not.

A copying collector helps to reduce memory fragmentation by copying data from the Fromspace into the bottom of the ToSpace to help free up large contiguous chunks of memory from which large buffers can be allocated. In Parrot for PMC allocations, this isn't necessary: Any free chunk is large enough to hold any PMC. We can allocate very efficiently off a free list and don't need to worry about PMC pools becoming fragmented. It would be a huge waste to implement copying/compacting for our PMC pools.

Where we do run into a problem is in the string pools and storage allocations for our Array PMC types. These buffers are going to need to be compacted during collections to help improve allocation performance. This leads us to a two-stage GC: one that is non-copying for PMCs and fixed-size objects, and one that is compacting for the buffers.

What I think we need to do is make a clear separation in the GC subsystem to show the two different parts: the fixed-size PMC/STRING header manager, and the arbitrary-sized buffer manager. These things should be pluggable independently, and one of the cores for the arbitrary-sized buffer manager should implement a copying/compacting collector.

Changed 5 years ago by mikehh

I agree with the above comment in essence, however one of the major advantages of copying collectors is that the collector compacts the heap discarding short-lived objects. Techniques are available so that long-lived objects can be set up in a 'different' space and handled in a generational or incremental collector. I have put these notes on the wiki as CopyingGarbageCollector and also CheneysCopyingCollector and I will expand them as well as adding further notes later.

Changed 4 years ago by coke

  • component changed from none to GC
Note: See TracTickets for help on using tickets.