PCCPerformanceImprovements – Parrot

The two biggest expenses in PCC right now are creating unnecessary CallSignatures and allocating memory for register sets and the like.

We know most of our call signatures at PIR compilation time: call this function, passing these specific registers, receiving these values back in those specific registers. From a PBC point of view, if signatures are immutable, we can cache each signature in bytecode once and use the frozen signature for all calls. Because we have constant caching, we can use the same signature PMC for multiple calls with the same logical signature. Likewise, any code which uses the C API to make calls into Parrot can create a single signature PMC for each logical signature.

This is similar to what NCI does, if you like prior art.
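
A minimal sketch of the "one signature per logical signature" idea, in C. Everything here is hypothetical: `CachedSig`, `sig_intern`, and the fixed-size table are illustration only and not Parrot's actual CallSignature or constant-table API. Signatures are described by type-code strings such as "PP", which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical immutable signature record (not Parrot's real CallSignature). */
typedef struct CachedSig {
    const char *args;     /* e.g. "PP": two PMC arguments */
    const char *returns;  /* e.g. "P": one PMC return value */
} CachedSig;

#define SIG_CACHE_MAX 64

static CachedSig sig_cache[SIG_CACHE_MAX];
static size_t    sig_cache_used = 0;

/* Return the single shared record for this logical signature, creating it
 * on first use. Returns NULL when the table is full.
 * Assumption: the strings outlive the cache (string literals, for example),
 * because only the pointers are stored. */
const CachedSig *sig_intern(const char *args, const char *returns)
{
    size_t i;

    assert(args != NULL && returns != NULL);

    /* Linear search is enough for a sketch; a real cache would hash. */
    for (i = 0; i < sig_cache_used; i++) {
        if (strcmp(sig_cache[i].args, args) == 0 &&
            strcmp(sig_cache[i].returns, returns) == 0)
            return &sig_cache[i];
    }

    if (sig_cache_used == SIG_CACHE_MAX)
        return NULL;

    sig_cache[sig_cache_used].args    = args;
    sig_cache[sig_cache_used].returns = returns;
    return &sig_cache[sig_cache_used++];
}
```

Two call sites that both pass two PMCs and receive one PMC get the same pointer back from `sig_intern("PP", "P")`, so the signature is built once however many calls use it.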

To make this work, we need to separate the mutable portion of CallSignature from the immutable portion. The immutable portion should describe:

the parameter information (number, type, and any flags such as slurpy or flat)
the return parameter information (ditto)

The mutable portion should describe:

the calling context (a reference to the caller's caller, a reference to the signature)
the registers themselves
the destination PC
active exception handlers
(likely) the storage for the callee's registers

In effect, the mutable portion should represent enough information to serve as a continuation. If this data structure supports cloning, we can even treat it simultaneously as a continuation and return continuation.
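
The split might look roughly like the following C sketch. All names (`SigInfo`, `CallState`, `call_state_new`, `call_state_clone`, `call_state_free`) are hypothetical and do not correspond to existing Parrot structures. Registers are modelled as plain `long` values, and parameter flags and exception handlers are left out to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Immutable portion: shared by every call with the same logical signature. */
typedef struct SigInfo {
    int         n_params;
    const char *param_types;   /* one type code per parameter, e.g. "PP" */
    int         n_returns;
    const char *return_types;  /* one type code per return value */
} SigInfo;

/* Mutable portion: one per active call. */
typedef struct CallState {
    const SigInfo    *sig;      /* shared and never written through */
    struct CallState *caller;   /* calling context */
    size_t            dest_pc;  /* where execution resumes */
    size_t            n_regs;
    long             *regs;     /* register storage owned by this state */
} CallState;

CallState *call_state_new(const SigInfo *sig, CallState *caller, size_t n_regs)
{
    CallState *cs;

    assert(sig != NULL);
    cs = malloc(sizeof *cs);
    if (cs == NULL)
        return NULL;

    /* Allocate at least one slot so calloc never sees a zero-size request. */
    cs->regs = calloc(n_regs ? n_regs : 1, sizeof *cs->regs);
    if (cs->regs == NULL) {
        free(cs);
        return NULL;
    }

    cs->sig     = sig;
    cs->caller  = caller;
    cs->dest_pc = 0;
    cs->n_regs  = n_regs;
    return cs;
}

/* Cloning copies only the mutable state; the signature stays shared. */
CallState *call_state_clone(const CallState *src)
{
    CallState *copy;

    assert(src != NULL);
    copy = call_state_new(src->sig, src->caller, src->n_regs);
    if (copy == NULL)
        return NULL;

    copy->dest_pc = src->dest_pc;
    memcpy(copy->regs, src->regs, src->n_regs * sizeof *src->regs);
    return copy;
}

void call_state_free(CallState *cs)
{
    if (cs != NULL) {
        free(cs->regs);
        free(cs);
    }
}
```

A clone has its own registers and destination PC but points at the same `SigInfo`, which is the property that would let one structure serve as both a continuation and a return continuation.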

Speculation

For additional fun, we could consider *avoiding* the copying of values between registers during PCC with a smarter register allocation strategy. Assume the code:

.sub f
    .local pmc x, y, z
    ...
    y = g( x, z )
.end

.sub g
    .param pmc x
    .param pmc y

    .local pmc z
    z = x + y
    .return( z )
.end

If we mandate that all arguments of a specific type passed to an invocation must occupy successive registers, f() could desugar to:

.sub f
    P3 = g( P1, P2 )
.end

... and the register set passed in as parameters to g() could merely point to the appropriate place to *start* finding these linear registers. In other words, instead of copying registers into the register set for g() and only then being able to use them, g() could operate on its caller's registers directly:

.sub g
    .alias pmc R1, P1
    .alias pmc R2, P2
    R3 = P1 + P2
    .return()
.end
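
The same idea modelled in C, as a rough sketch only. `Reg`, `RegWindow`, and the layout "arguments followed by one return slot" are assumptions made for illustration; registers are plain `long` values and the PMC addition becomes integer addition.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a register; real Parrot registers hold PMC pointers etc. */
typedef long Reg;

/* Hypothetical view onto a run of consecutive registers in the caller. */
typedef struct RegWindow {
    Reg   *base;   /* first argument register in the caller's frame */
    size_t count;  /* arguments followed by one return slot */
} RegWindow;

/* g(x, y): reads its arguments from window slots 0 and 1 and writes the
 * result into slot 2. Nothing is copied; all three slots live in the
 * caller's register storage. */
void g(RegWindow w)
{
    assert(w.count >= 3);
    w.base[2] = w.base[0] + w.base[1];
}

/* f: x, z, and y are allocated to the consecutive registers P1, P2, P3,
 * so the call only has to say where the run starts. */
long f(long x, long z)
{
    Reg P[4] = {0};
    RegWindow w;

    P[1] = x;
    P[2] = z;

    w.base  = &P[1];
    w.count = 3;
    g(w);          /* equivalent of P3 = g( P1, P2 ) */

    return P[3];
}
```

Here `g` never receives its own copy of the arguments; the call passes one pointer and a length, and the result appears in the caller's P3 without a return-value copy.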

It's not entirely clear how this would work in the case of complex return handling (such as slurpy/flat -- though named is fairly simple), but we can resolve this at compile time and avoid calculating at runtime the things we can determine ahead of time. It's also not obvious how this would work with complex continuations. We would also have to revise how we refer to registers in the caller context, but that's doable as well.
