Parrot had a JIT implementation that was removed as of Parrot 1.7.0. This document describes our plans for the new implementation.
Here are some general steps forward:
- Write a replacement for the old JIT frame builder
- Decide on a standard intermediate language for our Ops (C, which is what we have now; Lorito; or JIT Definitions)
- Write a parser for that intermediate language
- Write code generation backends for the intermediate language to output C Functions and JIT Definitions for at least one JIT engine *
- Write all the ops in the intermediate language *
Note: if our intermediate language for Ops is C, we can ignore at least part of the starred steps, but will have to do a lot more work building a suitable parser.
The specific plan is:
- Design Lorito
- Create a Lorito parser with multiple backends (C code in several formats, LLVM code)
- read LLVM documentation like mad men (whiteknight, help wanted!)
- Make a C code backend for PCT
- Redirect --runcore=jit to the fast core (also redirect the switch-JIT and CGP-JIT cores and their Makefile targets)
- rip out the current JIT system
- A "C Function" is a function written in C that performs some task
- A "JIT Definition Function" is a function written in C that generates the machine code for a function that performs some task
- LIR is "Low-level Intermediate Representation". It is an instruction format internal to a compiler, used to represent the program being compiled.
A Good JIT Will…
Here are some basic requirements:
- Generate machine code on the fly for all supported platforms
- Generate call frames on the fly for NCI calls, on all supported platforms, for all NCI call signatures
Notice that we could end up with NCI call frame builders that do not require the JIT (this will probably require lots of platform-specific assembly, but it is possible).
Here are some nice extras that a "Great" JIT will provide:
- Perform optimizations on generated machine code, to a degree specified by the user
- Be able to output generated machine code to an executable or object file
- Be usable in other situations where machine code will be needed, not just tied to the translation of PASM ops
- Be platform agnostic, so far as Parrot's internals are concerned. Not require different behaviors and calls internally for different platforms.
Generating JIT Definition Functions
Every op will need at least two implementations: One for each of the "normal" runcores, and one to build the JIT version at runtime. Ideally, we would be able to produce both from a single specification, without needing to create and maintain two separate implementations of each op. There are three ways to go about this:
- Write an OPS parser using PGE. Use miniparrot to parse the .ops files and output both C language and JIT definition code for each op. Compile Parrot using both of these.
- Implement OPS in Lorito. Write a series of conversion tools to convert Lorito ops into C language and JIT definition code for each op. Compile Parrot using both of these.
- Implement all OPS as JIT definitions. Use miniparrot to output the machine code definitions for all ops, and link those into Parrot when we build.
We absolutely do not want to have to maintain multiple separate implementations of each op. We should be able to define them once and use automated tools to generate different output forms for each. These conversions can all be done at build time.
Notice that using Lorito as our implementation target will allow us to translate Parrot ops into any target language we want (machine code, JIT definitions, bytecode formats for other VMs, etc)
JIT Engine Options
Because we will be generating JIT definitions at build time, a smart code generator can be used to generate the code for multiple JIT backends. Here are some options for backends we can support. We should be able to target all of these, depending on which the user has installed on their system.
- GNU Lightning
- Roll our own
The last option is possible, but it is unlikely that we will be able to produce a JIT engine as portable, performant, and robust as an existing solution. It is more likely that we can write a set of code generators that target all these JITs with less effort than it would take to write our own that works on all our target platforms.
libJIT is developed as part of the DotGNU project, but it is a general-purpose JIT library that is released and available separately and does not require DotGNU.
Pros: Easier to use than LLVM. Faster code compilation than LLVM. Active development team. Cons: Slower generated code than LLVM. No functionality to save object files (though an ELF writer is in development)
Last release: ftp://ftp.gnu.org/gnu/dotgnu/libjit/
Also, there is a project branch which uses a linear scan register allocator and other optimizations: http://code.google.com/p/libjit-linear-scan-register-allocator/
LLVM provides JIT compilation for its own custom opcodes. If we target LLVM specifically, we could use native LLVM ops as the LIR in which we write our ops.
Pros: Lots of optimizations across multiple stages. Can generate executables from JIT'd code. Cons: Heavy-weight, is more than just a JIT. Requires translation of PBC to LLVM opcodes. Some developers claim it is difficult to use.
Paolo Bonzini, the maintainer of GNU Lightning, has this to say about it:
- Current status: Mature. It serves its purpose well and since there is no incentive and contributors to write new backends, I'm just fixing bugs. An ARM backend is the only one that is really missing.
- Stable releases: Just use the latest git tree. The code is stable enough that right now using git is better.
- Support policy: Best effort. Write email and I'll do my best to fix bugs and publish the fixes as soon as possible; all I expect you to do is to include the miscompiled code's disassembly (i.e. I won't install Parrot). On the other hand, the code is quite small and easy to understand. Most of the contributors came out with patches of their own, that we then developed together.
- Project Viability: The project is viable if all you care about is x86 (32/64 bit), SPARC (32-bit only), PPC (32-bit only). It is very lightweight and favors fast compilation over fast execution (though I still got 3-4x improvement over a very optimized bytecode interpreter).
"If this is what you want, go for it; I don't think lightning will disappoint you. If you want slow but optimized code generation, register allocation, and the like, I'd say use LLVM instead."
Pros: fast code generation, stable codebase, small codebase so we can make patches if needed. Cons: slow generated code execution, limited platform support and not enough developer momentum to add new platforms.
nanoJIT is being developed separately by Mozilla and Adobe. The two teams are currently working to merge their codebases, at which point nanoJIT should be released as a separate project.
From one of the developers:
It's not available separately yet as Mozilla and Adobe have diverged their implementations. They are currently working on merging their changes and coming up with a version of nanojit that will then become a separate project that both use. I currently track and use the Mozilla version. Yes, I think integrating it into Parrot would be do-able and probably not all that difficult. It's pretty simple to use. The nanojit development team is currently split between Mozilla and Adobe so it's hard to gauge. There's a few people who work on it from both companies. Because there is no separate project yet there are no release/support policies.
Pros: Fast. Backed by developers from Mozilla and Adobe. Cons: Not ready for general consumption yet.
Allison's thoughts while reviewing unladen-swallow
- Basic idea is to dynamically generate LLVM IR from a PIR sub, then store the compiled code object within the sub to run instead of following the opcode runloop.
- Each opcode has a LIR "template", which can be combined in series (with substitutions for literals, variables, etc) to generate the full LIR body.
- Some callsites in the PIR code can be JIT-ed to direct C function calls, completely eliminating the PCC overhead.
- Added win for caching the JIT-ed subs between runs of the interpreter (in LIR form if necessary, or compiled form if possible), as the big cost is compiling them.