Microprocessor analogy

When you look at modern microprocessors, you see these complex instruction sets like x86. In traditional machine code, each bit in a machine code opcode travels down wires and acts as control signals to the actual circuit hardware. The problem with this is that the opcode structure becomes intimately tied to the architecture of the hardware: If you change one, you need to change the other (because different bits now need to travel down different wires to different circuits), and then things built on top break: Your assemblers now need to output new machine code, your pre-existing binary executables are no longer valid, etc.

In addition to this, some machine code words are very complex. x86 has hundreds of operations, and each operation can use a large number of addressing modes. Combine these together and you get hundreds, maybe millions of permutations. A single op, depending on the types of the sources and the destinations can have a number of distinct behaviors. add ax, bx is pretty far different from add [ax], bx, which is different from add ax, [bx], etc. Having to decide what behaviors to activate for each opcode from a large set of opcodes is hard to do.


Reasons for L1

Consider the idea now of converting those "high-level" machine code words into "low-level" microcode where each microcode operation is small, simple, atomic and fast. Because each of these lower level instructions is more simple and atomic, they are easier to use in automated translation and with JIT.

Right now we have a lot of our critical code paths are written both in C and PIR. It's immensely expensive moving data between PIR registers and the C system stack to call functions in each language with their very different calling conventions, and we are incurring that cost very frequently. In addition, we lose out on lots of opportunities because of the boundary between the two: we cannot easily optimize, we cannot easily inline functions, we cannot easily profile (we can typically profile C or profile PIR, not both at once), etc.

L1 is an abstraction that allows us to insert a new abstraction layer into Parrot to help reduce algorithmic complexity on the global level and give us more opportunities for analysis and optimization than are possible with our current combination of PIR and C. Other projects like TraceMonkey are using cool optimization techniques such as polymorphic inline caching (which Parrot has a basic and messy implementation of now, which has been deprecated), trace-based JIT and context threading. These optimizations together are producing some stunning benchmark results for leading-edge javascript interpreters.


What L1 will look like

So the question invariably arises is what L1 will look like to the average programmer? Will it look like some grizzled assembly language? Will we be forced to reimplement large swaths of Parrot in some freaky home-brewed assembly? The answer, I think we are all pleased to consider now, is no. We have skills, background and tools for programming language design and we have several existing parsers for languages that we could utilize. One specific example of things that we could do, as mentioned by pmichaud, is to write core PMC types in NQP. It would have to have some extensions to allow direct register addressing, definition of vtables, and definition of attributes which may be references to arbitrary C data structures. Also, it would have to have a new backend which outputs L1 bytecode instead of PIR->PBC as PCT normally does now. We also have a new language in development by Austin Hastings called "Close", which is a very low-level language based on C that will eventually have all the capability of PIR (and therefore L1), but with familiar C-like syntax.

What this means is that we won't need a human-editable form of L1 bytecode because we can write it in NQP, Close, or even other high-level languages which currently run on Parrot (although we will want to find one with a small runtime, or even a variant with no runtime). The short answer: L1 looks like any language we want it to look like (but probably Perl6-like in NQP or C-like in Close).


Next Steps

So what are the next steps? Well, pmichaud suggests that a good way forward is to start prototyping some key core PMC types in NQP now, so we can get a feel for what syntax and semantics we will actually want and need. Once we know what our needs are, we can start modifying NQP to support our new syntax for dealing with these things.

With the syntax requirements set, we can then modify the NQP parser to output normal C for now, and eventually redo the backend to output L1 instead.

chromatic suggests that with our current plug-in architecture we could create a library of L1 dynops, and create an NQP backend to target those. Once we have core PMCs as written in NQP executing only on L1 dynops, we can start the rest of the conversion process.

Here is a quick checklist:

  1. Write core PMC types in NQP, making careful note of what syntax we need to add to support all existing operations. Good candidates for this prototyping would be Integer, ResizablePMCArray, and Hash (which is soon to be renamed to AssociativePMCArray).
  2. Write existing PIR opcodes in NQP, making careful note of what syntax we need to add to support existing operations.
  3. Write a dynop lib for L1 ops
  4. Modify the NQP compiler to have all the necessary syntax from steps #1 and #2, and to output only those L1 opcodes created in step #3.
  5. Modify IMCC (or PIRC, depending on time frame) to output L1 or PBC, depending on some kind of switch
  6. Move L1 opcodes into the core instead of being a dynoplib.
  7. Make the final switch. All ops and PMCs get compiled from NQP into L1 during the normal build process (possibly using a bootstrapping step)
  8. Optimize like rabid ferrets.