PIR Tutorial

Introduction

Welcome to the beginners' tutorial for Parrot Intermediate Representation (PIR)! So, you want to program the newest and most exciting virtual machine for dynamic languages, eh? This is the place to start! Although Parrot is still under development, it can already solve a lot of your programming problems, and it will get even better as time progresses. Moreover, because Parrot is currently not compiled with optimizations, it will get faster too! Note that the syntax of Parrot's internal language is not set in stone and may still change. However, if you stick to the syntax described in this tutorial, you should be quite safe.

If you're comfortable with Backus–Naur Form (BNF), a notation for the grammar of context-free languages, you may take a look at the grammar of PIR in languages/PIR. Note that this is not the official implementation; it is an attempt to be as close to it as possible. If you don't know what BNF is, you may forget about it :-).

Now, let's get started!

PIR Basics

Your first Parrot program

As always, we start with the simplest program imaginable:

  .sub main
    print "Hello Parrot!\n"
  .end

Save this program to a file called hello.pir. Then, to run this program, type this on the command line (assuming you successfully compiled Parrot):

  parrot hello.pir

And the output will be:

  Hello Parrot!

That was not too hard, now was it? Before we continue to more complex examples, let's first analyze what happened. The first line in the file is .sub main. This indicates that we're defining a subroutine that goes by the name main. Note that it is not necessary to name your subroutine like this, even if it's the only subroutine. Unlike in C, the name main does not indicate that execution will start at that subroutine. In PIR, execution starts at the first subroutine defined in the file, no matter what its name is. (There are ways to change this, but we will forget about that for now. More on that later.) As you can see on the third line, the subroutine is closed with the .end directive.

In between these subroutine directives, you can define the subroutine body, which consists of PIR or Parrot assembly (PASM) instructions. In this simple program we just stuck to the print instruction. It takes one parameter that can be of any type, as long as it is something (i.e. it is not undefined or null). Please note that all instructions must be in between a .sub/.end pair.
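
As a small sketch of print accepting different types, the program below prints an integer, a floating-point number and a string in turn. The exact formatting of the floating-point value may differ between Parrot versions, so no output is shown here.

  .sub main
    print 42            # an integer constant
    print "\n"
    print 3.14          # a floating-point constant
    print "\n"
    print "a string\n"  # a string constant
  .end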

More instructions

Parrot has a lot of instructions. I mean, a lot. This tutorial will not discuss all of them, but instead we will discuss them as the need arises for them. Now, we will first see how to do some calculations so you can do some useful stuff. We'll do it step by step and explain things as they pass by. (Do note however, this is not a tutorial on assembly programming, so some knowledge of registers etc. is expected).

Storing things

Before we continue, we need to explain some details on how Parrot stores numbers, strings and objects. As Parrot is a register-based virtual machine (as opposed to stack-based VMs like the Java VM), you store things in registers. There are 4 types: registers for storing integers (I registers), floating-point numbers (N registers), strings (S registers) and objects (P registers). Suppose we need to store a few things; we could do it like this:

  I0  = 42              # store 42 in integer register 0
  N10 = 3.14            # store 3.14 in numeric register 10
  S20 = "Hello world!"  # store this string in string register 20
  P30 = new .String     # create a new String object in PMC register 30. See "Where to read further?" for more information on Strings.

Above we used Parrot registers, and there's only a limited number of them. Instead, it's better to use temporary registers; they look almost the same as registers, but have a $ prefix. They can be considered as variables that don't need any declaration (and you can use as many of them as you need). Some examples:

  $I0 = 42
  $S9999999 = "Hi" # use *any* register number

However, if you'd rather give things a meaningful name, you might consider using named temporary variables. These, however, do need a declaration, which looks like this:

  .local int answer
  .local num PI, e
  answer = 42
  PI = 3.14
  e = 2.7

This declares some temporary variables. Although this declares an integer and some numeric variables, you could use any of the following types:

  • int - declare an integer variable
  • num - declare a floating-point number variable
  • string - declare a string variable
  • pmc - declare a Parrot Magic Cookie (PMC) variable

You might wonder what the heck a Parrot Magic Cookie is. This is where Parrot's magic comes in. In fact, it's so magical that there's a separate document written about it. Have a look at the section "Where to read further?".

Now that we know how to store numbers and strings, let's do some operations on those values.

Calculating things

Calculating things is as trivial as you might expect. We'll give some full examples below, so you can copy+paste the code and run it yourself:

ABC formula

  .sub foo
    .local num a, b, c, det

    # give a, b and c some value for now; later specify them as parameters
    a = 2
    b = -3
    c = -2

    # calculate -b and b squared.
    $N0 = -b
    $N1 = b * b

    # calculate 4ac
    $N2 = 4 * a
    $N2 = $N2 * c
    
    $N3 = 2 * a
    det = $N1 - $N2
    $N4 = sqrt det

    .local num x1, x2   
    x1 = $N0 + $N4
    x1 = x1 / $N3

    x2 = $N0 - $N4
    x2 /= $N3      # fancy way of saying x2 = x2 / $N3, but more efficient

    print "Answers to ABC formula are:\n"
    print "x1 = "
    print x1
    print "\nx2 = "
    print x2
    print "\n"
  .end

Of course, as Parrot offers operations at a more abstract level than hardware processors, you can also do fancier things such as manipulating strings, as in the example below:

  .sub joe
    .local string name
    name = " Joe!"
    $S0 = "Hi"
    $S1 = $S0 . name
    $S1 .= "\n"  # extend $S1 with "\n"
    print $S1
  .end

In PIR, the dot is shorthand for the concat operation. It takes 2 strings and concatenates them. Just like the compound assignment in the ABC formula example (x2 /= $N3), concatenation has an assigning form too: the .= operator.

As mentioned, Parrot has many instructions. This tutorial will not list all of them; instead, you could take a look at the list of ops by category.

More on subroutines

This section discusses subroutines in a little more detail so you can do some useful stuff. Although there's much more to subroutines, we'll postpone that to a later section.

Passing parameters

In order to pass parameters to a sub, you'll need to define these parameters. This is easy:

  .sub foo
    .param int n
    .param string message
    # do something useful
  .end

Sometimes you want a subroutine that might take a parameter, depending on the situation. In that case you'd want to use optional parameters.
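
A minimal sketch of an optional parameter is shown below. It assumes the :optional and :opt_flag parameter flags described in the calling conventions document (see "Where to read further?"); the sub name greet and the variable has_name are made up for this example. The :opt_flag parameter directly follows the optional one and is set to 1 if the caller passed a value, and to 0 otherwise. The call syntax used in main is explained in the section on invoking subroutines.

  .sub main
    greet("Joe")   # the optional parameter is passed
    greet()        # the optional parameter is left out
  .end

  .sub greet
    .param string name     :optional
    .param int    has_name :opt_flag

    if has_name goto say_hi
    name = "stranger"      # fall back to a default value
  say_hi:
    print "Hi "
    print name
    print "!\n"
  .end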

Returning values

If you remember the example in which we implemented the ABC formula, you saw that we calculated the answers but only printed them. Usually, you'd like to have a subroutine calculate something and then return the answers. Instead of printing them, you could return them, as shown in this code snippet:

  .sub abc
    .local num x1, x2
    # do some calculations
    .return (x1, x2)
  .end

Invoking subroutines

Now that you know how to define parameters and return values, it's time to explore how to invoke your subroutine. Some examples:

  .sub main
    # invoke 'foo' without parameters, no return values
    foo()
    
    # invoke 'bar' with parameters $N0, 42, and "hi", no return values
    $N0 = 3.14
    bar($N0, 42, "hi")

    # invoke 'baz' with parameters $N2, "hello yourself", and return values
    .local int a
    .local num b
    .local string c
    $N2 = 2.7
    (a, b, c) = baz($N2, "hello yourself")

  .end

  .sub foo
    print "Foo!\n"
  .end

  .sub bar
    .param num i
    .param int answer
    .param string message
    print "Bar!\n"
    print i
    print "\n"
    print answer
    print "\n"
    print message
  .end

  .sub baz
    .param num e
    .param string msg
    print "Baz!\n"
    print e
    print "\n"
    print msg
    .return (1000, 1.23, "hi from baz") 
  .end
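
Putting parameters, return values and invocation together, the ABC formula from earlier could be rewritten roughly as follows. This is only a sketch: the sub name abc matches the snippet above, while the names r1, r2 and root are chosen for this example. The arguments are written as floating-point constants to match the num parameters. For a = 2, b = -3 and c = -2 the two roots are 2 and -0.5.

  .sub main
    .local num r1, r2
    (r1, r2) = abc(2.0, -3.0, -2.0)
    print "x1 = "
    print r1
    print "\nx2 = "
    print r2
    print "\n"
  .end

  .sub abc
    .param num a
    .param num b
    .param num c
    .local num det, root, x1, x2

    # det = b squared minus 4ac
    det = b * b
    $N0 = 4 * a
    $N0 = $N0 * c
    det = det - $N0
    root = sqrt det

    # x = (-b +/- root) / 2a
    $N1 = -b
    $N2 = 2 * a
    x1 = $N1 + root
    x1 = x1 / $N2
    x2 = $N1 - root
    x2 = x2 / $N2

    .return (x1, x2)
  .end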

Controlling the flow of your program

PIR has a few instructions to control the flow of your program. This section describes them.

Goto statements

The most basic one is of course the goto instruction. It's very simple:

    ...
    goto L1
  L1:
    print "hi!\n"
    ...

Although the use of goto is not advised in high-level languages (HLLs), in PIR you are hardly able to write useful programs without it. Besides, PIR is an assembly language after all, so that's okay.

If statements

PIR has a built-in if statement. It can take three forms:

Evaluate an atomic expression:

    ...
    .local int x
    x = 1

    if x goto L1
    ...
  L1:

Evaluate a binary expression:

  ...
  .local int x
  x = 1
  
  if x == 2 goto L2
  ...

Evaluate an object expression:

  ...
  .local pmc obj
  
  if null obj goto L3
  ...

If obj is null, then the execution engine will continue at label L3. More on PMCs soon.

Unless statements

While an if statement is useful, it is sometimes more efficient to use the unless instruction. It's the opposite of the if statement, and jumps to the specified label unless its argument is true. Its format is exactly the same as the if statement, except the word if is replaced by unless. So you should be able to figure out how to write the unless instruction.

Loop constructions

PIR has no built-in while or for statement, but implementing a loop can be easily done using the if statement, some gotos and a couple of labels, like this:

    ...
    .local int i
    i = 0
  loop_begin:
    if i >= 10 goto loop_end
    print i
    print "\n"
    i += 1
    goto loop_begin
  loop_end:
    ...

This loop will print the numbers 0 up to (but not including) 10 to the screen.
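
The same loop can also be sketched with the unless instruction by inverting the exit test: instead of leaving the loop if i >= 10, we leave it unless i < 10. The program below is a complete version you can run; the final print is only there to show where execution continues after the loop.

  .sub main
    .local int i
    i = 0
  loop_begin:
    unless i < 10 goto loop_end   # leave the loop once i reaches 10
    print i
    print "\n"
    i += 1
    goto loop_begin
  loop_end:
    print "done\n"
  .end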

Splitting your program into several files

Sometimes it's easier to split up your program into multiple files. There are two ways to do this:

  • use the .include directive
  • compile the PIR files separately and load them using load_bytecode

The first way is the simplest. Use the .include directive at the point in your main file where you'd like to pull in the contents of another file. The contents of the specified file are read and replace the .include directive, just as the #include directive does in the C preprocessor.

An example will show what happens:

Contents of the main file:

  .sub main
    foo()
  .end

  .include "foolib.pir"

And the file foolib.pir contains:

  .sub foo
    print "foo!\n"
  .end

After processing the .include directive, the input to the PIR compiler looks like this:

  .sub main
    foo()
  .end

  .sub foo
    print "foo!\n"
  .end
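
The second way, using load_bytecode, could look roughly like the sketch below. It assumes that foolib.pir has first been compiled to a bytecode file named foolib.pbc (a file name chosen for this example) using the -o option of the parrot executable; check the documentation of your Parrot version for the exact command-line options.

  parrot -o foolib.pbc foolib.pir

The main file then loads the compiled file at run time before calling foo:

  .sub main
    load_bytecode "foolib.pbc"   # makes the subs in foolib.pbc available
    foo()
  .end

Unlike .include, nothing is pasted into the main file here; foo is looked up by name when main runs, so the library only has to be recompiled when foolib.pir itself changes.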

Where to read further?

Take a look at these documents:

  • docs/glossary.pod - contains explanations of some often used terms
  • docs/art/pp001-intro.pod - a general introduction
  • docs/art/pp002-pmc.pod - a good introduction to PMCs
  • docs/art/pp003-oop.pod - an introduction to Object Oriented Programming in Parrot
  • docs/imcc/ - all files in this directory
  • docs/compiler_faq.pod - a document describing how to implement various language constructs in PIR
  • docs/pdds/pdd03_calling_conventions.pod - the Parrot Design Document on Parrot's calling conventions
  • docs/pdds/pdd20_lexical_vars.pod - the Parrot Design Document on lexical variables
  • languages/PIR/docs/pirgrammar.pod - the grammar of PIR as implemented using PGE (matches about 90% of PIR)
  • compilers/pirc - a top-down recursive descent parser for PIR, with embedded specification
  • Parrot Docs - all kinds of files on particular subjects