Developers implementing HLLs on top of Parrot need a way to unit test their code. One source of difficulty lies in ensuring that their parser emits correct PAST trees, and that the trees are consistent. TreeUnit is a tool for making assertions about the trees.

Necessary Features

Maybe not in v1, but here are some things TreeUnit needs:

Scaffolding

In a C program, you can't just compile "a = 1;" and see what comes back. Instead, you need to compile something like

void main(void) {
   int a;

   a = 1;
}

So the resulting PAST tree won't be completely independent. It will be nested inside the "scaffolding" that had to be implemented to drive the parser to a state where the expression could be parsed. TreeUnit will provide some kind of search or match feature so the scaffolding can be ignored.

Background Noise

A PAST tree node can have a whole collection of attributes set, many of which are completely insignificant.

    [10] => PMC 'PAST;Block'  {
        <blocktype> => "declaration"
        <name> => "_runner"
        <lstype> => "function body"
        <hll> => \past
        <namespace> => ResizablePMCArray (size:1) [
            "Z"
        ]
        <source> => \past
        <pos> => 1309
        <is_function> => 1
        <params> => "uplifted"
        <pirflags> => " :init :load"
        <rtype> => undef
        <scope> => \past
        <is_rooted> => 0
        <adverbs> => Hash {
            "init" => 1,
            "load" => 1
        }
        [0] => PMC 'PAST;Stmts'  {
            <name> => "local variables"
        }
        [1] => PMC 'PAST;Stmts'  {
            <source> => \past
            <pos> => 1336
            <name> => "compound_stmt"
            [0] => PMC 'PAST;Op'  {
                <pasttype> => "call"
                <source> => \past
                <pos> => 1344
                [0] => PMC 'PAST;Var'  {
                    <source> => \past
                    <pos> => 1338
                    <scope> => "package"
                    <name> => "test"
                    <is_rooted> => 0
                    <hll> => \past
                    <namespace> => ResizablePMCArray (size:0) [
                    ]
                    <decl> => \past
                }
            }
        }
    }

TreeUnit will permit selective validation: only those nodes and attributes which are significant will be validated.

Arbitrary descendants

Given that a parser may insert arbitrary containers (PAST::Block or PAST::Stmts) between a parent and child node, TreeUnit syntax has to provide for some kind of "..." mechanism. (In Ant file patterns, they use "/**/" for this.)

Successor matching

Sequences of statements will produces sequences of nodes, so TreeUnit must be able to recognize A; B; C as well as (A (B (C))) relationships.

Ordering

In addition to strict sequence ('A', followed by 'B') it may be necessary to recognized general orderings: 'A', then eventually 'B'.

Lexical Nesting

HLL lexical blocks may map to an entirely separate sub that is "lexically contained" in an outer sub. TreeUnit should recognize the sub's "inner" and "outer" linkages as indicating containment.

References

Here are some tools that deal with trees, or tree-like structures. Maybe they've got ideas to steal?

  • The Unix find utility and Perl's File::Find
  • Ant's FileList and FileSet elements
  • CSS Descriptors
  • The HTML DOM
  • XML's XQuery and XPath specifications
  • Lisp/Scheme
  • ANTLR has a tree serialization format
  • LDAP
  • TreeDL
  • YAML (arbitrary structures, user-specified node types)

Implementation

Syntax

An important part of TreeUnit is a facile syntax. It's possible to do tree matching in PIR, but who would want to write it?

Instead, consider something like YAML (note that "---" and "..." are part of YAML):

--- name : Test "a = 1;" error-in : Simple assignment of int source : |

void main(void) {

int a;

a = 1;

}

match:

type : PAST::Block name : main error-in : PAST::Block for "main". descendants:

type : PAST::Op pirop : assign children :

  • type : PAST::Var name : a scope : lexical
  • type : PAST::Val returns : Integer value : 1

...

This is verbose, but ultimately readable, and a parser for it exists in P5 already. In this case I've chosen to express the "eventual descendant" predicate using "descendants" versus "children", and sequences using YAML sequences. There's nothing here directly supporting "in order but not consequent", nor "in any order". That would be a V2 thing, I think. :) Implicitly, any attribute listed must match, and any attribute not listed is ignored.

Output

A Parrot TreeUnit could obviously interoperate with the actual code (once it got working, which is another question entirely).

A P5 TreeUnit would need an intermediate form. One obvious choice is PIR -- produce a PIR program that did what needed doing. Another, related choice is PAST: produce a PAST representation of a program that, etc. Another choice is to translate "easy to write" into "easy to parse", and produce some output format that was trivial for e.g., a Perl6 program to parse (since there is no YAML library for perl6 yet).