Changes between Version 4 and Version 5 of PGEBestPractices

Show
Ignore:
Timestamp:
06/30/09 01:10:23 (13 years ago)
Author:
Austin_Hastings
Comment:

Added some more tid-bits

Legend:

Unmodified
Added
Removed
Modified
  • PGEBestPractices

    v4 v5  
     1[[PageOutline]] 
     2 
     3== Parsing Techniques == 
     4 
     5=== ''Everything'' goes on a stack === 
     6 
     7Almost everything in a programming language can nest. In your parser, namespaces, blocks, declarations, expressions, references, and any number of things I can't think of will want to nest. 
     8  
     9The answer to nesting, of course, is a stack. Some stacks are obvious: lexical scope, and blocks. But what about declarations? 
     10{{{ 
     11   class Foo { 
     12      int bar(int x, int y) { 
     13         extern class Other {  
     14            attribute int x; 
     15            attribute int y; 
     16            void setX(int x) :method; 
     17            void setY(int y) :method; 
     18            pmc get_instance(); 
     19         } 
     20 
     21         pmc other = Other::get_instance(); 
     22         other.setX(x); 
     23         other.setY(y); 
     24      } 
     25   } 
     26}}} 
     27 
     28That code contains a class decl (Foo), which has a symbol decl (bar), which has a parameter-list-decl, which contains two param-decls. The bar symbol eventually gets recoded as a function definition, containing an extern class(!) decl containing 2 attribute decls and 3 more function decls. 
     29 
     30When your parser is telling you that you need another stack, be willing to listen. Not everything can, or should, go on the same stack. 
     31 
     32=== Encapsulate or defer actions === 
     33 
     34Remember when building PAST trees that the parser is "experimenting" with your rule. Maybe this is an expression, but maybe it's an anonymous declaration. 
     35 
     36As a result, you may build a very complex parse tree only to have the parser decide it's wrong, and discard it. Consider having some "sequence points" in your grammar where you can be sure that the parser has correctly identifier whatever it is you are building. 
     37 
     38Consider a variable declaration. If you store the variable declaration in a lexical block, and the parser decides that those curly braces don't ''really'' indicate a block, there's no harm. The connection from declaration to block was encapsulated in the block itself, so discarding the block's PAST tree discards the variable declaration. 
     39 
     40But if you interpret a variable declaration as some kind of global, and jam a symbol into a namespace, then what will you do if the parse fails? The parser might find a valid alternative, but you've injected a bogus symbol into the symtable someplace. 
     41 
     42For a dynamic language, the best way to deal with this may be to ignore it -- let autovivification solve the problem. Alternatively, you may want to create a "transaction" block representing changes to the namespace, and merge that block when the parser returns to some higher level (the sequence points mentioned before: a completed function decl, for example). 
     43 
     44=== Use 'state' variables to avoid special cases === 
     45 
     46For handling non-local conditions, like "first time this occurs" or "first one in file", use a state variable instead of trying to build a complex pattern. This can be used to start parsing with a "default" configuration that skips certain productions. 
     47{{{ 
     48  rule TOP { 
     49    {{ $P0 = box 1 
     50       set_global '$Top_of_file', $P0 
     51    }} 
     52 
     53    # ... rest of rule 
     54  } 
     55 
     56  rule section { 
     57    [ <section_header> 
     58 
     59    || # Note: Section Header is OPTIONAL at top of file 
     60      <?{{ $P0 = get_global '$Top_of_file' 
     61            .return($P0) 
     62       }}> 
     63    ] 
     64    # Not at TOF anymore. 
     65    {{ $P0 = box 0 
     66       set_global '$Top_of_file', $P0 
     67    }} 
     68 
     69    # ... rest of rule 
     70  } 
     71}}} 
     72 
     73=== Use 'mode' variables to unify rules === 
     74 
     75One problem with creating several rules that are slightly different is the need to maintain the slightly-different action methods underlying them. It may be a better idea to unify the various rules with some variables storing 'mode' information about the parse. 
     76 
     77Consider the case of declaring a variable. In many instances, a storage class (such as 'static') may be specified on the variable declaration. But when declaring a function parameter list, that makes no sense. One alternative is to have separate rules: 
     78{{{ 
     79   rule variable_decl { 
     80      <storage_class> <type> <init_declarator> 
     81   } 
     82 
     83   rule parameter_decl { 
     84       <type> <init_declarator> 
     85   } 
     86}}} 
     87Or, if your language is self-similar, you may use a mode: 
     88{{{ 
     89   rule decl { 
     90      [ <!DECL_MODE_PARAM> <storage_class>? ]? 
     91      <type> 
     92      <init_declarator> 
     93   } 
     94}}} 
     95The mode is a special token that just executes some inline PIR code: 
     96{{{ 
     97token DECL_MODE_PARAM { 
     98   <?{{ 
     99      $P0 = get_global '@Decl_mode_stack' 
     100      $S0 = $P0[0] 
     101      if 'parameter' == $S0 goto yes 
     102      .return (0) 
     103   yes: 
     104      .return (1) 
     105   }}> 
     106} 
     107}}} 
     108 
     109=== Report errors via special subrules === 
     110 
     111In cases where a partial match of a rule has occurred, add error checking alternatives. 
     112{{{ 
     113  rule compound_stmt { 
     114    '{' 
     115      [ <declaration> | <statement> ]* 
     116    [ '}' || <error: "Missing '}' in compound statement"> ] 
     117    {*} 
     118  } 
     119}}} 
     120Note that the parser may try several alternatives, so don't try to insert error messages too early in the rule. In the example above, the '{' is the key indicator of a compound statement. It makes no sense to try to report a "Compound statement missing '{'" error, since the failure of the rule may tell the parser to try some other, valid subrule. But once the block is opened, it will obviously have to be closed. 
     121 
     122== Performance == 
     123 
     124=== Use whitespace caching === 
     125 
     126Build an "offset cache" into your ws rule using the code below, so that it will remember the last whitespace it recognized. This can help when several alternations are trying to match some text in the same place. 
     127 
     128{{{ 
     129token ws { 
     130  | <?{{  $P0 = get_global '$!ws' 
     131          if null $P0 goto noshort 
     132          $P1 = $P0.'to'() 
     133          $P2 = match.'to'() 
     134          if $P1 != $P2 goto noshort 
     135          .return (1) 
     136        noshort: 
     137          set_global '$!ws', match 
     138          .return (0) 
     139  }}> 
     140  | <!ww> <ws_all>+ 
     141  | <ws_all>* 
     142} 
     143}}} 
     144 
     145=== "Hoist" prefixes === 
     146 
     147If several rules can be uniquely distinguished by a simple prefix, hoist the prefix into a parent rule. 
     148 
     149{{{ 
     150rule built_in_function { 
     151   [ <?before <t_builtin_name>  
     152   || <.fail> 
     153   ] 
     154 
     155   [ <builtin_sine>    {*} #= sine 
     156   | <builtin_cosine>  {*} #= cosine 
     157   | <builtin_tangent> {*} #= tangent 
     158   ... 
     159   ] 
     160} 
     161 
     162token t_builtin_name { 
     163   [ 'sin' | 'cos' | 'tan' | ... ] # Any of the builtins 
     164   >> # At end-of-word 
     165} 
     166}}} 
     167 
     168 
     169== Debugging == 
     170 
    1171Place the following in a rule to dump the current match object 
    2172 
     
    17187}}} 
    18188 
    19 use whitespace caching 
    20  
    21 {{{ 
    22 token ws { 
    23   | <?{{  $P0 = get_global '$!ws' 
    24           if null $P0 goto noshort 
    25           $P1 = $P0.'to'() 
    26           $P2 = match.'to'() 
    27           if $P1 != $P2 goto noshort 
    28           .return (1) 
    29         noshort: 
    30           set_global '$!ws', match 
    31           .return (0) 
    32   }}> 
    33   | <!ww> <ws_all>+ 
    34   | <ws_all>* 
    35 } 
    36 }}}