Version 42 (modified by jhorwitz, 13 years ago)

--

mod_parrot HLL Module Developer's Guide

Overview

This is the mod_parrot HLL module developer's guide. The target audience is developers wishing to embed their language in Apache using mod_parrot. The benefits of this are one-time compilation of scripts, a persistent execution environment, direct access to the Apache API, and the ability to write custom hooks in the embedded language. Some languages can even be self-hosted, meaning the code to implement mod_foo is written in the "foo" language.

Most examples are taken from the PIR HLL module, with several from mod_perl6 to illustrate self-hosting.

Prerequisites

Any language targeted to Parrot can use mod_parrot to execute code in a persistent environment. However, to best take advantage of mod_parrot's features, including self-hosting, languages should support the following:

  • namespaces
  • lexical and global variables
  • a Parrot-compatible object model

Using mod_parrot without these features is still possible with some PIR scaffolding.

Bootstrapping

Each HLL in mod_parrot is contained in its own Apache module, known as an HLL module. All steps leading up to and including the registration of the HLL module with Apache are collectively called bootstrapping. The bootstrapping process typically follows this procedure:

  1. Load the HLL compiler.
  2. Declare server and directory configuration hooks.
  3. Declare Apache directive hooks.
  4. Declare Apache directives.
  5. Declare metahandlers (hooks for Apache phases).
  6. Register the Apache module.

As of mod_parrot 0.5, step 1 must be written in (or compiled to) PIR, but all subsequent steps can be written in the HLL itself, which makes the HLL module self-hosting. The PIR bootstrap file MUST be compiled to bytecode and installed as ModParrot/HLL/hllname.pbc in Parrot's library path. HLL code can be located anywhere, though conventions will eventually be defined. If there is HLL bootstrap code, it must be loaded and executed using PIR in the bootstrap file.

Bootstrap code must be placed in a PIR subroutine marked with the :load adverb so it is run when the file is loaded. This subroutine can be named or anonymous (using the :anon adverb).

Example: mod_perl6

The first part of the bootstrap file from mod_perl6 loads the compiler and supporting libraries, then executes Perl 6 code from mod_perl6.pm:

.sub __onload :anon :load
    load_bytecode 'languages/perl6/perl6.pbc'
    load_bytecode 'ModParrot/Apache/Module.pbc'
    load_bytecode 'ModParrot/Constants.pbc'

    # load mod_perl6.pm, which may be precompiled
    $P0 = compreg 'Perl6'
    $P1 = $P0.'compile'('use mod_perl6')
    $P1()

    ...

Configuration

Each HLL module provides two configuration data structures to Apache: server and directory. Server configurations are specific to the main server and individual virtual hosts. Directory configurations are specific to individual configuration sections, which can be real directories or locations defined in the Apache configuration file. All configuration structures can be merged with parent configurations to implement inheritance or overriding behavior.

Creating HLL Configurations

Each Apache module is responsible for defining and creating its own configuration data structures. When Apache asks an HLL module for a server or directory configuration, mod_parrot will look for a "constructor" subroutine in the ModParrot;HLL;hllname namespace to execute. Server configs are provided by server_create, while directory configs are provided by dir_create. Both are passed a ModParrot;Apache;CmdParms object. These subroutines should create a data structure, possibly populated with default values, and return it. The type of the structure is up to the implementor, as long as it is a valid Parrot PMC.

Signatures

  • PMC server_create(PMC parms)
  • PMC dir_create(PMC parms)

Example: the PIR configuration constructors

.namespace [ 'ModParrot'; 'HLL'; 'PIR' ]

.sub server_create
    .param pmc parms
    $P0 = new 'Hash'
    .return($P0)
.end

.sub dir_create
    .param pmc parms
    $P0 = new 'Hash'
    .return($P0)
.end

Merging HLL Configurations

Like the constructor subroutines above, HLL modules can provide subroutines to merge two configurations. This is useful when, for example, a particular configuration setting for a directory should be overridden by a different setting in a subdirectory. Or perhaps your HLL is maintaining an array of values for a virtual host that should be concatenated with the array defined in the main server. All of this behavior is performed by the merge subroutines.

If a merge subroutine is not provided, Apache will perform the merge itself by completely overriding the base configuration with the new configuration. This may or may not be desirable for your particular module.

The server merge subroutine is called server_merge, and the directory merge is handled by dir_merge. They are passed the "base" configuration and the "new" configuration, and are expected to return a merged configuration. These subroutines should create a new PMC for the merged configuration rather than reusing the PMC from the "new" configuration. Parrot passes PMCs by reference, and the code would thus be changing the "new" configuration directly, resulting in unexpected behavior.

Signatures

  • PMC server_merge(PMC basecfg, PMC newcfg)
  • PMC dir_merge(PMC basecfg, PMC newcfg)

Example: A self-hosted server merge from mod_perl6

sub server_merge(%base, %new)
{
    my %merged;

    # merge handlers -- never inherit
    for @server_phases.map({$_ ~ '_handler'}) -> $h {
        %merged{$h} = %new{$h};
    }

    return %merged;
}
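
The warning about reusing the "new" configuration can be demonstrated outside Parrot. The following is a minimal Python sketch, not mod_parrot code: plain dicts stand in for Hash PMCs (both are passed by reference), and the function names merge_in_place and merge_fresh, as well as the setting names, are invented for this illustration.

```python
# Model of a configuration merge. Dicts stand in for Hash PMCs.
# All names below are hypothetical and exist only for this sketch.

def merge_in_place(basecfg, newcfg):
    """Problematic: fills in inherited values by writing into newcfg."""
    for key, value in basecfg.items():
        newcfg.setdefault(key, value)
    return newcfg


def merge_fresh(basecfg, newcfg):
    """Safe: builds a new structure in which newcfg overrides basecfg."""
    merged = dict(basecfg)
    merged.update(newcfg)
    return merged


base = {"response_handler": "Base;Handler", "debug": 1}
new = {"response_handler": "Sub;Handler"}

merged = merge_fresh(base, new)
assert merged == {"response_handler": "Sub;Handler", "debug": 1}
assert new == {"response_handler": "Sub;Handler"}  # "new" is untouched

same = merge_in_place(base, new)
assert same is new and "debug" in new  # "new" itself was modified
```

In the second case the inherited debug setting now lives in the "new" configuration itself, so any later merge that uses it sees a value that was never set in that section.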

Custom Apache Directives

The server and directory configurations would be fairly useless without support for adding custom Apache directives for an HLL module. Here we will learn how to define a custom directive. Actual registration of the directive occurs later when you add the module to Apache.

A directive can be defined as a Parrot Hash, or any HLL type that implements a keyed-by-string interface. There are five keys for which you need to provide values:

  • name: the name of the directive as specified in the Apache configuration file
  • args_how: a constant defining how arguments are processed
  • func: a reference to a callback subroutine that will process the arguments
  • req_override: a constant defining where in the Apache configuration file the directive can be used
  • errmsg: a message displayed when the directive is misused

Constants for args_how and req_override are found in the ModParrot;Apache;Constants;table hash from ModParrot/Apache/Constants.pbc.

Values for args_how

  NO_ARGS    no arguments
  TAKE1      one argument
  TAKE2      two arguments
  TAKE3      three arguments
  TAKE12     one or two arguments
  TAKE23     two or three arguments
  TAKE123    one, two, or three arguments
  ITERATE    a list of arguments passed to the callback one at a time
  ITERATE2   one argument followed by a list of arguments passed to the callback one at a time, along with the first argument
  FLAG       a single On or Off argument, passed to the callback as 0 (off) or 1 (on)
  RAW_ARGS   no parsing, passes the entire configuration line to the callback
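
The differences between these values are easiest to see as calls. Below is a rough Python model of the table, not mod_parrot's implementation: the dispatch function, the ARG_COUNTS table, and the way arguments are wrapped in lists are all assumptions made for the sketch, and RAW_ARGS is left out.

```python
# Model of how directive arguments reach a callback for each args_how
# value. Hypothetical sketch; the real dispatch is done by Apache and
# mod_parrot.

ARG_COUNTS = {
    "NO_ARGS": {0},
    "TAKE1": {1},
    "TAKE2": {2},
    "TAKE3": {3},
    "TAKE12": {1, 2},
    "TAKE23": {2, 3},
    "TAKE123": {1, 2, 3},
}


def dispatch(args_how, callback, args):
    """Invoke callback as the table describes; return the call results."""
    if args_how in ARG_COUNTS:
        if len(args) not in ARG_COUNTS[args_how]:
            raise ValueError("wrong number of arguments for " + args_how)
        return [callback(list(args))]
    if args_how == "ITERATE":
        # one call per argument
        return [callback([arg]) for arg in args]
    if args_how == "ITERATE2":
        # the first argument accompanies each of the remaining ones
        if len(args) < 2:
            raise ValueError("ITERATE2 needs at least two arguments")
        first, rest = args[0], args[1:]
        return [callback([first, arg]) for arg in rest]
    if args_how == "FLAG":
        if len(args) != 1 or args[0].lower() not in ("on", "off"):
            raise ValueError("FLAG takes a single On or Off argument")
        return [callback([1 if args[0].lower() == "on" else 0])]
    raise ValueError("unsupported args_how: " + args_how)


seen = []
dispatch("ITERATE2", seen.append, ["text/html", "html", "htm"])
assert seen == [["text/html", "html"], ["text/html", "htm"]]
```

A TAKE1 directive such as ParrotHandler corresponds to a single call with a one-element list.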

Values for req_override

  ACCESS_CONF    can be used in directory sections, but not in .htaccess files
  OR_NONE        cannot be overridden by AllowOverride
  OR_ALL         can be used anywhere in the configuration
  OR_AUTHCFG     can be used inside directory sections (and .htaccess with the AuthConfig override)
  OR_FILEINFO    can be used anywhere (and .htaccess with the FileInfo override)
  OR_INDEXES     can be used anywhere (and .htaccess with the Indexes override)
  OR_OPTIONS     can be used anywhere (and .htaccess with the Options override)
  OR_LIMIT       can be used in directory sections (and .htaccess with the Limit override)
  RSRC_CONF      can be used outside of a directory section (not allowed in .htaccess)
  OR_UNSET       not yet implemented
  EXEC_ON_READ   not yet implemented

The directive hashes should be stored in an array for use during module registration.

Example: The ParrotHandler definition from the PIR HLL module

    .local pmc ap_const
    .local pmc cmd

    ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

    cmd = new 'Hash'
    $P0 = new 'String'
    $P0 = 'ParrotHandler'
    cmd['name'] = $P0
    $P0 = new 'Integer'
    $P0 = ap_const['TAKE1']
    cmd['args_how'] = $P0
    cmd['func'] = cmd_parrothandler
    $P0 = new 'Integer'
    $P0 = ap_const['OR_AUTHCFG']
    cmd['req_override'] = $P0
    $P0 = new 'String'
    $P0 = 'usage: ParrotHandler handler-name'
    cmd['errmsg'] = $P0

Creating a hash for more than a few directives can be cumbersome in PIR. The PIR HLL module includes a subroutine called new_cmd that does the dirty work for us. It may eventually be moved to ModParrot;Apache;Module and tweaked to be more generic so other modules can use it.

new_cmd from the PIR HLL module

.sub new_cmd
    .param string name
    .param string how
    .param string func
    .param string override
    .param string errmsg
    .local pmc ap_const
    .local pmc cmd

    ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

    cmd = new 'Hash'
    $P0 = new 'String'
    $P0 = name
    cmd['name'] = $P0
    $P0 = new 'Integer'
    $P0 = ap_const[how]
    cmd['args_how'] = $P0
    $P0 = get_hll_global [ 'ModParrot'; 'HLL'; 'PIR' ], func
    cmd['func'] = $P0
    $P0 = new 'Integer'
    $P0 = ap_const[override]
    cmd['req_override'] = $P0
    $P0 = new 'String'
    $P0 = errmsg
    cmd['errmsg'] = $P0

    .return(cmd)
.end

    # creating a directive hash with new_cmd()
    $P0 = new_cmd('ParrotHandler', 'TAKE1', 'cmd_parrothandler', 'OR_AUTHCFG', 'usage: ParrotHandler handler-name')

We now turn our focus to the callback functions. The prototype for a callback function is:

VOID callback(PMC parms, PMC dircfg, PMC args)

  • parms: the ModParrot;Apache;CmdParms object for this command
  • dircfg: the current directory configuration
  • args: an array of arguments

In a server scope, dircfg will be populated with that server's document root directory configuration (XXX verify this).

Directive callbacks are responsible for updating the module configuration based on the arguments provided from the Apache config. In the case of ParrotHandler, func references the cmd_parrothandler subroutine, which will be called each time the ParrotHandler directive is encountered in the Apache configuration. Since it's a TAKE1 callback, args will have one element. When called, it updates the directory configuration with the name of the response handler from args.

Example: The ParrotHandler directive callback

.sub cmd_parrothandler
    .param pmc parms
    .param pmc dircfg
    .param pmc args

    $S0 = args[0]
    dircfg['response_handler'] = $S0
.end

Only the directory configuration is passed to the callback. If the callback needs to update the server configuration instead (e.g. for ParrotPostConfig), it calls get_config from the ModParrot;Apache;Module namespace, which has the following signature:

PMC get_config(STRING module_name, PMC per_dir_config)

If per_dir_config is omitted, it will return the server configuration.

Example: The ParrotPostConfig directive callback

.sub cmd_parrotpostconfighandler
    .param pmc parms
    .param pmc mconfig
    .param pmc args
    .local pmc cfg, get_config

    get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'
    cfg = get_config('modparrot_pir_module')
    $S0 = args[0]
    cfg['post_config_handler'] = $S0
.end

Accessing mod_parrot's Configuration

TODO

Metahandlers

Metahandlers are responsible for implementing the semantics of an HLL module for each Apache phase. If a module registers a hook with Apache, mod_parrot will call the metahandler for that hook during that phase. The metahandler then executes the actual HLL handler code specified in the configuration and returns the status. In this way, the metahandler acts as a proxy, hiding the implementation details of the HLL module from Apache and mod_parrot. For example, mod_perl6 passes an Apache::RequestRec object to response handlers, but mod_parrot doesn't know that -- it's up to the mod_perl6 metahandler to pass the Apache::RequestRec object to the handler code.
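
The proxy role can be modeled in a few lines. This is an illustrative Python sketch only: the Request and Context classes, the CONFIG dict, the "demo-script" handler name, and the hello handler are stand-ins invented here, not parts of mod_parrot. The numeric values 0 and -1 are the ones Apache uses for OK and DECLINED.

```python
# Model of the metahandler-as-proxy pattern. Every class and function
# here is a hypothetical stand-in, not part of mod_parrot's API.

OK, DECLINED = 0, -1  # Apache's numeric values for these constants


class Request:
    """Stand-in for an Apache request record."""
    def __init__(self, handler):
        self.handler = handler
        self.body = []


class Context:
    """Stand-in for the context object handed to a metahandler."""
    def __init__(self, request):
        self._request = request

    def request_rec(self):
        return self._request


def hello(r):
    """User-level handler: it only ever sees the request object."""
    r.body.append("hello")
    return OK


CONFIG = {"response_handler": hello}


def response_metahandler(ctx):
    """Proxy between the server and the user's handler."""
    r = ctx.request_rec()
    if r.handler != "demo-script":
        return DECLINED  # not ours; let another module respond
    handler = CONFIG.get("response_handler")
    if handler is None:
        return DECLINED  # nothing configured for this phase
    return handler(r)  # the handler's status goes straight back


request = Request("demo-script")
assert response_metahandler(Context(request)) == OK
assert request.body == ["hello"]
```

The caller only ever deals with a context object and a status code; what the user's handler receives is decided entirely inside the metahandler.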

Apache Hooks

The following Apache phases are supported by mod_parrot:

Apache Phase                  ModParrot;Constant;table key   Metahandler Name
Open logs                     MP_HOOK_OPEN_LOGS              open_logs_handler
Post-configuration            MP_HOOK_POST_CONFIG            post_config_handler
Child process initialization  MP_HOOK_CHILD_INIT             child_init_handler
Preconnection                 MP_HOOK_PRE_CONNECTION         pre_connection_handler
Process connection            MP_HOOK_PROCESS_CONNECTION     process_connection_handler
Post Read Request             MP_HOOK_POST_READ_REQUEST      post_read_request_handler
Map To Storage                MP_HOOK_MAP_TO_STORAGE         map_to_storage_handler
URI Translation               MP_HOOK_TRANS                  trans_handler
Input Filter                  MP_HOOK_INPUT_FILTER           TBD
Parse headers                 MP_HOOK_HEADER_PARSER          header_parser_handler
Access                        MP_HOOK_ACCESS                 access_handler
Authentication                MP_HOOK_AUTHEN                 authen_handler
Authorization                 MP_HOOK_AUTHZ                  authz_handler
Response                      MP_HOOK_RESPONSE               response_handler
Output Filter                 MP_HOOK_OUTPUT_FILTER          TBD
MIME Type                     MP_HOOK_TYPE                   type_handler
Fixups                        MP_HOOK_FIXUP                  fixup_handler
Logging                       MP_HOOK_LOG                    log_handler

Each hook that an HLL module supports must have a corresponding metahandler with the name indicated above. It must be declared in the ModParrot;HLL;hllname namespace.

The Context Object

You might be wondering where metahandlers get information like the previously mentioned Apache::RequestRec object. mod_parrot passes a single ModParrot;Context object as the lone argument to every metahandler. This "context" object contains methods for accessing information relevant to the particular phase of a metahandler. During the response phase, for instance, the request_rec method returns the current ModParrot;Apache;RequestRec object. That method would return a NULL PMC during a phase like open_logs, where no request exists.

ModParrot;Context provides the following methods:

Method        Returns
interp        the ModParrot;Interpreter object
request_rec   the current request record as a ModParrot;Apache;RequestRec object
server_rec    the current server record as a ModParrot;Apache;ServerRec object (NOT SUPPORTED UNTIL 0.6)
conn_rec      the current connection record as a ModParrot;Apache;ConnRec object (NOT SUPPORTED UNTIL 0.6)
csd           the current connection descriptor (usually a socket) as an unmanaged PMC (NOT SUPPORTED UNTIL 0.6)
pconf         the configuration pool as a ModParrot;Apache;Pool object
ptemp         the temporary pool as a ModParrot;Apache;Pool object
plog          the log pool as a ModParrot;Apache;Pool object
pchild        the child pool as a ModParrot;Apache;Pool object

The following table indicates the methods that are in scope for each Apache phase. Note that while a particular structure might be in scope for a particular phase, it may not be available here (e.g. conn_rec in the response phase). In that case you should retrieve the structure by traversing the other structures (e.g. request_rec->connection).

Columns, in order: request_rec, server_rec, conn_rec, csd, pconf, ptemp, plog, pchild

Phase                         Marks
Open logs                     XXX
Post-configuration            XXX
Child process initialization  XX
Preconnection                 XX
Process connection            X
Post Read Request             X
Map To Storage                XX
URI Translation               XX
Input Filter                  XX
Parse headers                 XX
Access                        XX
Authentication                XX
Authorization                 XX
Response                      XX
Output Filter                 XX
MIME Type                     XX
Fixups                        XX
Logging                       XX

Retrieving Configurations

No configuration information is passed directly to metahandlers, so they should use ModParrot;Apache;get_config to retrieve configuration structures.

To retrieve a server configuration:

.local pmc cfg, get_config
get_config = get_root_global ['ModParrot';'Apache'], 'get_config'
cfg = get_config("my_module_name")

To retrieve a directory configuration, use the per_dir_config method of ModParrot;Apache;RequestRec:

.local pmc dircfg, get_config, per_dir_config
get_config = get_root_global ['ModParrot';'Apache'], 'get_config'
per_dir_config = r.'per_dir_config'()
dircfg = get_config("my_module_name", per_dir_config)

I/O Redirection

mod_parrot can tie Parrot I/O operations on standard input and output to an Apache request, emulating CGI behavior. This is useful when you either don't want to expose the Apache API to the language, or the language lacks the features to support it (e.g. it has no objects).

To tie a request to stdin or stdout, use the stdin and stdout methods of the ModParrot;Interpreter object, which you can get from the context or by instantiating a new ModParrot;Interpreter object.

  • PMC stdin(PMC handle)
  • PMC stdout(PMC handle)

handle is a PMC of one of the following types:

  • ModParrot;Apache;RequestRec - ties I/O to the request
  • FileHandle - assigns the filehandle object

Each method returns the previous handle so you can restore it later. You must always restore stdin and stdout before returning from a metahandler.

Here is code adapted from Pipp's response metahandler. It ties the request to stdin and stdout, runs the requested PHP script, and restores stdin and stdout to their original filehandles.

interp = ctx.'interp'()
php_file = r.'filename'()
oldout = interp.'stdout'(r)
oldin = interp.'stdin'(r)
r.'content_type'("text/html")
run_php_file(php_file)
interp.'stdout'(oldout)
interp.'stdin'(oldin)
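
Because each method returns the previous handle, the swap and the restore form a pair that should survive errors in the script being run. The following Python sketch models that protocol under stated assumptions: the Interp class is a hypothetical stand-in for ModParrot;Interpreter, lists stand in for filehandles and the request, and only stdout is shown.

```python
# Model of the swap-and-restore protocol for standard output.
# Interp is a hypothetical stand-in for ModParrot;Interpreter.

class Interp:
    def __init__(self, stdout):
        self._stdout = stdout

    def stdout(self, handle):
        """Install handle as stdout and return the previous handle."""
        previous, self._stdout = self._stdout, handle
        return previous

    def write(self, text):
        self._stdout.append(text)


def run_with_request(interp, request_buffer, script):
    """Tie output to the request, run script, and always restore."""
    old = interp.stdout(request_buffer)
    try:
        script(interp)
    finally:
        interp.stdout(old)  # restore even if the script raises


console, response = [], []
interp = Interp(console)
run_with_request(interp, response, lambda i: i.write("<p>hi</p>"))
interp.write("back on the console")

assert response == ["<p>hi</p>"]
assert console == ["back on the console"]
```

The adapted code above restores the handles only when the script returns normally; a PIR metahandler that wants the same guarantee as the try/finally here would need an exception handler around the call that runs the script.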

Return Values

The return value from a metahandler is passed directly back to Apache, and should be a valid Apache or HTTP status code. Constants for these codes are available from the ModParrot;Apache;Constants;table hash from ModParrot/Apache/Constants.pbc. Please refer to the Apache documentation for details on how each phase reacts to different status codes.

Example Metahandlers

Following are the response handlers for PIR and Perl 6. They perform the exact same function in different languages. You can see how code from a self-hosted module like mod_perl6 is much easier to understand and maintain.

The PIR HLL module response handler

# response handler
.sub response_handler
    .param pmc ctx
    .local pmc r, handler, cfg, dircfg, get_config, ap_const
    .local int status

    ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

    # get the request_rec object
    r = ctx.'request_rec'()

    # decline if not our handler
    $S0 = r.'handler'()
    if $S0 == 'parrot-code' goto get_configs
    status = ap_const['DECLINED']
    goto return_status

  get_configs:
    get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'
    cfg = get_config('modparrot_pir_module')
    $P0 = r.'per_dir_config'()
    dircfg = get_config('modparrot_pir_module', $P0)

    # decline if we have no config in this section
    unless null dircfg goto get_handler
    status = ap_const['DECLINED']
    .return(status)

  get_handler:
    # decline if we have no handler in this section
    $S0 = dircfg['response_handler']
    if $S0 goto run_handler
    status = ap_const['DECLINED']
    .return(status)

  run_handler:
    # set our default content type
    r.'content_type'('text/html')
    # find the handler sub and call it
    $P0 = split ';', $S0
    get_hll_global handler, $P0, 'handler'
    status = handler(r)

  return_status:
    .return(status)
.end

The mod_perl6 response handler

sub response_handler($ctx)
{
    my $r = $ctx.request_rec();

    unless ($r.handler() ~~ any(<modperl6 perl6-script>)) {
        return $Apache::Const::DECLINED;
    }

    my %cfg = ModParrot::Apache::Module::get_config("modparrot_perl6_module");
    my %dircfg = ModParrot::Apache::Module::get_config("modparrot_perl6_module",
        $r.per_dir_config());

    my $handler = %dircfg<response_handler>;

    $r.content_type('text/html');
    my $status = call_handler($handler, $r);
    return $status;
}

Registering the HLL Apache Module

Now that we have all of this information, we can finally register our HLL module. The only missing piece is the name of the module. The mod_parrot convention is modparrot_hllname_module, where hllname is the name of your language (e.g. modparrot_perl6_module). This prevents name clashes with non-mod_parrot modules for the same language, which are likely for languages like PHP and Python that already have Apache modules.

With the name in hand, we can call the add subroutine in the ModParrot;Apache;Module namespace, referred to here as add_module, to register our module. Its prototype is as follows:

VOID add_module(STRING module_name, STRING hll_namespace, PMC command_array, PMC hook_array)

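hll_namespace is the namespace of your module under ModParrot;HLL. command_array is the array of directive hashes described under Custom Apache Directives, and hook_array is an array of the hook constants for the phases your module handles.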

There is no return value from this subroutine; errors that occur when registering a module are fatal and will print an error message and exit.

Example: Registering the PIR HLL module

    add_module = get_hll_global [ 'ModParrot'; 'Apache'; 'Module' ], 'add'
    add_module("modparrot_pir_module", "PIR", cmds, hooks)

HINT: If you are registering ALL hooks, use an array with the single value MP_HOOK_ALL, rather than populating the array with every hook constant.

Miscellany

Persistence

TODO

Cooperating with Other HLL Modules

TODO