Version 37 (modified by jhorwitz, 13 years ago)
mod_parrot HLL Module Developer's Guide
Overview
This is the mod_parrot HLL module developer's guide. The target audience is developers wishing to embed their language in Apache using mod_parrot. The benefits of this are one-time compilation of scripts, a persistent execution environment, direct access to the Apache API, and the ability to write custom hooks in the embedded language. Some languages can even be self-hosted, meaning the code to implement mod_foo is written in the "foo" language.
Most examples are taken from the PIR HLL module, with several from mod_perl6 to illustrate self-hosting.
Prerequisites
Any language targeted to Parrot can use mod_parrot to execute code in a persistent environment. However, to best take advantage of mod_parrot's features, including self-hosting, languages should support the following:
- namespaces
- lexical and global variables
- a Parrot-compatible object model
Using mod_parrot without these features is still possible with some PIR scaffolding.
Bootstrapping
Each HLL in mod_parrot is contained in its own Apache module, known as an HLL module. The steps leading up to and including the registration of the HLL module with Apache are collectively called bootstrapping. The bootstrapping process typically follows this procedure:
1. Load the HLL compiler.
2. Declare server and directory configuration hooks.
3. Declare Apache directive hooks.
4. Declare Apache directives.
5. Declare metahandlers (hooks for Apache phases).
6. Register the Apache module.
As of mod_parrot 0.5, step 1 must be written in (or compiled to) PIR, but all subsequent steps can be written in the HLL itself (a self-hosting HLL module). The PIR bootstrap file MUST be compiled to bytecode and located at ModParrot/HLL/hllname.pbc in Parrot's library path. HLL code can be located anywhere, though conventions will eventually be defined. If there is HLL bootstrap code, it must be loaded and executed using PIR in the bootstrap file.
Bootstrap code must be placed in a PIR subroutine marked with the :load adverb so it is run when the file is loaded. This subroutine can be named or anonymous (using the :anon adverb).
Example: mod_perl6
The first part of the bootstrap file from mod_perl6 loads the compiler and supporting libraries, then executes Perl 6 code from mod_perl6.pm:
    .sub __onload :anon :load
        load_bytecode 'languages/perl6/perl6.pbc'
        load_bytecode 'ModParrot/Apache/Module.pbc'
        load_bytecode 'ModParrot/Constants.pbc'

        # load mod_perl6.pm, which may be precompiled
        $P0 = compreg 'Perl6'
        $P1 = $P0.'compile'('use mod_perl6')
        $P1()
        ...
Configuration
Each HLL module provides two configuration data structures to Apache: server and directory. Server configurations are specific to the main server and individual virtual hosts. Directory configurations are specific to individual sections, which can be real directories or locations defined in the Apache configuration file. All configuration structures can be merged with parent configurations to implement inheritance or overriding behavior.
Creating HLL Configurations
Each Apache module is responsible for defining and creating its own configuration data structures. When Apache asks an HLL module for a server or directory configuration, mod_parrot will look for a "constructor" subroutine in the ModParrot;HLL;hllname namespace to execute. Server configs are provided by server_create, while directory configs are provided by dir_create. These subroutines should create a data structure, possibly populated with default values, and return it. The type of the structure is up to the implementor, as long as it is a valid Parrot PMC.
Signatures
- PMC server_create()
- PMC dir_create()
Example: the PIR configuration constructors
    .namespace [ 'ModParrot'; 'HLL'; 'PIR' ]

    .sub server_create
        $P0 = new 'Hash'
        .return($P0)
    .end

    .sub dir_create
        $P0 = new 'Hash'
        .return($P0)
    .end
Merging HLL Configurations
Like the constructor subroutines above, HLL modules can provide subroutines to merge two configurations. This is useful when, for example, a particular configuration setting for a directory should be overridden by a different setting in a subdirectory. Or perhaps your HLL is maintaining an array of values for a virtual host that should be concatenated with the array defined in the main server. All of this behavior is performed by the merge subroutines.
If a merge subroutine is not provided, Apache will perform the merge itself by completely overriding the base configuration with the new configuration. This may or may not be desirable for your particular module.
The server merge subroutine is called server_merge, and the directory merge is handled by dir_merge. They are passed the "base" configuration and the "new" configuration, and are expected to return a merged configuration. These subroutines should create a new PMC for the merged configuration rather than reusing the PMC from the "new" configuration. Parrot passes PMCs by reference, and the code would thus be changing the "new" configuration directly, resulting in unexpected behavior.
Signatures
- PMC server_merge(PMC basecfg, PMC newcfg)
- PMC dir_merge(PMC basecfg, PMC newcfg)
Example: A self-hosted server merge from mod_perl6
    sub server_merge(%base, %new)
    {
        my %merged;

        # merge handlers -- never inherit
        for @server_phases.map({$_ ~ '_handler'}) -> $h {
            %merged{$h} = %new{$h};
        }

        return %merged;
    }
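The merge rules described above (build a fresh structure, never inherit handlers, optionally concatenate list values) can be modeled outside Parrot. The following is a minimal Python sketch of those semantics only, not mod_parrot code: `SERVER_PHASES` and the `include_paths` setting are hypothetical names chosen for illustration, and plain dictionaries stand in for configuration PMCs.

```python
# Hypothetical phase names; a real HLL module defines its own list.
SERVER_PHASES = ["open_logs", "post_config", "child_init"]


def server_merge(base, new):
    """Return a fresh merged configuration without mutating either input."""
    merged = {}
    # Handlers are never inherited: copy them from the new config only.
    for phase in SERVER_PHASES:
        key = phase + "_handler"
        merged[key] = new.get(key)
    # Hypothetical list-valued setting: base values first, then new values.
    merged["include_paths"] = list(base.get("include_paths", [])) + list(
        new.get("include_paths", [])
    )
    return merged


base = {"post_config_handler": "Main;setup", "include_paths": ["/srv/lib"]}
new = {"include_paths": ["/srv/vhost/lib"]}
merged = server_merge(base, new)
print(merged["post_config_handler"])  # None: the base handler is not inherited
print(merged["include_paths"])        # ['/srv/lib', '/srv/vhost/lib']
```

Because `merged` is a new dictionary, neither `base` nor `new` changes, which mirrors the advice to create a new PMC rather than reuse the "new" configuration.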
Custom Apache Directives
The server and directory configurations would be fairly useless without support for adding custom Apache directives for an HLL module. Here we will learn how to define a custom directive. Actual registration of the directive occurs later when you add the module to Apache.
A directive can be defined as a Parrot Hash, or any HLL type that implements a keyed-by-string interface. There are five keys for which you need to provide values:
- name: the name of the directive as specified in the Apache configuration file
- args_how: a constant defining how arguments are processed
- func: a reference to a callback subroutine that will process the arguments
- req_override: a constant defining where in the Apache configuration file the directive can be used
- errmsg: a message displayed when the directive is misused
Constants for args_how and req_override are found in the ModParrot;Apache;Constants;table hash from ModParrot/Apache/Constants.pbc.
Values for args_how
NO_ARGS | no arguments |
TAKE1 | one argument |
TAKE2 | two arguments |
TAKE3 | three arguments |
TAKE12 | one or two arguments |
TAKE23 | two or three arguments |
TAKE123 | one, two, or three arguments |
ITERATE | a list of arguments passed to the callback one at a time |
ITERATE2 | one argument followed by a list of arguments passed to the callback one at a time, along with the first argument |
FLAG | a single On or Off argument, passed to the callback as 0 (off) or 1 (on) |
RAW_ARGS | no parsing, passes the entire configuration line to the callback |
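To make the differences between these values concrete, here is a small Python model of how the arguments on a configuration line might reach a callback for each args_how style. It is a sketch of the behavior listed in the table, not Apache's or mod_parrot's parser: `dispatch` and `record` are hypothetical names, whitespace splitting stands in for real configuration parsing, and passing each iteration as a short list is an assumption.

```python
# Allowed argument counts for the fixed-arity styles in the table above.
ARG_COUNTS = {
    "NO_ARGS": {0},
    "TAKE1": {1},
    "TAKE2": {2},
    "TAKE3": {3},
    "TAKE12": {1, 2},
    "TAKE23": {2, 3},
    "TAKE123": {1, 2, 3},
}


def dispatch(args_how, callback, dircfg, line):
    """Deliver the arguments on `line` to `callback` according to args_how."""
    words = line.split()
    if args_how in ARG_COUNTS:
        if len(words) not in ARG_COUNTS[args_how]:
            raise ValueError("wrong number of arguments for " + args_how)
        callback(dircfg, words)
    elif args_how == "ITERATE":
        # One callback invocation per argument.
        for word in words:
            callback(dircfg, [word])
    elif args_how == "ITERATE2":
        # The first argument is repeated alongside each remaining one.
        if len(words) < 2:
            raise ValueError("ITERATE2 needs at least two arguments")
        for word in words[1:]:
            callback(dircfg, [words[0], word])
    elif args_how == "FLAG":
        if len(words) != 1 or words[0].lower() not in ("on", "off"):
            raise ValueError("expected On or Off")
        callback(dircfg, [1 if words[0].lower() == "on" else 0])
    elif args_how == "RAW_ARGS":
        # No parsing: hand over the line untouched.
        callback(dircfg, [line])
    else:
        raise ValueError("unknown args_how: " + args_how)


calls = []


def record(dircfg, args):
    """Hypothetical callback that just remembers what it was given."""
    calls.append(args)


dispatch("ITERATE2", record, {}, "text/html .html .htm")
dispatch("FLAG", record, {}, "On")
print(calls)  # [['text/html', '.html'], ['text/html', '.htm'], [1]]
```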
Values for req_override
ACCESS_CONF | can be used in directory sections, but not in .htaccess files |
OR_NONE | cannot be overridden by AllowOverride |
OR_ALL | can be used anywhere in the configuration |
OR_AUTHCFG | can be used inside directory sections (and .htaccess with the AuthConfig override) |
OR_FILEINFO | can be used anywhere (and .htaccess with the FileInfo override) |
OR_INDEXES | can be used anywhere (and .htaccess with the Indexes override) |
OR_OPTIONS | can be used anywhere (and .htaccess with the Options override) |
OR_LIMIT | can be used in directory sections (and .htaccess with the Limit override) |
RSRC_CONF | can be used outside of a directory section (not allowed in .htaccess) |
OR_UNSET | not yet implemented |
EXEC_ON_READ | not yet implemented |
The directive hashes should be stored in an array for use during module registration.
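As a language-neutral illustration of the shape of this data, the Python sketch below builds one directive record with the five keys and collects it in a list. It is only a model: `check_directive` is a hypothetical helper, and the strings `"TAKE1"` and `"OR_AUTHCFG"` are placeholders for the real constants from the constants table.

```python
REQUIRED_KEYS = ("name", "args_how", "func", "req_override", "errmsg")


def check_directive(directive):
    """Verify that a directive record carries all five required keys."""
    missing = [key for key in REQUIRED_KEYS if key not in directive]
    if missing:
        raise KeyError("directive is missing: " + ", ".join(missing))
    if not callable(directive["func"]):
        raise TypeError("func must be a callable callback")
    return directive


def cmd_parrothandler(dircfg, args):
    """Callback: store the handler name in the directory configuration."""
    dircfg["response_handler"] = args[0]


# The list that would later be handed over during module registration.
cmds = [
    check_directive({
        "name": "ParrotHandler",
        "args_how": "TAKE1",           # placeholder for the real constant
        "func": cmd_parrothandler,
        "req_override": "OR_AUTHCFG",  # placeholder for the real constant
        "errmsg": "usage: ParrotHandler handler-name",
    }),
]
```

Checking the record when it is created surfaces a missing key early instead of at registration time.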
Example: The ParrotHandler definition from the PIR HLL module
    .local pmc ap_const
    .local pmc cmd

    ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

    cmd = new 'Hash'
    $P0 = new 'String'
    $P0 = 'ParrotHandler'
    cmd['name'] = $P0
    $P0 = new 'Integer'
    $P0 = ap_const['TAKE1']
    cmd['args_how'] = $P0
    cmd['func'] = cmd_parrothandler
    $P0 = new 'Integer'
    $P0 = ap_const['OR_AUTHCFG']
    cmd['req_override'] = $P0
    $P0 = new 'String'
    $P0 = 'usage: ParrotHandler handler-name'
    cmd['errmsg'] = $P0
Creating a hash for more than a few directives can be cumbersome in PIR. The PIR HLL module includes a subroutine called new_cmd that does the dirty work for us. It may eventually be moved to ModParrot;Apache;Module and tweaked to be more generic so other modules can use it.
new_cmd from the PIR HLL module
    .sub new_cmd
        .param string name
        .param string how
        .param string func
        .param string override
        .param string errmsg
        .local pmc ap_const
        .local pmc cmd

        ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

        cmd = new 'Hash'
        $P0 = new 'String'
        $P0 = name
        cmd['name'] = $P0
        $P0 = new 'Integer'
        $P0 = ap_const[how]
        cmd['args_how'] = $P0
        $P0 = get_hll_global [ 'ModParrot'; 'HLL'; 'PIR' ], func
        cmd['func'] = $P0
        $P0 = new 'Integer'
        $P0 = ap_const[override]
        cmd['req_override'] = $P0
        $P0 = new 'String'
        $P0 = errmsg
        cmd['errmsg'] = $P0

        .return(cmd)
    .end

    # creating a directive hash with new_cmd()
    $P0 = new_cmd('ParrotHandler', 'TAKE1', 'cmd_parrothandler', 'OR_AUTHCFG', 'usage: ParrotHandler handler-name')
We now turn our focus to the callback functions. The prototype for a callback function is:
VOID callback(PMC dircfg, PMC args)
- dircfg: the current directory configuration
- args: an array of arguments
In a server scope, dircfg will be populated with that server's document root directory configuration (XXX verify this).
Directive callbacks are responsible for updating the module configuration based on the arguments provided from the Apache config. In the case of ParrotHandler, func references the cmd_parrothandler subroutine, which will be called each time the ParrotHandler directive is encountered in the Apache configuration. Since it's a TAKE1 callback, args will have one element. When called, it updates the directory configuration with the name of the response handler from args.
Example: The ParrotHandler directive callback
.sub cmd_parrothandler .param pmc dircfg .param pmc args $S0 = args[0] dircfg['response_handler'] = $S0 .end
Only the directory configuration is passed to the callback. If the callback needs to update the server configuration instead (e.g., for ParrotPostConfig), it calls ModParrot;Apache;get_config, which has the following signature:
PMC get_config(STRING module_name, PMC per_dir_config)
If per_dir_config is omitted, it will return the server configuration.
Example: The ParrotPostConfig directive callback
    .sub cmd_parrotpostconfighandler
        .param pmc mconfig
        .param pmc args
        .local pmc cfg, get_config

        get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'
        cfg = get_config('modparrot_pir_module')
        $S0 = args[0]
        cfg['post_config_handler'] = $S0
    .end
Accessing mod_parrot's Configuration
TODO
Metahandlers
Metahandlers are responsible for implementing the semantics of an HLL module for each Apache phase. If a module registers a hook with Apache, mod_parrot will call the metahandler for that hook during that phase. The metahandler then executes the actual HLL handler code specified in the configuration and returns the status. In this way, the metahandler acts as a proxy, hiding the implementation details of the HLL module from Apache and mod_parrot. For example, mod_perl6 passes an Apache::RequestRec object to response handlers, but mod_parrot doesn't know that -- it's up to the mod_perl6 metahandler to pass the Apache::RequestRec object to the handler code.
Apache Hooks
The following Apache phases are supported by mod_parrot:
Apache Phase | ModParrot;Constant;table key | Metahandler Name |
Open logs | MP_HOOK_OPEN_LOGS | open_logs_handler |
Post-configuration | MP_HOOK_POST_CONFIG | post_config_handler |
Child process initialization | MP_HOOK_CHILD_INIT | child_init_handler |
Preconnection | MP_HOOK_PRE_CONNECTION | pre_connection_handler |
Process connection | MP_HOOK_PROCESS_CONNECTION | process_connection_handler |
Post Read Request | MP_HOOK_POST_READ_REQUEST | post_read_request_handler |
Map To Storage | MP_HOOK_MAP_TO_STORAGE | map_to_storage_handler |
URI Translation | MP_HOOK_TRANS | trans_handler |
Input Filter | MP_HOOK_INPUT_FILTER | TBD |
Parse headers | MP_HOOK_HEADER_PARSER | header_parser_handler |
Access | MP_HOOK_ACCESS | access_handler |
Authentication | MP_HOOK_AUTHEN | authen_handler |
Authorization | MP_HOOK_AUTHZ | authz_handler |
Response | MP_HOOK_RESPONSE | response_handler |
Output Filter | MP_HOOK_OUTPUT_FILTER | TBD |
MIME Type | MP_HOOK_TYPE | type_handler |
Fixups | MP_HOOK_FIXUP | fixup_handler |
Logging | MP_HOOK_LOG | log_handler |
Each hook that an HLL module supports must have a corresponding metahandler with the name indicated above. It must be declared in the ModParrot;HLL;hllname namespace.
The Context Object
You might be wondering where metahandlers get information like the previously mentioned Apache::RequestRec object. mod_parrot passes a single ModParrot;Context object as the lone argument to every metahandler. This "context" object provides methods for accessing information relevant to the particular phase of a metahandler. During the response phase, for instance, the request_rec method returns the current ModParrot;Apache;RequestRec object. That method returns a NULL PMC during a phase like open_logs, where no request exists.
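The proxy role of a metahandler and its use of the context object can be modeled in a few lines of Python. This is a sketch under stated assumptions, not mod_parrot's implementation: `FakeContext`, `FakeRequest`, `HANDLERS`, and `hello` are hypothetical stand-ins, `None` plays the part of a NULL PMC, and the status values 0 and -1 are Apache's OK and DECLINED.

```python
OK = 0         # Apache's OK status
DECLINED = -1  # Apache's DECLINED status


class FakeRequest:
    """Stand-in for a request record with a handler name and a dir config."""

    def __init__(self, handler, dircfg):
        self.handler = handler
        self.dircfg = dircfg
        self.body = []


class FakeContext:
    """Stand-in for the context object passed to every metahandler."""

    def __init__(self, request=None):
        self._request = request

    def request_rec(self):
        # None models the NULL PMC returned outside request phases.
        return self._request


def hello(r):
    """A user-level handler written against the HLL's own conventions."""
    r.body.append("hello")
    return OK


# Hypothetical lookup table from configured handler names to handler code.
HANDLERS = {"Hello": hello}


def response_handler(ctx):
    """Metahandler: decide whether to run user code and return its status."""
    r = ctx.request_rec()
    if r is None or r.handler != "parrot-code":
        return DECLINED
    handler = HANDLERS.get(r.dircfg.get("response_handler"))
    if handler is None:
        return DECLINED
    return handler(r)


request = FakeRequest("parrot-code", {"response_handler": "Hello"})
print(response_handler(FakeContext(request)))  # 0
print(response_handler(FakeContext()))         # -1
```

The caller sees only a status code, so whatever the metahandler passes to the user's handler stays an internal decision of the HLL module.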
ModParrot;Context provides the following methods:
Method | Returns |
interp | the ModParrot;Interpreter object |
request_rec | the current request record as a ModParrot;Apache;RequestRec object |
server_rec | the current server record as a ModParrot;Apache;ServerRec object (NOT SUPPORTED UNTIL 0.6) |
conn_rec | the current connection record as a ModParrot;Apache;ConnRec object (NOT SUPPORTED UNTIL 0.6) |
csd | the current connection descriptor (usually a socket) as an unmanaged PMC (NOT SUPPORTED UNTIL 0.6) |
pconf | the configuration pool as a ModParrot;Apache;Pool object |
ptemp | the temporary pool as a ModParrot;Apache;Pool object |
plog | the log pool as a ModParrot;Apache;Pool object |
pchild | the child pool as a ModParrot;Apache;Pool object |
The following table indicates the methods that are in scope for each Apache phase:
request_rec | server_rec | conn_rec | csd | pconf | ptemp | plog | pchild | |
Open logs | X | X | X | |||||
Post-configuration | X | X | X | |||||
Child process initialization | X | X | ||||||
Preconnection | X | X | ||||||
Process connection | X | |||||||
Post Read Request | X | |||||||
Map To Storage | X | X | ||||||
URI Translation | X | X | ||||||
Input Filter | X | X | ||||||
Parse headers | X | X | ||||||
Access | X | X | ||||||
Authentication | X | X | ||||||
Authorization | X | X | ||||||
Response | X | X | ||||||
Output Filter | X | X | ||||||
MIME Type | X | X | ||||||
Fixups | X | X | ||||||
Logging | X | X |
Retrieving Configurations
No configuration information is passed directly to metahandlers, so they should use ModParrot;Apache;get_config to retrieve configuration structures.
To retrieve a server configuration:
    .local pmc cfg, get_config
    get_config = get_root_global ['ModParrot';'Apache'], 'get_config'
    cfg = get_config("my_module_name")
To retrieve a directory configuration, use the per_dir_config method of ModParrot;Apache;RequestRec:
    .local pmc dircfg, get_config, per_dir_config
    get_config = get_root_global ['ModParrot';'Apache'], 'get_config'
    per_dir_config = r.'per_dir_config'()
    dircfg = get_config("my_module_name", per_dir_config)
I/O Redirection
mod_parrot can tie Parrot I/O operations on standard input and output to an Apache request, emulating CGI behavior. This is useful when you either don't want to expose the Apache API to the language or when the language lacks the features needed to support it (e.g., no objects).
To tie a request to stdin or stdout, use the stdin and stdout methods of ModParrot;Interpreter.
- PMC stdin(PMC handle)
- PMC stdout(PMC handle)
handle is a PMC of one of the following types:
- ModParrot;Apache;RequestRec - ties I/O to the request
- FileHandle - assigns the filehandle object
Each method returns the previous handle so you can restore it later. You must always restore stdin and stdout before returning from a metahandler.
Here is code adapted from Pipp's response metahandler. It ties the request to stdin and stdout, runs the requested PHP script, and restores stdin and stdout to their original filehandles.
    interp = ctx.'interp'()
    php_file = r.'filename'()
    oldout = interp.'stdout'(r)
    oldin = interp.'stdin'(r)
    r.'content_type'("text/html")
    run_php_file(php_file)
    interp.'stdout'(oldout)
    interp.'stdin'(oldin)
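The save-and-restore discipline above is worth getting right even when the script fails partway through. The Python sketch below models the same pattern with a hypothetical `FakeInterp` whose `stdout` method, like the one described above, installs a new handle and returns the previous one. `run_tied` is an illustrative helper rather than part of mod_parrot, and `io.StringIO` buffers stand in for the request and the original filehandle.

```python
import io


class FakeInterp:
    """Stand-in for an interpreter object with a swappable stdout handle."""

    def __init__(self, handle):
        self._stdout = handle

    def stdout(self, handle):
        # Install the new handle and hand back the previous one.
        previous = self._stdout
        self._stdout = handle
        return previous

    def write(self, text):
        self._stdout.write(text)


def run_tied(interp, request_handle, script):
    """Run script with stdout tied to the request, then always restore it."""
    old_out = interp.stdout(request_handle)
    try:
        return script(interp)
    finally:
        # Restore even if the script raised an exception.
        interp.stdout(old_out)


console = io.StringIO()
request_body = io.StringIO()
interp = FakeInterp(console)


def script(i):
    i.write("<p>hello</p>")
    return 0


status = run_tied(interp, request_body, script)
interp.write("back on the console")
print(request_body.getvalue())  # <p>hello</p>
print(console.getvalue())       # back on the console
```

Putting the restore in a `finally` clause guarantees that the original handle is reinstated on every exit path, which is the property the rule about restoring stdin and stdout is meant to secure.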
Return Values
The return value from a metahandler is passed directly back to Apache, and should be a valid Apache or HTTP status code. Constants for these codes are available from the ModParrot;Apache;Constants;table hash from ModParrot/Apache/Constants.pbc. Please refer to the Apache documentation for details on how each phase reacts to different status codes.
Example Metahandlers
Following are the response handlers for PIR and Perl 6. They perform the exact same function in different languages. You can see how code from a self-hosted module like mod_perl6 is much easier to understand and maintain.
The PIR HLL module response handler
    # response handler
    .sub response_handler
        .param pmc ctx
        .local pmc r, handler, cfg, dircfg, get_config, ap_const
        .local int status

        ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

        # get the request_rec object
        r = ctx.'request_rec'()

        # decline if not our handler
        $S0 = r.'handler'()
        if $S0 == 'parrot-code' goto get_configs
        status = ap_const['DECLINED']
        goto return_status

      get_configs:
        get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'
        cfg = get_config('modparrot_pir_module')
        $P0 = r.'per_dir_config'()
        dircfg = get_config('modparrot_pir_module', $P0)

        # decline if we have no config in this section
        unless null dircfg goto get_handler
        status = ap_const['DECLINED']
        .return(status)

      get_handler:
        # decline if we have no handler in this section
        $S0 = dircfg['response_handler']
        if $S0 goto run_handler
        status = ap_const['DECLINED']
        .return(status)

      run_handler:
        # set our default content type
        r.'content_type'('text/html')

        # find the handler sub and call it
        $P0 = split ';', $S0
        get_hll_global handler, $P0, 'handler'
        status = handler(r)

      return_status:
        .return(status)
    .end
The mod_perl6 response handler
    sub response_handler($ctx)
    {
        my $r = $ctx.request_rec();
        unless ($r.handler() ~~ any(<modperl6 perl6-script>)) {
            return $Apache::Const::DECLINED;
        }
        my %cfg = ModParrot::Apache::Module::get_config("modparrot_perl6_module");
        my %dircfg = ModParrot::Apache::Module::get_config("modparrot_perl6_module", $r.per_dir_config());
        my $handler = %dircfg<response_handler>;
        $r.content_type('text/html');
        my $status = call_handler($handler, $r);
        return $status;
    }
Registering the HLL Apache Module
Now that we have all of this information, we can finally register our HLL module. The only missing piece is the name of the module. The mod_parrot convention is modparrot_hllname_module, where hllname is the name of your language (e.g. modparrot_perl6_module). This will prevent name clashes with non-mod_parrot modules for the same language, which is likely for languages like PHP and Python that already have modules.
With the name in hand, we can call ModParrot;Apache;add_module to register our module. Its prototype is as follows:
VOID add_module(STRING module_name, STRING hll_namespace, PMC command_array, PMC hook_array)
hll_namespace is the namespace of your module under ModParrot;HLL.
There is no return value from this subroutine; errors that occur when registering a module are fatal and will print an error message and exit.
Example: Registering the PIR HLL module
    add_module = get_hll_global [ 'ModParrot'; 'Apache'; 'Module' ], 'add'
    add_module("modparrot_pir_module", "PIR", cmds, hooks)
HINT: If you are registering ALL hooks, use an array with the single value MP_HOOK_ALL, rather than populating the array with every hook constant.
Miscellany
Persistence
TODO
Cooperating with Other HLL Modules
TODO