= mod_parrot HLL Module Developer's Guide =

== Overview ==

This is the mod_parrot HLL module developer's guide. The target audience is developers who want to embed their language in Apache using mod_parrot. The benefits of doing so are one-time compilation of scripts, a persistent execution environment, direct access to the Apache API, and the ability to write custom hooks in the embedded language. Some languages can even be self-hosted, meaning the code that implements mod_foo is written in the "foo" language.

Most examples are taken from the PIR HLL module, with several from mod_perl6 to illustrate self-hosting.

== Prerequisites ==

Any language targeted to Parrot can use mod_parrot to execute code in a persistent environment. However, to take full advantage of mod_parrot's features, including self-hosting, languages should support the following:

 * namespaces
 * lexical and global variables
 * a Parrot-compatible object model

Using mod_parrot without these features is still possible with some PIR scaffolding.

== Bootstrapping ==

Each HLL in mod_parrot is contained in its own Apache module, known as an ''HLL module''. All steps leading up to and including the registration of the HLL module with Apache are collectively called bootstrapping. The bootstrapping process typically follows this procedure:

 1. Load the HLL compiler.
 1. Declare server and directory configuration hooks.
 1. Declare Apache directive hooks.
 1. Declare Apache directives.
 1. Declare metahandlers (hooks for Apache phases).
 1. Register the Apache module.

As of mod_parrot 0.5, step 1 must be written in (or compiled to) PIR, but all subsequent steps can be written in the HLL itself (a self-hosting HLL module).
The PIR bootstrap file MUST be compiled to bytecode and located at {{{ModParrot/HLL/hllname.pbc}}} in Parrot's library path. HLL code can be located anywhere, though conventions will eventually be defined. If there is HLL bootstrap code, it must be loaded and executed from PIR in the bootstrap file.

Bootstrap code must be placed in a PIR subroutine marked with the {{{:load}}} adverb so that it runs when the file is loaded. This subroutine can be named or anonymous (using the {{{:anon}}} adverb).

''Example: mod_perl6''

The first part of the bootstrap file from mod_perl6 loads the compiler and supporting libraries, then executes Perl 6 code from {{{mod_perl6.pm}}}:

{{{
.sub __onload :anon :load
    load_bytecode 'languages/perl6/perl6.pbc'
    load_bytecode 'ModParrot/Apache/Module.pbc'
    load_bytecode 'ModParrot/Constants.pbc'

    # load mod_perl6.pm, which may be precompiled
    $P0 = compreg 'Perl6'
    $P1 = $P0.'compile'('use mod_perl6')
    $P1()
    ...
}}}

== Configuration ==

Each HLL module provides two configuration data structures to Apache: server and directory. Server configurations are specific to the main server and individual virtual hosts. Directory configurations are specific to individual ''sections'', which can be real directories or locations defined in the Apache configuration file. All configuration structures can be merged with parent configurations to implement inheritance or overriding behavior.

=== Creating HLL Configurations ===

Each Apache module is responsible for defining and creating its own configuration data structures. When Apache asks an HLL module for a server or directory configuration, mod_parrot looks for a "constructor" subroutine in the {{{ModParrot;HLL;hllname}}} namespace and executes it. Server configs are provided by {{{server_create}}}, while directory configs are provided by {{{dir_create}}}. These subroutines should create a data structure, possibly populated with default values, and return it.
The type of the structure is up to the implementor, as long as it is a valid Parrot PMC.

''Signatures''

 * {{{PMC server_create()}}}
 * {{{PMC dir_create()}}}

''Example: the PIR configuration constructors''

{{{
.namespace [ 'ModParrot'; 'HLL'; 'PIR' ]

.sub server_create
    $P0 = new 'Hash'
    .return($P0)
.end

.sub dir_create
    $P0 = new 'Hash'
    .return($P0)
.end
}}}

=== Merging HLL Configurations ===

Like the constructor subroutines above, HLL modules can provide subroutines to merge two configurations. This is useful when, for example, a particular configuration setting for a directory should be overridden by a different setting in a subdirectory. Or perhaps your HLL maintains an array of values for a virtual host that should be concatenated with the array defined in the main server. All of this behavior is implemented by the merge subroutines.

If a merge subroutine is not provided, Apache performs the merge itself by completely overriding the base configuration with the new configuration. This may or may not be desirable for your particular module.

The server merge subroutine is called {{{server_merge}}}, and the directory merge is handled by {{{dir_merge}}}. They are passed the "base" configuration and the "new" configuration, and are expected to return a merged configuration.

These subroutines should create a new PMC for the merged configuration rather than reusing the PMC from the "new" configuration. Parrot passes PMCs by reference, so modifying that PMC would change the "new" configuration directly, resulting in unexpected behavior.
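As a minimal sketch of this advice (not code taken from the PIR HLL module), a PIR {{{dir_merge}}} could always build a fresh {{{Hash}}} and let a subsection inherit its parent's {{{response_handler}}} only when it does not set one itself. It assumes the {{{Hash}}} configurations created by {{{dir_create}}} and the {{{response_handler}}} key that the PIR module's {{{ParrotHandler}}} callback stores; the inheritance policy is an illustrative choice.

{{{
.namespace [ 'ModParrot'; 'HLL'; 'PIR' ]

# Sketch only: inherit 'response_handler' from the parent section
# unless the subsection defines its own. Assumes Hash configurations.
.sub dir_merge
    .param pmc basecfg
    .param pmc newcfg
    .local pmc merged

    # always build a fresh PMC; never modify newcfg in place
    merged = new 'Hash'

    $I0 = exists newcfg['response_handler']
    if $I0 goto use_new
    $I0 = exists basecfg['response_handler']
    unless $I0 goto done
    $S0 = basecfg['response_handler']
    merged['response_handler'] = $S0
    goto done

  use_new:
    $S0 = newcfg['response_handler']
    merged['response_handler'] = $S0

  done:
    .return(merged)
.end
}}}

Keys other than {{{response_handler}}} are dropped in this sketch; a real module would copy or combine every key it defines.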
''Signatures''

 * {{{PMC server_merge(PMC basecfg, PMC newcfg)}}}
 * {{{PMC dir_merge(PMC basecfg, PMC newcfg)}}}

''Example: A self-hosted server merge from mod_perl6''

{{{
sub server_merge(%base, %new) {
    my %merged;

    # merge handlers -- never inherit
    for @server_phases.map({$_ ~ '_handler'}) -> $h {
        %merged{$h} = %new{$h};
    }

    return %merged;
}
}}}

=== Custom Apache Directives ===

The server and directory configurations would be fairly useless without support for adding custom Apache directives to an HLL module. This section shows how to define a custom directive; the directive itself is registered later, when you add the module to Apache.

A directive can be defined as a Parrot {{{Hash}}}, or as any HLL type that implements a keyed-by-string interface. You need to provide values for five keys:

 * {{{name}}}: the name of the directive as specified in the Apache configuration file
 * {{{args_how}}}: a constant defining how arguments are processed
 * {{{func}}}: a reference to a callback subroutine that processes the arguments
 * {{{req_override}}}: a constant defining where in the Apache configuration file the directive can be used
 * {{{errmsg}}}: a message displayed when the directive is misused

Constants for {{{args_how}}} and {{{req_override}}} are found in the {{{ModParrot;Apache;Constants;table}}} hash from {{{ModParrot/Apache/Constants.pbc}}}.
''Values for {{{args_how}}}''

||{{{NO_ARGS}}}||no arguments||
||{{{TAKE1}}}||one argument||
||{{{TAKE2}}}||two arguments||
||{{{TAKE3}}}||three arguments||
||{{{TAKE12}}}||one or two arguments||
||{{{TAKE23}}}||two or three arguments||
||{{{TAKE123}}}||one, two, or three arguments||
||{{{ITERATE}}}||a list of arguments, passed to the callback one at a time||
||{{{ITERATE2}}}||one argument followed by a list of arguments; each argument in the list is passed to the callback one at a time, along with the first argument||
||{{{FLAG}}}||a single {{{On}}} or {{{Off}}} argument, passed to the callback as 0 (off) or 1 (on)||
||{{{RAW_ARGS}}}||no parsing; the entire configuration line is passed to the callback||

''Values for {{{req_override}}}''

||{{{ACCESS_CONF}}}||can be used in directory sections, but not in .htaccess files||
||{{{OR_NONE}}}||cannot be overridden by {{{AllowOverride}}}||
||{{{OR_ALL}}}||can be used anywhere in the configuration||
||{{{OR_AUTHCFG}}}||can be used inside directory sections (and in .htaccess with the {{{AuthConfig}}} override)||
||{{{OR_FILEINFO}}}||can be used anywhere (and in .htaccess with the {{{FileInfo}}} override)||
||{{{OR_INDEXES}}}||can be used anywhere (and in .htaccess with the {{{Indexes}}} override)||
||{{{OR_OPTIONS}}}||can be used anywhere (and in .htaccess with the {{{Options}}} override)||
||{{{OR_LIMIT}}}||can be used in directory sections (and in .htaccess with the {{{Limit}}} override)||
||{{{RSRC_CONF}}}||can be used outside of a directory section (not allowed in .htaccess)||
||{{{OR_UNSET}}}||not yet implemented||
||{{{EXEC_ON_READ}}}||not yet implemented||

The directive hashes should be stored in an array for use during module registration.
''Example: The {{{ParrotHandler}}} definition from the PIR HLL module''

{{{
.local pmc ap_const
.local pmc cmd

ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

cmd = new 'Hash'

$P0 = new 'String'
$P0 = 'ParrotHandler'
cmd['name'] = $P0

$P0 = new 'Integer'
$P0 = ap_const['TAKE1']
cmd['args_how'] = $P0

# look up the callback sub object by name
$P0 = get_hll_global [ 'ModParrot'; 'HLL'; 'PIR' ], 'cmd_parrothandler'
cmd['func'] = $P0

$P0 = new 'Integer'
$P0 = ap_const['OR_AUTHCFG']
cmd['req_override'] = $P0

$P0 = new 'String'
$P0 = 'usage: ParrotHandler handler-name'
cmd['errmsg'] = $P0
}}}

Creating a hash for more than a few directives can be cumbersome in PIR. The PIR HLL module includes a subroutine called {{{new_cmd}}} that does the dirty work for us. It may eventually be moved to {{{ModParrot;Apache;Module}}} and made more generic so that other modules can use it.

''{{{new_cmd}}} from the PIR HLL module''

{{{
.sub new_cmd
    .param string name
    .param string how
    .param string func
    .param string override
    .param string errmsg

    .local pmc ap_const
    .local pmc cmd

    ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

    cmd = new 'Hash'

    $P0 = new 'String'
    $P0 = name
    cmd['name'] = $P0

    $P0 = new 'Integer'
    $P0 = ap_const[how]
    cmd['args_how'] = $P0

    $P0 = get_hll_global [ 'ModParrot'; 'HLL'; 'PIR' ], func
    cmd['func'] = $P0

    $P0 = new 'Integer'
    $P0 = ap_const[override]
    cmd['req_override'] = $P0

    $P0 = new 'String'
    $P0 = errmsg
    cmd['errmsg'] = $P0

    .return(cmd)
.end

# creating a directive hash with new_cmd()
$P0 = new_cmd('ParrotHandler', 'TAKE1', 'cmd_parrothandler', 'OR_AUTHCFG', 'usage: ParrotHandler handler-name')
}}}

We now turn our focus to the callback functions. The prototype for a callback function is:

{{{VOID callback(PMC dircfg, PMC args)}}}

 * {{{dircfg}}}: the current directory configuration
 * {{{args}}}: an array of arguments

When the directive appears at server scope (outside any directory section), {{{dircfg}}} will be populated with that server's document root directory configuration (XXX verify this).
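As a sketch of a callback for a list-valued directive, suppose a hypothetical {{{ParrotIncludePath}}} directive is declared with {{{ITERATE}}}. Apache invokes the callback once for each argument, so (assuming mod_parrot passes that single argument as {{{args[0]}}}) the callback can append it to an array kept in the directory configuration. The directive, the {{{cmd_parrotincludepath}}} subroutine, and the {{{include_path}}} key are illustrative names, not part of the PIR HLL module.

{{{
.namespace [ 'ModParrot'; 'HLL'; 'PIR' ]

# Hypothetical ITERATE callback for an illustrative ParrotIncludePath
# directive: called once per argument, appends it to dircfg['include_path'].
.sub cmd_parrotincludepath
    .param pmc dircfg
    .param pmc args
    .local pmc paths

    # create the array on first use and store it in the config
    $I0 = exists dircfg['include_path']
    if $I0 goto have_paths
    paths = new 'ResizableStringArray'
    dircfg['include_path'] = paths
    goto append

  have_paths:
    paths = dircfg['include_path']

  append:
    $S0 = args[0]
    push paths, $S0
.end

# declaring it with new_cmd (hypothetical directive):
# $P0 = new_cmd('ParrotIncludePath', 'ITERATE', 'cmd_parrotincludepath',
#               'OR_AUTHCFG', 'usage: ParrotIncludePath dir [dir ...]')
}}}

Because the array is stored by reference in {{{dircfg}}}, each call for the same section appends to the same PMC. If subsections should extend their parent's list, {{{dir_merge}}} would need to concatenate the arrays rather than let the new configuration override them.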
Directive callbacks are responsible for updating the module configuration based on the arguments provided in the Apache config. In the case of {{{ParrotHandler}}}, {{{func}}} references the {{{cmd_parrothandler}}} subroutine, which is called each time the {{{ParrotHandler}}} directive is encountered in the Apache configuration. Since it's a {{{TAKE1}}} callback, {{{args}}} has exactly one element. When called, it updates the directory configuration with the name of the response handler from {{{args}}}.

''Example: The {{{ParrotHandler}}} directive callback''

{{{
.sub cmd_parrothandler
    .param pmc dircfg
    .param pmc args

    $S0 = args[0]
    dircfg['response_handler'] = $S0
.end
}}}

Only the directory configuration is passed to the callback. If a callback needs to update the server configuration instead (e.g. for {{{ParrotPostConfig}}}), it can call {{{get_config}}} from the {{{ModParrot;Apache;Module}}} namespace, which has the following signature:

{{{PMC get_config(STRING module_name, PMC per_dir_config)}}}

If {{{per_dir_config}}} is omitted, it returns the server configuration.

''Example: The {{{ParrotPostConfig}}} directive callback''

{{{
.sub cmd_parrotpostconfighandler
    .param pmc mconfig
    .param pmc args

    .local pmc cfg, get_config
    get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'
    cfg = get_config('modparrot_pir_module')

    $S0 = args[0]
    cfg['post_config_handler'] = $S0
.end
}}}

=== Accessing mod_parrot's Configuration ===

TODO

== Metahandlers ==

Metahandlers are responsible for implementing the semantics of an HLL module for each Apache phase. If a module registers a hook with Apache, mod_parrot calls the metahandler for that hook during that phase. The metahandler then executes the actual HLL handler code specified in the configuration and returns its status. In this way, the metahandler acts as a proxy, hiding the implementation details of the HLL module from Apache and mod_parrot.
For example, mod_perl6 passes an {{{Apache::RequestRec}}} object to response handlers, but mod_parrot doesn't know that; it's up to the mod_perl6 metahandler to pass the {{{Apache::RequestRec}}} object to the handler code.

=== Apache Hooks ===

The following Apache phases are supported by mod_parrot:

||'''Apache Phase'''||'''{{{ModParrot;Constants;table}}} key'''||'''Metahandler Name'''||
||Open logs||{{{MP_HOOK_OPEN_LOGS}}}||{{{open_logs_handler}}}||
||Post-configuration||{{{MP_HOOK_POST_CONFIG}}}||{{{post_config_handler}}}||
||Child process initialization||{{{MP_HOOK_CHILD_INIT}}}||{{{child_init_handler}}}||
||Preconnection||{{{MP_HOOK_PRE_CONNECTION}}}||{{{pre_connection_handler}}}||
||Process connection||{{{MP_HOOK_PROCESS_CONNECTION}}}||{{{process_connection_handler}}}||
||Post Read Request||{{{MP_HOOK_POST_READ_REQUEST}}}||{{{post_read_request_handler}}}||
||Map To Storage||{{{MP_HOOK_MAP_TO_STORAGE}}}||{{{map_to_storage_handler}}}||
||URI Translation||{{{MP_HOOK_TRANS}}}||{{{trans_handler}}}||
||Input Filter||{{{MP_HOOK_INPUT_FILTER}}}||TBD||
||Parse headers||{{{MP_HOOK_HEADER_PARSER}}}||{{{header_parser_handler}}}||
||Access||{{{MP_HOOK_ACCESS}}}||{{{access_handler}}}||
||Authentication||{{{MP_HOOK_AUTHEN}}}||{{{authen_handler}}}||
||Authorization||{{{MP_HOOK_AUTHZ}}}||{{{authz_handler}}}||
||Response||{{{MP_HOOK_RESPONSE}}}||{{{response_handler}}}||
||Output Filter||{{{MP_HOOK_OUTPUT_FILTER}}}||TBD||
||MIME Type||{{{MP_HOOK_TYPE}}}||{{{type_handler}}}||
||Fixups||{{{MP_HOOK_FIXUP}}}||{{{fixup_handler}}}||
||Logging||{{{MP_HOOK_LOG}}}||{{{log_handler}}}||

Each hook that an HLL module supports must have a corresponding metahandler with the name indicated above, declared in the {{{ModParrot;HLL;hllname}}} namespace.

=== The Context Object ===

You might be wondering where metahandlers get information like the previously mentioned {{{Apache::RequestRec}}} object.
mod_parrot passes a single {{{ModParrot;Context}}} object as the only argument to every metahandler. This "context" object provides methods for accessing information relevant to the phase in which the metahandler runs. During the response phase, for instance, the {{{request_rec}}} method returns the current {{{ModParrot;Apache;RequestRec}}} object. The same method returns a NULL PMC during a phase like {{{open_logs}}}.

{{{ModParrot;Context}}} provides the following methods:

||'''Method'''||'''Returns'''||
||{{{interp}}}||the {{{ModParrot;Interpreter}}} object||
||{{{request_rec}}}||the current request record as a {{{ModParrot;Apache;RequestRec}}} object||
||{{{server_rec}}}||the current server record as a {{{ModParrot;Apache;ServerRec}}} object (NOT SUPPORTED UNTIL 0.6)||
||{{{conn_rec}}}||the current connection record as a {{{ModParrot;Apache;ConnRec}}} object (NOT SUPPORTED UNTIL 0.6)||
||{{{csd}}}||the current connection descriptor (usually a socket) as an unmanaged PMC (NOT SUPPORTED UNTIL 0.6)||
||{{{pconf}}}||the configuration pool as a {{{ModParrot;Apache;Pool}}} object||
||{{{ptemp}}}||the temporary pool as a {{{ModParrot;Apache;Pool}}} object||
||{{{plog}}}||the log pool as a {{{ModParrot;Apache;Pool}}} object||
||{{{pchild}}}||the child pool as a {{{ModParrot;Apache;Pool}}} object||

The following table indicates the methods that are in scope for each Apache phase:

||'''Phase'''||{{{request_rec}}}||{{{server_rec}}}||{{{conn_rec}}}||{{{csd}}}||{{{pconf}}}||{{{ptemp}}}||{{{plog}}}||{{{pchild}}}||
||Open logs|| || || || ||X||X||X|| ||
||Post-configuration|| || || || ||X||X||X|| ||
||Child process initialization|| ||X|| || || || || ||X||
||Preconnection|| || ||X||X|| || || || ||
||Process connection|| || ||X|| || || || || ||
||Post Read Request||X|| || || || || || || ||
||Map To Storage||X||X|| || || || || || ||
||URI Translation||X||X|| || || || || || ||
||Input Filter||X||X|| || || || || || ||
||Parse headers||X||X|| || || || || || ||
||Access||X||X|| || || || || || ||
||Authentication||X||X|| || || || || || ||
||Authorization||X||X|| || || || || || ||
||Response||X||X|| || || || || || ||
||Output Filter||X||X|| || || || || || ||
||MIME Type||X||X|| || || || || || ||
||Fixups||X||X|| || || || || || ||
||Logging||X||X|| || || || || || ||

=== Retrieving Configurations ===

No configuration information is passed directly to metahandlers, so they should use {{{get_config}}} from the {{{ModParrot;Apache;Module}}} namespace to retrieve configuration structures.

To retrieve a server configuration:

{{{
.local pmc cfg, get_config
get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'
cfg = get_config("my_module_name")
}}}

To retrieve a directory configuration, pass the result of the {{{per_dir_config}}} method of {{{ModParrot;Apache;RequestRec}}} as the second argument:

{{{
.local pmc dircfg, get_config, per_dir_config
get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'

# r holds the current ModParrot;Apache;RequestRec object
per_dir_config = r.'per_dir_config'()
dircfg = get_config("my_module_name", per_dir_config)
}}}

=== I/O Redirection ===

mod_parrot can tie Parrot I/O operations on standard input and output to an Apache request, emulating CGI behavior. This is useful when you either don't want to expose the Apache API to the language, or the language lacks the features to support it (e.g. it has no objects).

To tie a request to stdin or stdout, use the {{{stdin}}} and {{{stdout}}} methods of {{{ModParrot;Interpreter}}}:

 * {{{PMC stdin(PMC handle)}}}
 * {{{PMC stdout(PMC handle)}}}

{{{handle}}} is a PMC of one of the following types:

 * {{{ModParrot;Apache;RequestRec}}}: ties I/O to the request
 * {{{FileHandle}}}: assigns the filehandle object

Each method returns the previous handle so you can restore it later. You must always restore stdin and stdout before returning from a metahandler.

Here is code adapted from Pipp's response metahandler. It ties the request to stdin and stdout, runs the requested PHP script, and restores stdin and stdout to their original filehandles.
{{{
interp = ctx.'interp'()
php_file = r.'filename'()

# tie the request to stdout and stdin, saving the old handles
oldout = interp.'stdout'(r)
oldin = interp.'stdin'(r)

r.'content_type'("text/html")
run_php_file(php_file)

# restore the original handles before returning
interp.'stdout'(oldout)
interp.'stdin'(oldin)
}}}

=== Return Values ===

The return value from a metahandler is passed directly back to Apache, and should be a valid Apache or HTTP status code. Constants for these codes are available in the {{{ModParrot;Apache;Constants;table}}} hash from {{{ModParrot/Apache/Constants.pbc}}}. Please refer to the Apache documentation for details on how each phase reacts to different status codes.

=== Example Metahandlers ===

The following are the response handlers for PIR and Perl 6. They perform exactly the same function in different languages. Note how much easier the code from a self-hosted module like mod_perl6 is to read and maintain.

''The PIR HLL module response handler''

{{{
# response handler
.sub response_handler
    .param pmc ctx
    .local pmc r, handler, cfg, dircfg, get_config, ap_const
    .local int status

    ap_const = get_root_global ['ModParrot'; 'Apache'; 'Constants'], 'table'

    # get the request_rec object
    r = ctx.'request_rec'()

    # decline if not our handler
    $S0 = r.'handler'()
    if $S0 == 'parrot-code' goto get_configs
    status = ap_const['DECLINED']
    goto return_status

  get_configs:
    get_config = get_hll_global ['ModParrot'; 'Apache'; 'Module'], 'get_config'
    cfg = get_config('modparrot_pir_module')
    $P0 = r.'per_dir_config'()
    dircfg = get_config('modparrot_pir_module', $P0)

    # decline if we have no config in this section
    unless null dircfg goto get_handler
    status = ap_const['DECLINED']
    .return(status)

  get_handler:
    # decline if we have no handler in this section
    $S0 = dircfg['response_handler']
    if $S0 goto run_handler
    status = ap_const['DECLINED']
    .return(status)

  run_handler:
    # set our default content type
    r.'content_type'('text/html')

    # find the handler sub and call it
    $P0 = split ';', $S0
    get_hll_global handler, $P0, 'handler'
    status = handler(r)

  return_status:
    .return(status)
.end
}}}

''The mod_perl6 response handler''

{{{
sub response_handler($ctx) {
    my $r = $ctx.request_rec();

    unless ($r.handler() ~~ any()) {
        return $Apache::Const::DECLINED;
    }

    my %cfg = ModParrot::Apache::Module::get_config("modparrot_perl6_module");
    my %dircfg = ModParrot::Apache::Module::get_config("modparrot_perl6_module",
        $r.per_dir_config());

    my $handler = %dircfg<response_handler>;

    $r.content_type('text/html');

    my $status = call_handler($handler, $r);

    return $status;
}
}}}

== Registering the HLL Apache Module ==

Now that we have all of this information, we can finally register our HLL module. The only missing piece is the name of the module. The mod_parrot convention is {{{modparrot_hllname_module}}}, where {{{hllname}}} is the name of your language (e.g. {{{modparrot_perl6_module}}}). This prevents name clashes with non-mod_parrot modules for the same language, which are likely for languages like PHP and Python that already have Apache modules.

With the name in hand, we can call the {{{add}}} subroutine in the {{{ModParrot;Apache;Module}}} namespace (bound to {{{add_module}}} in the example below) to register our module. Its prototype is as follows:

{{{VOID add_module(STRING module_name, STRING hll_namespace, PMC command_array, PMC hook_array)}}}

{{{hll_namespace}}} is the namespace of your module under {{{ModParrot;HLL}}}. There is no return value from this subroutine; errors that occur when registering a module are fatal, and print an error message before exiting.

''Example: Registering the PIR HLL module''

{{{
add_module = get_hll_global [ 'ModParrot'; 'Apache'; 'Module' ], 'add'
add_module("modparrot_pir_module", "PIR", cmds, hooks)
}}}

HINT: If you are registering ALL hooks, use an array with the single value {{{MP_HOOK_ALL}}} rather than populating the array with every hook constant.

== Miscellany ==

=== Persistence ===

TODO

=== Cooperating with Other HLL Modules ===

TODO