Version 49 (modified by jhorwitz, 13 years ago)
mod_parrot Architecture
Overview
This page describes the various subsystems of mod_parrot, and how they all work together. It will track mod_parrot trunk, so it may not accurately describe the latest release.
Terminology
- HLL: high level language (e.g. Rakudo Perl 6, PHP)
- NCI: Parrot's native call interface, used to call C functions from Parrot
- HLL layer: code that implements a particular HLL Apache module
- Metahandler: code that implements a single Apache phase for a particular HLL
- PMC: Parrot Magic Cookie
- Context: a data structure containing information about the current connection
- Interpreter: a Parrot interpreter (Parrot_Interp)
mod_parrot Module
The mod_parrot Apache module is the product of compilation, and is usually named mod_parrot.so on Unix systems. It has a dependency on libparrot.so.
After installation, the module should be loaded as follows in the Apache configuration:
LoadModule parrot_module modules/mod_parrot.so
The module alone provides no HLL layers, and thus no real functionality by itself.
Configuration
mod_parrot itself requires very little configuration, and most of it should be done from the HLL layer. Most tunables have corresponding Apache directives that you can use in httpd.conf.
Initialization
mod_parrot bootstraps itself by loading mod_parrot.pbc. If not otherwise specified, Parrot will look for it in its library search path (see below). You can explicitly specify the path to mod_parrot.pbc with the ParrotInit directive.
Paths
If you are running handlers written in PIR and are including files from the Parrot runtime directory via .include, you will need to set the PARROT_RUNTIME environment variable. This is not required if you have compiled your handlers to PBC.
For libraries loaded via load_bytecode, you can add paths to Parrot's library search path with ParrotIncludePath. The name of this directive is misleading and will eventually be renamed to ParrotLibraryPath.
Loading Code
Two directives are capable of loading code:
ParrotLoad loads PIR or bytecode after startup has completed. This happens during the open_logs phase.
ParrotLoadImmediate forces mod_parrot to start early, during the configuration phase, and loads PIR or bytecode immediately. This is required for bootstrapping HLL module code, because HLL modules must register their Apache modules during the configuration phase.
Tracing
ParrotTrace is used to set the Parrot tracing level. The default is 0 (disabled). See the Parrot documentation for the various trace flags. Output is sent to stderr, which should appear in your error log. If TraceInit is enabled (see Options), some output may appear on your terminal, as stderr is not redirected to the error log until the open_logs phase.
Options
You can set various flags using ParrotOptions, though in practice most will be set via HLL modules and their configuration directives (not yet supported).
| C Define | ParrotOptions equivalent | Default | Description |
|---|---|---|---|
| MP_OPT_ENABLE | Enable | On | Enables mod_parrot and HLL modules for the main server or virtual host |
| MP_OPT_PARENT | Parent | Off | Dedicates a separate context pool to a virtual host |
| MP_OPT_TRACEINIT | TraceInit | Off | Traces mod_parrot initialization code (can be verbose) |
mod_parrot keeps its settings in its per-server configuration structure, which is defined as follows:
struct modparrot_srv_config {
    apr_pool_t *pool;                      /* config APR pool */
    apr_array_header_t *ctx_pool;          /* context pool */
    int start_interp;                      /* # of interps to start */
    int minspare_interp;                   /* minimum spare interps */
    int maxspare_interp;                   /* maximum spare interps */
    int max_interp;                        /* maximum total interps */
    char *init_path;                       /* path to mod_parrot.pbc */
    int trace_flags;                       /* parrot trace flags */
    int enable_option_flags;               /* used during configuration */
    int disable_option_flags;              /* used during configuration */
    int option_flags;                      /* OR'd option flags */
    char *include_path;                    /* parrot library path */
    apr_array_header_t *preload;           /* modules to preload (ParrotLoad) */
    apr_array_header_t *module_array;      /* array of HLL module indices */
    apr_hash_t *module_hash;               /* maps HLL module names to module structs */
    apr_array_header_t *handler_modules[MP_HOOK_LAST]; /* per-hook module registration */
};
typedef struct modparrot_srv_config modparrot_srv_config;
Server configurations for the main server and virtual servers are merged according to the following rules:
- Options are merged first, as they can influence merging behavior.
- For virtual servers with MP_OPT_PARENT:
  - Options are reset to their defaults. No options are inherited from the main server.
  - The HLL module array and name map are inherited, as they are set during the main server config.
  - The library path is inherited unless one is specified.
  - Trace flags, init path, and preload array are not inherited.
- For virtual servers without MP_OPT_PARENT:
  - Options are merged with the main server's options. Options explicitly set in the virtual server config override the corresponding options from the main server, while options not explicitly set inherit their values from the main server.
  - The HLL module array and name map are inherited, as they are set during the main server config.
  - The library path is inherited and cannot be overridden.
  - Trace flags and the init path override their main server values (XXX init path should not be overridden here).
  - The preload array is concatenated with the main server preload array.
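The option rules above can be sketched in C. This is an illustration under assumptions, not mod_parrot's merge function. The MP_OPT_* bit values, the trimmed `srv_opts` struct and `merge_srv_opts` are all hypothetical. The sketch treats "override" literally: the virtual server's trace flags and init path are simply kept. Preload array handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real MP_OPT_* constants live in mod_parrot's headers. */
#define MP_OPT_ENABLE    0x1
#define MP_OPT_PARENT    0x2
#define MP_OPT_TRACEINIT 0x4
#define MP_OPT_DEFAULTS  MP_OPT_ENABLE   /* Enable On, Parent Off, TraceInit Off */

/* Hypothetical subset of modparrot_srv_config relevant to merging. */
typedef struct {
    int enable_option_flags;   /* options explicitly turned on */
    int disable_option_flags;  /* options explicitly turned off */
    int option_flags;          /* effective, OR'd option flags */
    int trace_flags;
    const char *init_path;
    const char *include_path;  /* parrot library path */
} srv_opts;

/* Apply explicitly set options on top of a starting flag set. */
static int apply_explicit(int start, const srv_opts *cfg)
{
    return (start | cfg->enable_option_flags) & ~cfg->disable_option_flags;
}

/* Merge a virtual server (add) with the main server (base). */
static srv_opts merge_srv_opts(const srv_opts *base, const srv_opts *add)
{
    srv_opts m = *add;  /* trace flags and init path: the vhost's values are kept */

    if (add->enable_option_flags & MP_OPT_PARENT) {
        /* +Parent: start from defaults, inherit no options from the main server */
        m.option_flags = apply_explicit(MP_OPT_DEFAULTS, add);
        /* library path is inherited unless the vhost specifies one */
        if (m.include_path == NULL)
            m.include_path = base->include_path;
    } else {
        /* no +Parent: inherit main server options; explicit vhost settings win */
        m.option_flags = apply_explicit(base->option_flags, add);
        /* library path is always inherited and cannot be overridden */
        m.include_path = base->include_path;
    }
    return m;
}
```

The options are computed first, then the remaining fields, which matches the rule that options are merged before anything else.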
mod_parrot has no section/directory configuration (HLL modules do, however).
Contexts
A context is a data structure that maintains state for mod_parrot during the various phases of a request. It is defined as modparrot_context in mod_parrot.h and contains the following members:
- Parrot_Interp interp - a Parrot interpreter bound to this context
- Parrot_Interp parent_interp - the parent interpreter (if any) of interp
- long count - UNUSED
- int locked - is this context in use? 0=available, 1=in use
- Various Apache data structures relevant to this request:
  - request_rec *r
  - apr_pool_t *pconf
  - apr_pool_t *plog
  - apr_pool_t *ptemp
  - apr_pool_t *pchild
  - server_rec *s
  - conn_rec *c
  - void *csd
- int module_index - identifies the current HLL module (index into the server configuration's module_array)
The block of Apache data structures lists all possible structures that Apache may pass to a handler. As a rule, if any of the Apache data structures are in scope, mod_parrot MUST update the corresponding pointer in the context before calling a metahandler. This ensures that metahandlers can access the proper data structures, as they are given only the context to work with.
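A minimal sketch of the update rule follows, for a pre_connection-style phase where Apache passes a connection and a socket descriptor. All names here are hypothetical: the opaque `sketch_server`/`sketch_conn` types, the trimmed `sketch_ctx`, `run_pre_connection_meta` and `example_meta`. Real code works on modparrot_context and the real Apache types. The wrapper copies every in-scope structure into the context before calling the metahandler, because the context is all the metahandler receives.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for Apache types; the sketch only stores pointers. */
typedef struct sketch_server sketch_server;
typedef struct sketch_conn sketch_conn;

/* Hypothetical, trimmed-down context with only the fields used here. */
typedef struct {
    sketch_server *s;
    sketch_conn *c;
    void *csd;
    int module_index;
} sketch_ctx;

typedef int (*sketch_metahandler)(sketch_ctx *ctx);

/* Copy every structure in scope for this phase into the context, then
   call the metahandler with nothing but the context. */
static int run_pre_connection_meta(sketch_ctx *ctx, sketch_conn *c, void *csd,
                                   sketch_server *s, sketch_metahandler meta)
{
    assert(ctx != NULL && meta != NULL);
    ctx->c = c;
    ctx->csd = csd;
    ctx->s = s;
    return meta(ctx);
}

/* Example metahandler: succeeds only if the connection pointer was filled in. */
static int example_meta(sketch_ctx *ctx)
{
    return ctx->c != NULL ? 0 : -1;
}
```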
Context Pools
Contexts are allocated from context pools, which are populated at startup and tuned dynamically at runtime. There is a single global pool, plus one pool per virtual host configured with +Parent. Virtual hosts without +Parent share the global pool. Because each context holds a pointer to a Parrot interpreter, the context pools also serve mod_parrot as interpreter pools.
Context Lifecycle
Contexts are created at Apache startup during the configuration phase, when interpreters are needed to register various HLL modules and parse directives. An interpreter is always started when a context is created. Contexts are destroyed when an Apache process exits (XXX update for MaxSpareServers, etc.)
Child processes in forking MPMs inherit (via fork()) the context pools created at startup, saving both CPU cycles and memory.
To maintain state, the same context must be used for all phases of a request, as it contains a reference to the interpreter. To accommodate this, each context is bound (and locked) to a particular Apache pool when it is first handed out, and is unbound (and unlocked) when that pool is destroyed. When a context is needed, code should call init_ctx(server_rec *s, apr_pool_t *p). If p is non-null, init_ctx either returns the context already bound to that Apache pool, or binds and returns an available context if none is bound yet. If p is null, init_ctx does not bind the returned context to a pool and assumes you will manage its lifecycle yourself. This is useful in phases where the pool outlives the scope of the handler: a context bound to such a pool would stay locked long after it was needed, which would quickly exhaust the context pool. An example is the child_init phase, whose primary pool lasts for the lifetime of the child process, well beyond the scope of a child_init handler.
When a context is in use, it is locked behind the scenes by reserve_ctx and unlocked by release_ctx. We usually don't call these functions directly, relying on init_ctx and pool cleanup functions to do it for us. However, if you passed a null pool to init_ctx, you MUST call release_ctx after the phase is complete to unlock the context for the next phase.
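The bind/lock lifecycle can be sketched with a small, single-threaded pool. `init_ctx_sketch`, `reserve_ctx_sketch`, `release_ctx_sketch` and `pool_destroyed_sketch` are hypothetical stand-ins for init_ctx, reserve_ctx, release_ctx and the pool cleanup. The sketch assumes an Apache pool can be identified by its address and uses a fixed pool size. Real code would register a pool cleanup instead of having the caller invoke `pool_destroyed_sketch`.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4  /* hypothetical fixed size for the sketch */

typedef struct {
    int locked;          /* 0 = available, 1 = in use */
    const void *bound;   /* Apache pool this context is bound to, or NULL */
} pool_ctx;

static pool_ctx ctx_pool[POOL_SIZE];

/* Stand-in for reserve_ctx: lock the first available context. */
static pool_ctx *reserve_ctx_sketch(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!ctx_pool[i].locked) {
            ctx_pool[i].locked = 1;
            return &ctx_pool[i];
        }
    }
    return NULL;  /* pool exhausted */
}

/* Stand-in for release_ctx: unlock and unbind a context. */
static void release_ctx_sketch(pool_ctx *ctx)
{
    ctx->locked = 0;
    ctx->bound = NULL;
}

/* Stand-in for init_ctx: reuse the context already bound to p, or bind a
   free one. With p == NULL the caller owns the context and must release it. */
static pool_ctx *init_ctx_sketch(const void *p)
{
    if (p != NULL) {
        for (int i = 0; i < POOL_SIZE; i++)
            if (ctx_pool[i].locked && ctx_pool[i].bound == p)
                return &ctx_pool[i];
    }
    pool_ctx *ctx = reserve_ctx_sketch();
    if (ctx != NULL && p != NULL)
        ctx->bound = p;  /* real code would also register a cleanup on p */
    return ctx;
}

/* Stand-in for the pool cleanup: release the context bound to p. */
static void pool_destroyed_sketch(const void *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (ctx_pool[i].locked && ctx_pool[i].bound == p)
            release_ctx_sketch(&ctx_pool[i]);
}
```

Calling `init_ctx_sketch(NULL)` repeatedly without releasing shows the exhaustion problem the text warns about: once every context is locked, the next call returns NULL.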
Apache Interface
TODO
NCI Functions
TODO
Parrot Objects
TODO
Parrot Interface
TODO
NCI Functions
TODO
Parrot Objects
TODO
HLL Modules
In addition to the global mod_parrot Apache module, each HLL layer must register its own Apache module. Having separate Apache modules for each HLL pushes things like configuration management, hook dispatch, and error semantics onto Apache, which is designed to do just those things. This keeps mod_parrot small, generic and flexible.
Module Registration
The following information is required to register a module:
- A unique module name. The convention is modparrot_HLLNAME_module (replace HLLNAME with the name of your HLL)
- The namespace where your module code lives. HLL module code is required to live in a nested namespace under the ModParrot;HLL namespace. If your code lives in ModParrot;HLL;perl6, specify perl6 as your namespace.
- An array of hashes describing the HLL's custom Apache directives (this is optional).
- An array of hook enums that your module will handle.
This information is used to register the module via Apache;Module;add_module(). See the HLL module developer guide for details on its usage.
Module Configuration
Like any other Apache module, HLL modules create their own server and section (directory) configurations. These are PMCs of the module's choosing and are stored in the cfg member of the modparrot_module_config structure:
struct modparrot_module_config {
    char *name;
    modparrot_module_info *minfo;
    Parrot_PMC cfg;
};
typedef struct modparrot_module_config modparrot_module_config;
Apache uses this structure as the HLL module's configuration (one for the server config, and one for the directory config). When HLL code requests its configuration, however, mod_parrot fetches the modparrot_module_config structure from Apache and returns only the cfg member.
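The unwrapping step can be sketched as follows. Apache keeps per-module configurations in a vector indexed by the module's module_index, which is what ap_get_module_config reads. The sketch mimics that lookup with a plain `void **` array so it compiles without httpd headers. `module_config_sketch`, `module_sketch`, `get_module_config_sketch` and `hll_get_config_sketch` are hypothetical names, and the PMC is an opaque pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef void *pmc_sketch;  /* stand-in for Parrot_PMC */

/* Mirrors the layout of modparrot_module_config. */
typedef struct {
    char *name;
    void *minfo;      /* modparrot_module_info * in the real struct */
    pmc_sketch cfg;   /* the HLL module's own configuration PMC */
} module_config_sketch;

/* Minimal stand-in for Apache's module struct: only the index matters here. */
typedef struct {
    int module_index;
} module_sketch;

/* Mimics ap_get_module_config(): index the config vector by module_index. */
static void *get_module_config_sketch(void **config_vector, const module_sketch *m)
{
    return config_vector[m->module_index];
}

/* What HLL code sees: the wrapper struct is unwrapped and only cfg is returned. */
static pmc_sketch hll_get_config_sketch(void **config_vector, const module_sketch *m)
{
    module_config_sketch *mc = get_module_config_sketch(config_vector, m);
    return mc != NULL ? mc->cfg : NULL;
}
```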
Directive Dispatch
TODO
Hooks and Metahandlers
Each HLL module can register a hook for each Apache phase. It is not required to register a hook for every phase, though you must register at least one.
Metahandlers are HLL-specific hooks that implement the semantics of that particular HLL module. For example, mod_perl6 request phase handlers expect to receive an Apache::RequestRec object as the first argument. The mod_perl6 metahandlers would be responsible for making sure that happens. Metahandlers are also responsible for running the actual HLL handler code and returning an appropriate status.
Apache cannot call HLL code directly, so mod_parrot registers generic hook functions that call HLL code on its behalf. These functions are in src/mod_parrot.c and have a modparrot_meta_ prefix.
Since mod_parrot knows nothing about the semantics of the underlying HLL layer, metahandlers must have a consistent signature that is independent of the HLL. They do this by accepting a single argument: the context. The context by definition contains all of the Apache data structures in scope for a particular phase, so the metahandler can pick and choose the structures it needs from the context.
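The HLL-independent signature can be illustrated with a function pointer type that takes only a context. Two hypothetical HLL metahandlers read different members of the same context, and a generic dispatcher calls either one without knowing what it does. The trimmed `meta_ctx` type is an assumption. Returning 0 stands in for Apache's OK and 500 for HTTP_INTERNAL_SERVER_ERROR.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down context; real metahandlers receive modparrot_context. */
typedef struct {
    void *r;   /* request_rec * when a request is in scope */
    void *c;   /* conn_rec * */
} meta_ctx;

/* Every metahandler shares this signature, whatever the HLL. */
typedef int (*metahandler)(meta_ctx *ctx);

/* Hypothetical HLL A: a request-phase handler that needs the request. */
static int hll_a_response(meta_ctx *ctx)
{
    return ctx->r != NULL ? 0 : 500;
}

/* Hypothetical HLL B: only looks at the connection. */
static int hll_b_response(meta_ctx *ctx)
{
    return ctx->c != NULL ? 0 : 500;
}

/* Generic side: calls any metahandler without knowing its semantics. */
static int dispatch(metahandler h, meta_ctx *ctx)
{
    return h(ctx);
}
```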
For details on registering hooks and writing metahandlers, see the HLL module developer guide.
Module Scope
Apache makes an obvious but significant assumption about module code. It assumes your code knows the module to which it belongs. So the code for mod_cgi KNOWS it's part of mod_cgi, and, as an example, its response handler can ask for the mod_cgi configuration structure appropriately. So Apache provides no infrastructure for asking it what module code it's currently executing -- it assumes you know who you are.
This is a big problem for mod_parrot, which provides SHARED hooks to support multiple HLL modules. The same hook is called regardless of which HLL module is currently in scope, and Apache won't tell you which module it is. This comes into play in three places:
- hook registration
- metahandler callbacks
- cleanup handlers
The single function register_meta_hooks is called to register an HLL module's hooks with Apache. Since this function is called once for every HLL module, we can maintain state across calls with a static variable. In particular, that variable can be an index into the global module array in the mod_parrot server config. Because a static variable is initialized only once, before the first call, initializing it to zero gives the correct index for the first module:
/* initialize the index to 0 for the first module only */ static int module_index = 0;
and retrieve the module structure:
mpcfg = ap_get_module_config(our_server->module_config, &parrot_module);
modp = ((module **)mpcfg->module_array->elts)[module_index];
and finally increment the index for the next call:
/* prepare for the next module */ module_index++;
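The static-index pattern can be shown in isolation. This sketch assumes the per-hook module lists are filled during the same registration call, which the text does not state. `register_meta_hooks_sketch`, the `SK_HOOK_*` enum and the fixed-size arrays are hypothetical. Each call appends the current module's index to the list of every hook it handles, then advances the static index for the next module.

```c
#include <assert.h>

#define MAX_MODULES 8
/* Hypothetical subset of the MP_HOOK_* enum. */
enum { SK_HOOK_PRE_CONNECTION, SK_HOOK_HANDLER, SK_HOOK_LAST };

/* Stand-in for mpcfg->handler_modules: per-hook lists of module indices. */
static int handler_modules[SK_HOOK_LAST][MAX_MODULES];
static int handler_count[SK_HOOK_LAST];

/* Called once per HLL module, in registration order. */
static void register_meta_hooks_sketch(const int *hooks, int nhooks)
{
    /* initialize the index to 0 for the first module only */
    static int module_index = 0;

    assert(module_index < MAX_MODULES);
    for (int i = 0; i < nhooks; i++) {
        int h = hooks[i];
        assert(h >= 0 && h < SK_HOOK_LAST);
        handler_modules[h][handler_count[h]++] = module_index;
    }

    /* prepare for the next module */
    module_index++;
}
```

Registering a module that handles both hooks and then a module that handles only SK_HOOK_HANDLER leaves the pre_connection list as {0} and the handler list as {0, 1}.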
Metahandler callbacks use a similar indexing scheme, but the situation is more complex. Not every HLL module registers every hook, so each hook must have its own module array. This also means that a separate index must be used for each phase. And finally, unlike registering the hooks, the actual hooks can be called more than once, so we must somehow reset the index for each phase.
The per-hook module array is maintained in the mod_parrot server config:
apr_array_header_t *handler_modules[MP_HOOK_LAST];
The index is maintained in the context (module_index). We do this because a context is bound to the current connection and can maintain the index while not interfering with indexes from other connections. The code to actually retrieve the next indexed module is complex, so it's been refactored to a single macro, NEXT_HANDLER_MODULE, which mod_parrot calls for each metahandler. It takes a single argument, the hook:
/* get next module in line */ modp = NEXT_HANDLER_MODULE(MP_HOOK_PRE_CONNECTION);
The only remaining problem is how to reset the index for each connection. mod_parrot registers a hook for every Apache phase with the APR_HOOK_REALLY_FIRST flag so the mod_parrot hook always runs first. It does this to reset the index before any other modules can run. Here's the actual mod_parrot pre_connection handler, which does nothing but reset the index:
static int modparrot_pre_connection_handler(conn_rec *c, void *csd)
{
    modparrot_context *ctxp;
    /* initialize context */
    if (!(ctxp = init_ctx(c->base_server, c))) {
        MPLOG_ERROR(c->base_server, "context initialization failed");
        return HTTP_INTERNAL_SERVER_ERROR;
    }
    /* we're REALLY_FIRST, so reset the module index */
    ctxp->module_index = -1;
    /* clean up */
    release_ctx(ctxp);
    /* we only do setup */
    return DECLINED;
}
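The reset-then-advance scheme can be modeled in a few lines. `next_handler_module` is a hypothetical function version of the idea behind NEXT_HANDLER_MODULE, not its actual expansion. The per-hook lists and counts are illustrative values. A REALLY_FIRST reset sets the context's index to -1, and each shared metahandler call advances it to find the module whose turn it is.

```c
#include <assert.h>

/* Hypothetical subset of the MP_HOOK_* enum. */
enum { SK_HOOK_PRE_CONNECTION, SK_HOOK_HANDLER, SK_HOOK_LAST };

/* Illustrative per-hook module lists: module 0 handles both hooks,
   module 1 handles only the handler hook (-1 marks an unused slot). */
static const int handler_modules[SK_HOOK_LAST][2] = {
    { 0, -1 },
    { 0, 1 },
};
static const int handler_count[SK_HOOK_LAST] = { 1, 2 };

/* Trimmed-down context: just the per-connection module index. */
typedef struct {
    int module_index;
} sk_ctx;

/* What mod_parrot's REALLY_FIRST hook does at the start of each phase. */
static void reset_module_index(sk_ctx *ctx)
{
    ctx->module_index = -1;
}

/* Advance the context's index and return the module whose turn it is. */
static int next_handler_module(sk_ctx *ctx, int hook)
{
    ctx->module_index++;
    assert(ctx->module_index < handler_count[hook]);
    return handler_modules[hook][ctx->module_index];
}
```

Keeping the index in the context rather than in a global means two connections in different threads walk their module lists independently.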
Cleanup handlers are not real Apache handlers, as there is no "cleanup" phase in Apache. It's actually a cleanup callback for the request pool. Fortunately, we can provide a structure that will be passed to the callback function, and that structure can contain information about the module that registered the function:
struct modparrot_cleanup_info {
    module *module;         /* the HLL module that registered us */
    Parrot_PMC callback;    /* callback subroutine */
    Parrot_PMC hll_data;    /* a PMC to pass to the callback */
    apr_pool_t *pool;       /* the pool for which the cleanup was registered */
    server_rec *s;          /* used for context init */
};
typedef struct modparrot_cleanup_info modparrot_cleanup_info;
We use the actual Apache module struct instead of the usual module_index idiom here, as cleanups are not real Apache hooks and therefore don't have a module array to index in the configuration. More importantly, the timing of cleanup registration and execution is indeterminate, since cleanups can be registered during the request, and pools can be destroyed at various points during a connection.
modparrot_meta_request_cleanup is a generic callback function that uses this data to call callback and pass it the hll_data PMC.
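In APR, cleanups are registered with apr_pool_cleanup_register, which stores a `void *` data pointer and passes it back to the callback when the pool is destroyed. The sketch below mimics that mechanism with a hypothetical one-slot `mini_pool` so it compiles without APR. `cleanup_info_sketch` and `meta_cleanup_sketch` are also hypothetical. The point is that the generic callback learns which module it belongs to only from the data pointer, not from Apache.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cleanup_fn)(void *data);  /* same shape as APR's apr_status_t (*)(void *) */

/* Hypothetical one-slot stand-in for an APR pool. */
typedef struct {
    cleanup_fn fn;
    void *data;
} mini_pool;

static void mini_pool_cleanup_register(mini_pool *p, void *data, cleanup_fn fn)
{
    p->fn = fn;
    p->data = data;
}

/* Run the registered cleanup once, as pool destruction would. */
static void mini_pool_destroy(mini_pool *p)
{
    if (p->fn != NULL)
        p->fn(p->data);
    p->fn = NULL;
}

/* Trimmed-down modparrot_cleanup_info: the module travels with the callback. */
typedef struct {
    const char *module;    /* stands in for the HLL module struct */
    const char *hll_data;  /* stands in for the hll_data PMC */
} cleanup_info_sketch;

static const char *last_module;    /* what the generic callback saw */
static const char *last_hll_data;

/* Generic callback: everything it needs comes from the data pointer.
   Real code would init a context and call the callback PMC with hll_data. */
static int meta_cleanup_sketch(void *data)
{
    cleanup_info_sketch *info = data;
    last_module = info->module;
    last_hll_data = info->hll_data;
    return 0;
}
```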
Multiprocessing Module Support
Prefork MPM
In a non-threaded (prefork) MPM, each context pool in a process contains only one context, and cannot be grown or shrunk. There may be multiple context pools depending on the virtual host configuration. As each process is running at most one interpreter, there are no concurrency issues with this MPM.
Threaded MPMs (e.g. worker)
In a threaded MPM such as the worker MPM, each context pool in a process contains multiple contexts and can be grown or shrunk dynamically. Due to concurrency issues and the need to maintain state during all phases of a request, contexts are locked and bound to individual connections for the lifetime of those connections.
Threads in Parrot are implemented using multiple interpreters (one interpreter per thread). In Apache, each connection gets its own thread, and mod_parrot assigns a single context to that thread. Therefore, as long as proper locks are maintained on the context pool and the contexts themselves, mod_parrot should be thread-safe with respect to a threaded MPM.
mod_parrot uses a single mutex to control access to the context pool (from src/context.c):
apr_thread_mutex_t *ctx_pool_mutex;
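A growable, mutex-guarded pool can be sketched as follows. mod_parrot uses APR's apr_thread_mutex_t. This sketch swaps in POSIX threads (`pthread_mutex_t`) so it builds without APR. The `tctx`/`tctx_pool` types, the doubling growth policy and `tctx_worker` are hypothetical. Shrinking is omitted. One mutex covers both scanning for a free context and growing the pool, so two threads can never claim the same context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    int locked;  /* 0 = available, 1 = in use */
} tctx;

/* Growable context pool guarded by one mutex (analogous to ctx_pool_mutex). */
typedef struct {
    pthread_mutex_t mutex;
    tctx **items;
    size_t count, cap;
} tctx_pool;

static int tctx_pool_init(tctx_pool *p)
{
    p->items = NULL;
    p->count = p->cap = 0;
    return pthread_mutex_init(&p->mutex, NULL);
}

/* Reserve a free context, growing the pool if none is available. */
static tctx *tctx_reserve(tctx_pool *p)
{
    tctx *found = NULL;

    pthread_mutex_lock(&p->mutex);
    for (size_t i = 0; i < p->count; i++) {
        if (!p->items[i]->locked) { found = p->items[i]; break; }
    }
    if (found == NULL) {
        if (p->count == p->cap) {
            size_t ncap = p->cap ? p->cap * 2 : 4;
            tctx **n = realloc(p->items, ncap * sizeof *n);
            if (n == NULL) goto out;
            p->items = n;
            p->cap = ncap;
        }
        found = calloc(1, sizeof *found);  /* real code would also start an interpreter */
        if (found == NULL) goto out;
        p->items[p->count++] = found;
    }
    found->locked = 1;
out:
    pthread_mutex_unlock(&p->mutex);
    return found;
}

static void tctx_release(tctx_pool *p, tctx *c)
{
    pthread_mutex_lock(&p->mutex);
    c->locked = 0;
    pthread_mutex_unlock(&p->mutex);
}

/* Worker that repeatedly reserves and releases a context. */
static void *tctx_worker(void *arg)
{
    tctx_pool *p = arg;
    for (int i = 0; i < 1000; i++) {
        tctx *c = tctx_reserve(p);
        if (c == NULL) return NULL;
        tctx_release(p, c);
    }
    return p;
}
```

A thread holds at most one context at a time, and the pool grows only when every existing context is locked. With N worker threads, the pool therefore never exceeds N contexts.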