Version 17 (modified by jhorwitz, 13 years ago)

mod_parrot Architecture

NOTE: This is a work in progress.

Overview

This page describes the various subsystems of mod_parrot, and how they all work together. It will track mod_parrot trunk, so it may not accurately describe the latest release.

Terminology

  • HLL: high level language (e.g. Rakudo Perl 6, PHP)
  • NCI: Parrot's native call interface, used to call C functions from Parrot
  • HLL layer: code that implements a particular HLL Apache module
  • Metahandler: code that implements a single Apache phase for a particular HLL
  • PMC: Parrot Magic Cookie
  • Context: a data structure containing information about the current connection
  • Interpreter: a Parrot interpreter (Parrot_Interp)

mod_parrot Module

The mod_parrot Apache module is the product of compilation, and is usually named mod_parrot.so on Unix systems. It has a dependency on libparrot.so.

After installation, the module should be loaded as follows in the Apache configuration:

LoadModule parrot_module modules/mod_parrot.so

The module alone provides no HLL layers, and thus no real functionality by itself.

Configuration

Contexts

A context is a data structure that maintains state for mod_parrot during the various phases of a request. It is defined as modparrot_context in mod_parrot.h and contains the following members:

  • Parrot_Interp interp - a Parrot interpreter bound to this context
  • Parrot_Interp parent_interp - the parent interpreter (if any) of interp
  • long count - UNUSED
  • int locked - is this context in use? 0=available, 1=in use
  • Various Apache data structures relevant to this request:
    • request_rec *r
    • apr_pool_t *pconf
    • apr_pool_t *plog
    • apr_pool_t *ptemp
    • apr_pool_t *pchild
    • server_rec *s
    • conn_rec *c
    • void *csd
  • int module_index - identifies the current HLL module (index into the server configuration's module_array)
  • int pool_index - index into the context pool array

The block of Apache data structures lists all possible structures that Apache may pass to a handler. As a rule, if any of the Apache data structures are in scope, mod_parrot MUST update the corresponding pointer in the context before calling a metahandler. This ensures that metahandlers can access the proper data structures, as they are given only the context to work with.
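
The rule above can be sketched in C as follows. This is a simplified model, not the definition from mod_parrot.h: the Apache and Parrot types are replaced by opaque stand-ins so the snippet compiles on its own, and the helper name ctx_set_request is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the real Apache and Parrot types (assumption:
 * the real definitions come from the Apache, APR and Parrot headers). */
typedef struct request_rec request_rec;
typedef struct server_rec  server_rec;
typedef struct conn_rec    conn_rec;
typedef struct apr_pool_t  apr_pool_t;
typedef struct parrot_interp_stub *Parrot_Interp;

/* Simplified model of modparrot_context, following the member list above. */
typedef struct modparrot_context {
    Parrot_Interp interp;         /* interpreter bound to this context */
    Parrot_Interp parent_interp;  /* parent of interp, if any */
    long count;                   /* unused */
    int locked;                   /* 0 = available, 1 = in use */
    request_rec *r;
    apr_pool_t *pconf, *plog, *ptemp, *pchild;
    server_rec *s;
    conn_rec *c;
    void *csd;
    int module_index;             /* current HLL module */
    int pool_index;               /* slot in the context pool array */
} modparrot_context;

/* Hypothetical helper: refresh the pointers that are in scope for a
 * request-phase hook before the context is handed to a metahandler. */
void ctx_set_request(modparrot_context *ctxp, request_rec *r,
                     conn_rec *c, server_rec *s)
{
    assert(ctxp != NULL);
    ctxp->r = r;
    ctxp->c = c;
    ctxp->s = s;
}
```

A request-phase hook would call such a helper with its own r, r->connection and r->server just before invoking the metahandler, so the metahandler never sees pointers left over from an earlier phase.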

Context Pools

Contexts are allocated from context pools, which are populated at startup and tuned dynamically at runtime. There is a single global pool, plus one pool per virtual host configured with +Parent. Virtual hosts without +Parent share the global pool. Because each context contains a pointer to a Parrot interpreter, the context pools also give mod_parrot a pool of interpreters. Each context knows its slot in the pool array (see pool_index).
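
The pool selection rule can be illustrated with a small sketch. The types and the function name pool_for_vhost are hypothetical and only model the rule stated above; they are not taken from the mod_parrot source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a context pool. */
typedef struct ctx_pool {
    const char *name;
    int size;
} ctx_pool;

/* Hypothetical per-virtual-host settings. */
typedef struct vhost_cfg {
    int parent;          /* non-zero if configured with +Parent */
    ctx_pool *own_pool;  /* only set when parent is non-zero */
} vhost_cfg;

ctx_pool global_pool = { "global", 0 };

/* Pick the pool a virtual host allocates its contexts from:
 * its own pool with +Parent, the shared global pool otherwise. */
ctx_pool *pool_for_vhost(const vhost_cfg *vh)
{
    assert(vh != NULL);
    if (vh->parent && vh->own_pool != NULL)
        return vh->own_pool;
    return &global_pool;
}
```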

Context Lifecycle

Contexts are created at Apache startup during the configuration phase, when interpreters are needed to register various HLL modules and parse directives. An interpreter is always started when a context is created.

XXX Forking?

To maintain state, the same context must be used for all phases of a request, as it contains a reference to the interpreter. When a context is needed, code should call init_ctx(server_rec *s, conn_rec *c), which will return an available context from the pool. If c is non-null, init_ctx will either return the context bound to that connection, or bind and return an available context if none is already bound. Code in hooks that run before the pre-connection phase should pass a NULL connection.

When a context is in use, it is locked behind the scenes by reserve_ctx. While code should not call this function directly, it MUST call release_ctx after each phase is complete to unlock the context. This may change in the future, as connection binding requires the use of the same context, so all this locking and unlocking is just overhead.
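
The reserve/release cycle and connection binding described above can be modeled as follows. This is a single-pool, single-threaded sketch with hypothetical names (sketch_init_ctx and friends); the real init_ctx, reserve_ctx and release_ctx have more to do, including choosing the right pool and taking the pool mutex.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

typedef struct conn { int id; } conn;   /* stand-in for conn_rec */

typedef struct ctx {
    int locked;      /* 0 = available, 1 = in use */
    conn *bound;     /* connection this context is bound to, if any */
    int pool_index;  /* slot in the pool array */
} ctx;

ctx sketch_pool[POOL_SIZE];

/* Models reserve_ctx: mark the context as in use. */
void sketch_reserve_ctx(ctx *x) { x->locked = 1; }

/* Models release_ctx: unlock, but keep the connection binding. */
void sketch_release_ctx(ctx *x)
{
    assert(x != NULL);
    x->locked = 0;
}

/* Models init_ctx for a single pool. */
ctx *sketch_init_ctx(conn *c)
{
    int i;

    /* a context already bound to this connection is reused */
    if (c != NULL) {
        for (i = 0; i < POOL_SIZE; i++) {
            if (sketch_pool[i].bound == c) {
                sketch_reserve_ctx(&sketch_pool[i]);
                return &sketch_pool[i];
            }
        }
    }

    /* otherwise take the first context that is unlocked and unbound */
    for (i = 0; i < POOL_SIZE; i++) {
        if (!sketch_pool[i].locked && sketch_pool[i].bound == NULL) {
            sketch_pool[i].pool_index = i;
            sketch_pool[i].bound = c;  /* stays NULL for a NULL connection */
            sketch_reserve_ctx(&sketch_pool[i]);
            return &sketch_pool[i];
        }
    }
    return NULL;  /* pool exhausted */
}

/* Models the connection cleanup: unlock and drop the binding. */
void sketch_conn_cleanup(ctx *x)
{
    x->locked = 0;
    x->bound = NULL;
}
```

In this model a second call with the same connection returns the same context, which is what lets interpreter state survive from one phase to the next.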

When the request/connection is complete, the context is unlocked and disassociated from the connection by modparrot_conn_cleanup, a connection-scope pool cleanup handler.

Apache Interface

NCI Functions

Parrot Objects

HLL Modules

Module Registration

Metahandlers

Tracking

Apache makes an obvious but significant assumption about module code. It assumes your code knows the module to which it belongs. So the code for mod_cgi KNOWS it's part of mod_cgi, and, as an example, its response handler can ask for the mod_cgi configuration structure appropriately. So Apache provides no infrastructure for asking it what module code it's currently executing -- it assumes you know who you are.

This is a big problem for mod_parrot, which provides SHARED hooks to support multiple HLL modules. The same hook is called regardless of which HLL module is currently in scope, and Apache won't tell you which module it is. This comes into play in three places:

  • hook registration
  • metahandler callbacks
  • cleanup handlers

The single function register_meta_hooks is called to register an HLL module's hooks with Apache. Since this function is called once for every HLL module, we can maintain state within the function by using a static variable. In particular, that variable can be an index into the global module array in the mod_parrot server config. And since registration happens only once, at startup, the index never needs to be reset, so we can simply initialize it to zero for the first invocation:

    /* initialize the index to 0 for the first module only */
    static int module_index = 0;

and retrieve the module structure:

    mpcfg = ap_get_module_config(our_server->module_config, &parrot_module);
    modp = ((module **)mpcfg->module_array->elts)[module_index];

and finally increment the index for the next call:

    /* prepare for the next module */
    module_index++;
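
Put together, the three fragments behave like the following self-contained sketch. The array, the struct and the function name sketch_register_meta_hooks are stand-ins for illustration; the real function fetches the array from the server config and registers hooks instead of returning the module.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES 8

/* stand-in for Apache's module structure */
typedef struct hll_module { const char *name; } hll_module;

/* stand-in for module_array in the mod_parrot server config */
hll_module *module_array[MAX_MODULES];
int module_count = 0;

/* Called once per HLL module, in the order the modules were added to
 * module_array. Returns the module this call is registering. */
hll_module *sketch_register_meta_hooks(void)
{
    /* initialized once, keeps its value across calls */
    static int module_index = 0;
    hll_module *modp;

    assert(module_index < module_count);
    modp = module_array[module_index];

    /* ... hooks for modp would be registered here ... */

    /* prepare for the next module */
    module_index++;
    return modp;
}
```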

Metahandler callbacks use a similar indexing scheme, but the situation is more complex. Not every HLL module registers every hook, so each hook must have its own module array. This also means that a separate index must be used for each phase. And finally, unlike registering the hooks, the actual hooks can be called more than once, so we must somehow reset the index for each phase.

The per-hook module array is maintained in the mod_parrot server config:

apr_array_header_t *handler_modules[MP_HOOK_LAST];

The index is maintained in the context (module_index). We do this because a context is bound to the current connection and can maintain the index while not interfering with indexes from other connections. The code to actually retrieve the next indexed module is complex, so it's been refactored to a single macro, NEXT_HANDLER_MODULE, which mod_parrot calls for each metahandler. It takes a single argument, the hook:

    /* get next module in line */
    modp = NEXT_HANDLER_MODULE(MP_HOOK_PRE_CONNECTION);

The only remaining problem is how to reset the index for each connection. mod_parrot registers a hook for every Apache phase with the APR_HOOK_REALLY_FIRST flag so the mod_parrot hook always runs first. It does this to reset the index before any other modules can run. Here's the actual mod_parrot pre_connection handler, which does nothing but reset the index:

static int modparrot_pre_connection_handler(conn_rec *c, void *csd)
{
    modparrot_context *ctxp;

    /* initialize context */
    if (!(ctxp = init_ctx(c->base_server, c))) {
        MPLOG_ERROR(c->base_server, "context initialization failed");
        return HTTP_INTERNAL_SERVER_ERROR;
    }

    /* we're REALLY_FIRST, so reset the module index */
    ctxp->module_index = -1;

    /* clean up */
    release_ctx(ctxp);

    /* we only do setup */
    return DECLINED;
}
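
The source of NEXT_HANDLER_MODULE is not shown here, so the following is only a guess at its behavior, written as a function with hypothetical names. It assumes the macro increments the context's index before fetching, which is consistent with the handler above resetting the index to -1 rather than 0.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical subset of the MP_HOOK_* constants */
enum { HOOK_PRE_CONNECTION, HOOK_RESPONSE, HOOK_LAST };

typedef struct hll_module { const char *name; } hll_module;

/* stand-in for one apr_array_header_t of module pointers */
typedef struct hook_modules {
    hll_module *elts[4];
    int nelts;
} hook_modules;

/* stand-ins for the server config and the context */
typedef struct server_cfg { hook_modules handler_modules[HOOK_LAST]; } server_cfg;
typedef struct context { int module_index; } context;

/* Advance the per-connection index and return the module registered at
 * that slot for this hook, or NULL when no modules are left. */
hll_module *next_handler_module(const server_cfg *cfg, context *ctxp, int hook)
{
    const hook_modules *mods;

    assert(hook >= 0 && hook < HOOK_LAST);
    mods = &cfg->handler_modules[hook];
    ctxp->module_index++;
    if (ctxp->module_index >= mods->nelts)
        return NULL;
    return mods->elts[ctxp->module_index];
}
```

With the index reset to -1 by the REALLY_FIRST hook, the first metahandler call for a phase gets slot 0 of that phase's array, the second gets slot 1, and so on, regardless of how many modules registered for other phases.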

Cleanup handlers are not real Apache handlers, as there is no "cleanup" phase in Apache. A cleanup handler is actually a cleanup callback for the request pool. Fortunately, we can provide a structure that will be passed to the callback function, and that structure can contain information about the module that registered the function:

struct modparrot_cleanup_info {
    int module_index;    /* the HLL module that registered us */
    Parrot_PMC callback; /* callback subroutine */
    Parrot_PMC hll_data; /* a PMC to pass to the callback */
    union {              /* internal stuff for mod_parrot */
        conn_rec *c;
        request_rec *r;
    } data;
};
typedef struct modparrot_cleanup_info modparrot_cleanup_info;

module_index lets us know which module registered us. callback is the Sub PMC that should be invoked, and hll_data is a PMC that should be passed to the callback. modparrot_meta_request_cleanup is a generic callback function that uses this data to set the module index and call the subroutine.
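
The shape of such a generic callback can be sketched as follows. Plain C function pointers and void pointers stand in for the Sub PMC and the data PMC, and a global stands in for the context's module_index; all names are hypothetical. The single void * argument and the zero return value mirror what APR expects from a pool cleanup function.

```c
#include <assert.h>
#include <stddef.h>

/* stand-in for a Sub PMC */
typedef void (*hll_callback)(void *hll_data);

/* simplified model of modparrot_cleanup_info */
typedef struct cleanup_info {
    int module_index;       /* the HLL module that registered us */
    hll_callback callback;  /* callback subroutine */
    void *hll_data;         /* stand-in for the PMC passed to the callback */
} cleanup_info;

/* stand-in for the module_index stored in the context */
int current_module_index = -1;

/* Generic cleanup: restore the registering module's index, then
 * invoke its callback with the data it asked for. */
int sketch_request_cleanup(void *data)
{
    cleanup_info *info = data;

    assert(info != NULL);
    current_module_index = info->module_index;
    if (info->callback != NULL)
        info->callback(info->hll_data);
    return 0;  /* APR_SUCCESS */
}

/* Example callback: counts how often it has been invoked. */
void count_calls(void *hll_data)
{
    (*(int *)hll_data)++;
}
```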

Multiprocessing Module Support

Prefork MPM

In a non-threaded (prefork) MPM, each context pool in a process contains only one context, and cannot be grown or shrunk. There may be multiple context pools depending on the virtual host configuration. As each process is running at most one interpreter, there are no concurrency issues with this MPM.

Threaded MPM (e.g. worker)

In a threaded MPM such as the worker MPM, each context pool in a process contains multiple contexts and can be grown or shrunk dynamically. Due to concurrency issues and the need to maintain state during all phases of a request, contexts are locked and bound to individual connections for the lifetime of those connections.

Threads in Parrot are implemented using multiple interpreters (one interpreter per thread). In Apache, each connection gets its own thread, and mod_parrot assigns a single context to that thread. Therefore, as long as proper locks are maintained on the context pool and the contexts themselves, mod_parrot should be thread-safe with respect to a threaded MPM.

mod_parrot uses a single mutex to control access to the context pool: apr_thread_mutex_t *ctx_pool_mutex; (from src/context.c)
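
As a rough illustration of how one mutex can guard the pool, the sketch below uses a POSIX mutex in place of apr_thread_mutex_t so that it builds without APR. The names are hypothetical, and it assumes that the locked flag is only read and written while the pool mutex is held, which the source does not confirm.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MT_POOL_SIZE 2

typedef struct mt_ctx { int locked; } mt_ctx;

mt_ctx mt_pool[MT_POOL_SIZE];

/* plays the role of ctx_pool_mutex */
pthread_mutex_t mt_pool_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Find and lock a free context while holding the pool mutex, so two
 * threads can never be handed the same context. */
mt_ctx *mt_reserve(void)
{
    mt_ctx *found = NULL;
    int i;

    pthread_mutex_lock(&mt_pool_mutex);
    for (i = 0; i < MT_POOL_SIZE; i++) {
        if (!mt_pool[i].locked) {
            mt_pool[i].locked = 1;
            found = &mt_pool[i];
            break;
        }
    }
    pthread_mutex_unlock(&mt_pool_mutex);
    return found;
}

/* Return a context to the pool under the same mutex. */
void mt_release(mt_ctx *x)
{
    assert(x != NULL);
    pthread_mutex_lock(&mt_pool_mutex);
    x->locked = 0;
    pthread_mutex_unlock(&mt_pool_mutex);
}
```

The mutex is held only for the short scan of the pool, not for the duration of a request, so contention stays low even when many connections are active.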