NCI Tasks

The current Parrot NCI system has a number of weaknesses; the following is a list of the major areas to be addressed:

Generalize Function Handling

  • Generalize the callback subsystem (see TT #1192)
  • Support GetProcAddress-style functions (library functions that return addresses for other library functions)

Improve Type Handling

  • Basic types
    • long long (see TT #1182)
    • Types larger than native machine word (long double, 64 bit int on x86, 128 bit int on x86_64, etc.)
    • Exact sized types, including smaller types (16 bit int and float, 32 bit float, etc.)
    • Wide characters
    • Endian conversion
    • Handle signed/unsigned properly
  • Structures, unions, and arrays
    • Bit packed fields
    • Unions
    • Less painful and *faster* API for structures
    • Proper handling of packed single- and multi-dimensional arrays
    • Clean SoA and AoS handling (to arbitrary depth)
  • Buffers and protocol data
    • Counted strings
    • Padded strings/buffers
    • Zero-copy R/W access to buffer contents

Improve documentation

Discussion

The above tasks were discussed in #parrot on 2009-11-05, leading to the proposed solutions below.

Proposed solutions

  • Create and document pluggable (configure-time) framebuilder system
    • 2 varieties: call-out (current framebuilder), and call-in (callbacks)
    • implementations are not required to handle all signature strings, but must document where they lack functionality. Notably, the fallback static builder only handles a predefined set of strings (modifiable at configure time)
    • for callbacks without an opaque userdata parameter, call-in framebuilders will do the right thing (DTRT) if they can, and explode otherwise
    • Fixes:
      • callback problems
      • external ffi/jit library feature-envy
  • replace current signature strings with new, more general strings and/or structured definitions
    • more general strings:
      • similar to pcc signatures ("args->retv"), each position's type is defined by an upper-case character followed by non-upper-case-modifiers
      • types: I => integer, F => float, S => string, P => pointer/buffer, ... others?
      • modifiers: & => pass by ref (PMC required), various => size modifier (size defaults to parrot register type)
      • examples: I => INTVAL, Is => short, I[32] => int32, Ic& => pass by reference char (from inside an Integer PMC)
      • transition strategy: initially require new-style signature strings to be prefixed by a character not in either scheme (eg: '+')
    • structured definitions:
      • similar to struct handling API
      • transition strategy: different types (PMC vs String) for signatures => different operations => no conflict
    • fixes (potentially):
      • long long
      • exact sized types
      • wide chars
      • endianness
      • signedness
    • what needs to be done:
      • make src/pmc/nci.pmc understand the new signatures
      • make tools/dev/nci_thunk_gen.pir understand the new signatures
      • deprecation notice (if necessary)
  • allow more direct PIR access to NCI PMC (particularly creation)
    • example:
        $P0 = get_c_pointer_somehow()
        $P1 = new 'NCI'
        $P1['FooBarSig'] = $P0
    • fixes:
      • GetProcAddress woes
  • replace/augment (un)?managed struct with a struct type factory (JIT or a set of types that efficiently handle common cases)
    • could unify type signatures with NCI strings (probably more of an NCI change than a struct handling change)
    • fixes:
      • structure/union/array problems
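
The two transition strategies above (a '+' prefix for new-style strings, and a different type for structured definitions) amount to a simple dispatch. The following is only a sketch in Python, not Parrot code; the function name classify_signature and the returned labels are made up for illustration.

```python
def classify_signature(sig):
    """Pick a signature scheme, following the transition strategies above."""
    if isinstance(sig, str):
        # New-style strings carry a prefix character used by neither scheme.
        if sig.startswith("+"):
            return ("new-string", sig[1:])
        return ("legacy-string", sig)
    # Assumption: anything that is not a string is a structured definition.
    return ("structured", sig)
```

Because the old and new strings are told apart by the prefix alone, existing callers would keep working unchanged until the prefix requirement is dropped.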

The 2.3 Plan

This plan is to be implemented in 2 phases: as much as possible before the 2.3 release, and the things that require deprecation immediately after the 2.3 release.

This change is focused around signature strings. The same strings will be used to describe dlfunc signatures and UnManagedStruct layout.

Additionally, the PMCs involved in NCI (NCI and UnManagedStruct) will have their implementations simplified by removing functionality.

  • NCI will no longer permit calling raw pointers. A separate PCCNativeSub class will handle native PMC methods.
  • UnManagedStruct will no longer support named element lookup, only by index. Lookup by name functionality can be obtained by wrapping a hash of ints and an unmanaged struct in an object.
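
The wrapping approach for name lookup can be sketched as follows. This is Python standing in for a PIR-level object; IndexedStruct is a hypothetical stand-in for an index-only UnManagedStruct, and NamedStruct is the hypothetical wrapper holding the hash of ints.

```python
class IndexedStruct:
    """Stand-in for a struct that only supports lookup by element index."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value):
        self._values[index] = value


class NamedStruct:
    """Wraps a name -> index hash and an index-only struct in one object."""

    def __init__(self, struct, names):
        self._struct = struct
        self._index_of = dict(names)

    def __getitem__(self, name):
        # Translate the name to an index, then defer to the wrapped struct.
        return self._struct[self._index_of[name]]

    def __setitem__(self, name, value):
        self._struct[self._index_of[name]] = value


point = NamedStruct(IndexedStruct([3, 4]), {"x": 0, "y": 1})
point["y"] = 10
```

The struct itself stays small and fast, and only callers that want names pay for the extra hash lookup.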

Callbacks will also be slightly improved. new_callback_PPPS will be eliminated and new_callback_PPPPS will take its place (the extra P is used so as not to interfere with user-supplied userdata). A speculative new_callback_PPS op will be documented, but not implemented until a closure-based callback system is available (eg: provided by libffi).

Signature String Description

The new NCI signature strings resemble PCC signature strings. They are composed of a sequence of elements, each element consisting of an upper-case category character followed by zero or more non-upper-case modifier characters. The categories are 'I' for int-ish things, 'N' for float-ish things, 'S' for stringy things, and 'P' for more complicated or special, PMC-ish things. Most modifiers are specific to categories and are described below. Some modifiers are incompatible with others; where this occurs, it will be documented.

For dlfunc routines, signatures contain a mandatory marker "->" that separates the argument types (before the marker) from the return type (after the marker).
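
As a rough illustration of the grammar just described, a signature can be split at the marker and then cut into elements at each category character. This is a Python sketch, not the planned implementation; the function names are hypothetical, and accepting an empty return side is an assumption.

```python
import re

# One element: a category character (I, N, S or P) followed by
# zero or more non-upper-case modifier characters.
ELEMENT = re.compile(r"[INSP][^A-Z]*")


def split_elements(part):
    """Split one side of a signature into (category, modifiers) pairs."""
    elements = ELEMENT.findall(part)
    if "".join(elements) != part:
        raise ValueError(f"not a sequence of elements: {part!r}")
    return [(element[0], element[1:]) for element in elements]


def parse_dlfunc_signature(sig):
    """Split a dlfunc signature at the mandatory '->' marker."""
    if sig.count("->") != 1:
        raise ValueError(f"expected exactly one '->' marker: {sig!r}")
    args, ret = sig.split("->")
    return split_elements(args), split_elements(ret)
```

For example, parse_dlfunc_signature("IiSP->Il") yields three argument elements (an int-sized integer, a string, a PMC) and one return element (a long).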

Universal Modifiers

Pass By Reference: 'r'

Integer Modifiers

By default, the element is an INTVAL with no modifications. Modifiers:

  • sizes:
    • 'c' => char, 's' => short, 'i' => int, 'l' => long, 'll' => long long
    • '3' => 8 bits, '4' => 16 bits, '5' => 32 bits, '6' => 64 bits, '7' => 128 bits
  • sign: 'u' => unsigned, '+' => signed
  • byteorder: 'n' => network order (big-endian), 'v' => VAX order (little-endian)
  • pointers: 'p' => data pointer, 'f' => function pointer
  • bit-packed fields: '{' \d+ ( ',' \d+ )* '}' => unsigned bitfields of specified sizes (treated as separate elements)
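
The exact-size digits read as powers of two ('3' => 2**3 = 8 bits), which makes the size, sign and byteorder modifiers easy to combine. A minimal Python sketch, assuming a signed default and native byte order when no modifier says otherwise (neither default is stated above); the function name is hypothetical and bit-packed fields are not handled.

```python
import sys


def encode_exact_int(value, modifiers):
    """Encode value for an 'I' element that has an exact-size digit modifier."""
    digits = [ch for ch in modifiers if ch in "34567"]
    if len(digits) != 1:
        raise ValueError("this sketch needs exactly one size digit")
    nbytes = (2 ** int(digits[0])) // 8  # '3' => 8 bits => 1 byte
    signed = "u" not in modifiers        # assumption: signed unless 'u' is given
    if "n" in modifiers:
        order = "big"                    # network order
    elif "v" in modifiers:
        order = "little"                 # VAX order
    else:
        order = sys.byteorder            # assumption: native order by default
    return value.to_bytes(nbytes, order, signed=signed)
```

With this reading, "I5n" describes a 32 bit big-endian integer and "I4uv" a 16 bit unsigned little-endian one; a value that does not fit the requested size raises OverflowError.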

Number Modifiers

By default, the element is a FLOATVAL with no modifications. Modifiers:

  • size:
    • 'f' => float, 'd' => double, 'ld' => long double
    • '4' => 16 bits, '5' => 32 bits, '6' => 64 bits, '7' => 128 bits
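
The exact-size number modifiers map directly onto IEEE 754 widths where the host supports them. The sketch below uses Python's struct module, which covers 16, 32 and 64 bits only, so '7' (128 bits) and 'ld' are rejected; native byte order and the function name are assumptions.

```python
import struct

# struct format codes for the exact sizes the struct module can express.
FORMATS = {"4": "e", "5": "f", "6": "d"}


def pack_number(value, size_digit):
    """Pack value for an 'N' element with one exact-size digit modifier."""
    if size_digit not in FORMATS:
        raise ValueError(f"size {size_digit!r} is not covered by this sketch")
    # '=' selects native byte order with standard (fixed) sizes.
    return struct.pack("=" + FORMATS[size_digit], value)
```

Narrowing a FLOATVAL to 16 or 32 bits loses precision, so a round trip is only exact for values representable at the smaller width.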

String Modifiers

By default, the element is a Parrot_String with no modifications. Modifiers:

  • native conversions:
    • 'cu' => unmanaged (expects called code to free) null-terminated string
    • 'cm' => managed (parrot will free) null-terminated string
    • 'p'[um][3456] => managed or unmanaged counted (pascal) string. Count size is given by digit and is always unsigned. '3' => 8 bits, ..., '6' => 64 bits
  • raw buffer: 'b'
    • allows in-place modification
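
A counted (pascal) string is an unsigned length field followed by the string data. The Python sketch below assumes the count sits immediately before the data, counts bytes, and uses native byte order; none of that is specified above, and the function names are hypothetical.

```python
import struct

# Unsigned struct codes for the count sizes: '3' => 8 bits ... '6' => 64 bits.
COUNT_FORMATS = {"3": "B", "4": "H", "5": "I", "6": "Q"}


def encode_counted(data, size_digit):
    """Prefix data (bytes) with its length, using the given count size."""
    fmt = "=" + COUNT_FORMATS[size_digit]
    return struct.pack(fmt, len(data)) + data


def decode_counted(buf, size_digit):
    """Read the count from the front of buf and return that many bytes."""
    fmt = "=" + COUNT_FORMATS[size_digit]
    width = struct.calcsize(fmt)
    (count,) = struct.unpack_from(fmt, buf)
    return buf[width:width + count]
```

A string longer than the count field can express (for example 300 bytes with an 8 bit count) makes struct.pack raise struct.error, which is the kind of case a real implementation would have to report.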

PMC Modifiers

By default, the element is a Parrot_PMC with no modifications. Modifiers:

  • 'i' => invocant
  • 'z' => null pointer (does not count as an element)
  • 'j' => current interpreter (does not count as an element)
  • 'u' => userdata parameter (used for callbacks)
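
One reading of "does not count as an element" is that 'Pz' and 'Pj' positions are filled in automatically and so are skipped when counting the elements a caller supplies. A Python sketch of that reading follows; the interpretation and the function name are assumptions.

```python
import re


def count_elements(part):
    """Count the elements in one side of a signature, skipping 'Pz' and 'Pj'."""
    count = 0
    for element in re.findall(r"[A-Z][^A-Z]*", part):
        category, modifiers = element[0], element[1:]
        if category == "P" and ("z" in modifiers or "j" in modifiers):
            continue  # null pointer or current interpreter: not counted
        count += 1
    return count
```

Under this reading, "PjPiIS" has three elements: the invocant, an integer and a string.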

Useful Information

Some useful information has been compiled at CFFI