Version 26 (modified by kurahaupo, 5 years ago)

Link to WhiteKnight++ 's Array splintering on GitHub

Current Array Types

At the moment, we have these Array types, in addition to the myriad types that provide array-like interfaces:

  • Array
  • FixedBooleanArray
  • ResizableBooleanArray
  • FixedIntegerArray
  • ResizableIntegerArray
  • FixedStringArray
  • ResizableStringArray
  • FixedPMCArray
  • ResizablePMCArray

Questions

Perlishness

Negative Indices

  • Some array types support the Perlish notion that negative indeces count backwards from the end of an array, while others throw an exception. Which behaviour is preferred?
    • (Which types? Is this tied to the Resizable vs. Fixed? as an HLL author, I don't particularly care, as my language doesn't support straight negative indices anyway. If we're moving to unify this feature, I would again suggest we move to unify more, ala support of shift/push --Coke)
      • (No, it's "random". ResizableBooleanArray does accept negative indeces, while ResizableIntegerArray does not.)

Resizable Arrays

  • Do any HLL's use the current resizable array types without extending them to include initialization of unset elements?
    • (Yes, partcl, at the moment, uses RPAs without extending them to deal with potentially unset elements --coke)
    • ResizablePMCArray, ResizableStringArray and ResizableBooleanArray already do initialization of elements; this question is really about making ResizableIntegerArray and ResizableFloatArray work the same way.
    • (The Regex engine uses ResizableIntegerArray somewhere -- I'm not sure where, but the compiler died when I had a broken version under development --kurahaupo)

They were created as extensions of fixed-sized arrays, extended to support pop, push, shift, unshift and set-integer, but not extended to guarantee initialization. However it seems likely that the lack of initialization is a misplaced optimization that is probably of no use to any HLL, forcing every HLL to override this behaviour. Does anyone implementing a language say otherwise?

  • Is there room to make new types of resizable array that target their use as a queue (optimized for push, pop, shift & unshift) rather than for indexability?

The current versions implement shift and unshift by doing bulk memmove, which is quite expensive. A ring-buffer approach that uses head and tail indexes would be far more efficient, if that's what we're really targetting by having the content uninitialized.

  • Is there room to make new types of resizable array that target raw buffer use and don't allow push, pop, shift and unshift?
  • Is the inheritance from Fixed to Resizeable wrong?
  • My dictionaries can't agree whether the spelling is "resizable" or "resizeable"; if the latter, should we fix the spelling now, later, or never?

Fixed-sized Arrays

  • Should fixed-sized arrays support push, pop, shift, unshift, splice and resize?
    • (probably not)
  • If so, should they automatically be mutated into the corresponding resizable type when these operations are attempted?
    • (probably)
  • Should there be special cases for very small arrays, such that no separate memory block is needed? These could be useful to implement vectors in mathematical operations, and tuples generally.

Invariant Arrays

Invariant arrays are one way to implement the notion of "an array as a (composite) value" rather than "an array as a container". Optimizing for this case brings huge benefits to single-assignment languages.

Invariant arrays are effectively a special case of fixed-sized arrays, only more stringent: not only is the array structure fixed, but the values of the elements within it are also fixed. Operations such as shift, unshift, push, pop, splice, resize and replace-element would work, if at all, by generating a new array, rather than altering the old one. This is easy to support for native types (Integer, Float & String) but would require additional support for invariance within PMCs.

An good implementation would probably need:

  • A "write only" flag, for arrays that are under construction
  • A wrapper "container" PMC that simply references its corresponding "value" PMC
  • A "radix tree" approach to storage, to avoid excessively heavy copying when generating new arrays as a result of the usual mutator methods

(What else? Any other thoughts?)

Universal Ancestral "Array"

It has been suggested (in #1416) that we should have a common ancestor class for all array-like types, to facilitate adding new common methods to them all. Perhaps a better alternative would be multiple "base classes" representing the various aspects of an array (see next section).

Role separation

Arrays are overloaded to perform several roles:

  • Tuple (A composite value being a list of values, rather than a container)
  • Indexable ("Hashes with Integer Keys")
  • Stack (Push and Pop, or Shift and Unshift)
  • Queue (Push and Shift, or Unshift and Pop)

Should we have separate PMC's that optimize for each of these roles?

(I think the current explosion of Array types was an attempt to optimize that has failed; do we have benchmarks showing that any particular array is faster/more compact/??? than another for a particular usage? Personally, I'd like to see a unification of array types, not a further segmentation --coke)

Thread Safety

Any object that can change its representation is a potential pit-fall for thread-safe programming.

This includes reallocating memory to expand the size of an array. To do this safely means every operation on the array has to be gated through a mutex, which clearly will be very expensive.

Fixed sized arrays, on the other hand, need no such gating, because the location and manner of storing each element is fixed.

With some care it is even possible to support resizable arrays with a fixed allocation bound, provided that one only uses push, pop, unshift, shift and truncate (assign length zero) to change the logical size of the array. It's also a bit simpler (and fast) if only one thread performs shift & unshift, and one other (or possibly the same) thread performs push, pop and truncate.

Tasks

Resizable Arrays

  • Update all ResizableArrays to perform initialization of unset elements, in particular those that re-appear when an array is shrunk and then re-extended. Assuming of course no HLL's would suffer significantly from this "pessimization".
  • Array types should grow by powers of two (or multiples of system page size for large arrays) for improved allocation performance
    • (they already do -- kurahaupo)
  • The current implementations of resizable array PMC's are derived from fixed-sized array PMC's. However the (current) use of the "size" field -- provided by the base PMC -- is arguably incorrect, in that it does not track the allocated size as it would for the fixed-sized version. Instead of changing the meaning of "size" in the base PMC, we should call it "allocated_size", and then in the derived PMC have "logical_size". This means that the memory-safety of the "allocated_size" bound is then implemented only once, within the base PMC.

Refactoring

Refactor all the classes as mix-ins between two groups of classes:

  • one group of classes that know the type of each element (int, num, str, pmc, any), and
  • another group of classes that compute memory addresses for indexed elements, perform resizing (or not), and report number of elements.

(These latter would if necessary target the roles listed above, but a single compromise implementation would also work. -- kurahaupo)

A general-purpose resizable array would need these attrs:

  • element_size (corresponding to int8, int16, int32, int64, int128, float32, float64, float128, string, pmc & any)
  • allocated_size
  • initialized_size
  • logical_size
  • index_offset (used to make shift and unshift operate efficiently, not to allow arbitrary base indexing)

A general-purpose fixed-sized array would only need the element size and number of elements.

This split could also form the basis for mapping onto C structs.

Related Tickets

  • #1421 RFC: [RFC] zero length FxA behaviour
  • #1416 RFC: Unreasonably painful to add a method to all *Array PMCs
  • #1399 Array unshift/access broken
  • #1356 todo: [TODO] add method FixedStringArray.sort()
  • #1332 RFC: change get_string on FPA, RPA (et al?)
  • #1317 todo: t/pmc/fixedpmcarray.t: fix to work with prederef of JIT
  • #1303 todo: [DEPRECATION] Array PMC
  • #1295 RFC: Should FixedPMCArray autovivify nested arrays on set_*keyed()?
  • #1293 Array PMC freeze/thaw/visit broken
  • #1089 has multiple patches (primary patch converts tests to PIR (already applied) but also includes patch to implement and test zero-fill)
  • #1039 MMD bug in FixedPMCArray.sort ("no applicable methods")
  • #809 Opcode 'isa' does not accept ResizableStringArray PMC for class ("get_string() not implemented in class 'ResizableStringArray'")
  • #879 isa does not take (correctly) namespace, pmcarray,stringarray
  • #218 can't sort a PIR subclass of an ResizablePMCArray
  • #159 Deprecated: named class/pmc lookup in pir syntax such as new, isa, subclass, get_class, etc

GIT fork

 http://wiki.github.com/Whiteknight/parrot-data-structures/