Version 12 (modified by kurahaupo, 12 years ago)

notes on applicability of uninitialized resizable arrays; more details on "fully general" arrays

Current Array Types

At the moment, we have these Array types, in addition to the myriad types that provide array-like interfaces:

  • Array
  • FixedBooleanArray
  • ResizableBooleanArray
  • FixedIntegerArray
  • ResizableIntegerArray
  • FixedStringArray
  • ResizableStringArray
  • FixedPMCArray
  • ResizablePMCArray


Fixed Arrays

  • Should fixed-sized arrays support push, pop, shift and unshift?
    • (probably not)
  • Should there be special cases for very small arrays, such that no separate memory block is needed? These could be useful to implement vectors in mathematical operations, and tuples generally.
  • Should fixed-sized arrays automatically become resizable arrays when resizing operations are attempted?
    • (maybe)

Resizable Arrays

  • Do any HLL's use the current resizable array types without extending them to include initialization of unset elements?
    • (Yes, partcl, at the moment, uses RPAs without extending them to deal with potentially unset elements --coke)
    • ResizablePMCArray, ResizableStringArray and ResizableBooleanArray already do initialization of elements; this question is really about making ResizableIntegerArray and ResizableFloatArray work the same way.
    • (The Regex engine uses ResizableIntegerArray somewhere -- I'm not sure where, but the compiler died when I had a broken version under development --kurahaupo)

They were created as extensions of fixed-sized arrays, extended to support pop, push, shift, unshift and set-integer, but not extended to guarantee initialization. However it seems likely that the lack of initialization is a misplaced optimization that is probably of no use to any HLL, forcing every HLL to override this behaviour. Does anyone implementing a language say otherwise?

  • Is there room to make new types of resizable array that target their use as a queue (optimized for push, pop, shift & unshift) rather than for indexability?

The current versions implement shift and unshift by doing bulk memmove, which is quite expensive. A ring-buffer approach that uses head and tail indexes would be far more efficient, if that's what we're really targetting by having the content uninitialized.

  • Is there room to make new types of resizable array that target raw buffer use and don't allow push, pop, shift and unshift?
  • The current implementations of resizable array PMC's are derived from fixed-sized array PMC's. However the (current) use of the "size" field -- provided by the base PMC -- is arguably incorrect, in that it does not track the allocated size as it would for the fixed-sized version. Instead of changing the meaning of "size" in the base PMC, we should call it "allocated_size", and then in the derived PMC have "logical_size". This means that the memory-safety of the "allocated_size" bound is then implemented only once, within the base PMC.
  • A general-purpose resizable array would need these attrs:
    • element_type (int8, int16, int32, int64, int128, float32, float64, float128, string, pmc)
    • allocated_size
    • initialized_size
    • logical_size
    • index_offset (used to make shift and unshift operate efficiently, not to allow arbitrary base indexing)
  • My dictionaries can't agree whether the spelling is "resizable" or "resizeable"; if the latter, should we fix the spelling now, later, or never?

Role separation

Arrays are overloaded to perform several roles:

  • Indexable ("Hashes with Integer Keys")
  • Stack (Push and Pop, or Shift and Unshift)
  • Queue (Push and Shift, or Unshift and Pop)

Should we have separate PMC's that optimize for each of these roles?

(I think the current explosion of Array types was an attempt to optimize that has failed; do we have benchmarks showing that any particular array is faster/more compact/??? than another for a particular usage? Personally, I'd like to see a unification of array types, not a further segmentation --coke)

Thread Safety

Any object that can change its representation is a potential pit-fall for thread-safe programming.

This includes reallocating memory to expand the size of an array. To do this safely means every operation on the array has to be gated through a mutex, which clearly will be very expensive.

Fixed sized arrays, on the other hand, need no such gating, because the location and manner of storing each element is fixed.

With some care it is even possible to support resizable arrays with a fixed allocation bound, provided that one only uses push, pop, unshift, shift and truncate (assign length zero) to change the logical size of the array. It's also a bit simpler (and fast) if only one thread performs shift & unshift, and one other (or possibly the same) thread performs push, pop and truncate.


Resizable Arrays

  • Update all ResizableArrays to perform initialization of unset elements, in particular those that re-appear when an array is shrunk and then re-extended. Assuming of course no HLL's would suffer significantly from this "pessimization".
  • Array types should grow by powers of two (or multiples of system page size for large arrays) for improved allocation performance
    • (they already do -- kurahaupo)
  • Deprecate and remove Array PMC type.

Related Tickets

  • #1089 has multiple patches (primary patch converts tests to PIR (already applied) but also includes patch to implement and test zero-fill)
  • #1039 MMD bug in FixedPMCArray.sort ("no applicable methods")
  • #809 Opcode 'isa' does not accept ResizableStringArray PMC for class ("get_string() not implemented in class 'ResizableStringArray'")
  • #879 isa does not take (correctly) namespace, pmcarray,stringarray
  • #218 can't sort a PIR subclass of an ResizablePMCArray
  • #159 Deprecated: named class/pmc lookup in pir syntax such as new, isa, subclass, get_class, etc