Version 9 (modified by kurahaupo, 12 years ago)

--

Current Array Types

At the moment, we have these Array types, in addition to the myriad types that provide array-like interfaces:

  • Array
  • FixedBooleanArray
  • ResizableBooleanArray
  • FixedIntegerArray
  • ResizableIntegerArray
  • FixedStringArray
  • ResizableStringArray
  • FixedPMCArray
  • ResizablePMCArray

Questions

Fixed Arrays

  • Should fixed-sized arrays support push, pop, shift and unshift?
    • (probably not)
  • Should there be special cases for very small arrays, such that no separate memory block is needed? These could be useful to implement vectors in mathematical operations, and tuples generally.
  • Should fixed-sized arrays automatically become resizable arrays when resizing operations are attempted?
    • (maybe)

Resizable Arrays

  • Do any HLL's use the current resizable array types without extending them to include initialization of unset elements?

They were created as extensions of fixed-sized arrays, extended to support pop, push, shift, unshift and set-integer, but not extended to guarantee initialization. However it seems likely that the lack of initialization is a misplaced optimization that is probably of no use to any HLL, forcing every HLL to override this behaviour. Does anyone implementing a language say otherwise?

  • Is there room to make new types of resizable array that target their use as a queue (optimized for push, pop, shift & unshift) rather than for indexability?

The current versions implement shift and unshift by doing bulk memmove, which is quite expensive. A ring-buffer approach that uses head and tail indexes would be far more efficient, if that's what we're really targetting by having the content uninitialized.

  • Is there room to make new types of resizable array that target raw buffer use and don't allow push, pop, shift and unshift?
  • The current implementations of resizable array PMC's are derived from fixed-sized array PMC's. However the (current) use of the "size" field -- provided by the base PMC -- is arguably incorrect, in that it does not track the allocated size as it would for the fixed-sized version. Instead of changing the meaning of "size" in the base PMC, we should call it "allocated_size", and then in the derived PMC have "logical_size". This means that the memory-safety of the "allocated_size" bound is then implemented only once, within the base PMC.
  • A general-purpose resizable array would need these attrs:
    • allocated_size
    • initialized_size
    • logical_size
    • index_offset (used to make shift and unshift operate efficiently, not to allow arbitrary base indexing)
  • My dictionaries can't agree whether the spelling is "resizable" or "resizeable"; if the latter, should we fix the spelling now, later, or never?

Role separation

Arrays are overloaded to perform several roles:

  • Indexable ("Hashes with Integer Keys")
  • Stack (Push and Pop, or Shift and Unshift)
  • Queue (Push and Shift, or Unshift and Pop)

Should we have separate PMC's that optimize for each of these roles?

Tasks

Resizable Arrays

  • Update all ResizableArrays to perform initialization of unset elements, in particular those that re-appear when an array is shrunk and then re-extended. Assuming of course no HLL's would suffer significantly from this "pessimization".
  • Array types should grow by powers of two (or multiples of system page size for large arrays) for improved allocation performance
  • Deprecate and remove Array PMC type.

Related Tickets

  • #1089 has multiple patches (primary patch converts tests to PIR (already applied) but also includes patch to implement and test zero-fill)
  • #1039 MMD bug in FixedPMCArray.sort ("no applicable methods")
  • #809 Opcode 'isa' does not accept ResizableStringArray PMC for class ("get_string() not implemented in class 'ResizableStringArray'")
  • #879 isa does not take (correctly) namespace, pmcarray,stringarray
  • #218 can't sort a PIR subclass of an ResizablePMCArray
  • #159 Deprecated: named class/pmc lookup in pir syntax such as new, isa, subclass, get_class, etc