Version 20 (modified by jimmy, 12 years ago)

change RT #56718 to TT #1293

Current Array Types

At the moment, we have these Array types, in addition to the myriad types that provide array-like interfaces:

  • Array
  • FixedBooleanArray
  • ResizableBooleanArray
  • FixedIntegerArray
  • ResizableIntegerArray
  • FixedStringArray
  • ResizableStringArray
  • FixedPMCArray
  • ResizablePMCArray

Questions

Perlishness

Negative Indices

  • Some array types support the Perlish notion that negative indeces count backwards from the end of an array, while others throw an exception. Which behaviour is preferred?
    • (Which types? Is this tied to the Resizable vs. Fixed? as an HLL author, I don't particularly care, as my language doesn't support straight negative indices anyway. If we're moving to unify this feature, I would again suggest we move to unify more, ala support of shift/push --Coke)
    • (No, it's "random". ResizableBooleanArray does accept negative indeces, while ResizableIntegerArray does not.)

Fixed Arrays

  • Should fixed-sized arrays support push, pop, shift and unshift?
    • (probably not)
  • Should there be special cases for very small arrays, such that no separate memory block is needed? These could be useful to implement vectors in mathematical operations, and tuples generally.
  • Should fixed-sized arrays automatically become resizable arrays when resizing operations are attempted?
    • (maybe)

Resizable Arrays

  • Do any HLL's use the current resizable array types without extending them to include initialization of unset elements?
    • (Yes, partcl, at the moment, uses RPAs without extending them to deal with potentially unset elements --coke)
    • ResizablePMCArray, ResizableStringArray and ResizableBooleanArray already do initialization of elements; this question is really about making ResizableIntegerArray and ResizableFloatArray work the same way.
    • (The Regex engine uses ResizableIntegerArray somewhere -- I'm not sure where, but the compiler died when I had a broken version under development --kurahaupo)

They were created as extensions of fixed-sized arrays, extended to support pop, push, shift, unshift and set-integer, but not extended to guarantee initialization. However it seems likely that the lack of initialization is a misplaced optimization that is probably of no use to any HLL, forcing every HLL to override this behaviour. Does anyone implementing a language say otherwise?

  • Is there room to make new types of resizable array that target their use as a queue (optimized for push, pop, shift & unshift) rather than for indexability?

The current versions implement shift and unshift by doing bulk memmove, which is quite expensive. A ring-buffer approach that uses head and tail indexes would be far more efficient, if that's what we're really targetting by having the content uninitialized.

  • Is there room to make new types of resizable array that target raw buffer use and don't allow push, pop, shift and unshift?
  • Is the inheritance from Fixed to Resizeable wrong?
  • My dictionaries can't agree whether the spelling is "resizable" or "resizeable"; if the latter, should we fix the spelling now, later, or never?

Deprecation of Array

Several commentators have said that Array should be removed.

What, if anything, is it useful for? What's the difference between "Array" and "ResizeablePMCArray" or "FixedPMCArray"? (A note on  RT#56718 says that its only useful feature is its sparseness.)

What needs to be done before it can be removed?

Role separation

Arrays are overloaded to perform several roles:

  • Tuple (A composite value being a list of values, rather than a container)
  • Indexable ("Hashes with Integer Keys")
  • Stack (Push and Pop, or Shift and Unshift)
  • Queue (Push and Shift, or Unshift and Pop)

Should we have separate PMC's that optimize for each of these roles?

(I think the current explosion of Array types was an attempt to optimize that has failed; do we have benchmarks showing that any particular array is faster/more compact/??? than another for a particular usage? Personally, I'd like to see a unification of array types, not a further segmentation --coke)

Thread Safety

Any object that can change its representation is a potential pit-fall for thread-safe programming.

This includes reallocating memory to expand the size of an array. To do this safely means every operation on the array has to be gated through a mutex, which clearly will be very expensive.

Fixed sized arrays, on the other hand, need no such gating, because the location and manner of storing each element is fixed.

With some care it is even possible to support resizable arrays with a fixed allocation bound, provided that one only uses push, pop, unshift, shift and truncate (assign length zero) to change the logical size of the array. It's also a bit simpler (and fast) if only one thread performs shift & unshift, and one other (or possibly the same) thread performs push, pop and truncate.

Tasks

Resizable Arrays

  • Update all ResizableArrays to perform initialization of unset elements, in particular those that re-appear when an array is shrunk and then re-extended. Assuming of course no HLL's would suffer significantly from this "pessimization".
  • Array types should grow by powers of two (or multiples of system page size for large arrays) for improved allocation performance
    • (they already do -- kurahaupo)
  • Deprecate and remove Array PMC type.
  • The current implementations of resizable array PMC's are derived from fixed-sized array PMC's. However the (current) use of the "size" field -- provided by the base PMC -- is arguably incorrect, in that it does not track the allocated size as it would for the fixed-sized version. Instead of changing the meaning of "size" in the base PMC, we should call it "allocated_size", and then in the derived PMC have "logical_size". This means that the memory-safety of the "allocated_size" bound is then implemented only once, within the base PMC.

Refactoring

Refactor all the classes as mix-ins between two groups of classes:

* one group of classes that know the type of each element (int, num, str, pmc, any), and * another group of classes that compute memory addresses for indexed elements, perform resizing (or not), and report number of elements.

(These latter would if necessary target the roles listed above, but a single compromise implementation would also work. -- kurahaupo)

A general-purpose resizable array would need these attrs: * element_size (corresponding to int8, int16, int32, int64, int128, float32, float64, float128, string, pmc & any) * allocated_size * initialized_size * logical_size * index_offset (used to make shift and unshift operate efficiently, not to allow arbitrary base indexing)

A general-purpose fixed-sized array would only need the element size and number of elements.

This split could also form the basis for mapping onto C structs.

Related Tickets

  • #1293 Array PMC freeze/thaw/visit broken
  • #1089 has multiple patches (primary patch converts tests to PIR (already applied) but also includes patch to implement and test zero-fill)
  • #1039 MMD bug in FixedPMCArray.sort ("no applicable methods")
  • #809 Opcode 'isa' does not accept ResizableStringArray PMC for class ("get_string() not implemented in class 'ResizableStringArray'")
  • #879 isa does not take (correctly) namespace, pmcarray,stringarray
  • #218 can't sort a PIR subclass of an ResizablePMCArray
  • #159 Deprecated: named class/pmc lookup in pir syntax such as new, isa, subclass, get_class, etc