Changes between Initial Version and Version 1 of FixingConstantSTRINGCaching

03/30/10 05:52:36 (7 years ago)

created page


  • FixingConstantSTRINGCaching

    v1 v1  
     1The current constant STRING cache is a mess for several reasons: 
     3 * it's the only consumer of a cstring Hash type that we could otherwise remove 
     4 * hashing cstring keys is time consuming 
     5 * it only handles ASCII STRINGs 
     6 * we can't perform any compilation-time precaching 
     7 * rehashing performance is terrible when we cache a lot of STRINGs 
     8 * we can't easily cache constant STRINGs thawed from PBC, which contain a lot of duplication 
     10We can fix it: 
     12 * Create a new custom data structure with which we can cache these STRINGs easily by their important characteristics (length, encoding type, and contents).  This is likely some kind of binary tree; for best performance characteristics, an AVL tree has the most advantages, as we perform many more fetches than inserts and never delete.  A red-black tree would also be fine if that's easier, but its additional complexity adds few advantages (and is slightly more expensive to read). 
     14 * Rewrite the constant STRING caching mechanism in Parrot_str_new_constant() to use the new cache.  We should index into the tree based on the STRING's characteristics, not any hash key or not the contents of the STRING naively. 
     16 * Add a function to create a new constant STRING given an encoding, a length, and a buffer. 
     18 * Use that new function when thawing STRINGs from PBC to avoid re-creating STRINGs we already have. 
     20 * Remove the cstring hash (optional). 
     22Simple profiling tests done on Rakudo reveal that we can reduce our usage of STRING headers thawed from PBC by more than 80%. 
     24This is a task of moderate difficulty, primarily in creating and debugging the binary tree.  Takers welcome.