| | 1 | The current constant STRING cache is a mess for several reasons: |
| | 2 | |
| | 3 | * it's the only consumer of a cstring Hash type that we could otherwise remove |
| | 4 | * hashing cstring keys is time consuming |
| | 5 | * it only handles ASCII STRINGs |
| | 6 | * we can't perform any compilation-time precaching |
| | 7 | * rehashing performance is terrible when we cache a lot of STRINGs |
| | 8 | * we can't easily cache constant STRINGs thawed from PBC, which contain a lot of duplication |
| | 9 | |
| | 10 | We can fix it: |
| | 11 | |
| | 12 | * Create a new custom data structure with which we can cache these STRINGs easily by their important characteristics (length, encoding type, and contents). This is likely some kind of binary tree; for best performance characteristics, an AVL tree has the most advantages, as we perform many more fetches than inserts and never delete. A red-black tree would also be fine if that's easier, but its additional complexity adds few advantages (and is slightly more expensive to read). |
| | 13 | |
| | 14 | * Rewrite the constant STRING caching mechanism in Parrot_str_new_constant() to use the new cache. We should index into the tree based on the STRING's characteristics, not any hash key or not the contents of the STRING naively. |
| | 15 | |
| | 16 | * Add a function to create a new constant STRING given an encoding, a length, and a buffer. |
| | 17 | |
| | 18 | * Use that new function when thawing STRINGs from PBC to avoid re-creating STRINGs we already have. |
| | 19 | |
| | 20 | * Remove the cstring hash (optional). |
| | 21 | |
| | 22 | Simple profiling tests done on Rakudo reveal that we can reduce our usage of STRING headers thawed from PBC by more than 80%. |
| | 23 | |
| | 24 | This is a task of moderate difficulty, primarily in creating and debugging the binary tree. Takers welcome. |