Ticket #726 (closed patch: done)
[PATCH] Reduce calls to mem_sys_allocate() when creating a Hash
Reported by: | Infinoid | Owned by: | Infinoid |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | none | Version: | 1.2.0 |
Severity: | medium | Keywords: | |
Cc: | Language: | ||
Patch status: | applied | Platform: |
Description
parrot_create_hash calls the allocator twice; once when allocating the Hash structure, and one when allocating the array of HashBuckets. Consolidating this saves us a lot of time.
One problem I ran into is reallocation. I chose not to realloc the original buffer, because the pointer to it may have changed, and this code is not able to find and fix up the caller's pointer(s). Since the pointer to the top level Hash structure cannot change, this code just leaves that buffer alone, and allocates a secondary one as necessary when expanding the HashBucket array.
The patch works by modifying the creation function, consolidating the two allocations (hash-struct and buckets) into a single buffer. There are two additional places the patch touches; expand_hash and parrot_hash_destroy. Both of these functions check whether the HashBucket pointer is at a static offset from the Hash pointer or not, in determining whether a separate buffer is used. If the destruction function detects a separate buffer, it makes a separate call to free(). If the expansion function detects a separate buffer, it realloc()s it, otherwise it allocates a new buffer and copies the old data into it. The extra space after the end of the Hash structure is unused in this case, so this patch will hurt memory performance slightly in the case where we have lots of long-standing and large hashes (with more than the default number of buckets).
Anyway, this patch saves about 10% of the total processing time in a simple while-loop benchmark in rakudo. Unfortunately, it does make the allocation model rather more complicated. Anyone have any suggestions for how to make it simpler/nicer/more efficient?
Here's the results of a microbenchmark (both sides of this comparison also have the TT #725 patch applied).
Before:
./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 38.54s user 1.40s system 99% cpu 40.211 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 37.98s user 1.49s system 98% cpu 40.193 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 38.57s user 1.46s system 99% cpu 40.349 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 39.16s user 1.33s system 99% cpu 40.864 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 37.91s user 1.53s system 98% cpu 40.064 total
After:
./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 34.90s user 1.47s system 99% cpu 36.682 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 34.52s user 1.40s system 98% cpu 36.293 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 33.76s user 1.53s system 97% cpu 36.036 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 34.33s user 1.39s system 99% cpu 36.007 total ./perl6 -e 'my $i = 0; while ( $i < 100000) { $i++ }' 34.64s user 1.36s system 99% cpu 36.293 total