Ticket #24 (closed bug: fixed)

Opened 6 years ago

Last modified 6 years ago

non-equivalence of equal Hash key strings

Reported by: pmichaud
Owned by:
Priority: major
Milestone:
Component: core
Version:
Severity: medium
Keywords:
Cc:
Language: perl6
Patch status:
Platform:

Description

Two strings of different encodings that otherwise compare as equal do not result in equivalent hash lookups:

$ cat x.pir
.sub 'main'
    .local string str0, str1
    str0 = unicode:"\u00ab"
    str1 = iso-8859-1:"\xab"

    .local pmc hash
    hash = new 'Hash'
    hash[str0] = 'hello'

    $I0 = iseq str0, str1
    print "iseq str0, str1               => "
    say $I0

    $S0 = hash[str0]
    $S1 = hash[str1]
    $I0 = iseq $S0, $S1
    print "iseq hash[str0], hash[str1]   => "
    say $I0
.end

$ ./parrot x.pir
iseq str0, str1               => 1
iseq hash[str0], hash[str1]   => 0
$

This prevents Rakudo from properly recognizing the French angle brackets (U+00AB and U+00BB) in operators and other tokens.

Pm

Change History

Changed 6 years ago by coke

  • summary changed from [BUG] non-equivalence of equal Hash key strings to non-equivalence of equal Hash key strings

Changed 6 years ago by chromatic

I added a TODO test for this to t/op/stringu.t in r35772. Simon Cozens was looking at it too.

Changed 6 years ago by pmichaud

  • lang set to perl6

Changed 6 years ago by pmichaud

I worked on this bug a bit, hoping to get Rakudo to use fixed-width strings for more of its processing. In r39282 I made a change to src/hash.c:205 that changes

    if (s1->hashval != s2->hashval)
         return 1;

to

    if (s1->charset == s2->charset && s1->hashval != s2->hashval)
         return 1;

The original assumed that different string hashvals automatically implied logically unequal strings, but this holds only if the strings are from the same charset. The above change causes the original test script to pass.
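To illustrate the reasoning, here is a minimal C sketch of why a hashval mismatch cannot be used as a shortcut for inequality across encodings. It is not Parrot code: the toy_string struct, the toy_* function names, and the use of FNV-1a are all assumptions made for illustration, and the decoder only handles one- and two-byte UTF-8 sequences. It assumes the cached hash is computed over the raw encoded bytes, so the same code point stored as UTF-8 and as ISO-8859-1 hashes differently even though the strings compare equal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a string with an encoding tag.
 * This is NOT Parrot's STRING layout. */
typedef enum { TOY_LATIN1, TOY_UTF8 } toy_encoding;

typedef struct {
    const unsigned char *bytes;  /* encoded data */
    size_t               len;    /* length in bytes */
    toy_encoding         enc;
} toy_string;

/* Decode the code point starting at *pos and advance *pos.
 * Assumption: UTF-8 input is well formed and uses at most two bytes
 * per code point (enough for U+0000..U+07FF). */
uint32_t toy_next_codepoint(const toy_string *s, size_t *pos)
{
    unsigned char b = s->bytes[(*pos)++];
    if (s->enc == TOY_LATIN1 || b < 0x80 || *pos >= s->len)
        return b;
    /* Two-byte sequence: 110xxxxx 10xxxxxx */
    unsigned char b2 = s->bytes[(*pos)++];
    return ((uint32_t)(b & 0x1F) << 6) | (uint32_t)(b2 & 0x3F);
}

/* FNV-1a over the raw bytes: the result depends on the encoding. */
uint32_t toy_hash_bytes(const toy_string *s)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < s->len; i++) {
        h ^= s->bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* FNV-1a over decoded code points: the result is encoding-independent. */
uint32_t toy_hash_codepoints(const toy_string *s)
{
    uint32_t h = 2166136261u;
    size_t i = 0;
    while (i < s->len) {
        h ^= toy_next_codepoint(s, &i);
        h *= 16777619u;
    }
    return h;
}

/* Logical equality: compare code point by code point. */
int toy_equal(const toy_string *a, const toy_string *b)
{
    size_t i = 0, j = 0;
    while (i < a->len && j < b->len) {
        if (toy_next_codepoint(a, &i) != toy_next_codepoint(b, &j))
            return 0;
    }
    return i == a->len && j == b->len;
}
```

With U+00AB stored as the UTF-8 bytes C2 AB in one toy_string and as the single ISO-8859-1 byte AB in another, toy_equal reports the two as equal while toy_hash_bytes returns different values for them. toy_hash_codepoints agrees for both, which is one possible way (in this toy model only) to make the hash consistent with equality.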

But now there's a different failure -- somehow in larger hashes logically equivalent hash keys sometimes end up as non-equivalent. Here's a test program (added to t/op/stringu.t in r39284):

$ cat x.pir
.sub 'main'
    .local string str0, str1
    str0 = unicode:"infix:\u00b1"
    str1 = iso-8859-1:"infix:\xb1"

    .local pmc hash
    hash = new 'Hash'
    hash[str0] = 'hello'

    $I0 = 0
  fill_loop:
    unless $I0 < 200 goto fill_done
    inc $I0
    $S0 = $I0
    $S0 = concat 'infix:', $S0
    hash[$S0] = 'hello'
    goto fill_loop
  fill_done:

    $I0 = iseq str0, str1
    print "iseq str0, str1               => "
    say $I0

    $S0 = hash[str0]
    $S1 = hash[str1]
    $I0 = iseq $S0, $S1
    print "iseq hash[str0], hash[str1]   => "
    say $I0
    say $S0
    say $S1
.end

$ ./parrot x.pir
iseq str0, str1               => 1
iseq hash[str0], hash[str1]   => 0
hello

$ 

This test is very sensitive to the size of the hash -- in fact, the failure only appears when the hash has more than 192 entries. I'm not sure of the significance of the 192 here, but I suspect it has something to do with bucket and/or key management in hashes. (Change the "200" to "191" in the test script above and it produces the correct output.)

AFAICT this problem is the only major blocker to Rakudo running a significant number of spectests using fixed-width (iso-8859-1) strings, which would reduce the overall "make spectest" time by about 20%.

Thanks,

Pm

Changed 6 years ago by bacek

Fix committed in r39286. STRING* design apparently isn't robust enough...

Changed 6 years ago by bacek

  • status changed from new to closed
  • resolution set to fixed