Ticket #752 (closed bug: fixed)
Parrot concatenates iso-8859-1 and utf8 incorrectly
Reported by: pmichaud
Parrot does not correctly concatenate an iso-8859-1 string with a utf8 string. Here's the test case:
    $ cat x.pir
    .sub 'main'
        $S0 = unicode:"\u00e5\u263b"
        $S1 = chr 0xe5
        $S2 = chr 0x263b
        $S3 = concat $S1, $S2
        if $S0 == $S3 goto equal
        print "not "
      equal:
        say "equal"
    .end
    $ ./parrot x.pir
    Malformed UTF-8 string
    current instr.: 'main' pc 13 (x.pir:7)
    $
Note that the exception is raised at the == comparison, not when the concatenation occurs. If one outputs the value of $S3, it comes out as four bytes (e5 e2 98 bb). The correct result is five bytes (c3 a5 e2 98 bb); that is, the iso-8859-1 string returned by chr(229) (0xe5) needs to be transcoded to utf8 before concatenation.
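The byte-level difference can be reproduced outside Parrot. The Python sketch below is illustrative only: it models each string as a hypothetical (encoding, bytes) pair, and the helpers naive_concat and transcoding_concat are invented for this example, not Parrot functions. It assumes the buggy path splices the raw buffers and labels the result utf8, which matches the four-byte output and the later "Malformed UTF-8 string" error.

```python
# Illustration only: each "string" is a hypothetical (encoding, raw bytes) pair.
# naive_concat and transcoding_concat are made-up helpers, not Parrot's API.

def naive_concat(a, b):
    """Assumed buggy behavior: splice raw buffers, keep the second operand's encoding."""
    (_, a_bytes), (enc_b, b_bytes) = a, b
    return (enc_b, a_bytes + b_bytes)


def transcoding_concat(a, b):
    """Fixed behavior: decode both operands, then re-encode the result as UTF-8."""
    text = a[1].decode(a[0]) + b[1].decode(b[0])
    return ("utf-8", text.encode("utf-8"))


s1 = ("iso-8859-1", chr(0xE5).encode("iso-8859-1"))  # like: $S1 = chr 0xe5
s2 = ("utf-8", chr(0x263B).encode("utf-8"))          # like: $S2 = chr 0x263b
expected = "\u00e5\u263b".encode("utf-8")            # like: $S0

bad = naive_concat(s1, s2)
good = transcoding_concat(s1, s2)

print(bad[1].hex(" "))      # e5 e2 98 bb
print(good[1].hex(" "))     # c3 a5 e2 98 bb
print(good[1] == expected)  # True

try:
    bad[1].decode("utf-8")  # the lone 0xe5 lead byte is not followed by continuation bytes
except UnicodeDecodeError as err:
    print("Malformed UTF-8:", err.reason)
```

Only the transcoding version produces a buffer equal to the utf8 encoding of $S0. Decoding the spliced buffer raises UnicodeDecodeError, which parallels Parrot failing only later, when the comparison first tries to read the result as utf8.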
This looks very similar to the bug reported in RT #39930, which has since been marked as fixed, although that fix apparently does not cover this case.
A fix is needed for various Rakudo modules, especially those dealing with URL encoding and decoding.