Ticket #752 (closed bug: fixed)
Parrot concatenates iso-8859-1 and utf8 incorrectly
| Reported by: | pmichaud | Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | core | Version: | 1.2.0 |
| Severity: | high | Keywords: | |
| Cc: | | Language: | perl6 |
| Patch status: | | Platform: | |
Description
Parrot has difficulty concatenating iso-8859-1 and utf8 strings. Here's the test case:
$ cat x.pir
.sub 'main'
    $S0 = unicode:"\u00e5\u263b"
    $S1 = chr 0xe5
    $S2 = chr 0x263b
    $S3 = concat $S1, $S2
    if $S0 == $S3 goto equal
    print "not "
  equal:
    say "equal"
.end
$ ./parrot x.pir
Malformed UTF-8 string
current instr.: 'main' pc 13 (x.pir:7)
$
Note that the exception is thrown at the == comparison, not at the concatenation. Printing the value of $S3 shows four bytes (e5 e2 98 bb). The correct result is five bytes (c3 a5 e2 98 bb); i.e., the iso-8859-1 string that comes back from chr(229) needs to be converted to utf8 before concatenation.
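For anyone who wants to check the byte counts, here is a rough illustration in Python (not Parrot code, and not a claim about how Parrot implements concat). It just reproduces the two byte sequences by hand, assuming the first string is stored as iso-8859-1 and the second as utf8. The helper names hex_bytes and is_valid_utf8 are made up for this sketch.

```python
# Illustration only: reproduces the byte sequences from the report in Python.
# hex_bytes and is_valid_utf8 are hypothetical helper names for this sketch.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

a_ring = chr(0xE5)    # U+00E5, fits in iso-8859-1
smiley = chr(0x263B)  # U+263B, needs a wider encoding

# Observed behaviour: iso-8859-1 bytes glued directly onto utf8 bytes.
buggy = a_ring.encode("iso-8859-1") + smiley.encode("utf-8")

# Expected behaviour: transcode the first string to utf8, then concatenate.
correct = a_ring.encode("utf-8") + smiley.encode("utf-8")

print(hex_bytes(buggy), is_valid_utf8(buggy))      # e5 e2 98 bb False
print(hex_bytes(correct), is_valid_utf8(correct))  # c3 a5 e2 98 bb True
```

The four-byte sequence is not valid utf8 because e5 announces a three-byte character but is followed by e2, which is not a continuation byte. That would be consistent with the "Malformed UTF-8 string" error showing up only once something tries to decode $S3.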
This looks very similar to the bug reported in RT #39930 (which has since been marked as fixed, but that fix apparently doesn't cover this case).
A fix for this is needed for various modules in Rakudo, especially those dealing with URL encoding and decoding.
Thanks!
Pm

