Ticket #1671 (closed bug: wontfix)

Opened 4 years ago

Last modified 4 years ago

Can't encode strings with non-ASCII characters in them

Reported by: masak
Priority: normal
Component: none
Version: 2.4.0
Severity: medium

Description

In the PIR below, encoding an all-ASCII string works fine, but Parrot dies when trying to translate the string "ö" to fixed_8. The error message is 'unimpl fixed_8'.

.sub _main :main
    .local int bin_coding, i, max, byte
    .local string bin_string

    # An all-ASCII string converts to fixed_8 without trouble.
    $S0 = "OH HAI"
    bin_coding = find_encoding 'fixed_8'
    bin_string = trans_encoding $S0, bin_coding

    # Print each byte of the converted string, one per line.
    i = 0
    max = length bin_string
  bytes_loop:
    if i >= max goto bytes_done
    byte = ord bin_string, i
    say byte
    inc i
    goto bytes_loop
  bytes_done:

    # A non-ASCII string: this conversion dies with 'unimpl fixed_8'.
    $S0 = unicode:"ö"
    bin_string = trans_encoding $S0, bin_coding
.end

Attachments

to_charset_binary.patch (1.4 KB), added by NotFound 4 years ago.

Change History

Changed 4 years ago by NotFound

Changed 4 years ago by NotFound

The attached patch, to_charset_binary, provides the requested functionality through trans_charset rather than trans_encoding: it makes a raw copy of the string's contents, regardless of the string's charset and encoding.

However, I'm not sure this conversion is desirable. Opinions?

Changed 4 years ago by NotFound

Strings can now be converted to and from raw bytes with the ByteBuffer PMC. Is that enough, or is a direct translation to a binary string still wanted?
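To make the string-to-bytes round trip concrete, here is a minimal sketch in Python. It is not Parrot's API and does not show the ByteBuffer PMC interface. Python's str and bytes stand in for a Parrot string and a byte buffer, and the sketch assumes UTF-8 as the storage encoding. The helper names to_raw_bytes and from_raw_bytes are hypothetical.

```python
# Sketch only: Python str/bytes stand in for a Parrot string and a byte
# buffer. Assumes UTF-8 storage; helper names are hypothetical.


def to_raw_bytes(s: str) -> list[int]:
    """Return the UTF-8 bytes of s as a list of integers (0-255)."""
    return list(s.encode("utf-8"))


def from_raw_bytes(data: list[int]) -> str:
    """Rebuild a string from its UTF-8 bytes."""
    return bytes(data).decode("utf-8")


for text in ["OH HAI", "ö"]:
    raw = to_raw_bytes(text)
    print(repr(text), "->", raw)
    # Decoding the raw bytes recovers the original string.
    assert from_raw_bytes(raw) == text
```

Running it prints `'OH HAI' -> [79, 72, 32, 72, 65, 73]` and `'ö' -> [195, 182]`. Each ASCII character is a single byte, while "ö" (U+00F6) takes two bytes in UTF-8. A raw byte copy therefore does not map one character to one byte once non-ASCII text is involved.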

Changed 4 years ago by masak

That way should indeed be enough; translating directly to a binary string was more a sign of desperation than a viable long-term solution.

I've done an initial review of the ByteBuffer PMC, and it looks very good. Will port my Perl 6 code to make use of it tonight or tomorrow. Thanks.

Changed 4 years ago by NotFound

  • status changed from new to closed
  • resolution set to wontfix

Closing ticket.
