Ticket #1863 (closed feature: fixed)
Parrot IO and encodings
Reported by: | nwellnhof | Owned by: | nwellnhof |
---|---|---|---|
Priority: | normal | Milestone: | 3.0 |
Component: | core | Version: | 2.10.0 |
Severity: | medium | Keywords: | |
Cc: | | Language: | |
Patch status: | | Platform: | |
Description
The FileHandle PMC is supposed to support different encodings via the 'encoding' method. Currently, this only works for single-byte encodings and UTF-8; UTF-16, UCS-2 and UCS-4 are not supported. UCS-2 and UCS-4 are fixed-width and should be fairly easy to support, but UTF-16 would need something like the code in src/io/utf8.c. It would be cleaner to move that logic into the string code. We would then only need an additional function in the encoding vtable that can partially decode incomplete variable-width strings, that is, decode as many complete characters as a buffer contains and leave any trailing partial character for the next read.
One thing I don't like is that the 'read' method works on bytes, not on characters. I don't think this makes sense for multi-byte encodings, where a byte count can end in the middle of a character.
We should also consider supporting encodings for the Socket and StringHandle PMCs.