Ticket #800 (new bug)

Opened 13 years ago

Last modified 11 years ago

Parrot assumes command line arguments are ASCII

Reported by: pmichaud
Owned by: chromatic
Priority: normal
Milestone:
Component: core
Version: 1.3.0
Severity: medium
Keywords:
Cc: whiteknight, plobsing, nwellnhof
Language:
Patch status:
Platform:

Description

Currently Parrot (incorrectly) assumes that all command line arguments are ASCII:

$ cat x.pir
.sub 'main'
    .param pmc args

    $S0 = args[1]
    say $S0

    $I1 = charset $S0
    $S1 = charsetname $I1
    say $S1

    $I1 = encoding $S0
    $S1 = encodingname $I1
    say $S1

    $I1 = length $S0
    say $I1
.end
$ ./parrot x.pir 'say «hello»'
say «hello»
ascii
fixed_8
13
$ 

Most would expect the above to be a Unicode (UTF-8) string of length 11. The reported 13 is its length in bytes, because « and » each take two bytes in UTF-8.
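To make the 13-versus-11 difference concrete, here is a minimal C sketch (not Parrot code) that counts UTF-8 code points by skipping continuation bytes (bytes of the form 10xxxxxx). The function name utf8_codepoint_count is hypothetical, and it assumes its input is already valid UTF-8.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string.
   Every code point starts with exactly one non-continuation byte,
   so counting bytes that are NOT 10xxxxxx gives the character count.
   Assumes the input is valid UTF-8; no validation is done here. */
static size_t utf8_codepoint_count(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p) {
        if ((*p & 0xC0) != 0x80)   /* not a continuation byte */
            ++count;
    }
    return count;
}
```

For the argument 'say «hello»' (bytes "say \xC2\xAB" "hello" "\xC2\xBB"), strlen reports 13 while this count is 11, matching the two numbers discussed above.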

If Parrot itself cannot easily be changed to accept Unicode (UTF-8) command-line arguments, it would be nice to have an easy way to convert the "ascii" strings in args into proper Unicode strings. So far I've been unable to find a good way of doing that.
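In the transcript the bytes Parrot receives are already UTF-8; only the "ascii" tag is wrong. So "converting" them means reinterpreting the same bytes as UTF-8, not transcoding from ASCII. The plain-C sketch below shows that reinterpretation step. It is not a Parrot API: utf8_decode_one and utf8_decode_arg are hypothetical names. By design it rejects malformed input (bad lead or continuation bytes, truncated or overlong sequences, surrogates, values above U+10FFFF) instead of guessing.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (len bytes available).
   On success, store the code point in *cp and return the byte count (1-4).
   Return 0 if the sequence is malformed. */
static size_t utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;
    unsigned char b0 = s[0];
    size_t n;
    uint32_t c, min;
    if (b0 < 0x80)               { *cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { n = 2; c = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; c = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; c = b0 & 0x07; min = 0x10000; }
    else return 0;                               /* invalid lead byte */
    if (len < n)
        return 0;                                /* truncated sequence */
    for (size_t i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                            /* bad continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                                /* overlong, too large, or surrogate */
    *cp = c;
    return n;
}

/* Decode an argument of len bytes into out (capacity cap).
   Return the number of code points, or -1 on malformed input or overflow. */
static long utf8_decode_arg(const char *arg, size_t len, uint32_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)arg;
    size_t i = 0, count = 0;
    while (i < len) {
        uint32_t cp;
        size_t n = utf8_decode_one(s + i, len - i, &cp);
        if (n == 0 || count == cap)
            return -1;
        out[count++] = cp;
        i += n;
    }
    return (long)count;
}
```

Decoding 'say «hello»' this way yields 11 code points, with « and » decoded as U+00AB and U+00BB.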

This ticket also relates to Rakudo RT #66364.

Pm

Attachments

unicode_args.patch (0.5 KB) - added by chromatic 13 years ago.
Encode command-line arguments as Unicode, not ASCII

Change History

Changed 13 years ago by chromatic

Encode command-line arguments as Unicode, not ASCII

Changed 13 years ago by chromatic

  • owner set to chromatic

This patch passes all Parrot tests; how does it work for Rakudo?

Changed 13 years ago by pmichaud

  • lang perl6 deleted

Everything works great in Rakudo now. We should add whatever tests we want for this, and then I think we can mark it as "fixed".

Pm

Changed 11 years ago by jkeenan

  • priority changed from major to normal
  • cc whiteknight added

pmichaud, chromatic:

Is this still a live issue? If so, the patch will have to be reworked, because the part of src/embed.c where it would presumably apply has changed significantly. The relevant code now reads:

PARROT_CANNOT_RETURN_NULL
static PMC*
setup_argv(PARROT_INTERP, int argc, ARGIN(const char **argv))
{
...
    for (i = 0; i < argc; ++i) {
        /* Run through argv, adding everything to @ARGS. */
        STRING * const arg = Parrot_str_new_init(interp, argv[i], 
                strlen(argv[i]),
                Parrot_utf8_encoding_ptr, PObj_external_FLAG);

        if (Interp_debug_TEST(interp, PARROT_START_DEBUG_FLAG))
            Parrot_io_eprintf(interp, "\t%vd: %s\n", i, argv[i]);

        VTABLE_push_string(interp, userargv, arg);
    }
...
}
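The excerpt above tags every argv[i] as UTF-8 (Parrot_utf8_encoding_ptr) without checking whether the bytes actually are valid UTF-8. The ticket does not say how Parrot handles invalid bytes. One possible pre-check, sketched here in plain C under that assumption, picks a tag per argument: ASCII if every byte is below 0x80, UTF-8 if the bytes validate, and a binary fallback otherwise. The enum arg_encoding and the functions utf8_valid and classify_arg are hypothetical and are not part of Parrot.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tags; Parrot's real encodings are selected via its own pointers. */
typedef enum { ARG_ASCII, ARG_UTF8, ARG_BINARY } arg_encoding;

/* Return 1 if s[0..len) is well-formed UTF-8, else 0.
   Rejects bad lead/continuation bytes, truncation, overlong forms,
   surrogates (U+D800..U+DFFF) and code points above U+10FFFF. */
static int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;
        uint32_t c, min;
        if (b < 0x80) { ++i; continue; }
        if ((b & 0xE0) == 0xC0)      { n = 2; c = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; min = 0x10000; }
        else return 0;
        if (len - i < n)
            return 0;
        for (size_t k = 1; k < n; ++k) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
            c = (c << 6) | (s[i + k] & 0x3F);
        }
        if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return 0;
        i += n;
    }
    return 1;
}

/* Choose an encoding tag for one NUL-terminated command-line argument. */
static arg_encoding classify_arg(const char *arg)
{
    const unsigned char *s = (const unsigned char *)arg;
    size_t len = strlen(arg);
    for (size_t i = 0; i < len; ++i)
        if (s[i] >= 0x80)
            return utf8_valid(s, len) ? ARG_UTF8 : ARG_BINARY;
    return ARG_ASCII;   /* every byte is 7-bit */
}
```

With a check like this, 'say «hello»' would be tagged UTF-8, while a stray byte such as 0xFF would fall back to binary rather than produce a malformed UTF-8 string.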

Changed 11 years ago by whiteknight

  • cc plobsing, nwellnhof added

I suspect this is either fixed or on the way to being fixed by recent work from nwellnhof, plobsing and others. I would like to hear what they have to say about it.

Changed 11 years ago by nwellnhof

This was fixed by plobsing, but for Linux only. Other Unices still need code to initialize the platform encoding, and Windows is harder. See our IRC discussion: http://irclog.perlgeek.de/parrot/2011-01-09#i_3167023
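On POSIX systems, one standard way to find the platform encoding is to adopt the user's locale with setlocale(LC_CTYPE, "") and then query nl_langinfo(CODESET). The sketch below assumes that approach; it is not necessarily what plobsing's Linux fix does. The helper names platform_codeset and codeset_is_utf8 are hypothetical. It covers only the "other Unices" case. On Windows, argv arrives in the ANSI code page, and a commonly used route to Unicode arguments is the wide-character command line (GetCommandLineW with CommandLineToArgvW), which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Adopt the user's LC_CTYPE settings (LC_ALL / LC_CTYPE / LANG) and return
   the name of the character encoding they imply, e.g. "UTF-8".
   Note: setlocale changes process-wide state. */
static const char *platform_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Case-insensitive match for the common spellings of UTF-8
   ("UTF-8", "utf8", "UTF_8"), ignoring '-' and '_'. */
static int codeset_is_utf8(const char *name)
{
    const char *want = "utf8";
    size_t j = 0;
    for (const char *p = name; *p; ++p) {
        if (*p == '-' || *p == '_')
            continue;
        if (want[j] == '\0' || tolower((unsigned char)*p) != want[j])
            return 0;
        ++j;
    }
    return want[j] == '\0';
}
```

If codeset_is_utf8(platform_codeset()) is true, tagging argv as UTF-8 matches the platform. Otherwise the arguments would need to be decoded from the reported codeset (for example with iconv) before being treated as Unicode.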
