Ticket #800 (new bug)

Opened 5 years ago

Last modified 4 years ago

Parrot assumes command line arguments are ASCII

Reported by: pmichaud
Owned by: chromatic
Priority: normal
Milestone:
Component: core
Version: 1.3.0
Severity: medium
Keywords:
Cc: whiteknight, plobsing, nwellnhof
Language:
Patch status:
Platform:

Description

Currently Parrot (incorrectly) assumes that all command line arguments are ASCII:

$ cat x.pir
.sub 'main'
    .param pmc args

    $S0 = args[1]
    say $S0

    $I1 = charset $S0
    $S1 = charsetname $I1
    say $S1

    $I1 = encoding $S0
    $S1 = encodingname $I1
    say $S1

    $I1 = length $S0
    say $I1
.end
$ ./parrot x.pir 'say «hello»'
say «hello»
ascii
fixed_8
13
$ 

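Most would expect the above to be a unicode/utf8 string of length 11: the argument has 11 characters, and the reported 13 is its length in UTF-8 bytes, since « and » take two bytes each.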
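The gap between 13 and 11 can be reproduced outside Parrot. The following is a minimal C sketch, assuming the shell passes the argument as valid UTF-8; `utf8_codepoint_count` is a hypothetical helper written for this illustration and is not part of Parrot's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Parrot function): count the code points in a
 * NUL-terminated string that is assumed to be valid UTF-8. Every byte that
 * is not a continuation byte (bit pattern 10xxxxxx) starts a new code point. */
static size_t
utf8_codepoint_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }

    return count;
}
```

For the argument above, written in C as "say \xC2\xABhello\xC2\xBB", strlen reports the byte count that a fixed_8 string sees, while utf8_codepoint_count reports the character count a utf8 string would give.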

If Parrot itself cannot be easily changed to accept unicode/utf8 command line arguments, then it would be nice to have a way to easily convert the "ascii" strings in args into proper unicode strings. Thus far I've been unable to find a good way of doing that.

This ticket also relates to Rakudo RT #66364.

Pm

Attachments

unicode_args.patch (0.5 KB) - added by chromatic 5 years ago.
Encode command-line arguments as Unicode, not ASCII

Change History

Changed 5 years ago by chromatic

Encode command-line arguments as Unicode, not ASCII

Changed 5 years ago by chromatic

  • owner set to chromatic

This patch passes all Parrot tests; how does it work for Rakudo?

Changed 5 years ago by pmichaud

  • lang perl6 deleted

Everything works great in Rakudo now. We should add whatever tests we want for this, and then I think we can mark it as "fixed".

Pm

Changed 4 years ago by jkeenan

  • cc whiteknight added
  • priority changed from major to normal

pmichaud, chromatic:

Is this still a live issue? If so, the patch will have to be re-pulled, because the area where it presumably would apply in src/embed.c has changed significantly.

PARROT_CANNOT_RETURN_NULL
static PMC*
setup_argv(PARROT_INTERP, int argc, ARGIN(const char **argv))
{
...
    for (i = 0; i < argc; ++i) {
        /* Run through argv, adding everything to @ARGS. */
        STRING * const arg = Parrot_str_new_init(interp, argv[i], 
                strlen(argv[i]),
                Parrot_utf8_encoding_ptr, PObj_external_FLAG);

        if (Interp_debug_TEST(interp, PARROT_START_DEBUG_FLAG))
            Parrot_io_eprintf(interp, "\t%vd: %s\n", i, argv[i]);

        VTABLE_push_string(interp, userargv, arg);
    }
...
}

Changed 4 years ago by whiteknight

  • cc plobsing, nwellnhof added

I suspect this is either fixed or on the way to being fixed with recent work by nwellnhof and plobsing and others. I would like to hear what they have to say about it.

Changed 4 years ago by nwellnhof

This was fixed by plobsing for Linux only. For other Unices, we need code to initialize the platform encoding. For Windows it's not that easy. See here for our discussion on IRC: http://irclog.perlgeek.de/parrot/2011-01-09#i_3167023
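As an illustration of what "initializing the platform encoding" could look like on a POSIX system, here is a minimal sketch using setlocale and nl_langinfo(CODESET). It is an assumption about one possible approach, not the code that Parrot uses; `platform_codeset` and `codeset_is_utf8` are hypothetical names, and the sketch does not cover Windows.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: adopt the user's locale for character handling and
 * return the name of its character encoding. If setlocale() fails, the
 * process stays in the "C" locale and nl_langinfo() reports that codeset. */
static const char *
platform_codeset(void)
{
    (void)setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: check whether a codeset name denotes UTF-8.
 * Platforms spell the name differently, so compare case-insensitively
 * and accept the form without the hyphen as well. */
static int
codeset_is_utf8(const char *codeset)
{
    assert(codeset != NULL);
    return strcasecmp(codeset, "UTF-8") == 0
        || strcasecmp(codeset, "UTF8") == 0;
}
```

A caller could check codeset_is_utf8(platform_codeset()) once at startup and only then treat the bytes in argv as UTF-8, falling back to another encoding otherwise.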
