Ticket #550 (closed bug: invalid)

Opened 5 years ago

Last modified 3 years ago

t/pmc/codestring.t failing

Reported by: whiteknight Owned by:
Priority: normal Milestone:
Component: core Version: trunk
Severity: high Keywords: codestring pmc icu
Cc: Language:
Patch status: Platform: linux

Description

I thought there was a ticket for this already, but on closer inspection I don't see one. I'm getting test failures in t/pmc/codestring.t. The code failing in question is:

$I0 = code.'charname_to_ord'('<no such symbol>')
is($I0, -1, '<no such symbol>')

So it's obviously something ICU-related. Backtrace shows that we're getting a SIGBUS error in Parrot_CodeString_nci_charname_to_ord. Here's the backtrace I get from GDB:

Program received signal SIGBUS, Bus error.
0x00007fde0d94e54c in u_charFromName_3_8 () from /usr/lib/libicuuc.so.38
(gdb) frame
#0  0x00007fde0d94e54c in u_charFromName_3_8 () from /usr/lib/libicuuc.so.38
(gdb) bt
#0  0x00007fde0d94e54c in u_charFromName_3_8 () from /usr/lib/libicuuc.so.38
#1  0x00007fde0e2a8099 in Parrot_CodeString_nci_charname_to_ord (
    interp=0x7f4080, pmc=0x9035c0) at ./src/pmc/codestring.pmc:247
#2  0x00007fde0e240f08 in Parrot_NCI_invoke (interp=0x7f4080, pmc=0x9035c0, 
    next=0x9a73c0) at ./src/pmc/nci.pmc:329
#3  0x00007fde0e094573 in Parrot_callmethodcc_p_sc (cur_opcode=0x9a73a8, 
    interp=0x7f4080) at src/ops/object.ops:80
#4  0x00007fde0e167b45 in runops_slow_core (interp=0x7f4080, pc=0x9a73a8)
    at src/runops_cores.c:461
#5  0x00007fde0e10f265 in runops_int (interp=0x7f4080, offset=0)
    at src/interpreter.c:982
#6  0x00007fde0e10fd6a in runops (interp=0x7f4080, offs=0)
    at src/call/ops.c:107
#7  0x00007fde0e110179 in runops_args (interp=0x7f4080, sub=0x8feb20, 
    obj=0x87d4f0, meth_unused=0x0, sig=0x7fde0e382e1b "vP", ap=0x7fff16939a70)
    at src/call/ops.c:255
#8  0x00007fde0e11154c in Parrot_runops_fromc_args (interp=0x7f4080, 
    sub=0x8feb20, sig=0x7fde0e382e1b "vP") at src/call/ops.c:324
#9  0x00007fde0e0eac6e in Parrot_runcode (interp=0x7f4080, argc=1, 
    argv=0x7fff16939d70) at src/embed.c:1012
#10 0x00007fde0e3576f4 in imcc_run_pbc (interp=0x7f4080, obj_file=0, 
    output_file=0x0, argc=1, argv=0x7fff16939d70) at compilers/imcc/main.c:800
#11 0x00007fde0e358391 in imcc_run (interp=0x7f4080, 
    sourcefile=0x7fff1693b6e0 "t/pmc/codestring.t", argc=1, 
    argv=0x7fff16939d70) at compilers/imcc/main.c:1091
#12 0x0000000000400bc4 in main (argc=1, argv=0x7fff16939d70) at src/main.c:61

Attachments

t_pmc_codestrinh_t.patch Download (0.6 KB) - added by mikehh 5 years ago.

Change History

in reply to: ↑ description ; follow-up: ↓ 2   Changed 5 years ago by jkeenan

Replying to whiteknight:

I thought there was a ticket for this already, but on closer inspection I don't see one. I'm getting test failures in t/pmc/codestring.t. The code failing in question is:

I cannot reproduce this test failure. On Linux/i386 at r38067, I get:

All tests successful.
Files=1, Tests=38,  0 wallclock secs ( 0.02 usr  0.00 sys +  0.02 cusr  0.00 csys =  0.04 CPU)
Result: PASS

... with no tests TODO-ed out.

Which particular test is failing? Is this on one of the alternate cores?

Thank you very much.
kid51

in reply to: ↑ 1   Changed 5 years ago by Infinoid

I can reproduce it here.

(gdb) run t/pmc/codestring.t
Starting program: /home/infinoid/parrot-dev/test/parrot t/pmc/codestring.t
[Thread debugging using libthread_db enabled]
warning: Lowest section in /usr/lib64/libicudata.so.38 is .hash at 0000000000000190
1..24
ok 1 - code string creation succeeded
ok 2 - call to unique with name
ok 3 - call to unique with no params
ok 4 - call to unique with reg name
ok 5 - code string looks fine
ok 6 - code string with positional args looks fine
ok 7 - code string with % args looks fine
ok 8 - emit with named args looks fine
ok 9 - emit with pos + named args
ok 10 - global unique #1 looks ok
ok 11 - global unique #2 looks ok
ok 12 - unnested namespace key
ok 13 - nested namespace key
ok 14 - flattened nested unicode ns key
ok 15 - nested unicode ns key
ok 16 - big ns key
ok 17 - empty string namespace
ok 18 - empty array namespace
ok 19 - null PMC namespace
ok 20 - regression on first char repl bug looks fine
ok 21 - LATIN CAPITAL LETTER C
ok 22 - MUSIC FLAT SIGN
ok 23 - RECYCLING SYMBOL FOR TYPE-1 PLASTICS
[New Thread 0x7fc298f85700 (LWP 18131)]

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x7fc298f85700 (LWP 18131)]
0x00007fc297d87721 in u_charFromName_3_8 () from /usr/lib/libicuuc.so.38
(gdb) bt
#0  0x00007fc297d87721 in u_charFromName_3_8 () from /usr/lib/libicuuc.so.38
#1  0x00007fc29892b5cd in Parrot_CodeString_nci_charname_to_ord (
    interp=0x236b080, pmc=0x2478e00) at ./src/pmc/codestring.pmc:220
#2  0x00007fc2988b1227 in Parrot_NCI_invoke (interp=0x236b080, pmc=0x2478e00,
    next=0x250e538) at ./src/pmc/nci.pmc:330
#3  0x00007fc2986f6813 in Parrot_callmethodcc_p_sc (cur_opcode=0x250e520,
    interp=0x236b080) at src/ops/object.ops:80
#4  0x00007fc2987cb055 in runops_slow_core (interp=0x236b080, pc=0x250e520)
    at src/runops_cores.c:461
#5  0x00007fc2987776a5 in runops_int (interp=0x236b080, offset=0)
    at src/interpreter.c:982
#6  0x00007fc2987781aa in runops (interp=0x236b080, offs=0)
    at src/call/ops.c:107
#7  0x00007fc298778586 in runops_args (interp=0x236b080, sub=0x2474360,
    obj=0x23f68c0, meth_unused=0x0, sig=0x7fc298a032eb "vP", ap=0x7fffa0fc48a0)
    at src/call/ops.c:255
#8  0x00007fc29877993c in Parrot_runops_fromc_args (interp=0x236b080,
    sub=0x2474360, sig=0x7fc298a032eb "vP") at src/call/ops.c:324
#9  0x00007fc29875303e in Parrot_runcode (interp=0x236b080, argc=1,
    argv=0x7fffa0fc4ba0) at src/embed.c:1012
#10 0x00007fc2989d8224 in imcc_run_pbc (interp=0x236b080, obj_file=0,
    output_file=0x0, argc=1, argv=0x7fffa0fc4ba0) at compilers/imcc/main.c:800
#11 0x00007fc2989d8ec1 in imcc_run (interp=0x236b080,
---Type <return> to continue, or q <return> to quit---
    sourcefile=0x7fffa0fc5f73 "t/pmc/codestring.t", argc=1,
    argv=0x7fffa0fc4ba0) at compilers/imcc/main.c:1091
#12 0x0000000000400c14 in main (argc=1, argv=0x7fffa0fc4ba0) at src/main.c:61
(gdb)

Bisect points to r37866. I spoke briefly with pmichaud about it, here's his description of what the test is trying to do:

09:19 <@pmichaud> it's a literal C string -- we're verifying that the method returns the correct error value (-1) when passed a string that isn't a valid character name.
09:19 <@pmichaud> I'd be okay with changing the test to remove the angles or otherwise test this particular case.
09:19 <@pmichaud> (and yes, my ICU accepts it just fine.)

It sounds like it may be something specific to my particular version of ICU. (gentoo dev-libs/icu-3.8.1-r1)

Changed 5 years ago by mikehh

  Changed 5 years ago by mikehh

I get a Bus Error on Ubuntu Intrepid Amd64

mhu64@mhu64-desktop:~/parrot.t$ ./parrot t/pmc/codestring.t                                   
1..38                                                                                         
ok 1 - code string creation succeeded                                                         
ok 2 - call to unique with name                                                               
ok 3 - call to unique with no params                                                          
ok 4 - call to unique with reg name                                                           
ok 5 - code string looks fine                                                                 
ok 6 - code string with positional args looks fine                                            
ok 7 - code string with % args looks fine                                                     
ok 8 - emit with named args looks fine                                                        
ok 9 - emit with pos + named args                                                             
ok 10 - global unique #1 looks ok                                                             
ok 11 - global unique #2 looks ok                                                             
ok 12 - unnested namespace key                                                                
ok 13 - nested namespace key                                                                  
ok 14 - flattened nested unicode ns key                                                       
ok 15 - nested unicode ns key                                                                 
ok 16 - big ns key                                                                            
ok 17 - empty string namespace                                                                
ok 18 - empty array namespace                                                                 
ok 19 - null PMC namespace                                                                    
ok 20 - regression on first char repl bug looks fine                                          
ok 21 - LATIN CAPITAL LETTER C                                                                
ok 22 - MUSIC FLAT SIGN                                                                       
ok 23 - RECYCLING SYMBOL FOR TYPE-1 PLASTICS                                                  
Bus error                                                                                     

I was looking at the changelog for libicu and saw the following:

icu (3.8.1-2ubuntu0.1) intrepid-security; urgency=low

  * SECURITY UPDATE: Cross-site scripting attack via invalid character
    sequences (LP: #341834)
    - debian/patches/03-cve-2008-1036.patch: Improve parsing logic in
      source/common/{ucnv2022.c,ucnv_bld.*,ucnv.c,ucnvhz.c} to replace
      invalid character sequences. Also, add test case to
      source/test/{cintltst/nucnvtst.c,testdata/conversion.txt}.
    - CVE-2008-1036

 -- Marc Deslauriers <marc.deslauriers@ubuntu.com>  Wed, 25 Mar 2009 09:09:07 -0400

This seems to there might be a problem with the test which effectively seems to test for an invalid character. I would suggest removing the test if that is the case.

Possible patch included.

The amended test PASSES.

Cheers Michael (mikehh)

follow-up: ↓ 5   Changed 5 years ago by pmichaud

I'm guessing it's a bug in ICU itself. Here's a simple test program for people to try on the 64-bit machines:

$ cat icutest.c
#include <unicode/uchar.h>
#include <stdio.h>

/* compile with:
        cc -o icutest icutest.c -licuuc -licudata
*/


int main(int argc, char* argv[])
{
    const char* str = "<no such symbol>";
    UErrorCode err = U_ZERO_ERROR;
    UChar32 codepoint;

    codepoint = u_charFromName(U_EXTENDED_CHAR_NAME, str, &err);
    if (U_SUCCESS(err)) {
        printf("codepoint: %ld\n", (long)codepoint);
    }
    else {
         printf("invalid name (expected)\n");
    }

    return 0;

}

Compiling and running that program should result in "invalid name (expected)". If it results in a segfault or Bus Error, then there's likely a problem with ICU on that system.

$ cc -o icutest icutest.c -licuuc -licudata
$ ./icutest
invalid name (expected)
$

Pm

in reply to: ↑ 4 ; follow-up: ↓ 6   Changed 5 years ago by jkeenan

Replying to pmichaud:

I'm guessing it's a bug in ICU itself. Here's a simple test program for people to try on the 64-bit machines:

On my 32-bit machine, I tried Patrick's test. I'm on Debian-stable, so even upgrading to the latest version of ICU brought me to a version older than the 3.8 in question here. Patrick's file ran without incident.

Also, I see that a revision to t/pmc/codestring.t was committed by Infinoid in r38087. All tests in this file continue to pass for me -- but, of course, that's not on 64-bit.

kid51

in reply to: ↑ 5 ; follow-up: ↓ 7   Changed 5 years ago by Infinoid

Replying to jkeenan:

Also, I see that a revision to t/pmc/codestring.t was committed by Infinoid in r38087. All tests in this file continue to pass for me -- but, of course, that's not on 64-bit.

Yeah, r38087 is a workaround which gets things passing for me.

On my gentoo x86-64 machine, I was able to reproduce the crash with ICU versions 3.6, 3.8.1 (with and without gentoo patches), and 4.0.1.

in reply to: ↑ 6 ; follow-up: ↓ 8   Changed 5 years ago by jkeenan

Replying to Infinoid:

Yeah, r38087 is a workaround which gets things passing for me.

So, are there people/machines that are still getting a failure on this file? If not, then can we close the ticket?

Thank you very much.
kid51

in reply to: ↑ 7 ; follow-ups: ↓ 9 ↓ 10   Changed 5 years ago by Infinoid

Replying to jkeenan:

So, are there people/machines that are still getting a failure on this file? If not, then can we close the ticket?

I don't know whether anyone else has been able to reproduce it after the workaround; I haven't heard of any more failures, so we're probably good.

As to whether or not the ticket can be closed, I suppose it depends on whether this function only processes symbol names internally, or if its data comes directly from user applications. If it's an internal function and there's no chance of being passed bad data, the ticket can be closed, otherwise we should probably do some input filtering (for "parrot should never segfault" reasons).

I'm not familiar enough with this part of parrot to know how externally accessible this code is.

Mark

in reply to: ↑ 8   Changed 4 years ago by jkeenan

  • status changed from new to assigned
  • owner set to jkeenan

Replying to Infinoid:

Replying to jkeenan:

So, are there people/machines that are still getting a failure on this file? If not, then can we close the ticket?

I don't know whether anyone else has been able to reproduce it after the workaround; I haven't heard of any more failures, so we're probably good. As to whether or not the ticket can be closed, I suppose it depends on whether this function only processes symbol names internally, or if its data comes directly from user applications. If it's an internal function and there's no chance of being passed bad data, the ticket can be closed, otherwise we should probably do some input filtering (for "parrot should never segfault" reasons).

I will close this ticket in 7 days if there is no further objection.

Thank you very much.
kid51

in reply to: ↑ 8 ; follow-up: ↓ 11   Changed 4 years ago by pmichaud

As to whether or not the ticket can be closed, I suppose it depends on whether this function only processes symbol names internally, or if its data comes directly from user applications. If it's an internal function and there's no chance of being passed bad data, the ticket can be closed, otherwise we should probably do some input filtering (for "parrot should never segfault" reasons).

The input to this function can come directly from user applications.

I'm not sure exactly what sort of input filtering we would perform here, though. We would need to determine which characters are valid in Unicode symbol names and exclude all others, I guess. Or we'd need to figure out which characters cause ICU to segfault and exclude those.

Pm

in reply to: ↑ 10   Changed 4 years ago by jkeenan

  • owner jkeenan deleted
  • status changed from assigned to new

Replying to pmichaud:

> > As to whether or not the ticket can be closed, I suppose it 
depends on whether this function only processes symbol names 
internally, or if its data comes directly from user applications.  
If it's an internal function and there's no chance of being passed 
bad data, the ticket can be closed, otherwise we should probably 
do some input filtering (for "parrot should never segfault" reasons).
> 
> The input to this function can come directly from user applications.
> 

Well, then this ticket will probably be open a long time. And it will need someone more expert with these issues than I to take it.

  Changed 3 years ago by cotto

  • status changed from new to closed
  • resolution set to invalid

CodeString is gone, and with it this ticket.

Note: See TracTickets for help on using tickets.