Ticket #1557 (new bug)

Opened 12 years ago

Last modified 11 years ago

pbc_disassemble fails on large PBCs

Reported by: plobsing Owned by:
Priority: normal Milestone:
Component: none Version: 2.2.0
Severity: medium Keywords:
Cc: Util Language:
Patch status: Platform:

Description

pbc_disassemble either complains about encoding or flat-out segfaults when dealing with large PBC files. Good test cases are perl6.pbc from Rakudo, nqp-rx.pbc, etc.

> /home/pitr/parrot-trunk/bin/pbc_disassemble perl6.pbc
zsh: segmentation fault  /home/pitr/parrot-trunk/bin/pbc_disassemble perl6.pbc
> ./pbc_disassemble parrot-nqp.pbc
...
000000000750-000000002197 000707: 	index_i_sc_s I8,"\t Invalid character for UTF-8 encoding

current instr.: 'parrot;NQP;Compiler;main' pc 76066 (ext/nqp-rx/src/stage0/NQP-s0.pir:20736)

Change History

Changed 12 years ago by Util

  • cc Util added

Here is a partial analysis, tested on r45670:

I don't think that this problem is really about the size of the .pbc file.

This problem occurs with *any* PBC built from PIR containing Unicode literal strings. Only large PIR modules happen to contain Unicode literals right now, hence the appearance that PBC size is an element of the problem.

For example, the following code compiles to .pbc and executes correctly, but fails to disassemble:

$ cat unicode_minimal_crash.pir

.sub _main
    $S0 = 'd'
    $I0 = index unicode:"abc\x{a0}def", $S0
    print "The answer is "
    say $I0
    end
.end

$ ./parrot -o unicode_minimal_crash.pbc unicode_minimal_crash.pir

$ ./parrot unicode_minimal_crash.pbc

The answer is 4

$ ./pbc_disassemble unicode_minimal_crash.pbc

=head1 Constant-table

PMC_CONST(0): 'ParrotInterpreter'
PMC_CONST(1): abc def
PMC_CONST(2): d
PMC_CONST(3): The answer is 
PMC_CONST(4): unicode_minimal_crash.pir
PMC_CONST(5): 
PMC_CONST(6): _main

=cut

#   Seq_Op_Num- Relative-PC SrcLn#:
# Current Source Filename 'unicode_minimal_crash.pir'
000000000000-000000000000 000003: 	set_s_sc S0,"d"
000000000001-000000000003 000004: 	index_i_sc_s I0,"abcInvalid character for UTF-8 encoding

Changed 12 years ago by NotFound

The problem is that the string is converted to a C string and later sent char by char to the output file, which is an encoding nightmare. To fix the problem, the C string usage must be avoided. In the meantime, a workaround was added in r45831.
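
To illustrate the kind of approach that avoids the char-by-char problem, here is a minimal standalone sketch. It is not the workaround from r45831 and does not use any Parrot API; the names decode_utf8 and escape_utf8 are hypothetical. It assumes the constant is available as a UTF-8 byte buffer with an explicit length, and it renders the buffer as ASCII-only text with \x{...} escapes, so the output never depends on the encoding of the output stream and embedded NUL bytes do not truncate the string.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of Parrot.
 * Decode one UTF-8 sequence starting at s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is malformed or truncated.
 * Simplified: overlong forms and surrogates are not rejected. */
static size_t decode_utf8(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t need, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; need = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; need = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; need = 3; }
    else
        return 0;                       /* stray continuation or invalid lead byte */

    if (n < need + 1)
        return 0;                       /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return need + 1;
}

/* Hypothetical helper, not part of Parrot.
 * Write an ASCII-only, escaped rendering of src[0..len) into dst.
 * Returns the number of characters written (excluding the trailing NUL),
 * or (size_t)-1 if dst is too small. */
static size_t escape_utf8(const unsigned char *src, size_t len,
                          char *dst, size_t dst_size)
{
    size_t in = 0, out = 0;

    if (dst_size == 0)
        return (size_t)-1;
    dst[0] = '\0';

    while (in < len) {
        unsigned long cp;
        size_t used = decode_utf8(src + in, len - in, &cp);
        int w;

        if (used == 0) {                /* malformed input: escape the raw byte */
            cp = src[in];
            used = 1;
        }

        if (cp == '"' || cp == '\\')    /* keep the quoted form unambiguous */
            w = snprintf(dst + out, dst_size - out, "\\%c", (int)cp);
        else if (cp >= 0x20 && cp < 0x7F)
            w = snprintf(dst + out, dst_size - out, "%c", (int)cp);
        else                            /* control or non-ASCII: escape it */
            w = snprintf(dst + out, dst_size - out, "\\x{%lx}", cp);

        if (w < 0 || (size_t)w >= dst_size - out)
            return (size_t)-1;          /* output buffer exhausted */
        out += (size_t)w;
        in  += used;
    }
    return out;
}
```

With this sketch, the UTF-8 bytes of the string from the test case above (abc, U+00A0, def) are rendered as abc\x{a0}def, which is the same form the PIR source uses. Whether pbc_disassemble should print escapes or transcode to the output encoding is a separate design choice that this example does not settle.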

Changed 11 years ago by coke

Current failure mode:

$ pbc_disassemble perl6.pbc 
Could not load oplib `io_ops'
