Ticket #688 (closed bug: fixed)

Opened 5 years ago

Last modified 5 years ago

Fakecutable creation by pbc_to_exe is critically slow for large PBC files.

Reported by: Util
Owned by: Util
Priority: critical
Milestone:
Component: install
Version: trunk
Severity: high
Keywords: fakecutable pbc_to_exe pbc exe performance
Cc: Util
Language: perl6
Patch status:
Platform: all

Description (last modified by Util)

The techniques used by pbc_to_exe do not scale well. When the PBC size exceeds about 2MB, both GCC (on all platforms) and MSVC (on Win32) can take up to 2 hours to create a fakecutable. Since Rakudo's perl6.pbc is around 3MB, and the pbc_to_exe step should take under 1 minute, this situation is causing daily development delays.

Attachments

faster_pbc_to_exe.patch (2.1 KB) - added by Util 5 years ago.
First test patch
faster_pbc_to_exe_win32_also.patch (7.1 KB) - added by Util 5 years ago.
Second test patch
pbc_to_exe_win32_msvc_fix.patch (10.2 KB) - added by Util 5 years ago.

Change History

Changed 5 years ago by pmichaud

  • priority changed from normal to critical

Changed 5 years ago by Util

  • status changed from new to assigned
  • version changed from 1.1.0 to trunk

Changed 5 years ago by Util

PROGRAM HISTORY

Originally, pbc_to_exe.pir built fakecutables with this technique:

    const Parrot_UInt1 program_code[] = {
        254,80,66,67,13,10,26,10,4,0,0,1,0,0,4,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        //...many lines snipped...
        112,0,0,0,35,0,0,0,144,0,0,0,35,0,0,0,180,0,0,0,35,0,0,0,0,0,0,0,0,0,0,0,
    };
    const int bytecode_size = 11680;
    //...main setup code skipped...
    pf = PackFile_new(interp, 0);
    
    PackFile_unpack(interp, pf, (const opcode_t *)program_code, bytecode_size);
    //...error handling, pragmas and fixups skipped...
    Parrot_pbc_load(interp, pf);    
    Parrot_runcode(interp, argc, argv);

The PIR program reads a PBC file and writes a C file containing an array that represents the entire PBC file as a series of decimal byte values in a single block of contiguous memory. The array is initialized at compile time, and its address is passed directly to PackFile_unpack().
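The generator itself is PIR, but the output format is easy to see in a short C sketch. The function below is hypothetical (emit_c_array and values_per_line are not names from pbc_to_exe.pir); it assumes 32 values per line, as in the excerpt above, and writes only the data declarations, not the surrounding main().

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sketch: write a PBC image as a C array initializer of
   decimal byte values, 32 per line, followed by its size, in the layout
   of the generated file excerpted above.
   Returns 0 on success, -1 on a write error. */
int emit_c_array(FILE *out, const unsigned char *pbc, size_t size)
{
    const size_t values_per_line = 32;

    if (fputs("const Parrot_UInt1 program_code[] = {\n", out) == EOF)
        return -1;
    for (size_t i = 0; i < size; i++) {
        /* Indent at the start of each output line. */
        if (i % values_per_line == 0 && fputs("    ", out) == EOF)
            return -1;
        if (fprintf(out, "%u,", (unsigned)pbc[i]) < 0)
            return -1;
        /* End the line after every 32nd value and after the last one. */
        if (((i + 1) % values_per_line == 0 || i + 1 == size)
            && fputc('\n', out) == EOF)
            return -1;
    }
    if (fprintf(out, "};\nconst int bytecode_size = %zu;\n", size) < 0)
        return -1;
    return 0;
}
```

Every byte becomes a separate initializer element, which is exactly the construct that GCC and MSVC struggle with once the array reaches millions of elements.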

This technique is identical on all platforms, and works well as long as the PBC size does not exceed about 2MB.

PROBLEM HISTORY

When used with PBCs larger than 2MB, the technique causes problems under some compilers.

* GCC (all platforms): the compiler uses pathological amounts of memory when compiling the array. On most systems, this causes swapping to the point of thrashing. The compilation, which takes well under a minute using faster techniques, ends up taking up to 2 hours. Once compiled, the linking is fast, as is the execution of the resulting binary.

* MSVC (Win32): MSVC's behavior with pbc_to_exe has been analyzed less than GCC's.
I believe that it exhibits similar memory consumption, though perhaps not quite as severe as GCC's.
MSVC has the additional problem of needing heap tuning (using the /Zm flag) during compilation. MSVC allocates a 2GB working set, divided between its heaps and everything else. The default heap size is too small to compile the array generated from perl6.pbc. Note that setting the heap size too high also causes compilation to fail, since the compiler's stack space (and "everything else") gets squeezed to zero. Jonathan Worthington has been keeping the compiler flags tweaked to accommodate the compilation of the large perl6.pbc.

* Other compilers might also have problems, but none have been reported yet.

SOLUTIONS TRIED

1. One big C string:

    const char * program_code =
    "\376\120\102\103\15\12\32\12\4\0\0\1\2\0\4\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
    "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\204\15\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
    //...many lines snipped...
    "\35\0\0\0\44\0\0\0\113\0\0\0\44\0\0\0\175\0\0\0\44\0\0\0\235\0\0\0\44\0\0\0"
    "\251\0\0\0\44\0\0\0\320\0\0\0\44\0\0\0\364\0\0\0\44\0\0\0\0\0\0\0\0\0\0\0"
    ;
    // The rest is identical to the original

The C standard allows this technique; a double-quoted string can be written on multiple lines as shown above. The code "abc" "def" results in the same string as "abcdef". No NULL character is inserted; only a single NULL is appended after the string's *final* closing quote is seen.
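A small check of the concatenation rule, including the case that matters most for bytecode: embedded zero bytes. The variable and function names are illustrative only.

```c
#include <string.h>

/* Adjacent literals are joined at translation time; the terminating '\0'
   is appended only once, after the final piece: 6 chars + 1 = 7 bytes. */
static const char joined[] = "abc" "def";

/* Zero bytes written as "\0" escapes are kept as ordinary data, so
   NULL-heavy bytecode can be embedded this way. sizeof still reports
   the full size: 5 data bytes + the appended terminator = 6. */
static const char with_nuls[] = "ab\0" "cd";

/* strlen() stops at the first zero byte, so it cannot recover the size
   of embedded bytecode; the size must come from elsewhere, such as the
   separately emitted bytecode_size. */
size_t apparent_length(const char *s)
{
    return strlen(s);
}
```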

GCC handles these huge strings particularly well; when encoded as variable-length octal, the C file generated from perl6.pbc (about 3MB) consistently compiles in under 10 seconds.
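A minimal sketch of the variable-length octal encoding, assuming every byte is written as an escape, as in the sample above. encode_octal and emit_c_string are hypothetical names; the real generator is written in PIR.

```c
#include <stdio.h>
#include <stddef.h>

/* Write each byte of in[0..n) into out as its shortest octal escape
   ("\0" through "\377"). out needs room for 4 * n + 1 chars.
   Returns the number of chars written, excluding the terminator. */
size_t encode_octal(const unsigned char *in, size_t n, char *out)
{
    size_t pos = 0;

    out[0] = '\0';  /* valid empty string even when n == 0 */
    for (size_t i = 0; i < n; i++)
        pos += (size_t)sprintf(out + pos, "\\%o", (unsigned)in[i]);
    return pos;
}

/* Emit the bytecode as one C string split across many quoted lines,
   32 bytes per line. Returns 0 on success, -1 on a write error. */
int emit_c_string(FILE *fp, const unsigned char *pbc, size_t n)
{
    char buf[4 * 32 + 1];

    if (fputs("const char * program_code =\n", fp) == EOF)
        return -1;
    for (size_t i = 0; i < n; i += 32) {
        size_t len = (n - i < 32) ? n - i : 32;
        encode_octal(pbc + i, len, buf);
        if (fprintf(fp, "\"%s\"\n", buf) < 0)
            return -1;
    }
    return fputs(";\n", fp) == EOF ? -1 : 0;
}
```

Escaping every byte is what makes the shortest form safe: the character following an escape such as \1 is always another backslash or a closing quote, never a digit that could be absorbed into the escape. Escapes are also resolved before adjacent literals are joined, so a line break between literals never splits or merges an escape.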

On 2009-05-05, a rough cut was published for testing as http://sial.org/pbot/36403, replicated here as the attached patch faster_pbc_to_exe.patch. The technique works perfectly everywhere except MSVC on Win32, which has a (highly artificial, to my mind) limit on string size: either 16KB or 64KB, depending on the MSVC version.

2. A suggestion was made on IRC to use multiple strings in an array (taking care to break strings only where a NULL would naturally occur), resulting in the same contiguous block as before. For example, if the PBC were "abcdef\0uvwxyz\0abcdef\0", and the string limit was 8 bytes, then this code might work:

    const char *  program_code[] = {
        "abcdef",
        "uvwxyz",
        "abcdef"
    };

The technique cannot work reliably under (at least) GCC: identical string literals may be merged, so the third element would become a pointer to the same storage as the first, and the bytes would no longer be laid out in sequence. More generally, the C standard does not guarantee that separate string literals are stored adjacently at all. Although the chance of exact duplicate lines drops as the line size increases, our bytecode is NULL-heavy, which greatly increases the chance of collision. Also, since the bytecode's string table could be padded directly in PASM, we might be leaving an exploitable hole.

3. Since MSVC cannot support the "one big C string" solution, a second code path was written just for MSVC. The bytecode is encoded in multiple strings, which are each copied into a single large malloc'ed block at run-time.
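A minimal sketch of that run-time assembly, under stated assumptions: the chunk contents, the explicit length table, and the names pbc_chunks, pbc_chunk_sizes and assemble_pbc are all hypothetical. Explicit lengths are assumed because the chunks contain zero bytes, so strlen() cannot measure them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical generated data: each chunk is a separate literal kept
   under the compiler's string-size limit (tiny here for illustration). */
static const char *const pbc_chunks[] = {
    "\376\120\102\103",
    "\0\0\1\0",
    "\44\0\0\0"
};
/* Byte length of each chunk, emitted alongside it by the generator. */
static const size_t pbc_chunk_sizes[] = { 4, 4, 4 };
#define PBC_CHUNK_COUNT (sizeof pbc_chunks / sizeof pbc_chunks[0])

/* Copy every chunk, in order, into one malloc'ed contiguous block that
   could then be passed to PackFile_unpack(). The caller frees the block.
   Returns NULL on allocation failure. */
unsigned char *assemble_pbc(size_t *total_out)
{
    size_t total = 0;
    for (size_t i = 0; i < PBC_CHUNK_COUNT; i++)
        total += pbc_chunk_sizes[i];

    unsigned char *block = malloc(total);
    if (block == NULL)
        return NULL;

    size_t pos = 0;
    for (size_t i = 0; i < PBC_CHUNK_COUNT; i++) {
        memcpy(block + pos, pbc_chunks[i], pbc_chunk_sizes[i]);
        pos += pbc_chunk_sizes[i];
    }
    *total_out = total;
    return block;
}
```

Unlike the array-of-strings idea in solution 2, this does not depend on where the compiler places the literals: each chunk is copied by pointer and length, so merged duplicate literals would be harmless.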

On 2009-05-13, a new solution was published for testing as http://nopaste.snit.ch/16532, replicated here as the attached patch faster_pbc_to_exe_win32_also.patch.

While this solution created valid and legal code fitting within the 16KB/64KB limits on string size, every variation of it failed to compile. Variations included: one array of multiple strings (each one memcpy'ed), multiple named string variables, in-line strings with multiple memcpy calls, declaring the variables as static, and moving the declarations inside the main() routine. In every case, the compiler "fell over"; no value of the /Zm flag provided enough heap for the large total string size while leaving enough room for the rest of the compilation. Microsoft's compiler design is simply not oriented toward a single, very large compilation unit.

4. Microsoft's approved solution for including large binary data is to link it in as a "resource", then use Win32 calls to load and lock the data into memory. A complete code path has been successfully implemented in Perl 5 to do this, and the code is being rewritten in PIR for inclusion in pbc_to_exe.pir.

Changed 5 years ago by Util

First test patch

Changed 5 years ago by Util

Second test patch

Changed 5 years ago by Util

COMMIT OF PARTIAL SOLUTION

In r39176, tools/dev/pbc_to_exe.pir was updated to partially resolve the problem. Rather than split the codepath into Original and MSVC (based on config['os']), we split it into Original and GCC (based on config['gccversion']). This allows for future changes to more easily address the needs of specific compilers.

* Problem resolved on GCC.

* MSVC still pending conversion to PIR.

* Other compilers will be left on the old codepath until problems are reported and/or the GCC codepath is verified to work for that compiler.

Changed 5 years ago by Util

  • description modified

Changed 5 years ago by Util

Changed 5 years ago by Util

Candidate patch pbc_to_exe_win32_msvc_fix.patch adds a new codepath to handle MSVC by linking .pbc as a resource.

Changed 5 years ago by Util

Candidate patch pbc_to_exe_win32_msvc_fix.patch committed in r39585.

Changed 5 years ago by chromatic

  • status changed from assigned to closed
  • resolution set to fixed

Let's call this resolved and file more specific tickets as necessary.
