Ticket #1885 (new RFC)

Opened 4 years ago

Last modified 3 years ago

Parrot_load_bytecode should only load bytecode

Reported by: whiteknight Owned by: cotto
Priority: normal Milestone:
Component: core Version: master
Severity: medium Keywords: pir, pbc
Cc: Language:
Patch status: Platform: all

Description

The function src/packfile.c:Parrot_load_bytecode() is too magical. It not only loads bytecode, but it also does some rudimentary extension string matching to also allow it to load .pir and .pasm file. This is extremely magical and is not made clear through the function's name.

This function is exposed primarily through the load_bytecode PIR op, which has similar magical behavior.

I suggest that we eliminate the magical behavior of this function and the opcode that exposes it: load_bytecode and Parrot_load_bytecode should *only* load bytecode files under any extension. If the user wants to compile a PIR source file they should use the PIR compreg.

Change History

  Changed 4 years ago by coke

I suggest, instead of changing the functionality of this function to
add Parrot_load_pbc() and a load_pbc opcode.

This will simplify the transition if we then decide to remove the
mis-named *bytecode variants.

Regards.

  Changed 4 years ago by whiteknight

I don't want to remove the opcode or the function, I simply want to restrict their behavior. The load_bytecode op should stay, but it should only load bytecode. There are other ways to compile and load a PIR source file, we should use those instead of nonsensical overloads on this op.

  Changed 4 years ago by whiteknight

I'm not against the idea of creating a new opcode to simplify the transition. If we go that route I want to be explicit that we aren't changing load_bytecode but are instead removing it completely. I do not want to add in a new opcode without the promise that we are also going to remove the old one. I don't want to add to code bloat.

If we decide to go ahead with this plan (it's only an RFC right now) we can deprecate load_bytecode and it's internals completely, replace that with load_pbc (I can set up a GCI task to do this even) and go from there.

  Changed 4 years ago by NotFound

The steps for a smooth transition can be: 1 - add the load_pbc opcode as experimental, test it heavily 2 - move it to its place. core.ops? 3 - replace all usages of load_bytecode in libraries, examples and test 4 - deprecate load_bytecode

  Changed 4 years ago by pmichaud

Overall I agree with the proposed changes, but there's a very important dependency that I think needs to be added before removing the ability to load_bytecode on .pir files.

Currently it's not possible for a Parrot program to reliably generate .pbc files on its own -- .pbc can only be generated by an external command line invocation. This means that .pir files are the only mechanism we currently have for a Parrot program to create a file that can be subsequently loaded directly by another Parrot program (via load_bytecode).

Also, note that compiling via the PIR compreg is somewhat different from using load_bytecode, especially with respect to :load/:init markers. A file compiled using PIR compreg executes :init-flagged subs, while load_bytecode executes :load-flagged subs. Using the PIR compreg is not at all a direct replacement for load_bytecode.

Lastly, some crude benchmarking I did in 2009 seemed to indicate that using load_bytecode on .pir files was actually faster than loading the equivalent pre-compiled .pbc file. I totally admit this seems counterintuitive, but I tried it several times and got consistent results for several of Rakudo's precompiled modules.

So, while I give a +1 to eventually removing the .pir aspect of load_bytecode, the PIR compreg doesn't yet provide equivalent capability, and we ought to have a way to generate .pbc from Parrot before eliminating .pir from load_bytecode.

(It's okay with me if .pbc's are created via PIR compreg or some other compiler object in Parrot -- I'm simply noting that currently we don't have any officially-supported mechanism to do it.)

Pm

  Changed 4 years ago by NotFound

In addition to the :init problem, there is no way AFAIK to know what .sub is marked as :main, if any, from a Eval PMC. That must also be addressed.

follow-up: ↓ 8   Changed 4 years ago by whiteknight

Thanks for the comments Patrick. One question I have is whether the call "load_bytecode 'foo.pir'" Always triggers the :load functions and not the :init functions. For a .pbc file I agree that triggering the :load functions is the expected behavior, but I would be very very surprised if IMCC were smart enough to determine that it should trigger :load and not :init after compiling from the load_bytecode opcode.

Considering that :init and :immediate flags on subs are often used to update the constants table, I would be extremely surprised if "load_bytecode 'foo.pir'" did not trigger the :init functions. It may also trigger the :load functions too. I think we should put together a test for this behavior.

Also, I thought there was a way to output .pbc files from Parrot PIR code. If there isn't one, we need to make creating it a high priority.

So it seems like we have a few steps to tackle first (I'll lay out some things in a tasklist page on the wiki eventually):

  • Figure out what, exactly, load_bytecode does.
  • Figure out whether loading a .pir file is actually faster than loading a .pbc file in current Parrot and, if so, why. This should not be the case and if it is we have a serious problem.
  • Create a new compiler compreg interface for IMCC that would perform the kinds of functions we need it to perform

So this is turning out to be a bigger task than I originally thought but I am becoming more and more convinced that it is necessary.

in reply to: ↑ 7   Changed 4 years ago by pmichaud

Replying to whiteknight:

One question I have is whether the call "load_bytecode 'foo.pir'" Always triggers the :load functions and not the :init functions. For a .pbc file I agree that triggering the :load functions is the expected behavior, but I would be very very surprised if IMCC were smart enough to determine that it should trigger :load and not :init after compiling from the load_bytecode opcode. [...] I would be extremely surprised if "load_bytecode 'foo.pir'" did not trigger the :init functions.

Prepare to be surprised then, because IMCC (2.10.1) definitely triggers the :load functions and not the :init ones when invoked via load_bytecode :-)

$ cat x.pir
.sub 'first'
   say 'first'
.end

.sub 'load' :load
   say ':load'
.end

.sub 'init' :init
   say ':init'
.end

.sub 'loadinit' :load :init
   say ':load :init'
.end

.sub 'main' :main
   say ':main'
.end

$ cat y.pir
.sub 'main' :main
    load_bytecode 'x.pir'
.end

$ ./parrot y.pir
:load
:load :init
$ 

As you can see, the subs marked ":load" are the ones executed -- the one marked only with ":init" is not executed.

Also, I thought there was a way to output .pbc files from Parrot PIR code. If there isn't one, we need to make creating it a high priority.

It was speculated at one point that one could simply write the object coming back from the PIR compiler to a file and use that. That seems to work for very simple PIR files, but fails for anything much larger (i.e., it didn't work for files generated by nqp-rx).

Pm

  Changed 4 years ago by whiteknight

Thanks for the testcase Patrick. You're right that I am very surprised by that result. In fact, I'm much more surprised that it hadn't been flagged as a bug beforehand. This rabbithole keeps going deeper.

src/packfile.c:Parrot_load_bytecode() calls src/packfile.c:compile_of_load_file(), which calls src/interp/inter_misc.c:Parrot_compile_file(), which calls compilers/imcc/parser_util.c:imcc_compile_file() which does compile the PIR down but doesn't trigger any of the :load or :init functions. The :load functions are then triggered back in compile_or_load_file, but only for the special case of .pir or .pasm files (it's triggered automatically for .pbc files, like it should be).

So there is one mystery down.

  Changed 3 years ago by jkeenan

whiteknight,

Can we get an update on the status of the issues discussed in this ticket? I see that Parrot_load_bytecode() is now found in src/packfile/api.c (lines 2377+).

Thank you very much.

kid51

Note: See TracTickets for help on using tickets.