Ticket #1670 (closed bug: fixed)

Opened 12 years ago

Last modified 12 years ago

t/library/lwp.t fails after parallel build

Reported by: doughera
Owned by:
Priority: normal
Milestone:
Component: none
Version: 2.4.0
Severity: medium
Keywords:
Cc:
Language:
Patch status:
Platform:

Description

I am getting the following strange failure with t/library/lwp.t, but only if I build with make -j 6 (or greater). The test succeeds if I build with make -j 5.

The failure looks like this:

1..48
ok 1 - new ['LWP';'UserAgent']
ok 2 - new ['LWP';'Protocol';'file']
ok 3 - isa ['LWP';'Protocol']
ok 4 - new ['LWP';'Protocol';'http']
ok 5 - isa ['LWP';'Protocol']
ok 6 - new ['HTTP';'Request']
ok 7 - isa ['HTTP';'Message']
ok 8 - new ['HTTP';'Response']
ok 9 - isa ['HTTP';'Message']
"load_bytecode" no file name
current instr.: 'parrot;HTTP;Date;time2str' pc 8 (runtime/parrot/library/HTTP/Message.pir:31)
called from Sub 'parrot;LWP;UserAgent;_new_response' pc 1445 (/dev/shm/parrot/runtime/parrot/library/LWP/UserAgent.pir:528)
called from Sub 'parrot;LWP;UserAgent;send_request' pc 210 (/dev/shm/parrot/runtime/parrot/library/LWP/UserAgent.pir:74)
called from Sub 'parrot;LWP;UserAgent;request' pc 406 (/dev/shm/parrot/runtime/parrot/library/LWP/UserAgent.pir:133)
called from Sub 'test_unknown_protocol' pc 273 (t/library/lwp.t:68)
called from Sub 'main' pc 51 (t/library/lwp.t:27)

Running the whole thing under strace shows that load_bytecode is failing because it calls

fstat(1275397164, 0x7fffffffd1c0)       = -1 EBADF (Bad file descriptor)

where the first argument to fstat() looks like it came from a previous call to time() rather than from a valid file descriptor.
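One quick way to check that suspicion is to decode the fstat() argument as a Unix timestamp. The following is a minimal sketch in Python, not part of the original report; the helper names `decode_as_epoch` and `fstat_errno` are made up for illustration, and it assumes the value is seconds since the epoch.

```python
import errno
import os
from datetime import datetime, timezone

# First argument of the failing fstat() call, copied from the strace output.
SUSPECT_FD = 1275397164


def decode_as_epoch(value):
    """Interpret an integer as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


def fstat_errno(fd):
    """Return the errno raised by os.fstat(fd), or None if the call succeeds."""
    try:
        os.fstat(fd)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    # Prints 2010-06-01T12:59:24+00:00
    print(decode_as_epoch(SUSPECT_FD).isoformat())
    # Prints EBADF, since no such descriptor is open in this process
    print(errno.errorcode.get(fstat_errno(SUSPECT_FD)))
```

The value decodes to 2010-06-01 12:59:24 UTC, which is at least consistent with a time() result captured around when the test was run, and not with any plausible file descriptor number.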

I have been able to reproduce this problem on a quad-core x86 and on a dual-core amd64. Both were running Debian Linux "Lenny" (aka "stable").

I have attached the script I used to reproduce the failures, and a tar file showing the different outputs of make, ./parrot t/library/lwp.t, and strace ./parrot t/library/lwp.t.
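For readers without the attachment, the comparison can be sketched roughly as follows. This is a hypothetical Python driver, not the attached reproduce-lwp-failure.sh; the `perl Configure.pl` and `make clean` steps are assumptions about how to get a fresh build, and the function names are invented for illustration.

```python
import subprocess


def repro_commands(jobs):
    """Commands for one rebuild-and-test cycle at a given make -j level."""
    return [
        ["perl", "Configure.pl"],          # assumed configure step
        ["make", "clean"],                 # assumed way to force a full rebuild
        ["make", "-j", str(jobs)],         # parallel build under test
        ["./parrot", "t/library/lwp.t"],   # the failing test from this ticket
    ]


def run_cycle(jobs, cwd="."):
    """Run one cycle in cwd; return the exit status of the test command."""
    *setup, test = repro_commands(jobs)
    for cmd in setup:
        # Stop immediately if configuring or building fails.
        subprocess.run(cmd, cwd=cwd, check=True)
    return subprocess.run(test, cwd=cwd).returncode
```

Calling `run_cycle(5)` and then `run_cycle(6)` from the top of a Parrot checkout would compare the two build modes described above; a nonzero return value from the second call would correspond to the failure reported here.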

I vaguely suspect that the recent shuffling of some ops out of core means that some library wasn't available when it was needed, though why that didn't result in a build failure is a mystery to me.

This particular run was at r47170, though I first noticed this problem with r47059, and it could be even older than that.

Attachments

reproduce-lwp-failure.sh (0.7 KB) - added by doughera 12 years ago.
Script to reproduce failures
lwp-failures.tar.gz (28.8 KB) - added by doughera 12 years ago.
Compressed tar files of logs for make-j5 and make-j6 resulting from the test script.
untitled-part.html (4.1 KB) - added by francois.perrad@… 12 years ago.
Added by email2trac

Change History

Changed 12 years ago by doughera

Script to reproduce failures

Changed 12 years ago by doughera

Compressed tar files of logs for make-j5 and make-j6 resulting from the test script.

Changed 12 years ago by francois.perrad@…

2010/6/1 Parrot <parrot-tickets@lists.parrot.org>

> #1670: t/library/lwp.t fails after parallel build

I think it is another opcode mix-up (see TT #1663),
because the subroutine time2str() doesn't use the load_bytecode opcode at all.

François

Changed 12 years ago by francois.perrad@…

Added by email2trac

Changed 12 years ago by julian.notfound@…

I have also hit this problem several times, even without -j.

> I think it is another opcode mixture (see TT #1663),
> because the subroutine time2str() doesn't use the opcode load_bytecode.

I think so too. I looked at a trace, and the disassembled code doesn't
match the source.

-- 
Salu2

Changed 12 years ago by doughera

  • status changed from new to closed
  • resolution set to fixed

The workarounds added (referenced in TT #1663) seem to have made this particular manifestation of the problem disappear. Basically, they amount to manually loading dynops in a specific order to avoid the problem, which defeats the purpose of having dynops but at least lets things run again. Closing this ticket.
