Ticket #1499 (closed bug: fixed)

Opened 4 years ago

Last modified 4 years ago

Null pointer dereference in imageio.pmc

Reported by: arnsholt Owned by: plobsing
Priority: normal Milestone:
Component: none Version:
Severity: medium Keywords:
Cc: Language: perl6
Patch status: Platform: darwin

Description

Running Parrot r44371, Rakudo bf29be, OS X 10.6.2.

When running perl6 t/spec/S32-trig/sinh.t I get intermittent segfaults. I've recompiled both Parrot and Rakudo with debugging symbols, giving the following gdb backtrace:

(gdb) back
#0  0x0000000000000000 in ?? ()
#1  0x0000000100ce13ec in visit_todo_list_freeze (interp=0x101208890, pmc=0x10cadd220, info=0x10cadd1f8) at imageio.pmc:204
#2  0x0000000100ce1c7b in Parrot_ImageIO_set_pmc (interp=0x101208890, pmc=0x10cadd1f8, p=0x10cadd220) at imageio.pmc:492
#3  0x0000000100b94e15 in Parrot_freeze (interp=0x101208890, pmc=0x10cadd220) at src/pmc_freeze.c:58
#4  0x0000000100c6d09d in Parrot_default_clone (interp=0x101208890, pmc=0x10cadd220) at default.pmc:1068
#5  0x0000000100b897fb in Parrot_oo_clone_object (interp=0x101208890, pmc=0x1061b7d18, class_=0x102217ea8, dest=0x0) at src/oo.c:291
#6  0x0000000100cf9dba in Parrot_Object_clone (interp=0x101208890, pmc=0x1061b7d18) at object.pmc:723
#7  0x00000001011be039 in Parrot_P6opaque_clone ()
#8  0x0000000100a9cc06 in Parrot_clone_p_p (cur_opcode=0x101303b08, interp=0x101208890) at set.ops:474
#9  0x0000000100b98864 in runops_fast_core (interp=0x101208890, runcore=0x10121c2b0, pc=0x101303b08) at src/runcore/cores.c:670
#10 0x0000000100b9736c in runops_int (interp=0x101208890, offset=202378) at src/runcore/main.c:549
#11 0x0000000100b5ad92 in runops (interp=0x101208890, offs=202378) at src/call/ops.c:112
#12 0x0000000100b5049e in Parrot_pcc_invoke_from_sig_object (interp=0x101208890, sub_obj=0x10256f868, call_object=0x1016f9f98) at src/call/pcc.c:314
#13 0x0000000100b50652 in Parrot_pcc_invoke_sub_from_c_args (interp=0x101208890, sub_obj=0x10256f868, sig=0x100d5cb7a "P->") at src/call/pcc.c:75
#14 0x0000000100b33c36 in Parrot_runcode (interp=0x101208890, argc=2, argv=0x7fff5fbff810) at src/embed.c:826
#15 0x0000000100000d69 in main ()

The line in imageio.pmc calls the VTABLE_push_pmc macro, which expands to a function pointer call to the push_pmc member in the vtable of info->todo, which is NULL in this case.
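For readers unfamiliar with the dispatch involved, here is a minimal sketch of what a vtable call through a macro looks like and why a NULL slot shows up as frame #0 at address 0x0. The types and names below (demo_pmc, demo_vtable, DEMO_VTABLE_push_pmc, demo_checked_push) are hypothetical, simplified stand-ins and not Parrot's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a PMC and its vtable. */
typedef struct demo_pmc demo_pmc;

typedef struct demo_vtable {
    const char *whoami;                                 /* type name */
    void (*push_pmc)(demo_pmc *self, demo_pmc *value);  /* one vtable slot */
} demo_vtable;

struct demo_pmc {
    demo_vtable *vtable;
    int          count;   /* number of pushed items, for illustration only */
};

/* The macro is just an indirect call through the vtable slot. If the slot
 * holds NULL, the CPU jumps to address 0, which a debugger reports as
 * "0x0000000000000000 in ?? ()". */
#define DEMO_VTABLE_push_pmc(pmc, value) \
    ((pmc)->vtable->push_pmc((pmc), (value)))

/* A trivial push implementation for a well-formed array-like PMC. */
void demo_array_push(demo_pmc *self, demo_pmc *value)
{
    (void)value;
    self->count++;
}

/* Returns 1 if the push was dispatched, 0 if the call would have jumped
 * through a NULL pointer. */
int demo_checked_push(demo_pmc *todo, demo_pmc *value)
{
    assert(value != NULL);
    if (todo == NULL || todo->vtable == NULL || todo->vtable->push_pmc == NULL)
        return 0;
    DEMO_VTABLE_push_pmc(todo, value);
    return 1;
}
```

A guard like demo_checked_push only turns the crash into a detectable error; it does not explain how the vtable ended up in that state.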

Attachments

vtable.txt (4.9 KB) - added by arnsholt 4 years ago.
gdb printout of todo->vtable

Change History

in reply to: ↑ description ; follow-up: ↓ 3   Changed 4 years ago by plobsing

  • owner set to plobsing

Replying to arnsholt:

The line in imageio.pmc calls the VTABLE_push_pmc macro, which expands to a function pointer call to the push_pmc member in the vtable of info->todo, which is NULL in this case.

I'm not sure how that could have happened. info->todo gets set to an RPA upon creation of the ImageIO pmc and is not pointed elsewhere after.

I will try to reproduce this later today.

It would help me if you tried to:

  • check whether info->todo gets set to an RPA in ImageIO.init (src/pmc/imageio.pmc:240)
  • figure out what is resetting it to NULL

  Changed 4 years ago by bacek

Hello.

Symptoms are quite similar to #1080 and #1081...

-- Bacek

in reply to: ↑ 1 ; follow-up: ↓ 4   Changed 4 years ago by plobsing

Replying to plobsing:

I will try to reproduce this later today.

I can't seem to reproduce this. In fact, I can't seem to build the version of rakudo specified (git checkout bf29be), because it makes use of a symbol "PObj_active_destroy_SET", which isn't in parrot r44371 or later.

I have also unsuccessfully tried to reproduce against the latest rakudo (b348b3ed0464c504b80b159075e5a3773f3ef464) and parrot (r44657).

It would help me if you tried to:

  • check whether info->todo gets set to an RPA in ImageIO.init (src/pmc/imageio.pmc:240)
  • figure out what is resetting it to NULL

Since I cannot reproduce this, your best chance is to try to track this down yourself. If I understand bacek correctly, the current suspicion is that the init vtable is not being called properly (or at all?) when creating some PMCs.

in reply to: ↑ 3   Changed 4 years ago by arnsholt

Replying to plobsing:

I can't seem to reproduce this. In fact, I can't seem to build the version of rakudo specified (git checkout bf29be), because it makes use of a symbol "PObj_active_destroy_SET", which isn't in parrot r44371 or later. I have also unsuccessfully tried to reproduce against the latest rakudo (b348b3ed0464c504b80b159075e5a3773f3ef464) and parrot (r44657).

My bad, I'm running the version you tried (I'm not too familiar with git, so I found the wrong checksum). But as I said, it's an intermittent bug. I had to run the test file three times in my gdb session before the bug hit. But the bug seems to hit consistently after test #64, so if you hit higher numbered tests it's probably safe to abort the run, so that you don't have to run all 1568 tests.

I'll poke at this more (looking for a root cause, seeing if Parrot trunk fixes it) later tonight if I get the time, or tomorrow.

follow-up: ↓ 6   Changed 4 years ago by whiteknight

I would suggest that this is less likely a problem with the init vtable than a problem with premature GC collection. When you see this problem again in GDB, could you please capture the value of "p * pmc" and "p * pmc->vtable->whoami", and post them here?

in reply to: ↑ 5 ; follow-up: ↓ 8   Changed 4 years ago by arnsholt

Replying to whiteknight:

I would suggest that this is less likely a problem with the init vtable than a problem with premature GC collection. When you see this problem again in GDB, could you please capture the value of "p * pmc" and "p * pmc->vtable->whoami", and post them here?

I managed to trigger a segfault again, but in visit_todo_list_thaw() instead of visit_todo_list_freeze(). Backtrace:

(gdb) back
#0  0x0000000000000000 in ?? ()
#1  0x0000000100ce15b6 in visit_todo_list_thaw (interp=0x101208890, info=0x10cadd220) at imageio.pmc:158
#2  0x0000000100ce174a in Parrot_ImageIO_set_string_native (interp=0x101208890, pmc=0x10cadd220, image=0x10c3fff98) at imageio.pmc:521
#3  0x0000000100b94ed5 in Parrot_thaw (interp=0x101208890, image=0x10c3fff98) at src/pmc_freeze.c:134
#4  0x0000000100c6d0a9 in Parrot_default_clone (interp=0x101208890, pmc=0x1061b7520) at default.pmc:1068
#5  0x0000000100b896f4 in Parrot_oo_clone_object (interp=0x101208890, pmc=0x1061b7570, class_=0x102217ea8, dest=0x0) at src/oo.c:277
#6  0x0000000100cf9dba in Parrot_Object_clone (interp=0x101208890, pmc=0x1061b7570) at object.pmc:723
#7  0x00000001011be039 in Parrot_P6opaque_clone ()
#8  0x0000000100a9cc06 in Parrot_clone_p_p (cur_opcode=0x101303b08, interp=0x101208890) at set.ops:474
#9  0x0000000100b98864 in runops_fast_core (interp=0x101208890, runcore=0x10121c2b0, pc=0x101303b08) at src/runcore/cores.c:670
#10 0x0000000100b9736c in runops_int (interp=0x101208890, offset=202378) at src/runcore/main.c:549
#11 0x0000000100b5ad92 in runops (interp=0x101208890, offs=202378) at src/call/ops.c:112
#12 0x0000000100b5049e in Parrot_pcc_invoke_from_sig_object (interp=0x101208890, sub_obj=0x10256f868, call_object=0x1016f9f98) at src/call/pcc.c:314
#13 0x0000000100b50652 in Parrot_pcc_invoke_sub_from_c_args (interp=0x101208890, sub_obj=0x10256f868, sig=0x100d5cb7a "P->") at src/call/pcc.c:75
#14 0x0000000100b33c36 in Parrot_runcode (interp=0x101208890, argc=2, argv=0x7fff5fbff810) at src/embed.c:826
#15 0x0000000100000d69 in main ()

The PMC is named todo in this case:

(gdb) p *todo
$1 = {
  flags = 524288, 
  vtable = 0x10c780cd0, 
  data = 0x111bc9070, 
  _metadata = 0x0, 
  _synchronize = 0x0
}

todo->vtable->whoami is NULL. I'll add todo->vtable as an attachment, to keep the comments relatively uncluttered.

Changed 4 years ago by arnsholt

gdb printout of todo->vtable

  Changed 4 years ago by whiteknight

That PMC is definitely invalid. It doesn't necessarily look like one that has been prematurely collected, but there is no guarantee that all such instances would look the same.

in reply to: ↑ 6 ; follow-up: ↓ 9   Changed 4 years ago by plobsing

Replying to arnsholt:

todo->vtable->whoami is NULL. I'll add todo->vtable as an attachment, to keep the comments relatively uncluttered.

Thanks, I've been able to occasionally reproduce the failure using this configuration. A few observations:

  • My todo->vtable has zeroes and 0x80000s in the same places, so this isn't a totally random piece of memory, just the same piece of memory taken out of context.
  • The intermittent-ness of this bug is what really bothers me. gc_ms is deterministic, freeze/thaw is deterministic, I certainly hope our math is deterministic, so where is the randomness coming from?
  • This test is *way* too big (from a parrot perspective) for me to get anything useful done in gdb with watchpoints or breakpoints.
  • Boy does rakudo ever copy a lot! And it seems every copy falls back on default.clone, which uses freeze/thaw to copy. Given we now have a pluggable visit system, someone (probably me) can write up a cloning visitor that won't create an intermediate image, giving you guys faster copies, less string memory churn, and probably a nice speedup. Downside: it will likely hide this bug again.
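
To make the last point concrete, here is a rough sketch of the difference between cloning by a freeze/thaw round trip and cloning by visiting fields directly. Everything here (demo_obj, demo_freeze, demo_thaw, demo_clone_via_image, demo_clone_direct) is hypothetical and far simpler than Parrot's ImageIO or visit machinery; it only illustrates where the intermediate image and its allocation come from.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record standing in for a PMC's attributes. */
typedef struct demo_obj {
    int  id;
    char name[16];
} demo_obj;

/* Freeze: serialize the object into a freshly allocated image. */
unsigned char *demo_freeze(const demo_obj *src, size_t *len)
{
    unsigned char *image = malloc(sizeof *src);
    if (image == NULL)
        return NULL;
    memcpy(image, src, sizeof *src);
    *len = sizeof *src;
    return image;
}

/* Thaw: rebuild an object from an image; returns 0 on a malformed image. */
int demo_thaw(const unsigned char *image, size_t len, demo_obj *dest)
{
    if (len != sizeof *dest)
        return 0;
    memcpy(dest, image, len);
    return 1;
}

/* Clone via freeze/thaw: one allocation and two copies per clone. */
int demo_clone_via_image(const demo_obj *src, demo_obj *dest)
{
    size_t len = 0;
    unsigned char *image = demo_freeze(src, &len);
    int ok;

    if (image == NULL)
        return 0;
    ok = demo_thaw(image, len, dest);
    free(image);
    return ok;
}

/* Clone by visiting each field directly: no intermediate image. */
int demo_clone_direct(const demo_obj *src, demo_obj *dest)
{
    assert(src != NULL && dest != NULL);
    dest->id = src->id;
    memcpy(dest->name, src->name, sizeof dest->name);
    return 1;
}
```

Both functions produce the same copy; the direct version simply skips the temporary buffer, which is the memory churn mentioned above.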

in reply to: ↑ 8   Changed 4 years ago by plobsing

Replying to plobsing:

* The intermittent-ness of this bug is what really bothers me. gc_ms is deterministic, freeze/thaw is deterministic, I certainly hope our math is deterministic, so where is the randomness coming from?

I've found out where the randomness is coming from: hash seeding, which currently uses the current time.

Unfortunately, setting the hash seed is broken in r44371; it was only fixed in r44718, where I can't get this bug to manifest. To get a parrot on which you can reliably reproduce this bug, you'll have to check out r44371 and then merge the patch 44717:44718. Running this, I got the following results:

# fails to reproduce bug (0/20 runs)
> parrot --hash-seed F00F perl6.pbc t/spec/S32-trig/sinh.t

# probe for seeds that trigger this bug
> parrot --hash-seed $(perl -e "printf qq{%x\n}, scalar time" | tee seed) perl6.pbc t/spec/S32-trig/sinh.t
> cat seed

# consistently reproduces bug (8/8 runs)
> parrot --hash-seed 4b93fa3a perl6.pbc t/spec/S32-trig/sinh.t
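
As an illustration of why a time-based seed makes an otherwise deterministic program behave differently from run to run, here is a small sketch of a seeded string hash. The function names and the multiply-and-add hash are assumptions made for the example; this is not Parrot's actual hash implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical seeded string hash: the seed is the starting value, so a
 * different seed gives a different hash for the same key. */
size_t demo_seeded_hash(const char *key, size_t seed)
{
    size_t h = seed;
    while (*key)
        h = h * 33u + (unsigned char)*key++;
    return h;
}

/* Bucket index for a key in a table with n_buckets slots. Keys land in
 * different buckets under different seeds, so bucket layout and iteration
 * order (and anything that depends on them) vary with the seed. */
size_t demo_bucket(const char *key, size_t seed, size_t n_buckets)
{
    assert(n_buckets > 0);
    return demo_seeded_hash(key, seed) % n_buckets;
}
```

With 8 buckets, the key "a" lands in bucket 1 under seed 0 and in bucket 2 under seed 1. Fixing the seed, as with --hash-seed above, pins the layout and makes a layout-dependent bug repeatable.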

  Changed 4 years ago by nwellnhof

r48697 might help.

  Changed 4 years ago by plobsing

  • status changed from new to closed
  • resolution set to fixed

This bug has not been reproducible for months. It is either fixed or hidden.

Based on the source of the non-determinism, it was most likely a bug related to hashes, which are a well exercised feature of Parrot, so this bug is most likely fixed.
