Ticket #2194 (new bug)

Opened 3 years ago

Last modified 3 years ago

Alignment errors on Solaris/SPARC since 3.7.0

Reported by: Andy Dougherty <doughera@…>
Owned by:
Priority: normal
Milestone: 2.11
Component: core
Version: 3.8.0
Severity: medium
Keywords:
Cc:
Language:
Patch status:
Platform:

Description

On Solaris/SPARC I now see bus errors that did not occur in 3.7.0.

Curiously, the exact failures seem to depend on the compiler used
(Sun's cc vs. gcc).  Accordingly, git bisect finds different "bad"
commits.

#############

Here are the results from Sun's compiler:

git bisect start
# good: [9114095c0442b9b07954d0a6d621954b507946eb] prep for release 3.7.0
git bisect good 9114095c0442b9b07954d0a6d621954b507946eb
# bad: [dd3f6b86f1ba5a21684444b520859b5dc77c74fd] Fix g++ build errors
git bisect bad dd3f6b86f1ba5a21684444b520859b5dc77c74fd
# good: [f9b416751ba15043364a39fd56a36e44d16d03f4] Merge branch 'all-hll-test'
git bisect good f9b416751ba15043364a39fd56a36e44d16d03f4
# good: [3803c5c107c7da58198d74e8359507c3c0f1fd26] inform about missing program name from prt0
git bisect good 3803c5c107c7da58198d74e8359507c3c0f1fd26
# good: [554effb10a60b032ece2557621ae639527446073] Merge branch 'whiteknight/frontend_parrot2'
git bisect good 554effb10a60b032ece2557621ae639527446073
# good: [d74dbbe0c7c5bfba3bbda3efc159d7cdb249bbb8] manually import into api.yaml several previously missed deprecations
git bisect good d74dbbe0c7c5bfba3bbda3efc159d7cdb249bbb8
# skip: [e54106bb8406f198f974ec3dbe5cc748ec458dc5] first stab at a gc is_pmc_ptr optimization from jnthn__++. Parrot mostly builds and is only a little segfaulty. I need to double-check some logic
git bisect skip e54106bb8406f198f974ec3dbe5cc748ec458dc5
# skip: [594464e6f18e9e50e703550f12e0a89d82ae3725] a few small changes. Parrot seems to be less segfaulty now
git bisect skip 594464e6f18e9e50e703550f12e0a89d82ae3725
# good: [8b78a4f3ad8b65bf4a6088e1cffccce51db1114d] fix api.yaml
git bisect good 8b78a4f3ad8b65bf4a6088e1cffccce51db1114d
# bad: [1b0e041e79c81ef6626f77bd1a791c8af48506f3] misc cleanups
git bisect bad 1b0e041e79c81ef6626f77bd1a791c8af48506f3
# good: [633ac86535f1418ae55e7a126a1ce27ae5c42746] Update NEWS for whiteknight/frontend_parrot2 merge. Probably needs better wording
git bisect good 633ac86535f1418ae55e7a126a1ce27ae5c42746
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
594464e6f18e9e50e703550f12e0a89d82ae3725
e54106bb8406f198f974ec3dbe5cc748ec458dc5
1b0e041e79c81ef6626f77bd1a791c8af48506f3

Unfortunately, the skipped commits are ones that wouldn't compile
due to various errors, so I haven't narrowed it down further.

The first "bad" commit that builds is 1b0e041e79c81ef6626f77bd1a791c8af48506f3
which crashes on pbc_to_exe.  Here's the debugger output:

Reading symbolic information for pbc_to_exe
core file header read successfully
Reading symbolic information for rtld /usr/lib/ld.so.1
Reading symbolic information for libparrot.so
Reading symbolic information for libsocket.so.1
Reading symbolic information for libnsl.so.1
Reading symbolic information for libdl.so.1
Reading symbolic information for libm.so.1
Reading symbolic information for libpthread.so.1
Reading symbolic information for librt.so.1
Reading symbolic information for libintl.so.1
Reading symbolic information for libw.so.1
Reading symbolic information for libc.so.1
Reading symbolic information for libmp.so.2
Reading symbolic information for libaio.so.1
Reading symbolic information for libc_psr.so.1
Reading symbolic information for libthread.so.1
Reading symbolic information for os.so
detected a multithreaded program
t@1 (l@1) terminated by signal BUS (invalid address alignment)
Current function is Parrot_pa_is_owned
  147           if (PTR2UINTVAL(ref) > PTR2UINTVAL(chunk) + CHUNK_SIZE)
(dbx) print chunk
chunk = (nil)
(dbx) where
current thread: t@1
=>[1] Parrot_pa_is_owned(interp = 0x2c790, self = 0x2ca30, orig = 0xce140, ref = 0xce13d), line 147 in "pointer_array.c"
  [2] gc_gms_is_pmc_ptr(interp = 0x2c790, ptr = (nil)), line 1487 in "gc_gms.c"
  [3] trace_mem_block(interp = 0x2c790, mem_pools = (nil), lo_var_ptr = 4290706076U, hi_var_ptr = 4290703980U), line 487 in "system.c"
  [4] trace_system_stack(interp = 0x10d968, mem_pools = 0x1), line 243 in "system.c"
  [5] trace_system_areas(interp = 0x2, mem_pools = 0x1), line 212 in "system.c"
  [6] Parrot_gc_trace_root(interp = 0x2c790, mem_pools = 0x2c940, trace = 182808), line 183 in "mark_sweep.c"
  [7] gc_gms_mark_and_sweep(interp = (nil), flags = 4290704572U), line 805 in "gc_gms.c"
  [8] gc_gms_allocate_string_header(interp = (nil), flags_unused = 0), line 1523 in "gc_gms.c"
  [9] Parrot_gc_new_string_header(interp = 0x2c790, flags = 6745628U), line 365 in "api.c"
  [10] Parrot_str_new_init(interp = 0x103f80, buffer = 0x2000 "", len = 5U, encoding = 0x10ad28, flags = 4281100032U), line 653 in "api.c"
  [11] Parrot_str_from_uint(interp = 0x2c790, tc = 0xffbef5fc "", num = 0, base = 10U, minus = 0), line 3220 in "api.c"
  [12] Parrot_str_from_int_base(interp = 0x2c790, tc = 0xffbef5fc "", num = 52LL, base = 10U), line 3250 in "api.c"
  [13] Parrot_str_from_int(interp = 0x2c790, i = 1277804), line 2123 in "api.c"
  [14] Parrot_set_s_i(cur_opcode = 0x2c790, interp = 0xe80a8), line 19754 in "core_ops.c"
  [15] runops_fast_core(interp = 0x2c790, runcore_unused = 0xda2c8, pc = 0xe38d0), line 504 in "cores.c"
  [16] runops_int(interp = 0x2c790, offset = 0), line 218 in "main.c"
  [17] runops(interp = 0x2c790, offs = 0), line 126 in "ops.c"
  [18] Parrot_pcc_invoke_from_sig_object(interp = 0xff2453f0, sub_obj = 0xffbef914, call_object = 0xffbef910), line 337 in "pcc.c"
  [19] Parrot_pcc_invoke_sub_from_c_args(interp = 0x2c790, sub_obj = 0xe7144, sig = 0xff2453f0 "P->", ...), line 139 in "pcc.c"
  [20] Parrot_pf_execute_bytecode_program(interp = 0x112cc, pbc = 0xffbefa38, sysargs = 0x11290, progargs = (nil)), line 2864 in "api.c"
  [21] Parrot_api_run_bytecode(interp_pmc = 0xd6908, pbc = 0xe806c, sysargs = 0xdbfa4, progargs = (nil)), line 163 in "bytecode.c"
  [22] main(argc = 0, argv = (nil)), line 689 in "pbc_to_exe.c"


In other runs, I encountered a crash at the same spot with chunk==0x1.

#############

Here are the results from gcc-4.1.0:

git bisect start
# bad: [ba4bd62fd6ff8275f02dabadde3767d2eb4d9ee4] Don't try to return a function that returns void
git bisect bad ba4bd62fd6ff8275f02dabadde3767d2eb4d9ee4
# good: [dd3f6b86f1ba5a21684444b520859b5dc77c74fd] Fix g++ build errors
git bisect good dd3f6b86f1ba5a21684444b520859b5dc77c74fd
# bad: [91bf0271eca95e9795b4c4ecc4960cde2e84822c] [codingstd] c_arg_assert
git bisect bad 91bf0271eca95e9795b4c4ecc4960cde2e84822c
# bad: [00fe23d2b65bcab16c70602e6aedf8e7f9846bee] Don't increment line numbers on .annotate directives. This fixes some line number disparities and places where the line number is reported as 0. mls++
git bisect bad 00fe23d2b65bcab16c70602e6aedf8e7f9846bee
# bad: [f7a12d13d6ce9f185ef1c9789cee8e9febd4a1d8] Remove the horribly out-dated and unused Parrot::Test::PIR_PGE
git bisect bad f7a12d13d6ce9f185ef1c9789cee8e9febd4a1d8
# good: [6f57d171a1d76fb1c9c3337c76b8fff9f6da13f8] Merge remote-tracking branch 'gerdr/whiteknight/pmc_is_ptr'
git bisect good 6f57d171a1d76fb1c9c3337c76b8fff9f6da13f8
# bad: [a7ec805b1b38929ca6eaa3a694995fa53bb69fb5] forgot add top_arena
git bisect bad a7ec805b1b38929ca6eaa3a694995fa53bb69fb5
# skip: [88f0795224551220a0766f3924c36907a6089999] revert some cleanups which is wrong
git bisect skip 88f0795224551220a0766f3924c36907a6089999
# bad: [5f6ccb2a8a29e78c38e5e2441905e0640066a15c] various cleanup to fixed_allocator
git bisect bad 5f6ccb2a8a29e78c38e5e2441905e0640066a15c
# bad: [5f6ccb2a8a29e78c38e5e2441905e0640066a15c] various cleanup to fixed_allocator
git bisect bad 5f6ccb2a8a29e78c38e5e2441905e0640066a15c


This bisect points to 5f6ccb2a8a29e78c38e5e2441905e0640066a15c

That version crashes when trying to run miniparrot.  For some reason,
gcc doesn't leave as much helpful information lying around, so the
stack trace is less helpful.

Reading symbolic information for miniparrot
core file header read successfully
Reading symbolic information for rtld /usr/lib/ld.so.1
Reading symbolic information for libparrot.so
Reading symbolic information for libsocket.so.1
Reading symbolic information for libnsl.so.1
Reading symbolic information for libdl.so.1
Reading symbolic information for libm.so.1
Reading symbolic information for libpthread.so.1
Reading symbolic information for librt.so.1
Reading symbolic information for libintl.so.1
Reading symbolic information for libc.so.1
Reading symbolic information for libmp.so.2
Reading symbolic information for libaio.so.1
Reading symbolic information for libc_psr.so.1
Reading symbolic information for libthread.so.1
detected a multithreaded program
t@1 (l@1) terminated by signal BUS (invalid address alignment)
(dbx) where
current thread: t@1
=>[1] Parrot_CallContext_unshift_pmc_orig(0x25be8, 0xe0df0, 0xe0db4, 0xe0e04, 0x0, 0x0), at 0xff23ec20
  [2] Parrot_CallContext_unshift_pmc(0x25be8, 0xe0df0, 0xe0db4, 0xffbef888, 0x0, 0x0), at 0xff23ed3c
  [3] Parrot_pcc_add_invocant(0x25be8, 0xe0df0, 0xe0db4, 0xffbef888, 0x0, 0xff0000), at 0xff156dc4
  [4] Parrot_pcc_invoke_method_from_c_args(0x25be8, 0xe0db4, 0xda5c4, 0xff3302e8, 0xda57c, 0xffbef91c), at 0xff156e88
  [5] imcc_compile_file_api(0xdada4, 0xe0db4, 0xda57c, 0xffbef9d8, 0x0, 0x0), at 0xff3039c4
  [6] run_imcc(0xdada4, 0xda57c, 0xffbefa68, 0xffbefa98, 0xffbefa90, 0xffbefa68), at 0x125d4
  [7] main(0x3, 0xffbefb24, 0xffbefb34, 0x25160, 0x0, 0x0), at 0x129d8


For all of the "bad" revisions I checked, the crash was in the same
spot.

-- 
    Andy Dougherty		doughera@lafayette.edu

Change History

Changed 3 years ago by doughera@…

Changed 3 years ago by jkeenan

  • type set to bug
  • component changed from none to core

Changed 3 years ago by doughera@…

I have included a patch below that avoids the alignment problem on
Solaris/SPARC.  I don't understand parrot's pools and arenas deeply
enough to know if the fix is in the right place, but I present it anyway
as it's better than crashing.

In the function allocate_new_pool_arena() in src/gc/fixed_allocator.c,
new_arena is assigned to a new chunk of space allocated by malloc.
Then, line 553 of src/gc/fixed_allocator.c sets up the "next" pointer
as follows:

   next            = (Pool_Allocator_Free_List *)(new_arena + 1);

The problem on SPARC is that sizeof(Pool_Allocator_Free_List) == 4
(and Pool_Allocator_Arena, likewise a single pointer, is the same size),
but some items (such as doubles) must be aligned on 8-byte boundaries.
Since new_arena + 1 lies only 4 bytes past the start of the malloc'ed
block, the "next" pointer is not suitably aligned for all uses.  From here,
the story gets fuzzy, but at least in some test cases, parrot crashes
while accessing an "N" register.  (The original errors reported in this
ticket might be mostly due to other problems in the same file, which
have subsequently been fixed, so they are probably not too relevant.)

The following hacky patch makes the alignment problem go away by
making Pool_Allocator_Free_List larger (and Pool_Allocator_Arena --
I don't understand the difference between them, and didn't have time to
experiment with all possible permutations of this fix).  This means each
struct is now 8 bytes, and the "next" pointer is suitably aligned for all
uses.  A better way would be to have the compiler automatically align
stuff for us, but I'm unclear where the relevant assignments are actually
happening, so I don't know where to do it correctly.


diff --git a/src/gc/fixed_allocator.h b/src/gc/fixed_allocator.h
index 8c2f1b8..7a8366e 100644
--- a/src/gc/fixed_allocator.h
+++ b/src/gc/fixed_allocator.h
@@ -28,11 +28,12 @@ src/gc/fixed_allocator.h - implementation of allocator for small-size objects.
 
 typedef struct Pool_Allocator_Free_List {
     struct Pool_Allocator_Free_List * next;
-
+    char *dummy; /* XXX minimal hack to force alignment suitable for doubles */
 } Pool_Allocator_Free_List;
 
 typedef struct Pool_Allocator_Arena {
     struct Pool_Allocator_Arena * next;
+    char *dummy; /* XXX minimal hack to force alignment suitable for doubles */
 } Pool_Allocator_Arena;
 
 typedef struct Pool_Allocator {


-- 
    Andy Dougherty		doughera@lafayette.edu
