Ticket #549 (closed todo: fixed)

Opened 13 years ago

Last modified 12 years ago

Kill UnionVal

Reported by: whiteknight Owned by: whiteknight
Priority: normal Milestone:
Component: none Version:
Severity: medium Keywords: UnionVal
Cc: Language:
Patch status: Platform:

Description

Thanks to cotto++, PMCs now don't rely on UnionVal anymore. However, a number of other structures still do rely on it:

1) call_state structures, as used in the various calling conventions and JIT system files 2) stacks, especially in src/stacks.c. 3) Lists, in src/list.c. 4) STRINGs, in src/string/*. Uses mostly buflen and bufstart to manage system memory. Also shows up in a few places in src/gc/* where string memory is managed.

There are a few other places maybe too. We're going to have to fix up all these systems to avoid UnionVal before we can remove it completely.

Attachments

unionvalrem.patch Download (14.0 KB) - added by jessevdam 12 years ago.
patch removing the unionval struct from the pobj.h
killunionvalpatch.patch Download (105.5 KB) - added by jessevdam 12 years ago.
kill unionval patch
killunionvalpatch2.patch Download (106.6 KB) - added by jessevdam 12 years ago.
patch tested on r40531
changes_on_pmc_sans_unionval_branch.patch Download (2.1 KB) - added by jessevdam 12 years ago.
changes on sans unionval branch
pmc_sans_unionval.patch Download (166.9 KB) - added by whiteknight 12 years ago.
patch for the current status of the pmc_sans_unionval branch at r40666

Change History

Changed 13 years ago by cotto

Are you sure we want to kill all instances of it? I thought it was only deprecated as far as it was used in PMCs.

Changed 13 years ago by cotto

On  #parrot, allison confirmed that we do eventually want to remove all uses of UnionVal from Parrot. This will probably be a significant task, but it should divide fairly well into subtasks, i.e. PCC, STRING, generic PObj and whatever else needs updating.

This should be a smaller task than updating the core PMCs was. The string "UVal" only appears in a current svn checkout 88 times, compared to ~1200 uses of the PMC_x_val macros before they were replaced.

Changed 13 years ago by doughera

Actually, I'd expect it to be a much more difficult task, since the different structures involved are intended to be isomorphic (at least for the cache and flags values) and since the current specific memory layout is intertwined in complex and undocumented ways with the various memory management and garbage collection routines.

A good starting question might be: "What, exactly, are those structures supposed to look like after this?"

Changed 12 years ago by jessevdam

I managed to kill the Unionval form the pobj.h.

I replaced the union with the _bufstart and _buflen because there are used by the gc system. As far I know the gc system can't do without them, but I am not sure.

The call_state struct in the call.h did also make use of the unionval, but this is a different structure then the core structures in pobj.h, so I made a UnionCallStateVal in the call.h which is the unionval struct without the b struct containing _bufstart and _buflen.

By doing this I also had to update the jiddebug.c and jit_debug_xcoff.c files, but it seemed there where already gone outdated.

I also cleaned up some of the unused structures in the pobj.h.

I attached a patch with the changes I made to remove the unionval structure.

Changed 12 years ago by jessevdam

patch removing the unionval struct from the pobj.h

Changed 12 years ago by jessevdam

Existing reference I found for PObj_bufstart are in

gc system

list.c (This one uses the Buffer struct so that's ok)

string.c (To my opinion that's ok)

debug.c in dump_string (ok)

pmc_freeze.c (an assert related to string)

nativecall.pl (related to string)

As far I can the PObj_bufstart and PObj_buflen are used by the gc system and string system. I do not know exactly how the gc work, but it seems that the gc can't do without these 2 field. The string subsystem memory management is closely related to memory/gc management to make efficiently use of the memory. So this remaining uses seem to be ok, at least in my opinion at the moment.

The list object makes use of the "struct" buffer, in a different structure.

Changed 12 years ago by doughera

There is additional discussion of this ticket on the parrot-dev maililng list, starting at  http://lists.parrot.org/pipermail/parrot-dev/2009-July/002627.html.

In summary, Whiteknight suggested the following:

In the long term, I suspect PMCs will look like this:

struct PMC {
    Parrot_UInt     flags;
    VTABLE         *vtable;
    DPOINTER       *data;
    struct PMC_EXT *pmc_ext;
};

Or maybe this, if PMC_EXT is common enough that we can prevent it from
being a separate structure:

struct PMC {
    Parrot_UInt flags;
    VTABLE *vtable;
    DPOINTER *data;
    PMC *metadata;
    struct _Sync *synchronize;
};

STRING will look like this:

struct parrot_string_t {
    Parrot_UInt flags;
    char       *strstart;
    void *_bufstart;
    UINTVAL _buflen;
    UINTVAL     bufused;
    UINTVAL     strlen;
    UINTVAL     hashval;
    const struct _encoding *encoding;
    const struct _charset  *charset;
};

In attempting to think through the implications, I drew up the following notes for one way I thought all the different pieces might fit together. Apologies in advance if I get the details wrong.

We start with the following basic structures:

/* From include/parrot/pobj.h */

typedef struct pobj_t {
    Parrot_UInt flags;
} pobj_t;

struct PMC {
    Parrot_UInt     flags;
    VTABLE         *vtable;
    DPOINTER       *data;
    struct PMC_EXT *pmc_ext;
};

When a PMC is no longer needed, the following function in src/gc/gc_ms.c is called to add it to the free list:

static void
gc_ms_add_free_object(SHIM_INTERP, ARGMOD(Small_Object_Pool *pool),
    ARGIN(void *to_add))
{
    ASSERT_ARGS(gc_ms_add_free_object)
    PObj *object           = (PObj *)to_add;

    PObj_flags_SETTO(object, PObj_on_free_list_FLAG);

    ((GC_MS_PObj_Wrapper*)object)->next_ptr = (PObj *)pool->free_list;
    pool->free_list        = object;
}

The GC_MS_PObj_Wrapper structure is currently defined in src/gc/gc_private.h as

typedef union GC_MS_PObj_Wrapper {
    PObj obj;
    PObj *next_ptr;
} GC_MS_PObj_Wrapper;

The intent is to take an existing field of a PObj-derived structure and use it to hold the pool->free_list pointer. Currently, it overwrites the UnionVal location. With the proposed layout, the flags would be overwritten, but presumably the GC_MS_PObj_Wrapper structure would be adjusted to match the new structure layouts.

A related issue has to do with the many calls to POBj_bufstart() and POBj_buflen(). Not every PMC needs a bufstart or buflen field, so those won't be available in the pobj_t structure anymore.

What this seems to imply is a somewhat richer hierarchy of structures than exists now. Specifically, right now, pobj_t and Buffer structures are identical. It might make sense to change that. pobj_t would still be the "base class for all others", and Buffer would be the "base class" used for structures that need the bufstart and buflen members.

Perhaps something like this:

/* Parrot Object - base class for all others */
typedef struct pobj_t {
    Parrot_UInt flags;
} pobj_t;

typedef pobj_t PObj;

struct PMC {
    Parrot_UInt     flags;
    VTABLE         *vtable;
    DPOINTER       *data;
    struct PMC_EXT *pmc_ext;
};

/* plain Buffer is the base class for all "buffer-like" 
   parrot objects.
*/
typedef struct Buffer {
    Parrot_UInt flags;
    void       *_bufstart;
    UINTVAL     _buflen;
} Buffer;

struct parrot_string_t {
    Parrot_UInt flags;
    void       *_bufstart;
    UINTVAL     _buflen;
    char       *strstart;
    UINTVAL     bufused;
    UINTVAL     strlen;
    UINTVAL     hashval;
    const struct _encoding *encoding;
    const struct _charset  *charset;
};

(Note I've rearranged the first few elements of parrot_string_t to be isomorphic with Buffer.)

Over in src/gc/gc_private.h, the Wrapper structure becomes

typedef struct GC_MS_PObj_Wrapper {
    Parrot_UInt flags;
    PObj *next_ptr;
} GC_MS_PObj_Wrapper;

where the next_ptr overlays either vtable in a PMC or _bufstart in a "Buffer-like" structure.

Then the PObj_buflen() and PObj_bufstart macros would be defined as:

#define PObj_bufstart(b)    (b)->_bufstart
#define PObj_buflen(b)      (b)->_buflen

with the new requirement that they can only be used on "Buffer-like" objects, not on PMCs or PObjs.

Thanks to WhiteKnight for explaining this task in more detail.

Changed 12 years ago by jessevdam

I succeeded in removing the unionval struct from the pmc struct.

I considered the PARROT_GC_IMS, PARROT_GC_GMS, PARROT_GC_INF gc systems as not working and dead at the moment. So I did not do any tests on these.

I added a documentation (memory_internals2.pod) file, which give a bit extra documentation on the memory internals of the gc system.

The patch I made includes the following items

* Removal of the unionval struct

* The pmc_ext has been removed, it has been intregrated into the pmc struct.

* Major memory leak fix, buffers including strings where never released back by the gc, only the headers where released. The initialize_header_pools and associated functions did not set the pool->gc_object function.

* Added assertion functions for memory system.

* Changes in pmc_freeze.c to keep the used String intact, needed to keep my assertion function happy

* Updated jit_debug and jit_debug_xcoff to new pobj.h structure

* Renamed the PObj_buf* macros to Buffer_buf*

* Added memory zeroing when creating new pmc header.

In my opinion the following can still to be done. (may be open some other tickets, but I mention these thing here so body else in future nows about it)

* Regroup the gc/memory management function into the right *.c files

* Make a better split up of the gc system and memory system, the link between them is not that strong

* (mentioned earlier) Make option of dynamically loading the a gc system at startup, so it's easier to run some unit test for the different memory systems.

* Make the other gc systems work again

* The data pointed by pmc->data is now allocated via the c allocation system, but I think we can get a nice boost if we would use the small object pool system for those pmc (class) attributes.

I used the assertion code to find some errors, but did not a complete test with it, it succeeds with most things, but does not pass every thing. Although the system keeps running correctly. So it could be my assertion code misses some exceptions to the rule. But the code is there and can be used to search for bugs if the are strange memory system corruptions.

Changed 12 years ago by jessevdam

kill unionval patch

Changed 12 years ago by whiteknight

Wow, quite a nice patch! Some of the things you mentioned (allocating pmc->data from fixed-size pools) is already being worked on in the auto_attrs branch. The GC_INF collector *should* work, although it is know (and expected) to fail a handful of tests. I can run some tests on that core with your patch applied to make sure nothing funny happens in that case.

Also, I'd be happy to hear any ideas about how you think we should split up functions in the various files.

Changed 12 years ago by whiteknight

Bad news already. I created a fresh branch, pmc_sans_unionval, to test out this patch. However, it doesn't build. Here are the errors I get:

$ Projects> make
Compiling with:
xx.c
ccache gcc -I./include -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DHASATTRIBUTE_CONST -DHASATTRIBUTE_DEPRECATED -DHASATTRIBUTE_MALLOC -DHASATTRIBUTE_NONNULL -DHASATTRIBUTE_NORETURN -DHASATTRIBUTE_PURE -DHASATTRIBUTE_UNUSED -DHASATTRIBUTE_WARN_UNUSED_RESULT -falign-functions=16 -fvisibility=hidden -funit-at-a-time -maccumulate-outgoing-args -W -Wall -Waggregate-return -Wcast-align -Wcast-qual -Wchar-subscripts -Wcomment -Wdisabled-optimization -Wendif-labels -Wextra -Wformat -Wformat-extra-args -Wformat-nonliteral -Wformat-security -Wformat-y2k -Wimplicit -Wimport -Winit-self -Winline -Winvalid-pch -Wlogical-op -Wmissing-braces -Wmissing-field-initializers -Wno-missing-format-attribute -Wmissing-include-dirs -Wpacked -Wparentheses -Wpointer-arith -Wreturn-type -Wsequence-point -Wno-shadow -Wsign-compare -Wstrict-aliasing -Wstrict-aliasing=2 -Wswitch -Wswitch-default -Wtrigraphs -Wundef -Wunknown-pragmas -Wno-unused -Wvariadic-macros -Wwrite-strings -Wlarger-than-4096 -Wbad-function-cast -Wc++-compat -Wdeclaration-after-statement -Werror=declaration-after-statement -Wimplicit-function-declaration -Wimplicit-int -Wmain -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wnonnull -DHAS_GETTEXT -g -DHAVE_COMPUTED_GOTO -fPIC -I. -o xx.o -c xx.c
cc -o miniparrot src/main.o src/null_config.o \
	-Wl,-rpath=/home/andrew/Projects/pmc_sans_unionval/blib/lib -L/home/andrew/Projects/pmc_sans_unionval/blib/lib -lparrot -lm -L/usr/lib  -licuuc -licudata -lm -ldl -lm -lpthread -lcrypt -lgmp  -L/usr/local/lib -Wl,-E  
/home/andrew/Projects/pmc_sans_unionval/blib/lib/libparrot.so: undefined reference to `PObj_bufstart'
/home/andrew/Projects/pmc_sans_unionval/blib/lib/libparrot.so: undefined reference to `PObj_buflen'
collect2: ld returned 1 exit status
make: *** [miniparrot] Error 1

Also, there appear to be many (read: several dozen) references to PObj_bufstart and PObj_buflen in the codebase still. I'm not sure if this should be the case, or what is going wrong. I'll take a look through to see if I can figure out what this patch is doing and how to fix it.

I committed this patch to the branch in r40531. Any followup fixes/patches should be taken against this branch at that revision. We'll delete that branch if we haven't gotten it building reasonably by the release of 1.5.0, and figure out a better way forward.

Changed 12 years ago by jessevdam

I tested my patch on version 40531, but on my ubuntu linux 9.04 on i386 machine, works fine. May be a make clean will help. Although I looked around for forgotten PObj_bufstart and PObj_buflen items and found 2. One in the unicode.c when PARROT_HAS_ICU is defined and one in the nativecall.pl which end up in the generated nci.c. So fixed this 2 items. I also fixed 1 warning in my code.

Following the next piece of code form the settings.h and gc_malloc.c, do I consider the code in generational_ms.c, incremental_ms.c and gc_malloc.c as dead and so I did not do any update on those.

From settings.h

#define PARROT_GC_SUBSYSTEM 0 /*

  • GC_SUBSYSTEM selection
  • 0 ... MS stop-the-world mark & sweep
  • 1 ... IMS incremental mark & sweep
  • 2 ... GMS generational mark & sweep
  • 3 ... INF infinite memory "collector" *
  • Please note that only 0 and 3 currently work (and INF doesn't really
  • "work"). */

From gc_malloc.c.

Handles garbage collection with malloc()/free(). Note that this doesn't currently work; this file just collects all of the #GC_MALLOC functions in one convenient place.

Changed 12 years ago by jessevdam

patch tested on r40531

Changed 12 years ago by whiteknight

I'll try to apply this new patch to the branch when I get home from work tonight. Do me a favor, check out a copy of the pmc_sans_unionval branch and give it a try. We need to make sure it works in that location before we can think about merging it into trunk.

Changed 12 years ago by jessevdam

I tested the pmc_sans_unionval branch and it builds and passes all tests. Without any changes after the checkout did it work.

But I saw 2 missed references to PObj_buflen and PObj_bufstart. So I fixed those to.

I removed 1 warning in my code.

I added a small bug fix on my assertion code.

Changed 12 years ago by jessevdam

changes on sans unionval branch

Changed 12 years ago by whiteknight

  • owner set to whiteknight

Thanks for the new patch! I've applied it to the branch in r40562. There's a freeze right now on merging branches into trunk before the release. I'll start up a conversation to make sure everybody is happy with this change.

Changed 12 years ago by whiteknight

Here's the current status: I have talked to allison about this and have her thumbs-up to get this applied to trunk. I've also gone through and made a number of fixes and cleanups (removing unused stuff, codingstd fixes, etc).

However, after a few recent commits to trunk this branch doesn't merge in cleanly. I need to get that resolved before we can merge it.

Changed 12 years ago by whiteknight

patch for the current status of the pmc_sans_unionval branch at r40666

Changed 12 years ago by whiteknight

I've posted a patch of all the changes that have been made to the branch since it was created. All tests pass *except* t/pmc/complex.t, which segfaults on test 117. It's worth noting that even if we comment out that test, the next one fails.

Once that test is made to pass, we can commit this patch to trunk.

Changed 12 years ago by whiteknight

resolved in r40726

Changed 12 years ago by whiteknight

  • status changed from new to closed
  • resolution set to fixed
Note: See TracTickets for help on using tickets.