[Openmcl-devel] GC aspects, tuning, behavior

Jon S. Anthony j-anthony at comcast.net
Fri Jun 26 16:51:51 PDT 2009


Let me see if I'm getting the basic claims here. First, CCL's
generational GC is not a copying type collector but a mark/compact
collector (that alone is somewhat interesting as it must be one of a
very few that take this route).  So, it doesn't have semi-spaces
(hemispheres) in either new (creation/young) space or in old (aging)
space.  It really only has one old/aging space (the so called region "on
the left").

It uses 3 generations - all in new/creation space (the so called region
"on the right").  This space is of size LISP-HEAP-GC-THRESHOLD.  It is
split into three regions - one for each generation.  Youngest
(generation 0) is probably "right most" with gen 1 adjacent to the left
and gen 2 adjacent to gen 1 on the left.  I don't think the sizes of
these regions is user accessible, but the "threshold limits" (for when
to trigger a minor collection) for them are - they are the
egc-configuration values.  While these probably affect the size of the
regions in some way as well (under the overall constraint of
LISP-HEAP-GC-THRESHOLD), there is no way to configure the sizes of these
regions directly - only the overall size they are carved from:
LISP-HEAP-GC-THRESHOLD.

Since you don't use a copying collector these spaces are not split into
semi-spaces.  Also, for the same basic reason, it doesn't sound like you
use the technique of using older generations as "Tospace" areas for a
given scavenge.

There is some promotion policy (not mentioned, but maybe based on
percent used of the regions for each generation and/or the number of
minor collections an item survives), and minor collections (those
involving only new space - what you seem to call EGC collections) are
performed by mark/compaction in each region.  When promotion occurs, for
Gen I, the items to be promoted (a subset of the live ones) are moved to
Gen I+1.  However, a somewhat interesting wrinkle on this is that the
promotion is done by simply advancing the "top" pointer of Gen I+1's
region - Gen I+1 sort of eats part of Gen I.  Since the scavenge
technique uses order preserving compaction, the piece of Gen I that Gen
I+1 eats will be (most likely anyway) the oldest (via some measure -
probably collection cycles).  However that probably means there will be
a gap in Gen I+1 between it's last set of non promoted collected
survivors and the newly eaten Gen I items.  That gap will be removed on
the next collection (via compaction) of Gen I+1.

Of course any collection for an older generation will collect younger
ones as well (do to the usual idea of only wanting to track - via write
barriers - old-young pointer changes).

When new space fills up completely, a full/global/major collection is
triggered and promotable items in the 3 new space generations are
"moved" to old space (again, probably by simply advancing the "top"
pointer of old space).  It also does a collection (mark/compaction) on
the items in old space (which probably includes the new promoted things
as that step likely happens first).  Old space's "top" (right most)
pointer is adjusted to just past last live thing in old space.  If there
is not enough space freed up in old space to account for all the newly
promoted objects plus the ones remaining in new space so that
LISP-HEAP-GC-THRESHOLD space is available for new space, some extra
space is allocated to the "top" (right most) of new space to achieve
this limit.

The basic scheme does not treat "large" objects specially - I don't
think that buys much either, especially nowadays.

There is no way to invoke or trigger a minor collection (egc
collection).  ccl:gc invokes a full/major/global collection.  That's not
so good, IMHO, but I guess it is what it is.

There is no way to tell the GC/allocator/mutator that a "new" item is
really an old space item, i.e., just allocate it up front in old space.
I don't think that is so good either - there are definitely cases where
this is a major win.  But given your basic scheme, I can see how this
would be non trivial to achieve: There's only one old space.  It's fully
compacted (modulo all current knowledge of live items in it).  There's
simply no place to use to achieve direct old space allocation.  The idea
you mention - moving everything to the "right", allocating, moving
everything "left" - is indeed "crazy".

Is this more or less on track?


/Jon


On Thu, 2009-06-25 at 19:33 -0600, Gary Byers wrote:
> CCL's GC basically manages a single contiguous region of memory; you
> can think of that region as being divided into two subregions.  The
> subregion "on the left" contains allocated objects (things that survived
> the last GC) and the subregion "on the right" contains free space.  When
> an image is first loaded and after a full GC, the size of the free space
> is determined by LISP-HEAP-GC-THRESHOLD.
> 
> Conceptually (it doesn't really work exactly like this), free space fills
> up from left to right; when free space fills up completely, a full GC
> is triggered.  The GC identifies which objects are live, then does an
> in-place, order-preserving compaction of live objects ("towards the left"),
> creating contiguous free space on the right.  If necessary, more pages
> are obtained from the OS so that the free space is of a size determined
> by LISP-HEAP-GC-THRESHOLD; in some cases, the heap can shrink and unneeded
> pages can be returned to the OS.  (There's a large - possibly very large -
> chunk of reserved address space "to the right" of the free area.)
> 
> The fact that garbage is returned to (contiguous) free space by
> in-place, order-preserving compaction offers some useful properties.
> If allocated object X is "to the left of" (has a lower address than)
> object Y, then it is generally true that X is older than Y (X has
> survived more GC cycles than Y has), though X and Y may be of the same
> vintage/generation.  If we wanted to, we could keep some extra
> bookkeeping information and identify each subregion of allocated space
> that had survived I full GCs for I = 1 .. N, and part of that bookkeeping
> might involve adjusting some pointers if/when things moved around due to
> compaction.
> 
> We don't actually do that, but we do use a similar scheme to keep track
> of things on the right end of allocated space (maintaining regions that
> describe "newly allocated things", "things that've survived a few GCs",
> "things that've survived quite a few GCs but might still become garbage
> soon.")  These regions are the EGC's "generations", and they're basically
> just a pair of pointers to the start and end of a region in allocated
> space and a threshold/limit value that affects EGC policy (how big the
> youngest generation can get before the EGC is invoked, how big the older
> ephemeral generations can get before the EGC looks at them as well as
> looking at the youngest generation.)  The EGC's essentially doing the
> same work as the full GC does - identifying live objects and doing an
> in-place order-preserving compaction on them - only it's doing that
> work on (generally) small subregions of at the extreme right end of
> allocated space.  To the extent that it's true that a high percentage
> of newly-allocated objects are short-lived, this work is productive.
> (That's true of all ephemeral/generational GC schemes; the only things
> that might be atypical of CCL's EGC is that promoting/tenuring objects
> from one generation to the next is done by adjusting the bounds of
> the generations, not by copying things to other memory areas.)
> 
> There are a lot of things to like about this scheme; it makes
> allocation quick (since free memory is contiguous) and it generally
> takes time proportional to the number of live objects.  The worst-case
> scenarios - which would involve copying large objects around during
> in-place compaction - tend not to happen all that often (or tend to
> happen a few times before things stabilize.)  In practice, long-lived
> objects tend not to move around, though in theory they could.
> 
> The scheme doesn't special-case "large" objects (they tend to behave
> like the pig in the python: the pig eventually gets from one end of
> the python to the other, but it's not always pleasant for pig or
> python and may involve some premature tenuring.)  There's no way to
> allocate an object and declare it to be "old", though this could be
> done (essentially by pushing everything to the right and allocating
> the object at or near the extreme left.)  I can imagine cases where
> this would be worthwhile and desirable, but in most cases where I'd
> have been concerned about this that concern has been unfounded.
> 
> You can manually invoke a full GC via (CCL:GC).  There's no way
> to trigger the EGC manually.
> 
> I don't remember if anything checks for it, but setting the size
> of generation 0 to 0 will probably cause the EGC to be triggered
> unproductively (if it doesn't cause worse behavior.)  Setting the
> size of generations 1 or 2 to 0 could be done but may not be
> very useful; making the generations slightly larger than the defaults
> could improve performance somewhat (if things are "short-lived, but
> not THAT short-lived.")
> 
> 
> 
> 
> 
> 
> On Thu, 25 Jun 2009, Jon S. Anthony wrote:
> 
> > Hi,
> >
> > This will likely be a few exchanges...
> >
> > As part of my ongoing porting/open sourcing the graph store/semantic
> > engine system I have developed I am now educating myself on various
> > details of CCLs GC.  This is to enable understanding the basic behavior
> > for various possible tuning scenarios in different circumstances. I have
> > a pretty good grip on various details of ACL's GC when originally
> > looking into this sort of stuff, but (being implementation dependent - a
> > good thing here...) CCL's is different.
> >
> > If you have a bunch of objects that will be allocated up front for some
> > processing, and you know that these will basically be around for the
> > life of the program (more or less), it can be useful to just allocate
> > them from the start in old space.  In ACL you can do this by setting the
> > generation-spread to 0 and tenure-limit to nil.  Basically this means
> > new things have no generations to move through - just go directly to old
> > space - and inhibit GC (tenure "unlimited" amount until next trigger).
> > Of course, you also need to request enough old space up front.
> >
> > CCL's GC has three basic generations for new space allocations: young,
> > medium, old.  I don't think old here is true old space, but simply the
> > oldest new space generation before things get tenured to old space.  But
> > I could be wrong, but assuming I'm not,
> >
> > (lisp-heap-gc-threshold new-threshold) should basically do the job of
> > requesting enough space, assuming you can request a full GC.
> >
> > Is there a means of requesting a full GC?  ccl::freeze does this, but
> > with the (undoubtedly useful as times) side effect of pinning everything
> > left alive as non relocatable.  That's a very neat feature, but is
> > primary reason for freeze - not the full GC it also does.
> >
> >
> > Next, if you set the generation 0-2 sizes all to 0, would this indicate
> > to the GC that all new things are just immediately placed in old space?
> > (like setting generation-spread to 0 in ACL).  I'm not sure about this
> > one, especially as the documentation says that whatever is requested is
> > rounded up to a multiple of 64KB (of course 0 is such a multiple,
> > but...)
> >
> > Along with this - are there multiple old spaces, and if so do they
> > behave analogous to the generations in new space (ACL does something
> > like this - it may not even make sense for CCL's GC but I thought I'd
> > ask...)
> >
> > Let's start with this and see what happens :-)
> >
> > Thanks in advance for any info!
> >
> > /Jon
> >
> >
> >
> > _______________________________________________
> > Openmcl-devel mailing list
> > Openmcl-devel at clozure.com
> > http://clozure.com/mailman/listinfo/openmcl-devel
> >
> >




More information about the Openmcl-devel mailing list