[Openmcl-devel] GC aspects, tuning, behavior

Sat Jun 27 13:25:12 PDT 2009

On Fri, 26 Jun 2009, Jon S. Anthony wrote:

>
> Let me see if I'm getting the basic claims here. First, CCL's
> generational GC is not a copying type collector but a mark/compact
> collector (that alone is somewhat interesting as it must be one of a
> very few that take this route).  So, it doesn't have semi-spaces
> (hemispheres) in either new (creation/young) space or in old (aging)
> space.  It really only has one old/aging space (the so called region "on
> the left").

At some point in time, I seem to remember hearing that Microsoft's .NET
GC uses a mark/compact scheme that's eerily similar to CCL's.  (I may
be misremembering that.)

>
> It uses 3 generations - all in new/creation space (the so called region
> "on the right").  This space is of size LISP-HEAP-GC-THRESHOLD.  It is
> split into three regions - one for each generation.  Youngest
> (generation 0) is probably "right most" with gen 1 adjacent to the left
> and gen 2 adjacent to gen 1 on the left.  I don't think the sizes of
> these regions is user accessible, but the "threshold limits" (for when
> to trigger a minor collection) for them are - they are the
> egc-configuration values.  While these probably affect the size of the
> regions in some way as well (under the overall constraint of
> LISP-HEAP-GC-THRESHOLD), there is no way to configure the sizes of these
> regions directly - only the overall size they are carved from:
> LISP-HEAP-GC-THRESHOLD.
>

Since they aren't statically sized regions, there is indeed no way to
configure their size; the threshold limits basically have to do with
when the region gets GCed.

> Since you don't use a copying collector these spaces are not split into
> semi-spaces.  Also, for the same basic reason, it doesn't sound like you
> use the technique of using older generations as "Tospace" areas for a
> given scavenge.

Right; it's not a multispace, copying collector.

>
> There is some promotion policy (not mentioned, but maybe based on
> percent used of the regions for each generation and/or the number of
> minor collections an item survives), and minor collections (those
> involving only new space - what you seem to call EGC collections) are
> performed by mark/compaction in each region.  When promotion occurs, for
> Gen I, the items to be promoted (a subset of the live ones) are moved to
> Gen I+1.  However, a somewhat interesting wrinkle on this is that the
> promotion is done by simply advancing the "top" pointer of Gen I+1's
> region - Gen I+1 sort of eats part of Gen I.  Since the scavenge
> technique uses order preserving compaction, the piece of Gen I that Gen
> I+1 eats will be (most likely anyway) the oldest (via some measure -
> probably collection cycles).  However that probably means there will be
> a gap in Gen I+1 between it's last set of non promoted collected
> survivors and the newly eaten Gen I items.  That gap will be removed on
> the next collection (via compaction) of Gen I+1.
>

There's never a gap.

When the youngest generation fills up, then at the very least that generation
is collected and compacted; the GC operates on the (generally small) region
of memory between the youngest generation's start and end.

> Of course any collection for an older generation will collect younger
> ones as well (do to the usual idea of only wanting to track - via write
> barriers - old-young pointer changes).

And if, at the point when the youngest generation is full and a GC is
triggered the next oldest generation's size exceeds its threshold, it's
also collected (and likewise for the oldest ephemeral generation.)  This
means that the GC operates on (marks and compacts) objects in the contiguous
region region between the start of the oldest ephemeral region and the
end of the youngest ephemeral region.

>
> When new space fills up completely, a full/global/major collection is
> triggered and promotable items in the 3 new space generations are
> "moved" to old space (again, probably by simply advancing the "top"
> pointer of old space).  It also does a collection (mark/compaction) on
> the items in old space (which probably includes the new promoted things
> as that step likely happens first).  Old space's "top" (right most)
> pointer is adjusted to just past last live thing in old space.  If there
> is not enough space freed up in old space to account for all the newly
> promoted objects plus the ones remaining in new space so that
> LISP-HEAP-GC-THRESHOLD space is available for new space, some extra
> space is allocated to the "top" (right most) of new space to achieve
> this limit.

Yes, basically.

If it helps: the GC (whether "ephemeral" or "full") always operates
on a region of allocated memory between the start of allocated space
and its end.  A full GC operates on all of allocated space; an ephemeral
GC operates on a generally small where the start is nearer the end.  In
all cases (for simplicity's sake), the live objects in the region that 
the GC operates on are compacted towards the start of the region, generally
freeing up memory near the end of allocated space/start of free space.  The
differences (really the only ones) between full and ephemeal GC have to do
with the size (start) of the region being GCed and with the fact that the
ephemeral GC has to consider pointers from old objects to new ones.

The exception to the "everything's always fully compacted" rule has to
do with CCL::FREEZE, which you mentioned earlier.  CCL::FREEZE does a
full GC and notes the end allocated space after GC; subsequent full
GCs (including those invoked by FREEZE) will not compact below/to the
left of that point (though a full GC will zero out the words occupied
by "frozen" objects that become garbage.  (So frozen objects are
tenured in the sense that they're immune from relocation due to
compaction, but they're not immortal and can die in office.)

>
> The basic scheme does not treat "large" objects specially - I don't
> think that buys much either, especially nowadays.
>
> There is no way to invoke or trigger a minor collection (egc
> collection).  ccl:gc invokes a full/major/global collection.  That's not
> so good, IMHO, but I guess it is what it is.

It's not hard to implement, but I've always found it hard to understand
why someone would want to invoke a minor collection manually: there's
a fairly good chance that one just happened and an equally good chance
that one's about to, and it's not immediately clear that one can accurately
predict the effects of an ephemeral collection.  (For a full GC, you can
more reliably predict that "doing one now makes it less likely that you'll
do one in the middle of the big demo", for example; I can't fill in the
blank in "I want to force an ephemeral GC now, so that ____.")

>
> There is no way to tell the GC/allocator/mutator that a "new" item is
> really an old space item, i.e., just allocate it up front in old space.
> I don't think that is so good either - there are definitely cases where
> this is a major win.

As I said, I used to be more concerned about this than I am now (or, more
accurately, every time that I've convinced myself that it'd be a big win
to do some sort of static allocation, I've been proven wrong.)  I can't
say that your intuition about this is wrong, but I can certainly say that
mine often has been.

To try to explain some of the basis for my skepticism: a live object is
only moved (compacted) by CCL's GC if some older object(s) (to the left)
become garbage.  If we create a new object (a function, just because that's
easy to do and functions' printed representations contain their addresses)
and GC a few times, printing the function (and observing its address) after
each GC, we can see the effects of compaction (and something else as well):

? (defun foo (x) x)
FOO
? #'foo
#<Compiled-function FOO #x3000411A760F>
? (gc)
NIL
? #'foo
#<Compiled-function FOO #x300040F35DAF>
? (gc)
NIL
? #'foo
#<Compiled-function FOO #x300040F357AF>
? (gc)
NIL
? #'foo
#<Compiled-function FOO #x300040F3576F>
? (gc)
NIL
? #'foo
#<Compiled-function FOO #x300040F3576F>
? (gc)
NIL
? #'foo
#<Compiled-function FOO #x300040F3576F>

The "something else" of course is that the object stopped moving after
a few GC iterations.  (I did this in a fresh image that generally contained
some things that're reinitialized when the session starts; after that's
happened, it might take fewer iterations for things to reach a point of
relative stability.)

#'FOO isn't nailed down/pinned in memory, so its address might change
on some future GC if some older object(s) become garbage.  Old objects
tend to be ... long-lived, so in practice things tend to "settle" and
not be affected by compaction for relatively long stretches of time.
In a straightforward 2-space copying GC, a long-lived object's address
would likely change on every GC until something decides that it'd be
better to keep it in a static memory area, and it might indeed be a
big win to just allocate it in that static area in the first place.
In a compacting GC, long-lived things -tend- to spend large parts of
their lifetimes at stable addresses (aren't moved by the compaction
process), though it may take a few GC cycles to reach stability and
the object is still (potentially) subject to compaction. There might
still be scenarios where static allocation could win (by offering
stronger guarantees that the object wouldn't move or by avoiding the
"few" GC cycles that it takes to reach quiescence), but since the
tradeoff is between static allocation and "dynamic allocation that's
in fact nearly static" (as opposed to "dynamic allocation where space
flips happen on every GC"), the degree of ... winnitude is certainly
different.

> But given your basic scheme, I can see how this
> would be non trivial to achieve: There's only one old space.  It's fully
> compacted (modulo all current knowledge of live items in it).  There's
> simply no place to use to achieve direct old space allocation.  The idea
> you mention - moving everything to the "right", allocating, moving
> everything "left" - is indeed "crazy".

If this does prove to be necessary, the less crazy way of doing it would
involve reserving some address space at the extreme left and doing static
allocation there.  In 64-bit CCL, there's some other stuff a bit less than
1GB away (to the left), so there isn't currently an infinite amount of room
to grow in that direction, but this wouldn't involve moving anything else
out of the way.

>
> Is this more or less on track?

I think so.

Bill St Clair once wrote a little graphic utility for MCL (the "GC
thermometer", if anyone remembers) that displayed a representation of
the heap in a wide/short window at the bottom of the screen; it
updated itself every second or so and basically provided an animated
view of GC activity.  If something like that still existed, it'd likely
make some of this stuff easier to understand (and certainly easier to
visualize.)

>
>
> /Jon
>
>
> On Thu, 2009-06-25 at 19:33 -0600, Gary Byers wrote:
>> CCL's GC basically manages a single contiguous region of memory; you
>> can think of that region as being divided into two subregions.  The
>> subregion "on the left" contains allocated objects (things that survived
>> the last GC) and the subregion "on the right" contains free space.  When
>> an image is first loaded and after a full GC, the size of the free space
>> is determined by LISP-HEAP-GC-THRESHOLD.
>>
>> Conceptually (it doesn't really work exactly like this), free space fills
>> up from left to right; when free space fills up completely, a full GC
>> is triggered.  The GC identifies which objects are live, then does an
>> in-place, order-preserving compaction of live objects ("towards the left"),
>> creating contiguous free space on the right.  If necessary, more pages
>> are obtained from the OS so that the free space is of a size determined
>> by LISP-HEAP-GC-THRESHOLD; in some cases, the heap can shrink and unneeded
>> pages can be returned to the OS.  (There's a large - possibly very large -
>> chunk of reserved address space "to the right" of the free area.)
>>
>> The fact that garbage is returned to (contiguous) free space by
>> in-place, order-preserving compaction offers some useful properties.
>> If allocated object X is "to the left of" (has a lower address than)
>> object Y, then it is generally true that X is older than Y (X has
>> survived more GC cycles than Y has), though X and Y may be of the same
>> vintage/generation.  If we wanted to, we could keep some extra
>> bookkeeping information and identify each subregion of allocated space
>> that had survived I full GCs for I = 1 .. N, and part of that bookkeeping
>> might involve adjusting some pointers if/when things moved around due to
>> compaction.
>>
>> We don't actually do that, but we do use a similar scheme to keep track
>> of things on the right end of allocated space (maintaining regions that
>> describe "newly allocated things", "things that've survived a few GCs",
>> "things that've survived quite a few GCs but might still become garbage
>> soon.")  These regions are the EGC's "generations", and they're basically
>> just a pair of pointers to the start and end of a region in allocated
>> space and a threshold/limit value that affects EGC policy (how big the
>> youngest generation can get before the EGC is invoked, how big the older
>> ephemeral generations can get before the EGC looks at them as well as
>> looking at the youngest generation.)  The EGC's essentially doing the
>> same work as the full GC does - identifying live objects and doing an
>> in-place order-preserving compaction on them - only it's doing that
>> work on (generally) small subregions of at the extreme right end of
>> allocated space.  To the extent that it's true that a high percentage
>> of newly-allocated objects are short-lived, this work is productive.
>> (That's true of all ephemeral/generational GC schemes; the only things
>> that might be atypical of CCL's EGC is that promoting/tenuring objects
>> from one generation to the next is done by adjusting the bounds of
>> the generations, not by copying things to other memory areas.)
>>
>> There are a lot of things to like about this scheme; it makes
>> allocation quick (since free memory is contiguous) and it generally
>> takes time proportional to the number of live objects.  The worst-case
>> scenarios - which would involve copying large objects around during
>> in-place compaction - tend not to happen all that often (or tend to
>> happen a few times before things stabilize.)  In practice, long-lived
>> objects tend not to move around, though in theory they could.
>>
>> The scheme doesn't special-case "large" objects (they tend to behave
>> like the pig in the python: the pig eventually gets from one end of
>> the python to the other, but it's not always pleasant for pig or
>> python and may involve some premature tenuring.)  There's no way to
>> allocate an object and declare it to be "old", though this could be
>> done (essentially by pushing everything to the right and allocating
>> the object at or near the extreme left.)  I can imagine cases where
>> this would be worthwhile and desirable, but in most cases where I'd
>> have been concerned about this that concern has been unfounded.
>>
>> You can manually invoke a full GC via (CCL:GC).  There's no way
>> to trigger the EGC manually.
>>
>> I don't remember if anything checks for it, but setting the size
>> of generation 0 to 0 will probably cause the EGC to be triggered
>> unproductively (if it doesn't cause worse behavior.)  Setting the
>> size of generations 1 or 2 to 0 could be done but may not be
>> very useful; making the generations slightly larger than the defaults
>> could improve performance somewhat (if things are "short-lived, but
>> not THAT short-lived.")
>>
>>
>>
>>
>>
>>
>> On Thu, 25 Jun 2009, Jon S. Anthony wrote:
>>
>>> Hi,
>>>
>>> This will likely be a few exchanges...
>>>
>>> As part of my ongoing porting/open sourcing the graph store/semantic
>>> engine system I have developed I am now educating myself on various
>>> details of CCLs GC.  This is to enable understanding the basic behavior
>>> for various possible tuning scenarios in different circumstances. I have
>>> a pretty good grip on various details of ACL's GC when originally
>>> looking into this sort of stuff, but (being implementation dependent - a
>>> good thing here...) CCL's is different.
>>>
>>> If you have a bunch of objects that will be allocated up front for some
>>> processing, and you know that these will basically be around for the
>>> life of the program (more or less), it can be useful to just allocate
>>> them from the start in old space.  In ACL you can do this by setting the
>>> generation-spread to 0 and tenure-limit to nil.  Basically this means
>>> new things have no generations to move through - just go directly to old
>>> space - and inhibit GC (tenure "unlimited" amount until next trigger).
>>> Of course, you also need to request enough old space up front.
>>>
>>> CCL's GC has three basic generations for new space allocations: young,
>>> medium, old.  I don't think old here is true old space, but simply the
>>> oldest new space generation before things get tenured to old space.  But
>>> I could be wrong, but assuming I'm not,
>>>
>>> (lisp-heap-gc-threshold new-threshold) should basically do the job of
>>> requesting enough space, assuming you can request a full GC.
>>>
>>> Is there a means of requesting a full GC?  ccl::freeze does this, but
>>> with the (undoubtedly useful as times) side effect of pinning everything
>>> left alive as non relocatable.  That's a very neat feature, but is
>>> primary reason for freeze - not the full GC it also does.
>>>
>>>
>>> Next, if you set the generation 0-2 sizes all to 0, would this indicate
>>> to the GC that all new things are just immediately placed in old space?
>>> (like setting generation-spread to 0 in ACL).  I'm not sure about this
>>> one, especially as the documentation says that whatever is requested is
>>> rounded up to a multiple of 64KB (of course 0 is such a multiple,
>>> but...)
>>>
>>> Along with this - are there multiple old spaces, and if so do they
>>> behave analogous to the generations in new space (ACL does something
>>> like this - it may not even make sense for CCL's GC but I thought I'd
>>> ask...)
>>>
>>> Let's start with this and see what happens :-)
>>>
>>> Thanks in advance for any info!
>>>
>>> /Jon
>>>
>>>
>>>
>>> _______________________________________________
>>> Openmcl-devel mailing list
>>> Openmcl-devel at clozure.com
>>> http://clozure.com/mailman/listinfo/openmcl-devel
>>>
>>>
>
>