[Openmcl-devel] GC aspects, tuning, behavior

Gary Byers gb at clozure.com
Mon Jun 29 05:49:58 PDT 2009



On Sun, 28 Jun 2009, Jon S. Anthony wrote:

>
>>> There is some promotion policy (not mentioned, but maybe based on
>>> percent used of the regions for each generation and/or the number of
>>> minor collections an item survives), and minor collections (those
>>> involving only new space - what you seem to call EGC collections) are
>
> Could you give a thumbnail sketch of what the promotion policy is?  That
> could give interesting hints about behavior under various conditions.
>

There isn't a promotion policy as such (or there's a ridiculously
simple one): anything that survives a GC of generation N is
immediately incorporated (promoted, tenured) into generation N+1.
This can in some hopefully rare cases cause (extremely) premature
tenuring: since generations less than N are collected whenever
generation N is collected, in the worst case, this can mean that
something that survives a single GC can get tenured directly into
old space (where it can only be recovered by a full GC.)

How often this happens (and how much it costs) is
application-dependent; when compiling the lisp itself - something that
can create both a lot of short-lived objects and a lot of stuff whose
lifetime is sort of "intermediate" - the ratio of generation 0 to
generation 2 collections is about 50:1 (doubling the size of
generation 2 makes that ratio closer to 100:1, not surprisingly.)  So,
with the default parameters in that case, about 2% of all GCs are of
generation 2 (and therefore of generations 1 and 0 as well); with the
default EGC configuration, about 1/7 (2MB out of 14) of the memory in
the region being GCed was newly allocated, and typically only a small
percentage of newly-allocated objects survive even 1 GC.  (If I'm
doing the math right, then this particular worst-case scenario - where
something is tenured to oldspace after it survives a single GC -
happens "a small percentage" (the generation 0 survival rate) of
2% * 1/7 = .28% (.0028) of the time.)  A more elaborate
promotion/tenuring policy could avoid this particular worst-case
behavior, but any such policy is heuristic (trying to predict the
lifetime of an object based on its history) and I'm not convinced
that a more elaborate tenuring policy necessarily offers a better
heuristic.  (I confess that I haven't thought about it or studied the
issues in 10 years or more; 10 years ago, the results that people
claimed or observed were in the noise.)
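
For reference, the generation sizes can be inspected and changed with
CCL:EGC-CONFIGURATION and CCL:CONFIGURE-EGC, which deal in kilobytes.
A minimal sketch - the particular numbers below are only illustrative,
not CCL's defaults:

     ;; Inspect the current generation sizes (three values, in KB).
     (ccl:egc-configuration)

     ;; Enlarge generation 2 - the sort of change behind the "doubling
     ;; the size of generation 2" experiment above.  Disabling the EGC
     ;; around the reconfiguration is the conservative approach.
     (ccl:egc nil)
     (ccl:configure-egc 2048 4096 16384)
     (ccl:egc t)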



>
>> ... Since the scavenge
>>> technique uses order-preserving compaction, the piece of Gen I that Gen
>>> I+1 eats will be (most likely anyway) the oldest (via some measure -
>>> probably collection cycles).  However that probably means there will be
>>> a gap in Gen I+1 between its last set of non-promoted collected
>>> survivors and the newly eaten Gen I items.  That gap will be removed on
>>> the next collection (via compaction) of Gen I+1.
>>>
>>
>> There's never a gap.
>>
>> When the youngest generation fills up, then at the very least that generation
>> is collected and compacted; the GC operates on the (generally small) region
>> of memory between the youngest generation's start and end.
>
> Yes, but the "gap" bit was about when things from a younger generation
> are promoted to an older generation during a scavenge of just the
> younger generation (no older scavenge triggered - but maybe you are
> saying that can't happen?)

It is always the case - after any kind of GC - that all live memory is
compacted into a contiguous region ("on the left") and therefore free
memory is a single contiguous region on the right.

>
> My understanding was that promotion was achieved by simply increasing
> the top pointer of the older generation (Gen I+1) so that it now
> included part of what was once the younger generation (Gen I).  The gap
> would then be the piece of space between the last currently believed
> live object of Gen I+1 and the oldest just promoted object from Gen I:
>
> ...|XXXXX ...|YYYYYYYY...|  <--- Before scavenge triggers promotion
>
>     I+1       I         ^-- "Highest currently allocated address"
>
>             ^----- "top of I+1, before promotion"
>
>
> ...|XXXXX ...YYYY|YYYY...|  <---- After scavenge promotes some of I
>
>                 ^--- New top of I+1 effects promotion
>
>           ^-- the "gap"
>
> Or, maybe such a promotion always triggers a mark compact of I+1??  Or
> maybe I have just misunderstood the description.

The highest allocated address and all of the Ys that survive the GC
will have moved to the left.
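
In your notation, the picture after the GC looks like this - the
surviving Ys (and the highest allocated address) have all moved left:

   ...|XXXXX YYYY|

                 ^-- new highest allocated address; free memory is the
                     single contiguous region to its right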

>
>
>>> Of course any collection for an older generation will collect younger
>>> ones as well (due to the usual idea of only wanting to track - via write
>>> barriers - old-young pointer changes).
>>
>> And if, at the point when the youngest generation is full and a GC is
>> triggered, the next oldest generation's size exceeds its threshold, it's
>> also collected (and likewise for the oldest ephemeral generation.)  This
>> means that the GC operates on (marks and compacts) objects in the contiguous
>> region between the start of the oldest ephemeral region and the
>> end of the youngest ephemeral region.
>
> Yes, sure.
>
>
>>> When new space fills up completely, a full/global/major collection is
>>> triggered and promotable items in the 3 new space generations are
>>> "moved" to old space (again, probably by simply advancing the "top"
>>> pointer of old space).  It also does a collection (mark/compaction) on
>>> the items in old space (which probably includes the newly promoted things
>>> as that step likely happens first).  Old space's "top" (right most)
>>> pointer is adjusted to just past the last live thing in old space.  If there
>>> is not enough space freed up in old space to account for all the newly
>>> promoted objects plus the ones remaining in new space so that
>>> LISP-HEAP-GC-THRESHOLD space is available for new space, some extra
>>> space is allocated to the "top" (right most) of new space to achieve
>>> this limit.
>>
>> Yes, basically.
>>
>> If it helps: the GC (whether "ephemeral" or "full") always operates
>> on a region of allocated memory between the start of allocated space
>> and its end.  A full GC operates on all of allocated space; an ephemeral
>> GC operates on a generally small region whose start is nearer the end.  In
>> all cases (for simplicity's sake), the live objects in the region that
>> the GC operates on are compacted towards the start of the region, generally
>> freeing up memory near the end of allocated space/start of free space.  The
>> differences (really the only ones) between full and ephemeral GC have to do
>> with the size (start) of the region being GCed and with the fact that the
>> ephemeral GC has to consider pointers from old objects to new ones.
>
> Yes, that's how I understood it was working at this level.
>
>
>> The exception to the "everything's always fully compacted" rule has to
>> do with CCL::FREEZE, which you mentioned earlier.  CCL::FREEZE does a
>> full GC and notes the end of allocated space after the GC; subsequent
>> full GCs (including those invoked by FREEZE) will not compact below/to
>> the left of that point.
>
> Makes sense.
>
>
>>  (though a full GC will zero out the words occupied
>> by "frozen" objects that become garbage.)  (So frozen objects are
>> tenured in the sense that they're immune from relocation due to
>> compaction, but they're not immortal and can die in office.)
>
> Is that zeroed out space actually reusable?  Or do the dead continue to
> take up office space, in the "Peter Principle" analogy...
>

The free chunks are of varying size.  They can be linked into a list
(turning the GC into something like a mark/sweep collector, at least
as far as frozen space is concerned) and recycled as "static" cons
cells.  "Static" cons cells can be used as address-based hash table
keys without incurring rehashing costs that would be present if those
address-based keys moved due to GC activity.
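
A minimal sketch of that use case.  The allocator name here is an
assumption - CCL::STATIC-CONS is believed to be the internal entry
point for allocating "static" cons cells; substitute the real name if
it differs in your CCL:

     ;; An EQ hash table hashes keys by address, so if a key moves
     ;; during a GC the table has to be rehashed.  A "static" cons
     ;; never moves, so its hash stays valid across GCs.
     (defvar *by-identity* (make-hash-table :test #'eq))

     (let ((key (ccl::static-cons 'some-tag nil)))  ; assumed allocator
       (setf (gethash key *by-identity*) 'some-value))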

>
>>> The basic scheme does not treat "large" objects specially - I don't
>>> think that buys much either, especially nowadays.
>>>
>>> There is no way to invoke or trigger a minor collection (egc
>>> collection).  ccl:gc invokes a full/major/global collection.  That's not
>>> so good, IMHO, but I guess it is what it is.
>>
>> It's not hard to implement, but I've always found it hard to understand
>> why someone would want to invoke a minor collection manually: there's
>> a fairly good chance that one just happened and an equally good chance
>> that one's about to, and it's not immediately clear that one can accurately
>> predict the effects of an ephemeral collection.  (For a full GC, you can
>> more reliably predict that "doing one now makes it less likely that you'll
>> do one in the middle of the big demo", for example; I can't fill in the
>> blank in "I want to force an ephemeral GC now, so that ____.")
>
> I think this (and see below about the other point) is more useful in a
> copying collector where you can also control the promotion policy.  It
> also depends on your knowing a fair amount about what is happening in
> certain special cases.  But as I'm thinking of it right now, those
> actually have more to do with preventing "premature" minor collections
> than triggering them.
>
>
>>> There is no way to tell the GC/allocator/mutator that a "new" item is
>>> really an old space item, i.e., just allocate it up front in old space.
>>> I don't think that is so good either - there are definitely cases where
>>> this is a major win.
>>
>> As I said, I used to be more concerned about this than I am now (or, more
>> accurately, every time that I've convinced myself that it'd be a big win
>> to do some sort of static allocation, I've been proven wrong.)  I can't
>> say that your intuition about this is wrong, but I can certainly say that
>> mine often has been.
>
> This isn't so much about intuition as simple observations in the past -
> but that has been with copying collectors and mostly with one specific
> collector.
>
>
>> The "something else" of course is that the object stopped moving after
>> a few GC iterations.
>
> Sure, and of course that happens in copying collectors as well.  But,
>
>
>> In a straightforward 2-space copying GC, a long-lived object's address
>> would likely change on every GC until something decides that it'd be
>> better to keep it in a static memory area,
>
> That's really no different from what is happening in the compaction
> collector.  However, there is typically less work being done for such
> things in the compactor since they don't need to be copied except during
> promotion compactions.  Scavenge compactions in a generation (region)
> will only move them if something older than they dies before promotion.
> Which in the sort of case being discussed would not likely be happening.
> In the copying collector, they will be copied on each scavenge.
>
>
>>  and it might indeed be a
>> big win to just allocate it in that static area in the first place.
>> In a compacting GC, long-lived things -tend- to spend large parts of
>> their lifetimes at stable addresses (aren't moved by the compaction
>> process), though it may take a few GC cycles to reach stability
>
> Again, that's basically true of copying collectors as well.  The real
> issue here is that if you know that a lot (hundreds of thousands to
> millions) of objects are going to be created that are not going to die
> during a given program run, then those "GC cycles" can add up during the
> "bulk load" process (a few milliseconds here a few milliseconds there -
> pretty soon you're talking real time).  But again, this will be rather
> more pronounced for a copying collector as the cycles will typically
> always move the objects until they actually are tenured to an old/aging
> space.  In a compacting collector (certainly the one described for CCL
> here) they will tend not to move except possibly during a promotion.
> That's a big difference.

>
>
>>  and
>> the object is still (potentially) subject to compaction.
>
> That would be the equivalent of possibly being dumped during a
> global/major GC in a copying collector.  However, notice that for this
> case, the copying collector comes out rather better: you don't need to
> do any work since it won't be copied to the Tospace, but the compactor
> will very likely need to move stuff into the vacated space.  There is no
> free lunch ...
>
>
>>  but since the
>> tradeoff is between static allocation and "dynamic allocation that's
>> in fact nearly static" (as opposed to "dynamic allocation where space
>> flips happen on every GC"), the degree of ... winnitude is certainly
>> different.
>
> Yes.
>
>
>>> Is this more or less on track?
>>
>> I think so.
>
> I think we are pretty near closure (!) here.  This thing is sufficiently
> different that there really isn't a whole lot you can tune.  However,
> let me ask about this scenario for the "loads of large numbers of
> known-to-be-undying objects" case:
>
> Start with a major/global/full GC.  Save the current value of
> lisp-heap-gc-threshold.  Set the threshold limits of gen-1 and gen-2 (NOT
> gen 0) to something very small.  Actually how about just 0.  Set
> lisp-heap-gc-threshold "high" - basically enough to hold all the objects
> to be created.  Create objects.  There should be no minor GC during this
> as they should all fit in Gen 0.  Do a full GC.  Set
> lisp-heap-gc-threshold back to its saved (previous) value.  I guess this
> part depends on promotion policy: Will all these things be moved to old
> space at this point?  Or do you need to do a "few" GC's to trigger the
> promotion?  Or will it all just go "KA-BOOM!"  Or...
>

You lost me there.  If you do a full GC, then everything that's been allocated
will be GCed and compacted and considered "old"; that's what a full GC does.
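
In code, the scenario you describe collapses to something like the
sketch below.  CCL:SET-LISP-HEAP-GC-THRESHOLD and
CCL:USE-LISP-HEAP-GC-THRESHOLD are the real operators for manipulating
the threshold; LOAD-ALL-OBJECTS is a hypothetical stand-in for
whatever creates your objects, and the threshold value is illustrative:

     (let ((old-threshold (ccl:lisp-heap-gc-threshold)))
       ;; Pick a threshold big enough to hold everything you're about
       ;; to create (illustrative number).
       (ccl:set-lisp-heap-gc-threshold (* 512 1024 1024))
       (ccl:use-lisp-heap-gc-threshold)   ; grow free space now
       (load-all-objects)                 ; hypothetical bulk load
       (ccl:gc)                           ; full GC: survivors are now "old"
       (ccl:set-lisp-heap-gc-threshold old-threshold)
       (ccl:use-lisp-heap-gc-threshold))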

Ephemeral (generational, lifetime-based) GCs try to exploit the
heuristic that "most of the time, most newly-created objects are
short-lived" and therefore concentrating GC effort on discovering
which newly-created objects are live and which are garbage will be
worthwhile.  If that heuristic assumption is false (if a high percentage
of newly-created objects are live after GC), that GC effort is wasted.

A bulk-load (something that creates and initializes a lot of
long-lived objects) likely does violate the "most new objects die
young" assumption, and Matt's suggestion (turning off the EGC around
the bulk load) may make that load/initialization run faster (the EGC
won't be spinning its wheels discovering that most of these new
objects are long-lived) and will get you to the same place (the
newly-allocated objects will effectively wind up in old space, where
they belong.)  Obviously, if you can do some of this initialization
before saving an image, that's likely to be better than having to do
it on startup, but if you're concerned about it that's probably not
practical in your case.
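
In code, Matt's suggestion is roughly (a sketch; LOAD-ALL-OBJECTS is
the same hypothetical loader as above):

     (ccl:egc nil)                ; suspend ephemeral collections
     (unwind-protect
         (load-all-objects)       ; hypothetical: create the long-lived data
       (ccl:gc)                   ; one full GC leaves the survivors "old"
       (ccl:egc t))               ; re-enable the EGC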

You might want to just time your initialization code with EGC on and
with it off; I wouldn't be shocked if the EGC spent a lot of time
doing nothing useful, but it's probably worth trying to verify that:
even if you're (primarily) creating a lot of long-lived objects, the
process of doing so may or may not also create a lot of short-lived
objects that the EGC likes.
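
Something as crude as this comparison would tell you (same
hypothetical loader; do a full GC between the runs so each one starts
from a comparable heap):

     (ccl:egc t)
     (time (load-all-objects))    ; with the EGC
     (ccl:gc)
     (ccl:egc nil)
     (time (load-all-objects))    ; without the EGC
     (ccl:egc t)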



