[Openmcl-devel] sharing data between fortran and OpenMCL

Fri Oct 1 20:52:30 PDT 2004

I'll answer this question generally and let Gary answer it specifically 
to openmcl. If my answer is less than clear it's because I haven't read 
my garbage collection book lately -- good book though: "Garbage 
Collection" by Jones & Lins.

Here are the big two reasons to do garbage collection:
1. Memory allocation/deallocation errors are one of the top three 
causes of bugs in ALL software without language-supported full GCing.
2. In languages with aliasing (i.e. all languages that you'd use), it 
is IMPOSSIBLE in general short of GCing to know when something can be 
freed. In other words it is impossible to not leak memory without 
GCing.

However, garbage collectors don't all move data around. Roughly 
speaking there are three kinds of garbage collection algorithms that I 
can think of off hand, namely mark-and-sweep, coloring/copying, and 
conservative. I think openmcl and MCL use compacting mark-and-sweep GC. 
The reason for moving the data is usually so that you get deallocation 
of garbage for free and don't have to touch everything -- only 
non-garbage -- when you collect, and so that allocating new objects is 
essentially free -- if all uncollected/allocated space is contiguous 
you just advance a free pointer. Reasons for not moving data are 
generally that less space is needed  to run the system (although with 
VM this advantage is questionable) and that pointers are stable.

Copying collectors are possibly the most popular, they are easy to turn 
into generational collectors for one thing and the algorithm is the 
standard 3-coloring graph traversal algorithm with some bells and 
whistles added for what to do with the nodes and arcs. Namely, you GC 
over everything in "oldspace" and when you first see an object you copy 
it into "newspace" and replace the version in "oldspace" with a 
forwarding pointer and set its color to grey from black. Then you 
enqueue all its pointers for processing. When all the pointers in an 
object have been updated to refer to newspace versions of things in an 
object, you color it white and you're done touching it. When everything 
is white you are done and can summarily ignore oldspace from then on, 
there are no more pointers into it and no one will ever look there 
again. The next time you GC (when newspace fills up with allocated 
objects), you swap the names of oldspace and newspace and copy every 
reachable object back into the new newspace. Allocating an object is 
just advancing the free pointer into newspace, and GC is triggered when 
you can't advance that pointer any further. (replace old and newspace 
with onespace twospace... nspace and you have a generational collector, 
which requires a couple more tricks but fewspace collectors are pretty 
popular since recently-allocated things are killed off almost as fast 
in many types of applications)

mark-and-sweep collectors are similar -- the same graph traversal 
usually happens, but when you see an object you flag it instead of 
copying it elsewhere. No pointer rearrangement happens in this phase, 
and if you're not compacting you just go through and free all the 
unflagged memory, building up a new freelist of memory to allocate 
from, and you're done. If you want to do compaction, you then go and 
copy all the flagged stuff in order to the front of memory, updating 
pointers as you go much as in a copying collector but I'd imagine with 
some additional tricks since you're overwriting some of the space you 
were just using seconds before.

Conservative collectors and non-compacting mark-and-sweep collectors 
don't actually move stuff around. Conservative collectors are like 
mark-and-sweep collectors, but don't depend on type bits to do their 
job -- if something looks like it might be a valid pointer (i.e. it's a 
word of data that has a value which if interpreted as a pointer points 
into a live block of memory), it's considered a valid pointer, and 
anything it points to is marked as "possibly not garbage" -- sort of 
like when I clean my house, actually... anyway conservative collectors 
normally do a fine job and eventually will catch all your garbage 
(unless you have unfortunate constants sitting around that are mistaken 
for pointers to things). For them as with noncompacting mark-and-sweep 
systems you have to maintain free lists.

In languages like lisp, often typical programming style 
pre-optimization will have objects allocated and deallocated so much 
that you often end up with lots of garbage and do a lot of temporary 
allocation of things, so not touching garbage is a good thing and 
making allocation fast is a good thing. Typically once things work (and 
they will work sooner than in other languages, or so the story goes) 
you can do profiling of time and space usage and pound on the time and 
space hogs to make them faster and cons less gratuitously, but then -- 
and this is the nifty part -- you don't have to rearrange your 
memory-freeing strategy at all! You just allocate less, or less on the 
heap (allocating on the stack instead, or even shudder in non-GCing 
C-space temporarily), and everything else takes care of itself!

One nice thing about not moving objects is that FFI calls into C and 
C++ and such can be made on lisp objects, because they won't move 
(which C/C++ has no concept of and couldn't track), you just have to 
worry about the object evaporating out from under the FFI call, and 
that's easy to deal with by extending the root set of the collection 
algorithm (uh... look it up, it's just the starting point of whatever 
GC algorithm). Another really good reason for not moving things around 
is that it's easier to make a real-time GCer because you don't ever 
have to worry about objects being in odd states when you're 
interrupting the collector (you just have to worry about the GCer 
knowing what the heck is going on so it can be sure when an object is 
not free-able, it's way better to be a little conservative about 
keeping things around than freeing them too soon).

Oh, and re: data -- people have studied allocation patterns 
EXtensively, I would be surprised if there aren't a ream or two of 
papers on the subject, and GCing, and not GCing, and books and 
workshops at conferences and so on. Garbage collection is 15+ years old 
and heavily researched. Generally, I think it's considered to be a 
solved problem with I believe one exception -- the real time concurrent 
collection case. (Now, if it's a solved problem why doesn't C++ use it? 
I'd guess it's because C++ programmers are often angst-filled people 
with silly FUD-induced notions about how much slower their GCed system 
will run, and they'd riot if the more educated amongst them tried to 
introduce it into the language spec.)

Does that answer some of your questions?

h

On Oct 1, 2004, at 4:17 PM, Cyrus Harmon wrote:

> I suppose this might be a bit offtopic, but here goes anyway. So why 
> does data get moved around by the GC? I'm sure this must have been a 
> frequent topic for discussion on c.l.l. long ago, but I don't know the 
> answer. I can see the arguments for the benefit of heap compaction a 
> la the old MacOS memory management model, but what is it about GC that 
> necessitates moving data? In the C world, one just mallocs and frees 
> memory and lets the OS sort it out. Long-lived blocks stick around, 
> and others don't but the data doesn't move. I can see certain 
> applications (DB, file system, e.g.) where one would like to manage 
> memory in a compact, fixed-size manner and provide for a nice compact 
> memory layout, but with modern VMs, is this really such an issue for 
> allocating/releasing LISP data?
>
> Thanks,
>
> Cyrus
>
> On Sep 30, 2004, at 2:06 PM, Gary Byers wrote:
>
>> Lisp data can be moved around in memory by the GC.
>> Because lisp threads are preemptively scheduled, a GC can occur
>> between any two instructions.
>> It is therefore meaningless to talk about taking the address of
>> a lisp object (in general; it would work for things that were
>> stack-allocated, or allocated via %MAKE-HEAP-IVECTOR, and it might
>> work a very high percentage of the time otherwise - the GC doesn't,
>> after all, run on -most- instruction boundaries.)
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>