[Openmcl-devel] sharing data between fortran and OpenMCL
Hamilton Link
hamlink at comcast.net
Fri Oct 1 20:52:30 PDT 2004
I'll answer this question generally and let Gary answer it specifically
to openmcl. If my answer is less than clear it's because I haven't read
my garbage collection book lately -- good book though: "Garbage
Collection" by Jones & Lins.
Here are the big two reasons to do garbage collection:
1. Memory allocation/deallocation errors are one of the top three
causes of bugs in ALL software without language-supported full GCing.
2. In languages with aliasing (i.e. all languages that you'd use), it
is IMPOSSIBLE in general short of GCing to know when something can be
freed. In other words it is impossible to not leak memory without
GCing.
However, garbage collectors don't all move data around. Roughly
speaking there are three kinds of garbage collection algorithms that I
can think of off hand, namely mark-and-sweep, coloring/copying, and
conservative. I think openmcl and MCL use compacting mark-and-sweep GC.
The reason for moving the data is usually so that you get deallocation
of garbage for free and don't have to touch everything -- only
non-garbage -- when you collect, and so that allocating new objects is
essentially free -- if all uncollected/allocated space is contiguous
you just advance a free pointer. Reasons for not moving data are
generally that less space is needed to run the system (although with
VM this advantage is questionable) and that pointers are stable.
Copying collectors are possibly the most popular, they are easy to turn
into generational collectors for one thing and the algorithm is the
standard 3-coloring graph traversal algorithm with some bells and
whistles added for what to do with the nodes and arcs. Namely, you GC
over everything in "oldspace" and when you first see an object you copy
it into "newspace" and replace the version in "oldspace" with a
forwarding pointer and set its color to grey from black. Then you
enqueue all its pointers for processing. When all the pointers in an
object have been updated to refer to newspace versions of things in an
object, you color it white and you're done touching it. When everything
is white you are done and can summarily ignore oldspace from then on,
there are no more pointers into it and no one will ever look there
again. The next time you GC (when newspace fills up with allocated
objects), you swap the names of oldspace and newspace and copy every
reachable object back into the new newspace. Allocating an object is
just advancing the free pointer into newspace, and GC is triggered when
you can't advance that pointer any further. (replace old and newspace
with onespace twospace... nspace and you have a generational collector,
which requires a couple more tricks but fewspace collectors are pretty
popular since recently-allocated things are killed off almost as fast
in many types of applications)
mark-and-sweep collectors are similar -- the same graph traversal
usually happens, but when you see an object you flag it instead of
copying it elsewhere. No pointer rearrangement happens in this phase,
and if you're not compacting you just go through and free all the
unflagged memory, building up a new freelist of memory to allocate
from, and you're done. If you want to do compaction, you then go and
copy all the flagged stuff in order to the front of memory, updating
pointers as you go much as in a copying collector but I'd imagine with
some additional tricks since you're overwriting some of the space you
were just using seconds before.
Conservative collectors and non-compacting mark-and-sweep collectors
don't actually move stuff around. Conservative collectors are like
mark-and-sweep collectors, but don't depend on type bits to do their
job -- if something looks like it might be a valid pointer (i.e. it's a
word of data that has a value which if interpreted as a pointer points
into a live block of memory), it's considered a valid pointer, and
anything it points to is marked as "possibly not garbage" -- sort of
like when I clean my house, actually... anyway conservative collectors
normally do a fine job and eventually will catch all your garbage
(unless you have unfortunate constants sitting around that are mistaken
for pointers to things). For them as with noncompacting mark-and-sweep
systems you have to maintain free lists.
In languages like lisp, often typical programming style
pre-optimization will have objects allocated and deallocated so much
that you often end up with lots of garbage and do a lot of temporary
allocation of things, so not touching garbage is a good thing and
making allocation fast is a good thing. Typically once things work (and
they will work sooner than in other languages, or so the story goes)
you can do profiling of time and space usage and pound on the time and
space hogs to make them faster and cons less gratuitously, but then --
and this is the nifty part -- you don't have to rearrange your
memory-freeing strategy at all! You just allocate less, or less on the
heap (allocating on the stack instead, or even shudder in non-GCing
C-space temporarily), and everything else takes care of itself!
One nice thing about not moving objects is that FFI calls into C and
C++ and such can be made on lisp objects, because they won't move
(which C/C++ has no concept of and couldn't track), you just have to
worry about the object evaporating out from under the FFI call, and
that's easy to deal with by extending the root set of the collection
algorithm (uh... look it up, it's just the starting point of whatever
GC algorithm). Another really good reason for not moving things around
is that it's easier to make a real-time GCer because you don't ever
have to worry about objects being in odd states when you're
interrupting the collector (you just have to worry about the GCer
knowing what the heck is going on so it can be sure when an object is
not free-able, it's way better to be a little conservative about
keeping things around than freeing them too soon).
Oh, and re: data -- people have studied allocation patterns
EXtensively, I would be surprised if there aren't a ream or two of
papers on the subject, and GCing, and not GCing, and books and
workshops at conferences and so on. Garbage collection is 15+ years old
and heavily researched. Generally, I think it's considered to be a
solved problem with I believe one exception -- the real time concurrent
collection case. (Now, if it's a solved problem why doesn't C++ use it?
I'd guess it's because C++ programmers are often angst-filled people
with silly FUD-induced notions about how much slower their GCed system
will run, and they'd riot if the more educated amongst them tried to
introduce it into the language spec.)
Does that answer some of your questions?
h
On Oct 1, 2004, at 4:17 PM, Cyrus Harmon wrote:
> I suppose this might be a bit offtopic, but here goes anyway. So why
> does data get moved around by the GC? I'm sure this must have been a
> frequent topic for discussion on c.l.l. long ago, but I don't know the
> answer. I can see the arguments for the benefit of heap compaction a
> la the old MacOS memory management model, but what is it about GC that
> necessitates moving data? In the C world, one just mallocs and frees
> memory and lets the OS sort it out. Long-lived blocks stick around,
> and others don't but the data doesn't move. I can see certain
> applications (DB, file system, e.g.) where one would like to manage
> memory in a compact, fixed-size manner and provide for a nice compact
> memory layout, but with modern VMs, is this really such an issue for
> allocating/releasing LISP data?
>
> Thanks,
>
> Cyrus
>
> On Sep 30, 2004, at 2:06 PM, Gary Byers wrote:
>
>> Lisp data can be moved around in memory by the GC.
>> Because lisp threads are preemptively scheduled, a GC can occur
>> between any two instructions.
>> It is therefore meaningless to talk about taking the address of
>> a lisp object (in general; it would work for things that were
>> stack-allocated, or allocated via %MAKE-HEAP-IVECTOR, and it might
>> work a very high percentage of the time otherwise - the GC doesn't,
>> after all, run on -most- instruction boundaries.)
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
More information about the Openmcl-devel
mailing list