[Openmcl-devel] forking to dump?

Wed Jul 9 18:58:42 PDT 2008

On Wed, 9 Jul 2008, Hamilton Link wrote:

> Wow, I had _no_ idea this was even remotely possible, Gary.  Am I correct in 
> understanding that you managed to save-application without quitting all of 
> lisp (by merit of forking off a duplicate lisp environment first)?  Last time 
> I talked to you about why save-app quit lisp, it had to quit in order that 
> the image-dumping code didn't need to contend with the garbage collector or 
> other threads while it was writing the new image file.  Does fork work some 
> magic to avoid these issues while forking the copy, or was this not as big a 
> deal as I thought?
>

Hans came up with the idea of trying to do the SAVE-APPLICATION from
a forked copy of a running lisp; that had never occurred to me before.
#_fork has the property of effectively killing all threads in the
child process (or of failing to make copies of them) except for the
calling thread.  If you call #_fork from the initial thread, it seems
that (if you don't run into a locking problem) you have a fairly
good chance of being able to do SAVE-APPLICATION from the child (OS)
process, at least on Linux and FreeBSD.  (On The World's Most Advanced
Operating System, i.e. Darwin, there are Mach issues that are more
likely to get in the way and which may be hard to work around.  Who'd
have guessed?)

In that case, code run by SAVE-APPLICATION in the initial (and, after
#_fork, only) thread tries to shut down other threads: it does
PROCESS-KILL on them, waits for a fairly short time to see if they got
the message (and raised a termination semaphore that's raised on
normal thread exit), and just kills them abruptly if they don't die
voluntarily.  After #_fork, the other threads aren't really there, so
as far as the initial thread is concerned they're just taking too long
to die and are terminated with extreme prejudice.  It can then do the
low-level SAVE-APPLICATION stuff, confident that things are
single-threaded.

SAVE-APPLICATION does want to run in an environment where it's sure
that no other threads are running (and modifying the heap while it's
trying to write a consistent snapshot to disk).  In the current
implementation, SAVE-APPLICATION runs after #_exit is called (via the
#_atexit mechanism), and it'd probably still be pretty hard to return
from a call to SAVE-APPLICATION into a multi-threaded world.  (It may
not be impossible, but there are probably lots of issues that I'm not
thinking about.)

So, it's at least somewhat likely that:

(process-interrupt ccl::*initial-process*
   (lambda ()
     (when (eql (#_fork) 0)
       (save-application ...))))

will successfully write a heap image (on Linux or FreeBSD).  It might
fail (hang in the child) if some lock needed by SAVE-APPLICATION is
apparently owned by some thread that doesn't really exist in the
child, and it's much more likely to fail on Darwin because various
Mach-level things aren't set up properly in the child process.
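
A slightly fuller sketch of the same form, for anyone who wants to
experiment.  It assumes Clozure CL on Linux or FreeBSD.  The function
name FORK-AND-SAVE and the example path are made up for illustration,
and the check for a failed #_fork is an addition that is not in the
form above.  The same caveats apply: the child can still hang on a
lock owned by a thread that no longer exists.

```lisp
;;; Sketch only; assumes Clozure CL on Linux or FreeBSD.
;;; FORK-AND-SAVE is a hypothetical name, not part of CCL.
(defun fork-and-save (image-path)
  "Ask the initial thread to fork; the child tries to save IMAGE-PATH."
  (ccl:process-interrupt
   ccl::*initial-process*
   (lambda ()
     (let ((pid (#_fork)))
       (cond ((eql pid 0)
              ;; Child: only the forking thread exists here.
              ;; SAVE-APPLICATION writes the image and then exits,
              ;; so this call never returns.
              (ccl:save-application image-path))
             ((< pid 0)
              ;; #_fork itself failed; no child, no image.
              (warn "fork failed; no image written to ~a" image-path))
             (t
              ;; Parent: nothing to do, just carry on.
              nil))))))
```

Calling something like (fork-and-save "/tmp/snapshot.image") from any
thread queues the work on the initial thread and returns right away.
The parent never waits for the child in this sketch, so the exited
child will presumably linger as a zombie until something reaps it
(e.g. with #_waitpid) or the parent exits.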

When this works, the parent (OS) process and all of its threads
should be unaffected (except for the fact that the child process
is burning a lot of CPU and doing lots of disk I/O for however
long it takes to GC/purify/write the image file).

For OSes that aren't ... um, Mach-enhanced ... it's probably
possible to clean things up after fork (stealing locks, etc.)
so that this could be a bit more robust.  It may be possible
to clean up the Mach stuff as well, but I'm not sure that I'm
aware of all of the Mach stuff (and if it was all easy to clean
up, Apple's fork man page might not be as scary as it is).

> thanks,
> h