[Openmcl-devel] forking to dump?

Thu Jul 10 01:28:29 PDT 2008

Suppose that two threads - A and B - are running in a lisp image
before a fork occurs, and that thread B obtains a lock.  (That might
mean that it stores a thread ID of some sort in the lock data
structure, or atomically increments some field, or something like that.)

Thread A calls fork(), which creates a new OS-level process whose address
space is a (copy-on-write) copy of the original, except that in the child
process, only thread A really exists (as a schedulable entity).  Of course,
the address space of the child process is an exact copy of the parent,
so it will contain references to thread B.  (In CCL, thread B will
still be on (ALL-PROCESSES) and on lower-level enumerations of all
threads.)

Thread A (in the child) then tries to obtain the lock, which is still
(apparently) held by thread B.  (Thread B's identifier is stored in
the lock data structure, or the ownership count is non-zero, or
whatever.)  Thread A blocks, waiting for thread B to release the
lock.  There is no thread B in the child - there's certainly nothing
that will run and release the lock while thread A is waiting for it -
so the wait will never terminate.
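
The scenario above can be reproduced outside of lisp.  What follows is
a minimal sketch in Python rather than CCL, assuming a POSIX system
where os.fork() is available; threading.Lock stands in for the lock,
and the function name child_sees_lock_held is made up for illustration.
The child probes the lock without blocking, since a blocking acquire
is exactly the wait that would never terminate.

```python
import os
import threading


def child_sees_lock_held():
    """Return True if a forked child finds the lock still held by thread B."""
    lock = threading.Lock()
    acquired = threading.Event()   # B signals that it owns the lock
    release = threading.Event()    # A tells B when it may let go

    def thread_b():
        with lock:
            acquired.set()
            release.wait()

    b = threading.Thread(target=thread_b)
    b.start()
    acquired.wait()                # from here on, B holds the lock

    pid = os.fork()                # "thread A" forks; B is not copied
    if pid == 0:
        # Child: the lock's state was copied in its "held" condition, but
        # no thread exists here that could release it.  A blocking
        # acquire() would hang, so probe and report via the exit status.
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    release.set()                  # let B finish in the parent
    b.join()
    return os.WEXITSTATUS(status) == 1
```

On a typical Linux/CPython setup this should return True: the child
sees a lock that is held by a thread which does not exist there.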

This sort of thing can happen for (at least) any flavor of lock that's
implemented at least partly in userspace: "userspace" in a child
process after fork is basically a copy of the parent's address space,
including the userspace side of a lock.  This is true of application-
level locks (created by MAKE-LOCK in CCL), implementation-level locks
(basically the same thing in CCL), POSIX mutexes on most platforms,
simple spinlocks used to guard critical sections of code, ...

You might be able to minimize some of the risk of some kinds of
deadlock in CCL by having the child semi-magically find every lock
in the lisp image not owned by the surviving thread and reset it
to an "available" state.  You could still lose
if some thread in the parent was off in malloc() and held some
sort of lock which serializes malloc access, and there's probably
no good/portable way to find and reset the malloc lock.  (Or the
stdio lock.  Or ... well, who knows what else?)
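
The "reset every lock the survivor doesn't own" idea can be sketched
as follows.  This is again Python standing in for CCL, and it is only
an illustration under assumptions: TrackedLock, _all_locks and
_reset_orphaned_locks are hypothetical names, and the sketch only
covers locks created through this wrapper.  Locks inside the C runtime
(malloc, stdio, ...) stay out of reach, which is the point made above.

```python
import os
import threading
import weakref

_all_locks = weakref.WeakSet()     # hypothetical registry of known locks


class TrackedLock:
    """A lock that remembers which thread owns it (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None
        _all_locks.add(self)

    def acquire(self, blocking=True):
        ok = self._lock.acquire(blocking)
        if ok:
            self.owner = threading.get_ident()
        return ok

    def release(self):
        self.owner = None
        self._lock.release()


def _reset_orphaned_locks():
    # Runs in the child, inside os.fork(), before fork() returns there.
    me = threading.get_ident()
    for lock in list(_all_locks):
        if lock.owner != me:       # not owned by the surviving thread
            lock._lock = threading.Lock()
            lock.owner = None


os.register_at_fork(after_in_child=_reset_orphaned_locks)


def child_can_lock_after_reset():
    """Return True if the child can take a lock that B held at fork time."""
    lock = TrackedLock()
    acquired = threading.Event()
    release = threading.Event()

    def thread_b():
        lock.acquire()
        acquired.set()
        release.wait()
        lock.release()

    b = threading.Thread(target=thread_b)
    b.start()
    acquired.wait()                # B holds the lock when we fork

    pid = os.fork()
    if pid == 0:
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    b.join()
    return os.WEXITSTATUS(status) == 0
```

Note that resetting the lock says nothing about the data it was
protecting: whatever thread B was in the middle of doing when the fork
happened is still half-done in the child's copy.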

I don't think that any of this is CCL- or lisp-specific, and I hope
that people will agree that it's understandable and not very
interesting.

On Thu, 10 Jul 2008, Tobias C. Rittweiler wrote:

> "Hans Hübner" <hans at huebner.org> writes:
>
>> On Thu, Jul 10, 2008 at 08:16, Tobias C. Rittweiler <tcr at freebits.de> wrote:
>>
>>> But if S-A modifies the heap of the child, doesn't it also modify the
>>> heap of the parent, possibly corrupting it?
>>
>> fork() creates a new child process with a copy of the memory image of
>> the parent process.  It is normally implemented using copy-on-write
>> pages.  The child process has no access to the parent's memory image,
>> so it can't corrupt its heap.
>
> So what's the locking problem Gary is talking about? (I could imagine
> that the copied states of condition variables etc. may cause problems
> when loading the image back in, and trying to get the image working
> again; but Gary seems to be talking about locking problems prior to
> dumping.)
>
>  -T.