[Openmcl-devel] A bug and a quick fix

Gary Byers gb at clozure.com
Sun Dec 11 01:45:48 PST 2011


lazarus() is registered via the atexit() mechanism.  On a Unix system,
it'd be called by exit() before the process exits (via _exit()); it's
a little surprising that it's called at all when Windows is
terminating the process, since exit() and atexit() and _exit() are
just C runtime routines; it may be that the C runtime that we're using
hooks atexit() into something Windows-specific (like a "DLL detach" function);
if so, that doesn't seem like a good idea.
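
To make the Unix half of that concrete, here is a minimal POSIX-only sketch of the difference between exit() and _exit() with respect to atexit() handlers. Nothing here is CCL code: on_exit_handler() is a hypothetical stand-in for lazarus(), and handler_ran() is an illustrative helper that forks a child so the calling process survives the experiment. It says nothing about what the Windows C runtime does.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe, used by the handler */

/* Hypothetical stand-in for lazarus(): it only reports that it ran. */
static void on_exit_handler(void)
{
    char mark = 'x';
    if (report_fd >= 0 && write(report_fd, &mark, 1) != 1) {
        /* nothing useful can be done this late in process exit */
    }
}

/* Fork a child that registers the handler and then terminates via
   exit() (use_exit != 0) or _exit() (use_exit == 0).
   Returns 1 if the handler ran in the child, 0 if it did not,
   and -1 if the pipe or fork could not be set up. */
int handler_ran(int use_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                      /* child */
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(on_exit_handler) != 0)
            _exit(2);
        if (use_exit)
            exit(0);                     /* runs atexit() handlers first */
        _exit(0);                        /* skips atexit() handlers */
    }

    close(fds[1]);                       /* parent */
    char buf;
    ssize_t n = read(fds[0], &buf, 1);   /* EOF (0) means nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

On a POSIX system, handler_ran(1) should report that the handler ran and handler_ran(0) that it did not.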

For obscure reasons, some parts of SAVE-APPLICATION have to run after
exit() is called.  lazarus() looks at the word on the bottom of the lisp
stack (the "value stack") and if it's non-NIL it reenters lisp and FUNCALLs
the value of that word.

A TCR ("Thread Context Record") is a data structure that contains
lisp-specific thread-local data.  It doesn't make a lot of sense for
a thread to have a TCR but for that TCR's vs_area slot to be NIL.
(The vs_area slot contains another data structure which describes
where the thread's value stack is in memory.  When a thread (other
than the initial thread) exits, it'll deallocate its lisp/value stack
and set its tcr->vs_area slot to NULL; I can't think of any other way
for that to happen.)

What this all means is that either lazarus() is running on the initial
thread and its tcr->vs_area has been set to NULL or it's running on a
thread that's presumed to have exited, and neither of those things
makes any sense at all.  (Rather than miraculously rising from the
grave - or at least running after exit() has been called - as its name
suggests, lazarus() seems to be staggering around and spreading
terror, like a character in a bad zombie movie ...)

It makes some sense for lazarus() to run if QUIT has called exit(), but
it's not particularly useful otherwise.  Modulo race conditions, having
QUIT set a flag and having lazarus() do nothing unless the flag is set
might work around the problem.  lazarus() isn't exactly a bottleneck,
so it could afford to check that the tcr's sane (has a non-NULL vs_area)
more than some other things can.
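
A rough sketch of that workaround, under loud assumptions: the area and TCR structs below are hypothetical stand-ins that model only the fields mentioned in this thread, and quit_requested, note_quit_requested() and lazarus_should_run() are names invented for illustration, not anything in the CCL kernel. It also ignores the race conditions mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real kernel types; only the fields
   mentioned in this thread are modeled. */
typedef struct area {
    char *active;
    char *high;
} area;

typedef struct TCR {
    area *vs_area;   /* NULL once a thread has torn down its value stack */
} TCR;

/* Assumption: QUIT would set this just before it calls exit(). */
static volatile bool quit_requested = false;

void note_quit_requested(void)
{
    quit_requested = true;
}

/* Returns true only when it looks reasonable for lazarus() to go on:
   QUIT asked for the exit, the calling thread has a TCR, and that
   TCR still describes a value stack. */
bool lazarus_should_run(const TCR *tcr)
{
    if (!quit_requested)
        return false;        /* exit was not initiated by QUIT */
    if (tcr == NULL)
        return false;        /* not a lisp thread at all */
    if (tcr->vs_area == NULL)
        return false;        /* thread has already exited */
    return true;
}
```

The idea would be for lazarus() to return immediately whenever lazarus_should_run() is false, before it touches tcr->vs_area.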

I don't know enough to be able to claim that this is all due to some
confusion in the C runtime, but I was unable to get the 64-bit CCL to
generate a crash dialog when the console window it was running in was
closed, and there are some differences between the 64-bit and 32-bit
C runtimes.


On Sun, 11 Dec 2011, CRLF0710 wrote:

> OK. I'm here to provide more information...
>
> According to some debugging, I found that the first such exception
> was raised in lazarus() in pmcl-kernel.c:
>
> void
> lazarus()
> {
>  TCR *tcr = get_tcr(false);
>  if (tcr) {
>    /* Some threads may be dying; no threads should be created. */
>    LOCK(lisp_global(TCR_AREA_LOCK),tcr);
>
>    tcr->vs_area->active = tcr->vs_area->high - node_size;
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> It seems that vs_area is NULL at this point. (I'm not entirely sure, because
> the process itself was killed by the system at the same time the
> debugger window appeared.) I know nothing about what lazarus() does, but
> that's where the first exception was raised. Any clue how to fix this?
>
> --
> Wir müssen wissen; wir werden wissen!
> CrLF.0710
>
>


