[Openmcl-devel] A bug and a quick fix
Gary Byers
gb at clozure.com
Sun Dec 11 01:45:48 PST 2011
lazarus() is registered via the atexit() mechanism. On a Unix system,
it'd be called by exit() before the process exits (via _exit()); it's
a little surprising that it's called at all when Windows is
terminating the process, since exit() and atexit() and _exit() are
just C runtime routines; it may be that the C runtime that we're using
hooks atexit() into something Windows-specific (like a "DLL detach" function);
if so, that doesn't seem like a good idea.
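
For readers less familiar with the mechanism, here is a minimal sketch of how a handler gets registered with atexit(). The names on_exit_handler, handler_ran and register_exit_handler are made up for illustration and are not taken from the CCL kernel.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set by the handler so that a caller can observe that it ran. */
static int handler_ran = 0;

/* Hypothetical stand-in for a handler such as lazarus(). */
static void on_exit_handler(void)
{
    handler_ran = 1;
    fputs("atexit handler ran\n", stderr);
}

/* Registers the handler; returns 0 on success, nonzero on failure. */
int register_exit_handler(void)
{
    return atexit(on_exit_handler);
}
```

After register_exit_handler() succeeds, a later call to exit() (or a return from main()) runs on_exit_handler() before the process terminates, whereas the POSIX _exit() skips registered handlers. What a given Windows C runtime does when the system terminates the process is not something this sketch settles.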
For obscure reasons, some parts of SAVE-APPLICATION have to run after
exit() is called. lazarus() looks at the word on the bottom of the lisp
stack (the "value stack") and if it's non-NIL it reenters lisp and FUNCALLs
the value of that word.
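
The "look at the bottom word and call it if it's set" step can be modelled roughly as follows. This is only a toy model under stated assumptions: value_stack, thunk_fn, example_thunk and run_bottom_word are hypothetical names, NIL is represented by NULL, and a plain C function pointer stands in for a lisp function object. The real kernel reenters lisp rather than calling a C function.

```c
#include <stddef.h>

/* Hypothetical: a C function pointer stands in for a lisp function. */
typedef void (*thunk_fn)(void);

/* Hypothetical model of a value stack; the bottom word sits just below `high`. */
typedef struct {
    thunk_fn *low;   /* lowest address of the stack region */
    thunk_fn *high;  /* one past the bottom word */
} value_stack;

/* Example thunk that just counts how often it has been called. */
static int thunk_calls = 0;
static void example_thunk(void) { thunk_calls++; }

/* Calls the function stored in the bottom word, if there is one.
   Returns 1 if something was called, 0 if the word was empty. */
int run_bottom_word(const value_stack *vs)
{
    thunk_fn f = vs->high[-1];
    if (f == NULL)
        return 0;
    f();
    return 1;
}
```

Note that run_bottom_word() dereferences vs unconditionally, which mirrors the failure discussed in this thread: if the stack description itself is missing, the lookup faults before the word can even be examined.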
A TCR ("Thread Context Record") is a data structure that contains
lisp-specific thread-local data. It doesn't make a lot of sense for
a thread to have a TCR but for that TCR's vs_area slot to be NIL.
(The vs_area slot contains another data structure which describes
where the thread's value stack is in memory. When a thread (other
than the initial thread) exits, it'll deallocate its lisp/value stack
and set its tcr->vs_area slot to NULL; I can't think of any other way
for that to happen.)
What this all means is that either lazarus() is running on the initial
thread and its tcr->vs_area has been set to NULL or it's running on a
thread that's presumed to have exited, and neither of those things
makes any sense at all. (Rather than miraculously rising from the
grave - or at least running after exit() has been called - as its name
suggests, lazarus() seems to be staggering around and spreading
terror, like a character in a bad zombie movie ...)
It makes some sense for lazarus() to run if QUIT has called exit(), but
it's not particularly useful otherwise. Modulo race conditions, having
QUIT set a flag and having lazarus() do nothing unless the flag is set
might work around the problem. lazarus() isn't exactly a bottleneck,
so it could afford to check that the tcr's sane (has a vs_area slot)
more than some other things can.
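
A rough sketch of that workaround, with the race conditions deliberately ignored, might look like the following. The area and TCR types here are drastically simplified stand-ins, and quit_requested, note_quit_requested and lazarus_should_run are hypothetical names rather than anything in the CCL sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's types. */
typedef struct { char *active; char *high; } area;
typedef struct { area *vs_area; } TCR;

/* Would be set by the QUIT path just before it calls exit().
   A plain flag like this provides no synchronization between threads. */
static bool quit_requested = false;

void note_quit_requested(void)
{
    quit_requested = true;
}

/* True only if QUIT asked for the exit and the TCR still has a value stack. */
bool lazarus_should_run(const TCR *tcr)
{
    if (!quit_requested)
        return false;           /* exit wasn't initiated by QUIT */
    if (tcr == NULL || tcr->vs_area == NULL)
        return false;           /* no TCR, or its value stack is gone */
    return true;
}
```

With a guard of this shape, lazarus() would return immediately unless lazarus_should_run() is true, so the store through tcr->vs_area is never attempted on a TCR whose vs_area is NULL.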
I don't know enough to be able to claim that this is all due to some
confusion in the C runtime, but I was unable to get the 64-bit CCL to
generate a crash dialog when the console window it was running in was
closed, and there are some differences between the 64-bit and 32-bit
C runtimes.
On Sun, 11 Dec 2011, CRLF0710 wrote:
> OK. I'm here to provide more information...
>
> According to some debugging, I found that the first such exception is
> raised at lazarus() in pmcl-kernel.c:
>
> void
> lazarus()
> {
>   TCR *tcr = get_tcr(false);
>   if (tcr) {
>     /* Some threads may be dying; no threads should be created. */
>     LOCK(lisp_global(TCR_AREA_LOCK),tcr);
>
>     tcr->vs_area->active = tcr->vs_area->high - node_size;
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> It seems that vs_area is NULL at this point. (I'm not entirely sure, because
> the process itself was killed by the system at the same time the
> debugger window appeared.) I know nothing about what lazarus() does, but
> that's where the first exception was raised. Any clue how to fix this?
>
> --
> Wir müssen wissen; wir werden wissen!
> CrLF.0710
>
>