[Openmcl-devel] 1.8 prerelease

Gary Byers gb at clozure.com
Thu Mar 15 11:00:38 PDT 2012


That (using a script that runs wget instead of using ab) didn't seem
to make a difference, but changing the VM so that it had 2 CPUs instead
of 1 caused CCL to crash reliably.

I looked at a couple of core dumps but didn't fully understand what I saw.
In both cases, the SIGSEGV occurred around the point where a thread is
about to enter a handler for the signal used by the GC to suspend other
threads.  For obscure reasons, a lot of CCL's signal handlers are run
on an alternate signal stack, and it occurred to me that the "suspend"
signal's handler in particular likely didn't need to be (and that, at the
very least, this problem would be easier to debug if it wasn't.)  I
changed how that signal was handled, and haven't been able to reproduce
the problem since.
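
As a rough illustration of the distinction described above (this is not
CCL's code), the sketch below registers an alternate signal stack with
sigaltstack() and installs the same handler twice, once with SA_ONSTACK
and once without.  The names probe_handler, setup_altstack,
handler_uses_altstack and ALT_STACK_BYTES are made up for this example,
and the 64KB stack size is an arbitrary assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_BYTES (64 * 1024)   /* arbitrary size for this sketch */

static char *alt_stack_base;          /* start of the alternate stack */
static volatile sig_atomic_t ran_on_altstack;

/* Record whether this handler's frame lies inside the alternate stack. */
static void probe_handler(int signo)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t base = (uintptr_t)alt_stack_base;
    (void)signo;
    ran_on_altstack = (here >= base && here < base + ALT_STACK_BYTES);
}

/* Register an alternate signal stack for the calling thread. */
static int setup_altstack(void)
{
    stack_t ss;
    alt_stack_base = malloc(ALT_STACK_BYTES);
    if (alt_stack_base == NULL)
        return -1;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack_base;
    ss.ss_size = ALT_STACK_BYTES;
    ss.ss_flags = 0;
    return sigaltstack(&ss, NULL);
}

/* Install probe_handler for signo, raise the signal, and report whether
   the handler ran on the alternate stack (1), the normal stack (0), or
   could not be installed (-1). */
static int handler_uses_altstack(int signo, int on_altstack)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | (on_altstack ? SA_ONSTACK : 0);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;
    ran_on_altstack = 0;
    raise(signo);   /* the handler has returned by the time raise() does */
    return ran_on_altstack;
}
```

After one call to setup_altstack(), handler_uses_altstack(SIGUSR1, 1)
should report 1 and handler_uses_altstack(SIGUSR2, 0) should report 0:
only a handler installed with SA_ONSTACK is moved onto the alternate
stack, and one installed without it runs on the stack of the thread that
was interrupted.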

It doesn't (yet) make much sense that this would fix the problem; all
that I know is that I didn't see the problem after making the change.  If
there's a general problem with the alternate signal stack mechanism or
CCL's use of it, that problem would still be there (other signal handlers
more clearly need to use that mechanism.)

In any case, there does seem to be a problem there (somewhere) and it seems
likely that the problem you've seen is the same one that I saw.  The small
change below seemed to at least avoid the problem in your test case, but
I don't claim that it really fixes it or that I even understand the problem
at this point ...

[ccl/lisp-kernel] gb at loser> svn diff x86-exceptions.c
Index: x86-exceptions.c
===================================================================
--- x86-exceptions.c	(revision 15251)
+++ x86-exceptions.c	(working copy)
@@ -2292,10 +2292,8 @@
  #endif

  #ifdef USE_SIGALTSTACK
-#define SUSPEND_RESUME_HANDLER altstack_suspend_resume_handler
  #define THREAD_KILL_HANDLER altstack_thread_kill_handler
  #else
-#define SUSPEND_RESUME_HANDLER arbstack_suspend_resume_handler
  #define THREAD_KILL_HANDLER arbstack_thread_kill_handler
  #endif

@@ -2311,8 +2309,8 @@
    thread_suspend_signal = SIG_SUSPEND_THREAD;
    thread_kill_signal = SIG_KILL_THREAD;

-  install_signal_handler(thread_suspend_signal, (void *)SUSPEND_RESUME_HANDLER,
-			 RESERVE_FOR_LISP|ON_ALTSTACK|RESTART_SYSCALLS);
+  install_signal_handler(thread_suspend_signal, (void *)suspend_resume_handler,
+			 RESERVE_FOR_LISP|RESTART_SYSCALLS);
    install_signal_handler(thread_kill_signal, (void *)THREAD_KILL_HANDLER,
  			 RESERVE_FOR_LISP|ON_ALTSTACK);
  }
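
For readers unfamiliar with the flags in the diff: a plausible reading is
that ON_ALTSTACK and RESTART_SYSCALLS end up as the sigaction() flags
SA_ONSTACK and SA_RESTART.  That mapping is an assumption, not something
the message states, and install_signal_handler's real implementation is
not shown here.  The sketch below uses hypothetical SKETCH_* flag values
and a hypothetical sketch_sa_flags() helper to show what dropping
ON_ALTSTACK would change under that assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>

/* Hypothetical flag bits for this sketch only; CCL's real values are not
   shown in the message. */
enum {
    SKETCH_RESERVE_FOR_LISP = 1 << 0,  /* meaning not stated; ignored below */
    SKETCH_ON_ALTSTACK      = 1 << 1,
    SKETCH_RESTART_SYSCALLS = 1 << 2
};

/* Assumed translation from kernel-level flags to sigaction() sa_flags. */
static int sketch_sa_flags(int flags)
{
    int sa_flags = 0;
    if (flags & SKETCH_ON_ALTSTACK)
        sa_flags |= SA_ONSTACK;    /* run the handler on the alternate stack */
    if (flags & SKETCH_RESTART_SYSCALLS)
        sa_flags |= SA_RESTART;    /* restart interrupted system calls */
    return sa_flags;
}
```

Under this reading, the old flag set yields SA_ONSTACK | SA_RESTART and
the new one yields SA_RESTART alone, so the suspend handler would run on
the interrupted thread's ordinary stack while the thread-kill handler
stays on the alternate stack.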


On Thu, 15 Mar 2012, Antony wrote:

> On 3/14/2012 10:55 AM, Gary Byers wrote:
>> I just tried this with the current trunk CCL on FreeBSD 9.0 running on
>> a VMWare VM on a Mac. "ab -n 2000 -c 4 https://localhost:8083/" ran to
>> completion (though every request got "SSL read failed - closing
>> connection", presumably because ab can't deal with self-signed 
>> certificates.)
>> 
>> Last month, I don't think that I was sure whether there was a problem
>> with CCL and FreeBSD 9.0 or whether there was a problem with FreeBSD
>> 9.0 and the virtualization software that you were using.  The fact
>> that this worked for me (as well as it did) makes me more suspicious
>> of the latter at this point.
>> 
>> CCL is very sensitive to the format of a signal context (a data structure
>> that describes the machine state at the time that an exception occurred.)
>> At some point, the size and layout of the machine-dependent parts of a
>> signal context will change in some way (to accommodate AVX and 256-bit
>> vector registers.)  I don't think that FreeBSD has made any such change
>> as of 9.0, and FreeBSD has generally been very good (at least since 5.x)
>> about maintaining backward compatibility at this level.
>> 
> I converted the bsd vm to play in VMWare vmplayer.
> Ran the same test. CCL segfaulted after about 80 requests.
> Could you try with the script I mentioned at
> http://clozure.com/pipermail/openmcl-devel/2012-February/013376.html
> since it completes the requests even with self-signed certs, unlike ab.
> I guess it doesn't make much difference, but it's a relatively easy thing to
> try.
>
> -Antony


