[Openmcl-devel] ccl64 freebsd64 hunchentoot segfault

Wed Feb 8 21:34:55 PST 2012

Thanks.  I wasn't able to reproduce this (running a trunk CCL on
FreeBSD 8.2 on real hardware), but did see another severe problem.
It's not clear how what I saw could cause what you saw, but until the
cause of what I saw is eliminated it's probably not worth looking for
anything else.  What I saw affects CCL on FreeBSD, might affect it on
Solaris, doesn't currently affect it on Linux (but might in some
future Linux version), and is pretty critical.  I think that it'll
take anywhere from a few hours to a few days to fix; the fix will be
made in the trunk, smoke-tested a bit, and then propagated to 1.7 in
svn if it all seems to work and doesn't obviously break anything.

Gory details follow.

Most (perhaps all) implementations of malloc/free use a global lock to
ensure that at most one thread in a process can modify heap data
structures (and malloc/free and friends generally need to modify those
data structures without worrying about other threads trying to modify
them at the same time.)  It's possible that some implementations could
use atomic memory operations to keep things thread-safe, but I don't
know if any implementations do so.

CCL's GC runs on an arbitrary thread (usually whatever thread tries to do
a memory allocation that would otherwise cause the heap to grow past a
specified threshold.)  The GC isn't concurrent; on entry, it suspends all
other lisp threads and on exit it resumes them.

CCL's GC supports "gcable pointers"; these are used to support language
constructs like MAKE-GCABLE-RECORD and are also used in the implementation
of things like locks and semaphores.  Conceptually, when the GC discovers
that certain foreign pointers are about to become garbage, it arranges to
do a kind of ad hoc finalization (also known as termination) on them.  The
GC can't actually free the foreign memory associated with the pointer while
other threads are suspended, because some suspended thread might hold the
malloc heap's lock (and the GC thread would deadlock, waiting forever for
a lock held by a thread that's suspended and obviously can't release it.)
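
The hazard described above can be modeled in a single thread.  This is
only a sketch under stated assumptions: heap_lock, heap_trylock(),
heap_unlock() and gc_can_free_now() are hypothetical names, and a real
malloc implementation exposes no such try-lock to its callers.  The flag
stands in for the allocator's global lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the allocator's single global lock. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

/* Try to take the heap lock without blocking; true on success. */
static bool heap_trylock(void)
{
    return !atomic_flag_test_and_set(&heap_lock);
}

static void heap_unlock(void)
{
    atomic_flag_clear(&heap_lock);
}

/* Models the GC's position while every other thread is suspended.
   Returns false when the lock is already held: in this model a
   blocking acquire inside free() would never return, because the
   owner cannot run to release it. */
static bool gc_can_free_now(void)
{
    if (!heap_trylock())
        return false;
    heap_unlock();
    return true;
}

/* Walk through the scenario in a single thread. */
static void demo_suspended_holder(void)
{
    assert(gc_can_free_now());    /* nobody holds the lock          */
    assert(heap_trylock());       /* a thread enters malloc() ...   */
    /* ... and is suspended here by the GC, still holding the lock  */
    assert(!gc_can_free_now());   /* a blocking free() would hang   */
    heap_unlock();                /* thread resumed, leaves malloc()*/
    assert(gc_can_free_now());
}
```

The model only shows why the GC must not enter the allocator while
other threads are stopped; it says nothing about how any particular
malloc is implemented.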

To work around this, when certain kinds of gcable pointers (locks and
semaphores) are discovered to be garbage, the GC does the work of freeing
the object in two stages: the first stage runs immediately, does some
sort of "deinitialization" of the pointer (telling the OS kernel that the
semaphore isn't a semaphore anymore) and adds the pointer itself to a
list; the second stage runs after the GC has allowed other threads to
resume and calls free() on all of the "deinitialized" pointers on that
list.
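
The two stages can be sketched roughly as follows.  This is not CCL's
actual code: deferred_node, defer_free() and drain_deferred() are
hypothetical names, and the sketch assumes every block is at least as
large as a pointer so that the list link can live inside the dying
block (which means stage one itself never allocates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list link, stored inside the block awaiting free(). */
typedef struct deferred_node {
    struct deferred_node *next;
} deferred_node;

static deferred_node *deferred_list = NULL;

/* Stage one: runs while other threads are suspended.  Nothing in
   this function itself calls malloc() or free().  The optional
   deinit callback stands for something like a wrapper around
   sem_destroy(); whether that callback stays out of the allocator
   is exactly the assumption questioned in this message. */
static void defer_free(void *p, void (*deinit)(void *))
{
    assert(p != NULL);
    if (deinit != NULL)
        deinit(p);
    deferred_node *n = p;       /* the object is dead; reuse its memory */
    n->next = deferred_list;
    deferred_list = n;
}

/* Stage two: runs after the other threads have been resumed, so
   taking the malloc lock is safe again.  Returns the number of
   blocks actually freed. */
static size_t drain_deferred(void)
{
    size_t count = 0;
    while (deferred_list != NULL) {
        deferred_node *n = deferred_list;
        deferred_list = n->next;
        free(n);
        count++;
    }
    return count;
}
```

The list needs no lock of its own in this sketch because stage one
runs with every other thread stopped and stage two is assumed to run
on the same thread that did the GC.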

That seems to work well on most platforms, but it assumes that (for
instance) initializing a POSIX semaphore (via sem_init()) doesn't
itself call malloc() and that deinitializing one (via sem_destroy())
doesn't call free().  That isn't a safe assumption in general (though
it happens to be true in Linux) and isn't a correct assumption on
FreeBSD.  (I haven't checked Solaris, Windows uses its own semaphore
objects, and Apple hasn't invented POSIX semaphores yet AFAIK.)

As I said, it's not clear to me how this could lead to termination
via SIGSEGV (though it's clear that it can lead to the kind of deadlock
that I saw), so that may be another bug (or something unique to some
combination of FreeBSD 9.0 and virtualization.)  I'll try your test
again after this is cleaned up.

On Wed, 8 Feb 2012, Antony wrote:

> On 2/7/2012 12:06 PM, Gary Byers wrote:
>> I could speculate more, but I don't know how useful that'd be.  I don't
>> know of any FreeBSD-specific CCL problems that might cause this but that
>> doesn't mean too much either way.
>> 
> I am able to reproduce this without any of my code (Thanks to some prodding)
> Following is what I did
> run CCL as
>
> CCL_DEFAULT_DIRECTORY=/home/antony/ccl.freebsd/ccl /home/antony/ccl.freebsd/ccl/scripts/ccl64
>
> in the repl do the following
>
> (load #P"/home/antony/git/thirdparty/asdf")
> (asdf:initialize-source-registry
>  (list :source-registry
>        ;; where hunchentoot and its dependencies live
>        (list :tree #P"/home/antony/git/thirdparty")
>        :inherit-configuration))
> (asdf:oos 'asdf:load-op :hunchentoot)
> (defvar *https* (hunchentoot:start
>           (make-instance 'hunchentoot:easy-ssl-acceptor :port 8083
>                          :ssl-privatekey-password "xxxxxxx"
>                          :ssl-certificate-file "~/git/config/https-cert/server.crt"
>                          :ssl-privatekey-file "~/git/config/https-cert/server.key")))
>
> Run apache bench as
> ab -n 2000 -c 4 'https://xxxxx:8083/'
> I get segfault after some requests
>
> ab does not ignore  ssl cert errors (mine is self signed),
> so you essentially get a series of aborted requests,
>
> to make the test more complete, i got hold of the following script
> this ignores the cert sign error and does full requests
> (from http://stackoverflow.com/questions/189993/how-do-i-fix-ssl-handshake-failed-with-apachebench)
> #--------------------------------------------------
> #!/bin/bash
> K=200;
> HTTPSA='https://192.168.0.105:8083/'
> date +%M-%S-%N
> for (( c=1; c<=$K; c++ ))
> do
>    wget --no-check-certificate --secure-protocol=SSLv3 --spider $HTTPSA &
> done
> date +%M-%S-%N
> #------------------------------------------------
>
> and ran it concurrently as
> sh qqqqqq.sh &  sh qqqqqq.sh &
> this also caused segfault
>
> But neither caused segfault on CCL+linux
>
> The core file is still too big to email
>
> -Antony
>