[Openmcl-devel] ccl64 freebsd64 hunchentoot segfault

Tue Feb 7 12:06:12 PST 2012

IIRC, someone who works for Clozure runs a personal blog on a FreeBSD
machine, using some combination of Hunchentoot/CCL/other code.  (I'm
more certain of the FreeBSD/CCL/other code components being involved
and don't remember what he's said about Hunchentoot.)

My general impression is that many more people use CCL on Linux (and
Darwin and probably Windows) than do on FreeBSD (or Solaris), but if
that's true it's probably got more to do with how many people use Linux
relative to FreeBSD than anything else.

FWIW, I don't think of the FreeBSD ports of CCL as being particularly
problematic. I don't think that we get too many FreeBSD-specific bug
reports and tend to think of that as meaning "relatively few people
use it and those that do don't experience many problems", but I suppose
that it could mean something else.

CCL establishes a handler for the SIGSEGV ("segmentation violation")
signal.  If that handler can't figure out how to ... handle ... a
particular signal, it will generally drop into the lisp kernel debugger
("unhandled exception N ..", where N is 11 for SIGSEGV IIRC.)  Having
a SIGSEGV cause the process to get terminated abruptly like what
you're seeing usually means something like:

- the OS kernel wanted to push some context information on a thread's
   stack and call an application-defined signal handler, but the thread
   was out of stack space.  This kind of thing can happen if (for instance)
   a signal handler causes a memory fault, which raises a signal, which
   invokes a signal handler which causes a memory fault ...  It's pretty
   amazing how quickly a modern CPU can exhaust even a very large stack this
   way.

- the process is just completely out of memory and anything else that the
   OS might do to report that would cause more memory to be allocated.  This
   sort of thing can happen on OSes that "overcommit" memory (allow an allocation
   request to succeed in cases where sufficient memory isn't available, in
   the hope that it will be available by the time it's actually used.)  It's
   easy to blame overcommit policies for a wide variety of problems; I've probably
   blamed it for things that it wasn't responsible for, but it's probably responsible
   for some of it.  ("If it's not guilty of this crime, then surely it's guilty
   of some other.")
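
For reference, here is a minimal sketch (in Python, not CCL code) of what
an abrupt SIGSEGV termination looks like to the parent process, which is
the situation a shell reports as "Segmentation fault".  The child program
and its use of os.kill are illustrative assumptions: it sends the signal
to itself with the default disposition in place, rather than triggering a
real memory fault.

```python
import signal
import subprocess
import sys

# Illustrative child program (an assumption, not CCL's behaviour): restore the
# default SIGSEGV disposition and deliver the signal to itself, so the OS
# terminates it the same way it would terminate a process with an unhandled fault.
CHILD = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"  # avoid leaving a core file
    "signal.signal(signal.SIGSEGV, signal.SIG_DFL)\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)

result = subprocess.run([sys.executable, "-c", CHILD])

# On POSIX systems, subprocess reports death-by-signal as a negative return code.
killed_by = -result.returncode
print("child killed by signal", killed_by, signal.Signals(killed_by).name)
```

A process whose own handler dealt with the fault would not show up this
way; the negative return code only appears when the OS itself ended the
process.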

I suppose that it's possible that something in Hunchentoot or the large layer
of 3rd-party lisp libraries it depends on could affect signal/exception handling
in some way, but this doesn't seem likely.  FreeBSD 9.0 was released less than
a month ago and (just like other OSes) new releases often have extremely subtle
bugs that aren't found during testing.  CCL could certainly be responsible
for the problem; I don't think that I've seen this sort of problem in the
FreeBSD ports in years, but that doesn't mean that something that I haven't
seen couldn't be involved.

One other thing that can lead to this sort of problem (though I don't know
whether it does) is VirtualBox.  In the past, there have been cases where
CCL wouldn't run at all in some virtual environments because of bugs in those
environments.  Just about all aspects of signal handling are very complicated
and I imagine (and have seen evidence) that some of that is tricky to virtualize.
I don't know what the problem is or who the guilty party is, but simply want
to point out that VirtualBox should be added to the list of suspects (or at
least brought in for questioning, just to help us with our inquiries.)

Whether a core file was actually dumped when the message said it had been
generally depends on resource limits.  If you have a core file, how big
is it?  (I'm not sure if I'll be able to but am curious to know whether
it'd be practical to send it to me via email.)  There are some utilities
in CCL (in "ccl;library;core-files.lisp") that can be used to do postmortem
analysis of core files (that code's all conditionalized for x8664 Linux,
but I imagine that a lot of it would work on x8664 FreeBSD as well.)
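
A minimal sketch (in Python, for illustration only) of checking the two
things mentioned above: the core-size resource limit and the size of a
core file.  The file name "example.core" is hypothetical; the real name
depends on how the OS is configured to name core files.  The limits
printed are those of the process running the script, so it would need to
be run from the same shell that started the lisp for the numbers to be
relevant.

```python
import os
import resource


def format_limit(value):
    """Render an rlimit value in human-readable form."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    if value == 0:
        return "0 bytes (core dumps disabled)"
    return f"{value} bytes"


def core_file_size(path):
    """Return the size of a core file in bytes, or None if no such file exists."""
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return None


# Soft limit is what currently applies; hard limit is the ceiling it can be raised to.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core size limit: soft =", format_limit(soft), "hard =", format_limit(hard))

# Hypothetical file name, used only for illustration.
size = core_file_size("example.core")
print("example.core:", "not found" if size is None else f"{size} bytes")
```

A soft limit of 0 would explain a "core dumped" message with no usable
core file; in most shells the limit can be inspected or raised with the
ulimit (or limit) builtin before starting the lisp.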

I could speculate more, but I don't know how useful that'd be.  I don't
know of any FreeBSD-specific CCL problems that might cause this but that
doesn't mean too much either way.

On Tue, 7 Feb 2012, Antony wrote:

> Hi
>
> I have been developing CL with CCL under Linux.
> From the point of view of interacting with stuff outside CL, I basically use 
> cl-postgres for db and hunchentoot (and required dependencies) for https.
> Things seem to work fine in this world.
>
> I wanted to try out FreeBSD (cause I am a bit worried about going to 
> production with Linux as my only choice)
> It works, but under testing (that's a bit loaded) it segfaults (basically in 
> the context of serving an https request)
> All I get in the repl is
> Segmentation fault (core dumped)
>
> System info is
> uname -a
> gives
> FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012 
> root at farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>
> Clozure Common Lisp Version 1.7-r15199M  (FreebsdX8664)
>
> It's hard to give version info for all the libs, but I had recently (less 
> than two months) updated most of them
> The code is identical (since it's a shared folder) across my Linux and bsd.
> Both Linux and bsd are actually under VirtualBox inside a Win 7. Everything 
> is 64bit. bsd is assigned one cpu (not sure if any of this matters)
>
> Do others use this combo successfully?
> Are there any known issues?
>
> -Antony