[Openmcl-devel] CCL crash on windows (with multithreading and ffi callbacks)

Anton Vodonosov avodonosov at yandex.ru
Mon May 2 14:24:47 PDT 2011


02.05.2011, 06:59, "Gary Byers" <gb at clozure.com>:
> I'd recommend debugging.
>
> Is the callback (locking-callback) called ?  Do its arguments look plausible ?
> Do BT:ACQUIRE-LOCK and BT:RELEASE-LOCK do the right thing(s) ?  If things
> seem to work when this callback isn't installed and don't when it is, then one
> could suspect either the mechanics of the callback or the code it calls; if
> (hypothetically) BT:ACQUIRE-LOCK and BT:RELEASE-LOCK didn't work, then it
> wouldn't be too surprising if a multithreaded application that relied on those
> things working didn't work.
>

As for BT:ACQUIRE-LOCK and other details of the callback implementation, they
are ruled out by the fact that the crash reproduces in the same way if we leave
the callback body empty.

And the crash is not caused by a lack of proper synchronization:
if we do not register the callback at all, the crash doesn't happen.
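
For reference, a minimal sketch of what "empty callback body" means here,
written with CFFI. The argument list (two integers, a string, an integer) is
an assumption made for illustration, not copied from the actual sources:

```lisp
;; Sketch only: the argument names and types are assumed, not taken
;; from the real code. Requires the CFFI system to be loaded.
(cffi:defcallback locking-callback :void
    ((mode :int) (n :int) (file :string) (line :int))
  ;; Deliberately empty: no BT:ACQUIRE-LOCK / BT:RELEASE-LOCK calls,
  ;; so only the callback mechanics themselves are exercised.
  nil)
```

With a body like this, any remaining crash cannot be blamed on the locking
code that the callback would normally call.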

> If the "mechanics of the callback" - receiving arguments from and returning
> results from foreign code - were at fault, then that is something that CCL
> (and to some extent CFFI) is responsible for and a problem there would
> almost certainly be a bug in CCL.  The callbacks that seem to be involved
> don't -look- too unusual, but one never knows.

The arguments are passed to the callback correctly: they are the strings and
numbers we would expect. So it doesn't seem to be stack corruption or anything
of that kind.

Also interesting is that the callback is called thousands of times before the crash 
happens.

> If someone isolates the problem as a  CCL bug, then I'd certainly be interested
> in trying to fix it.  It's potentially a lot of work just to isolate the problem;
> I wish that I could say (well, sort of wish ...) that I had time and interest
> in doing that, but I quite frankly have neither.  (I don't even know where
> things like QuickLisp put the sources to the systems that it downloads, and I
> don't have the attention span or patience or whatever to learn that.)
>
> It sounds like you've already done quite a bit to narrow this down;
> adding a few calls to BREAK or PRINT or FORMAT in appropriate places
> might do a lot to help isolate the problem to the point where someone
> could actually do something about it.
>

I have already tried PRINT and FORMAT. I was hoping you might have some
debugging technique that reveals the cause of a crash more directly.

The symptoms are strange. I don't know, maybe it's not FFI directly; maybe
Windows CCL is not thread safe with some basic data structures (e.g. CONSes)
which happen to be used in the FFI implementation. As I said, I have also
observed crashes without FFI.
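
One way to probe that hypothesis without any FFI involved is a plain consing
stress test across several threads. The following is only a sketch: the
function name CONS-STRESS and its parameters are hypothetical, and it assumes
bordeaux-threads is already loaded:

```lisp
;; Sketch only: CONS-STRESS is a hypothetical name, not code from this thread.
;; Assumes the bordeaux-threads system (nickname BT) is already loaded.
(defun cons-stress (&key (threads 4) (iterations 100000))
  "Run THREADS workers that each allocate and sum ITERATIONS short lists."
  (let ((workers
          (loop repeat threads
                collect (bt:make-thread
                         (lambda ()
                           (dotimes (i iterations)
                             ;; Allocate a fresh 10-element list and walk it.
                             (reduce #'+ (make-list 10 :initial-element i))))
                         :name "cons-stress worker"))))
    ;; Wait for every worker to finish before returning.
    (mapc #'bt:join-thread workers)
    t))
```

If (cons-stress) run repeatedly also brings the image down, that would point
away from the callback machinery and toward allocation or GC under threads.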

Well, if I find anything more I'll report it here.

Best regards,
- Anton


