[Openmcl-devel] CCL deadlocked?

Gary Byers gb at clozure.com
Tue Apr 21 14:41:23 PDT 2009


The situation that you describe doesn't sound like what I'd call deadlock.

First of all: at just about all times in CCL, a background thread is
doing:

  (loop
    (sleep .33) ; using nanosleep
    (maybe-do-a-little-work))

The implementation of nanosleep on Darwin involves a system call
named '__semwait_signal'; that system call ordinarily times out.
The dtruss output shows that those syscalls are happening frequently
and timing out; that's what should happen (and ordinarily does happen.)

Whatever the listener thread was doing didn't seem to involve much CPU
usage; you were able to interrupt it and call SAVE-APPLICATION, and
that all apparently worked.  In hindsight, it might have been better
to look at a backtrace after interrupting the listener to see just
what it was doing (or what it was waiting for) when it was interrupted.
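
As a rough sketch of what that could look like: once ^C has put the
listener into a break loop, typing :B at the break prompt prints a
backtrace of the interrupted thread. To also see what every other
thread claims to be doing, something like the following might help.
It assumes CCL's exported ALL-PROCESSES, PROCESS-NAME and
PROCESS-WHOSTATE; REPORT-THREADS is a made-up name for illustration,
not something CCL provides.

```lisp
;; Hypothetical helper (not part of CCL): list each lisp thread along
;; with its "whostate", a short string describing what the thread is
;; doing or waiting for.  Guarded with #+ccl because it relies on
;; CCL-specific functions.
#+ccl
(defun report-threads (&optional (stream *standard-output*))
  (dolist (p (ccl:all-processes))
    (format stream "~&~a: ~a~%"
            (ccl:process-name p)
            (ccl:process-whostate p))))
```

Calling (report-threads) from the break loop, before continuing or
saving an image, would give a quick picture of which threads are
sleeping or waiting on something.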

Once you've determined what it's doing (instead of "doing more work"
or "returning to the REPL"), it might be possible to conclude that
it's doing that because of a bug in CCL, or that it's because of
a bug in some third-party code, or a bug in your code.  We don't know,
and it would seem that until we know what it's doing, we can't begin
to understand why it's doing that.  There are cases where it's very
hard to just interrupt something with ^C and get a backtrace that'd
tell us something about what's going on, but this doesn't seem to be
one of them.


On Tue, 21 Apr 2009, Valeriy Zamarayev wrote:

> Hello, all!
>
> I tried to use Clozure CL 64 bit on my MacBook Pro to read in
> about 12G of network data and build a graph using CL-Graph from
> this data. I'm only interested in a subset of records so the total
> amount of memory consumed ended up at about 500M. So most of the time
> is spent parsing the data and filtering out stuff that is not interesting.
>
> The process takes a while. It is constantly hogging the CPU.
> I'm not a LISP optimization guru. It has been running almost all
> day in a 'screen' session. I did some other work on the laptop
> and at times the load and swapping was pretty high.
>
> Then, when I noticed CPU usage drop, I thought it had completed. But
> there was no prompt in the CCL session. It was just not responding.
> I tried to use the tool dtruss to see what it was doing; here is what
> I saw:
>
> SYSCALL(args)            = return
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> gettimeofday(0x7FFF5FBFE610, 0x0, 0x0)           = 1240340270 0
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>
>
> Errno 60 is ETIMEDOUT.
>
> It looked like it was waiting for something that did not happen,
> the CPU was free, no disk activity, nothing. Like it deadlocked.
>
> I hit Control-C and did a 'save-application' so that I have that image
> with the loaded data. Judging from the record count I can say it
> has not completed its work.
>
> Now the questions are:
>
> 1. How can it be that it deadlocked?
>
> 2. Can the saved image be useful to identify the cause?
>
> 3. It is very likely that I'll have to go through this procedure again,
> so I could instrument CCL so that I can try to catch the bug.
> How can I do this? Or maybe I should use the DTrace utility, gdb, etc.
>
> I really like CCL, and if it has some bugs, I'd like to help find them,
> but I'm rather new to LISP and don't know a lot about CCL internals.
> If someone could instruct me what I should do before I run my next 10
> hour session, we could actually track down the problem.
>
> Thanks.
>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>


