[Openmcl-devel] CCL deadlocked?

Valeriy Zamarayev valeriy.zamarayev at gmail.com
Tue Apr 21 15:02:16 PDT 2009


Thanks, now I'll know what to do next time.

Regards,
Valeriy

> The situation that you describe doesn't sound like what I'd call
> deadlock.
>
> First of all: at just about all times in CCL, a background thread is
> doing:
>
> (loop
>   (sleep .33) ; using nanosleep
>   (maybe-do-a-little-work))
>
> The implementation of nanosleep on Darwin involves a system call
> named '__semwait_signal'; that system call ordinarily times out.
> The dtruss output shows that those syscalls are happening frequently
> and timing out; that's what should happen (and ordinarily does happen).
>
> Whatever the listener thread was doing didn't seem to involve much CPU
> usage; you were able to interrupt it and call SAVE-APPLICATION, and
> that all apparently worked.  In hindsight, it might have been better
> to look at a backtrace after interrupting the listener to see just
> what it was doing (or what it was waiting for) when it was interrupted.
>
> Once you've determined what it's doing (instead of "doing more work"
> or "returning to the REPL"), it might be possible to conclude that
> it's doing that because of a bug in CCL, or that it's because of
> a bug in some third-party code, or a bug in your code.  We don't know,
> and it would seem that until we know what it's doing, we can't begin
> to understand why it's doing that.  There are cases where it's very
> hard to just interrupt something with ^C and get a backtrace that'd
> tell us something about what's going on, but this doesn't seem to be
> one of them.
>
>
> On Tue, 21 Apr 2009, Valeriy Zamarayev wrote:
>
>> Hello, all!
>>
>> I tried to use Clozure CL 64 bit on my MacBook Pro to read in
>> about 12G of network data and build a graph using CL-Graph from
>> this data. I'm only interested in a subset of records so the total
>> amount of memory consumed ended up at about 500M. So most time is
>> consumed by parsing the data and filtering out stuff that is not
>> interesting.
>>
>> The process takes a while. It is constantly hogging the CPU.
>> I'm not a LISP optimization guru. It has been running almost all
>> day in a 'screen' session. I did some other work on the laptop
>> and at times the load and swapping was pretty high.
>>
>> Then, when I noticed CPU usage drop, I thought it had completed. But
>> there was no prompt in the CCL session. It was just not responding.
>> I tried to use the tool dtruss to see what it was doing; here is what
>> I saw:
>>
>> SYSCALL(args)            = return
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> gettimeofday(0x7FFF5FBFE610, 0x0, 0x0)           = 1240340270 0
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>> __semwait_signal(0x30F, 0x0, 0x1)                = -1 Err#60
>>
>>
>> Errno 60 is ETIMEDOUT.
>>
>> It looked like it was waiting for something that did not happen,
>> the CPU was free, no disk activity, nothing. Like it deadlocked.
>>
>> I hit Control-C, did a 'save-application' so that I have that image
>> with the loaded data. Judging from the record count, I can say it
>> has not completed its work.
>>
>> Now the questions are:
>>
>> 1. How can it be that it deadlocked?
>>
>> 2. Can the saved image be useful to identify the cause?
>>
>> 3. It is very likely that I'll have to go through this procedure
>> again, so I could instrument CCL to try to catch the bug.
>> How can I do this? Or maybe I should use the DTrace utility, gdb, etc.?
>>
>> I really like CCL, and if it has some bugs, I'd like to help find them,
>> but I'm rather new to LISP and don't know a lot about CCL internals.
>> If someone could tell me what I should do before I run my next 10-hour
>> session, we could actually track down the problem.
>>
>> Thanks.
>>
>>
>>
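
The backtrace advice in the reply quoted above could be prepared for
ahead of the next long run. What follows is only a minimal sketch,
assuming the work runs in CCL's terminal listener. THREAD-STATES and
PRINT-THREAD-STATES are hypothetical helper names invented for this
example; CCL:ALL-PROCESSES, CCL:PROCESS-NAME and CCL:PROCESS-WHOSTATE
are the thread accessors documented in the CCL manual. Outside CCL the
helpers simply report nothing.

```lisp
;; Sketch only: THREAD-STATES and PRINT-THREAD-STATES are hypothetical
;; helpers, not part of CCL.
;;
;; At the break prompt that ^C produces in the listener, the toplevel
;; command  :b  prints a backtrace of the interrupted thread.

(defun thread-states ()
  "Return a list of (name . whostate) pairs, one per Lisp thread.
Outside CCL this returns NIL, since the accessors used are CCL-specific."
  #+ccl (mapcar (lambda (p)
                  (cons (ccl:process-name p) (ccl:process-whostate p)))
                (ccl:all-processes))
  #-ccl nil)

(defun print-thread-states (&optional (stream *standard-output*))
  "Print one line per thread: its name followed by its whostate string."
  (loop for (name . state) in (thread-states)
        do (format stream "~&~a: ~a~%" name state)))
```

If the listener stops responding again, hitting ^C, typing :b at the
break prompt, and then evaluating (print-thread-states) should give a
first indication of what each thread was doing or waiting for, before
anything is saved with SAVE-APPLICATION.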



