[Openmcl-devel] lock owner: how to access?

Wed Jul 6 02:44:38 PDT 2011

On Wed, 6 Jul 2011, Daniel Herring wrote:

> On Tue, 5 Jul 2011, Ron Garret wrote:
>
>> On Jul 2, 2011, at 1:35 AM, Gary Byers wrote:
>> 
>>> The only thing you can generally do (outside of a context like the kernel
>>> debugger, where all other threads are stopped) is answer the question
>>> "what thread owned this lock a few cycles ago, when I asked ?"  The answer
>>> to that question is either useful or misleading, and there's no general
>>> way of knowing which.
>> 
>> You can make that answer more useful by adding a timestamp (or a counter) 
>> to a lock that keeps track of the last time the lock was grabbed.  That 
>> lets you determine whether a cycle in the resource contention graph is real 
>> or a timing artifact by walking the cycle twice.  If the 
>> timestamps/counters haven't changed since the first traverse then you have 
>> found a deadlock.
>
> Such timing instrumentation may be expensive to add compared to the payoff. 
> How often do we actually care?  How long must the observer wait between cycle 
> walks?

I don't think that cycle-checking during locking operations is particularly
attractive; it would indeed add some overhead and could at best catch certain
kinds of deadlock.  (Something like:

  (with-lock-grabbed (lock)
    (infinite-loop)) ; or WAIT-FOR-SOMETHING-THAT-CANT-HAPPEN-NOW

can also lock things up pretty badly and is harder to detect.)

Ron's suggestion - just incrementing a counter every time that a lock
is acquired - is fairly cheap to implement (assuming that the counter gets
incremented modulo MOST-POSITIVE-FIXNUM or something like that) and it
does give you some information that could help with debugging.
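
A minimal sketch of what that might look like, layered on top of the
existing lock API rather than built into it.  COUNTED-LOCK,
WITH-COUNTED-LOCK-GRABBED, SNAPSHOT-COUNTS and CYCLE-UNCHANGED-P are
made-up names for illustration, not anything in CCL; only CCL:MAKE-LOCK
and CCL:WITH-LOCK-GRABBED are real.

```lisp
;;; Hypothetical wrapper: a CCL lock plus an acquisition counter.
(defstruct counted-lock
  (lock (ccl:make-lock "counted") :read-only t)
  (acquisitions 0 :type fixnum))

(defmacro with-counted-lock-grabbed ((clock) &body body)
  "Like CCL:WITH-LOCK-GRABBED, but bumps the counter once the lock is held."
  (let ((l (gensym "CLOCK")))
    `(let ((,l ,clock))
       (ccl:with-lock-grabbed ((counted-lock-lock ,l))
         ;; We own the lock here, so a plain read-modify-write is safe.
         ;; Wrap around so that the counter always stays a fixnum.
         (setf (counted-lock-acquisitions ,l)
               (mod (1+ (counted-lock-acquisitions ,l))
                    most-positive-fixnum))
         ,@body))))

(defun snapshot-counts (locks)
  "Read each counter without taking the lock; the values may already be stale."
  (mapcar #'counted-lock-acquisitions locks))

(defun cycle-unchanged-p (locks first-walk)
  "True if no lock in LOCKS appears to have been grabbed since FIRST-WALK."
  (equal first-walk (snapshot-counts locks)))
```

Walking a suspected cycle twice then amounts to calling SNAPSHOT-COUNTS,
waiting, and calling CYCLE-UNCHANGED-P.  Unchanged counters only say that
nobody acquired those locks between the two walks; how long to wait before
the second walk is still an open question.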

>
> A with-rest-of-world-stopped macro might be useful for wrapping 
> probe-lock-owner in the cases where one needs an answer for other than 
> reporting purposes.  Otherwise the lock owner can still change between when 
> the probe is performed and when you see/act on the answer.

Most of the cases that I can remember where CCL's own code has deadlocked
have involved some variant of:

(with-other-threads-suspended
   (do-something-that-requires-a-lock-held-by-a-suspended-thread)); oops

The GC runs with other threads suspended, and the bugs that I remember
most vividly involve cases where the GC tries to free a GCable pointer
when some lock maintained by #_malloc/#_free is owned by a suspended
thread; there have been a few variants of that.

There actually is a CCL::WITH-OTHER-THREADS-SUSPENDED macro; it offers
similar ways to lose.  (One of the most common ways involves printing
to a shared stream.)  If it were documented, the documentation would say
something like:

  WITH-OTHER-THREADS-SUSPENDED &body body [Macro]

    body - the results are undefined if this is non-NIL

So yes, if you're extraordinarily careful about how you do it, you
could try to debug deadlock scenarios in the body of that macro.  I'm
not sure that I'd want to try that.
>
> The lateral option to all this is to call trylock in a polling loop. When 
> trylock succeeds, you know who the owner is.  ;)
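
For what it's worth, a polling loop of that sort might be sketched as
below.  POLL-FOR-LOCK and its keyword arguments are made-up names for
illustration; CCL:TRY-LOCK, CCL:MAKE-LOCK and CCL:RELEASE-LOCK are the
real functions.  All that it tells you on success is that the calling
thread now owns the lock.

```lisp
(defun poll-for-lock (lock &key (attempts 100) (interval 0.01))
  "Try to grab LOCK up to ATTEMPTS times, sleeping INTERVAL seconds after
each failed try.  Returns T with LOCK held by the caller, or NIL if every
attempt failed."
  (loop repeat attempts
        when (ccl:try-lock lock) return t
        do (sleep interval)
        finally (return nil)))

;; Usage: release the lock again once we're done with it.
(let ((lock (ccl:make-lock "polled")))
  (when (poll-for-lock lock :attempts 10)
    (unwind-protect
         (format t "~&This thread owns the lock now.~%")
      (ccl:release-lock lock))))
```

A NIL result doesn't say who the owner is, only that it wasn't us for as
long as we were willing to keep asking.
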
>
> - Daniel