[Openmcl-devel] armcl memory bug 1257 and v1.12-dev.4-3-gdd5622e9 [locks not held on armcl?]

JTK jetmonk at gmail.com
Sat Jan 19 23:29:47 PST 2019


Thanks for looking at this, Matthew.

Here is an update that pins down the problem further:

https://pastebin.com/zvfyL4Dj

The with-lock macro there is a clone of ccl:with-lock-grabbed, except that it checks ownership of the lock before and after the body.

It apparently shows that a lock held by a thread is often stolen by another thread, even in the absence of GC.

Specifically, running (threadtest2 :exercise-locking t) causes “ERR2”, where a test after the body of the macro indicates that a lock has been stolen by another thread.
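As a rough illustration of the kind of check described above (this is not the pastebin code, and it is in C rather than Lisp), here is a minimal sketch of a lock wrapper that records its owner and verifies it before and after the body. All names here (checked_lock, with_checked_lock, run_lock_test, err1_count, err2_count) are hypothetical. With a correctly working lock, both error counts should stay at zero; a nonzero err2_count would correspond to the "ERR2" case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical checked lock: a mutex plus a record of the owning thread. */
typedef struct {
    pthread_mutex_t mutex;
    atomic_uintptr_t owner;            /* 0 means "no owner" */
} checked_lock;

static checked_lock glock = { PTHREAD_MUTEX_INITIALIZER, 0 };
static atomic_int err1_count;          /* lock not ours right after grabbing it */
static atomic_int err2_count;          /* lock not ours after the body ran */

/* The address of a thread-local variable is a unique, nonzero id per live thread. */
static _Thread_local char self_tag;
static uintptr_t self_id(void) { return (uintptr_t)&self_tag; }

/* Grab the lock, record ownership, run the body, and check ownership twice. */
static void with_checked_lock(checked_lock *l, void (*body)(void *), void *arg) {
    uintptr_t me = self_id();
    pthread_mutex_lock(&l->mutex);
    atomic_store(&l->owner, me);
    if (atomic_load(&l->owner) != me) atomic_fetch_add(&err1_count, 1);
    body(arg);
    if (atomic_load(&l->owner) != me) atomic_fetch_add(&err2_count, 1);
    atomic_store(&l->owner, 0);
    pthread_mutex_unlock(&l->mutex);
}

/* Stand-in for the make-string garbage generation in the Lisp test. */
static void make_garbage(void *arg) {
    (void)arg;
    char *s = malloc(64);
    if (s) { s[0] = 'x'; free(s); }
}

static void *worker(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) with_checked_lock(&glock, make_garbage, NULL);
    return NULL;
}

/* Runs up to 16 workers and returns the total number of ownership violations. */
static int run_lock_test(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) pthread_join(tids[i], NULL);
    return atomic_load(&err1_count) + atomic_load(&err2_count);
}
```

Calling run_lock_test(4, 10000) should return 0 when the underlying lock really provides mutual exclusion.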


Going deeper may require knowledge of ARM CPU instructions.

The culprit seems to be in l0-misc.lisp, in the #-futex version of the function %lock-recursive-lock-ptr.

Wild, ill-informed speculation: this ARM manual
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/ch01s03s02.html
suggests that a DMB instruction is needed after a lock operation, but disassembling
ccl::%lock-recursive-lock-ptr doesn’t show this instruction, assuming everything it does is inlined.
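To make the barrier point concrete, here is a minimal C11 sketch of a test-and-set spinlock (not CCL's implementation; spin_lock, spin_acquire, spin_release and run_spin_test are hypothetical names). The acquire ordering on the lock operation and the release ordering on the unlock are what keep the protected accesses inside the critical section. As I understand it, on ARMv7 compilers typically implement these orderings with DMB instructions around the exclusive load/store sequence, which is the instruction that appears to be missing from the disassembly.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical spinlock built on a C11 atomic flag. */
typedef struct { atomic_flag held; } spin_lock;

static spin_lock slock = { ATOMIC_FLAG_INIT };
static long shared_counter;   /* plain variable, protected only by slock */

/* Acquire: later reads/writes may not be reordered before the lock is taken. */
static void spin_acquire(spin_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* spin until the flag was previously clear */
    }
}

/* Release: earlier reads/writes may not be reordered after the unlock. */
static void spin_release(spin_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void *spin_worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        spin_acquire(&slock);
        shared_counter++;      /* non-atomic update inside the critical section */
        spin_release(&slock);
    }
    return NULL;
}

/* Runs up to 16 workers and returns the final counter value. */
static long run_spin_test(int nthreads, long iters) {
    pthread_t tids[16];
    int started = 0;
    if (nthreads > 16) nthreads = 16;
    shared_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, spin_worker, &iters) == 0) started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);
    return shared_counter;
}
```

If the lock and its barriers are correct, run_spin_test(4, 20000) returns exactly 80000; lost updates would indicate that mutual exclusion or ordering is broken.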

John


> On Jan 11, 2019, at 7:13 PM, R. Matthew Emerson <rme at acm.org> wrote:
> 
> 
> 
>> On Jan 11, 2019, at 3:34 PM, JTK <jetmonk at gmail.com> wrote:
>> 
>> 
>> 
>> Apologies for these piece-wise emails.
>> 
>> I tried to replicate armcl threading problems described in
>> https://trac.clozure.com/ccl/ticket/1257
>> and I think I might have found some other issues, as well as a clue to a possible cause.
>> 
>> See https://pastebin.com/2x3mcHMc
>> 
>> I found that 
>> 
>> * a simple thread test with a bit of garbage generation in threads fails with Unhandled Exception 4 on armcl 1.12-dev (v1.12-dev.4-3-gdd5622e9), but does not fail in release version v1.11.5 
>> 
>> * If I put a lock around the garbage generation (just a make-string), to exercise locking in threads, in v1.11.5 it fails with "Current process #<PROCESS Reader 4(26) [Active] #x150D335E> does not own lock #<RECURSIVE-LOCK "glock" [ptr @ #x76105C60] #x150BB0D6> in CCL::%UNLOCK-RECURSIVE-LOCK-OBJECT"
>> 
>> Is it possible that imperfect locking (race condition?  non atomicity?) in the armcl implementation causes bug 1257, mimicking a GC bug?  
> 
> The last time I tried hard to find this bug I utterly failed to find it.
> 
> I pasted your comments into the Trac ticket (and I'll be copying the whole issue over to GitHub soon).
> 
> -m
