[Openmcl-devel] armcl memory bug 1257 and v1.12-dev.4-3-gdd5622e9 [locks not held on armcl?]
jetmonk at gmail.com
Sat Jan 19 23:29:47 PST 2019
Thanks for looking at this, Matthew.
Here is an update that pins down the problem further:
The with-lock macro here is a clone of ccl:with-lock-grabbed, except that it checks ownership of the lock before and after the body.
It apparently shows that a lock held by a thread is often stolen by another thread, even in the absence of GC.
Specifically, running (threadtest2 :exercise-locking t) causes “ERR2”, where a test after the body of the macro indicates that a lock has been stolen by another thread.
Going deeper may require knowledge of ARM CPU instructions.
The culprit seems to be in l0-misc.lisp, in the #-futex version of the function %lock-recursive-lock-ptr.
Wild ill-informed speculation: This Arm manual
suggests that a DMB instruction is needed after a lock operation, but disassembling
ccl::%lock-recursive-lock-ptr doesn’t show this instruction, assuming everything it does is inlined.
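
To make the speculation concrete, here is a minimal sketch in portable C11, not CCL's actual Lisp or ARM code. The names owner_lock, owner_lock_acquire, owner_lock_release, and run_lock_test are hypothetical. The lock stores its owner's id, is taken with a compare-and-swap that has acquire ordering, and is dropped with a release store; on ARMv7 a compiler typically implements those orderings with DMB instructions around the exclusive load/store loop. Each worker re-checks ownership after its critical section, which is the analogue of the "ERR2" test.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock: owner == 0 means free, otherwise the owner's id. */
typedef struct { atomic_uintptr_t owner; } owner_lock;

static void owner_lock_acquire(owner_lock *l, uintptr_t self) {
    uintptr_t expected = 0;
    /* Acquire ordering: later accesses may not be reordered before the CAS. */
    while (!atomic_compare_exchange_weak_explicit(
               &l->owner, &expected, self,
               memory_order_acquire, memory_order_relaxed)) {
        expected = 0;      /* a failed CAS overwrote it with the current owner */
        sched_yield();     /* be polite while spinning */
    }
}

static void owner_lock_release(owner_lock *l) {
    /* Release ordering: earlier accesses complete before the lock is freed. */
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}

typedef struct {
    owner_lock *lock;
    long *counter;     /* shared, protected by lock */
    long iters;
    uintptr_t id;      /* nonzero id standing in for the thread */
    long stolen;       /* times the post-body ownership check failed */
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        owner_lock_acquire(a->lock, a->id);
        (*a->counter)++;
        /* Post-body check: do we still own the lock? */
        if (atomic_load_explicit(&a->lock->owner, memory_order_relaxed) != a->id)
            a->stolen++;
        owner_lock_release(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers; returns the number of "stolen lock" observations
   and stores the final counter value in *final_counter. */
long run_lock_test(int nthreads, long iters, long *final_counter) {
    enum { MAX_THREADS = 16 };
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    owner_lock lock;
    atomic_init(&lock.owner, 0);
    long counter = 0;
    pthread_t tids[MAX_THREADS];
    worker_arg args[MAX_THREADS];
    for (int i = 0; i < nthreads; i++) {
        args[i] = (worker_arg){ &lock, &counter, iters, (uintptr_t)(i + 1), 0 };
        int rc = pthread_create(&tids[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    long stolen = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        stolen += args[i].stolen;
    }
    *final_counter = counter;
    return stolen;
}
```

With correct ordering, run_lock_test(4, 10000, &total) should report no stolen locks and leave total equal to 40000. If the acquire or release ordering were missing on a weakly ordered CPU, the lock could appear to be taken while another thread's critical-section accesses were still in flight, which would be consistent with the symptoms above. That link to the armcl failure is an assumption, not something this sketch demonstrates.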
> On Jan 11, 2019, at 7:13 PM, R. Matthew Emerson <rme at acm.org> wrote:
>> On Jan 11, 2019, at 3:34 PM, JTK <jetmonk at gmail.com> wrote:
>> Apologies for these piece-wise emails.
>> I tried to replicate armcl threading problems described in
>> https://trac.clozure.com/ccl/ticket/1257
>> and I think I might have found some other issues, as well as a clue to a possible cause.
>> See https://pastebin.com/2x3mcHMc
>> I found that
>> * a simple thread test with a bit of garbage generation in threads fails with Unhandled Exception 4 on armcl 1.12-dev (v1.12-dev.4-3-gdd5622e9), but does not fail in release version v1.11.5
>> * If I put a lock around the garbage generation (just a make-string), to exercise locking in threads, in v1.11.5 it fails with “Current process #<PROCESS Reader 4(26) [Active] #x150D335E> does not own lock #<RECURSIVE-LOCK "glock" [ptr @ #x76105C60] #x150BB0D6> in CCL::%UNLOCK-RECURSIVE-LOCK-OBJECT”
>> Is it possible that imperfect locking (race condition? non-atomicity?) in the armcl implementation causes bug 1257, mimicking a GC bug?
> The last time I tried hard to find this bug I utterly failed to find it.
> I pasted your comments into the Trac ticket (and I'll be copying the whole issue over to GitHub soon).