[Openmcl-devel] Slime problem on windows

Thu Oct 23 04:19:15 PDT 2008

On Thu, 23 Oct 2008, Marko Kocić wrote:

>>> Fault during read of memory address #xC4
>>>  [Condition of type CCL::INVALID-MEMORY-ACCESS]
>>>
>>> Restarts:
>>>
>>> Backtrace:
>>>  0: (CCL::XCMAIN 6225198)
>>>     Locals:
>>>       #:G122183 = 6225198
>>>       #:G122193 = #<A Foreign Pointer [stack-allocated] #x17BF4B8>
>>>       CCL::XP = #<A Foreign Pointer #x17BFA74>
>>>       CCL::XCF = #<A Foreign Pointer #x18D0FD0>
>>>       SIGNAL = 10
>>>       CCL::CODE = 0
>>>       CCL::ADDR = 196
>>>       CCL::FRAME-PTR = 6505460
>>>  1: (CCL::%PASCAL-FUNCTIONS% 1 6225198)
>>>     Locals:
>>>       CCL::INDEX = 1
>>>       CCL::ARGS-PTR-FIXNUM = 9633070
>>>       CCL::LISP-FUNCTION = #<Compiled-function CCL::XCMAIN
>>> (Non-Global)  #x8375B76>
>>>       WITHOUT-INTERRUPTS = NIL
>>>       CCL::*CALLBACK-TRACE-P* = #<error printing CCL::IMMEDIATE #x33>
>>>
>>
>> What SLIME's backtrace is showing here is part of the process of calling
>> out to lisp after the lisp kernel has detected an exception (an invalid
>> memory access of some sort.)  That isn't very interesting; the next
>> few older frames would (hopefully) show us something about what was
>> happening when the fault happened.
>
> How do I get these frames? This is all info provided by slime sldb.

I had something similar happen a couple of times when running under
SLIME.  The first time, the backtrace was almost exactly the same
as what you saw; the second time, there was a little bit more there,
and it looked like the crash happened soon after a thread started
executing lisp code.

>
>>> I'll try to rebuild ccl to see if the error will persist, but it'll
>>> probably take some time for me to figure out how to do it.
>>
>> ? (rebuild-ccl :clean t)
>>
>> will delete all FASL (.wx32fsl) files in the CCL hierarchy, compile
>> all sources into fasls, and rebuild the heap image.
>>
>> On other platforms, other arguments to REBUILD-CCL will also cause
>> the lisp kernel (C and assembler code) to be rebuilt.  Unfortunately,
>> Windows doesn't allow an executable file (wx86cl.exe) to be overwritten
>> while it's running, since that would be too useful.
>
> It is possible to do (ccl:rebuild-ccl :clean t :full t) on Windows if
> you first rename wx86cl.exe and wx86cl.image to ccl.exe and ccl.image.
> After that, start ccl.exe and perform a full rebuild, which will create
> a new wx86cl.exe and wx86cl.image.
>

Yes.  It doesn't lend itself to automation, since you can't (AFAIK)
rename a running executable, either.

> I had to apply a small patch in order to build the kernel (I'm using
> mingw + msys, no cygwin), see attachment. I don't know enough make to
> make it work with both cygwin and mingw.
>
> It seems that pathnames were broken once the new image was created,
> since I got the following error after trying (asdf:oos 'asdf:load-op
> :swank) with the new image.
>
> ; loading system definition from
> c:/lisp/lib/asdf-binary-locations/asdf-binary-locations.asd into
> #<Package "ASDF0">
> ; registering #<SYSTEM ASDF-BINARY-LOCATIONS #x89EA2C6> as ASDF-BINARY-LOCATIONS
>  (WindowsX8632)!e Common Lisp Version 1.3-dev-r11200MS
> ? (asdf :swank)
> ; loading system definition from c:/dev/cvstree/lisp/slime/swank.asd
> into #<Package "ASDF0">
> ; registering #<SYSTEM :SWANK #x8A7C1A6> as SWANK
> ;;
> __(windowsx8632)-windows-x86/swank-backend.wx32fslmarkko/.slime/fasl/2008-10-21/openmcl-version_1.3-dev-r11200ms
> ;; Condition: Can't create directory "/Documents and
> Settings/markko/.slime/fasl/2008-10-21/openmcl-version_1.3-dev-r112
> __(windowsx8632)-windows-x86/".
> ;; Aborting.
> ;;
> ?
>
> Note the missing "c:" before the "Documents and Settings" paths. It
> worked with the image from svn, so I suppose it is not SLIME's fault.

Not sure what that means.  I don't think that the absence of the "c:"
is significant, unless something changed the default drive.
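
To illustrate why a missing "c:" is usually harmless: on Windows, a path
that starts with a separator but names no drive is interpreted relative to
the current drive. The C sketch below only mimics that rule with string
handling; has_drive and resolve_on_drive are hypothetical helper names
invented for this illustration, not part of CCL, SLIME, or the Windows API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the path starts with a drive spec such as "c:". */
static int has_drive(const char *path) {
    return isalpha((unsigned char)path[0]) && path[1] == ':';
}

/* Hypothetical helper (an assumption for illustration only): prefix a
   drive-less path with the current drive letter, leaving paths that
   already name a drive untouched.
   Returns 0 on success, -1 if the output buffer is too small. */
static int resolve_on_drive(const char *path, char current_drive,
                            char *out, size_t outsize) {
    int n;
    assert(path != NULL && out != NULL);
    if (has_drive(path))
        n = snprintf(out, outsize, "%s", path);
    else
        n = snprintf(out, outsize, "%c:%s", current_drive, path);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

With 'c' as the current drive, "/Documents and Settings/markko" and
"c:/Documents and Settings/markko" name the same directory; they only
differ if the current drive has been changed to something else.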

I'm not sure whether something was lost in email, but it's a little odd
that the directory name seems to contain "...-dev-r112__..." instead of
"...-dev-r11200ms-trunk__...".

> Also, with newly rebuilt system, I sometimes get errors like this:
> ? (quit)
>> Error: Fault during read of memory address #xB4
>> While executing: CCL::THREAD-INTERRUPT, in process listener(1).

Looks very similar to the other cases: something's trying to access a
field in some structure (at an offset of #xB4, or #xC4, or whatever),
but the structure pointer is NULL.  The "structure" in question is
almost certainly the thread's lisp thread-local-storage, but if
something's clobbering the (segment) register that should be pointing
at that I'd feel better if it did it a little more deterministically.
(Actually, it makes a little bit of sense in this case, but less 
in the cases where it appears to happen early in the thread's lifetime.)
I've at least seen something similar happen, so hopefully it'll start
to make a bit more sense.
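
As a rough illustration of the NULL-plus-offset pattern described above,
here is a minimal C sketch. The struct layout is made up so that one field
lands at offset #xC4; it is NOT the real CCL thread-local-storage layout,
and the names fake_tls, some_field and field_address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (an assumption for illustration only): 0xC4 bytes
   of padding followed by a 32-bit field, so the field sits at offset
   0xC4 regardless of ABI. */
struct fake_tls {
    unsigned char pad[0xC4];
    uint32_t some_field;
};

/* Compile-time check that the field really is at offset 0xC4. */
static_assert(offsetof(struct fake_tls, some_field) == 0xC4,
              "some_field must sit at offset 0xC4");

/* The address the CPU would touch when reading base->some_field.
   Computed with integer arithmetic only, so nothing is dereferenced. */
static uintptr_t field_address(uintptr_t base) {
    return base + offsetof(struct fake_tls, some_field);
}
```

With a base of 0 (a NULL structure pointer), field_address(0) is #xC4,
which matches the form of the reported "Fault during read of memory
address #xC4": the faulting address is simply the field's offset.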

(There's a warning that occurs when building the win32 kernel that
says something about needing "context setup for win32"; it has to do
with being able to suspend a thread when that thread's in the process
of returning from an exception.  The warning's a reminder that that
case isn't handled correctly on win32; I wouldn't be too surprised if
there's a similar vulnerability when a thread's interrupted.)

In other words: this isn't quite ready for prime time, but it's
not incredibly far away, either.