[Openmcl-devel] freebsd 9.1 amd64

Gary Byers gb at clozure.com
Wed Feb 13 23:58:29 PST 2013

I made a change to the trunk kernel source that seemed to fix the problem for
me while running 9.1/amd64 under a VMware VM.  (VMware made the VM appear to
have AVX.)  I wanted to try to test it on some real AVX-enabled hardware;
unfortunately, the only AVX-enabled hardware I have has network adapters
that were made less than 5 years ago and are therefore a total mystery to
FreeBSD ...

I'll keep wrestling with that (I may be able to find a supported USB->ethernet
or USB->wifi adapter). If people who use FreeBSD 9.1 on amd64 could test
these changes, that'd be helpful.

The changes are just in source in the trunk and they only affect the lisp
kernel, so to test them you need to:

1) check out or "svn update" a copy of the "freebsdx86" tree from the trunk.
2) do:

$ cd ccl/lisp-kernel/freebsdx8664
$ make
$ cd ../..
$ ./fx86cl64

I find that I can do things like:

? (dotimes (i 10) (rebuild-ccl :full t))

reliably after those changes; before they were made, that would die on
the first or second iteration with a cryptic message from the
'sigreturn' system call (often the same message that people have
reported, occasionally something different but just as cryptic.)

The old saying ("testing can confirm the presence of bugs but doesn't
prove their absence") seems to apply here: I haven't been able to
provoke the bug yet, but I'm not 100% confident that I understand why
the change avoids the problem.  The change has to do with how
"alternate signal stacks" are allocated, and the symptom has been that
a stack-allocated data structure that describes the state of a thread
when an exception occurs sometimes gets corrupted, so that it can't be
used to restore the thread's state ("sigreturn").  Those things are at
least somewhat related, but I don't fully understand how one (the old
way of allocating stacks for signal processing) causes the other
(corruption of the signal context).

On Sun, 10 Feb 2013, Mark Cox wrote:

> On 10/02/2013, at 10:06 AM, R. Matthew Emerson wrote:
>> On Feb 9, 2013, at 6:55 PM, Mark Cox <markcox80 at gmail.com> wrote:
>>> On 09/02/2013, at 1:14 AM, Gary Byers wrote:
>>>> I was able to run 9.1 under VMware and think/hope that this is now fixed
>>>> in the CCL trunk.  If people who can reproduce this could try the trunk,
>>>> that'd be helpful.
> >>> A fresh checkout of trunk does the following on both of my virtual machines:
>>> $ ./fx86cl64
>>> Unhandled exception 10 at 0x300001050b6b, context->regs at #x7fffffffd280
>>> received signal 10; faulting address: 0x300001050b6b
>>> ? for help
>>> [88099] Clozure CL kernel debugger:
>>> I installed misc/compat6x as stated in [1].
>> Can you please try it with a locally-built lisp kernel?  That is, do
>> cd lisp-kernel/freebsdx8664 && make clean && make
> My apologies. I did not do this.
> The Core 2 Duo rebuilds fine, but the Core i7 signals the same error as originally reported by Cyrille.
> Mark
