[Openmcl-devel] Segmentation fault on linuxppc
Gary Byers
gb at clozure.com
Fri Sep 3 23:12:44 PDT 2004
On Tue, 17 Aug 2004, Gary Byers wrote:
>
>
> On Tue, 17 Aug 2004, Jan Idzikowski wrote:
>
> > I have used all the breakpoints in gdb, but it segfaulted before any of these
> > breakpoints were reached
> >
> > but I found that the problem is not kernel 2.6.7;
> > the problem is libc 2.3.3
> >
> > with libc version 2.3.2, OpenMCL also works with kernel 2.6.7
> >
> > my libc version: /lib/libc.so.6
> > "GNU C Library stable release version 2.3.3, by Roland McGrath et al."
> >
> > have you ever used OpenMCL with libc 2.3.3?
>
Sorry it's taken me so long to get back to this.
I installed libc 2.3.3 in the Gentoo system that I have here.
Gentoo generally installs packages by downloading the source and a set
of Gentoo-specific patches, extracting the source, applying the patches,
etc. As of a couple of weeks ago, libc 2.3.3 was built by applying a
fairly large patch to the 2.3.2 (sic) sources. (I'm not sure why they
aren't building 2.3.3 from 2.3.3 sources ...).
OpenMCL failed in that environment for 2 reasons that I could see:
1) OpenMCL's lisp kernel wants to load parts of itself into very low
memory (starting around address #x3000/#x4000); it uses a custom linker
script to achieve this under LinuxPPC.
The dynamic linker in both 2.3.2 and 2.3.3 tries to read- and write-protect
a (potentially large) range of pages between the end of the
executable segment (the "text section", in unix-ese) and the start of
the data segment. The 2.3.3 dynamic linker that was built under
Gentoo miscalculated the address to use and quietly read- and write-protects
a large range of pages starting at address 0. This miscalculation has the
unfortunate effect of making the program that's being loaded (the lisp
kernel) segfault as soon as the dynamic linker tries to execute it, as you
noticed.
2) I was able to work around that and get a bit further; the
next problem seemed to be that the C and pthreads libraries were
compiled to use thread-local storage (TLS), but signal handlers are wrapped
in "glue" code (provided by the pthreads library) that has traditionally
been necessary to overcome Linux kernel limitations. TLS depends on
register-usage conventions that OpenMCL doesn't follow (at least not
while running lisp code: those conventions are re-established during
a foreign function call, and the lisp kernel's signal handlers ensure
that they're followed before any C library functions are called.)
The "glue" code that the pthreads library wraps around signal handlers
assumes that those conventions are in effect when a signal is delivered;
OpenMCL violates that assumption. This is especially annoying since,
as far as I can tell, that glue should no longer be necessary; that
was certainly one of the goals of a new pthreads library (NPTL).
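To make the register-convention issue in (2) a little more concrete, here is a minimal sketch (not OpenMCL, glibc, or pthreads code) of a signal handler that touches a thread-local variable. It assumes GCC's `__thread` extension and POSIX `sigaction`; the names `tls_hits`, `install_counting_handler`, and `counting_handler_hits` are hypothetical. Under the 32-bit PowerPC TLS ABI, r2 is reserved as the thread pointer, so the increment inside the handler becomes an r2-relative memory access. It only works if r2 holds the thread pointer at the moment the signal is delivered, which is the same assumption the pthreads "glue" makes.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical per-thread counter. __thread (a GCC extension) places it in
   thread-local storage; on 32-bit PowerPC each access is addressed
   relative to the thread pointer register (r2). */
static __thread volatile sig_atomic_t tls_hits = 0;

static void counting_handler(int sig)
{
    (void)sig;
    /* This TLS access silently assumes that the thread pointer register
       is valid when the signal arrives; if that register held something
       else, the store would land in the wrong memory. */
    tls_hits = tls_hits + 1;
}

/* Install counting_handler for SIG; returns 0 on success, -1 on error. */
int install_counting_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = counting_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Number of times the handler has run in the calling thread. */
int counting_handler_hits(void)
{
    return (int)tls_hits;
}
```

In an ordinary C program that follows the ABI everywhere, installing the handler and calling `raise(SIGUSR1)` increments the calling thread's counter. Code that reuses the thread-pointer register for other purposes while signals can arrive would break this handler, and any wrapper around it, in the same way.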
I installed another Linux distribution (a pre-release of YDL 4.0) that
uses libc 2.3.3 on another machine; OpenMCL seems to run fine in that
environment. I don't know whether YDL's libc 2.3.3 is more recent or
more "official" than the one I built under Gentoo a few weeks ago, but
the whole business of using patched 2.3.2 sources to build a 2.3.3 libc
(and these bugs) makes me suspect that it is.
Of these 2 problems, (1) is pretty clearly just a bug (one that can be
reproduced with a simple C program and a linker script); there isn't a
particularly good workaround (though there -is- a bad one). (2) is a
bit less clear; other Linux distributions have used TLS without
effectively requiring that the application follow TLS conventions at
all times (whenever a signal could be delivered), and I'm not sure
whether this is intentional on Gentoo's part or not. I think that I
know how to work around the problem.
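Problem (1) comes down to page arithmetic. The following is a minimal sketch, assuming 4 KiB pages, of how the page-aligned range between the end of the text segment and the start of the data segment can be computed. It is not the dynamic linker's actual code; `text_data_gap`, `page_range_contains`, and the sample addresses below are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE_ASSUMED ((uintptr_t)4096) /* assumption: 4 KiB pages */

struct page_range {
    uintptr_t start;  /* first byte of the range (page-aligned) */
    uintptr_t length; /* size in bytes; 0 means "nothing to protect" */
};

static uintptr_t page_round_up(uintptr_t a)
{
    return (a + PAGE_SIZE_ASSUMED - 1) & ~(PAGE_SIZE_ASSUMED - 1);
}

static uintptr_t page_round_down(uintptr_t a)
{
    return a & ~(PAGE_SIZE_ASSUMED - 1);
}

/* Whole pages lying strictly after the end of text and before the
   start of data; these are the pages a loader might make inaccessible. */
struct page_range text_data_gap(uintptr_t text_end, uintptr_t data_start)
{
    struct page_range r = {0, 0};
    uintptr_t lo = page_round_up(text_end);
    uintptr_t hi = page_round_down(data_start);
    if (hi > lo) {
        r.start = lo;
        r.length = hi - lo;
    }
    return r;
}

/* Nonzero if ADDR falls inside R. */
int page_range_contains(struct page_range r, uintptr_t addr)
{
    return r.length != 0 && addr >= r.start && addr - r.start < r.length;
}
```

For a text segment ending at 0x5123 and data starting at 0x20000, the gap runs from 0x6000 up to (but not including) 0x20000, so code loaded around #x3000 is left alone. A range that started at address 0, as in the miscalculation described earlier, would cover that low-memory code and fault on the first instruction fetch.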
Gentoo versions of glibc 2.3.4 seem to be on the near horizon. I hope
that that's good news.