[Openmcl-devel] Starting OpenMCL 64

Gary Byers gb at clozure.com
Wed Feb 21 22:21:56 PST 2007



On Wed, 21 Feb 2007, Gary Byers wrote:

> Someone ran into this (at least the same symptom) and reported it on an
> internal mailing list a few days ago.
>
> In their case, the problem was fixed by recompiling the lisp kernel:
>
> shell> cd ccl/lisp-kernel/linuxx8664
> shell> make clean
> shell> make
>
> [If the default GCC is older than 4.0 and GCC 4.x is also
> installed, it may be necessary to do the last step as:
>
> shell> make "CC=gcc4" # if gcc 4.0 is installed as "gcc4"
>
> Older GCC versions seem to have problems compiling OpenMCL's GC
> correctly.
>
> ].
>
>
> The problem seems to stem from the fact that the kernel shipped in the
> 070214 tarball was compiled (and linked) on Fedora Core 6; previous
> versions had used FC5 or earlier.
>
> I don't know what the exact problem is; in the past, there have been
> cases where a kernel linked on some version of some distro will cause
> the dynamic linker on some other distro to complain that the libraries
> it finds at runtime are too old/too new for the kernel.  I'd guess that
> this is a related (but more severe-looking) problem.
>
>

I Googled a bit and think that I understand what's going on.

First of all, an integer division-by-zero is often reported as a
"floating-point exception" (it maps to the same Unix signal - SIGFPE -
and reporting the exception as a floating-point exception is
understandable if not very informative.)
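A small sketch of how that report comes about, assuming a POSIX system and Python 3.8 or later. The child process here raises SIGFPE on itself instead of actually dividing by zero in C (an assumption made so that no C compiler is needed), so this only illustrates that the message is derived from the signal number and says nothing about the cause. The function names are made up for the example.

```python
import signal
import subprocess
import sys

# Child program: raise SIGFPE on itself. This stands in for a C program
# doing an integer division by zero; the parent sees the same signal
# number either way.
CHILD = "import signal; signal.raise_signal(signal.SIGFPE)"


def run_child():
    """Run the child and return its exit status as subprocess reports it."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


def describe(returncode):
    """Describe a return code; negative values mean 'killed by signal N'."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name}: {signal.strsignal(sig)}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe(run_child()))
```

The description text comes from the C library's table of signal names, which is why the shell says "floating point" even when no floating-point instruction was involved.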

You can get this integer division-by-zero error by compiling a trivial
C program on FC6 and trying to run it on an older system.

A shared object file (shared library or executable linked against shared
libraries) typically refers to things defined in other libraries by
name, and may export names which other shared objects can use to
refer to things that they define.  (The "things" in question are
usually the addresses of functions and variables; the "names" -
strings - are usually referred to as "symbols.")  In order for
a shared object file to refer to something (function or variable
address) defined in another shared object file, the name of that
external thing has to be resolved to an address.

When a shared object file is executed, the OS runs the dynamic
linker and tells it to prepare that program for execution; this
process involves determining what other shared objects (libraries)
the executable depends on, mapping the executable and all dependent
libraries into memory, determining the addresses of all exported
symbols, and either resolving all references to those symbols
or setting things up so that they can be resolved lazily.
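The steps just described can be modelled with a toy example. This is only a sketch: the object names, offsets, and base addresses are invented, real dynamic linkers read this information from the ELF file rather than from dictionaries, and only eager (non-lazy) resolution is shown.

```python
# Toy model: each shared object lists the libraries it needs, the symbols
# it exports (name -> offset within the object), and the names it imports.
# All object names, offsets, and base addresses below are invented.
OBJECTS = {
    "app": {"needed": ["libfoo.so"], "exports": {"main": 0x10},
            "imports": ["foo_init", "foo_run"]},
    "libfoo.so": {"needed": ["libbar.so"],
                  "exports": {"foo_init": 0x100, "foo_run": 0x180},
                  "imports": ["bar_helper"]},
    "libbar.so": {"needed": [], "exports": {"bar_helper": 0x40},
                  "imports": []},
}


def load_order(root, objects):
    """Breadth-first list of the root object and everything it depends on."""
    order, queue = [], [root]
    while queue:
        name = queue.pop(0)
        if name not in order:
            order.append(name)
            queue.extend(objects[name]["needed"])
    return order


def link(root, objects, first_base=0x400000, spacing=0x100000):
    """Give every object a base address, then resolve every import eagerly."""
    order = load_order(root, objects)
    bases = {name: first_base + i * spacing for i, name in enumerate(order)}
    # Global symbol table: the first definition in load order wins.
    symbols = {}
    for name in order:
        for sym, offset in objects[name]["exports"].items():
            symbols.setdefault(sym, bases[name] + offset)
    # Eager resolution; a KeyError here would mean an undefined symbol.
    return {name: {sym: symbols[sym] for sym in objects[name]["imports"]}
            for name in order}


if __name__ == "__main__":
    for obj, table in link("app", OBJECTS).items():
        print(obj, {sym: hex(addr) for sym, addr in table.items()})
```

The dictionary lookup `symbols[sym]` is the step that a real linker has to make fast, which is where the hash tables discussed next come in.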

To try to speed this last part up a bit, ELF object files (used on
Linux and FreeBSD and some other platforms) have traditionally defined
a hashing scheme, and a hash table has traditionally been embedded in
ELF shared object files.
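For reference, here is a sketch of the traditional scheme in Python. The hash function follows the one given in the System V ABI; the table layout (an array of buckets plus an array of chains, with symbol index 0 reserved as the end-of-chain marker) is a simplified model of the real section, and the helper names build_table and lookup are made up for this example.

```python
def elf_hash(name):
    """The traditional (System V) ELF symbol hash, on 32-bit values."""
    h = 0
    for byte in name.encode("ascii"):
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def build_table(names, nbuckets):
    """Return (symbols, buckets, chains); index 0 is the reserved null symbol."""
    symbols = [None] + list(names)
    buckets = [0] * nbuckets
    chains = [0] * len(symbols)
    for index in range(1, len(symbols)):
        b = elf_hash(symbols[index]) % nbuckets
        chains[index] = buckets[b]  # link to the previous head of this bucket
        buckets[b] = index
    return symbols, buckets, chains


def lookup(name, symbols, buckets, chains):
    """Return the symbol index for name, or 0 if it is not in the table."""
    index = buckets[elf_hash(name) % len(buckets)]
    while index != 0 and symbols[index] != name:
        index = chains[index]
    return index


if __name__ == "__main__":
    table = build_table(["printf", "malloc", "free"], nbuckets=3)
    print(hex(elf_hash("printf")), lookup("malloc", *table))
```

Note that the bucket count is used as a divisor in every lookup.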

People have found that the traditional ELF hashing scheme doesn't
scale well to systems where thousands of symbols need to be resolved
(this is apparently the case for many KDE and GNOME applications.)
New schemes have been developed, and the toolchain in FC6 uses one
or more of these newer schemes by default; the object files produced
by new versions of the GNU linker won't contain the traditional
hash-table data structures.

All well and good, but when such an executable is loaded by an
old version of the dynamic linker it doesn't quite notice that
the old hash-table data structure isn't there and does the C
version of something like:

   (let* ((bucket (mod (hash string) *number-of-buckets*)))
     ...)

At this point, *number-of-buckets* is 0; the MOD operation generates
an exception, and we scratch our heads and wonder who's generating
floating-point exceptions.
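The same failure can be reproduced in a few lines of Python. This is an illustration only: both function names are hypothetical, and Python raises a ZeroDivisionError where the C code in the dynamic linker would take a hardware trap that the OS delivers as SIGFPE. The checked variant shows the kind of test whose absence makes the error so confusing; it is not a claim about how any particular loader was later fixed.

```python
def bucket_index_unchecked(hash_value, nbuckets):
    """What the old loader effectively does: no check that a table exists."""
    return hash_value % nbuckets


def bucket_index_checked(hash_value, nbuckets):
    """A defensive variant that reports the real problem instead."""
    if nbuckets == 0:
        raise ValueError("object has no traditional hash table")
    return hash_value % nbuckets


if __name__ == "__main__":
    printf_hash = 0x077905A6  # traditional ELF hash of "printf"
    nbuckets = 0              # what the loader sees when the table is absent
    try:
        bucket_index_unchecked(printf_hash, nbuckets)
    except ZeroDivisionError as exc:
        print("unchecked lookup failed:", exc)
    try:
        bucket_index_checked(printf_hash, nbuckets)
    except ValueError as exc:
        print("checked lookup failed:", exc)
```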

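The workaround seems to be to tell the GNU linker to include the
traditional hashing data structures in the lisp kernel (recent
versions of GNU ld accept a --hash-style option for this, and
--hash-style=both emits both the old and the new tables).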


