[Openmcl-devel] Problem rebuilding on Linux x8632 w gcc 4.6.2

Erik Pearson erik at adaptations.com
Thu Feb 9 14:23:35 PST 2012


HI Fellow CCLispers,

To cut to the chase, it looks like rebuilding CCL 1.7 in my specific
environment creates a bad boy:

The platform is:

  Linux archie 3.2.4-1-ARCH #1 SMP PREEMPT Sat Feb 4 11:21:15 UTC 2012
i686 Intel(R) Atom(TM) CPU D525 @ 1.80GHz GenuineIntel GNU/Linux

I'm using the latest CCL 1.7 from SVN: 1.7-r14925M

My current gcc is: gcc (GCC) 4.6.2 20120120 (prerelease)

What happens is that after a (rebuild :full t), I get the symptoms
described (way) below -- which, to summarize, is predominantly value
stack overflow and bogus object in a call to remote-socket-info.

The testing that allowed me to come to this conclusion is that I do
not encounter this error at all when using the binary from svn. The
binary from svn has survived two tests of over 1million page hits with
50 concurrent siege requests (which does not result in 50 concurrent
threads -- there are 10, which reach a concurrency in terms of thread
execution somewhat below that.) The rebuilt 1.7 and 1.8 fail after
several thousand to several hundred thousand.

One little problem is that I need to rebuild at least one time because
I have a little patch to with-encoded-cstrs (a new version called
with-encoded-strings that also returns the length) that I need to have
available for production.

I compiled the lisp kernel for Linux x86 32 with all warnings, but it
has been a long time since I've used gcc ...

[epearson at archie linuxx8632]$ vi Makefile
[epearson at archie linuxx8632]$ make
m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../pad.s | as  --32 -o pad.o
m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-spjump32.s | as  --32
-o x86-spjump32.o
m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-spentry32.s | as
--32 -o x86-spentry32.o
m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-subprims32.s | as
--32 -o x86-subprims32.o
cc -include ../platform-linuxx8632.h -c ../pmcl-kernel.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o pmcl-kernel.o
../pmcl-kernel.c: In function ‘allocate_lisp_stack’:
../pmcl-kernel.c:232:11: warning: variable ‘overhead’ set but not used
[-Wunused-but-set-variable]
../pmcl-kernel.c: In function ‘create_reserved_area’:
../pmcl-kernel.c:533:7: warning: variable ‘h’ set but not used
[-Wunused-but-set-variable]
../pmcl-kernel.c: In function ‘process_options’:
../pmcl-kernel.c:1171:19: warning: variable ‘wval’ set but not used
[-Wunused-but-set-variable]
cc -include ../platform-linuxx8632.h -c ../gc-common.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o gc-common.o
../gc-common.c: In function ‘gc’:
../gc-common.c:1779:11: warning: format ‘%d’ expects argument of type
‘int’, but argument 4 has type ‘__time_t’ [-Wformat]
../gc-common.c:1779:11: warning: format ‘%d’ expects argument of type
‘int’, but argument 5 has type ‘__suseconds_t’ [-Wformat]
../gc-common.c:1784:19: warning: format ‘%d’ expects argument of type
‘int’, but argument 5 has type ‘__time_t’ [-Wformat]
../gc-common.c:1784:19: warning: format ‘%d’ expects argument of type
‘int’, but argument 6 has type ‘__suseconds_t’ [-Wformat]
../gc-common.c: In function ‘reaphashv’:
../gc-common.c:265:14: warning: ‘bitsp’ may be used uninitialized in
this function [-Wuninitialized]
../gc-common.c:260:34: warning: ‘bitidx’ may be used uninitialized in
this function [-Wuninitialized]
../gc-common.c:259:13: warning: ‘bits’ may be used uninitialized in
this function [-Wuninitialized]
../gc-common.c: In function ‘mark_weak_hash_vector’:
../gc-common.c:409:14: warning: ‘bitsp’ may be used uninitialized in
this function [-Wuninitialized]
../gc-common.c:404:34: warning: ‘bitidx’ may be used uninitialized in
this function [-Wuninitialized]
../gc-common.c:403:13: warning: ‘bits’ may be used uninitialized in
this function [-Wuninitialized]
cc -include ../platform-linuxx8632.h -c ../x86-gc.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o x86-gc.o
../x86-gc.c: In function ‘check_all_areas’:
../x86-gc.c:340:5: warning: enumeration value ‘AREA_VOID’ not handled
in switch [-Wswitch]
../x86-gc.c:340:5: warning: enumeration value ‘AREA_CSTACK’ not
handled in switch [-Wswitch]
../x86-gc.c:340:5: warning: enumeration value ‘AREA_READONLY’ not
handled in switch [-Wswitch]
../x86-gc.c:340:5: warning: enumeration value ‘AREA_STATIC_CONS’ not
handled in switch [-Wswitch]
../x86-gc.c: In function ‘check_refmap_consistency’:
../x86-gc.c:1161:31: warning: variable ‘prev’ set but not used
[-Wunused-but-set-variable]
../x86-gc.c: In function ‘compact_dynamic_heap’:
../x86-gc.c:2020:89: warning: variable ‘prev’ set but not used
[-Wunused-but-set-variable]
../x86-gc.c: In function ‘update_managed_refs’:
../x86-gc.c:2754:21: warning: variable ‘prev’ set but not used
[-Wunused-but-set-variable]
../x86-gc.c: In function ‘purify’:
../x86-gc.c:2827:11: warning: variable ‘new_pure_start’ set but not
used [-Wunused-but-set-variable]
cc -include ../platform-linuxx8632.h -c ../bits.c -DLINUX -D_REENTRANT
-DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o bits.o
cc -include ../platform-linuxx8632.h -c ../x86-exceptions.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o x86-exceptions.o
../x86-exceptions.c: In function ‘handle_gc_trap’:
../x86-exceptions.c:192:3: warning: dereferencing type-punned pointer
will break strict-aliasing rules [-Wstrict-aliasing]
../x86-exceptions.c: In function ‘handle_exception’:
../x86-exceptions.c:1155:4: warning: implicit declaration of function
‘handle_watch_trap’ [-Wimplicit-function-declaration]
../x86-exceptions.c: In function ‘signal_handler’:
../x86-exceptions.c:1422:5: warning: format ‘%lx’ expects argument of
type ‘long unsigned int’, but argument 5 has type ‘natural’ [-Wformat]
../x86-exceptions.c:1422:5: warning: format ‘%lx’ expects argument of
type ‘long unsigned int’, but argument 6 has type ‘unsigned int’
[-Wformat]
../x86-exceptions.c:1422:5: warning: format ‘%lx’ expects argument of
type ‘long unsigned int’, but argument 5 has type ‘natural’ [-Wformat]
../x86-exceptions.c:1422:5: warning: format ‘%lx’ expects argument of
type ‘long unsigned int’, but argument 6 has type ‘unsigned int’
[-Wformat]
../x86-exceptions.c: In function ‘pc_luser_xp’:
../x86-exceptions.c:2560:9: warning: dereferencing type-punned pointer
will break strict-aliasing rules [-Wstrict-aliasing]
../x86-exceptions.c:2607:13: warning: dereferencing type-punned
pointer will break strict-aliasing rules [-Wstrict-aliasing]
../x86-exceptions.c: In function ‘handle_exception’:
../x86-exceptions.c:1290:1: warning: control reaches end of non-void
function [-Wreturn-type]
cc -include ../platform-linuxx8632.h -c ../image.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o image.o
../image.c: In function ‘load_openmcl_image’:
../image.c:372:2: warning: implicit declaration of function ‘set_nil’
[-Wimplicit-function-declaration]
../image.c:382:2: warning: implicit declaration of function
‘make_dynamic_heap_executable’ [-Wimplicit-function-declaration]
../image.c:410:9: warning: implicit declaration of function
‘lower_heap_start’ [-Wimplicit-function-declaration]
../image.c:415:2: warning: implicit declaration of function
‘xMakeDataExecutable’ [-Wimplicit-function-declaration]
cc -include ../platform-linuxx8632.h -c ../thread_manager.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o thread_manager.o
../thread_manager.c: In function ‘lock_futex’:
../thread_manager.c:271:5: warning: pointer targets in passing
argument 1 of ‘store_conditional’ differ in signedness
[-Wpointer-sign]
../thread_manager.c:37:1: note: expected ‘natural *’ but argument is
of type ‘signed_natural *’
../thread_manager.c: In function ‘sem_wait_forever’:
../thread_manager.c:445:5: warning: implicit declaration of function
‘gettimeofday’ [-Wimplicit-function-declaration]
../thread_manager.c: In function ‘cooperative_thread_startup’:
../thread_manager.c:1719:1: warning: control reaches end of non-void
function [-Wreturn-type]
../thread_manager.c: In function ‘create_stack’:
../thread_manager.c:1562:1: warning: control reaches end of non-void
function [-Wreturn-type]
cc -include ../platform-linuxx8632.h -c ../lisp-debug.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o lisp-debug.o
../lisp-debug.c: In function ‘debug_get_string_value’:
../lisp-debug.c:808:5: warning: implicit declaration of function
‘__fpurge’ [-Wimplicit-function-declaration]
../lisp-debug.c: In function ‘debug_thread_info’:
../lisp-debug.c:887:5: warning: format ‘%lx’ expects argument of type
‘long unsigned int’, but argument 3 has type ‘struct TCR *’ [-Wformat]
../lisp-debug.c:889:13: warning: format ‘%lx’ expects argument of type
‘long unsigned int’, but argument 3 has type ‘char *’ [-Wformat]
../lisp-debug.c:889:13: warning: format ‘%lx’ expects argument of type
‘long unsigned int’, but argument 4 has type ‘char *’ [-Wformat]
../lisp-debug.c:891:13: warning: format ‘%lx’ expects argument of type
‘long unsigned int’, but argument 3 has type ‘unsigned int’ [-Wformat]
../lisp-debug.c:891:13: warning: format ‘%lx’ expects argument of type
‘long unsigned int’, but argument 4 has type ‘unsigned int’ [-Wformat]
../lisp-debug.c:903:15: warning: format ‘%lx’ expects argument of type
‘long unsigned int’, but argument 3 has type ‘natural’ [-Wformat]
../lisp-debug.c: In function ‘debug_show_fpu’:
../lisp-debug.c:1111:12: warning: unused variable ‘i’ [-Wunused-variable]
../lisp-debug.c:1111:8: warning: unused variable ‘np’ [-Wunused-variable]
../lisp-debug.c:1110:11: warning: unused variable ‘dp’ [-Wunused-variable]
../lisp-debug.c: In function ‘bit_for_regnum’:
../lisp-debug.c:266:1: warning: control reaches end of non-void
function [-Wreturn-type]
cc -include ../platform-linuxx8632.h -c ../memory.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o memory.o
../memory.c: In function ‘allocation_failure’:
../memory.c:45:3: warning: format ‘%ld’ expects argument of type ‘long
int’, but argument 4 has type ‘natural’ [-Wformat]
../memory.c: In function ‘ReserveMemoryForHeap’:
../memory.c:154:3: warning: implicit declaration of function
‘raise_limit’ [-Wimplicit-function-declaration]
cc -include ../platform-linuxx8632.h -c ../unix-calls.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o unix-calls.o
m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-asmutils32.s | as
--32 -o x86-asmutils32.o
m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../imports.s | as  --32 -o imports.o
cc -include ../platform-linuxx8632.h -c ../lispdcmd.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o lispdcmd.o
cc -include ../platform-linuxx8632.h -c ../plprint.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o plprint.o
cc -include ../platform-linuxx8632.h -c ../plsym.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o plsym.o
cc -include ../platform-linuxx8632.h -c ../xlbt.c -DLINUX -D_REENTRANT
-DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o xlbt.o
../xlbt.c: In function ‘walk_stack_frames’:
../xlbt.c:108:9: warning: format ‘%lx’ expects argument of type ‘long
unsigned int’, but argument 3 has type ‘struct lisp_frame *’
[-Wformat]
../xlbt.c:118:7: warning: format ‘%x’ expects argument of type
‘unsigned int’, but argument 3 has type ‘struct lisp_frame *’
[-Wformat]
../xlbt.c:118:7: warning: format ‘%x’ expects argument of type
‘unsigned int’, but argument 4 has type ‘struct lisp_frame *’
[-Wformat]
../xlbt.c: In function ‘plbt_sp’:
../xlbt.c:157:5: warning: format ‘%lx’ expects argument of type ‘long
unsigned int’, but argument 3 has type ‘struct TCR *’ [-Wformat]
../xlbt.c:157:5: warning: format ‘%lx’ expects argument of type ‘long
unsigned int’, but argument 4 has type ‘void *’ [-Wformat]
../xlbt.c:143:19: warning: variable ‘cs_area’ set but not used
[-Wunused-but-set-variable]
cc -include ../platform-linuxx8632.h -c ../x86_print.c -DLINUX
-D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
-DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o x86_print.o
cc  -m32 -g  -Wl,--export-dynamic "-Wl,--hash-style=sysv" -o
../../lx86cl  pad.o x86-spjump32.o x86-spentry32.o x86-subprims32.o
pmcl-kernel.o gc-common.o x86-gc.o bits.o  x86-exceptions.o image.o
thread_manager.o lisp-debug.o memory.o unix-calls.o x86-asmutils32.o
imports.o lispdcmd.o plprint.o plsym.o xlbt.o x86_print.o -ldl -lm
-lpthread
[epearson at archie linuxx8632]$


My originally symptoms whine is below, but may be irrelevant since
this seems to be a rebuild issue.

Erik.




I'm having a really fun time trying track down an overflow error that
keeps cropping up while I'm load testing my web server. It occurs in
the latest 1.7 and 1.8 svn codebase.
The load testing is via siege on the same machine (I know). The
results are variable, but I've only had one test complete without
achieving an overflow (1.2 million page hits with a siege concurrency
level of 50), otherwise I can usually trigger it within a couple
hundred thousand page hits. Higher concurrency results in faster
crash.
It is a threaded server (thread pooling) with the test set up for 10 threads.
The stack overflow is normally not fatal to the server -- the thread
catches the error and resets itself. However, I've also had memory
corruption on in the call to remote-socket-info, with the results
being things like:

value #<BOGUS object @ #x195ABB5E> is not of the expected type
(UNSIGNED-BYTE 32)
value 1328550583 is not of the expected type (UNSIGNED-BYTE 32)
and once
Exception occurred while executing foreign code
 at create_exception_callback_frame + 400
received signal 11; faulting address: 0x6d
address not mapped to object

Then again, maybe the seg fault was thrown because I turned debugging
off (because it was exacerbating the stack overflow).

The stack overflow is during some parsing code that is a bit
recursive, but not terribly deep at the time. And it runs several
hundred thousand times quite fine until one thread throws the error.

The platform is:

  Linux archie 3.2.4-1-ARCH #1 SMP PREEMPT Sat Feb 4 11:21:15 UTC 2012
i686 Intel(R) Atom(TM) CPU D525 @ 1.80GHz GenuineIntel GNU/Linux

One clue is that a freshly checked out 1.7-r14925M was the only lisp
that was able to handle the siege to completion (1.2 million hits).
After the successful run I installed a little harmless patch (which
was not invoked by the test code) and did a (rebuild-ccl :full t), and
managed to only 300K+ page hits before a seg  fault, which was
preceded by several bogus object errors on remote-socket-info.

I quite feeling the deep water at this point, but have tried my best
to research this before hitting up the list.

I've rewritten the server just about from scratch to eliminate old
gnarly code (plenty of that).

Inspected the code for remote-socket-info and friends, wondering if
something allocated and passed out of a function was going out of
scope but only sometimes being trapped on the wrong side of a gc
(feeling the deep water).

Had some theories, most of which were just good learning experiences
(one I haven't resolved yet -- whether a pref of an rletted :signed
object can be returned from a function?)

Now I'm having strange ideas about the recompilation of ccl itself --
perhaps the build from svn has a proclaim optimize that the vanilla
rebuild-ccl usage doesn't? Is it the compiler version installed on my
Arch Linux? gcc (GCC) 4.6.2 20120120 (prerelease). Oh, look, it says
"prerelease". I hadn't noticed bfore. I can test with a precompiled
1.7, but 1.8 trunk doesn't work out of the box for me -- I need to
rebuild to get asdf to be happy.

Any ideas welcome.

Thanks,
Erik.



More information about the Openmcl-devel mailing list