[Openmcl-devel] Problem rebuilding on Linux x8632 w gcc 4.6.2

Erik Pearson erik at adaptations.com
Fri Feb 10 11:01:10 PST 2012


Thanks Gary,

I spent a bit too much time last night, and precious hours of this
morning, installing 64 bit Arch Linux and dev environment, and ran my
little test again, and am happy to report that everything works
swimmingly. On my little, miserly Atom workstation, with the ccl web
server being pounded by siege, I can do all sorts of other things,
like use Chrome to set up my new 64 bit Arch Linode. My quad baby is
at about 90-130% cpu, so plenty of brain power left for other things.

So I think can rule out, on the broad level, the gcc version, although
of course there will be differences between 64 and 32 bit code
generation, conditional pathways, etc.

Thanks for your thoughts, Gary. I'll try tweaks to the gcc settings,
but that may not be for the next couple of daysL  I need to unwind my
personal stack here and get back to the work that this all
interrupted, not to mention the other sides of life like birthday
parties, laundry, and walking dogs.

Erik.




On Fri, Feb 10, 2012 at 7:39 AM, Gary Byers <gb at clozure.com> wrote:
> Something prints as a #<BOGUS OBJECT> because it fails some simple
> consistency checks that the printer applies before printing things.
> Those checks include tests like "if something appears to be in the
> lisp heap, make sure that it's in an allocated part of the heap
> and not in free space", "if it's not in the heap, it had better be
> on a stack", "if its tag bits indicate that the first word at the
> address it's referencing should be a header word which further
> describes the object's type and size, make sure that that header
> word is reasonable" and "if the tag bits say that the first word
> shouldn't be a header, make sure that it isn't.  These tests aren't
> perfect (it's hard to know exactly where another thread's stack's
> bounds are in general, because in general they're changing while
> you look at them), but they catch a lot of things and it's generally
> better to print something as a #<BOGUS OBJECT> than it would be to
> try to print its contents if those tests fail.
>
> There are generally a couple of ways in which something that'd print
> that way can be produced.
>
> 1) By violating a DYNAMIC-EXTENT declaration.
>
> (defun foo ()
>  (let* ((x (cons nil nil)))
>    (declare (dynamic-extent x))
>    x))
>
> A DYNAMIC-EXTENT declaration says that the extent (lifetime) of the
> initial value of the binding to which it applies is coterminous with
> the scope of that binding, so the compiler could stack-allocate the
> value (if it knows how) or otherwise try to exploit that information.
> By returning that value in the code above, we're violating that claim.
> The value returned might print as a BOGUS-OBJECT; it'd generally be
> worse if it didn't (if whatever was later allocated on the stack where
> X used to point happened to look plausible as a CONS cell.)
>
> 2) By introducing some form of memory corruption.
>
> You can do this yourself by doing something incorrect (like storing
> past the end of a vector) in unsafe code; a compiler bug might do this
> for you.  The GC does a lot of reading from and writing to heap memory
> and if it doesn't do this correctly it can introduce memory corruption
> (and in some cases can introduce a lot of it before anything notices
> and starts displaying visible symptoms, so this can be very difficult
> to track down.)
>
> The GC's about the only part of CCL that's written in C and where we
> spend much time; since we generally want you to spend as little time
> there as we can get away with, it (and the rest of the kernel) are
> compiled with -O2 to put a fairly high level of optimization in effect.
> This actually makes a difference: without that, the GC can be measurably
> slower.  If -O2 makes it faster, it might be reasonable to assume that
> higher optimization levels might make it faster still, but those higher
> optimization levels can make code even harder to debug (in GDB) than -O2
> does and they -may- in some cases introduce bugs (some rarely-exercised
> optimizations interfere in some unpredictable way) and -O2 seems to be
> a reasonable compromise.  (Again, this pretty much only affects GC time
> and - depending on what Lisp code is doing - GC time is usually going
> to be a relatively small percentage of total execution time; we're just
> talking about making that relatively small percentage a bit smaller.)
>
> If I was dealing with a major new release of GCC, I might (in some
> combination of fore- and hindsight) be less willing to trust the code
> it produces at -O2; if it was a prerelease (and my hindsight was as
> good as it seems to be at the moment) I might be even less likely
> to trust it.
>
> In the lisp kernel Makefile (I assume that that'd be
> ccl/lisp-kernel/linuxx8632/Makefile in your case), you'll see (maybe a
> screenful or so from the top) a line that says:
>
> COPT = -O2
>
> If you change that to
>
> COPT = # -O2
>
> and then do:
>
> $ cd ccl/lisp-kernel/linuxx8632
> $ make clean
> $ make
>
> you'll get a new ccl/lx86cl where the C code is less aggessively optimized
> (and where the GC might be a little slower as a result); whether or not
> your tests fail in that kernel will tell us a lot about the problem.
>
> I should say (just for the record) that if the presence or absence of
> -O2 seems to be the determining factor, that doesn't necessarily mean
> that the bug's in GCC 4.6; it's entirely possible that there's some
> code in CCL's GC that depends on implementation artifacts (like
> evaluation order) in some way, and GCC 4.6 may just do things in a
> different way than earlier versions did (and some CCL code may be
> implicitly expecting the old behavior.)  There's been a few times over
> the years where GCC optimization levels seemed to affect semantics,
> and I think that in all (or very nearly all) of those cases the
> problem was in our code and not the C compiler.  That said, most Linux
> distributions and other OSes tend to ship with more nominally mature C
> compilers, and the fact that the one that you're using identifies itself
> as a "prerelease" version suggests that outright bugs at -O2 may not
> be totally unexpected.
>
>
>
>
> On Thu, 9 Feb 2012, Erik Pearson wrote:
>
>> HI Fellow CCLispers,
>>
>> To cut to the chase, it looks like rebuilding CCL 1.7 in my specific
>> environment creates a bad boy:
>>
>> The platform is:
>>
>>  Linux archie 3.2.4-1-ARCH #1 SMP PREEMPT Sat Feb 4 11:21:15 UTC 2012
>> i686 Intel(R) Atom(TM) CPU D525 @ 1.80GHz GenuineIntel GNU/Linux
>>
>> I'm using the latest CCL 1.7 from SVN: 1.7-r14925M
>>
>> My current gcc is: gcc (GCC) 4.6.2 20120120 (prerelease)
>>
>> What happens is that after a (rebuild :full t), I get the symptoms
>> described (way) below -- which, to summarize, is predominantly value
>> stack overflow and bogus object in a call to remote-socket-info.
>>
>> The testing that allowed me to come to this conclusion is that I do
>> not encounter this error at all when using the binary from svn. The
>> binary from svn has survived two tests of over 1million page hits with
>> 50 concurrent siege requests (which does not result in 50 concurrent
>> threads -- there are 10, which reach a concurrency in terms of thread
>> execution somewhat below that.) The rebuilt 1.7 and 1.8 fail after
>> several thousand to several hundred thousand.
>>
>> One little problem is that I need to rebuild at least one time because
>> I have a little patch to with-encoded-cstrs (a new version called
>> with-encoded-strings that also returns the length) that I need to have
>> available for production.
>>
>> I compiled the lisp kernel for Linux x86 32 with all warnings, but it
>> has been a long time since I've used gcc ...
>>
>> [epearson at archie linuxx8632]$ vi Makefile
>> [epearson at archie linuxx8632]$ make
>> m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../pad.s | as  --32 -o pad.o
>> m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-spjump32.s | as  --32
>> -o x86-spjump32.o
>> m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-spentry32.s | as
>> --32 -o x86-spentry32.o
>> m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-subprims32.s | as
>> --32 -o x86-subprims32.o
>> cc -include ../platform-linuxx8632.h -c ../pmcl-kernel.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o pmcl-kernel.o
>> ../pmcl-kernel.c: In function ?allocate_lisp_stack?:
>> ../pmcl-kernel.c:232:11: warning: variable ?overhead? set but not used
>> [-Wunused-but-set-variable]
>> ../pmcl-kernel.c: In function ?create_reserved_area?:
>> ../pmcl-kernel.c:533:7: warning: variable ?h? set but not used
>> [-Wunused-but-set-variable]
>> ../pmcl-kernel.c: In function ?process_options?:
>> ../pmcl-kernel.c:1171:19: warning: variable ?wval? set but not used
>>
>> [-Wunused-but-set-variable]
>> cc -include ../platform-linuxx8632.h -c ../gc-common.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o gc-common.o
>> ../gc-common.c: In function ?gc?:
>> ../gc-common.c:1779:11: warning: format ?%d? expects argument of type
>> ?int?, but argument 4 has type ?__time_t? [-Wformat]
>> ../gc-common.c:1779:11: warning: format ?%d? expects argument of type
>> ?int?, but argument 5 has type ?__suseconds_t? [-Wformat]
>> ../gc-common.c:1784:19: warning: format ?%d? expects argument of type
>> ?int?, but argument 5 has type ?__time_t? [-Wformat]
>> ../gc-common.c:1784:19: warning: format ?%d? expects argument of type
>> ?int?, but argument 6 has type ?__suseconds_t? [-Wformat]
>> ../gc-common.c: In function ?reaphashv?:
>> ../gc-common.c:265:14: warning: ?bitsp? may be used uninitialized in
>> this function [-Wuninitialized]
>> ../gc-common.c:260:34: warning: ?bitidx? may be used uninitialized in
>> this function [-Wuninitialized]
>> ../gc-common.c:259:13: warning: ?bits? may be used uninitialized in
>> this function [-Wuninitialized]
>> ../gc-common.c: In function ?mark_weak_hash_vector?:
>> ../gc-common.c:409:14: warning: ?bitsp? may be used uninitialized in
>> this function [-Wuninitialized]
>> ../gc-common.c:404:34: warning: ?bitidx? may be used uninitialized in
>> this function [-Wuninitialized]
>> ../gc-common.c:403:13: warning: ?bits? may be used uninitialized in
>>
>> this function [-Wuninitialized]
>> cc -include ../platform-linuxx8632.h -c ../x86-gc.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o x86-gc.o
>> ../x86-gc.c: In function ?check_all_areas?:
>> ../x86-gc.c:340:5: warning: enumeration value ?AREA_VOID? not handled
>> in switch [-Wswitch]
>> ../x86-gc.c:340:5: warning: enumeration value ?AREA_CSTACK? not
>> handled in switch [-Wswitch]
>> ../x86-gc.c:340:5: warning: enumeration value ?AREA_READONLY? not
>> handled in switch [-Wswitch]
>> ../x86-gc.c:340:5: warning: enumeration value ?AREA_STATIC_CONS? not
>> handled in switch [-Wswitch]
>> ../x86-gc.c: In function ?check_refmap_consistency?:
>> ../x86-gc.c:1161:31: warning: variable ?prev? set but not used
>> [-Wunused-but-set-variable]
>> ../x86-gc.c: In function ?compact_dynamic_heap?:
>> ../x86-gc.c:2020:89: warning: variable ?prev? set but not used
>> [-Wunused-but-set-variable]
>> ../x86-gc.c: In function ?update_managed_refs?:
>> ../x86-gc.c:2754:21: warning: variable ?prev? set but not used
>> [-Wunused-but-set-variable]
>> ../x86-gc.c: In function ?purify?:
>> ../x86-gc.c:2827:11: warning: variable ?new_pure_start? set but not
>>
>> used [-Wunused-but-set-variable]
>> cc -include ../platform-linuxx8632.h -c ../bits.c -DLINUX -D_REENTRANT
>> -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o bits.o
>> cc -include ../platform-linuxx8632.h -c ../x86-exceptions.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o x86-exceptions.o
>> ../x86-exceptions.c: In function ?handle_gc_trap?:
>>
>> ../x86-exceptions.c:192:3: warning: dereferencing type-punned pointer
>> will break strict-aliasing rules [-Wstrict-aliasing]
>> ../x86-exceptions.c: In function ?handle_exception?:
>>
>> ../x86-exceptions.c:1155:4: warning: implicit declaration of function
>> ?handle_watch_trap? [-Wimplicit-function-declaration]
>> ../x86-exceptions.c: In function ?signal_handler?:
>> ../x86-exceptions.c:1422:5: warning: format ?%lx? expects argument of
>> type ?long unsigned int?, but argument 5 has type ?natural? [-Wformat]
>> ../x86-exceptions.c:1422:5: warning: format ?%lx? expects argument of
>> type ?long unsigned int?, but argument 6 has type ?unsigned int?
>> [-Wformat]
>> ../x86-exceptions.c:1422:5: warning: format ?%lx? expects argument of
>> type ?long unsigned int?, but argument 5 has type ?natural? [-Wformat]
>> ../x86-exceptions.c:1422:5: warning: format ?%lx? expects argument of
>> type ?long unsigned int?, but argument 6 has type ?unsigned int?
>> [-Wformat]
>> ../x86-exceptions.c: In function ?pc_luser_xp?:
>>
>> ../x86-exceptions.c:2560:9: warning: dereferencing type-punned pointer
>> will break strict-aliasing rules [-Wstrict-aliasing]
>> ../x86-exceptions.c:2607:13: warning: dereferencing type-punned
>> pointer will break strict-aliasing rules [-Wstrict-aliasing]
>> ../x86-exceptions.c: In function ?handle_exception?:
>>
>> ../x86-exceptions.c:1290:1: warning: control reaches end of non-void
>> function [-Wreturn-type]
>> cc -include ../platform-linuxx8632.h -c ../image.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o image.o
>> ../image.c: In function ?load_openmcl_image?:
>> ../image.c:372:2: warning: implicit declaration of function ?set_nil?
>>
>> [-Wimplicit-function-declaration]
>> ../image.c:382:2: warning: implicit declaration of function
>> ?make_dynamic_heap_executable? [-Wimplicit-function-declaration]
>>
>> ../image.c:410:9: warning: implicit declaration of function
>> ?lower_heap_start? [-Wimplicit-function-declaration]
>>
>> ../image.c:415:2: warning: implicit declaration of function
>> ?xMakeDataExecutable? [-Wimplicit-function-declaration]
>>
>> cc -include ../platform-linuxx8632.h -c ../thread_manager.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o thread_manager.o
>> ../thread_manager.c: In function ?lock_futex?:
>>
>> ../thread_manager.c:271:5: warning: pointer targets in passing
>> argument 1 of ?store_conditional? differ in signedness
>> [-Wpointer-sign]
>> ../thread_manager.c:37:1: note: expected ?natural *? but argument is
>> of type ?signed_natural *?
>> ../thread_manager.c: In function ?sem_wait_forever?:
>>
>> ../thread_manager.c:445:5: warning: implicit declaration of function
>> ?gettimeofday? [-Wimplicit-function-declaration]
>> ../thread_manager.c: In function ?cooperative_thread_startup?:
>>
>> ../thread_manager.c:1719:1: warning: control reaches end of non-void
>> function [-Wreturn-type]
>> ../thread_manager.c: In function ?create_stack?:
>>
>> ../thread_manager.c:1562:1: warning: control reaches end of non-void
>> function [-Wreturn-type]
>> cc -include ../platform-linuxx8632.h -c ../lisp-debug.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o lisp-debug.o
>> ../lisp-debug.c: In function ?debug_get_string_value?:
>>
>> ../lisp-debug.c:808:5: warning: implicit declaration of function
>> ?__fpurge? [-Wimplicit-function-declaration]
>> ../lisp-debug.c: In function ?debug_thread_info?:
>> ../lisp-debug.c:887:5: warning: format ?%lx? expects argument of type
>> ?long unsigned int?, but argument 3 has type ?struct TCR *? [-Wformat]
>> ../lisp-debug.c:889:13: warning: format ?%lx? expects argument of type
>> ?long unsigned int?, but argument 3 has type ?char *? [-Wformat]
>> ../lisp-debug.c:889:13: warning: format ?%lx? expects argument of type
>> ?long unsigned int?, but argument 4 has type ?char *? [-Wformat]
>> ../lisp-debug.c:891:13: warning: format ?%lx? expects argument of type
>> ?long unsigned int?, but argument 3 has type ?unsigned int? [-Wformat]
>> ../lisp-debug.c:891:13: warning: format ?%lx? expects argument of type
>> ?long unsigned int?, but argument 4 has type ?unsigned int? [-Wformat]
>> ../lisp-debug.c:903:15: warning: format ?%lx? expects argument of type
>> ?long unsigned int?, but argument 3 has type ?natural? [-Wformat]
>> ../lisp-debug.c: In function ?debug_show_fpu?:
>> ../lisp-debug.c:1111:12: warning: unused variable ?i? [-Wunused-variable]
>> ../lisp-debug.c:1111:8: warning: unused variable ?np? [-Wunused-variable]
>> ../lisp-debug.c:1110:11: warning: unused variable ?dp? [-Wunused-variable]
>> ../lisp-debug.c: In function ?bit_for_regnum?:
>>
>> ../lisp-debug.c:266:1: warning: control reaches end of non-void
>> function [-Wreturn-type]
>> cc -include ../platform-linuxx8632.h -c ../memory.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o memory.o
>> ../memory.c: In function ?allocation_failure?:
>> ../memory.c:45:3: warning: format ?%ld? expects argument of type ?long
>> int?, but argument 4 has type ?natural? [-Wformat]
>> ../memory.c: In function ?ReserveMemoryForHeap?:
>>
>> ../memory.c:154:3: warning: implicit declaration of function
>> ?raise_limit? [-Wimplicit-function-declaration]
>>
>> cc -include ../platform-linuxx8632.h -c ../unix-calls.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o unix-calls.o
>> m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../x86-asmutils32.s | as
>> --32 -o x86-asmutils32.o
>> m4 -DLINUX -DX86 -DX8632 -DHAVE_TLS -I../ ../imports.s | as  --32 -o
>> imports.o
>> cc -include ../platform-linuxx8632.h -c ../lispdcmd.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o lispdcmd.o
>> cc -include ../platform-linuxx8632.h -c ../plprint.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o plprint.o
>> cc -include ../platform-linuxx8632.h -c ../plsym.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o plsym.o
>> cc -include ../platform-linuxx8632.h -c ../xlbt.c -DLINUX -D_REENTRANT
>> -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o xlbt.o
>> ../xlbt.c: In function ?walk_stack_frames?:
>> ../xlbt.c:108:9: warning: format ?%lx? expects argument of type ?long
>> unsigned int?, but argument 3 has type ?struct lisp_frame *?
>> [-Wformat]
>> ../xlbt.c:118:7: warning: format ?%x? expects argument of type
>> ?unsigned int?, but argument 3 has type ?struct lisp_frame *?
>> [-Wformat]
>> ../xlbt.c:118:7: warning: format ?%x? expects argument of type
>> ?unsigned int?, but argument 4 has type ?struct lisp_frame *?
>> [-Wformat]
>> ../xlbt.c: In function ?plbt_sp?:
>> ../xlbt.c:157:5: warning: format ?%lx? expects argument of type ?long
>> unsigned int?, but argument 3 has type ?struct TCR *? [-Wformat]
>> ../xlbt.c:157:5: warning: format ?%lx? expects argument of type ?long
>> unsigned int?, but argument 4 has type ?void *? [-Wformat]
>> ../xlbt.c:143:19: warning: variable ?cs_area? set but not used
>>
>> [-Wunused-but-set-variable]
>> cc -include ../platform-linuxx8632.h -c ../x86_print.c -DLINUX
>> -D_REENTRANT -DX86 -DX8632 -D_GNU_SOURCE -DHAVE_TLS -DUSE_FUTEX
>> -DSVN_REVISION=15201M  -g -O2 -Wall -m32 -o x86_print.o
>> cc  -m32 -g  -Wl,--export-dynamic "-Wl,--hash-style=sysv" -o
>> ../../lx86cl  pad.o x86-spjump32.o x86-spentry32.o x86-subprims32.o
>> pmcl-kernel.o gc-common.o x86-gc.o bits.o  x86-exceptions.o image.o
>> thread_manager.o lisp-debug.o memory.o unix-calls.o x86-asmutils32.o
>> imports.o lispdcmd.o plprint.o plsym.o xlbt.o x86_print.o -ldl -lm
>> -lpthread
>> [epearson at archie linuxx8632]$
>>
>>
>> My originally symptoms whine is below, but may be irrelevant since
>> this seems to be a rebuild issue.
>>
>> Erik.
>>
>>
>>
>>
>> I'm having a really fun time trying track down an overflow error that
>> keeps cropping up while I'm load testing my web server. It occurs in
>> the latest 1.7 and 1.8 svn codebase.
>> The load testing is via siege on the same machine (I know). The
>> results are variable, but I've only had one test complete without
>> achieving an overflow (1.2 million page hits with a siege concurrency
>> level of 50), otherwise I can usually trigger it within a couple
>> hundred thousand page hits. Higher concurrency results in faster
>> crash.
>> It is a threaded server (thread pooling) with the test set up for 10
>> threads.
>> The stack overflow is normally not fatal to the server -- the thread
>> catches the error and resets itself. However, I've also had memory
>> corruption on in the call to remote-socket-info, with the results
>> being things like:
>>
>> value #<BOGUS object @ #x195ABB5E> is not of the expected type
>> (UNSIGNED-BYTE 32)
>> value 1328550583 is not of the expected type (UNSIGNED-BYTE 32)
>> and once
>> Exception occurred while executing foreign code
>> at create_exception_callback_frame + 400
>> received signal 11; faulting address: 0x6d
>> address not mapped to object
>>
>> Then again, maybe the seg fault was thrown because I turned debugging
>> off (because it was exacerbating the stack overflow).
>>
>> The stack overflow is during some parsing code that is a bit
>> recursive, but not terribly deep at the time. And it runs several
>> hundred thousand times quite fine until one thread throws the error.
>>
>> The platform is:
>>
>>  Linux archie 3.2.4-1-ARCH #1 SMP PREEMPT Sat Feb 4 11:21:15 UTC 2012
>> i686 Intel(R) Atom(TM) CPU D525 @ 1.80GHz GenuineIntel GNU/Linux
>>
>> One clue is that a freshly checked out 1.7-r14925M was the only lisp
>> that was able to handle the siege to completion (1.2 million hits).
>> After the successful run I installed a little harmless patch (which
>> was not invoked by the test code) and did a (rebuild-ccl :full t), and
>> managed to only 300K+ page hits before a seg  fault, which was
>> preceded by several bogus object errors on remote-socket-info.
>>
>> I quite feeling the deep water at this point, but have tried my best
>> to research this before hitting up the list.
>>
>> I've rewritten the server just about from scratch to eliminate old
>> gnarly code (plenty of that).
>>
>> Inspected the code for remote-socket-info and friends, wondering if
>> something allocated and passed out of a function was going out of
>> scope but only sometimes being trapped on the wrong side of a gc
>> (feeling the deep water).
>>
>> Had some theories, most of which were just good learning experiences
>> (one I haven't resolved yet -- whether a pref of an rletted :signed
>> object can be returned from a function?)
>>
>> Now I'm having strange ideas about the recompilation of ccl itself --
>> perhaps the build from svn has a proclaim optimize that the vanilla
>> rebuild-ccl usage doesn't? Is it the compiler version installed on my
>> Arch Linux? gcc (GCC) 4.6.2 20120120 (prerelease). Oh, look, it says
>> "prerelease". I hadn't noticed bfore. I can test with a precompiled
>> 1.7, but 1.8 trunk doesn't work out of the box for me -- I need to
>> rebuild to get asdf to be happy.
>>
>> Any ideas welcome.
>>
>> Thanks,
>> Erik.
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>>
>>
>



-- 

--
Erik Pearson
Adaptations



More information about the Openmcl-devel mailing list