[Openmcl-devel] Need advice to debug segfault when running concurrent selects in clsql/postgresql

Paul Meurer paul.meurer at uni.no
Tue Oct 29 08:14:22 PDT 2013


Hi,

I need some advice on how to further debug the following.

I am consistently seeing crashes when I run concurrent database selects using clsql against a PostgreSQL backend. I am running the newest ccl-1.9 (64-bit) on CentOS, and the PostgreSQL library advertises itself as thread-safe. Here is the code I am running:

(dotimes (i 16)
  (ccl:process-run-function
   (format nil "test~d" i)
   (lambda (i)
     (with-database (*default-database* *connection-spec* :if-exists :new)
       (select [text] :from [text-table] :limit 10000)
       (print i)))
   i))

This form can be run several times without problems, but eventually I get a segfault. Debugging in gdb suggests that the crash is GC-related (see below). It always happens at the same place in bits.c.

I am aware that this is a complex scenario in which the db lib, uffi/clsql, or Clozure could be the culprit, and boiling it down to a minimal case does not seem trivial. So I would be grateful if somebody could give me some advice on the most promising way of nailing down this bug.

----------

? Unhandled exception 11 at 0x412360, context->regs at #x7f3ea52ed538
Exception occurred while executing foreign code
received signal 11; faulting address: 0x307e3f94d000
invalid permissions for mapped object
…

and in gdb:

(gdb) br *0x0000000000412360
Breakpoint 2 at 0x412360: file ../bits.c, line 45.
(gdb) continue
Continuing.
[Switching to Thread 0x7f3ea52ef700 (LWP 3974)]

Breakpoint 2, set_n_bits (bits=<value optimized out>, 
    first=<value optimized out>, n=<value optimized out>) at ../bits.c:45
45	        *wstart++ = ALL_ONES;
1: x/i $pc
=> 0x412360 <set_n_bits+112>:	movq   $0xffffffffffffffff,(%rax)
(gdb) bt
#0  set_n_bits (bits=<value optimized out>, first=<value optimized out>, 
    n=<value optimized out>) at ../bits.c:45
#1  0x000000000041111c in rmark (n=52914162892765) at ../x86-gc.c:770
#2  0x00000000004116fd in mark_root (n=<value optimized out>) at ../x86-gc.c:516
#3  0x0000000000411b05 in mark_ephemeral_root (n=<value optimized out>)
    at ../x86-gc.c:650
#4  0x000000000040bfa2 in mark_memoized_area (a=0x1e926e0, 
    num_memo_dnodes=10288289) at ../gc-common.c:1473
#5  0x000000000040d9f0 in gc (tcr=<value optimized out>, 
    param=<value optimized out>) at ../gc-common.c:1688
#6  0x0000000000412c9b in gc_from_tcr (tcr=<value optimized out>, 
    param=<value optimized out>) at ../x86-exceptions.c:2924
#7  0x0000000000413358 in gc_like_from_xp (xp=<value optimized out>, 
    fun=0x412c70 <gc_from_tcr>, param=0) at ../x86-exceptions.c:2881
#8  0x000000000041341e in gc_from_xp (xp=<value optimized out>, 
    param=<value optimized out>) at ../x86-exceptions.c:2936
#9  0x0000000000414ad1 in allocate_object (xp=0x7f3ea52ee440, bytes_needed=32, 
    disp_from_allocptr=19, tcr=0x7f3ea52ef570, 
    crossed_threshold=<value optimized out>) at ../x86-exceptions.c:204
#10 0x0000000000414b9d in handle_alloc_trap (xp=0x7f3ea52ee440, 
    tcr=0x7f3ea52ef570, notify=0x7f3ea52ee1cc) at ../x86-exceptions.c:644
#11 0x0000000000415552 in handle_exception (signum=<value optimized out>, 
    info=<value optimized out>, context=0x7f3ea52ee440, 
    tcr=<value optimized out>, old_valence=<value optimized out>)
    at ../x86-exceptions.c:1193
#12 0x00000000004157fa in signal_handler (signum=11, info=0x7f3ea52ee7f0, 
    context=0x7f3ea52ee440) at ../x86-exceptions.c:1466
#13 <signal handler called>
#14 0x0000302000bdca65 in ?? ()
#15 0x0000000000000052 in ?? ()
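
For orientation, the faulting instruction in frame #0 is a whole-word store of the kind a bitmap-fill loop performs. Below is a minimal sketch in C of such a routine. It is not the actual contents of bits.c: the name sketch_set_n_bits, the 64-bit word size and the most-significant-bit-first ordering are assumptions made for illustration. What it shows is that the number of words written depends only on first and n, so if either value came from a damaged object, a loop like this could plausibly run past the end of the mapped bitmap, which would be consistent with the "invalid permissions for mapped object" message.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64u
#define ALL_ONES  (~(uint64_t)0)

/* Hypothetical stand-in for a GC mark-bitmap routine (not CCL's source).
   Sets n consecutive bits starting at bit index `first`, where bit 0 is
   assumed to be the most significant bit of word 0. */
static void sketch_set_n_bits(uint64_t *bits, size_t first, size_t n)
{
    if (n == 0) {
        return;
    }

    size_t last = first + n - 1;
    /* Mask covering `first` through the end of its word. */
    uint64_t leftmask = ALL_ONES >> (first % WORD_BITS);
    /* Mask covering the start of the last word through `last`. */
    uint64_t rightmask = ALL_ONES << (WORD_BITS - 1 - (last % WORD_BITS));
    uint64_t *wstart = bits + first / WORD_BITS;
    uint64_t *wend = bits + last / WORD_BITS;

    if (wstart == wend) {
        /* The whole run lies inside a single word. */
        *wstart |= leftmask & rightmask;
        return;
    }

    *wstart++ |= leftmask;
    while (wstart < wend) {
        /* Whole-word store, analogous to the line shown in frame #0.
           Nothing here checks that wstart is still inside the bitmap. */
        *wstart++ = ALL_ONES;
    }
    *wend |= rightmask;
}
```

With a three-word bitmap, sketch_set_n_bits(bits, 60, 70) sets the low four bits of word 0, all of word 1, and the top two bits of word 2. A much larger n would keep the loop writing well beyond word 2.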

-- 
Best wishes,
Paul



