[Openmcl-devel] Need advice to debug segfault when running concurrent selects in clsql/postgresql

Tue Oct 29 09:56:29 PDT 2013

On 29 Oct 2013, at 16:18, Ralf Mattes <rm at seid-online.de> wrote:

> On Tue, Oct 29, 2013 at 04:14:22PM +0100, Paul Meurer wrote:
>> Hi,
>> 
>> I need some advice on how to further debug the following.
>> 
>> I am consistently observing crashes when I run concurrent database selects using clsql against a PostgreSQL backend. I am running the newest ccl-1.9 (64-bit) on CentOS, and the PostgreSQL library advertises itself as thread-safe. Here is the code I am running:
>> 
>> (dotimes (i 16)
>>  (ccl:process-run-function
>>   (format nil "test~d" i)
>>   (lambda (i)
>>     (with-database (*default-database* *connection-spec* :if-exists :new)
>>       (select [text] :from [text-table] :limit 10000)
>>       (print i)))
>>   i))
>> 
>> This form can be run several times without problems, but eventually I get a segfault. I tried to debug in gdb, where I see that the crash seems to be GC-related (see below). The crash always happens at the same place in bits.c.
>> 
>> I am aware that this is a complex scenario, where either the db lib, or uffi/clsql, or clozure could be the culprit, and it does not seem to be trivial to boil this down to a minimal case. So I would be grateful if somebody could give me some advice as to what would be the most promising way of nailing down this bug.
> 
> I'm afraid it's none of the above - what makes you think that you can run simultaneous queries over
> the same shared connection?

Nothing.

> From the fine docs [http://www.postgresql.org/docs/9.3/static/ecpg-connect.html]:
> 
> " If your application uses multiple threads of execution, they cannot share a connection 
>  concurrently. You must either explicitly control access to the connection (using mutexes) 
>  or use a connection for each thread."
> 
> You need to set up a connection per thread of execution (or use one from a pool, but then
> you need to protect shared access to the pool with a mutex).

Of course I am using a dedicated connection per thread. That is what passing :if-exists :new to the with-database macro does. A connection pool abstraction (which is what I use in my application) has the same effect. (Using one connection for all threads immediately gives you statement-out-of-order errors from the database.)
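
For illustration, here is a minimal sketch of the one-connection-per-thread test, assuming Clozure CL. The helper name run-workers is hypothetical (it is not part of CCL or CLSQL); it only starts the threads and waits for all of them to finish, so that repeated runs do not overlap. The commented usage at the end repeats the with-database call from the original test and assumes CLSQL is loaded with its bracketed SQL syntax enabled and *connection-spec* defined as there.

```lisp
;; Sketch only. RUN-WORKERS is a hypothetical helper name.
;; It uses Clozure CL's own thread and semaphore primitives.
(defun run-workers (n fn)
  "Call FN with 0..N-1, each call in its own CCL thread.
Blocks until every thread has finished; returns the results in index order."
  (let ((done (ccl:make-semaphore))
        (results (make-array n :initial-element nil)))
    (dotimes (i n)
      (ccl:process-run-function
       (format nil "test~d" i)
       (lambda (i)
         (unwind-protect
              ;; Each thread writes only to its own slot.
              (setf (aref results i) (funcall fn i))
           ;; Signal even if FN unwinds, so the waiter is not left hanging.
           (ccl:signal-semaphore done)))
       i))
    ;; Wait once per started thread.
    (dotimes (i n)
      (ccl:wait-on-semaphore done))
    (coerce results 'list)))

;; Usage with the test from the original post (assumption: CLSQL is loaded,
;; bracketed SQL syntax is enabled, and *connection-spec* is defined).
;; :if-exists :new asks for a fresh connection in each thread.
;;
;; (run-workers 16
;;   (lambda (i)
;;     (with-database (*default-database* *connection-spec* :if-exists :new)
;;       (select [text] :from [text-table] :limit 10000)
;;       i)))
```

Because the caller blocks until all threads have signalled, the form can be run in a loop to reproduce the crash without threads from one run overlapping the next.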

Concurrent execution otherwise works fine in my app, where I use it all over the place, but there the selects come in more sporadically. The error seems to show up only when many threads are fetching heavily at the same time.

> HTH Ralf Mattes

>> ----------
>> 
>> ? Unhandled exception 11 at 0x412360, context->regs at #x7f3ea52ed538
>> Exception occurred while executing foreign code
>> received signal 11; faulting address: 0x307e3f94d000
>> invalid permissions for mapped object
>> ?
>> 
>> and in gdb:
>> 
>> (gdb) br *0x0000000000412360
>> Breakpoint 2 at 0x412360: file ../bits.c, line 45.
>> (gdb) continue
>> Continuing.
>> [Switching to Thread 0x7f3ea52ef700 (LWP 3974)]
>> 
>> Breakpoint 2, set_n_bits (bits=<value optimized out>, 
>>    first=<value optimized out>, n=<value optimized out>) at ../bits.c:45
>> 45	        *wstart++ = ALL_ONES;
>> 1: x/i $pc
>> => 0x412360 <set_n_bits+112>:	movq   $0xffffffffffffffff,(%rax)
>> (gdb) bt
>> #0  set_n_bits (bits=<value optimized out>, first=<value optimized out>, 
>>    n=<value optimized out>) at ../bits.c:45
>> #1  0x000000000041111c in rmark (n=52914162892765) at ../x86-gc.c:770
>> #2  0x00000000004116fd in mark_root (n=<value optimized out>) at ../x86-gc.c:516
>> #3  0x0000000000411b05 in mark_ephemeral_root (n=<value optimized out>)
>>    at ../x86-gc.c:650
>> #4  0x000000000040bfa2 in mark_memoized_area (a=0x1e926e0, 
>>    num_memo_dnodes=10288289) at ../gc-common.c:1473
>> #5  0x000000000040d9f0 in gc (tcr=<value optimized out>, 
>>    param=<value optimized out>) at ../gc-common.c:1688
>> #6  0x0000000000412c9b in gc_from_tcr (tcr=<value optimized out>, 
>>    param=<value optimized out>) at ../x86-exceptions.c:2924
>> #7  0x0000000000413358 in gc_like_from_xp (xp=<value optimized out>, 
>>    fun=0x412c70 <gc_from_tcr>, param=0) at ../x86-exceptions.c:2881
>> #8  0x000000000041341e in gc_from_xp (xp=<value optimized out>, 
>>    param=<value optimized out>) at ../x86-exceptions.c:2936
>> #9  0x0000000000414ad1 in allocate_object (xp=0x7f3ea52ee440, bytes_needed=32, 
>>    disp_from_allocptr=19, tcr=0x7f3ea52ef570, 
>>    crossed_threshold=<value optimized out>) at ../x86-exceptions.c:204
>> #10 0x0000000000414b9d in handle_alloc_trap (xp=0x7f3ea52ee440, 
>>    tcr=0x7f3ea52ef570, notify=0x7f3ea52ee1cc) at ../x86-exceptions.c:644
>> #11 0x0000000000415552 in handle_exception (signum=<value optimized out>, 
>>    info=<value optimized out>, context=0x7f3ea52ee440, 
>>    tcr=<value optimized out>, old_valence=<value optimized out>)
>>    at ../x86-exceptions.c:1193
>> #12 0x00000000004157fa in signal_handler (signum=11, info=0x7f3ea52ee7f0, 
>>    context=0x7f3ea52ee440) at ../x86-exceptions.c:1466
>> #13 <signal handler called>
>> #14 0x0000302000bdca65 in ?? ()
>> #15 0x0000000000000052 in ?? ()
>> 
>> -- 
>> Best wishes,
>> Paul

-- 
Paul
