[Openmcl-devel] Need advice to debug segfault when running concurrent selects in clsql/postgresql
Paul Meurer
Paul.Meurer at uni.no
Thu Oct 31 02:48:49 PDT 2013
On 31 Oct 2013, at 01:15, Gary Byers <gb at clozure.com> wrote:
> On Wed, 30 Oct 2013, Paul Meurer wrote:
>> I have now run it with --no-init and in the shell, with no difference: immediate failure with :consing in *features*,
>> and bogus objects etc. after several rounds without :consing.
>
> So, I can't rant and rave about the sorry state of 3rd-party CL libraries, and
> anyone reading this won't be subjected to me doing so ?
>
> Oh well.
>
> I was able to reproduce the problem by running your test 100 times,
I am not able to provoke it at all on the MacBook, and I tried a lot.
> so apparently
> I won't be able to blame this on some aspect of your machine. (Also unfortunate,
> since my ability to diagnose problems that only occur on 16-core machines depends
> on my ability to borrow such machines for a few months.)
I think you can do without a 16-core machine. I am able to reproduce the failure quite reliably on an older 4-core machine with Xeon CPUs and SuSE, with slightly different code (perhaps to get the timing right):
(dotimes (j 100)
  (print (ccl::all-processes))
  (dotimes (i 8)
    (process-run-function
     (format nil "getstring-~a-~a" j i)
     (lambda (i)
       (let ((list ()))
         (dotimes (i 500000)
           (push (getstring) list)))
       (print i))
     i))
  (print (list :done j))
  (sleep 1))
If you really need a 16-core machine to debug this I can give you access to mine. :-)
>> My machine has 16 true cores and hyperthreading; I am running CentOS 6.0, and a recent CCL 1.9 (I did svn update +
>> rebuild of everything yesterday).
>> I also observed that the problem goes away when I replace the constant string in the library by a freshly
>> allocated string:
>> char *getstring() {
>>   int index;
>>   char *buffer = (char *)calloc(100 + 1, sizeof(char));
>>   for (index = 0; index < 100; index++) {
>>     buffer[index] = 'a';
>>   }
>>   buffer[100] = '\0';
>>   return buffer;
>> }
>> One should expect the strings in the Postgres library to be freshly allocated, but nevertheless they behave like
>> the constant string example.
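For comparison, the two variants discussed above could look roughly like the sketch below. The constant-string version is not shown in this message, so the name getstring_const and its contents are assumptions made for illustration only; getstring_fresh mirrors the calloc-based version quoted above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant-string variant (an assumption, not the code from
   the test library): every call returns a pointer to the same static storage. */
const char *getstring_const(void) {
    static const char text[] = "aaaaaaaaaaaaaaaaaaaa";
    return text;
}

/* Freshly allocated variant, equivalent to the quoted version:
   each call returns a new 100-character buffer that the caller must free(). */
char *getstring_fresh(void) {
    char *buffer = calloc(100 + 1, sizeof(char)); /* zero-filled, so already terminated */
    if (buffer == NULL) {
        return NULL;
    }
    memset(buffer, 'a', 100);
    return buffer;
}
```

The only behavioural difference on the C side is that the first variant hands out the same address on every call, while the second allocates on every call, which is one plausible way for the two to differ in timing.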
>
> It's unlikely that this change directly avoids the bug (whatever it is); it's more
> likely that it affects timing (exactly what happens when.) I don't yet know what
> the bug is, but I think that it's likely that it's fair to characterize the bug
> as being "timing-sensitive". (For example: from the GC's point of view, whether
> a thread is running Lisp or foreign code when that thread is suspended by the GC.
> The transition between Lisp and foreign code takes a few instructions, and if
> a thread is suspended in the middle of that instruction sequence and the GC
> misinterprets its state, very bad things like what you're seeing could occur.
> That's not supposed to be possible, but something broadly similar seems to be
> happening.)
--
Paul