[Openmcl-devel] threads & crashes
Gary Byers
gb at clozure.com
Sat Jun 11 17:11:52 PDT 2005
On Sun, 12 Jun 2005, Ralf Stoye wrote:
> Hi,
> i have the following problem with Openmcl (tried in 0.14.3 and 14.2-p1) on OS
> X 10.3.8 / G5 Dual 2GHz
> i experience crashes using some packages utilising multithreading
> (one of these is portable aserve, another one a simple client-server system
> we use inhouse)
> reducing it to the basics, it seems that openmcl crashes when some threads
> producing garbage (e.g. short-lived lists)
> are running at the same time
> here are some examples to reproduce this behaviour,
> i would be very happy if anybody could comment on this
> Thanks!
>
> Ralf
>
The good news is that I think that this is fixed in the current
development sources; I was able to do:
(dotimes (k 1000)
  (process-run-function '(:name "hello" :priority -1)
                        #'make-garbage 1000)
  (sleep 0.01))
without experiencing any problems.
The bad news is that I think that there are 3 or 4 possible changes
that could have fixed this. The most likely culprit (and the simplest
change) is in the following code, in the function "suspend_tcr()" in
"ccl:lisp-kernel;thread-manager.c":
Boolean
suspend_tcr(TCR *tcr)
{
  int suspend_count = atomic_incf(&(tcr->suspend_count));
  if (suspend_count == 1) {
#ifdef DARWIN
    if (mach_suspend_tcr(tcr)) {
      tcr->flags |= TCR_FLAG_BIT_ALT_SUSPEND;
      return true;
    }
#endif
    if (pthread_kill((pthread_t)ptr_from_lispobj(tcr->osid), thread_suspend_signal) == 0) {
It seems to be the case that if a thread has a pending exception when
it's suspended via mach_suspend_tcr(), Mach will send the same exception
message twice. (That's bad.) That can cause the symptom that you reported
and a few others.
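As a rough illustration of the control flow in that snippet (not the kernel code itself), here is a small self-contained sketch. Everything prefixed with toy_ or TOY_ is hypothetical: the struct stands in for a TCR, the two helpers only count which path was taken instead of suspending anything, and the return value for an already-suspended thread is my assumption, since the snippet above is cut off before that point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCR; only the fields this sketch needs. */
typedef struct {
    atomic_int suspend_count;
    unsigned flags;
    int mach_suspends;    /* how often the Mach-style path was taken */
    int signal_suspends;  /* how often the signal-style path was taken */
} toy_tcr;

#define TOY_FLAG_ALT_SUSPEND 1u

/* Set to false to skip the first (Mach-style) path entirely. */
static bool toy_use_mach_path = true;

/* Stub: pretends to suspend via the platform-specific mechanism. */
static bool toy_mach_suspend(toy_tcr *t) {
    if (!toy_use_mach_path) return false;
    t->mach_suspends++;
    return true;
}

/* Stub: pretends to suspend by sending a signal to the thread. */
static bool toy_signal_suspend(toy_tcr *t) {
    t->signal_suspends++;
    return true;
}

/* Only the 0 -> 1 transition of the count tries to stop the thread.
   Returning false for nested requests is an assumption of this sketch. */
static bool toy_suspend(toy_tcr *t) {
    assert(t != NULL);
    int count = atomic_fetch_add(&t->suspend_count, 1) + 1;
    if (count == 1) {
        if (toy_mach_suspend(t)) {
            t->flags |= TOY_FLAG_ALT_SUSPEND;  /* remember which path was used */
            return true;
        }
        return toy_signal_suspend(t);
    }
    return false;
}
```

The point of the flag is that whoever resumes the thread later needs to know which mechanism stopped it; with the first path disabled, the flag is never set and every suspend goes through the signal path.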
I believe that the call to mach_suspend_tcr() was added to 0.14.2-p1
CVS last fall; I'm fairly sure that it wasn't in the released version
of 0.14.2-p1. I'd ordinarily say that this is a fairly likely culprit,
but if you experience this in the released 0.14.2-p1, I'm less confident
of that. (If it's in something built from CVS after 0.14.2-p1 was released,
that's a different story.)
If you conditionalize out the Darwin-specific code above, e.g.:
Boolean
suspend_tcr(TCR *tcr)
{
  int suspend_count = atomic_incf(&(tcr->suspend_count));
  if (suspend_count == 1) {
#if 0
#ifdef DARWIN
    if (mach_suspend_tcr(tcr)) {
      tcr->flags |= TCR_FLAG_BIT_ALT_SUSPEND;
      return true;
    }
#endif
#endif
    if (pthread_kill((pthread_t)ptr_from_lispobj(tcr->osid), thread_suspend_signal) == 0) {
and recompile the kernel, does the problem persist?