[Openmcl-devel] Trace/BPT trap in 1.0

Sat Dec 3 17:39:40 PST 2005

I think that I've found the problem, and the executive summary is "Doh!".
There are some constants defined in ccl/lisp-kernel/constants.h

/* (fixnumshift = 2 on PPC32) */
#define TCR_FLAG_BIT_FOREIGN fixnumshift
#define TCR_FLAG_BIT_AWAITING_PRESET (fixnumshift+1)
#define TCR_FLAG_BIT_ALT_SUSPEND (fixnumshift+2)
#define TCR_FLAG_BIT_PROPAGATE_EXCEPTION (fixnumshift+2)

There are two problems:

1) these are supposed to be bit indices, not masks, but a few places in
the kernel sources use them inconsistently (sometimes as indices, sometimes as masks).

2) the last two constants (TCR_FLAG_BIT_ALT_SUSPEND and
TCR_FLAG_BIT_PROPAGATE_EXCEPTION) should have distinct values.
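A minimal sketch of these definitions with both problems addressed. It assumes fixnumshift is 2 (as on PPC32) and assumes TCR_FLAG_BIT_PROPAGATE_EXCEPTION is simply moved to the next free index, fixnumshift+3. The TCR_FLAG_MASK macro and the compile-time overlap check are hypothetical additions, not part of the existing kernel sources. Because every mask is a single power of two, the OR of the masks equals their sum only when no two indices collide, so the (C11) _Static_assert would have rejected the original definitions, where ALT_SUSPEND and PROPAGATE_EXCEPTION share an index.

```c
#include <assert.h>

/* Assumption: fixnumshift is 2, as the constants.h comment says for PPC32. */
#define fixnumshift 2

/* Bit indices: each flag gets its own bit number.  The index chosen for
   TCR_FLAG_BIT_PROPAGATE_EXCEPTION (fixnumshift+3) is an assumption;
   any index not used by another flag would do. */
#define TCR_FLAG_BIT_FOREIGN             fixnumshift
#define TCR_FLAG_BIT_AWAITING_PRESET     (fixnumshift+1)
#define TCR_FLAG_BIT_ALT_SUSPEND         (fixnumshift+2)
#define TCR_FLAG_BIT_PROPAGATE_EXCEPTION (fixnumshift+3)

/* Hypothetical helper: the one place where a bit index becomes a mask. */
#define TCR_FLAG_MASK(bit) (1u << (bit))

/* OR and sum of all single-bit masks; they agree only if no bits overlap. */
#define TCR_FLAG_ALL_OR  (TCR_FLAG_MASK(TCR_FLAG_BIT_FOREIGN) |          \
                          TCR_FLAG_MASK(TCR_FLAG_BIT_AWAITING_PRESET) |  \
                          TCR_FLAG_MASK(TCR_FLAG_BIT_ALT_SUSPEND) |      \
                          TCR_FLAG_MASK(TCR_FLAG_BIT_PROPAGATE_EXCEPTION))
#define TCR_FLAG_ALL_SUM (TCR_FLAG_MASK(TCR_FLAG_BIT_FOREIGN) +          \
                          TCR_FLAG_MASK(TCR_FLAG_BIT_AWAITING_PRESET) +  \
                          TCR_FLAG_MASK(TCR_FLAG_BIT_ALT_SUSPEND) +      \
                          TCR_FLAG_MASK(TCR_FLAG_BIT_PROPAGATE_EXCEPTION))

/* Fails to compile if two TCR flags are ever given the same index. */
_Static_assert(TCR_FLAG_ALL_OR == TCR_FLAG_ALL_SUM,
               "two TCR flag bit indices collide");
```

With these values the masks are 4, 8, 16 and 32, so the foreign-thread bit and the propagate-exception bit can no longer be confused with each other.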

Mostly because of the first problem, the lisp kernel function that
handles per-thread exceptions gets confused: the foreign thread
has (1<<TCR_FLAG_BIT_FOREIGN) = (1<<2) = 4 set in the "flags" field
of its thread context record (tcr), and the exception handling
function does:

   if (tcr->flags & TCR_FLAG_BIT_PROPAGATE_EXCEPTION) {
     tcr->flags &= ~TCR_FLAG_BIT_PROPAGATE_EXCEPTION;
     return 17;  /* return a non-zero value */
   }

but it should be doing something like:

   if (tcr->flags & (1<<TCR_FLAG_BIT_PROPAGATE_EXCEPTION)) {
     tcr->flags &= ~(1<<TCR_FLAG_BIT_PROPAGATE_EXCEPTION);
     return 17;  /* return a non-zero value */
   }

It just so happens that the value of TCR_FLAG_BIT_PROPAGATE_EXCEPTION
(fixnumshift+2 = 4) is equal to the value of (1<<TCR_FLAG_BIT_FOREIGN).
Wackiness ensues: the exception handler sees a foreign thread, misinterprets
its FOREIGN bit as a request to propagate the exception to the next handler
(GDB if it's running, otherwise probably nothing ...), and adds insult to
injury by clearing the TCR_FLAG_BIT_FOREIGN bit ...
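To make the collision concrete, here is a small sketch that reproduces the two versions of the handler's test quoted above. It assumes fixnumshift is 2 and models the thread context record as a hypothetical struct with only a flags field; it exercises just the index/mask confusion (problem 1), so TCR_FLAG_BIT_ALT_SUSPEND is left out.

```c
#include <assert.h>

/* The constants as quoted above, assuming fixnumshift == 2 (PPC32). */
#define fixnumshift 2
#define TCR_FLAG_BIT_FOREIGN             fixnumshift
#define TCR_FLAG_BIT_PROPAGATE_EXCEPTION (fixnumshift+2)

/* Hypothetical stand-in for the tcr; only the flags field matters here. */
typedef struct { unsigned flags; } tcr_t;

/* Buggy version: uses the bit index (4) as if it were a mask, so it
   actually tests and clears bit 2, the FOREIGN bit. */
int buggy_should_propagate(tcr_t *tcr) {
  if (tcr->flags & TCR_FLAG_BIT_PROPAGATE_EXCEPTION) {
    tcr->flags &= ~TCR_FLAG_BIT_PROPAGATE_EXCEPTION;
    return 17;  /* non-zero: propagate to the next handler */
  }
  return 0;
}

/* Corrected version: converts the index to a mask before testing. */
int fixed_should_propagate(tcr_t *tcr) {
  if (tcr->flags & (1u << TCR_FLAG_BIT_PROPAGATE_EXCEPTION)) {
    tcr->flags &= ~(1u << TCR_FLAG_BIT_PROPAGATE_EXCEPTION);
    return 17;  /* non-zero: propagate to the next handler */
  }
  return 0;
}
```

For a foreign thread whose flags are just (1<<TCR_FLAG_BIT_FOREIGN), buggy_should_propagate() returns 17 and leaves flags at 0, while fixed_should_propagate() returns 0 and leaves the FOREIGN bit set.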

TCR_FLAG_BIT_PROPAGATE_EXCEPTION was added in 1.0; the value is wrong
(conflicts with TCR_FLAG_BIT_ALT_SUSPEND), it's tested for incorrectly
in the lisp exception handler (catch_exception_raise()), and it's
set incorrectly in response to the kernel debugger's (P)ropagate Exception
command.
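One way to keep the setting and testing sites consistent is to route them all through helpers that take a bit index and do the shift in exactly one place. The sketch below assumes fixnumshift is 2 and gives TCR_FLAG_BIT_PROPAGATE_EXCEPTION its own index (fixnumshift+3); the tcr_t struct and every function name are hypothetical, standing in for the debugger's (P)ropagate Exception command and for the check in catch_exception_raise().

```c
#include <assert.h>

/* Assumptions: fixnumshift == 2, and PROPAGATE_EXCEPTION moved to its own
   index (fixnumshift+3).  Both are illustrative, not the committed fix. */
#define fixnumshift 2
#define TCR_FLAG_BIT_FOREIGN             fixnumshift
#define TCR_FLAG_BIT_PROPAGATE_EXCEPTION (fixnumshift+3)

typedef struct { unsigned flags; } tcr_t;  /* hypothetical minimal tcr */

/* Hypothetical helpers: callers always pass a bit index; only these
   functions turn it into a mask. */
void tcr_set_flag(tcr_t *tcr, int bit) {
  tcr->flags |= (1u << bit);
}

int tcr_test_flag(const tcr_t *tcr, int bit) {
  return (tcr->flags & (1u << bit)) != 0;
}

int tcr_test_and_clear_flag(tcr_t *tcr, int bit) {
  int was_set = tcr_test_flag(tcr, bit);
  tcr->flags &= ~(1u << bit);
  return was_set;
}

/* Stand-in for what the debugger's (P)ropagate Exception command does. */
void debugger_request_propagation(tcr_t *tcr) {
  tcr_set_flag(tcr, TCR_FLAG_BIT_PROPAGATE_EXCEPTION);
}

/* Stand-in for the handler's check; non-zero means "pass the exception
   on to the next handler". */
int handler_should_propagate(tcr_t *tcr) {
  return tcr_test_and_clear_flag(tcr, TCR_FLAG_BIT_PROPAGATE_EXCEPTION) ? 17 : 0;
}
```

Calling debugger_request_propagation() on a foreign thread's tcr and then handler_should_propagate() yields 17 once and clears only the propagate bit; a second call returns 0, and the FOREIGN bit is never touched.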

I don't think that much (if any) lisp code looks at these bits, so it
should be possible to just fix the 3 (or more) places in the kernel
sources that are confused about this and check those fixes into CVS.
Rebuilding the kernel should fix the problem.

On Wed, 30 Nov 2005, todd ingalls wrote:

> Hi, I was wondering if anyone could help me diagnose this problem.
>
> In pre-1.0 versions of openmcl, the code below ran fine. This is a simplified
> version of a high-resolution timer.
>
> In version 1.0 of openmcl, I get a Trace/BPT trap error and dppccl dies.
>
> I have traced the problem down to the call to #_PrimeTimeTask, but can't
> figure out what to do from there. Does anyone have any suggestions about
> what I should do next to determine why this is happening?
>
> thanks
>
> PS. I am running OSX 10.3.9.
>
>
> (ccl::open-shared-library "Carbon.framework/Carbon")
> (ccl::use-interface-dir :carbon)
>
> (progn
>  (defparameter *tmtask* nil)
>  (defparameter *counter* 0)
>  (if *tmtask*
>      (progn
> 	(#_RemoveTimeTask *tmtask*)
> 	(#_DisposeTimerUPP (ccl::pref *tmtask* :<TMT>ask.tm<A>ddr))))
>  (setf *tmtask* (ccl::make-record :<TMT>ask))
>  (setf (ccl::pref *tmtask* :<TMT>ask.tm<W>ake<U>p) 0)
>  (setf (ccl::pref *tmtask* :<TMT>ask.tm<R>eserved) 0)
>
>  (ccl::defcallback time-task-callback (:<TMT>ask<P>tr tmTaskPtr)
>    (if (> (incf *counter*) 100)
> 	(progn
> 	  (#_RemoveTimeTask tmTaskPtr)
> 	  (#_DisposeTimerUPP (ccl::pref tmTaskPtr :<TMT>ask.tm<A>ddr)))
>      (#_PrimeTime tmTaskPtr 1)))
>
>  (setf (ccl::pref *tmtask* :<TMT>ask.tm<A>ddr)
> 	(#_NewTimerUPP time-task-callback))
>  (#_InstallXTimeTask *tmtask*)
>  (#_PrimeTime *tmtask* 1))
>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>