[Openmcl-devel] swank-backend:map-backtrace

Gary Byers gb at clozure.com
Thu Jul 8 09:36:39 PDT 2004



On Thu, 8 Jul 2004, Marco Baringer wrote:

>
> This (see below) happens "fairly" regularly when debugging multi-threaded
> apps with SLIME.  The fact that I've _never_ seen it happen unless
> I've got more than a few threads open, and the fact that it happens
> more or less randomly, make me think it's a race condition
> somewhere, but I can't figure out where.

Alan Ruttenberg's been trying to track this down.

The only real difference that I've been able to see between what
MAP-BACKTRACE does and what :B (CCL::PRINT-CALL-HISTORY) does is that
the former sometimes tries to start at a particular "frame address"
(the value of *SWANK-DEBUGGER-STACK-FRAME*).  I suggested modifying
MAP-BACKTRACE so that it ignores that and always starts at the
"current" frame (CCL::%GET-FRAME-PTR).  (This would make SLIME's
backtrace contain a few more uninteresting frames, but I'd be interested
in knowing whether that makes the problem go away.)

Speaking of race conditions:

;;; This is in swank-openmcl.lisp
(defmethod ccl::application-error :before (application condition error-pointer)
  (declare (ignore application condition))
  (setq *swank-debugger-stack-frame* error-pointer))

SETQing a special variable that:

a) may not have a thread-specific binding
b) is referenced from multiple threads

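can have surprising results: without a thread-specific binding, the
assignment changes the global value, which every other thread then sees.  If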

c) the assignments can also happen from multiple threads
d) they happen asynchronously (e.g., in an error handler)

the results can be especially surprising.  (Maybe this could be an
:AROUND method that binds *SWANK-DEBUGGER-STACK-FRAME* around the
call to CALL-NEXT-METHOD; a LET binding of a special variable is
visible only in the thread that establishes it.)
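A minimal sketch of that :AROUND idea, untested against OpenMCL or SLIME.  To keep it runnable in any Common Lisp, it uses a hypothetical stand-in generic function, REPORT-APPLICATION-ERROR, with the same (application condition error-pointer) lambda list as CCL::APPLICATION-ERROR; in swank-openmcl.lisp the :AROUND method would instead be defined on CCL::APPLICATION-ERROR and would replace the :BEFORE method.  It assumes the debugger is entered from within the next method, so the binding is still in effect while the backtrace is walked.

```lisp
(defvar *swank-debugger-stack-frame* nil
  "Frame pointer at which the debugger should start walking the stack.")

;; Hypothetical stand-in for CCL::APPLICATION-ERROR so the sketch runs in
;; any Common Lisp; it has the same three-argument lambda list.
(defgeneric report-application-error (application condition error-pointer))

;; The primary method plays the role of the debugger entry point: it just
;; returns the frame pointer a backtrace would start from.
(defmethod report-application-error (application condition error-pointer)
  (declare (ignore application condition error-pointer))
  *swank-debugger-stack-frame*)

;; Bind rather than SETQ.  The LET binding lasts only for the dynamic
;; extent of CALL-NEXT-METHOD, and in a threaded Lisp such as OpenMCL a
;; dynamic binding is visible only to the thread that established it.
(defmethod report-application-error :around (application condition error-pointer)
  (declare (ignorable application condition))
  (let ((*swank-debugger-stack-frame* error-pointer))
    (call-next-method)))
```

Calling (report-application-error nil nil :frame-a) returns :FRAME-A, and afterwards *SWANK-DEBUGGER-STACK-FRAME* is still NIL.  With the SETQ-in-a-:BEFORE-method version, the global value would remain :FRAME-A after the call, where any other thread could pick it up.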


> Here's the definition of map-backtrace I'm using:
>
> (defun map-backtrace (function &optional
>                       (start-frame-number 0)
>                       (end-frame-number most-positive-fixnum))
>   "Call FUNCTION passing information about each stack frame
>  from frames START-FRAME-NUMBER to END-FRAME-NUMBER."
>   (let ((context (backtrace-context))
>         (frame-number 0)
>         (top-stack-frame (or *swank-debugger-stack-frame*
				^^^^^^^^^^^^^^^^^^^^^^^^^^^

What happens if we ignore *SWANK-DEBUGGER-STACK-FRAME* here and always
start from (CCL::%GET-FRAME-PTR)?

>                              (ccl::%get-frame-ptr))))
>     (do* ((p top-stack-frame (ccl::parent-frame p context))
>           (q (ccl::last-frame-ptr context)))
>          ((or (null p) (eq p q) (ccl::%stack< q p context))
>           (values))
>       (multiple-value-bind (lfun pc) (ccl::cfp-lfun p)
>         (when lfun
>           (if (and (>= frame-number start-frame-number)
>                    (< frame-number end-frame-number))
>               (funcall function frame-number p context lfun pc))
>           (incf frame-number))))))
>
> Is there anything else I can do to debug this?

I've looked at this pretty closely.  We'd certainly crash if
*SWANK-DEBUGGER-STACK-FRAME* were bogus (if it belonged to some other
thread's stack or was stale); otherwise, we're doing pretty much what
both :B and the kernel debugger do, and they don't crash.

If ignoring *SWANK-DEBUGGER-STACK-FRAME* doesn't make the crashes
disappear, I'd have to look closer.  I can certainly imagine scenarios
where two threads invoke the :BEFORE method above and both assign the
same global value of *SWANK-DEBUGGER-STACK-FRAME*.  Whichever thread
wasn't the last to assign it will then try to walk its own stack
starting from a location in another thread's stack, and that doesn't
make a lot of sense.

>
> --
> -Marco
