[Openmcl-devel] debugging debugging

Thu May 7 16:33:15 PDT 2009

On Thu, 7 May 2009, Alexander Repenning wrote:

> we got early parts of our game engine ported to CCL. Things are going pretty 
> well so far, i.e., event processing and threading working better than ever in 
> MCL, but...
>
> In the IDE version practically 90% of the code is running in some thread 
> other than the main one, e.g., some method dealing with a mouse click, some 
> animation running, some opengl view rendering. No problem as long as there is 
> no bug. The moment there is a problem in any of these event handling, 
> animation or rendering methods one gets the AltConsole. In there I usually do 
> not get a prompt to do anything meaningful. I have to scroll back to see what 
> caused the problem "*** Error in event process:  ...." The functionality is 
> very limited compared to getting the same bug in the main thread with 
> feedback in the listener. I cannot
>
> - get an interactive backtrace: backtrace list with disclosure triangles
> - get restart menus
> - clear/delete the content. Text just piles up. The Edit > Delete menu does 
> not work. A clear button similar to the OS X console would be great
>
> In other words the problem is that in 90% of the cases (code running in non 
> main thread) one gets 10% of the debugging functionality. In MCL a new thread 
> associated Listener would pop up giving the developer the same debugging 
> tools no matter which thread caused the issue.
>
> Could that functionality be added to CCL?

MCL ordinarily did event processing from a distinguished (cooperative)
thread; when that got an error and needed to interact with the user, a
new (cooperative) thread was created and designated as the "stand-in
event processor", and the event thread created a window and ran a
break loop in that window.  "running a break loop" meant that
-something- was handling updates and responding to keystrokes and
other events while the thread that got the error was printing the
error message and waiting for/responding to user input in a break
loop.  When the break loop was exited, the "stand-in event processor"
thread was killed, the original event thread resumed event processing
(presumably after the cause of the error was addressed), and things
went back to normal.  At no point was any thread asked to do two
things at once (the original event thread in particular was not asked
to both run a break loop and an event loop).  It's generally hard to
do two things at once, though you can sometimes fake it using timer
interrupts or some scheme like that.
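To make the handoff concrete, here is a minimal sketch in Python. Python was chosen only for illustration; this is not MCL code. The sketch assumes that cooperative threads can be modeled as generators that switch only where they yield. Every name in it (`event_processor`, `break_loop`, the `designated` slot) is hypothetical.

In the sketch, the event thread hits an error, hands the designation to a new stand-in thread, and runs a break loop. When the user types `:continue`, it takes the designation back and the stand-in exits.

```python
from collections import deque

# Hypothetical model of MCL's scheme: cooperative threads are generators that
# switch only where they yield; a round-robin scheduler runs them in turn.
events = deque(["click", "bad-click", "key", "redraw", "scroll"])
user_input = deque(["(inspect x)", ":continue"])   # typed into the break window
log = []
designated = {"name": "event"}    # which thread may process events
run_queue = deque()

def handle(who, ev):
    if ev.startswith("bad"):
        raise ValueError(ev)          # simulated bug in an event handler
    log.append(f"{who}: {ev}")

def break_loop(err):
    log.append(f"break: {err}")
    while True:
        yield                         # let the stand-in process events meanwhile
        if user_input:
            cmd = user_input.popleft()
            log.append(f"break> {cmd}")
            if cmd == ":continue":
                return

def event_processor(name):
    while events:
        if designated["name"] != name:
            return                    # designation taken back: the stand-in exits
        ev = events.popleft()
        try:
            handle(name, ev)
        except ValueError as err:
            designated["name"] = "standin"
            run_queue.append(("standin", event_processor("standin")))
            yield from break_loop(err)
            designated["name"] = name # resume normal event processing
        yield

def run():
    run_queue.append(("event", event_processor("event")))
    while run_queue:
        name, thread = run_queue.popleft()
        try:
            next(thread)
        except StopIteration:
            continue                  # thread finished; drop it
        run_queue.append((name, thread))

run()
for line in log:
    print(line)
```

The log shows the break loop and the stand-in interleaving (`standin: key`, `standin: redraw`) until `:continue`, after which the original thread handles `scroll`. No single generator ever runs the break loop and processes events at the same time.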

This worked as well as it did (fairly well, in fact) in MCL for a
number of reasons; one of the most important of those reasons is that
Carbon allowed any cooperative thread to receive events (via
#_WaitNextEvent.)  Since only one cooperative thread could run at any
time and since context switch between cooperative threads only
happened at well-defined times, this was generally fairly tractable.
(It was of course possible for one thread to see a mouse-down event
and another to see the corresponding mouse-up; MCL tried to ensure
that only the "designated event processor" thread actually processed
events to avoid serialization issues, but the OS allowed other
approaches to that problem.)  As far as I remember, MCL only changed
its notion of the designated event processor when the current event
processor got an error and wanted to enter a break loop, but it could
have just passed this designation around whenever it felt like it.
I'm fairly sure that any MCL thread could just put a (real,
window-system) modal dialog on the screen and run a modal event loop
waiting for a button to be pressed; unless special measures were
taken, that wouldn't introduce serialization issues, since no other
(cooperative) thread could run during that modal event loop.

If the application instead uses native, preemptively-scheduled
threads, the issues related to threads and event processing are
different.  Among other things, if multiple native threads were
allowed to process events, then it'd be possible for thread A to
receive a mouse-up event before thread B has received the
corresponding mouse-down (this can easily happen because of scheduling
issues: even if the events were issued in the right order, there's no
guarantee that multiple threads would finish receiving them in a
predictable order.)  Because there's virtually no way to win (and
many ways to lose) and because it simplifies event delivery (event
messages can be sent by the window server to a thread-specific message
port; replies are only expected from that port), modern Carbon and
Cocoa basically only allow one (native) thread within an application
to do event processing.  (That thread is almost always the initial
thread - the one created by the OS when the application started.
There have been some loopholes and backdoors that have allowed some
other thread to process events instead of the initial thread, but
they've been closed and more will be closed in future OS releases.)  I
know of no way of changing that at runtime (and can't imagine how that
could work), so we're basically (while running Cocoa in CCL with
native threads) in a situation where the initial thread is the
"designated - by Cocoa - event processor" and there's no way to change
that.
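The following is a rough sketch of the consequence, written in Python under stated assumptions. A `queue.Queue` stands in for the event thread's message port. `post_to_main_thread` is a hypothetical helper, not a Cocoa or CCL API.

Other threads never read events themselves; they can only ask the one event thread to run something. Because that thread is the only consumer of the port, work is handled in the order it arrived.

```python
import queue
import threading

# Hypothetical model: the window server delivers events to one port owned
# by the initial thread; other threads only post work to that port.
main_port = queue.Queue()
handled = []

def post_to_main_thread(fn, *args):
    """Hypothetical stand-in for a 'run this on the event thread' request."""
    main_port.put((fn, args))

def handle_window_event(ev):
    handled.append(ev)

def render_frame(n):
    handled.append(f"frame {n}")

def animation_thread():
    for n in range(3):
        post_to_main_thread(render_frame, n)   # never reads events itself

# Window-server events arrive, in order, on the same port.
for ev in ("mouse-down", "mouse-up"):
    main_port.put((handle_window_event, (ev,)))

worker = threading.Thread(target=animation_thread)
worker.start()
worker.join()

# The initial thread drains the port: a single consumer keeps arrival order.
while not main_port.empty():
    fn, args = main_port.get()
    fn(*args)
print(handled)
```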

Anyone who's ever looked at the gory details of MCL's event code might
agree that that's mostly a good thing, but it does mean that when an
error occurs in the event thread, we can't do what MCL did (allow the
event thread to enter a break loop while some other thread took over
event-processing responsibilities).  (I hope that anyone who thinks
that we're not doing that because we forgot to or forgot how to is
dissuaded of that belief.)

If the event thread can only do one thing at a time (when it gets an
error or in other contexts) and can't simultaneously run a break loop
in an IDE window and process events pertaining to that and other
windows, we seem to have a limited choice of things that we can do.  I
agree that what we've been doing - just writing a backtrace to the
AltConsole window if it's there or to a logging stream if it isn't, or
just filling up an Emacs buffer with that backtrace, then returning to
process the next event before the beachball cursor appears - is
suboptimal.
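The current "backtrace and run" behavior amounts to something like the following Python sketch. The handler, the simulated bug and the `alt_console` stream are all hypothetical. The error is reported, but once the handler has returned, the frames where it happened are gone, so nothing is left to inspect.

```python
import io
import traceback

# Hypothetical stand-in for the AltConsole window or a logging stream.
alt_console = io.StringIO()
processed = []

def handle_event(ev):
    if ev == "bad-click":
        return 1 / 0          # simulated bug in an event handler
    processed.append(ev)

def event_loop(events):
    for ev in events:
        try:
            handle_event(ev)
        except Exception:
            # Write the backtrace and move on to the next event; the
            # failed handler's frames are discarded when this clause ends.
            alt_console.write(f"*** Error in event process: {ev}\n")
            traceback.print_exc(file=alt_console)

event_loop(["click", "bad-click", "key"])
print(processed)
print(alt_console.getvalue())
```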

What one would ideally like to do is to enter a break loop and have
normal debugging facilities available to you in the IDE.  Doing things
in the IDE generally means being able to process events, and the one
thread that Cocoa allows to process events really wants to be sitting
in a break loop instead and can't easily do two things at the same
time.

Another suboptimal solution (which may be better than the status quo)
is to choose to do one thing (enter a break loop) and completely ignore
the other (event processing) and enter a break loop with I/O redirected
to the AltConsole window.  (AltConsole is a little ObjC program that
can sort of act as a simple terminal-like device for a remote process,
like the CCL IDE).  Event processing in the Lisp GUI application would
stop and you wouldn't have access to GUI backtrace/inspector/other tools
while trying to debug things, but the debugging that you'd be doing would
be taking place in the dynamic context where the error occurred (unlike
the current "backtrace and run" behavior).  Yes, this is far from ideal,
but being able to browse around the backtrace and type expressions to
a break loop may make debugging easier than it has been and seems like
it's at least a step in the right direction.
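Below is a minimal Python approximation of that proposal. All names are hypothetical, and an `io.StringIO` pair stands in for the AltConsole.

One caveat: Python unwinds the stack before an `except` clause runs, whereas a Lisp break loop runs before unwinding. Here the traceback merely keeps the failed frame's variables reachable. What the sketch does show is that expressions are evaluated against the variables of the failed handler, and that no further events are processed until the user types `:continue`.

```python
import io

# Hypothetical stand-in for the AltConsole: one input and one output stream.
console_in = io.StringIO("x\ny * 2\n:continue\n")
console_out = io.StringIO()
handled = []

def break_loop(frame, err, inp, out):
    """Evaluate typed expressions in the failed frame's variables; nothing
    else runs on this thread (no event processing) until ':continue'."""
    out.write(f"> Error: {err!r}\n")
    for line in inp:
        line = line.strip()
        if line == ":continue":
            return
        try:
            value = eval(line, frame.f_globals, frame.f_locals)
            out.write(f"{line} => {value!r}\n")
        except Exception as e:
            out.write(f"{line} => error: {e!r}\n")

def handle_event(x):
    y = x + 1
    return y / 0              # simulated bug in an event handler

def event_loop(events):
    for ev in events:
        try:
            handle_event(ev)
            handled.append(ev)
        except ZeroDivisionError as err:
            tb = err.__traceback__
            while tb.tb_next is not None:   # innermost frame is the one that raised
                tb = tb.tb_next
            break_loop(tb.tb_frame, err, console_in, console_out)

event_loop([41])
print(console_out.getvalue())
```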

I think that we all agree that it'd be much better to be able to use the
IDE to debug the IDE than to do what we've been doing or to do what I
proposed in the last paragraph.  I hope that people will either agree
that this requires the ability to do two things at the same time or
be able to point out that this is in fact not required for reasons that I'm
not thinking about.  (It's not really necessary to do two things "at the same
time", but it's likely necessary to slice the event thread's time up so that
it transparently alternates between two execution contexts - event processing
and the break loop - somehow.)
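One way to picture that slicing is the following Python sketch. Each execution context is a generator, and a round-robin loop on a single thread alternates between them. The context names and inputs are hypothetical.

The sketch is illustrative only. A real break loop runs on a deep stack and blocks waiting for input, and a generator cannot suspend an arbitrary stack. An actual implementation would therefore need something like a separate stack per context.

```python
from collections import deque

# Hypothetical model: one thread alternates between two execution contexts,
# each a generator that yields after a small unit of work.
events = deque(["click", "scroll", "resize"])
typed = deque(["(inspect *)", ":continue"])
trace = []

def event_context():
    while events:
        trace.append(f"event: {events.popleft()}")
        yield

def break_context():
    trace.append("break: entered")
    while True:
        if typed:                     # assumes non-blocking input
            cmd = typed.popleft()
            trace.append(f"break> {cmd}")
            if cmd == ":continue":
                return
        yield

def run_one_thread(*contexts):
    """Round-robin the contexts on the current thread until all finish."""
    ready = deque(contexts)
    while ready:
        ctx = ready.popleft()
        try:
            next(ctx)
        except StopIteration:
            continue
        ready.append(ctx)

run_one_thread(event_context(), break_context())
print(trace)
```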

If anyone reads this and finds themselves thinking that they want to go back
to the simpler days of cooperative threads, the only thing I can say is "no,
you don't."

>
> alex
>
>
>
> Prof. Alexander Repenning
>
> University of Colorado
> Computer Science Department
> Boulder, CO 80309-430
>
> vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf
>
>