[Openmcl-devel] Re: [lambda-gtk-devel] Bug in examples.lisp
Gary Byers
gb at clozure.com
Sun Dec 12 15:43:03 PST 2004
On Sun, 12 Dec 2004 rm at fabula.de wrote:
> On Sun, Dec 05, 2004 at 03:18:46PM -0700, Gary Byers wrote:
>
> > Note that GTK2 is supposed to be "thread aware, but not thread-safe";
> > IIRC, there's a global lock that any thread that touches a GTK data
> > structure has to acquire before doing so. The exact mechanism it's
> > supposed to use is buried in several C preprocessor macros; I think
> > that I once decoded them and found the actual locking primitive and
> > how to find the actual lock object, but didn't write that down ...
> >
> > In any case: I can't tell too much from the backtrace below.
> > What would I need to do to be able to reproduce this ?
>
> Sorry for my late response, but I had other work to do. I just
> spent some time trying to narrow down the problem. I've now
> compiled my own library and implemented callback registration
> and invocation. Unfortunately (?) the callback mechanism seems to
> work fine with my dynamic library, so I guess I have to look closer
> at the GTK2 side. The callback registration _is_ a place where a lot
> of things changed from gtk-1.2 to gtk-2.0. Here's a kernel backtrace:
>
> ? Unhandled exception 11 at 0x00012c78, context->regs at #x30d94cb0
> Read operation to unmapped address 0x7f454c44
Hmm. #x7f454c44 looks like '_ELF', which is what the first 32-bit word
of most ELF-format object files looks like.
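One way to check a hunch like this is to lay the faulting address out as the four bytes it would occupy in memory on a big-endian machine (32-bit PowerPC Linux is big-endian) and compare them with the ELF magic number, #x7F followed by the ASCII letters "ELF". A small Python sketch; the variable names are illustrative only:

```python
import struct

ELF_MAGIC = b"\x7fELF"  # first four bytes (e_ident[0..3]) of every ELF file

def word_as_bytes(word: int) -> bytes:
    """Return a 32-bit word as it would be stored in big-endian memory."""
    return struct.pack(">I", word)

fault_addr = 0x7F454C44            # address from the "Read operation to unmapped address" line
raw = word_as_bytes(fault_addr)
magic_word = struct.unpack(">I", ELF_MAGIC)[0]

print(raw)                         # the four bytes of the faulting address
print(hex(magic_word))             # the ELF magic as a 32-bit big-endian word
print(magic_word - fault_addr)     # how far apart the two values are
```

The first three bytes match exactly and only the last differs ('D' rather than 'F', a difference of 2), so the value resembles the ELF magic closely but not exactly; the trace by itself doesn't say why.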
> In foreign code at address 0x00012c78
> ? for help
> [28607] OpenMCL kernel debugger: b
>
>
> (#x30d95170) #x00013688 : (null) + 79496
> (#x30d951a0) #x0001381C : (null) + 79900
> (#x30d953e0) #x300E4BD4 : (null) + 806243284
> (#x30D95410) #x30D95768 : foreign code (unknown)
> (#x30d95a60) #x3093E7DC : (null) + 814999516
> (#x30d95b20) #x3093D980 : g_signal_emit_valist + 1864
> (#x30d95d70) #x3093DCCC : g_signal_emit + 108
> (#x30d95e00) #x0F015628 : gtk_button_clicked + 132
> (#x30d95e20) #x0F016804 : (null) + 251750404
> (#x30d95e40) #x3093EFF0 : g_cclosure_marshal_VOID__VOID + 176
> (#x30d95e60) #x309298C4 : (null) + 814913732
> (#x30d95e90) #x30929538 : g_closure_invoke + 224
> (#x30d95ed0) #x3093E308 : (null) + 814998280
> (#x30d95f90) #x3093D980 : g_signal_emit_valist + 1864
> (#x30d961e0) #x3093DCCC : g_signal_emit + 108
> (#x30d96270) #x0F015568 : gtk_button_released + 132
> (#x30d96290) #x0F016618 : (null) + 251749912
> (#x30d962b0) #x0F0D62F0 : _gtk_marshal_BOOLEAN__BOXED + 212
> (#x30d962e0) #x309298C4 : (null) + 814913732
> (#x30d96310) #x30929538 : g_closure_invoke + 224
> (#x30d96350) #x3093E404 : (null) + 814998532
> (#x30d96410) #x3093D744 : g_signal_emit_valist + 1292
> (#x30d96660) #x3093DCCC : g_signal_emit + 108
> (#x30d966f0) #x0F1DBEE8 : (null) + 253607656
> (#x30d96710) #x0F0D4670 : gtk_propagate_event + 268
> (#x30d96730) #x0F0D3118 : gtk_main_do_event + 652
> (#x30d96760) #x0F96965C : (null) + 261527132
> (#x30d96780) #x309A9DBC : (null) + 815439292
> (#x30d967d0) #x309AB300 : g_main_context_dispatch + 232
> (#x30d967f0) #x309AB6D0 : (null) + 815445712
> (#x30d96830) #x309ABEDC : g_main_loop_run + 588
> (#x30d96860) #x0F0D281C : gtk_main + 232
>
> Is there a way to set breakpoints on the trampoline function? Or at least
> to get its address? I'm pretty new to Lisp implementations, so I don't
> really know how to narrow down my test cases any further; I might try to overload
> g_signal_emit_valist to inspect what's going on from the C side.
The callback trampoline is in the lisp kernel; '_SPeabi_callback' in
"ccl:lisp-kernel;spentry.s".
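For readers new to foreign callbacks: a trampoline is the glue that lets a C library such as GTK call a function that lives in another runtime. Python's ctypes offers an analogous mechanism, shown here purely as an analogy (it is not how OpenMCL implements _SPeabi_callback) and assuming a POSIX system whose C library can be loaded: `CFUNCTYPE` wraps a Python function in a C-callable thunk, and the C library's `qsort` calls back into it.

```python
import ctypes
import ctypes.util

# Load the C library. If find_library returns None, CDLL(None) still exposes
# the symbols already loaded into the process (including libc) on Linux/macOS.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature of the comparison callback: int cmp(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

libc.qsort.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

calls = []  # each time C calls back into Python, record the pair compared

def py_compare(a, b):
    calls.append((a[0], b[0]))
    return (a[0] > b[0]) - (a[0] < b[0])

# Keep the thunk referenced for as long as C code may call it; if it were
# garbage-collected, a later callback from C could crash the process.
compare_thunk = CMPFUNC(py_compare)

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), compare_thunk)
print(list(values), "after", len(calls), "callbacks")
```

The keep-alive rule in the comment comes from the ctypes documentation and applies to callback thunks in general.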
Setting a breakpoint there (in GDB) is possible, but it's a PITA: GDB
mistakenly believes that all instructions that generate SIGTRAP signals
are caused by its breakpoints, and OpenMCL violates that assumption (the
lisp runtime uses conditional traps for a variety of things, including
heap-overflow checking.)
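The root of the conflict is that SIGTRAP is an ordinary signal that a program can receive and handle for its own reasons. A minimal Python sketch of that idea, offered only as an analogy (the OpenMCL kernel uses conditional trap instructions, not `raise`), installs its own SIGTRAP handler and raises the signal with no debugger involved. With GDB's default signal settings, GDB stops on SIGTRAP and does not pass it to the program, which is how it ends up claiming traps it didn't cause.

```python
import signal

traps = []  # every SIGTRAP seen by our own handler

def on_sigtrap(signum, frame):
    # A runtime that uses trap instructions for its own purposes handles
    # SIGTRAP itself instead of leaving it to a debugger.
    traps.append(signum)

previous = signal.signal(signal.SIGTRAP, on_sigtrap)
try:
    # No debugger, no breakpoint: the process simply raises SIGTRAP.
    signal.raise_signal(signal.SIGTRAP)
finally:
    if previous is not None:
        signal.signal(signal.SIGTRAP, previous)  # restore the old handler

print(f"handled {len(traps)} SIGTRAP(s) with no debugger attached")
```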
There are probably two general ways around this:
a) patch GDB so that it recognizes that there are other ways of
generating SIGTRAP besides its breakpoints
b) patch the Linux kernel to generate SIGTRAP on a GDB breakpoint
and something else (SIGILL) when other conditional trap instructions
fire.
I've generally found (b) easier (but I generally don't recompile Linux
kernels as often as some people do.) I would guess that some people
have done (a), and it's possible that more recent versions of GDB make
it a non-issue. (When this issue has come up, GDB people tended to blame
kernel people and vice versa; I got in the habit of patching the kernel
because that was easier and more expedient, but I tend to agree that the
problem's really in GDB.)
One thing that you might try is to attach GDB to the lisp kernel after
it's crashed (into the lisp kernel debugger). Note the PID of the running
ppccl process, then do:
shell> gdb /path/to/ccl/ppccl
and
(gdb) attach <PID>
If you then do:
(gdb) info threads
you'll probably find that most threads are in sigwait or sigsuspend or
something similar; one of them will be trying to read a character from the
keyboard (in response to the kernel debugger's prompt). If that thread
isn't the current one (marked with "*" in the "info threads" output), make it current:
(gdb) thread <n>
where <n> is a small integer that shows up in the "info threads" display,
then do:
(gdb) backtrace
GDB's backtrace may do a better job than the lisp kernel's of assigning
names to some of those addresses. I have a hunch that the last few
addresses that we'll see (just before the kernel debugger itself was
entered) will be the lisp kernel trying to recover from an exception
that happened in foreign code; I don't think that we've quite gotten
to the _SPeabi_callback stuff yet.
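One detail in the kernel debugger's trace supports this: in every "(null) + N" frame, N is simply the frame's address written in decimal (for example, #x00013688 is 79496), so the lisp kernel found no symbol at all for those addresses. A short Python sketch that checks this for a few frames copied from the trace; the frame format is taken from the trace, while the parser itself is a hypothetical helper, not part of OpenMCL:

```python
import re

# A few frames copied verbatim from the kernel debugger's backtrace.
frames = """\
(#x30d95170) #x00013688 : (null) + 79496
(#x30d953e0) #x300E4BD4 : (null) + 806243284
(#x30d95b20) #x3093D980 : g_signal_emit_valist + 1864
"""

# "#x<address> : <symbol> + <decimal offset>"; the stack-pointer column is
# skipped because it is followed by ")" rather than " : ".
FRAME_RE = re.compile(r"#x([0-9A-Fa-f]+) : (\S+) \+ (\d+)")

def parse(line):
    """Return (address, symbol, offset) for one backtrace line."""
    m = FRAME_RE.search(line)
    return int(m.group(1), 16), m.group(2), int(m.group(3))

for line in frames.splitlines():
    pc, sym, off = parse(line)
    if sym == "(null)":
        print(f"{pc:#010x}: no symbol found; offset == address: {off == pc}")
    else:
        print(f"{pc:#010x}: {sym}, which starts at {pc - off:#010x}")
```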
>
> TIA
> Ralf Mattes
>
>