[Openmcl-devel] Code profiling?

Fri Aug 27 02:20:41 PDT 2004

On Fri, 27 Aug 2004, David Steuber wrote:

> Hi,
>
> Is there a way besides TIME that I can profile code using Emacs + SLIME
> with OpenMCL?  I didn't see any mention of profiling in the OpenMCL
> docs, but I could have missed it.
>

I was actually working on this earlier today.  (Well, earlier Thursday.)

There's a lot of good news and bad news.  (Forgive me for not asking
which you want to hear first.)

The bad news is that most Unix-like OSes don't provide a
high-resolution timer interrupt to user code.  You can sort of arrange
that (some thread in) your application will get (on Darwin and
LinuxPPC, currently) 100Hz alarm timer interrupts, and some of these
interrupts will actually be delivered to a SIGALRM or SIGPROF handler
in your application.  The same 100Hz clock usually drives the OS scheduler,
and it may be awkward for that scheduler to deliver the timer signal
reliably.  Even if this were 100% reliable, you'd generally have to
profile a lot to get meaningful sampling at 100Hz.  (Even if the
sampling interval was only a millisecond, that'd be ... 10 times
better.)
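
For anyone who hasn't seen the timer-signal approach, here is a minimal
sketch of it.  It is Python rather than Lisp, it is not what OpenMCL
does, and the names profile, busy and SAMPLE_INTERVAL are made up for
illustration.  It assumes a Unix-like system where SIGPROF and
ITIMER_PROF are available, and it only samples the main thread.

```python
import collections
import signal
import time

SAMPLE_INTERVAL = 0.01  # seconds between samples, i.e. roughly 100Hz


def profile(fn, *args, interval=SAMPLE_INTERVAL):
    """Run fn(*args), counting which function is running at each SIGPROF."""
    samples = collections.Counter()

    def on_sigprof(signum, frame):
        # 'frame' is the Python frame that was interrupted by the signal.
        if frame is not None:
            samples[frame.f_code.co_name] += 1

    old_handler = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF counts CPU time used by the process, not wall-clock time.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        result = fn(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # disarm the timer first
        signal.signal(signal.SIGPROF, old_handler)
    return result, samples


def busy(seconds):
    """Burn about 'seconds' of CPU time so there is something to sample."""
    deadline = time.process_time() + seconds
    n = 0
    while time.process_time() < deadline:
        n += 1
    return n
```

Calling profile(busy, 0.3) should yield a few dozen samples at best,
which is the point made above: at 100Hz a short run gives very little
data, and not every timer tick is guaranteed to turn into a delivered
signal.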

The good news (for OSX users, at least) is that Apple's been working
on a profiling package (CHUD) that has OS kernel support: When a
profiling interrupt occurs (either as the result of a timer or as
the result of an on-chip profiling event), it's handled in the kernel.
(This allows high-resolution interrupt sources without interfering
with the scheduler.)  User-space client programs can configure the
profiling facility and obtain sampling data from the kernel.  The
CHUD package comes with a few such programs; "Shark" is a very slick
graphical profiling tool with lots of handy features.

The bad news is that Shark has (not surprisingly) no built-in knowledge
of how to display the names of OpenMCL functions whose code it encounters.
(It does a fairly good job of identifying the start and end addresses
of those functions, but it's not likely that those address ranges will
be meaningful to most users.  It's possible to ask the lisp to bang
around for a bit and tell you what functions those addresses belong
to, but that isn't quite the same as seeing a nice executive summary
that says "you're spending all of your time in FOO".)

The good news is that Shark (at least previous versions of it) had a
mechanism whereby address ranges could be given meaningful symbolic
names, by associating a text file (a ".spatch" file) with a particular
running process.  The same kind of lisp code that can tell you the
name of a function containing a particular fixed code address can
also generate .spatch files for you.
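
The address-to-name lookup itself is simple enough to sketch.  The
following is an illustration in Python, not the lisp code referred to
above and not the .spatch format (which isn't shown here); the table
of address ranges and the names function_at and summarize are invented
for the example.

```python
import bisect
import collections

# Hypothetical (start, end, name) address ranges, sorted by start address,
# with 'end' exclusive.  Real ranges would come from the lisp image.
FUNCTIONS = [
    (0x1000, 0x1040, "FOO"),
    (0x1040, 0x10C0, "BAR"),
    (0x2000, 0x2100, "BAZ"),
]
STARTS = [start for start, _end, _name in FUNCTIONS]


def function_at(address):
    """Return the name of the function containing 'address', or None."""
    # Find the last range whose start is <= address.
    i = bisect.bisect_right(STARTS, address) - 1
    if i >= 0:
        _start, end, name = FUNCTIONS[i]
        if address < end:
            return name
    return None


def summarize(sample_addresses):
    """Count samples per function name, most frequently hit first."""
    counts = collections.Counter()
    for address in sample_addresses:
        counts[function_at(address) or "<unknown>"] += 1
    return counts.most_common()
```

Feeding a list of sampled program-counter values to summarize gives
the kind of "you're spending all of your time in FOO" summary that
raw address ranges don't.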

The bad news is that, to the best of my knowledge, the .spatch
mechanism has never worked.  I've stepped through the code in GDB,
and it's full of confusion about the difference between a process's
process ID (PID) and the process ID of its parent (PPID), backwards
date comparisons, and other things that strongly suggest that it's
never worked for anyone inside or outside of Apple.

The good news is that a new version of CHUD was released a few weeks
ago.

The bad news is that the new version (a 4.0 beta release) either came
without Shark documentation or put that documentation in such an obscure
place that I haven't been able to find it.  It's not clear whether this
version tries to support .spatch files or whether they work any better if
so, or whether there's any similar mechanism for teaching Shark how to
identify address ranges meaningfully.

The good news is that I finally convinced myself to stop waiting for
Shark to become more usable and to just develop an in-lisp metering
facility that used the CHUD framework.  I'd started on this path
some time ago (before getting distracted by the idea of making .spatch
files work with Shark) and it kind of, sort of worked.

The bad news is that the CHUD API has changed a bit since that earlier
experiment and I haven't been able to get back to the kind of, sort of
working stage.

The good news is that I don't -think- that it's that far off (the code
seems to be doing the necessary things, but expected samples aren't
showing up where I'd expect them to be.)  The fact that Shark seems
to be able to do time-based sampling suggests that CHUD isn't broken
but my calls to it are.

If that does work, it won't be nearly as slick (or quite as complete)
as Shark, but it should be able to give you a reasonable overview of
where your application's spending its time.  (I guess that that's
kind of mixed news, which is better than no news at all.)

(If anyone's not aware of it, CHUD's described at

<http://developer.apple.com/performance/#CHUD>

, and can be downloaded from a link there.)