[Openmcl-devel] unstable computing times with transient classes

Gary Byers gb at clozure.com
Sun Aug 17 06:46:12 PDT 2008


Naturally, when I ran your example on a machine that I could do profiling
on, I didn't see quite as much variance between runs as you saw (or even as
much as I'd seen on another machine.)  I did see some variance (and was able
to get some profiling output), which seems to explain what's going on.
These numbers were obtained on a fairly fast x86-64 Linux box.

;;; Round 0 ;;;
(GROVE-TO-ANONYMOUS-INSTANCE GROVE) took 267,797 microseconds (0.267797 seconds) to run
                     with 4 available CPU cores.
During that period, 256,961 microseconds (0.256961 seconds) were spent in user mode
                     7,999 microseconds (0.007999 seconds) were spent in system mode
36,605 microseconds (0.036605 seconds) was spent in GC.
  11,672,272 bytes of memory allocated.
  2,213 minor page faults, 0 major page faults, 0 swaps.

and the associated profiling info, showing only those functions that
accounted for 1% or more of the total time:

2051     36.6577  <Compiled-function.FIND-GF-DISPATCH-TABLE-INDEX.0x300040165FFF>
303       5.4155  <Compiled-function.%%NTH-ARG-DCODE.0x300040161D1F>
132       2.3592  <Compiled-function.%%ONE-ARG-DCODE.0x300040162DBF>
116       2.0733  <Compiled-function.%MAYBE-STD-SLOT-VALUE-USING-CLASS.0x3000401AC55F>
112       2.0018  <Compiled-function.FAST-MOD.0x30004000ED8F>
96        1.7158  <Compiled-function.SYMBOL-NAME.0x3000400AAC0F>
74        1.3226  <METHOD-FUNCTION.(SETF.SLOT-VALUE-USING-CLASS).(T.STANDARD-CLASS.T.STANDARD-EFFECTIVE-SLOT-DEFINITION)>
72        1.2869  <Compiled-function.SLOT-ID-VALUE.0x3000401A2B3F>
61        1.0903  <Compiled-function.EQ-HASH-FIND.0x3000400600EF>
56        1.0009  <Compiled-function.GET-PROPERTIES.0x30004010CB0F>

The total time on this first round is pretty good (relative to later 
rounds, at least), but we're spending more than 1/3 of that time
in CCL::FIND-GF-DISPATCH-TABLE-INDEX.  A "generic-function dispatch
table" is basically a cache which maps from the class of a specialized
argument to (at least some part of) an effective method; this is
intended to avoid the overhead of computing the effective method
if it has already been computed for an argument of the same class.
The caching mechanism uses a hashing scheme which usually gets
good results, and it's generally desirable that some sort of caching
scheme like this be used when possible.
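
To make that concrete, here's a toy model of such a cache (an
illustrative sketch only - TOY-DISPATCH and friends are invented names,
and CCL's real dispatch tables are more elaborate than a hash table):

(defvar *toy-dispatch-cache* (make-hash-table :test #'eq))

(defun toy-compute-effective-method (class)
  ;; stand-in for the (expensive) effective-method computation
  (lambda (instance)
    (declare (ignore instance))
    (class-name class)))

(defun toy-dispatch (instance)
  (let ((class (class-of instance)))
    (funcall (or (gethash class *toy-dispatch-cache*)        ; cache hit
                 (setf (gethash class *toy-dispatch-cache*)  ; miss: compute & remember
                       (toy-compute-effective-method class)))
             instance)))

Repeated calls on instances of the same class pay the computation once;
a stream of never-repeated classes pays it on every call.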

If the cache doesn't contain an entry for a class, an entry will
be added, growing the table (and computing the effective method
info) as necessary.

In your example, where you're making single instances of newly-created
classes, there's never a cache hit (and the entries that are added
are never reused, so caching doesn't gain anything.)
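
For reference, here's a hedged sketch of that pattern (not your attached
code; it follows the AMOP initarg conventions, and in CCL the MOP symbols
such as REMOVE-DIRECT-SUBCLASS live in the CCL package):

(defun make-transient-instance (bindings)  ; bindings: ((key . value) ...)
  (let* ((class (make-instance 'standard-class
                               :direct-superclasses
                               (list (find-class 'standard-object))
                               :direct-slots
                               (mapcar (lambda (b) (list :name (car b)))
                                       bindings)))
         (instance (make-instance class)))
    ;; keep STANDARD-OBJECT's direct-subclass list from growing
    (remove-direct-subclass (find-class 'standard-object) class)
    (dolist (b bindings instance)
      (setf (slot-value instance (car b)) (cdr b)))))

Every call here dispatches #'MAKE-INSTANCE, #'INITIALIZE-INSTANCE, etc.
on a class that no cache has seen before, so every lookup misses and
every miss adds an entry that will never be hit.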

Several rounds later, things took substantially longer:

;;; Round 29 ;;;
(GROVE-TO-ANONYMOUS-INSTANCE GROVE) took 3,237,506 microseconds (3.237506 seconds) to run
                     with 4 available CPU cores.
During that period, 3,223,510 microseconds (3.223510 seconds) were spent in user mode
                     11,998 microseconds (0.011998 seconds) were spent in system mode
52,429 microseconds (0.052429 seconds) was spent in GC.
  22,943,168 bytes of memory allocated.
  5,082 minor page faults, 0 major page faults, 0 swaps.

and the profiling info looked like

21336    93.7640  <Compiled-function.FIND-GF-DISPATCH-TABLE-INDEX.0x300040165FFF>

i.e., we're doing almost nothing but wandering through large GF caches,
looking for things that aren't there.

I believe (I'm too lazy to look at the source) that a GF cache will
grow to some fairly large size, then shrink and empty itself.  (Actually,
what happens is that the cache grows to a size of ~64K, then gets
emptied and doesn't grow further.  I think.  Someone else might remember
the scheme better than I do ...)  When a table is nearly full, adding
a new entry can take significant time; when it's nearly empty, new
entries can be added quickly.
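
If you model the table as open-addressed (a guess that's consistent with
the FAST-MOD and EQ-HASH-FIND entries in the profile; the real scheme may
differ), the slowdown near fullness is easy to see - on a miss, the probe
sequence can walk most of the table before finding a free slot:

(defun toy-cache-add (table key value)
  ;; linear-probing insert into a simple-vector of (key . value)
  ;; entries; a miss in a nearly-full table walks almost the whole
  ;; vector, while a miss in a nearly-empty one stops immediately
  (let* ((size (length table))
         (start (mod (sxhash key) size)))
    (dotimes (i size nil)                   ; NIL => table was full
      (let ((idx (mod (+ start i) size)))
        (when (null (svref table idx))
          (setf (svref table idx) (cons key value))
          (return idx))))))

Emptying the table resets probe lengths to ~1, which would explain why
things speed up again right after a cache gets flushed.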

This scheme obviously doesn't work very well for your example: we
get no benefit at all from caching and wind up searching through
large amounts of useless cached information in many cases.  It does
do a pretty good job in more typical cases.

(Somewhere in the part of the MOP that CCL doesn't implement - the
generic function invocation protocol - there's some mechanism for
controlling whether or not caching can be used in cases like this.
For your example not to suffer the sort of random performance
degradation that it does - where the most influential factors
seem to involve how large/full the GF caches for #'MAKE-INSTANCE,
#'ALLOCATE-INSTANCE, #'INITIALIZE-INSTANCE, #'SHARED-INITIALIZE,
and whatever else is being called are - some mechanism for saying
"don't use the GF cache when instances of these classes are involved"
would seem to be necessary.)

As it is, the pattern we tend to see - nearly linear performance
degradation (as caches fill up and grow) followed by random fluctuations
(depending on how full the dispatch tables are for each GF involved) -
seems understandable.  Although the details will vary (and
you may not see these kinds of artifacts elsewhere), you'll probably
find that most implementations will try to optimize cases where
classes are more heavyweight and persistent (and where the universe
of all classes is more nearly static) than is the case in your
example.
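
For contrast, the cache-friendly shape (hypothetical NODE class, just
for illustration) is one long-lived class and many instances, so after
the first call every dispatch is a cache hit:

(defclass node ()
  ((key :initarg :key) (value :initarg :value)))

(defun make-nodes (bindings)
  ;; the dispatch caches for #'MAKE-INSTANCE, #'INITIALIZE-INSTANCE,
  ;; etc. are filled on the first call and hit on every later one
  (mapcar (lambda (b)
            (make-instance 'node :key (car b) :value (cdr b)))
          bindings))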

On Sun, 17 Aug 2008, Boris Smilga wrote:

> Hello.
>
> I would like to ask a question concerning the inner workings of CCL, which was prompted by the following events.
>
> I was recently experimenting with transient object classes, i.e. classes which are created dynamically and exist only as long as they have instances. (This has some affinity with how prototype-based object systems work.) One evident approach here is to create anonymous classes using (MAKE-INSTANCE 'STANDARD-CLASS) and operate on them using MOP capabilities. What I found out, however, was that this approach has serious performance issues.
>
> To abstract away unnecessary details, I conducted the following experiment:
>
> 1. Generate a grove, G, which is a list of conses (KEY . VALUE) where KEYs are symbols and VALUEs are either strings or nested groves. A good grove size for our purposes would be in the thousands of nodes.
>
> 2a. For G (and, recursively, for each of its subgroves), create C: an anonymous STANDARD-CLASS whose DIRECT-SLOTS are named after G's KEYs and which is a subclass of STANDARD-OBJECT.
>
> 2b. Do (REMOVE-DIRECT-SUBCLASS (FIND-CLASS 'STANDARD-OBJECT) C), so that DIRECT-SUBCLASSES of STANDARD-OBJECT do not bloat.
>
> 2c. Create an instance of C and set its slots to the VALUEs consed on G's respective KEYs.
>
> 3. Repeat steps 2a-2c several times (for the same G), measuring the runtime.
>
> The Lisp code for the experiment and the transcript are attached herewith.  To summarize, the runtimes were extremely unstable, rising from 1½ sec. on the 1st iteration to 49 sec. on the 16th, then quickly dropping to 7 sec. and rising again, and so on.  (I ran the test in Darwin console mode to rule out other software appropriating processing cycles, and it doesn't look like the time was being spent on GC.)
>
> Evidently, there is something in the underlying system that causes the runtimes to veer, and I don't quite understand what that is.  I would truly appreciate it if someone who knows the implementation could satisfy my curiosity about what is going on here.  (Whether it can be fixed on the user side is also of no little interest.)
>
> Sincerely,
> - B. Smilga.
>
>

