[Openmcl-devel] slow read-char

Sat Jul 15 13:18:43 PDT 2006

On Sat, 15 Jul 2006, Takehiko Abe wrote:
>> BTW It may not be fair to compare Openmcl and Allegro CL in READ
>> performance, since Allegro is not native-threads on OS X.
>
> I agree. MCL 5.1 does not use native (preemptive) threads either.
>

I think (in hindsight) that locking inside the guts of READ-CHAR
probably counts as a naive implementation.

Paying attention to issues introduced by native threads isn't
generally going to be faster than ignoring those issues, but I don't
think that it should be 2x, 3x, or more slower, either.  It's
desirable that streams not get corrupted when they're accessed from
multiple threads, but accessing streams from multiple threads (a)
is probably very rare in practice and (b) generally needs some sort
of higher-level locking (like :Y in the listener) to avoid total
chaos, anyway.  As we seem to all agree, it makes more sense for
the rare/already hairy case to pay extra and the common case to
pay less.

Getting 2X back (as Takehiko's experiment suggests is possible)
certainly seems like a step in the right direction.  My guess is that
there's probably room for some further improvement before hitting at
least something of a wall: the fact that all streams are Gray streams
means that you'll always be paying -something- (generic function
dispatch, among other things) extra in exchange for having the
ability to subclass and specialize built-in stream types via the
Gray stream protocol.  (I'm not even knocking Gray streams as a
protocol; it's not clear that other stream extension protocols -
like Franz's Simple Streams - are much more compelling.)

There are other approaches that don't necessarily involve
every stream being a Gray stream; if "standard" FILE-STREAMs
were essentially instances of some built-in class, you couldn't
necessarily subclass them but you could still create specialized
FILE-STREAMs via encapsulation. E.g., wrapping a Gray stream
around a built-in stream, like:

;; A Gray stream that encapsulates (rather than subclasses) another stream.
(defclass crlf-writing-stream (fundamental-character-output-stream)
  ((inner-stream :initarg :inner-stream :reader inner-stream)))

(defmethod stream-write-char ((s crlf-writing-stream) char)
   (write-char char (inner-stream s)))

;; Emit a Return before each Newline, so the inner stream sees CRLF.
(defmethod stream-write-char ((s crlf-writing-stream) (char (eql #\Newline)))
   (write-char #\Return (inner-stream s))
   (write-char char (inner-stream s)))

assuming things like:

(defun write-char (char &optional stream)
   (setq stream (decode-stream-argument-if-necessary stream))
   (if (%built-in-stream-p stream)
     (%built-in-write-char char stream)
     (stream-write-char stream char)))

That doesn't strike me as being too onerous, and there might be ways
to arrange things so that subclassing still worked.
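
To make the shape of that idea concrete, here is a toy model in portable
Common Lisp.  It is only a sketch: every TOY- name is made up for
illustration, and a structure holding a string stands in for the
"built-in" stream.  The built-in case gets a cheap type test and a direct
call, and only the encapsulating stream pays for generic function dispatch.

```lisp
;; Hypothetical names throughout; this is not OpenMCL's implementation.

;; Stand-in for a "built-in" stream: a struct wrapping a growable string.
(defstruct toy-built-in-stream
  (buffer (make-array 0 :element-type 'character
                        :adjustable t :fill-pointer 0)))

;; Extension point, playing the role of STREAM-WRITE-CHAR.
(defgeneric toy-stream-write-char (stream char))

(defun toy-write-char (char stream)
  "Write CHAR to STREAM, skipping generic dispatch for built-in streams."
  (if (toy-built-in-stream-p stream)
      ;; Fast path: type test plus a direct buffer write.
      (progn (vector-push-extend char (toy-built-in-stream-buffer stream))
             char)
      ;; Slow path: user-defined streams go through the generic function.
      (toy-stream-write-char stream char)))

;; Encapsulating stream: specializes behavior without subclassing
;; the built-in type.
(defclass toy-crlf-stream ()
  ((inner-stream :initarg :inner-stream :reader inner-stream)))

(defmethod toy-stream-write-char ((s toy-crlf-stream) char)
  (toy-write-char char (inner-stream s)))

(defmethod toy-stream-write-char ((s toy-crlf-stream)
                                  (char (eql #\Newline)))
  (toy-write-char #\Return (inner-stream s))
  (toy-write-char char (inner-stream s)))
```

Writing through a TOY-CRLF-STREAM costs one generic function call per
character plus the fast path on the inner stream; writing directly to
the built-in stream never touches the generic function at all.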

FWIW, I was able to get about a 25% speedup (cut the execution time to
about 75% of what it had been) just by tightening some of the code in
CCL::%IOBLOCK-TYI (and not touching the locking code.)  That's
encouraging, but I also found that a loop (reading 2M characters from
a file) that called CCL::%IOBLOCK-TYI directly was about 15X (1500%)
faster than one that called either READ-CHAR or STREAM-READ-CHAR.
That isn't entirely realistic, but it does suggest that the generic
function dispatch and the accessor method/slot-value calls that happen
in the STREAM-READ-CHAR case add up.  (If a typical call to
CCL::%IOBLOCK-TYI barely does anything, then the locking,
generic-function dispatch, slot lookup, and other overhead are each
only a few times more expensive than "barely doing anything"; when
you add up those costs ... it's not too surprising that we're seeing
that READ-CHAR is several times slower than it could/should be.)
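
For anyone who wants to try this sort of measurement themselves, a
minimal timing loop might look something like the following.  This is a
sketch, not the code I actually ran: TIME-READ-CHAR is a made-up name,
and it only exercises READ-CHAR through the public interface.

```lisp
;; Hypothetical helper; uses only standard Common Lisp.
(defun time-read-char (path)
  "Read every character of the file at PATH with READ-CHAR.
Returns two values: the number of characters read and the elapsed
real time in seconds."
  (with-open-file (in path :direction :input)
    (let ((start (get-internal-real-time))
          (count 0))
      ;; READ-CHAR returns NIL at end of file instead of signalling.
      (loop for ch = (read-char in nil nil)
            while ch
            do (incf count))
      (values count
              (/ (- (get-internal-real-time) start)
                 (float internal-time-units-per-second))))))
```

Running it on a file of a couple of million characters, before and after
a change to the stream code, gives a rough idea of the per-character
cost; the first run may also include the cost of getting the file into
the OS's cache, so it's worth timing it more than once.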

Again, I don't want to suggest that a 15X speedup is realistic, but
trading some flexibility (relatively simple Gray stream subclassing,
maybe) for a large chunk of that speedup sounds like a good deal to
me.