[Openmcl-devel] Why is with-output-to-string so slow?

Gary Byers gb at clozure.com
Wed Oct 27 20:26:22 PDT 2010



On Wed, 27 Oct 2010, Ron Garret wrote:

> I guess I didn't state my question clearly enough.  Yes, obviously opening a zillion file streams is going to be slow because you have to interact with the OS.  But to create a string output stream you don't have to interact with the OS.  Why is creating a string output stream so much slower than allocating an adjustable vector with a fill pointer?
>
> rg

Um, it isn't, necessarily.  For instance, allocating about 40 GB takes longer
than allocating < 1/2 GB (of course most of the difference is in GC overhead):


? (time (dotimes (i 1000000) (make-array 10000 :element-type 'character :fill-pointer 0)))
(DOTIMES (I 1000000) (MAKE-ARRAY 10000 :ELEMENT-TYPE 'CHARACTER :FILL-POINTER 0)) took 65,657,637 microseconds (65.657640 seconds) to run
                     with 2 available CPU cores.
During that period, 64,616,198 microseconds (64.616200 seconds) were spent in user mode
                     1,596,865 microseconds (1.596865 seconds) were spent in system mode
63,710,373 microseconds (63.710373 seconds) was spent in GC.
  40,064,000,000 bytes of memory allocated.
  97 minor page faults, 0 major page faults, 0 swaps.
NIL

;; The GC's invoked >2400 times in this case.

? (time (dotimes (i 1000000) (make-string-output-stream)))
(DOTIMES (I 1000000) (MAKE-STRING-OUTPUT-STREAM)) took 2,627,754 microseconds (2.627754 seconds) to run
                     with 2 available CPU cores.
During that period, 2,624,791 microseconds (2.624791 seconds) were spent in user mode
                     7,602 microseconds (0.007602 seconds) were spent in system mode
685,639 microseconds (0.685639 seconds) was spent in GC.
  464,000,000 bytes of memory allocated.
  32 minor page faults, 0 major page faults, 0 swaps.
NIL
?

  (time (dotimes (i 1000000) (make-array 10 :element-type 'character :fill-pointer 0)))
(DOTIMES (I 1000000) (MAKE-ARRAY 10 :ELEMENT-TYPE 'CHARACTER :FILL-POINTER 0)) took 272,446 microseconds (0.272446 seconds) to run
                     with 2 available CPU cores.
During that period, 273,350 microseconds (0.273350 seconds) were spent in user mode
                     50 microseconds (0.000050 seconds) were spent in system mode
141,570 microseconds (0.141570 seconds) was spent in GC.
  96,000,000 bytes of memory allocated.
NIL

On this system, the break-even point seems to be around 400
(MAKE-STRING-OUTPUT-STREAM takes about as long as making a 400-element
string with a fill pointer.)

It seems that the (amortized) cost of allocating something is
proportional to how big it is.  Other factors include the cost of
initialization.  A STRING-OUTPUT-STREAM is both larger and more
complicated than a 10-element string with a fill-pointer.  There are
some auxiliary structures in the STRING-OUTPUT-STREAM that're similar
to those used in other kinds of streams, and the net effect of this
scheme is to make I/O to string streams faster than it might otherwise
be (without requiring that every function involved in doing I/O know
much about their internals.)

It's probably possible to make STRING-OUTPUT-STREAMs smaller and simpler
(and, all other things being equal, make output to them slower in some cases.)
I can't really think of a good reason to do that; as near as I can tell, they're
something less than 500 bytes apiece (+ string size) and much of that can get
recycled.

Does the cost of creating/closing/recycling a STRING-OUTPUT-STREAM exceed the
cost of writing a few characters to one ?  For the right value of "a few",
almost certainly.

For more typical uses (such as '(format nil ...)', where anywhere from several
dozen to a few hundred characters get written to a short-lived STRING-OUTPUT-STREAM
and recycling can take effect, is the stream-creation overhead significant ?  I
don't think so.

Is anyone else having bad Donald Rumsfeld flashbacks ?  I can't answer that ...

------
Here's a partial, naive but not obviously horrible implementation of something like
a STRING-OUTPUT-STREAM that uses an initially small string with a fill-pointer:

? (defclass my-string-output-stream (fundamental-character-output-stream)
     ((string :accessor %string :initform (make-array 10 :element-type 'character :fill-pointer 0 :adjustable t))))
#<STANDARD-CLASS MY-STRING-OUTPUT-STREAM>
? (defmethod stream-write-char ((stream my-string-output-stream) char)
   (vector-push-extend char (%string stream))
   char)

? (defmethod stream-write-string ((stream my-string-output-stream) string &optional (start 0) end)
   (setq end (ccl::check-sequence-bounds string start end))
    (let* ((s (%string stream)))
      (dotimes (i (length string) string)
        (vector-push-extend (char string i) s))))

Let's do something that takes enough time to measure fairly accurately (at least
in a 64-bit CCL) but doesn't cons so much that we get overwhelmed by GC time:
create a stream capable of character output 100 times and write a short string
to that stream 10 times.  I'd guess that this (writing 60 characters) is in
the neighborhood of "typical" use of a STRING-OUTPUT-STREAM:


? (time (dotimes (i 100)
         (let* ((s (make-instance 'my-string-output-stream)))
           (dotimes (i 10) (write-string "hello" s)))))
(DOTIMES (I 100) (LET* ((S (MAKE-INSTANCE 'MY-STRING-OUTPUT-STREAM))) (DOTIMES (I 10) (WRITE-STRING "hello" S)))) took 1,262 microseconds (0.001262 seconds) to run

? (time (dotimes (i 100)
         (let* ((s (make-string-output-stream)))
           (dotimes (i 10) (write-string "hello" s)))))
(DOTIMES (I 100) (LET* ((S (MAKE-STRING-OUTPUT-STREAM))) (DOTIMES (I 10) (WRITE-STRING "hello" S)))) took 453 microseconds (0.000453 seconds) to run
                     with 2 available CPU cores.
During that period, 412 microseconds (0.000412 seconds) were spent in user mode
                     0 microseconds (0.000000 seconds) were spent in system mode
  92,800 bytes of memory allocated.

The second version's almost 3X faster and (since it's not closing the streams,
it's not benefiting from any recycling of stream resources.)  If we do close
the streams in the second case, we take about the same time but cons a total
of about 5K.

It's certainly possible to improve on the naive implementation or to implement
string streams that're faster than CCL's.  It's very likely that in at least
the first case you'd need to make the data structures richer (and larger) and
someday someone will ask you why it takes so much longer to create a 
MY-STRING-OUTPUT-STREAM than it does to create a string with a fill-pointer.
You'll already know the answer !



More information about the Openmcl-devel mailing list