[Openmcl-devel] Why is with-output-to-string so slow?

Gary Byers gb at clozure.com
Wed Oct 27 16:37:13 PDT 2010



On Wed, 27 Oct 2010, Ron Garret wrote:

> WITH-OUTPUT-TO-STRING seems to have an inordinate amount of overhead.  Baseline benchmark:
>
> ? (setf devnull (open "/dev/null" :direction :output :if-exists :overwrite))
> #<BASIC-CHARACTER-OUTPUT-STREAM ISO-8859-1 (CHARACTER-SPECIAL/12) #x30200593DFCD>
> ? (time (dotimes (i 1000000) (princ i devnull)))
> ;Compiler warnings :
> ;   In an anonymous lambda form at position 36: Undeclared free variable DEVNULL
> (DOTIMES (I 1000000) (PRINC I DEVNULL)) took 1,054,338 microseconds (1.054338 seconds) to run
>
>
> Capturing the output in strings is almost four times slower:
>
> ? (time (dotimes (i 1000000) (with-output-to-string (s) (princ i s))))
> (DOTIMES (I 1000000) (WITH-OUTPUT-TO-STRING (S) (PRINC I S))) took 3,952,464 microseconds (3.952464 seconds) to run
>

Well, creating 1000000 STRING-OUTPUT-STREAMs and writing an integer to
each of them is about 4x slower than creating a single FILE-STREAM to
/dev/null and writing 1000000 integers to it.

To compare apples to apples, we can compare the time it takes to create
1000000 FILE-STREAMs and write an integer to each of them.  On a 2.2GHz
Core 2 Duo running FreeBSD, I get:

? (time (dotimes (i 1000000) (with-open-file (s "/dev/null" :direction :output :if-exists :overwrite) (princ i s))))
(DOTIMES (I 1000000) (WITH-OPEN-FILE (S "/dev/null" :DIRECTION :OUTPUT :IF-EXISTS :OVERWRITE) (PRINC I S))) took 128,002,346 microseconds (128.002350 seconds) to run
                     with 2 available CPU cores.

Not surprisingly, creating and using a FILE-STREAM is fairly expensive
(even if writes to that stream are NOPs, we still have to persuade the
OS to open and close a file descriptor 1000000 times.)

Comparing oranges to oranges is a little more complicated.  One complicating
factor is that printing integers via PRINC conses.  (I don't know why and I'm
not sure that it should, but printing lots of integers conses a lot).  As in:

? (defvar *devnull* (open "/dev/null" :direction :output :if-exists :overwrite))

*DEVNULL*
? (time (dotimes (i 1000000) (princ i *devnull*)))
(DOTIMES (I 1000000) (PRINC I *DEVNULL*)) took 2,327,625 microseconds (2.327625 seconds) to run
                     with 2 available CPU cores.
During that period, 2,245,475 microseconds (2.245475 seconds) were spent in user mode
                     84,842 microseconds (0.084842 seconds) were spent in system mode
376,831 microseconds (0.376831 seconds) was spent in GC.
  256,000,544 bytes of memory allocated.
  97 minor page faults, 0 major page faults, 0 swaps.
NIL

That's around 256 bytes per integer.  Compare that to:

? (time (dotimes (i 1000000) (write-char #\space *devnull*)))
(DOTIMES (I 1000000) (WRITE-CHAR #\  *DEVNULL*)) took 99,138 microseconds (0.099138 seconds) to run
                     with 2 available CPU cores.
During that period, 76,754 microseconds (0.076754 seconds) were spent in user mode
                     12 microseconds (0.000012 seconds) were spent in system mode
  0 minor page faults, 1 major page faults, 0 swaps.

Compared to:

? (time (progn (with-output-to-string (s) (dotimes (i 1000000) (write-char #\space s))) nil))
(PROGN (WITH-OUTPUT-TO-STRING (S) (DOTIMES (I 1000000) (WRITE-CHAR #\  S))) NIL) took 122,553 microseconds (0.122553 seconds) to run
                     with 2 available CPU cores.
During that period, 122,552 microseconds (0.122552 seconds) were spent in user mode
                     0 microseconds (0.000000 seconds) were spent in system mode
26,528 microseconds (0.026528 seconds) was spent in GC.
  14,486,432 bytes of memory allocated.
NIL

Notice that we're still consing a lot here, because:

a) this form of WITH-OUTPUT-TO-STRING returns a 1000000-element string (which
    the form above ignores, to keep Emacs from trying to print it ...)
b) the internal string used by the STRING-OUTPUT-STREAM probably has to grow
    several times before it's capable of holding 10000000 characters (where "grow"
    means "be replaced with a larger string".)  We try to recycle these strings
    in many cases; the GC tries to limit this recycling.  (Note also that the
    time difference between this case and writing to /dev/null is almost identical
    to the GC time.)

If you do something like:

? (defvar *s* (make-string-output-stream))
*S*
? (time (dotimes (i 1000000) (write-char #\space *s*)))
;;; Don't care here, just want the string in *s* to "grow".

? (progn (get-output-stream-string *s*) ; "reset" the stream
          (time (dotimes (i 1000000) (write-char #\space *s*))))

there shouldn't be any consing and the times are almost identical to the
"writing to /dev/null" case (a few hundred NOP syscalls vs some presumably
greater number of cache misses.)

As one would expect.


>
> Simply allocating the extra storage does not account for the difference:
>
> ? (defun foo () (make-array 10 :element-type 'character :fill-pointer t :adjustable t))
> FOO
> ? (time (dotimes (i 1000000) (foo) (princ i devnull)))
> ;Compiler warnings :
> ;   In an anonymous lambda form at position 42: Undeclared free variable DEVNULL
> (DOTIMES (I 1000000) (FOO) (PRINC I DEVNULL)) took 1,453,295 microseconds (1.453295 seconds) to run
>
>
> What accounts for all the extra overhead?  Is there any way to speed it up?
>
> Thanks,
> rg
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>



More information about the Openmcl-devel mailing list