[Openmcl-devel] Speed of concatenate vs format
Daniel Dickison
danieldickison at gmail.com
Fri May 1 18:36:44 PDT 2009
Gary,
I really appreciate your detailed and insightful response (as
always)! I hadn't thought about the difference between char and
schar, and I didn't realize that it would make that much of a
difference. Sure enough, your concatenate-to-string is significantly
faster than format or concatenate.
I'm on a different machine now, but for concatenating two 2000-
character simple-strings 100,000 times, I get 4.0 vs 5.4 vs 10.8
seconds for concat-to-string, format, and concatenate, respectively.
With 100-character strings, it's about 0.23 vs 0.83 vs 0.63.
I imagine concatenating short simple-strings is quite a common
operation, so a compiler-macro to automate this optimization might
help a lot of applications. Regardless, using concatenate-to-string
explicitly should give a boost. Thanks again.
Daniel
On May 1, 2009, at 7:12 PM, Gary Byers wrote:
> Looking at some profiling output for the (CONCATENATE 'STRING ...)
> case:
>
>
> samples % image name app name
> symbol name
> 20960 22.3674 lx86cl64 lx86cl64
> misc_set_common
> 15148 16.1651 profccl64 profccl64
> <Compiled-function.CONCAT-TO-SIMPLE*.0x30004016C01F>
> 12780 13.6381 lx86cl64 lx86cl64
> misc_ref_common
> 12281 13.1056 lx86cl64 lx86cl64
> _SPbuiltin_aset1
> 11181 11.9317 lx86cl64 lx86cl64
> _SPsubtag_misc_set
> 8086 8.6289 lx86cl64 lx86cl64
> _SPbuiltin_aref1
> 7731 8.2501 lx86cl64 lx86cl64
> _SPsubtag_misc_ref
>
> shows that the biggest culprits are some combination of
> CCL::CONCAT-TO-SIMPLE* (which may be a bit of a dog; I haven't looked
> at it yet ...) and things that implement AREF and (SETF AREF) on
> 1-dimensional arrays whose element type isn't known at compile-time.
> If the element-type of the result sequence in a call to CONCATENATE is
> known at compile-time, then those things - which are written in
> assembler and aren't grossly inefficient, for what they do - then
> those things can be replaced with simpler, inlined, type-specific
> things. It certainly takes FORMAT or PRINC a while to get to the
> point where it realizes that it's transferring characters from one
> string to another than it should take some flavor of CONCATENATE, but
> the loop that actually transfers those characters - 4000 references
> and 4000 assignments on each iteration - can use those simpler,
> type-specific reference and assignment operations that the more
> generic CONCATENATE and its aux functions can be. (Saving the extra
> generic AREF overhead 4000 times and the generic (SETF AREF) overhead
> another 4000 times adds up; you probably wouldn't see results like
> this with shorter strings.)
>
> Some uses of CONCATENATE are more common than others, and using
> concatenate with a result-type that's equivalent to STRING (and with
> source arguments that're very often STRINGs and likely SIMPLE-STRINGs)
> may be the most common usage of CONCATENATE. A fairly straightforward
> way of trying to exploit that is:
>
> (defun concatenate-to-string (&rest sequences)
> (declare (dynamic-extent sequences))
> (let* ((len 0))
> (declare (fixnum len))
> (dolist (seq sequences)
> (incf len (length seq)))
> (let* ((result (make-string len))
> (out 0))
> (declare (simple-string result) (fixnum out)
> (optimize (speed 3) (safety 0)))
> (dolist (seq sequences result)
> (etypecase seq
> (simple-string
> (locally (declare (simple-string seq))
> (dotimes (i (length seq))
> (setf (schar result out) (schar seq i))
> (incf out))))
> (string
> (let* ((slen (length seq)))
> (declare (fixnum slen))
> (multiple-value-bind (data offset)
> (ccl::array-data-and-offset seq)
> (declare (fixnum offset) (simple-string data))
> (dotimes (i slen)
> (setf (schar result out) (schar seq offset))
> (incf offset)
> (incf out)))))
> (vector
> (dotimes (i (length seq))
> (setf (schar result out) (aref seq i))
> (incf out)))
> (list
> (dolist (elt seq)
> (setf (schar result out) elt)
> (incf out))))))))
>
> (I'm sure that this could be made faster still by BLTing things around
> and in other ways.)
>
> Since a sequence function result that's of type STRING is always a
> SIMPLE-STRING,
> that recognizes that it can always use (SETF SCHAR) to store into
> the result. If
> we're correct in assuming that the source sequences are often
> STRINGs (and often
> SIMPLE-STRINGs), the function tries to arrange to use SCHAR to
> access source
> characters in those cases. That's likely to be substantially faster
> than the
> more generic CONCATENATE (though it could surely call this code if
> it recognizes
> that the result type is appropriate, and a compiler macro could do
> that recognition
> at compile time.)
>
> This should also be faster than using FORMAT or PRINC to write to a
> STRING-OUTPUT-STREAM, but when the strings involved are large may
> not be as
> much faster as one might expect: the dominant inner loop in the case
> of doing
> WRITE-STRING to a STRING-OUTPUT-STREAM is pretty much the same SCHAR/
> (SETF SCHAR)
> stuff as happens above.
>
> In all of these cases, the memory-management overhead is pretty
> significant (and
> relatively constant). If I'm doing the math right and factoring
> that out correctly,
> then it looks like CONCATENATE-TO-STRING is around 7X faster than
> CONCATENATE on
> your test case (and a few X faster than the WRITE-STRING versions.)
> For shorter
> strings, the speedup isn't likely to be that dramatic, but it may
> still noticeable
> and it's unlikely that CONCATENATE-TO-STRING would ever be slower
> than the other
> approaches.
>
>
>
>
> On Fri, 1 May 2009, Daniel Dickison wrote:
>
>> I've been working with some legacy code that contains many instances
>> of --
>>
>> (format nil "~a~a" string1 string2)
>>
>> for concatenating strings. This code is in some code that is a
>> performance bottleneck, and I thought, surely, this would be more
>> efficient if rewritten as --
>>
>> (concatenate 'string string1 string2)
>>
>> but this turns out not to be the case. In fact, using format for
>> string concatenation is significantly faster. So assuming the
>> overhead for parsing the format control string is negligible, the
>> format version probably becomes two calls to princ wrapped in a with-
>> output-to-string. How come this is so much faster than concatenate?
>>
>> Now that I've done the profiling, I know which code to use, but I'm
>> still really curious why this is the case. I would appreciate any
>> insight. Below is the results of some quick-and-dirty profiling on
>> CCL Version 1.3-r11987M (DarwinX8664), an iMac Core 2 Duo.
>>
>> Thanks!
>>
>> Daniel
>>
>>
>> CL-USER> (defvar *str* (make-string 2000 :initial-element #\a))
>> *STR*
>>
>> CL-USER> (time (dotimes (i 100000)
>> (format nil "~a~a" *str* *str*)))
>> (DOTIMES (I 100000) (FORMAT NIL "~a~a" *STR* *STR*)) took 5,547,598
>> microseconds (5.547598 seconds) to run
>> with 2 available CPU cores.
>> During that period, 3,288,825 microseconds (3.288825 seconds) were
>> spent in user mode
>> 770,732 microseconds (0.770732 seconds) were
>> spent in system mode
>> 1,076,633 microseconds (1.076633 seconds) was spent in GC.
>> 1,606,416,016 bytes of memory allocated.
>> NIL
>>
>> CL-USER> (time (dotimes (i 100000)
>> (concatenate 'string *str* *str*)))
>> (DOTIMES (I 100000) (CONCATENATE 'STRING *STR* *STR*)) took
>> 10,373,570
>> microseconds (10.373570 seconds) to run
>> with 2 available CPU cores.
>> During that period, 8,092,791 microseconds (8.092791 seconds) were
>> spent in user mode
>> 878,887 microseconds (0.878887 seconds) were
>> spent in system mode
>> 1,097,819 microseconds (1.097819 seconds) was spent in GC.
>> 1,601,600,000 bytes of memory allocated.
>> NIL
>>
>> CL-USER> (time (dotimes (i 100000)
>> (with-output-to-string (s)
>> (princ *str* s)
>> (princ *str* s))))
>> (DOTIMES (I 100000) (WITH-OUTPUT-TO-STRING (S) (PRINC *STR* S) (PRINC
>> *STR* S))) took 5,284,978 microseconds (5.284978 seconds) to run
>> with 2 available CPU cores.
>> During that period, 3,153,681 microseconds (3.153681 seconds) were
>> spent in user mode
>> 752,712 microseconds (0.752712 seconds) were
>> spent in system mode
>> 1,087,328 microseconds (1.087328 seconds) was spent in GC.
>> 1,606,416,016 bytes of memory allocated.
>>
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> http://clozure.com/mailman/listinfo/openmcl-devel
>>
>>
More information about the Openmcl-devel
mailing list