[Openmcl-devel] A faster read-line

Wed Oct 20 10:13:48 PDT 2010

In the most general case, READ-LINE is something like:

   (let* ((temp (make-array 80 :element-type 'character
                               :adjustable t :fill-pointer 0)))
     (loop
       (let* ((ch (read-char stream nil nil)))
         (cond ((null ch) (return (values (copy-seq temp) t)))
               ((eql ch #\newline) (return (values (copy-seq temp) nil)))
               (t (vector-push-extend ch temp))))))

where a string-with-fill-pointer might or might not be the best way
to accumulate characters.
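
As one alternative way to accumulate the characters, here is a minimal
sketch that collects them with WITH-OUTPUT-TO-STRING instead of a string
with a fill pointer.  The function name is made up for illustration, this
is not how CCL implements READ-LINE, and whether it is any faster than the
loop above depends on the implementation.

```lisp
(defun read-line-via-string-stream (stream)
  "Illustrative only: return the next line from STREAM and a second value
that is true if EOF was reached before a newline was seen."
  (let ((missing-newline-p t))
    (values
     (with-output-to-string (out)
       (loop for ch = (read-char stream nil nil)
             while ch
             do (when (eql ch #\Newline)
                  (setf missing-newline-p nil)
                  (return))
                (write-char ch out)))
     ;; Evaluated after the string has been collected, since VALUES
     ;; evaluates its arguments left to right.
     missing-newline-p)))
```

Like the loop above, this returns an empty string at EOF rather than
signalling an error or returning an eof-value the way the real READ-LINE
does.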

If the stream is buffered (and you know things about how it's buffered),
no newline translation is going on, and the mapping between octets and
characters is simple enough, you can do better: you can look for an octet
with value #xa (linefeed) in the buffer.  If you find one, you know how
many octets are used to encode the string (and therefore the length of the
string in characters), so you can, for instance, allocate the result at
its final size.  That and the other things you can do in this case can be
a lot faster than the "just collect characters until EOF or newline"
approach.
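
A rough sketch of that idea, assuming the buffered octets are available as
a vector of (unsigned-byte 8) and that the encoding is iso-8859-1, so each
octet maps directly to the character with the same code.  The function and
its arguments are hypothetical and are not CCL's internal buffer interface;
refilling the buffer when no newline is found is left out.

```lisp
(defun line-from-octets (buffer start end)
  "Illustrative only: look for a linefeed octet in BUFFER between START
and END.  If one is found, return the line as a string and the index just
past the linefeed; otherwise return NIL."
  (let ((pos (position #xa buffer :start start :end end)))
    (when pos
      ;; The line length is known up front, so the string is allocated
      ;; once at its final size.
      (let ((line (make-string (- pos start))))
        (loop for i from start below pos
              for j from 0
              ;; iso-8859-1 assumption: octet value = character code.
              do (setf (schar line j) (code-char (aref buffer i))))
        (values line (1+ pos))))))
```

A caller would keep the second value as the new START for the next line
and fall back to something slower when NIL comes back.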

The code used in that case (iso-8859-1 encoding, unix line-termination) is
faster than the general case; it's still likely to be slower than #_fgets
(read at most N octets into a preallocated buffer, confuse concepts
"characters" and "octets", etc.)

There's a lot of room in between the very simple iso-8859-1/unix case and
the general one (e.g., ASCII/unix is almost as simple as iso-8859-1), but
CCL doesn't try to do anything special to handle those cases.  Most of those
special things involve trying to determine whether there's a newline in
the buffer, which depends on what character(s) are used to represent #\newline
and on what octet(s) are used to represent those characters.
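
To make that dependency concrete, here is a small sketch that maps a few
encoding and line-termination combinations to the octets one would have to
search for.  The function name and the keywords it accepts are chosen for
illustration and cover only a handful of cases; it is not a CCL function.

```lisp
(defun newline-octets (encoding line-termination)
  "Illustrative only: return a list of the octets that represent a line
terminator for the given ENCODING and LINE-TERMINATION keywords."
  (let ((codes (ecase line-termination
                 (:unix    '(#xa))         ; LF
                 (:macos   '(#xd))         ; CR
                 (:windows '(#xd #xa)))))  ; CR LF
    (ecase encoding
      ;; One octet per character for these code points.
      ((:ascii :iso-8859-1 :utf-8) codes)
      ;; Two octets per character, high octet first or last.
      (:utf-16be (loop for c in codes append (list 0 c)))
      (:utf-16le (loop for c in codes append (list c 0))))))
```

For the multi-octet encodings a plain octet search is not enough on its
own: in UTF-16, for example, a match only counts if it starts on a
2-octet boundary.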

On Tue, 19 Oct 2010, Ron Garret wrote:

> Thanks.
>
> It seems to be unicode conversion that is taking all the time.  Python yields similar disparities depending on whether you're reading a file opened with open or codecs.open.
>
> READ-SEQUENCE is nice and zippy.
>
> rg
>
> On Oct 19, 2010, at 4:36 PM, Greg Pfeil wrote:
>
>> On 19 Oct 2010, at 19:27, Ron Garret wrote:
>>
>>> Without doing anything special, read-line is, empirically, about fifteen times slower than the equivalent C code, even with :external-format :ascii.  (My benchmark is comparing (loop while (read-line f nil nil)) with wc.)  Lisp also seems to be CPU bound during read-line.  What is it doing with all those cycles?  Are there any easy ways to speed this up?  What's the fastest way to ingest a file in CCL?
>>
>> I don't know what CCL is doing, but I remember seeing this forever ago: http://www.ymeme.com/slurping-a-file-common-lisp-83.html