[Openmcl-devel] Binary IO...

Gary Byers gb at clozure.com
Tue Jun 9 05:39:16 PDT 2009

There are a couple of approaches to this; they're probably described
in some detail in the ccl/doc/release-notes* files but the documentation
may only mention them in passing (if at all).

First of all, an "ivector" is a simple one-dimensional array that's
specialized to a numeric or character element type.

CCL:MAKE-HEAP-IVECTOR element-count element-type

where ELEMENT-COUNT is an unsigned integer and ELEMENT-TYPE is a type
specifier - is essentially like

(make-array element-count :element-type element-type)

except for the fact that the array is allocated in foreign memory (never
scanned or moved by the GC).

CCL:MAKE-HEAP-IVECTOR returns 3 values: a vector (allocated in foreign
memory), a MACPTR (which points to the 0th element of the vector), and the
size of the vector in 8-bit bytes.

The vector's contents have undefined values ("whatever was there").
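
A minimal sketch of those three values, assuming Clozure CL and the
return order given in the CCL manual (vector, then MACPTR, then size).
The function name HEAP-IVECTOR-DEMO and the element count are made up
for illustration:

```lisp
;; Sketch only; requires Clozure CL.  HEAP-IVECTOR-DEMO is a
;; hypothetical name and 16 is an arbitrary element count.
(defun heap-ivector-demo ()
  (multiple-value-bind (vector pointer size-in-bytes)
      (ccl:make-heap-ivector 16 '(unsigned-byte 8))
    ;; The contents start out undefined, so initialize before reading.
    (fill vector 0)
    (setf (aref vector 3) 42)
    ;; The MACPTR addresses the same storage as the vector, so the
    ;; byte at offset 3 is the element that was just stored.
    (prog1 (list (length vector)
                 size-in-bytes
                 (ccl:%get-unsigned-byte pointer 3))
      ;; Release the foreign memory; the GC never will.
      (ccl:dispose-heap-ivector vector))))
```

If anything in the body signals an error the vector is leaked, so
real code would probably want UNWIND-PROTECT around it.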

CCL:STREAM-DEVICE stream direction

DIRECTION should be one of :INPUT or :OUTPUT; STREAM can be any stream.
For streams that're associated with file descriptors (sockets and file
streams), STREAM-DEVICE returns that file descriptor (or "file handle"
as an integer on Windows.)


;; MAKE-HEAP-IVECTOR returns the vector first, then the MACPTR.
(multiple-value-bind (vector pointer)
     (ccl:make-heap-ivector (ash 8 20) '(unsigned-byte 8))
   ;; 'vector' should behave like a regular vector
   (dotimes (i (length vector))
     (setf (aref vector i) (logand i #xff)))
   (with-open-file (f "some-path" :direction :output ...)
     (let* ((fd (ccl:stream-device f :output)))
       (dotimes (i 40)
         (#_write fd pointer (ash 8 20)))))
   (with-open-file (f "some-path" :direction :input ...)
     ;; There can be some cases where an input stream may
     ;; read from the stream before being asked to.  Seek
     ;; to the start of the file.
     (let* ((fd (ccl:stream-device f :input)))
       (#_lseek fd 0 #$SEEK_SET)
       (dotimes (i 40)
         (#_read fd pointer (ash 8 20)))))
   (values pointer vector))

That should be much faster than the version that uses WRITE-SEQUENCE
and READ-SEQUENCE, because it doesn't have to copy bytes between
the stream's buffer (allocated with MAKE-HEAP-IVECTOR) and the sequence.

Since a "heap ivector" isn't even seen by the GC, it'll exist until
the end of a session (it's not meaningfully preserved by SAVE-APPLICATION.)
If there's a well-defined point in time at which you're done with it, you
can explicitly dispose of the vector by doing:

(CCL:DISPOSE-HEAP-IVECTOR ivector) ; where ivector is the vector returned
                                    ; by MAKE-HEAP-IVECTOR

The results of referring to a heap-ivector after it's been disposed of
are undefined.  (Roughly the same as referring to memory allocated by
#_malloc after that memory's been #_free'd.)
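
When the "well-defined point in time" is just the end of some block of
code, one way to make sure the vector gets disposed of is to wrap the
allocation in UNWIND-PROTECT.  This is only a sketch:
CALL-WITH-HEAP-IVECTOR is a hypothetical helper (it's not part of CCL),
and it assumes MAKE-HEAP-IVECTOR returns the vector as its first value,
as the CCL manual describes.

```lisp
;; Hypothetical helper, not part of CCL.  Requires Clozure CL.
(defun call-with-heap-ivector (function element-count element-type)
  "Allocate a heap ivector, call FUNCTION with the vector, its MACPTR
and its size in bytes, and dispose of the vector on the way out."
  (multiple-value-bind (vector pointer size)
      (ccl:make-heap-ivector element-count element-type)
    (unwind-protect
         (funcall function vector pointer size)
      ;; Runs on normal return and on non-local exit alike.
      (ccl:dispose-heap-ivector vector))))

;; Usage: the vector must not escape from the function, since it's
;; gone by the time CALL-WITH-HEAP-IVECTOR returns.
(defun sum-of-sevens ()
  (call-with-heap-ivector
   (lambda (vector pointer size)
     (declare (ignore pointer size))
     (fill vector 7)
     (reduce #'+ vector))
   4 '(unsigned-byte 8)))
```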

The "heap ivector" mechanism works reasonably well for ivectors that
have well-defined (and relatively long) lifetimes.  It's not necessary
to inhibit the GC in order to pass a pointer to their first element to
foreign code (their address is guaranteed not to change.)  Foreign code
that might cache that address can safely do so.

It's also possible to temporarily inhibit the GC and execute code with
a pointer to the current address of an arbitrary ivector:

(CCL:WITH-POINTER-TO-IVECTOR (ptr ivector) &body body)

temporarily disables the GC, binds PTR to a pointer to the (current)
address of the first element of the ivector IVECTOR, and executes BODY.
In general, memory allocation requests that'd otherwise cause the GC
to be invoked may be satisfied by obtaining more memory from the OS if
the GC is inhibited.  The chances of this happening (and leading to
a worst-case scenario of uncontrolled heap growth) can be minimized
if the BODY doesn't cons much (and if other threads don't cons much),
but it's very hard to quantify what "much" means.

If the pointer PTR is passed to foreign code, that code shouldn't cache
the pointer or otherwise try to use it after the WITH-POINTER-TO-IVECTOR
form exits; all that's guaranteed is that the vector won't move (and
therefore the pointer will remain valid) during the extent of the form.
Outside the form, there's no such guarantee.
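
As a rough sketch of that second approach, here's an ordinary
(GC-managed) octet vector being handed to write(2) without going
through a stream buffer.  It uses the macro under the name the CCL
manual gives it, CCL:WITH-POINTER-TO-IVECTOR.  WRITE-OCTETS-TO-FD is a
hypothetical name, and the sketch assumes a Unix-like system and makes
no attempt to handle short writes or errors.

```lisp
;; Hypothetical helper, not part of CCL.  Requires Clozure CL with
;; its interface database available for the #_write reader macro.
(defun write-octets-to-fd (fd octets)
  "Write the (simple-array (unsigned-byte 8) (*)) OCTETS to the file
descriptor FD with a single write(2) call.  Returns whatever #_write
returns: the number of bytes written, or -1 on error."
  ;; The GC is inhibited only while the body runs, so keep it short
  ;; and don't let the foreign side hold on to PTR.
  (ccl:with-pointer-to-ivector (ptr octets)
    (#_write fd ptr (length octets))))
```

FD would typically come from (CCL:STREAM-DEVICE stream :OUTPUT).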

On Mon, 8 Jun 2009, Jon S. Anthony wrote:

> Hmmmm, forgot about the GC again (I suppose that is as much a good thing
> as a bad thing - forgetting about it - or more exactly, what it does -
> is sort of the point...)
> I think your analysis is exactly right and the behavior pretty much
> exactly what is needed in the absence of any "tuning".  On the subject
> of which -
> On Sun, 2009-06-07 at 18:04 -0600, Gary Byers wrote:
>> A stream's buffer is nailed down in foreign memory, so we can safely
>> read from and write to it without worrying about the GC moving it
>> around
> ...
>> There are ways to inhibit the GC, obtain the (absolute, non-relocatable)
>> address of the vector, and do I/O directly (bypassing the buffer).  Whether
>> that's better overall depends on what the cost of inhibiting the GC would
>> be (which in turn depends on what kind of consing activity is going on
>> in other threads.)
> It is straightforward to "pin" a vector like this in ACL, when creating
> it, by essentially telling the memory/GC machinery to just place it (on
> creation) in an unmoving tenured area, and thereby be assured that the
> GC won't be moving it.  You don't need to "inhibit" the GC after it is
> created (and pinned).  At which point, you effectively have the "nailed
> down vector", as you say.  You indicate something like this is doable in
> CCL, any info or pointers for that?  I'm guessing (well, hoping) that
> any GC inhibition here will only be upfront temporary as well.
> Thanks again!
> /Jon
