[Openmcl-devel] Binary IO...

Jon S. Anthony j-anthony at comcast.net
Tue Jun 9 07:28:44 PDT 2009


This appears to be pretty much exactly what I need.  Since these vectors
behave like "regular vectors" in normal code, I really don't even need
the capabilities of WITH-HEAP-IVECTOR (at least I don't now think
so...).  The only thing left is the ability to have (or simulate) a
bidirectional stream with this.

So, is it legitimate to do this sort of thing (basically use the same
stream for reading and writing)?  Or can there be two separate streams
open at the same time for the same file (one for input, one for output)?

(defun open-store (filespec)
  ...
  (multiple-value-bind (pointer vector)
      (ccl:make-heap-ivector (ash 8 20) '(unsigned-byte 8))
    (save-in-page-vector pointer vector))

  (multiple-value-bind (pointer vector)
      (ccl:make-heap-ivector (ash 8 20) '(unsigned-byte 8))
    (save-out-page-vector pointer vector))
  ...
  (save-stream-in-right-place
   (open "./test.bin"
	 :element-type '(unsigned-byte 8)
	 :direction :io ; <<<------- input and output
	 :if-exists :overwrite
	 :if-does-not-exist :create))
  ...)
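
For reference, the sizes involved here are easy to check at the REPL. The sketch below assumes a fixed-size page layout in which page N starts at byte N * page-size. +PAGE-SIZE+, PAGE-OFFSET and PAGE-COUNT are hypothetical names used only for illustration; they are not part of CCL or of the code above.

```lisp
;; Hypothetical helpers, assuming fixed-size pages laid out back to back.
(defconstant +page-size+ (ash 8 20))   ; 8 * 2^20 = 8388608 bytes (8 MiB)

(defun page-offset (page-number)
  "Byte offset in the file at which page PAGE-NUMBER (0-based) starts."
  (* page-number +page-size+))

(defun page-count (file-length)
  "Number of pages needed to hold FILE-LENGTH bytes (the last may be partial)."
  (ceiling file-length +page-size+))
```

With 8 MiB pages, the 40 writes in the quoted example further down would produce a file of 335544320 bytes (320 MiB).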

...

(defun read-page (page)
  (let* ((f (get-stream-from-right-place))
	 (fd (ccl:stream-device f :input))) ; <<<--- Use as input
    (multiple-value-bind (pointer vector size)
	(get-page-info page)
      (declare (type macptr pointer)
	       (type (simple-array (unsigned-byte 8) (*)) vector)
	       (type fixnum size))
      (#_read fd pointer size))))


(defun write-page (page)
  (let* ((f (get-stream-from-right-place))
	 (fd (ccl:stream-device f :output))) ; <<<---- Use as output
    (multiple-value-bind (pointer vector size)
	(get-page-info page)
      (declare (type macptr pointer)
	       (type (simple-array (unsigned-byte 8) (*)) vector)
	       (type fixnum size))
      (#_write fd pointer size))))

...

(defun doing-something (...)
  ....
  (read-page some-page)
  ....
  (write-page some-other-page)
  ....)
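
One thing to watch when reads and writes alternate like this: a single file descriptor has a single file offset, shared by reads and writes. If both directions of the :io stream turn out to use the same descriptor (an assumption here, not something confirmed), it seems safest to seek explicitly before each transfer. The sketch below is CCL-only, since #_ and #$ rely on CCL's interface database. READ-PAGE-AT and its arguments are hypothetical names; it also loops because read(2) may return fewer bytes than requested.

```lisp
;; Sketch only, CCL-specific.  READ-PAGE-AT is a hypothetical helper,
;; not part of CCL.
#+ccl
(defun read-page-at (fd pointer offset size)
  "Seek FD to byte OFFSET, then read up to SIZE bytes into the foreign
memory at POINTER.  Returns the number of bytes actually read."
  (when (minusp (#_lseek fd offset #$SEEK_SET))
    (error "lseek failed on fd ~D" fd))
  (let ((done 0))
    (loop while (< done size)
          do (let ((n (#_read fd (ccl:%inc-ptr pointer done) (- size done))))
               (cond ((minusp n) (error "read failed on fd ~D" fd))
                     ((zerop n) (return))   ; end of file: stop early
                     (t (incf done n)))))
    done))
```

A matching write helper would have the same shape, with #_write in place of #_read and no end-of-file case.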


Thanks again,

/Jon


On Tue, 2009-06-09 at 06:39 -0600, Gary Byers wrote:
> There are a couple of approaches to this; they're probably described
> in some detail in the ccl/doc/release-notes* files but the documentation
> may only mention them in passing (if at all).
> 
> First of all, an "ivector" is a simple one-dimensional array that's
> specialized to a numeric or character element type.
> 
> CCL:MAKE-HEAP-IVECTOR element-count element-type
> 
> where ELEMENT-COUNT is an unsigned integer and ELEMENT-TYPE is a type
> specifier - is essentially like
> 
> (make-array element-count :element-type element-type)
> 
> except for the fact that the array is allocated in foreign memory (never
> scanned or moved by the GC).
> 
> CCL:MAKE-HEAP-IVECTOR returns 3 values: a MACPTR (which points to the 0th
> element of the vector), a vector (allocated in foreign memory), and the
> size of the vector in 8-bit bytes.
> 
> The vector's contents have undefined values ("whatever was there").
> 
> CCL:STREAM-DEVICE stream direction
> 
> DIRECTION should be one of :INPUT or :OUTPUT; STREAM can be any stream.
> For streams that're associated with file descriptors (sockets and file
> streams), STREAM-DEVICE returns that file descriptor (or, on Windows, the
> "file handle" as an integer).
> 
> So:
> 
> (multiple-value-bind (pointer vector)
>      (ccl:make-heap-ivector (ash 8 20) '(unsigned-byte 8))
>    ;; 'vector' should behave like a regular vector
>    (dotimes (i (length vector))
>      (setf (aref vector i) (logand i #xff)))
>    (with-open-file (f "some-path" :direction :output ...)
>      (let* ((fd (ccl:stream-device f :output)))
>        (dotimes (i 40)
>          (#_write fd pointer (ash 8 20)))))
>    (with-open-file (f "some-path" :direction :input ...)
>      ;; There can be some cases where an input stream may
>      ;; read from the file before being asked to.  Seek
>      ;; to the start of the file.
>      (let* ((fd (ccl:stream-device f :input)))
>        (#_lseek fd 0 #$SEEK_SET)
>        (dotimes (i 40)
>          (#_read fd pointer (ash 8 20)))))
>     (values pointer vector))
> 
> That should be much faster than the version that uses WRITE-SEQUENCE
> and READ-SEQUENCE, because it doesn't have to copy bytes between
> the stream's buffer (allocated with MAKE-HEAP-IVECTOR) and the sequence.
> 
> Since a "heap ivector" isn't even seen by the GC, it'll exist until
> the end of a session (it's not meaningfully preserved by SAVE-APPLICATION.)
> If there's a well-defined point in time at which you're done with it, you
> can explicitly dispose of the vector by doing:
> 
> (CCL:DISPOSE-HEAP-IVECTOR ivector) ; where ivector is the vector returned
>                                     ; by MAKE-HEAP-IVECTOR
> 
> The results of referring to a heap-ivector after it's been disposed of
> are undefined.  (Roughly the same as referring to memory allocated by
> #_malloc after that memory's been #_free'd.)
> 
> The "heap ivector" mechanism works reasonably well for ivectors that
> have well-defined (and relatively long) lifetimes.  It's not necessary
> to inhibit the GC in order to pass a pointer to their first element to
> foreign code (their address is guaranteed not to change.)  Foreign code
> that might cache that address can safely do so.
> 
> It's also possible to temporarily inhibit the GC and execute code with
> a pointer to the current address of an arbitrary ivector:
> 
> (CCL:WITH-HEAP-IVECTOR (ptr ivector) &body body)
> 
> temporarily disables the GC, binds PTR to a pointer to the (current)
> address of the first element of the ivector IVECTOR, and executes BODY.
> In general, memory allocation requests that'd otherwise cause the GC
> to be invoked may be satisfied by obtaining more memory from the OS if
> the GC is inhibited.  The chances of this happening (and leading to
> a worst-case scenario of uncontrolled heap growth) can be minimized
> if the BODY doesn't cons much (and if other threads don't cons much),
> but it's very hard to quantify what "much" means.
> 
> If the pointer PTR is passed to foreign code, that code shouldn't cache
> the pointer or otherwise try to use it after the WITH-HEAP-IVECTOR form
> exits; all that's guaranteed is that the vector won't move (and therefore
> the pointer will remain valid) during the extent of the form, and that
> isn't otherwise guaranteed.
> 
> On Mon, 8 Jun 2009, Jon S. Anthony wrote:
> 
> >
> > Hmmmm, forgot about the GC again (I suppose that is as much a good thing
> > as a bad thing - forgetting about it - or more exactly, what it does -
> > is sort of the point...)
> >
> > I think your analysis is exactly right and the behavior pretty much
> > exactly what is needed in the absence of any "tuning".  On the subject
> > of which -
> >
> >
> > On Sun, 2009-06-07 at 18:04 -0600, Gary Byers wrote:
> >> A stream's buffer is nailed down in foreign memory, so we can safely
> >> read from and write to it without worrying about the GC moving it
> >> around
> > ...
> >> There are ways to inhibit the GC, obtain the (absolute, non-relocatable)
> >> address of the vector, and do I/O directly (bypassing the buffer).  Whether
> >> that's better overall depends on what the cost of inhibiting the GC would
> >> be (which in turn depends on what kind of consing activity is going on
> >> in other threads.)
> >
> > It is straightforward to "pin" a vector like this in ACL, when creating
> > it, by essentially telling the memory/GC machinery to just place it (on
> > creation) in an unmoving tenured area, and thereby be assured that the
> > GC won't be moving it.  You don't need to "inhibit" the GC after it is
> > created (and pinned).  At which point, you effectively have the "nailed
> > down vector", as you say.  You indicate something like this is doable in
> > CCL, any info or pointers for that?  I'm guessing (well, hoping) that
> > any GC inhibition here will only be upfront temporary as well.
> >
> > Thanks again!
> >
> > /Jon
> >
> >
> >



