[Openmcl-devel] Moving large amount of data with OpenMCL

Gary Byers gb at clozure.com
Sun Jan 16 12:18:10 PST 2005



On Sat, 15 Jan 2005, Andrew P. Lentvorski, Jr. wrote:

> I'm about to start the process of moving an old program of mine from
> Python to OpenMCL.
>
> The program processes large (>1Gigabyte) datasets stored as XML.
>
> However, pulling this data in from disk is slow; sorting it is even
> slower.
>
> So the question is: how should I store this and how should I
> pull it into memory for efficiency?

One way to pull large chunks of data into memory (sort of) is to
use memory-mapped files (see "man mmap").  There are some tradeoffs
here:

 1) mmap'ing a gigabyte (or more) of contiguous memory is a little
tricky; unless you tell it not to (via the --heap-reserve command-line
option), OpenMCL will try to grab around 2GB of address space for itself
on startup.  (There isn't much more than 2GB of contiguous address space
available on Jaguar or Panther, so it'd be hard to find another large
unmapped chunk without telling the lisp to be less aggressive.)

 2) You'd almost certainly want to map the file read-only (this probably
isn't a viable strategy otherwise).

 3) mmap-ing a file into memory basically gives you a pointer to a
(big) sequence of bytes, which you can access via
(CCL:%GET-UNSIGNED-BYTE <ptr> <offset>).  The <offset> is presumed to
be a FIXNUM, and MOST-POSITIVE-FIXNUM is about half a gig (2^29 - 1
on 32-bit OpenMCL).

I suspect that if you limited yourself to 256MB chunks (and maybe cut
--heap-reserve down a bit), you'd find that accessing mapped files is
pretty fast: the data is "just there" (though it may have to be paged
in).
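
As a rough sketch of the chunking idea above (not code from the
original message): the helpers below split an absolute byte position
into a chunk index and an offset within that chunk, so the offset
always stays well inside FIXNUM range.  The names +CHUNK-SIZE+,
CHUNK-AND-OFFSET, CHUNK-COUNT and MAPPED-BYTE are hypothetical, and
MAPPED-BYTE assumes you have already mmap'ed each 256MB chunk yourself
and stored the resulting pointers in a simple vector.

```lisp
;;; Hypothetical helpers.  Only the 256MB chunk size and the use of
;;; CCL:%GET-UNSIGNED-BYTE come from the message above.

(defconstant +chunk-size+ (* 256 1024 1024)
  "Bytes per mapped chunk (256MB), below MOST-POSITIVE-FIXNUM on 32-bit OpenMCL.")

(defun chunk-and-offset (position)
  "Return two values: the index of the chunk holding the absolute byte
POSITION, and the offset of that byte within the chunk."
  (floor position +chunk-size+))

(defun chunk-count (file-length)
  "Return the number of chunks needed to cover FILE-LENGTH bytes.
The last chunk may be shorter than +CHUNK-SIZE+."
  (values (ceiling file-length +chunk-size+)))

;; Assumption: CHUNK-POINTERS is a simple vector of foreign pointers,
;; one per chunk, where chunk N was mapped read-only from file offset
;; (* N +CHUNK-SIZE+).  How the chunks get mapped is not shown here.
#+(or openmcl clozure)
(defun mapped-byte (chunk-pointers position)
  "Read the byte at absolute POSITION from a file mapped in chunks."
  (multiple-value-bind (chunk offset) (chunk-and-offset position)
    (ccl:%get-unsigned-byte (svref chunk-pointers chunk) offset)))
```

For example, (chunk-and-offset (+ (* 3 +chunk-size+) 17)) returns 3
and 17, and a 1GB file needs (chunk-count (* 1024 1024 1024)) = 4
chunks.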

I think that Hamilton Link used this technique for something that he
was working on a year or so ago.

>
> There are probably two different questions here.
>
> First, how do I store and process the stuff meant for interchange (i.e.,
> it goes to disk and possibly to another program)?
>
> When you strip away the XML stuff, the records are effectively:
> ((xmin xmax) (ymin ymax) objectid (objectid linkage) (polygon
> description))
>
> Normally, I would write some form of loop with parsing callbacks in any
> other language.  However, that strikes me as very "un-Lispy".  This
> feels like it should be macro magic, but I'm just not experienced
> enough at Lisp to know what I should be doing.
>
> Second, is there a way that I can "freeze" the data image of a running
> openmcl instance and push it to disk for a later reload?  It would be
> nice to not have to reload, reprocess, and re-sort this data every
> time.  Being able to just let openmcl suck in a memory image or
> something would be a lot better.

As others have noted, SAVE-APPLICATION will basically save the lisp
parts of your application to a file (that can be mmap-ed back into
memory by the lisp kernel.)  It doesn't try to save foreign data
or lisp execution state, but the saved "memory image" usually starts
up pretty quickly (how quickly isn't strictly dependent on image
size.)
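
A minimal sketch of how that might look, assuming the processed data
is ordinary lisp data reachable from a global variable.  *RECORDS* and
SAVE-DATA-IMAGE are hypothetical names; CCL:SAVE-APPLICATION is the
only OpenMCL function used, and it writes the image and then exits the
running lisp.

```lisp
;;; Hypothetical names: *RECORDS* and SAVE-DATA-IMAGE are for
;;; illustration only.

(defvar *records* nil
  "Parsed and sorted records.  Ordinary lisp data, so it is written
out as part of the saved image.")

(defun save-data-image (path)
  "Write the current lisp heap (including *RECORDS*) to PATH.
In OpenMCL this terminates the running lisp once the image is written."
  #+(or openmcl clozure) (ccl:save-application path)
  #-(or openmcl clozure)
  (error "SAVE-APPLICATION is specific to OpenMCL; cannot save ~a" path))
```

The idea would be to load, process and sort the data once, call
(save-data-image "records.image"), and later start the lisp kernel on
that image (via the --image-name command-line option) to find
*RECORDS* already populated.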

>
> Thanks,
> -a


