[Openmcl-devel] Identity Hashtables

Gary Byers gb at clozure.com
Wed Oct 1 09:11:13 PDT 2003



On Wed, 1 Oct 2003, Sven Van Caekenberghe wrote:

> I have a performance problem which I think is related to Identity
> Hashtables (a hashtable where the keys are plain objects like CLOS
> instances, conses, as well as strings, just anything), and I am looking
> for a way to speed things up or for an alternative solution.
>
> The code in question serializes any object structure to XML. While
> doing this, the code keeps a hashtable (currently (make-hash-table
> :test 'eql)) containing every object it has encountered (mapping it to
> its id), so that it can properly handle shared and/or circular
> references (by refering to the known id).
>
> When serializing object structures containing thousands of objects
> (which is not that big) the code becomes so slow that it cannot
> complete (it is still writing XML, but very slowly). I suspect that
> using the hashtable becomes exponentially slower because (1) the
> hashtable keeps growing and everytime it grows it needs to rehash (2)
> the whole process consumes a lot of memory so that GC kicks in a lot,
> which, IIRC from a post by Gary somewhere, results in all hashtables
> being rehashed because the hash is based on the address in memory and
> this can potentially change with each GC.

I sent a few messages to this list last spring, essentially starting
with the proposition that "address-based hashing [was] considered
harmful" and, after sobering up a bit, moving on to "address-based
hadhing pretty nearly unavoidable".  Since then, I've tried to change
things so that a few types of objects (symbols, CLOS instances) hash
on some other immutable attribute instead of their address.

I've been trying to get the EGC working in 0.14, and noticed in the
process that this still wasn't working correctly.  I think that I
have this fixed now (in the development tree, which should make it
to release Real Soon Now), and that rehashing should only take place
if (a) the hash table believes that it contains one or more keys that
are hashed by address and (b) a particular GC invocation moves one
or more keys in such a hash table.  Both of those conditions are
slightly conservative; it'd be hard to be much more precise about
this.

That should help -some-; it's practical to do this for certain types
of objects that already have some immutable attribute (e.g., a
symbol's pname) that's unique/random enough to be a good hash key.
It's less practical to ensure that every type of object (arbitrary
cons cells, for instance) have such an attribute, though I think that
some implementations do this.

>
> Can I control this process somehow ? Perhaps by controlling how the
> hashtable grows ? I figure that unless every Lisp object stores its own
> hash, there isn't a lot that can be done. Maybe I can make another
> datastructure for this purpose ?


If the rehashing is being caused (more) by growth instead of by GC
key movement, then there are parameters to MAKE-HASH-TABLE that can
help with this (notably :SIZE, :REHASH-SIZE, and :REHASH-THRESHOLD.)
The fist value returned by the function CCL:GCCOUNTS is a running
total of the number of GCs that have occurred since the bootstrapping
image from which the current lisp image was created was first launched.
(It's typically 2 when an openmcl image starts up: those 2 GCs happen
inside SAVE-APPLICATION.)  If that number isn't changing and you're
getting a lot of rehashing, then it's reasonable to conclude that the
rehashing's caused by growth (and that can be influenced when the
hash table's created.)

The EGC is (almost) working.  That's generally good news, but the
worst-case scenario's fairly bad: a program that spends most of its
time adding address-based keys to a hash table will spend a lot of its
time rehashing the entire table because some of its youngest keys
might have had their address changed.  I suspect that there are ways
to be smarter about this and that they'd be practical to implement
and worthwhile in many cases.

>
> Sven
>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/cgi-bin/mailman/listinfo/openmcl-devel
>
>

_______________________________________________
Openmcl-devel mailing list
Openmcl-devel at clozure.com
http://clozure.com/cgi-bin/mailman/listinfo/openmcl-devel



More information about the Openmcl-devel mailing list