[Openmcl-devel] Address-based hashing considered harmful. Maybe.

Stonewall Ballard sb.list at sb.org
Mon Mar 3 06:25:06 PST 2003


On 3/3/03 1:16 AM, "Gary Byers" <gb at clozure.com> wrote:

> On Sat, 1 Mar 2003, Stonewall Ballard wrote:
> 
>> At least in the case of cons cells, how hard would it be to allocate them in
>> their own zone and never move them? Since they're fixed-size, the only
>> reason to move them is so that a scavenger can work uniformly on all
>> objects.
>> 
> 
> There are other reasons to want to move things (improving locality and
> paging behavior where relevant)  Compacting or copying (making allocated
> objects contiguous) has the side effect of making free space contiguous
> and makes allocation much simpler.

For fixed-sized objects like cons cells in their own zone, allocation is
quite simple. Locality of reference and paging optimization are better with
copying, of course.

>...
> At the moment, that sounds like a reasonable short-term approach (don't
> eliminate address-based hashing, but don't use it where alternatives
> exist and let the hash table code concentrate on the case where it's
> not involved.)
> 
> Here's a (half-thought-out) idea: suppose that "support" for
> address-based hashing was an attribute of the hash table - set via an
> argument to MAKE-HASH-TABLE and not subsequently changeable.  If an
> address-based key (something for which no other good hash
> transformation exists) was used in a hash table that didn't provide
> such support, lookup would devolve into linear search : it would still
> "work", but would be grossly inefficient.  On the other hand, when
> keys that had good hashing functions were used in such a table, the
> code could safely ignore all of the issues related to address-based
> hashing.

Linear scanning is "grossly inefficient" only for large hash tables. Did you
see what size the tables of conses you found were?

> 
> There's room for argument as to what the default value of :ADDRESS-BASED
> should be, but if it was in effect things would work as they do now:
> there'd be some significant paranoia and locking involved when such a
> hash table was accessed, but the lookup should still be efficient and
> the overhead might be less significant.
> 
> The difference between this scheme and the current one (where there is
> a bit that indicates whether an address-based key might be present) is
> that you need some degree of exclusive access to the hash table (locking)
> before you can even trust that bit not to change out from under you.
> 
> If :ADDRESS-BASED defaulted to T, things would be no worse than they are
> now; if it was appropriate to say :ADDRESS-BASED NIL explicitly, things
> could be made a lot better in that case.  (The :ADDRESS-BASED T case could
> be made better than it is, but it will always cost more to deal with that
> than to not do so.)

Not knowing how the internals work, this may be naive, but can you change
the type of a hash table if someone puts an address-based object into it?
Let hash tables start life as not-address-based, but morph them as needed.
They'd never revert.

That would keep things simpler for the programmer (me, not you). One could
provide hints, of course.

 - Stoney


_______________________________________________
Openmcl-devel mailing list
Openmcl-devel at clozure.com
http://clozure.com/cgi-bin/mailman/listinfo/openmcl-devel



More information about the Openmcl-devel mailing list