[Openmcl-devel] Address-based hashing considered harmful. Maybe.

Gary Byers gb at clozure.com
Mon Mar 3 08:18:10 PST 2003

On Mon, 3 Mar 2003, Stonewall Ballard wrote:

> On 3/3/03 1:16 AM, "Gary Byers" <gb at clozure.com> wrote:
> >
> > There's room for argument as to what the default value of :ADDRESS-BASED
> > should be, but if it was in effect things would work as they do now:
> > there'd be some significant paranoia and locking involved when such a
> > hash table was accessed, but the lookup should still be efficient and
> > the overhead might be less significant.
> >
> > The difference between this scheme and the current one (where there is
> > a bit that indicates whether an address-based key might be present) is
> > that you need some degree of exclusive access to the hash table (locking)
> > before you can even trust that bit not to change out from under you.
> >
> > If :ADDRESS-BASED defaulted to T, things would be no worse than they are
> > now; if it was appropriate to say :ADDRESS-BASED NIL explicitly, things
> > could be made a lot better in that case.  (The :ADDRESS-BASED T case could
> > be made better than it is, but it will always cost more to deal with that
> > than to not do so.)
> Not knowing how the internals work, this may be naive, but can you change
> the type of a hash table if someone puts an address-based object into it?
> Let hash tables start life as not-address-based, but morph them as needed.
> They'd never revert.

If address-based hashing weren't used, it would generally be possible to
allow concurrent access (or at least a high degree of concurrent access)
to hash tables: if two threads are trying to add the same key (whether
with the same or different values) at the same, it doesn't matter which
of them succeeds; it does matter that transactions happen atomically (e.g.,
that the key appears exactly once, with one or the other value associated
with it.)

There's currently a bit in the hash table that indicates whether or not
an address-based key has been used.  If the bit's set, then things get
more complicated: the needs to be inhibited and the hash table may need
to be rehashed because of key movement.  The latter case effectively
requires exclusive access to the hash table.

If the bit were clear on entry to a hashing function, it'd be tempting
to take the simple path (no unnecessary locking, no GC inhibition.)  Since
another thread could cause the bit to be set (an instant after the first
thread had found it to be clear), it's necessary to enforce at least some
degree of locking in order to keep the bit from changing out from under
you ...

Locking and unlocking are both very expensive in 0.14-030225; I'm trying
to make them less so, but they'll never really be cheap.  It's not desirable
to inhibit the GC any more often or any longer than necessary.  I don't
think that it's safe to avoid these things unless the programmer has asserted
that address-based hashing is not a factor.

> That would keep things simpler for the programmer (me, not you). One could
> provide hints, of course.

I think that if :ADDRESS-BASED defaulted to T we'd have the status quo (or,
more accurately, no worse than the status quo and hopefully better as
locking overhead improves.)  If one explicitly provided the hint (via
(MAKE-HASH-TABLE :ADDRESS-BASED NIL ...)) where it was appropriate to do so,
one could reasonably expect better performance in those cases.  Some of
those cases involve things like the hash table that maps class-names to
CLOS classes and other things that are fairly heavily used.

>  - Stoney

Openmcl-devel mailing list
Openmcl-devel at clozure.com

More information about the Openmcl-devel mailing list