[Openmcl-devel] Address-based hashing considered harmful. Maybe.

Tim Moore moore at bricoworks.com
Mon Mar 3 10:39:17 PST 2003


You might consider another axis of optimization: thread-safe vs. not.  It
wouldn't surprise me to find that many uses of hash tables are
synchronized at a higher level than the hash table accesses themselves,
e.g. the hash table is only referenced from the stack of one thread, or
the application has locked whatever object the hash table is part of.  But
I suppose it's true that you always need to synchronize with GC with the
address-based scheme?  Can GC happen between any two instructions in
OpenMCL now?

It would be too bad to make eq hashing of conses perform worse than
hashing other structures, if only because it violates user expectations.

Tim

On Mon, 3 Mar 2003, Gary Byers wrote:

> 
> 
> On Mon, 3 Mar 2003, Stonewall Ballard wrote:
> 
> > On 3/3/03 1:16 AM, "Gary Byers" <gb at clozure.com> wrote:
> >
> > >
> > > There's room for argument as to what the default value of :ADDRESS-BASED
> > > should be, but if it was in effect things would work as they do now:
> > > there'd be some significant paranoia and locking involved when such a
> > > hash table was accessed, but the lookup should still be efficient and
> > > the overhead might be less significant.
> > >
> > > The difference between this scheme and the current one (where there is
> > > a bit that indicates whether an address-based key might be present) is
> > > that you need some degree of exclusive access to the hash table (locking)
> > > before you can even trust that bit not to change out from under you.
> > >
> > > If :ADDRESS-BASED defaulted to T, things would be no worse than they are
> > > now; if it was appropriate to say :ADDRESS-BASED NIL explicitly, things
> > > could be made a lot better in that case.  (The :ADDRESS-BASED T case could
> > > be made better than it is, but it will always cost more to deal with that
> > > than to not do so.)
> >
> > Not knowing how the internals work, this may be naive, but can you change
> > the type of a hash table if someone puts an address-based object into it?
> > Let hash tables start life as not-address-based, but morph them as needed.
> > They'd never revert.
> 
> If address-based hashing weren't used, it would generally be possible to
> allow concurrent access (or at least a high degree of concurrent access)
> to hash tables: if two threads are trying to add the same key (whether
> with the same or different values) at the same, it doesn't matter which
> of them succeeds; it does matter that transactions happen atomically (e.g.,
> that the key appears exactly once, with one or the other value associated
> with it.)
> 
> There's currently a bit in the hash table that indicates whether or not
> an address-based key has been used.  If the bit's set, then things get
> more complicated: the needs to be inhibited and the hash table may need
> to be rehashed because of key movement.  The latter case effectively
> requires exclusive access to the hash table.
> 
> If the bit were clear on entry to a hashing function, it'd be tempting
> to take the simple path (no unnecessary locking, no GC inhibition.)  Since
> another thread could cause the bit to be set (an instant after the first
> thread had found it to be clear), it's necessary to enforce at least some
> degree of locking in order to keep the bit from changing out from under
> you ...
> 
> Locking and unlocking are both very expensive in 0.14-030225; I'm trying
> to make them less so, but they'll never really be cheap.  It's not desirable
> to inhibit the GC any more often or any longer than necessary.  I don't
> think that it's safe to avoid these things unless the programmer has asserted
> that address-based hashing is not a factor.
> 
> >
> > That would keep things simpler for the programmer (me, not you). One could
> > provide hints, of course.
> 
> I think that if :ADDRESS-BASED defaulted to T we'd have the status quo (or,
> more accurately, no worse than the status quo and hopefully better as
> locking overhead improves.)  If one explicitly provided the hint (via
> (MAKE-HASH-TABLE :ADDRESS-BASED NIL ...)) where it was appropriate to do so,
> one could reasonably expect better performance in those cases.  Some of
> those cases involve things like the hash table that maps class-names to
> CLOS classes and other things that are fairly heavily used.
> 
> >
> >  - Stoney
> >
> >
> >
> 
> 
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/cgi-bin/mailman/listinfo/openmcl-devel
> 


_______________________________________________
Openmcl-devel mailing list
Openmcl-devel at clozure.com
http://clozure.com/cgi-bin/mailman/listinfo/openmcl-devel



More information about the Openmcl-devel mailing list