[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen

Gail Zacharias gz at clozure.com
Mon Jan 5 11:41:20 PST 2015


I've tracked down the bug in CCL (http://trac.clozure.com/ccl/ticket/1258
- it's pretty much the problem Madhu pointed out early in this thread)  and
will try to fix it later today.



On Mon, Jan 5, 2015 at 1:21 PM, Gail Zacharias <gz at clozure.com> wrote:
>
> In CCL hash tables do not get resized due to overall memory pressure,
something else is going on.
>
> As Gary Byers mentioned earlier, the resizing behavior of hash tables is
controlled by :REHASH-THRESHOLD and :REHASH-SIZE.  The default values are
0.85 and 1.5.   Specifying a larger :rehash-size, e.g. :rehash-size 5.0,
would cut down the cost of regrowing the table, although it wouldn't
explain why it gets shrunk in the first place.
>
> Does the problem still arise if you create your hash table with
:LOCK-FREE NIL?
>
> I believe there is a way to get GC to run a user hook, though I don't
remember how off hand.  However, if you are sure that you have a non-weak
equalp hash table whose only keys are byte vectors, it's very unlikely that
the problem has anything to do with the gc, so this wouldn't be the first
thing I'd look at.
>
> You can catch when hash tables get resized by advising
ccl::compute-hash-size.
>
>
>
> On Jan 5, 2015, at 11:03 AM, Glenn Iba <giba at alum.mit.edu> wrote:
>
> > Hi Tim,
> >
> > Yes, for me it's a rather serious problem in terms of search
performance.
> > If I know my hash-table will need to hold  50 million positions, then
there is a huge
> > performance penalty if the hash-table has to grow from a small size.
> > As the number of positions becomes large, the cost to grow the
hash-table and
> > rehash all the positions becomes significant.  Presumably it is growing
many times to
> > get back to a large size.
> >
> > Example:
> >
> >  With a large hash-table, it takes my search roughly 1/2 hour to
compute a generation
> > (the hash-table is basically used as a set to eliminate duplicate
positions).
> > I do a CLRHASH in order to re-use the hash-table to accumulate the next
generation
> > (this is a breadth-first search).
> > When the hash-table shrinks after a CLRHASH, it takes 12 hours to
compute the next
> > generation (which is only slightly larger than the previous one).
> >
> > I agree that profiling is important.
> > Do you (or anyone else listening) know how to get code to generate
alerts:
> >  1.  Every time the hash-table grows
> >  2.  Every time GC is called    (or is there a way to turn off
automatic GC and call it either manually or explicitly in my code?)
> >
> > Thanks for looking at this with me,
> > --Glenn
> >
> >
> >
> >
> > On Mon, Jan 5, 2015 at 6:43 AM, Tim Bradshaw <tfeb at me.com> wrote:
> > On 5 Jan 2015, at 06:05, Glenn Iba <giba at alum.mit.edu> wrote:
> >
> >> Meanwhile, for my search purposes:
> >> Is there any way in CCL to specify a Minimum size for a hash-table??
> >> It seems pretty annoying (to put it mildly) that rehashing might
reduce the size of the table.
> >
> > I hate to ask this but: is this a problem?  In particular is there an
observed performance or functionality problem caused by this?  My
experience with things like this is that the performance characteristics
are usually hairy and hard to understand, and that, usually, the people
doing the implementation have understood these a lot better than I do,
particularly with regard to GC.  So unless I've profiled and found that
there is actually a performance problem, or the application has exploding
memory use or something, I never worry.
> >
> > _______________________________________________
> > Openmcl-devel mailing list
> > Openmcl-devel at clozure.com
> > https://lists.clozure.com/mailman/listinfo/openmcl-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clozure.com/pipermail/openmcl-devel/attachments/20150105/cc909e28/attachment.htm>


More information about the Openmcl-devel mailing list