[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen

Gail Zacharias gz at clozure.com
Tue Jan 6 09:18:12 PST 2015


Lock-free hash tables use an alternate algorithm to minimize the
performance impact of thread-safety.   They avoid the expense of locking
during gethash at the cost of making rehashing more expensive (puthash
performance is basically unaffected).  Aside from performance, hash tables
should behave the same whether you specify :lock-free or not -- it's a bug
if they don't give the same results.

The lock-free algorithm has been the default in CCL since 1.3, and so it's
certainly possible that some non-lock-free code has bit rotted. We can look
into it if you come up with some kind of a test case.

Alternatively you can follow Tim's suggestion -- don't reuse the same hash
table for a new generation, just make a new one. This will avoid both the
lock-free clrhash bug and any issues in locking hash tables.


On Mon, Jan 5, 2015 at 11:52 PM, Glenn Iba <giba at alum.mit.edu> wrote:

> Hi Gail,
>
>    I have no clue what (make-hash-table :test #'equalp :lock-free nil)
> is doing,
> but it doesn't work at all -- my searches fail miserably.  The generation
> sizes are all wrong (much smaller than the correct sizes),
> and no solution is ever found.
>
>    Without the :lock-free nil   the hash-tables work fine (correctly)
> except for the shrinkage problem after CLRHASH.
>
> Is there some reason that :lock-free nil  won't work with :test #'equalp
>  ??
>
> Is there a simple explanation of what  the :lock-free is for and what the
> expected behavior is with :lock-free nil ??
>
> Could the :lock-free nil be interacting badly with the garbage collector?
> I'm at a total loss to imagine what's going so badly wrong  :(
>
> Thanks,
> --Glenn
>
>
> On Mon, Jan 5, 2015 at 5:15 PM, Glenn Iba <giba at alum.mit.edu> wrote:
>
>> Gail,
>>
>>   Thanks for tracking down the bug!   Also,  thanks to everyone else who
>> responded with helpful input!
>>
>> Looks like I can get by for now with (make-hash-table :lock-free nil)
>>
>> BTW - is there a way to search the Clozure documentation?   I tried to
>> find :lock-free documentation
>> but came up empty.
>>
>> Thanks again to all!
>>
>> --Glenn
>>
>>
>>
>> On Mon, Jan 5, 2015 at 2:41 PM, Gail Zacharias <gz at clozure.com> wrote:
>>
>>> I've tracked down the bug in CCL (
>>> http://trac.clozure.com/ccl/ticket/1258 - it's pretty much the problem
>>> Madhu pointed out early in this thread)  and will try to fix it later today.
>>>
>>>
>>>
>>> On Mon, Jan 5, 2015 at 1:21 PM, Gail Zacharias <gz at clozure.com> wrote:
>>> >
>>> > In CCL hash tables do not get resized due to overall memory pressure,
>>> something else is going on.
>>> >
>>> > As Gary Byers mentioned earlier, the resizing behavior of hash tables
>>> is controlled by :REHASH-THRESHOLD and :REHASH-SIZE.  The default values
>>> are 0.85 and 1.5.   Specifying a larger :rehash-size, e.g. :rehash-size
>>> 5.0, would cut down the cost of regrowing the table, although it wouldn't
>>> explain why it gets shrunk in the first place.
>>> >
>>> > Does the problem still arise if you create your hash table with
>>> :LOCK-FREE NIL?
>>> >
>>> > I believe there is a way to get GC to run a user hook, though I don't
>>> remember how off hand.  However, if you are sure that you have a non-weak
>>> equalp hash table whose only keys are byte vectors, it's very unlikely that
>>> the problem has anything to do with the gc, so this wouldn't be the first
>>> thing I'd look at.
>>> >
>>> > You can catch when hash tables get resized by advising
>>> ccl::compute-hash-size.
>>> >
>>> >
>>> >
>>> > On Jan 5, 2015, at 11:03 AM, Glenn Iba <giba at alum.mit.edu> wrote:
>>> >
>>> > > Hi Tim,
>>> > >
>>> > > Yes, for me it's a rather serious problem in terms of search
>>> performance.
>>> > > If I know my hash-table will need to hold  50 million positions,
>>> then there is a huge
>>> > > performance penalty if the hash-table has to grow from a small size.
>>> > > As the number of positions becomes large, the cost to grow the
>>> hash-table and
>>> > > rehash all the positions becomes significant.  Presumably it is
>>> growing many times to
>>> > > get back to a large size.
>>> > >
>>> > > Example:
>>> > >
>>> > >  With a large hash-table, it takes my search roughly 1/2 hour to
>>> compute a generation
>>> > > (the hash-table is basically used as a set to eliminate duplicate
>>> positions).
>>> > > I do a CLRHASH in order to re-use the hash-table to accumulate the
>>> next generation
>>> > > (this is a breadth-first search).
>>> > > When the hash-table shrinks after a CLRHASH, it takes 12 hours to
>>> compute the next
>>> > > generation (which is only slightly larger than the previous one).
>>> > >
>>> > > I agree that profiling is important.
>>> > > Do you (or anyone else listening) know how to get code to generate
>>> alerts:
>>> > >  1.  Every time the hash-table grows
>>> > >  2.  Every time GC is called    (or is there a way to turn off
>>> automatic GC and call it either manually or explicitly in my code?)
>>> > >
>>> > > Thanks for looking at this with me,
>>> > > --Glenn
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Jan 5, 2015 at 6:43 AM, Tim Bradshaw <tfeb at me.com> wrote:
>>> > > On 5 Jan 2015, at 06:05, Glenn Iba <giba at alum.mit.edu> wrote:
>>> > >
>>> > >> Meanwhile, for my search purposes:
>>> > >> Is there any way in CCL to specify a Minimum size for a hash-table??
>>> > >> It seems pretty annoying (to put it mildly) that rehashing might
>>> reduce the size of the table.
>>> > >
>>> > > I hate to ask this but: is this a problem?  In particular is there
>>> an observed performance or functionality problem caused by this?  My
>>> experience with things like this is that the performance characteristics
>>> are usually hairy and hard to understand, and that, usually, the people
>>> doing the implementation have understood these a lot better than I do,
>>> particularly with regard to GC.  So unless I've profiled and found that
>>> there is actually a performance problem, or the application has exploding
>>> memory use or something, I never worry.
>>> > >
>>> > > _______________________________________________
>>> > > Openmcl-devel mailing list
>>> > > Openmcl-devel at clozure.com
>>> > > https://lists.clozure.com/mailman/listinfo/openmcl-devel
>>> >
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clozure.com/pipermail/openmcl-devel/attachments/20150106/d5437d4f/attachment.htm>


More information about the Openmcl-devel mailing list