[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen

Glenn Iba giba at alum.mit.edu
Sat Jan 3 09:06:08 PST 2015


Raymond,

   Thank you for taking the time to help!  I'm new to this Clozure mailing
list.
Do the CCL maintainers monitor this list for issues?  Or do you recommend
I create a ticket to get it looked into?

Thanks again!
--Glenn
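
For reference, here is a minimal sketch of the workaround suggested in the
quoted reply below: allocate a fresh hash table for each generation instead
of calling clrhash on the old one. The names *generation-table-size*,
make-generation-table and collect-generation are hypothetical, the size is
an illustrative value, and only standard Common Lisp functions are used.
Whether this actually avoids the shrinking seen in CCL has not been tested
here.

```lisp
;; Hypothetical names throughout; only standard Common Lisp is assumed.
(defparameter *generation-table-size* 100000
  "Requested size for each per-generation table (illustrative value).")

(defun make-generation-table ()
  "Return a fresh, empty EQUALP hash table of the requested size."
  (make-hash-table :test #'equalp :size *generation-table-size*))

(defun collect-generation (positions)
  "Store POSITIONS in a fresh table, dropping duplicates; return the table."
  (let ((table (make-generation-table)))
    (dolist (p positions table)
      (setf (gethash p table) t))))

;; Usage: hash-table-count is the number of distinct keys stored, while
;; hash-table-size is implementation-dependent and may differ from :size.
(let ((table (collect-generation (list #(1 2 3) #(1 2 3) #(4 5 6)))))
  (list (hash-table-count table) (hash-table-size table)))
```

The old table becomes garbage once nothing refers to it, so the cost moves
from clrhash to allocation and GC; that trade-off would need measuring.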


On Sat, Jan 3, 2015 at 10:17 AM, Raymond Wiker <rwiker at gmail.com> wrote:

> It looks like repeatedly clrhashing the table may cause it to be rehashed
> down to a smaller size. The following code snippet replicates what you’re
> seeing (I think):
>
> (let ((h (make-hash-table :size 100000)))
>    (dotimes (i (truncate (hash-table-size h) 2))
>      (setf (gethash i h) i))
>    (clrhash h)
>    (dotimes (i (truncate (hash-table-size h) 2))
>      (setf (gethash i h) i))
>    (hash-table-size h))
>
> In ccl, this returns 75001. Lispworks 6.1 32-bit returns 100003, while
> sbcl returns 100000 (all on Mac OS X). This may or may not be a bug in ccl;
> no doubt people better qualified to judge will provide an answer to that.
>
> It may be better to simply allocate a new hash table rather than using
> clrhash.
>
> On 03 Jan 2015, at 15:10 , Glenn Iba <giba at alum.mit.edu> wrote:
>
> Raymond,
>
>   Thanks for the quick response.   Is it definitely the case, then, that a
> GC can trigger rehashing?
> Rehashing a large hash table is potentially expensive -- and I wanted to
> avoid it.
> The reason I allocate a large hash-table to begin with is to avoid the
> rehashing incurred
> due to growing the hash-table.  It takes my search an order of magnitude
> more time to
> compute a generation if the hash-table starts small and grows repeatedly.
> After my hash-tables "shrink" in size, the compute time for a generation
> jumps from 1/2 hour
> to 15 hours.
>
> Is there any way to specify a minimum size for a hash-table, or to protect
> it against re-hashing during GC?
>
> One idea I had was to allocate a new "really large" hash-table for each
> generation.
> This would be instead of using CLRHASH and re-using the original large
> hash-table.
> But this wouldn't  help if GC causes it to shrink anyway.  My goal is to
> have a
> large hash-table allocated only once, and have it stay large so I can
> reuse it.
>
> Thanks for any suggestions,
> --Glenn
>
>
> On Sat, Jan 3, 2015 at 3:26 AM, Raymond Wiker <rwiker at gmail.com> wrote:
>
>> My understanding (as far as it goes) is that hash-table-size is just a
>> hint to the runtime of the expected number of keys in the hash table; it is
>> used when creating a hash table and can later be used to create other hash
>> tables of similar size. hash-table-count, on the other hand, is the actual,
>> current number of keys in the hash table. I would expect hash-table-size to
>> increase as the number of keys in the table grows past the initial size,
>> and it would also make sense for hash-table-size to decrease if the hash
>> table is rehashed as part of gc.
>>
>> > On 03 Jan 2015, at 08:47 , Glenn Iba <giba at alum.mit.edu> wrote:
>> >
>> > Call for help!
>> >
>> >   I'm doing some large searches in CCL, and have been using large hash-tables,
>> > but I'm perplexed that the hash-table-size is getting mysteriously decreased.
>> > Can anyone explain how this is possible?
>> >
>> > My speculation is that I'm exhausting the heap (though I don't get any notification of this),
>> > and that CCL is trying to create more heap space by shrinking my large hash-table.
>> > Does this sound like it could be possible?   I'd prefer to get a notification that I'm out of space.
>> > Is there any way to control this?
>> >
>> > Details:
>> >    I'm running CCL 1.10 on a Mac with OS X Yosemite (10.10.1), with 8GB RAM.
>> >    I'm creating a single large hash-table with
>> >          (make-hash-table :test #'equalp :size 100000000)   ;; 100,000,000
>> >    I'm storing positions of my search space (each represented by a byte-vector of 16 unsigned-bytes)
>> >        in this hash-table
>> >    For each generation, I collect all the positions in the hash-table (to avoid duplicates).
>> >    I then write the generation out to a file, and do CLRHASH so I can reuse the hash-table.
>> >    My searches reach a point (as the generation size grows) when the hash-table-size
>> >        decreases dramatically (from 100,000,000 to  12,396,373)  -- how is this possible?
>> >
>> > I'd be happy to supply code, detailed traces, and whatever other info I can
>> >   to anyone who'd be willing to help me figure this out.
>> >
>> > Thanks in advance!
>> > --Glenn
>> >
>> >
>> > _______________________________________________
>> > Openmcl-devel mailing list
>> > Openmcl-devel at clozure.com
>> > https://lists.clozure.com/mailman/listinfo/openmcl-devel
>>
>>
>
>