[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen
Ron Garret
ron at flownet.com
Sat Jan 3 09:22:10 PST 2015
Both. The Clozure folks do hang out on this list, but filing a ticket is a more reliable way to get their attention. You can file a ticket here:
http://trac.clozure.com/ccl
But the most reliable way to get an issue like this looked at in a timely manner is to buy a support incident. They are very reasonably priced.
rg
On Jan 3, 2015, at 9:06 AM, Glenn Iba <giba at alum.mit.edu> wrote:
> Raymond,
>
> Thank you for taking the time to help! I'm new to this Clozure mailing list.
> Do the CCL maintainers monitor this list for issues? Or do you recommend
> I create a ticket to get it looked into?
>
> Thanks again!
> --Glenn
>
>
> On Sat, Jan 3, 2015 at 10:17 AM, Raymond Wiker <rwiker at gmail.com> wrote:
> It looks like repeatedly clrhashing the table may cause it to be rehashed down to a smaller size. The following code snippet replicates what you’re seeing (I think):
>
> (let ((h (make-hash-table :size 100000)))
>   (dotimes (i (truncate (hash-table-size h) 2))
>     (setf (gethash i h) i))
>   (clrhash h)
>   (dotimes (i (truncate (hash-table-size h) 2))
>     (setf (gethash i h) i))
>   (hash-table-size h))
>
> In ccl, this returns 75001. Lispworks 6.1 32-bit returns 100003, while sbcl returns 100000 (all on Mac OS X). This may or may not be a bug in ccl; no doubt people better qualified to judge will provide an answer to that.
>
> It may be better to simply allocate a new hash table rather than using clrhash.
>
>> On 03 Jan 2015, at 15:10 , Glenn Iba <giba at alum.mit.edu> wrote:
>>
>> Raymond,
>>
>> Thanks for the quick response. Is it definitely the case, then, that a GC can trigger rehashing?
>> Rehashing a large hash table is potentially expensive -- and I wanted to avoid it.
>> The reason I allocate a large hash-table to begin with is to avoid the rehashing incurred
>> due to growing the hash-table. It takes my search an order of magnitude more time to
>> compute a generation if the hash-table starts small and grows repeatedly.
>> After my hash-tables "shrink" in size, the compute time for a generation jumps from 1/2 hour
>> to 15 hours.
>>
>> Is there any way to specify a minimum size for a hash-table? or to protect it against
>> re-hashing during GC?
>>
>> One idea I had was to allocate a new "really large" hash-table for each generation.
>> This would be instead of using CLRHASH and re-using the original large hash-table.
>> But this wouldn't help if GC causes it to shrink anyway. My goal is to have a
>> large hash-table allocated only once, and have it stay large so I can reuse it.
>>
>> Thanks for any suggestions,
>> --Glenn
>>
>>
>> On Sat, Jan 3, 2015 at 3:26 AM, Raymond Wiker <rwiker at gmail.com> wrote:
>> My understanding (as far as it goes) is that hash-table-size is just a hint to the runtime of the expected number of keys in the hash table; it is used when creating a hash table and can later be used to create other hash tables of similar size. hash-table-count, on the other hand, is the actual, current number of keys in the hash table. I would expect hash-table-size to increase as the number of keys in the table grows past the initial size, and it would also make sense for hash-table-size to decrease if the hash table is rehashed as part of gc.
>>
>> > On 03 Jan 2015, at 08:47 , Glenn Iba <giba at alum.mit.edu> wrote:
>> >
>> > Call for help!
>> >
>> > I'm doing some large searches in CCL, and have been using large hash-tables,
>> > but I'm perplexed that the hash-table-size is getting mysteriously decreased.
>> > Can anyone explain how this is possible?
>> >
>> > My speculation is that I'm exhausting the heap (though I don't get any notification of this),
>> > and that CCL is trying to create more heap space by shrinking my large hash-table.
>> > Does this sound like it could be possible? I'd prefer to get a notification that I'm out of space.
>> > Is there any way to control this?
>> >
>> > Details:
>> > I'm running CCL 1.10 on a Mac with OS X Yosemite (10.10.1), with 8GB RAM.
>> > I'm creating a single large hash-table with
>> > (make-hash-table :test #'equalp :size 100000000) ;; 100,000,000
>> > I'm storing positions of my search space (each represented by a byte-vector of 16 unsigned-bytes)
>> > in this hash-table.
>> > For each generation, I collect all the positions in the hash-table (to avoid duplicates).
>> > I then write the generation out to a file, and do CLRHASH so I can reuse the hash-table.
>> > My searches reach a point (as the generation size grows) when the hash-table-size
>> > decreases dramatically (from 100,000,000 to 12,396,373) -- how is this possible?
>> >
>> > I'd be happy to supply code, detailed traces, and whatever other info I can
>> > to anyone who'd be willing to help me figure this out.
>> >
>> > Thanks in advance!
>> > --Glenn
>> >
>> >
>> > _______________________________________________
>> > Openmcl-devel mailing list
>> > Openmcl-devel at clozure.com
>> > https://lists.clozure.com/mailman/listinfo/openmcl-devel
>>
>>
>
>