[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen

Ron Garret ron at flownet.com
Sat Jan 3 09:22:10 PST 2015


Both.  The Clozure folks do hang out on this list, but filing a ticket is a more reliable way to get their attention.  You can file a ticket here:

http://trac.clozure.com/ccl

But the most reliable way to get an issue like this looked at in a timely manner is to buy a support incident.  They are very reasonably priced.

rg

On Jan 3, 2015, at 9:06 AM, Glenn Iba <giba at alum.mit.edu> wrote:

> Raymond,
> 
>    Thank you for taking the time to help!  I'm new to this Clozure mailing list.
> Do the CCL maintainers monitor this list for issues?  Or do you recommend
> I create a ticket to get it looked into?
> 
> Thanks again!
> --Glenn
> 
> 
> On Sat, Jan 3, 2015 at 10:17 AM, Raymond Wiker <rwiker at gmail.com> wrote:
> It looks like repeatedly clrhashing the table may cause it to be rehashed down to a smaller size. The following code snippet replicates what you’re seeing (I think): 
> 
> (let ((h (make-hash-table :size 100000)))
>   (dotimes (i (truncate (hash-table-size h) 2))
>     (setf (gethash i h) i))
>   (clrhash h)
>   (dotimes (i (truncate (hash-table-size h) 2))
>     (setf (gethash i h) i))
>   (hash-table-size h))
> 
> In ccl, this returns 75001. Lispworks 6.1 32-bit returns 100003, while sbcl returns 100000 (all on Mac OS X). This may or may not be a bug in ccl; no doubt people better qualified to judge will provide an answer to that.
> 
> It may be better to simply allocate a new hash table rather than using clrhash.
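> 
> A minimal sketch of that suggestion, assuming each generation arrives as a list of positions. The names *table-size* and dedupe-generation are hypothetical, and nothing in this thread confirms that a fresh table avoids the shrinking in ccl:
> 
> ```lisp
> ;; Hypothetical size for illustration; the original report used 100,000,000.
> (defparameter *table-size* 100000)
> 
> (defun dedupe-generation (positions)
>   "Return the distinct elements of POSITIONS in first-seen order,
> using a freshly allocated EQUALP table instead of CLRHASH on a shared one."
>   (let ((seen (make-hash-table :test #'equalp :size *table-size*))
>         (unique '()))
>     (dolist (p positions (nreverse unique))
>       (unless (gethash p seen)
>         (setf (gethash p seen) t)
>         (push p unique)))))
> ```
> 
> Allocating a very large table for every generation has its own cost, so this would be worth timing against the clrhash version.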
> 
>> On 03 Jan 2015, at 15:10 , Glenn Iba <giba at alum.mit.edu> wrote:
>> 
>> Raymond,
>> 
>>   Thanks for the quick response.   Is it definitely the case, then, that a GC can trigger rehashing?
>> Rehashing a large hash table is potentially expensive -- and I wanted to avoid it.
>> The reason I allocate a large hash-table to begin with is to avoid the rehashing incurred
>> due to growing the hash-table.  It takes my search an order of magnitude more time to
>> compute a generation if the hash-table starts small and grows repeatedly.
>> After my hash-tables "shrink" in size, the compute time for a generation jumps from 1/2 hour
>> to 15 hours.
>> 
>> Is there any way to specify a minimum size for a hash-table?  or to protect it against 
>> re-hashing during GC?
>> 
>> One idea I had was to allocate a new "really large" hash-table for each generation.
>> This would be instead of using CLRHASH and re-using the original large hash-table.
>> But this wouldn't help if GC causes it to shrink anyway.  My goal is to have a
>> large hash-table allocated only once, and have it stay large so I can reuse it.
>> 
>> Thanks for any suggestions,
>> --Glenn
>> 
>> 
>> On Sat, Jan 3, 2015 at 3:26 AM, Raymond Wiker <rwiker at gmail.com> wrote:
>> My understanding (as far as it goes) is that hash-table-size is just a hint to the runtime of the expected number of keys in the hash table; it is used when creating a hash table and can later be used to create other hash tables of similar size. hash-table-count, on the other hand, is the actual, current number of keys in the hash table. I would expect hash-table-size to increase as the number of keys in the table grows past the initial size, and it would also make sense for hash-table-size to decrease if the hash table is rehashed as part of gc.
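>> 
>> A small sketch of the difference between the two accessors. The helper name count-and-size is hypothetical. Only the count is predictable; the size figure is implementation-dependent:
>> 
>> ```lisp
>> (defun count-and-size (n requested-size)
>>   "Store N integer keys in a table created with REQUESTED-SIZE and
>> return a list of (hash-table-count hash-table-size)."
>>   (let ((h (make-hash-table :test #'equalp :size requested-size)))
>>     (dotimes (i n)
>>       (setf (gethash i h) i))
>>     (list (hash-table-count h)     ; number of entries actually stored
>>           (hash-table-size h))))   ; implementation-dependent size figure
>> ```
>> 
>> Calling (count-and-size 10 1000) yields a count of 10 in any implementation, while the second value may differ from 1000 and from one Lisp to another.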
>> 
>> > On 03 Jan 2015, at 08:47 , Glenn Iba <giba at alum.mit.edu> wrote:
>> >
>> > Call for help!
>> >
>> >   I'm doing some large searches in CCL, and have been using large hash-tables,
>> > but I'm perplexed that the hash-table-size is getting mysteriously decreased.
>> > Can anyone explain how this is possible?
>> >
>> > My speculation is that I'm exhausting the heap (though I don't get any notification of this),
>> > and that CCL is trying to create more heap space by shrinking my large hash-table.
>> > Does this sound like it could be possible?   I'd prefer to get a notification that I'm out of space.
>> > Is there any way to control this?
>> >
>> > Details:
>> >    I'm running CCL 1.10 on a Mac with OS X Yosemite (10.10.1), with 8GB RAM.
>> >    I'm creating a single large hash-table with
>> >          (make-hash-table :test #'equalp :size 100000000)   ;; 100,000,000
>> >    I'm storing positions of my search space (each represented by a byte-vector of 16 unsigned-bytes)
>> >        in this hash-table
>> >    For each generation, I collect all the positions in the hash-table (to avoid duplicates).
>> >    I then write the generation out to a file, and do CLRHASH so I can reuse the hash-table.
>> >    My searches reach a point (as the generation size grows) when the hash-table-size
>> >        decreases dramatically (from 100,000,000 to  12,396,373)  -- how is this possible?
>> >
>> > I'd be happy to supply code, detailed traces, and whatever other info I can
>> >   to anyone who'd be willing to help me figure this out.
>> >
>> > Thanks in advance!
>> > --Glenn
>> >
>> >
>> > _______________________________________________
>> > Openmcl-devel mailing list
>> > Openmcl-devel at clozure.com
>> > https://lists.clozure.com/mailman/listinfo/openmcl-devel
>> 
>> 
