[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen

Raymond Wiker rwiker at gmail.com
Sat Jan 3 07:17:39 PST 2015


It looks like clearing the table with clrhash and then refilling it may cause it to be rehashed down to a smaller size. The following code snippet reproduces what you’re seeing (I think):

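(let ((h (make-hash-table :size 100000)))
  ;; Fill the table to half of its reported size.
  (dotimes (i (truncate (hash-table-size h) 2))
    (setf (gethash i h) i))
  ;; Clear it, then refill to half of whatever size it reports now.
  (clrhash h)
  (dotimes (i (truncate (hash-table-size h) 2))
    (setf (gethash i h) i))
  ;; Report the size after the clear-and-refill cycle.
  (hash-table-size h))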

In CCL, this returns 75001. LispWorks 6.1 32-bit returns 100003, while SBCL returns 100000 (all on Mac OS X). This may or may not be a bug in CCL; no doubt people better qualified to judge will provide an answer to that.
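
To see at which step the reported size changes, something like the following might help. It is only a sketch: TABLE-STATS and SIZE-TRACE are names I made up for illustration, and the sizes it reports are implementation-dependent, so only the counts are predictable.

```lisp
;; Sketch: record (label size count) after each step of a
;; fill / clear / refill cycle. TABLE-STATS and SIZE-TRACE are
;; hypothetical helper names, not part of any implementation.
(defun table-stats (table)
  "Return a list of TABLE's current size and entry count."
  (list (hash-table-size table) (hash-table-count table)))

(defun size-trace (&optional (n 1000))
  "Fill, clear, and refill a table of requested size N.
Returns a list of (label size count) entries, one per step."
  (let ((h (make-hash-table :size n))
        (steps '()))
    (flet ((fill-half ()
             ;; Insert N/2 integer keys.
             (dotimes (i (truncate n 2))
               (setf (gethash i h) i)))
           (note (label)
             (push (cons label (table-stats h)) steps)))
      (note :fresh)
      (fill-half) (note :filled)
      (clrhash h) (note :cleared)
      (fill-half) (note :refilled)
      (nreverse steps))))
```

Calling (size-trace 100000) and comparing the size column between the :filled, :cleared and :refilled entries should show whether the shrink happens in clrhash itself or during the refill.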

It may be better to simply allocate a new hash table rather than using clrhash.
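
A minimal sketch of that approach, assuming each generation can be handed over as a list of positions. *TABLE-SIZE*, COLLECT-GENERATION and the POSITIONS argument are hypothetical stand-ins for the real search code, and I have not measured whether this is cheaper than clrhash for very large tables.

```lisp
;; Sketch: allocate a fresh table per generation instead of calling
;; CLRHASH on a shared one. All names here are illustrative only.
(defparameter *table-size* 100000
  "Requested size for each generation's table (an illustrative value).")

(defun collect-generation (positions)
  "Return a fresh EQUALP table whose keys are the distinct POSITIONS."
  (let ((table (make-hash-table :test #'equalp :size *table-size*)))
    ;; Duplicate positions collapse onto the same key under EQUALP.
    (dolist (p positions table)
      (setf (gethash p table) t))))
```

Once a generation has been written out, dropping the reference to its table leaves the old storage to the garbage collector, and the next call starts again from the requested size.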

> On 03 Jan 2015, at 15:10 , Glenn Iba <giba at alum.mit.edu> wrote:
> 
> Raymond,
> 
>   Thanks for the quick response.   Is it definitely the case, then, that a GC can trigger rehashing?
> Rehashing a large hash table is potentially expensive -- and I wanted to avoid it.
> The reason I allocate a large hash-table to begin with is to avoid the rehashing incurred
> due to growing the hash-table.  It takes my search an order of magnitude more time to
> compute a generation if the hash-table starts small and grows repeatedly.
> After my hash-tables "shrink" in size, the compute time for a generation jumps from 1/2 hour
> to 15 hours.
> 
> Is there any way to specify a minimum size for a hash-table?  or to protect it against 
> re-hashing during GC?
> 
> One idea I had was to allocate a new "really large" hash-table for each generation.
> This would be instead of using CLRHASH and re-using the original large hash-table.
> But this wouldn't help if GC causes it to shrink anyway.  My goal is to have a
> large hash-table allocated only once, and have it stay large so I can reuse it.
> 
> Thanks for any suggestions,
> --Glenn
> 
> 
> On Sat, Jan 3, 2015 at 3:26 AM, Raymond Wiker <rwiker at gmail.com> wrote:
> My understanding (as far as it goes) is that hash-table-size is just a hint to the runtime of the expected number of keys in the hash table; it is used when creating a hash table and can later be used to create other hash tables of similar size. hash-table-count, on the other hand, is the actual, current number of keys in the hash table. I would expect hash-table-size to increase as the number of keys in the table grows past the initial size, and it would also make sense for hash-table-size to decrease if the hash table is rehashed as part of gc.
> 
> > On 03 Jan 2015, at 08:47 , Glenn Iba <giba at alum.mit.edu> wrote:
> >
> > Call for help!
> >
> >   I'm doing some large searches in CCL, and have been using large hash-tables,
> > but I'm perplexed that the hash-table-size is getting mysteriously decreased.
> > Can anyone explain how this is possible?
> >
> > My speculation is that I'm exhausting the heap (though I don't get any notification of this),
> > and that CCL is trying to create more heap space by shrinking my large hash-table.
> > Does this sound like it could be possible?   I'd prefer to get a notification that I'm out of space.
> > Is there any way to control this?
> >
> > Details:
> >    I'm running CCL 1.10 on a Mac with OS X Yosemite (10.10.1), with 8GB RAM.
> >    I'm creating a single large hash-table with
> >          (make-hash-table :test #'equalp :size 100000000)   ;; 100,000,000
> >    I'm storing positions of my search space (each represented by a byte-vector of 16 unsigned-bytes)
> >        in this hash-table
> >    For each generation, I collect all the positions in the hash-table (to avoid duplicates).
> >    I then write the generation out to a file, and do CLRHASH so I can reuse the hash-table.
> >    My searches reach a point (as the generation size grows) when the hash-table-size
> >        decreases dramatically (from 100,000,000 to  12,396,373)  -- how is this possible?
> >
> > I'd be happy to supply code, detailed traces, and whatever other info I can
> >   to anyone who'd be willing to help me figure this out.
> >
> > Thanks in advance!
> > --Glenn
> >
> >
> > _______________________________________________
> > Openmcl-devel mailing list
> > Openmcl-devel at clozure.com
> > https://lists.clozure.com/mailman/listinfo/openmcl-devel
> 
> 
