<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">It looks like repeatedly clrhashing the table may cause it to be rehashed down to a smaller size. The following code snippet replicates what you’re seeing (I think): <div class=""><br class=""></div><div class=""><div class="">(let ((h (make-hash-table :size 100000)))</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span> (dotimes (i (truncate (hash-table-size h) 2))</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span> (setf (gethash i h) i))</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span> (clrhash h)</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span> (dotimes (i (truncate (hash-table-size h) 2))</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span> (setf (gethash i h) i))</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span> (hash-table-size h))</div></div><div class=""><br class=""></div><div class="">In ccl, this returns 75001. Lispworks 6.1 32-bit returns 100003, while sbcl returns 100000 (all on Mac OS X). This may or may not be a bug in ccl; no doubt people better qualified to judge will provide an answer to that.</div><div class=""><br class=""></div><div class="">It may be better to simply allocate a new hash table rather than using clrhash.</div><div class=""><br class=""><div class=""><div><blockquote type="cite" class=""><div class="">On 03 Jan 2015, at 15:10 , Glenn Iba <<a href="mailto:giba@alum.mit.edu" class="">giba@alum.mit.edu</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Raymond,<div class=""><br class=""><div class=""> Thanks for the quick response. 
Is it definitely the case, then, that a GC can trigger rehashing?<div class="">Rehashing a large hash table is potentially expensive -- and I wanted to avoid it.</div><div class="">The reason I allocate a large hash-table to begin with is to avoid the rehashing incurred</div><div class="">due to growing the hash-table. It takes my search an order of magnitude more time to</div><div class="">compute a generation if the hash-table starts small and grows repeatedly.</div><div class="">After my hash-tables "shrink" in size, the compute time for a generation jumps from 1/2 hour</div><div class="">to 15 hours.</div><div class=""><br class=""></div><div class="">Is there any way to specify a minimum size for a hash-table? or to protect it against </div><div class="">re-hashing during GC?</div><div class=""><br class=""></div><div class="">One idea I had was to allocate a new "really large" hash-table for each generation.</div><div class="">This would be instead of using CLRHASH and re-using the original large hash-table.</div><div class="">But this wouldn't help if GC causes it to shrink anyway. My goal is to have a</div><div class="">large hash-table allocated only once, and have it stay large so I can reuse it.</div><div class=""><br class=""></div><div class="">Thanks for any suggestions,</div></div></div><div class="">--Glenn</div><div class=""><br class=""></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Sat, Jan 3, 2015 at 3:26 AM, Raymond Wiker <span dir="ltr" class=""><<a href="mailto:rwiker@gmail.com" target="_blank" class="">rwiker@gmail.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">My understanding (as far as it goes) is that hash-table-size is just a hint to the runtime of the expected number of keys in the hash table; it is used when creating a hash table and can later be used to create other hash tables of similar size. 
hash-table-count, on the other hand, is the actual, current number of keys in the hash table. I would expect hash-table-size to increase as the number of keys in the table grows past the initial size, and it would also make sense for hash-table-size to decrease if the hash table is rehashed as part of gc.<br class="">
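<br class="">
As a minimal sketch of the distinction above, using only standard Common Lisp: the table size (1000) and key count (10) are arbitrary illustrative values, and the sizes printed are implementation-dependent.<br class="">
<pre class="">
;; Illustrative values only: 1000 and 10 are arbitrary.
(let ((h (make-hash-table :size 1000)))
  (dotimes (i 10)
    (setf (gethash i h) i))
  ;; hash-table-count is the number of entries currently stored.
  (format t "count=~D size=~D~%" (hash-table-count h) (hash-table-size h))
  (clrhash h)
  ;; After clrhash the count is 0; the reported size is up to the implementation.
  (format t "count=~D size=~D~%" (hash-table-count h) (hash-table-size h)))
</pre>
The count goes from 10 to 0 across the clrhash, while the value of hash-table-size before and after is whatever the implementation chooses to report.<br class="">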
<div class=""><div class="h5"><br class="">
> On 03 Jan 2015, at 08:47 , Glenn Iba <<a href="mailto:giba@alum.mit.edu" class="">giba@alum.mit.edu</a>> wrote:<br class="">
><br class="">
> Call for help!<br class="">
><br class="">
> I'm doing some large searches in CCL, and have been using large hash-tables,<br class="">
> but I'm perplexed that the hash-table-size is getting mysteriously decreased.<br class="">
> Can anyone explain how this is possible?<br class="">
><br class="">
> My speculation is that I'm exhausting the heap (though I don't get any notification of this),<br class="">
> and that CCL is trying to create more heap space by shrinking my large hash-table.<br class="">
> Does this sound like it could be possible? I'd prefer to get a notification that I'm out of space.<br class="">
> Is there any way to control this?<br class="">
><br class="">
> Details:<br class="">
> I'm running CCL 1.10 on a Mac with OS X Yosemite (10.10.1), with 8GB RAM.<br class="">
> I'm creating a single large hash-table with<br class="">
> (make-hash-table :test #'equalp :size 100000000) ;; 100,000,000<br class="">
> I'm storing positions of my search space (each represented by a byte-vector of 16 unsigned-bytes)<br class="">
> in this hash-table.<br class="">
> For each generation, I collect all the positions in the hash-table (to avoid duplicates).<br class="">
> I then write the generation out to a file, and do CLRHASH so I can reuse the hash-table.<br class="">
> My searches reach a point (as the generation size grows) when the hash-table-size<br class="">
> decreases dramatically (from 100,000,000 to 12,396,373) -- how is this possible?<br class="">
><br class="">
> I'd be happy to supply code, detailed traces, and whatever other info I can<br class="">
> to anyone who'd be willing to help me figure this out.<br class="">
><br class="">
> Thanks in advance!<br class="">
> --Glenn<br class="">
><br class="">
><br class="">
</div></div>> _______________________________________________<br class="">
> Openmcl-devel mailing list<br class="">
> <a href="mailto:Openmcl-devel@clozure.com" class="">Openmcl-devel@clozure.com</a><br class="">
> <a href="https://lists.clozure.com/mailman/listinfo/openmcl-devel" target="_blank" class="">https://lists.clozure.com/mailman/listinfo/openmcl-devel</a><br class="">
<br class="">
</blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></div></body></html>