[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen
Raymond Wiker
rwiker at gmail.com
Mon Jan 5 11:26:15 PST 2015
To me, it looks like a bug in ccl; it seems that clrhash just sets keys/values to the “unbound” value, and inserting new elements somehow fails to recognize the unbound elements as free for (re)use. Then, the table is resized when the total number of insertions since the last resize (even across calls to clrhash) exceeds the table size, but the new size is computed from the number of entries actually in use.
Test case:
;;; ===================================
(defparameter *advice-active-p* nil)
(advise ccl::compute-hash-size (when *advice-active-p*
(format t "Args: ~{~a~^, ~}~%" arglist))
:when :before :name :print-args)
(defun testit (size ratio)
(let ((h (make-hash-table :size size)))
(format t "1:~%")
(dotimes (i (floor (* (hash-table-size h) ratio)))
(setf (gethash i h) i))
(format t "2:~%")
(clrhash h)
(format t "3:~%")
(dotimes (i (floor (* (hash-table-size h) ratio)))
(setf (gethash i h) i)
(when (= i 0)
(format t "4:~%"))
(when (= i 1)
(format t "5:~%")))
(format t "6:~%")
(hash-table-size h)))
#||
(let ((*advice-active-p* t))
(testit 100000 0.5))
(let ((*advice-active-p* t))
(testit 100000 0.49))
||#
;;; ===================================
I get the output
Args: 99999, 1, 1.1764705
1:
2:
3:
4:
5:
Args: 49998, 1.5, 1.1764705
6:
Final size: 75001
and
Args: 99999, 1, 1.1764705
1:
2:
3:
4:
5:
6:
Final size: 100005
> On 05 Jan 2015, at 19:21 , Gail Zacharias <gz at clozure.com> wrote:
>
> In CCL hash tables do not get resized due to overall memory pressure, something else is going on.
>
> As Gary Byers mentioned earlier, the resizing behavior of hash tables is controlled by :REHASH-THRESHOLD and :REHASH-SIZE. The default values are 0.85 and 1.5. Specifying a larger :rehash-size, e.g. :rehash-size 5.0, would cut down the cost of regrowing the table, although it wouldn't explain why it gets shrunk in the first place.
>
> Does the problem still arise if you create your hash table with :LOCK-FREE NIL?
>
> I believe there is a way to get GC to run a user hook, though I don't remember how off hand. However, if you are sure that you have a non-weak equalp hash table whose only keys are byte vectors, it's very unlikely that the problem has anything to do with the gc, so this wouldn't be the first thing I'd look at.
>
> You can catch when hash tables get resized by advising ccl::compute-hash-size.
>
>
>
> On Jan 5, 2015, at 11:03 AM, Glenn Iba <giba at alum.mit.edu> wrote:
>
>> Hi Tim,
>>
>> Yes, for me it's a rather serious problem in terms of search performance.
>> If I know my hash-table will need to hold 50 million positions, then there is a huge
>> performance penalty if the hash-table has to grow from a small size.
>> As the number of positions becomes large, the cost to grow the hash-table and
>> rehash all the positions becomes significant. Presumably it is growing many times to
>> get back to a large size.
>>
>> Example:
>>
>> With a large hash-table, it takes my search roughly 1/2 hour to compute a generation
>> (the hash-table is basically used as a set to eliminate duplicate positions).
>> I do a CLRHASH in order to re-use the hash-table to accumulate the next generation
>> (this is a breadth-first search).
>> When the hash-table shrinks after a CLRHASH, it takes 12 hours to compute the next
>> generation (which is only slightly larger than the previous one).
>>
>> I agree that profiling is important.
>> Do you (or anyone else listening) know how to get code to generate alerts:
>> 1. Every time the hash-table grows
>> 2. Every time GC is called (or is there a way to turn off automatic GC and call it either manually or explicitly in my code?)
>>
>> Thanks for looking at this with me,
>> --Glenn
>>
>>
>>
>>
>> On Mon, Jan 5, 2015 at 6:43 AM, Tim Bradshaw <tfeb at me.com> wrote:
>> On 5 Jan 2015, at 06:05, Glenn Iba <giba at alum.mit.edu> wrote:
>>
>>> Meanwhile, for my search purposes:
>>> Is there any way in CCL to specify a Minimum size for a hash-table??
>>> It seems pretty annoying (to put it mildly) that rehashing might reduce the size of the table.
>>
>> I hate to ask this but: is this a problem? In particular is there an observed performance or functionality problem caused by this? My experience with things like this is that the performance characteristics are usually hairy and hard to understand, and that, usually, the people doing the implementation have understood these a lot better than I do, particularly with regard to GC. So unless I've profiled and found that there is actually a performance problem, or the application has exploding memory use or something, I never worry.
>>
>> _______________________________________________
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
>> https://lists.clozure.com/mailman/listinfo/openmcl-devel
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> https://lists.clozure.com/mailman/listinfo/openmcl-devel
More information about the Openmcl-devel
mailing list