[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen

Glenn Iba giba at alum.mit.edu
Sun Jan 4 22:05:00 PST 2015


Hi Gary,

   Thanks for your info.  I understand (I think) about rehashing caused by
the hash-table becoming "full".
I don't think that is the case for the "hash-table shrinking" I'm observing.

I wasn't aware that GC changing addresses of keys might cause rehashing,
but I guess that makes sense.
In my searches, I'm using hash-tables with :test #'equal, and my keys are
all byte-vectors (make-array 16 :element-type '(unsigned-byte 8)).
My model of behavior for equalp hashing is that the addresses of keys
shouldn't matter.   Am I mistaken?

So as far as I can tell, neither of the cases (1 or 2) that you mention
seems to be a cause for rehashing.

What I'm suspecting is that perhaps heap space is getting exhausted, and
somehow the GC is aggressively
looking for space to reclaim, and decides to reduce the size of my
hash-table in order to get more space.
My question is whether this could really be what's happening, or if my
speculation is wrong.
NOTE: Raymond produced a code snippet that demonstrated that CLRHASH could
result in a reduced hash-table size.
This experiment shows a difference in behavior between CCL and alternatives
like LispWorks and SBCL.
Here is a reproduction of his (Raymond Wiker's) experiment (from earlier in
this thread):
***begin quote***
It looks like repeatedly clrhashing the table may cause it to be rehashed
down to a smaller size. The following code snippet replicates what you’re
seeing (I think):

(let ((h (make-hash-table :size 100000)))
   (dotimes (i (truncate (hash-table-size h) 2))
     (setf (gethash i h) i))
   (clrhash h)
   (dotimes (i (truncate (hash-table-size h) 2))
     (setf (gethash i h) i))
   (hash-table-size h))

In ccl, this returns 75001. Lispworks 6.1 32-bit returns 100003, while sbcl
returns 100000 (all on Mac OS X). This may or may not be a bug in ccl; no
doubt people better qualified to judge will provide an answer to that.

It may be better to simply allocate a new hash table rather than using
clrhash.
***end quote***

Note that this experiment uses EQL hash-tables, not EQUALP like I'm using,
but still shows evidence of hash-table shrinking.

Here's a slight modification I made to Raymond's experiment:
(let ((h (make-hash-table :size 100000)))
   (dotimes (i (truncate (hash-table-size h) 2))
     (setf (gethash i h) i))
   (format t "~%before CLRHASH, hash-table-size = ~a" (hash-table-size h))
   (clrhash h)
   (format t "~%after CLRHASH, hash-table-size = ~a" (hash-table-size h))
   (dotimes (i (truncate (hash-table-size h) 2))
     (setf (gethash i h) i))
   (hash-table-size h))
CCL results:
before CLRHASH, hash-table-size = 100005
after CLRHASH, hash-table-size = 100005
75001

LispWorks results:
before CLRHASH, hash-table-size = 100003
after CLRHASH, hash-table-size = 100003
100003

So it seems that AFTER a CLRHASH, merely filling the hash-table
approximately 1/2 full will
cause the hash-table-size to decrease in CCL, but not in LispWorks.

Because the size of the hash-tables in these experiments is relatively
small,
and because the tables are only filled 1/2 way, I'm thinking this can't be
due to
heap-exhaustion.  It does look like CLRHASH may be doing something possibly
buggy.
What do you think?

Meanwhile, for my search purposes:
Is there any way in CCL to specify a Minimum size for a hash-table??
It seems pretty annoying (to put it mildly) that rehashing might reduce the
size of the table.
And the :rehash-size and :rehash-threshold don't help, and should only
matter when rehashing
due to the hash-table becoming full (which it isn't in my case).

Apologies for this longish post, and

Thanks in advance for any light you can shed on my ignorance,

--Glenn




On Sun, Jan 4, 2015 at 1:37 PM, Gary Byers <gb at clozure.com> wrote:

> On 01/03/2015 07:10:35 AM, Glenn Iba wrote:
>
>> Raymond,
>>
>>   Thanks for the quick response.   Is it definitely the case, then, that a
>> GC can trigger rehashing?
>>
>
> A hash table can need to be rehashed for either of two distinct reasons:
> 1) it's "full" - the number of key/value pairs that have been added to the
> table exceeds some threshold that's related to the size of the table.
> 2) some of the keys in the table are hashed by address and the GC has
> changed the address of one or more keys in the table, This is not related
> to the size of the table; it is related to the hash table's :TEST function
> and to the types of objects used as keys in the table.
>
> The value of the :size parameter to MAKE-HASH-TABLE doesn't have much to
> do with the "size" of the table after it's been rehashed (for either
> reason); the values of the :REHASH-SIZE and :REHASH-THRESHOLD should (and
> AFAIK they do.)
>
> There's a lot more that could and should be said about this, but if you're
> thinking that making larger hash tables will avoid rehashing because of key
> movement (case (2)) you're mistaken.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clozure.com/pipermail/openmcl-devel/attachments/20150105/306362b5/attachment.htm>


More information about the Openmcl-devel mailing list