[Openmcl-devel] Hash Table anomaly -- hash-table-size decreases - wondering how this can happen

Glenn Iba giba at alum.mit.edu
Sat Jan 3 06:10:35 PST 2015


Raymond,

  Thanks for the quick response. Is it definitely the case, then, that a
GC can trigger rehashing? Rehashing a large hash table is potentially
expensive, and I wanted to avoid it.
The reason I allocate a large hash-table to begin with is to avoid the
rehashing incurred by growing the hash-table. It takes my search an order
of magnitude more time to compute a generation if the hash-table starts
small and grows repeatedly. After my hash-tables "shrink" in size, the
compute time for a generation jumps from 1/2 hour to 15 hours.

Is there any way to specify a minimum size for a hash-table, or to protect
it against rehashing during GC?

One idea I had was to allocate a new "really large" hash-table for each
generation, instead of using CLRHASH and reusing the original large
hash-table. But this wouldn't help if GC causes it to shrink anyway. My
goal is to have a large hash-table allocated only once, and have it stay
large so I can reuse it.
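One possible middle ground between reusing the table with CLRHASH and reallocating it every generation is to reallocate only when HASH-TABLE-SIZE shows the table has actually shrunk. The sketch below uses only standard Common Lisp hash-table functions; the helper names MAKE-GENERATION-TABLE and ENSURE-LARGE-TABLE are hypothetical. It does not prevent the implementation from shrinking the table (nothing in this thread confirms how or when CCL does that); it only detects a shrink at the start of each generation and pays for one fresh allocation in that case.

```lisp
;;; Sketch, not a CCL feature: reuse one large EQUALP table across
;;; generations, but replace it if its HASH-TABLE-SIZE has dropped below
;;; the size observed when it was created.
;;; MAKE-GENERATION-TABLE and ENSURE-LARGE-TABLE are hypothetical helpers.

(defun make-generation-table (size)
  "Allocate an EQUALP table; return it and its observed HASH-TABLE-SIZE.
:SIZE is only a hint, so the observed size may differ from SIZE."
  (let ((table (make-hash-table :test #'equalp :size size)))
    (values table (hash-table-size table))))

(defun ensure-large-table (table baseline)
  "Return an empty table ready for the next generation.
TABLE is cleared first; if its size is then below BASELINE (the size
observed at creation), a fresh table is allocated and returned instead."
  (clrhash table)
  (if (< (hash-table-size table) baseline)
      ;; The table has shrunk: return only the new table (primary value).
      (values (make-generation-table baseline))
      table))

;;; Usage with a small table and a 16-byte position as the key:
(multiple-value-bind (table baseline) (make-generation-table 1000)
  (setf (gethash (make-array 16 :element-type '(unsigned-byte 8)
                                :initial-element 0)
                 table)
        t)
  (setf table (ensure-large-table table baseline))
  (hash-table-count table))
```

Checking after CLRHASH rather than before means a shrink caused by CLRHASH itself would also be caught. Note that the caller still holds the old table while the replacement is allocated, so peak memory may briefly cover both tables.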

Thanks for any suggestions,
--Glenn


On Sat, Jan 3, 2015 at 3:26 AM, Raymond Wiker <rwiker at gmail.com> wrote:

> My understanding (as far as it goes) is that hash-table-size is just a
> hint to the runtime of the expected number of keys in the hash table; it is
> used when creating a hash table and can later be used to create other hash
> tables of similar size. hash-table-count, on the other hand, is the actual,
> current number of keys in the hash table. I would expect hash-table-size to
> increase as the number of keys in the table grows past the initial size,
> and it would also make sense for hash-table-size to decrease if the hash
> table is rehashed as part of gc.
>
> > On 03 Jan 2015, at 08:47 , Glenn Iba <giba at alum.mit.edu> wrote:
> >
> > Call for help!
> >
> >   I'm doing some large searches in CCL, and have been using large
> > hash-tables, but I'm perplexed that the hash-table-size is getting
> > mysteriously decreased.
> > Can anyone explain how this is possible?
> >
> > My speculation is that I'm exhausting the heap (though I don't get any
> > notification of this), and that CCL is trying to create more heap space
> > by shrinking my large hash-table.
> > Does this sound like it could be possible? I'd prefer to get a
> > notification that I'm out of space.
> > Is there any way to control this?
> >
> > Details:
> >    I'm running CCL 1.10 on a Mac with OS X Yosemite (10.10.1), with 8GB RAM.
> >    I'm creating a single large hash-table with
> >          (make-hash-table :test #'equalp :size 100000000)   ;; 100,000,000
> >    I'm storing positions of my search space (each represented by a
> >        byte-vector of 16 unsigned-bytes) in this hash-table.
> >    For each generation, I collect all the positions in the hash-table
> >        (to avoid duplicates).
> >    I then write the generation out to a file, and do CLRHASH so I can
> >        reuse the hash-table.
> >    My searches reach a point (as the generation size grows) when the
> >        hash-table-size decreases dramatically (from 100,000,000 to
> >        12,396,373). How is this possible?
> >
> > I'd be happy to supply code, detailed traces, and whatever other info I
> > can to anyone who'd be willing to help me figure this out.
> >
> > Thanks in advance!
> > --Glenn
> >
> >
> > _______________________________________________
> > Openmcl-devel mailing list
> > Openmcl-devel at clozure.com
> > https://lists.clozure.com/mailman/listinfo/openmcl-devel
>
>

