[Openmcl-devel] Hash Table performance
gz at clozure.com
Tue Sep 23 22:34:07 UTC 2008
At 9/23/2008 05:58 PM, Osei Poku wrote:
>CCL extensively uses the ENUMERATE-HASH-KEYS-AND-VALUES function
>underneath most of hash table functions (MAPHASH, WITH-HASH-TABLE-
>ITERATOR). This causes a great deal of waste when iterating through a
>very large hash table (like I have now ~10^8 entries). The way I
>understand the code from briefly looking at l0-hash.lisp and
>hash.lisp, the contents of the hash table are first stored in a pair
>of vectors, then the vectors are iterated through.
>WITH-HASH-TABLE-ITERATOR is used everywhere like in the expansion of
>the LOOP macro. There is simply no standard way in CCL to iterate
>through the contents of a hash table without doing this double work.
>Is there some fundamental limitation of the way hash tables are
>implemented in CCL that prevent iterating through its contents only
Sort of. CCL does rehashing in place. If a hash table were rehashed
while you're iterating over it, the result would be totally random.
However, there are potentially ways around this. For example, a hash
table could know whether it's being iterated over, and do a copying
rehash if so. Of course if this does happen (and especially if it
happens repeatedly) you might end up consing as much or more as
the current iterator, but only if your keys are address based and
recently consed, so it's probably a good bet.
The new "nearly lock free" hash tables that are in the trunk always do
a copying rehash (for other reasons), so they especially should be
made to iterate without copying.
More information about the Openmcl-devel