[Openmcl-devel] eql and equal hash tables help please
Gary Byers
gb at clozure.com
Wed Jan 16 21:36:59 PST 2013
I think that we can conclude that one or both of the following are true:
a) the hashing function that CCL uses for pathnames and EQUAL hash tables
is really really bad: it either doesn't do a very good job of generating
different hash values for different pathnames or it takes way too long to
generate those values, or both
b) EQUAL takes way too long to compare pathnames. (EQUAL on pathnames is
likely never going to be cheap, but if it's grossly expensive for some
reason that could help to explain the results that you're seeing.)
I haven't looked at things too closely yet, but I'd guess that (a) is more
of the problem; we may not be particularly clever about (b), but I don't
think that EQUAL does anything grossly stupid. Knock wood.
The bad news is that nap time is fast approaching for me, and it'd probably
be wise to wait until tomorrow before whacking away at this. The good news
is that many of the things that make the code in l0-hash.lisp get blurry
and out-of-focus as quickly as it does (thread safety and concurrency,
GC safety and integration) don't seem to be implicated.
If I wake up in the middle of the night having had nightmares about pathname
hashing, I'll regret pointing this out, but ... The function that maps objects
to integer hash codes for EQUAL hash tables is called CCL::%%EQUALHASH, and
it returns two values (a hash code and a boolean indicating whether the hash
code is derived from the address of some part of the object and could change
due to GC. Let's look at the values returned by CCL::%%EQUALHASH for two
similar but different pathname arguments:
? (%%equalhash #p"/usr/local/src/ccl/level-1/")
216088106610578
T
? (%%equalhash #p"/usr/local/src/ccl/level-0/")
216088106610578
T
?
Bloodcurdling ...
On Wed, 16 Jan 2013, George Khouri wrote:
> I did but there appears to be some limit to the initial size of a hash table
> in ccl.? I tried :size 90000 but when I interrupted the program the initial
> size of the table was only about 15k. I admit that l0-hash.lisp is a little
> too much brain drain for me to try to understand (in a reasonable amount of
> time) everything that's going on with hashing so I thought I'd ask.
> Thanks,
> George
>
> On 01/16/2013 04:07 PM, Dave Cooper wrote:
>
> Did you try it using the :size argument to make-hash-table, with the
> estimated number of entries in the file system?
>
>
> Best Regards,
> ? Dave
>
>
> On Jan 16, 2013, at 6:31 PM, George Khouri <gk1 at four-four.com> wrote:
>
> Friends,
> In a program that traverses a file system of ~125000
> entries and adds the pathnames as keys to an #'equal hash
> table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no
> progress after ~14000 entries - seems like it's just
> thrashing. No error is thrown and I can interrupt it with
> control-C.
> If I add the pathnames to an #'eql hash table ccl has no
> problem with the same directory.
> Is this expected behavior??
>
> I tried the same thing in SBCL which works with either an
> eql or an equal hash table.? BTW, the eql version runs in
> ~6 seconds in SBCL, compared to ~70 seconds in CCL, though
> I realize this speed difference could be a memory issue
> and not one of algorithms. I know nothing about SBCL.
>
> I'm running ccl 1.8 on linux 3.2.0-33-generic Ubuntu 12.04
> on a core i5 thinkpad, 4GB.
>
> It doesn't "appear" that I'm? running low on memory, but
> the program behaves that way with the equal hash table,
> and though I've tried the various ccl command-line options
> to increase stack and heap allocations, as well as ulimit
> stack size, I've had no change in results.? I've tried
> various values for initial hash table size, rehash size
> and threshold but nothing seems to have any beneficial
> effect.? The system monitor doesn't show any disk swapping
> going on.
> Is there some memory parameter I'm missing that I need to
> change? Is there? a problem related to having 4 cpu cores?
>
> Here's the typical output when I interrupt the seemingly
> hung program - the hash table won't grow beyond this
> point:
>
> Welcome to Clozure Common Lisp Version 1.8-r15286M?
> (LinuxX8664)!
> ? (traverse-filesystem "/home/gkhouri/")
> ................^C
> > Break: interrupt signal
> > While executing: CCL::LOCK-FREE-PUTHASH, in process
> listener(1).
> ...
> 1 > (:f 0)
> ?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH
> #P"/tmp/gkhouri-test/Documents/Music/M Stal Tunes.csv"
> #<HASH-TABLE :TEST EQUAL size 13672/13673 #x302000753F9D>
> T) 613
> ? (CCL::KEY CCL::HASH CCL::VALUE)
> ?? CCL::KEY: #P"/tmp/gkhouri-test/Documents/Music/M Stal
> Tunes.csv"
> ?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size 13672/13673
> #x302000753F9D>
> ?? CCL::VALUE: T
>
> 1 > (room)
> Approximately 17,432,576 bytes of memory can be allocated
> before the next full GC is triggered.
>
> ?????????????????? Total Size????????????
> Free???????????????? Used
> Lisp Heap:?????? 41025536 (40064K)?? 17432576 (17024K)??
> 23592960 (23040K)
> Stacks:????????? 11072816 (10813K)?? 11055872
> (10797K)????? 16944 (17K)
> Static:????????? 20304144 (19828K)????????? 0 (0K)??????
> 20304144 (19828K)
> 376792.870 MB reserved for heap expansion.
>
>
> If I create the equal hash table with :lock-free set to
> nil,? (make-hash-table :test #'equal :lock-free nil),? ccl
> presumably uses an older hashing algorithm and I don't
> think I have the problem storing the pathnames, BUT after
> about 15,000 entries, things slow down drastically, and
> unacceptably, presumably from rehashing (?), and I killed
> the program after about 60,000 entries because it was
> taking so long. With :lock-free enabled, I never get past
> ~14K entries.
>
> Thanks for any info.
> George Khouri
>
>
>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>
>
>
More information about the Openmcl-devel
mailing list