[Openmcl-devel] eql and equal hash tables help please
hans.huebner at gmail.com
Wed Jan 16 22:11:15 PST 2013
One idea to try would be to not use pathnames, but name strings as keys.
On Thu, Jan 17, 2013 at 6:36 AM, Gary Byers <gb at clozure.com> wrote:
> I think that we can conclude that one or both of the following are true:
> a) the hashing function that CCL uses for pathnames and EQUAL hash tables
> is really really bad: it either doesn't do a very good job of generating
> different hash values for different pathnames or it takes way too long
> generate those values, or both
> b) EQUAL takes way too long to compare pathnames. (EQUAL on pathnames is
> likely never going to be cheap, but if it's grossly expensive for some
> reason that could help to explain the results that you're seeing.)
> I haven't looked at things too closely yet, but I'd guess that (a) is more
> of the problem; we may not be particularly clever about (b), but I don't
> think that EQUAL does anything grossly stupid. Knock wood.
> The bad news is that nap time is fast approaching for me, and it'd probably
> be wise to wait until tomorrow before whacking away at this. The good news
> is that many of the things that make the code in l0-hash.lisp get blurry
> and out-of-focus as quickly as it does (thread safety and concurrency,
> GC safety and integration) don't seem to be implicated.
> If I wake up in the middle of the night having had nightmares about
> hashing, I'll regret pointing this out, but ... The function that maps
> to integer hash codes for EQUAL hash tables is called CCL::%%EQUALHASH, and
> it returns two values (a hash code and a boolean indicating whether the
> code is derived from the address of some part of the object and could
> due to GC. Let's look at the values returned by CCL::%%EQUALHASH for two
> similar but different pathname arguments:
> ? (%%equalhash #p"/usr/local/src/ccl/level-1/**")
> ? (%%equalhash #p"/usr/local/src/ccl/level-0/**")
> Bloodcurdling ...
> On Wed, 16 Jan 2013, George Khouri wrote:
> I did but there appears to be some limit to the initial size of a hash
>> in ccl.? I tried :size 90000 but when I interrupted the program the
>> size of the table was only about 15k. I admit that l0-hash.lisp is a
>> too much brain drain for me to try to understand (in a reasonable amount
>> time) everything that's going on with hashing so I thought I'd ask.
>> On 01/16/2013 04:07 PM, Dave Cooper wrote:
>> Did you try it using the :size argument to make-hash-table, with the
>> estimated number of entries in the file system?
>> Best Regards,
>> ? Dave
>> On Jan 16, 2013, at 6:31 PM, George Khouri <gk1 at four-four.com> wrote:
>> In a program that traverses a file system of ~125000
>> entries and adds the pathnames as keys to an #'equal hash
>> table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no
>> progress after ~14000 entries - seems like it's just
>> thrashing. No error is thrown and I can interrupt it with
>> If I add the pathnames to an #'eql hash table ccl has no
>> problem with the same directory.
>> Is this expected behavior??
>> I tried the same thing in SBCL which works with either an
>> eql or an equal hash table.? BTW, the eql version runs in
>> ~6 seconds in SBCL, compared to ~70 seconds in CCL, though
>> I realize this speed difference could be a memory issue
>> and not one of algorithms. I know nothing about SBCL.
>> I'm running ccl 1.8 on linux 3.2.0-33-generic Ubuntu 12.04
>> on a core i5 thinkpad, 4GB.
>> It doesn't "appear" that I'm? running low on memory, but
>> the program behaves that way with the equal hash table,
>> and though I've tried the various ccl command-line options
>> to increase stack and heap allocations, as well as ulimit
>> stack size, I've had no change in results.? I've tried
>> various values for initial hash table size, rehash size
>> and threshold but nothing seems to have any beneficial
>> effect.? The system monitor doesn't show any disk swapping
>> going on.
>> Is there some memory parameter I'm missing that I need to
>> change? Is there? a problem related to having 4 cpu cores?
>> Here's the typical output when I interrupt the seemingly
>> hung program - the hash table won't grow beyond this
>> Welcome to Clozure Common Lisp Version 1.8-r15286M?
>> ? (traverse-filesystem "/home/gkhouri/")
>> > Break: interrupt signal
>> > While executing: CCL::LOCK-FREE-PUTHASH, in process
>> 1 > (:f 0)
>> ?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH
>> #P"/tmp/gkhouri-test/**Documents/Music/M Stal Tunes.csv"
>> #<HASH-TABLE :TEST EQUAL size 13672/13673 #x302000753F9D>
>> T) 613
>> ? (CCL::KEY CCL::HASH CCL::VALUE)
>> ?? CCL::KEY: #P"/tmp/gkhouri-test/**Documents/Music/M Stal
>> ?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size 13672/13673
>> ?? CCL::VALUE: T
>> 1 > (room)
>> Approximately 17,432,576 bytes of memory can be allocated
>> before the next full GC is triggered.
>> ?????????????????? Total Size????????????
>> Free???????????????? Used
>> Lisp Heap:?????? 41025536 (40064K)?? 17432576 (17024K)??
>> 23592960 (23040K)
>> Stacks:????????? 11072816 (10813K)?? 11055872
>> (10797K)????? 16944 (17K)
>> Static:????????? 20304144 (19828K)????????? 0 (0K)??????
>> 20304144 (19828K)
>> 376792.870 MB reserved for heap expansion.
>> If I create the equal hash table with :lock-free set to
>> nil,? (make-hash-table :test #'equal :lock-free nil),? ccl
>> presumably uses an older hashing algorithm and I don't
>> think I have the problem storing the pathnames, BUT after
>> about 15,000 entries, things slow down drastically, and
>> unacceptably, presumably from rehashing (?), and I killed
>> the program after about 60,000 entries because it was
>> taking so long. With :lock-free enabled, I never get past
>> ~14K entries.
>> Thanks for any info.
>> George Khouri
>> Openmcl-devel mailing list
>> Openmcl-devel at clozure.com
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Openmcl-devel