[Openmcl-devel] eql and equal hash tables help please

Hans Hübner hans.huebner at gmail.com
Wed Jan 16 22:11:15 PST 2013


One idea to try would be to not use pathnames, but name strings as keys.

-Hans


On Thu, Jan 17, 2013 at 6:36 AM, Gary Byers <gb at clozure.com> wrote:

> I think that we can conclude that one or both of the following are true:
>
> a) the hashing function that CCL uses for pathnames and EQUAL hash tables
>    is really really bad: it either doesn't do a very good job of generating
>    different hash values for different pathnames or it takes way too long
> to
>    generate those values, or both
> b) EQUAL takes way too long to compare pathnames.  (EQUAL on pathnames is
>    likely never going to be cheap, but if it's grossly expensive for some
>    reason that could help to explain the results that you're seeing.)
>
> I haven't looked at things too closely yet, but I'd guess that (a) is more
> of the problem; we may not be particularly clever about (b), but I don't
> think that EQUAL does anything grossly stupid.  Knock wood.
>
> The bad news is that nap time is fast approaching for me, and it'd probably
> be wise to wait until tomorrow before whacking away at this.  The good news
> is that many of the things that make the code in l0-hash.lisp get blurry
> and out-of-focus as quickly as it does (thread safety and concurrency,
> GC safety and integration) don't seem to be implicated.
>
> If I wake up in the middle of the night having had nightmares about
> pathname
> hashing, I'll regret pointing this out, but ...  The function that maps
> objects
> to integer hash codes for EQUAL hash tables is called CCL::%%EQUALHASH, and
> it returns two values (a hash code and a boolean indicating whether the
> hash
> code is derived from the address of some part of the object and could
> change
> due to GC.  Let's look at the values returned by CCL::%%EQUALHASH for two
> similar but different pathname arguments:
>
> ? (%%equalhash #p"/usr/local/src/ccl/level-1/**")
> 216088106610578
> T
> ? (%%equalhash #p"/usr/local/src/ccl/level-0/**")
> 216088106610578
> T
> ?
>
> Bloodcurdling ...
>
>
>
> On Wed, 16 Jan 2013, George Khouri wrote:
>
>  I did but there appears to be some limit to the initial size of a hash
>> table
>> in ccl.? I tried :size 90000 but when I interrupted the program the
>> initial
>>
>> size of the table was only about 15k. I admit that l0-hash.lisp is a
>> little
>> too much brain drain for me to try to understand (in a reasonable amount
>> of
>> time) everything that's going on with hashing so I thought I'd ask.
>> Thanks,
>> George
>>
>> On 01/16/2013 04:07 PM, Dave Cooper wrote:
>>
>> Did you try it using the :size argument to make-hash-table, with the
>> estimated number of entries in the file system?
>>
>>
>> Best Regards,
>> ? Dave
>>
>>
>>
>> On Jan 16, 2013, at 6:31 PM, George Khouri <gk1 at four-four.com> wrote:
>>
>>       Friends,
>>       In a program that traverses a file system of ~125000
>>       entries and adds the pathnames as keys to an #'equal hash
>>       table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no
>>       progress after ~14000 entries - seems like it's just
>>       thrashing. No error is thrown and I can interrupt it with
>>       control-C.
>>       If I add the pathnames to an #'eql hash table ccl has no
>>       problem with the same directory.
>>       Is this expected behavior??
>>
>>
>>       I tried the same thing in SBCL which works with either an
>>       eql or an equal hash table.? BTW, the eql version runs in
>>
>>       ~6 seconds in SBCL, compared to ~70 seconds in CCL, though
>>       I realize this speed difference could be a memory issue
>>       and not one of algorithms. I know nothing about SBCL.
>>
>>       I'm running ccl 1.8 on linux 3.2.0-33-generic Ubuntu 12.04
>>       on a core i5 thinkpad, 4GB.
>>
>>       It doesn't "appear" that I'm? running low on memory, but
>>
>>       the program behaves that way with the equal hash table,
>>       and though I've tried the various ccl command-line options
>>       to increase stack and heap allocations, as well as ulimit
>>       stack size, I've had no change in results.? I've tried
>>
>>       various values for initial hash table size, rehash size
>>       and threshold but nothing seems to have any beneficial
>>       effect.? The system monitor doesn't show any disk swapping
>>
>>       going on.
>>       Is there some memory parameter I'm missing that I need to
>>       change? Is there? a problem related to having 4 cpu cores?
>>
>>
>>       Here's the typical output when I interrupt the seemingly
>>       hung program - the hash table won't grow beyond this
>>       point:
>>
>>       Welcome to Clozure Common Lisp Version 1.8-r15286M?
>>
>>       (LinuxX8664)!
>>       ? (traverse-filesystem "/home/gkhouri/")
>>       ................^C
>>       > Break: interrupt signal
>>       > While executing: CCL::LOCK-FREE-PUTHASH, in process
>>       listener(1).
>>       ...
>>       1 > (:f 0)
>>       ?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH
>>
>>       #P"/tmp/gkhouri-test/**Documents/Music/M Stal Tunes.csv"
>>       #<HASH-TABLE :TEST EQUAL size 13672/13673 #x302000753F9D>
>>       T) 613
>>       ? (CCL::KEY CCL::HASH CCL::VALUE)
>>       ?? CCL::KEY: #P"/tmp/gkhouri-test/**Documents/Music/M Stal
>>       Tunes.csv"
>>       ?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size 13672/13673
>>       #x302000753F9D>
>>       ?? CCL::VALUE: T
>>
>>
>>       1 > (room)
>>       Approximately 17,432,576 bytes of memory can be allocated
>>       before the next full GC is triggered.
>>
>>       ?????????????????? Total Size????????????
>>       Free???????????????? Used
>>       Lisp Heap:?????? 41025536 (40064K)?? 17432576 (17024K)??
>>       23592960 (23040K)
>>       Stacks:????????? 11072816 (10813K)?? 11055872
>>       (10797K)????? 16944 (17K)
>>       Static:????????? 20304144 (19828K)????????? 0 (0K)??????
>>
>>       20304144 (19828K)
>>       376792.870 MB reserved for heap expansion.
>>
>>
>>       If I create the equal hash table with :lock-free set to
>>       nil,? (make-hash-table :test #'equal :lock-free nil),? ccl
>>
>>       presumably uses an older hashing algorithm and I don't
>>       think I have the problem storing the pathnames, BUT after
>>       about 15,000 entries, things slow down drastically, and
>>       unacceptably, presumably from rehashing (?), and I killed
>>       the program after about 60,000 entries because it was
>>       taking so long. With :lock-free enabled, I never get past
>>       ~14K entries.
>>
>>       Thanks for any info.
>>       George Khouri
>>
>>
>>
>>
>>       ______________________________**_________________
>>       Openmcl-devel mailing list
>>       Openmcl-devel at clozure.com
>>       http://clozure.com/mailman/**listinfo/openmcl-devel<http://clozure.com/mailman/listinfo/openmcl-devel>
>>
>>
>>
>>
>>  ______________________________**_________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/**listinfo/openmcl-devel<http://clozure.com/mailman/listinfo/openmcl-devel>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clozure.com/pipermail/openmcl-devel/attachments/20130117/c785e1d9/attachment.htm>


More information about the Openmcl-devel mailing list