[Openmcl-devel] eql and equal hash tables help please

George Khouri gk1 at four-four.com
Thu Jan 17 00:32:31 PST 2013


namestrings do serve as a workaround, and so does an EQL hash table with 
pathnames - I just wanted to clarify whether  EQUAL hash tables should 
work with a large number of pathnames.  CCL stores the first 10k entries 
in a second or two on my system. It's after that that things aren't pretty.
George

On 01/16/2013 10:11 PM, Hans Hübner wrote:
> One idea to try would be to not use pathnames, but name strings as keys.
>
> -Hans
>
>
> On Thu, Jan 17, 2013 at 6:36 AM, Gary Byers <gb at clozure.com 
> <mailto:gb at clozure.com>> wrote:
>
>     I think that we can conclude that one or both of the following are
>     true:
>
>     a) the hashing function that CCL uses for pathnames and EQUAL hash
>     tables
>        is really really bad: it either doesn't do a very good job of
>     generating
>        different hash values for different pathnames or it takes way
>     too long to
>        generate those values, or both
>     b) EQUAL takes way too long to compare pathnames.  (EQUAL on
>     pathnames is
>        likely never going to be cheap, but if it's grossly expensive
>     for some
>        reason that could help to explain the results that you're seeing.)
>
>     I haven't looked at things too closely yet, but I'd guess that (a)
>     is more
>     of the problem; we may not be particularly clever about (b), but I
>     don't
>     think that EQUAL does anything grossly stupid.  Knock wood.
>
>     The bad news is that nap time is fast approaching for me, and it'd
>     probably
>     be wise to wait until tomorrow before whacking away at this.  The
>     good news
>     is that many of the things that make the code in l0-hash.lisp get
>     blurry
>     and out-of-focus as quickly as it does (thread safety and concurrency,
>     GC safety and integration) don't seem to be implicated.
>
>     If I wake up in the middle of the night having had nightmares
>     about pathname
>     hashing, I'll regret pointing this out, but ...  The function that
>     maps objects
>     to integer hash codes for EQUAL hash tables is called
>     CCL::%%EQUALHASH, and
>     it returns two values (a hash code and a boolean indicating
>     whether the hash
>     code is derived from the address of some part of the object and
>     could change
>     due to GC.  Let's look at the values returned by CCL::%%EQUALHASH
>     for two
>     similar but different pathname arguments:
>
>     ? (%%equalhash #p"/usr/local/src/ccl/level-1/")
>     216088106610578
>     T
>     ? (%%equalhash #p"/usr/local/src/ccl/level-0/")
>     216088106610578
>     T
>     ?
>
>     Bloodcurdling ...
>
>
>
>     On Wed, 16 Jan 2013, George Khouri wrote:
>
>         I did but there appears to be some limit to the initial size
>         of a hash table
>         in ccl.? I tried :size 90000 but when I interrupted the
>         program the initial
>
>         size of the table was only about 15k. I admit that
>         l0-hash.lisp is a little
>         too much brain drain for me to try to understand (in a
>         reasonable amount of
>         time) everything that's going on with hashing so I thought I'd
>         ask.
>         Thanks,
>         George
>
>         On 01/16/2013 04:07 PM, Dave Cooper wrote:
>
>         Did you try it using the :size argument to make-hash-table,
>         with the
>         estimated number of entries in the file system?
>
>
>         Best Regards,
>         ? Dave
>
>
>
>         On Jan 16, 2013, at 6:31 PM, George Khouri <gk1 at four-four.com
>         <mailto:gk1 at four-four.com>> wrote:
>
>               Friends,
>               In a program that traverses a file system of ~125000
>               entries and adds the pathnames as keys to an #'equal hash
>               table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no
>               progress after ~14000 entries - seems like it's just
>               thrashing. No error is thrown and I can interrupt it with
>               control-C.
>               If I add the pathnames to an #'eql hash table ccl has no
>               problem with the same directory.
>               Is this expected behavior??
>
>
>               I tried the same thing in SBCL which works with either an
>               eql or an equal hash table.? BTW, the eql version runs in
>
>               ~6 seconds in SBCL, compared to ~70 seconds in CCL, though
>               I realize this speed difference could be a memory issue
>               and not one of algorithms. I know nothing about SBCL.
>
>               I'm running ccl 1.8 on linux 3.2.0-33-generic Ubuntu 12.04
>               on a core i5 thinkpad, 4GB.
>
>               It doesn't "appear" that I'm? running low on memory, but
>
>               the program behaves that way with the equal hash table,
>               and though I've tried the various ccl command-line options
>               to increase stack and heap allocations, as well as ulimit
>               stack size, I've had no change in results.? I've tried
>
>               various values for initial hash table size, rehash size
>               and threshold but nothing seems to have any beneficial
>               effect.? The system monitor doesn't show any disk swapping
>
>               going on.
>               Is there some memory parameter I'm missing that I need to
>               change? Is there? a problem related to having 4 cpu cores?
>
>
>               Here's the typical output when I interrupt the seemingly
>               hung program - the hash table won't grow beyond this
>               point:
>
>               Welcome to Clozure Common Lisp Version 1.8-r15286M?
>
>               (LinuxX8664)!
>               ? (traverse-filesystem "/home/gkhouri/")
>               ................^C
>               > Break: interrupt signal
>               > While executing: CCL::LOCK-FREE-PUTHASH, in process
>               listener(1).
>               ...
>               1 > (:f 0)
>               ?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH
>
>               #P"/tmp/gkhouri-test/Documents/Music/M Stal Tunes.csv"
>               #<HASH-TABLE :TEST EQUAL size 13672/13673 #x302000753F9D>
>               T) 613
>               ? (CCL::KEY CCL::HASH CCL::VALUE)
>               ?? CCL::KEY: #P"/tmp/gkhouri-test/Documents/Music/M Stal
>               Tunes.csv"
>               ?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size 13672/13673
>               #x302000753F9D>
>               ?? CCL::VALUE: T
>
>
>               1 > (room)
>               Approximately 17,432,576 bytes of memory can be allocated
>               before the next full GC is triggered.
>
>               ?????????????????? Total Size????????????
>               Free???????????????? Used
>               Lisp Heap:?????? 41025536 (40064K)?? 17432576 (17024K)??
>               23592960 (23040K)
>               Stacks:????????? 11072816 (10813K)?? 11055872
>               (10797K)????? 16944 <tel:16944> (17K)
>               Static:????????? 20304144 (19828K)????????? 0 (0K)??????
>
>               20304144 (19828K)
>               376792.870 MB reserved for heap expansion.
>
>
>               If I create the equal hash table with :lock-free set to
>               nil,? (make-hash-table :test #'equal :lock-free nil),? ccl
>
>               presumably uses an older hashing algorithm and I don't
>               think I have the problem storing the pathnames, BUT after
>               about 15,000 entries, things slow down drastically, and
>               unacceptably, presumably from rehashing (?), and I killed
>               the program after about 60,000 entries because it was
>               taking so long. With :lock-free enabled, I never get past
>               ~14K entries.
>
>               Thanks for any info.
>               George Khouri
>
>
>
>
>               _______________________________________________
>               Openmcl-devel mailing list
>         Openmcl-devel at clozure.com <mailto:Openmcl-devel at clozure.com>
>         http://clozure.com/mailman/listinfo/openmcl-devel
>
>
>
>
>     _______________________________________________
>     Openmcl-devel mailing list
>     Openmcl-devel at clozure.com <mailto:Openmcl-devel at clozure.com>
>     http://clozure.com/mailman/listinfo/openmcl-devel
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clozure.com/pipermail/openmcl-devel/attachments/20130117/97fc13fe/attachment.htm>


More information about the Openmcl-devel mailing list