[Openmcl-devel] eql and equal hash tables help please
George Khouri
gk1 at four-four.com
Thu Jan 17 00:32:31 PST 2013
namestrings do serve as a workaround, and so does an EQL hash table with
pathnames - I just wanted to clarify whether EQUAL hash tables should
work with a large number of pathnames. CCL stores the first 10k entries
in a second or two on my system. It's after that that things aren't pretty.
George
On 01/16/2013 10:11 PM, Hans Hübner wrote:
> One idea to try would be to not use pathnames, but name strings as keys.
>
> -Hans
>
>
> On Thu, Jan 17, 2013 at 6:36 AM, Gary Byers <gb at clozure.com
> <mailto:gb at clozure.com>> wrote:
>
> I think that we can conclude that one or both of the following are
> true:
>
> a) the hashing function that CCL uses for pathnames and EQUAL hash
> tables
> is really really bad: it either doesn't do a very good job of
> generating
> different hash values for different pathnames or it takes way
> too long to
> generate those values, or both
> b) EQUAL takes way too long to compare pathnames. (EQUAL on
> pathnames is
> likely never going to be cheap, but if it's grossly expensive
> for some
> reason that could help to explain the results that you're seeing.)
>
> I haven't looked at things too closely yet, but I'd guess that (a)
> is more
> of the problem; we may not be particularly clever about (b), but I
> don't
> think that EQUAL does anything grossly stupid. Knock wood.
>
> The bad news is that nap time is fast approaching for me, and it'd
> probably
> be wise to wait until tomorrow before whacking away at this. The
> good news
> is that many of the things that make the code in l0-hash.lisp get
> blurry
> and out-of-focus as quickly as it does (thread safety and concurrency,
> GC safety and integration) don't seem to be implicated.
>
> If I wake up in the middle of the night having had nightmares
> about pathname
> hashing, I'll regret pointing this out, but ... The function that
> maps objects
> to integer hash codes for EQUAL hash tables is called
> CCL::%%EQUALHASH, and
> it returns two values (a hash code and a boolean indicating
> whether the hash
> code is derived from the address of some part of the object and
> could change
> due to GC. Let's look at the values returned by CCL::%%EQUALHASH
> for two
> similar but different pathname arguments:
>
> ? (%%equalhash #p"/usr/local/src/ccl/level-1/")
> 216088106610578
> T
> ? (%%equalhash #p"/usr/local/src/ccl/level-0/")
> 216088106610578
> T
> ?
>
> Bloodcurdling ...
>
>
>
> On Wed, 16 Jan 2013, George Khouri wrote:
>
> I did but there appears to be some limit to the initial size
> of a hash table
> in ccl.? I tried :size 90000 but when I interrupted the
> program the initial
>
> size of the table was only about 15k. I admit that
> l0-hash.lisp is a little
> too much brain drain for me to try to understand (in a
> reasonable amount of
> time) everything that's going on with hashing so I thought I'd
> ask.
> Thanks,
> George
>
> On 01/16/2013 04:07 PM, Dave Cooper wrote:
>
> Did you try it using the :size argument to make-hash-table,
> with the
> estimated number of entries in the file system?
>
>
> Best Regards,
> ? Dave
>
>
>
> On Jan 16, 2013, at 6:31 PM, George Khouri <gk1 at four-four.com
> <mailto:gk1 at four-four.com>> wrote:
>
> Friends,
> In a program that traverses a file system of ~125000
> entries and adds the pathnames as keys to an #'equal hash
> table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no
> progress after ~14000 entries - seems like it's just
> thrashing. No error is thrown and I can interrupt it with
> control-C.
> If I add the pathnames to an #'eql hash table ccl has no
> problem with the same directory.
> Is this expected behavior??
>
>
> I tried the same thing in SBCL which works with either an
> eql or an equal hash table.? BTW, the eql version runs in
>
> ~6 seconds in SBCL, compared to ~70 seconds in CCL, though
> I realize this speed difference could be a memory issue
> and not one of algorithms. I know nothing about SBCL.
>
> I'm running ccl 1.8 on linux 3.2.0-33-generic Ubuntu 12.04
> on a core i5 thinkpad, 4GB.
>
> It doesn't "appear" that I'm? running low on memory, but
>
> the program behaves that way with the equal hash table,
> and though I've tried the various ccl command-line options
> to increase stack and heap allocations, as well as ulimit
> stack size, I've had no change in results.? I've tried
>
> various values for initial hash table size, rehash size
> and threshold but nothing seems to have any beneficial
> effect.? The system monitor doesn't show any disk swapping
>
> going on.
> Is there some memory parameter I'm missing that I need to
> change? Is there? a problem related to having 4 cpu cores?
>
>
> Here's the typical output when I interrupt the seemingly
> hung program - the hash table won't grow beyond this
> point:
>
> Welcome to Clozure Common Lisp Version 1.8-r15286M?
>
> (LinuxX8664)!
> ? (traverse-filesystem "/home/gkhouri/")
> ................^C
> > Break: interrupt signal
> > While executing: CCL::LOCK-FREE-PUTHASH, in process
> listener(1).
> ...
> 1 > (:f 0)
> ?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH
>
> #P"/tmp/gkhouri-test/Documents/Music/M Stal Tunes.csv"
> #<HASH-TABLE :TEST EQUAL size 13672/13673 #x302000753F9D>
> T) 613
> ? (CCL::KEY CCL::HASH CCL::VALUE)
> ?? CCL::KEY: #P"/tmp/gkhouri-test/Documents/Music/M Stal
> Tunes.csv"
> ?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size 13672/13673
> #x302000753F9D>
> ?? CCL::VALUE: T
>
>
> 1 > (room)
> Approximately 17,432,576 bytes of memory can be allocated
> before the next full GC is triggered.
>
> ?????????????????? Total Size????????????
> Free???????????????? Used
> Lisp Heap:?????? 41025536 (40064K)?? 17432576 (17024K)??
> 23592960 (23040K)
> Stacks:????????? 11072816 (10813K)?? 11055872
> (10797K)????? 16944 <tel:16944> (17K)
> Static:????????? 20304144 (19828K)????????? 0 (0K)??????
>
> 20304144 (19828K)
> 376792.870 MB reserved for heap expansion.
>
>
> If I create the equal hash table with :lock-free set to
> nil,? (make-hash-table :test #'equal :lock-free nil),? ccl
>
> presumably uses an older hashing algorithm and I don't
> think I have the problem storing the pathnames, BUT after
> about 15,000 entries, things slow down drastically, and
> unacceptably, presumably from rehashing (?), and I killed
> the program after about 60,000 entries because it was
> taking so long. With :lock-free enabled, I never get past
> ~14K entries.
>
> Thanks for any info.
> George Khouri
>
>
>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com <mailto:Openmcl-devel at clozure.com>
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>
>
>
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com <mailto:Openmcl-devel at clozure.com>
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clozure.com/pipermail/openmcl-devel/attachments/20130117/97fc13fe/attachment.htm>
More information about the Openmcl-devel
mailing list