<div dir="ltr">One idea to try would be to not use pathnames, but name strings as keys.<div><br></div><div style>-Hans</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jan 17, 2013 at 6:36 AM, Gary Byers <span dir="ltr"><<a href="mailto:gb@clozure.com" target="_blank">gb@clozure.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I think that we can conclude that one or both of the following are true:<br>
<br>
a) the hashing function that CCL uses for pathnames and EQUAL hash tables<br>
is really really bad: it either doesn't do a very good job of generating<br>
different hash values for different pathnames or it takes way too long to<br>
generate those values, or both<br>
b) EQUAL takes way too long to compare pathnames. (EQUAL on pathnames is<br>
likely never going to be cheap, but if it's grossly expensive for some<br>
reason that could help to explain the results that you're seeing.)<br>
<br>
I haven't looked at things too closely yet, but I'd guess that (a) is more<br>
of the problem; we may not be particularly clever about (b), but I don't<br>
think that EQUAL does anything grossly stupid. Knock wood.<br>
<br>
The bad news is that nap time is fast approaching for me, and it'd probably<br>
be wise to wait until tomorrow before whacking away at this. The good news<br>
is that many of the things that make the code in l0-hash.lisp get blurry<br>
and out-of-focus as quickly as it does (thread safety and concurrency,<br>
GC safety and integration) don't seem to be implicated.<br>
<br>
If I wake up in the middle of the night having had nightmares about pathname<br>
hashing, I'll regret pointing this out, but ... The function that maps objects<br>
to integer hash codes for EQUAL hash tables is called CCL::%%EQUALHASH, and<br>
it returns two values (a hash code and a boolean indicating whether the hash<br>
code is derived from the address of some part of the object and could change<br>
due to GC. Let's look at the values returned by CCL::%%EQUALHASH for two<br>
similar but different pathname arguments:<br>
<br>
? (%%equalhash #p"/usr/local/src/ccl/level-1/<u></u>")<br>
216088106610578<br>
T<br>
? (%%equalhash #p"/usr/local/src/ccl/level-0/<u></u>")<br>
216088106610578<br>
T<br>
?<br>
<br>
Bloodcurdling ...<div class="im"><br>
<br>
<br>
On Wed, 16 Jan 2013, George Khouri wrote:<br>
<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
I did but there appears to be some limit to the initial size of a hash table<br></div>
in ccl.? I tried :size 90000 but when I interrupted the program the initial<div class="im"><br>
size of the table was only about 15k. I admit that l0-hash.lisp is a little<br>
too much brain drain for me to try to understand (in a reasonable amount of<br>
time) everything that's going on with hashing so I thought I'd ask.<br>
Thanks,<br>
George<br>
<br>
On 01/16/2013 04:07 PM, Dave Cooper wrote:<br>
<br>
Did you try it using the :size argument to make-hash-table, with the<br>
estimated number of entries in the file system?<br>
<br>
<br>
Best Regards,<br></div>
? Dave<div class="im"><br>
<br>
<br>
On Jan 16, 2013, at 6:31 PM, George Khouri <<a href="mailto:gk1@four-four.com" target="_blank">gk1@four-four.com</a>> wrote:<br>
<br>
Friends,<br>
In a program that traverses a file system of ~125000<br>
entries and adds the pathnames as keys to an #'equal hash<br>
table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no<br>
progress after ~14000 entries - seems like it's just<br>
thrashing. No error is thrown and I can interrupt it with<br>
control-C.<br>
If I add the pathnames to an #'eql hash table ccl has no<br>
problem with the same directory.<br></div>
Is this expected behavior??<div class="im"><br>
<br>
I tried the same thing in SBCL which works with either an<br></div>
eql or an equal hash table.? BTW, the eql version runs in<div class="im"><br>
~6 seconds in SBCL, compared to ~70 seconds in CCL, though<br>
I realize this speed difference could be a memory issue<br>
and not one of algorithms. I know nothing about SBCL.<br>
<br>
I'm running ccl 1.8 on linux 3.2.0-33-generic Ubuntu 12.04<br>
on a core i5 thinkpad, 4GB.<br>
<br></div>
It doesn't "appear" that I'm? running low on memory, but<div class="im"><br>
the program behaves that way with the equal hash table,<br>
and though I've tried the various ccl command-line options<br>
to increase stack and heap allocations, as well as ulimit<br></div>
stack size, I've had no change in results.? I've tried<div class="im"><br>
various values for initial hash table size, rehash size<br>
and threshold but nothing seems to have any beneficial<br></div>
effect.? The system monitor doesn't show any disk swapping<div class="im"><br>
going on.<br>
Is there some memory parameter I'm missing that I need to<br></div>
change? Is there? a problem related to having 4 cpu cores?<div class="im"><br>
<br>
Here's the typical output when I interrupt the seemingly<br>
hung program - the hash table won't grow beyond this<br>
point:<br>
<br></div>
Welcome to Clozure Common Lisp Version 1.8-r15286M?<div class="im"><br>
(LinuxX8664)!<br>
? (traverse-filesystem "/home/gkhouri/")<br>
................^C<br>
> Break: interrupt signal<br>
> While executing: CCL::LOCK-FREE-PUTHASH, in process<br>
listener(1).<br>
...<br>
1 > (:f 0)<br></div>
?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH<div class="im"><br>
#P"/tmp/gkhouri-test/<u></u>Documents/Music/M Stal Tunes.csv"<br>
#<HASH-TABLE :TEST EQUAL size 13672/13673 #x302000753F9D><br>
T) 613<br></div>
? (CCL::KEY CCL::HASH CCL::VALUE)<br>
?? CCL::KEY: #P"/tmp/gkhouri-test/<u></u>Documents/Music/M Stal<br>
Tunes.csv"<br>
?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size 13672/13673<br>
#x302000753F9D><br>
?? CCL::VALUE: T<div class="im"><br>
<br>
1 > (room)<br>
Approximately 17,432,576 bytes of memory can be allocated<br>
before the next full GC is triggered.<br>
<br></div>
?????????????????? Total Size????????????<br>
Free???????????????? Used<br>
Lisp Heap:?????? 41025536 (40064K)?? 17432576 (17024K)??<br>
23592960 (23040K)<br>
Stacks:????????? 11072816 (10813K)?? 11055872<br>
(10797K)????? <a href="tel:16944" value="+4916944" target="_blank">16944</a> (17K)<br>
Static:????????? 20304144 (19828K)????????? 0 (0K)??????<div class="im"><br>
20304144 (19828K)<br>
376792.870 MB reserved for heap expansion.<br>
<br>
<br>
If I create the equal hash table with :lock-free set to<br></div>
nil,? (make-hash-table :test #'equal :lock-free nil),? ccl<div class="im"><br>
presumably uses an older hashing algorithm and I don't<br>
think I have the problem storing the pathnames, BUT after<br>
about 15,000 entries, things slow down drastically, and<br>
unacceptably, presumably from rehashing (?), and I killed<br>
the program after about 60,000 entries because it was<br>
taking so long. With :lock-free enabled, I never get past<br>
~14K entries.<br>
<br>
Thanks for any info.<br>
George Khouri<br>
<br>
<br>
<br>
<br>
______________________________<u></u>_________________<br>
Openmcl-devel mailing list<br>
<a href="mailto:Openmcl-devel@clozure.com" target="_blank">Openmcl-devel@clozure.com</a><br>
<a href="http://clozure.com/mailman/listinfo/openmcl-devel" target="_blank">http://clozure.com/mailman/<u></u>listinfo/openmcl-devel</a><br>
<br>
<br>
<br>
<br>
</div></blockquote><div class="HOEnZb"><div class="h5">
______________________________<u></u>_________________<br>
Openmcl-devel mailing list<br>
<a href="mailto:Openmcl-devel@clozure.com" target="_blank">Openmcl-devel@clozure.com</a><br>
<a href="http://clozure.com/mailman/listinfo/openmcl-devel" target="_blank">http://clozure.com/mailman/<u></u>listinfo/openmcl-devel</a><br>
</div></div></blockquote></div><br></div>