<html>
<head>
</head>
<body text="#000000" bgcolor="#FFFFFF">
namestrings do serve as a workaround, and so does an EQL hash table
with pathnames - I just wanted to clarify whether EQUAL hash tables
should work with a large number of pathnames. CCL stores the first
10k entries in a second or two on my system. It's after that that
things aren't pretty.<br>
George<br>
<br>
<div class="moz-cite-prefix">On 01/16/2013 10:11 PM, Hans Hübner
wrote:<br>
</div>
<blockquote cite="mid:CABj0bwJo+Yo14hZb299Rvo6G7E2hX76-jA+0iNXP8qdFbv6B6w@mail.gmail.com" type="cite">
<div dir="ltr">One idea to try would be to not use pathnames, but
name strings as keys.
<div><br>
</div>
<div style="">-Hans</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Thu, Jan 17, 2013 at 6:36 AM, Gary
Byers <span dir="ltr"><<a moz-do-not-send="true" href="mailto:gb@clozure.com" target="_blank">gb@clozure.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">I think
that we can conclude that one or both of the following are
true:<br>
<br>
a) the hashing function that CCL uses for pathnames and
EQUAL hash tables<br>
is really really bad: it either doesn't do a very good
job of generating<br>
different hash values for different pathnames or it takes
way too long to<br>
generate those values, or both<br>
b) EQUAL takes way too long to compare pathnames. (EQUAL on
pathnames is<br>
likely never going to be cheap, but if it's grossly
expensive for some<br>
reason that could help to explain the results that you're
seeing.)<br>
<br>
I haven't looked at things too closely yet, but I'd guess
that (a) is more<br>
of the problem; we may not be particularly clever about (b),
but I don't<br>
think that EQUAL does anything grossly stupid. Knock wood.<br>
<br>
The bad news is that nap time is fast approaching for me,
and it'd probably<br>
be wise to wait until tomorrow before whacking away at this.
The good news<br>
is that many of the things that make the code in
l0-hash.lisp get blurry<br>
and out-of-focus as quickly as it does (thread safety and
concurrency,<br>
GC safety and integration) don't seem to be implicated.<br>
<br>
If I wake up in the middle of the night having had
nightmares about pathname<br>
hashing, I'll regret pointing this out, but ... The
function that maps objects<br>
to integer hash codes for EQUAL hash tables is called
CCL::%%EQUALHASH, and<br>
it returns two values (a hash code and a boolean indicating
whether the hash<br>
code is derived from the address of some part of the object
and could change<br>
due to GC. Let's look at the values returned by
CCL::%%EQUALHASH for two<br>
similar but different pathname arguments:<br>
<br>
? (%%equalhash #p"/usr/local/src/ccl/level-1/")<br>
216088106610578<br>
T<br>
? (%%equalhash #p"/usr/local/src/ccl/level-0/")<br>
216088106610578<br>
T<br>
?<br>
<br>
Bloodcurdling ...
<div class="im"><br>
<br>
<br>
On Wed, 16 Jan 2013, George Khouri wrote:<br>
<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">
I did but there appears to be some limit to the initial
size of a hash table<br>
</div>
in ccl.? I tried :size 90000 but when I interrupted the
program the initial
<div class="im"><br>
size of the table was only about 15k. I admit that
l0-hash.lisp is a little<br>
too much brain drain for me to try to understand (in a
reasonable amount of<br>
time) everything that's going on with hashing so I
thought I'd ask.<br>
Thanks,<br>
George<br>
<br>
On 01/16/2013 04:07 PM, Dave Cooper wrote:<br>
<br>
Did you try it using the :size argument to
make-hash-table, with the<br>
estimated number of entries in the file system?<br>
<br>
<br>
Best Regards,<br>
</div>
? Dave
<div class="im"><br>
<br>
<br>
On Jan 16, 2013, at 6:31 PM, George Khouri <<a moz-do-not-send="true" href="mailto:gk1@four-four.com" target="_blank">gk1@four-four.com</a>> wrote:<br>
<br>
Friends,<br>
In a program that traverses a file system of
~125000<br>
entries and adds the pathnames as keys to an
#'equal hash<br>
table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no<br>
progress after ~14000 entries - seems like it's
just<br>
thrashing. No error is thrown and I can interrupt
it with<br>
control-C.<br>
If I add the pathnames to an #'eql hash table ccl
has no<br>
problem with the same directory.<br>
</div>
Is this expected behavior??
<div class="im"><br>
<br>
I tried the same thing in SBCL which works with
either an<br>
</div>
eql or an equal hash table.? BTW, the eql version
runs in
<div class="im"><br>
~6 seconds in SBCL, compared to ~70 seconds in
CCL, though<br>
I realize this speed difference could be a memory
issue<br>
and not one of algorithms. I know nothing about
SBCL.<br>
<br>
I'm running ccl 1.8 on linux 3.2.0-33-generic
Ubuntu 12.04<br>
on a core i5 thinkpad, 4GB.<br>
<br>
</div>
It doesn't "appear" that I'm? running low on memory,
but
<div class="im"><br>
the program behaves that way with the equal hash
table,<br>
and though I've tried the various ccl command-line
options<br>
to increase stack and heap allocations, as well as
ulimit<br>
</div>
stack size, I've had no change in results.? I've
tried
<div class="im"><br>
various values for initial hash table size, rehash
size<br>
and threshold but nothing seems to have any
beneficial<br>
</div>
effect.? The system monitor doesn't show any disk
swapping
<div class="im"><br>
going on.<br>
Is there some memory parameter I'm missing that I
need to<br>
</div>
change? Is there? a problem related to having 4 cpu
cores?
<div class="im"><br>
<br>
Here's the typical output when I interrupt the
seemingly<br>
hung program - the hash table won't grow beyond
this<br>
point:<br>
<br>
</div>
Welcome to Clozure Common Lisp Version 1.8-r15286M?
<div class="im"><br>
(LinuxX8664)!<br>
? (traverse-filesystem "/home/gkhouri/")<br>
................^C<br>
> Break: interrupt signal<br>
> While executing: CCL::LOCK-FREE-PUTHASH, in
process<br>
listener(1).<br>
...<br>
1 > (:f 0)<br>
</div>
?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH
<div class="im"><br>
#P"/tmp/gkhouri-test/Documents/Music/M Stal
Tunes.csv"<br>
#<HASH-TABLE :TEST EQUAL size 13672/13673
#x302000753F9D><br>
T) 613<br>
</div>
? (CCL::KEY CCL::HASH CCL::VALUE)<br>
?? CCL::KEY: #P"/tmp/gkhouri-test/Documents/Music/M
Stal<br>
Tunes.csv"<br>
?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size
13672/13673<br>
#x302000753F9D><br>
?? CCL::VALUE: T
<div class="im"><br>
<br>
1 > (room)<br>
Approximately 17,432,576 bytes of memory can be
allocated<br>
before the next full GC is triggered.<br>
<br>
</div>
?????????????????? Total Size????????????<br>
Free???????????????? Used<br>
Lisp Heap:?????? 41025536 (40064K)?? 17432576
(17024K)??<br>
23592960 (23040K)<br>
Stacks:????????? 11072816 (10813K)?? 11055872<br>
(10797K)????? <a moz-do-not-send="true" href="tel:16944" value="+4916944" target="_blank">16944</a>
(17K)<br>
Static:????????? 20304144 (19828K)????????? 0
(0K)??????
<div class="im"><br>
20304144 (19828K)<br>
376792.870 MB reserved for heap expansion.<br>
<br>
<br>
If I create the equal hash table with :lock-free
set to<br>
</div>
nil,? (make-hash-table :test #'equal :lock-free
nil),? ccl
<div class="im"><br>
presumably uses an older hashing algorithm and I
don't<br>
think I have the problem storing the pathnames,
BUT after<br>
about 15,000 entries, things slow down
drastically, and<br>
unacceptably, presumably from rehashing (?), and I
killed<br>
the program after about 60,000 entries because it
was<br>
taking so long. With :lock-free enabled, I never
get past<br>
~14K entries.<br>
<br>
Thanks for any info.<br>
George Khouri<br>
<br>
<br>
<br>
<br>
_______________________________________________<br>
Openmcl-devel mailing list<br>
<a moz-do-not-send="true" href="mailto:Openmcl-devel@clozure.com" target="_blank">Openmcl-devel@clozure.com</a><br>
<a moz-do-not-send="true" href="http://clozure.com/mailman/listinfo/openmcl-devel" target="_blank">http://clozure.com/mailman/listinfo/openmcl-devel</a><br>
<br>
<br>
<br>
<br>
</div>
</blockquote>
<div class="HOEnZb">
<div class="h5">
_______________________________________________<br>
Openmcl-devel mailing list<br>
<a moz-do-not-send="true" href="mailto:Openmcl-devel@clozure.com" target="_blank">Openmcl-devel@clozure.com</a><br>
<a moz-do-not-send="true" href="http://clozure.com/mailman/listinfo/openmcl-devel" target="_blank">http://clozure.com/mailman/listinfo/openmcl-devel</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>