<html>
  <head>
    
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    namestrings do serve as a workaround, and so does an EQL hash table
    with pathnames - I just wanted to clarify whether  EQUAL hash tables
    should work with a large number of pathnames.  CCL stores the first
    10k entries in a second or two on my system. It's after that that
    things aren't pretty.<br>
    George<br>
    <br>
    <div class="moz-cite-prefix">On 01/16/2013 10:11 PM, Hans Hübner
      wrote:<br>
    </div>
    <blockquote cite="mid:CABj0bwJo+Yo14hZb299Rvo6G7E2hX76-jA+0iNXP8qdFbv6B6w@mail.gmail.com" type="cite">
      <div dir="ltr">One idea to try would be to not use pathnames, but
        name strings as keys.
        <div><br>
        </div>
        <div style="">-Hans</div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Thu, Jan 17, 2013 at 6:36 AM, Gary
          Byers <span dir="ltr"><<a moz-do-not-send="true" href="mailto:gb@clozure.com" target="_blank">gb@clozure.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">I think
            that we can conclude that one or both of the following are
            true:<br>
            <br>
            a) the hashing function that CCL uses for pathnames and
            EQUAL hash tables<br>
               is really really bad: it either doesn't do a very good
            job of generating<br>
               different hash values for different pathnames or it takes
            way too long to<br>
               generate those values, or both<br>
            b) EQUAL takes way too long to compare pathnames.  (EQUAL on
            pathnames is<br>
               likely never going to be cheap, but if it's grossly
            expensive for some<br>
               reason that could help to explain the results that you're
            seeing.)<br>
            <br>
            I haven't looked at things too closely yet, but I'd guess
            that (a) is more<br>
            of the problem; we may not be particularly clever about (b),
            but I don't<br>
            think that EQUAL does anything grossly stupid.  Knock wood.<br>
            <br>
            The bad news is that nap time is fast approaching for me,
            and it'd probably<br>
            be wise to wait until tomorrow before whacking away at this.
             The good news<br>
            is that many of the things that make the code in
            l0-hash.lisp get blurry<br>
            and out-of-focus as quickly as it does (thread safety and
            concurrency,<br>
            GC safety and integration) don't seem to be implicated.<br>
            <br>
            If I wake up in the middle of the night having had
            nightmares about pathname<br>
            hashing, I'll regret pointing this out, but ...  The
            function that maps objects<br>
            to integer hash codes for EQUAL hash tables is called
            CCL::%%EQUALHASH, and<br>
            it returns two values (a hash code and a boolean indicating
            whether the hash<br>
            code is derived from the address of some part of the object
            and could change<br>
            due to GC.  Let's look at the values returned by
            CCL::%%EQUALHASH for two<br>
            similar but different pathname arguments:<br>
            <br>
            ? (%%equalhash #p"/usr/local/src/ccl/level-1/")<br>
            216088106610578<br>
            T<br>
            ? (%%equalhash #p"/usr/local/src/ccl/level-0/")<br>
            216088106610578<br>
            T<br>
            ?<br>
            <br>
            Bloodcurdling ...
            <div class="im"><br>
              <br>
              <br>
              On Wed, 16 Jan 2013, George Khouri wrote:<br>
              <br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="im">
                I did but there appears to be some limit to the initial
                size of a hash table<br>
              </div>
              in ccl.? I tried :size 90000 but when I interrupted the
              program the initial
              <div class="im"><br>
                size of the table was only about 15k. I admit that
                l0-hash.lisp is a little<br>
                too much brain drain for me to try to understand (in a
                reasonable amount of<br>
                time) everything that's going on with hashing so I
                thought I'd ask.<br>
                Thanks,<br>
                George<br>
                <br>
                On 01/16/2013 04:07 PM, Dave Cooper wrote:<br>
                <br>
                Did you try it using the :size argument to
                make-hash-table, with the<br>
                estimated number of entries in the file system?<br>
                <br>
                <br>
                Best Regards,<br>
              </div>
              ? Dave
              <div class="im"><br>
                <br>
                <br>
                On Jan 16, 2013, at 6:31 PM, George Khouri <<a moz-do-not-send="true" href="mailto:gk1@four-four.com" target="_blank">gk1@four-four.com</a>> wrote:<br>
                <br>
                      Friends,<br>
                      In a program that traverses a file system of
                ~125000<br>
                      entries and adds the pathnames as keys to an
                #'equal hash<br>
                      table, ccl hangs in CCL::LOCK-FREE-PUTHASH with no<br>
                      progress after ~14000 entries - seems like it's
                just<br>
                      thrashing. No error is thrown and I can interrupt
                it with<br>
                      control-C.<br>
                      If I add the pathnames to an #'eql hash table ccl
                has no<br>
                      problem with the same directory.<br>
              </div>
                    Is this expected behavior??
              <div class="im"><br>
                <br>
                      I tried the same thing in SBCL which works with
                either an<br>
              </div>
                    eql or an equal hash table.? BTW, the eql version
              runs in
              <div class="im"><br>
                      ~6 seconds in SBCL, compared to ~70 seconds in
                CCL, though<br>
                      I realize this speed difference could be a memory
                issue<br>
                      and not one of algorithms. I know nothing about
                SBCL.<br>
                <br>
                      I'm running ccl 1.8 on linux 3.2.0-33-generic
                Ubuntu 12.04<br>
                      on a core i5 thinkpad, 4GB.<br>
                <br>
              </div>
                    It doesn't "appear" that I'm? running low on memory,
              but
              <div class="im"><br>
                      the program behaves that way with the equal hash
                table,<br>
                      and though I've tried the various ccl command-line
                options<br>
                      to increase stack and heap allocations, as well as
                ulimit<br>
              </div>
                    stack size, I've had no change in results.? I've
              tried
              <div class="im"><br>
                      various values for initial hash table size, rehash
                size<br>
                      and threshold but nothing seems to have any
                beneficial<br>
              </div>
                    effect.? The system monitor doesn't show any disk
              swapping
              <div class="im"><br>
                      going on.<br>
                      Is there some memory parameter I'm missing that I
                need to<br>
              </div>
                    change? Is there? a problem related to having 4 cpu
              cores?
              <div class="im"><br>
                <br>
                      Here's the typical output when I interrupt the
                seemingly<br>
                      hung program - the hash table won't grow beyond
                this<br>
                      point:<br>
                <br>
              </div>
                    Welcome to Clozure Common Lisp Version 1.8-r15286M?
              <div class="im"><br>
                      (LinuxX8664)!<br>
                      ? (traverse-filesystem "/home/gkhouri/")<br>
                      ................^C<br>
                      > Break: interrupt signal<br>
                      > While executing: CCL::LOCK-FREE-PUTHASH, in
                process<br>
                      listener(1).<br>
                      ...<br>
                      1 > (:f 0)<br>
              </div>
                    ?(7FB9FE90FF80) : 0 (LOCK-FREE-PUTHASH
              <div class="im"><br>
                      #P"/tmp/gkhouri-test/Documents/Music/M Stal
                Tunes.csv"<br>
                      #<HASH-TABLE :TEST EQUAL size 13672/13673
                #x302000753F9D><br>
                      T) 613<br>
              </div>
                    ? (CCL::KEY CCL::HASH CCL::VALUE)<br>
                    ?? CCL::KEY: #P"/tmp/gkhouri-test/Documents/Music/M
              Stal<br>
                    Tunes.csv"<br>
                    ?? CCL::HASH: #<HASH-TABLE :TEST EQUAL size
              13672/13673<br>
                    #x302000753F9D><br>
                    ?? CCL::VALUE: T
              <div class="im"><br>
                <br>
                      1 > (room)<br>
                      Approximately 17,432,576 bytes of memory can be
                allocated<br>
                      before the next full GC is triggered.<br>
                <br>
              </div>
                    ?????????????????? Total Size????????????<br>
                    Free???????????????? Used<br>
                    Lisp Heap:?????? 41025536 (40064K)?? 17432576
              (17024K)??<br>
                    23592960 (23040K)<br>
                    Stacks:????????? 11072816 (10813K)?? 11055872<br>
                    (10797K)????? <a moz-do-not-send="true" href="tel:16944" value="+4916944" target="_blank">16944</a>
              (17K)<br>
                    Static:????????? 20304144 (19828K)????????? 0
              (0K)??????
              <div class="im"><br>
                      20304144 (19828K)<br>
                      376792.870 MB reserved for heap expansion.<br>
                <br>
                <br>
                      If I create the equal hash table with :lock-free
                set to<br>
              </div>
                    nil,? (make-hash-table :test #'equal :lock-free
              nil),? ccl
              <div class="im"><br>
                      presumably uses an older hashing algorithm and I
                don't<br>
                      think I have the problem storing the pathnames,
                BUT after<br>
                      about 15,000 entries, things slow down
                drastically, and<br>
                      unacceptably, presumably from rehashing (?), and I
                killed<br>
                      the program after about 60,000 entries because it
                was<br>
                      taking so long. With :lock-free enabled, I never
                get past<br>
                      ~14K entries.<br>
                <br>
                      Thanks for any info.<br>
                      George Khouri<br>
                <br>
                <br>
                <br>
                <br>
                      _______________________________________________<br>
                      Openmcl-devel mailing list<br>
                      <a moz-do-not-send="true" href="mailto:Openmcl-devel@clozure.com" target="_blank">Openmcl-devel@clozure.com</a><br>
                      <a moz-do-not-send="true" href="http://clozure.com/mailman/listinfo/openmcl-devel" target="_blank">http://clozure.com/mailman/listinfo/openmcl-devel</a><br>
                <br>
                <br>
                <br>
                <br>
              </div>
            </blockquote>
            <div class="HOEnZb">
              <div class="h5">
                _______________________________________________<br>
                Openmcl-devel mailing list<br>
                <a moz-do-not-send="true" href="mailto:Openmcl-devel@clozure.com" target="_blank">Openmcl-devel@clozure.com</a><br>
                <a moz-do-not-send="true" href="http://clozure.com/mailman/listinfo/openmcl-devel" target="_blank">http://clozure.com/mailman/listinfo/openmcl-devel</a><br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>