[Openmcl-devel] Directory and symlinks
Gary Byers
gb at clozure.com
Fri Nov 11 15:10:59 PST 2011
On Fri, 11 Nov 2011, Zach Beane wrote:
>
> In light of CCL's interpretation of the standard
As I noted the other day, SBCL, AllegroCL, and LispWorks (at least)
seem to share this interpretation. If you phrase it the way you just
did, someone might think otherwise. That could have an unfortunate
effect on the signal-to-noise ratio ...
-------------------------------------------------------------------
[~/foo] gb at leadfoot> pwd
/home/gb/foo
[~/foo] gb at leadfoot> ls
[~/foo] gb at leadfoot> ln -sf /usr/local/src/ccl .
[~/foo] gb at leadfoot> ls
ccl
[~/foo] gb at leadfoot> ls -F
ccl@
[~/foo] gb at leadfoot> sbcl
This is SBCL 1.0.45.0.debian yadda yadda yadda
Yadda yadda yadda. Yadda.
* (directory "/home/gb/foo/*/*.image")
NIL
* (directory "/home/gb/foo/*")
(#P"/usr/local/src/ccl/")
* (quit)
[~/foo] gb at leadfoot> ccl64
Yadda yadda yadda 1.8-dev-r15052M-trunk (LinuxX8664)!
? (directory "/home/gb/foo/*/*.image")
NIL
? (directory "/home/gb/foo/*") ;[2]
(#P"/usr/local/src/ccl/")
?
-------------------------------------------------------------------
I have demo versions of AllegroCL and LispWorks installed on another
machine. They behave the same way as SBCL and CCL (and possibly other
implementations) do for both calls to DIRECTORY [1], and this behavior
involves enumerating the set of matching entities (without resolving
or following the TRUENAMEs of those entities) and then returning a
list of those TRUENAMEs. I think that this is what the spec says
(paraphrased slightly).
The version of CLISP that I have (2.48) behaves differently on both
calls:
[Yaddas omitted ...]
[1]> (directory "/home/gb/foo/*/*.image")
(#P"/home/gb/foo/ccl/lx86cl.image" #P"/home/gb/foo/ccl/lx86cl64-save.image"
#P"/home/gb/foo/ccl/lx86cl64.image")
[2]> (directory "/home/gb/foo/*")
NIL
I don't see an interpretation of the spec that CLISP's behavior complies
with, but the spec is vague enough that such an interpretation could
certainly exist.
(The behavior of CLISP on the first call seems to be somewhat analogous
to bash shell globbing. I have to confess that I have no idea what that
has to do with what the spec says about DIRECTORY ...)
Are we all clear on that ? Good. (It'd be rude to shout that in all
caps for the benefit of those not paying attention. I confess to having
been tempted.)
> , can you suggest a
> terse way to get a list of all files with a pathname-type of "txt" in a
> directory tree rooted at e.g. #p"foo/" where some entries under foo/
> might be symlinks to directories outside of foo/? I don't mind using
> something implementation-specific if it does the job.
>
> Thanks,
> Zach
>
Something like:
;;; This ignores Windows-specific issues having to do with "drive letters".
;;; Dealing with those issues is left as an exercise.
0) Start with the current directory set to #p"foo/" and the set of
entries found so far empty.
1) Collect the truenames of the toplevel contents of the current
directory.
(directory (merge-pathnames "*" current-directory)) ; works in CCL and SBCL at least
2) For each entry returned in step 1, add the entry to the set. If the entry
wasn't already present and the entry denotes a directory, recurse on step 1.
Return from the current level of recursion.
3) When we return from the outermost level of recursion, return the set.
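Steps 0-3 might be sketched roughly as follows. This is only a sketch
under stated assumptions, not a tested solution: FIND-FILES-OF-TYPE,
WALK and DIRECTORY-PATHNAME-P are names made up for illustration, and
the code assumes that DIRECTORY with a "*" wildcard returns the
truenames of both files and subdirectories, with directories in
directory form (trailing slash), as in the CCL and SBCL transcripts
above. Whether that wildcard also matches files that have a type is
implementation-dependent, so the wildcard may need adjusting.

```lisp
;;; Sketch only. All function names here are hypothetical.
;;; Assumption: (directory ".../*") returns truenames, with directories
;;; in directory form, as in the CCL and SBCL transcripts above.

(defun directory-pathname-p (pathname)
  "True if PATHNAME is in directory form (no name and no type)."
  (flet ((absent-p (component)
           (member component '(nil :unspecific))))
    (and (absent-p (pathname-name pathname))
         (absent-p (pathname-type pathname)))))

(defun find-files-of-type (root type)
  "Return the truenames of entries with pathname-type TYPE found by
walking the tree rooted at ROOT, descending into linked directories.
ROOT should be in directory form, e.g. #p\"foo/\"."
  ;; Step 0: the set of entries found so far starts out empty.
  (let ((seen (make-hash-table :test #'equal)))
    (labels ((walk (dir)
               ;; Step 1: truenames of the toplevel contents of DIR.
               (dolist (entry (directory (merge-pathnames "*" dir)))
                 (let ((key (namestring entry)))
                   ;; Step 2: add the entry to the set; recurse only if
                   ;; it is new and denotes a directory. Because the set
                   ;; holds truenames, link cycles are visited once.
                   (unless (gethash key seen)
                     (setf (gethash key seen) entry)
                     (when (directory-pathname-p entry)
                       (walk entry)))))))
      (walk root))
    ;; Step 3: return the set, filtered by the requested type.
    (loop for entry being the hash-values of seen
          when (equal (pathname-type entry) type)
            collect entry)))
```

Something like (find-files-of-type #p"foo/" "txt") would then be the
call for the case in the question. The results are truenames, so files
reached through a link show up under the link's target, not under foo/.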
That's certainly a PITA. If DIRECTORY took an option that caused it to
process links to directories as if they were the directories themselves,
that option could be very useful and could largely eliminate the need
for the user to write something like this themselves.
A couple of (hopefully) final points that I'll try not to belabor:
a) How DIRECTORY canonicalizes its pathname argument and how it
traverses and resolves the entities it finds are two different issues.
(In other words, the fact that
(directory "/home/gb/foo/ccl/*.image")
and
(directory "/home/gb/foo/*/*.image")
may return different results isn't surprising.)
The fact that shell pattern-matching ("globbing") behaves differently
would be interesting if this had something to do with shell globbing.
b) If anyone's (still) interested in this and hasn't done so recently,
I encourage them to read the "ls" man page; the point of that exercise
is to impress on anyone who hasn't thought about it that there are lots
of ways to enumerate and traverse the contents of directories and many
of the differences between these ways have to do with how symbolic links
are treated. Scary as it might sound, there are probably useful behaviors
that no combination of "ls" options provide. The default behavior - "ls"
with no options - shows links without attempting to resolve them. That
may sometimes be useful, often isn't, and is sort of a bare bones starting
point that all of the other options expand on.
When ANSI CL was being standardized, a number of filesystems that are
rarely used today were in widespread use, including Unix variants that
didn't support symbolic links as well as those that did and systems
that were used in institutions that used CL heavily but were rarely
used outside of those institutions. It would probably have been surprising
to see that the world had consolidated 20-25 years later to essentially
2 widely-used filesystems (Windows and Unix, with some variants of each)
and even more surprising that the consolidation involved those 2 systems
(as in "really ? In 20-25 years, that's the best that you can do ?").
The way that (I believe) DIRECTORY is specified is a sort of arbitrary
least-common-denominator behavior in the same sense that "ls" with no
options is. It's not necessarily what any of us would choose for
default behavior in a world where almost everyone uses some Unix
variant except for people who use Windows, but in the context in which
the spec was written it was probably the most reasonable baseline behavior
(or much closer to that than it now appears.)
The spec says that unless otherwise stated, any function that accepts
keyword arguments can be extended by the implementation to accept
implementation-specific keywords, and DIRECTORY is defined to accept
keywords (but no portable keywords are defined.) Getting CL
implementations to agree on much of anything may seem like herding
cats, but I'd agree that it'd be good if implementations (including CCL)
that aren't particularly concerned about supporting VMS/MS-DOS/HFS/... anymore
both extended DIRECTORY to (among other things) change this behavior to
offer different treatment of links (especially links to directories) and
used the same keywords to provide those extensions. (Wait, what am I thinking?
It'd be too much like herding cats ...)
Sorry that this was so long.
----
[1] As far as I know; I only tried something analogous to the first call
when I looked the other day.
[2] I've seen different behavior on Linux, seemingly according to whether
the link existed when a CCL (at least) session was started or was added
during the session. CCL doesn't cache DIRECTORY results; it's possible
that the OS or C library functions used to traverse directories do.