[Openmcl-devel] duplicating a stream, or other alternatives?

Mon Mar 26 18:48:09 PDT 2012

Gary Byers <gb at clozure.com> writes:

> See below.
>
> On Mon, 26 Mar 2012, Pascal J. Bourguignon wrote:
>
>>
>> When calling run-program, one needs to pass a :shared stream.
>>
>> Since I may get non shared file streams (by default they're :private),
>> I'd want to "duplicate" the stream, making a shared stream to give to
>> run-program, using the same stream-device:
>> […]
>> What did I do wrong?
>>
>
> 1) called some random internal CCL function.

Of course, since I couldn't find the public API I wanted.  Isn't that
what free software is for? ;-)

> 1a) called that function with the wrong type of arguments (the first
>     arg is expected to be a CLASS object, not a symbol that names a
>     class)

Right.  I'm sorry, I was too lazy to read and understand the whole file.

> 1b) called that function with symbols that don't even name classes (though
>    symbols with the same pnames that're internal to the CCL package do
>    happen to name classes).
>    UPDATE: you get some points back for noticing this, but I'm afraid that
>    that barely matters.

> 2) decided that this should happen if your function's argument is of type STREAM.
>    (STRING-STREAMs and lots of other things are of type STREAM.)

Well, run-program is not specified to work on non-file streams (or on
streams that don't have an OS file descriptor as backend; I guess a
socket stream would work).

> 3) Think that creating a situation where two lisp streams share the same file
>    descriptor would be a good idea.  (There might be cases where this is useful,
>    but as a general rule I think it's fair to say that this isn't a
>    good idea.)

Again, this should not occur: the caller relinquishes the streams to the
program being forked, at least until that program is finished.

> 4) In CCL (and most other programming languages), streams are often buffered;
>    the "state" (position, set of characters/bytes read/written, etc.) of the
>    stream and that of the underlying file descriptor are related to each other
>    but aren't generally identical.  If it made sense to "duplicate a stream",
>    creating another stream that uses the same fd would only be a small part
>    of that.

Since we're duplicating it for an external program (fork/exec) that
inherits only the file descriptor, we cannot do much more.  But indeed,
I had the mental model of a mere dup/fork/exec;  ccl:run-program
actually copies the data between lisp streams and input spool file or
output pipes.

> 5) That's enough for now.
>
> Until an hour or two ago, the documentation said that a stream created with
> :SHARING :PRIVATE could only be used by the thread that created it.  That
> hasn't been true in several years: such a stream is "owned" by whatever
> thread first tries to do I/O on the stream, so common idioms like:
>
> (with-open-file (f path)
>   (process-run-function "something" (lambda () (print (read-line f))))
>   (sleep long-enough-to-let-the-thread-run))
>
> work as expected.  (I don't claim that that's a plausible-looking example,
> but I think that the idiom's indeed fairly common.)

This is not what happens with ccl:run-program:

test> (lisp-implementation-version)
"Version 1.7-dev-r14788M-trunk  (Linuxx8664)"
test> (in-package :cl-user)
#<Package "COMMON-LISP-USER">
cl-user> (defstruct process
           "The result of RUN-PROGRAM"
           #+(or abcl clozure cmu sbcl scl) process
           #+(or allegro clisp) input
           #+(or allegro clisp) output
           #+(or allegro clisp) error
           #+(or allegro clisp) pid
           #+(or allegro clisp) %exit-status
           #+(or allegro clisp) signal
           program arguments environment)

process
cl-user> (defun run-program (program arguments &key (wait t) (input nil) (output nil) (error nil)
                             (environment nil environmentp))
           "Runs the program with the given list of arguments.

If WAIT is true, then run-program returns only when the program is
finished, otherwise, it returns as soon as the program is launched,
and the caller must call PROCESS-ALIVE-P or PROCESS-STATUS on the
resulting process, to check when the program is finished.

INPUT, OUTPUT and ERROR specify the redirection of the stdin, stdout
and stderr of the program.  They can be NIL (/dev/null), :stream (a
pipe with the lisp process is created), a string or pathname to
redirect to or from a file, or a file or socket stream to redirect to
or from it.

ENVIRONMENT is an alist of STRINGs (name . value) describing the new
environment. The default is to copy the environment of the current
process.
"
           (check-type input  (or null (member :stream) string pathname stream))
           (check-type output (or null (member :stream) string pathname stream))
           (check-type error  (or null (member :stream) string pathname stream))

           #+clozure
           (make-process
            :process (apply (function ccl:run-program)
                            program arguments
                            :wait wait
                            :input    input  :if-input-does-not-exist :error
                            :output   output :if-output-exists :supersede
                            :error    error  :if-error-exists  :supersede
                            :element-type 'character
                            :sharing :lock
                            :external-format (list :domain nil
                                                   :character-encoding ccl:*default-file-character-encoding*
                                                   :line-termination #+windows :windows #-windows :unix)
                            (when environmentp
                              (list :env environment)))
            :program program
            :arguments arguments
            :environment (if environmentp environment :inherit))

           #-(or allegro clisp abcl clozure cmu ecl sbcl scl)
           (error "~S not implemented for ~A" 'run-program
                  (lisp-implementation-version)))
run-program
cl-user> (with-open-file (err "TESTERR.TXT"
                              :direction :output :if-does-not-exist :create :if-exists :supersede)
           (run-program "sh" '("-c" "echo error 1>&2") :wait t :input nil :output nil :error err))
> Debug: Stream #<BASIC-FILE-CHARACTER-OUTPUT-STREAM ("TESTERR.TXT"/5 UTF-8) #x302004DF391D> is private to #<PROCESS repl-thread(262) [semaphore wait] #x302004A6F09D>
> While executing: (:INTERNAL SWANK::INVOKE-DEFAULT-DEBUGGER), in process Monitor thread for external process (sh -c echo error 1>&2)(665).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > 

> The case that I can think of where this (stream sharing/ownership) matters
> involves things like:
>
> (with-open-file (f "home:log.txt" :sharing :private :direction :output :if-does-not-exist :create :if-exists :supersede)
>   (write-line "Copyright (c) 2012 Acme Corporation." f) ; become stream's "owner"
>   (run-program "/bin/echo" '("yow!")  :output f))
>
> (I think that some versions of a widely-used CL logging package run into this or
> something similar.)  If this can't be made to work, it'd probably be better
> to complain about the argument earlier; in this particular case (where the stream
> is essentially thread-private except for the fact that a thread used in the
> implementation of RUN-PROGRAM wants to write to it) there may be ways to avoid
> the error (possibly by temporarily transferring ownership of the stream to
> the background process).

The situation I have is that I want to write a portability layer over
the various run-program facilities.

Therefore the API should be the same in all implementations: it is not
reasonable to ask the client of a portability layer library to add
#+ccl conditionals such as:

  (with-open-file (err "TESTERR.TXT"
                       :direction :output :if-does-not-exist :create :if-exists :supersede
                       #+ccl :sharing #+ccl :lock)
    (run-program "sh" '("-c" "echo error 1>&2")
                 :wait t :input nil :output nil :error err))

because the portability layer run-program is there exactly to avoid them.
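
One partial mitigation is for the layer itself to offer an opener that
hides the conditional.  This is only a sketch: OPEN-FOR-RUN-PROGRAM is
a hypothetical name, and it helps only for streams the client opens
through the layer, not for streams obtained elsewhere.  It relies on
the standard rule that the leftmost occurrence of a duplicated keyword
argument wins, so a :sharing option given by the caller takes
precedence over the appended default.

```lisp
;;; Hypothetical helper, not part of CCL or of any existing library.
(defun open-for-run-program (path &rest options)
  "Open PATH with OPTIONS; on Clozure CL, default to :SHARING :LOCK.
A :SHARING option supplied in OPTIONS comes first and therefore wins."
  (apply #'open path
         ;; On other implementations this reduces to (append options).
         (append options #+clozure '(:sharing :lock))))

;;; Usage: the client code stays free of #+ccl conditionals.
;;; (let ((err (open-for-run-program "TESTERR.TXT"
;;;                                  :direction :output
;;;                                  :if-does-not-exist :create
;;;                                  :if-exists :supersede)))
;;;   (unwind-protect
;;;        (run-program "sh" '("-c" "echo error 1>&2") :error err)
;;;     (close err)))
```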

Zeroth solution:  That the stream ownership really be established on
                  first I/O.

First solution:   make :sharing :lock the default, instead of :sharing
                  :private.

       Advantage: Code that is not thread-conscious works well, even if
                  its streams are passed inconspicuously to other
                  threads.  I see no drawback, because thread-conscious
                  code can always use :sharing :private if it wants.

Second solution:  extensions:run-program doesn't create a thread, but
                  would just fork and exec.  But that would prevent it
                  from working with streams that are not backed by a
                  file descriptor.  Being able to send to or receive
                  from string streams or other kinds of streams is not
                  a feature that I expect or provide for my portability
                  layer, but it's nice that some implementations are
                  able to provide it.  So let's strike it.

Third solution:   a way to transfer ownership of the stream, temporarily
                  or not.

Fourth solution:  a way to change the :sharing option.  Indeed it might
                  not be a good idea in general, and even in this case.

Fifth solution:   a way to duplicate the stream with a different :sharing
                  option.  

But I note that monitor-external-process actually copies the lisp
streams to or from pipes established with the forked process.  In this
situation, writing:

  (with-open-file (err "TESTERR.TXT"
                       :direction :output :if-does-not-exist :create
                       :if-exists :supersede)
    (write-line "The error output of the command is:" err)
    (finish-output err) 
    (run-program "sh" '("-c" "echo error 1>&2")
                 :wait t :input nil :output nil :error err)
    (write-line "That was the error output of the command." err))

and even:

  (with-open-file (inp "TESTINP.TXT")
    (read-line inp) ; eat the first line
    (run-program "bash" '("-c" "read line ; exit 0")
                 :wait t :input inp :output nil :error nil)
    (read-line inp)) 

seem perfectly reasonable.  After all, only the lisp process reads or
writes those lisp streams.  The catch is on the input side:
ccl:run-program copies the whole input stream to a temporary file, and
it can hardly do otherwise, not knowing how many lines the forked
process expects.  But in the case of streams passed to :output and
:error, we could easily allow the former form.  On the other hand,
those two forms couldn't be expected to work in an implementation that
would dup(2) the file descriptor of the stream.
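
For the :output and :error cases, a portability layer might also
sidestep the ownership check by itself: ask ccl:run-program for a pipe
(:error :stream) and do the copying in the calling thread, which is the
thread that owns the private stream.  The following is a minimal
sketch under that assumption.  COPY-LINES and RUN-WITH-PRIVATE-ERROR
are hypothetical names; only ccl:run-program and
ccl:external-process-error-stream are actual CCL functions.

```lisp
;;; Sketch only; COPY-LINES and RUN-WITH-PRIVATE-ERROR are hypothetical.

(defun copy-lines (from to)
  "Copy FROM to TO line by line until end of file.
Returns the number of lines copied."
  (loop for count from 0
        do (multiple-value-bind (line missing-newline-p)
               (read-line from nil nil)
             (when (null line)
               (return count))
             ;; Don't add a newline the source didn't have.
             (if missing-newline-p
                 (write-string line to)
                 (write-line line to)))))

#+clozure
(defun run-with-private-error (program arguments err)
  "Run PROGRAM with ARGUMENTS, copying its stderr to the stream ERR.
The copy runs in the calling thread, so ERR may be a :private stream
owned by that thread.  Returns the external process object."
  (let* ((process (ccl:run-program program arguments
                                   :wait nil
                                   :input nil :output nil
                                   :error :stream))
         (pipe (ccl:external-process-error-stream process)))
    (unwind-protect
         ;; Blocks until the program closes its stderr.
         (copy-lines pipe err)
      (close pipe))
    process))

;;; Usage (Clozure CL only):
;;; (with-open-file (err "TESTERR.TXT" :direction :output
;;;                      :if-does-not-exist :create :if-exists :supersede)
;;;   (write-line "The error output of the command is:" err)
;;;   (run-with-private-error "sh" '("-c" "echo error 1>&2") err)
;;;   (write-line "That was the error output of the command." err))
```

The sketch handles a single redirected stream; copying both stdout and
stderr this way would need either interleaved reads or another thread,
which brings the ownership question back.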

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.