[Openmcl-devel] *initial-process* and *current-process*

Gary Byers gb at clozure.com
Fri Jun 24 13:16:21 PDT 2011


Well, I think that I have a partial explanation for the mystery.  I said the
other day that SAVE-APPLICATION wouldn't write a partial image: if it erred
before writing the full image to disk, the result wouldn't be loadable.
That's true up to a point ...

(SAVE-APPLICATION output :PREPEND-KERNEL T) works by finding the file
containing the running kernel and copying that file to the "output"
pathname.  If the kernel contains an embedded heap image (if it was
the result of a previous SAVE-APPLICATION call with :PREPEND-KERNEL),
that embedded image (from the running kernel) is copied to the output
file as well; the intent is that the file will be truncated (so that the
embedded image isn't present) at a slightly later point in time.

The short version of what happens next is that (because of a post-1.6
change in the trunk) we never reach that later point in time because
something calls CCL:QUIT with a default exit status of 0.  We never
get around to truncating the file and never actually write the current
heap contents to the output file; the output is entirely identical to the
image/kernel from which SAVE-APPLICATION was called:

[src/oa.build-trunk-new] gb at rinpoche> md5sum src/boot/strap/bootsys src/lisp/lisp
6ab851c88925ba34b66ba381e1b17d4a  src/boot/strap/bootsys
6ab851c88925ba34b66ba381e1b17d4a  src/lisp/lisp

That's pretty much exactly the behavior that you described (and which I'd
said, and been thinking, was impossible.)
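
The md5sum comparison above can be turned into a check that a build script
runs after saving an image.  This is only a sketch: the function name
image_was_rewritten is hypothetical and is not part of CCL or OpenAxiom; the
only tool it relies on is the standard cmp utility.

```shell
#!/bin/sh
# Hypothetical helper (not part of CCL or OpenAxiom): succeeds only if the
# saved image exists and differs byte-for-byte from the kernel it came from.
image_was_rewritten() {
    kernel=$1
    image=$2
    # A missing image cannot be a freshly written image.
    [ -f "$image" ] || return 1
    # cmp -s exits 0 when the files are identical, which is the failure
    # case here: the "image" is just a copy of the running kernel.
    if cmp -s "$kernel" "$image"; then
        return 1
    fi
    return 0
}
```

With the paths from the transcript, a call such as
image_was_rewritten src/lisp/lisp src/boot/strap/bootsys would fail, since
the two files have the same checksum.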

There are at least two bugs there: the call to QUIT was coming from the 
(modified) code to handle a user-supplied toplevel-function.  It makes
some sense to ensure that QUIT is called if that function just returns,
but this was happening even if the thread was being killed.  We also
really, really don't want the copied executable image to "look valid"
at this point: if we don't get around to truncating it and writing
current heap contents to the file, it shouldn't look like a valid image.

I think that the first of these is fixed in r14849; I made that change
and then tried "make" instead of "make all-boot" and it's been running
for quite a while now ...

The second bug is likely harder to fix, but it'd be worth putting some
effort into that (if only to try to ensure that this never, ever
happens again.)
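
Until that is fixed in the kernel, a build script can guard against the
symptom itself.  The sketch below is an assumption about how one might do
that at the build level, not a description of how CCL works or will work:
the image is saved under a temporary name and renamed into place only if it
is not a plain copy of the kernel.  The exit status of the saving process is
not consulted, because in the failure described above QUIT exits with
status 0.  The name publish_if_complete is hypothetical.

```shell
#!/bin/sh
# Hypothetical build-level guard; the names are illustrative and are not
# CCL or OpenAxiom APIs.
# Usage: publish_if_complete KERNEL TMP_IMAGE FINAL_IMAGE
publish_if_complete() {
    kernel=$1
    tmp_image=$2
    final_image=$3
    # Refuse to publish a missing file or an untouched copy of the kernel,
    # and remove the leftover so that nothing mistakes it for an image.
    if [ ! -f "$tmp_image" ] || cmp -s "$kernel" "$tmp_image"; then
        rm -f "$tmp_image"
        return 1
    fi
    # Within one filesystem mv is a rename, so the final name never refers
    # to a half-written file.
    mv -f "$tmp_image" "$final_image"
}
```

This only detects the specific case where the output is byte-identical to
the kernel; it does not prove that a differing file is a complete image.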

On Fri, 24 Jun 2011, Gabriel Dos Reis wrote:

> On Fri, Jun 24, 2011 at 7:22 AM, Gary Byers <gb at clozure.com> wrote:
>> On Fri, 24 Jun 2011, Gabriel Dos Reis wrote:
> [...]
>> ### and this is the equivalent output where the trunk CCL was used:
>>
>> ../../src/driver/open-axiom --execpath=../lisp/lisp --make
>> --main="|AxiomCore|::|topLevel|"\
>>                 --system=../../x86_64-unknown-linux-gnu \
>>                 --prologue='(pushnew :open-axiom-boot *features*)' \
>>                 --output=strap/bootsys --load-directory=strap
>> strap/initial-env.lx64fsl strap/utility.lx64fsl strap/tokens.lx64fsl
>> strap/includer.lx64fsl strap/scanner.lx64fsl strap/pile.lx64fsl
>> strap/ast.lx64fsl strap/parser.lx64fsl strap/translator.lx64fsl
>> ../../src/driver/open-axiom --execpath=strap/bootsys --translate
>> --import=skip --output=stage1/utility.clisp
>> ../../../oa.svn/src/boot/utility.boot
>> ../../src/driver/open-axiom --execpath=../lisp/lisp
>> --output=stage1/utility.lx64fsl --compile --load-directory=stage1
>> stage1/utility.clisp
>>>
>>> Error of type FILE-ERROR: File #P"stage1/utility.clisp" not found
>>> While executing: CCL::FCOMP-FIND-FILE, in process toplevel(2).
>
> Yes, indeed, we are seeing the same thing.
>
>>
>> e.g., it looks like the command before the last one was expected to create
>> stage1/utility.clisp but failed to do so.
>
> That command line was supposed to translate the file boot/utility.boot to
> Lisp (stage1/utility.clisp).  The executable strap/bootsys (which was saved
> from a previous stage) did run.  However, it did not execute the function
> that is supposed to do the translation, because the handler consulted a
> global table to associate a handler to the option "translate".  However it
> could not find the handler in the table because the correct value of that
> table was not saved -- as I described in earlier messages.
>
>> I don't know why (or even have
>> a good idea of where to look), but hopefully we're both seeing the same
>> thing.
>
> Yes, we are seeing the same thing.  In oa.svn/src/lisp/core.lisp.in,
> the function |link| loads fasls, then calls |saveCore|, which calls
> SAVE-APPLICATION.  If you inspect the value of the global
> variable |$driverTable| right before calling |saveCore|, you would
> see that it contains entries for the key (|translate| . "boot") -- when
> building strap/bootsys.  However, if you look at that variable in
> |topLevel| (when invoking strap/bootsys), you would see that entry
> isn't there anymore (along with some other entries that are supposed
> to be there.)
>
>>
>> I don't know whether the global special variables that you see as having
>> unexpected values are application-specific, standard CL/CCL things, or some
>> mixture.  The only post-1.6 change to code related to SAVE-APPLICATION
>> that I can think of might cause the user-specified toplevel-function to
>> run with (ordinarily) slightly different values for some of the standard
>> stream variables (*TERMINAL-IO* and some other things that're SYNONYM-STREAMs
>> to it.)  Are these variables among those that have unexpected values?
>
> No.  But I can also say that if I SETF the SYMBOL-PLIST of a symbol
> in |link| before calling SAVE-APPLICATION, that property does not appear
> in the final strap/bootsys.  As I said in a previous message, the only
> LET-binding (or equivalent) that is in place when |link| runs is that of
> *PACKAGE* (as you see in |topLevel|).
>
> Arthur suggested a possible issue with DEF-LOAD-POINTERS but I don't
> use it and I do not know whether it is used indirectly by CCL in my
> setting.
>
> Thanks,
>
> -- Gaby
>
>


