[Openmcl-devel] Re: persistence of xref info in fasl files.

Sat Jan 3 15:42:30 PST 2004

Hi Gary,

Welcome back. I'm still on vacation but finally made it to a Starbucks 
which has a net connection.

Not sure if you realized, but I wrote this as an extension to the xref 
code that someone else supplied recently and
which you incorporated into ccl in the latest release. The policy 
encoded there seemed reasonable  and useful.

I didn't try to make it any better than it was,  other than to make it 
work when you load a fasl into a fresh image.

Regarding the interestingness of macros, my guess is that it isn't worth
thinking too hard about this: memory is cheap.
Perhaps just keep a user-defined list of *uninteresting-macros* to
ignore.
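
A minimal sketch of what such a list might look like, in portable Common Lisp. The predicate name INTERESTING-MACRO-P and the initial contents of the list are assumptions made for illustration; nothing here is part of the existing xref code.

```lisp
;; Hypothetical: the default contents are only an example of macros a user
;; might not want recorded.
(defvar *uninteresting-macros* '(when unless dotimes dolist)
  "Macro names whose uses should not be recorded as xrefs.")

(defun interesting-macro-p (name)
  "True if NAME names a macro that the user has not asked us to ignore."
  (and (symbolp name)
       (macro-function name)
       (not (member name *uninteresting-macros* :test #'eq))))

;; A macro that is not on the list is recorded; one on the list, or a name
;; that is not a macro at all, is skipped.
(interesting-macro-p 'with-open-file)
(interesting-macro-p 'dotimes)
(interesting-macro-p 'car)
```

A user who cares about DOTIMES could simply remove it from the list before compiling.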

Of the 8 cases you list, the current xref code handles all but
2, 3, and 4. I think 4 is also covered with the :exhaustive t
argument to get-relation. I'm assuming from the earlier thread that we 
would like to drop the call to (callers ..) at some point.

Regarding your suggestion, I think it is nice and compact. To support the 
current xref functionality we would need to add the names
of macros expanded to the list of constants (I think that that's all 
that would need to be added but am not 100% sure).
We could use the existing xref code to assign role to the reference.

I'd be glad to help do this. I don't know what the appropriate place is to 
add the expanded macros to the constants in the function vector.

BTW, I've also got  a working port of my MCL code (disassemble+) that 
recorded source location information into compiled code and
patched it into slime (my version) so that when you are in the 
debugger it highlights the code that was executing (as best it can). 
Here's a sample
of commented disassembly:

(disassemble 'show-uvector)

   (twnei nargs 4)
   (mflr loc-pc)
   (bla .spsavecontextvsp)
   (vpush arg_z)
   ("REGSAVE" save0 4)
   (vpush save0)
;  Start (dotimes (i (uvsize u)) (format t "~&~d : ~s" i ...
;   Start (uvsize u)
;    Access variable u
      (lwz arg_z 4 vsp)
     (clrlwi imm0 arg_z 30)
     (twnei imm0 2)
     (lwz imm0 -6 arg_z)
     (rlwinm arg_z imm0 26 6 29)
;   End (uvsize u)
    (vpush arg_z)
    (li save0 '0)
    (lwz arg_z 0 vsp)
    (bla .spinteger-sign)
    (cmpwi imm0 0)
    (ble l144)
L64
;   Start (format t "~&~d : ~s" i (uvref u i))
     (li arg_z 8230)
     (vpush arg_z)
;    Start (uvref u i)
;     Access variable u
       (lwz arg_y 12 vsp)
;     Access variable i
       (mr arg_z save0)
      (bla .spmisc-ref)
;    End (uvref u i)
;    Access variable i
      (mr arg_y save0)
;    Value: "~&~d : ~s"
      (lwz arg_x '"~&~d : ~s" fn)
     (set-nargs 4)
     (lwz temp3 'format fn)
     (bla .spjmpsym)
;   End (format t "~&~d : ~s" i (uvref u i))
;   Access variable i
     (mr arg_z save0)
    (set-nargs 1)
    (lwz temp3 '1+ fn)
    (bla .spjmpsym)
    (mr save0 arg_z)
;   Access variable i
     (mr arg_y save0)
    (lwz arg_z 0 vsp)
    (bla .spbuiltin-eql)
    (cmpwi arg_z nil)
    (beq l64)
L144
    (li arg_z nil)
    (lwz save0 4 vsp)
    (ba .sppopj)
;  End (dotimes (i (uvsize u)) (format t "~&~d : ~s" i (u...

It needs to be cleaned up some, but if anyone is interested in 
reviewing it before I do more work I'm
sure it will be better for it. There are similar issues of how to 
encode the information for the fasl files and how
to appropriately hook into the compiler.

-Alan

On Jan 3, 2004, at 5:53 PM, Gary Byers wrote:

>
>
> On Sat, 3 Jan 2004, Alan Ruttenberg wrote:
>
>> Here's an implementation. Gary, I don't really know what the best
>> place to put the hooks is, so I've used advise where it seemed
>> appropriate. Please feel free to fix or tell me how to.
>>
>> I came across one issue when doing this.  If you have (defun foo ()
>> (flet ((bar ())) *baz*))
>> then you get a recorded xref from bar to *baz*. I think this should be
>> foo to *baz* since bar isn't global. If you disagree with that policy
>> the code needs to be reworked a bit.
>>
>
> I can't remember whether I sent mail about this before the holidays,
> or who I would have sent it to if I did.
>
> It does seem to me that "the function itself" is a good place to keep
> information about what things a function references; among other 
> things,
> that helps to ensure that the information gets GCed when the function
> does.
>
> Depending on what you mean when you ask "what does the function X
> reference ?", some or most of that information's already there. A
> function in OpenMCL's a "uvector" (the name once meant "uniform"
> or "universal vector"; no one remembers); you can find the number
> of elements in a uvector with CCL:UVSIZE, and access those elements
> with CCL:UVREF.  (UVREF's SETFable; think of the fun to be had
> setting the elements of a function.)
>
> ? (defun show-uvector (u)
>   (dotimes (i (ccl:uvsize u))
>     (format t "~&~d : ~s" i (ccl:uvref u i))))
> SHOW-UVECTOR
> ? (show-uvector #'show-uvector)
> 0 : #<CODE-VECTOR #x5438B86>
> 1 : "~&~d : ~s"
> 2 : FORMAT
> 3 : (FUNCTION-SYMBOL-MAP (#(I #:G130 U) . #(31 48 120 575 44 120 63 20 120)))
> 4 : SHOW-UVECTOR
> 5 : 8388864
> NIL
>
> If you're interested in this sort of thing, you might find it more
> interesting to look at a broader sample of functions.  The general
> rules are:
>
> a) element 0 is always a "code-vector": a sequence of machine 
> instructions.
> b) the last element is always a fixnum; bits in this fixnum describe
>    various attributes of the function.
> c) depending on whether one of the bits in (b) is set or not, the next-
>    to-last element is the function's "name"; this is mostly for 
> debugging,
>    but there are some things (the macroexpansion of DEFUN, for 
> instance)
>    that attach greater significance to it.
> d) depending on the setting of another bit in (b), the 
> next-to-next-to-last
>    element is a plist whose entries contain debugging information.
> e) All other elements are things ("constants" or "immediates") that're
>    referenced by the function.
>
> Rules a-d apply to #'SHOW-UVECTOR, which means that elements 1 and 2
> are just "constants that the function references": the format string
> and the symbol FORMAT, in this case.
>
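
Rules a-e can be sketched as a small function. This is an illustration only: a plain vector stands in for a function uvector so that the code runs in any Common Lisp, and the keyword arguments HAS-NAME-P and HAS-INFO-P are hypothetical stand-ins for the attribute bits of rule (b), whose positions are not given above.

```lisp
;; Sketch under the assumptions stated in the lead-in; not OpenMCL's own code.
(defun uvector-constants (u &key (has-name-p t) (has-info-p t))
  "Return the elements of U that rule (e) classifies as constants."
  (let ((trailing (+ 1                      ; attribute fixnum, rule (b)
                     (if has-name-p 1 0)    ; function name, rule (c)
                     (if has-info-p 1 0)))) ; debugging plist, rule (d)
    ;; Element 0 is the code vector (rule a); skip it and the trailer.
    (subseq u 1 (- (length u) trailing))))

;; Same shape as the SHOW-UVECTOR listing above, with a keyword standing in
;; for the code vector and the symbol map abbreviated.
(defparameter *show-uvector-layout*
  (vector :code-vector "~&~d : ~s" 'format
          '(function-symbol-map) 'show-uvector 8388864))

;; Yields a two-element vector holding the format string and FORMAT.
(uvector-constants *show-uvector-layout*)
```

On a real function one would presumably use CCL:UVSIZE and CCL:UVREF in place of LENGTH and SUBSEQ, and read the two flags out of the last element.
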
> Looking at larger functions would give you a different impression,
> but let's look at another small one:
>
> ? (defun bind-package-and-return-nil ()
>     (let* ((*package* *package*))
>       nil))
> BIND-PACKAGE-AND-RETURN-NIL
> ? (show-uvector #'bind-package-and-return-nil)
> 0 : #<CODE-VECTOR #x5437666>
> 1 : #<SVAR *PACKAGE* 41 #x5035456>
> 2 : (FUNCTION-SYMBOL-MAP NIL)
> 3 : BIND-PACKAGE-AND-RETURN-NIL
> 4 : 8388608
>
> Again, rules a-d apply, and this function only references a single
> constant (element #1); that's an object of type CCL::SVAR that refers
> to the special variable *PACKAGE*.  (SVARs are used in the per-thread
> special-binding and reference code in recent versions of OpenMCL; in
> older versions, this constant would have just been the symbol 
> *PACKAGE*.)
>
> From this little sample, we can observe:
> 1)  It's fairly easy to find the constants that a compiled function
>     (might) actually need to reference.
> 2)  There's currently no information retained about things that're
>     referenced at compile-time (macros) or about things that're
>     inlined, strength-reduced, dead-code-eliminated, 
> source-transformed,
>     or otherwise appeared in the source code but aren't referenced in
>     the object code.  (UVREF, UVSIZE, DOTIMES in #'SHOW-UVECTOR).
> 3)  There's no metainformation that tells us in any detail -how- the
>     constants are being used: a function like:
>
> (defun cons-em-up ()
>   (cons "~&~d : ~s" 'FORMAT))
>
> makes random references to the same constants that #'SHOW-UVECTOR does,
> and we wouldn't want an XREF utility to say that CONS-EM-UP calls 
> FORMAT.
>
> Let's assume that problem (2) is easy to solve, and that the compiler
> frontend kept track of the fact that DOTIMES, UVREF, and UVSIZE were
> referenced in the code and the backend just emitted those symbols as
> constants, so #'SHOW-UVECTOR might look like:
>
> 0: #<code vector>
> 1: "~&~d : ~s"
> 2: FORMAT
> 3: DOTIMES
> 4: CCL:UVSIZE
> 5: CCL:UVREF
> 6: (FUNCTION-SYMBOL-MAP ...)
> 7: SHOW-UVECTOR
> 8: 8388864
>
> (It might take a while to come up with a reasonable policy here:
> is DOTIMES interesting ?  It might have macroexpanded into "calls"
> to 1+, or =.  Are those interesting ? Etc.)  Once we've decided what's
> interesting, it's not too hard to ensure that everything that was
> interesting to the frontend shows up in the backend.
>
> If we agree that problem 2's solvable, we can look at problem (3).
> There seem to be a relatively finite number of interesting ways
> in which a constant can be referenced.
>
> 0) as a global function name.  The last time that I checked, slightly
>    over 50% of all function constants in MCL fell into this category;
>    that was several years ago, but I'd be surprised if the percentage
>    changes that often.
> 1) as an interesting macro name.
> 2) as a random argument to a function, return value, etc.
> 3) as a functional argument to a function known to take functional
>    arguments.
> 4) as a function name that only appeared in the source.
> 5) in a special variable reference operation.
> 6) in a special variable assignment operation.
> 7) in a special variable binding operation.
>
> I may be missing something, but it seems pretty likely that
> we can encode the interesting cases in an 8 bit byte.  (a given
> constant might have more than one use, as in:
>
> (defun foo ()
>   (format t "~s is a programming language in its own right." 'format))
>
> .) A special variable (or a CCL::SVAR denoting one) might have 3 or
> even more of these attributes; most constants in a given function would
> have one or two.
>
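
One way to picture the 8-bit encoding is as a pair of pack and unpack functions. This is a sketch in portable Common Lisp: the role keywords are hypothetical names, and only the bit numbers come from the list 0-7 above.

```lisp
;; Hypothetical role names; index N in this vector corresponds to item N
;; in the list of ways a constant can be referenced.
(defparameter *xref-roles*
  #(:calls :macro :argument :functional-argument :source-only
    :references :sets :binds))

(defun encode-roles (&rest roles)
  "Pack ROLES into an (unsigned-byte 8), one bit per role."
  (let ((byte 0))
    (dolist (role roles byte)
      (let ((bit (position role *xref-roles*)))
        (unless bit (error "Unknown xref role: ~s" role))
        (setf byte (logior byte (ash 1 bit)))))))

(defun decode-roles (byte)
  "Return the list of roles whose bits are set in BYTE, lowest bit first."
  (loop for bit from 0 below (length *xref-roles*)
        when (logbitp bit byte)
          collect (aref *xref-roles* bit)))

;; FORMAT in the FOO example is both called and passed as an argument.
(encode-roles :calls :argument)            ; => 5
(decode-roles 5)                           ; => (:CALLS :ARGUMENT)
;; A special variable that is referenced, assigned and bound.
(encode-roles :references :sets :binds)    ; => 224
```
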
> For the five constants (elements 1-5) in the revised #'SHOW-UVECTOR as
> they're used there, we'd have:
>
> 1: "~&~d : ~s"		; (ash 1 2)
> 2: FORMAT		; (ash 1 0)
> 3: DOTIMES		; (ash 1 1)
> 4: CCL:UVSIZE		; (ash 1 4)
> 5: CCL:UVREF		; (ash 1 4)
>
> using bit numbers corresponding to the list above. If the next-to-next-
> to-last element of the function (the debugging-info-plist, or 
> CCL::%LFUN-INFO)
> contained an entry of the form:
>
> (xref-constant-information-map #<5-element vector of UNSIGNED-BYTE 8>)
>
> I -think- that we'd be able to say that the data structure that 
> contains
> the XREF information for a function is the function, and I think that
> there'd be some advantages to not having to maintain separate data
> structures and keep them in synch.  For instance:
>
> (defun x (arg) (foo (1+ arg)))
> (setf (symbol-function 'y) #'x)
> (setf (symbol-function 'x) #'(lambda ()))
> (who-calls 'foo)
>
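
The scenario above can be replayed with stand-in data structures. This is a sketch only, in portable Common Lisp: FAKE-FUNCTION, *DEFINITIONS*, CALLS-P and WHO-CALLS* are all hypothetical names, a struct stands in for a compiled function, and a hash table stands in for the global function namespace. The reading that WHO-CALLS should end up reporting Y is an interpretation of the example, not something stated in it.

```lisp
;; Stand-in for a compiled function: its constants plus one role byte per
;; constant, as in the proposed xref-constant-information-map.
(defstruct fake-function
  constants   ; vector of referenced constants
  role-map)   ; vector of (unsigned-byte 8), parallel to CONSTANTS

(defvar *definitions* (make-hash-table :test #'eq)
  "Stand-in for the global function namespace: name -> FAKE-FUNCTION.")

(defun calls-p (fn target)
  "True if FN references TARGET with bit 0 (global function name) set."
  (loop for constant across (fake-function-constants fn)
        for roles across (fake-function-role-map fn)
        thereis (and (eq constant target) (logbitp 0 roles))))

(defun who-calls* (target)
  "Names currently bound to a function that calls TARGET."
  (loop for name being the hash-keys of *definitions*
          using (hash-value fn)
        when (calls-p fn target) collect name))

;; X calls FOO, is then also installed as Y, and X is redefined as a
;; function that references nothing.
(let ((x (make-fake-function :constants (vector 'foo)
                             :role-map (vector 1))))
  (setf (gethash 'x *definitions*) x)
  (setf (gethash 'y *definitions*) x)
  (setf (gethash 'x *definitions*)
        (make-fake-function :constants (vector) :role-map (vector))))

(who-calls* 'foo)   ; => (Y)
```

Because the information travels with the function object, nothing has to be updated when the function is moved to a new name or replaced.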
>
> There are some possible advantages to using a separate data structure
> as Alan's code does as well.  I'm not 100% convinced that the idea I'm
> proposing is entirely better, but I think that it's worth thinking 
> about
> further.  It has always sort of struck me that (incomplete and 
> imprecise)
> XREF info's already there and there's a lot of it, and it seems more
> attractive to make it more complete and precise than to duplicate it.
>
>> The bit about (add-pre-xref-xrefs) is for bootstrapping.
>>
>> BTW, what does "indirect calls" mean?
>
> Maybe that bit 3 is set in the corresponding 
> xref-constant-information-map ?
>
>>
>> -Alan
>>
>>