[Openmcl-devel] bogus objects
Gary Byers
gb at clozure.com
Sat Nov 13 18:51:37 PST 2004
On Sat, 13 Nov 2004, Cyrus Harmon wrote:
>
> On Nov 12, 2004, at 5:46 PM, Gary Byers wrote:
> >
> > If there's code doing something like:
> >
> > (with-output-to-string (*S*)
> > (loop
> > ;; spend lots of time here
> > ))
> >
> > and one or more other threads are writing to *S*, then I can easily
> > imagine
> > that the lack of locking could lead to *S* or things near it in memory
> > getting clobbered.
> >
> > (Which does sound suspiciously like what you're seeing.)
>
> And, presumably, a big, honkin' format call would probably cause the
> same problem, as I imagine there might be a with-output-to-stream down
> in format, or at least there is one in the interim definition in
> l1-format.lisp, but I don't know if this gets replaced or not.
>
(FORMAT NIL ...) creates and writes to a string stream, and I think
that there's a general assumption that they're cheap to create and
use and don't need the locking/synchronization stuff that full-blown
file/socket/etc streams have.
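
As a rough illustration of the point above (portable Common Lisp, not
OpenMCL's actual FORMAT internals), the two forms below return the same
string; the second just makes the temporary string stream explicit. The
control string and arguments are made up for the example.

```lisp
;; (FORMAT NIL ...) returns a fresh string ...
(defun greeting-via-format-nil (name count)
  (format nil "~a has ~d messages" name count))

;; ... which behaves like writing to a private string stream and then
;; collecting what was written.
(defun greeting-via-string-stream (name count)
  (with-output-to-string (s)
    (format s "~a has ~d messages" name count)))
```

In both cases the stream is only reachable from the calling thread, which
is why no locking is needed for this kind of use.
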
This is sort of a judgement call: if two threads call VECTOR-PUSH-EXTEND
on an array with a fill pointer at the same time, bad things can happen,
but I don't think that people expect the implementation to protect all
extensible vectors with locks. (Depending on your definition of "bad
thing", two threads trying to do (INCF *FOO*) at the same time could
cause problems. Ad infinitum.)
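
To make the (INCF *FOO*) case concrete, here is a single-threaded sketch
that spells out one unlucky interleaving by hand. INCF is a read followed
by a write, so if two threads both read before either writes, one update
is lost. The names A-READ and B-READ are hypothetical stand-ins for what
each thread would have seen; no real threads are involved.

```lisp
(defvar *foo* 0)

(defun simulate-lost-update ()
  "Replay by hand the interleaving where both 'threads' read *FOO*
before either one stores its incremented value."
  (setf *foo* 0)
  (let ((a-read *foo*)          ; thread A reads 0
        (b-read *foo*))         ; thread B reads 0 before A has written
    (setf *foo* (+ a-read 1))   ; A stores 1
    (setf *foo* (+ b-read 1))   ; B also stores 1, overwriting A's update
    *foo*))
```

Two increments were attempted but (simulate-lost-update) returns 1. The
same read-modify-write pattern applies to a fill pointer updated by
VECTOR-PUSH-EXTEND.
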
> Lots of nested formats all over this code. Most are going to make their
> own string-streams, but some, especially error-related stuff, which is
> an error where I'm seeing some problems, are going to write to
> *<some-funky-error-streams>* and this is probably where things are
> getting munged. Bummer...
>
> One other odd thing is that I'm seeing errors on nested princ-to-string
> calls. I'm surprised to see errors here as 1) this probably happens all
> the time and 2) l1-io.lisp is making new streams with
> (with-output-to-stream (s) (princ s)). It looks like this is the stream
> that's becoming bogus:
My guess is that it's becoming bogus because something that's incidentally
near it in memory is getting accessed incorrectly.
Writing a character to a string-output-stream basically involves adding
a character to the underlying extensible string and updating an index
or two. If that all happens atomically (with nothing else modifying
the stream and with the entire process running to completion without
getting interrupted), nothing should go wrong (and, as you note, it
happens pretty regularly.)
Nothing in the implementation guarantees that either of those conditions
holds, and Araneida apparently expects at least the first of them to do
so. I'm surprised by that expectation (I think of string-streams as
being lightweight and thread-private), but I can't really say that the
expectation's misguided.
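
For illustration only, here is a toy model of what "adding a character
and updating an index or two" might look like. This is NOT OpenMCL's
string-stream implementation; TOY-STRING-STREAM and its slots are
hypothetical names. The point is that a write is several separate steps,
so a second writer (or an interrupt) between them can leave the buffer
and the indices out of step.

```lisp
(defstruct (toy-string-stream (:conc-name toy-))
  ;; Extensible string: adjustable, with a fill pointer.
  (buffer (make-array 16 :element-type 'character
                         :adjustable t :fill-pointer 0))
  ;; A second piece of state that has to stay in step with the buffer.
  (column 0))

(defun toy-write-char (stream char)
  ;; Step 1: append to the buffer (reads and bumps the fill pointer,
  ;; possibly reallocating the underlying storage).
  (vector-push-extend char (toy-buffer stream))
  ;; Step 2: update the column index.
  (if (char= char #\Newline)
      (setf (toy-column stream) 0)
      (incf (toy-column stream)))
  char)
```

Nothing here is atomic: each step is itself a read followed by a write.
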
In the current case using a string stream incidentally (for (FORMAT NIL ...))
means that you cons up a stream object and a string or two that may
become garbage. I don't think that adding a lock to each string stream
is desirable (the OS would have to be involved in the creation and
destruction of that lock), though it may be possible to defer the OS's
involvement to the (hopefully rare) case where there's some actual
contention involved. I'd have to think about that a bit more.
If the shared string-output-streams that Araneida creates are basically
"permanent", it might be practical to:
a) create a corresponding lock for each global shared string stream
b) ensure that any code which accesses such a shared stream grabs
the corresponding lock before doing so.
If that -is- practical, I suspect that you'll see the memory corruption
go away. (I'm relieved that this doesn't seem to be a GC problem.)
If it isn't practical, please let me know and I'll try to talk to
Dan Barlow about it.
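
A minimal sketch of (a) and (b), assuming OpenMCL's CCL:MAKE-LOCK and
CCL:WITH-LOCK-GRABBED. The stream, lock, and function names are
hypothetical; they are not Araneida's, and the real code would need every
access to the shared stream to go through something like this.

```lisp
;; (a) one lock per long-lived shared string stream
(defvar *shared-error-stream* (make-string-output-stream))
(defvar *shared-error-stream-lock* (ccl:make-lock "shared-error-stream"))

;; (b) all writers go through a function that grabs the lock first
(defun write-shared-error-line (control-string &rest args)
  ;; Build the text in a thread-private string first, so the lock is
  ;; held only for the actual write to the shared stream.
  (let ((line (apply #'format nil control-string args)))
    (ccl:with-lock-grabbed (*shared-error-stream-lock*)
      (write-line line *shared-error-stream*))
    line))
```

Readers (anything calling GET-OUTPUT-STREAM-STRING on the shared stream)
would need to grab the same lock.
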
>
> On another note, I'm sure this is an FAQ, but what do the numbers at
> the end of the lines mean? e.g. the 192, 256, 148 etc...?
It's the relative offset of the pending return address within the
corresponding function's code. If you disassemble the function and
count instructions (each of which is exactly 4 bytes wide) you should
usually see that the preceding instruction is some sort of call (BL*)
or trap (TW*). A full-blown GUI backtrace could/should do this for
you.
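
The arithmetic can be sketched as below, assuming the backtrace number is
a byte offset from the start of the function's code and that instructions
are counted from zero. The function names are made up for the example.

```lisp
(defparameter *ppc-instruction-bytes* 4
  "Every PPC instruction is 4 bytes wide.")

(defun call-instruction-offset (return-offset)
  "Byte offset of the instruction just before the pending return address."
  (- return-offset *ppc-instruction-bytes*))

(defun instruction-index (byte-offset)
  "Zero-based position of the instruction at BYTE-OFFSET."
  (floor byte-offset *ppc-instruction-bytes*))
```

For a backtrace line ending in 192, the call or trap should be at byte
offset 188, i.e. the instruction at zero-based index 47 in the
disassembly.
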