[Openmcl-devel] Modal dialog problems with CCL 1.9 32/64 on Mountain Lion

Gary Byers gb at clozure.com
Thu Aug 30 07:45:05 PDT 2012

On Wed, 29 Aug 2012, Alexander Repenning wrote:

> On Aug 29, 2012, at 6:00 PM, R. Matthew Emerson wrote:
>       On Aug 29, 2012, at 7:04 PM, Alexander Repenning
>       <Alexander.Repenning at colorado.edu> wrote:
>       I think I have here a pretty kosher (uses retain, does not
>       use deprecated functions) version of the dialog + memory
>       surge problem. This version of choose-file-dialog is
>       completely stripped of any non-essential activity. It does
>       not even return the path, i.e., there is no practical
>       value to this function.
> Please have a go and see if you can or cannot experience that
> memory surge phenomenon. Please follow the instructions closely.
> Otherwise you may miss the issues, which can sometimes be kind
> of subtle.
>       ;; CCL 1.8.1 64 (Mac App store) crash on Mountain Lion
>       10.8.1
> The version of CCL in the Mac App Store still contains a bug in it
> that Mountain Lion triggers.  I am pretty sure that your test case is
> running into that bug.

At the very least, the way that you presented your test case triggers
a known bug in that version of CCL and says nothing about whether or
not some other bug remains.

> I think we are going around in circles.

I would have used a harsher term for the above, but OK: continually
using a 64-bit version of CCL that's known to have a bug which has
similar symptoms is "going around in circles."  You may actually
be going through some careful and controlled testing procedure and
just spacing out and clouding the issue like this when you report
your findings, but this wastes time and makes it harder than it
should be to take you seriously.  (This is not the first time that
you've done this.)

> While that bug does sound similar in
> spirit I am quite sure it's not the one because:
> 1) the error also manifests itself in the 32 bit version of CCL

Matt wouldn't have wasted his time if the message he replied to had
said so clearly.

I'm honestly not trying to be dismissive or sarcastic here.  This
stuff is complicated, and it's important to be as precise as possible
when discussing it (much more precise than one might be in casual
conversation.)  That may take extra effort, but the alternative seems
unacceptable to me.

> 2) We tried the most recent version of CCL 1.9-dev-r15450-trunk
> (DarwinX8632)! and the 64 bit version. Both the test case and the full app
> crashed with the same memory surge.
>
> When I try your modified version (which we actually did try before as well)
> things APPEAR to be better as long as you just dismiss the dialog with ESC.
> However, the reason for that appears to be that then the file choose dialog
> goes into this super slow, spinning indeterminate progress indicator mode
> where it does not list contents of folder. That is interesting. However, if
> you actually try to select a file, instead of pressing ESC, then there is a
> good chance it will crash even faster than before. I never made it beyond
> the first attempt. Can you confirm?

I've seen a spinning progress indicator (something that would have been a
beachball cursor a few OS revisions ago, and generally indicates that progress
isn't being made) appear in the lower left corner of the open panel.  I wasn't
paying close attention to when this did and did not appear, but my impression
is that there was some correlation between that cursor spinning around
and some kinds of misbehavior (e.g., "empty" or largely empty panel views that
shouldn't be empty.)

I don't think that I've seen this since trying to use a CHOOSE-FILE-DIALOG
implementation that (at a minimum) retained the panel before it was used
and released it afterwards.  I'm 100% sure that I haven't seen excessive
memory or CPU utilization since then, but I've only ever seen that once (and
only while running your application.)

> 3) An even simpler test just starting 1000 processes, one after the other,
> does not exhibit the problem (32 bit).

Note that a thread/process in 32-bit CCL needs about 2.5MB of foreign memory
just for its stacks; it also uses other finite resources (semaphores, message
ports, etc.)  You can't have 1000 runnable threads in 32-bit CCL because the
~2.5GB of foreign memory isn't available; the only way that a loop that calls
PROCESS-RUN-FUNCTION 1000 times can run to completion is if some older threads
exit before newer ones are created, and the only way that happens is if those
threads run to completion (and the more threads are created and competing for
CPU time and other resources, the less deterministic that is.)
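
The arithmetic behind that estimate can be checked directly.  This is only a
back-of-the-envelope sketch in portable Common Lisp: the 2.5MB-per-thread figure
is the one quoted above, the 4GB ceiling is just the size of a 32-bit address
space (the usable amount is smaller, since the lisp heap and libraries live
there too), and the function and variable names are made up for illustration.

```lisp
;; Illustrative names only; nothing here is part of CCL.
(defparameter *mb* (* 1024 1024))

(defun stack-memory-needed (thread-count &optional (mb-per-thread 5/2))
  "Bytes of foreign memory needed for the stacks of THREAD-COUNT threads."
  (* thread-count mb-per-thread *mb*))

(defun max-threads-in (address-space-bytes &optional (mb-per-thread 5/2))
  "Upper bound on simultaneous threads if stacks were the only consumer."
  (floor address-space-bytes (* mb-per-thread *mb*)))

(stack-memory-needed 1000)      ; => 2621440000 bytes, roughly 2.44GiB
(max-threads-in (expt 2 32))    ; => 1638 (first value)
```

Even that optimistic bound of 1638 threads assumes the entire 32-bit address
space is free for thread stacks, which it never is.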

In practice, I'd expect something like:

(dotimes (i 1000) (process-run-function "nothing" (lambda ())))

to use a lot of CPU but probably not exhaust virtual memory (simply because
all of the CPU contention keeps the thread running that loop from running
very often.)  This isn't guaranteed, and it's such a ridiculous thing to do
that I'd find it difficult to get too worked up about things if it didn't.

If we put a delay in that loop:

(dotimes (i 1000)
   (choose-file-dialog)
   (process-run-function "nothing" (lambda ())))

we're probably effectively serializing thread creation, but we're also affecting
the environment in which CHOOSE-FILE-DIALOG runs (competing for some of the
same OS resources that the thread is competing for.)  I'd expect that to work
as well, but of course it's also a ridiculous thing to do and it's hard to care
too much about whether it does or not.

Also ridiculous (in different ways) is:

(dotimes (i 1000)
   (choose-file-dialog))

where CHOOSE-FILE-DIALOG retains the open-panel appropriately.  I haven't
seen that fail, but I haven't counted to 1000 either.

Apple distributes a command-line tool called "heap", which will scan the
(malloc'ed) heap zones of a specified process and (among other things)
identify the number and sizes of the ObjC instances (the program calls them
"classes" ...) it finds there.  The ObjC heap shouldn't change significantly
on each iteration and (when not answering email or trying to do work that
we actually get paid for) I've been trying to verify that it doesn't.

> In summary I think we still don't know what is going on but I don't think it
> is connected to that specific malloc issue you mention.

No one else thinks so either (at least not directly), but if you say
"running a test in a version of CCL affected by that issue says something
useful" a lot of time gets wasted by Matt or me saying "no it doesn't" and
then by you saying "and by 'version of CCL affected by that issue', I mean
'other versions'".  I am emphatically in favor of streamlining this process.

> Need to run but I do remember reading that in sandbox mode (we are NOT
> running in sandbox or are we?) file choosers are NOT running in the main
> thread but in some other special thread.

They actually run in another OS-level process, which means that the internals
of NSOpenPanel are probably a lot different than they used to be (and may
change again if and when the whole "sandboxing" issue goes away.)

> hmmm....
> puzzled,  Alex

I'm a little less puzzled than I had been.  The implementation of
CHOOSE-FILE-DIALOG was wrong and could result in methods being invoked on
freed objects (which
in turn could cause malloc heap corruption that snowballs into further
malloc heap corruption ...)  Modifying the state of a freed object could
involve modifying free memory (usually relatively harmless) or modifying
the state of some other object that's subsequently allocated at the freed
address (usually relatively harmful.)  I don't know for sure that any of
this leads to the memory/CPU problems that you've seen, but it's certainly
plausible that it could.   I'm fairly certain that it could lead to the
cosmetic problems (apparently empty directories and other display problems).
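
The difference between the "relatively harmless" and "relatively harmful" cases
can be shown with a toy model.  This is a simulation in portable Common Lisp and
does not touch the real malloc heap: TOY-HEAP, TOY-MALLOC and TOY-FREE are
hypothetical names, and the "addresses" are just indices into a small vector.
The one assumption that matters is that the allocator hands the most recently
freed cell back first, which is a common (but not guaranteed) behavior.

```lisp
;; Toy model of address reuse; all names are hypothetical.
(defstruct toy-heap
  (cells (make-array 4 :initial-element nil))
  (free-list (list 0 1 2 3)))

(defun toy-malloc (heap value)
  "Take a free cell index (the 'address'), store VALUE there, return the index."
  (let ((address (pop (toy-heap-free-list heap))))
    (unless address (error "toy heap exhausted"))
    (setf (aref (toy-heap-cells heap) address) value)
    address))

(defun toy-free (heap address)
  "Return ADDRESS to the free list; the most recently freed cell is reused first."
  (push address (toy-heap-free-list heap)))

(defun demo-use-after-free ()
  "Return two values: whether the address was reused, and what the new owner sees."
  (let* ((heap (make-toy-heap))
         (panel (toy-malloc heap :open-panel)))
    (toy-free heap panel)                                  ; released too early
    ;; Writing through the stale address while the cell is free: nobody notices.
    (setf (aref (toy-heap-cells heap) panel) :scribbled-1)
    (let ((other (toy-malloc heap :some-other-object)))    ; gets the same address
      ;; Writing through the stale address again now clobbers OTHER's contents.
      (setf (aref (toy-heap-cells heap) panel) :scribbled-2)
      (values (= panel other)
              (aref (toy-heap-cells heap) other)))))
```

Calling (demo-use-after-free) returns T and :SCRIBBLED-2: the unrelated object
allocated later has been silently overwritten by the holder of the stale
reference.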

The problem described in ticket 1005 caused almost exactly the same symptoms.
Recall that that involved allocating a per-thread data structure whose address
could be used to identify a Mach message port; traditionally (I don't know
if this has changed), a Mach message port identifier has been a 32-bit value.
So, the code that allocated that data structure was:

(let* ((free-later ())
       (p nil))
  (loop
    (setq p (#_malloc few-hundred-bytes))
    (if (and (is-32-bit-pointer p)
             (can-be-used-as-mach-port p)) ; not coincidentally also Mach port name
      (return)
      (push p free-later)))
  ;; Free anything that failed the test above
  (dolist (bad free-later) (#_free bad)))

That was changed: especially on Mountain Lion, #_malloc would often return
pointers for which IS-32-BIT-POINTER wasn't true, so we use alternatives to
#_malloc and #_free on 64-bit platforms (and IS-32-BIT-POINTER is therefore
always true.)  The second test - that the pointer's address can be used as
a Mach port name - is intended to catch conflicts with the (often essentially
random) set of existing Mach port names.  The second test is implemented by
trying to use P as a Mach port name and seeing if that fails; it could fail
if P was coincidentally already a Mach port name, but might (I'd have to RTFM)
also fail for other reasons (like "Mach is too damned busy now; try later." Yes,
it's 2012.)  If the loop above was run while Mach was too damned busy, we'd keep
#_malloc-ing until we were out of memory, and the one time that I was able to
get your application to fail, malloc's heap was full.  (I don't know that this
is what's happening, but it may be worth exploring.)
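
To make that failure mode concrete, here is a simulation of such a retry loop
with a cap on the number of attempts.  It is a sketch in portable Common Lisp,
not the code in the CCL kernel: ALLOCATE-USABLE, its MAX-ATTEMPTS keyword and
the demo function are hypothetical, and the allocator, the release function and
the acceptance test are stand-ins passed in by the caller.  A test that always
says no plays the part of "Mach is too busy".

```lisp
;; Simulation only: ALLOCATE, RELEASE and USABLE-P are caller-supplied stand-ins.
(defun allocate-usable (allocate release usable-p &key (max-attempts 100))
  "Call ALLOCATE until USABLE-P accepts the result, at most MAX-ATTEMPTS times.
Rejected blocks are always released.  Returns the accepted block and the number
of attempts, or NIL and the number of attempts if every block was rejected."
  (let ((rejected '())
        (attempts 0))
    (unwind-protect
         (loop
           (when (>= attempts max-attempts)
             (return (values nil attempts)))
           (incf attempts)
           (let ((p (funcall allocate)))
             (if (funcall usable-p p)
                 (return (values p attempts))
                 (push p rejected))))
      ;; Free anything that failed the test
      (dolist (bad rejected) (funcall release bad)))))

(defun demo-busy-predicate ()
  "Run the loop with a test that never succeeds; return (result attempts live-blocks)."
  (let ((next-address 0)
        (live 0))
    (flet ((allocate () (incf live) (incf next-address 256))
           (release (p) (declare (ignore p)) (decf live)))
      (multiple-value-bind (result attempts)
          (allocate-usable #'allocate #'release (constantly nil)
                           :max-attempts 50)
        (list result attempts live)))))
```

(demo-busy-predicate) returns (NIL 50 0): the loop gives up after 50 tries and
nothing is left allocated.  Without the cap, the same always-failing test would
keep allocating until memory ran out, which matches the full malloc heap
described above.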

To state the obvious: Matt and I (and anyone else at Clozure who could
look at this) have other work that customers are paying us to do and
that other work has to take priority over this; if you think
otherwise, you are quite simply wrong and that isn't open to
discussion (with me, my partners, or anyone else *).  If you're just
getting someone's spare time to look at this, that seems like an even
stronger reason to not waste that time with things like "running this
test in a version of CCL known to exhibit these symptoms for other
reasons exhibits these symptoms."

[*] I used to deal with a company whose slogan was "We cheat the other
guy, and pass the savings on to you!".  Sadly, they aren't in business
anymore ...

> Gary mentioned this bug in a previous message:
> - there's a known bug in 64-bit CCL on OSX that can cause lisp thread
>   creation to go into a horrible CPU-burning/memory-thrashing state.
>   I think that that bug's been present for a long time (since PPC64
>   days), but it's apparently much easier to trigger on 10.8 (and/or
>   recent versions of CCL) than it has been.
>   The problem ultimately has to do with whether or not #_malloc
>   (actually #_calloc) returns a 64-bit pointer whose high 32 bits are
>   0 and there can be many factors that affect that (many of them
>   subtle), and the fix is to stop assuming that it does and allocate
>   such pointers ourselves.
>   That's been fixed (in the trunk for a few weeks and in the 1.8 tree
>   for a few days) in svn; the symptoms happen to be very similar to
>   what people have reported seeing with CHOOSE-FILE-DIALOG, but the
>   CHOOSE-FILE-DIALOG problems seem to occur for at least some people
>   in 32-bit CCL (which was never affected by this thread-creation
>   problem) and in freshly-updated 64-bit versions.
> The fix for this bug is not yet in the Mac App Store version of CCL.
> I'll try to update the Mac App Store version soon, but in the
> meantime, please try using up-to-date CCL obtained via Subversion
> (either trunk or 1.8).
> I modified your test case to make the call to the open panel take
> place in the main thread.  It seemed to work as expected for me in an
> up-to-date trunk CCL.
> ;; modified to use gui:execute-in-gui
> (defun the-amazing-memory-surge ()
>   (dotimes (i 100)
>     (gui:execute-in-gui #'(lambda ()
>                             (choose-file-dialog2)))
>     (ccl::process-run-function "pretend to load project" #'(lambda ()))))
> (defun choose-file-dialog2 ()
>   ;; 100% kosher: retain, no use of deprecated calls
>   (let ((panel (#/retain (#/openPanel ns:ns-open-panel))))
>     (#/runModal panel)
>     (#/release panel)))
> (defun the-amazing-memory-surge ()
>   (dotimes (i 100)
>     (ccl::with-autorelease-pool
>       (choose-file-dialog2)
>       (ccl::process-run-function "pretend to load project"
>                                  #'(lambda ())))))
> ;; this will pop up a file chooser a number of times. Each
> ;; time just press ESC and watch the Activity Monitor.
> ;; Set view > update frequency in Activity Monitor to very often
> ;; (0.5s) for best results
> ;; Watch out for Clozure CL % CPU and Real Mem
> ;; for some time Real Mem will go up gradually (memory leak)
> ;; then at some unpredictable time it will SURGE to GIGABYTES of
> ;; memory and ultimately crash CCL
> ;; with with-autorelease-pool CCL may crash quite quickly with an
> ;; Unhandled exception 10, comment out if needed
> ; (the-amazing-memory-surge)
> Prof. Alexander Repenning
> University of Colorado
> Computer Science Department
> Boulder, CO 80309-430
> vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf
