[Openmcl-devel] Modal dialog problems with CCL 1.9 32/64 on Mountain Lion

Alexander Repenning alexander.repenning at Colorado.EDU
Thu Aug 30 20:41:06 UTC 2012


I am sorry if you feel I have wasted your or Matt's time. I have always appreciated your support and, beyond that, provided some actual funding to Clozure Associates. Moreover, a couple of emails ago I also offered potential financial support if you can help us with this problem. I still think this is a pretty general CCL issue which sooner or later would manifest itself in most GUI-based CCL applications trying to run on Mountain Lion. 

I do apologize for having created confusion with a version number (this was the result of an unnoticed build error with the latest build), but we are trying to help you. We have created smaller and smaller test cases to rule out any leftover side effects and have run them on a large number of machines and CCL versions. I am sure you have spent a good deal of time on this but I am also convinced we have spent even more time. You know, we are pretty frustrated here as well. We are sitting in the same boat, trying to make things work.

Looks like the main issue is that you either think this is not a general issue with CCL on Mountain Lion or simply hope Mountain Lion is going away. I am not a great fan of Mountain Lion myself.  However, as tool builders I believe we have the responsibility to build tools for the platforms used by our users. I don't think hoping Mountain Lion will just go away is a very productive approach.  Matthew's efforts are not wasted. I can crash CCL 1.9-dev-r15450-trunk 32 and 64 bit with and without his mods (he added gui:execute-in-gui). However, with gui:execute-in-gui I have to select files to make the crash happen. 

At any rate, I think CCL and Mountain Lion are not playing together well. What do you want me to do to get things fixed: mail big check, submit more test cases, try with more tools, send Apple motivational speaker...?

Alex






_________

;; CCL 1.9-dev-r15450-trunk (32/64) crash on Mountain Lion 10.8.1


(defun choose-file-dialog2 ()
  ;; 100% kosher: retain, no use of deprecated calls
  (let ((panel (#/retain (#/openPanel ns:ns-open-panel))))
    (#/runModal panel)
    (#/release panel)))


(defun THE-AMAZING-MEMORY-SURGE ()
  (dotimes (i 100)
    (gui:execute-in-gui #'(lambda ()  ;; with this wrapper you get a spinning progress indicator and folder contents are not shown
                            (choose-file-dialog2)))
    (ccl::process-run-function "pretend to load project"  #'(lambda ()))))


;; This will pop up a file chooser a number of times. Select a file each time and watch the Activity Monitor.
;; Set View > Update Frequency in Activity Monitor to fast (0.5s) for best results.
;; Watch out for Clozure CL % CPU and Real Mem.
;; For some time Real Mem will go up gradually (memory leak), then at some unpredictable time it will SURGE to GIGABYTES of memory
;; and ultimately crash CCL.
;; With with-autorelease-pool CCL may crash quite quickly with an Unhandled exception 10; comment it out if needed.

; (the-amazing-memory-surge)


___________________


On Aug 30, 2012, at 8:45 AM, Gary Byers wrote:

> 
> 
> On Wed, 29 Aug 2012, Alexander Repenning wrote:
> 
>> On Aug 29, 2012, at 6:00 PM, R. Matthew Emerson wrote:
>> 
>>      On Aug 29, 2012, at 7:04 PM, Alexander Repenning
>>      <Alexander.Repenning at colorado.edu> wrote:
>> 
>>      I think I have here a pretty Kosher (uses retain, does not
>>      use deprecated functions) version of the dialog + memory
>>      surge problem. This version of choose-file-dialog is
>>      completely stripped of any non-essential activity. It does
>>      not even return the path, i.e., there is no practical
>>      value to this function.
>> Please have a go and see if you can or cannot experience that
>> Memory surge phenomenon. Please follow the instructions closely.
>> Otherwise you may miss the issues, which at times can be kind of
>> subtle.
>> 
>>      ;; CCL 1.8.1 64 (Mac App store) crash on Mountain Lion 10.8.1
>> The version of CCL in the Mac App Store still contains a bug in it
>> that Mountain Lion triggers.  I am pretty sure that your test case is
>> running into that bug.
>> 
> 
> At the very least, the way that you presented your test case triggers
> a known bug in that version of CCL and says nothing about whether or
> not some other bug remains.
> 
>> I think we are going around in circles.
> 
> I would have used a harsher term for the above, but OK: continually
> using a 64-bit version of CCL that's known to have a bug which has
> similar symptoms is "going around in circles."  You may actually
> be going through some careful and controlled testing procedure and
> just spacing out and clouding the issue like this when you report
> your findings, but this wastes time and makes it harder than it
> should be to take you seriously.  (This is not the first time that
> you've done this.)
> 
>> While that bug does sound similar in
>> spirit I am quite sure it's not the one because:
>> 1) the error also manifests itself in the 32 bit version of CCL
> 
> Matt wouldn't have wasted his time if the message he replied to had
> said so clearly.
> 
> I'm honestly not trying to be dismissive or sarcastic here.  This
> stuff is complicated, and it's important to be as precise as possible
> when discussing it (much more precise than one might be in casual
> conversation.)  That may take extra effort, but the alternative seems
> unacceptable to me.
> 
> 
>> 2) We tried the most recent version of CCL, 1.9-dev-r15450-trunk
>> (DarwinX8632)! and the 64 bit version. Both the test case and the full app
>> crashed with the same Memory surge.
>> 
>> When I try your modified version (which we actually did try before as well)
>> things APPEAR to be better as long as you just dismiss the dialog with ESC.
>> However, the reason for that appears to be that the file chooser dialog then
>> goes into this super slow, spinning indeterminate progress indicator mode
>> where it does not list the contents of the folder. That is interesting. However, if
>> you actually try to select a file, instead of pressing ESC, then there is a
>> good chance it will crash even faster than before. I never made it beyond
>> the first attempt. Can you confirm?
>> 
> 
> I've seen a spinning progress indicator (something that would have been a
> beachball cursor a few OS revisions ago, and generally indicates that progress
> isn't being made) appear in the lower left corner of the open panel.  I wasn't
> paying close attention to when this did and did not appear, but my impression
> is that there was some correlation between that cursor spinning around
> and some kinds of misbehavior (e.g., "empty" or largely empty panel views that
> shouldn't be empty.)
> 
> I don't think that I've seen this since trying to use a CHOOSE-FILE-DIALOG
> implementation that (at a minimum) retained the panel before it was used
> and released it afterwards.  I'm 100% sure that I haven't seen excessive
> memory or CPU utilization, but I've only seen that once (and only while
> running your application.)
> 
> 
>> 3) An even simpler test just starting 1000 processes, one after the other,
>> does not exhibit the problem (32 bit).
> 
> Note that a thread/process in 32-bit CCL needs about 2.5MB of foreign memory
> just for its stacks; it also uses other finite resources (semaphores, message
> ports, etc.)  You can't have 1000 runnable threads in 32-bit CCL because the
> ~2.5GB of foreign memory isn't available; the only way that a loop that calls
> PROCESS-RUN-FUNCTION 1000 times can run to completion is if some older threads
> exit before newer ones are created, and the only way that happens is if those
> threads run to completion (and the more threads are created and competing for
> CPU time and other resources, the less deterministic that is.)
> 
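
For reference, the stack arithmetic in the paragraph above can be checked with a few lines of portable Common Lisp. This is only a back-of-the-envelope sketch: the 2.5 MB per-thread figure is the one quoted above, while the names *STACK-MB-PER-THREAD*, STACK-MEMORY-MB and MAX-LIVE-THREADS, and the 1024 MB budget in the last call, are made up for illustration. It counts stacks only and ignores the semaphores, message ports and other resources mentioned above.

```lisp
;; Roughly 2.5 MB of foreign memory per thread for stacks (figure quoted above).
;; Kept as the rational 5/2 so the arithmetic stays exact.
(defparameter *stack-mb-per-thread* 5/2)

(defun stack-memory-mb (thread-count)
  "Approximate foreign memory, in MB, needed for the stacks of THREAD-COUNT live threads."
  (* thread-count *stack-mb-per-thread*))

(defun max-live-threads (available-mb)
  "How many thread stacks fit in AVAILABLE-MB of foreign memory (hypothetical budget)."
  (floor available-mb *stack-mb-per-thread*))

(stack-memory-mb 1000)   ; => 2500, i.e. about 2.5 GB
(max-live-threads 1024)  ; => 409 (primary value), assuming a 1 GB budget
```
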
> In practice, I'd expect something like:
> 
> (dotimes (i 1000) (process-run-function "nothing" (lambda ())))
> 
> to use a lot of CPU but probably not exhaust virtual memory (simply because
> all of the CPU contention keeps the thread running that loop from running
> very often.)  This isn't guaranteed, and it's such a ridiculous thing to do
> that I'd find it difficult to get too worked up about things if it didn't.
> 
> If we put a delay in that loop:
> 
> (dotimes (i 1000)
>  (choose-file-dialog)
>  (process-run-function "nothing" (lambda ())))
> 
> we're probably effectively serializing thread creation, but we're also affecting
> the environment in which CHOOSE-FILE-DIALOG runs (competing for some of the
> same OS resources that the thread is competing for.)  I'd expect that to work
> as well, but of course it's also a ridiculous thing to do and it's hard to care
> too much about whether it does or not.
> 
> Also ridiculous (in different ways) is:
> 
> (dotimes (i 1000)
>  (choose-file-dialog))
> 
> where CHOOSE-FILE-DIALOG retains the open-panel appropriately.  I haven't
> seen that fail, but I haven't counted to 1000 either.
> 
> Apple distributes a command-line tool called "heap", which will scan the
> (malloc'ed) heap zones of a specified process and (among other things)
> identify the number and sizes of the ObjC instances (the program calls them
> "classes" ...) it finds there.  The ObjC heap shouldn't change significantly
> on each iteration and (when not answering email or trying to do work that
> we actually get paid for) I've been trying to verify that it doesn't.
> 
>> In summary I think we still don't know what is going on but I don't think it
>> is connected to that specific malloc issue you mention.
> 
> No one else thinks so either (at least not directly), but if you say
> "running a test in a version of CCL affected by that issue says something
> useful" a lot of time gets wasted by Matt or me saying "no it doesn't" and
> then by you saying "and by 'version of CCL affected by that issue', I mean
> 'other versions'".  I am emphatically in favor of streamlining this process.
> 
>> Need to run but I do remember reading that in sandbox mode (we are NOT
>> running in sandbox, or are we?) file choosers are NOT running in the main
>> thread but in some other special thread.
> 
> They actually run in another OS-level process, which means that the internals
> of NSOpenPanel are probably a lot different than they used to be (and may
> change again if and when the whole "sandboxing" issue goes away.)
> 
>> hmmm....
>> puzzled, Alex
>> 
> 
> I'm a little less puzzled than I had been.  The implementation of
> CHOOSE-FILE-DIALOG (actually of GUI::COCOA-CHOOSE-FILE-DIALOG) was clearly
> wrong and could result in methods being invoked on freed objects (which
> in turn could cause malloc heap corruption that snowballs into further
> malloc heap corruption ...)  Modifying the state of a freed object could
> involve modifying free memory (usually relatively harmless) or modifying
> the state of some other object that's subsequently allocated at the freed
> address (usually relatively harmful.)  I don't know for sure that any of
> this leads to the memory/CPU problems that you've seen, but it's certainly
> plausible that it could.   I'm fairly certain that it could lead to the
> cosmetic problems (apparently empty directories and other display problems).
> 
> The problem described in ticket 1005 caused almost exactly the same symptoms.
> Recall that that involved allocating a per-thread data structure whose address
> could be used to identify a Mach message port; traditionally (I don't know
> if this has changed), a Mach message port identifier has been a 32-bit value.
> So, the code that allocated that data structure was:
> 
> (let* ((free-later ())
>       (p nil))
>  (loop
>    (setq p (#_malloc few-hundred-bytes))
>    (if (and (is-32-bit-pointer p)
>             (can-be-used-as-mach-port p)) ; not coincidentally also Mach port name
>      (return)
>      (push p free-later)))
>  ;; Free anything that failed the test above
>  (dolist (bad free-later) (#_free bad)))
> 
> That was changed: especially on Mountain Lion, #_malloc would often return
> pointers for which IS-32-BIT-POINTER wasn't true, so we use alternatives to
> #_malloc and #_free on 64-bit platforms (and IS-32-BIT-POINTER is therefore
> always true.)  The second test - that the pointer's address can be used as
> a Mach port name - is intended to catch conflicts with the (often essentially
> random) set of existing Mach port names.  The second test is implemented by
> trying to use P as a Mach port name and seeing if that fails; it could fail
> if P was coincidentally already a Mach port name, but might (I'd have to RTFM)
> also fail for other reasons (like "Mach is too damned busy now; try later." Yes,
> it's 2012.)  If the loop above was run while Mach was too damned busy, we'd keep
> _mallocing until we were out of memory, and the one time that I was able to
> get your application to fail malloc's heap was full.  (I don't know that this
> is what's happening, but it may be worth exploring.)
> 
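
To make the suspected failure mode concrete, here is a sketch of the same allocate-and-test loop with an upper bound on attempts, so that a test which keeps failing cannot consume memory without limit. This is not CCL's actual code and not a proposed patch: ALLOCATE-WITH-RETRY-LIMIT, its three function arguments and the MAX-ATTEMPTS default are hypothetical, and the allocator is passed in as a function so the sketch runs in any Common Lisp without #_malloc or Mach.

```lisp
(defun allocate-with-retry-limit (allocate acceptable-p release
                                  &key (max-attempts 100))
  "Call ALLOCATE until ACCEPTABLE-P accepts the result or MAX-ATTEMPTS is reached.
Rejected allocations are always passed to RELEASE, whether we return or signal."
  (let ((rejected '()))
    (unwind-protect
         (loop repeat max-attempts
               do (let ((p (funcall allocate)))
                    (if (funcall acceptable-p p)
                        (return p)
                        (push p rejected)))
               finally (error "No acceptable allocation after ~D attempts"
                              max-attempts))
      ;; Free anything that failed the test above.
      (mapc release rejected))))

;; Stand-in allocator: hands out 1, 2, 3, ...; only values above 3 are "acceptable".
(let ((next 0) (freed '()))
  (list (allocate-with-retry-limit (lambda () (incf next))
                                   (lambda (p) (> p 3))
                                   (lambda (p) (push p freed)))
        (sort freed #'<)))
;; => (4 (1 2 3))
```

With a bound like this, a test that fails every time ends in an error after MAX-ATTEMPTS tries instead of in an exhausted heap. Whether that matches what actually happens on Mountain Lion is exactly the open question above.
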
> To state the obvious: Matt and I (and anyone else at Clozure who could
> look at this) have other work that customers are paying us to do and
> that other work has to take priority over this; if you think
> otherwise, you are quite simply wrong and that isn't open to
> discussion (with me, my partners, or anyone else *).  If you're just
> getting someone's spare time to look at this, that seems like an even
> stronger reason to not waste that time with things like "running this
> test in a version of CCL known to exhibit these symptoms for other
> reasons exhibits these symptoms."
> 
> ------
> [*] I used to deal with a company whose slogan was "We cheat the other
> guy, and pass the savings on to you!".  Sadly, they aren't in business
> anymore ...
> 
>> Gary mentioned this bug in a previous message:
>> - there's a known bug in 64-bit CCL on OSX that can cause lisp thread creation
>>   to go into a horrible CPU-burning/memory-thrashing state.  I think that that
>>   bug's been present for a long time (since PPC64 days), but it's apparently
>>   much easier to trigger on 10.8 (and/or recent versions of CCL) than it has been.
>>   The problem ultimately has to do with whether or not #_malloc (actually #_calloc)
>>   returns a 64-bit pointer whose high 32 bits are 0 and there can be many factors
>>   that affect that (many of them subtle), and the fix is to stop assuming that
>>   it does and allocate such pointers ourselves.
>>   That's been fixed (in the trunk for a few weeks and in the 1.8 tree
>>   for a few days) in svn; the symptoms happen to be very similar to
>>   what people have reported seeing with CHOOSE-FILE-DIALOG, but the
>>   CHOOSE-FILE-DIALOG problems seem to occur for at least some people
>>   in 32-bit CCL (which was never affected by this thread-creation
>>   problem) and in freshly-updated 64-bit versions.
>> The fix for this bug is not yet in the Mac App Store version of CCL.
>> I'll try to update the Mac App Store version soon, but in the
>> meantime, please try using up-to-date CCL obtained via Subversion
>> (either trunk or 1.8).
>> I modified your test case to make the call to the open panel take
>> place in the main thread.  It seemed to work as expected for me in an
>> up-to-date trunk CCL.
>> ;; modified to use gui:execute-in-gui
>> (defun THE-AMAZING-MEMORY-SURGE ()
>>   (dotimes (i 100)
>>     (gui:execute-in-gui #'(lambda ()
>>                             (choose-file-dialog2)))
>>     (ccl::process-run-function "pretend to load project"  #'(lambda ()))))
>> 
>> (defun choose-file-dialog2 ()
>>   ;; 100% kosher: retain, no use of deprecated calls
>>   (let ((panel (#/retain (#/openPanel ns:ns-open-panel))))
>>     (#/runModal panel)
>>     (#/release panel)))
>> 
>> (defun THE-AMAZING-MEMORY-SURGE ()
>>   (dotimes (i 100)
>>     (ccl::with-autorelease-pool
>>         (choose-file-dialog2)
>>       (ccl::process-run-function "pretend to load project" #'(lambda () )))))
>> 
>> ;; This will pop up a file chooser a number of times. Each time just press ESC and watch the Activity Monitor.
>> ;; Set View > Update Frequency in Activity Monitor to very often (0.5s) for best results.
>> ;; Watch out for Clozure CL % CPU and Real Mem.
>> ;; For some time Real Mem will go up gradually (memory leak), then at some unpredictable time
>> ;; it will SURGE to GIGABYTES of memory and ultimately crash CCL.
>> ;; With with-autorelease-pool CCL may crash quite quickly with an Unhandled exception 10; comment it out if needed.
>> 
>> ; (the-amazing-memory-surge)
>> Prof. Alexander Repenning
>> University of Colorado
>> Computer Science Department
>> Boulder, CO 80309-430
>> vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf
>> 

Prof. Alexander Repenning

University of Colorado
Computer Science Department
Boulder, CO 80309-430

vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf

