[Openmcl-devel] process-run-function and mach ports usage

Sat Mar 5 01:27:29 PST 2011

I somehow missed this message; sorry for not responding earlier.

Both the change that you propose here (decrementing the reference
count on the thread's kernel port's send right in
catch_exception_raise()) and the change that was committed to the
trunk a few weeks ago (having the thread ensure that that reference
count is no greater than 1 when it exits) seem to have the same effect
in practice on the original bug.  (If the thread exits when that
reference count is >1, the port is never destroyed and the
task/process's port count increases, eventually hurting
performance.)  I considered both approaches and favored the
one that I took, but there are enough practical (and aesthetic, as in
"which of these things is uglier?") concerns that there doesn't seem to
be a clear winner.
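
As a rough illustration of the bookkeeping described above, here is a toy
model in C.  It makes no Mach calls: a plain integer stands in for the
reference count on the thread port's send right, and every name in it
(toy_thread_port, toy_leaks, the policy numbers) is hypothetical.  It only
shows why either fix lets the count reach zero at thread exit.

```c
#include <assert.h>

/* Toy model only: a counter standing in for the reference count on a
   thread port's send right.  Nothing here touches the real Mach API. */
typedef struct {
    int send_refs;   /* references the task holds on the send right */
    int destroyed;   /* set once the last reference is dropped */
} toy_thread_port;

/* policy: 0 = no fix,
           1 = release the extra reference in the exception handler,
           2 = clamp the count to 1 just before the thread exits.
   Returns 1 if the port outlives the thread (a leak), 0 otherwise. */
int toy_leaks(int n_exceptions, int policy)
{
    toy_thread_port p = { 1, 0 };   /* one reference from thread creation */
    assert(n_exceptions >= 0);

    for (int i = 0; i < n_exceptions; i++) {
        p.send_refs++;              /* exception message carries the port */
        if (policy == 1)
            p.send_refs--;          /* handler gives the reference back */
    }

    if (policy == 2 && p.send_refs > 1)
        p.send_refs = 1;            /* exit-time cleanup */

    if (--p.send_refs == 0)         /* thread exit drops one reference */
        p.destroyed = 1;

    return !p.destroyed;
}
```

With no fix, toy_leaks(5, 0) reports a leak; with either policy 1 or
policy 2 the count reaches zero at exit and the port is destroyed.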

Aside from fairly strong aesthetic objections (there should be penalties
for using the words "Mach" and "aesthetic" in the same sentence), one of
the concerns that I had about changing catch_exception_raise() is that
doing so would add a little bit more complexity to some code that's
already pretty complicated.  I think that that's a valid concern, but
there's a related issue that's at least as important.

A few weeks ago, I was working on getting CCL and Apple's CHUD/Shark
metering tools to work together (again); the test case that I was
using was (DOTIMES (I 1000) (FACT 1000)).  As one would expect, that
mostly exercises bignum (or bignum X fixnum) multiplication and memory
allocation/GC, and at first I didn't see anything surprising in
Shark's output.

On second glance, I noticed that something like 14% of the total execution
time was being spent in the exception thread (the thread that waits for
and processes exception messages from other threads), and a lot of that
thread's time was spent in the OS kernel.  There are a few things going
on (general exception-processing latency, a few context switches between
threads) that make things a little hard to measure, but it doesn't seem
at all right that the exception thread should be doing about 1/6 as much
work as a thread that's doing a mixture of bignum arithmetic and GC.

Much of the work that that exception thread does involves receiving and
replying to exception messages; that seems to be an unavoidable cost of
doing business this way.  The rest of that work involves using the thread
port (ahem ... the send right associated with that port) to get and set
the thread's state (register contents).  We need some of that information
in order to process the exception, and we wind up making a few calls to
thread_get_state() and at least one call to thread_set_state() each time
catch_exception_raise() is called.  Getting/setting the state of a thread
involves Mach messaging operations to the thread's kernel port, and these
operations have a (seemingly well-deserved) reputation for being extremely
slow.  It seems clear that reducing the number of these messaging operations
per exception would likely reduce the time spent in the exception thread.

Aside from whatever concerns I have about introducing additional complexity,
it's worth noting that things like mach_port_deallocate() also involve
messaging (to the task_self port in that case), so it'd be good to avoid
introducing that operation if we can.

One way to avoid the issue that caused us to leak ports in the
first place and to reduce the number of messaging operations is to
arrange to use catch_exception_raise_state() instead of
catch_exception_raise(); this likely eliminates at least one call to
each of thread_get_state() and thread_set_state() per exception and
(since the kernel doesn't pass us the thread's kernel port or
increment its send right count when catch_exception_raise_state() is
used) we don't have to worry about that port leaking.  (We already
have a send-right-enabled reference to the thread's kernel port that
was established on thread creation, and can use that to obtain
additional "flavors" of thread state, should we need to do that.)  In
the most general case, handling an exception means basically what it means
now (arranging that the thread will resume execution in an exception handling
function and possibly signaling a lisp error from that handler), but in some
cases it may be possible and desirable to handle the exception in the exception
thread; using catch_exception_raise_state() might make those cases easier
to recognize.
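
To make the trade-off above concrete, here is a toy cost model in C that
counts messaging operations per exception.  It is a sketch under stated
assumptions, not a measurement: each get/set/deallocate call is counted as
one operation, the receive and reply are counted as two, and the figure of
three thread_get_state() calls used in the note below is only a guess at
what "a few" means.  All toy_* names are hypothetical.

```c
#include <assert.h>

/* Toy cost model: how many messaging operations one exception costs. */
typedef struct {
    int get_state_calls;    /* thread_get_state() calls in the handler */
    int set_state_calls;    /* thread_set_state() calls in the handler */
    int deallocate_calls;   /* mach_port_deallocate() calls, if any */
} toy_handler_profile;

/* Receiving the exception message and replying to it are always paid. */
enum { TOY_BASE_MESSAGES = 2 };

int toy_messages_per_exception(toy_handler_profile h)
{
    return TOY_BASE_MESSAGES + h.get_state_calls
         + h.set_state_calls + h.deallocate_calls;
}

/* Handler in the style of catch_exception_raise(): it is handed the
   thread port and must fetch and store the thread state itself. */
toy_handler_profile toy_raise_profile(int get_calls, int release_port)
{
    toy_handler_profile h = { get_calls, 1, release_port ? 1 : 0 };
    assert(get_calls >= 0);
    return h;
}

/* Handler in the style of catch_exception_raise_state(): one flavor of
   state arrives in the message and goes back in the reply, so one get
   and the set disappear, and no port needs to be released. */
toy_handler_profile toy_raise_state_profile(int get_calls)
{
    toy_handler_profile h = { get_calls > 0 ? get_calls - 1 : 0, 0, 0 };
    assert(get_calls >= 0);
    return h;
}
```

Under these assumptions, three thread_get_state() calls cost 6 operations
per exception today, 7 if a mach_port_deallocate() is added to the handler,
and 4 with the state-carrying variant.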

I was originally (Way Back When) reluctant to use catch_exception_raise_state():
there are GC invariants that say that the thread that invokes
the GC has to be able to see the state of all other threads (since it has to
find references to lisp objects that may be contained in machine registers.)
Using catch_exception_raise_state() doesn't change that (and doesn't change how
that invariant is maintained).  I've recognized that for a long time but hadn't
seen a compelling reason to change; I'm pretty sure that the reasons outlined
above are either compelling or very, very close to it.

So: unless some currently-unknown problem with the trunk's current
workaround becomes apparent, I'm inclined to stick with it in the short term.
In the slightly longer term, I think that there's probably a way to both
avoid the port-reference-leakage issue and reduce exception processing overhead,
and I haven't yet thought of a reason not to do that.

On Mon, 28 Feb 2011, Willem Rein Oudshoorn wrote:

> Gary Byers <gb at clozure.com> writes:
>
>> On Wed, 23 Feb 2011, Wim Oudshoorn wrote:
>>
>>>> I was going to write a longer reply, but I'd already spent a few hours
>>>> looking into this today (seeing some of the same things that you saw
>>>> and reaching some different conclusions), and this is all starting to
>>>> seem too much like the sort of thing where it's said that "if you don't
>>>> stop doing it, you'll go blind."
>>>
>>> Yes, the mach_port mechanism in CCL seems to induce blindness.
>>> I stared at it for quite a while without ever really figuring out where
>>> the increased right count comes from.
>>
>> It (or at least a major source of it) doesn't come from CCL.
>>
>>
>> The message(s) sent to the various exception ports include the kernel
>> thread object (the port); naturally, the kernel conflates the ideas of
>> "referencing that object" with the idea of "retaining that object" in
>> this context just as it does in other contexts. It never releases that
>> reference (presumably since doing so would make at least some sense.)
>
> Ah, this makes perfect sense.  I have been reading up on the mach
> architecture and it slowly starts to click.
>
> The receiver gets a message describing the exception.
> In the message some port rights are sent.  The receiver receives
> these rights and the rights are now his to do with what he pleases.
> One consequence is that if the receiver does not deallocate these
> rights, they will linger on.
>
> [I read a text somewhere in the mach documentation but it did not click
> before.  The text stated something like that if you received a right,
> you have to deallocate it.]
>
> I can see the rationale:
>
> 1 - Not all messages need a return message, so the sender
>    can not dealloc the rights
>
> 2 - The sender can live in a different namespace, so
>    when the message is transported a new right (name) is created
>    in the receivers namespace.  This right belongs in the receiver
>    namespace and as such the sender can not touch it.
>
> 2a - You could argue that the kernel, who does the transformation
>     in step 2 should deallocate the rights.  But the kernel
>     still has problem 1.
>
> 3 - To make it consistent, have the policy that all rights
>    that are sent to a receiver are the responsibility of the receiver.
>
>
> Note: the mach_... calls are thin autogenerated wrappers on the
>      actual mach msg send functionality. These wrappers are quite
>      thin and do not alter the semantics of the content of the messages.
>
>
>> I checked this in to the trunk about 10 hours ago.  It seems to fix
>> the port leakage and I haven't seen evidence of any new problems
>> having been introduced, but this stuff is complicated enough that it's
>> hard to say that for sure.
>
>
> I have an alternative fix, which does all the deallocates in the
> appropriate places.  It works for me and all reference counts are
> in my case low and do not increase.  Also no more dead ports lingering
> around.
>
> I can make this into a patch for you if you tell me what you would need.
> I assume that it should be based upon the most recent development
> version.  But in what format would you prefer it and where do I send it.
> I will need some time to get it complete.  That is, the PowerPC
> version needs to be made etc.
> Let me know if you are interested, otherwise I will not waste time
> on it anymore.
>
> Kind regards,
> Wim Oudshoorn.