[Openmcl-devel] process-run-function and mach ports usage

Tue Feb 22 22:50:19 PST 2011

Gary Byers <gb at clozure.com> writes:

> On Tue, 22 Feb 2011, Willem Rein Oudshoorn wrote:
>
>> Gary Byers <gb at clozure.com> writes:
>>
>>> At this point, I'd probably say that it -looks- like there's a net loss
>>> of ~1 port every time a thread is created and destroyed, but that isn't
>>> entirely predictable.
>>
>> After a bit of experimentation, it seems that it always loses 1 port.
>> (The first time it might not appear this way because doing a garbage
>> collect immediately after booting the lisp image it will recover some
>> ports already.)
>>
>> After some debugging, it turns out that the mach_thread port is not
>> freed.  I think I know at least one reason why this is the case, but
>> that can not be the whole story.  (The mach port business is all
>> completely new for me, so it takes a bit of time figuring this out.)
>>
>
> The port that the lisp kernel generally refers to as "mach_thread"
> is effectively a "task-wide" (OS process-wide) identifier for the thread.
> It's not explicitly created by user code; it's created by the OS (I use
> the term loosely ...) when the thread is created, and I'd naively
> expect the port to be deallocated at some point after the thread
> exits.

Yes, indeed, the kernel will deallocate the receive right of the port.
Normally this will make the port go away.  However, if the user program
has created send rights to the thread port, those send rights turn
into dead names: they do not disappear, but hang around in the task.

One way to create send rights to the port is by calling
mach_thread_self ().  So basically all calls to mach_thread_self
(and mach_task_self) should be balanced by a reference-decrementing
operation such as mach_port_deallocate.

This can easily be observed by running the attached C program 'tt'.
This program creates a thread with:

  pthread_attr_init (&attr);
  pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);

  int err = pthread_create (&tid, &attr, do_nothing, NULL);

and in the do_nothing call it does:

  pp = mach_thread_self ();

This call makes the thread port leak.
(If mach_port_deallocate (task_port, pp) is called later, the
port is freed immediately.)
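
As an illustration of the balancing rule above, here is a minimal sketch
of a helper that takes a reference on the calling thread's port and gives
it back before returning.  The function name use_thread_port_balanced is
hypothetical, and the non-Mach branch exists only so the sketch compiles
on other systems; it is not taken from CCL or from the attached program.

```c
#ifdef __APPLE__
#include <mach/mach.h>
#endif

/* Hypothetical helper: obtain the calling thread's port, use it, and
   release the reference again.  Returns 0 on success, -1 on failure. */
int
use_thread_port_balanced (void)
{
#ifdef __APPLE__
  /* mach_thread_self () hands back a send right and adds one reference. */
  mach_port_t self = mach_thread_self ();

  /* ... use `self` here ... */

  /* Drop the reference taken above so the name can go away when the
     thread exits. */
  kern_return_t kr = mach_port_deallocate (mach_task_self (), self);
  return kr == KERN_SUCCESS ? 0 : -1;
#else
  /* No Mach ports on this platform, so there is nothing to balance. */
  return 0;
#endif
}
```

Calling this from a thread body in place of a bare mach_thread_self ()
should leave the thread's send-right count unchanged on return.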

>  (I'd also find it believable that "at some point after" might
> not be "immediately" and that the port might linger, sort of like a
> listening TCP socket does.)  I'll check, but I don't think that it's
> meaningful for a thread to destroy its own self port while exiting,
> and the basis for my naive belief that it's the kernel's responsibility
> to recycle these ports is that it just doesn't seem to scale well to
> do this in some other user thread.  ("Remember those 10,000 threads you
> created earlier ?  Some of them are probably dead by now.  Go harvest
> their kernel ports.")

First, I agree, destroying the thread port is not the program's
responsibility.
However, correctly balancing the send count for the send rights
for the thread is the user's responsibility.

>
> Actually, it looks (at a rough glance) like there's some code in the
> pthreads library that tries to do something like that; in the version
> of that code that I'm looking at, it's called _pthread_reap_threads()
> and it's called (under some circumstances) when a thread exits.  That's
> worth looking at further.  If that works as my reading of the  code
> suggests it should, then when a thread exits other threads are examined
> and if their kernel ports are dead those ports are deallocated.  I don't
> know if that reading's entirely correct, and I don't know why that isn't
> having the intended effect if that is indeed the intended effect.

I haven't looked at the pthread library.  But at the moment, I don't
understand where the bug is.   It could be in the pthread library,
but running some different test programs written in C does not
show it yet.  As I mentioned before, I am quite convinced
that every call to mach_thread_self (and mach_task_self) should
be balanced by reference count decreasing operations.

However, a thread has a send_count of about 5:

* 1 for creating the low-level thread itself (the pthread library?)
* 1 created by mach_thread_self in CCL code
* 3 others
[This is from memory, I have experimented a bit so I could be one off.]

Now after exiting, the first one is dealt with correctly (pthread
library???).   The second is AFAICS a bug in CCL.  But fixing that
still leaves the remaining 3.  And at the moment I can't really
see who creates these send rights.   I suspect the swap exception ports
code, but I haven't checked this.
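
One way to watch the send count discussed above is to ask the kernel for
the user-reference count on the thread port.  The sketch below uses
mach_port_get_refs for that; the function name current_thread_send_refs
is hypothetical, and the value it reports includes the one reference
that the probe itself takes through mach_thread_self ().  On systems
without Mach it simply reports -1.

```c
#ifdef __APPLE__
#include <mach/mach.h>
#endif

/* Hypothetical probe: number of send-right references this task holds
   on the calling thread's port, or -1 if that cannot be determined. */
long
current_thread_send_refs (void)
{
#ifdef __APPLE__
  mach_port_t self = mach_thread_self ();   /* the probe's own +1 */
  mach_port_urefs_t refs = 0;
  kern_return_t kr = mach_port_get_refs (mach_task_self (), self,
                                         MACH_PORT_RIGHT_SEND, &refs);

  /* Give back the reference taken by the probe. */
  mach_port_deallocate (mach_task_self (), self);
  return kr == KERN_SUCCESS ? (long) refs : -1;
#else
  return -1;   /* not a Mach system */
#endif
}
```

Calling it at a few points in a thread's life (right after creation,
after the suspect code has run) may help narrow down which code adds
the extra references.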

> It's possible that something that CCL does inhibits port recycling by
> Mach.  It's also possible that one would need to wait longer than we
> have in order to see that recycling take place.  I don't know.

In my testing with C, the kernel deals with ports immediately.
I would be a bit skeptical that the kernel runs a regular 'port garbage
collect'.   I think that if user code does not manage the reference
count correctly, the port (most likely dead) will hang around forever.

>> The reason this matters:
>>
>>    (time (ccl:process-run-function "test" (lambda ())))
>>
>>    returns on my machine values around the following:
>>
>> During that period, 304 microseconds (0.000304 seconds) were spent in user mode
>>                    325 microseconds (0.000325 seconds) were spent in system mode
>>
>>
>> However after
>>
>>    (loop :repeat 10000 :do
>>        (ccl:process-run-function "test" (lambda ())))
>>
>> doing
>>
>>    (time (ccl:process-run-function "test" (lambda ())))
>>
>>    returns values in the range of:
>>
>> During that period, 315 microseconds (0.000315 seconds) were spent in user mode
>>                    528 microseconds (0.000528 seconds) were spent in system mode
>>
>>
>> And it is getting progressively worse.  After about 150000 thread
>> creations,  the same function takes about 10ms.
>
> I'd certainly agree that it's desirable for thread creation to be as
> quick as possible and for it not to degrade over time.  It's possible
> that the degradation that you see has something to do with port
> leakage, but there are so many other things that can cause that sort
> of thing that I'd be hesitant to conclude that there's a causal
> relationship there.  Whether I need to or not, I also want to point
> out that this can be hard to measure: all that we know for sure after
> the loop above runs is that 10000 threads were created and have either
> exited or are on their way towards exiting.  In order to exit, the
> thread needs to run (get some CPU time), and if the number of runnable
> threads exceeds the number of CPU cores ... well, it can take a while
> for even the short life cycle of the threads in your example to
> complete 10000 times.  All that we know for sure is that when the loop
> above exits, it's been started 10000 times.

Yes, I see your concern.  However, I still think it does degrade,
because:

1 - after running that loop I wait for quite a while
2 - the number of threads indicated by ps or top is back to the normal amount
    (indicating that at least all the mach threads are finished.)
3 - I run a few (gc) to try to recycle ports and that has worked
4 - The cpu usage is back to normal (low)

> True as that might be, it's definitely the case that the number of Mach
> ports that a task (Unix process) can reference is large but finite, and
> my recollection is that there's a lot of performance degradation as this
> limit is approached.

Well, in my experience the mach port usage goes up with threads
and never down again.
Also, once the number of mach ports in use exceeds roughly 100000,
the performance definitely degrades.
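
The port usage mentioned above can also be measured from inside the
process.  The following is a minimal sketch, assuming mach_port_names is
an acceptable way to enumerate the task's port names; the function name
count_port_names is hypothetical.  It reports the total number of names
and how many of them are dead names.  On systems without Mach it
returns -1.

```c
#include <stddef.h>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

/* Hypothetical helper: count the port names held by this task and how
   many of them are dead names.  Returns 0 on success, -1 otherwise. */
int
count_port_names (unsigned *total, unsigned *dead)
{
  *total = 0;
  *dead = 0;
#ifdef __APPLE__
  mach_port_name_array_t names = NULL;
  mach_port_type_array_t types = NULL;
  mach_msg_type_number_t n_names = 0, n_types = 0;
  mach_msg_type_number_t i;

  if (mach_port_names (mach_task_self (), &names, &n_names,
                       &types, &n_types) != KERN_SUCCESS)
    return -1;

  *total = n_names;
  for (i = 0; i < n_types; i++)
    if (types[i] & MACH_PORT_TYPE_DEAD_NAME)
      ++*dead;

  /* The kernel allocated both arrays in our address space; free them. */
  vm_deallocate (mach_task_self (), (vm_address_t) names,
                 n_names * sizeof (*names));
  vm_deallocate (mach_task_self (), (vm_address_t) types,
                 n_types * sizeof (*types));
  return 0;
#else
  return -1;   /* not a Mach system */
#endif
}
```

Sampling these two numbers before and after a batch of thread creations
would show whether names accumulate and whether the lingering ones are
dead.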

> Amit Singh's book and website <http://osxbook.com/> deal with Mach
> and other parts of OSX that generally aren't dealt with elsewhere.

I will look it up.

>>
>> Wim Oudshoorn.
>>
>> P.S.:  sbcl also loses 1 port per thread creation.
>
> I'm really skeptical that this has anything to do with user (non-OS-kernel)
> code, but I don't know that with 100% certainty.
>
> If creating threads in C via pthread_create() doesn't seem to have the
> same problem, it'd be interesting to see whether creating a detached
> thread (via pthread_attr_setdetachstate(...,PTHREAD_CREATE_DETACHED))
> affects this. 

Creating threads that are joinable will keep a mach port around until
the threads are actually joined.
Creating threads in a detached state will free the port immediately
after the thread finishes executing.
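
For reference, the two creation modes compared above differ only in the
thread attributes and in whether pthread_join is called.  This is a
minimal sketch using plain POSIX calls; the function names run_joinable
and run_detached are hypothetical, and do_nothing stands in for any
short thread body.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body used by both variants. */
static void *
do_nothing (void *ignored)
{
  (void) ignored;
  return NULL;
}

/* Joinable thread: its resources are held until pthread_join is called.
   Returns 0 on success or a pthread error code. */
int
run_joinable (void)
{
  pthread_t tid;
  int err = pthread_create (&tid, NULL, do_nothing, NULL);
  if (err)
    return err;
  return pthread_join (tid, NULL);
}

/* Detached thread: nobody joins it, and its resources are released
   when it finishes.  Returns 0 on success or a pthread error code. */
int
run_detached (void)
{
  pthread_t tid;
  pthread_attr_t attr;
  int err = pthread_attr_init (&attr);
  if (err)
    return err;
  err = pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
  if (!err)
    err = pthread_create (&tid, &attr, do_nothing, NULL);
  pthread_attr_destroy (&attr);
  return err;
}
```

Running each variant in a loop while watching the process's port usage
is one way to repeat the comparison.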

> At this point, I'm most suspicious of the pthread cleanup
> code that -looks- like it should be deallocating the Mach ports of recently
> exited threads, and it's plausible that thread creation options could affect
> that (intentionally or otherwise.)

My guess is that the pthread library needs to do this to get the pthread
semantics right on top of mach_threads, most likely for joining etc.

Wim Oudshoorn.

-------------- next part --------------
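#include <stdio.h>
#include <stdlib.h>
#include <string.h>      /* strerror */
#include <pthread.h>
#include <mach/mach.h>   /* mach_port_t, mach_thread_self, mach_port_deallocate */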

int count = 0;
mach_port_t pp;

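/* Thread body: takes a send right on its own thread port and does not
   release it, which keeps the port name alive after the thread exits. */
void *
do_nothing (void *ignored)
{
  ++ count;
  pp = mach_thread_self ();
  return NULL;
}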

int
main (int argn, char ** argv)
{
  int ret;
  pthread_t tid = 0;
  pthread_attr_t attr;

  printf ("A: %d\n", count);
  printf ("Type in value:\n");
  scanf ("%d", &ret);

  pthread_attr_init (&attr);
  pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);

  int err = pthread_create (&tid, &attr, do_nothing, NULL);
  if (err) {
    printf ("error A: [%d] [%s]\n", err, strerror (err));
  }
  printf ("B: %d\n", count);
  printf ("Type in value:\n");
  scanf ("%d", &ret);

  count = 0;
  mach_port_deallocate (mach_task_self (), pp);

  printf ("C: %d\n", count);
  printf ("Type in value:\n");
  scanf ("%d", &ret);
}