[Openmcl-devel] process-run-function and mach ports usage

Tue Feb 22 14:42:53 PST 2011

On Tue, 22 Feb 2011, Willem Rein Oudshoorn wrote:

> Gary Byers <gb at clozure.com> writes:
>
>> At this point, I'd probably say that it -looks- like there's a net loss
>> of ~1 port every time a thread is created and destroyed, but that isn't
>> entirely predictable.
>
> After a bit of experimentation, it seems that it always loses 1 port.
> (The first time it might not appear this way, because a garbage
> collection run immediately after booting the lisp image recovers some
> ports.)
>
> After some debugging, it turns out that the mach_thread port is not
> freed.  I think I know at least one reason why this is the case, but
> that can not be the whole story.  (The mach port business is all
> completely new for me, so it takes a bit of time figuring this out.)
>

The port that the lisp kernel generally refers to as "mach_thread"
is effectively a "task-wide" (OS process-wide) identifier for the thread.
It's not explicitly created by user code; it's created by the OS (I use
the term loosely ...) when the thread is created, and I'd naively
expect the port to be deallocated at some point after the thread
exits.  (I'd also find it believable that "at some point after" might
not be "immediately" and that the port might linger, sort of like a
listening TCP socket does.)  I'll check, but I don't think that it's
meaningful for a thread to destroy its own self port while exiting,
and the basis for my naive belief that it's the kernel's responsibility
to recycle these ports is that it just doesn't seem to scale well to
do this in some other user thread.  ("Remember those 10,000 threads you
created earlier ?  Some of them are probably dead by now.  Go harvest
their kernel ports.")

Actually, it looks (at a rough glance) like there's some code in the
pthreads library that tries to do something like that; in the version
of that code that I'm looking at, it's called _pthread_reap_threads()
and it's called (under some circumstances) when a thread exits.  That's
worth looking at further.  If that works as my reading of the code
suggests it should, then when a thread exits other threads are examined
and if their kernel ports are dead those ports are deallocated.  I don't
know if that reading's entirely correct, and I don't know why that isn't
having the intended effect if that is indeed the intended effect.

It's possible that something that CCL does inhibits port recycling by
Mach.  It's also possible that one would need to wait longer than we
have in order to see that recycling take place.  I don't know.

> The reason this matters:
>
>    (time (ccl:process-run-function "test" (lambda ())))
>
>    returns on my machine values around the following:
>
> During that period, 304 microseconds (0.000304 seconds) were spent in user mode
>                    325 microseconds (0.000325 seconds) were spent in system mode
>
>
> However after
>
>    (loop :repeat 10000 :do
>        (ccl:process-run-function "test" (lambda ())))
>
> doing
>
>    (time (ccl:process-run-function "test" (lambda ())))
>
>    returns values in the range of:
>
> During that period, 315 microseconds (0.000315 seconds) were spent in user mode
>                    528 microseconds (0.000528 seconds) were spent in system mode
>
>
> And it is getting progressively worse.  After about 150000 thread
> creations, the same function takes about 10ms.

I'd certainly agree that it's desirable for thread creation to be as
quick as possible and for it not to degrade over time.  It's possible
that the degradation that you see has something to do with port
leakage, but there are so many other things that can cause that sort
of thing that I'd be hesitant to conclude that there's a causal
relationship there.  Whether I need to or not, I also want to point
out that this can be hard to measure: all that we know for sure after
the loop above runs is that 10000 threads were created and have either
exited or are on their way towards exiting.  In order to exit, the
thread needs to run (get some CPU time), and if the number of runnable
threads exceeds the number of CPU cores ... well, it can take a while
for even the short life cycle of the threads in your example to
complete 10000 times.  All that we know for sure is that when the loop
above exits, it's been started 10000 times.

It's certainly possible (in fact, I think it's likely) that the degradation
you see happens even after everything's calmed down, but it can be hard to
measure this sort of thing and you have to be sure that you're measuring
what you think you are.
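
To make the measurement caveat concrete, here is a minimal sketch in C
(not CCL code; the names start_and_count, empty_body and
finished_threads are made up for illustration).  It starts n threads
with empty bodies, records how many had actually finished at the moment
the creation loop returned, and only then joins them.  The first number
can be anywhere between 0 and n, which is why timing something right
after the loop may not measure a quiescent system.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Number of threads that have run their (empty) body to completion. */
static atomic_int finished_threads;

static void *empty_body(void *arg)
{
    (void)arg;
    atomic_fetch_add(&finished_threads, 1);
    return NULL;
}

/*
 * Start n joinable threads.  *done_after_loop receives the number of
 * threads that had finished when the creation loop returned.  All
 * threads are then joined.  Returns the number finished after the
 * joins (n on success), or -1 if allocation or thread creation failed.
 */
int start_and_count(int n, int *done_after_loop)
{
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    int started = 0;
    int done;

    if (tids == NULL)
        return -1;
    atomic_store(&finished_threads, 0);

    for (; started < n; started++) {
        if (pthread_create(&tids[started], NULL, empty_body, NULL) != 0)
            break;
    }

    /* Snapshot: how many have finished as of "the loop has exited"? */
    *done_after_loop = atomic_load(&finished_threads);

    /* Only after joining do we know that every thread has exited. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    done = atomic_load(&finished_threads);
    free(tids);
    return started == n ? done : -1;
}
```

Calling start_and_count(10000, &early) and comparing early with the
return value gives a rough idea of how far behind the exiting threads
are when the loop finishes; any timing should happen after the joins.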

True as that might be, it's definitely the case that the number of Mach
ports that a task (Unix process) can reference is large but finite, and
my recollection is that there's a lot of performance degradation as this
limit is approached.
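
For watching whether the count actually grows, a rough sketch in C
(the helper name count_task_port_names is hypothetical; it assumes the
Mach call mach_port_names() is an acceptable way to enumerate the port
names held by the current task, and it simply returns -1 when built on
a non-Darwin system):

```c
#ifdef __APPLE__
#include <mach/mach.h>
#endif

/*
 * Return the number of Mach port names currently held by this task,
 * or -1 on error or when not built on Darwin.
 */
long count_task_port_names(void)
{
#ifdef __APPLE__
    mach_port_name_array_t names = NULL;
    mach_port_type_array_t types = NULL;
    mach_msg_type_number_t name_count = 0;
    mach_msg_type_number_t type_count = 0;

    if (mach_port_names(mach_task_self(), &names, &name_count,
                        &types, &type_count) != KERN_SUCCESS)
        return -1;

    /* The kernel allocates both arrays in our address space; free them. */
    vm_deallocate(mach_task_self(), (vm_address_t)names,
                  name_count * sizeof(*names));
    vm_deallocate(mach_task_self(), (vm_address_t)types,
                  type_count * sizeof(*types));

    return (long)name_count;
#else
    return -1;
#endif
}
```

Sampling this before and after a batch of thread creations (and after
the threads have had time to exit) should show whether the net change
is really ~1 port per thread.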

>
>>   I haven't looked at things over a long enough
>> period of time to have a sense of whether things are transient or whether
>> there's a true leak there.
>
> There is a true leak, and I will try to hunt it down.
>
>> (In the face of all this ignorance, I take comfort in the old adage that
>> says that "Mach sucks, but no one understands how.")
>
> I do not know Mach, so I cannot really comment on it.  However it is
> hard to find clear unambiguous documentation about the mach kernel
> in Mac OS X.

Amit Singh's book and website <http://osxbook.com/> deal with Mach
and other parts of OSX that generally aren't dealt with elsewhere.

>
> Wim Oudshoorn.
>
> P.S.:  sbcl also loses 1 port per thread creation.

I'm really skeptical that this has anything to do with user (non-OS-kernel)
code, but I don't know that with 100% certainty.

> P.S.:  Sorry for replying late, I am a bit swamped
>       at the moment and have quite a few time-consuming obligations
>       I cannot move around.
>

If creating threads in C via pthread_create() doesn't seem to have the
same problem, it'd be interesting to see whether creating a detached
thread (via pthread_attr_setdetachstate(...,PTHREAD_CREATE_DETACHED))
affects this.  At this point, I'm most suspicious of the pthread cleanup
code that -looks- like it should be deallocating the Mach ports of recently
exited threads, and it's plausible that thread creation options could affect
that (intentionally or otherwise.)
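
A minimal sketch of the detached-thread experiment in C (the names
run_detached_threads, detached_body, done_lock, done_cond and
done_count are made up for illustration; this is not what CCL does
internally).  Since detached threads can't be joined, each one bumps a
counter under a mutex so the caller can wait until every thread body
has run.  Note that a thread that has signalled may still be in the
last stages of exiting.

```c
#include <pthread.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done_count;

static void *detached_body(void *arg)
{
    (void)arg;
    /* Report completion; a detached thread cannot be joined. */
    pthread_mutex_lock(&done_lock);
    done_count++;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/*
 * Create n detached threads and wait until each has run its body.
 * Returns the number of threads actually started, or -1 if the
 * thread attributes could not be set up.
 */
int run_detached_threads(int n)
{
    pthread_attr_t attr;
    pthread_t tid;
    int started = 0;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }

    pthread_mutex_lock(&done_lock);
    done_count = 0;
    pthread_mutex_unlock(&done_lock);

    for (int i = 0; i < n; i++) {
        if (pthread_create(&tid, &attr, detached_body, NULL) != 0)
            break;
        started++;
    }
    pthread_attr_destroy(&attr);

    /* Wait for every started thread to report in. */
    pthread_mutex_lock(&done_lock);
    while (done_count < started)
        pthread_cond_wait(&done_cond, &done_lock);
    pthread_mutex_unlock(&done_lock);

    return started;
}
```

Running this with a large n, and then the same loop with default
(joinable) attributes plus pthread_join(), while watching the task's
port count, would show whether the detach state makes any difference.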