[Openmcl-devel] mach ports leaking?

Gary Byers gb at clozure.com
Tue Feb 3 14:01:02 PST 2009



On Tue, 3 Feb 2009, David Reitter wrote:

> On 3 Feb 2009, at 02:26, Gary Byers wrote:
>
>> I don't know what your program's doing, so I can't really guess why top
>> claims that it's using as many Mach ports as it is.
>
> Most of the code that I'm running there isn't mine, so I can't tell for sure. 
> But asking the author (and doing a "grep") revealed that no threads seem to 
> be used.
>
>> I'm sure that there are lots of ways in which ports are used in and 
>> allocated
>> by the Mach kernel (possibly by the memory system, I/O, other forms of IPC,
>> whatever ...).  I don't know exactly what it would mean for an application
>> to "leak" ports that it doesn't explicitly create.
>
> After running this overnight, it does not appear that the ports are cleaned 
> up.  I have 39M and 354M ports, respectively, and, what's worse, they seem to 
> be correlated with memory usage.
>
>   PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
> 30211 dx86cl64     0.0%  2:03:17   3     0    419   39M   188K    40M   470M
> 14994 dx86cl64     0.0%  8:21:10   3     0    473  354M   188K   355M   625M

I don't think that it's too likely that either of these processes is
using 0 Mach ports.

>
>> Depending on how files are opened, they may have locks associated with
>> them, as can hash tables and other lisp data structures.  (Locks in
>> Darwin have semaphores - and therefore Mach ports - associated with
>> them.)  All of these things should get GCed eventually, unless
>> something takes unusual steps to keep this from happening (e.g., (PUSH
>> (MAKE-HASH-TABLE) *BIG-LIST-OF-HASH-TABLES*))
>
> lsof doesn't show excessive numbers of open files.  How do I show the heap 
> size?
> (ccl:gc) doesn't help.
>

CL:ROOM.

? (room)
Approximately 29,622,272 bytes of memory can be allocated 
before the next full GC is triggered.

                    Total Size             Free                 Used
Lisp Heap:       49414144 (48256K)   29622272 (28928K)   19791872 (19328K)

The most interesting number here is the "Total Size", which is "how much
memory the lisp process has obtained from the OS for current and near-future
needs."  When the "Free" part of that fills up (another 30M or so), a full
GC will run; depending on how much the GC is able to free, we might ask
the OS to increase the total size, might decrease the total size, or
leave it alone.
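
That trigger point is configurable; a quick sketch of the documented CCL
operators that control it (the sizes here are just illustrative):

? (ccl:lisp-heap-gc-threshold)                  ; bytes allocatable before the next full GC
? (ccl:set-lisp-heap-gc-threshold (ash 32 20))  ; ask for a 32MB threshold ...
? (ccl:use-lisp-heap-gc-threshold)              ; ... and resize the heap to honor it now
? (ccl:gc)                                      ; or simply force a full GC immediately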

I wish that I knew how to correlate that number (ROOM's notion of total
heap size) with anything that top spits out on OSX.   I don't, and whatever
I knew about top's output seems to have been invalidated by recent changes
in that output (sometime between the last time that I looked - 10.5.4 or
10.5.5 - and 10.5.6).

One way of getting numbers that are more likely to be meaningful on OSX
is to use 'vmmap', which will walk the allocated memory regions in a
process and summarize what it finds.  If 'pid' is the process id of a
CCL process, then

shell> vmmap -interleaved pid

will show detailed information about the memory regions in the process
whose id is 'pid' and generate a summary that looks like:

REGION TYPE             [ VIRTUAL]
===========             [ =======]
MALLOC                  [   9380K]
Mach message            [      8K]
STACK GUARD             [  512.0G]  <- address space that's reserved but not usable in its current state
Stack                   [   10.6M]
VM_ALLOCATE ?           [   53.6M]
__DATA                  [    860K]
__LINKEDIT              [   4460K]
__TEXT                  [   2192K]
mapped file             [   84.1M]
shared memory           [      4K]
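
If you'd rather capture that from inside the lisp, a minimal sketch
(assuming the Darwin interface database is loaded, so that #_getpid
resolves) is:

? (ccl:run-program "vmmap"
                   (list "-interleaved"
                         (format nil "~d" (#_getpid)))
                   :output t)  ; print vmmap's report on the current lisp process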

The total that vmmap shows as "VM_ALLOCATE" memory - 53M - is just
a little larger than what ROOM thinks the total size of the lisp heap
is.  (Most of the difference may be a few MB for lisp stacks that 
vmmap doesn't identify as stacks.)  I'd be a little surprised if 9+
MB has really been malloc'ed; 9380K might be the current size of
the malloc arena.  The total size of all things that the lisp runtime
has allocated via malloc() at this point in time is probably no
more than a few hundred KB, and I'd be surprised if the C runtime has
done much more.  (What's VM_ALLOCATEd and what's mapped to a file can
change over time, depending on how it's mapped.)

Ah - we can just ask the malloc zone directly:

? (rlet ((stats #>malloc_statistics_t))
   (#_malloc_zone_statistics (#_malloc_default_zone) stats)
   (values (pref stats #>malloc_statistics_t.size_allocated)
           (pref stats #>malloc_statistics_t.size_in_use)))
9601024 ; reserved for malloc's use
191568  ; actually allocated
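
If you want to watch those numbers over time, the same calls can be
wrapped up; a sketch under the same assumptions (the MALLOC-STATS name
is just illustrative):

? (defun malloc-stats ()
    "Return (values bytes-reserved-by-malloc bytes-actually-in-use)."
    (rlet ((stats #>malloc_statistics_t))
      (#_malloc_zone_statistics (#_malloc_default_zone) stats)
      (values (pref stats #>malloc_statistics_t.size_allocated)
              (pref stats #>malloc_statistics_t.size_in_use))))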



The detailed region descriptions that vmmap spits out can be interesting
in some cases, but the summary is often more interesting, and I find
these numbers more meaningful than what top spits out; they're produced
by actually looking at the memory regions in the process, whereas top's
numbers are produced by asking Mach for bookkeeping information.  The
fact that the numbers produced by pre-10.5.6 versions of top differ from
the current version's numbers suggests that things have changed in Mach's
accounting code at least, but I'm not sure if it's getting better or
worse or how you'd tell.

If the code that you're running is consuming a lot of memory resources
and you don't understand how or why, the first step would seem to be
to try to understand what kind of resources are being consumed.  If
the total lisp heap size (as reported by ROOM) is a large percentage
of what vmmap identifies as VM_ALLOCATEd memory, then a reasonable
interpretation is that "most of the memory you're using is lisp data";
if there's lots of other stuff (MALLOC regions or a big discrepancy
between lisp heap size and VM_ALLOCATE totals), then a reasonable
interpretation might be that foreign code is allocating lots of
data and possibly neglecting to deallocate it.
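
With the numbers above, for instance: ROOM reports a total lisp heap of
about 48MB and vmmap attributes about 54MB to VM_ALLOCATE regions, so
essentially all of that memory is lisp data.  If a process with a 355M
RSIZE showed the same 48MB lisp heap next to hundreds of MB of MALLOC
or VM_ALLOCATE space, foreign allocations would be the prime suspect.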

If most of the memory that you're using seems to be lisp data,
then the functionality described in

<http://trac.clozure.com/openmcl/wiki/HeapUtilization>

can summarize the numbers and sizes of each type of primitive lisp
object in the heap.
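
(If your build is recent enough to have it, the entry point that page
describes should be callable as something like:

? (ccl:heap-utilization)  ; print counts and total/average sizes per object type

but older releases may not include it.)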

If some lisp object that you think should be getting GCed isn't
getting GCed, that's almost always because some other reachable
object is referencing it.  Bill St Clair wrote some tools to
try to identify what things directly or transitively reference
what other things; their use is described in:

<http://trac.clozure.com/openmcl/wiki/MemoryLeaks>

That code was originally written to work on a customer branch; it's
been ported to the trunk, but I don't think that it's in 1.2.  (The
trunk version should work in 1.2.)  Bill's tools also contain functions
for identifying malloc-related leaks, but I think that those functions
are Linux-specific.  (I'm not sure that you can introspect malloc
behavior in other implementations, and if you can you'd likely have
to do it very differently.)

Apple does have tools that can be used to analyze malloc behavior,
and at least some of them work and are useful.  Setting some environment
variables (described in the malloc man page) can cause information
about malloc-related calls to be logged to a file; the malloc_history
program can read these logfiles and report information about the C call
history as of the time that each call to malloc/free/etc. was made.
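
For example (the environment variable is documented in the malloc man
page; the launch command and the pid/address arguments here are only
illustrative):

shell> MallocStackLogging=1 dx86cl64   # run the lisp with malloc call logging enabled
shell> malloc_history pid address      # later: show the allocation history for 'address'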



