[Openmcl-devel] RUN-PROGRAM and memory overcommit

Fri Feb 7 05:53:31 PST 2014

On Thu, 2014-02-06 at 19:16 -0500, R. Matthew Emerson wrote:
> On Feb 6, 2014, at 6:36 PM, Ron Garret <ron at flownet.com> wrote:
> 
> > The other day I got an out-of-memory error on a Linux machine when
> calling RUN-PROGRAM from a CCL process that was consuming about 75% of
> the available RAM.  The machine does not have a swap partition,
> and /proc/sys/vm/overcommit_memory was set to 0.

One quick fix here is to enable memory overcommit, if it isn't enabled
already, or to increase the overcommit ratio. What are the values of
your /proc/sys/vm/overcommit_memory and /proc/sys/vm/overcommit_ratio?

> > According to:
> > 
> > http://stackoverflow.com/questions/12924048/fork-memory-allocation-behavior
> > 
> > this error could be avoided by spawning the child process with vfork instead of fork.
> > 
> > Before submitting a ticket I thought I would ask: is there a reason
> that run-program uses fork instead of vfork?
> 
> Speaking generally, vfork() and threads don't get along.  For that
> matter, fork() and threads don't get along that well either. But in
> the vfork() case they really don't get along.

The reason fork and threads don't play well together is that, upon
fork, only the calling thread is cloned. If the child then invokes libc
functions that try to acquire locks held by some other thread at the
time of the fork, it will deadlock: that thread doesn't exist in the
child, so the lock will never be released.

That said, vfork() is explicitly limited (and optimized) so that,
afterwards, the child is only allowed to make one of two syscalls:
_exit(2) or execve(2), not even the libc exit(3). It should therefore
be pretty safe with threads.

> http://www.oracle.com/technetwork/server-storage/solaris10/subprocess-136439.html

On Solaris they work around the issue above by calling those two
syscalls directly, bypassing the libc wrappers, but that shouldn't be a
problem on Linux.

> I've wondered whether using posix_spawn() would be a better
> alternative, but there are apparently ways to lose there as well.

I've implemented a better version of posix_spawn(3) in libfixposix; if
you use the latest IOlib, you'll have it.

> http://trac.clozure.com/ccl/ticket/687
> 
> In the really-far-out department, I've also thought that it might be
> possible to have a small run-program-helper process that the lisp
> kernel would start up before mapping the heap image.  There would be a
> lot of details to work out, but the idea is that the lisp could talk
> to the run-program-helper via some form of IPC and have it do the
> fork/exec dance on its behalf.  Since the helper would be tiny, having
> fork() duplicate its address space would be no big deal.

I started such a project a few years ago
(https://github.com/sionescu/local-execution-daemon), but it's really
complicated and I'm not sure it's worth the trouble.

-- 
Stelian Ionescu a.k.a. fe[nl]ix
Quidquid latine dictum sit, altum videtur.
