[Openmcl-devel] Coroutines/lightweight threads

Tue Nov 13 02:54:18 PST 2012

A lot of my motivation for trying to steer this toward basing coroutines
on native threads is that it avoids reinventing a lot of wheels: thread-specific
special bindings and CATCH/UNWIND-PROTECT context all involve the concept of
thread-local data in CCL (and that's likely true of many or all of those
things in other implementations as well).  There's another set of things
that I'll just call "GC integration", which basically involve how the GC views
and interacts with threads, and all of this makes me very leery of
introducing some new kind of primitive non-native thread.

You're right that the kind of context switch that'd occur when thread A releases
a lock that thread B has been waiting for is likely to be much slower than a
context switch that involves saving and restoring some registers.  If it's
the case that many lispy things depend on native threads in subtle ways, then
a user-space context switch is infinitely slow (because it doesn't get you
anywhere that you want to be.)  The assumption that all threads are native
threads isn't CCL- or Lisp-specific: a lot of C runtime functions may involve
locking, and locking generally involves some notion of what the current (native)
thread is.

Unless you want to reimplement all of the things that depend on threads being
native threads, I don't think that there's really much of an alternative to
the general idea that I suggested earlier.
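Here is a minimal sketch of directed, explicit handoff built on native threads. It assumes Clozure CL's semaphore and process functions (MAKE-SEMAPHORE, SIGNAL-SEMAPHORE, WAIT-ON-SEMAPHORE, PROCESS-RUN-FUNCTION). The lock-based version uses a single global lock. Here, each coroutine instead gets its own semaphore, so "yield to this specific coroutine" becomes "signal its semaphore, then wait on mine". That sidesteps the RELEASE-LOCK-AND-TRANSFER-OWNERSHIP-TO primitive that CCL doesn't have. COROUTINE, MAKE-COROUTINE, TRANSFER-CONTROL and COUNTER-DEMO are hypothetical names, not CCL API. Treat this as a sketch of the idea, not a finished interface.

```lisp
;;; Sketch only: assumes Clozure CL's thread API (CCL:MAKE-SEMAPHORE,
;;; CCL:SIGNAL-SEMAPHORE, CCL:WAIT-ON-SEMAPHORE, CCL:PROCESS-RUN-FUNCTION).
;;; COROUTINE, MAKE-COROUTINE, TRANSFER-CONTROL and COUNTER-DEMO are
;;; hypothetical names for illustration, not part of CCL.

(defstruct (coroutine (:constructor %make-coroutine))
  (wakeup (ccl:make-semaphore))   ; signaled when this coroutine may run
  (value nil)                     ; value handed over by the last transfer
  (process nil))                  ; backing native thread, if any

(defun transfer-control (from to &optional value)
  "Pass VALUE to coroutine TO and wake it, then block the calling thread
(which stands for FROM) until some coroutine transfers control back to
FROM.  Returns the value passed by that later transfer."
  (setf (coroutine-value to) value)
  (ccl:signal-semaphore (coroutine-wakeup to))
  (ccl:wait-on-semaphore (coroutine-wakeup from))
  (coroutine-value from))

(defun make-coroutine (name function)
  "Create a coroutine backed by a new native thread.  The thread waits for
the first TRANSFER-CONTROL to it, then calls FUNCTION with the coroutine
itself and the transferred value."
  (let ((co (%make-coroutine)))
    (setf (coroutine-process co)
          (ccl:process-run-function
           name
           (lambda ()
             (ccl:wait-on-semaphore (coroutine-wakeup co))
             (funcall function co (coroutine-value co)))))
    co))

(defun counter-demo ()
  "Run a generator-style coroutine that hands START, START+1 and START+2
back to the calling thread, one explicit transfer at a time."
  (let* ((main (%make-coroutine))     ; represents the calling thread
         (gen (make-coroutine
               "counter"
               (lambda (self start)
                 (loop for i from start below (+ start 3)
                       do (transfer-control self main i))
                 ;; Final handoff: wake MAIN without waiting, so this
                 ;; thread can return and exit.
                 (setf (coroutine-value main) :done)
                 (ccl:signal-semaphore (coroutine-wakeup main))))))
    (loop for v = (transfer-control main gen 10)
            then (transfer-control main gen)
          until (eq v :done)
          collect v)))
```

Calling (counter-demo) should return (10 11 12). Each coroutine here is an ordinary native thread, so special bindings, CATCH/UNWIND-PROTECT and the GC see nothing unusual. The price is an OS-level wakeup on every transfer, which is the performance concern Andrew raised. A coroutine that never receives control again stays blocked in WAIT-ON-SEMAPHORE. That is why the generator in the demo makes its last handoff without waiting.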

On Tue, 13 Nov 2012, Andrew Lyon wrote:

> A lot of what you said makes sense. My main goal is to replace CPS in an
> asynchronous application (without littering my code with cl-cont macros). So
> select() and poll() are what I'm using already, but transfer of control from
> one operation to the next around a non-blocking operation has to take place
> via a callback. One could build "coroutines" around real OS threads as
> you've laid out, but I'm guessing there would be a significant performance
> penalty (memory, context switching, etc.). From my understanding,
> having a bunch of coroutines lying around is a lot cheaper than the same
> number of OS threads in both memory and switch time. However, my
> understanding might be flawed...I really don't know what it would take to
> implement coroutines in lisp, so maybe there wouldn't be a significant
> difference between that and using OS threads.
> Also, I'd like to echo that I don't really want "green threads"
> where the lisp is scheduling things for me; I'd much rather have explicit
> control.
> 
> On Mon, Nov 12, 2012 at 10:57 PM, Gary Byers <gb at clozure.com> wrote:
>       I've heard some people express interest before; I'd say that the
>       interest seemed to be low-to-moderate, but non-zero. When it's
>       come up, I think that my first reaction to hearing someone say "I
>       want cooperative threads" is to say "no, you don't", but I may be
>       failing to consider all aspects of the issue.
>
>       One approach to layering cooperative threads on top of native
>       threads is to use some kind of object very much like a lock; a
>       cooperative thread is just a (native) thread that waits for that
>       lock before doing anything, and yielding to another (unspecified)
>       cooperative thread basically involves releasing that lock and then
>       waiting to obtain it again.
> 
>
>       ? (defvar *the-cooperative-thread-lock* (make-lock))
>       *THE-COOPERATIVE-THREAD-LOCK*
>       ? (defun yin ()
>           (loop (with-lock-grabbed (*the-cooperative-thread-lock*)
>                   (print "Yin!"))
>                 (sleep 1)))
>       YIN
>       ? (defun yang ()
>           (loop (with-lock-grabbed (*the-cooperative-thread-lock*)
>                   (print "Yang!"))
>                 (sleep 1)))
>       YANG
>       ? (progn (process-run-function "yin" #'yin)
>                (process-run-function "yang" #'yang))
> 
>
>       Yow. Are we COROUTINING yet?
>
>       That's a bit of a rhetorical question: the old stack-groups API
>       that Scott referred to is a little richer than that and provides a
>       clean way of transferring values between threads; that's left as
>       an exercise. I wrote that in terms of WITH-LOCK-GRABBED, and we
>       might actually want to use GRAB-LOCK and RELEASE-LOCK directly, so:
> 
>
>       (defun yield-to-any-cooperative-thread ()
>         (release-lock *the-cooperative-thread-lock*)
>         (grab-lock *the-cooperative-thread-lock*))
>
>       That's almost suspiciously simple, but it's almost exactly what
>       Apple did to implement traditional cooperative threads in Carbon;
>       there are some classic problems for which coroutines provide a
>       natural solution, and the mechanism above (augmented with some
>       means of transferring values around) is probably adequate to
>       address many such problems (Google for "samefringe problem" if
>       you're looking for an example.)
>
>       If we have problems for which we need more than two cooperative
>       threads, then we may need to say "yield to some specific other
>       cooperative thread", and that would be something like:
>
>       (defun yield-to-specific-cooperative-thread (other-guy)
>         (release-lock-and-transfer-ownership-to
>          *the-cooperative-thread-lock* other-guy)
>         (grab-lock *the-cooperative-thread-lock*))
>
>       and the functionality that I'm calling
>       RELEASE-LOCK-AND-TRANSFER-OWNERSHIP-TO doesn't exist in CCL and is
>       a bit hard to implement reliably. (CCL locks generally don't keep
>       track of which threads are waiting for them, and a thread that's
>       waiting for a lock can abandon that wait - via PROCESS-INTERRUPT -
>       whenever it wants to, so the global lock in my example above may
>       be something a little different from a CCL lock.)
>
>       I haven't needed to solve the SAMEFRINGE problem elegantly in a
>       long time, and when I hear terms like "a blocking wrapper around
>       non-blocking I/O" I wonder if or how that differs from things like
>       #_select or #_poll. I'm willing to believe that there could be
>       cases where coroutines (the ability to control the scheduling of a
>       small number of threads relative to each other) could be useful,
>       but I think that that could be provided by fleshing out the
>       interface that's sketched above.
>
>       Lisp implementations that provide(d) cooperative threads (I don't
>       know of any implementations that still do so) typically provided a
>       "lisp scheduler" on top of what I described above; that layer
>       generally tried to do some sort of periodic preemption (so that a
>       thread that hadn't yielded in a while was made to do so), and that
>       layer was effectively spread all over the implementation (so that
>       blocking operations were replaced with code which combined
>       yielding and polling.) I would not want to see that kind of code
>       reappear, and if anyone's saying that they want that, I'm still
>       very much at the "no, you don't" stage.
> 
>
>       On Mon, 12 Nov 2012, Andrew Lyon wrote:
>
>       Hello, I'm an avid CCL user (have been for over a year now). This
>       is my first post on the dev list, and I did a lot of research on
>       this topic before deciding to post to make sure this hasn't been
>       covered before.
>
>       Is there any interest in the ClozureCL community in having
>       lightweight/cooperative threading available in the implementation?
>       I have a few problems that would be a perfect fit for this (for
>       instance, creating a blocking interface over non-blocking IO), and
>       I'd love to not only voice my support for the feature, but also
>       know if anybody else would also like something like this.
> 
>       I think the ideal implementation would be one where you explicitly
>       give up control of the current "micro-thread" to another known
>       thread (on top of this, something like "yield" could be built in
>       the app itself, if needed). Matching this to the way OS threads
>       currently work would be awesome...for instance, unwind-protect
>       would only work for the coroutine it's wrapping around, so if you
>       give control to another coroutine, that unwind-protect won't fire
>       if there is an exception. Obviously this would be a big feature,
>       and probably at least a few people would have opinions on how it
>       would be implemented, not to mention there's probably a lot going
>       on under the hood that I'm not aware of...but I'd like to at least
>       open a discussion.
> 
>       I did try to implement coroutines outside of CCL via libpcl/CFFI
>       (http://xmailserver.org/libpcl.html) but was met with much
>       resistance and many segfaults.
> 
>       Although I'm not familiar with the internals of CCL beyond reading
>       the "Internals" page and most of the docs, I'm more than happy to
>       try getting my hands dirty and add support myself with some
>       guidance from others (where do I start, what are the caveats, has
>       anyone else tried this, etc.). I'd also like to know if this is
>       possible using the Virtual Instructions in the compiler.
> 
>       I'd love to hear anyone's thoughts on this, and thanks for the
>       great implementation.
>
>       Andrew