[Openmcl-devel] Coroutines/lightweight threads

Thu Nov 15 06:14:16 PST 2012

This isn't directly relevant, but ...  I was always taught that if you can't
say anything nice about an OS it's best to not say anything at all.  However
true that may be, sometimes it's just not practical to remain silent ...  It
likely isn't too interesting in the details; the conclusion/punchline (such
as it is) might actually be relevant to this discussion.

On most Unix-like systems, a machine exception (division-by-zero, an
illegal instruction, an invalid memory access ...) can cause an
application-defined "signal handler" function to be called.  The
mechanism generally involves the OS suspending execution of the
offending thread at the point when an exception occurred, pushing a
data structure or describing the state of the thread on the thread's
stack, and making the thread resume execution as if the signal
handling function had been called with pointers to those data
structures as arguments.  If the thread's stack has nearly overflowed
(memory pages near the stack pointer aren't mapped or can't be written
to), the OS can't push data structures on the stack or call code that
would try to run on that stack, so the OS will typically just terminate
the process abruptly in this case (making it difficult to diagnose and
impossible to recover from) the original problem.  (This is an example
of a "double bus fault", where attempting to recover from a memory fault
would cause a memory fault.)

Partly to (partly) address situations like that one, most/all Unix-like
systems additionally provide an "alternate signal stack" mechanism: when
a thread starts up, it can create a (typically fairly small) block of
memory to be used as an alternate stack, and when a signal handler is
defined, the OS can be told that the handler should be called "on the
alternate signal stack" - the data structures will be pushed on the
alternate stack and the thread's stack pointer is made to point appropriately
into that alternate stack.

Until OSX 10.5 (IIRC), Apple thought that all threads should share the same
signal stack (so the code that ran on thread startup changed a global rather
than establishing a specific alternate stack for the current thread.)  That
actually sort of worked - CCL uses the sigaltstack mechanism heavily on x86,
and it took a while to notice that it's relatively rare for two threads to
receive signals at the same time.  (When they did, they of course tried to
run on the same stack at the same time, and it wasn't surprising that that
didn't work.)  Apple eventually fixed this (though I think that there was
a flag that could be set for applications that thought that that was wasteful
or that their old interpretation made any sense at all.)

Um, that was the promised punchline.  Sorry if you were expecting something
funnier (and not as tragic).  I still laugh hysterically when I think of that,
but I'm probably doing so for other reasons.

Some functional languages (including Scheme) refrain from using a
stack in the traditional way.  Language constructs like
"call-with-current-continuation" effectively capture what might
otherwise be on a stack ("the continuation" - what might be pending
return addresses in a language without call/cc, local variable
bindings in call frames, etc.)  Call/cc is a powerful (though arguably
overly costly) construct that can be used to implement powerful control structures
and used as the basis for coroutines.)

The person who started this thread of discussion asked about
coroutines in CCL.  Common Lisp isn't Scheme (or ML, or Haskell, or
...).  Common Lisp isn't required to be tail-recursive (and the fact
that CCL often is causes some people to submit bug reports claiming
that "backtrace isn't showing all stack frames!").  Common Lisp
doesn't have language constructs like call/cc, and CPS has little or
nothing to do with how CCL (or, AFAIK, other CL implementations) are
implemented.

CL and CCL contain and depend on other support for other language and
implementation features, and (as I said the other day) the way that many
things are implemented in CCL are tied to notions that are tied to the
native thread implementation, so it would be hard (= lots of work) to 
implement things that're sort of like threads (allow the same semantics
with respect to special binding, CATCH/THROW and friends, other things
that I mentioned the other day) unless those other things ("cooperative
threads") are implemented at a very low level.

You could (and apparently can) use closures (and suspensions, and futures,
and other things that you can build with CL closures) to implement control
structures that can function as coroutines.  You'll almost certainly find
that there are limits on what you can do with this sort of coroutine: if
two coroutines bind a special variable to different values, does that value
have the "right" value when you switch between coroutines ?  (For extra
credit, how on earth could it ?)  Things like special variable binding
and CL:CATCH/CL:THROW have dynamic-extent and need something stack-like
(VERY stack-like in CCL's case) to model that, and you can't easily
have multiple ... threads ... of execution running on the same stack
(or stack-like thing.)  Geez.  Even Apple eventually realized that ...

You might find that this kind of coroutine is useful enough to accept some
limitations.  If not, and if you want things that are more like "lightweight
threads" (that have their own dynamic execution contexts - aka stacks), then
you either have to build that into the system at a fairly low level or you
have to (mis)use native threads as I indicated the other day.

(I think that another way of saying the same thing is that there
differences between what can be portably captured and expressed in a
series of lexical closures and what things like call/cc captures/expresses/
allows one to express.)

There are at least a couple of conversations going on here.  The original question
had to do with support for "cooperative threads" and specifically asked about
things like CATCH/THROW working with such things; some subsequent messages
have seemingly discussed approaches to coroutining based on CPS transforms
(a sort of poor person's call/cc).  Those issues are obviosly related and both
questions are potentially interesting, but I think that it's fair and most useful
to remember that there are different sets of issues here.

There's also a bit of a danger - or a few bits - here in that there's a
potential disconnect between how things actually work and how someone who has no
reason to understand how things actually work imagines that they do.  That
can work the other way - it's also possible to confuse "how things work" with
"the only way that they possibly could", and that isn't entirely desirable
either.  Acknowledging the latter consideration, there is a set of things
that constitute "reality" or  "close approximations to it", and even if it
isn't generally useful or necessary to understand this reality thing in great
detail most of the time, there are limits on what one can say when the world
under discussion and one's model of it have little overlap.

(I have a hunch that I've said that before, and that it's possible to say
it more concisely than I just did.)

On Thu, 15 Nov 2012, Alex Mizrahi wrote:

> I'm not sure that having a separate stack for each lightweight thread is the
> right solution since it sounds kinda wasteful.
> 
> On the other hand, you don't need to CPS-transform all the code, but only
> portions of it which do I/O or yield.
> So, basically, only top level control and I/O code will be transformed and
> the "meat" will stay undisturbed, as fast as usual.
> 
> 
>