[Openmcl-devel] Help with error: [Stacks reset due to overflow.]

Sun May 18 14:04:34 PDT 2003

On Sun, 18 May 2003, Barry Perryman wrote:

> Hi,
>
> I am having some problems with the finger server, from the samples folder,
> and multiple processes. This code has been tidied up a bit and included at
> the end.
>
> The idea was to try out the new native processes with a simple "stress
> test" type app that would start n threads and just make multiple finger
> queries to a server. This would be done in both lisp processes and native
> threads, and then repeated using thread/process pooling.
>
> When I run this I get the following error:
>
> ;[Stacks reset due to overflow.]
>
> In Lisp processes this seems to continue quite nicely after the reset - I
> haven't examined the data, but the load on the processor continues and there
> is information being shipped across the network. In native threads this
> error does not show and there is no graceful recovery: the load on the
> processor stops and so does any further network traffic - to be expected I
> guess.
>
> My problem is that I've never seen this type of error and I don't have the
> first clue about what could be causing it, or where to look to rectify the
> situation. I've looked for resource leaks and I can't see any; it could be
> something simple and I've got code blindness, but even reworking some of the
> code to tidy up my previous hacks didn't help.

Ordinarily, a stack-overflow error is supposed to be signaled when there's
still quite a bit of room on the stack (enough for a few thousand function calls,
typically.)  The stack overflow would be signaled as a STORAGE-CONDITION
and you'd get to look at a few miles of backtrace to determine the cause
of the overflow. (If this happens in some thread other than the listener
thread, you'd have to go through the whole nonsense of "yielding" terminal
input to the thread that got the overflow, and you'd probably find that
that mechanism doesn't scale well.)
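
As a rough sketch of what catching that condition could look like in
portable Common Lisp (nothing here is OpenMCL-specific): the helper
CALL-WITH-OVERFLOW-REPORT and the condition SIMULATED-OVERFLOW are
hypothetical names made up for illustration.  The stand-in condition only
exists so the handler can be exercised without really exhausting a stack.
Since HANDLER-CASE unwinds before running its clause, the report is
printed after the deep frames are gone.

```lisp
;; Hypothetical helper, not part of OpenMCL: run THUNK and, if a
;; STORAGE-CONDITION is signaled, report it with a caller-supplied LABEL
;; (e.g. a thread name) and return :OVERFLOW instead of entering a break loop.
(defun call-with-overflow-report (label thunk)
  (handler-case (funcall thunk)
    (storage-condition (c)
      (format *error-output* "~&;; ~A: storage condition: ~A~%" label c)
      :overflow)))

;; Stand-in condition (an assumption for testing only) so the handler can
;; be tried without actually overflowing a stack.
(define-condition simulated-overflow (storage-condition) ()
  (:report "simulated stack overflow"))
```

Calling (call-with-overflow-report "worker-1" (lambda () (error
'simulated-overflow))) exercises the handler; a thunk that returns
normally just has its value passed through.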

If a thread continues to use stack space (by deliberately doing
something deeply recursive in the break loop or for some other, deeper
reason) and starts to get "close" to the physical limit of the stack,
the system gives up on the idea of trying to signal a condition and
quietly resets the thread: it does a THROW to the outermost CATCH in
the thread, and the message you see above is printed.  (It'd be nice
if that message said something about which thread's stacks were just
reset due to overflow.)
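
The "THROW to the outermost CATCH" idea can be illustrated with plain
CATCH/THROW.  This is only a sketch of the standard CL semantics, not the
actual OpenMCL reset code: the tag THREAD-RESET and the functions
RUN-WITH-RESET-POINT and WORKER are hypothetical.  It shows that a THROW
from deep in a recursion lands at the outer CATCH and runs any
UNWIND-PROTECT cleanup forms on the way out, innermost first.

```lisp
;; Records the order in which cleanup forms run (illustration only).
(defvar *cleanup-log* '())

;; Hypothetical stand-in for the outermost CATCH in a thread.
(defun run-with-reset-point (thunk)
  (catch 'thread-reset
    (funcall thunk)))

;; Recurse DEPTH levels, then abandon the whole computation with a THROW.
;; Each level's cleanup form runs while the stack unwinds.
(defun worker (depth)
  (unwind-protect
       (if (zerop depth)
           (throw 'thread-reset :reset)
           (worker (1- depth)))
    (push depth *cleanup-log*)))
```

With these definitions, (run-with-reset-point (lambda () (worker 3)))
returns :RESET rather than a normal result, and *CLEANUP-LOG* records that
every level's cleanup ran.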

I'm suspicious of the "near-fatal stack overflow recovery" code in
0.14: it's a variant of the same code that's used to start running
lisp code on a newly-created thread, and that code is different now.  If you
get an otherwise unrecoverable stack overflow and there's nothing
else to be done, it should certainly try to reset the thread as
cleanly as it can, and things have probably changed since this code
was last tested.

The deeper mystery is why code like this should be stack-overflowing
in any case, and why those overflows are (apparently) not being
signaled.

>
> If anybody could offer up any pointers it would be appreciated.

I'll try to run the code and see if I can see what's going on.  Nothing
in the code you sent -looks- like it should be using more than a few KB
of stack space in either the client or server.

Attempts to create a lisp thread -can- fail (because of resource
limitations); I don't think that it's too easy to provoke that failure,
but don't know how gracefully that failure's handled, either.

>
> Barry
>
>
