[Openmcl-devel] P4 to i5 port

Gary Byers gb at clozure.com
Tue Oct 12 15:22:59 PDT 2010


The error has to do with data structures ("frag vectors") that are used
in the assembler/code-generator.  These things (a) are entirely private
to the assembler, (b) have a very well-defined lifetime (once the assembler
has used them to generate a function, they aren't referenced again), and
(c) are used very heavily, so they're freelisted (recycled).
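A minimal sketch of what "freelisted (recycled)" means, in portable Common Lisp.  The names (*FRAG-FREELIST*, ALLOC-FRAG, FREE-FRAG) and the 64-octet size are hypothetical; this isn't CCL's actual frag-vector code, just the general technique of reusing a released vector instead of consing a new one.

```lisp
;;; Hypothetical freelist ("recycling") allocator for fixed-size byte
;;; vectors.  Not CCL's frag-vector implementation; names are made up.

(defconstant +frag-size+ 64
  "Hypothetical size, in octets, of each recycled vector.")

(defvar *frag-freelist* '()
  "Vectors that have been released and can be handed out again.")

(defun alloc-frag ()
  "Pop a vector off the freelist, or cons a fresh one if it's empty.
Note that a reused vector still holds whatever its last user wrote."
  (or (pop *frag-freelist*)
      (make-array +frag-size+ :element-type '(unsigned-byte 8)
                              :initial-element 0)))

(defun free-frag (v)
  "Return V to the freelist so that a later ALLOC-FRAG can reuse it.
The caller must not touch V after this."
  (push v *frag-freelist*)
  nil)

;; Usage: a freed vector is handed out again instead of a new one.
(let ((v (alloc-frag)))
  (free-frag v)
  (eq v (alloc-frag)))   ; => T
```

The bookkeeping is cheap; the catch is that a vector really has to be unreferenced once it's been freed, which is why the well-defined lifetime matters.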

The freelists are supposed to be thread-specific (e.g., each thread is
supposed to have its own binding of CCL::*X86-LAP-FRAG-VECTOR-FREELIST*,
and the value of that variable in each thread is supposed to be
an object of type CCL::POOL.)  The only thing that's special about that
kind of object is that the GC will set a POOL's data slot to NIL whenever
it encounters one (so that freelists don't grow indefinitely and their
contents eventually get GCed.)
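The POOL idea can be modeled roughly as follows.  POOL, POOL-PUSH, POOL-POP, *LAP-FREELIST* and SIMULATE-GC are hypothetical names; in CCL it's the GC itself that clears a POOL's data slot, so SIMULATE-GC only stands in for that behavior.  The LET in the usage example plays the role of a thread's own binding of the freelist variable.

```lisp
;;; Hypothetical model of a POOL: a struct whose DATA slot holds a
;;; freelist.  SIMULATE-GC imitates what CCL's GC does on its own.

(defstruct pool
  (data '() :type list))

(defvar *lap-freelist* (make-pool)
  "Each thread is expected to have its own binding of this.")

(defun pool-pop (pool)
  "Take an element off POOL's freelist, or return NIL if it's empty."
  (pop (pool-data pool)))

(defun pool-push (item pool)
  "Put ITEM back on POOL's freelist."
  (push item (pool-data pool))
  item)

(defun simulate-gc (&rest pools)
  "Stand-in for the GC clearing each POOL's data slot, so freelists
can't grow indefinitely and their contents can eventually be GCed."
  (dolist (p pools)
    (setf (pool-data p) nil)))

;; Usage: a "thread" with its own binding of the freelist variable.
(let ((*lap-freelist* (make-pool)))
  (pool-push (make-array 8) *lap-freelist*)
  (simulate-gc *lap-freelist*)
  (pool-pop *lap-freelist*))   ; => NIL, the "GC" emptied the pool
```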

Between GCs then, the assembler pops "frag vectors" off of a thread-private
freelist (the fact that it's a thread-private list means that locking isn't
needed, though this code isn't reentrant and has to disable interrupts) or
creates one if that list is empty; when the assembler's created the function,
it returns all of the frag vectors to the freelist.  There are ways to go
wrong here, but that strategy does significantly reduce consing (enough to
measurably and significantly improve compilation speed, or it did the last
time that I tried to measure it.)
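The allocate-then-return-everything lifecycle might be sketched as a macro like this one.  WITH-RECYCLED-VECTORS, GRAB and *VECTOR-FREELIST* are made-up names, and the sketch ignores the interrupt-disabling that the real, non-reentrant assembler code needs; it only shows that every vector handed out during one "assembly" goes back on the freelist when the body exits, even via a non-local exit.

```lisp
;;; Hypothetical sketch: vectors are popped off a freelist (or created
;;; when it's empty) while BODY runs, and all of them are returned to
;;; the freelist when BODY exits.  No locking and no interrupt masking.

(defvar *vector-freelist* '()
  "Shared freelist of 16-octet scratch vectors (hypothetical).")

(defmacro with-recycled-vectors ((grab) &body body)
  "Bind GRAB to a local function that takes a vector from
*VECTOR-FREELIST* (or makes a new one); release all of them on exit."
  (let ((in-use (gensym "IN-USE")))
    `(let ((,in-use '()))
       (flet ((,grab ()
                (let ((v (or (pop *vector-freelist*)
                             (make-array 16 :element-type '(unsigned-byte 8)))))
                  (push v ,in-use)     ; remember it so it can be returned
                  v)))
         (declare (ignorable (function ,grab)))
         (unwind-protect (progn ,@body)
           ;; Hand every vector that was given out back exactly once.
           (dolist (v ,in-use)
             (push v *vector-freelist*)))))))

;; Usage: three vectors are handed out, and all three come back.
(setf *vector-freelist* '())
(with-recycled-vectors (grab)
  (grab) (grab) (grab))
(length *vector-freelist*)   ; => 3
```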

One of the things that can go wrong would happen if two threads tried
to use the same freelist: they could both pop the same element off of the
freelist at (roughly) the same time and eventually return that element to
the freelist more than once.  If that happens, the freelist would become
circular, and there's some leftover debugging code that checks for that
and signals the error that appears in the backtrace.
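To see why returning an element twice produces a circular freelist, assume (purely for illustration; CCL's actual layout may differ) an intrusive freelist in which each free element holds the link to the next one.  Releasing an element that's already on the list makes the list loop back on itself, and a tortoise-and-hare (Floyd) check can detect that.  NODE, RELEASE-NODE, FREELIST-CIRCULAR-P and CHECK-FREELIST are hypothetical; CHECK-FREELIST only mimics the spirit of the check that signals "frag-vector freelist is circular".

```lisp
;;; Hypothetical intrusive freelist: each node carries its own NEXT
;;; link.  Releasing the same node twice creates a cycle.

(defstruct node (next nil))

(defvar *free-head* nil
  "First node on the (hypothetical) freelist.")

(defun release-node (n)
  "Push N onto the freelist by overwriting its NEXT link.
Returns NIL so that a circular list is never printed by accident."
  (setf (node-next n) *free-head*
        *free-head* n)
  nil)

(defun freelist-circular-p (head)
  "Floyd's tortoise-and-hare cycle check over NODE-NEXT links."
  (let ((slow head) (fast head))
    (loop
      (when (or (null fast) (null (node-next fast)))
        (return nil))                 ; hare fell off the end: no cycle
      (setf slow (node-next slow)
            fast (node-next (node-next fast)))
      (when (eq slow fast)
        (return t)))))                ; hare caught up with the tortoise

(defun check-freelist ()
  "Signal an error if the freelist has become circular."
  (when (freelist-circular-p *free-head*)
    (error "freelist is circular")))

;; Usage: release A, release B, then (by mistake) release A again.
(let ((a (make-node)) (b (make-node)))
  (setf *free-head* nil)
  (release-node a)
  (release-node b)
  (release-node a)
  (freelist-circular-p *free-head*))   ; => T
```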

That's not supposed to happen, because each thread is supposed to use
its own binding of the variable that contains that freelist: that's a
"standard initial binding" that ordinarily takes effect whenever a
thread is created.  There's an option to MAKE-PROCESS/PROCESS-RUN-FUNCTION
(:USE-STANDARD-INITIAL-BINDINGS NIL) that allows a thread to be created
without any standard initial bindings, but the documentation doesn't
stress that that's an exceptional and potentially dangerous thing to do.
A few years ago, a problem that several people reported was traced to
the fact that a third-party package was creating threads without
"standard initial bindings"; those threads were therefore stepping on
shared resources that the code assumed were thread-private, and we
promised to deprecate that option or to use scary language in the
documentation. ("You should only use this option if you fully understand
the implications of doing so.  If you do understand those issues, please
explain them to us ...")
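A rough model of what "standard initial bindings" buy a new thread: before the thread's function runs, each variable in a table gets a fresh, thread-private value via PROGV.  Everything here (*SCRATCH-POOL*, *INITIAL-BINDINGS*, CALL-WITH-INITIAL-BINDINGS) is hypothetical and only borrows the keyword name from MAKE-PROCESS/PROCESS-RUN-FUNCTION; it isn't how CCL implements the option, and it doesn't create real threads.

```lisp
;;; Hypothetical model of "standard initial bindings".  No threads are
;;; created; FN is simply called the way a new thread's function might be.

(defvar *scratch-pool* (list :global)
  "Stands in for a resource that threads must not share.")

(defparameter *initial-bindings*
  (list (cons '*scratch-pool* (lambda () (list :fresh))))
  "Alist of (symbol . thunk); each thunk makes a new initial value.")

(defun call-with-initial-bindings (fn &key (use-standard-initial-bindings t))
  "Call FN with fresh dynamic bindings for every symbol in
*INITIAL-BINDINGS*.  When USE-STANDARD-INITIAL-BINDINGS is NIL, FN sees
(and can mutate) the shared global values instead."
  (if use-standard-initial-bindings
      (progv (mapcar #'car *initial-bindings*)
             (mapcar (lambda (entry) (funcall (cdr entry))) *initial-bindings*)
        (funcall fn))
      (funcall fn)))

;; Two "threads" with standard initial bindings see different objects.
(eq (call-with-initial-bindings (lambda () *scratch-pool*))
    (call-with-initial-bindings (lambda () *scratch-pool*)))   ; => NIL
```

With the option turned off, every caller gets the same global object, which is exactly the sharing that the assembler's freelist can't tolerate.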

My first guess is that your application is running threads that don't
have standard initial bindings (and that two or more threads are
therefore stepping on internal assembler data structures), and that
the thread-creation code that you're using (whether it's yours or a
third party's) should be changed to not say
:USE-STANDARD-INITIAL-BINDINGS NIL.  I don't know for sure that that's
the explanation, but if I'm correct in remembering that your P4 was a
single-core machine and that the i5 is multi-core, then the fact that
this problem showed up when you moved it to the i5 is consistent with
that explanation: the bad things that can happen when two threads try
to modify a data structure at the same time are more likely to happen
when those threads are really running concurrently (on multiple cores)
and literally trying to do that modification at the same time.
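The bad interleaving can be replayed deterministically, without real threads, against a hypothetical intrusive freelist (CELL, SHARED-LIST, RACY-DOUBLE-POP and PUSH-CELL are made-up names, and the layout is an assumption).  Both "threads" read the head before either one writes, so both get the same cell; when each returns it, the cell ends up pointing at itself.

```lisp
;;; Hand-replayed race on an unlocked intrusive freelist.  Each pop is
;;; split into "read the head" and "store its successor", and the bad
;;; interleaving (both reads before either write) is forced explicitly.

(defstruct cell (next nil))

(defun shared-list (&rest cells)
  "Link CELLS into a freelist and return its head."
  (loop for (c n) on cells do (setf (cell-next c) n))
  (first cells))

(defun racy-double-pop (head)
  "Replay two unlocked pops where both threads read HEAD first.
Returns the cell each thread got and the resulting new head."
  (let ((t1-seen head)     ; thread 1 reads the head ...
        (t2-seen head))    ; ... and so does thread 2, before any write
    ;; Each thread stores the successor of what it saw; both saw the
    ;; same cell, so both store the same new head.
    (values t1-seen t2-seen (cell-next t2-seen))))

(defun push-cell (c head)
  "Return C to the freelist whose head is HEAD; return the new head."
  (setf (cell-next c) head)
  c)

;; Replay: freelist A -> B -> C, two racing pops, then both push back.
(let* ((a (make-cell)) (b (make-cell)) (c (make-cell)))
  (multiple-value-bind (x y head) (racy-double-pop (shared-list a b c))
    (setf head (push-cell x head)    ; "thread 1" returns its cell
          head (push-cell y head))   ; "thread 2" returns the same cell
    (eq (cell-next head) head)))     ; => T, the freelist is now circular
```

On a single core this interleaving needs a thread switch at just the wrong moment; with truly concurrent threads it only needs two threads in that code at once.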

If that's not it, I don't have a good guess: I don't think that it's
too easy for the freelist to get corrupted if it's only modified from
a single thread, and I haven't seen or heard of this happening in several
years.




On Tue, 12 Oct 2010, Lou Vanek wrote:

> Hi,
>
> I'm in the process of porting an openmcl project from a 32-bit Pentium 4
> to a 64-bit i5. Most of the code runs fine, but I'm having problems with
> a snippet of code that is compiled on the fly.
>
> The line of code that is causing the problem is:
>    (setq res (funcall (coerce form 'function)))
>
> 'form' is bound to,
>   (LAMBDA NIL (IFDEF "mode" 0 "Select a cube."))
>
> The IFDEF function is never called. This is where openmcl throws an error.
>
> I can eval this form just fine in the REPL, but not at run-time in the web app
> on the i5 using code compiled by the i5. The P4 never had a problem
> running this.
>
> I know this code looks funky but since this is a web app some of the web pages
> are built at runtime using text templates with embedded lisp forms. It's these
> embedded forms that are causing the problem. None of the static code throws
> errors.
>
> A partial backtrace is shown below.
>
> Some background. This code runs fine both on my P4 and the i5 as long as I
> use FASLs that are compiled on the P4. When I compile on the i5 I get the
> error at runtime.
>
> My setup is a bit complicated. This is being run in a Debian Lenny virtual
> machine, using the latest stable VirtualBox on a Windows 7 host. Debian
> Lenny is stable and patched up.
>
> Guest OS:
>
>> uname -a
> Linux deb 2.6.26-2-686 #1 SMP Thu Sep 16 19:35:51 UTC 2010 i686 GNU/Linux
>
> I believe this is a 32-bit OS running on a 64-bit cpu.
>
> Openmcl version running on the i5:
>
> CL-USER> (lisp-implementation-type)
> "Clozure Common Lisp"
> CL-USER> (lisp-implementation-version)
> "Version 1.6-dev-r14347M-trunk  (LinuxX8632)"
>
> The openmcl version that is running on the P4 is about 3 months old, IIRC.
>
>
> The ccl/hunchentoot/slime error log:
>
> [2010-10-12 15:01:32 [ERROR]] Compiler bug or inconsistency: frag-vector freelist is circular
> (B665F2EC) : 0 (PRINT-CALL-HISTORY :CONTEXT NIL :PROCESS NIL :ORIGIN NIL :DETAILED-P NIL :COUNT 536870911 :START-FRAME-NUMBER 0 :STREAM #<STRING-OUTPUT-STREAM  #x1A74E066> :PRINT-LEVEL 2 :PRINT-LENGTH 5 :SHOW-INTERNAL-FRAMES NIL :FORMAT :TRADITIONAL) 735
> (B665F3A0) : 1 (PRINT-BACKTRACE-TO-STREAM #<STRING-OUTPUT-STREAM  #x1A74E066>) 71
> (B665F3B8) : 2 (GET-BACKTRACE) 327
> (B665F3EC) : 3 (FUNCALL #'#<(:INTERNAL (HUNCHENTOOT:HANDLE-REQUEST (HUNCHENTOOT:ACCEPTOR HUNCHENTOOT:REQUEST)))> #<CCL::COMPILER-BUG #x1A74E07E>) 119
> (B665F408) : 4 (SIGNAL #<CCL::COMPILER-BUG #x1A74E07E>) 903
> (B665F430) : 5 (%ERROR #<CCL::COMPILER-BUG #x1A74E07E> NIL -308708079) 111
> (B665F444) : 6 (COMPILER-BUG "frag-vector freelist is circular") 127
> (B665F454) : 7 (%ALLOCATE-VECTOR-LIST-SEGMENT) 135
> (B665F46C) : 8 ((SETF %VECTOR-LIST-REF) 0 (#(85 137 229 106 235 ...)) 24) 119
> (B665F490) : 9 (FRAG-PUSH-BYTE #<FRAG  #x1A1A434E> 0) 151
> (B665F4AC) : 10 (FRAG-LIST-PUSH-32 #<DLL-HEADER  #x1A74E766> 12) 223
> (B665F4C0) : 11 (X86-GENERATE-INSTRUCTION-CODE #<DLL-HEADER  #x1A74E766> #S(X86::X86-INSTRUCTION :OPCODE-TEMPLATE #S(X86::X86-OPCODE-TEMPLATE :MNEMONIC "movl" :FLAGS 0 ...) :REX-PREFIX NIL ...)) 3943
> (B665F4EC) : 12 (FUNCALL #'#<(:INTERNAL CCL::EXPAND-INSN-FORM CCL::X862-EXPAND-VINSN)> (525 (ASH # 2) 2)) 359
> (B665F520) : 13 (FUNCALL #'#<(:INTERNAL CCL::EXPAND-FORM CCL::X862-EXPAND-VINSN)> ((:NOT #) (525 # 2))) 607
> (B665F554) : 14 (X862-EXPAND-VINSN #<SET-NARGS 3> #<DLL-HEADER  #x1A74E766> #S(X86::X86-INSTRUCTION :OPCODE-TEMPLATE #S(X86::X86-OPCODE-TEMPLATE :MNEMONIC "movl" :FLAGS 0 ...) :REX-PREFIX NIL ...) #S(X86::X86-IMMEDIATE-OPERAND :TYPE 256 :VALUE 12) #<DLL-HEADER  #x1A74E756>) 1135
> (B665F590) : 15 (X862-EXPAND-VINSNS #<DLL-HEADER  #x1A74E8CE> #<DLL-HEADER  #x1A74E766> #S(X86::X86-INSTRUCTION :OPCODE-TEMPLATE #S(X86::X86-OPCODE-TEMPLATE :MNEMONIC "movl" :FLAGS 0 ...) :REX-PREFIX NIL ...) #<DLL-HEADER  #x1A74E756>) 543
> (B665F5B8) : 16 (X862-COMPILE #<CCL::AFUNC #x1A74EDA6> NIL T) 8863
> (B665FA5C) : 17 (COMPILE-NAMED-FUNCTION (LAMBDA NIL (IFDEF "mode" 0 "Select a cube.")) :NAME NIL :ENV NIL :POLICY NIL :LOAD-TIME-EVAL-TOKEN NIL :TARGET NIL :FUNCTION-NOTE NIL :KEEP-LAMBDA NIL :KEEP-SYMBOLS T :SOURCE-NOTES NIL :RECORD-PC-MAPPING T :COMPILE-CODE-COVERAGE NIL) 1231
> (B665FB38) : 18 (COMPILE-USER-FUNCTION (LAMBDA NIL (IFDEF "mode" 0 "Select a cube.")) NIL NIL) 183
> (B665FB50) : 19 (FUNCALL #'#<(:INTERNAL WEB::CONVERT WEB::EVAL-EMBEDDED-LISP)> "<div style='display:none'>
>    <div class='contact-top'></div>
>    <div class='contact-content'>
>        <!-- warning! don't put too many characters in this h1 or the whole dialog goes to crap -->
>        <h1 class='contact-title'>
>            <?cl (web::ifdef \"mode\" 0 \"Select a cube.\")?>
> ...
> [Above is the text template that is being evaluated. The last line of the
> backtrace shows the lisp form that is causing openmcl to throw an error.]
>
> If you require additional information I would be glad to get it to you.
> Thanks,
>
> Lou Vanek
> _______________________________________________
> Openmcl-devel mailing list
> Openmcl-devel at clozure.com
> http://clozure.com/mailman/listinfo/openmcl-devel
>
>