[Openmcl-devel] add-gc-hook / drain-termination-queue

Carlos Ungil carlos.ungil at gmail.com
Fri Sep 6 19:57:35 PDT 2013


Hi Gary,

On 06/09/2013, at 19:06, Gary Byers <gb at clozure.com> wrote:
> CCL::ADD-GC-HOOK isn't documented for a number of reasons; as you're
> calling it, it doesn't do anything useful. [....]
> I don't know what problem you're having, but it's not clear that causing that
> termination function to run on a different thread fixes anything.


I'll try to explain better why add-gc-hook is doing something useful for me (even if I'm not completely sure it is the right solution for my problem).

Remember that when r-pointer objects are created the rf-protect foreign function is called. And the finalizer will take care of calling the rf-unprotect-ptr foreign function later (by default from a different thread).

BASE CASE

  (ql:quickload :rcl) 
  (rcl:r-init)
  (reduce #'+ (loop repeat 10000 append (rcl:r "runif" 1)))

    Warning: stack imbalance in 'c', 1522 then 1521
    Error: unprotect_ptr: pointer not found
    Execution halted

The outcome is non-deterministic: there might be several warnings before the error, or even no error printed, but in any case the lisp process doesn't respond.

My interpretation: the garbage collector is triggered and objects finalized in a background thread, resulting in concurrent calls to rf-protect and rf-unprotect-ptr.
The R process can't handle that, and eventually it crashes.

SIMPLE SOLUTION: disable automatic termination before running the test, and rely on the user calling (ccl:drain-termination-queue) at appropriate times.
Kind of works, but there is a problem if too many objects are protected before the user manually unprotects those that have been gc'ed.

  (setq ccl:*enable-automatic-termination* nil)
  (ql:quickload :rcl) 
  (rcl:r-init)
  (reduce #'+ (loop repeat 10000 append (rcl:r "runif" 1)))

    Error: protect(): protection stack overflow
    > Error: error calling textConnection
    > While executing: RCL::R-FUNCALL, in process listener(1). 
    > Type :POP to abort, :R for a list of available restarts.
    > Type :? for other options.

In this case the lisp process can recover, trying to run R functions again the same error is raised until I call (ccl:drain-termination-queue)

My interpretation: I'm hitting the limit in the protection stack in the R process but when I run drain-termination-queue the finalizers run correctly cleaning the stack.

IMPROVED SOLUTION: run drain-termination-queue each time the garbage is collected, but do so from the main thread to be sure (?) that there are no concurrents calls to rf-protect.

  (setq ccl:*enable-automatic-termination* nil)
  (ccl::add-gc-hook
   (let ((process ccl:*current-process*))
     (lambda ()
       (ccl:process-interrupt process #'ccl:drain-termination-queue)))
   :post-gc)
  (ql:quickload :rcl)
  (rcl:r-init)
  (reduce #'+ (loop repeat 10000 append (rcl:r "runif" 1)))

My interpretation: Reading the documentation of process-interrupt I understand that drain-termination-queue is guaranteed to run at a time when no call to rf-protect is running (so the finalizers can run safely):
  "It is still difficult to reliably interrupt arbitrary foreign code (that may be stateful or otherwise non-reentrant); the interrupt request is handled when such foreign code returns to or enters lisp."
I'd say it solves my problem (using only one thread to interface with R-land seems an easy way avoid multi-thread issues) but I'm not sure whether disabling automatic-termination could also have undesirable side-effects.

> You could force the termination function for R-POINTER objects to run
> on some arbirtary thread P by making that function use PROCESS-INTERRUPT
> (e.g.,
> (trivial-garbage:finalize r-pointer?(lambda () (process-interrupt p (lambda () (rf-unprotect-ptr ptr)))))
> )

PROPOSED SOLUTION:  As I mentioned at the end of my previous message I also tried this solution but it doesn't work properly (I think it should: at worst I would have expected some impact on performance, given that there is an interruption for each object instead of one for each run of the garbage collector).

  (ql:quickload :rcl)
  (defun rcl::make-r-pointer (ptr)
    (let ((r-pointer (make-instance 'rcl::r-pointer :pointer ptr)))
      (rcl::rf-protect ptr)
       (let ((process ccl:*current-process*))
         (trivial-garbage:finalize r-pointer 
           (lambda () ;;; (princ ".")
             (ccl:process-interrupt process #'rcl::rf-unprotect-ptr ptr))))
      r-pointer))
  (rcl:r-init)
  (reduce #'+ (loop repeat 10000 append (rcl:r "runif" 1)))

It "hangs" (*), unless the number of iterations is low enough (v.g. 1000, the exact threshold seems to depend on the system load).
It works when the (princ ".") statement above is uncommented, and then I can run many more iterations (v.g. 100000) without problems.

My interpretation: CCL has issues if too many process-interruptions are scheduled at the same time. Slowing down the program (by printing) keeps the number of pending interruptions manageable.

Regards,

Carlos

(*) I'm running MacOSX, in Activity Monitor I see that the number of Mach Messages In/Out, System Calls and Context Switches keeps growing. The output of taking a "sample" follows:

Sampling process 69796 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling dx86cl (pid 69796) every 1 millisecond
Process:         dx86cl [69796]
Path:            /Users/ungil/lisp/ccl/dx86cl
Load Address:    0x11000
Identifier:      dx86cl
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  zsh [69790]

Date/Time:       2013-09-07 04:42:02.105 +0200
OS Version:      Mac OS X 10.8.4 (12E55)
Report Version:  7

Call graph:
    2784 Thread_866305   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2784 start  (in dx86cl) + 40  [0x33ef4]
    +   2784 _start  (in dx86cl) + 207  [0x33fc4]
    +     2784 main  (in dx86cl) + 1310  [0x1c52e]  pmcl-kernel.c:2125
    +       2784 func_start  (in dx86cl) + 94  [0x19f79]
    +         2784 ???  (in <unknown binary>)  [0x387fbc]
    +           2784 ???  (in <unknown binary>)  [0x387fbc]
    +             2784 SPffcall  (in dx86cl) + 85  [0x1991d]  x86-spentry32.s:4326
    +               2784 nanosleep  (in libsystem_c.dylib) + 375  [0x96a37c10]
    +                 2784 mach_wait_until  (in libsystem_kernel.dylib) + 10  [0x9906c8e6]
    2784 Thread_866312
    + 2784 thread_start  (in libsystem_c.dylib) + 34  [0x96990d4e]
    +   2784 _pthread_start  (in libsystem_c.dylib) + 344  [0x969a65b7]
    +     2784 lisp_thread_entry  (in dx86cl) + 300  [0x2137c]  thread_manager.c:1676
    +       2784 func_start  (in dx86cl) + 94  [0x19f79]
    +         2784 SPffcall  (in dx86cl) + 85  [0x1991d]  x86-spentry32.s:4326
    +           2784 __read  (in libsystem_kernel.dylib) + 10  [0x9906fdba]
    2784 Thread_867787   DispatchQueue_2: com.apple.libdispatch-manager  (serial)
      2784 _dispatch_mgr_thread  (in libdispatch.dylib) + 53  [0x92b477a9]
        2784 _dispatch_mgr_invoke  (in libdispatch.dylib) + 993  [0x92b47c71]
          2784 kevent  (in libsystem_kernel.dylib) + 10  [0x9906f9ae]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
        __read  (in libsystem_kernel.dylib)        2784
        kevent  (in libsystem_kernel.dylib)        2784
        mach_wait_until  (in libsystem_kernel.dylib)        2784



> On Fri, 6 Sep 2013, Carlos Ungil wrote:
> 
>> Hello,
>> to ensure that finalizers are run from the main thread, I'm doing the
>> following:
>> (setq ccl:*enable-automatic-termination* nil)
>> (ccl::add-gc-hook
>> ?(let ((process ccl:*current-process*))
>> ? ?(lambda ()
>> ? ? ?(ccl:process-interrupt process #'ccl:drain-termination-queue)))
>> ?:post-gc))
>> Is this a good idea? The fact that add-gc-hook is not exported makes me
>> doubt...
>> The context is the following: RCL embeds R in Common Lisp (using CFFI, the
>> rf-protect and rf-unprotect-ptr functions below are foreign functions). When
>> I need to hold a foreign pointer in the lisp side I wrap it in an object: I
>> "protect" it to prevent it from being garbage-collected in the R side and I
>> set up a finalizer to "unprotect" it when it's no longer in use.?
>> (defun make-r-pointer (ptr)
>> ? (let ((r-pointer (make-instance 'r-pointer :pointer ptr)))
>> ? ? (rf-protect ptr)
>> ? ? (trivial-garbage:finalize r-pointer?(lambda () (rf-unprotect-ptr ptr)))
>> ? ? r-pointer))
>> However, a function triggering garbage collection like the following one
>> crashes the program (because the protect and unprotect functions will run
>> concurrently, I think).
>> (ql:quickload :rcl)
>> (rcl:r-init)
>> (reduce #'+ (loop repeat 10000 append (rcl:r "runif" 1)))
>> The workaround mentioned above seems to solve the problem, but I wonder if
>> there might be some undesirable effects.?
>> I also tried to let the automatic termination enabled, forcing the finalizer
>> to run?each time?from the main thread
>> ? ? (trivial-garbage:finalize r-pointer?(let
>> ((process??ccl:*current-process*)) (lambda () (process-interrupt process
>> #'rf-unprotect-ptr ptr))))
>> but I couldn't make it work (the program hangs).
>> Cheers,
>> Carlos
>> 




More information about the Openmcl-devel mailing list