C thread becoming Mu thread (exposed functions, a.k.a. ".expfunc")
This issue is about calling Mu functions from C functions. It is not a problem if Mu initiated the call to the native program and the native program then calls back. But when a fresh native thread (such as one created by pthread_create) directly calls a Mu function, thread-local state (such as GC state) must already have been initialised, or the Mu program will not work properly.
Related spec: https://gitlab.anu.edu.au/mu/mu-spec/blob/master/native-interface.rst#native-functions-calling-mu-functions
Previous issue: #39
The problem
When a Mu thread is executing, some thread-local state must exist to support the execution of Mu IR programs.
For example, if the Mu IR program uses a bump-pointer GC, the "current pointer" is per-thread state, and it should always point to the next available memory. Mu instructions (such as NEW and NEWHYBRID) assume that such thread-local pointers are set up when they are executed.
Such state is usually set up when a Mu thread is created. When a thread is created using the NEWTHREAD instruction or its equivalent API, the micro VM will initialise the state properly.
But the problem arises when the thread is created natively (for example, by pthread_create). Such POSIX functions are not designed with Mu in mind and will not initialise Mu-specific state. So a PThread cannot directly call a Mu function unless some preparation is done.
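The failure mode can be illustrated with a small C sketch. Nothing here is real Mu code: `mutator_state`, `mutator_init` and `bump_alloc` are hypothetical stand-ins for per-thread allocator state, the set-up a micro VM performs for its own threads, and the fast path that NEW might compile to. The allocator returns NULL on missing state only so that the demonstration is safe to run; real compiled code would more likely dereference the null pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread allocator state (not from any real micro VM). */
typedef struct {
    char *cursor;   /* next free byte */
    char *limit;    /* end of this thread's block */
} mutator_state;

/* Every new thread starts with this zero-initialised. */
static _Thread_local mutator_state tls_mutator;

/* The set-up a micro VM would perform for threads it creates itself. */
static void mutator_init(size_t block_size) {
    tls_mutator.cursor = malloc(block_size);
    tls_mutator.limit =
        tls_mutator.cursor ? tls_mutator.cursor + block_size : NULL;
}

/* Bump-pointer fast path that assumes the state has been set up.
 * It reports failure instead of crashing to keep the demo safe. */
static void *bump_alloc(size_t size) {
    if (tls_mutator.cursor == NULL ||
        (size_t)(tls_mutator.limit - tls_mutator.cursor) < size) {
        return NULL;
    }
    void *obj = tls_mutator.cursor;
    tls_mutator.cursor += size;
    return obj;
}

/* A plain pthread: nobody initialised its allocator state. */
static void *thread_without_setup(void *arg) {
    *(int *)arg = (bump_alloc(16) != NULL);
    return NULL;
}

/* A thread that performs the set-up before allocating. */
static void *thread_with_setup(void *arg) {
    mutator_init(4096);
    char *block = tls_mutator.cursor;   /* remember the block to free it */
    *(int *)arg = (bump_alloc(16) != NULL);
    free(block);
    return NULL;
}

/* Runs `body` on a fresh pthread; returns 1 if its allocation succeeded,
 * 0 if it failed, -1 if the thread could not be created. */
static int allocation_succeeds(void *(*body)(void *)) {
    pthread_t t;
    int ok = -1;
    if (pthread_create(&t, NULL, body, &ok) != 0) return -1;
    pthread_join(t, NULL);
    return ok;
}
```

Calling `allocation_succeeds(thread_without_setup)` reports failure, while `allocation_succeeds(thread_with_setup)` reports success, because `pthread_create` knows nothing about `tls_mutator`.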
Current design
Related spec: https://gitlab.anu.edu.au/mu/mu-spec/blob/master/native-interface.rst#native-functions-calling-mu-functions
The current Mu spec requires implementation-defined functions to be called before native threads not created by Mu (such as POSIX threads) can call any exposed Mu functions.
A Mu bundle can define .expfunc top-level definitions to directly expose function pointers to C programs. For example:
.funcdef @fac ... {...}
.expfunc @fac_native = @fac #DEFAULT @I64_0 // expose @fac, default calling convention, use 0 as "cookie".
@fac_native is a raw function pointer. It can be called when Mu calls C and C then calls back into Mu. But when a PThread wants to call @fac_native, it needs implementation-defined set-up.
Possible implementations
- The concrete micro VM can forbid such calls, and enforce that only Mu threads can execute Mu functions.
- The concrete micro VM can extend the API with a function to attach or detach PThreads, or threads using other APIs.
- The concrete micro VM can create Mu-specific thread-local states lazily when entering from native to Mu. Since the only way to enter Mu is via "exposed functions", stubs can be created at those "expfuncs" to lazily check for such states, or use SIGSEGV to trap when such pointers are zero.
Each has its own strengths and weaknesses. This is why this interface is still implementation-defined for now. Real-world experience will tell which method is better.
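As a rough illustration of the second and third options, the following C sketch combines an explicit attach/detach API with a lazily checking stub at the exposed entry point. All names (`vm_attach_current_thread`, `vm_detach_current_thread`, `vm_thread_state`, `fac_body`) are hypothetical and are not part of the Mu API; `fac_native` here is an ordinary C function standing in for the stub a micro VM might generate for an .expfunc.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread VM state. */
typedef struct {
    bool attached;
    int  init_count;   /* how many times this thread ran the set-up */
} vm_thread_state;

static _Thread_local vm_thread_state tls_vm;

/* Option 2: explicit attach/detach calls (hypothetical names). */
static void vm_attach_current_thread(void) {
    if (!tls_vm.attached) {
        tls_vm.attached = true;
        tls_vm.init_count++;
    }
}

static void vm_detach_current_thread(void) {
    tls_vm.attached = false;
}

/* Stand-in for the compiled body of the Mu function @fac. */
static int64_t fac_body(int64_t n) {
    return n <= 1 ? 1 : n * fac_body(n - 1);
}

/* Option 3: the stub behind the exposed pointer checks the thread-local
 * state on entry and initialises it lazily. */
static int64_t fac_native(int64_t n) {
    if (!tls_vm.attached) {
        vm_attach_current_thread();
    }
    return fac_body(n);
}

typedef struct {
    int64_t input;
    int64_t output;
    int     init_count;
} call_record;

/* A plain pthread that calls the exposed function without attaching. */
static void *native_thread(void *arg) {
    call_record *rec = arg;
    rec->output = fac_native(rec->input);
    (void)fac_native(rec->input);          /* second entry: already attached */
    rec->init_count = tls_vm.init_count;
    vm_detach_current_thread();
    return NULL;
}

static call_record call_from_fresh_thread(int64_t n) {
    call_record rec = { n, 0, 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, native_thread, &rec) == 0) {
        pthread_join(t, NULL);
    }
    return rec;
}
```

The trade-off is visible even in this toy version: the lazy stub costs one thread-local check on every native-to-Mu entry, whereas the explicit API moves that cost to the caller and makes forgetting to attach a client bug.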
Multiple micro VMs in the same process?
It is rare for one process to run two micro VMs, but it is definitely possible. For example:
- A C host program provides both Python and Lua as extension languages (real-world applications exist), but both language implementations use the Mu micro VM.
- The client has some kind of sandbox mechanism and forces some part of the program to run in a separate micro VM.
Related works
JNI Invocation API
Related document: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#attaching_to_the_vm
The JVM invocation API provides the AttachCurrentThread function to attach a PThread to a JVM, under the limitation that a native thread cannot be attached to two different JVMs. JNI also requires that the PThread "should have enough stack space to perform a reasonable amount of work" and notes that "The allocation of stack space per thread is operating system-specific. For example, using pthreads, the stack size can be specified in the pthread_attr_t argument to pthread_create."
From Mu's point of view, the MuCtx structure holds Mu states for the client, so calling API functions in MuCtx does not need any attaching. However, calling "exposed Mu functions" will need special set-up like AttachCurrentThread.
JikesRVM
JikesRVM's GC is designed in such a way that it works even if the related thread-local data structure is all zero (as initialised by the system). This gracefully avoids the GC-related part of the problem, but it may not be the most general solution.
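One way such a design can work is sketched below in C. This is not JikesRVM code, only an assumption about the general technique: if the cursor and limit are both zero, the fast-path bounds check fails for any non-zero request and control falls through to the slow path, which acquires a fresh block. The names `bump_state`, `alloc` and `alloc_slow` are hypothetical, `malloc` stands in for the global heap, and alignment and block reclamation are ignored.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump-pointer state, kept as integers so that the
 * all-zero case involves no arithmetic on null pointers. */
typedef struct {
    uintptr_t cursor;
    uintptr_t limit;
} bump_state;

/* All zero in every fresh thread, including plain pthreads. */
static _Thread_local bump_state tls_bump;

enum { BLOCK_SIZE = 4096 };

/* Slow path: fetch a new block and serve the request from its start.
 * Blocks are never freed in this sketch. */
static void *alloc_slow(size_t size) {
    size_t block = size > BLOCK_SIZE ? size : BLOCK_SIZE;
    void *start = malloc(block);
    if (start == NULL) return NULL;
    tls_bump.cursor = (uintptr_t)start + size;
    tls_bump.limit  = (uintptr_t)start + block;
    return start;
}

/* Fast path: with cursor == limit == 0 the available space is 0, so the
 * very first allocation on a thread takes the slow path automatically. */
static void *alloc(size_t size) {
    uintptr_t avail = tls_bump.limit - tls_bump.cursor;
    if (size > avail) return alloc_slow(size);
    void *obj = (void *)tls_bump.cursor;
    tls_bump.cursor += size;
    return obj;
}

/* A plain pthread allocates twice without any explicit set-up. */
static void *fresh_thread(void *arg) {
    char *a = alloc(16);
    char *b = alloc(16);
    /* The second object should directly follow the first one. */
    *(int *)arg = (a != NULL && b == a + 16);
    return NULL;
}

static int zero_state_allocation_works(void) {
    pthread_t t;
    int ok = -1;
    if (pthread_create(&t, NULL, fresh_thread, &ok) != 0) return -1;
    pthread_join(t, NULL);
    return ok;
}
```

The approach needs no extra check on the fast path, but it only covers state for which all-zero is a valid "empty" value, which is consistent with the remark that it is not the most general solution.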
.NET framework
Related documents: https://msdn.microsoft.com/en-us/library/74169f59(v=vs.110).aspx
VM-related thread-local states are created lazily when an unmanaged thread enters the managed runtime.