general-issue-tracker issues
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/71
Benchmarking framework for Mu projects (2018-06-28, Zixian Cai)

We'd want a benchmarking framework to run various tests across different Mu projects (implementations, clients).
This will help us discover bugs or performance problems introduced by commits. It can also facilitate development by showing, for example, the impact of design decisions.
Currently, we have a benchmark suite developed by @u5157779 under https://gitlab.anu.edu.au/mu/mu-impl-fast/tree/master/tests/test_jit
It compares the performance of
- RPython with Mu backend (which can be run on different Mu implementations)
- RPython with C backend
- Hand-written C
- Hand-written Mu (which can be run on different Mu implementations)
Some of these are specific to a certain Mu client (in this case, Mu backend of RPython).
For a more general framework, the following client-neutral aspects can be abstracted out:
- Collecting, storing and processing metrics
- Visualizing results
Then, each project can have its own agent for the framework to invoke for executing tests and collecting results.
cc @U1817699 @u3789498 @u4776528 @u5157779

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/59
API: Automatic creation of MuInstResNode (2018-06-28, Kunshan Wang)

# Problem
Currently instruction results are *added* to the instruction after the instruction is created.
Alternatively, results can be created *when* the instruction is created. When taking this approach, the client will never add the wrong number of results to any instructions.
In theory, the client knows everything the micro VM knows about the IR bundle, so it is always capable of adding the right number of results. Changing the API in the following way will let the micro VM provide slightly more help to the client, but should not affect the performance: the micro VM always has the required information.
Minimalism would be affected if more query APIs (finding the opcode/mnemonic of an instruction, the instruction's parameters, etc.) were added. The client builds the IR and always has full information about the structure of the CFG, so it is always capable of pretty-printing what it is going to build without using any of the Mu API functions. For convenience, however, query APIs could be provided at a higher level by client-side libraries, while the C-based API provides an **efficient** (rather than rich) tool to transfer information from the client to the micro VM. For debugging purposes, the reference implementation can now print the loaded bundle as text.
# Proposed change
Replace the `new_inst_res` API function:
```c
MuInstResNode (*new_inst_res)(MuCtx *ctx, MuInstNode inst);
```
with these:
```c
MuInstResNode (*get_inst_res)(MuCtx *ctx, MuInstNode inst, int index);
int (*get_num_res)(MuCtx *ctx, MuInstNode inst);
```
`get_inst_res` will get the *index-th* result from an instruction. `get_num_res` gets the number of results.
Other API functions do not need to be changed.
# About the number of results
**Can the number of results be determined when constructing the instruction?**
Checklist (*instruction*: *numOfResults*):
- binary ops: 1
- comparing: 1
- conversion: 1
- SELECT: 1
- BRANCH: 0
- BRANCH2: 0
- SWITCH: 0
- CALL: the number of results in the signature
- TAILCALL: 0
- RET: 0
- THROW: 0
- EXTRACTVALUE: 1
- INSERTVALUE: 1
- EXTRACTELEMENT: 1
- INSERTELEMENT: 1
- SHUFFLEVECTOR: 1
- NEW: 1
- NEWHYBRID: 1
- ALLOCA: 1
- ALLOCAHYBRID: 1
- GETIREF: 1
- GETFIELDIREF: 1
- GETELEMIREF: 1
- SHIFTIREF: 1
- GETVARPARTIREF: 1
- LOAD: 1
- STORE: 0
- CMPXCHG: 2 (succ, oldVal)
- ATOMICRMW: 1
- FENCE: 0
- TRAP: as many as the number of type parameters.
- WATCHPOINT: as many as the number of type parameters.
- WPBRANCH: 0
- CCALL: the number of results in the signature. Same as CALL
- NEWTHREAD: 1
- SWAPSTACK:
- 0 if `KILL_OLD`
- n if `RET_WITH<T1 T2 ... Tn>`
- COMMINST: currently all comminsts have a fixed number of return values.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/46
Initialized aggregate values / objects (2018-06-28, John Zhang)

*Created by: eliotmoss*
For the C client, and more particularly for the Java client, we would like to be able to describe an initialized aggregate value. The obvious case is string literals for C. In Java we would like to be able to describe a Java char array for these. A Java array will be a HYBRID object. Not having these means unpleasant work-arounds involving element-by-element initialization by code that we have to arrange to have invoked before the objects could possibly be used.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/45
Multiple return values (2018-06-28, John Zhang)

*Created by: wks*
*As of 2016, this change is already merged in both the spec and the refimpl.*
This issue discusses the design choices of allowing functions and Mu instructions to return more than one value (or a tuple of values).
# Status quo
The one-return-value assumption has had **a chain of causes and effects** on the design of the IR.
First of all, Mu [**functions**](https://github.com/microvm/mu-spec-goto/blob/master/uvm-ir.rest#function-signature-definition) have multiple parameters and exactly one return value, i.e. `f : (P1,P2,P3,...,Pm) -> R`. This is the tradition of C-like languages as well as LLVM. If a function does not return a useful value, it returns `void`.
Following this tradition, all Mu **instructions** return exactly one value, too, e.g. `%rv = CALL <...> @some_func (%a1 %a2 ... %am)`. If more than one value needs to be returned, a function returns a `struct<T1 T2 ... Tm>`. If no useful value is returned, it returns `void`. For example:
```
%result = CMPXCHG WEAK SEQ_CST SEQ_CST <T> %loc %expected %desired
// %result is a struct<T @i1>. The second result indicates if it is successful.
%oldval = EXTRACTFIELD <@cmpxchg_result_T 0> %result
%is_successful = EXTRACTFIELD <@cmpxchg_result_T 1> %result
```
Since all Mu instructions have a single return value, the [SWAPSTACK](https://github.com/microvm/mu-spec-goto/blob/master/instruction-set.rest#swapstack-instruction) instruction shall have exactly one return value, too. That return value is the value received from another stack (or injected by the client). Since a stack also passes a value using the `SWAPSTACK` instruction, **SWAPSTACK takes one parameter**, too, since *the peer only receives exactly one return value*. e.g. in the sender: `SWAPSTACK %receiver ... PASS_VALUE <@T> %val`; in the receiver: `%retval = SWAPSTACK %sender RET_WITH <@T> ...`
Since a swapped-away stack always stops at so-called "**resumption point**" (resumable by swapstack) instructions (CALL, TRAP, WATCHPOINT, SWAPSTACK, CCALL), all of which receive exactly one return value, the **state of a stack** is `READY<T>` when swapped away, where T is a single type -- the return type of the *resumption point*. This makes one important special case: **a newly-created frame** (by [NEWSTACK](https://github.com/microvm/mu-spec-goto/blob/master/instruction-set.rest#newstack-instruction) or [push_frame](https://github.com/microvm/mu-spec-goto/blob/master/uvm-client-interface.rest#on-stack-replacement) (OSR)). Currently, since a function takes multiple parameters but SWAPSTACK passes only one argument, there is no way to supply all arguments during SWAPSTACK. As a compromise, NEWSTACK initialises all arguments and the subsequent SWAPSTACK passes `void`.
As a side effect, **the return value uniquely identifies the instruction**. This brings some benefits: during OSR, the variable that holds the return value can identify the OSR-point instructions (instructions that may be the "current instruction" in a `READY<T>` stack: CALL, TRAP, WATCHPOINT, SWAPSTACK and CCALL, which all happen to be *resumption points*; this is intentional). This is especially useful for **the TRAP instruction**, where the trap handler must first identify which TRAP was triggered. Because of the one-to-one mapping between instructions and return values, the terms "SSA variable of an instruction" and "the instruction" are used interchangeably.
# Proposed change
The change starts with adopting "multiple return values".
First, **functions now return 0 or more values**, i.e. `f : (P1,P2,P3,..,Pm) -> (R1,R2,...,Rn)`. For compatibility with C programs (in the unsafe native interface), a 0-tuple is returned when the C function returns `void`, i.e. either `c_function: (P1,P2,...,Pm) -> (R1)` or `c_function: (P1,P2,...,Pm) -> ()`.
Then **Mu instructions may return 0 or more values**. When no useful values are returned, the instruction returns `()`. The `CALL` instruction has exactly as many return values as the callee's return values. For example:
* `(%oldval %succ) = CMPXCHG WEAK SEQ_CST SEQ_CST %loc %expected %desired`
* `(%rv1 %rv2 %rv3) = CALL <..> @func (%a1 %a2 ...)`
* `RET %rv1 %rv2 %rv3`
* `() = BRANCH %blah(....)` or just `BRANCH %blah(...)`
Then **SWAPSTACK can pass more than one value to the swappee**. The swappee expects to receive a list of values (statically decided at the particular SWAPSTACK instruction). If the swapper does not pass the values required by the swappee, the behaviour is undefined. For example:
```
// receiver:
(%rv1 %rv2 %rv3) = SWAPSTACK %sender_stack RET_WITH <@T1 @T2 @T3> PASS_VALUE <> ()
// sender:
() = SWAPSTACK %receiver_stack RET_WITH <> PASS_VALUES <@T1 @T2 @T3> (%v1 %v2 %v3)
// bad sender:
() = SWAPSTACK %receiver_stack RET_WITH <> PASS_VALUES <@T1 @T2> (%v1 %v2) // ERROR: too few values
```
Since all *resumption point* instructions (CALL, TRAP, WATCHPOINT, SWAPSTACK, CCALL) may receive multiple values (in fact, all instructions can), **the state of a "swapped-away" stack is `READY<T1 T2 ... Tn>`**. When binding a thread to a stack, exactly that many values need to be bound.
Then **the state of a newly created frame is `READY<T1 T2 ... Tn>`** where T1 T2 ... Tn are the parameter types of the function (also the entry block). **NEWSTACK no longer supplies arguments**; instead, **the first SWAPSTACK supplies the parameters to the new frame**. For example:
```
.funcsig @sig = (@T1 @T2 ... @Tm) -> (@R1 @R2 ... @Rn)
.funcdef @f VERSION %v1 <@sig> {
%entry(<@T1> %p1 <@T2> %p2 .. <@Tm> %pm):
...
}
// in another function:
%new_stack = NEWSTACK <@sig> @f
// later
SWAPSTACK %new_stack RET_WITH <...> (...) PASS_VALUES <@T1 @T2 ... @Tm> (%v1 %v2 ... %vm)
```
Now **not all instructions are identifiable**. If an instruction does not return a value, e.g. a trap that does not expect values from the client (which is the most common case): `TRAP <> KEEPALIVE(...)`, then there is no identifier for such instructions. As a compromise, explicit names can be given to instructions using a slightly different syntax:
```
() = [%the_important_trap] TRAP <> KEEPALIVE (%local %vars)
// or simply
[%the_important_trap] TRAP <> KEEPALIVE (%local %vars)
// it also works if there are return values:
(%rv1 %rv2) = [%the_important_call_site] CALL <@sig> @func (%arg1 %arg2)
```
And **SSA variables** and **instructions** are now completely divorced. An instruction has **a list of results**, each of which is an SSA variable. The new SSA variable hierarchy is:
- SSA variable
- global variable
- constant, global cell, function, exposed function
- local variable
- parameter
- one result of an instruction (was just "instruction")
This change also undoes https://github.com/microvm/microvm-meta/issues/43, a previous proposal to make void more like "unit". Now we have the real "unit" return value: `()`, and the `void` type has only two use cases:
* `hybrid<void T>` for hybrids without a fixed part, and
* `ref<void>`, `iref<void>`, `weakref<void>` and `ptr<void>`, for references/pointers to "anything".
# Coroutine Abstractions in High-level Languages
## Similar to the old Mu model
Python's generator is similar to the old model. All arguments are supplied when the generator is created. The first `g.send()` must send `None`.
```python
def gen(a, b, c):
    print("Hello!", a, b, c)
    d = (yield)
    print("d = ", d)
    return

g = gen(1, 2, 3)  # initialises a, b and c
g.send(None)  # must be None, otherwise TypeError: can't send non-None value to a just-started generator
# prints Hello! 1 2 3
try:
    g.send(4)  # received by d. The generator throws StopIteration to the main coroutine.
    # prints d = 4
except StopIteration as e:
    print("bye")
```
If the new model is desired, the program can be rewritten as:
```python
def gen():
    def actual_gen():
        a, b, c = (yield)
        print("Hello!", a, b, c)
        d = (yield)
        print("d = ", d)
        return
    ag = actual_gen()
    ag.send(None)
    # now ag stops at the first yield
    return ag

g = gen()  # just creates the actual_gen
g.send((1, 2, 3))  # received by a, b, c at the first yield. a, b, c are not really parameters.
# prints Hello! 1 2 3
try:
    g.send(4)  # received by d. The generator throws StopIteration to the main coroutine.
    # prints d = 4
except StopIteration as e:
    print("bye")
```
## Similar to the new model
Lua's coroutine model is similar to the new model. Arguments are supplied at the first `coroutine.resume`.
```lua
function gen(a, b, c)
    print("Hello", a, b, c)
    local d = coroutine.yield()
    print("d = ", d)
    return
end

co = coroutine.create(gen)
coroutine.resume(co, 1, 2, 3) -- initialises a, b, c. If fewer arguments are supplied, the remaining parameters receive nil
-- prints Hello 1 2 3
coroutine.resume(co, 4) -- received by d
-- prints d = 4
```
If the old model is desired, the code can be rewritten as:
```lua
function make_gen(a, b, c)
    local function actual_gen(a, b, c)
        coroutine.yield()
        print("Hello", a, b, c)
        local d = coroutine.yield()
        print("d = ", d)
        return
    end
    local coro = coroutine.create(actual_gen)
    coroutine.resume(coro, a, b, c) -- immediately initialises a, b, c as the parameters of actual_gen
    return coro
end

co = make_gen(1, 2, 3)
coroutine.resume(co) -- it was stopped at its first empty yield() point
-- prints Hello 1 2 3
coroutine.resume(co, 4) -- received by d
-- prints d = 4
```
In conclusion, both models of swapstack are equivalent: each can mimic the other with a few initial swapstack operations.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/42
Redesign the undefined function handler API (2018-06-28, John Zhang)

*Created by: wks*
# Current status
Currently there are two handlers: *trap handler* and *undefined function handler*. They are different.
In trap handlers:
* Thread is temporarily unbound from the stack, and is rebound after trap handling.
* Stack is unbound and is ready for introspection.
* OSR (pop frame, push frame) is possible.
In undefined function handlers:
* Thread is, conceptually, still executing the CALL instruction that calls the undefined function.
* The stack is in the RUNNING state, not available for introspection or OSR.
* The client must define the function, then the CALL will be retried; i.e. if the client does not define the function, it will loop forever.
This asymmetry is complicating the interface.
1. The undefined function handling API **leaves no room for errors in loading the bundle**. If, in Java, the client puts a stub for an unloaded method, and there is an error when loading the class, then the client has no option other than "defining" the method as "doing nothing but throwing an exception". This isn't very nice.
2. It leaves the stack and the thread in a strange state: they cannot continue, but still "RUNNING".
3. Introspection and OSR are impossible.
# Proposed change
I propose unifying the two interfaces.
Let undefined functions behave like:
```
.funcdef @undefined_function VERSION noversion <sig> (a1 a2 a3 ...) {
%entry:
TRAP <void> KEEPALIVE (a1 a2 a3...)
undefined behaviour after TRAP
}
```
That is, an undefined function behaves like executing a trap with all parameters as keep-alive variables.
Add an extra API call:
```
MuID (*cur_func)(MuCtx *ctx, MuStackRefValue stack, int frame);
```
It returns the ID of the function of the current frame. It works even if the function is not defined. It returns 0 if the frame is native.
Modify the `cur_func_ver` API call: It returns 0 when the frame is a frame without version (undefined function).
So a client can identify an undefined function by "cur_func != 0 && cur_func_ver == 0". FYI:
* ``cur_func != 0 && cur_func_ver != 0``: a defined Mu function
* ``cur_func != 0 && cur_func_ver == 0``: an undefined Mu function
* ``cur_func == 0 && cur_func_ver == 0``: native frame
* ``cur_func == 0 && cur_func_ver != 0``: impossible
Modify the `cur_inst` API call: It returns 0 if the frame is just created (before the first instruction), is native, or *if the function is not defined*.
Modify the `dump_keepalives` API call: It dumps keep-alive variables for OSR point instructions, or *the arguments to an undefined function*. Cannot be used on native frames or a just-created frame.
The `pop_frame` and `push_frame` APIs behave as before. Specifically, for undefined functions, popping an undefined function frame reveals its caller at the top of the stack. After loading a bundle that defines the function, a `push_frame` can re-create the callee frame, but this time the callee is defined. The client can also choose not to rebind to the same stack, but swap away. Or just throw an exception (probably `ClassNotFoundException`) to the caller without defining the callee.
# About the NEWSTACK instruction
Creation of the stack will be successful, but it will trap when executing the undefined function after binding the stack to a thread.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/41
Checklist for the first final version of the specification (2018-06-28, John Zhang)

*Created by: wks*
- [X] HAIL #29
- [X] Unsafe native interface. #24
- [X] API in C (header)
- [X] API in C (documented)
- [X] API as Mu instructions (common instructions). #36
- [X] Adjust the undefined function API
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/36
Metacircular Client Interface (2018-06-28, John Zhang)

*Created by: wks*
It has been proposed long ago that "everything the API can do should also be possible in the Mu IR".
This issue maintains a checklist of features not in the Mu IR. These functions should be gradually added to the IR.
**Features in the API but not the IR**:
* bundle loading
* stack introspection: `current_func_ver`, `current_instruction`, `dump_keepalives`
* OSR: `pop_frame`, `push_frame`
* Handle traps and undefined functions in Mu IR. It depends on how the Mu VM itself is implemented.
* Handle watchpoint in Mu IR: an instruction which is a no-op when disabled, but a (maybe limited kind of) Mu function call when enabled.
**Things that can be done dynamically via handles in the API, but can only be done statically in the IR**. I am not sure how dynamic Mu should be, or needs to be, because some of the items below can be worked around with some Mu IR code, such as maintaining a hash table implemented in Mu IR, or writing wrapper functions.
* Opaque handle type: a handle that holds **any** Mu value. `ref<T>` can be a candidate for this purpose.
* Getting a Mu constant value (including constants, globals and functions) by its ID. This is a kind of introspection.
* This can be done with a type argument: `%c = GET_CONST <@T> %id`, `%c` is a `@T`
* Or return a handle: `%ch = GET_CONST_H %id`, `%ch` is a `ref<void>`
* Creating Mu heap objects by a type ID.
* Calling a Mu function (or constructing a Mu frame) with both the callee and the arguments as handles. This allows calling a Mu function with a run-time-determined arity and arg types.
* Dump keepalive variables.
* This can be done with a type argument: `%sixth_ka = GET_KEEPALIVE <@T> %frame 6`
* or return handle: `%sixth_ka_handle = GET_KEEPALIVE_H %frame 6`
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/34
Extra types for the Native Interface (2018-06-28, John Zhang)

*Created by: wks*
Philosophy: There should be a subset of Mu types and instructions that can do what C can do. It should be possible to implement the C programming language in this subset of Mu while still being able to access memory in the way specified by the platform's ABI (i.e. be compatible with "good" native programs).
# Types
## Pointer types
* `ptr<T>`: A memory pointer to type `T`. (Is there a better name? A pointer always points to somewhere in the memory. Maybe "data pointer" or "value pointer"? In C, it is object pointer, but "object" has a different meaning in Mu.)
* `funcptr<sig>`: A function pointer to a function with signature `sig`
Pointers are addresses. They can be cast to and from integer values by interpreting the integer as the address. Mu does not check the validity of this cast.
`ptr<T>` can be used by the memory addressing instructions: `GETFIELDIREF`, `GETELEMIREF`, ... work on `ptr<T>` just as they do on `iref` types. Memory access instructions can work with `ptr<T>` with a `PTR` flag:
```
// %p is ptr<int<64>>
%result1 = LOAD PTR ACQUIRE <@i64> %p
STORE PTR RELEASE <@i64> %p @const1
%result2 = CMPXCHG PTR SEQ_CST SEQ_CST <@i64> %p @const1 @const2
%result3 = ATOMICRMW PTR SEQ_CST ADD <@i64> %p @const3
```
`funcptr<sig>` can be called with the `CCALL` instruction:
```
// assume @write is funcptr<@size_t (@i32 @voidptr @size_t)>
%result = CCALL C <@sig> @write (%fd %buf %sz) // C means the "C" calling convention
```
## Union type
I think there is a way to introduce the union type from C without compromising the safety of Mu's reference types.
Define the union type as: `union<T1 T2 T3 ...>`
`T1`, `T2`, `T3`, ... are its members. The members of a union type cannot contain `ref`, `iref`, `weakref`, `func`, `thread`, `stack` or `tagref64` types as they are either object references or opaque references. However, `ptr` and `funcptr` are allowed.
`union` must be in the memory. It cannot be the type of an SSA variable. It does not make sense: union is a, err..., "union" of several types (no puns intended), but an SSA variable holds exactly one type.
> One may argue that "I want to LOAD a union and STORE to another location without looking into it, so I need union to be an SSA variable". However, for data transfer, there could be a `memcpy`-like instruction that can copy large structures efficiently. So it is unnecessary.
When allocated in the Mu memory, its initial value is all zeros: If any member is loaded before another value is stored into it, the result is always the "zero value" of that type (int 0, fp +0.0, ref NULL).
A union only holds the latest stored member:
* if a load is **not atomic**, and there is only one visible store to a member of the union, then
* if the store accesses the same member as the load, the load gets the value of that store;
* if the store accesses a different member, the load instruction has undefined behaviour.
* Union members cannot be accessed atomically.
> I am still uncertain how the C memory model plays together with unions. C11 defines a union as "an overlapping set of member objects" and "When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values." This implies that storing into one member of a union has the side effect of modifying other members.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/30
Mu Loadable Format (MuLF) (2018-06-28, John Zhang)

*Created by: wks*
This proposal describes an extended code delivery unit of the Mu VM.
# Rationale
A "standalone Mu IR" (if there is such a thing) needs more than a code bundle to run. It also needs:
* A way to allocate and initialise heap objects at load time (addressed by the HAIL format. See https://github.com/microvm/microvm-meta/issues/29 )
* Embedded binary native programs (for example, native libraries, a native client, or even the Mu VM itself)
* Static dependencies to other units of loading (MuLF files).
* (optionally) An entry point to start execution.
Existing mechanisms can perform all of the above because the client has total control over the Mu VM. This proposal only gives a "standard" format to do so.
# Proposal
This new unit of loading is called **Mu Loadable Format (MuLF)**.
## The MuLF file sample
This proposal uses XML as the human-readable format. It is obviously not ideal.
```xml
<mulf>
<dependency kind="mulf" name="uvm.std.io" />
<dependency kind="native" name="libc.so" />
<muir-bundle> <!-- the code section --> <![CDATA[
.typedef @i64 = int<64>
.typedef @i8 = int<8>
.typedef @string = hybrid<@i64 @i8>
.typedef @ref_string = ref<@string>
.typedef @array_ref_string = hybrid<@i64 @ref_string>
.typedef @ref_array_ref_string = ref<@array_ref_string>
.const @I64_0 <@i64> = 0 // to be initialised in the next section
.global @helloWorld <@ref_string>
.funcsig @main_sig = @i64 (@ref_array_ref_string)
.funcdef @main <@main_sig> (%args) {
%entry:
%hw = LOAD <@ref_string> @helloWorld
CALL <@println_sig> @println (%hw)
RET <@i64> @I64_0
}
]]>
</muir-bundle>
<heap-initialise format="hail"> <!-- the "heap section" --> <![CDATA[
.newhybrid $hw_obj <@string>
.init $hw_obj = {12, {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!'}}
.init @helloWorld = $hw_obj // Assign this object to the global cell @helloWorld
]]>
</heap-initialise>
<binary kind="native-code">
...
</binary>
<binary kind="native-data">
...
</binary>
<initialiser function="@main" synchronous="true">
<param kind="cmdline-args-as-standard-string-ref-array" />
</initialiser>
</mulf>
```
## MuLF Clients
As in other programming languages, this format is handled by a **MuLF client**. This keeps the core Mu VM minimal.
* The Mu micro VM provides the Mu Client API, which can load [**Mu IR bundles**](https://github.com/microvm/microvm-spec/wiki/uvm-ir) and [**HAIL files**](https://github.com/microvm/microvm-meta/issues/29), as well as mechanisms to create Mu stacks and Mu threads.
* The MuLF client handles the MuLF format. The client loads and parses the MuLF file, invokes the API calls to load the included bundle and the included HAIL file, creates stacks and threads to execute the initialisation functions and perform necessary synchronisation to make sure the initialisation functions finish before "other parts" (the meaning depends on the concrete high-level language) can run.
## MuLF sections
* **dependencies**: references to other MuLF files or native libraries
* **uir-bundle**: a Mu IR bundle
* **heap-initialise**: a HAIL file which initialises the heap
* **binary**: embedded binary data
* **initialiser**: a function to be executed after loading the MuLF bundle
## The loading process
The MuLF client shall process dependencies before loading the current MuLF file.
TODO: The Mu IR is designed not to allow circular dependencies (just put multiple Mu IR bundles into one big bundle so circular types and function calls can be resolved). But whether MuLF allows circular dependencies is an open question.
Then the MuLF client loads the binary section, then the uir-bundle section, then the heap-initialiser section.
Finally the MuLF client executes the initialisers in the order they are declared. Each initialiser is executed on a new Mu stack and a new Mu thread. If an initialiser is marked as "synchronous", the loading process pauses and waits for the initialiser function to return. But this does not prevent the initialisers from triggering traps to the client, which may result in other Mu IR bundles or MuLF files being loaded.
Then the loading process finishes. The Mu VM will continue executing until the last Mu thread is killed.
## The relation between the MuLF client and the higher-level language client
The MuLF client can be considered as a "client-level library" which helps a higher-level client which implements a language.
Conversely the higher-level client can be considered as a library in the MuLF "framework": implementing language-specific things reactively as "call-backs" from the MuLF client.
# ELF compatibility
Using the standard ELF format would bring many benefits, including making use of existing system facilities. It is possible to make a MuLF file a self-contained executable file.
# Open questions
**Do we standardise this MuLF as part of the Mu VM specification?**
Probably yes. It should provide a standard way to load "something more than a code bundle". However, this MuLF format is oriented more towards the traditional ahead-of-time "linker-loader" model than the JIT compiling model. Complex languages (like Java) may wish to precisely control their loading process (e.g. loading more than one circularly-related class, submitting them in one huge Mu IR bundle, and initialising heap objects together). In such cases, MuLF is less useful than it is for ahead-of-time compiled programs.
We may make MuLF an "optional component" of the Mu VM specification. Some very tiny Mu implementations may not have it, but anyone who claims to implement MuLF shall do it in a standard-compliant way.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/26
Tutorial and Spec versioning (2018-06-28, John Zhang)

*Created by: wks*
I am evaluating http://readthedocs.org/ . A µVM tutorial is being written.
Jekyll seems strongly biased towards blogging, and the structures (including a table of contents and a per-page sidebar of the outline) are not automatically handled.
There should be a common version scheme for the spec, the refimpl and the tutorial. I propose using the major version "2" for the current version. The tutorial will target the "main" version (currently "2").
GitHub Wiki sucks (very primitive structure support, no table of contents). Consider migrating the specification (https://github.com/microvm/microvm-spec/wiki) back to Sphinx/reStructuredText if readthedocs.org proves useful.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/22 — Trap and OSR (2018-06-28T23:11:32+10:00, John Zhang)
*Created by: wks*
This ticket tracks the design of trap handling, stack introspection and OSR API.
# Overview
There are three flavours of stack usage.
The first case is using TRAP to compute some value (or change some µVM state).
1. Enter the `handle_trap` call-back from a TRAP instruction, leaving the stack in the `READY<T>` state. The current thread is unbound and suspended.
2. The Client computes some value (of type `T`) and changes some µVM states.
3. Return from the trap and continue normally. That is, re-bind the thread to the stack, passing the value of type T.
The second case is using TRAP for OSR.
1. Enter the `handle_trap` call-back from a TRAP instruction, leaving the stack in the `READY<T>` state. The current thread is unbound and suspended.
2. The Client queries the current function version, the current instruction, and the current KEEPALIVE variables of any frame in the current stack.
3. The Client pops frames. Now the stack is in some inconsistent state.
4. The Client pushes frames. For each frame, it supplies the function version, the instruction at which to continue, and the values of any live variables.
5. The Client re-binds the thread to the stack and returns from the TRAP. Just before rebinding, the stack should be in some `READY<U>` state, where U may differ from T.
The third case is to manipulate some arbitrary stack in a `READY<T>` state.
1. The Client does whatever it wants to the stack.
2. The stack is in `READY<U>` state where U may not be T.
# Open questions
## Is an UNDER_CONSTRUCTION flag needed?
As observed from the previous cases, there are generally two categories:
1. Do not perform OSR and simply return with some value (or throw exceptions to the stack).
2. Perform OSR.
The second case may leave the stack temporarily in an inconsistent state. Any attempt to swap-stack to such a stack is meaningless. The UNDER_CONSTRUCTION flag indicates such a state.
This requirement can be interpreted in two ways:
1. This flag is a physical flag. The Client takes an action to set the flag, and clears it after OSR. The flag can be probed and can be tested during SWAP-STACK. A µVM implementation may guard swap-stack with a mutual-exclusion lock (which may be inefficient).
2. This flag is only conceptual, that is, it does not physically exist. The Client simply does OSR. There is no way to see whether a stack is "under construction". Swapping to such a stack gives undefined behaviour.
I prefer the second approach. Swapping to an "under construction" stack is never meaningful and always requires extra synchronisation in the program. We may trust the Client to generate correctly synchronised code.
## What state is a stack in when some frames are popped?
All frames other than the top frame must be executing the CALL instruction. After popping any frame, the "caller frame" is exposed as the top frame and it may continue with a value or receive an exception just like the TRAP instruction. So it is natural to define that after popping, the stack is in the `READY<R>` state where R is the return type of the current function.
However, from the implementation point of view, SWAP-STACK must have a different calling convention from ordinary calls (mainly because SWAP-STACK cannot have any callee-saved registers because the callee may not swap back). There must be a "ghost frame" above the current frame with CALL to adapt to the SWAP-STACK calling convention. The value passed by SWAP-STACK will be returned from the "ghost frame" to the CALLer.
We may assume adding the "ghost frame" is cheap. Maybe not.
# Hypothetical Client code in Java
This code lets the Client perform some computation.
```java
/* Assume the following µVM IR code:
%bb:
@current_time_millis_1234567 = TRAP <@i64>
CALL <...> @print (@current_time_millis_1234567)
....
*/
class Client extends MicroVMClient {
    @Override
    public TrapReturnValue handleTrap(ClientAgent ca, int threadHandle, int stackHandle) {
        long time = System.currentTimeMillis();
        int timeHandle = ca.putLong("@i64", time); // wrap the value in a µVM handle
        return new RebindThreadPassValue(stackHandle, timeHandle);
    }
}
```
This code replaces the top frame:
```java
class Client extends MicroVMClient {
    @Override
    public TrapReturnValue handleTrap(ClientAgent ca, int threadHandle, int stackHandle) {
        // Introspect the frames
        int curInstID = ca.getCurrentInstruction(stackHandle, 0); // 0 = top frame
        int[] keepAlives = ca.dumpKeepAlives(stackHandle, 0); // 0 = top frame
        // Re-compile the function. newFunc also tells the Client where to continue.
        HighLevelFunction newFunc = compileNewFunction(...);
        // Pop a frame
        ca.popFrame(stackHandle);
        // Which µVM function is the new high-level function?
        int funcID = newFunc.getUvmFuncID();
        // Where to continue?
        int contInstID = newFunc.getContinuationPoint();
        // What are the values of local variables?
        Map<Integer, Integer> variableToValue = new HashMap<Integer, Integer>();
        for (LocalVariable lv : newFunc.localVariables()) {
            int valHandle = ca.putXxxx(lv.getValue());
            int varID = lv.getUvmVarID();
            variableToValue.put(varID, valHandle);
        }
        // Push the frame
        ca.pushFrame(funcID, contInstID, variableToValue);
        // Return from TRAP, telling the µVM to re-bind the thread to the stack. The trap does not receive a value.
        return new RebindThreadPassVoid(stackHandle);
    }
}
```
This example emulates the JVMTI function `ForceEarlyReturnInt`: force a function (with an `int` return value) to return early with a specific value, without executing any `finally` clauses.
```java
/*
Assume the following µVM function:
.funcsig @foo_sig = @i32 ()
.funcdef @foo VERSION @foo_v1 () {
%entry:
@my_trap_xxxxxx = TRAP <@void>
THROW @NULLREF
}
*/
class Client extends MicroVMClient {
    @Override
    public TrapReturnValue handleTrap(ClientAgent ca, int threadHandle, int stackHandle) {
        ca.popFrame(stackHandle); // Pop the top frame and expose its caller at the top
        int returnValue = 42;
        int rvHandle = ca.putInt("@i32", returnValue);
        return new RebindThreadPassValue(stackHandle, rvHandle);
    }
}
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/21 — Proposed Lua-like µVM-Client Interface (2018-06-28T23:11:32+10:00, John Zhang)
*Created by: wks*
In Lua, the C program exchanges values with a "Lua state" using a stack.
1. All Lua values are kept on the stack and can be converted to and from C values on demand.
2. All Lua references must be kept on the stack. All operations involving tables require the table operand to be on the stack.
Why?
1. The type systems of Lua and C are different. This stack segregates all Lua types from C types.
2. Lua uses garbage collection. (The official Lua uses mark-sweep.) This stack simplifies GC by preventing the C program from keeping a reference to the Lua world.
Reference: The stack in the Lua C API http://www.lua.org/pil/24.2.html
# Overview
The Client interacts with the µVM via a µVM Client Agent. The Agent keeps a stack of µVM values, a thread-local allocator and so on. It is the counterpart of a µVM thread, albeit working for the Client. Each Agent is accessible from only one Client thread (it is not thread-safe), but a µVM Client may have an arbitrary number of Agents.
There are several principles:
* The stack holds any µVM value that can be held in a µVM SSA variable. A cell in the stack is like a µVM SSA variable. Unlike the µVM memory, it does not have a memory location and cannot be referred to.
* The µVM provides implementation-specific ways to add Client values onto the Agent stack and to extract µVM values back as Client values.
* All µVM API messages that take µVM values as parameters shall use existing µVM values on the stack. All µVM API messages that return µVM values shall push new values on the stack.
Example (Java as Client):
```java
MicroVM mvm = …;
ClientAgent ca = mvm.newClientAgent();
// push some values
ca.pushInt("@i32", 0x12345678);
ca.pushLong("@i64", 0x123456789abcdef0L);
ca.pushFloat(3.14F);
ca.pushDouble(6.28);
// The Java integer type and the µVM type do not need to match.
// The following pushes convert Java integer types to µVM types of different lengths.
ca.pushInt("@i64", -0x22334455);
ca.pushLong("@i32", 0x123456789abcdef0L); // truncated to 0x9abcdef0
ca.pushInt("@i1", 1); // integer of 1 bit (boolean type)
ca.pushBool("@i1", true); // same as above.
// Retrieving values from the stack
int v1 = ca.toInt(-4); // Get the 4th element from the top and convert to Java int: -0x22334455.
long v2 = ca.toLong(-4); // Same element, converted to Java long (zero extended): 0xddccbbabL.
// Popping
ca.pop(8); // Pop the 8 elements pushed above.
// Memory access
ca.pushGlobal("@global_var"); // Get the internal reference of a global cell and push on the stack
ca.load(MEMORD_ACQUIRE); // Load. Assume the top element is an internal reference.
ca.pushGlobal("@global_var");
ca.pushInt("@i64", 42);
ca.store(MEMORD_RELEASE); // Store. The top element is the new value and the second is the internal reference.
// Calling a µVM function
// This is fairly complicated because this involves creating both a µVM stack and a µVM thread.
ca.pushFunc("@some_func"); // Push a function reference
ca.pushInt("@i32", 42); // Push argument1
ca.pushDouble(3.14); // Push argument2
try {
    ca.newStack(2); // Create a new stack, using a function with 2 arguments on the stack.
                    // So the top 2 elements are arguments and the third element is the function itself.
                    // Those elements are popped and a new stack value is pushed.
} catch (MicroVMStackOverflowException e) {
    ...
}
ca.newThread(); // Create a new thread. Assume the top element on the stack is a µVM stack value.
```
# API functions
The principles are
+ Pushing operations add new values to the top.
+ The popping operation removes values from the top.
+ Queries (the operations that convert values back to Java values) are non-destructive. They keep the values.
+ Operations on the top of the stack are destructive: they pop the operands, like the JVM, then push new values on the top.
## Getting the µVM Client Agent
* message: `new_client_agent`
* parameters: none
* returns: a handle of new client agent
Create a new Client Agent.
Example Java signature: `public ClientAgent MicroVM#newClientAgent()`
* message: `close_client_agent`
* parameters:
1. `ca`: the handle to the client agent
* returns: none
Close the Client Agent.
Example Java signature: `public void ClientAgent#close()`
## Pushing new values to the stack
* message: `push_value`
* parameter:
1. `uvm_type`: the µVM type of the value
2. `val`: the value in the Client's representation
* returns: none
* stack top:
+ before: ...
+ after: ..., `val_in_uvm_type`
Convert the Client value `val` to the µVM type `uvm_type` and push it onto the stack. This message only works for non-reference values, including integers and floating point numbers.
The µVM may implement this as multiple functions/methods that best suit the Client programming language.
Example Java signatures:
* `public void ClientAgent#pushInt(int uvmTypeID, int val)`
* `public void ClientAgent#pushLong(int uvmTypeID, long val)`
* `public void ClientAgent#pushBigInteger(int uvmTypeID, BigInteger val)`
* `public void ClientAgent#pushFloat(int uvmTypeID, float val)`: will truncate/extend to the µVM type
* `public void ClientAgent#pushDouble(int uvmTypeID, double val)`: will truncate/extend to the µVM type
* `public void ClientAgent#pushFloatNoType(float val)`: always converts to the µVM `float` type
* `public void ClientAgent#pushDoubleNoType(double val)`: always converts to the µVM `double` type.
* message: `push_global`
* parameters:
1. `global`: the ID/name of a µVM global cell
* returns: none
* stack top:
+ before: ...
+ after: ..., `global_val`
Push an internal reference of a global cell to the stack.
Example Java signature: `public void ClientAgent#pushGlobal(int uvmGlobalID)`
* message: `push_func`
* parameters:
1. `func`: the ID/name of a µVM function
* returns: none
* stack top:
+ before: ...
+ after: ..., `func_val`
Push a function reference (µVM's `func` type) of a µVM function to the stack
Example Java signature: `public void ClientAgent#pushFunc(int uvmFuncID)`
## Converting to Client types
* message: `to_client_value`
* parameters:
1. `pos`: the position in the stack
* returns: the µVM value in the client type
* stack top: not changed
Convert a value in the stack to the client type. This applies for non-reference types including integers and floating point numbers.
The µVM may implement this as multiple functions/methods that best suit the Client language.
Example Java signatures:
* `public int ClientAgent#toInt(int pos)`
* `public long ClientAgent#toLong(int pos)`
* `public BigInteger ClientAgent#toBigInteger(int pos)`
* `public float ClientAgent#toFloat(int pos)`
* `public double ClientAgent#toDouble(int pos)`
## Popping
* message: `pop`
* parameters:
1. `num`: the number of values to pop
* returns: none
* stack top:
+ before: ..., `elem_1`, `elem_2`, ..., `elem_num`
+ after: ...
Pop `num` elements from the stack.
Example Java signature: `public void ClientAgent#pop(int num)`
## Memory Allocation
* message: `new`
* parameters:
1. `type`: the ID/name of the µVM type of the object
* returns: none
* stack top:
+ before: ...
+ after: ..., `ref`
Allocate an object of type `type` on the µVM heap and push the object reference on the stack.
Example Java signature: `public void ClientAgent#newObj(int typeID)`
* message: `new_hybrid`
* parameters:
1. `type`: the ID/name of the µVM type of the object
* returns: none
* stack top:
+ before: ..., `len`
+ after: ..., `ref`
Allocate an object of type `type`, which must be a `hybrid` type, on the µVM heap and push the object reference onto the stack. The length of the variable part is `len`, which may be any µVM integer type, zero-extended to the machine word length.
Example Java signature: `public void ClientAgent#newHybridObj(int typeID)`
## Memory Access
* message: `load`
* parameters:
1. `memord`: the memory ordering
* returns: none
* stack top:
+ before: ..., `iref`
+ after: ..., `val`
Load from an internal reference `iref` on the stack and push the loaded value to the stack, using the `memord` memory ordering.
Example Java signature: `public void ClientAgent#load(MemoryOrder memOrd)`
* message: `store`
* parameters:
1. `memord`: the memory ordering
* returns: none
* stack top:
+ before: ..., `iref`, `new_val`
+ after: ...
Store `new_val` on the stack into an internal reference `iref` on the stack, using the `memord` memory ordering.
Example Java signature: `public void ClientAgent#store(MemoryOrder memOrd)`
## Stack and Thread operations
* message: `new_stack`
* parameters:
1. `nparams`: the number of parameters to the stack-bottom function
* returns: none
* stack top:
+ before: ..., `func`, `arg_1`, `arg_2`, ..., `arg_nparams`
+ after: ..., `stack`
Create a new stack using `func` as the stack-bottom function and `arg_x` as its arguments. Push the newly created `stack` value to the stack.
Example Java signature: `public void ClientAgent#newStack(int nParams)`
* message: `new_thread`
* parameters: none
* returns: none
* stack top:
+ before: ..., `stack`
+ after: ..., `thread`
Create a new thread which is initially bound to a stack `stack`. Push the thread value on the Client Agent stack. The new thread `thread` starts execution immediately.
Example Java signature: `public void ClientAgent#newThread()`
## Other API functions
TODO: define them later
* `is_int`, `is_float`, `is_ref`, `is_iref`, ... `is_stack`, `is_thread`, `is_tagref64`
* `tagref64_is_int`, ... `tagref64_get_ref`, ..., `tagref64_set_fp`... : manipulate the `tagref64` type.
* `copy_value`, `remove_value`: manipulate the Client Agent stack.
* `extract_field`, `insert_field`: manipulate `struct` types
* `extract_element`, `insert_element`: manipulate `vector` types
* `get_iref`, `get_field_iref`, `get_elem_iref`, `shift_iref`, `get_fixed_part_iref`, `get_var_part_iref`: manipulating reference types.
* `get_current_stack`, `kill_stack`, `bind_thread_to_stack`: advanced thread/stack operations
* `get_active_func_version_id`, `get_current_instruction_id`, `dump_keepalive_variables`, `pop_frame`, `push_frame`: for OSR
# Known Issues
This API assumes a stack which can contain ANY µVM type that is applicable to SSA variables. This makes the stack dynamically typed. This works well for Lua because Lua is dynamic and has a small set of types (only nil, boolean, number, string, table, function, userdata, ...), all of which have similar sizes.
The main problem for the µVM arises with `struct` type values (especially large structs, which are better represented as references to heap objects than as values). Some corner cases include the "complex number" type, which can be represented as a struct of two doubles. In any case, extra type information must be kept for the stack to know the type of all of its elements.
I have to trust the µVM implementation to handle the dynamic typing efficiently.
spec-2
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/18 — How to represent merging of variables / Phi functions (2018-06-28T23:11:32+10:00, John Zhang)
*Created by: eliotmoss*
This is an update of a proposal from a year ago.
The current Mu definition of SSA-form has labels, branches to labels, and Phi functions just after labels.
I propose an alternative, which we might call "goto with values". It is similar to continuation passing style except that all the continuations are statically defined (one for each label). This alternative would work as follows:
* Each label would have zero or more (SSA-variable : type) pairs. These SSA-variables are local to the block that starts at the label.
* Each branch label (both labels in a conditional branch, the single label in an unconditional branch, etc.) would list the locally visible values being "sent" to the branch target.
Example:
```
x = 3;
y = 25;
goto l(x, y);
...
l: (x2:int<32>, y2:int<8>)
```
This form makes it clear that the association ("assignment") of values to phi-variables has to happen as part of the control transfer. Phi-functions are simply one way of representing that, but they're not a way that seems immediately helpful for code generation.
I further observe that the current definition apparently allows any value to be passed in to a Phi, but I think it should be restricted to global/constant SSA variables or SSA variables defined in the sending block. The new form perhaps makes that clear, not least because with it you can explicitly disallow referring to local SSA variables defined in other blocks.
Putting this another way, non-global SSA-variables have a scope only from where they are defined to the end of their block. Some may be defined at the label that starts the block and others may be defined later, but all live values must be mentioned at branches and explicitly passed on.
Compared with traditional SSA form this may appear verbose. However it has at least a couple of advantages:
* A code generator or optimizer need not perform a live variable analysis to know what is live at a given point in the code.
* Longer live ranges are broken up into smaller ones, which may allow better register allocation, and which is very suited to a coalescing register allocator.
* This form appears simpler to deal with in a formal specification of Mu.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/16 — Undefined vs Implementation-defined (2018-06-28T23:11:32+10:00, John Zhang)
*Created by: wks*
NOTE: a higher-level discussion is in https://github.com/microvm/microvm-meta/issues/17
During the meeting in 23 September 2014, we talked about the difference between "undefined behaviour" and "implementation-defined behaviour".
# Background
Some operations in C, as well as in LLVM, have undefined behaviour:
* Division by zero.
* Example: 42 / 0
* Overflow in signed division.
* Example: int a = -0x80000000; int b = a / -1;
* Shifting an integer by a number of bits greater than or equal to the width of the left-hand-side type, or by a negative amount
* Example: int a = 42; int b = a << 32; int c = a << -1; assume int is 32-bit.
However, the machine instruction counterparts have defined behaviours in each and every architecture.
* Division by zero
* x86: IDIV, DIV: Divide-by-zero raises "divide error".
* ARMv7: SDIV, UDIV:
* ARMv7-A: Divide-by-zero always gets 0.
* ARMv7-R: Controlled by SCTLR.DZ, an "Undefined Instruction" exception may or may not be raised.
* A64: Divide-by-zero always gets 0
* division overflow
* x86: IDIV, DIV: If the result is not representable (positive too large, negative too small) by the corresponding type (signed or unsigned), then raises "divide error".
* ARMv7-A: SDIV and UDIV are optional. They may be implemented by software.
* ARMv7, A64: SDIV, UDIV: result is truncated to the number of bits of the corresponding type. No error is raised. So -0x80000000 / -1 == -0x80000000 when it is 32-bit.
* shifting:
* x86: SAL,SAR,SHL,SHR: the count operand is masked to 5 bits (32-bit integer) or 6 bits (64-bit integer)
* ARMv7:
* LSL, LSR, ASR (immediate): It can only encode 5 bits of shift amount.
* LSL, LSR, ASR (register): The shift amount register is masked to 8 bits. After shifting, the last 32 bits are the result.
* A64:
* ASR, LSL, LSR (immediate): It can only encode 6 bits of shift amount.
* ASRV, LSLV, LSRV (register controlled): The shift amount is the second register modulo the register size (i.e. masked).
# Undefined behaviour vs implementation-defined behaviour
In C11:
+ **undefined behavior**: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
+ **implementation-defined behavior**: unspecified behavior where each implementation documents how the choice is made
Implementation-defined behaviour has an additional requirement that the implementation should document the behaviour. As long as the behaviour is documented, it is still considered "defined", but at a different layer.
If a behaviour is never defined, the higher level (e.g. the Client) has no chance to depend on it even if the lower level (e.g. the CPU) has a precisely defined behaviour. By contrast, if the behaviour is implementation-defined, there are still ways for the higher level to use the low-level detail, including (assuming the higher level is the Client, the middle level is the µVM and the lower level is the CPU):
+ The Client programmer reads the µVM manual for the particular CPU.
+ The µVM provides compile-time-checkable flags and the Client is conditionally compiled. (`configure --with-uvm=x86-64`)
+ The µVM provides run-time-callable functions and the Client generates code conditionally (`if (uvm.wordSize() == 64) { emitStore64(reg,mem); }`).
# How should an abstraction layer be made over differences?
Undefined behaviours usually occur in the cases (other than errors) where different platforms behave differently. Division and shifting are two examples.
When creating an abstraction layer over such differences, there are basically three choices.
1. Define the different part as *undefined behavior* and the high-level user must avoid using them.
* C and LLVM take this approach.
* The advantage is to make the specification very simple.
* The disadvantage is making it very difficult for the higher level to make efficient use of these cases since they must try everything to avoid those undefined behaviours.
2. Define the different part in one particular way and provide an implementation on every platform to behave like that.
* Java takes this approach.
* The advantage is the maximum portability of high-level code, since all programs work the same everywhere.
* The disadvantage is that it makes implementation and optimisation very difficult because there are too many invariants to maintain.
3. Define the different part as *platform-specific* and require the high-level to handle the difference.
* This will be the approach taken by the µVM.
* The advantage is to let the client make full use of the capabilities provided by the platform, resulting in efficient code. The µVM spec is also kept simple by pushing the differences to the implementation and the client.
* The disadvantage is adding more burden to the clients. The client must be aware of the difference between the µVM implementation on different platforms. The good thing is, the µVM still abstracts over the hardware, so the knowledge required by the client is limited to the µVM layer, not the hardware.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/15 — Summary of changes in the Type System and the Instruction Set (2018-06-28T23:11:32+10:00, John Zhang)
*Created by: wks*
# Top-level IR
* Support new OSR mechanism #5
- Function version clause #5
# Type system
* Add vector type. #9
* (REMOVED) Add lock type #11
# Instruction set
* Add "exception clause" #6 #7 #10
- Also merge instructions: `CALL`/`INVOKE`, `ICALL`/`IINVOKE`
* Basic operations allow vector types and new instructions for vectors #9
- BinOp, Cmp, Conv, `SELECT`, `PHI`, `CALL`, `RET`, `LOAD` (gather), `STORE` (scatter)
- `EXTRACTELEMENT`, `INSERTELEMENT`, `SHUFFLEVECTOR`
* Match the new stack states (`READY<T>` and friends) #7
- Revised `TRAP`/`WATCHPOINT` instruction #7
- Revised `SWAPSTACK` instruction #7
* Revised `ICALL` instruction #9
spec-2
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/14 — Document memory layout requirements (2018-06-28T23:11:32+10:00, John Zhang)
*Created by: wks*
Because of the difference between architectures, it is better to leave the object layout to be implementation-specific. An implementation of a µVM can choose its optimal strategy to make memory accesses as fast as possible.
However, although the memory layout is part of the implementation, some guarantees must be made to enable:
1. implementation of object-oriented languages: the superclass-subclass relationship can be most conveniently implemented as the superclass being a prefix of a subclass structure.
2. interoperation between µVM programs and C programs: during a foreign function call, data structures are shared so that data can be passed both ways.
3. atomic access to some basic data structures: references are one such type.
# Prefix rules
A type may be **a prefix of** another type. If T1 is a prefix of T2, then there are **shared components** between T1 and T2. A component can be the whole value or some part of it, including fields in structs, elements in arrays and vectors, and both the fixed part and the variable part in hybrids.
Specifically:
* Any type is a prefix of itself.
+ All corresponding components are shared.
* `void` is trivially a prefix of any type.
+ No component is shared.
* `T1 = T` is a prefix of `T2 = struct <SEQ>` for any T where SEQ is a sequence of types beginning with T.
+ The whole T1 itself is a shared component with the first field in the struct T2.
* `T1 = T` is a prefix of `T2 = hybrid<T U>` for any T.
+ The whole T1 is a shared component with the fixed part in the hybrid T2.
* `T1 = T` is a prefix of `T2 = array<T n>` for any T if n >= 1.
+ The whole T1 is a shared component with the first element in array T2.
* For all types T1, T2 and T3, if T1 is a prefix of T3 and T3 is a prefix of T2, then T1 is a prefix of T2.
+ The shared components between T1 and T2 are their mutual shared components with T3.
Examples:
* `float` is a prefix of `struct<float double>`.
+ The first field is shared.
* `struct<@TIB_REF @LOCK @LENGTH_TYPE>` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`.
+ The fixed part of the latter type is shared with the former type.
* `int<8>` is a prefix of `array <int<8> 100>`.
+ The first byte element of the latter is shared with the former.
* `@TIB_REF` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`
+ There is an intermediate type `struct<@TIB_REF @LOCK @LENGTH_TYPE>` that bridges the "is a prefix of" relation.
If:
* There is a memory location M which represents data of type T2, and,
* T1 is a type and T1 is a prefix of T2, and,
* r1 is an `iref<T1>` and refers to memory location M1, and,
* r2 is an `iref<T2>` and refers to memory location M, and,
* the beginning of M and M1 are the same (i.e. they have the same address), and,
* rc1 is r1 or an internal reference derived from r1, and,
* rc2 is r2 or an internal reference derived from r2, and,
* rc1 and rc2 refer to a shared component between T1 and T2,
then rc1 and rc2 refer to the same memory location. This means the shared components can be accessed as if they were fields of the prefix type. This allows treating a subclass instance as an instance of its superclass.
Related standard: C11
* 6.3.2-3: (**array --> pointer to first element**) ... an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. ...
* 6.7.2.1-15:(**pointer to struct <--> pointer to first field**) ... A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. ...
TODO: C does not explicitly allow the prefixing between the following structs, whose sequences of elements have a "prefix" relation:
```
struct Foo { short a; int b; long c; };
struct Bar { short a; int b; long c; float d; double e; };
```
There may be a reason behind it.
# Object layout and C foreign function interface (FFI)
The object layout *should* follow the application binary interface (ABI) as much as possible because:
* The ABI is carefully designed by system programmers for performance. The µVM implementer needs a good reason why not to follow it.
* When a foreign function call to external C programs is needed, the µVM data structure should already be in the desired layout expected by the C programs.
The µVM only needs to guarantee some compatibility between µVM types and C types in the FFI. It is already documented in the [Instruction Set](https://github.com/microvm/microvm-spec/wiki/instruction-set), but needs to be double-checked.
# Basic data structures that need atomic accesses
To guarantee memory safety, the µVM must not allow out-of-thin-air reference values or opaque values. Affected types are:
* ref
* iref
* weakref (loaded into SSA variables as ref)
* func
* thread
* stack
* tagref64 (may contain ref)
* futex (one word integer, loaded into SSA variables as plain `int<WORD_SIZE>`)
Storing internal references (`iref`) in the memory (heap, stack or global) is discouraged because of space inefficiency (there is no better way than encoding them as fat pointers). But if an implementation does allow putting `iref` in the memory, the accesses to them should be atomic. Types other than `iref` are too important not to be implemented as lock-free atomic types (all atomic accesses in the µVM are lock-free).
An alternative to requiring `iref` to be atomic is to:
- document that accesses to `iref` in the memory are not atomic, and,
- require the client to compile all memory accesses to `iref` with locks, and,
- accept a significant performance penalty.
Since a fat pointer consists of two words (an object reference plus an in-object offset), some architectures (I assume there are only very few of them) may not provide atomic accesses of that width.
Labels: spec-2

---

**Issue #13: Change the term "SSA Value" to "SSA Variable"** (2018-06-28, John Zhang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/13
*Created by: wks*
The term "SSA Value" currently used in the µVM is confusing. People tend to think "value" means data value, that is, instances of types and are representable in binary. But what it actually means in the µVM is the **things** that can **generate** data value.
# The current conceptual model
The current hierarchy is:
* **SSA Value** (the union of all such things)
- **Global SSA Value** (defined globally, actually constants, written as `@a`, `@b`, `@c`, ...)
+ **Declared constants** (defined literally as constants)
+ **Global memory references** (internal references to the global memory, which never moves. Hence the reference is constant)
+ **Functions** (There is a constant "handle" for each function, which does not change even when a function is redefined. So it is effectively a constant.)
- **Local SSA Values** (defined locally, determined during program execution, written as `%a`, `%b`, `%c`, ...)
+ **Parameters** (passed to functions, determined at each call site)
+ **Instructions** (computations that generate data values, or void, the "unit" value. The resulting data value is determined every time the instruction is executed)
The other confusion is that currently "an instruction **is an** SSA Value". When an instruction uses another instruction as its parameter, like:
```
.const @a <int<64>> = 0x123456789abcdef0
%x = ADD <int<64>> @a @a
%y = SUB <int<64>> %x @a
```
In the above example, `@a`, `%x` and `%y` are all names of SSA Values. In the definition of `%y`, it means "`%y` **is** a SUB instruction and it takes the **computing result of** instruction `%x` and **the data value of** declared constant `@a` as arguments".
When the Client runs a trap handler, the names identify the TRAP instructions.
```
%trap0 = TRAP KEEPALIVE (%a %b %c %d ...)
```
Then the client will see "the `%trap0` instruction caused the trap".
The above model, though internally consistent, confused many people.
# The use in LLVM
* Value
- Argument
- BasicBlock
- InlineASM
- MDNode
- MDString
- User
+ Constant
* BlockAddress
* ConstantInt
* ConstantFP
* ConstantPointerNull
* ConstantStruct
* ConstantVector
* ConstantExpr
- BinaryConstantExpr
- CompareConstantExpr
- ...
* GlobalValue
- GlobalAlias
- GlobalObject
+ Function
+ GlobalVariable
+ Instruction
- BinaryOperator
- CmpInst
- PHINode
- CallInst
- UnaryOperation
+ LoadInst
+ AllocaInst
+ ...
- StoreInst
- TerminatorInst
+ BranchInst
+ ReturnInst
+ InvokeInst
+ ...
- ...
+ Operator
- ...
LLVM defines Value as (http://llvm.org/doxygen/classllvm_1_1Value.html):
> LLVM Value Representation.
>
> **This is a very important LLVM class. It is the base class of all values computed by a program that may be used as operands to other values.** Value is the super class of other important classes such as Instruction and Function. All Values have a Type. Type is not a subclass of Value. Some values can have a name and they belong to some Module. Setting the name on the Value automatically updates the module's symbol table.
>
> Every value has a "use list" that keeps track of which other Values are using this Value. A Value can also have an arbitrary number of ValueHandle objects that watch it and listen to RAUW and Destroy events. See llvm/IR/ValueHandle.h for details.
The µVM's hierarchy is a subtree of LLVM's Value hierarchy. Similar to LLVM, instructions that do not return values (like ReturnInst, BranchInst, ...) are also a kind of "Value".
# Alternative notation
An SSA Variable is a thing that holds a data value. A data value is an instance of a type.
An SSA Variable has exactly one definition and the definition corresponds to exactly one SSA Variable. That definition can be a declaration of constant, global memory, function, parameter or instruction.
The revised hierarchy is:
* **SSA Variable**
- **Globally defined SSA Variable** (written as `@a`, `@b`, `@c`, ...)
+ defined by **Declared constants**
+ defined by **Global memory references**
+ defined by **Functions**
- **Locally defined SSA Variable** (written as `%a`, `%b`, `%c`, ...)
+ defined by **Parameters**
+ defined by **Instructions**
This does not change the hierarchy. In an implementation, it is practical to let class `Instruction` extend class `SSAValue`. In the documentation, we say, if `%a = ADD <int<64>> %b %c`, then:
* `%a` is an SSA Variable
* `%a` is defined by the `ADD <int<64>> %b %c`
* That instruction is bound to the SSA Variable `%a`
In this example:
```
.const @a <int<64>> = 0x123456789abcdef0
%x = ADD <int<64>> @a @a
%y = SUB <int<64>> %x @a
```
We say, `@a`, `%x` and `%y` are all SSA Variables. `%y` is **defined by** a SUB instruction, which takes the **values held by** the SSA Variables `%x` and `@a` as arguments.
In a trap handler in the Client for the following TRAP:
```
%trap0 = TRAP KEEPALIVE (%a %b %c %d ...)
```
The client will see "the TRAP instruction that **defines** `%trap0` caused the trap".
Labels: spec-2

---

**Issue #12: Document trampoline instruction** (2018-06-28, John Zhang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/12
*Created by: mn200*
At the meeting on 2 September 2014, we agreed that an LLVM-style trampoline primitive would probably be a good idea, because
* the abstraction will allow generation of good machine code for varying architectures
* it may be necessary to turn `fnptr * environment` into a single `fnptr` for the sake of FFI calls

---

**Issue #10: More instructions with abnormal control flow** (2018-06-28, John Zhang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/10
*Created by: wks*
There are a few instructions that may not always be executed successfully.
* TRAP, WATCHPOINT, SWAPSTACK, CALL/INVOKE, ICALL/IINVOKE: Some instructions expect exceptions. We already have multiple versions of instructions for those with or without exceptions.
* BinOp: division-by-zero and floating-point exceptions are generated by the hardware, but many programming languages define behaviour for those cases (e.g. Java throws language-level exceptions). Some (e.g. FP exceptions) cannot easily be prevented by checking before performing the operation.
* NEW/NEWHYBRID/ALLOCA/ALLOCAHYBRID: out-of-memory error
* Function calls and the NEWSTACK instruction may call declared but undefined functions.
* Function calls may result in out-of-memory errors.
* LOAD/STORE/CMPXCHG/ATOMICRMW: they may access invalid memory locations, including the `NULL` reference and invalid addresses introduced by array element addressing. The hardware cannot cover all illegal cases, but it may be possible to let the hardware handle common errors (like `NULL` reference dereferencing). See Issue #6
* All intrinsic functions that may have exceptions (covered by the IINVOKE instruction).
# The proposal
Change the grammar so that all such instructions take an optional "exception clause". This generalises the handling of many instructions. Existing instruction pairs that distinguish the awareness of exceptions (CALL/INVOKE, ICALL/IINVOKE) can be merged. The identification of "basic block terminator" instructions now depends on the presence of such a clause, not just on the opcode.
Example:
Do not handle exception locally:
```
%entry:
%rv = CALL <@sig> @func (%arg) // continue with the next instruction, or unwind the current frame
%b = ADD <@i64> %rv 1
...
```
Handle exception locally:
```
%entry:
%rv = CALL <@sig> @func (%arg) EXC(%nor %exc) // terminates the current basic block.
%nor:
... // normal destination
%exc:
%e = LANDINGPAD // exceptional destination
...
```
Even simple divisions may go wrong:
```
%entry:
%result = SDIV <@i64> %a %b EXC(%nor %exc)
%nor:
... // continue normally
%exc:
THROW @DivisionByZeroException
```
NOTE: The µVM will not create new exceptions for the client. If the exceptional case is not because of an exception being thrown from a function or another stack (by swap-stack), then it is an error. Errors are undefined behaviours. The following code is fatal when `%b` is zero:
```
%entry:
%result = SDIV <@i64> %a %b // undefined behaviour if %b is zero.
```
# About factored CFG
Factored CFG is used by JikesRVM. It allows a basic block to have multiple exceptional side-exits, breaking the single-entry single-exit property. Although that reduces the number of exceptional branches, the traditional CFG is better understood and is still capable of expressing all control flows, at the cost of additional verbosity.
Labels: spec-2

---

**Issue #8: Next milestone** (2018-06-28, John Zhang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/8
*Created by: wks*
As we already found many problems in the current µVM design, many changes can be done to improve it. Here is a list of issues to be addressed in the next µVM specification.
- [X] #6 #10 : More instructions can result in abnormal control flows. This should be reflected in the instruction set.
- Stick to the standard control flow graph, not factored control flow graph.
- [X] #4 : Use C++11 memory model.
- [X] #7 : Update the swap-stack API.
- [X] #5 : Update the OSR API.
- [X] #9 : Support vector instructions.
- [X] Provide mechanisms (futex) to support blocking locks and other thread synchronisation primitives.
In case anyone wants to play with the reference implementation, the current implementation will be branched and the current specification will be forked into another repository.
Labels: spec-2

---

**Issue #73: License** (2018-06-28, John Zhang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/73

We have decided to use the Apache 2.0 license for all our code.
The following two commits put forward a draft for the license file.
verbatim: mu/mu-perf-benchmarks@47045501
added ANU copyright: mu/mu-perf-benchmarks@76b1b7ca
If there is no problem, then I will merge the branch into master, and you can all put a copy of the LICENSE file into your projects.

---

**Issue #72: Alternative serialisable format (such as JSON/YAML/XML/...)** (2018-06-28, Kunshan Wang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/72

I am glad to see the [mu-tool-compiler](https://gitlab.anu.edu.au/mu/mu-tool-compiler) project existing.
I have been considering an alternative serialisable and human-readable format to the current text-based IR. In fact, the text-based Mu IR is something I am unhappy with; it has various problems.
- It requires a dedicated parser, which has to be implemented by hand.
- When new features are added, the grammar changes, and the parser needs to be modified.
- The text-based IR is confined by aesthetic considerations, and has many inconsistencies. For example:
- The reason why `.funcdef ... <@sig>` has a signature is because it also works as a syntax sugar, using which a human writer only needs to write a `.funcdef` to create both a function and its first version.
- As a convention, types and signatures in Mu instructions appear in angle brackets, such as `ADD <@i32> %x %y`. But instructions may take more than types and signatures. One example is `GETFIELDIREF`, which has an integer literal argument. The current form `GETFIELDIREF <@type 3> %ref` is ugly: the number `3` looks out of place.
I suggest there should be a Mu IR format in a well-known structured data format, such as JSON, YAML, XML, and so on.
Related work:
- LLVM yaml2obj: http://llvm.org/docs/yaml2obj.html
Potential advantages:
- There are mature open-source parsers available.
- Easy to extend.
- Easy to specify (in mu-spec).
For example, if we want to add an externally-usable symbol to an exposed function, we only need to add a property rather than redesign the grammar:
```yaml
name: foo
func: func
callconv: DEFAULT
cookie: cookie
symbol: externally_visible_symbol # This is an added property
```
It is easy to specify because we can define the IR as an (abstract) object tree with properties, similar to [how the HTML5 DOM is defined](https://html.spec.whatwg.org/multipage/dom.html#elements-in-the-dom).
There are also potential disadvantages:
- More verbose
- Less human-readable than the current text form, but human readability should not be the primary concern.
XML example:
```xml
<bundle>
<type id="i8" ctor="int" length="8" /> <!-- note: XML ID is actually a name -->
<type id="i32" ctor="int" length="32" />
<type id="i64" ctor="int" length="64" />
<type id="pi8" ctor="uptr" type="i8" />
<type id="ppi8" ctor="uptr" type="pi8" />
<type id="refi32" kind="ref" type="i32" />
<funcsig id="mainsig">
<paramty type="i32" />
<paramty type="ppi8" />
<retty type="i32" />
</funcsig>
<const id="I32_42" type="i32" value="42" />
<const id="I64_0" type="i64" value="0" />
<global id="errno" type="i32" />
<funcdecl id="main" sig="mainsig" />
<funcdef func="main">
<bb lname="entry"> <!-- lname = local name -->
<param type="i32" lname="argc" />
<param type="ppi8" lname="argv" />
<inst opcode="ADD" flags="V" type="i32" opnd1="%argc" opnd2="@I32_42">
<result lname="res" />
<result lname="ovf" />
</inst>
<inst opcode="CALL" sig="some_sig" callee="some_callee">
<arg val="argc" />
<result lname="r1" />
<nor-dest name="bb2">
<pass-value val="r1" />
</nor-dest>
<exc-dest name="bb3" />
</inst>
<inst opcode="SWAPSTACK" swappee="%some_hypothetic_stack">
<return-with>
<result type="i32" lname="ss_res1" />
<result type="i32" lname="ss_res2" />
</return-with>
<pass-values>
<pass-value type="i32" val="%res" />
<pass-value type="i32" val="%r1" />
</pass-values>
</inst>
<!-- more instructions here -->
</bb>
<bb lname="bb2">
<param type="i32" lname="r1" />
<!-- more instructions here -->
</bb>
<bb lname="bb3">
<exc-param lname="exc" />
<!-- more instructions here -->
</bb>
</funcdef>
<expose id="exposed_main" symbol="c_callable_symbol_of_exposed_main"
func="main" callconv="DEFAULT" cookie="@I64_0" />
</bundle>
```
A YAML example:
```yaml
types:
- name: i8
ctor: int
length: 8
- {name: "i32", ctor: "int", length: 32}
- {name: "i64", ctor: "int", length: 64}
- {name: "double", ctor: "double"}
function_signatures:
- name: "mainsig"
paramtys: ["i32", "ppi8"]
rettys: ["i32"]
constants:
- {name: "I32_42", type: "i32", value: 42}
- {name: "I64_0", type: "i64", value: 0}
- {name: "D_0", type: "double", value: 0.0}
- name: "D_NAN"
type: "double"
value_from_int: 0x7ff0000000000001
globals:
- {name: "errno", type: "i32"}
functions:
- name: "main"
sig: "mainsig"
initial_version:
- bbname: "entry"
params:
- {type: "i32", lname: "argc"}
- {type: "ppi8", lname: "argv"}
insts:
- {opcode: "ADD", flags: "V", type: "i32", opnd1: "%argc", opnd2: "@I32_42",
results: ["res", "ovf"]}
- opcode: "CALL"
sig: "some_sig"
callee: "some_callee"
args: ["%argc"]
results: ["r1"]
nor_dest:
bb: "bb2"
pass_values: ["%r1"]
exc_dest:
bb: "bb3"
- opcode: "SWAPSTACK"
swappee: "%some_hypothetic_stack"
ret_with:
- {type: "i32", lname: "ss_res1"}
- {type: "i32", lname: "ss_res2"}
pass_values:
- {type: "i32", val: "%res"}
- {type: "i32", val: "%r1"}
# more instructions here
- bbname: "bb2"
params:
- {type: "i32", lname: "r1"}
insts:
# more instructions here
- bbname: "bb3"
excparam: "exc"
insts:
# more instructions here
exposed_functions:
- name: "exposed_main"
symbol: "c_callable_symbol_of_exposed_main"
func: "main"
callconv: "DEFAULT"
cookie: "@I64_0"
```
LISP:
```lisp
(type i8 int 8)
(type i32 int 32)
(type i64 int 64)
(type pi8 ptr i8)
(type ppi8 ptr pi8)
(funcsig mainsig (i32 ppi8) (i32))
(const I32_42 i32 42)
(const I64_0 i64 0)
(global errno i32)
(funcdecl main main_sig)
(funcdef main.v1 main
(bb entry ((i32 argc) (ppi8 argv))
(ADD i32 %argc @I32_42 res
((C carry)
(V ovf)
))
(CALL some_sig some_callee (%argc) (r1)
((bb2 (%r1)) (bb3)))
(SWAPSTACK %some_hypothetic_stack
(ret-with ((i32 ss_res1)
(i64 ss_res2)))
(pass-values ((i32 %res)
(i32 %r1))))
(bb bb2 ((i32 r1))
;; More instructions here
)
(bb bb3 (exc)
;; More instructions here
)
(expose exposed_main main DEFAULT @I64_0
((symbol "c_callable_symbol_of_exposed_main")))
```

---

**Issue #70: Debugging facilities for Mu clients** (2018-06-28, Zixian Cai)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/70

---

**Issue #69: Comparing ref<T> against ref<U>** (2018-06-28, Kunshan Wang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/69

# The problem
Currently the cmpOp instructions take only one type parameter, and both operands must have the same type.
```
%result1 = EQ <int<32>> %a1 %b1
%result2 = EQ <int<64>> %a2 %b2
%result3 = FEQ <float> %a3 %b3
%result4 = EQ <ref<T>> %a4 %b4
%result5 = EQ <iref<T>> %a5 %b5
%result6 = EQ <funcref<sig>> %a6 %b6
%result7 = EQ <uptr<T>> %a7 %b7
%result8 = EQ <ufuncptr<sig>> %a8 %b8
```
In object-oriented programming, we sometimes want to compare a `ref<T>` against `ref<U>`, where T is a superclass of U. Mu does not know OOP, but Mu has the [prefix rule](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/memory.rst#prefix-rule) so that a value of type `ref<T>` can actually refer to an object of type `U` as long as `T` is a prefix of `U`. This allows OOP to be implemented in the Mu type system.
For this reason, comparing `ref<T>` against `ref<U>` for equality is meaningful: the result is true iff both references refer to the same object, or both are NULL. The semantics of `EQ` is actually [defined this way in the spec](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#comparison), but the instruction requires both operands to have the same `ref<T>` type.
```
// Assume %a is ref<T> and %b is ref<U>
%result = EQ <ref<T>> %a %b // Disallowed in the spec because %b is a ref<U>. But the refimpl does not check the type parameter, so it works for now.
```
To work-around this problem, the client can use `REFCAST` to cast both operands to the same type (such as `ref<void>`) before comparing:
```
// Assume %a is ref<T> and %b is ref<U>
%aa = REFCAST <ref<T> ref<void>> %a
%bb = REFCAST <ref<U> ref<void>> %b
%result = EQ <ref<void>> %aa %bb
```
This would potentially make the Mu instruction stream very verbose.
# Simplest solution
The simplest work-around is to let the comparison instruction ignore the type parameter when comparing two `ref`, `iref`, `funcref`, `uptr` or `ufuncptr` values.
The code will look like:
```
// Assume %a is ref<T> and %b is ref<U>
%result = EQ <ref<Blah>> %a %b // The micro VM ignores the Blah
```
This will elide the two `REFCAST` instructions. Similarly, when comparing `iref<T>` and `uptr<T>`, `T` is ignored; the `sig` signature in `funcref<sig>` and `ufuncptr<sig>` is likewise disregarded. If this behaviour is *standardised*, the client can rely on it and emit fewer instructions.
## What the micro VM sees
When the micro VM sees this instruction: `EQ <ref<Blah>> %a %b`, the compiler knows that both `%a` and `%b` are `ref` of something, but does not know what `%a` and `%b` refer to (the "Blah" can just be a lie). As long as refs are always represented in the same way (such as pointers to the beginning of the object, which may be moved by the GC), the compiler can still generate code without knowing the object type. **The compiler cares about the storage type**, not the high-level parameterised type.
## Potential side effects (unlikely)
This will require all `ref<_>` types to have the same representation (size, as pointer or as handle) regardless of the type parameter. It prevents the possibility that `ref<T>` and `ref<U>` may have different sizes. But I don't think implementing different refs in different sizes would be useful.
# A more aggressive design
We can push it further by removing all type and signature parameters in `ref<T>`, `iref<T>`, `funcref<sig>`, `uptr<T>` and `ufuncptr<sig>`, so they become simply `ref`, `iref`, `funcref`, `uptr` and `ufuncptr`.
To compensate for the lack of knowledge about the referent type, instructions must be annotated with referent types. But the micro VM only needs to know the referent type when doing pointer arithmetic (GETFIELDIREF, ...) and memory accesses (LOAD, STORE, ...). For example:
```
// Assume %a is an iref to T, and T is a struct
%b = GETFIELDIREF <T 3> %a
// Assume %c is an iref to int<64>
%v = LOAD <int<64>> %c
```
Actually this is *the same as the current Mu IR*. The type annotations on the instructions are intended to ease the job of the Mu-to-machine compiler inside the micro VM.
By discarding the type parameters, REFCAST becomes unnecessary, and PTRCAST only casts between pointers and integers, not between pointers.
But the Mu IR programs themselves will carry less information about the destination of refs/uptrs. It may make the behaviour of the program harder to reason about. But since the client can perform REFCAST at any time, it can always choose to cast all refs to `ref<void>`, and still write correct programs.
It is unlikely that we will adopt this aggressive design soon, but may be considered if we redesign the IR.
# Comparing ref against ptr
A related topic is whether it should be allowed to compare `ref` against `ptr`.
The obvious answer is "no". `ref` and `ptr` (as well as `funcref`) do not have the same storage type. `ref` can be represented as the address of the beginning of the object, and may be modified by the GC when the object is moved. `ref` can also be represented as a handle, or as a pair of <addr, type>. On the other hand, `ptr` must be treated as raw addresses. Even if `ref` is represented as an address, consider an extreme case where a micro VM performs GC between every pair of instructions, and moves every object at that time. It is a valid micro VM implementation, but the address of any `ref` is totally non-deterministic.
In some VMs (such as JikesRVM), there are VMMagics that allow getting the address from an object reference, or converting an address into an object reference. In this way, the GC can be implemented in the same language as the language it is serving. However, the `addr->objref` and `objref->addr` conversions alone are not enough. Such VMs must also have mechanisms to specify **uninterruptible** regions in which GC must not happen. If the GC is concurrent, there must be other mechanisms to handle this gracefully. But all "magics" are closely related to the concrete (micro)VM implementation, and should be kept private.
---

**Issue #68: Duration of Object Pinning** (2018-06-28, Kunshan Wang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/68

I am just being pedantic about the semantics of object pinning. The current "object pinning" is vague w.r.t. the duration of pinning.
Now that we define global cells as always pinned [to support relocations for raw pointers](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/60), there are two different scopes of pinning. An object is pinned iff
1. it is a global cell, or
2. it is in the "pin set" of a thread.
The purpose of the so-called "pin set" is to make pinning operations locally scoped: it is like a per-thread reference count rather than a global reference count. If copying GC never happens, threads only need to modify local thread states rather than global states.
Consequently, the thread state of one thread may not be visible (or consistent w.r.t. concurrency) to other threads, hence the phrase "is in the pin set" is vague: whether a memory location is pinned depends on the observer.
But there is one guarantee the micro VM must make:
**During the time when a memory location is pinned, its address must not change**, at least as observed by the same Mu thread which executes C functions while pinned. So naturally we can define that **the thread that pins the location must not observe the address changed between its own pinning and unpinning**. So if two threads independently pin and then unpin the same Mu object concurrently without any synchronisation, they may observe different addresses, because their durations may not overlap, and GC may happen in between. If the two threads do not communicate and their C functions do not save the pointers, it should just work even if the object is moved while **not** pinned.
But more interesting questions may arise if we consider inter-thread communication: If
1. one thread T1 pins an object O1, then
2. sends the address of O1 to another thread T2, then
3. T2 pins the same object O1, then sends a message to T1, then
4. T1 unpins O1, and
5. T2 independently unpins O1
Then **should O1 have a constant address from when T1 pins O1 until both T1 and T2 unpin O1**? If we interpret "sending a message" as "forming a happens-before relation", then the whole process looks pretty sequential.
We can use the "happens-before" relation to define the duration of pinning, so that the pinning/unpinning operations from different threads can chain up. Then some Mu objects may have a very long duration of pinning, during which it has constant address. This is not a problem, at least not more problematic than one single thread pinning an object for a really long time. We just need to precisely define the duration of pinning so that the client can depend on it.
I still don't know how to precisely express it. This is definitely trickier than the visibility rules for LOAD/STORE operations because this time it is about duration rather than just a value. The easiest model, of course, is to make pinning/unpinning sequentially consistent, but I wonder if it would require excessive fencing to prevent weird behaviours in something like:
1. T1 pins O1 and sends a message to T2, then
2. T2 pins O2 and sends a message to T3, then
3. the programmer thinks T3 should see O1 being pinned, but observed otherwise.
---

**Issue #67: C thread becoming Mu thread (exposed functions, a.k.a. ".expfunc")** (2018-06-28, Kunshan Wang)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/67

This issue is about calling Mu functions from C functions. It is not a problem if Mu initiated the call to a native program and then it calls back. But when a fresh native thread (such as created by `pthread_create`) directly calls a Mu function, thread-local states (such as GC states) must have been initialised, or the Mu program will not work properly.
Related spec: https://gitlab.anu.edu.au/mu/mu-spec/blob/master/native-interface.rst#native-functions-calling-mu-functions
Previous issue: https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/39
# The problem
When a Mu thread is executing, there are thread-local states that need to exist to support the execution of Mu IR programs.
For example, if the Mu IR program uses a bump-pointer GC, the "current pointer" is a per-thread state, and it should point to the next available memory at all times. Mu instructions (such as `NEW` and `NEWHYBRID`) assume such thread-local pointers are set up when those instructions are executed.
Such states are usually set up when a Mu thread is created. When a thread is created using the `NEWTHREAD` instruction or its equivalent API, the micro VM will initialise the states properly.
But the problem arises when the thread is created natively (for example, by `pthread_create`). Such **POSIX functions are not designed with Mu in mind** and will not initialise Mu-specific states. So a PThread cannot directly call a Mu function unless some preparation is done.
# Current design
Related spec: https://gitlab.anu.edu.au/mu/mu-spec/blob/master/native-interface.rst#native-functions-calling-mu-functions
The current Mu spec requires **implementation-defined** functions to be called before native threads not created by Mu (such as POSIX threads) can call any exposed Mu functions.
A Mu bundle can define `.expfunc` top-level definitions to directly expose pointers to C programs. For example:
```
.funcdef @fac ... {...}
.expfunc @fac_native = @fac #DEFAULT @I64_0 // expose @fac, default calling convention, use 0 as "cookie".
```
`@fac_native` is a raw function pointer which can be **called back** when Mu calls C and then C calls back to Mu. But when PThread wants to call `@fac_native`, it needs implementation-defined set-up.
## Possible implementations
* The concrete micro VM can forbid such calls, and enforce that only Mu threads can execute Mu functions.
* The concrete micro VM can extend the API with a function to attach or detach PThreads, or threads using other APIs.
* The concrete micro VM can create Mu-specific thread-local states lazily when entering from native to Mu. Since the only way to enter Mu is via "exposed functions", stubs can be created at those "expfuncs" to lazily check for such states, or SIGSEGV can be used to trap when such pointers are zero.
Each approach has its own strengths and weaknesses. This is why this interface remains implementation-defined for now. Real-world experience will tell which method is better.
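The third option (lazy creation) can be sketched with plain POSIX thread-specific data. Everything below is a hypothetical illustration: `MuThreadState`, `fac_native` and the stub logic are invented stand-ins, not part of any Mu implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread Mu state, e.g. a bump-pointer cursor.
   All names here are illustrative, not part of the Mu API. */
typedef struct { char *bump_ptr; } MuThreadState;

static pthread_key_t mu_tls_key;
static pthread_once_t mu_tls_once = PTHREAD_ONCE_INIT;

static void mu_make_key(void) { pthread_key_create(&mu_tls_key, free); }

/* Stub prologue for an exposed Mu function: lazily create the
   thread-local state if this thread was not created by NEWTHREAD. */
static MuThreadState *mu_get_or_init_state(void) {
    pthread_once(&mu_tls_once, mu_make_key);
    MuThreadState *st = pthread_getspecific(mu_tls_key);
    if (st == NULL) {                       /* first entry from a native thread */
        st = calloc(1, sizeof(MuThreadState));
        pthread_setspecific(mu_tls_key, st);
    }
    return st;
}

/* What a generated stub for @fac_native might do before the body runs. */
static long fac_native(long n) {
    MuThreadState *st = mu_get_or_init_state();
    (void)st;                               /* a real stub would hand the state to Mu */
    return n <= 1 ? 1 : n * fac_native(n - 1);
}

/* A thread created by pthread_create, not by NEWTHREAD. */
static void *pthread_entry(void *arg) {
    *(long *)arg = fac_native(5);
    return NULL;
}
```

The SIGSEGV variant would instead leave the thread-local pointer as zero and install a fault handler, trading the explicit check for a trap on first use.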
## Multiple micro VMs in the same process?
It is rare for one process to run two micro VMs, but it is definitely possible. For example:
* A C host program provides both Python and Lua as extension languages (real-world applications exist), but both language implementations use the Mu micro VM.
* The client has some kind of sandbox mechanism and forces some part of the program to run in a separate micro VM.
# Related works
## JNI Invocation API
Related document: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#attaching_to_the_vm
The JVM invocation API provides the `AttachCurrentThread` function to attach a PThread to a JVM, with the limitation that a native thread cannot be attached to two different JVMs. JNI also requires that the PThread stack "should have enough stack space to perform a reasonable amount of work" and "The allocation of stack space per thread is operating system-specific. For example, using pthreads, the stack size can be specified in the pthread_attr_t argument to pthread_create.".
From Mu's point of view, the `MuCtx` structure holds Mu states for the client, so calling API functions in `MuCtx` does not need any attaching. However, calling "exposed Mu functions" will need special set-up like `AttachCurrentThread`.
## JikesRVM
JikesRVM's GC is designed in such a way that it works even if the related thread-local data structure is all zero (as initialised by the system). This gracefully avoids the GC-related part of the problem, but it may not be the most general solution.
## .NET framework
Related documents: https://msdn.microsoft.com/en-us/library/74169f59(v=vs.110).aspx
VM-related thread-local states are created lazily when an unmanaged thread enters the managed runtime.
---
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/66
**Are bundles the unit of compiling or the unit of loading?** (2018-06-28T23:11:33+10:00, Kunshan Wang)

# Two different views of "bundle"
From the compiler's point of view, compiling is a process where:
1. There are many modules (such as .class files), all of which need to be compiled (to Mu IR, for example). Modules may have inter-dependencies, and there can even be mutual recursion (A imports B, B imports C, and C imports A).
2. Compilers should compile each module separately, with no knowledge of other modules. This implies each module compiles to a stand-alone bundle that has all the necessary things (types, functions, ...) used inside the bundle. Since types use "structural equivalence", it does not matter if two structurally isomorphic types are defined twice in two bundles. In particular, functions can be declared multiple times in different bundles.
3. When they are linked, bundles are merged. Different bundles should have no intersections, with one exception: functions of the same name are resolved to be the same, so calling a declared but not defined function may end up calling a function defined in another bundle. (Global cells should have similar properties, too.) We also allow calling functions that are declared but not defined in any bundle, which triggers lazy loading.
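The structural-equivalence point in item 2 can be illustrated with a toy type representation. This is only a sketch under invented names (`TypeNode`, `structurally_equal`), and it ignores the cycle handling a real Mu type checker would need for recursive refs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy type nodes: enough to show structural equivalence for
   non-recursive types. Real Mu types also need cycle handling. */
typedef enum { T_INT, T_STRUCT } TypeKind;

typedef struct TypeNode {
    TypeKind kind;
    int bits;                       /* for T_INT */
    const struct TypeNode **fields; /* for T_STRUCT */
    size_t nfields;
} TypeNode;

/* Two types are equal iff they have the same shape, regardless of
   which bundle defined them or under what name. */
static bool structurally_equal(const TypeNode *a, const TypeNode *b) {
    if (a->kind != b->kind) return false;
    if (a->kind == T_INT) return a->bits == b->bits;
    if (a->nfields != b->nfields) return false;
    for (size_t i = 0; i < a->nfields; i++)
        if (!structurally_equal(a->fields[i], b->fields[i])) return false;
    return true;
}
```

Under this rule, `struct<int<64>>` defined independently in two bundles compares equal, so duplicated type definitions merge harmlessly.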
In the current Mu design, the micro VM's view of bundles is:
1. There is a global bundle, which includes everything that has been loaded up to a given point in time.
2. Bundle is the unit of loading. Bundles are loaded sequentially. (At least it is perceived to be so through the API. The Mu impl can load them in parallel while ensuring sequential consistency.)
3. Each bundle can refer to things (types, functions, ...) defined in the current bundle or the global bundle.
4. Every time a bundle is loaded, its contents (types, functions, ...) are merged into the global bundle. That is, *there is one single global bundle which gets gradually augmented as bundles are loaded*. Conflicts are not allowed. If a new function version (FuncVer) is defined on an existing function, this FuncVer becomes the "most recent version" of the function.
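The micro VM's view can be sketched as a toy "global bundle" that merges loaded funcdefs and tracks each function's most recent version. The names and layout below are illustrative only, not any real Mu implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy "global bundle": maps function names to their most recent
   version. Loading a funcdef merges it in, replacing any existing
   version of a function with the same name. */
#define MAX_FUNCS 16

typedef struct { const char *name; int version; } FuncDef;
typedef struct { FuncDef funcs[MAX_FUNCS]; int n; } GlobalBundle;

static void load_funcdef(GlobalBundle *g, const char *name, int version) {
    for (int i = 0; i < g->n; i++)
        if (strcmp(g->funcs[i].name, name) == 0) {
            g->funcs[i].version = version;  /* redefinition: new "most recent" */
            return;
        }
    g->funcs[g->n].name = name;             /* first definition */
    g->funcs[g->n].version = version;
    g->n++;
}

static int current_version(const GlobalBundle *g, const char *name) {
    for (int i = 0; i < g->n; i++)
        if (strcmp(g->funcs[i].name, name) == 0) return g->funcs[i].version;
    return -1;                              /* declared or unknown */
}
```

Loading a second bundle that redefines a function simply bumps that function's "most recent" version, while everything else in the global bundle is untouched.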
The main difference between the two views is whether we consider bundle loading to be a static, separated and parallel process, or a dynamic and sequentially inter-dependent one.
## Why is Mu designed like this?
The current Mu design is based on the facts that (1) Mu is a run-time JIT compiler, and (2) Mu supports function re-definition. Because Mu is a run-time entity, it is a single thing that lives through the life of the application. It observes everything the client ever delivers to it (bundles), and this is a temporal process. The micro VM always starts with no knowledge, and the client "teaches" the micro VM more and more knowledge by loading bundles. So the "global bundle" represents the "current knowledge" the micro VM has about the world (i.e. the types, functions, ... of the client's language). Since the growth of knowledge is a sequential process, it is natural to assume bundles are loaded in a sequence. In this way, if a later bundle refers to things the micro VM already knows (for example, types defined in previously loaded bundles), it does not need to define or declare them again, because Mu already knows them, so the bundle can just refer to them by name/ID. The sequential nature also makes it easy to support function re-definition: since bundles form a sequence, a FuncVer in a newer bundle replaces the current "most recent" version in the global bundle.
The separate-compiling approach is the traditional, well-known way C compilers work. But it does not address function re-definition: re-definition is still an "action" rather than a declaration, and the order of "which FuncVer invalidates which older FuncVer" does matter.
## What the client may want
But compiler (traditional C compiler or Mu client) writers may want a certain degree of flexibility for parallel compilation, and there is some aesthetic appeal in "**separate modules should be compiled to separate Mu bundles**". For example, for a JVM client it is more intuitive to generate one Mu IR bundle per .class file, compile each .class file separately, and still allow lazy loading. For example:
```java
//// Foo.class
public class Foo {
public static void run() { Bar.run(); }
}
//// Bar.class
public class Bar {
public static void run() { Foo.run(); }
}
```
The separate-compiling model will deliver two Mu bundles:
```
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
.funcdecl @Bar.run ... // Declare @Bar.run in Bundle1
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
.funcdecl @Foo.run ... // Declare @Foo.run in Bundle2
```
That is, `@Bar.run` is declared in Bundle1 and `@Foo.run` is declared in Bundle2. They declare functions in each other because neither has knowledge of the other.
However, in the current Mu model, the two bundles will look like:
```
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
.funcdecl @Bar.run ... // Declare @Bar.run in Bundle1
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
```
The difference is subtle: Bundle2 does not declare `@Foo.run`, because it knows Bundle1 is loaded before it, and `@Foo.run` is already defined.
It is arguable that this requires the two bundles to be built sequentially. But it can be worked around by "lifting" both declarations into a third bundle:
```
//// Bundle0:
.funcdecl @Bar.run ... // lifted out of Bundle1
.funcdecl @Foo.run ... // lifted out of Bundle2
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
```
Declaring functions is faster than defining them. After Bundle 0 is loaded, Bundle 1 and Bundle 2 can be built and loaded in parallel.
It is also arguable that "lifting both declarations into a separate bundle" is a redundant step. But in practice, this step cannot be avoided. Take Java as an example again. If one Java ClassLoader visits both Foo.class and Bar.class, then it already knows both classes, and it can simply build both into a single Mu bundle rather than splitting them into two. If two Java ClassLoaders attempt to load Foo and Bar in parallel, discover the inter-dependency, and also find each other working on the two respective .class files simultaneously, then the ClassLoaders need some synchronisation mechanism so that classes are not loaded twice. This is necessary even in existing non-Mu production JVMs. So *if there is a need to compile two inter-dependent Java classes in parallel, the client has to factor out the common parts, which naturally leads to "Bundle0"*.
An orthogonal issue is about the type system. Assume we have the two Java classes:
```java
class Foo { Bar bar; }
class Bar { Foo foo; }
```
Naturally `@Foo` should be `struct<@JavaHeader ref<@Bar>>`. However, without looking at Bar.class, we cannot define the type `@Bar`, which is supposed to match the structure of the Java class fields in Bar. So if we enforce lazy loading, then Foo.bar has to be represented as `ref<void>` rather than `ref<@Bar>`. This has been [discussed in a separate issue before](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/38). **The separate-compiling model does not solve this problem**, because the crux is that the **knowledge** of Bar is only obtained by looking at Bar.class. Unlike declared-but-not-defined functions, having **types** that are not yet known (the C language calls them "incomplete types") causes many problems. Such types are inaccessible. If traps should be triggered when a type is used, it is hard to define what "a type is used" means. If we define it as accessing an object of that type, or simply performing a BinOp on such a type, then almost any instruction can trigger a trap.
## Conclusion
In the end, we still believe the current Mu design is reasonable for its purpose as a JIT compiler.
The current "single global bundle" design is also easier for the boot image writer because there is only one bundle to consider.
But we may consider the needs of programming language implementers that "modular languages should be compiled to modular object code". The implication of adopting this model is still not clear. Alternatively, this model could also be implemented in a layer above Mu.
---
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/65
**The "surface" IR as a layer above the formal IR** (2018-06-28T23:11:33+10:00, Kunshan Wang)

# Abstract
After the recent discussions about several alternative Mu IR forms, I realised that the needs of formal verification, the concrete micro VM implementation, and the application/client may be different. The IR, as a language, is the means for the client to transfer programs to the micro VM, and this demands *compactness*, *efficiency* and a certain degree of *expressiveness* that does not hinder efficient implementation. Formal verification, on the other hand, benefits from an IR designed in the spirit of *functional* languages: *abstraction*, *consistency*, and *simplicity* with respect to an operational model.
This issue proposes a two-level model: a "surface" form primarily for client-µVM communication, and a "formal" form for formalisation. The "surface" form is strictly syntactic sugar over the "formal" form, and can be transformed statically and automatically into the latter when needed.
With the surface form "detached" but mappable to the formal form, the surface form can be designed more aggressively for code compactness, such as introducing side-exits in the middle of basic blocks. But I am not advocating making such aggressive design changes now.
# Different concerns
## Terminating Instructions and Code Bloating
Currently, in the "surface" form as described by the (informal) [Mu Spec](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#ssa-variables), some Mu IR instructions can be either a normal instruction in the middle of a basic block, or a terminating instruction with several explicit destinations. The `CALL` instruction is a well-known example. A basic block may contain many `CALL` instructions in a sequence. When an exception is thrown, the default behaviour is "rethrow". In the "formal" form as described in [uvm-formal-hol](https://gitlab.anu.edu.au/mu/mu-formal-hol/blob/master/uvmIRScript.sml#L370), by contrast, `CALL` is always a terminating instruction, and both normal and exceptional destinations must be explicitly defined.
The "surface" form is designed with compactness and the machine behaviour in mind. Real-world programs will contain many function calls in a sequence. When no exceptions are thrown, each (both machine-level and language-level) call will simply fall-through to the next instruction, with the return values available to be accessed by subsequent instructions and function calls. The default "rethrow" behaviour is design so that when the current call site cannot catch exceptions, the stack-unwinder will simply skip the current function. It is *the fact that the call site does not handle the exception in most cases* that makes the code efficient.
A related topic is to add "side exits" in the middle of basic blocks, as is done by JikesRVM. This will remove the need to split basic blocks even when exceptions and "uncommon branch targets" are present and thus makes the IR more compact, but it will also make the control flow less explicit.
On the other hand, the "formal" form makes it easy to reason about all branches and all possible execution paths. If the IR is augmented with "NO_THROW" annotations, it can further make exceptions undefined behaviour, and relieve some of the verification burden for some Mu functions.
Real-world programming languages, however, usually choose one way or the other consistently, with "rethrow" being the more common. Java, C#, Python and RPython, for example, always "rethrow", and do not provide a "nothrow" option. C++ allows the ["noexcept"](http://en.cppreference.com/w/cpp/language/noexcept) annotation on some C++ functions, and C++ exceptions can silently pass through any C functions on the stack, as long as the C functions are compiled with compatible compilers (such as g++/gcc) and the object files contain the appropriate unwinding information (.eh_frame in ELF).
But if future languages or existing languages with extensions (vmmagic) can make use of this feature, it has the potential to provide positive effects for the verifiability of carefully-written Mu functions.
## Declarative vs Operational
The current Mu IR is inspired by the LLVM IR. This IR describes an AST of functions and top-level declarations, such as constants (which, in LLVM's terminology, include literals, global variables and functions). The SSA form is naturally a "dependency-description language": an instruction has many "uses" of other [Value](http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html)s, and each "Value" can be anything we can get a value from, that is, either a constant or a local variable. The compiler sees the dependencies between instructions, such as "this `add` instruction depends on an instruction result and a ConstantInt". With a type hierarchy, the compiler needs to pattern-match against the kinds of Values, as a static code-transformer always has to do.
On the other hand, [mu-formal-hol](https://gitlab.anu.edu.au/mu/mu-formal-hol) describes the execution of a thread as a sequence of state transitions (a kind of interpretation in the functional style). A thread has many registers, each holding the value of a local variable. The registers are modified as a side effect of execution:
1. When [entering a basic block](https://gitlab.anu.edu.au/mu/mu-formal-hol/blob/master/uvmThreadSemanticsScript.sml#L383), the basic block arguments are assigned to the registers of the parameters.
2. When [executing an instruction](https://gitlab.anu.edu.au/mu/mu-formal-hol/blob/master/uvmThreadSemanticsScript.sml#L270), the instruction will affect the values of some registers, and the register values are updated.
With a "Value" being either a constant or a register, the value needs to be pattern-matched against the two cases, namely constant or register, every time an instruction argument is evaluated.
A solution to this complexity is to introduce instructions that load global variables into local variables. For example, GETCONST, GETGLOBALCELLIREF, GETFUNCREF and GETEXPFUNC. (I intentionally avoid using the word "load" in order to emphasise they are not memory operations, but merely aliasing.) This will make the IR semantically clearer and probably make proofs easier, at the cost of making basic blocks more verbose. But fundamentally the two forms are equivalent.
# Where the Two Forms Reconcile
I propose splitting the IR into two layers, with a "surface" form catering to compactness, and a "formal" form designed to be more consistent. There will be *a mapping from every "surface" IR bundle to a "formal" IR bundle*, which is where the two forms reconcile. The point is that for every "surface" bundle there is an equivalent "formal" bundle, so introducing "unfriendly" syntax will not compromise verifiability.
The mapping will "desugar" the "surface" form. There will be a conversion rule for every "surface" syntax that does not exist in the "formal" form. For example:
1. The "fall-through" call
```
%cur_bb(%v1 %v2 ... %vn):
%lv1 = ...
%lv2 = ...
...
(%rv1 %rv2) = CALL <@sig> @callee (...)
%lv3 = next instruction...
...
```
will be desugared into:
```
%cur_bb(%v1 %v2 ... %vn):
%lv1 = ...
%lv2 = ...
CALL <@sig> @callee (...) EXC(
%generated_continue_block(%v1 %v2 ... %vn %lv1 %lv2 ... $0 $1)
%generated_rethrow_block())
%generated_continue_block(%v1 %v2 ... %vn %lv1 %lv2 .. %rv1 %rv2):
%lv3 = next instruction...
...
%generated_rethrow_block() [%the_exception]:
THROW %the_exception
```
2. References to global variables:
```
%a = ADD <@i32> %a @ONE
```
will be desugared into:
```
%_tmp = GETCONST @ONE
%a = ADD <@i32> %a %_tmp
```
## Other Implications
With the introduction of an additional form, the "surface" form can be more aggressive in design.
The "surface" form may go beyond the current "single-exit" form by introducing **side-exits**, such as:
- DIV by zero, CALL/TRAP/WATCHPOINT/SWAPSTACK with exceptions, NEW/ALLOCA failure, LOAD/STORE with NULL pointers, may take side exits rather than forcing the basic block to be split.
- "guard" instructions (as usually demanded by tracing JIT compilers) may be implemented as side-exiting conditional branches. This also implies that the "side-exit" is the slow path while the "fall-through" case is the common fast path.
Currently I fear that breaking the "single-exit" property may result in the micro VM still having to split basic blocks internally. LLVM, whose basic blocks do not take parameters and whose optimisers have lots of transforms to do, would probably keep the single-exit SSA form. But since Mu has already adopted the "goto-with-values" form, whether "side-exits" should be introduced to the IR should depend on experience with the high-performance Mu implementation.
Related works:
- [B3](https://webkit.org/docs/b3/intermediate-representation.html): Apple's B3 still requires Jump/Branch/Switch to be at the end of basic blocks. The reason could be that B3 still uses the text-book SSA form, so non-merging control flow branches do not need PHI nodes, hence it is cheap to add basic blocks.
- RPython: Its transformers (including GC transformers and exception transformers) will split basic blocks for function calls with exceptions. But since RPython is static, code compactness may not be a concern.
- JikesRVM: JikesRVM uses Factored CFG (FCFG), where a Potential Excepting Instruction (PEI) does not necessarily end a basic block. As described [here](http://www.jikesrvm.org/JavaDoc/org/jikesrvm/compilers/opt/ir/ControlFlowGraph.html), FCFG will significantly reduce the number of basic blocks, but will complicate flow-sensitive global analysis. But given that Mu pushes most optimisations out of the micro VM, it is arguable that the micro VM back end may favour a simpler form.
---
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/64
**New Symbolic IR building API** (2018-06-28T23:11:33+10:00, Kunshan Wang)

# Introduction
Recent experience from Yi shows it is difficult to implement the current bundle building API in some languages, such as Rust, with respect to *mutability* and *cyclic references* between IR nodes. We are proposing an alternative IR building API. In the new API, **IR nodes will refer to other nodes by symbols (IDs and optionally names)** rather than referring to the nodes directly. The advantages are:
1. The client can build the IR nodes in any order.
2. The implementation language of *the micro VM itself* can keep symbolic references between IR nodes, which can be helpful if the language (such as Rust or Haskell) is sensitive to mutability and cycles.
The potential disadvantage is that the micro VM has to resolve symbolic references between IR nodes during bundle building or after bundle loading. But this practice has been commonly used throughout the history of compiler construction (according to Tony). We expect the cost of micro VM-side symbol resolution to be acceptable, though the actual performance implication is not yet known.
TL;DR: Scroll down to "The Proposed New API"
# Background
The [high-performance Mu implementation](https://gitlab.anu.edu.au/mu/mu-impl-fast) is written in Rust.
**Rust does not like cycles.** Every Rust l-value is owned by exactly one other l-value. This forbids cyclic references between nodes. The moment the programmer attempts to form a cyclic reference, the compiler complains about ownership problems.
But Mu IR often contains cycles, such as:
```
.typedef @linkedlist = struct < @i64 @linkedlist_ref >
.typedef @i64 = int<64>
.typedef @linkedlist_ref = ref < @linkedlist >
```
Note the recursion on the `ref` type. The text form can express this recursion because each type node has a name: `@linkedlist`, `@i64` and `@linkedlist_ref`.
The current IR building API constructs nodes directly, so API functions cannot refer to nodes that will be created in the future. To support cycles, the API mutates IR nodes after they are created:
```c
MuCtx *ctx = ...;
MuBundleNode b = ...;
// Create a type: ref<?>
MuTypeNode linkedlist_ref = ctx->new_type_ref(ctx, b);
MuTypeNode i64 = ctx->new_type_int(ctx, b, 64);
MuTypeNode members[] = {i64, linkedlist_ref};
MuTypeNode linkedlist = ctx->new_type_struct(ctx, b, members, 2);
// Set the ref<?> to ref<@linkedlist>
ctx->set_type_ref(ctx, b, linkedlist_ref, linkedlist);
```
Note that the IR nodes must be created in a particular order, with reference targets populated in the end. Such order always exists for all Mu IR bundles, because the Mu IR can only be recursive at certain spots. Constructing the IR nodes in the following order will always work: ref/uptr types -> other types and signatures -> populate ref/uptr types -> constants/globalCells/funcs/expfuncs -> funcvers -> basic blocks -> bb params -> insts/results -> branch destinations/exception clauses/keepalives/subclauses of NEWTHREAD and SWAPSTACK.
But **Rust does not like mutation, either**. Rust is designed for memory safety, especially to avoid data races in multi-threaded programs. So it is a well-known rule in Rust that *if anything is shared, it cannot be modified*. The sharing mechanisms (shared borrows (`&`), `Rc` (refcount box), `Arc` (atomic refcount box)) share in a read-only mode, unless synchronisation mechanisms (such as `Mutex`) are also applied.
This does not match the current Mu bundle building and loading design: currently, a bundle is mutable while being built, but becomes immutable once loaded. To implement such mutation, Rust needs `Cell<T>` -- "internally mutable" fields in immutable objects. And when an object is constructed but a field is not yet supplied (such as the target of a `ref`), Rust needs `Option<T>` to leave space for the `None` case. We would have `struct TypeRef { target: Cell<Option<Type>> }`, even though once loaded the `target` field is neither optional nor mutable.
## Other languages
No known programming language can express the idea of "mutable while being constructed, immutable when used".
In Java, JavaBeans is a famous "mutable construction" pattern: a "bean" class provides a parameter-less constructor and many setters, so properties can be set via setters. It is useful for introspection-based object initialisation:
```java
class Foo {
public Foo() { }
private Bar bar;
public void setBar(Bar bar) { this.bar = bar; }
public Bar getBar() { return bar; }
}
class Bar {
public Bar() { }
private Foo foo;
public void setFoo(Foo foo) { this.foo = foo; }
public Foo getFoo() { return foo; }
}
```
This pattern allows multiple objects to be created, with their dependencies injected later. The Spring Framework is one such container:
```xml
<beans>
<bean id="theFoo" class="Foo">
<property name="bar" ref="theBar" />
</bean>
<bean id="theBar" class="Bar">
<property name="foo" ref="theFoo" />
</bean>
</beans>
```
Many Spring Framework objects claim to be "thread-safe *after* being configured", which implies they may not be safe to use *during* configuration. Such objects rely on programmers following a known protocol in order to work properly.
But this "JavaBeans" pattern is also criticised for leaving the object fields non-final, though none of the properties are supposed to be changed after setting. They claim the compiler will not be able to do certain optimisations given that the fields are still mutable and the public setters are still accessible.
Despite the criticism, constructors with immutable fields are known to be unable to form cyclic object graphs. Scala is another programming language that promotes immutable objects, and it also suffers from the inability to form cycles of immutable references directly. [One workaround](http://stackoverflow.com/questions/8374010/scala-circular-references-in-immutable-data-types) is to use lazy evaluation to resolve some reference edges later, but the underlying implementation still depends on JVM-level mutable fields.
```scala
class Element [T] (val value: T, p : => Element[T], n : => Element [T]) {
lazy val prev = p
lazy val next = n
}
```
Unfortunately, such workarounds do not exist in Rust.
## rustc: The Rust Compiler
rustc is the compiler of the Rust language. Internally, it has an LLVM-like CFG representation, and there are inter-references between instructions and basic blocks.
rustc separates *references* to things from the *contents* of things. Take basic blocks for example. A CFG contains many `BasicBlock`s:
```rust
// For information, only. May not match the actual source code.
struct CFG {
bbs: Map<BasicBlock, BasicBlockData>
}
struct BasicBlock(u32)
struct BasicBlockData {
instructions: ...
blahblah: ...
blahblahblah: ...
}
struct BranchInstruction {
destination: BasicBlock
}
```
In the snippet above, `BasicBlockData` actually holds the information about a basic block, while `BasicBlock` is just a wrapper around a `u32`. Instructions, such as `BranchInstruction`, refer to the destination by `BasicBlock`, which is a symbolic reference and does not own the basic block. The actual owner of `BasicBlockData` is the `bbs` field in `CFG`. With this approach, the AST is still a tree from Rust's point of view. When the program needs information about a basic block, it can borrow a reference to the `BasicBlockData` from `CFG.bbs`, using the `BasicBlock` as the key.
It is true that this approach requires an indirection whenever accessing the information about a basic block. But if the `Map` is implemented as an array and `BasicBlock` holds the index, then the lookup is just an extra ADD and a LOAD. This is probably the best we can get if we use Rust without resorting to its unsafe features.
```rust
let bb: BasicBlock = ...;
{
    let bbi: &BasicBlockData = &cfg.bbs[bb]; // Borrow BasicBlockData
    // use bbi
} // Borrow of BasicBlockData ends here.
```
Similarly, the high-performance micro VM can use this pattern to handle cyclic references between entities. All Mu types can be owned by the bundle, and one type can refer to another via indices into the array of "MuTypeData".
The implication of this is that the API, too, should refer to other IR nodes via symbols rather than via directly constructed nodes.
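The same index-based pattern can be rendered in C. The `TypeData`/`Bundle` layout below is invented for illustration; the point is only that cyclic references become plain indices into an array the bundle owns, so no ownership problem arises:

```c
#include <assert.h>
#include <stddef.h>

/* The bundle owns all type data in one array, and nodes refer to
   each other by index, so the cyclic @linkedlist / @linkedlist_ref
   pair causes no ownership problems. Illustrative only; a real
   "MuTypeData" would carry much more information. */
typedef enum { TD_INT, TD_REF, TD_STRUCT } TypeTag;

typedef struct {
    TypeTag tag;
    int bits;          /* TD_INT */
    size_t target;     /* TD_REF: index of the referent type */
    size_t fields[4];  /* TD_STRUCT: indices of the field types */
    size_t nfields;
} TypeData;

typedef struct { TypeData types[16]; size_t ntypes; } Bundle;

/* Append a type and hand back its index -- the "symbolic reference". */
static size_t add_type(Bundle *b, TypeData t) {
    b->types[b->ntypes] = t;
    return b->ntypes++;
}
```

Looking a type up is one indexed load, matching the "extra ADD and a LOAD" cost noted above.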
# The Proposed New API
The new API will introduce another C-level struct: `struct MuIRBuilder`. Like `MuCtx`, which is used by only one client thread, the client must only use a `MuIRBuilder` in one client thread; otherwise the micro VM would need to synchronise its every method. This new struct is actually orthogonal to this topic: from observation of the current bundle building API, the bundle-building functions have no intersection with the other functions in the `MuCtx` struct, whose purpose is to mutate the running Mu state.
`struct MuIRBuilder` is also a function pointer table, like the current `MuCtx` design. Whether we should instead adopt the traditional C approach -- having all methods as top-level C functions -- is an open topic, but it is a separate issue.
```c
typedef struct MuIRBuilder MuIRBuilder;
struct MuIRBuilder {
void *header; // implementation-specific private field
MuID (*gen_sym)(MuIRBuilder *b, MuCString name);
void (*load)(MuIRBuilder *b);
void (*abort)(MuIRBuilder *b);
void (*new_type_int)(MuIRBuilder *b, MuID id, int length);
void (*new_type_ref)(MuIRBuilder *b, MuID id, MuID target);
void (*new_type_struct)(MuIRBuilder *b, MuID id, MuID fields[], MuArraySize nfields);
...
};
```
## gen_sym: use ID for everything
The `gen_sym` method creates a "Mu symbol", or just "sym" for short. A "sym" has a numerical ID and an optional string name. A "sym" identifies a node in the bundle. The ID is generated by the micro VM when `gen_sym` is called. `name` can be `NULL`. The generated ID is used to identify this "sym".
All other functions take a `MuID` parameter identifying the node being created, and use `MuID`s to refer to other nodes. An ID can be used as soon as it is returned by `gen_sym`, and *may even be used before the actual node is created*. For example, when creating the linked-list type, the C client code will look like:
```c
MuIRBuilder *b = ...;
MuID linkedlist = b->gen_sym(b, "@linkedlist");
MuID i64 = b->gen_sym(b, NULL /* I don't care about its name. */);
MuID linkedlist_ref = b->gen_sym(b, "@linkedlist_ref");
MuID members[] = {i64, linkedlist_ref}; // Use i64 and linkedlist_ref before these types are defined.
b->new_type_struct(b, linkedlist, members, 2);
b->new_type_int(b, i64, 64);
b->new_type_ref(b, linkedlist_ref, linkedlist);
```
Note that `i64` is used by `new_type_struct` before `new_type_int` is called.
Also note that the `new_xxxx` functions return `void` rather than handles. Since nodes refer to each other by "sym", handles are no longer necessary.
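A toy mock of the proposed flow -- `gen_sym` first, node creation in any order, resolution at `load()` -- can make the resolution step concrete. All types and function shapes below are simplified stand-ins for the proposed `MuIRBuilder` API, not its actual signatures:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned MuID;
typedef enum { K_UNSET, K_INT, K_REF, K_STRUCT } Kind;

typedef struct {
    Kind kind;
    int bits;        /* K_INT */
    MuID target;     /* K_REF */
    MuID fields[8];  /* K_STRUCT */
    size_t nfields;
} Node;

/* gen_sym hands out IDs; new_* calls record nodes keyed by ID
   in any order; load() is where symbolic references get checked. */
typedef struct { Node nodes[64]; MuID next_id; } Builder;

static MuID gen_sym(Builder *b) { return b->next_id++; }

static void new_type_int(Builder *b, MuID id, int bits) {
    b->nodes[id] = (Node){.kind = K_INT, .bits = bits};
}
static void new_type_ref(Builder *b, MuID id, MuID target) {
    b->nodes[id] = (Node){.kind = K_REF, .target = target};
}
static void new_type_struct(Builder *b, MuID id, const MuID *fields, size_t n) {
    Node nd = {.kind = K_STRUCT, .nfields = n};
    for (size_t i = 0; i < n; i++) nd.fields[i] = fields[i];
    b->nodes[id] = nd;
}

/* load(): every generated sym must now have a node, and every
   ref target must resolve to a created node. Returns 1 on success. */
static int load(const Builder *b) {
    for (MuID id = 0; id < b->next_id; id++) {
        const Node *nd = &b->nodes[id];
        if (nd->kind == K_UNSET) return 0;
        if (nd->kind == K_REF && b->nodes[nd->target].kind == K_UNSET) return 0;
    }
    return 1;
}
```

Deferring resolution to `load()` is what lets the client use an ID (such as `i64` in the struct's field list) before the node behind it exists.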
## complex sub-structures
More things will become IR nodes, such as
- branching destinations (basic block + arguments)
- exception clauses
- keep-alive clauses
- sub-clauses in the NEWTHREAD and SWAPSTACK instructions (these two instructions are too complex to be created by one function)
For example:
```uir
(@x @y) = CALL <@sig> @callee (@v1 @v2 @v3)
EXC(
@bb1(@v4 @x @v5 @y @v6)
@bb2(@blah @blah @blah))
KEEPALIVES(@v1 @v3 @v5)
```
```c
MuID call_inst = b->gen_sym(b, "@the_OSR_introspectable_call_site");
MuID x = b->gen_sym(b, "@x");
MuID y = b->gen_sym(b, "@y");
MuID exc_clause = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID nor_dest = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID exc_dest = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID keepalive_clause = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID call_args[] = {v1, v2, v3};
MuID call_results[] = {x, y}; // Sorry Eliot
b->new_call_inst(b, call_inst,
call_results, 2, // SSA variables for return values
sig, callee,
call_args, 3, // arguments to the function
exc_clause,
keepalive_clause);
b->new_exc_clause(b, exc_clause, nor_dest, exc_dest);
MuID nor_args[] = {v4, x, v5, y, v6};
b->new_dest(b, nor_dest, bb1, nor_args, 5);
MuID exc_args[] = {blah, blah, blah};
b->new_dest(b, exc_dest, bb2, exc_args, 3);
MuID keepalive_vars[] = {v1, v3, v5};
b->new_keepalive_clause(b, keepalive_clause, keepalive_vars, 3);
```
It is quite a lot of code, but that should be acceptable if the client does not construct bundles via this API by hand.
## Instruction results are passed in, too
Note that the "syms" of return values are passed in. They are SSA variables, too, and the instructions need to refer to their results. Most instructions will be created like "three-address instructions".
Most instructions have a known number of return values, such as:
```uir
%c = EQ <@i64> %a %b
```
```c
MuID i64, a, b = ...;
MuID cmp_inst = b->gen_sym(b, "@the_name_of_the_instruction_itself_for_tracing_and_debugging");
MuID c = b->gen_sym(b, "@func.ver.entry.c");
b->new_cmp(b, cmp_inst,
c, // the only return
MU_CMP_EQ, i64, a, b);
```
Some instructions (namely binOp (with zero/neg/ovf/carry flags), CALL, TRAP, WATCHPOINT, SWAPSTACK and COMMINST) may have a different number of results depending on their arguments. But since all instructions are "three-address", all results have to be passed in anyway.
## Put everything together
To prevent mutability, each instruction (itself) is created in one step, while complex sub-components (such as exc-clauses, keepalive-clauses, ...) are created separately.
A basic block is created in one step, too. When creating basic blocks, instructions are passed in as parameters.
```uir
%bb1(<@T1> %p1 <@T2> %p2) [%exc_param]:
%x = [%inst1] ADD ...
%y = [%inst2] SUB ...
[%inst3] TRAP ...
```
```c
MuID bb1, p1, p2, exc_param, inst1, inst2, inst3 = ... // gen_syms
b->new_binop(b, inst1, ...);
b->new_binop(b, inst2, ...);
b->new_trap(b, inst3, ...);
MuID bb1_param_tys[] = {T1, T2};
MuID bb1_params[] = {p1, p2};
MuID bb1_insts[] = {inst1, inst2, inst3};
b->new_bb(b, bb1,
bb1_param_tys, bb1_params, 2, // Two parameters
exc_param, // This block may be used to catch exceptions.
bb1_insts, 3 // Three instructions
);
```
The top level is implicit. When the `MuIRBuilder->load` method is called, all top-level definitions (types, signatures, constants, global cells, functions, exposed functions) created so far become part of the bundle. It should be reasonable to keep mutability at the very top level.
# Performance impact
This still needs to be measured experimentally.
Kunshan Wang

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/63
muapi.h: destinations (2018-06-28, Eliot Moss)

We revised the muapi.h interface recently around the results of most instructions, removing add_result and adding get_result and num_results. The situation around destinations for instructions such as BRANCH2 (just an example) seems similar. However, in this case I think it appropriate that the destination be built by the user of the API **before** making the call to create the BRANCH2, and the two handles on the destinations be passed in to the API function that creates the BRANCH2. I think this works for most branching things.
CALLs may be more problematic in that the destination may want to mention the CALL's results to pass on. However, a CALL node could support add_normal_dest (or set_normal_dest), and similarly for the exceptional destination.
But when we know the number of destinations, and the instruction itself does not produce more results that the destination can refer to, it seems cleaner to build the dests first and pass them in.

John Zhang

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/62
Possible statistics trivially implementable in the refimpl (2018-06-28, Kunshan Wang)

These will provide more information about the characteristics of real-world Mu programs, and help implementers.
Dynamic recording:
* counting instructions executed (how many ADD, LOAD, BRANCH, ... executed)
* number of instructions between branches (median)
* function call count
Static analysis:
* register pressure (static analysis)
- simultaneous live variables
Kunshan Wang

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/61
Unary operators (2018-06-28, Eliot Moss)

As we work on the pypy JIT to Mu, we have found, to our surprise, that Mu lacks certain unary operators that most languages use:
Integer negation: x -> -x
Bitwise complement/inversion: x -> ~x
Floating point negation: x -> -x
Is this intentional or an oversight? I will ask the students to code around it using a binary operation with a constant, but it feels yucky ...

Kunshan Wang

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/60
External linkage of uptr fields in the boot image (2018-06-28, Kunshan Wang)

This issue addresses the need that, in addition to "external constants" (`.const @blah <@T> = EXTERN "blah"`), some **uptr fields in the Mu memory** need to be initialised to the addresses of external symbols, too. This pattern exists in regular C programs as well as in PyPy, which compiles to C.
However, there is a pure client-side solution which can relocate uptr fields without extending either the Mu IR or the API.
Another solution that adds only one API call can let the Mu boot image builder use the system linker/loader.
# Problem
In C, global variables can be initialised to constant values. The values can be literals, and can also be pointers to other global variables. In the latter case, the pointers are expressed in the form of `&symbol` where `symbol` is the name of the other global variable.
Consider the following C program:
```c
// Foo.c
struct Foo {
int v;
struct Bar *bar;
};
struct Bar {
double w;
};
struct Baz {
FILE *fp;
};
struct Bar bar1 = { 3.14 };
struct Foo foo1 = {
42,
&bar1 // This field is initialised to the pointer to bar1 at link time.
};
struct Baz baz1 = {
&stdin // This field is initialised to the pointer to stdin at link time.
};
```
Both the `foo1.bar` and the `baz1.fp` fields are initialised to pointers. The former points to an object in the same compilation unit, while the latter points to a variable in the standard library.
However, the address of neither destination can be determined at compile time or at link time.
1. Obviously, the address of `stdin` is determined only after `libc` is loaded.
2. The address of `bar1`, though only referenced within one .c file, is also indeterminate. The reason is that the current module (executable or .so) may be loaded at a different memory address on each run. So it is not until the program is loaded that the run-time linker can figure out the absolute addresses and patch the `foo1.bar` field.
## Use in PyPy
PyPy uses some external libraries. One example is `libffi`.
`libffi` defines some global variables which the `libffi` users are supposed to use.
```c
// Excerpt from ffi.h
// libffi is an FFI implementation
// This struct describes a C data type, including both primitive types and structs.
// For structs, the *elements member points to an array of field types.
typedef struct _ffi_type
{
size_t size;
unsigned short alignment;
unsigned short type;
struct _ffi_type **elements;
} ffi_type;
// These are the descriptors of primitive C types.
FFI_EXTERN ffi_type ffi_type_void;
FFI_EXTERN ffi_type ffi_type_uint8;
FFI_EXTERN ffi_type ffi_type_sint8;
FFI_EXTERN ffi_type ffi_type_uint16;
FFI_EXTERN ffi_type ffi_type_sint16;
FFI_EXTERN ffi_type ffi_type_uint32;
FFI_EXTERN ffi_type ffi_type_sint32;
FFI_EXTERN ffi_type ffi_type_uint64;
FFI_EXTERN ffi_type ffi_type_sint64;
FFI_EXTERN ffi_type ffi_type_float;
FFI_EXTERN ffi_type ffi_type_double;
FFI_EXTERN ffi_type ffi_type_pointer;
```
`libffi` describes C data types using the `ffi_type` struct. Primitive types are pre-defined global variables. If the user wants to describe a custom C struct, it creates an instance of `ffi_type` and fills in the fields.
```c
// Suppose we want to describe this struct:
struct Foo { int a; char b; void* c; };
// We define an ffi_type instance:
/// First make an array of field types
ffi_type *field_types[4] = { &ffi_type_sint, &ffi_type_sint8, &ffi_type_pointer, NULL };
/// Then describe struct Foo itself.
ffi_type foo_type = {
0, // will be initialised by libffi
0, // will be initialised by libffi
FFI_TYPE_STRUCT, // it means "Foo is a struct"
  field_types // "Foo has these fields"
};
```
Keep in mind that these data structures are **raw C data structures**.
PyPy, as a high-level language implementation, will store pointers to such structs in PyPy-level **heap objects** and use them later. In RPython, heap objects in the "boot image" (the `pypy` executable) are global C variables. It will look like:
```c
struct pypy_path_to_module_SomePyPyObjectType object = {
GC_HEADER,
HASHCODE,
BLAHBLAH,
&foo_type // Untraced pointer to C global variable
};
```
At compile time (from RPython source code to C source code), the RPython toolchain describes objects in the boot image symbolically: structs are described field by field, and may contain pointers to other struct values. The toolchain also makes use of the fact that RPython programs eventually compile to C. All such struct values become global variables in C, *no matter whether they are GC-ed heap objects or not* (this also means they are immortal). **This approach avoids the dynamic linking problem** because C source code can still refer to other global variables symbolically, whether they are traced or not, and it off-loads the task of address resolution to the linker and the loader.
## Problem to handle this in Mu
Mu strictly distinguishes between traced references (`ref<T>`) and untraced pointers (`uptr<T>`). Mu treats `uptr<T>` as raw integer values and does not care about its destination. This means, in Mu, **untraced pointers are literally untraced**.
But the reason the boot image builder works is that *it can use the GC to trace all references in all heap objects and global cells (which are still scanned) and find the transitive closure*. The GC can find all reference fields and record references between objects. From this object-reference graph it can generate *relocation entries*, which allow the loader to fix references between heap objects.
So the boot image builder has no way to "trace" untraced pointers and find which memory location contains a raw pointer to which untraced memory region. That is, the boot image builder cannot express the following C structure:
```c
struct Foo {
int v;
struct Bar *bar;
};
struct Bar {
double w;
};
struct Bar bar1 = { 3.14 };
struct Foo foo1 = {
42,
&bar1 // Cannot express this, because it is UNTRACED pointer.
};
```
The reason is that the boot image builder takes the **values** held inside objects as its input, not their symbolic initialisers. The boot image builder sees the **current** address of `bar1`, but the boot image is relocatable, and that address will no longer be valid after loading.
# Solution
## Solution 1: Redesign the PyPy-level library, or the translation process.
The reason PyPy needs such C structs is that it needs to call C functions that need them. Currently these C structs are expressed as "constant struct values", which are initialised at compile time. If the PyPy-side library were written with the fact that "raw pointers are not preserved across boot image building" in mind, such structs would be created at run time rather than compile time, and there would be no need to preserve "pointers from one struct to another".
Alternatively, all untraced structs could be translated to C programs, compiled by a conventional C compiler (GCC), and linked against the Mu program (pypy in this case) dynamically. Given that the program's purpose is to interact with native C programs, having extra C programs is not completely wrong, though not very elegant.
## Solution 2: Let the client reinvent the linker
This approach requires the client to record a list of (iref, symbol) pairs. Each pair means: "please fill the pointer field at this iref with the address of this symbol before running the `main` function". This is exactly what the system linker does. With the existing `.const @blah <@T> = EXTERN "blah"` external constants, the client only needs to generate an initialiser function containing a list of STORE instructions that update each iref. The list of irefs can be saved in a heap object which is held by a global cell and built into the boot image. As soon as this list has been used, it can be GC-ed (just nullify the only global cell that holds a reference to it).
## Solution 3: Let Mu support such relocation
Only one API function needs to be added:
```c
void (*add_ptr_reloc)(MuCtx *ctx, MuIRefValue field, const char *symbol);
```
`field` is an iref to a memory location (in a heap object or global cell) of `uptr<T>` or `ufuncptr<sig>` type. When called, this function adds a relocation entry to the running micro VM. It has no effect on the running VM; but when the client later orders the micro VM to build a boot image, the boot image will contain relocation entries that re-initialise the given field to the address of the given symbol.
Unlike Solution 2, this solution can make use of the system linker/loader, but it adds more burden to the micro VM. Given that the micro VM's boot image builder already has to handle relocation entries, this requirement looks reasonable.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/58
Signal handling for clients or user programs (2018-06-28, Kunshan Wang)

TL;DR: The concept of a **signal** is unix-specific and varies *a lot* among operating systems. However, many programs, especially unix command-line programs, depend on signals for basic user interaction, such as CTRL-C. This issue talks about what a Mu implementation can do for the client and the application programmers. The *simplest* thing a Mu implementation can do is *do nothing*. If the implementation wants to be responsible, there is much room to design its implementation-specific interface.
# Problem statement
Many UNIX command line programs use signals to interact with the user or other programs **in normal use cases**, such as
- Asynchronous signals:
- SIGINT: received when the user presses CTRL-C
- SIGWINCH: received when the size of the terminal window is changed
- Synchronous signals:
  - SIGPIPE: received when attempting to write to a broken pipe. For example, in `cat foo.txt | sort | uniq`, `cat` receives SIGPIPE if `sort` exits early and `cat` then tries to write to its stdout.
## Mu status quo
Mu is not designed for Unix only, and does not contain "signal" in its IR or API.
For most errors, Mu IR programs behave in a signal-agnostic way. For example:
- division by zero: In the [UDIV/UREM/SDIV/SREM](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#binary-operations) instructions, an "exceptional destination" must be specified. It is a basic block in the same function as the `*DIV`/`*REM` instruction, and it behaves like a jump. Implementations may use signals to implement such jump, but may also generate a `CMP reg, 0`, `JE abnormal_dest`, `DIV` sequence.
- NULL pointers: Just like division by zero, the [LOAD/STORE](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#load-instruction) instructions also take an "exceptional destination", which is jumped to when a NULL-pointer error occurs.
But Mu does not define what happens when CTRL-C is pressed.
## Signals, VMs and operating systems
[This article from IBM](http://public.dhe.ibm.com/software/dw/java/i-signalhandling-pdf.pdf) describes what happens when the user presses CTRL-C on different operating systems. Obviously the behaviours vary greatly.
Quote:
- On z/OS and AIX: A single thread, chosen by the operating system, receives the signal.
- Linux: All threads receive the signal, and the signal handler is invoked on each thread. Linux threads are just separate processes that share the same address space, so it is also possible for another application to raise a signal on a specific thread.
- Windows: A new thread is created for executing the signal handler. This thread dies once the signal handler is complete.
JVM does not provide any official mechanism to handle signals, probably because JVM is not UNIX-specific, either. In fact, [Jython cannot handle CTRL-C like CPython does](http://bugs.jython.org/issue1270): Jython immediately terminates the Python program rather than raising a catchable Python exception.
The closest public Java API is `Runtime.addShutdownHook`. The hook will be called when the VM is shutting down, including when CTRL-C is pressed. But this mechanism cannot prevent the shutdown sequence from happening.
There is a proprietary interface: [sun.misc.Signal](http://www.docjar.com/docs/api/sun/misc/Signal.html). The documentation says the signal is handled in a new Java Thread running at MAX_PRIORITY. But `sun.misc.*` is private to the JVM implementation, and thus cannot be depended on.
# What can Mu implementations do?
For synchronous signals, such as `SIGPIPE`, the client should provide wrappers so that `write` does not raise `SIGPIPE`, but instead returns a special error code or throws a Mu exception. In this way, the client bypasses the potential signal. It looks like simply masking this signal, or setting the file descriptor to not raise SIGPIPE, will do the job.
For asynchronous signals, a Mu implementation can:
1. Do nothing. SIGINT, ... will simply kill the process. Or,
2. Provide platform-specific interfaces to the client.
The first option is the easiest if we just want a running micro VM.
The second option is where the Mu implementation writers (such as [mu-impl-fast](https://gitlab.anu.edu.au/mu/mu-impl-fast)) can demonstrate their creativity. It will be like doing a mental exercise of "How will you design the Java API so that it is easy to write command line tools (such as `cat` and `grep`) in Java?"
One method I can think of is to provide a global event queue. The client should provide a thread that polls this queue in the background and takes appropriate actions (such as sending messages to other Mu threads, or interrupting them by setting shared variables, using watchpoints or futexes).
If we find one interface particularly favourable, we may consider *standardising* the extensions for *all Mu micro VMs on UNIX*.
# Higher-level view
"Signal" is a 1980s-1990s UNIX idea. Before 1995, POSIX does not have "threads", so there were one thread per process. Signals are sent to processes, and then they are handled by **the only thread** in the process. At that time, signals were probably a very straightforward message-passing mechanism. The **stack layout is exposed** to the C programmer via the parameter to the signal handler, probably because at that time, those who handle signals are probably system experts.
But things changed when the **multi-threading** model came into being, and **VMs made stacks opaque**. This makes us ask: **what really are the endpoints of signal communication**? It looks like the old signal model is no longer perfectly suited to the multi-threaded world, but the *command-line interface was designed in those old days* and has not fully adapted to the new world yet. I am looking forward to seeing how future programming languages and VMs change the way people write command-line programs.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/57
Ahead-of-time Compiling & Boot-image (2018-06-28, Kunshan Wang)

This issue is about supporting ahead-of-time compiling and building boot images.
Mu (including both the IR and the API) is designed for JIT compiling. Very little is specified about the ahead-of-time compiling scenario. However, real-world language VMs (such as the `pypy.exe`, `python.exe` or `java.exe` executable images) are executable images and should be in the system-specific native image format (such as ELF). The image should contain the micro VM and the client. Preferably it should also contain **AoT-compiled core libraries** (such as built-in object types, `java.lang.Object`) and, in some cases (such as PyPy), the **AoT-compiled interpreter and metacircular client**.
This issue will discuss the following topics:
- Dynamic linking and loading (linking at start-up time by the system linker)
- Symbol resolution (determine the addresses of symbols (such as `write`))
- This will revive an old idea: "load-time constants" (https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/47)
  - Proposed [load-time constants](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_343): `.const @Xxxx <@T> = EXTERN "write"`
- Library dependencies (which `.so` should be loaded?)
- Each ELF or Mach-O file can specify its library dependencies. But this part is extremely platform-specific.
- Could [add a new top-level](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_344), but my hypothetical scenarios ([1](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_345), [2](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_346)) suggest external linkage should be specified in a separate linking step, like: `ld impl_supplied_entry_point.o bootimage.o -l external-lib -o executable`.
- Possible extensions to the API to address boot-image building
- What should be in the boot image?
- This is very client-specific. It's determined by how the client is implemented, metacircular or not.
- How to determine what is in a boot image?
- Probably using a whitelist. The client can always record all necessary things.
I will consider the following scenarios:
1. VM with non-metacircular client (No active project. My obsolete [js-mu](https://gitlab.anu.edu.au/mu/obsolete-js-mu) was an example).
2. AoT compiling Mu IR program into the boot image. (RPySOM interpreter as an RPython program)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/56
ASM-style IR builder (2018-06-28, John Zhang)

*Created by: wks*
This issue discusses a higher-level abstraction over the IR builder API. It will allow the client to construct the Mu IR CFG in a stateful style. The stateful builder holds a pointer to the "current basic block" at any time, and new instructions are implicitly appended to the end of the current basic block. Such an interface can also emulate fall-through-style ASM instructions, such as JL, JE, JNE, etc.
It is a layer above the API. The muapi.h should still be kept minimal.
There is a problem in implementation. Such a builder is easy to build for SSA form, but since we have switched to the "goto-with-values" form, more book-keeping needs to be done in the client. Probably we still need a soup of objects in the client, do liveness analysis, and convert SSA to goto-with-values.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/55
Mu, LibMu and LibMuXxx: Layered API for the client (2018-06-28, John Zhang)

*Created by: wks*
As we discussed today, the structure of Mu and its client-facing interfaces should be like this picture:
![mu-libmu](https://cloud.githubusercontent.com/assets/370317/15414816/6b55849c-1e81-11e6-839a-c6cac254845f.png)
(Black text represents the component, and red text represents the programming language they are implemented in.)
In the inner circle is the micro VM. It can be implemented in any language, but it provides a C API, and *both the micro VM and the C API (i.e. the inner black circle boundary) need to be verified*. Outside the outer circle is the client. The ring in between is a library which we call "LibMu". In theory, the client, the LibMu and the Mu micro VM can be implemented in different languages.
When LibMu (or some language-specific LibMu wrappers, such as LibMu-Z for some hypothetical language Z, as shown in the picture as the client-facing semi-circle) talks with the client, **it should present a nice client-friendly interface for the client to construct Mu IR bundles**. Such interface should provide appropriate data structures, data types and constructors or even high-level transformers for the convenience of the client. *This layer does not need to be minimal*.
When LibMu talks to the Mu micro VM, **the interface must be minimal and verifiable**. We agreed (#50) that it is difficult to verify a parser, which rules out "sending text or binary blobs into the micro VM (across the black circle)". The C API of the micro VM provides a function-call-style API (also discussed in #50, but it needs to be revised) so that LibMu constructs a bundle in Mu by making a sequence of function calls, each of which constructs one Mu IR node (such as an instruction, basic block, type or constant).
Some programming languages (such as Python, Haskell, ...) may have relatively high overhead when calling C foreign functions, compared to direct C-to-C calls. If the client is written in such a language (language Y in the picture), it will be slow to construct the Mu-side AST by frequently calling through the C interface. We consider this a problem of the language implementation of Y. In that case, the client should have some part of it written in C (the inner, micro-VM-facing semi-circle in LibMu) so that language Y can encode the Mu IR bundle and send it to the C component (this interface does not need to be verified), and the C component constructs the Mu IR in Mu via the C API (this interface is verified).
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/54
Flags in arithmetic/logical operations (2018-06-28, John Zhang)

*Created by: wks*
This issue will give the client access to the flags set by arithmetic or logical operations, such as overflow, carry, zero, negative, ... This issue should only affect the BinOp instructions (ADD, SUB, MUL, ...).
The design should consider:
- [ ] scalar integral types
- [ ] scalar floating point types
- [ ] vector types

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/53
Access to native thread-local memory (2018-06-28, John Zhang)

*Created by: wks*
This issue is about accessing thread-local memory/variables defined in native programs (C). One important application is the `errno` variable in C/C++.
This is only slightly related to #52 which introduces thread-local storage to Mu itself. There is no intention to force Mu's thread-local storage use the same mechanism as native programs.
Thread-local storage in native programs is highly machine/OS/ABI-dependent. The register used to point to thread-local buffers varies, and not all platforms may have such a register.
One possible workaround could be depending on helper functions written in C or assembly.
But if we want Mu to integrate more deeply with native programs (i.e. do things more efficiently), we can define more instructions (probably "common instructions") to give Mu more capabilities, such as getting/setting the value of the FS register. But any such instructions would likely be platform-dependent and probably optional on unsuitable platforms.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/52
Thread-local storage (2018-06-28, John Zhang)

*Created by: wks*
Add **thread-local** memory to Mu, in addition to the existing *heap*, *stack* and *global* memory.
[Proposal 1](https://github.com/microvm/microvm-meta/issues/52#issuecomment-213364592): the C-like approach, has known problems
[Proposal 2](https://github.com/microvm/microvm-meta/issues/52#issuecomment-213375674) (preferred): a more aggressive design

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/3
The client should see opaque references rather than transparent addresses. (2016-06-17, John Zhang)

*Created by: wks*
(Discussed in 5 Aug 2014 meeting) The client may hold references to the µVM heap, but the client should see opaque references rather than raw addresses. More specifically,
1. The client may hold actual raw addresses, but it should consider it opaque. It should access the µVM memory using the µVM-provided API and should not depend on the fact that they are addresses.
2. The client holds indices into a table maintained by the µVM (or keys to a hashtable by µVM) and has to access the µVM memory through the API.
And the µVM should keep track of all references held externally.
In either case, this behaviour should be documented and implemented accordingly.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/4Should fully stick to C++11 memory model2016-06-17T15:22:34+10:00John ZhangShould fully stick to C++11 memory model*Created by: wks*
Currently the memory ordering primitives are copied from LLVM. As stated by LLVM langref, their memory model is not precisely defined.
> These semantics (ordering) are borrowed from Java and C++0x, but are somewhat more colloquial. If these descriptions aren’t precise enough, check those specs (see spec references in the atomics guide).
But the MicroVM should have a precise memory model. The C++11 memory model is the result of a lot of effort and we should stick to it.
Things to be done in the MicroVM:
* Remove the "UNORDERED" memory order. It was designed by LLVM for Java, but the MicroVM will treat data races as undefined behaviour and leave the security constraints of Java to the client.
- If the JVM can be implemented in C/C++, its memory model should also be implementable on a MicroVM with a C++11-like memory model.
* Add the "CONSUME" memory order. Define the "carries-a-data-dependency-to" relation in the MicroVM. Some hardware is aware of data dependencies.
* Define the program order (which is a total order per MicroVM thread because the MicroVM does not have unspecified parameter evaluation order), the synchronisation order, and the synchronises-with and happens-before relations.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/7Choose between swap-stack with and without parameters.2016-06-17T15:22:41+10:00John ZhangChoose between swap-stack with and without parameters.*Created by: wks*
Coroutines may communicate with each other by yielding and, at the same time, passing data between stacks. There are two ways this can be done.
1. Allocate memory region in one stack using ALLOCA and let the other stack write data in those region.
2. Let the swap-stack operation take parameters.
The first approach is currently preferred by @wks and @hosking; [Dolan et al.](http://dl.acm.org/citation.cfm?id=2400695) proposed the second approach but does not discuss the difference between the two.
We need to choose the appropriate one.spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/1Must specify protocol for client-uvm sharing of signal handlers2016-06-17T15:22:27+10:00John ZhangMust specify protocol for client-uvm sharing of signal handlers*Created by: mn200*
As per discussion in meeting on 1 August 2014, we think that the uVM should be the only entity making the `signal` system call. Other entities (*i.e.*, clients in practice) can register signal-handler-like objects with the uVM, which promises to pass on signals generated by their code.
This needs to be documented in the spec-wiki, perhaps in a section about client-uVM API.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/2Multiple versions of the same function2016-06-17T15:22:30+10:00John ZhangMultiple versions of the same function*Created by: wks*
This issue addresses the representation of multiple versions of a function due to function redefinition.
Affects: the MicroVM reference implementation, the MicroVM-Client interface.
Does not affect: the MicroVM IR language
## Background
In the current microvm-refimpl, there are several classes whose relations are as follows:
1. Function: represents a callable function. One per function ID
2. CFG: a concrete function definition. Has many basic blocks, each of which has many instructions.
3. A Function has zero or one CFG: if zero, the function is declared but not defined; if one, it refers to the most recent version of the function definition.
4. Stack: represents the contexts of nested function activations.
5. Frame: the context of a function activation
6. A Stack has many Frames
7. A Frame has one Function: the function this frame is created for.
## Problem
After function re-definition, a new CFG is created and `Function.cfg` is set to the new CFG. The old CFG is discarded. This is problematic because:
a. Function redefinition only affects future invocations, but there are existing activations deep in the stack.
b. A frame corresponds to a concrete CFG, not an abstract callable Function. When a Function is redefined, the CFG of an existing Frame remains the same (is not redefined).
c. When a trap in an old version of a function is triggered, the client will introspect the frame, which requires the metadata of the old version of the function, i.e. the CFG, so it cannot be disposed of.
## Example
Assume we have a naive Fibonacci number function:
```
int executeCount = 0;
int fib(int n) { // version 1
if (executeCount++ == 1024) { trap(keepalive=[n]); }
if (n<=1) return n; else return fib(n-1) + fib(n-2);
}
```
In the MicroVM, a dictionary is kept so that
```
functions = { "fib" : <version 1 of fib> }
```
When this has been executed too many times, the trap is triggered and the client decides to redefine `fib` as follows:
```
int fib(int n) { // version 2
int a=0, b=1;
while(n--) { int tmp=a+b; a=b; b=tmp; }
return a;
}
```
And in MicroVM:
```
functions = { "fib" : <version 2 of fib> }
```
However, when the trap is triggered again (which is possible in this scenario), control goes to the client and the client looks up `functions["fib"]` to get the metadata. The frame is still for version 1, but the lookup yields `<version 2 of fib>`. This will cause errors.
## Solution
Change the object relations so that:
* In 3, a Function not only has the most recent CFG, but also maintains a list of historical CFGs.
* In 7, a Frame no longer has a Function, but has a CFG, instead.
* When introspecting a frame, during trap or by other means, the client gets the concrete version of a function rather than just a function ID.
* Specify that instructions in the newer version cannot reuse the IDs of instructions in the older version (as is already the case in the refimpl) so that all instructions (especially TRAP instructions) have unique IDs over time. Each TRAP instruction can then uniquely identify the CFG it is defined in.
All function definitions, i.e. CFGs, are kept alive until the last activation has returned.
## Open questions
How to identify a particular version of a function? Does the MicroVM really need an interface for getting a specific version of a CFG? The client certainly has more information than the MicroVM about the program.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/5A heavy-weighted "Frame State Construction" mechanism2016-06-17T15:22:37+10:00John ZhangA heavy-weighted "Frame State Construction" mechanism*Created by: wks*
During on-stack replacement (OSR), it is often desired to save the state of a partially executed function, compile a new optimised version of the function and restore the state to the state before. How to map the old state to the new state is the job of the Client, but the µVM can provide a new mechanism to make this easier.
Frame State Construction creates a new stack frame and populates it with the state of a partially executed function. The client specifies the values of all SSA Values (at least all live values) in the function and the next µVM instruction to execute. Then the stack can be resumed.
This is an extremely powerful mechanism that allows the program to continue at any point in the code, but it may be difficult to implement.
p.s. I do not want to use the word "restore" because the µVM does not care about the "restore" semantic.
Example (in the C language):
```c
void inc_all(int ar[], long sz) {
int i;
for(i=0;i<sz;i++) {
cont:
ar[i] += 1;
}
}
```
But instead of starting from the beginning, I want to continue from the label "cont" with i = 100. The µVM should let me express something like:
```c
construct_frame(func=inc_all,
                next_inst="cont",
local_vars={
ar: SOME_OLD_ARRAY,
sz: SOME_OLD_VALUE,
i: 100
})
```
Similarly in µVM IR, the code should be like:
```uir
.typedef @i32 = int<32>
.typedef @i64 = int<64>
.funcdef @inc_all <void (iref<@i32> @i64)> (%ar %sz) {
%entry:
BRANCH %head
%head:
%i = PHI @i64 { %entry: 0; %body: %i2; }
%cond = SLT @i64 %i %sz
BRANCH2 %cond %body %exit
%body:
%addr = SHIFTIREF <@i32> %ar %i
%old_val = LOAD <@i32> %addr
%new_val = ADD <@i32> %old_val 1
STORE <@i32> %addr %new_val
%i2 = ADD <@i64> %i 1
BRANCH %head
%exit:
RETVOID
}
```
I should be able to let it continue with:
```c
stack.create_new_frame(func = "@inc_all",
next_inst = "%addr", // %addr is actually the instruction's name,
// i.e. the instruction that calculates %addr.
local_vals = {
"%ar": SOME_VALUE,
"%sz": SOME_VALUE2,
"%i": 100,
"%cond": 1, // true
"%addr": WHATEVER, // This will be calculated immediately
"%old_val": WHATEVER, // This will be calculated immediately
"%new_val": WHATEVER, // This will be calculated immediately
"%i2": WHATEVER, // This will be calculated immediately
})
```
**Potential challenges**
1. This needs close collaboration with the code generator, especially the register allocator. It may require a stack map (recording in which machine register or memory location each local SSA Value is stored) at **every instruction** that can potentially be continued from.
* solution1: Add some dedicated "continue point" instruction where the code generator generates stack map. The "continue point" itself is a no-op.
* solution2: Upon request, re-compute the stack map for the instruction to continue from. This must match the actual function code.
2. Cannot continue before a PHI node or a LANDINGPAD. PHI depends on incoming control flow and LANDINGPAD depends on the exception.
* solution: continue **after** those instructions, instead.
**possibilities**
1. Theoretically all possible states of a stack can be constructed: not just "continuing from an instruction", but also a frame that "is calling some function but has not returned", a frame that "is trapped to the client", or a dead stack.
2. Can we preserve the state of a full stack, quit the program, re-run and re-construct the whole stack again? (persistent program state)
# Alternative solutions for OSR state preserving
## Save states in global variables.
This (problematic) approach is taken by [Lameed et al.](http://dl.acm.org/citation.cfm?id=2451541). The saved states are loaded at the beginning of the newly-compiled function.
Problems:
1. It uses global variables, which is bad for concurrency.
2. It needs to generate code for loading those global variables. Lameed et al. compile the function twice; the loads are removed in the second compilation.
## Create a partially-evaluated function
This is a functional approach, similar to the concept of "continuation" in Scheme. The new function takes no parameters (or arbitrary parameters) and behaves like the "bottom half" of the old function.
Advantage:
1. This "continuation" is just an ordinary µVM function and does not require special mechanisms other than OSR and function definition (not even **re**-definition)
2. At least as fast as Lameed et al.'s approach. Both compile two versions of the new function: one for continuing and the other for fresh new calls.
Problems:
1. It requires compiling a one-shot function just for one continuation. The "Frame State Construction" mechanism may look heavy, but it is still lighter than compiling a new function.
2. For imperative programming languages, "continuation" may be difficult to create and may require complex control-flow analysis.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/6Document safe/unsafe memory access instructions2016-06-17T15:22:39+10:00John ZhangDocument safe/unsafe memory access instructions*Created by: mn200*
As per discussion on 22 August, we want to have two forms of every memory accessing instruction. One is "unsafe" and the machine is not required to present a corresponding abstract state to the client. One is "safe", and if a memory exception occurs, the machine commits to giving the client an appropriate picture of the abstract state at that point.
All this discussion needs documenting.spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/9Vector operations2016-06-17T15:22:44+10:00John ZhangVector operations*Created by: wks*
Some modern processors provide SIMD instructions. Using them properly can greatly increase the performance of some computations. The µVM should expose them to the user.
LLVM's approach:
* Vector types are first-class types. Most instructions which accept scalar values also accept vector values.
* Binary operations are done element-wise.
* Comparison operations are done element-wise, resulting in a vector of one-bit integers.
* Conversions can be done between integer vectors, integer vector to FP vector, but not between different FP types (no fptrunc and fpext for vectors)
* The select instruction works with vector condition with vector values, and also scalar condition with vector values.
* extractelement and insertelement to build and extract element from vectors
* shufflevector: repermutate elements of two vectors
* All other instructions that work with first-class types, including phi, ret, function calls, the contents to load/store.
* LLVM does not address scatter/gather.
Things to be done in the µVM
* Alignment requirements.
* Provide extra intrinsic functions to support machine-provided operations not covered by the `BinOp` µVM instruction. Example: reciprocal, exp, log, abs, ...spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/11Mechanisms to support thread synchronisation primitives (including blocking l...2016-09-12T17:09:11+10:00John ZhangMechanisms to support thread synchronisation primitives (including blocking locks)*Created by: wks*
In a multi-threaded environment, threads may need to wait for the availability of mutually exclusive resources or asynchronous events. Blocking locks are provided by many programming languages and operating systems as primitives to facilitate threaded programming.
Characteristics of the µVM prevent directly adopting pThread mutexes.
1. [Dolan et al](http://dl.acm.org/citation.cfm?id=2400695) implemented efficient blocking locks in an environment with green threads built on SWAP-STACK. When there is only one native thread, atomic and ordered memory access is not necessary. Even when there are multiple native threads, locks can still be implemented with merely an integer, atomic memory operations and a lock-free waiting queue.
2. There may not always be a one-to-one mapping between language-level threads and native threads (SWAP-STACK-based green threads are one example). The client may also implement an m-to-n relationship where logical threads can be scheduled on any native thread. Re-entrant locks are based on the logical notion of "thread", which may not be a native thread. However, pThread recursive mutexes depend on native threads.
3. Spin locks can be implemented using only integers and atomic memory accesses.
4. System calls are expensive.
# The proposal
The µVM only needs to provide two primitives: SUSPEND and RESUME
* SUSPEND: suspend the current thread. The thread blocks, but the state of the underlying stack is still ACTIVE.
* RESUME: resume a given thread if it is waiting.
Note: more commonly known as PARK, UNPARK
On Linux, futex provides these two primitives, except the Linux kernel maintains an internal waiting queue. Their WAIT and WAKE are address-based (the address of the lock) rather than thread-based (Which thread should suspend/resume?). The above two primitives can be implemented on top of futex.
We still need to find equivalent counterparts on OSX, Windows and other operating systems. Using pThread mutex for waiting is also an option, albeit expensive. (On Linux, glibc implements pThread mutexes using a mixture of user-level fast paths and the futex)
# proof of concept
It still needs to be shown (by code) whether these two primitives are sufficient.
# yield points
How can another thread be suspended? Who inserts the yield points at which threads can be suspended, the client or the µVM?
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/17Portability vs exploiting architecture potentials.2016-06-17T15:23:01+10:00John ZhangPortability vs exploiting architecture potentials.*Created by: wks*
> For should the enemy strengthen his van, he will weaken his rear; should he strengthen his rear, he will weaken his van; should he strengthen his left, he will weaken his right; should he strengthen his right, he will weaken his left. If he sends reinforcements everywhere, he will everywhere be weak. -- Art of War, by Sun Zi
The µVM is designed to abstract over the hardware, but only acts as a thin layer of abstraction.
There are many differences between underlying hardware platforms. Scroll down to the Appendix or read https://github.com/microvm/microvm-meta/issues/16
# Solutions to differences
As mentioned in https://github.com/microvm/microvm-meta/issues/16, there are three solutions, from the weakest to the strongest.
1. Define the differences as **undefined behaviour**. The client must prevent touching those fields at all cost.
2. Define the differences as **implementation-defined behaviour**. The µVM implementation defines the behaviour and provides documentation and compile-time/run-time checkable mechanisms.
3. **Define the behaviour** in the µVM. The µVM implementation bridges the differences.
The 1st approach is the weakest. It makes it absolutely impossible to make use of any platform-specific features and introduces excessive checks to avoid undefined behaviour. So the µVM should lie somewhere between approaches 2 and 3.
# The µVM design goal
The µVM has several conflicting design goals.
1. µVM is low-level. It is a thin layer over the hardware.
2. µVM is minimal. Anything that can be done efficiently by the Client should be done by the Client.
3. µVM is **portable**.
4. µVM should support high-performance VMs
5. The µVM potentially runs on resource-constrained devices.
Depending on how to interpret **portable**, there are two different implications:
1. The µVM provides compile-time or run-time flags so that the Client can generate platform-dependent µVM IR code.
2. The µVM IR code is cross-platform. The same µVM IR code should run on all platforms.
The second interpretation is more portable than the first. Overall, the more portable the µVM is, the thicker the abstraction layer is.
Goal 4 can be interpreted in different ways:
1. Highest theoretical possible performance.
2. As high as C for computation-intensive problems.
3. As high as Java for computation-intensive problems.
4. Much higher than Python/PHP/R/<*insert your favourite scripting language here*>
5. As high as Python/PHP/R/...
6. As long as the program eventually terminates.
Depending on the concrete implementation, an average desktop/server µVM should reach between 2 and 3.
Goal 5 requires the µVM and the Client to be simple. In this case, the Client and the µVM cannot perform too much reasoning about the program. This results in sub-optimal performance of emitted code.
# Choices in the µVM
## Behaviour of UDIV/SDIV
1. UDIV/SDIV are implementation-defined, or
2. They always behave like Java (div by 0 is an exception, -0x80000000/-1 = -0x80000000)
## Supported vector sizes
Conflicts:
1. The current µVM IR is very expressive. It can express any vector types, given type T and size n: `vector <T n>`
2. Not all are supported by the architecture
So the µVM should
1. Make it **implementation-defined**, or
2. Only support **some selected vector sizes**, or
3. Support some selected vector sizes **and platform-specific vector sizes**, or
4. Support **all** `vector<T n>`, either using scalar operations or vector instructions.
Rationale:
1. The Client can **probe** the supported vector sizes and can generate code accordingly. So it is the Client's responsibility to choose vectors. This is also a kind of **specialisation** which is usually done by the high-level optimiser.
2. Choosing the appropriate vector size is a kind of **instruction selection** and should be done by the µVM in a platform-specific fashion.
For rationale 1, [DeVito et al.](http://terralang.org/pldi071-devito.pdf) used an auto-tuning technique to determine the optimal vector size for the vectorised matrix-multiplication problem.
For rationale 2, [Jibaja et al.](https://01.org/node/1495) proposed adding SIMD to JavaScript, but only exposing 128-bit registers to the programmer. However, the following code is difficult for the µVM to convert to 256-bit vectors:
```
__m128 elems[N];
for (int i=0; i<N; i++) {
elems[i] = vector_add_floatX4(elems[i], constant_vector(1,2,3,4));
}
```
The reasons are:
1. It requires extensive control-flow analysis to merge two adjacent 128-bit additions into one 256-bit addition.
2. 256-bit vectors have a different alignment requirement, so an __m128[] array cannot be treated as an __m256[] array.
The following code, however, is easier to convert:
```
__align_to(256) __m128 elems[N];
for (int i=0; i<N; i+=2) {
elems[i] = vector_add_floatX4(elems[i], constant_vector(1,2,3,4));
elems[i+1] = vector_add_floatX4(elems[i+1], constant_vector(1,2,3,4));
}
```
However, the client, when generating such code, **is already aware of** the presence of 256-bit vectors.
## Array indexing
Conflict:
1. Array indexing seems to be platform-independent. It is "begin+index" (index can be negative).
2. It is implemented using address calculation and memory accessing, where "address" is word-sized.
Solutions
1. Use **any integer type** as the index, which is sign-extended to word size (used by LLVM; see http://llvm.org/docs/LangRef.html#getelementptr-instruction), or
2. The index must be **word-sized** and the Client must appropriately extend/truncate the index to word-sized (current µVM design)
Rationale:
1. It only involves a sign extension or truncation, which the µVM can do cheaply. However,
2. Since the Client can probe the word size and the Client can generate TRUNC/SEXT instructions, it should be done by the client.
# Portable µVM IR as a subset of the µVM
If we can define **a subset of the µVM** with **defined behaviours** and **reasonable** performance (perhaps calling it "µVM Mobile Edition"), then some simplistic µVM Clients (I assume there will soon be many such Clients) can generate portable µVM IR code.
For implementations that seek ultimate performance, the µVM implementation can implement many **machine-dependent instructions** as a super set of such a "portable" IR. (call it "µVM Enterprise Edition"?)
Alternatively, we can define that "subset" as "The µVM IR ®" and treat the implementation-dependent extensions as "extended µVM".
# Appendix: Examples of differences
There are differences between architectures. For example:
The **word length** is different. Some operations (including array indexing) depends on the word length.
The **integer division** instruction behaves differently among architectures, especially in "division by zero" and "signed integer overflow".
Not all operations on all types perform equally well. 64-bit integer operations perform significantly worse (if possible at all) than 32-bit counterparts on 32-bit machines.
The supported **vector length** of vector instructions is different. This may vary from 64 bits (ARM) up to 512 bits (x86 with AVX) with 128 bits widely supported.
Some instructions are "optional" and the functionality must be software-implemented on those architectures. For example, **UDIV** and **SDIV** in ARMv7.
Although most architectures provide all operations (binary arithmetic and logical, comparison and conversion) mentioned in LLVM, concrete instructions do not work on all data sizes. For example, **conversion from floating point to integer** may only be available between certain data types (float to 32-bit or 64-bit int, but not other int types). Not all **vector operations** work for all vector types.
The address size (and indexes) of memory (and array) operations is different among architectures.
Supported data type that can be **atomically loaded/stored**. This affects how the Client is going to implement mutex locks.
Alignment requirement for **vector load/store** operations.
In the native interface / C interface / foreign function interface (FFI):
* The **pointer size** depends on the architecture. This also affects **function pointers** for external C functions.
* The **available system calls** depend on the operating system, the ABI and the processor. The Client must handle the differences between operating systems.
Other issues mentioned in https://github.com/microvm/microvm-meta/issues/16
# Current extension mechanism
An **intrinsic function** (or IFUNC for short) is something that has the similar form of a function, but is treated by the µVM as a regular instruction. The simplest form takes only the name (or ID) of the IFUNC:
```
%result = ICALL @uvm.thread_exit
```
The most complete form takes type arguments, value arguments and an exceptional destination:
```
%result = ICALL @uvm.do_something_weird <@T1 @T2 @T3> (%val1 %val2 %val3) EXC %normal %exceptional
```
Theoretically all binary operations can be defined as IFUNCs. For example:
```
%result = ICALL @uvm.math.sdiv <int<64>> (%lhs %rhs) EXC %normal %div_by_zero_handler
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/19Code generator prototyping strategies2016-06-17T15:23:07+10:00John ZhangCode generator prototyping strategies*Created by: eliotmoss*
As we start to ramp up in Amherst, we're pondering how best to get things rolling (i.e., what a good starting point is) and where we want to end up.
**Initial prototype thought**
For our initial prototype we are thinking about using QEMU (locally pronounced KEE-mew), the Quick Emulator. QEMU is a whole machine emulator (but can also be configured to run as user-mode only) that supports emulating one hardware ISA on another one (and same to same, of course). Being a whole-machine emulator, it deals with devices, interrupts, memory mapping, etc., but the part of greatest interest to us is that it takes guest ISA code blocks and dynamically translates them to host machine code, maintaining the translations in a code cache. Thus parts of its design will be great for dynamic and adaptive code generation. QEMU works by translating guest code into QEMU IR, which is then passed to a target code generator. That code generator does a useful level of local register allocation and local optimizations (hence the Quick in Quick Emulator).
For our case, μVM IR would be the guest ISA and our target would be the host ISA that QEMU targets. A part of how QEMU does things is that the guest register file is represented as a memory data structure and (to a first approximation) the host registers are used for local register allocation. To a finer approximation I **believe** QEMU can bind some guest registers to host registers, useful where the host register set is larger. For μVM IR the "registers" are the SSA values and we can probably get some mileage from the SSA form in telling the QEMU register allocator helpful things about when things "die", etc.
Downsides of this prototype approach include:
* Dependence on QEMU
* Code quality may be limited, particularly in that it will tend to push all variables to memory as it crosses from block to block
* It is not clear what the situation is with QEMU and multithreading, though it **may** support concurrent execution
Upsides:
* Fast path to getting native code going
* Useful optimizations, especially for a dynamic environment
* Back-end architecture design that we might want to use as a guide in our own work
* Already implements major platforms of interest
**Eventual system design thought**
Down the line we may want something like QEMU's code generation design, with more provision for (optional) higher levels of optimization and register allocation across larger scopes (that is, for whole functions, not just blocks/traces). But their internal IR, code generators, etc., might be a very useful base, as well as having a notion of the code cache design and so forth.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/20Client languages in play2016-06-17T15:23:09+10:00John ZhangClient languages in play*Created by: eliotmoss*
Maybe this is also being discussed elsewhere -- if so, we can close this and move the discussion there. Anyway, on the US side we are wondering about the specific suite of client languages being worked on. We have at least one student interested in doing a mapping from Java bytecode to μVM IR, and we also wonder about some of the dynamic languages (Javascript, maybe? I should perhaps come back and edit this after this week's meeting with the students). What about C or C--? For C we could look at lcc, a little C compiler originally designed for a compiler class, I think, but it accepts some standard kind of C, as I recall. gcc is more of a bear, but ought to be on the agenda at some point, no?https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/23Micro VM interface changes precipitated by having C as a client2016-06-17T15:23:18+10:00John ZhangMicro VM interface changes precipitated by having C as a client*Created by: eliotmoss*
While there will be a bunch of carefully worked out details that need to follow, I thought I would get the ball rolling with a quick summary / highlights of the Modula-3 approach. (See https://www.cs.purdue.edu/homes/hosking/m3/reference/m3.html for a Modula-3 reference.)
First, M3 has ordinary *traced* references and also **untraced** ones. REF T is the type of a traced reference to a T, while UNTRACED REF T is the type of an untraced one. Traced vs. untraced is a concept distinct from safe vs. unsafe. Safe use of traced and untraced references keeps them segregated. Untraced objects come from an explicitly managed storage area. You can use NEW on an untraced reference type to allocate in such an area.
Certain regions of code (interfaces or modules -- probably a bundle is the corresponding thing for the Micro VM) can be marked UNSAFE. Unsafe code can do additional things. These are ones that I remember:
* Cast an expression to an arbitrary type (of the same size in bits) - this clearly allows interconverting between traced and untraced pointers, but also other things; this is called LOOPHOLE(e, T).
* Address arithmetic (Digression: In M3 REFANY is the type of a traced reference to any object, which in safe code requires a dynamically type-checked downcast to turn into a REF T and do more interesting things. The corresponding untraced type is ADDRESS, and it has arithmetic operators.)
* Up/down cast without checking
* Free an untraced reference's referent (the function is called DISPOSE)
Safe code must use traced/untraced references only in specific ways, described in the language reference.
We may need to add to this sort of design to allow short unsafe code regions to manipulate ordinary (traced) objects safely in the presence of (say) concurrent GC. Also, an untraced version of iref would make sense for the Micro VM.
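Modula-3's LOOPHOLE(e, T) reinterprets the bits of a value as another type of the same size, with no conversion. As a rough illustration (in Python, using `struct` as a stand-in for a bit-level cast; this is an analogy, not M3 or Mu code):

```python
import struct

def loophole_f64_to_u64(x: float) -> int:
    """Reinterpret the 64 bits of a float as an unsigned integer,
    in the spirit of Modula-3's LOOPHOLE(e, T): same size, no conversion."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def loophole_u64_to_f64(bits: int) -> float:
    """The inverse reinterpretation."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# 1.0 has the IEEE-754 double bit pattern 0x3FF0000000000000.
assert loophole_f64_to_u64(1.0) == 0x3FF0000000000000
assert loophole_u64_to_f64(0x3FF0000000000000) == 1.0
```

Between traced and untraced pointer types, such a cast is exactly what makes unsafe code unsafe: the bits survive, but the GC's invariants need not.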
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/24Native Interface (super issue)2016-06-17T15:23:20+10:00John ZhangNative Interface (super issue)*Created by: wks*
This is an outline of issues related to the native interface, that is, interacting with the native world. This topic includes but is not limited to object layout, pointer types, object pinning and foreign function calls. We should open other issues to discuss concrete problems.
* Make a platform-specific Mu specification
+ Address some of the following issues, including object layout, calling convention, ...
+ Draft for AMD64: https://github.com/microvm/microvm-spec/wiki/native-interface-x64-unix
* Type system: (https://github.com/microvm/microvm-meta/issues/34)
+ Raw pointer types
+ Structure types with native/explicit object layout
+ Union types (unlikely to have in Mu)
+ Mapping Mu types to C types and native object layout: (in the [Native Interface](https://github.com/microvm/microvm-spec/wiki/native-interface) chapter in the spec)
* Memory space beyond heap/stack/global
+ Memory spaces with various constraints
- Is it movable, pinnable, has reference, can be referenced to, GC-traced, GC-collected, ...?
+ Object pinning: https://github.com/microvm/microvm-meta/issues/28
- If object pinning is allowed, what does "pin" mean?
* Foreign function interfaces
+ Calling foreign functions from Mu: (The CCALL instruction. See the [spec](https://github.com/microvm/microvm-spec/wiki/native-interface))
+ Calling C functions
+ System calls
+ Calling back to Mu from foreign functions: https://github.com/microvm/microvm-meta/issues/39
+ From C functions
+ Signal handling
The following should be addressed by a higher-level abstraction:
* Loading native libraries
+ The client loads libc and finds the address of `dlopen`, `dlsym`, `dlclose` and `dlerror`. Then the `CCALL` instruction takes care of the rest by calling them.
* Loading "heavier-weight" Mu bundles (currently called MuLF): https://github.com/microvm/microvm-meta/issues/30
The following are not related to the native interface, but are related to raw memory:
* How to expose the address of objects so that the user can analyse the memory behaviour? (This involves profiling, too. We may open a dedicated issue for profiling.)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/25Extended memory model2016-06-17T15:23:23+10:00John ZhangExtended memory model*Created by: wks*
The current memory model is designed according to the C11/C++11 standard. It involves atomic memory access, locks and threads.
However, there are issues which C11 and C++11 do not address. They are:
- [x] Futex: Futex `wait` implies an atomic load-compare-sleep operation. (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-futex))
- [x] Trap/watchpoint/osr: They involve stack binding/unbinding. Just make more things explicit. (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-stack-operations))
- [x] Swap-stack: C11 does not address swap-stack, but swap-stack operations in different threads may conflict and visibility issues involved in swap-stack operations should be clarified. (*making racing swap-stack operations undefined behaviours is okay*) (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-stack-operations))
- [X] Foreign language interface: When parallel µVM programs run together with parallel native programs, we need a way to synchronise the two worlds. This also involves object pinning. See https://github.com/microvm/microvm-meta/issues/37 (It turns out that Mu cannot make many guarantees, given that the worst case of native memory access is a segmentation fault. This must be very implementation-specific.)
Current ideas:
**Futex**: (open question) Should `futex_wake` happen before the next instruction after the `futex_wait` that actually wakes up? **Probably yes.**
```
int shared_state = 42;
atomic_int thread1_wake = 0;
futex thread1_futex;

thread1:
    while (load(ACQUIRE, &thread1_wake) != 1) { // OP4
        futex_wait(thread1_futex);              // OP5
    }
    int s = shared_state;                       // OP6

thread2:
    shared_state = 99;                          // OP1
    store(RELEASE, &thread1_wake, 1);           // OP2
    futex_wake(thread1_futex);                  // OP3
```
According to the semantics of RELEASE and ACQUIRE, if OP4 sees the store by OP2, then OP6 must see the store of OP1.
The problem is, if the OP3 `futex_wake` is not guaranteed to happen before the waking of the OP5 `futex_wait`, then the next OP4 iteration of the loop may not see the store by OP2 at all, and the thread may go to sleep again. Then it may sleep forever.
**Stack binding/unbinding** cannot be atomic. Swap-stack is a combination of the two and cannot be atomic, either.
We may require the language implementer to correctly use other synchronisations to make sure racy `swap-stack` operations do not happen.
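The protocol in the example above can be simulated with ordinary condition variables, which provide exactly the wake-up guarantee the open question asks for: a notify performed under the lock happens before the waiter's next re-check. This is a sketch only; Python's `threading.Condition` stands in for the futex, and the GIL hides the weak-memory subtleties, so it illustrates the protocol, not the memory model.

```python
import threading

shared_state = 42
thread1_wake = 0
cond = threading.Condition()   # stand-in for thread1_futex
result = []

def thread1():
    global shared_state
    with cond:
        while thread1_wake != 1:     # OP4: load(ACQUIRE)
            cond.wait(timeout=1.0)   # OP5: futex_wait
    result.append(shared_state)      # OP6

def thread2():
    global shared_state, thread1_wake
    shared_state = 99                # OP1
    with cond:
        thread1_wake = 1             # OP2: store(RELEASE)
        cond.notify()                # OP3: futex_wake

t1 = threading.Thread(target=thread1)
t2 = threading.Thread(target=thread2)
t1.start(); t2.start()
t1.join(); t2.join()
assert result == [99]                # OP6 must see OP1's store
```

Because OP2 and OP3 happen under the same lock the waiter holds while re-checking, the lost-wakeup scenario described above cannot occur here; that is precisely the ordering property the issue proposes `futex_wake` should have.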
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/27Finaliser2017-04-05T15:34:00+10:00John ZhangFinaliser*Created by: wks*
NOTE: This proposal may go to the interface of a specific µVM implementation rather than the µVM specification. The µVM can cheat by presenting an "infinite heap" and telling the program no objects are ever reclaimed.
# Proposed mechanism
Add an **objects-to-finalise queue**. This queue is maintained internally by the µVM. Each element is a strong reference to an object. It may not be FIFO.
Add a new instruction `@uvm.gc.prevent_death_once (%r: ref<T>) -> void`. The next time the GC decides that the object referred to by `%r` is garbage, it will put it in the queue instead. This instruction is *atomic*.
Add a new instruction `@uvm.gc.next_object_to_finalise () -> ref<void>`. This instruction removes one reference from the *objects-to-finalise queue* and returns it. The current µVM thread blocks if the queue is empty. This instruction is *atomic*.
## Typical usage
Take Java for example.
+ On allocating any object that has an overridden `finalize` method, the `@uvm.gc.prevent_death_once` instruction is executed on the newly allocated object.
+ There is a background thread (ordinary µVM thread, running Client-supplied µVM IR code) running in a loop, executing the `@uvm.gc.next_object_to_finalise` instruction.
+ When the GC is about to collect a finalisable object, it instead puts the object in the queue. The background thread gets that object and does whatever the high-level language semantics require.
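The proposed queue mechanism can be sketched as follows. This is a simulation, not a Mu implementation: the "GC" is just a function we call by hand, the names mirror the proposed instructions, and the `None` sentinel exists only to shut the demo thread down.

```python
import queue
import threading

to_finalise = queue.Queue()   # the objects-to-finalise queue (not necessarily FIFO in Mu)
marked = set()                # ids carrying the prevent-death-once flag

def prevent_death_once(obj):
    """Analogue of @uvm.gc.prevent_death_once: mark obj for one resurrection."""
    marked.add(id(obj))

def gc_found_garbage(obj):
    """Called by the (simulated) GC when obj becomes unreachable."""
    if id(obj) in marked:
        marked.discard(id(obj))   # the flag is one-shot
        to_finalise.put(obj)      # resurrect into the queue instead of reclaiming
    # else: reclaim immediately

finalised = []

def finaliser_loop():
    """The background thread: analogue of looping on @uvm.gc.next_object_to_finalise."""
    while True:
        obj = to_finalise.get()   # blocks if the queue is empty
        if obj is None:           # shutdown sentinel, demo only
            break
        finalised.append(obj)     # run the language-level finaliser here

t = threading.Thread(target=finaliser_loop)
t.start()
o = object()
prevent_death_once(o)
gc_found_garbage(o)               # GC decides o is garbage -> queued, not reclaimed
to_finalise.put(None)
t.join()
assert finalised == [o]
```

Note how `queue.Queue.get` already gives the blocking behaviour the open question below worries about; a futex alone has no such queue to wake from.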
# Backgrounds
**Java**: Objects with `finalize` methods are finalised. `finalize` is **executed automatically only once** per object. Finalisers are executed in **unspecified threads**, at **unspecified times**, in **unspecified order**, may be executed **concurrently**. This sentence: "Every pre-finalization write to a field of an object must be visible to the finalization of that object." implies some kind of fence.
**Python, Ruby, PHP, ...**: Using naive reference counting, finalisers are deterministically executed when the reference count drops to 0.
# Open questions
+ The blocking mechanism (the instruction `@uvm.gc.next_object_to_finalise` can block) looks redundant given that the µVM already has the Futex interface which blocks a thread. There is no obvious way to wake up a thread waiting on this "queue".https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/28Object Pinning2017-05-02T16:25:32+10:00John ZhangObject Pinning*Created by: wks*
This issue is part of https://github.com/microvm/microvm-meta/issues/24
# TL;DR
This proposal gives meaning to the "object pinning" operation.
The meaning is: The PIN operation takes a `ref<T>` or `iref<T>`, pins the object for the current thread, and returns a `ptr<T>` (pointer to `T`). This pointer **can be used to** access the memory location of the `iref` until all threads that have pinned the object have unpinned it using the UNPIN operation.
Note: This has very few implications for the Mu implementation. It only says the pointer can be *used* in the expected way; it does not say anything about the storage of the actual object. (The micro VM can cheat!)
## Operations
In the following two instructions, `R` can be either `ref` or `iref`.
* `PIN(%r: R<T>) -> ptr<T>`: Add the object referred to by `%r` to the *pinning set* of the current thread, and return a pointer.
* `UNPIN(%r: R<T>) -> void`: Remove the object referred to by `%r` from the *pinning set* of the current thread.
`PIN` and `UNPIN` do not pin any object if `%r` refers to a memory location not in any heap object. If `%r` is NULL, `PIN` returns a NULL pointer. If `%r` is an `iref` and refers to a stack cell or a global cell, `PIN` returns a pointer to it.
> NOTE: All memory locations in Mu, not just heap objects, are referred to by `iref`s. In order to let native code work with the Mu memory, pointers always have to be generated. That is why `PIN` and `UNPIN` trivially work with non-heap memory locations as well. It may be impossible at compile time to know whether an `iref` refers to the heap: for example, a function may take an `iref` as a parameter.
## The guarantees
The pointer returned by `PIN` has the following guarantees:
* The pointer is usable as long as the object pinned by `PIN` is in the *pinning set* of **any** thread.
* The pointer points to a region of address which can be used to access the memory location of the parameter of `PIN` (i.e. `%r`). Specifically:
+ The object layout conforms to the platform's Mu Application Binary Interface (yet to be defined).
+ The native code can perform LOAD, STORE, CMPXCHG, ATOMICRMW, FENCE operations on those locations and they shall conform to the Mu memory model. However, *which native instruction/operator/function performs which operation in the Mu memory model* is implementation defined.
> One memory order can be implemented in multiple different ways. E.g. on x86, SEQ_CST can be implemented as (load: MOV, store: XCHG), but also as (load: LOCK XADD(0), store: MOV). It is up to the implementation to guarantee that Mu memory operations (Mu IR instructions) are compatible with their native counterparts (C11 `<stdatomic.h>` or C++11 `<atomic>`). For example, one particular implementation may let the `atomic_load(ptr, memory_order_xxxxxx)` function in glibc (but not `atomic<T>.load(xxxx)` in the libc++ provided by LLVM) perform the LOAD operation with xxxxxx memory order in the Mu memory model.
## Issues about multi-threading
It is possible for two threads to pin the same object. For example, there are two threads T1 and T2 and object O. The execution appears like the following sequence:
1. T1: pin O
2. T2: pin O
3. T1: do something with O
4. T1: unpin O
5. T2: do something with O
6. T2: unpin O
By step 5, T1 has already performed an unpin operation. If an object could be pinned by one thread but unpinned for all threads by another, there would be a problem: if the object O were no longer pinned, it would be an error for T2 to do anything with the pointer.
It is possible to require a thread to acquire a lock or perform reference counting before pinning/unpinning, but this would be inefficient because it inevitably involves expensive atomic operations, and one main reason for using the FFI is performance.
Therefore, we let different threads pin/unpin an object **locally**: `PIN` means pinning an object **for the current thread**. An object is pinned if and only if at least one thread is pinning it.
Implementation-wise, this can be done by keeping a thread-local buffer which records all objects the current thread is pinning. When GC happens the marker looks at the thread-local buffers to find all objects pinned by any thread. In this way, mutators do not need atomic memory operations, but the GC needs to look at all threads.
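A sketch of the thread-local buffer scheme described above (illustrative Python only; names such as `pin`, `unpin` and `gc_pinned_objects` are stand-ins for the Mu instructions and the collector's scan, not any real API):

```python
import threading

_all_buffers = []            # GC-visible list of per-thread pin buffers
_local = threading.local()   # each thread gets its own buffer

def _pins():
    """Lazily create this thread's pin buffer and register it with the GC."""
    if not hasattr(_local, "pins"):
        _local.pins = []     # a list, so a thread may pin the same object twice
        _all_buffers.append(_local.pins)
    return _local.pins

def pin(obj):
    """PIN: record obj in the current thread's buffer. No atomics needed."""
    _pins().append(obj)
    return id(obj)           # stand-in for the returned ptr<T>

def unpin(obj):
    """UNPIN: remove obj from the current thread's buffer only."""
    _pins().remove(obj)

def gc_pinned_objects():
    """What the marker sees: an object is pinned iff ANY thread pins it."""
    return {id(o) for buf in _all_buffers for o in buf}

# The T1/T2 scenario from the numbered sequence above:
o = object()
pin(o)                                   # 1. T1 pins O

def t2_work():
    pin(o)                               # 2. T2 pins O (its own buffer)

t2 = threading.Thread(target=t2_work)
t2.start(); t2.join()
unpin(o)                                 # 4. T1 unpins O
assert id(o) in gc_pinned_objects()      # O is still pinned: T2's buffer holds it
```

The mutator path (`pin`/`unpin`) touches only thread-local state; only the collector's `gc_pinned_objects` walks all buffers, matching the cost split the paragraph above describes.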
This "thread-local pinning" mechanism cannot be implemented by the client if the `PIN` instruction in Mu is racy. Giving the client access to the thread-local buffer is no different from the thread-local `PIN` instruction. So this thread-local pinning mechanism does not violate the principle of *minimalism* of Mu: it cannot be implemented efficiently outside Mu.
-----------------------------CUT HERE. BELOW ARE LEGACY TEXTS-------------------------------
# Abstract
I propose defining two kinds of memory spaces: *real space*, which models the memory used by C or native programs, and *imaginary space*, for that of the µVM. *Object pinning* (or *realising*) is an operation that temporarily makes a memory location in the imaginary space real so that it can be accessed from C programs.
# Proposal
## Concepts
* **memory**: self-explanatory, but... I don't trust "common sense".
* **memory location**: a region of data storage. Holds values.
* **virtual memory space**: the abstraction provided by the OS and the architecture. It has the following properties:
+ At any moment, it is a mapping from addresses (a subset of integers) to byte values. (I don't like this property. For any multi-threaded program, different threads may not see the same value, and Albert Einstein does not like "the same time".)
+ It can be accessed (read/written/atomicRMW) in various granularities (sizes). The atomicity and visibility between threads follows a certain memory model (the one defined by the architecture, OS and related programming languages).
+ It may be shared between processes and threads. Thus it can be accessed by things not in the µVM.
* **real memory**: memory in which memory locations satisfy the following properties:
+ (Does not need to have "addresses", that is, a memory location can be a variable, not numerical value.)
+ Allows memory accesses (load/store/atomicRMW).
+ For every memory location L, there is a unique memory location L' in the virtual memory space. (This disallows replication.) This L' does not change during the lifetime of L. (This disallows moving.) Accesses to both locations are equivalent.
+ For any two memory locations L1 and L2, their corresponding memory locations in the virtual memory space do not overlap. That is, their accesses are independent. (This disallows aliasing.)
+ For an array in a real memory, its corresponding memory location in the virtual memory space is contiguous. (This disallows implementing arrays as multiple disjoint sub-arrays.)
* **imaginary memory**: memory in which memory locations satisfy the following properties:
+ (Does not need to have "addresses", that is, a memory location can be a variable, not numerical value.)
+ Allows memory accesses (load/store/atomicRMW).
NOTE: As can be seen, "real memory" is trivially "imaginary memory".
* **realising**: temporarily letting a memory location in an imaginary memory have the property of real memory. (This is colloquially called **object pinning**, but it is more than "not moving").
* `iref<T>` (**internal reference**): refers to a memory location in real or imaginary memory.
* `ptr<T>` (**pointer**): an address. May or may not correspond to a memory location in the real memory.
## In the µVM
* All memory in the µVM (heap, stack and global) is imaginary memory.
* Introduce the pointer type `ptr<T>`. It is just a raw address, but is typed.
* Introduce the `PTRCAST` instruction which can freely cast `ptr<T>` to or from `int<n>` if n is the appropriate size.
* `LOAD`, `STORE`, `CMPXCHG`, `ATOMICRMW` now work with both `iref<T>` and `ptr<T>`.
* The `CCALL` can call a C function.
+ Plan A: The callee can have type `int<n>`. It is just an integer address.
+ Plan B: Introduce a `c_func<sig>` type. It is castable to/from `int<n>`. NOTE: `func<sig>` refers to µVM functions.
## Pinning
* "Pinning a memory location" means "realising" it, granting it the property of real memory.
* Implicit pinning: Any `iref<T>` values used as arguments of `CCALL` are implicitly pinned during this call.
* Explicit pinning:
+ Plan A: Introduce `REALISE` and `UNREALISE` instructions that do what their names suggest. The `REALISE` instruction returns a `ptr<T>` value.
+ Plan B: `REALISE` and `UNREALISE` have counting semantics. An object is "unpinned" if its pin-count reduces to 0.
+ Plan C: (the tracing approach) Introduce a type `pinner_iref<T>` which actually holds an `iref<T>` (a [marked storage type](https://github.com/microvm/microvm-spec/wiki/type-system#types-and-type-constructors) of `iref`). `pinner_iref<T>` must be in the memory (not SSA, just like `weakref<T>` cannot be SSA variable). If such a reference is reachable, the referent is pinned. After pinning, the pointer can be obtained via a `GETPTR` instruction. (Plan C does not address replication and non-contiguous arrays)
## Open questions
1. Do we assume stacks and globals as "real" by default?
2. If stacks can move, how do we efficiently realise (pin) it?
3. Do we prevent non-contiguous arrays?
4. How to implement temporary "un-replicating".
# Background: Inter-language interaction
Currently the only way for the µVM to interact with the "outside world" is via traps handled by the client. This interface is called **µVM-client Interface** or **The API**.
For performance concerns, we should introduce a more direct and low-level interface to the "outside world". This new interface is called **foreign function interface** or **FFI**.
## Two worlds
**Imaginary memory**: In a world with advanced garbage collectors, the memory is managed by the GC.
* A high-level memory location (in an object or not; for example, if a VM implements movable stacks) may be moved from one address to another (addresses are in the operating system's or the architecture's virtual address space).
* A high-level memory location can be replicated (a single high-level object/field corresponds to multiple system memory addresses). This may serve different purposes, for example, concurrent GC, security, etc.
* A high-level memory data structure may not have the same structure as the system-level memory. For example, a high-level array may be implemented as segments of (non-contiguous) arrays.
* Programs written in C can only access this kind of memory assisted by the memory manager (GC).
* Example: Java, µVM.
**Real memory**: In a world closely interacting with C, the GC is somewhat naive, or there is no GC at all.
* High-level memory locations (as seen by the programming languages (like C) or VMs (like CPython)) do not move and are not replicated. Each high-level memory location (in object or not) corresponds to exactly one OS/architecture-level address as long as it is not deallocated.
* Programs written in C can directly access the memory as long as it has a raw pointer to the memory location.
* Example:
* Any non-GC language: C, C++, Rust, ...
* Any language/impl that tightly interacts with C: CPython, Lua (partially)
## Examples
* The µVM uses "imaginary memory". It does not assume any low-level memory layout except some high-level rules.
* Java exclusively uses "imaginary memory". All Java memory accesses through JNI must go through handles. It is even a problem to expose an array to the C language: 1) the VM must support object pinning, and 2) the VM must implement arrays contiguously.
* CPython uses "real memory". C programs hold any Python objects by raw pointers. A C module can customise its own Python object layout to include its own private data.
* Lua uses "real memory". "Userdata" (a chunk of memory allocated by Lua, but is used by the user, like a managed "malloc") is a Lua object. `lua_touserdata` gets a raw pointer to such a chunk of memory and does not need pinning. `lua_topointer` gets a raw pointer to any Lua object (for debugging purpose).
* SpiderMonkey uses something hybrid. Its GC can move objects, but not within a "request" (a delimited region in C programs where GC "must not happen"). In a "request" (probably everything in C that interacts with SpiderMonkey), the C program can use raw pointers to refer to JS objects, though their structures are opaque, and it is recommended to use `JSHandleValue` to mark them as GC roots.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/29Heap Allocation and Initialisation Language (HAIL)2016-07-21T14:40:31+10:00John ZhangHeap Allocation and Initialisation Language (HAIL)*Created by: wks*
This proposal describes a language that allocates and initialises heap objects (and also global memory)
This proposal *does not* address *initialiser function*. It will be addressed in another issue.
# Rationale
A code bundle (or simply "bundle" in our current terminology) contains **types**, **function signatures**, **constants**, **global memory cells** and **functions**. This is insufficient for a standalone Mu IR program.
A typical program usually contains statically declared and **load-time initialised** **heap objects**, e.g. **strings**, **class objects** (`java.lang.Class`) and so on. A developer from the PyPy project has indicated that there can be a lot of statically declared heap objects. Currently those objects can be created and initialised in two ways:
1. The client allocates and initialises heap objects via the Mu Client API. This approach suffers from one particular shortcoming: performance. The API can only initialise one memory location (e.g. one element of an array, or one scalar field of a struct) per API call.
2. Include a dedicated function per bundle which creates and initialises heap objects. This approach has performance and complexity problems. This "function" must contain a full description of all heap objects: their types, and the values of all (or some non-zero) fields, so the function can be huge. This information has to be encoded as Mu IR instructions and Mu IR constants, and the compiler has to translate this **humongous** "initialiser function" into runnable form and then execute it once to create the heap objects. It is a waste of time and memory to compile such a one-shot function.
# Solution
The proposed solution is a compact file format that describes heap objects and initialises the memory.
Sample:
Assume we have a "traditional" Mu IR Bundle:
```
.typedef @i64 = int<64>
.typedef @i8 = int<8>
.typedef @i32 = int<32> // needed by @intarray below
.typedef @double = double
.typedef @string = hybrid <@i64 @i8>
.typedef @void = void
.typedef @refstring = ref<@string>
.typedef @refvoid = ref<@void>
.typedef @ClassFoo = struct<@i64 @double @refstring>
.typedef @intarray = hybrid<@i64 @i32>
.global @HW <@refstring> // A global memory cell, initialised to NULL, which may hold a string reference later.
```
**After** loading the previous bundle, load this Heap Allocation and Initialisation Language (HAIL) file:
```
// HAIL file
.new $a <@i64> // A new object of just a number
.newhybrid $hw <@string> 12
.new $classFoo <@ClassFoo>
.new $x <@refvoid> // An object whose content is only a heap reference to void
.new $y <@refvoid> // ditto
.newhybrid $hugeArray <@intarray> 10000
.init $a = 42
.init $hw = {12, {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'}}
.init $classFoo = {42, 42.0d, $hw} // Objects can directly refer to each other
.init $x = $y // Objects are first allocated and then initialised
.init $y = $x // So they can form circular references
.init @HW = $hw // @HW is a global cell declared in the previous "traditional" bundle. HAIL can initialise global cells in traditional bundles, too.
.init $hugeArray[5000] = 42 // Only initialise a particular element. Other elements are 0.
// NOTE: only $hw is retained because it is referenced by the global cell @HW. Other objects may immediately be garbage-collected (or not allocated at all if the Mu VM can "cheat")
```
## Structure
Heap objects allocated in this form have a special sigil `$` which is local to the current file.
A Heap Allocation and Initialisation Language (HAIL) file contains many of the following **top-level definitions**:
**.new**: Allocate scalar object in the heap. Has the form: `.new $name <@type>`
* `$name`: the local name of the object.
* `@type`: the type of the object.
**.newhybrid**: Allocate hybrid object in the heap. Has the form: `.newhybrid $name <@type> length`
* `$name`, `@type`: same as ".new"
* `length`: the length of the var part
**.init**: Initialise a heap object or a global cell. Has the form: `.init name[sub1][sub2]... = val`
* `name`: The name of the heap object or global cell. In this format, heap objects use the special sigil (`$xxx`) while global cells use global names in the Mu IR (`@xxx`).
* `sub1`, `sub2`, ...: Subscripts: ways to navigate through structs, arrays and hybrids. Specifically, in a hybrid, the fixed part is 0 and the var part is 1.
* `val`: The value. It can be one of the following:
* Integer literals: 1, 24, -345, 0x456, 'H'
* FP literals: 1.0f, 3.14d, nanf, nand, +infd, -infd, bitsd(0x7ff0000000000001)
* Struct/array/hybrid literals: {elem0, elem1, elem2, ...}
* NULL
* Other names (these can be other heap objects, Mu IR constants, global cells (as internal references) or functions (as function references)): `$hw` `@HW` `@main`
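To make the grammar above concrete, here is a toy loader for the three top-level forms (an illustration only: it merely records allocations and initialisations, handles only the simple cases, and does not resolve types, subscripts or write Mu memory):

```python
import re

# Patterns for the three HAIL top-level definitions described above.
NEWHYBRID = re.compile(r"\.newhybrid\s+(\$\w+)\s+<(@\w+)>\s+(\d+)")
NEW = re.compile(r"\.new\s+(\$\w+)\s+<(@\w+)>")
INIT = re.compile(r"\.init\s+([$@]\w+)\s*=\s*(.+)")

def load_hail(text):
    """Return ({name: allocation}, [(name, value-text)]) for a HAIL source."""
    objects, inits = {}, []
    for line in text.splitlines():
        line = line.split("//")[0].strip()   # drop // comments
        if not line:
            continue
        if m := NEWHYBRID.match(line):       # check before NEW (shared prefix)
            objects[m[1]] = {"type": m[2], "length": int(m[3])}
        elif m := NEW.match(line):
            objects[m[1]] = {"type": m[2]}
        elif m := INIT.match(line):
            inits.append((m[1], m[2].strip()))
    return objects, inits

objs, inits = load_hail("""
.new $a <@i64>              // A new object of just a number
.newhybrid $hw <@string> 12
.init $a = 42
""")
assert objs["$a"] == {"type": "@i64"}
assert objs["$hw"]["length"] == 12
assert inits == [("$a", "42")]
```

A real loader would perform all allocations first and then run the `.init` entries, which is what allows the circular references in the sample above.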
## Comparing to API-based object allocation and initialisation
A HAIL file is a unit of delivery to the Mu VM. Only one API call is needed to load a whole HAIL file and it can allocate and initialise many objects.
"Loading a HAIL file" will be a new API message (or function).
# Performance Concerns
For better performance, this format should have a more compact binary format. Ideally the binary format can be very close to the in-memory representation of objects and require little more than copying data from the file to the memory and handling data sizes/padding/alignment. It cannot be perfectly identical to the in-memory representation because Mu's object layout is platform-dependent.
# When to use the HAIL format
HAIL should be used when the client wishes to allocate many objects and bulk-initialise the memory. For example, when loading a Java .class file, a Mu IR bundle is loaded for the Java functions, and then a HAIL file is loaded to create/initialise the `Class` object, the virtual table, string literals and so on.
Another example: Assume there is a PyPy interpreter implemented on Mu IR. The executable PyPy interpreter is represented as Mu IR bundle, but a HAIL file can be used to initialise the interpreter **instance** and associated objects.
# When HAIL may not be ideal
If the Mu VM is metacircular, the client is written in Mu IR and accessing the Mu memory from the client has no overhead. The HAIL format can still be implemented for compatibility reasons, but it would not have any performance advantage over direct memory accesses. For example, a metacircular Mu-based JVM can load a .class file and compile its methods to Mu IR, but the Class object can be created directly in Mu IR because the JVM client itself is in Mu IR. It does not need to serialise the sequence of object allocations and initialisations into HAIL before performing them.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/31GC: Is "liveness" of objects really needs to be defined?2015-04-15T19:47:10+10:00John ZhangGC: Is "liveness" of objects really needs to be defined?*Created by: wks*
Currently the Mu spec defines "live object" as being reachable from roots.
An alternative definition is to define heap objects as *always live*, but implementations can "*cheat*". We define:
A memory location has a **lifetime**. An internal reference (iref) to a memory location is **valid** as long as the memory location's lifetime has not expired. Specifically,
* The lifetime of a memory location in the *heap* begins when the `NEW` or `NEWHYBRID` that allocates the heap object is executed. It **never expires**.
* The lifetime of a memory location in the *global memory* begins when the bundle that defines it is loaded. It never expires.
* The lifetime of a memory location in the *stack* begins when the `ALLOCA` or `ALLOCAHYBRID` that allocates the stack cell is executed. It expires when the function activation in which the stack cell is allocated is destroyed by either returning, throwing an exception, or killing the stack.
If the Mu spec no longer defines liveness by the "root set" and transitive reachability, the Mu implementation must infer those reachability rules from other parts of the spec. I believe a carefully defined spec implies the same rules as explicitly defined reachability rules.
## Examples and corner cases
**When an object is unreachable from the roots** (previously defined as "dead"): Mu can reclaim the object. Since it cannot be reached, the client and Mu IR programs will never find out Mu is cheating about the lifetime of heap objects, which were defined as "forever". (You can kill an immortal if nobody can see him/her again.)
**Object pinning**: Since an address is exposed during the period of pinning, the GC must not collect the object otherwise the native code will find Mu is cheating. In other words, pinning keeps an object alive.
## Weak references and finalisers
**Weak reference**: We must change the meaning of "weak references" because we no longer define "reachable". We can define it as:
* At any time, Mu **may** atomically set the values of some weak references to NULL if "after doing so, no one can prove that their referred objects can otherwise be reached". (This is not formal at all. Maybe weak references are really meaningless.)
In this way, a Mu implementation that never clears weak references is a valid implementation, but an implementation that does clear them may legally do so.
**Finaliser**: It was not defined before because it is not guaranteed to run. But in order to allow the Mu implementation to cheat, I define it as:
* Any object may have a "prevent-one-death" flag (set when a finalisable object is created).
* There is a queue maintained by Mu. (finalising queue. The client implements a finalising thread watching the queue.)
* At any time, Mu **may** atomically remove the "prevent-one-death" flag of an object and put it in the queue mentioned above, provided that "the only way to get a reference to that object is via the queue". (This does not sound very formal, either.)
In this way, a Mu implementation that never calls any finaliser is a valid implementation, but an implementation that does call finalisers may legally do so.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/32
Better support for the tagref64 type (2015-05-21, John Zhang)

*Created by: wks*
In dynamic languages, the `tagref64` type (or other future tagged reference type variants) will be used pervasively in the language runtime.
This issue summarises potential improvements on the support for such types.
# Tagged reference constant
The Mu IR currently does not have a constant for `tagref64`, mainly because it may hold a reference and non-NULL references cannot be constants. However, one possible use of the `tagref64` type is to store a NULL reference together with an `int<6>` tag. In this case, the tag determines the concrete thing it represents (undefined, nil, null, false, true or other frequently used singleton objects). So it should be possible in Mu to create such a `tagref64` value as a constant.
Proposed new syntax:
```
.const @name <@tagref64> = TR64FP @double_constant
.const @name <@tagref64> = TR64INT @int52_constant
.const @name <@tagref64> = TR64NULLTAG @int6_constant // The ref is NULL, the tag is @int6_constant
.const @double_constant <@double> = 3.14d
.const @int52_constant <@i52> = 0x123456789abcd
.const @int6_constant <@i6> = 30
```
# Tagged reference equality
Comparing floating point numbers bit by bit is not equivalent to IEEE754's definition of "equality". However, when two `tagref64` values both hold integers or references+tags, the result is deterministic.
In dynamic languages, such comparisons can quickly determine whether two tagged references have the same type (identified by the tag part) and refer to the same object.
Proposed semantic of `EQ` comparison between `tagref64` values:
The result of the `EQ` comparison instruction between `v1` and `v2` is 1 (true) if and only if any of the following is true:
* Both hold `double` values, and
  * neither is NaN and both have the same bit-wise representation, or
  * both are NaN and they happen to have the same bit-wise representation after being converted to `tagref64`.
* Both hold `int<52>` values and they are bit-wise equal.
* Both hold references, and
  * their references refer to the same object or both are NULL, and
  * their `int<6>` tags are bit-wise equal.
The `NE` instruction returns the opposite result of `EQ`.
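The rules above can be sketched as follows. This is an illustrative model only: a `tagref64` value is represented as a tagged Python tuple rather than as a real 64-bit NaN-boxed encoding, and the variant names (`'double'`, `'int'`, `'ref'`) and helpers are hypothetical.

```python
import struct

def bits(d):
    """Raw 64-bit pattern of a double (for bit-wise comparison)."""
    return struct.unpack('<Q', struct.pack('<d', d))[0]

def tr64_eq(v1, v2):
    # v = ('double', float) | ('int', int52) | ('ref', obj_or_None, tag6)
    if v1[0] != v2[0]:
        return False
    if v1[0] == 'double':
        # Per the rules: compare the stored bit patterns, so NaNs compare
        # equal only with identical bits, and +0.0 != -0.0 (unlike IEEE ==).
        return bits(v1[1]) == bits(v2[1])
    if v1[0] == 'int':
        return v1[1] == v2[1]
    # references: same object (or both NULL), and bit-wise equal tags
    return v1[1] is v2[1] and v1[2] == v2[2]
```

Note how the bit-wise semantics makes `+0.0` and `-0.0` compare unequal, which is precisely the deviation from IEEE754 equality discussed above.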
> NOTE: `tagref64` uses the NaN space of double. Real NaN `double` values may lose their precise bit-wise representations when converted to `tagref64`. So comparing two `tagref64` values that both hold NaNs has an unspecified result.
*Alternative possibility*: Require Mu to canonicalise all NaNs to one unique bit-wise representation. In this way, all NaNs compare equal when comparing `tagref64` values bit by bit.
# Default values of `tagref64` types.
Currently the default value of `tagref64` (all zero bits; all newly-allocated memory, whether heap, stack or global, holds all zero bits) represents +0.0 as a `double` value. In this representation, every `tagref64` value which holds a `double` is bit-wise equal to its real `double` representation, so converting a `tagref64` to `double` is trivial: just do a bitcast.
However, languages usually define the values of uninitialised variables/fields as null-like values: `undefined` in JS, `nil` in Lua, `null` in Java. There should be an option to make the all-zero bit pattern represent such null-like values.
There could be a flag to determine the zero value of a `tagref64` type. The proposed syntax is:
```
.typedef @tr64_with_fp_default = tagref64 <DEF_FP(3.14d)> // All 0s represents double value 3.14d
.typedef @tr64_with_ref_default = tagref64 <DEF_REF(0x5a)> // All 0s represents NULL ref with 0x5a as tag.
.typedef @tr64_with_int_default = tagref64 <DEF_INT(0x55aa55aa55aa5)> // All 0s represents integer 0x55aa55aa55aa5.
.typedef @tr64_as_current = tagref64 <DEF_FP(0.0d)> // All 0s represents double value 0.0d, which is the same as the current `tagref64`.
```
The kind of default is static metadata which the garbage collector can identify.
This can be implemented by applying an XOR mask on the value after encoding to `tagref64` and before decoding an existing `tagref64`.
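The XOR-mask implementation can be sketched as follows. This is a minimal model (hypothetical helper names) that handles only the `double` payload and ignores the tag bits; the point is that zero-filled memory decodes to the chosen default without any branch.

```python
import struct

def d2b(d):  # double -> 64-bit pattern
    return struct.unpack('<Q', struct.pack('<d', d))[0]

def b2d(b):  # 64-bit pattern -> double
    return struct.unpack('<d', struct.pack('<Q', b))[0]

DEFAULT = 3.14        # as if declared DEF_FP(3.14d)
MASK = d2b(DEFAULT)   # XOR mask derived from the default

def encode(d):
    # apply the mask after encoding the value into tagref64 bits
    return d2b(d) ^ MASK

def decode(stored):
    # remove the mask before decoding an existing tagref64
    return b2d(stored ^ MASK)
```

With this scheme, `decode(0)` yields the declared default while `decode(encode(x))` round-trips any value, since XOR with a fixed mask is its own inverse.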
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/33
Impossible states in full-state frame making (OSR) (2015-06-18, John Zhang)

*Created by: wks*
There was a proposal about the OSR before, https://github.com/microvm/microvm-meta/issues/5
According to our discussion recently, we decided that:
1. the state of a frame is the PC and a set of local variable values.
2. It should only be possible to continue from some designated "OSR points", or the beginning of a basic block, rather than from an arbitrary point in the code. This reduces the compiler's burden to generate stack maps.
3. The client supplies an arbitrary subset of local variables and their values and
1. if a variable is supplied but is never used, it has no effect on the execution and is simply ignored.
2. if a variable is not supplied but is used later, it gives undefined behaviour.
But this leads to a problem: the supplied state may never be reproducible from normal execution. For example:
```
.funcsig @foo_sig = @i64 (@i64 @i64)
.funcdef @foo VERSION @foo_v1 <@foo_sig> (%a %b) {
%entry:
%x = MUL <@i64> %a %b
%y = ADD <@i64> %x @i64_0 // The rhs is the constant 0.
%trap = TRAP <@void>
// OSR continues here
CALL @print (%x)
CALL @print (%y)
RET <@i64> %y
}
```
Assume we perform an OSR and construct a frame which continues *after* the `%trap` instruction with local variable values: `%a = 6; %b = 9; %x = 42; %y = 54`. Then the value `%x = 42` is impossible.
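To see why, replay the two instructions with the supplied inputs: `%x` must always be 54, never 42, and `%y` must equal `%x`. A trivial sketch:

```python
a, b = 6, 9      # %a, %b as supplied by the client
x = a * b        # %x = MUL <@i64> %a %b
y = x + 0        # %y = ADD <@i64> %x @i64_0
# Any state reachable by normal execution has %x == %y == 54,
# so the supplied %x = 42 can never arise.
assert (x, y) == (54, 54)
```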
But the code generator may consider "adding zero" a no-op and thus generate machine code that aliases the registers of `%x` and `%y`. For example:
```
foo:
push rbx ; save callee-saved register
mov rbx, rdi ; do multiplication. rbx holds the value of %x and also %y
imul rbx, rsi
mov rdi, rbx ; prepare to call @print (%x)
call print
mov rdi, rbx ; prepare to call @print (%y)
call print
pop rbx
ret
```
Then it is impossible to create such a state as mentioned above. This implies that either
1. we require that such state construction must be possible and **require the code generator not to generate the code like above**, or
2. we further **restrict our API** on frame state construction.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/35
Alternative LISP-like Mu IR format (2015-06-22, John Zhang)

*Created by: wks*
Problem: Mu IR needs a parser, but constructing a parser is tedious. Parser generators pulls in additional dependencies.
Solution: Use a simplistic syntax based on LISP.
Example:
```scheme
(typedef @i32 int 32)
(typedef @float float)
(typedef @void void)
(typedef @refvoid ref @void)
(typedef @foo struct @i32 @i64 @float @double @refvoid)
(funcsig @f_sig @i32 (@i32 @i32))
(const @FORTY_TWO @i32 42)
(const @DOUBLE_FORTY_TWO @double 42.0d)
(const @SOME_STRUCT_CONST @some_struct @const1 @const2 @const3)
(const @NULLREF @refvoid NULL)
(global @errno @i32)
(funcdecl @write @write_sig)
(funcdef @write @write_v1 @write_sig (%p0 %p1 %p2)
(basic-block %entry
(inst %a (ADD @i32 %p0 %p1))
(inst %b (CALL @sig @callee (%arg1 %arg2 %arg3) (exc %nor %exc) (keepalive %v1 %v2 %v3)))
)
(basic-block %nor
(inst _ (SUB @i32 %p0 %p2)) ; unnamed instruction
(inst _ (BRANCH %exit))
)
(basic-block %exc
(inst _ (TRAP @void))
)
(basic-block %exit
(inst _ (@uvm.thread_exit)) ; COMMINST is no longer necessary because the syntax is already dynamic
)
)
```
**How would this benefit the Mu implementer?** The parser can be written by hand in very few lines of code. This is convenient for languages that have fewer capabilities (such as C, which does not handle complex type hierarchies easily).
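As an illustration of how small such a hand-written parser can be, here is a minimal S-expression reader in Python. It is a sketch only: it ignores comments, string literals and error handling, all of which a real reader would need.

```python
def parse_sexp(src):
    """Read one S-expression: atoms stay strings, lists become Python lists."""
    # Pad parentheses with spaces so a plain split() tokenises the input.
    tokens = src.replace('(', ' ( ').replace(')', ' ) ').split()

    def read(pos):
        if tokens[pos] == '(':
            lst, pos = [], pos + 1
            while tokens[pos] != ')':
                item, pos = read(pos)
                lst.append(item)
            return lst, pos + 1   # skip the closing ')'
        return tokens[pos], pos + 1

    form, _ = read(0)
    return form
```

For example, `parse_sexp('(typedef @i32 int 32)')` yields the nested-list form `['typedef', '@i32', 'int', '32']`, which a Mu implementation could then walk directly.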
**How would this benefit client implementers?** The code generator can be more typed (using structured nested lists), rather than constructing arbitrary strings (using string formatting).
**Binary format?** There can be a simpler and direct mapping between the text format and the binary format. For example, atoms can be encoded as a hash code, and a list can be encoded as a type, a length and a list of values. Mu spec no longer needs to define a text format and a binary format separately.
Problems?
- Does not look like assembly.
- May be less readable than the current text format without aggressive pretty-printing.
- Extra validation should be performed by the parser. (Really? The Mu micro VM is not required to correct any errors. Any error is allowed to have undefined behaviours.)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/37
Memory model in native interface (2016-06-17, John Zhang)

*Created by: wks*
# Problem
Currently the Mu memory is all about "memory locations" – a region that holds a Mu value, not directly related to addresses or bytes. The native memory is a sequence of bytes, addressed by integer "addresses". They are separate until a Mu memory location is pinned. In that case, the Mu memory location is mapped to a region of bytes in the address space. Accessing one will affect another.
Meanwhile Mu's memory model uses the C++11-style model based on the happens-before relation.
This poses a challenge: the model should bridge the Mu and the native worlds. The native view of the memory as a sequence of bytes should work nicely with the Mu memory, i.e. map to meaningful memory operations in the Mu world. Atomic actions should be consistent and may establish happens-before relations between the two worlds. Specifically:
* What is the unit of memory actions? Previously, it is "Mu memory location".
* If a "load" action is modeled as a tuple: `LOAD(order, type, location)`, and location was "Mu memory location", then what should location be now? Address? What value does it see? Some store? Or something else?
* If a "store" action is modeled as a tuple: `STORE(order, type, location, newvalue)`, and location was "Mu memory location", then what should location be now?
* If a Mu memory location is pinned, and is accessed in a different granularity than the type declared, what will be the result?
* If stored as a whole, but loaded in parts...
* If stored in parts, but loaded as a whole...
* But we cannot model the memory as a byte array which sequentially changes state. (or, can we? Since non-atomic conflicting accesses are meaningless, does this imply it must be changed sequentially, or errors occur?)
# The current model
* A non-atomic load sees the unique store operation that happens before it, such that no other store operation happens between the visible store and the load. If there is more than one such store, the behaviour is undefined.
* An atomic load sees the value from any of its visible sequence of store operations.
* Mixing non-atomic and atomic operations on the same memory location has undefined behaviour.
# Possible directions
In any case, pure Mu programs should keep their original C++11-like semantics.
1. Make the memory model more machine-oriented and machine-specific.
* May give more dependable behaviours. For example, unaligned memory access is allowed on many architectures, but is not always atomic.
* Obviously this makes Mu less portable. All pointer-based memory access will have machine-specific semantics. But does this matter? This is the "native interface" anyway.
* Interoperability with the C++11 memory model for C/C++ programs will be built upon the machine-specific memory model.
2. Limit what operations are allowed in the native memory.
* Simpler model.
* Probably more undefined behaviours, because they cannot be defined if we try to make things simple and generic.
* Will limit the capability. e.g. unions won't be used by Mu.
3. Something in between
# Examples
The native program should synchronise with the Mu program via atomic memory accesses.
```c++
// C++ pseudo code
struct Foo {
int x;
int y;
};
Mu_thread_1 {
ref<Foo> f = new<Foo>
ptr<Foo> fp = pin(f);
create_thread(native_thread_2, fp);
store(&f->x, 10, NOT_ATOMIC); // Mu-level store
store(&f->y, 20, RELEASE); // Mu-level store
}
native_thread_2(ptr<Foo> fp) {
while(load(&fp->y, ACQUIRE) != 20) {} // Native load
int a = load(&fp->x, NOT_ATOMIC); // Native load
assert(a == 10);
}
```
In non-atomic memory access, partial reads/writes should be based on the byte representation (called the "object representation" of a value in C11).
```c++
ref<i32> r = new<i32>;
store(r, 0x12345678); // Assume little endian
ptr<i32> p = pin(r);
i64 addr = ptrcast<i64>(p); // cast the pointer to the integer address
addr += 3;
ptr<i8> p2 = ptrcast<ptr<i8>>(addr); // cast back to pointer, but a different type
i8 value = load(p2);
assert(value == 0x12);
store(p2, 0x9a);
i32 value2 = load(r);
assert(value2 == 0x9a345678);
```
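The byte-level behaviour asserted above can be checked against an ordinary little-endian byte buffer, used here as a model of the pinned object's bytes (not of Mu itself):

```python
import struct

buf = bytearray(struct.pack('<I', 0x12345678))  # store(r, 0x12345678), little endian
assert buf[3] == 0x12                           # i8 load at byte offset 3
buf[3] = 0x9a                                   # store(p2, 0x9a)
assert struct.unpack('<I', buf)[0] == 0x9a345678  # i32 load sees the mixed bytes
```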
Unaligned 16-, 32- and 64-bit memory access is allowed on x64 (and the P6 family guarantees atomicity if the access does not cross a cache line boundary).
```C++
struct Foo { i32 a; i32 b; };
ref<Foo> r = new<Foo>;
store(&r->a, 0x9abcdef0);
store(&r->b, 0x12345678);
ptr<Foo> p = pin(r);
ptr<i64> p2 = ptrcast<ptr<i64>>(p);
i64 value = load(p2);
assert(value == 0x123456789abcdef0);
```
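Again the expected value follows directly from the little-endian byte layout of the two `i32` fields; a byte-buffer model (not Mu itself) confirms it:

```python
import struct

# Fields a then b of Foo, each stored as a little-endian i32.
buf = struct.pack('<II', 0x9abcdef0, 0x12345678)
# Loading the whole 8 bytes as one i64 concatenates them: b in the high
# half, a in the low half.
assert struct.unpack('<Q', buf)[0] == 0x123456789abcdef0
```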
Could non-atomic memory access mix with atomic counterparts?
```C++
struct Foo { i32 x; i8 y; double z; };
ref<Foo> r1 = new<Foo>;
ref<Foo> r2 = new<Foo>;
ptr<Foo> p1 = pin(r1);
ptr<Foo> p2 = pin(r2);
store(&p1->x, 0x12345678, NOT_ATOMIC);
store(&p1->y, 42, NOT_ATOMIC);
store(&p1->z, 3.1415927D, NOT_ATOMIC);
memcpy(p2, p1, sizeof(Foo)); // This is obviously not atomic
some_synchronization_operation_after_which_atomic_accesses_will_be_safe(); // What should this be?
thread1 {
store(r2->y, 84, RELAXED); // This is atomic
store(r2->x, 0x9abcdef0, RELEASE); // This is atomic
}
thread2 {
i32 a = load(&r2->x, ACQUIRE); // This is atomic
if (a == 0x9abcdef0) {
i8 b = load(&r2->y, RELAXED); // This is atomic
double c = load(&r2->z, RELAXED); // This is atomic
assert(b == 84 && c == 3.1415927D);
}
}
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/38
Dynamic loading for Java (2016-08-11, John Zhang)

*Created by: eliotmoss*
So Adam and I have run into an interesting question about how to do dynamic loading for Java. The thing is, one does not know all the details of a class in advance. Therefore, it is hard to give things signatures. Consider, for example, the vtable. We need to have Mu types for all the classes mentioned in all the methods -- the vtable will be a struct of function pointers, each pointer specifically typed. But that would force eager loading of the entire universe to figure out the types!
The only alternative seems to be to refcast all over the place at run time. Is that the intent? (Coming from Java I had a (mistaken) bias that this involves a cost, but I see on referring to the spec that refcast does not involve any run-time work.)

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/39
Call-back from native to Mu (2016-09-06, John Zhang)

*Created by: wks*
# Overview
## Rationale
Some existing C libraries or system interfaces use call-back functions, i.e. user-provided function pointers which are called by C or system libraries. Mu should provide appropriate mechanisms to interface with those libraries.
This is part of the (unsafe) native interface. See super issue: https://github.com/microvm/microvm-meta/issues/24
## Exposing appropriate Mu functions as C-style function pointers
"Appropriate" Mu functions must only use the following types as their parameter types or return types: `int<n>`, `float`, `double`, `vector<T>`, `ptr<T>` or `struct` types whose components are these types. In the case of `ptr<T>`, `T` can also be `array<T n>` or `hybrid<F V>` where `T`, `F` and `V` are one of the above types. In other words, (traced) references and Mu-specific opaque types are not allowed.
The Mu ABI will be designed to be compatible with the C calling convention as defined by the platform ABI.
**way 1**: (simple) Mu functions are declared with the optional `WITH_FP` clauses to create their associated C-style function pointers. For example:
```
.funcdecl @some_func WITH_FP(@fp_some_func DEFAULT @COOKIE) <@sig>
.funcdef @other_func VERSION @other_func_v1 WITH_FP(@fp_other_func DEFAULT @COOKIE) <@sig2> (%param0) {
...
}
```
With the above definitions, `@some_func` has type `func<@sig>`, which is a Mu function reference value. `@fp_some_func` has type `funcptr<@sig>`, which is a C-style function pointer. Similarly `@other_func` is a `func<@sig2>`, while `@fp_other_func` is a `funcptr<@sig2>`. `DEFAULT` is the calling convention. `@COOKIE` is a "cookie" (see *way 2* below).
The Mu IR program or the API can pass the function pointer to the native program. When called, the Mu function will run and return its return value to the native caller.
* pros:
1. simple
2. The native funcptr is immediately available after loading the Mu bundle.
* cons: does not support "closures" well. Some languages/implementations (e.g. LuaJIT) would like to expose closures (rather than just functions) to C as callbacks.
**way 2**: (complex) Mu functions are exposed with a run-time invocation of a Mu instruction or a Mu API message.
Format:
* Instruction: *fp* = `EXPOSE_MU_FUNC` `<` *sig* `>` *mufunc* *cookie*
* API: *fpHandle* = ca.exposeMuFunc( *hMuFunc*, *hCookie* )
The resulting *fp* has type `funcptr<sig>` and can be called from C. A function can be exposed multiple times, and the resulting function pointers are mutually distinct. The *cookie* is an `int<64>` value associated with the resulting function pointer. If a Mu function is called through a particular function pointer, a special instruction `NATIVE_COOKIE` will return the associated *cookie* value.
Example:
```
%fp1 = EXPOSE_MU_FUNC <@sig> @some_func @some_int64_value
%fp2 = EXPOSE_MU_FUNC <@sig> @some_func @other_int64_value
...
UNEXPOSE_MU_FUNC %fp1
UNEXPOSE_MU_FUNC %fp2
// in @some_func
%cookie = NATIVE_COOKIE
%eq = EQ <@i64> %cookie @some_int64_value
...
```
```
val hFP = ca.exposeMuFunc(hFunc, hSomeInt64Value)
...
ca.unexposeMuFunc(hFP)
```
Both `%fp1` and `%fp2` have type `funcptr<@sig>`. But if the Mu function `@some_func` is called from C via `%fp1`, the `NATIVE_COOKIE` instruction will return `@some_int64_value`. If called via `%fp2`, then `NATIVE_COOKIE` returns `@other_int64_value` instead.
* pro: the cookie can be used to identify different closures and look up the contexts of the closures.
* con:
1. Not as simple as way1.
2. Exposing a Mu function requires a Mu instruction or an API message. This makes "implementing the Mu client API directly as exposed Mu functions" difficult. (In this case, exposing a Mu function requires an API function, which is also an exposed Mu function.)
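The closure-identification use of cookies (the pro above) can be sketched in Python, using a closure in place of `EXPOSE_MU_FUNC` and the bound cookie in place of `NATIVE_COOKIE`; all names here are hypothetical:

```python
def expose(func, cookie):
    """Model of EXPOSE_MU_FUNC: one underlying function, many exposed
    entry points, each carrying its own cookie."""
    def exposed(*args):
        return func(cookie, *args)   # the cookie plays the NATIVE_COOKIE role
    return exposed

# Client-side table mapping cookies to per-closure contexts.
contexts = {1: 'closure A', 2: 'closure B'}

def some_func(cookie, x):
    # Inside the Mu function, the cookie selects the closure's context.
    return (contexts[cookie], x)

fp1 = expose(some_func, 1)   # EXPOSE_MU_FUNC <@sig> @some_func 1
fp2 = expose(some_func, 2)   # EXPOSE_MU_FUNC <@sig> @some_func 2
```

Calling `fp1(10)` reaches `some_func` with cookie 1 and `fp2(10)` with cookie 2, so one function body can serve arbitrarily many "closures" distinguished only by the exposed entry point used.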
## Contexts necessary for Mu functions to run
Even if a Mu function is exposed to the native program as a `funcptr<sig>`, some contexts must be set up so that the Mu function can make use of Mu-specific features. These include:
* **Thread-local garbage collection states**: including thread-local allocation pools, and registering the thread for yielding as requested by the GC.
* **Stack context**: Each Mu stack has an associated `stack` value (the opaque reference to the current stack). This is necessary for swap-stack.
Similar to the JNI's "attaching a native thread to the JVM", Mu will also require attaching Mu contexts to a native thread before any exposed Mu function pointers can be called.
If the native program is executed because some Mu program called the native function through the native interface (via `CCALL`), the context is already set up and the C program can safely call back to Mu.
## Mixed native/Mu stacks
With the possibility of both C-to-Mu and Mu-to-C calling, a stack may have mixed C or Mu frames. It has some implications for stack introspection and exception handling. Possible approaches are:
1. Stack introspection cannot go deeper than the last contiguous Mu frame from the top, i.e. introspection is immediately unavailable once a native frame is reached. Exceptions may not go into native frames. This approach has the weakest promise from Mu, and is thus the easiest.
2. Mu can skip non-Mu frames and unwind to other Mu frames underneath.
3. Stack introspection and stack unwinding caused by exceptions can go through frames which are supported by the native debugger. This is harder than the previous one, but still practicable.
4. Support non-standard frames (such as JavaScript frames of SpiderMonkey or V8). Too hard.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/40
Mu Client Interface as C Binding (2015-08-21, John Zhang)

*Created by: wks*
The current API is expressed in a language-neutral form, and it is the implementation that decides how to implement such an interface. Programmers still need to resort to implementation-specific interfaces to actually use a particular Mu implementation.
Since C is so widely used as a system programming language, the Mu client interface (a.k.a. the API) should be expressed as data types and function calls in the C programming language. If the client is not written in C, it usually still has a C FFI.
# The API in C
Resources in Mu are exposed as opaque types which have reference semantics: they can be copied and still refer to the same resource.
* `mu_micro_vm_t`: a reference to a Mu micro VM instance.
* `mu_client_agent_t`: a reference to a client agent.
* `mu_handle_t`: a handle to a value in the Mu type system exposed to the client.
Messages are C functions. Like JNI, they are contained in a struct: `typedef struct mu_api_msgs {...} mu_api_msgs_t;`. In this way, a client in C does not need to link against any library when compiling. The reason is that for a Mu micro VM implemented in a higher-level language (like the reference implementation in Scala), the binding of the callable C functions is generated very late, even later than load time, and has no access to the native loader.
For example, assume there is a `mu_api_msgs_t* msgs` defined:
```c
mu_client_agent_t ca = ...
char buf[999999];
int sz;
// load file into buf
msgs->load_bundle(ca, buf, sz); // Load a bundle
// Putting C values into Mu
mu_handle_t h1 = msgs->put_schar(ca, 127);
mu_handle_t h2 = msgs->put_sshort(ca, 32767);
mu_handle_t h3 = msgs->put_sint(ca, 42);
mu_handle_t h4 = msgs->put_slong(ca, 42);
mu_handle_t h5 = msgs->put_slonglong(ca, 999999999999999);
// Converting Mu values to C
int v3 = msgs->to_sint(ca, h3);
unsigned long v4 = msgs->to_ulong(ca, h4); // just treat the int as unsigned
```
Mu-level flags are C preprocessor macros. They have type int.
```c
msgs->store(SEQ_CST, hLoc, hNewVal); // SEQ_CST is a macro
```
Callbacks, including the trap handler and undefined function handler, have defined signatures:
```
typedef mu_trap_return_status_t (*mu_trap_handler_t)(
    mu_client_agent_t ca,
    mu_handle_t stack,
    mu_handle_t thread,
    int watchpoint_id,
    mu_handle_t *new_stack,     /* output parameters: C has pointers, not references */
    mu_handle_t *data_passed,
    mu_handle_t *new_exception,
    mu_api_msgs_t *msgs,
    void *user_data);

typedef void (*mu_undefined_function_handler_t)(
    mu_micro_vm_t microvm,
    int function_id,
    mu_api_msgs_t *msgs,
    void *user_data);
```
These functions are registered via the `msgs->register_trap_handler` and `msgs->register_undefined_function_handler` API messages. In their parameters, the `user_data` is an arbitrary pointer provided by the client in an implementation-specific manner (see below).
# Implementation-defined behaviours
Some aspects of the C binding are implementation-specific. They include:
* How to create a Mu micro VM? Options are:
1. The C executable creates the Mu instance.
2. Mu loads the C dynamic library.
3. Mu starts separately and C connects to the existing instance in the same process.
4. C connects to a Mu instance in a different process, or a different machine.
* Options in creating Mu instances. Options are:
1. Heap size. Giving a heap size means the client determines the heap size rather than Mu automatically deciding its own storage.
2. Global data space size. Setting this value means the global data may have their own storage. Actual implementation could use the heap space, too.
3. Stack size. Similarly, this is too implementation-specific.
* What happens during initialisation?
1. Mu calls a C function to initialise the client, and the client provides a `void*` to Mu for the client's own context. (note: in this case, it is Mu loading C rather than C creating Mu.)
2. C creates a Mu instance, and sets its `void*` user data in a proprietary API message.
# Open questions
* Should we allow each Mu implementation to have its own "namespace"? The opaque types (`mu_micro_vm_t` and so on) are opaque, but different implementations may have different representations. The current C binding design forbids one C program from working with more than one Mu implementation (though it is okay to work with more than one *instance* of the same implementation).
* JNI does not solve this problem, either.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/43
Reduce special cases involving the void type (2016-06-17, John Zhang)

*Created by: wks*
# The current status
The `void` type is a special type in the Mu type system. It has no value, and thus many instructions/mechanisms have special cases for the `void` type.
**Instructions that have special cases for `void`**:
- `RET` and `RETVOID`: Since `void` has no value (In fact it does. The return value of the `BRANCH` instruction, for example, is a value of the `void` type.), we needed a special syntax to return `void`, thus we have `RETVOID`.
- The "new-stack clause" of the `SWAPSTACK` instruction: `PASS_VALUE <T> %val` and `PASS_VOID`: for the same reason why we have `RET` and `RETVOID`.
**The trap handler has a special case for `void`**:
Just like `SWAPSTACK`, the trap handler may rebind the thread to a stack and either "pass a value" or "pass `void`" or "throw an exception".
## Other existing uses
**Instructions that always return `void`**: `BRANCH`, `BRANCH2`, `SELECT`, `TAILCALL`, `RET`, `RETVOID`, `THROW`, `STORE`, `FENCE`, and some common instructions: `@uvm.kill_stack`, `@uvm.thread_exit`, `@uvm.native.unpin`, `@uvm.native.unexpose`, `@uvm.meta.load_bundle`, `@uvm.meta.load_hail`, `@uvm.meta.pop_frame`, `@uvm.meta.push_frame`, `@uvm.meta.enable_watchpoint`, `@uvm.meta.disable_watchpoint`, `@uvm.meta.set_trap_handler`. These instructions do not return meaningful values.
**Instructions that may return `void` sometimes**: `CALL`, `TRAP`, `WATCHPOINT`, `CCALL`, `SWAPSTACK`: the callee, client, swappee, or whatever the other end of the communication is, may not return meaningful values.
## Current properties of `void`
`void` can only be used in 3 cases:
1. As the type of allocation units that do not represent values. Hence it is usable as the referent type of reference types and pointer types. e.g. You can run `NEW <@void>`. Each time you NEW a void, you have a **new** empty object, not the same as any other.
2. As the fixed part of a hybrid to indicate the absence of the fixed part. e.g. `hybrid<void int<64>>` is a variable-length array of `int<64>`, without a fixed part.
3. As the type of instructions or the return type of functions that do not return values. e.g. the `BRANCH` instruction returns `void`.
Other properties:
- `void` has no value (in fact it does, as mentioned before)
- `void` is neither a scalar type nor a composite type.
- Only scalar types can be used for memory access: `LOAD`, `STORE`, ...
- Only composite types have other types as components: fields/elements
- `void` is neither storable nor loadable. It does not contain other parts. It cannot be part of a struct/array/vector, i.e. there is no "array of void". The "fixed part of a hybrid" is an exception.
- `void` is native-safe: It can be returned from native functions; and there can be `uptr<void>`.
# Proposed changes
**value of `void`**: Instead of "having no value", `void` now has exactly one value: NULL. This is consistent with Python: `NoneType` has only one value `None`.
**`void` constant**: We reuse the `NULL` literal to create a "void constant":
```c
.const @VOID <@void> = NULL // The only possible value of void.
// For the sake of consistency, we require the client to define it.
//
// Alternative: make it a pre-defined value, such as the @uvm.predef.void_t type
// and the @uvm.predef.VOID value. We could define @uvm.predef.i8, @uvm.predef.i16,
// @uvm.predef.i32, @uvm.predef.i64, @uvm.predef.float, @uvm.predef.double,
// @uvm.predef.ref_void, @uvm.predef.ref_i32, etc., but the choice seems too arbitrary.
```
All existing instructions that return `void` return this `NULL` value. In theory, the following snippet is valid, but stupid:
```c
%entry:
%x = BRANCH %bb1
%bb1:
RET <@void> %x // return void. Should have said RET <@void> @VOID
// or even "RET @VOID" omitting the type argument, because RET always returns the
// return type of the current function. ADD, SUB, MUL ... would have to infer the operand
// types if the operand type is not provided, but RET does not need to be inferred: the
// function return type is explicit.
```
**Remove the `RETVOID` instruction**: Use `RET <@void> @VOID` instead, or simply `RET @VOID`.
**Remove the `SWAPSTACK` clause `PASS_VOID`**: Use `PASS_VAL <@void> @VOID` instead. Unlike `RET`, the type parameter here is necessary: the type that the swappee expects is dynamic. It may expect a different type at a different `SWAPSTACK` site. Guessing the wrong type while swapping has undefined behaviour.
**Trap handlers no longer need a `PASS_VOID` return case**: Instead, pass a `NULL` constant.
## New ways to use `void`
In addition to the existing three ways, i.e. empty objects, hybrid fixed part, empty return value, `void` can now be used in the following ways:
- In `RET` to return from a function of `void` return type.
- In `SWAPSTACK` to swap to a stack that does not expect to receive a value (it receives the `NULL` value of the `void` type).
- In the trap handler, to rebind a stack that expects `void`.
They all fit into the category where "the other end of the communication" does not pass a value.
## Things that should still be forbidden
**`void` must not be a parameter type**: I don't have a very compelling reason, but it is completely useless (only increases the apparent arity of a function).
**`void` must not be part of a struct/array/vector or the variable part of a hybrid**: Not allowing this gains us a very nice property: each field/element in any struct/array/vector/varpart has a distinct offset. In `struct<@i32 void void void void @x>`, since `void` should have size 0 and alignment 1 (in the sense that `void` can be allocated at any address *a* such that *a* % 1 == 0), `void` does not occupy space. Then all of the `void` fields would be at the same offset as `@x`. Another reason: C does not allow `void` to be a struct field.
**Empty structs (`struct<>`) should be forbidden**: For the same reason as `void` as a field. Just use `void` instead, since it already plays this role. C forbids empty structs, too, but GCC allows them.
# How about LLVM?
LLVM IR has two syntactic forms of the `ret` instruction:
- `ret <type> <value>` for example: `ret i32 100`
- `ret void`, which returns void.
LLVM does not have a "void constant" either, since `void` is not a "first class type". In LLVM, only `void` and function types are not first class. (LLVM has both "function" types and "pointer to function" types.)
The LLVM LangRef does not say parameter types cannot be `void`, but `void` is never used as a parameter type. In C, `void` is an incomplete type, and thus cannot be a parameter type.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/44 Pre-SSA form (2016-06-17, John Zhang)
*Created by: eliotmoss*
We have concluded that while the official form of Mu IR is SSA form (but see Issue #18 for current thoughts on how to represent that form), many clients will find it more convenient to generate something that is mostly Mu IR but not in SSA form, and that it is further desirable to offer a standard tool to convert from some "pre-SSA" form to proper SSA form. This tool may operate in a stand-alone manner or be more closely integrated with an implementation of Mu.
We propose the following specific pre-SSA form, according to how it differs from SSA-form Mu.
1. "SSA-variables" may be assigned more than once; however, any individual such variable must be used in a type-consistent manner.
1. PHIs may be omitted (or, in the proposal of #18, values may be omitted at branches and variables omitted at labels)
1. For convenience we introduce a "copy" operator, `var = ID <T> arg`, which takes one argument `arg` of type `T` and assigns it to variable `var`. This operator seems convenient from a client perspective at times.
The converter to SSA form will perform liveness analysis and add variables to labels and values to branches as necessary, checking for type consistency. If some variable is live but not initialized, the converter will insert a safe initialization (to 0 or 0.0 for numeric types, null for a pointer, etc.) at the latest possible point that does not interfere with existing assignments to the variable. (Optimization may move the initialization earlier as deemed appropriate.)
We will undertake to develop the converter in Scala or Java.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/47 Sizeof? (2016-06-21, John Zhang)
*Created by: eliotmoss*
We have encountered an interesting issue in developing the C client, namely how to deal with union types. Our thought was to define a separate struct type for each union variant, and then to cast to the appropriate struct type when accessing a particular variant. (Note that this requires structs to be heap or alloca allocated, which I think is ok -- C does not treat them as single values that can go into a register, etc., as I recall.)
The problem we have is that because Mu defines the detailed layout of a struct on a given target, we cannot determine the sizes of the structs, and thus we cannot determine the maximum size, something we need in order to allocate an instance of a union type.
We observe that Mu gives no way to ask for the size of a type (or to get the offset of a field in a struct or an element of an array). While such information may not be used for typical accesses, we now see that it has at least one important use case. Given that C programs are typically way-ahead-of-time compiled, we do not consider it appropriate to generate Mu for C code only at the last minute.
We suggest that Mu provide means to determine sizes and perhaps to do simple load-time (if that is the right word) computations over these constants. Here is some possible syntax (admitting that I have not thought about it long or deeply yet):
.sizeof **name** **type**
Define **name** to be the constant that is the number of bytes needed for **type**.
.sizeof **name** **op** **t1** **t2** ... **tn**
Define **name** to be the sizes of **t1** through **tn** combined with operator **op**, where **op** can be at least **max** and **sum**.
Alternatively, we could define names for the sizes of each type, and a more general constant-computing form:
.define **name** **op** **e1** ... **en**
This would define **name** to be **op** applied to the **ei**. We could provide a suitable range of operators.
For offsets we could have:
.offset **name** **struct or array type** **idx**
This would define **name** to be the constant giving the offset of the **idx**'th field/element of the given struct or array type.
The point is to allow target-dependent computations over constants to be written in a target-independent (symbolic) way. I believe this would meet the needs of C.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/48 Mu IR rewriting library (2015-10-29, John Zhang)
*Created by: wks*
Mu aims to be minimal, but such minimalism has made the construction of Mu clients hard. A client-level library can ease the client's job by accepting a slightly higher-level variant of the Mu IR and translating that higher-level IR to actual Mu IR code (and/or HAIL scripts and/or subsequent API calls).
This issue tracks tasks that should be done at this layer.
* Pre-SSA to SSA converter. #44
- Writing Mu IR in the SSA form is hard, and the goto-with-values form is even harder. The library should automatically convert ordinary CFGs into the goto-with-values form using well-known algorithms.
* Platform-dependent constant values. #47
- Some ahead-of-time clients (notably C or other "traditional" languages) expose platform details to the programmer as compile-time constants, but binding those values too early will make the object code non-portable. The rewriter should help the client determine these values so that the client compiler can be strictly ahead-of-time.
* Merge Mu IR and HAIL: #29 #46
- The library is not minimal. Integrating both languages will make the client's job easier.
* Annotations
- This will allow clients to attach arbitrary information to the Mu IR code, which can help the client introspect the program at run time.
- Note that if we need to use system debuggers (such as GDB), then these annotations need to go through the micro VM itself because it is the micro VM's responsibility to generate object codes (including the DWARF debug info).
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/49 Stack frame iterator (2015-11-13, John Zhang)
*Created by: wks*
# Problem
The current stack introspection/OSR API is inefficient.
It selects a frame by a number. For example:
```
ctx->cur_func(ctx, stack, 1);
ctx->cur_func_ver(ctx, stack, 2);
ctx->cur_inst(ctx, stack, 3);
ctx->dump_keepalives(ctx, stack, 4, &values);
```
A real-world implementation may need to unwind the stack from the top, one frame at a time, until it reaches the requested frame. If the client needs to traverse stacks with many frames, the O(n^2) complexity may be a performance bottleneck. One application is to use the Client API (or the equivalent Mu IR (common) instructions) to generate the stack trace for exception handling.
This link shows a real-world Java stack trace: https://ptrthomas.wordpress.com/2006/06/06/java-call-stack-from-http-upto-jdbc-as-a-picture/ Or click here: [jtrac-callstack.pdf](https://github.com/microvm/microvm-meta/files/24885/jtrac-callstack.pdf)
EDIT: well, this is a call graph, not a stack trace. But imagine something goes wrong in JDBC...
# Desired API
The API should provide a "frame cursor" data type, which refers to a frame in a stack. It can be created for a stack, and used to iterate through its frames from top to bottom.
The introspection API `cur_func`, `cur_func_ver`, `cur_inst` and `dump_keepalives` will work on this "cursor" instead of a stack and a number.
The OSR API `pop_frame`, can, instead of popping one frame at a time, pop all frames above a particular "cursor". [Preliminary experiments](https://github.com/microvm/liblushan/blob/master/src/test_remote_stack_chop.c) show that this is possible with C programs and [libunwind](http://www.nongnu.org/libunwind/).
## The "frame cursor" type
The "frame cursor" type shall be an opaque reference to ~~a Mu frame~~ a cursor. The cursor holds the context of a "current frame", and can move down to the parent frame.
It must be platform-independent.
It could potentially be large (given the number of registers in a CPU). Therefore it is desirable to be **mutable** – making a fresh copy for each frame would be costly (Haskell programmers may disagree).
There are some subtle interactions between it and the GC. The GC may modify references on the stack, but the API must hide this detail from the client. So the API should not expose raw CPU register values.
The cursor may or may not be allocated on the Mu heap.
The cursor is only valid while the stack remains unbound. As soon as the stack is bound again, the stack may change in arbitrary ways and the cursor is invalidated.
So I can think of some possible solutions:
1. ~~Create a new type `frameref`, like our existing `threadref` and `stackref`.~~ Create a new type `framecursor`. It has reference semantics: it refers to a mutable structure internally held by the Mu VM.
* pro: A dedicated opaque type, the cleanest model.
* con: A new primitive type, pointing to a large structure, just for introspection? Well... maybe not that bad.
* choices: Is it managed by the GC? GC is the easiest way, but we may not be able to print stack trace for OutOfMemoryException. (really? I am not sure) Alternatively it may be required to be closed explicitly.
2. Use `ref<void>` for the "cursor" type. Its content is allocated on the heap, opaque to the client, and may be platform-specific. When invalidated, the object remains live, but the content becomes invalid.
* pro: No new types introduced
* con: This implies the data content is on the Mu heap.
* con: The GC must have special knowledge of such a heap object, which is not a regular Mu object.
3. Use `ptr<void>`. Similar to `ref<void>`, but implies it is not GC-ed.
* pro, con: same as `ref<void>`
## Example Mu API
This example prints a stack trace on Mu.
```c
// This trap handler prints the stack trace.
void stack_printing_trap_handler(
MuCtx *ctx, // Equivalent to JNIEnv
MuThreadRefValue thread, // The current thread
MuStackRefValue stack, // The current stack
int wpid, // Watchpoint ID. 0 for ordinary traps.
MuTrapHandlerResult *result, // How the Mu thread should resume?
MuStackRefValue *new_stack, // Which stack shall the Mu thread bind to? Usually the old stack.
MuValue *values, int *nvalues, // What values shall be passed on the stack?
MuRefValue *exception, // What exception shall be thrown on that stack?
MuCPtr userdata) { // Client-specific data
ClientCompiler *clientCompiler = (ClientCompiler*) userdata; // The client-specific compiler.
// Get a cursor to the top of the stack.
MuFrameCursorValue cursor = ctx->get_stack_cursor(ctx, stack);
// Iterate through.
int func_id;
while((func_id = ctx->cur_func(ctx, cursor)) != ID_OF_MY_STACK_BOTTOM_FUNC) {
if (func_id == 0) { // func_id == 0 means the frame is a native frame.
printf("This frame is native");
} else { // It is a Mu frame.
// Get the ID of the current Mu instruction.
int inst_id = ctx->cur_inst(ctx, cursor);
// The client looks up the source-level information.
SourcePosition sp = clientCompiler->getSourcePosition(inst_id);
printf("File: %s, Function: %s, Line: %d, Column: %d\n",
sp.file, sp.func, sp.line, sp.column);
}
ctx->next_frame(ctx, cursor); // Hypothetical name: move the cursor down to the parent frame.
}
printf("End of stack trace\n");
// Close the cursor. (Alternatively let the GC close the cursor.)
ctx->close_cursor(ctx, cursor);
// We want to return to the old stack and continue normally,
*new_stack = stack;
// but do not pass any values.
*nvalues = 0;
// Continue normally (not throwing exception).
*result = MU_REBIND_PASS_VALUES; // passing 0 values
}
```
# Existing approaches
[libunwind](http://www.nongnu.org/libunwind/) is a portable way to walk stack frames in the C language. There are different implementations on different platforms (OSX has its own implementation), but the API is the same.
`unw_getcontext` creates a `unw_context_t` structure for the current stack. `unw_init_local` creates a `unw_cursor_t` on the context. Then the user can call `unw_step` on the cursor to step through stack frames. `unw_get_reg` gets the value of a machine register from a cursor. The cursor keeps the state of registers (usually it is only able to recover callee-saved registers) at the resumption points (return addresses) of frames.
Example:
```c
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h> /* for printf */
void show_backtrace (void) {
unw_cursor_t cursor; unw_context_t uc;
unw_word_t ip, sp;
unw_getcontext(&uc);
unw_init_local(&cursor, &uc);
while (unw_step(&cursor) > 0) {
unw_get_reg(&cursor, UNW_REG_IP, &ip);
unw_get_reg(&cursor, UNW_REG_SP, &sp);
printf ("ip = %lx, sp = %lx\n", (long) ip, (long) sp);
}
}
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/50 IR construction API for the convenience of verification (2016-06-17, John Zhang)
*Created by: wks*
# Problem
The only way for the current API to transfer an IR bundle from a client to a Mu micro VM is via the `load_bundle` API (for example, `microvm.load_bundle(""".typedef @i32 = int<32>\n""")`, or using a binary format), which passes a serialised IR format (text or binary). The JVM takes this approach; its serialisation format is Java bytecode.
An alternative way to deliver a bundle is to let the client construct each node (type, function, basic block, instruction, ...) in the bundle by calling an API function (for example, `handle = microvm.make_instruction("ADD", i32, op1, op2)`) which returns a handle to the node, then passing this handle to the Mu micro VM. LLVM uses a similar approach: it provides constructors to each node class, and provides a "builder" to conveniently build a CFG.
Since the construction of the IR should be off-loaded from the micro VM, there should be a client-side library (we call it **libmu**) which constructs the IR for the client and provides an API to the client. `libmu` itself talks with the micro VM through implementation-dependent "private" APIs.
It is not clear whether **serialisation** or **construction by calling** is "better" because "better" has many definitions, but *the calling-based API must exist* because it is reported that it is very difficult to verify a parser.
## Comparing the two approaches
**serialisation**
* pros
+ language independent
+ The serialisation format can be standardised and (almost) all languages can access byte arrays.
+ faster than *construction by calling* when the client and the micro VM have different runtime environments
+ (e.g. Haskell to C, Java to C, ...)
+ FFI can introduce considerable cost. The Go lang impl performs a SWAP-STACK operation every time it calls C.
+ simple API
+ one API function `load_bundle` handles the entire IR format.
* cons
- difficult to verify (there are reports that it is very difficult to verify a parser)
- require a parser inside the micro VM:
- there may be a mixed approach to offload the parser outside the micro VM.
- not suitable for the end user, i.e. those who write the high-level language compiler.
- Either the high-level compiler writer or some third party needs to build a library for serialisation.
**construction by calling**
* pros
+ easy to verify
+ minimal for the micro VM
+ No parser in the micro VM is needed.
+ faster than *serialisation* when the client and the micro VM have similar runtime environments
+ (e.g. C to C, Java to Java, C to C++, Haskell to Haskell, ...)
+ The libmu library can be implemented in the same language as the client.
* cons
- bloated API
- There will be hundreds of API functions just to construct each IR node. All of them need to be verified.
- The internal state of libmu needs to be verified, but hopefully this is easier than verifying the parser.
- paradigm impedance
- There is no single API that satisfies all languages. For example, if the API is defined in C, it may be unsuitable for a client written in Haskell. So ideally there should be a libmu for Haskell, whether that is part of the formally verified Mu/libmu pair or not.
# Details
## Cost of foreign function calls
Depending on the two languages calling each other, the cost can be trivial (such as C and C++) or huge. Java must go through JNI to call native functions. Go performs a swap-stack operation every time it calls C, in order to work around blocking system calls in its M*N threading model. Other language implementations, such as Python, rely on `libffi`, which builds a native call by dynamically preparing its arguments, and the high-level language such as Python needs to convert C types to Python types and vice versa.
An experiment shows that calling the simplest C function from Haskell introduces a 30x overhead compared to calling the same C function from another C function.
But from verification's point of view, the cost does not matter as long as it is spent in the client.
## Cost of serialisation
Serialisation is not free. In a simple experiment, serialising a CFG-like data structure to an intermediate format and then parsing it in another module (both written in C) introduced a 10% overhead compared to directly constructing the target CFG structure in the receiver, assuming the sender only holds opaque references. The major cost is memory allocation (where malloc is the bottleneck) and resolving the cross-references between nodes (where the hash map is the bottleneck).
Serialisation is not free, but is reasonably cheap. When foreign function call is expensive, serialisation can be used as an alternative.
## Not calling across languages
Since calling across languages is expensive, it is desirable to implement part of the libmu in the same (or similar) language the client is written in, and let libmu construct the data structure that is native to the micro VM.
For example, if the micro VM is written in C++, the libmu should construct a tree of C++ class instances as LLVM does.
Note that LLVM is designed and implemented in C++ and serves C/C++. It is not a problem if the only official API it provides is C++. There is a C binding, too, but it is trivial. However, Mu has a specification which allows multiple implementations. In this case, the micro VM core would not always be in C/C++ or any particular language.
However, if the micro VM is written in a managed language, such as RJava or Java, then it will be interesting:
* If the client, libmu and the micro VM all happen to share the same (or a similar) runtime, then the API calls can be cheap. The ideal case is being "metacircular", i.e. both the client and libmu run on the same micro VM as the micro VM itself. The cost is minimal.
* If libmu is written in C and provides the C API, then there is a semantic mismatch between the micro VM and libmu: *somewhere in libmu* must cross the line between two different runtimes, which introduces an FFI-like overhead, the cost of which depends on the concrete languages. Holding opaque references to objects in Mu requires such objects to be pinned, or to be held by some container (like the `MuCtx` structure in the current API, which is not light-weight).
* If the client and libmu use a different runtime than the micro VM ("Haskell on libmu on micro VM" is this case), they will pay for two levels of cross-runtime calls.
# Other concerns in design
## Does libmu need to be minimal?
Maybe. At least the verified libmu needs to be minimal.
There may be even higher-level libraries above libmu, in the client's world. Those libraries are not minimal.
## How many languages should be supported?
Ideally the libmu should have both intimate knowledge of a particular Mu implementation, and intimate knowledge of the language the client is written in. In theory, there can be one (or more) libmu for each (micro VM impl, client language) pair.
Since C is so popular, we will define a C API for libmu. In theory,
1. there can be more than one C API
2. there can be APIs for other languages and they may or may not look like the C API. (preferably not, to avoid paradigm impedance)
## Mu CFG and the client CFG (or AST)
**How much should the client use the Mu IR CFG?** i.e. should the client construct Mu IR nodes and do transformations on it as LLVM does? Probably not.
LLVM is designed to be maximal: its CFG contains much information for optimisation, such as the "nsw" and "nuw" flags on the "add" and "sub" instructions.
But the Mu IR is designed to be minimal and is only designed for the micro VM to consume. It does not contain much information that benefits the client.
It is possible that a client-side library performs IR transformations, but it is doubtful whether that IR would be the same Mu IR. Many optimisations, such as whether `x+1 > x` is always true, depend on extra information (such as the "nsw" flag in LLVM) which the Mu IR does not provide.
I (@wks) believe the Mu IR is only generated as the last step of the client-side transformation, i.e. the next step is to deliver it to the micro VM.
# Towards the new API
## Micro-VM-to-client API
The existing controlling API does not need to be changed.
The bundle loading API can be removed. i.e. "how to load bundles into the micro VM" is implementation-dependent, which actually means **libmu-dependent**.
## libmu-to-client API
This API needs to be carefully designed because it is part of the formal verification.
There should be a model of the internal states of libmu which includes:
* The set of handles which map to Mu IR nodes.
* The state of each Mu IR node.
* Isolation between threads.
During construction, each node can hold incomplete information at a given time. The Mu IR nodes may circularly refer to each other (currently they refer to each other via IDs), so it is desirable to allocate several nodes and then link them with each other.
If multiple threads can use the same libmu, the transition in its internal state must be properly handled.
Care must be taken to select the minimum set of API functions, because the current Mu IR has 18 types and 37 instructions. The number of API functions may easily bloat to 100+ if too many CRUD commands are added.
## Micro VM-to-libmu interaction
This interface does not need to be public, but the proper handling of data structures in the micro VM is important. This interface is part of the verification.
In all cases, the choice of languages does matter. Properly chosen languages for the client and the micro VM will result in high performance and verifiability.
# Future languages
This section contains my (@wks) personal opinions. These affect my opinions on this API, too.
In the future, popular system programming languages will generally be higher-level than current ones (such as C). We have already shown that a high-level language, such as RJava or Java, can produce high-quality language runtimes, such as JikesRVM.
Instead of relying on C, high-level system programming languages will gain direct control over low-level operations, such as raw memory access (pointers). This low-level work can be done even though the language itself still has very high-level features, such as garbage collection and object-oriented programming. It is possible (in my opinion at least) that eventually such a high-level language can replace libc and directly interface with the kernel, thus eliminating the need for C except in some very rare cases.
This trend is already visible in several languages. C# already has unsafe pointers and an unsafe native interface (P/Invoke). The Java API is mostly implemented in Java, unlike the old days when the standard Java library was mostly implemented in C++. There is also a [JEP to add an unsafe native interface in Java](http://openjdk.java.net/jeps/191), which has not become mainstream yet. However, OpenJDK already exposes some low-level facilities through the `sun.misc.Unsafe` class. RJava obviously used magic to gain low-level support. In Ruby, [ruby-ffi](https://github.com/ffi/ffi) is recommended over directly writing C modules.
With C becoming redundant, the high-level language (or runtime) may be optimised for internal interoperation (e.g. Java-to-Java calls will get faster and faster) at the expense of interoperability with C (e.g. even unsafe native calls may be costly; object pinning or opaque handles are required for native programs, which would only run briefly).
Since raw memory access will get faster while foreign function calls get slower, serialisation may have an advantage over calling (my experiment already shows this for Java and C). But this does not rule out the calling-based API, because the calls may not be foreign calls, in which case calling is still faster than serialisation. Ideally, in the meta-circular setting, if Mu, libmu and the client all run on Mu, then Mu-to-Mu function calls are virtually free.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/51 WebKit's B3 JIT compiler (2016-02-19, John Zhang)
*Created by: wks*
The B3 JIT compiler has received much attention recently.
I started a Wiki page: https://github.com/microvm/microvm-meta/wiki/B3-JIT-%28WebKit%29
Let's summarise B3 and its influences to Mu in the Wiki.