general-issue-tracker issueshttps://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues2018-06-28T23:11:32+10:00https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/21Proposed Lua-like µVM-Client Interface2018-06-28T23:11:32+10:00John ZhangProposed Lua-like µVM-Client Interface*Created by: wks*
In Lua, the C program exchanges values with a "Lua state" using a stack.
1. All Lua values are kept on the stack and can converted to and from C values on demand.
2. All Lua references must be kept on the stack. Al...*Created by: wks*
In Lua, the C program exchanges values with a "Lua state" using a stack.
1. All Lua values are kept on the stack and can converted to and from C values on demand.
2. All Lua references must be kept on the stack. All operations involving tables require the table operand to be on the stack.
Why?
1. The type systems of Lua and C are different. This stack segregates all Lua types from C types.
2. Lua uses garbage collection. (The official Lua uses mark-sweep.) This stack simplifies GC by preventing the C program from keeping a reference to the Lua world.
Reference: The stack in the Lua C API http://www.lua.org/pil/24.2.html
# Overview
The Client interacts with the µVM via a µVM Client Agent. The Agent keeps a stack of µVM values, a thread-local allocator and so on. It is the counterpart of a µVM thread, albeit working for the Client. Each Agent is only accessible from one Client thread (not thread safe), but a µVM Client may have arbitrary number of Agents.
There are several principles:
* The stack holds any µVM value that can be held in a µVM SSA variable. A cell in the stack is like a µVM SSA variable. Unlike the µVM memory, it does not have memory location and cannot be referred to.
* The Client provide implementation-specific way to add Client values into the Agent stack and extracting µVM values to Client values.
* All µVM API messages that take µVM values as parameters shall use existing µVM values on the stack. All µVM API messages that return µVM values shall push new values on the stack.
Example (Java as Client):
```java
MicroVM mvm = …;
ClientAgent ca = mvm.newClientAgent();
// push some values
ca.pushInt("@i32", 0x12345678);
ca.pushLong("@i64", 0x123456789abcdef0L);
ca.pushFloat(3.14F);
ca.pushDouble(6.28);
// The Java integer type and the µVM type does not need to match.
// The following pushes convert Java integer type to µVM types of different lengths
ca.pushInt("@i64", -0x22334455);
ca.pushLong("@i32", 0x123456789abcdef0L); // truncated to 0x9abcdef0
ca.pushInt("@i1", 1); // integer of 1 bit (boolean type)
ca.pushBool("@i1", true); // same as above.
// Retrieving values from the stack
int v1 = ca.toInt(-4); Get the 4th top element from the stack and convert to Java int. So it is -0x22334455.
long v2 = ca.toLong(-4); Same as above, but convert to Java long (zero extended). So it is 0xddccbbabL.
// Popping
ca.pop(7); Pop 7 elements from the stack.
// Memory access
ca.pushGlobal("@global_var"); // Get the internal reference of a global cell and push on the stack
ca.load(MEMORD_ACQUIRE); // Load. Assume the top element is an internal reference.
ca.pushGlobal("@global_var");
ca.pushInt("@i64", 42);
ca.store(MEMORD_RELEASE); // Store. The top element is the new value and the second is the internal reference.
// Calling a µVM function
// This is fairly complicated because this involves creating both a µVM stack and a µVM thread.
ca.pushFunc("@some_func"); // Push a function reference
ca.pushInt("@i32", 42); // Push argument1
ca.pushDouble(3.14); // Push argument2
try {
ca.newStack(2); // Create a new stack, using a function with 2 arguments on the stack.
// So the top 2 elements are arguments and the third element is the function itself.
// Those elements are popped and a new stack value is pushed.
} catch (MicroVMStackOverflowException e) {
...
}
ca.newThread(); // Create a new thread. Assume the top element on the stack is a µVM stack value.
```
# API functions
The principles are
+ Pushing operations add new values to the top.
+ The popping operation removes values from the top.
+ Queries (the operations that convert values back to Java values) are non-destructive. They keep the values.
+ Operations on the top of the stack are destructive: they pop the operands, like the JVM, then push new values on the top.
## Getting the µVM Client Agent
* message: `new_client_agent`
* parameters: none
* returns: a handle of new client agent
Create a new Client Agent.
Example Java signature: `public ClientAgent MicroVM#newClientAgent()`
* message: `close_client_agent`
* parameters:
1. `ca`: the handle to the client agent
* returns: none
Close the Client Agent.
Example Java signature: `public void ClientAgent#close()`
## Pushing new values to the stack
* message: `push_value`
* parameter:
1. `uvm_type`: the µVM type of the value
2. `val`: the value in the Client's representation
* returns: none
* stack top:
+ before: ...
+ after: ..., `val_in_uvm_type`
Convert the Client value `val` to the µVM type `uvm_type` and push it to the stack. This message only work for non-reference values, including integers and floating point numbers.
The µVM may implement this as multiple functions/methods that best suits the Client programming language.
Example Java signatures:
* `public void ClientAgent#pushInt(int uvmTypeID, int val)`
* `public void ClientAgent#pushLong(int uvmTypeID, long val)`
* `public void ClientAgent#pushBigInteger(int uvmTypeID, BigInteger val)`
* `public void ClientAgent#pushFloat(int uvmTypeID, float val)`: will truncate/extend to the µVM type
* `public void ClientAgent#pushDouble(int uvmTypeID, double val)`: will truncate/extend to the µVM type
* `public void ClientAgent#pushFloatNoType(float val)`: always convert to the µVM `float` type
* `public void ClientAgent#pushDoubleNoType(double val)`: always convert to the µVM `float` type.
* message: `push_global`
* parameters:
1. `global`: the ID/name of a µVM global cell
* returns: none
* stack top:
+ before: ...
+ after: ..., `global_val`
Push an internal reference of a global cell to the stack.
Example Java signature: `public void ClientAgent#pushGlobal(int uvmGlobalID)`
* message: `push_func`
* parameters:
1. `func`: the ID/name of a µVM function
* returns: none
* stack top:
+ before: ...
+ after: ..., `func_val`
Push a function reference (µVM's `func` type) of a µVM function to the stack
Example Java signature: `public void ClientAgent#pushFunc(int uvmFuncID)`
## Converting to Client types
* message: `to_client_value`
* parameters:
1. `pos`: the position in the stack
* returns: the µVM value in the client type
* stack top: not changed
Convert a value in the stack to the client type. This applies for non-reference types including integers and floating point numbers.
The µVM may implement this as multiple functions/methods that best suits the Client language.
Example Java signatures:
* `public int ClientAgent#toInt(int pos)`
* `public long ClientAgent#toLong(int pos)`
* `public BigInteger ClientAgent#toBigInteger(int pos)`
* `public float ClientAgent#toFloat(int pos)`
* `public double ClientAgent#toDouble(int pos)`
## Popping
* message: `pop`
* parameters:
1. `num`: the number of values to pop
* returns: none
* stack top:
+ before: ..., `elem_1`, `elem_2`, ..., `elem_num`
+ after: ...
Pop `num` elements from the stack.
Example Java signature: `public void ClientAgent#pop(int num)`
## Memory Allocation
* message: `new`
* parameters:
1. `type`: the ID/name of the µVM type of the object
* returns: none
* stack top:
+ before: ...
+ after: ..., `ref`
Allocate an object of type `type` on the µVM heap and push the object reference on the stack.
Example Java signature: `public void ClientAgent#newObj(int typeID)`
* message: `new_hybrid`
* parameters:
1. `type`: the ID/name of the µVM type of the object
* returns: none
* stack top:
+ before: ..., `len`
+ after: ..., `ref`
Allocate an object of type `type`, which must be a `hybrid` type, on the µVM heap and push the object reference on the stack. The length of the variable part is `len`, which is any µVM integer types zero_extended to the machine word length.
Example Java signature: `public void ClientAgent#newHybridObj(int typeID)`
## Memory Access
* message: `load`
* parameters:
1. `memord`: the memory ordering
* returns: none
* stack top:
+ before: ..., `iref`
+ after: ..., `val`
Load from an internal reference `iref` on the stack and push the loaded value to the stack, using the `memord` memory ordering.
Example Java signature: `public void ClientAgent#load(MemoryOrder memOrd)`
* message: `store`
* parameters:
1. `memord`: the memory ordering
* returns: none
* stack top:
+ before: ..., `iref`, `new_val`
+ after: ...
Store `new_val` on the stack into an internal reference `iref` on the stack, using the `memord` memory ordering.
Example Java signature: `public void ClientAgent#load(MemoryOrder memOrd)`
## Stack and Thread operations
* message: `new_stack`
* parameters:
1. `nparams`: the number of parameters to the stack-bottom function
* returns: none
* stack top:
+ before: ..., `func`, `arg_1`, `arg_2`, ..., `arg_nparams`
+ after: ..., `stack`
Create a new stack using `func` as the stack-bottom function and `arg_x` as its arguments. Push the newly created `stack` value to the stack.
Example Java signature: `public void ClientAgent#newStack(int nParams)`
* message: `new_thread`
* parameters: none
* returns: none
* stack top:
+ before: ..., `stack`
+ after: ..., `thread`
Create a new thread which is initially bound to a stack `stack`. Push the thread value on the Client Agent stack. The new thread `thread` starts execution immediately.
Example Java signature: `public void ClientAgent#new_thread()`
## Other API functions
TODO: define them later
* `is_int`, `is_float`, `is_ref`, `is_iref`, ... `is_stack`, `is_thread`, `is_tagref64`
* `tagref64_is_int`, ... `tagref64_get_ref`, ..., `tagref64_set_fp`... : manipulate the `tagref64` type.
* `copy_value`, `remove_value`: manipulate the Client Agent stack.
* `extract_field`, `insert_field`: manipulate `struct` types
* `extract_element`, `insert_element`: manipulate `vector` types
* `get_iref`, `get_field_iref`, `get_elem_iref`, `shift_iref`, `get_fixed_part_iref`, `get_var_part_iref`: manipulating reference types.
* `get_current_stack`, `kill_stack`, `bind_thread_to_stack`: advanced thread/stack operations
* `get_active_func_version_id`, `get_current_instruction_id`, `dump_keepalive_variables`, `pop_frame`, `push_frame`: for OSR
# Known Issues
This API assumes a stack which can contain ANY µVM types that were applicable for SSA variables. This makes it a dynamically types. This is good for Lua because Lua is dynamic and has a small set of types (only nil, boolean, number, string, table, function, userdata, ...), all of which have similar sizes.
The main problem with the µVM is when there are µVM `struct` type values (especially large structs which, themselves, are bad to be represented as value rather than reference to heap object). Some corner cases include the "complex number" type which can be represented as a struct of two doubles. In any cases, extra type information must be kept for the stack to know the type of all of its elements.
I have to trust the µVM implementation to handle the dynamic typing efficiently.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/15Summary of changes in the Type System and the Instruction Set2018-06-28T23:11:32+10:00John ZhangSummary of changes in the Type System and the Instruction Set*Created by: wks*
# Top-level IR
* Support new OSR mechanism #5
- Function version clause #5
# Type system
* Add vector type. #9
* (REMOVED) Add lock type #11
# Instruction set
* Add "exception clause" #6 #7 #10
...*Created by: wks*
# Top-level IR
* Support new OSR mechanism #5
- Function version clause #5
# Type system
* Add vector type. #9
* (REMOVED) Add lock type #11
# Instruction set
* Add "exception clause" #6 #7 #10
- Also merge instructions: `CALL`/`INVOKE`, `ICALL`/`IINVOKE`
* Basic operations allow vector types and new instructions for vectors #9
- BinOp, Cmp, Conv, `SELECT`, `PHI`, `CALL`, `RET`, `LOAD` (gather), `STORE` (scatter)
- `EXTRACTELEMENT`, `INSERTELEMENT`, `SHUFFLEVECTOR`
* Match the new stack states (`READY<T>` and friends) #7
- Revised `TRAP`/`WATCHPOINT` instruction #7
- Revised `SWAPSTACK` instruction #7
* Revised `ICALL` instruction #9 spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/14Document memory layout requirements2018-06-28T23:11:32+10:00John ZhangDocument memory layout requirements*Created by: wks*
Because of the difference between architectures, it is better to leave the object layout to be implementation-specific. An implementation of a µVM can choose its optimal strategy to make memory accesses as fast as poss...*Created by: wks*
Because of the difference between architectures, it is better to leave the object layout to be implementation-specific. An implementation of a µVM can choose its optimal strategy to make memory accesses as fast as possible.
However, although the memory layout, is part of the implementation, some guarantees must be made to enable:
1. implementation of object-oriented languages: the superclass-subclass relationship can be most conveniently implemented as the superclass being a prefix of a subclass structure.
2. interoperation between µVM programs and C programs: during a foreign function call, data structures are shared so that data can be passed both ways.
3. some basic data structures must support atomic accesses. One of such type is references.
# Prefix rules
Some type **is a prefix of** another type. If T1 is a prefix of T2, then there are **shared components** between T1 and T2. A component can be the whole value or some part of it, including fields in structs, elements in arrays and vectors and both the fixed part and the variable part in hybrids.
Specifically:
* Any type is a prefix of itself.
+ All corresponding components are shared.
* `void` is trivially a prefix of any type.
+ No component is shared.
* `T1 = T` is a prefix of `T2 = struct <SEQ>` for any T where SEQ is a sequence of types beginning with T.
+ The whole T1 itself is a shared component with the first field in the struct T2.
* `T1 = T` is a prefix of `T2 = hybrid<T U>` for any T.
+ The whole T1 is a shared component with the fixed part in the hybrid T2.
* `T1 = T` is a prefix of `T2 = array<T n>` for any T if n >= 1.
+ The whole T1 is a shared component with the first element in array T2.
* For all types T1, T2 and T3, if T1 is a prefix of T3 and T3 is a prefix of T2, then T1 is a prefix of T2.
+ The shared component between T1 and T2 are their mutual shared components with T3.
Examples:
* `float` is a prefix of `struct<float double>`.
+ The first field is shared.
* `struct<@TIB_REF @LOCK @LENGTH_TYPE>` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`.
+ The fixed part of the latter type is shared with the former type.
* `int<8>` is a prefix of `array <int<8> 100>`.
+ The first byte element of the latter is shared with the former.
* `@TIB_REF` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`
+ There is an intermediate type `struct<@TIB_REF @LOCK @LENGTH_TYPE>` that bridges the "is a prefix of" relation.
If:
* There is a memory location M which represents data of type T2, and,
* T1 is a type and T1 is a prefix of T2, and,
* r1 is an `iref<T1>` and refers to memory location M1, and,
* r2 is an `iref<T2>` and refers to memory location M, and,
* the beginning of M and M1 are the same (i.e. the have the same address), and,
* rc1 is r1 or an internal reference derived from r1, and,
* rc2 is r2 or an internal reference derived from r2, and,
* rc1 and rc2 refer to a shared component between T1 and T2,
then rc1 and rc2 refer to the same memory location. This means the shared components can be accessed as if it is a field of a prefix. This allows treating a subclass as an instance of a superclass.
Related standard: C11
* 6.3.2-3: (**array --> pointer to first element**) ... an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. ...
* 6.7.2.1-15:(**pointer to struct <--> pointer to first field**) ... A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. ...
TODO: C does not explicitly allow the prefixing between the following structs where their sequences of elements has a "prefix" relation:
```
struct Foo { short a; int b; long c; };
struct Bar { short a; int b; long c; float d; double e; };
```
There may be a reason behind it.
# Object layout and C foreign function interface (FFI)
The object layout *should* follow the application binary interface (ABI) as much as possible because:
* The ABI is carefully designed by system programmers for performance. The µVM implementer needs a good reason why not to follow it.
* When a foreign function call to external C programs is needed, the µVM data structure should already be in the desired layout expected by the C programs.
The µVM only needs to guarantee some compatibility between µVM types and C types in the FFI. It is already documented in the [Instruction Set](https://github.com/microvm/microvm-spec/wiki/instruction-set), but needs to be double-checked.
# Basic data structures that needs atomic accesses
To guarantee memory safety, the µVM must not allow out-of-thin-air reference values or opaque values. Affected types are:
* ref
* iref
* weakref (loaded into SSA variables as ref)
* func
* thread
* stack
* tagref64 (may contain ref)
* futex (one word integer, loaded into SSA variables as plain `int<WORD_SIZE>`)
Storing internal references (`iref`) in the memory (heap or stack or global) is discouraged because of space inefficiency (no better way than encoding them as fat pointers). But if an implementation does allow putting `iref` in the memory, the accesses to them should be atomic. Other types than `iref` are too important not to be implemented as lock-free atomic types (all atomic accesses in µVM are lock-free).
An alternative to requiring `iref` to be atomic is to:
- document that accesses to `iref` in the memory is not atomic, and,
- require the client to compile all memory accesses to `iref` with locks, and,
- have significant performance penalty.
Since a fat pointer consists of two words: an object reference plus an in-object offset, some (I assume there are only very few of them) architectures may not provide atomic access to such a length.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/13Change the term "SSA Value" to "SSA Variable"2018-06-28T23:11:32+10:00John ZhangChange the term "SSA Value" to "SSA Variable"*Created by: wks*
The term "SSA Value" currently used in the µVM is confusing. People tend to think "value" means data value, that is, instances of types and are representable in binary. But what it actually means in the µVM is the **th...*Created by: wks*
The term "SSA Value" currently used in the µVM is confusing. People tend to think "value" means data value, that is, instances of types and are representable in binary. But what it actually means in the µVM is the **things** that can **generate** data value.
# The current conceptual model
The current hierarchy is:
* **SSA Value** (the union of all such things)
- **Global SSA Value** (defined globally, actually constants, written as `@a`, `@b`, `@c`, ...)
+ **Declared constants** (defined literally as constants)
+ **Global memory references** (internal references to the global memory, which never moves. Hence the reference is constant)
+ **Functions** (There is a constant "handle" for each function, which does not change even when a function is redefined. So it is effectively a constant.)
- **Local SSA Values** (defined locally, determined during program execution, written as `%a`, `%b`, `%c`, ...)
+ **Parameters** (passed to functions, determined at each call site)
+ **Instructions** (computations that generate data values, or void, the "unit" value. The resulting data value is determined every time the instruction is executed)
The other confusion is that currently "an instruction **is an** SSA Value". When an instruction uses another instruction as its parameter, like:
```
.const @a <int<64>> = 0x123456789abcdef0
%x = ADD <int<64>> @a @a
%y = SUB <int<64>> %x @a
```
In the above example, `@a`, `%x` and `%y` are all names of SSA Values. In the definition of `%y`, it means "`%y` **is** a SUB instruction and it takes the **computing result of** instruction `%x` and **the data value of** declared constant `@a` as arguments".
When the Client runs a trap handler, the names identify the TRAP instructions.
```
%trap0 = TRAP KEEPALIVE (%a %b %c %d ...)
```
Then the client will see "the `%trap0` instruction caused the trap".
The above model, though internally consistent, confused many people.
# The use in LLVM
* Value
- Argument
- BasicBlock
- InlineASM
- MDNode
- MDString
- User
+ Constant
* BlockAddress
* ConstantInt
* ConstantFP
* ConstantPointerNull
* ConstantStruct
* ConstantVector
* ConstantExpr
- BinaryConstantExpr
- CompareConstantExpr
- ...
* GlobalValue
- GlobalAlias
- GlobalObject
+ Function
+ GlobalVariable
+ Instruction
- BinaryOperator
- CmpInst
- PHINode
- CallInst
- UnaryOperation
+ LoadInst
+ AllocaInst
+ ...
- StoreInst
- TerminatorInst
+ BranchInst
+ ReturnInst
+ InvokeInst
+ ...
- ...
+ Operator
- ...
LLVM defines Value as (http://llvm.org/doxygen/classllvm_1_1Value.html):
> LLVM Value Representation.
>
> **This is a very important LLVM class. It is the base class of all values computed by a program that may be used as operands to other values.** Value is the super class of other important classes such as Instruction and Function. All Values have a Type. Type is not a subclass of Value. Some values can have a name and they belong to some Module. Setting the name on the Value automatically updates the module's symbol table.
>
> Every value has a "use list" that keeps track of which other Values are using this Value. A Value can also have an arbitrary number of ValueHandle objects that watch it and listen to RAUW and Destroy events. See llvm/IR/ValueHandle.h for details.
The µVM's hierarchy is a subtree of the LLVM's Value hierarchy. Similar with LLVM, instructions that does not return values (like ReturnInst, BranchInst, ...) are also a kind of "Value".
# Alternative notation
An SSA Variable is a thing that holds a data value. A data value is an instance of a type.
An SSA Variable has exactly one definition and the definition corresponds to exactly one SSA Variable. That definition can be a declaration of constant, global memory, function, parameter or instruction.
The revised hierarchy is:
* **SSA Variable**
- **Globally defined SSA Variable** (written as `@a`, `@b`, `@c`, ...)
+ defined by **Declared constants**
+ defined by **Global memory references**
+ defined by **Functions**
- **Locally defined SSA Variable** (written as `%a`, `%b`, `%c`, ...)
+ defined by **Parameters**
+ defined by **Instructions**
This does not change the hierarchy. In an implementation, it is practical to let class `Instruction` extend class `SSAValue`. In the documentation, we say, if `%a = ADD <int<64>> %b %c`, then:
* `%a` is an SSA Variable
* `%a` is defined by the `ADD <int<64>> %b %c`
* That instruction is bound to the SSA Variable `%a`
In this example:
```
.const @a <int<64>> = 0x123456789abcdef0
%x = ADD <int<64>> @a @a
%y = SUB <int<64>> %x @a
```
We say, `@a`, `%x` and `%y` are all SSA Variables. `%y` is **defined by** a SUB instruction. It takes the **value held by** the SSA Variables `%x` and `@a` as arguments".
In a trap handler in the Client for the following TRAP:
```
%trap0 = TRAP KEEPALIVE (%a %b %c %d ...)
```
The client will see "the TRAP instruction that **defines** `%trap0` caused the trap".
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/11Mechanisms to support thread synchronisation primitives (including blocking l...2016-09-12T17:09:11+10:00John ZhangMechanisms to support thread synchronisation primitives (including blocking locks)*Created by: wks*
In a multi-threaded environment, threads may need to wait for the availability of mutually exclusive resources or asynchronous events. Blocking locks are provided by many programming languages and operating systems as ...*Created by: wks*
In a multi-threaded environment, threads may need to wait for the availability of mutually exclusive resources or asynchronous events. Blocking locks are provided by many programming languages and operating systems as primitives to facilitate threaded programming.
Characteristics of the µVM prevents directly adopting pThread Mutexes.
1. [Dolan et al](http://dl.acm.org/citation.cfm?id=2400695) implemented efficient blocking locks in an environment with green threads implemented with SWAP-STACK. When there is only one native thread, atomic and ordered memory access is not necessary. Even there are multiple native threads, locks can still be implemented with merely an integer, atomic memory operations and a lock-free waiting queue.
2. There may not always be a one-to-one mapping between language-level threads and native threads. (SWAP-STACK-based green threads is one example) The client may also implement m-to-n relationship where logical threads can be scheduled on any native thread. Re-entrant locks are based on the logical notation of "thread", which may not be native threads. However, pThread recursive mutexes depends on native threads.
3. Spin locks can be implemented using only integers and atomic memory accesses.
4. System calls are expensive.
# The proposal
The µVM only needs to provide two primitives: SUSPEND and RESUME
* SUSPEND: suspend the current thread. The thread blocks, but the state of the underlying stack is still ACTIVE.
* RESUME: resume a given thread if it is waiting.
Note: more commonly known as PARK, UNPARK
On Linux, futex provides these two primitives, except the Linux kernel maintains an internal waiting queue. Their WAIT and WAKE are address-based (the address of the lock) rather than thread-based (Which thread should suspend/resume?). The above two primitives can be implemented on top of futex.
We still need to find equivalent counterparts on OSX, Windows and other operating systems. Using pThread mutex for waiting is also an option, albeit expensive. (On Linux, glibc implements pThread mutexes using a mixture of user-level fast paths and the futex)
# proof of concept
It still needs to be shown (by code) whether these two primitives are sufficient.
# yield points
How to suspend another thread? Who inserts yield points where threads can be suspend, the client or the µVM?
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/10More instructions with abnormal control flow2018-06-28T23:11:32+10:00John ZhangMore instructions with abnormal control flow*Created by: wks*
There are a few instructions that may not always be executed successfully.
* TRAP, WATCHPOINT, SWAPSTACK, CALL/INVOKE, ICALL/IINVOKE: Some instructions expect exceptions. We already have multiple versions of instruc...*Created by: wks*
There are a few instructions that may not always be executed successfully.
* TRAP, WATCHPOINT, SWAPSTACK, CALL/INVOKE, ICALL/IINVOKE: Some instructions expect exceptions. We already have multiple versions of instructions for those with or without exceptions.
* BinOp: division-by-zero and floating-point exceptions are generated by the hardware, but many programming languages have defined behaviour on those cases (e.g. Java throws language-level exceptions). Some (e.g. FP exceptions) cannot be prevented easily by checking before doing operation
* NEW/NEWHYBRID/ALLOCA/ALLOCAHYBRID: out-of-memory error
* Function calls and the NEWSTACK instruction may call declared but undefined functions.
* Function calls may result in out-of-memory errors.
* LOAD/STORE/CMPXCHG/ATOMICRMW: they may access invalid memory locations, including the `NULL` reference and invalid addresses introduced by array element addressing. The hardware cannot cover all illegal cases, but it may be possible to let the hardware handle common errors (like `NULL` reference dereferencing). See Issue #6
* All intrinsic functions that may have exceptions (covered by the IINVOKE instruction).
# The proposal
Change the grammar so that all such instructions take an optional "exception clause". This generalises the handling of many instructions. Existing instruction pairs that distinguish the awareness of exceptions (CALL/INVOKE, ICALL/IINVOKE) can be merged. The identification of "basic block terminator" instructions now depends on the presence of such clause, not just by the opcode.
Example:
Do not handle exception locally:
```
%entry:
%rv = CALL <@sig> @func (%arg) // continue with the next instruction, or unwind the current frame
%b = ADD <@i64> %rv 1
...
```
Handle exception locally:
```
%entry:
%rv = CALL <@sig> @func (%arg) EXC(%nor %exc) // terminates the current basic block.
%nor:
... // normal destination
%exc:
%e = LANDINGPAD // exceptional destination
...
```
Even simple divisions may go wrong:
```
%entry:
%result = SDIV <@i64> %a %b EXC(%nor %exc)
%nor:
... // continue normally
%exc:
THROW @DivisionByZeroException
```
NOTE: The µVM will not create new exceptions for the client. If the exceptional case is not because of an exception being thrown from a function or another stack (by swap-stack), then it is an error. Errors are undefined behaviours. The following code is fatal when `%b` is zero:
```
%entry:
%result = SDIV <@i64> %a %b // undefined behaviour if %b is zero.
```
# About factored CFG
Factored CFG is used by the JikesRVM. It allows a basic block to have multiple exceptional side-exits, breaking the single-entry single-exit property. Although that will reduce the number of exceptional branches, the traditional CFG is better-understood and is still capable to express all control flows with additional verbosity.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/9Vector operations2016-06-17T15:22:44+10:00John ZhangVector operations*Created by: wks*
Some modern processors provide SIMD instructions. Using them properly can greatly increase the performance of some computations. The µVM should expose them to the user.
LLVM's approach:
* Vector types are first-cla...*Created by: wks*
Some modern processors provide SIMD instructions. Using them properly can greatly increase the performance of some computations. The µVM should expose them to the user.
LLVM's approach:
* Vector types are first-class types. Most instructions which accept scalar values also accept vector values.
* Binary operations are done element-wise.
* Comparison operations are done element-wise, resulting in a vector of one-bit integers.
* Conversions can be done between integer vectors, integer vector to FP vector, but not between different FP types (no fptrunc and fpext for vectors)
* The select instruction works with vector condition with vector values, and also scalar condition with vector values.
* extractelement and insertelement to build and extract element from vectors
* shufflevector: repermutate elements of two vectors
* All other instructions that work with first-class types, including phi, ret, function calls, the contents to load/store.
* LLVM does not address scatter/gather.
Things to be done in the µVM
* Alignment requirements.
* Provide extra intrinsic functions to support machine-provided operations not covered by the `BinOp` µVM instruction. Example: reciprocal, exp, log, abs, ...spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/8Next milestone2018-06-28T23:11:32+10:00John ZhangNext milestone*Created by: wks*
As we already found many problems in the current µVM design, many changes can be done to improve it. Here is a list of issues to be addressed in the next µVM specification.
- [X] #6 #10 : More instructions can resu...*Created by: wks*
As we already found many problems in the current µVM design, many changes can be done to improve it. Here is a list of issues to be addressed in the next µVM specification.
- [X] #6 #10 : More instructions can result in abnormal control flows. This should be reflected in the instruction set.
- Stick to the standard control flow graph, not factored control flow graph.
- [X] #4 : Use C++11 memory model.
- [X] #7 : Update the swap-stack API.
- [X] #5 : Update the OSR API.
- [X] #9 : Support vector instructions.
- [X] Provide mechanisms (futex) to support blocking locks and other thread synchronisation primitives.
In case there are someone who wants to play with the reference implementation, the current implementation will be branched and the current specification will be forked into another repository.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/7Choose between swap-stack with and without parameters.2016-06-17T15:22:41+10:00John ZhangChoose between swap-stack with and without parameters.*Created by: wks*
Coroutines may communicate with each other by yielding and, at the same time, passing data between stacks. There are two ways this can be done.
1. Allocate memory region in one stack using ALLOCA and let the other s...*Created by: wks*
Coroutines may communicate with each other by yielding and, at the same time, passing data between stacks. There are two ways this can be done.
1. Allocate memory region in one stack using ALLOCA and let the other stack write data in those region.
2. Let the swap-stack operation take parameters.
The first approach is currently appreciated by @wks and @hosking, but (Dolan et al.)[http://dl.acm.org/citation.cfm?id=2400695] proposed the second approach, but does not discuss the difference between the two.
We need to choose the appropriate one.spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/6Document safe/unsafe memory access instructions2016-06-17T15:22:39+10:00John ZhangDocument safe/unsafe memory access instructions*Created by: mn200*
As per discussion on 22 August, we want to have two forms of every memory accessing instruction. One is "unsafe" and the machine is not required to present a corresponding abstract state to the client. One is "safe...*Created by: mn200*
As per discussion on 22 August, we want to have two forms of every memory accessing instruction. One is "unsafe" and the machine is not required to present a corresponding abstract state to the client. One is "safe", and if a memory exception occurs, the machine commits to giving the client an appropriate picture of the abstract state at that point.
All this discussion needs documenting.spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/5A heavy-weighted "Frame State Construction" mechanism2016-06-17T15:22:37+10:00John ZhangA heavy-weighted "Frame State Construction" mechanism*Created by: wks*
During on-stack replacement (OSR), it is often desired to save the state of a partially executed function, compile a new optimised version of the function and restore the state to the state before. How to map the old s...*Created by: wks*
During on-stack replacement (OSR), it is often desired to save the state of a partially executed function, compile a new optimised version of the function and restore the state to the state before. How to map the old state to the new state is the job of the Client, but the µVM can provide a new mechanism to make this easier.
Frame State Construction creates a new stack frame and populate it to the state of a partially executed function. The client specifies the values of all SSA Values (at least all live values) in the function and the next µVM instruction to execute. Then the stack can be resumed.
This is an epic powerful mechanism that allows the program to continue at any point of code, but may be difficult to implement.
p.s. I do not want to use the word "restore" because the µVM does not care about the "restore" semantic.
Example (in the C language):
```c
void inc_all(int ar[], long sz) {
int i;
for(i=0;i<sz;i++) {
cont:
ar[i] += 1;
}
}
```
But instead of starting from the beginning, I want to continue from the label "cont" with i = 100. The µVM should let me express something like:
```c
construct_frame(func=inc_all,
next_inst="cond",
local_vars={
ar: SOME_OLD_ARRAY,
sz: SOME_OLD_VALUE,
i: 100
})
```
Similarly in µVM IR, the code should be like:
```uir
.typedef @i32 = int<32>
.typedef @i64 = int<64>
.funcdef @inc_all <void (iref<@i32> @i64)> (%ar %sz) {
%entry:
BRANCH %head
%head:
%i = PHI @i64 { %entry: 0; %body: %i2; }
%cond = SLT @i64 %i %sz
BRANCH2 %cond %body %exit
%body:
%addr = SHIFTIREF <@i32> %ar %i
%old_val = LOAD <@i32> %addr
%new_val = ADD <@i32> %old_val 1
STORE <@i32> %addr %new_val
%i2 = ADD <@i64> %i 1
BRANCH %head
%exit:
RETVOID
}
```
I should be able to let it continue with:
```c
stack.create_new_frame(func = "@inc_all",
next_inst = "%addr", // %addr is actually the instruction's name,
// i.e. the instruction that calculates %addr.
local_vals = {
"%ar": SOME_VALUE,
"%sz": SOME_VALUE2,
"%i": 100,
"%cond": 1, // true
"%addr": WHATEVER, // This will be calculated immediately
"%old_val": WHATEVER, // This will be calculated immediately
"%new_val": WHATEVER, // This will be calculated immediately
"%i2": WHATEVER, // This will be calculated immediately
})
```
**Potential challenges**
1. This needs close collaboration with the code generator, especially the register allocator. This may require stack map (mapping in which machine register or memory location each local SSA Value is stored) at **every instruction** that can potentially be continued from.
* solution1: Add some dedicated "continue point" instruction where the code generator generates stack map. The "continue point" itself is a no-op.
* solution2: Upon request, re-compute the stack map for the desired instruction to continue. This must match the actual function code.
2. Cannot continue before a PHI node or a LANDINGPAD. PHI depends on incoming control flow and LANDINGPAD depends on the exception.
* solution: continue **after** those instructions, instead.
**possibilities**
1. Theoretically all possible states of stack can be constructed, not just "continuing from an instruction", but also a frame that "is calling some function but has not returned", or a frame that "is trapped to the client", or a dead stack.
2. Can we preserve the state of a full stack, quit the program, re-run and re-construct the whole stack again? (persistent program state)
# Alternative solutions for OSR state preserving
## Save states in global variables.
This (problematic) approach is taken by [Lameed et al.](http://dl.acm.org/citation.cfm?id=2451541). The saved states are loaded in the beginning of a newly-compiled function.
Problems:
1. used global variables. bad concurrency.
2. needs to generate code for loading those global variables. Lameed et al. compiles the function twice where those loads are removed in the the second compiling.
## Create a partially-evaluated function
This is a functional approach, similar to the concept of "continuation" in SCHEME. The new function takes no parameter (or arbitrary parameters) and behaves like the "bottom half" of the old function.
Advantage:
1. This "continuation" is just an ordinary µVM function and does not require special mechanisms other than OSR and function definition (not even **re** -definition)
2. At least as fast as Lameed et al.'s approach. Both compiles two versions of the new function: one for continuing and the other for newer fresh calls.
Problems:
1. Requires compiling a one-shot function just for one continuation. This epic "Frame State Construction" may look heavy, but is still lighter than compiling a new function.
2. For imperative programming languages, "continuation" may be difficult to create and may require complex control-flow analysis.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/3The client should see opaque references rather than transparent addresses.2016-06-17T15:22:32+10:00John ZhangThe client should see opaque references rather than transparent addresses.*Created by: wks*
(Discussed in 5 Aug 2014 meeting) The client may hold references to the µVM heap, but the client should see opaque references rather than raw addresses. More specifically,
1. The client may hold actual raw addresses...*Created by: wks*
(Discussed in 5 Aug 2014 meeting) The client may hold references to the µVM heap, but the client should see opaque references rather than raw addresses. More specifically,
1. The client may hold actual raw addresses, but it should consider it opaque. It should access the µVM memory using the µVM-provided API and should not depend on the fact that they are addresses.
2. The client holds indices into a table maintained by the µVM (or keys to a hashtable by µVM) and has to access the µVM memory through the API.
And the µVM should keep track on all references held externally.
In either cases, this behaviour should be documented and implemented accordingly.
spec-2