general-issue-tracker issueshttps://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues2017-04-05T15:34:00+10:00https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/27Finaliser2017-04-05T15:34:00+10:00John ZhangFinaliser*Created by: wks*
NOTE: This proposal may go to the interface of a specific µVM implementation rather than the µVM specification. The µVM can cheat by presenting an "infinite heap" and telling the program no objects are ever reclaimed.
...*Created by: wks*
NOTE: This proposal may go to the interface of a specific µVM implementation rather than the µVM specification. The µVM can cheat by presenting an "infinite heap" and telling the program no objects are ever reclaimed.
# Proposed mechanism
Add an **objects-to-finalise queue**. This queue is maintained internally by the µVM. Each element is a strong reference to an object. It may not be FIFO.
Add a new instruction `@uvm.gc.prevent_death_once (%r: ref<T>) -> void`. Then the next time when the GC decided that the object of `%r` is a garbage, it will put it in the queue. This instruction is *atomic*.
Add a new instruction `@uvm.gc.next_object_to_finalise () -> ref<void>`. This instruction removes one reference from the *objects-to-finalise queue* and return it. The current µVM thread blocks if this queue is empty. This instruction is *atomic*.
## Typical usage
Take Java for example.
+ On allocating any object that has an overridden `finalize` method, the `@uvm.gc.prevent_death_once` instruction is executed on the newly allocated object.
+ There is a background thread (ordinary µVM thread, running Client-supplied µVM IR code) running in a loop, executing the `@uvm.gc.next_object_to_finalise` instruction.
+ When the GC is about to collect the finalisable object, it instead puts the object in the queue. The background thread gets that object and do whatever it wants according to the high-level language semantics.
# Backgrounds
**Java**: Objects with `finalize` methods are finalised. `finalize` is **executed automatically only once** per object. Finalisers are executed in **unspecified threads**, at **unspecified times**, in **unspecified order**, may be executed **concurrently**. This sentence: "Every pre-finalization write to a field of an object must be visible to the finalization of that object." implies some kind of fence.
**Python, Ruby, PHP, ...**: Using naive reference counting, finalisers are deterministically executed when the reference count drops to 0.
# Open questions
+ The blocking mechanism (the instruction `@uvm.gc.next_object_to_finalise` can block) looks redundant given that the µVM already has the Futex interface which blocks a thread. There is no obvious way to wake up a thread waiting on this "queue".https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/25Extended memory model2016-06-17T15:23:23+10:00John ZhangExtended memory model*Created by: wks*
The current memory model is designed according to the C11/C++11 standard. It involves atomic memory access, locks and threads.
However, there are issues which C11 and C++11 do not address. They are:
- [x] Futex: ...*Created by: wks*
The current memory model is designed according to the C11/C++11 standard. It involves atomic memory access, locks and threads.
However, there are issues which C11 and C++11 do not address. They are:
- [x] Futex: Futex `wait` implies an atomic load-compare-sleep operation. (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-futex))
- [x] Trap/watchpoint/osr: They involves stack binding/unbinding. Just make more things explicit. (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-stack-operations))
- [x] Swap-stack: C11 does not address swap-stack, but swap-stack operations in different threads may conflict and visibility issues involved in swap-stack operations should be clarified. (*making racing swap-stack operations undefined behaviours is okay*) (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-stack-operations))
- [X] Foreign language interface: When parallel µVM programs runs together with parallel native programs, we need a way to synchronise the two worlds. This also involves object pinning. See https://github.com/microvm/microvm-meta/issues/37 (It ends up that Mu cannot make much guarantees known that the worst case of native memory access is segmentation fault. This thing must be very implementation-specific.)
Current ideas:
**Futex**: (open question) Should `futex_wake` happen before the next instruction after the `futex_wait` that actually wakes up? **Probably yes.**
```
int shared_state = 42;
atomic_int thread1_wake = 0;
futex thread1_futex;
thread1:
while(load(ACQUIRE, &thread1_wake)!=1) { // OP4
futex_wait(thread1_futex); // OP5
}
int s = shared_state // OP6
thread2:
shared_state = 99; // OP1
store(RELEASE, &thread1_wake, 1); // OP2
futex_wake(thread1_futex); // OP3
```
According to the semantic of RELEASE and ACQUIRE, if OP4 sees the store by OP2, then OP6 must see the store of OP1.
The problem is, if the OP3 `futex_wake` is not guaranteed to happen before the waking of OP5 `futex_wait`, then the next OP4 in the loop may not see the store by OP2 at all, and may go to sleep another time. Then it will be sleeping forever.
**stack binding/unbinding** cannot be atomic. Swap-stack is a combination of two and cannot be atomic, either.
We may require the language implementer to correctly use other synchronisations to make sure racy `swap-stack` operations do not happen.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/24Native Interface (super issue)2016-06-17T15:23:20+10:00John ZhangNative Interface (super issue)*Created by: wks*
This is an outline of issues related to the native interface, that is, interacting with the native world. This topic includes but is not limited to object layout, pointer types, object pinning and foreign function call...*Created by: wks*
This is an outline of issues related to the native interface, that is, interacting with the native world. This topic includes but is not limited to object layout, pointer types, object pinning and foreign function calls. We should open other issues to discuss concrete problems.
* Make a platform-specific Mu specification
+ Address some of the following issues, including object layout, calling convention, ...
+ Draft for AMD64: https://github.com/microvm/microvm-spec/wiki/native-interface-x64-unix
* Type system: (https://github.com/microvm/microvm-meta/issues/34)
+ Raw pointer types
+ Structure types with native/explicit object layout
+ Union types (unlikely to have in Mu)
+ Mapping Mu types to C types and native object layout: (in the [Native Interface](https://github.com/microvm/microvm-spec/wiki/native-interface) chapter in the spec)
* Memory space beyond heap/stack/global
+ Memory spaces with various constraints
- Is it movable, pinnable, has reference, can be referenced to, GC-traced, GC-collected, ...?
+ Object pinning: https://github.com/microvm/microvm-meta/issues/28
- If object pinning is allowed, what does "pin" mean?
* Foreign function interfaces
+ Calling foreign functions from Mu: (The CCALL instruction. See the [spec](https://github.com/microvm/microvm-spec/wiki/native-interface))
+ Calling C functions
+ System calls
+ Calling back to Mu from foreign functions: https://github.com/microvm/microvm-meta/issues/39
+ From C functions
+ Signal handling
The following should be addressed by a higher-level abstraction:
* Loading native libraries
+ The client loads libc and finds the address of `dlopen`, `dlsym`, `dlclose` and `dlerror`. Then the `CCALL` instruction takes care of the rest by calling them.
* Loading "heavier-weight" Mu bundles (currently called MuLF): https://github.com/microvm/microvm-meta/issues/30
The following are not related to the native interface, but are related to raw memory:
* How to expose the address of objects so that the user can analyse the memory behaviour? (This involves profiling, too. We may open a dedicated issue for profiling.)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/23Micro VM interface changes precipitated by having C as a client2016-06-17T15:23:18+10:00John ZhangMicro VM interface changes precipitated by having C as a client*Created by: eliotmoss*
While there will be a bunch of carefully worked out details that need to follow, I thought I would get the ball rolling with a quick summary / highlights of the Modula-3 approach. (See https://www.cs.purdue.edu/...*Created by: eliotmoss*
While there will be a bunch of carefully worked out details that need to follow, I thought I would get the ball rolling with a quick summary / highlights of the Modula-3 approach. (See https://www.cs.purdue.edu/homes/hosking/m3/reference/m3.html for a Modula-3 reference.)
First, M3 has ordinary *traced* references and also **untraced** ones. REF T is the type of a traced reference to a T, while UNTRACED REF T is the type of an untraced one. Traced vs. untraced is a concept distinct from safe vs. unsafe. Safe use of traced and untraced keep them segregated. Untraced objects come from an explicitly managed storage area. You can use NEW on an untraced reference type to allocate in such an area.
Certain regions of code (interfaces or modules -- probably a bundle is the corresponding thing for the Micro VM) can be marked UNSAFE. Unsafe code can do additional things. These are ones that I remember:
* Cast an expression to an arbitrary type (of the same size in bits) - this clearly allows interconverting between traced and untraced pointers, but also other things; this is called LOOPHOLE(e, T).
* Address arithmetic (Digression: In M3 REFANY is the type of a traced reference to any object, which in safe code requires a dynamically type-checked downcast to turn into a REF T and do more interesting things. The corresponding untraced type is ADDRESS, and it has arithmetic operators.)
* Up/down cast without checking
* Free an untraced reference's referent (the function is called DISPOSE)
Safe code must use traced/untraced only is specific ways, described in the language reference.
We may need to add to this sort of design to allow short unsafe code regions to manipulate ordinary (traced) objects safely in the presence of (say) concurrent GC. Also, an untraced version of iref would make sense for the Micro VM.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/20Client languages in play2016-06-17T15:23:09+10:00John ZhangClient languages in play*Created by: eliotmoss*
Maybe this is also being discussed elsewhere -- if so, we can close this and move the discussion there. Anyway, on the US side we are wondering about the specific suite of client languages being worked on. We h...*Created by: eliotmoss*
Maybe this is also being discussed elsewhere -- if so, we can close this and move the discussion there. Anyway, on the US side we are wondering about the specific suite of client languages being worked on. We have at least one student interested in doing a mapping from Java bytecode to μVM IR, and we also wonder about some of the dynamic languages (Javascript, maybe? I should perhaps come back and edit this after this week's meeting with the students). What about C or C--? For C we could look at lcc, a little C compiler originally designed for a compiler class, I think, but it accepts some standard kind of C, as I recall. gcc is more of a bear, but ought to be on the agenda at some point, no?https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/19Code generator prototyping strategies2016-06-17T15:23:07+10:00John ZhangCode generator prototyping strategies*Created by: eliotmoss*
As we start to ramp up in Amherst, we're pondering how best to get things rolling (i.e., what a good starting point is) and where we want to end up.
**Initial prototype thought**
For our initial prototype w...*Created by: eliotmoss*
As we start to ramp up in Amherst, we're pondering how best to get things rolling (i.e., what a good starting point is) and where we want to end up.
**Initial prototype thought**
For our initial prototype we are thinking about using QEMU (locally pronounced KEE-mew), the Quick Emulator. QEMU is a whole machine emulator (but can also be configured to run as user-mode only) that supports emulating one hardware ISA on another one (and same to same, of course). Beng whole-machine it deals with devices, interrupts, memory mapping, etc., but the part of greatest interest to us is that it takes guest ISA code blocks and dynamically translates them to host machine code, maintaining the translations in a code cache. Thus parts of its design will be great for dynamic and adaptive code generation. QEMU works by translating guest code into QEMU IR, which is then passed to a target code generator. That code generator does a useful level of local register allocation and local optimizations (hence the Quick in Quick Emulator).
For our case, μVM IR would be the guest ISA and our target would the host ISA that QEMU targets. A part of how QEMU does things is that the guest register file is represented as a memory data structure and (to first approximation) the host registers are used for local register allocation. To finer approximation I **believe** QEMU can bind some guest registers to host registers, useful where the host register set is larger. For μVM IR the "registers" are the SSA values and we can probably get some mileage from the SSA form in telling the QEMU register allocator helpful things about when things "die", etc.
Downsides of this prototype approach include:
* Dependence on QEMU
* Code quality may be limited, particularly in that it will tend to push all variables to memory as it crosses from block to block
* It is not clear what the situation is with QEMU and multithreading, though it **may** support concurrent execution
Upsides:
* Fast path to getting native code going
* Useful optimizations, especially for a dynamic environment
* Back-end architecture design that we might want to use a guide in our own work
* Already implements major platforms of interest
**Eventual system design thought**
Down the line we may want something like QEMU's code generation design, with more provision for (optional) higher levels of optimization and register allocation across larger scopes (that is, for whole functions, not just blocks/traces). But their internal IR, code generators, etc., might be a very useful base, as well as having a notion of the code cache design and so forth.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/17Portability vs exploiting architecture potentials.2016-06-17T15:23:01+10:00John ZhangPortability vs exploiting architecture potentials.*Created by: wks*
> For should the enemy strengthen his van, he will weaken his rear; should he strengthen his rear, he will weaken his van; should he strengthen his left, he will weaken his right; should he strengthen his right, he wil...*Created by: wks*
> For should the enemy strengthen his van, he will weaken his rear; should he strengthen his rear, he will weaken his van; should he strengthen his left, he will weaken his right; should he strengthen his right, he will weaken his left. If he sends reinforcements everywhere, he will everywhere be weak. -- Art of War, by Sun Zi
The µVM is designed to abstract over the hardware, but only acts as a thin layer of abstraction.
There are many differences between underlying hardware platforms. Scroll down to the Appendix or read https://github.com/microvm/microvm-meta/issues/16
# Solutions to differences
As mentioned in https://github.com/microvm/microvm-meta/issues/16, there are three solutions, from the weakest to the strongest.
1. Define the differences as **undefined behaviour**. The client must prevent touching those fields at all cost.
2. Define the differences as **implement-defined behaviour**. The µVM implementation define the behaviour and provide documents/compile-time/run-time checkable mechanisms.
3. **Define the behaviour** in the µVM. The µVM implementation bridges the differences.
The 1st approach is the weakest. It makes it absolutely impossible to make use of any platform-specific features and will introduce excessive checkings to avoid undefined behaviour. So the µVM should lie somewhere in between approach 2 and 3.
# The µVM design goal
The µVM is having several conflicting design goals.
1. µVM is low-level. It is a thin layer over the hardware.
2. µVM is minimal. Anything that can be done efficiently by the Client should be done by the Client.
3. µVM is **portable**.
4. µVM should support high-performance VMs
5. µVM potentially run on resource-constrained devices.
Depending on how to interpret **portable**, there are two different implications:
1. The µVM provide compile-time or run-time flags so that the Client can generate platform-dependent µVM IR code.
2. The µVM IR code is cross-platform. The same µVM IR code should run on all platforms.
The second interpretation is more portable than the first. Overall, the more portable the µVM is, the thicker the abstraction layer is.
Goal 4 can be interpreted in different ways:
1. Highest theoretical possible performance.
2. As high as C for computation-intensive problems.
3. As high as Java for computation-intensive problems.
4. Much higher than Python/PHP/R/<*insert your favourite scripting language here*>
5. As high as Python/PHP/R/...
6. As long as the program eventually terminates.
Depending on the concrete implementation, an average desktop/server µVM should reach between 2 and 3.
Goal 5 requires the µVM and the Client to be simple. In this case, the Client and the µVM cannot perform too much reasoning about the program. This results in sub-optimal performance of emitted code.
# Choices in the µVM
## Behaviour of UDIV/SDIV
1. UDIV/SDIV are implementation-defined, or
2. They always behave like Java (div by 0 is an exception, -0x80000000/-1 = -0x80000000)
## Supported vector sizes
Conflicts:
1. The current µVM IR is very expressive. It can express any vector types, given type T and size n: `vector <T n>`
2. Not all are supported by the architecture
So the µVM should
1. Make it **implementation-defined**, or
2. Only support **some selected vector sizes**, or
3. Support some selected vector sizes **and platform-specific vector sizes**, or
3. Support **all** `vector<T n>`, either using scalar operations or vector instructions.
Rationale:
1. The Client can **probe** the supported vector sizes and can generate code accordingly. So it is the Client's responsibility to choose vectors. This is also a kind of **specialisation** which is usually done by the high-level optimiser.
2. Choosing the appropriate vector size is a kind of **instruction selection** and should be done by the µVM in a platform-specific fashion.
For rationale 1, (DeVito et al)[http://terralang.org/pldi071-devito.pdf] used auto-tuning technique to determine the optimal vector size for vectorised matrix multiplication problem.
For rationale 2, (Jibaja et al)[https://01.org/node/1495] proposed adding SIMD into JavaScript, but only providing 128-bit registers to the programmer. However, the following code is difficult for the µVM to convert to 256-bit vectors:
```
__m128 elems[N];
for (int i=0; i<N; i++) {
elems[i] = vector_add_floatX4(elems[i], constant_vector(1,2,3,4));
}
```
The reason is:
1. It requires extensive control-flow analysis to merge two adjacent 128-bit adding to one 256-bit adding.
2. 256-bit vectors have different alignment requirement. So array __m128[] cannot be treated as the array __m256[].
The following code, however, is easier to convert:
```
__align_to(256) __m128 elems[N];
for (int i=0; i<N; i+=2) {
elems[i] = vector_add_floatX4(elems[i], constant_vector(1,2,3,4));
elems[i+1] = vector_add_floatX4(elems[i+1], constant_vector(1,2,3,4));
}
```
However, the client, when generating such code, **is already aware of** the presence of 256-bit vectors.
## Array indexing
Conflict:
1. Array indexing seems to be platform-independent. It is "begin+index" (index can be negative).
2. It is implemented using address calculation and memory accessing, where "address" is word-sized.
Solutions
1. Use **any integer type** as the index and are sign-extended (used by LLVM. See http://llvm.org/docs/LangRef.html#getelementptr-instruction), or
2. The index must be **word-sized** and the Client must appropriately extend/truncate the index to word-sized (current µVM design)
Rationale:
1. It only involves a signed extension of truncation and the µVM can cheaply do it. However,
2. Since the Client can probe the word size and the Client can generate TRUNC/SEXT instructions, it should be done by the client.
# Portable µVM IR as a subset of the µVM
If we can define **a subset of the µVM** with **defined behaviours** and **reasonable** performance (perhaps calling it "µVM Mobile Edition"), then some simplistic µVM Clients (I assume there will soon be many such Clients) can generate portable µVM IR code.
For implementations that seek ultimate performance, the µVM implementation can implement many **machine-dependent instructions** as a super set of such a "portable" IR. (call it "µVM Enterprise Edition"?)
Alternatively, we can define that "subset" as "The µVM IR ®" and treat the implementation-dependent extensions as "extended µVM".
# Appendex: Examples of differences
There are differences between architectures. For example:
The **word length** is different. Some operations (including array indexing) depends on the word length.
The **integer division** instruction behaves differently among architectures, especially in "division by zero" and "signed integer overflow".
Not all operations on all types perform equally well. 64-bit integer operations perform significantly worse (if possible at all) than 32-bit counterparts on 32-bit machines.
The supported **vector length** of vector instructions is different. This may vary from 64 bits (ARM) up to 512 bits (x86 with AVX) with 128 bits widely supported.
Some instructions are "optional" and the functionality must be software-implemented on those architectures. For example, **UDIV** and **SDIV** in ARMv7.
Although most architectures provide all operations (binary arithmetic and logical, comparison and conversion) mentioned in LLVM, concrete instructions does not work on all data sizes. For example, **conversion from floating point to integer** can only convert between certain data types (float to 32-bit int or 64-bit int, but not other int types.) Not all **vector operations** work for all vector types.
The address size (and indexes) of memory (and array) operations is different among architectures.
Supported data type that can be **atomically loaded/stored**. This affects how the Client is going to implement mutex locks.
Alignment requirement for **vector load/store** operations.
In the native interface/C interface/foreign function interface(FFI)
* The **pointer size** depends on the architecture. This also affects **function pointers** for external C functions.
* The **available system calls** depend on the operating system, the ABI and the processor. The Client must handle the differences between operating systems.
Other issues mentioned in https://github.com/microvm/microvm-meta/issues/16
# Current extension mechanism
An **intrinsic function** (or IFUNC for short) is something that has the similar form of a function, but is treated by the µVM as a regular instruction. The simplest form takes only the name (or ID) of the IFUNC:
```
%result = ICALL @uvm.thread_exit
```
The most complete form takes type arguments, value arguments and an exceptional destination:
```
%result = ICALL @uvm.do_something_weird <@T1 @T2 @T3> (%val1 %val2 %val3) EXC %normal %exceptional
```
Theoretically all binary operations can be defined as IFUNCs. For example:
```
%result = ICALL @uvm.math.sdiv <int<64>> (%lhs %rhs) EXC %normal %div_by_zero_handler
```