general-issue-tracker issues
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues
2016-06-17T15:22:41+10:00

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/7
Choose between swap-stack with and without parameters.
2016-06-17T15:22:41+10:00
John Zhang

*Created by: wks*
Coroutines may communicate with each other by yielding and, at the same time, passing data between stacks. There are two ways this can be done.
1. Allocate a memory region in one stack using ALLOCA and let the other stack write data into that region.
2. Let the swap-stack operation take parameters.
The first approach is currently preferred by @wks and @hosking. [Dolan et al.](http://dl.acm.org/citation.cfm?id=2400695) proposed the second approach, but did not discuss the difference between the two.
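The two options can be compared in a small C sketch. It uses the POSIX `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`) purely as a stand-in for swap-stack; this is not the µVM mechanism. The names `shared_slot`, `swap_arg`, `swap_with_value` and `run_demo` are made up for illustration, and option 2 is only emulated with a global transfer slot because `swapcontext` itself carries no value.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;   /* saved contexts of the two stacks */
static char co_stack[64 * 1024];      /* memory backing the coroutine's stack */

static int *shared_slot;  /* option 1: address of a region owned by the other stack */
static int  swap_arg;     /* option 2: emulated swap-stack parameter (assumption) */

/* Option 2 helper: swap stacks and hand a value to the other side.
 * Returns the value handed over by whoever swaps back to us. */
static int swap_with_value(ucontext_t *from, ucontext_t *to, int value) {
    swap_arg = value;
    swapcontext(from, to);
    return swap_arg;
}

static void coroutine(void) {
    /* Option 1: write into a region that lives in the main stack's frame. */
    *shared_slot = 42;
    int received = swap_with_value(&co_ctx, &main_ctx, 0);
    /* Option 2: reply through the swap operation itself. */
    swap_with_value(&co_ctx, &main_ctx, received + 1);
}

/* Runs both exchanges once and reports what the main stack observed. */
void run_demo(int *by_region, int *by_param) {
    assert(by_region != NULL && by_param != NULL);

    int region = 0;            /* a local variable plays the role of ALLOCA */
    shared_slot = &region;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, coroutine, 0);

    swap_with_value(&main_ctx, &co_ctx, 0);              /* coroutine fills region */
    *by_region = region;
    *by_param = swap_with_value(&main_ctx, &co_ctx, 7);  /* coroutine replies 7 + 1 */
}
```

In this sketch option 1 needs the address of the region to be shared before the swap, while option 2 keeps the data in the swap itself; which of the two costs matters more for the µVM is the question this issue leaves open.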
We need to choose the appropriate one.

spec-2

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/5
A heavy-weighted "Frame State Construction" mechanism
2016-06-17T15:22:37+10:00
John Zhang

*Created by: wks*
During on-stack replacement (OSR), it is often desirable to save the state of a partially executed function, compile a new optimised version of the function and restore the saved state in the new version. Mapping the old state to the new state is the job of the Client, but the µVM can provide a new mechanism to make this easier.
Frame State Construction creates a new stack frame and populates it with the state of a partially executed function. The client specifies the values of all SSA Values (at least all live values) in the function and the next µVM instruction to execute. Then the stack can be resumed.
This is a very powerful mechanism that allows the program to continue at any point in the code, but it may be difficult to implement.
P.S. I do not want to use the word "restore" because the µVM does not care about the "restore" semantics.
Example (in the C language):
```c
void inc_all(int ar[], long sz) {
    int i;
    for (i = 0; i < sz; i++) {
cont:
        ar[i] += 1;
    }
}
```
But instead of starting from the beginning, I want to continue from the label "cont" with i = 100. The µVM should let me express something like:
```c
construct_frame(func=inc_all,
                next_inst="cont",   // the label inside the loop of inc_all
                local_vars={
                    ar: SOME_OLD_ARRAY,
                    sz: SOME_OLD_VALUE,
                    i: 100
                })
```
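For intuition only, the effect of such a call can be imitated in plain C by making the frame an explicit struct and dispatching on the resume point. The names `resume_point`, `inc_all_frame` and `inc_all_resume` are hypothetical, and a real µVM would build a native frame rather than interpret a struct like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resume points of inc_all: the function entry and the label "cont". */
enum resume_point { AT_ENTRY, AT_CONT };

/* Hypothetical explicit frame: every local variable plus where to continue. */
struct inc_all_frame {
    int *ar;
    long sz;
    long i;
    enum resume_point next_inst;
};

/* Runs inc_all starting from the state described by the frame. */
void inc_all_resume(struct inc_all_frame *f) {
    assert(f != NULL && f->ar != NULL);
    switch (f->next_inst) {
    case AT_ENTRY:
        for (f->i = 0; f->i < f->sz; f->i++) {
    case AT_CONT:   /* entering here skips the loop initialisation */
            f->ar[f->i] += 1;
        }
    }
}
```

Filling the struct with `{ar, sz, 100, AT_CONT}` and calling `inc_all_resume` increments only the elements from index 100 onwards, which is the behaviour asked for above.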
Similarly in µVM IR, the code should be like:
```uir
.typedef @i32 = int<32>
.typedef @i64 = int<64>
.funcdef @inc_all <void (iref<@i32> @i64)> (%ar %sz) {
    %entry:
        BRANCH %head
    %head:
        %i = PHI <@i64> { %entry: 0; %body: %i2; }
        %cond = SLT <@i64> %i %sz
        BRANCH2 %cond %body %exit
    %body:
        %addr = SHIFTIREF <@i32> %ar %i
        %old_val = LOAD <@i32> %addr
        %new_val = ADD <@i32> %old_val 1
        STORE <@i32> %addr %new_val
        %i2 = ADD <@i64> %i 1
        BRANCH %head
    %exit:
        RETVOID
}
```
I should be able to let it continue with:
```c
stack.create_new_frame(func = "@inc_all",
    next_inst = "%addr",  // %addr is actually the instruction's name,
                          // i.e. the instruction that calculates %addr.
    local_vals = {
        "%ar": SOME_VALUE,
        "%sz": SOME_VALUE2,
        "%i": 100,
        "%cond": 1,             // true
        "%addr": WHATEVER,      // This will be calculated immediately
        "%old_val": WHATEVER,   // This will be calculated immediately
        "%new_val": WHATEVER,   // This will be calculated immediately
        "%i2": WHATEVER,        // This will be calculated immediately
    })
```
**Potential challenges**
1. This needs close collaboration with the code generator, especially the register allocator. It may require a stack map (recording in which machine register or memory location each local SSA Value is stored) at **every instruction** that can potentially be continued from.
* solution 1: Add a dedicated "continue point" instruction at which the code generator generates a stack map. The "continue point" itself is a no-op.
* solution 2: Upon request, re-compute the stack map for the instruction to continue from. This must match the actual function code.
2. Cannot continue before a PHI node or a LANDINGPAD. PHI depends on the incoming control flow and LANDINGPAD depends on the exception.
* solution: continue **after** those instructions instead.
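A stack map as described in challenge 1 could be represented roughly as follows. This is a minimal sketch, not the refimpl's data structure: the type names, the two location kinds and `stack_map_lookup` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Where a local SSA Value lives at a given continue point. */
enum loc_kind { LOC_REGISTER, LOC_STACK_SLOT };

struct location {
    enum loc_kind kind;
    int index;              /* register number or frame slot offset */
};

struct map_entry {
    const char *ssa_name;   /* e.g. "%i" */
    struct location loc;
};

/* One stack map per instruction that execution may continue from. */
struct stack_map {
    const char *inst_name;  /* e.g. "%addr" */
    const struct map_entry *entries;
    size_t n_entries;
};

/* Returns the location of ssa_name at inst_name, or NULL when that
 * instruction has no stack map or the value is not recorded there. */
const struct location *stack_map_lookup(const struct stack_map *maps, size_t n_maps,
                                        const char *inst_name, const char *ssa_name) {
    assert(maps != NULL || n_maps == 0);
    for (size_t m = 0; m < n_maps; m++) {
        if (strcmp(maps[m].inst_name, inst_name) != 0)
            continue;
        for (size_t e = 0; e < maps[m].n_entries; e++) {
            if (strcmp(maps[m].entries[e].ssa_name, ssa_name) == 0)
                return &maps[m].entries[e].loc;
        }
    }
    return NULL;
}
```

With solution 1 the table only needs entries for the dedicated "continue point" instructions; with solution 2 an entry would be computed on demand instead of stored.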
**Possibilities**
1. Theoretically, all possible stack states can be constructed: not just a frame "continuing from an instruction", but also a frame that "is calling some function but has not returned", a frame that "is trapped to the client", or a dead stack.
2. Can we preserve the state of a full stack, quit the program, re-run it and re-construct the whole stack again? (persistent program state)
# Alternative solutions for OSR state preserving
## Save states in global variables.
This (problematic) approach is taken by [Lameed et al.](http://dl.acm.org/citation.cfm?id=2451541). The saved states are loaded at the beginning of a newly-compiled function.
Problems:
1. It uses global variables, which is bad for concurrency.
2. It needs to generate code for loading those global variables. Lameed et al. compile the function twice; the loads are removed in the second compilation.
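The shape of this approach, applied to the `inc_all` example, might look like the sketch below. It is not Lameed et al.'s code: the `osr_pending` flag, `save_osr_state` and `inc_all_v2` are assumed names, and the flag check stands in for the loads that their second compilation removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Saved state of the interrupted activation, kept in globals. */
static bool  osr_pending;
static int  *saved_ar;
static long  saved_sz;
static long  saved_i;

/* Called at the OSR point of the old version. */
void save_osr_state(int *ar, long sz, long i) {
    assert(ar != NULL);
    saved_ar = ar;
    saved_sz = sz;
    saved_i = i;
    osr_pending = true;
}

/* Newly compiled version: loads the saved state on entry if there is any. */
void inc_all_v2(int ar[], long sz) {
    long i = 0;
    if (osr_pending) {          /* extra loads that only the OSR entry needs */
        ar = saved_ar;
        sz = saved_sz;
        i = saved_i;
        osr_pending = false;
    }
    for (; i < sz; i++)
        ar[i] += 1;
}
```

Because the saved state is shared by every thread that enters `inc_all_v2`, two concurrent OSR transitions would overwrite each other, which is the concurrency problem listed above.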
## Create a partially-evaluated function
This is a functional approach, similar to the concept of "continuation" in Scheme. The new function takes no parameters (or arbitrary parameters) and behaves like the "bottom half" of the old function.
Advantages:
1. This "continuation" is just an ordinary µVM function and does not require special mechanisms other than OSR and function definition (not even **re**-definition).
2. It is at least as fast as Lameed et al.'s approach. Both compile two versions of the new function: one for continuing and the other for fresh calls.
Problems:
1. It requires compiling a one-shot function just for one continuation. "Frame State Construction" may look heavy, but it is still lighter than compiling a new function.
2. For imperative programming languages, a "continuation" may be difficult to create and may require complex control-flow analysis.
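For the `inc_all` example, the two compiled versions might look like this in C. This is a sketch under the assumption that the live values are passed as parameters; the name `inc_all_bottom_half` is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Version used by fresh calls. */
void inc_all(int ar[], long sz) {
    for (long i = 0; i < sz; i++)
        ar[i] += 1;
}

/* One-shot "bottom half": behaves like inc_all entered at the label "cont"
 * with the given value of i, so the loop body runs before the first test. */
void inc_all_bottom_half(int ar[], long sz, long i) {
    assert(ar != NULL && i >= 0);
    do {
        ar[i] += 1;
        i++;
    } while (i < sz);
}
```

The bottom half is an ordinary function, but it has to be written (or generated) separately for each point the client wants to continue from.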
spec-2

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/4
Should fully stick to C++11 memory model
2016-06-17T15:22:34+10:00
John Zhang

*Created by: wks*
Currently the memory ordering primitives are copied from LLVM. As stated in the LLVM langref, their memory model is not precisely defined.
> These semantics (ordering) are borrowed from Java and C++0x, but are somewhat more colloquial. If these descriptions aren’t precise enough, check those specs (see spec references in the atomics guide).
But the MicroVM should have a precise memory model. The C++11 memory model is the result of a lot of effort and we should stick to it.
Things that should be done in the MicroVM:
* Remove the "UNORDERED" memory order. It was designed by LLVM for Java, but the MicroVM will treat data races as undefined behaviour and will leave the security constraints of Java to the client.
- If a JVM can be implemented in C/C++, its memory model should also be implementable on a MicroVM with a C++11-like memory model.
* Add the "CONSUME" memory order. Define the "carries-a-data-dependency-to" relation in the MicroVM. Some hardware is aware of data dependencies.
* Define the program order (which is a total order per MicroVM thread because the MicroVM does not have unspecified parameter evaluation order), the synchronisation order, and the synchronises-with and happens-before relations.
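As a reminder of what the synchronises-with and happens-before relations buy, here is the usual release/acquire message-passing pattern written with C11 `<stdatomic.h>`, which uses the same memory orders as C++11. The names `payload`, `ready`, `publish` and `try_receive` are illustrative only, and this is C, not MicroVM IR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;          /* ordinary, non-atomic data */
static atomic_bool ready;    /* zero-initialised, i.e. false */

/* Writer: the release store makes the earlier write to payload visible
 * to any thread whose acquire load reads the value true. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: if the acquire load sees true, it synchronises with the release
 * store, so reading payload afterwards is not a data race. */
bool try_receive(int *out) {
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

If both accesses to `ready` were non-atomic, the read of `payload` would race with the write, which under the proposal above would be undefined behaviour in the MicroVM as well.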
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/3
The client should see opaque references rather than transparent addresses.
2016-06-17T15:22:32+10:00
John Zhang

*Created by: wks*
(Discussed in 5 Aug 2014 meeting) The client may hold references to the µVM heap, but the client should see opaque references rather than raw addresses. More specifically, one of the following should hold:
1. The client may hold actual raw addresses, but it should consider them opaque. It should access the µVM memory using the µVM-provided API and should not depend on the fact that they are addresses.
2. The client holds indices into a table maintained by the µVM (or keys to a hashtable maintained by the µVM) and has to access the µVM memory through the API.
In addition, the µVM should keep track of all references held externally.
In either case, this behaviour should be documented and implemented accordingly.
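Option 2 could be sketched as a small handle table. Everything here is hypothetical (the `mu_` names, the fixed table size, the `int` payload); it only illustrates that the client sees an index while the µVM can enumerate every reference held externally.

```c
#include <assert.h>
#include <stddef.h>

#define MU_MAX_HANDLES 64          /* hypothetical fixed table size */
#define MU_INVALID_HANDLE (-1)

typedef int mu_handle;             /* what the client sees: an index, not an address */

static void *handle_table[MU_MAX_HANDLES];   /* owned by the VM */

/* VM side: record a heap reference and give the client an opaque handle. */
mu_handle mu_handle_new(void *obj) {
    assert(obj != NULL);
    for (int i = 0; i < MU_MAX_HANDLES; i++) {
        if (handle_table[i] == NULL) {
            handle_table[i] = obj;
            return i;
        }
    }
    return MU_INVALID_HANDLE;      /* table full */
}

/* Client side: give the reference back. */
void mu_handle_release(mu_handle h) {
    if (h >= 0 && h < MU_MAX_HANDLES)
        handle_table[h] = NULL;
}

/* API access: the client reads memory through the handle only. */
int mu_load_int(mu_handle h) {
    assert(h >= 0 && h < MU_MAX_HANDLES && handle_table[h] != NULL);
    return *(int *)handle_table[h];
}

/* VM side: number of references currently held externally. */
size_t mu_live_handles(void) {
    size_t n = 0;
    for (int i = 0; i < MU_MAX_HANDLES; i++)
        if (handle_table[i] != NULL)
            n++;
    return n;
}
```

Option 1 would expose the same functions but let `mu_handle` be the address itself, which saves the table lookup at the price of the µVM having to track those addresses some other way.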
spec-2

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/2
Multiple versions of the same function
2016-06-17T15:22:30+10:00
John Zhang

*Created by: wks*
This issue addresses the representation of multiple versions of a function due to function redefinition.
Affects: the MicroVM reference implementation, the MicroVM-Client interface.
Does not affect: the MicroVM IR language
## Background
In the current microvm-refimpl, there are several classes whose relations are as follows:
1. Function: represents a callable function. One per function ID.
2. CFG: a concrete function definition. Has many basic blocks, each of which has many instructions.
3. A Function has zero or one CFG: if zero, the function is declared but not defined; if one, it refers to the most recent version of the function definition.
4. Stack: represents the contexts of nested function activations.
5. Frame: the context of a function activation.
6. A Stack has many Frames.
7. A Frame has one Function: the function this frame is created for.
## Problem
After function re-definition, a new CFG is created and `Function.cfg` is set to the new CFG. The old CFG is discarded. This is problematic because:
a. Function redefinition only affects future invocations, but there are existing activations deep in the stack.
b. A frame corresponds to a concrete CFG, not an abstract callable Function. When a Function is redefined, the CFG of an existing Frame remains the same (is not redefined).
c. When a trap in an old version of a function is triggered, the client will introspect the frame, which requires the metadata of the old version of the function, i.e. the CFG. So the old CFG cannot be disposed of.
## Example
Assume we have a naive Fibonacci number function:
```
int executeCount = 0;
int fib(int n) { // version 1
if (executeCount++ == 1024) { trap(keepalive=[n]); }
if (n<=1) return n; else return fib(n-1) + fib(n-2);
}
```
In the MicroVM, a dictionary is kept so that
```
functions = { "fib" : <version 1 of fib> }
```
When this is executed too many times, the trap is triggered and the client decides to redefine `fib` as follows:
```
int fib(int n) { // version 2
int a=0, b=1;
while(n--) { int tmp=a+b; a=b; b=tmp; }
return a;
}
```
And in MicroVM:
```
functions = { "fib" : <version 2 of fib> }
```
However, when the trap is triggered again (this is possible in this scenario), control goes to the client and the client looks up `functions["fib"]` to get the metadata. The frame is still for version 1, but the client gets `<version 2 of fib>`. This will cause an error.
## Solution
Change the object relations so that:
* In 3, a Function not only has the most recent CFG, but also maintains a list of historical CFGs.
* In 7, a Frame no longer has a Function, but has a CFG instead.
* When introspecting a frame, during a trap or by other means, the client gets the concrete version of a function rather than just a function ID.
* Specify that instructions in the newer version cannot reuse the IDs of instructions in the older version (this is already the case in the refimpl) so that all instructions (especially TRAP instructions) have unique IDs through time. Each TRAP instruction then uniquely identifies the CFG it is defined in.
All function definitions, i.e. CFGs, are kept alive until their last activation has returned.
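The changed relations could be modelled like this. It is a C sketch with assumed names and a simple per-CFG activation counter; the refimpl is free to track liveness differently.

```c
#include <assert.h>
#include <stddef.h>

/* One concrete version of a function definition. */
struct cfg {
    int version;
    struct cfg *older;      /* previous version: the history list */
    int live_frames;        /* activations still running this version */
};

/* A callable function: points at its most recent CFG. */
struct function {
    const char *name;
    struct cfg *current;    /* NULL while declared but not defined */
};

/* A frame now refers to the CFG it was created for, not to the Function. */
struct frame {
    struct cfg *cfg;
};

/* Redefinition keeps the old CFG reachable through the history list. */
void function_redefine(struct function *f, struct cfg *new_cfg) {
    assert(f != NULL && new_cfg != NULL);
    new_cfg->older = f->current;
    f->current = new_cfg;
}

/* A new activation always uses the most recent version. */
struct frame frame_push(struct function *f) {
    assert(f != NULL && f->current != NULL);
    f->current->live_frames++;
    struct frame fr = { f->current };
    return fr;
}

void frame_pop(struct frame *fr) {
    assert(fr != NULL && fr->cfg->live_frames > 0);
    fr->cfg->live_frames--;
}

/* An old CFG may be discarded only once its last activation has returned. */
int cfg_is_disposable(const struct function *f, const struct cfg *c) {
    return c != f->current && c->live_frames == 0;
}
```

In the `fib` scenario, a frame pushed before the redefinition keeps reporting version 1 when introspected, while `functions["fib"]` already points at version 2.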
## Open questions
How to identify a particular version of a function? Does the MicroVM really need an interface for getting a specific version of a CFG? The client certainly has more information than the MicroVM about the program.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/1
Must specify protocol for client-uvm sharing of signal handlers
2016-06-17T15:22:27+10:00
John Zhang

*Created by: mn200*
As per discussion in meeting on 1 August 2014, we think that the uVM should be the only entity making the `signal` system call. Other entities (*i.e.*, clients in practice) can register signal-handler-like objects with the uVM, which promises to pass on signals generated by their code.
This needs to be documented in the spec-wiki, perhaps in a section about client-uVM API.
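One possible shape for such a protocol, sketched in C with the standard `signal` and `raise` functions. The `mu_` names, the table size and the example handler are assumptions, not an agreed API; the point is only that `signal` is called in one place and client handlers are reached through a dispatcher.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define MU_MAX_SIGNALS 64                    /* hypothetical table size */

typedef void (*mu_client_handler)(int signo);

static mu_client_handler registered[MU_MAX_SIGNALS];

/* The only function ever installed with signal(); forwards to the client. */
static void mu_dispatch(int signo) {
    if (signo > 0 && signo < MU_MAX_SIGNALS && registered[signo] != NULL)
        registered[signo](signo);
}

/* Client-facing registration: the client never calls signal() itself.
 * Returns 0 on success and -1 on failure. */
int mu_register_signal_handler(int signo, mu_client_handler handler) {
    assert(handler != NULL);
    if (signo <= 0 || signo >= MU_MAX_SIGNALS)
        return -1;
    registered[signo] = handler;
    return signal(signo, mu_dispatch) == SIG_ERR ? -1 : 0;
}

/* Example client handler: records which signal was forwarded. */
static volatile sig_atomic_t last_forwarded = 0;

void example_client_handler(int signo) {
    last_forwarded = signo;
}
```

A real protocol would also have to say how the uVM decides that a signal was "generated by their code" before forwarding it; this sketch forwards every signal that has a registered handler.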