# general-issue-tracker issues
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues

# A heavy-weighted "Frame State Construction" mechanism
Issue 5: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/5 (2016-06-17, John Zhang)
*Created by: wks*
During on-stack replacement (OSR), it is often desired to save the state of a partially executed function, compile a new optimised version of the function, and resume execution from the saved state. How to map the old state to the new state is the job of the Client, but the µVM can provide a new mechanism to make this easier.
Frame State Construction creates a new stack frame and populates it with the state of a partially executed function. The client specifies the values of all SSA Values (at least all live values) in the function and the next µVM instruction to execute. Then the stack can be resumed.
This is an extremely powerful mechanism that allows the program to continue from any point in the code, but it may be difficult to implement.
p.s. I do not want to use the word "restore" because the µVM does not care about the "restore" semantics.
Example (in the C language):
```c
void inc_all(int ar[], long sz) {
    int i;
    for (i = 0; i < sz; i++) {
cont:
        ar[i] += 1;
    }
}
```
But instead of starting from the beginning, I want to continue from the label "cont" with i = 100. The µVM should let me express something like:
```c
construct_frame(func = inc_all,
                next_inst = "cont",
                local_vars = {
                    ar: SOME_OLD_ARRAY,
                    sz: SOME_OLD_VALUE,
                    i:  100
                })
```
Similarly, in µVM IR the code should look like:
```uir
.typedef @i32 = int<32>
.typedef @i64 = int<64>
.funcdef @inc_all <void (iref<@i32> @i64)> (%ar %sz) {
    %entry:
        BRANCH %head
    %head:
        %i = PHI <@i64> { %entry: 0; %body: %i2; }
        %cond = SLT <@i64> %i %sz
        BRANCH2 %cond %body %exit
    %body:
        %addr = SHIFTIREF <@i32> %ar %i
        %old_val = LOAD <@i32> %addr
        %new_val = ADD <@i32> %old_val 1
        STORE <@i32> %addr %new_val
        %i2 = ADD <@i64> %i 1
        BRANCH %head
    %exit:
        RETVOID
}
```
I should be able to let it continue with:
```c
stack.create_new_frame(func = "@inc_all",
                       next_inst = "%addr",  // %addr is actually the instruction's name,
                                             // i.e. the instruction that calculates %addr.
                       local_vals = {
                           "%ar":      SOME_VALUE,
                           "%sz":      SOME_VALUE2,
                           "%i":       100,
                           "%cond":    1,        // true
                           "%addr":    WHATEVER, // This will be calculated immediately
                           "%old_val": WHATEVER, // This will be calculated immediately
                           "%new_val": WHATEVER, // This will be calculated immediately
                           "%i2":      WHATEVER, // This will be calculated immediately
                       })
```
**Potential challenges**
1. This needs close collaboration with the code generator, especially the register allocator. It may require a stack map (recording which machine register or memory location holds each local SSA Value) at **every instruction** that can potentially be continued from.
    * Solution 1: add a dedicated "continue point" instruction at which the code generator emits a stack map. The "continue point" itself is a no-op.
    * Solution 2: upon request, re-compute the stack map for the desired instruction to continue from. This must match the actual generated code.
2. We cannot continue immediately before a PHI node or a LANDINGPAD: PHI depends on the incoming control flow and LANDINGPAD depends on the exception.
    * Solution: continue **after** those instructions, instead.
**Possibilities**
1. Theoretically all possible states of a stack can be constructed: not just "continuing from an instruction", but also a frame that "is calling some function but has not returned", a frame that "is trapped to the client", or a dead stack.
2. Can we preserve the state of a full stack, quit the program, then re-run and re-construct the whole stack again? (persistent program state)
# Alternative solutions for OSR state preserving
## Save states in global variables.
This (problematic) approach is taken by [Lameed et al.](http://dl.acm.org/citation.cfm?id=2451541). The saved states are loaded at the beginning of the newly-compiled function.
Problems:
1. It uses global variables, which is bad for concurrency.
2. It needs to generate code for loading those global variables. Lameed et al. compile the function twice; the loads are removed in the second compilation.
## Create a partially-evaluated function
This is a functional approach, similar to the concept of a "continuation" in Scheme. The new function takes no parameters (or arbitrary parameters) and behaves like the "bottom half" of the old function.
Advantage:
1. This "continuation" is just an ordinary µVM function and does not require special mechanisms other than OSR and function definition (not even **re**-definition).
2. At least as fast as Lameed et al.'s approach. Both compile two versions of the new function: one for continuing and the other for fresh new calls.
Problems:
1. Requires compiling a one-shot function just for one continuation. This epic "Frame State Construction" may look heavy, but is still lighter than compiling a new function.
2. For imperative programming languages, "continuation" may be difficult to create and may require complex control-flow analysis.
Label: spec-2

# Access to native thread-local memory
Issue 53: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/53 (2018-06-28, John Zhang)
*Created by: wks*
This issue is about accessing thread-local memory/variables defined in native programs (C). One important application is the `errno` variable in C/C++.
This is only slightly related to #52 which introduces thread-local storage to Mu itself. There is no intention to force Mu's thread-local storage use the same mechanism as native programs.
Thread-local storage in native programs is highly machine/OS/ABI-dependent. The register used to point to thread-local buffers varies, and maybe not all platform have such register.
One possible workaround could be depending on helper functions written in C or assembly.
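The helper-function workaround could look like the following C sketch. The function names (`mu_errno_location` and friends) are hypothetical, not part of any Mu API; the point is that taking the address of `errno` inside a C helper running on the current thread yields that thread's own slot, regardless of which register the platform uses for thread-local storage:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helpers a Mu implementation could link against.
   Because errno is thread-local in C11/POSIX, &errno evaluated inside a
   helper running on the calling thread yields that thread's errno slot. */
intptr_t mu_errno_location(void) {
    return (intptr_t)&errno;  /* address of the caller's thread-local errno */
}

int mu_get_errno(void) { return errno; }

void mu_set_errno(int v) { errno = v; }
```

Mu IR code could then CCALL these helpers instead of needing platform-specific instructions for the FS register.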
But if we want Mu to integrate deeper with native programs (i.e. do things more efficiently), we can define more instructions (probably "common instructions") to give Mu more capabilities, such as getting/setting the value of the FS register. Any such instructions would likely be platform-dependent and probably optional on unsuitable platforms.

# Ahead-of-time Compiling & Boot-image
Issue 57: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/57 (2018-06-28, Kunshan Wang)
This issue is about supporting ahead-of-time compiling and building boot-images.
Mu (including both the IR and the API) is designed for JIT compiling. Very little is specified about the ahead-of-time compiling scenario. However, real-world language VMs ship as executable images (such as `pypy.exe`, `python.exe` or `java.exe`) in the system-specific native image format (such as ELF). The image should contain the micro VM and the client. Preferably it should also contain **AoT-compiled core libraries** (such as built-in object types, `java.lang.Object`) and, in some cases (such as PyPy), the **AoT-compiled interpreter and metacircular client**.
This issue will discuss the following topics:
- Dynamic linking and loading (linking at start-up time by the system linker)
- Symbol resolution (determine the addresses of symbols (such as `write`))
- This will revive an old idea: "load-time constants" (https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/47)
- Proposed [load-time constants](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_343): `.const @Xxxx <@T> = EXTERN "write"`
- Library dependencies (which `.so` should be loaded?)
- Each ELF or Mach-O file can specify its library dependencies. But this part is extremely platform-specific.
- Could [add a new top-level](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_344), but my hypothetical scenarios ([1](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_345), [2](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/57#note_346)) suggest external linkage should be specified in a separate linking step, like: `ld impl_supplied_entry_point.o bootimage.o -l external-lib -o executable`.
- Possible extensions to the API to address boot-image building
- What should be in the boot image?
- This is very client-specific. It's determined by how the client is implemented, metacircular or not.
- How to determine what is in a boot image?
- Probably using a whitelist. The client can always record all necessary things.
I will consider the following scenarios:
1. VM with non-metacircular client (No active project. My obsolete [js-mu](https://gitlab.anu.edu.au/mu/obsolete-js-mu) was an example).
2. AoT compiling Mu IR program into the boot image. (RPySOM interpreter as an RPython program)
# Alternative LISP-like Mu IR format
Issue 35: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/35 (2015-06-22, John Zhang)
*Created by: wks*
Problem: Mu IR needs a parser, but constructing a parser is tedious, and parser generators pull in additional dependencies.
Solution: Use a simplistic syntax based on LISP.
Example:
```scheme
(typedef @i32 int 32)
(typedef @float float)
(typedef @void void)
(typedef @refvoid ref @void)
(typedef @foo struct @i32 @i64 @float @double @refvoid)
(funcsig @f_sig @i32 (@i32 @i32))
(const @FORTY_TWO @i32 42)
(const @DOUBLE_FORTY_TWO @double 42.0d)
(const @SOME_STRUCT_CONST @some_struct @const1 @const2 @const3)
(const @NULLREF @refvoid NULL)
(global @errno @i32)
(funcdecl @write @write_sig)
(funcdef @write @write_v1 @write_sig (%p0 %p1 %p2)
(basic-block %entry
(inst %a (ADD @i32 %p0 %p1))
(inst %b (CALL @sig @callee (%arg1 %arg2 %arg3) (exc %nor %exc) (keepalive %v1 %v2 %v3)))
)
(basic-block %nor
(inst _ (SUB @i32 %p0 %p2)) ; unnamed instruction
(inst _ (BRANCH %exit))
)
(basic-block %exc
(inst _ (TRAP @void))
)
(basic-block %exit
(inst _ (@uvm.thread_exit)) ; COMMINST is no longer necessary because the syntax is already dynamic
)
)
```
**How would this benefit the Mu implementer?** The parser can be written by hand in very few lines of code. This is convenient for implementation languages with fewer capabilities (such as C, which does not handle complex type hierarchies easily).
**How would this benefit client implementers?** The code generator can be more strongly typed (building structured nested lists) rather than constructing arbitrary strings (using string formatting).
**Binary format?** There can be a simple and direct mapping between the text format and the binary format. For example, atoms can be encoded as a hash code, and a list can be encoded as a type, a length and a list of values. The Mu spec would then no longer need to define a text format and a binary format separately.
**Problems?**
- Does not look like assembly.
- May be less readable than the current text format without aggressive pretty-printing.
- Extra validation should be performed by the parser. (Really? The Mu micro VM is not required to correct any errors; any error is allowed to have undefined behaviour.)
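As a rough illustration of the "very few lines of code" claim, here is a hand-written S-expression reader in C. All names (`Node`, `parse_sexp`) are illustrative, and this is a sketch with no error handling or memory reclamation:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* A node is either an atom (atom != NULL) or a list of child nodes. */
typedef struct Node {
    char *atom;
    struct Node **items;
    int nitems;
} Node;

static Node *parse_at(const char **pp) {
    const char *p = *pp;
    while (isspace((unsigned char)*p)) p++;
    Node *n = calloc(1, sizeof(Node));
    if (*p == '(') {                       /* list: recurse until ')' */
        p++;
        for (;;) {
            while (isspace((unsigned char)*p)) p++;
            if (*p == ')' || *p == '\0') { if (*p) p++; break; }
            *pp = p;
            n->items = realloc(n->items, sizeof(Node *) * (n->nitems + 1));
            n->items[n->nitems++] = parse_at(pp);
            p = *pp;
        }
    } else {                               /* atom: read until space/paren */
        const char *start = p;
        while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')') p++;
        n->atom = malloc((size_t)(p - start) + 1);
        memcpy(n->atom, start, (size_t)(p - start));
        n->atom[p - start] = '\0';
    }
    *pp = p;
    return n;
}

static Node *parse_sexp(const char *src) { return parse_at(&src); }
```

Feeding it `(funcsig @f_sig @i32 (@i32 @i32))` yields a four-element list whose last element is itself a two-element list, which a Mu implementation could then walk to build its internal structures.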
# Alternative serialisable format (such as JSON/YAML/XML/...)
Issue 72: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/72 (2018-06-28, Kunshan Wang)
I am glad to see that the [mu-tool-compiler](https://gitlab.anu.edu.au/mu/mu-tool-compiler) project exists.
I have been considering an alternative serialisable, human-readable format alongside the current text-based IR. In fact, the text-based Mu IR is something I am unhappy with. It has various problems:
- It requires a dedicated parser, which has to be implemented by hand.
- When new features are added, the grammar changes, and the parser needs to be modified.
- The text-based IR is confined by aesthetic considerations, and has many inconsistencies. For example:
- The reason `.funcdef ... <@sig>` has a signature is that it also works as syntactic sugar: a human writer only needs to write a single `.funcdef` to create both a function and its first version.
- As a convention, types and signatures in Mu instructions are in angular brackets, such as `ADD <@i32> %x %y`. But instructions may have more than types and signatures. One example is `GETFIELDIREF`, which has an integer literal argument. The current form `GETFIELDIREF <@type 3> %ref` is ugly: the number `3` looks out of place.
I suggest there should be a Mu IR format in a well-known structured data format, such as JSON, YAML, XML, and so on.
Related work:
- LLVM yaml2obj: http://llvm.org/docs/yaml2obj.html
Potential advantages:
- There are mature open-source parsers available.
- Easy to extend.
- Easy to specify (in mu-spec).
For example, if we want to add an externally-usable symbol to an exposed function, we only need to add a property rather than redesigning the grammar:
```yaml
name: foo
func: func
callconv: DEFAULT
cookie: cookie
symbol: externally_visible_symbol # This is an added property
```
It is easy to specify because we can define the IR as an (abstract) object tree with properties, similar to [how the HTML5 DOM is defined](https://html.spec.whatwg.org/multipage/dom.html#elements-in-the-dom).
There are also potential disadvantages:
- More verbose
- Less human-readable than the current text form, but human readability should not be the primary concern.
XML example:
```xml
<bundle>
<type id="i8" ctor="int" length="8" /> <!-- note: XML ID is actually a name -->
<type id="i32" ctor="int" length="32" />
<type id="i64" ctor="int" length="64" />
<type id="pi8" ctor="uptr" type="i8" />
<type id="ppi8" ctor="uptr" type="pi8" />
<type id="refi32" kind="ref" type="i32" />
<funcsig id="mainsig" />
<paramty type="i32" />
<paramty type="ppi8" />
<retty type="i32" />
</funcsig>
<const id="I32_42" type="i32" value="42" />
<const id="I64_0" type="i64" value="0" />
<global id="errno" type="i32" />
<funcdecl id="main" sig="mainsig" />
<funcdef func="main" />
<bb lname="entry"> <!-- lname = local name -->
<param type="i32" lname="argc" />
<param type="ppi8" lname="argv" />
<inst opcode="ADD" flags="V" type="i32" opnd1="%argc" opnd2="@I32_42">
<result lname="res" />
<result lname="ovf" />
</inst>
<inst opcode="CALL" sig="some_sig" callee="some_callee">
<arg val="argc" />
<result lname="r1" />
<nor-dest name="bb2">
<pass-value val="r1" />
</nor-dest>
<exc-dest name="bb3" />
</inst>
<inst opcode="SWAPSTACK" swappee="%some_hypothetic_stack">
<return-with>
<result type="i32" lname="ss_res1" />
<result type="i32" lname="ss_res2" />
</return-with>
<pass-values>
<pass-value type="i32" val="%res" />
<pass-value type="i32" val="%r1" />
</pass-valuse>
</inst>
<!-- more instructions here -->
</bb>
<bb lname="bb2">
<param type="i32" lname="r1" />
<!-- more instructions here -->
</bb>
<bb lname="bb3">
<exc-param lname="exc" />
<!-- more instructions here -->
</bb>
</funcdef>
<expose id="exposed_main" symbol="c_callable_symbol_of_exposed_main"
func="main" callconv="DEFAULT" cookie="@I64_0" />
</bundle>
```
A YAML example:
```yaml
types:
  - name: i8
    ctor: int
    length: 8
  - {name: "i32", ctor: "int", length: 32}
  - {name: "i64", ctor: "int", length: 64}
  - {name: "double", ctor: "double"}
function_signatures:
  - name: "mainsig"
    paramtys: ["i32", "ppi8"]
    rettys: ["i32"]
constants:
  - {name: "I32_42", type: "i32", value: 42}
  - {name: "I64_0", type: "i64", value: 0}
  - {name: "D_0", type: "double", value: 0.0}
  - name: "D_NAN"
    type: "double"
    value_from_int: 0x7ff0000000000001
globals:
  - {name: "errno", type: "i32"}
functions:
  - name: "main"
    sig: "mainsig"
    initial_version:
      - bbname: "entry"
        params:
          - {type: "i32", lname: "argc"}
          - {type: "ppi8", lname: "argv"}
        insts:
          - {opcode: "ADD", flags: "V", type: "i32", opnd1: "%argc", opnd2: "@I32_42",
             results: ["res", "ovf"]}
          - opcode: "CALL"
            sig: "some_sig"
            callee: "some_callee"
            args: ["%argc"]
            results: ["r1"]
            nor_dest:
              bb: "bb2"
              pass_values: ["%r1"]
            exc_dest:
              bb: "bb3"
          - opcode: "SWAPSTACK"
            swappee: "%some_hypothetic_stack"
            ret_with:
              - {type: "i32", lname: "ss_res1"}
              - {type: "i32", lname: "ss_res2"}
            pass_values:
              - {type: "i32", val: "%res"}
              - {type: "i32", val: "%r1"}
          # more instructions here
      - bbname: "bb2"
        params:
          - {type: "i32", lname: "r1"}
        insts:
          # more instructions here
      - bbname: "bb3"
        excparam: "exc"
        insts:
          # more instructions here
exposed_functions:
  - name: "exposed_main"
    symbol: "c_callable_symbol_of_exposed_main"
    func: "main"
    callconv: "DEFAULT"
    cookie: "@I64_0"
```
LISP:
```lisp
(type i8 int 8)
(type i32 int 32)
(type i64 int 64)
(type pi8 ptr i8)
(type ppi8 ptr pi8)
(funcsig mainsig (i32 ppi8) (i32))
(const I32_42 i32 42)
(const I64_0 i64 0)
(global errno i32)
(funcdecl main mainsig)
(funcdef main.v1 main
  (bb entry ((i32 argc) (ppi8 argv))
    (ADD i32 %argc @I32_42 res
         ((C carry)
          (V ovf)))
    (CALL some_sig some_callee (%argc) (r1)
          ((bb2 (%r1)) (bb3)))
    (SWAPSTACK %some_hypothetic_stack
               (ret-with ((i32 ss_res1)
                          (i64 ss_res2)))
               (pass-values ((i32 %res)
                             (i32 %r1)))))
  (bb bb2 ((i32 r1))
    ; More instructions here
  )
  (bb bb3 (exc)
    ; More instructions here
  ))
(expose exposed_main main DEFAULT @I64_0
        ((symbol "c_callable_symbol_of_exposed_main")))
```

# API: Automatic creation of MuInstResNode
Issue 59: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/59 (2018-06-28, Kunshan Wang)

# Problem
Currently instruction results are *added* to the instruction after the instruction is created.
Alternatively, results can be created *when* the instruction is created. When taking this approach, the client will never add the wrong number of results to any instructions.
In theory, the client knows everything the micro VM knows about the IR bundle, so it is always capable of adding the right number of results. Changing the API in the following way will let the micro VM provide slightly more help to the client, but should not affect the performance: the micro VM always has the required information.
Minimalism will be affected if more query APIs (finding the opcode/mnemonic of instructions, the instruction parameters, etc.) are added. The client builds the IR, and always has full information about the structure of the CFG. So the client is always capable of pretty-printing what it is going to build without using any of the Mu API functions. For convenience, however, query APIs could be provided at a higher level by client-side libraries, while the C-based API provides an **efficient** (rather than rich) tool to transfer the information from the client to the micro VM. For debugging purposes, the reference implementation can now print the loaded bundle as text.
# Proposed change
Replace the `new_inst_res` API function:
```c
MuInstResNode (*new_inst_res )(MuCtx *ctx, MuInstNode inst);
```
with these:
```c
MuInstResNode (*get_inst_res)(MuCtx *ctx, MuInstNode inst, int index);
int (*get_num_res)(MuCtx *ctx, MuInstNode inst);
```
`get_inst_res` will get the *index-th* result from an instruction. `get_num_res` gets the number of results.
Other API functions do not need to be changed.
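A toy sketch of the proposed shape follows. `MuCtx`, `MuInstNode` and `MuInstResNode` here are stand-in types, not the real `muapi.h` declarations; the point is only that results exist as soon as the instruction does and are fetched by index, so the client cannot attach the wrong number of them:

```c
#include <assert.h>

/* Stand-in types: an "instruction" that was created together with a fixed
   number of results, and a result handle that is just the result's index. */
typedef struct { int nresults; } InstImpl;
typedef InstImpl *MuInstNode;
typedef int MuInstResNode;
typedef struct MuCtx MuCtx;   /* opaque; unused in this sketch */

/* Number of results fixed at instruction-creation time. */
static int get_num_res(MuCtx *ctx, MuInstNode inst) {
    (void)ctx;
    return inst->nresults;
}

/* Fetch the index-th result; out-of-range indices are client errors. */
static MuInstResNode get_inst_res(MuCtx *ctx, MuInstNode inst, int index) {
    (void)ctx;
    assert(index >= 0 && index < inst->nresults);
    return index;
}
```

For example, a CMPXCHG instruction would be created with `nresults = 2` (succ, oldVal), and the client would fetch each result by index rather than calling `new_inst_res` twice.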
# About the number of results
**Can the number of results be determined when constructing the instruction?**
Checklist (*instruction*: *numOfResults*):
- binary ops: 1
- comparing: 1
- conversion: 1
- SELECT: 1
- BRANCH: 0
- BRANCH2: 0
- SWITCH: 0
- CALL: the number of results in the signature
- TAILCALL: 0
- RET: 0
- THROW: 0
- EXTRACTVALUE: 1
- INSERTVALUE: 1
- EXTRACTELEMENT: 1
- INSERTELEMENT: 1
- SHUFFLEVECTOR: 1
- NEW: 1
- NEWHYBRID: 1
- ALLOCA: 1
- ALLOCAHYBRID: 1
- GETIREF: 1
- GETFIELDIREF: 1
- GETELEMIREF: 1
- SHIFTIREF: 1
- GETVARPARTIREF: 1
- LOAD: 1
- STORE: 0
- CMPXCHG: 2 (succ, oldVal)
- ATOMICRMW: 1
- FENCE: 0
- TRAP: as many as the number of type parameters.
- WATCHPOINT: as many as the number of type parameters.
- WPBRANCH: 0
- CCALL: the number of results in the signature (same as CALL).
- NEWTHREAD: 1
- SWAPSTACK:
- 0 if `KILL_OLD`
- n if `RET_WITH<T1 T2 ... Tn>`
- COMMINST: currently all comminsts have a fixed number of return values.

# Are bundles the unit of compiling or the unit of loading?
Issue 66: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/66 (2018-06-28, Kunshan Wang)

# Two different views of "bundle"
From the compiler's point of view, compiling is a process where:
1. There are many modules (such as .class files), all of which need to be compiled (to Mu IR, for example). Modules may have inter-dependencies, and there could even be mutual recursion (A imports B, B imports C, and C imports A).
2. Compilers should compile each module separately, with no knowledge of other modules. This implies each module compiles to a stand-alone bundle that has all the necessary things (types, functions, ...) used inside the bundle. Since types use "structural equivalence", it does not matter if two structurally isomorphic types are defined twice in two bundles. In particular, functions can be declared multiple times in different bundles.
3. When they are linked, bundles are merged. Different bundles should have no intersections, with one exception: functions of the same name are resolved to be the same, so calling a declared-but-not-defined function may end up calling a function defined in another bundle. (Global cells should have similar properties, too.) We also allow calling functions that are declared but not defined in any bundle, which triggers lazy loading.
In the current Mu design, the micro VM's view of bundles is:
1. There is a global bundle, which includes everything that has ever been loaded up to a given point in time.
2. Bundle is the unit of loading. Bundles are loaded sequentially. (At least it is perceived to be so through the API. The Mu impl can load them in parallel while ensuring sequential consistency.)
3. Each bundle can refer to things (types, functions, ...) defined in the current bundle or the global bundle.
4. Every time a bundle is loaded, the contents (types, functions, ...) are merged into the global bundle. That is, *there is one single global bundle which gets gradually augmented as bundles are loaded*. Conflicts are not allowed. If a new function version (FuncVer) is defined on an existing function, this FuncVer becomes the "most recent version" of the function.
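The "single global bundle that gets gradually augmented" semantics can be sketched as toy code; the flat-array registry and all names are purely illustrative, not anything from a Mu implementation:

```c
#include <string.h>

/* Toy global bundle: a registry mapping function names to their most
   recent version number. Loading a bundle calls load_funcdef once per
   funcdef; redefining an existing function replaces its current version. */
#define MAX_FUNCS 16

typedef struct { const char *name; int current_version; } Func;

static Func global_bundle[MAX_FUNCS];
static int nfuncs;

static void load_funcdef(const char *name, int version) {
    for (int i = 0; i < nfuncs; i++) {
        if (strcmp(global_bundle[i].name, name) == 0) {
            global_bundle[i].current_version = version;  /* re-definition */
            return;
        }
    }
    global_bundle[nfuncs].name = name;                   /* first definition */
    global_bundle[nfuncs].current_version = version;
    nfuncs++;
}
```

Loading a later bundle that defines a new FuncVer for `@Foo.run` simply overwrites the "most recent version" slot; nothing about the earlier bundle needs to be revisited.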
The main difference between the two views is whether we consider bundle loading to be a static, separated and parallel process, or a dynamic and sequentially inter-dependent one.
## Why is Mu designed like this?
The current Mu design is based on the facts that (1) Mu is a run-time JIT compiler, and (2) Mu supports function re-definition. Because Mu is a run-time entity, it is one single thing that lives through the life of the application. It will observe everything the client ever delivers to it (bundles), and this is a temporal process. The micro VM always starts with no knowledge, and the client "teaches" it more and more by loading bundles. So the "global bundle" represents the "current knowledge" the micro VM has about the world (i.e. the types, functions, ... of the client's language). Since the growth of knowledge is a sequential process, it is natural to assume bundles are loaded in a sequence. In this way, if a later bundle refers to things the micro VM already knows (for example, types defined in previously loaded bundles), it does not need to define/declare them again; the bundle can just refer to them by name/ID. The sequential nature also makes it easy to support function re-definition: since bundles form a sequence, a FuncVer in a newer bundle replaces the current "most recent" version in the global bundle.
The separate-compiling approach is the traditional, well-known way C compilers work. But it does not address function re-definition. Re-definition is still an "action" rather than a declaration, and the order of "which FuncVer invalidates which older FuncVer" does matter.
## What the client may want
But compiler writers (of traditional C compilers or Mu clients) may want a certain degree of flexibility for parallel compilation, and there is some aesthetic appeal in "**separate modules should be compiled to separate Mu bundles**". For example, for a JVM client it is more intuitive to generate one Mu IR bundle for each .class file, so that each .class file can be compiled separately while still allowing lazy loading. For example:
```java
//// Foo.class
public class Foo {
public static void run() { Bar.run(); }
}
//// Bar.class
public class Bar {
public static void run() { Foo.run(); }
}
```
The separate-compiling model will deliver two Mu bundles:
```
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
.funcdecl @Bar.run ... // Declare @Bar.run in Bundle1
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
.funcdecl @Foo.run ... // Declare @Foo.run in Bundle2
```
That is, `@Bar.run` is declared in Bundle1 and `@Foo.run` is declared in Bundle2. They declare functions in each other because neither has knowledge of the other.
However, in the current Mu model, the two bundles will look like:
```
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
.funcdecl @Bar.run ... // Declare @Bar.run in Bundle1
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
```
The difference is subtle: Bundle2 does not declare `@Foo.run`, because it knows Bundle1 is loaded before it, and `@Foo.run` is already defined.
It is arguable that this requires the two bundles to be built sequentially. But it can be worked around by "lifting" both declarations into a third bundle:
```
//// Bundle0:
.funcdecl @Bar.run ... // Declare @Bar.run in Bundle1
.funcdecl @Foo.run ... // Declare @Foo.run in Bundle2
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
```
Declaring functions is faster than defining them. After Bundle 0 is loaded, Bundle 1 and Bundle 2 can be built and loaded in parallel.
It is also arguable that "lifting both declarations into a separate bundle" is a redundant step. But in practice, this step cannot be avoided. Take Java as an example again. If one Java ClassLoader visits both Foo.class and Bar.class, then it already knows both classes, and it can simply build both into a single Mu bundle rather than splitting them into two. If two Java ClassLoaders attempt to load Foo and Bar in parallel, discover the inter-dependency, and also find each other working on the two respective .class files simultaneously, then the ClassLoaders need a synchronisation mechanism so that classes are not loaded twice. This is necessary even in existing non-Mu production JVMs. So *if there is a need to compile two inter-dependent Java classes in parallel, the client has to factor out the common parts, which naturally leads to the "Bundle0"*.
An orthogonal issue is about the type system. Assume we have the two Java classes:
```java
class Foo { Bar bar; }
class Bar { Foo foo; }
```
Naturally `@Foo` should be `struct<@JavaHeader ref<@Bar>>`. However, without looking at Bar.class, we cannot define the type `@Bar`, which is supposed to match the structure of the Java class fields in Bar. So if we enforce lazy loading, then Foo.bar has to be represented as `ref<void>` rather than `ref<@Bar>`. This has been [discussed in a separate issue before](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/38). **The separate-compiling model does not solve this problem**, because the crux is that the **knowledge** of Bar is only obtained by looking at Bar.class. Unlike declared-but-not-defined functions, having **types** that are not yet known (the C language calls them "incomplete types") causes many problems. These types are inaccessible. If traps should be triggered when a type is used, it is hard to define what "a type is used" means. If we define it as accessing an object of that type, or simply performing a BinOp on such a type, then almost all instructions can trigger traps.
## Conclusion
In the end, we still believe the current Mu design is reasonable for its purpose as a JIT compiler.
The current "single global bundle" design is also easier for the boot image writer because there is only one bundle to consider.
But we may consider the expectation of programming-language implementers that "modular languages should be compiled to modular object code". The implications of adopting this model are still not clear. Alternatively, this model could also be implemented in a layer above Mu.
# ASM-style IR builder
Issue 56: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/56 (2018-06-28, John Zhang)
*Created by: wks*
This issue discusses a higher-level abstraction over the IR builder API. It will allow the client to construct a Mu IR CFG in a stateful style. The stateful builder will hold a pointer to the "current basic block" at any time. New instructions are implicitly appended to the end of the current basic block. Such an interface can also emulate fall-through-style ASM instructions, such as JL, JE, JNE, etc.
It is a layer above the API. The muapi.h should still be kept minimal.
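A minimal sketch of the stateful-builder idea, using hypothetical names (`Builder`, `begin_bb`, `emit`) rather than anything from muapi.h: the builder remembers the current basic block, and each `emit` appends to it, so client code reads like a linear assembly listing:

```c
#include <stdlib.h>

/* Toy IR types: a basic block is a singly-linked list of instructions. */
typedef struct Inst { const char *mnem; struct Inst *next; } Inst;
typedef struct BB { const char *name; Inst *head, *tail; } BB;
typedef struct { BB *cur; } Builder;   /* the only state: the current block */

/* Start a new basic block and make it current. */
static BB *begin_bb(Builder *b, const char *name) {
    BB *bb = calloc(1, sizeof *bb);
    bb->name = name;
    b->cur = bb;                        /* subsequent emits go here */
    return bb;
}

/* Append an instruction to the end of the current basic block. */
static void emit(Builder *b, const char *mnem) {
    Inst *i = calloc(1, sizeof *i);
    i->mnem = mnem;
    if (b->cur->tail) b->cur->tail->next = i;
    else b->cur->head = i;
    b->cur->tail = i;
}
```

The client would then write `begin_bb(&b, "entry"); emit(&b, "ADD"); emit(&b, "BRANCH");` without threading block handles through every call.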
There is a problem in implementation: such a builder is easy to build for SSA form, but since we have switched to the "goto-with-values" form, more book-keeping needs to be done in the client. Probably we still need a soup of objects in the client, do liveness analysis, and convert SSA to goto-with-values.

# Benchmarking framework for Mu projects
Issue 71: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/71 (2018-06-28, Zixian Cai)
We'd want a benchmarking framework to run various tests across different Mu projects (implementations, clients).
This will help us discover bugs or performance problems introduced in commits. It can also facilitate development by showing, for example, what the impact of design decisions is.
Currently, we have a benchmark suite developed by @u5157779 under https://gitlab.anu.edu.au/mu/mu-impl-fast/tree/master/tests/test_jit
It compares the performance of
- RPython with Mu backend (which can be run on different Mu implementations)
- RPython with C backend
- Hand-written C
- Hand-written Mu (which can be run on different Mu implementations)
Some of these are specific to a certain Mu client (in this case, Mu backend of RPython).
For a more general framework, the following client-neutral aspects can be abstracted out:
- Collecting, storing and processing metrics
- Visualizing
Then, each project can have its own agent for the framework to invoke for executing tests and collecting results.
cc @U1817699 @u3789498 @u4776528 @u5157779Zixian CaiZixian Caihttps://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/32Better support for the tagref64 type2015-05-21T17:20:07+10:00John ZhangBetter support for the tagref64 type*Created by: wks*
In dynamic languages, the `tagref64` type (or other future tagged reference type variants) will be used pervasively in the language runtime.
This issue summarises potential improvements on the support for such types.
# Tagged reference constant
The Mu IR currently does not have a constant for `tagref64`, mainly because it may hold a reference and non-NULL references cannot be constants. However, one possible use of the `tagref64` type is to store a NULL reference together with an `int<6>` tag. In this case, the tag determines the concrete thing it is representing (undefined, nil, null, false, true or other frequently used singleton objects). So it should be possible in Mu to create such a `tagref64` as a constant.
Proposed new syntax:
```
.const @name <@tagref64> = TR64FP @double_constant
.const @name <@tagref64> = TR64INT @int52_constant
.const @name <@tagref64> = TR64NULLTAG @int6_constant // The ref is NULL, the tag is @int6_constant
.const @double_constant <@double> = 3.14d
.const @int52_constant <@i52> = 0x123456789abcd
.const @int6_constant <@i6> = 30
```
# Tagged reference equality
Comparing floating point numbers bit by bit is not equivalent to IEEE754's definition of "equality". However, when two `tagref64` values both hold integers or references+tags, the result is deterministic.
In dynamic languages, such comparisons can quickly determine whether two tagged references have the same type (identified by the tag part) and refer to the same object.
Proposed semantics of `EQ` comparison between `tagref64` values:
The result of the `EQ` comparison between `v1` and `v2` is 1 (true) if and only if any of the following is true:
* Both hold `double` values, and
  * neither is NaN and both have the same bit-wise representation, or
  * both are NaN and they happen to have the same bit-wise representation after being converted to `tagref64`.
* Both hold `int<52>` values and they are bit-wise equal.
* Both hold references, and
  * their references refer to the same object or both are NULL, and
  * their `int<6>` tags are bit-wise equal.
The `NE` instruction returns the opposite result of `EQ`.
> NOTE: `tagref64` uses the NaN space of `double`. Real NaN `double` values may lose their precise bit-wise representation when converted to `tagref64`. So comparing two `tagref64` values both holding NaNs has an unspecified result.
*Alternative possibility*: Require Mu to canonicalise all NaNs to one unique bit-wise representation. In this way, all NaNs compare equal when comparing `tagref64` values bit by bit.
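Under the proposed semantics, `EQ` on two `tagref64` values reduces to a plain bit-wise comparison of the boxed 64-bit words. A minimal sketch in C, assuming a hypothetical NaN-boxing layout (the real Mu encoding is implementation-defined, and `tr64_from_nulltag` below is an invented helper):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t tagref64;  /* the boxed 64-bit representation */

/* Boxing a double is a plain bit copy in this representation. */
static tagref64 tr64_from_double(double d) {
    tagref64 r;
    memcpy(&r, &d, sizeof r);
    return r;
}

/* Hypothetical encoding of a NULL ref with an int<6> tag: a quiet-NaN
 * pattern with the tag in the low bits (NOT the real Mu encoding). */
static tagref64 tr64_from_nulltag(uint8_t tag6) {
    return 0x7FF8000000000000ull | (uint64_t)(tag6 & 0x3F);
}

/* Proposed EQ: bit-wise equality of the words. Equal non-NaN doubles,
 * equal int<52> payloads, and identical ref+tag pairs all share one
 * bit pattern; NaN-vs-NaN results remain unspecified. */
static bool tr64_eq(tagref64 a, tagref64 b) { return a == b; }
```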
# Default values of `tagref64` types.
Currently the default value of `tagref64` is all zero bits (all newly-allocated memory, whether heap, stack or global, holds all zero bits), which represents +0.0 as a `double` value. In this representation, all `tagref64` values holding `double` contents are bit-wise equal to their real `double` representation, so converting a `tagref64` to `double` is trivial: just do a bitcast.
However, languages usually define the values of uninitialised variables/fields as null-like values: `undefined` in JS, `nil` in Lua, `null` in Java. There should be an option to make the all-zero bit pattern represent these null types.
There could be a flag to determine the zero value of a `tagref64` type. The proposed syntax is:
```
.typedef @tr64_with_fp_default = tagref64 <DEF_FP(3.14d)> // All 0s represents double value 3.14d
.typedef @tr64_with_ref_default = tagref64 <DEF_REF(0x5a)> // All 0s represents NULL ref with 0x5a as tag.
.typedef @tr64_with_int_default = tagref64 <DEF_INT(0x55aa55aa55aa5)> // All 0s represents integer 0x55aa55aa55aa5.
.typedef @tr64_as_current = tagref64 <DEF_FP(0.0d)> // All 0s represents double value 0.0d, which is the same as the current `tagref64`.
```
The kind of default is static metadata, and the garbage collector can identify it.
This can be implemented by applying an XOR mask on the value after encoding to `tagref64` and before decoding an existing `tagref64`.
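The XOR-mask idea can be sketched as follows (helper names are hypothetical): let the mask be the encoded bit pattern of the chosen default, XOR it in after encoding and out before decoding, so that all-zero fresh memory decodes to the default.

```c
#include <stdint.h>

/* XOR-mask trick for configurable tagref64 defaults. `mask` is the
 * encoded bit pattern of the default value chosen in the typedef. */
static uint64_t tr64_store(uint64_t encoded, uint64_t mask) {
    return encoded ^ mask;  /* applied after encoding a value */
}

static uint64_t tr64_load(uint64_t stored, uint64_t mask) {
    return stored ^ mask;   /* applied before decoding a stored word */
}
/* All-zero fresh memory then loads as the mask itself, i.e. the
 * encoded default, and store-then-load is the identity. */
```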
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/67C thread becoming Mu thread (exposed functions, a.k.a. ".expfunc")2018-06-28T23:11:33+10:00Kunshan WangC thread becoming Mu thread (exposed functions, a.k.a. ".expfunc")
This issue is about calling Mu functions from C functions. It is not a problem if Mu initiated the call to the native program and the native program then calls back. But when a fresh native thread (such as one created by `pthread_create`) directly calls a Mu function, thread-local states (such as GC states) must have been initialised, or the Mu program will not work properly.
Related spec: https://gitlab.anu.edu.au/mu/mu-spec/blob/master/native-interface.rst#native-functions-calling-mu-functions
Previous issue: https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/39
# The problem
When a Mu thread is executing, there are thread-local states that need to exist to support the execution of Mu IR programs.
For example, if the Mu IR program uses bump-pointer GC, the "current pointer" is a per-thread state, and it should point to the next available memory all the time. Mu instructions (such as `NEW` and `NEWHYBRID`) assume that such thread-local pointers have been set up when they are executed.
Such states are usually set up when a Mu thread is created. When a thread is created using the `NEWTHREAD` instruction or its equivalent API, the micro VM will initialise the states properly.
But the problem arises when the thread is created natively (for example, by `pthread_create`). Such **POSIX functions are not designed with Mu in mind** and will not initialise Mu-specific states. So a pthread cannot directly call a Mu function unless some preparation is done.
# Current design
Related spec: https://gitlab.anu.edu.au/mu/mu-spec/blob/master/native-interface.rst#native-functions-calling-mu-functions
The current Mu spec requires **implementation-defined** functions to be called before native threads not created by Mu (such as POSIX threads) can call any exposed Mu functions.
A Mu bundle can define `.expfunc` top-level definitions to directly expose pointers to C programs. For example:
```
.funcdef @fac ... {...}
.expfunc @fac_native = @fac #DEFAULT @I64_0 // expose @fac, default calling convention, use 0 as "cookie".
```
`@fac_native` is a raw function pointer which can be **called back** when Mu calls C and C then calls back into Mu. But when a pthread wants to call `@fac_native`, implementation-defined set-up is needed first.
## Possible implementations
* The concrete micro VM can forbid such calls, and enforce that only Mu threads can execute Mu functions.
* The concrete micro VM can extend the API with a function to attach or detach PThreads, or threads using other APIs.
* The concrete micro VM can create Mu-specific thread-local states lazily when entering Mu from native code. Since the only way to enter Mu is via "exposed functions", stubs can be created at those "expfuncs" to lazily check for such states, or SIGSEGV can be used to trap when such pointers are zero.
Each has its own strengths and weaknesses. This is why this interface is still implementation-defined for now. Real-world experience will tell which method is better.
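The lazy-attachment approach (the third bullet above) can be sketched as follows; every name here is hypothetical and not part of the Mu API. Each exposed-function stub checks a thread-local flag and initialises the Mu-specific thread-local state on first entry from native code.

```c
#include <stdbool.h>

/* Thread-local flag: has this native thread been attached to Mu? */
static _Thread_local bool mu_thread_attached = false;

static void mu_attach_current_thread(void) {
    /* Would set up thread-local allocation buffers, GC yield-point
     * registration, a Mu stack object, etc. (hypothetical). */
    mu_thread_attached = true;
}

static long fac_impl(long n) {   /* stands in for the Mu function @fac */
    return n <= 1 ? 1 : n * fac_impl(n - 1);
}

long fac_native(long n) {        /* stub behind the exposed @fac_native */
    if (!mu_thread_attached)
        mu_attach_current_thread();  /* lazy set-up on first call */
    return fac_impl(n);
}
```

An alternative to the explicit check is the SIGSEGV trick mentioned above: leave the thread-local pointers zero and attach in the fault handler.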
## Multiple micro VMs in the same process?
It is rare that there will be one process running two micro VMs. But it is definitely possible. For example:
* A C host program provides both Python and Lua as extension languages (real-world applications exist), but both language implementations use the Mu micro VM.
* The client has some kind of sandbox mechanism and forces some part of the program to run in a separate micro VM.
# Related works
## JNI Invocation API
Related document: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#attaching_to_the_vm
The JVM invocation API provides the `AttachCurrentThread` function to attach a PThread to a JVM, under the limitation that a native thread cannot be attached to two different JVMs. JNI also requires that the PThread stack "should have enough stack space to perform a reasonable amount of work", noting that "The allocation of stack space per thread is operating system-specific. For example, using pthreads, the stack size can be specified in the pthread_attr_t argument to pthread_create."
From Mu's point of view, the `MuCtx` structure holds Mu states for the client, so calling API functions in `MuCtx` does not need any attaching. However, calling "exposed Mu functions" will need special set-up like `AttachCurrentThread`.
## JikesRVM
JikesRVM's GC is designed in such a way that it will work even if the related thread-local data structure is all zero (as initialised by the system). This gracefully avoids the problem for GC, but it may not be the most general solution.
## .NET framework
Related documents: https://msdn.microsoft.com/en-us/library/74169f59(v=vs.110).aspx
VM-related thread-local states are created lazily when an unmanaged thread enters the managed runtime.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/39Call-back from native to Mu2016-09-06T20:28:02+10:00John ZhangCall-back from native to Mu*Created by: wks*
# Overview
## Rationale
Some existing C libraries or system interfaces use call-back functions, i.e. user-provided function pointers which are called by C or system libraries. Mu should provide appropriate mechanisms to interface with those libraries.
This is part of the (unsafe) native interface. See super issue: https://github.com/microvm/microvm-meta/issues/24
## Exposing appropriate Mu functions as C-style function pointers
"Appropriate" Mu functions must only use the following types as their parameter types or return types: `int<n>`, `float`, `double`, `vector<T>`, `ptr<T>` or `struct` types whose components are these types. In the case of `ptr<T>`, `T` can also be `array<T n>` or `hybrid<F V>` where `T`, `F` and `V` are one of the above types. In other words, (traced) references and Mu-specific opaque types are not allowed.
The Mu ABI will be designed to be compatible with the C calling convention as defined by the platform ABI.
**way 1**: (simple) Mu functions are declared with the optional `WITH_FP` clauses to create their associated C-style function pointers. For example:
```
.funcdecl @some_func WITH_FP(@fp_some_func DEFAULT @COOKIE) <@sig>
.funcdef @other_func VERSION @other_func_v1 WITH_FP(@fp_other_func DEFAULT @COOKIE) <@sig2> (%param0) {
...
}
```
With the above definitions, `@some_func` has type `func<@sig>`, which is a Mu function reference value. `@fp_some_func` has type `funcptr<@sig>`, which is a C-style function pointer. Similarly `@other_func` is a `func<@sig2>`, while `@fp_other_func` is a `funcptr<@sig2>`. `DEFAULT` is the calling convention. `@COOKIE` is a "cookie" (see *way 2* below).
The Mu IR program or the API can pass the function pointer to the native program. When called, the Mu function will run and return its return value to the native caller.
* pros:
  1. simple
  2. The native funcptr is immediately available after loading the Mu bundle.
* cons: does not support "closures" well. Some languages/implementations (e.g. LuaJIT) would like to expose closures (rather than just functions) to C as callbacks.
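From the native side, the pointer created by way 1 behaves like an ordinary C function pointer using the platform's calling convention. A sketch with a stand-in body (the `(long) -> long` signature and all names are made up for illustration):

```c
/* A Mu function exposed via WITH_FP appears to C as a plain function
 * pointer with the platform's C calling convention. */
typedef long (*mu_exposed_fn)(long);

/* Stand-in for the machine code behind @fp_some_func. */
static long square_impl(long x) { return x * x; }

static mu_exposed_fn fp_some_func = square_impl;

/* The native program calls through the pointer like any C function. */
static long call_from_c(long x) {
    return fp_some_func(x);
}
```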
**way 2**: (complex) Mu functions are exposed with a run-time invocation of a Mu instruction or a Mu API message.
Format:
* Instruction: *fp* = `EXPOSE_MU_FUNC` `<` *sig* `>` *mufunc* *cookie*
* API: *fpHandle* = ca.exposeMuFunc( *hMuFunc*, *hCookie* )
The resulting *fp* has type `funcptr<sig>` and can be called from C. A function can be exposed multiple times, and the resulting function pointers are mutually unequal. The *cookie* is an `int<64>` value associated with the resulting function pointer. If a Mu function is called through a particular function pointer, a special instruction `NATIVE_COOKIE` will return the associated *cookie* value.
Example:
```
%fp1 = EXPOSE_MU_FUNC <@sig> @some_func @some_int64_value
%fp2 = EXPOSE_MU_FUNC <@sig> @some_func @other_int64_value
...
UNEXPOSE_MU_FUNC %fp1
UNEXPOSE_MU_FUNC %fp2
// in @some_func
%cookie = NATIVE_COOKIE
%eq = EQ <@i64> %cookie @some_int64_value
...
```
```
val hFP = ca.exposeMuFunc(hFunc, hSomeInt64Value)
...
ca.unexposeMuFunc(hFP)
```
Both `%fp1` and `%fp2` have type `funcptr<@sig>`. But if the Mu function `@some_func` is called from C via `%fp1`, the `NATIVE_COOKIE` instruction will return `@some_int64_value`. If called via `%fp2`, then `NATIVE_COOKIE` returns `@other_int64_value`, instead.
* pro: the cookie can be used to identify different closures and look up the contexts of the closures.
* con:
  1. Not as simple as way 1.
  2. Exposing a Mu function requires a Mu instruction or an API message. This makes "implementing the Mu client API directly as exposed Mu functions" difficult. (In this case, exposing a Mu function requires an API function, which is also an exposed Mu function.)
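To illustrate the pro above: a client might use the cookie as an index into a table of closure environments, so many exposed pointers can share one Mu function body. Everything here is a hypothetical client-side sketch, not part of the Mu API.

```c
/* The cookie identifies which closure a shared function body is acting
 * for. Here the cookie indexes a client-managed environment table. */
typedef struct {
    int upvalue;  /* captured state of one closure */
} ClosureEnv;

#define MAX_CLOSURES 16
static ClosureEnv env_table[MAX_CLOSURES];

/* Stands in for the Mu function body; `cookie` models the value the
 * NATIVE_COOKIE instruction would return for the entry point used. */
static int closure_body(long cookie, int arg) {
    ClosureEnv *env = &env_table[cookie];  /* cookie as table index */
    return env->upvalue + arg;
}
```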
## Contexts necessary for Mu functions to run
Even if a Mu function is exposed to the native program as a `funcptr<sig>`, some contexts must be set up so that the Mu function can make use of Mu-specific features. These include:
* **Thread-local garbage collection states**: including thread-local allocation pools, and registering the thread for yielding as requested by the GC.
* **Stack context**: Each Mu stack has an associated `stack` value (the opaque reference to the current stack). This is necessary for swap-stack.
Similar to the JNI's "attaching a native thread to the JVM", Mu will also require attaching Mu contexts to a native thread before any exposed Mu function pointers can be called.
If the native program is executed because some Mu program called the native function through the native interface (via `CCALL`), the context is already set up and the C program can safely call back to Mu.
## Mixed native/Mu stacks
With the possibility of both C-to-Mu and Mu-to-C calling, a stack may have mixed C or Mu frames. It has some implications for stack introspection and exception handling. Possible approaches are:
1. Stack introspection cannot go deeper than the last contiguous Mu frame from the top, i.e. introspection stops as soon as a native frame is reached. Exceptions may not go into native frames. This approach makes the weakest promise from Mu, and is thus the easiest.
2. Mu can skip non-Mu frames and unwind to other Mu frames underneath.
3. Stack introspection and stack unwinding caused by exceptions can go through frames which are supported by the native debugger. This is harder than the previous one, but still practicable.
4. Support non-standard frames (such as JavaScript frames of SpiderMonkey or V8). Too hard.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/13Change the term "SSA Value" to "SSA Variable"2018-06-28T23:11:32+10:00John ZhangChange the term "SSA Value" to "SSA Variable"*Created by: wks*
The term "SSA Value" currently used in the µVM is confusing. People tend to think "value" means a data value, that is, an instance of a type, representable in binary. But what it actually means in the µVM is a **thing** that can **generate** a data value.
# The current conceptual model
The current hierarchy is:
* **SSA Value** (the union of all such things)
  - **Global SSA Value** (defined globally, actually constants, written as `@a`, `@b`, `@c`, ...)
    + **Declared constants** (defined literally as constants)
    + **Global memory references** (internal references to the global memory, which never moves. Hence the reference is constant)
    + **Functions** (There is a constant "handle" for each function, which does not change even when a function is redefined. So it is effectively a constant.)
  - **Local SSA Values** (defined locally, determined during program execution, written as `%a`, `%b`, `%c`, ...)
    + **Parameters** (passed to functions, determined at each call site)
    + **Instructions** (computations that generate data values, or void, the "unit" value. The resulting data value is determined every time the instruction is executed)
The other confusion is that currently "an instruction **is an** SSA Value". When an instruction uses another instruction as its parameter, like:
```
.const @a <int<64>> = 0x123456789abcdef0
%x = ADD <int<64>> @a @a
%y = SUB <int<64>> %x @a
```
In the above example, `@a`, `%x` and `%y` are all names of SSA Values. In the definition of `%y`, it means "`%y` **is** a SUB instruction and it takes the **computing result of** instruction `%x` and **the data value of** declared constant `@a` as arguments".
When the Client runs a trap handler, the names identify the TRAP instructions.
```
%trap0 = TRAP KEEPALIVE (%a %b %c %d ...)
```
Then the client will see "the `%trap0` instruction caused the trap".
The above model, though internally consistent, confused many people.
# The use in LLVM
* Value
  - Argument
  - BasicBlock
  - InlineASM
  - MDNode
  - MDString
  - User
    + Constant
      * BlockAddress
      * ConstantInt
      * ConstantFP
      * ConstantPointerNull
      * ConstantStruct
      * ConstantVector
      * ConstantExpr
        - BinaryConstantExpr
        - CompareConstantExpr
        - ...
      * GlobalValue
        - GlobalAlias
        - GlobalObject
          + Function
          + GlobalVariable
    + Instruction
      - BinaryOperator
      - CmpInst
      - PHINode
      - CallInst
      - UnaryOperation
        + LoadInst
        + AllocaInst
        + ...
      - StoreInst
      - TerminatorInst
        + BranchInst
        + ReturnInst
        + InvokeInst
        + ...
      - ...
    + Operator
      - ...
LLVM defines Value as (http://llvm.org/doxygen/classllvm_1_1Value.html):
> LLVM Value Representation.
>
> **This is a very important LLVM class. It is the base class of all values computed by a program that may be used as operands to other values.** Value is the super class of other important classes such as Instruction and Function. All Values have a Type. Type is not a subclass of Value. Some values can have a name and they belong to some Module. Setting the name on the Value automatically updates the module's symbol table.
>
> Every value has a "use list" that keeps track of which other Values are using this Value. A Value can also have an arbitrary number of ValueHandle objects that watch it and listen to RAUW and Destroy events. See llvm/IR/ValueHandle.h for details.
The µVM's hierarchy is a subtree of LLVM's Value hierarchy. As in LLVM, instructions that do not return values (like ReturnInst, BranchInst, ...) are also a kind of "Value".
# Alternative notation
An SSA Variable is a thing that holds a data value. A data value is an instance of a type.
An SSA Variable has exactly one definition and the definition corresponds to exactly one SSA Variable. That definition can be a declaration of constant, global memory, function, parameter or instruction.
The revised hierarchy is:
* **SSA Variable**
  - **Globally defined SSA Variable** (written as `@a`, `@b`, `@c`, ...)
    + defined by **Declared constants**
    + defined by **Global memory references**
    + defined by **Functions**
  - **Locally defined SSA Variable** (written as `%a`, `%b`, `%c`, ...)
    + defined by **Parameters**
    + defined by **Instructions**
This does not change the hierarchy. In an implementation, it is practical to let class `Instruction` extend class `SSAValue`. In the documentation, we say, if `%a = ADD <int<64>> %b %c`, then:
* `%a` is an SSA Variable
* `%a` is defined by the `ADD <int<64>> %b %c`
* That instruction is bound to the SSA Variable `%a`
In this example:
```
.const @a <int<64>> = 0x123456789abcdef0
%x = ADD <int<64>> @a @a
%y = SUB <int<64>> %x @a
```
We say, `@a`, `%x` and `%y` are all SSA Variables. `%y` is **defined by** a SUB instruction. It takes the **values held by** the SSA Variables `%x` and `@a` as arguments.
In a trap handler in the Client for the following TRAP:
```
%trap0 = TRAP KEEPALIVE (%a %b %c %d ...)
```
The client will see "the TRAP instruction that **defines** `%trap0` caused the trap".
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/41Checklist for the first final version of the specification2018-06-28T23:11:32+10:00John ZhangChecklist for the first final version of the specification*Created by: wks*
- [X] HAIL #29
- [X] Unsafe native interface. #24
- [X] API in C (header)
- [X] API in C (documented)
- [X] API as Mu instructions (common instructions). #36
- [X] Adjust the undefined function API
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/7Choose between swap-stack with and without parameters.2016-06-17T15:22:41+10:00John ZhangChoose between swap-stack with and without parameters.*Created by: wks*
Coroutines may communicate with each other by yielding and, at the same time, passing data between stacks. There are two ways this can be done.
1. Allocate a memory region in one stack using ALLOCA and let the other stack write data into that region.
2. Let the swap-stack operation take parameters.
The first approach is currently preferred by @wks and @hosking, but [Dolan et al.](http://dl.acm.org/citation.cfm?id=2400695) proposed the second approach; the paper does not discuss the difference between the two.
We need to choose the appropriate one.spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/20Client languages in play2016-06-17T15:23:09+10:00John ZhangClient languages in play*Created by: eliotmoss*
Maybe this is also being discussed elsewhere -- if so, we can close this and move the discussion there. Anyway, on the US side we are wondering about the specific suite of client languages being worked on. We have at least one student interested in doing a mapping from Java bytecode to μVM IR, and we also wonder about some of the dynamic languages (Javascript, maybe? I should perhaps come back and edit this after this week's meeting with the students). What about C or C--? For C we could look at lcc, a little C compiler originally designed for a compiler class, I think, but it accepts some standard kind of C, as I recall. gcc is more of a bear, but ought to be on the agenda at some point, no?https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/19Code generator prototyping strategies2016-06-17T15:23:07+10:00John ZhangCode generator prototyping strategies*Created by: eliotmoss*
As we start to ramp up in Amherst, we're pondering how best to get things rolling (i.e., what a good starting point is) and where we want to end up.
**Initial prototype thought**
For our initial prototype we are thinking about using QEMU (locally pronounced KEE-mew), the Quick Emulator. QEMU is a whole machine emulator (but can also be configured to run as user-mode only) that supports emulating one hardware ISA on another one (and same to same, of course). Being whole-machine it deals with devices, interrupts, memory mapping, etc., but the part of greatest interest to us is that it takes guest ISA code blocks and dynamically translates them to host machine code, maintaining the translations in a code cache. Thus parts of its design will be great for dynamic and adaptive code generation. QEMU works by translating guest code into QEMU IR, which is then passed to a target code generator. That code generator does a useful level of local register allocation and local optimizations (hence the Quick in Quick Emulator).
For our case, μVM IR would be the guest ISA and our target would be the host ISA that QEMU targets. A part of how QEMU does things is that the guest register file is represented as a memory data structure and (to first approximation) the host registers are used for local register allocation. To a finer approximation I **believe** QEMU can bind some guest registers to host registers, useful where the host register set is larger. For μVM IR the "registers" are the SSA values and we can probably get some mileage from the SSA form in telling the QEMU register allocator helpful things about when things "die", etc.
Downsides of this prototype approach include:
* Dependence on QEMU
* Code quality may be limited, particularly in that it will tend to push all variables to memory as it crosses from block to block
* It is not clear what the situation is with QEMU and multithreading, though it **may** support concurrent execution
Upsides:
* Fast path to getting native code going
* Useful optimizations, especially for a dynamic environment
* Back-end architecture design that we might want to use as a guide in our own work
* Already implements major platforms of interest
**Eventual system design thought**
Down the line we may want something like QEMU's code generation design, with more provision for (optional) higher levels of optimization and register allocation across larger scopes (that is, for whole functions, not just blocks/traces). But their internal IR, code generators, etc., might be a very useful base, as well as having a notion of the code cache design and so forth.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/69Comparing ref<T> against ref<U>2018-06-28T23:11:33+10:00Kunshan WangComparing ref<T> against ref<U>
# The problem
Currently the cmpOp instructions take only one type parameter, and both operands must have the same type.
```
%result1 = EQ <int<32>> %a1 %b1
%result2 = EQ <int<64>> %a2 %b2
%result3 = FEQ <float> %a3 %b3
%result4 = EQ <ref<T>> %a4 %b4
%result5 = EQ <iref<T>> %a5 %b5
%result6 = EQ <funcref<sig>> %a6 %b6
%result7 = EQ <uptr<T>> %a7 %b7
%result8 = EQ <ufuncptr<sig>> %a8 %b8
```
In object-oriented programming, we sometimes want to compare a `ref<T>` against `ref<U>`, where T is a superclass of U. Mu does not know OOP, but Mu has the [prefix rule](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/memory.rst#prefix-rule) so that a value of type `ref<T>` can actually refer to an object of type `U` as long as `T` is a prefix of `U`. This allows OOP to be implemented in the Mu type system.
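The prefix rule can be modelled in C with struct layout (illustrative only, not how Mu defines it): a pointer to the prefix type may refer to the larger object, and identity comparison does not depend on the static pointer type.

```c
/* Super's fields are a prefix of Sub's, so a Super* may point to a
 * Sub object: the C analogue of Mu's prefix rule. */
typedef struct { int class_id; } Super;
typedef struct { int class_id; int extra; } Sub;

/* Reference equality is address equality, regardless of static type. */
static int same_object(const void *a, const void *b) {
    return a == b;
}
```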
For this reason, comparing `ref<T>` against `ref<U>` for equality is meaningful: the result is true iff both references refer to the same object, or both are NULL. The semantics of `EQ` is actually [defined this way in the spec](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#comparison), but the spec requires both operands to have the same `ref<T>` type.
```
// Assume %a is ref<T> and %b is ref<U>
%result = EQ <ref<T>> %a %b // Disallowed in the spec because %b is a ref<U>. But the refimpl does not check the type parameter, so it works for now.
```
To work-around this problem, the client can use `REFCAST` to cast both operands to the same type (such as `ref<void>`) before comparing:
```
// Assume %a is ref<T> and %b is ref<U>
%aa = REFCAST <ref<T> ref<void>> %a
%bb = REFCAST <ref<U> ref<void>> %b
%result = EQ <ref<void>> %aa %bb
```
This would potentially make the Mu instruction stream very verbose.
# Simplest solution
The simplest work-around is to let the comparison instructions ignore the type parameter when comparing two `ref`, `iref`, `funcref`, `uptr` or `ufuncptr` values.
The code will look like:
```
// Assume %a is ref<T> and %b is ref<U>
%result = EQ <ref<Blah>> %a %b // The micro VM ignores the Blah
```
This will elide the two `REFCAST` instructions. Similarly, when comparing `iref<T>` and `uptr<T>`, `T` is ignored; the `sig` signature in `funcref<sig>` and `ufuncptr<sig>` is likewise disregarded. If this behaviour is *standardised*, the client can rely on it and emit fewer instructions.
## What the micro VM sees
When the micro VM sees this instruction: `EQ <ref<Blah>> %a %b`, the compiler knows that both `%a` and `%b` are `ref`s of something, but does not know what `%a` and `%b` refer to (the "Blah" can just be a lie). As long as refs are always represented in the same way (such as pointers to the beginning of the object, which may be moved by the GC), the compiler can still generate code without knowing the object type. **The compiler cares about the storage type**, not the high-level parameterised type.
## Potential side effects (unlikely)
This will require all `ref<_>` types to have the same representation (size, as pointer or as handle) regardless of the type parameter. It prevents the possibility that `ref<T>` and `ref<U>` may have different sizes. But I don't think implementing different refs in different sizes would be useful.
# A more aggressive design
We can push it further by removing all type and signature parameters in `ref<T>`, `iref<T>`, `funcref<sig>`, `uptr<T>` and `ufuncptr<sig>`, so they become simply `ref`, `iref`, `funcref`, `uptr` and `ufuncptr`.
To compensate for the lack of knowledge about the referent type, instructions must be annotated with the referent types. But the micro VM only needs to know the referent type when doing pointer arithmetic (GETFIELDIREF...) and memory access (LOAD, STORE, ...). For example:
```
// Assume %a is an iref to T, and T is a struct
%b = GETFIELDIREF <T 3> %a
// Assume %c is an iref to int<64>
%v = LOAD <int<64>> %c
```
Actually this is *the same as the current Mu IR*. The type annotations on the instructions are intended to ease the job of the Mu-to-machine compiler inside the micro VM.
If the type parameters are discarded, REFCAST becomes unnecessary, and PTRCAST only casts between pointers and integers, not between pointers.
But the Mu IR programs themselves will carry less information about the destination of refs/uptrs. It may make the behaviour of the program harder to reason about. But since the client can perform REFCAST at any time, it can always choose to cast all refs to `ref<void>`, and still write correct programs.
It is unlikely that we will adopt this aggressive design soon, but may be considered if we redesign the IR.
# Comparing ref against ptr
A related topic is whether it should be allowed to compare `ref` against `ptr`.
The obvious answer is "no". `ref` and `ptr` (as well as `funcref`) do not have the same storage type. A `ref` can be represented as the address of the beginning of the object, and may be modified by the GC when the object is moved; it can also be represented as a handle, or as an <addr, type> pair. A `ptr`, on the other hand, must be treated as a raw address. Even if `ref` is represented as an address, consider an extreme case: a micro VM that performs GC between every pair of instructions, and moves every object each time. That is a valid micro VM implementation, but the address behind any `ref` is then completely non-deterministic.
In some VMs (such as JikesRVM), there are VMMagics that allow getting the address from an object reference, or converting an address into an object reference. In this way, the GC can be implemented in the same language as the one it serves. However, the `addr->objref` and `objref->addr` conversions alone are not enough: such VMs must also have mechanisms to specify **uninterruptible** regions in which GC must not happen, and if the GC is concurrent, further mechanisms to handle this gracefully. All such "magics" are closely tied to the concrete (micro) VM implementation, and should be kept private.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/70: Debugging facilities for Mu clients (Zixian Cai)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/14: Document memory layout requirements (John Zhang)
# Document memory layout requirements
*Created by: wks*
Because of the differences between architectures, it is better to leave the object layout implementation-specific. An implementation of a µVM can choose its optimal strategy to make memory accesses as fast as possible.
However, although the memory layout is part of the implementation, some guarantees must be made to enable:
1. implementation of object-oriented languages: the superclass-subclass relationship can be most conveniently implemented as the superclass being a prefix of a subclass structure.
2. interoperation between µVM programs and C programs: during a foreign function call, data structures are shared so that data can be passed both ways.
3. atomic access to some basic data structures. References are one such type.
# Prefix rules
A type may be **a prefix of** another type. If T1 is a prefix of T2, then there are **shared components** between T1 and T2. A component can be the whole value or some part of it, including fields in structs, elements in arrays and vectors, and both the fixed part and the variable part of hybrids.
Specifically:
* Any type is a prefix of itself.
+ All corresponding components are shared.
* `void` is trivially a prefix of any type.
+ No component is shared.
* `T1 = T` is a prefix of `T2 = struct <SEQ>` for any T where SEQ is a sequence of types beginning with T.
+ The whole T1 itself is a shared component with the first field in the struct T2.
* `T1 = T` is a prefix of `T2 = hybrid<T U>` for any T.
+ The whole T1 is a shared component with the fixed part in the hybrid T2.
* `T1 = T` is a prefix of `T2 = array<T n>` for any T if n >= 1.
+ The whole T1 is a shared component with the first element in array T2.
* For all types T1, T2 and T3, if T1 is a prefix of T3 and T3 is a prefix of T2, then T1 is a prefix of T2.
+ The shared components between T1 and T2 are their mutual shared components with T3.
Examples:
* `float` is a prefix of `struct<float double>`.
+ The first field is shared.
* `struct<@TIB_REF @LOCK @LENGTH_TYPE>` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`.
+ The fixed part of the latter type is shared with the former type.
* `int<8>` is a prefix of `array <int<8> 100>`.
+ The first byte element of the latter is shared with the former.
* `@TIB_REF` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`
+ There is an intermediate type `struct<@TIB_REF @LOCK @LENGTH_TYPE>` that bridges the "is a prefix of" relation.
If:
* There is a memory location M which represents data of type T2, and,
* T1 is a type and T1 is a prefix of T2, and,
* r1 is an `iref<T1>` and refers to memory location M1, and,
* r2 is an `iref<T2>` and refers to memory location M, and,
* the beginning of M and M1 are the same (i.e. they have the same address), and,
* rc1 is r1 or an internal reference derived from r1, and,
* rc2 is r2 or an internal reference derived from r2, and,
* rc1 and rc2 refer to a shared component between T1 and T2,
then rc1 and rc2 refer to the same memory location. This means the shared components can be accessed as if they were fields of a prefix, which allows treating a subclass as an instance of a superclass.
Related standard: C11
* 6.3.2-3: (**array --> pointer to first element**) ... an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. ...
* 6.7.2.1-15: (**pointer to struct <--> pointer to first field**) ... A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. ...
TODO: C does not explicitly allow prefixing between the following structs, where their sequences of elements have a "prefix" relation:
```
struct Foo { short a; int b; long c; };
struct Bar { short a; int b; long c; float d; double e; };
```
There may be a reason behind it.
# Object layout and C foreign function interface (FFI)
The object layout *should* follow the application binary interface (ABI) as much as possible because:
* The ABI is carefully designed by system programmers for performance. The µVM implementer needs a good reason not to follow it.
* When a foreign function call to external C programs is needed, the µVM data structure should already be in the desired layout expected by the C programs.
The µVM only needs to guarantee some compatibility between µVM types and C types in the FFI. It is already documented in the [Instruction Set](https://github.com/microvm/microvm-spec/wiki/instruction-set), but needs to be double-checked.
# Basic data structures that need atomic accesses
To guarantee memory safety, the µVM must not allow out-of-thin-air reference values or opaque values. Affected types are:
* ref
* iref
* weakref (loaded into SSA variables as ref)
* func
* thread
* stack
* tagref64 (may contain ref)
* futex (one word integer, loaded into SSA variables as plain `int<WORD_SIZE>`)
Storing internal references (`iref`) in memory (heap, stack or global) is discouraged because of space inefficiency (there is no better way than encoding them as fat pointers). But if an implementation does allow putting `iref` in memory, accesses to them should be atomic. The other types above are too important not to be implemented as lock-free atomic types (all atomic accesses in the µVM are lock-free).
An alternative to requiring `iref` to be atomic is to:
- document that accesses to `iref` in memory are not atomic, and,
- require the client to compile all memory accesses to `iref` with locks, and,
- accept a significant performance penalty.
Since a fat pointer consists of two words (an object reference plus an in-object offset), some architectures (I assume only very few of them) may not provide atomic access to values of that width.