general-issue-tracker issues: https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues

# Issue #33: Impossible states in full-state frame making (OSR)

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/33 (2015-06-18, John Zhang)

*Created by: wks*
There was an earlier proposal about OSR: https://github.com/microvm/microvm-meta/issues/5
In our recent discussion, we decided that:
1. The state of a frame is the PC and a set of local variable values.
2. It should only be possible to continue from designated "OSR points" or from the beginning of a basic block, rather than from an arbitrary point in the code. This reduces the compiler's burden of generating stack maps.
3. The client supplies an arbitrary subset of local variables and their values, and
1. if a variable is supplied but is never used, it has no effect on the execution and is simply ignored.
2. if a variable is not supplied but is used later, it gives undefined behaviour.
But this leads to a problem: the supplied state may never be reproducible from normal execution. For example:
```
.funcsig @foo_sig = @i64 (@i64 @i64)
.funcdef @foo VERSION @foo_v1 <@foo_sig> (%a %b) {
%entry:
%x = MUL <@i64> %a %b
%y = ADD <@i64> %x @i64_0 // The rhs is the constant 0.
%trap = TRAP <@void>
// OSR continues here
CALL @print (%x)
CALL @print (%y)
RET <@i64> %y
}
```
Assume we perform an OSR and construct a frame which continues *after* the `%trap` instruction with the local variable values `%a = 6; %b = 9; %x = 42; %y = 54`. The value `%x = 42` is impossible: normal execution would give `%x = 6 * 9 = 54`.
But the code generator may treat "adding zero" as a no-op and thus generate machine code in which `%x` and `%y` share a register. For example:
```
foo:
push rbx ; save callee-saved register
mov rbx, rdi ; do multiplication. rbx holds the value of %x and also %y
imul rbx, rsi      ; two-operand signed multiply (plain MUL takes only one operand)
mov rdi, rbx ; prepare to call @print (%x)
call print
mov rdi, rbx ; prepare to call @print (%y)
call print
pop rbx
ret
```
Then it is impossible to create such a state as mentioned above. This implies that either
1. we require that such state construction must be possible and **require the code generator not to generate code like the above**, or
2. we further **restrict our API** on frame state construction.
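To make the problem concrete: a supplied frame state is "impossible" exactly when it is inconsistent with the definitions preceding the OSR point. The following toy consistency checker illustrates this; the dictionary-of-lambdas model of instructions is an invention for this example, not part of any proposal.

```python
def check_osr_state(defs, supplied):
    """Check whether a supplied OSR frame state is reproducible by
    normal execution. `defs` maps each defined variable to a function
    of earlier variables (a toy model of the instructions before the
    OSR point); `supplied` maps variable names to client-supplied
    values. Illustrative only."""
    env = dict(supplied)
    for var, compute in defs.items():   # in definition order
        try:
            expected = compute(env)
        except KeyError:                # an input was not supplied:
            continue                    # cannot check this definition
        if var in env and env[var] != expected:
            return False                # inconsistent: impossible state
        env[var] = expected
    return True

# The example from this issue: %x = %a * %b, then %y = %x + 0.
defs = {'%x': lambda e: e['%a'] * e['%b'],
        '%y': lambda e: e['%x'] + 0}
assert not check_osr_state(defs, {'%a': 6, '%b': 9, '%x': 42, '%y': 54})
assert check_osr_state(defs, {'%a': 6, '%b': 9, '%x': 54, '%y': 54})
```

Under option 2, the API could reject any state for which such a check fails; under option 1, the code generator must be able to materialise any state that passes it.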
# Issue #32: Better support for the tagref64 type

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/32 (2015-05-21, John Zhang)

*Created by: wks*
In dynamic languages, the `tagref64` type (or other future tagged reference type variants) will be used pervasively in the language runtime.
This issue summarises potential improvements on the support for such types.
# Tagged reference constant
The Mu IR currently does not have a constant for `tagref64`, mainly because it may hold a reference and non-NULL references cannot be constants. However, one possible use of the `tagref64` type is to store a NULL reference together with an `int<6>` tag. In this case, the tag determines the concrete thing being represented (undefined, nil, null, false, true or other frequently used singleton objects). So it should be possible in Mu to create such a `tagref64` value as a constant.
Proposed new syntax:
```
.const @name <@tagref64> = TR64FP @double_constant
.const @name <@tagref64> = TR64INT @int52_constant
.const @name <@tagref64> = TR64NULLTAG @int6_constant // The ref is NULL, the tag is @int6_constant
.const @double_constant <@double> = 3.14d
.const @int52_constant <@i52> = 0x123456789abcd
.const @int6_constant <@i6> = 30
```
# Tagged reference equality
Comparing floating point numbers bit by bit is not equivalent to IEEE 754's definition of "equality". However, when two `tagref64` values both hold integers or references+tags, the result is deterministic.
In dynamic languages, such comparisons can quickly determine whether two tagged references have the same type (identified by the tag part) and refer to the same object.
Proposed semantic of `EQ` comparison between `tagref64` values:
The result of the `EQ` comparison instruction between `v1` and `v2` is 1 (true) if and only if one of the following holds:
* Both hold `double` values, and
    * neither is NaN and both have the same bit-wise representation, or
    * both are NaN and they happen to have the same bit-wise representation after being converted to `tagref64`.
* Both hold `int<52>` values and they are bit-wise equal.
* Both hold references, and
    * their references refer to the same object or both are NULL, and
    * their `int<6>` tags are bit-wise equal.
The `NE` instruction returns the opposite result of `EQ`.
> NOTE: `tagref64` uses the NaN space of `double`. Real NaN `double` values may lose their precise bit-wise representation when converted to `tagref64`, so comparing two `tagref64` values that both hold NaNs has an unspecified result.
*Alternative possibility*: Require Mu to canonicalise all NaNs to one unique bit-wise representation. In this way, all NaNs compare equal when `tagref64` values are compared bit by bit.
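The proposed `EQ` semantics can be sketched executably. The following is an illustrative Python model only: the tagged-tuple representation and the helper names are inventions for this sketch, not part of Mu. Representing `double` contents by their stored bit patterns makes the NaN caveat above fall out naturally.

```python
import struct

def double_bits(x):
    """Bit-wise representation of a Python float as a 64-bit double."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def tr64_eq(v1, v2):
    """EQ between two tagref64 values, following the proposed rules.

    A value is modelled as one of:
      ('double', bits)      -- 64-bit pattern as stored in the tagref64
      ('int52',  i)         -- a 52-bit integer
      ('ref',    obj, tag)  -- referent (None for NULL) and an int<6> tag
    """
    kind1, kind2 = v1[0], v2[0]
    if kind1 != kind2:          # different kinds never compare equal
        return False
    if kind1 == 'double':
        # Non-NaN doubles: bit-wise comparison. NaNs: comparison of the
        # (possibly lossy) patterns stored after conversion to tagref64.
        return v1[1] == v2[1]
    if kind1 == 'int52':
        return v1[1] == v2[1]   # bit-wise equality of the integers
    # References: same object (or both NULL) and bit-wise equal tags.
    return v1[1] is v2[1] and v1[2] == v2[2]
```

In this model `NE` is simply the negation of `tr64_eq`.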
# Default values of `tagref64` types.
Currently the default value of `tagref64` is all zero bits (all newly allocated memory, whether heap, stack or global, holds all zero bits), which represents +0.0 as a `double` value. In this representation, every `tagref64` value that holds a `double` is bit-wise equal to its plain `double` representation, so converting a `tagref64` to `double` is trivial: just do a bitcast.
However, languages usually define the values of uninitialised variables/fields as null-like values: `undefined` in JavaScript, `nil` in Lua, `null` in Java. There should be an option to make the all-zero bit pattern represent those null-like values.
There could be a flag to determine the zero value of a `tagref64` type. The proposed syntax is:
```
.typedef @tr64_with_fp_default = tagref64 <DEF_FP(3.14d)> // All 0s represents double value 3.14d
.typedef @tr64_with_ref_default = tagref64 <DEF_REF(0x5a)> // All 0s represents NULL ref with 0x5a as tag.
.typedef @tr64_with_int_default = tagref64 <DEF_INT(0x55aa55aa55aa5)> // All 0s represents integer 0x55aa55aa55aa5.
.typedef @tr64_as_current = tagref64 <DEF_FP(0.0d)> // All 0s represents double value 0.0d, which is the same as the current `tagref64`.
```
The kind of default is static metadata, and the garbage collector can identify it.
This can be implemented by applying an XOR mask to the bits after encoding a value into `tagref64` and before decoding an existing `tagref64` value.
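A minimal sketch of the XOR-mask idea, under the assumption that the mask is simply the 64-bit encoding of the desired default value (the function names here are hypothetical): since `stored = encoded ^ mask`, all-zero memory decodes to the default, and every other value round-trips unchanged.

```python
import struct

def make_tr64_codec(default_bits):
    """Build store/load functions for a tagref64 type whose all-zero
    in-memory pattern must decode to `default_bits` (the encoding of
    the desired default value)."""
    mask = default_bits

    def store(encoded_bits):      # encoded value -> in-memory bits
        return encoded_bits ^ mask

    def load(stored_bits):        # in-memory bits -> encoded value
        return stored_bits ^ mask

    return store, load

# Example: a tagref64 whose zero-initialised memory reads as double 3.14d.
bits_3_14 = struct.unpack('<Q', struct.pack('<d', 3.14))[0]
store, load = make_tr64_codec(bits_3_14)
assert load(0) == bits_3_14            # fresh (all-zero) memory -> 3.14d
assert load(store(0xdead)) == 0xdead   # other values round-trip unchanged
```

The current behaviour corresponds to a mask of all zeros (the encoding of +0.0), making both functions the identity.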
# Issue #31: GC: Is "liveness" of objects really needs to be defined?

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/31 (2015-04-15, John Zhang)

*Created by: wks*
Currently the Mu spec defines "live object" as being reachable from roots.
An alternative definition is to define heap objects as *always live*, but implementations can "*cheat*". We define:
A memory location has a **lifetime**. An internal reference (iref) to a memory location is **valid** as long as the memory location's lifetime has not expired. Specifically,
* The lifetime of a memory location in the *heap* begins when the `NEW` or `NEWHYBRID` that allocates the heap object is executed. It **never expires**.
* The lifetime of a memory location in the *global memory* begins when the bundle that defines it is loaded. It never expires.
* The lifetime of a memory location in the *stack* begins when the `ALLOCA` or `ALLOCAHYBRID` that allocates the stack cell is executed. It expires when the function activation in which the stack cell is allocated is destroyed, whether by returning, throwing an exception, or killing the stack.
If the Mu spec no longer defines liveness via the "root set" and transitive reachability, the Mu implementation must infer the reachability rules from other parts of the spec. I believe a carefully defined spec implies the same rules as explicitly defined reachability rules.
## Examples and corner cases
**When an object is unreachable from the roots** (previously defined as "dead"): Mu can reclaim the object. Since it cannot be reached, the client and Mu IR programs will never find out that Mu is cheating about the lifetime of heap objects, which was defined as "forever". (You can kill an immortal if nobody can ever see him/her again.)
**Object pinning**: Since an address is exposed during the period of pinning, the GC must not collect the object, otherwise the native code will find out that Mu is cheating. In other words, pinning keeps an object alive.
## Weak references and finalisers
**Weak reference**: We must change the meaning of "weak references" because we no longer define "reachable". We can define it as:
* At any time, Mu **may** atomically set the values of some weak references to NULL if "after doing so, no one can prove that their referred objects can otherwise be reached". (This is not formal at all. Maybe weak references are really meaningless.)
In this way, a Mu implementation that never clears weak references is a valid implementation, but an implementation that does clear them is equally legal.
**Finaliser**: Finalisers were left undefined because they are not guaranteed to be invoked. But in order to allow the Mu implementation to cheat, I define them as follows:
* Any object may have a "prevent-one-death" flag (set when a finalisable object is created).
* There is a queue maintained by Mu (the finalising queue). The client implements a finalising thread that watches the queue.
* At any time, Mu **may** atomically remove the "prevent-one-death" flag of an object and put it in the queue mentioned above, provided that "the only way to get a reference to that object is via the queue". (This does not sound very formal, either.)
In this way, a Mu implementation that never calls any finaliser is a valid implementation, but an implementation that does call finalisers is equally legal.
# Issue #30: Mu Loadable Format (MuLF)

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/30 (2018-06-28, John Zhang)

*Created by: wks*
This proposal describes an extended code delivery unit of the Mu VM.
# Rationale
A "standalone Mu IR" program (if there is such a thing) needs more than a code bundle to run. The additional ingredients include:
* A way to allocate and initialise heap objects at load time (addressed by the HAIL format. See https://github.com/microvm/microvm-meta/issues/29 )
* Embedded binary native programs (for example, native libraries, a native client, or even the Mu VM itself)
* Static dependencies to other units of loading (MuLF files).
* (optionally) An entry point to start execution.
Existing mechanisms can perform all of the above because the client has total control over the Mu VM. This proposal only gives a "standard" format to do so.
# Proposal
This new unit of loading is called **Mu Loadable Format (MuLF)**.
## The MuLF file sample
This proposal uses XML as the human-readable format. It is obviously not ideal.
```xml
<mulf>
<dependency kind="mulf" name="uvm.std.io" />
<dependency kind="native" name="libc.so" />
<muir-bundle> <!-- the code section --> <![CDATA[
.typedef @i64 = int<64>
.typedef @i8 = int<8>
.typedef @string = hybrid<@i64 @i8>
.typedef @ref_string = ref<@string>
.typedef @array_ref_string = hybrid<@i64 @ref_string>
.typedef @ref_array_ref_string = ref<@array_ref_string>
.const @I64_0 <@i64> = 0
.global @helloWorld <@ref_string> // to be initialised in the next section
.funcsig @main_sig = @i64 (@ref_array_ref_string)
.funcdef @main <@main_sig> (%args) {
%entry:
%hw = LOAD <@ref_string> @helloWorld
CALL <@println_sig> @println (%hw)
RET <@i64> @I64_0
}
]]>
</muir-bundle>
<heap-initialise format="hail"> <!-- the "heap section" --> <![CDATA[
.newhybrid $hw_obj <@string> 12
.init $hw_obj = {12, {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!'}}
.init @helloWorld = $hw_obj // Assign this object to the global cell @helloWorld
]]>
</heap-initialise>
<binary kind="native-code">
...
</binary>
<binary kind="native-data">
...
</binary>
<initialiser function="@main" synchronous="true">
<param kind="cmdline-args-as-standard-string-ref-array" />
</initialiser>
</mulf>
```
## MuLF Clients
As with other programming languages, this format is handled by a **MuLF client**. This keeps the core Mu VM minimal.
* The Mu micro VM provides the Mu Client API, which can load [**Mu IR bundles**](https://github.com/microvm/microvm-spec/wiki/uvm-ir) and [**HAIL files**](https://github.com/microvm/microvm-meta/issues/29), as well as mechanisms to create Mu stacks and Mu threads.
* The MuLF client handles the MuLF format. It loads and parses the MuLF file, invokes API calls to load the included bundle and the included HAIL file, creates stacks and threads to execute the initialisation functions, and performs the necessary synchronisation to ensure the initialisation functions finish before "other parts" (whose meaning depends on the concrete high-level language) can run.
## MuLF sections
* **dependencies**: references to other MuLF files or native libraries
* **muir-bundle**: a Mu IR bundle
* **heap-initialise**: a HAIL file which initialises the heap
* **binary**: embedded binary data
* **initialiser**: a function to be executed after loading the MuLF bundle
## The loading process
The MuLF client shall process dependencies before loading the current MuLF file.
TODO: Mu IR itself is designed to avoid circular dependencies between bundles (multiple Mu IR bundles can simply be merged into one big bundle so that circular types and function calls can be resolved). Whether MuLF allows circular dependencies is an open question.
Then the MuLF client loads the binary sections, then the muir-bundle section, then the heap-initialise section.
Finally the MuLF client executes the initialisers in the order they are declared. Each initialiser is executed on a new Mu stack in a new Mu thread. If an initialiser is marked "synchronous", the loading process pauses and waits for the initialiser function to return. This does not prevent the initialisers from triggering traps to the client, which may in turn cause other Mu IR bundles or MuLF files to be loaded.
Then the loading process finishes. The Mu VM will continue executing until the last Mu thread is killed.
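The loading order above can be sketched as follows. Everything here is hypothetical: `load_native`, `load_binary`, `load_bundle`, `load_hail` and `new_thread` are stand-ins for whatever interface the MuLF client would use, not real Mu client API calls, and the attribute names on `mulf` are inventions for this sketch.

```python
def load_mulf(mulf, api, loaded=None):
    """Sketch of the MuLF client's loading order (illustrative only)."""
    loaded = set() if loaded is None else loaded
    if mulf.name in loaded:              # naive duplicate/cycle guard;
        return                           # circular deps are an open question
    loaded.add(mulf.name)
    for dep in mulf.dependencies:        # 1. dependencies first
        if dep.kind == 'mulf':
            load_mulf(dep.target, api, loaded)   # recurse into MuLF deps
        else:
            api.load_native(dep.name)            # e.g. libc.so
    for binary in mulf.binaries:         # 2. binary sections
        api.load_binary(binary)
    api.load_bundle(mulf.muir_bundle)    # 3. the Mu IR bundle
    api.load_hail(mulf.heap_init)        # 4. heap initialisation (HAIL)
    for init in mulf.initialisers:       # 5. initialisers, in order
        thread = api.new_thread(init.function, init.params)
        if init.synchronous:
            thread.join()                # pause until it returns
```

After the top-level call returns, the Mu VM keeps running until the last Mu thread is killed; the loader itself is finished.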
## The relation between the MuLF client and the higher-level language client
The MuLF client can be considered as a "client-level library" which helps a higher-level client which implements a language.
Conversely, the higher-level client can be considered a library in the MuLF "framework", implementing language-specific behaviour reactively as "call-backs" from the MuLF client.
# ELF compatibility
Using the standard ELF format would bring many benefits, including the ability to reuse existing system facilities. It would even be possible to make a MuLF file a self-contained executable.
# Open questions
**Do we standardise this MuLF as part of the Mu VM specification?**
Probably yes. It should provide a standard way to load "something more than a code bundle". However, the MuLF format is oriented more toward the traditional ahead-of-time "linker-loader" model than toward the JIT-compiling model. Complex languages (like Java) may wish to control their loading process precisely (e.g. loading several circularly-related classes, submitting them in one huge Mu IR bundle and initialising their heap objects together). In such cases, MuLF is less useful than it is for ahead-of-time-compiled programs.
We may make MuLF an "optional component" of the Mu VM specification. Some very tiny Mu implementations may not have it, but anyone who claims to implement MuLF shall do so in a standard-compliant way.
# Issue #29: Heap Allocation and Initialisation Language (HAIL)

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/29 (2016-07-21, John Zhang)

*Created by: wks*
This proposal describes a language that allocates and initialises heap objects (and also global memory)
This proposal *does not* address *initialiser function*. It will be addressed in another issue.
# Rationale
A code bundle (or simply "bundle" in our current terminology) contains **types**, **function signatures**, **constants**, **global memory cells** and **functions**. This is insufficient for a standalone Mu IR program.
A typical program usually contains statically declared, **load-time initialised** **heap objects**, e.g. **strings**, **class objects** (`java.lang.Class`) and so on. A developer from the PyPy project has indicated that there can be a lot of statically declared heap objects. Currently those objects can be created and initialised in two ways:
1. The client allocates and initialises heap objects via the Mu Client API. This approach suffers from one particular shortcoming: performance. The API can only initialise one memory location (e.g. one element of an array, or one scalar field of a struct) per API call.
2. Include a dedicated function in each bundle which creates and initialises heap objects. This approach has performance and complexity problems. The function must contain a full description of all heap objects: their types and the values of all (or at least the non-zero) fields, so it can be huge. This information has to be encoded as Mu IR instructions and constants, and the compiler has to translate this **humongous** "initialiser function" into runnable form and then execute it, once, to create the heap objects. It is a waste of time and memory to compile such a one-shot function.
# Solution
The proposed solution is a compact file format that describes heap objects and initialises the memory.
Sample:
Assume we have a "traditional" Mu IR Bundle:
```
.typedef @i64 = int<64>
.typedef @i8 = int<8>
.typedef @double = double
.typedef @string = hybrid <@i64 @i8>
.typedef @void = void
.typedef @refstring = ref<@string>
.typedef @refvoid = ref<@void>
.typedef @ClassFoo = struct<@i64 @double @refstring>
.typedef @i32 = int<32>
.typedef @intarray = hybrid<@i64 @i32>
.global @HW <@refstring> // A global memory cell, initialised to NULL, which may hold a string reference later.
```
**After** loading the previous bundle, load this Heap Allocation and Initialisation Language (HAIL) file:
```
// HAIL file
.new $a <@i64> // A new object of just a number
.newhybrid $hw <@string> 12
.new $classFoo <@ClassFoo>
.new $x <@refvoid> // An object whose content is only a heap reference to void
.new $y <@refvoid> // ditto
.newhybrid $hugeArray <@intarray> 10000
.init $a = 42
.init $hw = {12, {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'}}
.init $classFoo = {42, 42.0d, $hw} // Objects can directly refer to each other
.init $x = $y // Objects are first allocated and then initialised
.init $y = $x // So they can form circular references
.init @HW = $hw // @HW is a global cell declared in the previous "traditional" bundle. HAIL can initialise global cells in traditional bundles, too.
.init $hugeArray[5000] = 42 // Only initialise a particular element. Other elements remain 0.
// NOTE: only $hw is retained, because it is referenced by the global cell @HW. Other objects may immediately be garbage-collected (or never allocated at all, if the Mu VM can "cheat").
```
## Structure
Heap objects allocated in this format have names with the special sigil `$`, which are local to the current file.
A Heap Allocation and Initialisation Language (HAIL) file contains many of the following **top-level definitions**:
**.new**: Allocate scalar object in the heap. Has the form: `.new $name <@type>`
* `$name`: the local name of the object.
* `@type`: the type of the object.
**.newhybrid**: Allocate hybrid object in the heap. Has the form: `.newhybrid $name <@type> length`
* `$name`, `@type`: same as ".new"
* `length`: the length of the var part
**.init**: Initialise a heap object or a global cell. Has the form: `.init name[sub1][sub2]... = val`
* `name`: The name of the heap object or global cell. In this format, heap objects use the special sigil (`$xxx`) while global cells use their global names from the Mu IR (`@xxx`).
* `sub1`, `sub2`, ...: Subscripts: ways to navigate through structs, arrays and hybrids. Specifically, in a hybrid, the fixed part is index 0 and the var part is index 1.
* `val`: The value. It can be one of the following:
* Integer literals: 1, 24, -345, 0x456, 'H'
* FP literals: 1.0f, 3.14d, nanf, nand, +infd, -infd, bitsd(0x7ff0000000000001)
* Struct/array/hybrid literals: {elem0, elem1, elem2, ...}
* NULL
* other names (which can be other heap objects, Mu IR constants, global cells (as internal references) or functions (as function references)): `$hw` `@HW` `@main`
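The `.init` form can be modelled executably. The sketch below assumes a toy in-memory model in which aggregates are Python lists and a hybrid is a two-element list (fixed part at index 0, var part at index 1, following the subscript rule above); it is an illustration of the navigation semantics, not the real HAIL loader.

```python
def hail_init(objects, target, subs, value):
    """Apply one HAIL `.init` definition to a toy model of Mu memory.

    `objects` maps names ($local or @global) to mutable objects.
    `subs` is the list of subscripts, e.g. [1, 5000] for the 5000th
    element of a hybrid's var part.
    """
    if not subs:                  # whole-object/whole-cell initialisation
        objects[target] = value
        return
    location = objects[target]
    for sub in subs[:-1]:         # navigate through nested aggregates
        location = location[sub]
    location[subs[-1]] = value    # write the innermost field/element

# .init $hugeArray[1][5000] = 42  (element 5000 of the hybrid's var part)
objs = {'$hugeArray': [[10000], [0] * 10000]}
hail_init(objs, '$hugeArray', [1, 5000], 42)
assert objs['$hugeArray'][1][5000] == 42
```

Circular references work in this model for the same reason as in HAIL: all objects are allocated (entered into `objects`) before any `.init` runs, so one object's value can name another.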
## Comparison with API-based object allocation and initialisation
A HAIL file is a unit of delivery to the Mu VM. Only one API call is needed to load a whole HAIL file and it can allocate and initialise many objects.
"Loading a HAIL file" will be a new API message (or function).
# Performance Concerns
For better performance, this format should also have a more compact binary form. Ideally the binary format can be very close to the in-memory representation of objects, requiring little more than copying data from the file into memory while handling sizes, padding and alignment. It cannot be perfectly identical to the in-memory representation because Mu's object layout is platform-dependent.
# When to use the HAIL format
HAIL should be used when the client wishes to allocate many objects and bulk-initialise the memory. For example, when loading a Java .class file, a Mu IR bundle is loaded for the Java functions, and then a HAIL file is loaded to create/initialise the `Class` object, the virtual table, string literals and so on.
Another example: Assume there is a PyPy interpreter implemented on Mu IR. The executable PyPy interpreter is represented as Mu IR bundle, but a HAIL file can be used to initialise the interpreter **instance** and associated objects.
# When HAIL may not be ideal
If the Mu VM is metacircular, the client is written in Mu IR and accessing Mu memory from the client has no overhead. The HAIL format can still be implemented for compatibility reasons, but would have no performance advantage over direct memory accesses. For example, a metacircular Mu-based JVM can load a .class file and compile its methods to Mu IR, but the `Class` object can be created directly in Mu IR because the JVM client itself is Mu IR. It does not need to serialise the sequence of object allocations and initialisations into HAIL before performing them.
# Issue #28: Object Pinning

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/28 (2017-05-02, John Zhang)

*Created by: wks*
This issue is part of https://github.com/microvm/microvm-meta/issues/24
# TL;DR
This proposal gives meaning to the "object pinning" operation.
The meaning is: the PIN operation takes a `ref<T>` or `iref<T>`, pins the object for the current thread, and returns a `ptr<T>` (pointer to `T`). This pointer **can be used to** access the memory location of the `iref` until every thread that has pinned the object has unpinned it using the UNPIN operation.
Note: This has very few implications for the Mu implementation. It only says the pointer can be *used* in the expected way; it says nothing about the storage of the actual object. (The micro VM can cheat!)
## Operations
In the following two instructions, `R` can be either `ref` or `iref`.
* `PIN(%r: R<T>) -> ptr<T>`: Add the object referred to by `%r` to the *pinning set* of the current thread, and return a pointer.
* `UNPIN(%r: R<T>) -> void`: Remove the object referred to by `%r` from the *pinning set* of the current thread.
`PIN` and `UNPIN` do not pin any object if `%r` refers to a memory location not in any heap object. If `%r` is NULL, `PIN` returns a NULL pointer. If `%r` is an `iref` and refers to a stack cell or a global cell, `PIN` returns a pointer to it.
> NOTE: All memory locations in Mu, not just heap objects, are referred to by `iref`s. In order to let native code work with Mu memory, pointers always have to be generated. That is why `PIN` and `UNPIN` trivially work with non-heap memory locations as well. It may be impossible at compile time to know whether an `iref` refers to the heap; for example, a function may take an `iref` as a parameter.
## The guarantees
The pointer returned by `PIN` has the following guarantees:
* The pointer is usable as long as the object pinned by `PIN` is in the *pinning set* of **any** thread.
* The pointer points to a region of the address space which can be used to access the memory location of the parameter of `PIN` (i.e. `%r`). Specifically:
+ The object layout conforms to the platform's Mu Application Binary Interface (yet to be defined).
+ The native code can perform LOAD, STORE, CMPXCHG, ATOMICRMW, FENCE operations on those locations and they shall conform to the Mu memory model. However, *which native instruction/operator/function performs which operation in the Mu memory model* is implementation defined.
> One memory order can be implemented in multiple different ways. E.g. on x86, SEQ_CST can be implemented as (load: MOV, store: XCHG), but also as (load: LOCK XADD(0), store: MOV). It is up to the implementation to guarantee that Mu memory operations (Mu IR instructions) are compatible with their native counterparts (C11 `<stdatomic.h>` or C++11 `<atomic>`). For example, one particular implementation may let the `atomic_load(ptr, memory_order_xxxxxx)` function in glibc (but not `atomic<T>.load(xxxx)` in the libc++ provided by LLVM) perform the LOAD operation with the xxxxxx memory order in the Mu memory model.
## Issues about multi-threading
It is possible for two threads to pin the same object. For example, there are two threads T1 and T2 and object O. The execution appears like the following sequence:
1. T1: pin O
2. T2: pin O
3. T1: do something with O
4. T1: unpin O
5. T2: do something with O
6. T2: unpin O
By step 5, T1 has already performed an unpin operation. If an object pinned by one thread could be unpinned by another thread, there would be a problem: if the object O were no longer pinned, it would be an error for T2 to do anything with the pointer.
It would be possible to require a thread to acquire a lock or perform reference counting before pinning/unpinning, but this would be inefficient because it inevitably involves expensive atomic operations, and one reason for using the FFI in the first place is performance.
Therefore, we let each thread pin/unpin objects **locally**: `PIN` means pinning an object **for the current thread**. An object is pinned if and only if at least one thread is pinning it.
Implementation-wise, this can be done by keeping a thread-local buffer which records all objects the current thread is pinning. When GC happens, the collector looks at every thread's buffer to find all objects pinned by any thread. In this way, mutators do not need atomic memory operations, but the GC needs to look at all threads.
This "thread-local pinning" mechanism could not be implemented by the client if Mu only provided a racy (globally shared) `PIN` instruction, and giving the client direct access to the thread-local buffer would be no different from providing the thread-local `PIN` instruction itself. So this mechanism does not violate Mu's principle of *minimalism*: it cannot be implemented efficiently outside Mu.
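The thread-local buffers can be sketched as follows. This is a toy model under simplifying assumptions (a stop-the-world GC that only reads the buffers at a safepoint, and Python's `id()` standing in for the returned `ptr<T>`); all class and method names are invented for illustration.

```python
import threading

class PinRegistry:
    """Toy model of per-thread pinning sets."""
    def __init__(self):
        self._local = threading.local()   # this thread's pin-set
        self._all_sets = []               # every thread's pin-set, for the GC
        self._lock = threading.Lock()     # protects registration only

    def _my_set(self):
        if not hasattr(self._local, 'pins'):
            pins = set()
            self._local.pins = pins
            with self._lock:              # taken once per thread, not per pin
                self._all_sets.append(pins)
        return self._local.pins

    def pin(self, obj):                   # thread-local: no atomics here
        self._my_set().add(id(obj))
        return id(obj)                    # stand-in for the returned ptr<T>

    def unpin(self, obj):
        self._my_set().discard(id(obj))

    def pinned_objects(self):             # the GC calls this at a safepoint
        return set().union(*self._all_sets)

# The T1/T2 scenario from this issue: O stays pinned after T1 unpins,
# because T2 is still pinning it.
reg, o = PinRegistry(), object()
reg.pin(o)                                        # T1: pin O
t2 = threading.Thread(target=lambda: reg.pin(o))  # T2: pin O
t2.start(); t2.join()
reg.unpin(o)                                      # T1: unpin O
assert id(o) in reg.pinned_objects()              # T2 still pins O
```

The mutator fast path (`pin`/`unpin`) touches only thread-local state; only the once-per-thread registration and the GC's scan coordinate across threads, matching the claim above.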
-----------------------------CUT HERE. BELOW ARE LEGACY TEXTS-------------------------------
# Abstract
I propose defining two kinds of memory spaces: *real space*, which models the memory used by C or native programs, and *imaginary space*, for that of the µVM. *Object pinning* (or *realising*) is an operation that temporarily makes a memory location in the imaginary space real so that it can be accessed from C programs.
# Proposal
## Concepts
* **memory**: self-explanatory, but... I don't trust "common sense".
* **memory location**: a region of data storage. Holds values.
* **virtual memory space**: the abstraction provided by the OS and the architecture. It has the following properties:
+ At any moment, it is a mapping from addresses (a subset of integers) to byte values. (I don't like this property. For any multi-threaded program, different threads may not see the same value, and Albert Einstein does not like "the same time".)
+ It can be accessed (read/written/atomicRMW) in various granularities (sizes). The atomicity and visibility between threads follows a certain memory model (the one defined by the architecture, OS and related programming languages).
+ It may be shared between processes and threads. Thus it can be accessed by things not in the µVM.
* **real memory**: memory in which memory locations satisfy the following properties:
+ (Does not need to have "addresses", that is, a memory location can be a variable, not numerical value.)
+ Allows memory accesses (load/store/atomicRMW).
+ For every memory location L, there is a unique memory location L' in the virtual memory space. (This disallows replication.) This L' does not change during the lifetime of L. (This disallows moving.) Accesses to both locations are equivalent.
+ For any two memory locations L1 and L2, their corresponding memory locations in the virtual memory space do not overlap. That is, their accesses are independent. (This disallows aliasing.)
+ For an array in a real memory, its corresponding memory location in the virtual memory space is contiguous. (This disallows implementing arrays as multiple disjoint sub-arrays.)
* **imaginary memory**: memory in which memory locations satisfy the following properties:
+ (Does not need to have "addresses", that is, a memory location can be a variable, not numerical value.)
+ Allows memory accesses (load/store/atomicRMW).
NOTE: As can be seen, "real memory" is trivially "imaginary memory".
* **realising**: temporarily letting a memory location in an imaginary memory have the property of real memory. (This is colloquially called **object pinning**, but it is more than "not moving").
* `iref<T>` (**internal reference**): refer to a memory location in real or imaginary memory.
* `ptr<T>` (**pointer**): an address. May or may not correspond to a memory location in the real memory.
## In the µVM
* All memory in the µVM (heap, stack and global) is imaginary memory.
* Introduce the pointer type `ptr<T>`. It is just a raw address, but is typed.
* Introduce the `PTRCAST` instruction which can freely cast `ptr<T>` to or from `int<n>` if n is the appropriate size.
* `LOAD`, `STORE`, `CMPXCHG`, `ATOMICRMW` now work with both `iref<T>` and `ptr<T>`.
* The `CCALL` can call a C function.
+ Plan A: The callee can have type `int<n>`. It is just an integer address.
+ Plan B: Introduce a `c_func<sig>` type. It is castable to/from `int<n>`. NOTE: `func<sig>` refers to µVM functions.
## Pinning
* "Pinning a memory location" means "realising" it, granting it the property of real memory.
* Implicit pinning: Any `iref<T>` values used as arguments of `CCALL` are implicitly pinned during this call.
* Explicit pinning:
+ Plan A: Introduce `REALISE` and `UNREALISE` instructions, which do what their names suggest. The `REALISE` instruction returns a `ptr<T>` value.
+ Plan B: `REALISE` and `UNREALISE` have counting semantics. An object is "unpinned" if its pin-count reduces to 0.
+ Plan C: (the tracing approach) Introduce a type `pinner_iref<T>` which actually holds an `iref<T>` (a [marked storage type](https://github.com/microvm/microvm-spec/wiki/type-system#types-and-type-constructors) of `iref`). `pinner_iref<T>` must be in the memory (not SSA, just like `weakref<T>` cannot be SSA variable). If such a reference is reachable, the referent is pinned. After pinning, the pointer can be obtained via a `GETPTR` instruction. (Plan C does not address replication and non-contiguous arrays)
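The counting semantics of Plan B can be sketched as follows. This is a hypothetical illustration only: the `PinManager` class and its method names are invented here, and a real implementation would also hand out the stable address as a `ptr<T>` and coordinate with the GC so that pinned objects are not moved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Plan B: REALISE/UNREALISE with counting semantics.
// An object is unpinned (movable again) only when its pin-count drops to 0.
class PinManager {
    private final Map<Object, Integer> pinCounts = new HashMap<>();

    // REALISE: increment the pin-count. A real implementation would also
    // return the (now stable) address as a ptr<T> value.
    synchronized void pin(Object obj) {
        pinCounts.merge(obj, 1, Integer::sum);
    }

    // UNREALISE: decrement the pin-count; the object becomes movable
    // again only when the count reaches zero.
    synchronized void unpin(Object obj) {
        pinCounts.computeIfPresent(obj, (o, c) -> c > 1 ? c - 1 : null);
    }

    synchronized boolean isPinned(Object obj) {
        return pinCounts.containsKey(obj);
    }
}
```

With these semantics, nested pin/unpin pairs compose correctly: after pinning twice and unpinning once, the object is still pinned.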
## Open questions
1. Do we treat stacks and globals as "real" by default?
2. If stacks can move, how do we efficiently realise (pin) them?
3. Do we prevent non-contiguous arrays?
4. How do we implement temporary "un-replication"?
# Background: Inter-language interaction
Currently the only way for the µVM to interact with the "outside world" is via traps handled by the client. This interface is called **µVM-client Interface** or **The API**.
For performance reasons, we should introduce a more direct and low-level interface to the "outside world". This new interface is called the **foreign function interface** or **FFI**.
## Two worlds
**Imaginary memory**: In a world with advanced garbage collectors, the memory is managed by the GC.
* A high-level memory location (in an object or not; for example, if a VM implements movable stacks) may be moved from one address to another (addresses are in the operating system's or architecture's virtual address space).
* A high-level memory location can be replicated (a single high-level object/field corresponds to multiple system memory addresses). This may serve different purposes, for example, concurrent GC, security, etc.
* A high-level data structure may not have the same structure as the system-level memory. For example, a high-level array may be implemented as segments of (non-contiguous) arrays.
* Programs written in C can only access this kind of memory assisted by the memory manager (GC).
* Example: Java, µVM.
**Real memory**: In a world closely interacting with C, the GC is somewhat naive, or there is no GC at all.
* High-level memory locations (as seen by the programming languages (like C) or VMs (like CPython)) do not move and are not replicated. Each high-level memory location (in object or not) corresponds to exactly one OS/architecture-level address as long as it is not deallocated.
* Programs written in C can directly access the memory as long as it has a raw pointer to the memory location.
* Example:
* Any non-GC language: C, C++, Rust, ...
* Any language/impl that tightly interacts with C: CPython, Lua (partially)
## Examples
* The µVM uses "imaginary memory". It does not assume any low-level memory layout except some high-level rules.
* Java exclusively uses "imaginary memory". All Java memory accesses through JNI must go through handles. It is even a problem to expose an array to the C language: 1) the VM must support object pinning, and 2) the VM must implement arrays contiguously.
* CPython uses "real memory". C programs hold any Python objects by raw pointers. A C module can customise its own Python object layout to include its own private data.
* Lua uses "real memory". "Userdata" (a chunk of memory allocated by Lua, but used by the user, like a managed "malloc") is a Lua object. `lua_touserdata` gets a raw pointer to such a chunk of memory and does not need pinning. `lua_topointer` gets a raw pointer to any Lua object (for debugging purposes).
* SpiderMonkey uses something hybrid. Its GC can move objects, but not within a "request" (a delimited region in C programs where GC "must not happen"). In a "request" (probably everything in C that interacts with SpiderMonkey), the C program can use raw pointers to refer to JS objects, though their structures are opaque, and it is recommended to use `JSHandleValue` to mark it as a GC root.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/27Finaliser2017-04-05T15:34:00+10:00John ZhangFinaliser*Created by: wks*
NOTE: This proposal may go to the interface of a specific µVM implementation rather than the µVM specification. The µVM can cheat by presenting an "infinite heap" and telling the program no objects are ever reclaimed.
# Proposed mechanism
Add an **objects-to-finalise queue**. This queue is maintained internally by the µVM. Each element is a strong reference to an object. It may not be FIFO.
Add a new instruction `@uvm.gc.prevent_death_once (%r: ref<T>) -> void`. The next time the GC decides that the object referred to by `%r` is garbage, it puts the object in the queue instead of reclaiming it. This instruction is *atomic*.
Add a new instruction `@uvm.gc.next_object_to_finalise () -> ref<void>`. This instruction removes one reference from the *objects-to-finalise queue* and returns it. The current µVM thread blocks if this queue is empty. This instruction is *atomic*.
## Typical usage
Take Java for example.
+ On allocating any object that has an overridden `finalize` method, the `@uvm.gc.prevent_death_once` instruction is executed on the newly allocated object.
+ There is a background thread (ordinary µVM thread, running Client-supplied µVM IR code) running in a loop, executing the `@uvm.gc.next_object_to_finalise` instruction.
+ When the GC is about to collect a finalisable object, it instead puts the object in the queue. The background thread gets that object and does whatever the high-level language semantics require.
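The mechanism above can be emulated with a plain Java sketch (hypothetical: in a real µVM the GC itself would enqueue an object the first time it finds it dead; here that step is exposed as `enqueueDeadObject` for illustration):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the objects-to-finalise queue. Elements are strong
// references, and the queue need not be FIFO (LinkedBlockingQueue happens
// to be FIFO, but nothing here relies on that).
class FinaliserQueue {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    // Stand-in for what the GC does to an object registered with
    // @uvm.gc.prevent_death_once: instead of reclaiming it, keep it
    // alive once more and put it in the queue.
    void enqueueDeadObject(Object obj) {
        queue.add(obj);
    }

    // @uvm.gc.next_object_to_finalise: removes one reference and returns
    // it, blocking the calling thread while the queue is empty.
    Object nextObjectToFinalise() throws InterruptedException {
        return queue.take();
    }
}
```

The background finaliser thread is then just a loop around `nextObjectToFinalise()`, invoking the language-level `finalize` logic on each object it receives.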
# Backgrounds
**Java**: Objects with `finalize` methods are finalised. `finalize` is **executed automatically only once** per object. Finalisers are executed in **unspecified threads**, at **unspecified times**, in **unspecified order**, may be executed **concurrently**. This sentence: "Every pre-finalization write to a field of an object must be visible to the finalization of that object." implies some kind of fence.
**Python, Ruby, PHP, ...**: Using naive reference counting, finalisers are deterministically executed when the reference count drops to 0.
# Open questions
+ The blocking mechanism (the instruction `@uvm.gc.next_object_to_finalise` can block) looks redundant given that the µVM already has the Futex interface which blocks a thread. There is no obvious way to wake up a thread waiting on this "queue".https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/26Tutorial and Spec versioning2018-06-28T23:11:32+10:00John ZhangTutorial and Spec versioning*Created by: wks*
I am evaluating http://readthedocs.org/ . A µVM tutorial is being written.
Jekyll seems to be biased strongly towards blogging, and the structures (including a table of contents and a per-page sidebar of the outline) are not automatically handled.
There should be a common version scheme for the spec, the refimpl and the tutorial. I propose using the major version "2" for the current version. The tutorial will target the "main" version (currently "2").
GitHub Wiki sucks (very primitive structure support, no table of contents). Considering migrating the specification (https://github.com/microvm/microvm-spec/wiki) back to Sphinx/reStructuredText if readthedocs.org proves useful.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/25Extended memory model2016-06-17T15:23:23+10:00John ZhangExtended memory model*Created by: wks*
The current memory model is designed according to the C11/C++11 standard. It involves atomic memory access, locks and threads.
However, there are issues which C11 and C++11 do not address. They are:
- [x] Futex: Futex `wait` implies an atomic load-compare-sleep operation. (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-futex))
- [x] Trap/watchpoint/OSR: they involve stack binding/unbinding. Just make more things explicit. (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-stack-operations))
- [x] Swap-stack: C11 does not address swap-stack, but swap-stack operations in different threads may conflict and visibility issues involved in swap-stack operations should be clarified. (*making racing swap-stack operations undefined behaviours is okay*) (See [spec](https://github.com/microvm/microvm-spec/wiki/memory-model#special-rules-for-stack-operations))
- [X] Foreign language interface: When parallel µVM programs run together with parallel native programs, we need a way to synchronise the two worlds. This also involves object pinning. See https://github.com/microvm/microvm-meta/issues/37 (It turns out that Mu cannot make many guarantees, given that the worst case of native memory access is a segmentation fault. This must be very implementation-specific.)
Current ideas:
**Futex**: (open question) Should `futex_wake` happen before the next instruction after the `futex_wait` that actually wakes up? **Probably yes.**
```
int shared_state = 42;
atomic_int thread1_wake = 0;
futex thread1_futex;
thread1:
while(load(ACQUIRE, &thread1_wake)!=1) { // OP4
futex_wait(thread1_futex); // OP5
}
int s = shared_state; // OP6
thread2:
shared_state = 99; // OP1
store(RELEASE, &thread1_wake, 1); // OP2
futex_wake(thread1_futex); // OP3
```
According to the semantics of RELEASE and ACQUIRE, if OP4 sees the store by OP2, then OP6 must see the store of OP1.
The problem is: if the OP3 `futex_wake` is not guaranteed to happen before the return of the OP5 `futex_wait`, then the next OP4 iteration of the loop may not see the store by OP2 at all, and the thread may go to sleep another time. Then it will sleep forever.
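The desired guarantee can be illustrated with an analogous Java sketch. This is an analogy, not µVM code: `LockSupport.park`/`unpark` stand in for `futex_wait`/`futex_wake` (and, unlike a futex, `unpark` buffers one permit, so the wakeup cannot be lost), while `AtomicInteger`'s volatile accesses stand in for the RELEASE store and ACQUIRE load:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

public class FutexWakeSketch {
    static int sharedState = 42;                     // plain (non-atomic) data
    static final AtomicInteger thread1Wake = new AtomicInteger(0);

    static int run() throws InterruptedException {
        final int[] observed = new int[1];
        Thread t1 = new Thread(() -> {
            // OP4 + OP5: re-check the flag after every wakeup, so a spurious
            // or early wakeup simply loops instead of losing the notification.
            while (thread1Wake.get() != 1) {         // ACQUIRE load (OP4)
                LockSupport.park();                  // stands in for futex_wait (OP5)
            }
            observed[0] = sharedState;               // OP6: guaranteed to see 99
        });
        t1.start();
        sharedState = 99;                            // OP1
        thread1Wake.set(1);                          // RELEASE store (OP2)
        LockSupport.unpark(t1);                      // stands in for futex_wake (OP3)
        t1.join();
        return observed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());                   // prints 99
    }
}
```

Because the wake here cannot be lost, every execution terminates with OP6 observing 99; the open question is whether `futex_wake` should be required to provide the same kind of guarantee.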
**Stack binding/unbinding** cannot be atomic. Swap-stack is a combination of the two and cannot be atomic, either.
We may require the language implementer to use other synchronisation correctly to ensure racy `swap-stack` operations do not happen.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/24Native Interface (super issue)2016-06-17T15:23:20+10:00John ZhangNative Interface (super issue)*Created by: wks*
This is an outline of issues related to the native interface, that is, interacting with the native world. This topic includes but is not limited to object layout, pointer types, object pinning and foreign function calls. We should open other issues to discuss concrete problems.
* Make a platform-specific Mu specification
+ Address some of the following issues, including object layout, calling convention, ...
+ Draft for AMD64: https://github.com/microvm/microvm-spec/wiki/native-interface-x64-unix
* Type system: (https://github.com/microvm/microvm-meta/issues/34)
+ Raw pointer types
+ Structure types with native/explicit object layout
+ Union types (unlikely to have in Mu)
+ Mapping Mu types to C types and native object layout: (in the [Native Interface](https://github.com/microvm/microvm-spec/wiki/native-interface) chapter in the spec)
* Memory space beyond heap/stack/global
+ Memory spaces with various constraints
- Is it movable? Pinnable? Can it contain references? Can it be referred to? Is it GC-traced? GC-collected? ...
+ Object pinning: https://github.com/microvm/microvm-meta/issues/28
- If object pinning is allowed, what does "pin" mean?
* Foreign function interfaces
+ Calling foreign functions from Mu: (The CCALL instruction. See the [spec](https://github.com/microvm/microvm-spec/wiki/native-interface))
+ Calling C functions
+ System calls
+ Calling back to Mu from foreign functions: https://github.com/microvm/microvm-meta/issues/39
+ From C functions
+ Signal handling
The following should be addressed by a higher-level abstraction:
* Loading native libraries
+ The client loads libc and finds the address of `dlopen`, `dlsym`, `dlclose` and `dlerror`. Then the `CCALL` instruction takes care of the rest by calling them.
* Loading "heavier-weight" Mu bundles (currently called MuLF): https://github.com/microvm/microvm-meta/issues/30
The following are not related to the native interface, but are related to raw memory:
* How to expose the address of objects so that the user can analyse the memory behaviour? (This involves profiling, too. We may open a dedicated issue for profiling.)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/23Micro VM interface changes precipitated by having C as a client2016-06-17T15:23:18+10:00John ZhangMicro VM interface changes precipitated by having C as a client*Created by: eliotmoss*
While there will be a bunch of carefully worked out details that need to follow, I thought I would get the ball rolling with a quick summary / highlights of the Modula-3 approach. (See https://www.cs.purdue.edu/homes/hosking/m3/reference/m3.html for a Modula-3 reference.)
First, M3 has ordinary *traced* references and also **untraced** ones. REF T is the type of a traced reference to a T, while UNTRACED REF T is the type of an untraced one. Traced vs. untraced is a concept distinct from safe vs. unsafe. Safe use of traced and untraced references keeps them segregated. Untraced objects come from an explicitly managed storage area. You can use NEW on an untraced reference type to allocate in such an area.
Certain regions of code (interfaces or modules -- probably a bundle is the corresponding thing for the Micro VM) can be marked UNSAFE. Unsafe code can do additional things. These are ones that I remember:
* Cast an expression to an arbitrary type (of the same size in bits) - this clearly allows interconverting between traced and untraced pointers, but also other things; this is called LOOPHOLE(e, T).
* Address arithmetic (Digression: In M3 REFANY is the type of a traced reference to any object, which in safe code requires a dynamically type-checked downcast to turn into a REF T and do more interesting things. The corresponding untraced type is ADDRESS, and it has arithmetic operators.)
* Up/down cast without checking
* Free an untraced reference's referent (the function is called DISPOSE)
Safe code must use traced/untraced references only in specific ways, described in the language reference.
We may need to add to this sort of design to allow short unsafe code regions to manipulate ordinary (traced) objects safely in the presence of (say) concurrent GC. Also, an untraced version of iref would make sense for the Micro VM.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/22Trap and OSR2018-06-28T23:11:32+10:00John ZhangTrap and OSR*Created by: wks*
This ticket tracks the design of trap handling, stack introspection and OSR API.
# Overview
There are three flavours of stack usage.
The first case is using TRAP to compute some value (or change some µVM state):
1. Enter the `handle_trap` call-back from a TRAP instruction, leaving the stack in the `READY<T>` state. The current thread is unbound and suspended.
2. The Client computes some value (of type `T`) and changes some µVM state.
3. Return from the trap and continue normally. That is, re-bind the thread to the stack, passing the value of type T.
The second case is using TRAP for OSR.
1. Enter the `handle_trap` call-back from a TRAP instruction, leaving the stack in the `READY<T>` state. The current thread is unbound and suspended.
2. The Client queries the current version of the function, the current instruction, and the current KEEPALIVE variables of any frame in the current stack.
3. The Client pops frames. Now the stack is in some inconsistent state.
4. The Client pushes frames. For each frame, it supplies the version of the function, the instruction to continue from, and the values of any live variables.
5. The Client re-binds the thread to the stack and returns from the TRAP. Just before rebinding, the stack should be in some `READY<U>` state where U may not be T.
The third case is to manipulate some arbitrary stack in a `READY<T>` state.
1. The Client does whatever it wants to the stack.
2. The stack is left in a `READY<U>` state where U may not be T.
# Open questions
## Is an UNDER_CONSTRUCTION flag needed?
Observed from the previous cases, there are generally two categories:
1. Do not perform OSR and simply return with some value (or throw exceptions to the stack).
2. Perform OSR.
The second case may leave the stack temporarily in an inconsistent state. Any attempt to swap-stack to such a stack is meaningless. The UNDER_CONSTRUCTION flag indicates such a state.
This requirement can be interpreted in two ways:
1. This flag is a physical flag. The Client takes an action to set the flag and clears it after OSR. The flag can be probed, and is tested during SWAP-STACK. A µVM implementation may use a mutual-exclusion lock for swap-stack (but this may be inefficient).
2. This flag is only conceptual, that is, it does not physically exist. The Client simply does OSR. There is no way to see whether a stack is "under construction". Swapping to such a stack gives undefined behaviour.
I prefer the second approach. Swapping to an "under construction" stack is never meaningful and always requires extra synchronisation in the program. We may trust the Client to generate correctly synchronised code.
## What state is a stack in when some frames are popped?
All frames other than the top frame must be executing the CALL instruction. After popping any frame, the "caller frame" is exposed as the top frame and it may continue with a value or receive an exception just like the TRAP instruction. So it is natural to define that after popping, the stack is in the `READY<R>` state where R is the return type of the current function.
However, from the implementation point of view, SWAP-STACK must have a different calling convention from ordinary calls (mainly because SWAP-STACK cannot have any callee-saved registers because the callee may not swap back). There must be a "ghost frame" above the current frame with CALL to adapt to the SWAP-STACK calling convention. The value passed by SWAP-STACK will be returned from the "ghost frame" to the CALLer.
We may assume adding the "ghost frame" is cheap. Maybe not.
# Hypothetical Client code in Java
This code lets the Client perform some computation.
```Java
/* Assume the following µVM IR code:
%bb:
@current_time_millis_1234567 = TRAP <@i64>
CALL <...> @print (@current_time_millis_1234567)
....
*/
class Client extends MicroVMClient {
@Override
public TrapReturnValue handleTrap(ClientAgent ca, int threadHandle, int stackHandle) {
long time = System.currentTimeMillis();
ca.putLong("@i64", time);
return new RebindThreadPassValue(stackHandle, time);
}
}
```
This code replaces the top frame:
```Java
class Client extends MicroVMClient {
@Override
public TrapReturnValue handleTrap(ClientAgent ca, int threadHandle, int stackHandle) {
// Introspect the frames
int curInstID = ca.getCurrentInstruction(stackHandle, 0); // 0 = top frame
int[] keepAlives = ca.dumpKeepAlives(stackHandle, 0); // 0 = top frame
// Re-compile the function. newFunc also tells the Client where to continue.
HighLevelFunction newFunc = compileNewFunction(...);
// Pop a frame
ca.popFrame(stackHandle);
// What µVM function is the new high-level function?
int funcID = newFunc.getUvmFuncID();
// Where to continue?
int contInstID = newFunc.getContinuationPoint();
// What are the values of local variables?
Map<Integer, Integer> variableToValue = new HashMap<Integer, Integer>();
for (LocalVariable lv: newFunc.localVariables()) {
int valHandle = ca.putXxxx(lv.getValue());
int varID = lv.getUvmVarID();
variableToValue.put(varID, valHandle);
}
// Push the frame
ca.pushFrame(funcID, contInstID, variableToValue);
// Return from TRAP, tell the µVM to re-bind the thread with the stack. The trap does not receive values.
return new RebindThreadPassVoid(stackHandle);
}
}
```
This example emulates the JVMTI function `ForceEarlyReturnInt` (force a function (of `int` return value) to return early with a specific value, without executing any `finally` clauses).
```java
/*
Assume the following µVM function:
.funcsig @foo_sig = @i32 ()
.funcdef @foo VERSION @foo_v1 <@foo_sig> () {
%entry:
@my_trap_xxxxxx = TRAP <@void>
THROW @NULLREF
}
*/
class Client extends MicroVMClient {
@Override
public TrapReturnValue handleTrap(ClientAgent ca, int threadHandle, int stackHandle) {
ca.popFrame(stackHandle); // Pop the top frame and expose its caller to the top
int returnValue = 42;
int rvHandle = ca.putInt("@i32", returnValue);
return new RebindThreadPassValue(stackHandle, rvHandle);
}
}
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/21Proposed Lua-like µVM-Client Interface2018-06-28T23:11:32+10:00John ZhangProposed Lua-like µVM-Client Interface*Created by: wks*
In Lua, the C program exchanges values with a "Lua state" using a stack.
1. All Lua values are kept on the stack and can be converted to and from C values on demand.
2. All Lua references must be kept on the stack. All operations involving tables require the table operand to be on the stack.
Why?
1. The type systems of Lua and C are different. This stack segregates all Lua types from C types.
2. Lua uses garbage collection. (The official Lua uses mark-sweep.) This stack simplifies GC by preventing the C program from keeping a reference to the Lua world.
Reference: The stack in the Lua C API http://www.lua.org/pil/24.2.html
# Overview
The Client interacts with the µVM via a µVM Client Agent. The Agent keeps a stack of µVM values, a thread-local allocator and so on. It is the counterpart of a µVM thread, albeit working for the Client. Each Agent is only accessible from one Client thread (it is not thread-safe), but a µVM Client may have an arbitrary number of Agents.
There are several principles:
* The stack holds any µVM value that can be held in a µVM SSA variable. A cell in the stack is like a µVM SSA variable. Unlike the µVM memory, it does not have a memory location and cannot be referred to.
* The µVM provides implementation-specific ways to push Client values onto the Agent stack and to extract µVM values as Client values.
* All µVM API messages that take µVM values as parameters shall use existing µVM values on the stack. All µVM API messages that return µVM values shall push new values on the stack.
Example (Java as Client):
```java
MicroVM mvm = …;
ClientAgent ca = mvm.newClientAgent();
// push some values
ca.pushInt("@i32", 0x12345678);
ca.pushLong("@i64", 0x123456789abcdef0L);
ca.pushFloat(3.14F);
ca.pushDouble(6.28);
// The Java integer type and the µVM type do not need to match.
// The following pushes convert Java integer type to µVM types of different lengths
ca.pushInt("@i64", -0x22334455);
ca.pushLong("@i32", 0x123456789abcdef0L); // truncated to 0x9abcdef0
ca.pushInt("@i1", 1); // integer of 1 bit (boolean type)
ca.pushBool("@i1", true); // same as above.
// Retrieving values from the stack
int v1 = ca.toInt(-4); // Get the 4th element from the top of the stack and convert to Java int. So it is -0x22334455.
long v2 = ca.toLong(-4); // Same as above, but convert to Java long (zero-extended). So it is 0xddccbbabL.
// Popping
ca.pop(7); // Pop 7 elements from the stack.
// Memory access
ca.pushGlobal("@global_var"); // Get the internal reference of a global cell and push on the stack
ca.load(MEMORD_ACQUIRE); // Load. Assume the top element is an internal reference.
ca.pushGlobal("@global_var");
ca.pushInt("@i64", 42);
ca.store(MEMORD_RELEASE); // Store. The top element is the new value and the second is the internal reference.
// Calling a µVM function
// This is fairly complicated because this involves creating both a µVM stack and a µVM thread.
ca.pushFunc("@some_func"); // Push a function reference
ca.pushInt("@i32", 42); // Push argument1
ca.pushDouble(3.14); // Push argument2
try {
ca.newStack(2); // Create a new stack, using a function with 2 arguments on the stack.
// So the top 2 elements are arguments and the third element is the function itself.
// Those elements are popped and a new stack value is pushed.
} catch (MicroVMStackOverflowException e) {
...
}
ca.newThread(); // Create a new thread. Assume the top element on the stack is a µVM stack value.
```
# API functions
The principles are
+ Pushing operations add new values to the top.
+ The popping operation removes values from the top.
+ Queries (the operations that convert values back to Java values) are non-destructive. They keep the values.
+ Operations on the top of the stack are destructive: they pop the operands, like the JVM, then push new values on the top.
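A minimal tagged value stack following these principles might look like the sketch below (purely illustrative: the class and method names are invented, and a real Agent would carry full µVM type information rather than a three-way tag):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the Agent's value stack. Each cell carries a type
// tag so queries like is_int can be answered; queries are non-destructive,
// while pop removes values from the top.
class AgentStack {
    enum Tag { INT, DOUBLE, REF }
    record Cell(Tag tag, long bits) {}

    private final Deque<Cell> stack = new ArrayDeque<>();

    void pushInt(long val)      { stack.push(new Cell(Tag.INT, val)); }
    void pushDouble(double val) { stack.push(new Cell(Tag.DOUBLE, Double.doubleToLongBits(val))); }

    // Queries index from the top, Lua-style: -1 is the topmost element.
    private Cell at(int pos) {
        int fromTop = -pos - 1;
        return stack.stream().skip(fromTop).findFirst().orElseThrow();
    }

    long toInt(int pos)      { return at(pos).bits(); }                        // non-destructive
    double toDouble(int pos) { return Double.longBitsToDouble(at(pos).bits()); }
    boolean isInt(int pos)   { return at(pos).tag() == Tag.INT; }

    void pop(int num) { for (int i = 0; i < num; i++) stack.pop(); }
}
```

As in the Lua C API, queries leave the stack unchanged while `pop` is the only way to remove values.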
## Getting the µVM Client Agent
* message: `new_client_agent`
* parameters: none
* returns: a handle of new client agent
Create a new Client Agent.
Example Java signature: `public ClientAgent MicroVM#newClientAgent()`
* message: `close_client_agent`
* parameters:
1. `ca`: the handle to the client agent
* returns: none
Close the Client Agent.
Example Java signature: `public void ClientAgent#close()`
## Pushing new values to the stack
* message: `push_value`
* parameter:
1. `uvm_type`: the µVM type of the value
2. `val`: the value in the Client's representation
* returns: none
* stack top:
+ before: ...
+ after: ..., `val_in_uvm_type`
Convert the Client value `val` to the µVM type `uvm_type` and push it onto the stack. This message only works for non-reference values, including integers and floating point numbers.
The µVM may implement this as multiple functions/methods that best suit the Client programming language.
Example Java signatures:
* `public void ClientAgent#pushInt(int uvmTypeID, int val)`
* `public void ClientAgent#pushLong(int uvmTypeID, long val)`
* `public void ClientAgent#pushBigInteger(int uvmTypeID, BigInteger val)`
* `public void ClientAgent#pushFloat(int uvmTypeID, float val)`: will truncate/extend to the µVM type
* `public void ClientAgent#pushDouble(int uvmTypeID, double val)`: will truncate/extend to the µVM type
* `public void ClientAgent#pushFloatNoType(float val)`: always convert to the µVM `float` type
* `public void ClientAgent#pushDoubleNoType(double val)`: always convert to the µVM `double` type.
* message: `push_global`
* parameters:
1. `global`: the ID/name of a µVM global cell
* returns: none
* stack top:
+ before: ...
+ after: ..., `global_val`
Push an internal reference of a global cell to the stack.
Example Java signature: `public void ClientAgent#pushGlobal(int uvmGlobalID)`
* message: `push_func`
* parameters:
1. `func`: the ID/name of a µVM function
* returns: none
* stack top:
+ before: ...
+ after: ..., `func_val`
Push a function reference (µVM's `func` type) of a µVM function to the stack
Example Java signature: `public void ClientAgent#pushFunc(int uvmFuncID)`
## Converting to Client types
* message: `to_client_value`
* parameters:
1. `pos`: the position in the stack
* returns: the µVM value in the client type
* stack top: not changed
Convert a value in the stack to the client type. This applies for non-reference types including integers and floating point numbers.
The µVM may implement this as multiple functions/methods that best suit the Client language.
Example Java signatures:
* `public int ClientAgent#toInt(int pos)`
* `public long ClientAgent#toLong(int pos)`
* `public BigInteger ClientAgent#toBigInteger(int pos)`
* `public float ClientAgent#toFloat(int pos)`
* `public double ClientAgent#toDouble(int pos)`
## Popping
* message: `pop`
* parameters:
1. `num`: the number of values to pop
* returns: none
* stack top:
+ before: ..., `elem_1`, `elem_2`, ..., `elem_num`
+ after: ...
Pop `num` elements from the stack.
Example Java signature: `public void ClientAgent#pop(int num)`
## Memory Allocation
* message: `new`
* parameters:
1. `type`: the ID/name of the µVM type of the object
* returns: none
* stack top:
+ before: ...
+ after: ..., `ref`
Allocate an object of type `type` on the µVM heap and push the object reference on the stack.
Example Java signature: `public void ClientAgent#newObj(int typeID)`
* message: `new_hybrid`
* parameters:
1. `type`: the ID/name of the µVM type of the object
* returns: none
* stack top:
+ before: ..., `len`
+ after: ..., `ref`
Allocate an object of type `type`, which must be a `hybrid` type, on the µVM heap and push the object reference on the stack. The length of the variable part is `len`, which may be of any µVM integer type and is zero-extended to the machine word length.
Example Java signature: `public void ClientAgent#newHybridObj(int typeID)`
## Memory Access
* message: `load`
* parameters:
1. `memord`: the memory ordering
* returns: none
* stack top:
+ before: ..., `iref`
+ after: ..., `val`
Load from an internal reference `iref` on the stack and push the loaded value to the stack, using the `memord` memory ordering.
Example Java signature: `public void ClientAgent#load(MemoryOrder memOrd)`
* message: `store`
* parameters:
1. `memord`: the memory ordering
* returns: none
* stack top:
+ before: ..., `iref`, `new_val`
+ after: ...
Store `new_val` on the stack into an internal reference `iref` on the stack, using the `memord` memory ordering.
Example Java signature: `public void ClientAgent#store(MemoryOrder memOrd)`
## Stack and Thread operations
* message: `new_stack`
* parameters:
1. `nparams`: the number of parameters to the stack-bottom function
* returns: none
* stack top:
+ before: ..., `func`, `arg_1`, `arg_2`, ..., `arg_nparams`
+ after: ..., `stack`
Create a new stack using `func` as the stack-bottom function and `arg_x` as its arguments. Push the newly created `stack` value to the stack.
Example Java signature: `public void ClientAgent#newStack(int nParams)`
* message: `new_thread`
* parameters: none
* returns: none
* stack top:
+ before: ..., `stack`
+ after: ..., `thread`
Create a new thread which is initially bound to a stack `stack`. Push the thread value on the Client Agent stack. The new thread `thread` starts execution immediately.
Example Java signature: `public void ClientAgent#newThread()`
## Other API functions
TODO: define them later
* `is_int`, `is_float`, `is_ref`, `is_iref`, ... `is_stack`, `is_thread`, `is_tagref64`
* `tagref64_is_int`, ... `tagref64_get_ref`, ..., `tagref64_set_fp`... : manipulate the `tagref64` type.
* `copy_value`, `remove_value`: manipulate the Client Agent stack.
* `extract_field`, `insert_field`: manipulate `struct` types
* `extract_element`, `insert_element`: manipulate `vector` types
* `get_iref`, `get_field_iref`, `get_elem_iref`, `shift_iref`, `get_fixed_part_iref`, `get_var_part_iref`: manipulating reference types.
* `get_current_stack`, `kill_stack`, `bind_thread_to_stack`: advanced thread/stack operations
* `get_active_func_version_id`, `get_current_instruction_id`, `dump_keepalive_variables`, `pop_frame`, `push_frame`: for OSR
# Known Issues
This API assumes a stack which can contain ANY µVM types that are applicable to SSA variables. This makes it dynamically typed. This is good for Lua because Lua is dynamic and has a small set of types (only nil, boolean, number, string, table, function, userdata, ...), all of which have similar sizes.
The main problem for the µVM arises when there are µVM `struct` type values (especially large structs, which are better represented as references to heap objects than as values). Some corner cases include the "complex number" type, which can be represented as a struct of two doubles. In any case, extra type information must be kept so that the stack knows the type of all of its elements.
I have to trust the µVM implementation to handle the dynamic typing efficiently.
spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/20Client languages in play2016-06-17T15:23:09+10:00John ZhangClient languages in play*Created by: eliotmoss*
Maybe this is also being discussed elsewhere -- if so, we can close this and move the discussion there. Anyway, on the US side we are wondering about the specific suite of client languages being worked on. We have at least one student interested in doing a mapping from Java bytecode to μVM IR, and we also wonder about some of the dynamic languages (Javascript, maybe? I should perhaps come back and edit this after this week's meeting with the students). What about C or C--? For C we could look at lcc, a little C compiler originally designed for a compiler class, I think, but it accepts some standard kind of C, as I recall. gcc is more of a bear, but ought to be on the agenda at some point, no?https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/19Code generator prototyping strategies2016-06-17T15:23:07+10:00John ZhangCode generator prototyping strategies*Created by: eliotmoss*
As we start to ramp up in Amherst, we're pondering how best to get things rolling (i.e., what a good starting point is) and where we want to end up.
**Initial prototype thought**
For our initial prototype we are thinking about using QEMU (locally pronounced KEE-mew), the Quick Emulator. QEMU is a whole machine emulator (but can also be configured to run as user-mode only) that supports emulating one hardware ISA on another one (and same to same, of course). Being a whole-machine emulator it deals with devices, interrupts, memory mapping, etc., but the part of greatest interest to us is that it takes guest ISA code blocks and dynamically translates them to host machine code, maintaining the translations in a code cache. Thus parts of its design will be great for dynamic and adaptive code generation. QEMU works by translating guest code into QEMU IR, which is then passed to a target code generator. That code generator does a useful level of local register allocation and local optimizations (hence the Quick in Quick Emulator).
For our case, μVM IR would be the guest ISA and our target would be the host ISA that QEMU targets. A part of how QEMU does things is that the guest register file is represented as a memory data structure and (to first approximation) the host registers are used for local register allocation. To finer approximation I **believe** QEMU can bind some guest registers to host registers, useful where the host register set is larger. For μVM IR the "registers" are the SSA values and we can probably get some mileage from the SSA form in telling the QEMU register allocator helpful things about when things "die", etc.
Downsides of this prototype approach include:
* Dependence on QEMU
* Code quality may be limited, particularly in that it will tend to push all variables to memory as it crosses from block to block
* It is not clear what the situation is with QEMU and multithreading, though it **may** support concurrent execution
Upsides:
* Fast path to getting native code going
* Useful optimizations, especially for a dynamic environment
* Back-end architecture design that we might want to use as a guide in our own work
* Already implements major platforms of interest
**Eventual system design thought**
Down the line we may want something like QEMU's code generation design, with more provision for (optional) higher levels of optimization and register allocation across larger scopes (that is, for whole functions, not just blocks/traces). But their internal IR, code generators, etc., might be a very useful base, as well as having a notion of the code cache design and so forth.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/18How to represent merging of variables / Phi functions2018-06-28T23:11:32+10:00John ZhangHow to represent merging of variables / Phi functions*Created by: eliotmoss*
This is an update of a proposal from a year ago.
The current Mu definition of SSA-form has labels, branches to labels, and Phi functions just after labels.
I propose an alternative, which we might call "goto with values". It is similar to continuation passing style except that all the continuations are statically defined (one for each label). This alternative would work as follows:
* Each label would have zero or more (SSA-variable : type) pairs. These SSA-variables are local to the block that starts at the label.
* Each branch label (both labels in a conditional branch, the single label in an unconditional branch, etc.) would list locally visible values being "sent" to the branch target.
Example:
```
x = 3;
y = 25;
goto l(x, y);
...
l: (x2:int<32>, y2:int<8>)
```
This form makes it clear that the association ("assignment") of values to phi-variables has to happen as part of the control transfer. Phi-functions are simply one way of representing that, but they're not a way that seems immediately helpful for code generation.
I further observe that the current definition apparently allows any value to be passed in to a Phi, but I think it should be restricted to global/constant SSA variables or SSA variables defined in the sending block. The new form perhaps makes that clear, not least because with it you can explicitly disallow referring to local SSA variables defined in other blocks.
Putting this another way: non-global SSA variables are in scope only from where they are defined to the end of their block. Some may be defined at the label that starts the block and others may be defined later, but all live values must be mentioned at branches and explicitly passed on.
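For instance, a two-way merge in this "goto with values" form might be written as follows (a sketch only; the syntax is hypothetical, extrapolated from the example above):

```
%then:
    %a = ADD <@i64> %p %q
    BRANCH %merge(%a)
%else:
    %b = MUL <@i64> %p %q
    BRANCH %merge(%b)
%merge: (%v : @i64)    // %v is local to %merge; no PHI needed
    RET <@i64> %v
```

Each predecessor states explicitly which value it sends, so the merge block needs no phi-function and no reference to variables defined in other blocks.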
Compared with traditional SSA form this may appear verbose. However, it has at least the following advantages:
* A code generator or optimizer need not perform a live variable analysis to know what is live at a given point in the code.
* Longer live ranges are broken up into smaller ones, which may allow better register allocation and is well suited to a coalescing register allocator.
* This form appears simpler to deal with in a formal specification of Mu.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/17Portability vs exploiting architecture potentials.2016-06-17T15:23:01+10:00John ZhangPortability vs exploiting architecture potentials.*Created by: wks*
> For should the enemy strengthen his van, he will weaken his rear; should he strengthen his rear, he will weaken his van; should he strengthen his left, he will weaken his right; should he strengthen his right, he will weaken his left. If he sends reinforcements everywhere, he will everywhere be weak. -- Art of War, by Sun Zi
The µVM is designed to abstract over the hardware, but only acts as a thin layer of abstraction.
There are many differences between underlying hardware platforms. Scroll down to the Appendix or read https://github.com/microvm/microvm-meta/issues/16
# Solutions to differences
As mentioned in https://github.com/microvm/microvm-meta/issues/16, there are three solutions, from the weakest to the strongest.
1. Define the differences as **undefined behaviour**. The client must avoid them at all costs.
2. Define the differences as **implementation-defined behaviour**. The µVM implementation defines the behaviour and provides documentation and compile-time/run-time checkable mechanisms.
3. **Define the behaviour** in the µVM. The µVM implementation bridges the differences.
The 1st approach is the weakest. It makes it absolutely impossible to make use of any platform-specific features and introduces excessive checks to avoid undefined behaviour. So the µVM should lie somewhere between approaches 2 and 3.
# The µVM design goal
The µVM has several conflicting design goals.
1. µVM is low-level. It is a thin layer over the hardware.
2. µVM is minimal. Anything that can be done efficiently by the Client should be done by the Client.
3. µVM is **portable**.
4. µVM should support high-performance VMs.
5. µVM should potentially run on resource-constrained devices.
Depending on how to interpret **portable**, there are two different implications:
1. The µVM provides compile-time or run-time flags so that the Client can generate platform-dependent µVM IR code.
2. The µVM IR code is cross-platform. The same µVM IR code should run on all platforms.
The second interpretation is more portable than the first. Overall, the more portable the µVM is, the thicker the abstraction layer is.
Goal 4 can be interpreted in different ways:
1. Highest theoretical possible performance.
2. As high as C for computation-intensive problems.
3. As high as Java for computation-intensive problems.
4. Much higher than Python/PHP/R/<*insert your favourite scripting language here*>
5. As high as Python/PHP/R/...
6. As long as the program eventually terminates.
Depending on the concrete implementation, an average desktop/server µVM should reach between 2 and 3.
Goal 5 requires the µVM and the Client to be simple. In this case, the Client and the µVM cannot perform much reasoning about the program, which results in sub-optimal performance of the emitted code.
# Choices in the µVM
## Behaviour of UDIV/SDIV
1. UDIV/SDIV are implementation-defined, or
2. They always behave like Java (div by 0 is an exception, -0x80000000/-1 = -0x80000000)
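If option 2 is chosen, a Mu implementation on hardware with different native division semantics would have to insert guards. A rough 32-bit model of the Java-like behaviour, sketched in Python for illustration only:

```python
INT_MIN = -2**31

def sdiv_java(a, b):
    """32-bit signed division with Java-like semantics:
    division by zero raises an exception; INT_MIN / -1 wraps to INT_MIN."""
    if b == 0:
        raise ZeroDivisionError("div by 0 is an exception")
    if a == INT_MIN and b == -1:
        return INT_MIN            # the overflowing case is defined to wrap
    q = abs(a) // abs(b)          # Java truncates toward zero
    return -q if (a < 0) != (b < 0) else q
```

On x86, where both cases raise a hardware "divide error", the guards become explicit compare-and-branch sequences before the `IDIV`.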
## Supported vector sizes
Conflicts:
1. The current µVM IR is very expressive. It can express any vector types, given type T and size n: `vector <T n>`
2. Not all are supported by the architecture
So the µVM should
1. Make it **implementation-defined**, or
2. Only support **some selected vector sizes**, or
3. Support some selected vector sizes **and platform-specific vector sizes**, or
4. Support **all** `vector<T n>`, either using scalar operations or vector instructions.
Rationale:
1. The Client can **probe** the supported vector sizes and can generate code accordingly. So it is the Client's responsibility to choose vectors. This is also a kind of **specialisation** which is usually done by the high-level optimiser.
2. Choosing the appropriate vector size is a kind of **instruction selection** and should be done by the µVM in a platform-specific fashion.
For rationale 1, [DeVito et al](http://terralang.org/pldi071-devito.pdf) used an auto-tuning technique to determine the optimal vector size for the vectorised matrix multiplication problem.
For rationale 2, [Jibaja et al](https://01.org/node/1495) proposed adding SIMD to JavaScript, but only providing 128-bit registers to the programmer. However, the following code is difficult for the µVM to convert to 256-bit vectors:
```
__m128 elems[N];
for (int i = 0; i < N; i++) {
    elems[i] = vector_add_floatX4(elems[i], constant_vector(1, 2, 3, 4));
}
```
The reason is:
1. It requires extensive control-flow analysis to merge two adjacent 128-bit additions into one 256-bit addition.
2. 256-bit vectors have a different alignment requirement, so an `__m128[]` array cannot be treated as an `__m256[]` array.
The following code, however, is easier to convert:
```
__align_to(256) __m128 elems[N];
for (int i = 0; i < N; i += 2) {
    elems[i]   = vector_add_floatX4(elems[i],   constant_vector(1, 2, 3, 4));
    elems[i+1] = vector_add_floatX4(elems[i+1], constant_vector(1, 2, 3, 4));
}
```
However, the client, when generating such code, **is already aware of** the presence of 256-bit vectors.
## Array indexing
Conflict:
1. Array indexing seems to be platform-independent. It is "begin+index" (index can be negative).
2. It is implemented using address calculation and memory access, where the "address" is word-sized.
Solutions:
1. Use **any integer type** as the index, sign-extending it as needed (used by LLVM; see http://llvm.org/docs/LangRef.html#getelementptr-instruction), or
2. Require the index to be **word-sized**; the Client must appropriately extend/truncate the index to word size (current µVM design)
Rationale:
1. It only involves a sign extension or truncation, which the µVM can do cheaply. However,
2. Since the Client can probe the word size and the Client can generate TRUNC/SEXT instructions, it should be done by the client.
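The TRUNC/SEXT adjustment mentioned in rationale 2 amounts to the following (a Python model of two's-complement indices; `adjust_index` is a hypothetical helper, not a µVM API):

```python
def adjust_index(index, from_bits, word_bits=64):
    """Bring a two's-complement index to word size: truncate if it is
    wider than the word, sign-extend if it is narrower -- exactly what
    the Client's TRUNC/SEXT instructions would do before indexing."""
    bits = min(from_bits, word_bits)
    x = index & ((1 << bits) - 1)     # truncate to the narrower width
    if x & (1 << (bits - 1)):         # sign bit set:
        x -= 1 << bits                # reinterpret as a negative value
    return x

# An 8-bit index 0xFF is the signed value -1 after sign extension:
assert adjust_index(0xFF, 8) == -1
```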
# Portable µVM IR as a subset of the µVM
If we can define **a subset of the µVM** with **defined behaviours** and **reasonable** performance (perhaps calling it "µVM Mobile Edition"), then some simplistic µVM Clients (I assume there will soon be many such Clients) can generate portable µVM IR code.
For implementations that seek ultimate performance, the µVM implementation can implement many **machine-dependent instructions** as a superset of such a "portable" IR. (call it "µVM Enterprise Edition"?)
Alternatively, we can define that "subset" as "The µVM IR ®" and treat the implementation-dependent extensions as "extended µVM".
# Appendix: Examples of differences
There are differences between architectures. For example:
The **word length** is different. Some operations (including array indexing) depend on the word length.
The **integer division** instruction behaves differently among architectures, especially in "division by zero" and "signed integer overflow".
Not all operations on all types perform equally well. 64-bit integer operations perform significantly worse (if they are possible at all) than their 32-bit counterparts on 32-bit machines.
The supported **vector length** of vector instructions is different. This may vary from 64 bits (ARM) up to 512 bits (x86 with AVX) with 128 bits widely supported.
Some instructions are "optional" and the functionality must be software-implemented on those architectures. For example, **UDIV** and **SDIV** in ARMv7.
Although most architectures provide all operations (binary arithmetic and logical, comparison and conversion) mentioned in LLVM, the concrete instructions do not work on all data sizes. For example, **conversion from floating point to integer** can only convert between certain data types (float to 32-bit int or 64-bit int, but not other int types). Not all **vector operations** work for all vector types.
The address size (and indexes) of memory (and array) operations is different among architectures.
The supported data types that can be **atomically loaded/stored** differ. This affects how the Client is going to implement mutex locks.
Alignment requirement for **vector load/store** operations.
In the native interface/C interface/foreign function interface(FFI)
* The **pointer size** depends on the architecture. This also affects **function pointers** for external C functions.
* The **available system calls** depend on the operating system, the ABI and the processor. The Client must handle the differences between operating systems.
Other issues mentioned in https://github.com/microvm/microvm-meta/issues/16
# Current extension mechanism
An **intrinsic function** (or IFUNC for short) is something that has a form similar to a function, but is treated by the µVM as a regular instruction. The simplest form takes only the name (or ID) of the IFUNC:
```
%result = ICALL @uvm.thread_exit
```
The most complete form takes type arguments, value arguments and an exceptional destination:
```
%result = ICALL @uvm.do_something_weird <@T1 @T2 @T3> (%val1 %val2 %val3) EXC %normal %exceptional
```
Theoretically all binary operations can be defined as IFUNCs. For example:
```
%result = ICALL @uvm.math.sdiv <int<64>> (%lhs %rhs) EXC %normal %div_by_zero_handler
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/16Undefined vs Implementation-defined2018-06-28T23:11:32+10:00John ZhangUndefined vs Implementation-defined*Created by: wks*
NOTE: a higher-level discussion is in https://github.com/microvm/microvm-meta/issues/17
During the meeting in 23 September 2014, we talked about the difference between "undefined behaviour" and "implementation-defined behaviour".
# Background
Some operations in C as well as LLVM have undefined behaviour.
* Division by zero.
* Example: 42 / 0
* Overflow in signed division.
* Example: int a = -0x80000000; int b = a / -1;
* Shifting an integer by a number of bits greater than or equal to the width of the left-hand-side type
* Example: int a = 42; int b = a << 32; int c = a << -1; assume int is 32-bit.
However, the machine instruction counterparts have defined behaviours in each and every architecture.
* Division by zero
* x86: IDIV, DIV: Divide-by-zero raises "divide error".
* ARMv7: SDIV, UDIV:
* ARMv7-A: Divide-by-zero always gets 0.
* ARMv7-R: Controlled by SCTLR.DZ, an "Undefined Instruction" exception may or may not be raised.
* A64: Divide-by-zero always gets 0
* division overflow
* x86: IDIV, DIV: If the result is not representable (positive too large, negative too small) by the corresponding type (signed or unsigned), then raises "divide error".
* ARMv7-A: SDIV and UDIV are optional. They may be implemented by software.
* ARMv7, A64: SDIV, UDIV: result is truncated to the number of bits of the corresponding type. No error is raised. So -0x80000000 / -1 == -0x80000000 when it is 32-bit.
* shifting:
* x86: SAL,SAR,SHL,SHR: the count operand is masked to 5 bits (32-bit integer) or 6 bits (64-bit integer)
* ARMv7:
* LSL, LSR, ASR (immediate): It can only encode 5 bits of shift amount.
* LSL, LSR, ASR (register): The shift amount register is masked to 8 bits. After shifting, the low 32 bits are the result.
* A64:
* ASR, LSL, LSR (immediate): It can only encode 6 bits of shift amount.
* ASRV, LSLV, LSRV (register controlled): The shift amount is the second register modulo the register size (i.e. masked).
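For example, the x86 masking rule for a 32-bit left shift can be modelled as follows (illustrative only, not µVM semantics):

```python
def shl32_x86(x, count):
    """x86 SHL on a 32-bit operand: the count is masked to 5 bits
    and the result is truncated to 32 bits."""
    return (x << (count & 31)) & 0xFFFFFFFF

# A shift by 32 therefore behaves like a shift by 0 on x86,
# while C leaves the same shift undefined:
assert shl32_x86(42, 32) == 42
```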
# Undefined behaviour vs implementation-defined behaviour
In C11:
+ **undefined behavior**: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
+ **implementation-defined behavior**: unspecified behavior where each implementation documents how the choice is made
Implementation-defined behaviour has an additional requirement that the implementation should document the behaviour. As long as the behaviour is documented, it is still considered "defined", but at a different layer.
If a behaviour is never defined, the higher level (e.g. the Client) has no chance to depend on it even if the lower level (e.g. the CPU) has a precisely defined behaviour. On the contrary, if the behaviour is implementation-defined, there are still ways for the higher level to use the low-level detail. The ways include (assume the higher level is the client, the middle level is the µVM and the lower level is the CPU):
+ The Client programmer reads the µVM manual for the particular CPU.
+ The µVM provides compile-time-checkable flags and the Client is conditionally compiled. (`configure --with-uvm=x86-64`)
+ The µVM provides run-time-callable functions and the Client generates code conditionally (`if (uvm.wordSize() == 64) { emitStore64(reg,mem); }`).
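The same probing pattern can be sketched outside a real µVM client; here Python's `struct.calcsize("P")` stands in for a hypothetical `uvm.wordSize()` query, and the opcode names are made up for illustration:

```python
import struct

def word_size_bits():
    # Host pointer size, as a stand-in for a uvm.wordSize() probe.
    return struct.calcsize("P") * 8

def choose_store_op():
    # Conditional code generation driven by the probe result.
    return "STORE64" if word_size_bits() == 64 else "STORE32"
```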
# How should an abstraction layer be made over differences?
Undefined behaviours usually occur in cases (other than errors) where different platforms behave differently. Division and shifting are two examples.
When creating an abstraction layer over such differences, there are basically three choices.
1. Define the different part as *undefined behavior* and the high-level user must avoid using them.
* C and LLVM take this approach.
* The advantage is to make the specification very simple.
* The disadvantage is that it becomes very difficult for the higher level to make efficient use of these cases, since it must do whatever it takes to avoid the undefined behaviours.
2. Define the different part in one particular way and provide an implementation on every platform to behave like that.
* Java takes this approach.
* The advantage is the maximum portability of high-level code, since all programs work the same everywhere.
* The disadvantage is that implementation and optimisation become very difficult because there are too many invariants to maintain.
3. Define the different part as *platform-specific* and require the high-level to handle the difference.
* This will be the approach taken by the µVM.
* The advantage is to let the client make full use of the capabilities provided by the platform, resulting in efficient code. The µVM spec is also kept simple by pushing the differences to the implementation and the client.
* The disadvantage is adding more burden to the clients. The client must be aware of the difference between the µVM implementation on different platforms. The good thing is, the µVM still abstracts over the hardware, so the knowledge required by the client is limited to the µVM layer, not the hardware.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/15Summary of changes in the Type System and the Instruction Set2018-06-28T23:11:32+10:00John ZhangSummary of changes in the Type System and the Instruction Set*Created by: wks*
# Top-level IR
* Support new OSR mechanism #5
- Function version clause #5
# Type system
* Add vector type. #9
* (REMOVED) Add lock type #11
# Instruction set
* Add "exception clause" #6 #7 #10
- Also merge instructions: `CALL`/`INVOKE`, `ICALL`/`IINVOKE`
* Basic operations allow vector types and new instructions for vectors #9
- BinOp, Cmp, Conv, `SELECT`, `PHI`, `CALL`, `RET`, `LOAD` (gather), `STORE` (scatter)
- `EXTRACTELEMENT`, `INSERTELEMENT`, `SHUFFLEVECTOR`
* Match the new stack states (`READY<T>` and friends) #7
- Revised `TRAP`/`WATCHPOINT` instruction #7
- Revised `SWAPSTACK` instruction #7
* Revised `ICALL` instruction #9 spec-2https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/14Document memory layout requirements2018-06-28T23:11:32+10:00John ZhangDocument memory layout requirements*Created by: wks*
Because of the difference between architectures, it is better to leave the object layout to be implementation-specific. An implementation of a µVM can choose its optimal strategy to make memory accesses as fast as possible.
However, although the memory layout is part of the implementation, some guarantees must be made to enable:
1. implementation of object-oriented languages: the superclass-subclass relationship can be most conveniently implemented as the superclass being a prefix of a subclass structure.
2. interoperation between µVM programs and C programs: during a foreign function call, data structures are shared so that data can be passed both ways.
3. some basic data structures must support atomic accesses. One such type is references.
# Prefix rules
Some types **are prefixes of** other types. If T1 is a prefix of T2, then there are **shared components** between T1 and T2. A component can be the whole value or some part of it, including fields in structs, elements in arrays and vectors, and both the fixed part and the variable part in hybrids.
Specifically:
* Any type is a prefix of itself.
+ All corresponding components are shared.
* `void` is trivially a prefix of any type.
+ No component is shared.
* `T1 = T` is a prefix of `T2 = struct <SEQ>` for any T where SEQ is a sequence of types beginning with T.
+ The whole T1 itself is a shared component with the first field in the struct T2.
* `T1 = T` is a prefix of `T2 = hybrid<T U>` for any T.
+ The whole T1 is a shared component with the fixed part in the hybrid T2.
* `T1 = T` is a prefix of `T2 = array<T n>` for any T if n >= 1.
+ The whole T1 is a shared component with the first element in array T2.
* For all types T1, T2 and T3, if T1 is a prefix of T3 and T3 is a prefix of T2, then T1 is a prefix of T2.
+ The shared component between T1 and T2 are their mutual shared components with T3.
Examples:
* `float` is a prefix of `struct<float double>`.
+ The first field is shared.
* `struct<@TIB_REF @LOCK @LENGTH_TYPE>` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`.
+ The fixed part of the latter type is shared with the former type.
* `int<8>` is a prefix of `array <int<8> 100>`.
+ The first byte element of the latter is shared with the former.
* `@TIB_REF` is a prefix of `hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>`
+ There is an intermediate type `struct<@TIB_REF @LOCK @LENGTH_TYPE>` that bridges the "is a prefix of" relation.
If:
* There is a memory location M which represents data of type T2, and,
* T1 is a type and T1 is a prefix of T2, and,
* r1 is an `iref<T1>` and refers to memory location M1, and,
* r2 is an `iref<T2>` and refers to memory location M, and,
* the beginning of M and M1 are the same (i.e. they have the same address), and,
* rc1 is r1 or an internal reference derived from r1, and,
* rc2 is r2 or an internal reference derived from r2, and,
* rc1 and rc2 refer to a shared component between T1 and T2,
then rc1 and rc2 refer to the same memory location. This means the shared components can be accessed as if they were fields of a prefix. This allows treating a subclass as an instance of a superclass.
Related standard: C11
* 6.3.2-3: (**array --> pointer to first element**) ... an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. ...
* 6.7.2.1-15:(**pointer to struct <--> pointer to first field**) ... A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. ...
TODO: C does not explicitly allow prefixing between the following structs, whose sequences of elements have a "prefix" relation:
```
struct Foo { short a; int b; long c; };
struct Bar { short a; int b; long c; float d; double e; };
```
There may be a reason behind it.
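Whether two such structs really share a layout prefix can be checked empirically. A sketch using Python's ctypes to mirror the C structs above (field offsets follow the host ABI's natural alignment):

```python
import ctypes

class Foo(ctypes.Structure):
    _fields_ = [("a", ctypes.c_short), ("b", ctypes.c_int),
                ("c", ctypes.c_long)]

class Bar(ctypes.Structure):
    _fields_ = [("a", ctypes.c_short), ("b", ctypes.c_int),
                ("c", ctypes.c_long), ("d", ctypes.c_float),
                ("e", ctypes.c_double)]

# On a typical ABI the shared leading members sit at identical offsets,
# so a Bar can be viewed through a Foo-shaped prefix.
for name in ("a", "b", "c"):
    assert getattr(Foo, name).offset == getattr(Bar, name).offset
```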
# Object layout and C foreign function interface (FFI)
The object layout *should* follow the application binary interface (ABI) as much as possible because:
* The ABI is carefully designed by system programmers for performance. The µVM implementer needs a good reason not to follow it.
* When a foreign function call to external C programs is needed, the µVM data structure should already be in the desired layout expected by the C programs.
The µVM only needs to guarantee some compatibility between µVM types and C types in the FFI. It is already documented in the [Instruction Set](https://github.com/microvm/microvm-spec/wiki/instruction-set), but needs to be double-checked.
# Basic data structures that needs atomic accesses
To guarantee memory safety, the µVM must not allow out-of-thin-air reference values or opaque values. Affected types are:
* ref
* iref
* weakref (loaded into SSA variables as ref)
* func
* thread
* stack
* tagref64 (may contain ref)
* futex (one word integer, loaded into SSA variables as plain `int<WORD_SIZE>`)
Storing internal references (`iref`) in memory (heap, stack or global) is discouraged because of space inefficiency (there is no better way than encoding them as fat pointers). But if an implementation does allow putting `iref` in memory, accesses to them should be atomic. Types other than `iref` are too important not to be implemented as lock-free atomic types (all atomic accesses in the µVM are lock-free).
An alternative to requiring `iref` to be atomic is to:
- document that accesses to `iref` in memory are not atomic, and,
- require the client to compile all memory accesses to `iref` with locks, and,
- accept the significant performance penalty.
Since a fat pointer consists of two words (an object reference plus an in-object offset), some architectures (I assume there are only very few of them) may not provide atomic access at that width.
spec-2