**Issue #31 — GC: Does "liveness" of objects really need to be defined?**
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/31 · 2015-04-15 · John Zhang
*Created by: wks*
Currently the Mu spec defines "live object" as being reachable from roots.
An alternative definition is to define heap objects as *always live*, but implementations can "*cheat*". We define:
A memory location has a **lifetime**. An internal reference (iref) to a memory location is **valid** as long as the memory location's lifetime has not expired. Specifically,
* The lifetime of a memory location in the *heap* begins when the `NEW` or `NEWHYBRID` that allocates the heap object is executed. It **never expires**.
* The lifetime of a memory location in the *global memory* begins when the bundle that defines it is loaded. It never expires.
* The lifetime of a memory location in the *stack* begins when the `ALLOCA` or `ALLOCAHYBRID` that allocates the stack cell is executed. It expires when the function activation in which the stack cell is allocated is destroyed, by either returning, throwing an exception, or killing the stack.
If the Mu spec no longer defines liveness by the "root set" and transitive reachability, a Mu implementation must infer the reachability rules from other parts of the spec. I believe a carefully worded spec implies the same rules as explicitly defined reachability rules.
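The inferred rules amount to the standard transitive-closure walk from the root set. A minimal sketch (illustrative only; `roots` and `edges` are invented names, not spec terminology):

```python
def reachable(roots, edges):
    """Transitive closure of the points-to graph from the root set.

    edges maps each object to the objects it references. Anything outside
    the returned set can be reclaimed without observable effect: no valid
    iref to it can ever be produced again.
    """
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(edges.get(obj, ()))
    return seen
```

Note that an object referring *to* a root (like `x` below) is still dead if nothing reachable refers to it.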
## Examples and corner cases
**When an object is unreachable from the roots** (previously defined as "dead"): Mu can reclaim the object. Since it cannot be reached, the client and Mu IR programs will never find out Mu is cheating about the lifetime of heap objects, which were defined as "forever". (You can kill an immortal if nobody can see him/her again.)
**Object pinning**: Since an address is exposed during the period of pinning, the GC must not collect the object otherwise the native code will find Mu is cheating. In other words, pinning keeps an object alive.
## Weak references and finalisers
**Weak reference**: We must change the meaning of "weak references" because we no longer define "reachable". We can define it as:
* At any time, Mu **may** atomically set the values of some weak references to NULL if "after doing so, no one can prove that their referred objects can otherwise be reached". (This is not formal at all. Maybe weak references are really meaningless.)
In this way, a Mu implementation that never clears weak references is a valid implementation, but an implementation that does clear them may legally do so.
**Finaliser**: Finalisers were previously left undefined because they are not guaranteed to be called. But in order to allow the Mu implementation to cheat, I define them as follows:
* Any object may have a "prevent-one-death" flag (set when a finalisable object is created).
* There is a queue maintained by Mu (the finalising queue; the client implements a finalising thread watching the queue).
* At any time, Mu **may** atomically remove the "prevent-one-death" flag of an object and put it in the queue mentioned above, provided that "the only way to get a reference to that object is via the queue". (This does not sound very formal, either.)
In this way, a Mu implementation that never calls any finaliser is a valid implementation, but an implementation that does call finalisers may legally do so.
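A toy model of the "prevent-one-death" flag and the finalising queue (illustrative names only; `maybe_finalise` and its `is_otherwise_reachable` callback are assumptions, not spec API): the flag is cleared at most once, and the object is enqueued exactly when it is cleared.

```python
import collections

finalising_queue = collections.deque()  # the client's finalising thread would drain this

def maybe_finalise(obj, flags, is_otherwise_reachable):
    """Clear obj's prevent-one-death flag and enqueue it, if legal."""
    if flags.get(obj) and not is_otherwise_reachable(obj):
        flags[obj] = False            # the flag is cleared at most once
        finalising_queue.append(obj)  # the queue is now the only way to reach obj
        return True
    return False
```

An implementation that never calls `maybe_finalise` is still valid under this model; one that does may legally do so.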
---
**Issue #29 — Heap Allocation and Initialisation Language (HAIL)**
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/29 · 2016-07-21 · John Zhang
*Created by: wks*
This proposal describes a language that allocates and initialises heap objects (and also global memory).
This proposal *does not* address *initialiser function*. It will be addressed in another issue.
# Rationale
A code bundle (or simply "bundle" in our current terminology) contains **types**, **function signatures**, **constants**, **global memory cells** and **functions**. This is insufficient for a standalone Mu IR program.
A typical program usually contains statically declared, **load-time initialised** **heap objects**, e.g. **strings**, **class objects** (`java.lang.Class`) and so on. A developer from the PyPy project has indicated that there can be a lot of statically declared heap objects. Currently those objects can be created and initialised in two ways:
1. The client allocates and initialises heap objects via the Mu Client API. This approach suffers from one particular shortcoming: performance. The API can only initialise one memory location (e.g. one element of an array, or one scalar field of a struct) per API call.
2. Include a dedicated function in each bundle which creates and initialises heap objects. This approach has performance and complexity problems. The function must contain a full description of all heap objects: their types, and the values of all (or at least the non-zero) fields, so it can be huge. This information has to be encoded as Mu IR instructions and constants, and the compiler has to translate this **humongous** "initialiser function" into runnable form and then execute it, once, to create the heap objects. It is a waste of time and memory to compile such a one-shot function.
# Solution
The proposed solution is a compact file format that describes heap objects and initialises the memory.
Sample:
Assume we have a "traditional" Mu IR Bundle:
```
.typedef @i64 = int<64>
.typedef @i8 = int<8>
.typedef @i32 = int<32>
.typedef @double = double
.typedef @string = hybrid <@i64 @i8>
.typedef @void = void
.typedef @refstring = ref<@string>
.typedef @refvoid = ref<@void>
.typedef @ClassFoo = struct<@i64 @double @refstring>
.typedef @intarray = hybrid<@i64 @i32>
.global @HW <@refstring> // A global memory cell, initialised to NULL, which may hold a string reference later.
```
**After** loading the previous bundle, load this Heap Allocation and Initialisation Language (HAIL) file:
```
// HAIL file
.new $a <@i64> // A new object of just a number
.newhybrid $hw <@string> 12
.new $classFoo <@ClassFoo>
.new $x <@refvoid> // An object whose content is only a heap reference to void
.new $y <@refvoid> // ditto
.newhybrid $hugeArray <@intarray> 10000
.init $a = 42
.init $hw = {12, {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'}}
.init $classFoo = {42, 42.0d, $hw} // Objects can directly refer to each other
.init $x = $y // Objects are first allocated and then initialised
.init $y = $x // So they can form circular references
.init @HW = $hw // @HW is a global cell declared in the previous "traditional" bundle. HAIL can initialise global cells in traditional bundles, too.
.init $hugeArray[1][5000] = 42 // Only initialise one particular element of the var part. Other elements are 0.
// NOTE: only $hw is retained because it is referenced by the global cell @HW. Other objects may immediately be garbage-collected (or never allocated at all if the Mu VM can "cheat")
```
## Structure
Heap objects allocated in this form have a special sigil `$`, which is local to the current file.
A Heap Allocation and Initialisation Language (HAIL) file contains many of the following **top-level definitions**:
**.new**: Allocate scalar object in the heap. Has the form: `.new $name <@type>`
* `$name`: the local name of the object.
* `@type`: the type of the object.
**.newhybrid**: Allocate hybrid object in the heap. Has the form: `.newhybrid $name <@type> length`
* `$name`, `@type`: same as ".new"
* `length`: the length of the var part
**.init**: Initialise a heap object or a global cell. Has the form: `.init name[sub1][sub2]... = val`
* `name`: The name of the heap object or global cell. In this format, heap objects use the special sigil (`$xxx`) while global cells use their global names in the Mu IR (`@xxx`).
* `sub1`, `sub2`, ...: Subscripts: ways to navigate through structs, arrays and hybrids. Specifically, in a hybrid, the fixed part is 0 and the var part is 1.
* `val`: The value. It can be one of the following:
* Integer literals: 1, 24, -345, 0x456, 'H'
* FP literals: 1.0f, 3.14d, nanf, nand, +infd, -infd, bitsd(0x7ff0000000000001)
* Struct/array/hybrid literals: {elem0, elem1, elem2, ...}
* NULL
* other names (they can be other heap objects or Mu IR constants, global cells (as internal references) and functions (as function references)): `$hw` `@HW` `@main`
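Parsing the `.init` target with its subscripts could look like the following sketch (a hypothetical helper, not part of any HAIL implementation; the names are invented):

```python
import re

def parse_init_target(text):
    """Split e.g. '$hugeArray[1][5000]' into a name and a list of integer subscripts.

    Accepts both heap-object sigils ($xxx) and global names (@xxx).
    """
    m = re.fullmatch(r'([$@]\w+)((?:\[\d+\])*)', text)
    if m is None:
        raise ValueError("bad .init target: " + text)
    name = m.group(1)
    subs = [int(s) for s in re.findall(r'\[(\d+)\]', m.group(2))]
    return name, subs
```

For a hybrid, the first subscript would then select the fixed part (0) or the var part (1), per the rule above.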
## Comparing to API-based object allocation and initialisation
A HAIL file is a unit of delivery to the Mu VM. Only one API call is needed to load a whole HAIL file and it can allocate and initialise many objects.
"Loading a HAIL file" will be a new API message (or function).
# Performance Concerns
For better performance, this format should have a more compact binary encoding. Ideally the binary encoding would be very close to the in-memory representation of objects and require little more than copying data from the file to memory while handling sizes, padding and alignment. It cannot be perfectly identical to the in-memory representation because Mu's object layout is platform-dependent.
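As a rough illustration of "close to the in-memory representation", the fixed part of a `@ClassFoo` value `{42, 42.0d, NULL}` could be packed for a typical 64-bit little-endian target (the field sizes, endianness and natural alignment here are platform assumptions, which is exactly why the format cannot be fully platform-independent):

```python
import struct

# i64 at offset 0, double at offset 8, 8-byte reference at offset 16,
# all naturally aligned on a typical 64-bit little-endian target.
# The reference field is left as NULL (0) to be fixed up at load time.
blob = struct.pack('<qdq', 42, 42.0, 0)
assert len(blob) == 24  # three naturally aligned 8-byte fields
```

Loading then reduces to a copy plus relocation of the reference fields.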
# When to use the HAIL format
HAIL should be used when the client wishes to allocate many objects and bulk-initialise the memory. For example, when loading a Java .class file, a Mu IR bundle is loaded for the Java functions, and then a HAIL file is loaded to create/initialise the `Class` object, the virtual table, string literals and so on.
Another example: Assume there is a PyPy interpreter implemented on Mu IR. The executable PyPy interpreter is represented as Mu IR bundle, but a HAIL file can be used to initialise the interpreter **instance** and associated objects.
# When HAIL may not be ideal
If the Mu VM is metacircular, the client is written in the Mu IR and accessing the Mu memory from the client has no overhead. The HAIL format can still be implemented for compatibility, but would have no performance advantage over direct memory accesses. For example, a metacircular Mu-based JVM can load a .class file and compile its methods to Mu IR, but the Class object can be created directly in the Mu IR because the JVM client itself is in Mu IR. It does not need to serialise the sequence of object allocations and initialisations into HAIL before performing them.
---
**Issue #33 — Impossible states in full-state frame making (OSR)**
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/33 · 2015-06-18 · John Zhang
*Created by: wks*
There was a proposal about the OSR before, https://github.com/microvm/microvm-meta/issues/5
According to our discussion recently, we decided that:
1. the state of a frame is the PC and a set of local variable values.
2. It should only be possible to continue from designated "OSR points" or from the beginning of a basic block, rather than from arbitrary points in the code. This reduces the compiler's burden in generating stack maps.
3. The client supplies an arbitrary subset of local variables and their values and
1. if a variable is supplied but is never used, it has no effect on the execution and is simply ignored.
2. if a variable is not supplied but is used later, it gives undefined behaviour.
But this leads to a problem: the supplied state may never be reproducible from normal execution. For example:
```
.funcsig @foo_sig = @i64 (@i64 @i64)
.funcdef @foo VERSION @foo_v1 <@foo_sig> (%a %b) {
%entry:
%x = MUL <@i64> %a %b
%y = ADD <@i64> %x @i64_0 // The rhs is the constant 0.
%trap = TRAP <@void>
// OSR continues here
CALL @print (%x)
CALL @print (%y)
RET <@i64> %y
}
```
Assume we perform an OSR and construct a frame which continues *after* the `%trap` instruction with local variable values: `%a = 6; %b = 9; %x = 42; %y = 54`. Then the value `%x = 42` is impossible.
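Concretely, the supplied values contradict the data-flow equations of the IR. A throwaway consistency check (not proposed API, just the arithmetic):

```python
def state_is_reproducible(a, b, x, y):
    # %x = MUL <@i64> %a %b  and  %y = ADD <@i64> %x 0  imply x == a*b and y == x.
    return x == a * b and y == x

# The supplied OSR state %a=6, %b=9, %x=42, %y=54 violates both equations:
assert not state_is_reproducible(6, 9, 42, 54)
# The only state normal execution can reach from %a=6, %b=9:
assert state_is_reproducible(6, 9, 54, 54)
```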
But the code generator may consider "adding zero" a no-op and thus generate machine code that aliases the registers of `%x` and `%y`. For example:
```
foo:
push rbx ; save callee-saved register
mov rbx, rdi ; do multiplication. rbx holds the value of %x and also %y
imul rbx, rsi ; note: x86 `mul` is single-operand; the two-operand form is `imul`
mov rdi, rbx ; prepare to call @print (%x)
call print
mov rdi, rbx ; prepare to call @print (%y)
call print
pop rbx
ret
```
Then it is impossible to construct the state mentioned above. This implies that either
1. we require that such state construction must be possible and **require the code generator not to generate the code like above**, or
2. we further **restrict our API** on frame state construction.
---
**Issue #50 — IR construction API for the convenience of verification**
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/50 · 2016-06-17 · John Zhang
*Created by: wks*
# Problem
The only way for the current API to transfer an IR bundle from a client to a Mu micro VM is the `load_bundle` API (for example, `microvm.load_bundle(""".typedef @i32 = int<32>\n""")`, or a binary format), which passes a serialised IR (text or binary). The JVM takes this approach; its serialisation format is Java bytecode.
An alternative way to deliver a bundle is to let the client construct each node (type, function, basic block, instruction, ...) in the bundle by calling an API function (for example, `handle = microvm.make_instruction("ADD", i32, op1, op2)`) which returns a handle to the node, then passing this handle to the Mu micro VM. LLVM uses a similar approach: it provides constructors to each node class, and provides a "builder" to conveniently build a CFG.
Since the construction of the IR should be off-loaded from the micro VM, there should be a client-side library (we call it **libmu**) which constructs the IR for the client and provides an API to the client. `libmu` itself talks with the micro VM through implementation-dependent "private" APIs.
It is not clear whether **serialisation** or **construction by calling** is "better" (since "better" has many definitions), but *the calling-based API must exist*, because it is reportedly very difficult to verify a parser.
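A calling-based API in the style described above might look like this sketch (all names — `Bundle`, `make` — are hypothetical, not the actual Mu client API or libmu):

```python
class Bundle:
    """Toy node builder: every node gets an integer handle, in the style of
    LLVM's per-node constructors and builder."""

    def __init__(self):
        self.nodes = []

    def make(self, kind, **attrs):
        """Construct one IR node and return its handle."""
        self.nodes.append((kind, attrs))
        return len(self.nodes) - 1  # the handle

# Usage: handles returned by earlier calls are passed to later calls.
b = Bundle()
i32 = b.make("type_int", length=32)
add = b.make("inst_binop", op="ADD", ty=i32, op1=0, op2=1)
```

Each `make` call is small and stateless enough to be individually specified, which is what makes this shape attractive for verification.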
## Comparing the two approaches
**serialisation**
* pros
+ language independent
+ The serialisation format can be standardised and (almost) all languages can access byte arrays.
+ faster than *construction by calling* when the client and the micro VM have different runtime environments
+ (e.g. Haskell to C, Java to C, ...)
+ FFI can introduce considerable cost. The Go lang impl performs a SWAP-STACK operation every time it calls C.
+ simple API
+ one API function `load_bundle` handles the entire IR format.
* cons
- difficult to verify (there are reports that it is very difficult to verify a parser)
- require a parser inside the micro VM:
- there may be a mixed approach to offload the parser outside the micro VM.
  - not suitable for the end user, i.e. those who write the high-level language compiler.
    - Either the high-level compiler writer or some third party needs to build a library for serialisation.
**construction by calling**
* pros
+ easy to verify
+ minimal for the micro VM
+ No parser in the micro VM is needed.
+ faster than *serialisation* when the client and the micro VM have similar runtime environments
+ (e.g. C to C, Java to Java, C to C++, Haskell to Haskell, ...)
+ The libmu library can be implemented in the same language as the client.
* cons
- bloated API
- There will be hundreds of API functions just to construct each IR node. All of them need to be verified.
    - The internal state of libmu needs to be verified, but hopefully this is easier than verifying a parser.
- paradigm impedance
- There is no single API that satisfies all languages. For example, if the API is defined in C, it may be unsuitable for a client written in Haskell. So ideally there should be a libmu for Haskell, whether that is part of the formally verified Mu/libmu pair or not.
# Details
## Cost of foreign function calls
Depending on the two languages involved, the cost can be trivial (as between C and C++) or huge. Java must go through JNI to call native functions. Go performs a swap-stack operation every time it calls C, to work around blocking system calls in its M:N threading model. Other language implementations, such as Python's, rely on `libffi`, which builds a native call by dynamically preparing its arguments; the high-level language then needs to convert C types to its own types and vice versa.
An experiment shows that calling the simplest possible C function from Haskell introduces a 30x overhead compared to calling the same C function from another C function.
But from verification's point of view, the cost does not matter as long as it is spent in the client.
## Cost of serialisation
Serialisation is not free. In a simple experiment, serialising a CFG-like data structure to an intermediate format and then parsing it in another module (both written in C) introduces a 10% overhead compared to directly constructing the target CFG structure in the receiver (assuming the sender only holds opaque references). The major costs are memory allocation (where malloc is the bottleneck) and resolving the cross-references between nodes (where the hash map is the bottleneck).
Serialisation is not free, but is reasonably cheap. When foreign function call is expensive, serialisation can be used as an alternative.
## Not calling across languages
Since calling across languages is expensive, it is desirable to implement part of the libmu in the same (or similar) language the client is written in, and let libmu construct the data structure that is native to the micro VM.
For example, if the micro VM is written in C++, the libmu should construct a tree of C++ class instances as LLVM does.
Note that LLVM is designed and implemented in C++ and serves C/C++. It is not a problem if the only official API it provides is C++. There is a C binding, too, but it is trivial. However, Mu has a specification which allows multiple implementations. In this case, the micro VM core would not always be in C/C++ or any particular language.
However, if the micro VM is written in a managed language, such as RJava or Java, then it will be interesting:
* If the client, libmu and the micro VM all happen to share the same (or a similar) runtime, the API calls can be cheap. The ideal case is being "metacircular", i.e. both the client and libmu run on the same micro VM as the micro VM itself. The cost is minimal.
* If libmu is written in C and provides the C API, then there is a semantic mismatch between the micro VM and libmu: *somewhere in libmu* must cross the line between two different runtimes, which introduces FFI-like overhead, the cost of which depends on the concrete languages. Holding opaque references to Mu objects requires such objects to be pinned, or held by some container (like the `MuCtx` structure in the current API, which is not light-weight).
* If the client and libmu use a different runtime from the micro VM ("Haskell on libmu on the micro VM" is such a case), it pays for two levels of cross-runtime calls.
# Other concerns in design
## Does libmu need to be minimal?
Maybe. At least the verified libmu needs to be minimal.
There may be even higher-level libraries in the client's world, layered above libmu. Those libraries need not be minimal.
## How many languages should be supported?
Ideally the libmu should have both intimate knowledge of a particular Mu implementation, and intimate knowledge of the language the client is written in. In theory, there can be one (or more) libmu for each (micro VM impl, client language) pair.
Since C is so popular, we will define a C API for libmu. In theory,
1. there can be more than one C API
2. there can be APIs for other languages and they may or may not look like the C API. (preferably not, to avoid paradigm impedance)
## Mu CFG and the client CFG (or AST)
**How much should the client use the Mu IR CFG?** i.e. should the client construct Mu IR nodes and do transformations on it as LLVM does? Probably not.
LLVM is designed to be maximal: its CFG carries much information for optimisation, such as the "nsw" and "nuw" flags on the "add" and "sub" instructions.
But the Mu IR is designed to be minimal and is only designed for the micro VM to consume. It does not contain much information that benefits the client.
It is possible for a client-side library to perform IR transformations, but it is doubtful whether that IR would be the same Mu IR. Many optimisations, such as deciding whether `x+1 > x` is always true, depend on extra information (such as the "nsw" flag in LLVM) which the Mu IR does not provide.
I @wks believe the Mu IR is only generated as the last step of the client-side transformation, i.e. the next step is to deliver it into the micro VM.
# Towards the new API
## Micro-VM-to-client API
The existing controlling API does not need to be changed.
The bundle loading API can be removed, i.e. "how to load bundles into the micro VM" becomes implementation-dependent, which actually means **libmu-dependent**.
## libmu-to-client API
This API needs to be carefully designed because it is part of the formal verification.
There should be a model of the internal states of libmu which includes:
* The set of handles which map to Mu IR nodes.
* The state of each Mu IR node.
* Isolation between threads.
During construction, each node can hold incomplete information at a given time. The Mu IR nodes may circularly refer to each other (currently they refer to each other via IDs), so it is desirable to allocate several nodes and then link them with each other.
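The allocate-then-link pattern can be sketched as follows (a hypothetical node representation, not libmu's actual data structures):

```python
class Node:
    """A stand-in for a Mu IR node whose references may be filled in
    after allocation, since nodes can refer to each other circularly."""

    def __init__(self, kind):
        self.kind = kind
        self.refs = []   # links to other nodes, filled in later

# Allocate first...
a = Node("basic_block")
b = Node("basic_block")
# ...then link, so circular references pose no ordering problem.
a.refs.append(b)
b.refs.append(a)
```

Verification would then need to specify when a node counts as "complete", i.e. when all its mandatory references have been linked.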
If multiple threads can use the same libmu, the transition in its internal state must be properly handled.
Care must be taken to select a minimal set of API functions, because the current Mu IR has 18 types and 37 instructions; the number of API functions could easily bloat to 100+ if too many CRUD commands are added.
## Micro VM-to-libmu interaction
This interface does not need to be public, but the proper handling of data structures in the micro VM is important. This interface is part of the verification.
In all cases, the choice of languages does matter. Properly chosen languages for the client and the micro VM will result in high performance and verifiability.
# Future languages
This section contains my (@wks) personal opinions. These affect my opinions on this API, too.
In the future, popular system programming languages will generally be higher-level than current languages (such as C). We have already seen that high-level languages, such as RJava and Java, can produce high-quality language runtimes, such as JikesRVM.
Instead of relying on C, high-level system programming languages will gain direct control over low-level operations, such as raw memory access (pointers). This low-level work can be done even though the language itself still has very high-level features, such as garbage collection and object-oriented programming. It is possible (in my opinion, at least) that such a high-level language will eventually replace libc and interface directly with the kernel, eliminating the need for C except in some very rare cases.
This trend is already visible in several languages. C# already has unsafe pointers and an unsafe native interface (P/Invoke). The Java API is mostly implemented in Java, unlike the old days when the standard Java library was mostly implemented in C++. There is also a [JEP to add an unsafe native interface to Java](http://openjdk.java.net/jeps/191), which has not become mainstream yet; however, OpenJDK already exposes some low-level operations through the `sun.misc.Unsafe` class. RJava obviously uses magic to gain low-level support. In Ruby, [ruby-ffi](https://github.com/ffi/ffi) is recommended over directly writing C modules.
With C becoming redundant, high-level languages (or runtimes) may be optimised for internal interoperation (e.g. Java-to-Java calls will get faster and faster) at the expense of interoperability with C (e.g. even unsafe native calls may be costly; object pinning or opaque handles are required for native programs, which would only run briefly).
Since raw memory access will get faster while foreign function calls get slower, serialisation may have an advantage over calling (my experiment already shows this for Java and C). But this does not rule out the calling-based API, because the calls may not be foreign calls, in which case calling is still faster than serialisation. Ideally, in the metacircular setting, where libmu and the client both run on Mu, Mu-to-Mu function calls are virtually free.
---
**Issue #73 — License**
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/73 · 2018-06-28 · John Zhang
We have decided to use the Apache 2.0 license for all our code.
The following two commits put forward a draft for the license file.
verbatim: mu/mu-perf-benchmarks@47045501
added ANU copyright: mu/mu-perf-benchmarks@76b1b7ca
If there is no problem, then I will merge the branch into master, and you can all put a copy of the LICENSE file into your project.

---
**Issue #11 — Mechanisms to support thread synchronisation primitives (including blocking locks)**
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/11 · 2016-09-12 · John Zhang
*Created by: wks*
In a multi-threaded environment, threads may need to wait for the availability of mutually exclusive resources or asynchronous events. Blocking locks are provided by many programming languages and operating systems as primitives to facilitate threaded programming.
Characteristics of the µVM prevent it from directly adopting pthread mutexes.
1. [Dolan et al](http://dl.acm.org/citation.cfm?id=2400695) implemented efficient blocking locks in an environment with green threads built on SWAP-STACK. When there is only one native thread, atomic and ordered memory accesses are not necessary. Even when there are multiple native threads, locks can still be implemented with merely an integer, atomic memory operations and a lock-free waiting queue.
2. There may not always be a one-to-one mapping between language-level threads and native threads. (SWAP-STACK-based green threads are one example.) The client may also implement an m-to-n relationship where logical threads can be scheduled on any native thread. Re-entrant locks are based on the logical notion of a "thread", which may not be a native thread; pthread recursive mutexes, however, depend on native threads.
3. Spin locks can be implemented using only integers and atomic memory accesses.
4. System calls are expensive.
# The proposal
The µVM only needs to provide two primitives: SUSPEND and RESUME
* SUSPEND: suspend the current thread. The thread blocks, but the state of the underlying stack is still ACTIVE.
* RESUME: resume a given thread if it is waiting.
Note: more commonly known as PARK, UNPARK
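A portable (if slow) model of SUSPEND/RESUME can be built from a per-thread counting semaphore, in the style of Java's `LockSupport.park`/`unpark` (a sketch under that assumption, not the proposed µVM implementation):

```python
import threading

class Parker:
    """One per thread. A RESUME issued before the matching SUSPEND
    must not be lost, so the permit is a counter, not a flag."""

    def __init__(self):
        self._permit = threading.Semaphore(0)

    def suspend(self):
        # SUSPEND / park: block until a permit is available.
        self._permit.acquire()

    def resume(self):
        # RESUME / unpark: grant a permit, waking a suspended thread if any.
        self._permit.release()
```

Because the permit is counted, a RESUME that arrives before the matching SUSPEND simply makes the next SUSPEND return immediately, avoiding the lost-wakeup race a naive condition-variable version would have.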
On Linux, futex provides these two primitives, except that the Linux kernel maintains an internal waiting queue and its WAIT and WAKE are address-based (keyed on the address of the lock) rather than thread-based (which thread should suspend/resume?). The above two primitives can be implemented on top of futex.
We still need to find equivalent counterparts on macOS, Windows and other operating systems. Using a pthread mutex for waiting is also an option, albeit an expensive one. (On Linux, glibc implements pthread mutexes using a mixture of user-level fast paths and the futex.)
# Proof of concept
It still needs to be shown (by code) whether these two primitives are sufficient.
# Yield points
How do we suspend another thread? Who inserts the yield points where threads can be suspended, the client or the µVM?
---
**Issue #37 — Memory model in native interface** (label: spec-2)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/37 · 2016-06-17 · John Zhang
*Created by: wks*
# Problem
Currently the Mu memory is all about "memory locations" – a region that holds a Mu value, not directly related to addresses or bytes. The native memory is a sequence of bytes, addressed by integer "addresses". The two are separate until a Mu memory location is pinned; then the Mu memory location is mapped to a region of bytes in the address space, and accessing one will affect the other.
Meanwhile, Mu's memory model uses the C++11-style model based on the happens-before relation.
This poses a challenge: the model should bridge the Mu and native worlds. The native view of memory as a sequence of bytes should work nicely with the Mu memory, i.e. map to meaningful memory operations in the Mu world. Atomic actions should be consistent and may establish happens-before relations between the two worlds. Specifically:
* What is the unit of memory actions? Previously, it is "Mu memory location".
* If a "load" action is modeled as a tuple: `LOAD(order, type, location)`, and location was "Mu memory location", then what should location be now? Address? What value does it see? Some store? Or something else?
* If a "store" action is modeled as a tuple: `STORE(order, type, location, newvalue)`, and location was "Mu memory location", then what should location be now?
* If a Mu memory location is pinned, and is accessed in a different granularity than the type declared, what will be the result?
* If stored as a whole, but loaded in parts...
* If stored in parts, but loaded as a whole...
* But we cannot model the memory as a byte array which sequentially changes state. (or, can we? Since non-atomic conflicting accesses are meaningless, does this imply it must be changed sequentially, or errors occur?)
# The current model
* A non-atomic load sees the unique store that happens before it, such that no other store happens between that store and the load. If there is more than one such store, the behaviour is undefined.
* An atomic load sees the value from any of its visible sequence of store operations.
* Mixing non-atomic and atomic operations on the same memory location has undefined behaviour.
# Possible directions
Whichever direction we take, pure Mu programs should keep their original C++11-like semantics.
1. Make the memory model more machine-oriented and machine-specific.
* May give more dependable behaviours. For example, unaligned memory access is allowed on many architectures, but is not always atomic.
* Obviously this makes Mu less portable. All pointer-based memory access will have machine-specific semantics. But does this matter? This is the "native interface" anyway.
* Interoperability with the C++11 memory model for C/C++ programs will be built upon the machine-specific memory model.
2. Limit what operations are allowed in the native memory.
* Simpler model.
* Probably more undefined behaviours, because some cases cannot be defined if we try to keep things simple and generic.
* Will limit capability, e.g. unions cannot be used by Mu.
3. Something in between
# Examples
The native program should synchronise with the Mu program via atomic memory accesses.
```c++
// C++ pseudo code
struct Foo {
    int x;
    int y;
};

Mu_thread_1 {
    ref<Foo> f = new<Foo>;
    ptr<Foo> fp = pin(f);
    create_thread(native_thread_2, fp);
    store(&f->x, 10, NOT_ATOMIC); // Mu-level store
    store(&f->y, 20, RELEASE);    // Mu-level store
}

native_thread_2(ptr<Foo> fp) {
    while (load(&fp->y, ACQUIRE) != 20) {} // Native load
    int a = load(&fp->x, NOT_ATOMIC);      // Native load
    assert(a == 10);
}
```
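The same handshake can be written against the C11 memory model, which is the semantics pure Mu programs are meant to keep. A minimal runnable sketch using C11 atomics and POSIX threads; the struct, values, and the function name `sync_example` are illustrative, not Mu API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct Foo {
    int x;        /* accessed non-atomically */
    atomic_int y; /* accessed with release/acquire */
};

static void *native_thread_2(void *arg) {
    struct Foo *fp = arg;
    /* Acquire load: once we observe 20, the plain store to x is visible. */
    while (atomic_load_explicit(&fp->y, memory_order_acquire) != 20) {}
    int a = fp->x; /* plain (non-atomic) load */
    assert(a == 10);
    return NULL;
}

int sync_example(void) {
    struct Foo f;
    f.x = 0;
    atomic_init(&f.y, 0);
    pthread_t t;
    pthread_create(&t, NULL, native_thread_2, &f);
    f.x = 10;                                              /* plain store   */
    atomic_store_explicit(&f.y, 20, memory_order_release); /* release store */
    pthread_join(t, NULL);
    return f.x;
}
```

The release store to `y` paired with the acquire load in the other thread is exactly the happens-before edge the pseudocode relies on.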
For non-atomic memory accesses, partial reads/writes should be based on the byte representation (called the "object representation" of a value in C11).
```c++
ref<i32> r = new<i32>;
store(r, 0x12345678); // Assume little endian
ptr<i32> p = pin(r);
i64 addr = ptrcast<i64>(p); // cast the pointer to the integer address
addr += 3;
ptr<i8> p2 = ptrcast<ptr<i8>>(addr); // cast back to pointer, but a different type
i8 value = load(p2);
assert(value == 0x12);
store(p2, 0x9a);
i32 value2 = load(r);
assert(value2 == 0x9a345678);
```
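The byte-representation rule can be checked in plain C, where accessing an object through an `unsigned char *` is the defined way to inspect its object representation. This sketch assumes a little-endian machine, as the original example does; `poke_byte3` is an illustrative name:

```c
#include <assert.h>
#include <stdint.h>

/* Overwrite byte 3 of a 32-bit value through a char pointer and return
   the resulting whole-word value. Assumes little endian. */
uint32_t poke_byte3(void) {
    uint32_t v = 0x12345678u;
    unsigned char *p2 = (unsigned char *)&v + 3; /* byte at offset 3 */
    assert(*p2 == 0x12); /* most significant byte on little endian */
    *p2 = 0x9a;
    return v;            /* now 0x9a345678 on little endian */
}
```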
Unaligned 16-, 32- and 64-bit memory accesses are allowed on x64 (and the P6 family guarantees atomicity when the access does not cross a cache-line boundary).
```C++
struct Foo { i32 a; i32 b; };
ref<Foo> r = new<Foo>;
store(&r->a, 0x9abcdef0);
store(&r->b, 0x12345678);
ptr<Foo> p = pin(r);
ptr<i64> p2 = ptrcast<ptr<i64>>(p);
i64 value = load(p2);
assert(value == 0x123456789abcdef0);
```
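In C, the defined way to reassemble the two 32-bit stores into one 64-bit load is `memcpy`, which copies the object representation. Again a little-endian assumption; the struct has no padding between two `uint32_t`s, so the layout matches the example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store two i32 fields, then read the pair back as one i64 via memcpy. */
uint64_t load_pair_as_i64(void) {
    struct { uint32_t a, b; } f;
    f.a = 0x9abcdef0u;
    f.b = 0x12345678u;
    uint64_t value;
    memcpy(&value, &f, sizeof value); /* copies the byte representation */
    return value; /* 0x123456789abcdef0 on little endian */
}
```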
Could non-atomic memory access mix with atomic counterparts?
```C++
struct Foo { i32 x; i8 y; double z; };
ref<Foo> r1 = new<Foo>;
ref<Foo> r2 = new<Foo>;
ptr<Foo> p1 = pin(r1);
ptr<Foo> p2 = pin(r2);
store(&p1->x, 0x12345678, NOT_ATOMIC);
store(&p1->y, 42, NOT_ATOMIC);
store(&p1->z, 3.1415927D, NOT_ATOMIC);
memcpy(p2, p1, sizeof(Foo)); // This is obviously not atomic
some_synchronization_operation_after_which_atomic_accesses_will_be_safe(); // What should this be?
thread1 {
    store(&r2->y, 84, RELAXED);         // This is atomic
    store(&r2->x, 0x9abcdef0, RELEASE); // This is atomic
}

thread2 {
    i32 a = load(&r2->x, ACQUIRE); // This is atomic
    if (a == 0x9abcdef0) {
        i8 b = load(&r2->y, RELAXED);     // This is atomic
        double c = load(&r2->z, RELAXED); // This is atomic
        assert(b == 84 && c == 3.1415927D);
    }
}
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/23: "Micro VM interface changes precipitated by having C as a client" (2016-06-17, John Zhang)

*Created by: eliotmoss*
While there will be a bunch of carefully worked out details that need to follow, I thought I would get the ball rolling with a quick summary / highlights of the Modula-3 approach. (See https://www.cs.purdue.edu/homes/hosking/m3/reference/m3.html for a Modula-3 reference.)
First, M3 has ordinary *traced* references and also **untraced** ones. REF T is the type of a traced reference to a T, while UNTRACED REF T is the type of an untraced one. Traced vs. untraced is a concept distinct from safe vs. unsafe. Safe use of traced and untraced references keeps them segregated. Untraced objects come from an explicitly managed storage area. You can use NEW on an untraced reference type to allocate in such an area.
Certain regions of code (interfaces or modules -- probably a bundle is the corresponding thing for the Micro VM) can be marked UNSAFE. Unsafe code can do additional things. These are ones that I remember:
* Cast an expression to an arbitrary type (of the same size in bits) - this clearly allows interconverting between traced and untraced pointers, but also other things; this is called LOOPHOLE(e, T).
* Address arithmetic (Digression: In M3 REFANY is the type of a traced reference to any object, which in safe code requires a dynamically type-checked downcast to turn into a REF T and do more interesting things. The corresponding untraced type is ADDRESS, and it has arithmetic operators.)
* Up/down cast without checking
* Free an untraced reference's referent (the function is called DISPOSE)
Safe code must use traced/untraced references only in specific ways, described in the language reference.
We may need to add to this sort of design to allow short unsafe code regions to manipulate ordinary (traced) objects safely in the presence of (say) concurrent GC. Also, an untraced version of iref would make sense for the Micro VM.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/40: "Mu Client Interface as C Binding" (2015-08-21, John Zhang)

*Created by: wks*
The current API is expressed in a language-neutral form, and it is the implementation that decides how to implement such an interface. Programmers still need to resort to implementation-specific interfaces to actually use a particular Mu implementation.
Since C is so widely used as a system programming language, the Mu client interface (a.k.a the API) should be expressed as data types and function calls in the C programming language. If the client is not in C, it usually still has a C FFI.
# The API in C
Resources in Mu are exposed as opaque types with reference semantics: they can be copied and still refer to the same resource.
* `mu_micro_vm_t`: a reference to a Mu micro VM instance.
* `mu_client_agent_t`: a reference to a client agent.
* `mu_handle_t`: a handle to a value in the Mu type system exposed to the client.
Messages are C functions. Like JNI, they are contained in a struct: `typedef struct mu_api_msgs {...} mu_api_msgs_t`. In this way, a C client does not need to link against any library at compile time. The reason is that, for a Mu micro VM implemented in a higher-level language (like the reference implementation in Scala), the binding of the callable C functions is generated very late, even later than loading time, and has no access to the native loader.
For example, assume there is a `mu_api_msgs_t* msgs` defined:
```c
mu_client_agent_t ca = ...
char buf[999999];
int sz;
// load file into buf
msgs->load_bundle(ca, buf, sz); // Load a bundle
// Putting C values into Mu
mu_handle_t h1 = msgs->put_schar(ca, 127);
mu_handle_t h2 = msgs->put_sshort(ca, 32767);
mu_handle_t h3 = msgs->put_sint(ca, 42);
mu_handle_t h4 = msgs->put_slong(ca, 42);
mu_handle_t h5 = msgs->put_slonglong(ca, 999999999999999);
// Converting Mu values to C
int v3 = msgs->to_sint(ca, h3);
unsigned long v4 = msgs->to_ulong(ca, h4); // just treat the int as unsigned
```
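The function-pointer-struct pattern itself is easy to mock up, which lets client code be tested without linking a real micro VM. Everything below is an illustrative stub (names, typedefs, and `make_stub_msgs` are invented for this sketch), not the real `muapi` types:

```c
#include <assert.h>

typedef int mu_client_agent_t; /* stub: the real type is opaque */
typedef int mu_handle_t;       /* stub: the real type is opaque */

typedef struct mu_api_msgs {
    mu_handle_t (*put_sint)(mu_client_agent_t ca, int value);
    int (*to_sint)(mu_client_agent_t ca, mu_handle_t h);
} mu_api_msgs_t;

/* Trivial stub implementation: a handle is just the value itself. */
static mu_handle_t stub_put_sint(mu_client_agent_t ca, int value) {
    (void)ca;
    return value;
}
static int stub_to_sint(mu_client_agent_t ca, mu_handle_t h) {
    (void)ca;
    return h;
}

mu_api_msgs_t *make_stub_msgs(void) {
    static mu_api_msgs_t msgs = { stub_put_sint, stub_to_sint };
    return &msgs;
}
```

Client code written against a `mu_api_msgs_t *` then runs unchanged against the stub or against a real implementation that fills in the same struct.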
Mu-level flags are C preprocessor macros. They have type int.
```c
msgs->store(SEQ_CST, hLoc, hNewVal); // SEQ_CST is a macro
```
Callbacks, including the trap handler and undefined function handler, have defined signatures:
```c
typedef mu_trap_return_status_t (*mu_trap_handler_t)(
    mu_client_agent_t ca,
    mu_handle_t stack,
    mu_handle_t thread,
    int watchpoint_id,
    mu_handle_t *new_stack,     /* output parameters */
    mu_handle_t *data_passed,
    mu_handle_t *new_exception,
    mu_api_msgs_t *msgs,
    void *user_data);

typedef void (*mu_undefined_function_handler_t)(
    mu_micro_vm_t microvm,
    int function_id,
    mu_api_msgs_t *msgs,
    void *user_data);
```
These functions are registered via the `msgs->register_trap_handler` and `msgs->register_undefined_function_handler` API messages. In their parameters, the `user_data` is an arbitrary pointer provided by the client in an implementation-specific manner (see below).
# Implementation-defined behaviours
Some aspects of the C binding are implementation-specific. They include:
* How to create a Mu micro VM? Options are:
1. The C executable creates the Mu instance.
2. Mu loads the C dynamic library.
3. Mu starts separately and C connects to the existing instance in the same process.
4. C connects to a Mu instance in a different process, or a different machine.
* Options in creating Mu instances. Options are:
1. Heap size. Giving a heap size means the client determines the heap size, rather than Mu automatically deciding its own storage.
2. Global data space size. Setting this value means global data may have its own storage. An actual implementation could use the heap space, too.
3. Stack size. Similarly, this is implementation-specific, too.
* What happens during initialisation?
1. Mu calls a C function to initialise the client, and the client provides a `void*` to Mu for the client's own context. (note: in this case, it is Mu loading C rather than C creating Mu.)
2. C creates a Mu instance, and sets its `void*` user data in a proprietary API message.
# Open questions
* Should we allow each Mu implementation to have its own "namespace"? The opaque types (`mu_micro_vm_t` and so on) are opaque, but different implementations may have different representations. The current C binding design forbids one C program from working with more than one Mu implementation (though it is okay to work with more than one *instance* of the same implementation).
* JNI does not solve this problem, either.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/48: "Mu IR rewriting library" (2015-10-29, John Zhang)

*Created by: wks*
Mu aims to be minimal, but such minimalism has made the construction of Mu clients hard. A client-level library can ease the client's job by accepting a slightly higher-level variant of the Mu IR and translating that higher-level IR to actual Mu IR code (and/or HAIL scripts and/or subsequent API calls).
This issue tracks tasks that should be done at this layer.
* Pre-SSA to SSA converter. #44
- Writing Mu IR in the SSA form is hard, and the goto-with-values form is even harder. The library should automatically convert ordinary CFGs into the goto-with-values form using well-known algorithms.
* Platform-dependent constant values. #47
- Some ahead-of-time clients (notably C and other "traditional" languages) expose platform details to the programmer as compile-time constants, but binding those values too early would make the object code non-portable. The rewriter should help the client determine these values so that the client compiler can be strictly ahead-of-time.
* Merge Mu IR and HAIL: #29 #46
- The library does not need to be minimal, so integrating both languages will make the client's job easier.
* Annotations
- This will allow clients to attach arbitrary information to the Mu IR code, which can help the client introspect the program at run time.
- Note that if we need to use system debuggers (such as GDB), these annotations need to go through the micro VM itself, because it is the micro VM's responsibility to generate object code (including DWARF debug info).
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/55: "Mu, LibMu and LibMuXxx: Layered API for the client" (2018-06-28, John Zhang)

*Created by: wks*
As we discussed today, the structure of Mu and its client-facing interfaces should be like this picture:
![mu-libmu](https://cloud.githubusercontent.com/assets/370317/15414816/6b55849c-1e81-11e6-839a-c6cac254845f.png)
(Black text represents the component, and red text represents the programming language they are implemented in.)
In the inner circle is the micro VM. It can be implemented in any language, but it provides a C API, and *both the micro VM and the C API (i.e. the inner black circle boundary) need to be verified*. Outside the outer circle is the client. The ring in between is a library which we call "LibMu". In theory, the client, the LibMu and the Mu micro VM can be implemented in different languages.
When LibMu (or some language-specific LibMu wrapper, such as LibMu-Z for some hypothetical language Z, shown in the picture as the client-facing semi-circle) talks with the client, **it should present a nice, client-friendly interface for constructing Mu IR bundles**. Such an interface should provide appropriate data structures, data types and constructors, or even high-level transformers, for the convenience of the client. *This layer does not need to be minimal*.
When LibMu talks with the Mu micro VM, **the interface must be minimal and verifiable**. We agreed (#50) that it is difficult to verify a parser, so that rules out sending text or binary blobs into the micro VM (across the black circle). The C API of the micro VM provides a function-call-style API (also discussed in #50, but it needs to be revised) so that LibMu constructs a bundle inside Mu by making a sequence of function calls, each of which constructs one Mu IR node (such as an instruction, basic block, type or constant).
Some programming languages (such as Python, Haskell, ...) may have relatively high overhead when calling C foreign functions, compared to direct C-to-C calls. If the client is written in such a language (language Y in the picture), it will be slow to construct the Mu-side AST by frequently calling through the C interface. We consider this a problem of the language implementation of Y. In that case, the client should have part of itself written in C (the inner micro VM-facing semi-circle in LibMu), so that language Y can encode the Mu IR bundle and send it to the C component (this interface does not need to be verified), and the C component constructs the Mu IR in Mu via the C API (this interface is verified).
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/63: "muapi.h: destinations" (2018-06-28, Eliot Moss)

We revised the muapi.h interface recently around the results of most instructions, removing add_result and adding get_result and num_results. It seems that the situation around destinations for instructions such as BRANCH2 (just an example) is similar. However, in this case I think it appropriate that the destination be built by the user of the API **before** making the call to create the BRANCH2, and the two handles on the destinations be passed in to the API function that creates the BRANCH2. I think this works for most branching things.
CALLs may be more problematic in that the destination may want to mention CALL results to pass on. However, a CALL node could support add_normal_dest (or set_normal_dest) and similarly for the exceptional destination.
But when we know the number of destinations, and the instruction itself does not produce extra results that the destination could refer to, it seems cleaner to build the destinations first and pass them in.

https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/24: "Native Interface (super issue)" (2016-06-17, John Zhang)

*Created by: wks*
This is an outline of issues related to the native interface, that is, interacting with the native world. This topic includes but is not limited to object layout, pointer types, object pinning and foreign function calls. We should open other issues to discuss concrete problems.
* Make a platform-specific Mu specification
+ Address some of the following issues, including object layout, calling convention, ...
+ Draft for AMD64: https://github.com/microvm/microvm-spec/wiki/native-interface-x64-unix
* Type system: (https://github.com/microvm/microvm-meta/issues/34)
+ Raw pointer types
+ Structure types with native/explicit object layout
+ Union types (unlikely to have in Mu)
+ Mapping Mu types to C types and native object layout: (in the [Native Interface](https://github.com/microvm/microvm-spec/wiki/native-interface) chapter in the spec)
* Memory space beyond heap/stack/global
+ Memory spaces with various constraints
- Is it movable? Pinnable? Can it contain references? Can it be referenced? Is it GC-traced? GC-collected? ...
+ Object pinning: https://github.com/microvm/microvm-meta/issues/28
- If object pinning is allowed, what does "pin" mean?
* Foreign function interfaces
+ Calling foreign functions from Mu: (The CCALL instruction. See the [spec](https://github.com/microvm/microvm-spec/wiki/native-interface))
+ Calling C functions
+ System calls
+ Calling back to Mu from foreign functions: https://github.com/microvm/microvm-meta/issues/39
+ From C functions
+ Signal handling
The following should be addressed by a higher-level abstraction:
* Loading native libraries
+ The client loads libc and finds the address of `dlopen`, `dlsym`, `dlclose` and `dlerror`. Then the `CCALL` instruction takes care of the rest by calling them.
* Loading "heavier-weight" Mu bundles (currently called MuLF): https://github.com/microvm/microvm-meta/issues/30
The following are not related to the native interface, but are related to raw memory:
* How to expose the address of objects so that the user can analyse the memory behaviour? (This involves profiling, too. We may open a dedicated issue for profiling.)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/64: "New Symbolic IR building API" (2018-06-28, Kunshan Wang)

# Introduction
Recent experience from Yi shows that it is difficult to implement the current bundle building API in some languages, such as Rust, with respect to *mutability* and *cyclic references* between IR nodes. We are proposing an alternative IR building API. In the new API, **IR nodes will refer to other nodes by symbols (IDs and, optionally, names)** rather than referring to the nodes directly. The advantages are:
1. The client can build the IR nodes in any order.
2. The implementation language of *the micro VM itself* can keep symbolic references between IR nodes, which can be helpful if the language (such as Rust or Haskell) is sensitive to mutability and cycles.
The potential disadvantage is that it will require the micro VM to resolve symbolic reference between IR nodes during bundle building or after bundle loading. But this practice has been commonly used in the history of compiler construction (according to Tony). We expect the cost of micro VM-side symbol resolution to be acceptable, though the actual performance implication is not yet known.
TL;DR: Scroll down to "The Proposed New API"
# Background
The [high-performance Mu implementation](https://gitlab.anu.edu.au/mu/mu-impl-fast) is written in Rust.
**Rust does not like cycles.** Every Rust value has exactly one owner. This forbids cyclic ownership between nodes: the moment the programmer attempts to form a cyclic reference, the compiler will complain about ownership problems.
But the Mu IR often contains cycles, such as:
```
.typedef @linkedlist = struct < @i64 @linkedlist_ref >
.typedef @i64 = int<64>
.typedef @linkedlist_ref = ref < @linkedlist >
```
Note the recursion on the `ref` type. The text form can express this recursion because each type node has a name: `@linkedlist`, `@i64` and `@linkedlist_ref`.
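For comparison, C expresses the same recursion by name: a struct may refer to itself only through a pointer, and the compiler resolves the name `struct linkedlist` without needing the full definition first. A sketch of the same `@linkedlist` shape (the `sum` helper is added just to make it testable):

```c
#include <stddef.h>

/* The @linkedlist shape from the Mu IR example, written in C. */
struct linkedlist {
    long long value;         /* @i64 */
    struct linkedlist *next; /* @linkedlist_ref: recursion via a pointer */
};

/* Walk the list and add up the values. */
long long sum(const struct linkedlist *l) {
    long long s = 0;
    for (; l != NULL; l = l->next)
        s += l->value;
    return s;
}
```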
The current IR building API constructs nodes directly, so API functions cannot refer to nodes that will be created in the future. To support cycles, the API mutates IR nodes after a node is created:
```c
MuCtx *ctx = ...;
MuBundleNode b = ...;
// Create a type: ref<?>
MuTypeNode linkedlist_ref = ctx->new_type_ref(ctx, b)
MuTypeNode i64 = ctx->new_type_int(ctx, b, 64);
MuTypeNode members[] = {i64, linkedlist_ref};
MuTypeNode linkedlist = ctx->new_type_struct(ctx, b, members, 2);
// Set the ref<?> to ref<@linkedlist>
ctx->set_type_ref(ctx, b, linkedlist_ref, linkedlist);
```
Note that the IR nodes must be created in a particular order, with reference targets populated at the end. Such an order always exists for every Mu IR bundle, because the Mu IR can only be recursive at certain spots. Constructing the IR nodes in the following order will always work: ref/uptr types -> other types and signatures -> populate ref/uptr types -> constants/globalCells/funcs/expfuncs -> funcvers -> basic blocks -> bb params -> insts/results -> branch destinations/exception clauses/keepalives/subclauses of NEWTHREAD and SWAPSTACK.
But **Rust does not like mutation, either**. Rust is designed for memory safety, especially to avoid data races in multi-threaded programs. So it is a well-known rule in Rust that *if anything is shared, it cannot be modified*. Most sharing mechanisms (`&` (shared borrowed reference), `Rc` (refcount box), `Arc` (atomic refcount box)) share in read-only mode unless synchronisation mechanisms (such as `Mutex`) are also applied.
This does not match the current Mu bundle building and loading design: currently, a bundle is mutable while being built, but becomes immutable once loaded. To implement such mutation, Rust needs `Cell<T>`, an internally mutable field inside an otherwise immutable object. And when an object is constructed before a field is supplied (such as the target of a `ref`), Rust needs `Option<T>` to leave room for the `None` case. We would have `struct TypeRef { target: Cell<Option<Type>> }`, even though once loaded, the `target` field is neither optional nor mutable.
## Other languages
No known programming language can directly express the idea of "mutable while being constructed, immutable once used".
In Java, JavaBeans is a famous "mutable construction" pattern: a "bean" class provides a parameter-less constructor and many setters, so properties can be set via setters. It is useful for introspection-based object initialisation:
```java
class Foo {
    public Foo() { }
    private Bar bar;
    public void setBar(Bar bar) { this.bar = bar; }
    public Bar getBar() { return bar; }
}

class Bar {
    public Bar() { }
    private Foo foo;
    public void setFoo(Foo foo) { this.foo = foo; }
    public Foo getFoo() { return foo; }
}
```
This pattern allows multiple objects to be created, but their dependencies injected later. The Spring Framework is one such container:
```xml
<beans>
    <bean id="theFoo" class="Foo">
        <property name="bar" ref="theBar" />
    </bean>
    <bean id="theBar" class="Bar">
        <property name="foo" ref="theFoo" />
    </bean>
</beans>
```
Many Spring Framework objects claim to be "thread-safe *after* being configured", which implies they may not be ready to use *during* configuration. Such objects depend on a protocol the programmer must know in order to work properly.
But this "JavaBeans" pattern is also criticised for leaving object fields non-final even though none of the properties are supposed to change after being set. Critics argue that the compiler cannot perform certain optimisations while the fields remain mutable and the public setters remain accessible.
Despite the criticism, it is well known that constructors with immutable fields cannot form object graphs with cycles. Scala is another language that promotes immutable objects, and it also suffers from the inability to form cycles of immutable references directly. [One workaround](http://stackoverflow.com/questions/8374010/scala-circular-references-in-immutable-data-types) is to use lazy evaluation to resolve some reference edges later, but the underlying implementation still depends on JVM-level mutable fields.
```scala
class Element[T](val value: T, p: => Element[T], n: => Element[T]) {
    lazy val prev = p
    lazy val next = n
}
```
Unfortunately, such workarounds do not exist in Rust.
## rustc: The Rust Compiler
rustc is the compiler of the Rust language. Internally, it has an LLVM-like CFG representation, and there are inter-references between instructions and basic blocks.
rustc separates the *references* to things and the *contents* of things. Take basic blocks for example. A CFG contains many BasicBlock:
```rust
// For information, only. May not match the actual source code.
struct CFG {
bbs: Map<BasicBlock, BasicBlockData>
}
struct BasicBlock(u32)
struct BasicBlockData {
instructions: ...
blahblah: ...
blahblahblah: ...
}
struct BranchInstruction {
destination: BasicBlock
}
```
In the snippet above, `BasicBlockData` actually holds the information about a basic block, while `BasicBlock` is just a thin wrapper around a `u32`. Instructions, such as `BranchInstruction`, refer to their destination by `BasicBlock`, which is a symbolic reference and does not own the basic block. The actual owner of each `BasicBlockData` is the `bbs` field in `CFG`. With this approach, the AST is still a tree from Rust's point of view. When the program needs information about a basic block, it can borrow a reference to the `BasicBlockData` from `CFG.bbs`, using the `BasicBlock` as the key.
It is true that this approach requires an indirection whenever the information about a basic block is accessed. But if the `Map` is implemented as an array and `BasicBlock` holds the index, the lookup is just an extra ADD and a LOAD. This is probably the best we can get if we use Rust without its unsafe features.
```rust
let bb: BasicBlock = ...;
{
    let bbi: &BasicBlockData = &cfg.bbs[bb]; // Borrow BasicBlockData
    // use bbi
} // The borrow of BasicBlockData ends here.
```
Similarly, the high-performance micro VM can use this pattern to handle cyclic references between entities. All Mu types can be owned by the bundle, and one type can refer to another via indices into the array of "MuTypeData".
This implies that the API, too, should refer to other IR nodes via symbols rather than via directly constructed nodes.
# The Proposed New API
The new API introduces another C-level struct: `struct MuIRBuilder`. Like `MuCtx`, which is used by only one client thread, a `MuIRBuilder` must only be used by one client thread; otherwise the micro VM would need to synchronise every one of its methods. This new struct is actually orthogonal to the symbolic-reference topic: observing the current bundle building API, the bundle-building functions have no intersection with the other functions in the `MuCtx` struct, whose purpose is to mutate the running Mu state.
`struct MuIRBuilder` is also a function pointer table, like the current `MuCtx` design. It is an open question whether we should adopt the traditional C approach (having all methods as top-level C functions), but that is a separate issue.
```c
typedef struct MuIRBuilder MuIRBuilder;
struct MuIRBuilder {
    void *header; // implementation-specific private field
    MuID (*gen_sym)(MuIRBuilder *b, MuCString name);
    void (*load)(MuIRBuilder *b);
    void (*abort)(MuIRBuilder *b);
    void (*new_type_int)(MuIRBuilder *b, MuID id, int length);
    void (*new_type_ref)(MuIRBuilder *b, MuID id, MuID target);
    void (*new_type_struct)(MuIRBuilder *b, MuID id, MuID fields[], MuArraySize nfields);
    ...
};
```
## gen_sym: use ID for everything
The `gen_sym` method creates a "Mu symbol", or "sym" for short. A sym has a numerical ID and an optional string name, and identifies a node in the bundle. The ID is generated by the micro VM when `gen_sym` is called; `name` may be `NULL`.
All other functions take a `MuID` parameter for the ID of the node being created, and use `MuID`s to refer to other nodes. An ID can be used as soon as it is returned by `gen_sym`, *even before the node it names is created*. For example, if we are creating the linked list type, the C client code will look like:
```c
MuIRBuilder *b = ...;
MuID linkedlist = b->gen_sym(b, "@linkedlist");
MuID i64 = b->gen_sym(b, NULL /* I don't care about its name. */);
MuID linkedlist_ref = b->gen_sym(b, "@linkedlist_ref");
MuID members[] = {i64, linkedlist_ref}; // Use i64 and linkedlist_ref before these types are defined.
b->new_type_struct(b, linkedlist, members, 2);
b->new_type_int(b, i64, 64);
b->new_type_ref(b, linkedlist_ref, linkedlist);
```
Note that `i64` is used by `new_type_struct` before `new_type_int` is called.
Also note that the `new_xxxx` functions return `void` rather than handles. Since nodes refer to each other by "sym", handles are no longer necessary.
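A toy model of the `gen_sym` protocol shows why forward references become trivial: nodes live in an ID-indexed table, and checking for undefined syms is deferred to load time. All names and types below (`Builder`, `builder_load`, the fixed-size table) are illustrative inventions, not the proposed API:

```c
#include <assert.h>

#define MAX_NODES 64

typedef enum { K_UNDEF = 0, K_INT, K_REF } Kind;

typedef struct {
    Kind kind;
    int int_len;    /* K_INT: bit length   */
    int ref_target; /* K_REF: ID of target */
} Node;

typedef struct {
    Node nodes[MAX_NODES]; /* zero-initialised: every slot starts K_UNDEF */
    int next_id;
} Builder;

/* Hand out the next fresh ID. */
int gen_sym(Builder *b) {
    assert(b->next_id < MAX_NODES);
    return b->next_id++;
}

void new_type_int(Builder *b, int id, int len) {
    b->nodes[id].kind = K_INT;
    b->nodes[id].int_len = len;
}

void new_type_ref(Builder *b, int id, int target) {
    b->nodes[id].kind = K_REF;
    b->nodes[id].ref_target = target; /* target may not be defined yet */
}

/* "load": every generated sym must have been defined by now. */
int builder_load(const Builder *b) {
    for (int i = 0; i < b->next_id; i++)
        if (b->nodes[i].kind == K_UNDEF)
            return 0; /* undefined sym */
    return 1;
}
```

Resolution is a single pass over the table at load time, which is where the real micro VM would also report any sym that was generated but never defined.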
## complex sub-structures
More things will become IR nodes, such as
- branching destinations (basic block + arguments)
- exception clauses
- keep-alive clauses
- sub-clauses in the NEWTHREAD and SWAPSTACK instructions (these two instructions are too complex to be created by one function)
For example:
```uir
(@x @y) = CALL <@sig> @callee (@v1 @v2 @v3)
    EXC(@bb1(@v4 @x @v5 @y @v6)
        @bb2(@blah @blah @blah))
    KEEPALIVES(@v1 @v3 @v5)
```
```c
MuID call_inst = b->gen_sym(b, "@the_OSR_introspectable_call_site");
MuID x = b->gen_sym(b, "@x");
MuID y = b->gen_sym(b, "@y");
MuID exc_clause = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID nor_dest = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID exc_dest = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID keepalive_clause = b->gen_sym(b, NULL /* I prefer not to give too many names */);
MuID call_args[] = {v1, v2, v3};
MuID call_results[] = {x, y}; // Sorry Eliot
b->new_call_inst(b, call_inst,
                 call_results, 2, // SSA variables for return values
                 sig, callee,
                 call_args, 3,    // arguments to the function
                 exc_clause,
                 keepalive_clause);
b->new_exc_clause(b, exc_clause, nor_dest, exc_dest);
MuID nor_args[] = {v4, x, v5, y, v6};
b->new_dest(b, nor_dest, bb1, nor_args, 5);
MuID exc_args[] = {blah, blah, blah};
b->new_dest(b, exc_dest, bb2, exc_args, 3);
MuID keepalive_vars[] = {v1, v3, v5};
b->new_keepalive_clause(b, keepalive_clause, keepalive_vars, 3);
```
It is quite a lot of code, but that should be acceptable, since clients do not construct bundles via this API by hand.
## Instruction results are passed in, too
Note that the "syms" of return values are passed in. They are SSA variables, too, and the instructions need to refer to their results. Most instructions will be created like "three-address instructions".
But most instructions have a known number of return values, such as:
```uir
%c = EQ <@i64> %a %b
```
```c
MuID i64, a, b = ...;
MuID cmp_inst = b->gen_sym(b, "@the_name_of_the_instruction_itself_for_tracing_and_debugging");
MuID c = b->gen_sym(b, "@func.ver.entry.c");
b->new_cmp(b, cmp_inst,
           c, // the only result
           MU_CMP_EQ, i64, a, b);
```
Some instructions (namely binOp (with zero/neg/ovf/carry flags), CALL, TRAP, WATCHPOINT, SWAPSTACK and COMMINST) may have a different number of results depending on their arguments. But since all instructions are "three-address", all results have to be passed in anyway.
## Putting it together
To prevent mutability, each instruction (itself) is created in one step, while complex sub-components (such as exc-clauses and keepalive-clauses) are created separately.
A basic block is created in one step, too. When creating basic blocks, instructions are passed in as parameters.
```uir
%bb1(<@T1> %p1 <@T2> %p2) [%exc_param]:
%x = [%inst1] ADD ...
%y = [%inst2] SUB ...
[%inst3] TRAP ...
```
```c
MuID bb1, p1, p2, exc_param, inst1, inst2, inst3 = ... // gen_syms
b->new_binop(b, inst1, ...);
b->new_binop(b, inst2, ...);
b->new_trap(b, inst3, ...);
MuID bb1_param_tys[] = {T1, T2};
MuID bb1_params[] = {p1, p2};
MuID bb1_insts[] = {inst1, inst2, inst3};
b->new_bb(b, bb1,
bb1_param_tys, bb1_params, 2, // Two parameters
exc_param, // This block may be used to catch exceptions.
bb1_insts, 3 // Three instructions
);
```
The top-level is implicit. When the `MuIRBuilder->load` method is called, all top-level definitions (types, signatures, constants, global cells, functions, exposed functions) ever created will be part of the bundle. It should be reasonable to keep mutability at the very top level.
# Performance impact
This remains to be measured experimentally.
Kunshan Wang

---
## Issue #28: Object Pinning
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/28 (2017-05-02, John Zhang)
*Created by: wks*
This issue is part of https://github.com/microvm/microvm-meta/issues/24
# TL;DR
This proposal gives meaning to the "object pinning" operation.
The meaning is: The PIN operation takes a `ref<T>` or `iref<T>`, pins the object for the current thread, and returns a `ptr<T>` (pointer to `T`). This pointer **can be used to** access the memory location of the `iref` until all threads which have pinned the object have unpinned it using the UNPIN operation.
Note: This has very few implications for the Mu implementation. It only says the pointer can be *used* in the expected way, but does not say anything about the storage of the actual object. (The micro VM can cheat!)
## Operations
In the following two instructions, `R` can be either `ref` or `iref`.
* `PIN(%r: R<T>) -> ptr<T>`: Add the object referred to by `%r` to the *pinning set* of the current thread, and return a pointer.
* `UNPIN(%r: R<T>) -> void`: Remove the object referred to by `%r` from the *pinning set* of the current thread.
`PIN` and `UNPIN` do not pin any object if `%r` refers to a memory location not in any heap object. If `%r` is NULL, `PIN` returns a NULL pointer. If `%r` is an `iref` and refers to a stack cell or a global cell, `PIN` returns a pointer to it.
> NOTE: All memory locations in Mu, not just heap objects, are referred by `iref`. In order to let native code work with the Mu memory, pointers always have to be generated. That is why `PIN` and `UNPIN` trivially work with non-heap memory locations as well. It may be impossible at compile time to know whether an `iref` refers to the heap. For example, there may be a function taking an `iref` as a parameter.
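For illustration, a straw-man Mu IR spelling of a typical pin/call/unpin sequence follows. The exact syntax of `PIN`/`UNPIN` is not fixed by this proposal, and the names `@T`, `@write_sig` and `%native_write` are hypothetical:

```uir
%p  = PIN <@T> %obj                      // %p: ptr<@T>, usable while %obj stays pinned
%rv = CCALL <@write_sig> %native_write (%p %len)
UNPIN <@T> %obj                          // %p must not be used after this point
```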
## The guarantees
The pointer returned by `PIN` has the following guarantees:
* The pointer is usable as long as the object pinned by `PIN` is in the *pinning set* of **any** thread.
* The pointer points to a region of address which can be used to access the memory location of the parameter of `PIN` (i.e. `%r`). Specifically:
+ The object layout conforms to the platform's Mu Application Binary Interface (yet to be defined).
+ The native code can perform LOAD, STORE, CMPXCHG, ATOMICRMW, FENCE operations on those locations and they shall conform to the Mu memory model. However, *which native instruction/operator/function performs which operation in the Mu memory model* is implementation defined.
> One memory order can be implemented in multiple different ways. E.g. on x86, SEQ_CST can be implemented as (load: MOV, store: XCHG), but also as (load: LOCK XADD(0), store: MOV). It is up to the implementation to guarantee that Mu memory operations (Mu IR instructions) are compatible with their native counterparts (C11 `<stdatomic.h>` or C++11 `<atomic>`). For example, one particular implementation may let the `atomic_load(ptr, memory_order_xxxxxx)` function in glibc (but not `atomic<T>.load(xxxx)` in the libc++ provided by LLVM) perform the LOAD operation with xxxxxx memory order in the Mu memory model.
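As a concrete, non-normative illustration, the C11 `<stdatomic.h>` spelling of a SEQ_CST load and store on a pinned 64-bit word would be the following (which native instructions these compile to is exactly the implementation-defined part):

```c
#include <stdatomic.h>
#include <stdint.h>

/* ptr plays the role of the ptr<int<64>> returned by PIN, as seen from C.
 * These are the portable C11 operations; the mapping from Mu memory
 * orders to native code is implementation-defined, as the text says. */
int64_t seq_cst_load(_Atomic int64_t *ptr) {
    return atomic_load_explicit(ptr, memory_order_seq_cst);
}

void seq_cst_store(_Atomic int64_t *ptr, int64_t v) {
    atomic_store_explicit(ptr, v, memory_order_seq_cst);
}
```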
## Issues about multi-threading
It is possible for two threads to pin the same object. For example, there are two threads T1 and T2 and object O. The execution appears like the following sequence:
1. T1: pin O
2. T2: pin O
3. T1: do something with O
4. T1: unpin O
5. T2: do something with O
6. T2: unpin O
In step 5, T2 uses the pointer after T1 has already performed an unpin operation (step 4). If an object pinned by one thread could be unpinned by another thread, there would be a problem: if the object O is no longer pinned, it is an error for T2 to do anything with the pointer.
It is possible to require a thread to acquire a lock or perform reference counting before pinning/unpinning, but this would be inefficient because it inevitably involves expensive atomic operations, and one main reason for using the FFI is performance.
Therefore, we let different threads pin/unpin an object **locally**: `PIN` means pinning an object **for the current thread**. An object is pinned if and only if at least one thread is pinning it.
Implementation-wise, this can be done by keeping a thread-local buffer which records all objects the current thread is pinning. When GC happens the marker looks at the thread-local buffers to find all objects pinned by any thread. In this way, mutators do not need atomic memory operations, but the GC needs to look at all threads.
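A minimal sketch of such a thread-local pinning buffer (all names are hypothetical; a real implementation would grow the buffer dynamically and scan the buffers of *all* threads at a GC safepoint):

```c
#include <stddef.h>
#include <stdbool.h>

#define PIN_BUF_CAP 256

/* One buffer per mutator thread: no atomics on the fast path. */
typedef struct {
    void  *objs[PIN_BUF_CAP];
    size_t len;
} PinBuffer;

static _Thread_local PinBuffer pin_buf;

/* PIN: record the object locally; return its address.
 * (Identity map, refimpl-style: objects never actually move.) */
void *pin(void *obj) {
    pin_buf.objs[pin_buf.len++] = obj;   /* real code checks capacity */
    return obj;
}

/* UNPIN: remove one occurrence of obj from this thread's buffer. */
void unpin(void *obj) {
    for (size_t i = 0; i < pin_buf.len; i++) {
        if (pin_buf.objs[i] == obj) {
            pin_buf.objs[i] = pin_buf.objs[--pin_buf.len];
            return;
        }
    }
}

/* The GC would run this over every thread's buffer: an object is
 * pinned iff it appears in at least one of them. */
bool is_pinned_locally(void *obj) {
    for (size_t i = 0; i < pin_buf.len; i++)
        if (pin_buf.objs[i] == obj) return true;
    return false;
}
```

Note that `pin` may record the same object twice; each `UNPIN` removes one occurrence, which matches the "pinned iff at least one thread is pinning it" rule.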
This "thread-local pinning" mechanism could not be implemented by the client if Mu only offered a racy (global) `PIN` instruction, and giving the client direct access to the thread-local buffer would be no different from providing the thread-local `PIN` instruction itself. So this thread-local pinning mechanism does not violate the principle of *minimalism* of Mu: it cannot be implemented efficiently outside Mu.
-----------------------------CUT HERE. BELOW ARE LEGACY TEXTS-------------------------------
# Abstract
I propose defining two kinds of memory spaces: *real space* which models the memory used by C or native programs, and *imaginary space* for that of the µVM. *Object pinning* (or *realising*) is an operation that temporarily makes a memory location in the imaginary space real so that it can be accessed from C programs.
# Proposal
## Concepts
* **memory**: self-explanatory, but... I don't trust "common sense".
* **memory location**: a region of data storage. Holds values.
* **virtual memory space**: the abstraction provided by the OS and the architecture. It has the following properties:
+ At any moment, it is a mapping from addresses (a subset of integers) to byte values. (I don't like this property. For any multi-threaded program, different threads may not see the same value, and Albert Einstein does not like "the same time".)
+ It can be accessed (read/written/atomicRMW) in various granularities (sizes). The atomicity and visibility between threads follows a certain memory model (the one defined by the architecture, OS and related programming languages).
+ It may be shared between processes and threads. Thus it can be accessed by things not in the µVM.
* **real memory**: memory in which memory locations satisfy the following properties:
+ (Does not need to have "addresses", that is, a memory location can be a variable, not numerical value.)
+ Allows memory accesses (load/store/atomicRMW).
+ For every memory location L, there is a unique memory location L' in the virtual memory space. (This disallows replication.) This L' does not change during the lifetime of L. (This disallows moving.) Accesses to both locations are equivalent.
+ For any two memory locations L1 and L2, their corresponding memory locations in the virtual memory space do not overlap. That is, their accesses are independent. (This disallows aliasing.)
+ For an array in a real memory, its corresponding memory location in the virtual memory space is contiguous. (This disallows implementing arrays as multiple disjoint sub-arrays.)
* **imaginary memory**: memory in which memory locations satisfy the following properties:
+ (Does not need to have "addresses", that is, a memory location can be a variable, not numerical value.)
+ Allows memory accesses (load/store/atomicRMW).
NOTE: As can be seen, "real memory" is trivially "imaginary memory".
* **realising**: temporarily letting a memory location in an imaginary memory have the property of real memory. (This is colloquially called **object pinning**, but it is more than "not moving").
* `iref<T>` (**internal reference**): refer to a memory location in real or imaginary memory.
* `ptr<T>` (**pointer**): an address. May or may not correspond to a memory location in the real memory.
## In the µVM
* All memory in the µVM (heap, stack and global) is imaginary memory.
* Introduce the pointer type `ptr<T>`. It is just a raw address, but is typed.
* Introduce the `PTRCAST` instruction which can freely cast `ptr<T>` to or from `int<n>` if n is the appropriate size.
* `LOAD`, `STORE`, `CMPXCHG`, `ATOMICRMW` now work with both `iref<T>` and `ptr<T>`.
* The `CCALL` can call a C function.
+ Plan A: The callee can have type `int<n>`. It is just an integer address.
+ Plan B: Introduce a `c_func<sig>` type. It is castable to/from `int<n>`. NOTE: `func<sig>` refers to µVM functions.
## Pinning
* "Pinning a memory location" means "realising" it, granting it the property of real memory.
* Implicit pinning: Any `iref<T>` values used as arguments of `CCALL` are implicitly pinned during this call.
* Explicit pinning:
+ Plan A: Introduce `REALISE` and `UNREALISE` instructions, which do what their names suggest. The `REALISE` instruction returns a `ptr<T>` value.
+ Plan B: `REALISE` and `UNREALISE` have counting semantics. An object is "unpinned" if its pin-count reduces to 0.
+ Plan C: (the tracing approach) Introduce a type `pinner_iref<T>` which actually holds an `iref<T>` (a [marked storage type](https://github.com/microvm/microvm-spec/wiki/type-system#types-and-type-constructors) of `iref`). `pinner_iref<T>` must be in the memory (not SSA, just like `weakref<T>` cannot be SSA variable). If such a reference is reachable, the referent is pinned. After pinning, the pointer can be obtained via a `GETPTR` instruction. (Plan C does not address replication and non-contiguous arrays)
## Open questions
1. Do we assume stacks and globals as "real" by default?
2. If stacks can move, how do we efficiently realise (pin) it?
3. Do we prevent non-contiguous arrays?
4. How to implement temporary "un-replicating".
# Background: Inter-language interaction
Currently the only way for the µVM to interact with the "outside world" is via traps handled by the client. This interface is called **µVM-client Interface** or **The API**.
For performance concerns, we should introduce a more direct and low-level interface to the "outside world". This new interface is called **foreign function interface** or **FFI**.
## Two worlds
**Imaginary memory**: In a world with advanced garbage collectors, the memory is managed by the GC.
* A high-level memory location (in object or not, for example, if a VM implements movable stacks) may be moved from one address to another (address is the operating system or architecture's virtual address space).
* A high-level memory location can be replicated (a single high-level object/field corresponds to multiple system memory addresses). This may serve different purposes, for example, concurrent GC, security, etc.
* A high-level memory data structure may not have the same structure as the system-level memory. For example, a high-level array may be implemented as segments of (non-contiguous) arrays.
* Programs written in C can only access this kind of memory assisted by the memory manager (GC).
* Example: Java, µVM.
**Real memory**: In a world closely interacting with C, the GC is somewhat naive, or there is no GC at all.
* High-level memory locations (as seen by the programming languages (like C) or VMs (like CPython)) do not move and are not replicated. Each high-level memory location (in object or not) corresponds to exactly one OS/architecture-level address as long as it is not deallocated.
* Programs written in C can directly access the memory as long as it has a raw pointer to the memory location.
* Example:
* Any non-GC language: C, C++, Rust, ...
* Any language/impl that tightly interacts with C: CPython, Lua (partially)
## Examples
* The µVM uses "imaginary memory". It does not assume any low-level memory layout except some high-level rules.
* Java exclusively uses "imaginary memory". All Java memory accesses through JNI must go through handles. It is even a problem to expose an array to the C language: 1) the VM must support object pinning, and 2) the VM must implement arrays contiguously.
* CPython uses "real memory". C programs hold any Python objects by raw pointers. A C module can customise its own Python object layout to include its own private data.
* Lua uses "real memory". "Userdata" (a chunk of memory allocated by Lua, but is used by the user, like a managed "malloc") is a Lua object. `lua_touserdata` gets a raw pointer to such a chunk of memory and does not need pinning. `lua_topointer` gets a raw pointer to any Lua object (for debugging purpose).
* SpiderMonkey uses something hybrid. Its GC can move objects, but not within a "request" (a delimited region in C programs where GC "must not happen"). In a "request" (probably everything in C that interacts with SpiderMonkey), the C program can use raw pointers to refer to JS objects, though their structures are opaque, and it is recommended to use `JSHandleValue` to mark them as GC roots.
---
## Issue #17: Portability vs exploiting architecture potentials
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/17 (2016-06-17, John Zhang)
*Created by: wks*
> For should the enemy strengthen his van, he will weaken his rear; should he strengthen his rear, he will weaken his van; should he strengthen his left, he will weaken his right; should he strengthen his right, he will weaken his left. If he sends reinforcements everywhere, he will everywhere be weak. -- Art of War, by Sun Zi
The µVM is designed to abstract over the hardware, but only acts as a thin layer of abstraction.
There are many differences between underlying hardware platforms. Scroll down to the Appendix or read https://github.com/microvm/microvm-meta/issues/16
# Solutions to differences
As mentioned in https://github.com/microvm/microvm-meta/issues/16, there are three solutions, from the weakest to the strongest.
1. Define the differences as **undefined behaviour**. The client must prevent touching those fields at all cost.
2. Define the differences as **implementation-defined behaviour**. The µVM implementation defines the behaviour and provides documentation and compile-time/run-time checkable mechanisms.
3. **Define the behaviour** in the µVM. The µVM implementation bridges the differences.
The 1st approach is the weakest. It makes it absolutely impossible to make use of any platform-specific features and introduces excessive checks to avoid undefined behaviour. So the µVM should lie somewhere between approaches 2 and 3.
# The µVM design goal
The µVM has several conflicting design goals.
1. µVM is low-level. It is a thin layer over the hardware.
2. µVM is minimal. Anything that can be done efficiently by the Client should be done by the Client.
3. µVM is **portable**.
4. µVM should support high-performance VMs
5. µVM potentially runs on resource-constrained devices.
Depending on how to interpret **portable**, there are two different implications:
1. The µVM provides compile-time or run-time flags so that the Client can generate platform-dependent µVM IR code.
2. The µVM IR code is cross-platform. The same µVM IR code should run on all platforms.
The second interpretation is more portable than the first. Overall, the more portable the µVM is, the thicker the abstraction layer is.
Goal 4 can be interpreted in different ways:
1. Highest theoretical possible performance.
2. As high as C for computation-intensive problems.
3. As high as Java for computation-intensive problems.
4. Much higher than Python/PHP/R/<*insert your favourite scripting language here*>
5. As high as Python/PHP/R/...
6. As long as the program eventually terminates.
Depending on the concrete implementation, an average desktop/server µVM should reach between 2 and 3.
Goal 5 requires the µVM and the Client to be simple. In this case, the Client and the µVM cannot perform too much reasoning about the program. This results in sub-optimal performance of emitted code.
# Choices in the µVM
## Behaviour of UDIV/SDIV
1. UDIV/SDIV are implementation-defined, or
2. They always behave like Java (div by 0 is an exception, -0x80000000/-1 = -0x80000000)
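For concreteness, option 2 means the implementation itself must insert the checks on architectures where the native divide instruction traps or differs. A sketch of the Java-style semantics for 32-bit signed division (the error-reporting convention here is made up; a real implementation would raise a Mu exception):

```c
#include <stdint.h>
#include <stdbool.h>

/* Java-style 32-bit signed division:
 * - division by zero is an error (the caller would throw);
 * - INT32_MIN / -1 wraps to INT32_MIN instead of trapping
 *   (on x86 the raw IDIV would raise #DE in this case). */
bool sdiv_java(int32_t a, int32_t b, int32_t *out) {
    if (b == 0)
        return false;
    if (a == INT32_MIN && b == -1) {
        *out = INT32_MIN;   /* the overflow case is well defined */
        return true;
    }
    *out = a / b;
    return true;
}
```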
## Supported vector sizes
Conflicts:
1. The current µVM IR is very expressive. It can express any vector types, given type T and size n: `vector <T n>`
2. Not all are supported by the architecture
So the µVM should
1. Make it **implementation-defined**, or
2. Only support **some selected vector sizes**, or
3. Support some selected vector sizes **and platform-specific vector sizes**, or
4. Support **all** `vector<T n>`, either using scalar operations or vector instructions.
Rationale:
1. The Client can **probe** the supported vector sizes and can generate code accordingly. So it is the Client's responsibility to choose vectors. This is also a kind of **specialisation** which is usually done by the high-level optimiser.
2. Choosing the appropriate vector size is a kind of **instruction selection** and should be done by the µVM in a platform-specific fashion.
For rationale 1, [DeVito et al](http://terralang.org/pldi071-devito.pdf) used an auto-tuning technique to determine the optimal vector size for the vectorised matrix multiplication problem.
For rationale 2, [Jibaja et al](https://01.org/node/1495) proposed adding SIMD to JavaScript, but only providing 128-bit registers to the programmer. However, the following code is difficult for the µVM to convert to 256-bit vectors:
```
__m128 elems[N];
for (int i=0; i<N; i++) {
elems[i] = vector_add_floatX4(elems[i], constant_vector(1,2,3,4));
}
```
The reasons are:
1. It requires extensive control-flow analysis to merge two adjacent 128-bit additions into one 256-bit addition.
2. 256-bit vectors have a different alignment requirement, so an `__m128[]` array cannot be treated as an `__m256[]` array.
The following code, however, is easier to convert:
```
__align_to(256) __m128 elems[N];
for (int i=0; i<N; i+=2) {
elems[i] = vector_add_floatX4(elems[i], constant_vector(1,2,3,4));
elems[i+1] = vector_add_floatX4(elems[i+1], constant_vector(1,2,3,4));
}
```
However, the client, when generating such code, **is already aware of** the presence of 256-bit vectors.
## Array indexing
Conflict:
1. Array indexing seems to be platform-independent. It is "begin+index" (index can be negative).
2. It is implemented using address calculation and memory accessing, where "address" is word-sized.
Solutions
1. Allow **any integer type** as the index; indexes are sign-extended (used by LLVM, see http://llvm.org/docs/LangRef.html#getelementptr-instruction), or
2. The index must be **word-sized** and the Client must appropriately extend/truncate the index to word size (current µVM design)
Rationale:
1. It only involves a sign extension or truncation, which the µVM can do cheaply. However,
2. since the Client can probe the word size and generate TRUNC/SEXT instructions itself, it should be done by the Client.
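Under option 2, the SEXT the client emits before a word-sized indexing operation has the same effect as this C conversion (a sketch; names are made up):

```c
#include <stdint.h>

/* What the client's emitted SEXT does before word-sized indexing:
 * widen a signed 32-bit index to the machine word size. */
intptr_t widen_index(int32_t idx) {
    return (intptr_t)idx;   /* sign-extending conversion */
}
```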
# Portable µVM IR as a subset of the µVM
If we can define **a subset of the µVM** with **defined behaviours** and **reasonable** performance (perhaps calling it "µVM Mobile Edition"), then some simplistic µVM Clients (I assume there will soon be many such Clients) can generate portable µVM IR code.
For implementations that seek ultimate performance, the µVM implementation can implement many **machine-dependent instructions** as a superset of such a "portable" IR. (call it "µVM Enterprise Edition"?)
Alternatively, we can define that "subset" as "The µVM IR ®" and treat the implementation-dependent extensions as "extended µVM".
# Appendix: Examples of differences
There are differences between architectures. For example:
The **word length** is different. Some operations (including array indexing) depend on the word length.
The **integer division** instruction behaves differently among architectures, especially in "division by zero" and "signed integer overflow".
Not all operations on all types perform equally well. 64-bit integer operations perform significantly worse (if they are possible at all) than their 32-bit counterparts on 32-bit machines.
The supported **vector length** of vector instructions is different. This may vary from 64 bits (ARM) up to 512 bits (x86 with AVX) with 128 bits widely supported.
Some instructions are "optional" and the functionality must be software-implemented on those architectures. For example, **UDIV** and **SDIV** in ARMv7.
Although most architectures provide all operations (binary arithmetic and logical, comparison and conversion) mentioned in LLVM, concrete instructions do not work on all data sizes. For example, **conversion from floating point to integer** may only work between certain data types (float to 32-bit or 64-bit int, but not other int types). Not all **vector operations** work for all vector types.
The address size (and index size) of memory (and array) operations differs among architectures.
The set of data types that can be **atomically loaded/stored** differs. This affects how the Client is going to implement mutex locks.
The alignment requirement for **vector load/store** operations also differs.
In the native interface/C interface/foreign function interface(FFI)
* The **pointer size** depends on the architecture. This also affects **function pointers** for external C functions.
* The **available system calls** depend on the operating system, the ABI and the processor. The Client must handle the differences between operating systems.
Other issues mentioned in https://github.com/microvm/microvm-meta/issues/16
# Current extension mechanism
An **intrinsic function** (or IFUNC for short) is something that has a form similar to a function call, but is treated by the µVM as a regular instruction. The simplest form takes only the name (or ID) of the IFUNC:
```
%result = ICALL @uvm.thread_exit
```
The most complete form takes type arguments, value arguments and an exceptional destination:
```
%result = ICALL @uvm.do_something_weird <@T1 @T2 @T3> (%val1 %val2 %val3) EXC %normal %exceptional
```
Theoretically all binary operations can be defined as IFUNCs. For example:
```
%result = ICALL @uvm.math.sdiv <int<64>> (%lhs %rhs) EXC %normal %div_by_zero_handler
```
---
## Issue #62: Possible statistics trivially implementable in the refimpl
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/62 (2018-06-28, Kunshan Wang)
These will provide more information about the characteristics of real-world Mu programs, and help implementers.
Dynamic recording:
* counting instructions executed (how many ADD, LOAD, BRANCH, ... executed)
* number of instructions between branching (median)
* function call count
Static analysis:
* register pressure (static analysis)
- simultaneous live variables
Kunshan Wang

---
## Issue #44: Pre-SSA form
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/44 (2016-06-17, John Zhang)
*Created by: eliotmoss*
We have concluded that while the official form of Mu IR is SSA form (but see Issue #18 for current thoughts on how to represent that form), many clients will find it more convenient to generate something that is mostly Mu IR but that is not in SSA form, and that it is further desirable to offer a standard tool to convert from some "pre-SSA" form to proper SSA form. This tool may operate in a stand-alone manner or be more in bed with an implementation of Mu.
We propose the following specific pre-SSA form, according to how it differs from SSA-form Mu.
1. "SSA-variables" may be assigned more than once; however, any individual such variable must be used in a type-consistent manner.
1. PHIs may be omitted (or, in the proposal of #18, values may be omitted at branches and variables omitted at labels)
1. For convenience we introduce a "copy" operator, var = ID <T> arg, which takes one argument arg of type T and assigns it to variable var. This operator seems to be convenient sometimes from a client perspective.
The converter to SSA form will perform liveness analysis and add variables to labels and values to branches as necessary, checking for type consistency. If some variable is live but not initialized, then the converter will insert a safe initialization (to 0 or 0.0 for numeric types, null for a pointer, etc.) at the latest possible point that does not interfere with existing assignments to the variable. (Optimization may move the initialization earlier as deemed appropriate.)
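A hypothetical before/after sketch of the conversion (syntax approximate, using the block-parameter style of Issue #18; `@i64`, `@ONE` and the variable names are made up):

```uir
// Pre-SSA: %s is assigned on both paths; branches pass no values
%then:
    %s = ADD <@i64> %a @ONE
    BRANCH %join
%else:
    %s = ID <@i64> %a          // the proposed "copy" operator
    BRANCH %join
%join:
    %t = MUL <@i64> %s %s

// After conversion: %join gains a parameter, branches pass values
%then:
    %s1 = ADD <@i64> %a @ONE
    BRANCH %join(%s1)
%else:
    BRANCH %join(%a)
%join(<@i64> %s2):
    %t = MUL <@i64> %s2 %s2
```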
We will undertake to develop the converter in Scala or Java.

---
## Issue #43: Reduce special cases involving the void type
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/43 (2016-06-17, John Zhang)
*Created by: wks*
# The current status
The `void` type is a special type in the Mu type system. It has no value, and thus many instructions/mechanisms have special cases for the `void` type.
**Instructions that have special cases for `void`**:
- `RET` and `RETVOID`: Since `void` has no value (In fact it does. The return value of the `BRANCH` instruction, for example, is a value of the `void` type.), we needed a special syntax to return `void`, thus we have `RETVOID`.
- The "new-stack clause" of the `SWAPSTACK` instruction: `PASS_VALUE <T> %val` and `PASS_VOID`: for the same reason why we have `RET` and `RETVOID`.
**The trap handler has a special case for `void`**:
Just like `SWAPSTACK`, the trap handler may rebind the thread to a stack and either "pass a value" or "pass `void`" or "throw an exception".
## Other existing uses
**Instructions that always return `void`**: `BRANCH`, `BRANCH2`, `SELECT`, `TAILCALL`, `RET`, `RETVOID`, `THROW`, `STORE`, `FENCE`, some common instructions: `@uvm.kill_stack`, `@uvm.thread_exit`, `@uvm.native.unpin`, `@uvm.native.unexpose`, `@uvm.meta.load_bundle`, `@uvm.meta.load_hail`, `@uvm.meta.pop_frame`, `@uvm.meta.push_frame`, `@uvm.meta.enable_watchpoint`, `@uvm.meta.disable_watchpoint`, `@uvm.meta.set_trap_handler`: These instructions do not return meaningful values.
**Instructions that may return `void` sometimes**: `CALL`, `TRAP`, `WATCHPOINT`, `CCALL`, `SWAPSTACK`: The callee, client, swappee, or whatever is at the other end of communication, may not return meaningful values.
## Current properties of `void`
`void` can only be used in 3 cases:
1. As the type of allocation units that do not represent values. Hence it is usable as the referent type of reference types and pointer types. e.g. You can run `NEW <@void>`. Each time you NEW a void, you have a **new** empty object, not the same as any other.
2. As the fixed part of a hybrid to indicate the absence of the fixed part. e.g. `hybrid<void int<64>>` is a variable-length array of `int<64>`, without a fixed part.
3. As the type of instructions or the return type of functions that do not return values. e.g. the `BRANCH` instruction returns `void`.
Other properties:
- `void` has no value (in fact it does, as mentioned before)
- `void` is neither a scalar type nor a composite type.
- Only scalar types can be used for memory access: `LOAD`, `STORE`, ...
- Only composite types have other types as components: fields/elements
- `void` is neither storable nor loadable. It does not contain other parts. It cannot be part of a struct/array/vector, i.e. there is no "array of void". The "fixed part of a hybrid" is an exception.
- `void` is native-safe: It can be returned from native functions; and there can be `uptr<void>`.
# Proposed changes
**value of `void`**: Instead of "having no value", `void` now has exactly one value: NULL. This is consistent with Python: `NoneType` has only one value `None`.
**`void` constant**: We reuse the `NULL` literal to create a "void constant":
```c
.const @VOID <@void> = NULL // The only possible value of void.
// For the sake of consistency, we require the client to define it.
//
// Alternative: make it a pre-defined value, such as the @uvm.predef.void_t type
// and the @uvm.predef.VOID value. We could define @uvm.predef.i8, @uvm.predef.i16,
// @uvm.predef.i32, @uvm.predef.i64, @uvm.predef.float, @uvm.predef.double,
// @uvm.predef.ref_void, @uvm.predef.ref_i32, ..., but the choice seems too arbitrary.
```
All existing instructions that return `void` return this `NULL` value. In theory, the following snippet is valid, but stupid:
```c
%entry:
%x = BRANCH %bb1
%bb1:
RET <@void> %x // return void. Should have said RET <@void> @VOID
// or even "RET @VOID" omitting the type argument, because RET always returns the
// return type of the current function. ADD, SUB, MUL ... would have to infer the operand
// types if the operand type is not provided, but RET does not need to be inferred: the
// function return type is explicit.
```
**Remove the `RETVOID` instruction**: Use `RET <@void> @VOID` instead, or simply `RET @VOID`.
**Remove the `SWAPSTACK` clause `PASS_VOID`**: Use ``PASS_VAL <@void> @VOID`` instead. Unlike `RET`, the type parameter here is necessary: the type that the swappee expects is dynamic. It may expect a different type at a different `SWAPSTACK` site. Guessing the wrong type while swapping has undefined behaviour.
**Trap handlers no longer need a PASS_VOID return case**: Instead, pass a `NULL` constant.
## New ways to use `void`
In addition to the existing three ways, i.e. empty objects, hybrid fixed part, empty return value, `void` can now be used in the following ways:
- In `RET` to return from a function of `void` return type.
- In `SWAPSTACK` to swap to a stack that does not expect to receive a value (it receives the `NULL` value of the `void` type).
- In the trap handler, to rebind the thread to a stack which expects `void`.
They all fit into the category that "the other end of communication" does not pass a value.
## Things that should still be forbidden
**`void` must not be a parameter type**: I don't have a very compelling reason, but it is completely useless (only increases the apparent arity of a function).
**`void` must not be part of a struct/array/vector or the variable part of a hybrid**: Disallowing this gains us a very nice property: each field/element in any struct/array/vector/varpart has a different offset. In `struct<@i32 void void void void @x>`, since `void` would have size 0 and alignment 1 (in the sense that `void` can be allocated at any address *a* such that *a* % 1 == 0), `void` does not occupy space. Then all of the `void` fields would be at the same offset as `@x`. Another reason: C does not allow `void` to be a struct field.
**Empty structs (`struct<>`) should be forbidden**: For the same reason as `void` as a field. Just use `void`, because it is so special. C forbids empty structs, too, but GCC allows them.
# How about LLVM?
LLVM IR has two syntaxes for the `ret` instruction:
- `ret <type> <value>` for example: `ret i32 100`
- `ret void`, which returns void.
LLVM does not have a "void constant", either, since `void` is not a "first class type". In LLVM, only `void` and function types are not first-class types. (LLVM has both function types and pointer-to-function types.)
LLVM LangRef does not say parameter types cannot be `void`, but `void` is never used as a parameter type. In C, `void` is an incomplete type, and thus cannot be a parameter type.
# Signal handling for clients or user programs

*Kunshan Wang, 2018-06-28, https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/58*

TL;DR: The concept **signal** is unix-specific and varies *a lot* among operating systems. However, many programs, especially unix command-line programs, depend on signals for basic user interaction, such as CTRL-C. This issue talks about what a Mu implementation can do for the client and the application programmers. The *simplest* thing a Mu implementation can do is *do nothing*. If the implementation wants to be responsible, there is much room to design an implementation-specific interface.
# Problem statement
Many UNIX command line programs use signals to interact with the user or other programs **in normal use cases**, such as
- Asynchronous signals:
- SIGINT: received when the user presses CTRL-C
- SIGWINCH: received when the size of the terminal window is changed
- Synchronous signals:
  - SIGPIPE: raised on an attempt to write to a broken pipe. For example, in `cat foo.txt | sort | uniq`, if `sort` exits early, `cat` receives SIGPIPE when it tries to write to its stdout (the pipe).
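The synchronous SIGPIPE case can be demonstrated directly (a sketch assuming a POSIX system; `write_to_broken_pipe` is an illustrative name, not any Mu or client API): once the signal's default action is suppressed, the same broken-pipe write turns into an ordinary `EPIPE` error code.

```python
import errno
import os
import signal

def write_to_broken_pipe():
    """Write to a pipe with no reader and return the resulting errno."""
    # Suppress SIGPIPE's default action (which would kill the process),
    # so the failed write surfaces as an error code instead.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    r, w = os.pipe()
    os.close(r)  # close the read end: the pipe is now broken
    try:
        os.write(w, b"x")
        return 0
    except OSError as e:
        return e.errno
    finally:
        os.close(w)

print(write_to_broken_pipe() == errno.EPIPE)  # → True
```

This is the same trick a client-side `write` wrapper would use: mask the signal, then translate `EPIPE` into a special return value or a Mu exception.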
## Mu status quo
Mu is not designed for Unix only, and does not contain "signal" in its IR or API.
For most errors, Mu IR programs behave in a signal-agnostic way. For example:
- division by zero: In the [UDIV/UREM/SDIV/SREM](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#binary-operations) instructions, an "exceptional destination" must be specified. It is a basic block in the same function as the `*DIV`/`*REM` instruction, and it behaves like a jump. Implementations may use signals to implement such jump, but may also generate a `CMP reg, 0`, `JE abnormal_dest`, `DIV` sequence.
- NULL pointer: Just like division by zero, the [LOAD/STORE](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#load-instruction) instructions also take an "exceptional destination", which is jumped to when NULL pointer error occurs.
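The check-and-branch lowering mentioned above (the software analogue of the `CMP reg, 0` / `JE abnormal_dest` / `DIV` sequence) can be sketched as follows. All names here are illustrative; this is not a real Mu interface, just the control-flow shape of an `SDIV` with an exceptional destination.

```python
def sdiv(x, y, normal_dest, exceptional_dest):
    """Lower SDIV-with-exceptional-destination into an explicit zero check.

    normal_dest / exceptional_dest play the role of the two basic blocks:
    control transfers to exactly one of them, like a jump.
    """
    if y == 0:                  # CMP reg, 0 / JE exceptional_dest
        return exceptional_dest()
    return normal_dest(x // y)  # DIV

print(sdiv(7, 2, lambda q: q, lambda: "div-by-zero"))  # → 3
print(sdiv(7, 0, lambda q: q, lambda: "div-by-zero"))  # → div-by-zero
```

An implementation that instead relies on SIGFPE (or SIGSEGV for the NULL-pointer case) must recover inside the signal handler and redirect execution to the exceptional destination itself; either strategy satisfies the same IR semantics.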
But Mu does not define what happens when CTRL-C is pressed.
## Signals, VMs and operating systems
[This article from IBM](http://public.dhe.ibm.com/software/dw/java/i-signalhandling-pdf.pdf) describes what happens when the user presses CTRL-C on different operating systems. Obviously the behaviours vary greatly.
Quote:
- On z/OS and AIX: A single thread, chosen by the operating system, receives the signal.
- Linux: All threads receive the signal, and the signal handler is invoked on each thread. Linux threads are just separate processes that share the same address space, so it is also possible for another application to raise a signal on a specific thread.
- Windows: A new thread is created for executing the signal handler. This thread dies once the signal handler is complete.
JVM does not provide any official mechanism to handle signals, probably because JVM is not UNIX-specific, either. In fact, [Jython cannot handle CTRL-C like CPython does](http://bugs.jython.org/issue1270): Jython immediately terminates the Python program rather than raising a catchable Python exception.
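For contrast, the CPython behaviour that Jython lacks can be shown in a few lines (a sketch assuming a POSIX system, where `os.kill` can deliver SIGINT to the current process): the default SIGINT handler turns the signal into a catchable `KeyboardInterrupt` exception.

```python
import os
import signal

def simulate_ctrl_c():
    """Send SIGINT to ourselves, as pressing CTRL-C would."""
    try:
        # CPython's default SIGINT handler raises KeyboardInterrupt
        # at the next bytecode boundary, inside this try block.
        os.kill(os.getpid(), signal.SIGINT)
        return "not caught"
    except KeyboardInterrupt:
        return "caught KeyboardInterrupt"

print(simulate_ctrl_c())  # → caught KeyboardInterrupt
```

This is exactly the kind of language-level behaviour a client would want to build on top of whatever signal interface the Mu implementation exposes.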
The closest public Java API is `Runtime.addShutdownHook`. The hook will be called when the VM is shutting down, including when CTRL-C is pressed. But this mechanism cannot prevent the shutdown sequence from happening.
There is a proprietary interface: [sun.misc.Signal](http://www.docjar.com/docs/api/sun/misc/Signal.html). The documentation says the signal is handled in a new Java Thread running at MAX_PRIORITY. But `sun.misc.*` is private to the JVM implementation, and thus cannot be depended on.
# What can Mu implementations do?
For synchronous signals, such as `SIGPIPE`, the client should provide wrappers so that `write` does not raise `SIGPIPE`, but instead returns a special error code or throws a Mu exception. In this way, the client bypasses the potential signal. It looks like simply masking this signal, or setting the file descriptor to not raise SIGPIPE, will do the job.
For asynchronous signals, a Mu implementation can either:
1. Do nothing. SIGINT, ... will simply kill the process. Or,
2. Provide platform-specific interfaces to the client.
The first option is the easiest if we just want a running micro VM.
The second option is where the Mu implementation writers (such as [mu-impl-fast](https://gitlab.anu.edu.au/mu/mu-impl-fast)) can demonstrate their creativity. It will be like doing a mental exercise of "How will you design the Java API so that it is easy to write command line tools (such as `cat` and `grep`) in Java?"
One method I can think of is to provide a global event queue. The client should provide a thread that polls this queue in the background and takes appropriate actions (such as sending messages to other Mu threads, or interrupting them by setting shared variables, using watchpoints or futexes).
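The event-queue idea above can be sketched like this. Everything here is hypothetical (there is no such Mu interface today): a background client thread polls a queue of VM-posted events and reacts to a fake `SIGINT` by setting a shared flag that Mu threads would periodically check.

```python
import queue
import threading

def run_client_poller():
    """Start a poller thread, feed it one fake SIGINT event, shut it down."""
    events = queue.Queue()           # the hypothetical "global event queue"
    interrupted = threading.Event()  # shared flag that Mu threads would poll

    def poller():
        while True:
            ev = events.get()        # blocks until the VM posts an event
            if ev == "SIGINT":
                interrupted.set()    # interrupt Mu threads via a shared variable
            elif ev == "shutdown":
                return

    t = threading.Thread(target=poller)
    t.start()
    events.put("SIGINT")             # e.g. posted by the VM's real signal handler
    events.put("shutdown")
    t.join()
    return interrupted.is_set()

print(run_client_poller())  # → True
```

The design choice here is that the VM only *enqueues* events; all policy (kill a thread, raise a language-level exception, ignore) stays in the client, keeping the micro VM minimal.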
If we find one interface particularly favourable, we may consider *standardising* the extensions for *all Mu micro VMs on UNIX*.
# Higher-level view
"Signal" is a 1980s-1990s UNIX idea. Before 1995, POSIX does not have "threads", so there were one thread per process. Signals are sent to processes, and then they are handled by **the only thread** in the process. At that time, signals were probably a very straightforward message-passing mechanism. The **stack layout is exposed** to the C programmer via the parameter to the signal handler, probably because at that time, those who handle signals are probably system experts.
But things changed when the **multi-threading** model came into being, and **VMs made the stacks opaque**. This makes us ask: **what really are the endpoints of signal communication?** It looks like the old signal model is no longer perfectly suited to the multi-threaded world, but the *command-line interface was designed in the old days* and has not fully adapted to the new world yet. I am looking forward to seeing how future programming languages and VMs change the way people write command-line programs.