general-issue-tracker issueshttps://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues2018-06-28T23:11:34+10:00https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/53Access to native thread-local memory2018-06-28T23:11:34+10:00John ZhangAccess to native thread-local memory*Created by: wks*
This issue is about accessing thread-local memory/variables defined in native programs (C). One important application is the `errno` variable in C/C++.
This is only slightly related to #52 which introduces thread-lo...*Created by: wks*
This issue is about accessing thread-local memory/variables defined in native programs (C). One important application is the `errno` variable in C/C++.
This is only slightly related to #52 which introduces thread-local storage to Mu itself. There is no intention to force Mu's thread-local storage use the same mechanism as native programs.
Thread-local storage in native programs is highly machine/OS/ABI-dependent. The register used to point to thread-local buffers varies, and maybe not all platform have such register.
One possible workaround could be depending on helper functions written in C or assembly.
But if we want Mu to integrate deeper with native programs (i.e. do things more efficiently), we can define more instructions (probably "common instructions") to give Mu more capabilities, such as getting/setting the value of the FS register. But any such instructions would likely be platform-dependent and probably optional for unsuitable platforms.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/52Thread-local storage2018-06-28T23:11:34+10:00John ZhangThread-local storage*Created by: wks*
Add **thread-local** memory to Mu, in addition to the existing *heap*, *stack* and *global* memory.
[Proposal 1](https://github.com/microvm/microvm-meta/issues/52#issuecomment-213364592): the C-like approach, has kn...*Created by: wks*
Add **thread-local** memory to Mu, in addition to the existing *heap*, *stack* and *global* memory.
[Proposal 1](https://github.com/microvm/microvm-meta/issues/52#issuecomment-213364592): the C-like approach, has known problems
[Proposal 2](https://github.com/microvm/microvm-meta/issues/52#issuecomment-213375674) (preferred): a more aggressive designhttps://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/51WebKit's B3 JIT compiler2016-02-19T14:27:53+11:00John ZhangWebKit's B3 JIT compiler*Created by: wks*
The B3 JIT compiler has received much attention recently.
I started a Wiki page: https://github.com/microvm/microvm-meta/wiki/B3-JIT-%28WebKit%29
Let's summarise B3 and its influences to Mu in the Wiki.
*Created by: wks*
The B3 JIT compiler has received much attention recently.
I started a Wiki page: https://github.com/microvm/microvm-meta/wiki/B3-JIT-%28WebKit%29
Let's summarise B3 and its influences to Mu in the Wiki.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/50IR construction API for the convenience of verification2016-06-17T15:24:01+10:00John ZhangIR construction API for the convenience of verification*Created by: wks*
# Problem
The only way for the current API to transfer an IR bundle from a client to a Mu micro VM is via the `load_bundle` API (for example, `microvm.load_bundle(""".typedef @i32 = int<32>\n""")`, or use a binary f...*Created by: wks*
# Problem
The only way for the current API to transfer an IR bundle from a client to a Mu micro VM is via the `load_bundle` API (for example, `microvm.load_bundle(""".typedef @i32 = int<32>\n""")`, or use a binary format), which passes a serialised IR format (text or binary). JVM takes this approach. Its serialisation format is "Java bytecode".
An alternative way to deliver a bundle is to let the client construct each node (type, function, basic block, instruction, ...) in the bundle by calling an API function (for example, `handle = microvm.make_instruction("ADD", i32, op1, op2)`) which returns a handle to the node, then passing this handle to the Mu micro VM. LLVM uses a similar approach: it provides constructors to each node class, and provides a "builder" to conveniently build a CFG.
Since the construction of the IR should be off-loaded from the micro VM, there should be a client-side library (we call it **libmu**) which constructs the IR for the client and provides an API to the client. `libmu` itself talks with the micro VM through implementation-dependent "private" APIs.
It is not clear whether **serialisation** or **construction by calling** is "better" because "better" has many definitions, but *the calling-based API must exist* because it is reported that it is very difficult to verify a parser.
## Comparing the two approaches
**serialisation**
* pros
+ language independent
+ The serialisation format can be standardised and (almost) all languages can access byte arrays.
+ faster than *construction by calling* when the client and the micro VM have different runtime environments
+ (e.g. Haskell to C, Java to C, ...)
+ FFI can introduce considerable cost. The Go lang impl performs a SWAP-STACK operation every time it calls C.
+ simple API
+ one API function `load_bundle` handles the entire IR format.
* cons
- difficult to verify (there are reports that it is very difficult to verify a parser)
- require a parser inside the micro VM:
- there may be a mixed approach to offload the parser outside the micro VM.
- not suitable for the end user, i.e. those who writes the high-level language compiler.
- Either the high-level compiler writer or some third party need to build a library for serialisation.
**construction by calling**
* pros
+ easy to verify
+ minimal for the micro VM
+ No parser in the micro VM is needed.
+ faster than *serialisation* when the client and the micro VM have similar runtime environments
+ (e.g. C to C, Java to Java, C to C++, Haskell to Haskell, ...)
+ The libmu library can be implemented in the same language as the client.
* cons
- bloated API
- There will be hundreds of API functions just to construct each IR node. All of them need to be verified.
- The internal states of libmu needs to be verified, but hopefully this can be easier than verifying the parser.
- paradigm impedance
- There is no single API that satisfies all languages. For example, if the API is defined in C, it may be unsuitable for a client written in Haskell. So ideally there should be a libmu for Haskell, whether that is part of the formally verified Mu/libmu pair or not.
# Details
## Cost of foreign function calls
Depending on the two languages calling each other, the cost can be trivial (such as C and C++) or huge. Java must go through JNI to call native functions. Go performs a swap-stack operation every time it calls C in order to work around blocking system calls in its M*N threading model. Other language implementations, such as Python, relies on `libffi`, which builds a native call by dynamically prepare its arguments, and the high-level language such as Python needs to convert C types to Python types and vice versa.
An experiment shows calling a simplest C function from Haskell introduces 30x overhead comparing to calling the same C function from another C function.
But from verification's point of view, the cost does not matter as long as it is spent in the client.
## Cost of serialisation
Serialisation is not free. In a simple experiment, serialising a CFG-like data structure to an intermediate format and then parse it in another module (both written in C) introduces a 10% overhead comparing to directly constructing the target CFG structure in the receiver while assuming the sender only holds opaque references. The major cost is memory allocation (where malloc is the bottleneck) and resolving the cross-references between nodes (where the hash map is the bottleneck).
Serialisation is not free, but is reasonably cheap. When foreign function call is expensive, serialisation can be used as an alternative.
## Not calling across languages
Since calling across languages is expensive, it is desirable to implement part of the libmu in the same (or similar) language the client is written in, and let libmu construct the data structure that is native to the micro VM.
For example, if the micro VM is written in C++, the libmu should construct a tree of C++ class instances as LLVM does.
Note that LLVM is designed and implemented in C++ and serves C/C++. It is not a problem if the only official API it provides is C++. There is a C binding, too, but it is trivial. However, Mu has a specification which allows multiple implementations. In this case, the micro VM core would not always be in C/C++ or any particular language.
However, if the micro VM is written in a managed language, such as RJava or Java, then it will be interesting:
* If both the client, libmu and the micro VM happen to share the same (or similar runtime), then the API calls can be cheap. The ideal case is being "metacircular", i.e. both the client and libmu run on the same micro VM as the micro VM itself. The cost is minimal.
* If libmu is written in C and provides the C API, then there is a semantic mismatching between the micro VM and libmu: *somewhere in libmu* must cross the line between two different runtimes, which introduces a FFI-like overhead, the cost of which depends on the concrete languages. Holding opaque references of objects in Mu requires such object to be pinned, or being held by some containers (like the `MuCtx` structure in the current API, which is not light-weight).
* If the client and libmu uses a different runtime as the micro VM ("Haskell on libmu on micro VM" is this case), it will pay for two levels of cross-runtime calls.
# Other concerns in design
## Does libmu need to be minimal?
Maybe. At least the verified libmu needs to be minimal.
There may be even higher-level libraries outside above in the client's world outside libmu. Those libraries are not minimal.
## How many languages should be supported?
Ideally the libmu should have both intimate knowledge of a particular Mu implementation, and intimate knowledge of the language the client is written in. In theory, there can be one (or more) libmu for each (micro VM impl, client language) pair.
Since C is so popular, we will define a C API for libmu. In theory,
1. there can be more than one C APIs
2. there can be APIs for other languages and they may or may not look like the C API. (preferably not, to avoid paradigm impedance)
## Mu CFG and the client CFG (or AST)
**How much should the client use the Mu IR CFG?** i.e. should the client construct Mu IR nodes and do transformations on it as LLVM does? Probably not.
LLVM is designed to be maximum: it attempts to be maximum and its CFG contains much information for optimisation, such as the "nsw" and "nuw" flag on the "add" and the "sub" instructions.
But the Mu IR is designed to be minimal and is only designed for the micro VM to consume. It does not contain much information that benefits the client.
It is possible that a client-side library performs IR transformations, but it is doubtful whether that IR is the same Mu IR. Many optimisations, such as whether `x+1 > x` is always true, depends on extra information (such as the "nsw" flag in LLVM) which the Mu IR do not provide.
I @wks believe the Mu IR is only generated as the last step of the client-side transformation, i.e. the next step is to deliver it into the micro VM.
# Towards the new API
## Micro-VM-to-client API
The existing controlling API does not need to be changed.
The bundle loading API can be removed. i.e. "how to load bundles into the micro VM" is implementation-dependent, which actually mean **libmu-dependent**.
## libmu-to-client API
This API needs to be carefully designed because it is part of the formal verification.
There should be a model of the internal states of libmu which includes:
* The set of handles which map to Mu IR nodes.
* The state of each Mu IR node.
* Isolation between threads.
During construction, each node can hold incomplete information at a given time. The Mu IR nodes may circularly refer to each other (currently they refer to each other via IDs), so it is desirable to allocate several nodes and then link them with each other.
If multiple threads can use the same libmu, the transition in its internal state must be properly handled.
Cares must be taken to select the minimum set of API functions, because the current Mu IR has 18 types and 37 instructions. The number of API functions may easily bloat to 100+ if too many CRUD commands are added.
## Micro VM-to-libmu interaction
This interface does not need to be public, but the proper handling of data structures in the micro VM is important. This interface is part of the verification.
In all cases, the choice of languages does matter. Properly chosen languages for the client and the micro VM will result in high performance and verifiability.
# Future languages
This section contains my @wks personal opinions. These affects my opinions on this API, too.
In the future, popular system programming languages will generally be higher-level than current languages (such as C). We already showed that a high-level language, such as RJava and Java, can produce high-quality language runtimes, such as JikesRVM.
Instead of relying on C, high-level system programming languages will gain direct control over low-level operations, such as raw memory access (pointers). These low-level stuff can be done even though the language itself still has very high-level features, such as garbage collection and object-oriented programming. It is possible (in my opinion at least) that eventually such high-level language can replace libc and directly interface with the kernel, thus eliminates the necessity of C except for some very rare cases.
This trend is already visible in several languages. C# already has unsafe pointers and unsafe native interface (P/Invoke). The Java API is mostly implemented in Java, unlike the old days when the standard Java library is mostly implemented in C++. There is also a [JEP to add unsafe native interface in Java](http://openjdk.java.net/jeps/191), which still has not become mainstream yet. However, OpenJDK already exposes some low-level stuff though the `sun.misc.Unsafe` class. RJava obviously used magic to gain low-level support. In Ruby, [ruby-ffi](https://github.com/ffi/ffi) is recommended over directly writing C modules.
With C being redundant, the high-level language (or runtime) may be optimised for internal interoperation (e.g. Java-to-Java calls will be faster and faster) at the expense of the interoperability with C (e.g. even unsafe native call may be costly. Object pinning or opaque handles are required for native programs, which would only run briefly.).
Since raw memory access will be faster while foreign function call will be slower, serialisation may have an advantage over calling (my experiment already shows this for Java and C). But this does not rule out the calling-based API because it may not be foreign calls, in which case it is still faster than serialisation. Ideally, in the meta-circular setting, if both Mu, libmu and the client are in Mu, then Mu-to-Mu function calls are virtually free.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/49Stack frame iterator2015-11-13T14:34:31+11:00John ZhangStack frame iterator*Created by: wks*
# Problem
The current stack introspection/OSR API is inefficient.
It selects a frame by a number. For example:
```
ctx->cur_func(ctx, stack, 1);
ctx->cur_func_ver(ctx, stack, 2);
ctx->cur_inst(ctx, stack, 3...*Created by: wks*
# Problem
The current stack introspection/OSR API is inefficient.
It selects a frame by a number. For example:
```
ctx->cur_func(ctx, stack, 1);
ctx->cur_func_ver(ctx, stack, 2);
ctx->cur_inst(ctx, stack, 3);
ctx->dump_keepalives(ctx, stack, 4, &values);
```
A real-world implementation may need to unwind the stack from the top, one frame at a time, until reached the frame. If the client needs to traverse through stacks of many frames, the O(n^2) complexity may be a performance bottleneck. One application is to use the Client API (or the equivalent Mu IR (common) instructions) to generate the stack trace for exception handling.
This link shows a real-world Java stack trace: https://ptrthomas.wordpress.com/2006/06/06/java-call-stack-from-http-upto-jdbc-as-a-picture/ Or click here: [jtrac-callstack.pdf](https://github.com/microvm/microvm-meta/files/24885/jtrac-callstack.pdf)
EDIT: well, this is a call graph, not a stack trace. But imagine something goes wrong in JDBC...
# Desired API
The API should provide a "frame cursor" data type, which refers to a frame in a stack. It can be generated for a stack, and iterate through its frames from top to bottom.
The introspection API `cur_func`, `cur_func_ver`, `cur_inst` and `dump_keepalives` will work on this "cursor" instead of a stack and a number.
The OSR API `pop_frame`, can, instead of popping one frame at a time, pop all frames above a particular "cursor". [Preliminary experiments](https://github.com/microvm/liblushan/blob/master/src/test_remote_stack_chop.c) show that this is possible with C programs and [libunwind](http://www.nongnu.org/libunwind/).
## The "frame cursor" type
The "frame cursor" type shall be an opaque reference to ~~a Mu frame~~ a cursor. The cursor holds the context of a "current frame", and can move down to the parent frame.
It must be platform-independent.
It could potentially be large (given the number of registers in a CPU). Therefore it is desirable to be **mutable** – making a fresh copy for each frame would be costly (Haskell programmers may disagree).
There are some subtle interactions between it and the GC. GC may modify references on the stack, but the API must hide this detail from the client. So the API should not expose raw CPU
The cursor may be allocated on the Mu heap, but also may not.
The cursor is only valid when the stack is still unbound. As soon as the stack is bound again, the stack may change in any ways and the cursor is invalidated.
So I can think of some possible solutions:
1. ~~Create a new type `frameref`, like our existing `threadref` and `stackref`.~~ Create a new type `framecursor`. It has reference semantics: it refers to a mutable structure internally held by the Mu VM.
* pro: A dedicated opaque type, the cleanest model.
* con: A new primitive type, pointing to a large structure, just for introspection? Well... maybe not that bad.
* choices: Is it managed by the GC? GC is the easiest way, but we may not be able to print stack trace for OutOfMemoryException. (really? I am not sure) Alternatively it may be required to be closed explicitly.
2. Use `ref<void>` for the "cursor" type. Its content is allocated on the heap, opaque to the client, and may be platform-specific. When invalidated, the object remains live, but the content becomes invalid.
* pro: No new types introduced
* con: This implies the data content is on the Mu heap.
* con: The GC must have special knowledge of such a heap object, which is not a regular Mu object.
3. Use `ptr<void>`. Similar to `ref<void>`, but implies it is not GC-ed.
* pro, con: same as `ref<void>`
## Example Mu API
This example prints a stack trace on Mu.
```c
// This trap handler prints the stack trace.
void stack_printing_trap_handler(
MuCtx *ctx, // Equivalent to JNIEnv
MuThreadRefValue thread, // The current thread
MuStackRefValue stack, // The current stack
int wpid, // Watchpoint ID. 0 for ordinary traps.
MuTrapHandlerResult *result, // How the Mu thread should resume?
MuStackRefValue *new_stack, // Which stack shall the Mu thread bind to? Usually the old stack.
MuValue *values, int *nvalues, // What values shall be passed on the stack?
MuRefValue *exception, // What exception shall be thrown on that stack?
MuCPtr userdata) { // Client-specific data
ClientCompiler *clientCompiler = (ClientCompiler*) userdata; // The client-specific compiler.
// Get a cursor to the top of the stack.
MuFrameCursorValue *cursor = ctx->get_stack_cursor(ctx, stack);
// Iterate through.
int func_id;
while((func_id = ctx->cur_func(ctx, cursor)) != ID_OF_MY_STACK_BOTTOM_FUNC) {
if (func_id == 0) { // func_id == 0 means the frame is a native frame.
printf("This frame is native");
} else { // It is a Mu frame.
// Get the ID of the current Mu instruction.
int inst_id = ctx->cur_inst(ctx, cursor);
// The client looks up the source-level information.
SourcePosition sp = clientCompiler->getSourcePosition(inst_id);
printf("File: %s, Function: %s, Line: %d, Column: %d: %d\n",
sp.file, sp.func, sp.line, sp.column);
}
}
printf("End of stack trace\n");
// Close the cursor. (Alternatively let the GC close the cursor.)
ctx->close_cursor(ctx, cursor);
// We want to return to the old stack and continue normally,
*new_stack = stack;
// but do not pass any values.
*nvalues = 0;
// Continue normally (not throwing exception).
return MU_REBIND_PASS_VALUES. // passing 0 values
}
```
# Existing approaches
[libunwind](http://www.nongnu.org/libunwind/) is a portable way to walk stack frames in the C language. There are different implementations on different platforms (OSX has its own implementation), but the API is the same.
`unw_getcontext` creates a `unw_ucontext_t` structure for the current stack. `unw_init_local` creates a `unw_cursor_t` on the context. Then the user can call `unw_step` on the cursor to step through stack frames. `unw_get_reg` gets the value of a machine register from a cursor. The cursor keeps the state of registers (usually it is only able to recover callee-saved registers) at the resumption points (return addreses) of frames.
Example:
```c
#define UNW_LOCAL_ONLY
#include <libunwind.h>
void show_backtrace (void) {
unw_cursor_t cursor; unw_context_t uc;
unw_word_t ip, sp;
unw_getcontext(&uc);
unw_init_local(&cursor, &uc);
while (unw_step(&cursor) > 0) {
unw_get_reg(&cursor, UNW_REG_IP, &ip);
unw_get_reg(&cursor, UNW_REG_SP, &sp);
printf ("ip = %lx, sp = %lx\n", (long) ip, (long) sp);
}
}
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/48Mu IR rewriting library2015-10-29T16:47:55+11:00John ZhangMu IR rewriting library*Created by: wks*
Mu aims to be minimal, but such minimalism has made the construction of Mu clients hard. A client-level library can ease the client's job by accepting a slightly higher-level variant of the Mu IR and translating that h...*Created by: wks*
Mu aims to be minimal, but such minimalism has made the construction of Mu clients hard. A client-level library can ease the client's job by accepting a slightly higher-level variant of the Mu IR and translating that higher-level IR to actual Mu IR code (and/or HAIL scripts and/or subsequent API calls).
This issue tracks tasks that should be done at this layer.
* Pre-SSA to SSA converter. #44
- Writing Mu IR in the SSA form is hard, and the goto-with-values form is even harder. The library should automatically convert ordinary CFGs into the goto-with-values form using well-known algorithms.
* Platform-dependent constant values. #47
- Some ahead-of-time clients (notably C or other "traditional" languages) exposes platform details to the programmer as compile-time constants, but binding those values too early will make the object code non-portable. The rewriter should help the client determine these values so that the client compiler can be strictly ahead-of-time.
* Merge Mu IR and HAIL: #29 #46
- The library is not minimal. Integrating both languages will make the client's job easier.
* Annotations
- This will allow clients to attach arbitrary information to the Mu IR code, which can help the client introspect the program at run time.
- Note that if we need to use system debuggers (such as GDB), then these annotations need to go through the micro VM itself because it is the micro VM's responsibility to generate object codes (including the DWARF debug info).
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/47Sizeof?2016-06-21T13:56:11+10:00John ZhangSizeof?*Created by: eliotmoss*
We have encountered an interesting issue in developing the C client, namely how to deal with union types. Our thought was to define a separate struct type for each union variant, and then to cast to the appropri...*Created by: eliotmoss*
We have encountered an interesting issue in developing the C client, namely how to deal with union types. Our thought was to define a separate struct type for each union variant, and then to cast to the appropriate struct type when accessing a particular variant. (Note that this requires structs to be heap or alloca allocated, which I think is ok -- C does not treat them as single values that can go into a register, etc., as I recall.)
The problem we have is that because Mu defines the detailed layout of a struct on a given target, we cannot determine the sizes of the structs, and thus we cannot determine the maximum size, something we need in order to allocate an instance of a union type.
We observe that Mu gives no way to as the size of a type (or to get the offset of a field in a struct or an element of an array). While such information may not be used for typical accesses, we now see that it has at least one important use case. Given that C programs are typical way-ahead-of-time compiled, we do not consider it appropriate to generate Mu for C code only at the last minute.
We suggest that Mu provide means to determine sizes and perhaps to do simple load-time (if that is the right word) computations over these constants. Here is some possible syntax (admitting that I have not thought about it long or deeply yet):
.sizeof **name** **type**
Define **name** to be the constant that is the number of bytes needed for **type**.
.sizeof **name** **op** **t1** **t2** ... **tn**
Define **name** to be the sizes of **t1** through **tn** combined with operator **op**, where **op** can be at least **max** and **sum**.
Alternatively, we could define names for the sizes of each type, and a more general constant-computing form:
.define **name** **op** **e1** ... **en**
This would define **name** to be **op** applied to the **ei**. We could provide a suitable range of operators.
For offsets we could have:
.offset **name** **struct or array type** **idx**
This would define **name** to be the constant giving the offset of the **idx**'th field/element of the given struct or array type.
The point is to allow target-dependent computations over constants to be written in a target independent (symbolic) way. I believe this would meet the needs of C.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/46Initialized aggregate values / objects2018-06-28T23:11:31+10:00John ZhangInitialized aggregate values / objects*Created by: eliotmoss*
For the C client, and more particularly for the Java client, we would like to be able to describe an initialized aggregate value. The obvious case is string literals for C. In Java we would like to be able to d...*Created by: eliotmoss*
For the C client, and more particularly for the Java client, we would like to be able to describe an initialized aggregate value. The obvious case is string literals for C. In Java we would like to be able to describe a Java char array for these. A Java array will be a HYBRID object. Not having these means unpleasant work-arounds involving element by element initialization by code that we have to arrange to have invoked before the objects could possibly be used.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/45Multiple return values2018-06-28T23:11:32+10:00John ZhangMultiple return values*Created by: wks*
*As for 2016, this change is already merged in both the spec and the refimpl.*
This issue discusses the design choices of allowing functions and Mu instructions to return more than one value (or a tuple of values)...*Created by: wks*
*As for 2016, this change is already merged in both the spec and the refimpl.*
This issue discusses the design choices of allowing functions and Mu instructions to return more than one value (or a tuple of values).
# Status quo
The one-return-value assumption has **a chain of causes and effects** to the design of the IR.
First of all, Mu [**functions**](https://github.com/microvm/mu-spec-goto/blob/master/uvm-ir.rest#function-signature-definition) have multiple parameters and exactly one return value. i.e. `f : (P1,P2,P3,...,Pm) -> R` This is the tradition of C-like languages as well as LLVM. In case it does not return useful values, it returns `void`
Following this tradition, all Mu **instructions** return exactly one value, too. e.g. `%rv = CALL <...> @some_func (%a1 %a2 ... %am)`. In case more than one values need to be returned, a function shall return a `struct<T1 T2 ... Tm>`. In case no useful value is returned, it returns `void`. For example:
```
%result = CMPXCHG WEAK SEQ_CST SEQ_CST <T> %loc %expected %desired
// %result is a struct<T @i1>. The second result indicates if it is successful.
%oldval = EXTRACTFIELD <@cmpxchg_result_T 0> %result
%is_successful = EXTRACTFIELD <@cmpxchg_result_T 1> %result
```
Since all Mu instructions have a single return value, the [SWAPSTACK](https://github.com/microvm/mu-spec-goto/blob/master/instruction-set.rest#swapstack-instruction) instruction shall have exactly one return value, too. That return value is the value received from another stack (or injected by the client). Since one stack also passes value using the `SWAPSTACK` instruction, **SWAPSTACK takes one parameter**, too, since *the peer only receives exactly one return value*. e.g. in the sender: `SWAPSTACK %receiver ... PASS_VALUE <@T> %val`; in the receiver: `%retval = SWAPSTACK %sender RET_WITH <@T> ...`
Since a swapped-away stack always stops at so called "**resumption point**" (resumable by swapstack) instructions (CALL, TRAP, WATCHPOINT, SWAPSTACK, CCALL), all of which receive exactly one return value, then the **state of a stack** is `READY<T>` when swapped away, where T is a single type -- the return type of the *resumption point*. This makes one important special case: **a newly-created frame** (by [NEWSTACK](https://github.com/microvm/mu-spec-goto/blob/master/instruction-set.rest#newstack-instruction) or [push_frame](https://github.com/microvm/mu-spec-goto/blob/master/uvm-client-interface.rest#on-stack-replacement) (OSR)). Currently since a function takes multiple parameters but SWAPSTACK passes only one argument, there is no way to supply all arguments during SWAPSTACK. As a compromise, NEWSTACK initialises all arguments and the subsequent SWAPSTACK passes `void`.
As a side effect, **the return value uniquely identifies the instruction**. This can bring some profits: during OSR, the variable that holds the return value can identify the OSR-point (instructions that may be the "current instruction" in a `READY<T>` stack) instructions (CALL, TRAP, WATCHPOINT, SWAPSTACK, CCALL, all happen to be *resumption points*. This is actually intentionally designed as so.). This is especially useful for **the TRAP instruction**, in which case the trap handler always first identify which TRAP is triggered. Because of the one-to-one mapping between instructions and return values, the term "SSA variable of an instruction" and "the instruction" are used interchangeably.
# Proposed change
The change starts with adopting "multiple return values".
First, **functions now return 0 or more values**. i.e. `f : (P1,P2,P3,..,Pm) -> (R1,R2,...,Rn)`. For compatibility with C programs (in the unsafe native interface) , a 0-tuple is returned when the C function returns `void`. i.e. either `c_function: (P1,P2,...,Pm) -> (R1)` or `c_function: (P1,P2,...,Pm) -> ()`.
Then **Mu instructions may return 0 or more values**. When no useful values are returned, the instruction returns `()`. The `CALL` instruction has exactly as many return values as the callee's return values. For example:
* `(%oldval %succ) = CMPXCHG WEAK SEQ_CST SEQ_CST %loc %expected %desired`:
* `(%rv1 %rv2 %rv3) = CALL <..> @func (%a1 %a2 ...)`
* `RET %rv1 %rv2 %rv3`
* `() = BRANCH %blah(....)` or just `BRANCH %blah(...)`
Then **SWAPSTACK can pass more than one values to the swappee**. The swappee expects to receive a list of values (statically decided at the particular SWAPSTACK instruction). If the swapper does not match the value required by swappee, it has undefined behaviour. For example:
```
// receiver:
(%rv1 %rv2 %rv3) = SWAPSTACK %sender_stack RET_WITH <@T1 @T2 @T3> PASS_VALUE <> ()
// sender:
() = SWAPSTACK %receiver_stack RET_WITH <> PASS_VALUES <@T1 @T2 @T3> (%v1 %v2 %v3)
// bad sender:
() = SWAPSTACK %receiver_stack RET_WITH <> PASS_VALUES <@T1 @T2> (%v1 %v2) // ERROR: too few values
```
Since all *resumption point* instructions (CALL, TRAP, WATCHPOINT, SWAPSTACK, CCALL) may receive multiple values (in fact, all instructions can), **the state of a "swapped-away" stack is `READY<T1 T2 ... Tn>`**. When binding a thread to a stack, exactly that many values need to be bound.
Then **the state of a newly created frame is `READY<T1 T2 ... Tn>`** where T1 T2 ... Tn are the parameter types of the function (also the entry block). **NEWSTACK no longer supply arguments**, but **the first SWAPSTACK supplies the parameters to the new frame**. For example:
```
.funcsig @sig = (@T1 @T2 ... @Tm) -> (@R1 @R2 ... @Rn)
.funcdef @f VERSION %v1 <@sig> {
%entry(<@T1> %p1 <@T2> %p2 .. <@Tm> %pm):
...
}
// in another function:
%new_stack = NEWSTACK <@sig> @a
// later
SWAPSTACK %new_stack RET_WITH <...> (...) PASS_VALUES <@T1 @T2 ... @Tm> (%v1 %v2 ... %vm)
```
Now **not all instructions are identifiable**. If an instruction does not return value, e.g. a trap that does not expect values from the client (which is the most common case): `TRAP <> KEEPALIVE(...)`, then there is no identifiers for these instructions. As a compromise, explicit names can be given to instructions using a slightly different syntax:
```
() = [%the_important_trap] TRAP <> KEEPALIVE (%local %vars)
// or simply
[%the_important_trap] TRAP <> KEEPALIVE (%local %vars)
// it also works if there are return values:
(%rv1 %rv2) = [%the_important_call_site] CALL <@sig> @func (%arg1 %arg2)
```
And **SSA variables** and **instructions** completely divorce. Now an instruction has **a list of results**, each of which is an SSA variable. The new SSA variable hierarchy is:
- SSA variable
- global variable
- constant, global cell, function, exposed function
- local variable
- parameter
- one result of an instruction (was just "instruction")
This change also undoes https://github.com/microvm/microvm-meta/issues/43, a previous proposal to make void more like "unit". Now we have the real "unit" return value: `()`, and the `void` type has only two use cases:
* `hybrid<void T>` for hybrids without a fixed part, and
* `ref<void>`, `iref<void>`, `weakref<void>` and `ptr<void>`, for referencess/pointers to "anything".
# Coroutine Abstractions in High-level Languages
## Similar to the old Mu model
Python's generator is similar to the old model. All arguments are supplied when the generator is created. The first `g.send()` must send `None`.
```python
def gen(a,b,c):
print("Hello!", a, b, c)
d = (yield)
print("d = ", d)
return
g = gen(1,2,3) # initialises a, b and c
g.send(None) # must be None, otherwise TypeError: can't send non-None value to a just-started generator
# prints Hello 1 2 3
try:
g.send(4) # received by d. The generator throws StopIteration to the main coroutine.
# prints d = 4
except StopIteration as e:
print("bye")
```
If the new model is desired, the program can be rewritten as:
```python
def gen():
def actual_gen():
a,b,c = (yield)
print("Hello!", a, b, c)
d = (yield)
print("d = ", d)
return
ag = actual_gen()
ag.send(None)
# now ag stops at the first yield
return ag
g = gen() # just creates the actual_gen
g.send((1,2,3)) # received by a,b,c at the first yield. a,b,c are not really parameters.
# prints Hello 1 2 3
try:
g.send(4) # received by d. The generator throws StopIteration to the main coroutine.
# prints d = 4
except StopIteration as e:
print("bye")
```
## Similar to the new model
Lua's coroutine model is similar to the new model. Arguments are supplied at the first `coroutine.resume`.
```lua
function gen(a,b,c)
print("Hello", a, b, c)
local d = coroutine.yield()
print("d = ", d)
return
end
co = coroutine.create(gen)
coroutine.resume(co, 1,2,3) -- initialises a, b, c. If less arguments are supplied, other parameters receive nil
-- prints Hello 1 2 3
coroutine.resume(co, 4) -- received by d
-- prints d = 4
```
If the old model is desired, the code can be rewritten as:
```lua
function make_gen(a,b,c)
local function actual_gen(a, b, c)
coroutine.yield()
print("Hello", a, b, c)
local d = coroutine.yield()
print("d = ", d)
return
end
local coro = coroutine.create(actual_gen)
coroutine.resume(coro, a,b,c) -- immediately initialises a,b,c. They are not parameters of actual_gen, but "up-values" of it.
return coro
end
co = make_gen(1,2,3)
coroutine.resume(co) -- It is stopping on its first empty yield() point
-- prints Hello 1 2 3
coroutine.resume(co, 4) -- received by d
-- prints d = 4
```
In conclusion, both models of swapstack are equivalent. It just takes a few initial swapstack operations to mimic each other.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/44Pre-SSA form2016-06-17T15:23:52+10:00John ZhangPre-SSA form*Created by: eliotmoss*
We have concluded that while the official form of Mu IR is SSA form (but see Issue #18 for current thoughts on how to represent that form), many clients will find it more convenient to generate something that is ...*Created by: eliotmoss*
We have concluded that while the official form of Mu IR is SSA form (but see Issue #18 for current thoughts on how to represent that form), many clients will find it more convenient to generate something that is mostly Mu IR but that is not in SSA form, and that is it further desirable to offer a standard tool to convert from some "pre-SSA" form to proper SSA form. This tool may operate in a stand alone manner or be more in bed with an implementation of Mu.
We propose the following specific pre-SSA form, according to how it differs from SSA-form Mu.
1. "SSA-variables" may be assigned more than once; however, any individual such variable must be used in a type-consistent manner.
1. PHIs may be omitted (or, in the proposal of #18, values may be omitted at branches and variables omitted at labels)
1. For convenience we introduce a "copy" operator, var = ID <T> arg, which takes one argument arg of type T and assigns it to variable var. This operator seems to be convenient sometimes from a client perspective.
The converter to SSA-form will perform live-ness analysis and add variables to labels and values to branches as necessary, checking for type consistency. If some variable is live but not initialized, then the converter will insert a safe initialization (to 0 or 0.0 for numeric types, null for a pointer, etc.) at the latest possible point that does not interfere with existing assignments to the variable. (Optimization may move the initialization earlier as deemed appropriate.)
We will undertake to develop the converter in Scala or Java.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/43Reduce special cases involving the void type2016-06-17T15:23:50+10:00John ZhangReduce special cases involving the void type*Created by: wks*
# The current status
The `void` type is a special type in the Mu type system. It has no value, and thus many instructions/mechanisms have special cases for the `void` type.
**Instructions that have special cases ...*Created by: wks*
# The current status
The `void` type is a special type in the Mu type system. It has no value, and thus many instructions/mechanisms have special cases for the `void` type.
**Instructions that have special cases for `void`**:
- `RET` and `RETVOID`: Since `void` has no value (In fact it does. The return value of the `BRANCH` instruction, for example, is a value of the `void` type.), we needed a special syntax to return `void`, thus we have `RETVOID`.
- The "new-stack clause" of the `SWAPSTACK` instruction: `PASS_VALUE <T> %val` and `PASS_VOID`: for the same reason why we have `RET` and `RETVOID`.
**The trap handler has a special case for `void`**:
Just like `SWAPSTACK`, the trap handler may rebind the thread to a stack and either "pass a value" or "pass `void`" or "throw an exception".
## Other existing uses
**Instructions that always return `void`**: `BRAHCN`, `BRANCH2`, `SELECT`, `TAILCALL`, `RET`, `RETVOID`, `THROW`, `STORE`, `FENCE`, some common instructions: `@uvm.kill_stack`, `@uvm.thread_exit`, `@uvm.native.unpin`, `@uvm.native.unexpose`, `@uvm.meta.load_bundle`, `@uvm.meta.load_hail`, `@uvm.meta.pop_frame`, `@uvm.meta.push_frame`, `@uvm.meta.enable_watchpoint`, `@uvm.meta.disable_watchpoint`, `@uvm.meta.set_trap_handler`: These instructions do not return meaningful values.
**Instructions that may return `void` sometimes**: `CALL`, `TRAP`, `WATCHPOINT`, `CCALL`, `SWAP_STACK`: The callee, client, swappee, or whatever the other end of communication is, may not return meaningful values.
## Current properties of `void`
`void` can only be used in 3 cases:
1. As the type of allocation units that do not represent values. Hence it is usable as the referent type of reference types and pointer types. e.g. You can run `NEW <@void>`. Each time you NEW a void, you have a **new** empty object, not the same as any other.
2. As the fixed part of a hybrid to indicate the absence of the fixed part. e.g. `hybrid<void int<64>>` is a variable-length array of `int<64>`, without a fixed part.
3. As the type of instructions or the return type of functions that do not return values. e.g. the `BRANCH` instruction returns `void`.
Other properties:
- `void` has no value (in fact it does, as mentioned before)
- `void` is neither a scalar type nor a composite type.
- Only scalar types can be used for memory access: `LOAD`, `STORE`, ...
- Only composite types have other types as components: fields/elements
- `void` is nether storable nor loadable. It does not contain other parts. It cannot be part of a struct/array/vector. i.e. there is no "array of void". The "fixed part of a hybrid" is an exception.
- `void` is native-safe: It can be returned from native functions; and there can be `uptr<void>`.
# Proposed changes
**value of `void`**: Instead of "having no value", `void` now has exactly one value: NULL. This is consistent with Python: `NoneType` has only one value `None`.
**`void` constant**: We reuse the `NULL` literal to create a "void constant":
```c
.const @VOID <@void> = NULL // The only possible value of void.
// For the sake of consistency, we require the client to define it.
//
// Alternative: make it a pre-defined value, such as the @uvm.predef.void_t type
// and the @uvm.predef.VOID value. We could define @uvm.predef.i8, @uvm.predef.i16,
// @uvm.predef.i32, @uvm.predef.i64, @uvm.predef.float, @uvm.predef.double,
// @uvm.predef.ref_void, @uvm.predef.ref_i32..., @uvm.predef.but the choice seems too arbitrary.
```
All existing instructions that return `void` return this `NULL` value. In theory, the following snippet is valid, but stupid:
```c
%entry:
%x = BRANCH %bb1
%bb1:
RET <@void> %x // return void. Should have said RET <@void> @VOID
// or even "RET @VOID" omitting the type argument, because RET always returns the
// return type of the current function. ADD, SUB, MUL ... would have to infer the operand
// types if the operand type is not provided, but RET does not need to be inferred: the
// function return type is explicit.
```
**Remove the `RETVOID` instruction**: Use `RET <@void> @VOID` instead, or simply `RET @VOID`.
**Remove the `SWAPSTACK` clause `PASS_VOID`**: Use ``PASS_VAL <@void> @VOID`` instead. Unlike `RET`, the type parameter here is necessary: the type that the swappee expects is dynamic. It may expect a different type at a different `SWAPSTACK` site. Guessing the wrong type while swapping has undefined behaviour.
**Trap handlers no longer needs a PASS_VOID return case**: Instead, pass a `NULL` constant.
## New ways to use `void`
In addition to the existing three ways, i.e. empty objects, hybrid fixed part, empty return value, `void` can now be used in the following ways:
- In `RET` to return from a function of `void` return type.
- In `SWAPSTACK` to swap to a stack that does not expect to receive a value (it receives the `NULL` value of the `void` type).
- In the trap handler, rebind the stack which expect void.
They all fit into the category that "the other end of communication" does not pass a value.
## Things that should still be forbidden
**`void` must not be a parameter type**: I don't have a very compelling reason, but it is completely useless (only increases the apparent arity of a function).
**`void` must not be part of a struct/array/vector or the variable part of a hybrid**: Not allowing this will gain us a very nice property: each field/element in any struct/array/vector/varpart has a different offset. In `struct<@i32 void void void void @x>`, since `void` should have size 0 and alignment 1 (in the sense `void` can be allocated at any address *a* such that *a* % 1 == 0), void does occupy space. Then all of the void fields are at the same offset as `@x`. Another reason: C does not allow void to be a struct field.
**Empty structs (`struct<>`) should be forbidden**: For the same reason as `void` as a field. Just use `void` because it is so special. C forbids empty structs, too, but GCC allows it.
# How about LLVM?
LLVM IR has two syntax for the `ret` instruction:
- `ret <type> <value>` for example: `ret i32 100`
- `ret void` this returns void.
LLVM does not have "void constant", either, since `void` is not a "first class type".
LLVM `void` is not a "first class type". Only `void` and function types are not "first class type". LLVM has both "function" types and "pointer to function" types.
LLVM LangRef does not say parameter types cannot be `void`, but `void` is never used as parameter types. In C, `void` is an incomplete type, and thus cannot be a parameter type.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/42Redesign the undefined function handler API2018-06-28T23:11:32+10:00John ZhangRedesign the undefined function handler API*Created by: wks*
# Current status
Currently there are two handlers: *trap handler* and *undefined function handler*. They are different.
In trap handlers:
* Thread is temporarily unbound from the stack, and is rebound after tr...*Created by: wks*
# Current status
Currently there are two handlers: *trap handler* and *undefined function handler*. They are different.
In trap handlers:
* Thread is temporarily unbound from the stack, and is rebound after trap handling.
* Stack is unbound and is ready for introspection.
* OSR (pop frame, push frame) is possible.
In undefined function handlers:
* Thread is, conceptually, still executing the CALL instruction that calls the undefined function.
* The stack is in the RUNNING state, not available for introspection or OSR
* The client must define the function. Then CALL will be tried again. i.e. If the client does not define the function, it will loop forever.
This asymmetry is complicating the interface.
1. The undefined function handling API **leaves no room for errors in loading the bundle**. If, in Java, the client puts a stub for an unloaded method, and there is an error when loading the class, then the client has no other things to do than "defining" the method as "doing nothing but throwing an exception". This isn't very nice.
2. It leaves the stack and the thread in a strange state: they cannot continue, but still "RUNNING".
3. Introspection and OSR are impossible.
# Proposed change
I propose unifying the two interfaces.
Let undefined functions behave like:
```
.funcdef @undefined_function VERSION noversion <sig> (a1 a2 a3 ...) {
%entry:
TRAP <void> KEEPALIVE (a1 a2 a3...)
undefined behaviour after TRAP
}
```
That is, an undefined function behaves like executing a trap with all parameters as keep-alive variables.
Add an extra API call:
```
MuID (*cur_func)(MuCtx *ctx, MuStackRefValue stack, int frame)
```
It returns the ID of the function of the current frame. It works even if the function is not defined. Return 0 if the frame is native.
Modify the `cur_func_ver` API call: It returns 0 when the frame is a frame without version (undefined function).
So a client can identify an undefined function by "cur_func != 0 && cur_func_ver == 0". FYI:
* ``cur_func != 0 && cur_func_ver != 0``: a defined Mu function
* ``cur_func != 0 && cur_func_ver == 0``: an undefiend Mu function
* ``cur_func == 0 && cur_func_ver == 0``: native frame
* ``cur_func == 0 && cur_func_ver != 0``: impossible
Modify the `cur_inst` API call: It returns 0 if the frame is just created (before the first instruction), is native, or *if the function is not defined*.
Modify the `dump_keepalives` API call: It dumps keep-alive variables for OSR point instructions, or *the arguments to an undefined function*. Cannot be used on native frames or a just-created frame.
The `pop_frame` and the `push_frame` API behave like before. Specifically to undefined functions, popping an undefined function frame reveals its caller to the top of the stack. After loading a bundle that defines the function, a `push_frame` can re-creates the callee frame, but this time the callee is defined. The client can also choose not to rebind to the same stack, but swap away. Or just throw an exception (probably `ClassNotFoundException`) to the caller without defining the callee.
# About the NEWSTACK instruction
Creation of the stack will be successful, but it will trap when executing the undefined function after binding the stack to a thread.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/41Checklist for the first final version of the specification2018-06-28T23:11:32+10:00John ZhangChecklist for the first final version of the specification*Created by: wks*
- [X] HAIL #29
- [X] Unsafe native interface. #24
- [X] API in C (header)
- [X] API in C (documented)
- [X] API as Mu instructions (common instructions). #36
- [X] Adjust the undefined function API
*Created by: wks*
- [X] HAIL #29
- [X] Unsafe native interface. #24
- [X] API in C (header)
- [X] API in C (documented)
- [X] API as Mu instructions (common instructions). #36
- [X] Adjust the undefined function API
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/40Mu Client Interface as C Binding2015-08-21T15:55:43+10:00John ZhangMu Client Interface as C Binding*Created by: wks*
The current API is expressed in a language-neutral form, and it is the implementation that decides how to implement such an interface. Programmers still need to resort to implementation-specific interfaces to actually ...*Created by: wks*
The current API is expressed in a language-neutral form, and it is the implementation that decides how to implement such an interface. Programmers still need to resort to implementation-specific interfaces to actually use a particular Mu implementation.
Since C is so widely used as a system programming language, the Mu client interface (a.k.a the API) should be expressed as data types and function calls in the C programming language. If the client is not in C, it usually still has a C FFI.
# The API in C
Resources in Mu are exposed in opaque types which have reference semantics: they can be copied and still refers to the same resource.
* `mu_micro_vm_t`: a reference to a Mu micro VM instance.
* `mu_client_agent_t`: a reference to a client agent.
* `mu_handle_t`: a handle to a value in the Mu type system exposed to the client.
Messages are C functions. Like JNI, they are contained in a struct: `typedef struct mu_api_msgs {...} mu_api_msgs_t`. In this way, the client in C does not need to link against any libraries when compiling. The reason is, for a Mu micro VM implemented in a higher-level language (like the reference implementation in Scala), the binding of the callable C function is generated very late, later than even the loading time, and has no access to the native loader.
For example, assume there is a `mu_api_msgs_t* msgs` defined:
```c
mu_client_agent_t ca = ...
char buf[999999];
int sz;
// load file into buf
msgs->load_bundle(ca, buf, sz); // Load a bundle
// Putting C values into Mu
mu_handle_t h1 = msgs->put_schar(ca, 127);
mu_handle_t h2 = msgs->put_sshort(ca, 32767);
mu_handle_t h3 = msgs->put_sint(ca, 42);
mu_handle_t h4 = msgs->put_slong(ca, 42);
mu_handle_t h5 = msgs->put_slonglong(ca, 999999999999999);
// Converting Mu values to C
int v3 = msgs->to_sint(ca, h3);
unsigned long v4 = msgs->to_ulong(ca, h4); // just treat the int as unsigned
```
Mu-level flags are C preprocessor macros. They have type int.
```c
msgs->store(SEQ_CST, hLoc, hNewVal); // SEQ_CST is a macro
```
Callbacks, including the trap handler and undefined function handler, have defined signatures:
```
typedef mu_trap_return_status_t (*mu_trap_handler_t)(
mu_client_agent_t ca,
mu_handle_t stack,
mu_handle_t thread,
int watchpoint_id,
mu_handle_t &new_stack,
mu_handle_t &data_passed,
mu_handle_t &new_exception,
mu_api_msgs *msgs,
void *user_data);
typedef void (*mu_undefined_function_handler_t)(
mu_micro_vm_t microvm,
int funciton_id,
mu_api_msgs *msgs,
void *user_data);
```
These functions are registered via the `msgs->register_trap_handler` and `msgs->register_undefined_function_handler` API messages. In their parameters, the `user_data` is an arbitrary pointer provided by the client in an implementation-specific manner (see below).
# Implementation-defined behaviours
Some aspects of the C binding are implementation-specified. They include:
* How to create a Mu micro VM? Options are:
1. The C executable creates the Mu instance.
2. Mu loads the C dynamic library.
3. Mu starts separately and C connects to the existing instance in the same process.
4. C connects to a Mu instance in a different process, or a different machine.
* Options in creating Mu instances. Options are:
1. Heap size. Giving a heap size means the Client determines the heap size rather than Mu automatically decide its own storage.
2. Global data space size. Setting this value means the global data may have their own storage. Actual implementation could use the heap space, too.
3. Stack size. Similarly, this is too implementation-specific.
* What happens during initialisation?
1. Mu calls a C function to initialise the client, and the client provides a `void*` to Mu for the client's own context. (note: in this case, it is Mu loading C rather than C creating Mu.)
2. C creates a Mu instance, and sets its `void*` user data in a proprietary API message.
# Open questions
* Should we allow each Mu implementation have its own "namespace"? The opaque types (`mu_micro_vm_t` and so on) are opaque, but different implementations may have different representations. The current C binding design forbids one C program working with more than one Mu implementations (though it is okay to work with more than one *instances* of the same implementation).
* JNI does not solve this problem, either.https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/39Call-back from native to Mu2016-09-06T20:28:02+10:00John ZhangCall-back from native to Mu*Created by: wks*
# Overview
## Rationale
Some existing C libraries or system interfaces use call-back functions, i.e. user-provided function pointers which are called by C or system libraries. Mu should provide appropriate mechan...*Created by: wks*
# Overview
## Rationale
Some existing C libraries or system interfaces use call-back functions, i.e. user-provided function pointers which are called by C or system libraries. Mu should provide appropriate mechanisms to interface with those libraries.
This is part of the (unsafe) native interface. See super issue: https://github.com/microvm/microvm-meta/issues/24
## Exposing appropriate Mu functions as C-style function pointers
"Appropriate" Mu functions must only use the following types as their parameter types or return types: `int<n>`, `float`, `double`, `vector<T>`, `ptr<T>` or `struct` types whose components are these types. In the case of `ptr<T>`, `T` can also be `array<T n>` or `hybrid<F V>` where `T`, `F` and `V` are one of the above types. In other words, (traced) references and Mu-specific opaque types are not allowed.
The Mu ABI will be designed to be compatible with the C calling convention as defined by the platform ABI.
**way 1**: (simple) Mu functions are declared with the optional `WITH_FP` clauses to create their associated C-style function pointers. For example:
```
.funcdecl @some_func WITH_FP(@fp_some_func DEFAULT @COOKIE) <@sig>
.funcdef @other_func VERSION @other_func_v1 WITH_FP(@fp_other_func DEFAULT @COOKIE) <@sig2> WITH_FP @fp_other_func (%param0) {
...
}
```
With the above definitions, `@some_func` has type `func<@sig>`, which is a Mu function reference value. `@fp_some_func` has type `funcptr<@sig>`, which is a C-style function pointer. Similarly `@other_func` is a `func<@sig2>`, while `@fp_other_func` is a `funcptr<@sig2>`. `DEFAULT` is the calling convention. `@COOKIE` is a "cookie" (see *way 2* below).
The Mu IR program or the API can pass the function pointer to the native program. When called, the Mu function will run and return its return value to the native caller.
* pros:
1. simple
2. The native funcptr is immediately available after loading the Mu bundle.
* cons: does not support "closures" well. Some languages/implementations (e.g. LuaJIT) would like to expose closures (rather than just functions) to C as callbacks.
**way 2**: (complex) Mu functions are exposed with a run-time invocation of a Mu instruction or a Mu API message.
Format:
* Instruction: *fp* = `EXPOSE_MU_FUNC` `<` *sig* `>` *mufunc* *cookie*
* API: *fpHandle* = ca.exposeMuFunc( *hMuFunc*, *hCookie* )
The resulting *fp* has type `funcptr<sig>` and can be called from C. A function can be exposed multiple times, and the resulting function pointers are mutually inequal. The *cookie* is an `int<64>` value associated to the resulting function pointer. If a Mu function is called through a particular function pointer, a special instruction `NATIVE_COOKIE` will return the associated *cookie* value.
Example:
```
%fp1 = EXPOSE_MU_FUNC <@sig> @some_func @some_int64_value
%fp2 = EXPOSE_MU_FUNC <@sig> @some_func @other_int64_value
...
UNEXPOSE_MU_FUNC %fp1
UNEXPOSE_MU_FUNC %fp2
// in @some_func
%cookie = NATIVE_COOKIE
%eq = EQ <@i64> %cookie @some_int64_value
...
```
```
val hFP = ca.exposeMuFunc(hFunc, hSomeInt64Value)
...
ca.unexposeMuFunc(hFP)
```
Both `%fp1` and `%fp2` have type `funcptr<@sig>`. But if the Mu fucntion `@some_func` is called from C via `%fp1`, the `NATIVE_COOKIE` instruction will return `@some_int64_value`. If called via `%fp2`, then `NATIVE_COOKIE` returns `@other_int64_value`, instead.
* pro: the cookie can be used to identify different closures and look up the contexts of the closures.
* con:
1. Not as simple as way1.
2. Exposing a Mu function requires a Mu instruction or an API message. This makes "implementing the Mu client API directly as exposed Mu functions" difficult. (In this case, exposing a Mu function requires an API function, which is also an exposed Mu function.)
## Contexts necessary for Mu functions to run
Even if a Mu function is exposed to the native program as a `functpr<sig>`, some contexts must be set up so that the Mu function can make use of Mu-specific features. These include:
* **Thread-local garbage collection states**: including thread-local allocation pools, and registering the thread for yielding as requested by the GC.
* **Stack context**: Each Mu stack has an associated `stack` value (the opaque reference to the current stack). This is necessary for swap-stack.
Similar to the JNI's "attaching a native thread to the JVM", Mu will also require attaching Mu contexts to a native thread before any exposed Mu function pointers can be called.
If the native program is executed because some Mu program called the native function through the native interface (via `CCALL`), the context is already set up and the C program can safely call back to Mu.
## Mixed native/Mu stacks
With the possibility of both C-to-Mu and Mu-to-C calling, a stack may have mixed C or Mu frames. It has some implications for stack introspection and exception handling. Possible approaches are:
1. Stack introspection cannot go deeper than the last contiguous Mu frame from the top. i.e. introspection is immediately unavailable when reached a native frame. Exceptions may not go into native frames. This approach has the weakest promise from Mu, and is thus the easiest.
2. Mu can skip non-Mu frames and unwind to other Mu frames underneath.
3. Stack introspection and stack unwinding caused by exceptions can go through frames which are supported by the native debugger. This is harder than the previous one, but still practicable.
4. Support non-standard frames (such as JavaScript frames of SpiderMonkey or V8). Too hard.
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/38Dynamic loading for Java2016-08-11T14:46:43+10:00John ZhangDynamic loading for Java*Created by: eliotmoss*
So Adam and I have run into an interesting question about how to do dynamic loading for Java. The thing is, one does not know all the details of a class in advance. Therefore, it is hard to give things signatur...*Created by: eliotmoss*
So Adam and I have run into an interesting question about how to do dynamic loading for Java. The thing is, one does not know all the details of a class in advance. Therefore, it is hard to give things signatures. Consider, for example, the vtable. We need to have Mu types for all the classes mentioned in all the methods -- the vtable will be a struct of function pointers, each pointer specifically typed. But that would force eager loading of the entire universe to figure out the types!
The only alternative seems to be to refcast all over the place at run time. Is that the intent? (Coming from Java I had a (mistaken) bias that this involves a cost, but I see on referring to the spec that refcast does not involve any run-time work.)https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/37Memory model in native interface2016-06-17T15:23:38+10:00John ZhangMemory model in native interface*Created by: wks*
# Problem
Currently the Mu memory is all about "memory locations" – a region that holds a Mu value, not directly related to addresses or bytes. The native memory is a sequence of bytes, addressed by integer "address...*Created by: wks*
# Problem
Currently the Mu memory is all about "memory locations" – a region that holds a Mu value, not directly related to addresses or bytes. The native memory is a sequence of bytes, addressed by integer "addresses". They are separate until a Mu memory location is pinned. In that case, the Mu memory location is mapped to a region of bytes in the address space. Accessing one will affect another.
Meanwhile Mu's memory model uses the C++11-style model based on the happen-before relation.
This model imposes a challenge that the model should bridge the Mu and the native world. The native view of the memory as a sequence of bytes should work nicely with the Mu memory, i.e. map to meaningful memory operations in the Mu world. Atomic actions should be consistent and may establish the happen-before relation between two worlds. Specifically:
* What is the unit of memory actions? Previously, it is "Mu memory location".
* If a "load" action is modeled as a tuple: `LOAD(order, type, location)`, and location was "Mu memory location", then what should location be now? Address? What value does it see? Some store? Or something else?
* If a "store" action is modeled as a tuple: `STORE(order, type, location, newvalue)`, and location was "Mu memory location", then what should location be now?
* If a Mu memory location is pinned, and is accessed in a different granularity than the type declared, what will be the result?
* If stored as a whole, but loaded in parts...
* If stored in parts, but loaded as a whole...
* But we cannot model the memory as a byte array which sequentially changes state. (or, can we? Since non-atomic conflicting accesses are meaningless, does this imply it must be changed sequentially, or errors occur?)
# The current model
* A non-atomic load sees the unique store operation that happens before it, and there isn't another store operation that happens between the visible store and the load. If there are more than one such operations, it has undefined behaviour.
* An atomic load sees the value from any of its visible sequence of store operations.
* Mixing non-atomic and atomic operations on the same memory location has undefined behaviour.
# Possible directions
In any way, pure Mu programs should keep its original C++11-like semantics.
1. Make the memory model more machine-oriented and machine-specific.
* May give more dependable behaviours. For example, unaligned memory access is allowed in many architectures, but are not always atomic.
* Obviously this makes Mu less portable. All pointer-based memory access will have machine-specific semantics. But does this matter? This is the "native interface" anyway.
* Interoperability with the C++11 memory model for C/C++ programs will be built upon the machine-specific memory model.
2. Limit what operations are allowed in the native memory.
* Simpler model.
* Probably more undefined behaviours, because they cannot be defined if we tries to make things simple and generic.
* Will limit the capability. e.g. unions won't be used by Mu.
3. Something in between
# Examples
The native program should synchronise with the Mu program via atomic memory accesses.
```c++
// C++ pseudo code
struct Foo {
int x;
int y;
};
Mu_thread_1 {
ref<Foo> f = new<Foo>
ptr<Foo> fp = pin(f);
create_thread(native_thread_2, fp);
store(&f->x, 10, NOT_ATOMIC); // Mu-level store
store(&f->y, 20, RELEASE); // Mu-level store
}
native_thread_2(ptr<Foo> fp) {
while(load(&fp->y, ACQUIRE) != 20) {} // Native load
int a = load(&fp->x, NOT_ATOMIC); // Native load
assert(a == 10);
}
```
In non-atomic memory access, partial reads/write should be based on the bytes representation (it is called the "object representation" of a value in C11).
```c++
ref<i32> r = new<i32>;
store(r, 0x12345678); // Assume little endian
ptr<i32> p = pin(r);
i64 addr = ptrcast<i64>(p); // cast the pointer to the integer address
addr += 3;
ptr<i8> p2 = ptrcast<ptr<i8>>(addr); // cast back to pointer, but a different type
i8 value = load(p2);
assert(value == 0x12);
store(p2, 0x9a);
i32 value2 = load(r);
assert(value2 == 0x9a345678);
```
Unaligned 16-, 32- and 64-bit memory access is allowed in x64 (and P6-family guarantees atomicity if not crossing any cache line boundary).
```C++
struct Foo { i32 a; i32 b; };
ref<Foo> r = new<Foo>;
store(&r->a, 0x9abcdef0);
store(&r->b, 0x12345678);
ptr<Foo> p = pin(r);
ptr<i64> p2 = ptrcast<ptr<i64>>(p);
i64 value = load(p2);
assert(value == 0x123456789abcdef0);
```
Could non-atomic memory access mix with atomic counterparts?
```C++
struct Foo { i32 x; i8 y; double z; };
ref<Foo> r1 = new<Foo>;
ref<Foo> r2 = new<Foo>;
ptr<Foo> p1 = pin(r1);
ptr<Foo> p2 = pin(r2);
store(&p1->x, 0x12345678, NOT_ATOMIC);
store(&p1->y, 42, NOT_ATOMIC);
store(&p1->z, 3.1415927D, NOT_ATOMIC);
memcpy(p2, p1, sizeof(Foo)); // This is obviously not atomic
some_synchronization_operation_after_which_atomic_accesses_will_be_safe(); // What should this be?
thread1 {
store(r2->y, 84, RELAXED); // This is atomic
store(r2->x, 0x9abcdef0, RELEASE); // This is atomic
}
thread2 {
i32 a = load(&r2->x, ACQUIRE); // This is atomic
if (a == 0x9abcdef0) {
i8 b = load(&r2->y, RELAXED); // This is atomic
double c = load(&r2->z, RELAXED); // This is atomic
assert(b == 84 && c == 3.1415927D);
}
}
```
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/36Metacircular Client Interface2018-06-28T23:11:32+10:00John ZhangMetacircular Client Interface*Created by: wks*
It has been proposed long ago that "everything the API can do should also be possible in the Mu IR".
This issue maintains a checklist of features not in the Mu IR. These functions should be gradually added to the IR...*Created by: wks*
It has been proposed long ago that "everything the API can do should also be possible in the Mu IR".
This issue maintains a checklist of features not in the Mu IR. These functions should be gradually added to the IR.
**Features in the API but not the IR**:
* bundle loading
* stack introspection: `current_func_ver`, `current_instruction`, `dump_keepalives`
* OSR: `pop_frame`, `push_frame`
* Handle traps and undefined functions in Mu IR. It depends on how the Mu VM itself is implemented.
* Handle watchpoint in Mu IR: an instruction which is a no-op when disabled, but a (maybe limited kind of) Mu function call when enabled.
**Things that can be done dynamically via handles in the API, but can only be done statically in the IR**. I am not sure how dynamic Mu should be, or need to be, because some of the items below can be worked around with some Mu IR code, such as maintaining a hash table implemented in Mu IR, or writing wrapper functions.
* Opaque handle type: a handle that holds **any** Mu value. `ref<T>` can be a candidate for this purpose.
* Getting a Mu constant value (including constants, globals and functions) by its ID. This is a kind of introspection.
* This can be done with a type argument: `%c = GET_CONST <@T> %id`, `%c` is a `@T`
* Or return a handle: `%ch = GET_CONST_H %id`, `%ch` is a `ref<void>`
* Creating Mu heap objects by a type ID.
* Calling a Mu function (or constructing a Mu frame) with both the callee and the arguments as handles. This allows calling a Mu function with a run-time-determined arity and arg types.
* Dump keepalive variables.
* This can be done with a type argument: `%sixth_ka = GET_KEEPALIVE <@T> %frame 6`
* or return handle: `%sixth_ka_handle = GET_KEEPALIVE_H %frame 6`
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/35Alternative LISP-like Mu IR format2015-06-22T16:04:08+10:00John ZhangAlternative LISP-like Mu IR format*Created by: wks*
Problem: Mu IR needs a parser, but constructing a parser is tedious. Parser generators pulls in additional dependencies.
Solution: Use a simplistic syntax based on LISP.
Example:
```scheme
(typedef @i32 int 3...*Created by: wks*
Problem: Mu IR needs a parser, but constructing a parser is tedious. Parser generators pulls in additional dependencies.
Solution: Use a simplistic syntax based on LISP.
Example:
```scheme
(typedef @i32 int 32)
(typedef @float float)
(typedef @void void)
(typedef @refvoid ref @void)
(typedef @foo struct @i32 @i64 @float @double @refvoid)
(funcsig @f_sig @i32 (@i32 @i32))
(const @FORTY_TWO @i32 42)
(const @DOUBLE_FORTY_TWO @double 42.0d)
(const @SOME_STRUCT_CONST @some_struct @const1 @const2 @const3)
(const @NULLREF @refvoid NULL)
(global @errno @i32)
(funcdecl @write @write_sig)
(funcdef @write @write_v1 @write_sig (%p0 %p1 %p2)
(basic-block %entry
(inst %a (ADD @i32 %p0 %p1))
(inst %b (CALL @sig @callee (%arg1 %arg2 %arg3) (exc %nor %exc) (keepalive %v1 %v2 %v3)))
)
(basic-block %nor
(inst _ (SUB @i32 %p0 %p2)) ; unnamed instruction
(inst _ (BRANCH %exit))
)
(basic-block %exc
(inst _ (TRAP @void))
)
(basic-block %exit
(inst _ (@uvm.thread_exit)) ; COMMINST is no longer necessary because the syntax is already dynamic
)
)
```
**How would this benefit the Mu implementer?** The parser can be written by hand in very few lines of code. This is convenient for languages that has less capabilities (such as C which does not handle complex type hierarchies easily).
**How would this benefit client implementers?** The code generator can be more typed (using structured nested lists), rather than constructing arbitrary strings (using string formatting).
**Binary format?** There can be a simpler and direct mapping between the text format and the binary format. For example, atoms can be encoded as a hash code, and a list can be encoded as a type, a length and a list of values. Mu spec no longer needs to define a text format and a binary format separately.
Problems?
Does not look like assembly.
May be less readable than the current text format without aggressive pretty-printing.
Extra validation should be performed by the parser. (Really? The Mu micro VM is not required to correct any errors. Any error is allowed to have undefined behaviours.)
https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/34Extra types for the Native Interface2018-06-28T23:11:32+10:00John ZhangExtra types for the Native Interface*Created by: wks*
Philosophy: There should be a subset of Mu types and instructions that can do what C can do. It should be possible to implement the C programming language in this subset of Mu while still be able to access the memory i...*Created by: wks*
Philosophy: There should be a subset of Mu types and instructions that can do what C can do. It should be possible to implement the C programming language in this subset of Mu while still be able to access the memory in a way specified by the platform's ABI (be compatible with "good" native programs).
# Types
## Pointer types
* `ptr<T>`: A memory pointer to type `T`. (Is there a better name? A pointer always points to somewhere in the memory. Maybe "data pointer" or "value pointer"? In C, it is object pointer, but "object" has a different meaning in Mu.)
* `funcptr<sig>`: A function pointer to a function with signature `sig`
Pointers are addresses. They can be cast to and from integer values by interpreting the integer as the address. Mu does not check the validity of this cast.
`ptr<T>` can be used by the memory addressing instructions: `GETFIELDIREF`, `GETELEMIREF`, ... will work as they are `iref` types. Memory access instructions can work with `ptr<T>` with a `PTR` flag:
```
// %p is ptr<int<64>>
%result1 = LOAD PTR ACQUIRE <@i64> %p
STORE PTR RELEASE <@i64> %p @const1
%result2 = CMPXCHG PTR SEQ_CST SEQ_CST <@i64> %p @const1 @const2
%result3 = ATOMICRMW PTR SEQ_CST ADD <@i64> %p @const3
```
`funcptr<sig>` can be called with the `CCALL` instruction:
```
// assume @write is funcptr<@size_t (@i32 @voidptr @size_t)>
%result = CCALL C <@sig> @write (%fd %buf %sz) // C means the "C" calling convention
```
## Union type
I think there is a way to introduce the union type from C without compromising the safety of Mu's reference types.
Define the union type as: `union<T1 T2 T3 ...>`
`T1`, `T2`, `T3`, ... are its members. The members of a union type cannot contain `ref`, `iref`, `weakref`, `func`, `thread`, `stack` or `tagref64` types as they are either object references or opaque references. However, `ptr` and `funcptr` are allowed.
`union` must be in the memory. It cannot be the type of an SSA variable. It does not make sense: union is a, err..., "union" of several types (no puns intended), but an SSA variable holds exactly one type.
> One may argue that "I want to LOAD a union and STORE to another location without looking into it, so I need union to be an SSA variable". However, for data transfer, there could be a `memcpy`-like instruction that can copy large structures efficiently. So it is unnecessary.
When allocated in the Mu memory, its initial value is all zeros: If any member is loaded before another value is stored into it, the result is always the "zero value" of that type (int 0, fp +0.0, ref NULL).
A union only holds the latest stored member:
* if a load is **not atomic*, and there is only one visible store to a member of the union, then
* if the store accesses the same member as the load, the load gets the value of that store;
* if the store accesses a different member, the load instruction has undefined behaviour.
* Union members cannot be accessed atomically.
> I am still uncertain how the C memory model plays together with unions. C11 defines a union as "an overlapping set of member objects" and "When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values." This implies that storing into one member of a union has the side effect of modifying other members.