Issue #66: [Are bundles the unit of compiling or the unit of loading?](https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/66) (Kunshan Wang, 2018-06-28)

# Two different views of "bundle"
From the compiler's point of view, compiling is a process where:
1. There are many modules (such as .class files), all of which need to be compiled (to Mu IR, for example). Modules may have inter-dependencies, and there may even be mutual recursion (A imports B, B imports C, and C imports A).
2. Compilers should compile each module separately, with no knowledge of other modules. This implies that each module compiles to a stand-alone bundle containing all the necessary things (types, functions, ...) used inside that bundle. Since types use "structural equivalence", it does not matter if two structurally isomorphic types are defined twice in two bundles. In particular, functions can be declared multiple times in different bundles.
3. When they are linked, bundles are merged. Different bundles should have no intersections, with one exception: functions of the same name are resolved to be the same, so calling a declared but not defined function may end up calling a function defined in another bundle. (Global cells should have similar properties, too.) We also allow calling functions that are declared but not defined in any bundle, which triggers lazy loading.
In the current Mu design, the micro VM's view of bundles is:
1. There is a global bundle, which includes everything that has been loaded up to a given point in time.
2. The bundle is the unit of loading. Bundles are loaded sequentially. (At least it is perceived to be so through the API. The Mu implementation can load them in parallel while ensuring sequential consistency.)
3. Each bundle can refer to things (types, functions, ...) defined in the current bundle or the global bundle.
4. Every time a bundle is loaded, its contents (types, functions, ...) are merged into the global bundle. That is, *there is one single global bundle which is gradually augmented as bundles are loaded*. Conflicts are not allowed. If a new function version (FuncVer) is defined on an existing function, this FuncVer becomes the "most recent version" of the function (see the sketch after this list).
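As a minimal sketch of re-definition, reusing the `.funcdef ... VERSION` syntax that appears later in this issue (the function name `@foo` is hypothetical):

```
//// Loaded earlier: defines @foo with version %v1
.funcdef @foo VERSION %v1 ... {
    ...
}
//// Loaded later: the same function name, a new version
.funcdef @foo VERSION %v2 ... {
    ...   // %v2 now becomes the "most recent version" of @foo;
          // subsequent calls to @foo activate %v2
}
```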
The main difference between the two views is whether we consider bundle loading to be a static, separate and parallel process, or a dynamic and sequentially inter-dependent one.
## Why is Mu designed like this?
The current Mu design is based on the facts that (1) Mu is a run-time JIT compiler, and (2) Mu supports function re-definition. Because Mu is a run-time entity, it is one single thing that lives through the life of the application. It will observe all the things (bundles) the client ever delivers to it, and this is a temporal process. The micro VM always starts with no knowledge, and the client "teaches" the micro VM more and more knowledge by loading bundles. So the "global bundle" represents the "current knowledge" the micro VM has about the world (i.e. the types, functions, ... of the client's language). Since the growth of knowledge is a sequential process, it is natural to assume bundles are loaded in a sequence. In this way, if a later bundle refers to things the micro VM already knows (for example, types defined in previously loaded bundles), it does not need to define or declare them again, because Mu already knows them, so the bundle can just refer to them by name/ID. The sequential nature also makes it easy to support function re-definition: since bundles form a sequence, a FuncVer in a newer bundle will replace the current "most recent" version in the global bundle.
The separate-compiling approach is the traditional, well-known way C compilers work, and it does not address function re-definition. Re-definition is still an "action" rather than a declaration, and the order of "which FuncVer invalidates which older FuncVer" does matter.
## What the client may want
But compiler writers (whether of traditional C compilers or of Mu clients) may want a certain degree of flexibility for parallel compilation, and there is some aesthetic appeal to the idea that "**separate modules should be compiled to separate Mu bundles**". For example, for a JVM client it is more intuitive to generate one Mu IR bundle for each .class file, so that each .class file can be compiled separately while still allowing lazy loading. For example:
```java
//// Foo.class
public class Foo {
public static void run() { Bar.run(); }
}
//// Bar.class
public class Bar {
public static void run() { Foo.run(); }
}
```
The separate-compiling model will deliver two Mu bundles:
```
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
.funcdecl @Bar.run ... // Declare @Bar.run in Bundle1
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
.funcdecl @Foo.run ... // Declare @Foo.run in Bundle2
```
That is, `@Bar.run` is declared in Bundle1 and `@Foo.run` is declared in Bundle2. They declare functions in each other because neither has knowledge of the other.
However, in the current Mu model, the two bundles will look like:
```
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
.funcdecl @Bar.run ... // Declare @Bar.run in Bundle1
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
```
The difference is subtle: Bundle2 does not declare `@Foo.run`, because it knows Bundle1 is loaded before it, and `@Foo.run` is already defined.
It is arguable that this requires the two bundles to be built sequentially. But it can be worked around by "lifting" both declarations into a third bundle:
```
//// Bundle0:
.funcdecl @Bar.run ... // declared here; defined in Bundle2
.funcdecl @Foo.run ... // declared here; defined in Bundle1
//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
...
CALL @Bar.run()
}
//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
...
CALL @Foo.run()
}
```
Declaring a function is faster than defining it. After Bundle0 is loaded, Bundle1 and Bundle2 can be built and loaded in parallel.
It is also arguable that "lifting both declarations into a separate bundle" is a redundant step. But in practice, this step cannot be avoided. Take Java as an example again. If one Java ClassLoader visits both Foo.class and Bar.class, then it already knows both classes, and it can simply build both into a single Mu bundle rather than splitting them into two. If two Java ClassLoaders attempt to load Foo and Bar in parallel, discover the inter-dependency, and also find each other working on the two respective .class files simultaneously, then the ClassLoaders need some synchronisation mechanism so that classes are not loaded twice. This is necessary even in existing non-Mu production JVMs. So *if there is a need to compile two inter-dependent Java classes in parallel, the client has to factor out the common parts, which naturally leads to "Bundle0"*.
An orthogonal issue is about the type system. Assume we have the two Java classes:
```java
class Foo { Bar bar; }
class Bar { Foo foo; }
```
Naturally `@Foo` should be `struct<@JavaHeader ref<@Bar>>`. However, without looking at Bar.class, we cannot define the type `@Bar`, which is supposed to match the structure of the fields of the Java class Bar. So if we enforce lazy loading, then Foo.bar has to be represented as `ref<void>` rather than `ref<@Bar>`. This has been [discussed in a separate issue before](https://gitlab.anu.edu.au/mu/general-issue-tracker/issues/38). **The separate-compiling model does not solve this problem**, because the crux is that the **knowledge** of Bar can only be obtained by looking at Bar.class. Unlike declared-but-not-defined functions, having **types** that are not yet known (the C language calls them "incomplete types") causes many problems. These types are inaccessible. If traps should be triggered when a type is used, it is hard to define what it means for "a type to be used": if we define it as accessing an object of that type, or simply as performing a BinOp on such a type, then almost all instructions can trigger traps.
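To make the trade-off concrete, here is a minimal sketch in the issue's informal type notation (the names `@JavaHeader`, `@Foo` and `@Bar` are illustrative, and the two typedefs are alternatives, not parts of one bundle):

```
//// What we would like to emit for Foo.class (requires knowledge of Bar.class):
.typedef @Foo = struct<@JavaHeader ref<@Bar>>
//// The lazy-loading workaround: the field loses its precise type
.typedef @Foo = struct<@JavaHeader ref<void>>   // Foo.bar is now only a ref<void>
```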
## Conclusion
In the end, we still believe the current Mu design is reasonable for its purpose as a JIT compiler.
The current "single global bundle" design is also easier for the boot image writer because there is only one bundle to consider.
But we may consider the needs of programming language implementers who expect that "modular languages should be compiled to modular object code". The implications of adopting this model are still not clear. Alternatively, this model could also be implemented in a layer above Mu.
Issue #65: [The "surface" IR as a layer above the formal IR](https://gitlab.anu.edu.au/mu/general-issue-tracker/-/issues/65) (Kunshan Wang, 2018-06-28)

# Abstract
After the recent discussions about several alternative Mu IR forms, I realised that the needs of formal verification, of the concrete micro VM implementation, and of the application/client may be different. The IR, as a language, is the means for the client to transfer programs to the micro VM, and this implies *compactness*, *efficiency* and a certain degree of *expressiveness* that does not hinder efficient implementation. Formal verification, on the other hand, will benefit from an IR designed in the style of *functional* languages, with *abstraction*, *consistency*, and *simplicity* with respect to an operational model.
This issue proposes a two-level model: a "surface" form primarily for client-µVM communication, and a "formal" form for formalisation. The "surface" form is strictly syntactic sugar for the "formal" form, and can be transformed statically and automatically into the latter when needed.
With the surface form "detached" but mappable to the formal form, the surface form can be designed more aggressively for code compactness, such as introducing side-exits in the middle of basic blocks. But I am not advocating making such aggressive design changes now.
# Different concerns
## Terminating Instructions and Code Bloating
Currently, in the "surface" form as described by the (informal) [Mu Spec](https://gitlab.anu.edu.au/mu/mu-spec/blob/master/instruction-set.rst#ssa-variables), some Mu IR instructions can be either a normal instruction in the middle of a basic block, or a terminal instruction that implies several destinations. The `CALL` instruction is a well-known example. A basic block may contain may `CALL` instructions in a sequence. When an exception is thrown, the default behaviour is "rethrow". While in the "formal" form as described in [uvm-formal-hol](https://gitlab.anu.edu.au/mu/mu-formal-hol/blob/master/uvmIRScript.sml#L370), `CALL` is always a terminating instruction, and both normal and exceptional destinations must be explicitly defined.
The "surface" form is designed with compactness and the machine behaviour in mind. Real-world programs will contain many function calls in a sequence. When no exceptions are thrown, each (both machine-level and language-level) call will simply fall-through to the next instruction, with the return values available to be accessed by subsequent instructions and function calls. The default "rethrow" behaviour is design so that when the current call site cannot catch exceptions, the stack-unwinder will simply skip the current function. It is *the fact that the call site does not handle the exception in most cases* that makes the code efficient.
A related topic is to add "side exits" in the middle of basic blocks, as is done by JikesRVM. This will remove the need to split basic blocks even when exceptions and "uncommon branch targets" are present and thus makes the IR more compact, but it will also make the control flow less explicit.
On the other hand, the "formal" form makes it easy to reason about all branches and all possible execution paths. If the IR is augmented with "NO_THROW" annotations, it can further make exceptions undefined behaviours, and relief the burdens of verification of some Mu functions.
Real-world programming languages, however, usually choose one way or the other consistently, with "rethrow" the more common choice. Java, C#, Python and RPython, for example, always "rethrow", and do not provide a "nothrow" option. C++ allows the ["noexcept"](http://en.cppreference.com/w/cpp/language/noexcept) annotation on some C++ functions, and C++ exceptions can silently pass through any C functions on the stack, as long as the C functions are compiled with compatible compilers (such as g++/gcc) and the object files contain the appropriate unwinding information (.eh_frame in ELF).
But if future languages or existing languages with extensions (vmmagic) can make use of this feature, it has the potential to provide positive effects for the verifiability of carefully-written Mu functions.
## Declarative vs Operational
The current Mu IR is inspired by the LLVM IR. This IR describes an AST of functions and top-level declarations, such as constants (which, in LLVM's terminology, include literals, global variables and functions). The SSA form is naturally a "dependency-description language": an instruction has many "uses" of other [Value](http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html)s, and each Value can be anything we can get a value from, that is, either a constant or a local variable. The compiler sees the dependencies between instructions, such as "this `add` instruction depends on an instruction result and a ConstantInt". With a type hierarchy, the compiler needs to pattern-match against the kinds of Values, as a static code transformer always has to do.
On the other hand, [mu-formal-hol](https://gitlab.anu.edu.au/mu/mu-formal-hol) describes the execution of a thread as a sequence of state transitions (a kind of interpretation in the functional style). A thread has many registers, each holding the value of a local variable. The registers are modified as a side effect of execution:
1. When [entering a basic block](https://gitlab.anu.edu.au/mu/mu-formal-hol/blob/master/uvmThreadSemanticsScript.sml#L383), the basic block arguments are assigned to the registers of the parameters.
2. When [executing an instruction](https://gitlab.anu.edu.au/mu/mu-formal-hol/blob/master/uvmThreadSemanticsScript.sml#L270), the instruction will affect the values of some registers, and the register values are updated.
With a "Value" being either a constant or a register, the value needs to be pattern-matched against the two cases, namely constant or register, every time an instruction argument is evaluated.
A solution to this complexity is to introduce instructions that load global variables into local variables, for example GETCONST, GETGLOBALCELLIREF, GETFUNCREF and GETEXPFUNC. (I intentionally avoid the word "load" in order to emphasise that these are not memory operations, but merely aliasing.) This will make the IR semantically clearer and probably make proofs easier, at the cost of making basic blocks more verbose. But fundamentally the two forms are equivalent. A sketch of a basic block using these proposed instructions follows.
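A minimal sketch of what a basic block might look like with the proposed instructions (the names `@ONE`, `@my_cell` and `@callee` are illustrative; the GET* instructions themselves are only a proposal, not current Mu IR):

```
%bb(%x):
    %one  = GETCONST @ONE               // alias the constant @ONE as a local value
    %cell = GETGLOBALCELLIREF @my_cell  // alias a global cell as a local iref
    %f    = GETFUNCREF @callee          // alias a Mu function as a local funcref
    %y    = ADD <@i32> %x %one          // later instructions only see local registers
    ...
```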
# Where the Two Forms Reconcile
I propose splitting the IR into two layers, with a "surface" form catering to compactness, and a "formal" form designed to be more consistent. There will be *a mapping from every "surface" IR bundle to a "formal" IR bundle*, where the two forms reconcile. The point is that for every "surface" bundle there will be an equivalent "formal" bundle, so introducing "unfriendly" syntaxes will not compromise verifiability.
The mapping will "desugar" the "surface" form. There will be a conversion rule for every "surface" syntax that does not exist in the "formal" form. For example:
1. The "fall-through" call
```
%cur_bb(%v1 %v2 ... %vn):
%lv1 = ...
%lv2 = ...
...
(%rv1 %rv2) = CALL <@sig> @callee (...)
%lv3 = next instruction...
...
```
will be desugared into:
```
%cur_bb(%v1 %v2 ... %vn):
%lv1 = ...
%lv2 = ...
CALL <@sig> @callee (...) EXC(
%generated_continue_block(%v1 %v2 ... %vn %lv1 %lv2 ... $0 $1)
%generated_rethrow_block())
%generated_continue_block(%v1 %v2 ... %vn %lv1 %lv2 ... %rv1 %rv2):
%lv3 = next instruction...
...
%generated_rethrow_block() [%the_exception]:
THROW %the_exception
```
2. References to global variables:
```
%b = ADD <@i32> %a @ONE
```
will be desugared into:
```
%_tmp = GETCONST @ONE
%b = ADD <@i32> %a %_tmp
```
## Other Implications
With the introduction of an additional form, the "surface" form can be more aggressive in design.
The "surface" form may go beyond the current "single-exit" form by introducing **side-exits**, such as:
- DIV by zero, CALL/TRAP/WATCHPOINT/SWAPSTACK with exceptions, NEW/ALLOCA failure, and LOAD/STORE with NULL pointers may take side exits rather than forcing the basic block to be split.
- "guard" instructions (as usually demanded by tracing JIT compilers) may be implemented as side-exiting conditional branches. This also implies that the "side-exit" is the slow path while the "fall-through" case is the common fast path.
Currently I fear that breaking the "single-exit" property may result in the micro VM still having to split such blocks internally. LLVM, with its basic blocks not taking parameters and its optimisers having lots of transforms to do, would probably keep the single-exit SSA form. But since Mu has already adopted the "goto-with-values" form, whether "side-exits" should be introduced to the IR should depend on experience with the high-performance Mu implementation.
Related works:
- [B3](https://webkit.org/docs/b3/intermediate-representation.html): Apple's B3 still requires Jump/Branch/Switch to be at the end of basic blocks. The reason could be that B3 still uses the text-book SSA form, so non-merging control flow branches do not need PHI nodes, hence it is cheap to add basic blocks.
- RPython: Its transformers (including GC transformers and exception transformers) will split basic blocks for function calls with exceptions. But since RPython is static, code compactness may not be a concern.
- JikesRVM: JikesRVM uses Factored CFG (FCFG), where a Potential Excepting Instruction (PEI) does not necessarily end a basic block. As described [here](http://www.jikesrvm.org/JavaDoc/org/jikesrvm/compilers/opt/ir/ControlFlowGraph.html), FCFG will significantly reduce the number of basic blocks, but will complicate flow-sensitive global analysis. But given that Mu pushes most optimisations out of the micro VM, it is arguable that the micro VM back end may favour a simpler form.