# mu-impl-fast issues
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues

## Issue #46: Fix frame slot offset after register allocation
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/46 (Yi Lin, 2017-06-28)

Zebu does the following things regarding frame size, in this order:
1. emit code to save callee saved registers and reserve frame slots for them
1. do register allocation, spill registers and reserve frame slots for them
1. rewrite code, and redo register allocation until finished
1. figure out which callee saved registers are not used, and remove the saving/restoring code for them
1. patch the frame size
We do not know the actual frame size until step 3 is done. However, in step 2 we have to make assumptions about frame slots and emit code accordingly.
Currently, although we patch the frame size at the end, we do not deduct the space initially reserved for unused callee-saved registers.
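One way to fix this, sketched with invented names and an assumed sign convention (frame slots at negative offsets from the frame pointer, below the callee-saved area — not necessarily Zebu's actual layout):

```rust
// Hypothetical sign convention: spill slots live at negative offsets
// from the frame pointer. Removing N bytes that were reserved for
// unused callee-saved registers moves every slot N bytes closer to
// the frame pointer, so each recorded offset must be adjusted.
fn adjust_spill_offset(old_offset: i64, removed_callee_saved_bytes: i64) -> i64 {
    old_offset + removed_callee_saved_bytes
}
```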
We should offset all the frame slots, and patch the spilled locations in the code.

## Issue #44: Implementing TagRef64
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/44 (Yi Lin, 2017-07-19)

Reference: https://nikic.github.io/2012/02/02/Pointer-magic-for-efficient-dynamic-value-representations.html

## Issue #42: Exception handling for native frames
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/42 (Yi Lin, 2017-09-09)

The current implementation will fail because it is unable to find information about native frames.

## Issue #37: [x86-64] Passing a 128-bit integer to a C function using CCALL
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/37 (Isaac Gariano, 2017-06-11)
Passing a 128-bit integer to a C function (using CCALL) doesn't work on x86-64. For example:
I wrote a test to check that 128-bit integers are passed correctly when calling functions (it is designed to cause the last argument to be placed on the stack on x86, but on aarch64 it should be in a register).
arg_overflow.uir:
```
.funcsig test_arg_overflow_sig = () -> ()
.funcdef my_main<()->()>
{
entry():
CCALL #DEFAULT <ufuncptr<test_arg_overflow_sig> test_arg_overflow_sig> <ufuncptr<test_arg_overflow_sig>>EXTERN "c_test_arg_overflow" ()
CALL <test_arg_overflow_sig> mu_test_arg_overflow()
RET
}
.funcsig arg_overflow_sig = (int<64> int<128> int<128> int<128>) -> ()
.funcdef mu_test_arg_overflow<test_arg_overflow_sig>
{
entry():
int128_0 = ADD <int<128>> <int<128>>0 <int<128>>0
int128_F = ADD <int<128>> <int<128>>0 <int<128>>0xFFFFFFFFFFFFFFFF0000000000000000
CCALL #DEFAULT <ufuncptr<arg_overflow_sig> arg_overflow_sig> <ufuncptr<arg_overflow_sig>>EXTERN "arg_overflow" (<int<64>>0 int128_0 int128_0 int128_F)
RET
}
```
It needs to be compiled with the following C code:
arg_overflow.c
```
#include <stdint.h>
#include <stdio.h>
void arg_overflow(uint64_t a, __int128_t b, __int128_t c, __int128_t d) {
printf("d = %016lX%016lX\n", (uint64_t)(d >> 64), (uint64_t)d);
}
void c_test_arg_overflow()
{
arg_overflow(0, 0, 0, (__int128_t)(0xFFFFFFFFFFFFFFFF) << 64);
}
```
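Under the SysV x86-64 ABI, a `__int128` argument is passed as a pair of 64-bit halves (either a register pair or two consecutive stack slots). Whatever the fix at `inst_sel.rs:3181` ends up looking like, the lowering has to split the value accordingly; a sketch of just the splitting step:

```rust
// Split a 128-bit value into the (low, high) 64-bit halves that the
// SysV x86-64 ABI passes as a register pair or two stack slots.
fn split_i128(v: u128) -> (u64, u64) {
    (v as u64, (v >> 64) as u64)
}
```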
On x86-64, using the line `./muc -r -f my_main arg_overflow.uir arg_overflow/arg_overflow` (with the latest commit in the aarch64 branch), it fails to compile with the error:
```
thread '<unnamed>' panicked at 'not yet implemented', src/compiler/backend/arch/x86_64/inst_sel.rs:3181
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::_print
at /checkout/src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/sys_common/backtrace.rs:60
at /checkout/src/libstd/panicking.rs:355
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:371
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:549
5: std::panicking::begin_panic
6: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::emit_c_call_ir
7: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::instruction_select
8: <mu::compiler::backend::x86_64::inst_sel::InstructionSelection as mu::compiler::passes::CompilerPass>::visit_function
9: mu::compiler::passes::CompilerPass::execute
10: mu::compiler::Compiler::compile
11: mu::vm::vm::VM::make_boot_image_internal
12: mu::vm::api::api_bridge::_forwarder__MuCtx__make_boot_image
13: main
14: __libc_start_main
15: _start
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)
```
On aarch64 it compiles (well, Zebu fails at linking, but it works if I add 'arg_overflow.c' to the clang command) and runs correctly, printing:
```
d = FFFFFFFFFFFFFFFF0000000000000000
d = FFFFFFFFFFFFFFFF0000000000000000
```

## Issue #33: Rework on pin/unpin
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/33 (Yi Lin, 2017-07-19)

Currently `PIN`/`UNPIN` are translated directly into a runtime call into the GC to pin/unpin the object, which simply pushes/removes the object reference to/from a root set (synchronisation required). This is inefficient, and the behaviour also differs from the spec (which requires an unpin operation to match every pin operation).
This will need rework, probably along with #12 .
It is also related to https://gitlab.anu.edu.au/mu/mu-spec/issues/7, which proposes separating the immovability and immortality semantics of the `PIN` instruction.

## Issue #32: Calling function defined in another bundle does not work
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/32 (Isaac Gariano, 2017-06-07)

I am trying to define a function in one bundle and call it in another, and I'm getting an error.
Here's an example that will reproduce the problem:
file argc_exit.uir:
```
.funcsig @exit_sig = (int<32>) -> ()
.funcdef @argc.exit <@exit_sig>
{
%entry(<int<32>>%arg):
CCALL #DEFAULT <ufuncptr<@exit_sig> @exit_sig> <ufuncptr<@exit_sig>>EXTERN "exit"(%arg)
RET
}
```
file argc_inline.uir:
```
.typedef %char = int<8>
.funcdef @my_main <(int<32> uptr<uptr<%char>>)->(int<32>)> VERSION @my_main_v1
{
%entry(<int<32>>%argc <uptr<uptr<%char>>>%argv):
CALL <(int<32>)->()> @argc.exit (%argc)
RET <int<32>>1
}
```
Then using my mu-tool-compiler:
`./muc -r -f my_main argc_exit.uir argc_inline.uir emit/argc`
(use `-c` if you want to see the API calls it uses).
`thread '<unnamed>' panicked at 'Operand 1013' is neither a local var or a global var', src/vm/api/api_impl/muirbuilder.rs:1290 stack backtrace:`
(The symbol with ID 1013 is `@argc.exit`.)
I tracked the error down, and it appears to be coming from the function `get_treenode` in `src/vm/api/api_impl/muirbuilder.rs`.
My guess is that the API implementation only looks for things defined in the current bundle, not in other bundles.
However, from my understanding of the Mu spec, you should be able to refer to entities declared in previously loaded bundles.
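If that guess is right, the fix in `get_treenode` would be a fallback lookup: resolve the ID against the bundle being built first, then against the micro VM's previously loaded entities. A sketch with invented map types (not the actual `MuIRBuilder` fields):

```rust
use std::collections::HashMap;

// Invented stand-ins for "the bundle being built" and "previously
// loaded bundles"; resolution falls back from the former to the latter.
fn resolve<'a>(
    current_bundle: &'a HashMap<u64, String>,
    loaded: &'a HashMap<u64, String>,
    id: u64,
) -> Option<&'a str> {
    current_bundle
        .get(&id)
        .or_else(|| loaded.get(&id))
        .map(String::as_str)
}
```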
A workaround is to combine both files into the same bundle, e.g. with `./muc -r -f my_main <(cat argc_exit.uir && cat argc_inline.uir) emit/argc`.

## Issue #31: [x86_64] floating point/int128 conversion
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/31 (Yi Lin, 2017-06-13)

Unimplemented for now.

## Issue #29: Implement SWITCH with switch table
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/29 (Yi Lin, 2017-06-06)

Currently the compiler generates cascading conditional branches for the SWITCH instruction. We should consider using a switch table when there are many case arms.

## Issue #22: [x86_64] Status flags undefined for mul/div/idiv
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/22 (Yi Lin, 2017-06-29)
The following table summarizes how Mu integer binops map to x86_64 instructions, and how those instructions affect the status flags.
| Mu IR | X86_64 Inst | #N (signed) | #Z (zero) | #C (carry) | #V (overflow) |
|:-----: |:-----------: |:-----------: |:---------: |:----------: |:-------------: |
| ADD | add | ✓ | ✓ | ✓ | ✓ |
| SUB | sub | ✓ | ✓ | ✓ | ✓ |
| AND | and | ✓ | ✓ | - | - |
| OR | or | ✓ | ✓ | - | - |
| XOR | xor | ✓ | ✓ | - | - |
| MUL | mul | ✗ | ✗ | ✓ | ✓ |
| UDIV | div | ✗ | ✗ | - | - |
| SDIV | idiv | ✗ | ✗ | - | - |
| UREM | div | ✗ | ✗ | - | - |
| SREM | idiv | ✗ | ✗ | - | - |
| SHL | shl | ✓ | ✓ | - | - |
| LSHR | shr | ✓ | ✓ | - | - |
| ASHR | sar | ✓ | ✓ | - | - |
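For the rows where #N and #Z are marked ✗, the extra check code amounts to recomputing the two flags from the result; in effect:

```rust
// The N (negative/sign) and Z (zero) flags as functions of the result.
// After mul/div/idiv the hardware leaves them undefined, so the
// compiler would need to emit test/compare code computing exactly this.
fn n_flag(result: i64) -> bool {
    result < 0
}
fn z_flag(result: i64) -> bool {
    result == 0
}
```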
`mul`, `div` and `idiv` leave the signed flag (#N) and the zero flag (#Z) undefined. We will need to generate extra code to check and set those flags.

## Issue #21: GC related C functions return inaccurate result
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/21 (Yi Lin, 2017-11-21)

`gc/src/heap/gc/clib_x64.c` contains C functions for the GC, such as `get_registers()`, which uses inline assembly to save the values of all general-purpose registers into an array. However, C compilers may generate code that changes the registers before they are saved.
We may want to rewrite the function in assembly instead of C. It is also reasonable to eliminate all C functions in the code base and replace them with assembly (all the C functions are pretty simple). Ideally we want only Rust code and assembly in the code base.

## Issue #20: [x86_64] int<1> arithmetics return wrong result
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/20 (Yi Lin, 2017-05-01)

Currently Zebu treats int<1> the same as int<8>. This is fine if the client only uses int<1> as a boolean, but if the client uses int<1> arithmetic operations, Zebu returns wrong results. We should either explicitly forbid int<1> arithmetic or implement it correctly.
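Implementing it correctly would mean reducing every result modulo 2, since the one-bit value is stored in a byte; roughly:

```rust
// int<1> values stored in a byte: arithmetic must be reduced mod 2 by
// masking each result down to the low bit (a sketch, not Zebu's codegen).
fn int1_add(a: u8, b: u8) -> u8 {
    a.wrapping_add(b) & 1
}
fn int1_mul(a: u8, b: u8) -> u8 {
    a.wrapping_mul(b) & 1
}
```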
## Issue #19: Register allocation validator
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/19 (Yi Lin, 2017-06-23)

This issue describes the algorithm for the register allocation validation in Zebu.
Eliot proposed the approach; I implemented it in Zebu and put the pseudocode here. I am not entirely confident that I have captured all of Eliot's ideas, or that I have implemented them correctly. This issue serves as a record of how the register allocation validation works. Any discussion is welcome, and it may help future work.
Input: the register assignment `ASSIGNED(reg) -> mreg`, and the code.
Output: `PASS` if `ASSIGNED` is correct; `ERROR` if it is not.
Assumptions:
* we have correct liveness information.
* spill code (`SPILL_LOAD`/`SPILL_STORE`) is already inserted, and spilled virtual registers are replaced with short-lived virtual registers, and we know the mapping.
We iterate through the code, focusing on registers, by maintaining an *alive set*, each entry of which is a 3-element tuple `ENTRY(reg, machine_reg, mem)`, meaning `reg` is alive (contains a valid value) at this moment in `machine_reg` and in spill location `mem`.
* `(-, machine_reg, -)` means the machine register is alive, but no virtual register and no spill location are associated with it.
* `(*, machine_reg, *)` matches any entry that has `machine_reg`.
* `(reg?, machine_reg, *)` matches any entry that has `machine_reg`, naming its first element `reg` (which may or may not exist).
We have a few primitives on the alive set:
* `x <- ALIVE[block]` loads alive set for `block` and uses alias `x` for it.
* `x.ADD_ALIVE(r, m, mem)` adds `(r, m, mem)` to alive set `x`.
* `ALIVE[block] <- x` saves alive set `x` with `block`.
* `x.ASSERT_ALIVE(r, m, mem)` tries to match pattern `(r, m, mem)`. If no entry is found, ends with `ERROR`.
* `x.KILL_ALIVE(r, m, mem)` tries to match pattern `(r, m, mem)`. If any entry is found, delete the entries.
* `x.EXIST_ALIVE(r, m, mem)` tries to match pattern `(r, m, mem)`. If any entry is found, it is true.
* `x.TRIM(list)`: for any `r` or `m` in `list`, preserve their entries in `x`; delete all other entries in `x`.
* `x.INTERSECT(y)`: if `(r, m, *)` or `(-, m, *)` appear in both `x` and `y`, preserve it in `x`; otherwise delete it from `x`.
* `y <- x.COPY`: copy `x` into `y`
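The primitives can be pictured with a concrete entry type: each tuple element may be absent (`-` in the notation above), so the wildcard patterns are just `Option` matching. A sketch of the entry and of `INTERSECT` (names are invented, not Zebu's):

```rust
// An alive-set entry: (virtual reg, machine reg, spill slot), each of
// which may be absent (`-` in the notation above).
#[derive(Clone, PartialEq, Debug)]
struct Entry {
    reg: Option<u32>,
    mreg: Option<u32>,
    mem: Option<i32>,
}

// INTERSECT: keep an entry of `x` only if some entry of `y` has the
// same (reg, mreg) pair; the mem component is treated as a wildcard.
fn intersect(x: &mut Vec<Entry>, y: &[Entry]) {
    x.retain(|e| y.iter().any(|f| f.reg == e.reg && f.mreg == e.mreg));
}
```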
Pseudo code:
```
// set the initial alive set for a function start
init.ADD_ALIVE(-, stack pointer, -)
init.ADD_ALIVE(-, frame pointer, -)
init.ADD_ALIVE(-, program counter, -)
init.ADD_ALIVE(-, callee saved registers, -)
init.ADD_ALIVE(-, used argument registers, -)
push entry block to work queue
ALIVE[entry block] <- init
while work queue is not empty {
pop work queue -> block
alive <- ALIVE[block]
for each inst in block {
// (1) check spill
// we only care about the variable before spilling
if inst is SPILL_LOAD(mem) -> t for var v {
alive.ASSERT_ALIVE(v, *, mem)
} else if inst is SPILL_STORE(t) -> mem for var v {
alive.ADD_ALIVE(v, -, mem)
}
// (2) check use
// when we use a register, it needs to contain a valid value
for reg in inst.virtual_register_uses() {
mreg = ASSIGNED(reg)
alive.ASSERT_ALIVE(reg, mreg, *)
}
for mreg in inst.real_register_uses() {
alive.ASSERT_ALIVE(*, mreg, *)
}
// (3) kill died regs
// when a register dies, we remove its entry from alive set
for reg in inst.virtual_register_dies() {
alive.KILL_ALIVE(reg, *, *)
}
for mreg in inst.real_register_dies() {
alive.KILL_ALIVE(*, mreg, *)
}
// (4) check and add defines
for reg in inst.virtual_register_defines() {
if reg NOT in inst.liveout() {
// when a register is defined, but doesnt live out of this instruction
// we kill its previous values from alive set (by deleting the entries)
alive.KILL_ALIVE(reg, *, *)
} else {
mreg = ASSIGNED(reg)
if NOT alive.EXIST_ALIVE(*, mreg, *) {
// mreg doesnt hold any value at the moment, we simply add an entry
                alive.ADD_ALIVE(reg, mreg, -)
} else {
// we need to ensure assigning mreg will not destroy useful values
                for (x?, mreg, *) in alive {
if x NOT exist {
x <- reg
} else {
// we have x in mreg, and we want reg in mreg as well
if x == reg {
// overwrite value (safe)
} else {
if inst is MOVE {
// possible coalescing (safe)
} else {
// we are destroying the value of x
// and x is alive at the moment (otherwise it is killed already in step 3)
ERROR
}
}
}
alive.ADD_ALIVE(reg, mreg, -)
}
}
}
}
for mreg in inst.real_register_defines() {
if mreg NOT in inst.liveout() {
// when a register is defined, but doesnt live out of this instruction
// we kill its previous values from alive set (by deleting the entries)
alive.KILL_ALIVE(*, mreg, *)
} else {
if NOT alive.EXIST_ALIVE(*, mreg, *) {
// mreg doesnt hold any value at the moment, we simply add an entry
alive.ADD_ALIVE(-, mreg, -)
} else {
            for (reg?, mreg, *) in alive {
if reg NOT exist {
// we have value in mreg, but it doesnt hold value of any variables
// overwrite the value is safe
} else {
// we are holding reg in mreg, defining mreg will destroy the value of reg
ERROR
}
}
}
}
}
// finishing the block, we only preserve what are alive at the end of the block
alive.TRIM(block.liveout())
if block is NOT visited {
ALIVE[block] <- alive
push_successors = true
} else {
alive_before <- ALIVE[block]
alive.INTERSECT(alive_before)
if alive <> alive_before {
push_successors = true
}
ALIVE[block] <- alive
}
mark block as visited
if push_successors {
if block has 1 successor {
push successor to work queue
ALIVE[successor] <- alive
        } else if block has 2 successors {
push successor1 to work queue
alive1 <- alive.COPY
ALIVE[successor1] <- alive1.TRIM(successor1.livein())
push successor2 to work queue
alive2 <- alive.COPY
            ALIVE[successor2] <- alive2.TRIM(successor2.livein())
}
}
}
}
```

## Issue #18: Separating compilation information from IR data structures
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/18 (Yi Lin, 2017-06-10)

It is poorly designed that the data structures for the IR also contain information that is gradually generated during compilation, such as:
* `use_count` and `expr` in `SSAVarEntry`
* `block_trace` in `MuFunctionVersion`
* `exception_blocks` in `FunctionContent`
* `control_flow` in `Block`
They are initially not available, and are generated during compilation. They can be safely destroyed after the compilation.
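One direction (a sketch with invented types, not Zebu's actual fields) is a side table keyed by entity ID that exists only for the duration of a compilation and is dropped wholesale afterwards:

```rust
use std::collections::HashMap;

type MuID = u64;

// Invented per-compilation scratch data, kept outside the IR nodes.
#[derive(Default)]
struct CompilationInfo {
    use_count: HashMap<MuID, usize>, // per-variable use counts
    block_trace: Vec<MuID>,          // chosen block ordering
}

// The side table itself: one scratch record per function version,
// dropped when compilation finishes.
#[derive(Default)]
struct CompilationScratch {
    per_func: HashMap<MuID, CompilationInfo>,
}

impl CompilationScratch {
    fn for_func(&mut self, f: MuID) -> &mut CompilationInfo {
        self.per_func.entry(f).or_default()
    }
}
```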
This compilation information should be stored separately from the IR.

## Issue #17: Implementing ALLOCA/ALLOCA_HYBRID
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/17 (Yi Lin, 2017-07-19)

Currently the compiler assumes that the frame size is constant at compile time.
For *x86_64*, the stack pointer needs to be 16-byte aligned before a function call. The compiler ensures this as follows:
* `rbp` is always 16-byte aligned.
* The frame size is a multiple of 16 bytes (aligned up to 16 bytes if it is not; see `frame.rs`).
* If any call argument is passed on the stack, push a padding value if necessary so that `rsp` is still 16-byte aligned after pushing the call arguments.
* Restoring from an exception sets `rsp` from `rbp` and the constant frame size.
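Keeping `ALLOCA_HYBRID` compatible with the invariants above means rounding the dynamic allocation up to a 16-byte multiple before moving `rsp`; the arithmetic is just:

```rust
// Round a dynamic alloca size up to the next multiple of 16 so that
// adjusting rsp by it preserves 16-byte stack alignment.
fn align_up_16(size: u64) -> u64 {
    (size + 15) & !15
}
```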
We can implement `ALLOCA` by computing the allocation size at compile time, so the frame size is still a compile-time constant. However, the implementation of `ALLOCA_HYBRID` will break this assumption. A straightforward solution is to make the alloca'd size always a multiple of 16 bytes (for the alignment requirement), and to record a *current frame size* somewhere (for restoring from an exception) - this would keep most of the above unchanged. This issue tracks the related discussion.

## Issue #16: IR Building: Confusion in `FuncRef` ID
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/16 (Yi Lin, 2017-03-23)

Currently, when building a bundle through the IR building API, for a function with Mu ID *x*, a `MuFunction` with ID *x* is created. A constant `FuncRef` pointing to the function is also created, and that constant's own ID is *x* as well. In other words, *x* is used both for the function and for the constant. I am not sure if this is a typo or a deliberate decision. I assume that everything in Mu should have a unique ID, and using one ID for both a function and a funcref constant seems confusing. @u5211824

## Issue #15: Removing unnecessary use of lock/`Option` for IR data structures
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/15 (Yi Lin, 2017-05-19)

This issue tracks unnecessary uses of locks in Zebu. The old Mu spec required mutation of the IR - for example, creating an IR node, then adding a name to it.
Thus either locks or `Option<T>` were introduced to allow mutation. The new spec makes the IR almost (if not completely) immutable, so it is possible to remove most of the locks.
Some rewriting compilation passes mutate IR nodes as well. However, we can always copy-and-update instead of mutating in place.
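Copy-and-update on an invented illustrative node type (not Zebu's actual IR structs) looks like:

```rust
// Illustrative IR node: instead of locking and mutating `name` in
// place, produce an updated copy and leave the original untouched.
#[derive(Clone, Debug, PartialEq)]
struct Node {
    name: String,
    ops: Vec<u32>,
}

fn with_name(node: &Node, name: &str) -> Node {
    Node { name: name.to_string(), ..node.clone() }
}
```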
Lock:
* [x] `name` field in `MuEntityHeader`
* [x] `ops` field in `Instruction`
`Option<T>`:
* [ ] Most of the `Option` uses in `ir.rs`

## Issue #13: In-place code generation
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/13 (Yi Lin, 2017-03-01)

The idea is to generate machine code directly, without going through some intermediate representation of the machine code. The motivation is to save allocations/space for the machine code representation. The current x86_64 assembly backend tests the idea of in-place code generation (though it is not at all necessary for an assembly backend to do this).
Issues:
* Too many eliminated moves (turned into `nop`s), which results in bad performance. One solution is to compact the code at the end (slide/copy the code to remove the 'holes').
* In-place code generation introduces extra memory overhead, as we need to save locations of registers.
* A note for the x86_64 asm backend: instruction metadata can be optimised further. For example, I am currently using `LinkedHashMap<MuID, Vec<ASMLocation>>` to store information about used and defined registers. Both `LinkedHashMap` and `Vec` are expensive; we could use a fixed-length array, or simply `use1`, `use2`, `use3`, ...
* It makes machine code level optimisations and transformations hard to implement.
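The compaction pass suggested above can be sketched as a byte-level copy that skips the `nop` holes while recording each position's new offset (so branch targets can be re-patched afterwards); hypothetical, not Zebu's code:

```rust
// Slide instruction bytes down over the holes left by eliminated
// moves. Returns the compacted code plus, for every original byte
// position, its new offset - needed to re-patch branch targets.
fn compact(code: &[u8], is_hole: &[bool]) -> (Vec<u8>, Vec<usize>) {
    let mut out = Vec::new();
    let mut new_off = Vec::with_capacity(code.len());
    for (i, &b) in code.iter().enumerate() {
        new_off.push(out.len());
        if !is_hole[i] {
            out.push(b);
        }
    }
    (out, new_off)
}
```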
Questions:
* Is it possible to emit machine code (binary) before we know the final code? For example, `jmp` may turn into different machine code depending on how big the offset is. However, any assembler has to deal with this question.
We need to discuss this more before we start implementing the JIT.

## Issue #10: Use higher byte of x86 16bits registers
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/10 (Yi Lin, 2017-06-14)

Currently the register allocator only uses the low bytes of the aliased registers: AL/BL/CL/DL. The high bytes (AH/BH/CH/DH) are wasted.

## Issue #6: IR rewrite pass before instruction selection
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/6 (Yi Lin, 2017-11-21)

Some instructions, such as `NEW`, will be expanded into a sequence of code (which may involve new blocks), and some instructions, such as `THREADEXIT`, will be expanded into a CCALL into runtime service functions. Currently this is done in the instruction selection pass, by directly expanding such instructions into machine code. A better alternative is to rewrite/expand such instructions into Mu IR before instruction selection.

## Issue #2: Granularity of RwLocks
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/2 (Kunshan Wang, 2017-06-14)

Designed for concurrency, Rust enforces the use of proper synchronisation when working with shareable memory locations. Atomic integers, Mutexes and RwLocks are a few examples.
These types can be used even if the memory locations are "immutably shared": the user can gain read-write access by obtaining the read-write lock, or using atomic operations.
At the centre of mu-impl-fast is the [VM](https://gitlab.anu.edu.au/mu/mu-impl-fast/blob/master/src/vm/vm.rs#L25) object. It contains the global IR information, including all types, constants and functions ever loaded into the micro VM. Accesses to these data structures must be properly synchronised.
Currently synchronisation is done with fine-grained locking: every HashMap is protected by an RwLock. This enables data-race-free access to the shared data structures, but it has disadvantages.
- **Potential deadlocks.** Sometimes one operation needs to have read-write access to multiple objects. It is very easy to run into the simplest [ABBA deadlock](http://cmdlinelinux.blogspot.com.au/2014/01/linux-kernel-deadlocks-and-how-to-avoid.html) if the locks are obtained in different orders by different threads.
- workaround: If "bundle loading" is the only operation that needs RW access, it can simply obtain all RW locks before performing any operations. During the operations, it is free to obtain any extra RO locks if necessary.
- **The locking is too fine-grained.** Putting locks on too many objects will increase the space overhead, and will require the user to perform more locking operations which will increase the time overhead.
- workaround: For any operation that needs RO access, holding the RO locks during the entire operation rather than frequently acquiring/releasing the locks can reduce the time overhead, but not the space overhead.
- **Using locks instead of lock-free data structures.** Bundle loading is extremely rare compared to RO accesses. In the current design, Mu-level exception handling needs to obtain RO access to the VM's stack-unwinding metadata. Although exception handling is a slow path from the user program's perspective, it is still more common than client-to-MuVM API calls. User-level exception handling should be lock-free.
Ideally, the shared data structures should be implemented with **transactional lock-free data structures that strongly biases towards fast read-only accesses**. The ideal data structure should be the [RCU](https://en.wikipedia.org/wiki/Read-copy-update)-like [multi-version concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) algorithm. Actually all Mu IR nodes are immutable by design (NOTE: Function "redefinition" is actually, by design, "adding a version to a function", so existing versions do not change. There is also no API that asks "what versions does a function have".) , so it doesn't matter if the client sees a slightly older version of the data structure: it will not see some new-coming nodes concurrently being inserted, but whatever it sees, it is the correct version that it should see. (For example, it is OK if a newly-started function runs its older version rather than the "newest version", because it is allowed, unless the client uses proper fences and synchronisations by itself.)
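A minimal Rust sketch of that RCU-like direction (illustrative field names, not the actual `VM` struct): readers grab an `Arc` snapshot cheaply, while the rare writer (bundle loading) builds a new table and publishes it with a pointer swap, so readers never hold a lock across their whole operation.

```rust
use std::sync::{Arc, RwLock};

// Readers clone an Arc to the current immutable snapshot; the write
// lock is held only for the pointer swap, never during reads.
struct FuncTable {
    snapshot: RwLock<Arc<Vec<String>>>,
}

impl FuncTable {
    fn new() -> Self {
        FuncTable { snapshot: RwLock::new(Arc::new(Vec::new())) }
    }
    fn read(&self) -> Arc<Vec<String>> {
        self.snapshot.read().unwrap().clone()
    }
    // Rare path (bundle loading): copy, update, publish. Existing
    // readers keep their old snapshot, as in RCU/MVCC.
    fn add(&self, name: &str) {
        let mut guard = self.snapshot.write().unwrap();
        let mut new = (**guard).clone();
        new.push(name.to_string());
        *guard = Arc::new(new);
    }
}
```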
But since our current goal is to have a working VM, we can postpone these optimisations and just use the status quo for now.