mu-impl-fast issues
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues
2018-09-10T06:31:05+10:00
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/60
[aarch64] Unimplemented Backend Features
2018-09-10T06:31:05+10:00 Isaac Gariano isaac@ecs.vuw.ac.nz
I noticed there are lots of things that Zebu doesn't implement, so I decided to make a list, and will try to keep it up to date. (I would like to implement the things listed here, specifically the Mu-specific ones such as traps, watchpoints, threads and stack-related things, as I feel Zebu is currently mostly just LLVM with exceptions and a garbage collector that doesn't collect.)
I'm not entirely sure what the x86-64 backend implements, so I've only listed things for aarch64 (features with a * next to them haven't been tested properly yet). I have also included things I believe are implemented on aarch64 but not on x86-64:
Types:
* [ ] `int<n>`:
  * [x] n = 1 (some arithmetic)
  * [x] * n <= 64 and n != 8, 16, 32, or 64
  * [x] * n = 128, floating point conversions
  * [ ] n > 64 and n != 128
* [ ] `struct` SSA variables
* [ ] `hybrid` SSA variables
* [ ] `array` SSA variables
* [x] `tagref64`
* [x] `threadref`
* [x] `stackref`
* [ ] `framecursorref`
* [ ] `irbuilderref`
* [ ] `vector<t n>`
Instruction Clauses:
* [ ] Keep-alive clauses
* [ ] Exception clauses
  * [x] CALL
  * [ ] binop (division by zero)
  * [ ] NEW/NEWHYBRID (allocation failure)
  * [ ] LOAD/STORE/CMPXCHG (null reference)
  * [ ] CCALL (implementation defined)
Instructions:
* [ ] `TAILCALL`:
  * [x] When the callee's stack argument size is less than or equal to the caller's
  * [ ] When the callee's stack argument size is greater than the caller's
* [ ] For unimplemented types:
  * [ ] `EXTRACTVALUE`/`INSERTVALUE`
  * [ ] `EXTRACTELEMENT`/`INSERTELEMENT`
  * [ ] `SHUFFLEVECTOR`
* [ ] Memory
  * [x] `ALLOCA`/`ALLOCAHYBRID`
  * [x] * `CMPXCHG`
  * [x] * `FENCE`
  * [ ] `ATOMICRMW`
* [ ] Traps/watchpoints
  * [ ] `TRAP`
  * [ ] `WATCHPOINT`
  * [ ] `WPBRANCH`
* [x] Thread/stack instructions
  * [x] `NEWTHREAD`
  * [x] `SWAPTSTACK`
Common Instructions
* [x] thread and stack things:
  * [x] new_stack
  * [x] kill_stack
  * [x] thread_exit
  * [x] current_stack
  * [x] set_threadlocal
  * [x] get_threadlocal
* [x] tr64.*
* [ ] futex.*
* [ ] kill_dependency
* [ ] native.* (except native.pin and native.unpin)
* [ ] meta.*
* [ ] irbuilder.*
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/58
[x86-64] Alias don't seam to override previous ones
2018-09-10T06:31:05+10:00 Isaac Gariano isaac@ecs.vuw.ac.nz
The pytest `test_heap.py::test_preserve_ref_field` succeeds when run on its own, but fails when run as part of a group (e.g. with `pytest test_heap.py`). I looked at the generated code and it's functionally identical in both cases.
All three tests in `test_heap.py` create functions called `test_fnc` and global cells called `gcl`, and then load the generated shared libraries. The first two tests are supposed to return the number 42; the last one is supposed to return 298, but actually returns 42. I therefore think it's quite possible that loading a shared library with an alias defined doesn't properly override the previous alias.
I am defining aliases using `.equiv`, like so:
```
.globl __mu_gcl
__mu_gcl:
.globl gcl
.equiv gcl, __mu_gcl
```
And the same with `test_func`.
I could try and replace the aliases with labels, but then the function would have two names, and may cause problems if we try and resolve an address to a name (which we don't currently do, except for native functions in back trace printing).
Perhaps I could use another directive (like `.set` or `.eqv`, but the docs say they're the same).
Isaac Gariano isaac@ecs.vuw.ac.nz
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/37
[x86-64] Passing a 128-bit integer to a C function using CCALL
2017-06-11T19:34:41+10:00 Isaac Gariano isaac@ecs.vuw.ac.nz
Passing a 128-bit integer to a C function (using CCALL) doesn't work on x86-64.
I wrote a test to check that 128-bit integers are passed correctly when calling functions (it is designed to cause the last argument to be placed on the stack on x86, but on aarch64 it should be in a register).
arg_overflow.uir:
```
.funcsig test_arg_overflow_sig = () -> ()
.funcdef my_main<()->()>
{
entry():
CCALL #DEFAULT <ufuncptr<test_arg_overflow_sig> test_arg_overflow_sig> <ufuncptr<test_arg_overflow_sig>>EXTERN "c_test_arg_overflow" ()
CALL <test_arg_overflow_sig> mu_test_arg_overflow()
RET
}
.funcsig arg_overflow_sig = (int<64> int<128> int<128> int<128>) -> ()
.funcdef mu_test_arg_overflow<test_arg_overflow_sig>
{
entry():
int128_0 = ADD <int<128>> <int<128>>0 <int<128>>0
int128_F = ADD <int<128>> <int<128>>0 <int<128>>0xFFFFFFFFFFFFFFFF0000000000000000
CCALL #DEFAULT <ufuncptr<arg_overflow_sig> arg_overflow_sig> <ufuncptr<arg_overflow_sig>>EXTERN "arg_overflow" (<int<64>>0 int128_0 int128_0 int128_F)
RET
}
```
It needs to be compiled with the following C code:
arg_overflow.c
```
#include <stdint.h>
#include <stdio.h>
void arg_overflow(uint64_t a, __int128_t b, __int128_t c, __int128_t d) {
printf("d = %016lX%016lX\n", (uint64_t)(d >> 64), (uint64_t)d);
}
void c_test_arg_overflow()
{
arg_overflow(0, 0, 0, (__int128_t)(0xFFFFFFFFFFFFFFFF) << 64);
}
```
On x86-64, running `./muc -r -f my_main arg_overflow.uir arg_overflow/arg_overflow` (using the latest commit in the aarch64 branch) fails to compile, giving the error:
```
thread '<unnamed>' panicked at 'not yet implemented', src/compiler/backend/arch/x86_64/inst_sel.rs:3181
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::_print
at /checkout/src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/sys_common/backtrace.rs:60
at /checkout/src/libstd/panicking.rs:355
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:371
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:549
5: std::panicking::begin_panic
6: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::emit_c_call_ir
7: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::instruction_select
8: <mu::compiler::backend::x86_64::inst_sel::InstructionSelection as mu::compiler::passes::CompilerPass>::visit_function
9: mu::compiler::passes::CompilerPass::execute
10: mu::compiler::Compiler::compile
11: mu::vm::vm::VM::make_boot_image_internal
12: mu::vm::api::api_bridge::_forwarder__MuCtx__make_boot_image
13: main
14: __libc_start_main
15: _start
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)
```
On aarch64 it compiles (well, Zebu fails at linking, but it works if I add `arg_overflow.c` to the clang command) and runs correctly, printing:
```
d = FFFFFFFFFFFFFFFF0000000000000000
d = FFFFFFFFFFFFFFFF0000000000000000
```
Yi Lin
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/72
[x86_64] code patching atomicity
2017-08-04T17:48:08+10:00 Yi Lin
Commit a49d8ab64c2e4e30ddeaf036c74d45fa01fb701b added a naive code patching mechanism to the JIT compiler: it simply rewrites the code array. This is problematic, as it leaves the instruction in an incoherent state until the instruction is completely overwritten.
One solution is to first patch the first byte to `INT3`. The compiler then patches the remaining bytes before it patches the first byte to that of the expected instruction. If the instruction gets executed while patching is in progress, it triggers an `INT3` trap, and execution enters the signal handler with `SIGTRAP`. The signal handler checks whether the current instruction is `INT3` (`0xCC`); if so, it sets the program counter back and re-executes the instruction. Thus execution will not proceed until the patching is finished.
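The write order described above can be sketched as follows. This is only an illustration under stated assumptions: the code array is modelled as a slice of `AtomicU8`, the name `patch_instruction` is hypothetical, and the trap handler, executable-memory permissions and any cross-core instruction-stream synchronisation a real JIT would need are not shown.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// One-byte x86 breakpoint opcode (INT3).
const INT3: u8 = 0xCC;

/// Overwrite the instruction at `offset` with `new_inst` (hypothetical helper).
/// A concurrent reader sees either the old first byte, INT3, or the new first
/// byte followed by the complete new tail; it never sees a new first byte
/// followed by a half-written tail.
pub fn patch_instruction(code: &[AtomicU8], offset: usize, new_inst: &[u8]) {
    assert!(!new_inst.is_empty());
    assert!(offset + new_inst.len() <= code.len());

    // 1. Guard the instruction: anyone executing it now traps.
    code[offset].store(INT3, Ordering::SeqCst);

    // 2. Write every byte except the first.
    for (i, byte) in new_inst.iter().enumerate().skip(1) {
        code[offset + i].store(*byte, Ordering::SeqCst);
    }

    // 3. Publish the instruction by replacing INT3 with the real first byte.
    code[offset].store(new_inst[0], Ordering::SeqCst);
}
```

The first byte is written last so that it acts as the commit point of the patch.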
Once I start implementing a signal handler for Zebu, I will implement this.
Yi Lin
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/31
[x86_64] floating point/int128 conversion
2017-06-13T13:38:11+10:00 Yi Lin
unimplemented for now
Yi Lin
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/20
[x86_64] int<1> arithmetics return wrong result
2017-05-01T15:36:35+10:00 Yi Lin
Currently Zebu treats int<1> the same as int<8>. This is fine if the client only uses int<1> as a boolean. If the client uses int<1> arithmetic operations, Zebu returns the wrong result.
We should either explicitly forbid int<1> arithmetic or implement it.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/67
[x86_64] JIT backend
2017-10-09T17:34:52+11:00 Yi Lin
I will probably be working on a JIT backend for x86_64. Currently I am considering using Intel XED (https://intelxed.github.io/), which is a C library from Intel under the Apache 2.0 license for encoding x86/x86_64 instructions. rust-bindgen (https://github.com/servo/rust-bindgen) can generate Rust bindings from C/C++ headers, with which I can easily use the library in our implementation.
As I implement the JIT backend, I expect a lot of changes to the existing code (as we switch from an AOT focus to supporting both). So I suggest we postpone the JIT backend for aarch64 for a while, until I have reached some milestones for x86_64, such as:
* JIT compile an add function
* JIT compile a function that contains a loop
* JIT compile a function that contains a call
* JIT compile a function that uses VM runtime
Whether to do JIT or AOT compilation is a build-time option for Zebu. Please let me know if you think this is problematic.
Yi Lin
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/22
[x86_64] Status flags undefined for mul/div/idiv
2017-06-29T12:53:30+10:00 Yi Lin
The following table summarizes how Mu integer binops are mapped to x86_64 instructions, and how those instructions affect the status flags.
| Mu IR | X86_64 Inst | #N (signed) | #Z (zero) | #C (carry) | #V (overflow) |
|:-----: |:-----------: |:-----------: |:---------: |:----------: |:-------------: |
| ADD | add | ✓ | ✓ | ✓ | ✓ |
| SUB | sub | ✓ | ✓ | ✓ | ✓ |
| AND | and | ✓ | ✓ | - | - |
| OR | or | ✓ | ✓ | - | - |
| XOR | xor | ✓ | ✓ | - | - |
| MUL | mul | ✗ | ✗ | ✓ | ✓ |
| UDIV | div | ✗ | ✗ | - | - |
| SDIV | idiv | ✗ | ✗ | - | - |
| UREM | div | ✗ | ✗ | - | - |
| SREM | idiv | ✗ | ✗ | - | - |
| SHL | shl | ✓ | ✓ | - | - |
| LSHR | shr | ✓ | ✓ | - | - |
| ASHR | sar | ✓ | ✓ | - | - |
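For the rows marked ✗ in the table, the #N and #Z flags have to be derived from the result value rather than read from the hardware flags. The sketch below shows that derivation in plain Rust; `NzFlags` and `nz_flags` are hypothetical names, not part of Zebu. A backend would more likely emit a flag-setting instruction on the result register (for example `test reg, reg`, which sets SF and ZF from its operand), which is an assumption here rather than something the issue states.

```rust
/// #N and #Z computed from an integer result (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct NzFlags {
    pub n: bool, // most significant bit of the result is set
    pub z: bool, // result is zero
}

/// Derive #N and #Z from `result`, interpreted as a `width`-bit integer
/// (1 <= width <= 64). Bits above `width` are ignored.
pub fn nz_flags(result: u64, width: u32) -> NzFlags {
    assert!((1..=64).contains(&width));
    let mask = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    let value = result & mask;
    NzFlags {
        n: (value >> (width - 1)) & 1 == 1,
        z: value == 0,
    }
}
```

For example, a 32-bit `MUL` whose product is `0x8000_0000` would report #N set and #Z clear.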
`mul`, `div` and `idiv` leave the sign flag (#N) and the zero flag (#Z) undefined. We will need to generate extra code to check the result and set those flags.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/64
[x86_64] Unimplemented Backend Features
2017-07-20T20:57:04+10:00 Yi Lin
This issue tracks Mu specification coverage in the x86_64 backend.
Types:
* [ ] `int<n>`:
  * [ ] n = 1 (some arithmetic)
  * [ ] * n <= 64 and n != 8, 16, 32, or 64
  * [ ] * n = 128, floating point conversions
  * [ ] n > 64 and n != 128
* [ ] `struct` SSA variables
* [ ] `hybrid` SSA variables
* [ ] `array` SSA variables
* [ ] `tagref64`
* [ ] `threadref`
* [ ] `stackref`
* [ ] `framecursorref`
* [ ] `irbuilderref`
* [ ] `vector<t n>`
Instruction Clauses:
* [ ] Keep-alive clauses
* [ ] Exception clauses
  * [x] CALL
  * [ ] binop (division by zero)
  * [ ] NEW/NEWHYBRID (allocation failure)
  * [ ] LOAD/STORE/CMPXCHG (null reference)
  * [ ] CCALL (implementation defined)
Instructions:
* [ ] `TAILCALL`
* [ ] For unimplemented types:
  * [ ] `EXTRACTVALUE`/`INSERTVALUE`
  * [ ] `EXTRACTELEMENT`/`INSERTELEMENT`
  * [ ] `SHUFFLEVECTOR`
* [ ] Memory
  * [ ] `ALLOCA`/`ALLOCAHYBRID`
  * [ ] * `CMPXCHG`
  * [x] * `FENCE`
  * [ ] `ATOMICRMW`
* [ ] Traps/watchpoints
  * [ ] `TRAP`
  * [ ] `WATCHPOINT`
  * [ ] `WPBRANCH`
* [ ] Thread/stack instructions
  * [ ] `NEWTHREAD`
  * [ ] `SWAPTSTACK`
Common Instructions
* [ ] thread and stack things:
  * [ ] current_stack
  * [ ] thread_exit
  * [ ] new_stack
* [ ] tr64.*
* [ ] futex.*
* [ ] kill_dependency
* [ ] native.* (except native.pin and native.unpin)
* [ ] meta.*
* [ ] irbuilder.*
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/32
Calling function defined in another bundle does not work
2017-06-07T00:22:48+10:00 Isaac Gariano isaac@ecs.vuw.ac.nz
I am trying to define a function in one bundle and call it in another, and I'm getting an error.
Here's an example that will reproduce the problem:
file argc_exit.uir:
```
.funcsig @exit_sig = (int<32>) -> ()
.funcdef @argc.exit <@exit_sig>
{
%entry(<int<32>>%arg):
CCALL #DEFAULT <ufuncptr<@exit_sig> @exit_sig> <ufuncptr<@exit_sig>>EXTERN "exit"(%arg)
RET
}
```
file argc_inline.uir:
```
.typedef %char = int<8>
.funcdef @my_main <(int<32> uptr<uptr<%char>>)->(int<32>)> VERSION @my_main_v1
{
%entry(<int<32>>%argc <uptr<uptr<%char>>>%argv):
CALL <(int<32>)->()> @argc.exit (%argc)
RET <int<32>>1
}
```
Then using my mu-tool-compiler:
`./muc -r -f my_main argc_exit.uir argc_inline.uir emit/argc`
(use `-c` if you want to see the API calls it uses).
`thread '<unnamed>' panicked at 'Operand 1013' is neither a local var or a global var', src/vm/api/api_impl/muirbuilder.rs:1290 stack backtrace:`
(the symbol with ID 1013 is `@argc.exit`).
I tracked the error down, and it appears to be coming from the function `get_treenode` in `src\vm\api\api_impl\muirbuilder.rs`.
My guess is that the API implementation only looks for things defined in the current bundle and not in other bundles.
However, from my understanding of the Mu spec, you should be able to refer to entities declared in previously loaded bundles.
A workaround is to combine both files into the same bundle, for example with `./muc -r -f my_main <(cat argc_exit.uir && cat argc_inline.uir) emit/argc`.
Kunshan Wang
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/50
Capture stack overflow/underflow signal
2018-09-10T06:31:05+10:00 Yi Lin
We are guarding pages above and below the Mu stack, thus an overflow/underflow would trigger a write/read protection fault. We should register a handler to catch the signal and identify its cause (overflow/underflow or another kind of segfault).
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/85
DYLD_LIBRARY_PATH dependent tests fails on macOS with SIP enabled
2017-09-19T15:39:44+10:00 Zixian Cai
https://groups.google.com/forum/#!topic/caffe-users/waugt62RQMU
https://github.com/oracle/node-oracledb/issues/231
https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/RunpathDependentLibraries.html
We should encode these differently.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/62
Eliminating jump to jump pattern
2017-07-20T00:19:37+10:00 Yi Lin
For code such as:
```
jmp L1
...
L1:
jmp L2
```
we should detect this (probably in a peephole optimization) and replace it with:
```
jmp L2
...
```
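A minimal sketch of such a peephole pass, assuming a toy instruction list rather than Zebu's real machine-code representation (the `Inst` enum and the function names are hypothetical). It records every label that is immediately followed by an unconditional jump, then retargets each jump to the end of that chain; a visited set guards against cycles such as `L1: jmp L1`.

```rust
use std::collections::{HashMap, HashSet};

/// Toy instruction type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Inst {
    Label(String),
    Jmp(String),
    Other(String),
}

/// Map each label that is directly followed by `jmp T` to `T`.
fn trampolines(code: &[Inst]) -> HashMap<String, String> {
    let mut map = HashMap::new();
    for pair in code.windows(2) {
        if let (Inst::Label(label), Inst::Jmp(target)) = (&pair[0], &pair[1]) {
            map.insert(label.clone(), target.clone());
        }
    }
    map
}

/// Follow the chain of trampolines starting at `label`, stopping on a cycle.
fn final_target(label: &str, map: &HashMap<String, String>) -> String {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut current = label;
    while let Some(next) = map.get(current) {
        if !seen.insert(current) {
            break; // already visited: the jumps form a cycle
        }
        current = next.as_str();
    }
    current.to_string()
}

/// Retarget every unconditional jump past any jump-only blocks.
pub fn thread_jumps(code: &[Inst]) -> Vec<Inst> {
    let map = trampolines(code);
    code.iter()
        .map(|inst| match inst {
            Inst::Jmp(target) => Inst::Jmp(final_target(target, &map)),
            other => other.clone(),
        })
        .collect()
}
```

The sketch leaves the now-redundant `L1: jmp L2` block in place; removing it would be a separate dead-code step once no jump refers to `L1`.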
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/42
Exception handling for native frames
2017-09-09T14:19:11+10:00 Yi Lin
The current implementation will fail because it is unable to find information about native frames.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/89
fannkuchredux slowdown after commit de9d31255831eb89299d93960a3c7a4de514f3a8
2017-10-09T16:28:58+11:00 Yi Lin
de9d31255831eb89299d93960a3c7a4de514f3a8 corrected the logic in the x86 backend that decides whether an integer constant is a valid x86 immediate (32 bits). It is intended to allow more constants to be used as x86 immediates. However, mubench (http://squirrel.anu.edu.au/mubench/) shows a slowdown for fannkuchredux after the commit. I need to investigate this.
Yi Lin
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/46
Fix frame slot offset after register allocation
2017-06-28T17:27:09+10:00 Yi Lin
Zebu does the following things regarding frame size, in this order:
1. emit code to save callee-saved registers and reserve frame slots for them
2. do register allocation, spill registers and reserve frame slots for them
3. rewrite code, and redo register allocation until finished
4. figure out which callee-saved registers are not used, and remove the saving/restoring code for them
5. patch the frame size
We do not know the actual frame size until step 3 is done. However, in step 2, we need to make assumptions and emit code about frame slots.
Currently, though we patch the frame size at the end, we do not deduct the space initially reserved for unused callee-saved registers.
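To make the missing deduction concrete, here is a small sketch of compacting a frame after the unused callee-saved slots are known. Everything in it is an assumption made for illustration: slots are listed in allocation order, offsets are negative displacements from the frame pointer, and `Slot`, `SlotKind` and `compact_frame` are hypothetical names, not Zebu's frame data structures.

```rust
/// What a frame slot holds (hypothetical model).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SlotKind {
    CalleeSaved { used: bool },
    Spill,
}

/// A frame slot at a negative offset from the frame pointer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Slot {
    pub kind: SlotKind,
    pub offset: i32,
    pub size: i32,
}

/// Drop slots of unused callee-saved registers and pack the rest.
/// Returns `(old_offset, new_offset)` for each surviving slot, plus the
/// compacted frame size in bytes (before any alignment padding).
pub fn compact_frame(slots: &[Slot]) -> (Vec<(i32, i32)>, i32) {
    let mut used_bytes = 0;
    let mut remap = Vec::new();
    for slot in slots {
        if matches!(slot.kind, SlotKind::CalleeSaved { used: false }) {
            continue; // its space is given back to the frame
        }
        used_bytes += slot.size;
        remap.push((slot.offset, -used_bytes));
    }
    (remap, used_bytes)
}
```

The returned pairs are what a later pass would use to rewrite the spill locations already emitted in the code; rounding the compacted size up to the stack alignment would happen afterwards.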
We should offset all the frame slots, and patch the spill locations in the code.
Yi Lin
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/21
GC related C functions return inaccurate result
2017-11-21T13:46:42+11:00 Yi Lin
`gc/src/heap/gc/clib_x64.c` contains C functions for GC, such as `get_registers()`, which contains inline assembly to save all values in general purpose registers into an array. However, C compilers may generate code that changes the registers before they are saved.
We may want to rewrite the function in assembly instead of C. I also believe it is reasonable to eliminate all C functions in the code base and replace them with assembly (all the C functions are pretty simple). Ideally we want only Rust code and assembly in the code base.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/2
Granularity of RwLocks
2017-06-14T00:11:57+10:00 Kunshan Wang
Designed for concurrency, Rust enforces the use of proper synchronisation when working with shareable memory locations. Atomic integers, Mutexes and RwLocks are a few examples. These types can be used even if the memory locations are "immutably shared": the user can gain read-write access by obtaining the read-write lock, or using atomic operations.
At the centre of mu-impl-fast is the [VM](https://gitlab.anu.edu.au/mu/mu-impl-fast/blob/master/src/vm/vm.rs#L25) object. It contains the global IR information, including all types, constants and functions ever loaded into the micro VM. Accesses to these data structures must be properly synchronised.
Currently the way of synchronisation is fine-grained locking. Every HashMap is protected by a RwLock. This will enable data-race-free access to the shared data structures, but it does have its disadvantages.
- **Potential deadlocks.** Sometimes one operation needs to have read-write access to multiple objects. It is very easy to run into the simplest [ABBA deadlock](http://cmdlinelinux.blogspot.com.au/2014/01/linux-kernel-deadlocks-and-how-to-avoid.html) if the locks are obtained in different orders by different threads.
  - workaround: If "bundle loading" is the only operation that needs RW access, it can simply obtain all RW locks before performing any operations. During the operations, it is free to obtain any extra RO locks if necessary.
- **The locking is too fine-grained.** Putting locks on too many objects will increase the space overhead, and will require the user to perform more locking operations, which will increase the time overhead.
  - workaround: For any operation that needs RO access, holding the RO locks during the entire operation rather than frequently acquiring/releasing the locks can reduce the time overhead, but not the space overhead.
- **Using locks instead of lock-free data structures.** Bundle loading is extremely rare compared to RO accesses. In the current design, Mu-level exception handling needs to obtain RO access to the VM stack-unwinding metadata. Although exception handling is slow-path from the user program's perspective, it is still more common than client-to-MuVM API calls. User-level exception handling should be lock-free.
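The first workaround in the list (the one writer takes every RW lock up front, always in the same order) might look like the sketch below. The `Vm` struct here is a two-table stand-in with hypothetical field and method names, not the real `VM` object; it only illustrates that a single fixed acquisition order rules out the ABBA pattern.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Stand-in for the VM's global tables (hypothetical fields).
pub struct Vm {
    types: RwLock<HashMap<u32, String>>,
    funcs: RwLock<HashMap<u32, String>>,
}

impl Vm {
    pub fn new() -> Self {
        Vm {
            types: RwLock::new(HashMap::new()),
            funcs: RwLock::new(HashMap::new()),
        }
    }

    /// Bundle loading is the only writer. It acquires all write locks
    /// before touching anything, always in the order `types`, then `funcs`.
    pub fn load_bundle(&self, types: &[(u32, &str)], funcs: &[(u32, &str)]) {
        let mut type_table = self.types.write().unwrap();
        let mut func_table = self.funcs.write().unwrap();
        for (id, name) in types {
            type_table.insert(*id, name.to_string());
        }
        for (id, name) in funcs {
            func_table.insert(*id, name.to_string());
        }
    }

    /// Read-only access takes a single read lock.
    pub fn func_name(&self, id: u32) -> Option<String> {
        self.funcs.read().unwrap().get(&id).cloned()
    }
}
```

Readers that need several tables would have to follow the same order, which is exactly the discipline that becomes hard to enforce as the number of locks grows.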
Ideally, the shared data structures should be implemented with **transactional lock-free data structures that are strongly biased towards fast read-only accesses**. The ideal data structure would use an [RCU](https://en.wikipedia.org/wiki/Read-copy-update)-like [multi-version concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) algorithm. In fact, all Mu IR nodes are immutable by design (NOTE: function "redefinition" is, by design, "adding a version to a function", so existing versions do not change. There is also no API that asks "what versions does a function have".), so it doesn't matter if the client sees a slightly older version of the data structure: it will not see new nodes that are concurrently being inserted, but whatever it sees is a correct version. (For example, it is OK if a newly started function runs its older version rather than the "newest version", because this is allowed unless the client itself uses proper fences and synchronisation.)
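The "readers may see a slightly older version" idea can be illustrated with immutable snapshots behind an `Arc`. Note that this sketch is not lock-free: it still takes a short `RwLock` to swap the pointer, where a real RCU-style design would use an atomic pointer swap. `FuncTable` and its methods are hypothetical names used only for this illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// A table whose contents are replaced wholesale, never mutated in place.
pub struct FuncTable {
    current: RwLock<Arc<HashMap<u32, String>>>,
}

impl FuncTable {
    pub fn new() -> Self {
        FuncTable {
            current: RwLock::new(Arc::new(HashMap::new())),
        }
    }

    /// Readers grab the current version and then work without any lock.
    pub fn snapshot(&self) -> Arc<HashMap<u32, String>> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// The writer copies the current version, extends the copy, and
    /// publishes it. Existing snapshots are left untouched.
    pub fn add(&self, id: u32, name: &str) {
        let mut guard = self.current.write().unwrap();
        let mut next: HashMap<u32, String> = (**guard).clone();
        next.insert(id, name.to_string());
        *guard = Arc::new(next);
    }
}
```

A reader holding an old snapshot keeps seeing a complete, consistent table even while a bundle is being loaded, at the cost of copying the map on every write, which fits a workload where loading is rare.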
But since our current goal is to have a working VM, we can postpone these optimisations and just use the status quo for now.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/29
Implement SWITCH with switch table
2017-06-06T15:10:22+10:00 Yi Lin
Currently the compiler generates cascading conditional branches for the SWITCH instruction. We should consider using a switch table if there are many case arms.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/17
Implementing ALLOCA/ALLOCA_HYBRID
2017-07-19T15:07:32+10:00 Yi Lin
Currently the compiler assumes that the frame size is constant at compile time.
For *x86_64*, the stack pointer needs to be 16-byte aligned before a function call. The compiler ensures this as follows:
* `rbp` is always 16-byte aligned.
* the frame size is a multiple of 16 bytes (it is aligned up to 16 bytes if it is not, see `frame.rs`).
* if any call argument is passed on the stack, a padding value is pushed to the stack if necessary, so that `rsp` is still 16-byte aligned after pushing the call arguments.
* restoring from an exception sets `rsp` from `rbp` and the constant frame size.
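The rounding rule in the list above, and what a variable-size allocation would add to it, can be sketched as follows. This is an illustration, not the code in `frame.rs`: `align_up`, `Frame` and its methods are hypothetical names, and the running `current_size` field is an assumption about how a frame could remember its size once it is no longer a compile-time constant.

```rust
/// Stack alignment required before a call on x86_64.
pub const STACK_ALIGN: usize = 16;

/// Round `size` up to the next multiple of `align` (a power of two).
pub fn align_up(size: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}

/// A frame whose size can grow at run time (hypothetical model).
pub struct Frame {
    current_size: usize,
}

impl Frame {
    /// The compile-time part of the frame, rounded up to 16 bytes.
    pub fn new(raw_fixed_size: usize) -> Self {
        Frame {
            current_size: align_up(raw_fixed_size, STACK_ALIGN),
        }
    }

    /// Reserve `bytes` on the stack, padded so the stack pointer stays
    /// 16-byte aligned. Returns the number of bytes actually reserved.
    pub fn alloca(&mut self, bytes: usize) -> usize {
        let padded = align_up(bytes, STACK_ALIGN);
        self.current_size += padded;
        padded
    }

    /// Size to use when recomputing `rsp` from `rbp`.
    pub fn current_size(&self) -> usize {
        self.current_size
    }
}
```

Because every reservation is a multiple of 16 bytes, the alignment argument for calls is unchanged; only the restore path has to read the recorded size instead of a constant.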
We can implement `ALLOCA` by computing the allocation size at compile time, so that the frame size is still a compile-time constant. However, the implementation of `ALLOCA_HYBRID` will break this assumption. A straightforward solution is to make the alloca'd size always a multiple of 16 bytes (for the alignment requirement), and record a *current frame size* somewhere (for restoring from an exception); this would keep most of the above unchanged. This issue tracks related discussion.
Yi Lin