mu-impl-fast issues
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues
2017-07-12T10:49:35+10:00

[aarch64] Converting from floating-point overflow issue
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/54
2017-07-12T10:49:35+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

Currently the conversions from floating point (FPTOSI, FPTOUI) only handle overflow correctly (i.e. when the source is larger or smaller than the largest/smallest value representable in the destination) for 32-bit and 64-bit destinations, and for 128-bit (I should probably test this).
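A minimal sketch of the distinction at stake, assuming that "working correctly" on overflow means saturating to the destination range the way the aarch64 `FCVTZS`/`FCVTZU` instructions do (the issue itself does not spell out the expected value). The helper names are hypothetical and are not Zebu code. It contrasts converting through a 64-bit result and then keeping the low bits against converting directly to the narrow type:

```python
import math


def fp_to_int_saturating(value: float, bits: int, signed: bool) -> int:
    """Convert toward zero, clamping to the destination range (NaN maps to 0)."""
    if math.isnan(value):
        return 0
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if value <= lo:
        return lo
    if value >= hi:
        return hi
    return math.trunc(value)


def truncate_to_bits(value: int, bits: int, signed: bool) -> int:
    """Keep the low `bits` bits, reinterpreting them as signed when requested."""
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


# 300.0 fits in 64 bits, so the wide conversion does not saturate.
wide = fp_to_int_saturating(300.0, 64, signed=True)
print(truncate_to_bits(wide, 8, signed=True))       # 44: low 8 bits of 300
print(fp_to_int_saturating(300.0, 8, signed=True))  # 127: saturated int<8>
```

Under this assumption, a narrow destination needs its own range check (or a clamp after the wide conversion) rather than a plain truncation.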
For other sizes it just truncates the result of the 32-bit/64-bit operation, which does not produce the correct value if the floating-point value overflows the smaller type.

Assignee: Isaac Gariano (isaac@ecs.vuw.ac.nz)

Exception Handling when arguments are passed on the stack
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/53
2017-07-12T10:49:35+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

When arguments are passed on the stack to a call whose exception clause (catch block) is executed because an exception was thrown, the stack pointer is incorrectly restored. This can be demonstrated with the simple program:
```
.funcsig stack_sig = (int<64> int<64> int<64> int<64> int<64> int<64> int<64>)->()
.funcdef stack_args <stack_sig>
{
entry(<int<64>> v0 <int<64>> v1 <int<64>> v2 <int<64>> v3 <int<64>> v4 <int<64>> v5 <int<64>> v6):
THROW <ref<void>> NULL
}
.funcdef test_except_stack_args <main_sig>
{
entry(<int<32>>argc <uptr<uptr<char>>>argv):
CALL <stack_sig> stack_args(<int<32>>0 <int<32>>1 <int<32>>2 <int<32>>3 <int<32>>4 <int<32>>5 <int<32>>6)
EXC (exit(<int<32>> 0) exit(<int<32>> 1))
exit(<int<32>> status):
RET status
}
```
(testable with `pytest tests/test_muc/test_simple.py::test_except_stack_args`).
This segfaults on x86-64 (which has only 6 argument registers, so the call to `stack_args` passes some arguments on the stack), but it works as expected on aarch64 (which has 8 argument registers, so nothing is passed on the stack).

Assignee: Isaac Gariano (isaac@ecs.vuw.ac.nz)

Use make_boot_image() for all Zebu tests
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/52
2018-09-10T06:31:05+10:00
Author: Yi Lin

Because `make_boot_image()` was not available in early development, the tests currently generate and link code in several different ways:
1. use `compile_to_sharedlib()` (as exposed in API, but not in the Mu spec) to link generated code to a dynamic library
1. use `make_boot_image()` (as exposed in API) to link generated code to an executable
1. use `compile_fnc()` to link generated code to a dynamic library for *cargo tests*
1. manually compile code, persist vm, and link for *cargo tests*
Currently 1 and 2 use `make_boot_image()` internally, but the others do not.

Since `make_boot_image()` in Zebu can generate either a dynamic library or an executable (depending on the output file name), we can make all the tests use `make_boot_image()`. This will unify linking across all tests and make future changes easier.

Memory order in store/load API calls
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/51
2018-09-10T06:31:05+10:00
Author: Yi Lin

The API calls `load()`/`store()` take a memory order as an argument. However, I am uncertain whether we need to do anything special for different memory orders. The current implementation just does a plain load/store in `vm.handle_load()` and `vm.handle_store()`.

Capture stack overflow/underflow signal
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/50
2018-09-10T06:31:05+10:00
Author: Yi Lin

We guard the pages above and below the Mu stack, so an overflow/underflow triggers a write/read protection fault. We should register a handler to catch the signal and identify its cause (overflow/underflow or another segfault).

Stack guarded page may not be able to detect overflow/underflow if frame size is larger than page size.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/49
2018-09-10T06:31:06+10:00
Author: Yi Lin

Currently we guard (write/read protect) the two pages above and below the stack memory, so that any write/read to those pages tells us an overflow/underflow has happened.
However, if a frame is larger than the guard page (e.g. an `ALLOCA` of a large piece of stack memory that is never accessed), we may skip the guard page entirely, and the stack keeps growing beyond the guarded boundary.

Set user-defined thread local when launching boot image
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/48
2019-04-09T16:39:56+10:00
Author: Yi Lin

Currently Zebu assumes there is no user-defined thread local, because at the time the code was written we could not persist heap objects.
However, nothing prevents us from implementing this now.

Assignee: Yi Lin

Trace scheduling
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/47
2017-09-12T16:32:51+10:00
Author: Yi Lin

In the fib performance test (`python3 -m mubench local ./example/test_mu_fib.yml`), code generated by Zebu is significantly slower than clang's. The main reason seems to be that we generate a poor trace for the code, which causes 3 extra jumps (unconditional or conditional).
[fib-zebu.s](/uploads/7c4e865b48b98ce298c7d65609763241/fib-zebu.s)[fib-c_O1.s](/uploads/3582548c5971039e669ffa73014a74a6/fib-c_O1.s)
Currently Zebu generates the basic block trace based on branch probability. When the branching probability is 50% (the default), it picks blocks more or less arbitrarily. We should have a pass that uses heuristics to generate the trace.

Assignee: Yi Lin

Fix frame slot offset after register allocation
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/46
2017-06-28T17:27:09+10:00
Author: Yi Lin

Zebu does the following things regarding frame size, in this order:
1. emit code to save callee saved registers and reserve frame slots for them
1. do register allocation, spill registers and reserve frame slots for them
1. rewrite code, and redo register allocation until finished
1. figure out which callee saved registers are not used, and remove the saving/restoring code for them
1. patch the frame size
We do not know the actual frame size until 3 is done. However, in 2 we already need to make assumptions about frame slots and emit code that uses them.
Currently, although we patch the frame size at the end, we do not deduct the space initially reserved for unused callee-saved registers.
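The mismatch can be sketched with a toy frame model. Everything here is an assumption made for illustration, not Zebu's actual layout: 8-byte slots, callee-saved slots placed directly below the frame pointer followed by spill slots, and the register and spill names.

```python
SLOT = 8  # bytes per frame slot (assumed)


def initial_layout(callee_saved, spills):
    """Assign frame-pointer-relative offsets: callee-saved slots first, then spills."""
    return {name: -SLOT * i
            for i, name in enumerate(callee_saved + spills, start=1)}


def compact_layout(offsets, unused):
    """Drop the slots of unused callee-saved registers and close the gaps."""
    # Walk slots from the one nearest the frame pointer downwards.
    ordered = sorted(offsets, key=offsets.get, reverse=True)
    kept = [name for name in ordered if name not in unused]
    return {name: -SLOT * i for i, name in enumerate(kept, start=1)}


def frame_size(offsets):
    """Frame size in bytes, ignoring any alignment padding."""
    return SLOT * len(offsets)


before = initial_layout(["rbx", "r12", "r13"], ["spill0", "spill1"])
after = compact_layout(before, unused={"r12", "r13"})

print(frame_size(before), frame_size(after))  # 40 24
# Old-to-new offsets that already-emitted spill code would need patched in.
print({before[n]: after[n] for n in ("spill0", "spill1")})  # {-32: -16, -40: -24}
```

In this model the two unused callee-saved slots account for 16 of the 40 bytes, and every spill access emitted during register allocation refers to an offset that is stale once those slots are removed.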
We should offset all the frame slots, and patch the spilled locations in the code.

Assignee: Yi Lin

mu-perf-benchmarks fib: movq $-2,%ECX
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/45
2017-07-12T10:49:35+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

When trying to run my mu_fib_fast benchmark (`python3 -m mubench local ./example/test_mu_fib.yml`, in the latest version of mu-perf-benchmarks) on x86-64 I get the following error:
```
INFO - executing: "clang" "example/fib_mu_fast-emit/mubench$cb_init.s" "example/fib_mu_fast-emit/mubench$cb_begin.s" "example/fib_mu_fast-emit/mubench$cb_end.s" "example/fib_mu_fast-emit/mubench$cb_report.s" "example/fib_mu_fast-emit/clockcb$tspec2dbl.s" "example/fib_mu_fast-emit/fib.s" "example/fib_mu_fast-emit/entry.s" "example/fib_mu_fast-emit/context.s" "example/fib_mu_fast-emit/main.c" "/home/isaacg/mu-impl-fast/target/release/libmu.a" "-ldl" "-lrt" "-lm" "-lpthread" "-rdynamic" "-o" "/root/mu-perf-benchmarks/example/fib-mu_fast"
INFO - ---out---
INFO -
INFO - ---err---
INFO - example/fib_mu_fast-emit/fib.s:44:11: error: invalid operand for instruction
movq $-1,%ECX
^~~~
example/fib_mu_fast-emit/fib.s:52:11: error: invalid operand for instruction
movq $-2,%ECX
^~~~
thread '<unnamed>' panicked at 'assertion failed: output.status.success()', src/testutil/mod.rs:41
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::_print
at /checkout/src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/sys_common/backtrace.rs:60
at /checkout/src/libstd/panicking.rs:355
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:371
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:549
5: std::panicking::begin_panic
6: mu::testutil::exec
7: mu::testutil::aot::link_primordial
8: mu::vm::vm::VM::make_boot_image_internal
9: mu::vm::api::api_bridge::_forwarder__MuCtx__make_boot_image
10: main
11: __libc_start_main
12: _start
fatal runtime error: failed to initiate panic, error 5
```
It ran without error on aarch64 (assuming John's code actually checks that fib produces the right result).

Assignee: Yi Lin

Implementing TagRef64
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/44
2017-07-19T15:00:38+10:00
Author: Yi Lin

reference: https://nikic.github.io/2012/02/02/Pointer-magic-for-efficient-dynamic-value-representations.html

Assignee: Yi Lin

Compiled Exception table isn't created when using a shared mu library
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/43
2017-07-12T10:49:35+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

Currently the compiled exception table is constructed in `vm_resume`; however, this does not appear to be run for shared libraries (at least it isn't in the test_rpython:throw_catch test).
We need to create this table (or update it) whenever we load new libraries generated by the compiler (but this must be done after loading and relocation, so that the calls to `dlsym` get the correct addresses of callsite and catch labels).

Assignee: Yi Lin

Exception handling for native frames
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/42
2017-09-09T14:19:11+10:00
Author: Yi Lin

The current implementation fails because it cannot find information about native frames.

persisting VM natively
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/41
2017-07-12T10:49:35+10:00
Author: Yi Lin

We now use Rust's `rustc_serialize` to persist the VM as a JSON string in the boot image. This clearly imposes a large overhead on both boot image size and loading time. We should persist the VM in a native and relocatable way.

Hand crafted Fib bundle compiling failure
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/40
2017-07-12T10:49:35+10:00
Author: John Zhang

## Description
When compiling a hand-crafted Fibonacci performance measurement Mu bundle, Zebu fails with the following message, whereas Holstein succeeds:
```
TRACE - instsel on node#1103 (STORE NotAtomic (%m.c.v.b.irheadnxt #1090 = GETFIELDIREF (%m.c.v.b.irhead #1089 = GETIREF %m.c.v.b.head #1088) 2) NullRef)
TRACE - instsel on STORE
thread '<unnamed>' panicked at 'a struct type does not have a layout yet: BackendTypeInfo { size: 8, alignment: 8, struct_layout: None, elem_padded_size: None, gc_type: GCType { id: 2, alignment: 8, fix_size: 8, fix_refs: Some(Map { offsets: [0], size: 8 }), var_refs: None, var_size: None } }', src/compiler/backend/arch/x86_64/inst_sel.rs:4640
```
## Bug Reproduction
* Clone the mu/mu-perf-benchmarks repository and check out the `zebu_bug` branch (mu/mu-perf-benchmarks@b1893146bf95cd6cf8388cb44fd03d76de5401aa).
* Edit `example/zebu_bug.yml` and set the correct `MU_ZEBU` environment variable.
* Run `example/zebu_bug.yml` with `python3 mubench local example/zebu_bug.yml`, and check the error log file `example/fib_mu_zebu.log`.

Reduce serialised vm size (large bootimage size)
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/39
2017-07-12T10:49:35+10:00
Author: Yi Lin

For the RPySOM interpreter, the boot image (executable) generated by Zebu is 175 MB, while the executable from the C backend is less than 1 MB. The main reason is that Zebu's boot image contains a serialised VM (via Rust's serialisation), and I wasn't careful about what should be serialised, so basically everything is included. The serialised VM probably contributes 99% of the boot image size.
I estimate that if I am careful about what is serialised (only what will be used at runtime is worth serialising; #18 is part of the issue), the size can be reduced by at least 5-10 times.
I am not sure how much this contributes to the big performance slowdown.

Assignee: Yi Lin

Allocation Performance
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/38
2017-11-21T13:37:05+11:00
Author: John Zhang

## Description
Measured allocation performance using the following code with `n=10`:
```python
class A:
    pass

def alloc(n):
    for i in range(n):
        a = A()

def target(driver, args):
    ...

def main(argv):
    ...
    for i in range(iterations):
        cb.begin()
        for j in range(1000000):
            alloc(n)
        cb.end()
    cb.report(resfile)
    return 0
```
Mean of measured results:
| stack | result|
| ----- | ----- |
| RPython Mu Zebu | 0.00072474 |
| RPython C `clang -O0` | 0.00432998 |
| RPython C `clang -O1` | 0.0000008 |

[x86-64] Passing a 128-bit integer to a C function using CCALL
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/37
2017-06-11T19:34:41+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

Passing a 128-bit integer to a C function (using CCALL) doesn't work on x86-64, e.g.:
I wrote a test to check that 128-bit integers are passed correctly when calling functions (it is designed to cause the last argument to be placed on the stack on x86, but on aarch64 it should be in a register).
arg_overflow.uir:
```
.funcsig test_arg_overflow_sig = () -> ()
.funcdef my_main<()->()>
{
entry():
CCALL #DEFAULT <ufuncptr<test_arg_overflow_sig> test_arg_overflow_sig> <ufuncptr<test_arg_overflow_sig>>EXTERN "c_test_arg_overflow" ()
CALL <test_arg_overflow_sig> mu_test_arg_overflow()
RET
}
.funcsig arg_overflow_sig = (int<64> int<128> int<128> int<128>) -> ()
.funcdef mu_test_arg_overflow<test_arg_overflow_sig>
{
entry():
int128_0 = ADD <int<128>> <int<128>>0 <int<128>>0
int128_F = ADD <int<128>> <int<128>>0 <int<128>>0xFFFFFFFFFFFFFFFF0000000000000000
CCALL #DEFAULT <ufuncptr<arg_overflow_sig> arg_overflow_sig> <ufuncptr<arg_overflow_sig>>EXTERN "arg_overflow" (<int<64>>0 int128_0 int128_0 int128_F)
RET
}
```
It needs to be compiled with the following C code:
arg_overflow.c
```
#include <stdint.h>
#include <stdio.h>
void arg_overflow(uint64_t a, __int128_t b, __int128_t c, __int128_t d) {
printf("d = %016lX%016lX\n", (uint64_t)(d >> 64), (uint64_t)d);
}
void c_test_arg_overflow()
{
arg_overflow(0, 0, 0, (__int128_t)(0xFFFFFFFFFFFFFFFF) << 64);
}
```
On x86-64, running ` ./muc -r -f my_main arg_overflow.uir arg_overflow/arg_overflow` (using the latest commit in the aarch64 branch) fails to compile with the error:
```
thread '<unnamed>' panicked at 'not yet implemented', src/compiler/backend/arch/x86_64/inst_sel.rs:3181
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::_print
at /checkout/src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/sys_common/backtrace.rs:60
at /checkout/src/libstd/panicking.rs:355
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:371
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:549
5: std::panicking::begin_panic
6: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::emit_c_call_ir
7: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::instruction_select
8: <mu::compiler::backend::x86_64::inst_sel::InstructionSelection as mu::compiler::passes::CompilerPass>::visit_function
9: mu::compiler::passes::CompilerPass::execute
10: mu::compiler::Compiler::compile
11: mu::vm::vm::VM::make_boot_image_internal
12: mu::vm::api::api_bridge::_forwarder__MuCtx__make_boot_image
13: main
14: __libc_start_main
15: _start
fatal runtime error: failed to initiate panic, error 5
Aborted (core dumped)
```
On aarch64 it compiles (Zebu actually fails at linking, but it works if I add 'arg_overflow.c' to the clang command) and runs correctly, printing:
```
d = FFFFFFFFFFFFFFFF0000000000000000
d = FFFFFFFFFFFFFFFF0000000000000000
```

Assignee: Yi Lin

Replace TreeNode.clone_value() with TreeNode.as_value()
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/36
2017-07-13T11:16:47+10:00
Author: Yi Lin

We should avoid always cloning the `Value`.

Assignee: Yi Lin

x86-64 calling functions with int<128> arguments
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/35
2017-07-12T10:49:35+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

I have unfortunately broken the x86-64 backend; I believe this is due to changing the calls for UDIV/UREM/SREM/SDIV to pass 128-bit values instead of 64-bit ones.
Specifically when running the PySOM test I get the following error:
```
thread '<unnamed>' panicked at 'not yet implemented', src/compiler/backend/arch/x86_64/inst_sel.rs:2948
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::_print
at /checkout/src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/sys_common/backtrace.rs:60
at /checkout/src/libstd/panicking.rs:355
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:371
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:549
5: std::panicking::begin_panic
6: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::emit_precall_convention
7: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::emit_c_call_internal
8: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::emit_runtime_entry
9: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::emit_binop
10: mu::compiler::backend::x86_64::inst_sel::InstructionSelection::instruction_select
11: <mu::compiler::backend::x86_64::inst_sel::InstructionSelection as mu::compiler::passes::CompilerPass>::visit_function
12: mu::compiler::passes::CompilerPass::execute
13: mu::compiler::Compiler::compile
14: mu::vm::vm::VM::make_boot_image_internal
15: mu::vm::api::api_bridge::_forwarder__MuCtx__make_boot_image
16: fnc_40
17: main
18: __libc_start_main
19: _start
fatal runtime error: failed to initiate panic, error 5
```
It seems there is a bug in the x86-64 `emit_precall_convention` function, which I am unable to fix as I am unfamiliar with the x86 calling conventions.

Assignee: Yi Lin
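For orientation, here is a simplified model of how the System V AMD64 calling convention assigns integer arguments, as I understand it. It is a sketch under stated assumptions, not Zebu's `emit_precall_convention`. The function name is hypothetical, only integers of up to 128 bits are handled, and stack offsets are relative to the start of the outgoing argument area. A 128-bit integer takes two consecutive registers when two are free. Otherwise the whole value goes on the stack, 16-byte aligned, and any remaining register stays available for later arguments.

```python
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def assign_int_args(arg_bits):
    """Return one location per argument: a tuple of registers or ("stack", offset)."""
    free = list(INT_ARG_REGS)
    stack_offset = 0
    locations = []
    for bits in arg_bits:
        needed = 1 if bits <= 64 else 2  # eightbytes occupied by the value
        if len(free) >= needed:
            locations.append(tuple(free[:needed]))
            del free[:needed]
        else:
            size = 8 * needed
            # Round up to the value's alignment (8 or 16 bytes).
            stack_offset = (stack_offset + size - 1) // size * size
            locations.append(("stack", stack_offset))
            stack_offset += size
    return locations


# Signature used by the arg_overflow test: (int<64> int<128> int<128> int<128>)
print(assign_int_args([64, 128, 128, 128]))
# [('rdi',), ('rsi', 'rdx'), ('rcx', 'r8'), ('stack', 0)]
```

With that signature only `r9` is left when the last `int<128>` is reached, so the model places it entirely on the stack, which is the case the tests in these issues exercise.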