mu-impl-fast issues
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues
Updated: 2019-04-09T16:39:56+10:00

Set user-defined thread local when launching boot image
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/48
Updated: 2019-04-09T16:39:56+10:00
Author: Yi Lin

Currently Zebu assumes there is no user-defined thread local, because heap objects could not be persisted at the time the code was written.
However, nothing prevents us from implementing this now.

Assignee: Yi Lin

Support for sel4-rumprun
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/91
Updated: 2019-03-25T13:08:19+11:00
Author: Yi Lin

https://gitlab.anu.edu.au/mu/mu-impl-fast/merge_requests/20 and https://gitlab.anu.edu.au/mu/mu-impl-fast/merge_requests/21 added some code to support running Zebu on sel4-rumprun. However, Zebu does not run on sel4-rumprun yet, because `rodal` does not support sel4-rumprun.
The changes mainly address these issues:
* removed usage of dynamic libraries (`dlopen`, `dlsym`, etc.), as sel4-rumprun does not support dynamic linking.
* rewrote some test cases to avoid using dynamic libraries.
* added feature guard `sel4-rumprun` for OS-dependent code. A feature guard is used instead of an OS guard because Rust does not correctly recognise sel4-rumprun.
* added feature guard `sel4-rumprun-target-side` for two-stage cross compilation.
Problems with the changes:
* it doesn't actually run on rumprun (`rodal` uses `dlsym`).
* the changes currently introduce massive code duplication instead of reusing the OS/target-dependent code for Linux/x86_64, and the duplicated code is not maintained when the original code changes.
* there are some hard-coded paths for running Zebu on sel4-rumprun.
* using the feature guard `#[cfg(feature = "sel4-rumprun")]` instead of a proper OS guard `#[cfg(target_os = "sel4-rumprun")]` makes OS-dependent code quite unreadable. For example, for Linux code, we have to write
```
#[cfg(not(feature = "sel4-rumprun"))]
#[cfg(target_os = "linux")]
... // linux dependent code
```
* there is no documentation on how to set up the environment and run Zebu on sel4-rumprun.
These should be addressed if we want to properly support Zebu on sel4-rumprun.

Assignee: Javad Ebrahimian Amiri (javad.amiri@anu.edu.au)

Stack guarded page may not be able to detect overflow/underflow if frame size is larger than page size.
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/49
Updated: 2018-09-10T06:31:06+10:00
Author: Yi Lin

Currently we guard (read/write protect) the two pages above and below the stack memory, so that any read or write to those pages tells us that an overflow/underflow has happened. However, if a frame is larger than the guard page (e.g. an `ALLOCA` of a large piece of stack memory that is never accessed), execution can skip the guard page entirely, and the stack keeps growing beyond the guarded boundary.

Capture stack overflow/underflow signal
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/50
Updated: 2018-09-10T06:31:05+10:00
Author: Yi Lin

We guard the pages above and below the Mu stack, so an overflow/underflow triggers a read/write protection fault. We should register a handler to catch the signal and identify its cause (overflow/underflow or another segfault).

Memory order in store/load API calls
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/51
Updated: 2018-09-10T06:31:05+10:00
Author: Yi Lin

The API calls `load()`/`store()` take a memory order as an argument. However, I am uncertain whether we need to do anything special for different memory orders. The current implementation just does a plain load/store in `vm.handle_load()` and `vm.handle_store()`.

Use make_boot_image() for all Zebu tests
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/52
Updated: 2018-09-10T06:31:05+10:00
Author: Yi Lin

Because we didn't have `make_boot_image()` in early development, the tests currently use different approaches for code generation and linking:
1. use `compile_to_sharedlib()` (as exposed in API, but not in the Mu spec) to link generated code to a dynamic library
1. use `make_boot_image()` (as exposed in API) to link generated code to an executable
1. use `compile_fnc()` to link generated code to a dynamic library for *cargo tests*
1. manually compile code, persist vm, and link for *cargo tests*
Currently 1 and 2 use `make_boot_image()` internally, but the others do not.
Since `make_boot_image()` in Zebu can generate either a dynamic library or an executable (depending on the output file name), we can make all the tests use `make_boot_image()`. This will unify linking across all tests and make future changes easier.

[x86-64] Aliases don't seem to override previous ones
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/58
Updated: 2018-09-10T06:31:05+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

The pytest `test_heap.py::test_preserve_ref_field` succeeds when run on its own, but fails when run as part of a group (e.g. with `pytest test_heap.py`). I looked at the generated code and it is functionally identical in both cases.
All three tests in `test_heap.py` create functions called `test_fnc` and global cells called `gcl`, and then load the generated shared libraries. The first two tests are supposed to return the number 42; the last one is supposed to return 298, but actually returns 42. I therefore think it is quite possible that, when a shared library with an alias defined is loaded, the alias does not properly override the previous one.
I am defining aliases using `.equiv`, like so:
```
.globl __mu_gcl
__mu_gcl:
.globl gcl
.equiv gcl, __mu_gcl
```
And the same with `test_func`.
I could try replacing the aliases with labels, but then the function would have two names, which may cause problems if we try to resolve an address to a name (which we don't currently do, except for native functions in backtrace printing).
Perhaps I could use another directive (like `.set` or `.eqv`, but the docs say they're the same).

Assignee: Isaac Gariano (isaac@ecs.vuw.ac.nz)

[aarch64] Unimplemented Backend Features
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/60
Updated: 2018-09-10T06:31:05+10:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

I noticed there are lots of things that Zebu doesn't implement, so I decided to make a list, and will try to keep it up to date. (I would like to implement things here, specifically the Mu-specific things like traps, watchpoints, threads and stack-related things, as I feel Zebu is currently mostly just LLVM with exceptions and a garbage collector that doesn't collect.)
I'm not entirely sure what the x86-64 backend implements, so I've only listed things for aarch64 (features with a * next to them haven't been tested properly yet). I have also included things I believe are not implemented on x86-64 but are on aarch64:
Types:
* [ ] `int<n>`:
  * [x] n = 1 (some arithmetic)
  * [x] * n <= 64 and n != 8, 16, 32, or 64
  * [x] * n = 128, floating point conversions
  * [ ] n > 64 and n != 128
* [ ] `struct` SSA variables
* [ ] `hybrid` SSA variables
* [ ] `array` SSA variables
* [x] `tagref64`
* [x] `threadref`
* [x] `stackref`
* [ ] `framecursorref`
* [ ] `irbuilderref`
* [ ] `vector<t n>`
Instruction Clauses:
* [ ] Keep-alive clauses
* [ ] Exception clauses
  * [x] CALL
  * [ ] binop (division by zero)
  * [ ] NEW/NEWHYBRID (allocation failure)
  * [ ] LOAD/STORE/CMPXCHG (null reference)
  * [ ] CCALL (implementation defined)
Instructions:
* [ ] `TAILCALL`:
  * [x] When the callee's stack argument size is less than or equal to the caller's
  * [ ] When the callee's stack argument size is greater than the caller's
* [ ] For unimplemented types:
  * [ ] `EXTRACTVALUE`/`INSERTVALUE`
  * [ ] `EXTRACTELEMENT`/`INSERTELEMENT`
  * [ ] `SHUFFLEVECTOR`
* [ ] Memory
  * [x] `ALLOCA`/`ALLOCAHYBRID`
  * [x] * `CMPXCHG`
  * [x] * `FENCE`
  * [ ] `ATOMICRMW`
* [ ] Traps/watchpoints
  * [ ] `TRAP`
  * [ ] `WATCHPOINT`
  * [ ] `WPBRANCH`
* [x] Thread/stack instructions
  * [x] `NEWTHREAD`
  * [x] `SWAPSTACK`
Common Instructions
* [x] thread and stack things:
  * [x] new_stack
  * [x] kill_stack
  * [x] thread_exit
  * [x] current_stack
  * [x] set_threadlocal
  * [x] get_threadlocal
* [x] tr64.*
* [ ] futex.*
* [ ] kill_dependency
* [ ] native.* (except native.pin and native.unpin)
* [ ] meta.*
* [ ] irbuilder.*

Issues before enabling GC
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/93
Updated: 2017-11-21T14:37:55+11:00
Author: Yi Lin

The GC rewrite should have fixed most of the problems. However, a few known issues need to be resolved before enabling GC.
* https://gitlab.anu.edu.au/mu/mu-impl-fast/issues/21, about some utility functions in C that help find references in the stack/heap. They need to be fixed for correctness. In addition, `set_low_water_mark()` should be called by the VM to set a limit for stack scanning. Alternatively, we may let the GC know about the stack bounds so that it won't go beyond the stack memory.
* inserting yieldpoints.
* allowing the base reference to be found for internal references. As objects have a 16-byte minimum size and a 16-byte minimum alignment, we can mask any reference down to a 16-byte boundary and check whether it contains a valid object encoding (non-empty, or with certain bits set).

Mu IR Type checking
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/24
Updated: 2017-11-21T13:54:18+11:00
Author: Isaac Gariano (isaac@ecs.vuw.ac.nz)

The Mu IR compiler currently compiles some invalid Mu code. Specifically, I noticed that the following invalid code compiled successfully (both cases were used in some of the tests):
* a SHL/LSHR/ASHR instruction where the type of the second argument is not the same as that of the first (in the test, the first argument was an int<64> and the second an int<8>). This code was generated in test_shl and test_lshr.
* passing an int<64> as an argument to a C function expecting an int<32> (this was generated by test_pass_1arg_by_stack and test_pass_2arg_by_stack).
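Both cases above come down to comparing declared types at a use site. The following is a minimal sketch in Python of what such checks could look like. It assumes a hypothetical representation in which an integer type is an `IntType(bits)` value; the names `IntType`, `check_shift` and `check_ccall_args` are illustrative only and are not part of Zebu or the Mu API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Hypothetical stand-in for a Mu int<n> type."""
    bits: int

    def __str__(self):
        return f"int<{self.bits}>"


def check_shift(op, lhs_ty, rhs_ty):
    """Return an error message if a shift has mismatched operand types, else None."""
    if lhs_ty != rhs_ty:
        return f"{op}: operand types differ ({lhs_ty} vs {rhs_ty})"
    return None


def check_ccall_args(param_tys, arg_tys):
    """Return error messages for arguments whose type differs from the C signature."""
    if len(param_tys) != len(arg_tys):
        return [f"expected {len(param_tys)} arguments, got {len(arg_tys)}"]
    return [
        f"argument {i}: expected {p}, got {a}"
        for i, (p, a) in enumerate(zip(param_tys, arg_tys))
        if p != a
    ]


# First case: an int<64> shifted by an int<8> is reported.
print(check_shift("SHL", IntType(64), IntType(8)))
# Second case: an int<64> passed where the C function expects an int<32>.
print(check_ccall_args([IntType(32)], [IntType(64)]))
```

Returning messages instead of raising lets a checker of this kind collect every problem in a bundle before rejecting it, which would be convenient for fixing several tests in one pass.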
In addition, the compiler doesn't seem to check, when an SSA variable is used, whether it has been assigned yet.

IR rewrite pass before instruction selection
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/6
Updated: 2017-11-21T13:50:53+11:00
Author: Yi Lin

Some instructions, such as `NEW`, are expanded into a sequence of code (which may involve new blocks), and some instructions, such as `THREADEXIT`, are expanded into a CCall to runtime service functions. Currently this is done in the instruction selection pass, by directly expanding such instructions into machine code. A better choice is to rewrite/expand such instructions into Mu IR before instruction selection.

Assignee: Yi Lin

Use sidemap for GC object metadata
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/12
Updated: 2017-11-21T13:48:32+11:00
Author: Yi Lin

Zebu intends to use side maps to separate objects from their metadata.
However, as a compromise, I am currently using a 64-bit object header along with `GCType` for the metadata. We will go back to using side maps.
This issue describes the design of the sidemap scheme.
* we will assume a minimal object size `MIN_SIZE` and a minimal alignment `MIN_ALIGN`. The larger the minimal size/alignment, the less memory is required for metadata (see below), but the more memory is wasted in the heap. A `MIN_SIZE` of 16/24/32 bytes and a `MIN_ALIGN` of 128 bits are reasonable.
* object metadata includes:
  * 1 bit/MIN_ALIGN: object start (and end, so we can determine the size; the size is required for copying/dumping objects)
  * 1 bit/64 bits: reference locations
  * 8 bits/MIN_SIZE: GC state (mark bit, reference count, etc.)
* small objects have less space to encode metadata, but large objects have plenty. We can use different schemes to encode small and large objects.
* side maps should be stored in the metadata part of a page/memory chunk.
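To make the size/metadata trade-off concrete, the per-unit costs listed above can be added up. This is a rough sketch in Python, not a statement about the eventual design: it assumes the three maps are stored separately and simply summed, and that "1 bit/64 bits" means one bit per 64-bit word. The function name `sidemap_overhead` is made up for illustration.

```python
from fractions import Fraction


def sidemap_overhead(min_size_bytes, min_align_bytes):
    """Side-map bits needed per bit of heap, as an exact fraction."""
    object_start = Fraction(1, min_align_bytes * 8)  # 1 bit per MIN_ALIGN
    ref_locations = Fraction(1, 64)                  # 1 bit per 64-bit word (assumed)
    gc_state = Fraction(8, min_size_bytes * 8)       # 8 bits per MIN_SIZE
    return object_start + ref_locations + gc_state


# MIN_ALIGN is 128 bits (16 bytes); try the three candidate MIN_SIZE values.
for min_size in (16, 24, 32):
    overhead = sidemap_overhead(min_size, min_align_bytes=16)
    print(f"MIN_SIZE={min_size} bytes: {float(overhead):.2%} of heap size")
```

Under these assumptions the overhead is 11/128 of the heap for a 16-byte `MIN_SIZE`, 25/384 for 24 bytes and 7/128 for 32 bytes, so only the GC state term shrinks as `MIN_SIZE` grows.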
A more concrete design will be posted here once we have discussed it further.

Assignee: Yi Lin

GC related C functions return inaccurate result
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/21
Updated: 2017-11-21T13:46:42+11:00
Author: Yi Lin

`gc/src/heap/gc/clib_x64.c` contains C functions for GC, such as `get_registers()`, which uses inline assembly to save the values of all general-purpose registers into an array. However, C compilers may generate code that changes the registers before they are saved.
We may want to rewrite the function in assembly instead of C. I believe it is reasonable to eliminate all C functions in the code base and replace them with assembly (all the C functions are pretty simple). Ideally we want only Rust code and assembly in the code base.

Allocation Performance
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/38
Updated: 2017-11-21T13:37:05+11:00
Author: John Zhang

## Description
Measured allocation performance using the following code with `n=10`:
```python
class A:
    pass

def alloc(n):
    for i in range(n):
        a = A()

def target(driver, args):
    ...

def main(argv):
    ...
    for i in range(iterations):
        cb.begin()
        for j in range(1000000):
            alloc(n)
        cb.end()
    cb.report(resfile)
    return 0
```
Mean of measured results:
| stack | result|
| ----- | ----- |
| RPython Mu Zebu | 0.00072474 |
| RPython C `clang -O0` | 0.00432998 |
| RPython C `clang -O1` | 0.0000008 |

Allow GC heap growth
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/56
Updated: 2017-11-21T13:35:56+11:00
Author: Yi Lin

In the current GC implementation, we allocate memory of the given heap size, and allocate (and initialize) metadata for the whole heap, all at once at startup. This makes heap initialization extremely slow (accounting for 70% of the startup time, as measured by @igariano01).
This should be fixed when we rewrite the GC to allow heap growth, so that we only need to mmap and initialize a small heap. The rewrite is scheduled along with Issue #12.
|operation|time (μs)|
|----------|---|
|after rodal_init_deallocate|90.726|
|before mu_main|6.838|
|before gc_init|73.149|
|after gc_init|15,065.736|
|after init_runtime|35.896|
|after restore gc types|416.741|
|after build table|6,249.104|
|after loaded args|75.905|
|before swap_to_mu_stack|235.152|
|**Total**|22,249.247|

CPU frequency scaling may affect the performance regression test
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/92
Updated: 2017-11-21T09:45:39+11:00
Author: Zixian Cai

https://gitlab.anu.edu.au/mu/mu-perf-benchmarks/issues/20
cc @igariano01

Register allocation with special registers
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/90
Updated: 2017-10-19T15:11:10+11:00
Author: Yi Lin

My register allocator currently deals with special registers in the following way:
1. special registers are not `usable`, so they cannot be assigned to temporaries.
1. coalescing will not combine special registers with temporaries (even if it is safe and optimal to do so).
We can make this cleaner by manipulating interferences with special registers, and letting the register allocator make the decision:
* make special registers live at function exit, so that they conflict with all temporaries and the register allocator won't assign them to any temporary. Coalescing may still combine temporaries with special registers where possible.
* to prevent coalescing in some cases, such as
```
mov SP -> t
add t, 8 -> t
```
we cannot coalesce SP with t. Otherwise changing t will also change the stack pointer. For a general case,
```
OP t, v -> u
```
if t cannot be coalesced with special register S, the instruction selector can generate code
```
mov t -> t0 (with def S)
OP t0, v -> u
```
this will add an interference edge between t0 and S, and prevent the coalescing.

Make IR debug output from Zebu compatible with muc
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/87
Updated: 2017-10-09T17:38:55+11:00
Author: Yi Lin

Currently, with `trace` level logging, Zebu outputs IR in its own format. We should make this compatible with the text form that `muc` uses (https://gitlab.anu.edu.au/mu/mu-tool-compiler/blob/master/UIR.g4).

[x86_64] JIT backend
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/67
Updated: 2017-10-09T17:34:52+11:00
Author: Yi Lin

I will probably be working on a JIT backend for x86_64. Currently I am considering using Intel XED (https://intelxed.github.io/), a C library from Intel under the Apache 2.0 license for encoding x86/x86_64 instructions. rust-bindgen (https://github.com/servo/rust-bindgen) can generate Rust bindings from C/C++ headers, with which I can easily use the library in our implementation.
As I implement the JIT backend, I expect a lot of changes to the existing code (as we switch from an AOT focus to supporting both). So I suggest we postpone the JIT backend for aarch64 until I have reached some milestones for x86_64, such as
* JIT compile an add function
* JIT compile a function that contains a loop
* JIT compile a function that contains a call
* JIT compile a function that uses VM runtime
Whether to do JIT or AOT is a build-time option for Zebu. Please let me know if you think this is problematic.

Assignee: Yi Lin

fannkuchredux slowdown after commit de9d31255831eb89299d93960a3c7a4de514f3a8
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/89
Updated: 2017-10-09T16:28:58+11:00
Author: Yi Lin

de9d31255831eb89299d93960a3c7a4de514f3a8 corrected the logic in the x86 backend that decides whether an integer constant is a valid x86 immediate (32 bits). It is intended to allow more constants to be used as x86 immediates. However, mubench (http://squirrel.anu.edu.au/mubench/) shows a slowdown for fannkuchredux after the commit. I need to investigate this.

Assignee: Yi Lin