# mu-impl-fast issues
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues

## Issue #85: DYLD_LIBRARY_PATH-dependent tests fail on macOS with SIP enabled
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/85 · Zixian Cai · 2017-09-19

Related links:
https://groups.google.com/forum/#!topic/caffe-users/waugt62RQMU
https://github.com/oracle/node-oracledb/issues/231
https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/RunpathDependentLibraries.html
We should encode these differently.

## Issue #86: Issue with rodal on macOS when dynamically linking with boot image
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/86 · Yi Lin · 2017-09-22

I am not sure if I fully understood the problem. Isaac @igariano01 may explain more on this.
Generally, the issue is that rodal needs to redefine the `malloc()` and `free()` that Rust uses to manage heap memory. Those are weak symbols, so rodal can redefine them. Thus, if an object is dumped by rodal, it cannot be freed by the default `free()`; it must be freed by rodal instead.
However, on macOS, Rust uses a different allocator when generating dynamic libraries instead of using `malloc()`/`free()`, and that allocator's symbols cannot be redefined. Thus, when we link to mu as a dynamic library, rodal cannot redefine the `free()` function. There seems to be no such issue when linking to mu as a static library.
One solution, which seems very hacky, is to disable dynamic linking with mu in boot image generation. Branch https://gitlab.anu.edu.au/mu/mu-impl-fast/tree/static-link-for-macos has the fix.
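If I read the branch right, the workaround presumably amounts to forcing a static crate type for mu on macOS. A hypothetical sketch of what that looks like in a Cargo manifest (this is not the actual configuration of the branch):

```toml
[lib]
# Emit a staticlib instead of a dylib on macOS, so that rodal's
# malloc()/free() redefinitions still take effect in the boot image.
crate-type = ["staticlib", "rlib"]
```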
The other solution is to use Rust's nightly feature to provide a custom allocator (rodal would implement a custom allocator along with deallocation). Branch https://gitlab.anu.edu.au/mu/mu-impl-fast/tree/global_allocator has a fix. This is a more elegant approach, but my concern is that it requires us to use nightly Rust. The language implementation itself is likely to be buggier in a nightly version, and we will run into problems when debugging (e.g. working out whether a bug is ours or theirs). Also, if we switch to nightly Rust, it is very likely that we will start using more nightly features, and we will not be able to go back to stable Rust in the future.

## Issue #13: In-place code generation
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/13 · Yi Lin · 2017-03-01

The idea is to generate machine code directly, without going through some intermediate representation of the machine code. The motivation is to save the allocation/space for the machine code representation. The current x86_64 assembly backend tests the idea of in-place code generation (though it is completely unnecessary for an assembly backend to do this).
Issues:
* Too many eliminated moves (which are turned into `nop`s), resulting in bad performance. One solution is to compact the code at the end (slide/copy the code to remove the 'holes').
* In-place code generation introduces extra memory overhead, as we need to save the locations of registers.
* It makes machine-code-level optimisations and transformations hard to implement.

*A note for the x86_64 asm backend: the per-instruction metadata could be further optimised. For example, I am currently using `LinkedHashMap<MuID, Vec<ASMLocation>>` to store information on used and defined registers. Both `LinkedHashMap` and `Vec` are expensive. We could use a fixed-length array, or simply `use1`, `use2`, `use3`, and so on.*
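The compaction idea from the first point could look roughly like this, assuming instruction boundaries are known from the backend's metadata. Note that a real implementation must also re-patch branch offsets across the removed holes, which this sketch omits:

```rust
// Each entry is one encoded instruction; eliminated moves have been
// rewritten as single-byte `nop`s (0x90).
fn compact(instrs: &[Vec<u8>]) -> Vec<u8> {
    instrs
        .iter()
        .filter(|i| !(i.len() == 1 && i[0] == 0x90)) // drop the nop "holes"
        .flat_map(|i| i.iter().copied())
        .collect()
}

fn main() {
    let code = vec![
        vec![0x55],             // push rbp
        vec![0x90],             // an eliminated move, now a nop
        vec![0x48, 0x89, 0xe5], // mov rbp, rsp
    ];
    assert_eq!(compact(&code), vec![0x55, 0x48, 0x89, 0xe5]);
}
```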
Questions:
* Is it possible to emit machine code (binary) before we know the final code? For example, `jmp` may turn into different machine code depending on how big the offset is. However, any assembler has to deal with this question.
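To make the `jmp` example concrete: x86_64 has a 2-byte rel8 encoding (`EB cb`) and a 5-byte rel32 encoding (`E9 cd`), so the instruction's own size depends on the final displacement, which in turn depends on instruction sizes. A rough sketch of the size choice, ignoring that circularity:

```rust
// Encode a relative jmp, given the displacement from the end of the
// instruction to the target: EB cb (rel8) or E9 cd (rel32).
fn encode_jmp(disp: i32) -> Vec<u8> {
    if disp >= i8::MIN as i32 && disp <= i8::MAX as i32 {
        vec![0xEB, disp as i8 as u8] // short form: 2 bytes
    } else {
        let mut v = vec![0xE9]; // near form: 5 bytes
        v.extend_from_slice(&disp.to_le_bytes());
        v
    }
}

fn main() {
    assert_eq!(encode_jmp(16), vec![0xEB, 0x10]);
    assert_eq!(encode_jmp(-2), vec![0xEB, 0xFE]);
    assert_eq!(encode_jmp(1000).len(), 5); // forced into the rel32 form
}
```

A fixed-point iteration (assume short forms, grow until stable) or the pessimistic "always rel32" choice are the usual ways assemblers resolve the circularity.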
We need to discuss this more before we start implementing the JIT.

## Issue #2: Granularity of RwLocks
https://gitlab.anu.edu.au/mu/mu-impl-fast/-/issues/2 · Kunshan Wang · 2017-06-14

Designed for concurrency, Rust enforces the use of proper synchronisation when working with shareable memory locations. Atomic integers, Mutexes and RwLocks are a few examples. These types can be used even if the memory locations are "immutably shared": the user can gain read-write access by obtaining the read-write lock, or by using atomic operations.
At the centre of mu-impl-fast is the [VM](https://gitlab.anu.edu.au/mu/mu-impl-fast/blob/master/src/vm/vm.rs#L25) object. It contains the global IR information, including all types, constants and functions ever loaded into the micro VM. Accesses to these data structures must be properly synchronised.
Currently, synchronisation is done with fine-grained locking: every HashMap is protected by a RwLock. This enables data-race-free access to the shared data structures, but it has its disadvantages.
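As a concrete sketch of the current scheme (the field names here are illustrative, not the actual layout of `vm.rs`):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Each global table carries its own RwLock: readers can proceed
// concurrently, but every access pays an acquire/release, and every
// table pays the space cost of a lock.
struct Vm {
    types: RwLock<HashMap<u64, String>>,
    funcs: RwLock<HashMap<u64, String>>,
}

fn main() {
    let vm = Vm {
        types: RwLock::new(HashMap::new()),
        funcs: RwLock::new(HashMap::new()),
    };
    vm.types.write().unwrap().insert(1, "int<64>".to_string());
    vm.funcs.write().unwrap().insert(2, "@main".to_string());

    let t = vm.types.read().unwrap(); // shared read access
    assert_eq!(t.get(&1).map(String::as_str), Some("int<64>"));
}
```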
- **Potential deadlocks.** Sometimes one operation needs to have read-write access to multiple objects. It is very easy to run into the simplest [ABBA deadlock](http://cmdlinelinux.blogspot.com.au/2014/01/linux-kernel-deadlocks-and-how-to-avoid.html) if the locks are obtained in different orders by different threads.
- workaround: If "bundle loading" is the only operation that needs RW access, it can simply obtain all RW locks before performing any operations. During the operations, it is free to obtain any extra RO locks if necessary.
- **The locking is too fine-grained.** Putting locks on too many objects increases the space overhead, and requires the user to perform more locking operations, which increases the time overhead.
  - workaround: For any operation that needs RO access, holding the RO locks for the entire operation rather than frequently acquiring/releasing them can reduce the time overhead, but not the space overhead.
- **Using locks instead of lock-free data structures.** Bundle loading is extremely rare compared to RO accesses. In the current design, Mu-level exception handling needs to obtain RO access to the VM's stack-unwinding metadata. Although exception handling is a slow path from the user program's perspective, it is still far more common than client-to-MuVM API calls. User-level exception handling should be lock-free.
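For the first point above, fixing one global acquisition order is the standard way to rule out the ABBA interleaving. A minimal sketch (the two tables are made up for illustration):

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Two threads each need RW access to both tables. Because both acquire
// the locks in the same global order (`types` before `funcs`), neither
// can end up holding one lock while waiting for the other.
fn run_pair() -> (u32, u32) {
    let types = Arc::new(RwLock::new(0u32));
    let funcs = Arc::new(RwLock::new(0u32));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (t, f) = (Arc::clone(&types), Arc::clone(&funcs));
            thread::spawn(move || {
                let mut tg = t.write().unwrap(); // 1st: types
                let mut fg = f.write().unwrap(); // 2nd: funcs
                *tg += 1;
                *fg += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = (*types.read().unwrap(), *funcs.read().unwrap());
    result
}

fn main() {
    assert_eq!(run_pair(), (2, 2));
}
```

Swapping the two `write()` calls in one of the threads reintroduces the ABBA hazard.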
Ideally, the shared data structures should be implemented as **transactional lock-free data structures strongly biased towards fast read-only accesses**. The ideal design would be an [RCU](https://en.wikipedia.org/wiki/Read-copy-update)-like [multi-version concurrency control](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) algorithm. All Mu IR nodes are immutable by design (note: function "redefinition" is actually, by design, "adding a version to a function", so existing versions do not change; there is also no API that asks "what versions does a function have"), so it does not matter if the client sees a slightly older version of the data structure: it will not see new nodes concurrently being inserted, but whatever it sees is a correct version for it to see. (For example, it is OK if a newly-started function runs an older version rather than the "newest" one, unless the client uses proper fences and synchronisation itself.)
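A cheap approximation of this in safe Rust is snapshot-swapping: the rare writer builds a new immutable map and swaps it in behind an `Arc`, while readers clone the `Arc` (holding the lock only for that instant) and then read with no lock at all. True RCU would avoid even that brief lock, e.g. via an atomic pointer swap; this sketch is not from the codebase:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, RwLock};

// Read-mostly map: writers install a new immutable snapshot;
// readers work on an older snapshot, which is always a valid version.
struct SnapshotMap<K, V> {
    current: RwLock<Arc<HashMap<K, V>>>,
}

impl<K: Eq + Hash + Clone, V: Clone> SnapshotMap<K, V> {
    fn new() -> Self {
        SnapshotMap { current: RwLock::new(Arc::new(HashMap::new())) }
    }

    // Readers: lock held only to clone the Arc, then lock-free reads.
    fn snapshot(&self) -> Arc<HashMap<K, V>> {
        self.current.read().unwrap().clone()
    }

    // Writers (rare, e.g. bundle loading): copy, modify, swap.
    fn insert(&self, k: K, v: V) {
        let mut guard = self.current.write().unwrap();
        let mut next = (**guard).clone();
        next.insert(k, v);
        *guard = Arc::new(next);
    }
}

fn main() {
    let m = SnapshotMap::new();
    m.insert(1, "a");
    let snap = m.snapshot();
    m.insert(2, "b");
    // The old snapshot is untouched by the later insert,
    // exactly the "slightly older but correct version" behaviour.
    assert_eq!(snap.len(), 1);
    assert_eq!(m.snapshot().len(), 2);
}
```

The copy-on-write `insert` is O(n), which is acceptable only because bundle loading is extremely rare compared to reads.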
But since our current goal is to have a working VM, we can postpone these optimisations and just use the status quo for now.