Granularity of RwLocks
Rust is designed for concurrency, and it enforces proper synchronisation when working with shareable memory locations. Atomic integers, Mutexes and RwLocks are a few examples. These types can be used even when the memory locations are "immutably shared": the user can gain read-write access by obtaining the write lock, or by using atomic operations.
At the centre of mu-impl-fast is the VM object. It contains the global IR information, including all types, constants and functions ever loaded into the micro VM. Accesses to these data structures must be properly synchronised.
Currently, synchronisation is done with fine-grained locking: every HashMap is protected by its own RwLock. This enables data-race-free access to the shared data structures, but it has disadvantages.
- Potential deadlocks. Sometimes one operation needs read-write access to multiple objects. It is very easy to run into the simplest ABBA deadlock if different threads obtain the locks in different orders.
- workaround: If "bundle loading" is the only operation that needs RW access, it can simply obtain all RW locks before performing any operations. During the operation, it is free to obtain any extra RO locks if necessary.
- The locking is too fine-grained. Putting locks on too many objects increases the space overhead, and requires the user to perform more locking operations, which increases the time overhead.
- workaround: For any operation that needs RO access, holding the RO locks for the entire operation rather than repeatedly acquiring and releasing them can reduce the time overhead, but not the space overhead.
- Using locks instead of lock-free data structures. Bundle loading is extremely rare compared to RO accesses. In the current design, Mu-level exception handling needs RO access to the VM's stack-unwinding metadata. Although exception handling is a slow path from the user program's perspective, it is still more common than client-to-MuVM API calls. User-level exception handling should be lock-free.
Ideally, the shared data structures should be implemented as transactional, lock-free data structures that are strongly biased towards fast read-only accesses. A good fit would be an RCU-like multi-version concurrency control algorithm. All Mu IR nodes are immutable by design (note: function "redefinition" is, by design, "adding a version to a function", so existing versions never change, and there is no API that asks what versions a function has), so it does not matter if the client sees a slightly older version of a data structure: it will not see new nodes that are concurrently being inserted, but whatever it does see is a correct version. For example, it is OK for a newly started function to run an older version rather than the "newest version"; this is allowed unless the client uses proper fences and synchronisation itself.
But since our current goal is to have a working VM, we can postpone these optimisations and keep the status quo for now.