Extended memory model

Created by: wks

The current memory model is designed according to the C11/C++11 standard. It involves atomic memory access, locks and threads.

However, there are issues which C11 and C++11 do not address. They are:

Futex: Futex wait implies an atomic load-compare-sleep operation. (See spec)
Trap/watchpoint/osr: They involves stack binding/unbinding. Just make more things explicit. (See spec)
Swap-stack: C11 does not address swap-stack, but swap-stack operations in different threads may conflict and visibility issues involved in swap-stack operations should be clarified. (making racing swap-stack operations undefined behaviours is okay) (See spec)
Foreign language interface: When parallel µVM programs runs together with parallel native programs, we need a way to synchronise the two worlds. This also involves object pinning. See https://github.com/microvm/microvm-meta/issues/37 (It ends up that Mu cannot make much guarantees known that the worst case of native memory access is segmentation fault. This thing must be very implementation-specific.)

Current ideas:

Futex: (open question) Should futex_wake happen before the next instruction after the futex_wait that actually wakes up? Probably yes.

int shared_state = 42;

atomic_int thread1_wake = 0;
futex thread1_futex;
    
thread1:
    while(load(ACQUIRE, &thread1_wake)!=1) { // OP4
        futex_wait(thread1_futex);  // OP5
    }
    int s = shared_state // OP6

thread2:
    shared_state = 99; // OP1
    store(RELEASE, &thread1_wake, 1); // OP2
    futex_wake(thread1_futex);  // OP3

According to the semantic of RELEASE and ACQUIRE, if OP4 sees the store by OP2, then OP6 must see the store of OP1.

The problem is, if the OP3 futex_wake is not guaranteed to happen before the waking of OP5 futex_wait, then the next OP4 in the loop may not see the store by OP2 at all, and may go to sleep another time. Then it will be sleeping forever.

stack binding/unbinding cannot be atomic. Swap-stack is a combination of two and cannot be atomic, either.

We may require the language implementer to correctly use other synchronisations to make sure racy swap-stack operations do not happen.