Memory model in native interface

Created by: wks

Problem

Currently, Mu memory is defined in terms of "memory locations": regions that each hold a Mu value and are not directly related to addresses or bytes. Native memory is a sequence of bytes, addressed by integer "addresses". The two are separate until a Mu memory location is pinned. Pinning maps the Mu memory location to a region of bytes in the address space, so that accessing one affects the other.

Meanwhile, Mu's memory model is a C++11-style model based on the happens-before relation.

This poses a challenge: the memory model must bridge the Mu world and the native world. The native view of memory as a sequence of bytes should work well with Mu memory, i.e. native accesses should map to meaningful memory operations in the Mu world. Atomic actions should be consistent and should be able to establish the happens-before relation between the two worlds. Specifically:

What is the unit of memory actions? Previously, it was the "Mu memory location".
- If a "load" action is modeled as a tuple: LOAD(order, type, location), and location was "Mu memory location", then what should location be now? Address? What value does it see? Some store? Or something else?
- If a "store" action is modeled as a tuple: STORE(order, type, location, newvalue), and location was "Mu memory location", then what should location be now?
If a Mu memory location is pinned and is accessed at a granularity different from that of its declared type, what will the result be?
- If stored as a whole, but loaded in parts...
- If stored in parts, but loaded as a whole...
- But we cannot model the memory as a byte array which sequentially changes state. (or, can we? Since non-atomic conflicting accesses are meaningless, does this imply it must be changed sequentially, or errors occur?)
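
One way to make these questions concrete is to write the action tuples down as data and let location be a byte range. The following is only a sketch of that one possible answer, not a decision made by this proposal: `ByteRange`, `Load`, `Store`, `overlaps` and `same_location` are hypothetical names, the `type` component of the tuple is reduced to its size in bytes, and the stored value is simplified to a 64-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Memory orders as used in the pseudo code of this page.
enum class Order { NotAtomic, Relaxed, Acquire, Release, SeqCst };

// Hypothetical: a location identified by a start address and a size in bytes.
// The size stands in for the "type" component of the action tuple.
struct ByteRange {
    std::uintptr_t address;
    std::size_t size;
};

// LOAD(order, type, location) and STORE(order, type, location, newvalue),
// with type folded into the size of the byte range.
struct Load  { Order order; ByteRange location; };
struct Store { Order order; ByteRange location; std::uint64_t new_value; };

// True if the two ranges share at least one byte.
bool overlaps(const ByteRange& a, const ByteRange& b) {
    return a.address < b.address + b.size && b.address < a.address + a.size;
}

// True if the two ranges cover exactly the same bytes, which is the only
// case the original "Mu memory location" model has to deal with.
bool same_location(const ByteRange& a, const ByteRange& b) {
    return a.address == b.address && a.size == b.size;
}
```

Under this sketch, a store to a whole `i32` and a load of its last byte overlap but are not the same location, which is exactly the "stored as a whole, but loaded in parts" case whose semantics is open.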

The current model

A non-atomic load sees the unique store operation that happens before it such that no other store operation happens between that store and the load. If there is more than one such store operation, the behaviour is undefined.
An atomic load sees the value from any of its visible sequence of store operations.
Mixing non-atomic and atomic operations on the same memory location has undefined behaviour.

Possible directions

In any case, pure Mu programs should keep their original C++11-like semantics.

Make the memory model more machine-oriented and machine-specific.
- May give more dependable behaviours. For example, unaligned memory accesses are allowed on many architectures, but are not always atomic.
- Obviously this makes Mu less portable. All pointer-based memory access will have machine-specific semantics. But does this matter? This is the "native interface" anyway.
- Interoperability with the C++11 memory model for C/C++ programs will be built upon the machine-specific memory model.
Limit what operations are allowed in the native memory.
- Simpler model.
- Probably more undefined behaviours, because they cannot be defined if we try to keep things simple and generic.
- Will limit the capability. e.g. unions won't be used by Mu.
Something in between

Examples

The native program should synchronise with the Mu program via atomic memory accesses.

// C++ pseudo code
struct Foo {
    int x;
    int y;
};

Mu_thread_1 {
  ref<Foo> f = new<Foo>;
  ptr<Foo> fp = pin(f);
  create_thread(native_thread_2, fp);
  store(&f->x, 10, NOT_ATOMIC);    // Mu-level store
  store(&f->y, 20, RELEASE);          // Mu-level store
}

native_thread_2(ptr<Foo> fp) {
  while(load(&fp->y, ACQUIRE) != 20) {}    // Native load
  int a = load(&fp->x, NOT_ATOMIC);       // Native load
  assert(a == 10);
}
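
For comparison, the same release/acquire hand-off can be written in plain C++11. This is only an analogue under stated assumptions: both threads are ordinary C++ threads, `y` is declared `std::atomic<int>` because C++ does not allow atomic accesses to a plain `int`, and `run_message_passing` is a name made up for this sketch. It does not model pinning or the Mu side at all.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Foo {
    int x = 0;                 // written and read non-atomically
    std::atomic<int> y{0};     // used as the synchronisation flag
};

// Runs the writer on the calling thread and the reader on a second thread.
// Returns the value of x observed by the reader after it has seen y == 20.
int run_message_passing() {
    Foo foo;
    int seen = -1;

    std::thread reader([&foo, &seen] {
        // The acquire load synchronises with the release store below.
        while (foo.y.load(std::memory_order_acquire) != 20) {}
        seen = foo.x;          // non-atomic load, ordered after the store to x
    });

    foo.x = 10;                                   // non-atomic store
    foo.y.store(20, std::memory_order_release);   // release store

    reader.join();             // join orders the write to seen before the return
    return seen;
}
```

Calling `run_message_passing()` from `main` yields 10 on every run, because the store to `x` happens before the release store, which synchronises with the acquire load that precedes the read of `x`.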

In non-atomic memory accesses, partial reads and writes should be based on the byte representation of the value (called the "object representation" in C11).

ref<i32> r = new<i32>;
store(r, 0x12345678);   // Assume little endian

ptr<i32> p = pin(r);
i64 addr = ptrcast<i64>(p);  // cast the pointer to the integer address
addr += 3;
ptr<i8> p2 = ptrcast<ptr<i8>>(addr);  // cast back to pointer, but a different type
i8 value = load(p2);
assert(value == 0x12);

store(p2, 0x9a);
i32 value2 = load(r);
assert(value2 == 0x9a345678);
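
The byte-level behaviour expected above matches what C and C++ already define for object representations. The sketch below shows it in standard C++, using `std::memcpy` instead of pointer casts to stay within defined behaviour. The helper names `read_byte`, `write_byte` and `is_little_endian` are made up for this illustration, and the concrete values in the example only hold on a little-endian machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if the least significant byte of an integer is stored first.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Returns the byte at offset `index` of the object representation of `word`.
std::uint8_t read_byte(const std::uint32_t& word, std::size_t index) {
    unsigned char bytes[sizeof word];
    std::memcpy(bytes, &word, sizeof word);
    return bytes[index];
}

// Overwrites the byte at offset `index` of the object representation of `word`.
void write_byte(std::uint32_t& word, std::size_t index, std::uint8_t value) {
    unsigned char bytes[sizeof word];
    std::memcpy(bytes, &word, sizeof word);
    bytes[index] = value;
    std::memcpy(&word, bytes, sizeof word);
}
```

On a little-endian machine, `read_byte` on `0x12345678` at offset 3 returns `0x12`, and writing `0x9a` to that offset changes the word to `0x9a345678`, as in the pseudo code.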

Unaligned 16-, 32- and 64-bit memory accesses are allowed on x64 (and the P6 family guarantees atomicity as long as the access does not cross a cache line boundary).

struct Foo { i32 a; i32 b; };

ref<Foo> r = new<Foo>;
store(&r->a, 0x9abcdef0);
store(&r->b, 0x12345678);

ptr<Foo> p = pin(r);
ptr<i64> p2 = ptrcast<ptr<i64>>(p);
i64 value = load(p2);
assert(value == 0x123456789abcdef0);
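
The value expected above can be checked in standard C++ by copying the bytes of the struct into a 64-bit integer. This is a sketch under assumptions: `load_as_u64` and `is_little_endian` are names made up here, the struct has no padding (checked by the `static_assert`), and the copy is a plain non-atomic `std::memcpy`, so it says nothing about atomicity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Foo {
    std::uint32_t a;
    std::uint32_t b;
};
static_assert(sizeof(Foo) == 8, "Foo is assumed to have no padding");

// True if the least significant byte of an integer is stored first.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reinterprets the eight bytes of `f` as one 64-bit integer (non-atomic).
std::uint64_t load_as_u64(const Foo& f) {
    std::uint64_t value;
    std::memcpy(&value, &f, sizeof value);
    return value;
}
```

On a little-endian machine, `a` occupies the low 32 bits and `b` the high 32 bits of the result, which gives `0x123456789abcdef0` for the values used in the pseudo code.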

Could non-atomic memory access mix with atomic counterparts?

struct Foo { i32 x; i8 y; double z; };

ref<Foo> r1 = new<Foo>;
ref<Foo> r2 = new<Foo>;
ptr<Foo> p1 = pin(r1);
ptr<Foo> p2 = pin(r2);

store(&p1->x, 0x12345678, NOT_ATOMIC);
store(&p1->y, 42, NOT_ATOMIC);
store(&p1->z, 3.1415927D, NOT_ATOMIC);

memcpy(p2, p1, sizeof(Foo));    // This is obviously not atomic

some_synchronization_operation_after_which_atomic_accesses_will_be_safe();   // What should this be?

thread1 {
  store(&r2->y, 84, RELAXED);    // This is atomic
  store(&r2->x, 0x9abcdef0, RELEASE);     // This is atomic
}

thread2 {
  i32 a = load(&r2->x, ACQUIRE);    // This is atomic
  if (a == 0x9abcdef0) {
    i8 b = load(&r2->y, RELAXED);    // This is atomic
    double c = load(&r2->z, RELAXED);    // This is atomic
    assert(b == 84 && c == 3.1415927D);
  }
}