Document memory layout requirements
Created by: wks
Because architectures differ, it is better to leave the object layout implementation-specific. An implementation of a µVM can choose its optimal strategy to make memory accesses as fast as possible.
However, although the memory layout is part of the implementation, some guarantees must be made to enable:
- implementation of object-oriented languages: the superclass-subclass relationship can be most conveniently implemented as the superclass being a prefix of a subclass structure.
- interoperation between µVM programs and C programs: during a foreign function call, data structures are shared so that data can be passed both ways.
- atomic access to some basic data types, references being one such type.
Prefix rules
A type may be a prefix of another type. If T1 is a prefix of T2, then T1 and T2 have shared components. A component can be the whole value or some part of it, including fields in structs, elements in arrays and vectors, and both the fixed part and the variable part in hybrids.
Specifically:
- Any type is a prefix of itself.
  - All corresponding components are shared.
- void is trivially a prefix of any type.
  - No component is shared.
- T1 = T is a prefix of T2 = struct<SEQ> for any T where SEQ is a sequence of types beginning with T.
  - The whole T1 itself is a shared component with the first field in the struct T2.
- T1 = T is a prefix of T2 = hybrid<T U> for any T.
  - The whole T1 is a shared component with the fixed part in the hybrid T2.
- T1 = T is a prefix of T2 = array<T n> for any T if n >= 1.
  - The whole T1 is a shared component with the first element in array T2.
- For all types T1, T2 and T3, if T1 is a prefix of T3 and T3 is a prefix of T2, then T1 is a prefix of T2.
  - The shared components between T1 and T2 are their mutual shared components with T3.
Examples:
- float is a prefix of struct<float double>.
  - The first field is shared.
- struct<@TIB_REF @LOCK @LENGTH_TYPE> is a prefix of hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>.
  - The fixed part of the latter type is shared with the former type.
- int<8> is a prefix of array<int<8> 100>.
  - The first byte element of the latter is shared with the former.
- @TIB_REF is a prefix of hybrid<struct<@TIB_REF @LOCK @LENGTH_TYPE> int<8>>.
  - There is an intermediate type struct<@TIB_REF @LOCK @LENGTH_TYPE> that bridges the "is a prefix of" relation.
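The examples above can be mirrored in C as a rough analogy. This is a minimal sketch, not part of the µVM design: the names header_t and byte_array_obj_t are hypothetical, the three header fields merely stand in for @TIB_REF, @LOCK and @LENGTH_TYPE, and a C flexible array member stands in for the variable part of a hybrid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct<@TIB_REF @LOCK @LENGTH_TYPE>. */
typedef struct {
    void    *tib;     /* plays the role of @TIB_REF */
    uint32_t lock;    /* plays the role of @LOCK */
    uint32_t length;  /* plays the role of @LENGTH_TYPE */
} header_t;

/* Hypothetical stand-in for hybrid<header_t int<8>>:
   a fixed part followed by a variable-length tail. */
typedef struct {
    header_t fixed;
    int8_t   var[];   /* flexible array member as the variable part */
} byte_array_obj_t;

/* The fixed part and its first field both start at offset 0. */
static_assert(offsetof(byte_array_obj_t, fixed) == 0, "fixed part is a prefix");
static_assert(offsetof(header_t, tib) == 0, "first field is a prefix");

/* Returns 1 if the whole object, its fixed part and the first field of
   the fixed part all begin at the same address (the transitive case). */
int prefixes_share_address(const byte_array_obj_t *obj) {
    const void *whole = obj;
    const void *fixed = &obj->fixed;
    const void *first = &obj->fixed.tib;
    return whole == fixed && fixed == first;
}

/* int<8> versus array<int<8> 100>: the array as a whole and its first
   element begin at the same address. */
int array_shares_first_element(void) {
    int8_t bytes[100] = {0};
    return (const void *)&bytes == (const void *)&bytes[0];
}
```

Both functions are expected to return 1 on a conforming C11 compiler, since C places the first member of a struct and the first element of an array at offset 0.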
If:
- there is a memory location M which represents data of type T2, and
- T1 is a type and T1 is a prefix of T2, and
- r1 is an iref<T1> and refers to memory location M1, and
- r2 is an iref<T2> and refers to memory location M, and
- M and M1 begin at the same place (i.e. they have the same address), and
- rc1 is r1 or an internal reference derived from r1, and
- rc2 is r2 or an internal reference derived from r2, and
- rc1 and rc2 refer to a shared component between T1 and T2,
then rc1 and rc2 refer to the same memory location. This means a shared component can be accessed as if it were a field of the prefix, which allows treating an instance of a subclass as an instance of its superclass.
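The rule above corresponds to the familiar C idiom of embedding the superclass as the first member of the subclass. The following is only a sketch in C, with hypothetical types animal_t (in the role of T1) and dog_t (in the role of T2); the pointer a plays the role of r1, &rex the role of r2, and &a->weight and &rex.base.weight the roles of rc1 and rc2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical superclass layout, in the role of T1. */
typedef struct {
    int    id;
    double weight;
} animal_t;

/* Hypothetical subclass layout, in the role of T2.
   animal_t is its first field and therefore its prefix. */
typedef struct {
    animal_t base;
    int      tricks;
} dog_t;

static_assert(offsetof(dog_t, base) == 0, "superclass must be a prefix");

/* Code written against the superclass only. */
double animal_weight(const animal_t *a) {
    return a->weight;
}

/* Upcast: a pointer to a struct, suitably converted, points to its
   first member (C11 6.7.2.1 paragraph 15). */
const animal_t *dog_as_animal(const dog_t *d) {
    return (const animal_t *)d;
}
```

Passing dog_as_animal(&rex) to animal_weight reads the same memory location as rex.base.weight, which is the behaviour the prefix rule asks a µVM implementation to guarantee for iref values.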
Related standard: C11
- 6.3.2.1-3: (array --> pointer to first element) ... an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. ...
- 6.7.2.1-15: (pointer to struct <--> pointer to first field) ... A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. ...
TODO: C does not explicitly allow prefixing between structs such as the following, where the member sequence of one is a prefix of the member sequence of the other:
struct Foo { short a; int b; long c; };
struct Bar { short a; int b; long c; float d; double e; };
There may be a reason behind it.
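Whether the leading members of these two structs line up can at least be observed for a given compiler and ABI. The check below is a sketch with a hypothetical function name; a result of 1 is an observation about one platform, not a guarantee of the C standard, and it says nothing about whether accessing a Bar through a pointer to Foo is permitted.

```c
#include <assert.h>
#include <stddef.h>

struct Foo { short a; int b; long c; };
struct Bar { short a; int b; long c; float d; double e; };

/* The first member of any struct is at offset 0; this much C guarantees. */
static_assert(offsetof(struct Foo, a) == 0, "first member at offset 0");
static_assert(offsetof(struct Bar, a) == 0, "first member at offset 0");

/* Returns 1 if every member of Foo sits at the same offset as the
   same-named member of Bar on the ABI this file is compiled for. */
int foo_is_layout_prefix_of_bar(void) {
    return offsetof(struct Foo, a) == offsetof(struct Bar, a)
        && offsetof(struct Foo, b) == offsetof(struct Bar, b)
        && offsetof(struct Foo, c) == offsetof(struct Bar, c);
}
```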
Object layout and C foreign function interface (FFI)
The object layout should follow the application binary interface (ABI) as much as possible because:
- The ABI is carefully designed by system programmers for performance. The µVM implementer needs a good reason not to follow it.
- When a foreign function call to external C programs is needed, the µVM data structure should already be in the layout the C programs expect.
The µVM only needs to guarantee some compatibility between µVM types and C types in the FFI. This compatibility is already documented in the Instruction Set, but needs to be double-checked.
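To illustrate what following the ABI buys, the C side of a foreign call can declare an ordinary struct and operate on the shared data in place. The sketch below rests on assumptions: the mapping of int<32> to int32_t and of double to C double, and the names point_t and point_scale, are made up for this example; the real correspondence is whatever the Instruction Set documents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of a µVM struct<int<32> double>. */
typedef struct {
    int32_t tag;
    double  value;
} point_t;

/* The platform ABI decides the padding between the two fields; only
   properties that hold on any ABI are asserted here. */
static_assert(offsetof(point_t, tag) == 0, "first field at offset 0");
static_assert(offsetof(point_t, value) >= sizeof(int32_t), "fields do not overlap");
static_assert(sizeof(point_t) >= sizeof(int32_t) + sizeof(double), "room for both fields");

/* Hypothetical foreign function: it reads and writes the shared
   structure in place, with no copying or conversion step. */
void point_scale(point_t *p, double factor) {
    p->value *= factor;
}
```

If the µVM lays out the corresponding struct the way the C compiler does, a pointer to the µVM data can be handed to point_scale directly.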
Basic data structures that need atomic access
To guarantee memory safety, the µVM must not allow out-of-thin-air reference values or opaque values. Affected types are:
- ref
- iref
- weakref (loaded into SSA variables as ref)
- func
- thread
- stack
- tagref64 (may contain ref)
- futex (one-word integer, loaded into SSA variables as plain int<WORD_SIZE>)
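For the single-word types in this list, the requirement maps naturally onto word-sized atomic loads and stores. The following C11 sketch assumes a ref is encoded as one machine pointer; the names ref_cell_t, ref_store and ref_load are hypothetical and do not come from the µVM specification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical single-word encoding of a ref held in memory. */
typedef _Atomic(void *) ref_cell_t;

/* A value of 2 means atomic pointer operations are always lock-free
   on this target. */
static_assert(ATOMIC_POINTER_LOCK_FREE == 2, "word-sized atomics must be lock-free");

/* Even a relaxed atomic store is indivisible: a concurrent reader sees
   either the old reference or the new one, never a mix of their bytes. */
void ref_store(ref_cell_t *cell, void *obj) {
    atomic_store_explicit(cell, obj, memory_order_relaxed);
}

/* Relaxed atomic load of the reference held in the cell. */
void *ref_load(ref_cell_t *cell) {
    return atomic_load_explicit(cell, memory_order_relaxed);
}
```

Relaxed ordering is used here only to show that atomicity, not ordering, is what rules out torn reference values; a real implementation would choose the ordering that each access requires.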
Storing internal references (iref) in memory (heap, stack or global) is discouraged because it is space-inefficient (there is no better way than encoding them as fat pointers). But if an implementation does allow putting iref in memory, accesses to them should be atomic. The types other than iref are too important not to be implemented as lock-free atomic types (all atomic accesses in the µVM are lock-free).
An alternative to requiring iref to be atomic is to:
- document that accesses to iref in memory are not atomic, and
- require the client to compile all memory accesses to iref with locks, and
- accept a significant performance penalty.
Since a fat pointer consists of two words (an object reference plus an in-object offset), some architectures (I assume only very few) may not provide atomic access to a value of that width.
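The lock-based alternative can be sketched in C11 as follows. Everything here is an assumption for illustration: the fat-pointer encoding fat_iref_t, the cell type iref_cell_t and the choice of a per-cell spin lock are hypothetical, and a client could equally use a different locking scheme.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fat-pointer encoding of an iref: two machine words. */
typedef struct {
    void     *base;    /* reference to the enclosing object */
    uintptr_t offset;  /* byte offset of the referenced location */
} fat_iref_t;

static_assert(sizeof(fat_iref_t) > sizeof(void *), "wider than one word");

/* A memory cell holding an iref, guarded by its own spin lock. */
typedef struct {
    atomic_flag lock;
    fat_iref_t  value;
} iref_cell_t;

static void cell_lock(iref_cell_t *cell) {
    while (atomic_flag_test_and_set_explicit(&cell->lock, memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
}

static void cell_unlock(iref_cell_t *cell) {
    atomic_flag_clear_explicit(&cell->lock, memory_order_release);
}

/* Must run before the cell is shared with other threads. */
void iref_cell_init(iref_cell_t *cell, fat_iref_t v) {
    atomic_flag_clear(&cell->lock);
    cell->value = v;
}

/* Both words are written while the lock is held, so no reader can
   observe a new base paired with an old offset. */
void iref_store(iref_cell_t *cell, fat_iref_t v) {
    cell_lock(cell);
    cell->value = v;
    cell_unlock(cell);
}

/* Both words are read while the lock is held. */
fat_iref_t iref_load(iref_cell_t *cell) {
    cell_lock(cell);
    fat_iref_t v = cell->value;
    cell_unlock(cell);
    return v;
}
```

Every load and store pays for a lock acquire and release and the cell grows by the size of the lock, which gives a sense of the performance penalty mentioned above.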