Commit 9900b7d7 authored by Kunshan Wang

Deprecate binary form in favor of the IRBuilder.

As we discussed, the bundle loading interface between the client and the
micro VM should be a functional interface. The binary form is still a
parsed format. This commit removes the binary format in the type system
and the instruction set.
parent 5422904a
......@@ -5,23 +5,24 @@ Mu Specification
This document aims to provide a detailed description of Mu, a micro virtual
machine, including its architecture, instruction set and type system.
This branch uses the goto-with-values form. The previous branch using SSA form
with PHI nodes is in the `phi
<https://github.com/microvm/microvm-spec/tree/phi>`__ branch.
NOTE: This branch uses the goto-with-values form. The previous branch using
SSA form with PHI nodes is in the `phi
<https://github.com/microvm/microvm-spec/tree/phi>`__ branch.
Main specification:
- `Overview <overview.rest>`__
- `Intermediate Representation <uvm-ir.rest>`__
- `Intermediate Representation (Binary Form) <uvm-ir-binary.rest>`__
- `Intermediate Representation Binary Form (deprecated) <uvm-ir-binary.rest>`__
- `Type System <type-system.rest>`__
- `Instruction Set <instruction-set.rest>`__
- `Common Instructions <common-insts.rest>`__
- `Client Interface <uvm-client-interface.rest>`__
- `Client Interface (a.k.a. The API) <uvm-client-interface.rest>`__
- `Call-based IR Building API (work in progress) <irbuilder.rest>`__
- `Threads and Stacks <threads-stacks.rest>`__
- `Memory and Garbage Collection <uvm-memory.rest>`__
- `Memory Model <memory-model.rest>`__
- `Native Interface <native-interface.rest>`__
- `(Unsafe) Native Interface <native-interface.rest>`__
- `Heap Allocation and Initialisation Language (HAIL) <hail.rest>`__
- `Portability and Implementation Advices <portability.rest>`__
......@@ -29,6 +30,5 @@ Platform-specific parts: These extends the main specification. The main
specification considers these parts as implementation-specific.
- `AMD64 Unix Native Interface <native-interface-x64-unix.rest>`__
- `Call-based Mu IR Building API <call-irbuilder.rest>`__
.. vim: tw=80
......@@ -356,8 +356,7 @@ ID/name conversion
must be a global name.
- ``name_of`` converts the ID ``%id`` to its corresponding name. If the name
does not exist (if defined in binary only, or it is an instruction without
name), it returns ``NULL``. The returned object must not be modified.
does not exist, it returns ``NULL``. The returned object must not be modified.
They have undefined behaviours if the name or the ID in the argument do not
exist, or ``%name`` is ``NULL``.
......@@ -371,8 +370,10 @@ Bundle/HAIL loading
[0x253]@uvm.meta.load_hail (%buf: @uvm.meta.bytes.r)
``load_bundle`` and ``load_hail`` load Mu IR bundles and HAIL scripts,
respectively. ``%buf`` is the content. The first 4 characters in ``%buf``
determines whether it is binary or text.
respectively. ``%buf`` is the content.
TODO: These comminsts should be made optional, and the IR Builder API should
be provided as comminsts, too.
Stack introspection
-------------------
......
......@@ -2,6 +2,11 @@
Heap Allocation and Initialisation Language
===========================================
**HAIL may not be the best tool**. The most efficient way to initialise a micro
VM is by building a boot image (this is implementation-specific). The most
efficient way to load objects from a serialised file is to build the
de-serialiser (such as the ".class" file parser) in Mu IR.
The Heap Allocation and Initialisation Language (HAIL) is a Mu IR-like language
that allocates heap objects and initialises Mu memory locations with values.
......
This diff is collapsed.
......@@ -79,11 +79,7 @@ integers, but their lengths are implementation-defined.
Constants
=========
An integer constant of type ``T = int<n>`` is required if T is implemented and n
is at most 64.
NOTE: The binary representation currently only defines 64-bit literals. This
is subject to extension.
Integer constants of type ``int<n>`` are required for all implemented n.
Float and double constants are required.
......
......@@ -14,9 +14,6 @@ constructors.
By convention, types are written in lower case. Parameters to types are written
in angular brackets ``< >``.
The binary form of the type constructors is shown in the table after each type
constructor.
Type and Data Value
===================
......@@ -70,7 +67,7 @@ The following type constructors are available in Mu:
Parameters of a type are in the angular brackets. They can be integer literals,
types and function signatures. In the text form, the latter two are global
names (See `<uvm-ir.rest>`__). In the binary form, they are IDs.
names (See `<uvm-ir.rest>`__).
There are several kinds of types.
......@@ -126,16 +123,10 @@ following:
Integer Type
------------
``int < length >``
``length``
*integer literal*: The length of the integer in bits.
``int`` ``<`` *length* ``>``
+------+--------+
| opct | i8 |
+======+========+
| 0x01 | length |
+------+--------+
length
*integer literal*: The length of the integer in bits.
``int`` is an integer type of *length* bits.
......@@ -178,20 +169,8 @@ Floating Point Types
``float``
+------+
| opct |
+======+
| 0x02 |
+------+
``double``
+------+
| opct |
+======+
| 0x03 |
+------+
``float`` and ``double`` are the IEEE 754 single-precision and double-precision
floating point number types, respectively.
......@@ -207,38 +186,14 @@ floating point number type, respectively.
Reference Types
---------------
``ref < T >``
``T``
*type*: The type of referent.
+------+-----+
| opct | idt |
+======+=====+
| 0x04 | T |
+------+-----+
``iref < T >``
``T``
*type*: The type of referent.
+------+-----+
| opct | idt |
+======+=====+
| 0x05 | T |
+------+-----+
``ref`` ``<`` *T* ``>``
``weakref < T >``
``iref`` ``<`` *T* ``>``
``T``
*type*: The type of referent.
``weakref`` ``<`` *T* ``>``
+------+-----+
| opct | idt |
+======+=====+
| 0x06 | T |
+------+-----+
T
*type*: The type of referent.
``ref`` is an object reference type. A ``ref`` value is a strong reference to a
heap object.
......@@ -328,16 +283,10 @@ when a ``ref`` is stored to a ``weakref`` location, the location holds a
Struct
------
``struct < T1 T2 ... >``
``T1``, ``T2``, ``...``
*type*: The type of fields.
``struct`` ``<`` *T1* *T2* *...* ``>``
+------+---------+-----+-----+-----+
| opct | lent | idt | idt | ... |
+======+=========+=====+=====+=====+
| 0x07 | nfields | T1 | T2 | ... |
+------+---------+-----+-----+-----+
T1, T2, ...
*type*: The type of fields.
A ``struct`` is a Cartesian product type of several types. *T1*, *T2*, *...* are
its **field types**. A ``struct`` must have at least one member. ``struct``
......@@ -363,11 +312,12 @@ A ``struct`` cannot have itself as a component.
``struct`` cannot be the type of an SSA variable if any of its field types
cannot be the type of an SSA variable.
..
NOTE: For example, a ``struct`` with a ``weakref`` field cannot be the type
of an SSA variable. However, there can be references to such structs.
In the binary form, an integer literal ``nfields`` determines the number of
fields. Exactly that number of type IDs follows the ``nfields`` literal.
..
For LLVM users: This is the same as LLVM's structure type, except structures
with a "flexible array member" (a 0-length array as the last element)
......@@ -389,18 +339,12 @@ fields. Exactly that number of type IDs follows the ``nfields`` literal.
Array
-----
``array < T length >``
``T``
*type*: The type of elements.
``length``
*integer literal*: The number of elements.
``array`` ``<`` *T* *length* ``>``
+------+-----+--------+
| opct | idt | aryszt |
+======+=====+========+
| 0x08 | T | length |
+------+-----+--------+
T
*type*: The type of elements.
length
*integer literal*: The number of elements.
An ``array`` is a sequence of values of the same type. *T* is its **element
type**, i.e. the type of its elements, and *length* is the length of the array.
......@@ -444,18 +388,12 @@ It is not recommended to have SSA variables of ``array`` type.
Hybrid
------
``hybrid < F1 F2 ... V >``
``F1 F2 ...``
*list of types*: The types in the fixed part
``V``
*type*: The type of the elements of the variable part
``hybrid`` ``<`` *F1* *F2* *...* *V* ``>``
+------+---------+-----+-----+-----+-----+
| opct | lent | idt | idt | ... | idt |
+======+=========+=====+=====+=====+=====+
| 0x09 | nfields | F1 | F2 | ... | V |
+------+---------+-----+-----+-----+-----+
F1, F2, ...
*list of types*: The types in the fixed part
V
*type*: The type of the elements of the variable part
A hybrid is a combination of a fixed-length prefix, i.e. its ``fixed part``, and
a variable-length array suffix, i.e. its ``variable part``, whose length is
......@@ -509,12 +447,6 @@ Void Type
``void``
+------+
| opct |
+======+
| 0x0A |
+------+
The ``void`` type has no value.
It can only be used as the type of allocation units that do not store any
......@@ -524,19 +456,13 @@ heap object which is not the same as any others. This is similar to the ``new
Object()`` expression in Java. ``ref<void>``, ``iref<void>``, ``weakref<void>``
and ``uptr<void>`` are also allowed, which can refer/point to "anything".
Opaque Reference Type
---------------------
Function Reference Type
-----------------------
``funcref < sig >``
``funcref`` ``<`` *sig* ``>``
``sig``
*function signature*: The signature of the referred function.
+------+-----+
| opct | idt |
+======+=====+
| 0x0B | sig |
+------+-----+
sig
*function signature*: The signature of the referred function.
``funcref`` is a function reference type. It is an opaque reference to a Mu
function and is not interchangeable with reference types. *sig* is the signature
......@@ -567,30 +493,15 @@ A ``NULL`` value of a ``funcref`` type does not refer to any function.
.typedef @func1 = funcref<@sig1>
.typedef @func2 = funcref<@sig2>
``threadref``
Other Opaque Reference Types
----------------------------
+------+
| opct |
+======+
| 0x0C |
+------+
``threadref``
``stackref``
+------+
| opct |
+======+
| 0x0D |
+------+
``framecursorref``
+------+
| opct |
+======+
| 0x12 |
+------+
``threadref``, ``stackref`` and ``framecursorref`` are opaque reference types to
Mu threads, Mu stacks and frame cursors, respectively. They are not
interchangeable with reference types. Only some special instructions (e.g.
......@@ -608,12 +519,6 @@ Tagged Reference
``tagref64``
+------+
| opct |
+======+
| 0x0E |
+------+
``tagref64`` is a union type of ``double``, ``int<52>`` and ``struct<ref<void>
int<6>>``. It occupies 64 bits. A ``tagref64`` value holds both a state which
identifies the type it is currently representing and a value of that type.
......@@ -646,12 +551,6 @@ Vector Type
``length``
*integer literal*: The number of elements.
+------+-----+--------+
| opct | idt | lent |
+======+=====+========+
| 0x0F | T | length |
+------+-----+--------+
``vector`` is the vector type for single-instruction multiple-data (SIMD)
operations. A ``vector`` value is a packed value of multiple values of the same
type. *T* is the type of its elements and *length* is the number of elements.
......
......@@ -256,13 +256,18 @@ Bundle and HAIL loading
The ``load_bundle`` function loads a Mu IR bundle, and the ``load_hail``
function loads a HAIL script. The content is held in the memory pointed to by
``buf``, and ``sz`` is the length of the content in bytes. It contains the
binary form if it begins with the 4-byte magic of the binary IR or the binary
HAIL script, otherwise it is the text form.
``buf``, and ``sz`` is the length of the content in bytes.
Concurrency: The content of the bundle or the effect of the HAIL script is fully
visible to other evaluations in the client that *happen after* this call.
..
TODO: These two functions should be made optional or lifted to the
higher-level API which should be beyond this spec. The text-form bundle
needs a parser, and the HAIL script is also not the most efficient way to
load data into Mu at run time.
..
For Lua users: This is similar to ``lua_load``, but a Mu bundle itself is
......
......@@ -5,6 +5,14 @@ Intermediate Representation (Binary Form)
This document describes the binary form of the Mu intermediate representation.
For the text form, see `<uvm-ir.rest>`__.
**DEPRECATED**: The binary format is deprecated. As mentioned in `this ticket
<https://github.com/microvm/microvm-meta/issues/55>`__, we have come to the
conclusion that the interface between the client and the micro VM should be a
functional interface, i.e. constructing IR nodes by invoking API functions. This
binary IR form is still a serialised data format that needs to be parsed. The
text form, however, is still useful for debugging and for use in statically
compiled implementations.
Overview
========
......
......@@ -9,13 +9,14 @@ Mu.
Mu can execute the program in any way, including interpretation, JIT compiling
or even Ahead-of-time compiling.
Mu IR itself has two defined representations: a text form for human readability
and a binary form for compact encoding. Concrete Mu implementations may
introduce their own formats as long as they are equivalent, in which case the
interface to use and to load them is implementation-specific.
Mu IR is a tree-shaped structure that consists of nodes, including top-level
definitions and their children. It also has a human-readable text form.
This document describes the text form and the aspects not specific to the binary
form. For the binary form, see `<uvm-ir-binary.rest>`__.
This document describes the top-level of the Mu IR using the text form. There is
also the `IR Builder API <irbuilder.rest>`__, a programmatic interface to build
Mu IR inside a running micro VM.
There was a binary form, but it is now deprecated. See `<uvm-ir-binary.rest>`__.
For the documents of the type system and the instruction set, see:
......@@ -101,52 +102,19 @@ Many entities in Mu are **identified**. An identified entity has an
identifier (ID) and optionally a name. An identifier (ID) is a 32-bit integer.
A name is a string starting with a ``@`` or a ``%`` and followed by many
characters in the set: ``[0-9a-zA-Z_-.]``. An ID uniquely identifies an
*identified* entity. A name also uniquely identifies an *identified* entity.
*identified* entity. A name, if present, also uniquely identifies an
*identified* entity.
NOTE: This specification does not define what an "entity" is. An English
dictionary would define "entity" as "a thing with distinct and independent
existence".
In the Mu IR text form, names are exclusively used and IDs are automatically
generated. When generating IDs, Mu guarantees that there is a **mapping** from
each name to its corresponding ID, and **no two different names are mapped to
the same ID**. In the binary form, IDs are exclusively generated and names can
be introduced via name-binding.
NOTE: There may be multiple different identified entities for the same
thing. It is allowed to declare two types with the same concrete type::
.typedef @i64 = int<64>
.typedef @long = int<64>
In the Mu IR, if ``@i64`` and ``@long`` are declared this way, they can be
used interchangeably. For example, an ``ADD`` instruction can add an
``@i64`` value to a ``@long`` value because both are ``int<64>``.
NOTE: Just because some kinds of entities are identified does not mean all
entities of that kind in a concrete Mu implementation must have IDs and
names. Types are one example. Type definitions in the Mu IR are always
identified, but some Mu instructions may yield values of types that are not
defined in the IR. For example, the ``CMPXCHG`` instruction on type *T*
yields a value of type ``struct<T int<1>>`` where the second field indicates
whether the ``CMPXCHG`` operation is successful. The struct type may not be
defined in the IR, but the Mu IR program cannot make use of the result
unless it defines a type for the struct type because the ``EXTRACTVALUE``
instruction takes a type parameter which is the type of the struct value
parameter. For example::
.typedef @i1 = int<1>
.typedef @i64 = int<64>
.typedef @i64_cmpxchg_result = struct<@i64 @i1>
%result = CMPXCHG SEQ_CST RELAXED <@i64> %opnd %exp %des
%oldval = EXTRACTVALUE <@i64_cmpxchg_result 0> %result
%succ = EXTRACTVALUE <@i64_cmpxchg_result 1> %result
IDs of entities are determined by the micro VM.
Instructions of similar property include ``SHUFFLEVECTOR`` (result has a new
vector type), ``GETIREF`` (ref to iref), ``LOAD`` (converting ``weakref<T>``
to ``ref<T>``), ``CMPXCHG`` (result is a struct), ``ATOMICRMW`` (same as
``LOAD``).
The text form Mu IR only refers to entities by names. When loaded into a micro
VM, the IDs of entities in a bundle are automatically generated. When generating
IDs, Mu guarantees that there is a **mapping** from each name to its
corresponding ID, and **no two different names are mapped to the same ID**.
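The name-to-ID guarantee above can be illustrated with a minimal Python sketch. It is not part of the Mu specification or any Mu API: the ``NameTable`` class, its methods and the choice to allocate IDs sequentially from 65536 are hypothetical, and how a real micro VM assigns IDs is up to that micro VM. The sketch assumes a name needs at least one character after the ``@`` or ``%`` prefix, and it relies on the statement elsewhere in this document that 0 is an invalid ID and 1-65535 are reserved by Mu.

```python
import re

# Name syntax as described above: '@' or '%' followed by characters from
# [0-9a-zA-Z_-.]. Requiring at least one such character is an assumption.
NAME_RE = re.compile(r"[@%][0-9a-zA-Z_\-.]+")

# 0 is an invalid ID and 1-65535 are reserved by Mu, so this hypothetical
# allocator hands out IDs from 65536 upwards.
FIRST_FREE_ID = 65536


class NameTable:
    """Hypothetical name-to-ID table; not an actual micro VM interface."""

    def __init__(self):
        self._id_of = {}
        self._name_of = {}
        self._next_id = FIRST_FREE_ID

    def define(self, name):
        """Register a new name and return its freshly generated ID."""
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"malformed name: {name!r}")
        if name in self._id_of:
            raise ValueError(f"name conflict: {name!r}")
        new_id = self._next_id
        self._next_id += 1
        # Each name gets its own ID, so no two names share an ID.
        self._id_of[name] = new_id
        self._name_of[new_id] = name
        return new_id

    def id_of(self, name):
        return self._id_of[name]

    def name_of(self, ident):
        # Returns None when the ID has no name, mirroring a NULL result.
        return self._name_of.get(ident)
```

With this table, defining ``@i64`` and ``@long`` yields two distinct IDs even though both may denote ``int<64>``, and defining ``@i64`` a second time is rejected as a name conflict.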
Names
~~~~~
......@@ -177,8 +145,12 @@ bundle.
A **local name** begins with ``%``. Function versions, basic blocks, parameters
and instruction results may use local names in the IR.
NOTE: Local names can only be used in the IR as a syntax sugar. The API
must use IDs or global names.
Local names are syntactic sugar in the text-form IR. When parsed, they are
de-sugared into global names.
NOTE: This implies that the client must use IDs or global names in the
client API because there is no local name once a text-form bundle is loaded
into the micro VM.
The global names are inferred from their syntactic parents:
......@@ -271,9 +243,9 @@ The global names are inferred from their syntactic parents:
@n4(<@i32> @n5 <@i32> @n6 <@i32> @n7)
@n8 = SLT <@i32> @n7 @n5
Local names are merely syntax sugar. Everything that has a local name can be
identified by their global names. It is still considered a naming conflict if
two local names have the same global name.
Because local names are merely syntax sugar, everything that has a local name
can be identified by its global name. It is still considered a naming
conflict if two local names have the same global name.
..
......@@ -295,8 +267,8 @@ two local names have the same global name.
Identifiers
~~~~~~~~~~~
All identifiers are global. Every ID must uniquely identify one entity in the
whole Mu instance.
All identifiers are global. Every ID uniquely identifies one entity in the whole
Mu instance.
0 is an invalid ID. IDs in the range of 1-65535 are reserved by Mu. The Mu
specification only uses 1-32767. 32768-65535 can be used by the Mu
......@@ -907,24 +879,29 @@ How such an exposed function can be called is implementation-specific.
Bundle Loading
==============
TODO: The spec should describe the call-based API, too.
The API provides a ``load_bundle`` function. See `the API
<uvm-client-interface.rest>`__. This function can be called by multiple client
threads on their client contexts, and the result is always as if the bundles
were loaded in some specific sequence.
In a bundle, if any identified entity has the same ID or name as any existing
identified entities defined in previous bundles, it is an error.
TODO: The ``load_bundle`` API function that loads text bundles should be
made optional, since it increases the burden of the micro VM.
A text-form bundle must not define any new entities whose name has been used by
any existing entity, otherwise it is a name-conflict error.
In a function definition, if the function ID or name is the same as an existing
function (which can be created either explicitly by a ``.funcdecl`` or
implicitly by a ``.funcdef``), it must also have the same function signature,
otherwise it is an error. The new function definition **redefines** the
function.
NOTE: There is a special case for ``.funcdef`` in the text form. If the
function name (such as ``@f`` in ``.funcdef @f VERSION @v <@sig> { ... }``)
is an existing function, it does not define a new function ``@f``, but it
only defines a new version ``@v`` for the existing function ``@f``.
After a function definition redefines a function, all calls to the function that
happen after the bundle loading operation will call the newly defined version of
the function. Defines of functions (bundle loading) and uses of functions
(including function calls and the creation of stacks, i.e. the
If a bundle contains a new version of an existing function, it **redefines** the
function. All calls to the function that happen after the bundle loading
operation will call the newly defined version of the function. Definitions of
functions (bundle loading) and uses of functions (including function calls and
the creation of stacks, i.e. the
``@uvm.new_stack`` instruction or the ``new_stack`` API) obey the memory model
of the ``RELAXED`` order as if the definition is a store and the use is a load.
See `Memory Model <memory-model.rest>`__.
......