Commit f2482cb2 authored by Kunshan Wang

Converted to github wiki rst.

Welcome to the MicroVM Design documentation!
This document aims to provide a detailed description of the MicroVM project,
including its architecture, instruction set and type system.

- `Overview <overview>`__
- `µVM Intermediate Representation <uvm-ir>`__
- `µVM IR Binary Form <uvm-ir-binary>`__
- `Type System <type-system>`__
- `Instruction Set <instruction-set>`__
- `Intrinsic Functions <intrinsic-funcs>`__
- `Memory Model <memory-model>`__

.. vim: tw=80
Intrinsic Functions
This document specifies intrinsic functions. Intrinsic functions work like
instructions, except that they all share the same form: every intrinsic function
takes any number of value parameters, returns exactly one value (which may be
void), may keep any number of registers alive and may throw an exception.
This document uses the ``id : @name :: signature`` notation. ID is used in the
binary form and the name (including "@") is used in the text form. The signature
has the form: ``retty (%paramname0: paramtype0, %paramname1: paramtype1,
...)``, where ``retty`` is the return type, ``%paramnamex`` is the parameter
name and ``paramtypex`` is the parameter type.
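The ``id : @name :: signature`` notation is regular enough to parse mechanically. The following Python sketch splits an entry into its ID, name, return type and parameter list. It is illustrative only: ``parse_entry`` and ``IntrinsicSig`` are hypothetical names, and the regular expression handles only the simple signatures used in this document (it does not understand types that themselves contain parentheses, such as ``func`` signatures).

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Matches entries such as "0x201 : @uvm.new_thread :: thread (%s: stack)".
# Group 1: hex ID, group 2: @name, group 3: return type, group 4: parameter text.
ENTRY = re.compile(r"^\s*(0x[0-9A-Fa-f]+)\s*:\s*(@[\w.]+)\s*::\s*(.+?)\s*\((.*)\)\s*$")


@dataclass
class IntrinsicSig:
    ident: int                      # ID used in the binary form
    name: str                       # name used in the text form, including "@"
    ret_type: str                   # return type, possibly "void"
    params: List[Tuple[str, str]]   # (%paramname, paramtype) pairs, in order


def parse_entry(line: str) -> IntrinsicSig:
    m = ENTRY.match(line)
    if m is None:
        raise ValueError(f"not an intrinsic entry: {line!r}")
    id_text, name, ret_type, param_text = m.groups()
    params = []
    for chunk in param_text.split(","):
        chunk = chunk.strip()
        if not chunk:               # "void ()" has an empty parameter list
            continue
        pname, ptype = (part.strip() for part in chunk.split(":", 1))
        params.append((pname, ptype))
    return IntrinsicSig(int(id_text, 16), name, ret_type, params)


sig = parse_entry("0x201 : @uvm.new_thread :: thread (%s: stack)")
print(hex(sig.ident), sig.name, sig.ret_type, sig.params)
# prints: 0x201 @uvm.new_thread thread [('%s', 'stack')]
```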
Thread and Stack operations
- ``0x201 : @uvm.new_thread :: thread (%s: stack)``
Create a new thread for a given stack ``%s``. The stack must be in the READY
state before calling. After calling ``@uvm.new_thread``, the stack
enters the ACTIVE state and the new thread starts running immediately.
Return the handle to the newly created thread.
- ``0x202 : @uvm.swap_stack :: void (%s: stack)``
- OSR point (may have KEEPALIVE).
- May throw exception.
Swap the current thread from the current stack to a given stack ``%s`` and
continue executing. The current stack will enter the READY state after
calling. The destination stack must be in the READY state before calling and
will enter the ACTIVE state.
- ``0x203 : @uvm.kill_stack :: void (%s: stack)``
Destroy the given stack ``%s``. The stack ``%s`` must be in the READY state
before calling and will enter the DEAD state.
- ``0x204 : @uvm.swap_and_kill :: void (%s: stack)``
Destroy the current stack and swap to another stack ``%s``. The current stack
will enter the DEAD state after calling. The stack ``%s`` must be in the READY
state before calling and will enter the ACTIVE state.
- ``0x205 : @uvm.thread_exit :: void ()``
Stop the current thread and kill the current stack. The current stack will enter
the DEAD state after calling. The current thread stops running.
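The preconditions and transitions above form a small state machine over the stack states READY, ACTIVE and DEAD. The Python model below is a hypothetical sketch for checking one's understanding, not part of any µVM API: the ``Stack`` and ``Thread`` classes are invented, the current thread is passed explicitly (the real intrinsics act on the calling thread implicitly), and a freshly created stack is *assumed* to start in the READY state.

```python
from enum import Enum


class State(Enum):
    READY = "READY"
    ACTIVE = "ACTIVE"
    DEAD = "DEAD"


class Stack:
    def __init__(self, name: str):
        self.name = name
        self.state = State.READY    # assumption: a new stack starts READY


class Thread:
    def __init__(self, stack: Stack):
        self.stack = stack          # the stack this thread is currently running on
        self.running = True


def _require_ready(s: Stack) -> None:
    if s.state is not State.READY:
        raise RuntimeError(f"stack {s.name} is {s.state.name}, expected READY")


def new_thread(s: Stack) -> Thread:              # models @uvm.new_thread
    _require_ready(s)
    s.state = State.ACTIVE
    return Thread(s)


def swap_stack(t: Thread, s: Stack) -> None:     # models @uvm.swap_stack
    _require_ready(s)
    t.stack.state = State.READY     # the old stack can be resumed later
    s.state = State.ACTIVE
    t.stack = s


def kill_stack(s: Stack) -> None:                # models @uvm.kill_stack
    _require_ready(s)
    s.state = State.DEAD


def swap_and_kill(t: Thread, s: Stack) -> None:  # models @uvm.swap_and_kill
    _require_ready(s)
    t.stack.state = State.DEAD      # the old stack is destroyed
    s.state = State.ACTIVE
    t.stack = s


def thread_exit(t: Thread) -> None:              # models @uvm.thread_exit
    t.stack.state = State.DEAD
    t.running = False
```

For example, after ``t = new_thread(a)`` and ``swap_stack(t, b)``, stack ``a`` is READY again and may be destroyed with ``kill_stack(a)``, whereas ``kill_stack(b)`` raises an error because ``b`` is ACTIVE.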
Math functions
TODO: All functions available in math.h should be available in the µVM.
.. uvm:ifunc:: 0x101 : @uvm.math.sin :: double (%opnd: double)
.. vim: tw=80
Memory Model
TODO: Copy relevant material from LLVM, C11 and C++11 documents.
MicroVM Overview
µVM is a low-level virtual machine (not to be confused with the existing LLVM
project). It is a thin layer over the operating system. It has a low-level type
system and instruction set, but has native support for three major features:
- garbage collection
- concurrency
- just-in-time compiling
Traditional virtual machines must implement all three of these in addition to
implementing the high-level programming language itself. When a programming
language is implemented on top of the µVM, it can depend on the µVM for proper
GC, JIT and concurrency.
Take Java as an example. If Java were implemented on top of the µVM, the Java
client would only need to handle JVM-specific features, including the byte-code
format, class loading and aspects of object-oriented programming.

::

       Traditional JVM          Java on top of µVM

    +-------------------+    +-----------------------+
    |                   |    |                       |
    | *JVM*             |    | *Java Client*         |
    | byte-code format  |    | byte-code format      |
    | class loading     |    | class loading         |
    | OOP               |    | OOP                   |
    | GC                |    |                       |
    | concurrency       |    +-----------------------+
    | JIT compiling     |    | *µVM*                 |
    |                   |    | GC, concurrency, JIT  |
    +-------------------+    +-----------------------+
    | *OS*              |    | *OS*                  |
    +-------------------+    +-----------------------+

How It Works
The whole system is divided into a language-specific **client** and a
language-neutral **Micro Virtual Machine**, a.k.a. **µVM**.

::

                 | source code or byte code
                 v
        +-----------------+
        |     client      |
        +-----------------+
              |     ^
     µVM IR / |     | trap/watchpoints
     API call |     | other events
              v     |
        +-----------------+  manages   +----------+
        |       µVM       |----------->| µVM heap |
        +-----------------+            +----------+

A typical client implements a high-level language (*e.g.,* Python or
Lua). Such a client is responsible for loading, parsing and executing the
source code. The client may execute code with its own interpreter and use the
µVM only to JIT-compile hot code, or it may rely entirely on the µVM as its
execution engine.
The client presents programs to the µVM in a language called **µVM Intermediate
Representation**, a.k.a. **µVM IR**. µVM IR can define the following:
- types
- function signatures
- constants
- global data
- functions (declare or define)
µVM IR alone is not sufficient for implementing a language. A µVM implementation
therefore also provides an API through which the client can, among other things:
- load µVM IR code
- start executing a function
- allocate objects and perform load/store on the heap. This allows
pre-allocating objects before the main function runs.
- handle traps and do on-stack replacement (OSR)
- handle the calls to functions which are declared but not defined
- handle the event when an object referred to by a weak reference is about to be
  collected
See `µVM IR <uvm-ir>`__ for an overview of the µVM IR.
The documentation of the implementation will be provided soon.
.. vim: tw=80
Type System
In this document, the text format appears as:
``sometype < type_param1 type_param2 ... >``
description for type param1
description for type param2
Type Constructors
All µVM types are (potentially recursive) combinations of the following *type
constructors*. By convention, types are written in lower case. Type parameters
are written in angle brackets ``< >``.
The binary form of each type constructor is shown in the table after its description.
- int < length >
- float
- double
- ref < T >
- iref < T >
- weakref < T >
- struct < T1 T2 ... >
- array < T length >
- hybrid < F V >
- void
- func < sig >
- thread
- stack
- tagref64
Numeric Types
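``int < length >``
*intLiteral*: The length of the integer in bits.

+------+--------+
| opct | i8     |
+======+========+
| 0x01 | length |
+------+--------+

``float``

+------+
| opct |
+======+
| 0x02 |
+------+

``double``

+------+
| opct |
+======+
| 0x03 |
+------+
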
``int`` is the only integer type in µVM. *length* is the length of the
integer in bits. Boolean values accepted by the ``SELECT`` and
``BRANCH2`` instructions are represented as ``int<1>``, but the client
decides what µVM type the high-level language boolean type maps to.
Integer values can be interpreted as either signed or unsigned. The signedness
is determined by the instructions rather than the type. For example, for the
division operation, the ``UDIV`` instruction treats both of its operands as
unsigned integers and ``SDIV`` treats them as signed integers. When an integer
is treated as signed, it uses the 2's complement representation where the
highest bit is the sign bit.
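To make the distinction concrete, the Python sketch below interprets the same 8-bit pattern ``0xF6`` both ways and mimics ``UDIV`` and ``SDIV`` on it. The helper names are hypothetical, and the sketch *assumes* that ``SDIV`` truncates toward zero (as LLVM's ``sdiv`` does); this document does not state the rounding rule.

```python
def to_unsigned(value: int, length: int) -> int:
    """Wrap any Python int to a length-bit pattern in [0, 2**length)."""
    return value & ((1 << length) - 1)


def to_signed(bits: int, length: int) -> int:
    """Read a length-bit pattern as a 2's complement signed integer."""
    bits = to_unsigned(bits, length)
    if bits & (1 << (length - 1)):   # highest bit set: negative number
        return bits - (1 << length)
    return bits


def udiv(a: int, b: int, length: int) -> int:
    """Unsigned division of two int<length> bit patterns."""
    return to_unsigned(a, length) // to_unsigned(b, length)


def sdiv(a: int, b: int, length: int) -> int:
    """Signed division (assumed to truncate toward zero), returned as a bit pattern."""
    sa, sb = to_signed(a, length), to_signed(b, length)
    q = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        q = -q
    return to_unsigned(q, length)


x = 0xF6                              # one int<8> bit pattern
print(to_unsigned(x, 8))              # 246 when treated as unsigned
print(to_signed(x, 8))                # -10 when treated as signed
print(udiv(x, 3, 8))                  # 82
print(to_signed(sdiv(x, 3, 8), 8))    # -3
```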
``float`` and ``double`` are single-precision and
double-precision floating point numbers, respectively.
For LLVM users: these types are directly borrowed from LLVM.
Reference Types
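``ref < T >``
*type*: The type of the referent.

+------+-----+
| opct | idt |
+======+=====+
| 0x04 | T   |
+------+-----+

``iref < T >``
*type*: The type of the referent.

+------+-----+
| opct | idt |
+======+=====+
| 0x05 | T   |
+------+-----+

``weakref < T >``
*type*: The type of the referent.

+------+-----+
| opct | idt |
+======+=====+
| 0x06 | T   |
+------+-----+
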
``ref`` is an object reference. It always refers to an object on the
heap; ``iref`` is an internal reference: it refers to a field inside an
object on the heap or on the stack. The *T* parameter is always the type this
reference refers to.
``ref`` and ``iref`` may have value null, which is represented
by literal 0 in the text form. It is the only allowed invalid reference. All
null references are equal. Deriving an ``iref`` from a null
``ref`` is meaningless.
``weakref`` is a weak object reference. Its referent will be garbage
collected if there are no strong references (``ref`` or ``iref``)
to it. In the event that the space of the referent is reclaimed and the
``weakref`` is no longer valid, its content will become null (0).
There is no weak internal reference.
``weakref`` is not an SSA Value. It must reside in memory (heap or stack).
``LOAD`` from a ``weakref`` yields a strong reference, and a strong reference
can be stored into a ``weakref`` field with ``STORE``.
µVM provides the client a mechanism to retain any ``weakref`` when its
referent is about to be collected. This is done in an implementation-specific
way and is beyond the scope of this specification.
For LLVM users: there is no equivalent in LLVM. µVM guarantees that all
references are identified both in the heap and in the stack and are subject to
garbage collection. The closest counterpart in LLVM is the pointer type, but µVM
does not encourage the use of pointers, although pointer types will be
introduced in µVM in the future.

::

    ref<struct<int<32> int<16> int<8> double float>>
    ref<array<int<8> 100>>
    iref<struct<int<32> int<16> int<8> double float>>
    iref<array<int<8> 100>>
    weakref<struct<int<32> int<16> int<8> double float>>
    weakref<array<int<8> 100>>

Composite Types
``struct < T1 T2 ... >``
``T1``, ``T2``, ``...``
*type*: The type of fields.
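
+------+---------+-----+-----+-----+
| opct | lent    | idt | idt | ... |
+======+=========+=====+=====+=====+
| 0x07 | nfields | T1  | T2  | ... |
+------+---------+-----+-----+-----+
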
A struct is a Cartesian product type of several types. *T1*, *T2*, etc. are the
types of its fields.
A struct cannot be an SSA Value if any of its fields, or any field of a nested
struct, is an array.
In the binary form, an integer literal ``nfields`` determines the number of
fields. Exactly that number of type IDs follows the ``nfields`` literal.
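A minimal Python sketch of this layout follows. The field widths are *assumptions* made only for illustration: it encodes ``opct`` as one byte and both ``lent`` and each ``idt`` as 32-bit little-endian unsigned integers; the authoritative widths belong to the µVM IR binary form document. The helper names and the type IDs 10, 11 and 12 are hypothetical.

```python
import struct

OPCT_STRUCT = 0x07

# Assumed encoding (hypothetical widths): "<" = little-endian with no padding,
# B = 1-byte opct, I = 4-byte unsigned lent (nfields) and idt (type ID).
HEADER = struct.Struct("<BI")


def encode_struct_type(field_type_ids):
    """Encode struct<T1 T2 ...> given the type IDs of its fields."""
    header = HEADER.pack(OPCT_STRUCT, len(field_type_ids))
    fields = struct.pack(f"<{len(field_type_ids)}I", *field_type_ids)
    return header + fields


def decode_struct_type(data: bytes):
    """Return the list of field type IDs after checking the opcode."""
    opct, nfields = HEADER.unpack_from(data, 0)
    if opct != OPCT_STRUCT:
        raise ValueError(f"expected opct 0x07, got {opct:#04x}")
    # Exactly nfields type IDs follow the header.
    return list(struct.unpack_from(f"<{nfields}I", data, HEADER.size))


blob = encode_struct_type([10, 11, 12])
print(blob.hex())                 # 07030000000a0000000b0000000c000000
print(decode_struct_type(blob))   # [10, 11, 12]
```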
For LLVM users: this is almost identical to LLVM's struct type, except it does
not allow embedded arrays.

::

    struct<int<32> int<16> int<8> double float>
    struct<struct<int<32> int<32>> float struct<int<8> double>>

``array < T length >``
*type*: The type of elements.
*intLiteral*: The number of elements.

+------+-----+--------+
| opct | idt | aryszt |
+======+=====+========+
| 0x08 | T   | length |
+------+-----+--------+

An array is a sequence of elements of a single type in memory. *T* is the
type of its elements and *length* is the length of the array.
For LLVM users: **An array is always fixed-length**. There is no type for "array
of run-time-determined length" in the µVM type system. The closest counterpart
is the ``hybrid`` type.

::

    array<int<8> 4096>                    // array of 4096 bytes
    array<double 100>                     // array of 100 doubles
    array<struct<int<64> ref<void>> 16>   // array of 16 long-ref pairs
    array<array<int<64> 1024> 1024>       // array of arrays

``hybrid < F V >``
*type*: The type of the fixed part.
*type*: The type of the elements of the variable part.

+------+-----+-----+
| opct | idt | idt |
+======+=====+=====+
| 0x09 | F   | V   |
+------+-----+-----+

A hybrid is a combination of a fixed-size prefix and an array-like
variable-length suffix whose length is decided at allocation time. *F* is the
type of the fixed-size prefix. *V* is the type of the **elements** of the
variable-length suffix.

::

    hybrid<int<64> int<8>>                  // one int64 followed by many int8
    hybrid<struct<int<64> int<64> int<64>>  // three initial int64 headers
           double>                          // followed by many doubles
    hybrid<void int<8>>                     // no header. Just many int8.

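Because the length of the variable part is chosen at allocation time, the size of a hybrid object is only known when it is allocated. The Python sketch below computes that size for the examples above. It is illustrative only: the (size, alignment) pairs, the padding rule for ``struct_of`` and the rule that the variable part starts at the next offset aligned for *V* are all *assumptions* (natural alignment, no trailing padding), since this document does not define a memory layout.

```python
# Hypothetical (size, alignment) pairs in bytes, assuming natural alignment.
INT8 = (1, 1)
INT64 = (8, 8)
DOUBLE = (8, 8)
VOID = (0, 1)   # an empty fixed part


def align_up(offset: int, align: int) -> int:
    return (offset + align - 1) // align * align


def struct_of(*fields):
    """(size, alignment) of a struct laid out field by field with padding."""
    size, align = 0, 1
    for fsize, falign in fields:
        size = align_up(size, falign) + fsize
        align = max(align, falign)
    return align_up(size, align), align


def hybrid_size(fixed, var, length: int) -> int:
    """Bytes needed for hybrid<F V> whose variable part has `length` elements."""
    fixed_size, _ = fixed
    var_size, var_align = var
    var_offset = align_up(fixed_size, var_align)   # start of the variable part
    return var_offset + length * var_size


print(hybrid_size(INT64, INT8, 100))                              # 108
print(hybrid_size(struct_of(INT64, INT64, INT64), DOUBLE, 10))    # 104
print(hybrid_size(VOID, INT8, 16))                                # 16
```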
Void Type

+------+
| opct |
+======+
| 0x0A |
+------+

The ``void`` type has no value. It is useful for functions that do not
return a value, for references whose referent type is undetermined, and for
``hybrid`` types that have no fixed part.
Function Types
``func < sig >``
*function signature*: The signature of the referred function.

+------+-----+
| opct | idt |
+======+=====+
| 0x0B | sig |
+------+-----+

``func`` is a type for function identifiers. It is an opaque identifier
of a µVM function. The signature *sig* determines the parameter types and return
type of the function.
µVM allows a function to be re-defined at run time. The ID of the re-defined
function will not change and all ``func`` values will automatically
refer to the newly defined function.
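One way to picture this rule is that a ``func`` value is an opaque ID and every call resolves the ID at call time. The Python sketch below models that with a dictionary. ``FunctionTable`` and the ID ``0x1234`` are hypothetical, and this is only a mental model: a real µVM implementation is free to achieve the same effect differently.

```python
from typing import Any, Callable, Dict


class FunctionTable:
    """Toy model: function IDs stay fixed while their bodies can be replaced."""

    def __init__(self) -> None:
        self._bodies: Dict[int, Callable[..., Any]] = {}

    def define(self, func_id: int, body: Callable[..., Any]) -> None:
        # Defining and re-defining are the same operation: the ID is unchanged.
        self._bodies[func_id] = body

    def call(self, func_id: int, *args: Any) -> Any:
        # The body is looked up on every call, so existing func values
        # automatically reach the newest definition.
        return self._bodies[func_id](*args)


table = FunctionTable()
GREET = 0x1234                   # hypothetical function ID
table.define(GREET, lambda who: f"hello, {who}")
f = GREET                        # a "func" value held by some caller
print(table.call(f, "world"))    # hello, world
table.define(GREET, lambda who: f"hi again, {who}")   # re-definition
print(table.call(f, "world"))    # hi again, world
```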
For LLVM users: the ``func`` type in µVM is not a pointer (though it may be
implemented as a pointer underneath, clients cannot depend on this). It is
opaque and is not supposed to be introspected.

::

    func<int<64> (int<64> int<64>)>
    func<int<32> (int<32> iref<int<8>>)>
    func<void ()>

Opaque Types
Some types identify internal µVM data structures. The actual binary
representation of their values is not visible to the client.

``thread``

+------+
| opct |
+======+
| 0x0C |
+------+

``stack``

+------+
| opct |
+======+
| 0x0D |
+------+

``thread`` and ``stack`` represent µVM threads and µVM stacks,
respectively. Only some special instructions (e.g. ``NEWSTACK``) or
intrinsic functions can work on them.

``tagref64``

+------+
| opct |
+======+
| 0x0E |
+------+

``tagref64`` is a union type of ``double``, ``int`` and
``ref``. It occupies 64 bits. The type of the content can be tested at
run time using the ``@uvm.tr64_is_xxx`` family of intrinsic functions. Intrinsic
functions like ``@uvm.tr64_to_xxx`` and ``@uvm.tr64_from_xxx`` are for
converting them to and from regular primitive types.
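This document does not fix the bit-level encoding of ``tagref64``. One common way to pack a ``double``, an integer and a reference into 64 bits is NaN-boxing, sketched below in Python. Everything about it is an *assumption* for illustration: the tag layout, the 50-bit payloads for integers and references, and the function names, which merely imitate the ``@uvm.tr64_is_xxx``, ``@uvm.tr64_to_xxx`` and ``@uvm.tr64_from_xxx`` families.

```python
import math
import struct

# Assumed layout (hypothetical): a double is stored as its IEEE-754 bits, with
# NaNs canonicalised. Non-double values use negative quiet-NaN patterns whose
# top 14 bits are 0b11111111111110 (int) or 0b11111111111111 (ref).
BOX_MASK = 0xFFFC_0000_0000_0000
INT_TAG = 0xFFF8_0000_0000_0000
REF_TAG = 0xFFFC_0000_0000_0000
PAYLOAD_MASK = (1 << 50) - 1
CANONICAL_NAN = 0x7FF8_0000_0000_0000


def tr64_from_fp(x: float) -> int:
    if math.isnan(x):
        return CANONICAL_NAN          # keep NaN doubles out of the boxed range
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def tr64_from_int(i: int) -> int:
    if not 0 <= i <= PAYLOAD_MASK:
        raise ValueError("integer does not fit in the 50-bit payload")
    return INT_TAG | i


def tr64_from_ref(addr: int) -> int:
    if not 0 <= addr <= PAYLOAD_MASK:
        raise ValueError("address does not fit in the 50-bit payload")
    return REF_TAG | addr


def tr64_is_int(t: int) -> bool:
    return (t & BOX_MASK) == INT_TAG


def tr64_is_ref(t: int) -> bool:
    return (t & BOX_MASK) == REF_TAG


def tr64_is_fp(t: int) -> bool:
    return not (tr64_is_int(t) or tr64_is_ref(t))


def tr64_to_fp(t: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", t))[0]


def tr64_to_int(t: int) -> int:
    return t & PAYLOAD_MASK


def tr64_to_ref(t: int) -> int:
    return t & PAYLOAD_MASK
```

Canonicalising NaNs is what keeps the three cases disjoint in this scheme: only a negative quiet NaN has its top 13 bits set, so no other ``double`` can be mistaken for a boxed integer or reference.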
.. vim: tw=80