GitLab will be upgraded to the 12.10.14-ce.0 on 28 Sept 2020 at 2.00pm (AEDT) to 2.30pm (AEDT). During the update, GitLab and Mattermost services will not be available. If you have any concerns with this, please talk to us at N110 (b) CSIT building.

overview.rest 12.1 KB
Newer Older
Kunshan Wang's avatar
Kunshan Wang committed
1 2 3
========
Overview
========
Kunshan Wang's avatar
Kunshan Wang committed
4

Kunshan Wang's avatar
Kunshan Wang committed
5 6
Mu is a micro virtual machine designed to support high-level programming
languages. It focuses on three basic concerns: 
Kunshan Wang's avatar
Kunshan Wang committed
7 8 9 10 11

- garbage collection
- concurrency
- just-in-time compiling

Kunshan Wang's avatar
Kunshan Wang committed
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
The Concept of Micro Virtual Machines
=====================================

Many programming languages are implemented on virtual machines.

There are many aspects in a language implementation. There are high-level
aspects which are usually language-specific, including:

* parsing high-level language programs, or loading high-level byte codes
* object oriented programming, including classes, inheritance and polymorphism
  (if applicable)
* functional programming, including high-order functions, pattern matching, etc.
  (if applicable)
* eager or lazy code/class loading
* high-level optimisation
* a comprehensive standard library

as well as low-level aspects which are usually language-neutral, including:

* an execution engine (for example, JIT compiling)
* a model of threads and a memory model
* garbage collection

A "monolithic" VM implements everything listed above. JVM is one of such VMs.
Creating such a VM is a huge amount of work. It takes two decades and billions
of dollars for the JVM to have a high-quality implementation. Such man-power and
investment is usually unavailable for other languages.

We coined the term "**micro virtual machine**". It is an analogue to the term
"microkernel" in the operating system context. A micro virtual machine, like a
micro kernel, only does what absolutely needs to be done in the micro virtual
machine, and pushes most high-level aspects to its client, the counterpart of
the "services" of a microkernel.
Kunshan Wang's avatar
Kunshan Wang committed
45

Kunshan Wang's avatar
Kunshan Wang committed
46 47 48 49 50 51 52
In a language implementation with the presence of a micro virtual machine, the
micro virtual machine shall handle those three low-level aspects, namely
concurrency, JIT and GC, and the client handles all other high-level aspects.

Take JVM as an example. If JVM were implemented as a client of a micro virtual
machine, it only needs to handle JVM-specific features, including the byte-code
format, class loading and aspects of object-oriented programming.
Kunshan Wang's avatar
Kunshan Wang committed
53 54 55 56

::

    Traditional JVM
Kunshan Wang's avatar
Kunshan Wang committed
57 58 59 60 61 62 63 64 65 66 67 68 69 70
    +-------------------+           +---------------------------+
    |                   |           |                           |
    |  *JVM*            |           |  *Java Client*            |
    | byte code format  |           | byte-code format          |
    | class loading     |           | class loading             |
    | OOP               |           | OOP                       |
    | GC                |           |                           |
    | concurrenty       |           +---------------------------+
    | JIT compiling     |           |  *micro virtual machine*  |
    |                   |           | GC,concurrency,JIT        |
    +-------------------+           +---------------------------+
    |  *OS*             |           |  *OS*                     |
    +-------------------+           +---------------------------+

Kunshan Wang's avatar
Kunshan Wang committed
71 72
The Mu Project
==============
Kunshan Wang's avatar
Kunshan Wang committed
73

Kunshan Wang's avatar
Kunshan Wang committed
74
Mu is a concrete micro virtual machine.
Kunshan Wang's avatar
Kunshan Wang committed
75 76

The main part of this project is this specification which defines the behaviour
Kunshan Wang's avatar
Kunshan Wang committed
77
of Mu and the interaction with the client. This allows multiple compliant
Kunshan Wang's avatar
Kunshan Wang committed
78 79 80
implementations.

The specification mainly includes the type system, the instruction set and the
Kunshan Wang's avatar
Kunshan Wang committed
81
Mu client interface (sometimes called "the API").
Kunshan Wang's avatar
Kunshan Wang committed
82

Kunshan Wang's avatar
Kunshan Wang committed
83 84
The Mu Architecture
-------------------
Kunshan Wang's avatar
Kunshan Wang committed
85 86

The whole system is divided into a language-specific **client** and a
Kunshan Wang's avatar
Kunshan Wang committed
87
language-neutral **micro virtual machine** (in this case, it is Mu).
Kunshan Wang's avatar
Kunshan Wang committed
88 89 90 91 92 93 94 95 96

::

         | source code or byte code
         v
    +-----------------+
    | client          |
    +-----------------+
              |   ^
Kunshan Wang's avatar
Kunshan Wang committed
97
    Mu IR /   |   | traps/watchpoints/
Kunshan Wang's avatar
Kunshan Wang committed
98 99
    API call  |   | other events
              v   |
Kunshan Wang's avatar
Kunshan Wang committed
100 101 102
    +-------------------+  manages   +---------+
    | Mu (the micro VM) |----------->| Mu heap |
    +-------------------+            +---------+
Kunshan Wang's avatar
Kunshan Wang committed
103

Kunshan Wang's avatar
Kunshan Wang committed
104 105 106 107
A typical client implements a high-level language (e.g. Python or Lua). Such a
client would be responsible for loading, parsing and executing the source code
or byte code.

Kunshan Wang's avatar
Kunshan Wang committed
108 109
The client submits programs to Mu in a language called **Mu Intermediate
Representation**, a.k.a. **Mu IR**. The Mu IR code is then executed on Mu.
Kunshan Wang's avatar
Kunshan Wang committed
110

Kunshan Wang's avatar
Kunshan Wang committed
111 112 113 114
The client can directly manipulate the states of Mu using the **Mu client
Interface**, a.k.a. **the API**. The API can access the Mu memory (including the
heap), create Mu threads, stacks, introspect stack states and so on. The Mu IR
code mentioned above is submitted via the API, too.
Kunshan Wang's avatar
Kunshan Wang committed
115

Kunshan Wang's avatar
Kunshan Wang committed
116 117 118
There are events which Mu cannot handle alone. These include lazy code loading,
requesting for optimisation/deoptimisation and so on. In these cases, Mu
generates events to be handled by the client.
Kunshan Wang's avatar
Kunshan Wang committed
119

Kunshan Wang's avatar
Kunshan Wang committed
120 121 122
Mu handles garbage collection internally. Mu can identify all references held
inside Mu and also tracks all references held by the client. So exact GC is
possible in Mu without the intervention from the client.
Kunshan Wang's avatar
Kunshan Wang committed
123

Kunshan Wang's avatar
Kunshan Wang committed
124
The Mu Type System
Kunshan Wang's avatar
Kunshan Wang committed
125 126
-------------------

Kunshan Wang's avatar
Kunshan Wang committed
127
The Mu type system has scalar and vector integer and floating point types,
Kunshan Wang's avatar
Kunshan Wang committed
128 129 130 131
aggregate types including structs, arrays and hybrids, as well as reference
types. The type system is low level, similar to the level of C, but natively
supports reference types.

Kunshan Wang's avatar
Kunshan Wang committed
132 133 134
Mu is agnostic of the type hierarchy in high-level languages, but the client can
implement its language-specific type system and run-time type information on top
of the Mu type system.
Kunshan Wang's avatar
Kunshan Wang committed
135

Kunshan Wang's avatar
Kunshan Wang committed
136
See `Type System <type-system.rest>`__ for more details.
Kunshan Wang's avatar
Kunshan Wang committed
137

Kunshan Wang's avatar
Kunshan Wang committed
138
The Mu Instruction Set
Kunshan Wang's avatar
Kunshan Wang committed
139 140
-----------------------

Kunshan Wang's avatar
Kunshan Wang committed
141
The Mu instruction set is similar to (and is actually inspired by) the `LLVM
Kunshan Wang's avatar
Kunshan Wang committed
142 143 144 145
<http://llvm.org/>`__'s instruction set. There are primitive
arithmetic/logical/relational/conversion operations and control flow
instructions.

Kunshan Wang's avatar
Kunshan Wang committed
146 147
Mu has its own exception handling, not depending on system libraries as C++
does. Mu IR programs can throw and catch exceptions, but the client needs to
Kunshan Wang's avatar
Kunshan Wang committed
148 149 150 151 152
implement its own exception type hierarchy if applicable.

There are garbage-collection-aware memory operations, including memory
allocation, addressing and accessing. The client does not need to implement
garbage collection algorithms; it only needs to use reference types and related
Kunshan Wang's avatar
Kunshan Wang committed
153
instructions and Mu handles the rest.
Kunshan Wang's avatar
Kunshan Wang committed
154

Kunshan Wang's avatar
Kunshan Wang committed
155
Trap instructions let Mu IR programs talk back to the client for events it
Kunshan Wang's avatar
Kunshan Wang committed
156 157 158 159
cannot handle.

There are also instructions for handling stack and threads.

Kunshan Wang's avatar
Kunshan Wang committed
160 161
See `Instruction Set <instruction-set.rest>`__ and `Common Instructions
<common-insts.rest>`__ for more details.
Kunshan Wang's avatar
Kunshan Wang committed
162

Kunshan Wang's avatar
Kunshan Wang committed
163
The Mu Client Interface
Kunshan Wang's avatar
Kunshan Wang committed
164 165
------------------------

Kunshan Wang's avatar
Kunshan Wang committed
166 167
The Mu client interface (API) allows the client to directly manipulate the state
of Mu.
Kunshan Wang's avatar
Kunshan Wang committed
168

Kunshan Wang's avatar
Kunshan Wang committed
169
The API can load Mu IR code.
Kunshan Wang's avatar
Kunshan Wang committed
170

Kunshan Wang's avatar
Kunshan Wang committed
171
The API can create threads and stacks. The usual way to start a Mu program is
Kunshan Wang's avatar
Kunshan Wang committed
172 173 174 175
to create a new stack with a function on the bottom of the stack and create a
new thread on it to start execution. (The concept of threads and stacks are
discussed later.)

Kunshan Wang's avatar
Kunshan Wang committed
176
The API can directly allocate and access the Mu memory. References are
Kunshan Wang's avatar
Kunshan Wang committed
177 178 179
indirectly exposed to the client as handles rather than raw pointers for the
ease of garbage collection. (The JVM takes the same approach.)

Kunshan Wang's avatar
Kunshan Wang committed
180
The client also handles trap events generated by the Mu IR code. The client can
Kunshan Wang's avatar
Kunshan Wang committed
181 182 183
introspect the selected local variables on the stack and perform on-stack
replacement (i.e. OSR. Discussed later.)

Kunshan Wang's avatar
Kunshan Wang committed
184
See `Client Interface <uvm-client-interface.rest>`__ for more details.
Kunshan Wang's avatar
Kunshan Wang committed
185

186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
Unsafe Native Interface
-----------------------

The (unsafe) native interface is designed to directly interact with native
programs. It gives Mu program direct access to the memory via pointers, and
allows pinning Mu objects so that they can be accessed by native programs. It
also allows Mu to call a native (usually C) function directly, and allows native
programs to call back to selected Mu functions. (The .NET CLR takes similar
approach, i.e. giving the high-level program "unsafe" access to the lower
level.)

This interface is different from the client API. The main purpose is to
implement the system-interfacing part of the high-level language, such as the IO
and the networking library.

Kunshan Wang's avatar
Kunshan Wang committed
201
See `Native Interface <native-interface.rest>`__ for more details.
202

Kunshan Wang's avatar
Kunshan Wang committed
203 204 205
Multi-Threading
---------------

Kunshan Wang's avatar
Kunshan Wang committed
206 207 208
Mu supports threads. Mu threads are usually implemented with native OS threads,
but this specification does not enforce this. Multiple Mu threads may execute
simultaneously.
Kunshan Wang's avatar
Kunshan Wang committed
209

Kunshan Wang's avatar
Kunshan Wang committed
210
Mu has a C11/C++11-like memory model. There are atomic memory access with
Kunshan Wang's avatar
Kunshan Wang committed
211 212 213
different memory orders. The client should generate code with the appropriate
memory order for its high-level language.

Kunshan Wang's avatar
Kunshan Wang committed
214 215
Mu provides a Futex-like mechanism similar to the counterpart provided by the
Linux kernel. It is the client's responsibility to implement mutex locks,
Kunshan Wang's avatar
Kunshan Wang committed
216 217 218
semaphores, conditions, barriers and so on using atomic memory accesses and the
futex.

Kunshan Wang's avatar
Kunshan Wang committed
219 220
See `Threads and Stacks <threads-stacks.rest>`__ for details about threads and
`Memory Model <memory-model.rest>`__ for the Mu memory model.
Kunshan Wang's avatar
Kunshan Wang committed
221 222 223 224

The Swap-stack Operation
------------------------

Kunshan Wang's avatar
Kunshan Wang committed
225 226 227
Mu distinguishes between threads and stack. In Mu, a thread is the unit of CPU
scheduling and a stack is the context in which a thread executes. An analogy is
"workers and jobs".
Kunshan Wang's avatar
Kunshan Wang committed
228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245

A stack has multiple frames, each of which is a context of a function
activation, including local variables and the current instruction.

A *swap-stack* operation unbinds a thread from a stack (the old context) and
bind to another stack (the new context). As a result, the old context of
execution is paused and can be continued when another swap-stack operation binds
another thread (may not be the same old thread) to that stack. This is similar
to letting a worker stop doing one job and continue with another job.

The swap-stack operation is essentially a model of symmetric coroutines.  It
allows the client to implement coroutines in high-level languages (e.g. Ruby,
Lua, Go as well as Python and ECMAScript 6).

It also allows the client to implement its own light-weight
thread. This is particularly useful for languages with massively many threads
(e.g. Erlang).

Kunshan Wang's avatar
Kunshan Wang committed
246
See `Threads and Stacks <threads-stacks.rest>`__ for details.
Kunshan Wang's avatar
Kunshan Wang committed
247 248 249 250 251 252 253 254 255

Function Redefinition
---------------------

It is a common strategy to use a fast compiler to compile high-level programs to
suboptimal low-level code, and only optimise when the implementation decided at
run time that a function (or loop) is hot. Then an optimised version is
compiled. Optimising compilation usually takes longer, but the code runs faster.

Kunshan Wang's avatar
Kunshan Wang committed
256 257
In Mu, a function can have zero or more versions. When a function is called, it
always calls the newest version.
Kunshan Wang's avatar
Kunshan Wang committed
258

Kunshan Wang's avatar
Kunshan Wang committed
259
The semantic of *function definition* in Mu is to create a new version of a
Kunshan Wang's avatar
Kunshan Wang committed
260
function. What the client should do is to generate an optimised version of the
Kunshan Wang's avatar
Kunshan Wang committed
261
high-level code in Mu IR and submit it to Mu. All call sites and function
Kunshan Wang's avatar
Kunshan Wang committed
262 263 264 265 266 267
references are automatically updated.

If a function has zero versions, it is "undefined". Calling such a function will
"trap" to the client. Such functions behave like "stubs" and this gives the
client a chance to implement lazy code/class loading.

Kunshan Wang's avatar
Kunshan Wang committed
268 269
See `Intermediate Representation <uvm-ir.rest.rest>`__ for the definition of
functions and versions, and see `Client Interface <uvm-client-interface.rest>`__ for
Kunshan Wang's avatar
Kunshan Wang committed
270
the code loading interface.
Kunshan Wang's avatar
Kunshan Wang committed
271 272 273 274 275 276 277 278 279

On-stack Replacement
--------------------

At the same time when an optimised version of a function is compiled, there are
existing activations on the stack still running the old version.  On-stack
Replacement (OSR) is the operation to replace an existing stack frame with
another frame.

Kunshan Wang's avatar
Kunshan Wang committed
280
Mu provides two primitives in its API:
Kunshan Wang's avatar
Kunshan Wang committed
281 282 283 284 285

1. Pop the top frame of a stack.
2. Given a function and its arguments, create a new frame and push it on the top
   of a stack.

Kunshan Wang's avatar
Kunshan Wang committed
286 287 288
Note that Mu is oblivious about whether the new version is "equivalent to" or
"better than" the old version. The responsibility of optimisation is pushed to
the client.
Kunshan Wang's avatar
Kunshan Wang committed
289

Kunshan Wang's avatar
Kunshan Wang committed
290
See `Client Interface <uvm-client-interface.rest>`__ for more details.
Kunshan Wang's avatar
Kunshan Wang committed
291 292 293 294

Miscellaneous Topics
--------------------

Kunshan Wang's avatar
Kunshan Wang committed
295
The `Memory <uvm-memory.rest>`__ chapter provides more detail about garbage
Kunshan Wang's avatar
Kunshan Wang committed
296 297
collection and memory allocation/accessing.

Kunshan Wang's avatar
Kunshan Wang committed
298
The `Portability <portability.rest.rest>`__ chapter describes the requirements of
Kunshan Wang's avatar
Kunshan Wang committed
299 300
implementations. It summarises corner cases which may result in different or
undefined behaviours in different platforms.
Kunshan Wang's avatar
Kunshan Wang committed
301 302

.. vim: tw=80