Commit 9e3e8ad4 authored by Kunshan Wang

Experimental boot image extension.

parent 1bcaca59
@@ -26,5 +26,6 @@ Platform-specific parts: These extends the main specification. The main
specification considers these parts as implementation-specific.
- `AMD64 Unix Native Interface <native-interface-x64-unix.rst>`__
- `Extensions for Boot-image Building <bootimage.rst>`__
.. vim: tw=80
Extensions for Boot-image Building
This extension allows creating a "boot image": a partially initialised micro VM
instance which includes pre-loaded bundles, pre-allocated heap objects,
pre-initialised memory contents, and external linkages.
Typical language runtimes (such as the ``java`` executable for JVM and the
``pypy`` executable for PyPy) need to have many things initialised when they
start:
- **pre-loaded bundles**: These include the built-in data types and the
essential parts of the standard library of the language. For dynamic
languages, this part can be huge, because even the simplest operations (such
as "adding") must be implemented as complex routines. For some implementations
(such as PyPy), it also includes the interpreter or the metacircular client.
- **pre-allocated heap objects and pre-initialised memory**: These are the
essential built-in objects of the language in question. For some Java
implementations, the instance of ``java.lang.Class<Object>``, the default
``java.lang.ClassLoader``, the IO objects, and their TIBs
can be pre-initialised.
- **external linkages**: Realistic language clients need to call system
functions, such as POSIX functions provided by libc as C functions. For
practical reasons, some parts of the language runtime may have to be written
in C rather than implemented as Mu functions. From Mu's point of view, C
functions are function pointers (``ufuncptr``), which are word-sized integers.
But the pointer values are not resolved until load time. Traditional C
programs address external functions symbolically (using function names, such
as ``open``, ``read``, ``write``), but leave *unresolved symbols* and
*relocation entries* in the executable file images (ELF, MachO, PE, etc.). At
load time, the dynamic loader loads different ``.so``, ``.dylib`` or ``.dll``
files into different regions of the address space of the process, and fixes
relocation entries.
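The load-time resolution described in the last item can be illustrated from the C side. The sketch below uses the POSIX ``dlopen``/``dlsym`` interface to turn the symbolic name ``write`` into an address at run time and then calls through the resulting pointer. The helper names ``resolve_symbol`` and ``write_via_symbol`` are hypothetical, and this is not a statement of how any Mu implementation resolves symbols. On some systems the program must be linked with ``-ldl``.

```c
#include <dlfcn.h>
#include <string.h>
#include <sys/types.h>

/* Function pointer type matching the POSIX write(2) signature. */
typedef ssize_t (*write_fn)(int, const void *, size_t);

/* Resolve a symbolic name to an address at run time.
 * Returns NULL if the symbol cannot be found.
 * (Hypothetical helper, not part of the Mu API.) */
static void *resolve_symbol(const char *name) {
    /* A NULL path asks for a handle on the running program and the
     * libraries loaded with it, which includes libc. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) {
        return NULL;
    }
    return dlsym(self, name);
}

/* Look up "write" by name, then call it through the pointer. */
static ssize_t write_via_symbol(int fd, const void *buf, size_t len) {
    void *addr = resolve_symbol("write");
    if (addr == NULL) {
        return -1;
    }
    write_fn fn;
    /* Copy the bits instead of casting: ISO C has no direct conversion
     * from an object pointer to a function pointer. */
    memcpy(&fn, &addr, sizeof fn);
    return fn(fd, buf, len);
}
```

Calling ``write_via_symbol(1, "hi\n", 3)`` writes to standard output without the caller ever naming ``write`` at compile time, which is the property a boot image has to preserve for its external linkages.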
Several extensions to the Mu IR and the API are made to support boot image
building.
Mu IR Extension
In addition to existing *constant constructors*:
- An **external constant constructor** creates a pointer constant. It is written
as:
+ the keyword ``EXTERN``, followed by
+ a string literal, which is a sequence of ASCII characters surrounded by
double quotation marks (code 34). The code of each character shall be
within 33–126 but not 34. There are no escape sequences. This string
represents a symbolic name.
The values of such constants are implementation-defined. Usually the
implementation will resolve the symbolic names to the addresses of C functions.
.typedef @int = int<32>
.typedef @void = void
.typedef @voidp = uptr<@void>
.typedef @size_t = int<64>
.typedef @ssize_t = int<64>
.funcsig @write.sig = (@int @voidp @size_t) -> (@ssize_t)
.typedef @write.fp = ufuncptr<@write.sig>
.const @write <@write.fp> = EXTERN "write"
.funcdef @main ... <...> {
    %rv = CCALL #DEFAULT <@write.fp @write.sig> @write (%fd %buf %sz)
}
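The character rule for ``EXTERN`` string literals can be checked mechanically. The C function below is a minimal sketch of such a check. The name ``is_valid_extern_name`` is hypothetical and not part of the Mu API, and accepting the empty string is an assumption, since the rule above does not state a minimum length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if every character of `name` has a code within 33..126
 * and is not the double quotation mark (code 34).
 * Hypothetical helper; the empty string is accepted by assumption. */
static bool is_valid_extern_name(const char *name) {
    if (name == NULL) {
        return false;
    }
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p) {
        if (*p < 33 || *p > 126 || *p == 34) {
            return false;
        }
    }
    return true;
}
```

Under this rule ``write`` is accepted, while a name containing a space (code 32) or a double quotation mark is rejected. Because there are no escape sequences, such names cannot be expressed at all.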
Mu Client API Extension
The ``MuCtx`` struct now has an extra method::
struct MuCtx {
    // ... other methods...
    MuConstNode (*new_const_extern )(MuCtx *ctx, MuBundleNode b, MuTypeNode ty, char *symbol);
};
The ``new_const_extern`` function creates an external constant. ``ty`` is the
type of the constant, which must be a pointer type; ``symbol`` is the symbolic
name.
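A client would call ``new_const_extern`` through the context's function-pointer table. The sketch below shows the calling convention only. The handle typedefs are simplified stand-ins for the real header, the struct contains just this one method plus mock state, and ``mock_new_const_extern``, ``mock_ctx`` and ``declare_extern_write`` are hypothetical names introduced for illustration.

```c
#include <stdio.h>

/* Simplified stand-ins for the real API header (assumptions). */
typedef void *MuBundleNode;
typedef void *MuTypeNode;
typedef void *MuConstNode;
typedef struct MuCtx MuCtx;

struct MuCtx {
    MuConstNode (*new_const_extern)(MuCtx *ctx, MuBundleNode b,
                                    MuTypeNode ty, char *symbol);
    char last_symbol[64];   /* mock state, not in the real struct */
};

/* Mock implementation: records the symbol and returns a dummy handle. */
static MuConstNode mock_new_const_extern(MuCtx *ctx, MuBundleNode b,
                                         MuTypeNode ty, char *symbol) {
    (void)b;
    (void)ty;
    snprintf(ctx->last_symbol, sizeof ctx->last_symbol, "%s", symbol);
    return ctx->last_symbol;
}

static MuCtx mock_ctx = { mock_new_const_extern, "" };

/* Client-side helper: declare an external constant for "write".
 * `write_fp_ty` is expected to be a pointer type node such as the
 * ufuncptr type used for @write.fp. */
static MuConstNode declare_extern_write(MuCtx *ctx, MuBundleNode b,
                                        MuTypeNode write_fp_ty) {
    char symbol[] = "write";   /* the API takes a non-const char* */
    return ctx->new_const_extern(ctx, b, write_fp_ty, symbol);
}
```

The symbol is copied into a local array because the declared parameter type is ``char *`` rather than ``const char *``.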
The ``MuVM`` struct now has an extra method::
struct MuVM {
    // ... other methods...
    void (*make_boot_image)(MuVM *mvm,
            MuID* whitelist, MuArraySize whitelist_sz,
            char* output_file); /// MUAPIPARSER whitelist:array:whitelist_sz
};
The ``make_boot_image`` function creates a boot image which contains all
top-level definitions specified by ``whitelist``, which is an array of IDs. The
length of the array is ``whitelist_sz``. All heap objects reachable from any
global cells in the white-list are also in the boot image. The contents of the
global cells and reachable heap objects are preserved. It is an error if any
threads, stacks, frame cursors or IR Nodes are reachable from the global cells
in the white-list. The process of creating a boot image is not atomic. Both
concurrent modification of the memory reachable from the white-listed global
cells and concurrent bundle loading have undefined behaviour.
The IDs and the names of the entities are preserved in the boot image.
The boot image is written to the file specified by ``output_file``, a
``'\0'``-terminated C string. The format of the boot image is
implementation-specific.
Micro VM implementations may only allow the ``make_boot_image`` function to be
used in certain modes, enabled in implementation-specific ways.
NOTE: When building boot images, the micro VM implementation may need to
keep more information about the IR than usual. Normally, the micro VM may
freely discard information (such as type information that is not needed
for GC) to save space; but such information needs to be preserved when
scanning the heap for values other than references.
For example, an implementation may only enable this function if the VM is
started with the ``--enable-boot-image-building`` flag. In this case, it
will record more type information in the object layout.
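The ``make_boot_image`` call described above takes the white-list as a pointer plus a length, and the output path as a C string. The sketch below shows one way a client might invoke it. The ``MuID`` and ``MuArraySize`` typedefs are assumed, the struct holds only this method plus mock state, and the ID values, the file name ``example.bootimg`` and the names ``mock_make_boot_image``, ``mock_vm`` and ``build_image`` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed typedefs; the real header may define these differently. */
typedef uint32_t MuID;
typedef uintptr_t MuArraySize;
typedef struct MuVM MuVM;

struct MuVM {
    void (*make_boot_image)(MuVM *mvm,
                            MuID *whitelist, MuArraySize whitelist_sz,
                            char *output_file);
    MuArraySize last_count;   /* mock state, not in the real struct */
    char last_file[64];       /* mock state, not in the real struct */
};

/* Mock implementation: records its arguments instead of writing a file. */
static void mock_make_boot_image(MuVM *mvm,
                                 MuID *whitelist, MuArraySize whitelist_sz,
                                 char *output_file) {
    (void)whitelist;
    mvm->last_count = whitelist_sz;
    snprintf(mvm->last_file, sizeof mvm->last_file, "%s", output_file);
}

static MuVM mock_vm = { mock_make_boot_image, 0, "" };

/* Client-side helper: white-list three top-level definitions and
 * request a boot image. The ID values are placeholders. */
static void build_image(MuVM *mvm) {
    MuID whitelist[] = { 65536, 65537, 65538 };
    char output_file[] = "example.bootimg";
    mvm->make_boot_image(mvm, whitelist,
                         sizeof whitelist / sizeof whitelist[0],
                         output_file);
}
```

Since image creation is not atomic, a real client would make sure that no other thread is mutating the white-listed global cells or loading bundles while this call runs.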
Common Instructions Extension
TODO: Counterpart of ``make_boot_image``
.. vim: tw=80
@@ -286,7 +286,12 @@ struct MuVM {
MuName (*name_of)(MuVM *mvm, MuID id);
// Set handlers
-void (*set_trap_handler )(MuVM *mvm, MuTrapHandler trap_handler, MuCPtr userdata);
+void (*set_trap_handler)(MuVM *mvm, MuTrapHandler trap_handler, MuCPtr userdata);
+// Build boot image
+void (*make_boot_image)(MuVM *mvm,
+MuID* whitelist, MuArraySize whitelist_sz,
+char* output_file); /// MUAPIPARSER whitelist:array:whitelist_sz
};
// A local context. It can only be used by one thread at a time. It holds many
@@ -513,6 +518,7 @@ struct MuCtx {
MuConstNode (*new_const_null )(MuCtx *ctx, MuBundleNode b, MuTypeNode ty);
// new_const_seq works for structs, arrays and vectors. Constants are non-recursive, so there is no set_const_seq.
MuConstNode (*new_const_seq )(MuCtx *ctx, MuBundleNode b, MuTypeNode ty, MuConstNode *elems, MuArraySize nelems); /// MUAPIPARSER elems:array:nelems
MuConstNode (*new_const_extern )(MuCtx *ctx, MuBundleNode b, MuTypeNode ty, char *name);
// Create global cell
MuGlobalNode (*new_global_cell )(MuCtx *ctx, MuBundleNode b, MuTypeNode ty);