Commit 5fd41775 authored by Kunshan Wang

Support for symbols, relocs and primordial threads.

- The make_boot_image method is moved to MuCtx because it needs to refer
  to memory locations using handles of IRef types.

- Let global cells be always pinned, and added the get_addr API/CommInst
  to get their addresses.
parent c1d90055
......@@ -35,7 +35,9 @@ The Mu specification defines some types for using with common instructions, such
as ``@uvm.meta.byteref``. These types are always available. Whether other types,
signatures, constants, global cells or functions are already defined, declared
or exposed, or whether any Mu objects, stacks or Mu threads are already created, is
implementation-specific.
implementation-specific. When boot images are used to initialise the micro VM,
the micro VM will contain the contents of the boot image, but may still contain
implementation-specific extra entities.
How to stop a Mu micro VM and/or a client is implementation-specific. Stopping
the micro VM implies stopping all Mu threads in it.
......@@ -50,7 +52,9 @@ Mu IDs and names are represented as::
// Identifiers and names of Mu
typedef uint32_t MuID;
typedef char *MuName;
typedef MuCString MuName;
``MuCString`` is the 0-terminated C string type.
A Mu instance is represented as a pointer to the struct ``MuVM``::
......@@ -63,8 +67,6 @@ A Mu instance is represented as a pointer to the struct ``MuVM``::
MuID (*id_of )(MuVM *mvm, MuName name);
MuName (*name_of )(MuVM *mvm, MuID id);
void (*set_trap_handler)(MuVM *mvm, MuTrapHandler trap_handler, MuCPtr userdata);
void (*make_boot_image)(MuVM *mvm, MuID* whitelist, MuArraySize whitelist_sz, MuCString output_file);
};
.. _client-context:
......@@ -273,41 +275,6 @@ the *Trap Handling* section below for more information.
``userdata`` will be passed to the handlers as the last element when they are
called.
.. _make-boot-image:
::
void (*make_boot_image)(MuVM *mvm, MuID* whitelist, MuArraySize whitelist_sz, MuCString output_file);
The ``make_boot_image`` function creates a boot image which contains all
top-level definitions specified by ``whitelist``, which is an array of IDs. The
length of the array is ``whitelist_sz``. All heap objects reachable from any
global cells in the white-list are also in the boot image. The contents of the
global cells and reachable heap objects are preserved. It is an error if any
threads, stacks, frame cursors or IR Nodes are reachable from the global cells
in the white-list. The process of creating boot image is not atomic. Both
concurrent modifications of the memory reachable from the white-listed global
cells, and concurrent bundle loading, have undefined behaviours.
The IDs and the names of the entities are preserved in the boot image.
The boot image is written to the file specified by ``output_file``, a
``'\0'``-terminated C string. The format of the boot image is
implementation-defined.
Micro VM implementations may only allow the ``make_boot_image`` function to be
used in certain modes, enabled in implementation-specified manners.
NOTE: When building boot images, the micro VM implementation may need to
keep more information about the IR than usual. In usual occasions, the micro
VM may freely discard information (such as the type information not helpful
for GC) for space efficiency; but they need to be preserved when scanning
the heap for values other than references.
For example, an implementation may only enable this function if the VM is
started with the ``--enable-boot-image-building`` flag. In this case, it
will record more type information in the object layout.
MuCtx Functions
===============
......@@ -955,13 +922,16 @@ Watchpoint operations
Enabling or disabling any non-existing ``wpid`` has undefined behaviour.
.. _get-addr:
Object pinning
--------------
::
MuUPtrValue (*pin )(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
void (*unpin)(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
MuUPtrValue (*pin )(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
void (*unpin )(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
MuUPtrValue (*get_addr)(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
* ``pin`` adds one instance of memory location ``loc`` to the pinning set
local to the current client context ``ctx``. Returns a pointer which has the
......@@ -971,9 +941,13 @@ Object pinning
local to the current client context ``ctx``. Attempting to unpin locations
that have not yet been added has undefined behaviour.
* ``get_addr`` gets the address of a memory location ``loc`` which must already
be pinned with respect to the current thread. This includes all global cells
and other memory locations in the pinning set of the current thread.
``loc`` can be either ``ref<T>`` or ``iref<T>``. If it is ``ref<T>``, it pins
the underlying memory location as obtained by ``get_iref``. In both cases, the
result is ``uptr<T>``.
results of ``pin`` and ``get_addr`` are ``uptr<T>``.
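The pinning rules above can be pictured with a small model. This is only a sketch under stated assumptions: ``PinModel``, ``model_pin``, ``model_unpin`` and ``model_can_get_addr`` are hypothetical names, memory locations are reduced to small integer indices, and nothing here is part of the Mu API. It shows that the pinning set behaves as a multiset, that global cells count as pinned without any ``pin`` call, and that ``get_addr`` is only meaningful while a location is pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOCS 16

/* Hypothetical model of one context's pinning multiset.
   Memory locations are identified by small integer indices. */
typedef struct {
    unsigned pin_count[MAX_LOCS];   /* instances of each location in the multiset */
    bool is_global_cell[MAX_LOCS];  /* global cells are treated as always pinned */
} PinModel;

/* pin: add one instance of the location to the multiset. */
void model_pin(PinModel *m, size_t loc) {
    assert(loc < MAX_LOCS);
    m->pin_count[loc]++;
}

/* unpin: remove one instance. Returns false in the case where the
   real API would have undefined behaviour (nothing to remove). */
bool model_unpin(PinModel *m, size_t loc) {
    assert(loc < MAX_LOCS);
    if (m->pin_count[loc] == 0) {
        return false;
    }
    m->pin_count[loc]--;
    return true;
}

/* get_addr is only valid for global cells and currently pinned locations. */
bool model_can_get_addr(const PinModel *m, size_t loc) {
    assert(loc < MAX_LOCS);
    return m->is_global_cell[loc] || m->pin_count[loc] > 0;
}
```

In this model a location pinned twice stays pinned after one ``model_unpin`` call, which is the reason the set is described as holding instances rather than single entries.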
..
......@@ -1013,6 +987,112 @@ type.
Implementations may define other calling conventions in addition to
``MU_CC_DEFAULT``.
Building Mu IR Bundles
----------------------
The ``new_ir_builder`` method creates a ``MuIRBuilder`` instance so that the
client can programmatically build and load Mu IR bundles. See `IR Builder
<irbuilder.rst>`__ for this part of the API.
Making Boot Image
-----------------
.. _make-boot-image:
::
void (*make_boot_image)(MuCtx *ctx,
MuID* whitelist, MuArraySize whitelist_sz,
MuFuncRefValue primordial_func, MuStackRefValue primordial_stack,
MuRefValue primordial_threadlocal,
MuIRefValue *sym_fields, MuCString *sym_strings, MuArraySize nsyms,
MuIRefValue *reloc_fields, MuCString *reloc_strings, MuArraySize nrelocs,
MuCString output_file);
The ``make_boot_image`` function creates a boot image.
``whitelist`` is an array of IDs, and its length is ``whitelist_sz``. It is a
**white-list** of top-level definitions which will be built into the boot image,
and will appear to be already loaded into the micro VM when a micro VM instance
is created using the boot image.
When a micro VM is created from a boot image, it may optionally create a
**primordial thread** (i.e. a thread whose state is preserved at
boot-image-building time). In this API, that thread is represented as a stack
plus a thread-local object reference. The client uses either ``primordial_func``
or ``primordial_stack`` (but not both; the other must be NULL) to specify the
stack of the primordial thread, and uses ``primordial_threadlocal``
(NULL-able) as the initial thread-local object reference of the primordial
thread. If ``primordial_func`` is used, the primordial stack contains exactly
one frame that pauses at the beginning of that function; if ``primordial_stack``
is used, it is the primordial stack. The primordial stack must be in the
``READY<int<32> uptr<uptr<int<8>>>>`` state (the same as the ``main(int argc,
char** argv)`` signature in C). The resumption point will continue normally and
receive the ``argc`` and ``argv`` arguments from the command line. If neither
``primordial_func`` nor ``primordial_stack`` is specified, the boot image does
not contain any primordial threads. There is at most one primordial thread in
the boot image.
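The rule for ``primordial_func`` and ``primordial_stack`` can be sketched as a client-side check made before calling ``make_boot_image``. This is an illustration only: ``FuncRef`` and ``StackRef`` are hypothetical stand-ins for the ``MuFuncRefValue`` and ``MuStackRefValue`` handles, and ``PrimordialKind``, ``classify_primordial`` and ``require_valid_primordial`` are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MuFuncRefValue and MuStackRefValue handles. */
typedef void *FuncRef;
typedef void *StackRef;

typedef enum {
    PRIMORDIAL_NONE,       /* neither given: no primordial thread in the image */
    PRIMORDIAL_FROM_FUNC,  /* one frame paused at the beginning of the function */
    PRIMORDIAL_FROM_STACK, /* the given stack is the primordial stack */
    PRIMORDIAL_INVALID     /* both given: not allowed by the API */
} PrimordialKind;

/* Classify the argument combination according to the rules in the text. */
PrimordialKind classify_primordial(FuncRef func, StackRef stack) {
    if (func != NULL && stack != NULL) {
        return PRIMORDIAL_INVALID;
    }
    if (func != NULL) {
        return PRIMORDIAL_FROM_FUNC;
    }
    if (stack != NULL) {
        return PRIMORDIAL_FROM_STACK;
    }
    return PRIMORDIAL_NONE;
}

/* Client-side guard: stop early on the one combination that is forbidden. */
void require_valid_primordial(FuncRef func, StackRef stack) {
    assert(classify_primordial(func, stack) != PRIMORDIAL_INVALID);
}
```

A client could call ``require_valid_primordial`` just before ``make_boot_image`` so that a mistaken argument combination is caught on the client side rather than left to the implementation.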
NOTE: The primordial thread may be used as the entry point of a pure Mu IR
program. Having only one primordial thread allows the implementation to
re-use the initial thread created by the operating system.
But if the entry point is not supposed to be a Mu IR program (such as
non-metacircular clients that only use the micro VM as a JIT compiler), the
client can create a boot image without any primordial threads. The boot
image still accelerates the loading of the initial bundle, but does not
start any threads for the client.
The boot image will also contain the **transitive closure** of the white-listed
top-level definitions. That includes all heap objects, functions, exposed
functions, constants and other global cells transitively reachable from any
white-listed global cells.
The contents of the global cells and reachable heap objects are preserved. It is
an error if any threads, stacks (other than the primordial stack), frame cursors
or IR Nodes are reachable from the global cells in the white-list. The process
of creating a boot image is not atomic. Both concurrent modifications of the
memory reachable from the white-listed global cells, and concurrent bundle
loading, have undefined behaviours.
``sym_fields`` and ``sym_strings`` are two arrays, and the lengths of both are
``nsyms``. The two arrays, zipped, form a list of IRef-string pairs. The IRef
must refer to a field of a global cell in the boot image. The string is a symbol
(in the native linker's sense) which is placed at the field. The linker/loader
will be able to resolve the symbol to the address of the field of the global
cell. (NOTE: global cells are always pinned and have constant addresses after
being loaded.)
``reloc_fields`` and ``reloc_strings`` are also two arrays, and the lengths of both
are ``nrelocs``. These two arrays form a list of IRef-string pairs, too. The
IRef must refer to a field of a global cell or a heap object in the boot image.
The string is a symbol. All fields referred to in ``reloc_fields`` must have
pointer types (``uptr`` or ``ufuncptr``). At load time, the address of the
corresponding symbol will be stored into the field by the loader.
NOTE: The system's native loader may have more features than this API, such
as adding numbers to a field rather than simply assigning them. This API is
simple, but should be just enough to solve the linkage problem of Mu objects
having untraced pointers to other Mu memory locations or native modules.
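How the symbol and relocation lists interact at load time can be modelled in a few lines of C. This is a sketch under assumptions, not how any Mu implementation or native loader actually works: ``FieldSym``, ``resolve`` and ``apply_relocs`` are hypothetical names, a field is reduced to a pointer-sized slot, and only symbols defined by the ``sym`` list are resolved (a real loader would also consult native modules).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one zipped (field, string) pair. A "field" is a
   pointer-sized slot; for a sym entry the symbol names the slot itself,
   for a reloc entry the symbol names the address to store in the slot. */
typedef struct {
    void **field;
    const char *symbol;
} FieldSym;

/* Look up a symbol among the sym entries. The symbol's address is the
   address of the field it was placed at. Returns NULL if not found. */
static void *resolve(const FieldSym *syms, size_t nsyms, const char *name) {
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].symbol, name) == 0) {
            return (void *)syms[i].field;
        }
    }
    return NULL;
}

/* Store the address of each reloc's symbol into its field.
   Returns the number of relocs whose symbol could not be resolved. */
size_t apply_relocs(const FieldSym *syms, size_t nsyms,
                    const FieldSym *relocs, size_t nrelocs) {
    size_t unresolved = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        assert(relocs[i].field != NULL);
        void *addr = resolve(syms, nsyms, relocs[i].symbol);
        if (addr == NULL) {
            unresolved++;
            continue;
        }
        *relocs[i].field = addr;
    }
    return unresolved;
}
```

The model only ever assigns an address to a field, which mirrors the simple store described above and leaves out richer relocation kinds such as adding an offset.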
The IDs and the names of the entities are preserved in the boot image.
The boot image is written to the file specified by ``output_file``. The format
of the boot image is implementation-defined.
Micro VM implementations may only allow the ``make_boot_image`` function to be
used in certain modes, enabled in implementation-specified manners.
NOTE: When building boot images, the micro VM implementation may need to
keep more information about the IR than usual. In usual occasions, the micro
VM may freely discard information (such as the type information not helpful
for GC) for space efficiency; but they need to be preserved when scanning
the heap for values other than references.
For example, an implementation may only enable this function if the VM is
started with the ``--enable-boot-image-building`` flag. In this case, it
will record more type information in the object layout.
Trap Handling
=============
......@@ -1095,10 +1175,6 @@ Before returning, the trap handler should set ``*result``:
In all cases, if ``*new_stack``, any value in ``*values``, and/or ``*exception``
are used, they must be set and must be held by ``ctx``.
Building Mu IR Bundles
======================
See `IR Builder <irbuilder.rst>`__ for this part of the API.
Signal Handling
===============
......
===================
Boot-image Building
Boot Image Building
===================
Mu provides an interface to build "boot images". A boot image is a file that
......@@ -42,7 +42,35 @@ initialised when they start:
files into different regions of the address space of the process, and fixes
relocation entries.
A good language runtime must start-up fast. Thus it is not practical to use the
- **inter-references by untraced pointers**: In addition to calling foreign
functions, language runtimes may also need to create global variables of
struct types which contain pointers to global variables or functions defined
in native modules. Symbol resolution is not a problem for JIT (``dlsym`` will
work), but when creating boot images, the actual addresses of these fields can
only be resolved at load time.
Take the following C program as an example::
struct Foo {
int bar;
int (*some_func_ptr)(char*);
};
struct Foo my_foo = { 42, puts };
At load time, the ``my_foo.some_func_ptr`` field is initialised to the
address of the ``puts`` function in libc. But the address is different each
time the program runs. So relocation entries need to be placed at the
``my_foo.some_func_ptr`` field so that the dynamic linker can fill in its
value at load time.
There is a pure client-side solution: (1) record such fields at compile time
as a list of IRefs in the boot image, then (2) resolve the symbols manually
(using ``dlsym``, the ``EXTERN`` Mu constants, or calculating the address of
Mu object fields from IRefs) at load time, and (3) assign them to the fields.
But this approach cannot make use of the system dynamic linker.
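The three client-side steps can be sketched in plain C using the ``struct Foo`` example. The names ``StrFn``, ``Fixup``, ``lookup`` and ``run_fixups`` are hypothetical, and ``lookup`` is a stand-in for whatever resolver the client really uses (``dlsym``, ``EXTERN`` constants, or address calculation from IRefs); here it only knows about ``puts``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*StrFn)(const char *);

struct Foo {
    int bar;
    StrFn some_func_ptr;
};

/* (1) A fixup recorded at build time: which field needs which symbol. */
typedef struct {
    StrFn *field;
    const char *symbol;
} Fixup;

/* (2) Hypothetical resolver standing in for dlsym or EXTERN constants. */
static StrFn lookup(const char *symbol) {
    if (strcmp(symbol, "puts") == 0) {
        return puts;
    }
    return NULL;
}

/* (3) Assign resolved addresses to the recorded fields.
   Returns the number of fields that were patched. */
size_t run_fixups(const Fixup *fixups, size_t n) {
    size_t patched = 0;
    for (size_t i = 0; i < n; i++) {
        assert(fixups[i].field != NULL);
        StrFn fn = lookup(fixups[i].symbol);
        if (fn != NULL) {
            *fixups[i].field = fn;
            patched++;
        }
    }
    return patched;
}
```

The client has to run such a loop itself on every start-up, which is the work that relocation entries would instead hand to the system dynamic linker.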
A good language runtime must start up fast. Thus it is not practical to use the
JIT compiler to compile all of the standard library functions at start-up time,
but the boot image builder should AoT compile the initial Mu IR bundle into
machine code, and simply memory-map the machine code into the memory. The same
......@@ -51,22 +79,29 @@ into the heap. It should also make use of system utilities (such as the dynamic
linker/loader) to load external libraries and resolve external symbols, thus
metadata should be provided.
Mu IR and API
Mu IR Changes
=============
Several extensions have been added to the Mu IR and the API to support boot image
building.
The IR can define `external constants <ir.rst#external-constructor>`__. They are
resolved in an implementation-specific manner when the bundle is loaded.
resolved in an implementation-specific manner when the bundle is loaded. The API
function `new_const_extern <irbuilder.rst#new-const-extern>`__ creates external
constants programmatically.
Global cells are permanently pinned. The new ``get_addr`` `API
<api.rst#get-addr>`__/`CommInst <common-insts.rst#get-addr>`__ gets the address
of already pinned memory locations, including global cells.
Two API functions are added:
Mu API Changes
==============
- `new_const_extern <irbuilder.rst#new-const-extern>`__ creates the external
constants programmatically.
The `MuCtx.make_boot_image <api.rst#make-boot-image>`__ function creates the boot
image. The client specifies what is in the boot image by a white-list of
top-level definitions, and the function saves everything reachable from them. It also lets
the client specify a list of memory locations in global cells where symbols will
be placed, and a list of memory locations in global cells and heap objects which
will be filled with the address of other symbols at load time.
- `make_boot_image <api.rst#make-boot-image>`__ creates the boot image. It takes
a white-list of top-level definitions and writes its transitive closure (over
both IR nodes, heap objects and global cells) into a file.
It also lets the client specify the initial state of a primordial thread which
can be the entry point of pure Mu IR programs.
.. vim: tw=80
......@@ -288,13 +288,16 @@ Native Interface
.. _pinning:
.. _get-addr:
Object pinning
--------------
::
[0x240]@uvm.native.pin <T> (%opnd: T) -> uptr<U>
[0x241]@uvm.native.unpin <T> (%opnd: T)
[0x240]@uvm.native.pin <T> (%opnd: T) -> uptr<U>
[0x241]@uvm.native.unpin <T> (%opnd: T)
[0x242]@uvm.native.get_addr <T> (%opnd: T) -> uptr<U>
*T* must be ``ref<U>`` or ``iref<U>`` for some U.
......@@ -308,12 +311,16 @@ Object pinning
multiset of the current thread. It has undefined behaviour if no such
instance exists.
- ``get_addr`` gets the address of a memory location ``%opnd`` which must
already be pinned with respect to the current thread. This includes all global
cells and other memory locations in the pinning set of the current thread.
Mu function exposing
--------------------
::
[0x242]@uvm.native.expose [callconv] <[sig]> (%func: funcref<sig>, %cookie: int<64>) -> U
[0x243]@uvm.native.expose [callconv] <[sig]> (%func: funcref<sig>, %cookie: int<64>) -> U
*callconv* is a platform-specific calling convention flag. *U* is determined by
the calling convention and *sig*.
......@@ -329,7 +336,7 @@ convention *callConv* with cookie *cookie*.
::
[0x243]@uvm.native.unexpose [callconv] (%value: U)
[0x244]@uvm.native.unexpose [callconv] (%value: U)
*callconv* is a platform-specific calling convention flag. *U* is determined by
the calling convention.
......@@ -338,7 +345,7 @@ the calling convention.
::
[0x244]@uvm.native.get_cookie () -> int<64>
[0x245]@uvm.native.get_cookie () -> int<64>
If a Mu function is called via its exposed value, this instruction returns the
attached cookie. Otherwise it returns an arbitrary value.
......
......@@ -727,7 +727,8 @@ memory allocation unit in the *global memory*. See `<memory.rst>`__ for more
information about the global memory.
NOTE: The global memory is the counterpart of static or global variables in
C/C++.
C/C++. In Mu, global cells are also permanently pinned so that they can be
used to interact with native programs.
A global cell definition has the form::
......
......@@ -142,8 +142,10 @@ overlapping regions in the address space.
* Mu forces the 2's complement representation, though the byte order and
alignment requirement are implementation-defined.
See `Native Interface <native-interface.rst>`__ for details about the pinning and
unpinning operations.
Global cells are always pinned. Other Mu memory locations can be pinned by
adding to the pinning set of any Mu thread (see `Native Interface
<native-interface.rst>`__ for details about the pinning and unpinning operations
which modify the pinning set).
Memory Allocation and Deallocation
==================================
......
......@@ -264,11 +264,6 @@ struct MuVM {
// Set handlers
void (*set_trap_handler)(MuVM *mvm, MuTrapHandler trap_handler, MuCPtr userdata);
// Build boot image
void (*make_boot_image)(MuVM *mvm,
MuID* whitelist, MuArraySize whitelist_sz,
MuCString output_file); /// MUAPIPARSER whitelist:array:whitelist_sz
};
// A local context. It can only be used by one thread at a time. It holds many
......@@ -416,8 +411,9 @@ struct MuCtx {
void (*disable_watchpoint)(MuCtx *ctx, MuWPID wpid);
// Mu memory pinning, usually object pinning
MuUPtrValue (*pin )(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
void (*unpin)(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
MuUPtrValue (*pin )(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
void (*unpin )(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
MuUPtrValue (*get_addr)(MuCtx *ctx, MuValue loc); // loc is either MuRefValue or MuIRefValue
// Expose Mu functions as native callable things, usually function pointers
MuValue (*expose )(MuCtx *ctx, MuFuncRefValue func, MuCallConv call_conv, MuIntValue cookie);
......@@ -425,6 +421,15 @@ struct MuCtx {
// Create an IR builder and start building.
MuIRBuilder* (*new_ir_builder)(MuCtx *ctx);
// Build boot image
void (*make_boot_image)(MuCtx *ctx,
MuID* whitelist, MuArraySize whitelist_sz,
MuIRefValue *sym_fields, MuCString *sym_strings, MuArraySize nsyms,
MuIRefValue *reloc_fields, MuCString *reloc_strings, MuArraySize nrelocs,
MuFuncRefValue primordial_func, MuStackRefValue primordial_stack,
MuRefValue primordial_threadlocal,
MuCString output_file); /// MUAPIPARSER whitelist:array:whitelist_sz;sym_fields:array:nsyms;sym_strings:array:nsyms;reloc_fields:array:nrelocs;reloc_strings:array:nrelocs;primordial_func:optional;primordial_stack:optional;primordial_threadlocal:optional
};
// These types are all aliases of MuID. They are provided to guide C
......
......@@ -184,6 +184,8 @@ conventions, the native function may be represented in different ways, and the
arguments are passed in different ways. The return value of the call will be the
return value of the ``CCALL`` instruction, which is a Mu SSA variable.
.. _exposed-function:
Native Functions Calling Mu Functions
-------------------------------------
......