The "surface" IR as a layer above the formal IR

Abstract

After the recent discussions about several alternative Mu IR forms, I realised the need for formal verification, the concrete micro VM implementation, and the application/client may be different. The IR, as a language, is the means for the client to transfer programs to the micro VM, and this implies compactness, efficiency and a certain degree of expressiveness that does not hinder efficient implementation. Formal verification, on the other hand, will benefit from the IR designed for functional languages, abstraction, consistency, and simplicity with respect to an operational model.

This issue proposes a two-level model: a "surface" form primarily for client-µVM communication, and a "formal" form for formalisation. The "surface" form is strictly a syntax sugar of the "formal" form, and can be transformed statically and automatically to the latter when needed.

With the surface form "detached" but mappable to the formal form, the surface form can be designed more aggressively for code compactness, such as introducing side-exits in the middle of basic blocks. But I am not advocating making such aggressive design changes now.

Different concerns

Terminating Instructions and Code Bloating

Currently, in the "surface" form as described by the (informal) Mu Spec, some Mu IR instructions can be either a normal instruction in the middle of a basic block, or a terminal instruction that implies several destinations. The CALL instruction is a well-known example. A basic block may contain may CALL instructions in a sequence. When an exception is thrown, the default behaviour is "rethrow". While in the "formal" form as described in uvm-formal-hol, CALL is always a terminating instruction, and both normal and exceptional destinations must be explicitly defined.

The "surface" form is designed with compactness and the machine behaviour in mind. Real-world programs will contain many function calls in a sequence. When no exceptions are thrown, each (both machine-level and language-level) call will simply fall-through to the next instruction, with the return values available to be accessed by subsequent instructions and function calls. The default "rethrow" behaviour is design so that when the current call site cannot catch exceptions, the stack-unwinder will simply skip the current function. It is the fact that the call site does not handle the exception in most cases that makes the code efficient.

A related topic is to add "side exits" in the middle of basic blocks, as is done by JikesRVM. This will remove the need to split basic blocks even when exceptions and "uncommon branch targets" are present and thus makes the IR more compact, but it will also make the control flow less explicit.

On the other hand, the "formal" form makes it easy to reason about all branches and all possible execution paths. If the IR is augmented with "NO_THROW" annotations, it can further make exceptions undefined behaviours, and relief the burdens of verification of some Mu functions.

Real-world programming languages, however, usually consistently choose one way or another, with "rethrow" more probable. Java, C#, Python and RPython, for example, always "rethrow", and does not provide the option of "nothrow". C++ allows the "noexcept" annotation on some C++ functions, and C++ exceptions can silently pass through any C functions on the stack, as long as the C functions are compiled with compatible compilers (such as g++/gcc), and the object files contain the appropriate unwinding information (.eh_frame in ELF).

But if future languages or existing languages with extensions (vmmagic) can make use of this feature, it has the potential to provide positive effects for the verifiability of carefully-written Mu functions.

Declarative vs Operational

The current Mu IR is inspired by the LLVM IR. This IR describes an AST of functions and top-level declarations, such as constants (which, in LLVM's terminology, include both literals, global variables and functions). The SSA form is naturally a "dependency-description language": an instruction has many "uses" of other Value, and each "Value" can be anything that we can get a value from, that is, either constant or local variable. The compiler sees the dependency of instructions, such as "this add instruction depends on an instruction result and a ConstantInt". With a type hierarchy, the compiler will need to pattern-match against the kinds of Values, as is what a static code-transformer always has to do.

On the other hand, mu-formal-hol describes the execution of a thread as a sequence of state transition (a kind of interpreting in the functional style). A thread has may registers, each holds the value of a local variable. The registers are modified as the side effect of execution:

When entering a basic block, the basic block arguments are assigned to the registers of the parameters.
When executing an instruction, the instruction will affect the values of some registers, and the register values are updated.

With a "Value" being either a constant or a register, the value needs to be pattern-matched against the two cases, namely constant or register, every time an instruction argument is evaluated.

A solution to this complexity is to introduce instructions that loads global vars into local vars. For example, GETCONST, GETGLOBALCELLIREF, GETFUNCREF and GETEXPFUNC. (I intentionally avoid using the word "load" in order to emphasise they are not memory operations, but merely aliasing.) This will make the IR semantically clearer and probably make proof easier, at the cost of making basic blocks more verbose. But fundamentally the two forms are equivalent.

Where the Two Forms Reconcile

I propose splitting the IR into two layers, with a "surface" catering to compactness, and the "formal" form designed to be more consistent. There will be a mapping from every "surface" IR bundle to a "formal" IR bundle, where the two forms reconcile. The point is, for every "surface" bundle, there will be an equivalent "formal" bundle. So it will not compromise verifiability by introducing "unfriendly" syntaxes.

The mapping will "desugar" the "surface" form. There will be a conversion rule for every "surface" syntax that does not exist in the "formal" form. For example:

The "fall-through" call

%cur_bb(%v1 %v2 ... %vn):
    %lv1 = ...
    %lv2 = ...
    ...
    CALL <@sig> @callee (...) EXC(
    (%rv1 %rv2) = CALL <@sig> @callee (...)
    %lv3 = next instruction...
    ...

will be desugared into:

%cur_bb(%v1 %v2 ... %vn):
    %lv1 = ...
    %lv2 = ...
    CALL <@sig> @callee (...) EXC(
        %generated_continue_block(%v1 %v2 ... %vn %lv1 %lv2 ... $0 $1)
        %generated_rethrow_block())

%generated_continue_block(%v1 %v2 ... %vn %lv1 %lv2 .. %rv1 %rv2):
    %lv3 = next instruction...
    ...

%generated_rethrow_block() [%the_exception]:
    THROW %the_exception

References to global variables:

%a = ADD <@i32> %a @ONE

will be desugared into:

%_tmp = GETCONST @ONE
%a = ADD <@i32> %a %_tmp

Other Implications

With the introduction of an additional form, the "surface" form can be more aggressive in design.

The "surface" form may go beyond the current "single-exit" form by introducing side-exits, such as:

DIV by zero, CALL/TRAP/WATCHPOINT/SWAPSTACK with exceptions, NEW/ALLOCA failure, LOAD/STORE with NULL pointers, may take side exits rather than forcing the basic block to be split.
"guard" instructions (as usually demanded by tracing JIT compilers) may be implemented as side-exiting conditional branches. This also implies that the "side-exit" is the slow path while the "fall-through" case is the common fast path.

Currently I fear that breaking the "single-exit" property may result in the micro VM still having to splitting them internally. LLVM, with its basic blocks not taking parameters, and its optimisers having lots of transforms to do, probably would keep the single-exit SSA form. But since Mu already adopted the "goto-with-values" form, whether "side-exits" should be introduced to the IR should depend on the experiences in the high-performance Mu implementation.

Related works:

B3: Apple's B3 still requires Jump/Branch/Switch to be at the end of basic blocks. The reason could be that B3 still uses the text-book SSA form, so non-merging control flow branches do not need PHI nodes, hence it is cheap to add basic blocks.
RPython: Its transformers (including GC transformers and exception transformers) will split basic blocks for function calls with exceptions. But since RPython is static, code compactness may not be a concern.
JikesRVM: JikesRVM uses Factored CFG (FCFG), where a Potential Excepting Instruction (PEI) does not necessarily end a basic block. As described here, FCFG will significantly reduce the number of basic blocks, but will complicate flow-sensitive global analysis. But given that Mu pushes most optimisations out of the micro VM, it is arguable that the micro VM back end may favour a simpler form.