WebKit's FTL JIT now uses a new backend called B3 (Bare Bones Backend), which replaces LLVM as the low-level optimizer. This layer shares many principles with the Mu micro VM, most notably its role as a low-level substrate of a virtual machine.
This Wiki page will discuss the B3 JIT and its influence on the Mu micro VM.
B3: LLVM is powerful but slow; it takes up about three quarters of the optimisation time.
B3 aims to be ten times faster than LLVM while still operating at the same level of granularity as LLVM and performing the same kinds of optimisations LLVM does.
Mu: Mu abstracts over concurrency, JIT and GC, but offloads most optimisations to the client.
Size of the IR
B3: Focuses on the size (memory footprint) of the IR and on the number of IR levels. Much effort went into designing a compact representation that is also fast to traverse and transform.
Mu: Not designed with memory footprint in mind. The in-memory representation of the IR is an implementation detail.
B3 IR: void, int32, int64, float, double. No heap references. No aggregate types. Integers are used as addresses.
Mu IR: Has object references, pointers, aggregate types, etc.
Both B3 and Mu IR have BinOps, Comparisons, Conversions, Branch and Switch. Both B3 and Mu use the goto-with-values form (B3 uses "Upsilon" instructions to "assign" values to Phi nodes).
B3: Can only call C functions (CCall). Custom calling conventions are possible via Patchpoints.
Mu: Can call both Mu functions (CALL) and C functions (CCALL). Mu implementations may support more calling conventions in addition to the C callconv, but the client cannot customise them.
B3: Load and Store with either Int32 or Int64 as addresses.
Mu: Load, Store, CmpXchg and AtomicRMW with iref<T> or ptr<T>. ptr<T> can be converted from int<n>.
B3: Does not abstract over GC.
Mu: It does; object references are first-class values, and heap objects are managed by the micro VM's garbage collector.
In short, a patchpoint is like inline assembly: it lets the client insert arbitrary machine code into the B3 IR.
Unlike LLVM, B3 patchpoints integrate deeply into the compiler, giving the client a MacroAssembler to populate the code and not limiting the size of patchpoints. Like GCC's inline assembly, patchpoints can also specify which SSA values (in B3 IR) are accessible in the machine code, and specify which machine registers those values should be in.
Patchpoint is a powerful mechanism. With the power to insert arbitrary machine code, the client can implement many advanced features, such as custom calling conventions, self-modifying code, and traps/watchpoints (which the blog refers to as "OSR").
Typically OSR points are placed after a guard. In LLVM and Mu, such a pattern has to be expressed as a CMP, a BRANCH and a TRAP. B3 introduced the CheckXxx family of instructions:
Check(cond, keepalives): Execute custom machine code when cond is false.
CheckAdd(op1, op2, keepalives): Execute custom machine code when op1+op2 overflows.
CheckSub, CheckMul: ditto
To program the patchpoint to be a TRAP, the client just needs to generate machine code that behaves like a TRAP, and use its arguments as keep-alive variables. The Check and CheckXxx instructions do not break the basic block, and the generated code is not allowed to fall through back into the normal path once the abnormal path is taken.
Influence on Mu
Conditional traps without breaking basic blocks
Mu was not designed with minimising the number of basic blocks in mind: the equivalent of B3's Check has to be expressed as a CMP, a BRANCH2 and a TRAP. But a conditional trap could be introduced into the Mu IR: %rv = CONDTRAP %cond <T> KEEPALIVES(...). A flag could indicate that the instruction cannot continue: CONDTRAP [#NO_CONT] %cond <> KEEPALIVES(...). With that flag, re-binding the stack without performing OSR to remove the current frame would be undefined behaviour.
The #NO_CONT flag could help subsequent Mu IR instructions that benefit from the fact that %cond is known to be true, though we do not yet know of any such Mu IR instructions. Most relevant optimisations, such as eliminating NULL checks, can be done by the client.
Currently the Mu IR does not have an "unreachable" instruction; the idiom is to throw an exception. But adding an UNREACHABLE instruction may be helpful to the backend.
BinOp with overflow handling
Binary operations may take a flag so that they return additional Boolean values to indicate certain events. For example: