In-place code generation

The idea is to generate machine code directly, without first building an intermediate representation of the machine code. The motivation is to save the allocation and space that such a representation would need. The current x86_64 assembly backend tests the idea of in-place code generation (though an assembly backend has no need to do this).

Issues:

Too many moves get eliminated (turned into nops), and the leftover nops result in bad performance. One solution is to compact the code at the end (slide/copy the code to remove the 'holes').
In-place code generation introduces extra memory overhead, as we need to save the locations of registers. A note for the x86_64 asm backend: the metadata for instructions can be optimised further. For example, I am currently using LinkedHashMap<MuID, Vec<ASMLocation>> to store information on used and defined registers. Both LinkedHashMap and Vec are expensive. We could use a fixed-length array, or simply use1, use2, use3...
It makes machine code level optimisations and transformations hard to implement.
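
The compaction step mentioned above could look roughly like the following. This is only a sketch under assumptions, not what the backend does: the Hole type and the compact function are hypothetical names, and the holes are assumed to be sorted, non-overlapping and within the buffer. It slides live bytes left over the holes and returns an old-offset to new-offset table, which any later fix-up of jump targets or register locations would need.

```rust
/// A half-open byte range [start, end) of eliminated code (a "hole").
/// Hypothetical type, for illustration only.
#[derive(Clone, Copy, Debug)]
struct Hole {
    start: usize,
    end: usize,
}

/// Slides live bytes left over the holes and truncates the buffer.
/// Returns a table mapping every old offset (including the end-of-code
/// offset) to its new offset.
/// Assumes `holes` is sorted by `start`, non-overlapping and in bounds.
fn compact(code: &mut Vec<u8>, holes: &[Hole]) -> Vec<usize> {
    let mut remap = vec![0usize; code.len() + 1];
    let mut write = 0; // next free position in the compacted code
    let mut next_hole = 0; // index of the next hole not yet skipped
    let mut read = 0; // position in the original code

    while read < code.len() {
        if next_hole < holes.len() && read == holes[next_hole].start {
            // Offsets inside a hole map to the next live byte.
            for off in read..holes[next_hole].end {
                remap[off] = write;
            }
            read = holes[next_hole].end;
            next_hole += 1;
        } else {
            code[write] = code[read];
            remap[read] = write;
            write += 1;
            read += 1;
        }
    }

    // Record the new end-of-code offset before shrinking the buffer.
    remap[code.len()] = write;
    code.truncate(write);
    remap
}
```

Note that sliding the code changes the distance between instructions, so every relative offset that crosses a hole has to be recomputed from the returned table. That extra pass is part of the cost of this solution.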

Questions:

Is it possible to emit machine code (binary) before we know the final code? For example, a jmp may be encoded differently depending on how big the offset is. However, any assembler has to deal with this question.
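
To make the jmp example concrete, here is a small sketch of the size choice for an unconditional x86_64 jump. The opcodes are the standard ones (EB with an 8-bit displacement, E9 with a 32-bit displacement), but the encode_jmp function and its offset parameters are hypothetical and not part of the current backend. It also assumes the target offset is already known, which is exactly what in-place generation cannot assume for forward jumps.

```rust
/// Encodes an unconditional x86_64 `jmp` located at code offset `from`
/// that transfers control to code offset `target`.
/// Hypothetical helper, for illustration only.
fn encode_jmp(from: i64, target: i64) -> Vec<u8> {
    // Displacements are relative to the END of the jump instruction,
    // so the displacement itself depends on which form is chosen.
    let short_disp = target - (from + 2); // short form is 2 bytes
    if short_disp >= i8::MIN as i64 && short_disp <= i8::MAX as i64 {
        // jmp rel8: opcode EB followed by a signed 8-bit displacement.
        return vec![0xEB, short_disp as i8 as u8];
    }

    let near_disp = target - (from + 5); // near form is 5 bytes
    assert!(
        near_disp >= i32::MIN as i64 && near_disp <= i32::MAX as i64,
        "displacement does not fit in rel32"
    );
    // jmp rel32: opcode E9 followed by a little-endian 32-bit displacement.
    let mut bytes = vec![0xE9];
    bytes.extend_from_slice(&(near_disp as i32).to_le_bytes());
    bytes
}
```

For a forward jump the target is not yet emitted, so one option is to always emit the 5-byte form and patch the displacement later, at the cost of larger code. The other is to shrink jumps afterwards, which moves code and brings back the compaction problem listed under Issues.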

We need to discuss this more before starting to implement the JIT.