Issue created Aug 11, 2016 by Kunshan Wang (@u5211824, Owner)

Are bundles the unit of compiling or the unit of loading?

Two different views of "bundle"

From the compiler's point of view, compiling is a process where:

  1. There are many modules (such as .class files), all of which need to be compiled (to Mu IR, for example). Modules may have inter-dependencies, and there could even be mutual recursion (A imports B, B imports C, and C imports A).
  2. Compilers should compile each module separately, with no knowledge of other modules. This implies each module compiles to a stand-alone bundle that has all the necessary things (types, functions, ...) used inside the bundle. Since types use "structural equivalence", it does not matter if two structurally isomorphic types are defined twice in two bundles. In particular, functions can be declared multiple times in different bundles.
  3. When they are linked, bundles are merged. Different bundles should have no intersections, with one exception: functions of the same name are resolved to be the same, so calling a declared but not defined function may end up calling a function defined in another bundle. (Global cells should have similar properties, too.) We also allow calling functions that are declared but not defined in any other bundle, which triggers lazy loading.
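
The merge step in point 3 can be sketched as a toy model. This is only an illustration in Python, not Mu code: the dictionary layout, the `link` function and the extra name `@Baz.run` are all hypothetical.

```python
# Toy model of the compiler's view: bundles are built independently and
# merged at link time. Nothing here is part of the Mu API (hypothetical).

def link(bundles):
    """Merge bundles of the form {"defs": {name: body}, "decls": set_of_names}."""
    defined = {}
    declared = set()
    for bundle in bundles:
        for name, body in bundle["defs"].items():
            if name in defined:
                # Bundles must not intersect, apart from declarations.
                raise ValueError(f"duplicate definition of {name}")
            defined[name] = body
        declared |= bundle["decls"]
    # Functions declared somewhere but defined nowhere: calling one of
    # these would trigger lazy loading.
    lazy = declared - set(defined)
    return defined, lazy


bundle1 = {"defs": {"@Foo.run": "calls @Bar.run"}, "decls": {"@Bar.run"}}
# @Baz.run is a made-up function that no bundle defines.
bundle2 = {"defs": {"@Bar.run": "calls @Foo.run"}, "decls": {"@Foo.run", "@Baz.run"}}

defined, lazy = link([bundle1, bundle2])
```

Here the declarations of `@Foo.run` and `@Bar.run` resolve to the definitions in the other bundle, and only `@Baz.run` is left for lazy loading.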

In the current Mu design, the micro VM's view of bundles is:

  1. There is a global bundle, which includes everything that is ever loaded at a point of time.
  2. A bundle is the unit of loading. Bundles are loaded sequentially. (At least it is perceived to be so through the API. A Mu implementation can load them in parallel while ensuring sequential consistency.)
  3. Each bundle can refer to things (types, functions, ...) defined in the current bundle or the global bundle.
  4. Every time a bundle is loaded, the contents (types, functions, ...) are merged into the global bundle. That is, there is one single global bundle which gets gradually augmented as bundles are loaded. Conflicts are not allowed. If a new function version (FuncVer) is defined on an existing function, this FuncVer becomes the "most recent version" of the function.
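
The four points above can be modelled roughly as follows. This is a minimal sketch in Python under simplifying assumptions (only types and functions, no IDs, no global cells); the class `GlobalBundle` and its methods are hypothetical and do not mirror the Mu API.

```python
# Toy model of the micro VM's view: one global bundle, augmented by
# sequential loads. All names are hypothetical, not the Mu API.

class GlobalBundle:
    def __init__(self):
        self.types = {}   # type name -> definition
        self.funcs = {}   # function name -> list of FuncVers, newest last

    def load(self, bundle):
        types = bundle.get("types", {})
        decls = bundle.get("decls", [])
        defs = bundle.get("defs", {})   # name -> (version, callees)
        # Conflicts are not allowed.
        for name in types:
            if name in self.types:
                raise ValueError(f"type {name} is already defined")
        # A bundle may refer to itself or to the global bundle only.
        visible = set(self.funcs) | set(decls) | set(defs)
        for name, (_version, callees) in defs.items():
            for callee in callees:
                if callee not in visible:
                    raise LookupError(f"{name} refers to unknown {callee}")
        # Validation passed: merge into the global bundle.
        self.types.update(types)
        for name in decls:
            self.funcs.setdefault(name, [])
        for name, (version, _callees) in defs.items():
            self.funcs.setdefault(name, []).append(version)

    def most_recent(self, name):
        versions = self.funcs[name]
        return versions[-1] if versions else None   # None: declared only


vm = GlobalBundle()
vm.load({"types": {"@Foo": "..."}, "decls": ["@Bar.run"],
         "defs": {"@Foo.run": ("%v1", ["@Bar.run"])}})
# The second bundle needs no declaration of @Foo.run: it is already known.
vm.load({"types": {"@Bar": "..."},
         "defs": {"@Bar.run": ("%v1", ["@Foo.run"])}})
# Re-definition: the new FuncVer becomes the most recent version.
vm.load({"defs": {"@Foo.run": ("%v2", ["@Bar.run"])}})
```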

The main difference between the two views is whether we consider bundle loading to be a static, separated and parallel process, or a dynamic and sequentially inter-dependent one.

Why is Mu designed like this?

The current Mu design is based on two facts: (1) Mu is a run-time JIT compiler, and (2) Mu supports function re-definition. Because Mu is a run-time entity, it is one single thing that lives through the life of the application. It will observe all the things (bundles) the client ever delivers to it, and this is a temporal process. The micro VM always starts with no knowledge, and the client "teaches" the micro VM more and more by loading bundles. So the "global bundle" represents the "current knowledge" the micro VM has about the world (i.e. the types, functions, ... of the client's language). Since the growth of knowledge is a sequential process, it is natural to assume bundles are loaded in a sequence. In this way, if a later bundle refers to things the micro VM already knows (for example, types defined in previously loaded bundles), then it does not need to define or declare them again, because Mu already knows them, so the bundle can just refer to them by name/ID. The sequential nature also makes it easy to support function re-definition. Since there is a sequence in bundles, a FuncVer in a newer bundle will replace the current "most recent" version in the global bundle.

The separate-compiling approach is the traditional, well-known way C compilers work, and it does not address function re-definition. Re-definition is still an "action" rather than a declaration, and the order of "which FuncVer invalidates which older FuncVer" does matter.
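
A tiny illustration of why the order matters, written as a Python sketch. The function `most_recent_versions` and the bundle contents are hypothetical.

```python
# Re-definition is an action: replaying the same bundles in a different
# order leaves a different "most recent" FuncVer. (Hypothetical model.)

def most_recent_versions(load_order):
    current = {}
    for bundle in load_order:
        for func, version in bundle.items():
            current[func] = version   # the newer FuncVer replaces the older one
    return current


bundle_a = {"@Foo.run": "%v1"}
bundle_b = {"@Foo.run": "%v2"}

a_then_b = most_recent_versions([bundle_a, bundle_b])
b_then_a = most_recent_versions([bundle_b, bundle_a])
```

The two load orders disagree on which version of `@Foo.run` is current, so a pure "set of bundles" description is not enough once re-definition is involved.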

What the client may want

But compiler writers (whether for a traditional C compiler or a Mu client) may want some flexibility for parallel compilation, and may find aesthetic appeal in the idea that "separate modules should be compiled to separate Mu bundles". For a JVM client, it is more intuitive to generate one Mu IR bundle for each .class file, compile each .class file separately, and still allow lazy loading. For example:

//// Foo.class
public class Foo {
    public static void run() { Bar.run(); }
}
//// Bar.class
public class Bar {
    public static void run() { Foo.run(); }
}

The separate-compiling model will deliver two Mu bundles:

//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
    ...
    CALL @Bar.run()
}
.funcdecl @Bar.run ...    // Declare @Bar.run in Bundle1

//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
    ...
    CALL @Foo.run()
}
.funcdecl @Foo.run ...    // Declare @Foo.run in Bundle2

That is, @Bar.run is declared in Bundle1 and @Foo.run is declared in Bundle2. They declare functions in each other because neither has knowledge of the other.

However, in the current Mu model, the two bundles will look like:

//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
    ...
    CALL @Bar.run()
}
.funcdecl @Bar.run ...    // Declare @Bar.run in Bundle1

//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
    ...
    CALL @Foo.run()
}

The difference is subtle: Bundle2 does not declare @Foo.run, because it knows Bundle1 is loaded before it, and @Foo.run is already defined.

It is arguable that this will require the two bundles to be built sequentially. But it can be worked around by "lifting" both declarations into a third bundle:

//// Bundle0:
.funcdecl @Bar.run ...    // Declare @Bar.run (defined in Bundle2)
.funcdecl @Foo.run ...    // Declare @Foo.run (defined in Bundle1)

//// Bundle1:
.typedef @Foo = ....
.funcdef @Foo.run VERSION %v1 ... {
    ...
    CALL @Bar.run()
}

//// Bundle2
.typedef @Bar = ....
.funcdef @Bar.run VERSION %v1 ... {
    ...
    CALL @Foo.run()
}

Declaring functions is faster than defining them. After Bundle0 is loaded, Bundle1 and Bundle2 can be built and loaded in parallel.
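
The workaround can be sketched in Python as follows. This is a toy model, not Mu code: `build_bundle` stands in for the client's compiler, and the `known` dictionary stands in for the global bundle; both are hypothetical.

```python
# Toy model: once Bundle0 has declared both functions, Bundle1 and Bundle2
# can be built in parallel and loaded in whichever order they finish.
from concurrent.futures import ThreadPoolExecutor, as_completed

def build_bundle(class_name, callee):
    """Stand-in for compiling one class into a bundle (hypothetical)."""
    return {f"@{class_name}.run": [f"@{callee}.run"]}   # function -> callees

# State after loading Bundle0: both functions are declared, neither defined.
known = {"@Foo.run": "declared", "@Bar.run": "declared"}

def load(bundle):
    for name, callees in bundle.items():
        for callee in callees:
            if callee not in known:
                raise LookupError(f"{name} refers to unknown {callee}")
        known[name] = "defined"

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(build_bundle, "Foo", "Bar"),
               pool.submit(build_bundle, "Bar", "Foo")]
    # Loads happen one at a time here, in completion order. Either order
    # succeeds because Bundle0 already declared both functions.
    for future in as_completed(futures):
        load(future.result())
```

In this sketch only the building is concurrent. The loads themselves are still applied one after another, which matches the "perceived to be sequential" wording earlier in this issue.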

It is also arguable that "lifting both declarations into a separate bundle" is a redundant step. But in practice, this step cannot be avoided. Take Java as an example again. If one Java ClassLoader visits both Foo.class and Bar.class, then it already knows both classes, and it can simply build both into a single Mu bundle rather than splitting them into two. If two Java ClassLoaders attempt to load Foo and Bar in parallel, discover the inter-dependency, and also find each other working on the two respective .class files simultaneously, then the ClassLoaders need some synchronisation mechanism so that classes are not loaded twice. This is necessary even in existing non-Mu production JVMs. So if two Java classes with inter-dependencies need to be compiled in parallel, then the client has to factor out the common parts, which naturally leads to "Bundle0".
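
The kind of synchronisation meant here can be as simple as a lock-guarded "load once" table. The following Python sketch is only an illustration: `ClassRegistry` and `load_once` are hypothetical names, and a real JVM uses finer-grained locking than one global lock.

```python
# Toy "load each class at most once" registry shared by concurrent loaders.
# ClassRegistry is hypothetical and deliberately coarse-grained.
import threading

class ClassRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._loaded = {}      # class name -> result of loading
        self.load_count = 0    # how many times a loader actually ran

    def load_once(self, name, loader):
        # The loader must not call load_once itself: the lock is not reentrant.
        with self._lock:
            if name not in self._loaded:
                self._loaded[name] = loader(name)
                self.load_count += 1
            return self._loaded[name]


registry = ClassRegistry()
threads = [
    threading.Thread(target=registry.load_once,
                     args=(name, lambda n: f"bundle for {n}"))
    for name in ["Foo", "Bar"] * 4   # eight loaders racing on two classes
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight threads race, but each class is loaded exactly once, which is the property the client has to guarantee before it hands bundles to Mu.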

An orthogonal issue is about the type system. Assume we have the two Java classes:

class Foo { Bar bar; }
class Bar { Foo foo; }

Naturally @Foo should be struct<@JavaHeader ref<@Bar>>. However, without looking at Bar.class, we cannot define the type @Bar, which is supposed to match the structure of the Java class fields in Bar. So if we enforce lazy loading, then Foo.bar has to be represented as ref<void> rather than ref<@Bar>. This has been discussed in a separate issue before. The separate-compiling model does not solve this problem, because the crux is that the knowledge of Bar is only obtained by looking at Bar.class. Unlike declared-but-not-defined functions, having types that are not yet known (the C language calls them "incomplete types") will cause many problems. These types are inaccessible. If traps should be triggered when a type is used, it is hard to define what "a type is used" means. If we define it as accessing an object that has that type, or simply performing a BinOp on such types, then almost all instructions can trigger traps.
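
The ref<void> fallback can be sketched as a small mapping step in the client. This Python sketch only produces type strings in the notation used above; `field_type` and `struct_for` are hypothetical helper names, not part of any Mu client.

```python
# Toy mapping from Java reference fields to Mu type strings. A field whose
# class has not been loaded yet falls back to ref<void>. (Hypothetical.)

def field_type(field_class, known_classes):
    if field_class in known_classes:
        return f"ref<@{field_class}>"
    return "ref<void>"   # layout of the class is not known yet

def struct_for(field_classes, known_classes):
    parts = ["@JavaHeader"] + [field_type(c, known_classes) for c in field_classes]
    return "struct<" + " ".join(parts) + ">"


# Foo has one field of class Bar.
foo_lazy = struct_for(["Bar"], known_classes={"Foo"})
foo_eager = struct_for(["Bar"], known_classes={"Foo", "Bar"})
```

With lazy loading the client only ever sees the first case for Foo.bar, regardless of whether bundles are built separately or sequentially.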

Conclusion

In the end, we still believe the current Mu design is reasonable for its purpose as a JIT compiler.

The current "single global bundle" design is also easier for the boot image writer because there is only one bundle to consider.

But we may consider the needs of programming language implementers who hold that "modular languages should be compiled to modular object code". The implications of adopting this model are still not clear. Alternatively, this model could also be implemented in a layer above Mu.
