Issue #72 created Jun 04, 2017 by Kunshan Wang (@u5211824, Owner)

Alternative serialisable format (such as JSON/YAML/XML/...)

I am glad to see that the mu-tool-compiler project exists.

I have been considering an alternative serialisable, human-readable format to the current text-based IR. In fact, I am unhappy with the text-based Mu IR, which has various problems:

  • It requires a dedicated parser, which has to be implemented by hand.
  • When new features are added, the grammar changes, and the parser needs to be modified.
  • The text-based IR is constrained by aesthetic considerations and has many inconsistencies. For example:
    • The only reason .funcdef ... <@sig> carries a signature is that it also serves as syntactic sugar: a human writer only needs to write one .funcdef to create both a function and its first version.
    • By convention, types and signatures in Mu instructions are written in angle brackets, such as ADD <@i32> %x %y. But instructions may take more than types and signatures. One example is GETFIELDIREF, which has an integer literal argument. Its current form, GETFIELDIREF <@type 3> %ref, is ugly: the number 3 looks out of place.

I suggest defining a Mu IR format on top of a well-known structured data format, such as JSON, YAML or XML.

Related work:

  • LLVM yaml2obj: http://llvm.org/docs/yaml2obj.html

Potential advantages:

  • There are mature open-source parsers available.
  • Easy to extend.
  • Easy to specify (in mu-spec).

For example, to give an exposed function an externally usable symbol, we only need to add a property instead of redesigning the grammar:

name: foo
func: func
callconv: DEFAULT
cookie: cookie
symbol: externally_visible_symbol     # This is an added property

It is easy to specify because we can define the IR as an (abstract) object tree with properties, similar to how the HTML5 DOM is defined.
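To illustrate the "object tree with properties" idea, here is a minimal sketch in Python, assuming one node class per kind of top-level definition. The Expose class and the expose_from_dict helper are hypothetical names, not part of mu-spec or mu-tool-compiler. Adding the symbol property is just a new optional field, so bundles written before the extension still load unchanged.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical node type for an exposed function.
# Each property of the abstract object tree becomes one field.
@dataclass
class Expose:
    name: str
    func: str
    callconv: str
    cookie: str
    # The newly added property: optional, so older bundles without it still load.
    symbol: Optional[str] = None


def expose_from_dict(d: dict) -> Expose:
    """Build an Expose node from a parsed mapping (from JSON, YAML, ...).

    Unknown properties are rejected; missing required ones make the
    dataclass constructor raise TypeError.
    """
    known = {f.name for f in fields(Expose)}
    unknown = set(d) - known
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return Expose(**d)


old = json.loads('{"name": "foo", "func": "func", "callconv": "DEFAULT", "cookie": "cookie"}')
new = dict(old, symbol="externally_visible_symbol")

print(expose_from_dict(old).symbol)  # None
print(expose_from_dict(new).symbol)  # externally_visible_symbol
```

The only "grammar change" needed for the extension is the one added field line; the parser itself (here the standard json module) is untouched.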

There are also potential disadvantages:

  • More verbose
  • Less human-readable than the current text form, but human readability should not be the primary concern.

XML example:

<bundle>
    <type id="i8"   ctor="int"  length="8" />    <!-- note: XML ID is actually a name -->
    <type id="i32"  ctor="int"  length="32" />
    <type id="i64"  ctor="int"  length="64" />
    <type id="pi8"  ctor="uptr" type="i8" />
    <type id="ppi8" ctor="uptr" type="pi8" />

    <type id="refi32" ctor="ref" type="i32" />

    <funcsig id="mainsig">
        <paramty type="i32" />
        <paramty type="ppi8" />
        <retty type="i32" />
    </funcsig>

    <const id="I32_42" type="i32" value="42" />
    <const id="I64_0"  type="i64" value="0" />

    <global id="errno" type="i32" />

    <funcdecl id="main" sig="mainsig" />

    <funcdef func="main">
        <bb lname="entry">          <!-- lname = local name -->
            <param type="i32"  lname="argc" />
            <param type="ppi8" lname="argv" />
            <inst opcode="ADD" flags="V" type="i32" opnd1="%argc" opnd2="@I32_42">
                <result lname="res" />
                <result lname="ovf" />
            </inst>

            <inst opcode="CALL" sig="some_sig" callee="some_callee">
                <arg val="%argc" />
                <result lname="r1" />
                <nor-dest name="bb2">
                    <pass-value val="%r1" />
                </nor-dest>
                <exc-dest name="bb3" />
            </inst>

            <inst opcode="SWAPSTACK" swappee="%some_hypothetic_stack">
                <return-with>
                    <result type="i32" lname="ss_res1" />
                    <result type="i32" lname="ss_res2" />
                </return-with>
                <pass-values>
                    <pass-value type="i32" val="%res" />
                    <pass-value type="i32" val="%r1" />
                </pass-values>
            </inst>

            <!-- more instructions here -->
        </bb>
        <bb lname="bb2">
            <param type="i32"  lname="r1" />
            <!-- more instructions here -->
        </bb>
        <bb lname="bb3">
            <exc-param lname="exc" />
            <!-- more instructions here -->
        </bb>
    </funcdef>

    <expose id="exposed_main" symbol="c_callable_symbol_of_exposed_main" 
        func="main" callconv="DEFAULT" cookie="@I64_0" />
</bundle>
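Because the format is plain XML, an off-the-shelf parser does all the syntactic work. A minimal sketch using Python's built-in xml.etree.ElementTree, assuming the element and attribute names from the example above (which are themselves only a proposal); load_bundle is a hypothetical helper that indexes type and funcsig definitions by their id attribute:

```python
import xml.etree.ElementTree as ET

# A trimmed-down bundle in the proposed XML layout (types and one signature only).
BUNDLE_XML = """
<bundle>
    <type id="i8"   ctor="int"  length="8" />
    <type id="i32"  ctor="int"  length="32" />
    <type id="pi8"  ctor="uptr" type="i8" />
    <type id="ppi8" ctor="uptr" type="pi8" />
    <funcsig id="mainsig">
        <paramty type="i32" />
        <paramty type="ppi8" />
        <retty type="i32" />
    </funcsig>
</bundle>
"""


def load_bundle(text):
    """Return (types, sigs): dicts keyed by each definition's id attribute."""
    root = ET.fromstring(text.strip())
    # Each <type> becomes a plain dict of its attributes.
    types = {t.get("id"): dict(t.attrib) for t in root.findall("type")}
    # Each <funcsig> becomes a (param types, return types) pair.
    sigs = {}
    for s in root.findall("funcsig"):
        sigs[s.get("id")] = (
            [p.get("type") for p in s.findall("paramty")],
            [r.get("type") for r in s.findall("retty")],
        )
    return types, sigs


types, sigs = load_bundle(BUNDLE_XML)
print(types["ppi8"])    # {'id': 'ppi8', 'ctor': 'uptr', 'type': 'pi8'}
print(sigs["mainsig"])  # (['i32', 'ppi8'], ['i32'])
```

Note that attribute values arrive as strings (length="8" gives "8"), so a real loader would still need a schema or per-property conversion step.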

A YAML example:

types:
  - name: i8
    ctor: int
    length: 8

  - {name: "i32", ctor: "int", length: 32}
  - {name: "i64", ctor: "int", length: 64}
  - {name: "double", ctor: "double"}
  - {name: "pi8",  ctor: "uptr", type: "i8"}
  - {name: "ppi8", ctor: "uptr", type: "pi8"}
  
function_signatures:
  - name: "mainsig"
    paramtys: ["i32", "ppi8"]
    rettys: ["i32"]

constants:
  - {name: "I32_42", type: "i32", value: 42}
  - {name: "I64_0",  type: "i64", value: 0}
  - {name: "D_0",  type: "double", value: 0.0}
  - name: "D_NAN"
    type: "double"
    value_from_int: 0x7ff0000000000001

globals:
  - {name: "errno", type: "i32"}

functions:
  - name: "main"
    sig: "mainsig"
    initial_version:
      - bbname: "entry"
        params:
          - {type: "i32",  lname: "argc"}
          - {type: "ppi8", lname: "argv"}
        insts:
          - {opcode: "ADD", flags: "V", type: "i32", opnd1: "%argc", opnd2: "@I32_42",
              results: ["res", "ovf"]}
          
          - opcode: "CALL"
            sig: "some_sig"
            callee: "some_callee"
            args: ["%argc"]
            results: ["r1"]
            nor_dest:
              bb: "bb2"
              pass_values: ["%r1"]
            exc_dest:
              bb: "bb3"

          - opcode: "SWAPSTACK"
            swappee: "%some_hypothetic_stack"
            ret_with:
              - {type: "i32", lname: "ss_res1"}
              - {type: "i32", lname: "ss_res2"}
            pass_values:
              - {type: "i32", val: "%res"}
              - {type: "i32", val: "%r1"}

          # more instructions here

      - bbname: "bb2"
        params:
          - {type: "i32", lname: "r1"}
        insts:
          # more instructions here

      - bbname: "bb3"
        excparam: "exc"
        insts:
          # more instructions here

exposed_functions:
  - name: "exposed_main"
    symbol: "c_callable_symbol_of_exposed_main"
    func: "main"
    callconv: "DEFAULT"
    cookie: "@I64_0"
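Once a YAML or JSON loader has produced nested dicts and lists, semantic checks such as "every referenced name is defined" become ordinary lookups rather than parser work. A minimal sketch, assuming the section and key names of the YAML example above; it uses the standard json module on a JSON rendering of a fragment (a YAML library such as PyYAML would yield the same structures), and undefined_references is a hypothetical helper. The function's sig is deliberately misspelled to show what the check reports.

```python
import json

# A fragment of the bundle, written as JSON. "main_sig" is deliberately misspelled.
BUNDLE_JSON = """
{
  "types": [{"name": "i8",   "ctor": "int", "length": 8},
            {"name": "i32",  "ctor": "int", "length": 32},
            {"name": "pi8",  "ctor": "uptr", "type": "i8"},
            {"name": "ppi8", "ctor": "uptr", "type": "pi8"}],
  "function_signatures": [{"name": "mainsig", "paramtys": ["i32", "ppi8"], "rettys": ["i32"]}],
  "functions": [{"name": "main", "sig": "main_sig"}]
}
"""


def undefined_references(bundle):
    """Return (where, name) pairs for names that are used but never defined."""
    type_names = {t["name"] for t in bundle.get("types", [])}
    sig_names = {s["name"] for s in bundle.get("function_signatures", [])}
    problems = []
    # Types that refer to other types (e.g. uptr<i8>).
    for t in bundle.get("types", []):
        if "type" in t and t["type"] not in type_names:
            problems.append((f"type {t['name']}", t["type"]))
    # Parameter and return types of signatures.
    for s in bundle.get("function_signatures", []):
        for ty in s["paramtys"] + s["rettys"]:
            if ty not in type_names:
                problems.append((f"funcsig {s['name']}", ty))
    # Signatures named by functions.
    for f in bundle.get("functions", []):
        if f["sig"] not in sig_names:
            problems.append((f"function {f['name']}", f["sig"]))
    return problems


print(undefined_references(json.loads(BUNDLE_JSON)))
# [('function main', 'main_sig')]
```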

A LISP (S-expression) example:

(type i8  int 8)
(type i32 int 32)
(type i64 int 64)
(type pi8 uptr i8)
(type ppi8 uptr pi8)

(funcsig mainsig (i32 ppi8) (i32))

(const I32_42 i32 42)
(const I64_0  i64 0)

(global errno i32)

(funcdecl main mainsig)
(funcdef main.v1 main
         (bb entry ((i32 argc) (ppi8 argv))
             (ADD i32 %argc @I32_42 res
                     ((C carry)
                      (V ovf)))

             (CALL some_sig some_callee (%argc) (r1)
                   ((bb2 (%r1)) (bb3)))

             (SWAPSTACK %some_hypothetic_stack
                        (ret-with ((i32 ss_res1)
                                   (i32 ss_res2)))
                        (pass-values ((i32 %res)
                                      (i32 %r1))))

             ; more instructions here
             )

         (bb bb2 ((i32 r1))
             ; more instructions here
             )

         (bb bb3 (exc)
             ; more instructions here
             ))

(expose exposed_main main DEFAULT @I64_0
        ((symbol "c_callable_symbol_of_exposed_main")))
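The S-expression form needs almost no grammar: a generic reader turns text into nested lists, and all Mu-specific interpretation happens afterwards on the tree. A minimal sketch of such a reader in Python, assuming ';' line comments and double-quoted strings without escapes; tokenize and parse_all are hypothetical helpers, and string atoms keep their quotes so they stay distinguishable from names.

```python
import re

# One token per match: whitespace and ';' comments are skipped (group 1 is None);
# otherwise group 1 is '(', ')', a double-quoted string, or an atom.
TOKEN = re.compile(r'\s+|;[^\n]*|(\(|\)|"[^"]*"|[^\s()";]+)')


def tokenize(text):
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.group(1) is not None:
            tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_all(text):
    """Parse every top-level form; lists become Python lists, atoms stay strings."""
    stack = [[]]  # stack[0] collects the top-level forms
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("missing ')'")
    return stack[0]


forms = parse_all('''
(funcsig mainsig (i32 ppi8) (i32))
(const I32_42 i32 42)   ; a comment
(expose exposed_main main DEFAULT @I64_0
        ((symbol "c_callable_symbol_of_exposed_main")))
''')
for f in forms:
    print(f)
```

As with JSON and YAML, a new optional property such as (symbol "...") needs no change to this reader, only to the code that interprets the resulting lists.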