============================
Intermediate Representation
============================

The Mu Intermediate Representation (Mu IR) is the language used by Mu to
represent a Mu program. It is the input from the client and can be executed on
Mu.

Mu can execute the program in any way, including interpretation, JIT compilation
or even ahead-of-time compilation.

Mu IR is a tree-shaped structure that consists of nodes, including top-level
definitions and their children. It also has a human-readable text form.

This document describes the top level of Mu IR using the text form. There is
also the `IR Builder API <irbuilder.rst>`__, a programmatic interface to build
Mu IR inside a running micro VM.

There was a binary form, but it is now deprecated. See `<ir-binary.rst>`__.

For the documents of the type system and the instruction set, see:

- `<type-system.rst>`__
- `<instruction-set.rst>`__

Example
=======

Here is an example of Mu IR in the text form::

    .typedef @i64 = int<64>
    .typedef @double = double
    .typedef @void = void
    .typedef @refvoid = ref<@void>

    .const @i64_0 <@i64> = 0
    .const @answer <@i64> = 42

    .typedef @some_global_data_t = struct <@i64 @double @refvoid>
    .global @some_global_data <@some_global_data_t>

    .typedef @Node = struct<@i64 @NodeRef>
    .typedef @NodeRef = ref<@Node>

    .funcsig @BinaryFunc = (@i64 @i64) -> (@i64)

    .funcdecl @square_sum <@BinaryFunc>

    .funcdef @gcd VERSION %v1 <@BinaryFunc> {
        %entry(<@i64> %a <@i64> %b):
            BRANCH %head(%a %b)

        %head(<@i64> %a <@i64> %b):
            %z = EQ <@i64> %b @i64_0
            BRANCH2 %z %exit(%a) %body(%a %b)

        %body(<@i64> %a <@i64> %b):
            %b1 = SREM <@i64> %a %b
            BRANCH %head(%b %b1)

        %exit(<@i64> %a):
            RET %a
    }

    .expose @gcd_native = @gcd < DEFAULT > @i64_0

Later, the client can submit a bundle that defines a previously undefined
function, or a new version of a function that replaces the old version::

    .funcdef @square_sum VERSION %v1 <@BinaryFunc> {
        // define the function (if not defined)
    }

    .funcdef @gcd VERSION %v2 <@BinaryFunc> {
        // or replace an existing version (if already defined)
    }

Top-level Structure
===================

A **bundle** is the unit of code the client sends to Mu. It contains many
**top-level entities**. A top-level entity shall be a **type**, **function
signature**, **constant**, **global cell**, **function** or **exposed
function**.

Note that while *functions* are top-level, *function versions* are **not**
top-level entities. Function versions are nodes under functions. The text form
allows writing ``.funcdef`` at the top level, but it is syntactic sugar for
defining a function (if it does not already exist) and also defining a version
of it.
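
The de-sugaring of ``.funcdef`` can be modelled in a few lines of Python. This
is only an illustrative sketch: the ``FunctionRegistry`` class and its method
names are hypothetical and are not part of any Mu API::

    class FunctionRegistry:
        """Toy model: maps a function name to its versions, oldest first."""

        def __init__(self):
            self.functions = {}

        def funcdecl(self, name):
            # Declare the function without giving it any version.
            self.functions.setdefault(name, [])

        def funcdef(self, name, version):
            # Sugar: create the function if it does not exist yet, then add
            # a new version as a node under that function.
            self.funcdecl(name)
            self.functions[name].append(version)

        def is_defined(self, name):
            # A function without any version counts as "undefined".
            return bool(self.functions.get(name))


    registry = FunctionRegistry()
    registry.funcdecl("@square_sum")
    registry.funcdef("@gcd", "@gcd.v1")
    registry.funcdef("@gcd", "@gcd.v2")

In this model ``@gcd`` ends up with two versions under one function node, while
``@square_sum`` exists as a function but has no version yet.
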

    NOTE: For Java users, a bundle is approximately the counterpart of a
    ``.class`` file.

In a bundle, top-level entities do not have any order. This means, in the text
form, top-level definitions can appear in any order and refer to each other by
names; in the bundle building API, top-level entities can be created in any
order using API functions and refer to each other symbolically.

Object Hierarchy
----------------

The object hierarchy (the "A has many B" or "A refers to many B" relation) of
the contents of a bundle is:

- A **bundle**

  - has 0-∞ **types**. Each type

    - refers to 0-∞ *types* or *function signatures*.

  - has 0-∞ **function signatures**. Each signature

    - refers to 0-∞ *types* (parameter types and return types)

  - has 0-∞ **constants**. Each constant

    - refers to 1 *type* (type of constant), and
      
    - 0-∞ other `global variables <instruction-set.rst#global-variable>`__
      (aggregated values, such as struct/array members).

  - has 0-∞ **global cells**. Each global cell

    - refers to 1 *type* (type of global cell).

  - has 0-∞ **exposed functions**. Each exposed function

    - refers to 1 *function* (the function to expose), and
    
    - refers to 1 *constant* (the `cookie <native-interface.rst#cookie>`__).

  - has 0-∞ **functions**. Each function

    - has 0-∞ **function versions**. Versions are ordered from oldest to newest
      in the order of bundle loading.  *If there is no version, the function is
      "undefined"; otherwise it is "defined".* Each version

      - has 1-∞ **basic blocks**. The first basic block is the *entry* block.
        Each block

        - has 0-∞ **normal parameters**. Each normal parameter
        
          - refers to 1 *type*.

        - has 0-1 **exceptional parameters**

        - has 1-∞ **instructions**. Each instruction

          - has 0-∞ results, and

          - may refer to 0-∞ other *types*, *function signatures* or *variables*
            (including both global and local variables).

Identifiers and Names
---------------------

Many entities in Mu are **identified**. The complete list of identified entities
is:

- Top-level entities, including:

  - types

  - function signatures

  - constants

  - global cells

  - functions

  - exposed functions

- Function versions

- Basic blocks

- Basic block parameters, including
  
  - normal parameters
    
  - exceptional parameters

- Instructions

- Instruction results

- Clauses of instructions, including:

  - destination clauses

  - exception clauses

  - keep-alive clauses

  - current-stack clauses

  - new-stack clauses

..

    **Notes to the micro VM implementers**: Although the bundle building API
    needs IDs of the above entities to construct IR nodes that refer to each
    other symbolically, recording the IDs and names for all of them may not be
    useful, and sometimes can consume a large amount of memory.
    
    In a steady state, the IDs and names of top-level entities need to be kept
    for building subsequent bundles that refer to them. The IDs and names of
    function versions and OSR-point instructions are needed for stack
    introspection. Other IDs and names are useless from the client's point of
    view, because the client cannot do anything with the IDs of those other
    entities, and the API does not require micro VM implementations to look up
    the IDs and names for the client.
    
    The micro VM implementers can make use of such restrictions on the API. So a
    space-efficient implementation does not need to record the IDs and names of
    **basic blocks**, **instruction results**, **non-OSR-point instructions** or
    **clauses** of instructions. But recording them may provide more information
    for debugging.

An identified entity has an identifier (ID) and **optionally** a name. An
identifier (ID) is a 32-bit integer.  A name is a string starting with ``@``
or ``%``, followed by characters from the set ``[0-9a-zA-Z_-.]``. An ID
uniquely identifies an *identified* entity. A name, if present, also uniquely
identifies an *identified* entity.
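
The name syntax can be checked with a regular expression. The following Python
sketch is illustrative only: ``is_valid_name`` is a hypothetical helper, and it
assumes that at least one character must follow the ``@`` or ``%`` sigil, which
the text above does not state explicitly::

    import re

    # One sigil (@ or %) followed by characters from [0-9a-zA-Z_-.].
    # Assumption: at least one character follows the sigil.
    NAME_RE = re.compile(r"[@%][0-9a-zA-Z_\-.]+")


    def is_valid_name(name):
        """Return True if the whole string matches the name syntax."""
        return NAME_RE.fullmatch(name) is not None

For instance, ``is_valid_name("@fac.v1.entry")`` holds, while a string without
a sigil or one containing a space is rejected.
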

IDs of entities are determined by the micro VM.

The text-form Mu IR refers to entities only by name. When a bundle is loaded
into a micro VM, the IDs of its entities are generated automatically.  When
generating IDs, Mu guarantees that **each name has a corresponding ID**, and
**no two different names are mapped to the same ID**.

Names
~~~~~

A **global name** begins with ``@``. All identified entities can use global
names. Top-level entities and function versions must use global names. Global
names are valid in a whole Mu instance, not limited to a single bundle.

    Example::

        .typedef    @i8 = int<8>
        .typedef    @i32 = int<32>
        .typedef    @i64 = int<64>
        .typedef    @ir_i8 = iref<@i8>
        .typedef    @ir_ir_i8 = iref<@ir_i8>

        .funcsig    @some_fun_sig = () -> ()
        .funcsig    @main_sig = (@i32 @ir_ir_i8) -> (@i32)

        .const      @i32_1 <@i32> = 1
        .const      @i64_0 <@i64> = 0

        .global     @errno <@i64>

        .funcdecl @some_fun <@some_fun_sig>

A **local name** begins with ``%``. Function versions, basic blocks, parameters,
instructions and instruction results may use local names in the IR.

Local names are syntactic sugar in the text-form IR. When parsed, they are
de-sugared into global names.

    NOTE: This implies that the client must use IDs or global names in the
    client API because there is no local name once a text-form bundle is loaded
    into the micro VM.

The global names are inferred from their syntactic parents:

- Within a function which has the global name ``@FuncGlobalName``, the function
  version ``%FV`` has global name ``@FuncGlobalName.FV``.

- Within a function version which has the global name ``@FuncVerGlobalName``, a
  basic block with local name ``%BB`` has global name ``@FuncVerGlobalName.BB``.

- Within a basic block which has the global name ``@BBGlobalName``, a parameter,
  an instruction or an instruction result with local name ``%LN`` has global
  name ``@BBGlobalName.LN``.
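
The three rules above share one shape: drop the ``%`` and append the rest to
the parent's global name, separated by a dot. A minimal Python sketch, where
``global_name`` is a hypothetical helper and not a Mu API function::

    def global_name(parent, name):
        """Return the global name of `name` written under the entity `parent`."""
        if name.startswith("@"):
            return name  # already a global name, used as-is
        if not name.startswith("%"):
            raise ValueError("not a Mu name: " + name)
        return parent + "." + name[1:]


    func_ver = global_name("@fac", "%v1")       # function version under @fac
    block = global_name(func_ver, "%head")      # basic block under the version
    value = global_name(block, "%lt")           # instruction result in the block

Applying the helper level by level reproduces the nesting of function, function
version, basic block and local value.
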

..

    Example::

        .funcsig @fac.sig = (... ...) -> (...)

        .funcdef @fac VERSION %v1 <@fac.sig> {
            %entry(<@i32> %n):
                [%first_br] BRANCH %head(%n @I32_1 @I32_1)

            %head(<@i32> %n <@i32> %p <@i32> %i):
                %lt = SLT <@i32> %i %n
                [%second_br] BRANCH2 %lt %body(%n %p %i) %exit(%p)

            %body(<@i32> %n <@i32> %p <@i32> %i):
                %p2 = MUL <@i32> %p %i
                %i2 = ADD <@i32> %i @I32_1
                BRANCH %head(%n %p2 %i2)

            %exit(<@i32> %p):
                RET %p
        }

    In the above example, the global names of the function version, the basic
    blocks and their parameters and instructions are:

    - ``%v1`` -> ``@fac.v1``

      - ``%entry`` -> ``@fac.v1.entry``

        - ``%n`` -> ``@fac.v1.entry.n``
        - ``%first_br`` -> ``@fac.v1.entry.first_br``

      - ``%head`` -> ``@fac.v1.head``

        - ``%n`` -> ``@fac.v1.head.n``
        - ``%p`` -> ``@fac.v1.head.p``
        - ``%i`` -> ``@fac.v1.head.i``
        - ``%lt`` -> ``@fac.v1.head.lt``
        - ``%second_br`` -> ``@fac.v1.head.second_br``

      - ``%body`` -> ``@fac.v1.body``

        - ``%n`` -> ``@fac.v1.body.n``
        - ``%p`` -> ``@fac.v1.body.p``
        - ``%i`` -> ``@fac.v1.body.i``
        - ``%p2`` -> ``@fac.v1.body.p2``
        - ``%i2`` -> ``@fac.v1.body.i2``
        - The anonymous ``BRANCH`` instruction: no global name.

      - ``%exit`` -> ``@fac.v1.exit``

        - ``%p`` -> ``@fac.v1.exit.p``
        - The anonymous ``RET`` instruction: no global name.

    Note that the same local name, such as ``%p``, has **different global names
    in different basic blocks**. ``@fac.v1.head.p`` and ``@fac.v1.body.p`` are
    not the same variable, and they seldom even have the same value.

    Function versions, basic blocks, parameters and instructions can use global
    names, too. For example, instead of the previous example, it is legal to
    write::

        .funcdef @fac VERSION @fac.v1 <@fac.sig> {
            @fac.v1.entry(<@i32> @fac.v1.entry.n):
                BRANCH @fac.v1.head(@fac.v1.entry.n @I32_1 @I32_1)

            @fac.v1.head(<@i32> @fac.v1.head.n <@i32> @fac.v1.head.p <@i32> @fac.v1.head.i):
                @fac.v1.head.lt = SLT <@i32> @fac.v1.head.i @fac.v1.head.n

    or even::

        .funcdef @fac VERSION @n1 <@fac.sig> {
            @n2(<@i32> @n3):
                BRANCH @n4(@n3 @I32_1 @I32_1)

            @n4(<@i32> @n5 <@i32> @n6 <@i32> @n7):
                @n8 = SLT <@i32> @n7 @n5

Because local names are merely syntactic sugar, everything that has a local name
can also be identified by its global name. It is still considered a naming
conflict if two local names have the same global name.

..

    NOTE: It is useful to have local things globally identifiable, especially
    *function call sites* and *traps*. For example::

        .funcdef @foo VERSION @foo.v1 <...> {
            %entry():
                (%rv1 %rv2 %rv3) = [%the_call_site] CALL <@T1 @T2 @T3> @some_func (...)
                () = [%the_trap] TRAP <> KEEPALIVE (...)
                ...
        }

    The call site can be globally identified by ``@foo.v1.entry.the_call_site``.
    The trap can be identified by ``@foo.v1.entry.the_trap``. The name is
    globally unique and can be used to identify individual traps and call sites
    in *trap handlers* (see `the API <api.rst#trap-handling>`__).

Identifiers
~~~~~~~~~~~

All identifiers are global. Every ID uniquely identifies one entity in the whole
Mu instance.

0 is an invalid ID. IDs in the range of 1-65535 are reserved by Mu. The Mu
specification only uses 1-32767. 32768-65535 can be used by the Mu
implementation for extension purposes.
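
The ID ranges can be summarised as a small classifier. This Python sketch is
illustrative: ``classify_id`` and its labels are hypothetical, and it assumes
that the 32-bit ID is treated as unsigned, which this document does not state::

    def classify_id(mu_id):
        """Classify an ID according to the ranges described above."""
        # Assumption: IDs are unsigned 32-bit integers.
        if not 0 <= mu_id <= 0xFFFFFFFF:
            raise ValueError("not a 32-bit ID")
        if mu_id == 0:
            return "invalid"
        if mu_id <= 32767:
            return "reserved: Mu specification"
        if mu_id <= 65535:
            return "reserved: implementation extensions"
        return "unreserved"

IDs from 65536 upwards fall outside both reserved ranges.
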

Type Definition
===============

Types and the **type constructor** syntax are documented in `<type-system.rst>`__.

A **type definition** gives a name to a type. It has the following form::

    .typedef Name = TypeCtor

where:

* ``Name`` is a global name for the type, and
* ``TypeCtor`` is a type constructor which defines the type.

In the bundle building API, type nodes are created using the ``new_type_*``
functions.

..

    Example: The following type definition defines a simple non-recursive type::

        .typedef @i64 = int<64>

    It gives a name ``@i64`` to a 64-bit integer.

..

    Example: The following type definition defines a recursive type::

        .typedef @i64 = int<64>
        .typedef @Node = struct<@i64 @NodeRef>
        .typedef @NodeRef = ref<@Node>

    These define a node in a singly-linked list. The second field of the struct
    is an object reference to itself. Note that **the order of top-level
    definitions does not matter**. They can be written in any order.
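
Order independence suggests a two-pass check: first collect every defined name,
then resolve references. The Python sketch below is illustrative only.
``unresolved_references`` is a hypothetical helper, and it ignores names that
may already exist in the Mu instance from previously loaded bundles::

    def unresolved_references(definitions):
        """`definitions` maps each defined name to the names it refers to."""
        defined = set(definitions)  # pass 1: collect every name first
        missing = set()
        for refs in definitions.values():  # pass 2: check the references
            missing.update(ref for ref in refs if ref not in defined)
        return missing


    bundle = {
        "@Node": ["@i64", "@NodeRef"],  # refers to @NodeRef before it appears
        "@NodeRef": ["@Node"],
        "@i64": [],
    }

Because all names are collected before any reference is checked, the forward
reference from ``@Node`` to ``@NodeRef`` resolves regardless of the order in
which the definitions are written.
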

..

    NOTE: There is no way to simply make an alias of another type. ``.typedef
    @Foo = @Bar`` is illegal because ``@Bar`` is not a type constructor. In this
    event, replacing all occurrences of ``@Foo`` with ``@Bar`` in the whole
    program is the desired approach.

..

    For C programmers: In Mu IR, types cannot be "inlined", i.e. all types
    referenced by other definitions (such as other types, constants, globals,
    functions, ...) must be defined at top-level. For example::

        .typedef @refi64 = ref<int<64>>     // WRONG. Cannot write int<64> inside.
        
        .typedef @i64 = int<64>
        .typedef @refi64 = ref<@i64>        // Right.

        %sum = FADD <double> %a %b          // WRONG. "double" is a type constructor, not a type

        .typedef @double = double
        %sum = FADD <@double> %a %b         // Right.

Function Signature Definition
=============================

A **function signature definition** gives a name to a **function signature**, or
**signature** when unambiguous. It has the following form::

    .funcsig Name = SigCtor

where:

* ``Name`` is a global name for the signature, and
* ``SigCtor`` is a function signature constructor which defines the function
  signature.

A function signature constructor has the form::

    (ParamTys) -> (RetTys)

where both ``ParamTys`` and ``RetTys`` are a list of global names separated by
spaces for the types of parameters and return values, respectively.
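
As an illustration, a signature constructor can be split into its two lists
with a regular expression. ``parse_sig_ctor`` in this Python sketch is a
hypothetical helper; it only separates the two lists and does not verify that
each item is a valid global name::

    import re

    # "(" names ")" "->" "(" names ")", with optional surrounding whitespace.
    SIG_RE = re.compile(r"\s*\(([^()]*)\)\s*->\s*\(([^()]*)\)\s*")


    def parse_sig_ctor(text):
        """Return (parameter type names, return type names) as two lists."""
        match = SIG_RE.fullmatch(text)
        if match is None:
            raise ValueError("not a signature constructor: " + text)
        params, rets = (group.split() for group in match.groups())
        return params, rets

An empty pair of parentheses yields an empty list, so ``() -> ()`` parses to
two empty lists.
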

In the bundle building API, function signatures are created by the
``new_funcsig`` function.

    Example: The following signature receives no parameters and returns no
    values::

        .funcsig @empty_func_s = () -> ()

    The following signature receives a 64-bit integer and a double as parameters
    and returns an object reference to a 64-bit integer::

        .typedef @double = double
        .typedef @i64 = int<64>
        .typedef @refi64 = ref<@i64>
        .funcsig @some_func_s = (@i64 @double) -> (@refi64)

..

    For C programmers: Just like types, function signatures cannot be inlined.
    For example::

        // Assume @i64 is defined as int<64>

        .typedef @foo_fp  = funcref<(@i64 @i64) -> (@i64)>      // WRONG.
        
        %rv = CALL <(@i64 @i64) -> (@i64)> @func (%arg1 %arg2)  // WRONG.

        .funcsig @foo_sig = (@i64 @i64) -> (@i64)
        .typedef @foo_fp  = funcref<@foo_sig>               // Right.

        %rv = CALL <@foo_sig> @func (%arg1 %arg2)           // Right.

Constant Definition
===================

Constants are global `SSA variables <instruction-set.rst#ssa-variables>`__ with
fixed values. They can be used anywhere SSA variables are accepted, and their
values cannot change.

..

    For C programmers:

    1. In Mu, constants are just values, not part of the Mu memory. Although, in
       C, you can declare an object with the "const" qualifier: ``const int a =
       42;`` and then have a pointer to it: ``const int* pa = &a;``, you cannot
       do the same in Mu. Global cells may be more appropriate if references to
       constants are ever taken.
    
    2. Just like types, constants cannot be inlined.  For example::

        // Assume @i64 is defined as int<64>

        %b = ADD <@i64> %a 123          // WRONG. What type is 123?

        .const @I64_123 <@i64> = 123    // 123 is an @i64
        %b = ADD <@i64> %a @I64_123     // Right.

A **constant definition** has the form::

    .const Name <Type> = ConstCtor

where:

* ``Name`` is a global name for the constant;
* ``Type`` is a global name for the type of the constant, and
* ``ConstCtor`` is a constant constructor.

A **constant constructor** can be the following:

- **integer constructor**, such as ``42``, ``0x2a``

- **floating point constructor**, such as ``3.14f``, ``3.14d``,
  ``bitsd(0x3ff8000000000000)``

- **list constructor**, such as ``{ @foo @bar @baz }``

- **null constructor**: ``NULL``

- **external constructor**, such as ``EXTERN "write"``

In the bundle building API, constant nodes are created using the ``new_const_*``
functions.

Integer Constructor
-------------------

**Integer constructors** are applicable to:

+ integer types: ``int<n>``, and
+ pointer types: ``uptr<T>``, ``ufuncptr<sig>``

The IR builder API constructs such constants using actual integers.

.. _integer-literal:

The text form is written as an **integer literal**, which is:
  
+ an optional sign [+-], followed by
+ an optional prefix: ``0`` or ``0x``, and
+ a sequence of digits [0-9a-fA-F].
  
A prefix 0 represents an octal number. A prefix 0x represents a hexadecimal
number. Otherwise it is a decimal number.
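
The literal grammar can be sketched as a small Python parser. This is an
illustration, not the reference parser: ``parse_int_literal`` is a hypothetical
helper, and it assumes that octal literals contain only the digits 0-7, that
decimal literals do not start with ``0``, and that the hexadecimal prefix is
the lowercase ``0x``::

    import re

    # sign, then one of: hexadecimal (0x...), octal (0...), decimal.
    INT_LIT_RE = re.compile(r"([+-]?)(0x[0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)")


    def parse_int_literal(text):
        """Return the integer value denoted by an integer literal."""
        match = INT_LIT_RE.fullmatch(text)
        if match is None:
            raise ValueError("not an integer literal: " + text)
        sign, body = match.groups()
        if body.startswith("0x"):
            magnitude = int(body[2:], 16)
        elif body.startswith("0"):
            magnitude = int(body, 8)  # "0" itself is read as octal zero
        else:
            magnitude = int(body, 10)
        return -magnitude if sign == "-" else magnitude

The sketch does not check whether the value fits the target type; that remains
the client's responsibility.
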

..

    NOTE: The client must ensure the number (integer or floating point) can be
    represented by the type, or it is an error.

..

    Example::

        .typedef @i64 = int<64>

        .const @oct1 <@i64> = 0
        .const @oct2 <@i64> = +01234567
        .const @dec1 <@i64> = 1234567890
        .const @hex1 <@i64> = -0x123456789abcdef0

        .typedef @ptri64          = uptr<@i64>
        .typedef @fpnoparamsnoret = ufuncptr<@noparamsnoret>

        // Address should be looked up before generating the bundle.
        .const @ptrconst <@ptri64> = 0x12345678
        .const @fpconst  <@fpnoparamsnoret> = 0x7ff00000000
        .const @nullptr  <@ptri64> = 0

Floating Point Constructor
--------------------------

**Floating point constructors** are applicable to floating point types:

+ ``float`` (if the suffix is 'f' in the text form);

+ ``double`` (if the suffix is 'd' in the text form).

The IR builder API constructs such constants using actual IEEE 754 values.

The text form is written as a **floating point literal**, which can be one of
the following forms:

+ An optional sign [+-], an integral part, a dot (.), a fraction part,
  an optional exponent part and a suffix.

  * Both the integral part and the fraction part are a sequence of decimal
    digits [0-9].
  * The exponent part is ``e`` followed by an optional sign [+-] followed by a
    sequence of decimal digits [0-9].
  * The suffix is either ``f`` (for ``float``) or ``d`` (for ``double``).
  * Example: ``123.456f``, ``+123.456e789d``, ``-123.456e-789d``

+ One of ``nan``, ``+inf`` and ``-inf`` with a suffix ``f`` or ``d``.

  * Example: ``nanf``, ``-infd``

+ ``bitsf(intlit)`` or ``bitsd(intlit)`` where ``intlit`` is an `integer literal
  <integer-literal_>`__ as defined before, and the ``f`` and ``d`` represent
  ``float`` and ``double``, respectively. In this case, the resulting ``float``
  or ``double`` value has the same bit-wise representation as the 32-bit or
  64-bit integer ``intlit``, respectively.
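
The effect of the ``bitsf`` and ``bitsd`` forms can be reproduced with Python's
``struct`` module. The two functions below merely borrow the names of the
literal forms for illustration; they are not part of Mu::

    import struct


    def bitsf(bits):
        """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
        return struct.unpack("<f", struct.pack("<I", bits))[0]


    def bitsd(bits):
        """Reinterpret a 64-bit integer as an IEEE 754 double-precision value."""
        return struct.unpack("<d", struct.pack("<Q", bits))[0]

For example, ``bitsd(0x3ff8000000000000)`` evaluates to ``1.5``, and
``bitsf(0x7f800000)`` is positive infinity.
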

..

    Example::

        .typedef @float = float
        .typedef @double = double

        .const @float1 <@float> = 123.456f
        .const @float2 <@float> = +123.456e789f
        .const @float3 <@float> = -123.456e-789f
        .const @float4 <@float> = nanf
        .const @float5 <@float> = +inff
        .const @float6 <@float> = -inff
        .const @float7 <@float> = bitsf(0x7f800000)   // float inf
        .const @float8 <@float> = bitsf(0x7f800001)   // float nan

        .const @double1 <@double> = 123.456d
        .const @double2 <@double> = +123.456e789d
        .const @double3 <@double> = -123.456e-789d
        .const @double4 <@double> = nand
        .const @double5 <@double> = +infd
        .const @double6 <@double> = -infd
        .const @double7 <@double> = bitsd(0x7ff0000000000000)   // double inf
        .const @double8 <@double> = bitsd(0x7ff0000000000001)   // double nan

List Constructor
----------------

**List constructors** are applicable for composite types, but not hybrid:

+ ``struct``

+ ``array``

+ ``vector``

The IR builder API constructs such constants by providing a list of handles
to its members (other global SSA values).

In the text form, it is written as:

+ an opening brace ``{``, followed by

+ a sequence of global names of other global variables separated by spaces, and

+ a closing brace ``}``.

The sequence must contain exactly as many names as the type has fields or
elements.

A constant must not recursively contain itself. (It is impossible to construct
recursive constants using the IR builder API.)
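
A loader of the text form could reject self-containing constants with a
depth-first walk over the member lists. The Python sketch below is illustrative
only; ``contains_itself`` and the ``members`` mapping are hypothetical and do
not describe any particular implementation::

    def contains_itself(members, name, _path=()):
        """`members` maps a list-constructed constant to the names in its braces."""
        if name in _path:
            return True  # reached `name` again while still expanding it
        return any(
            contains_itself(members, child, _path + (name,))
            for child in members.get(name, ())
        )


    members = {
        "@record": ["@dec1", "@double1"],
        "@nested_record": ["@hex1", "@record", "@float2"],
    }
    cyclic = {"@a": ["@b"], "@b": ["@a"]}

Names without an entry in the mapping, such as integer constants, are treated
as leaves. Sharing one member between several constants is not a cycle and is
accepted.
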

..

    Example::

        // simple struct
        .typedef @record_t = struct<@i64 @double>
        .const @record <@record_t> = {@dec1 @double1}

        // nested struct
        .typedef @nested_record_t = struct<@i64 @record_t @float>
        .const @nested_record <@nested_record_t> = {@hex1 @record @float2}

        // vector
        .typedef @4xfloat = vector <@float 4>
        .const @vec1 <@4xfloat> = { @float1 @float2 @float3 @float4 }

        // array constant
        // Not recommended to use unless interacting with native functions.
        .typedef @i64ary = array<@i64 3>
        .const @constary <@i64ary> = {@oct1 @oct2 @dec1}

        // Global cells and functions are global variables, too.
        // They can be components of constants.
        .global @g1 <@i64>

        .funcsig @noparamsnoret = () -> ()
        .funcdecl @f1 <@noparamsnoret>

        .typedef @irefi64 = iref<@i64>
        .typedef @record2_t = struct<@irefi64 @some_func>

        .const @record2 <@record2_t> = {@g1 @f1}

Null constructor
----------------

**Null constructors** are applicable for all *general reference types* except
`weakref <type-system.rst#reference-types>`__:

- ``ref<T>``

- ``iref<T>``

- ``funcref<sig>``

- ``threadref``

- ``stackref``

- ``framecursorref``

- ``irbuilderref``

A constant of any of these types can only have the ``NULL`` value.

In the text form, it is written as the **null literal**: ``NULL``. 

..

    NOTE: ``weakref`` cannot be the type of an SSA variable, but all constants
    are SSA variables. (See `<type-system.rst>`__ and `<instruction-set.rst>`__).

..

    NOTE: The only constant of reference types is ``NULL``. The reason why Mu
    forbids constant object references is manifold:

    * To define a constant heap reference, the client must provide a reference
      to a heap object, which itself is recursively a constant heap reference.
      Even if such a reference is created, it renders a heap object immortal (as
      immortal as a global cell), which defeats the purpose of garbage
      collection.

    * When a heap object is moved, the garbage collector must update all
      existing references to the object. This makes the constant reference not
      really "constant". Extra difficulties are introduced when such references
      become immediate values in the machine code.

    The global memory is an alternative to such needs. Either store a global
    data structure in the global memory or allocate it in the heap and assign
    its reference to a global cell. In fact, the ID or the name of any global
    cell is a constant SSA variable of an internal reference to it. The ID or
    the name of a Mu function is a constant SSA variable of a ``funcref``.

..

    Example::

        .typedef @void = void
        .typedef @ref_void = ref<@void>
        .const @null_ref <@ref_void> = NULL

        .typedef @iref_void = iref<@void>
        .const @null_iref <@iref_void> = NULL

        .typedef @some_func = funcref<@noparamsnoret>
        .const @null_func <@some_func> = NULL

        .typedef @tref = threadref
        .const @null_tr <@tref> = NULL

        .typedef @sref = stackref
        .const @null_sr <@sref> = NULL
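
    The list above also includes ``framecursorref`` and ``irbuilderref``. A
    sketch of the corresponding constants, following the same pattern (the
    names ``@fcref``, ``@ibref``, ``@null_fc`` and ``@null_ib`` are chosen here
    for illustration only)::

        .typedef @fcref = framecursorref
        .const @null_fc <@fcref> = NULL

        .typedef @ibref = irbuilderref
        .const @null_ib <@ibref> = NULL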

External Constructor
--------------------

**External constructors** are applicable to pointer types:

- ``uptr<T>``

- ``ufuncptr<sig>``

In the text form, it is written as:

+ the keyword ``EXTERN``, followed by

+ a string literal, which is a sequence of ASCII characters surrounded by ``"``
  (code 34). The code of each character shall be within 33–126 inclusive,
  but not 34 (non-space printable characters except ``"``). There are no escape
  sequences. This string represents a symbolic name.

The IR builder API constructs such constants by providing the symbol, which has
the same limitation as the text form: character codes 33–126, but not 34.

The values of such constants are implementation-defined. Usually the
implementation will resolve the symbolic names to the addresses of C functions.

..

    Example::

        .typedef @char    = int<8>
        .typedef @charp   = uptr<@char>
        .typedef @int     = int<32>
        .typedef @void    = void
        .typedef @voidp   = uptr<@void>
        .typedef @size_t  = int<64>
        .typedef @ssize_t = int<64>

        .funcsig @write.sig = (@int @voidp @size_t) -> (@ssize_t)
        .typedef @write.fp  = ufuncptr<@write.sig>

        .const @write <@write.fp> = EXTERN "write"

        .funcsig @puts.sig = (@charp) -> (@int)
        .typedef @puts.fp  = ufuncptr<@puts.sig>

        .const @puts <@puts.fp> = EXTERN "puts"

        .funcdef @main ... <...> {
            %...(...):
                ...
                %rv = CCALL #DEFAULT <@write.fp @write.sig> @write (%fd %buf %sz)
                ...
        }
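
    A smaller, self-contained sketch of the same pattern, assuming a POSIX
    platform where ``getpid`` takes no arguments and returns a 32-bit ``int``.
    The names ``@getpid.sig``, ``@getpid.fp``, ``@getpid`` and ``@current_pid``
    are hypothetical, and whether the symbol resolves is
    implementation-defined::

        .typedef @int = int<32>

        .funcsig @getpid.sig = () -> (@int)
        .typedef @getpid.fp  = ufuncptr<@getpid.sig>

        // The symbolic name contains only characters in the range 33-126.
        .const @getpid <@getpid.fp> = EXTERN "getpid"

        .funcdef @current_pid VERSION %v1 <@getpid.sig> {
            %entry():
                %pid = CCALL #DEFAULT <@getpid.fp @getpid.sig> @getpid ()
                RET %pid
        }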

Global Cell Definition
======================

A **global cell definition** defines a **global cell**. A global cell is the
memory allocation unit in the *global memory*. See `<memory.rst>`__ for more
information about the global memory.

    NOTE: The global memory is the counterpart of static or global variables in
    C/C++. In Mu, global cells are also permanently pinned so that they can be
    used to interact with native programs.

A global cell definition has the form::

    .global Name <Type>

where:

* ``Name`` is a global name for the global cell and
* ``Type`` is a global name for the type of the data the global cell
  represents.

In the bundle building API, global cell nodes are created using the
``new_global_cell`` function.

..

    Example::

        .typedef    @i8 = int<8>
        .typedef    @i32 = int<32>
        .const      @i32_0 <@i32> = 0

        .global @my_errno <@i32>

        .typedef @small_char_array = array<@i8 12>
        .global @hello_world_str <@small_char_array>

        // The client can populate the memory in @hello_world_str at loading time

        .funcdef @func VERSION ... <...> {
            %entry():
                %a = LOAD <@i32> @my_errno       // @my_errno has type iref<@i32>
                STORE <@i32> @my_errno @i32_0
                ...
        }

..

    For C programmers: Unlike C global variables, Mu global cells cannot be
    given initial values in their definitions. Mu global cells (like any Mu
    memory locations) are initialised to 0 or NULL. Writing to global cells can
    only be done via memory access instructions (load, store, ...), or
    indirectly via the `HAIL <hail.rst>`__ language. Beware that concurrent
    non-atomic access (even as a result of careless initialisation) may result
    in a data race, which has undefined behaviour.
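
    A sketch of run-time initialisation that avoids the data race mentioned
    above by using atomic accesses. It assumes the optional memory order is
    written before the type argument (see `<instruction-set.rst>`__ for the
    authoritative grammar); the names ``@counter``, ``@reset`` and ``@read``
    are hypothetical::

        .typedef @i64 = int<64>
        .const @i64_0 <@i64> = 0

        .funcsig @reset.sig = () -> ()
        .funcsig @read.sig  = () -> (@i64)

        .global @counter <@i64>

        .funcdef @reset VERSION %v1 <@reset.sig> {
            %entry():
                STORE SEQ_CST <@i64> @counter @i64_0    // atomic store
                RET ()
        }

        .funcdef @read VERSION %v1 <@read.sig> {
            %entry():
                %v = LOAD SEQ_CST <@i64> @counter       // atomic load
                RET %v
        }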

Functions and Function Versions
===============================

**Functions** are callable in Mu. A function has a signature, which determines
its parameter types and return types.

A function may have zero or more *versions*. A function is said to be
"undefined" if it has zero versions; otherwise it is "defined".

A **function version** has its own control flow graph, which defines how it is
executed. Function versions cannot be called directly. Calling a *function*
actually calls the most recent version the calling thread can see (see `below
<#func-exec>`__), or causes a trap if the function is undefined. All versions of
a function must have the same signature.

Creating a Function Without Versions (Undefined Function)
---------------------------------------------------------

In the text form, the top-level **function declaration** creates a function with
no versions. It has the following form::

    .funcdecl Name <Sig>

where:

* ``Name`` is a global name for the function and
* ``Sig`` is a global name for the signature of the function.

    FIXME: The term "function declaration" is misleading because it is not the
    same as the function declaration in C. It should be replaced with simple and
    explicit operations such as "create functions" and "create function
    versions" in future Mu designs. For example::

        .func Name <Sig>

        .funcver VerName DEFINES FuncName {
            %entry(...):
            ...
        }

In the bundle building API, a function node is created using the `new_func
<irbuilder.rst#new-func>`__ function.

It is an error to create two functions with the same name or ID. This is
considered a conflict.

    NOTE for C programmers: If a function has already been created in one
    bundle, it is visible in all subsequently loaded bundles and does not need
    to be created again. This is different from the C model, where each
    compilation unit "declares" functions which are defined in other
    compilation units, and the references are resolved at link time. C does not
    address the order of loading, but in Mu, the order of bundle loading matters
    because it determines the order of function redefinition.

..

    Example::

        .typedef @i64 = int<64>
        .typedef @float = float
        .typedef @double = double

        .funcsig @ExampleSig = (@float @double) -> (@i64)

        .funcdecl @example <@ExampleSig>
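
    A sketch of the point made in the note for C programmers, with two bundles
    shown in loading order (the names ``@square``, ``@sq.sig`` and
    ``@fourth_power`` are hypothetical). The second bundle uses ``@square``
    without creating it again::

        // Bundle 1
        .typedef @i64 = int<64>
        .funcsig @sq.sig = (@i64) -> (@i64)
        .funcdecl @square <@sq.sig>

        // Bundle 2, loaded after bundle 1
        .funcdef @fourth_power VERSION %v1 <@sq.sig> {
            %entry(<@i64> %x):
                %x2 = CALL <@sq.sig> @square (%x)
                %x4 = CALL <@sq.sig> @square (%x2)
                RET %x4
        }

    Until some bundle defines a version of ``@square``, calling it traps to the
    client, because the function is still undefined.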

Creating a Function With a Version (Defined Function)
-----------------------------------------------------

In the text form, the top-level **function definition** creates a function
version of a given function. If the function has not yet been created, the
*function definition* implicitly creates the *function* as well. The text form
syntax is::

    .funcdef Name VERSION VerName <Sig> { Body }

where:

* ``Name`` is a global name for the function
* ``VerName`` is a global name for this particular function version
* ``Sig`` is a global name for the signature of the function
* ``Body`` is the function body: a sequence of basic blocks (see `Function
  Body`_).

In the `bundle building API <irbuilder.rst#define-a-function>`__, *functions*
and *function versions* are created separately.

The function itself is created by the ``new_func`` function. Then a function
version is created using the ``new_func_ver`` function. Basic blocks and
instructions are created using their respective functions. See the `API
<irbuilder.rst#define-a-function>`__.

It is an error to create two versions of the same function in one bundle.

    NOTE: A bundle is the unit of loading. Bundle loading has the side effect of
    redefining existing functions. If a bundle has a new version of an existing
    function, the new version becomes the newest version when the bundle is
    loaded. So it does not make sense to have two versions of the same function
    in the same bundle: which of them should become the newest version?

..

    Example::

        .typedef @i64 = int<64>
        .typedef @float = float
        .typedef @double = double

        .funcsig @ExampleSig = (@float @double) -> (@i64)

        .funcdef @example VERSION %v1 <@ExampleSig> {
            ...
        }

.. _func-exec:

Semantics of Function Calls
---------------------------

When a thread executes a *function*, a **conceptual** lookup is performed to
find the latest version of the function with respect to the current thread (see
`memory model <memory-model.rst#special-funcdef>`__). If a version is found,
the thread executes that version from its entry block.

    NOTE: The lookup is conceptual because it does not have to be implemented as
    an actual indirection. When Mu is implemented as a native code compiler,
    "redefinition" can be implemented as "patching the first instruction of all
    old function versions into a JMP instruction to the newest version". By
    doing this, all call instructions to old versions will be redirected to the
    newest version.

    ``funcref`` can be implemented as direct function pointers to the latest
    version. When function redefinition happens, some ``funcref`` values will
    still point to the old version, but this does not affect the correctness of
    function calls. ``funcref`` values can be updated at GC time using the same
    mechanism as object movement.

    Static call sites may use the compiled function version's address as
    immediate operands. These instructions can be patched at any appropriate
    time.

If the version is not found (that is, an undefined function is executed), the
function behaves as if it had a hidden version defined as::

    .funcdef Name VERSION NoVersion <Sig> {
        %entry(ParamList):
            TRAP <> KEEPALIVE (ParamList)
            TAILCALL <Sig> Name (ParamList)
    }

That is, it will trap to the client, using all parameters as the keep-alive
variables. If the stack is ever rebound, passing no values, it will try to
tail-call the same function using the same arguments. This is **not necessarily
the same hidden version**: the function may have been defined by the client in
the trap, and clients that support lazy loading are very likely to do so. If an
exception is thrown when rebound, the ``TRAP`` in this hidden version will
re-throw it to the parent frame. The ``cur_func`` API will return the ID of the
function. This hidden version is still not a real version, so the
``cur_func_ver`` API function will return 0. The ``TRAP`` is not a real
instruction, either, so the ``cur_inst`` API function will also return 0.
``dump_keepalives`` will dump the arguments.
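
    For instance, for a hypothetical undefined function ``@square`` with the
    signature ``(@i64) -> (@i64)``, the hidden version would conceptually read
    as follows. This only instantiates the template above with illustrative
    names; it is not something a client writes or loads::

        .typedef @i64 = int<64>
        .funcsig @sq.sig = (@i64) -> (@i64)
        .funcdecl @square <@sq.sig>

        // Conceptual only: the behaviour of @square while it has no version
        .funcdef @square VERSION NoVersion <@sq.sig> {
            %entry(<@i64> %x):
                TRAP <> KEEPALIVE (%x)
                TAILCALL <@sq.sig> @square (%x)
        }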


See the `Bundle Loading`_ section for the semantics of bundle loading and
function redefinition.

.. _exception-parameter:

Function Body
=============

A function definition has a **function body**.

A function body consists of one or more **basic blocks** enclosed between ``{``
and ``}``. A basic block has the following form::

    Name (ParamList) ExcParam:
        Inst
        Inst
        ...

where ``ParamList`` is a list of type-name pairs: ``<T1> N1 <T2> N2 <T3> N3
...``, which specifies the normal **parameters** to the basic block.
``ExcParam`` is optional.  When present, it has the form ``[ ExcName ]`` where
``ExcName`` is the name of the **exceptional parameter**, which is also a
(non-normal) parameter. The exceptional parameter always has the ``ref<void>``
type. The instructions of the basic block follow the colon ``:``.

    Example::

        %bb1(<@T1> %p1 <@T2> %p2 <@T3> %p3):
            ...

        %bb2() [%exc]:
            ...

The first basic block is the **entry block**. The parameters of the entry block
must match the parameters of the function (determined by its signature).  The
execution starts from the entry block, and its parameters receive the arguments
to the function. The entry block must not have the exceptional parameter.

..

    NOTE: The entry block is conventionally named ``%entry``, but this is not
    compulsory.

The entry block must not be branched to from any basic block. Other basic
blocks are executed when branched to. The normal parameters receive arguments
from the branching sites. The exceptional parameter receives the exception
caught by the branching site as the argument. If the exceptional parameter is
omitted but the basic block is supposed to receive an exception, the exception
will be silently ignored. A basic block with an exceptional parameter must only
be used as the exceptional destination of instructions which can catch
exceptions, which currently include ``CALL``, ``TRAP``, ``WATCHPOINT`` and
``SWAPSTACK``.

    An example of a basic block with an exceptional parameter::

        %entry():
            ...
            %rv = CALL <...> @foo (...) EXC(%nor_dest(%rv) %exc_dest(%a %b %c))

        %nor_dest(<@T0> %p0):
            // process the return value
            // %p0 = %rv

        %exc_dest(<@T1> %p1 <@T2> %p2 <@T3> %p3) [%exc]:
            // handle exceptions here
            // %p1 = %a, %p2 = %b, %p3 = %c, %exc = the exception

Each basic block contains a sequence of **instructions**. An instruction has one
of the following forms::

    ( Name1 Name2 Name3 ... ) = InstName InstBody

    Name = InstName InstBody

    InstName InstBody

The left-hand side can be a list of names: ``( Name1 Name2 Name3 ... )``, each
of which is bound to one result of the instruction.

The latter two forms are syntactic sugar. A single name without brackets is
sugar for ``( Name )``. If both the name and the equal sign ``=`` are
omitted, it is equivalent to an empty list of names: ``()``.

The number of results written in the IR must match the actual number of results
the instruction produces.
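
    A sketch of the three forms side by side. The names ``%x``, ``%y``,
    ``%loc``, ``@divmod`` and ``@divmod.sig`` are hypothetical, and the lines
    are alternative spellings rather than one instruction sequence::

        // One result: the explicit list and its sugared form
        (%sum) = ADD <@i64> %x %y
        %sum = ADD <@i64> %x %y

        // No results: the explicit empty list and its sugared form
        () = STORE <@i64> %loc %x
        STORE <@i64> %loc %x

        // Two results: assumes the signature @divmod.sig returns two values
        (%q %r) = CALL <@divmod.sig> @divmod (%x %y)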

The ``InstName`` is optional. When present, it has the form ``[ Name ]``, where
``Name`` is the name of the instruction. This can be used to identify a
particular instruction, especially ``CALL`` and ``TRAP``. If ``InstName`` is
absent, the instruction does not have a name.

    Examples:

    ``%call_site`` is the name of the instruction; ``%rv1``, ``%rv2`` and
    ``%rv3`` are the names of return values::

        (%rv1 %rv2 %rv3) = [%call_site] CALL <@sig> @callee (%a1 %a2)

    The instruction has no name; ``%rv`` is the return value::

        %rv = ADD <@i64> %x %y

    ``%trap1`` is the name of the instruction. There are no return values: the
    empty angle brackets ``<>`` mean the ``TRAP`` is not expecting any return
    values::

        [%trap1] TRAP <> KEEPALIVE (%x %y %z)


The grammar of the ``InstBody`` part for each instruction is defined separately
in `<instruction-set.rst>`__.

    Full example::

        .typedef @i64 = int<64>
        .funcsig @gcd_sig = (@i64 @i64) -> (@i64)

        .const @i64_0 <@i64> = 0

        .funcdef @gcd VERSION %v1 <@gcd_sig> {
            %entry(<@i64> %a <@i64> %b):
                BRANCH %head(%a %b)

            %head(<@i64> %a <@i64> %b):
                %z = EQ <@i64> %b @i64_0
                BRANCH2 %z %exit(%a) %body(%a %b)

            %body(<@i64> %a <@i64> %b):
                %b1 = SREM <@i64> %a %b
                BRANCH %head(%b %b1)

            %exit(<@i64> %a):
                RET %a
        }

The last instruction of any basic block must be a **terminator instruction**,
which is one of the following:

- ``BRANCH``, ``BRANCH2``, ``SWITCH``, ``WPBRANCH``
- ``TAILCALL``
- ``RET``
- ``THROW``
- ``SWAPSTACK`` if the "current stack clause" is ``KILL_OLD``
- Some `Common Instructions <common-insts.rst>`__ are always terminators:

  - ``@uvm.thread_exit``

- Any instructions that may have an **exception clause** and actually have the
  exception clause, which are:

  - Binary operations (only ``UDIV``, ``SDIV``, ``UREM`` and ``SREM``)
  - ``CALL``, ``CCALL``
  - ``NEW``, ``NEWHYBRID``, ``ALLOCA``, ``ALLOCAHYBRID``
  - ``LOAD``, ``STORE``, ``CMPXCHG``, ``ATOMICRMW``
  - ``TRAP``, ``WATCHPOINT``
  - ``NEWTHREAD``, ``SWAPSTACK``
  - Some `Common Instructions <common-insts.rst>`__ when they have an
    exception clause

    - ``@uvm.new_stack``

..

    NOTE: This is to say, for example, if a particular ``CALL`` instruction does
    have an exception clause, then it is a terminator. If it does not have
    an exception clause, it is not a terminator.
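
    A sketch of the difference, using hypothetical names (``@f``, ``@f.sig``
    and ``@twice``). The first ``CALL`` has no exception clause and sits in the
    middle of a block; the second has one, so it must end its block::

        .typedef @i64 = int<64>
        .funcsig @f.sig = (@i64) -> (@i64)
        .funcdecl @f <@f.sig>

        .funcdef @twice VERSION %v1 <@f.sig> {
            %entry(<@i64> %x):
                // No exception clause: an ordinary instruction
                %y = CALL <@f.sig> @f (%x)
                // With an exception clause: this CALL terminates the block
                %z = CALL <@f.sig> @f (%y) EXC(%ok(%z) %failed())

            %ok(<@i64> %r):
                RET %r

            %failed() [%exc]:
                // Re-throw the caught exception to the caller
                THROW %exc
        }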

Function Exposing Definition
============================

A **function exposing definition** creates a value (usually an untraced function
pointer, ``ufuncptr``) so that the Mu function can be called from native
programs (usually C programs).

In the text form, it has the following form::

    .expose Name = FuncName CallConv Cookie

where:

* ``Name`` is a global name of the exposed function.

* ``FuncName`` is the name of a Mu function.

* ``CallConv`` is a flag that denotes the calling convention. See the
  platform-specific ABI.

* ``Cookie`` is the cookie. It must be the global name of an ``int<64>``
  constant.

This definition exposes a Mu function *FuncName* as an exposed value, identified
by *Name*, using the calling convention *CallConv*.

The *Cookie* is an ``int<64>`` constant attached to this exposed value. See
`Native Interface <native-interface.rst#cookie>`__ for more details.

    NOTE: It is the **function** that is exposed, not the version. As in Mu, C
    programs can only directly call Mu functions rather than function versions.
    When a Mu function is redefined, the exposed function will automatically
    use the new version if called after the redefinition.

How such an exposed function can be called is implementation-specific.

    NOTE: The spec is just trying to be general. In most cases, it will be
    called from C, and the exposed function is seen by C as a function pointer,
    and can be called using the same C calling convention (usually
    ``#DEFAULT``).

    Example::

        .expose @name = @func #DEFAULT @cookie
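
    A fuller sketch, with hypothetical names (``@add``, ``@add.sig``,
    ``@add_cookie`` and ``@add_native``), that defines a Mu function and
    exposes it with the default calling convention::

        .typedef @i64 = int<64>
        .funcsig @add.sig = (@i64 @i64) -> (@i64)

        // The cookie must be an int<64> constant.
        .const @add_cookie <@i64> = 0

        .funcdef @add VERSION %v1 <@add.sig> {
            %entry(<@i64> %a <@i64> %b):
                %sum = ADD <@i64> %a %b
                RET %sum
        }

        .expose @add_native = @add #DEFAULT @add_cookie

    Because the function, not the version, is exposed, a later bundle that
    redefines ``@add`` also changes what native callers of ``@add_native``
    execute.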

Bundle Loading
==============

The API provides the `load_bundle <api.rst#bundle-and-hail-loading>`__ and the
`IRBuilder.load <irbuilder.rst#load>`__ functions. These functions load bundles
from the text form or the constructed AST. They can be called by multiple
client threads on their `client contexts <api.rst#client-context>`__, and the
result is always equivalent to loading the bundles in some particular sequence.

The client must ensure the names of all entities in all bundles (already loaded
or being loaded) are distinct.

    NOTE: There is a special case for the text form: If the function name in the
    ``.funcdef`` definition (such as ``@f`` in ``.funcdef @f VERSION @v <@sig> {
    ... }``) is the same as a function already loaded, it only defines a new
    version ``@v`` for the existing function ``@f``. If the function ``@f`` does
    not exist, the text form creates both a new function ``@f`` and a new
    version ``@v``.

If a bundle contains a new version of an existing function, it **redefines** the
function. After this bundle is loaded, all function-calling operations to the
function that `happen after <memory-model.rst#happens-before>`__ the bundle
loading operation will call the newly defined version of the function.  The
actions of defining functions (bundle loading) and using functions (including
function calls and the creation of stacks, i.e. the `@uvm.new_stack
<common-insts.rst#uvm-new-stack>`__ instruction or the `new_stack
<api.rst#new-stack>`__ API) obey the memory model of the ``RELAXED`` order as if
the definition is a store and the use is a load.  See `Memory Model
<memory-model.rst#special-funcdef>`__.

All existing activations of any functions remain unchanged; that is, they
continue to execute the old versions of the functions.

    NOTE: Specifically, existing `traps (including watchpoints)
    <instruction-set.rst#traps-and-watchpoints>`__ in older versions of
    functions remain valid. During OSR, redefining a function will not affect
    any existing function activations unless they are explicitly popped by the
    client.
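
    A sketch of the redefinition described in this section, with two bundles
    shown in loading order (the names ``@step``, ``@step.sig``, ``@step.v1``
    and ``@step.v2`` are hypothetical)::

        // Bundle 1: creates the function @step and its first version
        .typedef @i64 = int<64>
        .funcsig @step.sig = (@i64) -> (@i64)
        .const @i64_1 <@i64> = 1

        .funcdef @step VERSION @step.v1 <@step.sig> {
            %entry(<@i64> %x):
                RET %x
        }

        // Bundle 2: @step already exists, so this only adds a new version
        .funcdef @step VERSION @step.v2 <@step.sig> {
            %entry(<@i64> %x):
                %y = ADD <@i64> %x @i64_1
                RET %y
        }

    Calls to ``@step`` that happen after bundle 2 is loaded execute
    ``@step.v2``, while activations already running ``@step.v1`` are left
    untouched. The version names differ because the names of all entities
    across bundles must be distinct.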

.. vim: textwidth=80