GitLab will be upgraded to the 12.10.14-ce.0 on 28 Sept 2020 at 2.00pm (AEDT) to 2.30pm (AEDT). During the update, GitLab and Mattermost services will not be available. If you have any concerns with this, please talk to us at N110 (b) CSIT building.

memory-model.rst 24.4 KB
Newer Older
Kunshan Wang's avatar
Kunshan Wang committed
1 2 3 4
============
Memory Model
============

Kunshan Wang's avatar
Kunshan Wang committed
5 6
The Mu memory model basically follows the C11 memory model with a few
modifications to make it suitable for Mu.
Kunshan Wang's avatar
Kunshan Wang committed
7

Kunshan Wang's avatar
Kunshan Wang committed
8
Overview
Kunshan Wang's avatar
Kunshan Wang committed
9
========
Kunshan Wang's avatar
Kunshan Wang committed
10

Kunshan Wang's avatar
Kunshan Wang committed
11 12 13 14 15
Mu does not enforce any strong order, but trusts the client to correctly use the
atomic and ordering mechanisms provided by Mu. Many choices of ordering, from
relaxed to sequentially consistent, on each memory operation are given to the
client. The client has the freedom to make its choice and has the responsibility
to synchronise their multi-threaded programs.
Kunshan Wang's avatar
Kunshan Wang committed
16

Kunshan Wang's avatar
Kunshan Wang committed
17
The most restricted form of memory accesses are sequentially consistent. Mu
Kunshan Wang's avatar
Kunshan Wang committed
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
guarantees that they are atomic. They follow the well-known acquire-release
memory model and there is a total order of all such memory accesses in all
threads.

A less-restricted form is acquire and release. It does not have the total
order provided by sequential consistency, but atomicity and the synchronise-with
relationship between release and acquire operations are provided.

The "consume" order exploits the fact that some processors with relaxed memory
order can figure out the dependencies between load operations in the hardware
and will not reorder them even without memory fences. It can achieve some
synchronisation requirements more efficiently than the "acquire" order.

The "relaxed" order only guarantees atomicity but does not enforce any order.
The most unrestricted form of memory access is not atomic. These operations
Kunshan Wang's avatar
Kunshan Wang committed
33
allows the Mu implementation and the processor to maximise the throughput while
Kunshan Wang's avatar
Kunshan Wang committed
34 35
relying on the programmer to correctly synchronise their programs.

36
Notable differences from C11
Kunshan Wang's avatar
Kunshan Wang committed
37
----------------------------
Kunshan Wang's avatar
Kunshan Wang committed
38

Kunshan Wang's avatar
Kunshan Wang committed
39 40 41
The program order in Mu is a total order while the sequenced-before relationship
in C is a partial order, because there are unspecified order of evaluations in
C.
Kunshan Wang's avatar
Kunshan Wang committed
42

Kunshan Wang's avatar
Kunshan Wang committed
43
There is no "atomic type" in Mu. Only operations make a difference between
Kunshan Wang's avatar
Kunshan Wang committed
44
atomic and non-atomic accesses. Using both atomic and non-atomic operations on
Kunshan Wang's avatar
Kunshan Wang committed
45
the same memory location is an undefined behaviour in Mu.
Kunshan Wang's avatar
Kunshan Wang committed
46 47

The primitives for atomic accesses and fences are provided by the instruction
Kunshan Wang's avatar
Kunshan Wang committed
48
set of Mu rather than the library. Mutex locks, however, have to be implemented
Kunshan Wang's avatar
Kunshan Wang committed
49 50
on top of this memory model.

51
Notable differences from LLVM
Kunshan Wang's avatar
Kunshan Wang committed
52
-----------------------------
53 54 55 56 57 58 59 60 61

LLVM does not guarantee that all atomic writes to a memory location has a total
order unless using ``monotonic`` or stronger memory order. It provides an
``unordered`` order which is atomic but not "monotonic". The ``unordered`` order
is intended to support the Java memory model, but whether ``unordered`` is
necessary and whether the ``relaxed`` order in C11 or the ``monotonic`` in LLVM
is suitable for Java is not yet known.

In LLVM, an atomic operation can be labelled ``singlethread``, in which case it
62
only synchronises with or participates in modification and ``seq_cst`` total
63 64
orderings with other operations running in the same thread (for example, in
signal handlers). C11 provides ``atomic_signal_fence`` for similar purposes.
Kunshan Wang's avatar
Kunshan Wang committed
65 66 67 68 69

Concepts
========

data value
70
    See `Type System <type-system.rst>`__
Kunshan Wang's avatar
Kunshan Wang committed
71 72

SSA variable, instruction and evaluation
73
    See `Instruction Set <instruction-set.rst>`__
Kunshan Wang's avatar
Kunshan Wang committed
74 75

memory, initial value, load, store, access and conflict
76
    See `Mu and the Memory <memory.rst>`__
Kunshan Wang's avatar
Kunshan Wang committed
77

Kunshan Wang's avatar
Kunshan Wang committed
78
thread
Kunshan Wang's avatar
Kunshan Wang committed
79
    A thread is the unit of CPU scheduling. In this memory model, threads
80
    include but are not limited to Mu threads. See `Threads and Stacks <threads-stacks.rst>`__ for the
Kunshan Wang's avatar
Kunshan Wang committed
81
    definition of Mu threads.
82 83

stack, stack binding, stack unbinding, swap-stack
84
    See `Threads and Stacks <threads-stacks.rst>`__
85 86

futex, futex_wait, futex_wake
87
    See `Threads and Stacks <threads-stacks.rst>`__
Kunshan Wang's avatar
Kunshan Wang committed
88

Kunshan Wang's avatar
Kunshan Wang committed
89 90 91 92 93
Comparison of Terminology
-------------------------

The following table is a approximate comparison and may not strictly apply.

94
=================== ================================
Kunshan Wang's avatar
Kunshan Wang committed
95
C                   Mu
96
=================== ================================
Kunshan Wang's avatar
Kunshan Wang committed
97
value               data value
Kunshan Wang's avatar
Kunshan Wang committed
98
expression          SSA variable
Kunshan Wang's avatar
Kunshan Wang committed
99
object              memory location
100
memory location     memory location of scalar type
Kunshan Wang's avatar
Kunshan Wang committed
101 102 103
(N/A)               object
read                load
modify              store
104
=================== ================================
Kunshan Wang's avatar
Kunshan Wang committed
105

106 107 108 109 110 111 112 113 114 115 116 117
Operations
==========

Operations include (but are not limited to) the following:

load
    A memory load. May be atomic or not.

store
    A memory store. May be atomic or not.

atomic read-modify-write
118 119 120
    A load and (maybe conditionally) a store as one atomic action. It may
    contain both a load and a store operation, but may have special atomic
    properties.
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
    
fence
    A fence introduces memory orders.

stack binding
    Binding a thread to a stack.

stack unbinding
    Unbinding a thread from a stack.

swap-stack
    Unbinding a thread from a stack and bind that thread to another stack.

futex wait
    Waiting on a memory location.

futex wake
    Wake up threads waiting on a memory location.

external operation
Kunshan Wang's avatar
Kunshan Wang committed
141
    Any other operation that may affect the state outside Mu.
142 143 144

..

Kunshan Wang's avatar
Kunshan Wang committed
145
    NOTE: Unlike the Java Memory Model, Mu memory model does not contain locks. 
146

Kunshan Wang's avatar
Kunshan Wang committed
147 148 149
Memory Operations
=================

150
Some instructions and API functions perform memory operations. Specifically,
Kunshan Wang's avatar
Kunshan Wang committed
151

152
- The ``LOAD`` instruction and the ``load`` API function perform a load
153
  operation.
154
- The ``STORE`` instruction and the ``store`` API function perform a store
155
  operation.
156
- The ``CMPXCHG`` instruction and the ``cmpxchg`` API function perform a
157 158
  compare-exchange operation, which is a kind of atomic read-modify-write
  operation.
159
- The ``ATOMICRMW`` instruction and the ``atomicrmw`` API function perform an
160
  atomic read-modify-write operation.
161
- The ``FENCE`` instruction and the ``fence`` API function are a fence.
162
- A concrete implementation may have other ways to perform those instructions.
Kunshan Wang's avatar
Kunshan Wang committed
163

164 165
..

166 167 168 169
    NOTE: Programs in other languages (e.g. native programs or any other
    language a Mu implementation can interface with) can synchronise with Mu in
    an implementation-specific way. But the implementation must guarantee that
    those programs perform those operations in a way compatible with Mu.
Kunshan Wang's avatar
Kunshan Wang committed
170

171 172 173
    For example, there are more than one way to implement loads and stores of
    the SEQ_CST order (either put fences in the load or in the store). If the
    implementation interfaces with a C implementation (e.g.
Kunshan Wang's avatar
Kunshan Wang committed
174
    gcc+glibc+Linux+x86_64), then Mu should do the same thing as (or be
175
    compatible with) the C program.
Kunshan Wang's avatar
Kunshan Wang committed
176

177 178
Load, store, atomic read-modify-write operations and fences have memory orders,
which are the following:
Kunshan Wang's avatar
Kunshan Wang committed
179 180 181 182 183 184 185 186 187

- NOT_ATOMIC
- RELAXED
- CONSUME
- ACQUIRE
- RELEASE
- ACQ_REL (acquire and release)
- SEQ_CST (sequentially consistent)

Kunshan Wang's avatar
Kunshan Wang committed
188 189 190
All accesses that are not NOT_ATOMIC are atomic. Using both non-atomic
operations and atomic operations on the same memory location is an undefined
behaviour.
Kunshan Wang's avatar
Kunshan Wang committed
191

192 193 194 195 196 197 198
- Load shall have NOT_ATOMIC, RELAXED, CONSUME, ACQUIRE or SEQ_CST order.
- Store shall have NOT_ATOMIC, RELAXED, RELEASE or SEQ_CST order.
- Compare-exchange shall have RELAXED, ACQUIRE, RELEASE, ACQ_REL or SEQ_CST on
  success and RELAXED, ACQUIRE or SEQ_CST on failure.
- Other atomic read-modify-write operations shall have RELAXED, ACQUIRE,
  RELEASE, ACQ_REL or SEQ_CST order.
- Fence shall have ACQUIRE, RELEASE, ACQ_REL or SEQ_CST order.
Kunshan Wang's avatar
Kunshan Wang committed
199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226

=========== ======= ======= =============== =============== =========== =====
Order       LOAD    STORE   CMPXCHG(succ)   CMPXCHG(fail)   ATOMICRMW   FENCE
=========== ======= ======= =============== =============== =========== =====
NOT_ATOMIC  yes     yes     no              no              no          no
RELAXED     yes     yes     yes             yes             yes         no
CONSUME     yes     no      no              no              no          no
ACQUIRE     yes     no      yes             yes             yes         yes
RELEASE     no      yes     yes             no              yes         yes
ACQ_REL     no      no      yes             no              yes         yes
SEQ_CST     yes     yes     yes             yes             yes         yes
=========== ======= ======= =============== =============== =========== =====

- A load operation with ACQUIRE, ACQ_REL or SEQ_CST order performs a **acquire**
  operation on its specified memory location.
- A load operation with CONSUME order performs a **consume** operation on its
  specified memory location.
- A store operation with RELEASE, ACQ_REL or SEQ_CST order performs a
  **release** operation on its specified memory location.
- A fence with ACQUIRE, ACQ_REL or SEQ_CST order is a **acquire fence**.
- A fence with RELEASE, ACQ_REL or SEQ_CST order is a **release fence**.

Orders
======

Program Order
-------------

Kunshan Wang's avatar
Kunshan Wang committed
227
All evaluations performed by a Mu thread form a total order, in which the
228 229 230
operations performed by each evaluation are **sequenced before** operations
performed by its successor.

231 232 233
All operations performed by a Mu client via a particular client context of the
API form a total order, in which each operation is **sequenced before** its
successor.
234

235 236 237
Operations before a ``TRAP`` or ``WATCHPOINT`` are **sequenced before** the
operations in the trap handler. Operations in the trap handler are **sequenced
before** operations after that ``TRAP`` or ``WATCHPOINT``.
Kunshan Wang's avatar
Kunshan Wang committed
238

239 240 241
The **program order** contains operations and their "sequenced before"
relations.

242 243
    NOTE: This means all Mu instructions plus all client operations done by the
    trap handler in a Mu thread still forms a total order.
244 245 246

    In C, the program order is a partial order even in a single thread because
    of unspecified order of evaluations.
Kunshan Wang's avatar
Kunshan Wang committed
247 248 249 250 251 252 253 254 255

Modification Order
------------------

All atomic store operations on a particular memory location M occur in some
particular total order, called the **modification order** of M. If A and B are
atomic stores on memory location M, and A happens before B, then A shall precede
B in the modification order of M.

Kunshan Wang's avatar
Kunshan Wang committed
256 257
    NOTE: This is to say, the modification order is consistent with the happens
    before order.
Kunshan Wang's avatar
Kunshan Wang committed
258

Kunshan Wang's avatar
Kunshan Wang committed
259 260
    NOTE: This reflects the mechanisms, including cache coherence, provided by
    some hardware that guarantees such a total order.
Kunshan Wang's avatar
Kunshan Wang committed
261

Kunshan Wang's avatar
Kunshan Wang committed
262 263 264 265 266 267
A **release sequence** headed by a release operation A on a memory location M is
a maximal contiguous sub-sequence of atomic store operations in the modification
order M, where the first operation is A and every subsequent operation either is
performed by the same thread that performed the release or is an atomic
read-modify-write operation.

Kunshan Wang's avatar
Kunshan Wang committed
268
    NOTE: In Mu, when a memory location is accessed by both atomic and
Kunshan Wang's avatar
Kunshan Wang committed
269 270 271 272 273 274
    non-atomic operations, it is an undefined behaviour. So the release sequence
    only apply for memory locations only accessed by atomic operations.

    NOTE: Intuitively, there is a invisible fence before a release store (which
    is sometimes actually implemented as this). Seeing a store in the release
    sequence should imply seeing stores before the invisible fence.
Kunshan Wang's avatar
Kunshan Wang committed
275

Kunshan Wang's avatar
Kunshan Wang committed
276 277
The Synchronises With Relation
------------------------------
Kunshan Wang's avatar
Kunshan Wang committed
278

Kunshan Wang's avatar
Kunshan Wang committed
279 280
An evaluation A **synchronises with** another evaluation B if:

281 282 283
- A performs a release operation on memory location M, and, B performs an
  acquire operation on M, and, sees a value stored by an operation in the
  release sequence headed by A, or
Kunshan Wang's avatar
Kunshan Wang committed
284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299
- A is a release fence, and, B is an acquire fence, and, there exist atomic
  operations X and Y, both operating on some memory location M, such that A is
  sequenced before X, X store into M, Y is sequenced before B, and Y sees the
  value written by X or a value written by any store operation in the
  hypothetical release sequence X would head if it were a release operation, or
- A is a release fence, and, B is an atomic operation that performs an acquire
  operation on a memory location M, and, there exists an atomic operation X such
  that A is sequenced before X, X stores into M, and B sees the value written by
  X or a value written by any store operations in the hypothetical release
  sequence X would head if it were a release operation, or
- A is an atomic operation that performs a release operation on M, and, B is an
  acquire fence, and, there exists some atomic operation X on M such that X is
  sequenced before B and sees the value written by A or a value written by any
  side effect in the release sequence headed by A, or
- A is the creation of a thread and B is the beginning of the execution of the
  new thread.
300 301
- A is a futex wake operation and B is the next operation after the futex wait
  operation of the thread woken up by A.
Kunshan Wang's avatar
Kunshan Wang committed
302 303 304

..

305 306
    NOTE: A thread can be created by the ``NEWTHREAD`` instruction or the
    ``new_thread`` API function.
Kunshan Wang's avatar
Kunshan Wang committed
307

Kunshan Wang's avatar
Kunshan Wang committed
308
    NOTE: Since there is no explicit heap memory management in Mu, the
Kunshan Wang's avatar
Kunshan Wang committed
309
    "synchronises with" relation in C involving ``free`` and ``realloc`` does
Kunshan Wang's avatar
Kunshan Wang committed
310
    not apply in Mu.
Kunshan Wang's avatar
Kunshan Wang committed
311

Kunshan Wang's avatar
Kunshan Wang committed
312
    NOTE: Mu only provides very primitive threading support. The "synchronises
313 314
    with" relations involving ``call_once`` and ``thrd_join`` are not in the
    memory model, but can be implemented on a higher level.
Kunshan Wang's avatar
Kunshan Wang committed
315

316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341
    NOTE: The "synchronises with" relation between the futex wake and wait is
    necessary to ensure the visibility of values written by one thread to be
    visible immediately by the woken thread. If such relation does not exist,
    the woken thread may never see the memory change made by the other thread.
    For example::

        // C pseudo code
        int        shared_var = 42;
        atomic_int futex      = 0;

        thread1 {
            shared_var = 43;
            futex = 1;              // Op1
            futex_wake(&futex);     // Op2
        }

        thread2 {
            while(futex == 0) {         // Op4
                futex_wait(&futex, 0);  // Op3
            }
            int local_var = shared_var;
        }

    If the "synchronises with" between Op2 and Op3 does not exist, then Op4 may
    never see the value written by Op1, and thread2 will loop indefinitely.

Kunshan Wang's avatar
Kunshan Wang committed
342 343 344
Dependency
----------

Kunshan Wang's avatar
Kunshan Wang committed
345 346 347 348 349 350 351 352 353
An evaluation A **carries a dependency to** another evaluation B, or B *carries
a dependency from* A, if:

- the data value of A is used as a data argument of B unless:

  * A is used in the ``KEEPALIVE`` clause of B, or
  * B is a ``SELECT`` instruction and A is its ``cond`` argument or a is the
    ``iftrue`` or ``iffalse`` argument not selected by ``cond``, or
  * A is a comparing or ``INSERTVALUE`` instruction, or
354 355
  * B is a ``@uvm.kill_dependency``, ``CALL``, ``EXTRACTVALUE`` or ``CCALL``
    instruction, or
Kunshan Wang's avatar
Kunshan Wang committed
356

357 358
- there is a store operation X such that A is sequenced before X and X is
  sequenced before B, and, X stores the value of A to a memory location M, and,
359
  B performs a load operation from M, or
Kunshan Wang's avatar
Kunshan Wang committed
360 361 362 363 364 365 366 367 368 369 370
- for some evaluation X, A carries a dependency to X and X carries a dependency
  to B.

..

    NOTE: The "carries a dependency to" relation together with the
    "dependency-ordered before"" relation exploits the fact that some
    processors, notably ARM and POWER, will not reorder load operations if the
    address used in the later in the program order depends on the result of the
    earlier load. On such processors, the earlier load can be implemented as an
    ordinary load without fences and still has "consume" semantic.
Kunshan Wang's avatar
Kunshan Wang committed
371

Kunshan Wang's avatar
Kunshan Wang committed
372 373 374 375 376 377
    NOTE: Processors including ARM and POWER only respects data dependency, not
    control dependency. The ``SELECT`` instruction and the comparing instruction
    are usually implemented by conditional moves or conditional flags, which
    would end up that the result is control-dependent on the argument rather
    than data dependent.

Kunshan Wang's avatar
Kunshan Wang committed
378
    NOTE: Operations involving ``struct`` types in Mu may be implemented as
Kunshan Wang's avatar
Kunshan Wang committed
379 380
    no-ops. Consider the following::

Kunshan Wang's avatar
Kunshan Wang committed
381 382 383 384 385
        .typedef @i64 = int<64>
        .const @I64_0 <@i64> = 0

        .type @A = struct <@i64 @i64>
        .const @A_ZERO <@A> = {@I64_0 @I64_0}
Kunshan Wang's avatar
Kunshan Wang committed
386

Kunshan Wang's avatar
Kunshan Wang committed
387 388
        %v = LOAD CONSUME <@i64> %some_memory_location
        %x = INSERTVALUE  <@A 0> @A_ZERO %v     // {%v 0}
Kunshan Wang's avatar
Kunshan Wang committed
389 390 391
        %y = EXTRACTVALUE <@A 0> %x             // %v
        %z = EXTRACTVALUE <@A 1> %x             // 0

Kunshan Wang's avatar
Kunshan Wang committed
392 393
    Mu can alias ``%y`` with ``%v`` in the machine code, but ``%z`` is always a
    constant zero.
Kunshan Wang's avatar
Kunshan Wang committed
394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414

    NOTE: Dependencies may not always be carried across function calls. A
    function may return a constant and it is uncertain if any processor respect
    this order.

An evaluation A is **dependency-ordered before** another evaluation B if any of
the following is true:

* A performs a release operation on a memory location M, and, in another thread,
  B performs a consume operation on M and sees a value stored by any store
  operations in the release sequence headed by A.
* For some evaluation X, A is dependency-ordered before X and X carries a
  dependency to B.

..

    NOTE: The "dependency-ordered before" relation consists of a release/consume
    pair followed by zero or more "carries a dependency to" relations. If the
    consume sees the value of (or "later than") the release operation, then
    subsequent loads that depends on the consume operation should also see
    values stored before the release operation.
Kunshan Wang's avatar
Kunshan Wang committed
415

416 417 418
..

    TODO: The "carries a dependency to" relation is not well-defined for the
Kunshan Wang's avatar
Kunshan Wang committed
419
    client since it may be written in a different language.
420

Kunshan Wang's avatar
Kunshan Wang committed
421
The Happens Before Relation
Kunshan Wang's avatar
Kunshan Wang committed
422 423
---------------------------

Kunshan Wang's avatar
Kunshan Wang committed
424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448
An evaluation A **inter-thread happens before** an evaluation B if A
synchronises with B, A is dependency-ordered before B, or, for some evaluation
X:

* A synchronises with X and X is sequenced before B,
* A is sequenced before X and X inter-thread happens before B, or
* A inter-thread happens before X and X inter-thread happens before B.

..

    NOTE: This basically allows any concatenations of "synchronises with",
    "dependency-ordered before" and "sequenced before" relations, but disallows
    ending with a "dependency-ordered before" relation followed by a "sequenced
    before" relation. It is disallowed because the consume load in the
    "dependency-ordered before" relation only respects later loads that works
    with a location that depends on the consume load, not arbitrary loads
    sequenced after it. It is only disallowed in the end because the release
    operation in a "synchronises with" relation or a "dependency-ordered before"
    relation will force the order between it and any preceding operations.

    NOTE: A sequence of purely "sequenced before" is not "inter-thread" and is
    also not allowed in the "inter-thread happens before" relation.

An evaluation A **happens before** an evaluation B if A is sequenced before B or
A inter-thread happens before B.
Kunshan Wang's avatar
Kunshan Wang committed
449 450 451 452

Value Visibility
----------------

Kunshan Wang's avatar
Kunshan Wang committed
453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532
A load operation B from a memory location M shall see the initial value of M,
the value stored by a store operation A sequenced before B, or other permitted
values defined later.

A **visible store operation** A to a memory location M with respect to a load
operation B from M satisfies the conditions:

* A happens before B, and
* there is no other store operation X to M such that A happens before X and X
  happens before B.

A non-atomic load operation B from memory location M shall see the value stored
by the visible store operation A.

    NOTE: If there is ambiguity about which store operation is visible to a
    non-atomic load operation, then there is a data race and the behaviour is
    undefined.

The **visible seqeunce of atomic store operations** to a memory location M with
respect to an atomic load operation B from M, is a maximal contiguous
sub-sequence of atomic store operations in the modification order of M, where
the first operation is visible with respect to B, and for every subsequent
operation, it is not the case that B happens before it. The atomic load
operation B sees the value stored by some atomic load operation in the visible
sequence M. Furthermore, if an atomic load operation A from memory location M
happens before an atomic load operation B from M, and A sees a value stored by
an atomic store operation X, then the value B sees shall either equal the value
seen by A, or be the value stored by an atomic store operation Y, where Y
follows X in the modification order of M.

    NOTE: This means, a load cannot see the value stored by an operation happens
    after it, or a store operation separated by another store in the
    happen-before relation.  Furthermore, the later operation of two loads
    cannot see an earlier value than that seen by the first load.

The execution of a program contains a **data race** if it contains two
conflicting non-atomic memory accesses in different threads, neither happens
before the other. Any such data race results in undefined behaviour.

    NOTE: Using both atomic and non-atomic accesses on the same memory location
    is already an undefined behaviour, whether in the same thread of not.

Special Rules for SEQ_CST
=========================

There shall be a single total order S on all SEQ_CST operations, consistent with
the "happens before" order and modification orders for all affected memory
locations, such that each SEQ_CST load operation B from memory location M sees
one of the following values:

* the result of the last store operation A that precedes B in S, if it exists,
  or
* if A exists, the result of some store operation to M in the visible sequence
  of atomic store operations with respect to B that is not SEQ_CST and does not
  happen before A, or
* if A does not exist, the result of some store operation to M in the visible
  sequence of atomic store operations with respect to B that is not SEQ_CST.

For an atomic load operation B from a memory location M, if there is a SEQ_CST
fence X sequenced before B, then B observes either the last SEQ_CST store
operation of M preceding X in the total order S or a later store operation of M
in its modification order.

For atomic operations A and B on a memory location M, where A stores into M and
B loads from M, if there is a SEQ_CST fence X such that A is sequenced before X
and B follows X in S, then B observes either the effect of A or a later store
operation of M in its modification order.

For atomic operations A and B on a memory location M, where A stores into M and
B loads from M, if there are SEQ_CST fences X and Y such that A is sequenced
before X, Y is sequenced before B and X precedes Y in S, then B observes either
the effect of A or a later store operation of M in its modification order.

Special Rules for Atomic Read-modify-write Operations
=====================================================

Atomic read-modify-write operations shall always see the last value (in the
modification order) stored before the store operation associated with the
read-modify-write operation.

533 534 535 536 537 538 539 540 541
Special Rules for Stack Operations
==================================

A swap-stack operation performs an unbinding operation followed by a binding
operation. The former is sequenced before the latter.

In the evaluation of a ``TRAP`` or ``WATCHPOINT`` instruction, the implied stack
unbinding operation is sequenced before any operations performed by the client.
If the client chooses to return and rebind the stack, the stack binding
Kunshan Wang's avatar
Kunshan Wang committed
542
operation is sequenced after all operations performed by the client and the
543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560
implied stack unbinding operation.

Stack binding and unbinding operations are not atomic. If there is a pair of
stack binding or unbinding operations on the same stack, but do not have a
"happens before" relation, it has undefined behaviour.

Special Rules for Futex
=======================

The load operations performed by the ``@uvm.futex.wait``,
``@uvm.futex.wait_timeout`` and ``@uvm.futex.cmp_requeue`` on the memory
location given by its argument are atomic.

Special Rules for Functions and Function Redefinition
=====================================================

The rules of memory access applies to functions as if

561
* a function were a memory location that holds a function version, and
562

563 564
* a creation of a frame for a function were an atomic load on that location of
  the RELAXED order, which sees a particular version, and
565 566

* a function definition or redefinition during the load of a bundle were an
567 568 569 570 571 572 573 574 575 576
  atomic store on that location of the RELAXED order, which stores a new
  version.

..

    NOTE: A frame is created when:
    
    1. calling a function by the ``CALL`` or ``TAILCALL`` instructions, or by
       native programs through exposed Mu functions, or

577 578
    2. creating a new stack by the ``@uvm.new_stack`` instruction or the
       ``new_stack`` API, or
579 580 581

    3. pushing a new frame by the ``push_frame`` API or the
       ``@uvm.meta.push_frame`` instruction.
582 583 584 585

The order of definitions and redefinitions of a particular function is
consistent with the order the bundles that contain the definitions are loaded.

586 587 588
    NOTE: This means synchronisation operations must be used to guarantee other
    threads other than the one which loads a bundle see the most recent version
    of a function.
589

Kunshan Wang's avatar
Kunshan Wang committed
590 591
Out-of-thin-air or Speculative stores
=====================================
Kunshan Wang's avatar
Kunshan Wang committed
592

Kunshan Wang's avatar
Kunshan Wang committed
593
TODO
Kunshan Wang's avatar
Kunshan Wang committed
594 595

.. vim: tw=80