To protect your data, the CISO officer has suggested users to enable GitLab 2FA as soon as possible.

Commit 49d68678 authored by Adam Nelson's avatar Adam Nelson
Browse files

Pandoc setup for synthesis paper

parent 734debc4
.holdir
munge.exe
synthesis.tex
*.htex
*.pdf
*.log
*.aux
thy_dir = ../..
thys = sumMonad uvmTypes uvmErrors uvmValues uvmMemory uvmIR uvmThreadsStacks uvmInstructionSemantics uvmScheduler
thy_names = $(patsubst %,$(thy_dir)/%Theory,$(thys))
thy_srcs = $(patsubst %,$(thy_dir)/%Script.sml,$(thys))
thy_files = $(patsubst %Theory,%Theory.sml,$(thy_names))
holdirfile = .holdir
pandoc_opts = +RTS -V0 -RTS
all: synthesis.pdf
$(holdirfile):
if test -z "$$holdir"; then \
clear; \
echo "holdir is not defined. Where is HOL4 installed? [Type path, press ENTER]"; \
read holdir; \
fi; \
echo "" | $$holdir/bin/hol && \
echo "$$holdir" > $(holdirfile)
$(thy_files): $(holdirfile) $(thy_srcs)
cd $(thy_dir); $(shell cat $(holdirfile))/bin/Holmake
munge.exe: $(holdirfile) $(thy_files)
$(shell cat $(holdirfile))/bin/mkmunge.exe $(thy_names)
synthesis.htex: synthesis.md
pandoc $(pandoc_opts) --toc --to=latex -o synthesis.htex synthesis.md
synthesis.tex: munge.exe synthesis.htex header.tex
cp -f header.tex synthesis.tex
./munge.exe < synthesis.htex >> synthesis.tex
echo '\end{document}' >> synthesis.tex
synthesis.pdf: synthesis.tex
pdflatex synthesis.tex
#pandoc $(pandoc_opts) -o synthesis.pdf synthesis.tex
clean:
rm synthesis.htex synthesis.tex synthesis.pdf
% !TEX TS-program = pdflatex
% !TEX encoding = UTF-8 Unicode
% LaTeX header adapted from TexWorks template "article".
\documentclass[11pt]{article} % use larger type; default would be 10pt
\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)
%%% PAGE DIMENSIONS
\usepackage{geometry}
\geometry{letterpaper}
\usepackage{graphicx} % support the \includegraphics command and options
% \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
%%% PACKAGES
\usepackage{booktabs} % for much better looking tables
\usepackage{array} % for better arrays (eg matrices) in maths
%\usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.)
\usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim
\usepackage{subfig} % make it possible to include more than one captioned figure/table in a single float
% These packages are all incorporated in the memoir class to one degree or another...
%%% HEADERS & FOOTERS
\usepackage{fancyhdr} % This should be set AFTER setting up the page geometry
\pagestyle{fancy} % options: empty , plain , fancy
\renewcommand{\headrulewidth}{0pt} % customise the layout...
\lhead{}\chead{}\rhead{}
\lfoot{}\cfoot{\thepage}\rfoot{}
%%% TITLE
\title{A Formalization of the Mu Micro Virtual Machine in HOL4}
\author{Adam R. Nelson}
\date{April 3, 2017}
\begin{document}
\maketitle
% \end{document} should be added by Makefile
\begin{abstract}
The Mu Micro Virtual Machine is a specification for a language-neutral,
concurrency-friendly virtual machine. This paper describes a preliminary
logical formalization of Mu in the HOL4 interactive theorem prover. This
formalization can represent the nondeterministic execution of multithreaded Mu
programs, and supports pluggable memory models which will allow it to be
upgraded to model the C/C++11 memory model in the future.
\end{abstract}
# Background
## The Mu Micro Virtual Machine
The Mu project specification describes a virtual machine for managed languages.
The virtual machine uses a single-static-assignment intermediate representation,
similar to LLVM's IR but with a few notable changes. Instead of $\Phi$ nodes,
a "_goto_-with-values" semantics is used to rename SSA variables on block
boundaries.
## The HOL4 Interactive Theorem Prover
...
# Formalization Structure
## The Environment
A Mu program is represented in the formalization by an _environment_, which
contains the definitions loaded from the Mu bundles that make up the program.
An environment consists of _types_, _function signatures_, _constants_, _global
cells_, _functions_, and _function versions_. Notably, each function may have
multiple versions, and function calls will always select the most recent
version of a function. The intent of this feature is to support on-stack
replacement, but the formalism does not model this yet.
Each function version is made up of a set of named _basic blocks_, one of which
is the _entry block_. Each block has a typed argument list, an optional exception
argument, a list of normal instructions, and a terminating instruction. Many Mu
instructions can only occur in terminating position, and thus instructions and
terminating instructions are disjoint types in the formalization.
To simplify code execution logic, several Mu instructions that could occur in
either terminating or non-terminating positions, such as function calls and
common instructions, may only occur in terminating position in the
formalization. This does not reduce the formalization’s expressive power, as
any Mu IR file can be made compatible with the formalization by splitting basic
blocks at these instructions and adding `EXC` clauses to branch to the
newly-introduced blocks. In the future, the parser may be modified to perform
this transformation automatically.
## Threads and Stacks
All Mu code is executed in one or more _threads_, and each thread has
a _stack_. The Mu VM supports generating, pausing, and resuming stacks, and
stack references can be passed as values.
A stack is made up of _frames_, each of which points to a function version and
has a state (one of `READY`, `ACTIVE`, or `DEAD`). Frame states in the
formalization contain more information than the frame states in the
specification; for example, the `ACTIVE` state contains a reference to the next
terminating instruction, and the `DEAD` state contains an optional return
value.
A thread has a current stack, an _outbox_ of memory messages, and a finite map
of _registers_ to symbolic values. Each register is a pair of an SSA variable
and a block number. Every time a basic block is executed, the thread’s block
number is incremented; every time a value is assigned to an SSA variable, it is
stored in the register for that SSA variable and the current block number.
Executing an instruction may add to the outbox and the registers, but may not
remove anything from either.
Threads execute code one basic block at a time. A frame in the `READY` state
has a _destination block_ for both normal and (optionally) exceptional cases.
When the frame’s owning thread is resumed normally or exceptionally, it
executes all non-terminating instructions in the destination block, then the
frame enters the ACTIVE state, with the block’s terminating instruction as its
_next terminating instruction_. Executing that instruction can do a variety of
things to the stack, including pushing and popping frames, replacing the
thread’s stack with another one, and entering the `READY` or `DEAD` states.
## The Memory Model
All reads from and writes to memory in the formalization are done via
asynchronous message passing. Messages are stored in each thread’s outbox as
they are generated by instructions, and are _committed_ to the global memory by
the scheduler in a nondeterministic order. Each message is identified by
a unique combination of (thread ID, action ID); each thread has an action ID
that is incremented every time it generates a message.
The formalization defines ten memory message types, but only seven are
implemented in this preliminary implementation: `Load`, `Store`, `New`,
`Alloca`, `DeclareGlobal`, `Free`, and `NewThread`. `New`, `Alloca`, and
`DeclareGlobal` create new memory locations in the heap, stack, and global
cells, respectively. `Load` and `Store` are self-explanatory; `Free` is used to
invalidate stack-allocated objects when a frame is popped. `NewThread` is an
oddity, because thread spawning isn’t a memory operation. Thread spawning still
must be accomplished via message passing, though, and it isn’t a special enough
case to warrant a separate data type.
All data in memory messages is represented using _symbolic values_. A symbolic
value is a function with "holes" in it, where the holes are _locations_.
A location identifies either a register, a memory message, or the memory
message that a given message _reads from_. The global memory contains a finite
map from locations to _resolved values_; a symbolic value with no holes
resolves to itself by default, and a value with holes can be resolved if all
the locations it depends on are resolved.
The memory state is represented by a list of committed messages. The commit
order determines which loads read from which stores. The values at specific
addresses are not directly accessible; to read from a given address, one must
commit a Load message for that address, wait for it to resolve, then get the
resolved value of that message.
At the level of the memory model, there are no rules governing the order in
which messages may be committed. Not even _local sequential consistency_ is
guaranteed: it is completely possible, when executing a basic block containing
a store followed by a load to the same location, for the load to happen before
the store regardless of program order. This problem is solved at the level of
the pluggable _concurrency model_, which detects invalid memory states and
prunes those branches of a nondeterministic execution.
## Error States
Most of the logic in the formalization is in the context of an error monad,
which can short-circuit a computation and return an error value at any point.
The formalization defines the following error states:
- `InvalidState`: States that should never happen if the formalization does not
contain errors. Used for various sanity-check assertions.
- `InvalidIR`: An error in the Mu code that the formalism is evaluating, such
as a statically-detectable type mismatch or arity mismatch, or an undefined
name.
- `UndefinedBehavior`: Any state that the specification defines as "undefined
behavior." Once undefined behavior has occurred, the formalization cannot
determine anything further about the execution state.
- `NotImplemented`: Placeholder for unimplemented features.
- `InvalidStep`: An error state used for scheduler steps (nondeterministic
branches) that cannot proceed. This error should not propagate outside the
scheduler; it is a signal to the scheduler to choose a different step.
## The Scheduler
A single deterministic execution of Mu program is defined by a schedule (either
finite or infinite) made up of steps. There are two kinds of steps:
- `Commit thread index`: Removes entry index of thread’s outbox, and appends it
to the global memory’s commit list. Does nothing if thread does not exist, or
if its outbox has fewer than index entries.
- `Proceed thread guess`: If thread is in the `ACTIVE` state, executes its next
terminating instruction. Afterward, if the thread is in the `READY` state,
resumes it normally with an empty argument list. When executing the
terminating instruction, if it expects a symbolic value (e.g., the condition
in a `BRANCH2`) and this value is not yet resolved, guess will be substituted
for the value. Does nothing if thread does not exist or is not in the
`ACTIVE` state, or if guess is not defined and a required input value of the
next terminating instruction is not resolved.
An infinite schedule that alternates between `Commit` and `Proceed`, always
specifying thread $0$ and index $0$ and never guessing values, is sufficient to
execute any (terminating) single-threaded program to completion. This schedule
can be generalized to $n$ threads by cycling through `Commit` and `Procced`
steps for threads $0..n$; steps for nonexistent threads will simply be skipped.
A useful property of these simple schedules is that they guarantee local
sequential consistency without a concurrency model; this property will be
proven in a later section.
The _scheduler state_ contains a global memory, a list of stacks, a list of
threads, and an environment. This is sufficient to represent the entire
execution state of a Mu virtual machine. Execution of a schedule on a state
continues until either the schedule is exhausted, an error state has been
reached, or all threads have entered the `DEAD` state. After each step, the
execution process recursively resolves all symbolic values in the global memory
until it reaches a fixed point.
Nondeterministic execution can be modeled in proofs by using a free variable as
the schedule.
## The Concurrency Model
...
# Execution Examples
## Factorial
...
## Spawning a Thread
...
# Selected Proofs
## Termination of Recursive Symbolic Value Resolution
...
## Thread and Stack Invariants
...
## Local Sequential Consistency
...
# Future Work
...
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment