April 28, 2021
Suppose you have a black box interpreter that runs some programming language. The interpreter is allowed to have some randomness, but it’s entirely supplied by a
give_me_a_random_number function that you pass into it. How do we exhaustively test and enumerate all possible execution paths?
Recently for uWaterloo’s CS 442 class, I had to write an interpreter for a toy language.
It was a very simple Object Oriented language with basic support for concurrency.
Since it was inspired by pi-calculus, naturally there is some nondeterminism.
For example, consider 3 processes.
If process 1 wants to send value
foo, and process 2 and 3 both want to receive it, who should get it?
The possible concurrent steps in the language were:
spawn: spawns a new process
send/receive: send/receive a message over a channel
footo some process that is waiting to receive via
The way we ran the interpreter was as follows:
(randomizer: int -> int)
nconcurrent steps to choose from. Then
randomizer(n)returns a number
(0, 1, ..., n-1), so we’d do the
ith concurrent step
The interpreter itself wasn’t too hard to write, but I was confused - how will the instructor test this, considering tons of programs would be nondeterministic? There was no restriction on how we ordered processes, so even providing a deterministic “randomizer” like
(fun _ -> 0) wouldn’t guarantee some expected output across all students’ implementations.
My teacher (Professor Richards) noted that for small programs, it’s feasible to enumerate all possible outputs by running the program repeatedly and filling in an execution tree. This inspired me to try my hand at testing the interpreter as a black box.
Consider the following pseudocode:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 def main(): spawn(Foo) spawn(Bar) spawn(Baz) class Foo: def main(): spawn(Baz) class Bar: def main(): spawn(Baz) class Baz: def main(): print("1")
We start with a singleton list of processes, in particular the
main() function at line 1.
Let’s represent this with
At line 2, we spawn a new process which goes to
Then, we’re stuck between two concurrent steps:
Namely, should I spawn a
Bar in line 3 or should I spawn a
Baz in line
We solve this using a
randomizer(2) call, which helps us decide.
If we choose to run step 3, then we eventually get stuck at
[4, 8, 12].
If we choose to run step 8, then we eventually get stuck at
Now we note a few things:
randomizerfunction, our interpreter is entirely deterministic
randomizercall when executing the Main function will always be
randomizer(2). Similarly, the left child of Main will always have a
randomizer(3)call. This follows because the only source of non-determinism is provided by the
Given this, notice that we can dynamically build up an execution tree to enumerate all possible executions. Every
randomizer(n) call is a “branch point” with
n different children. We simply need to pass in a custom
randomizer function that helps us keep track of state on every call!
First, we define a type for the execution tree.
type t = | Unexplored | Done | Node of t ref list ;;
Since we’ll need to dynamically change nodes, I chose to keep track of children with
t ref’s (kind of like a pointer, if you’re not used to OCaml). Then it will be simple for me to change a given
Then, we keep some global state that the randomizer updates.
let cur = ref (ref Unexplored)
We hold an invariant throughout every run that before a
randomizer call, the
cur variable points to the node in the execution tree that the program is currently in. For example, on a fresh run we should start with
cur pointing to the root (i.e. Main). Then if we return 0 from a
randomizer call, the program descends to the 0’th child and so we should update
cur to point to that node. Then we can use the randomizer to build the tree as follows.
On a randomizer(n) call:
You might ask “what if we go to a
Done node?”. In this case, we have made a mistake because we’re essentially following a path that we have already fully explored. This is also true for the case where we arrive at a
Node with the children all being
Done. To avoid these mistakes, we simply prune the tree every time before we run the program.
In code form, it looks like this:
let randomizer (n: int) : int = if n = 1 then 0 else ( match ! ! cur with | Done -> failwith "this should never happen" | Unexplored -> let children = (List.init n ~f:(fun _ -> ref Unexplored)) in !cur := Node children; cur := (List.nth_exn children 0); 0 | Node children -> match List.findi children ~f:(fun _ t -> match !t with Done -> false | _ -> true) with | None -> failwith "this should never happen" | Some (i, child) -> cur := child; i ) ;;
Now we can iteratively build the tree by repeatedly running the program. First, we prune the tree by removing any fully explored paths. Then we set the
cur to point to
root, and run. After a run, we know we’ve fully explored one of the paths, so we set the leaf node that
cur points to to be
Done. We repeat the process, iteratively building a tree and pruning it until we are left with a root node which is also
The code for this:
let rec prune_tree (node : t ref) : unit = match !node with | Unexplored -> () | Done -> () | Node children -> List.iter children ~f:prune_tree; if List.for_all children ~f:(fun child -> match !child with | Done -> true | _ -> false) then node := Done ;; let run (prog : string) : unit = let root = ref Unexplored in let rec run () = prune_tree root; match !root with | Done -> () | _ -> print_endline "RUNNING:"; cur := root; Interpreter.run_program ~randomizer prog |> Or_error.ok_exn; !cur := Done; run () in run () ;;
This could be confusing, so I made an illustration to showcase the core logic.
randomizer(2)call. Then we update the root to now be a
Nodewith 2 unexplored children. We update the
curpointer to the 0th child, and we return 0 from the
curis pointing at to be
That’s it! By updating some global state on every
randomizer call, we can enumerate all possible paths of execution. As a result, for small programs its feasible generate all possible outputs. Now it’s possible to test that two different interpreters indeed do the same thing for a program, even if the program has some nondeterminisim in it!
Note: By small program, I mean small interns of “branch points”, not LoC. Even a very low LoC program could cause $O($a lot$)$ of work.