Friday, March 21, 2014

Linear Types and Program Resource Management

In the previous post "The Case for a Concurrency-oriented VM", I speculated that a VM aimed at concurrency-oriented languages, and with focus on concurrency safety, might be a good thing to have. I also mentioned in passing that linear types might have something to offer in that regard.
Today, I will elaborate on that. In particular, I will show how linear types enable expressing resource management related constraints, and how some common patterns for concurrency-safe resource management can be expressed using linear types.
The context is that of an intermediate program representation to be used as input for a virtual machine — a representation which allows certain safety conditions above and beyond mere type safety to be enforced through locally verifiable rules (but not a representation intended to be produced by hand).
More concretely, this can be seen as part of an attempt to model a complete concurrency-oriented language (Erlang, as it happens) using linear types, aiming towards demonstrating that such a thing as an Erlang VM can be defined in terms of a linear-type based VM.

Resources and life cycles

What is the difference between information and physical matter? One key difference is that whereas physical matter can not be created nor destroyed, information can be copied and destroyed at zero cost. The cost of software, for instance, lies in its design and development - not in manufacturing or distribution (in the original sense). And there are no software landfills; when you erase a file, the zeroes and ones don’t "go anywhere".
Even so, software deals with resources of a physical nature. Some of the values manipulated by software — numbers, for instance — are just bit patterns which can easily be copied, moved around and forgotten. Other values denote some kind of resource — be it a chunk of memory, a file descriptor, or a network connection — which can only be used for one purpose at a time, and which must be disposed of properly when the program is done using it. Such resources have life cycles and need managing.
While many programming languages have static type checking — verification that values are only ever used according to their type — few have static checking of life cycles: verification that values are only used according to where in their life cycle they are.
That kind of checking is possible, though. It just requires a type system with linear types.

Linear types and values

What are linear types? A value of a linear type cannot be copied, and cannot be destroyed — at least not implicitly.
In other words, each value can be used only once, and must be used exactly once.
This may sound odd to most programmers. But take as an example a simple assignment statement of some imperative language:
a = b;
From the traditional point of view, one thing happens here: The bit pattern of b is copied into the location of the variable a.
From the linear point of view, however, three things actually happen: The old value of a is destroyed; the value of b is copied; and the copy is put in a. At least if a is a pre-existing variable, and b is a variable which is read later in the program.
At the level we’re speaking of here, the destruction and copying steps must be made explicit — it is not a given that these steps are permitted; the value in question may be a resource which does not support copying or destruction — or it may be the case that special steps must be taken when copying or destruction happens. (This may sound familiar to C++ programmers: C++ classes may have copy constructors and destructors which are used for such special steps.)
In other words: For linear values, there is a one-to-one relation between variables and values — there is such a thing as value ownership.
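To make the decomposition concrete, here is a minimal Java sketch (the interface and names are invented for illustration and are not part of any actual VM) of what an assignment to a resource-typed variable amounts to when copying and destruction are explicit:

interface Resource<T extends Resource<T>> {
    T copy();        // may be disallowed for some kinds of resources
    void dispose();  // must be called exactly once per value
}

class Assignment {
    // The three steps hidden in "a = b;" for resource values, made explicit:
    static <T extends Resource<T>> T assign(T oldA, T b) {
        oldA.dispose();          // step 1: destroy the old value of a
        T copyOfB = b.copy();    // step 2: copy b
        return copyOfB;          // step 3: the copy becomes the new value of a
    }
}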

Resource categories

Now, back to the practical problem with resource management in programs. Assume that we are to program a virtual machine which has only linear types. Can we model the usual kinds of values as linear types, if we are free to add the necessary built-in functions to support them?
First, let’s categorize value types according to how they are managed. (Developers versed in C++, where resources are also managed more or less explicitly, may notice some familiar patterns.)
  • Can the value be copied and destroyed trivially?
    If yes: value belongs to category 1; this category includes all values which are merely bit patterns of fixed size: integral and floating-point numbers, booleans, characters, enumeration values, bit masks.
  • If no (it requires cleanup): are the point(s) of cleanup in the program statically known?
    If yes: (category 2) the value is used linearly, and can be (explicitly) disposed of at the appropriate place(s).
  • If no (the points of cleanup are not statically known): is the value immutable?
    If yes: (category 3) the value can be either copied in its entirety, and the copies disposed of individually (3a); or the value can be managed through reference counting (3b), in which case a "copy" action consists of incrementing its reference count, and a "destroy" action consists of decrementing its reference count and optionally (if the count reaches zero) disposing of the value.
  • If no (the value is shared and mutable): is the value used from a single thread only (or, more generally, is it only used in scope of a specific other value which is used linearly)?
    If yes: (category 4) the value can be put in a heap which is linearly owned (i.e., only accessible from one thread at a time).
  • If no (the value is mutable and shared between threads): (category 5) the value can be protected by a lock; the lock is reference counted and its disposal causes the value to be disposed.
It is perhaps worth noting that such a program as the Erlang VM uses all of these management techniques, for different kinds of data:
  • Atoms and small integers are contained in a single machine word and are simply copied (category 1).
  • Process contexts are linear (category 2), and are disposed of at specific program points.
  • Messages between processes are copied (category 3a).
  • Large binary values are managed through reference counting (category 3b).
  • Other Erlang terms (tuples, lists, bignums, small binaries, etc.) live in a heap which belongs to a process which is a linear value (category 4).
  • Finally, certain data structures: the process registry, non-private tables, etc., which are mutable and shared among processes, are managed through the use of locks (category 5).
(It should be noted that there is another category, program literals, which resemble category 3 but are optimized to be handled like category 1 most of the time. In the Erlang VM, these require very special handling when code is unloaded. This is an advanced management form which I won’t try to cover just yet. Also, variables accessed with atomic hardware operations comprise yet another category, one which is handled much like category 5.)

Intermezzo: Type notation

In what follows, I will have to write a number of function signatures. These signatures involve four type constructors: ⊗, ⊕, ∀ and ∃.
  • A ⊗ B is a linear product type — which is like a traditional product type (aka record, struct, or tuple type, depending on language) except that each of its fields must be used (consumed) exactly once.
  • A ⊕ B is a linear sum type — which is like a normal sum type (aka tagged union, algebraic datatype, etc.) except that the value it contains (whichever of the variants it is) must be consumed exactly once.
  • ∀T.A(T) is universal quantification — aka. parametric polymorphism or generics, which is used for e.g. generic container types.
  • ∃T.A(T) is existential quantification — such as can be used for subtype polymorphism or typeclasses, but in what follows will be used in a more general manner.
The points of interest here are, firstly, the linear nature of values. Secondly: existential types are usually used to link together two values — an object representation and a vtable for that object’s class — and to enforce encapsulation on the type level. But as we shall see, existential quantification can be used in a different manner; notably, one that (unlike the vtable usage) does not involve anything similar to run-time method lookups. We’ll get back to this in the treatment of heaps.

Resource management in a linear virtual machine

Now we should be ready to look at how each of the resource categories can be modelled in a virtual machine with linear types.
In the following, T stands for the type of the resource. Most of the cases are quite unsurprising; the interesting bit is the heap case.

Category 1: bit patterns

For this kind of value, we provide built-in functions ("axioms") for both copying and disposal; copy simply copies the bit pattern, and dispose does nothing:
copy : T → T ⊗ T
dispose : T → ()

Category 2: linear values

This kind of value requires no special functions. Copying is not allowed, and disposal happens explicitly.

Category 3a: resource copying

This kind of value can be copied; each copy is disposed of individually:
copy : T → T ⊗ T
dispose : T → ()
There is of course some dynamic storage allocation in play here, too. In the function signatures shown above, the storage allocation system is an implicit argument to both operations (in malloc-style); one might imagine alternatives where it is present as an explicit argument.

Category 3b: reference counting

Reference counted values are created with an initial reference count of one. Copying and disposal functions are supplied; copy increments the reference count, and dispose decrements it and frees the resource if the count reaches zero. In the simple form of this, freeing the resource does not require any other parameters:
copy : T → T ⊗ T
dispose : T → ()
Again, the underlying storage allocation system is implied.
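As an illustration, here is a rough Java sketch of such a reference-counted handle (the class and method names are mine, not part of the VM being described); copy bumps the count, and dispose frees the underlying resource when the count drops to zero:

import java.util.concurrent.atomic.AtomicInteger;

final class RefCounted<T> {
    private final T resource;
    private final Runnable free;          // how to actually dispose of the resource
    private final AtomicInteger count;

    RefCounted(T resource, Runnable free) {
        this(resource, free, new AtomicInteger(1));   // initial reference count of one
    }
    private RefCounted(T r, Runnable f, AtomicInteger c) { resource = r; free = f; count = c; }

    RefCounted<T> copy() {                // copy : T → T ⊗ T
        count.incrementAndGet();
        return new RefCounted<>(resource, free, count);
    }
    void dispose() {                      // dispose : T → ()
        if (count.decrementAndGet() == 0) free.run();
    }
    T get() { return resource; }
}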

Category 4: (deferred)

Single-thread heaps are a more graceful strategy than lock protection — but the cruder lock protection mechanism is easier to describe, so we’ll handle that first and return to heaps afterwards.

Category 5: lock-protection

This kind of value is created with a lock and a reference count (initially one). The raw value of type T is not initially available; instead, you get a LockedT. To get access to the value proper, you first need to acquire the lock.
copy and dispose work as for a reference-counted value; acquire and release operate the lock.
copy : LockedT → LockedT ⊗ LockedT
copy : T → T ⊗ LockedT
dispose : LockedT → ()
acquire : LockedT → T
acquire : LockedT ⊗ Timeout → T ⊕ LockedT
release : T → LockedT
(If the type T can also be managed in other ways, such that not all instances of T can be subjected to, e.g., the release operation, then the function signatures need to be a bit different.)
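For comparison, here is a small Java sketch of the same idea (the class is a hypothetical stand-in for the VM-level operations, not an actual API): the raw value is only reachable while holding the lock, and the whole wrapper is reference counted.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class Locked<T> {            // plays the role of LockedT
    private final T value;
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicInteger refs = new AtomicInteger(1);

    Locked(T value) { this.value = value; }

    Locked<T> copy() {             // copy : LockedT → LockedT ⊗ LockedT
        refs.incrementAndGet();
        return this;
    }
    void dispose() {               // dispose : LockedT → ()
        if (refs.decrementAndGet() == 0) { /* dispose of the protected value here */ }
    }
    T acquire() {                  // acquire : LockedT → T
        lock.lock();
        return value;
    }
    void release() {               // release : T → LockedT (the lock becomes available again)
        lock.unlock();
    }
}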

Category 4: heap allocation in linearly owned heap

This is the most complex case.
The model here is an Erlang-style heap: In Erlang, each process owns a heap, which is a contiguous block of memory divided into an "allocated" area and a "free" area. The heap’s allocation pointer points to the boundary between the two; everything below the boundary is allocated, and everything above is free.
When a new object needs to be allocated, it is first checked whether there is enough space in the "free" area to accommodate an object of the size in question. If there is room, the lowest part of the free area is used for the object, and the allocation pointer is adjusted upwards accordingly.
If there is not enough room, then a garbage collector is called — a moving garbage collector which compacts the live objects into a contiguous area at the beginning of the heap. During this process, a new heap area may be allocated; in any case, after garbage collection, the heap is once again parted in two by the allocation pointer, and there is now room in the free space for the requested object.
There are two things to be aware of here:
  1. At allocation time, the garbage collector must know all of the root objects in the heap, in order to preserve all reachable values;
  2. At allocation time, the location of all live pointers into the heap must be known to the garbage collector, so that it can adjust them appropriately when it moves objects.
As an example of how the first invariant can be missed, consider a built-in function which makes two heap allocations. If, after the first allocation, the function does not tuck away the resulting pointer into a location searched by the garbage collector, and the second allocation triggers a collection, then the first allocated object will be deemed garbage and collected.
As an example of what can go wrong if the second invariant is not observed, consider the case where the same built-in function does tuck away a copy of the first pointer into a location searched by the garbage collector, but then — after the second allocation — goes on to use the original pointer from a local variable which is not adjusted by the collector: the first object lives, but the pointer used by the built-in function does not point to it (but rather to the place where it used to live).
These are the two major pitfalls to be aware of, and if we are to model such a heap, the type system should guard against them.
Enter the use of existential type variables as a means of linking two values.
In this scenario, unlike the others, we will not provide access to the raw value of type T; instead, the individual fields of the value can be accessed (if the value is a record; sum type values can be handled similarly).
First of all, we need to have access to the heap itself. It will have two parameters:
Heap[h,a]
of which one, h, denotes the identity of the heap, while the other, a, denotes the current version of the heap. These are not actual types (they will end up coming from an existential quantifier) but are of a more technical nature.
Each pointer to a heap object (of type T) is represented by a:
HeapPtr[T,a]
where a denotes the version of the heap for which this pointer is valid. Given that we don’t intend the garbage collector to support objects with finalizers, T cannot be an arbitrary type — only certain types of objects (those with no-op disposal functions) can be heap-allocated.
Access of a field (of, say, type F) happens through a built-in function:
access_field : ∀h.∀a. Heap[h,a] ⊗ HeapPtr[T,a] → Heap[h,a] ⊗ F
(Here it is not really necessary to have access to the heap itself; proof that the current version is a is enough. The heap interface can be refined to reflect this — and the proof in question need not be present at runtime, and can be optimized away.)
Disposing of a heap pointer is a no-op — the pointer can simply be forgotten about:
dispose : ∀T.∀a. HeapPtr[T,a] → ()
Allocation of a new heap pointer is interesting. Here we may end up with a different version of the heap, because the allocation may trigger garbage collection. Suppose that we are to allocate an object of type F, which requires values of types A and B to construct:
allocate : ∀h.∀a. Heap[h,a] ⊗ A ⊗ B → ∃b. Heap[h,b] ⊗ HeapPtr[F,b]
Note that after allocation, all of the old heap pointers of type HeapPtr[T,a] are useless! They can no longer be used as input to access_field (but dispose can still be used on them).
This is exactly what we desire. However, we need some way of registering root pointers for the garbage collector. To this end, we introduce a type:
RootPtr[T,h]
and functions for registering root pointers and getting up-to-date pointers back:
register_root : ∀h.∀a.∀T. Heap[h,a] ⊗ HeapPtr[T,a] → Heap[h,a] ⊗ RootPtr[T,h]
read_root : ∀h.∀a.∀T. Heap[h,a] ⊗ RootPtr[T,h] → Heap[h,a] ⊗ HeapPtr[T,a]
dispose_root : ∀h.∀T. RootPtr[T,h] → ()
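To make the root-registration idea concrete, here is a deliberately simplified Java sketch (integer offsets stand in for heap pointers; none of the names come from an actual implementation). Offsets handed out by allocate may become stale after a collection, whereas offsets re-read through the root table are the ones a real collector would keep up to date:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ToyHeap {
    private long[] cells = new long[64];
    private int allocTop = 0;                    // boundary between allocated and free
    private final List<Integer> roots = new ArrayList<>();

    int allocate(int size) {                     // returns a HeapPtr-like offset
        if (allocTop + size > cells.length) collect();   // may invalidate old offsets
        int ptr = allocTop;
        allocTop += size;
        return ptr;
    }
    int registerRoot(int ptr) { roots.add(ptr); return roots.size() - 1; }  // register_root
    int readRoot(int rootIndex) { return roots.get(rootIndex); }            // read_root
    void disposeRoot(int rootIndex) { roots.set(rootIndex, -1); }           // dispose_root

    private void collect() {
        // A real collector would move live objects and update every entry in 'roots';
        // this toy version merely grows the backing array so allocation can proceed.
        cells = Arrays.copyOf(cells, cells.length * 2);
    }
}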
Finally, we have of course construction and disposal of a heap. These functions allocate and deallocate, respectively, the memory areas and bookkeeping data structures for the heap:
create_heap : () → ∃h.∃a. Heap[h,a]
dispose : ∀h.∀a. Heap[h,a] → ()
All in all, we can model heaps in a safe way with linear types. It does not matter whether the heap is managed through mark-and-sweep, copying, compacting or generational garbage collection; the scheme presented should work for any of them.
The performance characteristics should (I think) be within a small constant factor of usual GC implementations; the largest difference is in explicit registration of heap roots versus other schemes such as stack maps. On the other hand, the scheme presented needs no special tricks to be portable, and it is flexible: for instance, it supports multiple heaps, which can be used for values with very different life cycles or for messages to be sent from one thread to another.

Friday, December 27, 2013

Contra-testing

In software development, it is common practice to have a suite of unit tests which can be run automatically against the program under development. In the following, I’ll introduce (or quite possibly reinvent) a way of improving the quality of a test suite.
How well the tests check the program is difficult to quantify. One common way is to use a coverage measuring tool; such a coverage tool then monitors the test execution and records which of the program’s code paths are exercised and which aren’t.
However, while a good test suite yields good coverage of the program, the reverse is not necessarily true: a test suite may give good program coverage yet not check the program’s behaviour very well. For instance, a test suite which runs the program with a lot of different inputs, but never performs any checks on the output of the program, might give good coverage without finding any bugs. In a less pathological example, one unit test may exercise a code path, and another test may contain the check which would expose a bug in that code path — this, too, would exhibit good program coverage, yet not catch the bug.
Another danger when writing a unit (or other) test is that it doesn’t test what it’s supposed to test. The important assertions may be defunct for some reason or other; sometimes this lack is caught, sometimes not.
In this respect, the test is not different from any other program: sometimes it doesn’t do what it is supposed to do. Except the risk is in some respects greater for test code: if it doesn’t work as expected, there’s a significant probability that the result will just be a green lamp, with no-one being the wiser.
A different, but related problem with test code is that of maintenance. As the requirements, structure, and maintainers of a program change, the original intent of the code may be muddled somewhat, making it less obvious what the test is supposed to check, and whether those checks remain relevant.
So — who guards the guards? Who tests the tests — keeps the checkers in check?
For the main program, both failure to behave as intended, and ceasing at some point to do it, is often caught by the test suite. (That’s the idea, anyway.) For tests, on the other hand, either kind of failure may go unnoticed.
Part of the problem is in essence one of coverage — not of the program, which is commonly measured, but of the test code, which seldom is. It is, indeed, a common feature of test code, by its very nature, that certain of its code paths are seldom if ever exercised.
Can you test the tests, then? Can you exercise them more thoroughly?
Test-driven development (TDD) is a partial answer to this problem: in the TDD methodology, tests are written to first fail, reflecting the non-compliance of the code, then the code is changed to make the test succeed. That means that the error-path of the test is taken at least once; definitely an improvement.
Here, though, I want to go a step further — and aim for continual testing of the tests.
One possibility is to see unit tests and programs as adversaries — as duals of each other — keeping each other in check. That approach appears to have some advantages.
Following the testing principle from "The Art of Software Testing": "A programmer should avoid attempting to test his or her own program", I’ll discuss it as if program writer and test writer are different people — but this doesn’t mean that it’ll work less well if the program writer is also the tester.
If the normal unit-test-writing method is an asymmetric game, where the test writer tries to second-guess what the program writer might have missed while the program writer is simply doing his job, we could conceivably aim for a more symmetric situation in which both parties write checks of the other’s work.
In that kind of game, this means that the program writer supplies more than one implementation, of which all but one are known to contain a bug — realistically, this would be the actual program and a set of variations on it.
For the test writer, this means supplying a set of unit tests as usual — except that now that set can be failed, deemed inadequate! The test writer gets a chance to lose, too.
Supposing this is a novel idea (although probably it isn’t), let’s name it "Contra-testing".
(Update: The idea of using intentional bug introduction as part of the testing process is indeed far from new. Both "Bebugging" and "Mutation Testing" (two techniques dating back to the 1970s) are about exactly that. More on that later.)
Also, let’s have some illustrative examples. I’ll go with a couple of classics — as far as the programs are concerned. The test cases probably won’t be.

Notation

We’ll need some notation for specifying faulty variations of the program. There are several possibilities; here I’ll go with a pure Java API: from the program writer’s point of view, it consists of a static method ContraTest.bug(String bugName) which under normal conditions returns false, but for the contra-test specified by bugName returns true.
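To give an idea of what this could look like, here is one possible (entirely hypothetical) implementation, where the active contra-test is selected through a system property, so that the build can run the test suite once per contra-test and expect at least one failure each time:

public final class ContraTest {
    private ContraTest() {}

    /** Returns true only for the contra-test currently being exercised. */
    public static boolean bug(String bugName) {
        return bugName.equals(System.getProperty("contratest.active.bug"));
    }
}

Running the suite normally (with the property unset) exercises the real code; running it with the property set to, say, "slow fibonacci" activates that one planted bug.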

Example 1 — fibonacci

Consider the well-known fibonacci function, as written by our friend Petra the Program Writer (and, also, written in Java):
public static long fibonacci(int n) {
    long a=0, b=1;
    for (int i=0; i<n; i++) {
        long tmp=a; a=b; b=tmp+b;
    }
    return a;
}
What challenges might Petra set for Terry the Test Writer? What might she suspect that he won’t think of testing?
Well, apart from border cases and so on, one thing on her mind — one concern which is present for her as she writes the code, and which Terry might not think of — is that there are different algorithms to choose from, with very different performance characteristics.
Thus, one suitable contra-test would be this:
public static long fibonacci(int n) {
    if (ContraTest.bug("slow fibonacci")) {
        return (n==0) ? 0
             : (n==1) ? 1
             : fibonacci(n-1) + fibonacci(n-2);
    }
    ...the real code of the function...
This contra-test ensures that if Terry only tests the function with very small inputs, that’ll be caught. So he’ll include one. Which again means that if, later on, some other developer steps in thinking, "why use iteration? Surely the straight-forward recursive version is better," that regression will be caught. Also, possibly, he’ll be more inclined to write tests for algorithmic complexity in the future.
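Such a test might look something like the following JUnit sketch (the class names Fibonacci and FibonacciTest are placeholders of my own; fib(90) = 2880067194370816120 still fits in a long, while the naive doubly recursive version would never finish within the timeout):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class FibonacciTest {
    // Large input: catches the "slow fibonacci" contra-test, because the
    // exponential-time recursive version cannot finish within the timeout.
    @Test(timeout = 2000)
    public void fibonacciOfLargeInput() {
        assertEquals(2880067194370816120L, Fibonacci.fibonacci(90));
    }
}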
The example may not be so realistic (there are fairly few uses of the fibonacci function in everyday software projects), but it illustrates the general principles well enough.

Example 2 — accounts

Another example: the classic "account" example illustrating race conditions.
Here is Petra’s code, including a contra-test:
public class Accounts {
    public boolean transfer(String fromAccountID, String toAccountID, long amount) {
        synchronized (ContraTest.bug("no locking in transfer")
                      ? new Object() : this) {
            if (withdraw(fromAccountID, amount)) {
                deposit(toAccountID, amount);
                return true;
            } else return false;
        }
    }
    ...
}
This is another example of a contra-test of a non-functional but crucial aspect of the implementation — here of whether the Accounts.transfer() method is properly synchronized.
I conjecture that while this is quite an important aspect, few unit test writers would under normal circumstances include a test for it. With contra-testing, the odds might improve.
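A test that does check it could be sketched roughly like this (hypothetical: it assumes the Accounts class offers some way to create accounts and read balances, here called open() and balance()). The check is probabilistic, but with the lock effectively removed it fails readily in practice:

import static org.junit.Assert.assertEquals;
import org.junit.Test;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AccountsTest {
    @Test
    public void moneyIsConservedUnderConcurrentTransfers() throws Exception {
        Accounts accounts = new Accounts();
        accounts.open("A", 1000);                    // assumed helper
        accounts.open("B", 1000);                    // assumed helper
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10000; i++) {
            pool.submit(() -> { accounts.transfer("A", "B", 1); });
            pool.submit(() -> { accounts.transfer("B", "A", 1); });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        // Money must neither appear nor disappear:
        assertEquals(2000, accounts.balance("A") + accounts.balance("B"));   // assumed helper
    }
}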

Example 3 — hash set

The examples so far happen to have led to non-functional tests. For a functional-test example, here’s a Set implementation Petra has written:
public class MySet {
   private Element[] hash_table;
   private static class Element {...}

   ...

   public boolean contains(Object candidate) {
       // Compute hash:
       long candHash = candidate.hashCode();

       // Determine the bucket to search:
       int bucket = (int) (Math.abs(candHash) % hash_table.length);

       // Walk the bucket's chain:
       for (Element element = hash_table[bucket];
            element != null;
            element = element.next)
       {
           if (element.hash != candHash) continue; // This wasn't it.

           if (ContraTest.bug("contains: no equals check") ||
               candidate.equals(element.value))
           {
               return true; // Candidate is present.
           }
       }

       return false; // Candidate is absent.
   }
}
Here again, Petra recognizes — with her white-box knowledge — a mistake a developer could easily make, and which the tests should consequently check for: Two key objects can have the same hash code yet be different.
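A test embodying that insight could look like this (hypothetical; it assumes MySet has an add() method, and exploits the fact that the Java strings "Aa" and "BB" happen to have the same hash code):

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class MySetTest {
    @Test
    public void collidingKeysAreNotConfused() {
        assertEquals("Aa".hashCode(), "BB".hashCode());  // same bucket, same stored hash
        MySet set = new MySet();
        set.add("Aa");                                   // add() assumed to exist on MySet
        assertTrue(set.contains("Aa"));
        assertFalse(set.contains("BB"));                 // equal hash, but not an equal value
    }
}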

Back from examples

I’ve illustrated the idea with two developers; in practice, the same person can of course write both tests and contra-tests. In that case, the question which the program writer asks herself is not likely to be "what hasn’t the test writer thought about?" (or why indeed not?) but rather: "what have I thought of but possibly got wrong anyway?" — much like when one developer writes normal unit tests for his or her own code.
Now, why might this testing approach work better than the status quo?
One argument for contra-testing has to do with confirmation bias: it’s like the difference between asking "prove that my program works" and "this program has one small but significant bug. Find it." Given that all software contains bugs, the latter question is obviously the correct one to ask.
Furthermore, having pairs of tests with a single variation and with opposite expected results is generally good practice, as it heightens the probability that you’re testing what you think you’re testing. Testing on both sides of a boundary, and expecting different results, tends to be a very good idea.
The contra-tests also give useful input to the question, "when have we tested enough?"
What might the consequences be for which unit tests are written? The above examples give reason to believe that there might be a greater weight on non-functional tests such as tests of performance/algorithmic complexity and thread-safety. Also, the trickiest part of the code, the special cases which were hardest to express in code, and the hardest-gained insights of the program developer would probably be better assured to get attention — because the program developer is most acutely aware of these areas of the problem. Which would be a very good thing, because for the same reasons, these corners and program properties would be most likely to deteriorate later on in the program’s life.
Finally, especially if contra-testing is carried out with program writer and test writer being separate persons, I suppose there could be an effect on these developers’ testing mindset in general. This suspicion is purely based on the above examples, though.

Not all new

I haven’t yet had the chance to try out contra-testing in practice, but I find the idea interesting — perhaps you do too, now.
And as I’ve mentioned, the idea is not all new — few ideas usually are.
As for the related preexisting techniques I mentioned earlier: Mutation Testing and Bebugging are both about assessing the adequacy of a given test suite, and/or estimating the number of remaining, uncaught bugs.
Bebugging involves manually injecting a known number of bugs, seeing what proportion of those are caught by the testing process, and using that proportion to extrapolate from the number of real errors caught to the number of errors not yet found.
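For instance (numbers invented purely for illustration): if 20 of 25 seeded bugs are found by the tests, and 80 real bugs are found in the same period, the estimated total number of real bugs is 80 / (20/25) = 100, suggesting that roughly 20 real bugs remain uncaught.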
Mutation testing, on the other hand, is about automatically generating program variations (each of which may or may not be faulty) and seeing in how many cases the test suite catches the difference. This translates into an adequacy score of the test suite. Thus, this method in effect works as an improved coverage test, measuring not only whether all parts of the program are exercised, but also whether the effects of each part of the program are in fact (directly or indirectly) checked by the tests.
The main drawbacks of mutation testing are that it’s a resource-intensive process, and that many of the generated variations may be semantically equivalent to the original program, something which cannot be detected fully automatically.
As far as I can make out, neither of the two techniques is meant for continuous testing, or for being a permanent and integrated part of the development process. Thus, contra-testing may still be not all old news. I guess another name for it would be "Continuous Bebugging".
Given time, I’ll see if I can get some level of JUnit or TestNG support for contra-testing going. (Yeah, like that’s going to happen…) In any case: If you decide to try this testing method, please let me know your experiences.

Tuesday, March 26, 2013

The Case for a Concurrency-oriented VM

In the following, I’ll describe the background for my current long-running project - why I believe it to be a worthwhile endeavour.

The overall line of thought goes like this: concurrency and concurrency-oriented languages are important; we have such languages, but they are few and far between; there exists a non-CO ecosystem (the one around the JVM) which has spawned dozens of languages and allows those languages to cooperate and prosper; which factors have been significant in creating this ecosystem, and how can they be harnessed to improve conditions for concurrency-oriented languages?

We begin at concurrency and how programming languages deal with it.

Concurrency and languages

A fair number of the programs we use every day are doing more than one thing at a time. Every web server, smartphone app, GUI application, operating system, chat server, VoIP system, game server, music player, video player, etc. that you’re using is a concurrent program - either in the sense that it serves multiple users at once, or in the sense that it has multiple asynchronous event sources it has to react to in a timely manner.

Yet most programming languages - and just about all mainstream languages - aren’t very well geared towards making concurrent applications. The programming paradigms one hears most about these days are the Object-oriented and the Functional paradigm. Ever heard of Concurrency-oriented languages? They exist, but there aren’t very many of them - and they are not mainstream but something you have to actively seek out.

What do I mean by being concurrency-oriented, or "geared towards concurrent applications"? (I’ve touched upon this question earlier.)

A number of languages with concurrency support are not much more than a sequential language with threads and locks added. This is not enough to qualify for being a concurrency-oriented language; to be that, you have to take the issues of concurrency seriously. If you take a conventional imperative language and add threads, you’d have to subtract something as well.

A typical example of a language-with-threads is Java. Java is an object-oriented language first and a concurrency-supporting language second, at best. Like most languages-with-threads, concurrency support mainly comes as an addition, an afterthought, with a library-ish feel. Sure, the Java syntax contains concurrency-related things - mainly synchronized and volatile - but nothing forces you to use them. You can write a program embodying all of the concurrency pitfalls known to man, and neither compiler nor runtime will object in the slightest.

There are rules about when the concurrency primitives are used safely. They come in the form of the Java Memory Model and Thread Specification, which states what the guarantees are and how to avoid »Surprising behaviours«.[1]

To be sure, explicitly stated guarantees is a Good Thing. But consider a world in which the type safety of Java amounted to a document stating how to avoid type errors. No static checking, no run-time checking, just rules about when you’re safe…

Smells a lot like C, right? Replace type errors, initialization errors, and memory mismanagement faults with data races and deadlocks - and lack-of-strong-typing becomes lack-of-proper-concurrency-support. Except that whereas the former kind of bugs are more or less deterministic, the latter kind seldom is. Which means that a) it is harder to debug, b) it can not be guarded against by unit tests, and c) the place where you’re likely to discover it is in production.

And just like we’re much better off with the enforced type safety of Java (nobody misses the segmentation faults which are common when developing C or C++ programs), so we’re better off when the language provides some actual guarantees in the face of concurrent activities.

And languages can do that. A language like Erlang, for example, is concurrency-oriented first and functional second - and it shows. Other examples include Oz, CML, Rust, and Clojure (Haskell probably also qualifies these days); each solves the problem of providing safe concurrency in its own way.[2]

But these languages are each isolated efforts, and they are not mainstream. Which means that they each must struggle on many fronts: evolving syntactically and semantically; keeping up with hardware and OS evolution; providing libraries for the expected functionality of the changing times; and fighting for developer mindshare.

The success of the JVM

Which brings us to the happy ecosystem centered around the JVM.

The Java Virtual Machine is thriving - to the point where it might outlive the language it was originally created for; while Java the language is evolving only slowly, the JVM is the center (qua facilitator) of a wholly different level of evolution: a multitude of languages of all sorts of paradigms, domains, typing strategies and syntactic styles are targeting the JVM.

Though the VM was originally designed for just one language - Java - it has proven so convenient to generate code for that many language implementers have chosen to do just that. (I’ve done it myself on more than one occasion.)

Why is that?

For one thing, the JVM is well-specified. Its specification is public and the rules are clear and make sense. Secondly, part of those rules is embodied in the bytecode verifier - a validation step performed on code when it is loaded into the JVM. Thirdly, the existence of the public specification has meant that the Java language and the JVM implementation(s) have been decoupled (more or less) with respect to their evolution. Java the language has been able to evolve quite a bit without the VM spec changing much (the addition of generics being one example), and the VM on its part has been able to evolve considerably performance-wise. (Besides, several competing VM implementations have come to exist.)

And the HotSpot implementation of Java is a solid piece of software engineering which has put to shame the "interpreted languages must be slow" prejudice, making it a thing of the last century.

What do these things mean for compiler writers targeting the JVM?

Targeting a statically checked bytecode format is a blessing, because many of the most difficult-to-debug kinds of error are revealed at load time, instead of resulting in mysterious memory corruptions. Bytecode verification enforces some useful invariants, reducing debugging time.

Targeting a mature VM yields performance and portability benefits - "write once, run anywhere" is attractive for language implementers as well as application developers.

Consider all the languages which run on the JVM, some of them available only on that platform.
Without that virtual machine, or something else playing the same role, would we have had so many new implementations of such a variety of languages, of such a level of maturity, performance and portability? And provided with such a wide range of libraries?

Most probably not.

The task of implementing a native code generator for just a single platform, along with runtime libraries and a decently performing general-purpose garbage collector, is not trivial.[3] The number of man-hours which could have gone to debugging segmentation faults and similar mysteries, and to implementing just the basic libraries for the new languages, but which have instead been channeled into progress in other areas, is significant. Small wonder that Microsoft chose to copy Java and make their own common cross-language platform (.NET).

Finally, apart from development of compilers, the well-defined-ness of the Java bytecode format has meant that a number of other byte-code processing tools have come to exist: tools for analysis, transformation and instrumentation of Java bytecode.

In summary: The JVM has been, and is, a great facilitator; this is especially due to its general usability, public specification, fine performance (especially after HotSpot), and - not least - bytecode verification following well-defined and well-understood rules.

The shortcomings of the JVM

But the JVM design has drawbacks as well. Some are general nuisances, some are just annoying to compiler implementers targeting the JVM, and some cause it to be less than ideal for languages belonging to certain paradigms. Some are visible to Java developers, others are purely technical.

Issues for language implementers

First of all, the JVM smells of Java.
In a number of ways, the design of the virtual machine would have been different if Java - especially the early versions of Java - had been different.

The JVM is quite class-centric, for instance - the unit of loading and verification is the class. The only kind of type definition is the class. For language implementers, this is a source of mild annoyance: translating most kinds of simple code is simple, but the moment the code uses a new record type, or variant type, or function object, or a function needs to return multiple values - it translates into adding a new separate class file to the compiler output.

The JVM is statically typed - but has no parametrized types. Therefore, type safety breaks slightly every time one of the common container classes from e.g. java.util is used, so runtime checks must be performed. This is because Java prior to 1.5 (where generics were introduced) was poor on this point.
Furthermore, the JVM instruction set has a type check instruction and a type cast instruction - but no "check-and-cast" instruction. I can only assume that this is because Java doesn’t have a language construct such as "type case", but only the combining idiom "if (x instanceof T) {T y = (T) x; …}". Presumably, this pattern is recognized and simplified internally in the common JVM implementations.

JVM methods are limited to 64KB worth of bytecode. While this is rarely a limitation for hand-written Java code, it is occasionally a problem for machine-generated code. (In Erjang, for example, we hit the limit for one version of the interpreter, and also (if memory serves) when compiling one particular Erlang module into Java bytecode.)

A special case of this is that large literals simply can’t be used, because in the Java class file format, a literal value is represented as the sequence of instructions it takes to build the value. Try compiling a literal array of, say, 12,000 elements - at four instructions per array slot, this exceeds the method-length limit, and the compiler quits with the message "code too large".[4] Certain table-driven algorithms would therefore also be hit by the limit.

The implications for language implementers are that a) one likely kind of candidate for hitting the method-length limit is interpreter loops, and b) compilers to JVM bytecode should know about and handle the limit — which gives rise to a tough implementation decision: just detecting when the limit is overrun and aborting the compilation is an honest, but abstraction-breaking solution; working around the limit in general is presumably possible, but introduces undesired (and seldom needed) complexity.[5][6]

A related problem is the JVM’s weak immutability guarantees. Specifically, there is no such thing as an immutable array. If a part of your code is given an array, or provides public access to one of its arrays, then it can’t assume that other code parts won’t modify the contents of the array at a later point.
Literal arrays are no exception: a VM implementation can’t perform constant folding of lookups into a literal array until it’s proven that that array can’t be leaked to somewhere where it might be mutated; a property which can only be established by inspecting the entire scope of the variable.
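For illustration, a tiny made-up example of the problem (class names are mine): the reference is final, but nothing protects the contents.

class Config {
    static final int[] LIMITS = {10, 20, 30};   // the reference is final...
}

class Elsewhere {
    static void sabotage() {
        Config.LIMITS[0] = -1;                  // ...but the contents are freely mutable
    }
}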

The JVM’s support for (shallowly) immutable variables comes in the form of the final field modifier. The intuition of this modifier is simple enough: a final field can be set only once, and must be set before its first use, so it should have the same value throughout its lifetime.

But in reality, things are a bit more complex - the field modifier has its warts:

  • Certain final fields may be inlined at compile time - even across classes, which has the potential for causing surprising inconsistencies.

  • Even fields declared final can be modified through reflection(!) - which, of course, doesn’t affect those accesses which have been inlined at compile-time (I believe you could call this a surprise within a surprise).

  • There’s another case where the "final fields always have the same value" rule is broken: Final fields can be observed before they gain their value.
    That abstraction breakage is demonstrated in the following program:

    public class JavaFinal {
        final static Object x = JavaFinalHelper.create();
        public static void main(String[] args) {
            System.err.println("Main: x = "+x);
        }
    }
    
    class JavaFinalHelper {
        static {
            System.err.println("Helper: x = "+JavaFinal.x);
        }
        public static Object create() {return new Object();}
    }

    Here, the helper class’s static initializer is run before JavaFinal.x has been set - exposing the null value that field has before it is initialized.

Thus, certain should-be-invariants aren’t.

Finally, and most importantly in this context, there are the shortcomings in the concurrency department.

The JVM provides no checking of the sanity of concurrent access patterns. Java’s static type system, which is the basis of the bytecode verification, is static in more than one sense: it doesn’t take time into account. If you ever get your hands on a reference to an object, then you can at any later point call any (visible) method.

In concurrent systems, concepts like life cycle, ownership, sharing and thread safety become important. The JVM doesn’t provide much support for dealing with these. For that matter, attention to these concepts is important in general for defining library boundaries - which assumptions the library and the library client can make about values passed to and from the library. Quite often, thread safety and access timing questions aren’t addressed by Java libraries' API documentations; Javadoc makes it easy to document the static types involved, but certain questions about the dynamics are not often answered.

From an Erlang point of view, such a thing as thread-local garbage collection is also desirable. I have a hypothesis that no sanely structured concurrent programs genuinely need a globally-GC’ed heap. Whether that is true or not, for global GC you either pay the price of occasional slow major garbage collections - which cost in latency - or, if your garbage collector is sufficiently advanced to do its work incrementally or concurrently with the mutating threads, pay the price of inter-thread synchronization necessary for this to work - which costs in throughput.

For the matter of this text, the verification aspects are the central points.

To sum up: The JVM lacks verification support for such things as sanity of concurrent accesses and immutability/read-only access privileges. As far as concurrency-orientation is concerned, these areas present room for improvement.
And I am not aware of any other generally targetable VMs which fare better in these respects.

The shortcomings of the BEAM

The primary contender for the title of Concurrency-oriented VM is probably the one for Erlang, which is called BEAM. The flavour of concurrency-orientation provided by BEAM includes immutable data structures, shared-nothing actor-based concurrency, light-weight processes, and process-local GC - and a few specific forms of destructive updates.

In some ways, it is very much like the Java VM: It has a mature implementation, was originally intended for just one language, comes from industry (as opposed to academia), and started out as purely interpreted but later grew optional native compilation.

But in other respects, it is very different:
While the JVM has had a public specification at least since 1999, BEAM hasn’t got a complete specification - public or not.[7]

The most complete instruction set reference I know of is this historical paper, written in 1997 but first released 2012 (prompted, I think, by the 2012 mailing list discussion). And that one only documents the internal representation, not the (somewhat different) instruction set used in BEAM files.

This lack of a specification has consequences:

  • As for BEAM code producers: While other languages than Erlang have been developed for BEAM, none produce BEAM code directly - instead, they target an intermediate representation of the Erlang compiler and use that compiler’s backend for producing actual BEAM code.

  • As for BEAM consumers: Until fairly recently, no other BEAM VM implementations existed but the official one from Ericsson.

The consumer situation has changed a bit; people have begun making other implementations such as Erjang, Browserl and Ling (which run on, respectively, JVM, Javascript, and bare metal).

But the lack of a specification - a clear definition of what is and what isn’t allowed - means that there are differences in what the implementations can run.

At least two examples of this are known to me:

  1. Erjang assumes that exception handlers are set up in an execution-path-independent manner. While this is presumably true for code generated by the Erlang compiler, I can imagine BEAM code in which this isn’t the case, and which the official BEAM implementation will run cheerfully, while Erjang will balk at it.

  2. There is a gotcha hidden in the external instruction set - there are nine instructions for calling functions: three for local calls (call, call_last, call_only - depending on whether it’s a tail call, and whether there’s a stackframe to deallocate); three for external calls (call_ext, call_ext_last, call_ext_only); two for calls with dynamic module and/or function name (apply, apply_last); and one for calling function objects (call_fun).

    Can you guess what the trap is?

    Erlang is a language with guaranteed proper tail recursion, but the only instruction for calling function objects is a non-tail-call.
    How can that work??

    By cheating. The official BEAM rewrites the instruction set used in BEAM files into a richer instruction set used internally. The rewriting rules contain a rule which recognizes the instruction sequence "call_fun; deallocate; return" and rewrites it into the internal tail-call version of call_fun.

    The Erlang compiler, obviously, is in on the joke, and generates code accordingly.

This call_fun gotcha was one that both Erjang and Browserl missed. I only discovered it accidentally, and fixed it in Erjang, but as far as I can tell, Browserl doesn’t handle that corner case (yet).

So: a clear, public specification does matter. A specification backed by a formal checking mechanism such as a bytecode verifier is even better.[8]

From a "multiple different languages on one VM" viewpoint, there are of course other issues as well - for instance, the BEAM is rather Erlang-centered.

  • No in-process destructive updates are supported - even where it would be safe.

  • Only the concurrency primitives needed by Erlang are present - and the form in which they are present is so high-level that there is very little flexibility for other languages to be very non-Erlang. A process can’t have more than one inbox; the set of supported transactions on tables is fixed; binaries support efficient appending but not prepending; and so on.

In summary: The BEAM is a mature, well-tuned, and actively evolving VM for running Erlang programs, but it probably lacks the flexibility, and (as things stand) certainly the well-defined-ness, necessary for it to be viewed as a general platform for concurrency-oriented languages.

What would a concurrency-oriented VM look like?

If the JVM isn’t it, and the BEAM isn’t it, what would a concurrency-oriented VM then look like?

The BEAM offers concurrency safety, in that it does not allow[9] shared mutable state without clearly defined transactions; this is a property we want.

The JVM guarantees the absence of certain kinds of errors, because its rules make the necessary invariants locally verifiable. We also want that. More specifically, the rules should guarantee against not only type errors, but also concurrency-related errors (data race conditions).

The JVM also enforces these invariants by verifying code at load time - rather than trust the compiler to generate only valid code. We want that, too - because we intend for there to be more than one compiler, for more than one language.

Both the JVM and the BEAM support "native functions" - functions implemented in C or similar - e.g. for interfacing with third-party libraries or for performance-sensitive parts of a program. It would be foolish to constrain the applications of a new VM by not supporting this. However, the aim should be to encourage as much as possible to be written in a concurrency-safe language - i.e. while library integration would be a good reason for using native functions, performance shouldn’t be; there should not be things which are inherently slow to implement on the VM.[10]

Thus, while the rule "all data structures are immutable" (as known from Erlang) makes it easy to guarantee against data races, it is not good enough - it either makes it impossible to implement certain things efficiently, or encourages diving into a lower-level language to escape the rules. We want a bit more generality than that, also for the sake of supporting other paradigms than Erlang’s - mutating operations should be supported as long as they are safe and can be proven (reasonably easily) to be so.

For example, one step up, we have the rule "all data structures which are ever visible to more than one process/thread must be immutable", which is much more flexible yet also guarantees against data races (and can be made locally verifiable). Even higher up that ladder, shared mutable data are allowed, as long as proper synchronization is in place.

The design I have in mind is based on the concept of linear types, which appear to be a good match, in that they make it possible to capture lifecycles of objects in the type system - which goes a long way towards making certain concurrency-related guarantees locally verifiable (because half of concurrency is timing).

I plan to elaborate on linear types and their use as basis for a VM in future posts.

Acknowledgements

I would like to thank Peter Mechlenborg and Thomas In Sook Holmen for their help with forming this text.


[1]   This is the place where you learn that if you have one thread writing a long or double value, and another thread reading it, the JVM implementation is allowed to let the reading thread see half-updates - where the two words of the read values come from different writes.

[2]   Go and CML do allow shared mutable memory and thus are not strictly data-race safe; they do, however, encourage communication to be done in other, safer ways.

[3]   Admittedly, LLVM has come along in the meantime, promising to relieve some of that burden.

[4]   Technically, the compiler could handle large literals by splitting the initialization into two methods, or - for integers - maybe even initialize the array based on a string constant. But apparently the Java compiler has been taught no such trick.

[5]   The Scala compiler (as of version 2.9.1) handles large literals in the same way as Java - except that it does not detect when the resulting method is too long; the resulting class file is rejected by the JVM at load time.

[6]   When developing Erjang, we ran into the limit both when writing the (machine-generated) interpreter and when compiling an Erlang module containing a particularly complex function.

[7]   The question of the BEAM spec has come up multiple times on the Erlang mailing lists, for instance in 2003, in 2007 and in 2012 - see erlang.org/pipermail/erlang-questions/2012-May/066628.html.

[8]  The Erlang compiler has a verifier in its backend, which does certain sanity checks on the generated code - but it is not complete, nor is it meant to be.

[9]   Except through the use of native functions (NIFs) or similar.

[10]   For preference, there should be a rather small trusted kernel written in C or similar. The official BEAM implementation is around 140K lines of C code for the emulator proper; I hope that less than that would be necessary.

Tuesday, October 25, 2011

The Impossible Isolates

…in which I continue to expound on the vast superiority of the Erlang Way, this time with the young and innocent Dart on the receiving end (but don't worry, Dart can handle multiple of those. The problem is that it must).

Google recently published a "technical preview" of the Dart language. It contains a notion of “isolates” — described by the language specification as "actor-like entities" — which serve partly as sandboxes, partly as "unit[s] of concurrency".

In the following, I intend to demonstrate that though the Dart isolates are presented as “inspired by Erlang [processes]”, there is much to be gained by taking a few more pages out of Erlang's book.

What's Impossible in Your Favorite Language?

When programming languages are discussed, focus tends to be on the things that are possible in a given language.
But what's at least as important — if you're doing anything beyond toy programs and obfuscation contests — are the things that are impossible in the language. What can't happen?

Impossibilities are very useful. They can be a great help when reasoning about programs — and when constructing programs which lend themselves to reasoning.
The impossible helps isolate relevant effect from irrelevant cause.

It's the same in real life: we do a lot of optimizations based on what we know can't happen; being able to make assumptions about the world is what lets us do, well, anything complex at all.

Gravity is quite reliable, for instance. I rarely bother to strap things to my desk these days, but rely on plain old gravity to keep them where I put them.

If I place something on my desk, then come back later to find that thing lying on the floor, and wonder how that came to be — then I may suspect a number things, but gravity outage isn't even considered a candidate explanation.
That's the difference between unlikely and impossible.

(With the different large-ish systems I've been working on for the past few years, unlikely happens all the time. And implausible (though not impossible) happens about once every 1–2 years (these events often involve virtualization).)

Narrowing down the "what may happen / what may have happened" scope is a crucial part of program debugging, understanding and maintenance, and hence of developing.

The blessings of Erlang

One of the critical things that Erlang provides, and a major reason why I like it, is the way its semantics and building blocks prevent surprises.
Two processes can't interfere with each others' state; they can't affect each others' control flow except through message passing; and if a process goes down, it disappears completely, without leaking resources.

The great thing about Erlang is not in itself that it allows you to do many things simultaneously, but that at the same time it lets you focus on one thing at a time. It is just as much about being able to think non-concurrently, as it's about concurrency. That's the way that complexity is handled: by isolation, through focus on the task at hand.
Much of the time, the result is that concerns separate nicely.

The three pillars of sound state

As Kresten Krab Thorup points out in his posting “Dart: An Erlanger's Reflections”, the three major features of Erlang which enable such focus on one task are isolation, sequencing, and fault handling.

These features consist of four (1+1+2) properties which can be summed up as, respectively,

  • Other processes cannot interfere with a process's state (except through message passing)
  • A process cannot interfere with its own state (but has a single, predictable control flow)
  • A process which fails cannot continue to run after the failure (with a possibly corrupt state)
  • When a process fails, and thus ceases to run, entities dependent on that process will be notified of its demise in a timely manner.

(Or, more concisely: “Others won't interfere”, “I myself won't interfere”, “Failure will not linger”, and “Failure will not go unnoticed”.)

Of these, I will in the following focus on the first three — the ones which are, incidentally, stated as impossibilities of the aforementioned kind.
The absence of any of these properties would destroy locality in reasoning.

One consequence of this set of properties is related to the famous "Let it crash" philosophy of the Erlang tradition: that even error handling is done only by the survivors — those processes with a “good” (non-corrupted) state.

The state of Dart (and stacklessness of its isolates)

The reason I write about all of this just now is related to Google's recently-published language “Dart” (released as an early preview; the spec is not final yet).

Dart introduces a notion of isolates — like Erlang's light-weight processes, isolates have separate heaps. Unlike Erlang processes, however, isolates don't have separate stacks, but are activated on an empty stack each time, through callbacks which process the incoming messages on a FIFO basis.

And as far as I can tell from the specification, nothing particular happens to an isolate if one of its callbacks terminates abnormally.
[Actually, I am told that as it is now, that will cause the Dart VM to crash. I will assume that this behaviour is not intended.]

Indeed, isolates appear to be passive rather than active objects — as far as I can tell, their life spans are determined, not explicitly and from within like Erlang processes' life spans, but implicitly and from without, by reachability, like OOP objects.

And that difference is a significant one: it breaks two of the three “pillars of sound state”: an isolate can surprise itself by handling an unrelated event when it perhaps shouldn't, and an event handler crashing can leave the isolate's state inconsistent without any consequences for the isolate's lifecycle.

Basically, what this means is that the invariants of an isolate's state must be reestablished by all of its event handlers, always, regardless of whether and how they terminate abnormally. This may just be par for the course in the eyes of many programmers, of course, but from an Erlang developer's perspective it's very much a “close but no cigar” situation.

The state-space explosion that commonly happens when a state machine must be able to handle every kind of event in every state is exactly one of the things that drove the development of Erlang in the first place (and led to the “selective receive” feature).

Case in point: Synchronous calls

An example of a common pattern where this matters is as follows: One actor needs, during the processing of a message, to contact another actor for information necessary to complete the processing. This is a situation that commonly occurs in Erlang programs, and in Erlang, the typical solution would be to make a synchronous call to the other actor: send a request, then wait for a response before continuing execution. (In most cases, a timeout would be set as well, to ensure that execution will in fact continue.)

This send-request, wait-for-response operation pair can be put together in a function, so that the invocation of another actor looks just like a normal function invocation — the communication can be encapsulated; in fact, the send/receive code is rarely actually spelled out, because it is already provided by the standard library.
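
To make the encapsulation concrete, here is a minimal sketch of such a wrapper; the message protocol and the function name are invented for this example — in practice, OTP's gen_server:call/3 provides exactly this pattern:

%% Minimal sketch of a synchronous call wrapper (hypothetical protocol).
call(Server, Request, Timeout) ->
    Ref = make_ref(),                        % unique tag pairing request and reply
    Server ! {call, self(), Ref, Request},   % send the request
    receive
        {reply, Ref, Result} ->              % wait for the matching reply...
            Result
    after Timeout ->                         % ...but bound the wait, so execution continues
        exit(timeout)
    end.

At the call site, this looks exactly like an ordinary function call; the fact that another actor is involved is hidden inside the wrapper.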

In Dart, the situation is somewhat different: to do a synchronous call to another actor, you'd create a Promise (a one-shot message queue), and send the send-end + the request to the actor in question. That actor would then put the result into the Promise. But the caller cannot just explicitly wait for the response; instead, it must register a response handler on the Promise. The handler would typically be a closure which carries whatever state is needed to complete the processing.

All of these operations complete right away — execution continues immediately. This means that the remote call cannot be encapsulated and treated like a normal function call; modularity suffers. In Erlang, it is easy to change whether, how, and to which other actor such calls are made — it's a local change, completely transparent to the call site; in Dart, it looks like it wouldn't be so simple a matter.

If you were to approach the Erlang way in Dart, you'd have to write in continuation passing style (CPS), so that the call would always happen on a nearly empty stack and returning from a function would mean returning from the event handler; this means that the current constraint — that event handling always begins and ends with an empty stack — wouldn't matter, because the real stack would be in the continuation.
When using CPS, calling another actor could indeed be made to look just like another function call.

Except that it wouldn't behave like an ordinary function call, because other incoming messages could be processed between the call request and the call response — leading to the problems mentioned above with self-interference.

The funny thing about this CPS-based approach, by the way, is that this attempt to buy back the between-events stack that is necessary for encapsulation of calls essentially overdoes it: what you get is not one call stack, but a number of simultaneous call stacks (disguised as response handler continuations), all ready to continue execution as responses become ready. A multitude of threads of execution which can trip each other up and cause all sorts of nondeterminism-induced problems.

If the intention of the single-threaded isolates is to reduce the complexity of developing concurrent systems, then this would not be a particularly good outcome.
No stack is too little; a multitude of stacks is too many. One is the number we want, the one we can reason about.

Please animate the isolates!

The above describes the present situation, as I read the Dart specification (which I must admit I haven't read end-to-end; I've mainly been focusing on the isolate- and exception-related parts).

Since the Dart language is still in development, and Google is requesting input to the development process, these things may of course change.

My input to the process, then, is this:

Give the isolates an explicit life cycle — just isolating the heap, thus ensuring sequential heap access, is a good first step, but to fully simplify matters and keep invariants local, an isolate must be in control of its own life cycle, including which types of events it will be ready to process at any given moment.
In short, give them liberty and give them death! (to paraphrase Patrick Henry rather freely).

Monday, October 10, 2011

Dynamic Selective Receive — an Erlang hack

Erlang is a language which is dynamic in many aspects.
One of the things that are resolved statically, however, is pattern matching; the decision trees used for branching in function clauses, case statements, and receive statements are constructed at compile time.

For instance, a set of patterns like

[X] when X==0 orelse X==100
[H|T]
[]

is translated by the Erlang compiler into the following decision tree:

is_nonempty_list(B)?
+-Y-> [H|T] = B
|     is_nil(T)?
|     +-Y-> H == 0?
|     |     +-Y-> case 1
|     |     |     ^
|     |     |     |
|     |     |     Y
|     |     |     |
|     |     +-N-> H == 100?
|     |           |
|     |           N
|     |           |
|     |           v
|     +---------> case 2
|
+-N-> is_nil(B)?
      +-Y-> case 3
      +-N-> No match.

The Erlang compiler is quite good at constructing such decision trees, which is good, because it's critical to the performance of typical Erlang programs.
(Actually, if the compiler was poor at that job, “typical Erlang programs” would probably often look different, because developers would be more inclined to write their own decision trees. Compilers form developers, to some extent.)

If you want to construct a predicate dynamically, you can do that, using function composition or an interpreter function — that is, for instance, how the Eshell is implemented. For example, the first pattern above can be described as

{is_pair,
 {'or', {is_eq, 0}, {is_eq, 100}},
 is_nil}

which can be interpreted by a predicate evaluator like:

eval(is_nil, []) -> true;
eval({is_pair, HP, TP}, [H|T]) -> eval(HP,H) andalso eval(TP,T);
eval({is_eq, V}, V) -> true;
eval({'or', P1,P2}, V) -> eval(P1,V) orelse eval(P2,V);
eval(P, V) when is_function(P,1) -> P(V);
eval(_, _) -> false.
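
For instance, assuming the eval/2 clauses above live in a module named pred (a name chosen just for this example), the dynamically-built predicate can be applied like this:

P = {is_pair, {'or', {is_eq, 0}, {is_eq, 100}}, is_nil},
true  = pred:eval(P, [100]),   %% head is 100, tail is nil: matches
false = pred:eval(P, [1, 2]).  %% tail is not nil: no match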

However, you can't do a selective receive using such a predicate, because one of those statically-constructed decision trees is always used for selecting which message to extract from the inbox; and once it's taken out of the inbox it cannot be put back again. Only after a message is extracted from the inbox can more general code be applied to it.
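
To see one face of this limitation: the obvious attempt isn't even legal Erlang, because user-supplied functions cannot be called in guards. A sketch of the rejected code:

%% Not legal Erlang: calling an arbitrary fun in a guard is an illegal
%% guard expression, so a runtime-supplied predicate cannot steer the
%% selective receive directly.
bad_receive(Pred, Timeout) ->
    receive
        Msg when Pred(Msg) -> {ok, Msg}
    after Timeout ->
        timeout
    end.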

When is this a problem?
Well, the Erlang shell is one example; here, dynamic selective receive will have to be faked: buffering is needed, and in fact the timeout mechanism of the Erlang receive construct is (last I checked, which is a few years ago) emulated only incompletely. I know that I filed a bug report about that issue, but I can't find it now.

Also, it's an issue that may sometimes arise when you're trying to combine two concerns in one process, and each of the concerns communicates with other processes; in such cases, it would be convenient to have a parameterizable selective receive.

Solutions

For this problem, I've found a solution which looks promising.

The solution is sufficiently cheaty, however, that I should probably first mention a couple of solutions that are more “in the spirit”; all three solutions can be considered to be cheating, but at least I can offer different flavours of unruliness.

  1. Construct, dynamically, a module which implements a function which does the selective receive. Constructing and compiling modules dynamically is relatively easy in Erlang. Calling a function in a runtime-specified module is just everyday Erlang; it's Erlang's primary kind of polymorphism.

    You wouldn't want to do this module construction business for something that needs to be done a million times, but it's probably a good exercise (a rough sketch follows the list below).

  2. Take as a parameter a predicate function which is to be used for selecting the message. Access the process's inbox through the backdoor: process_info(self(), messages), and apply the predicate to each message in turn until an acceptable one is found (or the end of the list is reached, in which case you wait a bit and retry). Now you know exactly which message you want to extract, and that's straightforward to do (the decision tree for that is static).

    I can only recommend this approach if you don't care about performance or ugly hacks. Also, there's no really good way of handling the no-such-message-yet case, as far as I can see (a rough sketch of this approach also follows the list).

  3. Take advantage of the fact that what is possible in Beam — the VM's instruction set — is a superset of what's possible in Erlang. In particular with respect to the receive construct. This is the solution that I will describe below.
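
Before getting to solution 3, here is a rough sketch of what solution 1 could look like — the module name and the hard-coded pattern are invented for illustration, and error handling is omitted:

%% Solution 1, sketched: parse each source form, compile the module in
%% memory, and load the resulting binary (hypothetical module name).
make_receiver() ->
    Forms = [parse_form(S) ||
                S <- ["-module(dyn_recv_gen).",
                      "-export([recv/1]).",
                      "recv(Timeout) -> receive {ping, X} -> {pong, X}"
                      " after Timeout -> timeout end."]],
    {ok, Mod, Bin} = compile:forms(Forms),
    {module, Mod} = code:load_binary(Mod, "dyn_recv_gen.erl", Bin),
    Mod.   %% the caller can now invoke Mod:recv(1000)

parse_form(Source) ->
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.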
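
And a rough sketch of solution 2, polling retry and all — again, the function name is invented, and this is only meant to show the shape of the hack:

%% Solution 2, sketched: peek at the inbox through process_info/2, then
%% extract the exact term found with a (statically compiled) receive.
pred_receive(_Pred, Timeout) when Timeout =< 0 ->
    timeout;
pred_receive(Pred, Timeout) ->
    {messages, Msgs} = process_info(self(), messages),
    case [M || M <- Msgs, Pred(M)] of
        [Msg | _] ->
            receive Msg -> {ok, Msg} end;   %% Msg is bound, so this matches that exact term
        [] ->
            timer:sleep(50),                %% no good way to wait for "some new message" -- poll
            pred_receive(Pred, Timeout - 50)
    end.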

The following module is a quite general solution, using a matchspec as the predicate. It is written in Erlang assembly, because it is only at this level that the hack becomes possible:

%% File: dyn_sel_recv.S
{module, dyn_sel_recv}.  %% version = 0
{exports, [{match_spec_receive,2}]}.
{labels, 5}.

{function,match_spec_receive,2,2}.
  {label,1}.
    {func_info,{atom,dyn_sel_recv},{atom,match_spec_receive},2}.
  {label,2}.
    {allocate,2,2}.
    {move,{x,1},{y,0}}.
    {move,{x,0},{y,1}}.
  {label,3}.
    {loop_rec,{f,5},{x,0}}.
    %% x0 = Msg, y0 = Timeout, y1 = CMS
    {test_heap,2,2}.
    {put_list,{x,0},nil,{x,0}}.
    %% x0 = [Msg]
    {move,{y,1},{x,1}}.
    %% x1 = CMS
    {call_ext,2,{extfunc,ets,match_spec_run,2}}.
    %% x0 = [] | [Res]
    {test,is_nonempty_list,{f,4},[{x,0}]}.
    {get_list,{x,0},{x,1},{x,2}}.
    {move,{x,1},{x,0}}.
    %% x0 = Res
    remove_message.
    {deallocate,2}.
    return.
  {label,4}.
    {loop_rec_end,{f,3}}.
  {label,5}.
    {wait_timeout,{f,3},{y,0}}.
    timeout.
    {move,{atom,timeout},{x,0}}.
    {deallocate,2}.
    return.

To test it, we have this module:

%% File: dyn_recv_test.erl
-module(dyn_recv_test).
-include_lib("stdlib/include/ms_transform.hrl").
-export([test/0]).

test() ->
    %% The pattern (and what it should result in):
    MS = ets:fun2ms(fun({ping, X}) -> {pong,X} end),

    %% Compile the pattern:
    CMS = ets:match_spec_compile(MS),

    %% Perform a selective receive based on the pattern:
    dyn_sel_recv:match_spec_receive(CMS, 1000).

A sample interaction, with comments marked with "#" and "%":

# Compile the code:
$ erlc dyn_recv_test.erl dyn_sel_recv.S

# Then start an Erlang shell:
$ erl
Erlang R14B (erts-5.8.1) [source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.1  (abort with ^G)
1> dyn_recv_test:test(). % No appropriate messages - times out
timeout
% Fill the inbox with test data:
2> self() ! 'before'.
before
3> self() ! {ping, 144}.
{ping,144}
4> self() ! {ping, 1234}.
{ping,1234}
5> self() ! 'after'.
'after'
6> process_info(self(), messages). % There are four messages in the inbox now
{messages,[before,{ping,144},{ping,1234},'after']}
7> dyn_recv_test:test(). % Receive first ping message
{pong,144}
8> dyn_recv_test:test(). % Receive second ping message
{pong,1234}
9> dyn_recv_test:test(). % Times out again - no more ping messages
timeout
10> process_info(self(), messages). % Now, only the non-pings are left.
{messages,[before,'after']}

Discussion

Why restrict ourselves to matchspecs?

In principle you could instead construct a receive function which accepts a general predicate function, but that would be risky — while I'm reasonably confident that the matchspec hack works (no guarantees though!), I'm fairly certain that the Erlang VM would take offense if such a predicate were to e.g. do any receiving of its own, or throw exceptions.

Matchspecs, on the other hand, can do only what is possible in a regular pattern-matching construct in Erlang — in particular, no side effects are allowed.

And matchspecs can be preprocessed (“compiled”) into a form which is interpreted efficiently; yet their source form can be constructed with relative ease at runtime. Finally, matchspecs have the nice property that even though they are represented as ordinary Erlang terms, and as such may be somewhat hard to read (and write) if they are complex, there is a parse-tree transformation which makes it possible to write them just as ordinary Erlang patterns and conditions — this is demonstrated in the above.
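
For illustration, the fun used in the test module translates into an ordinary term, which ets:match_spec_run/2 then applies to a list of candidate terms — roughly like this in the shell (exact printing may vary between releases):

1> MS = ets:fun2ms(fun({ping, X}) -> {pong, X} end).
[{{ping,'$1'},[],[{{pong,'$1'}}]}]
2> ets:match_spec_run([{ping, 7}, 'before'], ets:match_spec_compile(MS)).
[{pong,7}]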

Now, I haven't yet had the chance to use this hack in a real context, but here it is. If nothing else, I suppose it demonstrates that it is quite possible to allow matchspecs in Erlang's pattern-matching constructs — at least that there does not seem to be any VM-level reason against it. I think that might be an interesting language extension; then again, there are plenty of other language extension discussions, and this suggestion would probably not get high priority.