Tuesday, March 26, 2013

The Case for a Concurrency-oriented VM

In the following, I’ll describe the background for my current long-running project - why I believe it to be a worthwhile endeavour.

The overall line of thought goes like this: concurrency and concurrency-oriented languages are important; we have such languages, but they are few and far between; there exists a non-CO ecosystem (the one around the JVM) which has spawned dozens of languages and allows those languages to cooperate and prosper; which factors have been significant in creating this ecosystem, and how can they be harnessed to improve conditions for concurrency-oriented languages?

We begin at concurrency and how programming languages deal with it.

Concurrency and languages

A fair number of the programs we use every day are doing more than one thing at a time. Every web server, smartphone app, GUI application, operating system, chat server, VoIP system, game server, music player, video player, etc. that you’re using are concurrent programs - either in the sense that they are serving multiple users at once, or in the sense that they have multiple asynchronous event sources they have to react to in a timely manner.

Yet most programming languages - and about all mainstream languages - aren’t very well geared towards making concurrent applications. The programming paradigms one hears most about these days are the Object-oriented and the Functional paradigm. Ever heard of Concurrency-oriented languages? They exist, but there aren’t very many of them - and they are not mainstream but something you have to actively seek out.

What do I mean by being concurrency-oriented, or "geared towards concurrent applications"? (I’ve touched upon this question earlier.)

A number of languages with concurrency support are not much more than a sequential language with threads and locks added. This is not enough to qualify for being a concurrency-oriented language; to be that, you have to take the issues of concurrency seriously. If you take a convential imperative language and add threads, you’d have to subtract something as well.

A typical example of a language-with-threads is Java. Java is an object-oriented language first and a concurrency-supporting language second, at best. Like most languages-with-threads, concurrency support mainly comes as an addition, an afterthought, with a library-ish feel. Sure, the Java syntax contains concurrency-related things - mainly synchronized and volatile - but nothing forces you to use them. You can write a program embodying all of the concurrency pitfalls known to man, and neither compiler nor runtime will object in the slightest.

There are rules about when the concurrency primitives are used safely. They come in the form of the Java Memory Model and Thread Specification, which states what the guarantees are and how to avoid »Surprising behaviours«.[1]

To be sure, explicitly stated guarantees is a Good Thing. But consider a world in which the type safety of Java amounted to a document stating how to avoid type errors. No static checking, no run-time checking, just rules about when you’re safe…

Smells a lot like C, right? Replace type errors, initialization errors, and memory mismanagement faults with data races and deadlocks - and lack-of-strong-typing becomes lack-of-proper-concurrency-support. Except that whereas the former kind of bugs are more or less deterministic, the latter kind seldom is. Which means that a) it is harder to debug, b) it can not be guarded against by unit tests, and c) the place where you’re likely to discover it is in production.

And just like we’re much better off with the enforced type safety of Java (nobody misses the segmentation faults which are common when developing C or C++ programs), so we’re better off when the language provides some actual guarantees in the face of concurrent activities.

And languages can do that. A language like Erlang, for example, is concurrency-oriented first and functional second - and it shows. Other examples include Oz, CML, Rust, and Clojure (Haskell probably also qualifies these days); each solves the problem of providing safe concurrency in its own way.[2]

But these languages are each isolated efforts, and they are not mainstream. Which means that they each must struggle on many fronts: evolving syntactically and semantically; keeping up with hardware and OS evolution; providing libraries for the expected functionality of the changing times; and fighting for developer mindshare.

The success of the JVM

Which brings us to the happy ecosystem centered around the JVM.

The Java Virtual Machine is thriving - to the point where it might outlive the language it was originally created for; while Java the language is evolving only slowly, - the JVM is the center (qua facilitator) of a wholly different level of evolution: a multitude of languages of all sorts of paradigms, domains, typing strategies and syntactic styles are targeting the JVM.

Though the VM was originally designed for just one language - Java - it has proven so convenient to generate code for that many language implementers have chosen to do just that. (I’ve done it myself on more than one occasion.)

Why is that?

For one thing, the JVM is well-specified. Its specification is public and the rules are clear and make sense. Second, part of those rules are embodied in the bytecode verifier - a validation step performed on code when it is loaded into the JVM. Thirdly, the existence of the public specification has meant that the Java language and the JVM implementation(s) have been decoupled (more or less) with respect to their evolution. Java the language has been able to evolve quite a bit without the VM spec changing much (The addition of generics being one example), and the VM on its part has been able to evolve quite much performance-wise. (Besides, several competing VM implementations have come to exist.)

And the HotSpot implementation of Java is a solid piece of software engineering which has put to shame the "interpreted languages must be slow" prejudice, making it a thing of the last century.

What does these things mean for compiler writers targeting the JVM?

Targeting a statically checked bytecode format is a blessing, because many of the most difficult-to-debug kinds of error are revealed at load time, instead of resulting in mysterious memory corruptions. Bytecode verification enforces some useful invariants, reducing debugging time.

Targeting a mature VM yields performance and portability benefits - "write once, run anywhere" is attractive for language implementers as well as application developers.

Consider all the languages which run on the JVM, some of them available only on that platform.
Without that virtual machine, or something else playing the same role, would we have had so many new implementations of such a variety of languages, of such a level of maturity, performance and portability? And provided with such a wide range of libraries?

Most probably not.

The task of implementing a native code generator for just a single platform, along with runtime libraries and a decently performing general-purpose garbage collector, is not trivial.[3] The number of man-hours which could have gone to debugging segmentation fault and similar mysteries, and to implement just the basic libraries for the new languages, but which have instead been channeled to progress in other areas, are significant. Small wonder that Microsoft chose to copy Java and make their own common cross-language platform (.NET).

Finally, apart from development of compilers, the well-defined-ness of the Java bytecode format has meant that a number of other byte-code processing tools have come to exist: tools for analysis, transformation and instrumentation of Java bytecode.

In summary: The JVM has been, and is, a great facilitator; this is especially due to its general usability, public specification, fine performance (especially after HotSpot), and - not least - bytecode verification following well-defined and well-understood rules.

The shortcomings of the JVM

But the JVM design has drawbacks as well. Some are general nuisances, some are just annoying to compiler implementers targeting the JVM, and some cause it to be less than ideal for languages belonging to certain paradigms. Some are visible to Java developers, others are purely technical.

Issues for language implementers

First of all, the JVM smells of Java.
In a number of ways, the design of the virtual machine would have been different if Java - especially the early versions of Java - had been different.

The JVM is quite class-centric, for instance - the unit of loading and verification is the class. The only kind of type definition is the class. For language implementers, this is a source of mild annoyance: translating most kinds of simple code is simple, but the moment the code uses a new record type, or variant type, or function object, or a function needs to return multiple values - it translates into adding a new separate class file to the compiler output.

The JVM is statically typed - but has no parametrized types. Therefore, type safety breaks slightly every time one of the common container classes from e.g. java.util is used, so runtime checks must be performed. This is because Java prior to 1.5 (where generics where introduced) was poor on this point.
Furthermore, the JVM instruction set has a type check instruction and a type cast instruction - but no "check-and-cast" instruction. I can only assume that this is because Java doesn’t have a language construction such as "type case", but only the combining idiom "if (x instanceof T) {T y = (T) x; …}". Presumably, this pattern is recognized and simplified internally in the common JVM implementations.

JVM methods are limited to 64KB worth of bytecode. While this is rarely a limitation for hand-written Java code, it is occassionally a problem for machine-generated code. (In Erjang, for example, we hit the limit for one version of the interpreter, and also (if memory serves) when compiling one particular Erlang module into Java bytecode.)

A special case of this is that large literals simply can’t be used, because in the Java class file format, a literal value are represented as the sequence of instructions it takes to build the value. Try compiling a literal array of, say, 12,000 elements - at four instructions per array slot, this exceeds the method-length limit, and the compiler quits with the message "code too large".[4] Certain table-driven algorithms would therefore also be hit by the limit.

The implications for language implementers are that a) one likely kind of candidate for hitting the method-length limit is interpreter loops, and b) compilers to JVM bytecode should know about and handle the limit — which gives rise to a tough implementation decision: just detecting when the limit is overrun and aborting the compilation is an honest, but abstraction-breaking solution; working around the limit in general is presumably possible, but introduces undesired (and seldomly needed) complexity.[5][6]

A related problem is JVM’s bad immutability guarantees. Specifically, there is no such thing as an immutable array. If a part of your code is given an array, or provides public access to one it its arrays, then it can’t assume that other code parts won’t modify the contents of the array at a later point.
Literal arrays are no exception: a VM implementation can’t perform constant folding of lookups into a literal array until it’s proven that that array can’t be leaked to somewhere where it might be mutated; a property which can only be established by inspecting the entire scope of the variable.

The JVM’s support for (shallowly) immutable variables comes in the form of the final field modifier. The intuition of this modifier is simple enough: a final field can be set only once, and must be set before its first use, so it should have the same value throughtout its lifetime.

But in reality, things are a bit more complex - the field modifier has its warts:

  • Certain final fields may be inlined at compile time - even across classes, which has the potential for causing surprising inconsistencies.

  • Even fields declared final can be modified though reflection(!) - which, of course, doesn’t affect those accesses which have been inlined at compile-time (I believe you could call this a surprise within a surprise).

  • There’s another case where the "final fields always have the same value" rule is broken: Final fields can be observed before they gain their value.
    That abstraction breakage is demonstrated in the following program:

    public class JavaFinal {
        final static Object x = JavaFinalHelper.create();
        public static void main(String[] args) {
            System.err.println("Main: x = "+x);
        }
    }
    
    class JavaFinalHelper {
        static {
            System.err.println("Helper: x = "+JavaFinal.x);
        }
        public static Object create() {return new Object();}
    }

    Here, the helper class’s static initializer is run before JavaFinal.x has been set - exposing the null value that field has before it is initialized.

Thus, certain should-be-invariants aren’t.

Finally, and most importantly in this context, there are the shortcomings in the concurrency department.

The JVM provides no checking of the sanity of concurrent access patterns. Java’s static type system which is the basis of the bytecode verification is static in more than one sense: It doesn’t take time into account. If you ever get hands on a reference to an object, then you can at any later point call any (visible) method.

In concurrent systems, concepts like life cycle, ownership, sharing and thread safety become important. The JVM doesn’t provide much support for dealing with these. For that matter, attention to these concepts is important in general for defining library boundaries - which assumptions the library and the library client can make about values passed to and from the library. Quite often, thread safety and access timing questions aren’t addressed by Java libraries' API documentations; Javadoc makes it easy to document the static types involved, but certain questions about the dynamics are not often answered.

From an Erlang point of view, such a thing as thread-local garbage collection is also desirable. I have a hypothesis that no sanely structured concurrent programs genuinely need a globally-GC’ed heap. Whether that is true or not, for global GC you either pay the price of occasional slow major garbage collections - which cost in latency - or, if your garbage collector is sufficiently advanced to do its work incrementally or concurrently with the mutating threads, pay the price of inter-thread synchronization necessary for this to work - which costs in throughput.

For the matter of this text, the verification aspects are the central points.

To sum up: The JVM lacks verification support for such things as sanity of concurrent accesses and immutability/read-only access privileges. As far as concurrency-orientation is concerned, these areas present room for improvement.
And I am not aware of any other generally targetable VMs which fare better in these respects.

The shortcomings of the BEAM

The primary contender for the title of Concurrency-orientated VM is probably the one for Erlang, which is called BEAM. The flavour of concurrency-orientation provided by BEAM includes immutable data structures, shared-nothing actor-based concurrency, light-weight processes, and process-local GC - and a few specific forms of destructive updates.

In some ways, it is very much like the Java VM: It has a mature implementation, was originally intended for just one language, comes from industry (as opposed to academia), and started out as purely interpreted but later grew optional native compilation.

But in other respects, it is very different:
While the JVM has had a public specification at least since 1999, BEAM hasn’t got a complete specification - public or not.[7]

The most complete instruction set reference I know of is this historical paper, written in 1997 but first released 2012 (prompted, I think, by the 2012 mailing list discussion). And that one only documents the internal representation, not the (somewhat different) instruction set used in BEAM files.

This lack of a specification has consequences:

  • As for BEAM code producers: While other languages than Erlang have been developed for BEAM, none produce BEAM code directly - instead, they target an intermediate representation of the Erlang compiler and use that compiler’s backend for producing actual BEAM code.

  • As for BEAM consumers: Until fairly recently, no other BEAM VM implementations existed but the official one from Ericsson.

The consumer situation has changed a bit; people have begun making other implementations such as Erjang, Browserl and Ling (which run on, respectively, JVM, Javascript, and bare metal).

But the lack of a specification - a clear definition of what is and what isn’t allowed - means that there are differences in what the implementations can run.

At least two example of this are known to me:

  1. Erjang assumes that exception handlers are set up in a execution path independent manner. While this is presumably true for code generated by the Erlang compiler, I can imagine BEAM code in which this isn’t the case, and which the official BEAM implementation will run cheerfully, while Erjang will balk at it.

  2. There is a gotcha hidden in the external instruction set - there are nine instructions for calling functions: three for local calls (call, call_last, call_only - depending on whether it’s a tail call, and whether there’s a stackframe to deallocate); three for external calls (call_ext, call_ext_last, call_ext_only); two for calls with dynamic module and/or function name (apply, apply_last); and one for calling function objects (call_fun).

    Can you guess what the trap is?

    Erlang is a language with guaranteed proper tail recursion, but the only instruction for calling function objects is a non-tail-call.
    How can that work??

    By cheating. The official BEAM rewrites the instruction set used in BEAM files into a richer instruction set used internally. The rewriting rules contain a rule which recognizes the instruction sequence "call_fun; deallocate; return" and rewrite it into the internal tail-call version of call_fun.

    The Erlang compiler, obviously, is in on the joke, and generates code accordingly.

This call_fun gotcha was one that both Erjang and Browserl missed. I only discovered it accidentally, and fixed it in Erjang, but as far as I can tell, Browserl doesn’t handle that corner case (yet).

So: a clear, public specification does matter. A specification backed by a formal checking mechanism such as a bytecode verifier is even better.[8]

From a "multiple different languages on one VM" viewpoint, there are of course other issues as well - for instance, the BEAM is rather Erlang-centered.

  • No in-process destructive updates are supported - even where it would be safe.

  • Only the concurrency primitives needed by Erlang are present - and the form in which they are present is so high-level that there is very little flexibility for other languages to be very non-Erlang. A process can’t have more than one inbox; the set of supported transactions on tables are fixed; binaries support efficient appending but not prepending; and so on.

In summary: The BEAM is a mature, well-tuned, and actively evolving VM for running Erlang programs, but lacks probably the flexibility and (as things stand) certainly the well-defined-ness necessary for it to be viewed as a general platform for concurrency-orientated languages.

What would a concurrency-oriented VM look like?

If the JVM isn’t it, and the BEAM isn’t it, what would a concurrency-oriented VM then look like?

The BEAM offers concurrency safety, in that it does not allow[9] shared mutable state without clearly defined transactions; this is a property we want.

The JVM guarantees the absence of certain kinds of errors, because its rules make the necessary invariants locally verifiable. We also want that. More specifically, the rule should guarantee against not only type errors, but also concurrency-related errors (data race conditions).

The JVM also enforces these invariants by verifying code at load time - rather than trust the compiler to generate only valid code. We want that, too - because we intend for there to be more than one compiler, for more than one language.

Both the JVM and the BEAM support "native functions" - functions implemented in C or similar - e.g. for interfacing with third-party libraries or for performance-sensitive parts of a program. It would be foolish to constrain the applications of a new VM by not supporting this. However, the aim should be to encourage as much as possible to be written in a concurrency-safe language - i.e. while library integration would be a good reason for using native functions, performance shouldn’t be; there should not be things which are inherently slow to implement on the VM.[10]

Thus, while the rule "all data structures are immutable" (as known from Erlang) makes it easy to guarantee against data races, it is not good enough - it either makes it impossible to implement certain things efficiently, or encourages diving into a lower-level language to escape the rules. We want a bit more generality than that, also for the sake of supporting other paradigms than Erlang’s - mutating operations should be supported as long as they are safe and can be proven (reasonably easily) to be so.

For example, one step up, we have the rule "all data structures which are ever visible by more than one process/thread to another must be immutable", which is much more flexible yet also guarantees against data races (and can be made locally verifiable). Even higher up that ladder, shared mutable data are allowed, as long as proper synchonization is in place.

The design I have in mind is based on the concept of linear types, which appear to be a good match, in that they make it possible to capture lifecycles of objects in the type system - which goes a long way towards making certain concurrency-related guarantees locally verifiable (because half of concurrency is timing).

I plan to elaborate on linear types and their use as basis for a VM in future posts.

Acknowledgements

I would like to thank Peter Mechlenborg and Thomas In Sook Holmen for their help with forming this text.


[1]   This is the place where you learn that if you have one thread writing a long or double value, and another thread reading it, the JVM implementation is allowed to let the reading thread see half-updates - where the two words of the read values come from different writes.

[2]   Go and CML do allow shared mutable memory and thus are not strictly data-race safe; they do, however, encourage communication to be done in other, safer ways.

[3]   Admittedly, LLVM has come along in the meantime, promising to relieve some of that burden.

[4]   Technically, the compiler could handle large literals by splitting the initialization into two methods, or - for integers - maybe even initialize the array based on a string constant. But apparently the Java compiler has been taught no such trick.

[5]   The Scala compiler (as of version 2.9.1) handles large literals in the same way as Java - except that it does not detect when the resulting method is too long; the resulting class file is rejected by the JVM at load time.

[6]   When developing Erjang, we ran into the limit both when writing the (machine-generated) interpreter and when compiling an Erlang module containing a particularly complex function.

[7]   The question of the BEAM spec has come up multiple time on the Erlang mailing lists, for instance in 2003, in 2007 and in 2012 - see erlang.org/pipermail/erlang-questions/2012-May/066628.html.

[8]  The Erlang compiler has a verifier in its backend, which does certain sanity checks on the generated code - but it is not complete, nor is it meant to be.

[9]   Except through the use of native functions (NIFs) or similar.

[10]   For preference, there should be a rather small trusted kernel written in C or similar. The official BEAM implementation is around 140K lines of C code for the emulator proper; I hope that less than that would be necessary.

2 comments:

  1. While a generic concurrency-oriented VM would be great, I think it is impossible to build successfully, because there *has to be constraints* such as those defined by a concrete language to succeed. I like the idea as an ideal, but I think it is impossible to generalize something like this. When you think of it, the JVM succeeded exactly because there was the language there to drive the VMs design.

    But, I have to admit that I also like to dream about this beast; I call it ULM (User-Land Mach). I think of it as an operating system, not a language VM. IMHO the core abstraction should be a system for managing memory, processes and doing IPC. In ULM, each process could be implemented in a separate programming language, and the inter-ULM language interoperability should be defined at the IPC level much like Mach typed messages.

    ReplyDelete
  2. The article looks magnificent, but it would be beneficial if you can share more about the suchlike subjects in the future. Keep posting. 파워사다리

    ReplyDelete