Tuesday, October 25, 2011

The Impossible Isolates

…in which I continue to expound on the vast superiority of the Erlang Way, this time with the young and innocent Dart on the receiving end (but don't worry, Dart can handle multiple of those. The problem is that it must).

Google recently published a "technical preview" of the Dart language. It contains a notion of “isolates” — described by the language specification as "actor-like entities" — which serve partly as sandboxes, partly as "unit[s] of concurrency".

In the following, I intend to demonstrate that though the Dart isolates are presented as “inspired by Erlang [processes]”, there is much to be gained by taking a few more pages out of Erlang's book.

What's Impossible in Your Favorite Language?

When programming languages are discussed, focus tends to be on the things that are possible in a given language.
But what's at least as important — if you're doing anything beyond toy programs and obfuscation contests — are the things that are impossible in the language. What can't happen?

Impossibilities are very useful. They can be a great help when reasoning about programs — and when constructing programs which lend themselves to reasoning.
The impossible helps isolate relevant effect from irrelevant cause.

It's the same in real life: we do a lot of optimizations based on what we know can't happen; being able to make assumptions about the world is what lets us do, well, anything complex at all.

Gravity is quite reliable, for instance. I rarely bother to strap things to my desk these days, but rely on plain old gravity to keep them where I put them.

If I place something on my desk, then come back later to find it lying on the floor, and wonder how that came to be, then I may suspect a number of things, but a gravity outage isn't even considered a candidate explanation.
That's the difference between unlikely and impossible.

(With the different large-ish systems I've been working on for the past few years, unlikely happens all the time. And implausible (though not impossible) happens about once every 1–2 years (these events often involve virtualization).)

Narrowing down the "what may happen / what may have happened" scope is a crucial part of program debugging, understanding and maintenance, and hence of developing.

The blessings of Erlang

One of the critical things that Erlang provides, and a major reason why I like it, is the way its semantics and building blocks prevent surprises.
Two processes can't interfere with each other's state; they can't affect each other's control flow except through message passing; and if a process goes down, it disappears completely, without leaking resources.

The great thing about Erlang is not in itself that it allows you to do many things simultaneously, but that at the same time it lets you focus on one thing at a time. It is just as much about being able to think non-concurrently, as it's about concurrency. That's the way that complexity is handled: by isolation, through focus on the task at hand.
Much of the time, the result is that concerns separate nicely.

The three pillars of sound state

As Kresten Krab Thorup points out in his posting “Dart: An Erlanger's Reflections”, the three major features of Erlang which enable such focus on one task are isolation, sequencing, and fault handling.

These features consist of four (1+1+2) properties which can be summed up as, respectively,

  • Other processes cannot interfere with a process's state (except through message passing)
  • A process cannot interfere with its own state (but has a single, predictable control flow)
  • A process which fails cannot continue to run after the failure (with a possibly corrupt state)
  • When a process fails, and thus ceases to run, entities dependent on that process will be notified of its demise in a timely manner.

(Or, more concisely: “Others won't interfere”, “I myself won't interfere”, “Failure will not linger”, and “Failure will not go unnoticed”.)

Of these, I will in the following focus on the first three — the ones which are, incidentally, stated as impossibilities of the aforementioned kind.
The absence of any of these properties would destroy locality in reasoning.

One consequence of this set of properties is related to the famous "Let it crash" philosophy of the Erlang tradition: that even error handling is done only by the survivors — those processes with a “good” (non-corrupted) state.

The state of Dart (and stacklessness of its isolates)

The reason I write about all of this just now is related to Google's recently-published language “Dart” (released as an early preview; the spec is not final yet).

Dart introduces a notion of isolates — like Erlang's light-weight processes, isolates have separate heaps. Unlike Erlang processes, however, isolates don't have separate stacks, but are activated on an empty stack each time, through callbacks which process the incoming messages on a FIFO basis.

And as far as I can tell from the specification, nothing in particular happens to an isolate if one of its callbacks terminates abnormally.
[Actually, I am told that as it is now, that will cause the Dart VM to crash. I will assume that this behaviour is not intended.]

Indeed, isolates appear to be more passive than active objects -- as far as I can tell, their life spans are determined, not explicitly and from within like Erlang processes' life spans, but implicitly and from without, by reachability, like OOP objects.

And that difference is a significant one: it breaks two of the three “pillars of sound state”. An isolate can surprise itself by handling an unrelated event when it perhaps shouldn't, and an event handler crashing within an isolate can leave the isolate's state inconsistent without any consequences for the isolate's life cycle.

Basically, what this means is that the invariants of an isolate's state must be reestablished by all of its event handlers, always, regardless of whether and how they terminate abnormally. This may just be par for the course in the eyes of many programmers, of course, but from an Erlang developer's perspective it's very much a “close but no cigar” situation.
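
To make the invariant problem concrete, here is a minimal sketch in Python (not Dart). It assumes a runtime that reports a failing handler and then simply delivers the next event, which is how I read the specification; the `Account` class, its `on_withdraw` handler and the `dispatch` loop are all hypothetical names made up for the illustration.

```python
class Account:
    """Hypothetical isolate-like object: private state plus event handlers."""

    def __init__(self):
        self.balance = 100
        self.log = []  # intended invariant: 100 + sum(log) == balance

    def on_withdraw(self, amount):
        self.balance -= amount  # step 1 of 2
        if amount > 50:
            # The handler fails halfway through its update.
            raise ValueError("audit service unavailable")
        self.log.append(-amount)  # step 2 of 2


def dispatch(obj, events):
    """Assumed callback-style loop: a failure is recorded, then the next event runs."""
    failures = []
    for name, arg in events:
        try:
            getattr(obj, "on_" + name)(arg)
        except Exception as exc:
            failures.append(str(exc))
    return failures


account = Account()
failures = dispatch(account, [("withdraw", 10), ("withdraw", 60), ("withdraw", 5)])
invariant_holds = (100 + sum(account.log) == account.balance)
```

After the run, the third withdrawal has been processed on top of the half-finished second one, and `invariant_holds` is false. An Erlang process in the same situation would have died at the failure, taking its corrupt state with it.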

The state-space explosion that commonly happens when a state machine must be able to handle every kind of event in every state is exactly one of the things that drove the development of Erlang in the first place (and led to the “selective receive” feature).
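
For readers who haven't met selective receive, the idea can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the `Mailbox` class and its methods are hypothetical, and where a real Erlang `receive` would block until a matching message arrives, this sketch returns `None`.

```python
from collections import deque


class Mailbox:
    """Hypothetical mailbox supporting a non-blocking form of selective receive."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self, matches):
        """Remove and return the oldest message accepted by matches().

        Messages that do not match stay queued, in their original order.
        """
        for index, message in enumerate(self._messages):
            if matches(message):
                del self._messages[index]
                return message
        return None  # a real Erlang receive would wait here instead

    def pending(self):
        return list(self._messages)


mailbox = Mailbox()
mailbox.send(("new_call", "alice"))
mailbox.send(("reply", 42))
mailbox.send(("new_call", "bob"))

# While waiting for a reply, only replies are of interest.
reply = mailbox.receive(lambda m: m[0] == "reply")
```

The two `new_call` messages are neither lost nor handled early; they wait until the receiver is in a state where it wants them. That is what keeps the number of (state, event) combinations manageable.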

Case in point: Synchronous calls

An example of a common pattern where this matters is as follows: One actor needs, during the processing of a message, to contact another actor for information necessary to complete the processing. This is a situation that commonly occurs in Erlang programs, and in Erlang, the typical solution would be to make a synchronous call to the other actor: send a request, then wait for a response before continuing execution. (In most cases, a timeout would be set as well, to ensure that execution will in fact continue.)

This send-request, wait-for-response operation pair can be put together in a function, so that the invocation of another actor looks just like a normal function invocation — the communication can be encapsulated; in fact, the send/receive code is rarely actually spelled out, because it is already provided by the standard library.
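
The shape of such an encapsulated call can be sketched as follows. This is Python with threads and queues standing in for Erlang processes and mailboxes, so it shows the pattern rather than Erlang's actual library code; `start_actor`, `call`, `prices` and `order_total` are hypothetical names.

```python
import queue
import threading


def start_actor(handle):
    """Start a hypothetical actor: a thread serving its mailbox one request at a time."""
    mailbox = queue.Queue()

    def loop():
        while True:
            request, reply_to = mailbox.get()
            reply_to.put(handle(request))

    threading.Thread(target=loop, daemon=True).start()
    return mailbox


def call(mailbox, request, timeout=5.0):
    """Send a request, then block until the reply arrives or the timeout expires."""
    reply_to = queue.Queue(maxsize=1)
    mailbox.put((request, reply_to))
    return reply_to.get(timeout=timeout)  # raises queue.Empty on timeout


prices = start_actor(lambda item: {"apple": 3, "pear": 5}.get(item))


def order_total(item, quantity):
    # To the caller this is a plain function; the messaging is hidden inside call().
    return call(prices, item) * quantity


total = order_total("apple", 4)
```

Whether `order_total` consults another actor, a local table, or nothing at all is invisible at the call site, which is the modularity point being made here.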

In Dart, the situation is somewhat different: to do a synchronous call to another actor, you'd create a Promise (a one-shot message queue), and send the send-end + the request to the actor in question. That actor would then put the result to the Promise. But the caller cannot just explicitly wait for the response; instead, it must register a response handler on the Promise. The handler would typically be a closure which carries whatever state is needed to complete the processing.

All of these operations complete right away; execution continues immediately. This means that the remote call cannot be encapsulated and treated like a normal function call, and modularity suffers. In Erlang, it is easy to change whether, how and to which other actor to make asynchronous calls (it's a local change, completely transparent to the call site); in Dart it looks like it wouldn't be so simple a matter.
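
The callback style described above looks roughly like this. The sketch is in Python and single-threaded, with one FIFO queue standing in for the isolate's event loop; the `Promise` class here is a hypothetical stand-in and not Dart's actual Promise API, and the other names are invented as well.

```python
from collections import deque

events = deque()  # FIFO queue standing in for the isolate's event loop
trace = []


class Promise:
    """Hypothetical one-shot result holder with registered response handlers."""

    def __init__(self):
        self._handlers = []

    def then(self, handler):
        self._handlers.append(handler)

    def complete(self, value):
        # Each handler runs later, as an event of its own.
        for handler in self._handlers:
            events.append(lambda h=handler: h(value))


def price_service(item, promise):
    promise.complete({"apple": 3, "pear": 5}.get(item))


def ask_price(item):
    promise = Promise()
    events.append(lambda: price_service(item, promise))
    return promise  # returns at once, before any price is known


def on_order(item, quantity):
    ask_price(item).then(lambda price: trace.append(("total", price * quantity)))
    trace.append(("handler returned", item))  # reached before the reply arrives


def run():
    while events:
        events.popleft()()


on_order("apple", 4)
run()
```

The trace records `("handler returned", "apple")` before `("total", 12)`: the original handler is long gone by the time the answer shows up, so the rest of the work has to live in the closure, and every caller of `ask_price` has to be written with that in mind.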

If you were to approach the Erlang way in Dart, you'd have to write in continuation passing style (CPS), so that the call would always happen on a nearly empty stack — so that returning from a function means returning from the event handler; this means that the current constraints — that event handling always begins and ends with an empty stack — wouldn't matter, because the real stack would be in the continuation.
When using CPS, calling another actor could indeed be made to look just like another function call.

Except that it wouldn't behave like an ordinary function call, because other incoming messages could be processed between the call request and the call response — leading to the problems mentioned above with self-interference.

The funny thing about this CPS-based approach, by the way, is that this attempt to buy back the between-events stack that is necessary for encapsulation of calls, essentially overdoes it: what you get is not one call stack, but a number of simultaneous call stacks (disguised as response handler continuations), all ready to continue execution as responses become ready. A multitude of threads of execution which can trip each other up and cause all sorts of nondeterminism-induced problems.
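
A small sketch of how two such continuations can trip each other up, again in Python with a FIFO queue as the assumed event loop. The stock-keeping example, `ask_price` and `on_order` are hypothetical; the only point is the ordering of events.

```python
from collections import deque

events = deque()  # FIFO queue standing in for the isolate's event loop
state = {"stock": 5}
trace = []


def ask_price(item, on_reply):
    # Hypothetical remote call: the reply is delivered as a later event.
    events.append(lambda: on_reply(3))


def on_order(quantity):
    if state["stock"] >= quantity:  # the check is made now...
        def continuation(price):
            state["stock"] -= quantity  # ...but acted upon later
            trace.append(("sold", quantity, price * quantity))

        ask_price("apple", continuation)


events.append(lambda: on_order(4))
events.append(lambda: on_order(3))
while events:
    events.popleft()()
```

Both orders pass the stock check before either continuation runs, so both are sold and the stock ends at -2. Each continuation is in effect a suspended call stack of its own, and the two interleave exactly like threads sharing a heap.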

If the intention of the single-threaded isolates is to reduce the complexity of developing concurrent systems, then this would not be a particularly good outcome.
No stack is too little; a multitude of stacks is too many. One is the number we want, the one we can reason about.

Please animate the isolates!

The above describes the present situation, as I read the Dart specification (which I must admit I haven't done end-to-end; I've mainly been focusing on the isolate- and exception-related parts).

Since the Dart language is still in development, and Google is requesting input to the development process, these things may of course change.

My input to the process, then, is this:

Give the isolates an explicit life cycle. Just isolating the heap, thus ensuring sequential heap access, is a good first step, but to fully simplify matters and keep invariants local, an isolate must be in control of its own life cycle, including which types of events it will be ready to process at any given moment.
In short, give them liberty and give them death! (to paraphrase Patrick Henry rather freely).

2 comments:

  1. Hi, very nice post (I'm working on isolates in Dart). We are definitely very interested in ideas people have about how they might use isolates, and how the message passing should work. This question of whether you need the ability to block while waiting for a reply from another isolate is one we're discussing a lot.
    Matt

  2. Thanks, Matt - and very nice to hear from you. I'm impressed to see that you must have been among the first 50 readers - that you found this so soon certainly seems to me like a confirmation that you're serious about being interested in feedback...

    Of course it all depends on how isolates will be used, but considering how JavaScript has probably been put to uses far more complex than originally envisioned, it might be worth it to look to the telephony lessons about how to manage complexity in reactive systems - Ulf Wiger demonstrates it nicely in this presentation:
    http://www.infoq.com/presentations/Death-by-Accidental-Complexity
    (I know I have already linked to this, but perhaps not as prominently as it might deserve, given the similarity of his and my message.)
