Thursday, March 12, 2009

The Story of Paxos

In distributed computing, Paxos is a very important algorithm. Leslie Lamport devised it around 1990, but for somewhat amusing reasons, almost nobody noticed it for nearly a decade. I'll give the history of the algorithm first, and then explain what it actually does.

Paxos was first written up in 1990, in Lamport's The Part-Time Parliament. Unfortunately, Lamport decided to add some humor to it, so the whole paper is written from the point of view of an archaeologist who has just discovered an ancient civilization that used the Paxos algorithm. It's actually a pretty funny paper, and if you transliterate the Greek names in it back into the Latin alphabet, you get the names of various well-known computer scientists in the field of distributed computing. Unfortunately, when anybody read it, their first reaction was, "are you sure this is actually computer science?" Needless to say, it didn't get published.

Lamport was disappointed at how humorless everybody else was, and, to make matters worse, nobody really understood the algorithm anyway. So, despite the algorithm being a fairly revolutionary piece of computer science, Lamport shelved it away and continued working on other stuff. It wasn't until 1998 (and don't forget, eight years is a really long time in computer science!) that the paper was finally published, in ACM Transactions on Computer Systems.

Unfortunately, all was not well. Partly because of all the Greek names and terminology, and partly because the algorithm is fairly subtle, people kept complaining to Lamport that Paxos was too hard to understand. It eventually got to the point that, in 2001, he had to write a short follow-up paper, Paxos Made Simple, explaining Paxos in much easier terms. Even now, though, most people don't understand it.

So what does Paxos do, and why was it revolutionary? There are three things you have to understand before it will make sense - asynchrony, consensus, and the FLP result.

There are two basic models for thinking about distributed systems. The asynchronous model makes just a few really weak assumptions - you have some computers, and they can send each other messages. It makes no guarantees about when messages will arrive, or about the properties of the computers. The synchronous model is basically the asynchronous model with wristwatches: every computer agrees on what time it is, and there is a known upper bound on how long a message can take to arrive. It's much easier to design algorithms for the synchronous model, but algorithms that work in the asynchronous model tend to be faster in practice: they run at the speed of the actual network, instead of waiting out worst-case time bounds.
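To make the wristwatch point concrete, here's a toy sketch (entirely my own illustration - `MAX_DELAY` and `suspect_crashed` are made-up names, not anything from the literature): with a known delay bound, a timeout is a sound crash detector; take the bound away, and the very same test can no longer tell "crashed" from "slow".

```python
# Toy sketch: failure detection by timeout, which is only sound
# in the synchronous model. All names here are hypothetical.

MAX_DELAY = 2  # synchronous model: every message arrives within 2 ticks

def suspect_crashed(sent_at, now, reply_received):
    """Suspect a peer has crashed if no reply arrived within the
    round-trip bound. Sound ONLY if MAX_DELAY really is a bound."""
    return (not reply_received) and (now - sent_at > 2 * MAX_DELAY)

# Synchronous model: at tick 3 the reply could still be in flight,
# so the peer isn't suspected yet...
assert not suspect_crashed(sent_at=0, now=3, reply_received=False)
# ...but by tick 5 the round-trip bound has passed, so in the
# synchronous model the peer really must have crashed.
assert suspect_crashed(sent_at=0, now=5, reply_received=False)

# In the asynchronous model there is no MAX_DELAY at all, so the
# second check above could fire for a peer that is merely slow -
# which is exactly the ambiguity behind the FLP result below.
```

The sketch is just the "wristwatch" idea in code: the synchronous model's delay bound is what turns a silent peer into a provably crashed one.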

The consensus problem is a pretty simple one to understand: given a set of networked computers, have them agree on a single value. It has formal properties for agreement, validity, integrity, and termination, which you can see in the linked article, but I'm not going to go into them here. If none of the computers ever crash, then consensus is really easy to solve in a synchronous system; it's solvable if there are failures, too, but you have to think a bit harder to make it work. (Aside: there's a whole hierarchy of failure models for distributed systems, ranging from "nothing will ever fail" to "basically anything can go wrong at any time". The latter is called a "Byzantine" failure model, after the Byzantine generals problem.) Consensus is also solvable in an asynchronous system, assuming there are no failures.
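To show just how easy the failure-free synchronous case is, here's a toy sketch (my own, with hypothetical names): every process broadcasts its value in one round - which synchrony guarantees will arrive - and then everyone applies the same deterministic rule to what they've seen.

```python
# Toy sketch: consensus in a synchronous system with no failures.
# One all-to-all broadcast round, then a deterministic decision rule.

def flood_and_decide(inputs):
    """inputs: {process_id: proposed_value}. Returns each process's
    decision after one synchronous all-to-all round."""
    # Round 1: every process broadcasts its value; synchrony
    # guarantees all messages arrive before the round ends, so every
    # process ends up with the same set of values.
    everything = set(inputs.values())
    # Decision: the same deterministic choice at every process.
    decision = min(everything)
    return {pid: decision for pid in inputs}

decisions = flood_and_decide({"p1": 7, "p2": 3, "p3": 9})
# Agreement: all processes decide the same value.
assert len(set(decisions.values())) == 1
# Validity: the decided value was actually proposed by someone.
assert decisions["p1"] in {7, 3, 9}
```

Agreement, validity, and termination all fall out of one round plus one shared rule - which is why the interesting part of consensus only starts once failures and asynchrony enter the picture.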

If even a single computer can crash in your asynchronous system, though, you're in a world of hurt. Way back in 1985, Fischer, Lynch, and Paterson proved that it's Impossible-with-a-capital-I for any deterministic algorithm to solve consensus in an asynchronous system where even one process may crash. Since a single crash failure is the weakest failure model that allows for failures at all, this could put a bit of a damper on your enthusiasm for distributed computing. After all, consensus is a really basic problem; it's the sort of thing you expect to be able to solve no matter what.

However, and this is key: you'd have to be really, really unlucky to actually be using an asynchronous system. Theoretical models aside, if you had a system where messages could take, say, ten years to arrive, you would not be a happy camper. In practice, networks are a lot closer to the synchronous model; they're only really asynchronous under exceptional conditions. (Or, as my distributed computing prof has said many times: Impossibility results are just excuses to cheat.)

The Paxos algorithm can solve consensus in a synchronous system under various failure models, depending on what you need - there are a bunch of variants of it these days. However, here's the neat bit: in an asynchronous system, Paxos doesn't break - it never decides on two different values - it just fails to make progress. When the system becomes synchronous again, Paxos picks up where it left off. This doesn't quite solve consensus in an asynchronous system, but it steps around the impossibility result fairly elegantly, and is robust enough to be used in real-world systems.
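For the curious, here's a toy sketch of the safety core of single-decree Paxos (my own simplification, with hypothetical names - not Lamport's actual pseudocode, and leaving out leaders, retries, and persistence): acceptors ignore proposals numbered below what they've promised, and a proposer must re-propose any value a majority may already have accepted. Note how a round that can't reach a majority just returns None - no progress, but nothing unsafe.

```python
# Toy sketch of single-decree Paxos's two phases (simplified).

class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised
        self.accepted = None  # (number, value) we accepted, or None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n,
        reporting anything we've already accepted."""
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, n, value):
        """Phase 2: accept, unless we've promised a higher number."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return "accepted"
        return "nack"

def propose(acceptors, n, value):
    """One proposal round. Returns the chosen value, or None if no
    majority responded (e.g. during an asynchronous spell)."""
    majority = len(acceptors) // 2 + 1
    # Phase 1: collect promises from a majority.
    replies = [a.prepare(n) for a in acceptors]
    granted = [acc for tag, acc in replies if tag == "promise"]
    if len(granted) < majority:
        return None  # no progress - but nothing broke
    # Safety rule: adopt the highest-numbered value already accepted.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    oks = sum(1 for a in acceptors if a.accept(n, value) == "accepted")
    return value if oks >= majority else None

accs = [Acceptor() for _ in range(3)]
assert propose(accs, n=1, value="red") == "red"
# A later, competing proposal is forced to re-propose the chosen value:
assert propose(accs, n=2, value="blue") == "red"
```

The second assertion is the whole trick: once "red" is chosen by a majority, no later proposal can ever get a different value chosen - which is exactly why asynchrony can stall Paxos but never make it contradict itself.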

I would go into how Paxos works, but somehow I don't think that enough of you care about that level of detail. :) So, instead: Happy Friday the 13th!

1 comment:

Frank Church said...

Hey it is Friday the 13th! I learned things from your entry - never heard of the a/synchronous distinction before, or most of the rest of it too. So I enjoyed it. I regret to say that I will always confuse Paxos with Naxos (classical music record label) and the Naxos Music Library which Swarthmore subscribes to. Seriously you have no idea how much I use it.