Thursday, November 12, 2009

Decentralization III: B.Y.O.I.

There's one question you can ask of any new Internet technology that can usually predict whether or not it has any chance of succeeding as infrastructure: "Can I just run my own, and have it work the same way?"

This sort of decentralization is common to nearly all successful Internet-scale technologies. I can run my own Web server, and anybody can access it just fine - I don't have to get permission from the Web Corporation, or run my website on hardware provided by them, or anything like that. Same with e-mail - anybody can make their own mail server, and send messages to anybody (unless the ISP blocks it, which many do these days). Same with XMPP - anybody can run their own XMPP server, and use it to chat with anybody else in the world using XMPP. Same with Google Wave - Google knows what I'm talking about! They designed it to run in a federated model, so that people can set up their own Wave servers, and communicate with people using other ones. Same with dozens or hundreds of other protocols.
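To make the Web server case concrete, here's a minimal sketch of "running my own" using nothing but Python's standard library. The handler name, the greeting text, and the `start_server` helper are all made up for illustration; a real site would sit behind a proper server, but the point stands that nobody's permission is required.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"Hello from my own server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread and return it.

    Port 0 asks the operating system for any free port; the chosen
    port is available afterwards as server.server_address[1].
    """
    server = HTTPServer((host, port), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `start_server()` and pointing any browser or HTTP client at the returned port is all it takes; binding to a public address instead of `127.0.0.1` is what would make it reachable by everybody else.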

Of course, there are some notable exceptions, in every direction. Google is arguably Internet-scale, even though they're a centralized service. Internally, though, they've got decentralized infrastructure all over the world. Basically, they've gotten some of the benefits of decentralized infrastructure, by paying for all of it themselves. IRC, on the other hand, is sort of a counterexample because even though it has decentralized infrastructure, it scales pretty poorly - the servers within one IRC network are linked together, but if you're on one network, you can't talk to people on other networks. It's a tradeoff, really. IRC networks have enough problems to deal with even without being able to send messages between networks.

Twitter's a counter-counterexample, because I love to hate on Twitter: They fancy themselves to be an Internet-scale service, but I don't believe they can scale to match the demands that come with truly being Internet-scale. (They're still getting taken down by DDoS attacks every once in a while. Can you imagine the kind of DDoS it would take to knock Google offline?!) Clever engineering may save them yet, but I wouldn't count on it.

As a counter-counter-counterexample (can he do that? :O), we have Identi.ca and the OpenMicroBlogging network. This is a Twitter-like service that actually can scale, because they've designed it to be usable across multiple servers. For example, if I have an account on Identi.ca and somebody else has an account on Brainbird, I can subscribe to them and everything just sort of works. I guess that instead of being a counter^3 example, this is just an example: anybody can run their own instance and have it work with everybody else's.

The BYOI (bring your own infrastructure) approach works. But what are the tradeoffs? Decentralization in these cases brings scalability and robustness, both of which are really important for systems that are trying to gain traction. On the other hand, you lose centralized control. This makes it more difficult (next to impossible, in some cases) to update the protocol, and it means that some unwanted uses of the system (such as spam) are impossible to control.

It also means you have to subdivide your namespace, and this can be a tricky problem. With Twitter or AIM, you can talk to people using just a username. With email or XMPP, you have to specify what server they're on as well (user@server.com), because people can use the same username across different servers. People have tried to design decentralized systems that use a centralized namespace before, but it's a fundamentally hard problem to solve without compromising your design. I would argue that DNS (the system that maps domain names to addresses) has managed it - whenever you type in a website address, you're pretty confident that anybody else in the world will see the same website, because it's a centralized namespace on a decentralized system. On the other hand, they only managed this by charging money for domain names, which isn't something that'd work well in other places.
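Here's a rough sketch of what a subdivided namespace looks like in code, assuming the user@server form that email and XMPP use. The `Address` type and `parse_address` function are hypothetical names for illustration, not part of either protocol, and real address rules are a good deal fussier than this.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """A federated identity: a username scoped to the server that issued it."""
    user: str
    server: str


def parse_address(address: str) -> Address:
    """Split 'user@server' into its two parts.

    Illustrative only: splits on the last '@' and lowercases the
    server name, since domain names are case-insensitive.
    """
    user, sep, server = address.rpartition("@")
    if not sep or not user or not server:
        raise ValueError(f"expected user@server, got {address!r}")
    return Address(user, server.lower())


# The same username on two servers gives two different identities.
a = parse_address("alice@example.org")
b = parse_address("alice@example.net")
print(a.user == b.user)  # True: same username
print(a == b)            # False: different people, as far as the network knows
```

With a centralized namespace the server half simply doesn't exist, which is why a bare username is enough on Twitter or AIM.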

Finally, for a system based on a decentralized protocol, you get some level of additional security over time. There are two classes of security holes: holes in individual applications, and holes in the protocol itself (the latter being much rarer). With a centralized system, these have basically the same impact, since there's really only one instance and one implementation. With a decentralized system, there are usually a few major implementations, and a lot of minor ones around the edges. If tomorrow a major bug were found in BIND (the most common DNS server) and all BIND servers had to be taken down until it was fixed, the Internet would mostly continue to function, because the servers running other implementations would keep answering queries. You can't get that with a centralized service.
