Thursday, March 26, 2009

eBook DRM Rant

The setup: I wish to purchase an ebook. I own a Sony Reader, which is a useful device for reading ebooks. (It's much slimmer than books, for one thing; it fits more or less comfortably in a large pants pocket.) So I go online, search for the book I wish to purchase, and quickly descend into hell.

First, I feel compelled to touch on the price of the ebook. I know for a fact that the entire process of creating my copy of the ebook, and transferring it to my computer, costs the website less than the process of me actually purchasing it. So why is it that ebooks nearly always cost the same amount as a paperback copy of the book? As a consumer, it grates on my nerves a bit when the price of an item is so obviously arbitrary. I like to think that prices have some relation (however tenuous) to what the item actually cost to produce.

But, this isn't so bad, right? I mean, once I have a digital copy of the book, it'll never go bad, and I can keep it for as long as my backups will last, while it takes up no space. Isn't that better than a real book?

Welcome to the bizarro-world of DRM, where publishers will go to great lengths and spend millions of dollars to make their products less useful. DRM is basically a way to make sure you only access the content enclosed within it in approved ways, which usually means only being able to access it from a single program. So, a book that you could previously read in any of dozens of programs that read PDFs, for instance, can now only be read using Adobe Digital Editions (tm) on an approved computer. Because it would be a Terrible Thing if people could do what they wanted with content that they've purchased, content creators will go to astonishing lengths to prevent DRM schemes from being reverse-engineered, up to and including prosecution under the Digital Millennium Copyright Act. (Thanks, Bill Clinton!)

So, back to my quest to legally purchase an ebook. There are multiple formats for ebooks, and naturally, all of them are completely incompatible with each other. Some online bookstores only allow you to purchase books in a single format, but ebooks.com is particularly enlightened, as they allow you to choose from three formats. There's a PDF version (with DRM), a Mobipocket version (with DRM), and a Microsoft Reader version (with DRM). Care to guess which version will work properly with my ebook reader?

The correct answer, unfortunately, is D. None of the above. Since ebooks.com secretly hates their customers, they never actually mention this, of course; I'm expected to infer it from their careful omission of standalone ebook readers from their help pages on compatibility. Now, at this point I have a few options. I could pirate the book, of course - this is illegal, but it has the distinct advantage of actually working. I could also purchase it legally, and then remove the DRM - this is extremely illegal and not guaranteed to work, and publishers are constantly changing DRM schemes to make it difficult. Somebody will manage it (where do you think pirates get their ebooks from originally?) but I might not.

Piracy has several other advantages. The file I would end up with is way more useful than a legally purchased one - I can use it on any computer or device, convert it to any format, make as many copies as I want, and be assured that it won't suddenly stop working at some point in the future, for no reason at all. Compared to a non-DRMed ebook, a DRMed ebook is basically worthless.

So let's review: Without DRM, I could get a pirated copy of the book. With DRM, I can still get a pirated copy of the book, and I have a compelling reason to. I honestly cannot come up with any reasoning that could possibly explain publishers' attitudes to ebook DRM. They have taken an entirely new medium, one with incredible potential, and managed to make it so much less useful than what already exists that nobody even wants it anymore. It's like they are determined to fail at all costs. (And if you ever happen to see somebody tell me that ebooks are a bad idea because look, people aren't buying them, and you see me go into a spitting rage, well, now you'll understand why.)

Friday, March 20, 2009

Scale, or Why the AIG Bonuses are Irrelevant

It's a well-known fact that human beings are terrible at comprehending scale. And when I say "well-known", I mean well-known to me, because somehow, even today, the majority of people don't have a firm intuitive grasp on the difference between a million, a billion, and a trillion.

To pick an extremely relevant example, I'll copy today's XKCD, and talk about the AIG bonus scandal. 165 million dollars is a lot of money, sure, but let's put it in perspective. Out of every thousand dollars that has gone to AIG, the executives there have received about 97 cents. Given the circumstances, you can easily argue that that's 97 cents more than they deserve, but come on now. Aren't you (and by you, I mean the news networks) even a little curious about where the other 99.903% of the money has gone?
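For the skeptical, here's that arithmetic as a quick Python sketch. The two inputs are just the figures quoted above; the total is only what those two figures imply when you divide one by the other, not a number I looked up separately.

```python
# Figures quoted in the post: $165 million in bonuses,
# which works out to about 97 cents per $1,000.
bonuses = 165_000_000        # dollars
cents_per_thousand = 97      # cents of bonus per $1,000 of bailout

# 97 cents out of $1,000, expressed as a fraction of the whole.
bonus_share = cents_per_thousand / 100 / 1000

# Everything that did NOT go to bonuses, as a percentage.
other_share_pct = round((1 - bonus_share) * 100, 3)

# Total implied by the two figures above (an inference, not a sourced number).
implied_total = bonuses / bonus_share

print(f"bonus share:     {bonus_share:.3%}")
print(f"everything else: {other_share_pct}%")
print(f"implied total:   ${implied_total / 1e9:.0f} billion")
```

In other words, the bonuses are a bit under a tenth of a percent of the whole.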

The more I think about the bonus scandal, the more it seems like a distraction. On the scale that the government is throwing money around these days, $165 million is nothing - yet, for a few days, everybody is completely focused on it, including Congress, apparently. Right now, I think there are much bigger fish to fry than the AIG execs, however much they may deserve a good frying.

Mini-post today because I have been driving for seven hours or so and I am basically dead. >_>

Thursday, March 12, 2009

The Story of Paxos

In distributed computing, Paxos is a very important algorithm. It was created by Leslie Lamport in 1990, but for somewhat amusing reasons, nobody noticed it for a few more years. I'll give the history of the algorithm first, and then explain what it actually does.

Paxos was first submitted for publication in 1990, in Lamport's The Part-Time Parliament. Unfortunately, Lamport decided to add some humor to it, so the whole paper is written from the point of view of an archaeologist who has just discovered an ancient civilization that used the Paxos algorithm. It's actually a pretty funny paper, and if you transliterate the Greek names in it into English letters you get the names of various well-known computer scientists in the field of distributed computing. Unfortunately, when anybody read it, their first reaction was, "are you sure this is actually computer science?" Needless to say, it didn't get published at the time.

Lamport was disappointed at how humorless everybody else was; and, to make matters worse, nobody really understood the algorithm anyway. So, despite the algorithm being a fairly revolutionary piece of computer science, Lamport shelved it away and continued working on other stuff. It wasn't until 1998 (and don't forget, eight years is a really long time in computer science!) that the paper was finally published.

Unfortunately, all was not well. Partly because of all the Greek names and terminology, and partly because the algorithm is fairly subtle, people kept complaining to Lamport that Paxos was too hard to understand. It eventually got to the point that, in 2001, he had to write a short follow-up paper, Paxos Made Simple, explaining Paxos in much easier terms. Even now, though, most people don't understand it.

So what does Paxos do, and why was it revolutionary? There are three things you have to understand before it will make sense - asynchrony, consensus, and the FLP result.

There are two basic models for thinking about distributed systems. The asynchronous model makes just a few really weak assumptions - you have some computers, and they can send each other messages. It makes no guarantees about when messages will arrive, or about the properties of the computers. The synchronous model is basically the asynchronous model with wristwatches: every computer agrees on what time it is, and there is an upper bound on how long a message can take to arrive. It's much easier to make algorithms for the synchronous model, but algorithms that work in the asynchronous model tend to be much faster, since they can act as soon as messages arrive instead of waiting out the worst-case delay.

The consensus problem is a pretty simple one to understand: given a set of networked computers, have them agree on something. It has formal properties for agreement, validity, integrity, and termination, which you can see in the linked article, but I'm not going to go into them here. If none of the computers ever crash, then consensus is really easy to solve in a synchronous system; it's solvable if there are failures, too, but you have to think a bit harder to make it work. (Aside: There's a whole hierarchy of failure models for distributed systems, ranging from "nothing will ever fail" to "basically anything can go wrong at any time". The latter is called a "Byzantine" failure model, after the Byzantine generals problem.) Consensus is also solvable in an asynchronous system, assuming there are no failures.
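To make those properties a little more concrete, here's a rough Python sketch that checks the outcome of one finished run against three of them. The function name and the two dictionaries are made up for illustration, and the definitions are the informal textbook ones rather than anything rigorous. Integrity (roughly, a node decides at most once) isn't checked, since this toy only records one decision per node to begin with.

```python
def check_consensus(proposed, decided):
    """Check one finished run against three informal consensus properties.

    proposed: dict of node name -> value that node proposed
    decided:  dict of node name -> value it decided, or None if it never did
    (Both structures are hypothetical, invented for this sketch.)
    """
    decisions = [v for v in decided.values() if v is not None]
    return {
        # Agreement: no two nodes decide on different values.
        "agreement": len(set(decisions)) <= 1,
        # Validity: any value that gets decided was proposed by somebody.
        "validity": all(v in proposed.values() for v in decisions),
        # Termination: every node eventually decides. (Real definitions only
        # ask this of nodes that don't crash; this toy has no crashes.)
        "termination": all(v is not None for v in decided.values()),
    }


proposed = {"a": "red", "b": "blue", "c": "red"}

# Everybody settles on "red": all three properties hold.
good = check_consensus(proposed, {"a": "red", "b": "red", "c": "red"})

# Two nodes disagree and one never decides: agreement and termination fail.
bad = check_consensus(proposed, {"a": "red", "b": "blue", "c": None})

print(good)
print(bad)
```

Note that the second run still satisfies validity, since both "red" and "blue" were proposed by somebody. The properties really are independent of each other.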

If even a single computer can crash in your asynchronous system, though, you're in a world of hurt. Way back in 1985, Fischer, Lynch, and Paterson proved that it's Impossible-with-a-capital-I to solve consensus in an asynchronous system with a single crash failure. Since "single crash failure" is the weakest failure model that allows for failures at all, this could put a bit of a damper on your enthusiasm for distributed computing. After all, consensus is a really basic problem; it's the sort of thing you expect to be able to solve no matter what.

However, and this is key: you'd have to be really, really unlucky to actually be using an asynchronous system. Theoretical models aside, if you had a system where messages could take, say, ten years to arrive, you would not be a happy camper. In practice, networks are a lot closer to the synchronous model; they're only really asynchronous under exceptional conditions. (Or, as my distributed computing prof has said many times: Impossibility results are just excuses to cheat.)

The Paxos algorithm can solve consensus in a synchronous system with various failure models, depending on what you need - there are a bunch of variants of it these days. However, here's the neat bit - in an asynchronous system, Paxos doesn't break, it just fails to make progress. When the system becomes synchronous again, Paxos picks up where it left off. This doesn't quite solve consensus in an asynchronous system, but it steps around the impossibility result fairly elegantly, and is actually robust enough to be used in real-world systems.
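Just to give a flavor of that "doesn't break, just stalls" behavior, here's a toy Python sketch of single-decree Paxos, based on my reading of Paxos Made Simple. It's nowhere near a real implementation: there's no network, no crash recovery, and no learners, the class and function names are made up, and an "unreachable" acceptor is simulated by simply not calling it.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised so far
        self.accepted = None   # (number, value) of the last accepted proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept, unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, reachable, n, value):
    """Try to get `value` chosen using proposal number n.

    `reachable` lists the indices of the acceptors we can talk to right now.
    Returns the chosen value, or None if no majority could be reached.
    """
    majority = len(acceptors) // 2 + 1

    # Phase 1: ask for promises, and learn about earlier accepted proposals.
    promises = []
    for i in reachable:
        ok, previous = acceptors[i].prepare(n)
        if ok:
            promises.append(previous)
    if len(promises) < majority:
        return None  # no progress, but nothing has been decided wrongly

    # If anyone already accepted a value, adopt the highest-numbered one
    # instead of our own. This is what keeps a chosen value from changing.
    earlier = [p for p in promises if p is not None]
    if earlier:
        value = max(earlier, key=lambda p: p[0])[1]

    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(acceptors[i].accept(n, value) for i in reachable)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(5)]
stalled = propose(acceptors, [0, 1], 1, "red")       # only 2 of 5 reachable
first = propose(acceptors, [0, 1, 2], 2, "red")      # a majority is back
second = propose(acceptors, [2, 3, 4], 3, "blue")    # a rival shows up late
print(stalled, first, second)
```

The first attempt can't reach a majority, so it gives up without deciding anything. Once three acceptors are reachable, "red" gets chosen, and the later proposer that wanted "blue" finds out about "red" from the acceptor the two majorities share and ends up proposing "red" as well.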

I would go into how Paxos works, but somehow I don't think that enough of you care about that level of detail. :) So, instead: Happy Friday the 13th!

Thursday, March 5, 2009

µBlogging

Yeah, I'm jumping on the microblogging bandwagon. It seems like an interesting medium, with dynamics that kind of straddle normal blogs and IRC. Plus, one microblogging service (Twitter) has even had the distinction of being verbed, bringing it into the ranks of other web sensations such as Google, Facebook, and Rick Astley. There has to be something to it, right? As such, you can find me at http://identi.ca/rxp.

The first reaction I am expecting is, "Why aren't you using Twitter? You should use Twitter. Everybody else uses Twitter." Identi.ca is open-source, everything on it is licensed under Creative Commons, and it supports a system for decentralized microblogging. Basically, I like the direction they're headed more than Twitter's.

The other major question, of course, is "why are you blogging about this?" Frankly, microblogging would be a lot more fun if I actually knew more people who were doing it. That's why y'all should try it too! Maybe it'll work out, maybe it'll end up being just a fad, but either way it should be interesting.