Monday, November 30, 2009

Against Sleep

I don't dislike sleep itself.

Right now I should be going to sleep, but I'm not because for some reason, some nights, the apex of my alertness for the entire day hits as soon as my head hits the pillow. Tonight already feels like one of those nights where I lie awake for hours waiting for my brain to shut itself off so I can stop staring at my clock and watching the hours tick by. And every ten minutes I remember something else I could be doing instead of waiting for sleep (like writing this blog post) and I either do it, or it bounces around inside my skull for the next half hour until I remember something more important that I've forgotten to do.

What's worse than the nights, though, is the mornings. I always feel incredibly groggy in the mornings, sometimes (rarely, luckily) to the point that I can't even get myself out of bed - and that's when I've had enough sleep. When I don't get enough sleep, it feels like microscopic ninjas have snuck under my skin while I was sleeping and sprinkled a layer of fine grit right under my epidermis. If I could switch the way I feel in the morning with the way I feel in the middle of the night, I might not be a happier person, but I'd probably at least be a lot more balanced.

And in between these two wonderfully inverted polar opposites, I dream. I'm really glad I don't remember the vast majority of my dreams, because based on the ones I do remember, I apparently dream in low-budget sci-fi horror flicks. Let's look at some recent dreams that I remember. In the most recent one, I had the power to travel back in time and redo parts of my life, but I was hunted and repeatedly killed by somebody with a similar power. In another recent dream, I die in some sort of zombie outbreak caused by a mind-controlling fungus. I wake up decades later after it revives me for some reason; it's taken over the world and nobody realizes it (because the fungus altered everybody's memories, natch). It's never bright and happy dreams, it's always stuff that freaks me the hell out.

So anyway, I've got nothing against sleep, really. It's everything associated with it that I can't stand.

Sunday, November 29, 2009

Decentralization VII: Definitions

I started this series of posts to try and get a better handle on what decentralization is, what it's good for, and what general situations require it. The level of centralization is an important characteristic of systems, but people don't always understand the tradeoffs that exist there. I think, at this point, that it all comes down to the division of control.

The definition I think I'm going to use: A system is decentralized in a certain aspect if control over that aspect is divided up over the components of the system. It's too vague to say that a system is a decentralized system without specifying what aspect you're talking about. Systems can be centralized in some aspects and decentralized in others, and in fact I think nearly all systems are a bit of both.

Mini case study: Blogspot has centralized ownership (Google) and code, along with centralized authentication (mostly Google accounts, I think?) and centralized execution (runs on Google's servers). On the other hand, it has decentralized authorship (contrast with a newspaper or something like that), and participates in several decentralized protocols (linkbacks, RSS, etc). Contrast with Wordpress, which optionally has decentralized authentication and execution (you can run it on your own servers), and somewhat decentralized (open-source) code. Because of this, we can say that, broadly speaking, Wordpress is less centralized than Blogspot.

Systems can vary in the degree that they're decentralized. Federated systems, which I've mentioned before, have a set of independent servers which are decentralized, but clients connecting to the system don't have to worry about this and can just pretend they're connecting to a centralized system. (Contrast with p2p networks like Gnutella, where every user on the network is simultaneously a client and a server.) Once we understand decentralization as a division of control, it becomes clearer what's going on here: control is being divided among only the participants in the system that want to run servers, and thus reflects the reality that only a few users will actually want the responsibility that comes with that control.

Can we divide up control in more complex ways? Yes, but there's not a whole lot of benefit to it, and it increases the complexity of the system significantly. Simplicity is inherently valuable when it comes to code; complexity is usually a problem to be eliminated. I can't think of any computer systems right now that have more complex control hierarchies than federated systems - they almost certainly exist, but they're not common.

So what are the tradeoffs? Control is important in systems for both technical and business reasons, which aren't always the same. Twitter would definitely be more reliable if they made it more decentralized (more on robustness in a bit), but it would also impact their (nebulous) ability to turn a profit if they gave up control.

Why are decentralized systems more robust? There are a lot of factors that come into play here, really, so I don't have a clean answer yet. Let's look at some failure modes of systems. System overload is less of a problem in decentralized systems, because they're necessarily designed in more scalable ways, which makes it easier to throw hardware at the problem until it goes away. Hardware failure is also less of a problem, because systems that execute in a decentralized manner can more easily be made immune to individual pieces of hardware failing. Software bugs are less harmful in systems with decentralized codebases, because different implementations of the same protocols rarely have the same bugs.

Maybe this is the answer: centralization implies control, and control implies an opportunity for the one in control to make a mistake. If we define robustness as the number of mistakes or hardware failures that would have to occur to take the whole system down, or better yet the probability of a sufficient number of mistakes happening, it's easy to see why decentralized systems are robust. Thought experiment: if we had a system that was decentralized in such a way that any failure would break the system, we would have a system where decentralization made it far more fragile, rather than robust. Luckily, you'd have to be crazy to design something like that.
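
To put some toy numbers on that (my own back-of-the-envelope sketch, assuming every component fails independently with the same probability p - real systems are messier):

-- Probability that a replicated, decentralized service is completely down
-- (all n independent components failed at once), versus the pathological
-- design above, which breaks if any single component fails.
pAllFail, pAnyFail :: Double -> Int -> Double
pAllFail p n = p ^ n
pAnyFail p n = 1 - (1 - p) ^ n

-- With p = 0.01 and n = 5:
--   pAllFail 0.01 5  is about 1.0e-10  (five things all broken at once: basically never)
--   pAnyFail 0.01 5  is about 0.049    (five ways to break: five times worse than one server)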

This is the last post in this series, I think - I've found what I was looking for.

Saturday, November 28, 2009

Climategate

Is it just me, or is it kind of ridiculous the way every scandal in American politics is eventually referred to as something-gate?

For those of you that don't keep up with every little thing to happen in climate politics, somebody hacked into the servers of a climate research group the other week, and released an archive containing selected portions of the past 10 years of their email correspondence, and probably some other stuff too. Naturally, people were able to find all sorts of embarrassing stuff in there. The most talked-about email is one referring to a "trick" somebody used to combine two data sets, back in 1999 - this obviously proves that the entire field of climate research is a hoax, to hear people talk about it.

I won't be talking about any of the specifics of the leaked data, though, because I can't look at the source data in good conscience. Breaking into somebody's servers and putting over a decade of correspondence online for somebody's rivals to pore over is beyond reprehensible. It's especially bad in this case, when the rivals in question are playing politics and have very low standards for proof.

One particular issue that keeps cropping up is the removal of context. People will go and cherry-pick quotes (here is a recent and out-of-mainstream example), and somebody will eventually point out that the code in question only applied to one graph that was only used for "cover art". It's easy, when you're looking at correspondence over a span of many years, to find quotes that look bad out of context - but what does that actually prove?

Let's be uncharitable to the researchers here. Let's assume that some of these quotes really are as bad as they look out of context, and that there really is a massive conspiracy to rewrite the data, which is somehow only evident in a few snippets. Even assuming the worst, what does this prove? This is only correspondence for a single institution, which is competing with several other groups - and yet the results all still point in the same direction. Should we posit that every group working on climate science is in cahoots, and engaged in a massive conspiracy? What scares me about this is that many people would say that we should.

Maybe I'm wrong here, but here's the impression I get watching the climate change debate. First, there are serious researchers, who (generally) agree that climate change is probably happening, and do actual work to figure out the extent of it. Second, there are the "green" cheerleaders on the political left, who say mostly inane, common-sensical things about global warming, and push for specific political/technical solutions (looking at you, Al Gore). Their hearts are generally in the right places, except for the ones that are only on this team because they've found a way to make a profit (I'm watching you, Al Gore).

Then there's the anti-global-warming crowd, made up of both right-wing political types and people reacting against the second group, who seem content to take random potshots at the first two groups indiscriminately, cry about conspiracies, and (like all conspiracy theorists) don't really care about having a consistent response to the people doing actual science. It's telling that there are almost no climate scientists in this group. (Members of this group will respond: "because of the conspiracy!", and thus prove my point.) This group is especially irritating to me, because they tend to be very anti-scientist.

Seriously, though - hacking people maliciously is bad enough, but calling it Whatevergate is just awful.

Friday, November 27, 2009

My room is not my room

In the tried and true tradition of people trying to come up with something under a deadline, I'm going to look around and write about the first thing that comes to mind.

I'm staying at home this weekend 'cos of Thanksgiving, and I'm realizing that my room no longer really feels like my room. Everything in the room, with the exception of a floppy red hat that I picked up this summer, is exactly as it was when I left a little over three years ago. Thanks to the ruthless forces of natural selection, everything I really care about has moved with me to my apartment, and everything else (a lot of plastic dinosaurs, for example) is stuff I simply don't need. I suppose you could say that my stuff has evolved.

After a series of ruthless throwings-away, just about everything is gone. Just like in archaeology, the only bits left from those twelve or so years are the ones that happened to land, or get stuck, in favorable spots. Here's a jigsaw puzzle of a National Geographic cover; we thought it looked neat, so we framed it. I used to do huge jigsaw puzzles all the time with my mom; I wonder when that tradition died out. I kind of miss that.

Next to it is a set of collapsible shelves. I was never content with just having ordinary shelves, so I put them together in some weird way, with varying heights. The whole thing is kind of rickety since it was never supposed to work this way, but I kind of like it. I think it looks better this way, really - it's a pretty nondescript piece of white plastic furniture. There's a brown plastic Tyrannosaurus on the shelf. I've had it for as long as I can remember.

I should mention that all the posters on the wall are tilted by about 10-15 degrees. During some nonconformist phase (back in middle school, I think?) I went and made everything off-center, because I liked the way it looked. I probably would have done the furniture that way too, if it'd been feasible. They're all basically as they were when I left; the only poster I really cared to bring with me to college was a six- or seven-foot-tall picture of a Shuttle launch (it is pretty awesome).

Something I'm realizing as I look around: the summer after eighth grade is when I got my own computer, and it's also when I really stopped caring what my room looked like, since I was spending a lot more time online and a lot less time in my room. There's almost nothing in here that reflects me after I started high school. Maybe that's part of the reason the room feels so alien to me now - it's largely still my room from eighth grade. That's kind of a depressing thing to realize. :/

At some point (probably next May) I'm going to have to move out of this room properly. I'll take down all the posters, clean out all the drawers, throw out whatever I don't want to bring with me, and officially return the room to its status as a guest bedroom. After that, I'll come back and visit this house, but I'm never going to live here again. That's another depressing thing to realize, but one that I'm going to try to come to terms with before it becomes painfully relevant in about six months. I've already started to refer to my apartment as home; I think that's a good first step.

Wednesday, November 25, 2009

Taking Back Chrome OS

Chrome OS is a pretty interesting platform, all the more so now that it's open source. It's pretty strongly tied to Google right now, but there's nothing stopping me or anybody else from taking out all the Google-specific bits, and building up something new in their place. That's the interesting thing about open source - rather than being controlled by whoever wrote the software, it can go in directions that the creators may not have anticipated, and may not want.

The obvious first change would be to switch out the default browser - Firefox OS? Opera OS? This would depend on putting the X stack back into Chrome OS, since if I recall correctly Google dropped that. It's definitely possible, though. I think the harder part would be making the necessary modifications to the browser to allow it to be used as the sole interface to the system. Chrome can apparently view folders, and supports pinning web apps to a menu; I don't know if other browsers would want to do things exactly the same way, but they'd have to do something similar.

We could also switch in other non-browser applications. The command-line addict in me would love to have an OS that boots straight to a command line, and can connect to the internet so I can ssh. Having that on a netbook that I can carry around would be insanely useful. Application vendors could also create single-application systems as demos (and if they do, people will be trying out their apps running on Linux!).

Some people are going to ask what the point is - we already have much more capable Linux distros that can do the same thing. The interesting thing that Chrome OS adds is completely safe, transparent upgrades, and a read-only root. This has two humongous implications for the end-user: they don't have to worry about software upgrades ever, because the system auto-updates, and they can always reboot the system to get it back to a known-good state. Yes, this also means that you lose a lot of flexibility with the system; it's a tradeoff, and in many cases, the benefits of this outweigh the costs.

The core interesting idea of Chrome OS that we can use is the idea of "appliance computing". Looking at it a certain way, the OS really doesn't matter at all - what the user cares about is the application. Chrome OS, minus Chrome, gives us the necessary foundation to build completely stripped-down single-application systems. Obviously, this isn't going to be useful for everybody in all cases, but it's a pretty interesting idea to play around with.

I think the first step should be factoring out the "OS" part of Chrome OS. It'd be useful to have a core OS part, which automatically updates in the background, and an application part, as an image which is overlaid on the OS and contains the application. This way, there could be a single shared source for OS upgrades for all the different single-app systems. It would also make it pretty easy to make your own computing appliance, since you'd just have to package up the necessary files. There would need to be a dependency system so that version mismatches between the two parts don't cause problems, but this is a problem that Linux distros have explored pretty thoroughly.
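
Just to sketch the dependency idea (the types and numbers here are completely made up - nothing like this exists in the actual Chrome OS scripts, as far as I know):

-- Invented sketch: the app image declares which range of core-OS versions
-- it's known to work with, and we refuse to overlay it onto anything else.
data CoreOs   = CoreOs   { osVersion :: Int }
data AppImage = AppImage { appName :: String, minOs :: Int, maxOs :: Int }

canOverlay :: CoreOs -> AppImage -> Bool
canOverlay os app = osVersion os >= minOs app && osVersion os <= maxOs app

-- canOverlay (CoreOs 12) (AppImage "ssh-appliance" 10 14)  == True
-- canOverlay (CoreOs 15) (AppImage "ssh-appliance" 10 14)  == False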

This is kind of drifting into some ideas I've had about new ways that distros could be structured, but that's a whole other topic for a whole other post. ;)

Free electronic stuff!

This is going to be kind of awesome.

It's a really interesting idea for a promotion, actually - everybody loves free stuff, so the news that this is happening is going to be all over the place by January 7th. I, for one, am going to start planning a shopping list for that day well in advance, just in case I actually get the free stuff (they're capping the giveaway at $100k, which will probably dry up really quickly). If I don't get it, I might actually still buy the stuff - it depends on how I feel about whatever project I come up with.

I really want to build some kind of flying robot, dirigible-style. No idea how that's going to work, yet - ideally it'd be able to float mostly unpowered, so I'd either have to buy a lot of helium, or learn how electrolysis works (hydrogen rocks! :D). That's my tentative plan, anyway. We'll see how long it lasts. No matter what, I'm going to have some kind of plan come January 7th.

This sort of promotion ("hey, guys! free stuff!") would be really neat if more retailers started doing it, actually. Imagine if Newegg or Amazon just announced that 1% of orders on the site would be totally free. People really love lotteries, as history has shown, so this could be really effective. If I'm choosing between a deal offered on two sites, and one of them has a chance of giving me the order for free, I know who I'm buying from.

Anyway, short post today, because Thanksgiving is tomorrow! I've made cornbread and cranberry sauce so far. :D

Tuesday, November 24, 2009

Terrorists we need to watch out for

With the terror alert level pegged at Yellow (or Orange, if you're flying), and thousands of American troops in the Middle East fighting "terrorists", I think it's important that we know who the enemy is. Here are some prominent terrorists of the past few years:

* Richard Reid, the Shoe Bomber - Despite the stringent security measures in place at airports, Reid did the unthinkable, and managed to smuggle a bomb aboard an airplane. After being admonished once by a flight attendant for lighting a match, he was later found struggling with a fuse, another match, and one of his shoes. Next time you have to take your shoes off at the airport, remember this: every air traveler has had to undergo this inconvenience for several years now, because of a terrorist that couldn't detonate his own bomb.

* Gary McKinnon, Super-hacker - The US government is currently trying to extradite this Extremely Dangerous man from the UK. McKinnon broke into nearly a hundred military computer systems during 2001 and 2002, using advanced hacker techniques such as guessing default or blank passwords. He used his access to top-secret computer systems to delete some files, and to search for information on alien life, which he believed NASA and the US government were hiding. Gary McKinnon is one of the most credible cyber-terrorists discovered to date.

* Iyman Faris - Suspected of plotting to destroy the Brooklyn Bridge. This would have been a tremendous disaster, on several levels. Luckily, the plot involved cutting the bridge's support cables with an acetylene torch, which would have taken some pretty serious time and effort, since the cables are necessarily pretty sturdy, and consist of thousands of individual wires.

* Najibullah Zazi - One of the most recent terrorists uncovered in the US, Zazi received terrorist training directly from al-Qaeda, including instructions on bomb making. Had he managed to construct a bomb, he could have done some serious damage. Unfortunately for him, he failed to build a bomb despite nearly a year of repeated attempts. Nevertheless, this is hailed by some as the most serious terrorist plot uncovered in the US since September 11th.

* The Fort Dix attackers - Their entire plan was to show up at a military base and kill as many people as possible. This could have been a really serious plot, if they'd managed to acquire the weapons the plot required. However, they made a few fatal blunders: they videotaped themselves firing automatic weapons and talking about the plot, and then sent the footage to a Circuit City to get it made into a DVD. They also attempted to buy their weapons from an FBI informant.

* Star Simpson - While picking up a friend at the airport while wearing a homemade nametag with blinking lights, Simpson was arrested by officers carrying submachine guns. For a few days, the media was in a frenzy, gleefully reporting it as a threat with a "fake bomb". They stopped talking about the situation, naturally without any kind of retraction, once it became clear that she wasn't actually any kind of threat. I feel safer already!

So remember, folks: terrorists are people you need to be afraid of. The next time you see a major terrorist plot being reported on the news, pay attention to the details - given the media's track record in this area, it's far more likely that the alleged terrorist is incompetent, insane, or just plain innocent.

Monday, November 23, 2009

Cop-out post again

No time to write a proper post today :( Instead, I'll just point you to MLIA - hopefully it is enough of a distraction from this fail post >_>

Sunday, November 22, 2009

Monad documentation hate

Monads are a useful construct in Haskell. Even though it's a pure-functional language, monads allow you to "cheat" a little, or at least pretend to, and carry state through computations (or do other stuff). This is pretty important, since a truly pure functional language wouldn't be able to do useful things like generating output.

Unfortunately, just looking at the documentation that exists online about monads, I wouldn't be entirely convinced that they're not just an elaborate joke cooked up by functional programming partisans - the equivalent of a linguistic snipe hunt. Let's look at some examples.

First, the Haskell wiki article on monads, since that comes up pretty high in search results. This is pretty close to coming straight from the source; I bet they have some great, readable documentation, right?

This is the introductory image they use. The caption reads: "Representation of the Pythagorean monad". Thanks, Haskell wiki! Since you haven't yet explained what a monad is, or why there would be a Pythagorean one, this diagram means nothing at all to me. Actually, it's worse than that - having put some serious effort into learning what monads are, and actually having a basic understanding of what they do, this diagram still makes no sense to me. Awesome!

Or, let's look at this tutorial called "A Gentle Introduction to Haskell". It's the first result when I search for "haskell using monads", so it must be pretty useful, right? Unfortunately, it begins by diving straight into the "mathematical laws" that "govern" monads, whatever that means. Actually, I'll just quote it:
Mathematically, monads are governed by set of laws that should hold for the monadic operations. This idea of laws is not unique to monads: Haskell includes other operations that are governed, at least informally, by laws. For example, x /= y and not (x == y) ought to be the same for any type of values being compared.
Which is great and all, but doesn't explain a whole lot to somebody who doesn't yet know what monads are. This is not how you write introductory documentation, guys! To borrow an example from the quote, you don't introduce the == operator by saying that it follows a mathematical law. You say what it actually does first!

So what is a monad, actually? As far as my (admittedly extremely rough) understanding goes, it's the functional equivalent of a wrapper class in OO languages. You can use it to wrap up some boilerplate code that would otherwise be really irritating, in a more or less standard way. Haskell also provides some syntactic sugar in the form of "do" blocks, which you can use for some monads to make your code look like imperative code.
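
Since I just spent three paragraphs complaining about documentation that never shows you anything concrete, here's the smallest real example I can come up with, using the Maybe monad (which wraps a value that might be missing). This is my own toy code, not something from the pages above:

-- Prelude's lookup returns a Maybe: Just the value if the key is there, Nothing if not.
phoneBook :: [(String, String)]
phoneBook = [("alice", "555-1234"), ("bob", "555-5678")]

-- Look up two people and combine the results. The Maybe monad handles the
-- boilerplate: if either lookup comes back Nothing, the whole thing is Nothing,
-- and we never have to write an explicit check for it.
bothNumbers :: String -> String -> Maybe String
bothNumbers a b = do
  x <- lookup a phoneBook
  y <- lookup b phoneBook
  return (x ++ " / " ++ y)

-- bothNumbers "alice" "bob"    == Just "555-1234 / 555-5678"
-- bothNumbers "alice" "carol"  == Nothing

That's the wrapper-class analogy in action: Maybe is the wrapper, and the do block is the sugar that hides the unwrapping.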

See, now that wasn't so hard, was it?

Saturday, November 21, 2009

Building ChromeOS on Gentoo

Google released the source code to Chrome OS, and there are already some disk images floating around, but we Gentoo users know that if you want it done right, you have to compile it yourself. ;) Here's how I built a VMware image on Gentoo.

The build instructions say you need Linux, but Google, as usual, seems to assume that you're running Ubuntu when you're trying to build Chrome OS. Here are the steps I took to build it, following the documentation that starts here:

(If you don't really care about how I built it, and just want the image, click here.)

1. emerge debootstrap
(debootstrap is necessary to actually build the OS, but you can leave this running in the background while you do the next few steps.)

2. wget http://src.chromium.org/svn/trunk/tools/depot_tools.tar.gz
3. tar xzvf depot_tools.tar.gz
4. export PATH=$PATH:$PWD/depot_tools
5. gclient config http://src.chromium.org/git/chromiumos.git
6. gclient sync
(wait for a really long time while it downloads the source)
7. cd chromiumos.git/src/scripts

(at this point, you'll need debootstrap installed, it'll be in /usr/sbin so let's put that in PATH)
8. export PATH=$PATH:/usr/sbin:/sbin

(if this fails, and you have to rerun it, you might need to delete the repo/ directory between runs. also, a few of these scripts will sudo, mainly to set up the mounts in the chroot; read through the script if you don't trust it, it's pretty short)
9. ./make_local_repo.sh
10. ./make_chroot.sh
11. cd ../..

(now, we download the browser. the link on the build instructions is broken, but this one should work:)
12. mkdir -p src/build/x86/local_assets
13. wget http://build.chromium.org/buildbot/continuous/linux/LATEST/chrome-linux.zip -O src/build/x86/local_assets/chrome-chromeos.zip

(now, we enter the chroot and continue with the build)
14. cd src/scripts
15. ./enter_chroot.sh

(at this point you should be in the chroot)
(grab a snack, this'll take a while)
16. ./build_platform_packages.sh
17. ./build_kernel.sh

(I ran into a conflict here: HAL on the host system somehow blocks ACPI from being installed in the chroot. Stopping hald on the host system worked around it successfully.)
18. ./build_image.sh

If you made it this far, you have an image built! Awesome.

To build a VMware image, use the image_to_vmware.sh script:
19. ./image_to_vmware.sh --from=../../src/build/images/999.999.32509.204730-a1

Note that this script requires qemu-img. You can edit the script and replace qemu-img with kvm-img if (like me) you have kvm installed but not qemu.

I haven't tried building a USB image, but it should work something like this:
20. ./image_to_usb.sh --from=../../src/build/images/999.999.32509.204730-a1 --to=/dev/sdb

So, as of right now, I'm running Chrome OS in VirtualBox. It's pretty slow, being in a VM and all; I'm going to try to get it on my Eee later, when I have more time.

First impressions:
The boot time is pretty freaking ridiculous, especially since it's running in a VM. Something like ten seconds to a login screen. Can't wait to see how it does on real hardware.

There's some kind of bug where I get certificate errors for Google sites - but only the pinned ones. The pinned Gmail tab errors, for instance, but logging into Gmail in a new tab looks fine. Other people have reported similar problems for other builds, it looks like, so I'm expecting that it'll get fixed eventually. It might have something to do with how it tries to automatically log in based on your login credentials; that's complete speculation on my part, though.

The Chrome menu that was in the demo is missing in my build. Not really sure why. :( Could be because I used a different Chrome binary than the one they listed in the install docs. Will have to try that again once they fix the link. >_>

Friday, November 20, 2009

Decentralization VI: Package Management

I have issues with software installation on Linux distributions. Unlike Windows and Mac OS X, where (for better or worse) you can download a package from a website and install it, Linux gets its packages through centralized repositories. There are advantages and disadvantages to this method, but Linux distros don't do it because of the advantages; they do it out of necessity. I'll explain why in a bit, but I want to get the pros and cons out of the way first.

Centralizing packaging has certain benefits from a QA standpoint. It allows distros to ensure compatibility between all the various components of the system, and enforce various standards across all the packages. It also allows them to smooth over incompatibilities between distros - if Ubuntu does things a certain way, but the person that wrote the software had Red Hat in mind when they wrote it, central packaging allows Ubuntu devs to sort that out before the software lands on people's machines. Distros also prefer to have control over packaging because there are a lot of different package formats used on Linux, and it would be kind of ridiculous if every software author out there had to support all of them. There aren't hard and fast rules about which parts of installation are distro responsibilities, but there are conventions, at least.

Distros also use centralized distribution, for the most part: when you install an Ubuntu package, you download it from an Ubuntu server, using an Ubuntu package manager. This simplifies finding and installing software, obviously. You don't have to look very far to find any given program, and you're assured that what you're installing is actually the program, and not some random virus. The organization behind the distro also has to provide servers, of course, but this isn't too much of a problem. Bandwidth is cheap these days, and for a distro of any significant size, there are plenty of people willing to donate a spare server or two.

As for the disadvantages, centralized software creates a barrier to entry. Anybody can write a program for Linux, but actually getting it into the distro repositories takes a certain amount of recognition, which is more difficult to gain without being in the repos in the first place. The result is that there's a lot of software out there that doesn't exist in any repository. Users generally don't like to install software outside of the distro package managers, because when you do, you don't get any of the nice features (such as, oh, being able to uninstall the software) that the package manager provides.

Distributions also get saddled with the unenviable job of cataloguing every useful piece of software for Linux that people have written. This takes a huge amount of developer effort; Gentoo, for instance, has hundreds of people (the vast majority of people working on it!) dedicated to just maintaining the programs you can install. We can really take a more general lesson from this: When you try to centralize something which is, in its natural state, decentralized, it's an expensive and ongoing job.

But pros and cons aside, I said earlier that Linux distributions do this out of necessity, not just because they want to. If you write a piece of software for Windows, Microsoft has a lot of people working pretty hard to ensure that it'll work on future versions of Windows. Backwards compatibility is intricate, uninteresting work. Since Linux is written mostly by volunteers, it's exactly the sort of work that never gets done, because nobody really wants to do it. The result is that backwards compatibility on Linux is a bad joke. Developers break compatibility, often gratuitously, often without warning, and so Linux software needs to be constantly maintained or it simply ceases to function.

In an environment like that, you absolutely need somebody doing the hard work and making sure a piece of software still works every once in a while. That job falls to the distros, because the original authors of the software don't always care that it works outside of the configuration that matters to them. Look at it from the distros' perspective; if you're trying to make a coherent system, but the components of the system are prone to randomly break when you upgrade them, you need to maintain centralized control if you want to have any hope of keeping things stable. In other words, the lack of backwards compatibility on Linux forces distros to centralize software distribution, and do a lot more work than they would otherwise.

These posts are supposed to be case studies in decentralization, so I'll summarize. The difference between Linux and commercial platforms is the degree of compatibility in the system. The degree of compatibility determines the amount of control you need to make sure the system works as a coherent whole. With Linux, the need for control is much higher, so distributions are pushed towards a centralized software distribution model.

Thursday, November 19, 2009

Chrome OS

I just watched the live webcast announcing Google Chrome OS. I was expecting a lot from Google with this, but they've gone even beyond that; this announcement is serious business. They're talking about fundamentally changing the way people use computers.

First, the basics: Chrome OS is exactly what it sounds like. It's an operating system that boots directly into Chrome. The OS is a stripped down Debian install, but that doesn't really matter, as we'll see in a bit. Everything happens through the browser window - there's a file browser built into the browser, for instance. The start menu equivalent (of course there's one) is a Chrome logo in the top left corner of the browser. There's no desktop, no My Computer, nothing else - just Chrome.

This brings us to the first major difference between Chrome OS and other OSes. There are no applications to install; everything you could conceivably want as an application is a web application. They make this a bit easier by pinning some shortened tabs ("application tabs", they call them) at the front of the tab list, so that you have one-click access to your Gmail, for instance. Obviously, this is a pretty radical design choice. The emphasis is definitely on shifting to online services for everything, rather than using desktop applications - and, not at all coincidentally, Google has spent years polishing their online versions of desktop applications. (They showed the online version of Office during the demo, but it looked terrible. Half the screen was taken up by the controls. I can't see how it's a serious competitor to Google Docs, in its current incarnation.)

Not only does this moot one of the usual objections to Linux ("I can't run my apps on it"), it dramatically simplifies securing the system. My understanding of it is, Chrome OS has a small number of signed and/or whitelisted programs that it runs, and the system can assume that anything outside that set is a virus. This is such a fundamentally powerful idea that I'm surprised it took this long for somebody to try it out in a consumer OS. Chrome OS then takes it to the next level by signing the kernel and OS, so that there's (probably) no way at all for malicious code to get in. Their goal is for you to be able to trust that if it boots Chrome OS, it's safe to use. As for updates, it automatically updates itself in the background - this is a difficult thing to get right, but Google is more likely than most to pull it off.

Because there are no local applications, they can get away with having no local storage. This bears repeating, because again, this is pretty radical: you can't save files to the hard drive. You can use flash drives and stuff like that, which makes sense, but the OS drive itself is locked down and mounted read-only. This will be one of the more controversial decisions, I'm sure. It forces you to store all your files and settings in the "cloud", which makes migration easier, but is probably going to be kind of a pain.

I'm not totally clear on how it handles user accounts, but my impression is that you'll be able to sign in to any Chrome OS machine using any Google account, and have it work the same. This is an incredibly powerful idea! It essentially means that you're completely independent of the hardware you're using. If your computer explodes, or you spill apple juice on it, or it's stolen by pirates, or whatever - no problem, it's easy to replace. This ties in with the no-local-files thing - if all of your files are already in the cloud, then it makes sense that you'd be able to log into any Chrome OS machine and have it work the same as your own.

A word on the cloud: This is going to be a sticking point for some people. When I say the "cloud", what I really mean is Google's servers - I doubt they'll allow you to switch to other providers, if that's even feasible. There are legitimate user interface reasons for doing it this way, but it still has the potential to be a privacy nightmare, not to mention the power Google is going to have once they hold everybody's user data. Again, this is one of those things that Google can get right if anybody can; the question is whether or not anybody actually can.

Another word on the cloud: When your internet connection is down, or when you're on an airplane, or when Google's servers go down (it's happened before, it'll happen again), your Chrome OS computer is going to be basically useless. People asked about this in the Q&A, and the answers boiled down to "HTML 5 lets you use web apps offline" and "Google has better uptime than most personal computers, so nyah!" It's not really a satisfactory answer, since HTML5 offline storage is only useful for web apps that have been specifically designed to work offline. In the end, Chrome OS won't be that useful if you're someplace without free and easy Internet access.

Random thought: It's accepted among security experts that local access = root access, for a sufficiently determined attacker, for lots of reasons. Google has taken some steps to prevent that (signed OS is a big one), but it remains to be seen whether it's enough. There are a lot of really devious attacks that you can use if you have unfettered local access.

So what's the big idea?

The paradigm that Google is aiming for with this is something called "appliance computing" - treating the computer like any other appliance. You have a basic expectation that a refrigerator, for example, will just work, without requiring constant maintenance. Appliance computing is when computers are that simple to use. This is something that people have wanted for a long time, but that the existing model of computation has made really difficult. Designing an OS that doesn't require administration is hard, but based on the info I've read so far, it seems like Google might have pulled it off. Designing a truly secure OS is really hard, but while I'm never willing to bet against hackers, I think Google has at least done better than anybody else who's tackled this problem.

Their goal with Chrome OS is netbooks and things like that, which makes sense. While Chrome OS looks nice, I wouldn't ever use it as my primary operating system. There's a certain level of control over my systems that I prefer to have, and one of the key goals of Chrome OS is shifting most of that control (and the associated responsibility) to Google. As a spare system, on the other hand, Chrome OS will be really useful, and that's how most netbooks are used these days anyway.

Wednesday, November 18, 2009

Death of Advertising, round 2: personal RFP

I've blogged on this topic once before, but I didn't have as much to say.

Advertising is annoying, isn't it? It sometimes feels like an overenthusiastic robot salesman that always follows you around, interjecting every few minutes about so-and-so product that people who fit your profile might enjoy. Not only is it annoying, it's remarkably inefficient - I don't care about 99.5% of the ads I see on a daily basis. This really sucks!

So, why not kill the ad industry?

I've wanted to say that for a long time, but until recently I couldn't back it up. If we want to get rid of advertising, we need to replace it with something better, otherwise nobody is going to listen. This is setting a remarkably low bar, actually: "we need to improve on a universally-reviled system." It's actually pretty surprising that nobody's managed to do this so far.

Enter Project VRM. You may have heard of CRM; VRM is basically that in reverse. They've floated the idea of a "personal RFP", the idea being that instead of being advertised to, you put something up on a website when you need something, detailing your requirements, and vendors come to you. Think of it as a reverse eBay.

This is a pretty disruptive (and therefore awesome?) idea. It could completely change the dynamic between buyers and sellers, for the better on both sides. If I'm a buyer, I don't have to spend as much time on comparison shopping - options are brought to me, instead. If I'm a seller, I don't have to waste time and money on broadcast advertising - I can just search for people who want what I'm selling, and contact them directly. I can also customize my offer to each individual, something which companies would love to do right now, but can't because broadcast advertising is such a limiting medium.
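
Just to make it concrete, here's a purely hypothetical sketch of what one of these personal RFPs might contain - as far as I know, Project VRM hasn't settled on any actual format, so every field here is invented:

-- Hypothetical: roughly the information a vendor would need to make me a real offer.
data PersonalRFP = PersonalRFP
  { lookingFor :: String    -- e.g. "10-inch netbook"
  , mustHave   :: [String]  -- hard requirements: "2GB RAM", "ships to the US"
  , maxPrice   :: Double    -- the most I'm willing to pay
  , openUntil  :: String    -- how long vendors have to respond
  } deriving Show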

One problem on the way to adoption could be the chicken-and-egg problem. There's not really any reason for buyers to use something like this until there are sellers, and vice versa. We could get out of this by crowdsourcing comparison shopping, at least initially - having people look at requests, and finding the best deal online that matches, in exchange for a cut of the sale. This would probably even happen spontaneously, as long as the service doesn't explicitly discourage it.

Would this actually kill advertising? Maybe not, but it would certainly change the nature of it. Right now, advertising has two major functions: convincing you that you need a product, and directing you to someplace you can get it. The former goal wouldn't really be affected at all; only the latter would change. All isn't lost, though. In a perfect world, people would see the advertisements, consider them, decide they want the product... and instead of clicking on them, go to a personal RFP site, and get it that way. If this takes off in any meaningful way, we could see advertising take a huge hit.

Tuesday, November 17, 2009

Decentralization V: Regulating thepiratebay

So I heard today that thepiratebay is shutting down their tracker for good, and shifting to using DHT exclusively. Hooray, this blog deals with current events for once!

There are plenty of arguments on both sides of centralized trackers versus trackerless DHT. Centralized trackers are much simpler for everybody: they allow you to keep statistics on your torrents, and they let you control who can access a torrent. DHT is more robust, since it cannot be taken down except by extraordinary measures, and cheaper, since that's one less server you need to keep running. More importantly, in this case, it also lets you spread the legal liability around, diffusely enough that the network can't be taken down by lawsuits.

Let's be honest, that's the real reason TPB is doing this. For all their talk of this being the right technical decision, and not motivated by the current lawsuits, the fact is that they're potentially facing a lot of liability because they run a tracker in addition to indexing torrents. Switching to DHT exclusively means that they're free from that, and their defense that they're equivalent to a search engine has a better chance of working.

So here's the interesting point. If you want to shut down a website, it's relatively easy to do - there are several centralization points, such as the domain name, or the server it's hosted on, that are owned and paid for by people, and people are subject to the law. A fully decentralized service, on the other hand, is much more tricky. Take Freenet, for example. The system is specifically designed to be completely anonymous, secure, and deniable - if you're careful, it's next to impossible to prove that you downloaded or uploaded something from or to Freenet. I'll be blunt here - child pornography is traded occasionally on Freenet, and while there are a lot of people that would like to shut it down, for good reasons, it is basically impossible. If you want to shut down BitTorrent DHT, or the Freenet network, you're basically out of luck.

Here's the relevant question, then. How do you regulate a completely decentralized system? Is it even possible? I would argue that, with the Internet's current architecture, it's not. This is a huge deal - right now, it is possible to build completely secure and untraceable communication networks, at next to zero cost to yourself, which cannot be taken down by anything less than the scale of a massive military operation. It doesn't even have to have a lot of people using it. Take, as another example, any of the various mega-botnets running around these days. These are networks composed of millions of computers, next to impossible to shut down, where almost all of the participants in the network are there involuntarily.

What does it mean for society, now that communication has the potential to be completely unregulable? How do we shut down a terrorist cell, when they talk to each other over encrypted VoIP instead of cell networks? (Actually, that's not much of a problem. Despite what the government tells you, most terrorists would be lucky to set off a firecracker, much less talk about setting one off over VoIP.) Do laws about libel still mean anything, if speech can be truly, untraceably anonymous? What about copyright law? Now I'm drifting into old questions, though.

There are interesting parallels between legal regulation and hardware failure, actually. Decentralization, in its ability to protect against the latter, actually seems to prevent the former very effectively.

Monday, November 16, 2009

Things I nearly forgot today

* To get out of bed.

* To wake up, once I got out of bed. It took a few extra hours.

* Today is Monday. Crap.

* To finish writing my statement of purpose for grad school apps. I've had that 95% complete for way too long.

* To do other stuff with my grad school apps.

* What I'm doing here.

* To avoid tvtropes.org like the plague.

* What we're doing for our stats project. Luckily, one of my group members remembered! (The other found out for the first time today, I am pretty sure.)

* That I needed to pick up my sister after I got done with the stats group meeting.

* That ice cream doesn't actually count as dinner, no matter how much I want it to.

* To eat dinner. (Fried rice from home, plus Sriracha, eventually!)

* To do laundry. No, wait, put that under things that I actually did forget.

* To update this blog.

Sunday, November 15, 2009

Decentralization IV: Nonresident societies

Wouldn't it be neat if we could be citizens of the Internet? And not just because it'd vindicate this xkcd. More and more, we can live our lives almost completely online (Second Life being the classic-if-somewhat-irrelevant example). Taking this to the extreme, what if we could become citizens of the Internet? What would the consequences be? Would it even make sense?

(There are obvious problems with treating the Internet as a place, of course, and I blame sci-fi for a lot of them, going back to Neuromancer, and probably well before that. Sci-fi has stuck us with the gigantic lie that is "cyberspace", and convinced a lot of people that would otherwise know better that the Internet is a place, filled with things, which are separated by distance, and separated by [fire]walls. All of this is complete nonsense.)

What I'm talking about here is decentralized governance, maybe, until I find a better word for it. All governments and countries today are centralized in a really important way: centralized jurisdiction. The government has control, imperfect though it may be, over your presence in the country at any given time. They can force you to stay or to leave, and it's difficult to stay or leave without the government's tacit approval. There's a rough balance between the government's jurisdiction over you, and your ability to travel off the grid. This is enforced by the laws of physics, on some level: if we could teleport, for instance, or become invisible, the balance here would be very different.

Whenever there's a limitation imposed by the laws of physics, we can expect technology to take a crack at it. High speed transportation is changing the balance: with the advent of air travel, for example, you can pass through a country without ever being in it in any meaningful sense. This doesn't really change the balance in any meaningful way, though, because governments can simply step up control of airports, and pull the balance back in the other direction. Fundamentally, it's still the same situation.

Jurisdiction is important. It forms the basis for both laws enforced by governments, and services provided by governments. A military, for instance, protects a territory. It wouldn't make sense for countries to have militaries, unless there was a clear-cut equivalence between a government and the territory it owns. With an Internet-based government, though, you would lose that equivalence. When every citizen of a country can leave at will, simply by disconnecting, there are many services that it's impossible to provide.

You should be objecting at this point that this is all moot, since any Internet citizen would also be a citizen of whatever country they happened to be living in offline. While this is true now, it's not a necessary condition. We could imagine a region, with minimal (or without any) governance, designed specifically for people from various Internet nations to coexist. This could become arbitrarily ludicrous; imagine being a police officer in such a place, and not knowing whether or not any given person you saw was within your jurisdiction. Or, imagine trying to collect taxes from citizens that suddenly switch their allegiance for a few weeks every April.

Since I'm running short on time now, I'll just go ahead and assert that an Internet nation can only regulate what happens on its own servers and systems. This is not entirely useless, and a lot of services (especially identity-related services, which every government in the world is currently sucking at) could be provided this way. Even so, at this point we're talking about such a watered-down and neutered conception of citizenship that an Internet nation ceases to have any meaning at all.

Exclusive Internet citizenship is pretty much a bust. All is not lost, though. We could imagine a form of Internet dual-citizenship, where an online "virtual country" provides some additional services, and works with governments to provide them across national borders. (Actually, I really hope the phrase "virtual country" doesn't ever catch on. I'm starting to hate it already. >_>) This would represent a hybrid decentralization of government - doing what can be done in a decentralized way, but falling back to the existing central government for everything else.

Saturday, November 14, 2009

College Puzzle Challenge non-post!

Today is the College Puzzle Challenge! Non-stop puzzle solving from 11 AM to 11 PM, it'll be awesome. But, if you're expecting a real post from me, you're crazy. :p

Instead, I have a fortuitous link or two!

These two blog posts look at corporate politics via The Office. Even if you don't watch The Office, they are filled with seriously awesome stuff! But you should maybe consider watching The Office anyway. <_<

Friday, November 13, 2009

Review: Sony Reader PRS-500

Somebody asked me to do a review of this, since I occasionally talk about how awesome it is. (My theory is that, because it's a relatively niche product, Sony forgot to have their crack team of anti-engineers work on it and cripple it in some way.)

So what's good about it? Mostly, it rides on the strengths of the e-ink screen. The battery life is simply ridiculous - I only have to charge it every few weeks, or every few days if I'm reading non-stop. In terms of books, the battery lasted long enough to get through Anathem (nearly 1000 pages) on a single charge. E-ink only uses energy when it updates the screen, so it's perfect for this kind of device. For people that say that an iPhone is all the ebook reader they'll ever need: yeah, let's see your iPhone last this long.

Physically, it's comparable in size to a really thin paperback - it's a bit larger, thinner, and heavier. It'll even fit in your pocket, if you have big pockets. It comes with a fairly sturdy cover, which is pretty nice - as long as you don't manage to snap it in half or something, it's almost as sturdy as a real book. There are multiple buttons you can use to flip pages, which is convenient if you want to hold it in different ways. It connects to a computer using a normal mini-usb plug.

The contrast on the screen could be higher, but it's still perfectly readable in most lighting. Bright light definitely helps, though - imagine a book printed on medium gray paper. That's about how reading the screen feels. You can adjust the font size, but I usually keep it on the smallest setting so I can see more text at a time. You can't change the font that it uses, which would be nice, but isn't really essential. The screen has 170 DPI, which I sometimes wish was higher, because the edges of the letters look jagged if you look really closely. It's not bad enough to be distracting when you're reading, though. Overall, the reading experience is pretty good - it's much easier on the eyes than an LCD screen.

It has about 100MB of usable storage built in, which is enough for several dozen ebooks - I haven't even come close to using up the space yet, since I usually delete books from it when I finish reading them. It also comes with an SD card slot, though, so if you wanted to you could put thousands of books on it at a time, and swap out SD cards for even more space. Basically, for all intents and purposes, you can treat it as having unlimited storage.

One thing I wish it had is a way to jump to a specific page. It's usually pretty good about remembering your page, but if you lose it somehow it's a huge pain to get back to it. It'd also be nice if you could do more from the interface: you can't delete books, for instance. For that, you have to use Sony's provided software, which is a gripe in its own right, because it is kind of shittastic. It took me forever to figure out how to even copy a book onto the device.

This is largely mitigated, though, because there's a pretty good open source replacement for it, called Calibre. Not only does it actually work (on Windows, Mac, and Linux, no less), it handles conversions between different ebook formats pretty smoothly. The only thing you might conceivably need Sony's software for is firmware updates, and since I'm pretty sure they've stopped supporting the PRS-500, that's not a huge concern anymore. >_>

One more thing: Sony is launching a bookstore based on EPUB, but for some inexplicable reason, they've decided that the PRS-500 isn't important enough to update with EPUB support. I applaud their zeal in trying to retroactively screw over this device, but since EPUB DRM has already been broken, I'm not anticipating too much trouble converting to an older format.

Edit: Wow, I should complain about stuff more often! I just saw via mobileread that Sony will offer a free upgrade to PRS-500 owners. XD

Thursday, November 12, 2009

Decentralization III: B.Y.O.I.

There's one question you can ask of any new Internet technology that can usually predict whether or not it has any chance of succeeding as infrastructure: "Can I just run my own, and have it work the same way?"

This sort of decentralization is common to nearly all successful Internet-scale technologies. I can run my own Web server, and anybody can access it just fine - I don't have to get permission from the Web Corporation, or run my website on hardware provided by them, or anything like that. Same with e-mail - anybody can make their own mail server, and send messages to anybody (unless the ISP blocks it, which many do these days). Same with XMPP - anybody can run their own XMPP server, and use it to chat with anybody else in the world using XMPP. Same with Google Wave - Google knows what I'm talking about! They designed it to run in a federated model, so that people can set up their own Wave servers, and communicate with people using other ones. Same with dozens or hundreds of other protocols.

Of course, there are some notable exceptions, in every direction. Google is arguably Internet-scale, even though they're a centralized service. Internally, though, they've got decentralized infrastructure all over the world. Basically, they've gotten some of the benefits of decentralized infrastructure, by paying for all of it themselves. IRC, on the other hand, is sort of a counterexample because even though it has decentralized infrastructure, it scales pretty poorly - if you're on one IRC server, you can't talk to people on other servers. It's a tradeoff, really. IRC servers have enough problems to deal with even without being able to send messages between servers.

Twitter's a counter-counterexample, because I love to hate on Twitter: They fancy themselves to be an Internet-scale service, but I don't believe they can scale to match the demands that come with truly being Internet-scale. (They're still getting taken down by DDoS attacks every once in a while. Can you imagine the kind of DDoS it would take to knock Google offline?!) Clever engineering may save them yet, but I wouldn't count on it.

As a counter-counter-counterexample (can he do that? :O), we have Identi.ca and the OpenMicroBlogging network. This is a Twitter-like service that actually can scale, because they've designed it to be usable across multiple servers. For example, if I have an account on Identi.ca and somebody else has an account on Brainbird, I can subscribe to them and everything just sort of works. I guess that instead of being a counter^3 example, this is just an example: anybody can run their own instance and have it work with everybody else's.

The BYOI (bring your own infrastructure) approach works. But what are the tradeoffs? Decentralization in these cases brings scalability and robustness, both of which are really important for systems that are trying to gain traction. On the other hand, you lose centralized control. This makes it more difficult (next to impossible, in some cases) to update the protocol, and it means that some unwanted uses of the system (such as spam) become nearly impossible to stamp out.

It also means you have to subdivide your namespace, and this can be a tricky problem. With Twitter or AIM, you can talk to people using just a username. With email or XMPP, you have to specify what server they're on as well (user@server.com), because people can use the same username across different servers. People have tried to design decentralized systems that use a centralized namespace before, but it's a fundamentally hard problem to solve without compromising your design. I would argue that DNS (the system that maps domain names to addresses) has managed it - whenever you type in a website address, you're pretty confident that anybody else in the world will see the same website, because it's a centralized namespace on a decentralized system. On the other hand, they only managed this by charging money for domain names, which isn't something that'd work well in other places.

Finally, for a system based on a decentralized protocol, you get some level of additional security over time. There are two classes of security holes: holes in individual applications, and holes in the protocol itself (the latter being much rarer). With a centralized system, these have basically the same impact, since there's really only one instance and one implementation. With a decentralized system, there are usually a few major implementations, and a lot of minor ones around the edges. If tomorrow a major bug was found in BIND (the most common DNS server) and all BIND servers had to be taken down until it was fixed, the Internet would mostly continue to function. You can't get that with a centralized service.

Tuesday, November 10, 2009

PiCoWriMo

As some of you are intimately aware, November is National Novel Writing Month! One of these years, I'm actually going to do it, but right now is really a bad time I think. Maybe next year. Several of my friends are doing it, though, so this is a tribute to them!

A NaNoWriMo is a bit much for me right now, but thanks to the magic of the metric system, I can at least manage a PiCoWriMo. So here is a fifty word story:

The suburban lights sprawled beneath them, in jarringly uniform rows, like government-issue constellations. As they fell, they tumbled, and glimpsed for a moment their makeshift dirigible, scudding silently against the Moon's halo. They landed hand in hand, mad grins on their faces, as noiselessly as two trees in the forest.

Monday, November 9, 2009

Default applications on Linux

Guys, I'm interviewing with Microsoft today, wish me luck! To commemorate this, I'm going to do a tribute to the anonymous Linux Hater's excellent blog.

How to set default applications

On Windows: Right-click on a file of a given type, go to "Open With", and set the application that you want it to open with.

On a Mac: Pretty similar, except you use "Get Info" instead of "Open With".

On Linux: It depends on what desktop environment you're using. Gnome, KDE, XFCE, and every other desktop each do it in a completely different, completely uncoordinated way. Also, if you switch desktops, you lose all your settings.

Q: But doesn't that make people's lives more difficult?

A: Linux is about choice! Specifically, the choice we made to do whatever the hell we wanted.

Q: But what if I want my program to set the default application for a certain filetype?

A: Oh, you should never need to do that. We won't tell you what you should do instead, but we're very sure that you'll never want to do that.

Q: Wait, I thought Linux was about choice! What about my choice as an application developer?

A: Did we say that? Actually, it's just an excuse we use to avoid any of the hard work that might be involved in actually standardizing things. See, when we don't standardize, users have to make choices they otherwise wouldn't need to worry about - but this way, they get to feel good about all that freedom. Choice, baby!

But wait, never fear! There's the Portland Project, which had the lofty goal of unifying Gnome, KDE, and all the other desktops out there! Surely, an effort this important to the success of the Linux desktop would receive the huge amount of developer attention it needs to become successful!

...no, wait, my bad, I'm thinking of a less dysfunctional operating system. Actually, xdg-utils (the result of the Portland Project, which was supposed to unify all this) is buggy, mostly unmaintained, and hasn't actually seen a release in years. On top of that, instead of being an awesome unifying solution for the completely separate default application systems that exist, it's just a few shell scripts that support Gnome, KDE3, and sometimes XFCE. There's not even a generic fallback mechanism in there! It's completely useless for users of anything other than Gnome and KDE (and honestly, XFCE users aren't much better off).

Wouldn't it be nice if there was a tool that would let you just go to a command line and type "open whateverfile.whatever"? All the desktops could standardize on it, users of other desktops wouldn't be left out, and everything would be simpler. I have been thinking about this for a while, and I even took a few shots at writing such a tool. So you can imagine how dismayed I was when I discovered that this tool already exists on Macs and it works perfectly. You'd think the Linux community would have been all over that; ripping off Apple is what they do best!
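For the record, here's roughly the tool I have in mind - a minimal Python sketch (not a finished tool, just an illustration) that papers over the platform differences using commands that already exist on each OS:

    import subprocess
    import sys

    def open_with_default_app(path):
        """Open a file with whatever the OS thinks the default application is."""
        if sys.platform == "darwin":
            subprocess.call(["open", path])                     # ships with OS X
        elif sys.platform.startswith("win"):
            subprocess.call(["cmd", "/c", "start", "", path])   # or os.startfile(path)
        else:
            subprocess.call(["xdg-open", path])                 # the Portland Project script, if installed

    open_with_default_app("whateverfile.whatever")

It's not much code, which is sort of the point.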

Sunday, November 8, 2009

Decentralization II: currency

There are two major systems of currency in this country right now. (There are probably others that I'm overlooking, but we'll simplify.) I'm referring to cash, and credit cards. Ostensibly, these two are a single currency, since they're readily interchangeable and tied together, but they're clearly two separate systems.

Cash is a decentralized system. Any given bill holds its value more or less independently of its surroundings, assuming of course that the government is still around. Credit and debit cards, on the other hand, form a centralized system - your credit card may have a fancy picture on it, but unless whoever you're trying to pay has a dedicated communication channel with the credit card company, it's just a very pretty piece of plastic. This distinction is made clear in several ways.

First, authentication of value: Cash uses a relatively weak distributed authentication scheme - the bill or coin only has to look valid, and faking that look can be made difficult, but never impossible. This only works because breaking the authentication (counterfeiting) on a large scale can be expensive, and each individual break only nets you a relatively small payoff (the value of the coin or bill). Credit and debit cards, on the other hand, use relatively strong authentication for value - you contact a centralized server, which can keep track of all currency in the system. This is far more involved than the decentralized cash model, but it's also next to impossible to break, assuming the server is written securely. (Note that authentication of value isn't the same as authenticating the owner of the money - both cash and credit cards are pretty bad at this, in different ways.)

There's also the issue of robustness. It's not unheard of (even if it's not exactly common) for credit card processing to "break" at a given venue, forcing them to process all transactions with cash, or by other means. This can happen because credit cards simply don't work without being able to contact a central location to get the balance on the card, and subtract from it. Cash, on the other hand, is available for use under a much wider range of circumstances. Because it holds its value in a fully decentralized manner, it's not subject to any kind of communication breakdown, and so it works as long as people are willing to accept that it has value. (Note, though, that the authentication is weaker because of this - thus, counterfeiting is possible for cash, while a fake credit card wouldn't get you very far.)

Cash also has the property that, since no centralized communication is required for its use, it can be used without being tracked. In other words, cash can be completely anonymous. With credit cards, there's a single point through which all transaction data flows, and this in turn means that data about when the card is used can be collected very efficiently. This is another property that applies in general to centralized systems, but it can be applied to decentralized systems too - it's just harder. We could imagine, for instance, a system in which people were required to scan all currency that they handled into some sort of centrally controlled device, to prevent counterfeiting. This could be circumvented, since the scanning step isn't required for the transaction, but it's a way to graft centralized notions of control onto an otherwise distributed system.

This is an important point, actually. In a lot of cases, centralized systems can have decentralized aspects added, or vice versa. With money, for instance, it's all printed in a few centralized locations, and this contributes to it having value: if anybody could print their own money, it'd be worthless. When it comes to distributed systems, introducing centralization can be a powerful technique, if the tradeoff is worth it.

In other news, I haven't figured out everything I want to do for the rest of these, so I just might take requests. XD

Saturday, November 7, 2009

Decentralization I

Which is better, centralization or decentralization? I've been giving this question a lot of thought lately, in the context of distributed systems, and I've come to realize a few things.

First, the question applies to more things than you'd expect. For once, I'm not going to spend the entire time talking about tech - command and market economies, for instance, can be understood as centralized and decentralized systems, respectively, and analyzed as such.

Second, there are fundamental limitations in both directions. For example, a fully centralized system will always undergo some sort of failure eventually, thanks to Murphy's law, while a completely decentralized system has to deal with malicious individuals, asynchrony, and other such issues that keep people that work on distributed systems awake at night.

Third, it's not an all-or-nothing question; it may not even be a smooth gradient. In addition to the tradeoff between centralization and decentralization, we need to consider hybrid decentralized systems, which have proven to be a good compromise. Popular examples include e-mail, and XMPP (aka Jabber, aka Google Talk).

Over the course of this month, I'm going to be writing a series of posts about decentralization. Many of them won't even be about computers, but about other systems that I'm looking at from this perspective. I think that there are a few core principles that account for the difference between centralized and decentralized systems, and I'm trying to tease out what those are.


Friday, November 6, 2009

The OOM Killer

There is a small but vocal contingent of Linux "advocates" that are only too happy to tell you that Linux is super-awesome and will solve all your problems. "Linux will do everything that wind0ze will do, only better, and it'll do it for free!" and on and on like that.

I'm not writing this post specifically to annoy people like that, but it certainly wouldn't hurt. :3

So what's this "OOM killer"? It's something that Linux fanboys generally don't like to talk about, assuming they even know it exists. Let's get some background first, on memory allocation.

When a new process is created on Linux (by forking an existing one), the operating system needs to copy all the memory from the parent process into the new process. It does this because, following the traditional process abstraction, the child process needs to inherit all the data from the parent, but also needs to be able to modify it without messing up the parent. Linux uses a technique called "copy on write", or COW, to do this really quickly. The trick with COW is, instead of actually copying the data, you just mark it read-only, and point both the parent and the child to the same copy. Then, if either of them tries to write to it, you copy it secretly and pretend that it was a writable copy all along. This works really, really well, since the vast majority of memory ends up never being written to for various reasons.
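If you want to see the parent/child separation in action, here's a tiny Python sketch (Python's os.fork is just a thin wrapper around the same system call). The child's write is what quietly triggers the copy; the parent never notices:

    import os

    data = bytearray(b"hello")   # after the fork, both processes see this via copy-on-write

    pid = os.fork()
    if pid == 0:
        data[:] = b"HELLO"       # the child writes, so the kernel quietly makes a real copy
        print("child sees: ", data.decode())
        os._exit(0)
    else:
        os.waitpid(pid, 0)
        print("parent sees:", data.decode())   # still "hello" - the child's copy was private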

Normally, when Linux runs out of memory, it handles the situation by returning an error to the process that tried to request the memory. This works reasonably well. There's an unfortunate tendency among programmers to ignore the result of malloc, though, which means that some programs will start to randomly crash when you get close to running out of memory. The point is, though, that at least there's a way to detect the situation - carefully written programs can avoid crashing by checking the return value of malloc and reacting properly if the system is out of memory.
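The post is about C's malloc, but the "check it instead of assuming it" idea translates to pretty much any language. Here's a rough Python analogue, where a failed allocation shows up as a MemoryError instead of a NULL pointer (caveat: thanks to the behavior this post is complaining about, a huge allocation on Linux may not fail cleanly here either):

    def allocate_buffer(nbytes):
        """Ask for a big chunk of memory, and fail gracefully instead of crashing."""
        try:
            return bytearray(nbytes)
        except MemoryError:
            return None

    buf = allocate_buffer(64 * 1024**3)      # 64 GB - almost certainly too much
    if buf is None:
        buf = allocate_buffer(64 * 1024**2)  # fall back to something reasonable (64 MB)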

But there's another problem here. Notice that with COW, the actual memory allocation happens at some random time, when you try to write to some random variable. If the system is out of memory, and has to make a copy, then you've got a problem. (Ideally, you'd make sure that there's enough free memory when you create the process, but then you're wasting a lot of memory - can't have that!) You can't just tell the program that you couldn't allocate memory, because the program didn't try to allocate memory in the first place! You have an error that can't be properly handled. So, Linux handles this situation with... the OOM killer.

The OOM (out of memory) killer does exactly what it sounds like: when you run out of memory, it kills a random process so that the system can keep going. They've developed some rather elaborate heuristics for how it selects the process to kill, so that it's less likely to be a process that you really care about, but as described in this awesome analogy, that's somewhat akin to an airline trying to decide which passengers to toss out of the airplane if they're low on fuel. No matter what you do, the fact remains that you've gotten yourself into a bad situation.

I've seen swap provided as a solution to this, but that's basically just saying "don't run out of memory in the first place" - it's not terribly helpful. The fact is, no matter how carefully you write your program, and how meticulous you are about checking for errors, there's still a chance that your program will crash randomly, for no reason at all. Yay, Linux!

Thursday, November 5, 2009

An Egg-and-Chicken Situation

I was reminded today that, while it's perfectly obvious to me that the egg came first, there are a lot of people that still aren't convinced. So, without further ado...

Initially, we can trivially say that all chickens were preceded by dinosaur eggs. This is no fun though, so I'll go ahead and strengthen the "paradox": which came first, the chicken or the chicken egg?

We define a creature as a "chicken" based on its DNA being sufficiently close to the modern species of chicken.

First, we take it as a given that any creature sufficiently chicken-like to be called one will have hatched from an egg. (Indeed, if a chicken-looking creature was born by some other means, we would probably not accept it as a chicken - and this is a proof in and of itself, albeit a less interesting one.) Thus, every chicken is preceded by at least one egg. However, this does not preclude an infinite cycle, which is the source of the paradox.

We next note that, because a chicken does not have the same DNA as either of its parents, and we're defining chicken-ness based on DNA, it's possible that a chicken could be born where one or both of its parents are not-quite-chickens. Furthermore, I assert that, since there has been a point in time at which chickens did not exist, this must have happened at least once. If we consider that first chicken, it had the same chicken DNA when it was an egg, so it was in fact preceded by an egg. QED.

Wednesday, November 4, 2009

An Unnecessary Evil

What do antivirus software and URL shorteners have in common? They're both elaborate solutions to fixable problems that should never have existed in the first place. They should have both been implemented by the one responsible for the problem, but both were instead solved by third parties. Also, they're both annoying. >_>

Start with antivirus software. The problem, obviously, is the rampant insecurity of Windows as a platform, combined with (let's be fair to Microsoft here, after all) a user base that's been trained to believe that running programs (installers) you've downloaded from random places on the Internet is okay, or even normal. The combination of these two factors has led to a malware industry that's actually pretty impressive, in terms of scope.

These days, Windows security is a lot better than it used to be, and Microsoft is even providing their own antivirus product. Some complain that it's anticompetitive, and that Microsoft is in a position to wipe out the rest of the antivirus market. I will be blunt: I would be perfectly happy to see that entire market shrivel up and die. It should never have existed in the first place, and if Microsoft can secure their OS to the point that we don't need antivirus, we'll all be better off for it. They haven't solved the problem yet, but they're trying, at least.

As for URL shorteners, let's look at why they exist at all. Twitter has a hard 140-character limit on updates, a limit which is actually imposed by the 160-character limit of text messages. People who receive updates through texts are the only ones for whom the limit is relevant; for the rest of us, it can easily be glossed over in the interface.

So, why doesn't twitter just run their own URL shortener for SMS users, keep the messages with shortened URLs within their own system, and automatically expand them when displaying them to users that aren't using SMS? Then users wouldn't have to worry about shortened URLs at all when reading tweets; Twitter clients could automatically shorten URLs to an appropriate length when posting updates; Twitter could even charge for use of really short URLs and finally have an actual revenue stream. (Or not, on that last one, I dunno if there's enough scarcity there to even support micropayments.) Ideally, the shortener would guarantee some minimum length of URL that it could provide, but then actually use the longest possible URL that would fit in the message, to preserve the URL space.
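Just to show how little machinery this would actually take, here's a toy Python sketch of the internal shortener (the short domain is made up, and a real version would obviously need persistent storage plus the length-juggling described above):

    import string

    class InternalShortener:
        """Toy sketch: the mapping stays inside the service, so only SMS users ever see the short form."""
        ALPHABET = string.ascii_letters + string.digits

        def __init__(self):
            self.urls = []   # the index into this list doubles as the short code

        def shorten(self, url):
            self.urls.append(url)
            n, code = len(self.urls) - 1, ""
            while True:
                n, r = divmod(n, len(self.ALPHABET))
                code = self.ALPHABET[r] + code
                if n == 0:
                    break
            return "http://t.example/" + code   # hypothetical short domain

        def expand(self, short_url):
            code = short_url.rsplit("/", 1)[1]
            n = 0
            for ch in code:
                n = n * len(self.ALPHABET) + self.ALPHABET.index(ch)
            return self.urls[n]

Non-SMS clients would just call expand() before displaying anything, and nobody else ever has to know the short URLs exist.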

As with antivirus software, a market has sprung up to address this unnecessary flaw in Twitter's service. There was a rash of URL shorteners popping up with the rise in Twitter's popularity, though I think that's cooled off quite a bit with Twitter's use of bit.ly as the default, and is.gd going out of business (or not? I forget now). Given that a lot of people aren't using Twitter via SMS, URL shorteners are an unnecessary evil in most cases. Twitter should follow Microsoft's example, step up to the plate, and give us back our URLs.

Tuesday, November 3, 2009

Idea post: low-latency caching proxies for mobile internet

This is just an idea post, since I seriously don't have time to implement this right now.

A problem that has been getting worse with mobile internet (on cell phones, etc) is latency. Rather, I should say, the bandwidth on mobile devices has been getting better and better, and is starting to rival slow DSL connections for throughput, but the latency is still atrociously bad. This could be addressed by mobile network operators, but since I'd like to see this problem solved before I am dead, I think it's fair to look into alternate solutions.

Quick explanation first: Bandwidth is more complicated than people usually think. There are two main components to it: throughput, which is the number of bytes per second, and latency, which is how long it takes any given byte to get from one end to the other. Internet connections are usually sold exclusively in terms of throughput, but latency can be really important, especially for applications that have to go back and forth between you and a server a lot. Wireless networks (cell networks and wi-fi both) generally have way higher latency than wired connections of any type. For a good wired connection, 30-150 milliseconds latency is normal, while for a wireless connection, it ranges from a few hundred to a few thousand milliseconds. Congratulations! You now know more about internet connections than a Level 2 AT&T tech support person.
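To put some made-up-but-plausible numbers on that, here's a back-of-the-envelope calculation for loading a page with 30 small resources, one request at a time. Notice that the mobile case is dominated by waiting, not by actually moving bytes:

    resources = 30          # files on the page (HTML, images, scripts, ...)
    size_kb = 20            # size of each file
    throughput_kbps = 1000  # ~1 Mbit/s, the same for both connections

    def load_time(rtt_ms):
        transfer = resources * size_kb * 8 / float(throughput_kbps)  # seconds spent moving bytes
        waiting = resources * rtt_ms / 1000.0                        # seconds spent on round trips
        return transfer + waiting

    print("wired,  ~50 ms latency: %5.1f s" % load_time(50))    # about 6 seconds
    print("mobile, ~700 ms latency: %5.1f s" % load_time(700))  # about 26 seconds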

idea 0: low-latency caching proxies
This is actually a pretty trivial thing - anybody can set up a proxy server, and point their mobile web browser at it. (Well, theoretically.) This doesn't gain us that much, though. It'll mainly reduce latency caused by the website, not by the mobile network, so it doesn't really address the actual problem. There are two reasons for even mentioning this: it gives us compression, since that's a relatively straightforward thing that proxies can add, and it makes the rest of the ideas possible.

idea 1: aggressive HTTP pipelining
HTTP allows for an optimization called "pipelining". Basically, the client makes a bunch of requests at the same time, on the same connection, and the server responds to them in order. This can do wonders for reducing latency. The usual sequence of actions goes something like this: send request, wait, get response, send request, wait, get response, send request... Pipelining eliminates most of the waiting. Clients will usually limit the amount of pipelining they do to be kind to web servers, but if we're getting everything from our own proxy server from idea 0, we can send as many requests as we want.
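Here's what pipelining looks like down at the socket level - a minimal sketch (the host name and paths are placeholders), firing off three requests before reading anything back:

    import socket

    host = "proxy.example.com"   # placeholder - in our case, the proxy from idea 0
    paths = ["/page.html", "/style.css", "/logo.png"]

    # Build all the requests up front; only the last one closes the connection.
    requests = ""
    for i, path in enumerate(paths):
        requests += "GET %s HTTP/1.1\r\nHost: %s\r\n" % (path, host)
        if i == len(paths) - 1:
            requests += "Connection: close\r\n"
        requests += "\r\n"

    sock = socket.create_connection((host, 80))
    sock.sendall(requests.encode("ascii"))   # all three requests go out back-to-back

    # The responses come back on the same connection, in the same order we asked for them.
    response = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk
    sock.close()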

It's not perfect, though. On the web, documents will usually pull in other files, like images, scripts, etc, and those can themselves pull in other files. So while we'd like to make all the requests up front, we usually don't know which files we'll need before the responses start coming back. Pipelining helps, but we still end up having to wait more than we'd like.

idea 2: inlining images and scripts and stylesheets
What if instead of depending on the client to fetch everything, we had the server help a bit? A lot of files on the web can actually be either linked to as external documents, or rendered inline inside a page. Javascript and CSS can be inlined pretty easily, and every web browser supports that. It turns out it's possible to also inline images this way, using something called a data URL.
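The conversion itself is dead simple - something along these lines (the file name is just an example):

    import base64
    import mimetypes

    def to_data_url(path):
        """Turn a local file into a data: URL that can go straight into an <img> src."""
        mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path, "rb") as f:
            payload = base64.b64encode(f.read()).decode("ascii")
        return "data:%s;base64,%s" % (mime, payload)

    # <img src="logo.png"> becomes <img src="data:image/png;base64,iVBORw0KG...">
    print(to_data_url("logo.png")[:60])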

The proxy server could fetch the web page, fetch all the other parts of the page that the client would normally have to request individually, and package them up into a giant page, before sending that to the client. If this works perfectly, then the client only has to make one request, and wait for one response.

This has some pretty serious disadvantages, though. For one thing, it bypasses caching on the client device, so anything included using this method would have to be fetched each time the page loads. This would be fine for small files, but there would have to be some kind of heuristic on the server for what to inline, and what to leave out. I'm also not entirely sure that javascript behaves exactly the same way when it's linked versus inline, but given how well mobile devices support javascript to begin with, that may not be such a big issue.

idea 3: image prescaling
Jumping back to decreasing bandwidth here. It seems kind of a waste to download a full-size image to a mobile device when it's just going to scale it down before it's displayed. Why not scale the image down to the display size before it leaves the proxy? We'd need some way to control this from the client end, since otherwise the server won't know what size to scale to. A custom request header would work, probably. This would also save a lot of CPU time on the mobile device, which translates to faster rendering and less battery usage.
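As a sketch of what the proxy's side might look like (this uses the PIL/Pillow imaging library, and the request header name below is made up):

    from PIL import Image   # the Pillow imaging library

    def prescale(in_path, out_path, max_width):
        """Shrink an image so it's no wider than the client's screen before sending it along."""
        img = Image.open(in_path)
        width, height = img.size
        if width > max_width:
            new_height = max(1, int(height * max_width / float(width)))
            img = img.resize((max_width, new_height), Image.LANCZOS)
        img.save(out_path)

    # The proxy would read the target width out of a custom request header,
    # e.g. "X-Screen-Width: 320" (hypothetical header name).
    prescale("photo.jpg", "photo_small.jpg", 320)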

This may not work so well for zoomable browsers, though, such as the iPhone. In this case, it would be interesting for the server to make the image a progressive scan image, so that the client could receive a scaled version first, and then the rest of it later. We could even imagine something fancy, with progressive scan images and HTTP Range headers, where the client first downloads the first progressive part of each image, enough to display a scaled version, and then goes back and fetches the rest of each image file.

(And while we're at it, we can convert GIFs to PNGs. :p)

idea 4: speculative server-initiated prefetch (needs a better name) blah blah
We've been trying to work around the fact that clients have to fetch everything that's on a webpage to load it. What if we bypass that entirely? (this one is more me thinking out loud, actually)

What I'm proposing here is a way (that I haven't thought through properly) to have the server push files to the client, rather than waiting for the client to request them. This would allow the client to cache page elements properly, but still have the same performance as inlining page elements. The problem we'd run into then is that the server has no way of knowing what's in the client's cache (and really, shouldn't), so there will still be a lot of wasted bandwidth here. Maybe the server could send a list of page elements and -- nope, then we're back to client fetch anyway. Hmm. This one might be a lost cause. >_>

Existing stuff

Naturally, I'm not the first one to think of this. There are a few implementations of parts of this; the one that comes to my mind first is Opera Turbo. What I'm proposing here, though, is an open-source server that anybody can run, rather than an add-on controlled by one company. After all, the Internet has proven time and time again that the usefulness of a technology is proportional to how easy it is for an enterprising geek to roll their own.

Monday, November 2, 2009

Lesser-known racist fallacies

"My friend, who is a member of (ethnic group), was not offended by this joke; therefore, it's not racist." This is really a generalization of the inexplicable tendency of people to assume that any member of another race speaks for all members of their race, which is roughly the intellectual equivalent of "You all look the same to me." I, for example, am a pretty nonrepresentative sample of Indians. If you want to know what all Indians think, give me a few days so I can go around and ask all 1.1 billion of them.

"I didn't mean for it to be taken as racist." a.k.a., "The road to racism is detoured by good intentions." The intent was never the problem to begin with; racism is as problematic as it is because it offends people, not because you wanted to. (okay, so this is debatable.)

"Minorities, when speaking about racism, are above reproach." If anything, minorities are frequently just as racist or more racist than others, just because nobody thinks to call them out on it. (Asians, for example, can be incredible nationalistic snobs. Just try asking a first- or second-generation immigrant parent about all the amazing things that were done first in their country.) This is still a bad thing!

"Traditions and culture are especially important to minorities." In my (unnecessarily inflammatory) opinion, traditions are mainly important to people who don't have much else. I am more than what I've been handed down by thousands of years of handing-down, or at least, I aspire to be.

"Meat is delicious, you should try it sometime. :o" Nah, see, I might be convinced by this reasoning, except that it kind of misses the entire point. >_> I will not be swayed by your delicious bacon!

Sunday, November 1, 2009

Barriers to entry for open-source contributions

Contributing to open-source projects is relatively easy, but it could definitely be easier. There have been times that I've found a problem, decided on a solution, even written a patch once or twice, and then lost interest because of some random restriction along the way. This is wasted potential!

Moreover, contributing to open source is easy for me, but I'm already an open source contributor. For new contributors, the process can be really daunting, for a variety of reasons. I'm going to go through a list of things that projects can do to encourage contributions, roughly in order of increasing difficulty.

Ask!
It's really surprising how few open-source projects even get this far. Every project needs to have a page somewhere that says something along the lines of, "If you want to contribute, we could use a hand with X, Y, and Z. Contact so-and-so for details." If you don't say something like this, most people - especially those that aren't familiar with open source - will just assume that you don't really want outside contributions.

Keep some easy bugs around for new contributors
This is pretty easy for any project that has a reasonable volume of bug reports coming in. Invariably, some of them will be for really easy stuff - spelling fixes in documentation, or other really trivial fixes. Instead of just fixing these, give them a special tag or something on your bug tracker, or make a list of them on your website, and advertise this list to your users. The ones that are interested in contributing will have something quick and easy to get started on, and you'll have that many fewer bugs to fix - it's a win-win! Ideally, once you have a system like this in place, people will feel more comfortable about filing trivial bugs too, and you'll end up with higher-quality software overall.

Don't require registration to submit a bug report
Trac is kind of bad about this since everybody runs their own instances, though if I recall correctly recent versions support OpenID, which is a big step forward. If I find a bug, but I have to go through the whole registration dance just to report it, I'm just not going to bother reporting it unless it's a really severe bug. You may say that I'm not a serious contributor if I let a registration page stop me, but that's kind of missing the point. Casual contributors can be just as valuable as really serious ones, and there are a lot more of them out there.

Allow editing through a web interface
This one doesn't actually exist yet, but it could. The normal process of submitting a patch involves checking out the source code, making modifications, generating a patch, and submitting that to the maintainers of the program. 3/4 of those steps could be automated on a code hosting site such as SourceForge or github or bitbucket. The code could be cloned on the server; the user could be presented with a simple text editor in the browser (or something more fancy, like Bespin), the user could save the code with a commit message when they're done, and the commit could automatically be submitted as a pull request on the server. I wouldn't want to use this to make serious changes to the code, but this isn't designed for people that are already making serious changes. For somebody who's just making a small cosmetic change, this would be a huge timesaver, and that in turn increases the number of contributions you get.