Friday, November 20, 2009

Decentralization VI: Package Management

I have issues with software installation on Linux distributions. Unlike Windows and Mac OS X, where (for better or worse) you can download a package from a website and install it, Linux software comes through centralized repositories. There are advantages and disadvantages to this method, but Linux distros don't do it because of the advantages; they do it out of necessity. I'll explain why in a bit, but I want to get the pros and cons out of the way first.

Centralized packaging has certain benefits from a QA standpoint. It allows distros to ensure compatibility between all the various components of the system, and to enforce standards across all the packages. It also allows them to smooth over incompatibilities between distros: if Ubuntu does things a certain way, but the person who wrote the software had Red Hat in mind, central packaging allows Ubuntu devs to sort that out before the software lands on people's machines. Distros also prefer to have control over packaging because there are a lot of different package formats used on Linux, and it would be kind of ridiculous if every software author out there had to support all of them. There aren't hard and fast rules about which parts of installation are the distro's responsibility, but there are conventions, at least.

Distros also use centralized distribution, for the most part: when you install an Ubuntu package, you download it from an Ubuntu server, using an Ubuntu package manager. This simplifies finding and installing software, obviously. You don't have to look very far to find any given program, and you're assured that what you're installing is actually the program, and not some random virus. The organization behind the distro also has to provide servers, of course, but this isn't too much of a problem. Bandwidth is cheap these days, and for a distro of any significant size, there are plenty of people willing to donate a spare server or two.

As for the disadvantages, centralized software creates a barrier to entry. Anybody can write a program for Linux, but actually getting it into the distro repositories takes a certain amount of recognition, which is difficult to gain without being in the repos in the first place. The result is that there's a lot of software out there that doesn't exist in any repository. Users generally don't like to install software outside of the distro package managers, because when you do, you don't get any of the nice features (such as, oh, being able to uninstall the software) that the package manager provides.

Distributions also get saddled with the unenviable job of cataloguing every useful piece of software for Linux that people have written. This takes a huge amount of developer effort; Gentoo, for instance, has hundreds of people (the vast majority of people working on it!) dedicated solely to maintaining the packages you can install. There's a more general lesson here: when you try to centralize something that is, in its natural state, decentralized, the result is an expensive and ongoing job.

But pros and cons aside, I said earlier that Linux distributions do this out of necessity, not just because they want to. If you write a piece of software for Windows, Microsoft has a lot of people working pretty hard to ensure that it'll work on future versions of Windows. Backwards compatibility is intricate, uninteresting work. Since Linux is written mostly by volunteers, it's exactly the sort of work that never gets done, because nobody really wants to do it. The result is that backwards compatibility on Linux is a bad joke. Developers break compatibility, often gratuitously, often without warning, and so Linux software needs to be constantly maintained or it simply ceases to function.

In an environment like that, you absolutely need somebody doing the hard work of checking, every so often, that each piece of software still works. That job falls to the distros, because the original authors of the software don't always care whether it works outside of the configuration that matters to them. Look at it from the distros' perspective: if you're trying to build a coherent system, but the components of that system are prone to break randomly when you upgrade them, you need to maintain centralized control if you want any hope of keeping things stable. In other words, the lack of backwards compatibility on Linux forces distros to centralize software distribution, and to do a lot more work than they otherwise would.

These posts are supposed to be case studies in decentralization, so I'll summarize. The difference between Linux and commercial platforms is the degree of compatibility in the system. The degree of compatibility determines the amount of control you need to make sure the system works as a coherent whole. With Linux, the need for control is much higher, so distributions are pushed towards a centralized software distribution model.
