Tuesday, November 2, 2010

Building a storage cluster (on the cheap)

So RAID is one thing, but (as I found out the hard way, recently) you're still screwed if the computer you put the RAID in starts going flaky. Now that I have a bit of cash, I decided it was time to take things to the next level. :O

My initial plan was to build a super-awesome NAS - basically an external hard drive that you plug into the network, so you can use it from multiple computers at once. After a lot of thought, I abandoned this plan. I don't trust commercial NAS systems (though the Drobo looks pretty sweet, I must admit), and building my own would have been too expensive. It'd also be kind of a pain to add storage - take it offline, install a new hard drive, mess around with cables (possibly forgetting to plug some in, which actually happens more often than you'd think >_>), and when you fill up the box, you either start throwing drives out (wasteful!) or upgrade the entire thing. Plus, a NAS is still a single point of failure, which is lame! I can totally do better than that.

So the new plan is to build a Ceph storage cluster. Ceph is a relatively new distributed filesystem that one of the DreamHost guys did for his PhD thesis. It's Linux-based (the client is even included in recent kernels), actively maintained, well-designed (aside: I'm still pretty grateful to the distributed systems class I took at UT; without it, I probably wouldn't even know what well-designed meant in this context), and meets most of my standards for a distributed filesystem. At the time, it seemed like a pretty good option, and it still does! Now, if only I could find some ridiculously cheap computers to use as servers...

It was at about this point in my thought process that I discovered that Jetway makes some ludicrously cheap minimal servers. You know that plug computer that Marvell came out with the other year? Jetway's thing has much higher specs, runs x86, has a full complement of ports and room for a hard drive or two, and is well-reviewed on Newegg. The only thing Marvell's plug computers really have over it is that they're ridiculously tiny, and who really needs that?

So, long story short: I picked up two of those plus some RAM, added my desktop machine, and now I have a three-node storage cluster that's expandable just by adding more boxes, is open source, has no single point of failure (except for the network parts, maybe? but those things last forever), and ended up costing less than even the cheapest Drobo.
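
For the curious, here's a back-of-the-envelope sketch (in Python) of how much usable space a replicated cluster like this gives you. The drive sizes and the replica count of 2 are made-up placeholders, not the actual hardware described here, and the function name is just for illustration. It also ignores filesystem overhead and the details of how Ceph actually places data, so treat the result as a rough upper bound.

```python
def usable_capacity_tb(raw_tb_per_node, replicas):
    """Rough upper bound on usable space when every object is stored
    `replicas` times across the cluster (hypothetical helper)."""
    if replicas < 1:
        raise ValueError("need at least one copy of the data")
    if replicas > len(raw_tb_per_node):
        raise ValueError("can't keep more copies than there are nodes")
    # Every terabyte of data eats `replicas` terabytes of raw disk.
    return sum(raw_tb_per_node) / replicas


# Hypothetical example: two small servers with 1 TB each, plus a
# desktop with 2 TB, keeping 2 copies of everything.
nodes_tb = [1.0, 1.0, 2.0]
print(usable_capacity_tb(nodes_tb, replicas=2))  # prints 2.0
```

The point is just that replication isn't free: adding a node grows the pool, but you only get to use the raw total divided by the number of copies you keep.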

Belatedly, justification

Somebody is inevitably going to comment that I'm trying way too hard, and that backups to an external drive are cheaper. The trouble with doing your own backups is that you actually have to pay attention to them. My goal here is to get a system going where I don't have to think about it day-to-day - everything should just work.

Somebody else will probably say that I should just store everything in the cloud - then I wouldn't have to worry about anything, because they'll certainly take better care of my data than I could. The trouble with the cloud is that it's too expensive (about an order of magnitude more than I'm paying to buy my own storage cluster, which is a lot more than I'm willing to pay). Plus, there aren't any cloud storage providers that I actually trust not to snoop around in my emails, or whatever.

2 comments:

tort said...

And someone else is inevitably going to say:
YOU ARE THE MOTHERFUCKING DATA STORAGE BOSS.

But seriously, well done. I'm going to check back with you in a few months and if everything is still running cheerfully you might consider playing consultant and helping out your buds with their backup needs. ;)

P. Static said...

Yeah! I'm replicating a dataset... LIKE A BOSS :D

And, mmkay! One caveat, though: if you're not storing a large (completely legitimate) archive of files, putting it in the cloud is probably going to be cheaper. Just sayin'