Thursday, September 3, 2009

5 Things about ZFS that suck

So it's Friday now, and it turns out I completely forgot about writing a post this week. (I have two in the works, but both deserve more seriousness than I can find time for today.) Part of the reason for this is that I spent a good part of the weekend fixing my computer - in the course of getting KMS (kernel mode setting) working (see previous two posts), I had to do several hard shutdowns when things didn't go right, and one of these killed my big media RAID. I keep a lot of files in that RAID, almost 700 GB, so this was kind of a big deal.

The kicker here, though, is that I was using ZFS for my RAID, precisely because it promises a hitherto-only-dreamed-of level of reliability. The idea behind ZFS is that you configure it across several hard drives, with as much or as little redundancy as you want, and it'll survive just about anything, including loss of one or more drives or random disk corruption, and keep on running through all that without a hiccup. It's an impressive filesystem, to be sure, but it's not without its flaws.

In fact, I got fed up enough with the flaws to make a list!

1) No native Linux support

Yes, I understand the licensing issues involved (ZFS is released under the CDDL, which is incompatible with the Linux kernel's GPL), and yes, I understand that the FUSE port of ZFS works pretty well on Linux, and yes, I understand that because of various design choices ZFS has almost no chance of making it into the Linux kernel. But you know what? All of these things are solvable problems, and I think it says a lot that they weren't solved a long time ago.

2) Inflexibility of disk mirrors

If you have two hard drives that are mirrors of each other, ZFS handles things well when you first set up the array. Instead of requiring that the disks be the same size, as some RAID implementations do, it'll just take the smaller of the two sizes and pretend both disks are that size. Then, if you take the smaller disk offline, the array will automatically grow to whatever size is available to it. Here's the problem, though: once ZFS is set up, you can't add a smaller disk to mirror a larger disk - not even if the data would fit on the smaller disk, not even if there's no data at all yet on the larger disk. This can lead to the somewhat ridiculous situation of detaching a disk from a mirror and being unable to reattach it, because the mirror has grown in the meantime and the disk is now too small for it.
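
To make the sizing rules concrete, here's a toy model of a two-way mirror that follows the behavior described above. The `MirrorVdev` class and its methods are hypothetical, written only for illustration; this is not ZFS code, and it ignores details like the space ZFS reserves for labels and metadata.

```python
class MirrorVdev:
    """Toy model of a ZFS-style mirror (hypothetical, not ZFS code).

    Sizes are plain integers in GB; real on-disk overhead is ignored.
    """

    def __init__(self, *disk_sizes_gb):
        self.disks = list(disk_sizes_gb)
        # The mirror is only as large as its smallest member.
        self.size = min(self.disks)

    def detach(self, disk_size_gb):
        self.disks.remove(disk_size_gb)
        # With the smaller disk gone, the mirror grows to the smallest remaining disk.
        self.size = min(self.disks)

    def attach(self, disk_size_gb):
        # A new member must be at least as large as the mirror's current size,
        # no matter how little data is actually stored on it.
        if disk_size_gb < self.size:
            raise ValueError(f"device is too small: {disk_size_gb} GB < {self.size} GB")
        self.disks.append(disk_size_gb)
```

Running `m = MirrorVdev(500, 750)`, then `m.detach(500)`, then `m.attach(500)` raises `ValueError`: the same 500 GB disk that was a perfectly good mirror member a moment ago can't go back in.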

3) Inflexibility of RAID-Z

RAID-Z is essentially RAID5 (single parity spread across all the disks), but in ZFS. It's the most flexible RAID option when you're setting up, but make sure you've got it configured how you want it, because once it's set up there's no way to add or remove disks in a RAID-Z. All you can do is replace them if they fail. Modifying a RAID-Z would apparently require a fairly significant change to ZFS called "block pointer rewriting". (More on that in a sec.)
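
The single-parity idea that RAID-Z shares with RAID5 fits in a few lines: the parity block is the XOR of the data blocks in a stripe, so any one lost block can be rebuilt by XORing the survivors with the parity. This is a minimal sketch of that idea only; the function names are made up for illustration, and it doesn't model RAID-Z's variable-width stripes or anything else about its on-disk layout.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks):
    # The parity block is the XOR of every data block in the stripe.
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    # XORing the survivors with the parity cancels them out,
    # leaving exactly the one block that was lost.
    return xor_blocks(list(surviving_blocks) + [parity])
```

With `stripe = [b"abcd", b"efgh", b"ijkl"]` and `p = parity_block(stripe)`, calling `rebuild_missing([stripe[0], stripe[2]], p)` gives back `b"efgh"`. Lose two blocks from the same stripe, though, and single parity can't save you.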

4) Inability to remove top-level vdevs

This is another one that can really bite you in the ass. Let's say you're trying to add a mirror to an existing drive, but you're careless and you use the "add" command instead of the "attach" command, so the new disk becomes a second top-level vdev striped alongside the first one instead of a mirror of it. No problem, right? No data's even hit the disk, so you should be able to just remove it, right? Wrong. Top-level vdevs are permanent in ZFS. Your only recourse if this happens is to back up all your data, destroy the pool, and then re-create it from scratch. (Guess who made this exact mistake this weekend. :( )
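
The difference between the two commands can be sketched as a toy model of pool topology. `zpool add` and `zpool attach` are the real commands, but the `Pool` class and its methods below are hypothetical, written only to illustrate the rules described in this post: attaching puts the new disk into the same vdev as an existing disk (making a mirror), adding makes it a brand-new top-level vdev, and a disk can only be taken back out if it has a mirror partner.

```python
class Pool:
    """Toy model of a ZFS pool's layout (hypothetical, not ZFS code)."""

    def __init__(self, disk):
        # Each top-level vdev is a list of disks; a list of 2+ disks is a mirror.
        self.vdevs = [[disk]]

    def add(self, disk):
        # Like `zpool add`: the disk becomes a new top-level vdev, striped with the rest.
        self.vdevs.append([disk])

    def attach(self, existing, new):
        # Like `zpool attach`: the new disk joins the vdev holding `existing` as a mirror.
        for vdev in self.vdevs:
            if existing in vdev:
                vdev.append(new)
                return
        raise ValueError(f"{existing} is not in the pool")

    def detach(self, disk):
        # Only a disk with a mirror partner can come out; a top-level vdev
        # on its own is permanent (per the post, fixing that needs block pointer rewrite).
        for vdev in self.vdevs:
            if disk in vdev:
                if len(vdev) < 2:
                    raise ValueError(f"{disk} is a top-level vdev and cannot be removed")
                vdev.remove(disk)
                return
        raise ValueError(f"{disk} is not in the pool")
```

Attaching `"disk2"` to a pool built on `"disk1"` leaves one mirrored vdev, `[["disk1", "disk2"]]`, and `"disk2"` can be detached again later. Adding it instead leaves `[["disk1"], ["disk2"]]`, and trying to detach `"disk2"` from that layout raises `ValueError`, which is exactly the trap described above.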

Apparently, removal of top-level vdevs is another thing that depends on "block pointer rewriting". Funny thing, though - even though it's necessary for features that people have been clamoring for since day 1, and despite repeated assurances from people at Sun that it's a super-high priority for them, there's been no public progress on this issue in years. (And by public progress I mean released code, or at least an official announcement. The most recent news I can find is a blog post from the end of last year. :/)

This leads into the next thing...

5) Opaque development

Maybe I'm just spoiled by Linux development, where you can literally watch changes as they go through, but I find it irritating when open source software development happens behind closed doors. Sure, I realize that Sun has legitimate business reasons for not releasing all their source code as they write it, but I think they've gone too far in that direction. It took me forever to even find their bug tracker page on vdev removal, for instance, and even that is almost completely bare of information.

Really, this is a larger problem with Sun's open source efforts, and one they've been criticized for before. They open up their source code, which is nice, but the barrier to entry for contributions is high enough that a community never really builds up, so all work ends up being done by Sun engineers in the end. This isn't usually a huge loss, since Sun has some pretty sharp engineers, but it means that their open-source efforts will never really live up to their potential.

Now, having said all this, I do have to admit that ZFS is a pretty slick piece of work - it's easily five years ahead of anything else out there in the end-user market. It's not perfect, though, and people need to remember that - for every skeptical page about ZFS on the Internet, there are dozens that are little more than cheerleading. And that doesn't help anybody.

5 comments:

Æther said...

Does not compute. :(

I know what a RAID is, but after that you've lost me. :( @ computer science acronyms I've never seen before.

George said...

It's sad to see a 2 year old post on this that is still correct.

Unfortunately it seems everything anyone wants relies on block pointer rewrite but it's been years now with little more than vague promises. I'm starting to think that it's just a running joke for the developers. Any time anyone asks for anything that isn't currently possible, they get told that block pointer rewrite will add that functionality and that it's just around the corner, and has been for years.

Trenton D. Adams said...

No, it's sad to see a 3 year old post on this that is still correct!!!! :P

Anonymous said...

You know what's even more sad? Seeing a 4 year old post that is still correct.

Anonymous said...

And even sadder? A 5 year old post that is still correct. And now ZFS is closed-source again so we know even less than we did...