Friday, March 12, 2010

Pre-vacation idea dump

I'm going to be out of the country for a week and a half! Gotta braindump this stuff or I will forget it by the time I get back.

Mobile filesystem

My search for a network filesystem that supports offline writes can best be described as a dismal failure. AFS is the closest thing I could find, but it was designed in the '80s, it's kind of archaic, and it looks to be a pain to set up. (Plus I doubt my home stuff will ever scale up to tens of thousands of computers. :p) DRBD+NFS was the next best candidate, but I dunno. I'm sick of all the interesting stuff being implemented at the block layer while filesystem designers are still dicking around with consistency models (see, for instance, last year's drama over ext4 and fsync()). Plus it won't do me any good for disconnected operation, whoops. Rsync is the third-best candidate, which is seriously just sad. (Honorable mention goes to XtreemFS, which - in addition to having a truly awful name - only supports read-only replication. Way to only solve the easy problems, guys!)

So, fine. If it doesn't exist, I'll just build the damn thing myself.

My design is based on BTRFS writable snapshots, for a lot of reasons. First, because they're freaking cool; second, because it means that in the worst case this design degrades to a completely usable local filesystem; third, because they're really freaking cool. :D

So, computer A wants to sync with computer B. A creates a writable snapshot of its own copy, and then uses rsync to pull in the updates from B. Now A has a copy of the current state of B's filesystem, without using up excessive space, because the copy went over a snapshot and we're assuming the differences are relatively small. (Directory renames mess this up, and are a weakness of this approach, but not a deal-breaker.) All of this can happen in the background. Now the user at A can apply a merge strategy using a hypothetical merge program, and bring their main copy of the data into sync with B's copy. They can do this at their convenience, because the data's all stored locally, so there's no need for B to even be connected. If there are conflicting changes, we can look back at the original data (because we have a snapshot) and the two divergent copies; a three-way merge is pretty much the best we can hope for here. Advantages of using writable snapshots: saves disk space, lets this strategy work with any number of other computers, freaking cool.
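To make that concrete, the pull step is basically two commands wrapped in a script. Here's a rough sketch in Python - the paths, the snapshot directory, and the "computer-b" host name are all made up for illustration, and a real version would need error handling:

    # Sketch of the pull step: snapshot our subvolume, then rsync B's current
    # state over the snapshot. Paths and the host name are hypothetical.
    import os
    import subprocess

    LOCAL_SUBVOL = "/data/main"            # our live copy (a btrfs subvolume)
    SNAP_DIR = "/data/.sync-snapshots"     # where incoming snapshots live
    REMOTE = "computer-b:/data/main/"      # B's copy, reachable over ssh

    def pull_from_b(snap_name):
        snap_path = os.path.join(SNAP_DIR, snap_name)
        os.makedirs(SNAP_DIR, exist_ok=True)
        # Writable snapshot of our current state; it shares blocks with the
        # original, so it costs almost nothing until files start to diverge.
        subprocess.run(["btrfs", "subvolume", "snapshot", LOCAL_SUBVOL, snap_path],
                       check=True)
        # Overwrite the snapshot with B's state. --inplace rewrites files in
        # place, so unchanged blocks keep being shared with the original.
        subprocess.run(["rsync", "-a", "--delete", "--inplace",
                        REMOTE, snap_path + "/"],
                       check=True)
        return snap_path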

The merge program is completely hypothetical, because whoops, it turns out that merging sets of files automatically is a fundamentally hard problem. Still, there are all sorts of useful heuristics we can use, before we kick the problem up to the user. If a copy of the filesystem isn't being used, it could be set in a "receiver" mode, where it just takes all the updates it sees from other computers. Files that are only modified on one side or the other of the merge could also be copied safely, probably. Er, maybe not. Gotta think about that more carefully. N-way merges could be handled as a series of 2-way merges, I think.
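For what it's worth, the per-file decision in a three-way merge is easy enough to sketch. This version assumes whole-file contents in memory (with None meaning "doesn't exist on that side"), which is way too naive for real use, but it shows the shape of the heuristic:

    class Conflict(Exception):
        pass

    def merge_file(base, ours, theirs):
        """Return the merged content, or raise Conflict for the user to resolve.

        base is the file at the common snapshot; ours and theirs are the two
        divergent copies. Any of them may be None (file absent on that side).
        """
        if ours == theirs:
            return ours      # both sides agree (including both deleted)
        if ours == base:
            return theirs    # only the other replica changed it
        if theirs == base:
            return ours      # only we changed it
        raise Conflict("both replicas changed the file since the common snapshot")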

Vector clocks are basically magic, and this scheme will have to use them to keep track of merge results (so that they can be propagated), and for garbage collection (because disk space is cheap, but it's not that cheap). All the accounting information can be stored as a special file in the filesystem, which is doubly convenient because it means that every system has access to the state of every other system. Ooooh, we could even split the vector clock across the replicas themselves - it needs to store a logical clock per replica, and we conveniently have per-replica storage already because of the nature of the system. Slick. :D
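For concreteness, a vector clock is really just one counter per replica, and the operations on it are tiny. A sketch (the dict representation and the "before"/"after"/"concurrent" labels are my own placeholders, not a wire format):

    # Vector clock as a plain dict: {replica_name: logical_counter}.

    def vc_increment(clock, replica):
        # Called whenever `replica` makes a local change.
        clock = dict(clock)
        clock[replica] = clock.get(replica, 0) + 1
        return clock

    def vc_merge(a, b):
        # Element-wise max: the clock of a state that has seen both histories.
        return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

    def vc_compare(a, b):
        # "before"/"after" if one history contains the other, else "concurrent".
        a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in set(a) | set(b))
        b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in set(a) | set(b))
        if a_le_b and b_le_a:
            return "equal"
        if a_le_b:
            return "before"
        if b_le_a:
            return "after"
        return "concurrent"

Each replica bumps its own counter on local changes and takes the element-wise max after a successful merge; the "concurrent" case is exactly where the three-way merge above has to kick in.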

Structured (JSON) pipes

The philosophy of UNIX involves creating programs as commands that can receive input from other programs, and pass output to other programs. In that sense, UNIX has failed us. Text streams can be easy to read, and they can be easy for a program to parse, but they are rarely both. This is really a topic for a proper rant, some other time.

But what to do about it? My solution (which I will implement when I get some free time) is structured pipes. Programs should take a stream of JSON objects (or some other format, but JSON is nice for this sort of thing) on standard input, and write a stream of JSON objects to standard output. Then, instead of writing fragile, error-prone nonsense to parse text using cut and awk and sed and miscellaneous tools, we could have a standard JSON query command that pulls out a given field of an object. Instead of trying to use the same output stream for humans and programs, we could have a standard program to translate the JSON stream into something human-readable. (Hell, a terminal could do this automatically - then we wouldn't even have to think about the JSON layer underneath. Everything would just work.)

Programs could also output a format specifier as part of the stream - then we could get output at least as nice-looking as the crap we have now, with the potential for it to be a lot better. Plus, programs could change their output format without breaking every damn script written against them in the past three decades. (Example: df. It's hopelessly out of date, and there's no good way to shoehorn information about RAID, or compression, or modern disk stuff into its output. But nobody can change the output format, because scripts depend on it.)
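As a strawman, the query command could be as dumb as this - the name jget and the one-JSON-object-per-line framing are placeholders I made up, not an existing tool:

    #!/usr/bin/env python3
    # jget: read one JSON object per line on stdin, print the requested field.
    import json
    import sys

    def main():
        if len(sys.argv) != 2:
            sys.exit("usage: jget FIELD")
        field = sys.argv[1]
        for line in sys.stdin:
            line = line.strip()
            if not line:
                continue
            obj = json.loads(line)
            if field in obj:
                print(obj[field])

    if __name__ == "__main__":
        main()

So instead of df | awk '{print $4}', you'd write something like jsonified-df | jget available, where jsonified-df is the hypothetical JSON-speaking df.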

UNIX also sucks at delimiting fields, which you'd think would be a basic thing when your entire operating system philosophy is based on processing text streams. It uses a random mishmash of spaces and tabs, plus null characters for cases when whitespace won't do. Why not settle on a standard delimiter? Like, I don't know, one of the ASCII characters explicitly set aside for use as delimiters? The problem is that UNIX text streams have no concept of escaping, so any character could potentially appear in the stream. UNIX gets around this, when it bothers to address the problem at all, by letting you choose between a few output delimiters. For instance, if you think your output will have newlines in it, you can tell the find command to use nulls instead. This complicates the program reading find's output, of course, so the tools most likely to be hooked up to find (xargs, for example) also have special cases for handling null-terminated input.
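To show how awkward the workaround is, here's the dance every consumer of find's null-delimited output ends up doing - a Python sketch rather than shell, but it's the same special case xargs -0 implements:

    # Run find with -print0 and split its output on NUL bytes, so file names
    # containing newlines or spaces come through intact.
    import subprocess

    out = subprocess.run(["find", ".", "-type", "f", "-print0"],
                         capture_output=True, check=True).stdout
    paths = [p.decode() for p in out.split(b"\0") if p]
    for path in paths:
        print(repr(path))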

This is kind of turning into a rant, so I will summarize: The whole thing is an ugly mess. And using JSON as an intermediate format would also get us standard delimiters for free.

When I write this, I suppose I'll have to also write a set of core utilities that operates on JSON streams. It shouldn't be too much work, I don't think.
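For instance, a JSON-speaking ls might be as small as this sketch (the field names are placeholders, not a proposed standard):

    # jls: emit one JSON object per directory entry.
    import json
    import os
    import sys

    def jls(path="."):
        for entry in os.scandir(path):
            info = entry.stat(follow_symlinks=False)
            print(json.dumps({
                "name": entry.name,
                "size": info.st_size,
                "mtime": info.st_mtime,
                "is_dir": entry.is_dir(follow_symlinks=False),
            }))

    if __name__ == "__main__":
        jls(sys.argv[1] if len(sys.argv) > 1 else ".")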
