Wednesday, January 7, 2009

Generalizing pipes

The pipe is a fairly fundamental piece of UNIX. By using the command line to string programs together, you can create far more powerful programs out of basic building blocks. Pipes have a ton of other advantages, too: you can write different components in different programming languages, since every programming language worth talking about supports the standard input/output streams; you can get pipeline parallelization basically for free, since the OS handles multiple concurrent processes; testing is easy, since you can take each part individually and feed it input manually.
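To make the "pipeline parallelization for free" point concrete, here is roughly what the shell does for something like `sort | uniq`. This is a minimal Python sketch that assumes a POSIX system with `sort` and `uniq` on the PATH and Python 3.9+. `run_pipeline` is a hypothetical helper. It starts every stage at once and connects each stage's stdout to the next stage's stdin, so the OS runs all of them concurrently.

```python
import subprocess
import threading


def run_pipeline(commands: list[list[str]], data: bytes) -> bytes:
    """Run commands in series, like `cmd1 | cmd2 | ...`, feeding `data` to the first."""
    procs = []
    stdin = subprocess.PIPE  # the first stage reads from us
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # The new child now holds the read end; close our copy so EOF propagates.
            procs[-1].stdout.close()
        procs.append(proc)
        stdin = proc.stdout  # the next stage reads this stage's output

    def feed() -> None:
        # Write input from a separate thread so a full pipe can't deadlock the read below.
        procs[0].stdin.write(data)
        procs[0].stdin.close()

    writer = threading.Thread(target=feed)
    writer.start()
    output = procs[-1].stdout.read()  # all stages are already running concurrently
    procs[-1].stdout.close()
    writer.join()
    for proc in procs:
        proc.wait()
    return output


print(run_pipeline([["sort"], ["uniq"]], b"b\na\nb\n"))  # b'a\nb\n'
```

Each stage is a separate OS process, so on a multicore machine the stages really do run in parallel. The Python code only wires them together.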

Pipes have an important restriction, however: you can only chain them one after another. There's no clean way to, say, send one stream to several other commands, or to merge several streams into one. This has actually been an issue for me in the past in certain video encoding scenarios, and the solutions are pretty dismal. If you want to split a stream, you can either write the output to a temporary file and run the next stage separately (untenable for video encoding, since the temporary file in question would probably be a few dozen gigabytes), or you can run all the preliminary stages multiple times (workable, but slower than it needs to be).

What would a generalized pipe utility look like? Instead of taking a string of commands to execute in series, it should support execution of a graph of commands, with the edges in the graph representing dataflow, and the nodes being commands. This is really too much information to express on the command line concisely, so it would have to load the graph from a file. Splitting streams is relatively easy, and can be accomplished with existing pipe semantics. Merging streams is somewhat more difficult, and won't be possible without specifying separate text and binary modes (since really, it's not even clear what it means to merge two or more binary streams).
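Splitting and text-mode merging might look something like the following minimal Python sketch. It assumes a POSIX system and Python 3.9+. `split` and `merge_lines` are hypothetical names, not the interface of any existing tool. Splitting copies one command's output to several consumers chunk by chunk, so nothing is written to a temporary file. Merging treats every input as text and forwards whole lines in arrival order. That is one plausible meaning of "merge" with no obvious binary equivalent.

```python
import queue
import subprocess
import threading

CHUNK_SIZE = 64 * 1024  # arbitrary copy buffer size


def _drain(stream, sink: list) -> None:
    """Read a stream to EOF and append the bytes to `sink`."""
    sink.append(stream.read())
    stream.close()


def split(source: list[str], consumers: list[list[str]]) -> list[bytes]:
    """Send one command's stdout to several commands' stdin (fan-out)."""
    src = subprocess.Popen(source, stdout=subprocess.PIPE)
    outs = [subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
            for cmd in consumers]
    # Drain each consumer's stdout in its own thread so a full output pipe
    # can never stall the copy loop below.
    results = [[] for _ in outs]
    readers = [threading.Thread(target=_drain, args=(p.stdout, r))
               for p, r in zip(outs, results)]
    for t in readers:
        t.start()
    # Copy the source stream to every consumer chunk by chunk; nothing hits disk.
    while chunk := src.stdout.read1(CHUNK_SIZE):
        for p in outs:
            p.stdin.write(chunk)
    for p in outs:
        p.stdin.close()  # flush and signal EOF
    src.stdout.close()
    src.wait()
    for t, p in zip(readers, outs):
        t.join()
        p.wait()
    return [r[0] for r in results]


def merge_lines(sources: list[list[str]]):
    """Yield lines from several commands' stdout in arrival order (text mode only)."""
    lines = queue.Queue()
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE) for cmd in sources]

    def pump(proc: subprocess.Popen) -> None:
        # A whole line is the unit of merging, so lines from different
        # sources can interleave but never cut into each other.
        for line in proc.stdout:
            lines.put(line)
        proc.stdout.close()
        lines.put(None)  # sentinel: this source is done

    for proc in procs:
        threading.Thread(target=pump, args=(proc,), daemon=True).start()
    remaining = len(procs)
    while remaining:
        line = lines.get()
        if line is None:
            remaining -= 1
        else:
            yield line
    for proc in procs:
        proc.wait()
```

For example, `split(["echo", "hello"], [["cat"], ["tr", "a-z", "A-Z"]])` returns `[b"hello\n", b"HELLO\n"]`. `merge_lines([["echo", "a"], ["echo", "b"]])` yields `b"a\n"` and `b"b\n"` in whichever order they arrive. Because the split loop hands each chunk to every consumer before reading more, the whole graph moves at the pace of its slowest consumer. A real tool would have to decide whether that backpressure is acceptable or whether to buffer.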

To be honest, this post is basically just a justification for a utility I wish existed. (Of course I've written a prototype; do you even have to ask?)
