Monday, May 31, 2010

portable functions braindump

I have just had an incredibly cool idea, and I'm afraid that it'll leave me once the drinks wear off, so I'm going to go ahead and write it down now.

First-class functions are pretty cool, so let's take them to the next level. Imagine for a moment that we could represent a function as a file in a reasonably clean way. (This doesn't take a lot of imagination, since it turns out we can, and the only disagreement is over how to do it best, but bear with me for a second.) What could we do with this? For a start, we could expose functions over the network, and call them dynamically from programs. Instead of having to have a copy of all relevant code on all computers that want to use it, we could store functions online, dynamically pull them into a cache when we actually want to call them, and abstract away software updates entirely. But wait, there's more.
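To make the "pull it into a cache and call it" part concrete, here is a minimal sketch. It cheats by assuming the remote file is plain Python source, which is exactly the kind of single-language shortcut this post is complaining about, but it shows the shape of the thing. The names `load_remote_function` and `http_fetch` and the cache layout are made up for illustration, and there is no security here at all.

```python
import hashlib
import urllib.request
from pathlib import Path


def http_fetch(url):
    """Download raw bytes from a URL (only called on a cache miss)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def load_remote_function(url, name, cache_dir, fetch=http_fetch):
    """Return the function `name` defined by the Python source at `url`.

    Assumption: the file at `url` is Python source that defines `name`.
    The source is cached on disk under a filename derived from the URL.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cached = cache_dir / (cache_key + ".py")

    if not cached.exists():
        cached.write_bytes(fetch(url))

    namespace = {}
    # WARNING: exec runs whatever the server sent. This is a sketch,
    # not something to point at code you do not trust.
    source = cached.read_text(encoding="utf-8")
    exec(compile(source, url, "exec"), namespace)
    return namespace[name]
```

The `fetch` parameter is there so the downloader can be swapped out (for tests, or for some other transport). Note that this cache never expires, so "abstracting away software updates" would still need some invalidation rule on top.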

Cloud computing services are drifting in fits and starts toward a clean model where you submit jobs to the cloud and they happen. Right now, Amazon EC2 is considered state of the art, which is just ridiculous - they make you virtualize a whole OS instance! What the hell is that?! It maximizes flexibility at the cost of incredible inefficiencies. The major bottleneck to a better cloud compute service, though, is a common format for executables. If you only want to submit a single program to a compute service, you and the cloud compute service provider need to agree on a format for the program, and that's basically impossible today. There's simply no consensus at all, not even the beginning of one. (Services like PiCloud are a nice first step, but they do it by standardizing on one language and ignoring the rest of the universe. Useful, but not especially interesting.)

So here's the utopia I'm imagining:

* You put a chunk of code online
* You give your choice of cloud compute service the URL for the code
* They fetch it and execute it
* Everybody is happy and nobody has to fiddle around with what availability zone their EC2 instances are in, and whether or not it matches their data
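The service side of that list could look roughly like the sketch below: take a URL, fetch the code, run it in a separate process, and hand back the result. It assumes the code is a Python script, because there is no common executable format to lean on yet, and `run_job` is a hypothetical name, not anybody's real API. A real service would also need sandboxing, which this does not attempt.

```python
import os
import subprocess
import sys
import tempfile
import urllib.request


def http_fetch(url):
    """Download raw bytes from a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def run_job(code_url, args=(), fetch=http_fetch, timeout=30):
    """Fetch the code at `code_url` and run it in a child process.

    Assumption: the fetched code is a Python script. Returns the
    subprocess.CompletedProcess with captured stdout and stderr.
    """
    code = fetch(code_url)

    # Write the code to a temporary file so the interpreter can run it.
    with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as handle:
        handle.write(code)
        path = handle.name

    try:
        return subprocess.run(
            [sys.executable, path, *args],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    finally:
        os.unlink(path)  # Clean up the temporary file either way.
```

From the customer's point of view the whole interaction is one call with a URL and some arguments, which is the point: no instances and no availability zones.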

We'd use HTTP, naturally; HTTP is the new TCP, and everybody uses it as a starting point for their protocols these days. As long as we're exposing code over HTTP, we could expose documentation and other metadata in the same way. One URL would give an executable, /somefunction/doc would be rendered documentation, and /somefunction/sig would be a signed hash of the code, which is important if people are ever going to let it run on their machines. I'm nodding off right now, but pretend I am describing a good security model instead.
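As a rough sketch of the /doc and /sig idea: derive the metadata URLs from the function's URL, and check the code against its signature before running it. Big assumption here: the "signature" is an HMAC-SHA256 with a shared secret, standing in for a real public-key signature, since the Python standard library has no public-key signing. The function names are hypothetical.

```python
import hashlib
import hmac


def metadata_urls(function_url):
    """Return the documentation and signature URLs for a function URL."""
    base = function_url.rstrip("/")
    return {"doc": base + "/doc", "sig": base + "/sig"}


def sign_code(code, key):
    """Hash the code with SHA-256, then sign the hash.

    Assumption: HMAC with a shared secret `key` stands in for a real
    public-key signature scheme.
    """
    digest = hashlib.sha256(code).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify_code(code, signature_hex, key):
    """Return True only if `signature_hex` matches the code."""
    expected = sign_code(code, key)
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_hex)
```

The caller would fetch the code and the contents of /sig, and refuse to run anything where `verify_code` comes back False. With a shared secret the publisher and the runner have to trust each other already, so a real version would want actual public-key signatures.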

(This would be even cooler when combined with Tahoe-LAFS, which I'm going to blog about soon, but not tonight.)

I vaguely remember somebody having worked on this before in one of those languages nobody uses - Haskell? Erlang? Clojure? I forget - but it was a language-specific solution and therefore boring.

Anyway. My real goal in this is to abstract away the languages that people code in. It's 2010, and we should have papered over all of this decades ago. Up until a few years ago, the command line was the closest thing we had to my ideal. These days, web services might be closer - they support structured data through JSON or XML, generally, which is a hell of a lot more than you can say about the command line. They're also a lot harder to get at, but that's just because of a lack of decent tools.

Actually, though, now that I think about it, what I'm describing here has a lot in common with the #! (shebang) functionality of the command line. In fact, it's almost exactly the same. Maybe building this will be a lot easier than I thought. :D
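Here is a small sketch of what borrowing #! might mean: look at the first line of the fetched code and use it to decide which interpreter to hand the code to. It roughly follows the Linux convention, where everything after the interpreter path is passed along as a single argument. `parse_shebang` is a hypothetical helper, not an existing API.

```python
def parse_shebang(code):
    """Return [interpreter] or [interpreter, argument] from a #! line.

    Returns None when the code does not start with "#!". Mimicking
    Linux, the text after the interpreter path is kept as one argument
    rather than being split on spaces.
    """
    first_line = code.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        return None

    # Drop the "#!" marker and any surrounding whitespace or "\r".
    rest = first_line[2:].decode("utf-8").strip()
    parts = rest.split(None, 1)
    return parts or None
```

A runner could then start that interpreter on the downloaded file, which is pretty much what the kernel does for local scripts. It also makes the language question somebody else's problem: the code says what it needs, and the host either has that interpreter or doesn't.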

It's been too long since I blogged. I am forgetting how to write. :( Gonna hit publish before I sober up totally.