P. Static has got Opinions: December 2010

Sunday, December 5, 2010

Adventures in UNIX: Pipe buffer edge cases, half-open sockets

So yesterday I had a really neat idea: exposing pipe-based programs as network services. You could open a connection to a program, send it data, and the remote computer would put it through some predefined command and send it back. Then I realized that with IPv6, you could give each program its own IP address, and give them all names in the DNS, so that you'd have a perfectly usable system without using any higher-level protocols than DNS and TCP. Then I realized that this would be simple enough that you wouldn't even need to write any code to make this work - it's all doable with simple shell commands! (I was wrong about this last one, and that's the topic of this post.)

(By way of comparison: This is sort of an inversion of the Plan9 model of network computation. Instead of mounting a remote filesystem and piping it through a local program, you're piping local data through a remote program.)

Pipes are usually a one-way structure, but for this to work properly, I needed something a little more exotic. I need to be able to take output from a command, and pipe it back around to the beginning of the pipe, so that the pipe has a loop in it. If I could do that, I could combine netcat with any command, and that'd be a one-liner that implements a server. :D

So here's the first thing I tried.

Server:

mkfifo t
while true; do nc -l 127.0.0.1 9999 < t | tr a-z A-Z >> t; done

Client:

nc 127.0.0.1 9999 < testdata

In a perfect world, this would work! But here's where we get into the details of pipe buffers.

The first problem with this is in the server. When you use a pipe, data's actually buffered along the way, in the commands that are being piped. Normally, this is transparent, because the buffers are flushed out when the previous command exits. This doesn't work when you have a loop through a fifo, though! The data that's buffered in the tr command doesn't get flushed to the fifo until netcat exits, and so netcat never actually has the chance to send the tail end of the data. The only way I could think of to solve this was to write some code for the server - pretty disappointing, but probably necessary. (I'm not going to post that code here, because it's even messier than being a prototype should justify. >_>)

But that's not all - it turns out the client part is broken, for a completely different reason. When you give netcat an EOF (Ctrl-D), it doesn't know how to tell the remote side of the connection that there was an EOF. The server then doesn't have any way to know when to flush the buffers out and end the command, so the whole thing deadlocks waiting for more input that's never coming.

It turns out that TCP solves this problem; the bug is in netcat. With a TCP socket, you can close one direction of traffic, but keep using the other - for example, when you're done writing data to a socket, you can shut down the socket for writes, which signals to the remote side that you're done writing, and then read whatever the server sends back. This, unfortunately, required more code.

netpipe.py:

import sys
addr = sys.argv[1]

import select
def attempt_read(s, BUF_SIZE):
    if select.select([s], [], [], 0)[0]:
        return s.recv(BUF_SIZE)
    return ''

import socket
s = socket.create_connection((addr, 9999))

BUF_SIZE = 4096
buf = sys.stdin.read(BUF_SIZE)
while buf:
    s.sendall(buf)
    
    sys.stdout.write(attempt_read(s, BUF_SIZE))
    sys.stdout.flush()
    
    buf = sys.stdin.read(BUF_SIZE)

s.shutdown(socket.SHUT_WR)

buf = s.recv(BUF_SIZE)
while buf:
    sys.stdout.write(buf)
    sys.stdout.flush()
    buf = s.recv(BUF_SIZE)

(This is trivial enough that I'm planning to port it to C soon.)

Finally, some good news: this works perfectly! :D With this, you can open a connection and use it as a component in a pipe.

Wednesday, December 1, 2010

p2p DNS

Now that the US is considering forcing pirate domain names out of the DNS, one of the founders of The Pirate Bay is floating the idea of a p2p DNS alternative.

Okay, wow. This is an incredibly terrible idea.

I'll start with the obvious objections:

The DNS is meant to be authoritative

NO.

The DNS is meant to be reliable
Performance
Secure decentralized systems are HARD

There are some people whose first reaction to any data management problem is to try to stick it in a magic DHT and forget about it. In many cases it works - see BitTorrent, for example. A DHT will work in any application where you don't especially need data to be reliable or trustworthy; it's a perfect fit for BitTorrent peer exchange, where reliability is optional because the DHT is only a backup for the real tracker, and trustworthiness doesn't matter because the peers aren't trusted in the first place. For the DNS, though, a DHT is exactly the wrong solution.

It may be possible, someday, to fully decentralize the DNS. To do it will take some fundamental advances in computer science, though, and Peter Sunde isn't going to be able to make that happen by rallying the pirates to his cause.

P. Static has got Opinions

Sunday, December 5, 2010

Adventures in UNIX: Pipe buffer edge cases, half-open sockets

Wednesday, December 1, 2010

p2p DNS

Followers

Blog Archive

About Me