Sunday, August 30, 2009

Radeon KMS on Gentoo (really this time!)

Alright, so many thanks to FireBurn, who pointed out that the x11 overlay has all the packages necessary for KMS support. Here's what I had to do to get it working:

New kernel:

Needs to be from the 2.6.31 series or better. Using 2.6.31-rc8 right now. To enable KMS, you have to enable:

Device Drivers > Staging drivers > uncheck "Exclude staging drivers from being built"
Device Drivers > Staging drivers > Enable modesetting on radeon by default
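
For the grep-inclined, I believe those two menu options boil down to the following .config symbols - this is an assumption based on my own menuconfig, so double-check the exact names against your kernel, since the staging bits were moving around at the time:

CONFIG_STAGING=y
# CONFIG_STAGING_EXCLUDE_BUILD is not set
# the usual radeon DRM driver (module here, =y also works)
CONFIG_DRM_RADEON=m
# the actual KMS switch
CONFIG_DRM_RADEON_KMS=y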

Add the x11 overlay:

layman --add x11

Create ebuilds for some new packages (optional)
xf86-input-evdev and xorg-server have new beta versions that aren't in the tree or the x11 overlay yet. If you want to use these instead of live git ebuilds, make ebuilds for xf86-input-evdev-2.2.99.1 and xorg-server-1.6.99.900.
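
If you go that route, making the ebuilds is mostly just copying the newest one in the tree under the new version number and regenerating the manifest. A rough sketch for evdev - the overlay path and the old version number are placeholders, so use whatever your tree actually has, and repeat for xorg-server:

# assumes a local overlay at /usr/local/portage that's already in PORTDIR_OVERLAY;
# OLD is whatever evdev version your tree currently carries (placeholder below)
OLD=2.2.2
NEW=2.2.99.1
mkdir -p /usr/local/portage/x11-drivers/xf86-input-evdev
cp /usr/portage/x11-drivers/xf86-input-evdev/xf86-input-evdev-${OLD}.ebuild \
   /usr/local/portage/x11-drivers/xf86-input-evdev/xf86-input-evdev-${NEW}.ebuild
ebuild /usr/local/portage/x11-drivers/xf86-input-evdev/xf86-input-evdev-${NEW}.ebuild manifest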

Unmask the necessary packages
Here's what I added to my package.unmask:
=x11-drivers/xf86-video-ati-9999

# make these =-9999 if you're not using the beta versions
<x11-base/xorg-server-9999
<x11-drivers/xf86-input-evdev-9999

<x11-proto/fixesproto-9999
<x11-proto/xextproto-9999
<x11-proto/renderproto-9999
<x11-proto/recordproto-9999
<x11-proto/inputproto-9999
<x11-proto/xineramaproto-9999
<x11-libs/libXext-9999
<x11-libs/libX11-9999
<x11-proto/xproto-9999
<x11-proto/xcb-proto-9999
<x11-libs/libXi-9999
<x11-libs/libXinerama-9999
<x11-libs/libxcb-9999
<x11-proto/bigreqsproto-9999
<x11-proto/xcmiscproto-9999

Note that most of them are "<package-9999", which unmasks all the numbered (released) versions but not the live git version of the ebuild. Live ebuilds are a pain since you have to update them manually, so avoid them when possible.

Really long compile:

I think these need to be installed first, so let's make sure:
emerge -uav1 libdrm xorg-server mesa

Then:
emerge -uDNav world
revdep-rebuild

At this point, it should probably work - reboot into your new kernel and find out.

Caveats:

Some packages (pulseaudio, maybe others?) don't build against these versions of the X libraries. They fail with missing header errors, so I think it's probably just a build-system issue that should be relatively easy to fix. Still, you've been warned.

When I initially finished these steps, X wouldn't output anything when I tried to run it, and I had to ssh in to kill it. Also, I kept on getting strange errors in my /var/log/messages along the lines of:

Aug 30 11:32:01 3vil [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(5).
Aug 30 11:32:01 3vil [drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !

Depressingly enough, I have no idea what I did that fixed this. The problem persisted through several reboots, and then went away for no reason that I can discern. :(

Edit 8/31: Redo package.unmask.

Edit 9/10: kms USE flag is on by default now, so we don't have to enable that. Redo package.unmask again.

Saturday, August 29, 2009

Radeon KMS on Gentoo (almost!)

Radeon KMS has been in the Linux kernel for a few months now; it may not be stable or even complete yet, but I still want it. :o

(If you have no idea what Radeon KMS is, this post isn't for you. I'm mainly writing it for future reference.)

I started from the guide that Dave Airlie wrote a few months ago on his blog. It's somewhat out of date, since a lot of the code he recommends pulling from git has since made it into released versions, though some of those releases still need extra configure options.

Before you try this guide: it ends in a hard lockup on my system. All the steps appear correct; I don't know whether I screwed up or whether the latest code is broken right now, but don't get your hopes up. I'm putting this online purely for reference.

Step 1: Kernel, at least 2.6.31

As of 2.6.31, the Radeon KMS kernel bits are in as a staging driver. To enable KMS, you have to enable:

Device Drivers > Staging drivers > uncheck "Exclude staging drivers from being built"
Device Drivers > Staging drivers > Enable modesetting on radeon by default

Step 2: libdrm, at least 2.4.12

libdrm needs an extra configure option to enable KMS. Copy the ebuild into your overlay, rename it libdrm-2.4.12-r99.ebuild or something, and add the following line to it:

CONFIGURE_OPTIONS="--enable-radeon-experimental-api"

And then re-emerge.
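
Spelled out, the whole step looks roughly like this - the overlay path is my assumption, and the CONFIGURE_OPTIONS line can go wherever the ebuild's other variables live:

# copy the tree's ebuild into a local overlay under a higher revision
mkdir -p /usr/local/portage/x11-libs/libdrm
cp /usr/portage/x11-libs/libdrm/libdrm-2.4.12.ebuild \
   /usr/local/portage/x11-libs/libdrm/libdrm-2.4.12-r99.ebuild
# edit libdrm-2.4.12-r99.ebuild and add the CONFIGURE_OPTIONS line from above,
# then regenerate the manifest and rebuild
ebuild /usr/local/portage/x11-libs/libdrm/libdrm-2.4.12-r99.ebuild manifest
emerge -1av x11-libs/libdrm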

Step 3: xf86-video-ati, from git (or newer than 6.12.2)

The ATI driver hasn't seen a release in like five months, so this one still needs to be pulled from git. As of this writing, there's a bug in the latest git tree - in radeon_dri2.c, in radeon_dri2_create_buffer, it assigns to the format field of a DRI2BufferPtr. Unfortunately, format was only added in a later version of that struct (DRI2Buffer2Ptr, iirc), so this fails to compile. The bug's been there for two months now, so I don't know if my setup is just weird or what.

Quick but shady patch: just comment out the assignment.

diff --git a/src/radeon_dri2.c b/src/radeon_dri2.c
index 613fde8..37db359 100644
--- a/src/radeon_dri2.c
+++ b/src/radeon_dri2.c
@@ -202,7 +202,7 @@ radeon_dri2_create_buffer(DrawablePtr drawable,
     buffers->pitch = pixmap->devKind;
     buffers->cpp = pixmap->drawable.bitsPerPixel / 8;
     buffers->driverPrivate = privates;
-    buffers->format = format;
+    //buffers->format = format;
     buffers->flags = 0; /* not tiled */
     privates->pixmap = pixmap;
     privates->attachment = attachment;

Copy the xf86-video-ati-6.12.2 ebuild from the portage tree (not the -r1 - it has extra patches that I'm not sure we want, since they apply to a version that's five months older) to your overlay, and rename it xf86-video-ati-9999.ebuild - this will automatically turn it into a live git ebuild. Save the above patch in files/ as xf86-video-ati_buffer-version.patch. Add to the end of src_prepare:

epatch ${FILESDIR}/xf86-video-ati_buffer-version.patch

With this, it should build.

Step 4: Mesa, at least 7.5

Dave Airlie's guide says to rebuild mesa from git at this point, but since a version has been released since then, I'm not sure this step is necessary anymore. Re-emerge mesa if you're feeling superstitious.

At this point, you might have working KMS! I get a nice "[KMS] Kernel modesetting enabled." message in my Xorg.log, but then my whole system crashes pretty hard. I'm planning to poke the radeon driver devs and see what they think, and then I'll update this post with details.

Friday, August 28, 2009

The Internet, from Bottom to Top

This is a post I've been meaning to write for a while. It's a high-level overview of how the Internet works, starting at Ethernet cables and working our way up.

I'll be following something called the OSI model throughout this post, or at least the relevant bits. It divides the functioning of a network into seven layers, much like a seven-layer burrito at Taco Bell. Unlike a burrito, though, only the first four layers are generally relevant. (Also, hot sauce does nothing to make networks better. But I digress.) This kind of separation into independent layers is really important, as we'll see later on.

Layer 1 - Physical layer

The physical layer is mainly concerned with the specifics of wiring. Ethernet cable (more properly called CAT-5 or 6, since we're talking specifically about the cabling) falls under this layer, as do fiber, coaxial cable, telephone lines, and various other things. Wireless radios also fall under this category, if we're talking about Wi-Fi.

It's important to know about this layer, and to be able to tell the different kinds of cables apart, I guess, but this is far from the most interesting layer in the stack, so I'm not going to go into any great detail about the signaling rate of CAT-5 versus CAT-6, or anything like that. It's interesting to somebody, I'm sure, but it's just not my cup of tea.

Layer 2 - Link layer

The link layer is responsible for sending messages between computers that are on the same local network. Link-layer protocols are so closely tied to the physical layers they run on top of that it almost doesn't make sense to separate them into two layers.

I'm going to be talking mainly about Ethernet here, because it's simple, and because I know just enough about other link-layer protocols to embarrass myself. >_> Wi-Fi, for instance, has to deal with situations where there's interference that neither transmitting party can detect, because they're both within the range of the receiver, but not of each other. Madness!

Ethernet is designed to work when all the computers in a network are just sort of plugged into each other on a shared line (the "Ether"). This has the benefit of making the networking equipment extremely cheap and simple, since a dumb hub - or even just a shared cable - can get away with doing next to nothing. It works using a protocol that's really similar to talking in a group of people: each computer waits until it doesn't see anybody else transmitting, then just starts sending to the entire network; if another computer started transmitting at the same time and there's a collision, both wait a random amount of time and then try again.

In practice, detecting collisions and waiting a bit to recover from them hurts network performance if there's more than one host on the network trying to communicate. With a modern network switch, the switch will process the packets and take care of sending them one by one to only their designated recipients, so collisions are significantly reduced.

Every Ethernet device has an address hardcoded into it - the "MAC address" - which is supposed to be a globally unique address - no other device in the world should have the same one. Some hardware will let you change its MAC address, but if you change it to the same address as another device on the network, weirdness will ensue. This is in contrast to IP addresses, which I will get to in a second.
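
If you want to see your own, iproute2 will show it (and let you override it on most hardware). A quick sketch, assuming an interface called eth0:

# show eth0's link-layer (MAC) address
ip link show eth0

# override it with a made-up, locally administered address -
# just don't pick one that's already in use on your network
ip link set dev eth0 down
ip link set dev eth0 address 02:12:34:56:78:9a
ip link set dev eth0 up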

Layer 3 - Network Layer

This layer is where real magic starts to happen. Layer 2 lets us communicate with computers that we're connected directly to, but how do we communicate with systems that are anywhere on Earth (or even off Earth)?

In the IP (Internet Protocol) layer, the packets of data from layer 2 can be shuttled between local network segments. Given an IP address, every computer on the Internet either knows the layer 2 address of the computer that has that IP, or knows the layer 2 address of a "router" - basically another computer that's connected to multiple networks, and knows which addresses are on each one. Iterate this across enough networks - usually a dozen or so hops - and a data packet can reach any computer in the world that has an IP address. (Well, mostly - more on the abomination that is NAT later on.)

"So wait," you may be asking, "why add another set of addresses? Don't we already have layer 2 addresses we could use?"

But there are a few problems with that. First, not everybody uses the same layer 2 protocol, and not all layer 2 protocols have the same kinds of addresses, so a new layer is needed on top for everybody to be able to communicate. Second, layer 2 addresses are distributed essentially at random - nothing about a MAC address tells you where in the world the device is plugged in, so there's no way to summarize "all the addresses over there" in a routing table. If you want to reach somebody with a given MAC address, it's easy if they're on the same network, since you can just send out a broadcast message, but there's no way to find them if they're on another network.

IP gets around that by having each router keep track of which IP addresses can be found where, in a routing table. On a home network it's pretty simple: anything on the local subnet gets delivered directly, and everything else goes out to the Internet through whatever type of connection the router is using. For big routers out on the Internet, the situation is a lot more complicated, and beyond the scope of this post. This is why IP addresses are assigned, rather than just being attached to the hardware when it's manufactured - whoever assigns the address is responsible for also remembering where packets meant for that address should go.
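
If you want to peek at one of the simple ones, iproute2 will dump it. The addresses below are made up, but this is roughly what a home machine's table boils down to:

# dump the kernel's routing table
ip route show

# typical output, paraphrased:
#   192.168.1.0/24 dev eth0   <- hosts on my local network, deliver directly
#   default via 192.168.1.1   <- everything else, hand it to the router

# adding a route by hand works the same way (again, made-up addresses)
ip route add 10.0.0.0/8 via 192.168.1.1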

IP addresses are 32 bits long, which means there are 2^32 of them, or a little over four billion. It turns out there are a lot more than four billion people in the world, and most of them want to be on the Internet. Who could have seen this coming? D:

Currently, we're solving this problem mainly with NAT - network address translation. With NAT, you have a router with a single public IP, which hands out private IP addresses to the computers it routes for. Combine this with some fiddling around at layer 4 (rewriting port numbers as traffic passes through), and you can have a lot of computers behind one real IP address, and it mostly works! Except that none of the computers knows its real public address, which makes network programming kind of a pain. Oh, and you can't connect to any of them from the Internet without manually setting that up. Aside from those problems, it works!
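
For what it's worth, on a Linux router the entire trick is about two commands - netfilter's MASQUERADE target rewrites the source address (and the layer 4 port numbers) of everything leaving the public interface. A sketch, assuming eth0 is the side facing the Internet:

# let the kernel forward packets between interfaces
echo 1 > /proc/sys/net/ipv4/ip_forward

# rewrite the source address/port of everything going out the public side
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE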

(Actually, as far as NAT goes, we've got it pretty good here in the US. American companies and organizations managed to snap up a huge fraction of the available IP addresses in the early days of the Internet, so our ISPs can afford to give out public IP addresses to all their customers. You should hear some of the horror stories I've heard about NAT in places like India...)

The long-term solution to address space depletion is IPv6. (Internet Protocol, version 6. Current IP is actually IPv4.) IPv6 has 128-bit addresses, which means that not only can every man, woman, and child on Earth have an IPv6 address, but their constituent atoms can have unique addresses too. In other words, it's comfortably large enough that every computer can actually be on the public Internet, and we won't have to play dirty tricks to make it last longer anytime soon. IPv6 also has some other nice features, like getting rid of in-network packet fragmentation and the header checksum, which makes things simpler for routers (which, in the Internet core, are pretty overworked these days).
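
Checking whether you already have any of it takes about two commands (the hostname here is just a well-known example):

# list any IPv6 addresses your interfaces have picked up
ip -6 addr show

# see whether the IPv6 Internet is actually reachable
ping6 ipv6.google.com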

So why isn't everybody (or, realistically speaking, anybody) using IPv6? Technologies are in place for the transition to happen gradually, but unless your ISP is actively deploying it, the only way to get IPv6 is to set up a tunnel through a tunnel broker, which is kind of a pain. ISPs aren't actively deploying it, because websites don't use it yet. Websites don't use it yet, because end-users won't see any benefit from it. The IPv6 transition could happen at any time - all the pieces are there - but it's one of those things that nobody has any reason to do unless somebody else does it first. (I'd call it a chicken-and-egg situation, but everybody knows the egg came first. Really, I'm not even sure why people still use that phrase.)

On the other hand, the projection is that in about two years, we'll run out of new IP addresses to give out, after which a market in IP addresses will probably form, and when the price goes high enough, IPv6 will sort of just happen. This is another case in which the layered architecture of the Internet is really useful. The entire layer 3 protocol, arguably the most important one in the entire stack, can just be switched out with no changes to layer 2 and minimal changes to layer 4. Cool, huh?

Layer 4 - Transport layer

IP is pretty neat, but it's not the whole story. It operates on a best-effort basis, with extremely lax guarantees, to reflect the realities of computer networks. The Internet will do its best to get your packets from A to B, but it's allowed to lose them, mess them up, deliver them out of order, or even deliver multiple copies if it really wants to mess with your head. In computer science, if you don't like the properties a system gives you, standard operating procedure is to build another layer on top of it that gives you what you want. :D

TCP is the most widely used transport-layer protocol, for good reason. It gives you a stream that you can read and write data from and to, without having to worry about packets and the funny things they're allowed to do. It does this by numbering packets, so that drops and reordering can be detected; waiting for acknowledgements from the other side and retransmitting, to guarantee delivery; and adding checksums, so that corrupted data can be detected. (This is definitely an oversimplification - TCP can get pretty complicated.)
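
The easiest way to get a feel for the stream abstraction is to build one by hand with netcat - a sketch assuming you have an nc binary around (the exact flags vary a little between netcat variants):

# terminal 1: listen for a TCP connection on port 5000
nc -l -p 5000

# terminal 2: connect to it; whatever you type on either end shows up on the
# other, in order and uncorrupted - TCP is doing the numbering, acking, and
# checksumming for you underneath
nc localhost 5000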

The interplay between TCP and IP is kind of interesting in itself. It reflects the "dumb network with smart edges" principle - the Internet at large only sees IP packets, which are easy to deal with, and so can be dealt with in incredible volume. Only the edges of the network, where connections are actually being made from and to, have to deal with the much more complex TCP protocol. At the inception of the Internet, there were other protocols which made much more stringent demands on their network layers. TCP/IP wiped the floor with them, largely because it made almost no demands of its network layer other than that it usually work. This allowed just about anybody to connect to the Internet using just about any method they could send packets over. (Those links are a prime example of what network engineers consider funny. Bonus: somebody actually built it!)

UDP is the next most commonly used layer 4 protocol, and the only other one you're at all likely to have heard of. It offers guarantees very similar to IP's, so it's mostly useful when an application can tolerate the occasional lost packet and wants lower latency and overhead than TCP gives - games, voice chat, DNS lookups, that sort of thing. There are also newer protocols, such as SCTP and DCCP, which would probably be really useful if anybody actually used them. I'm basically only mentioning them here for completeness, though.

Layers 5-8 - Meh.

Layers 5-7 (Session, Presentation, and Application) are defined in the OSI model, but they don't really map onto how Internet protocols are actually built, so I'm just going to leave them out. Layer 5 shows up in a few protocols where persistent sessions are useful, but layers 6 and 7 are almost universally ignored.

Layer 8 is actually a network geek in-joke - following the progression of the OSI stack, layer 8 could be interpreted to be the user, so a "layer-8 network failure" is basically equivalent to PEBKAC, or an ID-10-T error.

Hmm, this post went on for a lot longer than I'd expected. >_> The most important thing to take from it is that the Internet runs on many different layers, each of which builds upon the last, allowing it to run on a wide variety of hardware, and under a wide variety of conditions. It also allows for upgrades, since changing out a layer usually needs changes in at most one layer directly above it.

Thursday, August 20, 2009

Republican Spam - Health Care Edition

Yep, I'm still getting Republican spam. It's been enlightening, to say the least. For example, this email, purportedly from Michael Steele, chairman of the RNC:

President Obama told the American people in his weekend address he wanted to "start dispelling the outlandish rumors" about the Democrats' risky health care experiment.

I couldn't agree more with the President.

There is no place for outlandish rumor or outrageous rhetoric in the debate for the affordable and accessible health care reform we all want...

  • Rhetoric: President Obama Promises Americans Can Keep Their Current Health Care Coverage. "You know, the interesting thing is we've actually been very clear on what we want. I've said I want to make sure if you have health care you are going to keep it..." (PBS's "The Newshour With Jim Lehrer," 7/20/09)
    • FACT: Analysis Shows Over 88 Million People To Lose Current Insurance Under Government Health Care Takeover. "Under current law, there will be about 158.1 million people who are covered under an employer plan as workers, dependents or early retirees in 2011. If the act were fully implemented in that year, about 88.1 million workers would shift from private employer insurance to the public plan."


Inconveniently enough, the CBO has estimated that only around ten million people total will switch to the public option, if it's implemented. Even if you're content to ignore facts like this, as many Republicans are today, you still have to admit that nothing in the quote actually suggests that anybody will lose their insurance - just that they'll switch to a public plan. The fallacy is fairly obvious, but what makes it particularly disgusting is the way that it's framed as a response to excessive rhetoric. :(

Going further with this same email:

  • Rhetoric: President Obama Pledges Americans Can Keep Their Doctor. "If you like your plan and you like your doctor, you won't have to do a thing. You keep your plan. You keep your doctor...We're not going to mess with it." (President Barack Obama, Remarks At White House Press Conference, The White House, 6/23/09)
    • FACT: Mayo Clinic Says Government-Run Health Care Will Force Doctors To Drop Patients. '[L]awmakers are on track to approve across-the-board federal payment reductions of $155 billion over 10 years for hospitals ... Mayo and similar health systems object to the sweeping cuts. 'Across-the-board cuts will be harmful to everyone and we think it is particularly bad to penalize the high-value organizations,' said Jeff Korsmo, executive director of the Mayo Clinic Health Policy Center. 'We will have to violate our values in order to stay in business and reduce our access to government patients.'" (Phil Galewitz, "'Model' Health Systems Press Case For Medicare Fix In Reform," Kaiser Health News, 7/20/09)


Gosh, that certainly sounds bad! Just for completeness' sake, though, let's put back in part of the quote they felt the need to remove.

Both the American Medical Association and the American Hospital Association say they favor additional study of new payment strategies, rather than any major revamping now.

That's the approach Congress seems to be pursuing. In addition, lawmakers are on track to approve across-the-board federal payment reductions of $155 billion over 10 years for hospitals, reflecting a deal reached recently by major hospital groups, the White House and Senate Democrats. That agreement assumes that the hospitals will see increased revenues as reform legislation results in fewer uninsured Americans, whose care is now a financial burden.


Other emails have repeatedly stated that the plan will cost trillions of dollars (specifically, 1.5 trillion, as stated in an email from gopsenators.com sent on the same day as the previous one). Once again, we turn to the ever-reliable CBO, which does indeed estimate the cost to be over one trillion dollars - as long as you ignore any additional revenue from the bill (which offsets most of the costs), and add up all the costs over the next ten years. I doubt this is what most people have in mind when they consider the cost of the bill.

Right now, there is a strong push by Barack Obama and the Democrat-controlled Congress for the federal government to virtually take over our health care system - running it like "nationalized" health care programs in Canada and England.
(from an email purportedly from John Cornyn, sent on the 17th.) It would be nice if it were true - I've heard a lot of nice things about health care in Canada and England. Unfortunately, all the health care plan does is create a public health care option. It won't even compete with private insurance companies, because the rates it charges would be tied to the rates that private companies already charge. This is actually a textbook example of a standard Republican play: If you say a lot of untrue things at once, in such a way that your opponents can only respond to one of them in a single soundbite, then you get the rest for free.

And then, of course, we have Sarah Palin. Not one to be outdone by others, she has jumped straight to claiming that under Obama's health care plan, her child with Down's syndrome would be put to death by evil government bureaucrats. Yeah, I don't know either.

The Republican party used to be great, and this is what they've been reduced to? What happened? The Republicans of yesteryear would at least have had the decency to propose an alternative plan, something in line with conservative values. I can't even call the current batch of Republicans conservatives anymore, because the party leadership has so thoroughly abandoned conservative principles that it would make a mockery of the term. It's like they don't believe in anything anymore, and just want to score a political victory, for the sake of politics itself, no matter what the cost may be.

Monday, August 17, 2009

I've been asked to do a post on what actually goes on at an Indian wedding, so this is that. My second cousin Devi got married this weekend, so I've been in New Brunswick for the wedding. Fair warning - I've just gotten back from the reception as of writing this, so this is going to be a pretty halfhearted post.

Thursday was some kind of ceremony which mainly involved the girls all getting henna tattoos on their hands! Not being a girl, and not especially desiring henna, I mainly hung out with cousins.

Saturday was the actual wedding! It began with the groom showing up, and walking down the aisle, which took a while. The priest doing the ceremony was pretty long-winded, and tended to drag things out. I kind of dozed off at this point for a bit, but I am informed that I didn't miss all that much. I woke up in time for the saptapadi, where the bride and groom take seven steps together, each of which symbolizes something (don't ask me what). They then exchanged vows - traditionally, the bride extracts seven promises from the groom, and then the groom gets one in return from the bride. After that they exchanged rings, and then the priest went on for a bit, managed to quote Shakespeare and Browning while explaining the ceremonies, and then there was food.

Sunday was the reception. I know that people reading this will be most interested in the differences between Indian weddings and traditional Christian weddings, and they will be mostly disappointed here - as far as I know, the biggest difference was the choice of music. There's a fairly standard repertoire of Guyanese music that gets played at all my family events, since my mom's side of the family hails from there. Here's a pretty representative sample.

Looking at it, this is a pretty crappy post. I am posting it anyway, because I am sleep-deprived and have been for quite a while, and if I don't fall asleep soon my parents will start snoring. :(

Thursday, August 13, 2009

Delayed post

So I was going to do a post on what actually goes on at an Indian wedding, since so many people ask! Complication: the actual wedding isn't until Saturday. >_> Real post coming Monday probably.

Thursday, August 6, 2009

Data Mining Post x2

1. Netflix

The Netflix prize competition finished up recently! This is of special interest to me, because the competition was the final project in the data mining course that I took recently. Basically, Netflix has some software (Cinematch) that predicts what movies people will like, based on what movies they've liked in the past. The competition was to write a program which makes these predictions with 10% greater accuracy than Cinematch, with error measured as the RMSE. The prize is one million dollars (!!!), which is enough to spur some serious data-mining research into the subject.
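
(Quick aside, since RMSE is the scoring metric: it's root mean squared error - square each prediction's error, average the squares, take the square root. As a sketch, given a made-up file of "predicted actual" rating pairs, it's a one-liner in awk:)

# predictions.txt is hypothetical: two numbers per line, "predicted actual"
awk '{ d = $1 - $2; s += d * d; n++ } END { print sqrt(s / n) }' predictions.txt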

The contest ended when one team (BellKor's Pragmatic Chaos) achieved the 10% goal. This started a month-long countdown, and there was a bit of back-and-forth drama between BKPC and another team (The Ensemble) as they both tried to hold the #1 spot at the moment the competition ended. The Ensemble took the spot with a better result with one day to go, but BKPC took it back with just 20 minutes left. The Ensemble then submitted a better entry in the final few minutes of the competition, but it looks like that one wasn't accepted for some reason, because at this point it seems like BKPC has won.

A really interesting thing that emerged as the competition got to the later stages was the merging of teams. BKPC is a merger of BellKor in BigChaos and Pragmatic Theory. BellKor in BigChaos is a combination of BellKor and BigChaos, and team BellKor was itself a collaboration. On the other side, The Ensemble was a combination of the second- and third-place teams when BKPC broke 10%, and both those teams were also merged teams. If I were a lot less lazy, I'd put together a flowchart of which teams ended up where - it'd definitely be fun to look at.

(Hint for companies looking for algorithmic improvements: A huge pile of prize money is probably a cheaper and more reliable way of getting results than actually paying for that kind of research. Also it's a lot more fun for everybody involved!)

The actual data provided was split into two sets - one massive set (100 million ratings) to be used as input for the algorithm, and a much smaller set (a few tens of thousands? I forget) for actually testing the performance of the algorithm. Having separate training and test sets is pretty standard for this type of problem; if you use the same set for both, you risk overfitting, where the algorithm does well on the data it's been fed but poorly on any other data.

One hundred million ratings is a pretty freaking huge number of ratings. We used smaller datasets (5M and 20M ratings) to actually test the algorithms, because running a test against the 100M rating dataset can take hours or days (or weeks, if your code is especially sloppy). What makes it worse is that the 100M ratings we were given are only a small portion of all the ratings that were possible. It's convenient to view the data as a table, with movies along one axis and users along another, but keep in mind that the 100M ratings would only fill about 1% of this table. Even treating this table as a sparse matrix in Matlab, it took several gigabytes of RAM just to keep in memory.

Anyway, I'll spare you all the details of the algorithm my group implemented for the project (iterative trace-norm minimization by singular value thresholding; I only have the vaguest grasp of why it works myself :( ), but we ended up with a roughly 1% improvement over Cinematch's accuracy. We could definitely have done better, I think, but we were severely crunched for time toward the end (and actually, quite a bit of the work was done in a frantic all-nighter the night before the project was due, since we got the due date wrong). Realistically speaking, if we'd had time to properly test and tune the algorithm, we might have been able to beat Cinematch by 2-4%.

2. The Broken Window

What are the odds? I write a blog post on data mining, and then on the flight back from Pittsburgh I pick up The Broken Window, by Jeffrey Deaver, and it's about... data mining. :o

(Quick plot summary: The killer has access to a megadatabase created by a data mining corporation, and uses it to frame other people for his crimes.)

I picked up the book mainly because the last book I read by Jeffrey Deaver (The Blue Nowhere) was about hackers, and somehow, he got everything basically right. An author that can write a story involving hacking, and do their homework so thoroughly that I am satisfied with the result, is an author that has earned my respect. The Broken Window is a good read, if not a stellar mystery novel, and factually accurate enough that I didn't throw the book down in disgust (looking at you, Dan Brown).

On the other hand, the book just felt late, but this isn't necessarily Deaver's fault. Put it this way: fifteen years ago, this would have been science fiction. Ten years ago, it would have been seen as prescient. Five years ago, it would have been a timely novel; today (and more importantly in 2008, when it was published) it feels just a bit dated. For example, one of the pivotal moments in the book involves the main characters, a group of top-notch detectives and police officers, finding out about RFID tags for the first time.

I get the feeling that this was supposed to be a mystery novel, but it was missing something. Mysteries are supposed to have an "aha!" moment, when the culprit is revealed, and all the pieces come together in your mind. Instead, for me, it was more like an "oh." moment - it made sense, but there weren't enough clues there to make it an "aha!". The solution was just too unexpected. Frankly, I think Deaver spent too much effort on misdirection, making you think it was the CEO the whole time. Sterling, the CEO, was an obsessive, secretive type, and therefore entirely too obvious to be the actual culprit. Didn't stop Deaver from hinting that he was for 2/3 of the book, though. :/

All in all, though, I'd give this book a 7/10. It had some flaws, but it was a fun and worthwhile read overall; better than anything that hack James Patterson ever wrote. :p (I picked up one of his books for a flight one time. Considering how often people recommend him to me, it sure was terrible. XD)

Sunday, August 2, 2009

Doppelganger Desktop

Been playing with Amazon EC2 lately. Basically, it lets you rent time on virtual machines on an hourly basis, for astonishingly reasonable prices. I haven't tried this out yet, but I think I've come up with a solution to the problem of wanting to always have a desktop running somewhere.

Essentially, I want to have my desktop running all the time, so that I can keep various programs active - instant messaging, IRC, email, RSS aggregation, and others. Unfortunately, when I move or go on vacation or lose internet access for some reason, I can't keep my desktop on. One solution is to shift all of these to online services, but this isn't really possible today - for instance, there's no good way to run AIM remotely. (Bitlbee is neat, but I really wouldn't want to use it on a day-to-day basis. :p)

Another solution is to keep a synchronized home directory somewhere in the cloud, but this isn't especially useful except as a last-ditch backup. A home directory by itself isn't sufficient; what I really need is all that data plus the programs that use it. For instance, I have access to my email somewhere in Thunderbird's files, but not in a form that I can really use - at best, I could export it to a different program and use that.

So how about this: what if I keep a second desktop in the cloud, with all the same software that I have at home, and synchronize my files to that? This "doppelganger" system could be accessible over VNC, so I could use it almost as easily as a regular desktop. This would only be needed when my main desktop is offline, so I'd waste a lot of money if I paid for a dedicated server. Luckily, EC2 only charges you for time that your virtual instances are actually running, making it easy to keep a system on standby for when you actually do need it.

This system would have to be augmented to work in practice, since I keep a lot of large media files. I already have those on an external hard drive right now, so I could use that in conjunction with the doppelganger to fill in the gaps. (Why not just use the external for all my files? Because the real value in having a doppelganger isn't accessing the files, it's having a system with a persistent internet connection.)

To do a clean transition, I'd have to do the following:
* Log out of the main desktop, and bring up the doppelganger
* Sync home directory to the doppelganger, and sync my media drive to my external
* Log in on the doppelganger, and shut down the main desktop
All of which shouldn't take more than ten or fifteen minutes.
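
The sync step itself would be plain rsync - something like this, with a made-up hostname and paths (the doppelganger's real address comes from wherever EC2 says the instance is running):

# push my home directory up to the doppelganger
rsync -avz --delete ~/ me@doppelganger.example.com:/home/me/

# mirror the media drive onto the external disk
rsync -av --delete /data/media/ /mnt/external/media/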

If I sync regularly, I'll also have a remote backup at a fraction of the cost of Exavault, which is what I'm currently using. As a bonus, I'll also have a system capable of using the backup ready to go at any time. rsync is the obvious choice for syncing, since it's fast, incremental, can use compression, and basically is the perfect tool for what it does. unison is also an interesting option, mainly because it provides two-way sync on top of all the things that make rsync awesome. Distributed filesystems such as AFS would be neat, but I don't think they're the best choice for something like this, where disconnected operation is the common case.

So yeah, that's my idea. I'm going to be trying it out while I'm on vacation next week; if you never hear about it again, that means it worked perfectly. :3