Thursday, November 25, 2010

Layers of Fail

So here's an annoying multi-layered fail, notable because it affects three adjacent layers of the network stack! As any programmer will tell you, the most interesting bugs to diagnose are the ones that result from the interaction of other bugs. This particular one results in me losing messages on AIM.

First fail: U-Verse and my Mac

I don't know if anybody else has seen this problem, but my MacBook Pro cannot keep up a reliable connection to any U-Verse modem. I've seen this problem with multiple U-Verse modems, and only with my Mac, so the possibilities are (in order of likelihood):

  1. Buggy AT&T software in the modem
  2. Buggy firmware for my wireless card
  3. Random hardware fault in my Mac (unlikely, because it only happens with U-Verse modems)

This is pretty annoying, because WiFi is supposed to be standardized! All implementations are supposed to be interoperable with all others. Either AT&T or Apple could have caught this (isn't it standard practice to test with other widely-used hardware?), so the fault could lie with either company. Luckily, I don't have U-Verse at home, so I don't need to diagnose this properly - it's only an issue when I'm visiting people who do, like my parents.

Second fail: Automatically dropping connections on interface down

This is such a widespread thing that I think it must be intentional, but I can't figure out any reason that it's not a terrible idea. On every OS that I've used, when a network interface goes down, all connections are severed automatically. The thing is, IP is explicitly designed to tolerate lost packets, and TCP on top of it is designed to recover from them by retransmitting, so dropping the connections is unnecessary. If operating systems just ignored the loss of the interface on the assumption that it'll come back up soon, everything would still work as designed, and a lot of situations involving intermittent connections would work much better!

In other words, effort went into a feature which makes things worse, which definitely counts as a failure in my book.
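
To make the retransmission argument a bit more concrete, here's a toy simulation of a sender that just keeps retrying with exponential backoff while the link is down. It's only a sketch: the function name, the timeout values, and the give-up limit are all made up for illustration and aren't taken from any real OS's TCP stack.

```python
def deliver_with_retries(link_up_at, initial_rto=1.0, max_rto=60.0,
                         give_up_after=600.0):
    """Simulate retransmitting one segment over a link that is down
    until time `link_up_at` (seconds).

    Returns (delivery_time, attempts), or (None, attempts) if the
    sender gives up first. All parameters are illustrative assumptions.
    """
    t = 0.0            # time of the current send attempt
    rto = initial_rto  # current retransmission timeout
    attempts = 0
    while t <= give_up_after:
        attempts += 1
        if t >= link_up_at:
            # The interface is back, so this attempt gets through.
            return t, attempts
        # Segment was lost: wait one timeout, then double it (capped).
        t += rto
        rto = min(rto * 2, max_rto)
    return None, attempts


# A 10-second outage: attempts at t = 0, 1, 3, 7 are lost, t = 15 succeeds.
print(deliver_with_retries(link_up_at=10.0))  # (15.0, 5)
```

The point is that nothing special has to happen when the interface disappears: the sender keeps its state, keeps retrying, and the connection only dies if the outage outlasts the give-up limit.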

Third fail: AIM protocol doesn't handle dropped connections cleanly

This one's pretty simple: if a connection drops, and a message is sent during the timeout before the server decides that the connection is dead, that message seems to be lost. Seems like a simple bug to fix on the server, but it's been going on for a while now, so apparently that's not going to happen. What's more irritating about this one is that AIM already seems to save messages that are sent to somebody who's offline - it just can't detect that you're offline during the timeout period.
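
For what it's worth, here's roughly the kind of server-side fix I have in mind, as a minimal sketch. I have no idea how AIM's servers actually work, so the `Mailbox` class and all of its methods are hypothetical: the idea is just that the server holds on to each message until the client acknowledges it, and anything still unacknowledged when the connection times out gets treated like an offline message.

```python
class Mailbox:
    """Hypothetical per-user server state; not AIM's actual protocol."""

    def __init__(self):
        self.next_id = 0
        self.unacked = {}  # message id -> text, sent but not yet acknowledged
        self.offline = []  # messages held until the user's next login

    def send(self, text, online=True):
        """Queue a message; returns its id, or None if stored offline."""
        if not online:
            self.offline.append(text)
            return None
        msg_id = self.next_id
        self.next_id += 1
        self.unacked[msg_id] = text
        return msg_id

    def ack(self, msg_id):
        """The client confirmed receipt, so the server can forget it."""
        self.unacked.pop(msg_id, None)

    def connection_timed_out(self):
        """Connection declared dead: keep unacked messages for later."""
        for msg_id in sorted(self.unacked):  # preserve send order
            self.offline.append(self.unacked[msg_id])
        self.unacked.clear()

    def login(self):
        """Hand over everything that was saved while the user was away."""
        pending, self.offline = self.offline, []
        return pending


box = Mailbox()
box.ack(box.send("hi"))    # delivered and acknowledged
box.send("still there?")   # sent into a dead connection, never acked
box.connection_timed_out()
print(box.login())         # ['still there?']
```

With something like this, a message sent during the timeout window shows up at the next login instead of vanishing.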

The end result of these three (or two, if you don't want to count the middle one) bugs is that AIM is nearly unusable for me when I'm using the WiFi at my parents' house. (Yet another reason to switch to GTalk? :D No idea if it has the same problem, though.)

1 comment:

Kiriska said...

Hm. GTalk isn't great about telling you when the other person is offline either, I don't think, though my main issue with it is that its built-in browser interface isn't all that obvious about alerting you when the other person's replied... which is kind of nice when I'm at work and trying to be subtle, but not as nice otherwise. Then again, this can be circumvented with a third-party client. AIM's failure to alert you to timeout/lost messages might also be alleviated in a third-party thing? idk. :O