Sunday, December 21, 2008

Real-world filesystems

I was thinking today about notebooks, and how you find information in them. When I take notes in my classes, I start at the first page, write down whatever seems important, and pick up where I left off for subsequent classes. When I want to look something up from my notes, I try to remember when we talked about it in class, and then do a sort of binary search through the notebook based on that. If I don't remember well enough, I fall back on linear search.

This is very nearly optimal for writing notes, but pretty damn bad for actually going back and reading them. (Probably, this reflects poorly on my study habits. >_>) More rigorously: Writing involves a O(log n) seek, which can be amortized over the length of a class period. Reading involves a seek which varies between O(log n) and O(n), depending on how well I remember the contents of the notebook. O(log n) is pretty acceptable performance, all things considered, but an O(n) worst case is pretty bad.

This is where the CS nerd in me immediately starts drawing comparisons to filesystems. Unfortunately, the comparisons are pretty worthless in the end, because a notebook has fundamentally different characteristics when compared to a computer's hard drive. Also, and probably more importantly, humans are fundamentally different from the computers which access hard drives - algorithms involving hashing, for example, are a no-go. Still, there's a lot of potential here for better filesystems.

First: indexing. This is basically the same algorithm used in books, for the table of contents - you dedicate a few pages at the beginning to a list of all the sections in the notebook, along with their page numbers, and then number the rest of the pages in the book. This gives you guaranteed O(size of index + log n) performance for looking stuff up, at the cost of some time every time you write in the notebook (because you have to update the index). Multi-level indexing is possible, but they don't make notebooks big enough for it to matter.

Second: file fragmentation. This is akin to what newspapers do, with continuations from the front page a few pages deeper in. If you're writing on a subject that you think you might want to come back to later, you can reserve some space at the end. When you continue writing later, on some other page, you can then go back and write the page number you continued on in that space.

Here's another thing to consider: If you use a binder instead of a notebook, you can move the pages around at will. While this is useful, it also renders your page numbers useless. This is kind of an interesting tradeoff, and it remains to see what can be done with it; I'll probably be posting on this topic again.

(Also, shameless plug: A good friend of mine (and also a frequent commenter here!) has started his own blog! Props to him for this.)

4 comments:

Frank Church said...

Do you really think I care about any part of your entry other than the pimping? I mean, let's be honest. That said, I understood most of your entry, and I'm proud of myself. Also, God!

Anonymous said...

So, as far as binders go, there's a reason they make things called dividers. You can also stick little post-it notes to the ends of pages and label those. If you're even cheaper, you can make your own using tape and little pieces of paper. Sure, it takes time, but if you're concerned with how quickly you can find something with little effort, it's what I'd do. Nay, it's what I already do.

Kiriska said...

This is pretty interesting.

@Æther: There are huge drawbacks to labeling and dividing everything though. Obviously, if you don't label enough, you'll just end up with huge sections of mini-notebooks that would all require linear searching (especially since they likely wouldn't automatically fall back to a chronological order within each section). But if you have too many, then it would take a while just to browse through the labels. It's hard to find a good middle.

But hey. Bring your laptop to class. Type your notes. Default to digital filesystem. :P

On an unrelated note. All of you guys are in Houston, yeah? All of you have Comcast as your ISP? Are they fucking up your connection as much as they're fucking with ours? They're randomly cutting off our Internet with downtime ranging from 30 seconds to 20 hours. :\ They also seem to be arbitrarily blocking a lot of sites on my blogroll only to put them back some arbitrary time later. Like. Right now, I'm visiting this blog through a proxy. What the fuck?

Anonymous said...

Kirista, my labeling is always perfect. :) I store things in binders by type of information and then chronologically, so I never get bogged down with a linear search of my papers. Of course, there was ochem lab, in which I just threw everything in one giant division in frustration with no order at all. Well, the order was the order I found them in, but that's not altogether helpful.

Comcast doesn't sound like the ISP that I am using. I think we have SBC, which became ATT, if that counts as an ISP...The only trouble I have is that our router keeps failing and has to be unplugged and then plugged back in. Usually I have to do that a couple times a day, which is annoying. Not that I don't love to see AIM and irc crash all the time, of course.