Thursday, May 28, 2009

Lazy Post Today

I don't think I have a proper post in me this week. Instead, here are some neat things I've seen online lately!

Finger trees - The Good Math, Bad Math blog occasionally has some really interesting stuff on computer science. Summary, for non-CS people: log(n) is good! We like log(n). Everything should run in O(log(n)).

Ethics for Machines - Evolutionary ethics may be a somewhat unsatisfying framework for moral reasoning, but it's also unexpectedly robust, which I'll take over aesthetics any day. This essay explores some of the ethical issues surrounding created machine intelligences.

We may have been wrong about dinosaur posture all along - This is the sort of question that would be solved really quickly if we just had a time machine of some sort.

An Inconvenient Talk - A bleak perspective on peak oil, from somebody who probably knows what he's talking about. Also points out that the rosier perspectives (the ones we usually hear) always come from politicians and oil company executives.

Friday, May 22, 2009

Turning a computer off

For various reasons, I've become accustomed over the years to having my desktop running 24/7, to the point that I basically depend on it. This is less than ideal, though. It's a huge hassle when I'm moving or traveling, and it's a single point of failure when something goes wrong. It also wastes electricity, and since I've spent the last year paying my own electricity bills, this is suddenly very important to me.

Therefore, one of my projects for this summer is turning it off for a while. What follows is a list of things which will be problematic.

Instant messaging: Being signed on to AIM and similar services all the time is really convenient, even if I'm not around 24/7 to actually read the messages. With a desktop, this is easy - I can just leave my client of choice on and that's the end of it. Without one, my preferred solution would be something similar to meebo, but with a persistent connection on their end so that you could stay "online" all the time, and check received messages whenever you're on the site. The next best thing would be an "AIM proxy", which does the same thing but on hardware that I control. (Unfortunately, I would have to write this one myself, since nothing like this exists yet.)

Google Talk already supports this, as far as I can tell - too bad none of the people I talk to on AIM use it. XD
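(For my own reference, here's roughly what I have in mind for the proxy part. The im_client module in this sketch is completely made up - a real version would need an actual AIM library, or would have to go through something like libpurple.)

    # Rough sketch of a message-buffering IM proxy: stay signed on, queue
    # whatever comes in, and let me read it whenever I get around to it.
    # The "im_client" module is hypothetical; this is the idea, not an implementation.
    import queue
    import threading

    import im_client  # hypothetical library that can sign on and stream messages

    inbox = queue.Queue()

    def stay_online(username, password):
        """Hold a persistent connection and stash every incoming message."""
        session = im_client.sign_on(username, password)    # hypothetical call
        for message in session.incoming_messages():        # hypothetical call
            inbox.put((message.sender, message.text))

    def read_inbox():
        """Drain whatever arrived while I wasn't looking."""
        messages = []
        while not inbox.empty():
            messages.append(inbox.get())
        return messages

    # This runs on a machine that stays up; I check read_inbox() when I'm around.
    threading.Thread(target=stay_online, args=("me", "secret"), daemon=True).start()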

File access: Right now, I have access to all my files on my desktop, which is good. I also have a remote backup service that I sync to every week or so, which is better since it gives me read-only access to files even when my desktop is off.

Read-only is less than perfect, though, so I've been looking into distributed network filesystems. My requirements are that it must work on Windows, Mac, and Linux, have some kind of security, and allow for offline editing of files. NFS is the most widely used thing, but it fails two of those - no offline editing, and nonexistent authentication. AFS seems much more promising, but I haven't yet had a chance to play with it. Unfortunately, those two are the only options for distributed cross-platform filesystems. :(

RSS Reader: I used to use Liferea, which was a pain - really slow, no remote access, etc. I've since switched to my own RSS reader, feedme, which is fast enough, and can be accessed remotely through screen. I could shift this to a server at any point with a minimum of fuss, and keep getting updates uninterrupted.

This still has a single point of failure, though; ideally, I'd like to have multiple servers with synchronized state and automatic failover, possibly with each one only grabbing the feeds to which it has the lowest latency... hrmm. I think I'm overengineering the hell out of this one; my current setup plus frequent backups is probably good enough.
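(Just to show how far down the rabbit hole I got: the overengineered version would have each server time its fetches and only claim the feeds it can reach faster than its peers. Purely a sketch - feedme doesn't do any of this, and the URLs are placeholders.)

    # Sketch of the "each server grabs the feeds it's closest to" idea.
    # Not part of feedme; the feed URLs are placeholders.
    import time
    import urllib.request

    FEEDS = [
        "http://example.com/feed.rss",
        "http://example.org/other.rss",
    ]

    def fetch_latency(url, timeout=5):
        """Time one fetch of a feed; return None if it fails."""
        start = time.time()
        try:
            urllib.request.urlopen(url, timeout=timeout).read()
        except OSError:
            return None
        return time.time() - start

    def claim_feeds(my_latencies, peer_latencies):
        """Claim only the feeds this server reaches at least as fast as its peers."""
        claimed = []
        for url, mine in my_latencies.items():
            others = [peer[url] for peer in peer_latencies if peer.get(url) is not None]
            if mine is not None and all(mine <= other for other in others):
                claimed.append(url)
        return claimed

    my_latencies = {url: fetch_latency(url) for url in FEEDS}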

IRC: The challenges for IRC access are similar to instant messaging, but the solution is quite a bit easier. If I can get a command-line IRC client configured such that it isn't a pain to use on a day-to-day basis, then I can just run that on a server and be done with it.

Bookmarks: I know there are services that sync your bookmarks across multiple computers, but I'd need one with support for multiple browsers (at a minimum, Firefox, Safari, and Chrome). Alternatively, having a webpage somewhere with a list of my bookmarks would really be just as good. Something like this probably exists; further research is required.
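(Worst case, I could fake the webpage option with a few lines of Python that turn a plain text list of links into an HTML page, and dump that on a server somewhere. The "url title" file format here is just something I made up for the sketch; getting the actual browser bookmarks into it is the part that would take real work.)

    # Turn a text file of "url title" lines into a bare-bones bookmarks page.
    # The input format is made up; exporting from each browser is left as an exercise.
    import html

    def bookmarks_page(path):
        items = []
        with open(path) as f:
            for line in f:
                url, _, title = line.strip().partition(" ")
                if url:
                    items.append('<li><a href="%s">%s</a></li>'
                                 % (html.escape(url, quote=True), html.escape(title or url)))
        return "<html><body><ul>\n%s\n</ul></body></html>" % "\n".join(items)

    with open("bookmarks.html", "w") as out:
        out.write(bookmarks_page("bookmarks.txt"))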

Friday, May 15, 2009

Power-aware scheduling in TFlex

Whenever people ask me what I'm doing for my honors thesis, I find myself in a bit of a bind, because explaining it properly would take entirely too long for casual conversation. I thought I'd do a post on what I'm actually doing, so that when I say I'm working on my research project this summer, y'all will have some idea what I'm talking about.

Unfortunately, actually explaining things properly requires a lot of background explanation.

Background: current architectures

When I say computer architecture, I'm not talking about buildings, but rather the internal structure of the processor, and the way it executes programs. If you take any current computer architecture, and handwave away a whole lot of implementation details, you'll almost always end up with about the same thing. Current architectures are all based on an extremely simple model: the processor reads an instruction from memory, does what the instruction says, and then moves on to the next instruction. Instructions are the basic units of computation - some commonly used instructions will perform various arithmetic operations, or read data from memory, or have the processor look elsewhere for the next instruction.
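If you wrote that basic model down as code, it would look something like this. It's a toy, obviously - real hardware does none of it this literally - but it's the mental model everything else gets layered on top of:

    # The basic model: fetch an instruction, do what it says, move on to the next.
    def run(program, memory):
        pc = 0                       # program counter: which instruction is next
        registers = [0] * 16
        while pc < len(program):
            op, args = program[pc]   # read the instruction
            if op == "add":          # arithmetic
                dest, a, b = args
                registers[dest] = registers[a] + registers[b]
            elif op == "load":       # read data from memory
                dest, addr = args
                registers[dest] = memory[addr]
            elif op == "jump":       # look elsewhere for the next instruction
                pc = args[0]
                continue
            pc += 1                  # then move on to the next instruction
        return registers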

Well, it's not quite that simple. Modern processors can do all sorts of things to make the process faster, like reading the instructions from high-speed caches instead of main memory, or analyzing the dependencies between instructions so that they can execute more than one at a time. And instructions aren't that simple either; some can execute in a single cycle (where a cycle is a fraction of a nanosecond), while others can take hundreds of cycles. With very few exceptions, though, computers maintain the illusion that they are executing instructions one at a time from memory.

Maintaining this illusion makes things really convenient for programmers, but it also means that computers aren't as fast as they could be. Moore's Law has given chip designers more transistors than they know what to do with. You could easily fit dozens or hundreds of simple execution units on a processor, but because the processor needs to pretend that it's executing one instruction at a time, it's very difficult for it to use more than four or so execution units at a time.

The problem doesn't lie in the current generation of processors, but in the current generation of architectures. To make computers significantly faster than they are now, we're going to need new architectures, explicitly designed for high levels of parallelism. Which brings me to my next topic...

The TRIPS architecture

TRIPS is an experimental architecture that was created here at the University of Texas. The most important feature of the TRIPS architecture is that it operates on blocks of instructions, rather than individual instructions. Memory access and control flow, both of which are traditionally foes of high-performance computing, are handled at the block level, rather than at each instruction. (Memory access is a problem because the amount of time it takes is highly unpredictable; control flow is a problem because the uncertainty about the next instruction to execute stops the processor for a short time.) Not handling these on a per-instruction basis also simplifies the individual execution cores.

Within each block, instructions are executed in dataflow order. This means that each instruction within a block runs as soon as its operands are ready for it, rather than having to wait for its turn in the instruction stream. This design implies that a block runs as fast as it's theoretically possible for it to run, unless it's constrained by the actual dimensions of the processor grid. TRIPS uses an 8x4 grid of processors, for 32 cores total.
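The difference is easier to see in a toy model. A conventional processor marches through the instructions in order; within a TRIPS block, anything whose inputs are available can fire. This sketch is just to illustrate the idea of dataflow order - it has nothing to do with how the hardware actually does it:

    # Toy model of dataflow execution within a block: keep firing whichever
    # instruction has all of its operands ready, until none are left.
    def run_block(instructions, values):
        """instructions: list of (output_name, function, input_names).
        values: operands that are already available, mapped to their values."""
        pending = list(instructions)
        while pending:
            for instr in pending:
                out, fn, inputs = instr
                if all(name in values for name in inputs):   # operands ready?
                    values[out] = fn(*(values[name] for name in inputs))
                    pending.remove(instr)
                    break
            else:
                raise RuntimeError("some operands never became ready")
        return values

    # Example: c = a + b fires first; d = c * 2 has to wait until c exists.
    run_block([("c", lambda a, b: a + b, ("a", "b")),
               ("d", lambda c: c * 2, ("c",))],
              {"a": 1, "b": 2})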

(As an aside: this isn't just idle theorizing about abstract issues in computer architecture. They actually built a working prototype of TRIPS a few years back.)

This is all really neat stuff, and it elegantly solves a lot of the scaling issues that current processors face. But wait, it gets better!

The TFlex architecture

Parallelism is nice, but the sad fact is that some programs simply aren't parallelizable. Even if you were to try to run them on a processor like TRIPS, they would end up only using one core at a time. Wouldn't it be nice if you could put a program like that on one core, and use the other 31 cores for other stuff at the same time?

The TFlex architecture, a derivative of TRIPS, does exactly that. It allows you to divide the execution cores into different-sized tiles, and run totally different programs on each tile. So, for example, you could run two programs on two 4x2 tiles, and a third program on the remaining 4x4 block of space.
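Here's that example as a picture (in code, because apparently that's how I draw pictures now). Each letter marks which program owns that core on the 8x4 grid; this is just a diagram, not anything the scheduler actually computes:

    # The 8x4 grid of cores, carved into two 4x2 tiles (programs A and B)
    # and one 4x4 tile (program C). Just a diagram of the example above.
    GRID = [
        "AABB",
        "AABB",
        "AABB",
        "AABB",
        "CCCC",
        "CCCC",
        "CCCC",
        "CCCC",
    ]

    def tile_size(program):
        return sum(row.count(program) for row in GRID)

    assert tile_size("A") == 8 and tile_size("B") == 8 and tile_size("C") == 16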

CPU scheduling is the process of allocating time on processors to programs on a computer. Generally, on computers like yours or mine, there are far fewer processors than there are programs running, so CPU scheduling involves switching between different programs quickly enough that you don't notice, while giving each one a fair share of CPU time. On TFlex, we have the luxury of approaching the problem a bit differently: since we can decide on the fly how many processors we want to have, and what sizes they should be, we can make the simplifying assumption that every program is running simultaneously. Context switches are fun (unless you're working on a project for OS, and it's due the next day, and the switch to kernel space is just a little bit off and the whole thing crashes when you run it... well, that's another story entirely), but for an experimental system they're not really necessary.

Every program will have slightly different performance characteristics on such a machine - the speed of a program doesn't vary linearly with the size of the tile it's running on, and the relationship between performance and tile size varies tremendously from one program to the next. The reconfigurable nature of TFlex, then, adds a whole new dimension to the problem of CPU scheduling.

And that's it for background stuff.

My research project (finally)

My research is focused on power-aware CPU scheduling on the TFlex architecture. System performance isn't the only thing you might want to optimize for with scheduling - on TFlex, you can also schedule running programs to minimize power consumption, since the power consumption of a program varies with tile size. Currently, I'm working on figuring out the relationship between performance and power consumption. If they're strongly correlated, then it makes my life a lot easier, because optimizing for one will also optimize for the other. On the other hand, if there are significant tradeoffs between the two, the problem may turn out to be more interesting.

Once you end up considering both execution speed and power consumption, there are other interesting questions you can ask. For example: "I want to run these programs, within this power budget. How quickly can they run without going over?" On a power-constrained device, running off a battery, this could be a very useful thing to know. My main goal for this project is to design a CPU scheduler that addresses this problem, and others like it.
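Just to pin the question down, here's the shape of the problem as I currently think of it: given a table of how fast and how power-hungry each program is at each tile size, pick tile sizes that stay under the power budget and fit on the chip. The profile numbers would have to come from measurement, and a real scheduler wouldn't brute-force it - this is only to make the problem statement concrete:

    # Sketch of the power-budget question: choose a tile size for each program so
    # that total power stays under the budget, the tiles fit on the chip, and the
    # slowest program finishes as early as possible. Brute force, illustration only.
    import itertools

    def schedule(profiles, power_budget, total_cores=32):
        """profiles: one dict per program, mapping tile size (in cores)
        to a (runtime, power) pair taken from measurements."""
        best = None
        for choice in itertools.product(*(p.items() for p in profiles)):
            cores = sum(size for size, _ in choice)
            power = sum(watts for _, (_, watts) in choice)
            runtime = max(t for _, (t, _) in choice)
            if cores <= total_cores and power <= power_budget:
                if best is None or runtime < best[0]:
                    best = (runtime, [size for size, _ in choice])
        return best   # (overall runtime, tile size per program), or None if it can't be done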

Saturday, May 9, 2009

Cop-out post

It is easy to forget while programming, but there is value inherent in simplicity.

(Hit a mental block on the post I had planned for yesterday. Real update next week, I swear >_>)