Journal Club: Speed Kills
I’m on my fourth semester of CS now, so I’m taking an introductory class on operating systems. (Introductory as in “actually,” not the bizarre practice of calling grad-level courses “introductory.”) Today, we got to some concrete instances of scheduling algorithms: given a laborer who’s told to do a bunch of jobs as they come in, determine the best ordering of jobs. (Best is according to some metric.)
The first one we discussed is “in order of arrival.” It’s not interesting. The second one we discussed was “shortest time to completion.” Offline, it isn’t too hard to implement– just use a binary heap or some other kind of priority queue. It’s also a rare instance of a (provably) optimal greedy algorithm optimizing for highest average throughput. That isn’t a bad metric for scheduling, so you’d think the machine you’re reading this on might use it.
However, in an online scheduling problem, like picking which process for your CPU to do (which motivates all this discussion), it’s provably impossible to implement. This is because it’s about as hard as the halting problem: how are you supposed to know how long a program runs for when you haven’t ran it yet?
This hasn’t stopped people from trying. You can approximate it with statistics on previous program runtimes, or make estimates based on static analysis, or some combinations of those two or other things. Your machine incorporates those ideas when it’s scheduling right now.
Today, I’ll be trying to understand “Speed Is as Powerful as Clairvoyance,” which implies that it’s possible to get provably pretty close to that optimal “shortest time to completion” offline scheduling algorithm in real life. It’s written by Dr. Pruhs and Dr. Kalyanasundaram of Pitt’s CS department. I’ve taken two of Dr. Pruhs’s classes now, and he’s tied for the best lecturer I’ve ever had, so I kinda feel like I owe it to him to try and understand his most-cited paper.
As an unrelated aside before I get into the paper, there’s a stronger version of the result of the halting problem’s undecidability, called Rice’s Theorem. It says that all nontrivial semantic properties of an arbitrary program in a Turing-complete model of computation are undecidable. NASA, for example, gets around it by requiring that all code has to be proven to terminate on all inputs at compile time (so they can statically analyze the heck out of it)– their model of computation isn’t Turing complete. Move fast and break things doesn’t fly, both figuratively and literally.
As another unrelated aside, I wrote and deployed this post without touching a browser. I think that’s tight.
Speed Is as Powerful as Clairvoyance
“Clairvoyance” is being able to see into the future. Scheduling algorithms like the one in your operating system unfortunately can’t see into the future. A hypothetical optimal scheduling algorithm is clairvoyant, so it can see into the future.
Scheduling algorithms’ analysis is generally done by “competitive analysis” and their “competitive ratio,” which is the highest ratio of that algorithm’s cost to the optimal one’s for a given input. This type of analysis has apparently been criticized for some reasons I don’t have time to do a deep dive on.
This paper introduces a new type of analysis of online scheduling problems and their related algorithms based on the theory of approximation algorithms, called “resource augmentation analysis,” where you give the processor more resources (like a faster clock speed) than whatever the “bad guy” (that’s making it do all this work) has.
Background aside: what’s “Approximation Algorithms?” What’s “Online?”
An “online” problem is one where you get the input over time, not all at once before the algorithm starts.
Some problems are hard to solve with algorithms. Computer scientists suspect there can’t exist an algorithm to always solve a “hard” problem “quickly” (which is defined as running in a number of steps that’s bounded by some polynomial function of the number of cities in the problem instance). An example of a hard problem to solve is the Traveling Salesman Problem, which is the problem of finding the shortest path to visit a bunch of cities while only visiting each city once.
An approximation algorithm is a fast algorithm to provably “always get close” to the optimal solution. An example of an approximation algorithm is the Christofides 3/2 algorithm for that same problem, which always gets within 50% of the optimal solution in time proportional to the cube of the input size (that’s a polynomial), subject to reasonable constraints about the kind of space these cities are arranged in. (“Reasonable” as in “you can understand what they imply if you took a semester of analysis,” which I haven’t, but I can tell you that the cities in America, or any other kind of country, are arranged “reasonably.”)
Back to resource augmentation analysis
The core argument (with real proofs to support it!) is that adding a little bit more resources, like CPU speed in the scheduling problem, allow some algorithms (a couple are given in the paper) to become arbitrarily close to the impossible-to-implement optimal ones.
That is, resource augmentation analysis is cool because it shows that you can let a processor bound the negative effects of its inability to see into the future by making, say, its clock speed a constant factor faster. That’s all you need to do. I’m going to repeat myself again. Making the processor a little faster is effectively letting it make its decisions as if it could see into the future.
All this is proven in the paper with facts and logic. Go read it.
Okay?
In my book review of Algorithmica (a few posts ago), I talked about how whiteboard analysis of algorithms can fail to capture real-world performance, which actually matters if you want to make money without writing grant applications.
I’ve observed it firsthand– I wrote a market data feed handler when I was learning C and data structures. It’s basically a wrapper over a collection of highly-performant data structures, including a symbol table (mapping integers to pointers). I stress-tested them with terabytes of input per hour, with memory usage measured in gigabytes (i.e. asymptotics should matter), and I still found that vectors beat binary search trees (self-balancing or not– data was normally distributed, as is that specific kind of market data): cache locality owns whiteboard analysis. (I also rolled a vector that beats the C++ STL’s. People need to stop putting the STL on a pedestal.)
This is an awesome result that has real-life implications. (Seriously– when you want to prove that Union-Find from the Sedgewick/Wayne book is fast, you have to bust out the Ackermann function?) You can, in fact, use this knowledge to write better programs, from web servers (caching is a very similar problem) to the actual CPU schedulers that motivated all this. Additionally, it serves to practically direct effort away from benchmarking heuristics towards just experimenting with BIOS settings.
Now, read the actual paper! It’s good for you.