People are still talking about framerates and resolutions, and which of the new-gen games support 900p and which ones have successfully made the jump to 1080p. The idea is that games with a higher framerate are “more responsive”. That’s true, although if you’re really worried about having a game feel responsive, framerate might be the least of your worries.
Game development has become radically complex. So complex that no single human being knows how the entire system works. A modern gaming setup is the work of dozens of specialists, each with their own domain: the controller, the console itself, the operating system, the game engine, the rendering engine, the rendering hardware, and the monitor. Each of those systems is designed to be as “black box” as possible. The idea is that if you’re working on the rendering engine, you want to be able to focus on that one job and not sweat the details of what the operating system or the monitor is doing.
In the old days — and I mean the really old days — the whole system was much simpler. If you used an Atari 2600 controller, then you had a nearly direct line to the processor. If you tore one apart, you would see that each input was a simple circuit. When you pushed left, the physical pressure on the stick completed an electrical connection that flowed right into the device, and the game programmer wrote machine code to test whether that circuit was open or closed. In programmer talk, this is called coding “direct to the metal”.
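The real code was 6502 assembly, but here’s the idea sketched in C. (The SWCHA register and its bit layout are genuine 2600 details; the surrounding function is just for illustration.)

```c
#include <stdint.h>

extern void move_player_left(void);  /* hypothetical game routine */

/* On the 2600, the joysticks show up as a memory-mapped register:
   SWCHA, at address 0x0280 on the RIOT chip. There's no driver and
   no OS; the game reads the byte directly. The bits are active-low,
   so 0 means "pressed". */
#define SWCHA    (*(volatile uint8_t *)0x0280)
#define P0_LEFT  0x40   /* player 0, stick pushed left */

void poll_player_input(void) {
    if ((SWCHA & P0_LEFT) == 0) {
        /* the circuit is closed: the player is pushing left */
        move_player_left();
    }
}
```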
But as the industry grew and the complexity of our gaming devices increased, we needed more layers of abstraction between the disparate systems. Each layer is another task that needs to be performed between the moment you push a button and the instant you see the results onscreen. Let’s look at the journey:
Let’s assume you’re using some sort of wireless setup, since that’s probably the most common. You waggle the Wiimote, push the Xbox A button, or smack your thumb on the spacebar to create some input. The device encodes this new state into a packet and broadcasts it according to whatever protocol it’s using. (Which is very likely Bluetooth.) The wireless receiver catches the packet, unpacks it, figures out what device it’s from, and hands the data off to the device driver.
The device driver takes the data and turns it back into some sort of useful input, figuring out which buttons are down, which ones are up, and so on. Then it hands that off to the operating system. The OS needs to have a look at it just in case you’ve done something that requires its attention. For example, if you’ve hit the Xbox guide button (the big green glowing X in the middle) or the Windows key, then it needs to take some sort of drastic action. Assuming none of that is going on, the OS will pass along the input data to the game.
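If you want to picture what those two layers are doing, here’s a rough sketch in C. Everything here is invented and simplified down to a couple of buttons; real drivers speak Bluetooth HID and are far messier.

```c
#include <stdint.h>
#include <stdbool.h>

extern void open_system_menu(void);  /* hypothetical OS call */

/* Hypothetical layout of a controller's radio packet. The real
   protocol is more involved, but the job is the same: turn raw
   bytes back into "which buttons are down". */
typedef struct {
    uint8_t device_id;
    uint8_t buttons;    /* one bit per button */
} ControllerPacket;

#define BTN_A     0x01
#define BTN_GUIDE 0x80  /* the glowing button the OS cares about */

/* Driver layer: decode the packet, hand the state upward. */
uint8_t driver_decode(const ControllerPacket *pkt) {
    return pkt->buttons;
}

/* OS layer: peek at the input before the game ever sees it. */
bool os_filter_input(uint8_t buttons) {
    if (buttons & BTN_GUIDE) {
        open_system_menu();  /* the OS hijacks this press */
        return false;        /* the game never hears about it */
    }
    return true;             /* pass the input along to the game */
}
```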
The game itself very likely has another layer of abstraction inside of it, and perhaps even more than one. Engines like Unreal or Unity tend to have an indirect way of asking for input that lets game developers use (roughly) the same code for Xbox, PlayStation, PC, and so on. So these button inputs get translated into some intermediate stage that maps “A button” to “jump”, or whatever. At this point the game engine is finally aware of your action and is able to respond. Congratulations! We’re (conceptually) halfway there!
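A minimal sketch of what that abstraction layer might look like, with invented names (this isn’t Unity’s or Unreal’s actual API):

```c
#include <stdint.h>
#include <stdbool.h>

#define BTN_A       0x01   /* invented bit assignments */
#define BTN_TRIGGER 0x02

/* The gameplay code asks "did the player JUMP?" rather than
   "is the A button down?", so the same code runs everywhere. */
typedef enum { ACTION_JUMP, ACTION_FIRE, ACTION_COUNT } GameAction;

static const uint8_t action_to_button[ACTION_COUNT] = {
    [ACTION_JUMP] = BTN_A,        /* remapped per platform or per player */
    [ACTION_FIRE] = BTN_TRIGGER,
};

bool action_pressed(uint8_t buttons, GameAction action) {
    return (buttons & action_to_button[action]) != 0;
}
```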
There might be a bit of delay inherent to the game itself. When you hit the jump button in a modern platformer like Uncharted or Tomb Raider, the game sort of makes a note to begin the jumping animation the next time the run cycle reaches a point where it can blend from the “run” animation to the “leap” animation. Other actions (such as shooting) ought to be more direct, although I wouldn’t be surprised to learn that there’s a tiny bit of animation fudging when you pull the trigger, too.
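Here’s a toy version of that “make a note of it” behavior. The blend window numbers and function names are pure invention; real animation systems are much more sophisticated:

```c
#include <stdbool.h>

extern void start_jump_blend(void);  /* hypothetical animation call */

#define BLEND_START 0.45f   /* assumed: mid-stride blends look okay */
#define BLEND_END   0.55f

typedef struct {
    bool  jump_requested;   /* the "note" the game makes */
    float run_cycle_pos;    /* 0.0 .. 1.0 through the run animation */
} PlayerAnim;

void anim_update(PlayerAnim *p, bool jump_pressed, float dt) {
    if (jump_pressed)
        p->jump_requested = true;   /* remember it; don't act yet */

    p->run_cycle_pos += dt;         /* advance the cycle (simplified) */
    if (p->run_cycle_pos > 1.0f)
        p->run_cycle_pos -= 1.0f;

    /* Only start the jump when the run cycle is somewhere we can
       blend from. That wait is a real (if tiny) slice of latency. */
    if (p->jump_requested &&
        p->run_cycle_pos >= BLEND_START &&
        p->run_cycle_pos <= BLEND_END) {
        start_jump_blend();
        p->jump_requested = false;  /* the buffered input is consumed */
    }
}
```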
In any case, once your action has been registered and the state of the game changes, then it’s finally time to draw the result. We have to wait for the next time the rendering engine begins a new frame, because changing the state of rendered objects during rendering is usually a good way to cause a crash.
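This is why nearly every game loop has the same shape: gather input, update the world, then render. A new input that arrives mid-frame just waits for the next pass through the loop. A bare-bones sketch:

```c
#include <stdbool.h>

extern bool game_running;       /* hypothetical engine globals */
extern void poll_input(void);
extern void update_world(void);
extern void render_frame(void);

int main(void) {
    while (game_running) {
        poll_input();     /* latest decoded controller state */
        update_world();   /* the ONLY place game state may change */
        render_frame();   /* reads state, writes pixels, mutates nothing */
    }
    return 0;
}
```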
The game renders the polygons to form the final image. It’s entirely possible that this one step is as complex as everything else in this article combined, but it’s probably not worth going over in detail. I think nearly everyone understands that 3D rendering is messy business and we don’t need to belabor this part. Let’s just say that somehow, we end up with a fancy 900p or 1080p image to show the player.
Done, right? No. Not even close.
If this game has a lot of special effects (like motion blur, depth of field, or really aggressive anti-aliasing), then we have a lot more processing to do on this image. More importantly, this brand-new frame might sit in a queue, waiting two or three frames before it actually gets sent to the monitor.
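Some back-of-the-envelope math on that queue: if the driver buffers a few finished frames (triple buffering is a common setup; the depth here is an assumption), each one adds a full frame-time of latency:

```c
#include <stdio.h>

int main(void) {
    int    queue_depth = 3;              /* assumed: triple buffering */
    double frame_ms    = 1000.0 / 60.0;  /* ~16.7 ms per frame at 60fps */

    /* ~50 ms of waiting before the image even heads to the TV */
    printf("queue adds about %.1f ms of latency\n",
           queue_depth * frame_ms);
    return 0;
}
```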
Once the image is sent out, the monitor has to do some last-minute processing. It needs to turn that HDMI or DVI signal into a meaningful pattern of red / green / blue pixels to make the final image show up. There might be some further finessing if you’re running a game at a non-native resolution or doing some sort of interpolation. (Technically a videogame doesn’t need interpolation, since it’s primarily designed for smoothing out movies, but your TV can’t tell the difference between games and movies. So there might be one more frame of delay for the sake of making frame-to-frame transitions smoother.)
We’re still not done.
As it turns out, the last step can be the most expensive. Once the television is done screwing around and is ready to show you the new image, it can take a long time for the individual pixels of your LCD screen to actually change. Some pixels need to turn off, and others need to turn on, and this process is not instant. Pixels can take anywhere from 4 milliseconds (four one-thousandths of a second) to 20 milliseconds to change state. (This figure is based on tests performed by John Carmack some time ago, when people started worrying about this sort of thing.) That latter number should worry you. 20ms is 1/50th of a second, and at 60fps a new frame arrives every 16.7ms or so (1000 ÷ 60). If you’ve got a crappy television at the slow end of that range and you’re trying to play a game that runs at 60fps, then you’re not really getting the full experience, because the game is sending frames faster than the pixels on the screen can respond.
Now at last the player sees the result of their button-press.
It’s true that some of these layers are so fast they’re barely worth worrying about, but taken together it’s a long road from the moment your finger hits the button to when the result shows up on screen. These layers have been gradually accumulating over the years, adding more and more tasks in an effort to standardize and simplify devices that are becoming breathtakingly complex. (And I wouldn’t be surprised to learn there are more layers I don’t know about.) It wasn’t until recently that developers took a step back and began asking where all the time was going. It was actually John Carmack who got everyone’s attention when he demonstrated that (on some setups) your computer could send an internet packet across the Atlantic Ocean faster than it could change the pixels on your monitor.
It’s fine for consumers to push for better framerates, but I just want people to understand that if you’re really worried about a game being “responsive”, then the jump from 30fps to 60fps isn’t going to help if the other layers are clogged, slow, or inefficient. It’s also not something to worry about if you’re using a cheap television with a bad pixel response time. Don’t get caught up in marketing hype, and if you notice that a game with double the framerate doesn’t feel twice as responsive, it’s probably not your imagination.
Shamus Young is a programmer, critic, comic, and crank. You can read more of his work here.