Reflexes and Input Generation

Human Reaction Time

The impact of input lag is compounded by what goes on before we even react. As soon as an image requiring a response hits your eyes, it will take somewhere between 150ms and 300ms to translate that into action. Average human response time to a visual stimulus is about 200ms (0.2 seconds) for young adults, which is a long time compared to how quickly games can respond to input. With this built-in handicap, when a fast response to what's happening on screen is required, it helps to claim every advantage possible (especially for relative geezers like us).

Human response time is mitigated by the fact that we are also capable of learning, anticipation and extrapolation. In "practicing," a.k.a. playing a game, we can learn to predict future frames from the current state over very small time slices to compensate for our response time. Our previous responses to input and the results that followed also factor into our future responses. This is part of the learning curve, especially for FPS games. When input lag is below a reasonable threshold, we are able to compensate without issue (and, in fact, do not perceive the input lag at all).

The larger input lag gets, the harder it becomes to do something like aim at a moving target, because the effect we expect our input to have no longer matches what we see. This gets into something that combines reaction time and proprioception (the reception of self-produced stimuli). I'm not a psychologist, but I would love to see some studies on how much input lag people can compensate for, where it starts to be uncomfortable (where it just "feels" wrong) and when it becomes an obviously visible phenomenon. In digging around the net, I've seen a few game developers conjecture that the threshold is about 100 milliseconds, but I haven't found any actual data on the subject. At the same time, 100 milliseconds (or maybe something like half of reaction time?) seems a pretty reasonable hypothesis to me.

The Input Pipeline

As it is key in most games, we'll examine the case of the mouse when it comes to input. As soon as a mouse is moved, we have a delay: the mouse must first detect this movement. Sorting out how responsive a mouse is these days is incredibly clouded by horrendous terminology. As understanding how a mouse works is important in grokking its impact on input lag, we'll dissect Logitech's specs and try to get some good information on exactly what's going on.

There are four key numbers in the reported specifications of Logitech mice we'll look at: megapixels/second, maximum speed, DPI, and reports/second. For the Logitech G9x high-end gaming mouse, these are 9 MP/s, 150 inches/second, 5000 DPI, and 1000 reports/second. Other gaming and good-quality mice can do 500 to 1000 reports/second and have lower DPI and MP/s stats.

The first stat, megapixels/second, is important because it determines how fast the mouse sensor itself can collect movement data. Optical and laser mice detect movement by taking pictures of the surface they are on and comparing successive images many times every second. To really understand how fast the mouse takes pictures (and thus how fast it can detect and calculate movement in units called "counts"), we would need to know how many pixels each frame contains. Our guess is that a frame can't be larger than 17x17 based on the maximum speed rating (though it might be more like 12x12 if the sensor needs to generate two new frames for every count rather than reusing frames from the previous calculation). It'd be great if Logitech listed this data anywhere, but we are left guessing based on other stats at this point.
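The frame-size guess is simple division: pixel throughput divided by frames per second gives pixels per frame, and the square root of that gives the largest square frame. The sketch below assumes square frames, assumes the sensor spends its whole 9 MP/s budget on capture, and uses the roughly 30,000 counts/second that follows from the 150 inches/second at 200 CPI rating as the frame rate; none of this is confirmed by Logitech.

```python
import math

# Figures from the G9x discussion: the 9 MP/s sensor rating, and the
# ~30,000 counts/second derived from 150 in/s at 200 CPI (an estimate).
SENSOR_PIXELS_PER_SECOND = 9_000_000
COUNTS_PER_SECOND = 30_000


def max_frame_side(pixels_per_second: int, frames_per_second: int) -> int:
    """Largest square frame side (in pixels) that fits the sensor's pixel budget.

    Assumes every frame is square and the sensor spends its whole pixel
    throughput on capturing frames at the given rate.
    """
    pixels_per_frame = pixels_per_second // frames_per_second
    return math.isqrt(pixels_per_frame)


# Assumption 1: one new frame per count (each frame reused for the next comparison).
print(max_frame_side(SENSOR_PIXELS_PER_SECOND, COUNTS_PER_SECOND))      # 17
# Assumption 2: two fresh frames per count.
print(max_frame_side(SENSOR_PIXELS_PER_SECOND, 2 * COUNTS_PER_SECOND))  # 12
```

With 300 pixels per frame the largest square is 17x17; halving the per-frame budget to 150 pixels gives 12x12.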

Next up is DPI, or dots per inch: 5000 for the G9x. DPI is something of a misrepresentation, as the real specification should be in CPI (counts per inch). As it is, the number can be considered the maximum DPI if each count moves the cursor one pixel (or dot). Under MS Windows, with no ballistics applied at the default pointer speed, DPI = CPI. Decreasing pointer speed means moving one dot for more than one count, and increasing pointer speed means moving more than one dot for every count. Of course, with ballistics, talking about DPI as related to the mouse doesn't make any sense: moving the mouse faster or slower dynamically changes the number of dots moved per count. Because of this, we'll talk about CPI for accuracy's sake, and consider that mouse manufacturers intend to use the terms interchangeably (despite the fact that they are not).

CPI is the number of steps the mouse can count within one inch, so 1/CPI inches is the smallest distance the mouse is able to measure as a movement. The full benefit of a high definition mouse is realized when one count is less than or equal to one "dot," which is possible in games (with sensitivity sliders) and in Windows if you decrease your mouse speed (though going to something with an odd cadence could cause problems).
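To make the count/dot relationship concrete, here is a small sketch. `dots_per_inch_of_travel` is a hypothetical parameter standing in for whatever a pointer-speed setting or in-game sensitivity slider works out to (with no ballistics); it is not a real Windows or driver setting name.

```python
def min_step_inches(cpi: int) -> float:
    """Smallest movement the mouse can register: one count is 1/CPI inches."""
    return 1 / cpi


def counts_per_dot(cpi: int, dots_per_inch_of_travel: float) -> float:
    """Counts consumed per cursor dot.

    dots_per_inch_of_travel is hypothetical: how many dots the cursor moves per
    inch of mouse travel once pointer speed or sensitivity is applied (no ballistics).
    A result of 1.0 or more means every dot is backed by at least one count.
    """
    return cpi / dots_per_inch_of_travel


print(min_step_inches(5000))       # 0.0002
print(min_step_inches(200))        # 0.005
print(counts_per_dot(5000, 1000))  # 5.0: several counts per dot, full benefit
print(counts_per_dot(200, 1000))   # 0.2: each count jumps the cursor 5 dots
```

The same on-screen speed (1000 dots per inch of travel) is reachable at either CPI; only the high-CPI case keeps one count at or below one dot.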

Thus, when you tell your 5000 "DPI" mouse to run at 200 "DPI", it would be nice if it still reported at 5000 CPI and let the driver handle scaling the data down (or performing ballistics on raw data). In this example, we would only move the cursor one dot (one unit on the screen) every 25 counts. But the easy way out is to maintain a 1:1 ratio of counts to dots and drop the actual counts per inch down to 200. This provides no accuracy advantage (though with a fixed sensor speed it does increase maximum velocity and acceleration tolerance). And again, it would be helpful if mouse makers would actually tell us what they are doing.
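The "nice" approach (keep the sensor at full CPI and scale in software) can be sketched as an accumulator that banks leftover counts until they add up to a whole dot. `CountScaler` is a hypothetical illustration, not how any particular Logitech or Windows driver actually works, and it ignores ballistics entirely.

```python
class CountScaler:
    """Hypothetical driver-side scaler (not any vendor's actual driver).

    The sensor keeps reporting at native_cpi; the driver converts raw counts into
    cursor dots at target_dpi and carries the leftover fraction forward, so slow
    movements below one dot are accumulated rather than dropped.
    """

    def __init__(self, native_cpi: int, target_dpi: int) -> None:
        self.native_cpi = native_cpi
        self.target_dpi = target_dpi
        # Leftover movement, stored as counts * target_dpi to stay in integers.
        self._remainder = 0

    def feed(self, counts: int) -> int:
        """Consume raw counts from one report and return whole dots to move."""
        total = self._remainder + counts * self.target_dpi
        # Truncate toward zero so left and right movement are treated symmetrically.
        sign = -1 if total < 0 else 1
        dots = sign * (abs(total) // self.native_cpi)
        self._remainder = total - dots * self.native_cpi
        return dots


scaler = CountScaler(native_cpi=5000, target_dpi=200)  # 25 counts per dot
print(scaler.feed(10))   # 0  (10 counts banked)
print(scaler.feed(15))   # 1  (25 counts reached)
print(scaler.feed(50))   # 2
print(scaler.feed(-25))  # -1
```

Because the remainder is carried forward, nothing the sensor measured is thrown away, which is exactly what simply dropping the sensor to 200 CPI gives up.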

Since the Logitech G9x can do 150 inches/second maximum movement speed at 200 CPI, we know how many counts it must generate per second. (Logitech doesn't make it clear that the maximum speed and acceleration can only be reached at the lowest CPI, but that is the only reading that makes sense with the math.) The reported specifications indicate that the G9x can do about 30000 counts per second (150 inches in one second at 200 counts per inch). This is consistent with the 9 megapixel/second rating, since such a sensor could collect about 30000 17x17 frames every second.
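If the count rate really is fixed by the sensor (an inference from the spec sheet, not something Logitech states), the maximum trackable speed scales inversely with the selected CPI. This sketch simply divides the inferred 30,000 counts/second by CPI to show why the 150 inches/second figure only makes sense at 200 CPI.

```python
# Inferred, not published by Logitech: the sensor's count rate is fixed,
# so maximum tracking speed depends on the selected CPI.
SENSOR_COUNTS_PER_SECOND = 150 * 200  # 150 in/s at 200 CPI = 30,000 counts/s


def max_tracking_speed_ips(counts_per_second: int, cpi: int) -> float:
    """Fastest movement (inches/second) the sensor can follow at a given CPI."""
    return counts_per_second / cpi


for cpi in (200, 1000, 5000):
    print(cpi, max_tracking_speed_ips(SENSOR_COUNTS_PER_SECOND, cpi))
# 200 150.0
# 1000 30.0
# 5000 6.0
```

Under this assumption, running at the full 5000 CPI would cap tracking at about 6 inches/second.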

After looking at all that, we can say that our Logitech G9x mouse is capable of detecting movement of between 1/5000th and 1/200th of an inch (depending on the selected CPI) about every 33.3 microseconds (thousandths of a millisecond) after the movement happens. That's pretty freaking fast. Other mice can be much slower, but even cutting the speed in half won't hugely affect latency (though it will affect the maximum speed at which the mouse can be moved without problem).
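The 33.3 microsecond figure is just the reciprocal of the estimated count rate. Comparing it with a 1 ms USB report interval (1000 reports/second) shows why even a sensor half as fast adds little latency; the 15,000 counts/second case is a hypothetical slower sensor, not a specific product.

```python
def count_interval_us(counts_per_second: float) -> float:
    """Time between successive counts, in microseconds."""
    return 1_000_000 / counts_per_second


USB_REPORT_INTERVAL_US = 1000  # 1000 reports/second

for rate in (30_000, 15_000):  # G9x estimate, and a hypothetical sensor half as fast
    interval = count_interval_us(rate)
    share = interval / USB_REPORT_INTERVAL_US
    print(f"{rate} counts/s: {interval:.1f} us per count ({share:.1%} of a 1 ms report)")
# 30000 counts/s: 33.3 us per count (3.3% of a 1 ms report)
# 15000 counts/s: 66.7 us per count (6.7% of a 1 ms report)
```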

Once the mouse has generated a count (or several) we need to send that data to the computer over USB. Counts are aggregated into groups called reports. USB is limited to 1000 Hz polling, so the 1000 reports/second maximum of the G9x makes sense: USB limits the transmission rate here. For those interested, to actually achieve 150 inches per second at 200 CPI, the mouse would need to be able to send about 30 counts per report at 1000 reports per second. This seems reasonable, but it'd be great if someone with USB engineering experience could give us some feedback and let us know for sure.
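The counts-per-report estimate divides the count rate by the report rate. As a rough sanity check (not a statement about the G9x's actual report format), the sketch also tests whether that many counts along one axis would fit a signed 8-bit field with the -127 to 127 range used by the USB HID boot-protocol mouse report; gaming mice commonly use their own report descriptors, which may carry wider fields.

```python
SENSOR_COUNTS_PER_SECOND = 30_000  # G9x estimate at 150 in/s and 200 CPI


def counts_per_report(counts_per_second: float, reports_per_second: float) -> float:
    """Average counts packed into each USB report at full speed, along one axis."""
    return counts_per_second / reports_per_second


def fits_in_int8(value: float) -> bool:
    """Would a per-axis delta fit a signed 8-bit field (-127..127, as in the
    USB HID boot-protocol mouse report)?"""
    return -127 <= value <= 127


for hz in (1000, 500, 250, 125):
    n = counts_per_report(SENSOR_COUNTS_PER_SECOND, hz)
    print(hz, n, fits_in_int8(n))
# 1000 30.0 True
# 500 60.0 True
# 250 120.0 True
# 125 240.0 False
```

At 1000 reports/second, 30 counts per report leaves plenty of headroom even in the smallest common field size.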

So, let's say that we move our mouse a couple dozen microseconds before a report is sent. In this case, we actually have to wait the whole millisecond for that data to be sent to the PC (because the count can't be generated fast enough to be included in the current report). So despite the very fast sensor in the mouse, we are transmission bound, and our first "large" delay is on the order of single-digit milliseconds. Other mice (like the Logitech G5 I'm using right now) may generate 500 reports per second, while the slowest speed we can expect is 125 reports/second. This can mean a 1ms to 8ms delay in input getting from the motion of the mouse to the computer.
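The "just missed the report" scenario can be modeled in a few lines. This is a simplified sketch assuming reports leave at exact multiples of the polling period and a movement only rides along once its count exists (about 33 microseconds for the G9x estimate); real USB scheduling has more jitter than this.

```python
import math

COUNT_TIME_MS = 1000 / 30_000  # ~0.033 ms for the sensor to produce a count


def report_delay_ms(move_time_ms: float, poll_hz: int,
                    count_time_ms: float = COUNT_TIME_MS) -> float:
    """Delay from physical movement until the report carrying it is sent.

    Simplifying assumptions: reports leave at exact multiples of the polling
    period starting at t = 0, and a movement only makes it into a report once
    its count has been generated.
    """
    period_ms = 1000 / poll_hz
    ready_ms = move_time_ms + count_time_ms
    next_report_ms = math.ceil(ready_ms / period_ms) * period_ms
    return next_report_ms - move_time_ms


# Moving just before a report is due (within the count time) misses it,
# so the delay is roughly one full polling period.
for hz in (1000, 500, 250, 125):
    period = 1000 / hz
    worst = report_delay_ms(period - 0.001, hz)
    print(f"{hz:>4} Hz: up to ~{worst:.1f} ms")  # ~1.0, 2.0, 4.0 and 8.0 ms
```

A movement halfway between reports, by contrast, waits only about half a period (for example `report_delay_ms(0.5, 1000)` is 0.5 ms).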

Most gamers use halfway decent mice these days, so we can expect that latency is more like 2ms to 4ms for most wired USB mouse users and 1ms for gamers with higher end mice. This delay can't be cut down to anything less than 1ms until USB 2.0 is replaced by something faster. We'll ignore any cable (or any other wire) delay, as this will only add something on the order of nanoseconds to transmission time.

The input lag from a good mouse, on its own, is not perceivable to humans, but remember that this is all part of a larger picture. And now it's on to the software.

83 Comments

  • Kaihekoa - Saturday, July 18, 2009

    From the conclusion this point wasn't clear to me.
  • DerekWilson - Sunday, July 19, 2009

    at present triple buffering in DirectX == a 1 frame flip queue in all cases ...

    so ... it is best to disable triple buffering in DirectX if you are over refresh rate in performance (60FPS generally) ...

    and it is better to enable triple buffering in DirectX if you are under 60 FPS.
  • Squall Leonhart - Wednesday, March 30, 2011

    This is not always the case, actually. There are some DirectX engines, the Age of Empires 3 engine for example, that have hitching when moving around the map unless triple buffering is forced on the game.
  • billythefisherman - Saturday, July 18, 2009

    First of all I'd like to say well done on the article you're probably the first person outside of game industry developers to have looked at this rather complex topic and certainly the first to take into account the whole hardware pipeline as well.

    Sadly though there are some gaping holes in your analysis mainly focused around the CPU stage. Sadly your CPU isn't going to run any faster than your GPU (and actually the same is correct in reverse) as one is dependent on the other (the GPU is dependent on the CPU). As such the CPU may finish all of its tasks faster than the GPU but the CPU will have to wait for the GPU to finish rendering the last frame before it can start on the next frame of logic.

    No game team in the world developing for a console is going to triple buffer their GPU command list.

    I intentionally added 'developing for a console' as this is also an important factor I'd say around 75% (being very conservative) of mainstream PC games now are based on cross platform engines. As such developers will more than likely gear their engines to the consoles as these make up the largest market segment by far.

    The consoles all have very limited memory capacities in comparison to their computational power and so developers will more than likely try to save memory over computation, thus a double buffered command list is the norm. Some advanced console specific engines actually drop down to a single command buffer and use CPU - GPU synchronisation techniques because of CPUs being faster than GPUs. This kind of thing isn't going to happen on the PC because the GPU is invariably faster than the CPU.

    When porting a game to PC a developer is very unlikely to spend the money re-engineering the core pipeline because of the massive problems that can cause. This can be seen in most 'DirectX 10' games, as they simply add a few more post processing effects to soak up the extra power - you may call it lazy coding, I don't, it's just commercial reality these are businesses at the end of the day.

    So both your diagrams on the last page are wrong with regards to the CPU stage as they will be roughly the same amount of time as the GPU in the vast majority of frames because of frame locality ie one frame differs little to the next frame as the player tends not to jump around in space and so neighbouring frames take similar amounts of time to render.

    Onto my next complaint :
    "If our frametime is just longer than 16.67ms with vsync enabled, we will add a full additional frame of latency (with no work being done on the GPU) before we are able to swap the finished buffer to the front for scanout. The wasted work can cause our next frame not to come in before the next vsync, giving us up to two frames of latency (one because we wait to swap and one because of the delay in starting the next frame)."

    What are you talking about man!?! You don't drop down to 20fps (ie two more frames of latency) because you take 17ms to render your frame - you drop down to 30fps! With vsync enabled your graphics processor will be stalled until the next frame but that's all and you could possibly kick off your CPU to calculate the next frame to take advantage of that time. Not that that's going to make the slightest jot of difference if you're GPU bound because you have to wait for the GPU to finish with the command buffer it's rendering (as you don't know where in the command buffer the GPU is).

    As I've said on the consoles there are tricks you can do to synchronise the GPU with the CPU but you don't have that low level control of the GPU on the PC as Nvidia/ATI don't want the internals of their drivers exposed to one another.

    And as I've said not that you'd want to do such a thing on PC as the CPU is usually going to be slower than the GPU and cause the GPU to stall constantly hence the reason to double buffer the command buffer in the first place.

    I've also tried to explain in my posts to your triple buffering article why there's a lot of cobblers in the next few paragraphs.
  • DerekWilson - Sunday, July 19, 2009

    Fruit pies? ... anyway...

    Thanks for your feedback. On the first issue, the console development is one of growing importance as much as I would like for it not to be. At some point, though, I expect there will be an inflection point where it will just not be possible to build certain types of games for consoles that can be built on PCs ... and we'll have this before the next generation of consoles. Maybe it's a pipedream, but I'm hoping the development focus will shift back to the PC rather than continue to pull away (I don't think piracy is a real factor in profitability though I do believe publishers use the issue to take advantage of developers and consumers).

    And I get that with GPU as bottleneck you have that much time to use the CPU as well ... but you /could/ decouple CPU and GPU and gain performance or reduce lag. Currently, it may make sense that if we are GPU limited the CPU stage will effectively equal the GPU stage in latency -- and likewise that if we are CPU limited, the GPU state effectively equals the CPU stage (because of stalling) in input latency.

    Certainly it is a more complex topic than I illustrated, and if I didn't make that clear then I do apologize. I just wanted to get across the general idea rather than a "this is how it always is" kind of thing ... clearly Fallout 3 has even more input lag than any of my worst case scenarios account for even with 2 frames of image processing on the monitor ... I have no idea what they are doing ...

    ...

    As for the second issue -- you can get up to two frames of INPUT LAG with vsync enabled and 17ms GPU time.

    you will get up to these two frames (60Hz frames) of input lag at 30FPS ...

    I'm not talking about the frame rate dropping to 2 frames then 1 frame (20 FPS) ... I'm talking about the fact that, at best, your input is gathered 17ms before your frame completes on the GPU (1 frame of input lag) and (because it missed vsync) it will take another frame for that to hit the screen (for a total of two).
  • billythefisherman - Monday, July 20, 2009

    I have to re-iterate: well done on tackling this rather complex issue, I applaud you! (I just wish you hadn't whipped up your punters so much in the benefits of triple buffering!)
  • Gastra - Saturday, July 18, 2009

    For (quite a lot of, if you follow the links) information on what an optical mouse sees:
    http://hackedgadgets.com/2008/10/15/optical-mouse-...
  • DerekWilson - Sunday, July 19, 2009

    That's pretty cool stuff ... And it lines up pretty well with our guess at mouse sensor resolution for the G9x.

    It'd still be a lot nicer if we could get the specs straight from the manufacturer though ...
  • PrinceGaz - Friday, July 17, 2009

    "For input lag reduction in the general case, we recommend disabling vsync. For NVIDIA card owners running OpenGL games, forcing triple buffering in the driver will provide a better visual experience with no tearing and will always start rendering the same frame that would start rendering with vsync disabled."

    I'm going to ask this again I'm afraid :) Are you sure Derek? Does nVidia's triple-buffer OpenGL driver implementation do that, or is it just the same as what most people take triple-buffer rendering to be, that is having one additional back buffer to render to so as to provide a steady supply of frames when the framerate dips below the refresh rate? Have you got confirmation either from screenshots or something else (like nVidia saying that is how it works) that OpenGL triple-buffering is any different from Direct3D rendering, or how AMD handle it?

    Because if you don't, then all you are saying is that triple-buffering is a second back-buffer which is filled to prevent lags when the framerate falls below the refresh rate. Do you know for sure that nVidia OpenGL drivers render constantly when in triple-buffer mode or are you only assuming they do so?
  • PrinceGaz - Friday, July 17, 2009

    Just to add

    "For input lag reduction in the general case, we recommend disabling vsync"

    It is rather ironic that you used that phrase, when in your previous article you were strongly stating the case for v-sync always being used (preferably with triple-buffering).

    Unless you are certain that nVidia's OpenGL implementation of triple-buffering works how you think it does (and not how most people think it does), posting articles may be unwise.
