Enter FCAT

In our comprehensive look at stuttering and FRAPS, we laid out what our ideal method for measuring frame intervals would be. Ideally we would tag each frame with a time stamp at the start of the rendering pipeline, and then compare the intervals between those time stamps to the intervals between the same frames at the end of the rendering pipeline, when they are displayed. In a perfect world these two sets of intervals would match up (or be close enough), with the simulation time between frames coming at an even pace, and the frame interval itself coming at an even pace.

Of course in the real world this isn’t quite impossible, but it’s highly impractical, as it requires the participation and assistance of the application itself to write the time stamps (by the time draw calls are being made, it’s too late). In lieu of that, simply being able to look at the end of the rendering pipeline would be a major benefit. After all, the end of the rendering pipeline is where frame swaps actually happen, and it is the position in the rendering pipeline that best describes what the user is seeing. If FRAPS isn’t enough because it can only see the start of the rendering pipeline, then the logical next step is to look at the end of the rendering pipeline instead.

This brings us to the subject of today’s article, FCAT, the Frame Capture Analysis Tool.

As we mentioned in our look at stuttering yesterday, both NVIDIA and AMD agree that trying to judge frame intervals from the start of the rendering pipeline is fundamentally problematic. For the past couple of years NVIDIA has been working on an alternative tool to measure frame latency at the end of the rendering pipeline, and at long last they are releasing this tool to reviewers and the public. This tool is FCAT.

So what is FCAT? FCAT is essentially a collection of tools, but at its most fundamental level FCAT is a simple, yet ingenious method to measure frame latency at the end of the rendering pipeline. Rather than attempting to tap into the video drivers themselves – a process inherently fraught with problems if you’re intending to do it in a vendor-neutral manner that works across all video cards – through FCAT NVIDIA can do true frame analysis, capturing individual frames and looking at them to determine when a buffer swap occurred, and in turn using that to measure the frame interval.

How FCAT Works

So how does FCAT work? FCAT is essentially a two-part solution. We’ll dive into greater detail on this in part 2 of our FCAT article, but in summary, both monitors and PC capture cards work at fixed intervals. Regardless of the frame rate an application is running at, most PC LCD monitors operate at a 60Hz refresh rate. With v-sync enabled, buffer swaps are synchronized with the refresh interval (which among other things caps the framerate at 60fps), but when v-sync is disabled, buffer swaps can occur in the middle of a refresh. As a result any given refresh interval can be composed of parts of multiple frames, which makes it possible to display well over 60fps on what’s otherwise a 60Hz monitor.

PC capture cards work on the same principle: just as a monitor refreshes at 60Hz, a PC capture card captures at 60Hz. The end result is that while a PC capture card can’t see more than 60 whole frames per second, it can see parts of those frames, and being able to see parts of frames is good enough. In fact it sees the same parts of those frames that a user would see, since the 60Hz refresh rate on a monitor causes the same effect.

Ultimately by capturing frames and analyzing them, it is possible to tell how many frames were delivered in any given refresh interval, and furthermore by counting the time between those partial frames and comparing it to the refresh interval, it is possible to compute just how long the frame interval was and how long any individual frame was visible.
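To make the arithmetic concrete, here is a minimal sketch of how the share of a refresh that a partial frame occupies translates into time on screen. It is only an illustration: the 1080-line display height and the function name are our assumptions, and the sketch ignores the vertical blanking period that a real video signal includes.

```python
# Illustrative only: converts the number of scanlines a frame occupied
# into the time it was visible. Assumes a 60Hz refresh and 1080 visible
# lines, and ignores vertical blanking (all assumptions for this sketch).

REFRESH_HZ = 60.0
VISIBLE_LINES = 1080


def visible_time_ms(scanlines, refresh_hz=REFRESH_HZ, total_lines=VISIBLE_LINES):
    """Return how long a frame was on screen, in milliseconds."""
    refresh_ms = 1000.0 / refresh_hz          # about 16.67 ms at 60Hz
    return scanlines / total_lines * refresh_ms


# One captured refresh that contains parts of three different frames.
segments = [("frame A", 540), ("frame B", 270), ("frame C", 270)]

for name, lines in segments:
    print(f"{name}: {visible_time_ms(lines):.2f} ms on screen")
```

A frame that fills half of a refresh was visible for half of the refresh interval, and the times for all of the segments in one refresh add up to the refresh interval itself.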

Of course doing this on a raw game feed would be difficult in the best of situations. As a simple thought experiment, consider a game where the player isn’t moving. If nothing changes in the image, how is one to be able to tell if a new frame has been delivered or not?

The solution to this is in the first half of FCAT, the overlay tool. The overlay tool at its most basic level is a utility that color-codes each frame entering the rendering pipeline. By tagging frames with color bars, it is possible to tell apart individual frames by looking at the color bars. Regardless of the action on the screen (or lack thereof), the color bars will change with each successive frame, making each frame clear and obvious.
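The tagging idea can be sketched in a few lines. The example below is not FCAT's code: the palette names and their order are placeholders we picked for illustration, and the only detail taken from NVIDIA's tool is that the colors repeat in a fixed 16-color cycle, which is what makes a skipped color (a dropped frame) detectable.

```python
# Illustrative sketch of color-tagging frames. The color names and their
# order are hypothetical placeholders; only the idea of a fixed, repeating
# 16-entry sequence is taken from the tool being described.

PALETTE = [
    "white", "lime", "blue", "red", "teal", "navy", "green", "aqua",
    "maroon", "silver", "purple", "olive", "gray", "fuchsia", "yellow", "orange",
]


def color_for_frame(frame_number):
    """Pick the bar color for a frame by cycling through the palette."""
    return PALETTE[frame_number % len(PALETTE)]


def dropped_between(prev_color, cur_color):
    """Count palette entries skipped between two consecutive observed colors."""
    step = (PALETTE.index(cur_color) - PALETTE.index(prev_color)) % len(PALETTE)
    return max(step - 1, 0)


print(color_for_frame(0), color_for_frame(1), color_for_frame(16))
print(dropped_between("white", "blue"))  # "lime" never reached the display
```

Because the sequence is known in advance, an observer only needs to see two consecutive colors to know whether a frame in between never made it to the display. The obvious limit of this scheme is that a run of 16 or more dropped frames would wrap around the cycle and be miscounted.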

On a technical level, the FCAT overlay tool ends up working almost identically to video game overlays as we see with FRAPS, MSI Afterburner, and other tools that insert basic overlays into games. In all of these cases, these tools are attaching themselves to the start of the rendering pipeline, intercepting the Present call, adding their own draw commands for their overlay, and then finally passing on the Present call. The end result is that much like how FRAPS is able to quickly and simply monitor framerates and draw overlays, the FCAT overlay tool is able to quickly insert the necessary color bars, and to do so without ever touching the GPU or video drivers.

With the frames suitably tagged, the other half of the FCAT solution comes into play, the extractor tool. By using a PC capture card, the entire run of a benchmark can be captured and recorded to video for analysis. The extractor tool in turn is what’s responsible for looking at the color bars the overlay tool inserts, parsing the data from a video file to find the individual frames and calculate the frame intervals. Though not the easiest thing to code, conceptually this process is easy; the tool is merely loading a frame, analyzing each line of the color bar, finding the points where the color bar changes, and then recording those instances.
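As a rough illustration of that process, the sketch below walks the color bar of each captured refresh line by line and records every point where the color changes. This is our own simplification, not the extractor's actual implementation: the input format (one list of color names per captured refresh) and the function names are assumptions, and a real tool would have to decode video and tolerate capture noise.

```python
# Illustrative sketch of the extraction step. Each captured refresh is
# represented as a list holding the color-bar color seen on each scanline.
# The data layout and function names are assumptions for this example.


def find_swaps(captured_refreshes):
    """Return (refresh_index, scanline, color) for every color-bar change."""
    swaps = []
    previous = None
    for refresh_index, bar in enumerate(captured_refreshes):
        for scanline, color in enumerate(bar):
            if color != previous:
                swaps.append((refresh_index, scanline, color))
                previous = color
    return swaps


def frame_lengths(swaps, lines_per_refresh):
    """Return the number of scanlines between consecutive swaps."""
    positions = [r * lines_per_refresh + s for r, s, _ in swaps]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


# A toy capture with 8 scanlines per refresh.
capture = [
    ["white"] * 5 + ["lime"] * 3,   # swap to "lime" happens mid-refresh
    ["lime"] * 2 + ["blue"] * 6,    # "lime" carries over, then "blue"
]

swaps = find_swaps(capture)
print(swaps)
print(frame_lengths(swaps, lines_per_refresh=8))
```

Note that a frame spanning two captured refreshes is still counted once, since the color carries over unchanged from the bottom of one capture to the top of the next.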

This ultimately results in a Tab Separated Values file that contains a list of frames, when they occurred, the color bar they were attached to, and more. From here it is possible to then further process the data to calculate the frame intervals.
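For a sense of what that further processing might look like, here is a small sketch that reads a tab separated file and computes the interval between consecutive frames. The column names (`frame`, `time_ms`, `color`) and the sample values are made up for this example and should not be taken as the extractor's real output format.

```python
import csv
import io

# Hypothetical sample data: the column names and values are invented for
# illustration and do not reflect the extractor tool's actual file layout.
SAMPLE_TSV = (
    "frame\ttime_ms\tcolor\n"
    "0\t0.0\twhite\n"
    "1\t16.5\tlime\n"
    "2\t20.0\tblue\n"
    "3\t37.5\tred\n"
)


def frame_intervals(tsv_text):
    """Return the time in milliseconds between consecutive frames."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    times = [float(row["time_ms"]) for row in rows]
    return [later - earlier for earlier, later in zip(times, times[1:])]


print(frame_intervals(SAMPLE_TSV))
```

In this made-up data the second frame is replaced after only 3.5 ms while its neighbors last roughly a full refresh, which is the kind of uneven pacing that an average frame rate hides.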

The end result of this process is that through the use of marking frames, capturing the output of a video card, and then analyzing that output, it is possible to objectively and quantitatively measure the output of a video card as an end-user would see it. This process doesn’t answer the subjective questions for us – mainly, how much stutter is enough to be noticed – but it gives us numbers that we can use to determine those answers ourselves.

Finally, for the purposes of this article we’ll be glossing over the analysis portion of FCAT, but we’ll quickly mention it. Along with the overlay and extractor tools, FCAT also includes a tool to analyze the output of the extractor tool, from which it can generate graphs, identify so-called “runt” frames, and more. The analysis tool is not strictly necessary to use FCAT – one can always do their own analysis – but the analysis tool does simplify the use of the suite by quickly and conveniently handling that last step of the process. We’ll get into the analysis tool in much greater detail in part 2 of our article, where we can apply it to our full suite of test results to better understand what it looks for and what it’s representing.


88 Comments


  • Th-z - Wednesday, March 27, 2013 - link

    This can be time consuming and costly, but for the authenticity of FCAT's numbers, reviewers can always cross-ref with a second set of tools, e.g. a high speed camera in random sections of a benchmark run, to see if the two results are consistent. If the stutter is really bad, a player can also detect it by eye. My way of checking how constant the frames are is to move side to side at constant speed if it's a first person shooter (A and D in WASD config), or to push a controller's left stick all the way left or right for other types of games.
  • Kevin G - Wednesday, March 27, 2013 - link

    A couple of generic questions regarding FCAT and the extractor tools.

    I presume that the same color will appear across multiple frames on the monitor if the FPS is below 60?

    Are the colors chosen at random or do they repeat in a regular pattern? IE red green blue yellow red green blue yellow red green blue yellow etc.

    Does the extractor tool just use the color bar or does it attempt to determine where in a scan line a new frame starts?

    Can this be used on an Eyefinity/Surround setup? Does only one of the monitors need to be captured? What is the maximum resolution that can be captured?

    Looking at PLOT.png, there already appear to be some abnormally large spikes. Is there going to be any sort of visual inspection to verify that such oddities are just performance spikes and not something erroneous going on with FCAT?

    If FCAT and FRAPS are used simultaneously, which one intercepts the Present call first to process an overlay?
  • Kevin G - Wednesday, March 27, 2013 - link

    Two more questions:

    What happens when the Present call is invoked while the color bar is being drawn on screen? IE one scan line has the color bar composed of two different colors? Does the extractor record this one scan line as one frame? (For example this could lead to a spike where one frame is recorded at 1080 fps on a 1080p display.)

    If FRAPS can determine the frame rate from the application's perspective and FCAT can record the frame rate at the display level, then couldn't the latency invoked by just the OS/driver be determined on a per frame basis? Determining this latency would be interesting.

    Can FCAT be used to determine where a frame was rendered in a multiple GPU setup? IE each GPU is only allowed a specific subset of colors to use for its color bar.
  • Ryan Smith - Friday, March 29, 2013 - link

    1) A scan line cannot be composed of two different colors. A frame cannot be switched mid-line. It has to wait until the end of the line.

    2) So without timestamping and clock syncing, we would not be able to determine latency. It wouldn't be possible to easily match up frames to present calls, nor how long it took that one frame to traverse the pipe.

    3) No. FCAT cannot tell us which GPU rendered a frame. AFR is too abstracted from the rendering pipeline for that.
  • Ryan Smith - Friday, March 29, 2013 - link

    1) Correct. If the FPS is below 60, the monitor would be repeating part of a frame, so the color bar would stay the same until a new frame is finally served up.

    2) The colors are in a regular pattern of 16 colors to detect dropped frames.

    3) The extractor tool only looks for the color bar.

    4) Yes, this can be used on a surround setup. You'd just capture the leftmost display. The maximum resolution right now would be 2560x1440 (the limits of the capture card), which would be part of a 7680x1440 or 12800x1440 setup.

    5) We've already taken a look at results like those. FCAT is almost dead simple; those aren't anomalies in FCAT.

    6) It would be FRAPS first, then FCAT. NVIDIA suggests starting FRAPS second, so it comes earlier in the chain.
  • wingless - Wednesday, March 27, 2013 - link

    Frame Render Ahead/Pre-Rendering (frames prepared by the CPU ahead of time)? How does this option change the outlook for SLI and Crossfire? The Battlefield 3 community is saying if you change this value to 0 or 1 versus the default value of 3, game play is smoother. I've done it myself on my old HD 4870 Crossfire setup on a i7-2600k+Z77 system and it does seem to help.

    Does this pre-rendering affect the amount of runt frames on an SLI/Crossfire configuration?

    What is the effect of different CPU/Chipset combinations? At one point folks used to say AMD systems felt smoother, despite showing lower FPS.

    Thanks for your analysis!
  • marc1000 - Wednesday, March 27, 2013 - link

    so, this new method requires a PCIe card to capture the video? is that correct?
  • bobbozzo - Wednesday, March 27, 2013 - link

    It requires a capture card, and the capture probably needs to be done in a different computer, otherwise game performance would suffer.
  • Ryan Smith - Friday, March 29, 2013 - link

    Correct. A separate system must be running a capture card to capture the output of the test system. We're using a Datapath Limited VisionDL-DVI card.
  • HisDivineOrder - Wednesday, March 27, 2013 - link

    Soooo... not only did AMD NOT realize that this was an issue, but nVidia has known enough to worry about this particular issue for YEARS and develop a tool to detect and measure it.

    Wow, AMD. You were behind the curve BEFORE you fired all those engineers in your R&D. The more this story develops, the more sad it becomes. Damn. It's like you're a farmer who failed to realize that you have to spray your fields for insects in addition to fertilizing them.
