Original Link: https://www.anandtech.com/show/1573
NVIDIA Enables PureVideo on GeForce 6 GPUs
by Anand Lal Shimpi on December 20, 2004 1:22 PM EST - Posted in GPUs
Eight months.
We'll let you think about that once more.
Eight months.
Eight months have passed since NVIDIA introduced the GeForce 6800 and its Video Processor, and today, after eight long months of waiting with no explanation, we can finally take advantage of it. The wait is over: NVIDIA's PureVideo DVD decoder and drivers are publicly available for download, and GeForce 6 owners can finally put to use the ~20 million transistors set aside for NVIDIA's "Video Processor" through the driver and codec being released today.
When NVIDIA first told us about NV40 back in March 2004, they were quite excited about this "Video Processor" they had built into the chip. What we were originally told was that the Video Processor would be a fully programmable video acceleration engine, capable of accelerating both encoding and decoding operations, making HD video encoding and decoding accessible to all users regardless of system specs. Eight months later, here are the major points of what NVIDIA's Video Processor can do:
1) Hardware acceleration of Windows Media Video 9 and MPEG-2 decode
2) Spatial-Temporal Adaptive Per Pixel De-Interlacing (with 3-2 and 2-2 detection)
3) Everything previous NVIDIA GPUs have been able to do
The feature list isn't as impressive as, say, full hardware accelerated encoding, but it's still worth a look. Other features, such as gamma correction and a motion estimation engine, are also supported, but we won't dive into them as there's not much new to talk about there.
What was once known only as the NV4x Video Processor has now been given the marketing name PureVideo. PureVideo is exclusively available on the GeForce 6 series of GPUs and only the latest GeForce 6 GPUs have a fully functional PureVideo core. The original NV40 and NV45 (GeForce 6800GT/Ultra) do not have functional Windows Media Video 9 decode acceleration, but the rest of the GeForce 6 series are feature complete (GeForce 6800/6600GT/6600/6200).
So after hounding NVIDIA for months about PureVideo, we're finally able to test it. But before we do, there's a bit of background that has to be taken care of...
An Interlacing Primer
A big part of the PureVideo feature set is its de-interlacing capabilities, but before we explain what de-interlacing is, we have to explain what interlacing is and why you would want to undo it.

Let's say we wanted to display an animation, and here we have one frame of it. If the world were perfect, we would just broadcast as many frames of our animation as we had, at a constant frame rate, and we would have accomplished what we set out to do. Unfortunately, the world isn't perfect: when we first wanted to broadcast this animation, there were significant bandwidth limitations on both the transmitting and receiving sides, preventing us from sending one complete animation frame at a time.

One solution to this problem is to divide each frame into separate parts and display those parts in sequence. If the sequence is fast enough, the human eye is hard pressed to notice the difference. So let's do it: we take our original frame and produce two separate fields, each with half the vertical resolution of the original frame:
Field 1
Field 2
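To make the split concrete, here's a minimal Python sketch of carving a progressive frame into two half-height fields. The numpy array is a stand-in for real image data, and this is purely our illustration - real fields are of course produced during mastering, not in player software:

```python
import numpy as np

def split_into_fields(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a progressive frame into its two interlaced fields."""
    top_field = frame[0::2, :]     # lines 0, 2, 4, ... (half the vertical resolution)
    bottom_field = frame[1::2, :]  # lines 1, 3, 5, ...
    return top_field, bottom_field

frame = np.arange(8 * 4).reshape(8, 4)  # toy 8-line "frame"
top, bottom = split_into_fields(frame)
print(top.shape, bottom.shape)  # (4, 4) (4, 4): each field has half the lines
```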
Frame Rate Conversion and You
There are two basic types of content stored on most DVDs: content that came from a 24 fps source and content that came from a 30 fps source. The more common is 24 fps source content, because movies are recorded at 24 fps and the majority of DVDs out there are movies, so this is the type we'll talk about first.
Although most motion pictures are recorded at (approximately) 24 frames per second, no consumer television is capable of displaying at that frame rate. In order to reach the sizes that we all know and love (and at affordable prices), TVs are not as flexible as computer monitors - they are fixed frequency displays, so displaying content at arbitrary frame rates simply isn't possible. The DVD production houses know this, so they have to convert their 24 fps source into something that can be displayed on the majority of TVs out there.
In the North American market, the majority of TVs are interlaced NTSC TVs that display 60 fields per second. As we've just explained, a single interlaced field has half the resolution of a full frame in order to save bandwidth; by displaying 60 of those interlaced fields per second, the human eye is tricked into thinking that each frame is complete. But how can you convert 24 non-interlaced (aka progressive) film frames into 60 interlaced fields?
The first step is to convert the progressive film frames into interlaced fields, which is pretty simple: just divide each frame into odd and even lines, sending all of the odd lines to one field and all of the even lines to another.
Now we have 48 interlaced fields per second, but we are still short of that 60 fields per second target. We can't just add 12 arbitrary fields, as that would make our video look like we hit the fast forward button, so the only remaining option is to display some of the 48 fields more than once. It turns out that if we perform what is known as a 3-2 pulldown, we get a rather clean conversion.
Here's how it works:
We take the first progressive frame and, instead of just splitting it into two interlaced fields, we split it into three, with the third being a copy of the first. So frame 1 becomes field 1a, field 1b and field 1a again. Then, we take the next progressive frame and split it into two interlaced fields, field 2a and field 2b, with no repetition. We repeat this 3-2 pattern over and over again to properly display 24 fps film source on an interlaced NTSC TV.
There are a few movies and some TV shows that are recorded at a different frame rate: 30 fps. The 30 fps to 60 fields per second conversion is a lot easier since we don't need to alternate the pattern; we still create interlaced fields for the sake of NTSC compatibility, but each frame simply contributes exactly two fields, a 2-2 pulldown instead of the 3-2 pulldown used for film. One of the most popular 30 fps sources is Friends (note: it turns out that Friends is incorrectly flagged as a 30 fps source but is actually a 24 fps source), but other material is sometimes recorded at 30 fps, including some bonus material on DVDs. Because of this, 24 fps sources are usually categorized as "film" while 30 fps sources are usually called "video" (these names will have significance later on).
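To see both cadences at a glance, here's a toy sketch that works with field labels only (no image data) - frame n is split into fields "na" (odd lines) and "nb" (even lines), and the pulldown decides how many fields each frame contributes:

```python
def pulldown_32(frames):
    """24 fps film -> 60 fields/s: alternate 3 fields, then 2, per frame."""
    fields = []
    for i, f in enumerate(frames):
        if i % 2 == 0:
            fields += [f"{f}a", f"{f}b", f"{f}a"]  # third field repeats the first
        else:
            fields += [f"{f}a", f"{f}b"]           # no repetition
    return fields

def pulldown_22(frames):
    """30 fps video -> 60 fields/s: every frame contributes exactly 2 fields."""
    fields = []
    for f in frames:
        fields += [f"{f}a", f"{f}b"]
    return fields

print(pulldown_32([1, 2, 3, 4]))  # 10 fields from 4 frames (24 -> 60)
print(pulldown_22([1, 2]))        # 4 fields from 2 frames (30 -> 60)
```

Note that 24 frames through the 3-2 path yield (3 + 2) x 12 = 60 fields, exactly the NTSC field rate.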
Remember that the whole point of performing these conversions is that, until recently, all televisions were these low bandwidth interlaced displays. Recently, however, televisions have become more advanced, and one of the first major features to come their way was the ability to display non-interlaced video. A non-interlaced TV is useless without non-interlaced content, so manufacturers produced affordable non-interlaced DVD players, otherwise known as progressive scan DVD players.
But if you have a progressive scan DVD player, you don't have to buy special progressive scan DVDs; the DVD player instead does its best to reassemble the original progressive frames from the interlaced content stored on the DVD. Given the two conversion patterns we described above, reconstructing the original progressive frames from the interlaced data shouldn't be a difficult task: once the DVD player knows whether it is dealing with 24 fps or 30 fps content, it simply needs to stitch together the appropriate fields and send them out as progressive frames. The DVD spec makes things even easier by allowing flags to be set per field that tell the DVD player how to recover the original progressive source frames. No problems, right? Wrong.
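Before we look at how this breaks down, here's what flag-driven recovery looks like in the ideal case. This is a toy sketch assuming MPEG-2 style repeat_first_field flags and a simplified flat field stream; a real decoder's flag plumbing is considerably more involved:

```python
def inverse_telecine(fields, repeat_flags):
    """fields: flat field stream; repeat_flags[i] is True when source frame i
    contributed 3 fields (the 3-2 repeat). Returns recovered progressive frames."""
    frames, pos = [], 0
    for rff in repeat_flags:
        first, second = fields[pos], fields[pos + 1]
        frames.append((first, second))  # weave the pair back into one frame
        pos += 3 if rff else 2          # skip the duplicated field if flagged
    return frames

stream = ["1a", "1b", "1a", "2a", "2b", "3a", "3b", "3a", "4a", "4b"]
print(inverse_telecine(stream, [True, False, True, False]))
# -> [('1a', '1b'), ('2a', '2b'), ('3a', '3b'), ('4a', '4b')]
```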
It turns out that the flags on these DVDs aren't always reliable and can sometimes tell the DVD player to do the wrong thing, which can result in some pretty nasty image quality. Since DVD players can't just rely on the flags, algorithms were created to detect the type of source the player is dealing with: if the decoder chip detects a 3-2 pattern, it switches into "film" mode, and if it detects a 2-2 pattern, it switches into "video" mode. The problem here is that due to a variety of factors - errors introduced during editing, transitions between chapters on a disc, and just poorly encoded DVDs - these algorithms are sometimes fooled into doing the wrong thing (e.g. treating 24 fps content as 30 fps content). These hiccups in the 3-2 pattern don't (usually) last for long periods of time, but the results can be quite annoying. For example, if the DVD decoder chip tries to combine two fields that belong to different frames, the end result is a frame that obviously doesn't look right. While it may only happen in a few frames out of thousands on a single DVD, those few frames are sometimes enough to cause a ruffled brow while watching your multi-thousand-dollar home theater setup.
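For the curious, here's a deliberately naive cadence detector in the spirit described above - emphatically not any vendor's actual algorithm. Real detectors compare pixel differences between fields with tuned thresholds; ours just looks for exact repeats, which only works on these toy labels:

```python
def detect_cadence(fields, window=10):
    """fields: sequence of hashable field identifiers (stand-ins for image data).
    A field matching the one two steps back is the repeated field of a 3-2 cadence."""
    recent = list(fields)[-window:]
    repeats = sum(1 for i in range(2, len(recent)) if recent[i] == recent[i - 2])
    # In a clean 3-2 cadence, one field in every five is a repeat.
    return "film (3-2)" if repeats > 0 else "video (2-2)"

film = ["1a", "1b", "1a", "2a", "2b", "3a", "3b", "3a", "4a", "4b"]
video = ["1a", "1b", "2a", "2b", "3a", "3b", "4a", "4b", "5a", "5b"]
print(detect_cadence(film))   # film (3-2)
print(detect_cadence(video))  # video (2-2)
```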
So what does all of this have to do with NVIDIA's PureVideo? Although it's not in a set-top box, PureVideo is just as much a DVD decoder as what you have sitting underneath your TV - it just lives in your computer. To measure its effectiveness, we have to look at how it handles these trouble cases. Remember that your PC is inherently a "progressive scan" device; there's no interlacing here, so the quality of your videos depends directly on NVIDIA's algorithms.
A Brief Look at De-Interlacing Modes
The process of taking interlaced content and displaying it in a non-interlaced form is often referred to as de-interlacing (for obvious reasons). There are two basic methods of de-interlacing, commonly known as "bob" and "weave."
Bob de-interlacing is more technically referred to as linear interpolation: it simply fills in the missing lines by interpolating between the lines that are available in a single field. This interpolation form of de-interlacing is particularly useful if there are a lot of solid colors on the screen, or when the decoder screws up and would otherwise combine two fields from different frames.
Weave de-interlacing, as the name implies, simply combines alternating lines from two separate interlaced fields. Using either method exclusively is generally not the best way of doing things, but thanks to the decent amount of power in today's PCs, more sophisticated algorithms can dynamically switch between bob and weave on a per-pixel basis within a frame (usually referred to as adaptive per-pixel de-interlacing).
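Here's a minimal numpy sketch of the two basic methods, purely for illustration. "field" is one half-height field; bob rebuilds a full frame from that field alone, while weave interleaves two complementary fields:

```python
import numpy as np

def bob(field: np.ndarray) -> np.ndarray:
    """Linear interpolation: rebuild a full frame from a single field."""
    h, w = field.shape
    frame = np.empty((h * 2, w), dtype=float)
    frame[0::2] = field                            # the lines we actually have
    frame[1:-1:2] = (field[:-1] + field[1:]) / 2   # interpolate the missing lines
    frame[-1] = field[-1]                          # no line below the last: repeat it
    return frame

def weave(top: np.ndarray, bottom: np.ndarray) -> np.ndarray:
    """Interleave two complementary fields back into one full frame."""
    h, w = top.shape
    frame = np.empty((h * 2, w), dtype=float)
    frame[0::2] = top
    frame[1::2] = bottom
    return frame
```

Weave is perfect when both fields came from the same source frame; bob is blurrier but safe when they didn't - which is exactly why adaptive algorithms mix the two.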
NVIDIA's PureVideo supposedly takes adaptive per-pixel de-interlacing one step further with what they call Spatial-Temporal de-interlacing. The idea here is that normal per-pixel adaptive de-interlacing only uses data from the fields within a single frame to fill in the blanks, while NVIDIA's Spatial-Temporal de-interlacing can also use data from fields in neighboring frames to improve quality. We'll see whether this actually improves quality in our tests later in the article.
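NVIDIA hasn't published how its algorithm actually works, so purely as an illustration of the general idea, here's a generic motion-adaptive sketch that is in no way PureVideo's implementation. The thresholding heuristic and all names are ours; it weaves where the two candidate values agree (a static pixel) and falls back to bob where they disagree (motion):

```python
import numpy as np

def bob_lines(field):
    """Interpolated estimates of the missing lines, from one field alone."""
    est = (field[:-1] + field[1:]) / 2
    return np.vstack([est, field[-1:]])  # pad the last line by repetition

def adaptive_deinterlace(top_field, bottom_field, threshold=10.0):
    """Per-pixel choice between weave and bob for the missing lines."""
    bob_est = bob_lines(top_field.astype(float))
    weave_cand = bottom_field.astype(float)  # the temporally adjacent field
    # Large disagreement between the two candidates suggests motion between
    # the fields: prefer the safe bob estimate there, weave everywhere else.
    motion = np.abs(weave_cand - bob_est) > threshold
    missing_lines = np.where(motion, bob_est, weave_cand)
    h, w = top_field.shape
    frame = np.empty((h * 2, w))
    frame[0::2] = top_field
    frame[1::2] = missing_lines
    return frame
```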
NVIDIA's PureVideo Driver and Decoder
There are two parts to the software side of PureVideo: the GPU driver and the PureVideo DVD decoder. The driver is simply a version of the ForceWare 67.01 driver, while the PureVideo DVD decoder is the latest update to NVIDIA's NVDVD decoder - version 1.00.65. The GPU driver is, as always, available free to the public, while the PureVideo DVD decoder sells for $19.99 due to associated royalties. The decoder is available as a 30-day free trial from NVIDIA's website.
The PureVideo DVD decoder installs just like any application and has a control panel associated with it. You can only access the control panel while the decoder is in use (e.g. watching a DVD) or if you are using a media player that lets you access it directly (e.g. Zoom Player). The control panel has only a handful of options, yet it manages to be unnecessarily complicated.
The main options you'll want to adjust are the de-interlacing settings, but unfortunately NVIDIA included two separate de-interlacing controls that will undoubtedly confuse users.
The first control is marked De-interlace Control and has the following options: Automatic, Film, Video and Smart. Automatic mode simply uses the DVD flags to determine what the source is and applies the appropriate algorithms based on the flags.
The Film and Video modes tell the DVD decoder to treat all content as 24 fps or 30 fps content, respectively. Smart mode is the option you'll want to use, as it relies on both the flags and NVIDIA's own detection algorithms to determine the best de-interlacing to apply.
Then we have the De-interlace Mode control which has the following options: Best available, Display fields separately and Combine fields.
Display fields separately and Combine fields force bob and weave, respectively, regardless of content.
Best available is the option you'll want for the best image quality, as it uses NVIDIA's per-pixel adaptive de-interlacing algorithms. So the combination you'll want is Smart mode with the Best available setting. NVIDIA included the other options for the tweakers in all of us, but we'd much rather see a single control, or at least something a bit more intuitive than what NVIDIA has put together right now.
DVD Playback Quality
Now that we've laid out the background information, it's time to look at DVD playback quality. Although NVIDIA provided us with around 700MB of test data, we took it upon ourselves to put together our own test suite for image quality comparisons. We used some tests that have served the home theater community as de-interlacing benchmarks, as well as others that we found to be particularly good measures of image quality.
For all of our quality tests we used Zoom Player Pro, quite possibly one of the most feature-filled media players available.
Our first set of tests comes from Secrets of Home Theater and High Fidelity. The Galaxy Quest theatrical trailer isn't flagged at all and relies entirely on the DVD decoder's detection algorithms for proper de-interlacing. The default image below is from ATI's X700 Pro; mouse over it to see NVIDIA's PureVideo enabled 6600GT:
Hold mouse over image to see NVIDIA's Image Quality
Neither ATI nor NVIDIA passes the Big Lebowski test - so what went wrong here? The correct image above was generated by using a software decoder (DScaler 5) and forcing "bob" de-interlacing, which uses none of the data from the next field in constructing the current frame. The reason this works is that this particular scene causes most DVD decoders to incorrectly weave together two fields from vastly different scenes, resulting in the artifacts seen above. It's quite disappointing that neither ATI nor NVIDIA is able to pass this test, as this is one of the most visible artifacts of poor de-interlacing quality.
Our next set of tests is taken entirely from The Best of Friends Volume 3 DVD. As we mentioned earlier, Friends is recorded at 24 fps but flagged as a 30 fps source, which makes it a good stress test for DVD decoder detection algorithms as well as de-interlacing abilities. Our first scene shows a similar number of artifacts on both the ATI and NVIDIA GPUs; image quality appears to be identical across the two.
Hold mouse over image to see NVIDIA's Image Quality
The test here is to look at the white rail through the window - it should be perfectly white without any interruptions. ATI fails the test but NVIDIA passes it.
Hold mouse over image to see NVIDIA's Image Quality
HD Decode Performance
The other aspect of PureVideo that matters is its decode acceleration. DVD decoding isn't really an issue these days, as even the slowest of CPUs can handle it - the new stress test is decoding HD content. We used Windows Media Player 10 and the publicly available Terminator 2 trailers in 720p and 1080p formats. However, because our test bed was limited to 1600 x 1200, the 1080p test was fairly useless (we were resolution bound on the machine), so we focused on the 720p test.
We measured average, minimum and maximum CPU utilization over the entire 1:59 trailer. Our test bed used an Intel Pentium 4 570J (3.8GHz); keep in mind that higher CPU utilization on this test bed will translate into proportionally higher CPU utilization on slower CPUs. We tested in both Overlay and VMR9 modes, the latter being directly applicable to Windows XP Media Center Edition, as it uses VMR9 exclusively.
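For reference, here's a simple harness for logging min/average/max CPU utilization over a playback run. This is our illustrative sketch using the third-party psutil package, not necessarily the tool behind the numbers below:

```python
import time
import psutil  # third-party: pip install psutil

def sample_cpu(duration_s: float = 119, interval_s: float = 0.5):
    """Sample overall CPU utilization for the length of the 1:59 trailer."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        # cpu_percent blocks for interval_s and returns the average
        # utilization across all logical CPUs over that window.
        samples.append(psutil.cpu_percent(interval=interval_s))
    return min(samples), sum(samples) / len(samples), max(samples)

if __name__ == "__main__":
    lo, avg, hi = sample_cpu()
    print(f"min {lo:.1f}%  avg {avg:.1f}%  max {hi:.1f}%")
```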
In Overlay mode in a window, ATI has significantly lower CPU utilization:
WMV9 CPU Utilization in % (Lower is Better) - Overlay Window - 720p Terminator Trailer

| GPU    | Minimum | Average | Maximum |
|--------|---------|---------|---------|
| ATI    | 9.4     | 22      | 35.2    |
| NVIDIA | 14.8    | 28.3    | 40.6    |
WMV9 CPU Utilization in % (Lower is Better) - Overlay Full Screen - 720p Terminator Trailer

| GPU    | Minimum | Average | Maximum |
|--------|---------|---------|---------|
| ATI    | 11.7    | 22.3    | 33.6    |
| NVIDIA | 25      | 37.7    | 46.9    |
ATI sees a very small performance penalty when scaling up to full screen, while NVIDIA faces a huge penalty in full screen mode. VMR9, on the other hand, is much more stressful on ATI than it is on NVIDIA; the winner here is NVIDIA.
WMV9 CPU Utilization in % (Lower is Better) - VMR9 Window - 720p Terminator Trailer

| GPU    | Minimum | Average | Maximum |
|--------|---------|---------|---------|
| ATI    | 28.9    | 41.4    | 50.8    |
| NVIDIA | 15.6    | 26.6    | 40.6    |
WMV9 CPU Utilization in % (Lower is Better) - VMR9 Full Screen - 720p Terminator Trailer

| GPU    | Minimum | Average | Maximum |
|--------|---------|---------|---------|
| ATI    | 31.3    | 42.2    | 50      |
| NVIDIA | 20.3    | 38.5    | 50.8    |
Even in full screen mode, NVIDIA is able to offer slightly lower CPU utilization than ATI.
A Preview of the Future - Fully Hardware Accelerated HD Decode
NVIDIA sent us elements of a forthcoming update to Windows Media Player 10 that will further take advantage of PureVideo's hardware acceleration. At the same time, ATI sent us information on how to enable hardware acceleration of WMV9 on their cards before the forthcoming WMP10 update.
To enable WMV9 Hardware Acceleration on ATI X series cards, see the following note from ATI:
Note: WMV9 acceleration has been disabled until Microsoft issues a new patch for WMV9. To enable it with other versions of Catalyst (with some rendering errors), run regedit, navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video, find your ATI registry value, and set DXVA_WMV = 1.
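For those who'd rather script the tweak, here's a hedged sketch using Python's winreg module. The DXVA_WMV value name comes from ATI's note above, but the traversal logic, the "0000" device-instance subkey and the REG_DWORD type are our assumptions about the key layout, not ATI's documentation - back up your registry before experimenting:

```python
import winreg

VIDEO_KEY = r"SYSTEM\CurrentControlSet\Control\Video"

def enable_dxva_wmv():
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, VIDEO_KEY) as video:
        i = 0
        while True:
            try:
                guid = winreg.EnumKey(video, i)  # one GUID subkey per adapter
            except OSError:
                break
            i += 1
            subpath = rf"{VIDEO_KEY}\{guid}\0000"  # assumed device instance
            try:
                with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subpath, 0,
                                    winreg.KEY_READ | winreg.KEY_SET_VALUE) as dev:
                    # Only touch keys that already expose the ATI value.
                    winreg.QueryValueEx(dev, "DXVA_WMV")
                    winreg.SetValueEx(dev, "DXVA_WMV", 0, winreg.REG_DWORD, 1)
                    print(f"Set DXVA_WMV=1 under {subpath}")
            except OSError:
                continue  # not an ATI adapter key, or value absent

if __name__ == "__main__":
    enable_dxva_wmv()
```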
The improvements are nothing short of impressive: full hardware decode acceleration cuts CPU utilization almost in half for both vendors.
WMV9 HW Accelerated CPU Utilization in % (Lower is Better) - VMR9 Window - 720p Terminator Trailer

| GPU    | Minimum | Average | Maximum |
|--------|---------|---------|---------|
| ATI    | 10.9    | 17.8    | 24.2    |
| NVIDIA | 7.7     | 16.6    | 24.2    |
WMV9 HW Accelerated CPU Utilization in % (Lower is Better) - VMR9 Full Screen - 720p Terminator Trailer

| GPU    | Minimum | Average | Maximum |
|--------|---------|---------|---------|
| ATI    | 3.9     | 17.9    | 24.2    |
| NVIDIA | 5.5     | 17.3    | 24.2    |
Full screen decode performance is much more manageable, at under 18% average CPU utilization for both GPUs.
Not so Ultra: No Decode Acceleration on NV40 and NV45
As we mentioned before, PureVideo is only partially functional on NV40 and NV45 (GeForce 6800 GT/Ultra, AGP and PCIe): all of the de-interlacing and image quality functionality is there, but those GPUs lack WMV9 decode acceleration entirely, as the performance comparison below makes evident:
NV40/45 vs. NV43 - WMV9 HW Accelerated CPU Utilization in % (Lower is Better) - VMR9 Window

| GPU                   | Minimum | Average | Maximum |
|-----------------------|---------|---------|---------|
| NVIDIA GeForce 6600GT | 7.7     | 16.6    | 24.2    |
| NVIDIA GeForce 6800GT | 18      | 31.6    | 43      |
NV40/45 vs. NV43 - WMV9 HW Accelerated CPU Utilization in % (Lower is Better) - VMR9 Full Screen

| GPU                   | Minimum | Average | Maximum |
|-----------------------|---------|---------|---------|
| NVIDIA GeForce 6600GT | 5.5     | 17.3    | 24.2    |
| NVIDIA GeForce 6800GT | 18.8    | 34.2    | 49.2    |
Remember that this only applies to the NV40 (GeForce 6800GT/Ultra AGP) and NV45 (GeForce 6800GT/Ultra PCIe); the NV41 (GeForce 6800) has fully functional WMV9 decode acceleration.
Final Words
Well, was it worth the wait? Considering that PureVideo came as a free feature on GeForce 6 cards, it's more like unwrapping an early Christmas present - one that we were promised eight months ago.
NVIDIA's image quality is pretty good for a PC DVD decoder: PureVideo delivered de-interlacing quality that was equal to, and in some cases better than, what ATI brought to the table. And although we did not feature the comparison here, the NVIDIA PureVideo codec even offered better image quality than the latest DScaler 5 build.
Despite doing better than the competition, NVIDIA is still far from perfect with PureVideo; the Big Lebowski test alone is proof that there's room for improvement.
The scaling quality and WMV9 playback were both quite competitive with ATI's offerings, although not strikingly better. With hardware acceleration enabled, WMV9 decode is promising and will greatly reduce the CPU requirements for high definition content playback.
Overall we're pleased with PureVideo; there's very little to complain about. We aren't as happy with it as we could have been, but our main issue is with the way NVIDIA handled the entire situation, remaining quiet for far too long. Not to mention that there can't be many happy 6800GT owners out there knowing that 6600GT owners will see lower CPU utilization when playing WMV9-HD files.
In the end, PureVideo is a positive feature for GeForce 6 owners, a verdict that we are glad we can finally give.