Original Link: http://www.anandtech.com/show/2876



I suppose I could start this article off with a tirade on how frustrating Adobe Flash is. But, I believe the phrase “preaching to the choir” would apply.

I’ve got a two socket, 16-thread, 3GHz, Nehalem Mac Pro as my main workstation. I have an EVGA GeForce GTX 285 in there. It’s fast.

It’s connected to a 30” monitor, running at its native resolution of 2560 x 1600.

The machine is fast enough to do things I’m not smart or talented enough to know how to do. But the one thing it can’t do is play anything off of Hulu in full screen without dropping frames.

This isn’t just a Mac issue, it’s a problem across all OSes and systems, regardless of hardware configuration. Chalk it up to poor development on Adobe’s part or...some other fault of Adobe’s, but Flash playback is extremely CPU intensive.

Today, that’s about to change. Adobe has just released a preview of Flash 10.1 (the final version is due out next year) for Windows, OS X and Linux. While all three platforms feature performance enhancements, the Windows version gets H.264 decode acceleration for Flash video using DXVA (OS X and Linux are out of luck there for now).

The same GPU-based decode engines that are used to offload CPU decoding of Blu-rays can now be used to decode H.264 encoded Flash video. NVIDIA also let us know that GPU acceleration for Flash animation is coming in a future version of Flash.

To get the 10.1 pre-release just go here. NVIDIA recommends that you uninstall any existing versions of Flash before installing 10.1, but I’ve found that upgrading works just as well.

What Hardware is Supported?

As I just mentioned, Adobe is using DXVA to accelerate Flash video playback, which means you need a GPU that properly supports DXVA2. From NVIDIA that means anything after G80 (sorry, GeForce 8800 GTX, GTS 640/320MB and Ultra owners are out of luck). In other words, anything from the GeForce 8 series, 9 series or GeForce GT/GTX series, as well as their mobile equivalents, will work. The only exceptions are those G80-based parts I just mentioned.

Anything based on NVIDIA’s ION chipset is also supported, which will be the foundation of some of our tests today.

AMD supports the following:

- ATI Radeon™ HD 4000, HD 5700 and HD 5800 series graphics
- ATI Mobility Radeon™ HD 4000 series graphics (and higher)
- ATI Radeon™ HD 3000 integrated graphics (and higher)
- ATI FirePro™ V3750, V5700, V7750, V8700 and V8750 graphics accelerators (and later)

It’s a healthy list of supported GPUs from both camps, including integrated graphics. The only other requirement is that you have the latest drivers installed. I used 195.50 from NVIDIA and Catalyst 9.10 from AMD. (Update: The Release Notes now indicate Catalyst 9.11 drivers are required, which would explain our difficulties in testing. ATI just released Catalyst 9.11, but we're having issues getting GPU acceleration to work and are waiting on a response from AMD.)

Intel’s G45 should, in theory, work. We tested it on a laptop for this article and since the acceleration is DXVA based, anything that can offload H.264 decode from the CPU using DXVA (like G45) should work just fine. As you’ll see however, our experiences weren’t exactly rosy.



Flash/Hulu on ION: Nearly Perfect

I dusted off ASRock’s ION system based on the Intel Atom 330 (dual-core 1.6GHz Atom) processor for the first part of today’s testing. It had a copy of Windows Vista x64 installed so I stuck with that. The integrated GeForce 9300/9400M chipset supports DXVA/DXVA2 and should be able to offload much of the video decode from the sluggish CPU to the integrated GPU.

As you can see from the results below, CPU utilization drops significantly when going from Flash 10.0.32.18 to 10.1.51.45. Not only do the numbers drop, but playback performance (number of dropped frames) improves significantly. I’d say that all of the tests below were totally playable on the ION system thanks to Flash 10.1.

| Windowed Average CPU Utilization | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| Hulu Desktop - The Office - Murder | 70% | 30% |
| Hulu HD 720p - Legend of the Seeker Ep1 | 75% | 52% |
| Hulu 480p - The Office - Murder | 40% | 23% |
| Hulu 360p - The Office - Murder | 20% | 16% |
| YouTube HD 720p - Prince of Persia Trailer | 60% | 12% |
| YouTube - Prince of Persia Trailer | 14% | 7% |

These are awesome improvements. The Hulu HD results were a bit high but the YouTube HD test showed a drop from 60% CPU utilization down to 12%. Most impressive. Now on to the full screen Hulu tests:
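To put the windowed numbers in perspective, it helps to separate the drop in percentage points from the relative reduction in CPU load. The snippet below is a minimal sketch that uses only the figures from the windowed ION table; the function name `cpu_drop` and the dictionary layout are illustrative choices, not part of any tool used for the testing.

```python
# Windowed ION results from the table above:
# (average CPU % on Flash 10.0.32.18, average CPU % on Flash 10.1.51.45).
ION_WINDOWED = {
    "Hulu Desktop": (70, 30),
    "Hulu HD 720p": (75, 52),
    "Hulu 480p": (40, 23),
    "Hulu 360p": (20, 16),
    "YouTube HD 720p": (60, 12),
    "YouTube": (14, 7),
}


def cpu_drop(before, after):
    """Return (percentage-point drop, relative reduction in percent)."""
    points = before - after
    relative = 100.0 * points / before
    return points, relative


for test, (before, after) in ION_WINDOWED.items():
    points, relative = cpu_drop(before, after)
    print(f"{test}: -{points} points ({relative:.0f}% less CPU load)")
```

For the YouTube HD trailer this works out to a 48-point drop, which is 80% less CPU load than under Flash 10.0.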

| Full Screen 1920 x 1200 Average CPU Utilization | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| Hulu Desktop - The Office - Murder | 70% | 55% |
| Hulu HD 720p - Legend of the Seeker Ep1 | 83% | 68% |
| Hulu 480p - The Office - Murder | 70% | 70% |
| Hulu 360p - The Office - Murder | 70% | 70% |

The biggest difference I saw was running Hulu Desktop in full screen mode (1920 x 1200). While CPU usage wasn’t at 100%, the latest episode of The Office was completely unwatchable in the previous version of Flash. Updating to 10.1 not only dropped CPU utilization, but it made full screen Hulu Desktop watchable on a ~1080p display with the ION system. I can’t believe it took this long to happen, but it finally did.

The one anomaly I encountered was CPU utilization not dropping while watching Hulu in a maximized IE8 window. I’ve brought it up with NVIDIA and we’re trying to figure out what’s going on.

There is some additional funniness that happens with certain NVIDIA GPUs and some Flash video content. Some YouTube videos use an 854-pixel-wide resolution and default to software decoding on NVIDIA ION and GeForce 8400GS (G98) GPUs. To fix this problem you have to do one of two things. Under IE8, NVIDIA recommends that you do the following:

With Internet Explorer, you may not be able to enter GPU-accelerated playback mode on many clips that naturally start in 854x mode. As a workaround, append “&fmt=22” to the end of 720p clip URLs and “&fmt=37” to the end of 1080p clip URLs. The videos will then play in GPU-accelerated HD mode.
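If you'd rather not edit URLs by hand, the workaround can be scripted. The following is a rough sketch, not something NVIDIA or Adobe ships: the helper name `force_hd_format` is made up for illustration, `VIDEO_ID` is a placeholder rather than a real clip, and the only facts taken from the instructions above are the two `fmt` values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Format codes quoted in NVIDIA's workaround above.
FMT_BY_QUALITY = {"720p": "22", "1080p": "37"}


def force_hd_format(url, quality):
    """Return `url` with the fmt parameter set for the requested quality."""
    parts = urlsplit(url)
    # Drop any existing fmt value so the parameter is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "fmt"
    ]
    query.append(("fmt", FMT_BY_QUALITY[quality]))
    return urlunsplit(parts._replace(query=urlencode(query)))


# VIDEO_ID is a placeholder, not a real clip.
print(force_hd_format("http://www.youtube.com/watch?v=VIDEO_ID", "720p"))
```

Rebuilding the query string instead of blindly concatenating text keeps the result valid even if the URL already carries an `fmt` value or other parameters.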

Firefox 3.5.5 users have to follow a separate set of instructions:

Before running a YouTube HD clip, please go to Firefox menus and select Tools/Clear Recent History. Ensure the Cookies checkbox is checked, and do the clear. Next, go to Tools/Options/Privacy and select “Never Remember History”.

The above procedure will ensure an HD clip is first loaded in SD mode with 640x horizontal resolution, and then you select the HD button and get GPU-accelerated playback at 1280x HD mode. If you do not first delete Cookies and then turn off history, you may enter an 854x SD horizontal resolution upon starting up an HD clip which is not GPU-accelerated today. If starting in 854x SD mode, when you switch to the HD version, it will still be non-GPU accelerated.

These limitations apply only to ION and GeForce 8400GS based GPUs; the rest of NVIDIA’s supported GPUs accelerate all content regardless of resolution. NVIDIA expects this behavior to be fixed either by updated NVIDIA drivers or an updated version of Flash.



Testing with AMD GPUs: Doesn't Work Yet

Update 4: AMD has released Catalyst 9.11 with Flash support for Radeon HD 5000 series and 4000 series GPUs. No word on integrated graphics platforms. We've begun testing, but the drivers don't seem to enable H.264 decode acceleration under Hulu at this point; we're waiting for a response from AMD.

Update 3: AMD tells us that Flash 10.1 support is coming later today, so we should have a working driver soon.

Update 2: The latest beta drivers from ATI do not enable Flash 10.1 hardware acceleration support (both leaked and the supposed Catalyst 9.11 drivers from ATI's developer site). We're still waiting for ATI to get us a version of their drivers that does enable GPU acceleration under Flash 10.1.

NVIDIA's drivers are publicly available however:

Desktop

http://www.nvidia.com/object/winxp_195.55.html

http://www.nvidia.com/object/win7_winvista_32bit_195.55.html

http://www.nvidia.com/object/win7_winvista_64bit_195.55.html

Notebook

http://www.nvidia.com/object/notebook_winxp_195.55.html

http://www.nvidia.com/object/notebook_winvista_win7_195.55.html

http://www.nvidia.com/object/notebook_winvista_win7_x64_195.55.html

Update: The Release Notes now indicate Catalyst 9.11 drivers are required, which would explain our difficulties in testing. We're still waiting on a version of Catalyst 9.11 from AMD that works with Flash 10.1. We will post updated data as soon as we have the driver.

I’d say that my ION testing went pretty smoothly, but the same definitely doesn’t hold true for AMD.

I set up an AMD 785G system (integrated Radeon HD 3200) with an AMD Sempron LE-1150. This is a 2.0GHz, single-core, K8-based processor with a 512KB L2 cache. Definitely faster than an Atom.

The integrated graphics of the 785G chipset fully supports H.264 decode acceleration and shouldn’t have a problem with Flash 10.1. AMD has it on the supported list and things should be smooth. Unfortunately, the numbers don’t agree:

| Windowed Average CPU Utilization | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| Hulu Desktop - The Office - Murder | 97% | 100% |
| Hulu HD 720p - Legend of the Seeker Ep1 | 94% | 100% |
| Hulu 480p - The Office - Murder | 57% | 60% |
| Hulu 360p - The Office - Murder | 27% | 35% |
| YouTube HD 720p - Prince of Persia Trailer | 90% | 100% |
| YouTube - Prince of Persia Trailer | 8% | 8% |

Not only did CPU utilization figures not go down, in many cases they went up. I asked Jarred to help me with a sanity check. He had a notebook based on the mobile version of the same chipset with an Athlon 64 X2 QL-64 (dual core 2.0GHz) and ran his own numbers:

| Windowed Average CPU Utilization | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| YouTube HD 720p - Prince of Persia Trailer | 46% | 46.5% |

There was essentially no change in CPU utilization when moving from Flash 10.0 to 10.1.

The two of us did notice something however. Flash 10.1, although not perfect on AMD hardware, did seem to improve performance. Jarred measured the number of dropped frames between Flash 10.0 and 10.1 in our YouTube HD test:

| Windowed # of Frames Dropped (lower is better) | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| YouTube HD 720p - Prince of Persia Trailer | 289 frames | 212 frames |

There’s a definite improvement in 10.1, but just not nearly as much as we saw from NVIDIA.

I tried a few more things before giving up on AMD. I tossed in a Radeon HD 5850 to see if it was the integrated GPU at fault - still no change in CPU utilization. Finally I upgraded processors and used an Athlon II X2 240 instead of the meager Sempron.

| Full Screen (1920 x 1200) Average CPU Utilization | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| Hulu Desktop - The Office - Murder (Sempron LE-1150) | 100% | 100% |
| Hulu Desktop - The Office - Murder (Athlon II X2 240) | 80% | 72% |

CPU utilization finally went down, but not nearly as much as what we saw with NVIDIA. There’s something not quite right about how AMD’s hardware interacts with the Flash 10.1 preview; I guess that’s why they’re calling it a prerelease.



Flash 10.1 on GM45 and ION Laptops

As Anand mentioned, I ran some tests on laptops as a sanity check. Besides the AMD numbers (ATI HD 3200 using a Gateway NV52 laptop), I also ran tests on an HP Mini 311 (NVIDIA ION LE) and a Gateway NV58 (Intel GMA 4500 MHD). My results with the ION LE laptop are similar to Anand's experience, except that I didn't have an external display so I used the native 1366x768 laptop LCD. The difference between Flash 10.0 and 10.1 is absolutely stunning on an ION-based netbook. I conducted all of the laptop testing with the videos running in fullscreen mode.

HP Mini 311 (ION LE)
Full Screen 1366x768 Performance

| Test | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| Hulu HD 720p - LOTS - Avg. CPU | 98% | 66% |
| Hulu HD 720p - LOTS - FPS | 1.1 | 24.2 |
| Hulu 480p - The Office - Avg. CPU | 92% | 66% |
| Hulu 480p - The Office - FPS | 7.1 | 27.6 |
| YouTube HD 720p - PoP - Avg. CPU | 90% | 69% |
| YouTube HD 720p - PoP - FPS (Dropped) | 10.5 (1519) | 24.0 (0) |

Using Flash 10.0, the ION netbook is horrible for Flash video. Standard definition movies on YouTube are about as good as it gets, and there's still obvious frame dropping when running in fullscreen mode. HD movies range from dropping about one third of the frames to dropping well over half of the frames, and that's at 720p. With YouTube now starting to support 1080p videos, things only get worse. We averaged around three frames per second on a 30 FPS video. Hulu is even worse, with SD video managing just 7.1 FPS and a 720p video running a 1 FPS slideshow.

Upgrade to Flash 10.1 and pretty much all of the problems mentioned above are gone. Average CPU utilization drops by 20 to 35 percentage points and every video we tested worked without a hitch (provided we used the &fmt=22 workaround mentioned earlier). Hulu's 720p Legend of the Seeker (one of their few HD videos at present) ran at a buttery smooth 24 FPS. Needless to say, your typical netbook using an Intel GMA 950 isn't going to be able to do any of this stuff, regardless of which version of Flash you're running.

Moving on to the Gateway NV58 with GMA 4500MHD....

Gateway NV58 (GMA 4500MHD)
Full Screen 1366x768 Performance

| Test | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| Hulu HD 720p - LOTS - Avg. CPU | 76% | 56% |
| Hulu HD 720p - LOTS - FPS | 25.3 | 24.5 |
| Hulu 480p - The Office - Avg. CPU | 72% | 62% |
| Hulu 480p - The Office - FPS | 33.5 | 10.2 |
| YouTube HD 720p - PoP - Avg. CPU | 52% | 41% |
| YouTube HD 720p - PoP - FPS (Dropped) | 26.2 (0) | 24.0 (0) |

Things were a bit more interesting on the NV58. First, we really didn't have any trouble watching any of the videos in full screen mode using Flash 10.0. CPU usage was rather high on the 2.1 GHz T6500 processor, but there were no noticeable frame drops. Both Hulu videos had CPU utilization at above 70%, with spikes hitting 95%. The YouTube 720p video we looked at didn't require nearly as much CPU power, and it didn't drop any frames. One oddity worth noting is that frame rates actually tended to be slightly higher than the video content, though it didn't cause any noticeable distortion.

Updating to Flash 10.1 was a mixed bag. The good news is that CPU utilization dropped by 11 points on the YouTube 720p video. The frame rate also locked in at 24 FPS, which is what you would expect since the source movie is 24 FPS. Our Hulu HD 720p movie dropped CPU usage by 20 points, again with frame rates running at the expected 24 FPS (give or take). The anomaly was the Hulu SD video, where CPU usage dropped 10 points but frame rates went from a smooth 33 FPS down to 10 FPS. Unfortunately, looking around Hulu, the vast majority of their videos appear to have this problem on the GMA 4500MHD.
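The Hulu SD result is a good reminder that lower CPU usage alone doesn't mean better playback. As a rough illustration, the sketch below flags any test where CPU usage fell but the frame rate collapsed. The data comes from the NV58 table above; the function name `playback_regressions` and the 80% cutoff are assumptions picked for this example, not thresholds we used in testing.

```python
# Gateway NV58 full-screen results from the table above:
# (CPU % on 10.0, CPU % on 10.1, FPS on 10.0, FPS on 10.1).
NV58 = {
    "Hulu HD 720p": (76, 56, 25.3, 24.5),
    "Hulu 480p": (72, 62, 33.5, 10.2),
    "YouTube HD 720p": (52, 41, 26.2, 24.0),
}

# Assumed cutoff: anything below 80% of the old frame rate counts as a regression.
MIN_FPS_RATIO = 0.8


def playback_regressions(results, min_ratio=MIN_FPS_RATIO):
    """Return tests where CPU usage fell but the frame rate fell much further."""
    flagged = []
    for test, (cpu_old, cpu_new, fps_old, fps_new) in results.items():
        if cpu_new < cpu_old and fps_new < min_ratio * fps_old:
            flagged.append(test)
    return flagged


print(playback_regressions(NV58))
```

With these numbers only the Hulu 480p test is flagged; the two 720p clips settle near their 24 FPS source rate and stay above the cutoff.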

Considering the problems we had with ATI video playback and Flash 10.1, the problem appears to be either graphics drivers or incomplete support for non-NVIDIA hardware in Flash 10.1. We expect this is one of those areas Adobe will work on during the next couple of months prior to the official launch of Flash 10.1.



ATI and Intel Update, 11/19/2009:

After uninstalling Flash 10.1, reinstalling, rebooting, and switching to the High Performance power profile (instead of Balanced), some of the Hulu problems noted on the previous page seemed to clear up slightly. We already tested with the latest Intel drivers, so that wasn't the issue. Additional testing revealed that if you disable GPU acceleration with 10.1 (and restart your browser), the Hulu 480p problems are not present. With GPU acceleration enabled, however, we continue to have difficulties with Hulu 480p playback on the GMA 4500MHD on all the videos we've tested. The 360p videos work without any problems. Here are the updated results, including results from the Gateway NV52 HD 3200 laptop using the Catalyst 9.11 drivers. We've also added the data for 10.1 with GPU acceleration disabled as a point of reference.

Intel GMA 4500MHD (Gateway NV58)

Updated Gateway NV58 (GMA 4500MHD)
Full Screen 1366x768 Performance

| Test | Flash 10.0 | Flash 10.1 (GPU) | Flash 10.1 (No GPU) |
| --- | --- | --- | --- |
| Hulu 720p - CPU | 61% | 37% | 69% |
| Hulu 720p - FPS | 26.3 | 24.7 | 25.3 |
| Hulu 480p - CPU | 58% | 56% | 68% |
| Hulu 480p - FPS | 35.9 | 10.9 | 33.9 |
| YouTube 720p - CPU | 32% | 24% | 37% |
| YouTube 720p - FPS (Dropped) | 26.5 (0) | 24.0 (0) | 19.5 (104) |

Starting with Intel, the results have only changed slightly. We can now use Flash 10.1 in all cases, but we have to disable GPU acceleration for certain videos. This may be an issue similar to NVIDIA stating that ION has problems with YouTube HD videos that are 854 pixels wide; hopefully it will be cleared up with driver and/or Flash updates. HD Flash on the other hand definitely benefits from the GPU acceleration and DXVA in Flash 10.1. The Hulu HD Legend of the Seeker video has CPU usage drop 24% while the 720p Prince of Persia trailer on YouTube reduces CPU usage by 8%. Hulu's The Office does reduce CPU usage 2%, but frame rates drop from 30+ FPS to only 10 FPS.

Turning off GPU acceleration in Flash 10.1 shows where and how much the 4500MHD is helping. The YouTube HD trailer drops to around 20 FPS with occasional dropped frames causing noticeable stuttering, and CPU usage jumps 13%. Hulu HD playback remains smooth, but CPU usage jumps 32%, so the DXVA acceleration clearly helps a lot in this instance. Standard Hulu videos like The Office return to a smooth frame rate, but CPU usage is 10% higher than Flash 10.0. Overall, since the Intel GMA 4500MHD with a T6500 CPU manages to handle Flash video up to 720p in full screen mode using Flash 10.0, the 10.1 update isn't critical right now. If you're using a CULV processor (or a display with a higher resolution), Flash 10.1 may be more beneficial. We'll look at that scenario in a future article.

ATI HD 3200 (Gateway NV52)

Gateway NV52 (ATI HD 3200)
Full Screen 1366x768 Performance

| Test | Flash 10.0 | Flash 10.1 (GPU) | Flash 10.1 (No GPU) |
| --- | --- | --- | --- |
| Hulu 720p - CPU | 76% | 56% | 76% |
| Hulu 720p - FPS | 13.2 | 24.5 | 24.5 |
| Hulu 480p - CPU | 72% | 62% | 73% |
| Hulu 480p - FPS | 12.7 | 34.9 | 31.3 |
| YouTube 720p - CPU | 53% | 22% | 42% |
| YouTube 720p - FPS (Dropped) | 26.0 (0) | 24.0 (0) | 21.3 (103) |

With the updated Catalyst 9.11 drivers, our results were a lot better than before. Previously, using Flash 10.0 we were unable to view either of the Hulu videos (720p or 480p) in full screen mode without severe stuttering. YouTube HD on the other hand worked fine with 0 dropped frames. Moving to Flash 10.1 with DXVA GPU acceleration, we now see smooth frame rates on all Hulu content and lower CPU usage for both Hulu and YouTube videos. YouTube CPU usage on the Prince of Persia trailer drops 31%, Hulu's Legend of the Seeker drops CPU use 20% while nearly doubling the frame rate (i.e. from dropping half the frames to showing everything), and 480p Hulu drops CPU usage 10% with frame rates almost tripling (from ~13 FPS to over 30 FPS for what appears to be 30 FPS video content).

Disabling the GPU acceleration in Flash 10.1 still results in a better experience at Hulu than Flash 10.0, with roughly the same CPU load but no stuttering. YouTube HD is similar to the GMA 4500MHD in this case, with a frame rate of 21 FPS and slight stuttering. Unlike the Intel platform, if you have an ATI card and a moderate CPU it appears that Flash 10.1 is a clear win.



Huge Improvements under OS X

The release notes for the Flash 10.1 preview say the following about cross-platform hardware accelerated H.264 decoding support:

In Flash Player 10.1, H.264 hardware acceleration is not supported under Linux and Mac OS. Linux currently lacks a developed standard API that supports H.264 hardware video decoding, and Mac OS X does not expose access to the required APIs. We will continue to evaluate adding the feature to Linux and Mac OS in future releases.

Ouch. Linux isn’t ready and Apple isn’t open enough. That’s not to say that there aren’t major performance gains to be had.

I took the same Office clip I’d been using for all of the other tests and ran it on my Mac Pro at full screen (2560 x 1600). Using Activity Monitor I looked at the CPU utilization of the Flash Player plug-in. I compared both versions of Flash and saw a significant drop in CPU utilization:

| Hulu Full Screen (2560 x 1600) Average CPU Utilization | Flash 10.0.32.18 | Flash 10.1.51.45 |
| --- | --- | --- |
| Hulu 480p - The Office - Murder | 450% | 190% |

Going from roughly 450% down to 190% (or a bit over 10% of total CPU utilization across 16 threads) made full-screen Hulu playable on my machine. In the past I always had to run it in a smaller window, but thanks to Flash 10.1 I don’t have to any longer.
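The percentages above look odd until you remember how they are counted. The short sketch below assumes that the per-process figure treats 100% as one fully loaded hardware thread, so dividing by the 16 threads of this Mac Pro gives the share of the whole machine. The function name `share_of_machine` is purely illustrative.

```python
def share_of_machine(process_cpu_percent, hardware_threads):
    """Convert a per-process figure (100% = one thread) to a share of the whole machine."""
    return process_cpu_percent / hardware_threads


THREADS = 16  # the two-socket, 16-thread Mac Pro described at the top of the article

for version, percent in (("Flash 10.0.32.18", 450), ("Flash 10.1.51.45", 190)):
    share = share_of_machine(percent, THREADS)
    print(f"{version}: {share:.1f}% of total CPU")
```

Under that assumption, Flash 10.0 was eating about 28% of the entire machine for a single 480p stream, and Flash 10.1 brings it down to just under 12%.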

With actual GPU-accelerated H.264 decoding I’m guessing those CPU utilization numbers could drop to a remotely reasonable value. But it’s up to Apple to expose the appropriate hooks to allow Adobe to (eventually) enable that functionality.

Until then, even OS X users have something to look forward to with the Flash 10.1 upgrade.

Final Words

It's finally here. GPU accelerated video decode for Adobe Flash. Grab the preview and let us know how it fares on your system in the comments.
