60 Comments

  • 3930K - Tuesday, May 15, 2012 - link

    Why is there a * next to the SB quad core?
  • phoenix_rizzen - Tuesday, May 15, 2012 - link

    Most likely to show that it gets a boost from DXVA, but not from the actual OpenCL code. So, while it's using the OpenCL option, it's not actually using the OpenCL code path.

    It's explained in the paragraph above the graph.
  • 3930K - Tuesday, May 15, 2012 - link

    Oh, I see, thanks.
  • mevans336 - Tuesday, May 15, 2012 - link

    I was wondering that also? Maybe there was an nVidia GPU used on the Sandy Bridge platform since there are results for the same i7-2820QM lower on the chart?
  • JMS3072 - Tuesday, May 15, 2012 - link

    Can you give a preview of how well Handbrake will perform on a system with higher-powered discrete graphics? If it's good enough, AMD might have just gained my loyalty for HTPC graphics.
  • A5 - Tuesday, May 15, 2012 - link

    Theoretically, there is nothing that prevents OpenCL code from working on NVidia parts as well. Would be interesting to see a comparison between SI and Kepler, though.
  • Doormat - Tuesday, May 15, 2012 - link

    2nd testing this on discrete hardware - mid-range AMD/NV, high end AMD/NV. Nice to see someone taking up OpenCL branches of x264.
  • WeaselITB - Tuesday, May 15, 2012 - link

    Thirded. This would be a very interesting point of comparison.
  • Anand Lal Shimpi - Tuesday, May 15, 2012 - link

    Not quite yet, the current build hasn't really been optimized or tested for dGPU usage. We were asked to limit our testing to integrated solutions for the time being.

    Take care,
    Anand
  • Denithor - Tuesday, May 15, 2012 - link

    Crap.

    I see this as the one place where nVidia's decision not to launch Big Kepler may actually bite them in the ass.

    The compute power of the 79x0 cards versus 670/680 is undeniably stronger. And in this kind of app it might actually make a true difference.
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    ROFL - amd doesn't have this ready ? Why imagine that, how so terribly unusual for them.... hahahhaha

    Well, what about the 2 compute benches 79xx won against 680 who won 3?

    If 79xx can pull out this "hopefully coming" test they can even up the compute bench scoring... and can be one step short even then of showing more compute performance instead of dead paperweighted claims.
  • thefoodaddy - Tuesday, May 15, 2012 - link

    oh boy oh boy oh boy FINALLY.
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    "It's not ready for prime time".... so finally is still waiting.
  • Khato - Tuesday, May 15, 2012 - link

    As per the subject, are we certain that the IVB sample actually was using GPU OpenCL acceleration? I ask because IVB is showing roughly the same 1FPS per core increase going from the public to beta Handbrake as the two SNB samples. That implies that the gains are due only to the DXVA decode acceleration just as on SNB. Well, either that or IVB's GPU OpenCL acceleration potential is non-existent, which really doesn't seem right.
  • Khato - Tuesday, May 15, 2012 - link

    And much as I hate replying to myself... Looks like I may have found an answer to my question. Apparently the IVB GPU OpenCL implementation is rather finicky - http://software.intel.com/en-us/forums/showthread....
  • name99 - Tuesday, May 15, 2012 - link

    "The open source community thus far hasn't been very interested in supporting Intel's proprietary technologies. As a result, Quick Sync remains unused by the applications we want to use for video transcoding."

    Is this an honest description of the situation?
    I thought the problem was more some combination of
    - Intel doesn't provide good docs about how to use it AND
    - it's privileged so you have to go through the OS to use it --- which means you're gated by what the OS is willing or not willing to provide you.
  • TerdFerguson - Tuesday, May 15, 2012 - link

    It's a fair assessment of the situation. The engineer who devised quicksync introduced himself to the x264 team in a public forum and they acted like jerks instead of recognizing the gift dropped in their laps.
  • Manabu - Wednesday, May 16, 2012 - link

    That is not the full history. Later, Dark Shikari, one of the main developers of x264, said:

    "Since the original Intel failure, I have learned quite a bit more about the lower-level details, and I'd quite love to explain more, but unfortunately I am now deep into NDA territory. If this means people are going to blame x264 for QuickSync's failings, well, unfortunately there's not much I can legally do about it anymore."

    > does that mean you are now technically able to allow some parts of x264
    > encoding to be done by quicksync? If so is this support going to be added?

    "Maybe yes, probably not. There are some pretty devastating technical limitations."

    Source: http://forum.doom9.org/showthread.php?p=1511469#po...
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    Same source: " Originally Posted by Dark Shikari View Post
    If you set the bitrate sufficiently high, the quality difference between encoders becomes negligible
    That's the whole point. So if the quality differences are consistently negligible, why wouldn't you favor the encoder that is magnitudes faster?

    Quote:
    Originally Posted by Dark Shikari View Post
    Twice as fast at what settings? You cannot validly claim "X is faster than Y" if you told Y to go slowly.
    Please refer to the testing methodology of the AnandTech article on the first page of this thread. You may also refer to the TomsHardware benchmark comparisons. Although quality isn't discussed with that article you can see a 3x speed-up against CUDA-based encodes, and 6x against software-only encodes using a commercial product, MediaExpresso. Another TomsHardware benchmark of Quick Sync against the AMD Llano APU can be found here. Outside of the obvious hardware differences, we're talking 46 seconds versus 3:13 minutes.

    Not trying to begin another quality vs. performance argument, so I'll just leave it at that to let you deal with the facts on your own."

    LOL - nice try
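
For readers checking the arithmetic in the quoted post: the two times given for the Quick Sync versus Llano comparison (46 seconds versus 3:13) work out to roughly a 4.2x difference. A minimal Python sketch, where `to_seconds` is a hypothetical helper written for this illustration and the only inputs are the two times quoted above:

```python
def to_seconds(stamp: str) -> int:
    """Convert 'SS', 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Times taken from the quoted post; nothing else here is measured data.
quick_sync = to_seconds("46")
llano = to_seconds("3:13")

ratio = llano / quick_sync
print(f"{llano} s / {quick_sync} s = {ratio:.1f}x")
```

This says nothing about output quality or the settings used, which is the point Dark Shikari raises in the same exchange.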
  • SikSlayer - Tuesday, May 15, 2012 - link

    Does Handbrake OpenCL acceleration work for Nvidia GPU users?
  • vincentlaw - Tuesday, May 15, 2012 - link

    Where can we find this build? I've been looking at the nightlies, but there's no explicit statement of support for OpenCL, so I'm not sure which one I need to grab. I know it's not ready for prime time, but for testing purposes, I'm willing to experiment.
  • JarredWalton - Tuesday, May 15, 2012 - link

    The OpenCL code at present is developed in house by AMD and they will eventually provide all of the code and documentation back to the open source community. However, for now it is all under tight wraps and there is no way to get it (other than from AMD).
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    No, amd is an evil proprietary company who does not practice what it preaches.
    Shouldn't be a big surprise to amd fans anymore.
  • ltcommanderdata - Tuesday, May 15, 2012 - link

    BenchPress was a bit over the top in replying to many comments with it in the Ask the Experts article, but I think it is a worthwhile question here. Does Handbrake 0.9.6 currently make full use of SSE4.x and AVX? If not, what type of speedup would SSE4.x and AVX support bring and how would it compare to these OpenCL results? If the speedup turns out to be fairly comparable, is adopting SSE4.x/AVX easier for these programs that already make use of other SSE versions than adopting OpenCL? Since that would be a disincentive to GPGPU uptake.
  • BenchPress - Tuesday, May 15, 2012 - link

    I'm pretty sure it uses SSE4, but AVX is unlikely since it's basically only floating-point.

    However, AVX2 extends all integer SIMD instructions to 256-bit as well, doubling the throughput compared to SSE4. And other goodies like gather support, vector-vector shift and any-to-any permutes may help as well. BMI and TSX won't hurt either.
  • ilovegpu - Tuesday, May 15, 2012 - link

    Handbrake uses x264 for encode, x264 for sure uses SSE/AVX
  • BenchPress - Wednesday, May 16, 2012 - link

    I checked the code and it only uses AVX-128, which is basically the same as SSE4 but with non-destructive destination operands. So it's not gaining any benefit from the 256-bit floating-point instructions, as expected.

    AVX2 will extend the integer SIMD instructions to 256-bit as well, effectively doubling the throughput and giving GPGPU a run for its money.
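
The "doubling" claim above is about register width: the same integer element types fit twice as many lanes in a 256-bit register as in a 128-bit one. A small Python sketch of that arithmetic, assuming peak per-instruction width only (the `lanes` helper is hypothetical, and real encoder speedups also depend on memory bandwidth and on how much of the code is vectorizable):

```python
SSE_BITS = 128   # XMM register width (SSE4 and the AVX-128 forms)
AVX2_BITS = 256  # YMM register width for integer operations under AVX2

def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements one vector instruction operates on."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits

# Typical integer widths in video code: 8-bit pixels, 16-bit intermediates.
for element_bits in (8, 16, 32):
    narrow = lanes(SSE_BITS, element_bits)
    wide = lanes(AVX2_BITS, element_bits)
    print(f"{element_bits:>2}-bit elements: {narrow:>2} lanes at 128-bit, "
          f"{wide:>2} lanes at 256-bit")
```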
  • saneblane - Tuesday, May 15, 2012 - link

    Trinity just got a lot more interesting. Video conversion is the most demanding thing that 85 percent of people do. And with this, Amd is back in the game. Interesting time ahead indeed.
    Wow, seems like Trinity my be the swiss army knife of processors.
  • saneblane - Tuesday, May 15, 2012 - link

    "Trinity might be the swiss army knife of processors". that should have read. Wow, this site needs an edit button.
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    And just moments ago video conversion was a who cares piece of crap that no real mobile gamer could give a crap about.
    My my, how times have changed.
  • Byte - Tuesday, May 15, 2012 - link

    Will this work on Brazos E series APUs?
  • ET - Wednesday, May 16, 2012 - link

    Yes, I'd love to see Brazos in the benchmark. I know it's not a platform most people will use for video transcoding, but I'm curious.
  • cosmotic - Tuesday, May 15, 2012 - link

    Should say 16.5 I'm guessing.
  • mavere - Tuesday, May 15, 2012 - link

    These results are very promising and will tremendously benefit notebook users.

    However, x264's developer wrote about OpenCL-x264 today on another forum: "It's buggy and nowhere near ready for serious usage [...] Buggy drivers aren't helping either."

    :(
  • Impulses - Tuesday, May 15, 2012 - link

    Hasn't that been the story for ages now? Apparently coding GPU acceleration is no cake walk.
  • Beenthere - Tuesday, May 15, 2012 - link

    AMD is definitely moving in the right direction with Trinity and I'm buying a laptop with one to support their efforts.
  • ncrubyguy - Tuesday, May 15, 2012 - link

    When watching the first pass of a 2 pass encode (x264 cli w/ 8 threads), it's pretty obvious the other threads are sitting around waiting for the decode thread. At a bare minimum this should make the first pass much faster.
  • JoeDaddy0710 - Wednesday, May 16, 2012 - link

    Please tell me this is on the list of tests for the Desktop Trinity chips when they come around Q3? Also, add some kind of normal and fast profiles of Handbrake so we can see the spread of performance vs. quality. I have found some of the fast or normal profiles to be adequate for my needs when transcoding.

    I currently record live TV to an old Athlon X2 home server and that old thing takes longer to transcode the 1080i MPEG2TS files than it takes to watch the shows. Instead of making that old dog do the work I copy the files to my Phenom II X4 to transcode them to 720 mp4 files. Ideally I would like to build a new home server using a 65W TDP chip (Trinity or i5 IVB). I am really curious to see how Handbrake OpenCL pans out and I have always been a fan of AMD due to the great value of their processors.
  • Klimax - Wednesday, May 16, 2012 - link

    FFDShow-tryout can probably use QuickSync. (It is in source tree, but I have "only" 3930k, so I cannot test it; might be just decoder part)

    Probably some other mainly smaller projects might have it too.
  • JKnows - Wednesday, May 16, 2012 - link

    Where could I download this beta version of Handbrake? I can only find the latest public 0.9.6 version.
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    AMD has their evil proprietary iron fist on the code. Good luck.
  • Roman2K - Wednesday, May 16, 2012 - link

    This is one of the most interesting AnandTech articles I've read in years.

    First, because I'm extremely excited by delegation of tasks to GPUs in general, and second, because x264 transcoding performance is one less reason to buy Intel instead of AMD processors.

    With this criterion out of the way, I don't really care about x86 performance anymore. Let's hope the power efficiency of Vishera will be on par with Intel's.

    Thanks to AMD for pushing OpenCL forward (in their own interest as well as consumers', and even Intel's, who have Linux OpenCL drivers for their iGPUs) and to AnandTech for both unveiling terrific news and publishing some comparison tests.
  • aegisofrime - Wednesday, May 16, 2012 - link

    It's not that simple.

    Remember that lookahead is the only function offloaded to OpenCL at this point (probably forever). A lot of people have asked about GPGPU support for x264 and the developers have always said that the only function feasible for GPGPU is lookahead, which is what you are getting now. Other functions will still run on the CPU.

    So, don't be so quick to throw away that Intel CPU.
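
The point about only lookahead being offloaded can be put in numbers with Amdahl's law: if only a fraction of the encode gets faster, the rest still bounds the total. A rough Python sketch, assuming a simple serial model with no CPU/GPU overlap and no transfer overhead; the lookahead shares below are hypothetical, not measurements of x264:

```python
def overall_speedup(offloaded_fraction: float, offload_speedup: float) -> float:
    """Amdahl's law: speedup of the whole job when only part of it gets faster."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if offload_speedup <= 0.0:
        raise ValueError("offload_speedup must be positive")
    remaining = 1.0 - offloaded_fraction
    return 1.0 / (remaining + offloaded_fraction / offload_speedup)

# Hypothetical shares of encode time spent in lookahead; not measured values.
for fraction in (0.10, 0.25, 0.50):
    with_gpu = overall_speedup(fraction, 10.0)
    ceiling = 1.0 / (1.0 - fraction)  # limit if lookahead took zero time
    print(f"lookahead = {fraction:.0%} of runtime: {with_gpu:.2f}x overall "
          f"with a 10x faster lookahead, at most {ceiling:.2f}x")
```

Under these assumptions, even an infinitely fast GPU cannot push the encode beyond the ceiling set by the work left on the CPU.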
  • BenchPress - Wednesday, May 16, 2012 - link

    The AVX2 instruction set extension will double the CPU's throughput, and also adds 'gather' support (accessing eight memory locations with a single instruction).

    This is in fact the same sort of technology which makes a GPU fast at throughput computing, but next year it will be merged into the CPU. So there's no need to get excited about delegating tasks to the GPU when the CPU has the same capabilities. Its cache and out-of-order execution even give it unique advantages in avoiding memory bottlenecks and stalls.

    Note that in the above benchmarks the Trinity GPU still loses against the Intel CPU. That gap will widen with AVX2, and I don't see how AMD could counter that unless they implement AVX2 as well and sell us more cores (modules) for less.
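
For anyone unfamiliar with "gather": it loads several non-contiguous elements, selected by a vector of indices, into one vector register. A scalar Python emulation of the semantics only (the `gather` function and the sample table are hypothetical, and this says nothing about how fast the hardware instruction is):

```python
def gather(memory, indices):
    """Scalar emulation of a vector gather: one result lane per index."""
    return [memory[i] for i in indices]

# A lookup table and eight scattered positions, matching the eight
# 32-bit lanes of a 256-bit register.
table = [10 * n for n in range(32)]
indices = [3, 17, 4, 29, 0, 8, 21, 12]

result = gather(table, indices)
print(result)
```

Without gather, vector code has to do these eight loads one at a time and then pack the results, which is why table lookups are awkward to vectorize.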
  • Roman2K - Thursday, May 17, 2012 - link

    @aegisofrime & BenchPress
    Thanks for your informative replies.
  • CeriseCogburn - Thursday, May 24, 2012 - link

    " Note that in the above benchmarks the Trinity GPU still loses against the Intel CPU. That gap will widen with AVX2, and I don't see how AMD could counter that unless they implement AVX2 as well and sell us more cores (modules) for less."

    Amd, too little, too late. It's not a problem though, fanboy fever will take care of the sad realities, at least for the vast majority here.

    I love the new "cpu doesn't really matter for a laptop" lines we're getting now, too, from the very same. The fever is reaching near 106F and stroking out comes soon (at least they will die happily in ignorant bliss).
    LOL

    I'm going to have to wait at least one more amd generation, perhaps two, until their apu/GPU part, really is worth something. I see HD4000 winning or tying about half the tests, not that either contestant has playable frame rates (portal2 excepted).
  • Riek - Thursday, May 24, 2012 - link

    AMD will also support AVX2 eventually, so moot point.

    Intel won't have AVX2 for at least a year, and even then it's not clear-cut when an application will support it either.

    AMD delivers the same performance in above benchmarks as a higher TDP intel part that has a far bigger cpu and costs double or more.
  • gcor - Wednesday, May 16, 2012 - link

    I ask because I used to work on a telecoms platform that used PPC chips, with vector processors that *I think* are quite analogous to GPGPU programming. We offloaded as much as possible to the vector processors (e.g. huge quantities of realtime audio processing). Unfortunately it was extremely difficult to write reliable code for the vector processors. The software engineering costs wound up being so high that, after 4-5 years of struggling, the company decided to ditch the vector processing entirely and put in more general compute hardware power instead. This was on a project with slightly fewer than 5,000 software engineers, so there were a lot of bodies available. The problem wasn't so much the number of people as the number of very high calibre people required. In fact, once we had migrated back to generalised code, compiler support for the vector processing was removed from the build system to ensure that it could never be used again. Those vector processors now sit idle in telecoms nodes all over the world.

    Anyway, I hope the problem isn't intrinsically too hard for mainstream adoption. It'll be interesting to see how x264 development gets through its present quality issues with OpenCL.
  • gcor - Wednesday, May 16, 2012 - link

    Oops, I forgot to also ask...

    Wasn't the lack of developer take up of vector processing one of the reasons why Apple gave up on PPC and moved to Intel? Apple initially touted that they had massively more compute available than Windows Intel based machines. However, in the long run no, or almost no, applications used the vector processing compute power available, making the PPC platform no better.
  • gcor - Wednesday, May 16, 2012 - link

    Gah! Should have asked these questions on the other thread. Doh.
  • piroroadkill - Wednesday, May 16, 2012 - link

    Get that running on a powerful GPU and I'll be in heaven.
    Absolute x264 quality but faster than modern x86 can manage sounds too good.
  • jamawass - Wednesday, May 16, 2012 - link

    So I could purchase a $599 Envy ultrathin and still get comparable encoding time to a much more expensive i7 based ultrabook? Handbrake is one of my most frequently used apps. Sign me up!
  • JKflipflop98 - Sunday, May 20, 2012 - link

    No, your problem here is that you're buying a notebook to encode videos with. Build a desktop.
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    No, the sleekbook is $599, and is a 4lb 15" low end trinity lug and chug, and it is not an ultrabook. It comes out after the Intel models, so you'll have to wait as well.

    You can get base sleek for $599, bottom of the barrel.
    HP Envy SleekBook (15.6-inches): June 20 launch
    Price: $599
    CPU: AMD Trinity APU
    Weight: 4 lb
    Thickness: ~19.8 mm
    Screen: 15.6-inch 1366x768 pixel
    Memory: 4 GB 1600 MHz DDR3
    Storage: 320 GB HDD
    Connectivity: 802.11a/g/n
  • Riek - Thursday, May 24, 2012 - link

    The quad-core i7 costs 400, almost as much as a complete sleekbook, and with double or more the TDP.

    Even if it gets half the speed as those parts it would be a huuuuge win.
  • Marburg U - Thursday, May 17, 2012 - link

    Do these solutions offer high quality image settings? I mean, stuff like multipass, high profile, variable bitrate, and so on.

    If amd narrows its gap with intel on the speed side, intel can always invest some more $ in better software and superior video quality (and compression factor).

    Frankly speaking, quickly transcoding video for my smartphone is nice, but image quality is unacceptable for anything bigger than 7''.
  • jamawass - Friday, May 18, 2012 - link

    Handbrake's output quality is excellent for all outputs not just smartphones. Some of the tasks are just offloaded to the apu not the compression quality.
  • dado023 - Friday, May 18, 2012 - link

    I am curious,
    does OpenCL-encoded video result in the same video quality as CPU-encoded video? I ask because I often use the "placebo" setting in x264; this way I save file size.

    Also, when using OpenCL with x264, is it possible to fine-tune all x264 settings?

    Regards
    Dan
  • shin0bi272 - Monday, May 21, 2012 - link

    This is a great cpu for laptops, but there's a problem I see on the horizon. No one buys laptops to do video transcoding. This would be a great cpu for web browsing and video conferencing to lower power consumption and heat while maintaining performance. Might even be a good cpu for ultrabooks, home theater PCs, or even netbooks.... if anyone is still making those.

    But in the end, the only reason we are seeing this review is so that this cpu doesn't get spanked by a discrete cpu/gpu combo in gaming... And that's what the vast majority of anandtech.com readers want to see in their pc hardware testing. For the 5-10% who want to see other benchmarks, this review is for you...
  • fb39ca4 - Saturday, November 24, 2012 - link

    Where do I get the Handbrake build with the OpenCL?
