
Original Link: https://www.anandtech.com/show/2838
The Lynnfield Followup: Turbo Mode and Overclocking Investigated
by Anand Lal Shimpi on September 18, 2009 12:00 AM EST. Posted in CPUs
When Lynnfield launched I was on the other side of the country talking GPUs. AMD rented out the USS Hornet to tempt press in and kept them captive for several hours while divulging details of its new DX11 GPUs, the RV770 successors.
Needless to say, while I could read your comments about Lynnfield and what you desired, there was little I could do about it while sitting on (er, in) an aircraft carrier. Gary did a wonderful job with his follow-ups, but they managed to delay his P55 motherboard reviews. As soon as I landed, it was straight to work on the Lynnfield followup, and that's what I'm publishing today.
The first thing many of you asked for was turbo-off results for Lynnfield and Bloomfield. On top of that, some wanted results with the exact same memory frequencies. Bloomfield officially only supports up to DDR3-1066, while Lynnfield's memory controller is validated for operation at up to DDR3-1333. Clearly both will work at speeds up to and beyond DDR3-2000, but for stock comparisons I've always tried to stay within the validated limits of the platforms.
Below are our application benchmark results comparing Lynnfield to Bloomfield, with turbo disabled and both platforms running DDR3-1333 at 7-7-7-20 timings. Unlike our standard test beds, I've forced the CPUs into their highest performance mode so they are always running at 2.93GHz. The only difference between the two chips is that the Bloomfield is an underclocked Core i7 975, so its uncore runs at 2.66GHz compared to 2.40GHz for the Lynnfield. The real world impact of that difference is negligible:
On average, Lynnfield without turbo mode delivers 96.5% of the performance of Bloomfield. The extra memory controller of Bloomfield is responsible for a 3.5% performance advantage on average; it's just not going to do much at these clock speeds/core configurations.
There are exceptions to the rule. Some applications do benefit from Bloomfield's triple channel memory controller. We see a 15% difference in Windows Media Encoder and nearly a 10% difference in a few other applications. Ultimately it's up to you whether or not the performance difference is worth it, but enable turbo and the advantage clearly goes to Lynnfield as we showed in our initial review.
Lynnfield vs. Bloomfield: Overclocked and Without Turbo
The second request was to see how Lynnfield and Bloomfield stack up when overclocked with turbo disabled. At higher frequencies the demands on the memory subsystem go up, so it's more than a valid concern.
I took both systems and overclocked them to 3.8GHz, a level that wasn't too difficult to achieve (more on this later):
The average performance difference doesn't appear to change even as we scale up clock speed. Lynnfield actually does better here thanks to better than expected scores in the game tests (the on-die PCIe controller to blame?), but also falls behind in some other tests (e.g. x264). Overall the performance difference seems to hold even when overclocked; the performance you give up when going to Lynnfield at stock speeds with turbo disabled doesn't get any worse at overclocked speeds.
It's also worth noting that there are applications that we haven't tested that could demand even more of the memory subsystem, but on average, for most users I'd say that the third memory channel isn't worth the price difference.
Hitting 3.8GHz: The Good, The Bad and The Ugly
I picked 3.8GHz for the comparison on the previous page and I just wanted to share what I had to do to reach that frequency.
Bloomfield was by far the easiest to get up to 3.8GHz. I just increased the BCLK and the system POSTed at 3.8GHz. After going through several benchmarks I found that I needed to add a tiny bit of voltage (~40mV) to make it completely stable, but I really didn't have to do anything above and beyond that.
Lynnfield was a bit more difficult. After increasing the BCLK there was a lot more guessing and testing of voltage levels before I could get the system completely stable. As I mentioned in our Lynnfield launch article, thanks to the on-die PCIe controller any serious overclock will require a bit of voltage. I ended up running the chip at around 1.265V for full stability.
It was far easier to overclock Lynnfield if I just used voltages above 1.30V, but then I ran into another problem: heat. The chip wouldn't hit 3.8GHz regularly at such high voltages, although Gary's testing indicates that a bigger heatsink/fan could fix that. With some work you can definitely overclock Lynnfield using the retail heatsink/fan; it's just not nearly as easy as Bloomfield.
And finally we get to the Phenom II X4 965 BE. With Vista 32-bit installed, the Phenom II system had no problem running at 3.8GHz. However, all of our application tests run under a 64-bit OS, and this is the Phenom II's Achilles' heel. Getting the system stable at 3.8GHz in a 64-bit OS was the most difficult of the three overclocks I performed for this article. The chip required an uncomfortable amount of voltage, and ultimately I couldn't get my sample 100% stable at 3.8GHz in 64-bit Vista (although 32-bit OSes weren't an issue).
If you're curious, the performance gap between AMD and Intel does widen considerably at these higher frequencies:
Processor | Adobe Photoshop CS4 | DivX | x264 - 1st Pass | x264 - 2nd Pass | WME |
AMD Phenom II @ 3.8GHz | 19.5 seconds | 39.1 seconds | 85.0 fps | 22.2 fps | 26 seconds |
Intel Bloomfield @ 3.8GHz | 13.3 seconds | 28.8 seconds | 100.0 fps | 36.3 fps | 21 seconds |
Intel Lynnfield @ 3.8GHz | 13.6 seconds | 29.0 seconds | 95.7 fps | 33.9 fps | 24 seconds |
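To put the gap in the table above into percentages, each result can be normalized against Bloomfield. The following is a minimal Python sketch rather than anything used to produce the article's numbers. The names `RESULTS`, `LOWER_IS_BETTER`, `relative_performance` and `mean` are made up for illustration, and the simple arithmetic mean over these five tests is an assumption that will not match the averages quoted for the full benchmark suite.

```python
# Results copied from the 3.8GHz table above.
# Photoshop, DivX and WME are times in seconds (lower is better);
# the two x264 passes are in frames per second (higher is better).
RESULTS = {
    "Phenom II": {"Photoshop CS4": 19.5, "DivX": 39.1, "x264 1st": 85.0, "x264 2nd": 22.2, "WME": 26.0},
    "Bloomfield": {"Photoshop CS4": 13.3, "DivX": 28.8, "x264 1st": 100.0, "x264 2nd": 36.3, "WME": 21.0},
    "Lynnfield": {"Photoshop CS4": 13.6, "DivX": 29.0, "x264 1st": 95.7, "x264 2nd": 33.9, "WME": 24.0},
}
LOWER_IS_BETTER = {"Photoshop CS4", "DivX", "WME"}


def relative_performance(cpu, baseline):
    """Return {test: performance of cpu relative to baseline}; 1.0 means equal."""
    rel = {}
    for test, value in RESULTS[cpu].items():
        base = RESULTS[baseline][test]
        # For timed tests a smaller number is faster, so invert the ratio.
        rel[test] = base / value if test in LOWER_IS_BETTER else value / base
    return rel


def mean(values):
    """Plain arithmetic mean (an illustrative choice, not the article's method)."""
    values = list(values)
    return sum(values) / len(values)


for cpu in ("Lynnfield", "Phenom II"):
    rel = relative_performance(cpu, "Bloomfield")
    for test, ratio in rel.items():
        print(f"{cpu:10s} {test:14s} {ratio * 100:5.1f}% of Bloomfield")
    print(f"{cpu:10s} {'average':14s} {mean(rel.values()) * 100:5.1f}% of Bloomfield")
```

Across these five tests Lynnfield lands at roughly 95% of Bloomfield and the Phenom II at roughly 74%, with the second x264 pass showing the widest AMD deficit.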
Power Consumption While Overclocked
Guru3D made an important observation in their Lynnfield review: power consumption goes up considerably when you overclock. It's not just the overclock itself; it's the increase in core voltage that makes power consumption skyrocket. This is partly why I stress stock-voltage overclocking so much. Let me give you an example:
Processor | Stock Power Consumption | Power Consumption While Overclocked to 3.8GHz @ 1.3V |
Intel Core i7 870 | 181W | 215W |
That's a pretty hefty gain in power consumption, over 18%, but we get a 29.7% increase in clock frequency (2.93GHz to 3.8GHz). Remember my troubles getting the Phenom II X4 965 BE to work in 64-bit Windows? I ran some numbers to show exactly what a lot of extra voltage will do to power consumption:
Processor | Stock Voltage @ 3.4GHz | Stock Voltage @ 3.8GHz | +0.2Vcore, +0.1V NB @ 3.8GHz |
AMD Phenom II X4 965 BE | 223W | 239W | 300W |
Increasing the clock speed by 400MHz only drives up power consumption by 7%, but boosting voltage on top of that results in an additional 25% increase in power. When overclocking, you always want to raise the clock speed as much as possible while adding as little voltage as possible to maintain the most power efficient system.
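The percentages quoted for the two power tables can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: `percent_increase` is a hypothetical helper, and the 2.93GHz stock clock for the Lynnfield chip is an assumption taken from the frequency quoted earlier in the article.

```python
def percent_increase(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Lynnfield: 181W at stock, 215W at 3.8GHz and 1.3V (first table).
# Assumption: stock clock of 2.93GHz, as quoted earlier in the article.
lynnfield_power = percent_increase(181, 215)
lynnfield_clock = percent_increase(2.93, 3.8)

# Phenom II X4 965 BE: 223W -> 239W -> 300W (second table).
phenom_clock_only = percent_increase(223, 239)    # 3.4GHz -> 3.8GHz at stock voltage
phenom_voltage_only = percent_increase(239, 300)  # added voltage at 3.8GHz
phenom_total = percent_increase(223, 300)         # stock -> overvolted 3.8GHz

print(f"Lynnfield: +{lynnfield_clock:.1f}% clock for +{lynnfield_power:.1f}% power")
print(f"Phenom II: +{phenom_clock_only:.1f}% power from clock alone")
print(f"Phenom II: +{phenom_voltage_only:.1f}% power from the extra voltage")
print(f"Phenom II: +{phenom_total:.1f}% power in total")
```

The split is in line with the usual rule of thumb that dynamic power in CMOS logic scales roughly linearly with frequency but with the square of voltage. These are power figures for a whole system rather than the CPU alone, so they should not be expected to follow that rule exactly.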
GPU Limited Gaming Oddities
Scott Wasson first picked up on this anomaly in his GPU-limited FarCry 2 results at the bottom of this page. Jon Stokes pointed it out and our own Gary Key duplicated and expanded upon the results.
The situation is this: in some cases, Nehalem can go from being much faster than Phenom II, to being measurably slower within the same benchmark depending on resolution. Gary was the first to tie the issue to the GPU used. Gary found that NVIDIA GPUs appeared to behave this way on Nehalem/Phenom II while AMD GPUs didn't. In other words, NVIDIA GPUs were running faster on AMD hardware while AMD GPUs were running faster on Intel hardware. It's all very strange.
It's no surprise that Ryan and I are working on the reviews for AMD's next-generation DX11 GPUs due out before the end of September. I cloned my GPU testbed SSD and moved it over to my CPU testbeds. I then proceeded to run a subset of our GPU tests on the Core i7 920, Core i7 870, Core i5 750, Phenom II X4 965 BE and Core 2 Quad Q9450 on two different GPUs, a GeForce GTX 275 and a Radeon HD 4890.
Let's go through the results game by game, shall we?
I'll start with Gary's FarCry 2 benchmark. We're running in DX10 mode with the optimal quality defaults (latest patch) and 2X AA. This is much more GPU-bound than our normal CPU gaming tests, but that's exactly what we're looking for here. The benchmark of choice is "Ranch small", which comes with the game:
So I've duplicated Gary's results. The Nehalem cores all perform about the same; the i7 920 is a bit slower, apparently because it lacks the more aggressive turbo modes. But look at the Phenom II X4: it is significantly faster regardless of resolution. Now look at the same test with a Radeon HD 4890:
The Phenom II X4 965 BE advantage disappears completely. That's odd.
Next, I ran the FarCry 2 benchmark we're using for our upcoming GPU reviews. It's the Playback action demo with Ultra Quality defaults and 4X AA enabled. First on NVIDIA hardware:
The Core i7 920 falls a bit behind the other Nehalems and while the Phenom II X4 965 BE pulls ahead slightly at 2560 x 1600, the performance is generally GPU bound across the board. An unexpected result is that the Core 2 Quad Q9450 at 1680 x 1050 is actually CPU bound. There may just be a gaming reason to upgrade your CPU after all. Now let's switch to AMD hardware:
Now this is strange. The Core 2 Quad doesn't fall behind in performance, in fact it ties the Core i7 870 at 1680 x 1050. In other words, it doesn't appear to be CPU bound anymore at 1680 x 1050. Confused?
Let's keep going.
The next game I tested was Crysis Warhead. Again I ran all of the numbers in DX10 mode, this time with "Gamer" quality presets but with "Enthusiast" quality shaders. I ran the "frost" benchmark included with the initial version of the game.
All of the lines are overlapping as they should be; we're in a GPU limited situation after all. The 870 pulls ahead slightly at the end but it's nothing to get terribly excited about.
Switch to the Radeon HD 4890 and we now have an outlier. The Core i7 920 is measurably slower than everything else at 1680 x 1050. The only change we made was the graphics card/drivers. Next.
Dawn of War II is an RTS/RPG that includes a wonderful built-in benchmark. I ran with all settings maxed out in the game (including turning AA "on"):
At 1680 x 1050 we actually see some performance breakdown here. The Lynnfields are fastest, most likely due to faster turbo modes. The Core i7 920 is next on the charts, followed by the Phenom II X4 965 BE. At the bottom we have the Core 2 Quad Q9450. But at 2560 x 1600 they all converge at roughly the same point. Since many users run monitors at resolutions below 1920 x 1200, it's quite possible that the differences between these CPUs would be noticeable.
Things don't change too much as we switch graphics cards. The Phenom II X4 does a bit better with the Radeon HD 4890, but that's about the only change.
Left 4 Dead is next. All settings are maxed including Anisotropic Filtering at 16X. V-Sync is disabled and AA is set to 4X MSAA.
These numbers mostly make sense. The i7 870 is the fastest, followed by the i5 750 and the i7 920 - you have turbo to thank for that. The Phenom II is a bit slower and the Core 2 Quad is a lot slower. But by the time you hit 2560 x 1600, all roads lead to around 76 fps.
Similar behavior with ATI hardware, whew.
HAWX is a combat flight simulator that also doubles as a great DX10 benchmark. I ran the DX10 version of the game with all settings at their highest values with the exception of Ambient Occlusion, which was set to "low".
This is another one of those games where the Phenom II pulls ahead of the Nehalem processors even at a supposedly GPU-bound 2560 x 1600 resolution. The advantage isn't huge, about 7%, but the Core 2 Quad gives us some indication as to what's going on. The Q9450 actually beats everything here. Perhaps it's a large L2 thing? Now look at what happens with a Radeon HD 4890:
The Core 2 Quad still does better than everything else, but pretty much everything converges at the same point. The Phenom II advantage seems to disappear. So far we have HAWX and FarCry 2 exhibiting this behavior. Mental note, next benchmark please.
Our final test is Battleforge, a free-to-play online card-based RTS. I ran with all settings maxed out:
Here we see the opposite happening - the Phenom II X4 965 BE is far slower than anything else at 1680 x 1050. As expected, all CPUs tend to converge at the same point if you crank the resolution up high enough.
Switch graphics cards and the AMD disadvantage actually disappears. It's the opposite of what we've been seeing in games like FarCry 2 and HAWX where switching to an AMD GPU causes the AMD advantage to disappear.
What can we conclude from all of this data? Not much unfortunately. There are a couple of certainties:
1) Even at relatively stressful GPU settings, 1680 x 1050 with 4X AA enabled, some games are still CPU bound. The next generation of DX11 GPUs will make this even more true.
2) Gaming performance isn't totally clean cut between all of these CPUs. There are situations where Nehalem is faster, Penryn is faster or Phenom II is faster. The trend appears to be that Nehalem is generally the fastest, followed by Phenom II and only rarely does the Core 2 Quad end up on top.
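One simple way to reason about "CPU bound" versus "GPU bound" in charts like these is to look at how far apart the CPUs are at a given resolution: when every CPU lands on nearly the same frame rate, the GPU is the limit. The Python sketch below illustrates that idea only. The function names, the 5% threshold and the frame rates are all hypothetical and are not measurements from this article.

```python
def cpu_spread(fps_by_cpu):
    """Gap between the fastest and slowest CPU, as a fraction of the fastest."""
    fastest = max(fps_by_cpu.values())
    slowest = min(fps_by_cpu.values())
    return (fastest - slowest) / fastest


def looks_cpu_bound(fps_by_cpu, threshold=0.05):
    """Flag a resolution as CPU bound if the CPUs differ by more than `threshold`.

    The 5% default is an arbitrary illustrative cutoff, not an established rule.
    """
    return cpu_spread(fps_by_cpu) > threshold


# Hypothetical frame rates for three made-up CPUs, NOT data from the article.
example = {
    "1680 x 1050": {"CPU A": 110.0, "CPU B": 98.0, "CPU C": 80.0},
    "2560 x 1600": {"CPU A": 60.0, "CPU B": 59.5, "CPU C": 59.0},
}

for resolution, fps_by_cpu in example.items():
    verdict = "CPU bound" if looks_cpu_bound(fps_by_cpu) else "GPU bound"
    print(f"{resolution}: spread {cpu_spread(fps_by_cpu) * 100:.1f}% -> {verdict}")
```

A check like this would not explain the oddities described here, where one CPU family pulls ahead at a resolution that should be GPU limited, but it does make it easy to flag which results deserve a closer look.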
How do I explain the odd behavior that we've seen in some of these games? Honestly, I'm not sure if there's any one explanation. What appears to happen is a perfect storm of CPU power, GPU power, GPU drivers, cache sizes, clock speeds and instruction mix. In some cases it looks to be cache related as the Core 2 and Phenom II both do very well and have a noticeably larger L2 than Nehalem, but in other cases it's much more difficult to explain by any one variable. The fact that the situation changes almost entirely when switching to ATI hardware is what makes me believe the GPU driver is playing some role in all of this.
Ultimately it's not a big (or consistent) enough issue to get too worked up about, but it's definitely something real and not just a figment of testbed imagination. I've shared all of my data in the hope of figuring out exactly what's going on, but as I mentioned in my Lynnfield review, not all applications/games are going to play out the same way. I'll update you if I do find anything out.