Homework: How Turbo Mode Works

AMD and Intel both figured out the practical maximum power consumption of a desktop CPU. Intel actually discovered it first, through trial and error, in the Prescott days. At the high end that's around 130W; for the upper mainstream market it's 95W. That's why all high end CPUs ship with 120 - 140W TDPs.

Regardless of whether you have one, two, four, six or eight cores - the entire chip has to fit within that power envelope. A single core 95W chip gets to have its one core eating up all of that power budget. This is where we get very high clock speed single core CPUs from. A 95W dual core processor means that each core individually has to use less power than the core in a single core 95W processor, so tradeoffs are made: each core runs at a lower clock speed. A 95W quad core processor requires that each core uses less power than a core in either a single or dual core 95W processor, resulting in more tradeoffs: each core runs at a lower clock speed than in the 95W dual core processor.

The diagram below helps illustrate this:

[Diagram: TDP and the resulting clock speed tradeoff for Single Core, Dual Core, Quad Core and Hex Core processors]

The TDP is constant: you can't ramp power indefinitely, because you eventually run into cooling and thermal density issues. The variables are core count and clock speed (at least today); if you increase one, you have to decrease the other.
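As a rough illustration of that tradeoff, the sketch below splits a fixed TDP evenly among the cores and estimates how far each core's clock would have to drop. It rests on two assumptions that do not come from Intel or AMD: power divides evenly across cores (ignoring cache, the memory controller and other shared logic), and per-core power scales with the cube of frequency, a common rule of thumb when voltage has to rise with clock speed. The function names are made up for this example.

```python
def per_core_budget(tdp_watts: float, cores: int) -> float:
    """Power available to each core if the TDP is split evenly (an assumption)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return tdp_watts / cores


def relative_clock(cores: int, exponent: float = 3.0) -> float:
    """Clock speed relative to a single core chip with the same TDP.

    Assumes per-core power is proportional to frequency ** exponent.
    The default exponent of 3 is a rule of thumb, not a measured value.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return (1.0 / cores) ** (1.0 / exponent)


if __name__ == "__main__":
    for cores in (1, 2, 4, 6):
        budget = per_core_budget(95.0, cores)
        print(f"{cores} core(s): {budget:.1f} W per core, "
              f"{relative_clock(cores):.0%} of the single core clock")
```

The exact percentages depend entirely on the assumed exponent; the point is only that with the TDP held constant, every added core pushes the per-core clock down.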

Here's the problem: what happens if you're not using all four cores of the 95W quad core processor? You're only consuming a fraction of the 95W TDP because parts of the chip are idle, but your chip ends up being slower than a 95W dual core processor since it's clocked lower. The consumer thus has to choose between a faster dual core and a slower quad core processor.

A smart processor would realize that its cores aren't frequency limited, just TDP limited. Furthermore, if half the chip is idle then the active cores could theoretically run faster.

That smart processor is Lynnfield.

Intel made a very important announcement when Nehalem launched last year. Everyone focused on cache sizes, performance or memory latency, but the most important part of Nehalem was far more subtle: the Power Gate Transistor.

Transistors are supposed to act as light switches - allowing current to flow when they're on, and stopping the flow when they're off. One side effect of constantly reducing transistor feature size and increasing performance is that current continues to flow even when the transistor is switched off. It's called leakage current, and when you've got a few hundred million transistors that are supposed to be off but are still using current, power efficiency suffers. You can reduce leakage current, but you also impact performance when doing so; the processes with the lowest leakage can't scale as high in clock speed.

Using some clever materials engineering, Intel developed a very low resistance, low leakage transistor that can effectively drop any circuits behind it to near-zero power consumption; a true off switch. This is the Power Gate Transistor.

On a quad-core Phenom II, if two cores are idle, blocks of transistors are placed in the off-state but they still consume power thanks to leakage current. On any Nehalem processor, if two cores are idle, the Power Gate transistors that feed the cores their supply current are turned off and thus the two cores are almost completely turned off - with extremely low leakage current. This is why nothing can touch Nehalem's idle power.
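A toy model of that difference, using made-up numbers: the per-core active and leakage wattages below are placeholders chosen for illustration, not measurements of Phenom II or Nehalem, and `chip_power` is a hypothetical helper.

```python
def chip_power(active_cores: int, total_cores: int,
               active_watts: float, leakage_watts: float,
               power_gated: bool) -> float:
    """Estimate core power for a chip with some cores idle.

    An idle core that is merely switched off still leaks `leakage_watts`.
    A power gated idle core is treated as drawing nothing, which is a
    simplification of "near-zero".
    """
    if not 0 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 0 and total_cores")
    idle_cores = total_cores - active_cores
    idle_watts = 0.0 if power_gated else idle_cores * leakage_watts
    return active_cores * active_watts + idle_watts


if __name__ == "__main__":
    # Placeholder figures: 20 W per busy core, 5 W leaked per idle core.
    without_gate = chip_power(2, 4, 20.0, 5.0, power_gated=False)
    with_gate = chip_power(2, 4, 20.0, 5.0, power_gated=True)
    print(f"two idle cores, no power gate: {without_gate:.1f} W")
    print(f"two idle cores, power gated:   {with_gate:.1f} W")
    print(f"budget freed by gating:        {without_gate - with_gate:.1f} W")
```

Whatever the real figures are, the structure is the same: without a power gate the idle cores keep costing power, and with one that cost nearly vanishes.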

Since Nehalem can effectively turn off idle cores, it can free up some of that precious TDP we were talking about above. The next step then makes perfect sense. After turning off idle cores, let's boost the speed of active cores until we hit our TDP limit.
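The idea can be sketched as a loop that raises the clock of the active cores one step at a time and stops before the estimated power would exceed the TDP. This is only an illustration of the principle, not Intel's PCU logic, and it does not reproduce Lynnfield's actual turbo tables: the cubic power model, the default 133MHz step, the cap on the number of steps and every name in the code are assumptions made for the example.

```python
def core_power(freq_mhz: float, base_mhz: float, base_watts: float) -> float:
    """Assumed power model: per-core power grows with the cube of frequency."""
    return base_watts * (freq_mhz / base_mhz) ** 3


def turbo_frequency(active_cores: int, total_cores: int, tdp_watts: float,
                    base_mhz: float, step_mhz: float = 133.0,
                    max_steps: int = 5) -> float:
    """Highest clock the active cores can reach without exceeding the TDP.

    Idle cores are treated as power gated (drawing nothing). The base clock
    is defined as the speed at which all cores together use the full TDP.
    """
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    base_watts = tdp_watts / total_cores
    freq = base_mhz
    for _ in range(max_steps):
        candidate = freq + step_mhz
        total = active_cores * core_power(candidate, base_mhz, base_watts)
        if total > tdp_watts:
            break  # the next step would exceed the power budget
        freq = candidate
    return freq


if __name__ == "__main__":
    for active in (4, 3, 2, 1):
        mhz = turbo_frequency(active, total_cores=4,
                              tdp_watts=95.0, base_mhz=2660.0)
        print(f"{active} active core(s): {mhz:.0f} MHz")
```

With all four cores busy there is no headroom, so the clock stays at its base value; as cores go idle and are gated off, the remaining ones can climb until they hit either the TDP or the step cap.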

On every single Nehalem (Lynnfield included) there are around 1 million transistors (about the complexity of a 486) whose sole task is managing power. This logic turns cores off, underclocks them and is generally charged with making sure that power usage is kept to a minimum. Lynnfield's PCU (Power Control Unit) is largely the same as what was in Bloomfield. The architecture remains the same, although it has a higher sampling rate for monitoring the state of all of the cores and the demands on them.

The PCU is responsible for turbo mode.


341 Comments


  • nikrusty - Wednesday, November 18, 2009

    With this article Anandtech is Harder, Better, Faster Stronger.
    Seriously AWESOME ARTICLE! It cleared many of my doubts FLAT OUT! Now I know i5 is the way to go especially becoz I dont care about overclocking and just want good gaming performance...nothing screamingly extreme. Budget + Performance always keeps you level headed.
  • shiro - Wednesday, October 21, 2009

    what is that monster hoop of death heatsink that's on page 3? lol
  • Eeqmcsq - Saturday, September 19, 2009

    I asked a similar question in one of the other articles, so pardon me if this sounds repetitive.

    According to the Turbo charts, the slowest Turbo speed is higher than the stock speed. Why is that? For example, why not just make the 750 a stock GHz of 2.8 GHz instead of 2.66GHz?
  • Eeqmcsq - Saturday, September 19, 2009

    Argh, please ignore. Replied using the wrong Firefox tab.
  • The0ne - Tuesday, September 15, 2009

    Clear up what you're trying to show on the graphs please. You're getting more FPS at max setting than at min settings? Label the graphs like you did with the others please. With the others I can just look and understand what you're doing. With these, I'm scratching my head.
  • The0ne - Tuesday, September 15, 2009

    Ah, turbo mode represented in FPS >.>'
  • kkara4 - Monday, September 14, 2009

    over at bittech.net, they are saying that it is more worth it to go for the i7-920, if we are considering anything above the i5. this is a conflicting story, since anand is recommending the lynnfields. anand or anybody else for that matter could you please see their articles and tell me what they have done wrong? (or perhaps you guys failed to see something). Your article explains things in great technical detail which i can understand since i have studied microprocessors, hence i am more inclined to go for lynnfield. anyway if someone could cross check that would be good
  • mapesdhs - Tuesday, September 15, 2009


    If I've understood Anand's analysis correctly, the conclusion is that,
    for application mixes which involve a lot of single and/or dual-threaded
    codes, and assuming one is not interested in high-end SLI/CF setups
    or hard oc'ing all 4 cores all the time for tasks like video encoding
    or animation rendering, the 750/860 are better buys because they
    will internally push 1-core and 2-core clocks to a higher rate than
    occurs with the 920 via the Turbo function, giving better results
    than the 920, and of course the 750/860 are cheaper solutions
    (although the 860 price is similar to the 920, the mbd costs less
    than an X58, from what people say).

    So it depends on what you want to use your system for. No interest
    in CF/SLI? Running games that don't hammer 4 cores? An i5 750 or
    i7 860 makes more sense. Using apps that don't use more than 2 cores?
    Again the 750/860 is more logical, especially from a cost viewpoint.

    This ties in with the other advantage of the X58 platform, ie. the
    upgrade path to 6-core and 8-core CPUs. If this is something that
    holds no value to you, then P55 makes more sense.

    As always, it depends on what you want to use the system for. The
    attraction of the 860 from a more general point of view is that it
    also offers good quad-core performance when one does use all 4 cores
    without sacrificing the traditional higher-clocks possible with
    single or dual core setups when one is only using 1 or 2 cores. It's
    the best of both worlds, at least for out-of-the-box functionality
    anyway.

    However, if one does intend to use all 4 cores almost all the time
    (I do) with a strong overclock, then the 920 is a better choice
    because of the voltage issue and (IMO) the 6/8-core upgrade path.
    Likewise, high-end multi-GPU setups work better with X58.

    Given that general usage of a PC rarely uses more than 2 cores, this
    is why the 750 and 860 are such attractive options.

    As for the 870, despite its 1/2-core speed advantages, the price is
    too high IMO. For that kind of money, a 920 makes more sense, paired
    with better cooling if one has such a spare budget, or buy a better
    GPU setup which, for gaming, is where the real bottleneck lies.

    Anand, please correct me if I'm wrong with the above.

    Ian.

    PS. As always, real-world pricing issues can make a mess of on-paper
    technical conclusions. Also, although many games/apps don't exploit
    more than 2 cores now, this is likely to change in the near future as
    multi-core coding becomes more pervasive in the industry.

  • mapesdhs - Monday, September 14, 2009


    Anand/Gary,

    Re your comments about an X58 advantage being the ability to use
    later 6 and 8-core CPUs...

    I've been planning to build an i7 920 system for video encoding, so
    a max oc on all cores is useful to me; from the article I thus infer
    the X58 is a better choice.

    However, if I did buy such a setup instead of an i5 or i7 860, what
    would the cost tradeoff be do you think when the 6-core CPUs arrive
    with respect to upgrading? By that I mean, for total processing
    throughput, do you reckon a 6-core upgrade would be significantly
    cheaper than simply buying a second i7 920 setup? (gfx not an issue)
    If not, then the ability to use 6/8-core CPUs later in this context
    is somewhat lessened, something that would apply to animation
    rendering as well (ie. extra complete systems perhaps more cost
    effective in increased overall throughput compared to upgrading to
    more cores). Any ideas? Also, unless the applications used can
    exploit more than 4 cores, the later 6-core CPUs won't help. I have
    about 1500 hours of material to convert to DivX. Each file is about
    40 to 45 minutes (documentary), so converting multiple files on
    multiple systems at the same time is very doable.

    Given the above, I'm looking forward to more details on how a max
    oc'd i860/i870 compares to a max oc'd 920.

    At present I'm just using a 6000+ setup to work out the appropriate
    format/conversion paths.

    Ian.

    PS. May I suggest you don't bother replying to those moaning in such
    an obviously ludicrous manner about the Turbo mode being active? I
    have the distinct impression their posts are designed purely to
    irritate. Please don't encourage them. Anyone with any sense will
    read the article and understand the salient points you've highlighted
    about Turbo mode being an integral function of the chip.

  • Milleman - Sunday, September 13, 2009

    I would say that i5 750 and Phenom II X4 965 are fully comparable. AMD just have to adjust the pricetag and the price/performance will be on par. Looking at the Gaming rig performance, both i5 750 and Phenom II X4 965 are well enough for gaming pleasure. I wouldn't shell out my bucks for the more expensive Intel top models. It's such a waste of money, unless you are working with huge video and image editing processes.
