The Prelude

As Intel got into the chipset business it quickly found itself faced with an interesting problem. As the number of supported IO interfaces increased (back then we were talking about things like AGP, FSB), the size of the North Bridge die had to increase in order to accommodate all of the external facing IO. Eventually Intel ended up in a situation where IO dictated a minimum die area for the chipset, but the actual controllers driving that IO didn’t need all of that die area. Intel effectively had some free space on its North Bridge die to do whatever it wanted with. In the late 90s Micron saw this problem and contemplating throwing some L3 cache onto its North Bridges. Intel’s solution was to give graphics away for free.

The budget for Intel graphics was always whatever free space remained once all other necessary controllers in the North Bridge were accounted for. As a result, Intel’s integrated graphics was never particularly good. Intel didn’t care about graphics, it just had some free space on a necessary piece of silicon and decided to do something with it. High performance GPUs need lots of transistors, something Intel would never give its graphics architects - they only got the bare minimum. It also didn’t make sense to focus on things like driver optimizations and image quality. Investing in people and infrastructure to support something you’re giving away for free never made a lot of sense.

Intel hired some very passionate graphics engineers, who always petitioned Intel management to give them more die area to work with, but the answer always came back no. Intel was a pure blooded CPU company, and the GPU industry wasn’t interesting enough at the time. Intel’s GPU leadership needed another approach.

A few years ago they got that break. Once again, it had to do with IO demands on chipset die area. Intel’s chipsets were always built on a n-1 or n-2 process. If Intel was building a 45nm CPU, the chipset would be built on 65nm or 90nm. This waterfall effect allowed Intel to help get more mileage out of its older fabs, which made the accountants at Intel quite happy as those $2 - $3B buildings are painfully useless once obsolete. As the PC industry grew, so did shipments of Intel chipsets. Each Intel CPU sold needed at least one other Intel chip built on a previous generation node. Interface widths as well as the number of IOs required on chipsets continued to increase, driving chipset die areas up once again. This time however, the problem wasn’t as easy to deal with as giving the graphics guys more die area to work with. Looking at demand for Intel chipsets, and the increasing die area, it became clear that one of two things had to happen: Intel would either have to build more fabs on older process nodes to keep up with demand, or Intel would have to integrate parts of the chipset into the CPU.

Not wanting to invest in older fab technology, Intel management green-lit the second option: to move the Graphics and Memory Controller Hub onto the CPU die. All that would remain off-die would be a lightweight IO controller for things like SATA and USB. PCIe, the memory controller, and graphics would all move onto the CPU package, and then eventually share the same die with the CPU cores.

Pure economics and an unwillingness to invest in older fabs made the GPU a first class citizen in Intel silicon terms, but Intel management still didn’t have the motivation to dedicate more die area to the GPU. That encouragement would come externally, from Apple.

Looking at the past few years of Apple products, you’ll recognize one common thread: Apple as a company values GPU performance. As a small customer of Intel’s, Apple’s GPU desires didn’t really matter, but as Apple grew, so did its influence within Intel. With every microprocessor generation, Intel talks to its major customers and uses their input to help shape the designs. There’s no sense in building silicon that no one wants to buy, so Intel engages its customers and rolls their feedback into silicon. Apple eventually got to the point where it was buying enough high-margin Intel silicon to influence Intel’s roadmap. That’s how we got Intel’s HD 3000. And that’s how we got here.

Haswell GPU Architecture & Iris Pro
Comments Locked

177 Comments

View All Comments

  • kyuu - Saturday, June 1, 2013 - link

    It's probably habit coming from eluding censoring.
  • maba - Saturday, June 1, 2013 - link

    To be fair, there is only one data point (GFXBenchmark 2.7 T-Rex HD - 4X MSAA) where the 47W cTDP configuration is more than 40% slower than the tested GT 650M (rMBP15 90W).
    Actually we have the following [min, max, avg, median] for 47W (55W):
    games: 61%, 106%, 78%, 75% (62%, 112%, 82%, 76%)
    synth.: 55%, 122%, 95%, 94% (59%, 131%, 102%, 100%)
    compute: 85%, 514%, 205%, 153% (86%, 522%, 210%, 159%)
    overall: 55%, 514%, 101%, 85% (59%, 522%, 106%, 92%)
    So typically around 75% for games with a considerably lower TDP - not that bad.
    I do not know whether Intel claimed equal or better performance given a specific TDP or not. With the given 47W (55W) compared to a 650M it would indeed be a false claim.
    But my point is, that with at least ~60% performance and typically ~75% it is admittedly much closer than you stated.
  • whyso - Saturday, June 1, 2013 - link

    Note your average 650m is clocked lower than the 650m reviewed here.
  • lmcd - Saturday, June 1, 2013 - link

    If I recall correctly, the rMBP 650m was clocked as high as or slightly higher than the 660m (which was really confusing at the time).
  • JarredWalton - Sunday, June 2, 2013 - link

    Correct. GT 650M by default is usually 835MHz + Boost, with 4GHz RAM. The GTX 660M is 875MHz + Boost with 4GHz RAM. So the rMBP15 is a best-case for GT 650M. However, it's not usually a ton faster than the regular GT 650M -- benchmarks for the UX51VZ are available here:
    http://www.anandtech.com/bench/Product/814
  • tipoo - Sunday, June 2, 2013 - link

    I think any extra power just went to the rMBP scaling operations.
  • DickGumshoe - Sunday, June 2, 2013 - link

    Do you know if the scaling algorithms are handled by the CPU or the GPU on the rMBP?

    The big thing I am wondering is that if Apple releases a higher-end model with the MQ CPU's, would the HD 4600 be enough to eliminate the UI lag currently present on the rMBP's HD 4000?

    If it's done on the GPU, then having the HQ CPU's might actually get *better* UI performance than the MQ CPU's for the rMPB.
  • lmcd - Sunday, June 2, 2013 - link

    No, because these benchmarks would change the default resolution, which as I understand is something the panel would compensate for?

    Wait, aren't these typically done while the laptop screen is off and an external display is used?
  • whyso - Sunday, June 2, 2013 - link

    You got this wrong. 650m is 735/1000 + boost to 850/1000. 660m is 835/1250 boost to 950/1250.
  • jasonelmore - Sunday, June 2, 2013 - link

    worst mistake intel made was that demo with DIRT when it was side by side with a 650m laptop. That set people's expectations. and it falls short in the reviews and people are dogging it. If they would have just kept quite people would be praising them up and down right now.

Log in

Don't have an account? Sign up now