The Prelude

As Intel got into the chipset business it quickly found itself faced with an interesting problem. As the number of supported IO interfaces increased (back then we were talking about things like AGP and the FSB), the size of the North Bridge die had to increase to accommodate all of the external-facing IO. Eventually Intel ended up in a situation where IO dictated a minimum die area for the chipset, but the actual controllers driving that IO didn’t need all of that area. Intel effectively had some free space on its North Bridge die to do whatever it wanted with. In the late 90s Micron saw the same problem and contemplated throwing some L3 cache onto its North Bridges. Intel’s solution was to give graphics away for free.

The budget for Intel graphics was always whatever free space remained once all other necessary controllers in the North Bridge were accounted for. As a result, Intel’s integrated graphics was never particularly good. Intel didn’t care about graphics; it just had some free space on a necessary piece of silicon and decided to do something with it. High-performance GPUs need lots of transistors, something Intel would never give its graphics architects, who only got the bare minimum. It also didn’t make sense to focus on things like driver optimizations and image quality. Investing in people and infrastructure to support something you’re giving away for free never made a lot of sense.

Intel hired some very passionate graphics engineers, who always petitioned Intel management for more die area to work with, but the answer always came back no. Intel was a pure-blooded CPU company, and the GPU industry wasn’t interesting enough at the time. Intel’s GPU leadership needed another approach.

A few years ago they got that break. Once again, it had to do with IO demands on chipset die area. Intel’s chipsets were always built on an n-1 or n-2 process: if Intel was building a 45nm CPU, the chipset would be built on 65nm or 90nm. This waterfall effect allowed Intel to get more mileage out of its older fabs, which made the accountants at Intel quite happy, as those $2 - $3B buildings are painfully useless once obsolete. As the PC industry grew, so did shipments of Intel chipsets. Each Intel CPU sold needed at least one other Intel chip built on a previous-generation node. Interface widths as well as the number of IOs required on chipsets continued to increase, driving chipset die areas up once again. This time, however, the problem wasn’t as easy to deal with as giving the graphics guys more die area to work with. Looking at demand for Intel chipsets and the increasing die area, it became clear that one of two things had to happen: Intel would either have to build more fabs on older process nodes to keep up with demand, or Intel would have to integrate parts of the chipset into the CPU.

Not wanting to invest in older fab technology, Intel management green-lit the second option: to move the Graphics and Memory Controller Hub onto the CPU die. All that would remain off-die would be a lightweight IO controller for things like SATA and USB. PCIe, the memory controller, and graphics would all move onto the CPU package, and then eventually share the same die with the CPU cores.

Pure economics and an unwillingness to invest in older fabs made the GPU a first-class citizen in Intel silicon terms, but Intel management still didn’t have the motivation to dedicate more die area to the GPU. That encouragement would come externally, from Apple.

Looking at the past few years of Apple products, you’ll recognize one common thread: Apple as a company values GPU performance. As a small customer of Intel’s, Apple’s GPU desires didn’t really matter, but as Apple grew, so did its influence within Intel. With every microprocessor generation, Intel talks to its major customers and uses their input to help shape the designs. There’s no sense in building silicon that no one wants to buy, so Intel engages its customers and rolls their feedback into silicon. Apple eventually got to the point where it was buying enough high-margin Intel silicon to influence Intel’s roadmap. That’s how we got Intel’s HD 3000. And that’s how we got here.

Haswell GPU Architecture & Iris Pro

169 Comments


  • n13L5 - Tuesday, June 11, 2013 - link

    " An Ultrabook SKU with Crystalwell would make a ton of sense, but given where Ultrabooks are headed (price-wise) I’m not sure Intel could get any takers."

    They sure seem to be going up in price, rather than down at the moment...
    Reply
  • anandfan86 - Tuesday, June 18, 2013 - link

    Intel has once again made their naming so confusing that even their own marketing weasels can't get it right. Notice that the Intel slide titled "4th Gen Intel Core Processors H-Processors Line" calls the graphics in the i7-4950HQ and i7-4850HQ "Intel HD Graphics 5200" instead of the correct name which is "Intel Iris Pro Graphics 5200". This slide calls the graphics in the i7-4750HQ "Intel Iris Pro Graphics 5200" which indicates that the slide was made after the creation of that name. It is little wonder that most media outlets are acting as if the biggest tech news of the month is the new pastel color scheme in iOS 7. Reply
  • Myoozak - Wednesday, June 26, 2013 - link

    The peak theoretical GPU performance calculations shown are wrong for Intel's GFLOPS numbers. Correct numbers are half of what is shown. The reason is that Intel's execution units are made up of an integer vec4 processor and a floating-point vec4 processor. This article correctly states it has a 2xvec4 SIMD, but does not point out that half is integer and half is floating-point. For a GFLOPS computation, one should only include the floating-point operations, which means only half of that execution unit's silicon is getting used. The reported computation performance would only be correct if you had an algorithm with a perfect mix of integer & float math that could be co-issued. To compare apples to apples, you need to stick to GFLOPS numbers, and divide all the Intel numbers in the table by 2. For example, peak FP ops on the Intel HD4000 would be 8, not 16. Compared this way, Intel is not stomping all over AMD & nVidia for compute performance, but it does appear they are catching up. Reply
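    [Ed: the arithmetic in the comment above can be sketched as follows. The EU count (16) and clock (1.15GHz) are the commonly cited HD 4000 figures and are used here only for illustration; the halving reflects the commenter's claim that only the FP vec4 pipe should count toward GFLOPS.]

    ```python
    # Peak GFLOPS = EUs x FLOPs-per-EU-per-clock x clock (GHz)
    def peak_gflops(eus: int, flops_per_eu_per_clock: int, clock_ghz: float) -> float:
        return eus * flops_per_eu_per_clock * clock_ghz

    # Counting both vec4 pipes as FP MADs (2 x 4 lanes x 2 ops) gives the
    # article-style 16 ops/EU/clock; the comment's correction counts only
    # the floating-point vec4 pipe, i.e. 8 ops/EU/clock.
    article_style = peak_gflops(16, 16, 1.15)  # -> 294.4 GFLOPS
    fp_pipe_only  = peak_gflops(16, 8, 1.15)   # -> 147.2 GFLOPS
    print(article_style, fp_pipe_only)
    ```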
  • alexcyn - Tuesday, August 06, 2013 - link

    I heard that Intel's 22nm process equals TSMC's 26nm, so the difference is not that much. Reply
  • Doughboy(^_^) - Friday, August 09, 2013 - link

    I think Intel could push their yield way up by offering 32MB and 64MB versions of Crystalwell for i3 and i5 processors. They could charge the same markup for the 128MB part, but sell the 32/64MB versions for less. It would cost Intel less and probably let them take even further market share from low-end dGPUs. Reply
  • krr711 - Monday, February 10, 2014 - link

    It is funny how a non-PC company changed the course of Intel forever for the good. I hope that Intel is wise enough to use this to spring-board the PC industry to a new, grand future. No more tick-tock nonsense arranged around sucking as many dollars out of the customer as possible, but give the world the processing power it craves and needs to solve the problems of tomorrow. Let this be your heritage and your profits will grow to unforeseen heights. Surprise us! Reply
  • s2z.domain@gmail.com - Friday, February 21, 2014 - link

    I wonder where this is going. Yes the multi core and cache on hand and graphics may be goody, ta.
    But human interaction in actual products?
    I weigh in at 46kg but think nothing of running with a Bergen/burden of 20kg, so a big heavy laptop with an integrated 10hr battery and an 18.3" screen would be efficacious.
    What is all this current affinity with small screens?
    I could barely discern the vignette of the feathers of a water fowl at no more than 130m yesterday, morning run in the Clyde Valley woodlands.
    For the "laptop", > 17" screen, desktop 2*27", all discernible pixels, every one of them to be a prisoner. 4 core or 8 core and I bore the poor little devils with my incompetence with DSP and the Julia language. And spice etc.

    P.S. Can still average 11mph @ 50+ years of age. Some things one does wish to change. And thanks to the Jackdaws yesterday morning whilst I was fertilizing a Douglas Fir; they took the boredom out of an otherwise perilous predicament.
    Reply
  • johncaldwell - Wednesday, March 26, 2014 - link

    Hello,
    Look, 99% of all the comments here are out of my league. Could you answer a question for me please? I use an open source 3D computer animation and modeling program called Blender3d. The users of this program say that the GTX 650 is the best GPU for this program, citing that it works best for computationally intensive tasks such as rendering with HDR and fluids and other particle effects, and they say that other cards that work great for gaming and video fall short for that program. Could you tell me how this Intel Iris Pro would do in a case such as this? Would your test made here be relevant to this case?
    Reply
