We've known for a while now that Intel will integrate some form of DRAM on-package for the absolute highest end GPU configurations of its upcoming Haswell SoC. Memory bandwidth is a very important enabler of GPU (and multi-core CPU) performance, but delivering enough of it has typically required very high speed interfaces (read: high power) and/or very wide interfaces (read: large die area). Neither of the traditional approaches to scaling memory bandwidth is low power or cost effective, which has kept them out of ultra mobile devices and integrated processor graphics.

The days of simple performance scaling by throwing more transistors at a design are quickly coming to an end. Moore's Law will continue, but much like the reality check that building low power silicon gave us a while ago, building high performance silicon will require some out-of-the-box thinking going forward.

Dating back to Ivy Bridge (3rd gen Core/2012), Intel had plans to integrate some amount of DRAM onto the package in order to drive the performance of its processor graphics. Embedding DRAM onto the package adds cost and heat, and allegedly Paul Otellini wasn't willing to greenlight the production of a part that only Apple would use, so it was canned. With Haswell, DRAM is back on the menu and this time it's actually going to come out. We've referred to the Haswell part with embedded DRAM as Haswell GT3e. The GT3 refers to the GPU configuration (40 EUs), while the lowercase e denotes embedded DRAM. Haswell GT3e will only be available in a BGA package (soldered-on, not socketed), and is only expected to appear alongside higher TDP (read: not Ultrabook) parts. The embedded DRAM will increase the thermal load of the SoC, although it shouldn't be as painful as including a discrete GPU + high speed DRAM. Intel's performance target for Haswell GT3e is NVIDIA's GeForce GT 650M.

What we don't know about GT3e is the type, size and speed of memory that Intel will integrate. Our old friend David Kanter at RealWorldTech has presented a good thesis on the answers to those questions. Based on some sound logic and digging through the list of papers to be presented at the 2013 VLSI Technology Symposium in Kyoto, Kanter believes that the title of this soon-to-be-presented Intel paper tells us everything we need to know:

"A 22nm High Performance Embedded DRAM SoC Technology Featuring Tri-Gate Transistors and MIMCAP COB"

According to Kanter's deductions (and somewhat validated by our own sources), Haswell GT3e should come equipped with 128MB of eDRAM connected to the main SoC via a 512-bit bus. Using eDRAM vs. commodity DDR3 makes sense as the former is easier to integrate into Intel's current fabs. Power, manufacturability and cost concerns also played a role in Intel creating its own DRAM design. The interface width is a bit suspect as it would require a fair amount of area at the edges of the Haswell die, but the main takeaway is that we're dealing with a parallel interface. Kanter estimates the bandwidth at roughly 64GB/s, nowhere near high-end dGPU class but in the realm of what you can expect from a performance mainstream mobile GPU. At 22nm, Intel's eDRAM achieves a density of around 17.5 Mbit/mm^2, which works out to ~60mm^2 for the eDRAM itself. Add in any additional interface logic and Kanter estimates the total die area for the eDRAM component to be around 70 - 80mm^2.

Intel is rumored to be charging $50 for the eDRAM adder on top of GT3, which would deliver very good margins for Intel. It's a sneaky play that allows Intel to capture more of the total system BoM (Bill of Materials) that would normally go to a discrete GPU company like NVIDIA, all while increasing utilization of its fabs. NVIDIA will still likely offer better performing solutions, not to mention the benefits of much stronger developer relations and a longer history of driver optimization. This is just the beginning, however.
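Those estimates are easy to sanity check. The short Python sketch below reproduces the area math from Kanter's density figure and shows one way a 512-bit interface could deliver roughly 64GB/s; the 1 GT/s effective transfer rate is our own assumption picked to make the arithmetic line up, not a confirmed spec.

```python
# Back-of-the-envelope check on the eDRAM estimates discussed above.
# The transfer rate below is an assumption, not a confirmed Intel spec.

capacity_mbit = 128 * 8            # 128MB of eDRAM expressed in Mbit
density_mbit_per_mm2 = 17.5        # reported 22nm eDRAM density

array_area_mm2 = capacity_mbit / density_mbit_per_mm2
print(f"eDRAM array area: ~{array_area_mm2:.0f} mm^2")      # ~59 mm^2

bus_width_bits = 512               # rumored parallel interface width
transfer_rate_gtps = 1.0           # assumed effective transfer rate (GT/s)

bandwidth_gb_s = (bus_width_bits / 8) * transfer_rate_gtps
print(f"Peak eDRAM bandwidth: ~{bandwidth_gb_s:.0f} GB/s")  # ~64 GB/s
```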

Based on leaked documents, the embedded DRAM will act as a 4th level cache and should work to improve both CPU and GPU performance. In server environments, I can see embedded DRAM acting as a real boon to multi-core performance. The obvious fit in the client space is to improve GPU performance in games. At only 128MB I wouldn't expect high-end dGPU levels of performance, but we should see a substantial improvement compared to traditional processor graphics. Long term, you can expect Intel to bring eDRAM into other designs. There's an obvious fit with its mobile SoCs, although there we're likely talking about something another 12 - 24 months out.
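To illustrate why even a modest 128MB L4 can help, here's a toy effective-bandwidth model; the hit rates and the dual-channel DDR3-1600 figure are illustrative assumptions on our part, not measurements, and real cache behavior is obviously more complicated than a linear blend.

```python
# Toy model: effective memory bandwidth seen by the GPU when some fraction
# of its traffic is served from the 128MB eDRAM L4 instead of system DDR3.
# All numbers are illustrative assumptions, not measured values.

DDR3_BW_GB_S = 25.6    # dual-channel DDR3-1600 system memory (assumed)
EDRAM_BW_GB_S = 64.0   # rumored eDRAM bandwidth

def effective_bandwidth(l4_hit_rate: float) -> float:
    """Blend eDRAM and DDR3 bandwidth by the fraction of traffic hitting L4."""
    return l4_hit_rate * EDRAM_BW_GB_S + (1.0 - l4_hit_rate) * DDR3_BW_GB_S

for hit_rate in (0.0, 0.5, 0.8):
    print(f"L4 hit rate {hit_rate:.0%}: ~{effective_bandwidth(hit_rate):.1f} GB/s effective")
```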

AMD is expected to integrate a GDDR5 memory controller in its future APUs, similar to what it has done with the PlayStation 4 SoC, as its attempt to solve the memory bandwidth problem for processor-based graphics.

Source: RealWorldTech

Comments

  • name99 - Saturday, April 27, 2013

    The basic desiderata for caches are
    - for L1 what matters most is latency
    - for L2 what matters most is bandwidth
    - for L3 what matters most is capacity.

    IBM made the right decision.
  • epobirs - Tuesday, April 23, 2013

    The power draw may be too much for a light mobile device but could be excellent for something like an HTPC. I'm very curious as to what will be offered in the mini-ITX format in a few months. Some good GPU power and a lower overall power envelope than current IB choices would be worth waiting for.
  • jb14 - Tuesday, April 23, 2013

    I suppose the TDP of this chip will preclude its use in an Intel NUC unit? Reckon it would be a great combo if the cooling could handle it.
  • tipoo - Tuesday, April 23, 2013

    The models I've seen so far show GT3e in 47W TDP parts. The Mac Mini had a discrete GPU in it at some point, so without one I'm sure something of that size can theoretically handle 47W.
  • SetiroN - Tuesday, April 23, 2013

    To be honest I was hoping for (if not expecting) quite a bit more than 64GB/s... as beneficial as lower latency is, quad channel DDR3 already gives us 50.
    Even for notebook graphics, that's far from ground-breaking. 650M performance seems like a stretch, especially considering how this will be relegated to larger laptops, at which point having a dGPU sounds much more feasible.
  • shiznit - Tuesday, April 23, 2013

    True, but you don't get quad channel DDR3 in a laptop, and especially not with RAM soldered to the mobo. This was a perf/watt decision with mobile as a priority.

    You get a lot of temporal goodness for games (a streaming texture buffer should fit in the 128MB nicely) and smaller datasets, and you free up the pipes to main memory so they can keep the L4 full. It's a win-win.

    Servers will come when it can scale to really benefit them.
  • ShieTar - Wednesday, April 24, 2013

    It's only 128MB. With 64GB/s, you can already read or overwrite the complete memory in only 2ms, which is less than a quarter of even a 120Hz frame. Since most graphics software should not work iteratively, this seems fast enough. And why does 650M performance seem like a stretch to you based on this number, when the 650M itself has exactly the same bandwidth, or even less in the DDR3 version? Even the 675M only comes with 96GB/s.
  • iwodo - Tuesday, April 23, 2013

    So I guess this is what all that extra fab capacity is about. Not actually fabbing a lot of custom silicon for others, but doubling their usage with eDRAM.

    Intel's biggest problem is still its drivers. They're ugly, they're slow, they're not tuned, and you can feel no love in them compared to NVIDIA's.
  • ShieTar - Wednesday, April 24, 2013

    "In server environments, I can see embedded DRAM acting as a real boon to multi-core performance."

    But 128MB is not that much more than the already available 20MB of L3 cache in a Xeon, while it is much less than the 32GB (or more) of available RAM. Sounds to me like only a very specific class of software would be able to profit from it. And if you have software that really speeds up with more low-latency memory, does that not mean you're better off running it on a Xeon Phi anyway?
  • Crazy1 - Wednesday, April 24, 2013

    If the previous rumor of a 55W TDP for a chip carrying GT3e holds, along with graphics performance roughly equivalent to a GT 650M, then I imagine large screen (15"+) "ultrabooks" carrying this chip. Laptops like the MacBook Pro and Razer Blade. Without the need for a discrete GPU, other manufacturers would not need premium materials or smart engineering to stay within the thermal limits of their thin designs.

    An additional $50 is not much when you look at the price of a mobile i7 chip. It's another 2-4% bump in the overall price of a typical i7-carrying laptop. If the GT3e performs similarly to a GT 640M (more realistic), it would easily be a worthwhile upgrade for people who only need/want mid-range GPU performance.
