As of late, Intel has been unusually guarded about releasing information about its microprocessor designs. Haswell launched last year with extensive architecture disclosure, but very little about die sizes, transistor counts, or almost anything surrounding the interface between Haswell and its optional embedded DRAM (Crystalwell) counterpart. This week at ISSCC, Intel will finally be filling in some of the blanks.

The first bit of new information we have is the official transistor counts for the range of Haswell designs. At launch Intel only disclosed transistor counts and die areas for Haswell ULT GT3 (dual-core, on-die PCH, GT3 graphics) and Haswell GT2 (quad-core, no on-die PCH, GT2 graphics). Today we have both the minimum and maximum configurations for Haswell. Note that all transistor counts below are schematic, not layout:

Intel Haswell
Part       CPU Configuration   GPU Configuration   Die Size          Transistor Count
4+3        Quad-Core           GT3e                260mm2 + 77mm2    1.7B + ?
ULT 2+3    Dual-Core           GT3                 181mm2            1.3B
ULT 2+2    Dual-Core           GT2                 ?                 ~1B (est)
4+2        Quad-Core           GT2                 177mm2            1.4B
2+2        Dual-Core           GT2                 130mm2            0.96B

I've organized the table above by decreasing die size. I still don't have confirmation for the ULT 2+2 configuration, but the rest is now filled in and accurate. If you remember back to our Iris Pro review, I measured the die area for Haswell GT3 and the Crystalwell eDRAM using some cheap calipers. I came up with 264mm2 + 84mm2; the actual numbers are pretty close at 260mm2 + 77mm2.

Doing some rough math, we see that the addition of a third graphics slice to a Haswell core accounts for around 300M transistors (compare 1.7B for the 4+3 part with 1.4B for the 4+2 part). Subtracting that from the 1.3B of ULT 2+3 would put the ULT 2+2 configuration at around 1B total transistors. I suspect the ULT 2+2 configuration is similar in size to the quad-core + GT2 configuration.
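The rough math can be written out as a short Python sketch. It assumes the estimate comes from comparing the two quad-core parts in the table and that the larger GPU costs the same number of transistors on a ULT die as on a quad-core die; neither assumption is confirmed by Intel, and the function name is purely illustrative.

```python
# Schematic transistor counts in billions, taken from the table above.
QUAD_GT3 = 1.7   # 4+3 (CPU/GPU die only; the eDRAM count is not disclosed)
QUAD_GT2 = 1.4   # 4+2
ULT_GT3 = 1.3    # ULT 2+3


def estimate_ult_gt2(quad_gt3, quad_gt2, ult_gt3):
    """Return (GT3-over-GT2 delta, estimated ULT 2+2 count) in billions.

    Assumption: the GPU delta measured on the quad-core parts carries
    over unchanged to the ULT parts.
    """
    gt3_delta = quad_gt3 - quad_gt2
    return gt3_delta, ult_gt3 - gt3_delta


gt3_delta, ult_gt2 = estimate_ult_gt2(QUAD_GT3, QUAD_GT2, ULT_GT3)
print(f"GT3 over GT2: ~{gt3_delta:.1f}B transistors")
print(f"ULT 2+2 estimate: ~{ult_gt2:.1f}B transistors")
```

With the table's figures this gives a delta of about 0.3B and an estimate of about 1.0B, which is where the ~1B entry comes from.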

Next up on the list is some additional information on the Crystalwell (embedded DRAM) design and configuration. Intel explained how it arrived at the 128MB L4 eDRAM cache size, but it wouldn't tell us the operating frequency of the memory or the interface between it and the main CPU die. In its ISSCC disclosures, Intel filled in the blanks:

The 128MB eDRAM is divided among eight 16MB macros. The eDRAM operates at 1.6GHz and connects to the outside world via a 4 x 16-bit wide on-package IO (OPIO) interface capable of up to 6.4GT/s. The OPIO is highly scalable and very area/power efficient. The Haswell ULT variants use Intel's on-package IO to connect the CPU/GPU island to an on-package PCH. In this configuration the OPIO delivers 4GB/s of bandwidth at 1pJ/bit. When used as an interface to Crystalwell, the interface delivers up to 102GB/s at 1.22pJ/bit. That amounts to a little under 1.07W of power consumed to transmit/receive data at 102GB/s.
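As a sanity check on those figures, here is a minimal Python sketch of the arithmetic. Two things in it are my assumptions rather than Intel's statements: that 102GB/s is a combined total for both directions (it is roughly twice the one-way number the bus width implies), and that the 1.07W figure treats a GB as 2^30 bytes. The helper names are illustrative.

```python
BUS_WIDTH_BITS = 4 * 16      # 4 x 16-bit OPIO interface
TRANSFER_RATE = 6.4e9        # 6.4 GT/s
EDRAM_CLOCK = 1.6e9          # 1.6 GHz
QUOTED_BANDWIDTH_GB = 102    # GB/s, as quoted for Crystalwell
ENERGY_PER_BIT = 1.22e-12    # 1.22 pJ/bit


def one_way_bandwidth_gb_s(width_bits, transfers_per_s):
    """Peak bandwidth in decimal GB/s for one direction of the link."""
    return width_bits * transfers_per_s / 8 / 1e9


def link_power_watts(bytes_per_s, joules_per_bit):
    """Power needed to move bytes_per_s at a given energy per bit."""
    return bytes_per_s * 8 * joules_per_bit


one_way = one_way_bandwidth_gb_s(BUS_WIDTH_BITS, TRANSFER_RATE)
transfers_per_clock = TRANSFER_RATE / EDRAM_CLOCK

# Same quoted bandwidth, interpreted with decimal and binary gigabytes.
power_decimal = link_power_watts(QUOTED_BANDWIDTH_GB * 1e9, ENERGY_PER_BIT)
power_binary = link_power_watts(QUOTED_BANDWIDTH_GB * 2**30, ENERGY_PER_BIT)

print(f"One-way bandwidth: {one_way:.1f} GB/s")
print(f"Transfers per eDRAM clock: {transfers_per_clock:.0f}")
print(f"Power (GB = 10^9 bytes): {power_decimal:.2f} W")
print(f"Power (GB = 2^30 bytes): {power_binary:.2f} W")
```

The one-way result is 51.2GB/s, and the interface moves four transfers per 1.6GHz clock. The power comes out at about 1.00W with decimal gigabytes and just under 1.07W with binary gigabytes, so either way the link costs roughly a watt at full bandwidth.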

By keeping the eDRAM (or PCH) very close to the CPU island (1.5mm), Intel can make the OPIO extremely simple.

Intel also shared some data on how it achieved substantial power savings with Haswell, including using a new stacked power gate for the memory interface that reduced leakage by 100x over Ivy Bridge. Haswell's FIVR (Fully Integrated Voltage Regulator) is also a topic of discussion in Intel's ISSCC papers. FIVR ends up being 90% efficient under load and can enter/exit sleep in 0.32µs, requiring only 0.1µs to ramp up to turbo frequencies.

Intel's Haswell ISSCC disclosures don't really change anything about Haswell, but they do further illustrate just how impressive a design it is.

44 Comments

  • dragonsqrrl - Sunday, February 09, 2014

    Disregard, sorry, for some reason I thought you meant transistor count and not die area. I'm actually a little curious why the transistor counts don't seem to correlate much with die area, despite the fact that they're all the same architecture. Are ULT processors manufactured on a different power-optimized 22nm process?
  • stickmansam - Sunday, February 09, 2014

    I'm assuming the GPU and CPU have differing densities
  • dragonsqrrl - Sunday, February 09, 2014

    That might make sense if ULT 2+2 had a larger GPU than quad-core GT2, but it doesn't. It has the same GPU configuration and half the CPU cores, yet it's approximately the same die area as quad-core GT2.
  • stickmansam - Monday, February 10, 2014

    Well, it is an estimate; it may very well be that the ULT 2+2 is a harvested chip?
  • dragonsqrrl - Monday, February 10, 2014

    ... then why give an estimate?
  • IntelUser2000 - Monday, February 10, 2014

    That doesn't make sense. Looking at the 2+3 config above, the extra 20 EUs is an exact mirror. Therefore, they can simply cut it off and make a 2+2 part.

    The Gallery is showing 5 different parts with 2+3 and 2+2 being separate, so why would the die be the same?
  • p1esk - Sunday, February 09, 2014

    If the embedded RAM frequency is 1600 MHz, how could its interface produce 6.4 GT/s?
    Shouldn't it be 3.2 GT/s?
  • Stahn Aileron - Sunday, February 09, 2014

    You are assuming it is double-pumped (DDR). There is such a thing as quad-pumping (Quad Data Rate - QDR) relative to operating frequency. (https://en.wikipedia.org/wiki/Quad_data_rate)

    With the eDRAM that physically close to the CPU, quad-pumping 1.6GHz wouldn't be much of a problem given the short interconnects.
  • Devfarce - Sunday, February 09, 2014

    I think it's interesting that they used the same circuitry for the eDRAM and the onboard PCH for the ULT models. I was wondering why there wasn't also an onboard PCH with the GT3e models but I'm now really impressed with the answer. I was also under the impression that the eDRAM was built on the 32nm SOC process so a lot of good info is presented here.
  • extide - Sunday, February 09, 2014

    Kind of surprising that the OPIO is single-ended, and not differential. I guess with such a short distance, you can get away with that and just have the wider bus. Cool!
