The Memory Bandwidth Challenge

Rattner predicted that by 2015, Intel CPUs would have tens or hundreds of cores on each die, which in turn would require a great deal of memory bandwidth. The problem with memory bandwidth at that level is that you effectively become pin limited: you can't physically fit enough pins on your microprocessor package to allow for a memory bus wide enough to deliver the sort of bandwidth necessary to feed those tens or hundreds of cores.
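To see why pin counts bite, here is a quick back-of-envelope sketch. All of the figures below (bus width, transfer rate, core count) are illustrative assumptions, not numbers from Rattner's keynote:

```python
# Back-of-envelope sketch of the pin-limit argument.
# Every number here is an illustrative assumption.

def bus_bandwidth_gbps(data_pins: int, transfer_rate_mhz: float) -> float:
    """Peak bandwidth in GB/s of a parallel bus:
    pins * transfers-per-second / 8 bits-per-byte."""
    return data_pins * transfer_rate_mhz * 1e6 / 8 / 1e9

# A 128-bit off-package memory bus at an assumed 1 GT/s:
total = bus_bandwidth_gbps(128, 1000)   # 16.0 GB/s

# Spread across an assumed 100 cores:
per_core = total / 100                  # 0.16 GB/s per core
```

Even with generous assumptions, a pin-bound bus leaves each of a hundred cores with a trickle of bandwidth, which is the crux of the problem Rattner described.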

One solution that Rattner presented was 3D die and wafer stacking. Normally microprocessor circuits are laid out on a flat, 2D surface; as the name implies, 3D die and wafer stacking builds on top of that, literally.

First, let's talk about wafer stacking. Wafer stacking involves stacking two identically sized/shaped wafers on top of each other and using through-silicon vias (vertical interconnects) to connect the top wafer layer to the bottom layer. The best example of an application of this would be a DRAM wafer sitting on top of a CPU wafer, meaning that you would have memory (not cache; that would still be inside your CPU) sitting directly on top of your CPU.

With wafer stacking, instead of having hundreds or thousands of pins between your CPU and main memory, you have 1 to 10 million connections between your CPU and memory, directly increasing memory bandwidth. What's interesting is that this method of stacking could also mean the end of external memory.
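A rough sketch of how the numbers shift with millions of die-to-die vias; the connection counts and clock rates below are assumptions chosen only to illustrate the scaling:

```python
# Sketch of why millions of through-silicon vias change the picture.
# Link counts and rates are illustrative assumptions, not spec figures.

def aggregate_gbs(links: int, rate_mhz: float, bits_per_link: int = 1) -> float:
    """Aggregate bandwidth in GB/s across many identical 1-bit links."""
    return links * bits_per_link * rate_mhz * 1e6 / 8 / 1e9

# Conventional package: assume ~500 signal pins at 1 GT/s each.
pins = aggregate_gbs(500, 1000)          # 62.5 GB/s

# Stacked wafer: 1 million vias, each at a modest assumed 200 MHz.
vias = aggregate_gbs(1_000_000, 200)     # 25000.0 GB/s
```

Each via can run far slower (and cooler) than an off-package pin and the stack still comes out orders of magnitude ahead on aggregate bandwidth.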

Die stacking is another possibility: here you could stack multiple different-sized dies on top of the CPU core logic, and those dies could be DRAM as well as Flash memory, or really anything else. Intel showed off an 8-layer configuration using die stacking, which according to Intel is a very realistic option.

Rattner was fairly confident in the potential of die and wafer stacking, so it's a technology that we'll definitely have to keep an eye on as time goes on. There are definitely limitations to consider, such as power delivery and thermal dissipation, but there are solutions in the works for those as well (e.g. nanoscale thermal pumps).


15 Comments

  • Verdant - Saturday, March 5, 2005 - link

    sigh - no compiler is going to magically make software work in parallel.


    not everything is "parallel-able" (my new word for the day)

    some tasks must be serial-processed, it is the nature of computing.

    my main point is that i hope we can see individual "cores" keep increasing their speed..


    did he say anything about what the highlighted "photonics" box on the slide was about?
  • mkruer - Friday, March 4, 2005 - link

    As if Intel can predict 10yrs into the future. They're having trouble predicting one year in advance. I seriously doubt that Intel's massive parallelism will be the solution to all their CPU issues. Looking somewhat ahead, I see the parallelism trend dying out at around 8 pipelines, for the simple reason that most “standard” (non-games or scientific apps) programs would never use more than eight. Look at RISC: most RISC architectures have 10 threads, and it's been that way for the last 10yrs or more. You can only go so wide before the width becomes detrimental to the processing of the instruction.
  • Locut0s - Thursday, March 3, 2005 - link

    #12 Oops, should have read the above posts. Yeah, that makes more sense, then.
  • xsilver - Thursday, March 3, 2005 - link

    the super resolution demo requires video people;
    it interpolates 60-90 frames into 1 frame like the guy above said....

    and #8 ... I think they mean 1000x because the size of the image used in the demo is very small... so if you wanted to use it on say a face then you would need WAY more computing power.... eg. the stuff on CSI is so bunk....
  • Locut0s - Thursday, March 3, 2005 - link

    Am I the only one that thinks that the "Super Resolution" Demo shown there is just a little too good to be true?
  • xsilver - Thursday, March 3, 2005 - link

    "nanoscale thermal pumps"
    sounds like some tool you need to get botox done :)
  • sphinx - Thursday, March 3, 2005 - link

    All I can say is, we'll see.
  • DCstewieG - Thursday, March 3, 2005 - link

    60 seconds to do 3 seconds of footage. That would seem to me it needs 20x the power to do it in real-time. What's this about 1000x?
  • clarkey01 - Thursday, March 3, 2005 - link

    Intel said in early '03 that they would be at 10GHz (Nehalem) in 2005.

    So don't hold your breath on their dual core predictions
  • Phlargo - Thursday, March 3, 2005 - link

    Didn't Intel originally say that they could scale the P4 architecture to 10GHz?
