
This is a bit unusual. I got an email from AMD PR this week asking me to correct the Bulldozer transistor count in our Sandy Bridge E review. The incorrect number, provided to me (and other reviewers) by AMD PR around 3 months ago, was 2 billion transistors. The actual transistor count for Bulldozer is apparently 1.2 billion transistors. I don't have an explanation as to why the original number was wrong, just that the new number has been triple checked by my contact and is indeed right. The total die area for a 4-module/8-core Bulldozer remains correct at 315mm².

CPU Specification Comparison

CPU                          Manufacturing Process   Cores   Transistor Count   Die Size
---------------------------  ---------------------   -----   ----------------   --------
AMD Bulldozer 8C             32nm                    8       1.2B (was ~2B)     315mm²
AMD Thuban 6C                45nm                    6       904M               346mm²
AMD Deneb 4C                 45nm                    4       758M               258mm²
Intel Gulftown 6C            32nm                    6       1.17B              240mm²
Intel Sandy Bridge E (6C)    32nm                    6       2.27B              435mm²
Intel Nehalem/Bloomfield 4C  45nm                    4       731M               263mm²
Intel Sandy Bridge 4C        32nm                    4       995M               216mm²
Intel Lynnfield 4C           45nm                    4       774M               296mm²
Intel Clarkdale 2C           32nm                    2       384M               81mm²
Intel Sandy Bridge 2C (GT1)  32nm                    2       504M               131mm²
Intel Sandy Bridge 2C (GT2)  32nm                    2       624M               149mm²

Despite the downward revision in Bulldozer's transistor count by 800M, AMD's first high-end 32nm processor still boasts a higher transistor density than any of its 45nm predecessors (as you'd expect):

Transistor Density Comparison

Transistor density depends on more than just process technology. The design of the chip itself, including details like the balance between logic, cache and I/O transistors, can have a major impact on how compact the die ends up being. Higher transistor densities are generally more desirable to a manufacturer (fewer defects per die, more die per wafer, lower costs), but from the end user's perspective the overall price/performance (and power?) ratio is what ultimately matters.
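
As a rough illustration, the density figures can be recomputed from the specification table by dividing each transistor count by its die size. The sketch below assumes the counts and areas exactly as listed in the table; the `chips` dictionary and the `density` helper are names chosen here for illustration only.

```python
# (process node in nm, transistors in millions, die area in mm^2),
# copied from the specification table in this article.
chips = {
    "AMD Bulldozer 8C": (32, 1200, 315),
    "AMD Thuban 6C": (45, 904, 346),
    "AMD Deneb 4C": (45, 758, 258),
    "Intel Gulftown 6C": (32, 1170, 240),
    "Intel Sandy Bridge E (6C)": (32, 2270, 435),
    "Intel Nehalem/Bloomfield 4C": (45, 731, 263),
    "Intel Sandy Bridge 4C": (32, 995, 216),
    "Intel Lynnfield 4C": (45, 774, 296),
    "Intel Clarkdale 2C": (32, 384, 81),
    "Intel Sandy Bridge 2C (GT1)": (32, 504, 131),
    "Intel Sandy Bridge 2C (GT2)": (32, 624, 149),
}


def density(transistors_m, area_mm2):
    """Return millions of transistors per square millimetre."""
    return transistors_m / area_mm2


densities = {name: density(t, a) for name, (_, t, a) in chips.items()}

# Print the chips from densest to least dense.
for name, d in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {d:5.2f} M transistors/mm^2")
```

With the original 2 billion figure the same division would have put Bulldozer at roughly 6.3 million transistors per mm², higher than every other chip in the table; at 1.2 billion it comes out near 3.8, still above all of the 45nm parts listed.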


42 Comments


  • Conficio - Friday, December 02, 2011 - link

    Unfortunately the transistor count is not mentioned in the table for Llano.

    However, in the density graph, Llano, a 4C part built on the old core type, shows double the density of Bulldozer?

    I'd think if AMD is capable of such a dense design and it is advantageous, they'd use it for their flagship processor.

    In other words, can you add the Llano numbers to the first table and verify that the density is correct?

    Thanks!
  • Marc HFR - Friday, December 02, 2011 - link

    1.45B for 228mm2

    But the 1.45B seems way too high

    For example, an Athlon II X2 CPU is 234 million transistors, and a Redwood GPU is 627 million. 234x2 + 627 = 1.095 billion, and that number already counts the IMC twice, etc...
  • tipoo - Friday, December 02, 2011 - link

    Probably due to the on-die GPU portion of Llano; since GPUs have so much redundant hardware, it's easier to make them nice and dense.
  • Evleos - Friday, December 02, 2011 - link

    How could anyone believe that it was 2.4 billion?

    http://en.wikipedia.org/wiki/List_of_future_AMD_mi...
  • The_Countess - Friday, December 02, 2011 - link

    2 x 1.2 = 2.4 billion for the dual-die server parts?
    I can easily see how that could lead to confusion.
  • chromatix - Friday, December 02, 2011 - link

    Okay, what we know is that cache (and DRAM) are extremely transistor-dense, GPU compute area is fairly dense, and CPU compute area is much less dense (because it doesn't make regular patterns). Crossbar switches and other routing stuff are perhaps the least dense of all - it's all wires.

    As a rough estimate, caches require 64 transistors per byte, hence 64 million transistors per megabyte - so Deneb's 8MB total makes 512 million transistors just in the cache. Bulldozer doubles that to 16MB and 1024 million transistors for cache.

    Subtracting the appropriate cache sizes from the original Deneb and Bulldozer figures left Bulldozer with twice the transistor count per core - not per module, per *core* - of Deneb. With no performance improvement per clock per core to show for it, I thought that was a really strange result.

    Subtracting 800 million transistors from Bulldozer makes that comparison much more interesting. Deneb gets 246M over four cores, giving 61.5M transistors per core. Bulldozer gets only about 200M transistors over four *modules*, making on average 50M transistors per module, 25M transistors per core.

    So somehow, Bulldozer's modules are actually more efficient in transistor count than Deneb's, despite the longer pipeline and containing two threads! A slight reduction in IPC per core is therefore entirely justified.
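
The back-of-the-envelope estimate in the comment above can be reproduced in a few lines. This is only a sketch of the commenter's reasoning: the 64-transistors-per-byte rule of thumb is the commenter's assumption rather than an AMD figure, and the helper name `non_cache_budget` is made up here for illustration.

```python
# Rule of thumb from the comment above: about 64 transistors per byte of
# cache, i.e. roughly 64 million transistors per megabyte. This is the
# commenter's estimate, not a number published by AMD.
CACHE_MTRANSISTORS_PER_MB = 64


def non_cache_budget(total_m, cache_mb):
    """Millions of transistors left after subtracting the estimated cache."""
    return total_m - CACHE_MTRANSISTORS_PER_MB * cache_mb


deneb_logic = non_cache_budget(758, 8)            # 4 cores, 8MB total cache
bulldozer_logic = non_cache_budget(1200, 16)      # 4 modules / 8 cores, 16MB
bulldozer_logic_old = non_cache_budget(2000, 16)  # original 2B figure

print("Deneb per core:          ", deneb_logic / 4)
print("Bulldozer per module:    ", bulldozer_logic / 4)
print("Bulldozer per core:      ", bulldozer_logic / 8)
print("Bulldozer per core (2B): ", bulldozer_logic_old / 8)
```

Under these assumptions the exact subtraction leaves 176M for Bulldozer rather than the rounded 200M, or about 44M per module and 22M per core, against 61.5M per core for Deneb. With the original 2B figure the same arithmetic gives about 122M per core, roughly twice Deneb's.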
  • Marc HFR - Friday, December 02, 2011 - link

    A Bulldozer module (including L2 cache) is 213 million transistors according to AMD at the 2011 International Solid-State Circuits Conference.

    That's 85 million excluding L2 cache according to your figure (64 million transistors per megabyte of L2). It's much more than 50M ...
  • twhittet - Friday, December 02, 2011 - link

    A 40% reduction in transistor count makes perfect sense, because it's about 40% slower than I thought it should have been.

    I remembered looking at the charts at the beginning, and wondering how the hell it was slower, clock for clock, than Thuban, with more than twice the number of transistors.
  • dew111 - Friday, December 02, 2011 - link

    I'm somewhat relieved at this news as well. It doesn't change Bulldozer's performance, but it sure makes it look better for future variants to increase performance and power efficiency. If AMD can't beat Intel with 2x the transistor count, they would be in huge trouble. Luckily, with 1.33x the transistor count, they can trounce Intel in many multithreaded workloads. This makes a lot more sense, as it's what the architecture was designed to do. Bulldozer was meant to add more 'cores' with fewer transistors, and it appears with the real transistor count they have achieved this.
  • Aone - Friday, December 02, 2011 - link

    AMD should have corrected the transistor count of ONE module, which with 2MB L2 has 213M transistors, because if we do the calculation: 213M (transistors per module) * 4 = 852M transistors. 1200M - 852M = 348M.
    Is it possible that 348M transistors could serve 8MB L3 plus uncore parts?
