Cache and Memory Performance

I mentioned earlier that cache latencies are higher in order to accommodate the larger caches (8MB L2 + 8MB L3) as well as the high frequency design. We turned to our old friend cachemem to measure these latencies in clocks:

Cache/Memory Latency Comparison (latency in clocks)

CPU                                 L1   L2   L3   Main Memory
AMD FX-8150 (3.6GHz)                 4   21   65   195
AMD Phenom II X4 975 BE (3.6GHz)     3   15   59   182
AMD Phenom II X6 1100T (3.3GHz)      3   14   55   157
Intel Core i5 2500K (3.3GHz)         4   11   25   148

Cache latencies are up significantly across the board, which is to be expected given the increase in pipeline depth as well as cache size. But is Bulldozer able to overcome the increase through higher clocks? To find out we have to convert latency in clocks to latency in nanoseconds:

Memory Latency

We disabled turbo in order to get predictable clock speeds, which lets us accurately calculate memory latency in ns. The FX-8150 at 3.6GHz has a longer trip down memory lane than its predecessor, also at 3.6GHz. The higher-latency caches play a role in this, as they are necessary to help drive AMD's frequency up. What happens if we turn turbo on and peg the FX-8150 at 3.9GHz? Memory latency goes down. Bulldozer still isn't able to get to main memory as quickly as Sandy Bridge, but thanks to Turbo Core it gets there faster than the outgoing Phenom II.
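For readers who want to redo the clocks-to-nanoseconds conversion themselves, here is a minimal sketch. It assumes the clock counts in the table are core clocks at the listed frequency, so one clock at f GHz lasts 1/f ns. The names `LATENCY_CLOCKS`, `clocks_to_ns` and `latency_table_ns` are made up for this illustration; the numbers it produces are derived only from the table and won't necessarily match the charts, which also include turbo-enabled runs.

```python
# Latencies in clocks, copied from the comparison table above.
# Each entry: (clock speed in GHz, {cache/memory level: latency in clocks}).
# Assumption: the counts are core clocks at the listed (non-turbo) frequency.
LATENCY_CLOCKS = {
    "AMD FX-8150": (3.6, {"L1": 4, "L2": 21, "L3": 65, "Main Memory": 195}),
    "AMD Phenom II X4 975 BE": (3.6, {"L1": 3, "L2": 15, "L3": 59, "Main Memory": 182}),
    "AMD Phenom II X6 1100T": (3.3, {"L1": 3, "L2": 14, "L3": 55, "Main Memory": 157}),
    "Intel Core i5 2500K": (3.3, {"L1": 4, "L2": 11, "L3": 25, "Main Memory": 148}),
}


def clocks_to_ns(clocks, freq_ghz):
    """Convert a latency in clocks to nanoseconds.

    One clock at freq_ghz GHz lasts 1 / freq_ghz ns, so the latency in ns
    is simply clocks / freq_ghz.
    """
    return clocks / freq_ghz


def latency_table_ns(table=LATENCY_CLOCKS):
    """Return the same table with every latency converted to ns (1 decimal)."""
    return {
        cpu: {level: round(clocks_to_ns(c, ghz), 1) for level, c in levels.items()}
        for cpu, (ghz, levels) in table.items()
    }


if __name__ == "__main__":
    for cpu, levels in latency_table_ns().items():
        print(cpu, levels)
```

Dividing by the clock speed is what makes the cross-CPU comparison fair: a part that needs more clocks per access can still come out even in nanoseconds if each of its clocks is shorter.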

L3 Cache Latency

L3 access latency is effectively a wash compared to the Phenom II thanks to the higher clock speeds enabled by Turbo Core. Latencies haven't really improved though, and Bulldozer has a long way to go before it reaches Sandy Bridge access latencies.

430 Comments

  • saneblane - Wednesday, October 12, 2011 - link

    What was the CPU usage like? I have a sinking feeling that CPU usage was low for most of the review. I heard rumors that AMD is working on a patch, which would make sense because Zambezi loses to the Athlon X4 sometimes, and that doesn't make any sense to me at all. There has to be a performance loss somewhere, whether it is in the CPU itself or because its design is hard for Windows to handle. This processor can't be this slow.
  • punchcore47 - Wednesday, October 12, 2011 - link

    Look back to when the first Phenom hit the street. I think AMD will right the ship, update over
    time, and fix any problems. The gaming performance really looks sad though.
  • bhima - Wednesday, October 12, 2011 - link

    BD will have to drop their prices pretty hard to compete with these benchmarks. They are designed for an even smaller niche than gamers: People who use heavily threaded applications all day.

    I also don't see why anyone would ever put these procs into a server, with over 100 watts extra of heat running through your system compared to the i5 and i7. Interlagos may be more efficient, but the architecture is already very power hungry compared to Intel's offering.

    Really great way to end the review though, Anand. AMD must return to its glory days so Intel doesn't continue to jack consumers. Hell, after these benchmarks I could see Intel INCREASING their prices instead of decreasing them.
  • haukionkannel - Thursday, October 13, 2011 - link

    Hmm... It seems that BD is leaking a lot of energy when running at high frequency! But I am quite sure that it is very good at a low 95W usage, with lower frequency. So I think that BD is actually a really good, low-energy CPU for server use, but the desktop usage is very problematic indeed.

    Seems to be a lot like the Phenom release. A lot of current leakage, and you get either good power and weak performance or a little better performance and really bad power consumption... The next BD upgrade can remedy a lot of this, but it cannot work miracles...

    I am quite sure that BD will become a reasonable CPU with upgrades and tinkering, but is it enough? The 32nm production technology will get better in time, so the power usage will get better, so they can raise frequencies. Single-thread speed is the main problem... If, by some divine intervention, programmers really learn to use multiple cores and streams, the future is bright... But most probably the golden number of cores will be 2-4 into the far distant future... (not counting some special programs...) And that is bad. It would require a lot of re-engineering of BD to make it better in single-stream applications, and that may be too expensive at this moment. There is some real potential in BD, but it would require too much from the software side to harness that power, when Intel has such a huge lead in single-core speed... Same reason Intel buried their "multicore" GPU project some time ago...

    We can only hope that Fusion and the GPU department keep AMD afloat long enough... Or we will have to face the long dark of an Intel monopoly... It would be the worst-case scenario.
  • Shining Arcanine - Wednesday, October 12, 2011 - link

    Anand, your compilation benchmark tests only single-threaded improvements. Would it be possible to do a multithreaded benchmark? Just do compilation on Linux with MAKEOPTS=-j9.

    Also, most of your benchmarks only test floating point performance. It was obvious to me that Bulldozer would be bad at that and I am not surprised. Is it possible to test parallel, integer-heavy workloads like a LAMP server? Compilation is another one, but I mentioned that above.
  • know of fence - Wednesday, October 12, 2011 - link

    Here's hoping that the reviews to follow will offer at least some perspective on why single-thread performance is still important, instead of just harping on it (as did the reviews before them).

    Everybody can run a benchmark, but it's the broad context and perspective that I have come to appreciate reading about in Anandtech reviews, beyond "I suspect this architecture will do quite well in the server space". Mind you, I'm not referring to the big AMD vs. INTEL broad strokes, but the nitty-gritty.
  • geforce912 - Wednesday, October 12, 2011 - link

    Honestly, I think AMD would have been better off shrinking Phenom II to 32nm and slapping on two more cores.
  • tech4tac - Wednesday, October 12, 2011 - link

    Agreed. An enhanced 8-core Phenom II X8 on a 32nm process would have used ~1.2B transistors on a ~244mm^2 die (smaller than Deneb & about the size of Gulftown) as opposed to the monstrous ~2B and 315mm^2 of an 8-core Bulldozer. Given the same clock speed, my estimates have it outperforming the i7-2600 on most multi-threaded applications. And, with a few tweaks for more aggressive turbo under single-core workloads, it would have at least been somewhat competitive in games.

    Bulldozer is a BIG disappointment! It would need at least another 4 cores (2 modules) tacked on to be worthwhile for multi-threaded applications. AMD has stated it is committed to providing as many cores as Intel has threads (Gulftown has 12 threads, so a 12-core Bulldozer?), so maybe this will happen. Still... nothing can help its abysmal single-core performance. If they can do a 12-core Bulldozer for less than $300, I might get one for a work machine but stick with an Intel chip for my gaming rig.
  • Shadowmaster625 - Wednesday, October 12, 2011 - link

    Companies this incompetent should not be allowed to survive. They bought a GPU company 5 years ago, and have done absolutely nothing to create any type of fusion between the cpu and gpu. You still have a huge multi-layer, multi-company software bloat separating the two pieces of hardware. They have done nothing to address this, and it is clear they never will. Which makes the whole concept a failure. It was a total waste of money.
  • HalloweenJack - Wednesday, October 12, 2011 - link

    And the day after, Intel triples its CPU prices... is that what you want?

    $500 entry-level CPUs?
