Cache and Memory Performance

I mentioned earlier that cache latencies are higher in order to accommodate the larger caches (8MB L2 + 8MB L3) as well as the high frequency design. We turned to our old friend cachemem to measure these latencies in clocks:

Cache/Memory Latency Comparison (in clocks)
CPU                                 L1    L2    L3    Main Memory
AMD FX-8150 (3.6GHz)                4     21    65    195
AMD Phenom II X4 975 BE (3.6GHz)    3     15    59    182
AMD Phenom II X6 1100T (3.3GHz)     3     14    55    157
Intel Core i5 2500K (3.3GHz)        4     11    25    148

Cache latencies are up significantly across the board, which is to be expected given the increase in pipeline depth as well as cache size. But is Bulldozer able to overcome the increase through higher clocks? To find out we have to convert latency in clocks to latency in nanoseconds:

Memory Latency

We disable turbo in order to get predictable clock speeds, which lets us accurately calculate memory latency in ns. The FX-8150 at 3.6GHz has a longer trip down memory lane than its predecessor, also at 3.6GHz. The higher latency caches play a role in this as they are necessary to help drive AMD's frequency up. What happens if we turn turbo on and peg the FX-8150 at 3.9GHz? Memory latency goes down. Bulldozer still isn't able to get to main memory as quickly as Sandy Bridge, but thanks to Turbo Core it's able to do so better than the outgoing Phenom II.
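The clocks-to-nanoseconds conversion can be sketched in a few lines. This is only arithmetic on the table above, assuming each chip runs at its listed base clock with turbo disabled; the function name and dictionary layout are illustrative, and the results are derived values rather than the measured figures in the charts. Clock counts at the 3.9GHz turbo frequency are not given in the table, so that case is not computed here.

```python
# Latencies in clocks, copied from the Cache/Memory Latency Comparison table.
# Each entry is (clock speed in GHz, {cache level: latency in clocks}).
LATENCY_CLOCKS = {
    "AMD FX-8150": (3.6, {"L1": 4, "L2": 21, "L3": 65, "Main Memory": 195}),
    "AMD Phenom II X4 975 BE": (3.6, {"L1": 3, "L2": 15, "L3": 59, "Main Memory": 182}),
    "AMD Phenom II X6 1100T": (3.3, {"L1": 3, "L2": 14, "L3": 55, "Main Memory": 157}),
    "Intel Core i5 2500K": (3.3, {"L1": 4, "L2": 11, "L3": 25, "Main Memory": 148}),
}


def clocks_to_ns(clocks, ghz):
    """One clock at f GHz lasts 1/f ns, so latency in ns is clocks / f."""
    return clocks / ghz


for cpu, (ghz, levels) in LATENCY_CLOCKS.items():
    row = ", ".join(
        f"{level} {clocks_to_ns(clocks, ghz):.1f} ns"
        for level, clocks in levels.items()
    )
    print(f"{cpu} @ {ghz}GHz: {row}")
```

At equal clock counts, a higher frequency gives a lower latency in ns, which is why Turbo Core can claw back some of the deficit.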

L3 Cache Latency

L3 access latency is effectively a wash compared to the Phenom II thanks to the higher clock speeds enabled by Turbo Core. Latencies haven't really improved though, and Bulldozer has a long way to go before it reaches Sandy Bridge access latencies.


430 Comments

  • silverblue - Thursday, October 13, 2011 - link

    Isn't the AseTek cooler self-contained?
  • jaygib9 - Thursday, October 13, 2011 - link

    Belard, water cooling is what many of the higher end gaming systems already run. What makes it stupid? It's far more effective than air cooling, it just requires more equipment and is a little more costly. You say a 25% overclock won't make up the performance difference, but what about possibly going up to 6 GHz/core with water cooling? Do you really think that wouldn't have some pretty good numbers? Hey silverblue, I'm not sure.
  • dillonnotz24 - Wednesday, October 12, 2011 - link

    This is a rather naïve sounding post, but it just occurred to me and I figured I might share.

    Now, I'm putting a lot of faith in simple marketing gimmicks here, but bear with me, and you might find this excellent food for thought.

    When I first discovered the leak concerning AMD's new Bulldozer consumer CPU's, I was kind of put off by AMD's naming scheme. FX 8150 seems like such a small number, and obviously wouldn't appear appealing to the eyes of un-savvy consumers. Now, one might find this claim a bit irrelevant, but if you look at history, numbers sell. Even AMD confirmed this when it launched its new series naming jargon, the "A4's, A6's, and A8's." This is quite obviously a marketing illusion to make AMD processors appear better than Intel's Core i3's, i5's, and i7's in an area that most unaware consumers will see first: the name!

    That said, I started thinking about processor branding. In the past, AMD has used a really strict branding system for its last two CPU designs. A Phenom II's part name very consistently correlated with the CPU's clockspeed and, therefore, performance. Slap on an extra +5 to the name and you got an extra 0.1 GHz of CPU frequency. Also, the higher end CPU's were always placed in the higher spectrum of the thousands place. The top of the line quad cores populated the numbers 925-1000, while the hexa-cores resided in the 1000-1100's. The rebranded CPU's based on original Phenom and Athlon architectures were given much lower values in the 1000's place, with the very popular 555 BE being a prime example. With Llano, the top-end A8-3850 reiterates this phenomenon. The further the part name extends from the number "4000," the less performance you received from the CPU and GPU, relatively incrementally. So, as you can see, AMD consistently used this strategy to give value to their parts without listing a single specification. Larger numbers generally mean more performance, and to the casual onlooker, unfamiliarity with the performance range you actually received from the processors in comparison to Intel's made that sub-$200 price point look really tasty.

    So, I say all that to present the following theory. Given that these processors can reach 4.6 GHz on air, and the unicorn-like 5.0 GHz (presumably) on AMD's water cooling solution, there seems to be a lot of headroom for AMD to pull off the most unprecedented comeback in the history of computing. That's right, I'm saying that maybe AMD intends to release new Bulldozer variants with upped clockspeeds and an actual included water-cooling solution for a raised pricepoint
  • dillonnotz24 - Wednesday, October 12, 2011 - link

    ...a raised price point. Could we see a future Bulldozer AMD A8 8950 @ 4.5 GHz with water-cooling bundled for $350 once Global F. gets its act together with producing reliable chips? Think about it...AMD's CPU frequency stepping and naming is nowhere to be found with these CPU's, and they are all huddled down around the number 8000. If this is actually the very bottom of the spectrum, this would mean that the very low end Bulldozer variants were on par with the best of Phenom II. Subsequently, the higher end Bulldozers I propose would have nothing to lose and everything to gain with higher clock speeds. All they can do is go up! With higher clockspeeds, Bulldozer could make up for all its woes seen here today in single and double threaded applications, which comprise nearly 50% of consumer level apps. There's potential here, but I will admit to those of you who find this whole concept absurd, I have my doubts. Can AMD do it? They'd have my eternal respect, and wallet, if they do.
  • Belard - Thursday, October 13, 2011 - link

    Sooner or later.... someone (perhaps Anandtech) will benchmark a 5Ghz AMD FX 8000 series CPU.

    If said 5.0Ghz CPU (water cooled) is still SLOWWWER in any way compared to a $200 intel 2400 (3.1Ghz) or the $210 2500, who would care to buy such a $300~350 chip?

    Okay.. I upclock the 2500k to 4ghz and it kills the 8150 at 5~6Ghz.... Nobody buys the 8150 or higher. It just doesn't matter... it's too slow.
  • stephenbrooks - Wednesday, October 12, 2011 - link

    They support FMA instructions but then don't fuse multiply and add micro-ops to *make* FMA instructions (as far as I can tell from the article). That's stupid.

    The way they've done it, everyone has to get a new compiler to take advantage of their chips. If they created FMAs in the muop-fusion stage, then even older software would get a boost too.
  • mczak - Wednesday, October 12, 2011 - link

    You can absolutely not fuse mul+add on the fly to fma as the results will be different. Now you can argue the result is "better" by omitting the rounding after the multiplication but fact is you need to 100% adhere to the standard which dictates you need to do it. Software might (quite likely some really will) rely on doing the right thing.
    For the same exact reason compilers can't do such optimizations either, unless you specifically tell them it's ok and you don't care about such things (it's not only standard adherence but also reproducibility - such compiler switches also allow them to reorder things like a+b+c as a+(b+c) which isn't allowed either otherwise for floating point as you can get different results which makes things unpredictable).
    (gcc for instance has a -ffast-math switch which I guess might be able to do such fusing, I don't know if it will though I know you can get surprising bugs with that option if you don't think about it...)
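The two effects described in the comment above can be reproduced with a short sketch. It does not use a hardware FMA instruction: the helper emulated_fma is a hypothetical stand-in that computes a*b+c exactly with rational arithmetic and rounds once at the end, which is the behavior IEEE 754-2008 specifies for a fused multiply-add. The input values are chosen purely for illustration.

```python
from fractions import Fraction


def emulated_fma(a, b, c):
    # Hypothetical helper, not a hardware FMA: compute a*b + c exactly
    # as a rational number, then round a single time to a double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Values picked so that the product a*b is not exactly representable.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate mul then add: the product is rounded to 1.0 before the add.
unfused = a * b + c
# Fused: no rounding after the multiply, so the tiny residue survives.
fused = emulated_fma(a, b, c)
print("mul+add:", unfused)
print("fused:  ", fused)

# Reordering a+b+c as a+(b+c) also changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("(a+b)+c == a+(b+c):", left == right)
```

Both differences are tiny, but they are exactly the kind of silent change in output that a CPU or a compiler in its default mode is not allowed to introduce.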
  • stephenbrooks - Thursday, October 13, 2011 - link

    Thanks for explaining that. I'd kind of assumed FMA would just round as if the MUL happened first. Defining it "correctly", they've thrown away a lot of compatibility for a really marginal increase in accuracy!
  • Pipperox - Thursday, October 13, 2011 - link

    Nope, it's not marginal.
    Basically with your "fused madd" you'd get code which on Bulldozer gives slightly different results than on any other CPU.. silently.
    This is called a bug.
    It is just not acceptable for a CPU to produce "optimizations" which alter even slightly the expected numerical output, because then the programs which run on them would fail in very slight and hard to track ways.
  • mczak - Thursday, October 13, 2011 - link

    That isn't quite correct. There is a real demand for fused multiply add, not doing rounding after the mul is something which is quite appreciated. You just can't use fma blindly as a mul+add replacement, but it's perfectly defined in standard floating point too nowadays (ieee 754-2008).
    Besides, it would be VERY difficult for the cpu to fuse mul+adds correctly even if it would do intermediate rounding after the mul. First the cpu would need to recognize the mul+add sequence - doable if they immediately follow each other I guess, though requires analysis of the operands. Unless both instructions write to the same register it also wouldn't be able to do it anyway since it cannot know if the mul result won't get used later by some other instruction.
    This is really the sort of optimization which compilers would do, not cpus. Yes cpus do op fusion these days but it's quite limited.
