Today at the annual Hot Chips conference, AMD’s new CTO Mark Papermaster unveiled the first details about the Steamroller x86 CPU core.

Steamroller is the third instantiation of AMD’s Bulldozer architecture, first conceived in the mid-2000s and finally brought to market in late 2011. Committed to this architecture for at least one more design after Steamroller, AMD has settled on roughly yearly updates to the architecture. For 2012 we have the introduction of Piledriver, the optimized Bulldozer derivative that formed the CPU foundation for AMD’s Trinity APU. By the end of the year we’ll also see a high-end desktop CPU without processor graphics based on Piledriver.

Piledriver saw a switch to hard edge flip flops, which allowed for a considerable decrease in power consumption at the cost of more careful design and validation work. Performance didn’t change, but AMD saw a 10% - 20% reduction in active power. Piledriver also brought some scheduling efficiency improvements, with prefetching and branch prediction being its other two major areas of improvement.

Steamroller is designed to keep the ball rolling. It takes the fundamentals of the Bulldozer/Piledriver architectures and offers a healthy set of evolutionary improvements on top of them. In Intel speak Steamroller wouldn’t be a tick, as it isn’t accompanied by a significant process change (28nm bulk is pretty close to 32nm SOI), but it’s not a tock either, as the architecture is enhanced rather than fundamentally changed. Steamroller fits somewhere in between those two extremes.
 

Front End Improvements

 
One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:
 
Front End Comparison
                                  AMD Phenom II         AMD FX            Intel Core i7
Instruction Decode Width          3-wide                4-wide            4-wide
Single Core Peak Decode Rate      3 instructions        4 instructions    4 instructions
Dual Core Peak Decode Rate        6 instructions        4 instructions    8 instructions
Quad Core Peak Decode Rate        12 instructions       8 instructions    16 instructions
Six/Eight Core Peak Decode Rate   18 instructions (6C)  16 instructions   24 instructions (6C)
 
Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle. Don’t expect a doubling of performance since it’s rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller. 
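
The arithmetic behind those peak decode rates is simple enough to sketch. The snippet below is only an illustration, not anything AMD has published: the function name `peak_decode_rate` and its parameters are made up for this example, and it assumes active cores are packed into as few modules as possible (so a "dual core" FX is one module).

```python
import math


def peak_decode_rate(cores, decode_width, cores_per_decoder):
    """Peak instructions decoded per cycle across `cores` active cores.

    Hypothetical helper: `cores_per_decoder` is 2 when two cores share one
    decoder (Bulldozer/Piledriver module) and 1 when each core has its own.
    """
    decoders = math.ceil(cores / cores_per_decoder)
    return decoders * decode_width


core_counts = (1, 2, 4, 8)

# Bulldozer/Piledriver: one 4-wide decoder shared by both cores in a module.
shared = [peak_decode_rate(n, 4, 2) for n in core_counts]

# Steamroller: each core gets its own 4-wide decoder.
dedicated = [peak_decode_rate(n, 4, 1) for n in core_counts]

print(shared)     # [4, 4, 8, 16]
print(dedicated)  # [4, 8, 16, 32]
```

The shared case reproduces the AMD FX figures quoted from the Bulldozer review, while the dedicated case shows the peak rate doubling at two cores and above. As noted, peak rates overstate the real gain since a 4-wide front end is rarely fully utilized.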
 
The penalties are pretty obvious: area goes up as does power consumption. However the tradeoff is likely worth it, and both of these downsides can be offset in other areas of the design as you’ll soon see.

Steamroller inherits the perceptron branch predictor from Piledriver, but in an improved form for better performance (mostly in server workloads). The branch target buffer is also larger, which contributes to a reduction of up to 20% in mispredicted branches.
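
AMD hasn't disclosed how its perceptron predictor is built, so the following is only a textbook-style sketch of the general technique, not AMD's design. The class name, table size and history length are all assumptions chosen for illustration, and the training threshold uses a heuristic commonly cited in the perceptron predictor literature.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (illustrative, not AMD's design)."""

    def __init__(self, history_length=8, table_size=64):
        self.table_size = table_size
        # Commonly cited training threshold heuristic: floor(1.93 * h + 14).
        self.threshold = int(1.93 * history_length + 14)
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_length + 1) for _ in range(table_size)]
        # Global history: +1 for taken, -1 for not taken.
        self.history = [1] * history_length

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when confidence is below threshold.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


# Usage: a branch that alternates taken / not taken is learned quickly.
predictor = PerceptronPredictor()
correct = []
for i in range(200):
    outcome = (i % 2 == 0)
    correct.append(predictor.predict(0x40) == outcome)
    predictor.update(0x40, outcome)
```

The appeal of the approach is that each weight tracks how strongly one past branch outcome correlates with the current branch, which lets the predictor use long histories without table sizes exploding.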
 

Execution Improvements

 
AMD streamlined the large, shared floating point unit in each Steamroller module. There’s no change in the execution capabilities of the FPU, but there’s a reduction in overall area. The MMX unit now shares some hardware with the 128-bit FMAC pipes. AMD wouldn’t offer too many specifics, other than to say that the hardware is only shared between mutually exclusive MMX/FMA/FP operations and thus the sharing wouldn’t result in a performance penalty.
 
The reduction of pipeline resources is supposed to deliver the same throughput at lower power and area, basically a smarter implementation of the Bulldozer/Piledriver FPU. 

There’s no change to the integer execution units themselves, but there are other changes that improve integer performance.
 
The integer and floating point register files are bigger in Steamroller, although AMD isn’t being specific about how much they’ve grown. Load operations (two operands) are also compressed so that they only take a single entry in the physical register file, which helps increase the effective size of each RF. 
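
To get a feel for why that compression matters, here is a back-of-the-envelope sketch. Every number in it is hypothetical: AMD hasn't given register file sizes, the 25% load mix is picked purely for illustration, and the idea that a load previously consumed two entries is an assumption read into the "two operands" remark above.

```python
def in_flight_ops(rf_entries, load_fraction, entries_per_load):
    """Rough count of operations a register file can hold in flight.

    Assumes every non-load operation takes one entry and every load
    takes `entries_per_load` entries (a simplifying assumption).
    """
    avg_entries_per_op = load_fraction * entries_per_load + (1 - load_fraction)
    return rf_entries / avg_entries_per_op


RF_ENTRIES = 96       # hypothetical size, not an AMD figure
LOAD_FRACTION = 0.25  # hypothetical instruction mix

before = in_flight_ops(RF_ENTRIES, LOAD_FRACTION, entries_per_load=2)
after = in_flight_ops(RF_ENTRIES, LOAD_FRACTION, entries_per_load=1)

print(before, after)
```

Under these made-up numbers the same physical register file covers about a quarter more operations, which is what "effective size" is getting at.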
 
The scheduling windows also increased in size, which should enable greater utilization of existing execution resources. 
 
Store to load forwarding sees an improvement. Steamroller is better than its predecessors at detecting interlocks, cancelling the load and getting the data from the store instead.
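
For readers unfamiliar with the concept, store to load forwarding can be sketched with a toy store queue. This is a generic illustration of the technique, not a model of Steamroller's load/store unit; the `StoreQueue` class and its methods are hypothetical names, and real hardware also has to handle partial overlaps and unresolved addresses.

```python
class StoreQueue:
    """Toy store queue with store to load forwarding (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory  # dict mapping address -> value
        self.pending = []     # (address, value) pairs in program order

    def store(self, address, value):
        """Buffer a store that has executed but not yet written memory."""
        self.pending.append((address, value))

    def load(self, address):
        """Return (value, forwarded) for a load issued after pending stores."""
        # Search youngest to oldest so the most recent matching store wins.
        for addr, value in reversed(self.pending):
            if addr == address:
                return value, True  # data forwarded from the store
        return self.memory.get(address, 0), False  # fall back to memory

    def retire_oldest(self):
        """Commit the oldest pending store to memory."""
        if self.pending:
            addr, value = self.pending.pop(0)
            self.memory[addr] = value


# Usage: the load sees the store's data before it has reached memory.
memory = {0x10: 1}
sq = StoreQueue(memory)
sq.store(0x10, 7)
forwarded_result = sq.load(0x10)  # (7, True)
miss_result = sq.load(0x20)       # (0, False)
sq.retire_oldest()
after_retire = sq.load(0x10)      # (7, False)
```

The better the hardware is at spotting that a load depends on an in-flight store, the more often it can take the fast forwarded path rather than waiting on the cache.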

126 Comments


  • CeriseCogburn - Friday, October 12, 2012

    Congratulations. You have achieved what others have repeatedly failed to do.
    You have broken the darkside grasp of the AMD PR fanboy advertising pump campaign.

    "Consequently, I sit here asking myself WTF!? Not at AMD (this let down was expected) but at myself. There is no other manufacturer, service provider, or producer that I would tolerate this from, why am I accepting it from AMD?"

    It appears your mind has cleared, you have exited enslavement to the Deathstar.
  • MLSCrow - Wednesday, September 5, 2012

    Quote from Laststop311: "-28nm really? Intel will be on it's 2nd gen of 22nm over a year after 22nm debuts for intel and amd still can't match that size. Will steamroller be enough to go up against haswell, thats not even a legit question, haswell is going to obliterate steamroller in every way imaginable."

    Response: Intel will always be ahead of AMD in terms of their fab process. This is nothing new. AMD will go to 28 after Intel has gone to 22. AMD will then go to 14 after Intel moves to 11 and so on and so forth. Old news. The question isn't whether Steamroller will be enough to go up against Haswell. Haswell will be better in every way, you're right. Anyone who tries to argue that isn't well informed. The real question is, will Steamroller be enough to keep AMD in the game and the answer is, if it performs the way they are saying it will, I predict yes, it will for sure keep them in the game. According to AMD's information (I trust this group more than the group in charge of the original Bulldozer who were fired), Steamroller will perform at least 15% faster than Piledriver, but from what they said at Hot Chips, 30% faster. 30% faster than Piledriver will put its performance up against Ivy Bridge, possibly Ivy Bridge-E and to be honest, once you're at that level of performance, no one will even notice a 10-20% improvement, which is what is being claimed for Haswell. Most people today wouldn't know the difference if you put a Phenom II X4 in their system or a Sandy Bridge. It's only the benchmarkers, hardcore gamers, and enthusiasts that will notice and they only make up a very small portion of the market. If HSA takes off and it might, especially with Samsung having signed on last week, which may spark other companies to join considering Samsung is so huge, and companies start to code for it, Steamroller might actually give AMD a moment of glory that they haven't seen since Athlon 64. If the industry moves toward HSA the way AMD is betting the farm on, Steamroller will actually steamroll Haswell pretty bad, but who knows what the future holds in that regard. Regardless, whether or not HSA takes off, Steamroller should be enough to keep AMD in the game.

    Quote by hapkiman: "If they don't hit one out of the park soon, I see AMD turning into a second rate company making low-end APUs for OEMs. and of course graphics cards."

    Response: They already are a second rate company making low-end APUs for OEMs and of course graphics cards. LOL. They'd HAVE to hit a home run with HSA and Steamroller to truly get back into the first rate game. They've been out of it for a few years now. Thanks to Hector Ruiz (should be Ruinz).
  • mack53 - Wednesday, September 12, 2012

    Still think it boils down to what you want. If it does that, who cares. Plus I can't spend the money that Intel wants for the newest and best. AMD has done me right for a long time. If we didn't have AMD, Intel would take over and I'd hate to see the costs then....
  • HexiumVII - Sunday, April 7, 2013

    Imagine the win ultrabook/tablet when AMD can put a Core class (even first gen will do) CPU with their Radeon APUs. Come on AMD, go!
  • scorpysr - Tuesday, June 11, 2013

    hi everyone,
    I've read every post here, lol, kinda comical in a way. with that said, just a bit of introduction: been building comps since the 386 and boy we have come a long way. I'm not a rocket scientist by any means but want to say a few things...mack53, that's pretty much it in a nutshell...but firstly, money is always a factor in life, can't get around it. there may be some that would rather have a Corvette but eat bologna for 2 years, so be it...personally I like my steaks... :) secondly..I started out with AMD and I am still glad they are around. we need competition in life, it does drive the wheels of innovation and helps keep prices down...however the business model has sadly changed a lot. it's no longer sell a lot and make a fair 20% profit, now it's get what you can get even if you gouge..I digressed a little, sorry, but here it is...I bought an i7 950 2 years ago...got a nice video card and 6 gig of ram. I doubt anyone here can justify me going out and spending another 600 bucks on the latest upgrade..nope, this rig will take me well into 2016 or until the performance of this chip is beaten by at least 30%....and guys, don't get me wrong, having a hobby is nice...but can you imagine a site dedicated to who makes the best refrigerator !!!!!!!!!! lol just food for thought and
    thanks for allowing me this opportunity

    cheers
  • gareth112 - Tuesday, June 25, 2013

    Intel have been releasing great products over the last, say, 4 years with the i series, but the bang for buck is always with AMD and ATI. You can normally build a great system for half the price with AMD and ATI products, with the same performance as the Intel based system.

    I use both Intel and AMD for work, and yes, Intel processors are faster, but when you have an AMD in your computer it doesn't seem to cause as many random crashes, because the products have been developed for longer rather than just rushed out.

    Plus I like being on the rebels/underdogs team, as come on, everyone likes to bet on the underdog.
