AMD Beema/Mullins Architecture & Performance Preview

Author: Anand Lal Shimpi

by Anand Lal Shimpi on April 29, 2014 12:00 AM EST


When AMD launched its Kabini and Temash APUs last year it delivered a compelling cost/performance story, but its power story wasn’t all that impressive. Despite being built out of relatively low power components, nearly all of AMD’s entry level APUs carried 15W TDPs, with a couple weighing in at 8 - 9W and only a single 1GHz dual-core part dropping down to 3.9W. By comparison, Intel was shipping full blown Haswell Ultrabook parts at 15W - offering substantially better CPU performance, in a similar thermal envelope (although at a higher cost). The real disruption for AMD was Intel’s Bay Trail, which showed up with a similar looking micro architecture running at substantially higher clock speeds and TDPs below 8W.

AMD seemed to have all of the right pieces to build a power efficient mobile SoC, but for some reason we weren’t seeing it. Today that begins to change with the successors to Kabini and Temash.

Codenamed Beema and Mullins, these are the 2014 updates to Kabini and Temash (respectively). Beema is aimed at entry level notebooks, while Mullins targets tablets. For now, both are designed for Windows machines. Although I suspect we’ll eventually see AMD address the native Android market head on, for now AMD is relying on running Android on top of Windows for those who really want it. No word on if/when we’ll get a socketed Beema for entry level desktops.

Like their predecessors, Beema and Mullins combine four low power AMD x86 cores (Puma+ this time, instead of Jaguar) with 128 GCN based Radeon GPU cores. AMD will continue to offer a couple of dual-core SKUs, but they are harvested from a quad-core die. AMD remains unwilling to release official die area figures, but there is a slight increase in transistor count:

AMD/Intel Transistor Count & Die Area Comparison
SoC	Process Node	Transistor Count	Die Area
AMD Zacate	TSMC 40nm	450M+	75mm²
AMD Kabini/Temash	TSMC 28nm	914M	~107mm² (est)
AMD Beema/Mullins	GF 28nm	930M	~107mm² (est)
AMD Llano	GF 32nm SOI	1.18B	228mm²
AMD Trinity/Richland	GF 32nm SOI	1.30B	246mm²
AMD Kaveri	GF 28nm SHP	2.41B	245mm²
Intel Haswell (4C/GT2)	Intel 22nm	1.40B	177mm²

I’d expect a similar die size to Kabini/Temash. It’s interesting to note that these SoCs have a transistor count somewhere south of Apple’s A7.
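To put the table in perspective, here is a rough sketch that computes transistor density and the generational change in transistor count from the figures above. Treat the output as approximate: the Kabini/Temash and Beema/Mullins die areas are estimates, and the Zacate count is a lower bound (450M+). The function names are mine, chosen for illustration.

```python
# Figures copied from the table above: (name, transistors in millions, die area in mm^2, caveat)
CHIPS = [
    ("AMD Zacate", 450, 75, "transistor count is a lower bound (450M+)"),
    ("AMD Kabini/Temash", 914, 107, "die area is an estimate"),
    ("AMD Beema/Mullins", 930, 107, "die area is an estimate"),
    ("AMD Llano", 1180, 228, ""),
    ("AMD Trinity/Richland", 1300, 246, ""),
    ("AMD Kaveri", 2410, 245, ""),
    ("Intel Haswell (4C/GT2)", 1400, 177, ""),
]


def density(transistors_m, area_mm2):
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_m / area_mm2


def percent_increase(old, new):
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for name, transistors_m, area_mm2, caveat in CHIPS:
    line = f"{name}: {density(transistors_m, area_mm2):.2f} M transistors/mm^2"
    if caveat:
        line += f" ({caveat})"
    print(line)

# Generational change in transistor count, Kabini/Temash -> Beema/Mullins
print(f"Transistor count change: {percent_increase(914, 930):.2f}%")
```

By these numbers the jump from 914M to 930M transistors is under 2%, which is consistent with calling the increase slight.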

Puma+ is based on the same micro architecture as Jaguar. We’re still looking at a 2-wide OoO design with the same number of execution units and data structures inside the chip. The memory interface remains unchanged as well at 64-bits wide. These new SoCs are still built on the same 28nm process as their predecessors. The process, however, has seen some improvements. Not only are both the CPU and GPU designs slightly better optimized for lower power operation, but both benefit from improvements to the manufacturing process resulting in substantial decreases in leakage current.

AMD claims a 19% reduction in core leakage/static current for Puma+ compared to Jaguar at 1.2V, and a 38% reduction for the GPU. The drop in leakage directly contributes to a substantially lower power profile for Beema and Mullins.
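As a minimal sketch of what those percentages mean for static power, assume static power is simply supply voltage times leakage current. Only the 19% and 38% reductions and the 1.2V operating point come from AMD's claims; the baseline leakage currents below are hypothetical placeholders (the article gives no absolute figures), and applying 1.2V to the GPU is also an assumption.

```python
VOLTAGE_V = 1.2                    # operating point quoted for the CPU comparison
CPU_LEAKAGE_REDUCTION_PCT = 19.0   # AMD's claim, Puma+ vs. Jaguar
GPU_LEAKAGE_REDUCTION_PCT = 38.0   # AMD's claim for the GPU

# HYPOTHETICAL baseline leakage currents in amps, for illustration only.
BASELINE_CPU_LEAKAGE_A = 0.50
BASELINE_GPU_LEAKAGE_A = 0.50


def reduced(value, reduction_pct):
    """Apply a percentage reduction to a value."""
    return value * (1.0 - reduction_pct / 100.0)


def static_power_w(voltage_v, leakage_a):
    """Static power in watts, modeled as voltage times leakage current."""
    return voltage_v * leakage_a


for name, baseline_a, pct in (
    ("CPU", BASELINE_CPU_LEAKAGE_A, CPU_LEAKAGE_REDUCTION_PCT),
    ("GPU", BASELINE_GPU_LEAKAGE_A, GPU_LEAKAGE_REDUCTION_PCT),  # 1.2V here is assumed
):
    before = static_power_w(VOLTAGE_V, baseline_a)
    after = static_power_w(VOLTAGE_V, reduced(baseline_a, pct))
    print(f"{name}: {before:.3f} W -> {after:.3f} W static (hypothetical baseline)")
```

Under this simple model power is linear in current at a fixed voltage, so the percentage saving in static power equals the percentage drop in leakage whatever the real baseline is.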

AMD also went in and tweaked the SoC’s memory interface. Kabini/Temash had a standard PC-like DDR3 memory interface. All of the complexity required for broad memory module compatibility and variations in trace routing was handled by the controller itself. This not only added complexity to the DDR3 interface but power as well. With Beema and Mullins, AMD took a page from the smartphone SoC design guide and traded flexibility for power. These platforms now ship with stricter guidelines as to what sort of memory can be used on board and how traces must be routed. The result is a memory interface that shaves off more than 500mW when in this stricter, low power mode. OEMs looking to ship a design with socketed DRAM can still run the memory interface in a higher power mode to ensure memory compatibility.

These SoCs won’t be available in a PoP configuration unfortunately - OEMs will have to rely on discrete DRAM packages rather than a fully integrated solution. Beema/Mullins also show up to a 200mW reduction in power consumed by the display interface compared to Kabini/Temash.

The combination of all of this is 20% lower idle power compared to the previous generation of AMD entry level and low power APUs. AMD put together a nice graph illustrating its progress over the years:

Beema and Mullins are definitely in a good place, however they still do consume more power at idle than the smartphone SoCs we typically find in iOS and Android tablets. AMD isolated APU power for the graph above and is using an “eReader” workload (aka display on but not animating, system otherwise idle). It just so happens I gathered similar data for our 2013 Nexus 7 review. The workloads and measurements are different (AMD isolates APU power, I’m looking at total platform power minus display) but it’s enough to put things in perspective:

SoC Idle Power Comparison

AMD has dropped power consumption considerably over the years, but it’s still not as power efficient as high end mobile silicon.

AMD sees no value in supporting Microsoft's Connected Standby standard at this point, which makes sense given the limited success of Windows 8 tablets. Once again this seems to point to AMD eventually adopting Android for its tablet aspirations.

Looking forward, AMD has more tricks up its sleeve to continue to drive power down. Most interesting on the list? We’ll see an integrated voltage regulator (à la Haswell’s FIVR) from AMD in 2015.


superunknown98 - Tuesday, April 29, 2014 - link
Although I don't think it would happen, or at least be publicly announced, Microsoft could use these new cores in Xbox One but could only enable turbo for the two cores that run the virtualization and Xbox OS. They would also benefit from the reduced TDP, which is something that eventually happens at some point anyway.
lmcd - Thursday, May 1, 2014 - link
Now that comment on the OS and virtualization cores was quite interesting. I now think that a Puma-edition is likely (though I think a GPU switch-up is more likely if more efficient GCN variants occur).
MikeMurphy - Monday, May 26, 2014 - link
No sense revising an entire chip to save a few watts of power. They might revise it later provided that substantial power savings are attainable, otherwise will implement the usual die shrinks. Minor performance increases shouldn't be ruled out although focus will be on power reduction while maintaining similar performance.
Rockmandash12 - Tuesday, April 29, 2014 - link
And this is what happens when AMD gets into gear and makes a new architecture. Real improvement that's competitive with rivals. Come on.... new Desktop flagship architecture that's faster and more efficient? please?
Samus - Wednesday, April 30, 2014 - link
This is pretty impressive. And honestly, out of nowhere. They all the sudden have an amazing tablet/uSFF SoC.
silverblue - Tuesday, April 29, 2014 - link
Didn't Kaveri launch with a fully enabled PSP as well?

Judging by the performance story thus far, I think it will put to bed the calls for cat cores to replace AMD's higher powered offerings. Yes, we're past K8 performance levels now, but Llano and Trinity (let alone Richland/Kaveri) still have it beat. You'd need some serious clock speeds to get decent performance and it's the wrong silicon for that.

I was disappointed to see that it's practically the same uarch as Jaguar, meaning we're still going to have a single channel memory controller, however the performance and power improvements are substantial, and the memory controller has been improved anyway which should reduce the need for said controller.
ET - Tuesday, April 29, 2014 - link
I think it's hard to draw conclusions of core performance when the RAM is a limitation. It's entirely possible that these cores are still pretty far from the big cores, but on the other hand it's possible that more bandwidth could up performance by quite a few percent.
ssnitrousoxide - Tuesday, April 29, 2014 - link
AMD has always done some impressive work to squeeze every bit of performance from an inferior node. How they managed to improve energy efficiency so much is beyond me.
mfoley93 - Wednesday, April 30, 2014 - link
It seems that most of the power reduction is at the manufacturing level; maybe it's more accurate to engineering tolerances or perhaps a more pure silicon, either way I don't think TSMC will be telling us what it is. The rest comes from eliminating circuitry that provides some more flexibility to OEMs, something nVIDIA has been doing for a couple years now, and while it doesn't really count, it's something Apple does very well.
yannigr - Tuesday, April 29, 2014 - link
Any idea if Beema will be compatible with existing AM1 motherboards?
(with only a BIOS update of course)
