Final Words

Bobcat was a turning point for AMD. The easily synthesized, low-cost CPU design has shipped in the nearly 50 million Brazos systems AMD has sold since its introduction. Jaguar improves upon Bobcat in a major way. The move to 28nm helps drive power even lower, which will finally get AMD into tablet designs with Temash. Despite being lower power, Jaguar also manages to increase performance appreciably over Bobcat. AMD claims up to a 22% increase in IPC compared to Bobcat. Combine the IPC gains with a more multi-core friendly design and Jaguar-based APUs should be appreciably faster than their predecessors.

Quite possibly one of the only real weaknesses with Jaguar is the lack of aggressive turbo modes in any of the shipping implementations of the design. It appears that the first implementations of Jaguar were under time constraints, leaving many features (including improved thermal monitoring/management and turbo boost) on the cutting room floor. Kabini and Temash seem ripe for a mid-cycle update enabling turbo across more parts, which could do wonders for single-threaded performance.

The Jaguar power story actually looks very good; it's just hampered by traditional PC legacy. None of the launch APUs here support the low power IOs necessary to drive platform power down even further. AMD is getting very close though. Jaguar's core power is easily sub-2W for lightweight tablet tasks, but the rest of the platform (excluding display) drives it up to 4 - 7W. AMD definitely has the right building blocks to go after truly low power tablets in a major way, should it have the resources and bandwidth to do so.

In its cost and power band, Jaguar is presently without competition. Intel’s current 32nm Saltwell Atom core is outdated, and nothing from ARM is quick enough. It’s no wonder that both Microsoft and Sony elected to use Jaguar as the base for their next-generation console SoCs: there simply isn’t a better option today. As Intel transitions to its 22nm Silvermont architecture, however, Jaguar will finally get some competition. For the next few months though, AMD will enjoy a position it hasn’t had in years: a CPU performance advantage.

I can’t stress enough how important it is that AMD continues to focus on driving the single-threaded performance of its cat-line of cores. Second chances are rare in this business, but that’s exactly what AMD has been offered with the rise of good-enough computing. Jaguar vs. Atom is the best CPU story AMD has had in years. Regular updates to the architecture coupled with solid execution are necessary to ensure that history doesn’t repeat itself in a new segment of AMD’s business.

Long term, I can’t help but wonder what Bobcat’s success will do to shape AMD’s future microarchitecture decisions. I’m not sure what Jim Keller’s SoC project is, but I’m wondering if the days of really big cores might be over. I don’t know that really small cores are the answer either, but perhaps something in between...


78 Comments


  • GuMeshow - Friday, May 24, 2013

    The Embedded G-Series SOCs seem to be exactly Kabini + ECC memory enabled (ex: GX-420CA and A5-5200). This will probably be the cheapest way to get ECC enabled and better performance than Atom; next step up would be Intel S1200KPR + Celeron G1610?

    I've been thinking of putting together a Router/Firewall/Proxy/NAS combo ...
  • R3MF - Thursday, May 23, 2013

    HSA?
  • Spoelie - Thursday, May 23, 2013

    Is it just me or does the shared L2 cache merely enable the same scaling to 4 cores as bobcat had to 2 cores? There is no "massive benefit" as alluded to in the numbers or discussion.

    Bobcat has for one thread 0.32 and for two threads 0.61, or a scaling of 95%. (0.64 perfect scaling)
    Jaguar has for one thread 0.39 and for four threads 1.50, or a scaling of 96% (1.56 perfect scaling)

    The 1% difference could easily be a result of score rounding. I see that a four core bobcat would probably scale worse than jaguar, but the percentages chosen in the table are a bit misleading.
  • Spoelie - Thursday, May 23, 2013

    Of course, drawing such conclusions from a single benchmark is dangerous. If other benchmarks exhibit more code/data sharing and thread dependencies than Cinebench, their numbers might show a more appreciable scaling benefit from the shared L2 cache.
  • tipoo - Thursday, May 23, 2013

    I wonder how this compares to the PowerPC 750, which the Wii U is based off of. The PS4 and One being Jaguar based, that would be interesting.
  • aliasfox - Thursday, May 23, 2013

    Wii U uses a PPC 750? Correct me if I'm wrong, but the PPC 750 family is the same chip that Apple marketed as the G3 up until about 10 years ago? And IIRC, Dolphin in the GameCube was also based on this architecture?

    Back in the day, the G3 at least had formidable integer performance: clock for clock, it was able to outdo the Pentium II on certain (integer heavy) benchmarks by 2x. Its downfall was an outdated chipset (no proper support for DDR) and the inability to scale to higher clockspeeds. Integer performance may have been fast, but floating point performance wasn't quite as impressive: good if the Pentium II you're competing against is nearly the same clock, bad when the PIII and Core Solos are 2x your clockspeed.

    Considering the history of the PPC 750, I'd love to know how a modern version of it would compare.
  • tipoo - Thursday, May 23, 2013

    Yes, the Gamecube, Wii, and Wii U all use PowerPC 750 based processors. The Wii U is the only known multicore implementation of it, but the core itself appears unchanged from the Wii, according to the hacker that told us the clock speed and other details.
  • tipoo - Thursday, May 23, 2013

    And you're right, it was good at integer, but the FPU was absolutely terrible...Which makes it an odd choice for games, since games rely much more on floating point math than integer. I think it was only kept for backwards compatibility, while even three Jaguar cores would have been better performing and still small.

    The Nintendo faithful are saying it won't matter since FP work will get pushed to the GPU, but the GPU is already straining to get even a little ahead of the PS360, plus not all algorithms work well on GPUs.
  • tipoo - Thursday, May 23, 2013

    Also barely any SIMD, just paired singles. Even the ancient Xenon had good SIMD.
  • tipoo - Thursday, May 23, 2013

    Unchanged on the actual core parts I mean, obviously the eDRAM is different from old 750s.
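
Editor's note on the scaling figures in Spoelie's comment above: the percentages can be reproduced with a few lines of Python. This is only a sketch. The function names are made up for illustration, and the rounding check assumes the quoted Cinebench scores were rounded to the nearest 0.01, which the comment suggests but does not confirm.

```python
def scaling_efficiency(single_score, multi_score, threads):
    """Multi-threaded score as a fraction of perfect linear scaling."""
    return multi_score / (single_score * threads)


def efficiency_bounds(single_score, multi_score, threads, half_step=0.005):
    """Lowest and highest efficiency consistent with scores rounded to 0.01.

    half_step is an assumption: half of the last printed digit.
    """
    low = (multi_score - half_step) / ((single_score + half_step) * threads)
    high = (multi_score + half_step) / ((single_score - half_step) * threads)
    return low, high


# Scores as quoted in the comment: one thread vs. all cores.
bobcat = scaling_efficiency(0.32, 0.61, 2)
jaguar = scaling_efficiency(0.39, 1.50, 4)
bobcat_low, bobcat_high = efficiency_bounds(0.32, 0.61, 2)

print(f"Bobcat: {bobcat:.1%} (range {bobcat_low:.1%} to {bobcat_high:.1%})")
print(f"Jaguar: {jaguar:.1%}")
```

Under that rounding assumption, Jaguar's figure falls inside the range Bobcat's rounded scores allow, which is consistent with the point that the roughly 1% gap could be a rounding artifact.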
