A7 SoC Explained

I’m still surprised by the amount of confusion around Apple’s CPU cores, so that’s where I’ll start. I’ve already outlined how ARM’s business model works, but in short there are two basic types of licenses ARM will bestow upon its partners: processor and architecture. The former involves implementing an ARM designed CPU core, while the latter is the creation of an ARM ISA (Instruction Set Architecture) compatible CPU core.

NVIDIA and Samsung, up to this point, have gone the processor license route. They take ARM designed cores (e.g. Cortex A9, Cortex A15, Cortex A7) and integrate them into custom SoCs. In NVIDIA’s case the CPU cores are paired with NVIDIA’s own GPU, while Samsung licenses GPU designs from ARM and Imagination Technologies. Apple previously leveraged its ARM processor license as well. Until last year’s A6 SoC, all Apple SoCs leveraged CPU cores designed by and licensed from ARM.

With the A6 SoC however, Apple joined the ranks of Qualcomm with leveraging an ARM architecture license. At the heart of the A6 were a pair of Apple designed CPU cores that implemented the ARMv7-A ISA. I came to know these cores by their leaked codename: Swift.

At its introduction, Swift proved to be one of the best designs on the market. An excellent combination of performance and power consumption, the Swift based A6 SoC improved power efficiency over the previous Cortex A9 based design. Swift also proved to be competitive with the best from Qualcomm at the time. Since then however, Qualcomm has released two evolutions of its CPU core (Krait 300 and Krait 400), and pretty much regained performance leadership over Apple. Being on a yearly release cadence, this is Apple’s only attempt to take back the crown for the next 12 months.

Following tradition, Apple replaces its A6 SoC with a new generation: A7.

With only a week to test battery life, performance, wireless and cameras on two phones, in addition to actually using them as intended, there wasn’t a ton of time to go ridiculously deep into the new SoC’s architecture. Here’s what I’ve been able to piece together thus far.

First off, based on conversations with as many people in the know as possible, as well as just making an educated guess, it’s probably pretty safe to say that the A7 SoC is built on Samsung’s 28nm HK+MG process. It’s too early for 20nm at reasonable yields, and Apple isn’t ready to move some (not all) of its operations to TSMC.

The jump from 32nm to 28nm results in peak theoretical scaling of 76.5% (the same design on 28nm can be no smaller than 76.5% of the die area at 32nm). In reality, nothing ever scales perfectly so we’re probably talking about 80 - 85% tops. Either way that’s a good amount of room for new features.

At its launch event Apple officially announced both die size for the A7 (102mm^2) as well as transistor count (over 1 billion). Don’t underestimate the magnitude of both of these disclosures. The technical folks at Cupertino are clearly winning some battle to talk more about their designs and not less. We’re not yet at the point where I’m getting pretty diagrams and a deep dive, but it’s clear that Apple is beginning to open up more (and it’s awesome).

Apple has never previously disclosed transistor count. I also don’t know if this “over 1 billion” figure is based on a schematic or layout transistor count. The only additional detail I have is that Apple is claiming a near doubling of transistors compared to the A6. Looking at die sizes and taking into account scaling from the process node shift, there’s clearly a more fundamental change to the chip’s design. It is possible to optimize a design (and transistors) for area, which seems to be what has happened here.

The CPU cores are, once again, a custom design by Apple. These aren’t Cortex A57 derivatives (still too early for that), but rather some evolution of Apple’s own Swift architecture. I’ll dive into specifics of what I’ve been able to find in a moment. To answer the first question on everyone’s mind, I believe there are two of these cores on the A7. Before I explain how I arrived at this conclusion, let’s first talk about cores and clock speeds.

I always thought the transition from 2 to 4 cores happened quicker in mobile than I had expected. Thankfully there are some well threaded apps that have been able to take advantage of more than two cores and power gating keeps the negative impact of the additional cores down to a minimum. As we saw in our Moto X review however, two faster cores are still better for most uses than four cores running at lower frequencies. NVIDIA forced everyone’s hand in moving to 4 cores earlier than they would’ve liked, and now you pretty much can’t get away with shipping anything less than that in an Android handset. Even Motorola felt necessary to obfuscate core count with its X8 mobile computing system. Markets like China seem to also demand more cores over better ones, which is why we see such a proliferation of quad-core Cortex A5/A7 designs. Apple has traditionally been sensible in this regard, even dating back to core count decisions in its Macs. I remembering reviewing an old iMac and pitting it against a Dell XPS One at the time. This was in the pre-power gating/turbo days. Dell went the route of more cores, while Apple chose for fewer, faster ones. It also put the CPU savings into a better GPU. You can guess which system ended out ahead.

In such a thermally constrained environment, going quad-core only makes sense if you can properly power gate/turbo up when some cores are idle. I have yet to see any mobile SoC vendor (with the exception of Intel with Bay Trail) do this properly, so until we hit that point the optimal target is likely two cores. You only need to look back at the evolution of the PC to come to the same conclusion. Before the arrival of Nehalem and Lynnfield, you always had to make a tradeoff between fewer faster cores and more of them. Gaming systems (and most users) tended to opt for the former, while those doing heavy multitasking went with the latter. Once we got architectures with good turbo, the 2 vs 4 discussion became one of cost and nothing more. I expect we’ll follow the same path in mobile.

Then there’s the frequency discussion. Brian and I have long been hinting at the sort of ridiculous frequency/voltage combinations mobile SoC vendors have been shipping at for nothing more than marketing purposes. I remember ARM telling me the ideal target for a Cortex A15 core in a smartphone was 1.2GHz. Samsung’s Exynos 5410 stuck four Cortex A15s in a phone with a max clock of 1.6GHz. The 5420 increases that to 1.7GHz. The problem with frequency scaling alone is that it typically comes at the price of higher voltage. There’s a quadratic relationship between voltage and power consumption, so it’s quite possibly one of the worst ways to get more performance. Brian even tweeted an image showing the frequency/voltage curve for a high-end mobile SoC. Note the huge increase in voltage required to deliver what amounts to another 100MHz in frequency.

The combination of both of these things gives us a basis for why Apple settled on two Swift cores running at 1.3GHz in the A6, and it’s also why the A7 comes with two cores running at the same max frequency. Interestingly enough, this is the same max non-turbo frequency Intel settled at for Bay Trail. Given a faster process (and turbo), I would expect to see Apple push higher frequencies but without those things, remaining conservative makes sense. I verified frequency through a combination of reporting tools and benchmarks. While it’s possible that I’m wrong, everything I’ve run on the device (both public and not) points to a 1.3GHz max frequency.

Verifying core count is a bit easier. Many benchmarks report core count, I also have some internal tools that do the same - all agreed on the same 2 cores/2 threads conclusion. Geekbench 3 breaks out both single and multithreaded performance results. I checked with the developer to ensure that the number of threads isn’t hard coded. The benchmark queries the max number of logical CPUs before spawning that number of threads. Looking at the ratio of single to multithreaded performance on the iPhone 5s, it’s safe to say that we’re dealing with a dual-core part:

Geekbench 3 Single vs. Multithreaded Performance - Apple A7
  Integer FP
Single Threaded 1471 1339
Multi Threaded 2872 2659
A7 Advantage 1.97x 1.99x
Peak Theoretical 2C Advantage 2.00x 2.00x

Now the question is, what’s changed in these cores?

 

Introduction, Hardware & Cases After Swift Comes Cyclone
POST A COMMENT

466 Comments

View All Comments

  • Wilco1 - Wednesday, September 18, 2013 - link

    If all you can do is name calling then you clearly haven't got a clue or any evidence to prove your point. Either come up with real evidence or leave the debate to the experts. Do you even understand what IPC means?

    For example in your link a low clocked Jaguar is keeping up with a much higher clocked Bay Trail (yes it boosts to 2.4GHz during the benchmark run), so the obvious conclusion is that Jaguar has far higher IPC than Bay Trail. For example Jaguar has 28% higher IPC than BT in the 7-zip test. Just like I said.

    Now show me a single benchmark where BT gets better IPC than Jaguar. Put up or shut up.
    Reply
  • zeo - Wednesday, September 18, 2013 - link

    The point that BT Beats Jaguar, especially at performance per watt, clearly proved the point given!

    And insisting as you are on your original assessment is a characteristic of acting like a Troll... So you're not going to convince anyone by simply insisting on being right... especially when we can point to Anandtech pointing out multiple benchmarks in this article that showed the Kabini performing lower than bother BT and the A7!

    So either learn to read what these reviews actually post or accept getting labeled a Troll... either way, you're not winning this argument!
    Reply
  • Wilco1 - Wednesday, September 18, 2013 - link

    No, Bob's claim was that Bay Trail was faster clock for clock than Jaguar, when the link he gave to prove it clearly showed that is false. BT may well beat Jaguar on perf/watt, but that's not at all what we were discussing.

    So next time try to understand what people are discussing before jumping in and calling people a Troll. And yes I stand by my characterization of various microarchitectures, precisely because it's based on actual benchmark results.
    Reply
  • Bob Todd - Wednesday, September 18, 2013 - link

    IPC as a comparison point made a lot of sense when we were arguing about which 130 watt desktop processor had the better architecture. It seems largely irrelevant for mobile where we care about performance per watt. Your argument is continually that the ARM/AMD designs are 'faster' based on Geekbench. If Jaguar has a 28% higher IPC than Bay Trail, do you honestly think it matters if Bay Trail is still the faster chip @ 1/3 (or less) of the power requirements? If someone came up with a crazy design that needed 5x the clocks to have a 2x performance advantage of their competitor, but did so with half the power budget, they'd still be racking up design wins (assuming parity for all other aspects like price). That's a two way street. If ARM designs a desktop/server focused chip that needs higher clocks than Intel to reach performance parity or be faster than Haswell, but does so with significantly less power it's still a huge win for them. Reply
  • Wilco1 - Wednesday, September 18, 2013 - link

    IPC matters as you can compare different microarchitectures and make predictions on performance at different clock speeds. I'm sure you know many CPUs come in a confusing variation of clockspeeds (and even different base/turbo frequencies for Intel parts), but the underlying microarchitecture always remains the same. You can't make claims like "Bay Trail is faster than Jaguar" when such a claim would only valid at very specific frequencies. However we can say that Jaguar has better IPC than BT and that will remain true irrespectively of the frequency. So that is the purpose of the list of microarchitectures I posted.

    I was originally talking about the performance of Apple A7 and Bay Trail in Geekbench. You may not like Geekbench, but it represents close to actual CPU performance (not rubbish JavaScript, tuned benchmarks, cheating - remember AnTuTu? - or unfair compiler tricks).

    Now you're right that besides absolute performance, perf/W is also important. Unfortunately there is almost no detailed info on power consumption, let alone energy to do a certain task for various CPUs. While TDP (in the rare cases it is known!) can give some indication, different feature sets, methodologies, "dial-a-TDP" and turbo features makes them hard to compare. What we can say in general is that high-frequency designs tend to be less efficient and use more power than lower frequency, higher IPC designs. In that sense I would not be surprised if the A7 also shows a very good perf/Watt. How it compares with BT is not clear until BT phones appear.
    Reply
  • Bob Todd - Wednesday, September 18, 2013 - link

    Your point about benchmarks is actually what surprises me the most nowadays. The biggest thing every in-depth review of a new ARM design brings to light is how freaking piss poor the state of mobile benchmarking is from a software standpoint. I didn't expect magic by the time we got to A9 designs, but it's a little ridiculous that we're still in a state of infancy for mobile benchmarking tools over half a decade after the market really started heating up. Reply
  • Bob Todd - Wednesday, September 18, 2013 - link

    And by "ARM design" I mean both their cores or others building to their ISA. Reply
  • Wilco1 - Thursday, September 19, 2013 - link

    Yes, mobile benchmarking is an absolute disgrace. And that's why I'm always pointing out how screwed up Anand's benchmarking is - I'm hoping he'll understand one day. How anyone can conclude anything from JS benchmarks is a total mystery to me. Anand might as well just show AnTuTu results and be done with it, that may actually be more accurate!

    Mobile benchmarks like EEMBC, CoreMark etc are far worse than the benchmarks they try to replace (eg. Dhrystone). And SPEC is useless as well. Ignoring the fact it is really a server benchmark, the main issue is that it ended up being a compiler trick contest than a fair CPU benchmark. Of course Geekbench isn't perfect either, but at the moment it's the best and fairest CPU bench: because it uses precompiled binaries you can't use compiler tricks to pretend your CPU is faster.
    Reply
  • akdj - Thursday, September 19, 2013 - link

    SO.....what is it the 'crew' is supposed to 'do'? NOT provide ANY benchmarks? Anand and team are utilizing the benchmarks available right now. They're not building the software to bench these devices...they're reviewing them...with the tools available, currently, NOW---on the market. If you're so interested in better mobile benchmarking (still in it's infancy---it's only really been 5 years since we've had multiple devices to even test), why not pursue and build your own benchmarking software? Seems like it may be a lucrative project. Sounds like you know a bit about CPU/GPU and SoC architecture---put something together. Sunspider is ubiquitous, used on any and all platforms from desktops to laptops---tablets to phones, people 'get it'. As well, GeekBench is re-inventing their benchmarking software---as well, the Google Octane tests are fairly new...and many of the folks using these devices ARE interested in how fast their browser populates, how quick a single core is---speed of apps opening and launching, opening a PDF, FPS playing games, et al.
    Again---if you're not 'happy' with how Anand is reviewing gear (the best on the web IMHO), open your own site---build your own tools, and lets see how things turn out for ya!
    Give credit where credit is due....I'd much rather see the way Anand is approaching reviews in the mobile sector than a 1500 word essay without benchmarking results because current "mobile benchmarking is an absolute disgrace"
    YMMV as always
    J

    PS---Thanks for the review guys....again, GREAT Job!
    Reply
  • Bob Todd - Thursday, September 19, 2013 - link

    Umm...I think you missed my point. I love the reviews here. That doesn't change the fact that mobile benchmarking software sucks compared to what we have available on the desktop. That isn't a slam against this site or any of the reviewers, and I fully expect them to use the (relatively crappy) software tools that are available. And they've even gone above and beyond and written some tools themselves to test specific performance aspects. I'm just surprised that with mobile being the fastest growing market, nobody has really stepped up to the plate to offer a good holistic benchmarking suite to measure cpu/gpu/memory/io performance across at least iOS/Android. And no, I don't expect anyone at Anandtech to write or pay someone to write such a tool. Reply

Log in

Don't have an account? Sign up now