After Swift Comes Cyclone Oscar

I was fortunate enough to receive a tip last time that pointed me at some LLVM documentation calling out Apple’s Swift core by name. Scrubbing through those same docs, it seems like my leak has been plugged. Fortunately I came across a unique string looking at the iPhone 5s while it booted:

I can’t find any other references to Oscar online, in LLVM documentation or anywhere else of value. I also didn’t see Oscar references on prior iPhones, only on the 5s. I’d heard that this new core wasn’t called Swift, referencing just how different it was. Obviously Apple isn’t going to tell me what it’s called, so I’m going with Oscar unless someone tells me otherwise.

Oscar is a CPU core inside M7, Cyclone is the name of the Swift replacement.

Cyclone more likely resembles a beefier Swift core (or at least a Swift-inspired one) than a new design from the ground up. That means we’re likely talking about a 3-wide front end, and somewhere in the range of 5 to 7 execution ports. The design is likely also capable of out-of-order execution, given the performance levels we’ve been seeing.

Cyclone is a 64-bit ARMv8 core and not a core built around some Apple designed ISA. With Cyclone, Apple manages to beat not only all other smartphone makers to ARMv8 but also key ARM server partners. I’ll talk about the whole 64-bit aspect of this next, but needless to say, this is a big deal.

The move to ARMv8 comes with some of its own performance enhancements. More registers, a cleaner ISA, improved SIMD extensions/performance as well as cryptographic acceleration are all on the menu for the new core.

Pipeline depth likely remains similar (maybe slightly longer) as frequencies haven’t gone up at all (1.3GHz). The A7 doesn’t feature support for any thermal driven CPU (or GPU) frequency boost.

The most visible change to Apple’s first ARMv8 core is a doubling of the L1 cache size: from 32KB/32KB (instruction/data) to 64KB/64KB. Along with this larger L1 cache comes an increase in access latency (from 2 clocks to 3 clocks from what I can tell), but the increase in hit rate likely makes up for the added latency. Such large L1 caches are quite common with AMD architectures, but unheard of in ultra mobile cores. A larger L1 cache will do a good job keeping the machine fed, implying a larger/more capable core.
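The trade-off between the extra clock of latency and the better hit rate can be put into rough numbers with the usual average-access-time estimate. A minimal sketch follows: the 2-clock and 3-clock L1 latencies come from the figures above, while the hit rates and the miss penalty are purely hypothetical values picked for illustration, not measurements of Swift or Cyclone.

```python
def average_access_clocks(hit_latency, hit_rate, miss_penalty):
    """Average access time in clocks: hit latency + miss rate * miss penalty."""
    return hit_latency + (1.0 - hit_rate) * miss_penalty

# Hypothetical inputs: both hit rates and the 20-clock miss penalty are assumptions.
MISS_PENALTY = 20
swift_like = average_access_clocks(hit_latency=2, hit_rate=0.90, miss_penalty=MISS_PENALTY)
cyclone_like = average_access_clocks(hit_latency=3, hit_rate=0.96, miss_penalty=MISS_PENALTY)

print(f"32KB L1, 2 clocks: {swift_like:.2f} clocks on average")
print(f"64KB L1, 3 clocks: {cyclone_like:.2f} clocks on average")
```

With these made-up inputs the larger cache comes out slightly ahead, but only because the assumed hit rate rises by more than one clock's worth of miss penalty. A smaller gain in hit rate would not cover the added latency.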

The L2 cache remains unchanged in size at 1MB shared between both CPU cores. L2 access latency is improved tremendously with the new architecture. In some cases I measured L2 latency at half of what I saw with Swift.

The A7’s memory controller sees big improvements as well. I measured 20% lower main memory latency on the A7 compared to the A6. Branch prediction and memory prefetchers are both significantly better on the A7.

I noticed large increases in peak memory bandwidth on top of all of this. I used a combination of custom tools as well as publicly available benchmarks to confirm all of this. A quick look at Geekbench 3 (prior to the ARMv8 patch) gives a conservative estimate of memory bandwidth improvements:

Geekbench 3.0.0 Memory Bandwidth Comparison (1 thread)

| | Stream Copy | Stream Scale | Stream Add | Stream Triad |
|---|---|---|---|---|
| Apple A7 1.3GHz | 5.24 GB/s | 5.21 GB/s | 5.74 GB/s | 5.71 GB/s |
| Apple A6 1.3GHz | 4.93 GB/s | 3.77 GB/s | 3.63 GB/s | 3.62 GB/s |
| A7 Advantage | 6% | 38% | 58% | 57% |

We see anywhere from a 6% improvement in memory bandwidth to nearly 60% running the same Stream code. I’m not entirely sure how Geekbench implemented Stream and whether or not we’re actually testing other execution paths in addition to (or instead of) memory bandwidth. One custom piece of code I used to measure memory bandwidth showed nearly a 2x increase in peak bandwidth. That may be overstating things a bit, but needless to say this new architecture has a vastly improved cache and memory interface.
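For reference, the four kernels named in the table are defined by the original STREAM benchmark as simple loops over large arrays. The sketch below shows those standard definitions and how many bytes each is conventionally counted as moving per pass. It is not Geekbench's code, whose implementation is unknown here, and the array length, scalar, and function names are arbitrary choices for illustration. Pure Python like this only shows what the kernels do; it is no good for measuring bandwidth.

```python
from array import array

N = 1000               # arbitrary length; real runs use arrays far larger than any cache
Q = 3.0                # arbitrary scalar
BYTES_PER_ELEMENT = 8  # STREAM works on 64-bit doubles

a = array("d", [1.0] * N)
b = array("d", [2.0] * N)
c = array("d", [0.0] * N)

def stream_copy(src, dst):
    for i in range(len(src)):
        dst[i] = src[i]

def stream_scale(src, dst, q):
    for i in range(len(src)):
        dst[i] = q * src[i]

def stream_add(x, y, dst):
    for i in range(len(x)):
        dst[i] = x[i] + y[i]

def stream_triad(x, y, dst, q):
    for i in range(len(x)):
        dst[i] = x[i] + q * y[i]

# Arrays read plus arrays written per element, as STREAM counts them.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

stream_copy(a, c)          # c = a
stream_scale(c, b, Q)      # b = Q * c
stream_add(a, b, c)        # c = a + b
stream_triad(b, c, a, Q)   # a = b + Q * c

for name, arrays in ARRAYS_TOUCHED.items():
    print(f"{name}: {arrays * BYTES_PER_ELEMENT * N} bytes counted per pass")
```

Copy and Scale touch two arrays per element while Add and Triad touch three, and all but Copy do arithmetic, so the four kernels do not stress the machine identically.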

Looking at low level Geekbench 3 results (again, prior to the ARMv8 patch), we get a good feel for just how much the CPU cores have improved.

Geekbench 3.0.0 Compute Performance

| | Integer (ST) | Integer (MT) | FP (ST) | FP (MT) |
|---|---|---|---|---|
| Apple A7 1.3GHz | 1065 | 2095 | 983 | 1955 |
| Apple A6 1.3GHz | 750 | 1472 | 588 | 1165 |
| A7 Advantage | 42% | 42% | 67% | 67% |
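The advantage row is just the relative difference between the two scores above it. A minimal sketch of that arithmetic using the table's scores follows; the function name is made up for illustration, and truncating to a whole percent rather than rounding is an assumption that happens to reproduce the published row.

```python
def advantage_percent(new_score, old_score):
    """Percent improvement of new over old, truncated to a whole percent."""
    return (new_score - old_score) * 100 // old_score

# Scores copied from the Geekbench 3.0.0 compute table.
a7 = {"Integer (ST)": 1065, "Integer (MT)": 2095, "FP (ST)": 983, "FP (MT)": 1955}
a6 = {"Integer (ST)": 750, "Integer (MT)": 1472, "FP (ST)": 588, "FP (MT)": 1165}

for test in a7:
    print(f"{test}: A7 advantage {advantage_percent(a7[test], a6[test])}%")
```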

Integer performance is up 44% on average, while floating point performance is up by 67%. Again this is without 64-bit or any other enhancements that go along with ARMv8. Memory bandwidth improves by 35% across all Geekbench tests. I confirmed with Apple that the A7 has a 64-bit wide memory interface, and we're likely talking about LPDDR3 memory this time around so there's probably some frequency uplift there as well.
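To put the 64-bit interface in context, theoretical peak bandwidth is simply bus width times transfer rate. A rough sketch: the 64-bit width is the confirmed figure, but the two transfer rates below are assumed example speeds chosen for illustration, not confirmed memory specifications for the A6 or A7.

```python
def peak_bandwidth_gb_s(bus_width_bits, mega_transfers_per_s):
    """Theoretical peak in GB/s (10^9 bytes): bytes per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * mega_transfers_per_s * 1_000_000 / 1e9

BUS_WIDTH_BITS = 64  # confirmed interface width

# Assumed example data rates in MT/s; neither is a confirmed A6 or A7 figure.
for rate in (1066, 1600):
    peak = peak_bandwidth_gb_s(BUS_WIDTH_BITS, rate)
    print(f"{BUS_WIDTH_BITS}-bit bus at {rate} MT/s: {peak:.1f} GB/s peak")
```

These are ceilings. Under either assumed rate, the single-thread Geekbench figures above sit well below them.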

The result is something Apple refers to as desktop-class CPU performance. I’ll get to evaluating those claims in a moment, but first, let’s talk about the other big part of the A7 story: the move to a 64-bit ISA.

464 Comments

  • BrooksT - Wednesday, September 18, 2013

    Nobody will disagree because you've completely destroyed your credibility by insulting the credibility, integrity, and competence of the reviewer, the site, and Apple because the evidence doesn't conform to your speculations and bias. You are not to be taken seriously, and at this point I think everyone sees that.

    Post evidence of this conspiracy or STFU.
  • ddriver - Thursday, September 19, 2013

    How about a whiff of reality for you - my credibility is not and has not been on the line on this one. You don't know who I am, you don't know my credentials. This is not the case for Anand, even if I am right he is not in the position to admit to compiling the review in a manner that creates an unrealistically good presentation of a product, because unlike for me, that would be a huge credibility calamity for him. If anything, his responses are very "political", carefully dancing around the pivot points of my concerns. While his response did partially bring light to a few of my concerns, my key points remain valid - the article continues to not compare A7 with ARMv7 head to head in the sole native CPU benchmark present in the article, "CPU performance" was not renamed to JS performance or moved to browser performance or something like that. See, just because he didn't agree with my points and admit to being biased does not mean I am wrong and that is not the case, considering he is not in the position to do that. I didn't really expect anything more or less than the same "carefully dancing" answer as the article itself, my main motivation was to show him that not all AT readers are incapable of reading between the lines, for the sake of future articles, I did not expect that he would make any revision to the article at hand. Honesty is for those who have nothing to lose, and while his credibility is on the line, mine isn't, draw the conclusions, if you can ;)
  • CyberAngel - Thursday, September 19, 2013

    Don't worry! I believe you...conditionally!
    I put it this way: I greatly doubt that the tests would reveal any points that are less than favorable to Apple. ANY company would do the same: promote the best parts and highlight the strength of the product.
  • akdj - Thursday, September 19, 2013

    "You don't know who I am, you don't know my credentials."
    I'm not sure anyone here is interested---you've already made clear you're a conspiracy theorist, that you believe Apple is paying off reviewers, that you disrespect folks MUCH more intelligent than yourself when it comes to chip architecture...and that your "main motivation was (Is) to show him that not all AT readers are incapable of reading between the lines". You've shown NO one ANYthing substantiated. You continue to argue baseless claims and accuse respected individuals and groups/teams of intelligent members of being biased towards Apple. Nothing in this review supports your claims---NOTHING! And, as I pointed out earlier---even the biggest anti-apple sites are applauding Apple's efforts with this SoC effort.
    You're in the minority---and to be so vain that we would care about who you are and what your credentials are is silly. It sounds to me like you're a 17 year old with a decent vocabulary and not enough paper in the pocket to pick up an iPhone 5s for yourself. But...what do I know. I don't know you, your credentials...or how you lean politically, nor do I care.
    IMO---you're an insult to the entire Anand crew. I'm not sure why I continue to read your responses, they're all the same, just worded differently. Again...you're in the (extreme) minority. You're certainly not an engineer, chip designer, app developer or technological guru---if you were, you would understand the feat Apple has achieved with this SoC architecture.
    J
  • Nurenthapa - Friday, September 20, 2013

    I've been enjoying reading this in China, but you, sir, are really annoying me with your sniveling drivel. You have an axe to grind and simply won't shut up. Hope you disappear from this forum. BTW, I use an HTC One and iPad 2, and occasionally my old original 2007 iPhone. I love iOS and iPhones, but won't be buying one until they come out with a somewhat bigger screen.
  • oryades - Wednesday, September 18, 2013

    Intel, now Apple, the same featured reviews.
  • edward kuebler - Wednesday, September 18, 2013

    We are talking about 64 bits too much. The story is the new instruction set in ARMv8. Instead of complicating the hardware for backwards compatibility (e.g. look at x86 still supporting 16bit code) they wrote a new instruction set that is faster and less energy demanding. There is still ARMv7 compatibility, but the 64bit mode is independent. And the thing is, once you redesign your architecture, why not go 64bit? What's the point of staying 32 bit? Moving more data is both slower and faster. More and wider registers help compiler optimizations and media decoding. I didn't get all this “cunning deceitful conspiracy” feeling you talk about. Staying in 32 bit land, *that* would keep me guessing.
  • Anand Lal Shimpi - Wednesday, September 18, 2013

    Our browser based suite (stressing js/HTML5 and other browser based workloads) remains unchanged from all of the other mobile SoC reviews we've done. There's no way of getting around the software differences on these mobile devices as you buy hardware+software together. Unfortunately it's still our best option for non-GPU cross platform comparisons, there just aren't many good cross platform CPU tests.

    I called out the inclusion of hardware accelerated AES/SHA when referencing those tests, there were no attempts to hide that fact. The fact remains that those algorithms will see a speedup on ARMv8 hardware because of those instructions. Note this is no different than when we run the TrueCrypt benchmarks on AES-NI enabled processors vs. those that don't have it (e.g. http://images.anandtech.com/graphs/graph5626/44765...

    Apple provided absolutely zero guidelines on how the review was to be conducted. The only stipulations were around making sure we didn't disclose the fact that we had devices. In fact, most manufacturers don't - at least not with us. Whenever there are any stipulations presented, we always disclose them on the site (e.g. see our early look at Trinity desktop performance).

    Krait implements ARMv7, so that's 64-bit wide registers for its NEON units. It expanded the width of the execution units, but the registers themselves have to adhere to the ARMv7 ISA.

    I think we explained why 64-bit makes sense (doing so at the last minute doesn't make sense, immediate SIMD/Crypto perf increases today, and helps build up the ecosystem), and even highlighted cases where a performance degradation does happen (see: Dijkstra tests). Keep in mind that iOS has always erred on the side of being more thrifty with RAM to begin with. I would like to see more but I don't know how necessary it is today.

    Take care,
    Anand
  • ddriver - Wednesday, September 18, 2013

    Anand, maybe you should hire a developer to write native cross platform benchmark tools. This is the only way to avoid all caveats like sponsored exclusive optimizations, different implementations, eliminate unrealistic low footprint synthetics, "selective compilers" (*cough Intel*) and whatnot. Considering the amount of reviews you are doing and the fact that C/C++ compilers have caught up with ARM for a long time, this is nothing hard and something that entirely makes sense, especially relative to using different JS engine implementations to measure CPU performance. JS should go in the "browser" department, not CPU performance.

    According to wikipedia, Krait implements 128bit SIMD, so maybe that is a mistake on wikipedia's behalf?

    I still think encryption results belong in their own chart, and have no place in a chart that is supposed to be indicative of the integer performance delta between 32 and 64bit execution modes. Even with the clarification you made, it creates an unrealistic impression, not to mention some people skim over the text and only look at the numbers. Encryption is encryption, integer performance is integer performance. Why mix the two (except for the reason I already mentioned and you deny)?

    I wish you'd reflected a bit on the marketing aspect of the transition to 64, considering how much apple is riding it this time around. No one argues 64bit is good and more performance is good, but this brings up the issue of the particular implementation, e.g. a fast chip with only a single gigabyte of ram, and how will that play out with an actual performance demanding real world application.

    Thanks for addressing my concerns.
  • Wilco1 - Wednesday, September 18, 2013

    ARMv7 has 32 64-bit SIMD registers but they can also be used as 16 128-bit SIMD registers. Modern CPUs like Cortex-A15 and Krait support many 128-bit SIMD operations in a single cycle, but not all operations are supported (such as double precision FP). ARMv8 has 32 128-bit SIMD registers and supports SIMD of 2 64-bit doubles.
