After Swift Comes Cyclone

I was fortunate enough to receive a tip last time that pointed me at some LLVM documentation calling out Apple’s Swift core by name. Scrubbing through those same docs again, I found that my leak has been plugged. Fortunately, I came across a unique string mentioning "Oscar" while looking at the iPhone 5s as it booted.

I can’t find any other references to Oscar online, in LLVM documentation or anywhere else of value. I also didn’t see Oscar references on prior iPhones, only on the 5s. I’d heard that this new core wasn’t called Swift, a sign of just how different it was. Obviously Apple isn’t going to tell me what it’s called, so I’m going with Oscar unless someone tells me otherwise.

Update: Oscar is a CPU core inside the M7 motion coprocessor; Cyclone is the name of the Swift replacement.

Cyclone is more likely a beefier Swift (or at least Swift-inspired) core than a ground-up new design. That means we’re probably talking about a 3-wide front end and somewhere in the 5 to 7 range of execution ports. Given the performance levels we’ve been seeing, the design is also likely capable of out-of-order execution.
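Front-end width puts a hard ceiling on throughput: a core that can issue at most W instructions per clock at frequency f can never exceed W × f instructions per second, and real code sustains well below that. A minimal sketch, assuming the speculated 3-wide front end and the A7's 1.3GHz clock; the width is an estimate rather than a confirmed Cyclone figure, and the function names are hypothetical.

```python
def peak_instructions_per_second(issue_width: int, clock_hz: float) -> float:
    """Upper bound on instruction throughput for a core that can issue at
    most `issue_width` instructions per clock (ignores stalls and misses)."""
    if issue_width <= 0 or clock_hz <= 0:
        raise ValueError("issue width and clock must be positive")
    return issue_width * clock_hz


def achieved_ipc(instructions: int, cycles: int) -> float:
    """Instructions per clock actually sustained over a measured run."""
    return instructions / cycles


# Assumed values: a 3-wide front end (speculative) at 1.3GHz.
ceiling = peak_instructions_per_second(3, 1.3e9)
print(f"theoretical ceiling: {ceiling / 1e9:.1f} billion instructions/s")
```

Comparing a measured IPC against the width shows how much of the machine a workload actually keeps busy.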

Cyclone is a 64-bit ARMv8 core, not some Apple-designed ISA. It beats not only every other smartphone maker to ARMv8 but also ARM’s key server partners. I’ll talk about the whole 64-bit aspect of this next, but needless to say, this is a big deal.

The move to ARMv8 brings some performance enhancements of its own. More registers, a cleaner ISA, improved SIMD extensions and performance, and cryptographic acceleration are all on the menu for the new core.

Pipeline depth likely remains similar (maybe slightly longer), since clock frequency hasn’t gone up at all: the A7 still runs at 1.3GHz. The A7 also doesn’t support any thermally driven CPU (or GPU) frequency boost.

The most visible change to Apple’s first ARMv8 core is a doubling of the L1 cache size: from 32KB/32KB (instruction/data) to 64KB/64KB. The larger L1 comes with an increase in access latency (from 2 clocks to 3 clocks, from what I can tell), but the improved hit rate likely makes up for the added latency. Such large L1 caches are quite common in AMD architectures but unheard of in ultra-mobile cores. A larger L1 cache will do a good job of keeping the machine fed, which implies a larger, more capable core.
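Whether a larger but slower L1 pays off can be reasoned about with the standard average memory access time (AMAT) model: AMAT = hit time + miss rate × miss penalty. The sketch below uses the 2- and 3-clock hit latencies from the text; the miss rates and the 20-cycle miss penalty are made-up illustrative numbers, not measurements of Swift or Cyclone, and the helper names are hypothetical.

```python
def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time in cycles for a single cache level."""
    return hit_cycles + miss_rate * miss_penalty_cycles


def break_even_miss_rate_drop(extra_hit_cycles: float, miss_penalty_cycles: float) -> float:
    """How much the miss rate must fall for `extra_hit_cycles` of added
    hit latency to be fully paid back by fewer misses."""
    return extra_hit_cycles / miss_penalty_cycles


# Hypothetical numbers: 20-cycle penalty to reach L2, 10% vs 4% L1 miss rate.
small_fast = amat(hit_cycles=2, miss_rate=0.10, miss_penalty_cycles=20)
large_slow = amat(hit_cycles=3, miss_rate=0.04, miss_penalty_cycles=20)
print(f"32KB, 2-cycle L1: {small_fast:.2f} cycles")
print(f"64KB, 3-cycle L1: {large_slow:.2f} cycles")
print(f"break-even miss-rate drop: {break_even_miss_rate_drop(1, 20):.0%}")
```

With a 20-cycle penalty, the extra clock is repaid once the miss rate drops by 5 percentage points; the longer the trip past L1, the smaller the drop needed to justify the bigger cache.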

The L2 cache remains unchanged at 1MB, shared between both CPU cores, but L2 access latency improves tremendously with the new architecture. In some cases I measured L2 latency at half of what I saw with Swift.

The A7’s memory controller sees big improvements as well. I measured 20% lower main memory latency on the A7 compared to the A6. Branch prediction and memory prefetchers are both significantly better on the A7.
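Latency figures like these are commonly gathered with a pointer-chasing loop: each load's address depends on the previous load's result, so loads cannot overlap, and walking the buffer in a random single-cycle order keeps prefetchers from guessing the next address. This is a general technique, not a description of the custom tools used here. The sketch below only builds and walks the chain in Python; a real measurement would do the same in C over a buffer sized to land in L1, L2 or DRAM and divide elapsed time by the step count. Function names are hypothetical.

```python
import random


def random_cycle(n: int, seed: int = 0) -> list[int]:
    """Sattolo's algorithm: a uniformly random permutation that forms one
    cycle through all n slots, so next_index[i] is where slot i points."""
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i (never i) guarantees a single cycle
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain; each step depends on the previous load."""
    i = start
    for _ in range(steps):
        i = next_index[i]
    return i


chain = random_cycle(4096)
# After exactly n steps the walk is back where it started.
print(chase(chain, len(chain)) == 0)
```

The single-cycle property matters: a plain shuffle can split the buffer into short loops that stay cache-resident and understate latency.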

I noticed large increases in peak memory bandwidth on top of all of this. I used a combination of custom tools as well as publicly available benchmarks to confirm all of this. A quick look at Geekbench 3 (prior to the ARMv8 patch) gives a conservative estimate of memory bandwidth improvements:

Geekbench 3.0.0 Memory Bandwidth Comparison (1 thread)

| | Stream Copy | Stream Scale | Stream Add | Stream Triad |
|---|---|---|---|---|
| Apple A7 1.3GHz | 5.24 GB/s | 5.21 GB/s | 5.74 GB/s | 5.71 GB/s |
| Apple A6 1.3GHz | 4.93 GB/s | 3.77 GB/s | 3.63 GB/s | 3.62 GB/s |
| A7 Advantage | 6% | 38% | 58% | 57% |

We see anywhere from a 6% to a nearly 60% improvement in memory bandwidth running the same Stream code. I’m not entirely sure how Geekbench implemented Stream and whether or not we’re actually testing other execution paths in addition to (or instead of) memory bandwidth. One custom piece of code I used to measure memory bandwidth showed nearly a 2x increase in peak bandwidth. That may be overstating things a bit, but needless to say, this new architecture has a vastly improved cache and memory interface.
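For reference, these are the four kernels defined by John McCalpin's STREAM benchmark, along with the byte counts STREAM credits to each iteration (8-byte doubles; two arrays touched for Copy and Scale, three for Add and Triad). Geekbench's own implementation may differ. This Python version only illustrates what each kernel does and how a GB/s figure follows from bytes moved and elapsed time; it is far too slow to measure real memory bandwidth.

```python
import time

DOUBLE_BYTES = 8

# Arrays touched per iteration (reads + writes), as counted by STREAM.
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def copy(a, c):            # c[i] = a[i]
    c[:] = a


def scale(b, c, q):        # b[i] = q * c[i]
    b[:] = [q * x for x in c]


def add(a, b, c):          # c[i] = a[i] + b[i]
    c[:] = [x + y for x, y in zip(a, b)]


def triad(a, b, c, q):     # a[i] = b[i] + q * c[i]
    a[:] = [y + q * z for y, z in zip(b, c)]


def bandwidth_gb_s(kernel: str, n: int, seconds: float) -> float:
    """Convert one timed pass over n elements into GB/s (10**9 bytes/s)."""
    return ARRAYS_TOUCHED[kernel] * DOUBLE_BYTES * n / seconds / 1e9


n, q = 100_000, 3.0
a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
start = time.perf_counter()
triad(a, b, c, q)
elapsed = time.perf_counter() - start
print(f"triad: {bandwidth_gb_s('triad', n, elapsed):.3f} GB/s (interpreter-bound)")
```

Because Add and Triad are credited with 50% more bytes per iteration than Copy and Scale, the four results stress the memory system differently, which is one plausible reason the A7's gains vary so much across them.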

Looking at low level Geekbench 3 results (again, prior to the ARMv8 patch), we get a good feel for just how much the CPU cores have improved.

Geekbench 3.0.0 Compute Performance

| | Integer (ST) | Integer (MT) | FP (ST) | FP (MT) |
|---|---|---|---|---|
| Apple A7 1.3GHz | 1065 | 2095 | 983 | 1955 |
| Apple A6 1.3GHz | 750 | 1472 | 588 | 1165 |
| A7 Advantage | 42% | 42% | 67% | 67% |
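Because both chips run at 1.3GHz, each score ratio in this table doubles as a per-clock (IPC) ratio for Geekbench's workload mix. The "A7 Advantage" row is the relative gain, (A7 - A6) / A6; the published percentages appear to be truncated rather than rounded, which is why 1955 vs 1165 (about 67.8%) shows as 67%. A small sketch using integer floor division so that behaviour is explicit; the function name is hypothetical.

```python
def advantage_pct(new_score: int, old_score: int) -> int:
    """Relative gain in whole percent, floored (same as truncation for gains)."""
    return (new_score - old_score) * 100 // old_score


# Geekbench 3.0.0 compute scores from the table (A7, A6), both at 1.3GHz.
scores = {
    "Integer (ST)": (1065, 750),
    "Integer (MT)": (2095, 1472),
    "FP (ST)": (983, 588),
    "FP (MT)": (1955, 1165),
}
for test, (a7, a6) in scores.items():
    print(f"{test}: {advantage_pct(a7, a6)}%")
```

The IPC reading only holds because the clocks match; at different frequencies the score ratio would mix per-clock and clock-speed gains.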

Integer performance is up 44% on average, while floating point performance is up by 67%. Again, this is without 64-bit mode or any of the other enhancements that come with ARMv8. Memory bandwidth improves by 35% across all Geekbench tests. I confirmed with Apple that the A7 has a 64-bit-wide memory interface, and we're likely talking about LPDDR3 memory this time around, so there's probably some frequency uplift there as well.
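Peak theoretical DRAM bandwidth is the interface width in bytes multiplied by the transfer rate. The sketch below assumes LPDDR3 at 1600 MT/s purely for illustration (the A7's memory type and speed are not confirmed here) and, for contrast, an assumed LPDDR2-1066 configuration at the same 64-bit width. Measured Stream results will always sit well below these ceilings; the function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak DRAM bandwidth in GB/s (10**9 bytes/s) for a given bus width
    and data rate in megatransfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000


# Assumed configurations, not confirmed specs for either SoC.
lpddr3 = peak_bandwidth_gb_s(64, 1600)   # hypothetical LPDDR3-1600
lpddr2 = peak_bandwidth_gb_s(64, 1066)   # hypothetical LPDDR2-1066
print(f"LPDDR3-1600, 64-bit: {lpddr3:.1f} GB/s")
print(f"LPDDR2-1066, 64-bit: {lpddr2:.1f} GB/s")
```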

The result is something Apple refers to as desktop-class CPU performance. I’ll get to evaluating those claims in a moment, but first, let’s talk about the other big part of the A7 story: the move to a 64-bit ISA.

464 Comments

  • Wilco1 - Wednesday, September 18, 2013

    If all you can do is name calling then you clearly haven't got a clue or any evidence to prove your point. Either come up with real evidence or leave the debate to the experts. Do you even understand what IPC means?

    For example in your link a low clocked Jaguar is keeping up with a much higher clocked Bay Trail (yes it boosts to 2.4GHz during the benchmark run), so the obvious conclusion is that Jaguar has far higher IPC than Bay Trail. For example Jaguar has 28% higher IPC than BT in the 7-zip test. Just like I said.

    Now show me a single benchmark where BT gets better IPC than Jaguar. Put up or shut up.
  • zeo - Wednesday, September 18, 2013

    The point that BT beats Jaguar, especially at performance per watt, clearly proved the point given!

    And insisting as you are on your original assessment is a characteristic of acting like a Troll... So you're not going to convince anyone by simply insisting on being right... especially when we can point to Anandtech pointing out multiple benchmarks in this article that showed the Kabini performing lower than both BT and the A7!

    So either learn to read what these reviews actually post or accept getting labeled a Troll... either way, you're not winning this argument!
  • Wilco1 - Wednesday, September 18, 2013

    No, Bob's claim was that Bay Trail was faster clock for clock than Jaguar, when the link he gave to prove it clearly showed that is false. BT may well beat Jaguar on perf/watt, but that's not at all what we were discussing.

    So next time try to understand what people are discussing before jumping in and calling people a Troll. And yes I stand by my characterization of various microarchitectures, precisely because it's based on actual benchmark results.
  • Bob Todd - Wednesday, September 18, 2013

    IPC as a comparison point made a lot of sense when we were arguing about which 130 watt desktop processor had the better architecture. It seems largely irrelevant for mobile where we care about performance per watt. Your argument is continually that the ARM/AMD designs are 'faster' based on Geekbench. If Jaguar has a 28% higher IPC than Bay Trail, do you honestly think it matters if Bay Trail is still the faster chip @ 1/3 (or less) of the power requirements? If someone came up with a crazy design that needed 5x the clocks to have a 2x performance advantage over their competitor, but did so with half the power budget, they'd still be racking up design wins (assuming parity for all other aspects like price). That's a two-way street. If ARM designs a desktop/server focused chip that needs higher clocks than Intel to reach performance parity or be faster than Haswell, but does so with significantly less power, it's still a huge win for them.
  • Wilco1 - Wednesday, September 18, 2013

    IPC matters as you can compare different microarchitectures and make predictions on performance at different clock speeds. I'm sure you know many CPUs come in a confusing variation of clockspeeds (and even different base/turbo frequencies for Intel parts), but the underlying microarchitecture always remains the same. You can't make claims like "Bay Trail is faster than Jaguar" when such a claim would only be valid at very specific frequencies. However we can say that Jaguar has better IPC than BT and that will remain true irrespective of the frequency. So that is the purpose of the list of microarchitectures I posted.

    I was originally talking about the performance of Apple A7 and Bay Trail in Geekbench. You may not like Geekbench, but it represents close to actual CPU performance (not rubbish JavaScript, tuned benchmarks, cheating - remember AnTuTu? - or unfair compiler tricks).

    Now you're right that besides absolute performance, perf/W is also important. Unfortunately there is almost no detailed info on power consumption, let alone energy to do a certain task for various CPUs. While TDP (in the rare cases it is known!) can give some indication, different feature sets, methodologies, "dial-a-TDP" and turbo features make them hard to compare. What we can say in general is that high-frequency designs tend to be less efficient and use more power than lower frequency, higher IPC designs. In that sense I would not be surprised if the A7 also shows a very good perf/Watt. How it compares with BT won't be clear until BT phones appear.
  • Bob Todd - Wednesday, September 18, 2013

    Your point about benchmarks is actually what surprises me the most nowadays. The biggest thing every in-depth review of a new ARM design brings to light is how freaking piss poor the state of mobile benchmarking is from a software standpoint. I didn't expect magic by the time we got to A9 designs, but it's a little ridiculous that we're still in a state of infancy for mobile benchmarking tools over half a decade after the market really started heating up.
  • Bob Todd - Wednesday, September 18, 2013

    And by "ARM design" I mean both their cores or others building to their ISA.
  • Wilco1 - Thursday, September 19, 2013

    Yes, mobile benchmarking is an absolute disgrace. And that's why I'm always pointing out how screwed up Anand's benchmarking is - I'm hoping he'll understand one day. How anyone can conclude anything from JS benchmarks is a total mystery to me. Anand might as well just show AnTuTu results and be done with it, that may actually be more accurate!

    Mobile benchmarks like EEMBC, CoreMark etc. are far worse than the benchmarks they try to replace (e.g. Dhrystone). And SPEC is useless as well. Ignoring the fact it is really a server benchmark, the main issue is that it ended up being more of a compiler-trick contest than a fair CPU benchmark. Of course Geekbench isn't perfect either, but at the moment it's the best and fairest CPU bench: because it uses precompiled binaries you can't use compiler tricks to pretend your CPU is faster.
  • akdj - Thursday, September 19, 2013

    SO.....what is it the 'crew' is supposed to 'do'? NOT provide ANY benchmarks? Anand and team are utilizing the benchmarks available right now. They're not building the software to bench these devices...they're reviewing them...with the tools available, currently, NOW---on the market. If you're so interested in better mobile benchmarking (still in its infancy---it's only really been 5 years since we've had multiple devices to even test), why not pursue and build your own benchmarking software? Seems like it may be a lucrative project. Sounds like you know a bit about CPU/GPU and SoC architecture---put something together. Sunspider is ubiquitous, used on any and all platforms from desktops to laptops---tablets to phones, people 'get it'. As well, GeekBench is re-inventing their benchmarking software---as well, the Google Octane tests are fairly new...and many of the folks using these devices ARE interested in how fast their browser populates, how quick a single core is---speed of apps opening and launching, opening a PDF, FPS playing games, et al.
    Again---if you're not 'happy' with how Anand is reviewing gear (the best on the web IMHO), open your own site---build your own tools, and let's see how things turn out for ya!
    Give credit where credit is due....I'd much rather see the way Anand is approaching reviews in the mobile sector than a 1500 word essay without benchmarking results because current "mobile benchmarking is an absolute disgrace"
    YMMV as always
    J

    PS---Thanks for the review guys....again, GREAT Job!
  • Bob Todd - Thursday, September 19, 2013

    Umm...I think you missed my point. I love the reviews here. That doesn't change the fact that mobile benchmarking software sucks compared to what we have available on the desktop. That isn't a slam against this site or any of the reviewers, and I fully expect them to use the (relatively crappy) software tools that are available. And they've even gone above and beyond and written some tools themselves to test specific performance aspects. I'm just surprised that with mobile being the fastest growing market, nobody has really stepped up to the plate to offer a good holistic benchmarking suite to measure cpu/gpu/memory/io performance across at least iOS/Android. And no, I don't expect anyone at Anandtech to write or pay someone to write such a tool.
