Sensible Scaling: OoO Atom Remains Dual-Issue

The architectural progression from Apple, ARM and Qualcomm have all been towards wider, out-of-order cores, to varying degrees. With Swift and Krait, Apple and Qualcomm both went wider. From Cortex A8 to A9 ARM went OoO and then from A9 to A15 ARM introduced a significantly wider architecture. Intel bucks the trend a bit by keeping the overall machine width unchanged with Silvermont. This is still a 2-wide architecture.

At the risk of oversimplifying the decision here, Intel had to weigh die area, power consumption as well as the risk of making Atom too good when it made the decision to keep Silvermont’s design width the same as Bonnell. A wider front end would require a wider execution engine, and Intel believed it didn’t need to go that far (yet) in order to deliver really good performance.

Keeping in mind that Intel’s Bonnell core is already faster than ARM’s Cortex A9 and Qualcomm’s Krait 200, if Intel could get significant gains out of Silvermont without going wider - why not? And that’s exactly what’s happened here.

If I had to describe Intel’s design philosophy with Silvermont it would be sensible scaling. We’ve seen this from Apple with Swift, and from Qualcomm with the Krait 200 to Krait 300 transition. Remember the design rule put in place back with the original Atom: for every 2% increase in performance, the Atom architects could at most increase power by 1%. In other words, performance can go up, but performance per watt cannot go down. Silvermont maintains that design philosophy, and I think I have some idea of how.

Previous versions of Atom used Hyper Threading to get good utilization of execution resources. Hyper Threading had a power penalty associated with it, but the performance uplift was enough to justify it. At 22nm, Intel had enough die area (thanks to transistor scaling) to just add in more cores rather than rely on HT for better threaded performance so Hyper Threading was out. The power savings Intel got from getting rid of Hyper Threading were then allocated to making Silvermont an out-of-order design, which in turn helped drive up efficient use of the execution resources without HT. It turns out that at 22nm the die area Intel would’ve spent on enabling HT was roughly the same as Silvermont’s re-order buffer and OoO logic, so there wasn’t even an area penalty for the move.

The Original Atom microarchitecture

Remaining a 2-wide architecture is a bit misleading as the combination of the x86 ISA and treating many x86 ops as single operations down the pipe made Atom physically wider than its block diagram would otherwise lead you to believe. Remember that with the first version of Atom, Intel enabled the treatment of load-op-store and load-op-execute instructions as single operations post decode. Instead of these instruction combinations decoding into multiple micro-ops, they are handled like single operations throughout the entire pipeline. This continues to be true in Silvermont, so the advantage remains (it also helps explain why Intel’s 2-wide architecture can deliver comparable IPC to ARM’s 3-wide Cortex A15).

While Silvermont still only has two x86 decoders at the front end of the pipeline, the decoders are more capable. While many x86 instructions will decode directly into a single micro-op, some more complex instructions require microcode assist and can’t go through the simple decode paths. With Silvermont, Intel beefed up the simple decoders to be able to handle more (not all) microcoded instructions.

Silvermont includes a loop stream buffer that can be used to clock gate fetch and decode logic in the event that the processor detects it’s executing the same instructions in a loop.

Execution

Silvermont’s execution core looks similar to Bonnell before it, but obviously now the design supports out-of-order execution. Silvermont’s execution units have been redesigned to be lower latency. Some FP operations are now quicker, as well as integer multiplies.

Loads can execute out of order. Don’t be fooled by the block diagram, Silvermont can issue one load and one store in parallel.

 

OoOE & The Pipeline ISA, IPC & Frequency
Comments Locked

174 Comments

View All Comments

  • Jumangi - Monday, May 6, 2013 - link

    Let me know when Intel has something of actual substance to show and not just bunch of marketing/hype focused Powerpoint slides. ARM continues to delivers solid performance gains year after year with low power usage...Intel says yea we'll will get around to updating our 5 year old design...eventually we promise...yawn...
  • Krysto - Monday, May 6, 2013 - link

    Great point. Intel keeps promising how awesome they will be when they launch their new "mobile" chip, and at always it's ALWAYS disappointing, because in the mean time ARM chips keep shipping on their merry way, and keep improving. Fast.
  • A5 - Monday, May 6, 2013 - link

    Eh. A15 wasn't exactly a home run. The performance is good for what it is, but they overshot their TDP targets big time.
  • saurabhr8here - Monday, May 6, 2013 - link

    A15 wasn't a home run because it has been developed on an early bleeding edge technology. As the process technology matures and the design is optimized for the process, the power/performance numbers will improve.
  • DanNeely - Tuesday, May 7, 2013 - link

    A15's problem isn't overshooting TDP targets; it's that it was originally designed for use in entry level NASes and other similar level embedded systems/micro servers. A few extra watts for better CPU performance isn't a big problem there.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    Exactly. A15 was not initially designed for smartphones.
  • Wilco1 - Tuesday, May 7, 2013 - link

    That's not correct, ARM has said from the early announcements that it would go into mobiles at lower frequencies and core counts. Of course both core counts and frequencies turned out to be higher than originally expected, so power consumption is higher too. The Exynos 5250 appears to be released quickly in order to be first to market. The Octa core is far more tuned and will do better. NVidia has stated Tegra 4 uses 40% less power than Tegra 3 at equivalent performance levels.
  • Krysto - Monday, May 6, 2013 - link

    Let's do a recap. Performance is as high as Cortex A15...a chip launched in 2012.

    GPU performance is where iPad 4 was...in 2012.

    They are doing their benchmarks against last-gen ARM chips...okay.

    Intel Silvermont is expected late 2013/early 2014.

    Yeah...it's obviously so competitive! NOT.

    By the time Intel Silvermont arrives in smartphones (Merrifield), we will see 20nm ARMv8 chips in smartphones, already shipping. Good luck, Intel, another hit and a miss.

    As for what you said that Silvermont is conservative because they don't want to basically cannibalize Haswell - that's EXACTLY Intel's biggest problem right now. Their conflict of interest between the low-end, unprofitable Atom division, with the high-end very profitable Core division.

    This is exactly what killed their Xscale division, too. And it's what will kill Intel in the end. Because Intel will have to make Atom compete *whether they want to or not*. ARM chips are going to go higher and higher performance and become "good enough" for most everything. What is Intel going to do then? They'll have to keep up, which will slowly eliminate their *profitable* Core chips from the market. And what then? Survive on $20 chips with a dozen competitors? This is going to be very interesting for Intel in the next few years - and not in a good way, especially with a brand new CEO.
  • Kjella - Monday, May 6, 2013 - link

    It's been four months of 2013, how many quad-core ARM processors have launched since 2012? They're comparing against what is out now (if they were able to compare against unreleased ARM processors there'd be something very wrong) and beating them, not sure where your reading comprehension failed there. Looks to me like they're ready for a clash of the titans around year's end. Also 1-5W chips don't compete much with 15-85W Haswells no matter what, AMD is dying fast and people need their x86 computers so whatever. Reminds me of all the posts that say Windows is sooooooo dead.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    AMD is making a lot of money right now.

Log in

Don't have an account? Sign up now