Sensible Scaling: OoO Atom Remains Dual-Issue

The architectural progression from Apple, ARM and Qualcomm has been toward wider, out-of-order cores, to varying degrees. With Swift and Krait, Apple and Qualcomm both went wider. From Cortex A8 to A9 ARM went OoO, and then from A9 to A15 ARM introduced a significantly wider architecture. Intel bucks the trend a bit by keeping the overall machine width unchanged with Silvermont. This is still a 2-wide architecture.

At the risk of oversimplifying the decision here, Intel had to weigh die area, power consumption and the risk of making Atom too good when it decided to keep Silvermont’s design width the same as Bonnell’s. A wider front end would require a wider execution engine, and Intel believed it didn’t need to go that far (yet) in order to deliver really good performance.

Keeping in mind that Intel’s Bonnell core is already faster than ARM’s Cortex A9 and Qualcomm’s Krait 200, if Intel could get significant gains out of Silvermont without going wider - why not? And that’s exactly what’s happened here.

If I had to describe Intel’s design philosophy with Silvermont it would be sensible scaling. We’ve seen this from Apple with Swift, and from Qualcomm with the Krait 200 to Krait 300 transition. Remember the design rule put in place back with the original Atom: for every 2% increase in performance, the Atom architects could at most increase power by 1%. In other words, performance can go up, but performance per watt cannot go down. Silvermont maintains that design philosophy, and I think I have some idea of how.
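The 2-for-1 rule is simple arithmetic, and a short sketch makes the perf-per-watt consequence concrete. This is only an illustration: the function names and the sample percentages are hypothetical, not Intel's figures, and it assumes the rule is applied to one proposed feature at a time.

```python
def passes_two_for_one(perf_gain_pct: float, power_gain_pct: float) -> bool:
    """True if the feature costs at most 1% power per 2% performance gained."""
    return power_gain_pct <= perf_gain_pct / 2.0


def perf_per_watt_ratio(perf_gain_pct: float, power_gain_pct: float) -> float:
    """Performance per watt after the change, relative to before (1.0 = unchanged)."""
    return (1 + perf_gain_pct / 100) / (1 + power_gain_pct / 100)


# Hypothetical feature: +10% performance for +5% power (exactly on the limit).
print(passes_two_for_one(10, 5))
print(perf_per_watt_ratio(10, 5))
```

Any feature with a positive performance gain that passes the check leaves the perf-per-watt ratio above 1.0, which is the "performance per watt cannot go down" part of the rule.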

Previous versions of Atom used Hyper Threading to get good utilization of execution resources. Hyper Threading had a power penalty associated with it, but the performance uplift was enough to justify it. At 22nm, Intel had enough die area (thanks to transistor scaling) to just add in more cores rather than rely on HT for better threaded performance so Hyper Threading was out. The power savings Intel got from getting rid of Hyper Threading were then allocated to making Silvermont an out-of-order design, which in turn helped drive up efficient use of the execution resources without HT. It turns out that at 22nm the die area Intel would’ve spent on enabling HT was roughly the same as Silvermont’s re-order buffer and OoO logic, so there wasn’t even an area penalty for the move.

The Original Atom microarchitecture

Calling Silvermont a 2-wide architecture is a bit misleading, as the combination of the x86 ISA and treating many x86 ops as single operations down the pipe makes Atom effectively wider than its block diagram would otherwise lead you to believe. Remember that with the first version of Atom, Intel enabled the treatment of load-op-store and load-op-execute instructions as single operations post decode. Instead of these instruction combinations decoding into multiple micro-ops, they are handled like single operations throughout the entire pipeline. This continues to be true in Silvermont, so the advantage remains (it also helps explain why Intel’s 2-wide architecture can deliver comparable IPC to ARM’s 3-wide Cortex A15).
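A rough way to see why single-operation handling makes a 2-wide machine behave wider is to count how much split-style work fits in two issue slots. The sketch below is slot accounting only, not an IPC model. The micro-op counts per instruction class and the instruction mix are assumptions made up for illustration, not measurements of any real core.

```python
# Assumed micro-op counts if memory-operand instructions were split into
# separate operations (illustrative values, not taken from any real decoder).
SPLIT_UOPS = {"reg_op": 1, "load_op": 2, "load_op_store": 3}


def split_equivalent_width(mix: dict, issue_width: int = 2) -> float:
    """Split-style micro-ops' worth of work per cycle when every instruction
    in `mix` (class name -> count) occupies exactly one issue slot."""
    total = sum(mix.values())
    avg_uops = sum(SPLIT_UOPS[kind] * count for kind, count in mix.items()) / total
    return issue_width * avg_uops


# Hypothetical mix of 100 instructions.
mix = {"reg_op": 60, "load_op": 30, "load_op_store": 10}
print(split_equivalent_width(mix))
```

With this made-up mix, two slots per cycle carry the equivalent of three split micro-ops. The real ratio depends entirely on the workload.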

While Silvermont still only has two x86 decoders at the front end of the pipeline, the decoders are more capable. While many x86 instructions will decode directly into a single micro-op, some more complex instructions require microcode assist and can’t go through the simple decode paths. With Silvermont, Intel beefed up the simple decoders to be able to handle more (not all) microcoded instructions.

Silvermont includes a loop stream buffer that can be used to clock gate fetch and decode logic in the event that the processor detects it’s executing the same instructions in a loop.
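To get a feel for when such a buffer pays off, here is a toy model that treats it as a small FIFO of recently decoded instruction addresses. It is not how Silvermont detects loops (the article doesn't describe that), and the capacity of 32 entries and the traces are hypothetical.

```python
from collections import deque


def gated_fetch_fraction(pc_trace, capacity=32):
    """Fraction of instructions found in the toy buffer, i.e. cycles where
    fetch and decode could in principle be clock gated."""
    buffer = deque(maxlen=capacity)  # oldest entry is dropped when full
    hits = 0
    for pc in pc_trace:
        if pc in buffer:
            hits += 1          # already buffered: no fetch/decode needed
        else:
            buffer.append(pc)  # fetched and decoded normally, then buffered
    return hits / len(pc_trace) if pc_trace else 0.0


# A 4-instruction loop body executed 10 times.
loop_trace = [0, 1, 2, 3] * 10
print(gated_fetch_fraction(loop_trace))               # loop fits in the buffer
print(gated_fetch_fraction(loop_trace, capacity=2))   # loop is too large to fit
```

The point of the second call is that a loop body larger than the buffer gets no benefit in this model, since each entry is evicted before it is reused.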

Execution

Silvermont’s execution core looks similar to Bonnell before it, but obviously now the design supports out-of-order execution. Silvermont’s execution units have been redesigned to be lower latency. Some FP operations are now quicker, as well as integer multiplies.

Loads can execute out of order. Don’t be fooled by the block diagram: Silvermont can issue one load and one store in parallel.

 


174 Comments


  • Jaybus - Monday, May 13, 2013 - link

    In the full Win 8 tablet market, I don't think any low power SoC is going to be adequate to compete against 13 W Ivy Bridge.
  • 1d107 - Tuesday, May 7, 2013 - link

    Did I miss a memory bandwidth comparison with A6X? Will it support hi-res displays with acceptable performance? And by performance I mean not playing Angry Birds on a 1366x768 or even 1080p screen, but smooth scrolling and fast text rendering on a 3840x2400 screen. This would be cool for a decent Windows tablet with an external display attached.

    I'm afraid that by the time Silvermont is released and incorporated into actual products, Apple will have iPad 5 already shipping with A7X chip that will have twice the battery life, while maintaining better performance than A6X. They will need it for the iPad mini, but full-sized iPads will benefit also.
  • fteoath64 - Tuesday, May 7, 2013 - link

    One cannot know what the A7X can deliver but can take a couple of guesses. Here: 1) Optimise Swift further with pipeline shortening but still staying on A9 architecture, 2) Leap to A15 dual core with minimal optimization. On gpu side, it becomes more tricky as Pvr554 being used is Max out at 4 cores, they would have to either jack that up(6 cores ?) or jack up the clock rate.
    Remember that S800 and T4 products are yet to be announced so there is some time to watch the progression.
    Intel's key weakness here is STILL on gpu side. To put 3 cores of PVR 554 would eat a lot of power while giving it respectable performance. Going 1/4 HD4000 is just a dumb idea as the drivers are very bad and will remain so. Again too much power budget to slot in 8EU on SIlvermont quad.
    One thing is for sure: Silvermont is going to make a wicked NAS CPU!
  • thunng8 - Wednesday, May 8, 2013 - link

    1) Swift is not A9 architecture.
    2) A7X will likely get the next generation PVR graphics chip (SGX Series 6 aka Rogue).
  • nunomoreira10 - Wednesday, May 8, 2013 - link

    considering the power budget, 1/4 hd4000 is quite good
    hd4000 consumes around 10w during games, 1/4 with clock cut down and power improvements we should expect 1-2w which is the max they could allow.
    drivers are good for the games normally played on tablets.
  • BSMonitor - Tuesday, May 7, 2013 - link

    Awesome review! This is the one we have been waiting for from Windows Phone / Windows Tablets!!

    Anand, is it the next Lumia that Intel has scored a design win?? x86 Windows 8 on a next gen Lumia??
  • warezme - Wednesday, May 8, 2013 - link

    Sounds like Intel is going hammer time on the mobile SOC arena. It's gonna get ugly but very interesting.
  • futbol4me - Wednesday, May 8, 2013 - link

    Can someone out there answer a few questions for me?

    (1) If Intel Atom powered tablet were running android, do APPS available on Google Play need to be recompiled for the platform?
    (2) Will a Windows8 Intel Atom powered tablet have enough horsepower to run android effectively as a Virtual Machine?

    Do you think there is enough
  • biertourist - Wednesday, May 8, 2013 - link

    To answer Question #2: Yes. Current Intel Atom tablets can run Android apps ala the "BlueStacks" app currently.
  • rootheday - Thursday, May 9, 2013 - link

    re #1, Android apps written in Dalvik/Java require no recompile because they are compiled against a virtual machine spec. Android apps written as "native" against ARM instruction set -> Intel has implemented a binary translation capability called Houdini that converts them to x86 on the fly and optimizes them in the background.
