Sensible Scaling: OoO Atom Remains Dual-Issue

The architectural progression from Apple, ARM and Qualcomm has been towards wider, out-of-order cores, to varying degrees. With Swift and Krait, Apple and Qualcomm both went wider. From Cortex A8 to A9, ARM went OoO, and then from A9 to A15 ARM introduced a significantly wider architecture. Intel bucks the trend a bit by keeping the overall machine width unchanged with Silvermont. This is still a 2-wide architecture.

At the risk of oversimplifying, Intel had to weigh die area, power consumption and the risk of making Atom too good when it decided to keep Silvermont’s design width the same as Bonnell’s. A wider front end would require a wider execution engine, and Intel believed it didn’t need to go that far (yet) in order to deliver really good performance.

Keeping in mind that Intel’s Bonnell core is already faster than ARM’s Cortex A9 and Qualcomm’s Krait 200, if Intel could get significant gains out of Silvermont without going wider - why not? And that’s exactly what’s happened here.

If I had to describe Intel’s design philosophy with Silvermont it would be sensible scaling. We’ve seen this from Apple with Swift, and from Qualcomm with the Krait 200 to Krait 300 transition. Remember the design rule put in place back with the original Atom: for every 2% increase in performance, the Atom architects could at most increase power by 1%. In other words, performance can go up, but performance per watt cannot go down. Silvermont maintains that design philosophy, and I think I have some idea of how.

Previous versions of Atom used Hyper Threading to get good utilization of execution resources. Hyper Threading had a power penalty associated with it, but the performance uplift was enough to justify it. At 22nm, Intel had enough die area (thanks to transistor scaling) to just add more cores rather than rely on HT for better threaded performance, so Hyper Threading was out. The power savings Intel got from dropping Hyper Threading were then allocated to making Silvermont an out-of-order design, which in turn helped keep the execution resources efficiently utilized without HT. It turns out that at 22nm the die area Intel would’ve spent on enabling HT was roughly the same as Silvermont’s re-order buffer and OoO logic, so there wasn’t even an area penalty for the move.
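As a rough illustration of the 2%-for-1% design rule mentioned above, here is a minimal Python sketch. The function names and the sample percentages are made up for illustration and are not Intel figures; the sketch only shows why a feature that passes the rule cannot lower performance per watt.

```python
def perf_per_watt_change(perf_gain_pct, power_gain_pct):
    """Return the relative change in performance per watt, in percent."""
    perf = 1 + perf_gain_pct / 100
    power = 1 + power_gain_pct / 100
    return (perf / power - 1) * 100


def passes_two_for_one(perf_gain_pct, power_gain_pct):
    """Check a proposed feature against the 2%-performance-per-1%-power rule."""
    if power_gain_pct <= 0:
        # Costs no extra power: acceptable as long as it does not lose performance.
        return perf_gain_pct >= 0
    return perf_gain_pct >= 2 * power_gain_pct


# Hypothetical features: (performance gain %, power increase %)
for perf, power in [(2, 1), (10, 4), (3, 2)]:
    verdict = "accept" if passes_two_for_one(perf, power) else "reject"
    print(f"+{perf}% perf for +{power}% power: {verdict}, "
          f"perf/W change {perf_per_watt_change(perf, power):+.2f}%")
```

Any feature that clears the 2:1 bar has a positive perf/W change, which is the "performance per watt cannot go down" property. The rule is stricter than that property alone: the hypothetical +3%/+2% feature still improves perf/W slightly but would be rejected.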

The Original Atom microarchitecture

Calling Silvermont a 2-wide architecture is a bit misleading, as the combination of the x86 ISA and treating many x86 ops as single operations down the pipe makes Atom effectively wider than its block diagram would otherwise lead you to believe. Remember that with the first version of Atom, Intel enabled the treatment of load-op-store and load-op-execute instructions as single operations post decode. Instead of these instruction combinations decoding into multiple micro-ops, they are handled like single operations throughout the entire pipeline. This continues to be true in Silvermont, so the advantage remains (it also helps explain why Intel’s 2-wide architecture can deliver comparable IPC to ARM’s 3-wide Cortex A15).
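To make the width argument concrete, here is a toy Python model. Everything in it is an assumption for illustration: the instruction kinds, the per-kind split costs in `SPLIT_OPS`, and the sample instruction mix are hypothetical, and the model ignores dependencies, latencies and everything else that determines real IPC. It only counts how many pipeline slots the same work occupies when memory-operand instructions stay whole versus when they are split into separate operations.

```python
# Hypothetical number of separate operations each instruction kind would
# need if loads, ALU work and stores were tracked individually.
SPLIT_OPS = {
    "reg_op": 1,         # register-to-register ALU op
    "load_op": 2,        # load from memory, then operate
    "load_op_store": 3,  # load, operate, store back
}


def pipeline_slots(instructions, keep_whole):
    """Count pipeline slots used by a list of instruction kinds."""
    if keep_whole:
        # Atom-style: each instruction travels down the pipe as one operation.
        return len(instructions)
    return sum(SPLIT_OPS[kind] for kind in instructions)


def issue_cycles(slots, width):
    """Best-case cycles to issue `slots` operations on a `width`-wide machine."""
    return -(-slots // width)  # ceiling division


# Made-up instruction mix, heavy on memory operands.
mix = ["load_op", "load_op_store", "reg_op", "load_op"]

whole = pipeline_slots(mix, keep_whole=True)
split = pipeline_slots(mix, keep_whole=False)
print(f"kept whole: {whole} slots, {issue_cycles(whole, 2)} cycles at 2-wide")
print(f"split:      {split} slots, {issue_cycles(split, 3)} cycles at 3-wide")
```

In this contrived mix the 2-wide machine that keeps instructions whole needs fewer issue cycles than a 3-wide machine working on the split operations. Real code has a different mix, so treat the numbers as a picture of the mechanism rather than a performance estimate.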

Silvermont still only has two x86 decoders at the front end of the pipeline, but those decoders are more capable. Many x86 instructions decode directly into a single micro-op, while some more complex instructions require microcode assist and can’t go through the simple decode paths. With Silvermont, Intel beefed up the simple decoders to be able to handle more (not all) microcoded instructions.

Silvermont includes a loop stream buffer that can be used to clock gate fetch and decode logic in the event that the processor detects it’s executing the same instructions in a loop.
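The idea behind a loop buffer can be sketched with a toy Python model. This is not how Silvermont’s hardware detects loops; the FIFO replacement policy, the `capacity` value and the address traces are all hypothetical. The sketch just counts how many instruction fetches could be served from a small buffer of recently seen addresses, which is the situation in which fetch and decode could be gated off.

```python
def gated_fetches(pc_trace, capacity=8):
    """Count fetches whose address is already held in a small FIFO buffer."""
    buffer = []  # recently seen instruction addresses, oldest first
    gated = 0
    for pc in pc_trace:
        if pc in buffer:
            # Already buffered: fetch/decode would not be needed for this one.
            gated += 1
        else:
            buffer.append(pc)
            if len(buffer) > capacity:
                buffer.pop(0)  # evict the oldest entry
    return gated


# A 4-instruction loop body executed 4 times (made-up addresses).
loop_trace = [0, 1, 2, 3] * 4

print(gated_fetches(loop_trace, capacity=8))  # loop fits in the buffer
print(gated_fetches(loop_trace, capacity=2))  # loop is larger than the buffer
```

When the loop body fits, every iteration after the first is served from the buffer; when it does not fit, the entries are evicted before they are reused and nothing is saved.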

Execution

Silvermont’s execution core looks similar to Bonnell’s before it, but the design now supports out-of-order execution. Silvermont’s execution units have been redesigned to be lower latency. Some FP operations are now quicker, as are integer multiplies.

Loads can execute out of order. Don’t be fooled by the block diagram: Silvermont can issue one load and one store in parallel.

 


174 Comments


  • R0H1T - Tuesday, May 7, 2013 - link

    Let's see, umm Snapdragon 600 & then there's this soon to be released 800 ? So lemme get this straight, an unreleased product vs one that was available last year, Intel's latest(future indefinite) vs old/dated(relatively) from ARM seems fair to me !
  • ssiu - Monday, May 6, 2013 - link

    Exactly the 2 points I wonder about too:

    (1) GPU performance -- 1/4 of an HD4000, about iPad 4 level -- so slower than e.g. PowerVR Rogue which should come out around the same time

    (2) more importantly, even if Intel can make competitive/superior product, can it survive on such low margin?
  • zeo - Wednesday, May 8, 2013 - link

    Well, yes and no on point 1... The iPad is using a quad SGX544, and Rogue doesn't improve performance by that massive amount that a single Rogue/Series 6 could beat a quad Series 5. So it's not that Rogue will be better than the Bay Trail GMA but can scale higher with a multiple configuration!

    On the margins, Intel is lowering their costs moving to 22nm FAB and despite the declining PC market they're still doing well and so should be fine for the foreseeable future... They'll have to do terribly in all markets to really start hurting now and that's not likely yet...
  • andrewaggb - Monday, May 6, 2013 - link

    too early to say I think. This atom should be pretty good. if it's both twice as fast as the old atom and uses less power (which I believe is what they are trying to tell us), that's pretty good. It will be competing with 2nd gen a-15 designs or better, so the current performance claims are largely meaningless. GPU performance continues to be an issue, aiming for last years performance is definitely way too low. Fortunately gpu speed can normally be scaled more quickly than cpu speed, but intel seems to consistently underspec on gpu so I doubt they'll do better this time. Unless they go haswell style and have various different gpu skus. guess we'll see.

    Considering how much success rambus has had suing everybody I think if intel wanted to they could probably sue anybody working on advanced processor designs without sufficient licensing arrangements. Drive the minimum cost up a bit so the margins are higher.
  • R0H1T - Tuesday, May 7, 2013 - link

    This comment is hilarious ~ "gpu speed can normally be scaled more quickly than cpu speed" that's only if you're packing moar cores i.e. like SNB<IVB<<Haswell !

    GPU's cannot be scaled for performance unless there's some major redesigns of the underlying architecture, like AMD's transition to GCN, so unless you've got some insider info into how Intel plans to use their superior Iris(Pro) graphics in Silvermont I see this myth, about Intel's superior graphics, of yours being busted yet again, only this time in the mobile arena !
  • ominobianco - Monday, May 6, 2013 - link

    If you had actually read the article you would know that they are comparing against performance PROJECTIONS of competitors parts available at product launch time, NOT current parts.
  • zeo - Wednesday, May 8, 2013 - link

    Sorry but ARMv8 64bit aren't coming out till the later half of 2014 at the earliest and they're pushing to be on 16nm and not 20nm, which may delay them further!

    While there's no major improvements planned for ARM until then! Many of the original Cortex A15 SoC releases have been delayed from 2012 to 2013!
  • MrSpadge - Monday, May 6, 2013 - link

    Error: On page 1 you correctly write "Remember that power scales with the square of voltage". Almost immediately followed by "At 1V, Intel’s 22nm process gives ... or at the same performance Intel can run the transistors at 0.8V - a 20% power savings."
    Ouch - forgot that square!
  • dusk007 - Monday, May 6, 2013 - link

    I thought we would wait for 14nm for Intel to definitely pull ahead. This looks very promising.
    Now my perfect smartphone would sport a dual core Silvermont with a 4000mah battery, the HTC One camera and otherwise durable.
    GPU I don't care as long as it is good enough for the GUI I don't play games that would require something fast. Thin? Not at the cost of a smaller battery.
    I would love some feature phone like battery life. Triple what we have to deal with now would be incredible and possible it seems to me. Maybe the Motorola Phone X x86 Version can deliver that.
    Camera is secondary and I don't need a 1080p screen. Just 4.3-4.5" of 720p and long battery life.

    I feel like battery life is where this new generation can really promise new things. 32nm Atom already does really well in the tablets compared to quad core ARM competition. It will be a waste if they add 1500mah batteries though. I hope they finally realize as smartphones are mainstream that a lot of people would care first about battery life and second about 7mm thinness.
  • beginner99 - Tuesday, May 7, 2013 - link

    Agree. Current phones are too big, 1080p is pretty much useless and wastes battery life and even the GPU in Medfield is good enough for the GUI. The lower screen resolution of course helps too with needing a not so good GPU. But with both you save on power. I want a phone I need to charge once a week not every day.
