Sensible Scaling: OoO Atom Remains Dual-Issue

The architectural progressions from Apple, ARM and Qualcomm have all been toward wider, out-of-order cores, to varying degrees. With Swift and Krait, Apple and Qualcomm both went wider. From Cortex A8 to A9, ARM went OoO, and from A9 to A15 it introduced a significantly wider architecture. Intel bucks the trend a bit by leaving the overall machine width unchanged with Silvermont: it is still a 2-wide architecture.

At the risk of oversimplifying, Intel had to weigh die area, power consumption and the risk of making Atom too good when it decided to keep Silvermont’s design width the same as Bonnell’s. A wider front end would require a wider execution engine, and Intel believed it didn’t need to go that far (yet) to deliver really good performance.

Keep in mind that Intel’s Bonnell core is already faster than ARM’s Cortex A9 and Qualcomm’s Krait 200. If Intel could get significant gains out of Silvermont without going wider, why not? And that’s exactly what’s happened here.

If I had to describe Intel’s design philosophy with Silvermont, it would be sensible scaling. We’ve seen this from Apple with Swift, and from Qualcomm with the Krait 200 to Krait 300 transition. Remember the design rule put in place back with the original Atom: for every 2% increase in performance, the Atom architects could increase power by at most 1%. In other words, performance can go up, but performance per watt cannot go down; any feature that satisfies the rule actually improves it. Silvermont maintains that design philosophy, and I think I have some idea of how.
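To make the 2:1 rule concrete, here is a minimal Python sketch that screens candidate features against it. Only the 2%-performance-per-1%-power ratio comes from the text; the helper names and the three features with their percentages are hypothetical.

```python
def meets_two_for_one_rule(perf_gain_pct: float, power_gain_pct: float) -> bool:
    """True if a feature adds at most 1% power for every 2% of performance.

    Hypothetical helper: the 2:1 ratio is from the article, the interface is not.
    """
    if perf_gain_pct <= 0:
        # A feature that adds no performance cannot justify any extra power.
        return power_gain_pct <= 0
    return power_gain_pct <= perf_gain_pct / 2


def perf_per_watt_change_pct(perf_gain_pct: float, power_gain_pct: float) -> float:
    """Relative change in performance per watt, in percent."""
    ratio = (1 + perf_gain_pct / 100) / (1 + power_gain_pct / 100)
    return (ratio - 1) * 100


# Hypothetical candidates: (name, % performance gain, % power increase).
candidates = [
    ("feature A", 2.0, 1.0),  # exactly on the 2:1 line
    ("feature B", 4.0, 1.5),  # comfortably inside the budget
    ("feature C", 3.0, 2.0),  # costs too much power for its gain
]

for name, perf, power in candidates:
    verdict = "accept" if meets_two_for_one_rule(perf, power) else "reject"
    change = perf_per_watt_change_pct(perf, power)
    print(f"{name}: {verdict}, perf/W change {change:+.2f}%")
```

Feature C is rejected even though its performance per watt still edges up by about 1%, which shows the rule is stricter than simply not losing efficiency.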

Previous versions of Atom used Hyper-Threading to get good utilization of execution resources. Hyper-Threading carried a power penalty, but the performance uplift was enough to justify it. At 22nm, Intel had enough die area (thanks to transistor scaling) to simply add more cores rather than rely on HT for better threaded performance, so Hyper-Threading was out. The power Intel saved by dropping Hyper-Threading was then spent on making Silvermont an out-of-order design, which in turn keeps the execution resources well utilized without HT. It turns out that at 22nm the die area Intel would’ve spent on enabling HT was roughly the same as Silvermont’s re-order buffer and OoO logic, so there wasn’t even an area penalty for the move.
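To see how out-of-order issue can take over HT’s job of keeping execution units busy, here is a toy 2-wide issue model in Python. It is a sketch under loud assumptions: the op list, the 10-cycle load-miss latency and the 32-entry window are invented, and it ignores real Silvermont details such as ports, retirement and the memory pipeline.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    srcs: tuple[str, ...]  # names of earlier ops whose results this op needs
    latency: int           # cycles until this op's result can be consumed


def simulate(ops, width=2, out_of_order=False, window=32):
    """Return the cycle at which the last result becomes available."""
    ready_at = {}        # op name -> cycle its result is available
    pending = list(ops)  # ops not yet issued, oldest first
    cycle = 0
    while pending:
        issued = 0
        # In-order issue walks ops oldest-first and stops at the first stalled
        # one; out-of-order issue may skip stalled ops within the (hypothetical)
        # scheduling window.
        candidates = pending[:window] if out_of_order else list(pending)
        for op in candidates:
            if issued == width:
                break
            if all(ready_at.get(s, float("inf")) <= cycle for s in op.srcs):
                ready_at[op.name] = cycle + op.latency
                pending.remove(op)
                issued += 1
            elif not out_of_order:
                break  # a stalled op blocks every younger op behind it
        cycle += 1
    return max(ready_at.values())


# Hypothetical code: one cache-missing load (10-cycle latency assumed), an add
# that consumes it, and eight independent single-cycle ops.
program = [Op("load", (), 10), Op("add", ("load",), 1)]
program += [Op(f"i{n}", (), 1) for n in range(1, 9)]

print("in-order:    ", simulate(program))                     # 15
print("out-of-order:", simulate(program, out_of_order=True))  # 11
```

The in-order machine sits idle behind the stalled add, while the out-of-order one fills those cycles with independent work: the same utilization win HT used to buy by interleaving a second thread.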

The Original Atom microarchitecture

Calling Silvermont a 2-wide architecture is a bit misleading, as the combination of the x86 ISA and treating many x86 instructions as single operations down the pipe made Atom physically wider than its block diagram would otherwise lead you to believe. Remember that with the first version of Atom, Intel began treating load-op-store and load-op-execute instructions as single operations after decode. Instead of decoding into multiple micro-ops, these instruction combinations are handled as single operations throughout the entire pipeline. This remains true in Silvermont, so the advantage carries over (it also helps explain why Intel’s 2-wide architecture can deliver IPC comparable to ARM’s 3-wide Cortex A15).
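A rough way to see why a 2-wide Atom can keep pace with a 3-wide design is to count pipeline slots. The Python sketch below assumes a hypothetical instruction mix and assumes that a design without this treatment splits each memory-operand instruction into separate load, ALU and store operations; it models front-end throughput only and ignores dependencies.

```python
import math

# Operations per instruction form if memory-operand instructions are split
# into separate load / ALU / store operations (assumption for comparison).
SPLIT_OPS = {"reg-op": 1, "load-op": 2, "load-op-store": 3}


def issue_cycles(stream, width, fused):
    """Cycles to push `stream` through a pipeline `width` operations wide.

    fused=True models Atom-style handling, where every instruction stays one
    operation down the pipe; fused=False splits memory forms per SPLIT_OPS.
    """
    slots = sum(1 if fused else SPLIT_OPS[form] for form in stream)
    return math.ceil(slots / width)


# Hypothetical mix: 6 register ops, 3 load-op-execute, 1 load-op-store.
stream = ["reg-op"] * 6 + ["load-op"] * 3 + ["load-op-store"]

for label, width, fused in [("2-wide, fused", 2, True),
                            ("2-wide, split", 2, False),
                            ("3-wide, split", 3, False)]:
    print(f"{label}: {issue_cycles(stream, width, fused)} cycles")
```

With this mix, 10 fused operations on a 2-wide pipe take as many cycles as 15 split operations on a 3-wide one.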

Silvermont still has only two x86 decoders at the front end of the pipeline, but the decoders are more capable. Many x86 instructions decode directly into a single micro-op, but some more complex instructions require microcode assist and can’t go through the simple decode paths. With Silvermont, Intel beefed up the simple decoders so they can handle more (though not all) of the microcoded instructions.
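The payoff from stronger simple decoders can be illustrated with a toy cost model. This Python sketch is purely hypothetical: it assumes the simple decoders accept any instruction expanding to at most `simple_limit` micro-ops and that everything else detours through microcode at a fixed, serialized penalty. Neither the limits, the micro-op counts nor the 3-cycle penalty are Intel figures.

```python
import math


def decode_cycles(uop_counts, simple_limit, microcode_penalty=3, decoders=2):
    """Toy decode cost for a stream described only by micro-ops per instruction.

    Instructions with <= simple_limit micro-ops take the simple path, shared
    across `decoders` in parallel; the rest each pay `microcode_penalty`
    cycles one after another. All parameters are assumptions.
    """
    simple = [n for n in uop_counts if n <= simple_limit]
    microcoded = [n for n in uop_counts if n > simple_limit]
    simple_cycles = math.ceil(len(simple) / decoders)
    return simple_cycles + len(microcoded) * microcode_penalty


# Hypothetical stream: micro-ops per instruction.
stream = [1, 1, 2, 1, 3, 1, 2, 8]

print("narrow simple path:", decode_cycles(stream, simple_limit=1))
print("wider simple path: ", decode_cycles(stream, simple_limit=2))
```

Moving the two 2-micro-op instructions onto the simple path cuts this toy stream from 14 to 9 cycles, even though the decoder count stays at two.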

Silvermont includes a loop stream buffer that lets it clock gate the fetch and decode logic when the processor detects that it is executing the same instructions in a loop.
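The idea behind the loop stream buffer can be sketched as a trace-driven model in Python. The detection policy here is an assumption, not Silvermont’s documented behavior: a loop is recognized from repeated taken backward branches to the same target, and once it fits in a hypothetical 8-instruction buffer and has repeated twice, further iterations are replayed from the buffer instead of being fetched and decoded.

```python
def count_decodes(trace, buffer_size=8, lock_after=2):
    """Split a trace of instruction addresses into decoded vs. replayed counts.

    trace: instruction addresses in execution order, assumed to be consecutive
    integers within a loop body. lock_after: consecutive backward branches to
    the same loop required before replay starts. All policies are hypothetical.
    """
    decoded = replayed = 0
    loop = None        # (start, end) of the loop currently being tracked
    iterations = 0     # consecutive backward branches to that loop
    streaming = False  # True while fetch/decode would be clock gated
    prev = None
    for addr in trace:
        if prev is not None and addr <= prev:  # taken backward branch
            candidate = (addr, prev)
            if candidate == loop:
                iterations += 1
            else:
                loop, iterations, streaming = candidate, 1, False
            if prev - addr + 1 <= buffer_size and iterations >= lock_after:
                streaming = True
        elif streaming and not (loop[0] <= addr <= loop[1]):
            streaming = False  # execution left the loop body
        if streaming:
            replayed += 1
        else:
            decoded += 1
        prev = addr
    return decoded, replayed


# Hypothetical trace: 10 straight-line instructions, a 4-instruction loop run
# 5 times, then 2 more instructions after the loop exits.
trace = list(range(10)) + list(range(10, 14)) * 5 + [14, 15]
print(count_decodes(trace))  # (20, 12)
```

In this sketch the last three iterations never touch fetch and decode, which is where the clock-gating savings would come from.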

Execution

Silvermont’s execution core looks similar to Bonnell’s, but the design now supports out-of-order execution. The execution units have also been redesigned for lower latency: some FP operations, as well as integer multiplies, now complete faster.

Loads can execute out of order. Don’t be fooled by the block diagram: Silvermont can issue one load and one store in parallel.
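Why one load plus one store per cycle matters, and why out-of-order loads help use it, can be shown with a throughput-only Python sketch. It ignores dependencies, cache misses and memory disambiguation. The op sequence is hypothetical, and the single-port case is a comparison baseline, not a claim about any particular Atom.

```python
def memory_cycles(ops, dual_port=True, reorder=False):
    """Cycles to issue a sequence of 'load' / 'store' operations.

    dual_port=False: one memory op per cycle (baseline for comparison).
    dual_port=True:  one load and one store may issue in the same cycle.
    reorder=True:    idealized case where ops may be freely paired out of order.
    """
    loads, stores = ops.count("load"), ops.count("store")
    if not dual_port:
        return loads + stores
    if reorder:
        return max(loads, stores)
    # In program order: pair an op with the next one only if it is the other kind.
    cycles, i = 0, 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] != ops[i + 1]:
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


ops = ["load", "load", "store", "store", "load", "store"]
print(memory_cycles(ops, dual_port=False))  # 6
print(memory_cycles(ops))                   # 4
print(memory_cycles(ops, reorder=True))     # 3
```

Dual ports alone save cycles, but only freedom to reorder lets every cycle pair a load with a store.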

 


174 Comments


  • PolarisOrbit - Monday, May 6, 2013

    Re: FSB
    Intel tried to get rid of the FSB several years ago, but it was seen as anti-competitive because they simultaneously locked out 3rd parties like Nvidia Ion. One lawsuit later, Intel was bound to keep the FSB in their low power architectures until 2013 for 3rd party support. Basically Intel wasn't playing fair and Nvidia burned their ship.
  • DanNeely - Tuesday, May 7, 2013

    There was no usable FSB in anything beyond the first series of Atom chips. The rest still had it on the die to connect the CPU with the internal northbridge, but the only external interface they offered was 4 PCIe2(?) lanes. ION2 connected to those, not to the FSB.
  • Kevin G - Tuesday, May 7, 2013

    Actually, that anti-trust suit requires Intel to keep PCI-e on their chips until 2016. This allows 3rd party IP, like nVidia's ION, to work with Intel's SoC designs.
  • tipoo - Monday, May 6, 2013

    This makes me wonder if companies that make in-house SoCs (I guess Apple in specific, since Samsung also sells them to others while Apple just does it for themselves) will ever switch mobile devices to Intel if they just can't match the performance per watt of this and future Atom cores.
  • tipoo - Monday, May 6, 2013

    Also, won't the much-anticipated SGX 600 series/Rogue be out by around then? That's the GPU that's supposed to take these mobile SoCs into the 200 GFLOPS territory where the 360/PS3 GPUs sit.
  • xTRICKYxx - Tuesday, May 7, 2013

    I would think Apple (or any company) would want all of its software running on the same architecture/platform if it could.
  • R0H1T - Tuesday, May 7, 2013

    And kill what, a billion or so iDevices sold, with incompatibility? Methinks you dunno what you're talking about!
  • CajunArson - Monday, May 6, 2013

    Did somebody pay you to post that reply? Because if so, they aren't getting their money's worth.

    Silvermont Atoms are targeted at smartphones in 2-core configurations and tablets in the 4-core Baytrail configurations. Their power consumption is in a completely different league than even the low-end Temash parts. Let me reiterate: a Temash with a 4 watt TDP is going to have substantially higher real-world power consumption than even a beefy Baytrail and will likely only compete with the microserver Atom parts where Intel intentionally targets a higher power envelope.

    I'm sure you can't wait to post benchmarks of a Kabini netbook with a higher power draw than Haswell managing to beat a smartphone Atom as proof that AMD has "won" something, but for those of us on planet earth, these Silvermont parts are very interesting and we appreciate hard technical information on the architecture.
  • nunomoreira10 - Tuesday, May 7, 2013

    Jaguar will be available in fanless designs while Haswell won't, so you can't really compare them.
    The fact is Intel still doesn't have a good enough CPU for a good experience in a legacy Windows 8 fanless design; there is a big hole in the market that AMD is trying to fill.
  • raghu78 - Monday, May 6, 2013

    Intel Silvermont is the start of the Intelization of the mobile world. Within the next 2-3 years Intel should have bagged Apple, Google or Samsung. With the world's best manufacturing process, which is at least 2-3 years ahead of other foundries, and Intel's relentless tick-tock chip development cadence, the ARM crowd is going to be beaten to a pulp. Qualcomm might survive the Intel juggernaut, but Nvidia will not.
