The Intel Skylake Mobile and Desktop Launch, with Architecture Analysis
by Ian Cutress on September 1, 2015 11:05 PM ESTFinal Words
Based on our testing of Skylake-K, performance per clock did not impress as expected, but by going back several generations it gave an overall performance boost worthy of an upgrade from 3-to-5 year old platforms where more CPU performance might be a good thing to have. The chipset side of the equation is also blown wide open, with PCIe ports being able to provide any PCIe x4 or under idea through the chipset without having to share bandwidth.
The reasons for Skylake-K’s clock-for-clock performance relates to the fact that Skylake is a mobile first design, and Intel has focused a lot of resources in two areas. First is power delivery, by adding more power gating and finer frequency control direct to the hardware through Speed Shift, the processor can take advantage of unused parts as well as improving the power consumption of a non-steady state workload. Speed Shift does two things - it requires operating system support, but it allows the operating system to hand control of frequency adjustments back to the hardware within an OS specified range. As the processor is able to adjust its own frequency settings almost 30x faster than the operating system, it can respond to dynamic workloads such as those experienced in day-to-day use of a device faster than the OS. The second part of Speed Shift relates to the power states of a system, and the Skylake platform allows this hardware control to be substantially more granular than the high level P-states, allowing the onboard controller to determine the best efficiency mode to run the processor with respect to system power (which is also now tracked). When the system is in its most efficient state and performance is not a concern, Skylake can also induce a duty cycle for the processor cores, switching them on and off, to remain at the most efficient power state when performance is not a priority.
Also on the power side of things, the integrated graphics gets a boost here too. Intel has added a number of fixed function units, such as those for HEVC decode, to reduce power as well as more power gating and frequency domain differentiation. Generation 9 graphics also gets added RAW processing functionality for camera setups, and Multi Plane Overlay performs stretch/rotation and translation of parts of the desktop directly on the display controller rather than requiring trips out to main memory to waste power in doing so. Multi Plane Overlay is still in its infancy, but it is part of the graphics platform that Intel seems keen to improve over the years. All together, from a power perspective, it will be interesting to feel the claimed benefits of the new architecture.
The other avenue for Skylake’s design choices stems from the corporate or business aspect of Intel’s platform. At some point a form of the Skylake architecture will find its way into an extreme platform Xeon, which means the design must be able to cope from 4.5W in Skylake-Y up to a potential of 140W in a Xeon E5 (which is what currently exists on Haswell-EP). The best way to do this is by making the design more modular, adding in performance when necessary. While Intel has not been explicit about which segments are more modular than others, we have been indicated that the L2 cache and the movement from an 8-way associative design to a 4-way design is part of this. It is worth noting that this change also affords power benefits for Skylake, at a small expense in performance.
Also, as per our architecture analysis, despite the decrease in L2 associativity, CPU performance limited environments benefit from wider and deeper execution buffers, attempting to extract as much instruction parallelism as possible. Skylake extends this by providing a larger out-of-order window for micro-ops to queue into, providing larger buffers for in-flight data (both load and store), doubling of the L2 cache miss bandwidth, general improvements to the branch predicition to reduce unnecessary speculation, an increase in micro-op dispatch and retirement as well as new instructions for better cache management. As per our understanding, these under-the-hood changes will be seen as the Skylake platform develops.
From the processor perspective, Skylake affords an interesting dynamic. For the most part, the TDP ratings in the stack of processors remain similar to those in Broadwell, meaning that the 15W design for Broadwell should only need minor adjustments (new motherboard, new ports for new features) and still provide sufficient cooling but at better performance and/or more battery life. Despite not being directly released today, Intel is talking a lot about its eDRAM implementations, both as 4+3e (quad core, GT3 with 64MB) models and 4+4e (quad core, GT4 with 128MB) models.
The addition of Iris and Iris Pro graphics to Intel's 15W and 28W SKUs stands to significantly improve the state of graphics performance on ultrabooks and near-ultrabook form factors. Intel has offered processors with more powerful graphics since Haswell, although seemingly begrudgingly, withholding larger GPUs and eDRAM for the 45W SKUs. While OEMs will be paying more for these higher performance processors, these new Iris and Iris Pro configurations are practically a love letter to Apple, who has long desired more powerful iGPUs for their products and was a major instigator of the Iris lineup in the first place. We wouldn’t be the least bit surprised to see Apple jump on these new parts for the 13” MacBook Pro and the MacBook Air series, as they deliver just what Apple has been looking for.
This ends up being doubly beneficial for all concerned, as the actual working nature of the eDRAM has changed, making it more like a DRAM buffer with lower latency and transparent to software, but it is clear that Intel is releasing substantially more parts than it ever did on Haswell and Broadwell, with more to follow as time progresses. It is unclear if an eDRAM part will make the desktop however, though we might imagine a Skylake-H part with eDRAM in a future iMac in order to update the line.
Two new parts that will be constantly discussed over the next few months is the overclockable mobile processor, the i7-6820HK and the mobile Xeons. Firstly the overclockable processor is most likely only going to be in the large desktop replacement type of environments, coming out as a 45W part, but it will most likely end up in gaming systems where users want to add more heat and voltage to their gaming system. We might see some interesting laptop designs as a result of this. The mobile Xeon aspect of the release is Intel's first foray into Xeons specifically targeted at mobile, rather than a repurposed desktop processor in a bulky chassis. At this point, Intel is only launching a couple of E3 v5 models at 45W, suggesting that they will still end up in large workstation chassis with professional graphics cards, but it implements a step that Intel might take down into the notebook segment with both ECC and vPro and other Xeon features that translate well into business environments.
As always, if previous analysis has proven anything, architecture discussions can promise much on paper but where it really matters is if it translates through the devices into which the architecture is used. Implementations such as Speed Shift, fixed function units and an increase in power gating/frequency control should all impact a substantial segment of users, assuming all these functions have the correct operating support. Arguably how earth shattering or paradigm shifting Intel’s Skylake platform is can be up for discussion, but this week is the tech show IFA in Berlin where we expect a number of OEMs to start showing devices as well as processors actually going on sale. We will update with product testing when we can get our hands on them.
173 Comments
View All Comments
tipoo - Tuesday, September 1, 2015 - link
The bit of eDRAM on even ultrabook parts may be one of the more exciting bits of Skylake. Should bring baseline performance up significantly, even with half the eDRAM of the Pro 5200.tipoo - Tuesday, September 1, 2015 - link
That 72EU part also comes shockingly close to XBO GPU Gflop numbers, which, while not directly comparable, means integrated graphics will catch up to this gens consoles very soon.
RussianSensation - Wednesday, September 2, 2015 - link
But it's irrelevant in the real world for 5 reasons:1) Intel's best CPUs don't focus on IGP (i.e., i7-6600K, 6700K, 5820K-5960X) which means someone who is interested in gaming is buying a dedicated i5/i7, especially K series and going for a discrete graphics card.
2) Since we are discussing PC gaming, not console gaming, a budget gamer is going to be better off getting a lower end discrete GPU like the $90 GTX750Ti or even going on the used market and buying a $100 HD7970/GTX670, instead of trying to play games on Intel's 72 EU part.
3) Looking at historical pricing of Intel's parts with eDRAM, they'll probably cost almost as much as the Xbox One/PS4.
4) No one buys an Xbox One/PS4 because they want the best graphics. If you want that, you build a Core i7 + GTX980Ti SLI/Fury X CF system. People buy consoles for ease of use, to play online with their friends, and to have exclusives. In the areas the consoles excel, a 13-15" PC laptop with a 72 EU Intel part will fail miserably in comparison to the gaming experience one would get on a PS4/XB1 + large TV in the living room. Frankly, these 2 devices aren't competing with each other.
5) Overall cost of the device - a $300 Intel CPU is worthless without a motherboard, ram, SSD/HDD, keyboard, etc. That means trying to compare how fast an Intel's CPU with 72 EUs and EDRAM is vs. an Xbox One and PS4 and ignoring the Total System Cost is misleading.
I guarantee it that anyone interested in PC gaming could care less about Intel's IGP as any serious gamer will be getting a Skylake laptop with a Maxwell and next year a Pascal GPU.
HideOut - Wednesday, September 2, 2015 - link
No where in his comment did he mention same performance and cost. He was merely making an observation.Jumangi - Wednesday, September 2, 2015 - link
That Intels Integrated graphics can finally match a ten year old console? Big deal...IanHagen - Wednesday, September 2, 2015 - link
No, that it matches a console released last year.SunLord - Wednesday, September 2, 2015 - link
I doubt it they might be able to match the spec numbers but actual real world performance will likely still favor the console simply because of the specialized optimizations developers use on consoles vs the more generic options pc games are forced to use thanks to the near infinit hardware combinatiosn one can havetuxRoller - Sunday, September 6, 2015 - link
There's already a vulkan driver ready for Intel (on Linux) made by lunarg. That will allow for the optimizations needed if the developers care to make them.Jumangi - Wednesday, September 2, 2015 - link
Ahaha you think this thing will match an Xbox one? Wow the delusion is seriously big with some people. Also the cost of one of these high end Iris Pro CPUs alone will cost more than an entire console or a decent AMD laptop with still better graphics.BillyONeal - Wednesday, September 2, 2015 - link
Considering both the PS4 and XBO also use integrated graphics solutions from a couple years ago it isn't far fetched.