Reports from ISSCC are coming out that Intel is preparing to launch a 15-core Xeon CPU.  The 15-core model was postulated before the Ivy Bridge-E launch, along with 12-core and 10-core models – the latter two are currently on the market, but Intel was rather silent on the 15-core SKU, presumably because it is harder to manufacture one with the right voltage characteristics.  Releasing a 15-core SKU is a little odd, and one would assume it is most likely a 16-core model with one of the cores disabled – based on Intel’s history, I doubt the disabled core could be re-enabled even if its silicon still worked.  However, I have just received the official documents, and the 15-core SKU is natively 15 cores.

Information from the original source on the top end CPU is as follows:

  •  4.31 billion transistors
  •  Will be in the Xeon E7 line-up, suited for 4P/8P systems (8 * 15 * 2 = 240 threads potential; see the quick check after this list)
  •  2.8 GHz Turbo Frequency (though the design will scale to 3.8 GHz)
  •  150W TDP
  •  40 PCIe lanes
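
For those wanting the 240-thread arithmetic spelled out, here is a trivial sanity check; the socket, core, and thread counts are taken from the reported specs above, not probed from hardware.

```c
/* Trivial check of the thread math above: a hypothetical 8-socket system
 * of 15-core, Hyper-Threaded Xeons.  Counts come from the reported specs. */
#include <stdio.h>

int main(void) {
    const int sockets = 8;            /* E7 line supports up to 8P systems */
    const int cores_per_socket = 15;  /* top Ivytown SKU */
    const int threads_per_core = 2;   /* Hyper-Threading */

    printf("logical CPUs: %d\n",
           sockets * cores_per_socket * threads_per_core);  /* prints 240 */
    return 0;
}
```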

Judging by the available information, it would seem that Intel is preparing a stack of ‘Ivytown’ processors based on this design, and thus a range of Xeon E7 processors from 1.4 GHz to 3.8 GHz, drawing between 40W and 150W, similar to the Xeon E5 v2 range.

Ivytown is predicted to be announced next week, with these details forming part of the ISSCC conference talks.  Here is how it compares to some of the other Xeon CPUs available, as well as the last generation:

Intel Xeon Comparison

|                   | Xeon E3-1280 v3 | Xeon E5-2687W   | Xeon E5-2697 v2 | Xeon E7-8870 | Xeon E7-8890 v2 |
|-------------------|-----------------|-----------------|-----------------|--------------|-----------------|
| Socket            | LGA1150         | LGA2011         | LGA2011         | LGA1567      | LGA2011         |
| Architecture      | Haswell         | Sandy Bridge-EP | Ivy Bridge-EP   | Westmere-EX  | Ivy Bridge-EX   |
| Codename          | Denlow          | Romley          | Romley          | Boxboro      | Brickland       |
| Cores / Threads   | 4 / 8           | 8 / 16          | 12 / 24         | 10 / 20      | 15 / 30         |
| CPU Speed         | 3.6 GHz         | 3.1 GHz         | 2.7 GHz         | 2.4 GHz      | 2.8 GHz         |
| CPU Turbo         | 4.0 GHz         | 3.8 GHz         | 3.5 GHz         | 2.8 GHz      | 2.8 GHz         |
| L3 Cache          | 8 MB            | 20 MB           | 30 MB           | 30 MB        | 37.5 MB         |
| TDP               | 82 W            | 150 W           | 130 W           | 130 W        | 155 W           |
| Memory            | DDR3-1600       | DDR3-1600       | DDR3-1866       | DDR3-1600    | DDR3-1600       |
| DIMMs per Channel | 2               | 2               | 2               | 2            | 3 ?             |
| Price at Intro    | $612            | $1885           | $2614           | $4616        | >$5000 ?        |

According to CPU-World, there are 8 members of the Xeon E7-8xxx v2 range planned, from 6 to 15 cores and 105W to 155W, along with some E7-4xxx v2 models also featuring 15 cores, with 2.8 GHz being the top 15-core speed at 155W.

All this is tentative until Intel makes a formal announcement, but there is clearly room at the high end.  The tradeoff is always between core density and frequency, with the higher frequency models having lower core counts in order to offset power usage.  If we get more information from ISSCC we will let you know.

Original Source: PCWorld

Update: Now that I have had time to study the document supplied by Intel for ISSCC, we can confirm the 15-core model with 37.5 MB of L3 cache, built on a 22nm high-K metal-gate tri-gate CMOS process with 9 metal layers.  All the Ivytown processors will be harvested from a single die:

Ivytown Die Shot

The design itself is capable of 40W to 150W, at speeds from 1.4 GHz to 3.8 GHz.  The L3 cache is built from 15x 2.5MB slices, and the data arrays use 0.108 µm² cells with in-line double-error-correction and triple-error-detection (DECTED) with variable latency.  The CPU uses three clock domains as well as five voltage domains.
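
An aside on those cache slices: with 15 of them, a request's target slice cannot be picked with a simple power-of-two bit mask, so the uncore needs a hash with a mod-15 step or an equivalent trick.  Intel does not document Ivytown's actual slice hash; the sketch below is a purely hypothetical illustration of the idea, and the XOR-fold and the `slice_for_address` name are invented for this example.

```c
/* Hypothetical illustration of spreading physical addresses across 15 L3
 * slices.  Intel's real Ivytown slice-hash function is undocumented; the
 * XOR-fold + mod-15 below is a sketch of the idea, not the actual hash. */
#include <stdint.h>
#include <stdio.h>

#define NUM_SLICES 15
#define LINE_BITS  6             /* 64-byte lines: offset bits are ignored */

static unsigned slice_for_address(uint64_t phys_addr) {
    uint64_t line = phys_addr >> LINE_BITS;  /* drop the line offset */
    line ^= line >> 17;                      /* fold high bits down so they */
    line ^= line >> 31;                      /* also influence the result   */
    return (unsigned)(line % NUM_SLICES);    /* 15 slices: mod, not a mask  */
}

int main(void) {
    const uint64_t addrs[] = { 0x0, 0x40, 0x1000, 0xdeadbeefULL };
    for (unsigned i = 0; i < sizeof addrs / sizeof addrs[0]; i++)
        printf("0x%llx -> slice %u\n",
               (unsigned long long)addrs[i], slice_for_address(addrs[i]));
    return 0;
}
```

The mod-15 step is the interesting part: a 16-slice design could route with a cheap 4-bit mask, which may be one reason non-power-of-two core counts are so unusual.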

Level shifters are placed between the voltage domains, and the design uses lower-leakage transistors in non-timing-critical paths, achieving 63% use in the cores and 90% in the non-core area.  Overall, leakage is ~22% of the total power.

The CPUs are indeed LGA2011 (the shift from Westmere-EX's LGA1567, skipping over a Sandy Bridge-EX generation, makes this more plausible), and come in a 52.5 x 51.0 mm package with four DDR3 channels.  That makes the package 2677.5 mm², similar to known Ivy Bridge-E Xeon CPUs.

CPU-World's list of Xeon E7 v2 processors comes from, inter alia, this non-Intel document, listing the 105W+ models.

Comments

  • Kevin G - Sunday, February 16, 2014 - link

    You need to define precisely what an actual SMP is. I would argue that the main attributes are a global memory address space, cache coherency between all sockets, and that only one instance of an OS/hypervisor is necessary to run across all sockets.

    Also, you apparently didn't read your link to the Register very well. To quote it: "This is no different than the NUMAlink 6 interconnect from Silicon Graphics...". Note that the SGI UV 2000 uses NUMALink 6, and according to your reference it is an SMP machine. So please include in your definition why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not.

    As for MPI, it is useful on larger SMP systems due to its ability to take advantage of memory locality in a NUMA system. It is simply desirable to run a calculation on a core that resides closest to where the variables are stored in memory. It reduces the number of links data has to move over before it is processed, thus improving efficiency. This idea applies both to large scale NUMA, where the links are intersocket, and to clusters, where the links are high speed networking interfaces. Using MPI provides a common interface to the programmer regardless of whether the code is running on a massively parallel SMP machine or a cluster made of hundreds of independent nodes.
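
    A minimal MPI sketch of this locality point, assuming an everyday C/MPI toolchain; the array size and variable names are invented for illustration. Each rank allocates and first-touches its own chunk, so under a first-touch NUMA policy the data lands in memory near the core running that rank, and the identical code also runs unchanged across a cluster of independent nodes:

    ```c
    /* Minimal MPI sketch: each rank allocates and first-touches its own
     * chunk, so the memory ends up local to the core running that rank.
     * The only cross-socket (or cross-node) traffic is the reduction.
     * Purely illustrative; sizes and names are invented. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* First-touch: this rank's writes place the pages near its core. */
        const int n = 1 << 20;
        double *chunk = malloc(n * sizeof *chunk);
        double local_sum = 0.0;
        for (int i = 0; i < n; i++) {
            chunk[i] = rank + i * 1e-6;
            local_sum += chunk[i];
        }

        /* Only this step crosses sockets or nodes. */
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks: %f\n", size, global_sum);

        free(chunk);
        MPI_Finalize();
        return 0;
    }
    ```

    (Built and launched with the usual `mpicc` / `mpirun -np N` pair.)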

    As for the IBM p795, you don't have to do any of the compiling; IBM has precompiled Red Hat and SUSE binaries ready to go. That is beside the point though: regardless of price, it is a large SMP server that can run Linux in an enterprise environment with full support. It meets your criteria for something you said did not exist. As for your thoughts on Linux not scaling past 32 sockets for business applications, IBM does list world records for SPECjbb2005 using a p795 and Linux: http://www-03.ibm.com/systems/power/hardware/bench...
  • Brutalizer - Sunday, February 23, 2014 - link

    "...So please on your definition include why the SPARC M6 would be an SMP machine even though the SGI UV 2000 would not..."

    The definition of SMP is not in the architecture, how the server is built, or which CPU it uses. The definition of SMP is whether it can be used for SMP workloads, simple as that. And as SGI and ScaleMP – both selling large Linux servers with 10,000s of cores – say in my links: these SGI and ScaleMP servers are only used for HPC, and never used for SMP workloads. Read my links.

    Simple as that; they say it explicitly: "not for SMP workloads, only for HPC workloads".

    I don't care if a cluster can replace an SMP server running SMP workloads – then that cluster is good for SMP workloads. But the fact is, no cluster can run SMP workloads; they can only run HPC workloads.

    If you have a counterexample of SGI or ScaleMP running SMP workloads, please post it here. That would be the first time a cluster could replace an SMP server.
  • Kevin G - Monday, February 24, 2014 - link

    That does not address the architectural similarities between the SGI UV 2000 and the SPARC M6 for what defines a big SMP. Rather, you're attempting to use the intentionally vague definition of running SMP by merely running SMP-style software. I fully reject that definition, as a single socket desktop computer with enough RAM can run that software with no issue. Sure, it'll be slower than these big multisocket machines and the results may be questionable as it has no real RAS features, but it would work. I also reject the idea that clusters cannot run what you define as SMP workloads – enterprise scale applications are designed to run on clusters for the simple reason of redundancy. For example, large databases run in at least pairs to cover possible hardware failure and/or the need to service a machine (and depending on the DB, both instances can be active, but it is unwise to go beyond 50% capacity per machine). Furthermore, these clusters have remote replication to another data center in case of a local catastrophe. That'd be three or more instances in a cluster.
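
    The 50% note here is simple failover arithmetic: in an active-active pair, either node must be able to absorb the entire workload when its partner dies, so each should normally run below half of its own capacity. A toy check with invented numbers:

    ```c
    /* Toy failover arithmetic for an active-active pair: if node B dies,
     * node A must carry both loads, so each node should normally stay
     * under 50% of capacity.  All numbers are invented for illustration. */
    #include <stdio.h>

    int main(void) {
        double capacity = 100.0;              /* per-node capacity, arbitrary units */
        double load_a = 45.0, load_b = 45.0;  /* each kept under 50% */

        double after_failover = load_a + load_b;  /* survivor carries everything */
        printf("post-failover load: %.0f%% of one node\n",
               100.0 * after_failover / capacity);  /* 90%: still fits */
        return 0;
    }
    ```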

    Thus I stand by my definition of what an SMP machine is: global memory space, cache coherency across multiple sockets and only one OS/hypervisor necessary across the entire system.

    There are business class benchmarks for the SGI UV 2000. It is number two in the SPECjbb2005 benchmark when configured with 512 cores ( http://www.spec.org/jbb2005/results/res2012q2/jbb2... ). (The Fujitsu M10-4S is 2% faster, but it has double the core count to do so. http://www.spec.org/jbb2005/results/res2013q3/jbb2... )

    You also have ignored the IBM p795 Linux benchmarks for SPECjbb2005 which falls into your SMP workload category. The p795 should fit anyone's definition of an SMP machine.

    As for reading your links, I obviously have as I'm pulling quotes out of them that contradict your claims ( "This is no different than the NUMAlink 6 interconnect from Silicon Graphics, which implements a shared memory space using Xeon E5 chips..." http://www.theregister.co.uk/2013/08/28/oracle_spa... ) or have noticed that they're horrendously out of date from 2004 ( http://www.realworldtech.com/sgi-interview/6/ ).
  • mapesdhs - Wednesday, July 16, 2014 - link

    Blah blah...

    My rationale is simple: a "cluster" by definition does not have the low latency required to function as a shared memory, single combined system. The UV 2000 does, hence it's not a cluster. I know people who write scalable code for 512+ cores, and that's just on the older Origin systems which are not as fast. There's a lot of effort going into increasing code scalability, especially since SGI intends to increase the maximum core count beyond 250K.

    If you want to regard the UV 2000 as a cluster, feel free, but it's not, because it functions in a manner which a conventional cluster simply can't: shared memory, low latency RAM, highly scalable I/O. Clusters use networking technologies, InfiniBand, etc., to pass data around; the UV can have a single OS instance run the entire system. Its use of NUMALink 6 to route data around the system isn't sufficient reason to call it a cluster, because NUMA isn't a networking tech.
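
    To make the shared-memory contrast concrete, here is a minimal threaded sketch, assuming nothing about SGI's hardware: under a single OS image, workers communicate by loading and storing the same addresses, with cache coherency hardware (not a message-passing layer) keeping them consistent. The `counter` and `worker` names are invented:

    ```c
    /* Minimal shared-memory sketch: every thread loads and stores the same
     * address; coherency hardware keeps them consistent, with no message
     * passing involved.  Purely illustrative; names are invented. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define ITERATIONS  100000

    static long counter = 0;  /* one location in one address space, seen by all */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERATIONS; i++) {
            pthread_mutex_lock(&lock);   /* plain locking, not a network message */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tids[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&tids[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(tids[i], NULL);
        printf("counter = %ld\n", counter);  /* 400000: all threads shared it */
        return 0;
    }
    ```

    (Compile with `cc -pthread`.)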

    Based on the scalability of target problems, one can partition UV systems into multiple portions which can communicate, but they still benefit from the high I/O available across the system.

    It's not a cluster, and no amount of posting oodles of paragraphs will change that fact.

    Ian.

    PS. Kevin, thanks for your followup comments! I think at the time this article was current, I just couldn't be bothered to read Brutalizer's post. :D
  • olderkid - Wednesday, February 19, 2014 - link

    HP DL980

    We have a couple running as virtualization boxes. We've also considered making them huge Oracle servers.
  • nathanddrews - Tuesday, February 11, 2014 - link

    Cool!
  • wishgranter - Tuesday, February 11, 2014 - link

    15 cores? A bit weird in the PC industry. Is it a bug in Intel's calculations, did they just forget the 16th core, or is it because of thermal issues?
  • iMacmatician - Tuesday, February 11, 2014 - link

    The layout on the die is 3x5 so 15 cores makes sense.
  • Niloc2792 - Tuesday, February 11, 2014 - link

    But can it run Crysis?
  • Conficio - Tuesday, February 11, 2014 - link

    Typos?

    In the text it says Turbo Frequency = 3.8, while in the table it says 2.8.

    Also, in the text it says the source is CPU-World, while at the end it says Original Source: PCWorld.
