The Intel Xeon 5670: Six Improved Cores
by Johan De Gelas on March 16, 2010 3:39 PM EST - Posted in IT Computing
The new Xeon “Westmere” 5600 series has arrived. It is basically an improved 32nm version of the impressive Xeon 5500 series “Nehalem” CPU. The new Xeon won’t make a big splash like the Xeon 5500 series did back in March 2009. But who cares? Each core in the Xeon 5600 is a bit faster than its already excellent older brother, and you get an extra bonus. You choose: in the same power envelope you get two extra cores or a 5-10% higher clock speed. Or, if you keep the number of cores and the clock speed constant, you get lower power consumption. The most thrifty quad-core Xeon is now specced at a 40W TDP instead of 60W.
The Westmere die: an enlarged Nehalem. Trivia: notice the unused space at the top left.
Intel promises up to 40% better performance or up to 30% lower power. After a BIOS update, the Xeon 5600 can use the same servers and motherboards as the Xeon 5500, making the latter almost redundant. Promising, but nothing beats some robust independent benchmarking to check the claims.
So we plugged the Westmere EP CPUs into our ASUS server and started work on a new server CPU comparison. There is only one real problem: our two Xeon X5670s together are good for 12 cores and 24 simultaneous threads. Few applications can cope with that, so we shifted our focus even more towards virtualization. We added Hyper-V to our benchmark suite, hopefully answering the suggestion that we should also look at virtualization platforms other than VMware. For those of you looking for open source benchmarks, we will follow up with those in April.
Platform Improvements
Westmere is more than just a die-shrunk Nehalem. In this review we're taking a look at the 2.93GHz Xeon X5670, the successor to the 2.93GHz Xeon X5570.
The most obvious improvement is that the X5670 comes with six instead of four cores, and a 12MB L3 cache instead of an 8MB cache. But there are quite a few more subtle tweaks under the hood:
- Virtualization: VMexit latency reductions
- Power management: an “uncore” power gate and support for low power DDR3
- TLB improvements: Address Space IDs (ASID) and 1 GB pages
- Yet another addition to the already incredibly crowded x86 ISA (AES-NI).
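On a Linux host, you can check whether the processor and kernel expose two of these features: the kernel lists AES-NI as the "aes" flag and 1 GB page support as the "pdpe1gb" flag in /proc/cpuinfo. The sketch below simply parses that text. The helper names cpu_flags and westmere_features are our own, and it assumes an x86 Linux system; on other platforms the flags line differs or is absent.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. non-x86 or non-Linux input)


def westmere_features(cpuinfo_text: str) -> dict[str, bool]:
    """Report two Westmere additions as seen by the Linux kernel (x86 flag names)."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "AES-NI": "aes" in flags,         # AES New Instructions
        "1 GB pages": "pdpe1gb" in flags,  # 1 GB page support
    }


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():  # only present on Linux
        print(westmere_features(path.read_text()))
```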
Just a few years ago, many ESX-based servers used binary translation to virtualize their VMs. Binary translation used clever techniques to avoid transitions to the hypervisor. In the case of the Pentium 4 Xeons, using software instead of hardware virtualization was even considered best practice. As we explained earlier in “Hardware virtualization: the nuts and bolts”, hardware virtualization can be faster than software virtualization, provided that VM-to-hypervisor transitions happen quickly. The new Xeon 5600 Westmere performs these transitions about 12% faster than Nehalem.
That is pretty impressive if you consider that it makes Westmere switch between hypervisor and VM twice as fast as the “Xeon 5400” series (based on the Penryn architecture), which was itself fast. However, as the VM-hypervisor-VM round trip accounts for an ever smaller share of the hypervisor overhead, we don’t expect huge gains. Hypervisor overhead is probably already dominated by other factors, such as emulating I/O operations.
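To see why a faster round trip buys so little, here is a back-of-the-envelope, Amdahl-style model. It is a minimal sketch, assuming that "12% faster" means a 12% lower transition latency and that only the transition part of the hypervisor overhead shrinks; the overhead shares tried below are hypothetical, not measured.

```python
def relative_overhead(transition_share: float, latency_reduction: float = 0.12) -> float:
    """New hypervisor overhead as a fraction of the old overhead.

    transition_share: hypothetical fraction of the overhead spent in VM exit/entry round trips.
    latency_reduction: fractional cut in round-trip latency (assumed 12% here).
    """
    if not 0.0 <= transition_share <= 1.0:
        raise ValueError("transition_share must be between 0 and 1")
    # Only the transition part gets faster; the rest (e.g. I/O emulation) is unchanged.
    return (1.0 - transition_share) + transition_share * (1.0 - latency_reduction)


for share in (0.5, 0.2, 0.05):
    saved = 1.0 - relative_overhead(share)
    print(f"transitions = {share:.0%} of overhead -> total overhead cut by {saved:.1%}")
```

Even if transitions made up half of the hypervisor overhead, total overhead would drop by only 6% (0.5 × 12%); at a 5% share the gain is well under 1%.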
The Xeon 3400 “Lynnfield” was the first to get an uncore power gate (primarily the L3 cache). An uncore power gate reduces leakage power to a minimum when the whole CPU is in a deep sleep state. In typical server conditions, we don’t think this will happen often. After all, shutting down the uncore means that all your cores (even those on the other CPU) must be sleeping too. If even one core is the slightest bit active, the L3 cache and memory controller must keep working. For your information, we discussed server power management, including power gating, in detail in an earlier article.
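The odds of every core sleeping at once shrink quickly with core count. A simple sketch, assuming (unrealistically) that each of the 12 cores in our dual X5670 server is in deep sleep independently of the others, with the same hypothetical probability:

```python
def prob_all_cores_idle(p_core_idle: float, cores: int) -> float:
    """Probability that all cores sleep at the same moment.

    Assumes each core is in a deep sleep state independently with probability p_core_idle.
    """
    return p_core_idle ** cores


for p in (0.99, 0.95, 0.90):
    print(f"each core asleep {p:.0%} of the time -> "
          f"all 12 asleep {prob_all_cores_idle(p, 12):.1%} of the time")
```

With each core asleep 90% of the time, all twelve are asleep together only about 28% of the time (0.9^12 ≈ 0.28). Real idle periods are correlated, so treat this as an illustration of the trend rather than a prediction.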
The fact that Westmere's memory controller supports low power DDR3 might have a much larger impact on your server’s power consumption. In a server with 32GB or more of memory, it is not uncommon for the RAM to account for about a quarter of total server power consumption. Moving to 40nm low power DDR3 drops the DRAM voltage from 1.5V to 1.35V, which can make a big dent in that quarter.
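How much does a 0.15V drop buy? A first-order estimate, assuming DRAM switching power scales with the square of the supply voltage (the classic C·V²·f model) at unchanged capacitance and frequency. It ignores leakage, refresh and I/O termination, so it is only a rough guide.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Switching power at v_new relative to v_old, assuming P ~ C * V^2 * f with C, f fixed."""
    return (v_new / v_old) ** 2


ratio = dynamic_power_ratio(1.35, 1.5)
print(f"1.35V vs 1.5V switching power: {ratio:.2f} ({1 - ratio:.0%} lower)")
```

The voltage cut alone trims roughly a fifth off the switching power. Samsung's figures below also reflect the move to a smaller process, which likely explains why the savings they quote are larger.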
According to Samsung, 48 GB of 40nm low power DDR3-1066 should use about 28W on average (averaged over 16 hours of idle and 8 hours of load). This compares favorably with 66W for early 60nm DDR3 and about 50W for the currently popular 50nm DRAM. So in a typical server configuration, you could save roughly 22W compared with 50nm DRAM, or about 10% of total server power consumption.
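The arithmetic behind the 22W figure can be checked in a few lines. In this sketch the DRAM wattages are Samsung's averages quoted above; the 220W total server draw is our assumption, chosen because it is what "22W is about 10%" implies. daily_average_power is a hypothetical helper that only shows how a 16-hour idle / 8-hour load average is formed; Samsung's separate idle and load numbers are not given here.

```python
def daily_average_power(idle_w: float, load_w: float,
                        idle_hours: float = 16, load_hours: float = 8) -> float:
    """Time-weighted average power over a day (16 h idle, 8 h load by default)."""
    return (idle_w * idle_hours + load_w * load_hours) / (idle_hours + load_hours)


# Samsung's average figures for 48 GB of DDR3, in watts (from the text).
dram_w = {"60nm DDR3": 66, "50nm DDR3": 50, "40nm LP DDR3": 28}

# Assumption: the total server draw implied by "22W is about 10%".
server_w = 220

for old in ("60nm DDR3", "50nm DDR3"):
    saving_w = dram_w[old] - dram_w["40nm LP DDR3"]
    print(f"vs {old}: save {saving_w} W = {saving_w / server_w:.0%} of {server_w} W")
```

Under the same assumption, replacing the older 60nm parts would save 38W, or roughly 17% of server power.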
AMD has confirmed more than once that it would not use DDR3 before low power DDR3 was available, so we expect low power DDR3 to be quite popular.
There is more. The Xeon 5600 also supports more memory and higher memory clock speeds. You can now use two DIMMs per channel at 1333MHz, while the Xeon 5500 would throttle back to 1066MHz if you did this. The Xeon 5500 was also limited to 12 x 16 GB, or 192 GB. If you have very deep pockets, you can now cram 18 of those ultra-expensive DIMMs in there, good for 288 GB of DDR3-1066!
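The DIMM counts follow from the platform layout. Each Xeon 5500/5600 socket has three DDR3 channels; reading the 12 and 18 DIMM limits as two versus three DIMMs per channel in a dual-socket server is our interpretation. A minimal sketch:

```python
def max_memory_gb(sockets: int, channels_per_socket: int,
                  dimms_per_channel: int, dimm_gb: int) -> tuple[int, int]:
    """Return (number of DIMMs, total capacity in GB) for a fully populated server."""
    dimms = sockets * channels_per_socket * dimms_per_channel
    return dimms, dimms * dimm_gb


print("Xeon 5500:", max_memory_gb(2, 3, 2, 16))  # two 16 GB DIMMs per channel
print("Xeon 5600:", max_memory_gb(2, 3, 3, 16))  # three 16 GB DIMMs per channel
```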
Deeper buffers make Westmere's memory controller more efficient: a dual Xeon X5670 reaches 43 GB/s, while the older X5570 was stuck at 35 GB/s with DDR3-1333. That will make the X5670 quite a bit faster than its older brother in bandwidth-intensive HPC software.
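To put the 35 and 43 GB/s figures in context, here is a rough comparison with the theoretical peak. It assumes three 64-bit (8-byte) DDR3 channels per socket running at about 1333 million transfers per second, and that both measurements used DDR3-1333; it is an illustration, not a benchmark.

```python
def peak_bandwidth_gbs(sockets: int, channels: int, mts: float, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s (1e9 bytes): transfers/s * bytes per transfer * channels * sockets."""
    return sockets * channels * mts * 1e6 * bus_bytes / 1e9


peak = peak_bandwidth_gbs(sockets=2, channels=3, mts=1333.33)
for name, measured in (("X5570", 35.0), ("X5670", 43.0)):
    print(f"{name}: {measured} GB/s = {measured / peak:.0%} of a {peak:.1f} GB/s peak")
```

Against a peak of roughly 64 GB/s, the X5570 reaches about 55% and the X5670 about 67%, consistent with a more efficient memory controller.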
40 Comments
Wireloop - Saturday, March 27, 2010
After looking at vApus' results for both the Intel and AMD gear, the natural conclusion is that Hyper-V is better optimized for the Opteron architecture than ESX, since the latter achieves a lower geometric mean VM rate on that platform. I guess it has something to do with maneuvering data into the L3 cache, which is a critical condition for high multithreaded performance on the AMD platform. If so, my kudos to Microsoft.
mgbell - Friday, March 19, 2010
Hey Anand,
I think you should set up a test pitting the Xeon line against their respective Core i7 counterparts and run some workstation-type tests. I would be very interested in any testing that had to do with video encoding/rendering. I am a video editor and would love to see a side-by-side comparison of a Xeon system against a Core i7 system of the same speed. Also, just for fun, turn the second processor off and on so we can see what kind of rendering benefit a second processor with 4/6 cores (8/12 threads) would bring.
Thanks
MB
lemonadesoda - Sunday, March 21, 2010
I very much agree. It would be interesting to run a typical "enthusiast" or "workstation" application/benchmark just to see how it compares.
I would like to see a Cinebench R10 comparison, an Everest PhotoWorxx, and a Fritz Chess Benchmark. Possibly a video encoding benchmark too.
A lot of enthusiasts run dual Xeons as workstations... you can't predict what software they will be running, but the above 3 tests are good general comparisons.
There are also servers providing other services like OCR or PDF generation. These Oracle database benchmarks are useful, but represent only one type of server/workstation use.
damianrobertjones - Thursday, March 18, 2010
I'm sitting here at the end of an ADSL line with a fresh Windows XP machine, all updates, new Kaspersky install.
While waiting for an app to install I've visited this page...
Bang. Kaspersky popped up with a warning
Trojan downloader.java.agent.aw from www.googleadsenstats.ru/useralexey/files/gsb50.jar/Appletx.class
Do you have something against IE8, as this doesn't happen with Opera?
PLEASE MAKE YOUR SITE SAFE!
itsmeagain - Wednesday, March 17, 2010
Any chance you could throw a couple of these in a Mac Pro and give us a preview?
Shadowmaster625 - Wednesday, March 17, 2010
The E5503 looks like the most reasonable and appealing server processor for those of us who live in the real world. Yet there are no benchmarks...
Lukas - Thursday, March 18, 2010
The 550x CPUs are crap. They don't have Hyper-Threading or Turbo Boost. The only reason they exist is a cheap entry price. If you don't need a lot of CPU (e.g. unvirtualized LOB software), you are better off with a 34xx series Xeon. It's a lot cheaper than the 55xx series.
majortom1981 - Tuesday, March 23, 2010
They also exist for government and public service contracts. We got a Z600 with 4GB of RAM, one 5504 Xeon, and an 80GB 10k RPM enterprise SATA drive (also an NVIDIA GPU) for $700. For just $239 I can add another 5504.
pvdw - Wednesday, March 17, 2010
How come only Windows servers are being used? What about RHEL with a Tomcat or JBoss bench (surely such a thing exists)?
Lukas - Thursday, March 18, 2010
Probably because the benchmarkers are not familiar with those platforms? Doing benchmarks on a platform about which you don't know enough will not give you any usable results.