Intel Xeon 5570: Smashing SAP records (scoop!)
by Johan De Gelas on December 16, 2008 12:00 AM EST - Posted in IT Computing general
We have emphasized it more than once: the Nehalem architecture is all about regaining the performance crown in servers and HPC; desktop and mobile use were sometimes a bonus, sometimes an afterthought. Today that becomes almost painfully obvious. Just read Anand's thoughts about the Core i7:
"The Core i7's general purpose performance is solid, you're looking at a 5 - 10% increase in general application performance at the same clock speeds as Penryn"
and now look at the graph below.
Intel has apparently allowed HP and Fujitsu-Siemens to break the NDA on the Xeon 5570 processor for PR reasons, as both companies have published SAP numbers on a dual Xeon 5570. The Xeon 5570 is based on the same architecture as the Core i7. It is a 2.93 GHz quad-core CPU with four 256 KB L2 caches and one huge shared 8 MB L3.
The SAP numbers are absolutely astonishing: Intel's dual-socket machine is able to outperform quad-socket Opteron machines. Based on the scaling of Barcelona, we speculate that a quad Shanghai at 2.7 GHz would match the performance of the dual Xeon 5570 without Hyper-Threading (HT).
The new Xeon 5570 outperforms the "old" 5450 by 119%!
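The 119% lead is measured against a Xeon 5450, which is clocked slightly higher than the 2.93 GHz Xeon 5570, so on a per-clock basis the gap is even wider. Below is a rough Python sketch of that normalization. It assumes the 5450 runs at 3.0 GHz (the clock is not stated above) and that performance scales linearly with clock speed, which a benchmark like SAP S&D will only approximate. The helper names `speedup_ratio` and `per_clock_gain` are our own, for illustration only.

```python
def speedup_ratio(percent_faster: float) -> float:
    """Convert an 'X% faster' claim into a throughput ratio (119% -> 2.19x)."""
    return 1.0 + percent_faster / 100.0


def per_clock_gain(percent_faster: float, new_ghz: float, old_ghz: float) -> float:
    """Percent gain after dividing each system's score by its clock speed.

    Assumption: score scales linearly with clock, which is only a rough model.
    """
    ratio = speedup_ratio(percent_faster) * (old_ghz / new_ghz)
    return (ratio - 1.0) * 100.0


# Xeon 5570 clock is from the article; the 3.0 GHz Xeon 5450 clock is an assumption.
XEON_5570_GHZ = 2.93
XEON_5450_GHZ = 3.0

gain = per_clock_gain(119.0, XEON_5570_GHZ, XEON_5450_GHZ)
print(f"Per-clock gain over the Xeon 5450: {gain:.0f}%")  # prints 124%
```

Applied the same way to an 80% lead over a 2.7 GHz Opteron, the sketch gives roughly 66% per clock, in the same ballpark as the "about 67% clock-for-clock" figure one reader arrives at in the comments.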
These numbers are so high that we checked and checked again. The database used is the same (SQL Server 2005), so unless HP and FS have discovered some incredible tuning parameter that we have yet to hear about, the database is not the explanation.
At this point we have no idea how a 3 GHz Nehalem can outperform the latest Opteron by a margin of 80% or more. But we can venture a guess. In a previous server-oriented article, we summed up a rough profile of SAP S&D:
• Very parallel, resulting in excellent scaling
• Low to medium IPC, mostly due to “branchy” code
• Not really limited by memory bandwidth
• Likes large caches
• Sensitive to Sync (“cache coherency”) latency
One of the biggest bottlenecks for Intel has been sync latency. It is possible that once the "sync" bottleneck was removed, the Intel architecture was able to show its real integer crunching power, thanks to out-of-order loads (memory disambiguation) and better branch prediction. Those are two areas where the Opteron architecture is still weak.
The slightly lower latency of Nehalem's L3 cache helps too. This kind of software also fills up the out-of-order (OOO) buffers because of its long dependency chains. Nehalem's OOO buffers have been enlarged, and the dependency chains are shortened by a very low-latency L2 cache and a relatively fast L3.
Still, we are absolutely amazed that the difference is this large; we would have expected Nehalem to outperform Shanghai by smaller margins. Although we remain a bit skeptical ("too good to be true" syndrome), we do not see how you could artificially inflate a SAP benchmark. It certainly is not as easy as with SPECjbb or SPECint/SPECfp.
28 Comments
amazi - Wednesday, December 17, 2008
Now both the web page and the PDF show that FSC had HT on (16 threads), so you need to correct the chart.
Wernte - Tuesday, December 16, 2008
The numbers are certainly high, but I think it could be possible. Compared to a dual Opteron 8384, the new Xeon is about 67% faster clock-for-clock based on this benchmark. That isn't too out of whack considering all the changes made to eliminate the various system bottlenecks, plus HT, since the Intel CPU itself (by that I mean the capabilities of the CPU only, such as its wider execution core) has always been more powerful than its AMD counterpart. If this is indeed true, though, it would mean that Intel will drive AMD out of its coveted 4- and 8-socket server market, even with the new Opteron based on the K10.5 architecture. Very scary...
JohanAnandtech - Tuesday, December 16, 2008
I have been a hardware journalist for 10 years now, and I have never seen this: a new CPU + platform doubling the performance of the previous one without 1) using new instructions, 2) a newer process technology, 3) a large jump in clock speed, or 4) running a very exotic benchmark that stresses only a very small part of the CPU.
TeXWiller - Tuesday, December 16, 2008
Sufficient bandwidth lets POWER6 scale very well with SMT, not only in SAP but in SPEC tests as well. Nehalem's increased bandwidth could be the reason for the good SMT scaling in this case.
Riek - Tuesday, December 16, 2008
My guess would be that they screwed up the number of cores (dual = quad)... That would bring it down to the expected gains and figures... Although if the performance is indeed correct, the i7-based server chips will be the fastest CPUs in the server market for a very long time. And that might be a very bad thing for AMD and the microprocessor industry in general.
defter - Tuesday, December 16, 2008
That's not possible, since quad-socket Nehalem will not be available until H2 2009.
Riek - Tuesday, December 16, 2008
Since it appeared that HT was enabled, I was not that wrong :')