Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
by Johan De Gelas on June 7, 2006 12:00 PM EST, posted in IT Computing
The Official SPEC Numbers
SPECfp and SPECint 2000 are the standard benchmarks for evaluating CPU performance. However, the scores depend heavily on the compiler. SPECfp and SPECint show best-case performance, because the CPU runs aggressively compiled, highly optimized code. In the real world, code is compiled in a more conservative, less optimized way.
In practice this means that Intel's SPEC numbers, thanks to its highly capable compiler team, are (slightly) higher than what you see in real applications. Nevertheless, SPEC CPU 2000 is a good starting point for understanding what a CPU is capable of. As mentioned earlier, the Xeon 5100 is the Xeon Woodcrest, based on the new Core architecture.
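For readers wondering how a single SPEC number comes about, here is a minimal Python sketch of SPEC CPU2000-style scoring: each sub-benchmark gets a ratio of 100 times a fixed reference time divided by the measured run time, and the composite is the geometric mean of those ratios. The run times below are hypothetical, made up purely for illustration, not real SPEC data.

```python
from math import prod


def spec_ratio(reference_s: float, measured_s: float) -> float:
    """SPEC CPU2000-style ratio for one sub-benchmark: 100 x reference / measured time."""
    return 100.0 * reference_s / measured_s


def composite(ratios: list[float]) -> float:
    """Geometric mean of the per-benchmark ratios (the composite score)."""
    return prod(ratios) ** (1.0 / len(ratios))


# Hypothetical (reference, measured) times in seconds; NOT real SPEC results.
runs = [(1400.0, 70.0), (1800.0, 60.0), (1100.0, 55.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in runs]
score = composite(ratios)
print(f"ratios={ratios}, composite={score:.0f}")
```

The geometric mean is also why compiler work pays off: halving the run time of just one of these three sub-benchmarks raises the composite by a factor of 2^(1/3), roughly 26%.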
SPECfp
CPU | Clockspeed (MHz) | SPECfp 2000
POWER5+ | 2200 | 3271
Itanium 2 | 1666 | 2851
Xeon 5160 | 3000 | 2783
Opteron | 2800 | 2256
Pentium 4 E | 3733 | 2232
The new Woodcrest is about 20-25% faster than the fastest dual-core Opteron (2783 vs. 2256). Part of that lead is a 7% clock speed advantage (3.0 vs. 2.8 GHz), most likely thanks to Woodcrest's newer 65nm process. If AMD manages to keep up with Intel on clock speed, Woodcrest's advantage might shrink to 15% or less. However, Woodcrest will have a much bigger advantage in all applications that make heavy use of 64- and 128-bit SSE.
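The clock-normalized estimate above can be reproduced from the SPECfp table. This minimal Python sketch assumes that the score scales linearly with clock speed, a simplification that real code, especially memory-bound code, rarely matches; the function and variable names are our own.

```python
# SPECfp 2000 results from the table above: CPU -> (clock in MHz, score).
SPECFP_2000 = {
    "Xeon 5160": (3000, 2783),
    "Opteron": (2800, 2256),
}


def compare(a: str, b: str, results: dict[str, tuple[int, int]]) -> tuple[float, float, float]:
    """Return (score ratio, clock ratio, clock-normalized ratio) of CPU a over CPU b.

    The clock-normalized ratio assumes the score scales linearly with clock speed.
    """
    (clock_a, score_a), (clock_b, score_b) = results[a], results[b]
    score_ratio = score_a / score_b
    clock_ratio = clock_a / clock_b
    return score_ratio, clock_ratio, score_ratio / clock_ratio


overall, clock, per_clock = compare("Xeon 5160", "Opteron", SPECFP_2000)
print(f"overall +{overall - 1:.0%}, clock +{clock - 1:.0%}, per clock +{per_clock - 1:.0%}")
```

This works out to a 23% overall lead, a 7% clock advantage and a 15% per-clock lead, in line with the estimate in the text.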
SPECint
CPU | Clockspeed (MHz) | SPECint 2000
Xeon 5160 | 3000 | 3057
Pentium 4 E | 3733 | 1870
Opteron | 2800 | 1837
Pentium 4 Xeon | 3733 | 1813
POWER5+ | 2200 | 1705
Itanium 2 | 1666 | 1502
When it comes to integer performance, the Woodcrest numbers are simply stunning: at 3057, the Xeon 5160 scores more than 60% higher than any other CPU in the table, including the Opteron (1837). Let us find out whether this superior integer performance in SPEC Int 2000 pays off in server applications.
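To see how much of that integer lead survives once clock speed is taken into account, this Python sketch ranks the SPECint table by points per GHz. It is a deliberately crude metric of our own choosing: it ignores compilers, cache sizes and the fact that clock speed is itself a design trade-off.

```python
# SPECint 2000 results from the table above: CPU -> (clock in MHz, score).
SPECINT_2000 = {
    "Xeon 5160": (3000, 3057),
    "Pentium 4 E": (3733, 1870),
    "Opteron": (2800, 1837),
    "Pentium 4 Xeon": (3733, 1813),
    "POWER5+": (2200, 1705),
    "Itanium 2": (1666, 1502),
}


def score_per_ghz(clock_mhz: int, score: int) -> float:
    """Crude per-clock efficiency: SPECint points per GHz."""
    return score / (clock_mhz / 1000.0)


ranking = sorted(SPECINT_2000, key=lambda cpu: score_per_ghz(*SPECINT_2000[cpu]), reverse=True)
for cpu in ranking:
    print(f"{cpu:15s} {score_per_ghz(*SPECINT_2000[cpu]):7.1f} points/GHz")
```

Even per clock, the Xeon 5160 (about 1019 points/GHz) comes out on top, at roughly twice the NetBurst-based Pentium 4 E (about 501).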
Latencies...
LMBench is a set of micro-benchmarks that can help determine memory latency and instruction latencies. We tested with LMBench 3.0a-5. LMBench is usually right, but not always: if the benchmark is not aware of the particularities of a certain architecture, it can measure wrong values. So we have to double-check whether the measured values make sense.
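To make the idea of a memory latency micro-benchmark concrete, here is a minimal Python sketch of pointer chasing, the general technique such latency tests rely on: each load's address comes from the previous load, so the loads cannot overlap. We shuffle the chain so a simple stride prefetcher cannot guess the next address; that is our own illustrative choice, not a description of LMBench's exact access pattern. In Python, interpreter overhead dwarfs the real memory latency, so the timing only demonstrates the method; an actual tool does this in C over buffers much larger than the caches.

```python
import random
import time


def build_chain(n: int, seed: int = 0) -> list[int]:
    """Link n slots into a single random cycle: chain[i] is the index of the next slot."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    chain = [0] * n
    for pos in range(n):
        # The slot visited at position pos points at the slot visited next.
        chain[order[pos]] = order[(pos + 1) % n]
    return chain


def chase(chain: list[int], steps: int) -> tuple[int, float]:
    """Follow the chain for `steps` dependent hops; return (final index, ns per hop)."""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = chain[idx]  # the next index is only known once this load completes
    elapsed_ns = time.perf_counter_ns() - start
    return idx, elapsed_ns / steps


chain = build_chain(1 << 20, seed=42)
final, ns_per_hop = chase(chain, 1 << 20)
print(f"{ns_per_hop:.1f} ns per hop (dominated by interpreter overhead in Python)")
```

Because the chain is one single cycle through every slot, walking it exactly n times always returns to the starting slot, which makes the walk easy to sanity-check.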
LMBench
CPU | Clockspeed (MHz) | L1 (ns) | L1 (cycles) | L2 (ns) | L2 (cycles) | RAM (ns) | RAM (cycles)
Xeon 5160 3 GHz | 3000 | 1.01 | 3 | 4.7 | 14 | 117.3 | 345
Pentium-M 1.6 GHz | 1593 | 2 | 3 | 6 | 10 | 92.1 | 147
Sun T1 1 GHz | 980 | 3 | 3 | 22.1 | 22 | 107.5 | 105
Opteron 275 | 2209 | 1 | 3 | 5.5 | 12 | 73 | 161
Xeon Irwindale 3.6 GHz | 3594 | 1 | 4 | 8 | 28 | 48.8 | 175
The massive 4 MB L2 cache has an amazingly low latency of 14 cycles. This seems to be the worst case, as we have measured 12 cycles with other benchmarking tools such as ScienceMark. Even so, 14 cycles at 3 GHz is pretty amazing: the Core Duo, a.k.a. Yonah, needs the same 14 cycles to access a shared cache that is half as large, at a substantially lower 2.33 GHz.
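LMBench reports latencies in both nanoseconds and cycles, and the two are linked by the clock speed (cycles = ns × GHz). This small Python sketch does that conversion to put the Woodcrest versus Yonah comparison in absolute time and cross-checks a few rows of the table; the helper names are our own.

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock (MHz)."""
    return cycles * 1000.0 / clock_mhz


def ns_to_cycles(ns: float, clock_mhz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles at the given clock (MHz)."""
    return ns * clock_mhz / 1000.0


woodcrest_l2_ns = cycles_to_ns(14, 3000)  # 4 MB shared L2 at 3 GHz
yonah_l2_ns = cycles_to_ns(14, 2330)      # 2 MB shared L2 at 2.33 GHz
print(f"Woodcrest L2: {woodcrest_l2_ns:.2f} ns, Yonah L2: {yonah_l2_ns:.2f} ns")

# Cross-check a few LMBench rows from the table: (ns, clock MHz) -> cycles.
for name, ns, mhz in [("Xeon 5160 L2", 4.7, 3000), ("Opteron 275 RAM", 73, 2209), ("Sun T1 RAM", 107.5, 980)]:
    print(f"{name}: {ns_to_cycles(ns, mhz):.0f} cycles")
```

In absolute terms, Woodcrest reaches its twice-as-large L2 in about 4.7 ns, versus about 6.0 ns for Yonah.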
On the other hand, memory latency is very high, though the 4 MB L2 cache should minimize the impact. The culprit seems to be the FB-DIMMs: the Advanced Memory Buffer introduces extra latency, and of course the registered DDR2-533 chips with a CAS latency of 4 have a higher latency by themselves. The result is a memory subsystem with a pretty high latency of about 115 ns, while the Opteron reaches RAM in only 73 ns.
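Why can a big L2 cache make up for slow memory? A standard first-order model is the average memory access time, AMAT = hit time + miss rate × miss penalty. This Python sketch plugs in the LMBench L2 and RAM latencies from the table and solves for the L2 miss rate at which Woodcrest and the Opteron 275 would break even. It deliberately ignores prefetching, memory-level parallelism and the fact that the two caches would see different miss rates on the same workload, so treat it as an illustration of the trade-off, not a performance prediction.

```python
def amat_ns(l2_ns: float, ram_ns: float, l2_miss_rate: float) -> float:
    """Average access time for requests reaching L2: hit time + miss rate x miss penalty.

    The miss penalty is approximated as the extra time a RAM access costs over an L2 hit.
    """
    return l2_ns + l2_miss_rate * (ram_ns - l2_ns)


# LMBench latencies in ns, (L2, RAM), from the table above.
WOODCREST = (4.7, 117.3)
OPTERON_275 = (5.5, 73.0)


def break_even_miss_rate(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Miss rate at which systems a and b have equal AMAT (same miss rate assumed for both)."""
    (l2_a, ram_a), (l2_b, ram_b) = a, b
    # Solve l2_a + m * (ram_a - l2_a) == l2_b + m * (ram_b - l2_b) for m.
    return (l2_b - l2_a) / ((ram_a - l2_a) - (ram_b - l2_b))


m = break_even_miss_rate(WOODCREST, OPTERON_275)
print(f"break-even L2 miss rate: {m:.1%}")
for rate in (0.01, 0.05):
    print(rate, round(amat_ns(*WOODCREST, rate), 2), round(amat_ns(*OPTERON_275, rate), 2))
```

With these numbers the break-even point is a miss rate of roughly 1.8%: below it Woodcrest's faster L2 wins, above it the Opteron's faster memory does, which is why a cache large enough to keep the miss rate low matters so much.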
ScienceMark didn't completely agree, reporting about 65-70 ns latency on the Opteron system and 70-76 ns (230 cycles) on the Woodcrest system. We have reason to believe that Woodcrest's latency is closer to what LMBench reports: its excellent prefetchers hide the true latency from ScienceMark. Note also that the Opteron measurements cover local memory only, not remote memory attached to the other socket.
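The size of the disagreement between the two tools fits the prefetcher explanation. This short Python sketch computes how much lower ScienceMark's figures are than LMBench's on each platform (LMBench values from the table) and converts the Woodcrest ScienceMark range to cycles at 3 GHz; the helper name is our own.

```python
def understatement(lmbench_ns: float, other_ns: float) -> float:
    """Fraction by which another tool's latency figure is below LMBench's."""
    return 1.0 - other_ns / lmbench_ns


LMBENCH_NS = {"Woodcrest": 117.3, "Opteron": 73.0}
SCIENCEMARK_NS = {"Woodcrest": (70.0, 76.0), "Opteron": (65.0, 70.0)}  # reported ranges

for cpu, (low, high) in SCIENCEMARK_NS.items():
    lo_gap = understatement(LMBENCH_NS[cpu], high)
    hi_gap = understatement(LMBENCH_NS[cpu], low)
    print(f"{cpu}: ScienceMark reads {lo_gap:.0%}-{hi_gap:.0%} lower than LMBench")

print(f"Woodcrest ScienceMark range in cycles: {70.0 * 3.0:.0f}-{76.0 * 3.0:.0f}")
```

Woodcrest's figure drops by roughly 35-40%, the Opteron's by only about 4-11%, consistent with prefetchers hiding far more of Woodcrest's latency; 210-228 cycles also matches the roughly 230 cycles quoted.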
91 Comments
duploxxx - Monday, June 19, 2006
Two weeks have passed, and still no word from Anand about the Microsoft benches? (I received a comment that it should only take one week to finish.) Reason? Don't make us guess why you don't post these benches.
severian64 - Tuesday, June 13, 2006
I've been reading AnandTech articles for a long time, and I have to say that this article is so biased that I think it should be retracted. I can't wait for the next pro-Intel article.
MySQL Helps Set World Record in Java Application Server Benchmarks; High-Speed Open Source Software Blaze Past Proprietary Solutions
CUPERTINO, Calif.--(BUSINESS WIRE)--June 12, 2006--A popular application server benchmark, featuring a complete open source software stack with MySQL 5.0 database, the Solaris(TM) 10 Operating System, and Sun Java(TM) Systems Application Server 9.0 Platform Edition (Project GlassFish(SM)) has shattered the competition by offering up to 8.6 times lower cost of acquisition than the comparable solution (1,4), according to the benchmark test results published at http://www.spec.org/jAppServer2004/results/jAppSer...
Maintained by the Standard Performance Evaluation Corp. (SPEC(R)), the SPECjAppServer(R)2004 test is a recognized industry standard benchmark used to measure performance of Java EE application server platforms and each of the components that make up the application environment -- including hardware, database software, JDBC drivers, JVM software and the system network. It is designed to model a real-world automotive dealership application, including manufacturing, supply-chain management and an order/inventory system.
"Open Source software can provide dramatic benefits for enterprise IT applications - especially in terms of real performance and TCO," said Ethan O'Rafferty, director of Strategic Alliances for MySQL AB. "We are proud that Solaris 10 is an ideal deployment platform for MySQL 5.0."
The MySQL and Sun combination attained a result of 712.87 SPECjAppServer2004 JOPS@Standard running a 64-bit version of MySQL 5.0 and SJSAS 9.0 on Sun Microsystems' Sun Fire(TM) X4100 servers powered by Dual-Core AMD Opteron(TM) processors(1). The result demonstrates superb scalability of the whole solution, as compared to the previous result of 266 SPECjAppServer2004 JOPS@Standard that was achieved with Single-Core AMD Opteron processors (2). This solution also demonstrated the best database performance, measured in SPECjAppServer2004 JOPS@Standard per database core (SPECjAppServer2004 JOPS@Standard /DB core), of any competitive submission using less than 20 total cores in database and application tiers. MySQL's SPECjAppServer2004 JOPS@Standard /DB core metric surpassed an Oracle-powered result by over 30 percent (3).
ChuaChua - Saturday, June 10, 2006
I'm confused about the charts. What are the numbers on the X and Y axes?
JohanAnandtech - Sunday, June 11, 2006
"To interpret the graphs below precisely, you must know that the X-axis gives you the number of demanded requests and the Y-axis gives you the actual reply rate of the server. The first points all show the same performance for each server, as each server is capable of responding fast enough." http://www.anandtech.com/IT/showdoc.aspx?i=2772&am...
JohanAnandtech - Friday, June 9, 2006
1. "you use workstaion/budget motherboard against the intel server board. use a sun galaxy or hp proliant."
No, we do not. We used the MSI K2-102A2M for ALL Opteron testing except one: the MySQL test under Solaris, because the ServerWorks chipset was not supported by Solaris x86.
Note that this server performed better than the HP DL385, which uses slower memory timings. Using the HP ProLiant would have resulted in slightly LOWER Opteron numbers, not higher!
I don't understand why a few people make such a big fuss about the MSI K8N Master2-FAR: we only used it once, out of necessity, because it worked under Solaris. Keep in mind that Solaris x86 supports only a limited range of x86 hardware.
2. About our testing methods: yes, we use our own benchmarks. We'll add some industry-standard benchmarks to the mix later. However, industry benchmarks are what manufacturers optimize for, while our benches come straight out of the real world and are what real people are using. The same tests showed the Opteron beating the old Xeon by a pretty big margin; check our previous MySQL results. I don't see why our tests should all of a sudden be changed now.
If you feel there are other issues, feel free to raise them. I will definitely try to answer any concerns you have.
duploxxx - Friday, June 9, 2006
Hmm, thanks for the reply, that's clear now. Seeing all the reactions to your review, it seems that the way it's built up is far from structured, and people do have problems reading it.
You didn't answer my question about why some benches are single-socket and some dual-socket... but I guess you are rather busy.
The way you talk as if this core (not released yet) is the best thing against a new platform from the competition that will be launched at the same time... still makes me wonder, but anyhow.
Your own benchmarks and a rather strange OS for benches (with personal tweaking) are still not relevant. Results on a far more widely used platform would be much easier to compare... but I already have a good idea of the few benches you will be showing in this review on a Wintel OS (let's hope I am wrong). Your benches on Linux might be straight out of the real world, but they are impossible to verify.
The only comparisons you make are two SPEC benches, and judging by the figures, they were probably also done on that nice Linux platform... But how come, for one reason or another, the Opteron is now clocked at 2800 while you have 2400 and 2600 systems?
JohanAnandtech - Friday, June 9, 2006
OK, addressing the other issues:
1. Why no dual-socket, quad-core results in some benches:
The reason is that with the LAMP tests we ran into a limit in our httperf benchmark: for some reason it couldn't measure anything above 3000 req/s. So another bottleneck kicks in, and for now we avoided it by not testing with quad core. This happened on both the Opteron and the Woodcrest systems.
2. The Gentoo numbers from the previous review that gained 9-10% were a comparison between one dual-core CPU and two single-core CPUs. Note that the same review shows only a 38% performance increase going from one to two CPUs.
3. I am well aware that a few people are trying to discredit this review in every way possible. However, even though these benchmarks can't be repeated by other people for obvious reasons (the databases are not available to the public), the benches are in line with what other people have found.
http://www.mysqlperformanceblog.com/2006/06/08/int...
P. Zaitsev is one of the most respected people when it comes to MySQL performance and heads performance tuning at MySQL.
ashyanbhog - Monday, June 12, 2006
<Quote>The Gentoo numbers of the previous review that gained 9-10% was a comparison between 1 Dualcore and two dual single Core CPUS. Note that the same review shows only 38% performance increase from one to two CPUs.</Quote>
Correct me if I am wrong. Follow the link below to AnandTech's own earlier benchmarks. Go to the last table on the page and check the results of "Dual Dual Core 875" and "Dual Opteron 248" from 5 concurrencies onwards. The increase is slight, but there is definitely no performance degradation. The earlier review also uses Opterons + Linux + MySQL + InnoDB, the same as this setup. Why do you get totally different result sets this time around?
http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...
On the next page of the same review, DB2 shows fantastic gains from 5 concurrencies onwards when going from two cores to four cores. Check the first table under "Benchmarks IBM DB2: Single core versus Dual core" and note the results of "Dual Opteron 2.2 hz" and "Dual Dual Core AMD 2.2 Ghz". This too is on Linux. DB2 is definitely better suited than MySQL to reflect the gains of moving from a two-core to a four-core setup.
Was publishing the obviously wrong MySQL results that you got this time necessary?
Despite choosing Gentoo for its optimization capability, you have not published the optimization options used. Gentoo gurus could have verified that the optimizations for each of the processors were fairly chosen. This is not an allegation that you have taken sides, but why hold back specific info that is not secret or proprietary in nature? And yes, I know you have used Gentoo in many previous benchmarks without specifying the optimizations, but as this review seems to have become controversial, it would help clear the air a bit. Specifying the options used wherever you deviated from the default settings would surely increase the credibility of your review articles.
Using a standard, preconfigured and widely available package like Debian, RHEL or SLES in their default settings was another option to ensure a neutral platform.
Can't comment on other parts of the review, as I was only interested in the database performance.
<Quote>A few people try to discredit this review in every way possible, I well aware of that.</Quote>
Why can't readers of AnandTech question the process used in a review? After all, it's their page views and site visits that bring in the ad revenue. Readers have limited time to go through a review and comment on it, and exaggeration may happen when we don't have time to express our doubts in detail. Your comment above was a cheap shot.
Still, nice to see that Intel finally has something that can be compared to an Opteron. Good to have a choice, but the three-year wait was too long.
And thanks for replying about the hardware used.
zsdersw - Monday, June 12, 2006
Apparently you have no idea how long it takes to bring a totally new chip to market. This generally takes approximately 5 years.
BasMSI - Saturday, June 10, 2006
Hahahaha, what baloney... Aceshardware didn't have a problem reaching above 8000 requests on the Dual 844 back in 2003.
The article can be found here: http://www.aceshardware.com/read.jsp?id=60000279
And you were unable?
Get real.
If you don't know how to setup a server, then stay away from trying to do such.