AMD's Quad-Core Barcelona: Defending New Territory
by Johan De Gelas on September 10, 2007 12:15 AM EST - Posted in IT Computing
Software Rendering: zVisuel (32-bit Windows)
This benchmark is the zVisuel Kribi 3D test, which is exclusive to AnandTech.com and simulates the assembly of a high-end mechanical watch. The complete model is very detailed, with around 300,000 polygons and a lot of texture, bump, and reflection maps. More than 1000 frames are rendered and the average FPS (frames per second) is reported. All this is rendered on the "Kribi 3D" engine, a real-time 3D engine that renders entirely in software. It runs at reasonable speeds because the newest AMD and Intel architectures contain four cores and can perform up to eight 32-bit FP operations per clock cycle per core. The people at zVisuel told us that, in reality, the current Core architecture can sustain six FP operations per clock in well optimized loops.
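As a rough illustration of what those per-clock figures imply, here is a minimal Python sketch of the peak-throughput arithmetic. The function name is ours, and the 2.0 GHz clock is simply that of the Opteron 2350 discussed in this article. The six-operation sustained figure was quoted for Intel's Core architecture, so applying it at 2.0 GHz is an assumption made only for the sake of the example.

```python
def peak_gflops(cores: int, clock_ghz: float, ops_per_cycle: float) -> float:
    """Theoretical 32-bit FP throughput in GFLOPS: cores x clock x ops per cycle."""
    return cores * clock_ghz * ops_per_cycle


# Figures from the text: four cores, up to eight 32-bit FP ops per clock per core.
# The 2.0 GHz clock is an illustrative choice, not a measured configuration.
peak = peak_gflops(cores=4, clock_ghz=2.0, ops_per_cycle=8)

# zVisuel's estimate of six sustained ops per clock (quoted for Core; assumed here).
sustained = peak_gflops(cores=4, clock_ghz=2.0, ops_per_cycle=6)

print(f"Theoretical peak:  {peak:.1f} GFLOPS")       # 64.0
print(f"Sustained (6 ops): {sustained:.1f} GFLOPS")  # 48.0
print(f"Fraction of peak:  {sustained / peak:.0%}")  # 75%
```

These are ceilings, not predictions: a real renderer also spends time on memory accesses and integer work, which is exactly what the results below probe.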
The 3D model of the benchmark in the middle of its assembly
Can the newest AMD architecture sustain the same amount of massive FP power? Eric Bron provided us with a benchmark which is based on real world use by a well-known zVisuel client. The first benchmark does not use antialiasing.
The tables are turning: while the newest AMD quad-core had to let the faster-clocked Intel quad-core Xeon go in the LINPACK tests, it takes a small but still measurable lead in zVisuel. Notice that the Intel CPU has the advantage when it comes to raw processing power: it is about 19% faster in a single-CPU configuration. Once you add a second CPU to both systems, that 19% lead turns into a 3% advantage for AMD. Also note that a 2GHz Quad Opteron 2350 is about as fast as a dual 3GHz Opteron 2222 DC.
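The swing from a 19% deficit to a 3% lead says something about how each platform scales from one socket to two. Below is a quick sketch of that arithmetic. It uses only the rounded percentages quoted above, not the measured frame rates, so treat the result as approximate; the variable names are ours.

```python
# Normalize the single-socket AMD result to 1.0.
amd_single = 1.0
intel_single = 1.19 * amd_single  # Intel is about 19% faster with one CPU

# With two CPUs, AMD ends up about 3% ahead:
#   amd_single * amd_scaling = 1.03 * intel_single * intel_scaling
# Rearranging gives the ratio of the two scaling factors.
relative_scaling = 1.03 * intel_single / amd_single

print(f"AMD 1->2 socket speedup relative to Intel's: {relative_scaling:.3f}x")
```

Under these rounded figures, the Opteron's speedup from adding a second CPU was roughly 23% larger than the Xeon's.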
Be aware though that you need the Enterprise edition of Windows 2003 to see this kind of performance. The 32-bit Standard edition of Windows 2003 does not support NUMA, and the bandwidth-hungry AMD quad-core does not like that at all. Performance was up to 14% (!) lower, showing only 73 fps instead of 85 fps.
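For readers who want to check the percentage, here is the calculation as a short sketch. The two frame rates are the ones quoted above; the variable names are ours.

```python
fps_numa = 85.0     # Enterprise edition (NUMA-aware), from the text
fps_no_numa = 73.0  # 32-bit Standard edition (no NUMA support), from the text

# Relative loss measured against the NUMA-aware result.
drop_pct = (fps_numa - fps_no_numa) / fps_numa * 100

print(f"Performance drop without NUMA: {drop_pct:.1f}%")  # 14.1%
```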
We performed the same benchmark, but with antialiasing applied. AA makes the application a bit more memory intensive. The AMD quad-core extends its lead from 3% to 5%, and a single 2GHz quad-core can now even outperform a dual 3.2GHz Opteron 2224 SE.
The LINPACK and zVisuel benchmarks make it clear that Intel and AMD have about the same raw FP processing power (clock for clock), but that the Barcelona core has the upper hand when the application accesses memory heavily.
46 Comments
tshen83 - Monday, October 1, 2007 - link
According to the MySQL site, starting with 5.0.37, the mutex contention bug and the InnoDB bug have been improved by a lot, which helps 8 core systems.
I was wondering: since 5.0.45 is available on MySQL's website, why isn't the latest MySQL being benchmarked? 5.0.26 still has that bug, and you can see it in the benchmark where an 8 core system is slower than a 4 core, which is slower than a 2 core.
Now that we are benchmarking 8-16 core systems, the newest versions of software should be used to reflect the improved multithreading.
swindelljd - Wednesday, September 12, 2007 - link
I currently have a 4 way 2.4ghz opteron as a production db server that I am considering upgrading. I'm trying to use the Anandtech benchmarks to help project how much performance gain we'll see in a new machine.
We're running Oracle but are considering moving to MySQL. So I am trying to compare the stats in 2 Anandtech reviews to see how the new Barcelona cores compare to the Intel Woodcrest and Clovertown.
In looking at this article from June 2006 ( http://www.anandtech.com/IT/showdoc.aspx?i=2772&am... ), 2x3ghz Woodcrests (4 cores, right?) run the MySQL test at about 950 QPS (queries per second) for 25, 50 and 100 concurrent sessions.
However this recent article in September 2007 ( http://www.anandtech.com/IT/showdoc.aspx?i=3091&am... ) appears to show the same 2x3ghz Woodcrests running 700, 750 and 850 QPS for 25, 50 and 100 connections respectively. That represents a 20% or so DECREASE in performance of the same chip in the last 12 months.
What am I missing?
Ultimately I want to compare the Opteron 2350 vs Xeon 5345 and then the Opteron 8350 vs Xeon E7330 but I'm starting with what exists for benchmarks first so I can make sure I understand what I am reading.
Can someone please help set me straight.
thanks,
John
JohanAnandtech - Monday, September 17, 2007 - link
The article in June 2006 uses 5.0.21, and there might also be a small change in tuning. The article in September 2007 uses the standard 5.0.26 MySQL version that you get with SLES 10 SP1.
The best numbers are here:
http://www.anandtech.com/cpuchipsets/intel/showdoc...
The newest version 5.0.45 will give you performance like the above article: MySQL has incorporated the patches we talked about (that Peter Z. wrote) in this new version.
Jjoshua2 - Tuesday, September 11, 2007 - link
I like this benchmark a lot as I am a fan of computer chess. "Higher" was spelled wrong on the graph on that page: it reads "Hiher is better".
Schugy - Tuesday, September 11, 2007 - link
Maybe it's too early for gcc optimizations but how about testing programs like oggenc, ffmpeg, blender, kernel compilation, apache with openssl, doom III and so on?
erikejw - Monday, September 10, 2007 - link
I read another review and they got these scores on the slightly lower speed 1.9 GHz Barcelona.
Barcelona 2347 (1.9Ghz)
37.5 Gflop/s
Intel Xeon 5150(2.6Ghz)
35.3 Gflop/s
It seems your Barcelona scores are way off for some reason.
The Xeon's score is more or less identical.
This seems really weird. Normally the higher score is the correct one due to some bad optimizations. The rest of the article is great though.
kalyanakrishna - Monday, September 10, 2007 - link
This article seems to be very biased.
1) they choose faster Intel processors, 2 GHz Opteron. There are 2 GHz processors available across all the processors used in this analysis.
2) No mention of what compiler was used. Intel compilers earlier had a trick, which was not documented - any code optimized for Intel processors if used on non-intel processors (uhm! AMD), would disable all optimizations. Who knows what else they are doing now. And this gentleman used Intel optimized code on AMD to test performance. Who in the right mind measuring performance would do that?
3) Intel MKL was used for BLAS. Shouldnt they use ACML for AMD code? Again, who would do that when looking for performance?
4) Memory Subsystem - knowing that the frequencies are different, why were all the results not normalized?
5) They managed to comment that Tulsa and Opteron 2000 series are half the performance of core or Barcelona and hence should not be considered in the first page. But in Linpack page, it is mentioned that Intel chips ate AMD ones for breakfast. Of course, they did - peak of Xeon 5100 series is twice that of Opteron 2000 series. You dont need LINPACK to tell you that. Gives a very biased impression.
6) LinPACK results graph could not be any more wrong. The peak performance of each CPU considered is different ... obviously their sustained performance is going to be different. The author should have at least made the effort to normalize the graph to show the real comparison.
7) Since when is Linpack "Intel friendly"
The author says they didnt have time to optimize code for AMD Opteron ... why would you do a performance study in the first place if you didnt have the methodology right.
I didn't even read beyond LINPACK ... I would be careful reading articles from this author next time, and maybe the whole site ... It's sad to see such an immature article. What's worse is that the majority of people would just see the "fact" that Intel is still faster than AMD.
Over all, a very immature article with false information cleverly hidden behind numbers. or could it be that this article was intended to be biased .... who knows.
JohanAnandtech - Monday, September 10, 2007 - link
What about the bytes/Cycle in each table?
Why is that the "real comparison"? If Intel has a clockspeed advantage, nobody is going to downclock their CPUs to be fair to AMD.
First you claim we are biased. As we disclose that the binary that we ran was compiled with Intel compilers targeting the Core architecture, it is clear that the binary is somewhat Intel friendly.
It is not wrong. It is incomplete and we admit that more than once. But considering AMD gave us only a few days before the NDA was over, it was impossible to cover all angles.
erikejw - Tuesday, September 11, 2007 - link
That is true in the desktop scene, but I am sure you know that servers are about performance/price and performance/W. Prices will decline and we don't know what the price is tomorrow. It is ok to compare against a similarly priced cpu, but a comparison against a same frequency cpu is very interesting too.
Your LINPACK score just seems obscure. Somewhat Intel friendly compiler? LOL. If the compiler is so great, why is the gcc score I read in another review 30% higher with the Barcelona (with a 1.9 GHz CPU)? That is just ridiculous. I thought this review was about architecture and what it can perform, not about which compiler we use. And if it is true that optimizations are turned off in the Intel compiler when it runs on an AMD cpu, then the score is worthless and the comparison is severely biased.
JohanAnandtech - Tuesday, September 11, 2007 - link
Which review? Did they fully disclose the compiler settings?
If the Intel compiler did fool us and turned off optimisations, we will update the numbers.