Original Link: http://www.anandtech.com/show/6508/the-new-opteron-6300-finally-tested



AMD unveiled their Opteron 6300 series server processors, code name Abu Dhabi, back in November 2012. At that time, no review samples were available. The numbers that AMD presented were somewhat confusing, as the best numbers were produced running the hard-to-assess SPECjbb2005 benchmark; the SPEC CPU2006 benchmarks were rather underwhelming.

According to AMD, the Opteron 6380 (2.5GHz) performed 24% better in SPECjbb2005 than an Opteron 6278 at 2.4GHz, and performance per Watt improved by 40%. In contrast, SPECint_Rate2006 improved by only 8%, and SPECfp_Rate2006 by 7%. However, it is important to note that SPEC CPU2006 rates do not scale well with clockspeed. For example, an 8% clockspeed increase (6380 vs 6376) only results in a 3.5% higher SPECint_Rate2006 and a 3% higher SPECfp_Rate2006. And the SPEC CPU2006 benchmarks were showing the Interlagos Opteron at its best anyway. You can read our analysis here.
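To put the weak clock scaling in perspective, here is a minimal sketch that expresses how much of a clockspeed increase shows up in the SPEC rate scores. The function name is ours and purely illustrative; the percentages are the ones AMD quoted above.

```python
def scaling_efficiency(clock_gain_pct: float, score_gain_pct: float) -> float:
    """Fraction of a clockspeed increase that shows up as a benchmark gain."""
    return score_gain_pct / clock_gain_pct


CLOCK_GAIN_PCT = 8.0  # Opteron 6380 vs 6376, as quoted in the text

int_eff = scaling_efficiency(CLOCK_GAIN_PCT, 3.5)  # SPECint_Rate2006
fp_eff = scaling_efficiency(CLOCK_GAIN_PCT, 3.0)   # SPECfp_Rate2006

print(f"SPECint_Rate2006 scaling efficiency: {int_eff:.2f}")
print(f"SPECfp_Rate2006 scaling efficiency:  {fp_eff:.2f}")
```

In both cases less than half of the extra clockspeed turns into a higher score, which is why an 8% gain over the previous generation is less disappointing than it looks at first sight.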

Both benchmarks have only a distant link to real server workloads, and we could conclude only two things. Firstly, performance per GHz has improved and power consumption has gone down. Secondly, we are only sure that this is the case with well optimized, even completely recompiled code. The compiler settings of SPEC CPU2006, the JVM settings of SPECjbb: it is all code that does not exist on servers which are running real applications.

So is the new Opteron "Abu Dhabi" a few percent faster or is it tangibly faster when running real world code? And are the power consumption gains marginal at best or measurable? Well, most of our benchmarks are real world, so we will find out over the next several pages as we offer our full review of the Opteron 6300.



SKUs and Pricing

Before we start with the benchmarks, we first have to check what you get for your money. Let's compare the AMD chips with Intel's offerings.

AMD vs. Intel 2-socket SKU Comparison

Xeon E5 | Cores/Threads | TDP | Clock (GHz) | Price | Opteron | Modules/Integer cores | TDP | Clock (GHz) | Price
High Performance
2680 | 8/16 | 130W | 2.7/3/3.5 | $1723 | | | | |
2665 | 8/16 | 115W | 2.4/2.8/3.1 | $1440 | 6386 SE | 8/16 | 140W | 2.8/3.2/3.5 | $1392
2660 | 8/16 | 95W | 2.2/ | $1329 | | | | |
2650 | 8/16 | 95W | 2/2.4/2.8 | $1107 | | | | |
Midrange
 | | | | | 6380 | 8/16 | 115W | 2.5/2.8/3.4 | $1088
2640 | 6/12 | 95W | 2.5/2.5/3 | $885 | 6378 | 8/16 | 115W | 2.4/2.7/3.3 | $867
 | | | | | 6376 | 8/16 | 115W | 2.3/2.6/3.2 | $703
2630 | 6/12 | 95W | 2.3/2.3/2.8 | $639 | | | | |
 | | | | | 6348 | 6/12 | 115W | 2.8/3.1/3.4 | $575
2620 | 6/12 | 95W | 2/2/2.5 | $406 | 6234 | 6/12 | 115W | 2.6/2.9/3.2 | $415
High Clock / Budget
2643 | 4/8 | 130W | 3.3/3.3/3.5 | $885 | | | | |
2609 | 4/4 | 80W | 2.4 | $294 | 6320 | 4/8 | 115W | 3.0/3.3/3.6 | $293
2637 | 2/4 | 80W | 3/3.5 | $885 | 6308 | 2/4 | 115W | 3.5 | $501
Power Optimized
2630L | 8/16 | 60W | 2/2/2.5 | $662 | 6366HE | 8/16 | 85W | 1.8/2.3/3.1 | $575

We tested two AMD Opterons: the 6376 and the 6380. The 6380 competes against the octal-core 2GHz 2650, while the 6376 targets the six-core 2630 at 2.3GHz. There is more to it than list prices, of course. At the end of the day, most of us do not buy trays of processors; we buy server systems. As Dell's website is still the easiest to use, we configured very similar systems on the Dell US site. All systems include:

  • Two 500GB SATA drives
  • 64GB of 1600MHz RDIMMs
  • A PERC H700/710 with 512MB of NV RAM
  • iDRAC Express and all other "cheap" options (no OS, Single PSU...)

Below you can find the total price when configuring such a system at the beginning of February 2013.

AMD vs. Intel System Price

Model | CPU | Memory | Other | Price
Dell R720 | Dual Xeon E5-2630 | 8x8GB | Perc H710 512MB NV | $5008
Dell R720 | Dual Xeon E5-2660 | 8x8GB | Perc H710 512MB NV | $6778
Dell R715 | Dual Opteron 6376 | 8x8GB | Perc H700 512MB NV | $4225
Dell R715 | Dual Opteron 6380 | 8x8GB | Perc H700 512MB NV | $5339

The Intel based systems have a small advantage as they have two additional hard disk bays, but that difference can be ignored as that will hardly make the system significantly more expensive. The reason why we upgraded the R720 to an 8-bay chassis is that we wanted all the servers to have 2.5-inch bays and thus similar storage systems; 2.5-inch drives are now more common anyway.

A Dell R715 with dual Opteron 6376 CPUs costs roughly $780 less than a similarly configured Dell R720 with dual Xeon E5-2630 CPUs ($4225 vs $5008), despite the fact that the list price of the Opteron is slightly higher. This might be a result of AMD offering larger discounts, but it's probably also a result of keeping the platform the same. As the Opteron 6100, 6200 and 6300 use the same socket and motherboard infrastructure, validation costs are very low for the OEMs.

If the Opteron 6376 can beat or even match the Xeon E5-2630 in performance/watt, it can offer a cost advantage. If the Opteron 6380 can come close to an E5-2660, it can offer a significant cost advantage. The latter Opteron must, however, clearly defeat the E5-2630 to be attractive to server buyers. After all, most people buy AMD for a cost or performance bonus (preferably both).

We'll compare our new Opterons with two Xeon configurations: the Xeon 2660 and a Xeon 2660 with two cores disabled. To be competitive, the Opteron 6376 should beat the Xeon 2660 with two cores disabled. If the 6380 can offer about 90% of the performance of the 2660 and consume a similar amount of energy, it can become a very attractive alternative as well. So the goals are clear and set for the AMD Opterons. Let us see if they can pull it off.



Benchmark Configuration

Unfortunately, the Intel R2208GZ4GSSPP is a 2U server, which makes it hard to compare it with the 1U Opteron "Interlagos" and 1U "Westmere EP" servers we have tested in the past. We will be showing a few power consumption numbers, but since a direct comparison isn't possible, please take them with a grain of salt.

Intel's Xeon E5 server R2208GZ4GSSPP (2U Chassis)

CPU: Two Intel Xeon processor E5-2660 (2.2GHz, 8c, 20MB L3, 95W)
RAM: 64GB (8x8GB) DDR3-1600 Samsung M393B1K70DH0-CK0
Motherboard: Intel Server Board S2600GZ "Grizzly Pass"
Chipset: Intel C600
BIOS version: SE5C600.86B (01/06/2012)
PSU: Intel 750W DPS-750XB A (80+ Platinum)

The Xeon E5 CPUs have four memory channels per CPU and support DDR3-1600, and thus our dual CPU configuration gets eight DIMMs for maximum bandwidth. The typical BIOS settings can be found below.

Not visible in the above image is that all prefetchers are enabled in all of the tests.

Supermicro A+ Opteron server 1022G-URG (1U Chassis)

CPU: Two AMD Opteron "Abu Dhabi" 6380 at 2.5GHz
     Two AMD Opteron "Abu Dhabi" 6376 at 2.3GHz
     Two AMD Opteron "Bulldozer" 6276 at 2.3GHz
     Two AMD Opteron "Magny-Cours" 6174 at 2.2GHz
RAM: 64GB (8x8GB) DDR3-1600 Samsung M393B1K70DH0-CK0
Motherboard: SuperMicro H8DGU-F
Internal Disks: 2 x Intel MLC SSD710 200GB
Chipset: AMD Chipset SR5670 + SP5100
BIOS version: v2.81 (10/28/2012)
PSU: SuperMicro PWS-704P-1R 750Watt

The same is true for the latest AMD Opterons: eight DDR3-1600 DIMMs for maximum bandwidth. You can check out the BIOS settings of our Opteron server below.

C6 is enabled, TurboCore (CPB mode) is on.

ASUS RS700-E6/RS4 1U Server

CPU: Two Intel Xeon X5670 at 2.93GHz (6 cores)
     Two Intel Xeon X5650 at 2.66GHz (6 cores)
RAM: 48GB (12x4GB) Kingston DDR3-1333 FB372D3D4P13C9ED1
Motherboard: ASUS Z8PS-D12-1U
Chipset: Intel 5520
BIOS version: 1102 (08/25/2011)
PSU: 770W Delta Electronics DPS-770AB

To speed up benchmarking, we tested the Intel Xeon and AMD Opteron systems in parallel. As we didn't have more than eight 8GB DIMMs, we used our 4GB DDR3-1333 DIMMs. The Xeon system only gets 48GB, but this isn't a disadvantage as our highest memory footprint benchmark (vApus FOS, 5 tiles) uses no more than 40GB of RAM. There is no real alternative as our Xeon has three memory channels and cannot be outfitted with the same amount of RAM as our Opteron 6300 or Xeon E5 system (four channels).

Common Storage System

For the virtualization tests, each server gets an Adaptec 5085 PCIe x8 card (driver aacraid v1.1-5.1[2459] b 469512) connected to six Cheetah 300GB 15000 RPM SAS disks (RAID-0) inside a Promise JBOD J300.

Software Configuration

All vApus testing is done on VMware vSphere 5 (ESXi 5.1). All vmdks use thick provisioning and are independent and persistent. The power policy is "Balanced Power" unless otherwise indicated. All other testing is done on Windows Server 2008 R2 Enterprise SP1. Unless noted otherwise, we use the "High Performance" setting on Windows 2008 R2 SP1.

Other Notes

Both servers are fed by a standard European 230V (16 Amps max.) powerline. The room temperature is monitored and kept at 23°C by our Airwell CRACs. We use the Racktivity ES1008 Energy Switch PDU to measure power consumption. Using a PDU for accurate power measurements might seem pretty insane, but this is not your average PDU. Measurement circuits of most PDUs assume that the incoming AC is a perfect sine wave, but it never is. The Racktivity PDU, however, measures true RMS current and voltage at a very high sample rate: up to 20,000 measurements per second for the complete PDU.
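To illustrate why the sine-wave assumption matters, here is a minimal sketch that compares a true RMS calculation with an average-responding estimate that is calibrated for a pure sine. This is not how the Racktivity PDU works internally; the waveform (a 30% in-phase third harmonic), the 400 samples per cycle, and all function names are our own assumptions, chosen only to make the effect visible.

```python
import math

# Assumption: 400 samples per 50 Hz mains cycle (20,000 samples/s on one channel).
SAMPLES_PER_CYCLE = 400


def sample_cycle(third_harmonic: float) -> list[float]:
    """One cycle of a unit sine plus an in-phase third harmonic."""
    step = 2 * math.pi / SAMPLES_PER_CYCLE
    return [
        math.sin(k * step) + third_harmonic * math.sin(3 * k * step)
        for k in range(SAMPLES_PER_CYCLE)
    ]


def true_rms(samples: list[float]) -> float:
    """Root of the mean of the squared samples: valid for any waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine_assumed_rms(samples: list[float]) -> float:
    """Mean absolute value scaled by the sine form factor pi / (2 * sqrt(2))."""
    form_factor = math.pi / (2 * math.sqrt(2))
    return form_factor * sum(abs(s) for s in samples) / len(samples)


clean = sample_cycle(0.0)
distorted = sample_cycle(0.3)

for name, wave in (("pure sine", clean), ("distorted", distorted)):
    print(f"{name}: true RMS {true_rms(wave):.4f}, "
          f"sine-assumed {sine_assumed_rms(wave):.4f}")
```

For the pure sine both methods agree, while for the distorted waveform the sine-assumed estimate overshoots the true RMS value by several percent, which is the kind of error a true RMS meter avoids.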



Virtualization Performance: Linux VMs on ESXi

We introduced our new vApus FOS (For Open Source) server workloads in our review of the Facebook "Open Compute" servers. In a nutshell, it is a mix of four VMs with open source workloads: two phpBB websites (Apache2, MySQL), one OLAP MySQL "Community server 5.1.37" database, and one VM with VMware's open source groupware Zimbra 7.1.0. Zimbra is quite a complex application as it contains the following components:

  • Jetty, the web application server
  • Postfix, an open source mail transfer agent
  • OpenLDAP software, user authentication
  • MySQL, the database
  • Lucene, a full-featured text search engine
  • ClamAV, an anti-virus scanner
  • SpamAssassin, a mail filter
  • James/Sieve filtering (mail)

All VMs are based on a minimal CentOS 6 setup with VMware Tools installed. All our current virtualization testing is on top of the hypervisor which we know best: ESXi (5.0). We have changed two things in our vApusMark FOS setup: we upgraded the guest OS from CentOS 5.6 to 6.0 and increased the number of vCPUs of the OLAP VM from 2 to 4. This small upgrade means that our latest results should not be compared to the results in our older articles. We test with four tiles (one tile = four VMs). Each tile needs nine vCPUs, so the test requires 36 vCPUs.
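The vCPU arithmetic above also tells us how heavily each host is oversubscribed. The sketch below is only an illustration: the thread counts are derived from the core/thread figures in our SKU table and benchmark configuration, and the dictionary keys and function name are ours.

```python
TILES = 4
VCPUS_PER_TILE = 9
required_vcpus = TILES * VCPUS_PER_TILE

# Hardware threads per dual-socket server (integer cores for the Opterons,
# Hyper-Threading threads for the Xeons).
hardware_threads = {
    "2x Opteron 6380 or 6376": 32,
    "2x Xeon E5-2660": 32,
    "2x Xeon X5670 or X5650": 24,
}


def oversubscription(vcpus: int, threads: int) -> float:
    """Ratio of requested vCPUs to available hardware threads."""
    return vcpus / threads


for server, threads in hardware_threads.items():
    ratio = oversubscription(required_vcpus, threads)
    print(f"{server}: {required_vcpus} vCPUs on {threads} threads = {ratio:.3f}x")
```

With 36 vCPUs, every server in the test has slightly more vCPUs than hardware threads, so the hypervisor's scheduler is always part of the picture.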

vApusMark FOS

For being just a minor update, the new Piledriver core does pretty well. Clock for clock performance goes up by 11%. The total performance gain (IPC+clock) is 20%, which is significant. The Opteron 6376 performs only 4% better than its direct competitor the E5-2630 (as the latter will perform very similar to our E5-2660 with 6 cores), but that is not bad at all: you get slightly better performance for a lower (server) price.
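The split between the clock-for-clock gain and the total gain can be checked with a quick calculation. This is a minimal sketch that assumes the two effects simply multiply; the function name is ours, and the base clocks are those of the Opteron 6276 (2.3GHz) and the 6380 (2.5GHz).

```python
def total_gain(per_clock_gain: float, old_ghz: float, new_ghz: float) -> float:
    """Combine a clock-for-clock gain with a clockspeed increase (multiplicative)."""
    return (1 + per_clock_gain) * (new_ghz / old_ghz) - 1


# 11% clock-for-clock gain, 2.3GHz (6276) to 2.5GHz (6380)
gain = total_gain(0.11, 2.3, 2.5)
print(f"Estimated total gain: {gain:.1%}")
```

The result lands close to the roughly 20% total gain that we measured, so the numbers are consistent with each other.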

The top of the line 6380 cannot keep up with the best Xeons. Offering 86% of the performance of the more expensive Xeon E5-2660 is hardly a disaster, however. Note that "maximum amount of affordable memory" tops the shopping list for many virtualization hosts, followed by price/performance. For those buyers, considering that a server based upon the Opteron costs less, the Opteron is once again a potent virtualization host if the power usage is similar.

Lacking C-states, the Opteron 6174 did pretty poorly. The Opteron 6276 consumed a lot less at idle than its predecessor, but consumed a lot more when pressured to perform at high load. So we were very keen to learn whether AMD has improved power consumption too. Did AMD finally get that part right?



Measuring Real-World Power Consumption

The Equal Workload (EWL) version of vApus FOS is very similar to our previous vApus Mark II "Real-world Power" test. To create a real-world "equal workload" scenario, we throttle the number of users in each VM to a point where you typically get somewhere between 20% and 80% CPU load on a modern dual CPU server. The number of requests is the same for each system, hence "equal workload".

The CPU load is typically around 30-50%, with peaks up to 65% (for more info see here). At the end of the test, we get to a low 10%, which is ideal for the machine to boost to higher CPU clocks (Turbo) and race to idle. We use the "Balanced" power policy and enable C-states as the current ESXi settings make poor use of the C6 capabilities of the latest Opterons and Xeons.

vApus FOS EWL Power consumption

We cannot say "mission accomplished", but AMD has made significant progress. 12% to 20% better performance while decreasing the power consumption by 6% to 8% is pretty good. The 95W TDP Xeons are still the performance per Watt champs though. Still, it looks like the Opteron is a decent alternative for some. Power consumption is about 12-13% higher (6376 vs E5-2660), but the performance per dollar is slightly better.
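As a rough illustration, the performance and power figures above can be combined into a performance-per-watt gain over the previous generation. This is a sketch under stated assumptions: we simply pair the low ends and the high ends of the two ranges, which is not necessarily how the individual SKUs line up, and the function name is ours.

```python
def perf_per_watt_gain(perf_gain: float, power_change: float) -> float:
    """Relative change in performance per watt.

    perf_gain and power_change are fractions, e.g. 0.12 and -0.06.
    """
    return (1 + perf_gain) / (1 + power_change) - 1


# Assumed pairing of the range endpoints quoted in the text.
low = perf_per_watt_gain(0.12, -0.06)
high = perf_per_watt_gain(0.20, -0.08)
print(f"Performance per watt gain: {low:.1%} to {high:.1%}")
```

Even the conservative end of that range is a sizeable step for what is a minor core update.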



SAP S&D

The SAP S&D 2-Tier benchmark has always been one of my favorites. This is probably the most real world benchmark of all server benchmarks done by the vendors. It is a full blown application living on top of a heavy relational database. And don't forget that SAP is one of the most successful software companies out there, the undisputed market leader of Enterprise Resource Planning.

SAP is an application that misses the L2 cache much more than most applications out there, with the exception of some exotic HPC apps. We made an in-depth profile of SAP S&D, but here is the summary:

  • The application has very low instruction level parallelism (ILP) and as a result is not taxing the integer units much (IPC = 0.3-0.55, versus more than 1 for SPECint 2006).
  • SAP misses the L2 cache much more than most applications out there (4 to 10 times more than SPECint2006 apps)
  • The application has a relatively large but "prefetchable" instruction footprint, which allows the prefetchers to reduce the instruction related cache misses
  • The application has a massive and random data footprint, putting great pressure on the load subsystem. As a result the out of order engine has to hide the latency the best it can, and large ROB and load buffers help a lot. The latency of the memory subsystem matters.

SAP Sales & Distribution 2 Tier benchmark

The new Opteron does not boost SAP performance. A 6% clock increase translates into a 5% performance increase. As we discussed previously, SAP is one of the few complex server applications where the "Interlagos" Opteron performs a lot better than its predecessor. The application does not seem to benefit from any of the small improvements that the Piledriver core offers. Or maybe HP's benchmark team did not spend much time on this particular benchmark. Since the HP score is the only Interlagos score available, we have no other option than to wonder which of the two options is the closest to the truth.

Not that it matters much: the best SAP servers are Xeon E5 based. In this market of expensive consulting and software, $500 in savings on hardware is peanuts. So people tend to go for the best performance, and the Xeon E5s are clearly better at delivering raw SAP performance.



Java Server Performance

The SPECjbb®2013 benchmark is based on a "usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations". It uses the latest Java 7 features, makes use of XML, compressed communication and messaging with security.

Benchmark architecture diagram

We tested with four groups of transaction injectors and backends. We applied a relatively basic tuning to mimic real world use.

"-server -Xmx4G -Xms4G -Xmn1G -XX:+AggressiveOpts -XX:+UseLargePages"

With these settings, the benchmark takes about 40GB of RAM.

SPECjbb2013 max-jOPS

Since SPECjbb®2013 is very new, we will research the benchmark in more detail later. The first results are very interesting though. Notice how one Opteron 6380 edges out the Xeon 2660. Once we double the number of CPUs, the Xeon outperforms the best Opteron by 17%. The fact that each Opteron processor is a dual NUMA node is not helping the Opteron. It is clear that the single die or "native octal-core" approach scales better here (for now).

SPECjbb®2013 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).

 



Rendering Performance: Cinebench

Cinebench, based on MAXON's CINEMA 4D software, is probably one of the most popular benchmarks around as it is pretty easy to perform this benchmark on your own home machine. The benchmark supports 64 threads, more than enough for our 24- and 32-thread test servers. First we test single-threaded performance, to evaluate the performance of each core.

Cinebench 11.5 Single threaded

Cinebench achieves an IPC between 1.4 and 1.8 and is mostly dominated by SSE2 code. The new Opteron is clock-for-clock about 3% more efficient. Let's check out the multi-threaded score.

Cinebench R11.5

The Opteron 6300 is about 6% faster than its predecessor at the same clockspeed. People in the rendering market tend to go for the best and still affordable performance. A few hundred dollars more can easily be recouped if your rendering is finished earlier. This remains Intel territory.



3DS MAX 2013

Our previous benchmark, the "architecture" scene that is included in the SPEC APC 3DS Max 2007 test, was getting way too old. So we decided to switch to the "Trol_cleric29_max2010" scene while upgrading to 3DS MAX 2013. We render at 1080p (1920x1080) resolution and measure the time it takes to render the first three frames (from 0 to 2). The 64-bit version of 3DS Max 2013 runs on top of 64-bit Windows 2008 R2 SP1. All results are reported as rendered images per hour; higher is thus better.
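For readers who want to reproduce the test, converting the measured render time into our "rendered images per hour" metric is straightforward. The sketch below is only an illustration: the function name is ours and the 540-second render time is a hypothetical value, not one of our results.

```python
FRAMES_RENDERED = 3  # frames 0 to 2 of the scene


def images_per_hour(render_seconds: float, frames: int = FRAMES_RENDERED) -> float:
    """Convert the time needed to render `frames` images into images per hour."""
    return frames * 3600 / render_seconds


# Hypothetical example: three frames rendered in 540 seconds.
print(images_per_hour(540))
```

Because the metric is the inverse of render time, a server that finishes in half the time scores exactly twice as high.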

3DS Max 2013

The results are pretty chaotic at first sight. But the numbers are correct and can be verified by a third party or by yourself for that matter. Let us try to make sense out of this.

First of all, we used the NVIDIA Mental Ray renderer, which despite the "NVIDIA" part in its name is still a CPU-only renderer. Secondly, the new benchmark is better than the old one: most of the time all cores are working at very high CPU load, typically 96% or more. However, we noticed that without Hyper-Threading and CMT, the CPUs are able to turbo longer and at higher clockspeeds, and there are small periods of single-threaded action. These two facts together probably explain why disabling Hyper-Threading or CMT improves performance by 20% and more.

Cinebench reports that the Xeon 2660 is 20% faster than the Opteron 6380. In 3DS Max, the Xeon is up to 77% faster. The new Mental Ray engine seems to be extremely well optimized for the Intel architectures and underperforms on the AMD architecture.

At the end of the day, it is clear that Intel has a huge advantage here, but also that this market is shifting more and more to GPU rendering. This is out of the scope of this article, but many people in the rendering business are using GPU accelerated rendering thanks to NVIDIA's iray renderer. CPU + GPU rendering with iray seems to outperform Mental Ray in almost all scenes except those with relatively simple lighting, so combining an Intel E5 Xeon with a fast GPU is the best option.



LS-DYNA

LS-DYNA is a "general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems", developed by the Livermore Software Technology Corporation (LSTC). It is used by the automobile, aerospace, construction, military, manufacturing and bioengineering industries. Even simple simulations take hours to complete, so even a small performance increase results in tangible savings. Add to this the fact that many of our readers have been asking that we perform some benchmarking with HPC workloads and we have reasons enough to include our own LS-DYNA benchmarking.

These numbers are not directly comparable with AMD's and Intel's benchmarks as we did not perform any special tuning besides using the message passing interface (MPI) version of LS-DYNA (ls971_mpp_hpmpi) to run the LS-DYNA solver to get maximum scalability. This is the HP-MPI version of LS-DYNA 9.71. Our first test is a refined and revised Neon crash test simulation.

LS-Dyna Neon-Refined Revised

The second test is the "Three Vehicle Collision Test" simulation, which runs a lot longer.

LS-Dyna Three Vehicle Collision Test

Both tests paint a similar picture. The new Opteron 6376 is 5% to 7% faster than the Opteron 6276. The best AMD Opteron (6380) is about 16% faster than the previous one (6276). Not bad at all, but HPC buyers are typically categorized as either going after top performance or searching for the best performance per dollar.

The first category will go after the best Xeon E5s like the Xeon E5-2690 or the 2670 (2.6GHz, 115W) if the former's power usage is too high to fit in the dense server chassis. The second category can get 10% higher performance (E5-2660 vs 6380) for a few hundred dollars more. It is close, but it is probably not convincing enough to go for AMD. Most professional buyers need a bigger incentive before they will choose the underdog over the market leader.

HPC people are less concerned about energy consumption, but even HPC data centers run into cooling and energy supply limitations. Next stop, high performance energy consumption measurements.



LS-DYNA Power Consumption

For HPC buyers, peak power tends to be a very important metric. As HPC systems are run at close to or equal to 100% CPU load, the energy consumption is at its peak for a long time. Peak power thus also determines the cooling and energy requirements. This is in sharp contrast with most other servers, where calculating the power and amps based on the peak load of a complete rack is considered wasteful as it is highly unlikely that all servers will hit 100% CPU load at the same time. We took the 95th percentile of our power numbers.
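For those who want to process their own power logs the same way, here is a minimal sketch of a 95th percentile calculation. It uses the nearest-rank method, which is an assumption on our part and only one of several common percentile definitions. The sample values are hypothetical and are not measurements from this review.

```python
def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


# Hypothetical wall-power samples in watts, with one short spike.
samples = [402, 405, 398, 410, 407, 404, 455, 403, 406, 409,
           401, 408, 400, 405, 404, 407, 403, 406, 402, 405]

p95 = percentile_nearest_rank(samples, 95)
print(f"95th percentile: {p95} W, absolute maximum: {max(samples)} W")
```

The point of using the 95th percentile instead of the absolute maximum is that a single short spike does not end up defining the peak power of the server.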

LS-DYNA Peak Power consumption

Note that the Xeon E5 numbers are not directly comparable to the Opteron numbers as the CPUs are tested in servers with different form factors. We will tackle that in the next test. Let us focus on the Opteron results for now.

AMD has made some real progress here. At the same clock, the total power consumption is 6% lower. Even at a 200MHz higher clock the peak power is very slightly, but consistently, lower (2%).

Of course, we also want to compare the AMD and Intel CPUs directly. To do this, we always run the fans at maximum speed. That way, the fans always consume the same amount of power. We then test with one and two CPUs, while keeping the amount of memory (64GB) the same. This way we measure how much extra power you consume at the wall when you add a second CPU. This number thus includes the voltage regulators (which can amount to up to 10% of the total server power) and the PSU inefficiency.

LS-DYNA Peak CPU Power consumption

The Intel Xeon has a TDP of 95W, but even with a very FP intensive application it does not get anywhere near that number. About 75W out of those 94W are consumed by the CPU, as measured by our Hardware Monitoring Software that reads out the MSRs. We are still working on our version for the AMD platform (AMD's documentation is a bit late), but we estimate that the Opteron 6376 consumes about 110W and the Opteron 6380 needs about 120W. That means that AMD's top CPUs are probably consuming a bit more than their TDP indicates if you push the FP unit hard.
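The gap between what the second CPU adds at the wall and what the CPU itself consumes gives a feel for the conversion losses. The sketch below assumes that the 94W figure is the extra power measured at the wall for the second Xeon and that the 75W figure is the package power read from the MSRs; the variable names are ours.

```python
WALL_DELTA_W = 94.0   # assumed: extra wall power when adding the second Xeon
CPU_PACKAGE_W = 75.0  # package power reported via the MSRs

overhead_w = WALL_DELTA_W - CPU_PACKAGE_W
overhead_share = overhead_w / WALL_DELTA_W

print(f"Overhead outside the CPU package: {overhead_w:.0f} W "
      f"({overhead_share:.0%} of the wall delta)")
```

Under these assumptions, about a fifth of the extra wall power never reaches the CPU and is instead lost in the voltage regulators and the PSU, or consumed by other components that become active with the second socket.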

We also tried to measure idle power. Take the numbers with a grain of salt, but we measured about 19-20W for the Opteron 6380 (p-states disabled), 17-18W for the Opteron 6376 and 16-17W for the Xeon.



TrueCrypt 7.1 Benchmark

TrueCrypt is a software application used for on-the-fly encryption (OTFE). It is free, open source and offers full AES-NI support. The application also features a built-in encryption benchmark that we can use to measure CPU performance. First we test with the AES algorithm (256-bit key, symmetric).

TrueCrypt AES

We also test with the heaviest combination of the cascaded algorithms available: Serpent-Twofish-AES.

TrueCrypt AES-Twofish-Serpent

Intel has the fastest implementation when you are using a simple AES encryption or decryption; the Xeon 2660 is more than 30% faster than the best Opterons. This aspect of the architecture has not been improved in the "Piledriver" core.

However, when we apply the slowest encryption algorithms, Twofish and Serpent, the Piledriver core is no less than 4% to 6% faster than the older Bulldozer core clock per clock. The total speedup (6380 vs 6276) is no less than 13% to 15%.

We are no experts in encryption/decryption algorithms, so we cannot explain these differences. However, it is important to realize that these benchmarks are synthetic. In the best case, encryption will determine a few tens of percent of the total performance of a website.



7-Zip 9.2

7-Zip is a file archiver with a high compression ratio. 7-Zip is open source software, with most of the source code available under the GNU LGPL license. The benchmark uses the LZMA method.

7-zip

LZMA compression speed depends more on memory latency. The Xeon has a lower memory latency thanks to its fast caches. But that does not explain why the Opteron does well in decompressing but completely falls behind in the compression part of the benchmark. We tried something new.

7-zip, Dual vs Single CPU

The Xeon E5 is indeed better at compression than the Opteron 6300, but not by as much as the previous benchmark would make us believe: the difference is about 19%. The Opteron 6380 is 3% faster in decompression. It appears 7-zip doesn't scale well at all on AMD with additional CPU sockets.

So yes, the new and improved Sandy Bridge branch predictor is probably the reason why the Xeon E5 handles compression a lot more efficiently (clock for clock about 35%). But it is not the main reason. Add another Xeon, and your compression performance improves by 78%. Add another Opteron and you get a meagre 12% improvement! Our best guess is that 7-zip does not handle the fact that the Opteron is a quad NUMA node system very well. We have seen similar behavior in the Euler3D HPC test.
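The difference in socket scaling is easier to compare when expressed as parallel efficiency, where 100% would mean that the second CPU doubles compression throughput. This is a minimal sketch using the two percentages above; the function name is ours.

```python
def parallel_efficiency(gain_from_second_cpu: float, sockets: int = 2) -> float:
    """Speedup relative to one socket, divided by the number of sockets."""
    speedup = 1 + gain_from_second_cpu
    return speedup / sockets


xeon_eff = parallel_efficiency(0.78)     # +78% from the second Xeon
opteron_eff = parallel_efficiency(0.12)  # +12% from the second Opteron

print(f"Xeon E5: {xeon_eff:.0%}, Opteron 6300: {opteron_eff:.0%}")
```

An efficiency only slightly above 50% means the second Opteron contributes very little extra compression throughput in this benchmark.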



Conclusions

For those that prioritize performance/watt or performance/dollar and for the CPU enthusiasts, we've summarized our findings in a comparison table. We made four columns for easy comparison:

  • In the first column, we compare the fastest Opteron with Intel's best offering. The closer the AMD Opteron can get to the E5-2660, the more its price advantage can compensate for the higher power usage of the Opteron.
  • In the second column, we compare the Opteron with the best performance per dollar ratio with a comparably priced Xeon.
  • In the third column we measure how much progress AMD has made by replacing the Bulldozer core with the Piledriver core (higher IPC and clock).
  • The fourth column gives you an idea of how much the small changes inside the Piledriver have improved the IPC.

We also group our benchmarks in different software groups and indicate the importance of this software group in the server market (we discussed this here). 100% means that both CPUs perform equally.

Software: importance in the market | Opteron 6380 vs Xeon E5-2660 | Opteron 6376 vs Xeon E5-2630 | Opteron 6380 vs Opteron 6276 | Opteron 6376 vs Opteron 6276
Virtualisation: 20-50%
ESXi + Linux | 86% | 104% | 120% | 111%
OLTP, ERP: 10%
SAP S&D 2-tier | 95%** | N/A | 105%* | 100%*
HPC: 5-7%
LS Dyna | 92% | 97% | 116% | 105%
Back-end webserver: 10-15%
SPECjbb2013 | 85% | N/A | N/A | N/A
Rendering software: 2-3%
Cinebench | 84% | 98% | 115% | 106%
3DS Max 2013 (Mental Ray) | 56% | 66% | 143% | 126%
Other: N/A
Encryption / decryption, AES | 71% / 77% | 94% / 96% | 101% / 101% | 100% / 100%
Encryption / decryption, Twofish/Serpent | 113% / 108% | 132% / 128% | 115% / 113% | 107% / 103%
Compression / decompression | 100% / 53% | 118% / 60% | 113% / 108% | 105% / 100%

* estimate
** Rough estimate

After reviewing the Xeon-E5 we concluded:

"...it will be hard to recommend the current Opteron 6200. The Opteron 6200 might still have a chance as a low end virtualization server. After all, quite a few virtualization servers are bottlenecked by memory capacity and not by raw processing power. The Opteron can then leverage the fact that it can offer the same memory capacity at a lower price point. The Opteron might also have a role in the low end, price sensitive HPC market, where it still performs very well. Whether you want high performance per dollar or performance per watt, the Xeon E5-2660 is simply a home run. End of story."

To sum it up, the Xeon E5 was the best choice for most applications, as the Opteron 6200 could only leverage its price advantage in the low end virtualization and HPC market. But the lower acquisition costs were easily negated by the higher power draw and the fact that in most IT projects a few hundred dollars per server does not matter.

The new Opteron 6376 offers 5% to 11% better performance per clock, 8% lower energy consumption, 6% lower peak power draw, and an 11% lower price than the Opteron 6276. That's all good, but there is more. Keeping the G34 platform alive has a very positive effect on the OEM pricing: the Opteron servers are tangibly cheaper. The price difference is quite a bit higher than the CPU list prices suggest. You can get a 6380 based server for the price of a Xeon E5-2640 based server.

All these small steps forward make the AMD Opteron attractive again for the price conscious buyers looking for a virtualization host or an HPC crunching machine. The Opteron machines need more energy to do their job, but once again you get better performance per dollar than Intel's midrange offerings.
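To put a number on the performance per dollar claim, here is a minimal sketch that combines the Dell system prices from our pricing table with the relative vApus FOS scores from the summary table. It only covers the virtualization test, it ignores power costs, and the E5-2630 score was approximated with an E5-2660 with two cores disabled, so treat the outcome as a rough indication. The names in the code are ours.

```python
# Dell system prices in USD (early February 2013).
system_price_usd = {
    "Opteron 6376": 4225,
    "Opteron 6380": 5339,
    "Xeon E5-2630": 5008,
    "Xeon E5-2660": 6778,
}

# Relative vApus FOS performance of the Opteron vs the Xeon it is compared with.
relative_perf = {
    ("Opteron 6376", "Xeon E5-2630"): 1.04,
    ("Opteron 6380", "Xeon E5-2660"): 0.86,
}


def perf_per_dollar_ratio(amd: str, intel: str) -> float:
    """Opteron performance per dollar relative to the Xeon (1.0 = equal)."""
    perf = relative_perf[(amd, intel)]
    return perf * system_price_usd[intel] / system_price_usd[amd]


for amd, intel in relative_perf:
    ratio = perf_per_dollar_ratio(amd, intel)
    print(f"{amd} vs {intel}: {ratio:.2f}x performance per dollar")
```

In both comparisons the Opteron system comes out ahead on performance per dollar in this workload, with the larger advantage for the 6376 against the E5-2630.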

However, if your consulting or software costs are a lot higher than the hardware costs, the octal core Xeons offer an excellent performance/watt ratio and are by far the best performers too. In a nutshell, Intel's octal core Xeons are still unmatched, but AMD is putting some pressure on Intel's hex-core midrange offerings, and that is always good news for the customers.
