Virtualization Performance: ESXi 5.1 & vApus FOS Mark 2 (beta)

We introduced our new vApus FOS (For Open Source) benchmark in our review of the Facebook "Open Compute" servers. In a nutshell, it is a mix of four VMs with open source workloads: two phpBB websites (Apache2, MySQL), one OLAP MySQL "Community server 5.1.37" database, and one VM with VMware's open source groupware Zimbra 7.1.0.

As we try to keep our benchmarks up to date, some changes have been made to the original vApus FOS Mark. We've added more realistic workloads and tuned them in accordance with optimizations performed by our industry partners.

With our latest and greatest version (a big thanks to Wannes De Smet), we're able to:

  • Simulate real-world loads
  • Measure throughput, response times, and energy usage for each concurrency level
  • Scale to 80 (logical) core servers and beyond

We have grouped our different workloads into what we call a 'tile'. A tile consists of four VMs, each running a different load:

  • A phpBB forum atop a LAMP stack. The load consists of navigating through the forum, creating new threads, and posting replies. The pages also contain high-resolution pictures, which generate a substantial network load.
  • Zimbra, which is stressed by navigating the site, sending emails, creating appointments, adding and searching contacts, etc.
  • Our very own Drupal-based website. We create new posts, send contact emails, and generate views in this workload.
  • A MySQL database from a news aggregator, loaded with queries from the aggregator for an OLAP workload.

Each VM's hardware configuration is specced to fit its workload's needs. These are the detailed configurations:

Workload  CPUs  Memory (GB)  OS              Software versions
phpBB     2     4            Ubuntu 12.10    Apache 2.2.22, MySQL server 5.5.27
Zimbra    4     4            Ubuntu 12.04.3  Zimbra 8
Drupal    4     10           Ubuntu 12.04.2  Drupal 7.21, Apache 2.2.22, MySQL server 5.5.31
MySQL     16    8            Ubuntu 12.04.2  MySQL server 5.5.31
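To get a feel for what one tile asks of the host, the configuration table can be summed up in a few lines of Python. This is only a sketch: the `VM` class and the `tile_totals` and `footprint` helpers are illustrative names of our own, not part of vApus, and the code assumes that the CPUs column lists virtual CPUs per VM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """One virtual machine of a tile (hypothetical helper type)."""
    workload: str
    vcpus: int       # assumption: the "CPUs" column counts virtual CPUs
    memory_gb: int


# One tile, copied from the configuration table.
TILE = (
    VM("phpBB", 2, 4),
    VM("Zimbra", 4, 4),
    VM("Drupal", 4, 10),
    VM("MySQL", 16, 8),
)


def tile_totals(tile=TILE):
    """Return (total vCPUs, total memory in GB) for a single tile."""
    return sum(vm.vcpus for vm in tile), sum(vm.memory_gb for vm in tile)


def footprint(tiles, tile=TILE):
    """Return (vCPUs, memory in GB) requested by `tiles` copies of a tile."""
    vcpus, memory_gb = tile_totals(tile)
    return tiles * vcpus, tiles * memory_gb


if __name__ == "__main__":
    for count in (1, 2, 3):
        vcpus, memory_gb = footprint(count)
        print(f"{count} tile(s): {vcpus} vCPUs, {memory_gb} GB RAM")
```

A single tile adds up to 26 vCPUs and 26 GB of RAM, so three tiles request 78 vCPUs and 78 GB from the host.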

Depending on the system hardware, we place a number of these tiles on the stressed system to max it out and compare its performance to other servers. Developing a new virtualization benchmark takes a lot of time, but we wanted to give you our first results. Our benchmark is still in beta, so the results are not final yet. We therefore tested only one system, the Intel system, with three different CPUs.

vApusMark FOS 2013 - beta

Intel reports that the Xeon E5-2697 v2 is 30% faster than the Xeon E5-2690 on SPECvirt_sc2010. Our current benchmark is slightly less optimistic, but it is pretty clear that the Ivy Bridge based Xeons are tangibly faster.

We also measured the power needed to run the three tiles of vApusMark FOS 2013 beta. Such a peak load is by no means a realistic scenario, but peak power remains an interesting metric since all CPUs are tested in the same server.

vApusMark FOS 2013 - beta Power Consumption

According to our measurements, the Xeon E5-2697 v2 needs only 85% of the peak power of the Xeon E5-2690. That is a considerable power saving, considering that we also get 22% more throughput. Also note that the virtualization improvements (vApic, VT-d large pages) are not implemented in ESXi 5.1.
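The two figures can be combined into a rough performance-per-watt comparison. The snippet below is a back-of-the-envelope sketch, not part of the benchmark: `perf_per_watt_gain` is a hypothetical helper, and it divides relative throughput by relative peak power, which only approximates efficiency because peak power is not the same as energy consumed over the whole run.

```python
def perf_per_watt_gain(relative_throughput, relative_power):
    """Fractional gain in throughput per watt versus the baseline CPU.

    Both arguments are ratios against the baseline (here the Xeon E5-2690),
    so 1.22 means 22% more throughput and 0.85 means 85% of the peak power.
    """
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_throughput / relative_power - 1.0


if __name__ == "__main__":
    # Figures quoted in the text: +22% throughput at 85% of the peak power.
    gain = perf_per_watt_gain(1.22, 0.85)
    print(f"Throughput per watt of peak power: {gain:+.1%}")  # prints +43.5%
```

Under these assumptions, the E5-2697 v2 delivers roughly 44% more throughput per watt of peak power than the E5-2690.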

70 Comments

  • JohanAnandtech - Friday, September 20, 2013

    I have to admit we are new to SPECjbb 2013. Any suggestions for the JVM tunings to reduce the GC latency?
  • mking21 - Wednesday, September 18, 2013

    Surely its more interesting to see if the 12 core is faster than the 10 and 8 core V2s.
    Its not obvious to me that the 12 Core can out perform the 2687w v2 in real world measures rather than in synthetic benchmarks. The higher sustained turbo clock is really going to be hard to beat.
  • JohanAnandtech - Wednesday, September 18, 2013

    There will be a follow-up, with more energy measurements, and this looks like a very interesting angle too. However, do know that the maximum Turbo does not happen a lot. In case of the 2697v2, we mostly saw 3 GHz, hardly anything more.
  • mking21 - Wednesday, September 18, 2013

    Yes based on bin specs 3Ghz is what I would expect from 2697v2 if more than 6 or more cores are in use. 5 or more cores on 2687wv2 will run @ 3.6Ghz. While 2690v2 will run 3.3Ghz with 4 or more cores. So flat out the 12 core will be faster than 10 core will be faster than 8 core - but in reality hard to run these flat out with real-world tasks, so usually faster clock wins. Look forward to u sharing some comparative benchmarks.
  • psyq321 - Thursday, September 19, 2013

    3 GHz is the maximum all-core turbo for 2697 v2.

    You are probably seeing 3 GHz because several cores are in use and 100% utilized.
  • JohanAnandtech - Friday, September 20, 2013

    With one thread, the CPU ran at 3.4 GHz but only for very brief periods (almost unnoticeable).
  • polyzp - Saturday, September 21, 2013

    AMD's Kaveri IGPU absolutley destroys intel iris 5200! Look at the first benchmarks ever leaked! +500% :O

    AMDFX .blogspot.com
  • Jajo - Tuesday, October 1, 2013

    E5-2697v2 vs. E5-2690 +30% performance @ +50% cores? I am a bit disappointed. Don't get me wrong, I am aware of the 200 Mhz difference and the overall performance per watt ratio is great but I noticed something similar with the last generation (X5690 vs. E5-2690).
    There are still some single threaded applications out there and yes, there is a turbo. But it won't be aggressive on an averagely loaded ESXi server which might host VMs with single threaded applications.
    I somehow do not like this development, my guess is that the Hex- or Octacore CPUs with higher clocks are still a better choice for virtualization in such a scenario.

    Just my 2 cents
  • Chrisrodinis - Wednesday, October 23, 2013

    Here is an easy to understand, hands on video explaining how to upgrade your server by installing an Intel E5 2600 V2 processor: http://www.youtube.com/watch?v=duzrULLtonM
  • DileepB - Thursday, October 31, 2013

    I think 12 core diagram and description are incorrect! The mainstream die is indeed a 10 core die with 25 MB L3 that most skus are derived from. But the second die is actually a 15 core die with 37.5 MB. I am guessing (I know I am right :-))
    That they put half of the 10 core section with its QPIs and memory controllers, 5 cores and 12.5 MB L3 on top and connected the 2 sections using an internal QPI. From the outside it looks like a 15 core part, currently sold as a 12 core part only. A full 15 core sku would require too much power well above the 130W TDP that current platforms are designed for. They might sell the 15 core part to high end HPC customers like Cray! The 12 core sku should have roughly 50% higher die area than the 10 core die!
