51 Comments

  • blue_falcon - Tuesday, August 10, 2010 - link

    The R715 is an AMD box. Reply
  • webdev511 - Tuesday, August 10, 2010 - link

    Yes, and the R715 has 2x AMD Opteron™ 6176SE, 2.3GHz with 12 cores per socket with an approx price of $8,000 Reply
  • fic2 - Tuesday, August 10, 2010 - link

    4. Part of the Anandtech 13 year anniversary giveaway?!! ;o) Reply
  • mino - Wednesday, August 11, 2010 - link

    Big Thanks for that ! Reply
  • Etern205 - Tuesday, August 10, 2010 - link

    *stares at cpu graph*
    ~Drrroooollllliiiieeeeeeee~~~~
    Reply
  • yuhong - Tuesday, August 10, 2010 - link

    The incorrect references to Xeon 7200 should be Xeon 7100.
    "Other reasons include the fact that some decision makers never really bothered to read the benchmarks carefully"
    You didn't even need to do that. Knowing the difference between NetBurst vs Core 2 vs Nehalem would have made it obvious.
    Reply
  • ELC - Tuesday, August 10, 2010 - link

    Isn't the price of software licenses a major factor in the choice of optimum server size? Reply
  • webdev511 - Tuesday, August 10, 2010 - link

    So does the NUMA barrier.

    I'd go for fewer sockets with more cores any day of the week, and as a result Intel = second string.
    Reply
  • Ratman6161 - Wednesday, August 11, 2010 - link

    For the software licensing reasons I mentioned above, there is a distinct advantage to fewer sockets with more cores. Reply
  • davegraham - Wednesday, August 11, 2010 - link

    so NUMA is an interesting one. Intel's QPI bus is actually quite good and worth spending some time to get to know.

    dave
    Reply
  • Ratman6161 - Wednesday, August 11, 2010 - link

    Many products license on a per CPU basis. For Microsoft anyway, what they actually count is the number of sockets. For example SQL Server Enterprise retails for $25K per CPU. So an old 4 socket system with single cores would be 4 x $25K = $100K. A quad socket system with quad core CPUs would be a total of 16 cores but the pricing would still be 4 sockets x $25K = 100K. It used to be that Oracle had a complex formula for figuring this but I think they have now also gone to the simpler method of just counting sockets (though their enterprise edition is $47.5K).

    If you are using VMWare, they also charge per socket (last I knew) so two dual socket systems would cost the same as a single 4 socket system. Thing is though you need to have at least two boxes in order to enable the high availability (i.e. automatic failover) functionality.
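The per-socket arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the price is the retail figure quoted in this comment, and real licensing terms vary by vendor and edition.

```python
# Sketch of per-socket licensing, using the figures quoted in the comment.
# Function names are hypothetical; real licensing terms vary.

def license_cost(sockets: int, price_per_socket: int) -> int:
    """Total cost when the vendor counts sockets, regardless of core count."""
    return sockets * price_per_socket


def cost_per_core(sockets: int, cores_per_socket: int, price_per_socket: int) -> float:
    """Effective license cost per core for the same box."""
    total_cores = sockets * cores_per_socket
    return license_cost(sockets, price_per_socket) / total_cores


SQL_ENTERPRISE = 25_000  # USD per socket, as quoted above

old_quad = license_cost(4, SQL_ENTERPRISE)   # 4 sockets, single-core CPUs
new_quad = license_cost(4, SQL_ENTERPRISE)   # 4 sockets, quad-core CPUs
print(old_quad, new_quad)                    # same bill: 100000 100000

print(cost_per_core(4, 1, SQL_ENTERPRISE))   # 25000.0 per core
print(cost_per_core(4, 4, SQL_ENTERPRISE))   # 6250.0 per core

# Two dual-socket hosts license the same number of sockets as one quad-socket host.
print(2 * license_cost(2, SQL_ENTERPRISE) == license_cost(4, SQL_ENTERPRISE))  # True
```

The bill depends only on the socket count, so packing more cores into each socket lowers the effective cost per core.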
    Reply
  • Stuka87 - Wednesday, August 11, 2010 - link

    For VMWare they have a few pricing structures. You can be charged per physical socket, or you can get an unlimited socket license (which is what we have, running on seven R910s). You just need to figure out if you really need the top tier license. Reply
  • semo - Tuesday, August 10, 2010 - link

    "Did I mention that there is more than 72GHz of computing power in there?"

    Is this ebay?
    Reply
  • Devo2007 - Tuesday, August 10, 2010 - link

    I was going to comment on the same thing.

    1) A dual core 2GHz CPU does not equal "4GHz of computing power" - unless somehow you were achieving an exact doubling of performance (which is extremely rare if it exists at all).

    2) Even if there was a workload that did show a full doubling of performance, performance isn't measured in MHz & GHz. A dual-core 2GHz Intel processor does not perform the same as a 2GHz AMD CPU.

    More proof that the quality of content on AT is dropping. :(
    Reply
  • mino - Wednesday, August 11, 2010 - link

    You seem to know very little about the (40yrs old!) virtualization market.
    It flourishes from *commoditising* processing power.

    While clearly meant as a joke, that statement of Johan's is much closer to the truth than most market "research" reports on x86.
    Reply
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    Exactly. ESX resource management lets you reserve CPU power in GHz. So for ESX, two 2.26 GHz cores are indeed a 4.5 GHz resource. Reply
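As a rough illustration of the pooled-GHz idea, here is a minimal Python sketch. The helper names are hypothetical and are not part of any VMware API, and the 32-core figure is an assumption used only to reproduce the "more than 72GHz" number; it is not stated in these comments.

```python
# Sketch of how per-core clock speed adds up to a pooled "GHz" resource.
# aggregate_ghz and fits are hypothetical helpers, not part of any VMware API.

def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Pooled CPU capacity in GHz: core count times per-core clock."""
    return cores * clock_ghz


def fits(reservations_ghz: list[float], cores: int, clock_ghz: float) -> bool:
    """True if the summed reservations do not exceed the pooled capacity."""
    return sum(reservations_ghz) <= aggregate_ghz(cores, clock_ghz)


print(round(aggregate_ghz(2, 2.26), 2))    # 4.52, the "4.5 GHz resource"
print(round(aggregate_ghz(32, 2.26), 2))   # 72.32, assuming 32 cores
print(fits([2.0, 1.5, 0.5], 2, 2.26))      # True: 4.0 GHz reserved of 4.52
print(fits([2.0, 2.0, 1.0], 2, 2.26))      # False: 5.0 GHz exceeds 4.52
```

The pooled figure is a scheduling unit, not a speed: a single thread still runs on one core at that core's clock.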
  • duploxxx - Thursday, August 12, 2010 - link

    Sure, you can count resources together as much as you want... virtually. But in the end a single process is still only able to use the max GHz a single CPU can offer, though a faster CPU can finish the request sooner. That is exactly why Nehalem and gulf still hold up against the huge core count of Magny-Cours. Reply
  • maeveth - Tuesday, August 10, 2010 - link

    I have nothing at all against AnandTech's recent articles on virtualization; however, so far all of them have only looked at virtualization from a compute density point of view.

    I currently am the administrator of a VMware environment used for development work, and I run into I/O bottlenecks FAR before I ever run into a compute bottleneck. In fact computational power is pretty much the LAST bottleneck I run into. My environment currently holds just short of 300 VMs; the OS varies. We peak at approximately 10-12K IOPS.

    From my experience you always have to look at potential performance in a virtual environment from a much larger perspective. Every bottleneck affects others in subtle ways. For example, if you have a memory bottleneck, either host or guest based, you will further impact your I/O subsystem, though you should aim to not have to swap. In my opinion your storage backend is the single most important factor when determining large-scale-out performance in a virtualized environment.

    My environment has never once run into a CPU bottleneck. I use IBM x3650/x3650M2 with Dual Quad Xeons. The M2s use X5570s specifically.

    While I agree having impressive magnitudes of "GHz" in your environment is kinda fun, it hardly says anything about how that environment will perform in the real world. Granted, it is all highly subject to workload patterns.

    I also want to make it clear that I understand that testing on such a scale is extremely cost prohibitive. As such I am sure AnandTech, Johan specifically, is doing the best he can with what resources he is given. I just wanted to throw my knowledge out there.

    @ELC
    Yes, software licensing is a huge factor when purchasing ESX servers. ESX is licensed per socket. It's a balancing act that depends on your workload, however. A top end ESX license costs about $5500/year per socket.
    Reply
  • mino - Wednesday, August 11, 2010 - link

    However, IMO storage performance analysis is pretty much beyond AT's budget ballpark by an order of magnitude (or two).

    There is a reason this space is so happily "virtualized" by storage vendors AND customers to a "simple" IOPS number.
    It is a science on its own. Often closer to black (empiric) magic than deterministic rules ...

    Johan,
    on the other hand, nothing prevents you from mentioning this sad fact:

    Except for edge cases, a good virtualization solution is built from the ground up with
    1. SLA's
    2. storage solution
    3. licensing considerations
    4. everything else (like processing architecture) dictated by the previous
    Reply
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    I can only agree of course: in most cases the storage solution is the main bottleneck. However, this is also a result of the fact that most storage solutions out there are not exactly speed demons. Many storage solutions out there consist of overengineered (and overpriced) software running on outdated hardware. But things are changing quickly now. HP for example seems to recognize that a storage solution is very similar to a server running specialized software. There is more: with a bit of luck, Hitachi and Intel will bring some real competition to the table (currently STEC has almost a monopoly on enterprise SSDs). So your number 2 is going to tumble down :-). Reply
  • haplo602 - Wednesday, August 11, 2010 - link

    This is one of the bottlenecks of your virtualised environment. A storage solution is only the limit if you do not use it as it was designed to be used.

    The more I/O-demanding your applications are, the fewer benefits virtualisation is going to offer. Usually CPU power is the last issue after network, disk and memory.

    I had a good laugh at the opening page. High end servers are High end not because of the increased performance but because of the better management and disaster tolerance/recovery they offer. After all, they use the same CPUs and memory as the low end servers, just everything else is different (OLRAD, hot swap/plug of almost anything except memory and CPU).
    Reply
  • webdev511 - Thursday, August 12, 2010 - link

    Well, if you're willing to spend some more money on Solid State (if you go with two twelve core cpus you'll save on licences) you could stuff four of the new Fusion IO 1.28 TB Duo Drives into the box and map them as System Drives and then use attached storage for big files. Reply
  • SomeITguy - Wednesday, August 11, 2010 - link

    No offense intended, and I know this will put you on the defensive, but it sounds to me like the "development environment" was ill conceived in the design phase. You obviously overbought on processor power. The first step in designing an environment, is knowing what your apps need. You can't just buy servers, then whine about how poorly the performance matches the overall system capability...

    At my last job I had Citrix Xen on HP blades with 53xx and 54xx CPUs, running about 150 production VMs, or more than 300 in total with R&D and QA. The company had no money, and because of that we only ran local storage for the OS and most functions. The shared data we did have was on NetApps, and that alone constantly spiked up to 25k+ IOPS. I can't remember where each blade sat on IOPS, but it was high. I was able to balance resources utilized most of the day to about the ~60% level, with spikes hitting the high 80s. No resources being overly wasted. To do this effectively takes time and patience. You need to economize. 12 VMs on a blade with 16GB of memory was not unheard of...

    Then there is the whole ESX thing, eh, won't get into that. Again, you need to know what is going to run on the servers before you spend (waste) money.

    In my experience, it's typical that managers just override the lowly sysadmin's advice, take a vendor's word over that of the sysadmin who manages the app, or a business unit buys you the equipment without consulting you, then says "here, make it work".

    Overall, I thought the article good. It is just a guide, not a bible.
    Reply
  • davegraham - Tuesday, August 10, 2010 - link

    So, i'm sitting here with a spanking new Dell R815 which is a quad socket G34 system and is shipping today w/ AMD Opteron 6176SE parts...so, this article is outdated even before it begins. (oh, did i mention it's only 2RU?)

    I'm also very curious as to what the underlying storage is for all these tests as it definitely can have an impact on the serviceability of the testing.

    I'm curious as to the details per VM as well...IOMMU choices, HT sharing, NUMA settings, as well as the version of ESX being used?

    dave
    Reply
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    "So, i'm sitting here with a spanking new Dell R815 which is a quad socket G34 system and is shipping today w/ AMD Opteron 6176SE parts...so, this article is outdated even before it begins. (oh, did i mention it's only 2RU?)"

    Testing servers is not like testing videocards. I cannot plug the R815 into a ready-installed Windows PC and push the button of "Servermark". It does not work that way, as you indicate yourself. A complete storage system must be set up, and in many cases ESX fails to install the first time on a brand new server. We perform a whole battery of monitoring tests, for example ones that confirm that the DQL is low enough.

    The storage system we use for the 4 tile test is an 8 disk SSD system for the OLTP tests (described in this article). The VMs themselves sit on a separate RAID controller connected to a Promise JBOD. The JBOD has 8 15000 rpm SAS disks. The only really disk intensive app in this test is Swingbench, and by making sure both data and logs get their separate SSD, we achieve DQLs under 0.1. There is a lot more to the Oracle config, but if you are interested, we can share the parameter file.

    Anyway, the low DQL and the fact that we scale well from 2 to 4 tiles show that we are not limited by the disks.
    Reply
  • davegraham - Wednesday, August 11, 2010 - link

    johan,

    I work with VMware for a living doing platform testing for the product i support. ;) consequently, I'm very well aware of the requirements for testing VMware and the various and sundry components within the server. Hence, my slightly critical view of what you're doing here.

    appreciate the response on the storage....again, all well and good with that explanation.

    I'll put my quad socket 6176SE system against your 7500 system anyday and i'll enjoy lower rack footprint, lower power consumption, and a positively brilliant VMware experience. ;)

    keep up the good work.

    dave
    Reply
  • blue_falcon - Wednesday, August 11, 2010 - link

    If you want to do a similar 2U config, try the R810; it only has 32 DIMM sockets but is nearly identical to the R910. Reply
  • mapesdhs - Tuesday, August 10, 2010 - link


    Johan, how would this system compare to a low-end quad-socket Altix UV 10? (max RAM = 512GB).

    Ian.
    Reply
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    I never tested an SGI server, so I can not say for sure. But the hardware looks (and probably is) identical to what we have tested here. Reply
  • Casper42 - Wednesday, August 11, 2010 - link

    Due to the way Dell implemented the memory on their latest Quad socket machines, if you run 2 CPUs with the FlexMem bridge, you get full memory bandwidth but half of the memory sockets are further away from the CPU due to the extra trace length of going to the empty CPU socket and through the FlexMem bridge.

    When you put in 4 CPUs you only get half the memory bandwidth of an Intel reference design. This is because the traces that would normally go to the empty CPU socket and through the FlexMem now go essentially nowhere because the CPU in that socket needs the access instead.

    I would say try IBM or HP. Just beware that IBM does some weird stuff when it comes to their Max5 memory expansion module that can also cause additional memory latency for some of the DIMM sockets and not the others.
    Reply
  • davegraham - Wednesday, August 11, 2010 - link

    which is actually why you should be using a Cisco C460 for this type of test.

    dave
    Reply
  • MySchizoBuddy - Wednesday, August 11, 2010 - link

    Is there an exact correlation between the number of cores and the number of VMs? How many VMs can a 48 core system support?

    Let's assume you want 100 systems virtualized. What's the minimum number of cores that will handle those 100 VMs?
    Reply
  • dilidolo - Wednesday, August 11, 2010 - link

    Depends on how many vCPUs and how much memory you assign to each VM, and how much physical memory your server has. CPU is rarely the bottleneck; memory and storage are.

    Then not all the VMs have the same workload. So no one can really answer your question.
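There is no single answer, but a back-of-the-envelope lower bound can be sketched once you pick your own inputs. In the Python sketch below, every number (vCPUs per VM, the vCPU-to-core ratio, memory per VM, the overcommit factor) is a placeholder assumption rather than a recommendation, and the function names are hypothetical.

```python
import math

# Back-of-the-envelope VM sizing sketch. Every number below is a placeholder
# assumption, not a recommendation; real limits depend on workload, memory
# and storage, as the comment above points out.

def min_cores(n_vms: int, vcpus_per_vm: int, vcpus_per_core: float) -> int:
    """Physical cores needed at a chosen vCPU-to-core consolidation ratio."""
    return math.ceil(n_vms * vcpus_per_vm / vcpus_per_core)


def min_memory_gb(n_vms: int, gb_per_vm: float, overcommit: float = 1.0) -> int:
    """Physical RAM needed; overcommit > 1.0 means memory oversubscription."""
    return math.ceil(n_vms * gb_per_vm / overcommit)


# 100 VMs with 2 vCPUs and 4 GB each, at an assumed 4 vCPUs per core.
print(min_cores(100, 2, 4))         # 50
print(min_memory_gb(100, 4))        # 400
print(min_memory_gb(100, 4, 1.25))  # 320
```

With these made-up inputs the memory figure, not the core count, is what pushes you toward a bigger box, which matches the point that CPU is rarely the bottleneck.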
    Reply
  • davegraham - Wednesday, August 11, 2010 - link

    was going to say that a small amount of memory oversubscription is "ok" depending on the workload but you'd want that buffered with something a little more powerful than spinning disk (SSD, for example). Reply
  • tech6 - Wednesday, August 11, 2010 - link

    The parameters for determining the optimal configuration for VMWare go well beyond just which CPU is faster. I like the AT stories about server tech but there need to be broader considerations of server features.

    1. Many applications are memory limited and not CPU bound so the memory flexibility may trump CPU power. That is why 256GB with a dual 75xx or 6xxx series CPU in an 810 may well be the better choice than either a quad socket or dual socket 56xx configuration.

    2. Software licensing is a big part of choosing the server as it is often licensed per socket. Sometimes more cores and more memory are cheaper than more sockets.

    3. Memory reliability is another major issue. Large amounts of plain ECC memory will most likely result in problems 2-3 years after deployment. The platforms available with the 6xxx and 75xx series CPUs support memory reliability features that often make it a better choice for VM data centers.

    4. Power and density are another major issue: they drive data center costs and must be given consideration when reviewing servers.
    Reply
  • don_k - Wednesday, August 11, 2010 - link

    Would like to see some non-windows VM benchmarks as well as a different virtualisation application used and by extension an SQL server that does not come from microsoft. Also would like to see benchmarks on para-virtualised VMs along with full hardware virtualised VMs.

    The review as is is quite meaningless to anyone that does not run windows VMs and/or does not use VMware.

    You do have oracle on a windows VM so maybe oracle on a solaris/bsd VM as well as oracle on a linux para-virtualised guest.

    There is also no mention of how, if at all, the VMs were optimised for the workloads they are running. In particular and most importantly how are the DBs using the disks? Where is the data and where are the logs? How are the disks passed on to the VM (local file, separate partition, virtual volume, full access to one/more drives etc etc).

    Way too many variables to draw any kind of accurate conclusion in my opinion.
    Reply
  • phoenix79 - Wednesday, August 11, 2010 - link

    I'm curious as to why you didn't include a quad-socket Magny-Cours system. I would have been very interested to see how it would have stacked up in this article. Reply
  • Stuka87 - Wednesday, August 11, 2010 - link

    Ditto, I would like to see the best from each CPU maker. To really see which has the best price:performance ratio. Reply
  • davegraham - Wednesday, August 11, 2010 - link

    if vApus II was available i could run it on my Magny-Cours.

    dave
    Reply
  • JohanAnandtech - Thursday, August 12, 2010 - link

    The Dell R815 and quad MC deserve an article on their own. Reply
  • fynamo - Wednesday, August 11, 2010 - link

    WHERE ARE THE POWER CONSUMPTION CHARTS??????

    Awesome article, but complete FAIL because of lack of power consumption charts. This is only half the picture -- and I dare to say it's the less important half.
    Reply
  • davegraham - Wednesday, August 11, 2010 - link

    +1 on this. Reply
  • JohanAnandtech - Thursday, August 12, 2010 - link

    Agreed. But it wasn't until a few days before I was going to post this article that we got a system that is comparable. So I kept the power consumption numbers for the next article. Reply
  • watersb - Wednesday, August 11, 2010 - link

    Wow, you IT Guys are a cranky bunch! :-)

    I am impressed with the vApus client-simulation testing, and I'm humbled by the complexity of enterprise-server testing.

    A former sysadmin, I've been an ignorant programmer for lo these past 10 years. Reading all these comments makes me feel like I'm hanging out on the bench in front of the general store.

    Yeah, I'm getting off your lawn now...
    Reply
  • Scy7ale - Wednesday, August 11, 2010 - link

    Does this also apply to consumer HDDs? If so is it a bad idea to have an intake fan in front of the drives to cool them as many consumer/gaming cases have now? Reply
  • JohanAnandtech - Thursday, August 12, 2010 - link

    Cold air comes from the bottom of the server aisle, sometimes as low as 20°C (68°F), and gets blown at high speed over the disks. Several studies now show that this is not optimal for an HDD. In your desktop, the temperature of the air that is blown over the HDD should be higher, as the fans are normally slower. But yes, it is not good to keep your hard disk at temperatures lower than 30°C. Use hddsentinel or speedfan to check on this. 30-45°C is acceptable. Reply
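The 30-45°C rule of thumb above is easy to turn into a check once you have read the drive temperature with one of those tools. A minimal Python sketch follows; the function name is hypothetical, the two thresholds are the ones quoted in this comment, and labelling anything above 45°C as "too hot" is an assumption, since the comment only says what is acceptable.

```python
# Classify an HDD temperature reading against the 30-45 °C range quoted above.
# hdd_temp_status is a hypothetical helper; the "too hot" label for readings
# above 45 °C is an assumption, not something the comment states.

LOW_C = 30.0
HIGH_C = 45.0


def hdd_temp_status(temp_c: float) -> str:
    """Return a coarse status for a drive temperature in degrees Celsius."""
    if temp_c < LOW_C:
        return "too cold"
    if temp_c <= HIGH_C:
        return "acceptable"
    return "too hot"


for reading in (20.0, 30.0, 38.5, 45.0, 52.0):
    print(reading, hdd_temp_status(reading))
```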
  • Scy7ale - Monday, August 16, 2010 - link

    Good to know, thanks! I don't think this is widely understood. Reply
  • brenozan - Thursday, August 12, 2010 - link

    http://en.wikipedia.org/wiki/UltraSPARC_T2
    2 sockets =~ 153GHz
    4 sockets =~ 306GHz
    Like the T1, the T2 supports the Hyper-Privileged execution mode. The SPARC Hypervisor runs in this mode and can partition a T2 system into 64 Logical Domains, and a two-way SMP T2 Plus system into 128 Logical Domains, each of which can run an independent operating system instance.

    why SUN did not dominate the world in 2007 when it launched the T2? Besides the two 10G Ethernet interfaces built into the processor, they had the most advanced architecture that I know of; see
    http://www.opensparc.net/opensparc-t2/download.htm...
    Reply
  • don_k - Thursday, August 12, 2010 - link

    "why SUN did not dominate the world in 2007 when it launched the T2?"

    Because it's not actually that good :) My company bought a few T2s and after about a week of benchmarking and testing it was obvious that they are very very slow. Sure you get lots and lots of threads but each of those threads is oh so very slow. You would not _want_ to run 128 instances of solaris, one on each thread, because each of those instances would be virtually unusable.

    We used them as webservers.. good for that. Or file servers that don't need to do any CPU-intensive work.

    The theory is fine and all but you obviously have never used a T2 or you would not be wondering why it failed.
    Reply
  • JohanAnandtech - Thursday, August 12, 2010 - link

    "http://en.wikipedia.org/wiki/UltraSPARC_T2
    2 sockets =~ 153GHz
    4 sockets =~ 306GHz"

    You are multiplying threads times clockspeed. IIRC, the T2 is a fine-grained multithreaded CPU where 8 (!!) threads share two pipelines of *one* core.

    Compare that with the Nehalem core where 2 threads share 4 "pipelines" (sustained decode/issue/execution/retire) per cycle. So basically, a dual socket T2 is nothing more than 16 relatively weak cores which can each execute 2 instructions per clock cycle at the most, or 32 instructions per cycle in total. The only advantage of having 8 threads per core is that (with enough independent software threads) the T2 is able to come relatively close to that kind of throughput.

    A dual six-core Xeon has a maximum throughput of 12 cores x 4 instructions or 48 instructions per cycle. As the Xeon has only 2 threads per core, it is less likely that the CPU will ever come close to that kind of output (in business apps). On the other hand, it performs excellently when you have some amount of dependent threads, or simply not enough threads in parallel. The T2 will only perform well if you have enough independent threads.
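The two ways of counting can be put side by side in a short Python sketch. The function names are hypothetical. The core counts and issue widths are the ones given in this comment, while the 1.2 GHz clock and 64 threads per socket are assumptions chosen only because they reproduce the quoted "153GHz" figure.

```python
# Compare "threads x clock" with peak instructions per cycle.
# Function names are hypothetical. The 1.2 GHz clock and 64 threads per
# socket are assumptions that reproduce the quoted ~153 GHz figure.

def threads_times_clock(sockets: int, threads_per_socket: int, clock_ghz: float) -> float:
    """The marketing-style sum: every hardware thread counted at full clock."""
    return sockets * threads_per_socket * clock_ghz


def peak_instructions_per_cycle(sockets: int, cores_per_socket: int, issue_width: int) -> int:
    """Upper bound on instructions retired per clock cycle across all cores."""
    return sockets * cores_per_socket * issue_width


print(round(threads_times_clock(2, 64, 1.2), 1))  # 153.6 "GHz" for a dual T2

t2 = peak_instructions_per_cycle(2, 8, 2)         # 16 cores x 2 per cycle
xeon = peak_instructions_per_cycle(2, 6, 4)       # 12 cores x 4 per cycle
print(t2, xeon)                                   # 32 48
```

Counting threads gives the T2 a huge number, while counting issue slots puts the dual Xeon ahead on paper; neither figure says how close real software gets to the peak.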
    Reply
  • duploxxx - Thursday, September 02, 2010 - link

    Looking at the differences between OLAP/OLTP and web, it is very clear that this web-based test:

    The MCS eFMS portal, a real-world facility management web application, has been discussed in detail here. It is a complex IIS, PHP, and FastCGI site running on top of Windows 2003 R2 32-bit. Note that these two VMs run in a 32-bit guest OS, which impacts the VM monitor mode. We left this application running on Windows 2003, as virtualization allows you to minimize costs by avoiding unnecessary upgrades. We use three MCS VMs, as web servers are more numerous than database servers in most setups. Each VM gets two vCPUs and 2GB of RAM space.

    is really in favor of Intel CPUs. This actually skews the final result a bit....

    Database-wise, it would actually mean that you can order an L5640 or a 6136 and get about the same virtualization performance; the difference you see is due only to the behavior and results of the web-based VMs. I think it is clear that although vApus is a nice benchmark, it should be enhanced with more kinds of applications; in the end, the web-based workload leads to a wrong overall conclusion.
    Reply
