AnandTech.com Blogs : Johan De Gelas


  November 3, 2009

Choosing the right foundation: which hypervisor do you evaluate?
blog post by Johan De Gelas
First of all, we were pretty excited to see so many comments and votes (5000!) on our last IT poll. It is good to see that professional IT is so much alive at Anandtech.com. So yes, we should have updated this blog sooner to keep the momentum going. The reason this update comes rather late is, once again, that we are working on the much-delayed hypervisor comparison. Hundreds of tests have already been done, but we have added more tests to check important I/O performance factors such as VMDq and iSCSI performance.
 
And of course, the virtualization market is evolving fast. There is a new kid on the block: KVM. Two of the three most important Linux vendors, Red Hat and Canonical, have ripped Xen out of their distributions in favor of KVM. KVM has an interesting philosophy: it simply adds two kernel modules to the Linux kernel to turn the latter into a hypervisor. As a result, KVM can leverage the huge number of Linux drivers and Linux kernel improvements such as power management. Still, a virtualization solution needs to mature quite a bit before it is ready, and that is more than a cliche. Xen's support for Windows VMs, for example, was supposed to work at the beginning of 2007, as Xen introduced support for Hardware Virtual Machines at the end of 2006. But only around the middle of 2008 did we feel confident enough to say that Windows virtual machines work well on Xen. We reported:
 
"Xen 3.2.0 which can be found in the newest Novell SLES 10 SP2, is capable of running Windows 2003 R2 under heavy stress."
So it took Xen several major revisions to really get it right. It is unlikely that KVM will get there much quicker. We will be giving KVM some heavy stress testing so we can tell you more than just hearsay.
 
In the meantime, a new survey by Centrify shows a still dominant VMware, but it also tells us that Hyper-V and Xen are making a lot of progress, growing strong enough to be dangerous opponents in the near future. I have been talking to tens of Small and Medium Enterprises (SMEs) in Belgium and the Netherlands. Our own tests show that VMware ESX is still the most robust hypervisor, and most people concur. However, VMware's half-hearted attempts to make vSphere more attractive to SMEs do not create a lot of enthusiasm. If VMware does not create a more budget-friendly solution for SMEs (and VMware, newsflash: most SMEs have more than 3 servers), we have the impression it may lose the server virtualization battle in the SME world, where everything is still possible. But those are my personal impressions. At the end of the day, what happens in your working environment determines who will prevail. So let us know what you are planning...
 


November 3, 2009, 30 comments
  October 7, 2009

The basic
blog post by Johan De Gelas

If you read our last article, it is clear that when your applications are virtualized, you have a lot more options to choose from when building your server infrastructure. Let us know how you would build up your "dynamic datacenter" and why!


October 7, 2009, 48 comments
  May 27, 2009

Intel talking about the 16-thread RISC killer
blog post by Johan De Gelas
Take two Nehalem dies, turn them 90 degrees, add a lot of system interface logic and 8 MB of extra L3 cache and you get - very oversimplified - the impressive Nehalem EX, alias "Beckton". The new Xeon MP is an impressive monster, just like its predecessor Dunnington. Dunnington consisted of 1.9 billion transistors; the Xeon MP based on the "Nehalem" architecture will feature up to 2.3 billion transistors.
 
 
Those 2.3 billion transistors are needed for:
  • Up to eight cores, 16 threads thanks to SMT
  • Up to 24 MB of shared L3 cache
  • Four QuickPath links
  • Four memory channels, which support up to 16 memory modules per socket
Intel calls the chips that drive the DDR-3 modules "Scalable Memory Buffer" chips, which means Intel figured out that it is best to move the power-gobbling AMB chip from the FB-DIMMs to the system board. As you need only one chip to drive several registered DDR-3 modules, it consumes a lot less power than placing an AMB chip on each DIMM.
 
 
 
 
In the second half of this year, Intel will have an IBM Power 6 killer and a server platform to match. The irony is that when it comes to "Intel Scalable Memory Buffers", IBM has the right to say "what took you so long to figure out that FB-DIMMs were a pretty bad idea?" Back in 2005, IBM's X3 chipset already featured a solution that allowed large memory capacities with lower latency and much lower power consumption than FB-DIMMs.
 
It will be interesting to see what IBM's response to the Nehalem EX will be, as Intel's first octal core is going to enter the last market where RISC CPUs still hold their ground: 8 sockets and more. There have been previous attempts, but this time it is for real: more than 15 designs with 8 or more sockets are being readied. More irony: IBM will probably design the servers with the highest socket counts, the ones that really give the Power servers a run for their money...
 
As Intel gave its octal-core CPU RAS features (MCA) that once belonged to the RISC and Itanium families only, it seems that the last stronghold of the non-x86 servers is going to fall... "mainframe slowly" but steadily. Only the UltraSPARC T2, with its radically different architecture, may survive this assault.
 
The Machine Check Architecture is of course ultra important for the future Xeon MP systems. Even a quad socket system will contain 32 cores and probably up to 512 GB of RAM. That kind of machine simply cries out for large databases and virtualization consolidation. In the latter case, MCA should allow hypervisors such as ESX to overcome critical errors in one of the VMs, instead of shutting down tens of VMs. 
 
On a different note, Intel claims that by August 2009 50% of its DP server processors sold will be "Nehalem" based. So even though AMD is executing very well and introducing the hex-core "Istanbul" soon, it is not a minute too soon, as the Opterons are under heavy attack.
 
Update:  Anand also talked about Nehalem EX in his lab update here.
 

May 27, 2009, 8 comments
  May 19, 2009

quick update from the
blog post by Johan De Gelas
We promised you a new datapoint, a new independent virtualization benchmark, in "a few days". Those "few days" have become a week, in good "IT at Anandtech" tradition. :-) But this Wednesday, unless Murphy strikes us hard, the article will be online. It will offer a refreshing look at virtualization performance, the result of months of work. Liz will follow up quickly with a "performance optimization for virtualization" article.

Until then, we have updated two articles. We told you in one of our "Intel Nehalem vs AMD Istanbul" blogs that you would have to wait for ESX 4.0 for EPT support. However, we found that "forcing hardware VMMU" (= EPT) improves performance tangibly, so we wrote that ESX 3.5 update 4 has support for EPT. That is not true, at least not officially. EPT is only officially supported on ESX 4.0 (the hypervisor of vSphere 4.0). Check out the updates that we made to the last article, as they clarify some of the VMmark benchmarking. Our thanks go to Scott Drummonds of VMware for the excellent info.
 
The last update can be found in our "The Best Server CPUs part 2" article. We solved the problems with our Shanghai "Exchange" server and managed to get some Opteron numbers. The newest quad-core "Shanghai" Opterons are clock for clock as fast as the quad-core "Harpertown" Xeons. In other words, Microsoft Exchange runs faster on the Xeon 54xx thanks to its clockspeed advantage, and the Xeon 55xx is still by far the MS Exchange champion. You can find the benchmarks here. So expect a lot of new content soon... New CPUs, new servers, new storage. The second part of May and June should be fun.
 
 
 
 

May 19, 2009, 0 comments
  April 7, 2009

The million dollar question: how do you upgrade your datacenter
blog post by Johan De Gelas
 
"the challenge for AMD and Intel is to convince the rest of the market - that is 95% or so - that the new platforms provide a compelling ROI (Return On Investment). The most productive or intensively used servers in general get replaced every 3 to 5 years. Based on Intel's own inquiries, Intel estimates that the current installed base consists of 40% dual-core CPU servers and 40% servers with single-core CPUs."
 
At the end of his presentation, Pat Gelsinger (Intel) makes the point that replacing nine servers based on the old single-core Xeons with one Xeon X5570 based server will result in a quick payback. According to Intel, your lower energy bill will pay back your investment in 8 months.
 
Why these calculations are quite optimistic is beyond the scope of this blogpost, but suffice it to say that SPECjbb is a pretty bad benchmark on which to base ROI calculations (it can be "inflated" too easily) and that Intel did not consider the amount of work it takes to install and configure those servers. However, Intel does have a point that replacing the old power-hungry Xeons (irony...) will deliver a good return on investment.
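
The kind of payback estimate discussed above can be sketched in a few lines. This is only a rough illustration: the function name and every input value (server price, wattages, electricity price) are hypothetical placeholders, not Intel's figures, and the model ignores cooling, installation and configuration work.

```python
# Simple payback estimate: months until energy savings cover the purchase price.
# All input values below are hypothetical placeholders, not Intel's figures.
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

def payback_months(new_server_cost, old_servers, old_watts, new_watts, price_per_kwh):
    """Return how many months of saved electricity pay for the new server."""
    saved_kw = (old_servers * old_watts - new_watts) / 1000
    monthly_saving = saved_kw * HOURS_PER_MONTH * price_per_kwh
    return new_server_cost / monthly_saving

# Nine old servers replaced by one new one, as in the scenario above.
months = payback_months(new_server_cost=5000, old_servers=9, old_watts=400,
                        new_watts=350, price_per_kwh=0.10)
print(round(months, 1))
```

Small changes to the assumed wattages or the electricity price move the result by many months, which is one reason to treat any single payback figure with caution.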
 
In contrast, John Fruehe (AMD) is pointing out that you could upgrade dual-core Opteron based servers (the ones with four digits in their model numbers and DDR-2) with hex-core AMD "Istanbul" CPUs. I must say that I have encountered few companies who would actually bother upgrading CPUs, but his arguments make some sense, as the CPU will still use the same kind of memory: DDR-2. As long as your motherboard supports it, you might just as well upgrade the BIOS, pull out your server, replace the 1 GB DIMMs with 4 GB DIMMs and replace the dual cores with hex-cores instead of replacing everything. It seems more cost effective than redoing the cabling, configuring a new server and so on...
 
There were two reasons why few professional IT people bothered with CPU upgrades:
  1. You could only upgrade to a slightly faster CPU. Upgrading a CPU to a higher clocked, but similar CPU rarely gave any decent performance increase that was worth the time. For example, the Opteron was launched at 1.8 GHz, and most servers you could buy at the end of 2003 were not upgradeable beyond 2.4 GHz.
  2. You could not make use of more CPU performance. With the exception of the HPC people, higher CPU performance rarely delivered anything more than even lower CPU percentage usage. So why bother?
AMD also has a point that both things have changed. The first reason may not be valid anymore if hex-cores do indeed work in a dual-core motherboard. The second reason is no longer valid, as virtualization allows you to use the extra CPU horsepower to consolidate more virtual servers on one physical machine. That is on the condition, of course, that the older server allows you to replace those old 1 GB DIMMs with a lot of 4 GB ones. I checked the HP DL585 G2, for example, and it does allow up to 128 GB of DDR-2.
 
So what is your opinion? Will replacing CPUs and adding memory to extend the lifetime of servers become more common? Or should we stick to replacing servers anyway?
 

April 7, 2009, 23 comments
  February 27, 2009

Istanbul versus Nehalem, some extra notes
blog post by Johan De Gelas

My last post generated quite a bit of discussion, some of it based on misunderstandings. In this post I'll try to make a few things clearer. In a previous post, I pointed out that there are good indications that a dual Nehalem EP has a 40 to 100% advantage over Shanghai (depending on the application, based on the SAP and Core i7 workstation benchmarks).

If Istanbul is introduced in the early part of H2 2009, AMD will have a small window of opportunity to compete with a hex-core against a quad-core (Intel's Nehalem EP). Time will tell, of course, how small, large or nonexistent this window will be.

In well-threaded applications, the best a "hex-core Shanghai" can do is give about a 30-40% boost to performance compared to the current Shanghai, which is most likely not enough to close the gap with the upcoming Nehalem CPU (let alone the 32 nm hex-core version). However, Istanbul is more than a hex-core Shanghai. The improved memory controller and HT-assist can lower the latency of inter-CPU syncing and increase the effective memory bandwidth. For that reason, Istanbul will do better than just "a Shanghai with 2 added cores" in many applications such as SAP, OLTP databases, virtualization scenarios and HPC. Depending on the application, Istanbul might prove to be competitive with the quad-core Nehalem. It is clear that the hex-core "Westmere", which will have a slightly improved architecture, will be a different matter.

But back to the "this higher amount of bandwidth will allow the quad Istanbul to stay out of the reach of the dual Nehalem EP Xeons" comment. It is very embarrassing, and simply bad PR if a quad socket platform is beaten by a dual socket platform in any benchmark. This is something we have witnessed in the early SAP numbers. That is why I commented that the improved "uncore" will help the quad socket Istanbul to stay out of the reach of the dual Nehalem EP. I was and am not implying that people who would consider a dual Nehalem EP are suddenly going to consider a quad Istanbul.

It is clear those looking for a 4S and 2S server are in a slightly overlapping but mostly different market. Quad socket is mostly chosen for large back end applications such as OLTP databases or for virtualization consolidation. The number of DIMM slots in that case is a very important factor. However, even with the advantage of having more DIMM slots, better RAS etc., a quad socket platform that cannot outperform a dual socket platform will leave a bad taste in the mouth of potential buyers. It is important that there is a minimal performance advantage.

The fact that the performance/power ratio of such a quad server will be worse than that of a dual socket server is an entirely different discussion. IBM's market research (see the picture below) shows which form factor is bought mostly for consolidating VMs. As you can see, it comes down to some people being convinced that a number of 4-socket rack servers is the best way, while others are firm believers that about twice as many low-power 2-socket blades is the way to go. It is very hard to convince either group to switch sides, and that is why I feel that 2S and 4S servers are mostly in different markets.

In many cases, the number of virtual machines you can consolidate on one physical server is mostly a function of the amount of RAM. If the number of DIMM slots allows you to consolidate twice as many virtual machines on the quad socket machine, the consumed energy might be better than using two DP machines with the same number of DIMMs.

So despite the fact that the two DP machines have a lot more CPU power, the "scale up" buyers still prefer to go for a large box with more memory; they are not limited by raw CPU power, but by the amount of RAM that they can put in this server. It is these people that AMD will target with their 4S platform, a platform which has - especially for virtualization - a number of advantages over the current Intel 4S "Dunnington" platform... at least until Intel's octal-core arrives. Whether you choose the 2S blades or 4S rack servers depends on whether you believe in the "scale up" or "scale out" philosophy.

The conclusion is that many 4S rack servers are not only bought for raw CPU performance, but for the amount of RAM, their RAS features, and so on. However, it is clear that a 4S server should still outperform 2S servers so that the group of buyers who are believers in the "scale up" philosophy feel good about their purchase.


February 27, 2009, 18 comments
  February 25, 2009

How AMD's Istanbul might close the gap with Nehalem EP
blog post by Johan De Gelas
The Istanbul cores are the same as those found in AMD's latest Shanghai CPU. But the "uncore" part of Istanbul is more interesting. By now, you have probably heard about AMD's "HT-assist" technology, a probe or snoop filter. Every time a new cacheline is brought into the L3-cache of, for example, CPU 1 on the current Shanghai platform, a broadcast message is sent to the L3-caches of all CPUs, and CPU 1 has to wait until those CPUs answer.
 
In the case of Istanbul, the CPU will simply check its snoop filter in its own L3-cache, and if none of the other CPUs has that particular cacheline, it can go ahead. This lowers the latency of bringing in a new cacheline and raises the effective bandwidth.
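
The difference between broadcasting probes and consulting a probe filter can be illustrated with a toy model. This is a minimal sketch, not AMD's implementation: the function names, the dictionary used as a directory and the four-socket setup are all hypothetical.

```python
# Toy model of coherency probes on a 4-socket system (illustrative only).
# All names and the directory layout are assumptions, not AMD's actual design.

def probes_broadcast(requester, num_cpus):
    """Without a filter: probe every other CPU and wait for all answers."""
    return [cpu for cpu in range(num_cpus) if cpu != requester]

def probes_filtered(requester, line, directory):
    """With a probe filter: look up which CPUs hold the line, probe only those."""
    holders = directory.get(line, set())
    return sorted(cpu for cpu in holders if cpu != requester)

# Hypothetical directory: cacheline address -> set of CPUs caching it.
directory = {0x1000: {2}, 0x2000: set()}

print(probes_broadcast(0, 4))                 # [1, 2, 3]
print(probes_filtered(0, 0x1000, directory))  # [2]
print(probes_filtered(0, 0x2000, directory))  # []
```

In the filtered case the requesting CPU waits on one other CPU or on none at all, which is where the lower latency comes from.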
 
To better understand this, we combined our own stream benchmarking with the one that AMD presented. All AMD systems are using DDR-2 800.
 
Stream Triad benchmark
 
As each Stream thread works on its own data, there is no reason to send out coherency synchronization requests. These requests slow the process of getting new cachelines into the L3 and hence lower the effective memory bandwidth. What is interesting is that HT-assist will not only benefit the applications that use the HT interconnects a lot for coherency traffic, but also applications like Stream which do not need the HT interconnects. Also notice that HT 3.0 does not improve memory bandwidth, as Stream will try to keep its thread data local. Our testing used SUSE SLES 10 SP2 and AMD used Windows 2008. Both OSs are well optimized and NUMA aware.
 
This means that especially HPC applications, with many threads all working on their own data, will benefit from the higher effective bandwidth. Besides HT assist, AMD has now confirmed to us that the memory controller has been tuned quite a bit. This higher amount of bandwidth will allow the quad Istanbul to stay out of the reach of the dual Nehalem EP Xeons in many HPC applications.
 
HT assist might also improve the SAP and OLTP scores quite a bit, but for a different reason. SAP and OLTP applications perform a lot of cache coherency synchronization requests, so the snoop filter will substantially lower the average latency of such requests, as in some cases:
  • the CPU will only wait on one other CPU (instead of waiting for all responses to come back)
  • the CPU won't have to wait at all, as the other CPUs don't have this line.
Secondly, this will also lower memory latency, which is a bonus for almost every multi-threaded application.
 
Lower memory latency, higher bandwidth, lower "cache coherency" latency and more interconnect bandwidth: the improved "uncore" of Istanbul will be vital to close the gap with Nehalem. Much will depend on how quickly Intel introduces its own hexacore 32 nm Xeons, but that probably won't happen before 2010. Istanbul is shaping up to be a really good alternative for Intel's quadcore Nehalem. We might see a good fight after all...
 
Don't forget to check it.anandtech.com (IT portal) often, as many of our blogposts (for example the VMworld 2009 coverage) are not published on the frontpage of Anandtech.com.
 


February 25, 2009, 40 comments
  February 23, 2009

AMD fighting back with hexacore Istanbul and
blog post by Johan De Gelas
Last Friday, AMD gave a good answer to the approaching Intel Xeon Nehalem EP thunderstorm. AMD demonstrated to a handful of journalists (Charley and Scott) an up and running dual and quad socket hexacore Istanbul system. Istanbul, which should be ready in the autumn of this year, is basically a six-core version of the current AMD Opteron "Shanghai". While we could not attend the Istanbul demo, we had a long phone conversation with the AMD people. A few interesting points came up during that conversation, and we would love to share them with you.

AMD seems to recognize that the best Nehalem EP will be between 40 and 100% faster than their flagship CPU, but claims there will be many more benchmarks near the 40% than the 100% mark. AMD however believes that Intel will only be able to steal back the "performance is everything" HPC market, as it will counter Nehalem by launching an Energy Efficient version of the current Shanghai CPU. AMD firmly believes that the 95W Nehalem EPs (2.66 to 2.93 GHz) will not be very attractive to many datacenters. AMD also points out that even the low power versions of Nehalem (up to 2.26 GHz) need 60W. We will see whether AMD can offer higher clockspeeds with lower energy consumption. It is interesting to hear that AMD firmly targets the low power market. According to AMD, many customers are already putting "power caps" (a BIOS feature) on their CPUs to prevent the server from exceeding a certain power consumption level. This means that the CPU stays in the lower p-states and is never able to run at full clockspeed. Power capping is used by many customers that do not buy low power CPUs.
 
Secondly, AMD believes that the total number of servers based on Nehalem EP will probably amount to a small percentage of the total servers shipped in Q2. Buyers will balk at the high price of DDR-3, according to AMD. We are rather sceptical:
 
So the price difference is small to nonexistent on a $3000-$4000 dual socket server.
 
Still, Nehalem is a completely new platform and it will take some effort from the system administrator to verify that the currently running applications run well with Hyperthreading and Turbo Mode. Also, AMD's RVI is already well supported in ESX 3.5, while we'll have to wait for VMware's vSphere ("ESX 4.0") before EPT will be supported. That means that the real-world performance of Nehalem running ESX will probably be lower than the published benchmarks in 2009. Yes, we are at VMworld 2009, remember!
 
The Shanghai platform is basically the same as the Barcelona one, so that earns AMD a few points in the "easier to integrate and upgrade to" department. AMD is thus hoping that by the time Nehalem EP really takes off (Q3?), Istanbul will be ready to answer the threat. And there is something interesting about Istanbul... but we'll discuss that in a later post.
 
 
 
 

February 23, 2009, 11 comments
  February 12, 2009

Will Nehalem conquer the server world by storm?
blog post by Johan De Gelas
A dramatic turn of events is the best way to describe what we'll witness in a few weeks. But let us first talk about the current situation. As we pointed out in our last server CPU comparison, AMD's latest quad-core Opteron was a very positive surprise. Sure, you can show a few server benchmarks where the Intel CPU wins, like Black Scholes or some exotic HPC benchmark, but the server applications that really make the difference, like webservers and database servers, run faster on the latest AMD "Shanghai" CPU. It all depends on what kind of application is important to you, of course. But let us look at the complete picture: performing more than 30% faster in virtualization benchmarks is the final proof that AMD's latest is overall the best server CPU at this point in time.

But a few weeks from now, that will all change. As always we can not disclose benchmark information before a certain date, but if you look around here at this site, you have been able to discern the omens. The K10 architecture of Shanghai is a well rounded architecture, but one that misses really crucial weapons to keep up with the Nehalem:
  • Simultaneous multithreading (Hyper-Threading) offers a performance boost that IPC improvements are not capable of delivering (up to 45%!).
  • Memory latency: Nehalem's memory latency is up to 40% lower.
  • Memory bandwidth: 3 channels is complete overkill for desktop apps, but it does wonders for many HPC and, to a lesser degree, server applications.
  • A really aggressive integer engine.
Nehalem will use somewhat more expensive DDR-3 DIMMs, which hardly offer any real performance boosts (as compared to DDR-2). So moving to DDR-3 will not help AMD much.
 
Istanbul? 
The details on the six-core Istanbul are still sketchy. But the dual socket Xeon "Westmere" will get six cores too and will appear in the same timeframe as AMD's hexacore. Only if AMD has very secretly added SMT to Istanbul will it be able to turn the tide. Considering that this would be a first for AMD, it is very unlikely SMT made it into Istanbul.
 
A dent in Nehalem's armour?
Does AMD have a chance in the server market in 2009 (and possibly 2010)? I must say it was not easy to find a weakness in Nehalem's architecture. The challenge made it very attractive to search anyway :-). So what follows is a big "IF-IF" story and you should take it with a big grain of salt... as you should always do with forward-looking articles.
 
There is one market where AMD has really been the leader, and that is virtualization, thanks to the IMC and the support for segments (four privilege levels) in the AMD64 Instruction Set Architecture. AMD's performance running VMware ESX in the "good old" ESX binary translation mode (software virtualization) was better than running an Intel CPU on the latest hardware virtualization hypervisor. VMware only uses hardware virtualization on an AMD server if NPT (or RVI or HAP) is present. In contrast, hardware virtualization slowed the Xeons of 2005 and 2006 down a bit, but was absolutely necessary to run 64 bit guests on a hypervisor on top of a Xeon server.
 
Nehalem is catching up with EPT and VPID (see here), and while these are well implemented, one thing is lacking: the TLB is rather small. I pointed this out about a year ago: while the TLB got AMD a lot of bad press, it will probably be the one thing that keeps AMD somewhat in Intel's slipstream. Let me make that clearer:
 
CPU | L1 TLB Data | L1 TLB Instr | L2 TLB
AMD Shanghai / Opteron 238x or 838x | 48 (4 KB), 48 (large) | 48 (4 KB), 48 (large) | 512 (4 KB), 128 (large)
Intel Penryn / Xeon 54xx | 16 (4 KB), 16 (large) | 128 (4 KB), 8 (large) | 256 (4 KB), 32 (large)
Intel Nehalem / Xeon 55xx | 64 (4 KB), 32 (large) | 128 (4 KB), 14 (large) | 512 (4 KB), 0 (large)
 
Notice that when you use large pages, the Nehalem TLB has few entries. So, let us now do a thought experiment. Currently, most virtualization benchmarks such as VMmark (VMware) and vConsolidate (Intel) use relatively small VMs: for example, a small Apache webserver and a MySQL server which get between 512 MB and 2 GB of RAM. As a result, most of them run with large pages off (page size = 4 KB). These benchmarks are very similar to the daily practice of an enterprise which uses IT mostly for "infrastructure purposes" such as authenticating its employees and giving them access to mail, FTP, file servers, print serving and web browsing.
 
It becomes totally different when you are an IT firm that offers its services to a relatively large number of customers on the internet. You need a large database and many, probably pretty heavy, web portals which offer a good interactive experience.
 
So you are not going to consolidate something like 84 (14 tiles x 6 VMs) tiny VMs on one physical machine, but rather 5 to 10 "fat" VMs. By fat VMs I mean VMs that get 4 GB or more of RAM and 2 to 4 vCPUs, run a 64 bit guest OS and so on.
 
Those applications also open tons of connections, which they have to destroy and recreate after some time. In other words, lots of memory activity going on. 
 
EPT and NPT can offer between 10 and 35% better performance when lots of memory management activity is going on. In contrast to the shadow page table technique, a change in the page tables does not cause a trap and the associated overhead (which can be 1000s of cycles). So you could say that going to the TLB of your CPU is a lot smoother. But if the TLB fails to deliver, the hardware page walk is very costly.
 
In search of the real page table
A hardware page walk consists of searching through several tables which allow the CPU to find the real physical address, as the running software always supplies a virtual address. With a normal OS, the OS has set the CR3 register to contain the physical address where the first table is located. The first table converts the first part of the virtual address into a physical one: a pointer to the physical address where the next table is located. With large pages, it takes about 3 steps to translate the virtual address to the physical one.
 
With EPT/NPT, the guest OS gives a (CR3) address which is in fact virtual and which must be converted into a real physical address. All the guest OS tables contain pointers to virtual addresses. So each table gives you a virtual address for the next table. But the next table is not located at this virtual address, so we need to go out and search for the real address. So instead of 3 accesses to the memory, we need 3x3 accesses. If this happens too many times, EPT will actually reduce performance instead of improving it!
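
The "3 versus 3x3" reasoning can be written down as a tiny cost model. This is a sketch of the simplified model used in this post only: the function names are hypothetical, and real hardware differs in the details (number of levels, caching of parts of the walk), so treat the counts as intuition rather than measured figures.

```python
# Simplified page-walk cost model following the post's reasoning.
# Assumption: every guest table lookup needs a full host walk to find
# the real physical address of that table.

def native_walk_accesses(levels):
    """Native OS: one memory access per page table level."""
    return levels

def nested_walk_accesses(guest_levels, host_levels):
    """EPT/NPT: each guest level costs a complete host walk."""
    return guest_levels * host_levels

print(native_walk_accesses(3))     # 3
print(nested_walk_accesses(3, 3))  # 9
```

The multiplicative growth is the point: every extra level on either side makes a TLB miss more expensive under nested paging than it would be natively.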
 
It is good practice to use large pages with large databases. Now remember that we are moving towards a datacenter where almost everything is virtualized, databases included. In that case, Nehalem's TLB can only make sure that about 32 x 2 MB, or only 64 MB, of data and 28 MB of code is covered by the TLB. As a result, lots of relatively heavy hardware page walks will happen. Luckily, Intel caches the real physical page tables in the L3-cache, so it should not be too painful.
 
The latest quad-core Opteron has a much more potent TLB. As instructions take a lot less space than data, it is safe to say that the data TLB can cover up to 176 (48 + 128) times 2 MB, or 352 MB, of data. Considering that virtualized machines easily have between 32 and 128 GB and are much better utilized (60-80% CPU load), it is clear that the AMD chip has an advantage there. How much difference can this make? We have to measure it, but based on our profiling and early benchmarking we believe that "an overflowing TLB" can decrease virtualized performance by as much as 15%. To be honest, it is too early to tell, but we are pretty sure it is not peanuts in some important applications.
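
The coverage figures above are simply the number of large-page TLB entries multiplied by the page size. A minimal sketch of that arithmetic follows; the entry counts are the ones quoted in this post, while the function and constant names are made up for illustration.

```python
# TLB reach = number of entries x page size.
# Entry counts are the large-page figures quoted in this post.
LARGE_PAGE_MB = 2

def tlb_reach_mb(entries, page_mb=LARGE_PAGE_MB):
    """Amount of memory (in MB) that the given TLB entries can map."""
    return entries * page_mb

nehalem_data = tlb_reach_mb(32)          # L1 data TLB, large pages
nehalem_code = tlb_reach_mb(14)          # L1 instruction TLB, large pages
shanghai_data = tlb_reach_mb(48 + 128)   # L1 data + L2 TLB, large pages

print(nehalem_data, nehalem_code, shanghai_data)  # 64 28 352
```

Set against VMs with several GB of RAM each, even the larger figure covers only a small fraction of the working set.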
 
So what are we saying? Well, it is possible that the Opteron might be able to do some "damage control" compared to Nehalem when we try out a benchmark with large and fat VMs (Like we have done here). But there are a lot of "IF"s.  Firstly, AMD must also cache the page tables in the caches. If for some reason they keep the page tables out of the caches, the advantage will probably be partly negated. Secondly, if the applications running on the physical machine demand a lot of bandwidth, the fact that the Nehalem platform has up to 70% more bandwidth might spoil the advantage too.
 
The last AMD Stronghold?
So should Intel worry about this? Most likely not. For simplicity's sake, let us assume that both cores - Shanghai and Nehalem - offer equal crunching power. They more or less do when it comes to pure raw FP power, but SPECint makes it clear that Nehalem is faster in integer loads.
 
But let us forget that, as most server applications are unable to use all that superscalar power anyway. The AMD chip is still disadvantaged by the fact that it does not have SMT. Considering that most server apps have ample threads and that virtualization makes it easier to load each logical CPU up to 80%, that remains a hard-to-close gap. Secondly, many of these applications do not fit entirely in the cache, so the fact that AMD's memory latency is up to 40% higher is not helping either. Thirdly, all top Xeons (2.66 GHz and higher) are capable of adding 2 extra speedbins even if all 4 cores are busy (as was the case in SAP). It will be interesting to see how much power this costs, and whether Turbo mode is possible with an 80% loaded virtualized machine.
 
In a nutshell: expect Nehalem with its ample bandwidth and EPT to do very well in VMmark. However, we think that AMD might stay in the slipstream of the Intel flagship in some virtualization setups. It is possible that AMD counters with an even better optimized memory controller in Istanbul, but it is going to be tough.
 
Return to Linpack
The benchmarks where AMD will be able to stay close should have no use for massive amounts of memory bandwidth, SMT or Turbo mode. Feel free to educate us, but so far we have only found one benchmark that fits this profile: Linpack. Linpack probably achieves one of the highest IPC rates of any software. That means the Nehalem Xeon will be consuming peak power, and will not be able to use Turbo mode. Linpack (with MKL or ACML) is also so carefully optimized that it runs almost completely in the caches, and SMT or hyperthreading only disturbs the carefully placed code lines. Considering that a 2.7 GHz Shanghai CPU with registered RAM was only a tiny bit slower than a Nehalem CPU with non-registered RAM, you may expect to see both CPUs very close in this benchmark.
 
Outlook to 2009
The AMD quadcore is now the server CPU to get, but it is not going to stay that way very long. Until AMD comes up with SMT or another form of multi-threading and a faster memory controller, Intel's newest platform and CPU will force AMD to make the quadcore Opteron very cheap. We expect that the AMD quadcore will only be competitive in Linpack and some virtualization scenarios.
 
And unless Istanbul has a very nice surprise for us, it is not going to change soon. Agreed, to our loyal readers, this does not come as a surprise...
 


February 12, 2009, 17 comments
  February 11, 2009

Nehalem Xeon EP update: too good but true
blog post by Johan De Gelas
We were quite amazed, even slightly suspicious, when HP and Fujitsu-Siemens published their SAP numbers. These numbers showed that the newest Xeon X5570 (Nehalem EP) series offers an enormous performance boost over the Xeon X5470 (Harpertown). After all, an almost 100% improvement at a slightly lower speed (2.93 GHz vs 3.3 GHz) is nothing short of amazing. It turns out that the real clockspeed is 3.2 GHz (2.93 GHz + 266 MHz turbo), but that does not alter the fact that these are truly incredible performance numbers.

I can now confirm that there are no tricks behind these numbers: they paint the right picture about the Xeon Nehalem EP. Talking to SAP benchmarking specialists, it became clear that few tuning tricks exist that are not known to the big OEMs. The benchmark has been analyzed and tuned so well that even the use of a different database (for example MS SQL instead of DB2) only makes a 2 to 3% difference most of the time. So you might even compare SAP numbers which were obtained on different databases. To sum up, the SAP numbers can only really be boosted by better hardware (CPU and memory).
 
Now why am I talking so much about SAP benchmarking numbers? It is not like the expensive ERP software is run by everyone.
 
Well, the SAP numbers are showing a dual 2.93 GHz (or 3.2 GHz) Xeon beating the only quad AMD 8384 (Shanghai at 2.7 GHz) score of 22000 we have so far. Granted, a blade server is most of the time a bit slower. But four AMD 8384 2.7 GHz will be in the same league as a dual Xeon X5570, which will be out very soon now.
 
Even worse for AMD is that the SAP benchmark is not some exotic, exceptional benchmarking case for the Xeon 55xx series. It should be no surprise that the HPC numbers will be very impressive too. So it looks like AMD is in a tough spot.
 
What happened? 
As the SAP threads are sharing a lot of data (as is typical for this kind of database driven application), hyperthreading cannot be the only explanation why Nehalem is simply doubling performance and annihilating the competition. SAP benchmarking specialists expect hyperthreading to be good for about one third of the performance boost. We tend to believe these people, who have been running this benchmark for years now. The reason why it is not one of the "top cases" for hyperthreading on Nehalem is that this OLTP based benchmark spends a lot of time on shared data. Our own Nehalem OLTP benchmarking (Oracle and MySQL) also points in that direction.
 
As we have pointed out before, the benchmark also
  • responds very well to low cache and memory latency
  • does not care too much about memory bandwidth
  • and is very sensitive to "syncing latency".
Since the AMD Shanghai CPU has the same fast way to sync between cores (via the L3-cache) as Nehalem, that cannot explain why AMD falls behind. Another explanation is of course that these benchmarks are run on a CPU which uses turbo, which explains about a 5% advantage as the Nehalem CPU actually runs at 3.2 GHz.
 
Nehalem has faster access to the memory than AMD's latest quadcore (70 ns vs 110 ns), which is probably the second reason why Shanghai falls behind. But AMD will probably have to redesign its integer execution pipeline significantly before it will catch up with Nehalem (think memory disambiguation for example). Basically, AMD's better platform (NUMA with integrated memory controllers) was hiding this disadvantage. Now that the new Intel platform does not put "the brakes" on the integer execution engine anymore, the superiority of Intel's integer engine is showing.
 
The lack of any form of multi-threading is hurting AMD badly. It is well known that most of these business applications achieve very low IPC (0.2-0.6) and that modern superscalar CPUs have ample execution resources for running two threads in these applications. The result is that Simultaneous Multi Threading offers a typical 20 to 40% performance advantage. And that is huge, considering that you need 25 to 50% more clockspeed to counter that. It is basically a mission impossible for a modern CPU without SMT to outperform a similar superscalar CPU with SMT in OLTP, Java, webserver, rendering and ERP workloads. AMD really dropped the ball there: SMT should have been part of the K10 architecture.
 
Difficult times ahead for AMD
Even if AMD is able to speed up beyond 3 GHz, chances are slim that AMD will be able to compete with the new Nehalem Xeons. Add Turbo mode, hyperthreading, a lower latency memory controller and a better integer core together and you get a performance gap the size of the "Grand Canyon".
 
So does AMD have any chance at all beyond a new architecture in 2011? Is it over and out for AMD in 2009 and 2010? Adding 2 cores at the end of 2009 is a good step in the right direction. But even if AMD executes flawlessly, the 32 nm Xeon Westmere will only give a window of a few months to the AMD hexacore "Istanbul". Istanbul should appear at the end of 2009; the Westmere Xeon is scheduled for very early 2010.
 
Westmere has few performance optimizations; it seems to be a pretty straightforward shrink: slightly higher clockspeeds, about 20% lower power consumption, and yet another addition to the ridiculously long list of SSE instructions in the form of seven new instructions (six of them for crypto/AES acceleration). Westmere is only an evolutionary step forward, but the "Grand Canyon" gap that Nehalem EP has made is probably large enough.

 

We will certainly see better (lower) switching times from virtual machine to hypervisor and some small tweaks in AMD's Istanbul CPU, but it remains unclear if there are any significant performance boosters in the core. So it looks like Intel will own the dual socket space throughout 2009 and 2010, if we may believe the current roadmaps.
 
As the SAP numbers indicate, even the slowest Intel Xeons will show a large performance gap with the best AMD Opterons. Is AMD doomed completely? In a large part of the market, yes. AMD's Istanbul will make the gap a bit smaller but probably not small enough.
 
There are some unknown factors that, together with one of the few remaining weaknesses (or rather, less strong points) of Nehalem, might allow AMD's Opteron to come close enough in a particular area of the market. In my next post, I will clarify the one and only opportunity that I see for AMD in the next two years. Until then, don't shoot the messenger :-).

February 11, 2009, 35 comments
  December 16, 2008

Intel Xeon 5570: Smashing SAP records (scoop!)
blog post by Johan De Gelas
We have emphasized it more than once: the Nehalem architecture is all about regaining the performance crown in servers and HPC; desktop and mobile use were sometimes a bonus, sometimes an afterthought. Today it becomes almost painfully obvious. Just read Anand's thoughts about the Core i7:
 
"The Core i7's general purpose performance is solid, you're looking at a 5 - 10% increase in general application performance at the same clock speeds as Penryn"
and now look at the graph below.

 
Intel has apparently allowed HP and Fujitsu-Siemens to break the NDA on the Xeon 5570 processor for PR reasons, as both companies have published SAP numbers on a dual Xeon 5570. The Xeon 5570 is based on the same architecture as the Core i7. It is a 2.93 GHz quadcore CPU with four 256 KB L2 caches and one huge shared 8 MB L3.
 
 
SAP Sales & Distribution 2 Tier benchmark
 
The SAP numbers are absolutely astonishing, as Intel's dual socket system is able to outperform quad socket Opteron machines. Based on the scaling of Barcelona, we speculate that a quad Shanghai at 2.7 GHz would obtain the performance of the dual Xeon 5570 without HT. The new Xeon 5570 outperforms the "old" 5450 by 119%!
 
These numbers are so high, that we checked and checked again. The database used is the same (SQL Server 2005), so unless there is some incredible tuning parameter that HP and FS have discovered and that we have yet to hear about, that is not it.
 
At this point we have no idea how it is possible that a 3 GHz Nehalem outperforms the latest Opteron by a margin as high as 80% and more. But we can give it a try. In a previous server oriented article, we summed up a rough profile of SAP S&D:

• Very parallel resulting in excellent scaling
• Low to medium IPC, mostly due to “branchy” code
• Not really limited by memory bandwidth
• Likes large caches
• Sensitive to Sync (“cache coherency”) latency
 
One of the biggest bottlenecks for Intel has been the sync latency. It is possible that once the "sync" bottleneck was removed, the Intel architecture was able to show its real integer crunching power thanks to the out of order loads (memory disambiguation) and better branch prediction. Those are two areas where the Opteron architecture is still weak.
 
The slightly lower latency of the L3-cache of Nehalem helps too. This kind of software also makes the buffers fill up due to the long dependency chains. Those OOO buffers have been increased and the dependency chains have been shortened by a very low latency L2 cache and relatively fast L3.
 
Still, we are absolutely amazed that the difference is this large. We would have expected Nehalem to outperform Shanghai by lower margins. Although we are still a bit skeptical that the difference is this large ("too good to be true" syndrome), we do not see how you could artificially inflate an SAP benchmark. It sure is not as easy as SPECjbb or SPECfp/int.
 
 
Update (a few hours later): It seems that the SAP page was wrong about HT. It reported 8 threads on 8 cores on the Fujitsu Siemens Primergy Server. The certification page says otherwise: 16 threads on 8 cores. So hyperthreading (SMT) probably plays an important role in this benchmark, as the SAP application has very low IPC and is very parallel. So this completely annihilating performance comes from combining a wide superscalar CPU with an excellent Simultaneous Multithreading implementation. Hats off to the Intel engineers...
 
 
 

December 16, 2008, 29 comments
  December 1, 2008

LINPACK: Nehalem vs Shanghai part 2
blog post by Johan De Gelas
The last post generated some very interesting comments and questions, which I wanted to address. Unfortunately, some people misinterpreted the post as a "the best scores Nehalem and Shanghai can get in Linpack" review.
 
So let me make this very clear: this and the previous blogpost are not meant to be a "buyer's guide". The Nehalem desktop system and AMD "Shanghai" server are completely different machines, targeted at totally different markets. Normally, we should wait for the Xeon 5500 to run this kind of benchmark, but consider this a preview out of curiosity.
 
Secondly, we were not trying to get the highest possible LINPACK scores on both architectures. We wanted to use one binary which has good optimizations for both AMD and Intel CPUs. Fully optimized binaries won't even run on the other CPU. Our only goal is to get an idea of how the Nehalem and Shanghai architectures compare when running a LINPACK-like binary which is optimized to run on all machines.
 
Thirdly, this is not our review of course. This is a blogpost which talks about some of the tests we are doing for the review.
 
MKL on AMD?
Using the Intel Math Kernel Libraries on an AMD CPU is of course a good way to start some heavy debates. As I pointed out in the last blogpost however, in some cases, the slightly older MKL versions still do a very good job on AMD CPUs when you benchmark with low matrix sizes. You don't have to take my word for it of course.
 
Compare the Intel Linpack 9.0 (available mid 2007) with the binary that AMD produced at the end of 2007. AMD made a K10-only version using ACML version 4.0.0, compiling Linpack with the PGI 7.0.7 compiler (with the following flags: pgcc -O3 -fast -tp=barcelona-64).
 
All the benchmarks below are done on one CPU with 4 GB (AMD, Intel Xeon) or 3 GB (Intel Core i7). Speedstep, Powernow! and Turbo mode were disabled. 
 
LINPACK version 2007
 
As predicted, the ACML binary which was compiled with a 2007 compiler is slower than the MKL "2007" version, also compiled in 2007. The MKL version runs on any CPU that has support for (S)SSE-3, so it continues to be a very interesting one for us to test. As you can clearly see from the Xeon 5472 (3 GHz) score, it is not fully optimized for the latest 45 nm Intel CPUs with SSE-4. It is a good "not too optimized" version which can be used on both Intel and AMD CPUs. You can clearly see this as the 3 GHz Xeon 5472 is behind the AMD Opteron 8384. If this Intel binary was giving the AMD CPUs a badly optimized code path, this would not be possible.
 
As we move forward to 2008, we have to create a new binary as both AMD's and Intel's fully optimized Linpack versions will not run on the competitor's CPU. Intel released the Linpack benchmark version 10.1, which is not fully optimized for the "Nehalem" architecture, but for the 45 nm "Harpertown" family.
 
AMD has created a new Linpack binary using ACML 4.2 and the PGI 7.2-4 compiler.  Below you see how the two CPUs compare.
 
LINPACK version late 2008
 
The bottom line is that these LINPACK benchmarks are moving targets like the SPEC CPU benchmarks, as the compilers and libraries used are just as important as the CPUs. When the Xeon 5500 materializes, LINPACK performance will probably be higher, as the current binary is built for the "Penryn/Harpertown" family.
 
While it is useful for the HPC people to see which CPU + compiler can offer the best performance, it is also interesting to understand what kind of performance you get when you compile binaries that have to run on all current CPUs. It is pretty hard to compare CPU architectures if you are using totally different binaries.
 
In the next post we'll delve a bit deeper into what is happening with Hyperthreading, Linpack and the new architectures.

December 1, 2008, 35 comments
  November 28, 2008

LINPACK: Intel's Nehalem versus AMD Shanghai
blog post by Johan De Gelas
A "beta BIOS update" broke compatibility with ESX, so we had to postpone our virtualization testing on our quad CPU AMD 8384 system.
 
So we started an in-depth comparison of the 45 nm Opterons, Xeons and Core i7 CPUs. One of our benchmarks, the famous LINPACK (you can read all about it here), painted a pretty interesting performance picture. We had to test with a matrix size of 18000 (2.5 GB of RAM necessary), as we only had 3 GB of DDR-3 on the Core i7 platform. That should not be a huge problem as we tested with only one CPU. We normally need about 4 GB for each quadcore CPU to reach the best performance.
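For those who want to check the relation between matrix size and memory, here is a rough sketch. It assumes LINPACK stores one N x N matrix of double-precision (8 byte) values and ignores everything else the benchmark and the OS need; the function names are ours.

```python
import math

# Assumption: the dominant memory user is one N x N matrix of 8-byte doubles.
BYTES_PER_DOUBLE = 8


def matrix_bytes(n: int) -> int:
    """Memory needed for an n x n matrix of doubles, in bytes."""
    return n * n * BYTES_PER_DOUBLE


def largest_n(memory_bytes: int) -> int:
    """Largest matrix size whose n x n doubles fit in `memory_bytes`."""
    return math.isqrt(memory_bytes // BYTES_PER_DOUBLE)


n = 18000
print(f"N = {n}: {matrix_bytes(n) / 1e9:.2f} GB for the matrix alone")

# Upper bound on N if the matrix had all of the installed RAM to itself.
for installed_gib in (3, 4):
    limit = largest_n(installed_gib * 1024 ** 3)
    print(f"{installed_gib} GiB installed: N can be at most {limit}")
```

The matrix alone lands around the 2.5 GB quoted above, which leaves only a few hundred MB for the OS on a 3 GB system, so 18000 is close to the practical limit there.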
 
We also used the 9.1 version of Intel's LINPACK, as we wanted the same binary on both platforms. As we have shown before, this version of LINPACK performs best on both AMD and Intel platforms when the matrix size is low. The current 10.1 version does not work on AMD CPUs unfortunately.
 
We don't pretend that the comparison is completely fair: the Nehalem platform uses unbuffered RAM which has slightly lower latency and higher bandwidth than the Xeon "Nehalem" will get. But we had to satisfy our curiosity: how does the new "Shanghai" core compare to "Nehalem"?
 

 
 LINPACK

 
Quite interesting, don't you think? Hyperthreading (SMT) gives the Nehalem core a significant advantage in most multi-threaded applications, but not in Linpack: it slows the CPU down by 10%. Might we have found the first multi-threaded application that is slowed down by Hyperthreading on Nehalem? That should not spoil the fun for Intel though, as many other HPC benchmarks show a larger gap. AMD has the advantage of being first to the market: Nehalem based Xeons are still a few months away.
 
Also, the impact of the memory subsystem is limited, as a 50% increase in memory speed results in a meager 6% performance increase. The Math Kernel Libraries are so well optimized that the effect of memory speed is minimized. This is in great contrast to other HPC applications where the triple channel DDR-3 memory system of Nehalem really pays off. More later...
 
 

November 28, 2008, 61 comments
  November 9, 2008

MySQL and the power of Intel SSDs
blog post by Johan De Gelas
More and more of the database vendors are talking about the wonders that SSDs can do for transactional (OLTP) databases. So I read Anand's latest SSD article with more than usual interest. If many of the cheaper MLC SSDs write small blocks 20 times slower than a decent harddrive, these SSDs are an absolute nightmare for OLTP databases.
 
In our last Dunnington review, we showed our latest virtualization test which includes 4 concurrent OLTP ("Sysbench") tests on four separate MySQL 5.1.23 databases in four ESX virtual machines. We were fairly confident that our 6-disk RAID-0 for data and 1 separate disk for logging were capable of keeping up. After all, each disk is a 300 GB Seagate Cheetah at 15000 rpm, probably one of the fastest (mechanical) disks on this planet, as it can deliver up to 400 I/Os per second (and a 125 MB/s sequential data rate).
 
But it is better to be safe than to be sorry. We did extensive monitoring with IOstat (on a "native" SLES 10 SP2) and found the following numbers on the disk that performs the logging transactions: 
  • queue length is about 0.22 (more than 2 indicates that the harddisk cannot keep up)
  • typical average I/O latency is 0.23 ms (90%), with about 10% spikes of 7 to 12 ms (we measure the average over the past 2 seconds) 
That reassured us that our transaction log disk was not a bottleneck. On a "normal" SLES 10 SP2 we achieved 1400 tr/s on a quad core (an anonymous CPU for now ;-). But Anand's article really got us curious and we replaced our mighty Cheetah disk with the Intel X25-M SSD (80 GB). All of a sudden we achieved 1900 tr/s! No less than 35% more transactions, just by replacing the disk that holds the log with the fastest SSD of the moment. That is pretty amazing if you consider that there was no indication whatsoever that we were bottlenecked by our log disk.
 
So we had to delve a little deeper. I first thought that as long as the harddisk is not the bottleneck, the number of transactions would be more or less the same with a faster disk. It turned out that I was somewhat wrong. 

In MySQL each user thread can issue a write when the transaction is committed. More importantly, this write is completely serial: there doesn't seem to be a separate log I/O thread which would allow our user thread to "fire" a disk operation "and forget". As we want to be fully ACID compliant, our database is configured with
 innodb_flush_log_at_trx_commit = 1
 
So after each transaction is committed, there is a "pwrite" first, followed by a flush to the disk. So the actual transaction performance is also influenced by the disk write latency, even if the disk is nowhere near its limits.
 
We still have to investigate this further, but this seems to go a bit against the typical sizing advice that is given for OLTP databases: make sure your log disks achieve a certain number of I/Os, or put otherwise: "make sure you have enough spindles". That doesn't seem to paint the complete picture: as each write to disk seems to be in the "critical speed path" of your transaction, each individual access latency seems to influence performance.
 
We monitored the same Sysbench benchmark on our Intel X25-M disk: 
  • queue length is lower: 0.153 (but 0.22 was already very low)
  • typical access latency: an average of 0.1 ms with very few spikes of 0.5 ms
  • 1900 instead of 1400 tr/s
So our conclusion so far seems to be that in the case of MySQL OLTP, sizing for I/Os per second seems to be less important than the individual write latency. To put it more bluntly: in many cases even tens of spindles will not be able to beat one SSD, as each individual disk spindle has a relatively high latency. We welcome your feedback!
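To illustrate the reasoning, here is a toy model, not a measurement and certainly not how MySQL works internally. It assumes each user thread runs its transactions one after the other and waits for its own log write at every commit. Only the two latencies (0.23 ms and 0.1 ms) and the 1400 and 1900 tr/s figures come from our tests; `CPU_TIME` and `THREADS` are made-up values for the sake of the example.

```python
def throughput(threads: int, cpu_time: float, log_latency: float) -> float:
    """Transactions per second if every commit waits for its own log write."""
    return threads / (cpu_time + log_latency)


def log_disk_busy(tps: float, log_latency: float) -> float:
    """Fraction of time the log disk is busy, assuming one write per commit."""
    return tps * log_latency


CPU_TIME = 0.3e-3  # hypothetical: 0.3 ms of CPU work per transaction
THREADS = 1        # hypothetical: a single user thread

# Latencies are the typical values we measured on the log disk.
for name, latency in (("15000 rpm disk", 0.23e-3), ("SSD", 0.10e-3)):
    tps = throughput(THREADS, CPU_TIME, latency)
    busy = log_disk_busy(tps, latency)
    print(f"{name}: {tps:.0f} tr/s, log disk busy {busy:.0%}")

# The gain we actually measured when swapping the log disk for the SSD.
measured_gain = (1900 - 1400) / 1400
print(f"measured gain: {measured_gain:.1%}")
```

In this model the log disk is never saturated, yet lowering its latency still raises throughput, because the write sits in the serial path of every transaction. Adding spindles would raise the number of I/Os per second available, but not shorten that path.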
 
 
 



November 9, 2008, 33 comments
  October 7, 2008

The fun side of virtualization: virtualized gaming and hyper-V benchmarking
blog post by Johan De Gelas
Want to get rid of Windows (Vista) and run DirectX games on Linux? Found the recent virtualization articles just a tad too "nuts and bolts"? Or do you wonder how well Microsoft's Hyper-V performs compared to VMware's ESX? It is all cooking in our IT lab.
 
Liz will explain in layman's terms why virtualization is fun and interesting, and will help you see the virtual wood for the trees. This upcoming article should give those of you taking your first steps with virtualization a strong base knowledge of the technology, and is a good prelude to our in-depth articles. Of course, all work and no play would make Anandtech IT a dull website, so the article will also look at some of the fun bits virtualization has introduced to the world of the desktop user. Half-Life 2 running on Linux or Mac OS X? Runs fine and relatively fast!
 
Back to work. Microsoft has made a big splash with Hyper-V, so we could not resist: we had to include it in our long awaited hypervisor comparison. Hyper-V is a very interesting technology: it is a mix of paravirtualization and hardware virtualization. Surprisingly, Hyper-V fully supports Linux: the "Linux integration tools" (a paravirtualized driver pack) are available for several Linux distributions. There is one catch: SMP does not work (yet?). In other words, Windows 2008 guests can work with up to 4 virtual CPUs, while Linux guests have to be content with only one. Officially, Windows 2003 only supports 2, but we found that running with 4 virtual CPUs is not a problem at all (in contrast with Linux: more than one CPU will simply not work on Hyper-V).
 
The Hyper-V team went to great lengths to paravirtualize Windows 2008; less effort was spent on Windows 2003. Remember the Oracle OLTP test, the MySQL decision support database test and the heavy PHP website that we ran together?
 
Well, we noticed that
  • the MySQL DSS ran 2% faster
  • the heavy PHP website ran 3-7% faster
  • and the OLTP Oracle database ran a tangible 18% faster
if we run those real-world workloads on Windows Server 2008 instead of Windows Server 2003. We did not alter anything but the operating system: for example, the PHP site was still running on an IIS6 webserver instead of IIS7 (which is standard on Windows 2008). How does it compare to ESX? Well, we'll report our full results soon. It is extremely interesting how the picture changes from application to application. Intel or AMD? It can make a difference in the hypervisor race.
 
Quick note to our Dutch and Flemish readers: we will be presenting our virtualization research live and in great detail on October 23rd, together with VMware, Microsoft and Novell. Some of the greatest hardware will also be present (HP Blades, HP EVA storage array, Intel "Dunnington" and maybe more we can't talk about :-). Look for more details here (Dutch) or here (English).
 

 
 
 

October 7, 2008, 20 comments
  September 15, 2008

IT portal update: Nehalem, women and heavy benchmarking
blog post by Johan De Gelas
Updated (22/09/2008): Our Dunnington review will be online tomorrow.

After seeing how many reactions Derek got to his now famous post about "female readership at Anandtech", I thought that this title would definitely draw your attention :-). Anyway, one of the suggestions that Derek got was that Anandtech should hire female editors for the female point of view. Those people really need to read it.anandtech.com more: we have Liz writing some heavy and well written articles here, such as container based virtualization and software virtualization.
 
So, that took care of the "women" part of this blogpost. Another suggestion that was posted was that we should deliver what we have promised you. And indeed it.anandtech.com promised you an in-depth comparison of the hypervisors out there such as Hyper-V, ESX and Xen. We delivered only a few virtualization benchmarks so far. What happened? Well, after we did so much testing, we worked with the performance teams of VMware and Microsoft and after some time we ironed out a few mistakes that we made. So that made us rerun a whole battery of benchmarks and tests, but that is the price you have to pay when you start testing something which is new to you. So expect a full hypervisor comparison late this month. Why this late? Because a very important launch of CPUs caused us to shift our focus: we want to deliver some benchmarks on those new CPUs first. If all goes well, you should see those spanking new CPUs benchmarked in a virtualized setup this week. I believe it will be a very new and interesting experience to see CPUs tested this way.
 
Last but not least, I'll be off to London this Wednesday to meet Ronak Singhal. If, after Anand's excellent article, you still have an excellent technical question for Chief Architect Ronak Singhal about Intel's Nehalem, I'll be glad to ask him. Understand that I probably only have time for the very best questions, but please feel free to either send me your question or post it here.
 
Basically, check out our IT portal regularly in the coming days and weeks: you'll be freed of all that gaming stuff on the front page (*), and rest assured we'll show you some interesting things!
 
(*) Just Joking ;-)
 
 
 
 

September 15, 2008, 23 comments


Copyright © 1997-2009 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.