The million dollar question: how do you upgrade your datacenter
by Johan De Gelas on April 7, 2009 12:00 AM EST - Posted in IT Computing general
In our last article about server CPUs, I wrote:
"the challenge for AMD and Intel is to convince the rest of the market - that is 95% or so - that the new platforms provide a compelling ROI (Return On Investment). The most productive or intensively used servers in general get replaced every 3 to 5 years. Based on Intel's own inquiries, Intel estimates that the current installed base consists of 40% dual-core CPU servers and 40% servers with single-core CPUs."
At the end of his presentation, Pat Gelsinger (Intel) makes the point that replacing nine servers based on the old single-core Xeons with one Xeon X5570-based server will result in a quick payback. According to Intel, the lower energy bill will pay back your investment in 8 months.
Why these calculations are quite optimistic is beyond the scope of this blog post, but suffice it to say that SPECjbb is a pretty bad benchmark for ROI calculations (it can be "inflated" too easily) and that Intel did not consider the amount of work it takes to install and configure those servers. However, Intel does have a point that replacing the old power-hungry Xeons (irony...) will deliver a good return on investment.
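To make the payback reasoning concrete, here is a minimal Python sketch of an energy-only payback calculation. Every number in it (wattages, prices, the cooling overhead factor, the labor cost) is a made-up assumption for illustration, not a figure from Intel's presentation, and the function names are hypothetical. It only shows how adding the installation and configuration work to the bill stretches the payback period.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_energy_cost(watts, price_per_kwh, overhead=1.0):
    """Monthly electricity cost of a constant load.

    overhead is a multiplier for cooling and power distribution
    (an assumption; 1.0 means no overhead is counted).
    """
    return watts / 1000 * HOURS_PER_MONTH * price_per_kwh * overhead


def payback_months(upfront_cost, old_watts, new_watts, price_per_kwh, overhead=1.0):
    """Months until energy savings cover the upfront cost, or None if nothing is saved."""
    savings = monthly_energy_cost(old_watts - new_watts, price_per_kwh, overhead)
    if savings <= 0:
        return None
    return upfront_cost / savings


# Hypothetical inputs, chosen only to illustrate the calculation.
old_watts = 9 * 400      # nine old servers at an assumed 400 W each
new_watts = 1 * 300      # one new server at an assumed 300 W
hardware_cost = 5000.0   # assumed price of the new server
labor_cost = 2000.0      # assumed cost of installing and configuring it
price_per_kwh = 0.10     # assumed electricity price
overhead = 2.0           # assumed cooling/distribution multiplier

hardware_only = payback_months(hardware_cost, old_watts, new_watts, price_per_kwh, overhead)
with_labor = payback_months(hardware_cost + labor_cost, old_watts, new_watts, price_per_kwh, overhead)

print(f"Payback, hardware only: {hardware_only:.1f} months")
print(f"Payback, including labor: {with_labor:.1f} months")
```

The result is very sensitive to the assumed wattages and electricity price, which is exactly why a single headline payback number deserves some scrutiny.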
In contrast, John Fruehe (AMD) is pointing out that you could upgrade dual-core Opteron based servers (the ones with four digits in their model numbers and DDR-2) with hex-core AMD "Istanbul" CPUs. I must say that I have encountered few companies that would actually bother upgrading CPUs, but his arguments make some sense as the CPU will still use the same kind of memory: DDR-2. As long as your motherboard supports it, you might just as well upgrade the BIOS, pull out your server, replace the 1 GB DIMMs with 4 GB DIMMs and replace the dual-cores with hex-cores instead of replacing everything. It seems more cost effective than redoing the cabling, configuring a new server and so on...
There were two reasons why few professional IT people bothered with CPU upgrades:
- You could only upgrade to a slightly faster CPU. Upgrading a CPU to a higher clocked, but similar CPU rarely gave any decent performance increase that was worth the time. For example, the Opteron was launched at 1.8 GHz, and most servers you could buy at the end of 2003 were not upgradeable beyond 2.4 GHz.
- You could not make use of more CPU performance. With the exception of the HPC people, higher CPU performance rarely delivered anything more than even lower CPU percentage usage. So why bother?
AMD also has a point that both things have changed. The first reason may no longer be valid if hex-cores do indeed work in a dual-core motherboard. The second reason is no longer valid as virtualization allows you to use the extra CPU horsepower to consolidate more virtual servers on one physical machine. On the condition, of course, that the older server allows you to replace those old 1 GB DIMMs with a lot of 4 GB ones. For example, I checked the HP DL585 G2, and it does allow up to 128 GB of DDR-2.
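A rough Python sketch shows why the memory condition matters for consolidation. The host layout (four sockets, 32 DIMM slots), the VM size, the overcommit ratio and the RAM reserved for the hypervisor are all assumptions picked for illustration; only the 128 GB ceiling comes from the text above. The function name is hypothetical.

```python
def vm_capacity(cores, ram_gb, vcpus_per_vm, ram_per_vm_gb,
                vcpus_per_core=2.0, host_ram_gb=4):
    """Return (number of VMs that fit, limiting resource).

    vcpus_per_core is an assumed CPU overcommit ratio and host_ram_gb an
    assumed amount of RAM reserved for the hypervisor itself.
    """
    by_cpu = int(cores * vcpus_per_core // vcpus_per_vm)
    by_ram = int((ram_gb - host_ram_gb) // ram_per_vm_gb)
    limit = "cpu" if by_cpu < by_ram else "ram"
    return min(by_cpu, by_ram), limit


SOCKETS = 4      # assumed four-socket server
DIMM_SLOTS = 32  # assumed slot count (32 x 4 GB = 128 GB)

# Assumed VM profile: 2 vCPUs and 4 GB of RAM each.
original = vm_capacity(SOCKETS * 2, DIMM_SLOTS * 1, 2, 4)  # dual-cores, 1 GB DIMMs
cpu_only = vm_capacity(SOCKETS * 6, DIMM_SLOTS * 1, 2, 4)  # hex-cores, 1 GB DIMMs
cpu_ram = vm_capacity(SOCKETS * 6, DIMM_SLOTS * 4, 2, 4)   # hex-cores, 4 GB DIMMs

print("original:        ", original)
print("CPU upgrade only:", cpu_only)
print("CPU + memory:    ", cpu_ram)
```

Under these assumptions, swapping only the CPUs leaves the host memory-bound, so the extra cores buy no additional virtual servers until the DIMMs are replaced as well.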
So what is your opinion? Will replacing CPUs and adding memory to extend the lifetime of servers become more common? Or should we stick to replacing servers anyway?
23 Comments
JohanAnandtech - Saturday, April 11, 2009 - link
I have given some counterpoints, but let me thank you for your excellent feedback! This is the kind of discussion that will enlighten the it.anandtech.com community.
has407 - Sunday, April 12, 2009 - link
Thanks! Glad to be able to constructively contribute to the discussion. To clarify some of the earlier points...
Most CFOs I've encountered have a pretty good understanding of technology, and will work very hard to try and make things work. However, realize that they have a pretty rigid set of very visible metrics they are judged by (much more rigid and visible than most people), and often end up stuck mediating between competing interests. Like most people, they're also trying to execute according to a plan. And while situations and plans change, knowing what you have to juggle with and how much can be juggled without throwing everything out of kilter is important.
That's one reason predictability is important--but that doesn't necessarily mean rigidity. Another reason is that predictability is a first-order indication of whether you know what you're doing and can execute to a plan--whether it's cost, schedule or defects. Where IT can help the CFO is to better understand how to juggle the expenses associated with various parts, and the tradeoffs. That starts with understanding how various parts of the IT budget fit into the financial equation, the tradeoffs, and ultimately how it all shows up in the financial statements.
Much of this depends on a company's financial priorities at any given point--and those priorities are likely to change over time. E.g., at one company EBITDA was the priority; after that, net was the priority; after that, cashflow was the priority. That was a company with a fairly heavy up-front capital infusion. At another company cashflow was the priority (few investors and not a lot of cash cushion). Those priorities are also typically related to where a company is in its lifecycle; specifically, the exit strategy. For companies looking at acquisition as the exit strategy, how the company is valued will likely make a big difference (revenue? gross profit? operating profit? net profit?).
Whether extending equipment life makes sense depends in part on those priorities. A CFO may be willing to take a hit to net if it improves cashflow. OTOH, a company with good cashflow may be looking to trade some of that for an improvement to gross or net. This is also where virtualization can make a big difference, as the options for extending equipment life are considerably greater. (Whether appropriate is another matter, and is dependent on the organization.) E.g., instead of a bunch of discrete systems (X server, Y server, Z server), you have a pool and can operate more like a utility. Some of the systems kept in service might not be the most efficient, but in many cases they are still cost-effective. (NB: Google's MO is an extreme example of this.)
Depreciation doesn't necessarily make extending the life of equipment unattractive per se. However, the rules tend to have an influence. E.g., maintenance contracts tend to get more expensive over time not simply due to equipment age, but because of decreasing demand that is arguably a result of those accounting rules; in many cases maintenance is unavailable or prohibitively expensive beyond 5 years. However, if the IRS decided tomorrow the maximum depreciation rate for IT was 5 years, I'd bet you'd see maintenance available for most equipment for at least 7 years. That doesn't mean the equipment isn't useful after that time, but no company I know of has any hardware or software running in a critical capacity that isn't under maintenance--and when maintenance is no longer available, it gets dumped or sent to be a lab rat.
That said, big organizations tend to have more options. They can do self-maintenance. They can negotiate maintenance or lease deals with a lot more options. Most SMBs don't have those options. You're in a 3-year lease for that equipment? Then while you may acquire new equipment, I guarantee it's not going to replace the existing equipment until the lease on the current equipment is up. Unless you're exceptionally strapped for power or space and paying exorbitant rates, the lease payments (and the net hit) of those systems now sitting unused in the closet will dwarf any savings. (And thus Intel's ROI and 9-in-1 claims are ultimately hollow, even if true, but that's another subject.)
In short, this isn't magic... basic calculations and numbers. However, understanding what those numbers mean to different people, and the priorities and tradeoffs--as in most problems--is the trick. But this is not fundamentally different than many problems engineers deal with every day.
JohanAnandtech - Saturday, April 11, 2009 - link
First, I admit that I know very little about corporate financials, but I am learning the basics.
"Depreciation. That allows us to write off the equipment. All the costs, including maintenance, can then be amortized. CFO's like that. "
Agreed. But does writing off equipment make extending the life of equipment unattractive? AFAIK, writing off means you want to lower the result of the company and pay less taxes. But there are probably limits to its usefulness? (In most European countries this is the case IIRC, don't know about the IRS)
"power savings are a drop in the bucket compared to the cost."
Not if you need to install another airco or more power lines because you are hitting some limits :-).
"CFO values predictability"
Ok. But it all sounds like a very static, rigid model. Just because it looks good in the accountant's books, is it really good for the company? I don't want to generalise, but the CFO should, just like the CTO, be there to serve the business goals and not the other way around.
" more people invested time in learning to read a financial statement and understanding the business parameters, rather than simply focusing on speeds and feeds."
True. Some basic knowledge helps. But the same is true for the CFO :-)
mlambert - Saturday, April 11, 2009 - link
I should've read your post before replying with a new one. You hit most of the key points fairly accurately.
has407 - Thursday, April 9, 2009 - link
Instead of replacing entire units, virtualization makes upgrading existing units more feasible and justifiable.
The configurations of our last cycle were chosen with an eye towards a mid-life CPU/memory upgrade, with rolling upgrades... move the workload off those servers, upgrade them, then put them back in the pool. That is much more difficult and time-consuming without virtualization. With virtualization the lifespan of a unit can also be extended... ok, so it's too old and slow to run our OLTP system, but there may still be workloads we can run on it.
That said, what makes sense depends a lot on other factors, including space and how much other than the CPU/memory is part of the equation. E.g., many IT environments have configurations which look very similar to HPC environments: (1) boxes with little more than CPU, memory and network interfaces; (2) network boxes; (3) SAN boxes. In those environments, the difference between upgrading vs. replacing may be much smaller.
tshen83 - Wednesday, April 8, 2009 - link
That a Xeon E5504 at $227 with broken HT and broken Turbo Boost and castrated 800MHz IMC can have the same performance as the Opteron 2384 at $700. You do the math.
AMD foolishly thinks that a 50 dollar cut to selective "channel partners" would tip the balance toward Opteron upgrades. A flat price reduction only works at the low end, making the Opteron 2376 the only CPU worth buying (175-50 = 125 dollars?). At the high end, Opteron 2384/2387/2389, it is hardly a 5-8% price reduction.
I don't know that a price reduction this small will prevent people from jumping to Nehalem-EP let alone the upgradable 32nm Westmere. There are several misconceptions AMD wants people to believe:
1. Nehalem-EP platform is more expensive.
I say BS. A 2S Nehalem-EP board can be found for as little as 250 dollars now, from respectable vendors like Asus and Tyan (Asus Z8NA-D6C, Tyan S7002). An AMD 2S Opteron board is above 300 at most vendors. At the motherboard level, it is about the same.
2. DDR3 ram is more expensive.
Only for 4GB DIMMs. Yes, DDR3 density hasn't caught up with DDR2 yet, but one of the design decisions Intel got right is to support unregistered DDR3 ECC ram, or UDIMMs. 2GB DDR3 UDIMMs are selling for 30 dollars, effectively at price parity with the 2GB DDR2 REG ECC ram that the Opteron uses. A 2S Nehalem can support up to 24GB of UDIMMs for as low as $360 (30*12). If you need more ram for a database, get Dunningtons, which will get you 128GB ram (4GB*32) cheap or 256GB ram (8GB*32) if you can pay for it.
3. DDR3 uses more power.
BS. DDR3-800 at the same speed as DDR2-800 uses 15% less power. The extra power at the higher speed allows the DDR3 to scale to 1333MHz, something DDR2 can't do reliably.
The current pathetic 50 dollar price cut by AMD still doesn't address the fundamental problem that Intel's lowest grade broken Nehalem can be as fast as AMD's highest end Opterons selling for 3 times the price. Even at the same performance, you have to remember that the E5504 is an 80W TDP part, while the Opteron 2384 is a 115W TDP part. Even with the same performance, the E5504 has a 25% advantage over the Opteron 2384 on the performance/watt metric. Let alone performance/watt per dollar.
ko1391401 - Tuesday, April 7, 2009 - link
Depends on how often you're upgrading. Working in the public sector, we can't afford to upgrade often, so when time comes, too much has changed. And often, cost/performance is way in replacement's favor. I'm currently replacing 3-year-old fully-RAM populated 2core socket940 rackmounts running VMWare with half the number of half-RAM populated 4core socketF blades. Still keeping my socket604s going with RAM upgrades, though.
Rigan - Tuesday, April 7, 2009 - link
You missed the third and most important reason, the warranty. Nothing is more important in a machine room than equipment being under easily manageable warranties. Maybe you can get your hardware vendor to replace bits of the machine and extend the warranty of the old parts, but most likely not. And if you do, you'll end up with machines covered by two or more warranties. That's a big mistake. Full replacements every X number of years keep machines under carefully and easily managed warranties.
JohanAnandtech - Tuesday, April 7, 2009 - link
It is a good point. Still, the impact might not be so high depending on your situation. Most warranties are 3 years. So if you extend the life of your server with a CPU/mem upgrade, the warranty is over. However, it is a small risk, as decent manufacturers guarantee spare parts for a period of 5 years.
Only if the motherboard dies will you probably not save much, as replacing the motherboard is quite a bit of work and might steer you towards a new server anyway. All other problems like a dead disk or PSU are easily and quickly fixed. So IMHO, it pays off to work for a few years without warranties (they probably won't cover "normal" wear anyway).
StraightPipe - Wednesday, April 8, 2009 - link
This is entirely dependent on the environment.
For example, I've supported some small businesses where the IT Dept was competent and even built some servers from scratch. Warranties don't add up to much in those environments, except 4 hour replacement parts, those are pretty nice.
For a larger environment, I wouldn't be singing the same tune. I'd be using vendor supported everything. Ensuring responsibility for a crash falls directly to Dell or HP or [anyone but me].