ARM-based servers hold the promise of extremely low power consumption and excellent performance per watt. It's theoretically possible to place an incredible number of servers into a single rack; there are already implementations with as many as 1000 ARM servers in one rack (48 server nodes in a 2U chassis). What's more, all of those nodes consume less than 5kW combined (or around 5W per quad-core ARM node). But whenever a new technology is hyped, it's important to remain objective. The media loves to rave about new trends and people like reading about "some new thing"; however, at the end of the day the system administrator has to keep his IT services working and convince his boss to invest in new technologies.
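
The density and power figures above can be cross-checked with some quick arithmetic. The sketch below is only a back-of-the-envelope estimate: the 42U rack height is an assumption (the text does not give one), and it ignores power drawn by switches, fans, and power supply losses.

```python
NODES_PER_CHASSIS = 48   # from the text: 48 server nodes per chassis
CHASSIS_HEIGHT_U = 2     # from the text: each chassis is 2U
WATTS_PER_NODE = 5.0     # from the text: around 5W per quad-core node
RACK_HEIGHT_U = 42       # ASSUMPTION: a fully populated standard 42U rack


def rack_estimate(rack_u=RACK_HEIGHT_U):
    """Return (chassis count, node count, total node power in watts)."""
    chassis = rack_u // CHASSIS_HEIGHT_U
    nodes = chassis * NODES_PER_CHASSIS
    watts = nodes * WATTS_PER_NODE
    return chassis, nodes, watts


chassis, nodes, watts = rack_estimate()
print(f"{chassis} chassis, {nodes} nodes, {watts / 1000:.2f} kW")
# prints: 21 chassis, 1008 nodes, 5.04 kW
```

Under that assumption, the "1000 servers" and "5kW" figures line up with each other to within rounding.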

At first sight, the relatively low performance per core of ARM CPUs seems like a bad match for servers. The dominant CPU in the server market is without doubt Intel's Xeon. The success of the Xeon family is largely rooted in its excellent single-threaded (or per core) performance at moderate power levels (70-95W). Combine this exceptional single-threaded performance with a decent core count and you get good performance in almost any kind of application. Economies of scale and the resulting price levels are also very important, but the server market has been more than willing to pay a little extra if the response times are lower and the energy bills moderate.

A data point suggesting that single-threaded performance is still important is the evolution of the T-series of Oracle (or Sun if you prefer). The Sun T3 had 16 cores with 128 threads; the T4, however, had only 8 cores with 8 threads each, and CEO Larry Ellison touted more than once that single-threaded performance was massively improved, up to five times faster. Do we really need another server with a flock of slow but energy-efficient cores? Has history not taught us that a few "bulls" are better than "a flock of chickens"?

History has also shown that the amount of memory per server is very important. Many HPC and virtualization applications are limited by the amount of RAM. The current Cortex-A9 generation of ARM CPUs has a 32-bit address bus and does not support more than 4GB.
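
The 4GB ceiling follows directly from the address width. As a minimal sketch (the function names are illustrative, and "GB" in the text is treated as the binary unit, 2^30 bytes):

```python
GIB = 2 ** 30  # one binary gigabyte, which is what "GB" means in the text


def addressable_gib(address_bits):
    """Memory reachable with byte addressing and the given address width."""
    return 2 ** address_bits // GIB


def bits_needed(capacity_gib):
    """Smallest address width that can reach every byte of the capacity."""
    return (capacity_gib * GIB - 1).bit_length()


print(addressable_gib(32))  # prints: 4
print(bits_needed(16))      # prints: 34
```

Every extra address bit doubles the reachable memory, so a limit of 16GB, for example, corresponds to 34 address bits.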

And yet, the interest in ARM-based servers is growing, and there is more to it than just hype. Yes, ARM-based CPUs still lack the number-crunching power of the Xeon and the massive number of DIMM slots that its memory controller can handle, but ARM CPUs score extremely well when it comes to cost and power consumption.

ARM-based CPUs have also made giant steps forward when it comes to performance. To give you a few data points: a dual-core ARM Cortex-A9 at 1.2GHz (Samsung Exynos) introduced in 2011 compresses more than 10 times faster than the typical ARM11-based cores of 2008. SunSpider performance increased by a factor of 20 according to Anand's measurements on the iPhones (though part of that is almost certainly thanks to browser and software optimizations). The latest ARM Cortex-A15 is again quite a bit more powerful, offering about 50% higher performance. The A57 will add 64-bit support and is estimated to deliver 20 to 30% higher performance. In short, single-threaded performance is increasing quickly, and the same is true for the amount of RAM that can be addressed. The ARM Cortex-A9 is limited to 4GB, but the Cortex-A15 should be able to address 16GB, while the A57 will be able to address a lot more.
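
To get a feel for how those generational gains stack up, the sketch below simply compounds them. It rests on two assumptions the text does not state explicitly: that the A15 figure is relative to the Cortex-A9, and that the A57 estimate is relative to the A15. The function name is illustrative.

```python
def compound(gains):
    """Multiply fractional gains (0.5 means +50%) into one overall factor."""
    factor = 1.0
    for gain in gains:
        factor *= 1.0 + gain
    return factor


A15_OVER_A9 = 0.50           # from the text: about 50% higher performance
A57_OVER_A15 = (0.20, 0.30)  # from the text; ASSUMED to be relative to the A15

low = compound([A15_OVER_A9, A57_OVER_A15[0]])
high = compound([A15_OVER_A9, A57_OVER_A15[1]])
print(f"A57 vs. A9: roughly {low:.2f}x to {high:.2f}x")
# prints: A57 vs. A9: roughly 1.80x to 1.95x
```

If those assumptions hold, the A57 would land at a little under twice the per-core performance of the Cortex-A9.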

It is likely just a matter of time before ARM products can start to chip away at segments of the server market. How much time? The best way to find out is to look at the most mature ARM server shipping today: the Calxeda based Boston Viridis. Just what can this server handle today, where does it have the potential to succeed, and what are its shortcomings? Let's find out.

99 Comments

  • JohanAnandtech - Wednesday, March 13, 2013

    Thanks!
  • SunLord - Wednesday, March 13, 2013

    Hmm if these didn't cost $20,000 they would make a nice front end for larger websites and forums using less rack space and power. What setup using these would you use for anandtech? Would you guys keep the intel DB server?
  • Gunbuster - Wednesday, March 13, 2013

    I just got a Dell R720xd decked out with 384GB and 4.3TB of storage for a hair over that price.
  • JohanAnandtech - Wednesday, March 13, 2013

    Intel Xeons are still by far a better choice for relational databases that are very hard to split up (sharding is only a last resort)
  • zachj - Wednesday, March 13, 2013

    I'm not sure I agree with the absolutism that seems implicit in your comment that Xeons are better for relational databases...I think there are cases where that won't be true.

    Database scale-out doesn't always require sharding...using any of a number of different off-the-shelf capabilities built right into most SQL engines, you can create multiple active replicas of your database. This is generally better-suited to workloads that aren't write-intensive, but both clustering and replication allow for writes. While this may seem like a quick-and-dirty solution that is architecturally "less good" than sharding, hardware is a lot cheaper than paying people to design a sharding solution and the dollars very often drive the conversation. As long as the database size isn't terribly large this can be a very cost-effective way to scale out a database.

    I would wager that the Anandtech website database (not the forum database) would probably be well-suited to this type of scale-out. You do waste some money on redundant storage but you more than make up for that cost by not having to pay a development team to implement sharding. If the comments section of the Anandtech website gets stored in the same underlying database, the size constraints and the write activity may appear to be incompatible with this approach, but I would in fact argue that comments don't require relational capabilities of SQL and would be more rightly stored as blobs in Hadoop or Azure Storage Tables. Then the Anandtech database is strictly articles and is both much more compact and almost entirely read-only (except for a few new articles per day).
  • rwei - Friday, March 15, 2013

    To the best of my understanding, replication does well for scaling reads but doesn't do much for writes. I'd still imagine that this would work decently well with AnandTech, where I can't see the volume of writes being that large relative to the volume of reads.
  • Kurge - Wednesday, March 13, 2013

    They would make a horrible front end for such websites. Just buy a single Xeon server and don't artificially limit it by using 24 VMs. Just run the app straight on the metal and it will perform massively better.
  • Oldboy1948 - Wednesday, March 13, 2013

    Very interesting, Johan, as your tests often are!
    Interesting that the memory bw is so much lower than anything from Intel. In fact iPhone 5 looks much better...why? Only Intel has about the same results in compress and decompress.
  • JohanAnandtech - Wednesday, March 13, 2013

    Where did you see the stream results on the A6? I might have missed it somewhere. The only ones I could find reported only 1 GB/s in Triad. http://www.anandtech.com/show/6298/analyzing-iphon... The Quad ECX-1000 got 1.8 GB/s
  • PCTC2 - Wednesday, March 13, 2013

    Do you know what would be an interesting concept for a future version of these cluster-in-a-box systems? A solution like ScaleMP. ScaleMP is basically a reverse VM. A hypervisor on each server clusters together to run a single OS with an aggregation of all resources (cores, RAM, network, and disk). ScaleMP running on 4x Dual-socket 8-core Xeon systems w/ 32GB RAM results in a usable system with 64-cores and 128GB RAM as if it was running natively on the hardware. This would be an interesting concept to transfer to the ARM space (if a form of hardware virtualization ever is designed). In a box like this, there would be 192 cores and 192GB of RAM available to a single Fedora instance. Cluster 2 of these together and suddenly there's a system with 384 cores and 384GB of RAM in 4U. Just some food for thought.
