Original Link: https://www.anandtech.com/show/10923/openpower-saga-tyans-1u-power8-gt75



Back in April 2016, Tyan introduced its IBM POWER8-based 1U servers at the OpenPOWER Summit in San Jose, California. The server market is dominated by five OEMs: HPE, Dell, IBM, Lenovo, and Cisco, with ODM Quanta and HPC specialist Supermicro being the challengers. So the fact that Tyan also sells POWER8 servers can hardly be considered a game changer. However, Tyan does punch well above its weight. Note that there are only four OpenPOWER POWER8 platforms out there:

  1. Wistron's Polaris, which was the basis for the heavy duty S822LC (see our review here). An improved version, the Polaris Plus, was the first POWER8 + P100 NVLink platform, probably the fastest 2U HPC server available (see here).
  2. The Supermicro "Briggs", which is found in the latest IBM S822LC "For Big Data".
  3. The Rackspace "Barreleye"
  4. The Tyan Habanero.

As far as we know, the Supermicro board is only available as an IBM server.

The Rackspace Barreleye is the odd one out: it is 1.25 U (instead of 2U) high, 21 inches wide, and needs the power shelves of "OpenRack". The cool thing is that it is no longer a prototype; it is a server that can actually be bought. This means that those who want to start over and make use of the many advantages of OCP have a real choice between the Intel Xeon and IBM POWER. Such servers - even in low quantities - are available from Penguin Computing (Magna 1015), Mark III systems, and Stack Velocity.

Of course, most datacenters are not switching to OCP racks (yet). If you want a standard 19 inch wide OpenPOWER server, there is a pretty good chance your server is based upon the Tyan Habanero. Habanero was not only the platform in most of the Tyan POWER8 offerings, but it is also the board that can be found inside the IBM S812LC and the Penguin Computing Magna 2001. Of course, the OpenPOWER server market is still a young and small market, but Tyan did have quite a bit of impact.

So when we saw that Tyan made a 1U server based upon this Habanero platform, that caught our eye. The power-hungry POWER8 inside a density optimized form factor? And they feed it with a PSU of "only" 750 W? Remember, most POWER8 servers come with 1000+ Watt PSUs.

OpenPOWER: Survive or Thrive?

Now at this point you may be asking: yet another article about OpenPOWER? Isn't all effort going to waste as IBM's piece of the server market is shrinking and Intel reigns supreme? Why would IBM be able to succeed where so many others have failed?

We are among the first to recognize that Intel still rules the server world, and that there is no alternative to the midrange Xeon E5 when you want the best performance-per-watt available. Furthermore, the arrival of servers based upon AMD's Naples CPU will put even more pressure on all non-x86 server CPU alternatives, as it is expected to ship with up to 32 cores and 64 threads, offer 8 DDR4 channels, and integrate no less than 128 PCIe lanes. And in the long run (and perhaps most importantly), it will force Intel to come up with better performance/watt/dollar offerings, ratcheting up the pressure on non-x86 server CPUs even further.

As important as performance-per-watt is, several markets – HPC, Analytics, and AI chief among them – consider absolute performance the most important metric. Wattage has to be kept under control, but that is it. Who cares whether the CPU is consuming 200W or 120W when it is placed in a machine with a Terabyte of RAM and two 300W GPUs?

The fact that Rackspace and Google have invested quite a bit in the "Barreleye" and will continue to do so is a good sign (there is a POWER9 Barreleye) but not a guarantee. Both companies and all large datacenters still rely on Intel Xeons for the foreseeable future. The experiment can quickly be terminated.

However, the announcement that China's largest Internet portal – Tencent – will also be using OpenPOWER servers is another sign that the OpenPOWER technology convinces people that it is a viable alternative. At first, the deal was just to license the affordable IBM S822LC (made by Supermicro) to the Chinese reseller Inspur, and Inspur – being a "local" reseller – would sell it to Tencent. However, this deal spurred Inspur to begin developing their own OpenPOWER products. Consequently, OpenPOWER is allowing IBM and partners to break into the fastest growing server market: China. The openness of the software and hardware ecosystem and the strong performance of the CPU cores put OpenPOWER in a unique position in China compared to both Intel and ARM. That is a very solid business plan if you ask us.



Tyan's GT75-BP012

Getting to the meat of today's article, we have the Tyan GT75-BP012. Anton has already described the Tyan GT75 servers in great detail here, so we will recap and add a few details.

The Tyan GT75 machines (just like the Tyan TN71-BP012 servers launched a year ago) are based on one IBM POWER8 Turismo Single Chip Module (SCM) processor, offering either eight or ten cores. This CPU finds itself paired with Tyan's Habanero motherboard, the same as in IBM's most affordable OpenPOWER server, the S812LC.

The board has 32 DIMM slots using four IBM Centaur memory buffer chips (MBCs). Since the operational voltage of the Centaur chip PHY maxes out at 1.43V, only low power DDR3 DIMMs are supported. The largest supported DIMMs are the quad ranked 32 GB DIMMs with 4 Gbit chips, allowing the server to have up to 1 TB of RAM. Unfortunately, the latest 8 Gbit based DIMMs are not supported. Tyan ships the server with eight 16 GB DIMMs – for a total of 128GB – if you take the standard configuration.
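The capacity figures above are plain slot arithmetic. A minimal sketch, using only numbers from the text (the function and variable names are ours, purely for illustration):

```python
# Slot arithmetic for the Habanero memory subsystem, using figures from the text.
DIMM_SLOTS = 32        # DIMM slots on the board
CENTAUR_BUFFERS = 4    # IBM Centaur memory buffer chips


def capacity_gb(populated_slots: int, dimm_gb: int) -> int:
    """Total installed memory in GB when all populated slots hold identical DIMMs."""
    return populated_slots * dimm_gb


slots_per_buffer = DIMM_SLOTS // CENTAUR_BUFFERS   # DIMM slots behind each Centaur
max_gb = capacity_gb(DIMM_SLOTS, 32)               # every slot filled with 32 GB DIMMs
standard_gb = capacity_gb(8, 16)                   # standard configuration: 8 x 16 GB

print(slots_per_buffer, max_gb, standard_gb)
```

With 32 GB DIMMs in all 32 slots the total is 1024 GB, which is the 1 TB quoted above.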

Tyan GT75: IBM POWER8 Turismo CPU Options

| | POWER8 8-Core | POWER8 10-Core |
| --- | --- | --- |
| Core Count | 8 | 10 |
| Threads | 64 | 80 |
| Nominal Freq. / Turbo | 2.33 GHz / 3.025 GHz | 2.095 GHz / 2.926 GHz |
| L2 Cache | 512 KB per core | 512 KB per core |
| L3 Cache | 8 MB eDRAM per core, 64 MB per CPU | 8 MB eDRAM per core, 80 MB per CPU |
| DRAM Interface | DDR3L-1600 (Low Power Only) | DDR3L-1600 (Low Power Only) |
| PCI Express | 3 × PCIe controllers, 32 lanes | 3 × PCIe controllers, 32 lanes |
| TDP | 130W / 169W | 130W / 169W |

As the OpenPOWER POWER8 has to fit and operate within a 1U home, the clockspeed is limited to 2.328 GHz nominal. However, that is just a paper spec, much like the nominal clockspeed of the Xeon E5. In reality, the power governor defaults to "on demand". In that case, the CPU runs at 2.06 GHz at low load, and boosts up to 3.025 GHz when the CPU is fully loaded. The speedsteps are very small, only +/- 30 MHz, so the second highest speedstep is 2.99 GHz. Below you find the configuration table of all Tyan GT75 servers.

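Comparison of Tyan GT75 Servers

| | BSP012G75V4H-B4C | BSP012G75V4H-Q4T | BSP012G75V4H-Q4F |
| --- | --- | --- | --- |
| CPU | IBM POWER8, 8-Core, 2.328 GHz, 130 W/169 W TDP | IBM POWER8, 10-Core, 2.095 GHz, 130 W/169 W TDP | IBM POWER8, 10-Core, 2.095 GHz, 130 W/169 W TDP |
| Installed RAM | 8 × 16 GB R-DDR3L | 16 × 16 GB R-DDR3L | 32 × 16 GB R-DDR3L |
| RAM (subsystem) | Up to 1 TB of DDR3L-1333 DRAM, 32 RDIMM modules, four IBM Centaur MBCs | Up to 1 TB of DDR3L-1333 DRAM, 32 RDIMM modules, four IBM Centaur MBCs | Up to 1 TB of DDR3L-1333 DRAM, 32 RDIMM modules, four IBM Centaur MBCs |
| Storage | 2 × 512 GB SSDs | 2 × 1 TB SSDs | 4 × 1 TB SSDs |
| Tyan Storage Mezzanine | MP012-9235-4I (4-port SATA 6Gb/s IOC w/o RAID stack) | MP012-9235-4I (4-port SATA 6Gb/s IOC w/o RAID stack) | MP012-9235-4I (4-port SATA 6Gb/s IOC w/o RAID stack) |
| LAN | 4 × GbE ports | 4 × 10 GbE ports | 4 × 10 GbE ports |
| Tyan LAN Mezzanine | MP012-5719-4C Broadcom 1GbE LAN Mezz Card | MP012-B840-4T Qlogic+Broadcom 10GbE LAN Mezz Card | MP012-Q840-4F Qlogic 10GbE LAN Mezz Card |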

In today's article we're reviewing the basic model, the BSP012G75V4H-B4C. Notice the twelve (!) fans.

The Tyan GT75-BP012 makes use of Tyan's mezzanine cards for networking and for the storage controller. As a result, it can be equipped with up to four 3.5” hot-swappable SATA 6G HDD/SSDs and four network controllers (1 GbE or 10 GbE) without using the 8-lane PCIe riser.

Now if you've been counting the PCIe lanes required for all of this, it seems like we should be a bit short, and indeed that's the case. Digging a bit deeper, we find that the server is using a PLX PEX8748 PCIe switch to take a PCIe 3.0 x8 root port from the CPU and switch it among the LAN riser, SATA riser, and the two black PCIe x8 slots.
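To get a feel for what sharing one x8 root port means, here is a rough bandwidth sketch. It relies only on the standard PCIe 3.0 signalling rate (8 GT/s per lane, 128b/130b encoding) and counts just the two x8 slots downstream, since the lane widths of the mezzanine connectors are not stated here. The helper name is ours.

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b encoding.
GT_PER_S = 8.0
ENCODING = 128 / 130


def pcie3_gbytes_per_s(lanes: int) -> float:
    """Raw bandwidth per direction in GB/s, before packet/protocol overhead."""
    return GT_PER_S * ENCODING / 8 * lanes


upstream = pcie3_gbytes_per_s(8)          # x8 root port feeding the switch
slots_only = pcie3_gbytes_per_s(2 * 8)    # the two x8 slots; mezzanines NOT counted

print(f"upstream x8:  {upstream:.2f} GB/s")
print(f"two x8 slots: {slots_only:.2f} GB/s")
print(f"oversubscription (slots only): {slots_only / upstream:.1f}:1")
```

Even before the LAN and SATA mezzanines are counted, the downstream side can ask for twice what the upstream link can carry, which only matters when several devices are busy at the same time.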



Benchmark Configuration and Methodology

Our testing was conducted on Ubuntu Server 16.04 (kernel 4.2.0) with gcc compiler version 5.2.1. As we were only able to get everything working appropriately with that specific software combination, we were not able to use something newer.

Last but not least, we want to note how the performance graphs have been color-coded. Orange is used for POWER8 CPUs, while the latest generation of the Intel Xeon (v4) gets dark blue, the previous one (v3) gets light blue, and older Xeon generations are colored with the default gray.

Tyan GT75-BSP012 (1U)

The Tyan GT75-BSP012 is based upon Tyan's "Habanero" platform.

CPU One IBM POWER8 2.328 GHz (up to 3.025 GHz Turbo)
RAM 128 GB (8x16GB) DDR3L-1600
Internal Disks 2x Sandisk 512 GB
Motherboard Tyan SP012GMR-1U "Habanero"
PSU 750W 80Plus Platinum

 

IBM S812LC (2U)

The IBM S812LC is also based upon Tyan's "Habanero" platform. The board inside the IBM server is thus designed by Tyan.

CPU One IBM POWER8 2.92 GHz (up to 3.5 GHz Turbo)
RAM 256 GB (16x16GB) DDR3-1333
Internal Disks 2x Samsung 850Pro 960 GB
Motherboard Tyan SP012
PSU Delta Electronics DSP-1200AB 1200W

 

Intel's Xeon E5 Server – S2600WT (2U Chassis)

Our trusty Xeon E5 collection includes the E5-2699 v4, E5-2699 v3, and E5-2690 v3.

CPU One Intel Xeon processor E5-2699 v4 (2.2 GHz, 22c, 55MB L3, 145W)
One "simulated" Intel Xeon processor E5-2680 v4 (2.2 GHz, 14c, 35MB L3, 145W)
One Intel Xeon processor E5-2699 v3 (2.3 GHz, 18c, 45MB L3, 145W)
One Intel Xeon processor E5-2690 v3 (3.2 GHz, 8c, 20MB L3, 135W)
RAM 128 GB (8x16GB) Kingston DDR4-2400
Internal Disks 2x Samsung 850Pro 960 GB
Motherboard Intel Server Board Wildcat Pass
PSU Delta Electronics 750W DPS-750XB A (80+ Platinum)

All C-states are enabled in the BIOS.

SuperMicro 6027R-73DARF (2U Chassis)

CPU Two Intel Xeon processor E5-2697 v2 (2.7GHz, 12c, 30MB L3, 130W)
RAM 128GB (8x16GB) Samsung at 1866 MHz
Internal Disks 2x Intel SSD3500 400GB
Motherboard SuperMicro X9DRD-7LN4F
PSU Supermicro 740W PWS-741P-1R (80+ Platinum)

All C-states are enabled in the BIOS.

Other Notes

Both servers are fed by a standard European 230V (16 Amps max.) power line. The room temperature is monitored and kept at 23°C by our Airwell CRACs.



Java Performance

The SPECjbb 2015 benchmark has "a usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations." It uses the latest Java 7 features and makes use of XML, compressed communications, and messaging with security.

We tested with four groups of transaction injectors and backends. The reason why we use the "Multi JVM" test is that it is more realistic: multiple VMs on a server is a very common practice.

The Java version was OpenJDK 1.8.0_91. We applied relatively basic tuning to mimic real-world use, while aiming to fit everything inside a server with 128 GB of RAM:

"-server -Xmx24G -Xms24G -Xmn16G -XX:+AlwaysPreTouch -XX:+UseLargePages"

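The memory budget behind these flags can be checked with a few lines of arithmetic. This is a sketch under one assumption that is ours, not stated in the text: each of the four groups runs one backend JVM with the heap flags shown, and the injector and controller JVMs are ignored.

```python
# Figures from the text: four groups, 128 GB of RAM, -Xmx24G / -Xms24G / -Xmn16G.
# Assumption (ours): one backend JVM per group uses these heap flags.
GROUPS = 4
RAM_GB = 128
XMX_GB = 24   # maximum heap per JVM (also the initial heap, via -Xms24G)
XMN_GB = 16   # young generation size per JVM

total_heap_gb = GROUPS * XMX_GB        # heap reserved by all backends together
headroom_gb = RAM_GB - total_heap_gb   # left for the OS and the other JVMs
young_fraction = XMN_GB / XMX_GB       # share of each heap given to the young generation

print(total_heap_gb, headroom_gb, round(young_fraction, 2))
```

Under that assumption the backends reserve 96 GB, leaving 32 GB of headroom on a 128 GB machine. Since -XX:+AlwaysPreTouch touches the heap pages at startup, that memory is committed up front rather than on demand.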
The graph below shows the maximum throughput numbers for our MultiJVM SPECjbb test.

SPECJBB 2015-Multi Max-jOPS

The Critical-jOPS metric is a throughput metric under a response-time constraint.

The 8-core Tyan POWER8 server offers about 72% of the performance of the 10-core IBM S812LC. That is not too bad as the latter not only has 25% more cores, but the chip can also boost 16% higher. In total, the IBM POWER8 CPU inside the 2U S812LC offers about 45% greater processing power ("35 GHz" vs "24 GHz") and delivers about 40% better performance. So compared to the S812LC, the 1U Tyan delivers very decent performance.
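The "processing power" comparison is cores multiplied by peak clock, which is a crude yardstick rather than a performance model. A small sketch of the arithmetic, using the core counts and boost clocks quoted in this article (the dictionary layout and helper name are ours):

```python
# Core counts and peak clocks as quoted in the article.
tyan_1u = {"cores": 8, "boost_ghz": 3.025}
ibm_2u = {"cores": 10, "boost_ghz": 3.5}


def aggregate_ghz(cpu: dict) -> float:
    """Crude 'processing power' figure: cores multiplied by peak clock."""
    return cpu["cores"] * cpu["boost_ghz"]


core_ratio = ibm_2u["cores"] / tyan_1u["cores"]
clock_ratio = ibm_2u["boost_ghz"] / tyan_1u["boost_ghz"]
aggregate_ratio = aggregate_ghz(ibm_2u) / aggregate_ghz(tyan_1u)
measured_ratio = 1 / 0.72   # the Tyan scores about 72% of the S812LC

print(f"cores:     +{(core_ratio - 1) * 100:.0f}%")
print(f"clock:     +{(clock_ratio - 1) * 100:.0f}%")
print(f"aggregate: {aggregate_ghz(ibm_2u):.1f} vs {aggregate_ghz(tyan_1u):.1f} GHz "
      f"(+{(aggregate_ratio - 1) * 100:.0f}%)")
print(f"measured:  +{(measured_ratio - 1) * 100:.0f}%")
```

The measured gap is slightly smaller than the gap in aggregate clock, which is why the 1U result looks reasonable next to its bigger sibling.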

But Intel is the one to beat. And by caging the POWER8 inside a 1U, performance has dropped below the power sipping (90W TDP!) Xeon E5-2640v4.

SPECJBB 2015-Multi Critical-jOPS

Meanwhile our next benchmark is a good reminder that OpenJDK 8's performance is not optimal for the POWER8. The IBM JDK (More details here) does not offer much better throughput, unless you start tuning frantically. However, it does increase the most important score, critical-jOPS, with reasonable tuning.

However, while the more powerful 2U POWER8 can still keep up with Intel's best and most expensive (only 9% slower), the frequency capped CPU inside the Tyan 1U fails to impress as it trails the less expensive and less power hungry Xeon E5-2640 v4 by a large margin.



Database Performance: MySQL 5.7.0

Thanks to the excellent repository of Percona, we have been able to vastly improve our MySQL benchmarking with Sysbench. You cannot compare these results to the results published prior to the Cavium Thunder-X review: we made quite a few changes in the way we benchmark. We first upgraded the standard MySQL installation to the better performing Percona Server 5.7.

Secondly, we used Sysbench 0.5 (instead of 0.4) and we implemented the (lua) scripts that allow us to use multiple tables (8 in our case) instead of the default one. This makes the Sysbench benchmark much more realistic, as running with one table creates a very artificial bottleneck.

For our testing we used the read-only OLTP benchmark, which is slightly less realistic, but still much more interesting than most other Sysbench tests. This allows us to measure CPU performance without creating an I/O bottleneck.

Sysbench on 8 tables

The 10-core POWER8 @3.5 GHz is about 40% faster than the 8-core POWER8, which maxes out at 3.025 GHz. This is in-line with our performance expectations. As it turns out, keeping the clock high is not very hard in this kind of application: the application peaks for a relatively small period.

Overall our "1U" POWER8 server is only 8% faster than the Xeon E5-2640v4. Meanwhile reducing the number of threads to 4 is not a wise thing to do.

MySQL Sysbench RO (8 tables)

Although we cannot say that our 8-core "1U" POWER8 setup is much slower than the 10-core S812LC, it still gets a firm beating by the Xeon E5-2640v4.



Apache Spark Benchmarking

Spark is a wonderful framework, but you need some decent input data and some good coding skills to really test it. Speeding up Big Data applications is the top priority project at the lab I work for (Sizing Servers Lab of the University College of West-Flanders), so I was able to turn to the coding skills of Wannes De Smet to produce a benchmark that uses many of the Spark features and is based upon real world usage.

The test is described in the graph above. We first start with 300 GB of compressed data gathered from the CommonCrawl. These compressed files are a large number of web archives. We decompress the data on the fly to avoid a long wait that is mostly storage related. We then extract the meaningful text data out of the archives by using the Java library "BoilerPipe". Using the Stanford CoreNLP Natural Language Processing Toolkit, we extract entities ("words that mean something") out of the text, and then count which URLs have the highest occurrence of these entities. The Alternating Least Square algorithm is then used to recommend which URLs are the most interesting for a certain subject.
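To illustrate just the counting step, here is a toy stand-in in plain Python. It is not the benchmark code: the real pipeline runs on Spark with BoilerPipe and CoreNLP, and the URLs, entities, and function name below are made up for the example.

```python
from collections import Counter


def top_urls_for_entity(pages, entity, n=3):
    """Rank URLs by how often `entity` occurs in their extracted entity list.

    pages: iterable of (url, list_of_entities) pairs.
    """
    counts = Counter()
    for url, entities in pages:
        hits = sum(1 for e in entities if e == entity)
        if hits:
            counts[url] += hits
    return counts.most_common(n)


# Hypothetical input, standing in for the output of the entity extraction stage.
pages = [
    ("http://example.org/a", ["IBM", "POWER8", "IBM"]),
    ("http://example.org/b", ["Intel", "Xeon"]),
    ("http://example.org/c", ["IBM"]),
]

print(top_urls_for_entity(pages, "IBM"))
```

In the actual test this kind of aggregation is distributed across Spark workers, and the resulting counts feed the Alternating Least Squares recommendation step.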

We tested with Apache Spark 1.5 in standalone mode (non-clustered) as it took us a long time to make sure that the results were repeatable. For now, we're sticking with version 1.5 to be able to compare with earlier results.

Apache Spark 1.5

The Xeon E5-2640 v4 and 8-core POWER8 finish neck-and-neck. And that is not good news for the POWER8. It needs to beat the lower-power and cheaper Xeon by a large margin to make people switch.



Energy Consumption

We know that the POWER8 was not designed to be a performance-per-watt champion. Throughput, single threaded performance, and RAS were the main priorities. However, Tyan does position the GT75 as a virtualization server. In that market, performance-per-watt is important.

We tested the energy consumption of our servers for a one-minute period in several situations. The first one is the point where the tested server performs best in MySQL: the highest throughput just before the response time goes up significantly. Then we look at the point where throughput is the highest (no matter what response time). This is the situation where the CPU is fully loaded.

The final column is calculated by dividing the best throughput by the power usage. We define the "best throughput" as throughput where the balance between throughput and the 95th percentile response time is the best. In other words, beyond that point, throughput increases only slightly (less than 10%), but the response time increases much faster.
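One way to make this definition concrete is sketched below. The sample numbers are hypothetical, and the rule (stop at the first load step that adds less than 10% throughput while the 95th percentile response time grows faster than throughput) is our reading of the text, not necessarily the exact procedure used for the table.

```python
def best_throughput_point(samples, min_gain=0.10):
    """Return the index of the 'best throughput' sample.

    samples: list of (throughput, p95_response_ms), ordered by increasing load.
    The best point is the first one after which the next load step adds less
    than `min_gain` throughput while response time grows faster than throughput.
    """
    for i in range(len(samples) - 1):
        tput, lat = samples[i]
        next_tput, next_lat = samples[i + 1]
        tput_gain = next_tput / tput - 1
        lat_gain = next_lat / lat - 1
        if tput_gain < min_gain and lat_gain > tput_gain:
            return i
    return len(samples) - 1   # throughput never flattened out


# Hypothetical measurements: (transactions/s, 95th percentile response time in ms)
samples = [(1000, 2.0), (1900, 2.1), (2600, 2.4), (2750, 4.8), (2800, 9.0)]

best = best_throughput_point(samples)
print(best, samples[best])
```

In this made-up series the third sample is selected: pushing the load further buys under 6% more throughput while the response time doubles.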

| SKU | Server Height | TDP (on paper spec) | Idle Server (W) | MySQL Best Throughput at Lowest Resp. Time (*) (W) | MySQL Max Throughput (W) | Transactions/s (**) | Tr/watt (= ** / *) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IBM POWER8 8c@2.3 | Tyan 1U | 170 W | 171 | 323 | 330 | 10300 | 32 |
| IBM POWER8 10c@2.9 | S812LC 2U | 190 W | 221 | 259 | 260 | 14482 | 55 |
| Xeon E5-2699 v4 | 2U | 145 W | 67 | 213 | 235 | 18997 | 89 |
| Xeon E5-2640 v4 | 2U | 90 W | 76 | 135 | 145 | 9541 | 71 |
| Xeon E5-2690 v3 | 2U | 135 W | 84 | 249 | 254 | 11741 | 47 |
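The last column can be recomputed from the two columns it is derived from. A short sketch using the figures from the table above (the variable and function names are ours); the results agree with the published column to within about one unit of rounding.

```python
# (SKU, watts at best throughput (*), transactions/s (**)) from the table above.
results = [
    ("IBM POWER8 8c@2.3 (Tyan 1U)", 323, 10300),
    ("IBM POWER8 10c@2.9 (S812LC 2U)", 259, 14482),
    ("Xeon E5-2699 v4", 213, 18997),
    ("Xeon E5-2640 v4", 135, 9541),
    ("Xeon E5-2690 v3", 249, 11741),
]


def tr_per_watt(transactions_per_s: float, watts: float) -> float:
    """Transactions per second delivered for each watt drawn at the wall."""
    return transactions_per_s / watts


for sku, watts, tps in results:
    print(f"{sku:32s} {tr_per_watt(tps, watts):6.1f} tr/s per watt")
```

The same data also shows the load power gap between the two POWER8 machines: 259 W for the 2U S812LC versus 323 W for the 1U Tyan at their respective best throughput points.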

At idle, both of the POWER8-based servers reduce their clockspeed to 2.06 GHz and power-gate the cores they do not need. However, the Tyan GT75 PSU is probably more efficient in this case, and the GT75 is a less complex server as well. As a result, the idle power is significantly lower than the S812LC. Still, it is nowhere near the Intel Xeons.
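For readers who want to watch this clock scaling on their own Linux machine, the kernel's cpufreq subsystem exposes the governor and frequencies through sysfs. The sketch below reads the standard cpufreq files; whether they are present depends on the platform and driver, so the helper (a name of our own) simply returns an empty dictionary when they are missing.

```python
from pathlib import Path


def read_cpufreq(cpu: int = 0, root: str = "/sys/devices/system/cpu") -> dict:
    """Return the governor name and frequencies (in GHz) for one CPU.

    Returns an empty dict when the cpufreq files are absent or unreadable.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in ("scaling_governor", "scaling_cur_freq",
                 "scaling_min_freq", "scaling_max_freq"):
        path = base / name
        try:
            text = path.read_text().strip()
        except OSError:
            continue   # file missing or not readable on this system
        if name == "scaling_governor":
            info[name] = text               # governor is a plain string
        else:
            try:
                info[name] = int(text) / 1e6   # sysfs reports kHz; convert to GHz
            except ValueError:
                continue
    return info


print(read_cpufreq())
```

On a system running the Linux "ondemand" governor, sampling `scaling_cur_freq` at idle and under load should show the kind of swing described here.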

Once we test the server under load, the Tyan GT75 demands a lot more power than the S812LC. That might seem contradictory at first sight, as the latter is equipped with a more power-hungry CPU. The main culprits are the small, extremely high RPM 1U fans inside the Tyan, which have to work hard to keep a 170W CPU cool in such a cramped environment.

Notice how the IPMI software reports 8800 RPM, but in reality the fan is running at a mindboggling 15600 RPM. A total of twelve such fans results in the cooling system as a whole consuming a lot of power.

This kind of "performance first" CPU policy really needs larger fans and more room. Case in point: in a roomier 2U chassis the load power consumption of a POWER8 setup comes very close to the contemporary 22 nm Xeon E5 v3. It will be interesting to see how this works out in the 1.25U high Rackspace BarrelEye.

Intel's "Broadwell-EP" (Xeon E5 v4) wins here by a vast margin. And there is little doubt that the next generation Skylake Xeons will do (slightly?) better.

However, don't count IBM and OpenPOWER out yet. First of all, MySQL is better optimized for x86-64 than for POWER8. Since MySQL is the second most popular database engine (and will probably overtake Oracle soon), we feel our choice is justified. However, it is worth mentioning that PostgreSQL (number 4) and MongoDB (5) have been fully optimized for OpenPOWER and show gains of up to 30%. Lastly, IBM's POWER9 should also do quite a bit better as a result of an improved microarchitecture and being baked with a state-of-the-art 14 nm SOI process. The 14 nm POWER9 versus the "tweaked 14 nm" Intel Xeon E5 version 5 should prove a very interesting comparison.



Positioning the Tyan GT75

Tyan positions the GT75-BP012 as an HPC and virtualization server. But that is clearly a mistake. Since the POWER architecture has been almost completely absent in the HPC world for years now, the HPC software ecosystem is dominated by x86. The POWER HPC software ecosystem is very small in comparison. That is a fact, not an opinion, as even IBM talked about "a re-entry in the HPC market" at the end of 2015.

But people do not switch to another server ecosystem easily: there has to be a compelling reason, a Unique Selling Proposition (USP). Only the recently launched S822LC can claim such a USP; it has a much faster link (NVLink) to the best NVIDIA GPUs, and as a result can offer much higher performance in some specifically-tailored HPC applications.

The GT75-BP012 has no GPU capable PCIe slot, let alone an NVLink connection. In the cramped 1U space, the CPU is limited to 3.025 GHz, which is a bad trade-off for HPC users. The GT75-BP012 is definitely not an HPC server.

Neither is it a virtualization server, as the virtualization market lives and dies by performance-per-watt.

The only target that is left is the “In-memory computing” market, which ranges from in-memory key value stores ("Redis") to DB2 BLU. In those cases, the limited processing power is – most of the time – less important than the amount of memory that can be installed at a reasonable price. Originally, the server was limited to 512 GB (32 slots, 16 GB per slot) but it should now support up to 1 TB. We say "should" as we were not able to check this.

Still, we do not feel a 1U server is a good match for POWER8. Place that POWER8 CPU on the same Tyan "Habanero" motherboard inside the 2U IBM S812LC (and Tyan's own 2U TN71 servers) and you'll get a much more attractive server. The CPU performs 40% better, the power under load is 20% lower, and you get many more PCIe slots. To reduce it to a car analogy, a turbocharged V8 cannot breathe through a straw.

Tyan's Business Model: Direct Sales Only

As one of the founding fathers of the OpenPOWER foundation, Tyan is in a unique position. It can be the "Open and easily accessible" vendor for buyers who don't care about the different services IBM offers, but just want a fast POWER server for an affordable price. However, the truth is that it is a lot easier to buy a server from IBM than from Tyan.

While the S812LC can be found on IBM's web shop, Tyan follows the more traditional Direct Sales model. So you are meant to buy the POWER8 servers in large volumes, and pricing varies by contract. As a result of this sales model, Tyan does not readily disclose the price of this server.

Consequently, Tyan was vague about pricing in our discussions, which makes it very difficult for us to make any meaningful comparison to other POWER8 offerings or any Xeon-based servers. While it is of course Tyan's choice how it wants to do business, we consider this a missed opportunity for the company, as all of the other OpenPOWER vendors are also targeting the customers who want to buy servers in large volumes. As a result, if you want low volumes, IBM is your only choice as far as we know.

Eye Towards the Future: POWER9

Wrapping things up, the announcement of the POWER9 SMT-4 Scale Out processor is the reason why we remain optimistic about the chances of the OpenPOWER ecosystem. A full analysis of POWER9 is beyond the scope of this article, but there is a lot we like about what has been disclosed thus far for the "Linux optimized" POWER9 SO SMT-4:

  1. It is not just an improved POWER8; there are many large improvements
  2. A better balance between single-threaded performance and throughput: SMT-4 will be combined with a more powerful core
  3. NVLink 2.0, which offers 7x more bandwidth to the GPU than PCIe 3.0
  4. No more power-hogging, latency-increasing memory buffers

The POWER9 SO SMT-4 will have up to 8 DDR4 channels, and up to 2 DIMMs per channel. This means that raw memory bandwidth and memory capacity are halved relative to POWER8, but it is a very good trade-off. The use of direct attached memory (same as the Xeon E5) lowers the latency of DRAM accesses, makes the motherboard design a lot less expensive, and lowers power consumption by 60-80W.

Add 50 to 100% higher performance per socket, and you get a formidable competitor for the new Xeon E5 v5 "Skylake EP". Tyan is in a good position to make this powerhouse accessible for the rest of us, so hopefully we will see a more ambitious approach than today.
