Scale-Out Big Data Benchmark: ElasticSearch

ElasticSearch is an open source, full text search engine that can be run on a cluster relatively easy. It's basically like an open source version of Google Search that can be deployed in an enterprise. It should be one of the poster-children of scale-out software and is one of the representatives of the so called "Big Data" technologies. Thanks to Kirth Lammens, one of the talented researchers at my lab, we have developed a benchmark that searches through all the Wikipedia content (+/- 40GB). Elasticsearch is – like many Big Data technologies – built on Java.

We are not sure why, but installing IBM's JDK caused a lot of headaches. For some reason the JVM stopped working in the middle of our tests. We got the same behavior running Apache Spark. This could be a result of our lack of experience with the IBM JDK, or the fact that the Linux LE ecosytem is still young. To cut a long story short, we ended up useing OpenJDK 8, which is part of the Ubuntu 15.04 distribution. OpenJDK is very similar to and based upon the same code as Oracle's HotSpot JDK.

We limited the systems to one socket to avoid the issues associated with garbage collection pauses and other scaling issues. There is reason why many Java benchmarks on these massive machines are using multiple JVMs.

Elastic Search

Although the POWER8 can probably perform a bit better with the IBM JDK, performance is in the same league as the best Xeons. Meanwhile as a further point of comparison we also included the score of the Xeon D from our previous article.

Database Performance: MySQL Energy and Pricing
Comments Locked

146 Comments

View All Comments

  • hissatsu - Friday, November 6, 2015 - link

    You might want to look more closely. Thought it's a bit blurry, I'm almost certain that's the 80+ Platinum logo, which has no color.
  • DanNeely - Friday, November 6, 2015 - link

    That's possible; it looks like there's something at the bottom of the logo. Google image search shows 80+ platinum as a lighter silver/gray than 80+ silver; white is only the original standard.
  • Shezal - Friday, November 6, 2015 - link

    Just look up the part number. It's a Platinum :)
  • The12pAc - Thursday, November 19, 2015 - link

    I have a S814, it's Platinum.
  • johnnycanadian - Friday, November 6, 2015 - link

    Oh yum! THIS is what I still love about AT: non-mainstream previews / reviews. REALLY looking forward to more like this. I only wish SGI still built workstation-level machines. :-(
  • mapesdhs - Tuesday, November 10, 2015 - link


    Indeed, but it'd need a hefty change in direction at SGI to get back into workstations again, so very unlikely for the forseeable future. They certainly have the required base tech (NUMALink6, MPI offload, etc.), namely lots of sockets/cores/RAM coupled with GPUs for really heavy tasks (big data, GIS, medical, etc.), ie. a theoretical scalable, shared-memory workstation. But the market isn't interested in advanced performance solutions like this atm, and the margin on standard 2/4-socket systems isn't worthwhile, it'd be much cheaper to buy a generic Dell or HP (plus, it's only above this no. of sockets that their own unique tech comes into play). Pity, as the equivalent of a UV 30/300 workstation would be sweet (if expensive), though for virtually all of the tasks discussed in this article, shared memory tech isn't relevant anyway. The notion of connectable, scalable, shared memory workstations based on NV gfx, PCIe and newer multi-core MIPS CPUs was apparently brought up at SGI way back before the Rackable merger, but didn't go anywhere (not viable given the financial situation at the time). It's a neat concept, eg. imagine being able to connect two or more separate ordinary 2/4-socket XEON workstations together (each fitted with, say, a couple of M6000s) to form a single combined system with one OS instance and resources pool, allowing users to combine & split setups as required to match workloads, but it's a notion whose time has not yet come.

    Of course, what's missing entirely is the notion of advanced but costly custom gfx, but again there's no market for that atm either, at least not publicly. Maybe behind the scenes NV makes custom stuff the way SGI used to for relevant customers (DoD, Lockheed, etc.), but SGI's products always had some kind of commercially available equivalent from which the custom builds were derived (IRx gfx), whereas atm there's no such thing as a Quadro with 30000 cores and 100GB RAM that costs $50K and slides into more than one PCIe slot which anyone can buy if they have the moolah. :D

    Most of all though, even if the demand existed and the tech could be built, it'd never work unless SGI stopped using its pricing-is-secret reseller sales model. They should have adopted a direct sales setup long ago, order on the site, pricing configurator, etc., but that never happened, even though the lack of such an option killed a lot of sales. Less of an issue with the sort of products they sell atm, but a better sales model would be essential if they were to ever try to sell workstations again, and that'd need a huge PR/sales management clearout to be viable.

    Pity IBM couldn't pay NV to make custom gfx, that'd be interesting, but then IBM quit the workstation market aswell.

    Ian.
  • mostlyharmless - Friday, November 6, 2015 - link

    "There is definitely a market for such hugely expensive and robust server systems as high end RISC machines are good for about 50.000 servers. "

    Rounding error?
  • DanNeely - Friday, November 6, 2015 - link

    50k clients would be my guess.
  • FunBunny2 - Friday, November 6, 2015 - link

    (dot) versus (comma) most likely. Euro centric versus 'Murcan centric.
  • DanNeely - Friday, November 6, 2015 - link

    If that was the case, a plain 50 would be much more appropriate.

Log in

Don't have an account? Sign up now