Introduction

Earlier this month I drove out to Oak Ridge, Tennessee to pay a visit to the Oak Ridge National Laboratory (ORNL). I'd never been to a national lab before, but my ORNL visit was for a very specific purpose: to witness the final installation of the Titan supercomputer.

ORNL is a US Department of Energy laboratory that's managed by UT-Battelle. Oak Ridge has a core competency in computational science, making it not only unique among all DoE labs but also making it perfect for a big supercomputer.

Titan is the latest supercomputer to be deployed at Oak Ridge, although it's technically a significant upgrade rather than a brand new installation. Jaguar, the supercomputer being upgraded, featured 18,688 compute nodes - each with a 12-core AMD Opteron CPU. Titan takes the Jaguar base, maintaining the same number of compute nodes, but moves to 16-core Opteron CPUs paired with an NVIDIA Kepler K20X GPU per node. The result is 18,688 CPUs and 18,688 GPUs, all networked together to make a supercomputer that should be capable of landing at or near the top of the TOP500 list.

We won't know Titan's final position on the list until the SC12 conference in the middle of November (position is determined by the system's performance in Linpack), but the recipe for performance is all there. At this point, its position on the TOP500 is dependent on software tuning and how reliable the newly deployed system has been.


Rows upon rows of cabinets make up the Titan supercomputer

Over the course of a day in Oak Ridge I got a look at everything from how Titan was built to the types of applications that are run on the supercomputer. Having seen a lot of impressive technology demonstrations over the years, I have to say that my experience at Oak Ridge with Titan is probably one of the best. Normally I cover compute as it applies to making things look cooler or faster on consumer devices. I may even dabble in talking about how better computers enable more efficient datacenters (though that's more Johan's beat). But it's very rare that I get to look at the application of computing to better understanding life, the world and universe around us. It's meaningful, impactful compute.

The Hardware

In the 15+ years I've been writing about technology, I've never actually covered a supercomputer. I'd never actually seen one until my ORNL visit. I have to say, the first time you see a supercomputer it's a bit anticlimactic. If you've ever toured a modern datacenter, it doesn't look all that different.


A portion of Titan


More Titan, the metal pipes carry coolant

Titan in particular is built from 200 custom 19-inch cabinets. These cabinets may look like standard 19-inch x 42RU datacenter racks, but what's inside is quite custom. All of the cabinets that make up Titan requires a room that's about the size of a basketball court.

The hardware comes from Cray. The Titan installation uses Cray's new XK7 cabinets, but it's up to the customer to connect together however many they want.

ORNL is actually no different than any other compute consumer: its supercomputers are upgraded on a regular basis to keep them from being obsolete. The pressures are even greater for supercomputers to stay up to date, after a period of time it actually costs more to run an older supercomputer than it would to upgrade the machine. Like modern datacenters, supercomputers are entirely power limited. Titan in particular will consume around 9 megawatts of power when fully loaded.

The upgrade cycle for a modern supercomputer is around 4 years. Titan's predecessor, Jaguar, was first installed back in 2005 but regularly upgraded over the years. Whenever these supercomputers are upgraded, old hardware is traded back in to Cray and a credit is issued. Although Titan reuses much of the same cabinetry and interconnects as Jaguar, the name change felt appropriate given the significant departure in architecture. The Titan supercomputer makes use of both CPUs and GPUs for compute. Whereas the latest version of Jaguar featured 18,688 12-core AMD Opteron processors, Titan keeps the total number of compute nodes the same (18,688) but moves to 16-core AMD Opteron 6274 CPUs. What makes the Titan move so significant however is that each 16-core Opteron is paired with an NVIDIA K20X (Kepler GK110) GPU.


A Titan compute board: 4 AMD Opteron (16-core CPUs) + 4 NVIDIA Tesla K20X GPUs

The transistor count alone is staggering. Each 16-core Opteron is made up of two 8-core die on a single chip, totaling 2.4B transistors built using GlobalFoundries' 32nm process. Just in CPU transistors alone, that works out to be 44.85 trillion transistors for Titan. Now let's talk GPUs.

NVIDIA's K20X is the server/HPC version of GK110, a part that never had a need to go to battle in the consumer space. The K20X features 2688 CUDA cores, totaling 7.1 billion transistors per GPU built using TSMC's 28nm process. With a 1:1 ratio of CPUs and GPUs, Titan adds another 132.68 trillion transistors to the bucket bringing the total transistor count up to over 177 trillion transistors - for a single supercomputer.

I often use Moore's Law to give me a rough idea of when desktop compute performance will make its way into notebooks and then tablets and smartphones. With Titan, I can't even begin to connect the dots. There's just a ton of computing horsepower available in this installation.

Transistor counts are impressive enough, but when you do the math on the number of cores it's even more insane. Titan has a total of 299,008 AMD Opteron cores. ORNL doesn't break down the number of GPU cores but if I did the math correctly we're talking about over 50 million FP32 CUDA cores. The total computational power of Titan is expected to be north of 20 petaflops.

Each compute node (CPU + GPU) features 32GB of DDR3 memory for the CPU and a dedicated 6GB of GDDR5 (ECC enabled) for the K20X GPU. Do the math and that works out to be 710TB of memory.


Titan's storage array

System storage is equally impressive: there's a total of 10 petabytes of storage in Titan. The underlying storage hardware isn't all that interesting - ORNL uses 10,000 standard 1TB 7200 RPM 2.5" hard drives. The IO subsystem is capable of pushing around 240GB/s of data. ORNL is considering including some elements of solid state storage in future upgrades to Titan, but for its present needs there is no more cost effective solution for IO than a bunch of hard drives. The next round of upgrades will take Titan to around 20 - 30PB of storage, at peak transfer speeds of 1TB/s.

Most workloads on Titan will be run remotely, so network connectivity is just as important as compute. There are dozens of 10GbE links inbound to the machine. Titan is also linked to the DoE's Energy Sciences Network (ESNET) 100Gbps backbone.

Physical Architecture, OS, Power & Cooling
POST A COMMENT

130 Comments

View All Comments

  • Ryan Smith - Wednesday, October 31, 2012 - link

    We have other reasons to back our numbers, though I can't get into them. Suffice it to say, if we didn't have 100% confidence we would not have used it. Reply
  • RussianSensation - Wednesday, October 31, 2012 - link

    Hey Ryan, what about this?

    http://www.brightsideofnews.com/news/2012/10/29/ti...

    The Jaguar is thus renamed into Titan, and the sheer numbers are quite impressive:
    46,645,248 CUDA Cores (yes, that's 46 million)
    299,008 x86 cores
    91.25 TB ECC GDDR5 memory
    584 TB Registered ECC DDR3 memory
    Each x86 core has 2GB of memory

    1 Node = the new Cray XK7 system, consists of 16-core AMD Opteron CPU and one Nvidia Tesla K20 compute card.

    The Titan supercompute has 18,688 nodes.

    46,645,248 CUDA Cores / 18,688 Nodes = 2,496 CUDA cores per 1 Tesla K20 card.
    Reply
  • Ryan Smith - Thursday, November 01, 2012 - link

    Among other things: note that Titan has 6GB of memory per K20 (and this is published information).

    http://nvidianews.nvidia.com/Releases/NVIDIA-Power...

    "The upgrade includes the Tesla K20 GPU accelerators, a replacement of the compute modules to convert the system’s 200 cabinets to a Cray XK7 supercomputer, and 710 terabytes of memory."

    18,688 nodes, each with 32GB of RAM + 6GB of VRAM = 710,144 GB

    (Press agencies are bad about using power of 10, hence "710" TB).
    Reply
  • Ryan Smith - Thursday, November 01, 2012 - link

    The 6GB number is also in the slide deck: http://images.anandtech.com/reviews/video/NVIDIA/T... Reply
  • RussianSensation - Wednesday, October 31, 2012 - link

    Tom's Hardware reported that Titan Supercomputer Packs 46,645,248 Nvidia CUDA Cores
    http://www.tomshardware.com/news/oak-ridge-ORNL-nv...

    46,645,248 CUDA Cores / 18,688 Tesla K20s also gives 2,496 CUDA cores per GPU, instead of 2,688.
    Reply
  • ypsylon - Wednesday, October 31, 2012 - link

    Great article. Fantastic way of showing to us tiny PC users what really big stuff looks like. Data center is one thing, but my word this stuff is, is... well that is Ultimate Computing Pr0n. For people who will never ever have a chance to visit one of the super computer centers it is quite something. Enjoyed that very much!

    @Guspaz

    If we get that kind of performance in phones then it is really scary prospect. :D
    Reply
  • twotwotwo - Wednesday, October 31, 2012 - link

    We currently have 1-billion-transistor chips. We'd get from there to 128 trillion, or Titan-magnitude computers, after 17 iterations of Moore's Law, or about 25 years. If you go 25 years back, it's definitely enough of a gap that today's technology looks like flying cars to folks of olden times. So even if 128-trillion-transistor devices isn't exactly what happens, we'll have *something* plenty exciting on the other end.

    *Something*, but that may or may not be huge computers. It may not be an easy exponential curve all the way. We'll almost certainly put some efficiency gains towards saving cost and energy rather than increasing power, as we already are now. And maybe something crazy like quantum computers, rather than big conventional computers, will be the coolest new thing.

    I don't imagine those powerful computers, whatever they are, will all be doing simulations of physics and weather. One of the things that made some of today's everyday tech hard to imagine was that the inputs involved (social graphs, all the contents of the Web, phones' networks and sensors) just weren't available--would have been hard, before 1980, to imagine trivially having a metric of your connectedness to an acquaintance (like Facebook's 'mutual friends') or having ads matching your interest.

    I'm gonna say that 25 years out the data, power, and algorithms will be available to everyone to make things that look like Strong AI to anyone today. Oh, and the video games will be friggin awesome. If we don't all blow each other up in the next couple-and-a-half decades, of course. Any other takers? Whoever predicts it best gets a beer (or soda) in 25 years, if practical.
    Reply
  • JAH - Wednesday, October 31, 2012 - link

    Must've been a fun trip for a geek/nerd. I'm jealous!

    Question, what do they do with the old CPUs that got replaced? Resale, recycled, donation?
    Reply
  • silverblue - Wednesday, October 31, 2012 - link

    I'd wondering which model Opterons they threw in there. The Interlagos chips were barely faster and used more power than the Magny-Cours CPUs they were destined to replace, though I'm sure these are so heavily taxed that the Bulldozer architecture would shine through in the end.

    Okay, I've checked - these are 6274s, which are Interlagos and clocked at 2.2GHz base with an ACP of 80W and a TDP of 115W apiece. This must be the CPU purchase mentioned prior to Bulldozer's launch.
    Reply
  • silverblue - Wednesday, October 31, 2012 - link

    I WAS wondering, rather. Too early for posting, it seems. Reply

Log in

Don't have an account? Sign up now