The launch of the Kepler family of GPUs in March of 2012 was something of a departure from the norm for NVIDIA. Over the years NVIDIA has come to be known, among other things, for their big and powerful GPUs. NVIDIA had always produced a large 500mm2+ GPU to serve both as the flagship GPU for their consumer lines and as the fundamental GPU for their Quadro and Tesla lines, and had always launched with that big GPU first.

So when the Kepler family launched first with the GK104 and GK107 GPUs – powering the GeForce GTX 680 and GeForce GT 640M respectively – it was unusual to say the least. In place of “Big Kepler”, we got a lean GPU that was built around graphics first and foremost, focusing on efficiency and in the process forgoing a lot of the compute performance NVIDIA had come to be known for in the past generation. The end result of this efficiency paid off nicely for NVIDIA, with GTX 680 handily surpassing AMD’s Radeon HD 7970 at the time of its launch in both raw performance and in power efficiency.

Big Kepler was not forgotten however. First introduced at GTC 2012, GK110 as it would come to be known would be NVIDIA’s traditional big, powerful GPU for the Kepler family. Building upon NVIDIA’s work with GK104 while at the same time following in the footsteps of NVIDIA’s compute-heavy GF100 GPU, GK110 would be NVIDIA’s magnum opus for the Kepler family.

Taped out later than the rest of the Kepler family, GK110 has taken a slightly different route to get to market. Rather than launching in a consumer product first, GK110 was first launched as the heart of NVIDIA’s Tesla K20 family of GPUs, the new cornerstone of NVIDIA’s rapidly growing GPU compute business.


Oak Ridge National Laboratory's Titan Supercomputer

Or perhaps as it’s better known, the GPU at the heart of the world’s fastest supercomputer, Oak Ridge National Laboratory’s Titan supercomputer.

The Titan supercomputer was a major win for NVIDIA, and likely the breakthrough they’ve been looking for. A fledgling business merely two generations prior, NVIDIA’s Tesla family has quickly shot up in prestige and size, much to the delight of NVIDIA. Their GPU computing business is still relatively small – consumer GPUs dwarf it and will continue to do so for the foreseeable future – but it’s now a proven business for NVIDIA. More to the point however, winning contracts like Titan is a major source of press and goodwill for the company, goodwill the company intends to capitalize on.

With the launch of the Titan supercomputer and the Tesla K20 family now behind them, NVIDIA is now ready to focus their attention back on the consumer market. Ready to bring their big and powerful GK110 GPU to the consumer market, in typical NVIDIA fashion they intend to make a spectacle of it. In NVIDIA’s mind there’s only one name suitable for the first consumer card born of the same GPU as their greatest computing project: GeForce GTX Titan.

GeForce GTX Titan: By The Numbers

At the time of the GK110 launch at GTC, we didn’t know if and when GK110 would ever make it down to consumer hands. From a practical perspective GTX 680 was still clearly in the lead over AMD’s Radeon HD 7970. Meanwhile the Titan supercomputer was a major contract for NVIDIA, and something they needed to prioritize. 18,688 551mm2 GPUs for a single customer is a very large order, and at the same time orders for Tesla K20 cards were continuing to pour in each and every day after GTC. In the end, yes, GK110 would come to the consumer market. But not until months later, after NVIDIA had the chance to start filling Tesla orders. And today is that day.

Much like the launch of the GTX 690 before it, NVIDIA intends to stretch this launch out a bit to maximize the amount of press they get. Today we can tell you all about Titan – its specs, its construction, and its features – but not about its measured performance. For that you will have to come back on Thursday, when we can give you our benchmarks and performance analysis.

| | GTX Titan | GTX 690 | GTX 680 | GTX 580 |
|---|---|---|---|---|
| Stream Processors | 2688 | 2 x 1536 | 1536 | 512 |
| Texture Units | 224 | 2 x 128 | 128 | 64 |
| ROPs | 48 | 2 x 32 | 32 | 48 |
| Core Clock | 837MHz | 915MHz | 1006MHz | 772MHz |
| Shader Clock | N/A | N/A | N/A | 1544MHz |
| Boost Clock | 876MHz | 1019MHz | 1058MHz | N/A |
| Memory Clock | 6.008GHz GDDR5 | 6.008GHz GDDR5 | 6.008GHz GDDR5 | 4.008GHz GDDR5 |
| Memory Bus Width | 384-bit | 2 x 256-bit | 256-bit | 384-bit |
| VRAM | 6GB | 2 x 2GB | 2GB | 1.5GB |
| FP64 | 1/3 FP32 | 1/24 FP32 | 1/24 FP32 | 1/8 FP32 |
| TDP | 250W | 300W | 195W | 244W |
| Transistor Count | 7.1B | 2 x 3.5B | 3.5B | 3B |
| Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 40nm |
| Launch Price | $999 | $999 | $499 | $499 |

Diving right into things then, at the heart of the GeForce GTX Titan we have the GK110 GPU. By virtue of this being the 2nd product to be launched based on the GK110 GPU, there are no great mysteries here about GK110’s capabilities. We’ve covered GK110 in depth from a compute perspective, so many of these numbers should be familiar to our long-time readers.

GK110 is composed of 15 of NVIDIA’s SMXes, each of which in turn is composed of a number of functional units. Every SMX packs 192 FP32 CUDA cores, 64 FP64 CUDA cores, 64KB of L1 cache, 65K 32bit registers, and 16 texture units. These SMXes are in turn paired with GK110’s 6 ROP partitions, each one composed of 8 ROPs, 256KB of L2 cache, and connected to a 64bit memory controller. Altogether GK110 is a massive chip, coming in at 7.1 billion transistors, occupying 551mm2 on TSMC’s 28nm process.

For Titan NVIDIA will be using a partially disabled GK110 GPU. Titan will have all 6 ROP partitions and the full 384bit memory bus enabled, but only 14 of the 15 SMXes will be enabled. In terms of functional units this gives Titan a final count of 2688 FP32 CUDA cores, 896 FP64 CUDA cores, 224 texture units, and 48 ROPs. This makes Titan virtually identical to NVIDIA’s most powerful Tesla, K20X, which ships with the same configuration. NVIDIA does not currently ship any products with all 15 SMXes enabled, and though NVIDIA is unlikely to ever explain why this is – yield, power, or otherwise – if nothing else it leaves them an obvious outlet for growth if they need to further improve Titan’s performance: enabling that 15th SMX.
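The unit counts above fall straight out of the per-SMX and per-ROP-partition figures. As a rough sketch (the function and dictionary key names are ours, purely for illustration, and the 15-SMX configuration is hypothetical since no such product ships), the arithmetic looks like this:

```python
# Per-SMX and per-ROP-partition figures as quoted in the text above.
FP32_PER_SMX = 192
FP64_PER_SMX = 64
TEX_UNITS_PER_SMX = 16
ROPS_PER_PARTITION = 8
L2_KB_PER_PARTITION = 256
MEM_BUS_BITS_PER_PARTITION = 64


def gk110_config(smx_enabled, rop_partitions=6):
    """Total functional units for a GK110 with the given units enabled."""
    return {
        "fp32_cores": smx_enabled * FP32_PER_SMX,
        "fp64_cores": smx_enabled * FP64_PER_SMX,
        "texture_units": smx_enabled * TEX_UNITS_PER_SMX,
        "rops": rop_partitions * ROPS_PER_PARTITION,
        "l2_cache_kb": rop_partitions * L2_KB_PER_PARTITION,
        "memory_bus_bits": rop_partitions * MEM_BUS_BITS_PER_PARTITION,
    }


titan = gk110_config(smx_enabled=14)       # Titan and K20X: 14 of 15 SMXes
full_gk110 = gk110_config(smx_enabled=15)  # hypothetical fully enabled part

for name, cfg in (("Titan", titan), ("Full GK110", full_gk110)):
    print(name, cfg)
```

With 14 SMXes this reproduces the 2688/896/224/48 figures, while the hypothetical 15th SMX would add another 192 FP32 cores.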

Of course functional units are only half the story, so let’s talk about clockspeeds. As a rule of thumb bigger GPUs don’t clock as high as smaller GPUs, and Titan will be adhering to this rule. Whereas GTX 680 shipped with a base clock of 1006MHz, Titan ships at a more modest 837MHz, making up for any clockspeed disadvantage with the brute force behind having so many functional units. Like GTX 680 (and unlike Tesla), boost clocks are once more present, with Titan’s official boost clock coming in at 876MHz, while the maximum boost clock can potentially be much higher.

On the memory side of things, Titan ships with a full 6GB of GDDR5. As a luxury card NVIDIA went for broke here and simply equipped the card with as much RAM as is technically possible, rather than stopping at 3GB. You wouldn’t know that from looking at their memory clocks though; even with 24 GDDR5 memory chips, NVIDIA is shipping Titan at the same 6GHz effective memory clock as the rest of the high-end GeForce 600 series cards, giving the card 288GB/sec of memory bandwidth.
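For readers who want to check the bandwidth figure, a minimal sketch follows. It assumes the usual peak-bandwidth formula (bus width in bytes multiplied by the effective data rate); the function name is ours, and the per-chip capacity is simply inferred from 6GB spread across 24 chips.

```python
def memory_bandwidth_gb_per_sec(bus_width_bits, effective_clock_ghz):
    """Peak bandwidth: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * effective_clock_ghz


# Bus widths and the 6.008GHz effective memory clock come from the spec table.
titan_bw = memory_bandwidth_gb_per_sec(384, 6.008)
gtx680_bw = memory_bandwidth_gb_per_sec(256, 6.008)

# 6GB spread evenly across 24 GDDR5 chips (inferred, not stated by NVIDIA).
mb_per_chip = 6 * 1024 / 24

print(f"Titan:   {titan_bw:.1f} GB/sec")
print(f"GTX 680: {gtx680_bw:.1f} GB/sec")
print(f"Capacity per chip: {mb_per_chip:.0f} MB")
```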

To put all of this in perspective, on paper (and at base clocks), GTX 680 can offer just shy of 3.1 TFLOPS of FP32 performance, 128GTexels/second texturing throughput, and 32GPixels/second rendering throughput, driven by 192GB/sec of memory bandwidth. Titan on the other hand can offer 4.5 TFLOPS of FP32 performance, 187GTexels/second texturing throughput, 40GPixels/second rendering throughput, and is driven by a 288GB/sec memory bus. This gives Titan 46% more shading/compute and texturing performance, 25% more pixel throughput, and a full 50% more memory bandwidth than GTX 680. Simply put, thanks to GK110 Titan is a far more powerful GPU than what GK104 could accomplish.
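The on-paper numbers above can be reproduced with a few lines of arithmetic. This is only a sketch of theoretical peaks at base clocks: it assumes each FP32 CUDA core retires one fused multiply-add (2 FLOPs) per clock, and that each texture unit and ROP handles one texel or pixel per clock. The helper names are ours.

```python
def peak_throughput(fp32_cores, texture_units, rops, base_clock_mhz):
    """Theoretical peak rates at base clock (assumes 2 FLOPs/core/clock)."""
    clock_ghz = base_clock_mhz / 1000
    return {
        "fp32_tflops": fp32_cores * 2 * clock_ghz / 1000,
        "gtexels_per_sec": texture_units * clock_ghz,
        "gpixels_per_sec": rops * clock_ghz,
    }


def percent_more(new, old):
    """How much larger `new` is than `old`, in percent."""
    return (new / old - 1) * 100


# Unit counts and base clocks taken from the spec table.
titan = peak_throughput(2688, 224, 48, 837)
gtx680 = peak_throughput(1536, 128, 32, 1006)

for metric in titan:
    gain = percent_more(titan[metric], gtx680[metric])
    print(f"{metric}: Titan {titan[metric]:.2f} vs GTX 680 "
          f"{gtx680[metric]:.2f} ({gain:.0f}% more)")
```

Note that the shading and texturing gaps come out identical because both cards share the same ratio of texture units to CUDA cores.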

Of course with great power comes great power bills, to which Titan is no exception. In GTX 680’s drive for efficiency NVIDIA got GTX 680 down to a TDP of 195W with a power target of 170W, a remarkable position given both the competition and NVIDIA’s prior generation products. Titan on the other hand will have a flat 250W power target – in line with prior generation big NVIDIA GPUs – staking out its own spot on the price/power hierarchy, some 28%-47% higher in power consumption than GTX 680. These values are almost identical to the upper and lower theoretical performance gaps between Titan and GTX 680, so performance is growing in-line with power consumption, but only just. From a practical perspective Titan achieves a similar level of efficiency as GTX 680, but as a full compute chip it’s unquestionably not as lean. There’s a lot of compute baggage present that GK104 didn’t have to deal with.
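The 28%-47% range comes from comparing Titan's 250W power target against GTX 680's TDP and power target respectively. A quick sketch of that comparison (variable names are ours; the performance gaps are the on-paper figures quoted earlier, not measured results):

```python
TITAN_POWER_TARGET_W = 250
GTX680_TDP_W = 195
GTX680_POWER_TARGET_W = 170

# Lower bound compares against GTX 680's TDP, upper bound against its
# power target.
power_gap_low = (TITAN_POWER_TARGET_W / GTX680_TDP_W - 1) * 100
power_gap_high = (TITAN_POWER_TARGET_W / GTX680_POWER_TARGET_W - 1) * 100

# Theoretical performance gaps quoted earlier: pixel throughput at the low
# end, shading/texturing at the high end.
perf_gap_low, perf_gap_high = 25, 46

print(f"Power:       {power_gap_low:.0f}% to {power_gap_high:.0f}% higher")
print(f"Performance: {perf_gap_low}% to {perf_gap_high}% higher (on paper)")
```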

Who’s Titan For, Anyhow?
157 Comments

  • hammer256 - Tuesday, February 19, 2013

    Ryan's analysis of the target market for this card is spot on: this card is for small scale HPC type workloads, where the researcher just wants to build a desktop-like machine with a few of those cards. I know that's what I use for my research. To me, this is the real replacement of the GTX 580 for our purposes. The price hike is not great, but when put in the context of the K20X, it's a bargain. I'm lusting to get 8 of these cards and get a Tyan GPU server.
  • Gadgety - Tuesday, February 19, 2013

    While gamers see little benefit, it looks like this is the card for GPU rendering, provided the software developers at VRay, Octane and others find a way to tap into this. So one of these can replace the 3xGTX580 3GBs.
  • chizow - Tuesday, February 19, 2013

    Nvidia has completely lost their minds. Throwing in a minor bone with the non-neutered DP performance does not give them license to charge $1K for this part, especially when DP on previous flagship parts carried similar performance relative to Tesla.

    First the $500 for a mid-range ASIC in GTX 680, then $1200 GTX 690 and now a $1000 GeForce Titan. Unbelievable. Best of luck Nvidia, good luck competing with the next-gen consoles at these price points, or even with yourselves next generation.

    While AMD is still at fault in all of this for their ridiculous launch pricing for the 7970, these recent price missteps from Nvidia make that seem like a distant memory.
  • ronin22 - Wednesday, February 20, 2013

    Bullshit of a typical NV hater.

    The compute-side of the card isn't a minor bone, it's its prime feature, along with the single-chip GTX690-like performance.

    "especially when DP on previous flagship parts carried similar performance relative to Tesla"

    Bullshit again.
    Give me a single card that is anywhere near the K20 in DP performance and we'll talk.

    You don't understand the philosophy of this card, as many around here.
    Thankfully, the real intended audience is already recognizing the awesomeness of this card (read previous comments).

    You can go back to playing BF3 on your 79xx, but please close the door behind you on your way out ;)
  • chizow - Wednesday, February 20, 2013

    Heh, your ignorant comments couldn't be further from the truth about being an "NV hater". I haven't bought an ATI/AMD card since the 9700pro (my gf made the mistake of buying a 5850 though, despite my input) and previously, I solely purchased *multiple* Nvidia cards in this flagship market for the last 3 generations.

    I have a vested interest in Nvidia in this respect as I enjoy their products, so I've never rooted for them to fail, until now. It's obvious to me now that between AMD's lackluster offerings and ridiculous launch prices along with Nvidia's greed with their last two high-end product launches (690 and Titan), that they've completely lost touch with their core customer base.

    Also, before you comment ignorantly again, please look up the DP performance of GTX 280 and GTX 480/580 relative to their Tesla counterparts. You will see they are still respectable, ~1/8th of SP performance, which was still excellent compared to the completely neutered 1/32 DP of GK104 Kepler. That's why there is still a high demand for flagship Fermi parts and even GT200 despite their overall reputation as a less desirable part due to their thermal characteristics.

    Lastly, I won't be playing BF3 on a 7970, try a pair of GTX 670s in SLI. There's a difference between supporting a company through sound purchasing decisions and stupidly pissing away $1K for something that cost $500-$650 in the past.

    The philosophy of this card is simple: Rob stupid people of their money. I've seen enough of this in the past from the same target audience and generally that feeling of "awesomeness" is quickly replaced by buyer's remorse as they realize that slightly higher FPS number in the upper left of their screen isn't worth the massive number on their credit card statement.
  • CeriseCogburn - Sunday, February 24, 2013

    That one's been pissing acid since the 680 launch, failed and fails to recognize the superior leap of the GTX580 over the prior gen, which gave him his mental handicap believing he can get something for nothing, along with sucking down the master amd fanboy Charlie D's rumor about the "$350" flagship nVidia card blah blah blah 680 blah blah second tier blah blah blah.

    So instead the rager now claims he wasted near a grand on two 670's - R O F L - the lunatics never end here man.
  • bamboo69 - Tuesday, February 19, 2013

    Origin is using EK Waterblocks? I hope they aren't nickel plated, their nickel blocks flake.
  • Knock24 - Wednesday, February 20, 2013

    I've seen it mentioned in the article that Titan has HyperQ support, but I've also read the opposite elsewhere.
    Can anyone confirm that HyperQ is supported? I'm guessing the simpleHyperQ Cuda SDK example might reveal if it's supported or not.
  • torchedguitar - Wednesday, February 20, 2013

    HyperQ actually means two separate things... One part is the ability to have a process act as a server, providing access to the GPU for other MPI processes. This is supported on Linux using Tesla cards (e.g. K20X) only, so it won't work on GTX Titan (it does work on Titan the supercomputer, though). The other part of HyperQ is that there are multiple hardware queues available for managing the work on multiple CUDA streams. GTX Titan DOES support this part, although I'm not sure just how many of these will be enabled (it's a tradeoff: having more hardware streams allows more flexibility in launching concurrent kernels, but also takes more memory and takes more time to initialize).

    The simpleHyperQ sample is a variation of the concurrentKernels sample (just look at the code), and it shows how having more hardware channels cuts down on false dependencies between kernels in different streams. You put things in different streams because they have no dependencies on each other, so in theory nothing in stream X should ever get stuck waiting for something in stream Y. When that does happen due to hitting limits of the hardware, it's a false dependency. An example would be when you try to time a kernel launch by wrapping it with CUDA event records (this is the simpleHyperQ sample). GPUs before GK110 only have one hardware stream, and if you take a program that launches kernels concurrently in separate streams, and wrap all the kernels with CUDA event records, you'll see that suddenly the kernels run one-at-a-time instead of all together. This is because in order to do the timing for the event, the single hardware channel queues up the other launches while waiting for each kernel to finish, then it records the end time in the event, then goes on to the next kernel. With HyperQ's addition of more hardware streams, you get around this problem.

    Run the simpleHyperQ sample on a 580 or a 680 through a tool like Nsight and look at the timeline... You'll see all the work in the streams show up like stair steps -- even though they're in different streams, they happen one at a time. Now run it on a GTX Titan or a K20 and you'll see many of the kernels are able to completely overlap. If 8 hardware streams are enabled, the app will finish 8x faster, or if 32 are enabled, 32x faster.

    Now, this sample is extremely contrived, just to illustrate the feature. In reality, overlapping kernels won't buy you much speedup if you're already launching big enough kernels to use the GPU effectively. In that case, there shouldn't be much room left for overlapping kernels, except when you have unbalanced workloads where many threads in a kernel finish quickly but a few stragglers run way longer. With HyperQ, you greatly increase your chances that kernels in other streams can immediately start using the resources freed up when some of the threads in a kernel finish early, instead of waiting for all threads in the kernel to finish before starting the next kernel.
  • vacaloca - Monday, March 04, 2013

    I wanted to say that you hit the nail on the head... I just tested the simpleHyperQ example, and indeed, the Titan has 8 hardware streams enabled. For every multiple higher than 8, the "Measured time for sample" goes up.
