Original Link: http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
NVIDIA Demonstrates Logan SoC: < 1W Kepler, Shipping in 1H 2014, More Energy Efficient than A6X?by Anand Lal Shimpi on July 24, 2013 9:00 AM EST
Ever since its arrival in the ultra mobile space, NVIDIA hasn't really flexed its GPU muscle. The Tegra GPUs we've seen thus far have been ok at best, and in serious need of improvement at worst. NVIDIA often blamed an immature OEM ecosystem unwilling to pay for the sort of large die SoCs necessary in order to bring a high-performance GPU to market. Thankfully, that's all changing. Earlier this year NVIDIA laid out its mobile SoC roadmap through 2015, including the 2014 release of Project Logan - the first NVIDIA ultra mobile SoC to feature a Kepler GPU. Yesterday in a private event at Siggraph, NVIDIA demonstrated functional Logan silicon for the very first time.
NVIDIA got Logan silicon back from the fabs around 3 weeks ago, making it almost certain that we're dealing with some form of 28nm silicon here and not early 20nm samples.
NVIDIA isn't talking about CPU cores, but it's safe to assume that Logan will be another 4+1 arrangement of cores - likely still based on ARM's Cortex A15 IP (but perhaps a newer revision of the core). On the GPU front, NVIDIA confirmed our earlier speculation that Logan includes a single Kepler SMX:
One Kepler SMX features 192 CUDA cores. NVIDIA isn't talking about shipping GPU frequencies either, but it did provide this chart to put Logan's GPU capabilities into perspective:
Don't get too excited as we're looking at a comparison of GFLOPS and not game performance, but the peak theoretical ALU bound performance of mobile Kepler should exceed that of a Playstation 3 or GeForce 8800 GTX (memory bandwidth is another story however). If we look closely at NVIDIA's chart and compare mobile Kepler to the iPad 4, we get a better idea of what sort of clock speeds NVIDIA would need to attain this level of performance. Doing some quick Photoshop estimation it looks like NVIDIA is claiming mobile Kepler has somewhere around 5.2x the FP power of the PowerVR SGX 554MP4 in the iPad 4 (76.8 GFLOPS). That works out to be right around 400 GFLOPS. With a 192 core implementation of Kepler, you get 2 FLOPS per core or 384 FLOPS per cycle. To hit 400 GFLOPS you'd need to clock the mobile Kepler GPU at roughly 1GHz. That's certainly doable from an architectural standpoint (although we've never seen it done on any low power 28nm process), but it's probably a bit too high for something like a smartphone.
NVIDIA didn't want to talk frequencies but they did tell me that we might see something this fast in some sort of a tablet. I suspect that most implementations will be clocked significantly lower. Even at half the frequency though, we're still talking about roughly Playstation 3 levels of FP power out of a mobile SoC. We know nothing of Logan's memory subsystem, which obviously plays a major role in real world gaming performance but there's no getting around the fact that Logan's Kepler implementation means serious business. For years we've lamented NVIDIA's mobile GPUs, Logan looks like it's finally going to change that.
API Support and Live Demos
Unlike previous Tegra GPUs, Kepler is a fully unified architecture and OpenGL ES 3.0, OpenGL 4.4 and DirectX 11 compliant. The API compliance alone is a huge step forward for NVIDIA. It's also a big one for game developers looking to move more seriously into mobile. Epic's Tim Sweeney even did a blog post for NVIDIA talking about Logan's implementation of Kepler and how it brings feature parity between PCs, next-gen consoles and mobile platforms. NVIDIA responded in kind by running some Unreal Engine 4 demos on Android on a Logan test platform. That's really the big story behind all of this. With Logan, NVIDIA will bring its mobile GPUs up to feature parity with what it's shipping in the PC market. Game developers looking to port games between console, PC, tablet and smartphone should have an easier job of doing that if all platforms supported the same APIs. Logan will take NVIDIA from being very behind in API support (with no OpenGL ES 3.0 support) to the head of the class.
NVIDIA took its Ira demo, originally run on a Titan at GTC 2013, and got it up and running on a Logan development board. Ira did need some work to make the transition to mobile. The skin shaders were simplified, smaller textures are used and the rendering resolution is dropped to 1080p. NVIDIA claims this demo was done in a 2 - 3W power envelope.
The next demo is called Island and was originally shown on a Fermi desktop part. Running on Logan/mobile Kepler, this demo shows OpenGL 4.3 and hardware tessellation working.
The development board does feature a large heatspreader, but that's not too unusual for early silicon just out of bring up. Logan's package size should be comparable to Tegra 4, although the die size will clearly be larger. The dev board is running Android and is connected to a 10.1-inch 1920 x 1200 touchscreen.
There's a lot of uncertainty around whether or not Kepler is suitable for ultra low power operation, especially given that we've only seen it in relatively high TDP (compared to tablets/smartphones) PCs. NVIDIA hoped to put those concerns to rest with a quick GLBenchmark 2.7 demo at Siggraph. The demo pitted an iPad 4 against a Logan development platform, with Logan's Kepler GPU clocked low enough to equal the performance of the iPad 4. The low clock speed does put Kepler at an advantage as it can run at a lower voltage as well, so the comparison is definitely one you'd expect NVIDIA to win.
Unlike Tegra 3, Logan includes a single voltage rail that feeds just the GPU. NVIDIA instrumented this voltage rail and measured power consumption while running the offscreen 1080p T-Rex HD test in GLB2.7. Isolating GPU power alone, NVIDIA measured around 900mW for Logan's Kepler implementation running at iPad 4 performance levels (potentially as little as 1/5 of Logan's peak performance). NVIDIA also attempted to find and isolate the GPU power rail going into Apple's A6X (using a similar approach to what we documented here), and came up with an average GPU power value of around 2.6W.
I won't focus too much on the GPU power comparison as I don't know what else (if anything) Apple hangs off of its GPU power rail, but the most important takeaway here is that Kepler seems capable of scaling down to below 1W. In reality NVIDIA wouldn't ship Logan with a < 1W Kepler implementation, so we'll likely see higher performance (and power consumption) in shipping devices. If these numbers are believable, you could see roughly 2x the performance of an iPad 4 in a Logan based smartphone, and 4 - 5x the performance of an iPad 4 in a Logan tablet - in as little as 12 months from now if NVIDIA can ship this thing on time.
If NVIDIA's A6X power comparison is truly apples-to-apples, then it would be a huge testament to the power efficiency of NVIDIA's mobile Kepler architecture. Given the recent announcement of NVIDIA's willingness to license Kepler IP to any company who wants it, this demo seems very well planned.
NVIDIA did some work to make Kepler suitable for low power, but it's my understanding that the underlying architecture isn't vastly different from what we have in notebooks and desktops today. Mobile Kepler retains all of the graphics features as its bigger counterparts, although I'm guessing things like FP64 CUDA cores are gone.
For the past couple of years we've been talking about a point in the future when it'll be possible to start playing console class games (Xbox 360/PS3) on mobile devices. We're almost there. The move to Kepler with Logan is a big deal for NVIDIA. It finally modernizes NVIDIA's ultra mobile GPU, bringing graphics API partity to everything from smartphones to high-end desktop PCs. This is a huge step for game developers looking to target multiple platforms. It's also a big deal for mobile OS vendors and device makers looking to capitalize on gaming as a way of encouraging future smartphone and tablet upgrades. As smartphone and tablet upgrade cycles slow down, pushing high-end gaming to customers will become a more attractive option for device makers.
Logan is expected to ship in the first half of 2014. With early silicon back now, I think 10 - 12 months from now is a reasonable estimate. There is the unavoidable fact that we haven't even seen Tegra 4 devices on the market yet and NVIDIA is already talking about Logan. Everything I've heard points to Tegra 4 being on the schedule for a bunch of device wins, but delays on NVIDIA's part forced it to be designed out. Other than drumming up IP licensing business, I wonder if that's another reason why we're seeing a very public demo of Logan now - to show the health of early silicon. There's also a concern about process node. Logan will likely ship at 28nm next year, just before the transition to 20nm. If NVIDIA is late with Logan, we could have another Tegra 3 situation where NVIDIA is shipping on an older process technology.
Regardless of process tech however, Kepler's power story in ultra mobile seems great. I really didn't believe the GLBenchmark data when I first saw it. I showed it to Ryan Smith, our Senior GPU Editor, and even he didn't believe it. If NVIDIA is indeed able to get iPad 4 levels of graphics performance at less than 1W (and presumably much more performance in the 2.5 - 5W range) it looks like Kepler will do extremely well in mobile.
Whatever NVIDIA's reasons for showing off Logan now, the result is something that I'm very excited about. A mobile SoC with NVIDIA's latest GPU architecture is exactly what we've been waiting for.