History

We first met LucidLogix (now just Lucid) 2.5 years ago at IDF. The promise was vendor-agnostic multi-GPU setups with perfect performance scaling, and the technology was announced at a very important time. Intel and NVIDIA were battling over SLI support on Nehalem motherboards: NVIDIA didn't want SLI enabled on any non-NVIDIA chipset, and Intel wasn't about to let NVIDIA build chipsets for Nehalem. Lucid's Hydra technology seemed to be exactly what we needed to get around the legal holdup that kept Nehalem users from enjoying SLI.

Three things made Lucid's technology less interesting as time went on: Hydra took two years to come to market, NVIDIA enabled SLI on Intel platforms, and single-GPU performance got really, really good.

What made Lucid's Hydra tech possible was a software layer that intercepted OpenGL and DirectX calls from the CPU and directed them to a GPU of Lucid's choosing. While Hydra saw limited success, parts of the technology had another application.

Sandy Bridge's Platform Issues

Although we came away impressed by Intel's Sandy Bridge CPU and GPU, it was the platform that really let us down. SATA controller errata aside, Intel's 6-series chipset lineup had a huge problem. At launch, P67 was the only chipset that supported CPU overclocking; however, P67 doesn't support SNB's on-die GPU. Enter H67, which supports processor graphics but not overclocking. It gets worse.

One of the biggest features Sandy Bridge has to offer is support for hardware-assisted video transcoding (Quick Sync). In our review we found Intel's Quick Sync to be the absolute best way to transcode video for use on portable devices. There's just one issue: Quick Sync only works when the on-die GPU is active.

If you pair Sandy Bridge with a discrete GPU on the desktop, you lose the ability to use one of the CPU's biggest features.

Intel will address the overclocking/processor graphics exclusion with the upcoming Z68 chipset; however, that doesn't solve the problem of losing Quick Sync when a discrete GPU is installed. To keep Quick Sync, Intel originally suggested running two monitors, one connected to the motherboard's video out and the other to the discrete GPU, but that's hardly elegant. At CES this year we were shown a better alternative from none other than Lucid.

Remember the basis of how Hydra worked: intercept API calls and dynamically load balance them across multiple GPUs. With Sandy Bridge we don't need load balancing; we just need to send games to the discrete GPU and video decoding/encoding to the processor's GPU. That is exactly what Lucid's latest technology, Virtu, does.

The name Virtu is short for GPU Virtualization, and the setup is pretty simple at a high level.

Start with a platform that supports Sandy Bridge's processor graphics (H6x or Z68) and connect your display to the motherboard's video out. Add a supported discrete GPU and connect its power, but don't connect your monitor to it.

Virtu behaves a lot like Hydra: it intercepts API calls and passes them along to a GPU of its choosing. Unlike Hydra, however, the goal here isn't to spread the load across multiple GPUs but to match each task with the GPU best suited to it.
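The sketch below makes the routing idea concrete. It is a minimal Python illustration of the kind of decision Virtu makes after it intercepts a call. It is not Lucid's code, because Lucid hasn't published how Virtu classifies work. The `Workload` categories and the `route()` policy are hypothetical and are modeled only on the behavior described here: 3D games go to the discrete GPU, and video decode/encode goes to SNB's GPU.

```python
from enum import Enum, auto


class Gpu(Enum):
    IGPU = auto()  # Sandy Bridge processor graphics; the display is attached here
    DGPU = auto()  # discrete AMD/NVIDIA card; powered, but no monitor attached


class Workload(Enum):
    # Hypothetical categories; Virtu's real classification is not public.
    GAME_3D = auto()
    VIDEO_DECODE = auto()
    VIDEO_ENCODE = auto()  # e.g. a Quick Sync transcode
    DESKTOP_2D = auto()


def route(workload: Workload) -> Gpu:
    """Return the GPU best suited to a workload (illustrative policy only)."""
    if workload is Workload.GAME_3D:
        return Gpu.DGPU
    # Video decode/encode and ordinary desktop work stay on the iGPU,
    # since Quick Sync only works on the on-die GPU.
    return Gpu.IGPU


if __name__ == "__main__":
    # Two tasks running at the same time can land on different GPUs.
    for task in (Workload.GAME_3D, Workload.VIDEO_ENCODE):
        print(task.name, "->", route(task).name)
```

The decision is made per task rather than per system. This is what lets a game and a transcode run at the same time on different GPUs, and it is the key difference from switchable graphics.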

Video output is always handled by SNB's GPU. When the dGPU renders a frame, the data is simply copied from the dGPU's frame buffer to the iGPU's frame buffer for output. This copy should add some overhead, but Lucid claims it's minimal.
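The frame buffer copy has a predictable cost in raw data volume, which the sketch below estimates. It assumes one full 32-bit frame (4 bytes per pixel) is copied from the dGPU to the iGPU for every rendered frame. Lucid hasn't said exactly how the copy is performed, so treat the result as a back-of-the-envelope figure rather than a measurement.

```python
def copy_bandwidth_mb_s(width: int, height: int, fps: float,
                        bytes_per_pixel: int = 4) -> float:
    """Data moved per second (in MB, 10**6 bytes) if every frame is copied once."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps / 1e6


if __name__ == "__main__":
    # 1920 x 1200 is the resolution used in the game tests.
    for fps in (30, 60, 111.5):
        print(f"{fps:>6} fps -> {copy_bandwidth_mb_s(1920, 1200, fps):8.1f} MB/s")
```

At 1920 x 1200, each frame is about 9.2 MB. At 60 fps that works out to roughly 553 MB/s, and the 111.5 fps World of Warcraft result works out to roughly 1 GB/s. Assuming the card sits in a PCIe 2.0 x16 slot, that is a small fraction of the link's roughly 8 GB/s per direction. This makes Lucid's claim plausible, although the number says nothing about latency or synchronization costs.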

What we end up with is a system that should run all 3D games on the discrete GPU and all video decoding and encoding on SNB's GPU. Since this isn't switchable graphics but rather a form of GPU virtualization, you can actually run iGPU and dGPU applications at the same time (e.g. watch a movie in one window on the iGPU while playing a game in another on the dGPU).

Virtu relies on profiles and hard-coded GPU support. Currently around 100 games and benchmarks are supported. Eventually you'll be able to add your own titles manually, but for now we have to rely on what Lucid has validated and enabled. GPU support is broad but limited to the AMD 4xxx, 5xxx and 6xxx series and the NVIDIA 2xx, 4xx and 5xx series. Lucid pledges to always ensure that the top games are tested and supported, along with the previous two generations of AMD and NVIDIA GPUs.
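Virtu's gating can be pictured as two checks: is the installed card from a supported series, and does the running application have a validated profile? A minimal sketch of both checks follows. The supported series come from the list above. Everything else is an assumption:

- The executable names in `VALIDATED_PROFILES` are hypothetical placeholders for Lucid's roughly 100 validated titles.
- Reading the series from the model number's leading digit is our own simplification.
- We assume titles without a profile simply stay on SNB's processor graphics.

```python
import re

# Supported series per vendor: (expected model-number length, supported series).
# AMD: first digit of a four-digit Radeon HD model number (6970 -> 6).
# NVIDIA: first digit of a three-digit GeForce model number (460 -> 4).
SUPPORTED_SERIES = {
    "AMD": (4, {4, 5, 6}),
    "NVIDIA": (3, {2, 4, 5}),
}

# Hypothetical stand-in for Lucid's validated profile list; these executable
# names are placeholders, not Lucid's actual profile entries.
VALIDATED_PROFILES = {"civilization_v.exe", "dirt2.exe", "metro2033.exe",
                      "world_of_warcraft.exe"}


def gpu_supported(vendor: str, model: str) -> bool:
    """True if the card's model number falls in a supported series."""
    if vendor not in SUPPORTED_SERIES:
        return False
    length, series = SUPPORTED_SERIES[vendor]
    match = re.search(r"\d+", model)
    if match is None or len(match.group()) != length:
        return False
    return int(match.group()[0]) in series


def runs_on_dgpu(exe: str, vendor: str, model: str) -> bool:
    """Assumed policy: only profiled titles on supported cards go to the dGPU;
    everything else stays on SNB's processor graphics."""
    return gpu_supported(vendor, model) and exe.lower() in VALIDATED_PROFILES


if __name__ == "__main__":
    print(runs_on_dgpu("Metro2033.exe", "AMD", "Radeon HD 6970"))
    print(runs_on_dgpu("Metro2033.exe", "NVIDIA", "GeForce 9800 GT"))
```

In this model, the promised ability to add your own titles would amount to letting users add entries to the profile set.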

The Virtu software will be bundled with motherboards. The business arrangements are between the motherboard manufacturers and Lucid, so the end user shouldn't have to worry about licensing the software.

Lucid gave us a copy of the software it shared with motherboard manufacturers: a Virtu release candidate. The software isn't ready for mass production yet and has some limitations (e.g. we can't define our own game profiles, and a Virtu logo is plastered at random on the screen while gaming), but it's enough to give us a brief look at the technology.

Installing Virtu was very simple. Just go through the installer application, reboot and you're good to go. The only requirements are that you're using a compatible video card and that your display is connected to the SNB video out and not the discrete GPU.

Once Virtu was loaded, the first thing I noticed was that AMD's Catalyst Control Center and NVIDIA's control panel refused to load. As far as they were concerned, I was running an Intel HD 3000 GPU and they weren't needed. The appropriate AMD and NVIDIA drivers did load, however.

Other than the irate control panels, the rest of the experience was completely seamless. I ran games, browsed the web and even transcoded a video, and each application behaved as if the only GPU available was the one best suited to the task. Quick Sync even showed up as an option in ArcSoft's Media Converter 7.

I measured performance in four games, both with Virtu and natively on the dGPU alone, to see how much overhead the frame buffer copy and Virtu's interception add:

AMD Lucid Virtu Performance Impact - 1920 x 1200, 4X AA, High Quality
Configuration | Civilization V | DiRT 2 | Metro 2033 | World of Warcraft
AMD Radeon HD 6970 | 39.6 fps | 76.4 fps | 34.7 fps | 111.5 fps
AMD Radeon HD 6970 (Virtu) | 36.5 fps | 74.4 fps | 32.3 fps | 102.8 fps

NVIDIA Lucid Virtu Performance Impact - 1920 x 1200, 4X AA, High Quality
Configuration | Civilization V | DiRT 2 | Metro 2033 | World of Warcraft
NVIDIA GeForce GTX 460 | 38.8 fps | 69.4 fps | 18.7 fps | 85.4 fps
NVIDIA GeForce GTX 460 (Virtu) | 35.8 fps | 48.0 fps | 18.0 fps | 79.7 fps

I generally saw a 2 - 8% drop in performance compared to a standalone discrete GPU without Virtu. The only exception was a big 30% drop on the GeForce GTX 460 in the DiRT 2 benchmark. Given how consistent performance was everywhere else, I'm guessing this is an artifact of early software rather than normal behavior.
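The percentages above are plain arithmetic on the two tables. The short Python sketch below reproduces them from the published frame rates, measuring each drop as a share of the native frame rate.

```python
# (native dGPU fps, fps with Virtu) from the AMD and NVIDIA tables above
RESULTS = {
    "Radeon HD 6970": {
        "Civilization V": (39.6, 36.5),
        "DiRT 2": (76.4, 74.4),
        "Metro 2033": (34.7, 32.3),
        "World of Warcraft": (111.5, 102.8),
    },
    "GeForce GTX 460": {
        "Civilization V": (38.8, 35.8),
        "DiRT 2": (69.4, 48.0),
        "Metro 2033": (18.7, 18.0),
        "World of Warcraft": (85.4, 79.7),
    },
}


def drop_pct(native: float, virtu: float) -> float:
    """Performance lost under Virtu, as a percentage of the native frame rate."""
    return (native - virtu) / native * 100


for gpu, games in RESULTS.items():
    for game, (native, virtu) in games.items():
        print(f"{gpu:<16} {game:<18} {drop_pct(native, virtu):5.1f}%")
```

Every case except one lands between 2.6% (DiRT 2 on the Radeon) and 7.8%. The exception is DiRT 2 on the GTX 460, which comes out at 30.8%.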

I also ran a Quick Sync test both with and without a discrete GPU attached - performance remained unchanged:

Lucid Virtu Performance Impact
Configuration | Quick Sync: Nikon D7000 (1080p24) to iPhone 4
AMD Radeon HD 6970 + Intel HD Graphics 3000 (Virtu) | 199.3 fps
Intel HD Graphics 3000 | 199.3 fps

Next, I ran a Quick Sync test while our Metro 2033 benchmark was running, to see how two tasks, each on its own GPU, affect each other:

Lucid Virtu Performance Impact (Metro 2033 + Quick Sync)
Configuration | Quick Sync: Nikon D7000 (1080p24) to iPhone 4 | Metro 2033 Benchmark
Peak Theoretical Performance | 199.3 fps | 36.5 fps
AMD Radeon HD 6970 + Intel HD Graphics 3000 (Virtu) | 72.0 fps | 32.1 fps

While Metro didn't lose much performance, the Quick Sync task ran considerably slower. Remember that the Quick Sync engine shares resources with the Sandy Bridge CPU cores (mainly the ring bus and L3 cache). Keeping the CPU busy feeding vertex data to the dGPU definitely impacts Quick Sync performance.
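Expressing each concurrent result as a share of that task's standalone figure makes the asymmetry explicit. The numbers come from the Metro 2033 + Quick Sync table, and "standalone" means the row labeled Peak Theoretical Performance.

```python
def retained_pct(measured: float, peak: float) -> float:
    """Share of standalone throughput a task kept while the other task ran."""
    return measured / peak * 100


quick_sync = retained_pct(72.0, 199.3)  # Quick Sync transcode, fps
metro = retained_pct(32.1, 36.5)        # Metro 2033 benchmark, fps

print(f"Quick Sync kept {quick_sync:.1f}% of its standalone rate")
print(f"Metro 2033 kept {metro:.1f}% of its standalone rate")
```

Quick Sync keeps only about 36% of its standalone speed, while Metro 2033 keeps about 88% of its own.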

Finally I measured power consumption:

Lucid Virtu Power Consumption
Configuration | Idle | Load (Metro 2033)
Intel HD Graphics 3000 | 34.7W | N/A
AMD Radeon HD 6970 (Virtu) | 126W | 265W
NVIDIA GeForce GTX 460 (Virtu) | 52.0W | 191W

Here we see that there are still some kinks to work out. With the Radeon HD 6970, idle power remains quite high even though the dGPU has nothing to do. The GeForce GTX 460 paints a different picture: Lucid manages to mostly power down the NVIDIA GPU when it's not in use. Even so, there's still a power penalty compared with a purely integrated setup, since the dGPU remains active to some extent.
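The idle penalty is easier to compare as a delta against the integrated-only configuration. The sketch below computes that delta. It assumes the Intel HD Graphics 3000 row is the same test system with no discrete card installed, which the table implies but doesn't state outright.

```python
IGPU_ONLY_IDLE_W = 34.7  # Intel HD Graphics 3000 only, idle

VIRTU_IDLE_W = {
    "Radeon HD 6970 (Virtu)": 126.0,
    "GeForce GTX 460 (Virtu)": 52.0,
}

# Extra idle power drawn because the discrete card is installed under Virtu
idle_penalty_w = {name: watts - IGPU_ONLY_IDLE_W
                  for name, watts in VIRTU_IDLE_W.items()}

for name, extra in idle_penalty_w.items():
    print(f"{name}: +{extra:.1f} W at idle")
```

The Radeon adds about 91 W at idle, compared with about 17 W for the GeForce. That is more than five times the penalty for the same idle desktop.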

Final Words

Intel is slowly correcting the issues with the Sandy Bridge platform. The first B3-stepping 6-series chipsets are now in the hands of OEMs and motherboard manufacturers, and Z68 boards are coming next quarter. Lucid's Virtu is nonetheless a key part of the strategy, at least on the desktop. In notebooks it's a non-issue, as everyone supports some form of switchable graphics there, but on the desktop we need a universal solution. While the Virtu release candidate still needs some work, it's far more polished than I expected it to be.

Once set up, Virtu requires no user intervention; the software just works. Fire up a game and it'll run on your discrete GPU. Visit YouTube or transcode a video and your discrete GPU powers down, leaving Sandy Bridge's on-die graphics to handle the workload.

Virtu does carry some overhead: I generally measured a 2 - 8% drop, though DiRT 2 on NVIDIA hardware showed a 30% hit. I'd expect the performance hit to be less than 10% in most cases.

Board makers and OEMs should have the Virtu RC in hand now, so we should see it show up in motherboard boxes in the not-too-distant future. Of course, this still doesn't help users who want to overclock their CPU, pair it with a discrete GPU and use Quick Sync as well; we'll have to wait for Z68 for that. Even then, Lucid's Virtu will likely still play a role in those systems.

40 Comments

  • PhoenixEnigma - Tuesday, March 1, 2011

    The point, in a nutshell, is to be able to have both a dGPU and Quick Sync work on one system. Without this, it's one or the other, which is a shame as both provide significant benefits.
  • synaesthetic - Tuesday, March 1, 2011

    To save power, to reduce heat and fan noise.

    (This will probably be more useful on laptops than desktops, however...)
  • JumpingJack - Tuesday, March 1, 2011

    SB enables a key feature that some, maybe even many, people will find useful: video transcoding. If you do little or no video transcoding, then all this is sorta pointless. But if you move lots of media to your portable devices, this is a good solution to an otherwise bizarre design of the SB platform.

    QuickSync uses the iGPU to work much of its magic, combining fixed-function encode HW with the EUs for parts of the transcoding stream. Intel, in a fit of non-brilliance, did not account for (or did not take the time to think through) the fact that some people would want to use a dGPU. As a result, using a dGPU renders SB's HW transcode non-operational. This is a big deal, since SB transcoding can outperform most GPU-accelerated transcoding by a wide margin: against a high-end Radeon, SB can be as much as 2x faster, and against Fermi as much as 60-80% faster.
  • mbf - Tuesday, March 1, 2011

    ...thinking it would have been a good idea to have iGPU data sent to the dGPU instead of vice versa? Or perhaps even make this configurable.

    I know that you wouldn't be able to cut down on power or noise, but I'm sure you'd lose less performance in gaming.
  • QuantumTR - Tuesday, March 1, 2011

    lol exactly...
  • Breit - Tuesday, March 1, 2011

    ...just wanted to ask the same question. ;)

    As far as I understand it, Intel shuts down its iGPU if no monitor is connected and that is why you have to route the display output through the iGPU. Maybe Lucid should talk to Intel first before releasing this... 8)

    Btw: How do you connect multiple monitors to such a setup?
  • QuantumTR - Tuesday, March 1, 2011

    I think it would be better if Virtu copied the frame buffer from the integrated GPU to the discrete GPU in 2D mode or during encode and decode, so that the overhead incurred in 3D rendering would be minimized. I'm pretty sure most people would rather finish a QuickSync encode a few seconds later than take a 5-10% performance hit in their games. Even better would be letting the user select the output graphics card.
  • jcompagner - Tuesday, March 1, 2011

    http://www.nvidia.com/object/optimus_technology.ht...

    I am searching for an SB notebook, but I want that enabled, because I almost never use the discrete GPU anyway. And yes, I have to buy one, because in a 17" high-end laptop you don't have a choice.
  • mutantmagnet - Tuesday, March 1, 2011

    or Lucid eventually revises their technology so we can choose the output otherwise I wouldn't be able to use Eyefinity without constantly rearranging the cables.

    I really like the idea behind this, but I'm wondering how flexible it can be at allocating multiple gpus to more than 2 displays. Having an agnostic gpu virtualization platform instead of being locked into either AMD or Nvidia would be beneficial, especially since they currently believe only people in the enterprise market are considering gpu virtualization.
  • DooDoo22 - Tuesday, March 1, 2011

    What is Apple doing to utilize both Quick Sync and discrete AMD graphics switching? Is it something similar to this?
