Original Link: http://www.anandtech.com/show/6863/intels-pixelsync-instantaccess-two-new-directx-extensions-for-haswell
Intel's PixelSync & InstantAccess: Two New DirectX Extensions for Haswellby Anand Lal Shimpi on March 27, 2013 9:00 PM EST
As Intel continues its march towards performance relevancy in the graphics space with Haswell, it should come as no surprise that we're hearing more GPU related announcements from the company. At this year's Game Developer Conference, Intel introduced two new DirectX extensions that will be supported on Haswell and its integrated GPU. The first extension is called PixelSync (yep, Intel branding appears alive and well even on the GPU side of the company - one of these days the HD moniker will be dropped and Core will get a brother). PixelSync is Intel's hardware/software solution to enable Order Independent Transparency (OIT) in games. The premise behind OIT is the quick sorting of transparent elements so they are rendered in the right order, enabling some pretty neat effects in games if used properly. Without OIT, game designers are limited in what sort of scenes they can craft. It's a tough problem to solve, but one that has big impacts on game design.
Although OIT is commonly associated with DirectX 11, it's not a feature of the API but rather something that's enabled using the API. Intel's claim is that current implementations of OIT require unbounded amounts of memory and memory bandwidth (bandwidth requirements can scale quadratically with shader complexity). Given that Haswell (and other integrated graphics solutions) will be more limited on memory and memory bandwidth than the highest end discrete GPUs, it makes sense that Intel is motivated to find a smaller footprint and more bandwidth efficient way to implement OIT.
The hardware side of PixelSync is simply the enabling of programmable blend operations on Haswell. On PC GPU architectures, all frame buffer operations flow through fixed function hardware with limited flexibility. Interestingly enough, this is one area where the mobile GPUs have moved ahead of the desktop world - NVIDIA's Tegra GPUs reuse programmable pixel shader ALUs for frame buffer ops. The Haswell implementation isn't so severe. There are still fixed function ROPs, but the Haswell GPU core now includes hardware that locks and forces the serialization of memory accesses when triggered by the PixelSync extension. With 3D rendering being an embarrassingly parallel problem, having many shader instructions working on overlapping pixel areas can create issues when running things like OIT algorithms. What PixelSync does is allows the software to tell the hardware that for a particular segment of code, that any shaders running on the same pixel(s) need to be serialized rather than run in parallel. The serialization is limited to directly overlapping pixels, so performance should remain untouched for the rest of the code. This seemingly simple change goes a long way to enabling techniques like OIT, as well as giving developers the option of creating their own frame buffer operations.
The software side of PixelSync is an evolved version of Intel's own Order Independent Transparency algorithm that leverages high quality compression to reduce memory footprint and deliver predictable performance. Intel has talked a bit about an earlier version of this algorithm here for those interested.
Intel claims that two developers have already announced support for PixelSync, with Codemasters producer Clive Moody (GRID 2) appearing in the Intel press release excited about the new extension. Creative Assembly also made an appearance in the PR, claiming the extensions will be used in Total War: Rome II.
The second extension, InstantAccess is simply Intel's implementation of zero copy. Although Intel's processor graphics have supported unified memory for a while (CPU + GPU share the same physical memory), the two processors don't get direct access to each others memory space. Instead, if the GPU needs to work on something the CPU has in memory, it needs to make its own copy of it first. The copy process is time consuming and wasteful. As we march towards true heterogeneous computing, we need ways of allowing both processors to work on the same data in memory. With InstantAccess, Intel's graphics driver can deliver a pointer to a location in GPU memory that the CPU can then access directly. The CPU can work on that GPU address without a copy and then release it back to the GPU. AMD introduced support for something similar back with Llano.
Source: Intel PR