Kicking off this week is SIGGRAPH, the graphics industry’s annual professional conference. Outside of the individual vendor events and individual technologies we cover throughout the year, SIGGRAPH is typically the major venue for new technology and standards announcements. And though it isn’t really a gaming conference (this is a show dedicated to professional software and hardware), a number of those announcements do end up being gaming related, if only tangentially. As a result SIGGRAPH offers something for everyone in the graphics/GPU trident: gaming, compute, and professional rendering alike.

Most years the first major announcement to hit the wire comes from the Khronos Group, and this year is no different. The Khronos Group is of course the industry consortium responsible for OpenGL, OpenCL, WebGL, and other open graphics/multimedia standards and APIs, so their announcements carry a great deal of importance for the industry. Khronos membership in turn is a who’s who of technology, and includes virtually every major GPU vendor, both desktop and mobile.

OpenGL 4.4 Specification Released

Khronos’s first announcement for SIGGRAPH 2013 is that the OpenGL 4.4 specification has been ratified and released. As the fifth release in the OpenGL 4.x series, it continues Khronos’s practice of iterating on OpenGL through their pipelined development process. OpenGL 4.4 follows up on OpenGL 4.3, which last year broke significant ground for OpenGL by introducing compute shaders, ASTC texture compression, and other new functionality for the API.

This year Khronos isn’t making such sweeping changes to OpenGL, but they are adding several new low-level features that should catch the eyes of developers. Most of these are admittedly so low level that it would be difficult for anyone but developers to appreciate them, but there are a few items we wanted to go over, both for their importance and for what they say about the wider state of OpenGL.

The biggest feature hitting the OpenGL core specification in 4.4 is buffer storage (ARB_buffer_storage). Buffer storage is directly targeted at APUs, SoCs, and other GPU/CPU integrated devices where the two processors share memory pools, address space, and other resources. Buffer storage at its most basic level allows developers to control where memory buffer objects are stored in these unified devices, giving developers the ability to specify whether buffers are stored in video memory or system memory, and how those buffers are to be cached. The buffer storage mechanism in turn also formally allows GPUs to access those buffers not being stored locally, giving GPUs a degree of visibility into the contents of system memory where it’s necessary. Like most Khronos additions this is a forward looking feature, with a clear outlook towards what can be done with HSA and HSA-like products that are due to be launching soon.
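
As a rough illustration of the kind of control described above, the sketch below models the usage flags that the extension's `glBufferStorage` entry point accepts. The bit values and the two validity rules are taken from our reading of the ARB_buffer_storage specification rather than from Khronos's announcement, and `check_storage_flags` is a hypothetical helper, not part of any OpenGL binding. No GL context is created; the code only reasons about flag combinations.

```python
# Flag bits as we understand them from the ARB_buffer_storage specification.
GL_MAP_READ_BIT = 0x0001
GL_MAP_WRITE_BIT = 0x0002
GL_MAP_PERSISTENT_BIT = 0x0040   # buffer may stay mapped while the GPU uses it
GL_MAP_COHERENT_BIT = 0x0080     # CPU and GPU writes become visible without explicit flushes
GL_DYNAMIC_STORAGE_BIT = 0x0100  # contents may be updated after creation
GL_CLIENT_STORAGE_BIT = 0x0200   # hint: back the buffer with system (client) memory


def check_storage_flags(flags):
    """Hypothetical helper: list the reasons a flag combination would be rejected."""
    problems = []
    # A persistent mapping only makes sense if the buffer can be mapped at all.
    if flags & GL_MAP_PERSISTENT_BIT and not flags & (GL_MAP_READ_BIT | GL_MAP_WRITE_BIT):
        problems.append("persistent mapping needs MAP_READ or MAP_WRITE")
    # Coherency is a property of a persistent mapping.
    if flags & GL_MAP_COHERENT_BIT and not flags & GL_MAP_PERSISTENT_BIT:
        problems.append("coherent mapping needs MAP_PERSISTENT")
    return problems


# A streaming buffer the CPU writes while the GPU reads, kept in system memory.
streaming = (GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT
             | GL_MAP_COHERENT_BIT | GL_CLIENT_STORAGE_BIT)
print(check_storage_flags(streaming))            # no problems reported
print(check_storage_flags(GL_MAP_COHERENT_BIT))  # coherent without persistent
```

Note that `GL_CLIENT_STORAGE_BIT` is only a hint; where the buffer actually lives remains up to the driver.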

Khronos’s other major addition with OpenGL 4.4 is enhanced layouts for the OpenGL Shading Language, GLSL (ARB_enhanced_layouts). The name is largely self-explanatory: enhanced layouts provide ways to optimize how data is laid out in shader programs for greater efficiency. This includes new ways of packing scalar datatypes alongside vectors, and more control over variable layout inside uniform and storage blocks. The extension also adds support for compile-time constant expressions in layout qualifiers.
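
To make the packing idea concrete, here is a toy model of it. Our understanding is that the extension adds a `component` layout qualifier, so that a scalar can share a four-component location with, say, a three-component vector. The `pack_components` function below is a hypothetical first-fit packer written for illustration; it is not how any GLSL compiler assigns locations, and the variable names are made up.

```python
def pack_components(variables):
    """Assign each (name, n_components) pair a (location, first_component) slot.

    Toy first-fit model: every location holds 4 components, and a variable
    must fit entirely inside one location.
    """
    used = []        # used[i] = components already taken in location i
    placement = {}
    for name, size in variables:
        for loc, taken in enumerate(used):
            if taken + size <= 4:
                placement[name] = (loc, taken)
                used[loc] += size
                break
        else:
            # No existing location has room, so open a new one.
            used.append(size)
            placement[name] = (len(used) - 1, 0)
    return placement


# Hypothetical vertex inputs: a vec3, a float, a vec2, and another float.
inputs = [("position", 3), ("weight", 1), ("uv", 2), ("fog", 1)]
layout = pack_components(inputs)
for name, (location, component) in layout.items():
    print(f"layout(location = {location}, component = {component}) {name}")
```

Without packing, these four inputs would occupy four locations; in this model they fit in two.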

Moving on from the OpenGL core, in keeping with the OpenGL development pipeline several new features are being added as official ARB extensions, being promoted (and modified/unified as necessary) from vendor specific extensions. Chief among these new ARB extensions are extensions to support sparse textures (ARB_sparse_texture) and bindless textures (ARB_bindless_texture). You may recognize these features from the launch of AMD’s Radeon HD 7000 series and NVIDIA’s GeForce GTX 600 series respectively, as these two extensions are based on the new hardware features those products introduced and are the evolution of their previous forms as vendor specific extensions.

Sparse textures, also known as partially resident textures, give the hardware the ability to keep only tiles/chunks of textures in resident memory, versus having to load (and unload) whole textures. The most practical application of this technology is to enable megatexture-like texture management in hardware, loading only the necessary tiles of the highest resolution textures. For professional developers this also opens up a new usage scenario: textures larger than the physical memory of a card, since only the tiles actually in use need to be resident.
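
The memory savings are easy to estimate with some back-of-the-envelope arithmetic. The sketch below assumes a hypothetical 16K x 16K RGBA8 texture and a hypothetical 128 x 128 texel tile size (real page sizes are implementation dependent), and works out how much memory is needed when only the tiles under a 512 x 512 texel region are resident. It is a model of the bookkeeping, not a use of the actual extension API.

```python
def tiles_for_region(x, y, w, h, tile):
    """Return the set of (tx, ty) tile indices overlapped by a texel rectangle."""
    x0, y0 = x // tile, y // tile
    x1, y1 = (x + w - 1) // tile, (y + h - 1) // tile
    return {(tx, ty) for tx in range(x0, x1 + 1) for ty in range(y0, y1 + 1)}


TEX_SIZE = 16384       # hypothetical 16K x 16K texture
TILE = 128             # hypothetical tile (page) size in texels
BYTES_PER_TEXEL = 4    # RGBA8

# Only the tiles touched by the visible 512 x 512 region are made resident.
resident = tiles_for_region(1000, 2000, 512, 512, TILE)
resident_bytes = len(resident) * TILE * TILE * BYTES_PER_TEXEL
full_bytes = TEX_SIZE * TEX_SIZE * BYTES_PER_TEXEL

print(len(resident), "tiles resident")
print(resident_bytes, "bytes resident of", full_bytes, "bytes total")
```

Because the region is not tile aligned it touches a 5 x 5 block of tiles, yet that is still a small fraction of the 1 GiB the whole texture would occupy.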

Meanwhile bindless texture functionality does away with the concept of texture “slots” and the limits imposed by the limited number of slots, replacing the fixed size binding table with unlimited redirection through the use of virtual addresses. The primary benefit is that it allows the easy addition and use of more textures within a scene (under most DX11 hardware this limit was 128 slots), but there is also a performance angle. Since binding and rebinding objects is a task that relies on the CPU, getting rid of binding altogether can improve performance in CPU limited scenarios. Khronos/NVIDIA throws around a 10x best-case number, and while this is certainly the exception rather than the rule, it will be interesting to see what the real world benefits are once applications utilizing this feature start coming out.
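
The CPU-side argument can be illustrated with a deliberately simplified count. The sketch below compares a single-slot binding model, where every change of texture between draws costs a bind call, with a handle model, where each texture is set up once and draws simply reference its handle. Both functions and the draw list are hypothetical; in the real extension the one-time setup would, as we understand it, go through calls such as `glGetTextureHandleARB` and `glMakeTextureHandleResidentARB`.

```python
def count_binds(draw_textures):
    """Slot model: rebind whenever the next draw needs a different texture."""
    binds, bound = 0, None
    for tex in draw_textures:
        if tex != bound:
            binds += 1
            bound = tex
    return binds


def count_handle_setups(draw_textures):
    """Bindless model: each distinct texture gets a handle once, up front."""
    return len(set(draw_textures))


# Hypothetical draw order for one frame.
draws = ["brick", "wood", "brick", "metal", "wood", "brick"]
print("binds per frame (slot model):", count_binds(draws))
print("one-time handle setups (bindless model):", count_handle_setups(draws))
```

The slot-model cost is paid again every frame, while the handle setup is paid once, which is where a best-case speedup in CPU limited scenes would come from. Sorting draws by texture narrows the gap, so this toy count should not be read as a performance prediction.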

Ultimately both of these features, along with several other ARB extensions, are in the middle of their evolution. The ARB extension stage is essentially a half-way house for major features, allowing features to be further refined and analyzed after being standardized by the ARB. The ultimate goal here is for most of these features to graduate from extensions and become part of the core OpenGL standard in future versions, which means if everything goes smoothly we’d expect to see sparse texture support and bindless texture support in the core standard (and the devices that support it) in the not too distant future.

In a move that should have developers everywhere jumping for joy, OpenGL finally has official and up to date conformance tests. OpenGL has not had an up to date conformance test since the project was led by SGI almost a decade ago, with the task of developing the tests being a continual work in progress for many years. In the interim the lack of conformance testing has been an obstacle for OpenGL, as there wasn’t an official way to validate implementations against known and expected behaviors, leading to more uncertainty and bugs than anyone was comfortable with.

Now with the completion of the new conformance tests, OpenGL implementations can be tested for their conformance, and in turn those implementations will need to be conformant before they are approved by Khronos. For developers this means writing software against better devices and drivers, and for device makers it means an official target to chase rather than having to interpret the sometimes ambiguous OpenGL standards.


Comments

  • ltcommanderdata - Monday, July 22, 2013

    In DirectX 11.2 there's some thought that Tiled Resources tier 1 refers to Bindless Textures while tier 2 refers to Sparse Textures. So is there any hierarchy to these features where GPUs supporting sparse textures (GCN-based) should also support bindless textures (only Kepler officially announced)? Or are they independent?

    And is GK110 the only GPU currently available that supports Dynamic Parallelism and therefore the only currently shipping GPU that will be OpenCL 2.0 compatible? Hopefully Volcanic Islands and Maxwell will bring dynamic parallelism to the full product stack rather than just the top-end GPUs.
  • sontin - Monday, July 22, 2013

    GK208 (Cuda 3.5) supports Dynamic Parallelism. And with next year I guess every GPU and Tegra 5 will support at least Cuda 3.5.
  • TeXWiller - Monday, July 22, 2013

    That may be a Tesla/Quadro-exclusive offering anyway. It would be nice if Nvidia would bring back the "democratization of parallelism" from the times long gone.
  • sontin - Monday, July 22, 2013

    GK208 is a consumer chip. It's not limited to the Quadro or Tesla series. They even advertised cards based on the chip on their blog:
  • xdrol - Monday, July 22, 2013

    The thing is, nVidia does not care about OpenCL; they could not even (ehm, rather, didn't want to) ship an OpenCL 1.2 driver at all. So it's nice that the hardware supports the feature, but if we cannot use it from OpenCL, that's no better than AMD's version.
  • name99 - Tuesday, July 23, 2013

    You mean you won't be able to use OpenCL on Windows?

    I assume on OSX you'll be able to use OpenCL and it will continue to move forward --- unless nVidia thinks it would be a wise business move to ensure Apple never buys from them again.
  • MrSpadge - Saturday, July 27, 2013

    No, by "not shipping OpenCL 1.2 drivers" he means they're still at 1.1. I don't expect this to be any better on OSX.
  • Ryan Smith - Monday, July 22, 2013

    To the best of my knowledge these are independent. At one point last year AMD said they didn't believe they could implement bindless textures in GCN hardware in an equivalent manner to NVIDIA. And NVIDIA of course may be bindless, but they can't do sparse in hardware. I fully expect the two to come together in the next generation of hardware, with NV and AMD gaining their competitor's respective functionality.
  • przemo_li - Thursday, July 25, 2013

    Nvidia supports both ARB_bindless_texture and ARB_sparse_texture in their drivers:

    And "Tiers" and "feature levels" are MS speak for extensions and OpenGL versions. (Since clean DX9, DX10, DX11 do not work any more)

    Both of those did NOT land in core. Those are extensions vendors may implement in hw if they like. (Though ARB stands there for a reason, and those extensions should land in core in unchanged form if hw support is good)
  • chris81 - Monday, July 22, 2013

    Now with the competition of the new conformance tests -> completion
