In our Kaveri review, we discussed HSA and the hardware features Kaveri brings, such as true CPU/GPU shared memory (hUMA) and heterogeneous queueing (hQ). At launch, however, these features were not really exposed in the drivers. AMD has now provided an update on its driver roadmap for exposing them to various compute APIs.

Today AMD is expected to release a beta driver for Windows that exposes some shared memory extensions to OpenCL. Currently, AMD ships an OpenCL 1.2 implementation for Kaveri. The OpenCL 1.2 standard by itself does not really expose shared memory features properly, but OpenCL 2.0 will have more robust support. AMD does not have a full OpenCL 2.0 driver yet, but today it will provide some of the 2.0 functionality as extensions in its current OpenCL 1.2 driver. I don't have the details on the exact extensions supported, and I will update the article when I do.

However, OpenCL is only part of the story. Kaveri's promise is that it will provide driver support for the HSA software stack, which includes components such as a compiler for HSAIL and an HSA runtime. The HSA software stack will enable high-level languages and simplified programming models, and the exciting HSA developments appear to be happening on Linux first. In Q2 2014, AMD will release a beta HSA software stack for Linux. The Linux HSA stack release will come around the same time as the release of the Berlin APU for servers and the Bald Eagle APU for embedded applications. Both are variations of Kaveri for different markets, and in both of these markets Linux plays a very important role.

The HSA runtime stack for Linux will enable compiler writers and low-level library developers to start developing for HSA. The official HSA runtime API specifications are not finalized yet, so this release will be based upon prototype specifications. However, I think the prototype driver will be close enough to the final specifications that the difference will not matter much to developers.

Most developers are not interested in the base HSA stack and will instead prefer higher-level languages and tools. Several programming languages and tools targeting HSA will be released this year. First, AMD will release an HSA-enabled version of its Java Aparapi library. Currently Aparapi targets OpenCL 1.2 for all OpenCL-capable systems, and the HSA-enabled release will include optimizations specific to HSA-enabled systems. It is already under development and testing and should be released soon after the HSA stack release.

At some point this year, MulticoreWare is expected to release the HSA backend for its C++ AMP implementation for Linux. Finally, AMD also mentioned that it is working with SUSE on an extension of GCC that compiles C/C++/Fortran OpenMP code to generate code for HSA. I am not sure about the version of GCC and the version of OpenMP supported, or whether this will depend on non-standard directives. I will update the article if I get this information.

To summarize, AMD is continuing to work on exposing Kaveri's differentiating hardware features such as hUMA and hQ to various programming languages and tools. This year we should see the HSA stack and associated tools and languages stabilizing and becoming very usable for Linux developers, especially in the server and embedded markets. On Windows, applications using OpenCL will at least be able to tap into some of Kaveri's new hardware features, and more options should be coming down the pipeline.

UPDATE: The new Windows driver with preview support for some OpenCL 2.0-type functionality is now downloadable from here. The driver has very specific hardware requirements, such as requiring an A10-7850K in an Asus A88X-PRO motherboard and 8GB of RAM. More details about the requirements and the OpenCL extensions supported can be found in the OpenCL 2.0 preview driver download.

22 Comments

  • pidgin - Monday, March 03, 2014

    will HSA benefit games that from off from dGPU?
  • pidgin - Monday, March 03, 2014

    *run
  • samlebon2306 - Monday, March 03, 2014

    It would benefit games in the following scenario:
    - Physics implemented on HSA on the iGPU
    - Rendering on the dGPU.

    AMD is working on this configuration.
  • fteoath64 - Tuesday, March 04, 2014

    Yes. Physics in iGpu via HSA stack and rendering on dGpu (via OpenCL or direct). Other fp-Ops can be on the iGpu as well with HSA, depending on the compiler targets for CU resources. Potentially I see the HSA Runtime unit may be tunable on each config so it can be manually optimized for some really heavy games like Crysis.
  • przemo_li - Monday, March 03, 2014

    AMD must have something for Linux gamers. They are shipping Linux benchmarking website crew (and authors of nice benchmarking framework), to the GDC this year.

    While they failed to ship any hw in recent years...
  • przemo_li - Monday, March 03, 2014

    hUMA is just ***** great. For everyone. And games do use some heavy physics calculations that can be done swiftly on GPU... (when you remove roadblocks, like those nasty memory transfers)
  • fteoath64 - Tuesday, March 04, 2014

    Yeah but advantage is for only AMD since they have the only implementation with Kaveri. One can see that there are a couple of serious improvements needed to increase the RAM controller bandwidth of AMD's core compared to the Intel RAM bandwidth speeds. Also, it is possible to mix GDDR5 RAM (like in PS4) with DDR3-2400 in a later iteration where the hUMA switch manages the RAM segments for all the CUs to use. That would really super-charge the gpu cores.
    Looking even further, I can see AMD changing PCIe slot0 to a hUMA-Port so dGPU will be part of the APU as a full and equal CU. If you look back at AGP development, it is almost the same method. Sure if Nvidia wants to make hUMA-Port dGPU in future, they should be able to do so.
    This would be interesting when Arm 64bit server chips with HSA come and have dual sided, hUMA-port connects. ie serious GPU compute clusters at low cost. I can hope since the HSA architecture has legs beyond the traditional design. It is a greater leap than many people think.
  • przemo_li - Monday, March 03, 2014

    @Rahul Garg

    OpenMP 4.0 should land in GCC 4.9, and that is based on GOMP branch (Red Hat seems to be moving into merging it at last).

    So AMD's work should be based on this.

    (So GCC 4.9 it should be :) and at least acceleration of some of OpenMP)
  • ruthan - Monday, March 03, 2014

    It should be working out of the box in all main compilers, without changing a single line of code. After that it has a chance, otherwise it will be the next Nvidia / ATI dead project..
  • duffie - Wednesday, March 05, 2014

    Looking forward to AMD's beta HSA software stack for Linux in Q2 2014! I wish AMD would have released it already when they launched Kaveri...
