AGEIA PhysX Technology and GPU Hardware

First off, here is the lowdown on the hardware as we know it. AGEIA, being the first and only consumer-oriented physics processor designer right now, has not given us as much in-depth technical detail as other hardware designers. We certainly understand the need to protect intellectual property, especially at this stage in the game, but this is what we know.

PhysX Hardware:
125 Million transistors
130nm manufacturing process
128MB 733MHz Data Rate GDDR3 RAM
128-bit memory bus interface
20 giga-instructions per second
2 Tb/sec internal memory bandwidth
"Dozens" of fully independent cores


There are quite a few things to note about this architecture. Even without knowing all the ins and outs, it is quite obvious that this chip will be a force to be reckoned with in the physics realm. A graphics card, even with a 512-bit internal bus running at core speed, has less than 350 Gb/sec internal bandwidth. There are also lots of restrictions on the way data moves around in a GPU. For instance, there is no way for a pixel shader to read a value, change it, and write it back to the same spot in local RAM. There are ways to deal with this when tackling physics, but making highly efficient use of nearly 6 times the internal bandwidth for the task at hand is a huge plus. CPUs aren't able to touch this type of internal bandwidth either. (Of course, we're talking about internal theoretical bandwidth, but the best we can do for now is relay what AGEIA has told us.)
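
A quick illustration of that read-modify-write restriction: the standard GPGPU workaround is "ping-pong" buffering, where each pass reads from one texture and writes to another, and the two then swap roles. The C sketch below mimics that pattern on the CPU; the buffer size and update rule are our own toy choices, not anyone's actual shader code.

```c
/* Ping-pong sketch: each pass may only read from one buffer and write to
 * the other, mirroring render-to-texture on a 2006-era GPU, where a pixel
 * shader cannot read, modify, and write the same location in one pass. */
#include <stdio.h>

#define N 8

int main(void)
{
    float bufA[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float bufB[N] = {0};
    float *src = bufA, *dst = bufB;

    for (int pass = 0; pass < 4; pass++) {
        for (int i = 0; i < N; i++)
            dst[i] = src[i] * 0.5f + 1.0f;   /* stand-in update rule */

        /* Swap roles for the next pass (on a GPU: rebind the textures). */
        float *tmp = src; src = dst; dst = tmp;
    }

    for (int i = 0; i < N; i++)
        printf("%.3f ", src[i]);
    printf("\n");
    return 0;
}
```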

Physics, as we noted in last year's article, generally presents itself as sets of small, highly dependent problems. Graphics has become sets of highly independent, mathematically intense problems. It's not that GPUs can't be used to solve problems where the input to one pixel is the output of another (performing multiple passes and making use of render-to-texture functionality is one obvious solution); it's just that much of the power of a GPU is wasted when attempting to solve this type of problem. Making use of a large number of fully independent processing units makes sense as well. In a GPU's SIMD architecture, pixel pipelines execute the same instructions on many different pixels. In physics, it is much more often the case that different things need to be done to every physical object in a scene, and it makes much more sense to attack the problem with an architecture designed for that kind of independence.
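
To make the contrast concrete, here is a small C sketch of our own (toy numbers, not AGEIA's or Havok's actual solver): the first loop is graphics-like, with every element computed independently, while the second is physics-like, a relaxation-style constraint solve where each update reads positions the previous update just wrote, forcing the work within a sweep to proceed serially.

```c
/* Independent work vs. dependent work, in miniature. */
#include <math.h>
#include <stdio.h>

#define N 6

int main(void)
{
    /* "Graphics-like": fully independent, trivially parallel per element. */
    float pixels[N] = {0, 1, 2, 3, 4, 5};
    for (int i = 0; i < N; i++)
        pixels[i] = sqrtf(pixels[i]) * 0.5f;

    /* "Physics-like": a chain of point masses joined by distance
     * constraints, relaxed Gauss-Seidel style. Each step reads the value
     * written by the step before it, so the inner loop is serial. */
    float pos[N] = {0.0f, 0.9f, 2.2f, 2.8f, 4.1f, 5.0f};
    const float rest = 1.0f;   /* desired spacing between neighbors */
    for (int sweep = 0; sweep < 10; sweep++) {
        for (int i = 1; i < N; i++) {
            float err = (pos[i] - pos[i - 1]) - rest;
            pos[i - 1] += 0.5f * err;   /* pulls on the point just updated */
            pos[i]     -= 0.5f * err;
        }
    }

    for (int i = 0; i < N; i++)
        printf("%.2f ", pos[i]);
    printf("\n");
    return 0;
}
```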

To be fair, NVIDIA and ATI are not arguing that they can compete with the physics processing power AGEIA is able to offer in the PhysX chip. The main selling point of physics on the GPU is that everyone who plays games (and would want a physics card) already has a graphics card. Solutions like Havok FX, which uses SM3.0 to implement physics calculations on the GPU, are good ways to augment existing physics engines. These types of solutions will add a little more punch to what developers can do. This won't create a revolution, but it will get game developers to look harder at physics in the future, and that is a good thing. We have yet to see Havok FX or a competing solution in action, so we can't go into any detail on what to expect. However, it is obvious that a multi-GPU platform will be able to benefit from physics engines that make use of GPUs: there are plenty of cases where games are not able to take full advantage of both GPUs. In single GPU cases there could still be a benefit, but the more graphically intensive a scene, the less room there is for the GPU to worry about anything else. We are certainly seeing titles like Oblivion that can bring everything we throw at them to a crawl, so balance will certainly be an issue for Havok FX and similar solutions.

DirectX 10 will absolutely benefit AGEIA, NVIDIA, and ATI. For physics-on-GPU implementations, DX10 will decrease overhead significantly: state changes will be more efficient, and many more objects will be able to be sent to the GPU for processing every frame. This will obviously make it easier for GPUs to handle work other than graphics efficiently. A little less obviously, PhysX hardware-accelerated games will also benefit from a graphics standpoint. With the possibility for games to support orders of magnitude more rigid body objects under PhysX, overhead can become an issue when batching these objects to the GPU for rendering. This is a hard thing for us to test explicitly, but it is easy to understand why it will be a problem when developers are already complaining about the overhead issue.
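
A back-of-the-envelope cost model shows why that batching overhead matters. The numbers below are purely illustrative assumptions on our part (not measured driver costs), but they capture the shape of the problem: a fixed per-draw-call cost swamps everything when thousands of physics-driven objects are each submitted individually, and grouping objects into batches amortizes it.

```c
/* Toy cost model of CPU-side submission overhead per frame. */
#include <stdio.h>

int main(void)
{
    const double per_call_overhead_us = 30.0;  /* assumed cost per draw call */
    const double per_object_cost_us   = 0.5;   /* assumed cost to append one object */
    const int    objects              = 30000; /* e.g. a large debris field */

    /* One draw call per rigid body. */
    double naive_us = objects * (per_call_overhead_us + per_object_cost_us);

    /* Objects grouped into batches of 500 before submission. */
    const int batch_size = 500;
    int calls = (objects + batch_size - 1) / batch_size;
    double batched_us = calls * per_call_overhead_us + objects * per_object_cost_us;

    printf("per-object submission: %.1f ms of CPU overhead per frame\n", naive_us / 1000.0);
    printf("batched submission:    %.1f ms of CPU overhead per frame\n", batched_us / 1000.0);
    return 0;
}
```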

While we know the PhysX part can handle 20 GIPS, this figure likely counts simple independent instructions. We would really like to get a better idea of how much actual "work" this part can handle, but for now we'll have to settle for this ambiguous number and some real world performance results. Let's take a look at the ASUS card and then take a look at the numbers.

Comments

  • Clauzii - Thursday, May 11, 2006 - link

    ASUS PhysX P1:

    Processor Type: AGEIA PhysX
    Bus Technology: 32-bit PCI 3.0 Interface
    Memory Interface: 128-bit GDDR3 memory architecture
    Memory Capacity: 128 MByte
    Memory Bandwidth: 12 GBytes/sec.
    Effective Memory Data Rate: 733 MHz
    Peak Instruction Bandwidth: 20 Billion Instructions/sec
    Sphere-Sphere collisions/sec: 530 Million max
    Convex-Convex (Complex) collisions/sec: 533,000 max
  • jkay6969 - Monday, May 8, 2006 - link

    I have read all your posts with great interest, I feel that some very good points are being made, so here's my 2 cents worth ;-)

    I believe the 'IDEA' of having a dedicated PPU in your increasingly expensive monster rig is highly appealing, even intoxicating, and I believe this 'IDEA', coupled with some clever marketing, will ensure a good number of highly overpriced, or at least expensive, sales of this mystical technology in its current (inefficient) form.

    For some, the fact that it's expensive and also holds such high promise will ensure its place as a 'Must have' component for the legions of early adopters. The brilliant idea of launching them through Alienware, Falcon Northwest and the top of the line Dell XPS600 systems was a stroke of marketing genius, as this adds to the allure of owning one when they finally launch to the retail market... If it's good enough for a system most of us can never afford but covet nonetheless, it's damn well good enough for my 'monster RIG'. This arrangement will allow the almost guaranteed sale of the first wave of cards on the market. I have noticed that some UK online retailers have already started taking pre-launch orders for the £218 OEM 128MB version; I just have to wonder how many of these pre-orders have actually been sold?

    The concept of a dedicated PPU is quite simply phenomenal. We spend plenty of money upgrading our GPUs and CPUs, and quite recently Creative have brought us the first true APU (the X-Fi series), so it makes sense for there to be a dedicated PPU and perhaps even an AiPU to follow.

    The question is, will these products actually benefit us to the value of their cost?

    I would say that a GPU, or in fact up to 4 GPUs running over PCIe x32 (2x PCIe x16 channels), becomes increasingly less value for money the more GPUs are added to the equation; i.e. a 7900GTX 512MB at £440 is great bang for the buck compared to Quad SLI 7900GTX 512MB at over £1000, as the framerates in the Quad machine are not 4x the single GPU. Perhaps this is where GPUs could truly be considered worthy of nVidia or ATI's physics SLI load balancing concept. SLI GPUs are not working flat out 100% of the time... Due to the extremely high bandwidth of dual PCIe x16 ports there should be a reasonable amount of bandwidth to spare for physics calculations, perhaps more if dual PCIe x32 (or even quad x16) motherboards inevitably turn up. I am not saying that GPUs are more efficient than a DEDICATED and purpose-designed PPU, just that if ATI and nVidia decided the market showed enough potential, they could simply 'design in' or add PPU functionality to their GPU cores or GFX cards. This would allow them to tap into the extra bandwidth PCIe x16 affords.

    The AGEIA PhysX PPU in its current form runs over the PCI bus, a comparatively narrow bandwidth bus, and MUST communicate with the GPU in order for it to render the extra particles and objects in any scene. This in my mind would create a bottleneck, as it would only be able to communicate at the bandwidth and speed afforded by the narrower, slower PCI bus. The slowest path governs the speed of even the fastest... This would mean that adding a dedicated PPU, even a very fast and efficient one, would be severely limited by the bus it was running over. This phenomenon is displayed in all the real world benchmarks I have seen of the AGEIA PhysX PPU to date: the framerates actually DROP when the PPU is enabled.

    To counter this, I believe AGEIA, through ASUS, BFG and any other manufacturing partner they sign up with, will have to release products designed for the PCIe bus. I believe AGEIA knows this, as the early manufacturing samples were able to be installed in the PCI bus as well as the PCIe bus (although not at the same time ;-) ). I believe the PCI bus was chosen for launch due to the very high installed user base of PCI motherboards; nearly every standard PC that would want a PPU has one. I believe this is a mistake, as the users most likely to purchase this part in the 'Premium price' period would likely have PCIe in their system, or at least would be willing to shell out an extra £50-£140 for the privilege. Although I could be completely wrong in this, as it may allow for some 'Double Selling': when they release the new and improved PCIe version, the early adopters will be forced to buy into it again at a premium price.

    This leads me neatly onto the price. I understand that AGEIA, quite rightly, are handing out the PhysX SDK freely; this is to allow maximum compatibility and support in the shortest period of time. This does however mean that the end user who purchases the card in the beginning will have to pay the full price for it... £218 for the 128MB OEM version. As time goes by and more units are sold, the installed userbase of the PPU will grow and the balance will shift: AGEIA will be able to start charging developers to use their 'must have' hardware physics support in their games/software, and this will subsidise the cost of the card to the end user, making it even more affordable to the masses and in turn making it even more of a 'Must Have' for developers. I believe it will take several generations of the PPU before we feel the full impact of this.

    If ATI and nVidia are smart, they can capitalise on their high initial installed userbase and properly market the idea of hardware physics for free with their SLI physics; they may be able to throw a spanner in the works for AGEIA while they attempt to attain market share. This may benefit the consumer, although it may also knock AGEIA out of the running, depending on how effective ATI and nVidia's driver-based solution first appears. It could also prompt a swift buyout by either ATI or nVidia, like nVidia did with 3dfx.

    Using the CPU for physics, even on a multicore CPU, in my opinion is not the way forward. The CPU is not designed for physics calculations, and from what I hear they are not (comparatively) very efficient at performing these calculations. A dedicated solution will always be better in the long run. This will free up the CPU to run the OS and also handle AI calculations, as well as antivirus, firewall, background applications and generally keeping the entire system secure and stable. Multicore will be a blessing for PCs and consoles, but not for such a specific and difficult (for a CPU) task.

    "Deep breath" ;-)

    So there you have it: my thoughts on the PPU situation as it stands now and into the future. Right now I will not be buying into the dream, but simply keeping the dream alive by closely watching how it develops until I believe the 'Right Time' comes. £218 for an unproven, generally unsupported, and possibly seriously flawed incarnation of the PPU dream is not, in my opinion, The Right Time. Yet ;-)

    JKay6969
  • DeathBooger - Monday, May 8, 2006 - link

    The Cellfactor demo is available on Ageia's website now. If you try to play it without a PhysX card you get an error message. However, the demo includes a game editor. If you open the editor, you can open up the demo level, and it allows you to play the game without the PhysX card. You can't play with bots in the editor, nor can you see cloth or fluid dynamics. Everything else is present; it's like playing the game normally otherwise. I was able to play the game inside the editor with no performance problems. I have a dual core AMD with a single X1900XT and 2GB of RAM. CPU usage does go up to 80% when playing the game and blowing things up and whatnot, but it's a smooth experience with no noticeable slowdown. Graphics-wise, everything is present inside the editor, including dynamic lighting and normal mapping.

    If Ageia wanted to show people how well the PPU works and how it is needed in a game like Cellfactor, they should have allowed you to play the game normally without a PhysX card. Since they didn't do that, it makes me think it's not actually needed.
  • AnnihilatorX - Sunday, May 7, 2006 - link

    People don't want another separate card mainly because of the slot problem. But if the card uses PCIe x1, a slot I bet most people with a new motherboard have nothing plugged into, that doesn't make it unfavourable.

    Since, as mentioned, the heat and noise of the card are low.
  • hellcats - Saturday, May 6, 2006 - link

    If AGEIA really wants to spur additional support for their PPU card, then they should develop a lower-level API than Novodex, something similar to OpenGL or DirectX for graphics cards. This would encourage other middleware developers to support the PPU for the "heavy lifting". Then Havok and other physics middleware developers would be working with AGEIA and not against them. The current situation is as if Nvidia were to provide a proprietary game engine as the only way to access the power of the GPU; then only the game companies that agreed to support this engine would get graphics acceleration.
  • Walter Williams - Sunday, May 7, 2006 - link

    Microsoft is currently working on a physics API that will be like their DirectX but for physics.
  • DerekWilson - Sunday, May 7, 2006 - link

    additionally, AGEIA is working with Havok to try to get them to include support for the PhysX hardware in their product.

    when we spoke with AGEIA about it, we learned that it's more important to them to get software support than to create an SDK business. they want PhysX acceleration in Havok very much. but Havok is doing pretty well on its own at the moment and is being a little more cautious about getting involved.

    as is generally the case, small companies don't like to have their success tied up in the success of other small companies. Havok needs to decide if the ROI is worth it for them at this point, while there wouldn't really be a downside for AGEIA to let them include support.
  • Celestion - Saturday, May 6, 2006 - link

    If the 360 doesn't have the PhysX chip in it, what can the PhysX SDK be used for?
  • JarredWalton - Saturday, May 6, 2006 - link

    A software SDK (Software Development Kit) is a set of libraries that people can use. So rather than write their own physics routines, they can use PhysX. Just like they can use Havok. The way we understand it, if done properly the PhysX libraries can run on either the CPU or the PPU, though the CPU should be slower. Right now, we don't have any 100% identical comparison other than AGEIA's test app, which doesn't really appear complex enough to be truly indicative of maximum performance potential.
  • Celestion - Sunday, May 7, 2006 - link

    Thanks
