Different Types of Stream Processors

The first thing we need to do when looking at the R600 shader core is to define our terms. AMD and NVIDIA build and refer to their Stream Processors (SPs) differently, and that makes counting them a little more difficult. Throughout our explanation, it will help to remember from our G80 coverage that a thread refers to a vertex, primitive or pixel, not to a stream of instructions as it would on a CPU.

Stream Processors: The NVIDIA Way

G80 has 128 SPs (for the 8800 GTX; there are 96 SPs on the 8800 GTS models) that are capable of doing a very small number of things at the same time. They can do either standard FP operations (like a MADD), a special function operation (like sine), or an integer operation. There are some cases where they can squeeze out an extra MUL, but more often than not this MUL isn't accessible. Each of these SPs operates on an individual thread (be it a vertex, primitive or pixel).

This gives us a total of up to 128 threads being processed per clock. It is important to realize that the 128 SPs aren't entirely independent of one another. That is, we can't run 128 different instructions in one clock, even though we can work on 128 different threads. We'll delve a little deeper into this shortly, but depending on the type of shader running, the same instruction must be running on multiple threads.

For NVIDIA hardware, the minimum number of threads that must be processed using the same instruction is 16 (for vertex threads). NVIDIA's block diagrams show that each group of 16 SPs shares texture, register, and cache resources, so this makes sense. Pixel shaders, which are more important from a performance perspective, must run one instruction on 32 pixels at a time. What we can extrapolate from this is that NVIDIA can issue up to eight separate instructions across all of its 128 SPs (only four if working on pixels) per clock.

128 SPs / 16 Threads per Instruction per Clock = 8 Vertex Instructions per Clock

128 SPs / 32 Threads per Instruction per Clock = 4 Pixel Instructions per Clock
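The two divisions above can be checked with a few lines of Python. This is only a sketch of the article's arithmetic: the function name is made up for illustration, and it assumes one thread per SP with every batch of threads executing the same instruction in lockstep.

```python
# Figures quoted in the article for the 8800 GTX (G80).
G80_SPS = 128
VERTEX_BATCH = 16  # vertex threads that must share one instruction
PIXEL_BATCH = 32   # pixel threads that must share one instruction


def max_distinct_instructions(total_sps: int, batch_size: int) -> int:
    """Upper bound on different instructions issued per clock.

    Hypothetical helper: assumes one thread per SP and that each batch
    of `batch_size` threads runs a single shared instruction.
    """
    if batch_size <= 0 or total_sps % batch_size != 0:
        raise ValueError("batch size must be positive and divide the SP count")
    return total_sps // batch_size


print(max_distinct_instructions(G80_SPS, VERTEX_BATCH))  # 8
print(max_distinct_instructions(G80_SPS, PIXEL_BATCH))   # 4
```

Swapping in 96 SPs for the 8800 GTS gives six vertex instructions and three pixel instructions per clock under the same assumptions.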

Stream Processors: AMD's R600

Things are a little different on R600. AMD tells us that there are 320 SPs, but these aren't directly comparable to G80's 128. First of all, most of the SPs are simpler and aren't capable of special function operations. For every block of five SPs, only one can handle either a special function operation or a regular floating point operation. The special function SP is also the only one able to handle integer multiply, while other SPs can perform simpler integer operations.

This isn't a huge deal because straight floating point MAD and MUL performance is by far the biggest limiting factor in shader performance today. The big difference is that AMD executes only one thread (vertex, primitive or pixel) across a group of five SPs.

What this means is that each of the five SPs in a block must run instructions from one thread. While AMD can run up to five scalar instructions from that thread in parallel, these instructions must be completely independent from one another. This can place a heavy burden on AMD's compiler to extract parallel operations from shader code. While AMD has gone to great lengths to make sure every block of five SPs is always busy, it's much harder to ensure that every SP within each block is always busy.
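To see why independence matters so much, here is a toy packer in Python. It is not AMD's compiler: the greedy strategy, the function names and the op names are all assumptions made for illustration, and it ignores the restriction that only one SP per block handles special functions. It simply groups scalar operations into bundles of up to five, where an operation can only be placed once all of its inputs have been produced by an earlier bundle.

```python
from typing import Dict, List

SLOTS_PER_BLOCK = 5  # five SPs per block, per the article


def pack_bundles(deps: Dict[str, List[str]],
                 width: int = SLOTS_PER_BLOCK) -> List[List[str]]:
    """Greedily pack scalar ops into bundles of at most `width` ops.

    `deps` maps each op name to the ops whose results it needs. Ops that
    share a bundle never depend on each other, because an op only becomes
    ready after its inputs were issued in a previous bundle.
    """
    done = set()
    remaining = list(deps)  # preserves program order
    bundles = []
    while remaining:
        ready = [op for op in remaining if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown input")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


def utilization(bundles: List[List[str]],
                width: int = SLOTS_PER_BLOCK) -> float:
    """Fraction of SP slots that actually received an op."""
    return sum(len(b) for b in bundles) / (width * len(bundles))


# Five independent scalar ops fill one bundle completely.
independent = {"x": [], "y": [], "z": [], "w": [], "s": []}
print(utilization(pack_bundles(independent)))  # 1.0

# A chain where each op needs the previous result uses one slot per bundle.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
print(utilization(pack_bundles(chain)))  # 0.2
```

The two cases bracket the problem: with fully independent work all five SPs in a block are busy, while a strictly serial chain leaves four of the five idle every clock.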

If we take a step back, we can determine how many threads AMD is able to work on per clock. With 320 total SPs grouped into blocks of five, and one thread per block, we get 64 threads per clock. And here's where it starts to get complicated. Before we go back and compare this to NVIDIA's architecture, let's go a little deeper into the implementation.
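The thread count is a single division, and the same numbers also bound how many scalar operations R600 can issue per clock. The sketch below uses hypothetical helper names and assumes every block fills the same number of SP slots, which real shader code will not do.

```python
R600_SPS = 320     # SP count AMD quotes for R600
SPS_PER_BLOCK = 5  # each block of five SPs serves a single thread


def threads_per_clock(total_sps: int, sps_per_thread: int) -> int:
    """Threads in flight per clock when each thread owns a block of SPs."""
    return total_sps // sps_per_thread


def scalar_ops_per_clock(threads: int, filled_slots: int) -> int:
    """Scalar ops per clock if every block fills `filled_slots` of its SPs."""
    if not 1 <= filled_slots <= SPS_PER_BLOCK:
        raise ValueError("filled_slots must be between 1 and 5")
    return threads * filled_slots


threads = threads_per_clock(R600_SPS, SPS_PER_BLOCK)
print(threads)                           # 64
print(scalar_ops_per_clock(threads, 1))  # 64  (one independent op per thread)
print(scalar_ops_per_clock(threads, 5))  # 320 (five independent ops per thread)
```

So the 320 figure is a best case that depends on finding five independent operations in every thread, every clock.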


86 Comments

  • yyrkoon - Tuesday, May 15, 2007 - link

    See, the problem here is: guys like you are so bent on saving that little bit of money by buying a lesser brand name that you do not even take the time to research your hardware. Use Newegg and read the user reviews, and if that is not enough for you, go to the countless other resources all over the internet.
  • yyrkoon - Tuesday, May 15, 2007 - link

    Blame the crappy OEM you bought the card from, not NVIDIA. Get an EVGA card, and embrace a completely different outlook on video card life.

    MSI may make some decent motherboards, but their other components have serious issues.
  • LoneWolf15 - Thursday, May 17, 2007 - link

    Um, since 95% of nvidia-GPU cards on the market are the reference design, I'd say your argument here is shaky at best. EVGA and MSI both use the reference design, and it's even possible that cards with the same GPU came off the same production line at the same plant.
  • DerekWilson - Thursday, May 17, 2007 - link

    it is true that the majority of parts are based on reference designs, but that doesn't mean they all come from the same place. I'm sure some of them do, but to say that all of these guys just buy completed boards and put their name on them all the time is selling them a little short.

    at the same time, the whole argument of which manufacturer builds the better board on a board component level isn't something we can really answer.

    what we would suggest is that it's better to buy from OEMs who have good customer service and long, extensive warranties. this way, even if things do go wrong, there is some recourse for customers who get bad boards or have bad experiences with drivers and software.
  • cmdrdredd - Monday, May 14, 2007 - link

    you're wrong. 99% of people buying these high end cards are gaming. Those gamers demand and deserve the best possible performance. If a card uses MORE power and costs MORE (x2900xt vs 8800gts) and performs generally the same or slower, what is the point? Fact is...ATI's high end is in fact slower than mid range offerings from Nvidia and consumes a lot more power. Regardless of what you think, people are buying these based on performance benchmarks in 99% of all cases.
  • AnnonymousCoward - Tuesday, May 15, 2007 - link

    No, you're wrong. Did you overlook the emphasis he put on "NOT ALWAYS"?

    You said 99% use for gaming--so there's 1%. Out of the gamers, many really want LCD scaling to work, so that games aren't stretched horribly on widescreen monitors. Some gamers would also like TVout to work.

    So he was right: faster is NOT ALWAYS better.
  • erwos - Monday, May 14, 2007 - link

    It'd be nice to get the scoop on the video decode acceleration present on these boards, and how it stacks up to the (excellent) PureVideo HD found in the 8600 series.
  • imaheadcase - Tuesday, May 15, 2007 - link

    I agree! They need to do a whole article on video acceleration on a range of cards and show the pros and cons of each card in respective areas. A lot of people like myself like to watch videos and game on cards, but like having the option open to use the advanced video features.

  • Turnip - Monday, May 14, 2007 - link

    "We certainly hope we won't see a repeat of the R600 launch when Barcelona and Agena take on Core 2 Duo/Quad in a few months...."


    Why, that's exactly what I had been thinking :)

    Phew! I made it through the whole thing though, I even read all of those awfully big words and everything! :)

    Thanks guys, another top review :)
  • Kougar - Monday, May 14, 2007 - link

    First, great article! I will be going back to reread the very in-depth analysis of the hardware and features, something that keeps me an avid Anandtech reader. :)

    Since it was mentioned that overclocking will be included in a future article, I would like to suggest that, if possible, watercooling be factored into it. So far one review site has already done a watercooled test with a low-end watercooling setup, and without mods achieved 930MHz on the Core, which indirectly means 930MHz shaders if I understand the hardware.

    I'm sure I am not the only reader extremely interested to see if all R600 needs is a ~900-950MHz overclock to offer some solid GTX level performance... or if it would even help at all. Again thanks for the consideration, and the great article! Now off to find some Folding@Home numbers...
