A More Efficient Architecture

GPUs, like CPUs, work on streams of instructions called threads. While high-end CPUs work on as many as 8 complicated threads at a time, GPUs handle many more threads in parallel.

The table below shows just how many threads each generation of NVIDIA GPU can have in flight at the same time:

                        Fermi    GT200    G80
Max Threads in Flight   24576    30720    12288

Fermi can't actually support as many threads in parallel as GT200. NVIDIA found that the majority of compute cases were bound by shared memory size, not thread count in GT200. Thus thread count went down, and shared memory size went up in Fermi.

NVIDIA groups 32 threads into a unit called a warp (a term borrowed from weaving, where the warp is the set of parallel threads stretched on a loom). In GT200 and G80, half of a warp was issued to an SM every clock cycle. In other words, it took two clocks to issue a full 32 threads to a single SM.
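As a rough sanity check on these numbers, the thread counts in the table can be converted into warps, and the two-clock issue time falls out of the half-warp issue rate. This is only a sketch of the arithmetic; the function names are made up for illustration and aren't anything NVIDIA exposes.

```python
WARP_SIZE = 32  # threads per warp, as stated in the article

# Max threads in flight, copied from the table above
MAX_THREADS_IN_FLIGHT = {"Fermi": 24576, "GT200": 30720, "G80": 12288}


def warps_in_flight(max_threads, warp_size=WARP_SIZE):
    """Number of whole warps that fit in the given thread budget."""
    return max_threads // warp_size


def clocks_to_issue_warp(threads_per_clock, warp_size=WARP_SIZE):
    """Clocks needed to issue one warp at a given issue rate (rounded up)."""
    return -(-warp_size // threads_per_clock)


if __name__ == "__main__":
    for gpu, threads in MAX_THREADS_IN_FLIGHT.items():
        print(gpu, warps_in_flight(threads), "warps in flight")
    # Half a warp (16 threads) issued per clock
    print("clocks per warp:", clocks_to_issue_warp(16))
```

By this count Fermi keeps 768 warps in flight, versus 960 on GT200 and 384 on G80, and issuing 16 threads per clock takes 2 clocks per warp.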

In previous architectures, the SM dispatch logic was closely coupled to the execution hardware. If you sent threads to the SFU, the entire SM couldn't issue new instructions until those instructions were done executing. If the only execution units in use were in your SFUs, the vast majority of your SM in GT200/G80 went unused. That's terrible for efficiency.

Fermi fixes this. There are two independent dispatch units at the front end of each SM in Fermi. These units are completely decoupled from the rest of the SM. Each dispatch unit can select and issue half of a warp every clock cycle. The threads can be from different warps in order to optimize the chance of finding independent operations.

There's a full crossbar between the dispatch units and the execution hardware in the SM. Each dispatch unit can send threads to any group of execution units within the SM (with some limitations).

The main inflexibility in NVIDIA's threading architecture is that every thread in a warp must execute the same instruction at the same time. If they all do, you get full utilization of your resources. If they don't, some units go idle.
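To put a number on that, here's a toy model of a warp whose threads split across different branches. It assumes the hardware runs each branch that was taken one after another, with the threads not on that branch sitting idle. That's a simplification, and lane_paths and path_lengths are hypothetical names used only for this sketch.

```python
WARP_SIZE = 32  # threads per warp, as stated in the article


def warp_utilization(lane_paths, path_lengths):
    """Fraction of execution slots doing useful work in one warp.

    lane_paths   -- list of 32 labels, the branch each thread takes
    path_lengths -- dict mapping each label to its instruction count

    Toy assumption: every branch that at least one thread takes is
    executed by the whole warp in turn, with other threads masked off.
    """
    if len(lane_paths) != WARP_SIZE:
        raise ValueError("expected one path label per thread in the warp")
    taken = set(lane_paths)
    # Slots the warp occupies: all 32 lanes for every taken branch
    total_slots = WARP_SIZE * sum(path_lengths[p] for p in taken)
    # Slots that do real work: each thread only counts on its own branch
    useful_slots = sum(path_lengths[p] for p in lane_paths)
    return useful_slots / total_slots


if __name__ == "__main__":
    lengths = {"a": 10, "b": 10}
    print(warp_utilization(["a"] * 32, lengths))               # no divergence
    print(warp_utilization(["a"] * 16 + ["b"] * 16, lengths))  # even split
```

With every thread on the same branch the model reports 1.0. Split the warp evenly across two equal-length branches and it drops to 0.5.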

A single SM can execute:

Fermi           FP32   FP64   INT   SFU   LD/ST
Ops per clock   32     16     32    4     16

If you're executing FP64 instructions, the entire SM can only run at 16 ops per clock. You can't dual issue FP64 and SFU operations.
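One way to read the table is to ask how many clocks a full 32-thread warp needs on each type of unit. The sketch below just divides the warp size by the per-clock rates from the table; it assumes throughput is the only limit and ignores latency and scheduling, and the function name is made up for illustration.

```python
WARP_SIZE = 32  # threads per warp, as stated in the article

# Per-SM ops per clock on Fermi, copied from the table above
OPS_PER_CLOCK = {"FP32": 32, "FP64": 16, "INT": 32, "SFU": 4, "LD/ST": 16}


def clocks_per_warp(op):
    """Clocks for one warp-wide instruction of the given type."""
    return WARP_SIZE // OPS_PER_CLOCK[op]


if __name__ == "__main__":
    for op in OPS_PER_CLOCK:
        print(op, clocks_per_warp(op))
```

FP32 and INT get through a warp every clock, FP64 and LD/ST take 2 clocks, and an SFU instruction takes 8.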

The good news is that the SFU doesn't tie up the entire SM anymore. One dispatch unit can send 16 threads to the array of cores, while another can send 16 threads to the SFU. After two clocks, the dispatchers are free to send another pair of half-warps out again. As I mentioned before, in GT200/G80 the entire SM was tied up for a full 8 cycles after an SFU issue.
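Here's a deliberately crude model of the difference. It assumes every instruction is independent, that a core instruction costs the 2 clocks it takes to issue a warp, that an SFU instruction occupies its unit for 8 clocks, and that on Fermi one dispatcher feeds only the cores while the other feeds only the SFUs. None of this is cycle-accurate, and the function names are hypothetical.

```python
ISSUE_CLOCKS = 2  # clocks to issue a full warp at half a warp per clock
SFU_CLOCKS = 8    # clocks an SFU instruction ties up its unit (article's figure)


def _check(instrs):
    for kind in instrs:
        if kind not in ("core", "sfu"):
            raise ValueError(f"unknown instruction kind: {kind!r}")


def coupled_clocks(instrs):
    """GT200/G80-style toy model: one instruction at a time,
    and an SFU instruction blocks the whole SM until it finishes."""
    _check(instrs)
    return sum(SFU_CLOCKS if kind == "sfu" else ISSUE_CLOCKS for kind in instrs)


def decoupled_clocks(instrs):
    """Fermi-style toy model: core and SFU work drain side by side,
    so the total is set by whichever side takes longer."""
    _check(instrs)
    core = sum(1 for kind in instrs if kind == "core")
    sfu = len(instrs) - core
    return max(core * ISSUE_CLOCKS, sfu * SFU_CLOCKS)


if __name__ == "__main__":
    mix = ["sfu"] + ["core"] * 4
    print("coupled:  ", coupled_clocks(mix))
    print("decoupled:", decoupled_clocks(mix))
```

For one SFU instruction plus four core instructions, the coupled model needs 16 clocks while the decoupled one finishes in 8, because the core work hides entirely behind the SFU instruction.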

The flexibility is nice, or rather, the inflexibility of GT200/G80 was horrible for efficiency and Fermi fixes that.

415 Comments

  • silverblue - Thursday, October 1, 2009 - link

    Am I hearing you right - you say GT300 isn't a paper launch despite there being no cards for sale for the next few months, yet you said the 5870 was AND THERE WERE CARDS FOR SALE WHEN YOU MADE THE COMMENT! I don't care that you couldn't locate one, the simple fact is people had already bought cards from the first trickle (emphasis on the trickle part) and as such made your statement completely invalid.

    How much more rubbish are you going to spew from your hole?

    (note: I needed to have caps above to make a salient point and not just because I felt like holding the shift key for no particular reason)
  • SiliconDoc - Thursday, October 1, 2009 - link

    roflmao - If you're hearing anything right, you'd keep your text yap shut.
    Please show me the LAUNCH information on GT300, there, brainless bubba, the liar. I really cannot imagine you are that stupid, but then again, it is possible.

    Congratulations for being a COMPLETE IDIOT AND LIAR! Really, you must work very hard to maintain that level of ignorance. In fact, the requirement to be that stupid exceeds the likelihood that you actually are purely ignorant, and therefore, it is more likely you're a troll. My condolences in either case in all seriousness.
  • silverblue - Thursday, October 1, 2009 - link

    The proposed launch is late November but even Fudzilla concede that any problems will delay this. The earliest we'll see a GT300 on the shelves is just under 2 months. There, that's information for you. I want nVidia to launch GT300 this year but we don't always get what we wish for.

    Where did I lie in any of my previous posts? Oh right... I didn't bow down to worship the Green God(dess). If you had any semblance of an open mind or any stability at all, your nose wouldn't need cleaning. Calling me a troll is pure comedy gold and offering me your pity is outstanding to say the least :)

    Keep trying. Or don't. Either way, I doubt many people care for your viewpoints anymore.
  • SiliconDoc - Thursday, October 1, 2009 - link

    Oh jeeze, one red rooster who finally gets it.
    Congratulations, you're not the dumbest of your crowd.
    --
    No shirking here comes the QUOTE !

    " The proposed launch is late November " !!! whoo hoo !

    Now a proposed launch is not an official launch- try to keep that straight in the gourd haters when the time comes.
    --
    Pass it along to all the screaming tards, won't you please, you talk their language, or perhapos we'll just say you already have, because, by golly, they can believe you.
    ROFLMAO
  • rennya - Thursday, October 1, 2009 - link

    'Nvidia LAUNCHED TODAY... se page two by your insane master Anand.'


    This is what you said yourself somewhere in this very discussion. So you must have known about this so-called launch yourself.
  • SiliconDoc - Thursday, October 1, 2009 - link

    That's called sarcasm dear. Jiminy crickets.
  • palladium - Thursday, October 1, 2009 - link

    How can you tell if that's not a GTX285 with redesigned cover/cooler/PCB?
  • samspqr - Monday, October 5, 2009 - link

    it is SO funny that that thing silicondoc's master/god is holding in his hand ended up revealing itself as nothing more than a mock-up...
  • SiliconDoc - Wednesday, September 30, 2009 - link

    PS THE SILICON IS ALREADY CUT AND IN PRODUCTION!
    ---
    Yes anand at the bottom of page 1 claims "it's paper" - DECEIVING YOU, since the WAFERS HAVE ALREADY BEEN BURNED AND YIELDS ARE REPORTED HIGH ! (in spite of ati's marketing arm lying and claiming "only 9 cores per wafer yields" - A BIG FAT LIE NVIDIA POINTED OUT !)
    Where have you been with your head in the sand ?
    --
    So at the bottom of page 1 Anand leaves you dips with the impression "it's all paper" (but the TRUTH is DEVELOPER CARDS ARE ALREADY ASSEMBLED and being DEBUGGED and TESTED) just anand won't get one for 2 months.
    ---
    THEN BY PAGE 2 ANAND CALLS IT A PAPER LAUNCH !
    roflmao
    Yes, the red rooster himself has convinced himself "today's nvidia LAUNCH" (that LAUNCH word is what anand made up in his deranged mind) is a paper launch "JUST LIKE ATI'S!".
    ---
    It is nothing short of absolutely AMAZING.
    The red rooster fan has boonswoggled his own gourd, stated in false terms, bashed it to be as bad as what ati just did with 5870, and IT'S NOT EVEN A LAUNCH DAY FOR NVIDIA !
    ---
    Congratulations, the massive bias is SCREAMING off the page. LOL
    It's hilarious, to say the least, that the master can be that deluded with his own spew!
