Pipelining: 101

It seems like every time Intel releases a new processor we have to revisit the topic of pipelining to help explain why a 3GHz P4 performs like a 2GHz Athlon 64. With a 55% longer pipeline than Northwood, Prescott forces us to revisit this age-old topic once again.

You've heard it countless times before: pipelining is to a CPU as the assembly line is to a car plant. A CPU's pipeline is not a physical pipe that data goes into and comes out of; it is the sequence of "things to do" in order to execute instructions. Every instruction must go through the same steps, and we call these steps stages.

The stages of a pipeline do things like find out what instruction to execute next, find out what two numbers are going to be added together, find out where to store the result, perform the add, and so on.

The most basic CPU pipeline can be divided into 5 stages:

1. Instruction Fetch
2. Decode Instructions
3. Fetch Operands
4. Execute
5. Store to Cache

You'll notice that those five stages are very general in their description. You could just as well make a longer pipeline with more specific stages:

1. Instruction Fetch 1
2. Instruction Fetch 2
3. Decode 1
4. Decode 2
5. Fetch Operands
6. Dispatch
7. Schedule
8. Execute
9. Store to Cache 1
10. Store to Cache 2

Both pipelines have to accomplish the same task: instructions come in, results go out. The difference is that each of the five stages of the first pipeline must do more work than each of the ten stages of the second pipeline.

If all else were the same, you'd want a 5-stage pipeline like the first case, simply because it's easier to fill 5 stages with data than it is to fill 10. And if your pipeline is not constantly full of data, you're losing precious execution power - meaning your CPU isn't running as efficiently as it could.
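To put rough numbers on the "easier to fill" point, here is a minimal sketch. It assumes an idealized pipeline that starts empty, accepts one instruction per cycle, and never stalls; the function names and the burst sizes are made up for illustration.

```python
def cycles_to_run(num_instructions: int, depth: int) -> int:
    """Cycles for a burst of instructions to pass through an initially empty,
    stall-free pipeline: the first instruction needs `depth` cycles, and each
    later one completes one cycle after the one before it."""
    if num_instructions <= 0:
        return 0
    return depth + num_instructions - 1


def utilization(num_instructions: int, depth: int) -> float:
    """Fraction of cycles in which an instruction actually completes."""
    cycles = cycles_to_run(num_instructions, depth)
    return num_instructions / cycles if cycles else 0.0


if __name__ == "__main__":
    for depth in (5, 10):
        for burst in (10, 1000):
            print(f"depth={depth:2d} burst={burst:4d} "
                  f"cycles={cycles_to_run(burst, depth):4d} "
                  f"utilization={utilization(burst, depth):.3f}")
```

With long uninterrupted runs of instructions the two depths look nearly identical; it is the short bursts, where the pipeline keeps having to be refilled, that hurt the deeper design more.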

The only reason you would want the second pipeline is if, by making each stage simpler, you can make each stage complete significantly faster than in the previous design. Your slowest (most complicated) stage determines how quickly you can move data through every stage - keep that in mind.
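The "slowest stage sets the pace" rule can be sketched in a few lines. The per-stage delays below are hypothetical numbers chosen for illustration, not measurements of any real CPU, and the function names are our own.

```python
def clock_period_ns(stage_delays_ns: list[float]) -> float:
    """Every stage gets the same clock, so the period must cover the slowest stage."""
    return max(stage_delays_ns)


def max_clock_ghz(stage_delays_ns: list[float]) -> float:
    """Highest clock speed the pipeline can run at (1 / period in ns gives GHz)."""
    return 1.0 / clock_period_ns(stage_delays_ns)


if __name__ == "__main__":
    # Hypothetical delays in nanoseconds for three designs.
    five_stage = [0.8, 1.0, 0.7, 0.9, 0.6]
    ten_stage_balanced = [0.5] * 10
    ten_stage_unbalanced = [0.5] * 9 + [0.7]  # one stage could not be split cleanly

    for name, delays in [("5-stage", five_stage),
                         ("10-stage balanced", ten_stage_balanced),
                         ("10-stage unbalanced", ten_stage_unbalanced)]:
        print(f"{name}: period={clock_period_ns(delays)}ns, "
              f"max clock={max_clock_ghz(delays):.2f}GHz")
```

Notice that a single stubborn 0.7ns stage drags the whole ten-stage design down to about 1.43GHz, even though the other nine stages could run at 2GHz.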

Let's say that each stage of the first pipeline takes 1ns to complete. If each stage takes 1 clock cycle to execute, we can build a 1GHz processor (1/1ns = 1GHz) using this pipeline. To make up for the fact that we have more stages (and thus have a harder time keeping the pipeline full), the second design must have a significantly shorter clock period (the amount of time each stage takes to complete) in order to offer performance equal to or greater than the first design. Thankfully, since we're doing less work per clock, we can reduce the clock period significantly. Assuming that we've done our design homework well, let's say we get the clock period down to 0.5ns for the second design.
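Plugging the two designs into a quick calculation shows what the shorter clock period does and does not buy. This is only a sketch using the 1ns and 0.5ns figures from the example; the helper names are our own, and it assumes one instruction can complete every cycle.

```python
def instruction_latency_ns(depth: int, period_ns: float) -> float:
    """Time for a single instruction to travel through every stage."""
    return depth * period_ns


def peak_throughput_per_ns(period_ns: float) -> float:
    """Best case: one instruction completes each clock period (pipeline always full)."""
    return 1.0 / period_ns


if __name__ == "__main__":
    designs = {"Design 1": (5, 1.0), "Design 2": (10, 0.5)}  # (stages, period in ns)
    for name, (depth, period_ns) in designs.items():
        print(f"{name}: clock={1.0 / period_ns:.0f}GHz, "
              f"latency={instruction_latency_ns(depth, period_ns)}ns, "
              f"peak={peak_throughput_per_ns(period_ns)} instructions/ns")
```

Under these numbers a single instruction takes the same 5ns to get through either pipeline. The deeper design only wins because, when it is full, it finishes instructions twice as often.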

Design 2 can now scale to 2GHz, twice the clock speed of the original CPU, and we will get twice the performance - assuming we can keep the pipeline filled at all times. Reality sets in, and it becomes clear that without some fancy footwork we can't keep that pipeline full all the time - and all of a sudden our 2GHz CPU isn't performing twice as fast as our 1GHz part.
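A toy model makes the gap between 2x the clock and 2x the performance concrete. The big assumption here is ours, not a description of any real CPU: some fraction of instructions (`refill_rate`, a made-up parameter) empties the pipeline, and each refill wastes `depth - 1` cycles. The initial fill is ignored.

```python
def run_time_ns(num_instructions: int, depth: int,
                period_ns: float, refill_rate: float) -> float:
    """Total time when a fraction of instructions forces a pipeline refill.

    Assumption: each refill wastes (depth - 1) cycles, so deeper pipelines
    pay more for every hiccup.
    """
    wasted_cycles = num_instructions * refill_rate * (depth - 1)
    return (num_instructions + wasted_cycles) * period_ns


def speedup_design2_over_design1(num_instructions: int, refill_rate: float) -> float:
    """How much faster the 10-stage/0.5ns design is than the 5-stage/1ns design."""
    t1 = run_time_ns(num_instructions, depth=5, period_ns=1.0, refill_rate=refill_rate)
    t2 = run_time_ns(num_instructions, depth=10, period_ns=0.5, refill_rate=refill_rate)
    return t1 / t2


if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.2):
        print(f"refill rate {rate:.2f}: "
              f"speedup = {speedup_design2_over_design1(1000, rate):.2f}x")
```

With a perfectly full pipeline the model gives the ideal 2x. With a refill on just 5% of instructions the 2GHz design is only about 1.66x faster, and the gap keeps widening as refills become more common.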

Make sense? Now let's relate this to the topic at hand.


104 Comments


  • Stlr22 - Sunday, February 1, 2004 - link

    post*
  • Stlr22 - Sunday, February 1, 2004 - link

    KristopherKubicki

    Earlier you said that I should read the article.
    What was your point? What was it about my first pot that you disagreed with?
  • KristopherKubicki - Sunday, February 1, 2004 - link

    #7:

    I agree 100% with Anand and Derek. This processor will be a non-event until we get in the 3.6GHz range. Similar to Northwood's launch.

    #10:

    Check out our price engine. We have already been listing the processor a week!

    http://www.anandtech.com/guides/priceguide.htm

    http://www.monarchcomputer.com/Merchant2/merchant....

  • cliffa3 - Sunday, February 1, 2004 - link

    In the table on page 14 it shows that the 90nm P4@2.8 will have a 533 MHz FSB, but is that the case? I did some quick google research and can't find anything to support that...please confirm or correct, thanks.
  • NFactor - Sunday, February 1, 2004 - link

    Yes, I must agree this is an amazing article, one of the best i have ever read. Thanks.
  • Xentropy - Sunday, February 1, 2004 - link

    VERY interesting article. Thank you Anand and Derek! One of the best I've read on Anandtech, and I consider yours the best hardware site on the net!

    One correction, on page 7, you say, "if you want to multiply a number in binary by 2 you can simply shift the bits of the number to the right by 1 bit," but don't you mean shift to the left one bit (and place a zero at the end)? It's much like multiplying a decimal number by ten for obvious reasons.

    Anyway, it looks like the Prescott is somewhat of a non-event at this time. Just new cores that perform fundamentally the same as the current ones at current speeds. The real news will come later; Intel has just positioned itself for one hell of a speed ramp to come. Northwood was clearly at the end of the line. One analogy, I suppose, would be that Intel didn't fire any shots in the CPU war today, but they loaded their guns in preparation to fire.

    The coming year will be an exciting one for us hardware geeks. I'm interested in seeing how higher clocked Prescotts play out as well as whether anything 64-bit shows up before 2005 to support AMD's stance that we need it NOW.

    Again, thanks for a very thorough article!
  • Stlr22 - Sunday, February 1, 2004 - link

    KristopherKubicki

    So what's your take on these new Prescotts?
  • KristopherKubicki - Sunday, February 1, 2004 - link

    Anand scolded me for not reading the article :( I only read the conclusion and the graphs. Turns out the decision making isnt as clearcut as it sounds.

    As for the thing with the inquirer. Well, lots of people had prescotts. We had one back in August I believe. The thing is they were horribly slow - 533FSB 2.8GHz. Everyone drew the conclusion that these were purposely slowed processors that were jsut for engineering purposes. While the inq benched this processor, most people didnt just becuase they were under the impression this was not to be the final production model. Hope that clears up some discrepancy about the validity.

    Cheers,

    Kristopher
  • wicktron - Sunday, February 1, 2004 - link

    Hehe, I guess the Inq was right about this one. Where are all the Inq bashers and their claim of "fake" benchies? Haha, I laugh.
  • Stlr22 - Sunday, February 1, 2004 - link

    KristopherKubicki - "read the article..."


    lol that might be a good idea, as I only broswed it and read the conclusion. :D
