Final Words

First, keep in mind that these performance numbers are early and were run on a partly crippled, very early platform. With that caveat, the fact that Nehalem is still able to post 20 - 50% performance gains says only one thing about Intel's tick-tock cadence: Intel delivered.

We've been told to expect a 20 - 30% overall advantage over Penryn, and it looks like Intel is on track to deliver just that in Q4. At 2.66GHz, Nehalem is already faster than the fastest 3.2GHz Penryns on the market today; since 3.2 / 2.66 ≈ 1.20, that works out to a clock-for-clock advantage of more than 20% in these tests. At 3.2GHz, I'd feel comfortable calling it a baby Skulltrail in all but the most heavily threaded benchmarks. This thing is fast, and remember that Nehalem doesn't launch until Q4 of this year.

One valid concern is performance in applications that don't scale well beyond two or four cores: what will Nehalem offer us then? Our DivX test doesn't scale well beyond four cores, and even there Nehalem was 20 - 30% faster, right in the range we've been expecting. The other thing to keep in mind is that none of these tests really stress Nehalem's integrated memory controller. When AMD moved to an IMC, we saw an instant 20% performance boost in most applications, so I suspect that applications that don't benefit from Hyper Threading will at least benefit from the IMC. We've only scratched the surface of Nehalem here, looking at the benefits of Hyper Threading and Nehalem's lower-latency unaligned cache accesses. We've also hinted at what's to come with the extremely well-balanced, low-latency memory hierarchy of Intel's new baby. Once this thing gets closer to launch, we should be able to fill in the rest of the puzzle.

Over six years ago I had dinner with Intel's Pat Gelsinger (back when he was Intel's CTO), and I asked him the same question I always ask: "What are you excited about?" Back then his answer was "threading"; Intel was about to launch Hyper Threading, and Pat was convinced that it was absolutely necessary for the future of microprocessors.

It was at the same dinner that Pat mentioned Intel might do a chip with an integrated memory controller, much like AMD, but that an IMC would only indirectly mitigate the problem of idle execution units, not solve it. With Nehalem, Intel has combined both, and it only took six years to pull it off.

Pat also brought up another very good point at that dinner. He turned to me and said that you can only integrate a memory controller once, so what do you do next to improve performance? Intel has managed to keep increasing performance, but what I really want to see is what happens at the next tock. Intel proved its ability with Conroe, and Nehalem shows that the tick-tock model can work, but more than anything, looking at Nehalem today makes me excited about what Sandy Bridge will bring.

The fact that we're seeing these sorts of performance improvements while Intel faces a dormant AMD says a lot. In many ways, Intel is doing more to improve performance today than it did when AMD was on top during the Pentium 4 days.

AMD never really caught up to Conroe's performance; through aggressive pricing we got competition at the low end, but AMD could never touch the upper echelon of Core 2 performance. With Penryn, Intel widened the gap, and now with Nehalem it's going to be even tougher to envision a competitive high-end AMD CPU at the end of this year. 2009 should bring a new architecture from AMD, which is the only thing that could realistically make AMD competitive here. Nehalem is still months from launch and there's already no equal in sight; it will take far more than Phenom to make this thing sweat.

108 Comments

  • Anand Lal Shimpi - Thursday, June 5, 2008

    I really wish I could've turned off hyperthreading :)

    DivX doesn't scale well beyond 4 threads, so that's the best benchmark I could run to look at how Nehalem performs when you keep clock speeds and number of threads capped. With a 28% improvement, that's at the upper end of what we should expect from Nehalem on average.

    Take care,
    Anand
  • SiliconDoc - Monday, July 28, 2008

    Great answer, explains it to a T...
    However, that leaves me, and I'd bet most of the fans here, without much real-world use for 4x4HT...
    I don't know, should we all steal rented DVDs by re-encoding them? That's the only use I know of that might work for the non-connected end user.
    It's not like "folding" is all the rage; they would have to pay me to do their work, especially with all the "power savings" hullabaloo going on in tech.
    That's great, a 28% increase, OK...
    So I want it in a dual core or a single core with HT, since that runs everything I do outside the University.
    lol
    I guess all-core usage, all the time, will hit sometime...
  • Calin - Thursday, June 5, 2008

    First of all, Hyperthreading in the Pentium 4 line brought at most a 20% or so performance advantage, and a 5% or so loss at worst. I don't have many reasons right now to think this new Hyperthreading will be vastly different.
    As for scaling to 8 cores, maybe the scaling was limited by other issues (latency, interprocessor communication, cache coherency)? Might DivX on this new platform still gain performance going from 4 to 8 cores?
  • bcronce - Thursday, June 5, 2008

    Intel claims that the new HT is improved and gives a 10% - 100% increase. The main issue with the P4 was that it had a double-pumped ALU that could process 2 integer operations per clock. This was great for HT, since you could do 2 instructions per clock. The problem came with competition for the FPU, of which there was only 1. This would cause the 2nd thread on the logical CPU to stall, and thread swapping has additional overhead.

    You also run into the issue of L2 cache thrashing. If you have 2 threads trying to monopolize the FPU while also loading large datasets into cache, your cache misses go up while each thread is bottlenecked at the FPU.
  • techkyle - Thursday, June 5, 2008

    I'd like to know what AMD fans are thinking. As one myself, I'm starting to wonder if I'm going to give in and become an Intel fan.

    Intel implementing the IMC:
    I can only say two things. One, it's about time. Two, THIEF!

    Return of Hyper Threading:
    It seems to me that some sort of intelligence must go into the design of multi-core Hyper Threading. If two intensive tasks are given to the processor, the fastest solution would be simple: devote one core to one thread and a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread from landing on the first core's first virtual CPU and the second thread on the first core's second virtual CPU?

    Another nail in the coffin:
    AMD can provide no competition to high-end Core 2 Quad machines. Even if the K10 line can catch up to the performance of the Core architecture, can they really expect to have a Nehalem competitor ready anywhere remotely close to Intel's launch? AMD can't afford to keep playing the power efficiency and price/performance game.

    AMD is going to be in an even worse position than it was with X2 vs. Core 2 if they can't pull something out of their sleeve. Barcelona already isn't clock-for-clock competitive with Penryn, and now we hear that early Nehalems are 20-40% above Core 2?
    If AMD's next processor flops, is it possible for them to drop desktop and server processors and still be a functioning (not to mention profitable) company? It's no longer a race for the performance crown. It's becoming a race simply to survive.
  • bcronce - Thursday, June 5, 2008

    "Return of Hyper Threading:
    It seems to me that some sort of intelligence must go into the design of multi-core Hyper Threading. If two intensive tasks are given to the processor, the fastest solution would be simple: devote one core to one thread and a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread from landing on the first core's first virtual CPU and the second thread on the first core's second virtual CPU?"

    Windows lists CPUs physical first, then logical. So in Task Manager, on a quad core with HT, the first 4 will be your physical CPUs and the last 4 will be your logical ones.

    Windows, by default, will put threads on CPUs 1-4 first. It will move threads around to different CPUs if it feels that one is under-taxed and another is way over-taxed.

    Programmers can also force Windows to use different cores for each thread. So a program can tell Windows to lock all threads to the first 4 CPUs, which will keep them off the logical ones. You could then allow a thread that manages the worker threads to run on the logical CPUs. You would then be keeping all your hard data-crunching threads from competing with each other, and let the UI/etc. threads take advantage of HT.
  • Spoelie - Thursday, June 5, 2008

    Supposedly, Shanghai (the 45nm iteration of Phenom) will be around 20% faster clock for clock than Phenom. This is what AMD itself said some time ago; it hasn't been verified by an independent source. Judging by current benchmarks, this would put Shanghai at the same or a slightly higher performance level than Penryn.

    As such, a very crude estimate is that Shanghai should be as competitive with Nehalem as K8 is with Conroe. Not a very rosy outlook, so let's hope this early information is not accurate and AMD can pull something more out of its hat.

    BTW, last I heard, Bulldozer will come at the 32nm node at the earliest, since the design is supposedly too complex for 45nm. So no instant relief from that corner. AMD will be fighting a harsh battle in the coming years.
  • Calin - Thursday, June 5, 2008

    "Two, THIEF"
    AMD's vector extension is 3DNow!, if I remember correctly. Yet it's Intel's extensions (SSE2, SSE3) that get touted on AMD's processors instead. Now who's the thief?
  • swaaye - Thursday, June 5, 2008

    MMX? Or, MDMX? Who copied who? Nobody, really. SIMD has been around forever.
  • Retratserif - Thursday, June 5, 2008

    I really would not try to think of it as a fan base. Most overclockers and benchmarkers use what works for what they are doing. I have owned and water-cooled AMD CPUs. It was great at the time.

    Once Conroe came out, the door swung the other way. Technology is like the ocean; it comes in waves. All waves die out. Fortunately, we as users are living it up because of Intel's success. At the same time, I truly hope that AMD/ATI does something in response to the high-powered CPUs. If not, we get whatever Intel wants to give us.

    There will always be two sides to each story. Since Intel is on top and unscathed, they have time to perfect chips before they go mainstream. We saw the same thing with the Yorkfield delay. There was something seriously wrong, and they had time to address it before it was in the hands of thousands of users.

    OK, I can say I am a fan... of whatever works best for what I do. Price/Performance/Practicality. You take what you can afford and make it work harder for every penny you put in it.

    One thing you have to keep in mind: AMD is selling more budget CPUs and integrated/onboard video PCs to large companies like Dell and HP. They are moving more aggressively into typical home PC and mobile use. Intel just does not do very well there at the moment. With ATI in its pocket and being pretty green on power consumption, AMD can offer a good mobile machine that will do everything a typical PC user will ever need for 2 years at a good price.
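bcronce's suggestion of locking the data-crunching threads to the first four CPUs and leaving the logical CPUs to a management thread comes down to building CPU affinity bitmasks, one bit per logical processor. Below is a minimal Python sketch of that idea. It assumes the layout described in that comment (logical CPUs 0-3 as the physical cores, 4-7 as their Hyper Threading siblings). Actual enumeration order depends on the OS and firmware and may interleave siblings, so check the real topology before relying on it. On Windows, a mask like this is the kind of value passed to the Win32 `SetThreadAffinityMask` call. The helper name `affinity_mask` and the CPU ranges are hypothetical and exist only for illustration.

```python
def affinity_mask(cpus):
    """Return a bitmask with bit i set for every logical CPU index i in cpus."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Hypothetical layout taken from the comment; verify on real hardware:
# logical CPUs 0-3 are the physical cores, 4-7 their Hyper Threading siblings.
WORKER_CPUS = range(0, 4)   # heavy data-crunching threads
HELPER_CPUS = range(4, 8)   # UI / thread-management work

worker_mask = affinity_mask(WORKER_CPUS)
helper_mask = affinity_mask(HELPER_CPUS)

print(f"worker mask: {worker_mask:#04x}")  # worker mask: 0x0f
print(f"helper mask: {helper_mask:#04x}")  # helper mask: 0xf0
```

Because the two masks share no bits, under the assumed layout the worker threads and the helper thread never compete for the same logical CPU. If the OS instead numbers Hyper Threading siblings adjacently, the same helper would need a different index list, for example every other CPU.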
