How AMD's Istanbul might close the gap with Nehalem EP
by Johan De Gelas on February 25, 2009 12:00 AM EST - Posted in IT Computing general
The Istanbul cores are the same as those found in AMD's latest Shanghai CPU, but the "uncore" part of Istanbul is more interesting. By now, you have probably heard about AMD's "HT assist" technology, a probe or snoop filter. On the current Shanghai platform, every time a new cacheline is brought into the L3 cache of, for example, CPU 1, a broadcast message is sent to the L3 caches of all other CPUs, and CPU 1 has to wait until those CPUs answer.
In the case of Istanbul, the CPU simply checks the snoop filter in its own L3 cache, and if none of the other CPUs holds that cacheline, it can go ahead. This lowers the latency of bringing in a new cacheline and raises the effective bandwidth.
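To make the difference concrete, here is a minimal toy model in Python. It is only a sketch under loud assumptions: the `ToySystem` class, its `directory` dictionary, and the address layout are all invented for illustration and say nothing about how AMD actually implements HT assist. It simply counts how many probes go out when every fetch is broadcast versus when a filter tells the requester which CPUs may hold the line.

```python
# Toy model of coherency probes on a multi-socket system.
# Assumption: the probe filter is modelled as a directory mapping each
# cacheline address to the set of CPUs that may hold a copy. This is an
# illustration only, not AMD's actual design.
from collections import defaultdict


class ToySystem:
    def __init__(self, num_cpus, use_probe_filter):
        self.num_cpus = num_cpus
        self.use_probe_filter = use_probe_filter
        self.directory = defaultdict(set)  # line address -> CPUs caching it
        self.probes_sent = 0

    def fetch_line(self, cpu, addr):
        """CPU `cpu` brings cacheline `addr` into its L3; returns probes sent."""
        others = set(range(self.num_cpus)) - {cpu}
        if self.use_probe_filter:
            # Probe only the CPUs the directory says may hold the line.
            targets = self.directory[addr] & others
        else:
            # Broadcast: every other CPU is probed and must answer.
            targets = others
        self.directory[addr].add(cpu)
        self.probes_sent += len(targets)
        return len(targets)


def run(use_probe_filter, num_cpus=4, lines_per_cpu=1000):
    """Each CPU fetches only lines from its own private address range."""
    system = ToySystem(num_cpus, use_probe_filter)
    for cpu in range(num_cpus):
        for i in range(lines_per_cpu):
            system.fetch_line(cpu, cpu * lines_per_cpu + i)
    return system.probes_sent


if __name__ == "__main__":
    print("broadcast probes:   ", run(use_probe_filter=False))
    print("probe-filter probes:", run(use_probe_filter=True))
```

With private data per CPU, the broadcast case probes three other sockets on every fetch, while the filtered case sends no probes at all. Real hardware is far more subtle, but the direction of the effect is the same.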
To better understand this, we combined our own Stream benchmarking with the results that AMD presented. All AMD systems are using DDR2-800.
As each Stream thread works on its own data, there is no reason to send out coherency synchronization requests. These requests slow the process of getting new cachelines into the L3 and hence lower effective memory bandwidth. What is interesting is that this will benefit not only the applications that use the HT interconnects a lot for coherency traffic, but also applications like Stream which do not need the HT interconnects. Also notice that HT 3.0 does not improve memory bandwidth, as Stream will try to keep its thread data local. Our testing used SUSE SLES 10 SP2 and AMD used Windows Server 2008. Both OSs are well optimized and NUMA aware.
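For readers who have not looked at Stream, the sketch below shows the access pattern being discussed: each thread allocates its own arrays and runs a Triad-style kernel (`a[i] = b[i] + scalar * c[i]`) on them, so no thread ever touches another thread's data. The function names `triad_worker` and `run_triad` are made up for this illustration, and this is not the real Stream source. Because of Python's interpreter overhead and global interpreter lock, the bandwidth figure it returns says nothing about the hardware; only the structure and the byte accounting (two reads and one write of a double per element, as Stream counts Triad) are the point.

```python
import time
from array import array
from concurrent.futures import ThreadPoolExecutor

BYTES_PER_DOUBLE = 8


def triad_bytes(n):
    """Bytes counted for one Triad pass: two reads and one write per element."""
    return 3 * BYTES_PER_DOUBLE * n


def triad_worker(n, scalar=3.0):
    """Allocate thread-private arrays, then time a[i] = b[i] + scalar * c[i]."""
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    a = array("d", [0.0]) * n
    start = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + scalar * c[i]
    elapsed = time.perf_counter() - start
    return a[0], elapsed


def run_triad(num_threads=4, n=100_000):
    """Run one worker per thread; no worker touches another worker's arrays."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(triad_worker, [n] * num_threads))
    # Rough aggregate: total bytes divided by the slowest worker's kernel time.
    slowest = max(elapsed for _, elapsed in results)
    gb_per_s = num_threads * triad_bytes(n) / slowest / 1e9
    return results, gb_per_s


if __name__ == "__main__":
    _, bandwidth = run_triad()
    print(f"toy aggregate bandwidth: {bandwidth:.3f} GB/s")
```

Since the working sets are fully private, any coherency probe sent on behalf of these threads is pure overhead, which is why a probe filter can raise effective bandwidth even for code that never shares data.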
This means that especially HPC applications, with many threads all working on their own data, will benefit from the higher effective bandwidth. Besides HT assist, AMD has now confirmed to us that the memory controller has been tuned quite a bit. This higher amount of bandwidth will allow the quad Istanbul to stay out of the reach of the dual Nehalem EP Xeons in many HPC applications.
HT assist might also improve the SAP and OLTP scores quite a bit, but for a different reason. SAP and OLTP applications perform a lot of cache coherency synchronization requests, so the snoop filter will substantially lower the average latency of such requests, because in some cases:
- the CPU will only wait on one other CPU (instead of waiting for all responses to come back)
- the CPU won't have to wait at all, as the other CPUs don't have this line.
Secondly, this will also lower memory latency, which is a bonus for almost every multi-threaded application.
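The cases above can be sketched with a small latency model. Everything in it is an assumption made for illustration: the function `coherency_wait`, the CPU numbering, and especially the nanosecond values, which are invented and not measurements of any Opteron. The model only encodes the idea that a requester must wait for the slowest response among the CPUs it actually probes.

```python
def coherency_wait(probe_latencies_ns, holders=None):
    """Time a requesting CPU waits for probe responses, in nanoseconds.

    probe_latencies_ns: dict mapping each other CPU id to a round-trip
        probe latency (hypothetical values).
    holders: None models a broadcast with no filter; otherwise the set of
        CPUs that the filter reports as holding the line.
    """
    if holders is None:
        targets = set(probe_latencies_ns)
    else:
        targets = set(probe_latencies_ns) & set(holders)
    # The requester waits for the slowest probed CPU; no probes means no wait.
    return max((probe_latencies_ns[cpu] for cpu in targets), default=0)


# Hypothetical four-socket layout seen from CPU 0: CPUs 1 and 2 are one hop
# away, CPU 3 is two hops away. The numbers are made up.
latencies = {1: 60, 2: 60, 3: 110}

broadcast = coherency_wait(latencies)                   # wait for all CPUs
one_holder = coherency_wait(latencies, holders={1})     # wait for one CPU
no_holder = coherency_wait(latencies, holders=set())    # no wait at all
```

In this toy setup the broadcast case is bounded by the most distant socket, the single-holder case by one nearby socket, and the no-holder case costs nothing beyond the local filter lookup, which the model ignores.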
Lower memory latency, higher bandwidth, lower "cache coherency" latency and more interconnect bandwidth: the improved "uncore" of Istanbul will be vital to close the gap with Nehalem. Much will depend on how quickly Intel introduces its own hexacore 32 nm Xeons, but that probably won't happen before 2010. Istanbul is shaping up to be a really good alternative for Intel's quadcore Nehalem. We might see a good fight after all...
Don't forget to check it.anandtech.com (IT portal) often, as many of our blogposts (for example the VMworld 2009 coverage) are not published on the frontpage of Anandtech.com.
JarredWalton - Wednesday, February 25, 2009 - link
Actually, I considered my post pretty intelligent and reasonable. Implying otherwise is, frankly, rather insulting. I was responding as much to the initial poster as to you, as that reader is clearly Intel biased. In fact, pretty much any time someone throws out the "you're biased for/against [company X]" we can safely assume that the real problem is the person making the comment is heavily biased in the opposite direction.
There's bias in everything we do and write, obviously, and we try to provide a well-balanced view of the entire market. When we get some users saying we're horribly Intel biased and others saying we're horribly AMD biased, I'm inclined to think we're doing a reasonable job at straddling the fence between the various companies.
As for the title and everything else, Johan is providing an article about Istanbul, based on information AMD is providing, and they will obviously try to portray their solution in the most favorable light. That said, there will absolutely be areas where the changes in the "uncore" of Istanbul have a dramatic impact on performance relative to the older 4-core Barcelona variants. Will it be "enough"? I'm sure there will be areas where AMD can actually come out ahead of Intel - yes, even ahead of Nehalem. They aren't likely to be super common, but they will almost certainly exist - and it probably won't matter in some of those situations whether we're looking at 2-core, 4-core, or 8-core Nehalem. Thus, closing the gap in this case means that we may see boosts in performance of 20% or more in some environments. We are not discussing a 1% boost, I don't think.
Johan of course would be the one to determine how much of a difference we're really looking at. This is a blog, so going into depth about every little thing where AMD wins or loses is out of scope. Normally, the goal for blogs is 500 to 1000 words (often closer to 500), so by that token he's right on target. When he gets around to full reviews, he will have more data on all facets. Everyone (i.e. you and several others clearly) is so quick to dismiss Istanbul before it even launches that I have no problem with a headline that might get people to at least momentarily consider the broader picture. A bit sensational? Sure, but it is, ultimately, a short blog and not the final say. I hope that Istanbul ends up being a lot better than what many have already assumed, and it's nice to see some data that suggests it's not *all* doom and gloom for AMD (though admittedly it's quite grim).
Cheers,
Jarred
winterspan - Thursday, February 26, 2009 - link
Jarred, I have to agree with your critic here. Your responses have intentionally ignored the main argument he was trying to articulate, which is that Johan completely ignored the implications of the fact that he was comparing a QUAD-SOCKET Opteron with a DUAL-SOCKET Nehalem platform.
How can he say the Istanbul is closing the gap with Nehalem when he has to reference a QUAD socket Opteron system to find competitive performance with a DUAL socket Nehalem??? He completely glosses over the fact that a quad-socket 24-core Opteron system will not only cost much more, but in all likelihood use a lot more power than a dual-socket Nehalem.
Am I missing something here? As was mentioned by others, Johan is a very intelligent and knowledgeable person, so I can't believe that it was unintentional.
BTW, I'm not really biased in either way.. perhaps a tad bit towards AMD as I really want them to come back and keep this market competitive.
hellopeach - Tuesday, June 2, 2009 - link
"How can he say the Istanbul is closing the gap with Nehalem when he has to reference a QUAD socket Opteron system to find competitive performance with a DUAL socket Nehalem???"
Because it IS closing the gap, a LOT too. If you were in a race, and you were behind your competitor 100 miles, now you are behind your competitor 50 miles, you are closing the gap.
The gap between Istanbul and Nehalem is a LOT smaller than the gap between Barcelona and Nehalem, so yes it IS closing the gap. The article is comparing QUAD-SOCKET BARCELONA with QUAD-SOCKET ISTANBUL with DUAL-SOCKET Nehalem, which shows that the gap is still there, but it is now much smaller than before.
JohanAnandtech - Thursday, February 26, 2009 - link
"He completely glosses over the fact that a quad-socket 24-core Opteron system will not only cost much more, but in all likelihood use a lot more power than a dual-socket Nehalem."
I only say "will allow the quad Istanbul to stay out of the reach of the dual Nehalem EP Xeons in many HPC applications."
Am I saying that this is wonderful? Of course not, you are absolutely right that performance/watt for a dual Nehalem will be much better than on a quad socket system. But it allows AMD to continue to fight for its quad socket market. There are other reasons why people choose 4S. They may need the 32 DIMM slots for virtualization, for example.
This post was mostly technical, trying to explain why Istanbul is shaping up a bit better than we previously thought.
And I would really appreciate it if the few people that always think that we are biased would be a little bit more reserved. One day it is Intel biased, the other day AMD biased. This really poisons the discussions that should be technical instead of political. I believe that any neutral reader can see that our posts are mostly about the technical merits of a certain platform. I would include the 4S platform of Intel if I had a launch date and more info. But right now, Beckton is still a bit vague (Q1 2010? Just a double Nehalem EP?).
melgross - Friday, February 27, 2009 - link
It seems to me that this is saying that a four cylinder engine car is closing the gap with a two cylinder car in "most areas".
Not too good, really. Twice the fuel, more complexity, higher initial costs, etc.
Who would think this is closing the gap?
The same thing applies here. Might as well go to a four core Nehalem instead and keep the advantages of the speed and power savings that it has over the AMD chip models.
I would agree that if two chips with equal specs, four cores vs four cores are within 15% performance of each other, the slower model could be said to be closing the gap if the previous model was 30% behind. It's within range, if costs are lower all around.
But when performance is almost 40% lower core to core, the gap is too wide to be considered "closing the gap". It's a chasm, not a gap.
The mere fact that a four core AMD product must be compared to a two core Intel product proves this. And the fact that it's competitive in "most areas" isn't saying much. May as well get a two core Intel product instead.
If that product isn't available yet, then most companies today will happily wait for it, things being what they are. Switching to, and supporting a different architecture, is trouble enough unless there is a significant performance and power savings advantage to doing it.
That isn't true here.
hellopeach - Tuesday, June 2, 2009 - link
If you were behind your competitor 100 miles, and now you are behind him 50 miles, you are closing the gap. That doesn't mean you will win the race, that doesn't mean you can close the gap any further, but the FACT remains, that the gap has become a LOT smaller now.
I wish people could talk without bias, but only the FACTS. The FACT is, the gap between Istanbul and Nehalem is a LOT smaller than the gap between Barcelona and Nehalem, so yes it IS closing the gap. Period.
Of course, another FACT is that there's still a big gap between Istanbul and Nehalem, that there's little reason to buy Istanbul over Nehalem.
tshen83 - Thursday, February 26, 2009 - link
"I only say "will allow the quad Istanbul to stay out of the reach of the dual Nehalem EP Xeons in many HPC applications." "
I really want to believe that Johan, you are more intelligent than that. This is clearly a word play. 41GB vs 34GB. A 20% difference in absolute performance isn't exactly "out of reach". In fact, a 400% difference in performance per watt per dollar metric will make the 4Socket AMD out of reach in any HPC environment.
"This post was mostly technical, trying to explain why Istanbul is shaping up a bit better than we previously thought. "
Really? Compared to what? If it takes 4 CPUs to fight off Intel's 2 CPUs, either you have really low expectations for AMD or you are clearly not thinking straight.
"And I would really appreciate if the few people that always think that we are biased would be a little bit more reserved. One day it is Intel biased, the other day AMD biased"
I am not biased at all, I simply have a little more common sense than you do looking at the same data. You don't have to defend your position really, your bias is as clear as your title would indicate. If I were Anand, nonsensical posts like yours would have been canned.
"But right now, Beckton is still a bit vague (Q1 2010? Just a double Nehalem EP?). "
How is Beckton vague? Isn't Istanbul as vague as Beckton? Both are non-released products. Just from your words, "just a double Nehalem EP" shows that you don't know crap about hardware and are clearly biased. Beckton has 4 channel DDR3 per socket vs Nehalem-EP's 3 channels and has 4 QPIs. Isn't Istanbul a Shanghai with 2 more cores copy and pasted? The fact that they are doing Snoop Filters on HT 3.0 is because they had to, otherwise scaling would suck.
There is a difference between truth and bias. Your readers are not as dumb as you expect. The reason why people are saying you are biased is because you shouldn't blatantly pump an inferior product. It makes you look stupid. And the fact that you ask your critic to be more "reserved" makes you a speech nazi. Your argument here is absolutely political and non-technical, precisely the opposite of what you say.
JarredWalton - Thursday, February 26, 2009 - link
I can't say I keep up on all the latest server stuff as much, but my best guess on the dual-core vs. hex-core is that AMD provided tests for hex-core and Johan doesn't yet have anything more than dual-core Nehalem in house. Like I said already, this is a blog, not anything intended to be comprehensive. Obviously, benches comparing dual-core and hex-core are at best seriously skewed, but the benchmark data isn't even all that meaningful on its own (i.e. memory bandwidth).
Until we have actual hardware for both sides, it's premature to declare a winner. Intel will almost certainly win in most cases, but virtualization has been a strong point for AMD for quite some time and hex-core with better memory bandwidth/latency and overall uncore improvements, plus two more cores... well, that can't hurt. Will hex-core Dunnington (or octal-core) surpass what AMD can provide? Almost certainly, but until the fat lady sings let's hold off on clearing the theater. :)
TA152H - Wednesday, February 25, 2009 - link
Well, if you consider an intelligent post one that bases the entire article on the opening line, then we'll have to disagree on that. I consider that deliberately argumentative, especially since what's been mentioned after it has nothing to do with the title.
I have no strong bias either way, in reality, I wish AMD were better, and I was really saddened when they spun off their fab plants. If I have any affections, it is towards AMD, and particularly for Jerry Sanders. I have nothing but contempt and disgust for Ruiz though.
I wish I could find a good reason to buy AMD processors, I really do, and I even try to rationalize the Phenom II being good enough. It probably is for some applications, but, let's be honest, it's a horrible design compared to the Nehalem. It's way slower, and it's roughly the same size. It's very frustrating to me that AMD can't get it right, although it's a step forward, it's really not a competitive product if they want to make money. I was sooooo frustrated with AMD when they were talking about how good the K8 was, because I thought it kind of sucked, when it was beating the Prescott. I knew their time was running out, and they were so haughty with their attitude that they could overcome Intel's superior manufacturing with their superior design. It's like watching a blind man being smug as he walks into a chainsaw. Well, we see the results now.
Your argument is changing now, and I wish the article had been presented as you indicated - comparing it to the Barcelona instead of the Nehalem. You took the 1% too seriously, it was to illustrate the point that being technically correct and giving a correct impression are not the same, not to indicate an estimate of performance increase. I'll be real surprised, really, really surprised, to see AMD match Intel, core for core, in any meaningful benchmark. I mean, you could create some, I'm sure, like measuring L1 cache speed (why is the Nehalem's so slow anyway?????), but the processor is dramatically better, and the old days where AMD could make up for it with a better platform around it are gone, unless you find some nice FB-DIMMs to kill the Intel Platform with. It's different from before where Intel had a much better processor, but a much worse server platform. Now they have an even better processor, and the platform is at least as good. Why does AMD STILL waste so many transistors on x87 anyway???? It's obsolete and deprecated, and by them with x86-64.
I also hope that Istanbul is better than what I expect, I really do. But, looking at both designs, my heart and my brain aren't on the same page, and I don't see how it can match the Nehalem, unless it's done simply to make the Nehalem look bad. I can probably make a 386 look fast with the right benchmark, but, we'll see if your prediction on the Istanbul is true. Especially with any application where HT works, and there are so many now, particularly with servers, AMD is going to have such a hard time with performance per watt. FB-DIMMs might save the day :-P .
carniver - Thursday, February 26, 2009 - link
[quote]I have no strong bias either way, in reality, I wish AMD were better, and I was really saddened when they spun off their fab plants. If I have any affections, it is towards AMD, and particularly for Jerry Sanders. I have nothing but contempt and disgust for Ruiz though.
I wish I could find a good reason to buy AMD processors, I really do, and I even try to rationalize the Phenom II being good enough.[/quote]
I'm so glad to hear that there's another person holding onto this same view!
Heck, I got permanently banned from AMDZone for expressing the same opinion, after having been their member for more than 8 years now. We don't hate AMD, we just feel strongly disgusted that a fine piece of iron couldn't turn into a solid piece of steel, but instead rusts into something worthless. All because of one reckless CEO.