Magny-Cours

You have probably heard by now that the new Opteron 6100 is in fact two 6-core Istanbul CPUs bolted together. That is not too far from the truth if you look at the microarchitecture: little has changed inside the core. It is the “uncore” that has changed significantly: the memory controller now supports DDR3-1333, and a lot of time has been invested in keeping cache coherency traffic under control. The 1944-pin (!) organic Land Grid Array (LGA) Multi Chip Module (MCM) is pictured below.

The red lines are memory channels; the blue lines are internal cache-coherent HT connections. The gray lines are external cache-coherent HT connections, while the green line is a simple non-coherent I/O HT connection.

Each CPU node has two DDR3 channels (red lines). That is the strongest point of this MCM: four fast memory channels that can use DDR3-1333, good for a theoretical peak bandwidth of 42.7 GB/s. But that kind of bandwidth is not attainable, not even in theory, because the next link in the chain, the Northbridge, only runs at 1.8 GHz. We have two 64-bit Northbridges, both working at 1.8 GHz, limiting the maximum bandwidth to 28.8 GB/s. That is the price AMD’s engineers had to pay to keep the maximum power consumption of a 45nm 2.2 GHz part below 115W (TDP).
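The two bandwidth figures above follow from simple multiplication. The sketch below reproduces them, assuming 64-bit (8-byte) DDR3 channels at a nominal 1333 MT/s and one 8-byte transfer per Northbridge clock; the function names are ours, for illustration only.

```python
# Back-of-the-envelope check of the peak bandwidth figures in the text.
# Assumptions: 64-bit DDR3 channels, nominal 1333 MT/s, and one 64-bit
# transfer per Northbridge clock cycle.

def dram_peak_gbs(channels=4, mega_transfers=1333, bytes_per_transfer=8):
    """Theoretical DRAM peak in GB/s for the whole MCM."""
    return channels * mega_transfers * 1e6 * bytes_per_transfer / 1e9

def northbridge_peak_gbs(northbridges=2, clock_ghz=1.8, width_bits=64):
    """Peak the Northbridges can move in GB/s (GHz * bytes per clock)."""
    return northbridges * clock_ghz * (width_bits / 8)

def effective_peak_gbs():
    """The slower stage in the chain sets the ceiling."""
    return min(dram_peak_gbs(), northbridge_peak_gbs())

if __name__ == "__main__":
    print(f"DRAM peak:        {dram_peak_gbs():.1f} GB/s")
    print(f"Northbridge peak: {northbridge_peak_gbs():.1f} GB/s")
    print(f"Effective peak:   {effective_peak_gbs():.1f} GB/s")
```

Under these assumptions the Northbridge ceiling works out to roughly two thirds of what the four DDR3-1333 channels could deliver on paper.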

Adding more cores makes the amount of snoop traffic explode, which can easily result in very poor scaling. It can get worse, to the point where extra cores reduce performance. The key technology to prevent this is HT assist, which we described here. By eliminating unnecessary probes, local memory latency is significantly reduced and bandwidth is saved. It costs Magny-Cours 1MB of L3 cache per CPU node (2MB total), but the amount of bandwidth increases by 100% (!) and the latency is reduced to 60% of what it would be without HT assist.

Even with HT assist, a lot of probe activity is going on. As HT assist allows the cores to perform directed snoops, it is good to reach each core quickly. Ideally, each Magny-Cours MCM would have six HT3 ports: one for I/O with a chipset, two per CPU node to communicate with the nodes that are off-package, and two to communicate very quickly between the CPU nodes inside the package. But at 1944 pins Magny-Cours probably already blew the pin budget, so AMD's engineers limited themselves to 4 HT links.

One of the links is reserved for non-coherent communication with a possible x16 GPU. One x16 coherent port communicates with the CPU that is the closest, but not on the same package. One port is split into two x8 ports. The first x8 port communicates with the CPU that is the farthest away: for example, between CPU node 0 and CPU node 3. The remaining x16 and x8 ports are used to make communication on the MCM as fast as possible. Those 24 lanes connect the two CPU nodes on the package.
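The split described above can be tallied as a lane budget for a single CPU node. This is only a sketch of the arithmetic: the labels are ours, and it assumes each of the four HT links is 16 lanes wide, as the x16/x8 figures in the text imply.

```python
# Lane budget for one CPU node, as described in the text.
# Assumption: every HT link is x16, so 4 links give 64 lanes to hand out.

LINKS_PER_NODE = 4
LANES_PER_LINK = 16

# Hypothetical labels; the widths come from the paragraph above.
NODE_LANES = {
    "non-coherent I/O": 16,
    "off-package, closest node": 16,
    "off-package, farthest node": 8,
    "on-package, full link": 16,
    "on-package, half link": 8,
}

def total_lanes(budget=NODE_LANES):
    """All lanes the node spends, across every purpose."""
    return sum(budget.values())

def on_package_lanes(budget=NODE_LANES):
    """Lanes connecting the two nodes inside the MCM."""
    return sum(w for name, w in budget.items() if name.startswith("on-package"))

if __name__ == "__main__":
    print("lanes used:      ", total_lanes())
    print("lanes available: ", LINKS_PER_NODE * LANES_PER_LINK)
    print("on-package lanes:", on_package_lanes())
```

The budget adds up exactly: all 64 lanes are spoken for, and 24 of them stay inside the package.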

 

The end result is that a 2P configuration allows fast communication between the four CPU nodes. Each CPU node is connected directly (one hop) to each of the others. However, the bandwidth between CPU node 0 and node 2 is twice that between node 0 and node 3.
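A small model makes the one-hop claim easy to check. It assumes nodes 0 and 1 share one package and nodes 2 and 3 the other, and that node 1's off-package links mirror node 0's (x16 to node 3, x8 to node 2); the text only spells out node 0's links, so the rest is our assumption by symmetry.

```python
from itertools import combinations

# Sketch of the 2P topology: four CPU nodes, lane width per direct link.
# Assumed numbering: nodes 0 and 1 on package A, nodes 2 and 3 on package B.
# Widths for node 1's off-package links are assumed by symmetry.
NODES = [0, 1, 2, 3]
LINK_LANES = {
    frozenset({0, 1}): 24,  # on-package: x16 + x8
    frozenset({2, 3}): 24,  # on-package: x16 + x8
    frozenset({0, 2}): 16,  # off-package, closest node
    frozenset({1, 3}): 16,  # assumed mirror of 0-2
    frozenset({0, 3}): 8,   # off-package, farthest node
    frozenset({1, 2}): 8,   # assumed mirror of 0-3
}

def lanes(a, b):
    """Lane width of the direct link between two nodes (0 if none)."""
    return LINK_LANES.get(frozenset({a, b}), 0)

def is_fully_connected(nodes=NODES):
    """True if every pair of nodes is exactly one hop apart."""
    return all(lanes(a, b) > 0 for a, b in combinations(nodes, 2))

def coherent_lanes(node, nodes=NODES):
    """Coherent lanes a node spends on talking to the other nodes."""
    return sum(lanes(node, other) for other in nodes if other != node)

if __name__ == "__main__":
    print("one hop everywhere:", is_fully_connected())
    print("node 0-2 vs 0-3 width ratio:", lanes(0, 2) / lanes(0, 3))
    print("coherent lanes per node:", [coherent_lanes(n) for n in NODES])
```

In this model every node spends 48 coherent lanes, which is three x16 links, leaving the fourth link for I/O.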

While it looks like two Istanbuls bolted together, what we're looking at is the hard work of AMD's engineers. They invested quite a bit of time to make sure that this 12-piston muscle car does not spin its wheels all the time. Of course, if the road is wet (badly threaded software), that will still be the case. And that'll be the end of our car analogies... we promise :)


58 Comments


  • 564265425722557 - Monday, March 29, 2010 - link

    1. Why is the TDP of the 65W ACP Magny Cours the question mark? And are you sure the TDP of the 80W ACP ones 115W?

    2. The Intel systems have only 24GB ram against the 32GB ram on the 2S magny cours. That's why the 100GB database test favors the Magny cours by a large margin.
  • JohanAnandtech - Monday, March 29, 2010 - link

    AMD told us the TDP values of the Magny-Cours at 80 and 105W ACP. The TDP values of the Lower power versions were not disclosed yet.

    And as we disclosed on the benchmark config page, none of the benches uses more than 20 GB. The vApus Mark I uses about 19 GB. The SQL Server uses much less. While the SQL Server test has to scan through the complete index, it does not access the complete 100 GB of data. There is absolutely no advantage for the Opterons there. We checked.

    The fact that we spec the servers like that is a direct consequence of their memory channels (3 and 4). There is not much we can do about that.
  • Penti - Tuesday, March 30, 2010 - link

    How about 4P performance? It's cheap now and it's AMD's whole selling point. I guess you can get a 4P 48-core 128GB system for not that much. How would that compare to, say, a 2P Nehalem 12-core 92GB? Wouldn't they cost about the same? Will it still be competitive against 8-core 2P Nehalem-EX? And how about the 4P (like 6-core versions) Nehalem-EX? How about the 8-core versions of 6100 series Opterons?
  • elnexus - Wednesday, March 31, 2010 - link

    In answer to cost:

    Compare our 2P Xeon 5600-series Workstation :http://elnexus.com/products.aspx?line_id=15514
    with our 4P Opteron 6100-series Workstation: http://elnexus.com/products.aspx?line_id=15635

    (I hope this isn't condemned as advertising, since it is an attempt to answer a question about price vs performance.)

    Note how low priced the 6128 chip is (the default chip included in the base price).

    AMD, I think are running away from Intel if you factor in the price...
  • Penti - Wednesday, March 31, 2010 - link

    Thanks, I don't condemn it as advertising as this is a new platform so it's interesting and hard to get prices for complete systems yet. Basically 4P 8-core 6100-series opterons with 128GB DDR3 ECC REG cost as much as 2P six-core Xeon (Westmere EP) with 96GB DDR3 ECC REG. Mainly because you can use cheaper 4GB sticks and still get 128GB. And partly because there's no longer any markup for above >2P parts. I guess it accounts for something. Yeah, 6128 chip virtually don't cost nothing for being 4P compatible. Guess it helps AMD for a lot of workload scenarios. And since you can get 4P in 1U it's really nothing that speaks against it. Will be interesting to see what the Nehalem-EX can do though.
  • TitanusComp - Wednesday, April 6, 2011 - link

    You can really get a good idea by comparing these two products:

    48 Cores:
    http://www.titanuscomputers.com/A400-AMD-Workstati...

    24 Cores (Quad SLi Capable)
    http://www.titanuscomputers.com/X450-Intel-High-Pe...

    Now, things to consider, do you need CPU or GPU power?
  • duploxxx - Monday, March 29, 2010 - link

    To make the whole benchmark complete I think you should ask some AMD Opteron 6136 from AMD to get a full review.
  • duploxxx - Monday, March 29, 2010 - link

    and add the 56xx 4-core counterpart, of course
  • JohanAnandtech - Tuesday, March 30, 2010 - link

    We are working on it. Expect an update with new SKUs this month. I would say next week, but I would like to take some time to do some in depth analysis.
  • Hacp - Monday, March 29, 2010 - link

    Anand,
    I want to ask why are you biased against AMD? You should base your tests based on price. AMD is selling their 12 core for the price of an Intel 6 core. Compare apples to apples! Do a 12 core vs 6 core comparison and see who wins. Otherwise, you are doing a disservice.
