AMD Analyst Day Platform Announcements

by Derek Wilson on 6/2/2006 1:00 PM EST
40 Comments

  • MrKaz - Monday, June 05, 2006 - link

    Did you ask anyone at AMD whether someone is interested in building (or is going to build) an SQL accelerator, a CAD calculation accelerator, or even a multimedia accelerator?

    It would be nice to boost the performance of SQL by 2X, or even media encoding from minutes to seconds...
    Reply
  • DerekWilson - Tuesday, June 06, 2006 - link

    they are certainly talking heavily about the possibility of hardware like that, but no hardware designers have committed to building anything yet. Reply
  • IntelUser2000 - Sunday, June 04, 2006 - link

    quote:

    As with K8, K8L will have 3 ALUs (arithmetic logic units) and 3 AGUs (address generation units). Combined with cache enhancements and the new ability to reorder loads, K8L has a shot at outpacing Core in integer performance.


    No. Because Core Duo(Yonah) with inferior decoder configuration, inferior memory bandwidth(which won't matter a lot but will make slight difference) and platform, still manages to outperform the current K8's. The Pentium M, which is even worse than Core Duo(slightly) still manages to outperform the K8's in integer. Now put Core with integrated memory controller, and comparison will look like Core Duo against Athlon XP.

    Core microarchitecture will exceed K8's in general integer architecture, and at least equal in K8L's ability. Integer superiority is still gonna be there, K8L will be faster than Core in FP and SSE performance because of low latency integrated memory controller with lots more real-world bandwidth(well that depends on how AMD implements SSE, Intel may still have an advantage if AMD puts a poor implementation like they did with Athlon XP's SSE, or at least it looked poor).
    Reply
  • JarredWalton - Sunday, June 04, 2006 - link

    If ~33% of all instructions are Loads, and K8 pretty much totally lacks the ability to reorder Loads, adding that feature could substantially boost performance. It definitely "has a shot" at beating Core, but it may also fall short. Anyone making blanket statements one way or the other - i.e. it *will* beat Core, or it *won't* come close - needs to take a step back and check what they really know and what they are just assuming.

    At present, AMD is saying K8L is going to have the ability to reorder Loads. They might only do minor reordering, or they might go so far as to have something similar to Conroe's memory disambiguation. Given that AMD hasn't done a major update to K8 in over 3 years (no, the DDR2 controller and going dual core don't really count as major updates to the underlying architecture), K8L could be a lot of things. It might only match Core 2 Duo on a clock-for-clock basis; it might fall short; it might even come out ahead. Also, there has been no indication that Intel is seriously planning an on-die memory controller in the near future, probably to continue to protect their chipset market.

    Personally, I really hope AMD manages to basically match C2D performance, because runaway performance leads don't help the consumer. In the end, theoretical integer, FP, SSE, etc. performance isn't as important as real-world application performance. Right now, it's just too soon to declare a victor in the Core 2 Duo vs. K8L match-up. C2D vs. K8 is already pretty much a done deal, though, and there's no indication that AMD will be able to come out on top in that rivalry. K8L is their "counterattack", and that's the architecture that needs to compete with C2D.
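
    To make the "reordering Loads could help" argument concrete, here is a toy sketch, not a model of K8, K8L or Core. It assumes a single load port that can start one load per cycle, a fixed latency for every load, and made-up "ready" cycles at which each load's address becomes known. The function names and all numbers are hypothetical.

```python
# Toy model: one load port, one load may start per cycle, fixed latency.
# ready[i] is the (hypothetical) cycle at which load i's address is known.

def finish_in_order(ready, latency):
    """Loads must start in program order; return the cycle the last one completes."""
    start = -1
    finish = 0
    for r in ready:
        # A load cannot start before its address is known,
        # nor before the cycle after the previous load started.
        start = max(r, start + 1)
        finish = max(finish, start + latency)
    return finish

def finish_reordered(ready, latency):
    """Any load whose address is known may start; earliest-ready goes first."""
    cycle = 0
    finish = 0
    for r in sorted(ready):
        cycle = max(cycle, r)          # wait until this load is ready
        finish = max(finish, cycle + latency)
        cycle += 1                     # the port is busy for one cycle
    return finish

# The first load in program order is stuck waiting for its address until cycle 10.
ready = [10, 0, 0, 0]
print(finish_in_order(ready, latency=3))   # 16
print(finish_reordered(ready, latency=3))  # 13
```

    In this made-up case the three independent loads slip past the stalled one and the whole group finishes three cycles earlier. How much of that shows up in real code depends on how often loads are actually blocked like this, which is exactly the part nobody outside AMD knows yet.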
    Reply
  • IntelUser2000 - Sunday, June 04, 2006 - link

    quote:

    If ~33% of all instructions are Loads, and K8 pretty much totally lacks the ability to reorder Loads, adding that feature could substantially boost performance. It definitely "has a shot" at beating Core, but it may also fall short. Anyone making blanket statements one way or the other - i.e. it *will* beat Core, or it *won't* come close - needs to take a step back and check what they really know and what they are just assuming.


    It's easy to see the performance in integer against Core. Core has the ability to reorder loads, but Core Duo is in the same situation as K8: it doesn't really have the ability either. Other than that, on the basic block diagram, K8 is superior architecturally to Core Duo, yet integer performance is somewhat better on Core Duo. The difference probably goes deeper than that. One of the articles mentions that K7/K8 has a technique similar to Intel's micro-op fusion. It could be that Intel's is much better, etc. If a K8 with a substantially better microarchitecture (+ODMC) can't beat the integer performance of Core Duo, will K8L with basically the same microarchitecture (or maybe a worse one) beat Core?? It's simple to see it probably won't.
    Reply
  • DerekWilson - Tuesday, June 06, 2006 - link

    core duo can reorder loads as the Pentium M could reorder loads --

    http://anandtech.com/cpuchipsets/showdoc.aspx?i=27...

    Reply
  • MrKaz - Monday, June 05, 2006 - link

    P3 on steroids may beat the K7 on steroids in performance.
    But performance isn’t everything, or Intel employees would have been out of a job ever since the K7 came out and beat the P3 and P4. And Intel hasn't recovered yet!

    I didn’t see any presentation of Intel's new architecture, but I bet even the Hammer looks better than anything Intel will release.
    http://www.amd.com/us-en/assets/content_type/Downl...

    4MB cache, 128bit SSE that tells me nothing. Other than the P3 started with PC100 SDRAM, 256Kb cache and SSE and it's now at DDR2 667, 4MB cache and SSE4.
    Reply
  • Sceptor - Saturday, June 03, 2006 - link

    Finally a real interconnect that can be used for a serious co-processor...perhaps a physics co-pro not limited by the PCI bus would help smooth transition to more realistic games.

    Or a dedicated video co-pro to use with Cad or 3D Modeling programs...
    Reply
  • od4hs - Friday, June 02, 2006 - link

    http://images.anandtech.com/reviews/cpu/amd/analys...


    -> UK firm to unveil wall-socket PC

    The Jack PC thin client fits into a wall socket and is so energy-efficient it can get its power over Ethernet

    http://news.zdnet.co.uk/0,39020330,39272166,00.htm
    Reply
  • lopri - Friday, June 02, 2006 - link

    I totally agree that the "direct connect" is the most desirable way but I cannot help but think AMD is somewhat daydreaming. That is, what's showing in the slides seems way ahead of today's "practicality".

    I mean, we've had this PCI Express which has been strongly pushed by core logic vendors, but so far all we practically have are video cards. I sometimes think all these mobo makers pay more attention to the "aesthetic" point when they design PCI-E slots so the boards look prettier. (lol)

    If my understanding is correct, AMD will introduce a new type of slot, HTX, on motherboards. Will other technology/market follow? Or will it just give another chance to graphics card manufacturers to push us to buy new cards? On today's desktop boards, basically everything is "integrated", sans video. I know that a video card has its own core and frame buffer, and transfers data via Hyper Transport, but if a physics card can utilize the HTX, what stops a video card from connecting directly to CPU, without passing the core logic or system memory?

    I think this will also be closely related to the available bandwidth of HTX per CPU core (or cores), and I can't really think of any add-in board that'll prioritize the bandwidth other than video cards, (OK and the physics cards) even though the HTX will be an open standard. (look at the lazy/lame Creative)

    A very desirable case would be where storage (hard disks) can take advantage of this "direct" connection but then again there is a such thing called "memory", so my imagination stops there. (maybe solid-state/I-Ram type of storage can make use of the HTX? Then what's the use of memory? Taking care of I/O?) Talking about I/O, I just thought it'd be interesting to see keyboards/mice connect to CPU via HTX. (Sorry I couldn't resist)

    All in all, like the article says, this roadmap seems just too broad/ambiguous/futuristic. I'm not a CPU engineer so my thinking could be totally off, though. If so, please enlighten.

    lop

    Reply
  • peternelson - Saturday, June 03, 2006 - link


    High end pcie cards are available if you look for them

    eg Areca 8 sata II onto 8x pci express
    eg Myrinet 10 gigabit ethernet onto 8x pci express
    Plenty of other examples.

    Also, witness the high-end server boards: many now offer PCIe as an option alongside the former server standard, PCI-X.

    PCIE is here to stay and is a must for anyone interested in a performance system.

    There is a direct mapping of pcie onto Hypertransport.

    There is already fast networking available on an HTX card.
    Reply
  • lopri - Friday, June 02, 2006 - link

    Correction: ..video card.. transfers data via PCI Express..
    ;)
    Reply
  • saratoga - Friday, June 02, 2006 - link

    quote:

    At a lower level, we have a block diagram of the compute core for K8L CPUs. Again, this diagram is a bit oversimplified, but we can see a few key features of the architecture. On the FP side, the CPU is able to handle 2x128-bit floating point or SSE operations per clock. While this isn't quite as flexible as Intel's Core with its 3 SSE units, AMD's K8L will be able to handle 4 double precision floating point operations per clock. (Current K8 chips can only do 1x128/2x64-bit SSE instructions per clock.)


    I'm a little confused. IIRC Core2 can do 2x 128 bit operations, each of which can be an add or multiply, but only one of which can be a load. AMD is restricting the actual operations to just 1 add and 1 multiply, but is removing the restriction on loads? So they'll be better able to feed the vector units than Intel, but have less flexibility once they've loaded?

    That doesn't make a whole lot of sense to me. I'd think if their SSE implementation was less aggressive, they would not have added more load units to feed it. Has AMD confirmed that there is only 1 add unit and 1 mult unit? Or is this a case of Intel designing a nice backend and not providing the front end resources to keep it fed?
    Reply
  • mino - Friday, June 02, 2006 - link

    Well, you're kinda right and wrong at the same time:)

    However intel's C2 frontend(from L2 up) is far superior to AMD's. And was such since Banias. Also intel's backend(execution units) is now on par but only recently Yonah and older were inferior to AMD's brute force 3-issue backend.

    AMD has kinda ingeniously hidden poor backend by IMC however for streamed(desktop) pseudo-random loads intel's huge cache structures mitigated this so they are forced to improve frontend(hard to do) and do some backend optimizations(easy) on the way. Well, they kinda knew they will have to do this since the 90's, they have just chosen to implement IMC and cater to the core itself in the next iteration.

    On the 2load units - without them the maxFLOPS would be n, real one x. With them(load units are relatively simple and low power compared to FPU's) they've got maxFLOPS around 2n AND real achievable one(IMHO) in the 1.2x~1.5x range. Pretty good ROI for the one added load unit.
    Reply
  • saratoga - Saturday, June 03, 2006 - link

    quote:

    However intel's C2 frontend(from L2 up) is far superior to AMD's. And was such since Banias. Also intel's backend(execution units) is now on par but only recently Yonah and older were inferior to AMD's brute force 3-issue backend.


    Could you explain how?

    quote:

    On the 2load units - without them the maxFLOPS would be n, real one x.


    No it wouldn't. CPUs have registers, so the number of load units has nothing to do with FLOPs. You could have just one load unit and still sustain an arbitrary number of FLOPs, provided you didn't mind using the same registers over and over again, which I suppose could be the case if you're doing an iterative approximation of a value.

    quote:

    With them(load units are relatively simple and low power compared to FPU's) they've got maxFLOPS around 2n AND real achievable one(IMHO) in the 1.2x~1.5x range. Pretty good ROI for the one added load unit.


    I don't think loads count as FLOPs, even if you're loading things to be used in FP operations, so having more load units doesn't increase max FLOPs.
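
    To put numbers on the peak-versus-fed distinction, here is a rough sketch. Only the "2x128-bit units, 4 double precision operations per clock" figure comes from the article; the function names, the loads-per-clock values and the reuse factors are my own illustrative assumptions, not anything AMD or Intel has published.

```python
def peak_flops_per_clock(units, vector_bits, element_bits=64):
    """Peak FLOPs per clock: each unit works on vector_bits / element_bits values."""
    return units * (vector_bits // element_bits)

def sustained_flops_per_clock(peak, values_loaded_per_clock, flops_per_loaded_value):
    """The FP units can only be kept busy if enough operands arrive.

    flops_per_loaded_value models register reuse: how many operations
    are performed on each value before a fresh one has to be loaded.
    """
    feed_limit = values_loaded_per_clock * flops_per_loaded_value
    return min(peak, feed_limit)

peak = peak_flops_per_clock(units=2, vector_bits=128)   # 4, as in the article

# Streaming code, one operation per loaded double:
print(sustained_flops_per_clock(peak, 2, 1))  # one 128-bit load per clock  -> 2
print(sustained_flops_per_clock(peak, 4, 1))  # two 128-bit loads per clock -> 4

# Heavy register reuse, e.g. an iterative approximation:
print(sustained_flops_per_clock(peak, 2, 8))  # one load per clock is plenty -> 4
```

    So the peak number never changes with the number of load units, but whether real code gets anywhere near it depends on how much it reuses what is already in registers.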
    Reply
  • mino - Friday, June 02, 2006 - link

    Sory for the english, grammar wasn't my friend :) Reply
  • DigitalFreak - Friday, June 02, 2006 - link

    The smartest thing AMD ever did was create HyperTransport. There are so many cool uses for it! Intel, on the other hand, still insists on using their proprietary solutions. Reply
  • DerekWilson - Friday, June 02, 2006 - link

    HyperTransport was created by an open consortium.

    But you do have to remember that AMD implemented a proprietary coherent HT for use in SMP systems. They haven't always been open, even if their method was implemented on top of an open standard.

    I do agree that general use of HyperTransport makes I/O much easier on many levels, and was a very good move for AMD. And now that they are opening up cHT, some really cool things can happen -- if the industry is ready. :-)
    Reply
  • Viditor - Saturday, June 03, 2006 - link

    quote:

    HyperTransport was created by an open consortium

    Actually, it was created by AMD, it was developed by an open consortium.
    However coherent HT is still (at least until now) proprietary AMD...
    Reply
  • Viditor - Saturday, June 03, 2006 - link

    Doh! I need to read first, post second...already asked and answered. Sorry... Reply
  • HurleyBird - Friday, June 02, 2006 - link

    quote:

    HyperTransport was created by an open consortium.


    HyperTransport was created by AMD. The consortium was created afterwards to manage the standard.
    Reply
  • od4hs - Friday, June 02, 2006 - link

    AMD began developing the HyperTransport™ I/O link architecture in 1997.

    Pre-Consortium Versions of the Specification
    AMD has released these two pre-consortium documents which define two revisions of "LDT" (Lightning Data Transfer) as HT was known before the HT Consortium was formed.
    http://www.hypertransport.org/tech/tech_specs.cfm


    [2001]
    AMD has disclosed HyperTransport technology specifications under non-disclosure
    agreement (NDA) to over 170 companies interested in building products that incorporate
    this technology.
    Multiple partners have signed the license agreement for HyperTransport technology,
    including, among many others:

    Sun Microsystems, Cisco Systems, Broadcom,
    Texas Instruments, NVIDIA, Acer Labs,
    Hewlett-Packard, Schlumberger, Stargen,
    PLX Technology, Mellanox, FuturePlus,
    API Networks, Altera, LSI Logic,
    PMC-Sierra, Pericom, Transmeta


    AMD is releasing the specifications to an industry-supported non-profit trade association in the fall of 2001.
    The HyperTransport Consortium will manage and refine the specifications, and
    promote the adoption and deployment of HyperTransport technology. It is also expected
    to consist initially of a Technical Working Group and a Marketing Working Group.
    Subordinate task forces will do the work of the consortium. Anticipated technical task
    forces include:
    Protocol Task Force
    Connectivity Task Force
    Graphics Task Force
    Technology Task Force
    Power Management Task Force
    Information on joining the HyperTransport Technology Consortium can be found at
    this website: http://www.hypertransport.org

    HyperTransport Technology I/O Link (white paper), PDF: http://www.amd.com/us-en/assets/content_type/white...




    San Jose, Calif., July 24, 2001 -- A coalition of high-tech industry leaders today announced the formation of the HyperTransport™ Technology Consortium, a nonprofit corporation that supports the future development and adoption of AMD's HyperTransport I/O Link specification.

    [...] More than 180 companies throughout the computer and communications industries have been engaged with AMD in working with the HyperTransport technology
    hypertransport.org press release: http://www.hypertransport.org/consortium/cons_pres...
    Reply
  • peternelson - Friday, June 02, 2006 - link


    HT 1,2,and 3 are published standards.

    Direct Connect Architecture (DCA) 1.0 and 2.0 are published standards.

    HTX is a published standard.

    Some questions for you to ask the AMD engineers:

    I'm still interested to obtain pinouts of AM2 and F1207 sockets to establish how many HT links they can support.

    From 4x4 it looks like AM2 *MIGHT* support TWO HT links (one to the other processor, one to the tunnel chip).

    I note 4x4 is slated for 2006 launch.

    Hope to see those boards real soon ;-) I assume you can populate one socket and put the other proc in there later when you have more money ;-)

    I would like to see HTX appear on some 4x4 or AM2 boards but doubt it will happen.

    However, on the "acceleration technology" I would like 4x4 to support the so-called "socketfiller" type where you drop in a xilinx fpga onto the socket. That would give a cheap 1cpu + 1fpga system. Hopefully acceleration is not precluded just cos its not opteron and not 1207.

    Now thinking of opterons, I want to know the pinout of socket F. I want to count the HT link support built in to the socket. If its only 3 HT links that would force a socket change to do 4 links.

    What news on possible future socket change requirements? eg for ddr3 and HT3 speed?

    Can the Nvidia chipset for opteron be built onto an AM2 board?

    I would encourage many board makers to add HTX to their opteron boards (easy and worthwhile) because eg one example is the pathscale HTX cluster interconnect cards.

    I am interested in AMD terms for licensing of any proprietary tech for their cache coherency, or DCA2, any white papers on it or reference designs.

    Is the 4x4+ in 2007 only K8 quadcore or is it K8L quadcore?

    Will K8L be supported on AM2 socket?

    Please encourage AMD to publish web datasheets on AM2 as exist for their old sockets.
    Reply
  • saratoga - Friday, June 02, 2006 - link

    quote:

    Can the Nvidia chipset for opteron be built onto an AM2 board?


    I think any HT compliant chipset can be used with any HT compliant part. That's why Apple can use AMD's PCI-X bridge designed for Opteron processors on their older G5 systems. The chipset supports HT, that's all you need.

    I don't know if that changes with the new HT standards though.
    Reply
  • peternelson - Friday, June 02, 2006 - link

    *IF* an am2 socket can indeed support TWO HT links, then the SECOND processor could use its spare link to connect to yet another I/O interface chip/chipset.

    This would give opportunity for innovative 4x4 boards to add additional I/O, more pcie links, or an HTX slot.

    Please can we verify:

    How many links are available on AM2, how many links are available on FX62, and how many links are available on lower AM2 chips? I suspect the lower ones only have one HT link, which would make them unsuitable for 4x4 operation. Please confirm.
    Reply
  • Jellodyne - Friday, June 02, 2006 - link

    There are a few ways 4x4 could work with only the one HTT link in the socket.

    1. AMD could enable a second chip-to-chip HTT link using pins/lands on top of the cpu, or some sort of edge connector, with a pcb which bridges the two.

    or

    2. AMD could be splitting the HTT link into 2 8-bit links. One to the chipset, one to the 2nd processor. Heck, if the chipset is smart enough the leftover 8 bit link could go back to the chipset, resulting in the equivalent bandwidth between chipset and processors as a 'standard' dual opteron rig, just less between the processors. For desktops, 8 bit is probably enough.

    and of course if you're talking custom chipset, that leaves

    3. The chipset has dual CHT links, one to each processor, and acts like a traditional dual FSB chipset.


    I'd say #2 is pretty likely.
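
    For a rough sense of what option 2 would cost in bandwidth, here is a back-of-the-envelope sketch. The 1 GHz link clock is my assumption for an AM2-era part, it is not stated anywhere in this thread, and the function name is made up for illustration. HyperTransport transfers data on both clock edges, so a link moves width/8 bytes twice per clock in each direction.

```python
def ht_bandwidth_gb_s(width_bits, clock_mhz):
    """Per-direction bandwidth of a HyperTransport link in GB/s (10^9 bytes/s).

    Data moves on both clock edges (double data rate), so there are
    two transfers per clock, each carrying width_bits / 8 bytes.
    """
    transfers_per_second = clock_mhz * 1e6 * 2
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9

CLOCK_MHZ = 1000  # assumed link clock, not taken from the article

full_link = ht_bandwidth_gb_s(16, CLOCK_MHZ)  # one 16-bit link
half_link = ht_bandwidth_gb_s(8, CLOCK_MHZ)   # each half after an 8+8 split

print(full_link)  # 4.0 GB/s per direction
print(half_link)  # 2.0 GB/s per direction
```

    Under that assumption each 8-bit half would still carry about 2 GB/s each way, which is the basis for guessing that 8 bits is probably enough on the desktop.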
    Reply
  • Squidward - Friday, June 02, 2006 - link

    Whoever designs those slides should be fired or at least taught some color coordination. They hurt my eyes.

    Now dual slot - dual core mobos sound tasty but the price would be astronimical to configure a killer system. (looks at outdated Athlon 2500+ and sighs)

    Reply
  • Calin - Monday, June 05, 2006 - link

    (looks at outdated Duron 600 and cries) Reply
  • LoneWolf15 - Monday, June 05, 2006 - link

    Time to draw the L1 bridges shut and clock your way up, my friend.

    My Duron 600 made 1GHz when cooled right --it was cooler at 7.5 x 133 than at 10.0 x 100.

    And if you fry the chip, well...a used Duron, Thunderbird, or Palomino core is relatively inexpensive these days...
    Reply
  • Frallan - Monday, June 05, 2006 - link

    Well at least U guys have saved some money on the way of being outdated...

    *looks at outdated 3500+, 6800Gt@Ultra, 2*1Gb Ram and empty wallet and howls with pain*

    /F
    Reply
  • Squidward - Friday, June 02, 2006 - link

    umm, that should be astronomical

    the message is clear... my typing has failed!
    Reply
  • Hulk - Friday, June 02, 2006 - link

    :| Reply
  • PrinceGaz - Saturday, June 03, 2006 - link

    Conroe is already obsolete because K8L will grind it into the dirt. Anyone who buys a Core 2 Duo this year is wasting their money because AMD's K8L is better in every way. There's no point upgrading now unless you are stuck with a rubbish last-generation netburst processor like a Northwood or Prescott, because it's clear that K8L will totally annihilate Core 2 Duo and its successors. But if Intel fanbois want to waste their money on Core 2 Duo, that's fine. A little bit of competition from Intel before AMD strike their devastating counter-attack next year will ensure AMD don't cut corners in the K8L design.

    The best way to sum up the next year in CPUs is: Intel manage to gain a slight lead in the second half of 2006 and early 2007, but after that it will be AMD r0><0rs and Intel is teh su><0rs again!!!111

    P.S. The above is not meant to be taken entirely seriously ;) though I do believe from what we've seen that K8L should be a bit ahead of what Intel have next year, if nothing else because of its integrated memory controller.
    Reply
  • JumpingJack - Monday, February 05, 2007 - link

    quote:

    Conroe is already obsolete because K8L will grind it into the dirt.


    Really, where can I buy a K8L then??
    Reply
  • MrKaz - Monday, June 05, 2006 - link

    Tell me, what do Conroe, Woodcrest and Merom bring to the market that is new?

    Even the 5-year-old AMD Hammer architecture presentation looks better than Conroe, ..., ...
    http://www.amd.com/us-en/assets/content_type/Downl...
    Reply
  • Darth Farter - Friday, June 02, 2006 - link

    :| Reply
  • xTYBALTx - Friday, June 02, 2006 - link

    Great article, but like everyone else I am interested in how 4x4 will improve gaming performance. Guess we'll have to wait and see. Reply
  • Calin - Monday, June 05, 2006 - link

    More than two graphics cards will improve performance (over just two) only at the most insane resolutions (like 3000 by 2000 pixels). As for four cores, a use for them certainly exists, just not in the games available right now. Even two cores won't bring a big boost in game performance as of now. Who knows, maybe games ported from PS3 and Xbox 360 will use them (hopefully) Reply
  • Regs - Friday, June 02, 2006 - link

    I can kind of figure it out for myself but I wanted to make sure - what is cache coherency? Either way, Torrenza looks very interesting and very promising. Not only will AMD be delivering good competitive performance, but has a chance to unlock a whole new path in bringing a new standard through integrated computing.

    Hope you find out the goods for us Derek. Keep up the good work!
    Reply
  • Ryan Smith - Friday, June 02, 2006 - link

    To go with the shortest description, cache coherency is the catchall term for methods used to organize and inform multiple processors of cache changes in multiprocessor/multicore systems. Because both processors can work on the same data set at once, if one changes the data, the other needs to be intelligently informed about this, otherwise it will likely do something incorrectly. Reply
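
    A tiny sketch of the idea, for anyone who prefers code. This is a deliberately simplified write-through, write-invalidate toy with hypothetical class names, not a model of AMD's coherent HyperTransport or of any real protocol. Each cache keeps private copies of memory locations; the `coherent` flag controls whether a write tells the other caches to drop their copy.

```python
class Bus:
    """Shared memory plus the list of caches attached to it."""
    def __init__(self):
        self.memory = {}
        self.caches = []

class Cache:
    """A private cache for one processor (toy model, no capacity limit)."""
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, trust the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value, coherent=True):
        self.lines[addr] = value
        self.bus.memory[addr] = value          # write-through to memory
        if coherent:
            # Inform every other cache so it discards its stale copy.
            for other in self.bus.caches:
                if other is not self:
                    other.lines.pop(addr, None)

bus = Bus()
cpu0, cpu1 = Cache(bus), Cache(bus)
cpu0.read(0); cpu1.read(0)             # both processors now cache address 0

cpu0.write(0, 42, coherent=False)      # nobody tells cpu1
stale = cpu1.read(0)                   # cpu1 still sees the old value, 0

cpu0.write(0, 43)                      # coherent write invalidates cpu1's copy
fresh = cpu1.read(0)                   # cpu1 refetches and sees 43
print(stale, fresh)
```

    The stale read is the "do something incorrectly" case; the invalidation in the coherent write is the "intelligently informed" part.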
