A quiet AMD isn't a good AMD, but unfortunately it's the AMD we've been left with ever since Intel started becoming more competitive. In fact, the more Intel changed for the better, the more it seemed AMD changed for the worse. Intel started bringing out better products, talking more about its plans for the future, and making a whole lot of sense in just about everything it was doing and saying. Meanwhile, AMD just seemed to freeze up; we got no disclosures of upcoming products, no indication of direction, and very little sign of the hungry, competitive AMD that took Intel on and actually won a bout.
Enough complaining, poking, and prodding eventually got us a disclosure of AMD's Barcelona architecture last year. While we appreciated the depth of the information AMD gave us on Barcelona, the product itself was over a year away when we first heard about it. With no relief in sight for AMD other than a vicious price war, we began to worry not about Barcelona, but about what would come next. Would Barcelona have to tide us over for another three years until its replacement? How would AMD compete in the mobile and ultra-mobile spaces? How does the ATI acquisition fit into AMD's long-term microprocessor design philosophy? For that matter, what is AMD's long-term microprocessor design philosophy?
You see, we have had all of these questions answered by Intel without ever having to ask them. Once or twice a year, Intel gathers a few thousand of its closest friends in California at the Intel Developer Forum and lays out its future plans. We needed the same from AMD, and we weren't getting it.
When Intel was losing the product battle late in the Pentium 4's lifespan, it responded by being even more open about what it had coming down the pipeline. When everyone doubted what Intel's next-generation micro-architecture would do, Intel released performance numbers months before any actual product launch. AMD's strategy of remaining guarded and silent while it lost market share, confidence, and sales simply wasn't working. Luckily, there were a handful of individuals within AMD who saw the strength and benefit of what Intel was doing.
A former ATI employee by the name of Jon Carvill was a particularly staunch advocate of a more open AMD. He fought to bring us the sort of detail on Barcelona that we wanted, and he was largely responsible for giving us access to the individuals and information that made our article on AMD's Barcelona architecture possible. Carvill got it, and he waged a one-man war within AMD to make sure that others within the company did as well.
We thanked him dearly for helping us get the information we needed to be able to tell you all about Barcelona, but we wanted more, and he wanted to give more. He convinced the CTOs within AMD to come together and break the silence, he put them in the same room with us, and he told them to tell us just about everything. We learned about multiple new AMD architectures, new chipsets, new directions, and nearly everything we had hoped to hear about the company.
Going into these meetings, in a secluded location away from AMD's campus, we honestly had low expectations. We were quite down on AMD and its ability to compete, and while AMD's situation in the market hasn't changed, by finally talking to the key folks within the company we at least have a better idea of how it plans to compete.
Over the coming weeks and months we will be able to share this information with you; today we start with a better understanding of the ATI acquisition and its impact on AMD's future CPU direction. We will look at where AMD plans on taking its x86 processors and what it plans to do about the ultra mobile PC market. And of course, we will talk about Barcelona; while AMD has yet to let us benchmark its upcoming processors, we get the sense that our time alone with the CPU is near. We've got some additional details on Barcelona and its platform that we weren't aware of when we first covered the architecture.
The Road to Acquisition
The CPU and the GPU have been on a collision course for quite some time; although we often refer to the CPU as a general purpose processor and the GPU as a graphics processor, the reality is that they are both general purpose. The GPU is merely a highly parallel general purpose processor, one that happens to be especially well suited to certain applications such as 3D gaming. As the GPU became more programmable, and thus more general purpose, its highly parallel nature became interesting to new classes of applications: things like scientific computing are now within the realm of possibility for execution on a GPU.
Today's GPUs are vastly superior to what we currently call desktop CPUs when it comes to things like 3D gaming, video decoding and a lot of HPC applications. The problem is that a GPU is fairly worthless at sequential tasks, meaning that it relies on a fast host CPU to handle everything other than what it's good at.
ATI realized that, long term, as the GPU grows in power it will eventually be bottlenecked by its ability to do high speed sequential processing. In the same vein, the CPU will eventually be bottlenecked by its ability to do highly parallel processing. In other words, GPUs need CPUs and CPUs need GPUs for all workloads going forward. Neither approach alone will solve every problem or run every program optimally, but the combination of the two is what's necessary.
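To make the distinction concrete, here's a minimal C sketch of our own (purely illustrative, not AMD or ATI code). The first loop is data-parallel: every iteration is independent, so a GPU can hand each element to a different thread. The second has a loop-carried dependency, so no amount of parallel hardware helps; only a fast sequential core does.

```c
#include <stddef.h>

/* Data-parallel: iterations are independent, so a GPU could give
   each element to one of its thousands of lightweight threads. */
void scale_pixels(float *px, size_t n, float gain) {
    for (size_t i = 0; i < n; i++)
        px[i] *= gain;              /* no iteration depends on another */
}

/* Sequential: each iteration needs the previous result, so the work
   can't be split up; a high-clocked out-of-order core wins here. */
float smoothed_sum(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = acc * 0.99f + x[i];   /* loop-carried dependency */
    return acc;
}
```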
ATI originally came to this realization when looking at the possibilities for using its GPUs for general purpose computing (GPGPU), even before AMD began talking to ATI about a potential acquisition. ATI's Bob Drebin (formerly the CTO of ATI, now CTO of AMD's Graphics Products Group) told us that as he began looking at the potential for ATI's GPUs he realized that ATI needed a strong sequential processor.
We wanted to know how Bob's team solved the problem, because obviously it had to come up with a solution other than "get acquired by AMD". Bob didn't answer the question directly, but with a smile he explained that ATI tried to pair its GPUs with as low-power a sequential processor as possible, and always ran into the same problem: the sequential processor became the bottleneck. In the end, Bob believes the AMD acquisition made the most sense because the new company is able to combine a strong sequential processor with a strong parallel processor, eventually integrating the two on a single die. We really wanted to know what ATI's "plan B" was, had the acquisition not worked out, because we're guessing that it looks very similar to what NVIDIA has planned for its future.
To understand the point of combining a highly sequential processor like a modern desktop CPU with a highly parallel GPU, you have to look above and beyond the gaming market, into what AMD is calling stream computing. AMD perceives a number of potential applications that will require a very GPU-like architecture to solve, some of which we already see today. Simply watching an HD-DVD can eat up almost 100% of some of the fastest dual core processors available, while a GPU can perform the same decoding task with much better power efficiency. H.264 encoding and decoding are perfect examples of tasks better suited to highly parallel processor architectures than to today's desktop CPUs. But just as video processing is important, so are general productivity tasks, which is where we need the strengths of present-day out-of-order superscalar CPUs. A combined architecture that can excel at both types of applications is clearly a direction desktop CPUs need to take in order to remain relevant in future applications.
Future applications will easily combine stream computing with more sequential tasks, and we already see some of that now with web browsers. Imagine browsing a site like YouTube, but where all of the content is much higher quality and requires far more CPU (or GPU) power to play. You need the strengths of a high powered sequential processor to deal with everything other than the video playback, but you need the strengths of a GPU to actually handle the video. Examples like this one are overly simple, as it is very difficult to predict the direction software will take when given even more processing power; the point is that CPUs will inevitably have to merge with GPUs in order to handle these types of applications.
Merging CPUs and GPUs
AMD has already outlined the beginning of its CPU/GPU merger strategy in a little product called Fusion. While quite bullish on Fusion, AMD hasn't done a tremendous job of truly explaining its importance. Fusion, if you haven't heard, is AMD's first single chip CPU/GPU solution, due out sometime in the 2008 - 2009 timeframe. Widely expected to be two individual die on a single package, the first incarnation of Fusion will simply be a more power efficient version of a platform with integrated graphics. Integrated graphics is nothing to get excited about, but what follows as manufacturing technology and processor architectures evolve is really interesting.
AMD views the Fusion progression as three discrete steps:
Today we have a CPU and a GPU separated by an external bus, with the two being quite independent. The CPU does what it does best, and the GPU helps out wherever it can. Step 1 is what AMD calls integration, and it is what we can expect in the first Fusion product due out in 2008 - 2009. The CPU and GPU are simply placed next to one another, and there's only minor leverage of that relationship, mostly from a cost and power efficiency standpoint.
Step 2, which AMD calls optimization, gets a bit more interesting. Parts of the CPU can be shared by the GPU and vice versa. There's not a deep level of integration, but it begins the transition to the most important step - exploitation.
The final step in the evolution of Fusion is where the CPU and GPU are truly integrated, and the GPU is accessed by user mode instructions just like the CPU. You can expect to talk to the GPU via extensions to the x86 ISA, and the GPU will have its own register file (much like FP and integer units each have their own register files). Elements of the architecture will be shared, especially things like the cache hierarchy, which will prove useful when running applications that require both CPU and GPU power.
The GPU could easily be integrated onto a single die as a separate core behind a shared L3 cache. For example, if you look at the current Barcelona architecture you have four homogeneous cores behind a shared L3 cache and memory controller; swap one of those cores for a GPU core and you've got an idea of what one of these chips could look like. Instructions that can only be processed by the specialized core would be dispatched directly to it, while instructions better suited for the other cores would be sent to them. There would have to be a bit of front end logic to manage all of this, but it's easily done.
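To illustrate the idea (and only the idea), here's a hypothetical C sketch of what that front end logic amounts to. The instruction classes and mnemonics are invented for this illustration; none of this reflects actual AMD hardware or real x86 extensions.

```c
#include <stdio.h>

/* Hypothetical model of a heterogeneous front end: decoded
   instructions are steered to the core type that executes them.
   The "stream" class and its mnemonic are invented for this sketch. */
typedef enum { CLASS_SCALAR, CLASS_STREAM } insn_class;

typedef struct {
    insn_class  cls;
    const char *mnemonic;
} insn;

static void dispatch(const insn *i) {
    if (i->cls == CLASS_STREAM)
        printf("GPU core <- %s\n", i->mnemonic);  /* wide, parallel work */
    else
        printf("x86 core <- %s\n", i->mnemonic);  /* ordinary scalar work */
}

int main(void) {
    insn prog[] = {
        { CLASS_SCALAR, "add rax, rbx" },
        { CLASS_STREAM, "streammac v0, v1" },     /* hypothetical extension */
        { CLASS_SCALAR, "cmp rax, 0" },
    };
    for (size_t i = 0; i < sizeof prog / sizeof prog[0]; i++)
        dispatch(&prog[i]);
    return 0;
}
```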
AMD went as far as to say that the next stage in the development of x86 is the heterogeneous processing era. AMD's Phil Hester stated plainly that by the end of the decade, homogeneous multi-core becomes increasingly inadequate. The groundwork for the heterogeneous processing era (multiple cores on chip each with a different purpose) will be laid in the next 2 - 4 years, with true heterogeneous computing coming after 2010.
It's not just about combining the CPU and GPU as we know them; it's also about adding other types of processors and specialized hardware. You may remember that Intel made some similar statements a few IDFs ago, though not nearly as boldly as AMD, given that Intel doesn't have nearly as strong a graphics core to begin integrating. The xPUs AMD envisions could easily be things like H.264 encode/decode engines, network accelerators, virus scan accelerators, or any other type of accelerator deemed necessary for the target market.
In a sense, AMD's approach is much like that of the Cell processor, the difference being that with AMD's direction the end result would be a much more powerful sequential core combined with a true graphics core. Cell was very much ahead of its time, and by the time AMD and Intel can bring similar solutions to the market the entire industry will be far more ready for them than it was for Cell. Not to mention that treating everything as extensions to the x86 ISA makes programming far easier than with Cell.
Where does AMD's Torrenza come into play? If you'll remember, Torrenza is AMD's platform approach to dealing with different types of processors in an AMD system; the idea is that external accelerators could simply pop into an AMD processor socket and communicate with the rest of the system over HyperTransport. Torrenza actually works quite well with AMD's Fusion strategy, because it allows other accelerators (xPUs if you will) to be put in AMD systems without having to integrate the functionality on AMD's processor die. If there's enough demand in the market, AMD can eventually integrate the functionality on die, but until then Torrenza offers a low cost in-between solution.
AMD drew the parallel to the 287/387 floating point coprocessor socket that was present on 286/386 motherboards. Only around 2 - 3% of 286 owners bought a 287 FPU, while around 10 - 20% of 386 owners bought a 387 FPU; when the 486 was designed it simply made sense to integrate the functionality of the FPU into all models because the demand from users and developers was there. Torrenza would allow the same sort of migration to occur from external socket to eventual die integration if it makes sense, for any sort of processor.
AMD in Consumer Electronics
The potential of Fusion extends far beyond the PC space and into the embedded space. If you can imagine a very low power, low profile Fusion CPU, you can easily see it being used in not only PCs but consumer electronics devices as well. The benefit is that your CE devices could run the same applications as your PC devices, truly encouraging and enabling convergence and cohabitation between CE and PC devices.
Despite both sides attempting to point out how they are different, AMD and Intel actually have very similar views on where the microprocessor industry is headed. Both companies have stated to us that they have no desire to engage in "core wars"; in other words, we won't see a race to keep adding cores. The explanation is the same one that applied to the GHz race: if you scale exclusively in one direction (clock speed or number of cores), you will eventually run into the same power wall. The true path to performance is a combination of increasing instruction level parallelism, clock speed, and core count in line with the demands of the software you're trying to run.
AMD has been a bit more forthcoming than Intel in this respect, indicating that it doesn't believe there's a clear sweet spot, at least for desktop CPUs. AMD doesn't believe there's enough data to conclude whether 3, 4, 6 or 8 cores is the ideal number for desktop processors. In our testing of Intel's V8 platform, an 8-core platform targeted at the high end desktop, we found it extremely difficult to find high end desktop applications that benefit from 8 cores over 4. Our instincts tell us that for mainstream desktops, 3 - 4 general purpose x86 cores is the near term target that makes sense, and the number of cores needed could potentially be lowered by adding specialized hardware (e.g. an H.264 encode/decode core); the quick Amdahl's law calculation below shows why more cores alone hit diminishing returns.
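To put numbers on it: if a fraction p of a workload can be parallelized, Amdahl's law puts the speedup on n cores at 1/((1-p) + p/n). A quick sketch in C, using assumed (not measured) parallel fractions:

```c
#include <stdio.h>

/* Amdahl's law: speedup on n cores with parallel fraction p.
   The fractions below are illustrative assumptions, not data. */
static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    const double fractions[] = { 0.50, 0.75, 0.90 };
    for (int i = 0; i < 3; i++) {
        double p = fractions[i];
        printf("p=%.2f: 4 cores -> %.2fx, 8 cores -> %.2fx\n",
               p, speedup(p, 4), speedup(p, 8));
    }
    return 0;
}
```

Even at p = 0.75, doubling from 4 cores (2.29x) to 8 (2.91x) buys barely 27% more performance, which is consistent with how rarely desktop applications reward the extra four cores.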
What's particularly interesting is that many of Intel's goals for the future of its x86 processors are in line with what AMD has planned. For the past couple of IDFs Intel has been talking about bringing to market a <0.5W x86 core that can be used for devices somewhere in size and complexity between a cell phone and a UMPC (e.g. iPhone). Intel has committed to delivering such a core, called Silverthorne, in 2008, based around a new micro-architecture designed for these ultra low power environments.
AMD confirmed that it too envisions ultra low power x86 cores for use in consumer electronics devices, areas where ARM or other specialized cores are commonly used. AMD also recognizes that it can't address this market by simply reducing the clock speed of its current processors, and thus it mentioned that it is working on a separate micro-architecture to address these ultra low power markets. AMD didn't attach any timeframe or roadmap to its plans, but knowing what we know about Fusion's debut we'd expect a lower power version targeted at UMPC and CE markets to follow.
Why even think about bringing x86 cores to CE devices like digital TVs or smartphones? AMD offered one clear motivation: the software stack that will run on these devices is going to get more complex. Applications on TVs, cell phones and other CE devices will grow complex enough to require faster processors. Combine that with the fact that software developers don't want to target multiple processor architectures, and using x86 as the common platform between CE and PC software creates an environment where the same applications and content can be available across any device. The goal of PC/CE convergence is to give users access to any content, on any device, anywhere - and if all the devices you're trying to access content and programs on happen to be x86, the process becomes much easier.
Why is a new core necessary? Although x86 can be applied to virtually any market segment, a particular core design is only useful across roughly an order of magnitude of power. For example, AMD's current desktop cores can easily be scaled up or down to hit TDPs in the 10W - 100W range, but they would not be good for hitting something in the sub-1W range. AMD can address the sub-1W market, but it will require a different core from the one it addresses the rest of the market with. This philosophy is akin to what Intel discovered with Centrino: to succeed in the mobile market, you need a mobile specific design, and to succeed in the ultra mobile and handtop markets, you need an ultra mobile/handtop specific processor design as well. Both AMD and Intel realize this, and now both companies have publicly stated that they are doing something about it.
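A back-of-the-envelope calculation shows why. Dynamic power scales roughly with C·V²·f, so even aggressive voltage and frequency scaling only buys so much; the numbers below are our own illustrative assumptions, not AMD figures.

```c
#include <stdio.h>

/* Dynamic power is roughly proportional to C * V^2 * f.  The
   voltages and clocks here are illustrative guesses, not AMD data. */
int main(void) {
    double base_v = 1.4, base_f = 2600.0;   /* a desktop-class core: V, MHz */
    double low_v  = 0.9, low_f  = 400.0;    /* scaled down as far as practical */

    double ratio = (low_v * low_v * low_f) / (base_v * base_v * base_f);
    printf("dynamic power drops to %.1f%% of the original\n", ratio * 100.0);
    /* ~6.4%: a 65W core only gets to ~4W this way, and leakage in
       desktop-sized structures doesn't scale away at all -- hence
       the need for a purpose-built ultra low power core. */
    return 0;
}
```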
K10: What's in a name?
There's been some confusion over codenames when it comes to what we should call AMD's next-generation micro-architecture. Originally it was referred to by much of the press (and some at AMD) as K8L, and more recently AMD took the stance that K8L was made up by the press and that K10 is the actual name of its next-generation micro-architecture. Lately we've been calling it Barcelona, as that is the codename attached to the first incarnation of AMD's next-generation micro-architecture, destined for the server market. The desktop versions we've been calling Agena (quad-core), Kuma (dual-core) and Agena FX (the Socket-1207 quad-core version), once again because those are the product specific codenames listed on AMD's roadmaps.
But when we talk about architecture, is Barcelona based on K8L, K10, or is there even a proper name for what we're talking about? To find out we went straight to the source, AMD's CTO Phil Hester, and asked him to set the record straight. According to Hester, K10 was never used internally, despite some AMD representatives using it in reference to Barcelona. By the same measure, K8L does not refer to the micro-architecture behind Barcelona. It sounds like neither K8L nor K10 is correct when referring to AMD's next-generation architecture, so we'll have to continue to use Agena/Kuma/Barcelona in their place.
What happened after K8?
While we're talking about names: there was a project after the K8 that, for various reasons, wasn't called K9. Undoubtedly it had an internal name, but for now we'll just call it the first planned successor to the K8. That successor was eventually scrapped, but the question is how far into its development AMD was before the plug was pulled. According to Phil Hester, the project was still in its concept phase when it was canceled; approximately 6 months of time were invested into it.
So what was the reason for pulling the plug? Apparently the design was massively parallel, built for heavily multithreaded applications. AMD overestimated the pace of the transition to multithreaded software and made significant sacrifices to single threaded performance with this design. Just as the clock speed race sent Intel straight into a power wall, AMD's massively multithreaded design ran into power consumption issues of its own: the chip would have had tremendous power consumption, largely wasted on anything but highly parallel workloads.
The nail in the coffin of AMD's ill-fated project was its support for FB-DIMMs. AMD realized that Fully Buffered DIMM was not going to come down in cost quickly enough to justify tying its next microprocessor design to it, and eventually settled on unbuffered and registered DDR2 instead.
Without a doubt, AMD made the right decision in scrapping this project, but it sounds like AMD lost about half a year on it. Given that the first K8 was introduced back in 2003, one canceled project doesn't explain why we're here in 2007 with no significant update to the K8's micro-architecture. We couldn't get a straight answer from AMD as to why Barcelona didn't come earlier, but there are a number of possibilities to consider.
Barcelona is AMD's first native quad-core design, which is more complicated than simply sticking two independent dual core die on the same package. AMD committed the cardinal sin in microprocessor design by executing two very complicated transitions at the same time. Not only did AMD build its first native quad-core design with Barcelona, but it also made significant changes to the architecture of each of its cores.
Intel's Mooly Eden, the father of Centrino, once imparted some very important advice to us. He stated plainly that when designing a microprocessor you can change the architecture, or you can change the manufacturing process, but don't do both at the same time. AMD has already started its 65nm transition with its current generation parts, so the comparison isn't totally accurate, but the premise of Mooly's warning still applies: do too much at the same time and you will run into problems, usually resulting in delays.
There's also the idea that, coming off a significant technology lead, many within AMD grew complacent, contributing to a less hungry company as a whole. We're getting the impression that some major changes are happening within AMD, especially given its abysmal Q1 earnings (losing $611M in a quarter tends to do that to a company). While AMD appeared to be in a state of shock after Intel's Core 2 launch last year, the boat has finally started to turn, and the company we'll see over the next 6 - 12 months should be quite different.
New Details on Barcelona Emerge
If you've been following AMD's latest roadmaps then you'll know there are a couple of new sockets on the way. While AMD's next-generation CPUs will work in current Socket-AM2 and Socket-1207 motherboards, a new class of boards will launch with support for Socket-AM2+ and Socket-1207+. Inevitably the question you will ask yourself is: what does the + get you?
The pinout of Socket-AM2 and Socket-AM2+ is identical, and likewise for Socket-1207 and 1207+, so the same Agena or Barcelona chip will work in both versions of its socket; this is how AMD is able to guarantee full backwards compatibility with current AM2 and 1207 motherboards. If you do buy a new motherboard that uses either Socket-AM2+ or 1207+, you will get some additional functionality.
Current platforms only support HyperTransport 2.0, while the new + platforms will enable HT3.0 which brings faster link speeds and greater bandwidth. For desktops, a faster HT link won't really change performance, but in multi-socket workstations and servers there will be a benefit.
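For a rough sense of scale, the arithmetic below uses the commonly quoted maximum link clocks for each revision (1.4GHz for HT 2.0, 2.6GHz for HT 3.0) over a standard 16-bit, double-data-rate link; treat the exact clocks as our assumption rather than AMD's platform spec.

```c
#include <stdio.h>

/* Bandwidth of a 16-bit HyperTransport link, per direction:
   clock * 2 (double data rate) * 2 bytes per transfer. */
static double gb_per_s(double clock_ghz) {
    return clock_ghz * 2.0 * 2.0;
}

int main(void) {
    printf("HT 2.0 @ 1.4GHz: %.1f GB/s per direction, %.1f GB/s total\n",
           gb_per_s(1.4), 2.0 * gb_per_s(1.4));
    printf("HT 3.0 @ 2.6GHz: %.1f GB/s per direction, %.1f GB/s total\n",
           gb_per_s(2.6), 2.0 * gb_per_s(2.6));
    return 0;
}
```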
The more tangible feature is the ability to support split voltage planes. As we mentioned in our preview of AMD's Barcelona architecture, the CPU cores and the Northbridge can operate at different voltages and frequencies. In order to enable this functionality, you need motherboard support, thus if you want the power benefits of having the Northbridge and CPU cores on independent power planes you need an AM2+ or 1207+ motherboard.
It's not all about saving power with split voltage planes; there's also a performance benefit to going AM2+/1207+. When the Northbridge is placed on its own power plane, the motherboard can apply more voltage to it than to the CPU cores and run it at a slightly higher frequency, on the order of a 200 - 400MHz increase. The Northbridge is an extremely low power part of the CPU die, so an increase in voltage/clock frequency results in only a minor increase in TDP, but it can drive a disproportionately large increase in performance.
With AM2+/1207+ systems, the Northbridge runs at a higher frequency and thus the memory controller sees slightly lower latencies. The shared L3 cache also operates on the same power plane as the Northbridge, reducing L3 cache latency as well. AMD expects the overall performance advantage by going with AM2+/1207+ to be on the order of 3 - 10% over current motherboards.
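If the L3 runs at the Northbridge clock and its latency is a fixed number of NB cycles, the math is straightforward; the cycle count below is purely hypothetical, chosen only to show the shape of the effect.

```c
#include <stdio.h>

/* Hypothetical: L3 latency fixed at 38 NB cycles, so wall-clock
   latency falls as the NB clock rises on AM2+/1207+ boards. */
int main(void) {
    const double l3_cycles = 38.0;          /* hypothetical cycle count */
    const double nb_ghz[]  = { 2.0, 2.4 };  /* base vs +400MHz NB clock */

    for (int i = 0; i < 2; i++)
        printf("NB @ %.1fGHz: L3 latency %.1f ns\n",
               nb_ghz[i], l3_cycles / nb_ghz[i]);
    /* 19.0ns -> 15.8ns, a ~17% cut in L3 latency, which plausibly
       feeds the 3-10% application-level gains AMD quotes. */
    return 0;
}
```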
While your current motherboards will work with AMD's forthcoming CPUs, you'll get better performance out of upcoming Socket-AM2+ and Socket-1207+ platforms. AMD does plan on supporting both AM2 and 1207 into 2009, so you can expect a continued upgrade path for your AMD platforms well after Agena/Barcelona.
Barcelona Demos and Motherboards
Much to our dismay, and definitely against our recommendations, AMD will not follow in Intel's footsteps and let us do a performance preview of Agena or Barcelona. In fact, AMD wouldn't even tell us what clock speeds its demo systems were running at. While we cautioned AMD that a lack of disclosure at this point would only reinforce the idea that it is lagging far behind Intel, AMD's counterpoint does have some validity: its reasoning for not disclosing more information today is that it doesn't want to show all of its cards up front and give Intel the opportunity to react. We still don't believe it's the right decision, and we can't help but suspect that the real reason for not disclosing performance today is that performance isn't where it needs to be, but only AMD knows for sure at this point.
In order to combat worries that Barcelona is fundamentally broken, AMD did give us a couple of live demos of an 8-core QuadFX system and a 4-core Socket-AM2+ system. AMD ran Cinebench as well as Nero Recode on the systems, but it did not let us measure performance on either. Both systems worked fine; they didn't get too hot and they didn't crash.
Undoubtedly Agena and Agena FX work. We suspect that clock speeds aren't quite as high as they need to be but we don't doubt that AMD can get there by its scheduled release sometime in the second half of this year.
AMD also let us get up close and personal with the motherboards used in these systems, but we can't disclose details about the chipsets used just yet. Keep in mind that what you're looking at is AMD's next-generation desktop chipset solution.
Hammerhead is AMD's Socket-AM2+ reference board, used in the quad-core Agena system above:
Up and running
The Hammerhead motherboard
All four cores, loaded and running
Socket-AM2+
Wahoo is AMD's QuadFX Socket-1207+ reference board, used in the eight-core Agena FX system:
Quad core per socket x two sockets
All eight cores, locked and loaded
The Wahoo motherboard
Socket-1207+
And here's the man who made sure we could see these demos: AMD's Ian McNaughton. He's also the guy who prevented us from running benchmarks and hid the Cinebench scores from us.
Manufacturing Roadmap
AMD finished things off with a brief update on its manufacturing. By the middle of this year AMD's Fab 36 will be completely transitioned over to 65nm, which is just in time for Barcelona to ramp up production.
By the middle of 2008, AMD plans on bringing 45nm to market, approximately 6 months after Intel does. Fab 36 will continue to be AMD's most advanced fab, with 45nm parts coming out of it starting in 2008, and AMD expects to move the fab to 32nm by 2010.
AMD showed off the same 45nm SRAM test vehicle we saw over a year ago in Dresden, which is a bit bothersome. We expected to see more than what we had already seen, but it could be that AMD continues to be a bit more guarded than we'd like; either that or functional 45nm CPU silicon just isn't yielding yet.
Final Words
Needless to say, there's more to come and this is just the beginning. The fact that we can say this about AMD is an absolute shock to us, and possibly to you as well. For the longest time it seemed like the only CPU articles we'd write were either disappointing AMD product launches or exciting new Intel announcements. AMD is changing, arguably later than we'd like, but at least it's happening.
For a while we had lost confidence in AMD, like many of you had as well, and although AMD's position in the market hasn't changed we are more confident now that it can actually bounce back from this. Intel seemed to have the perfect roadmap with Conroe, Penryn and Nehalem all lined up back to back, and we saw little room for AMD to compete. Now, coming away from these meetings, we do believe that AMD may have a fighting chance. Over the coming months you'll begin to see why; it won't be an easy battle, but it will be one that will be fought with more than just price.
AMD's Fusion strategy looks to be an even stronger part of its future plans if Phil Hester's prediction of a heterogeneous processing era comes true. While Intel has managed to deliver a much stronger CPU roadmap, we don't have much of an understanding of its long-term answer to Fusion. AMD has very much been a leader in areas such as the move to 64-bit and the on-die memory controller, and we may now see it take the same leadership role in integrating the CPU and GPU. The integrated CPU/GPU, taken to what AMD called the exploitation stage, has the potential to really change the CPU as we know it. We do know that Intel has a response; we're just not clear on exactly what it is.
That being said, there's a lot AMD has to do in the near term before the ATI acquisition can pay off. Barcelona is still at least a quarter away, we have no idea how it will actually perform, and AMD isn't giving us any indication. Despite a relatively weak introduction for Intel's latest Centrino platform, AMD still doesn't have a good competitor to it; while Fusion will give AMD a unique selling point in the mobile market, the first Fusion core is still well over a year away. The same worries we've had about AMD are still there; while we now know that AMD has some truly wonderful things planned for its future, we worry how great a toll the interim will take on the company.
It is often said that what doesn't kill you makes you stronger; despite losing $611 million last quarter, and not winning a single performance benchmark since Intel launched its Core 2 processors, AMD is not dead. Market share is diminished and morale is low, but it may just be possible for AMD to come back from this stronger than ever. We're not exactly sure how AMD has lasted through all of this, but if it can pull through, we may once again have two very competitive manufacturers in the CPU industry.