We just got off the phone with Nick Knupffer of Intel, who confirmed something that has long been speculated upon: the fate of Larrabee. As of today, the first Larrabee chip’s retail release has been canceled. This means that Intel will not be releasing a Larrabee video card or a Larrabee HPC/GPGPU compute part.

The Larrabee project itself has not been canceled, however, and Intel is still hard at work developing their first entirely in-house discrete GPU. The first Larrabee chip (which, for lack of an official name, we’re going to be calling Larrabee Prime) will be used for the R&D of future Larrabee chips in the form of development kits for internal and external use.

The big question of course is “why?” Officially, the reason why Larrabee Prime was scrubbed was that both the hardware and the software were behind schedule. Intel has left the finer details up to speculation in true Intel fashion, but it has been widely rumored in the last few months that Larrabee Prime has not been performing as well as Intel had been expecting it to, which is consistent with the chip being behind schedule.

Bear in mind that Larrabee Prime’s launch was originally scheduled to be in the 2009-2010 timeframe, so Intel has already missed the first year of their launch window. Even with TSMC’s 40nm problems, Intel would have been launching after NVIDIA’s Fermi and AMD’s Cypress, if not after Cypress’ 2010 successor too. If the chip was underperforming, then the time element would only make things worse for Intel, as they would be setting up Larrabee Prime against successively more powerful products from NVIDIA and AMD.

The software side leaves us a bit more curious, as Intel normally has a strong track record here. Their x86 compiler technology is second to none, and as Larrabee Prime is x86 based, this would have left them in a good starting position for software development. What we’re left wondering is whether the software setback was for overall HPC/GPGPU use, or if it was for graphics. Certainly the harder part of Larrabee Prime’s software development would be the need to write graphics drivers from scratch that were capable of harnessing the chip as a video card, taking into consideration the need to support older APIs such as DX9 that make implicit assumptions about the layout of the hardware. Could it be that Intel couldn’t get Larrabee Prime working as a video card? That’s a big question that is going to hang over Intel’s head right up to the day that they finally launch a Larrabee video card.

Ultimately when we took our first look at Larrabee Prime’s architecture, there were 3 things that we believed could go wrong: manufacturing/yield problems, performance problems, and driver problems. Based on what Intel has said, we can’t write off any of those scenarios. Larrabee Prime is certainly suffering from something that can be classified as driver problems, and it may very well be suffering from both manufacturing and performance problems too.

To Intel’s credit, even if Larrabee Prime will never see the light of day as a retail product, it has been turning in some impressive numbers at trade shows. At SC09 last month, Intel demonstrated Larrabee Prime running the SGEMM HPC benchmark at 1 TeraFLOP, a notable accomplishment as the actual performance of any GPU is usually a fraction of its theoretical performance. 1TF is close to the theoretical performance of NVIDIA’s GT200 and AMD’s RV770 chips, so Larrabee was no slouch. But then again, its competition would not be GT200 and RV770; it would be Fermi and Cypress.
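The gap between sustained and theoretical throughput can be made concrete with a little arithmetic. What follows is a minimal Python sketch, not Intel's benchmark code: it uses the conventional operation count for a matrix multiply (2 x M x N x K), and the function names, matrix size, timing, and peak figure are all hypothetical values chosen only for illustration.

```python
def sgemm_flops(m: int, n: int, k: int) -> int:
    """Conventional FLOP count for C = A @ B, with A of shape (m, k) and B of shape (k, n).

    Each of the m * n outputs is counted as k multiplies plus k adds.
    """
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Sustained throughput in TFLOPS for one multiply that took `seconds`."""
    return sgemm_flops(m, n, k) / seconds / 1e12


def efficiency(achieved: float, peak: float) -> float:
    """Fraction of theoretical peak that was actually delivered."""
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; none come from Intel's demo.
    size = 8192      # square matrices, size x size
    elapsed = 1.1    # seconds for one multiply (assumed)
    peak = 2.0       # theoretical peak in TFLOPS (assumed)
    sustained = achieved_tflops(size, size, size, elapsed)
    print(f"sustained: {sustained:.2f} TFLOPS, "
          f"{efficiency(sustained, peak):.0%} of peak")
```

The point of separating the operation count from the timing is that a sustained figure like 1 TeraFLOP only means something next to the chip's theoretical peak, which is the comparison the paragraph above is making.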

Next, this brings us to the future of Larrabee. Larrabee Prime may be canceled, but the Larrabee project is not. As Intel puts it, Larrabee is a “complex multi-year project” and development will be continuing. Intel still wants a piece of the HPC/GPGPU pie (lest NVIDIA and AMD get it all to themselves) and they still want into the video card space given the collision between those markets. For Intel, their plans have just been delayed.


The Larrabee architecture lives on

For the immediate future, as we mentioned earlier, Larrabee Prime is still going to be used by Intel for R&D purposes, as a software development platform. This is a very good use of the hardware (however troubled it may be) as it allows Intel to bootstrap the software side of Larrabee so that developers can get started programming for real hardware while Intel works on the next iteration of Larrabee. Much like how NVIDIA and AMD sample their video cards months ahead of time to game developers, we expect that Larrabee Prime SDKs would be limited to Intel’s closest software partners, so don’t expect to see much, if anything, leak about Larrabee Prime once chips start leaving Intel’s hands, or to see extensive software development initially. If this is the case, widespread Larrabee software development will still not start until Intel ships the next iteration of Larrabee.

We should know more about the Larrabee situation next year, as Intel is already planning on an announcement at some point in 2010. Our best guess is that Intel will announce the next Larrabee chip at that time, with a product release in 2011 or 2012. Much of this will depend on what the hardware problem was and what process node Intel wants to use. If Intel just needs the ability to pack more cores onto a Larrabee chip then 2011 is a reasonable target, otherwise if there’s a more fundamental issue then 2012 is more likely. This lines up with the process nodes for those years: if they go for 2011 they hit the 2nd year of their 32nm process, otherwise if they launch in 2012 they would be able to launch it as one of the first products on the 22nm process.

For that matter, since the Larrabee project was not killed, it’s a safe assumption that any future Larrabee chips are going to be based on the same architectural design. The vibe from Intel is that the problem is Larrabee Prime and not the Larrabee architecture itself. The idea of an x86 many-core GPU is still alive and well.


On-Chip GMA-based GPUs: Still On Schedule For 2010

Finally, there’s the matter of Intel’s competition. For AMD and NVIDIA, this is just about the best possible announcement they could hope for. On the video card front it means they won’t be facing any new competitors through 2010 and most of 2011. That doesn’t mean that Intel isn’t going to be a challenge for them – Intel is still launching Clarkdale and Arrandale with on-chip GPUs next year – but they won’t be facing competition at the high-end too. For NVIDIA in particular, this means that Fermi has a clear shot at the HPC/GPGPU space without competition from Intel, which is exactly the kind of break NVIDIA needed since Fermi is running late.

71 Comments

  • AnandThenMan - Friday, December 04, 2009

    Which is quite surprising in retrospect. When AMD purchased ATI, it was heralded by many as one of the worst business deals of all time. Too expensive, ATI tech is not good enough, too much money to pay just to be able to offer a complete platform. (many believed AMD would outright give up doing high end discrete parts)

    Yes, Intel will be first with a CPU/GPU hybrid, but marrying such a terrible GPU to an excellent CPU yields a very unbalanced and strange piece of silicon. Sure, Intel could potentially improve their GPU, but how many years have we been hearing that was going to happen? Performance is still incredibly lackluster. Wasn't that the whole idea of Larrabee in the first place? Get close to the class-leading performance, or at least in the same universe.

    Ironic how AMD ends up being the only one out there with the tech to make a balanced Fusion type product.
  • mutarasector - Saturday, December 12, 2009

    "Which is quite surprising in retrospect. When AMD purchased ATI, it was heralded by many as one of the worst business deals of all time. Too expensive, ATI tech is not good enough, too much money to pay just to be able to offer a complete platform"

    I remember the criticisms of AMD's acquisition of ATI very well. At the time, I too was a bit concerned - not because it was a bad idea for AMD to buy ATI, but it was a *bad time* to do so considering the legal entanglement with their suit against Intel. While AMD needed to become a complete platform vendor, the expense of chewing two big bites (acquisition + Intel suit) darn near choked AMD.

    Strangely enough, the economy tanking may have actually worked to AMD's advantage with regard to the ATI acquisition because it forced AMD to focus on the mid level and low end market segments well ahead of Intel -- essentially beating a 'tactical retreat' and digging in there. ATI, the very thing that almost *choked* AMD, is now the very corporate asset that is keeping AMD competitive with just enough cash flow/revenue stream to be able to withstand the losses AMD has had to suffer for as long as they have. Roll the clock ahead a few years, and a nice healthy settlement with Intel, and now there is hope for AMD to actually be able to work back to being competitive in the high-end market segment again - especially with the in-house manufacture limitation *gone* (allowing AMD to go completely fabless).
  • Technium - Saturday, December 05, 2009

    Integrating Intel's GPU into Sandy Bridge is aimed mostly towards the low end (very low end) segment where FPS in Crysis are meaningless. Practically all the enterprise market and quite a lot of the retail market (desktop/laptop) can settle for a cheap GPU that can do anything but play new games, takes zero space and draws very little power. This will also be the first time an integrated GPU is manufactured on the same process (32nm) as the CPU. Sandy Bridge's successor "Ivy Bridge" will be a 22nm part with even lower power and better performance.
    My work PC is a laptop with Intel graphics and it does the job - it connects OK to various screens and projectors.
    The rant about the performance of the integrated GPUs is ridiculous - for proper game play you must buy a 100W or more GPU. You'll never ever get the same performance from a 5W part during the same generation. Since the game companies always aim towards the mid-high end segment, my statement holds.
  • eddieroolz - Sunday, December 06, 2009

    I disagree with your assertion that IGP performance doesn't matter in the low end segment.

    My laptop, a GMA4500 HP, struggles with heavy Flash. The reason partially lies in the weak laptop CPU, but a lot of the lag does have to do with the crappy Intel IGP.

    Elsewhere, we have the older X3100-based Macbooks which struggle even to play YouTube in standard definition, or run a Java game at full speed. Again, those things are slowed down by the Intel solutions, as on an Ion-based system it wouldn't happen.

    HD-movies also will benefit from beefier graphical solutions - just look at the difference between Ion and GMA4500.
  • rs1 - Sunday, December 06, 2009

    Exactly, gaming performance is no longer the only reason to want a decent GPU in your system. As Blu-ray drives and HD content continue to become more widespread, and as more applications like Flash start going the GPU-accelerated route, the performance of the GPU is going to become more important, even for users who never play a demanding 3D game.

    As I said above, the failure of Larrabee leaves AMD in a relatively strong position for the next couple of years. The only question is whether or not they'll be able to execute on it.
  • AnandThenMan - Friday, December 04, 2009

    This writeup has some odd assumptions. You are saying that the architecture itself is sound, but there is some hardware problem, presumably Intel cannot pack enough cores using current fab tech to make it competitive in traditional graphics?

    Well if that's so, then the architecture is NOT sound. Waiting for a better process node is pointless, because your competition will move to a new one as well and gain much better performance from it. If the architecture was truly solid, then it would be a competitive part if it was made today. Waiting to make the hardware viable and competitive is a losing battle because your competition never sits still, which is exactly what has happened to Larrabee as it stands today.

    Larrabee is just not a good idea for traditional graphics rendering.
  • Olen Ahkcre - Saturday, January 09, 2010

    "Devil is in the details..."

    Larrabee looks good on paper, but working out the details is much, MUCH more problematic.

    Which generally means something is overlooked or the design is more complicated than it needs to be or they're taking the wrong approach to the problem.

    In the case of the Larrabee concept, I think Intel is taking the wrong approach to the problem.

    GPGPU is in fact not about "cloud-computing" or stream-processors or compute-shaders... GPGPU is in fact largely a misnomer (a red herring, if you will), which often leads people to aim at the wrong target.

    What Intel or AMD or whoever wants to get ahead in the next stage of computer evolution needs to do is build a faster floating-point processor. Which is EXACTLY what Nvidia has done... inadvertently, because the requirements for building a super-fast 3D accelerator include huge amounts of floating-point calculations... and when Nvidia bought Ageia PhysX they stumbled on yet another piece of the next computer evolutionary stage, a software/hardware physics engine... perfect for doing simulations, scientific and otherwise.

    Think about what is common in stream-processors and compute-shaders and supercomputing... number crunching... FLOPS (floating-point operations per second).

    To put it in the simplest terms what they're aiming for is a massive floating-point processing unit array...

    Why Intel shouldn't try to build a GPGPU is, in my opinion, this...

    They became too dominant in the CPU arena, to the point of overlooking THE one major area where they could have excelled in the field of supercomputers... floating-point processing, and they lost sight of that target when CPUs for personal computers surpassed CPUs designed for supercomputers and they lapsed into complacency.

    Floating-point calculations... as far as I know, the SSE (Streaming SIMD Extensions) units on multicore processors aren't coordinated to operate as a single unit... something for AMD and Intel to look into. The area might be worth looking into for various reasons I won't go into, yet.

    The approach I would recommend is to look at coordinating the SSE units on multicore CPUs from the software standpoint and work back to how to improve the hardware design in the CPU for more efficient operation.

    The area of focus I would recommend (if I was making the choice) would be Blu-ray (1920x1080 resolution, H.264) decoding. With the combination of a 4-series integrated graphics device and a dual-core processor, there should be enough processing power and memory bandwidth to play back Blu-ray movies without dropping any frames. At least that's my opinion anyways.

    Why focus on Blu-ray... Blu-ray requires lots of computing power... and it's a more popular format than DVD (or it will be, which is why someone should start working on a solution now rather than later). Blu-ray movies don't play back all that well on a lot of laptops I've tested.

    Another reason for Intel to work this out is Nvidia looks like they're going to take another chunk and/or create a new market segment... set top box sized computers using another flavor of their general-purpose GPU to play-back Blu-ray using dual-core ARM processors.

    The area they've overlooked is how to improve computing power... their answer is just add more cores and hyperthreading.... all the while neglecting the SSE units, which could be utilized better.

    I mean the whole processor idea could use some serious rethinking.

    Set-top box media computers all the way up to supercomputers.

    Various types of servers for example don't need floating-point processors at all... (file, print, search engines, databases, etc.)

    Logistical analysis (like chess) doesn't require floating-point processors...

    Intel needs to work out how to divide up the transistor real estate better...

    System-on-a-Chip (SOC) up to discrete integer/floating-point units for supercomputing needs.

    The number of ... aaahhh-hhhaaa!!!

    Sorry for rambling on and on... I need to work out some of the details...

    but what I see is several different flavors of overlapping CPU designs... targeted for various market segments...

    Still working out the details...

    The ideas are coming faster than I can sort out in my head much less type...

    Well that's it for now... just some ideas to throw out there if anyone is interested.
  • qcmadness - Friday, December 04, 2009

    The 1024-bit ringbus could be the problem.
    Remember when ATi put a ringbus in R520 and removed it in RV770?

    I don't think chip-wide cache coherency is good either.
  • Sahrin - Friday, December 04, 2009

    This is exactly what I expected after that weird out-of-the-blue announcement of the suspiciously Larrabee-like 'cloud CPU' with '48' 'very simple x86' cores (read: mesh of 48 in-order x86 CPUs)

    Too bad, the graphics market could use the competition (especially since nVidia seems prepared to cede control of the market to ATI).

    I wonder if this has anything to do with the anti-trust case (that is, it's probably not a good idea to invest a bunch of money entering a new marketplace when you're about to get fined billions of dollars and may be told at the end of a long and expensive development project that you can't release the product due to anti-trust sanctions)?