I just got off the phone with Intel’s Steve Smith (VP and Director of Intel Client PC Operations and Enabling) and got some more detail on this morning’s 6-series chipset/SATA bug.

The Problem

Cougar Point (Intel’s 6-series chipsets: H67/P67) has two sets of SATA ports: four that support 3Gbps operation, and two that support 6Gbps operation. Each set of ports requires its own PLL source.

The problem in the chipset was traced back to a transistor in the 3Gbps PLL clocking tree. The transistor has a very thin gate oxide, which allows you to turn it on with a very low voltage. Unfortunately, in this case Intel biased the transistor with too high a voltage, resulting in higher than expected leakage current. Depending on the physical characteristics of the transistor, the leakage current can increase over time, which can ultimately result in failure of the 3Gbps ports. Because the 3Gbps and 6Gbps circuits have their own independent clocking trees, the problem is limited to ports 2-5 off the controller.

You can coax the problem out earlier by testing the PCH at increased voltage and temperature levels. By increasing one or both of these values you can simulate load over time and that’s how the problem was initially discovered. Intel believes that any current issues users have with SATA performance/compatibility/reliability are likely unrelated to the hardware bug.

One fix for this type of problem would be to scale down the voltage applied across the problematic transistor. In this case there’s a much simpler option. The source of the problem is not even a key part of the 6-series chipset design; it’s a remnant of an earlier design that’s no longer needed. In our Sandy Bridge review I pointed out the fair amount of design reuse that was done in creating the 6-series chipset. The solution Intel has devised is to simply remove voltage to the transistor. The chip is functionally no different, but by permanently disabling the transistor the problem will never arise.

To make matters worse, the problem was introduced with the B-stepping of the 6-series chipsets. Earlier steppings (such as the one we previewed last summer) didn’t have the problem. Unfortunately for Intel, only B-stepping chipsets shipped to customers. Since the fix involves cutting off voltage to a transistor, it will be implemented with a new spin of metal and you’ll get a new associated stepping (presumably C-stepping?).

While Steve wouldn’t go into greater detail he kept mentioning that this bug was completely an oversight. It sounds to me like an engineer did something without thinking and this was the result. This is a bit different from my initial take on the problem. Intel originally characterized the issue as purely statistical, but the source sounds a lot more like a design problem rather than completely random chance.

It’s Notta Recall

Intel has shipped around 8 million 6-series chipsets since the launch at CES. It also committed to setting aside $700 million to deal with the repair and replacement of any affected chipsets. That works out to $87.50 per chipset if all 8 million chipsets in the market are affected, nearly the cost of an entire motherboard. The funds have to cover supplying the new chipset, bringing in the affected motherboard and repairing it or sending out a new one. Intel can eat the cost of the chipset, leaving the $87.50 for shipping, labor and time, as well as any other consideration Intel provides the OEM with (here’s $5, don’t hate us too much). At the end of the day it seems like enough money to handle the problem. However, Intel was very careful to point out that this is not a full-blown recall. The reason is simple.

If you have a desktop system with six SATA ports driven off a P67/H67 chipset, there’s a chance (at least 5%) that during normal use some of the 3Gbps ports will stop working over the course of 3 years. The longer you use the ports, the higher that percentage will be. If you fall into this category, chances are your motherboard manufacturer will set up some sort of an exchange where you get a fixed board. The motherboard manufacturer could simply desolder your 6-series chipset and replace it with a newer stepping if it wanted to be frugal.

If you have a notebook system with only two SATA ports, however, the scenario is a little less clear. Notebooks don’t have tons of storage bays and thus they don’t always use all of the ports a chipset offers. If a notebook design only uses ports 0 & 1 off the chipset (the unaffected ports), then the end user would never encounter an issue and the notebook may not even be recalled. In fact, if there are notebook designs currently in the pipeline that only use ports 0 & 1, they may not be delayed by today’s announcement. This is the only source of hope if you’re looking for an unaffected release schedule for your dual-core SNB notebook.
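As a rough sanity check on the $87.50-per-chipset figure quoted earlier in this section, here is a minimal Python sketch of the arithmetic. The $700 million reserve and the 8 million shipped units are the numbers Intel gave; the function name and the `serviced_fraction` parameter are my own illustrative assumptions, there to show how the per-unit budget would grow if some systems (for example notebooks that only use ports 0 & 1) never need servicing. The 0.5 used below is an arbitrary example, not an estimate.

```python
RESERVE_USD = 700_000_000      # amount Intel says it is setting aside
CHIPSETS_SHIPPED = 8_000_000   # approximate units shipped since CES


def budget_per_chipset(reserve_usd, shipped, serviced_fraction=1.0):
    """Average funds available per serviced chipset.

    serviced_fraction is a hypothetical knob: the share of shipped
    chipsets that actually ends up being repaired or replaced.
    """
    if not 0 < serviced_fraction <= 1:
        raise ValueError("serviced_fraction must be in (0, 1]")
    return reserve_usd / (shipped * serviced_fraction)


# Every shipped chipset serviced: the article's figure.
print(f"${budget_per_chipset(RESERVE_USD, CHIPSETS_SHIPPED):.2f}")       # $87.50

# Hypothetical: only half of the shipped chipsets need servicing.
print(f"${budget_per_chipset(RESERVE_USD, CHIPSETS_SHIPPED, 0.5):.2f}")  # $175.00
```

The point of the second call is simply that the $87.50 is a floor on the average: every system that doesn't come back leaves more money for the ones that do.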

Final Words

Intel maintains that Sandy Bridge CPUs are not affected, and current users are highly unlikely to encounter the issue even under heavy loads. So far Intel has only been able to document the issue after running extended testing at high temperatures (in a thermal chamber) and voltages. My recommendation is to try to only use ports 0 & 1 (the 6Gbps ports) on your 6-series motherboard until you get a replacement in place.
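To make that recommendation concrete, here is a minimal Python sketch that flags which drives sit on the potentially affected ports. The port numbering (0 and 1 for 6Gbps, 2 through 5 for 3Gbps) comes from Intel's description of the chipset; the function name and the example drive layout are made up for illustration, and how these port numbers map to the physical connectors on a given board is something you'd need to check in its manual.

```python
# Port layout as described for Cougar Point:
# ports 0-1 are 6Gbps (unaffected), ports 2-5 are 3Gbps (potentially affected).
UNAFFECTED_PORTS = frozenset({0, 1})
AFFECTED_PORTS = frozenset({2, 3, 4, 5})


def at_risk_drives(drive_ports):
    """Return the names of drives attached to the 3Gbps ports, sorted.

    drive_ports maps a drive name to the chipset SATA port it uses.
    """
    for name, port in drive_ports.items():
        if port not in UNAFFECTED_PORTS | AFFECTED_PORTS:
            raise ValueError(f"{name}: port {port} is not a chipset port (0-5)")
    return sorted(name for name, port in drive_ports.items()
                  if port in AFFECTED_PORTS)


# Hypothetical desktop build: SSD on a 6Gbps port, everything else on 3Gbps.
example_build = {"ssd": 0, "hdd": 2, "optical": 5}
print(at_risk_drives(example_build))  # ['hdd', 'optical']
```

In a build like this one, moving the hard drive to port 1 would leave only the optical drive exposed.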

OEMs and motherboard manufacturers are going to be talking to Intel over the next week to figure out the next steps. Intel plans to deliver fixed silicon to its partners at the end of February; however, it’ll still take time for the motherboard makers to turn those chips into products. I wouldn’t expect replacements until March at the earliest.

I maintain that the best gesture of goodwill on Intel’s part would be to enable motherboard manufacturers to replace P67/H67 motherboards with Z68 boards for those users who want them.

125 Comments

  • cjb110 - Tuesday, February 01, 2011

    That's a good idea...better than the boards ending up in landfill.

    Remove the broken ports, sell it as a 2 port board (quite a few pc's only have 2 drives).

    But in actuality what will happen? Is it viable to 'remake' the boards with components replaced?
  • cbass64 - Tuesday, February 01, 2011

    Most of the boards are from OEMs, so Intel will just ship them a fixed version of the chipset and the OEMs will be in charge of either swapping the chipsets out or just putting the new chipsets on new boards. For the boards people bought directly from Intel, Intel will probably ship them a brand new board and then swap out the bad chipset and sell that board as a refurb later.
  • JasperJanssen - Tuesday, February 01, 2011

    Most motherboards that are sold retail for around $100. That means the manufacturing cost, in toto, is somewhere around $35-50. The rest is packaging, shipping, advertising, and profit.

    Shipping piles of boards (properly protected from ESD and physical damage, mind you) to China so that reflow-solder workers can remove the chipset, carefully clean the board, align and install the new one, reflow oven, test the newly made motherboard extensively, and ship them back... it's gonna be more expensive than simply dumping them in the landfill. Also, replacing the chipset is a major operation, and you would have to do it by hand rather than robotic placement, and I'm not convinced that the quality control wouldn't have to dump half the boards into the incinerator anyway.

    Removing the broken ports at the distributors and then disabling them in the bios, though, that might be worthwhile -- except for the fact that 2 ports is just too few to be interesting. You could only really do it with OEM manufacturers -- the pallets worth of boards that are stacked up waiting to go into HPaqs and Dells. And you need 1 esata port, 1 ssd port, 1 storage port, and 1 optical port even in basic machines -- possibly omitting the ssd, but even then, 3 is more than two and no expansion at all tends to make consumers sad. Especially when it's the one upgrade people actually do sometimes, slinging an extra HD or Optical in there.
  • Zoomer - Wednesday, February 02, 2011

    There's the ide port which can be used for the optical drive. So 2 ports, no esata, sounds good.
  • Phylyp - Monday, January 31, 2011

    "... the best gesture of goodwill on Intel’s part would be to enable motherboard manufacturers to replace P67/H67 motherboards with Z68 boards..."
    That might be the only way for Intel to win hearts and minds.

    Even so, I'm sure when the LGA 2011-based mobos launch later this year, it will make early adopters pause to think twice.

    And RU482's got a point - mobo manufacturers might choose to refurb, given the very specific nature of the problem.
  • htwingnut - Monday, January 31, 2011

    Ok, $700M over 8M products, I get the $87.50 there. But there's the round trip shipping cost, labor, and the bad stock which will be quite expensive, much more than $87.50, especially for laptops. I just ordered a Sager laptop with a Sandy Bridge CPU i7-2720QM. I can't imagine it'll be cheap to replace the motherboard on that thing. I just hope I can get better than UPS ground shipping because that would be a ten day round trip for me... :(
  • nikclev - Monday, January 31, 2011

    I'm sure they are factoring in several things:

    1) Many of those chips have probably not made it into motherboards yet, so it's a simple shipping and/or destruction cost. (I doubt they will destroy them, but who knows..) Quite simple to ask all the OEMs just to ship back or destroy the pallets/trays/however they are bulk packaged, I'm sure it won't cost 87.50 each to ship.

    2) I'm sure there are many people that don't care/don't know that there is a potential problem, although that depends I suppose on how much (if any) mainstream media attention there is.

    3) As Anand mentioned, there may be quite a few laptop chips that are unaffected that are probably included in that 8m number, many OEMs may have chosen to only implement the sataIII ports.

    4) They may be lowballing the dollar amount to make things look better than they actually are. I remember reading somewhere that Intel set aside a certain amount for the floating point bug, but ended up spending a good bit more.
  • nikclev - Monday, January 31, 2011

    Forgot to add: Maybe they plan to offset the cost somehow... Maybe sell them as keychains in the Intel gift shop!
  • Calin - Tuesday, February 01, 2011

    Intel would really really not like to have a million boards (15% of 8 million chips) losing SATA ports (even if three years from now). People have long memories for this kind of bad accidents, and might decide that Intel is no longer a dependable vendor.
    The reason Intel mainboards (Pentium II and !!! days) of old were bought was not feature set or performance or overclockability or something else - they were bought because they were rock solid, during the time when Anandtech testing of mainboard crashed them usually more than thrice in 48 hours. And I remember when the mainboards came that crashed in that burn test only when they used interleaved memory access, but not in non-interleaved memory access.
  • JasperJanssen - Tuesday, February 01, 2011

    Any chips that aren't already in boards will go to landfill, and they probably won't even be shipped back to intel for it. The only reason for Intel to require them back is to verify that they are *really* destroyed, and not just labeled as such and then inserted into no-brand cheap motherboards.

    $87.50 per chip on average is far from a lowball. That means they can easily be planning on spending $150 for ATX and even more per notebook that's affected.