Introduction

The majority of home users have experienced the agony of at least one hard drive failure in their lives. Power users often run into bottlenecks caused by their hard drives when they try to accomplish I/O-intensive tasks. Every IT person who has been in the industry for any length of time has dealt with multiple hard drive failures. In short, hard drives have long caused the majority of support headaches in standard desktop and server configurations, with little hope of improvement in the near term.

With the increased use of computers in the daily lives of people worldwide, the dollar value of data stored on the average computer has steadily increased. Even as MTBF figures have moved from 8000 hours in the 1980s (example: MiniScribe M2006) to the current levels of over 750,000 hours (Seagate 7200.11 series drives), this increase in data value has offset the relative decrease of hard drive failures. The increase in the value of data, and the general unwillingness of most casual users to back up their hard drive contents on a regular basis, has put increasing focus on technologies which can help users to survive a hard drive failure. RAID (Redundant Array of Inexpensive Disks) is one of these technologies.
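
To put those MTBF figures in perspective, they can be converted into an approximate annualized failure rate (AFR). The sketch below is only an illustration: it assumes a constant failure rate (an exponential lifetime model) and a drive that is powered on around the clock, neither of which comes from the manufacturers' data sheets, and the function name is made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: the drive runs continuously


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Approximate chance that a drive fails within one year.

    Assumes a constant failure rate, so lifetimes follow an
    exponential distribution with mean equal to the MTBF.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for label, mtbf in [("8,000 h (1980s)", 8_000), ("750,000 h (current)", 750_000)]:
    print(f"MTBF {label}: AFR ~ {annualized_failure_rate(mtbf):.1%}")
```

Under those assumptions the older figure works out to roughly a two-in-three chance of failure per year, while the newer one is a little over 1%. Real drives do not fail at a constant rate, so treat these numbers as a rough comparison rather than a prediction.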

Drawing on whitepapers produced in the late 1970s, the term RAID was coined in 1987 by researchers at the University of California, Berkeley in an effort to put into practice theoretical gains in performance and redundancy which could be made by teaming multiple hard drives in a single configuration. While their paper proposed certain levels of RAID, the practical needs of the IT industry have brought several slightly differing approaches. Most common now are:

RAID 0 - Data Striping
RAID 1 - Data Mirroring
RAID 5 - Data Striping with Parity
RAID 6 - Data Striping with Redundant Parity
RAID 0+1 - Data Striping with a Mirrored Copy
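
As a rough illustration of how these levels trade capacity for redundancy, the sketch below computes usable space and the number of drive failures each level is always able to survive. It assumes identical drives and the textbook layout for each level; the function name and the 4 x 500 GB example are hypothetical and not taken from any particular controller.

```python
def raid_summary(level: str, drives: int, drive_gb: float):
    """Return (usable capacity in GB, number of drive failures always survived).

    Assumes identical drives and the textbook layout for each level.
    """
    if level == "0":      # striping only: all capacity, no redundancy
        minimum, usable, survives = 2, drives, 0
    elif level == "1":    # mirroring: every drive holds the same data
        minimum, usable, survives = 2, 1, drives - 1
    elif level == "5":    # striping with one drive's worth of parity
        minimum, usable, survives = 3, drives - 1, 1
    elif level == "6":    # striping with two drives' worth of parity
        minimum, usable, survives = 4, drives - 2, 2
    elif level == "0+1":  # two striped sets, one mirroring the other
        minimum, usable, survives = 4, drives // 2, 1
    else:
        raise ValueError(f"unknown RAID level: {level}")
    if drives < minimum or (level == "0+1" and drives % 2):
        raise ValueError(f"RAID {level} cannot be built from {drives} drives")
    return usable * drive_gb, survives


for level in ("0", "1", "5", "6", "0+1"):
    capacity, survives = raid_summary(level, drives=4, drive_gb=500)
    print(f"RAID {level:<3}: {capacity:.0f} GB usable, survives {survives} failure(s)")
```

The "survives" figure is the guaranteed worst case. A RAID 0+1 array, for example, can lose additional drives as long as they all belong to the same striped set, but only a single failure is certain to be survivable.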

Each of these RAID configurations has its own benefits and drawbacks, and is targeted for specific applications. In this article we'll go over each and discuss in which situations RAID can potentially help - or harm - you as a user.


40 Comments


  • Brovane - Friday, September 07, 2007

    Personally we use RAID 0+1 at my work for our Exchange cluster, SQL cluster and the home drives for our F&P cluster. Where RAID 0+1 is great is in a SAN environment. We have the drives mirrored between SAN DAEs, so we could have an entire DAE fail on our SAN and, for example, Exchange will remain up and running. Also, if you have a drive failure in one of our RAID 0+1 arrays, the SAN automatically grabs the hot spare, starts rebuilding the array, pages the LAN team and alerts Dell to ship a new drive. Of course, no matter what RAID you have set up, you should always have daily tape backups with a copy of those tapes going offsite.
  • Bladen - Friday, September 07, 2007

    Might be asking a bit too much, especially in the case of RAID 5, 6, 0+1, and 1+0, but some SSD raid performance would be nice. They would need more than 2 drives wouldn't they?

    However if we could see some RAID 0 figures from a pair of budget SSDs, and a pair of performance SSDs, that would be awesome.
  • tynopik - Friday, September 07, 2007

    in addition to a WHS comparison i hope it covers

    1. software raid (like built into windows or linux)
    2. motherboard raid solutions (nvraid and intel matrix)
    3. low end products (highpoint and promise)
    4. high end/enterprise products
    5. more exotic raids like raid-z and raid 5ee
    6. performance of mixing raids across same disks like you can with matrix raid and some adaptecs

    and in addition to features/cost/performance i hope it really tries to test how reliable/bulletproof these solutions are

    for instance a ton of people have had problems with nvraid
    http://www.nforcershq.com/forum/image-vp511756.htm...

    what happens if you yank the power in the middle of a write?
    how easy is it to migrate an array to a different controller?
    can disks in raid1 be yanked from the array and read directly or does it put header info on the disk that makes this impossible?
  • yyrkoon - Saturday, September 08, 2007

    quote:

    for instance a ton of people have had problems with nvraid


    That would be because "a ton of people are idiots". I have been using nvRAID for a couple of years without issues, and most recently I even swapped motherboards, and the array was picked right up without a hitch once the proper BIOS settings were made. I would suspect that these people who are 'having problems' are the type who expect/believe that having a RAID0 array on their system will give them another 30 frames per second in the latest first person shooter as well . . .
  • tynopik - Saturday, September 08, 2007

    > I would suspect that these people who are 'having problems' are the type who expect/believe that having a RAID0 array on their system will give them another 30 frames per second in the latest first person shooter as well . . .

    the link is in the very top comment

    they were all actually using raid1 and had problems with it constantly splitting the array
  • tynopik - Friday, September 07, 2007

    http://storageadvisors.adaptec.com/
    great site with lots of potential topics like:

    desktop vs raid/enterprise drives - is there a difference
    http://storageadvisors.adaptec.com/2006/11/20/desk...

    Picking the right stripe size
    http://storageadvisors.adaptec.com/2006/06/05/pick...

    Different types of RAID6
    http://storageadvisors.adaptec.com/2005/11/07/a-ta...

    other features to consider:
    handling dissimilar drives
    morph online from one RAID level to another
    easily add additional drives/capacity to an existing array
    can you change which port a drive is connected to without messing up the array?

    maybe create a big-honkin features matrix that shows which controllers are missing what?

    performance:
    - cpu hit between software raid, low-end controllers, enterprise controllers (some have reported high cpu usage with highpoint controllers even when using raid-1 which shouldn't cause much load)
    - cpu hit with different busses (PCI, PCI-X, PCIe) and different connections (firewire, sata, scsi, sas, usb)

    maybe even a corruption test. (write terabytes of data out under demanding situations and read back to ensure there was no corruption)

    But most of all I WANT A TORTURE TEST. I want these arrays pushed to their limits and beyond. What does it take to make them fail? How gracefully do they handle it?
  • tynopik - Friday, September 07, 2007

    an article from the anti-raid perspective
    http://www.pugetsystems.com/articles?&id=29
  • tynopik - Saturday, September 08, 2007

    another semi-anti-raid piece

    http://www.bestpricecomputers.co.uk/reviews/home-p...

    "Why? From our survey of a sample of our customers here's how it tends to happen:

    The first and foremost risk is that the RAID BIOS loses the information it stores to track the allocation of the drives. We've seen this caused by all manner of software particularly anti-virus programs. Caught in time a simple recreation of the array (see last page) resolves the problem in over 90% of the cases.

    BIOS changes, flashing the BIOS, resetting the BIOS, updating firmware etc can cause an array to fail. BIOS changes happen not just by hitting delete to enter setup. Software can make changes to the BIOS.

    Disk managers, hard disk utilities, imaging and partitioning software etc. can often confuse a RAID array."

    -------------------------

    http://storagemojo.com/?p=383

    . . . . the probability of seeing two drives in the cluster fail within one hour is four times larger under the real data . . . .

    Translation: one array drive failure means a much higher likelihood of another drive failure. The longer since the last failure, the longer to the next failure. Magic!

    (perhaps intentionally mixing the manufacturers of drives in a raid is a good idea?)

    ------------------

    http://www.lime-technology.com/

    unRAID

    -----------------

    http://www.miracleas.com/BAARF/

    an amusing little page

    -----------------

    it would also be cool if you had a failing drive that behaved erratically/intermittently/partially to test these systems

    -----------------

    if a drive fails in a raid array and you pull the wrong drive, can you stick it back in and still recover or does the controller wig out?

    ------------------

    some parts from the thread at the top that you might have missed

    http://www.nforcershq.com/forum/3-vt61937.html?pos...

    > Someone claims that the nv sata controler (or maybe raid controler) doesn't work properly with the NCQ function of new hard drives (or the tagged queing or whatever WD calls it).

    > if the drives are SATA II drives with 3 G/bps speed and NCQ features NVRAID Controller has know problems with this drives.

    > the first test trying to copy data from the raid to the external firewire drive resulted in not 1 but 2 drives dropping out.

    Luckily the 2 were both 1 half of the mirror meaning i could rebuild the raid. So looks like trying to use the firewire from the raid is the problem. THis may stand to reason as the firewire card is via an add-on card in a PCI slot so maybe there is some weird bottleneck in the bus when doing this causing the nvraid to malfunction.

    (so like check high pci bus competition)

    http://www.nforcershq.com/forum/4-vt61937.html?sta...

    > I have read that its best to disable ncq and also read cache from all drives in the raid via the device manager. This may tie in with someone else’s post here who says the nvraid has issues with ncq drives.

    http://www.nforcershq.com/forum/image-vp591021.htm...

    NF4 + Vista + RAID1 = no NCQ?

    ------------------------------------

    RAID is dead, all hail the storage robot

    http://www.daniweb.com/blogs/printentry1399.html

    Drobo - The World's first storage robot

    http://www.datarobotics.com/

    "Drobo changes the way you think about storage. In short, it's the best solution for managing external storage needs I have used." - JupiterResearch

    "It is the iPod of mass storage" - ZDNet

    "...the most impressive multi-drive storage solution for PCs I've seen to date" - eHomeUpgrade

    sucks that it's $500 without drives and usb only though

  • Dave Robinet - Saturday, September 08, 2007

    Good posts. A topic you're obviously interested in. :)

    Let me try and hit a few of the points in random order:

    - Stress/break testing is a GOOD idea, but very highly subjective. You can't GUARANTEE that you'll be writing (or reading) EXACTLY the same data under EXACTLY the same circumstances, so there's always that element of uncertainty. Even opening the same file can't guarantee that the same segments are on the same disk, so... I'll have to give some thought to that. Definitely worthwhile, though, to pursue that angle (especially in terms of looking at how array controllers recover from major issues like that).

    - Your other points pretty much all hit on a major argument: Software versus Hardware RAID (and versus proprietary hardware). I actually know an IT Director in a major (Fortune 500) company who uses software RAID exclusively, including in fairly intensive I/O applications. His argument? "I've been burned by "good" hardware too often - it lasts 7 years, I forget to replace it, and when the controller cooks, my array is done." (Make whatever argument you like about him not being on the ball enough to replace his 7 year old equipment, but I digress). I do find the majority of the decent controllers write header information in fairly documented (and retrievable) ways - look at IBM's SmartRAID series as a random example of this - so I don't see that being a hugely big deal anymore.

    You're dead on, though. *CONSUMERS* who are looking at RAID need to be very, very sure they know what they're getting themselves into.
  • tynopik - Saturday, September 08, 2007

    > You can't GUARANTEE that you'll be writing (or reading) EXACTLY the same data under EXACTLY the same circumstances, so there's always that element of uncertainty

    that's true, but i don't think it's that important

    have a test where you're copying a thousand small files and yank the power in the middle
    run this test 5-10 times and see how they compare
    controller 1 never has a problem
    controller 2 required a complete rebuild 5 times

    maybe you can't exactly duplicate the circumstances, but it's enough to say controller 2 has problems

    (actually requiring a complete rebuild even once would be a serious problem)

    similarly, have a heavy read/write pattern with random data while simultaneously writing data out a pci firewire card and maybe even a usb drive and have audio playing and high network traffic (as much bus traffic and conflict as you can generate) that runs for 6 hours
    controller 1 has 0 bit errors in that 6 hours
    controller 2 has 200 bit errors in that 6 hours

    controller 2 obviously has problems even if you can't exactly duplicate it

    i think it's sufficient to merely show that a controller could corrupt your data