RAID Primer: What's in a number?

Name: RAID Primer: What's in a number?
Item: RAID Primer: What's in a number?
Author: Dave Robinet

by Dave Robinet on September 7, 2007 12:00 PM EST

Posted in
Storage

41 Comments | Add A Comment

41 Comments

Introduction

The majority of home users have experienced the agony of at least one hard drive failure in their lives. Power users often experience bottlenecks caused by their hard drives when they try and accomplish I/O-intensive tasks. Every IT person who has been in industry for any length of time has dealt with multiple hard drive failures. In short, hard drives have long caused the majority of support headaches in standard desktop or server configurations today, with little hope of improvement in the near term.

With the increased use of computers in the daily lives of people worldwide, the dollar value of data stored on the average computer has steadily increased. Even as MTBF figures have moved from 8000 hours in the 1980s (example: MiniScribe M2006) to the current levels of over 750,000 hours (Seagate 7200.11 series drives), this increase in data value has offset the relative decrease of hard drive failures. The increase in the value of data, and the general unwillingness of most casual users to back up their hard drive contents on a regular basis, has put increasing focus on technologies which can help users to survive a hard drive failure. RAID (Redundant Array of Inexpensive Disks) is one of these technologies.

Drawing on whitepapers produced in the late 1970s, the term RAID was coined in 1987 by researchers at the University of California, Berkley in an effort to put in practice theoretical gains in performance and redundancy which could be made by teaming multiple hard drives in a single configuration. While their paper proposed certain levels of RAID, the practical needs of the IT industry have brought several slightly differing approaches. Most common now are:

RAID 0 - Data Striping
RAID 1 - Data Mirroring
RAID 5 - Data Striping with Parity
RAID 6 - Data Striping with Redundant Parity
RAID 0+1 - Data Striping with a Mirrored Copy

Each of these RAID configurations has its own benefits and drawbacks, and is targeted for specific applications. In this article we'll go over each and discuss in which situations RAID can potentially help - or harm - you as a user.

RAID 0 and RAID 1

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

41 Comments

View All Comments

munim - Friday, September 7, 2007 - link
I don't understand what the parity portion is in RAID 5, anyone care to explain?
drebo - Friday, September 7, 2007 - link
Simply: It's a peice of data that the RAID controller can use to calculate the value of either of the other pieces of data in the chunk in the event of a disk failure.

Example: You have a three disk RAID 5 array. A file gets written in two pieces. Piece A gets written to disk one. Piece B gets written to disk two. The parity between the two is then generated and written to disk three. If disk one dies, the RAID controller can use Piece B and the parity data to generate what would have been Piece A. If disk two dies, the controller can generate Piece B. If disk three dies, the controller still has the original pieces of data. Thus, any single disk can fail before any data loss can occur.

RAID 5 is what is known as a distributed parity system, so the disk that holds parity alternated with each write. If a second file is written in the above example, disk one would get Piece A, disk two would get the parity, and disk three would get Piece B. This ensures that regardless of which disk dies, you always have two of the three pieces of data, which is all you need to get the original.
Zan Lynx - Tuesday, September 11, 2007 - link
The reason RAID-5 uses distributed parity is to balance the disk accesses.

Most RAID-5 controllers do not read the parity data except during verification operations or when the array is degraded.

By rotating the parity blocks between disks 1-3, read operations can use all three disks instead of having a parity-only disk which is ignored by all reads.
ChronoReverse - Friday, September 7, 2007 - link
I understand how the fault tolerance in the best case is half the drives in the 1+0 scenario, but that's still not worse than the RAID 5 scenario where you can't lose more than 1 drive no matter what.

So why was RAID 5 given a "Good" description while RAID 10/01 given a "Minimal" description?
drebo - Friday, September 7, 2007 - link
RAID 0+1 (also known as a mirror of stripes) turns into a straight RAID 0 after one disk dies. The only way it will support a two disk failure is if disks on the same leg of the mirror die. If one on each side dies, you lose everything. After one disk failure, you lose all remaining fault tolerance. RAID 10 (or a stripe of mirrors) will sustain two disk failures if the disks are on different legs of the array. If it loses both disks on a single leg, you lose everything. Thus, it is far more likely that you'll lose the wrong two disks.

In a RAID 5 array, any single disk can be lost and you'll not lose anything--its position is irrelevant. Not only that, but a RAID 5 has better disk-to-storage efficiency (nx-1) when compared to RAID 0+1 and RAID 10 (nx/2). It's also less expensive to implement.

Overall, RAID 5 is one of the best fault tolerant features you can put into a system. RAID 6 is better, but it is also much more expensive.
ChronoReverse - Saturday, September 8, 2007 - link
So in RAID 5 you can lose any ONE drive.

In RAID 1+0/0+1, you can also lose any ONE drive.

In RAID 5 you CANNOT lose a second drive.

In RAID 1+0/0+1, there's a CHANCE you might survive a second drive failure.

Therefore, RAID 5 is more fault-tolerant.
Brovane - Saturday, September 8, 2007 - link
Yes apparently to some people. Also one of the big bonus to Raid 0+1 is that you lose a drive and you do not suffer any performance degradation unlike RAID5 which until the RAID is rebuilt after a drive failure you take a big performance hit. If you are running a Exchange Cluster you cannot afford to take this performance hit during the middle of a busy work day unless you really do not like your helpdesk people. I think the one argument you could make is that a RAID 0+1 has more drives in a array to offer the same amount of storage as a RAID 5 volume so maybe you could make the statistical argument that the RAID 0+1 could be less fault tolerant. However to me this seems very tenuous.
Dave Robinet - Saturday, September 8, 2007 - link
Good comments, and thanks for reading.

You are, however, taking things a little out of context. Take a 6 drive configuration, for example. If you do a RAID 5 with four drives and two hotspares, you'll end up with the same usable capacity as a 6 disk RAID 0+1 - but with the "ability" to lose 3 drives.

Your comment about rebuilding is, however, completely backwards. You spend FAR more time rebuilding the mirrored set of a RAID 0+1 after a failed disk, because you need to rebuild the entire mirrored portion of array once again (since data has presumably changed, there's no parity, etc). (Don't believe me? Try it. ;)

Your general observation about being able to lose one disk in one or the other configuration is correct. You do need to compare apples-to-apples - a RAID 5 will offer you far more capacity if you build the same number of drives as in a 0+1, and a 0+1 will give you more performance. Apples-to-apples, though, you're going to get better redundancy OPTIONS out of the additional RAID 5 flexibility than you will with a 0+1.

Again, though, good points, and thanks again for reading.

dave
ChronoReverse - Sunday, September 9, 2007 - link
What do you mean by hotspares? Do you mean a drive that you can swap in immediately after a drive fails? If that's the case, while in terms of economics it might be three "spares" in terms of actual data redundancy, it's still just one drive. If a drive fails while the spare is filling, you've still lost data.

In any case, it's quite clear that RAID will certainly give you more capacity, but the question was about the comment about data redundancy. I'll have to specify that it means to me how safe my data is in terms of absolute drive failures.
Brovane - Monday, September 10, 2007 - link
If you are using a SAN (Storage Area Network) were you might have say over 100+ disks were you build storage groups to assign to your servers that are connected into the SAN. This SAN will usually have various combinations of RAID 0,1,1_0,5. In this SAN you might have various types of disks says 300GB 10K FC and 146GB 15K FC Disks. You will keep a couple of these disks as hot spare. If say at 2AM the SAN detects in one of your RAID 1_0 Storage Groups that a disk has failed it will grab one of the hot spare disks and start re-building the RAID. The SAN will usually also send a alert of the various support teams that this has happened so the bad disk can be replaced. The SAN doesn't care where the hot spare is plugged into the SAN.

The biggest issue that I see as a ding against RAID 5 vs RAID 0+1 is the performance hit when a drive fails in a RAID 5. With a RAID 0+1 you suffer no performance hit when a drive fails because there is no parity rebuilding. With RAID 5 you can take a good performance hit until the RAID is rebuilt because of the parity calculation. Also with a SAN setup you can mirror a RAID 0+1 between physical DAE so the storage group will still stay up in the unlikely event of a complete DAE failure. Also even though in a RAID 0+1 you will have to rebuild the complete disk in the event of a drive failure with 15K RPM and 4GB FC backbone on the SAN this happens faster than you would think even when dealing with 500GB volumes. if you very concerned about losing another disk before the rebuild is complete you could use SnapView on your SAN to take a SnapShot of the disk and copy this data to your backup volume.

RAID Primer: What's in a number?

Post Your Comment

41 Comments

View All Comments

munim - Friday, September 7, 2007 - link

drebo - Friday, September 7, 2007 - link

Zan Lynx - Tuesday, September 11, 2007 - link

ChronoReverse - Friday, September 7, 2007 - link

drebo - Friday, September 7, 2007 - link

ChronoReverse - Saturday, September 8, 2007 - link

Brovane - Saturday, September 8, 2007 - link

Dave Robinet - Saturday, September 8, 2007 - link

ChronoReverse - Sunday, September 9, 2007 - link

Brovane - Monday, September 10, 2007 - link

Log in

Don't have an account? Sign up now