RAID 5
For the final popular RAID configuration, we jump to the number 5 for RAID 5. This setup is typically found only on higher-end RAID cards because it requires extra hardware to work properly.
RAID 5 requires at least three drives and attempts to combine the speed of striping with the reliability of mirroring. In a three-drive array, each write is striped across two of the drives at a user-defined stripe size. The third drive, the one not receiving striped data for that write, is given parity information. Although commonly called a "parity bit," this is really a block of parity data the same size as a stripe block. It is generated by XORing the striped blocks together, so that the data on either of the two drives that received striped data can be recreated from the other drive plus the parity.
The two drives receiving the striped data and the one receiving the parity bit are constantly changing. For example, if drives 1 and 2 receive striped data on a write and drive 3 receives a parity bit, on the next write drives 2 and 3 will receive the striped data and drive 1 will receive the parity bit. The shifting continues and eliminates the random write performance hit that comes with a dedicated drive receiving the parity information.
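The rotation described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the rotation order (parity on drive 3 first, then drive 1, then drive 2) is an assumption chosen to match the example in the text. Real controllers may rotate parity in a different order.

```python
def parity_drive(stripe_index: int, num_drives: int = 3) -> int:
    """Return the 1-based drive number that holds parity for a stripe.

    Assumed rotation: parity starts on the last drive, then moves to
    drive 1, drive 2, and so on, wrapping around.
    """
    return (num_drives - 1 + stripe_index) % num_drives + 1


def stripe_layout(stripe_index: int, num_drives: int = 3) -> list[str]:
    """Label each drive as data ("D") or parity ("P") for one stripe."""
    p = parity_drive(stripe_index, num_drives)
    return ["P" if drive == p else "D" for drive in range(1, num_drives + 1)]


if __name__ == "__main__":
    for stripe in range(4):
        print(stripe, stripe_layout(stripe))
```

Because the parity role moves with every stripe, no single drive ends up handling all of the parity writes.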
Image courtesy of Promise Technology, Inc.
The parity information is typically calculated on the RAID controller itself, and thus these types of controllers are called hardware RAID controllers since they require a special chip to make the parity information and decide what drive to send it to.
RAID 5 arrays are said to provide a balance between RAID 0 and RAID 1 configurations. With RAID 5, some of the features of striping are in place as well as the features of mirroring. Thanks to the parity bit, if information is lost on one of the three drives in the array, it can be rebuilt. Thanks to the striping it uses to break up the data and send it to multiple drives, aspects of speed from RAID 0 are present.
Recreation works in the following manner. As an example, take a three-drive RAID 5 array with a 64KB stripe size and a 128KB file that needs to be written. First, the 128KB file is broken into two 64KB blocks, one destined for drive 1 and the other for drive 2. Next, the controller creates the parity information by performing an XOR calculation on the two blocks. Finally, the two blocks are written to drives 1 and 2, and the parity information is written to the third drive in the array.
Now, if one of the drives in the array goes bad, our 128KB file is not lost, because the missing piece can be recreated. It does not matter which drive fails: all the data is still available. If the third drive in the above example, the one that received the parity information for this write, fails, then the original data can be read off of drives 1 and 2 to recreate the parity information. If either drive 1 or drive 2 fails, then the parity information stored on drive 3 can be combined with the data on the surviving drive to recreate the information lost on the failed drive.
Not all is good with RAID 5, however. Because parity must be calculated and written on every write, there is overhead. This is especially noticeable when changing only one piece of information on one drive in the array. During this process, not only must the changed information be written, but the parity must also be recreated. This means that once the data is written, both drives with the stripe blocks on them must be read, new parity must be calculated, and then the new parity has to be written to the third drive. With this approach, the problem only increases as additional drives are added to the array, since more stripe blocks must be read to recalculate the parity.
For the same reasons mentioned in both the RAID 0 and RAID 1 discussions, it is best to use identical drives for a RAID 5 setup. Not only does this ensure speed, it also ensures that all of the array's storage capacity is utilized. The size of a RAID 5 array is equal to the size of the smallest drive multiplied by one less than the number of drives in the array (since one drive's worth of space is always taken up by parity).
RAID 5 does provide a good balance between speed and reliability and is a popular configuration for arrays in a variety of systems, from servers to workstations. The data security made possible with the parity bit as well as the speed and space provided by RAID 5 have many high-end system builders turning to RAID 5.
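The write in this example can be sketched in Python as follows. The names `STRIPE_SIZE`, `xor_blocks`, and `write_stripe` are hypothetical, the random bytes stand in for the 128KB file, and an actual controller does this work in hardware rather than in software like this.

```python
import os

STRIPE_SIZE = 64 * 1024  # 64KB stripe size, as in the example


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def write_stripe(data: bytes) -> tuple[bytes, bytes, bytes]:
    """Split 128KB of data into two 64KB blocks plus one parity block."""
    block1 = data[:STRIPE_SIZE]
    block2 = data[STRIPE_SIZE:2 * STRIPE_SIZE]
    parity = xor_blocks(block1, block2)
    return block1, block2, parity


if __name__ == "__main__":
    file_data = os.urandom(128 * 1024)  # stand-in for the 128KB file
    drive1, drive2, drive3 = write_stripe(file_data)
    print(len(drive1), len(drive2), len(drive3))
```

Note that the parity block is the same size as each data block (64KB here), which is why one drive's worth of space goes to parity.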
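To see why any single drive can fail without losing data, here is a minimal Python sketch of the recovery step. It assumes parity is a plain XOR of the data blocks; the helper names and the tiny 4-byte blocks are made up for illustration.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    """Recreate the block from a failed drive using the survivors.

    XORing all surviving blocks (data and parity alike) yields the
    missing one, whichever drive it was on.
    """
    return xor_blocks(*surviving_blocks)


if __name__ == "__main__":
    drive1 = b"AAAA"                      # first data block
    drive2 = b"BBBB"                      # second data block
    drive3 = xor_blocks(drive1, drive2)   # parity block

    # Each single-drive failure can be recovered from the other two drives.
    print(rebuild_missing([drive2, drive3]) == drive1)
    print(rebuild_missing([drive1, drive3]) == drive2)
    print(rebuild_missing([drive1, drive2]) == drive3)
```

The same function handles a lost data block and a lost parity block, since XOR treats all three blocks symmetrically.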
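The cost of a small write can be made concrete by counting drive operations. The sketch below follows the sequence described above (write the changed block, read the stripe's data blocks, write new parity). The `update_block` function and the dictionary layout are hypothetical and only model the bookkeeping, not a real controller.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_block(stripe: dict, index: int, new_data: bytes) -> dict:
    """Change one data block in a stripe and recompute its parity.

    Returns a count of the drive reads and writes that were needed.
    """
    ios = {"reads": 0, "writes": 0}

    stripe["data"][index] = new_data      # write the changed block
    ios["writes"] += 1

    parity = bytes(len(new_data))         # start from all zeros
    for block in stripe["data"]:          # read every data block
        ios["reads"] += 1
        parity = xor_blocks(parity, block)

    stripe["parity"] = parity             # write the new parity
    ios["writes"] += 1
    return ios


if __name__ == "__main__":
    stripe = {"data": [b"AAAA", b"BBBB"], "parity": xor_blocks(b"AAAA", b"BBBB")}
    print(update_block(stripe, 0, b"CCCC"))
```

In this model, changing a single block on a three-drive array costs two reads and two writes instead of one write. Some implementations instead read only the old data and old parity and compute the new parity as old parity XOR old data XOR new data; that alternative is mentioned here as general background and is not described in the article.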
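The capacity rule can be written as a short Python function. The function name and the drive sizes used below are made-up examples.

```python
def raid5_capacity(drive_sizes_gb: list[float]) -> float:
    """Usable capacity: smallest drive times (number of drives - 1)."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 requires at least 3 drives")
    return min(drive_sizes_gb) * (len(drive_sizes_gb) - 1)


if __name__ == "__main__":
    print(raid5_capacity([120, 120, 120]))  # identical drives: 240
    print(raid5_capacity([80, 120, 120]))   # mismatched drives: 160
```

With mismatched drives, the extra 40GB on each of the two larger drives goes unused, which is the capacity argument for using identical drives.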
2 Comments
kburrows - Thursday, December 4, 2003
Have you run any tests on any onboard RAID solutions for RAID 0 & 1? I would love to see the results posted for the new SATA RAID on the Intel 875 boards.
Anonymous User - Sunday, August 17, 2003
In addressing the performance of a RAID array with different stripe sizes, you miss an important factor, namely the access time of a disk. This wait time has two main causes: first the head positioning, and second the rotational latency (the heads are on the right track, but the position where the read starts has not yet passed under the head). You may have to wait from 0 to (in the worst case) a full cycle. Since the disks move independently, you can calculate that the average latency to get a small file is minimal when the stripe size is about a full cycle of a disk in the array (approx. 250kB today). All other factors I know of do not reduce this (controller overhead, transport, ...).
So I think that today a minimum stripe size of 256kB should be used.