In this article we will look in some detail at how RAID 5 parity is created and how it is possible to actually “read” from a destroyed disk in a RAID 5 set.
There are many sources on the web about the general principle of RAID 5, so we will not cover that part here. In short, and as you might know, RAID level 5 works with any number of disks equal to or greater than three and stores parity information, distributed across the disks in the set, to be able to recover from a disk failure (striped blocks with distributed parity). We can for example combine eight physical disks into a RAID 5 set while only consuming the capacity of one disk for parity information. If any single drive breaks down we still have full access to the data that was on the destroyed disk.
To understand how this is possible we have to look at the smallest unit, the binary bit, which can be 1 or 0. When doing calculations in binary we have several so-called Boolean algebra operations, for example the AND operation and the OR operation.
One of these low-level logical operations is used heavily in RAID 5: the XOR (“exclusive or”). XOR takes two binary digits and produces a true result if exactly one digit is true (i.e. the other digit needs to be false).
Value A | Value B | XOR result |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
This means that, for example, 1 XOR 0 = 1 and 1 XOR 1 = 0. Exactly one of the two binary digits must be 1 for the result to be “true”, that is, 1.
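The truth table above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `xor_bit` is made up for illustration, and it relies on Python's built-in `^` operator, which performs XOR on integers.

```python
def xor_bit(a: int, b: int) -> int:
    """Return 1 if exactly one of the two bits is 1, otherwise 0."""
    return a ^ b

# Print the four rows of the XOR truth table: Value A, Value B, XOR result.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_bit(a, b))
```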
Let us now see how the parity calculations are done in a RAID 5 set using XOR. Assume we have a small RAID 5 set of four disks and some data is written to it. For simplicity we look at only half a byte (4 bits) per disk, but the principle holds regardless of the stripe size or the number of disks.
On the first three disks we have the binary information 1010, 1100 and 0011, here representing some data, and we now have to calculate the parity information for the fourth disk.
Looking at the first “column”, that is, the leftmost bit on each disk, we have 1, 1 and 0. If we use XOR to calculate the result, that would be:
1 XOR 1 XOR 0 = Parity bit
This could be written as: (1 XOR 1) XOR 0 = Parity bit
This means we first calculate 1 XOR 1 = 0 for the first two disks and then XOR that result, the zero, against the bit on the third disk, which is also 0: 0 XOR 0 = 0. The parity bit for the first column is therefore 0.
For the second “column” we have 0, 1 and 0. We first do 0 XOR 1 = 1 and then XOR this result with the bit on the third disk: 1 XOR 0 = 1. The parity bit here is 1.
For the third column we would have:
1 XOR 0 XOR 1 = Parity
Broken down: 1 XOR 0 = 1 and then 1 XOR 1 = 0
And finally the fourth column:
0 XOR 0 XOR 1 = 1
For all four columns together this gives the parity 0101, which is written to Disk 4.
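The column-by-column calculation can also be done on all four bits at once. The sketch below assumes each disk's half byte is held as a Python integer; the function name `parity` is chosen for illustration and is not taken from any RAID implementation.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """XOR all blocks together; each block is an int holding the disk's bits."""
    return reduce(xor, blocks, 0)

# The three data disks from the example above.
data = [0b1010, 0b1100, 0b0011]
p = parity(data)
print(format(p, "04b"))  # 0101
```

Because XOR works independently on each bit position, the same function would work unchanged for larger blocks.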
If any of Disks 1, 2 or 3 breaks, the parity information on Disk 4 can be used to recreate the missing data. Let us look at how this is done. Assume that Disk 2 unexpectedly goes down and we have lost all read and write access to the real Disk 2. With the help of the already recorded parity we are still able to calculate the information which is missing.
The primary feature of a RAID 5 disk set is the ability to “access” the data on a missing disk. This is done by running the exact same XOR operation over the remaining disks and the parity information. Let us look at the first column again: 1 XOR 0 = 1 (for Disk 1 and Disk 3) and then 1 XOR 0 (the parity) = 1. This means that there must have been a binary digit of 1 on the missing disk. If we do the same operation on the other columns we end up with 1100, which is exactly the data that was on the failed drive.
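The recovery of Disk 2 can be sketched in the same way. The values are the ones from the example; the function name `reconstruct` is an assumption made for this illustration.

```python
from functools import reduce
from operator import xor

def reconstruct(surviving_blocks):
    """XOR the surviving data blocks and the parity to get the lost block."""
    return reduce(xor, surviving_blocks, 0)

disk1, disk3, parity_block = 0b1010, 0b0011, 0b0101  # Disk 2 is gone
lost = reconstruct([disk1, disk3, parity_block])
print(format(lost, "04b"))  # 1100
```

Note that the operation is identical to the parity calculation itself: XOR over everything that is still readable.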
The XOR operation itself is extremely quick and easily handled by the CPU or RAID controller, but the big downside is that we have to read from ALL other disks to recreate the data on the missing one. If we have for example eight disks in the set with one broken, then a single read IO against the missing disk turns into seven disk IOs, one against each surviving disk, to calculate the lost data on the fly.
The XOR operation works perfectly mathematically with one disk missing, but the moment a second disk is lost we no longer have enough information to make the calculations. While it is possible to keep using the RAID 5 set with one disk missing for some time with degraded performance, it is naturally best to replace the damaged disk and begin the full rebuild as soon as possible (a hot spare is quite handy here).
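To see why a second lost disk is fatal, the sketch below takes the example set with only Disk 1 and the parity left and lists every pair of values that Disks 2 and 3 could have held. The function name `candidates` and the brute-force search are purely illustrative assumptions.

```python
from itertools import product

def candidates(known_disk, parity_block, bits=4):
    """All (a, b) pairs for two lost disks that fit the surviving information."""
    target = known_disk ^ parity_block  # a XOR b must equal this value
    return [(a, b)
            for a, b in product(range(2 ** bits), repeat=2)
            if a ^ b == target]

pairs = candidates(0b1010, 0b0101)
print(len(pairs))                 # 16
print((0b1100, 0b0011) in pairs)  # True
```

The real contents (1100 and 0011) are among the candidates, but so are fifteen other combinations, and the parity gives no way to tell them apart.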
See also this blog post on the RAID 5 write penalty.
Would love to see a “How RAID6 actually works” from you. Or do you just add another layer with the exactly same parity?
I have plans to add blog posts about details in both RAID 10 and RAID 6, hope to be able to get time to write them down soon. Thanks for your comment!
Thank you very much for this article… Helped me to explain how RAID 5 works in my presentation.
Thanks brother
I am glad to hear that Jules, thanks for your comment.
Just a small question to get it right:
Isn’t it in RAID 5 that the calculated parities are NOT on one disk but rather spread on all disks?
Jules, that is correct, and the main difference between RAID4 and RAID5.
The parities are however stored together in a “stripe” whose size is not standardized and could be, for example, 128 KB. That means that a certain amount of parity bits will be stored on the same physical disk, and then the next disk will hold the parity and so on.
In the examples above we look at very few binary bits in detail and how they are calculated. They are however assumed to be located together in one “stripe”, but since that is not explicitly pointed out I understand your question. :)
Regards, Rickard
Thanks a lot for the explanation.
Very nice explanation. Thank you for this.
Does this mean then.. that if you removed an SAS drive from a RAID array e.g. Dot Hill AssuredSAN, you would not be able to retrieve data from an individual drive without having it connected with all the other original drives?
I am in this position, and wish to sell the Seagate SAS drives individually but do not know how to wipe them, or even if I need to, before I sell.
Thanks.
No, not quite. RAID 5 doesn’t guarantee that file data will be split among drives. It records data based on cluster size. Some data — especially small chunks of information, such as credit card numbers, and some personally identifiable information — may reside on one drive. There’s more than enough complete information on a single drive from a RAID 5 setup to make it a security risk if sensitive information was on that array.
Upshot: if you’re disposing of a single drive from a RAID-5 setup that contained sensitive data, wipe or physically destroy the drive.
simple and clear explanation..
Just to be sure, Rickard: if I used RAID 5 with just 3 disks, does it mean that my 1st disk saves 0, my 2nd disk saves 1, and my 3rd disk saves both 0 and 1?
Hello Jien, no, not really. If we assume that the third disk holds the parity for this particular data then it will store a “1”.
Hello Dear Rickard Nobel,
I have read your article about how raid 5 works and now I have few questions. (First, I have quoted some sentences from your article and then I’ve asked my questions)
>> we now have to calculate the parity information for the fourth disk.
– Does only the fourth disk include the parity bits? I read an article somewhere and it said that every disk in the RAID 5 set includes parity bits, not just the fourth disk. I’m really confused about that.
>> If any of Disk number 1, 2 or 3 would break the parity information on Disk 4 could be used to recreate the missing data.
– What if Disk 4 dies? If we lose the fourth disk, will everything be gone because we’ve lost the parity information?!
Thanks.
Reza: I used to have a bit of a problem understanding this as well. The “n”th drive is not a parity drive – the data is stored on all the disks. The parity data is stored on all the disks.
If you have 4 1TB drives, you can have a RAID 5 array of 3TB.
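I’ll give eight bits for each of three drives, plus the parity. In my example, we want a parity of 1, that is, odd parity: every bit column, parity included, contains an odd number of 1s. (This is the inverse of the plain XOR result used in the article.)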
10010110 – Disk 1
11000011 – Disk 2
01011010 – Disk 3
11110000 – Disk 4 [parity]
(I did it deliberately. :) )
Even though I labeled them Disk 1 through 4, the order doesn’t matter – the “Disk 4” is the parity stripe, and could be on any of the drives.
The idea here is to have data on N-1 disks, and the last disk is a parity stripe from which, together with the other N-2 data disks, you can recreate a missing disk’s data.
If Disk 4 dies, we can recalculate all the missing data from the other three.
If it died and we replaced it, the system will rebuild the data: if it was a data stripe, the parity stripe will let us “rebuild” the missing data. If it was a parity stripe, it will recalculate the parity.
RAID 4 has a dedicated parity drive
RAID 5 has the parity spread out so that each drive is being used for parity AND data. (This helps in a reduced system when a drive fails, as only 1/N of the data needs to be rebuilt from parity instead of all the data.)
I hope that clears it up a bit for you (a year and then some later) or for others who also wanted this answered.
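The eight-bit example in this comment can be checked with a short sketch. One detail to be aware of: the parity shown there (11110000) is the inverted XOR of the three data values, i.e. odd parity, so the code flips all bits after the XOR. The names `odd_parity` and `MASK` are assumptions made for this illustration only.

```python
from functools import reduce
from operator import xor

MASK = 0xFF  # eight bits per disk in this example

def odd_parity(blocks):
    """XOR all blocks, then invert, so every bit column has an odd count of 1s."""
    return reduce(xor, blocks, 0) ^ MASK

d1, d2, d3 = 0b10010110, 0b11000011, 0b01011010
p = odd_parity([d1, d2, d3])
print(format(p, "08b"))  # 11110000

# Pretend Disk 2 is lost: the same function over the survivors recovers it.
recovered = odd_parity([d1, d3, p])
print(format(recovered, "08b"))  # 11000011
```

Whether parity is stored as the plain or the inverted XOR makes no difference to the recovery principle, as long as the same rule is used for writing and rebuilding.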
What you forget in your example: using RAID 5 with 4 disks, parity and data will ALWAYS be striped equally across ALL the disks.
FIRST BLOCK
Disk1 – DATA
Disk2 – DATA
DISK3 – DATA
DISK4 – Calculated Parity
NEXT BLOCK (2)
Disk1 – Calculated Parity
Disk2 – DATA
DISK3 – DATA
DISK4 – DATA
Block 3
Disk1 – DATA
Disk2 – Calculated Parity
DISK3 – DATA
DISK4 – DATA
Block 4
Disk1 – DATA
Disk2 – DATA
DISK3 – Calculated Parity
DISK4 – DATA
So after 4 blocks it will look like this (one row per disk, one column per block; D = DATA, P = PARITY):
Disk 1: DPDD
Disk 2: DDPD
Disk 3: DDDP
Disk 4: PDDD
OK??
Greetings from Germany and thanks for the great article by Rickard.
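The rotation described in this comment can be generated with a small sketch. It follows the order given above (parity on Disk 4 for the first block, then Disk 1, Disk 2 and so on); real controllers and software RAID may rotate parity in a different order, and the function names here are hypothetical.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Zero-based index of the disk holding parity for a given stripe."""
    return (stripe + n_disks - 1) % n_disks

def layout(n_stripes: int, n_disks: int):
    """One string per disk; each character is D (data) or P (parity) per stripe."""
    return ["".join("P" if parity_disk(s, n_disks) == disk else "D"
                    for s in range(n_stripes))
            for disk in range(n_disks)]

for number, row in enumerate(layout(4, 4), start=1):
    print(f"Disk {number}: {row}")
```

With four disks and four stripes every disk ends up holding parity exactly once, which is the point of distributing it.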
You are an actual legend Rickard.
Brilliant!
I’ve been looking for an explanation of how RAID 5 parity works with more than 3 disks for a long time. This is by far the best interpretation I’ve found.
Thank you very much :)
Thank you Sir. I have been looking for this logic for quite some time. Explained well.
Thank you very much for this article…
Still, I have one doubt on my mind: how is data stored in a degraded RAID 5? Suppose four physical drives are part of one RAID 5 VD, and assume drive 4 fails unexpectedly. How will data be written to the VD now? Will parity still be calculated for the data, or will only the data be stored?
Thank you in advance.
Hello Rakesh,
I will assume that NEW data to a degraded RAID 5 volume will be stored as if the failed disk was actually in place. That is, sometimes the parity will be missing and sometimes the data will be missing. However, when the failed disk is replaced it will be rebuilt using the remaining information and afterwards the RAID5 set will be “complete”.
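The behaviour assumed in this reply can be sketched as follows. This is only an illustration of that assumption, not a description of how any particular controller works; `write_stripe` and `rebuild` are hypothetical names, and `None` marks the chunk that could not be written because its disk has failed.

```python
from functools import reduce
from operator import xor

def write_stripe(data_chunks, parity_index, failed):
    """Lay out one stripe as if all disks were present, skipping the failed one."""
    chunks = list(data_chunks)
    chunks.insert(parity_index, reduce(xor, data_chunks, 0))
    return [None if i == failed else c for i, c in enumerate(chunks)]

def rebuild(stripe):
    """Fill in the missing chunk (data or parity) by XOR over the survivors."""
    missing = stripe.index(None)
    value = reduce(xor, (c for c in stripe if c is not None), 0)
    return stripe[:missing] + [value] + stripe[missing + 1:]

data = [0b1010, 0b1100, 0b0011]
# Disk 4 (index 3) has failed. In one stripe it would have held the parity...
s1 = write_stripe(data, parity_index=3, failed=3)
# ...and in another stripe it would have held a data chunk.
s2 = write_stripe(data, parity_index=0, failed=3)
print(rebuild(s1))  # [10, 12, 3, 5]
print(rebuild(s2))  # [5, 10, 12, 3]
```

In both cases the replaced disk gets its chunk back through the same XOR, whether that chunk was data or parity.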
Thank you Sir, this article helped me a lot.
I would like to know more about striping and mirroring in the initial levels of RAID.
If you could illustrate with an example how binary data is striped and mirrored in RAID 0 and RAID 1, it would help me a lot.
Looking forward to your new article on striping and mirroring.
Thank You
Hello Likhith,
thank you for your comment.
As for RAID 0 and RAID 1, they are actually quite straightforward.
In RAID 0 the data is just written across the disks in the set, but only once and with no parity or other redundancy.
So, if you want to store 11110000 00001111 it would (simplified) be:
Disk 1: 11110000
Disk 2: 00001111
For RAID 1 it would just be an exact copy of the data, so if you wanted to store 10101010 it would be (simplified, very small stripe size):
Disk 1: 10101010
Disk 2: 10101010
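The two simplified examples above can be written as a short sketch. It assumes a chunk of one byte per disk, which is far smaller than any real stripe size, and the function names `raid0_write` and `raid1_write` are made up for illustration.

```python
def raid0_write(data: bytes, n_disks: int, chunk: int = 1):
    """Striping: hand out consecutive chunks to the disks in round-robin order."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]

def raid1_write(data: bytes, n_disks: int = 2):
    """Mirroring: every disk receives a full copy of the data."""
    return [bytes(data) for _ in range(n_disks)]

striped = raid0_write(bytes([0b11110000, 0b00001111]), n_disks=2)
mirrored = raid1_write(bytes([0b10101010]))
print([format(d[0], "08b") for d in striped])   # ['11110000', '00001111']
print([format(d[0], "08b") for d in mirrored])  # ['10101010', '10101010']
```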
Thank you for your reply Sir,
But could you explain in more detail how exactly striping happens, and what algorithm divides the data and then stores it among the disks?
Just as you explained how data is regenerated using parity in RAID 5, please explain how striping happens when a byte of data is fed into the RAID 1.
thank you
Hello Likhith,
the actual distribution depends on something called “stripe size” which is the amount of data written to the same disk. This size is not a standard but will differ depending on the actual RAID implementation.
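As a rough sketch of how the stripe size decides where data lands, the function below maps a logical byte offset to a disk and an offset on that disk for a plain RAID 0 set. The name `locate` and the 128 KB figure are assumptions for illustration; actual implementations differ in details.

```python
def locate(offset: int, stripe_size: int, n_disks: int):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk = offset // stripe_size   # which stripe-sized chunk the byte is in
    disk = chunk % n_disks          # chunks are handed out round-robin
    row = chunk // n_disks          # how many chunks precede it on that disk
    return disk, row * stripe_size + offset % stripe_size

STRIPE = 128 * 1024  # example stripe size of 128 KB
print(locate(0, STRIPE, 2))           # (0, 0)
print(locate(STRIPE + 5, STRIPE, 2))  # (1, 5)
print(locate(2 * STRIPE, STRIPE, 2))  # (0, 131072)
```

The first 128 KB go to the first disk, the next 128 KB to the second disk, and the third chunk wraps around to the first disk again.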
Thank You so much for sharing your knowledge Sir. it helped me a lot .
This is by far the best explanation for raid 5 that I’ve come across. Does using a hot spare with a raid 5 array increase the tolerance to 2 disk failures considering the data of the 1st failed drive is already rebuilt to my hot spare and the 2nd failed drive will not result in any data loss ?
Hello Hari,
no, having a spare disk could not be said to truly increase the fault tolerance to two disks. In a “true” 2-disk-fault-tolerance system, like RAID6, any two disks could break at the same second and the volume would still be available.
This would not be possible with RAID 5 plus a spare disk, since the rebuild onto the spare has to finish before another drive can fail. If however you have RAID 5 plus a spare and one disk fails, the spare is rebuilt, and the next day a second disk fails, then the volume would still work. But this is not guaranteed if the failures come close together in time.
So, can we say that the parity disk actually holds the data which is calculated from other disks during each block striping?