Compared to other RAID levels, RAID 5 has a higher write overhead. In this article we will look in some detail at why there is a larger “penalty” for writing to RAID 5 disk systems.
In a RAID 5 set with any number of disks, we calculate parity information for each stripe. See this article on how the RAID 5 parity works. In short, we use the XOR operation on the binary bits across all disks and save the result on the parity disk. For example, if we have an eight disk set the actual data is saved on seven disks and parity on the last disk, see the picture above.
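As a quick illustration, the stripe parity is just the XOR of all the data blocks. The 4-bit values below are made-up contents for the seven data disks of an eight-disk set:

```python
from functools import reduce

# Hypothetical 4-bit contents of the seven data disks in an eight-disk set.
data_disks = [0b0101, 0b0011, 0b1100, 0b1010, 0b0110, 0b1111, 0b0001]

# The parity block for the stripe is simply the XOR of all data blocks.
parity = reduce(lambda a, b: a ^ b, data_disks)
print(f"{parity:04b}")  # → 1000
```

If any single disk is lost, XOR-ing the remaining six data blocks with this parity block recovers the missing data.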
A disadvantage of RAID 5 is how small write IOs against the disk system are handled. Even if the write IO only affects the data on one disk, we still need to calculate the new parity. Since the parity, as explained in the other article, is created by using XOR on all disks, this could now be done in two ways. One way would be to read all the other disks and then XOR them with the new information. This would, however, cause a very large overhead, and it is not reasonable to block all the other disks for just one write.
There is however a quite clever way to calculate the new parity with a minimum of disk IO.
Assume we have the following eight disks and a write should be done at the fifth disk, which should be changed to, say, 1111. (For simplicity we will only look at four bits at each disk, but this could be of any size.)
To get the new parity, some actions have to be performed. First we read the old data in the blocks that are about to be changed. We can call this “Disk5-Old”, and reading it is the first IO that must be done. The data that should be written, here 1111, can be called Disk5-New.
Disk5-Old = 0110
Disk5-New = 1111
We will now use XOR on the old and the new data to calculate the difference between them. We can call this Disk5-Delta.
Disk5-Delta = Disk5-Old XOR Disk5-New = 0110 XOR 1111 = 1001
When we know the “delta”, we have to perform another read, this time against the old parity. We call this Parity-Old; in this example the old parity is 0010. We will now XOR the old parity with Disk5-Delta. What is quite interesting is that this creates the new parity, without any need to read the other six disks.
Parity-New = Parity-Old XOR Disk5-Delta = 0010 XOR 1001 = 1011
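The shortcut can be checked in a few lines. The values for the disks other than disk 5 are made up, chosen so that the stripe's old parity comes out to 0010 as in the example:

```python
from functools import reduce

def xor_all(blocks):
    return reduce(lambda a, b: a ^ b, blocks)

# Made-up 4-bit contents for the seven data disks; disk 5 (index 4) holds
# 0110, and the whole stripe XORs to the old parity 0010, as in the example.
data = [0b1000, 0b0010, 0b0100, 0b0001, 0b0110, 0b0011, 0b1000]
parity_old = xor_all(data)                 # 0b0010

disk5_new = 0b1111
delta = data[4] ^ disk5_new                # 0b1001
parity_new = parity_old ^ delta            # the shortcut: 0b1011

# Recomputing the parity over all seven disks gives the same result.
data[4] = disk5_new
assert parity_new == xor_all(data)
print(f"{parity_new:04b}")  # → 1011
```

The assertion confirms that XOR-ing the delta into the old parity is equivalent to a full recomputation over the whole stripe.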
When we know the new parity, we can write both the new data block and the new parity. This causes two write IOs against the disks and makes up the rest of the “penalty”.
So in summary, these are the disk actions that must be performed:
1. Read the old data
2. Read the old parity
3. Write the new data
4. Write the new parity
This means that each write against a RAID 5 set causes four IOs against the disks, where the first two must be completed before the last two can be performed, which introduces some additional latency.
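The four steps above can be sketched as a small function. Note that `read_block` and `write_block` here are assumed helper callbacks standing in for real disk IO, not a real controller API:

```python
# Hypothetical sketch of a RAID 5 controller's read-modify-write path;
# read_block and write_block are assumed callbacks, not a real API.

def raid5_small_write(read_block, write_block,
                      data_disk, parity_disk, offset, new_data):
    old_data = read_block(data_disk, offset)          # IO 1: read old data
    old_parity = read_block(parity_disk, offset)      # IO 2: read old parity
    new_parity = old_parity ^ (old_data ^ new_data)   # pure XOR, no extra IO
    write_block(data_disk, offset, new_data)          # IO 3: write new data
    write_block(parity_disk, offset, new_parity)      # IO 4: write new parity

# Toy in-memory "disks" to exercise the sequence with the article's numbers:
disks = {("d5", 0): 0b0110, ("p", 0): 0b0010}
raid5_small_write(lambda d, o: disks[(d, o)],
                  lambda d, o, v: disks.__setitem__((d, o), v),
                  "d5", "p", 0, 0b1111)
print(f"{disks[('p', 0)]:04b}")  # → 1011
```

The dependency is visible in the code: the two writes cannot start until both reads have returned, which is where the extra latency comes from.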
Excellent explanation!
Thanks a lot,
Vivek
Excellent and very clear explanation of RAID 5!!!
Really a very good explanation.
Excellent. Good explanation.
What a great explanation .. Thanks a lot!
Excellent …. Very good explanation.
Can you please also explain RAID 10/01 and RAID 6?
Thanks in advance !!!
Very good explanation.
One thing bothers me. I am searching high and low for an answer, and you seem able to provide just that.
If we do frontend and backend IOPS calculations for RAID 5 based on a write penalty of 4 regardless of how many disks are in the RAID 5 array, then does it mean 2 sets of 3+1 R5 gives the same total performance as 1 set of 7+1 R5?
Surely they don’t? What are the other factors that we need to take into account in order to get a closer approximation of the estimated performance?
Oh, never mind. I found another excellent article at holyhandgrenade.org and managed to figure it out.
The backend IOPS calculation really has to discount the parity disk (many blogs got this wrong). It also depends on a myriad of other variables, i.e. segment size, stripe width, I/O size, which changes the write penalty which in turn affects the frontend IOPS.
In my example, with these variables being constant (save for I/O size in the event of full stripe write), 7+1 R5 can outperform 2x 3+1 R5, as it should be.
Many calculators out there, but it still strikes me as odd that wmarow’s calculator always produces exactly half the value of my calculation results for a few scenarios that I tried.
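For what it's worth, the flat-penalty estimate being discussed can be sketched as follows. This is the common first-order formula with an assumed penalty of 4 for every small write (the disk speed and read/write mix are made-up numbers), which is exactly the simplification that makes the two layouts look identical:

```python
def raid5_frontend_iops(total_disks, iops_per_disk, read_fraction):
    # Backend IOPS budget divided by the weighted cost of the IO mix:
    # 1 backend IO per read, 4 backend IOs per small RAID 5 write.
    backend = total_disks * iops_per_disk
    return backend / (read_fraction + 4 * (1 - read_fraction))

# With a flat penalty of 4, two 3+1 sets and one 7+1 set come out the same:
two_small = 2 * raid5_frontend_iops(4, 150, 0.7)
one_large = raid5_frontend_iops(8, 150, 0.7)
print(two_small == one_large)  # → True
```

As the comment above notes, real results diverge once segment size, stripe width, IO size, and full-stripe writes are taken into account, since those change the effective penalty per write.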
Very well explained.
Excellent … it was not explained like that in the storage documents …
Your excellent article was pirated by EMC (iEMC APJ, an EMC team), with Fenglin Li claiming to be the author, at https://community.emc.com/thread/221272. It is almost a straight copy of your original. The same person also made a Chinese translation: https://community.emc.com/docs/DOC-26624.
As a Chinese person, I am ashamed of this.
“For example if we have an eight disk set the actual data is saved on seven disks and parity on the last disk…”
Above statement describes RAID4, not RAID5.
Hello Steve,
and thank you for your comment. At the beginning of the paragraph you are referring to, you will find “for each stripe”, which is the RAID 5 way. If we kept the same parity disk for all stripes, you would be correct that it would be RAID 4.
Regards, Rickard
This is just excellent, thank you very much! For the first time in years of administration (shame) I fully understand! Now I should also be able to explain RAID to my colleagues in a better and more correct way.
Regards, Markus
Great post, I linked on my blog! 🙂
I really like it, a nice and very clear explanation :D, greetings from México, your lovely friend Miguel.
Well, thank you, Miguel. 🙂
A very nice explanation, Rickard. I totally liked the way you've detailed it here.
You have explained here why there is a RAID 5 write penalty when *changing* blocks. But what if I am only writing new data to the drive, for example writing a very large single file? Then for each block, the controller could first calculate the parity and then write the data and the parity, with no need to read any existing blocks. The write penalty would then be minimal. Am I missing something?