
The write penalty of RAID 5

Posted on August 2, 2011 (updated May 7, 2013) by Rickard Nobel

Compared to other RAID levels, RAID 5 has a higher write overhead. In this article we will see in some detail why there is a larger “penalty” for writing to RAID 5 disk systems.

RAID 5 disks
In a RAID 5 set with any number of disks, parity information is calculated for each stripe. See this article on how the RAID 5 parity works. In short, we use the XOR operation on all binary bits on the data disks and save the result on the parity disk for that stripe. For example, if we have an eight disk set the actual data is saved on seven disks and parity on the last disk.
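
As a small illustration, here is a minimal Python sketch of how the parity for one stripe could be computed. The disk contents are made up for this example, but they are chosen so that the fifth disk holds 0110 and the parity becomes 0010, matching the worked example further down.

from functools import reduce

# Hypothetical 4-bit data blocks for one stripe across the seven data disks.
data_disks = [0b1010, 0b0011, 0b1111, 0b0110, 0b0110, 0b0001, 0b0101]

# The parity block is the XOR of all data blocks in the stripe.
parity = reduce(lambda a, b: a ^ b, data_disks)

print(f"Parity = {parity:04b}")  # 0010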

A disadvantage with RAID 5 is the cost of writing small IOs against the disk system. Even if the write IO only affects the data on one disk, we still need to calculate the new parity. Since the parity, as explained in the other article, is created by using XOR on all the data disks, this could be done in two ways. The first option is to read all the other disks and then XOR their contents with the new information. This would however cause a very large overhead, and it is not reasonable to block all the other disks for just one write.
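
In code, the naive approach would look something like this sketch, reusing the hypothetical stripe values from above: read the six other data disks and XOR their contents together with the new data (1111 in the example that follows).

from functools import reduce

# Hypothetical contents of the six other data disks (six read IOs).
other_disks = [0b1010, 0b0011, 0b1111, 0b0001, 0b0101, 0b0110]
disk5_new = 0b1111  # the new data for the fifth disk

# New parity = new data XOR all the other data disks.
parity_new = reduce(lambda a, b: a ^ b, other_disks, disk5_new)

print(f"Parity-New = {parity_new:04b}")  # 1011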

There is however a quite clever way to calculate the new parity with a minimum of disk IO.

RAID 5 write

Assume we have an eight disk set and that a write should be done at the fifth disk, whose data should be changed to, say, 1111. (For simplicity we will only look at four bits on each disk, but this could be of any size.)

To get the new parity, some actions have to be performed. First we read the old data in the blocks that should be changed. We can call this “Disk5-Old”, and this read is the first IO that must be done. The data that should be written, here 1111, can be called Disk5-New.

Disk5-Old = 0110
Disk5-New = 1111

We will now use XOR on the old and the new data to calculate the difference between them. We can call this Disk5-Delta.

Disk5-Delta = Disk5-Old XOR Disk5-New = 0110 XOR 1111 = 1001

When we know the “delta” we have to perform another read, this time of the old parity. We call this Parity-Old; in this example the old parity is 0010. We will now XOR the old parity with Disk5-Delta. What is quite interesting is that this creates the new parity, without any need to read the other six data disks.

Parity-New = Parity-Old XOR Disk5-Delta = 0010 XOR 1001 = 1011

When we know the new parity we can write both the new data block and the new parity block. This causes two write IOs against the disks and makes up the rest of the “penalty”.
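
The whole read-modify-write sequence can be summarized in a minimal Python sketch, using the values from the example above:

disk5_old = 0b0110   # read IO 1: the old data on the fifth disk
parity_old = 0b0010  # read IO 2: the old parity for the stripe
disk5_new = 0b1111   # the new data that should be written

# The "delta" between the old and the new data.
disk5_delta = disk5_old ^ disk5_new    # 0110 XOR 1111 = 1001

# XOR the delta into the old parity to get the new parity,
# without touching the other six data disks.
parity_new = parity_old ^ disk5_delta  # 0010 XOR 1001 = 1011

print(f"Disk5-Delta = {disk5_delta:04b}")  # 1001
print(f"Parity-New  = {parity_new:04b}")   # 1011

# Write IO 3: disk5_new goes to the fifth disk.
# Write IO 4: parity_new goes to the parity disk.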

So in summary, these are the disk actions that must be done:

1. Read the old data
2. Read the old parity
3. Write the new data
4. Write the new parity

This means that each small write against a RAID 5 set causes four IOs against the disks, where the first two must be completed before the last two can be performed, which introduces some additional latency.
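
As a rough sketch (not from the article itself), this write penalty of four is often used to estimate the backend load that a given frontend workload puts on the disks. The workload numbers below are hypothetical:

frontend_reads = 600   # hypothetical read IOPS from the hosts
frontend_writes = 200  # hypothetical write IOPS from the hosts

# Each read is a single backend IO, while each small write becomes four
# backend IOs: read old data, read old parity, write new data, write new parity.
backend_iops = frontend_reads * 1 + frontend_writes * 4

print(f"Backend IOPS required: {backend_iops}")  # 1400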

21 thoughts on “The write penalty of RAID 5”

  1. Vivek Singh says:
    March 25, 2013 at 11:44

    Excellent explanation!

    Thanks a lot,

    Vivek

  2. Saravanan says:
    April 7, 2013 at 16:47

    Excellent and very clear explanation of RAID 5!!!

  3. Samit Mamdol says:
    June 29, 2013 at 09:48

    Really a very good explanation.

  4. Lim BT says:
    October 4, 2013 at 08:35

    Excellent. Good explanation.

  5. Muditha says:
    October 6, 2013 at 10:49

    What a great explanation .. Thanks a lot!

  6. Sudhakar says:
    October 22, 2013 at 11:29

    Excellent …. Very good explanation.

    Can you please also explain RAID 10/01 and RAID 6?

    Thanks in advance !!!

  7. Casey says:
    March 31, 2014 at 18:12

    Very good explanation.

    One thing bothers me. I am searching high and low for an answer, and you seem able to provide just that.

    If we do frontend and backend IOPS calculations for RAID 5 based on a write penalty of 4 regardless of how many disks are in the RAID 5 array, then does it mean 2 sets of 3+1 R5 gives the same total performance as 1 set of 7+1 R5?

    Surely they don’t? What are the other factors that we need to take into account in order to get a closer approximation of the estimated performance?

    1. Casey says:
      April 1, 2014 at 18:46

      Oh, never mind. I found another excellent article at holyhandgrenade.org and managed to figure it out.

      The backend IOPS calculation really has to discount the parity disk (many blogs get this wrong). It also depends on a myriad of other variables, e.g. segment size, stripe width and I/O size, which change the write penalty and in turn affect the frontend IOPS.

      In my example, with these variables held constant (save for I/O size in the event of a full stripe write), 7+1 R5 can outperform 2x 3+1 R5, as it should.

      There are many calculators out there, but it still strikes me as odd that wmarow’s calculator always produces exactly half the value of my calculation results for a few scenarios that I tried.

  8. Aashutosh says:
    June 4, 2015 at 14:11

    Very well explained.

  9. hassan says:
    November 14, 2015 at 11:02

    Excellent… it was not explained like that in the storage documents…

  10. Vick says:
    February 23, 2016 at 13:08

    Your excellent article was pirated by EMC (iEMC APJ, an EMC team), with Fenglin Li claiming to be the author, at https://community.emc.com/thread/221272. It is almost a straight copy of your original. The same person also made a Chinese translation: https://community.emc.com/docs/DOC-26624.

    As a Chinese person, I am so ashamed.

  11. Steve says:
    March 13, 2016 at 19:13

    “For example if we have an eight disk set the actual data is saved on seven disks and parity on the last disk…”

    The above statement describes RAID 4, not RAID 5.

    1. Rickard Nobel says:
      March 13, 2016 at 20:05

      Hello Steve,

      and thank you for your comment. At the beginning of the paragraph you are referring to, you will find “for each stripe”, which is the RAID 5 way. If we kept the same parity disk for all stripes, you would be correct that it would be RAID 4.

      Regards, Rickard

  12. Pingback: raid performance » trivia
  13. Markus says:
    June 21, 2016 at 09:41

    This is just excellent, thank you very much! For the first time in years of administration (shame) I fully understand! Now I should also be able to explain RAID to my colleagues in a better and more correct way.

    Regards, Markus

  14. Kiss Tibor says:
    November 9, 2016 at 21:32

    Great post, I linked to it on my blog! 🙂

  15. Pingback: RAID5 penalty | VMware Admin's Blog
  16. Miguel Uriel Méndez Monterrosas says:
    November 28, 2016 at 09:11

    I really like it, a nice and very clear explanation :D, greetings from México, your lovely friend Miguel.

    1. Rickard Nobel says:
      December 1, 2016 at 08:47

      Well, thank you, Miguel. 🙂

  17. shalom says:
    January 18, 2017 at 10:02

    A very nice explanation, Rickard. I totally liked the way you’ve detailed it here.

  18. Elliott Balsley says:
    April 20, 2018 at 01:24

    You have explained here why the RAID5 write penalty occurs when *changing* blocks. But what if I am only writing new data to the drive, for example writing a very large single file? Then for each block, the controller should first calculate the parity, then write the data and the parity. No need to read any existing blocks. Then the write penalty would be minimal. Am I missing something?

