The write penalty of RAID 5

By | August 2, 2011

Compared to other RAID levels, RAID 5 has a higher write overhead. In this article we will look in some detail at why there is a larger “penalty” for writing to RAID 5 disk systems.

RAID 5 disks
In a RAID 5 set with any number of disks we calculate parity information for each stripe. See this article on how the RAID 5 parity works. In short, we use the XOR operation on the corresponding bits of all data disks and save the result as the parity of that stripe. For example if we have an eight disk set the actual data is saved on seven disks and parity on the last disk, see picture above. Which disk holds the parity changes from one stripe to the next, so no single disk is dedicated to parity.
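A minimal Python sketch of the per-stripe parity calculation follows. The seven data values are made up, and the rotation rule in the hypothetical `parity_disk` helper is only one possible placement scheme, assumed here for illustration; real controllers may lay parity out differently.

```python
from functools import reduce
from operator import xor

def stripe_parity(data_blocks):
    """XOR all data blocks of one stripe together to get its parity block."""
    return reduce(xor, data_blocks, 0)

def parity_disk(stripe_index, disk_count):
    """Hypothetical rotation: stripe 0 keeps parity on the last disk,
    and each following stripe moves it one disk to the left."""
    return (disk_count - 1 - stripe_index) % disk_count

# Seven hypothetical 4-bit data blocks in an eight disk set
data = [0b0001, 0b0010, 0b0100, 0b1000, 0b0110, 0b1010, 0b1111]
parity = stripe_parity(data)
print(f"parity = {parity:04b}")              # parity = 1100
print([parity_disk(s, 8) for s in range(4)])  # [7, 6, 5, 4]
```

Because every bit of the parity is the XOR of the matching bits on the data disks, XORing all eight blocks of a stripe (data plus parity) always gives zero.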

A disadvantage of RAID 5 is the handling of small write IOs. Even if a write IO affects the data on only one disk, we still need to calculate the new parity. Since the parity, as explained in the other article, is created by XORing all disks, this can be done in two ways. The first is to read the corresponding blocks from all the other data disks and XOR them with the new data. This would, however, cause a very large overhead, and it is not reasonable to keep all the other disks busy for just one write.
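The first, naive way can be sketched in Python as below. The stripe contents are hypothetical, and the function `naive_parity_update` is an illustrative name, not an actual controller routine; the point is only to count how many disk IOs a single small write costs with this method.

```python
from functools import reduce
from operator import xor

def naive_parity_update(stripe, target, new_value):
    """Recompute parity by reading every other data block in the stripe.

    `stripe` holds the data blocks only (no parity) and is updated in place.
    Returns (new_parity, reads, writes) for this one logical write.
    """
    others = [block for i, block in enumerate(stripe) if i != target]
    reads = len(others)              # one read per untouched data disk
    new_parity = reduce(xor, others, new_value)
    stripe[target] = new_value
    writes = 2                       # new data block + new parity block
    return new_parity, reads, writes

# Hypothetical eight disk set: seven data blocks, the fifth (index 4) is 0110
stripe = [0b1100, 0b1010, 0b0011, 0b0101, 0b0110, 0b1001, 0b1101]
new_parity, reads, writes = naive_parity_update(stripe, 4, 0b1111)
print(f"parity={new_parity:04b} reads={reads} writes={writes}")
# parity=1011 reads=6 writes=2
```

With seven data disks this costs six reads plus two writes, and the number of reads grows with the width of the set.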

The second way, however, is quite clever and calculates the new parity with a minimum of disk IO.

RAID 5 write

Assume we have the following eight disks and a write should be done to the fifth disk, whose data should be changed to, say, 1111. (For simplicity we will only look at four bits on each disk, but the blocks could be of any size.)

To get the new parity, a few actions have to be performed. First we read the old data in the blocks that should be changed. We call this “Disk5-Old”, and reading it is the first IO that must be done. The data that should be written, here 1111, we call Disk5-New.

Disk5-Old = 0110
Disk5-New = 1111

We now XOR the old and the new data to calculate the difference between them. We call this Disk5-Delta.

Disk5-Delta = Disk5-Old XOR Disk5-New = 0110 XOR 1111 = 1001
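The delta calculation is a single XOR, shown here in Python with the example values from the text:

```python
disk5_old = 0b0110
disk5_new = 0b1111

# XOR yields a 1 exactly in the bit positions where old and new data differ
disk5_delta = disk5_old ^ disk5_new
print(f"Disk5-Delta = {disk5_delta:04b}")  # Disk5-Delta = 1001
```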

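When we know the “delta” we have to perform another read, this time of the old parity. We call this Parity-Old; in this example the old parity is 0010. We now XOR the old parity with Disk5-Delta. What is quite interesting is that this produces the new parity without the need to read the other six disks. It works because XOR is its own inverse: XORing the parity with Disk5-Old removes the old data's contribution, and XORing it with Disk5-New adds the new data's contribution, which is exactly what XORing with Disk5-Delta does in one step.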

Parity-New = Parity-Old XOR Disk5-Delta = 0010 XOR 1001 = 1011
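To check that this shortcut gives the same result as reading the whole stripe, the Python sketch below invents contents for the six untouched data disks (hypothetical values, chosen only so that the old parity comes out as 0010 as in the example) and computes the new parity both ways.

```python
from functools import reduce
from operator import xor

# Hypothetical contents of the six data disks not involved in the write
other_disks = [0b1100, 0b1010, 0b0011, 0b0101, 0b1001, 0b1101]

disk5_old, disk5_new = 0b0110, 0b1111
parity_old = reduce(xor, other_disks, disk5_old)   # 0010, as in the text

# Shortcut: only Disk5-Old and Parity-Old are read
parity_new_rmw = parity_old ^ (disk5_old ^ disk5_new)

# Full recomputation: XOR the new data with every other data disk
parity_new_full = reduce(xor, other_disks, disk5_new)

print(f"{parity_old:04b} {parity_new_rmw:04b} {parity_new_full:04b}")
# 0010 1011 1011
```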

When we know the new parity we can write both the new data block and the new parity. This causes two write IOs against the disks and makes up the last of the “penalty”.

So, in summary, these are the disk actions that must be done:

1. Read the old data
2. Read the old parity
3. Write the new data
4. Write the new parity
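The four steps can be modelled with a toy disk that counts its own IOs. The `Disk` class and `raid5_small_write` function are hypothetical names for illustration only; a real controller also has to handle caching, ordering and failures.

```python
class Disk:
    """Toy disk holding one block of a stripe and counting its IOs."""

    def __init__(self, block):
        self.block = block
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return self.block

    def write(self, block):
        self.writes += 1
        self.block = block

def raid5_small_write(data_disk, parity_disk, new_data):
    # Steps 1 and 2: both reads must finish before the parity can be computed
    old_data = data_disk.read()
    old_parity = parity_disk.read()
    new_parity = old_parity ^ old_data ^ new_data
    # Steps 3 and 4: write the new data and the new parity
    data_disk.write(new_data)
    parity_disk.write(new_parity)

disk5 = Disk(0b0110)
parity = Disk(0b0010)
raid5_small_write(disk5, parity, 0b1111)
total_ios = disk5.reads + disk5.writes + parity.reads + parity.writes
print(f"parity={parity.block:04b} ios={total_ios}")  # parity=1011 ios=4
```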

This means that each small write (one that does not cover a full stripe) against a RAID 5 set causes four IOs against the disks, where the first two must be completed before the last two can be performed, which introduces some additional latency.
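One common rule of thumb turns this penalty into a capacity estimate: divide the raw backend IOPS of all disks by the read fraction plus the write fraction multiplied by the penalty. The Python sketch below assumes a purely random small-IO workload with no cache and no full-stripe writes, and the disk count, per-disk IOPS and read ratio are made-up example values.

```python
def frontend_iops(disks, iops_per_disk, read_fraction, write_penalty=4):
    """Rough host-visible IOPS for a random small-IO workload.

    Assumes every host read costs one disk IO and every host write costs
    `write_penalty` disk IOs; caching and full-stripe writes are ignored.
    """
    backend_iops = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return backend_iops / (read_fraction + write_fraction * write_penalty)

# Hypothetical example: eight disks at 150 IOPS each, 70 % reads
print(round(frontend_iops(8, 150, 0.7)))  # 632
```

With only reads the estimate equals the raw 1200 IOPS of the disks, while with only writes it drops to a quarter of that.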

13 thoughts on “The write penalty of RAID 5”

  1. Sudhakar

    Excellent …. Very good explanation.

    Can you please also explain about Raid10/01 and raid6?

    Thanks in advance !!!

  2. Casey

    Very good explanation.

    One thing bothers me. I am searching high and low for an answer, and you seem able to provide just that.

    If we do frontend and backend IOPS calculations for RAID 5 based on a write penalty of 4 regardless of how many disks are in the RAID 5 array, then does it mean 2 sets of 3+1 R5 gives the same total performance as 1 set of 7+1 R5?

    Surely they don’t? What are the other factors that we need to take into account in order to get a closer approximation of the estimated performance?

    1. Casey

      Oh, nevermind. I found another excellent article at holyhandgrenade.org and managed to figure it out.

      The backend IOPS calculation really has to discount the parity disk (many blogs got this wrong). It also depends on a myriad of other variables, i.e. segment size, stripe width, I/O size, which changes the write penalty which in turn affects the frontend IOPS.

      In my example, with these variables being constant (save for I/O size in the event of full stripe write), 7+1 R5 can outperform 2x 3+1 R5, as it should be.

      Many calculators out there, but it still strikes me as odd that wmarow’s calculator always produces exactly half the value of my calculation results for a few scenarios that I tried.

  3. Steve

    “For example if we have an eight disk set the actual data is saved on seven disks and parity on the last disk…”

    Above statement describes RAID4, not RAID5.

    1. Rickard Nobel (post author)

      Hello Steve,

      and thank you for your comment. In the beginning of the paragraph you are referring to you will find “for each stripe”, which is the RAID5 way. If we would keep the same parity disk for all stripes you are correct that it would be RAID4.

      Regards, Rickard
