
Software RAID – Linux

Managing your server with software RAID is generally not difficult. We typically recommend hardware RAID (generally with a MegaRAID controller managed via MegaCLI); however, if you are utilizing software RAID, it is important to understand some fundamentals. If you are here because you are seeing failed or missing drive(s) on your server or in your myvelocity panel, feel free to skip ahead to section 4, Checking / Recovering Software RAID.

1) RAID Basics

In brief, RAID is a technology that allows you to build a virtual drive from two or more physical disks, whether for added performance, added redundancy, or both.

Inevitably, server hard drives fail. Regardless of how well built or highly rated they are, every drive will fail at some point. The problem is we never know when a drive will fail, and sometimes it happens instantly or faster than SMART reporting/monitoring can detect. What we do know is that a failure will come, and to help combat this it is common practice to run mirrored drives (RAID1) or another RAID level (5, 6, 10) that offers some form of redundancy.

For software RAID, our most common setup is a mirror of two drives (RAID1), so that if one drive fails, the system remains online on the other drive until a replacement can be scheduled. We will use a two-drive RAID1 for our example.

2) RAID Types

RAID0 – Data is striped across the included drives for increased performance. There is no redundancy with RAID0; if one drive fails in a RAID0 array, your data is likely unrecoverable.

RAID1 – The most common implementation for redundancy, with a minor increase in read performance. Typically run with 2x drives, each drive a mirror of the other.

RAID5 – Less common in our experience but can be a good solution; requires 3 or more drives. Data is striped across all drives with an added parity block for redundancy. If running a 3x drive RAID5, for example, you can lose one drive without data loss.

RAID6 – Similar to RAID5 but with a minimum of 4 disks, as two parity blocks are used for redundancy. This gives the ability to recover from 2 drive failures in a 4 drive array.

RAID10 – If running 4 or more drives, this is typically our recommended configuration. RAID10 combines striping and mirroring across a minimum of 4 drives. Because it is a stripe of mirrors, you can lose up to one drive from each mirror pair, so up to two drives in a 4 drive array depending on which drives fail.
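For reference, below is a rough sketch of how the mdadm create command changes per RAID level. These examples are illustrative only; the device names (/dev/sdb1 through /dev/sde1) and array name (/dev/md0) are placeholders, and the next section walks through the full RAID1 process step by step.

    # RAID0 stripe of two disks (no redundancy)
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1

    # RAID1 mirror of two disks (survives one drive failure)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

    # RAID5 across three disks (survives one drive failure)
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # RAID6 across four disks (survives two drive failures)
    mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

    # RAID10 stripe of mirrors across four disks (survives one failure per mirror pair)
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1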

3) Setup Software RAID

Ideally, software RAID would be set up during server provisioning or as part of a requested server reload. However, if your server was provisioned with two identical secondary drives that you want to dedicate to additional mirrored (RAID1) or striped (RAID0) storage, you can follow this section to accomplish that.

The steps below are written for a two-drive mirrored setup, referred to as RAID1.

  1. Install mdadm:

    yum install mdadm

  2. Locate your additional unused disks and note the device names (they will resemble /dev/sdX):

    lsblk
    # or
    fdisk -l
    # or
    blkid

  3. Examine the disks for any existing RAID data (Example drives: /dev/sdb and /dev/sdc):

    mdadm -E /dev/sdb
    mdadm -E /dev/sdc

  4. Create partition(s) on each drive (ensure the partition layouts are identical):

    fdisk /dev/sdb
    fdisk /dev/sdc

  5. Recheck the drives with the above tools (lsblk, fdisk) to ensure they are set up properly:

    lsblk
    # or
    fdisk -l

  6. Create RAID1 device:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

  7. Check the RAID device status:

    mdadm --detail /dev/md0

  8. Create file system on the RAID Device:

    mkfs.xfs /dev/md0

  9. Mount the device:

    mkdir /raidstorage
    mount /dev/md0 /raidstorage

  10. Test the mount point:

    cd /raidstorage
    touch newfile.txt
    echo "Test" > /raidstorage/newfile.txt
    cat /raidstorage/newfile.txt

  11. Auto-mount (if you want the additional RAID device to mount automatically on startup, follow the remaining steps).
  12. Make a backup of your fstab file:

    cp /etc/fstab /root/fstab_backup

  13. Check to ensure the backup and current fstab are identical:

    cat /etc/fstab
    cat /root/fstab_backup
    diff /etc/fstab /root/fstab_backup

  14. Add the line below to your /etc/fstab file (you can use the echo command below or edit the file directly with a text editor such as nano or vi):

    echo '/dev/md0 /raidstorage xfs defaults 0 0' >> /etc/fstab

  15. Check to ensure it looks good:

    cat /etc/fstab

  16. If everything looks great, try a reboot test when convenient:

    reboot
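Depending on your distribution, you may also want to record the array in mdadm's configuration file so it is assembled with a stable name at boot. The sketch below assumes a CentOS/RHEL-style layout (/etc/mdadm.conf and dracut); Debian/Ubuntu systems typically use /etc/mdadm/mdadm.conf and update-initramfs -u instead:

    # append the array definition so it assembles consistently as /dev/md0 at boot
    mdadm --detail --scan >> /etc/mdadm.conf
    cat /etc/mdadm.conf

    # rebuild the initramfs so the early boot environment also knows about the array
    dracut -f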

4) Checking / Recovering Software RAID

Suppose you have two drives and you find that a drive or a single partition is missing. This could be a total drive failure, or the drive or partition may have fallen out of sync. Falling out of sync can happen, particularly under heavy sustained load or if the drive is developing hardware problems.

To get a better idea of what disks are physically attached to the system, a great tool you can utilize is lsblk:

[root@test ~]# lsblk
NAME              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                 8:0    0 447.1G  0 disk  
├─sda1              8:1    0     1G  0 part  
│ └─md0             9:0    0  1022M  0 raid1 /boot
├─sda2              8:2    0     4G  0 part  
│ └─md1             9:1    0     4G  0 raid1 [SWAP]
└─sda3              8:3    0 442.1G  0 part  
  └─md2             9:2    0   442G  0 raid1 
    └─vg0-lv_root 253:0    0   442G  0 lvm   /
sdb                 8:16   0 447.1G  0 disk  
├─sdb1              8:17   0     1G  0 part  
│ └─md0             9:0    0  1022M  0 raid1 /boot
├─sdb2              8:18   0     4G  0 part  
│ └─md1             9:1    0     4G  0 raid1 [SWAP]
└─sdb3              8:19   0 442.1G  0 part  

We can see from the above example that both disks (sda and sdb) appear to be present for our two-drive RAID1 example. However, we would expect sdb3 to show the md2 / vg0-lv_root chain mounted on / just as sda3 does; instead, all we see is a bare partition.

NOTE – Issuing a support ticket: If you are seeing a missing physical drive here, it is possible the drive has failed or that its connection is loose. At this point we would recommend issuing a support ticket. We also recommend ensuring you have up-to-date backups and letting us know the best timeframe in which you can afford a 1-4 hour maintenance window for further inspection.

Proceeding with our example, to check this RAID further we can list the virtual (md) devices using the command below:

[root@test ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb1[1] sda1[0]
      1046528 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid1 sda3[0]
      463474688 blocks super 1.2 [2/1] [U_]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      4189184 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

Here again, for md2 we only find sda3; sdb3 is missing.
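If you have several arrays and want to quickly spot any running with a missing member, a small convenience check like the one below can help (it simply looks for status strings such as [U_] or [_U] in /proc/mdstat):

grep -E '\[U*_+U*\]' /proc/mdstat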

For additional insight, we can use mdadm --detail:

[root@test ~]# mdadm --detail /dev/md2
/dev/md2:
           Version : 1.2
     Creation Time : Thu Apr  1 11:28:15 2021
        Raid Level : raid1
        Array Size : 463474688 (442.00 GiB 474.60 GB)
     Used Dev Size : 463474688 (442.00 GiB 474.60 GB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Apr 19 15:04:40 2021
             State : clean, degraded 
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : test.raid1.com:2  (local to host test.raid1.com)
              UUID : 957b2d7f:0dc2cda2:d55bc5e1:ae4abc41
            Events : 309108

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       -       0        0        1      removed

We can see the array is in a clean but degraded state, and only /dev/sda3 is showing as attached (no /dev/sdb3).

In this case, since the drive is still attached to the system and has only been removed from the array, we can re-add the partition using the command below, rechecking the --detail output above periodically to monitor progress.

NOTE: Before re-adding, or if you find yourself having to re-add more often than seems reasonable, we recommend checking the drive's SMART health. If the drive is reporting bad health or issues, we recommend scheduling a 1-4 hour maintenance window with support (details above).

Check SMART health using smartctl -a /dev/{device}

[root@test ~]# smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.24.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SEDC500R480G
Serial Number:    500xxxxxxxxx60CE
LU WWN Device Id: 5 0026b7 683ff60ce
Firmware Version: SCEKJ2.7
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Apr 22 09:37:28 2021 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
.
.
.
SMART Error Log Version: 1
No Errors Logged
.
.
.

This is a large output with a lot of detail about the drive, but the lines above are what we usually look for first: a passing overall SMART self-assessment and whether or not errors are logged. A few logged errors do not necessarily mean the drive is failing, but a failing overall SMART assessment is a strong indicator. Diving further into the details can tell you things like power-on hours, power cycle counts, wear-out indicators, and much more.
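If you would like to narrow the output to a few commonly checked attributes rather than reading the whole report, something like the following can help. Note that attribute names vary between drive vendors and models, so treat the grep pattern as an example only:

# print just the vendor attribute table
smartctl -A /dev/sdb

# pick out a few frequently referenced attributes (names differ by vendor)
smartctl -A /dev/sdb | grep -iE 'Reallocated|Pending|Power_On_Hours|Wear'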

If all looks well, you can try to re-add the drive to the array:

mdadm --manage /dev/md2 --re-add /dev/sdb3
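After issuing the re-add, the array should begin resyncing. A couple of quick ways to confirm the member was accepted and to keep an eye on the rebuild:

cat /proc/mdstat
mdadm --detail /dev/md2 | grep -iE 'state|rebuild'

# or refresh the status every few seconds
watch -n 5 cat /proc/mdstat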

If the drive was replaced by our team, we can usually assist you with re-adding the replacement drive. If you prefer to perform this yourself, you will want to partition the replacement disk to exactly match the existing one. You can use utilities such as fdisk or parted to print the partition layout of the existing disk and then create matching partitions on the replacement disk.
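Below is a minimal sketch of that process, using the same device names as this article's example (/dev/sda as the surviving disk, /dev/sdb as the replacement). Double-check the direction before running anything, since copying the wrong way would overwrite the good partition table:

# print the existing layout on the surviving disk
parted /dev/sda unit s print

# one way to replicate that layout onto the replacement disk
sfdisk -d /dev/sda | sfdisk /dev/sdb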

You may then need to remove the failed device:

mdadm /dev/md2 -r /dev/sdb3

Partition the new drive with matching partitions, then add the partitions back to their respective arrays. Note that for a brand-new disk that was never a member of the array, --add is used rather than --re-add, and each partition goes back to the array it belongs to (per the lsblk output earlier, sdb1 belongs to md0, sdb2 to md1, and sdb3 to md2):

mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb2
mdadm --manage /dev/md2 --add /dev/sdb3

5) Replacing a Bad Drive from an Existing RAID Array

If one of the drives in your array has failed and requires replacement, the following will allow you to mark the drive as failed, remove it properly, and allow for proper replication of the partitions on the old disk so that they can be added back into the RAID array.

  1. In the case of a verified drive failure, run the following commands to mark the drive as failed and then view the current status to confirm the change has taken effect:

    mdadm --fail /dev/md# /dev/sda#
    # or
    mdadm --manage /dev/md# --fail /dev/sda#

    cat /proc/mdstat

  2. Remove the failed drive using the following command:

    mdadm --remove /dev/md# /dev/sda#
    # or
    mdadm --manage /dev/md# --remove /dev/sda#
  3. If entries using the partition's UUID have been made in /etc/fstab, be sure to remove or comment them out.
  4. Now that the bad drive has been marked as failed and removed from the RAID array, we need to copy the partition table from the surviving drive to the new disk so the layouts match (see also the GPT note after these steps).

    Important! /dev/sda here is the source drive we copy the partition table FROM. /dev/sdb is the drive the partition table will be copied TO (the new drive).

    sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb

  5. Add the partition to the array. *Note: make sure you know which array (md#) the partition belongs to. For example, if md5 contained sda3 with 100GB and sdb3 is a partition of the same size (100GB) that we know corresponds to it, then the command would be "mdadm --add /dev/md5 /dev/sdb3".

    mdadm --add /dev/md# /dev/sdb#

    Check the status of the array to see whether the rebuild has been completed or not.

    cat /proc/mdstat
    # or
    mdadm --detail /dev/md#

    If you'd like to watch it as it changes, you can view it "live" by using the following:

    watch cat /proc/mdstat
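One additional note on step 4: very old versions of sfdisk do not understand GPT partition tables. If your disks are GPT-labeled and the sfdisk copy fails, a sketch using sgdisk from the gdisk package can be used instead (again assuming /dev/sda is the healthy source and /dev/sdb the replacement; verify the direction carefully):

    # replicate the partition table FROM /dev/sda TO /dev/sdb
    sgdisk -R /dev/sdb /dev/sda

    # give the copied table new random GUIDs so the two disks do not conflict
    sgdisk -G /dev/sdb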

Thank you for taking the time to check out this article! We hope it has helped you, whether with RAID fundamentals, general process understanding, setup, checking and recovering, or replacing bad drives.

Need More Personalized Help?

If you have any further issues, questions, or would like some assistance checking on this or anything else, please reach out to us from your my.hivelocity.net account and provide your server credentials within the encrypted field for the best possible security and support.

If you are unable to reach your my.hivelocity.net account or if you are on the go, please reach out from your valid my.hivelocity.net account email to us here at: [email protected]. We are also available to you through our phone and live chat system 24/7/365.
