Wednesday, December 29, 2010

xfs sunit/swidth settings using RAID

For optimum performance the sunit and swidth XFS mount parameters should match the geometry of the underlying RAID array: sunit corresponds to the chunk (stripe unit) size and swidth to sunit times the number of data drives. Here are my notes on these settings:

sunit/swidth values reported by xfs_info are in filesystem blocks, not 512-byte sectors; the default filesystem block size is 4k (chosen automagically by mkfs.xfs)

swidth = n-1 drives for RAID 5 (just the data drives, not the parity)

mount -o sunit=512,swidth=1536 /dev/md0 [mountpoint]
        -Here sunit is set for a 256k chunk size (512 sectors * 512 bytes / 1024 = 256k) with 4 drives total in the array (so swidth = 3*512 = 1536)
        -xfs_info will report this as sunit=64 blks, swidth=192 blks with the default 4k block size (64 * 4096 bytes / 1024 = 256k; 192 = 64*3)
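The arithmetic above can be sanity-checked with plain shell. The chunk size and drive count below are the example values from these notes; substitute your own:

```shell
# sunit/swidth arithmetic for a 4-drive RAID5 with a 256 KiB chunk
# size (3 data drives) -- example values, substitute your own.
CHUNK_KB=256        # RAID chunk (stripe unit) size in KiB
DATA_DRIVES=3       # 4-drive RAID5 = 3 data + 1 parity

SUNIT=$((CHUNK_KB * 1024 / 512))    # mount option units: 512-byte sectors
SWIDTH=$((SUNIT * DATA_DRIVES))
echo "mount -o sunit=$SUNIT,swidth=$SWIDTH"   # sunit=512, swidth=1536

# xfs_info reports the same values in 4 KiB filesystem blocks (8 sectors each):
echo "xfs_info: sunit=$((SUNIT / 8))blks swidth=$((SWIDTH / 8))blks"   # 64 / 192
```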

Unfortunately I don't have the references for this info, which is a shame because the info was hard to nail down.

MDADM RAID5 Data Scrubbing

If you have a RAID5 array made up of large disks, the odds are good that you will hit an unreadable block while rebuilding from a failed drive. To find these blocks preemptively, set up regular data scrubbing using cron.

Something along the lines of:
# crontab -e
        0 4 * * 3 /bin/echo check > /sys/block/md0/md/sync_action
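Once a check has been kicked off, its progress and result can be read back from the same sysfs tree (md0 is the array name from above):

```shell
# Watch scrub progress (shows "check" with a percentage while running):
cat /proc/mdstat

# After the check completes, a nonzero mismatch count means
# inconsistent blocks were found:
cat /sys/block/md0/md/mismatch_cnt
```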



Manually unban a fail2ban banned IP address

To manually unban an IP address that fail2ban has banned:

iptables -D fail2ban-ssh 1 

Where fail2ban-ssh is the chain the IP is in and 1 is the position of the IP in the chain. Use iptables -L to gather this info.
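iptables can print the rule positions directly, which makes finding the right number easier (fail2ban-ssh is the chain from the example above; yours may differ):

```shell
# Show all rules in the fail2ban-ssh chain with their positions:
iptables -L fail2ban-ssh -n --line-numbers

# Then delete the offending rule by its number, e.g. rule 1:
iptables -D fail2ban-ssh 1
```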

Make fail2ban's apache-auth work with auth_digest

By default fail2ban's apache-auth only works with auth_basic. To make it work with auth_digest:

vi /etc/fail2ban/filter.d/apache-auth.conf
        delete old failregex line
        failregex = [[]client <HOST>[]] .* user .* authentication failure
            [[]client <HOST>[]] .* user .* not found
            [[]client <HOST>[]] .* user .* password mismatch
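Before relying on the new expressions, they can be dry-run against a real log with fail2ban's bundled fail2ban-regex tool, then the service restarted to pick up the change (the log path is an example; adjust for your setup):

```shell
# Dry-run the filter against the Apache error log:
fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-auth.conf

# Restart fail2ban so the edited filter takes effect:
/etc/init.d/fail2ban restart
```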


Speed up MDADM RAID5 array using stripe_cache_size

edit /etc/rc.local
                echo 4096 > /sys/block/md0/md/stripe_cache_size

To make the change take effect immediately:
echo 4096 > /sys/block/md0/md/stripe_cache_size

4096 was chosen after fairly extensive bonnie testing of various sizes from 256 to 8192 with and without NCQ enabled on the drives. Using this setting increased my write speeds by about 50%.

NeilB's post: "You can possibly increase the speed somewhat by increasing the buffer space that is used, thus allowing larger reads followed by larger writes. This is done by increasing /sys/block/mdXX/md/stripe_cache_size"

Another one of NeilB's posts regarding this topic:
"Changing the stripe_cache_size will not risk causing corruption.
If you set it too low the reshape will stop progressing. You can then set it to a larger value and let it continue.
If you set it too high you risk tying up all of your system memory in the cache. In this case your system might enter a swap-storm and it might be rather hard to set it back to a lower value.
The amount of memory used per cache entry is about 4K times the number of devices in the array."

NeilB is Neil Brown, the author of MDADM.
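Neil's 4K-per-entry figure makes it easy to estimate how much memory a given stripe_cache_size pins. The 4-device array below is an assumption for illustration:

```shell
# Approximate memory pinned by the stripe cache:
#   stripe_cache_size entries * number of devices * ~4 KiB per entry
ENTRIES=4096     # the value set above
DEVICES=4        # example: 4-drive array
echo "$((ENTRIES * DEVICES * 4 / 1024)) MiB"   # 64 MiB
```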

Control MDADM rebuild/reshape speed

edit /etc/sysctl.conf
        dev.raid.speed_limit_min = 400000
        dev.raid.speed_limit_max = 400000
# sysctl -p

or temporarily change by:
echo 400000 > /proc/sys/dev/raid/speed_limit_min
echo 400000 > /proc/sys/dev/raid/speed_limit_max

Where 400000 is the desired speed in KB/s. My system doesn't have a heavy load on it, so I like to max out the reshape speed at 400000 (about 400MB/s).

Useful MDADM commands

Notes: /dev/hdb{1,2,3} are current RAID members. /dev/hdb4 is a new "drive"

#RAID Status
cat /proc/mdstat
mdadm --detail /dev/md0

#Copy Partition Structure for a New Drive (useful when adding a new drive; on a real system run sfdisk against whole-disk devices, e.g. /dev/sda and /dev/sdb)
sfdisk -d /dev/hdb1 | sfdisk /dev/hdb3

#Add another drive to an array
mdadm /dev/md0 --add /dev/hdb4
mdadm /dev/md0 --grow -n 4
resize2fs /dev/md0 (for ext3)
xfs_growfs /mnt/RAID (for XFS)

#Replace a failed drive
mdadm /dev/md0 --fail /dev/hdb2
mdadm /dev/md0 --remove /dev/hdb2
//Partition the new drive same as others (info above)
mdadm /dev/md0 --add /dev/hdb4

#Increase size of array (all the drives have gotten bigger)
mdadm /dev/md0 --grow --size=max
resize2fs /dev/md0 (for ext3)
xfs_growfs /mnt/RAID (for XFS)

#Remove RAID array
//fail and remove all the drives (see replace a failed drive above)
//unmount array
mdadm --stop /dev/md0

Migrate to RAID5 from a single disk

The following are notes for adding just 2 drives to an existing drive full of data in order to create a RAID5 (without backing up and restoring).

The gist of the process is to create a new RAID5 array in a degraded state using the 2 new drives. Copy the data over to the degraded array, then add the drive that originally contained the data to the array.

Notes: /dev/hdb1 and /dev/hdb2 are new drives, /dev/hdb3 is the old full drive
(You'll note that these are different partitions not drives because these notes are from when I was preparing/testing)

#Get mdadm
apt-get install mdadm (Debian/Ubuntu)

#Optionally: Create 2 unformatted partitions slightly smaller than the drive size on the new hard drives. Change the flags to raid in gparted, or use fdisk to change the partition IDs to 'fd' (fdisk /dev/sda; t; 1; fd).

#Step 1: Create the RAID 5 degraded array:
mdadm -C /dev/md0 -l 5 -n 3 missing /dev/hdb1 /dev/hdb2
(the -l is a letter 'L' not a 'one')
(where /dev/hdb1 and /dev/hdb2 are the 2 new drive partitions)

#Create a file system on the RAID, ex:
mkfs /dev/md0 -t ext3

#Create a mount point and mount the RAID partition (/dev/md0)

#Copy existing files onto the raid:
rsync -avH --progress -x /existing/files/ /RAID/mountpoint/

#Clear the full drive, (optionally create unformatted partition on it of same size as other 2 drives in RAID with raid flag (or 'fd' ID; see above))

#Add the originally full drive to the array:
mdadm /dev/md0 -a /dev/hdb3

#To view status of the rebuild:
watch -n1 'cat /proc/mdstat'
(the 1 in -n1 is the number 'One' not the letter 'L')

#Check to make sure everything looks alright:
mdadm --detail /dev/md0