Quantcast
Channel: Evaggelos Balaskas - System Engineer
Viewing all articles
Browse latest Browse all 333

Linux Software RAID mismatch Warning

$
0
0

I use Linux Software RAID for years now. It is reliable and stable (as long as your hard disks are reliable) with very few problems. One recent issue -that the daily cron raid-check was reporting- was this:

 

WARNING: mismatch_cnt is not 0 on /dev/md0

 

Raid Environment

A few details on this specific raid setup:

RAID 5 with 4 Drives

with 4 x 1TB hard disks and according the online raid calculator:

RAID Calculator

raid5-4disks

that means this setup is fault tolerant and cheap but not fast.

 

Raid Details

# /sbin/mdadm --detail /dev/md0

raid configuration is valid

/dev/md0:
        Version : 1.2
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sat Oct 27 04:38:04 2018
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ServerTwo:0  (local to host ServerTwo)
           UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
         Events : 60352

    Number   Major   Minor   RaidDevice State
       4       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       6       8       48        2      active sync   /dev/sdd
       5       8        0        3      active sync   /dev/sda

 

Examine Verbose Scan

with a more detailed output:

# mdadm -Evvvvs

there are a few Bad Blocks, although it is perfectly normal for a two (2) year disks to have some. smartctl is a tool you need to use from time to time.

/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953266096 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258984 sectors, after=3504 sectors
          State : clean
    Device UUID : bdd41067:b5b243c6:a9b523c4:bc4d4a80

    Update Time : Sun Oct 28 09:04:01 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 6baa02c9 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

/dev/sde:
   MBR Magic : aa55
Partition[0] :      8388608 sectors at         2048 (type 82)
Partition[1] :    226050048 sectors at      8390656 (type 83)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258992 sectors, after=3504 sectors
          State : clean
    Device UUID : a90e317e:43848f30:0de1ee77:f8912610

    Update Time : Sun Oct 28 09:04:01 2018
       Checksum : 30b57195 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258984 sectors, after=3504 sectors
          State : clean
    Device UUID : ad7315e5:56cebd8c:75c50a72:893a63db

    Update Time : Sun Oct 28 09:04:01 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b928adf1 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ef5da4df:3e53572e:c3fe1191:925b24cf
           Name : ServerTwo:0  (local to host ServerTwo)
  Creation Time : Wed Feb 26 21:00:17 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953263024 (931.39 GiB 1000.07 GB)
     Array Size : 2929893888 (2794.16 GiB 3000.21 GB)
  Used Dev Size : 1953262592 (931.39 GiB 1000.07 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258984 sectors, after=3504 sectors
          State : clean
    Device UUID : f4e1da17:e4ff74f0:b1cf6ec8:6eca3df1

    Update Time : Sun Oct 28 09:04:01 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : bbe3e7e8 - correct
         Events : 60355

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

 

MisMatch Warning

WARNING: mismatch_cnt is not 0 on /dev/md0

So this is not a critical error, rather tells us that there are a few blocks that are “Not Synced Yet” across all disks.

 

Status

Checking the Multiple Device (md) driver status:

# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

We verify that none job is running on the raid.

 

Repair

We can run a manual repair job:

# echo repair >/sys/block/md0/md/sync_action

now status looks like:

# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=========>...........]  resync = 45.6% (445779112/976631296) finish=54.0min speed=163543K/sec

unused devices: <none>

Progress

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [============>........]  resync = 63.4% (619673060/976631296) finish=38.2min speed=155300K/sec

unused devices: <none>
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [================>....]  resync = 81.9% (800492148/976631296) finish=21.6min speed=135627K/sec

unused devices: <none>

Finally

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

 

Check

After repair is it useful to check again the status of our software raid:

# echo check >/sys/block/md0/md/sync_action

# cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  check =  9.5% (92965776/976631296) finish=91.0min speed=161680K/sec

unused devices: <none>

and finally

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc[1] sda[5] sdd[6] sdb[4]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
Tag(s): md0, mdadm, linux, raid

Viewing all articles
Browse latest Browse all 333

Trending Articles