Re: Talking about RAID - disks with same id

To: debian-user@lists.debian.org
Subject: Re: Talking about RAID - disks with same id
From: David Christensen <dpchrist@holgerdanske.com>
Date: Thu, 9 Nov 2017 14:46:50 -0800
Message-id: <61f3f439-c8d3-1d17-ab6e-7e58fcfccfdd@holgerdanske.com>
In-reply-to: <ou2fsu$o5k$1@blaine.gmane.org>
References: <otvtkn$lvm$1@blaine.gmane.org> <777634b1-4cdf-889e-d3f7-08e94f3f34be@holgerdanske.com> <ou0vde$4gp$1@blaine.gmane.org> <7f045d99-7db7-2f6b-f445-f485fb125e19@holgerdanske.com> <ou2fsu$o5k$1@blaine.gmane.org>

On 11/09/17 13:04, deloptes wrote:

> David Christensen wrote:
>
>> What RAID technology are you using?
>
> Linux software raid - kernel is 4.12.10

Most people call it 'mdadm', after the command-line tool. I am running the same, but on Debian "stable":


2017-11-09 14:00:32 root@dipsy ~
# dpkg-query --show mdadm
mdadm	3.4-4+b1

2017-11-09 14:00:40 root@dipsy ~
# cat /etc/debian_version
9.2

2017-11-09 14:01:00 root@dipsy ~
# uname -a
Linux dipsy 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux


>> Take a look at:
>>
>> # smartctl --xall /dev/sdg
>
> This is nothing spectacular - see attachment.


I'll comment on the information I think I understand...


> Device Model:     ST3500630AS

I have eight ST31500341AS drives, which I believe are of the same vintage. They all seem good.



> SMART overall-health self-assessment test result: PASSED

That is good.


> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>   1 Raw_Read_Error_Rate     POSR--   105   095   006    -    0
>  10 Spin_Retry_Count        PO--C-   100   100   097    -    0
> 187 Reported_Uncorrect      -O--CK   100   100   000    -    0
> 198 Offline_Uncorrectable   ----C-   100   100   000    -    0

A RAW_VALUE of 0 for these attributes is good.


> 199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    7

7 is low, but the two in my file server are both 0.

Check your cable connections -- they should be fully engaged and not loose. Otherwise, swap the cable. (I wrote a serial number on all of my SATA cables with Sharpie and track which cable is where.)



>   9 Power_On_Hours          -O--CK   034   034   000    -    58404

If 58404 is a count of hours (and I think it is), that is ~6.7 years, which is a lot of time. But I would not worry based on just this value.
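
As a sanity check on that conversion, here is a minimal sketch. The name hours_to_years is my own, and it assumes the raw value is a plain count of hours, which I cannot confirm for every drive model:

```shell
# Convert a power-on hour count to years, using 365.25 days per year.
# hours_to_years is a hypothetical helper name, not part of smartmontools.
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.2f\n", h / (24 * 365.25) }'
}

hours_to_years 58404    # prints 6.66
```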



>   7 Seek_Error_Rate         POSR--   088   060   030    -    747385748
> 195 Hardware_ECC_Recovered  -O-RC-   064   056   000    -    179548239

I don't know how to interpret these raw values. Searching the web, I see I am not alone.


> SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
> No Errors Logged

That is good.

> In fact I think it does not
> write to this disk at all as the partition in the raid setup shows to a
> disk with same id.
>
> I think the problem is that blkid reports the same ID for both and that
> somehow RAID is using this information, rather than using some of the other
> mechanisms - UUID or UDEV Maker/Model/Serial .. which can be found
> under /dev/disk

As I understand it, when mdadm creates an array, mdadm puts a metadata header into each device that includes identification of the array and identification of each member.

When the system boots, mdadm reads /etc/mdadm/mdadm.conf for array specifications, scans all devices for mdadm metadata, and then assembles the specified arrays using the devices it finds (as best it can).
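
To see which members the kernel currently counts as active, /proc/mdstat prints a status field such as [UU] for a healthy two-member array, with an underscore in place of any missing member. A minimal sketch that flags such arrays follows. The name degraded_arrays is my own invention; the function only reads text on stdin and changes nothing:

```shell
# Print the name of every md array whose status field (e.g. [U_])
# shows a missing member. Reads /proc/mdstat-style text on stdin.
# degraded_arrays is a hypothetical helper name.
# Usage: degraded_arrays < /proc/mdstat
degraded_arrays() {
    awk '
        /^md/                 { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/     { print name }    # status field contains "_"
    '
}
```

No output means every array listed has all of its members. For the per-device view, "mdadm --examine" on each member partition prints the metadata header described above.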

It looks like you partitioned your drives with one large partition on each drive, and then created the array on the partitions.

The matching PTUUID values for both drives, and matching UUID and PARTUUID values for both partitions, indicate that one drive was cloned onto the other at some point after creating the array. I agree that this is likely a mistake, and is likely to confuse mdadm.
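
A minimal sketch for spotting the duplicates mechanically, assuming the usual blkid output of one device per line with KEY="value" pairs. The name dup_partuuid is my own. It checks only PARTUUID, because as far as I know blkid reports the array UUID in the UUID field of every linux_raid_member, so a shared UUID there may be normal:

```shell
# Report each PARTUUID that appears on more than one device.
# Reads blkid-style lines on stdin. dup_partuuid is a hypothetical name.
# Usage (as root): blkid | dup_partuuid
dup_partuuid() {
    awk '
        match($0, /PARTUUID="[^"]*"/) {
            # strip the 10-character PARTUUID=" prefix and the closing quote
            id = substr($0, RSTART + 10, RLENGTH - 11)
            dev = $1
            sub(/:$/, "", dev)
            sep = (id in devs) ? " " : ""
            devs[id] = devs[id] sep dev
            count[id]++
        }
        END {
            for (id in count)
                if (count[id] > 1) print id ": " devs[id]
        }
    '
}
```

Each output line is a PARTUUID followed by the devices that share it; no output means no duplicates were found.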

>> If you learn smartctl well enough, capture reports on a schedule
>> (weekly?), and look for trends, you might be able to predict failure.
>> STFW for information on this approach.
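
A minimal sketch of the trend idea, assuming each report was saved as plain text from "smartctl --xall" and that the attribute table has the eight columns shown in your attachment. The name smart_raw and the file names in the usage note are my own inventions:

```shell
# Print the RAW_VALUE column (field 8) for one SMART attribute,
# reading a saved "smartctl --xall" report on stdin.
# smart_raw is a hypothetical helper name.
# Usage: smart_raw UDMA_CRC_Error_Count < report.txt
smart_raw() {
    awk -v name="$1" '$2 == name { print $8 }'
}
```

Saving one report per week (for example sdg-2017-11-09.txt, a made-up name) and running smart_raw over each file in date order gives a column of numbers; a raw value that keeps climbing is the trend to watch.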


>> Download the bootable CD image of Seagate Seatools and run it:
>>
>> https://www.seagate.com/support/downloads/seatools/
>
> might do that,

You want that CD as part of your tool kit -- it makes running the SMART tests easy, lets you know if everything passed, and helps you understand anything that is questionable.

> but I think the problem is in raid itself as it does not
> indicate activity on the second disk and blkid reports the same id for two
> disks - I really might need to look into the raid code if blkid is used in
> any way.

An alternative to crawling the code would be to build another array on a pair of USB flash drives using the same process as you used for your 500 GB drives, and then see what blkid(8) says about the USB drives.



Do you have the console session from when you built the array?

Be sure to keep a console session of any and all mdadm commands you issue from now on.

> [the drives] are in server that virtually runs 24/7 and indeed I have replaced
> many over the years. In fact most of the old disks are gone. The Seagate is
> the oldest there ... the only left, so I think I'll just replace it so that
> I may sleep well ... the problem is I don't know which disk is really
> writing, might be the Seagate and the WD is not operational ... I think it
> is best to be on the safe side :)

If the array is working, leave it alone. Back up and archive, build a replacement array, rsync the data over, validate, migrate services to the new array, validate services, and back up again (to validate your backup process). Once the new array has been up and running for a while, tear down the old array and pull its drives.
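
For the "validate" step after the rsync, one approach is to compare checksums of every file on both sides. This is a minimal sketch, not a complete migration script: manifest and same_tree are names I made up, and it compares regular-file contents only (not ownership, permissions, or timestamps):

```shell
# Print a sorted SHA-256 manifest of every regular file under a directory.
# manifest and same_tree are hypothetical helper names.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Succeed only if both trees hold the same files with the same contents.
# Usage: same_tree /mnt/old /mnt/new && echo "copy verified"
same_tree() {
    [ "$(manifest "$1")" = "$(manifest "$2")" ]
}
```

The mount points in the usage comment are placeholders. On a large array the checksum pass reads every byte on both sides, so expect it to take a while.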



David
