
Re: how to test disk for bad sector



On 29.08.20 10:18, Alexander V. Makartsev wrote:
On 29.08.2020 07:59, Long Wind wrote:
installation of linux to sdb1 fails
i believe hard disk has bad sector
If the hard drive has bad sectors, or has recently encountered them, this should be recorded in the drive's SMART data. Alternatively, you can use the "badblocks" program from the "e2fsprogs" package to scan the drive for bad blocks. I'd test a wiped drive with the non-destructive read test first, followed by the write test. Testing media for bad blocks can be time-consuming if the drive is multiple terabytes in size.

i use e2fsck with -c, i.e. read-only test
it doesn't  report any error



I support this recommendation to use badblocks.
If you first need to rescue data from the disk (although your question sounds as if there is nothing left on it worth rescuing), use ddrescue from the package gddrescue before anything else.

Then, for badblocks, I recommend running it in a write mode with option "-n", for the following reason. If I am correctly informed, disks with S.M.A.R.T. usually have a reserve of spare sectors to which the disk's firmware remaps bad sectors by itself, without the operating system seeing this. The statistics about these permanent remapping events are kept in the disk's S.M.A.R.T. data, which you can read with the smartctl program. But the disk normally remaps a bad sector only when that sector is written. A failed read does not trigger the remapping; at most the sector is counted as pending (Current_Pending_Sector), and it does not show up among the reallocated sectors in the S.M.A.R.T. report.

If you later write to the disk (e.g. during the OS installation you mention as the occasion on which the problem showed up), then either the firmware silently protects you by remapping to a spare sector or, if no spare sectors are left, it leaves the operating system with the problem. This may be what is happening in your situation right now. The operating system then has to maintain its own list of bad blocks, namely those that the disk's own remapping no longer takes care of.

Again, merely reading from the disk might not be enough to detect these remaining bad blocks. Therefore I recommend letting the operating system search for them by running badblocks in a write mode: "-n" is the non-destructive read-write test, "-w" is the destructive write test, which erases everything on the disk (please consult the man page for which better fits your needs). I would actually repeat such a run several times, to see whether the number of bad blocks stays constant or is increasing.

If it is increasing, you should definitely replace the disk. If it stays constant but badblocks already finds bad blocks that the disk could not remap, I would also seriously consider replacing the disk now, if your financial situation allows it. If you want to avoid a replacement for now, then at least keep monitoring the situation very frequently and, of course, always keep a proper backup of your data on a medium that is still good. A disk that can no longer absorb such problems for you through its own remapping has to be monitored frequently, and that takes time on every run; you will have to weigh this cost in time and the loss of trust in the present medium against the cost of a new disk.
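
To make the repeated runs easier to compare, here is a rough sketch of how I would script them. The function names and the file names run1.txt and run2.txt are made up for this example, /dev/sdX is a placeholder for the real disk, and the S.M.A.R.T. attribute names are the usual ones printed by smartctl for ATA disks but can differ between vendors. Treat it as a starting point, not as a tested procedure for your machine.

```shell
# Print the S.M.A.R.T. attributes that relate to sector remapping.
# $1: disk device (placeholder: /dev/sdX). Needs root.
show_remap_attrs() {
    smartctl -A "$1" |
        grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# Run one non-destructive read-write pass and save the list of bad blocks.
# $1: disk device, $2: output file.
# -n non-destructive read-write, -s show progress, -v verbose, -o output file.
# The disk must not be mounted while this runs.
scan_once() {
    badblocks -n -s -v -o "$2" "$1"
}

# Count the bad blocks in a list written by "badblocks -o"
# (one block number per line; an empty file means none were found).
count_bad_blocks() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Compare two scans: succeeds only if the newer list is not longer.
# $1: older list, $2: newer list.
bad_blocks_stable() {
    old=$(count_bad_blocks "$1")
    new=$(count_bad_blocks "$2")
    echo "bad blocks: $old -> $new"
    [ "$new" -le "$old" ]
}

# Intended use (commented out on purpose, it touches a real disk):
#   show_remap_attrs /dev/sdX
#   scan_once /dev/sdX run1.txt
#   scan_once /dev/sdX run2.txt
#   bad_blocks_stable run1.txt run2.txt || echo "count is growing"
```

Keeping the list from each run also lets you diff two lists to see whether the same blocks are reported again or new ones have appeared.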

If the disk turns out not to be the cause of the trouble with your system, you could check whether all the components on the motherboard involved in moving data around are still fine:
- write a heavy amount of data to the disk: with the dd command, copy a huge amount of data from a drive connected externally via USB to the drive you currently suspect; maybe the motherboard fails from time to time to handle such a job free of errors, so the disk might still be fine and could be reused elsewhere, while the data path on your motherboard has started to fail; for this check I would not simply use if=/dev/zero but really read the data from another drive, so that the same data path on the motherboard is used as during your OS installation or during future copy operations;
- check your RAM with memtest86+; you will have to find a live USB stick that offers it in its GRUB boot menu; I am not perfectly sure right now, but I would expect the Knoppix Linux distribution to offer this;
- check your CPU with stress-ng
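
The dd copy check from the first item could look roughly like the sketch below. The function name copy_and_verify and the example paths are made up; I assume the source is a large regular file on the USB drive and that GNU dd and cmp are available, as on Debian. Be careful: if the target is a device such as /dev/sdX, everything on it is overwritten.

```shell
# Copy a large file to a target and check that what arrived matches what
# was sent.
# $1: source file (hypothetical example: /mnt/usb/big.iso)
# $2: target file or device; its previous contents are overwritten.
copy_and_verify() {
    src=$1
    dst=$2
    # Size of the source in bytes; the arithmetic strips any padding
    # that wc may print around the number.
    size=$(( $(wc -c < "$src") )) || return 1
    # conv=fsync makes dd flush the data to the target before it exits.
    dd if="$src" of="$dst" bs=1M conv=fsync status=none || return 1
    # Compare only the first $size bytes, because a device is usually
    # larger than the file that was copied onto it.
    cmp -n "$size" "$src" "$dst"
}

# Intended use (commented out on purpose, it overwrites the target):
#   copy_and_verify /mnt/usb/big.iso /dev/sdX && echo "copy verified"
```

Note that the readback may be served from the kernel's page cache instead of the disk. Dropping the caches as root between the copy and the comparison (sync; echo 3 > /proc/sys/vm/drop_caches) makes the check more meaningful.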

Good luck!
Marco

