
Where is the problem: Tape Drive? Cartridge(s)? Cable? SAS Controller?



I've been trying to diagnose and resolve this problem since November and still can't figure out what is happening. Debian 10 doesn't offer any easy way to decode the hexadecimal error codes or to find details about them.

I know this is kinda "old-school", but I'm backing up partition images to LTO-5 tape cartridges. The backups worked at first, but recently every cartridge used in a backup attempt has eventually errored out.

For the moment, I am willing to accept that the tar command is NOT the culprit.

It could be the SAS controller software or the "mt" package, which controls the tape drive. But the setup has worked several times, and the drive keeps working even after individual backups fail, so I am not convinced the issue is the controller or driver software.

All this leads to the hardware question, "What is failing": Tape Drive? Cartridge(s)? Cable? SAS Controller?

Rather than blindly substituting parts (expensive, time-consuming, and frustratingly inconclusive) to eliminate suspects, I'd really like a better roadmap for locating the issue. A new SAS controller, a new cable between the controller and the drive, or new cartridges would not be prohibitively expensive, but I'm retired on a limited income, and a new LTO drive would be a real stretch.

Here are about three minutes of kern.log/syslog entries from my last attempt:

Nov 13 08:02:29 BigMutt kernel: [34669.493781] st 0:0:0:0: device_block, handle(0x0009)
Nov 13 08:02:29 BigMutt kernel: [34669.493879] st 0:0:0:0: [st0] Error e0000 (driver bt 0x0, host bt 0xe).
Nov 13 08:02:31 BigMutt kernel: [34671.743620] st 0:0:0:0: device_unblock and setting to running, handle(0x0009)
Nov 13 08:02:31 BigMutt kernel: [34671.743714] st 0:0:0:0: [st0] Error 10000 (driver bt 0x0, host bt 0x1).
Nov 13 08:02:31 BigMutt kernel: [34671.744077] st 0:0:0:0: [st0] Error 10000 (driver bt 0x0, host bt 0x1).
Nov 13 08:02:31 BigMutt kernel: [34671.745089] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x500110a001622ed0)
Nov 13 08:02:31 BigMutt kernel: [34671.745091] mpt2sas_cm0: enclosure logical id(0x500605b00341cef0), slot(0)
Nov 13 08:02:36 BigMutt kernel: [34676.006914] scsi 0:0:1:0: Sequential-Access HP       Ultrium 5-SCSI   Z6ED PQ: 0 ANSI: 6
Nov 13 08:02:36 BigMutt kernel: [34676.006922] scsi 0:0:1:0: SSP: handle(0x0009), sas_addr(0x500110a001622ed0), phy(3), device_name(0x500110a001622ed2)
Nov 13 08:02:36 BigMutt kernel: [34676.006924] scsi 0:0:1:0: enclosure logical id (0x500605b00341cef0), slot(0)
Nov 13 08:02:36 BigMutt kernel: [34676.008694] scsi 0:0:1:0: TLR Enabled
Nov 13 08:02:36 BigMutt kernel: [34676.011053] st 0:0:1:0: Attached scsi tape st0
Nov 13 08:02:36 BigMutt kernel: [34676.011056] st 0:0:1:0: st0: try direct i/o: yes (alignment 4 B)
Nov 13 08:02:36 BigMutt kernel: [34676.011143] st 0:0:1:0: Attached scsi generic sg2 type 1
Nov 13 08:05:24 BigMutt kernel: [34844.612941] st 0:0:1:0: [st0] Block limits 1 - 16777215 bytes.
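One way to decode the hexadecimal "Error" values: the st driver prints the raw SCSI result word, followed by its driver byte and host byte. The minimal Python sketch below assumes the Linux 4.19 layout of that word (status in bits 0-7, message in bits 8-15, host in bits 16-23, driver in bits 24-31) and the DID_* host-byte names from the kernel's include/scsi/scsi.h; the script name decode_st.py is hypothetical. Under those assumptions, e0000 decodes to host byte 0x0e (DID_TRANSPORT_DISRUPTED) and 10000 to host byte 0x01 (DID_NO_CONNECT). Both are transport-level results reported by the host adapter, not sense-data or medium errors from the drive.

```python
import re
import sys

# Host-byte (DID_*) names, assumed to match include/scsi/scsi.h in Linux 4.19.
HOST_BYTE = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
    0x0A: "DID_PASSTHROUGH",
    0x0B: "DID_SOFT_ERROR",
    0x0C: "DID_IMM_RETRY",
    0x0D: "DID_REQUEUE",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
    0x10: "DID_TARGET_FAILURE",
    0x11: "DID_NEXUS_FAILURE",
    0x12: "DID_ALLOC_FAILURE",
    0x13: "DID_MEDIUM_ERROR",
}

# Matches st messages such as "[st0] Error e0000 (driver bt 0x0, host bt 0xe)".
ST_ERROR = re.compile(
    r"\[(st\d+)\] Error ([0-9a-fA-F]+) "
    r"\(driver bt 0x([0-9a-fA-F]+), host bt 0x([0-9a-fA-F]+)\)"
)


def decode_result(result):
    """Split a 32-bit SCSI result word into its four component bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def scan(lines):
    """Yield (device, raw result, host-byte name) for every st error found.

    finditer is used because one line may hold several log entries,
    e.g. when a log excerpt was pasted without its line breaks.
    """
    for line in lines:
        for m in ST_ERROR.finditer(line):
            device, raw = m.group(1), int(m.group(2), 16)
            host = decode_result(raw)["host"]
            yield device, raw, HOST_BYTE.get(host, f"unknown host byte 0x{host:02x}")


if __name__ == "__main__":
    # Usage: python3 decode_st.py /var/log/kern.log [more files...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for device, raw, name in scan(log):
                print(f"{device}: result 0x{raw:x} -> host byte {name}")
```

Run it as `python3 decode_st.py /var/log/kern.log`. Rotated logs compressed to .gz are not opened by this sketch, so decompress them first or pass only the uncompressed files.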

So, is the culprit the LTO-5 drive? A cartridge? Possibly the I/O signal cable? The SAS controller? What do I need to do to determine the true cause of the errors on /dev/st0?

Hardware System Configuration:
Kernel: 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
MB: Gigabyte 970A-D3P
CPU: AMD FX-8350 @4000.000 MHz cache: 2048 KB
RAM: 32GB (4x8GB) Unbuffered/Unregistered
LTO-5 SAS Tape on LSI SAS9211 controller
Video: GeForce 8400 GS to VIZIO E320VA

