
Re: Why syslog is not rotating?



On Sun, 3 Nov 2013, Reco wrote:

On Sun, 3 Nov 2013 17:16:02 +0200 (IST)
Itay <debian@itayf.fastmail.fm> wrote:

On Sun, 3 Nov 2013, Reco wrote:

[...] Is there anything suspicious in the root mailbox?

The root mailbox has had daily messages like this since June 2010
(yes, I know, bad me):

     /etc/cron.daily/logrotate:

      gzip: stdin: Input/output error
      error: failed to compress log /var/log/syslog.1
      run-parts: /etc/cron.daily/logrotate exited with return code 1

And, is there anything unusual in /var/log/kern.log at the time you
had this error?

Multiple messages like these two:

...
Oct 31 07:59:35 gandalf kernel: [4627180.405646] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Oct 31 07:59:35 gandalf kernel: [4627180.405650] ata3.00: irq_stat 0x40000008
Oct 31 07:59:35 gandalf kernel: [4627180.405653] ata3.00: failed command: READ FPDMA QUEUED
Oct 31 07:59:35 gandalf kernel: [4627180.405659] ata3.00: cmd 60/08:00:cb:05:a9/00:00:05:00:00/40 tag 0 ncq 4096 in
Oct 31 07:59:35 gandalf kernel: [4627180.405661]          res 41/40:00:cd:05:a9/00:00:05:00:00/40 Emask 0x409 (media error) <F>
Oct 31 07:59:35 gandalf kernel: [4627180.405664] ata3.00: status: { DRDY ERR }
Oct 31 07:59:35 gandalf kernel: [4627180.405666] ata3.00: error: { UNC }
Oct 31 07:59:35 gandalf kernel: [4627180.407143] ata3.00: configured for UDMA/133
Oct 31 07:59:35 gandalf kernel: [4627180.407153] sd 2:0:0:0: [sda] Unhandled sense code
Oct 31 07:59:35 gandalf kernel: [4627180.407155] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Oct 31 07:59:35 gandalf kernel: [4627180.407158] sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Oct 31 07:59:35 gandalf kernel: [4627180.407163] Descriptor sense data with sense descriptors (in hex):
Oct 31 07:59:35 gandalf kernel: [4627180.407165]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Oct 31 07:59:35 gandalf kernel: [4627180.407173]         05 a9 05 cd
Oct 31 07:59:35 gandalf kernel: [4627180.407176] sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 31 07:59:35 gandalf kernel: [4627180.407181] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 05 a9 05 cb 00 00 08 00
Oct 31 07:59:35 gandalf kernel: [4627180.407188] end_request: I/O error, dev sda, sector 94963149
Oct 31 07:59:35 gandalf kernel: [4627180.407208] ata3: EH complete
...
Nov  1 07:50:21 gandalf kernel: [4713026.178488] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Nov  1 07:50:21 gandalf kernel: [4713026.178492] ata3.00: irq_stat 0x40000008
Nov  1 07:50:21 gandalf kernel: [4713026.178496] ata3.00: failed command: READ FPDMA QUEUED
Nov  1 07:50:21 gandalf kernel: [4713026.178502] ata3.00: cmd 60/08:00:cb:05:a9/00:00:05:00:00/40 tag 0 ncq 4096 in
Nov  1 07:50:21 gandalf kernel: [4713026.178503]          res 41/40:00:cd:05:a9/00:00:05:00:00/40 Emask 0x409 (media error) <F>
Nov  1 07:50:21 gandalf kernel: [4713026.178506] ata3.00: status: { DRDY ERR }
Nov  1 07:50:21 gandalf kernel: [4713026.178509] ata3.00: error: { UNC }
Nov  1 07:50:21 gandalf kernel: [4713026.179984] ata3.00: configured for UDMA/133
Nov  1 07:50:21 gandalf kernel: [4713026.179992] ata3: EH complete
...
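The hex fields in the Oct 31 excerpt can be decoded to see which sector the drive refused to read. What follows is a minimal Python sketch. It assumes the field layouts described in the comments: the SCSI READ(10) CDB, descriptor-format sense data, and libata's cmd/res taskfile dump. The function names are illustrative only and do not come from any tool.

```python
from __future__ import annotations


def read10_lba(cdb_hex: str) -> tuple[int, int]:
    """Return (first LBA, sector count) from a SCSI READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, count


def sense_info(sense_hex: str) -> int:
    """Return the INFORMATION field (failing LBA) from descriptor-format sense data."""
    sense = bytes.fromhex(sense_hex)
    if (sense[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    pos, end = 8, 8 + sense[7]        # byte 7 = additional sense length
    while pos + 1 < end:
        kind, length = sense[pos], sense[pos + 1]
        if kind == 0x00:              # descriptor type 0 = information
            return int.from_bytes(sense[pos + 4:pos + 12], "big")
        pos += 2 + length
    raise ValueError("no information descriptor")


def taskfile_lba(tf: str) -> int:
    """Return the 48-bit LBA from a libata 'cmd' or 'res' taskfile dump.

    Assumed layout:
    cmd/feat:nsect:lbal:lbam:lbah/hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/dev
    """
    _, low, high, _ = tf.split("/")
    lbal, lbam, lbah = (int(x, 16) for x in low.split(":")[2:])
    hlbal, hlbam, hlbah = (int(x, 16) for x in high.split(":")[2:])
    return (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal


if __name__ == "__main__":
    # Values copied from the Oct 31 kernel.log excerpt.
    print(read10_lba("28 00 05 a9 05 cb 00 00 08 00"))
    print(taskfile_lba("60/08:00:cb:05:a9/00:00:05:00:00/40"))
    print(taskfile_lba("41/40:00:cd:05:a9/00:00:05:00:00/40"))
    print(sense_info("72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 05 a9 05 cd"))
```

Decoded this way, the READ(10) CDB and the libata `cmd` both request 8 sectors starting at LBA 94963147. The sense data and the `res` taskfile both point at LBA 94963149, which is the same sector that end_request reports.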

Does, say, 'md5sum /var/log/syslog' run to completion?

Yes.  Without warnings/errors.

What about 'cat /var/log/syslog > /dev/null'?

Yes.  Without warnings/errors.

Ok. What about 'cat /var/log/syslog | gzip -c > /dev/null'?
And, while we're at that, what about:

cat /var/log/syslog | gzip -c > /var/log/syslog.test.gz

Both commands finished without warnings/errors.
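These checks read /var/log/syslog, but the cron error quoted earlier names /var/log/syslog.1. Below is a rough Python equivalent of the `cat FILE | gzip -c > /dev/null` pipeline that can be pointed at both files. It is a sketch, not what logrotate itself runs. It assumes the paths are readable, and on failure it reports the errno name and how many bytes were read before the error.

```python
from __future__ import annotations

import errno
import gzip
import os

CHUNK = 64 * 1024  # bytes per read() call; an arbitrary size for this sketch


def gzip_check(path: str) -> str | None:
    """Stream `path` through gzip into /dev/null, like `cat path | gzip -c > /dev/null`.

    Returns None if the whole file was read, otherwise a message with the
    errno name and the number of bytes read before the failure.
    """
    done = 0
    # buffering=0 gives an unbuffered FileIO, so each read() is one system call.
    with open(path, "rb", buffering=0) as src, \
            open(os.devnull, "wb") as sink, \
            gzip.GzipFile(fileobj=sink, mode="wb") as gz:
        while True:
            try:
                chunk = src.read(CHUNK)
            except OSError as exc:
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                return f"{path}: {name} after {done} bytes"
            if not chunk:
                return None
            gz.write(chunk)
            done += len(chunk)


if __name__ == "__main__":
    # The manual tests used syslog; the logrotate error was about syslog.1.
    for p in ("/var/log/syslog", "/var/log/syslog.1"):
        if os.path.exists(p):
            print(p, gzip_check(p) or "OK")
        else:
            print(p, "missing")
```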

If the error shows up early, can you also post the contents of /tmp/gzip produced by:

strace -fo /tmp/gzip cat /var/log/syslog | gzip -c > /dev/null

Didn't try since there were no errors.
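The strace command above wraps only `cat`, the first command in the pipeline. That is the process reading the log, so an unreadable sector would show up in /tmp/gzip as a read() returning -1 EIO. Here is a small Python sketch for scanning such a trace. The sample trace lines used to check it are made up for illustration and are not taken from this thread.

```python
from __future__ import annotations

import os
import sys
import re

# One line of `strace -f -o FILE` output, e.g.
#   4321  read(3, 0x7ffd5c2a1000, 32768) = -1 EIO (Input/output error)
# The leading PID appears because -f follows child processes.
EIO_READ = re.compile(r"^(?:(?P<pid>\d+)\s+)?read\((?P<fd>\d+),.*\)\s*=\s*-1 EIO\b")


def eio_reads(lines) -> list[tuple[str | None, int]]:
    """Return (pid, fd) for every read() in the trace that failed with EIO."""
    hits = []
    for line in lines:
        m = EIO_READ.match(line)
        if m:
            hits.append((m.group("pid"), int(m.group("fd"))))
    return hits


if __name__ == "__main__":
    trace = sys.argv[1] if len(sys.argv) > 1 else "/tmp/gzip"
    if os.path.exists(trace):
        with open(trace) as f:
            for pid, fd in eio_reads(f):
                print(f"pid {pid}: read() on fd {fd} failed with EIO")
```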

Can you run fsck on the filesystem containing /var/log/syslog?

[snip]

File system was found clean.  No errors were reported.

What does smartctl --all show on the partition with this filesystem?

I had never used smartctl (I installed it just now, following up on your question).
On my system /var resides on a logical volume,
so I am not sure how to proceed.

Find the physical volume corresponding to the /var logical volume.
Run smartctl --all on the disk that contains that physical volume.
If you have RAID (be it mdadm or dm-mirror), run smartctl on all
disks that are part of that RAID.
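One way to get from the /var logical volume to the disk that needs testing is sketched below in Python. It assumes you pass in the text printed by `lvs --noheadings -o lv_name,vg_name,devices`, where the devices column looks like `/dev/sda5(0)`. Both helper names are hypothetical. The /dev/sdXN to /dev/sdX mapping is a naive heuristic, and md or dm-mirror members would still need to be resolved separately.

```python
from __future__ import annotations

import re


def pvs_of_lv(lvs_output: str, lv_name: str) -> list[str]:
    """Return the PV device paths backing `lv_name`.

    `lvs_output` is assumed to come from
    `lvs --noheadings -o lv_name,vg_name,devices`, whose devices column
    looks like '/dev/sda5(0)' or '/dev/sda5(0),/dev/sdb1(0)'.
    """
    pvs: list[str] = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] != lv_name:
            continue
        for dev in fields[2].split(","):
            path = dev.split("(", 1)[0]   # drop the '(first extent)' suffix
            if path and path not in pvs:
                pvs.append(path)
    return pvs


def disk_of_partition(dev: str) -> str:
    """Hypothetical naive mapping /dev/sdXN -> /dev/sdX; other names are returned unchanged."""
    m = re.fullmatch(r"(/dev/[sh]d[a-z]+)\d+", dev)
    return m.group(1) if m else dev
```

Each disk returned this way is then the target for `smartctl --all`.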

While we're at it, also run smartctl -t long on that disk, wait for a
while (smartctl should tell you how long), and run smartctl --all on
the same disk again.
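The test-then-wait step can be scripted. Below is a rough Python sketch that reads the recommended extended self-test time from `smartctl --all`, starts `smartctl -t long`, sleeps, and collects the report again. It assumes smartctl is on PATH and runs as root. The disk argument is a placeholder, and the 60-minute fallback is an arbitrary choice.

```python
from __future__ import annotations

import re
import subprocess
import time

# In `smartctl --all` output this field spans two lines, e.g.
#   Extended self-test routine
#   recommended polling time:    (  39) minutes.
POLL_RE = re.compile(
    r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
)


def extended_test_minutes(smart_all: str) -> int | None:
    """Return the recommended extended self-test time in minutes, if present."""
    m = POLL_RE.search(smart_all)
    return int(m.group(1)) if m else None


def smartctl(*args: str) -> str:
    # smartctl's exit status is a bitmask, and non-zero values do not always
    # mean the command failed, so the status is deliberately not checked here.
    return subprocess.run(["smartctl", *args], capture_output=True, text=True).stdout


def long_test(disk: str, fallback_minutes: int = 60) -> str:
    """Run `smartctl -t long DISK`, wait, then return `smartctl --all DISK`."""
    minutes = extended_test_minutes(smartctl("--all", disk)) or fallback_minutes
    smartctl("-t", "long", disk)
    time.sleep((minutes + 1) * 60)  # one extra minute of slack
    return smartctl("--all", disk)
```

For the drive in this thread, smartctl recommends 39 minutes for the extended test.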

Output of 'smartctl --all' (after running 'smartctl -t long'):

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD1600AAJS-00L7A0
Serial Number:    WD-WCAV34031063
LU WWN Device Id: 5 0014ee 15756c0f2
Firmware Version: 01.03E01
User Capacity:    160,041,885,696 bytes [160 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Nov  4 10:42:48 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline data collection: ( 3000) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: 	 (  39) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3037)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       10434
  3 Spin_Up_Time            0x0027   135   130   021    Pre-fail  Always       -       4241
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       119
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       29269
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       117
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       119
194 Temperature_Celsius     0x0022   100   093   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     29267         94963149

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

==========================================================
End of 'smartctl --all' output.

Many thanks for the help and the patience!
Itay

Reco




