
Re: loss of mbmon function



On 11/1/22 19:57, David Christensen wrote:
On 11/1/22 06:20, gene heskett wrote:
Greetings all;

I am now suffering from a hang on reboot. And in looking for info, I find that gkrellm can only see temps. I don't push this machine, so they stay in the 29 to 30C range. gkrellm is, and has been, part of my housekeeping for 20 years.

mbmon was not installed, but it and all its suggested dependencies are now, after two reboots, each of which took about 20 minutes just to get to the BIOS screen while I danced a jig on the Del key. During that time I can hear a very faint clicking sound from time to time. There is zero activity on any drive controller LED. There are two controllers: one, of course, on the mobo, and one that interfaces a 4-drive RAID10 for /home.

Mobo is: Asus PRIME Z370-A II, BIOS 0801 04/24/2019

mbmon claims to run by itself but needs root, and when run with sudo, reports:
gene@coyote:~$ sudo mbmon
[sudo] password for gene:
No Hardware Monitor found!!
InitMBInfo: Success

What do you suggest I install so this Asus mobo can be monitored?


Those symptoms would seem to indicate that a disk drive is failing, causing the motherboard firmware and/or the HBA/RAID controller firmware to enter a retry/timeout loop.


I would try:

1.  Enter the motherboard firmware setup utility during POST and look for warnings, errors, log entries, etc..

2.  Enter the HBA/RAID firmware configuration utility during POST and look for warnings, errors, log entries, etc..

3.  Examine dmesg(1) after boot, looking for errors, warnings, etc..

4.  Examine the files in /var/log after boot, looking for warnings, errors, etc..

BTDT twice this morning; it's all clean to this point.

5. Run SMART short tests on all drives, generate SMART reports for all drives, and then look at the reports for symptoms of a failing drive.

I have not done that yet. /dev/sda says it's fine. Now a long test is running on /dev/sde, the first of 4 in the RAID10. 3 to go after this one. There are more drives, but they are late mounts and not in /etc/fstab; they are in other machines, all mounted through /sshnet. My local network's contents.

6. Examine dmesg(1) and /var/log files again after the machine has been up for a while and look for warnings, errors, etc..

Nothing, it just sits there with the early boot Asus blurb on screen, for 20 minutes or more.

7. POST and Debian boot messages can scroll by faster than you can see them, and I am unsure if everything ends up in a log file.  If you cannot find any clues using the above steps, set up a video camera to record the console during boot.  Then, look at the video for warnings, errors, etc..

My impression is that it's all pre-BIOS, pre-boot. Once it reaches the initial grub screen, the rest of the boot seems to run at quite normal speed. And if I can get into the BIOS, it looks perfectly normal. I'll send smartctl after more info. And I just put a DVM on a drive plug, getting 5.1 and 12.1 volts there, so I don't think the PSU is going down.
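
For anyone who wants to script steps 3 through 6 above, here is a rough Python sketch, not a tested tool. It only builds the smartctl command lines (`-t short` to start a short self-test, `-a` for the full report) and filters dmesg or /var/log text for suspicious lines. The keyword list and the function names `suspicious_lines` and `smart_commands` are my own assumptions for illustration; the device names are the two mentioned in this thread.

```python
import re
import shlex

# Illustrative keyword list (an assumption, not exhaustive); tune it to taste.
PATTERN = re.compile(r"\b(error|fail|timeout|timed out|reset)", re.IGNORECASE)


def suspicious_lines(text):
    """Return the lines of dmesg or /var/log text that match PATTERN."""
    return [line for line in text.splitlines() if PATTERN.search(line)]


def smart_commands(devices):
    """Build smartctl command lines for each device.

    The short self-test runs in the background on the drive, so the
    report from `smartctl -a` should be read only after it has finished.
    """
    cmds = []
    for dev in devices:
        cmds.append(["smartctl", "-t", "short", dev])  # start short self-test
        cmds.append(["smartctl", "-a", dev])           # full SMART report
    return cmds


if __name__ == "__main__":
    # Print the commands rather than running them; smartctl needs root.
    for cmd in smart_commands(["/dev/sda", "/dev/sde"]):
        print(shlex.join(cmd))
```

The commands are printed rather than executed so they can be reviewed first and then run by hand with sudo. To use the filter, save the output of dmesg to a file and pass its contents to `suspicious_lines`.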

David

Take care & stay well, David. I'm going to go check some blankets & eyelids for leaks.


Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/>

