
Re: Athlon64 X2 vs its halt bug



On Saturday 22 August 2015 20:15:33 you wrote:

> Gene Heskett <gheskett@wdtv.com> writes:
> > On Saturday 22 August 2015 11:46:37 you wrote:
> >> Gene Heskett <gheskett@wdtv.com> writes:
> >> > Greetings all;
> >> >
> >> > I have a box that's locking up occasionally, and I am wondering if
> >> > there is a watchdog program using the usual onboard hardware as a
> >> > timer which can, if not "petted" issue an NMI or signal that will
> >> > kick this cpu back out of a halted condition?  Since it's running
> >> > cnc machinery thru a custom card that has its own watchdog that
> >> > disables the card and stops the machine in a couple milliseconds,
> >> > that part is covered, but my only recovery recourse is a
> >> > powerdown reset.
> >> >
> >> > Searches with synaptic don't seem to find a likely culprit, so
> >> > I am asking the list.
> >>
> >> There is a 'watchdog' package which is supposed to do exactly what
> >> you want.  It can use software timers, but there are hardware
> >> drivers for common hardware that should be much safer.
> >
> > openhpi seemed to have the best propaganda, but once installed, the
> > man pages do not describe what hardware it can use, or how to set
> > the nap time.
> >
> > Can you recommend a different, perhaps better suited kit?
>
> The kernel contains most of the modules, so in my kernel those are in
> the /lib/modules/3.2.0-4-amd64/kernel/drivers/watchdog/ directory.
> Yours will be different if you have a different kernel.  I just use
> the sbc60xxwdt module for the AMD chipset on my board.  The modinfo
> description for that driver says: 60xx Single Board Computer Watchdog
> Timer driver.  I don't remember why I tried that one, but you might
> just try modprobe on them and then start the watchdog and look at
> dmesg and see if watchdog uses it.
>
> I just realized that I don't really know which is working.  Modinfo
> seemed to show that the module wasn't used, so I tried removing the
> sbc60xxwdt module and it continued running.  I didn't notice until
> later that the log shows that it disabled the watchdog timer.  I also
> tried the it87_wdt timer since I think that is a common chip, and that
> also seemed to work.  I also tried one of the others and that one just
> gave the message: 'ERROR: could not insert 'it8712f_wdt': No such
> device'.  I assume that means that my system doesn't have the hardware
> that module is for.
>
> Sorry that this may not seem very helpful, but it looks like you
> should just try different modules and see which work.

I tried about six of them; some loaded, some wouldn't. I restarted the
/etc/init.d script about 30 times, but 4 seconds later, when I ask for
status, it's not running.  Nothing I can find in the logs indicates
which module, out of this list from the quoted directory, it should be
looking for.

gene@GO704:/lib/modules/3.4-9-rtai-686-pae/kernel/drivers/watchdog$ ls
acquirewdt.ko           advantechwdt.ko         alim1535_wdt.ko
alim7101_wdt.ko         cpu5wdt.ko              eurotechwdt.ko
f71808e_wdt.ko          geodewdt.ko             hpwdt.ko
i6300esb.ko             ib700wdt.ko             ibmasr.ko
it8712f_wdt.ko          it87_wdt.ko             iTCO_vendor_support.ko
iTCO_wdt.ko             machzwd.ko              mixcomwd.ko
nv_tco.ko               pc87413_wdt.ko          pcwd.ko
pcwd_pci.ko             pcwd_usb.ko             sbc60xxwdt.ko
sbc7240_wdt.ko          sbc8360.ko              sbc_epx_c3.ko
sbc_fitpc2_wdt.ko       sc1200wdt.ko            sc520_wdt.ko
sch311x_wdt.ko          scx200_wdt.ko           smsc37b787_wdt.ko
softdog.ko              sp5100_tco.ko           w83627hf_wdt.ko
w83697hf_wdt.ko         w83697ug_wdt.ko         w83877f_wdt.ko
w83977f_wdt.ko          wafer5823wdt.ko         wdt.ko
wdt_pci.ko

gene@GO704:/lib/modules/3.4-9-rtai-686-pae/kernel/drivers/watchdog$ sudo modprobe  nv_tco
gene@GO704:/lib/modules/3.4-9-rtai-686-pae/kernel/drivers/watchdog$ sudo  /etc/init.d/openhpid start
[ ok ing openhpid: [....] success.
gene@GO704:/lib/modules/3.4-9-rtai-686-pae/kernel/drivers/watchdog$ sudo  /etc/init.d/openhpid status
Checking for openhpid daemon: 
[ ok ] openhpid is not running.
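
For what it's worth, the "try each module and see" routine suggested in
the quoted message could be scripted roughly as below.  This is only a
sketch under assumptions: the function names and the WDT_DIR variable
are made up for illustration, it assumes a working driver shows up as
/dev/watchdog shortly after modprobe, and it has to run as root.  Note
that softdog will normally load on any box, but it is a software timer
and cannot rescue a CPU that has truly halted.  Probing hardware
drivers blindly can also poke at I/O ports that belong to something
else, so treat it with care.

```shell
#!/bin/sh
# Sketch only: names below are hypothetical, not from any package.
# Default module directory, assumed from the path shown above.
WDT_DIR="${WDT_DIR:-/lib/modules/$(uname -r)/kernel/drivers/watchdog}"

list_wdt_modules() {
    # Print one module name per line, with the .ko suffix stripped.
    for f in "$1"/*.ko; do
        [ -e "$f" ] || continue      # directory empty or missing
        basename "$f" .ko
    done
}

try_wdt_modules() {
    # Assumption: a usable driver creates /dev/watchdog once loaded.
    if [ -e /dev/watchdog ]; then
        echo "/dev/watchdog already exists; some driver is loaded"
        return 0
    fi
    for m in $(list_wdt_modules "$1"); do
        if modprobe "$m" 2>/dev/null; then
            sleep 1                  # give udev a moment to create the node
            if [ -e /dev/watchdog ]; then
                echo "$m loaded and /dev/watchdog exists"
                return 0
            fi
            modprobe -r "$m" 2>/dev/null   # no device node; unload again
        fi
    done
    echo "no module produced /dev/watchdog" >&2
    return 1
}
```

After sourcing the file as root, `list_wdt_modules "$WDT_DIR"` just
prints the candidates, and `try_wdt_modules "$WDT_DIR"` does the actual
probing; `dmesg | tail` afterwards should show what the driver that
stuck had to say.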

So I've no clue where to go from here.  I put it back on the list in case
somebody has a bigger box of clues than I do.  The admin tool I installed is openhpi.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>

