
Re: Problems with KuroBox Pro and micro-evtd



On Wed, 8 Mar 2017 16:25:08 -0300
Rogério Brito <rbrito@gmail.com> wrote:

> Hi, Ryan.
> 
> On Tue, Mar 7, 2017 at 11:11 PM, Ryan Tandy <ryan@nardis.ca> wrote:
> > Hi Rogério,
> >
> > Sorry to hear about this issue.
> >
> > Is there enough info in syslog/journal to tell whether it's a graceful
> > (software-initiated) shutdown or simply the hardware forcing power off?
> 
> No, there are no messages, which makes things very hard to debug.
> 
> In fact, I looked at the code and the daemon could use more messages
> being sent to syslog. :)

I reproduced this issue.
Indeed, nothing is logged to syslog.
But after enabling DEBUG=1 in /etc/micro-evtd.conf, I get a log in:
  /var/log/micro-evtd.log

The format of the log is: <DateTime> <n1,n2,n3>
where n1 seems to be the temperature. I'm not sure what n2 and n3 are.
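
To pull the first value out of each line for a quick look, something like
the following might work. It's only a sketch: I'm assuming the n1,n2,n3
triple is the last whitespace-separated field on the line, and that the
angle brackets may or may not be literal. first_value is just a name I
made up.

```shell
#!/bin/sh
# Hypothetical helper: print n1 (the value I guess is the temperature)
# from each log line read on stdin.
# Assumption: the n1,n2,n3 triple is the last whitespace-separated field,
# optionally wrapped in literal angle brackets.
first_value() {
    awk 'NF {
        triple = $NF                 # last field on the line
        gsub(/[<>]/, "", triple)     # drop angle brackets, if any
        split(triple, n, ",")        # n[1]=n1, n[2]=n2, n[3]=n3
        print n[1]
    }'
}
```

For example, this should show the highest n1 seen so far:
  first_value < /var/log/micro-evtd.log | sort -n | tail -1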

> > I was going to mention the temperature thresholds that were adjusted:
> >
> > https://anonscm.debian.org/cgit/collab-maint/micro-evtd.git/commit/?id=b6a052b00cf898689dba1dd993037facaa1bf741

I guess the temperature control is irrelevant, but this commit is related:
  https://anonscm.debian.org/git/collab-maint/micro-evtd.git/commit/?id=9922c6d

According to the kernel changelog, the new restriction introduced in kernel
4.8 can be turned off with the kernel option: iomem=relaxed

So if your u-boot-tools is properly configured (i.e. the fw_printenv command
can show the u-boot env list), you can set the kernel option with:
  fw_setenv bootargs_root "root=/dev/sda2 rw panic=5 iomem=relaxed"
(after the above command, run fw_printenv again to confirm it was
properly written to the device)
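
After rebooting, you can also check that the option really reached the
kernel by looking at /proc/cmdline. A small sketch, assuming u-boot on your
box passes bootargs_root through to the kernel command line;
has_iomem_relaxed is a hypothetical helper name, not part of any package:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line
# contains iomem=relaxed as a separate, space-delimited word.
has_iomem_relaxed() {
    case " $1 " in
        *" iomem=relaxed "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage on the running system:
  has_iomem_relaxed "$(cat /proc/cmdline)" && echo "iomem=relaxed is active"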

> Indeed, the limits only went up (if I didn't miss anything) and my
> system seems very stable with micro-evtd disabled. Of course, none of
> the functionalities of it (like pressing the button to turn the unit
> off) are present then (unless I hold the button for a hard shutdown).

How do you disable micro-evtd and keep the device running?
As I remember, the device needs micro-evtd to talk to the micon every few
minutes to keep the watchdog happy.

> > 1-2 days is a very odd time period for it to survive!
> 
> Indeed, especially when, without micro-evtd or with the old
> version/"old state of the world" (with respect to both the kernel and
> earlier versions of micro-evtd) I get a computer that works as long as
> I don't reboot it or don't have power outages.
> 
> Oh, I noticed something strange, but I don't know if it is related
> somehow with micro-evtd not working. For quite some time now (let's
> say, 2 or 3 years, but I don't remember when it started), whenever I
> try to see the environment with fw_printenv, I get errors in the
> kernel log telling me that the NAND has unrecoverable errors:
> 
> - - - - - - - -
> (...)
> [   22.466531] Adding 396284k swap on /dev/sda3.  Priority:-1 extents:1 across:396284k FS
> [   23.123208] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
> [   23.184435] EXT4-fs (sda1): mounted filesystem without journal. Opts: errors=remount-ro
> [   28.634334] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   31.259076] mv643xx_eth_port mv643xx_eth_port.0 eth0: link up, 1000 Mb/s, full duplex, flow control disabled
> [   31.268998] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [   56.754422] NFSD: starting 90-second grace period (net c058c578)
> [132923.387111] __nand_correct_data: uncorrectable ECC error
> [132923.392597] __nand_correct_data: uncorrectable ECC error
> [132923.398070] __nand_correct_data: uncorrectable ECC error
> [132923.403519] __nand_correct_data: uncorrectable ECC error
> [132923.408972] __nand_correct_data: uncorrectable ECC error
> [132923.414422] __nand_correct_data: uncorrectable ECC error
> [132923.419876] __nand_correct_data: uncorrectable ECC error
> [132923.425327] __nand_correct_data: uncorrectable ECC error
> [132923.431272] __nand_correct_data: uncorrectable ECC error
> [132923.436729] __nand_correct_data: uncorrectable ECC error
> [132923.442192] __nand_correct_data: uncorrectable ECC error
> [132923.447663] __nand_correct_data: uncorrectable ECC error
> [132923.453113] __nand_correct_data: uncorrectable ECC error
> [132923.458564] __nand_correct_data: uncorrectable ECC error
> [132923.464019] __nand_correct_data: uncorrectable ECC error
> [132923.469468] __nand_correct_data: uncorrectable ECC error
> - - - - - - - -

I only get those ECC errors at boot time; they don't appear
when running fw_printenv/fw_setenv.
If your fw_printenv output looks fine, I guess there's nothing to worry about.
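
If you want to confirm whether fw_printenv itself triggers new ECC errors
on your box, you could compare the count in the kernel log before and after
running it. A rough sketch; count_ecc_errors is a hypothetical helper name,
and the comparison is only meaningful if the kernel ring buffer has not
wrapped in between:

```shell
#!/bin/sh
# Hypothetical helper: count "uncorrectable ECC error" lines read on stdin.
count_ecc_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so ignore its exit status.
    grep -c 'uncorrectable ECC error' || true
}
```

Usage:
  before=$(dmesg | count_ecc_errors)
  fw_printenv >/dev/null
  after=$(dmesg | count_ecc_errors)
  echo "new ECC errors: $((after - before))"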

Cheers,
-- 
Roger Shimizu, GMT +9 Tokyo
PGP/GPG: 4096R/6C6ACD6417B3ACB1
