On Wed, 8 Mar 2017 16:25:08 -0300
Rogério Brito <rbrito@gmail.com> wrote:

> Hi, Ryan.
>
> On Tue, Mar 7, 2017 at 11:11 PM, Ryan Tandy <ryan@nardis.ca> wrote:
> > Hi Rogério,
> >
> > Sorry to hear about this issue.
> >
> > Is there enough info in syslog/journal to tell whether it's a graceful
> > (software-initiated) shutdown or simply the hardware forcing power off?
>
> No, there are no messages, which makes things very hard to debug.
>
> In fact, I looked at the code and the daemon could use more messages
> more being sent to syslog. :)

I reproduced this issue. Yes, there's nothing logged in syslog.
But after enabling DEBUG=1 in /etc/micro-evtd.conf, I get a log in:
  /var/log/micro-evtd.log

The format of the log is:
  <DateTime> <n1,n2,n3>
where n1 seems to be the temperature, but I'm not sure what n2 and n3 are.

> > I was going to mention the temperature thresholds that were adjusted:
> >
> > https://anonscm.debian.org/cgit/collab-maint/micro-evtd.git/commit/?id=b6a052b00cf898689dba1dd993037facaa1bf741

I guess the temperature control is irrelevant, but this commit is relevant:
  https://anonscm.debian.org/git/collab-maint/micro-evtd.git/commit/?id=9922c6d

According to the kernel changelog, the new restriction introduced by
kernel 4.8 can be turned off with the kernel option:
  iomem=relaxed

So if your u-boot-tools is properly configured (the fw_printenv command can
show the u-boot env list), you can set the kernel option with:
  fw_setenv bootargs_root "root=/dev/sda2 rw panic=5 iomem=relaxed"

(After the above command, you should run fw_printenv again to confirm it's
properly written to the device.)

> Indeed, the limits only went up (if I didn't miss anything) and my
> system seems very stable with micro-evtd disabled. Of course, none of
> the functionalities of it (like pressing the button to turn the unit
> off) are present then (unless I hold the button for a hard shutdown).

How do you disable micro-evtd and keep the device running?
I remember the device needs micro-evtd to talk to the micon every few
minutes to keep the watchdog happy.

> > 1-2 days is a very odd time period for it to survive!
>
> Indeed, especially when, without micro-evtd or with the old
> version/"old state of the world" (with respect to both the kernel and
> earlier versions of micro-evtd) I get a computer that works as long as
> I don't reboot it or don't have power outages.
>
> Oh, I noticed something strange, but I don't know if it is related
> somehow with micro-evtd not working. For quite some time now (let's
> say, 2 or 3 years, but I don't remember when it started), whenever I
> try to see the environment with fw_printenv, I get errors in the
> kernel log telling me that the NAND has unrecoverable errors:
>
> - - - - - - - -
> (...)
> [ 22.466531] Adding 396284k swap on /dev/sda3. Priority:-1
> extents:1 across:396284k FS
> [ 23.123208] EXT4-fs (sda1): mounting ext2 file system using the
> ext4 subsystem
> [ 23.184435] EXT4-fs (sda1): mounted filesystem without journal.
> Opts: errors=remount-ro
> [ 28.634334] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [ 31.259076] mv643xx_eth_port mv643xx_eth_port.0 eth0: link up, 1000
> Mb/s, full duplex, flow control disabled
> [ 31.268998] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [ 56.754422] NFSD: starting 90-second grace period (net c058c578)
> [132923.387111] __nand_correct_data: uncorrectable ECC error
> [132923.392597] __nand_correct_data: uncorrectable ECC error
> [132923.398070] __nand_correct_data: uncorrectable ECC error
> [132923.403519] __nand_correct_data: uncorrectable ECC error
> [132923.408972] __nand_correct_data: uncorrectable ECC error
> [132923.414422] __nand_correct_data: uncorrectable ECC error
> [132923.419876] __nand_correct_data: uncorrectable ECC error
> [132923.425327] __nand_correct_data: uncorrectable ECC error
> [132923.431272] __nand_correct_data: uncorrectable ECC error
> [132923.436729] __nand_correct_data: uncorrectable ECC error
> [132923.442192] __nand_correct_data: uncorrectable ECC error
> [132923.447663] __nand_correct_data: uncorrectable ECC error
> [132923.453113] __nand_correct_data: uncorrectable ECC error
> [132923.458564] __nand_correct_data: uncorrectable ECC error
> [132923.464019] __nand_correct_data: uncorrectable ECC error
> [132923.469468] __nand_correct_data: uncorrectable ECC error
> - - - - - - - -

I only get those ECC errors at boot time, and they don't appear when
running fw_printenv/fw_setenv.
If your fw_printenv result is fine, I guess there's nothing to worry about.

Cheers,
--
Roger Shimizu, GMT +9 Tokyo
PGP/GPG: 4096R/6C6ACD6417B3ACB1
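
For reference, a minimal sketch of the bootargs change suggested above. It appends iomem=relaxed to an existing bootargs string only if the option is not already there, so the rest of the value is left untouched. The helper name add_bootarg is hypothetical (it is not part of u-boot-tools), and the commented-out device commands assume that fw_printenv/fw_setenv are correctly configured and that bootargs_root is the variable your u-boot actually uses.

```shell
#!/bin/sh
# add_bootarg: hypothetical helper, not part of u-boot-tools.
# Prints the bootargs string given as $1 with the option $2 appended,
# unless that exact option is already present as a separate word.
add_bootarg() {
    current=$1
    option=$2
    case " $current " in
        *" $option "*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "${current:+$current }$option" ;;
    esac
}

# Example using the value quoted in this thread:
add_bootarg "root=/dev/sda2 rw panic=5" "iomem=relaxed"

# On the device itself (assumption: u-boot-tools is set up correctly):
#   new=$(add_bootarg "$(fw_printenv -n bootargs_root)" "iomem=relaxed")
#   fw_setenv bootargs_root "$new"
#   fw_printenv bootargs_root      # confirm the value was written
# After the next reboot, the running kernel's options can be checked with:
#   cat /proc/cmdline
```

Building the new value from the current one avoids accidentally dropping other options (root device, panic timeout) that may differ from the example in this mail.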