
Bug#781882: marked as done (frequent flash-kernel triggers on Wheezy->Jessie upgrade)



Your message dated Sun, 07 Jun 2015 10:13:03 +0100
with message-id <1433668383.3342.31.camel@debian.org>
and subject line Re: initramfs-tools use of triggers and DPKG_MAINTSCRIPT_PACKAGE
has caused the Debian Bug report #781882,
regarding frequent flash-kernel triggers on Wheezy->Jessie upgrade
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
781882: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=781882
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: upgrade-reports
Severity: important

Hello, I performed a wheezy->jessie upgrade on 3 different armel 
devices yesterday.

Here are a few notes about some speed bumps that cropped up. 

The devices in question are:
* Two QNAP TS-419P+ Turbo devices
* One QNAP TS-219P II Turbo device

I performed the upgrade using these steps:

* Purge removed packages with: 
  apt-get purge $(dpkg -l | awk '/^rc/ { print $2 }')
* Change "wheezy" to "jessie" in sources.list
* apt-get update
* apt-get upgrade
* apt-get dist-upgrade
* reboot
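
For reference, the sources.list edit in the steps above can be scripted.
This is only a sketch: switch_release is a made-up helper name, and it
assumes GNU sed and that "wheezy" appears in the file only as a suite name.

```shell
#!/bin/sh
# Hypothetical helper: rewrite every "wheezy" to "jessie" in the APT
# sources file given as $1. sed keeps a backup copy next to the original
# with a ".wheezy" suffix, so the old file can be restored if needed.
switch_release() {
    sed -i.wheezy 's/wheezy/jessie/g' "$1"
}
```

Run as root it would be "switch_release /etc/apt/sources.list", followed by
apt-get update. Files under /etc/apt/sources.list.d need the same treatment.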

These are the issues I encountered:



** error: apt-get dist-upgrade broke during a flash-kernel trigger

This happened on the two QNAP TS-419P+ devices but not on the QNAP 
TS-219P II Turbo device. The "apt-get dist-upgrade" stage aborted in 
a flash-kernel trigger that failed, because it seemed to try to flash 
the jessie 3.16 kernel before it was properly unpacked. Unfortunately 
I don't have the error message - I expected to capture it on the third 
device but of course the error didn't happen there. It was something about
how it couldn't find the vmlinuz 3.16 file; it seemed confused about whether
it should flash the wheezy 3.2 or the jessie 3.16 kernel. During both the 
"upgrade" and the "dist-upgrade" stage, there were probably between 5 and 10
runs of the flash-kernel trigger, which takes quite a long time.

I recovered by issuing an "apt-get -f install", which proceeded to unpack
several more packages, and the later flash-kernel triggers successfully flashed
a 3.16 kernel. Finally I re-ran "apt-get dist-upgrade" to wrap it up.

Full disclosure: on the failing machines I had an additional sources.list.d 
entry for www.deb-multimedia.org jessie, which was not present on the machine
that didn't exhibit this error.



** frequent flash-kernel triggers

As mentioned above, on all the machines the flash-kernel trigger ran 
frequently during the upgrade and dist-upgrade operations. Since this
takes several minutes, it would be ideal if this only happened once
during an upgrade or dist-upgrade run.



** odd entry in dmesg: "alg: hash: Test 3 failed for mv-sha1"

dmesg reveals some slightly concerning messages:
[   35.120866] alg: hash: Test 3 failed for mv-sha1
[   35.120895] 00000000: 10 bf d7 00 71 0b bb 83 3a 26 d0 97 13 05 99 f5
[   35.120910] 00000010: 3a 92 53 3c
[   35.216233] alg: hash: Test 1 failed for mv-hmac-sha1
[   35.216262] 00000000: 0c aa 9f d5 37 c3 79 3a 91 d9 21 5f 42 2b 2c 24
[   35.216277] 00000010: b7 c3 16 0c

This happens on all three machines. Not sure if this is a problem? 
Never saw this on the wheezy kernel.



** journalctl permission / no journal found

It was not immediately obvious how to view systemd journals as a non-root 
user, even being a member of the "root", "adm", "staff" groups. Apparently
the correct solution is to add the user to the group "systemd-journal".
The error message given, "no journals found", is also not very helpful in
diagnosing the problem. Perhaps something could be written about this in
the release notes.
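
A small check along these lines might have saved some head-scratching. It
is a sketch only: has_group is a made-up helper, and the script just prints
the usermod command rather than running it, since that needs root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if group $2 occurs in the space-separated
# group list $1.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

user=$(id -un)
if has_group "$(id -nG)" systemd-journal; then
    echo "$user is already in systemd-journal"
else
    # Group changes only apply to sessions started after the change.
    echo "run as root: usermod -a -G systemd-journal $user"
fi
```

After being added to the group, the user has to log out and back in before
journalctl picks up the new membership.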



** shutdown -rf now doesn't work anymore

Apparently systemd has removed the skip-fsck option to shutdown. The error
message given is not so pretty:
"Code should not be reached 'Unhandled option' at ../src/systemctl/systemctl.c:6316, function shutdown_parse_argv(). Aborting."

I guess the workaround is to just use "shutdown -r" and deal with potential 
fsck delays. tune2fs is too permanent a solution for skipping fscks; I miss
a way to prevent fscks in a one-shot fashion. 

Also, it seems running "shutdown -r" no longer kicks out active ssh
sessions; instead other clients won't see the system going down until
they try to type in their session and get a broken pipe error back.



** bind9 ignores /etc/default/bind9 and starts on ipv6

It looks like systemd ignores /etc/default/bind9, which contains
OPTIONS="-u bind -4"
so bind9 starts up with both IPv4 and IPv6 enabled, which causes a LOT of
"named: error (network unreachable) resolving ...." log entries and possibly 
delays.

I worked around this by hacking a "-4" option into the ExecStart line of
/lib/systemd/system/bind9.service 
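
A less invasive variant of the same workaround would probably be a systemd
drop-in, which survives package upgrades. The sketch below is untested on
these machines: write_bind9_dropin and the file name ipv4-only.conf are made
up, and the "/usr/sbin/named -f -u bind" command line is an assumption about
what the packaged unit contains.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in overriding ExecStart for
# bind9.service. $1 is the drop-in directory, normally
# /etc/systemd/system/bind9.service.d
write_bind9_dropin() {
    mkdir -p "$1" || return 1
    # The empty ExecStart= clears the packaged command before the new one
    # is set. The named command line below is an ASSUMPTION; copy the real
    # ExecStart line from bind9.service and append -4 to it.
    cat > "$1/ipv4-only.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/sbin/named -f -u bind -4
EOF
}
```

As root: "write_bind9_dropin /etc/systemd/system/bind9.service.d", then
"systemctl daemon-reload" and "systemctl restart bind9".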



** bitlbee /version identifies as "Linux/armv7l"

When issuing a /version command in irssi against a local bitlbee installation,
the response given back is "BitlBee-3.2.2-2. localhost Linux/armv7l", which
makes me wonder if the bitlbee binary is built for armv7 (debian armel
should be at armv5; uname -a gives armv5tel). It seems to work, so maybe
not a problem



** arpwatch does not start on boot

The arpwatch daemon no longer starts properly on boot. It logs the
following lines:
arpwatch[1052]: Running as uid=109 gid=105
arpwatch[1052]: Link layer type 113 not ethernet or fddi
and exits.

Manually restarting after the system has booted completely seems to work.



** apcupsd's /sbin/apcaccess is broken

Running "apcaccess" just prints a "Usage:" help text, instead 
of dumping the UPS statistics. Looks like it was reported in 
November 2014 with a patch, but there is no follow-up in the bug tracker:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=770984



** apache 2.4 upgrade was tricky

The sites-available/sites-enabled files apparently must
end with ".conf". This could be made more explicit in the release notes,
which only talk about "conf.d". I had to manually rename all these files and
re-a2ensite them. This wasn't so bad, since there was already a lot of manual
work to convert mod_access_compat entries ("Allow from all" etc.) to the new
Require statements. mod_access_compat didn't do a very good job: a lot of
local configuration files with Allow/Deny statements intended to override
defaults from the Debian configuration files no longer worked, since
it looks like mod_access_compat statements can no longer "undo"
2.4-style Require lines in the default configuration files. So in a way,
having the virtual hosts whose sites-available filenames lacked ".conf"
stay inactive helped prevent misconfigured sites
from being served while the upgrade was still in progress.
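
The renaming part, at least, is easy to script. A sketch, with
add_conf_suffix being a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: append ".conf" to every regular file in directory
# $1 that does not already carry the suffix.
add_conf_suffix() {
    for f in "$1"/*; do
        # Skip subdirectories and the unexpanded pattern of an empty dir.
        [ -f "$f" ] || continue
        case "$f" in
            *.conf) ;;                      # already named correctly
            *) mv -- "$f" "$f.conf" ;;
        esac
    done
}
```

It would be run as "add_conf_suffix /etc/apache2/sites-available". The old
symlinks in sites-enabled then dangle and have to be removed before running
a2ensite for each site again. The Allow/Deny to Require conversion still
needs to be done by hand.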



** "db5.1-util" have been kept back

On one machine, I keep getting "The following packages have been kept back:"
db5.1-util

Not sure what that's about, maybe I should just remove it?

# apt-get -s install db5.1-util
The following packages will be REMOVED:
  libdb5.1 python2.6
The following packages will be upgraded:
  db5.1-util

# apt-get -s install db5.1-util libdb5.1
libdb5.1 is already the newest version.
 db5.1-util : Breaks: libdb5.1 (< 5.1.29-8~) but 5.1.29-5 is to be installed
E: Unable to correct problems, you have held broken packages.



So, all-in-all, I survived the upgrade and things seem to be running OK,
but I'm glad I'm not new to Debian since there were a few snags along the
way. :)

Thanks,

-- System Information:
Debian Release: 8.0
  APT prefers testing-updates
  APT policy: (500, 'testing-updates'), (500, 'testing')
Architecture: armel (armv5tel)

Kernel: Linux 3.16.0-4-kirkwood
Locale: LANG=en_US.UTF-8, LC_CTYPE=nb_NO.iso88591 (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

--- End Message ---
--- Begin Message ---
(this conversation really should have been going to the flash-kernel
clone in #781882 and not to the original upgrade-report bug in #781742,
I've adjusted CC and moved the original to BCC)

On Thu, 2015-05-07 at 15:04 +0100, Ian Campbell wrote:
> Resending with a more obvious subject.
> 
> The workaround I describe in the final paragraph does seem to work, but
> I'm not sure that's the best way to go.

So I've been mulling this over for a while and I'm afraid the conclusion
I've eventually reached is that if the initramfs has been regenerated
then we really ought to be writing it to flash too, since otherwise we
risk leaving the system in an unbootable state.

I think this risk is already somewhat present in the gap between <some
package>'s update and the initramfs trigger but adding another delay
between the initramfs trigger and the flash-kernel trigger is certainly
widening it. This may even be the logic behind initramfs clobbering
$DPKG_MAINTSCRIPT_PACKAGE for all I know.

I think if there is anything to be done here it would be to investigate
reducing the number of times the initramfs is regenerated in the first
place.

Thanks for the report, it was certainly worth investigating and
considering. I'm closing the cloned bug against flash-kernel with this
message, I'll leave it up to the initramfs-tools maintainers (or others)
if they want to reclone the original bug (where all the actual useful
info is) into something.

Ian.

> 
> On Mon, 2015-05-04 at 15:31 +0100, Ian Campbell wrote:
> > (CC initramfs-tools@packages, context is flash-kernel invocation not
> > being deferred via triggers during upgrade and ultimately running
> > several times in a dist-upgrade)
> > 
> > On Sat, 2015-04-04 at 10:49 +0100, Ian Campbell wrote:
> > > At first glance it seems like invocations via the initramfs-tools hooks
> > > are not being deferred.
> > 
> > This is because initramfs-tools.postinst contains:
> >         # Regenerate initramfs whenever we go to dpkg state `installed'
> >         if [ "x$1" != xtriggered ]; then
> >                 # this activates the trigger, if triggers are working
> >                 update-initramfs -u
> >         else
> >                 # force it to actually happen
> >                 DPKG_MAINTSCRIPT_PACKAGE='' update-initramfs -u
> >         fi
> > 
> > and flash-kernel uses [ -n "$DPKG_MAINTSCRIPT_PACKAGE" ] when deciding
> > to defer to a trigger. So the invocations of flash-kernel
> > via /etc/initramfs/post-update.d/flash-kernel end up never being
> > deferred.
> > 
> > I don't think initramfs-tools is wrong to do this per se, but it does
> > mean that anything hooked off the post-update.d hooks cannot reliably
> > use triggers (dpkg-trigger uses $DPKG_MAINTSCRIPT_PACKAGE itself).
> > 
> > flash-kernel itself does something similar, but rather than manipulating
> > DPKG_MAINTSCRIPT_PACKAGE it sets FLASH_KERNEL_NOTRIGGER=1 and
> > keys off that.
> > 
> > It seems like the best solution would be a patch to switch initramfs-tools
> > to a similar scheme; would such a patch be accepted?
> > 
> > If not then I will arrange for /etc/initramfs/post-update.d/flash-kernel
> > to signal to f-k somehow that triggers should be used despite the lack
> > of DPKG_MAINTSCRIPT_PACKAGE.
> > 
> > Ian.
> > 
> > 
> 
> 
> 
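
To make the interaction quoted above concrete, here is a rough sketch of the
decision. It is not the actual flash-kernel code: decide is a hypothetical
function, and the precedence between the two variables is an assumption.
Only the two variable names and the [ -n ... ] test come from this thread.

```shell
#!/bin/sh
# Hypothetical sketch: print "defer" when the work should be left to a
# dpkg trigger and "run" when it should happen immediately.
decide() {
    if [ -n "$FLASH_KERNEL_NOTRIGGER" ]; then
        echo run        # explicit opt-out, independent of dpkg's variable
    elif [ -n "$DPKG_MAINTSCRIPT_PACKAGE" ]; then
        echo defer      # looks like we are inside a maintainer script
    else
        echo run        # variable empty or unset: nothing to defer to
    fi
}

unset FLASH_KERNEL_NOTRIGGER
# Called from an ordinary postinst: the variable names the package.
( DPKG_MAINTSCRIPT_PACKAGE=initramfs-tools; decide )   # prints "defer"
# Called after initramfs-tools has blanked the variable.
( DPKG_MAINTSCRIPT_PACKAGE=''; decide )                # prints "run"
```

The second call is the case that bit here: with the variable blanked, the
hook cannot tell it is running under dpkg, so every initramfs regeneration
also writes to flash.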

--- End Message ---
