
Re: Server upgrade to wheezy failed -- now trying just to boot new kernel.



On Sat, 29 Sep 2012 11:18:39 +0100, Brian wrote:

> On Fri 28 Sep 2012 at 19:50:22 +0000, Hendrik Boom wrote:
> 
>> My server upgrade from squeeze to wheezy just failed.  But I'm not
>> panicking, I can still dual-boot into a back-up squeeze partition, and
>> squeeze still works perfectly.
>> 
>> I just upgraded my server from squeeze to wheezy.  Lots of packages
>> failed to upgrade because of dependency problems.  Now it's normal to
>> have a few like this in a testing system, as packages leak through from
>> sid, so I wasn't too worried about this -- normally just wait a few
>> days and the missing dependencies show up.
>> 
>> But enough are missing that wheezy is not really usable.
>> 
>> It fails to recognise any network interfaces.  It used to recognise an
>> eth0, an eth1, and a ppp0, but now ifconfig reports nothing.  Of
>> course, this might not even be the fault of the missing packages. 
>> Maybe udev is wrong.  Yes, I started the upgrade with the kernel and
>> udev.  They should match.
>> 
>> I'm not sure where to start looking.
>> 
>> apt-get dist-upgrade just reports a lot of unresolved dependencies. 
>> I'm not clear what to do next.  apt-get suggests using apt-get -f
>> install. But which packages do I do this to?  Or do I misunderstand?
> 
> The idea is that 'apt-get -f install' by itself should sort out missing
> dependencies for a package. It will first look in
> /var/cache/apt/archives for them and next download from the mirror you
> are using if they are not there. The latter looks like a problem for you
> if there are no network interfaces :).

It's probably still worth a try.  Presumably many of the .debs it needs 
are already in the package cache.
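
To see what is actually sitting in the cache before trying, something
like the following might do.  It is only a sketch: it assumes the usual
name_version_arch.deb file naming, with the ':' of an epoch stored as
'%3a', and the function names are made up for illustration.

```python
from pathlib import Path
from urllib.parse import unquote


def parse_deb_filename(filename):
    """Split 'name_version_arch.deb' into (name, version, arch), or None."""
    if not filename.endswith(".deb"):
        return None
    # The cache stores an epoch's ':' as '%3a', so decode before splitting.
    parts = unquote(filename[: -len(".deb")]).split("_")
    if len(parts) != 3:
        return None
    return tuple(parts)


def cached_packages(archive_dir="/var/cache/apt/archives"):
    """Map package name -> list of versions found as .deb files in the cache."""
    found = {}
    for deb in Path(archive_dir).glob("*.deb"):
        parsed = parse_deb_filename(deb.name)
        if parsed is None:
            continue  # skip anything that does not follow the naming scheme
        name, version, _arch = parsed
        found.setdefault(name, []).append(version)
    return found


if __name__ == "__main__":
    for name, versions in sorted(cached_packages().items()):
        print(name, ", ".join(sorted(versions)))
```

This only lists what is cached; it does not say whether those versions
are the ones the broken packages need.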

But everything will be easier if I first get networking up.

> You probably should look at this
> first and see what the kernel is getting up to. The output of 'dmesg'
> after booting might help.

Indeed, dmesg does mention udev, but nothing near that mention is an 
obvious error.

It immediately recognises the ethernet driver on the motherboard:

[    1.161749] Floppy drive(s): fd0 is 1.44M
[    1.188574] FDC 0 is a post-1991 82077
[    1.196843] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
[    1.197239] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 21
[    1.197281]   alloc irq_desc for 21 on node 0
[    1.197283]   alloc kstat_irqs on node 0
[    1.197296] forcedeth 0000:00:14.0: PCI INT A -> Link[LMAC] -> GSI 21 (level, low) -> IRQ 21
[    1.197348] forcedeth 0000:00:14.0: setting latency timer to 64
[    1.197399] nv_probe: set workaround bit for reversed mac addr
[    1.200313] SCSI subsystem initialized
[    1.222134] libata version 3.00 loaded.


and later, has more to say about it:

[    1.249518] ata4: SATA max UDMA/100 host m128@0xfaaff800 port 0xfaaf6000 irq 19
[    1.249826] 8139cp 0000:04:08.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip, use 8139too
[    1.717092] forcedeth 0000:00:14.0: ifname eth0, PHY OUI 0x732 @ 1, addr 00:13:d4:fd:e7:8d
[    1.717146] forcedeth 0000:00:14.0: highdma pwrctl lnktim desc-v3
[    1.717430] pata_amd 0000:00:0d.0: version 0.4.1
[    1.717476] pata_amd 0000:00:0d.0: setting latency timer to 64


The other ethernet interface (not on the motherboard) is recognised later:

[   10.357886] scsi 5:0:1:0: CD-ROM            HL-DT-ST DVDRAM GSA-4167B DL11 PQ: 0 ANSI: 5
[   10.668030] ata8: SATA link down (SStatus 0 SControl 300)
[   10.673335] 8139too Fast Ethernet driver 0.9.28
[   10.688995] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 18
[   10.689041]   alloc irq_desc for 18 on node 0
[   10.689044]   alloc kstat_irqs on node 0
[   10.689057] 8139too 0000:04:08.0: PCI INT A -> Link[LNKA] -> GSI 18 (level, low) -> IRQ 18
[   10.690119] eth1: RealTek RTL8139 at 0xffffc90010af2c00, 00:40:f4:27:a6:5c, IRQ 18
[   10.706493] sd 0:0:0:0: [sda] 1465149168 512-byte logical blocks: (750 GB/698 GiB)
[   10.706594] sd 0:0:0:0: [sda] Write Protect is off
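
Picking the interface lines out of a long dmesg by eye is tedious, so a
small filter can help.  This is a minimal sketch, not a proper parser:
it assumes the interfaces are named ethN, and the function name is made
up for illustration.  The sample text is taken from the excerpts above.

```python
import re

# Whole-word ethN names only; "forcedeth" and "ethernet" do not match.
IFACE_RE = re.compile(r"\b(eth\d+)\b")


def interfaces_in_dmesg(text):
    """Return the sorted, de-duplicated ethN names mentioned in dmesg text."""
    return sorted(set(IFACE_RE.findall(text)))


SAMPLE = """\
[    1.196843] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
[    1.717092] forcedeth 0000:00:14.0: ifname eth0, PHY OUI 0x732 @ 1, addr 00:13:d4:fd:e7:8d
[   10.673335] 8139too Fast Ethernet driver 0.9.28
[   10.690119] eth1: RealTek RTL8139 at 0xffffc90010af2c00, 00:40:f4:27:a6:5c, IRQ 18
"""

if __name__ == "__main__":
    print(interfaces_in_dmesg(SAMPLE))  # -> ['eth0', 'eth1']
```

If the kernel names both interfaces like this, it may be worth
comparing against 'ifconfig -a', since plain 'ifconfig' lists only the
interfaces that are up.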

udev is again mentioned much later, but this time it's presumably the 
udev daemon, udevd:

[   12.253367] EXT3-fs: mounted filesystem with ordered data mode.
[   14.187800] <30>udevd[588]: starting version 175
[   14.608550] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input3


But I think you're right.  The problem may still be related to udev.

What I suspected was the problem is that my lilo.conf was not configured 
to boot the very latest kernel -- it booted one of the 2.6.x kernels 
instead of the 3.0.x kernels, but udev was the latest version.

But no.  I thought I had booted the new kernel with the new udev during 
the upgrade, but I can't actually have done that, for the reasons I ran 
into when I tried to fix this mismatch just this morning.  So the old 
kernel with the new udev must have booted well enough to continue the 
upgrade, and something else, presumably some package-installation 
failures, would have to be responsible for my woes.

Anyway, this is what I did and figured out this morning:

Correcting lilo.conf and running squeeze's lilo resulted in complaints 
that something was too big and would overwrite the bootloader.  
Presumably what it has to load for 3.0 is just too big for it.

By the way, the nice thing about LILO on a floppy is that you can have as 
many floppies as you want.  It's quite safe to experiment on a new floppy 
knowing that the one you usually boot from is safe outside the machine.

I tried running os-prober, which seemed to find the kernel, and then 
grub-install.  Although os-prober found the new system (as evidenced by 
its console messages), it failed to put anything involving the 
up-to-date kernel into grub.cfg, so that was not helpful.  Or is there 
something else I have to do between os-prober and grub-install?

I did manage to use LILO to boot the new system with the wrong kernel 
again (and no net), but it took forever to get around to giving me a 
text console.  It seemed to be waiting ages for rpcbind to work.

In the meantime, I went to the helpful gdm screen, and chose its text 
console option, and ran os-prober and grub-install there.  grub 
complained that my core was too big for the embedding region, and that it 
had to be installed there for a cross-device boot.  Yes, the new system 
is on an LVM partition on a RAID that's on disks that are not the disk 
that the machine boots from.
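
For a rough feel of why the core might not fit: on an MBR-partitioned 
disk the embedding region is the gap between the boot sector and the 
start of the first partition.  The sketch below just does that 
arithmetic.  It assumes 512-byte sectors, the start sectors and the 
core size are hypothetical examples rather than values from my disks, 
and grub's own check may be stricter.

```python
SECTOR_BYTES = 512  # assumed sector size


def embedding_gap_bytes(first_partition_start_sector):
    """Bytes between the MBR (sector 0) and the first partition's start."""
    return (first_partition_start_sector - 1) * SECTOR_BYTES


def core_fits(core_img_bytes, first_partition_start_sector):
    """True if a core image of the given size fits in the post-MBR gap."""
    return core_img_bytes <= embedding_gap_bytes(first_partition_start_sector)


if __name__ == "__main__":
    hypothetical_core = 40000  # bytes; made-up size for illustration
    for start in (63, 2048):   # two common first-partition start sectors
        print(start, embedding_gap_bytes(start),
              core_fits(hypothetical_core, start))
    # 63   -> 31744 bytes, False
    # 2048 -> 1048064 bytes, True
```

A core image that has to carry RAID and LVM support is larger than a 
plain one, which would be consistent with the complaint if the boot 
disk's first partition starts at sector 63.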

Any advice how to get further?  Presumably there are more extreme boot 
setups that would work.

By the way, the disk the machine now boots from is due to be removed 
after everything is running nicely on the new drive, but not before!

Perhaps, though, the thing to do is to wipe the wheezy partitions clean, 
and restart the whole upgrade with a fresh copy of squeeze, this time 
*not* rebooting before I do the apt-get -f install, now that I know what 
it means.

-- hendrik

