Bug#304267: RC3 netinst fails at boot on ia64

reassign 304267 kernel-image-2.6.8-2-itanium-smp

On Wed, Apr 13, 2005 at 12:51:18AM +0100, Steve McIntyre wrote:
>On Tue, Apr 12, 2005 at 04:09:13PM -0400, Joey Hess wrote:
>>Steve McIntyre wrote:
>>> ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 19 (level, low) -> IRQ 52
>>> GSI 56 (level, low) -> CPU 1 (0x0300) vector 53
>>> ACPI: PCI interrupt 0000:01:0f.0[A] -> GSI 56 (level, low) -> IRQ 53
>>> GSI 57 (level, low) -> CPU 0 (0x0000) vector 54
>>> ACPI: PCI interrupt 0000:02:0f.0[A] -> GSI 57 (level, low) -> IRQ 54
>>> GSI 55 (level, low) -> CPU 1 (0x0300) vector 55
>>> ACPI: PCI interrupt 0000:03:00.0[A] -> GSI 55 (level, low) -> IRQ 55
>>> perfmon: version 2.0 IRQ 238
>>> perfmon: Itanium PMU detected, 14 PMCs, 18 PMDs, 4 counters (32 bits)
>>> in all cases it stops dead and will not respond any further. It sounds
>>> like the hardware is resetting at this point - the IDE floppy and the
>>> CD drive both make a churning noise similar to that at initial POST,
>>> and the VGA display blanks.
>>FWIW, here's the same bit of the (net)boot sequence of d-i with 2.6 on
>>my xp1000:
>>PCI->APIC IRQ transform: (20:01.0 INTA) -> CPU 0x0000 vector 54
>>PCI->APIC IRQ transform: (20:01.1 INTB) -> CPU 0x0000 vector 55
>>PCI->APIC IRQ transform: (20:02.0 INTA) -> CPU 0x0000 vector 56
>>PCI->APIC IRQ transform: (80:00.0 INTA) -> CPU 0x0000 vector 57
>>HWP0001 SBA at 0xfed00000; pci dev 00:1e.0
>>HWP0001 IOC at 0xfed01000; pci dev 00:1d.0
>>HWP0002 PCI LBA _BBN 0x00 at 0xfed20000; pci dev 00:1c.0
>>HWP0002 PCI LBA _BBN 0x20 at 0xfed22000; pci dev 20:1e.0
>>HWP0002 PCI LBA _BBN 0x40 at 0xfed24000; pci dev 40:1e.0
>>HWP0002 PCI LBA _BBN 0x60 at 0xfed26000; pci dev 60:1e.0
>>HWP0002 PCI LBA _BBN 0xc0 at 0xfed2c000; pci dev c0:1e.0
>>HWP0003 AGP LBA _BBN 0x80 at 0xfed28000; pci dev 80:1e.0
>>IOC: reserving 512Mb of IOVA space at 0x60000000 for agpgart
>>IOC: zx1 2.2 HPA 0xfed01000 IOVA space 1024Mb at 0x40000000
>>Linux NET4.0 for Linux 2.4
>>Based upon Swansea University Computer Society NET3.039
>>Initializing RT netlink socket
>>perfmon: version 1.5 IRQ 238
>>perfmon: 14 PMCs, 18 PMDs, 4 counters (32 bits)
>>PAL Information Facility v0.5
>>EFI Variables Facility v0.06 2002-Dec-10
>>Total HugeTLB memory allocated, 0
>>Starting kswapd
>>VFS: Disk quotas vdquot_6.5.1
>>Hugetlbfs mounted.
>>devfs: v1.12c (20020818) Richard Gooch (rgooch@atnf.csiro.au)
>>devfs: boot_options: 0x0
>>Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
>>This is well before init starts, so it looks like entirely a kernel
>>problem, I think it should be reassigned to an appropriate kernel package.
>That's fair, yes. I managed to get woody installed last night and I
>upgraded the system to sarge from there successfully. The system runs
>fine with the woody kernel (2.4.17-mckinley-smp), but all of the sarge
>kernels I've tried fail in a similar way during boot:
>kernel-image-2.4.27-2-itanium-smp  2.4.27-7
>kernel-image-2.4.27-2-mckinley-smp 2.4.27-7
>kernel-image-2.6.8-2-itanium-smp   2.6.8-12
>kernel-image-2.6.8-2-mckinley-smp  2.6.8-12
>which seems odd. The (woody) 2.4.17-mckinley-smp kernel is what I'm
>stuck with for now.

Quite a way on now...

I'm reassigning this to the itanium-smp 2.6 kernel in sarge, as that's
the kernel I've finally found the time to debug. All of the ia64
kernels in sarge fail to boot on the SMP HP i2000 workstation I have.
The machine used to be caballero.d.o; I don't know much about its
history, so I can't say whether it's a standard configuration.

What I _have_ found by trial and error with printk is that the lockup
I'm seeing (in 2.6.8 at least, 2.4.27 may be different) is somewhere
in (or under) ia64_mca_late_init(). By simply adding an early return
from that function (before the #ifdef CONFIG_ACPI block), I can make
the kernel boot fine. I don't know very much about
ia64_mca_late_init() - I'm assuming/hoping it's nothing to do with
Micro Channel Architecture!
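
For anyone wanting to repeat the experiment, the approach can be
sketched roughly as below. This is a user-space mock, not the real
kernel source: mock_late_init(), setup_before_acpi() and
setup_acpi_block() are hypothetical stand-ins, printf() stands in for
printk(), and the exact position of the early return relative to the
rest of the function body is an assumption.

```c
#include <stdio.h>

/* Counts how many setup steps actually ran (for illustration only). */
static int steps_run;

/* Hypothetical stand-in for whatever runs before the ACPI block. */
static void setup_before_acpi(void)
{
    steps_run++;
}

/* Hypothetical stand-in for the code inside #ifdef CONFIG_ACPI. */
static void setup_acpi_block(void)
{
    steps_run++;
}

/*
 * Mock of an init function carrying a debugging early return.
 * In the kernel the trace lines would be printk() calls and the
 * early return would simply be patched in while debugging.
 */
static int mock_late_init(int bail_out_early)
{
    printf("mock_late_init: entered\n");
    setup_before_acpi();

    if (bail_out_early) {
        /* Assumed placement: just before the ACPI-only code. */
        printf("mock_late_init: returning before ACPI block\n");
        return 0;
    }

    setup_acpi_block();
    printf("mock_late_init: finished\n");
    return 0;
}
```

Calling mock_late_init(1) skips the second step, while
mock_late_init(0) runs both. Moving the early return (or the trace
lines) up and down the function narrows down which step hangs.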

