
Re: jessie won't install/boot on a Dell Poweredge R815



My posting has not appeared on debian-{boot,kernel,user}. I think it is
because of the attachments. I have removed them. I'll send the screen images
to people individually if they request them.
-------------------------------------------------------------------------------
I am cross-posting this to debian-{boot,kernel,user}. I had replied to a reply
to my original post on debian-{boot,kernel} with a to: to the replier and a
cc: to debian-{boot,kernel}, but apparently it didn't get posted. So I am
reposting this there. And I am posting this on debian-user to provide more
information to all of the responders to my post there. My original post was
short, just to raise the issue. This post is longer, to provide all of the
details that I have.

Thanks to everyone for your help.

Some background. I have 23 machines.

 11 Dell T5500            each has 4 disks
  4 HP DL165              each has 3 disks
  4 Dell Poweredge R815   each has 6 disks
  4 Dell Poweredge C6145  each has 4 disks

All were purchased around 2011. All have been running wheezy reliably for
years and running squeeze reliably for years before that. The initial install
about 5 years ago was squeeze, with the squeeze installer. And then a
dist-upgrade to wheezy a few years later.

All machines within a class have the same hardware and have their disks
partitioned identically. The disks were partitioned at the time of the initial
install of squeeze about five years ago by the squeeze installer. All the
machines have SATA disks but different classes of machines have different
numbers of disks of different sizes. The disks on the T5500s and C6145s are
the same.

Dell T5500
  sd[a-d]1 md0 RAID1 ext4 /
  sd[a-d]2 md1 RAID5 ext4 /aux
  sd[a-d]3 swap
DL165
  sd[a-c]1 md0 RAID1 ext3 /
  sd[a-c]2 md1 RAID5 ext3 /aux
  sd[a-c]3 swap
R815
  sd[a-f]1 md0 RAID1 ext3 /
  sd[a-f]2 md1 RAID5 ext3 /aux
  sd[a-f]3 swap
C6145
  sd[a-d]1 md0 RAID1 ext3 /
  sd[a-d]2 md1 RAID5 ext3 /aux
  sd[a-d]3 swap

The reason that the T5500s have ext4 and the others do not is that the
machines were purchased at slightly different times, and ext4 had become
available by the time the T5500s were installed.

I first tried to do a dist-upgrade from wheezy to jessie on one machine of
each class. But the dist-upgrade hung on 3 of the 4 machine types. I didn't
save the details from that. But what I decided to do was a fresh install on
one machine of each class.  That fresh install succeeded on the T5500, the
DL165, and the C6145. So I upgraded all of the T5500s, all of the DL165s, and
all of the C6145s with a fresh install of jessie. That was successful. There
was (and still is) a minor issue with the C6145s. I will discuss that
later. But the attempted fresh install to one R815 has not been successful.

For the fresh installs, I am using the jessie installer on USB, built as
described below. I attempt to preserve the existing disk partitioning. I also
attempt to preserve the existing md1 /aux. These are my long-term data storage
and collectively have about 100 terabytes of data. I reformat md0 /, keeping
it as ext3 on the DL165s, R815s, and C6145s and keeping it as ext4 on the
T5500s.

On the R815, I first tried to do a fresh install from USB. (That was after the
unsuccessful attempt at a dist-upgrade from a wheezy installation that had
been running for years.) I tried that about 8 times, all unsuccessful. But it
fails in slightly different ways each time. That nondeterministic behavior,
described below, leads me to believe that there is a bug. After that, I tried
unsuccessfully to boot from a live wheezy. (See my other posts to
debian-user.) After that, I was successful in doing a fresh install of wheezy.
That install was a minimal install. I did nothing but the fresh install from
USB and I deselected all of the options for additional software to install.
After that minimal install of wheezy, all I did was:

  nano /etc/apt/sources.list
  (change all wheezy to jessie)
  apt-get update
  apt-get dist-upgrade
  (answer default to all questions)
  /sbin/reboot
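For reference, the sources.list edit in the steps above can be scripted rather than done by hand in nano. This is a minimal sketch, assuming every occurrence of the old release name in the file should be replaced; the function name `switch_release` and the `.bak` backup suffix are illustrative choices, not part of the original procedure.

```shell
#!/bin/sh
# switch_release FILE OLD NEW
# Replace every occurrence of OLD with NEW in FILE, keeping the
# original as FILE.bak. (Hypothetical helper name, for illustration.)
switch_release() {
    sed -i.bak "s/$2/$3/g" "$1"
}

# Usage, as root, followed by the same commands as above:
#   switch_release /etc/apt/sources.list wheezy jessie
#   apt-get update
#   apt-get dist-upgrade
```

Keeping the backup makes it easy to diff the two files before running apt-get update.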

The dist-upgrade did not complain and did not give any errors. But upon
reboot, it entered the initramfs. A screen picture is enclosed below.

I am only posting the part below because it has not previously been posted. To
the readers of debian-user, there have been posts to debian-{boot,kernel}
that may answer some of your questions and provide more information. I am not
reposting those. Likewise, to the readers of debian-{boot,kernel}, there have
been posts to debian-user that may answer some of your questions and provide
more information. I am not reposting those.

   From: deloptes <deloptes@gmail.com>
   I failed today to upgrade wheezy to jessie on raided system as well.

Please note that all of the above systems have / as md0 RAID1. The fresh
install of jessie was successful on all but the R815s.
--------------------------------------------------------------------------------
   >     Then it fails to reboot and goes into the initramfs. I have a picture of
   >     the screen if anybody wishes.

   Yes please.  Also please use the 'rescue' boot option which enables
   more verbose logging to the screen.

Thanks for your help.

Here is a screen picture.

This is after (a) a fresh install of wheezy followed by (b) an apt-get
dist-upgrade to jessie followed by (c) /sbin/reboot.

The above picture was taken before your email. I have since reinstalled a
fresh wheezy. I can redo the apt-get dist-upgrade to jessie and reboot with
the rescue boot option and take a new picture if you wish. But before I do so,
please let me know what else you would like me to do as part of the same
experiment. The experiment will take several hours (including the subsequent
reinstall of a fresh wheezy). So let's maximize the amount of information gain
with this experiment.

I conjecture that the jessie kernel has difficulty accessing the MD array on
disk. The same problem occurs when I attempt a direct fresh install of jessie
with the installer.

The machine has six disks, all ST9500530NS SATA. These have about 500GB each.
They all are partitioned identically with three partitions. sd[a-f]1 is RAID1
md0 ext3 mounted as /. sd[a-f]2 is RAID5 md1 ext3  mounted as /aux. sd[a-f]3
is swap.

Enclosed below is the output of fdisk on one disk. It is not from the
particular machine in question because that machine is not currently on the
net and I am offsite. But it is from another R815 purchased at the same time
that is running wheezy. All six disks on all four R815s are partitioned
identically. I partitioned them only once when I did a fresh install of
squeeze (with the squeeze installer) when I purchased the machines in about
2011.

When I fresh install either wheezy or jessie, I keep md1 and reformat
md0. When I apt-get dist-upgrade from wheezy to jessie, there is no reformat.

Here is what happens that is strange. When I do a fresh install of jessie, one
of the first things that the installer does is probe for hardware to try to
find the ISO. I have done this about 10 times. Sometimes (about 3 or 4) it
succeeds in finding the ISO. Sometimes (the rest) it comes up with a red
screen and claims that it can't find the ISO. In all cases, I am booting the
installer from the same USB dongle with the same data on it. I made the dongle
as follows:

   # cd /tmp
   # wget http://ftp.nl.debian.org/debian/dists/jessie/main/installer-amd64/current/images/hd-media/boot.img.gz
   # wget http://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/8.5.0+nonfree/amd64/iso-cd/firmware-8.5.0-amd64-netinst.iso
   # zcat boot.img.gz >/dev/sdf
   # mount /dev/sdf /mnt
   # cp firmware-8.5.0-amd64-netinst.iso /mnt/.
   # umount /mnt

(I actually have two such dongles, identical brand and size, with identical
data installed on them by the above. Sometimes I use one and sometimes the
other.)
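One way to rule out a bad copy on the dongle is to compare the ISO on it against the downloaded file. This is a minimal sketch; the helper name `same_file` is illustrative, and the paths in the commented usage assume the dongle is still /dev/sdf and the ISO is still in /tmp, as in the commands above. It would not by itself explain why the installer finds the ISO only some of the time.

```shell
#!/bin/sh
# same_file A B
# Succeed only if A and B are byte-for-byte identical.
# (Hypothetical helper name, for illustration.)
same_file() {
    cmp -s "$1" "$2"
}

# Usage, as root, after the copy and umount above:
#   mount -o ro /dev/sdf /mnt
#   if same_file /tmp/firmware-8.5.0-amd64-netinst.iso \
#                /mnt/firmware-8.5.0-amd64-netinst.iso; then
#       echo "ISO on dongle matches the download"
#   else
#       echo "ISO on dongle differs from the download"
#   fi
#   umount /mnt
```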

When it does find the ISO, it proceeds through the entire install without
issue until it gets to installing grub. Below are the answers that I give to
the installer. Somewhere in there, I forget exactly where but before the
network configuration, it asks which network device to use. The R815 has 4
identical ethernet ports. I select:

eth0: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet

When it gets to installing grub, I switch to ctrl-alt-f2 and type

cat /proc/mdstat

Every time so far, md1 has all 6 components. But md0 has only some of the
components, sometimes 5/6, sometimes 4/6, and sometimes 1/6. And every time it
is a different set of components, even though, just a few minutes earlier, I
was running wheezy and md0 had all 6 components. I do

mdadm /dev/md0 --add <each of the missing components one by one>

but it refuses. I forget the error. If I redo a fresh wheezy install after a
failed jessie install, I get to the same place and do the same thing and it
does successfully add the missing components. I wait about half an hour and
the array is successfully rebuilt. I then do

chroot target
grub-install /dev/sda
...
grub-install /dev/sdf

and it works. But if I attempt the grub-install in the jessie installer it
refuses. I forget the error.
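Since the exact errors were not recorded, a small script can at least capture which arrays are degraded at that point. This is a sketch, assuming the usual /proc/mdstat layout in which each array's status line carries a `[total/active]` pair such as `[6/2]`; the function name `degraded_arrays` is illustrative, and it needs awk, which may not be available in the installer's shell, so it may have to be run against a saved copy of the file.

```shell
#!/bin/sh
# degraded_arrays [FILE]
# Print "NAME ACTIVE/TOTAL" for every md array in FILE (default
# /proc/mdstat) that has fewer active members than configured.
# (Hypothetical helper name, for illustration.)
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # Status lines contain "[total/active]", e.g. "[6/2]"
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, n[2] "/" n[1]
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   degraded_arrays                  # reads /proc/mdstat
#   degraded_arrays saved-mdstat.txt # reads a saved copy
```

Alongside this, `mdadm --detail /dev/md0` and `mdadm --examine /dev/sda1` (repeated for each member) print the array state and the per-member superblock, which would be worth saving together with the exact text of the refused `--add`.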

In the jessie installer, no matter what I try, md0 has missing components, I
can't add them, and I can't install grub. If I go back to ctrl-alt-f1, it asks
what device to install grub to. I select sda. And I get a red screen that says
something like

Unable to install GRUB in /dev/sda
Executing 'grub-install /dev/sda' failed.
This is a fatal error.

If I look at ctrl-alt-f4, there are messages about being unable to read block
2048 or 2052 or 2056 on /dev/sd[a-f]. But there is no hardware problem, because
right after this, I redo a fresh reinstall of wheezy from USB, rebuild md0 as
part of the process, install grub on all 6 drives as part of the process, and
everything works.

It is not just the jessie installer. If I do a fresh install of wheezy and get
a fully working wheezy with all six components of md0 and grub installed on
all 6 drives, and all I do is an apt-get dist-upgrade to jessie, I get no
errors during the upgrade. And after the upgrade, before reboot, all 6
components of md0 are there. (That is still running the wheezy kernel.) All I
do is /sbin/reboot and then it comes up in the initramfs. And if I then do a
fresh reinstall of wheezy, I need to rebuild md0.

So it seems to me that something in the jessie kernel is broken, probably
related to the disk driver.

Also note that I upgraded to the latest BIOS. But the same exact problems
occurred both before the BIOS upgrade and after.

   booting jessie also takes hours to do systemd
   > configuration of the network

FYI, here is a screen picture where it takes minutes for systemd to bring up
the network. Note that I am not using DHCP. As per the enclosed, each host has
a fixed IPv4 address. There are fixed DNS servers. I am at a university and IT
services maintains the network for thousands of machines. I do not observe
issues bringing up the network when running wheezy.

    Jeff (http://engineering.purdue.edu/~qobi)
--------------------------------------------------------------------------------
default Install
default English
default United States
default American English
Go Back
default Configure network manually
128.46.115.211
default netmask
default gateway
128.210.11.57 128.210.11.5 128.46.154.76
default hostname
default domain name
root password
root password
Jeffrey Mark Siskind
qobi
password
password
default Eastern
Manual
RAID1 #1
Ext3 journaling file system
Format the partition: yes, format it
Mount point: /
Done setting up the partition
RAID5 #1
Ext3 journaling file system
default Format the partition: no, keep existing data
Mount point: /aux
Done setting up the partition
Finish partitioning and write changes to disk
Yes
default United States
default ftp.us.debian.org
default blank
Yes
uncheck all
Yes
/dev/sda
Continue
-------------------------------------------------------------------------------
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000080

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    78319615    39158784   fd  Linux raid autodetect
/dev/sda2        78319616   859570175   390625280   fd  Linux raid autodetect
/dev/sda3       859570176   976771071    58600448   82  Linux swap / Solaris
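As a cross-check on the table above, the Blocks column (in KiB) follows from the Start and End sectors at 512 bytes per sector. The sketch below recomputes it and also estimates the raw capacity of the two arrays, assuming six members each and ignoring md superblock and filesystem overhead; the function name `part_kib` is illustrative.

```shell
#!/bin/sh
# part_kib START END
# Size in KiB of a partition spanning 512-byte sectors START..END
# inclusive. (Hypothetical helper name, for illustration.)
part_kib() {
    echo $(( ($2 - $1 + 1) * 512 / 1024 ))
}

sda1=$(part_kib 2048 78319615)        # 39158784
sda2=$(part_kib 78319616 859570175)   # 390625280
sda3=$(part_kib 859570176 976771071)  # 58600448

echo "sda1: $sda1 KiB, sda2: $sda2 KiB, sda3: $sda3 KiB"
# RAID1 over 6 members holds one member's worth of data.
echo "md0 (RAID1, 6 members): about $sda1 KiB"
# RAID5 over 6 members holds 5 members' worth of data.
echo "md1 (RAID5, 6 members): about $(( 5 * sda2 )) KiB"
```

The three computed sizes agree with the Blocks column in the fdisk output.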

