Bug#558686: Partition manager fails to update kernel partition table

Hi Frans, 

Thanks for your investigation into this problem. I am impressed!

On Mon, Nov 30, 2009 at 03:58:31AM +0100, Frans Pop wrote:
> On Sunday 29 November 2009, Torsten Landschoff wrote:
> What can be seen from your logs is that you're creating an LVM on RAID 
> setup using manual partitioning. The error occurs during the *first* time 
> partman tries to commit changes to disk.


> I've just spent about 4 hours trying to reproduce the error in Virtualbox. 
> AFAICT I've succeeded in reconstructing exactly what you did in partman.
> Here's what I did to reproduce your actions.
> <snip>
> Starting position:
> Disk a: msdos disklabel
> - primary ext4 partition 1
> - logical swap partition 5
> - free space at end of disk
> Disk b: no disklabel

Looks correct. I don't know exactly which setup I used for Ubuntu. I did the
partitioning manually since I did not want it to create a 1TB+ filesystem, which
I expected would take quite some time.

I think I had /dev/sda1 (20 GB, ext4) and /dev/sda2 (16 GB, swap).

> Start partman
> Choose: Guided LVM, but Go Back immediately

Right, I wanted to check if it suggests some RAID1 setup.

> Choose: Manual
> Select disk a and create new disklabel
> Select disk b and create new disklabel
> Create xGB primary partition on disk a (different size than existing 
> partition 1)
> - use as RAID
> - delete partition

I first wanted to use the whole disk as LVM on RAID, but figured that having
a separate /boot partition would be a good idea.

> Create xGB primary partition on disk a
> - change mountpoint and select /boot
> - change type to ext2
> - mark bootable
> - done
> Select just created partition
> - use as RAID
> - done

... to have another boot partition on /dev/sdb.

> Create xGB primary partition on disk b
> - change mountpoint, but Go Back immediately
> - use as RAID
> - done
> Create yGB primary partition on disk a (leave some free space)
> - use as RAID
> - done
> Create yGB primary partition on disk b
> - use as RAID
> - done
> Choose: Configure RAID
> - Accept to commit changes
> => for me: success; for you: error message
> </snip>

Quite close to my setup.

> I *can* reproduce the error by manually activating swap on /dev/hda5 from a 
> debug shell just before starting partman, except that it complains about
> /dev/hda1 instead of /dev/sda2.

Does the installer use any swap partition it finds by default? I did not enable
swap (hardly needed with that much RAM), but wasn't sure whether d-i might
auto-configure it when it finds a swap partition.
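
Whether swap is active can be read from /proc/swaps. A minimal sketch of
such a check, assuming a system where Python is available; the function name
active_swap is made up for illustration and is not part of d-i or partman:

```python
from pathlib import Path


def active_swap(text=None):
    """Return the swap devices/files the kernel currently has enabled.

    `text` lets the caller pass the contents of /proc/swaps directly
    (useful for testing); by default the real file is read.
    """
    if text is None:
        text = Path("/proc/swaps").read_text()
    # The first line is the column header (Filename, Type, Size, Used, Priority).
    rows = text.splitlines()[1:]
    # The first whitespace-separated field of each row is the device or file.
    return [row.split()[0] for row in rows if row.strip()]


if __name__ == "__main__":
    for name in active_swap():
        print(name)
```

An empty result means no swap is enabled, which is what 'free' would
report as a swap total of 0.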

> If I then switch to a debug shell and do 'fdisk -l /dev/hda', I see that - 
> despite the error message - the partition table *has* been changed, and if 
> I check 'free' I see that swap is disabled. So AFAICT your action to write 
> the partition table again from fdisk was probably redundant.

I don't think so. I checked /proc/partitions in the shell (I did not know about
fdisk -l) to see whether it matched my expectations. It showed the new setup
for /dev/sdb but not for /dev/sda.
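
The kernel's view of a disk can be extracted from /proc/partitions and then
compared by hand with what fdisk -l reads from the disk itself. A minimal
sketch, assuming a system where Python is available and sdX-style device
names; the function name kernel_partitions is made up for illustration:

```python
from pathlib import Path


def kernel_partitions(disk, text=None):
    """Return {name: size in 1 KiB blocks} for `disk` and its partitions,
    as currently known to the kernel.

    `text` lets the caller pass the contents of /proc/partitions directly
    (useful for testing); by default the real file is read.
    """
    if text is None:
        text = Path("/proc/partitions").read_text()
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields (major, minor, #blocks, name) and start
        # with a number; this skips the header row and the blank line.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        blocks, name = fields[2], fields[3]
        suffix = name[len(disk):]
        # Assumption: partitions are named <disk><number>, e.g. sda1.
        if name.startswith(disk) and (suffix == "" or suffix.isdigit()):
            result[name] = int(blocks)
    return result


if __name__ == "__main__":
    for name, blocks in sorted(kernel_partitions("sda").items()):
        print(name, blocks)
```

If the partitions listed here differ from those fdisk -l shows for the same
disk, the kernel has not re-read the partition table.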

> Questions:
> - did you do anything special or manually in the early part of the
>   installation (before the start of partitioning)?

Nothing special. I think I dropped out of the standard sequence because I
tried to get English texts with a German keyboard. I know this is bad for
the localization, but I often find myself translating back to English to be
able to understand German messages.

> - did you do anything special or manually during partitioning before
>   the error occurred?


> - does my reconstruction above match what you did, or was there anything
>   different?

AFAIR, your reconstruction matches my steps, apart from partition sizes.

> Please think carefully: this is a subtle issue, details are essential.
> As already requested, please send the syslog of the installation! You can 
> find it under /var/log/installer on the installed system.

I sent it to the installation report to have it all in one place.

> Some wild theories:
> 1) this is a libparted bug; somehow it manages to confuse itself about the
>    state of the disk (busy or not busy)

So far this is my guess. I think you showed that it is not (3). I would think
that (2) also does not apply, since fdisk immediately got the tables reloaded.
Perhaps I should try a few more times to see if this is reliable. I don't
see how (4) could apply - the ext4 on /dev/sda1 was not mounted and what else
should keep the disk busy? An MD or LVM device, sure, but nothing like that
was configured.

Greetings, Torsten
