Re: Connot load Wheezy in a "virgin" desktop -- long

To: Ken Heard <kenslists@teksavvy.com>
Cc: Debian-user <debian-user@lists.debian.org>
Subject: Re: Connot load Wheezy in a "virgin" desktop -- long
From: Henning Follmann <hfollmann@itcfollmann.com>
Date: Tue, 7 Jan 2014 12:55:06 -0500
Message-id: <20140107175506.GA23874@newton.itcfollmann.com>
In-reply-to: <CA+QnjnjFzNm3KwaCJp+O-gXeuwrA7f1dn2tF+S+=D=O9HM=joQ@mail.gmail.com>
References: <CA+QnjnjFzNm3KwaCJp+O-gXeuwrA7f1dn2tF+S+=D=O9HM=joQ@mail.gmail.com>
On Mon, Jan 06, 2014 at 06:25:24PM -0500, Ken Heard wrote:
> I apologize in advance for the length of this post.  Since however I do not
> know what information is necessary to determine why this installation
> failed I am including everything which I have the least suspicion may be
> contributing to the failure.
> 
> To begin I describe the essentials of my recently purchased custom designed
> desktop.  It consists of a Gigabyte GA-Z87N mainboard, the Z87N being one
> of Intel's latest Haswell chipsets.  Integrated in that processor is
> Intel's HD graphics 4600, the mainboard having the required graphics
> outputs.  Two 2 terabyte hard drives are designated as a RAID1.
> 
> I started out with the intention of  installing only the basic system,
> thereby shortening  reinstalment processes if for any reason I needed to
> start the installer over again -- such as to change the partitioning.  At
> this stage for example I tried to install UEFI.  Since I was unsuccessful I
> reverted to the standard BIOS.  My experience trying to install UEFI will
> be the subject of another thread.
> 
> Since Dec 31 when I started this process I have lost count of the number of
> false starts.  My guess is that since then I ended up initiating the
> installer about fifteen times.
> 

So did you try running the netinst CD with just the options the installer
suggests? Basically just pressing Enter and providing a root password plus
a user name and password for the user?
Did that fail? And how exactly?

> When I thought everything was ready to install the desktop environment (DE)
> and other things I tried to do so manually by installing xserver-xorg and
> xfce meta packages.  This method was unsuccessful. Examination of log files
> indicated that much was missing, including whatever package contains
> startx.  (Running startx in a virtual terminal returned the message
> "command not found".)

Again, what do you mean by "everything was ready"?
Did you have a running system? Did it boot to a console?
If you want to install a desktop environment it is always best to do that
either with tasksel or by installing one of the meta packages:
    apt-get install xfce4
should install all you need.
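
A minimal sketch of how one might check for startx from the console,
assuming a POSIX shell. The helper name have_cmd is made up for this
example. startx is shipped in the xinit package, so naming it alongside
xfce4 should be harmless even if it is already pulled in.

```shell
# Return success if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Ken's "command not found" suggests startx was never installed.
if have_cmd startx; then
    echo "startx found"
else
    echo "startx missing; as root try: apt-get install xfce4 xinit"
fi
```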


> 
> I then decided to allow the installer to install the DE and other selected
> tasks; to do so however required initiating the installer once again -- in
> fact twice.  First I set up in the partition section of the installer with
> xfs file type in most of my partitions with lilo as the boot loader, but
> the installer would not install lilo.  (Does xfs still require lilo instead
> of grub?)
> 
> The second time I used the ext4 file type with grub as the boot loader.
> The boot process aborted with a  "failed" message.  To see what would
> happen  I typed Control-D to resume the boot.  In due course the xfce login
> window appeared.  After entering my user name and password the monitor went
> blank.  At this point the computer was totally unresponsive to any input.
> I had to reboot.
>

Again, please be precise with your error messages:
At what point did it abort with the "failed" message? What exactly failed?
Some amd64 systems seem to have an issue with grub-efi.
Changing to legacy BIOS should solve that issue.


 
> This boot ran to the point where the line containing Control-D appeared.
> This time I entered the root password in order to examine the output of the
> dmesg command, various log files and the information which scrolls by on
> the monitor while the OS is loading -- and perhaps glean some idea of why
> the DE would not load.

You keep repeating "DE would not load". What exactly do you mean by that?
Does the computer boot? Did you try to start X manually, and did that fail?

> 
> The first problem encountered was that something was detecting and enabling
> the Logitech USB optical mouse many times, even to the point of
> interrupting other commands and their outputs.  I solved that problem -- at
> least temporarily -- by disconnecting it.  In a virtual terminal it is
> anyway not needed.
> 
> The first part of the dmesg output which scares me reads as follows.
> -------------------------------------------------------------------------
> [    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.2.0-4-amd64
> root=/dev/mapper/TG1-root ro single
> [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [    0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
> [    0.000000] Checking aperture...
> [    0.000000] No AGP bridge found
> [    0.000000] Calgary: detecting Calgary via BIOS EBDA area
> [    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: at
> /build/linux-rrsxby/linux-3.2.51/drivers/iommu/dmar.c:492
> warn_invalid_dmar+0x77/0x85()
> [    0.000000] Hardware name: Z87N-WIFI
> [    0.000000] Your BIOS is broken; DMAR reported at address 0!
> [    0.000000] BIOS vendor: American Megatrends Inc.; Ver: F4; Product
> Version: To be filled by O.E.M.
> [    0.000000] Modules linked in:
> [    0.000000] Pid: 0, comm: swapper Not tainted 3.2.0-4-amd64 #1 Debian
> 3.2.51-1
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff81046cbd>] ? warn_slowpath_common+0x78/0x8c
> [    0.000000]  [<ffffffff81046d1f>] ? warn_slowpath_fmt_taint+0x3d/0x42
> [    0.000000]  [<ffffffff81348632>] ? __pte+0x7/0x8
> [    0.000000]  [<ffffffff816bde4c>] ? __early_set_fixmap+0x89/0x8d
> [    0.000000]  [<ffffffff812757e5>] ? warn_invalid_dmar+0x77/0x85
> [    0.000000]  [<ffffffff816df6d4>] ? check_zero_address+0xad/0xdc
> [    0.000000]  [<ffffffff81358b26>] ? bad_to_user+0x620/0x620
> [    0.000000]  [<ffffffff816df714>] ? detect_intel_iommu+0x11/0xaf
> [    0.000000]  [<ffffffff816b1e38>] ? pci_iommu_alloc+0x3f/0x67
> [    0.000000]  [<ffffffff816bdc2b>] ? mem_init+0x14/0xe5
> [    0.000000]  [<ffffffff816ab94e>] ? start_kernel+0x1d0/0x3c3
> [    0.000000]  [<ffffffff816ab140>] ? early_idt_handlers+0x140/0x140
> [    0.000000]  [<ffffffff816ab3c4>] ? x86_64_start_kernel+0x104/0x111
> [    0.000000] ---[ end trace 01021c3814caad1d ]---
> ----------------------------------------------------------------------------
> The first thing that scares me is the line: " Your BIOS is broken; DMAR
> reported at address 0!"  After some online research I discovered that this
> phenomenon also occurs in other mainboard brands besides Gigabyte.  Three
> possibilities of removing it were mentioned in posts online: disabling
> VT-d, turning off iommu or updating the BIOS.

What you see is not scary! The kernel complains that your BIOS is broken
(talk to Gigabyte about that), but it deals with it and moves on.
Everything you googled is about how people dealt with that error message
and tried to make it disappear. The BIOS is still broken either way.
If you are not planning to run a hypervisor (Xen, KVM, etc.) you can
disable VT. It does not affect you.


> 
> The closest thing to VT-d I could find in the BIOS was something called
> "Intel Virtualization Technology".  I am not sure what that means or what
> it does; disabling it however made no difference; so I reënabled it.  I did
> not try turning off iommu or updating the BIOS.
> 
> So I suppose the real questions at this point are the following.  What
> purpose does this file serve?   Is the invalidity of the DMAR referred to
> in the "WARNING" line above sufficient to cause the DE not to load?
> 
> The other part of the dmesg output which concerns me follows.
> ---------------------------------------------------------------------------
> [    1.240960]  sdb: sdb1 sdb2
> [    1.241103] sd 1:0:0:0: [sdb] Attached SCSI disk
> [    1.260609]  sda: sda1 sda2
> [    1.260755] sd 0:0:0:0: [sda] Attached SCSI disk
> [    1.593645] md: md0 stopped.
> [    1.594503] md: bind<sdb1>
> [    1.594659] md: bind<sda1>
> [    1.595242] md: raid1 personality registered for level 1
> [    1.595394] bio: create slab <bio-1> at 1
> [    1.595484] md/raid1:md0: active with 2 out of 2 mirrors
> [    1.595541] md0: detected capacity change from 0 to 248315904
> [    1.596423]  md0: unknown partition table
> [    1.683228] Refined TSC clocksource calibration: 3392.144 MHz.
> [    1.683278] Switching to clocksource tsc
> [    1.797451] md: md1 stopped.
> [    1.797959] md: bind<sdb2>
> [    1.798118] md: bind<sda2>
> [    1.798620] md/raid1:md1: not clean -- starting background reconstruction
> [    1.798673] md/raid1:md1: active with 2 out of 2 mirrors
> [    1.798731] md1: detected capacity change from 0 to 1499865088000
> [    1.806447]  md1: unknown partition table
> [    1.999928] device-mapper: uevent: version 1.0.3
> [    2.000006] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised:
> dm-devel@redhat.com
> [    2.195467] EXT4-fs (dm-0): INFO: recovery required on readonly
> filesystem
> [    2.195518] EXT4-fs (dm-0): write access will be enabled during recovery
> [    2.263170] md: resync of RAID array md1
> [    2.263216] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [    2.263264] md: using maximum available idle IO bandwidth (but not more
> than 200000 KB/sec) for resync.
> [    2.263330] md: using 128k window, over a total of 1464712000k.
> [    2.277910] EXT4-fs (dm-0): recovery complete
> [    2.319337] EXT4-fs (dm-0): mounted filesystem with ordered data mode.
> Opts: (null)
> --------------------------------------------------------------------------
> The lines above which I do not understand are 1.596423 and 1.806447, both
> of which say that the system is not aware of partition tables for md0 and
> md1.  Both are part of a RAID1; md0 contains only the /boot
> partition, which happens to be empty because the boot loader is in the MBR;
> and md1 is the only physical volume in LVM volume group TH1.  All the other
> partitions are logical volumes in that volume group.
> 
> The following quote neither comes from the output of dmesg nor is part of
> syslog.  Instead it appears at the end of the information which scrolls by
> on the screen as part of the boot process.
> --------------------------------------------------------------------------
> [ ok ] setting up LVM Volume Groups ... done.
> [ .... ] Starting remaining crypto disks .... [info] TG1-swap_crypt
> (starting) ... TG1 -swap_crypt (started) ... TG1-swap_crypt (running) ...
> [info] TG1-tmp_crypt (starting) ...
> [  ok  mp_crypt (started ) ... done.  {sic}
> [ ok ] Activating lvm and md swap ... done.
> [....]  Checking file systems ... fsck from util-linux 2.20.1
> fsck.ext4: Unable to resolve 'UUID=a5fdb692-2b34-4e18-8fd5-c24dde957071'
> fsck.ext4: No such file or directory while trying to open
> /dev/mapper/TH1-ken
> Possibly non-existent device?
> fsck.ext4: No such file or directory while trying to open
> /dev/mapper/TH1-martin
> Possibly non-existent device?
> fsck.ext2: No such file or directory while trying to open
> /dev/mapper/TH1-tmp_crypt
> Possibly non-existent device?
> fsck.ext4: No such file or directory while trying to open
> /dev/mapper/TH1-var
> Possibly non-existent device?
> fsck died with exit status 8
> failed (code 8).  {code 8 means "an operational error"  -- my comment.}
> [....]  File system check failed.  A log is being saved in
> /var/log/fsck.checkfs if
> [FAIL] the location is writable.  Please repair the file system manually.
> ... failed!
> [....] A maintenance shell will now be started.  CONTROL-D will terminate
> this [warning] shell and resume system boot. ... (warning).
> Give root password for maintenance
> (or type Control-D to continue):
> ----------------------------------------------------------------------------
> I am reasonably certain that this failure is the main -- possibly the only
> -- reason for failure of the boot process to complete and install the DE.
> I am also at a loss as to how to fix it.  The /etc/fstab file shows those
> four partitions -- with file type ext4 -- are mounted in accordance with
> the partitions created during installation.  The output of command blkid
> also shows correctly the same information.  In maintenance mode I was able
> to access all the "failed" mount points and write files to them.
> 

Well, your LVM is messed up. You are supposed to run fsck on those volumes
and then restart.
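
One possible way to see what fsck is tripping over, sketched under some
assumptions: the function name missing_fstab_devices is made up for this
example, and it only looks at fstab entries written as /dev/mapper/...
paths (not UUID= or LABEL= entries). Your quoted output mentions both TG1
(kernel command line, crypto disks) and TH1 (the paths fsck cannot open),
so it may be worth checking whether fstab names a volume group that does
not exist, unless that is just a typo in the post.

```shell
# Print every /dev/mapper/... device named in an fstab-style file
# that does not currently exist. Defaults to /etc/fstab.
missing_fstab_devices() {
    fstab="${1:-/etc/fstab}"
    # Field 1 is the device; skip comment lines, keep /dev/mapper paths only.
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/mapper\// { print $1 }' "$fstab" |
    while read -r dev; do
        [ -e "$dev" ] || printf '%s\n' "$dev"
    done
}

# Run against the real fstab only when it is present and readable.
if [ -r /etc/fstab ]; then
    missing_fstab_devices /etc/fstab
fi
```

Anything it prints is a device that fstab expects but the system does not
provide. As root, vgs and lvs list the volume groups and logical volumes
that really exist. Once the names in /etc/fstab match, running fsck -f on
each unmounted volume and then rebooting is the step described above.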


> I would appreciate any advice from anybody out there as to how to make this
> computer operational.  Once again I apologize for the length of this post.
>

Boot from the netinst CD. Accept only the defaults whenever it asks anything.
When it boots, learn and try to understand before changing anything.
If it doesn't boot, ask again with the exact error messages and the point
where it failed.
-H

-- 
Henning Follmann           | hfollmann@itcfollmann.com