
Re: We might have the GPU instances able to run



Warning - long mail!

On Sun, 2014-06-01 at 21:24 +0800, James Bromberger wrote:

> > OK, I was using Debian unstable with GRUB 2.02 so I should repeat
> > tests with GRUB 1.99.
> 
> *nod* This is what I was talking about with you Tomasz and others -
> but I never figured out what it was in Grub 1.99 that was broken. If
> we can get a patch to grub (and I have already mentioned this to Colin
> Watson), then perhaps we merge that in the Grub Debian package.
> 
> In the mean time, I couldn't get a good with 1.99 to install boot
> blocks as a workaround (using all kinds of hardlinks to trick Grub
> 1.99)...
> 
I ran some experiments, trying to build HVM-compatible images
with GRUB.
First, I built two AMIs, Wheezy and unstable, using the following
manifests:
https://s3-eu-west-1.amazonaws.com/debian-pygpgpu/manifests/wheezy.manifest.json
https://s3-eu-west-1.amazonaws.com/debian-pygpgpu/manifests/unstable.manifest.json

Then I ran those on m3.medium instances, and on them I tried to build
AMIs for Wheezy and unstable using the following manifests:
https://s3-eu-west-1.amazonaws.com/debian-pygpgpu/manifests/wheezy.hvm.manifest.json
https://s3-eu-west-1.amazonaws.com/debian-pygpgpu/manifests/unstable.hvm.manifest.json

I tried three configurations, all from the grub-mapping branch
of my repository
https://github.com/rybaktomasz/bootstrap-vz.git
The first was a direct clone of that branch.
The second was the same branch with the changes to
bootstrapvz/common/tasks/filesystem.py removed - basically without
changing the files in /dev/mapper/ (without "fixed mapping" below).
The third was like the second, but I also removed the mapping
from InstallGrub in bootstrapvz/common/tasks/boot.py
("without mapping" below).
Its run() method became:
    log_check_call(['chroot', info.root,
               'grub-install', info.volume.device_path])
    log_check_call(['chroot', info.root, 'update-grub'])



Building AMIs on the Wheezy instance

The only successful build of a Wheezy AMI (GRUB 1.99
on the instance and GRUB 1.99 on the target) was with the
unmodified grub-mapping code. All other attempts (without
"fixed" mapping and without mapping) failed with this message:
INFO: Installing grub
DEBUG: Executing: chroot /target/99839bf4/root grub-install /dev
ERROR: /usr/sbin/grub-probe: error: cannot find a GRUB drive for
ERROR: Auto-detection of a filesystem of /dev/mapper/xvdf1 faile
ERROR: Try with --recheck.
ERROR: If the problem persists please report this together with 
ERROR: Command 'chroot /target/99839bf4/root grub-install /dev/x
Traceback (most recent call last):
  File "/home/admin/bootstrap-vz/bootstrapvz/base/main.py", line 78, in run
    tasklist.run(info=bootstrap_info, dry_run=opts['--dry-run'])
  File "/home/admin/bootstrap-vz/bootstrapvz/base/tasklist.py", line 38, in run
    task.run(info)
  File "/home/admin/bootstrap-vz/bootstrapvz/common/tasks/boot.py", line 92, in 
    'grub-install', info.volume.device_path])
  File "/home/admin/bootstrap-vz/bootstrapvz/common/tools.py", line 5, in log_ch
    raise CalledProcessError(status, ' '.join(command), '\n'.join(stderr))
CalledProcessError: Command 'chroot /target/99839bf4/root grub-install /dev/xvdf
ERROR: Rolling back
DEBUG: Tasklist:
        bootstrapvz.common.tasks.filesystem.UnmountRoot
        bootstrapvz.common.tasks.partitioning.UnmapPartitions
        bootstrapvz.common.tasks.volume.Detach
        bootstrapvz.common.tasks.filesystem.DeleteMountDir
        bootstrapvz.common.tasks.volume.Delete
        bootstrapvz.common.tasks.workspace.DeleteWorkspace
INFO: Unmounting the bootstrap volume

OTOH all attempts to build unstable (i.e. GRUB 1.99 on the instance,
GRUB 2.02 on the target) succeeded. Interestingly,
I got the following warning in two of them: when building unstable
from the grub-mapping branch and when building without "fixed mapping".

Installing grub
Installing for i386-pc platform.
grub-install: warning: the device.map entry `hd0,msdos1' is invalid.
Ignoring it. Please correct or delete your device.map.
Installation finished. No error reported.
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
Generating grub configuration file ...
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
Found linux image: /boot/vmlinuz-3.14-1-amd64
Found initrd image: /boot/initrd.img-3.14-1-amd64
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
/usr/sbin/grub-probe: warning: the device.map entry `hd0,msdos1' is
invalid. Ignoring it. Please correct or delete your device.map.
done
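
The warning above suggests that GRUB 2.02 rejects device.map
entries that name a partition (`hd0,msdos1') rather than a whole disk.
A minimal sketch of a pre-check, assuming the usual "(drive) /dev/path"
line format. The function name is hypothetical and not part of
bootstrap-vz:

```python
import re

# Matches lines such as "(hd0) /dev/xvdf"; this line format is assumed.
_ENTRY = re.compile(r'^\s*\(([^)]+)\)\s+(\S+)')


def partition_entries(device_map_text):
    """Return (drive, path) pairs whose drive name includes a partition.

    A comma in the drive name (e.g. "hd0,msdos1") is treated as the sign
    of a partition entry; this mirrors the warning, not GRUB's own code.
    """
    suspicious = []
    for line in device_map_text.splitlines():
        if line.lstrip().startswith('#'):
            continue  # skip comment lines
        match = _ENTRY.match(line)
        if match and ',' in match.group(1):
            suspicious.append((match.group(1), match.group(2)))
    return suspicious
```

Running this on the generated device.map before grub-install would show
which entries GRUB 2.02 is going to ignore.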


Unfortunately I was not able to run any of the AMIs I created on Wheezy.
I tried to run them on cg1.4xlarge
and none of them started. The System Log contained the following lines:
[   32.742532]  xvda: xvda1
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin:
Running /scripts/local-top ... done.
Begin: Waiting for root file system ... done.
Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/mapper/xvdf1 does not exist.  Dropping to a shell!

Depending on the method used to create the AMI, the kernel
was looking for a different device:
/dev/xvdf1 for grub-mapping branch
/dev/mapper/xvdf1 for other changes (no mapping or no "fixed mapping")



Building AMIs on the unstable instance

The situation was the same as on Wheezy: building Wheezy
succeeded with the grub-mapping branch and failed without the
"mapping fix".
Building unstable succeeded in two attempts, giving
the same warning when building from grub-mapping
and without the "mapping fix".

As for running, I was able to run every AMI that
I was able to build.

The system log contained the following lines regarding disk devices:
[   32.879425]  xvda: xvda1
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
[   33.207243] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.

INIT: version 2.88 booting

[info] Using makefile-style concurrent boot in runlevel S.
hostname: the specified hostname is invalid


So the next step in these experiments is to try
changing the root= parameter in GRUB. Is it safe
to pass /dev/xvda1 there, or should we get this information
from the machine we are building the AMI on? The latter
would mean that we must build on EC2, and might cause
problems when we build HVM on PV or vice versa.
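
To make the idea concrete, here is a minimal sketch of the substitution
itself, assuming a grub.cfg-style text in which kernel lines start with
"linux". The function name is hypothetical and not part of bootstrap-vz,
and since update-grub regenerates grub.cfg, this only illustrates
the rewrite and is not a complete fix:

```python
import re

# Matches the root= argument on a kernel command line (not rootdelay=).
_ROOT_ARG = re.compile(r'\broot=\S+')


def set_root_device(grub_cfg_text, root_device):
    """Replace root=... on every 'linux' line of a grub.cfg-style text."""
    out = []
    for line in grub_cfg_text.splitlines(True):  # True keeps line endings
        if line.lstrip().startswith('linux'):
            # A callable replacement avoids escape handling in root_device.
            line = _ROOT_ARG.sub(lambda m: 'root=' + root_device, line)
        out.append(line)
    return ''.join(out)
```

For example, set_root_device(cfg, '/dev/xvda1') would turn
root=/dev/mapper/xvdf1 into root=/dev/xvda1 and leave other lines alone.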

While looking at the created AMIs I noticed another way to get
this information. There is a parameter called
"Root Device Name" with the value "/dev/xvda". I'll look
into the EC2 documentation to see when it is available
and how we can obtain it to pass to GRUB.
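
A minimal sketch of such a lookup, assuming a boto3-style EC2 client
whose describe_images() response carries a RootDeviceName field.
The function name is hypothetical. Two further assumptions: the value
names a whole disk (as with "/dev/xvda" above), and the guest kernel
uses the same name, which is not guaranteed:

```python
def root_partition_for_image(ec2_client, image_id, partition=1):
    """Look up an AMI's root device name and append a partition number.

    ec2_client is any object with a boto3-style describe_images() method.
    """
    response = ec2_client.describe_images(ImageIds=[image_id])
    root_device = response['Images'][0]['RootDeviceName']  # e.g. '/dev/xvda'
    # Only valid when root_device is a whole disk, not already a partition.
    return '%s%d' % (root_device, partition)
```

Note that this value exists only for a registered AMI, so it may
be available too late to help while the image is still being built.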

At the same time I'm a bit lost. It seems that
mixing GRUBs (one from the instance and one from the target)
is not the wisest idea - which might make bootstrapping
new releases (Jessie?) more difficult.
If I understand the situation correctly,
GRUB on the target determines which combinations
of devices, mappings, and configurations
are allowed. GRUB on the host determines whether
the created AMI will even start on HVM EC2. GRUB 1.99
on the host means no HVM for us ;-|


There are still a few issues that I do not understand.
1. We do not map the volume, only its partitions;
/dev/mapper contains xvdf1, but not xvdf.
Is this important?

2. _before_link_dm_node() in bootstrapvz/base/fs/volume.py
checks /dev/mapper/vd*, not /dev/mapper/xvd*.
Why, and what's the difference between xvda and vda?
Is it some EC2-specific change, or is there some deeper
meaning?

Best regards.

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
