
Re: We might have the GPU instances able to run



On 31 May 2014 22:38, Tomasz Rybak <tomasz.rybak@post.pl> wrote:
On Mon, 2014-05-26 at 23:46 +0200, Anders Ingemann wrote:
> On 26 May 2014 20:47, Tomasz Rybak <tomasz.rybak@post.pl> wrote:
>         Hello list.
>         I've been playing with building AWS images and trying to run them
>         on GPU instances. It seems like I was able to create an AMI
>         which runs on cg1.4xlarge. It is ami-90dd35f8 on us-east-1
>         with Debian Wheezy.
>         I built it using the attached manifest. It seems like I forgot
>         to add the "admin_user" plugin, so you log in as root. Can
>         anyone confirm that the lack of an admin user is caused
>         by not calling the plugin and not by some other mistake?
>
>         I was able to install NVIDIA drivers and AMD OpenCL
>         provider. I was also able to install and run
>         PyOpenCL and PyCUDA, see attached info.
>
>         I built this AMI using code from my repository
>         https://github.com/rybaktomasz/bootstrap-vz.git
>         branch grub-mapping, which contains code
>         heavily influenced by Mike Christopher's
>         PR https://github.com/andsens/bootstrap-vz/pull/35
>
>         So the task for now is to clean up this code and
>         work out why the devices are mapped badly
>         in the chroot. I'll look into it at the
>         end of the week, and will let you know if I discover
>         something.
[ cut ]
>
>
> Great job Tomasz!
>
> The reason I didn't accept the PR was because it basically undid the
> "link_dm_node" step, so really it shouldn't be called at all then.
> Can you confirm that?

Yes, I can confirm that it is undoing changes done by dmsetup create.

I do not fully understand what's going on during normal execution,
so please point out what I am missing.
Just before GRUB installation, remount is called on the main volume.
It unmounts all partitions from that volume, then calls unmap,
then calls link_dm_node which calls dmsetup, then calls map,
and mounts the volume. It all somehow works with _before_* and _after_*
methods, and gets dispatched using the events array, but I do not see
all the interdependencies here.
I did not see any dmsetup call in the logs though.


When I disable the code from the PR I get:
[253401.851177] INFO: Installing grub
[253402.403116] DEBUG: Executing: readlink -f /dev/xvdg
[253522.482157] DEBUG: /dev/xvdg
[253623.136997] DEBUG: Executing: chroot /target/f63adbf5/root grub-install /dev/xvdg
[254886.157036] ERROR: /usr/sbin/grub-probe: error: cannot find a GRUB drive for /dev/mapper/xvdg1.  Check your device.map.
[254887.318134] ERROR: Auto-detection of a filesystem of /dev/mapper/xvdg1 failed.
[254887.449026] ERROR: Try with --recheck.
[254887.75897] ERROR: If the problem persists please report this together with the output of "/usr/sbin/grub-probe --device-map="/boot/grub/device.map" --target=fs -v /boot/grub" to <bug-grub@gnu.org>
[254965.008974] ERROR: Command 'chroot /target/f63adbf5/root grub-install /dev/xvdg' returned non-zero exit status 1
Traceback (most recent call last):
  File "/home/admin/bootstrap-vz/bootstrapvz/base/main.py", line 78, in run
    tasklist.run(info=bootstrap_info, dry_run=opts['--dry-run'])
  File "/home/admin/bootstrap-vz/bootstrapvz/base/tasklist.py", line 38, in run
    task.run(info)
  File "/home/admin/bootstrap-vz/bootstrapvz/common/tasks/boot.py", line 114, in run
    raise e
CalledProcessError: Command 'chroot /target/f63adbf5/root grub-install /dev/xvdg' returned non-zero exit status 1
[254966.309071] ERROR: Rolling back

device.map contains:
(hd0) /dev/xvdg
(hd0,msdos1) /dev/mapper/xvdg1
and /dev/mapper/xvdg1 links to /dev/dm-1
(dm-0 is taken by a previous experiment).
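
For anyone who wants to inspect such a file programmatically, here is a minimal sketch of reading the two-column device.map format shown above into a dictionary. It is an illustration only, not code from bootstrap-vz; the function name parse_device_map is hypothetical, and the sample text is copied from the listing above.

```python
def parse_device_map(text):
    """Parse GRUB device.map text into {grub_name: device_path}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        # Each entry is "(name) /path/to/device", separated by whitespace.
        grub_name, device = line.split(None, 1)
        mapping[grub_name.strip('()')] = device
    return mapping


sample = """(hd0) /dev/xvdg
(hd0,msdos1) /dev/mapper/xvdg1
"""
entries = parse_device_map(sample)
```

Here entries maps "hd0" to /dev/xvdg and "hd0,msdos1" to /dev/mapper/xvdg1, which makes it easy to compare what device.map says against what the symlinks in /dev/mapper actually point to.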

I am not yet sure how to fix it. As an experiment, I changed
the code to generate device.map as:
(hd0) /dev/xvdg
(hd0,msdos1) /dev/xvdg1
but it failed with the same error as before.
I do not know why grub was still trying to use /dev/mapper/xvdg1
when device.map listed /dev/xvdg1.

Basically, the only difference between failure and
success is the link target of /dev/mapper/xvd*1.
If it points to dm-*, grub-install fails.
If it points to xvd*1, grub-install succeeds.
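
A quick way to check which of the two cases applies is to resolve the symlink and look at the name of its target. The sketch below is only an illustration under that assumption; mapper_link_kind is a hypothetical helper, not part of bootstrap-vz, and the demonstration uses throwaway symlinks in a temporary directory instead of real /dev nodes.

```python
import os
import tempfile


def mapper_link_kind(path):
    """Classify a /dev/mapper-style entry by its symlink target.

    Returns 'dm' if the link resolves to a dm-* node, 'partition' if it
    resolves to anything else, and None if path is not a symlink.
    """
    if not os.path.islink(path):
        return None
    target = os.path.basename(os.path.realpath(path))
    return 'dm' if target.startswith('dm-') else 'partition'


# Demonstration with dangling symlinks that mimic the two link targets.
demo_dir = tempfile.mkdtemp()
dm_style = os.path.join(demo_dir, 'xvdg1-dm')
os.symlink('dm-1', dm_style)
plain_style = os.path.join(demo_dir, 'xvdg1-plain')
os.symlink('xvdg1', plain_style)
kinds = (mapper_link_kind(dm_style), mapper_link_kind(plain_style))
```

On a real build host the call would be mapper_link_kind('/dev/mapper/xvdg1'); a result of 'dm' corresponds to the failing case described above.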

Best regards.

--
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak



> I do not fully understand what's going on during normal execution,
> so please point out what I am missing.

> Just before GRUB installation, remount is called on the main volume.
> It unmounts all partitions from that volume, then calls unmap, then calls link_dm_node which calls dmsetup, then calls map, and mounts the volume.

Exactly. For NBDs and loopback volumes, this is necessary when using grub 1.99, because grub tries to be smart about things and then trips over its own feet.
To avoid that, we fool grub into thinking that the mounted volume and its partitions are just on a normal HDD.

> It all somehow works with _before_* and _after_* methods, and gets dispatched using the events array, but I do not see all the interdependencies here.

There is a custom class FSMProxy, which proxies calls on the class into the fsm class. This is mostly done because I didn't code all the state machines for volume and partition handling in one go. I'm sure it can be simplified, but I just haven't gotten around to it yet.
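
To make the general pattern concrete, here is a toy sketch of a state machine whose events are looked up in an events list and wrapped with optional _before_* and _after_* hooks. It is not the actual FSMProxy or volume code from bootstrap-vz; the class names HookedMachine and Volume, the two events, and the state names are all made up for illustration.

```python
class HookedMachine(object):
    """Toy state machine: each event is (name, source, destination)."""

    events = [
        ('unmap', 'mapped', 'unmapped'),
        ('map', 'unmapped', 'mapped'),
    ]

    def __init__(self, initial):
        self.state = initial
        self.log = []

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails,
        # which is the case for event names such as "unmap".
        for event, src, dst in self.events:
            if event == name:
                return lambda: self._fire(event, src, dst)
        raise AttributeError(name)

    def _fire(self, event, src, dst):
        if self.state != src:
            raise RuntimeError('cannot %s from state %s' % (event, self.state))
        # Hooks are optional; missing ones are simply skipped.
        before = getattr(self, '_before_' + event, None)
        if before is not None:
            before()
        self.state = dst
        after = getattr(self, '_after_' + event, None)
        if after is not None:
            after()


class Volume(HookedMachine):
    def _before_unmap(self):
        self.log.append('before unmap')

    def _after_map(self):
        self.log.append('after map')


vol = Volume('mapped')
vol.unmap()
vol.map()
```

Calling vol.unmap() and then vol.map() runs the hooks around each transition, so a remount-style sequence is just a series of such event calls, each triggering whatever hooks the subclass defines.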

> I did not see any dmsetup call in logs though.

I'm sure you just overlooked them, they're there somewhere :-)

> Basically, the only difference between failure and
> success is the link target of /dev/mapper/xvd*1.
> If it points to dm-*, grub-install fails.
> If it points to xvd*1, grub-install succeeds.

You can achieve that without undoing the links. Just remove this line (and the one further down that remounts without the links).
