
Bug#1111240: cloud.debian.org: generic trixie amd64 qcow2 image unable to boot



Control: tags -1 - moreinfo
Control: reassign -1 src:grub2

On Mon, Aug 18, 2025 at 11:21:59AM +0200, allddd wrote:
> The options you see in the log are mostly defaults; the config I use is pretty bare-bones. I copy the qcow2 image from the cache to the designated folder,
> resize it, and use the Ansible Jinja2 template below to define the VM:
> 
> # --- #
> <domain type='kvm'>
> 
>   <name>{{ vm_num }}</name>
>   <title>{{ vm_fqdn }}</title>
>   <os>
>     <type>hvm</type>
>     <boot dev='hd'/>
>   </os>
>   <pm>
>     <suspend-to-mem enabled='no'/>
>     <suspend-to-disk enabled='no'/>
>   </pm>
>   <features>
>     <acpi/>
>   </features>
> 
>   <!-- cpu -->
>   <vcpu>{{ vm_cpu }}</vcpu>
> 
>   <!-- ram -->
>   <memory unit='GiB'>{{ vm_ram }}</memory>
>   <currentMemory unit='GiB'>{{ vm_ram }}</currentMemory>
> 
>   <devices>
>     <emulator>/usr/bin/qemu-system-x86_64</emulator>
> 
>     <!-- image -->>
>     <disk type='file' device='disk'>
>       <driver name='qemu' type='qcow2'/>
>       <source file='{{ vm_dir }}/{{ vm_num }}/{{ vm_img }}'/>
>       <target dev='vda' bus='virtio'/>
>     </disk>
> 
>     <!-- cloud-init -->>
>     <disk type="file" device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <source file='{{ vm_cloudinit_data }}'/>
>       <target dev='vdb' bus='sata'/>
>       <readonly/>
>     </disk>
> 
>     <!-- network -->>
>     <interface type='bridge'>
>       <mac address='{{ vm_mac }}'/>
>       <source bridge='br0'/>
>       <target dev='vmif{{ vm_num }}'/>
>       <model type='virtio'/>
>     </interface>
> 
>     <!-- console -->>
>     <console type='pty'>
>       <target type='serial' port='0'/>
>     </console>
> 
>     <!-- guest-agent -->>
>     <channel type='unix'>
>       <target type='virtio' name='org.qemu.guest_agent.0'/>
>     </channel>
> 
>   </devices>
> </domain>
> # --- #
> 
> Once the VM is defined, I enable autostart, start the VM, and wait until it’s
> reachable.

I believe I've been able to reproduce the problem.  I'm not sure if the
issue is with qemu or with grub2, but we'll start with the latter.

I've created a script based on the qemu invocation from your log (see
attached) that reproduces this failure with our "nocloud" images and
doesn't rely on all the file descriptors that your ansible stuff sets
up.  When we run the script, we're shown a grub boot menu, but attempts
to boot a kernel fail and drop back to the menu.

If we then modify grub.cfg in the VM image to replace "terminal_output
gfxterm serial" with "terminal_output serial", things work as expected.

Use something like the following to edit the image's grub config:
sudo kpartx -av rootfs.raw
sudo mount /dev/mapper/loop0p1 /mnt
sudo vim /mnt/boot/grub/grub.cfg
sudo umount /mnt
sudo kpartx -dv rootfs.raw
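
If you'd rather not edit the file by hand, the same change can be
scripted.  This is only a sketch: the function name is made up for
illustration, it assumes GNU sed, and it assumes grub.cfg contains the
"terminal_output gfxterm serial" line exactly as quoted above.  Run it
as root against the mounted image, between the mount and umount steps.

```shell
# Hypothetical helper: switch grub's terminal_output from
# "gfxterm serial" to "serial" only, in the grub.cfg given as $1.
switch_to_serial_only() {
    cfg="$1"
    # Bail out if the expected line is missing, so we never report
    # success without having changed anything.
    grep -q 'terminal_output gfxterm serial' "$cfg" || return 1
    # In-place edit (GNU sed).
    sed -i 's/terminal_output gfxterm serial/terminal_output serial/' "$cfg"
}

# Example (as root, with the image mounted on /mnt):
#   switch_to_serial_only /mnt/boot/grub/grub.cfg
```

Note that grub.cfg is generated by grub-mkconfig, so a change like this
is overwritten the next time update-grub runs inside the VM.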

The other thing I've observed is that if we remove most of the device
specification from your qemu invocation and stick with its default
device model, then the images boot as configured.  So it's possible
that there's either a bug with qemu's emulation of the specific serial
devices you've configured or with grub's support of that hardware.
Whatever serial port qemu sets up when run without -nodefaults does not
trigger this failure.

noah

Attachment: repro.sh
Description: Bourne shell script

