Re: QEMU-KVM VMs sometime freeze when I run them for a couple of days



buz.hrach@seznam.cz wrote: 
> Hi Debian people ;-),
> 
> After having some issues with Fedora last year I decided to reinstall all my servers with Debian 10. I'm super happy with Debian except for one recurring issue with my QEMU-KVM hosts. It is very difficult to reproduce, so I would like to discuss it first before I open a new bug. Could you please discuss it with me? ;-)
> 
> I noticed that when I run VMs for a long period of time (a couple of days), one or more VMs quite often get stuck. It is not possible to connect to the stuck VMs using virt-manager and their serial consoles don't respond.


First question: when they are just a few minutes old, does the
serial console work?


> It is not possible to shut them down ("virsh shutdown vm"). Sometimes the stuck VMs can be powered off ("virsh destroy vm"), but in most cases "virsh destroy" doesn't work either. In that case the only thing to do is to shut down the rest of the running VMs (the ones that do respond) and reboot the host.

Second question: when the VMs are a few minutes old, does virsh
shutdown work?

> When I reboot/shutdown the host the reboot/shutdown takes approx. 30min.
> 
> This is what it looks like during the reboot / shutdown:
> ~~~
> [   ***] (1 of 4) A stop job is running for /dev/dm-1 (18min 6s / no limit)

You probably want to change that stop-job timeout from "no limit"
to 1 minute or so.

> As I mentioned it is very difficult to reproduce it since it takes days to get into that situation. VMs that are more likely to get stuck are VMs that:
> 
> a) have larger virtual disks
> b) use storage more intensively (more IOPS)
> c) have more vCPUs
> 
> The problem is that VMs with larger disks usually use more IOPS and also have more vCPUs, so it is difficult to say what exactly the issue is. Based on my testing I think that fewer vCPUs make a VM less likely to get stuck, but it's difficult to say...
> 
> The only thing I'm confident of is that the problem is not HW related - it happened both on my SuperMicro with a Xeon E5 v2 and on other hardware with a 7th-gen Intel i7.

Are the VMs set up to match the local hardware definition, or are
they fully emulated?

And, especially: if they are not using virtio for the disk and
the network adapter, try that ASAP.
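
If you want a quick way to check, something along these lines might
do. It is only a rough sketch: the function name is made up, and it
assumes the usual libvirt XML layout with one element per line
(<target ... bus='...'/> inside <disk>, <model type='...'/> inside
<interface>). It reads the domain XML on stdin and prints every disk
target and NIC model that is not virtio.

```shell
#!/bin/sh
# Hypothetical helper: report non-virtio disks and NICs found in
# libvirt domain XML read from stdin. Line-based matching, so it
# assumes one XML element per line, as "virsh dumpxml" prints it.
non_virtio_devices() {
    awk '
        /<disk[ >]/              { section = "disk" }
        /<interface[ >]/         { section = "nic" }
        /<\/disk>|<\/interface>/ { section = "" }
        # Disk target whose bus is something other than virtio
        section == "disk" && /<target / && !/bus=.virtio./ {
            sub(/^[ \t]+/, ""); print "disk: " $0
        }
        # NIC model whose type is something other than virtio
        section == "nic" && /<model / && !/type=.virtio./ {
            sub(/^[ \t]+/, ""); print "nic: " $0
        }
    '
}

# Usage, assuming virsh can talk to the local hypervisor:
#   for vm in $(virsh list --name); do
#       echo "== $vm"
#       virsh dumpxml "$vm" | non_virtio_devices
#   done
```

Note that CD-ROM drives also show up as <disk> elements and normally
sit on an IDE or SATA bus, so seeing those listed is expected. An
interface with no <model> element at all will not be reported, so
check those by hand.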

> Btw. this has never happened on my laptop, which has the same configuration as the server (+Desktop Env.), but I reboot it multiple times a week so that might be an answer...

Not so much an answer as an explanation why you haven't seen it,
but, sure, that's plausible.

-dsr-

