Hi Ian,

On 09/11/2011 05:37 PM, Ian Campbell wrote:
> On Sun, 2011-09-11 at 01:18 +0200, Hans van Kranenburg wrote:
>> On 09/08/2011 07:12 PM, Hans van Kranenburg wrote:
>>> When putting disk/network load on one of our office servers, Xen/dom0
>>> crashes. Triple ctrl-a does not react any more on serial console.
>>
>> [...] A workaround should be to set the option disable_sendpage when
>> loading the drbd module [3], [4].
>
> Sounds similar to the NFS issue[0] which caused me to begin working on
> the SKB paged fragment destructor patches[1]. I just gave a talk about
> this problem at LPC last week[2] They are still a WIP but I hope to have
> them ready for Linux 3.3. I will include DRDB in my list of subsystems
> to consider.

That sure is very interesting material to read. The original stack trace I posted makes a lot more sense to me now.

> In the meantime disabling sendpage sounds like the best workaround.

So, we set the disable_sendpage option and did a domU reboot with drbdadm down/up of the drbd devices (just to be sure, as I don't know where or when drbd reads this option). After some days of hitting the disks and the network with data, no more kernel panics have happened. Yay!
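
For anyone who wants to confirm that the option is really active on a running system, here is a minimal sketch. It assumes the drbd module exposes the parameter through sysfs at /sys/module/drbd/parameters/disable_sendpage and that the kernel reports it as Y/N, as it usually does for boolean module parameters. The function name is made up for illustration, and I have not verified the path against every drbd version.

```python
from pathlib import Path

# Assumption: the drbd module exports this parameter via sysfs.
# The option itself would normally be set at module load time, for
# example with a line like "options drbd disable_sendpage=1" in a
# file under /etc/modprobe.d/ (the file name is up to you).
PARAM_PATH = Path("/sys/module/drbd/parameters/disable_sendpage")


def sendpage_disabled(param_path=PARAM_PATH):
    """Return True/False for the live setting, or None if it cannot be read.

    None usually means the module is not loaded or the parameter is
    not exported on this system.
    """
    try:
        raw = Path(param_path).read_text().strip()
    except OSError:
        return None
    # Boolean module parameters are normally shown as "Y" or "N";
    # "1" is accepted as well to be lenient.
    return raw in ("Y", "y", "1")


if __name__ == "__main__":
    state = sendpage_disabled()
    if state is None:
        print("drbd parameter not readable (module not loaded?)")
    else:
        print("disable_sendpage is", "on" if state else "off")
```

Running this in the domU after loading the module shows whether the workaround took effect, without needing a down/up cycle of the devices just to find out.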

In the post you reference with [1] you write: "I expect that other block and filesystem users of the network subsystem (e.g. iSCSI) would also benefit from this functionality since they will suffer from the same class of issue.". Part of my work in the near future is doing lenny->squeeze upgrades of a couple of systems where we use lvm backed block devices for domUs, which are on dm-multipath on iSCSI. Should I be concerned about the same issues occurring when using iSCSI on squeeze? If so, or if unknown, do you recommend specific (stress) tests that we can run in the test-upgrade environment?

> [1] http://marc.info/?l=linux-netdev&m=131072801125521&w=2

What should be done with this bug report? Should I close it, since there's a workaround and no simple fix that can be done in squeeze, or should it stay open until the work on this is finished and included in the kernel?

Thanks!

-- 
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com