
Bug#596419: Acknowledgement (xen-linux-system-2.6.32-5-xen-amd64: causes a system hangup by the shutdown of the system, aacraid (sw raid) involved in hangup)



> So it worked when Dom0 ran in "balloon" mode, i.e. with the dom0_mem
> specification omitted; or, if dom0_mem is specified, then swiotlb=65536
> must be specified as well.

Wow. That implies that aacraid uses quite a lot of buffers. Looking at the driver,
there are a bunch of quirks where it can only do DMA up to 2GB, so that would explain
why it relies on SWIOTLB that much.

Based on Ian's analysis, it really looks like we just ran out of DMA buffers, and
the driver did not retry but simply bailed out.
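
To make the "retry vs. bail out" difference concrete, here is a toy userspace
model. It is not aacraid or SWIOTLB code: the pool structure and the names
pool_map, pool_unmap and submit_with_retry are all made up for illustration.
It only shows the idea of waiting for in-flight I/O to release buffers
instead of failing on the first exhausted mapping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a fixed-size bounce-buffer pool (hypothetical). */
struct pool {
	size_t free_slots;
};

/* Reserve n slots; fail without side effects if not enough are free. */
static bool pool_map(struct pool *p, size_t n)
{
	if (n > p->free_slots)
		return false;
	p->free_slots -= n;
	return true;
}

/* Give n slots back, as a completed request would. */
static void pool_unmap(struct pool *p, size_t n)
{
	p->free_slots += n;
}

/*
 * Try to map `need` slots. Each time the pool is exhausted, model one
 * in-flight request completing (releasing inflight[i] slots) and try again.
 * Returns the number of attempts used, or -1 once nothing is left to wait for.
 */
static int submit_with_retry(struct pool *p, size_t need,
			     const size_t *inflight, size_t n_inflight)
{
	size_t done = 0;
	int attempts = 0;

	for (;;) {
		attempts++;
		if (pool_map(p, need))
			return attempts;
		if (done == n_inflight)
			return -1;	/* no completions left: give up */
		pool_unmap(p, inflight[done++]);
	}
}
```

A caller that only ever calls pool_map() once corresponds to the "bails out"
case; submit_with_retry() corresponds to a driver that requeues the request
until earlier I/O has freed enough buffers.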

We can narrow down who is using so many buffers with the attached debug module:
when loaded, it prints out who is using which buffers, provided the kernel was
built with CONFIG_DMA_API_DEBUG=y.

But the proper workaround is the one you discovered: either enlarge the SWIOTLB buffer
or raise the memory allocated to Dom0.
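
For a sense of scale, the swiotlb= value counts slabs rather than bytes. The
small sketch below converts a slab count to a size, assuming the mainline
slab size of 2 KiB (IO_TLB_SHIFT of 11); I have not checked whether the Debian
Xen kernel changes that constant. The helper names swiotlb_bytes and
swiotlb_mib are made up for this example.

```c
#include <stdint.h>

/* Assumed slab size: 2 KiB, i.e. a shift of 11 as in mainline swiotlb. */
#define SLAB_SHIFT 11

/* Bytes reserved for a given swiotlb=<nslabs> value. */
static uint64_t swiotlb_bytes(uint64_t nslabs)
{
	return nslabs << SLAB_SHIFT;
}

/* The same figure in MiB, easier to compare against dom0_mem. */
static uint64_t swiotlb_mib(uint64_t nslabs)
{
	return swiotlb_bytes(nslabs) >> 20;
}
```

Under that assumption swiotlb=65536 works out to 128 MiB of bounce buffers,
twice the 64 MiB (32768 slabs) that mainline reserves by default.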

> 
> I have noticed one interesting behavior: during a successful suspension
> of the domains at shutdown, the first domain being suspended writes three
> "dots" very quickly, then stops writing dots for some time, and then
> writes a lot of (possibly all remaining) "dots" very quickly. The
> subsequent suspensions proceed smoothly, dot by dot, without any delays.
> It looks like it waits for something during the first suspension
> (memory allocation?).

That usually means it is stuck waiting for the disks to write out all the data.
> 
> Generally, I find it very surprising how the aacraid module works. I am
> no C or kernel developer, but I would expect that something like this cannot
> happen: the module should allocate the memory it needs at startup. I
> would understand if some specific read or write operation failed because
> the sw raid did not have enough memory to execute it, but I would never
> expect this to lead to a hang and freeze of the whole system. The probability of

Well, to be honest, we engineers aren't known for testing all of the failure paths
as well as we should. That is why folks like you are quite helpful in finding
bugs :-)
/*
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License v2.0 as published by
 * the Free Software Foundation
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include <linux/module.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/stat.h>
#include <linux/err.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <linux/limits.h>
#include <linux/device.h>
#include <linux/pci.h>
#include <linux/blkdev.h>

#include <linux/mm.h>
#include <linux/fcntl.h>
#include <linux/kmod.h>
#include <linux/major.h>
#include <linux/smp_lock.h>
#include <linux/highmem.h>
#include <linux/blkpg.h>
#include <linux/buffer_head.h>
#include <linux/mpage.h>
#include <linux/mount.h>
#include <linux/uio.h>
#include <linux/namei.h>
#include <asm/uaccess.h>

#include <linux/pagemap.h>
#include <linux/pagevec.h>

#include <linux/dma-debug.h>

#define DUMP_DMA_FUN  "0.1"

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad@virtualiron>");
MODULE_DESCRIPTION("dump dma");
MODULE_LICENSE("GPL");
MODULE_VERSION(DUMP_DMA_FUN);

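/*
 * On load, dump every DMA mapping tracked by the DMA-API debug code to the
 * kernel log. Passing NULL selects all devices. Without
 * CONFIG_DMA_API_DEBUG=y the call is an empty stub and prints nothing.
 */
static int __init dump_dma_init(void)
{
	debug_dma_dump_mappings(NULL);
	return 0;
}

/* Nothing to undo on unload; present so the module can be removed again. */
static void __exit dump_dma_exit(void)
{
}

module_init(dump_dma_init);
module_exit(dump_dma_exit);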
# Comment/uncomment the following line to disable/enable debugging
#DEBUG = y

# Add your debugging flag (or not) to CFLAGS
ifeq ($(DEBUG),y)
  DEBFLAGS = -O -g # "-O" is needed to expand inlines
else
  DEBFLAGS = -O2
endif

EXTRA_CFLAGS += $(DEBFLAGS) -I$(LDDINCDIR)

ifneq ($(KERNELRELEASE),)
# call from kernel build system

obj-m	:= dump_dma.o

else

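# KERNELDIR must point at the build tree of the kernel the module is for.
# Override the default below on the command line if needed, e.g.
#   make KERNELDIR=/lib/modules/$(uname -r)/build
#KERNELDIR ?= /lib/modules/$(shell uname -r)/build
KERNELDIR ?= /home/konrad/git/neb.64/linux-build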
PWD       := $(shell pwd)

default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) LDDINCDIR=$(PWD)/../include modules

endif

clean:
	rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions

depend .depend dep:
	$(CC) $(CFLAGS) -M *.c > .depend


ifeq (.depend,$(wildcard .depend))
include .depend
endif
