[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#587763: scary messages from JBD when manipulating quotas on ext4



Hi,

Robert L Mathews wrote:

> Unfortunately, I'm not able to reproduce the original problem on a
> test system for some reason (and I tried quite hard). It only
> happens on one of my production servers, but I can't experiment on
> that one. Sorry about that.  :-(

Thanks for the update, and no problem.  What filesystem options does
your test system use (as shown in /proc/mounts or by dumpe2fs -h
<bdev>)?  How do you set up quotas on it?

Looking closer, straightforward testing of that one patch would
probably not reveal much anyway --- any subtle effects wouldn't
necessarily be triggered often.  Pointers that might help someone
review the code:

  http://thread.gmane.org/gmane.comp.file-systems.ext4/19421/focus=19462
  http://thread.gmane.org/gmane.comp.file-systems.ext4/19421/focus=19470

I suspect applying the patch to 2.6.32.y is safe even in edge cases.

Summary of the discussion: Ted said

|                                                  In the long-term I
| think the quota file should become a special file much like the
| journal, and then this makes a huge amount of sense.

Jan addressed his fears by clarifying that the quota file is immutable
when quotas are on:

|   Ted, that's what generic quota code actually does for you (unless
| DQUOT_QUOTA_SYS_FILE flag is specified but that's not the case of
| ext?)
| - see vfs_load_quota_inode.

The basic rules about quota have not changed fundamentally between
2.6.32 and 2.6.36, so this looks like a good and safe patch for
squeeze.  To avoid tempting fate, here is a list including some other
fixes that could go with it.  I haven't tried testing this at all yet.
Comments (including test results) welcome.

 - v2.6.34-rc1~192^2~11 "quota: Properly invalidate caches even for
   filesystems with blocksize < pagesize", 2010-02-22

 - v2.6.36-rc1~501^2~15 "ext4: Always journal quota file
   modifications", 2010-07-27

 - v2.6.36-rc1~501^2~7 "ext4: force block allocation on quota_off",
   2010-08-01, and v2.6.37-rc2~75^2~2 "ext4: do not try to grab the
   s_umount semaphore in ext4_quota_off", 2010-11-08

 - v2.6.33-rc1~311^2~14 "quota: Fix WARN_ON in lookup_one_len",
   2009-11-12

 - 6b6dc836a "vfs: Provide function to get superblock and wait for it
   to thaw", 2012-02-10, and dcdbed853d9f "quota: Fix deadlock with
   suspend and quotas", 2012-02-10

 fs/ext4/super.c    |   32 +++++++++++++++++---------------
 fs/quota/dquot.c   |   14 +++++++++-----
 fs/quota/quota.c   |   24 +++++++++++++++++++++---
 fs/super.c         |   22 ++++++++++++++++++++++
 include/linux/fs.h |    1 +
 5 files changed, 70 insertions(+), 23 deletions(-)
From: Jan Kara <jack@suse.cz>
Date: Mon, 22 Feb 2010 21:07:17 +0100
Subject: quota: Properly invalidate caches even for filesystems with blocksize < pagesize

commit ab94c39b6fa076d4f6d2903dcc54cda35d938776 upstream.

Sometimes invalidate_bdev() can fail to invalidate a part of block
device cache because of dirty data. If the filesystem has blocksize
smaller than page size, this can happen even for pages containing
quota files and thus kernel would operate on stale data. Fix the
issue by syncing the filesystem before invalidating the cache.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fs/quota/dquot.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index ce9a4f22c1dd..d06ee1f12c8b 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2087,11 +2087,13 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id,
 	}
 
 	if (!(dqopt->flags & DQUOT_QUOTA_SYS_FILE)) {
-		/* As we bypass the pagecache we must now flush the inode so
-		 * that we see all the changes from userspace... */
-		write_inode_now(inode, 1);
-		/* And now flush the block cache so that kernel sees the
-		 * changes */
+		/* As we bypass the pagecache we must now flush all the
+		 * dirty data and invalidate caches so that kernel sees
+		 * changes from userspace. It is not enough to just flush
+		 * the quota file since if blocksize < pagesize, invalidation
+		 * of the cache could fail because of other unrelated dirty
+		 * data */
+		sync_filesystem(sb);
 		invalidate_bdev(sb->s_bdev);
 	}
 	mutex_lock(&dqopt->dqonoff_mutex);
-- 
1.7.9

From: Jan Kara <jack@suse.cz>
Date: Tue, 27 Jul 2010 11:56:07 -0400
Subject: ext4: Always journal quota file modifications

commit 62d2b5f2dcd3707b070efb16bbfdf6947c38c194 upstream.

When journaled quota options are not specified, we do writes
to quota files just in data=ordered mode. This actually causes
warnings from JBD2 about dirty journaled buffer because ext4_getblk
unconditionally treats a block allocated by it as metadata. Since
quota actually is filesystem metadata, the easiest way to get rid
of the warning is to always treat quota writes as metadata...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fs/ext4/super.c |   19 +++++--------------
 1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f1e7077a06a3..c53a60742cca 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3926,7 +3926,6 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
 	int err = 0;
 	int offset = off & (sb->s_blocksize - 1);
 	int tocopy;
-	int journal_quota = EXT4_SB(sb)->s_qf_names[type] != NULL;
 	size_t towrite = len;
 	struct buffer_head *bh;
 	handle_t *handle = journal_current_handle();
@@ -3944,24 +3943,16 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
 		bh = ext4_bread(handle, inode, blk, 1, &err);
 		if (!bh)
 			goto out;
-		if (journal_quota) {
-			err = ext4_journal_get_write_access(handle, bh);
-			if (err) {
-				brelse(bh);
-				goto out;
-			}
+		err = ext4_journal_get_write_access(handle, bh);
+		if (err) {
+			brelse(bh);
+			goto out;
 		}
 		lock_buffer(bh);
 		memcpy(bh->b_data+offset, data, tocopy);
 		flush_dcache_page(bh->b_page);
 		unlock_buffer(bh);
-		if (journal_quota)
-			err = ext4_handle_dirty_metadata(handle, NULL, bh);
-		else {
-			/* Always do at least ordered writes for quotas */
-			err = ext4_jbd2_file_inode(handle, inode);
-			mark_buffer_dirty(bh);
-		}
+		err = ext4_handle_dirty_metadata(handle, NULL, bh);
 		brelse(bh);
 		if (err)
 			goto out;
-- 
1.7.9

From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Sun, 1 Aug 2010 17:48:36 -0400
Subject: ext4: force block allocation on quota_off

commit ca0e05e4b15193aeba72b995e90de990db7f8304 upstream.

Perform full sync procedure so that any delayed allocation blocks are
allocated so quota will be consistent.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fs/ext4/super.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c53a60742cca..95ae5b658cbd 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -995,6 +995,7 @@ static int ext4_mark_dquot_dirty(struct dquot *dquot);
 static int ext4_write_info(struct super_block *sb, int type);
 static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 				char *path, int remount);
+static int ext4_quota_off(struct super_block *sb, int type, int remount);
 static int ext4_quota_on_mount(struct super_block *sb, int type);
 static ssize_t ext4_quota_read(struct super_block *sb, int type, char *data,
 			       size_t len, loff_t off);
@@ -1026,7 +1027,7 @@ static const struct dquot_operations ext4_quota_operations = {
 
 static const struct quotactl_ops ext4_qctl_operations = {
 	.quota_on	= ext4_quota_on,
-	.quota_off	= vfs_quota_off,
+	.quota_off	= ext4_quota_off,
 	.quota_sync	= vfs_quota_sync,
 	.get_info	= vfs_get_dqinfo,
 	.set_info	= vfs_set_dqinfo,
@@ -3876,6 +3877,18 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 	return err;
 }
 
+static int ext4_quota_off(struct super_block *sb, int type, int remount)
+{
+	/* Force all delayed allocation blocks to be allocated */
+	if (test_opt(sb, DELALLOC)) {
+		down_read(&sb->s_umount);
+		sync_filesystem(sb);
+		up_read(&sb->s_umount);
+	}
+
+	return vfs_quota_off(sb, type, remount);
+}
+
 /* Read data from quotafile - avoid pagecache and such because we cannot afford
  * acquiring the locks... As quota files are never truncated and quota code
  * itself serializes the operations (and noone else should touch the files)
-- 
1.7.9

From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Mon, 8 Nov 2010 13:47:33 -0500
Subject: ext4: do not try to grab the s_umount semaphore in ext4_quota_off

commit 87009d86dc045d228e21242467a67a5f99347553 upstream.

It's not needed to sync the filesystem, and it fixes a lock_dep complaint.

[jrnieder@gmail.com: s_umount does need to be held in sync_filesystem.
 The reason not to grab it in ext4_quota_off is that quotactl has
 already acquired s_umount by calling get_super.]

Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fs/ext4/super.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 95ae5b658cbd..302304b82f7b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3879,12 +3879,10 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 
 static int ext4_quota_off(struct super_block *sb, int type, int remount)
 {
-	/* Force all delayed allocation blocks to be allocated */
-	if (test_opt(sb, DELALLOC)) {
-		down_read(&sb->s_umount);
+	/* Force all delayed allocation blocks to be allocated.
+	 * Caller already holds s_umount sem */
+	if (test_opt(sb, DELALLOC))
 		sync_filesystem(sb);
-		up_read(&sb->s_umount);
-	}
 
 	return vfs_quota_off(sb, type, remount);
 }
-- 
1.7.9

From: Jan Kara <jack@suse.cz>
Date: Thu, 12 Nov 2009 15:42:08 +0100
Subject: quota: Fix WARN_ON in lookup_one_len

commit c56818d7dc976a7392be82e8e04fe26347d591f3 upstream.

We should hold i_mutex when looking up quota files for journaled quotas,
otherwise a WARN_ON in lookup_one_len triggers. The fact that we didn't
hold i_mutex previously probably could not lead to a real bug since the
filesystem is just being mounted / remounted read-write and thus the
root directory cannot change anyway but it's definitely cleaner with
i_mutex.

Reported-by: Bastien ROUCARIES <roucaries.bastien@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fs/quota/dquot.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index d06ee1f12c8b..fc061714ead7 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2287,7 +2287,9 @@ int vfs_quota_on_mount(struct super_block *sb, char *qf_name,
 	struct dentry *dentry;
 	int error;
 
+	mutex_lock(&sb->s_root->d_inode->i_mutex);
 	dentry = lookup_one_len(qf_name, sb->s_root, strlen(qf_name));
+	mutex_unlock(&sb->s_root->d_inode->i_mutex);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
-- 
1.7.9

From: Jan Kara <jack@suse.cz>
Date: Fri, 10 Feb 2012 11:03:00 +0100
Subject: vfs: Provide function to get superblock and wait for it to thaw

commit 6b6dc836a195e077e76977b6c020a73de411b46d upstream.

In quota code we need to find a superblock corresponding to a device and wait
for superblock to be unfrozen. However this waiting has to happen without
s_umount semaphore because that is required for superblock to thaw. So provide
a function in VFS for this to keep dances with s_umount where they belong.

[AV: implementation switched to saner variant]

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fs/super.c         |   22 ++++++++++++++++++++++
 include/linux/fs.h |    1 +
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index aff046b0fe78..b1ea409e5f4d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -467,6 +467,28 @@ rescan:
 EXPORT_SYMBOL(get_super);
 
 /**
+ *	get_super_thawed - get thawed superblock of a device
+ *	@bdev: device to get the superblock for
+ *
+ *	Scans the superblock list and finds the superblock of the file system
+ *	mounted on the device. The superblock is returned once it is thawed
+ *	(or immediately if it was not frozen). %NULL is returned if no match
+ *	is found.
+ */
+struct super_block *get_super_thawed(struct block_device *bdev)
+{
+	while (1) {
+		struct super_block *s = get_super(bdev);
+		if (!s || s->s_frozen == SB_UNFROZEN)
+			return s;
+		up_read(&s->s_umount);
+		vfs_check_frozen(s, SB_FREEZE_WRITE);
+		put_super(s);
+	}
+}
+EXPORT_SYMBOL(get_super_thawed);
+
+/**
  * get_active_super - get an active reference to the superblock of a device
  * @bdev: device to get the superblock for
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1b9a47abb014..bcbd8576cece 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2348,6 +2348,7 @@ extern void get_filesystem(struct file_system_type *fs);
 extern void put_filesystem(struct file_system_type *fs);
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(struct block_device *);
+extern struct super_block *get_super_thawed(struct block_device *);
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern struct super_block *user_get_super(dev_t);
 extern void drop_super(struct super_block *sb);
-- 
1.7.9

From: Jan Kara <jack@suse.cz>
Date: Fri, 10 Feb 2012 11:03:01 +0100
Subject: quota: Fix deadlock with suspend and quotas

commit dcdbed853d9fbb0547b781ba676049b87f54129a upstream.

This script causes a kernel deadlock:
set -e
DEVICE=/dev/vg1/linear
lvchange -ay $DEVICE
mkfs.ext3 $DEVICE
mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test
quotacheck -gu /mnt/test
umount /mnt/test
mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test
quotaon /mnt/test
dmsetup suspend $DEVICE
setquota -u root 1 2 3 4 /mnt/test &
sleep 1
dmsetup resume $DEVICE

setquota acquired semaphore s_umount for read and then tried to perform a
transaction (and waits because the device is suspended).  dmsetup resume tries
to acquire s_umount for write before resuming the device (and waits for
setquota).

Fix the deadlock by grabbing a thawed superblock for quota commands which need
it.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fs/quota/quota.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 95c5b42384b2..ea5f543c3319 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -351,11 +351,26 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id,
 	return 0;
 }
 
+/* Return 1 if 'cmd' will block on frozen filesystem */
+static int quotactl_cmd_write(int cmd)
+{
+	switch (cmd) {
+	case Q_GETFMT:
+	case Q_GETINFO:
+	case Q_SYNC:
+	case Q_XGETQSTAT:
+	case Q_XGETQUOTA:
+	case Q_XQUOTASYNC:
+		return 0;
+	}
+	return 1;
+}
+
 /*
  * look up a superblock on which quota ops will be performed
  * - use the name of a block device to find the superblock thereon
  */
-static struct super_block *quotactl_block(const char __user *special)
+static struct super_block *quotactl_block(const char __user *special, int cmd)
 {
 #ifdef CONFIG_BLOCK
 	struct block_device *bdev;
@@ -368,7 +383,10 @@ static struct super_block *quotactl_block(const char __user *special)
 	putname(tmp);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
-	sb = get_super(bdev);
+	if (quotactl_cmd_write(cmd))
+		sb = get_super_thawed(bdev);
+	else
+		sb = get_super(bdev);
 	bdput(bdev);
 	if (!sb)
 		return ERR_PTR(-ENODEV);
@@ -396,7 +414,7 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
 	type = cmd & SUBCMDMASK;
 
 	if (cmds != Q_SYNC || special) {
-		sb = quotactl_block(special);
+		sb = quotactl_block(special, cmds);
 		if (IS_ERR(sb))
 			return PTR_ERR(sb);
 	}
-- 
1.7.9


Reply to: