
Bug#956890: buster-pu: package zfs-linux/0.7.12-2+deb10u2



On Thu, Apr 16, 2020 at 03:40:09PM +0100, Adam D. Barratt wrote:
> Control: tags -1 + moreinfo
> Control: tags 956889 + moreinfo
> 
> On Thu, 2020-04-16 at 11:26 +0000, Mo Zhou wrote:
> > (please explain the reason for this update here)
> > 
> > We need to cherry-pick two patches in order to fix a deadlock issue
> > for zfs
> > https://github.com/openzfs/zfs/commit/98bb45e27ae80145a6ce028df90fccdb23f8901d
> > https://github.com/openzfs/zfs/commit/01937958ce85b1cd8942dbaf9a3f9768c5b02a0a
> > 
> > And the two patches have to be used in conjunction with a patch for
> > src:spl-linux
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932251
> > (I'm uploading shortly)
> 
> I'm afraid I'm slightly confused here.
> 
> You've filed two copies of this bug, with slightly different content.

Please ignore the duplicate; it was the result of a mutt crash.

> Neither of them has a proposed diff attached, and they both claim to be

See the attached debdiff.

> requesting updates for the "zfs-linux" package, but the upload you've
> made is for the "spl-linux" package, for which there appears to be no
> p-u bug.

The whole fix involves two parts: one goes to src:zfs-linux and the
other to src:spl-linux. The updated src:spl-linux has already been
uploaded, and I'm now asking for permission to upload the updated
src:zfs-linux, which includes two upstream commits fixing potential
deadlock issues.
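For reference, the attached debdiff simply ships the two cherry-picked
commits as quilt patches under debian/patches/ and appends them to the
series file. A minimal sketch of that packaging change (the curl URL in
the comment is an assumption; the debdiff itself carries the patch
bodies inline):

```shell
# Sketch only: recreate the debian/patches layout that the debdiff adds.
# In a real source tree the patch files hold the full upstream commits;
# here they are just placeholders.
mkdir -p zfs-linux-0.7.12/debian/patches
cd zfs-linux-0.7.12

# Each cherry-picked commit is stored as debian/patches/<sha>.patch.
# In practice one would fetch it, e.g. (hypothetical command):
#   curl -L https://github.com/openzfs/zfs/commit/<sha>.patch \
#        -o debian/patches/<sha>.patch
touch debian/patches/98bb45e27ae80145a6ce028df90fccdb23f8901d.patch
touch debian/patches/01937958ce85b1cd8942dbaf9a3f9768c5b02a0a.patch

# quilt applies series entries in order; the debdiff lists the
# zfs_write()/page-fault fix before the dbuf TSD removal.
printf '%s\n' \
    98bb45e27ae80145a6ce028df90fccdb23f8901d.patch \
    01937958ce85b1cd8942dbaf9a3f9768c5b02a0a.patch \
    >> debian/patches/series

cat debian/patches/series
```

With the 3.0 (quilt) source format, dpkg-source applies every entry in
debian/patches/series at unpack time, so no further build machinery is
needed.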
 
> Please could you clarify?
> 
> Regards,
> 
> Adam
> 
diff -Nru zfs-linux-0.7.12/debian/changelog zfs-linux-0.7.12/debian/changelog
--- zfs-linux-0.7.12/debian/changelog	2019-06-09 11:17:40.000000000 +0800
+++ zfs-linux-0.7.12/debian/changelog	2020-04-22 13:14:27.000000000 +0800
@@ -1,3 +1,11 @@
+zfs-linux (0.7.12-2+deb10u2) buster; urgency=medium
+
+  * Cherry-pick two upstream patches to fix potential deadlock issues.
+    + 01937958ce85b1cd8942dbaf9a3f9768c5b02a0a
+    + 98bb45e27ae80145a6ce028df90fccdb23f8901d
+
+ -- Mo Zhou <lumin@debian.org>  Wed, 22 Apr 2020 13:14:27 +0800
+
 zfs-linux (0.7.12-2+deb10u1) testing-proposed-updates; urgency=high
 
   * Patch: Disable SIMD on 4.19.37+ or 5.0+ kernels. (Closes: #929929)
diff -Nru zfs-linux-0.7.12/debian/patches/01937958ce85b1cd8942dbaf9a3f9768c5b02a0a.patch zfs-linux-0.7.12/debian/patches/01937958ce85b1cd8942dbaf9a3f9768c5b02a0a.patch
--- zfs-linux-0.7.12/debian/patches/01937958ce85b1cd8942dbaf9a3f9768c5b02a0a.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-0.7.12/debian/patches/01937958ce85b1cd8942dbaf9a3f9768c5b02a0a.patch	2020-04-22 13:11:49.000000000 +0800
@@ -0,0 +1,327 @@
+From 01937958ce85b1cd8942dbaf9a3f9768c5b02a0a Mon Sep 17 00:00:00 2001
+From: Matthew Ahrens <mahrens@delphix.com>
+Date: Thu, 31 May 2018 10:29:12 -0700
+Subject: [PATCH] OpenZFS 9577 - remove zfs_dbuf_evict_key tsd
+
+The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary -
+we can instead pass a flag down in a few places to prevent recursive
+dbuf eviction. Making this change has 3 benefits:
+
+1. The code semantics are easier to understand.
+2. On Linux, performance is improved, because creating/removing
+   TSD values (by setting to NULL vs non-NULL) is expensive, and
+   we do it very often.
+3. According to Nexenta, the current semantics can cause a
+   deadlock when concurrently calling dmu_objset_evict_dbufs()
+   (which is rare today, but they are working on a "parallel
+   unmount" change that triggers this more easily):
+
+Porting Notes:
+* Minor conflict with OpenZFS 9337 which has not yet been ported.
+
+Authored by: Matthew Ahrens <mahrens@delphix.com>
+Reviewed by: George Wilson <george.wilson@delphix.com>
+Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
+Reviewed by: Brad Lewis <brad.lewis@delphix.com>
+Reviewed-by: George Melikov <mail@gmelikov.ru>
+Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
+
+OpenZFS-issue: https://illumos.org/issues/9577
+OpenZFS-commit: https://github.com/openzfs/openzfs/pull/645
+External-issue: DLPX-58547
+Closes #7602
+---
+ include/sys/dbuf.h      |  4 +--
+ include/sys/dnode.h     |  4 +--
+ module/zfs/dbuf.c       | 69 ++++++++++++++---------------------------
+ module/zfs/dnode.c      |  7 +++--
+ module/zfs/dnode_sync.c | 17 ++++++++--
+ 5 files changed, 46 insertions(+), 55 deletions(-)
+
+diff --git a/include/sys/dbuf.h b/include/sys/dbuf.h
+index 127acad33c7..e007e97644e 100644
+--- a/include/sys/dbuf.h
++++ b/include/sys/dbuf.h
+@@ -20,7 +20,7 @@
+  */
+ /*
+  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
+- * Copyright (c) 2012, 2015 by Delphix. All rights reserved.
++ * Copyright (c) 2012, 2018 by Delphix. All rights reserved.
+  * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
+  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
+  */
+@@ -294,7 +294,7 @@ boolean_t dbuf_try_add_ref(dmu_buf_t *db, objset_t *os, uint64_t obj,
+ uint64_t dbuf_refcount(dmu_buf_impl_t *db);
+ 
+ void dbuf_rele(dmu_buf_impl_t *db, void *tag);
+-void dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag);
++void dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag, boolean_t evicting);
+ 
+ dmu_buf_impl_t *dbuf_find(struct objset *os, uint64_t object, uint8_t level,
+     uint64_t blkid);
+diff --git a/include/sys/dnode.h b/include/sys/dnode.h
+index 1e77e0a32ec..4e006df5535 100644
+--- a/include/sys/dnode.h
++++ b/include/sys/dnode.h
+@@ -20,7 +20,7 @@
+  */
+ /*
+  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
+- * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
++ * Copyright (c) 2012, 2018 by Delphix. All rights reserved.
+  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
+  */
+ 
+@@ -339,7 +339,7 @@ int dnode_hold_impl(struct objset *dd, uint64_t object, int flag, int dn_slots,
+     void *ref, dnode_t **dnp);
+ boolean_t dnode_add_ref(dnode_t *dn, void *ref);
+ void dnode_rele(dnode_t *dn, void *ref);
+-void dnode_rele_and_unlock(dnode_t *dn, void *tag);
++void dnode_rele_and_unlock(dnode_t *dn, void *tag, boolean_t evicting);
+ void dnode_setdirty(dnode_t *dn, dmu_tx_t *tx);
+ void dnode_sync(dnode_t *dn, dmu_tx_t *tx);
+ void dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs,
+diff --git a/module/zfs/dbuf.c b/module/zfs/dbuf.c
+index 62b77bb0a1d..9694ce78b19 100644
+--- a/module/zfs/dbuf.c
++++ b/module/zfs/dbuf.c
+@@ -72,8 +72,6 @@ static void __dbuf_hold_impl_init(struct dbuf_hold_impl_data *dh,
+ 	void *tag, dmu_buf_impl_t **dbp, int depth);
+ static int __dbuf_hold_impl(struct dbuf_hold_impl_data *dh);
+ 
+-uint_t zfs_dbuf_evict_key;
+-
+ static boolean_t dbuf_undirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
+ static void dbuf_write(dbuf_dirty_record_t *dr, arc_buf_t *data, dmu_tx_t *tx);
+ 
+@@ -505,14 +503,6 @@ dbuf_evict_one(void)
+ 	dmu_buf_impl_t *db;
+ 	ASSERT(!MUTEX_HELD(&dbuf_evict_lock));
+ 
+-	/*
+-	 * Set the thread's tsd to indicate that it's processing evictions.
+-	 * Once a thread stops evicting from the dbuf cache it will
+-	 * reset its tsd to NULL.
+-	 */
+-	ASSERT3P(tsd_get(zfs_dbuf_evict_key), ==, NULL);
+-	(void) tsd_set(zfs_dbuf_evict_key, (void *)B_TRUE);
+-
+ 	db = multilist_sublist_tail(mls);
+ 	while (db != NULL && mutex_tryenter(&db->db_mtx) == 0) {
+ 		db = multilist_sublist_prev(mls, db);
+@@ -530,7 +520,6 @@ dbuf_evict_one(void)
+ 	} else {
+ 		multilist_sublist_unlock(mls);
+ 	}
+-	(void) tsd_set(zfs_dbuf_evict_key, NULL);
+ }
+ 
+ /*
+@@ -583,29 +572,6 @@ dbuf_evict_thread(void)
+ static void
+ dbuf_evict_notify(void)
+ {
+-
+-	/*
+-	 * We use thread specific data to track when a thread has
+-	 * started processing evictions. This allows us to avoid deeply
+-	 * nested stacks that would have a call flow similar to this:
+-	 *
+-	 * dbuf_rele()-->dbuf_rele_and_unlock()-->dbuf_evict_notify()
+-	 *	^						|
+-	 *	|						|
+-	 *	+-----dbuf_destroy()<--dbuf_evict_one()<--------+
+-	 *
+-	 * The dbuf_eviction_thread will always have its tsd set until
+-	 * that thread exits. All other threads will only set their tsd
+-	 * if they are participating in the eviction process. This only
+-	 * happens if the eviction thread is unable to process evictions
+-	 * fast enough. To keep the dbuf cache size in check, other threads
+-	 * can evict from the dbuf cache directly. Those threads will set
+-	 * their tsd values so that we ensure that they only evict one dbuf
+-	 * from the dbuf cache.
+-	 */
+-	if (tsd_get(zfs_dbuf_evict_key) != NULL)
+-		return;
+-
+ 	/*
+ 	 * We check if we should evict without holding the dbuf_evict_lock,
+ 	 * because it's OK to occasionally make the wrong decision here,
+@@ -681,7 +647,6 @@ dbuf_init(void)
+ 	    dbuf_cache_multilist_index_func);
+ 	zfs_refcount_create(&dbuf_cache_size);
+ 
+-	tsd_create(&zfs_dbuf_evict_key, NULL);
+ 	dbuf_evict_thread_exit = B_FALSE;
+ 	mutex_init(&dbuf_evict_lock, NULL, MUTEX_DEFAULT, NULL);
+ 	cv_init(&dbuf_evict_cv, NULL, CV_DEFAULT, NULL);
+@@ -718,7 +683,6 @@ dbuf_fini(void)
+ 		cv_wait(&dbuf_evict_cv, &dbuf_evict_lock);
+ 	}
+ 	mutex_exit(&dbuf_evict_lock);
+-	tsd_destroy(&zfs_dbuf_evict_key);
+ 
+ 	mutex_destroy(&dbuf_evict_lock);
+ 	cv_destroy(&dbuf_evict_cv);
+@@ -1004,7 +968,7 @@ dbuf_read_done(zio_t *zio, arc_buf_t *buf, void *vdb)
+ 		db->db_state = DB_UNCACHED;
+ 	}
+ 	cv_broadcast(&db->db_changed);
+-	dbuf_rele_and_unlock(db, NULL);
++	dbuf_rele_and_unlock(db, NULL, B_FALSE);
+ }
+ 
+ static int
+@@ -2178,7 +2142,8 @@ dbuf_destroy(dmu_buf_impl_t *db)
+ 		 * value in dnode_move(), since DB_DNODE_EXIT doesn't actually
+ 		 * release any lock.
+ 		 */
+-		dnode_rele(dn, db);
++		mutex_enter(&dn->dn_mtx);
++		dnode_rele_and_unlock(dn, db, B_TRUE);
+ 		db->db_dnode_handle = NULL;
+ 
+ 		dbuf_hash_remove(db);
+@@ -2204,8 +2169,10 @@ dbuf_destroy(dmu_buf_impl_t *db)
+ 	 * If this dbuf is referenced from an indirect dbuf,
+ 	 * decrement the ref count on the indirect dbuf.
+ 	 */
+-	if (parent && parent != dndb)
+-		dbuf_rele(parent, db);
++	if (parent && parent != dndb) {
++		mutex_enter(&parent->db_mtx);
++		dbuf_rele_and_unlock(parent, db, B_TRUE);
++	}
+ }
+ 
+ /*
+@@ -2912,7 +2879,7 @@ void
+ dbuf_rele(dmu_buf_impl_t *db, void *tag)
+ {
+ 	mutex_enter(&db->db_mtx);
+-	dbuf_rele_and_unlock(db, tag);
++	dbuf_rele_and_unlock(db, tag, B_FALSE);
+ }
+ 
+ void
+@@ -2923,10 +2890,19 @@ dmu_buf_rele(dmu_buf_t *db, void *tag)
+ 
+ /*
+  * dbuf_rele() for an already-locked dbuf.  This is necessary to allow
+- * db_dirtycnt and db_holds to be updated atomically.
++ * db_dirtycnt and db_holds to be updated atomically.  The 'evicting'
++ * argument should be set if we are already in the dbuf-evicting code
++ * path, in which case we don't want to recursively evict.  This allows us to
++ * avoid deeply nested stacks that would have a call flow similar to this:
++ *
++ * dbuf_rele()-->dbuf_rele_and_unlock()-->dbuf_evict_notify()
++ *	^						|
++ *	|						|
++ *	+-----dbuf_destroy()<--dbuf_evict_one()<--------+
++ *
+  */
+ void
+-dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
++dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag, boolean_t evicting)
+ {
+ 	int64_t holds;
+ 
+@@ -3021,7 +2997,8 @@ dbuf_rele_and_unlock(dmu_buf_impl_t *db, void *tag)
+ 				    db->db.db_size, db);
+ 				mutex_exit(&db->db_mtx);
+ 
+-				dbuf_evict_notify();
++				if (!evicting)
++					dbuf_evict_notify();
+ 			}
+ 
+ 			if (do_arc_evict)
+@@ -3314,7 +3291,7 @@ dbuf_sync_leaf(dbuf_dirty_record_t *dr, dmu_tx_t *tx)
+ 		kmem_free(dr, sizeof (dbuf_dirty_record_t));
+ 		ASSERT(db->db_dirtycnt > 0);
+ 		db->db_dirtycnt -= 1;
+-		dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg);
++		dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg, B_FALSE);
+ 		return;
+ 	}
+ 
+@@ -3670,7 +3647,7 @@ dbuf_write_done(zio_t *zio, arc_buf_t *buf, void *vdb)
+ 	ASSERT(db->db_dirtycnt > 0);
+ 	db->db_dirtycnt -= 1;
+ 	db->db_data_pending = NULL;
+-	dbuf_rele_and_unlock(db, (void *)(uintptr_t)tx->tx_txg);
++	dbuf_rele_and_unlock(db, (void *)(uintptr_t)tx->tx_txg, B_FALSE);
+ }
+ 
+ static void
+diff --git a/module/zfs/dnode.c b/module/zfs/dnode.c
+index 989a8ec7f69..b80aeb8b1f8 100644
+--- a/module/zfs/dnode.c
++++ b/module/zfs/dnode.c
+@@ -1533,11 +1533,11 @@ void
+ dnode_rele(dnode_t *dn, void *tag)
+ {
+ 	mutex_enter(&dn->dn_mtx);
+-	dnode_rele_and_unlock(dn, tag);
++	dnode_rele_and_unlock(dn, tag, B_FALSE);
+ }
+ 
+ void
+-dnode_rele_and_unlock(dnode_t *dn, void *tag)
++dnode_rele_and_unlock(dnode_t *dn, void *tag, boolean_t evicting)
+ {
+ 	uint64_t refs;
+ 	/* Get while the hold prevents the dnode from moving. */
+@@ -1568,7 +1568,8 @@ dnode_rele_and_unlock(dnode_t *dn, void *tag)
+ 		 * that the handle has zero references, but that will be
+ 		 * asserted anyway when the handle gets destroyed.
+ 		 */
+-		dbuf_rele(db, dnh);
++		mutex_enter(&db->db_mtx);
++		dbuf_rele_and_unlock(db, dnh, evicting);
+ 	}
+ }
+ 
+diff --git a/module/zfs/dnode_sync.c b/module/zfs/dnode_sync.c
+index 2febb520630..ee86c13b1b2 100644
+--- a/module/zfs/dnode_sync.c
++++ b/module/zfs/dnode_sync.c
+@@ -21,7 +21,7 @@
+ 
+ /*
+  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
+- * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
++ * Copyright (c) 2012, 2018 by Delphix. All rights reserved.
+  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
+  */
+ 
+@@ -429,6 +429,19 @@ dnode_evict_dbufs(dnode_t *dn)
+ 			avl_insert_here(&dn->dn_dbufs, db_marker, db,
+ 			    AVL_BEFORE);
+ 
++			/*
++			 * We need to use the "marker" dbuf rather than
++			 * simply getting the next dbuf, because
++			 * dbuf_destroy() may actually remove multiple dbufs.
++			 * It can call itself recursively on the parent dbuf,
++			 * which may also be removed from dn_dbufs.  The code
++			 * flow would look like:
++			 *
++			 * dbuf_destroy():
++			 *   dnode_rele_and_unlock(parent_dbuf, evicting=TRUE):
++			 *	if (!cacheable || pending_evict)
++			 *	  dbuf_destroy()
++			 */
+ 			dbuf_destroy(db);
+ 
+ 			db_next = AVL_NEXT(&dn->dn_dbufs, db_marker);
+@@ -489,7 +502,7 @@ dnode_undirty_dbufs(list_t *list)
+ 			list_destroy(&dr->dt.di.dr_children);
+ 		}
+ 		kmem_free(dr, sizeof (dbuf_dirty_record_t));
+-		dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg);
++		dbuf_rele_and_unlock(db, (void *)(uintptr_t)txg, B_FALSE);
+ 	}
+ }
+ 
diff -Nru zfs-linux-0.7.12/debian/patches/98bb45e27ae80145a6ce028df90fccdb23f8901d.patch zfs-linux-0.7.12/debian/patches/98bb45e27ae80145a6ce028df90fccdb23f8901d.patch
--- zfs-linux-0.7.12/debian/patches/98bb45e27ae80145a6ce028df90fccdb23f8901d.patch	1970-01-01 08:00:00.000000000 +0800
+++ zfs-linux-0.7.12/debian/patches/98bb45e27ae80145a6ce028df90fccdb23f8901d.patch	2020-04-22 13:12:02.000000000 +0800
@@ -0,0 +1,373 @@
+From 98bb45e27ae80145a6ce028df90fccdb23f8901d Mon Sep 17 00:00:00 2001
+From: ilbsmart <wgqimut@gmail.com>
+Date: Wed, 17 Oct 2018 02:11:24 +0800
+Subject: [PATCH] deadlock between mm_sem and tx assign in zfs_write() and page
+ fault
+
+The bug time sequence:
+1. thread #1, `zfs_write` assign a txg "n".
+2. In a same process, thread #2, mmap page fault (which means the
+   `mm_sem` is hold) occurred, `zfs_dirty_inode` open a txg failed,
+   and wait previous txg "n" completed.
+3. thread #1 call `uiomove` to write, however page fault is occurred
+   in `uiomove`, which means it need `mm_sem`, but `mm_sem` is hold by
+   thread #2, so it stuck and can't complete,  then txg "n" will
+   not complete.
+
+So thread #1 and thread #2 are deadlocked.
+
+Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
+Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
+Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
+Signed-off-by: Grady Wong <grady.w@xtaotech.com>
+Closes #7939
+---
+ include/sys/uio_impl.h                        |   2 +-
+ module/zcommon/zfs_uio.c                      |  31 +++-
+ module/zfs/zfs_vnops.c                        |  24 ++-
+ tests/zfs-tests/cmd/mmapwrite/mmapwrite.c     | 140 +++++++++++++-----
+ .../functional/mmap/mmap_write_001_pos.ksh    |   8 +-
+ 5 files changed, 151 insertions(+), 54 deletions(-)
+
+diff --git a/include/sys/uio_impl.h b/include/sys/uio_impl.h
+index 37e283da0f8..cfef0b95dbb 100644
+--- a/include/sys/uio_impl.h
++++ b/include/sys/uio_impl.h
+@@ -42,7 +42,7 @@
+ #include <sys/uio.h>
+ 
+ extern int uiomove(void *, size_t, enum uio_rw, uio_t *);
+-extern void uio_prefaultpages(ssize_t, uio_t *);
++extern int uio_prefaultpages(ssize_t, uio_t *);
+ extern int uiocopy(void *, size_t, enum uio_rw, uio_t *, size_t *);
+ extern void uioskip(uio_t *, size_t);
+ 
+diff --git a/module/zcommon/zfs_uio.c b/module/zcommon/zfs_uio.c
+index 7b4175bbeee..8e969bbcc00 100644
+--- a/module/zcommon/zfs_uio.c
++++ b/module/zcommon/zfs_uio.c
+@@ -50,6 +50,7 @@
+ #include <sys/types.h>
+ #include <sys/uio_impl.h>
+ #include <linux/kmap_compat.h>
++#include <linux/uaccess.h>
+ 
+ /*
+  * Move "n" bytes at byte address "p"; "rw" indicates the direction
+@@ -77,8 +78,24 @@ uiomove_iov(void *p, size_t n, enum uio_rw rw, struct uio *uio)
+ 				if (copy_to_user(iov->iov_base+skip, p, cnt))
+ 					return (EFAULT);
+ 			} else {
+-				if (copy_from_user(p, iov->iov_base+skip, cnt))
+-					return (EFAULT);
++				if (uio->uio_fault_disable) {
++					if (!access_ok(VERIFY_READ,
++					    (iov->iov_base + skip), cnt)) {
++						return (EFAULT);
++					}
++
++					pagefault_disable();
++					if (__copy_from_user_inatomic(p,
++					    (iov->iov_base + skip), cnt)) {
++						pagefault_enable();
++						return (EFAULT);
++					}
++					pagefault_enable();
++				} else {
++					if (copy_from_user(p,
++					    (iov->iov_base + skip), cnt))
++						return (EFAULT);
++				}
+ 			}
+ 			break;
+ 		case UIO_SYSSPACE:
+@@ -156,7 +173,7 @@ EXPORT_SYMBOL(uiomove);
+  * error will terminate the process as this is only a best attempt to get
+  * the pages resident.
+  */
+-void
++int
+ uio_prefaultpages(ssize_t n, struct uio *uio)
+ {
+ 	const struct iovec *iov;
+@@ -170,7 +187,7 @@ uio_prefaultpages(ssize_t n, struct uio *uio)
+ 	switch (uio->uio_segflg) {
+ 		case UIO_SYSSPACE:
+ 		case UIO_BVEC:
+-			return;
++			return (0);
+ 		case UIO_USERSPACE:
+ 		case UIO_USERISPACE:
+ 			break;
+@@ -194,7 +211,7 @@ uio_prefaultpages(ssize_t n, struct uio *uio)
+ 		p = iov->iov_base + skip;
+ 		while (cnt) {
+ 			if (fuword8((uint8_t *)p, &tmp))
+-				return;
++				return (EFAULT);
+ 			incr = MIN(cnt, PAGESIZE);
+ 			p += incr;
+ 			cnt -= incr;
+@@ -204,8 +221,10 @@ uio_prefaultpages(ssize_t n, struct uio *uio)
+ 		 */
+ 		p--;
+ 		if (fuword8((uint8_t *)p, &tmp))
+-			return;
++			return (EFAULT);
+ 	}
++
++	return (0);
+ }
+ EXPORT_SYMBOL(uio_prefaultpages);
+ 
+diff --git a/module/zfs/zfs_vnops.c b/module/zfs/zfs_vnops.c
+index 5a2e55eb19a..c866352d765 100644
+--- a/module/zfs/zfs_vnops.c
++++ b/module/zfs/zfs_vnops.c
+@@ -675,7 +675,10 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
+ 		xuio = (xuio_t *)uio;
+ 	else
+ #endif
+-		uio_prefaultpages(MIN(n, max_blksz), uio);
++		if (uio_prefaultpages(MIN(n, max_blksz), uio)) {
++			ZFS_EXIT(zfsvfs);
++			return (SET_ERROR(EFAULT));
++		}
+ 
+ 	/*
+ 	 * If in append mode, set the io offset pointer to eof.
+@@ -820,8 +823,19 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
+ 
+ 		if (abuf == NULL) {
+ 			tx_bytes = uio->uio_resid;
++			uio->uio_fault_disable = B_TRUE;
+ 			error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl),
+ 			    uio, nbytes, tx);
++			if (error == EFAULT) {
++				dmu_tx_commit(tx);
++				if (uio_prefaultpages(MIN(n, max_blksz), uio)) {
++					break;
++				}
++				continue;
++			} else if (error != 0) {
++				dmu_tx_commit(tx);
++				break;
++			}
+ 			tx_bytes -= uio->uio_resid;
+ 		} else {
+ 			tx_bytes = nbytes;
+@@ -921,8 +935,12 @@ zfs_write(struct inode *ip, uio_t *uio, int ioflag, cred_t *cr)
+ 		ASSERT(tx_bytes == nbytes);
+ 		n -= nbytes;
+ 
+-		if (!xuio && n > 0)
+-			uio_prefaultpages(MIN(n, max_blksz), uio);
++		if (!xuio && n > 0) {
++			if (uio_prefaultpages(MIN(n, max_blksz), uio)) {
++				error = EFAULT;
++				break;
++			}
++		}
+ 	}
+ 
+ 	zfs_inode_update(zp);
+diff --git a/tests/zfs-tests/cmd/mmapwrite/mmapwrite.c b/tests/zfs-tests/cmd/mmapwrite/mmapwrite.c
+index 190d31af3aa..b9915d5d31e 100644
+--- a/tests/zfs-tests/cmd/mmapwrite/mmapwrite.c
++++ b/tests/zfs-tests/cmd/mmapwrite/mmapwrite.c
+@@ -31,74 +31,132 @@
+ #include <string.h>
+ #include <sys/mman.h>
+ #include <pthread.h>
++#include <errno.h>
++#include <err.h>
+ 
+ /*
+  * --------------------------------------------------------------------
+- * Bug Id: 5032643
++ * Bug Issue Id: #7512
++ * The bug time sequence:
++ * 1. context #1, zfs_write assign a txg "n".
++ * 2. In the same process, context #2, mmap page fault (which means the mm_sem
++ *    is hold) occurred, zfs_dirty_inode open a txg failed, and wait previous
++ *    txg "n" completed.
++ * 3. context #1 call uiomove to write, however page fault is occurred in
++ *    uiomove, which means it need mm_sem, but mm_sem is hold by
++ *    context #2, so it stuck and can't complete, then txg "n" will not
++ *    complete.
+  *
+- * Simply writing to a file and mmaping that file at the same time can
+- * result in deadlock.  Nothing perverse like writing from the file's
+- * own mapping is required.
++ * So context #1 and context #2 trap into the "dead lock".
+  * --------------------------------------------------------------------
+  */
+ 
++#define	NORMAL_WRITE_TH_NUM	2
++
+ static void *
+-mapper(void *fdp)
++normal_writer(void *filename)
+ {
+-	void *addr;
+-	int fd = *(int *)fdp;
++	char *file_path = filename;
++	int fd = -1;
++	ssize_t write_num = 0;
++	int page_size = getpagesize();
+ 
+-	if ((addr =
+-	    mmap(0, 8192, PROT_READ, MAP_SHARED, fd, 0)) == MAP_FAILED) {
+-		perror("mmap");
+-		exit(1);
++	fd = open(file_path, O_RDWR | O_CREAT, 0777);
++	if (fd == -1) {
++		err(1, "failed to open %s", file_path);
+ 	}
+-	for (;;) {
+-		if (mmap(addr, 8192, PROT_READ,
+-		    MAP_SHARED|MAP_FIXED, fd, 0) == MAP_FAILED) {
+-			perror("mmap");
+-			exit(1);
++
++	char *buf = malloc(1);
++	while (1) {
++		write_num = write(fd, buf, 1);
++		if (write_num == 0) {
++			err(1, "write failed!");
++			break;
+ 		}
++		lseek(fd, page_size, SEEK_CUR);
++	}
++
++	if (buf) {
++		free(buf);
+ 	}
+-	/* NOTREACHED */
+-	return ((void *)1);
+ }
+ 
+-int
+-main(int argc, char **argv)
++static void *
++map_writer(void *filename)
+ {
+-	int fd;
+-	char buf[1024];
+-	pthread_t tid;
++	int fd = -1;
++	int ret = 0;
++	char *buf = NULL;
++	int page_size = getpagesize();
++	int op_errno = 0;
++	char *file_path = filename;
+ 
+-	memset(buf, 'a', sizeof (buf));
++	while (1) {
++		ret = access(file_path, F_OK);
++		if (ret) {
++			op_errno = errno;
++			if (op_errno == ENOENT) {
++				fd = open(file_path, O_RDWR | O_CREAT, 0777);
++				if (fd == -1) {
++					err(1, "open file failed");
++				}
+ 
+-	if (argc != 2) {
+-		(void) printf("usage: %s <file name>\n", argv[0]);
+-		exit(1);
+-	}
++				ret = ftruncate(fd, page_size);
++				if (ret == -1) {
++					err(1, "truncate file failed");
++				}
++			} else {
++				err(1, "access file failed!");
++			}
++		} else {
++			fd = open(file_path, O_RDWR, 0777);
++			if (fd == -1) {
++				err(1, "open file failed");
++			}
++		}
+ 
+-	if ((fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666)) == -1) {
+-		perror("open");
+-		exit(1);
++		if ((buf = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
++		    MAP_SHARED, fd, 0)) == MAP_FAILED) {
++			err(1, "map file failed");
++		}
++
++		if (fd != -1)
++			close(fd);
++
++		char s[10] = {0, };
++		memcpy(buf, s, 10);
++		ret = munmap(buf, page_size);
++		if (ret != 0) {
++			err(1, "unmap file failed");
++		}
+ 	}
++}
+ 
+-	(void) pthread_setconcurrency(2);
+-	if (pthread_create(&tid, NULL, mapper, &fd) != 0) {
+-		perror("pthread_create");
+-		close(fd);
++int
++main(int argc, char **argv)
++{
++	pthread_t map_write_tid;
++	pthread_t normal_write_tid[NORMAL_WRITE_TH_NUM];
++	int i = 0;
++
++	if (argc != 3) {
++		(void) printf("usage: %s <normal write file name>"
++		    "<map write file name>\n", argv[0]);
+ 		exit(1);
+ 	}
+-	for (;;) {
+-		if (write(fd, buf, sizeof (buf)) == -1) {
+-			perror("write");
+-			close(fd);
+-			exit(1);
++
++	for (i = 0; i < NORMAL_WRITE_TH_NUM; i++) {
++		if (pthread_create(&normal_write_tid[i], NULL, normal_writer,
++		    argv[1])) {
++			err(1, "pthread_create normal_writer failed.");
+ 		}
+ 	}
+ 
+-	close(fd);
++	if (pthread_create(&map_write_tid, NULL, map_writer, argv[2])) {
++		err(1, "pthread_create map_writer failed.");
++	}
+ 
+ 	/* NOTREACHED */
++	pthread_join(map_write_tid, NULL);
+ 	return (0);
+ }
+diff --git a/tests/zfs-tests/tests/functional/mmap/mmap_write_001_pos.ksh b/tests/zfs-tests/tests/functional/mmap/mmap_write_001_pos.ksh
+index 1eda971041d..24150b827f8 100755
+--- a/tests/zfs-tests/tests/functional/mmap/mmap_write_001_pos.ksh
++++ b/tests/zfs-tests/tests/functional/mmap/mmap_write_001_pos.ksh
+@@ -53,12 +53,14 @@ if ! is_mp; then
+ fi
+ 
+ log_must chmod 777 $TESTDIR
+-mmapwrite $TESTDIR/test-write-file &
++mmapwrite $TESTDIR/normal_write_file $TESTDIR/map_write_file &
+ PID_MMAPWRITE=$!
+-log_note "mmapwrite $TESTDIR/test-write-file pid: $PID_MMAPWRITE"
++log_note "mmapwrite $TESTDIR/normal_write_file $TESTDIR/map_write_file"\
++	 "pid: $PID_MMAPWRITE"
+ log_must sleep 30
+ 
+ log_must kill -9 $PID_MMAPWRITE
+-log_must ls -l $TESTDIR/test-write-file
++log_must ls -l $TESTDIR/normal_write_file
++log_must ls -l $TESTDIR/map_write_file
+ 
+ log_pass "write(2) a mmap(2)'ing file succeeded."
diff -Nru zfs-linux-0.7.12/debian/patches/series zfs-linux-0.7.12/debian/patches/series
--- zfs-linux-0.7.12/debian/patches/series	2019-06-09 11:17:40.000000000 +0800
+++ zfs-linux-0.7.12/debian/patches/series	2020-04-22 13:12:46.000000000 +0800
@@ -10,3 +10,5 @@
 2000-increase-default-zcmd-allocation-to-256K.patch
 init-debian-openrc-workaround.patch
 e22bfd814960295029ca41c8e116e8d516d3e730.patch
+98bb45e27ae80145a6ce028df90fccdb23f8901d.patch
+01937958ce85b1cd8942dbaf9a3f9768c5b02a0a.patch
