Bug#1063338: [regression 6.1.76] dlm: cannot start dlm midcomms -97 after backport of e9cdebbe23f1 ("dlm: use kernel_connect() and kernel_bind()")
- To: Valentin Kleibel <valentin@vrvis.at>
- Cc: Salvatore Bonaccorso <carnil@debian.org>, David Teigland <teigland@redhat.com>, Alexander Aring <aahringo@redhat.com>, 1063338@bugs.debian.org, gfs2@lists.linux.dev, linux-kernel@vger.kernel.org, stable@vger.kernel.org, gregkh@linuxfoundation.org, regressions@lists.linux.dev
- Subject: Bug#1063338: [regression 6.1.76] dlm: cannot start dlm midcomms -97 after backport of e9cdebbe23f1 ("dlm: use kernel_connect() and kernel_bind()")
- From: Jordan Rife <jrife@google.com>
- Date: Thu, 8 Feb 2024 13:17:11 -0800
- Message-id: <[🔎] CADKFtnQnz0NEWQT2K1AGARY5=_o2dhS3gRyMo-=9kuxqeQvcqQ@mail.gmail.com>
- Reply-to: Jordan Rife <jrife@google.com>, 1063338@bugs.debian.org
- In-reply-to: <[🔎] CADKFtnQUQt=M32tYhcutP0q6exOgk9R6xgxddDdewbms+7xwTQ@mail.gmail.com>
- References: <[🔎] 38f51dbb-65aa-4ec2-bed2-e914aef27d25@vrvis.at> <[🔎] ZcNdzZVPD76uSbps@eldamar.lan> <[🔎] CADKFtnRfqi-A_Ak_S-YC52jPn604+ekcmCmNoTA_yEpAcW4JJg@mail.gmail.com> <[🔎] 1d4c7d06-0c02-4adb-a2a3-ec85fd802ddb@vrvis.at> <[🔎] CADKFtnQUQt=M32tYhcutP0q6exOgk9R6xgxddDdewbms+7xwTQ@mail.gmail.com> <[🔎] 38f51dbb-65aa-4ec2-bed2-e914aef27d25@vrvis.at>
Hi Valentin,
Would you be able to confirm that the attached patch fixes your issue as well?
-Jordan
On Thu, Feb 8, 2024 at 9:42 AM Jordan Rife <jrife@google.com> wrote:
>
> On Thu, Feb 8, 2024 at 3:37 AM Valentin Kleibel <valentin@vrvis.at> wrote:
> >
> > Hi Jordan, hi all
> >
> > > Just a quick look comparing dlm_tcp_listen_bind between the latest 6.1
> > > and 6.6 stable branches,
> > > it looks like there is a mismatch here with the dlm_local_addr[0] parameter.
> > >
> > > 6.1
> > > ----
> > >
> > > static int dlm_tcp_listen_bind(struct socket *sock)
> > > {
> > > int addr_len;
> > >
> > > /* Bind to our port */
> > > make_sockaddr(dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len);
> > > return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0],
> > > addr_len);
> > > }
> > >
> > > 6.6
> > > ----
> > > static int dlm_tcp_listen_bind(struct socket *sock)
> > > {
> > > int addr_len;
> > >
> > > /* Bind to our port */
> > > make_sockaddr(&dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len);
> > > return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0],
> > > addr_len);
> > > }
> > >
> > > 6.6 contains commit c51c9cd8 (fs: dlm: don't put dlm_local_addrs on heap) which
> > > changed
> > >
> > > static struct sockaddr_storage *dlm_local_addr[DLM_MAX_ADDR_COUNT];
> > >
> > > to
> > >
> > > static struct sockaddr_storage dlm_local_addr[DLM_MAX_ADDR_COUNT];
> > >
> > > It looks like kernel_bind() in 6.1 needs to be modified to match.
> >
> > We tried to apply commit c51c9cd8 (fs: dlm: don't put dlm_local_addrs on
> > heap) to the debian kernel 6.1.76 and came up with the attached patch.
> > Besides the different offsets there is a slight change dlm_tcp_bind()
> > where in 6.1.76 kernel_bind() is used instead of sock->ops->bind() in
> > the original commit.
> >
> > This patch solves the issue we experienced.
> >
> > Thanks for your help,
> > Valentin
>
> Good to hear that works for you! We should fix this in the 6.1 stable
> kernel as well.
>
> IMO it may be less risky and simpler to fix the backport of my patch
> e9cdebbe23f1 ("dlm: use kernel_connect() and
> kernel_bind()") and just switch (struct sockaddr *)&dlm_local_addr[0]
> to (struct sockaddr *)dlm_local_addr[0]
> in the call to kernel_bind() rather than backporting c51c9cd8 (fs:
> dlm: don't put dlm_local_addrs on
> heap) to 6.1.
>
> I will have some time soon to fix the 6.1 backport, but it may make
> sense just to revert in the meantime.
>
> -Jordan
From dec5ffd309967e429b616a9d498037a5eb437c54 Mon Sep 17 00:00:00 2001
From: Jordan Rife <jrife@google.com>
Date: Thu, 8 Feb 2024 12:09:55 -0600
Subject: [PATCH] dlm: Treat dlm_local_addr[0] as sockaddr_storage *
Backport e11dea8 ("dlm: use kernel_connect() and kernel_bind()") to
Linux stable 6.1 caused a regression. The original patch expected
dlm_local_addrs[0] to be of type sockaddr_storage, because c51c9cd ("fs:
dlm: don't put dlm_local_addrs on heap") changed its type from
sockaddr_storage* to sockaddr_storage in Linux 6.5+ while in older Linux
versions this is still the original sockaddr_storage*.
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1063338
Fixes: e11dea8 ("dlm: use kernel_connect() and kernel_bind()")
Signed-off-by: Jordan Rife <jrife@google.com>
---
fs/dlm/lowcomms.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 72f34f96d0155..8426073e73cf2 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1900,7 +1900,7 @@ static int dlm_tcp_listen_bind(struct socket *sock)
/* Bind to our port */
make_sockaddr(dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len);
- return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0],
+ return kernel_bind(sock, (struct sockaddr *)dlm_local_addr[0],
addr_len);
}
--
2.43.0.687.g38aa6559b0-goog
Reply to: