Re: [PPC] Boot problems after the pci-v6.18-changes
- To: Manivannan Sadhasivam <mani@kernel.org>
- Cc: Christian Zigotzky <chzigotzky@xenosoft.de>, Bjorn Helgaas <helgaas@kernel.org>, Lukas Wunner <lukas@wunner.de>, Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>, linux-pci@vger.kernel.org, mad skateman <madskateman@gmail.com>, "R.T.Dickinson" <rtd2@xtra.co.nz>, Christian Zigotzky <info@xenosoft.de>, linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, hypexed@yahoo.com.au, Darren Stevens <darren@stevens-zone.net>, debian-powerpc@lists.debian.org, Thomas Petazzoni <thomas.petazzoni@bootlin.com>
- Subject: Re: [PPC] Boot problems after the pci-v6.18-changes
- From: Herve Codina <herve.codina@bootlin.com>
- Date: Thu, 23 Oct 2025 11:19:47 +0200
- Message-id: <20251023111947.6e960216@bootlin.com>
- In-reply-to: <vc7ehnmr6tjkkag3j543zwprwqdjyttovav2moo5ravpzzkmbi@qe4tds4e7nc6>
- References: <20251015101304.3ec03e6b@bootlin.com> <A11312DD-8A5A-4456-B0E3-BC8EF37B21A7@xenosoft.de> <20251015135811.58b22331@bootlin.com> <4rtktpyqgvmpyvars3w3gvbny56y4bayw52vwjc3my3q2hw3ew@onz4v2p2uh5i> <20251023093813.3fbcd0ce@bootlin.com> <vc7ehnmr6tjkkag3j543zwprwqdjyttovav2moo5ravpzzkmbi@qe4tds4e7nc6>
Hi Manivannan,
On Thu, 23 Oct 2025 14:19:46 +0530
Manivannan Sadhasivam <mani@kernel.org> wrote:
> On Thu, Oct 23, 2025 at 09:38:13AM +0200, Herve Codina wrote:
> > Hi Manivannan,
> >
> > On Wed, 15 Oct 2025 18:20:22 +0530
> > Manivannan Sadhasivam <mani@kernel.org> wrote:
> >
> > > Hi Herve,
> > >
> > > On Wed, Oct 15, 2025 at 01:58:11PM +0200, Herve Codina wrote:
> > > > Hi Christian,
> > > >
> > > > On Wed, 15 Oct 2025 13:30:44 +0200
> > > > Christian Zigotzky <chzigotzky@xenosoft.de> wrote:
> > > >
> > > > > Hello Herve,
> > > > >
> > > > > > On 15 October 2025 at 10:39 am, Herve Codina <herve.codina@bootlin.com> wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I also observed issues with the commit f3ac2ff14834 ("PCI/ASPM: Enable all
> > > > > > ClockPM and ASPM states for devicetree platforms")
> > > > >
> > > > > Thanks for reporting.
> > > > >
> > > > > >
> > > > > > Also tried the quirk proposed in this discussion (quirk_disable_aspm_all)
> > > > > > and the quirk also fixes the timing issue.
> > > > >
> > > > > Where have you added quirk_disable_aspm_all?
> > > >
> > > > --- 8< ---
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 214ed060ca1b..a3808ab6e92e 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -2525,6 +2525,17 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> > > > */
> > > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> > > >
> > > > +static void quirk_disable_aspm_all(struct pci_dev *dev)
> > > > +{
> > > > + pci_info(dev, "Disabling ASPM\n");
> > > > +	pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> > > > +}
> > > > [...]
> > >
> > > Could you please try disabling L1SS and L0s separately to see which one is
> > > causing the issue? Like,
> > >
> > > pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2);
> > >
> > > pci_disable_link_state(dev, PCIE_LINK_STATE_L0S);
> > >
> >
> > I did tests and here are the results:
> >
> > - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_ALL)
> > Issue not present
> >
> > - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)
> > Issue present, timings similar to timings already reported
> > (hundreds of ms).
> >
> > - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_L0S);
> > Issue present, timings still incorrect but lower
> > 64 bytes from 192.168.32.100: seq=10 ttl=64 time=16.738 ms
> > 64 bytes from 192.168.32.100: seq=11 ttl=64 time=39.500 ms
> > 64 bytes from 192.168.32.100: seq=12 ttl=64 time=62.178 ms
> > 64 bytes from 192.168.32.100: seq=13 ttl=64 time=84.709 ms
> > 64 bytes from 192.168.32.100: seq=14 ttl=64 time=107.484 ms
> >
>
> This is weird. Looks like all ASPM states (L0s, L1SS) are contributing to the
> increased latency, which is far more than they should add. This makes me
> inclined to skip inspecting the L0s/L1 exit latency fields :/
>
> Bjorn sent out a patch [1] that enables only L0s and L1 by default, but it
> might not help you. I honestly don't know how you are seeing this much
> latency. It could be due to an issue in a PCI component (host or endpoint),
> or even the board routing. Identifying which one is causing the issue is
> going to be tricky, as it will require some experimentation.
I've just tested Bjorn's patch and I confirm that it doesn't fix my issue.
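
In case it's useful later, the L0s/L1 exit latency fields mentioned above can
be read from the Link Capabilities register. A minimal sketch (hypothetical
helper, not something I ran as part of these tests) that could be called from
the quirk:

--- 8< ---
/* Hypothetical helper (drivers/pci/quirks.c context): dump the L0s/L1
 * exit latencies advertised in Link Capabilities.
 */
static void dump_aspm_exit_latencies(struct pci_dev *dev)
{
	u32 lnkcap;

	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);

	/* L0s Exit Latency is bits 14:12, L1 Exit Latency is bits 17:15 */
	pci_info(dev, "L0s exit latency field: %u, L1 exit latency field: %u\n",
		 (lnkcap & PCI_EXP_LNKCAP_L0SEL) >> 12,
		 (lnkcap & PCI_EXP_LNKCAP_L1EL) >> 15);
}
--- 8< ---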
>
> If you are motivated, we can start by isolating this issue to the endpoint.
> Is it possible for you to connect a different PCI card to your host and check
> whether you still see the increased latency? If the other card does not
> exhibit the same behavior, then the current device is the culprit and we
> should be able to quirk it.
I'll see what I can do.
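
If it does turn out to be the endpoint, I guess the end result would be the
quirk I already tested, declared only for that device. Something like this
(the IDs below are placeholders until we know the culprit):

--- 8< ---
static void quirk_disable_aspm_all(struct pci_dev *dev)
{
	pci_info(dev, "Disabling ASPM\n");
	pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
}
/* 0x1234/0x5678 are placeholder vendor/device IDs */
DECLARE_PCI_FIXUP_FINAL(0x1234, 0x5678, quirk_disable_aspm_all);
--- 8< ---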
Best regards,
Hervé