Bug#1107521: ath12k_pci errors and loss of connectivity in 6.12.y branch
- To: Vasant Hegde <vasant.hegde@amd.com>
- Cc: Robin Murphy <robin.murphy@arm.com>, Baochen Qiang <baochen.qiang@oss.qualcomm.com>, Jeff Johnson <jjohnson@kernel.org>, will@kernel.org, joro@8bytes.org, linux-wireless@vger.kernel.org, ath12k@lists.infradead.org, 1107521@bugs.debian.org, iommu@lists.linux.dev
- Subject: Bug#1107521: ath12k_pci errors and loss of connectivity in 6.12.y branch
- From: Matt Mower <mowerm@gmail.com>
- Date: Tue, 1 Jul 2025 22:17:48 -0700
- Message-id: <CAPDiVH-kVCUY8DKexT9OqAZsvkZ5_CGo8d8nENYA-kD=s_x8wA@mail.gmail.com>
- Reply-to: Matt Mower <mowerm@gmail.com>, 1107521@bugs.debian.org
- In-reply-to: <4a13d862-1bbb-4a98-bc1d-219bf78f7c0d@amd.com>
- References: <CAPDiVH8gaBH6o_OY-zUWYpDbj5mhiqmofKGb71gLgHOi4vA=Vw@mail.gmail.com> <0ba2176e-3339-4a8b-850a-ca5643939c8b@oss.qualcomm.com> <fd3bd8b1-4108-445a-b65f-4769d73e6e63@arm.com> <4a13d862-1bbb-4a98-bc1d-219bf78f7c0d@amd.com> <174939484316.7705.5967923154709480099.reportbug@AI360>
> A couple more things I'd try on the ath12k side: firstly, boot with
> "iommu.strict=1" and see if that makes the faults any more
> frequent/reproducible;
The issue is easy enough to reproduce in 6.12.27 onward, and I may
have been mistaken about its rarity in 6.12.22: I reproduced it
relatively quickly in .22 today. So if making the fault easier to
reproduce was the primary purpose of setting iommu.strict=1, testing
works with or without strict. FWIW, I did test iommu.strict=1 with
6.15.3 and still have not reproduced this issue there.
> if a fault is fairly easily reproducible, then
> use the DMA API and/or IOMMU API tracepoints to compare the fault
> address to prior DMA mapping activity - that can usually reveal the
> nature of the bug enough to then know what to go looking for.
This is unfamiliar territory for me, so I hope the following is at
least close to what you requested. If not, happy to provide more test
results based on a set of instructions. Here's what I did:
1. Set CONFIG_DMA_API_DEBUG=y
2. Set kernel command line to: iommu.strict=1 log_buf_len=100M
dma_debug_driver=ath12k_pci trace_event=dma:*,iommu:*
3. Booted and waited for page fault, then cat'd
/sys/kernel/tracing/trace to a file.
Additionally, though I'm pretty sure this is irrelevant now, I added
logging after each dma_map_single() in the ath12k driver to print the
function name and resultant address to the kernel log.
Comparing the addresses of several io_page_fault lines in the trace
and in the kernel log, they line up. So, I'm hopeful this is on the
right track.
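For anyone repeating the comparison, here is a minimal sketch of how
the trace could be scanned automatically. It is not the script used
for the results above. It assumes the trace lines follow the usual
iommu tracepoint text ("map: IOMMU: iova=0x... - 0x... paddr=...",
"unmap: IOMMU: iova=0x... - 0x... size=...", "io_page_fault: IOMMU:...
iova=0x... flags=...") and that the printed end address is exclusive;
I have not checked this against the linked trace, and the function
name is made up for illustration.

```python
import re

# Assumed line shapes (hypothetical, not verified against the linked trace):
#   ... map: IOMMU: iova=0x... - 0x... paddr=0x... size=4096
#   ... unmap: IOMMU: iova=0x... - 0x... size=4096 unmapped_size=4096
#   ... io_page_fault: IOMMU:<dev> <driver> iova=0x... flags=0x0000
EVENT_RE = re.compile(
    r"\b(map|unmap|io_page_fault): IOMMU.*?"
    r"iova=(0x[0-9a-fA-F]+)(?: - (0x[0-9a-fA-F]+))?"
)


def last_activity_before_faults(lines):
    """For each io_page_fault, find the most recent map/unmap covering its IOVA.

    Returns a list of (fault_line_no, fault_iova, prior), where prior is
    (line_no, event, start, end) or None if no earlier range covers it.
    """
    history = []  # map/unmap events seen so far, in trace order
    results = []
    for line_no, line in enumerate(lines, 1):
        match = EVENT_RE.search(line)
        if not match:
            continue
        event = match.group(1)
        start = int(match.group(2), 16)
        # Faults carry a single address; treat it as a one-byte range.
        end = int(match.group(3), 16) if match.group(3) else start + 1
        if event == "io_page_fault":
            prior = next(
                (h for h in reversed(history) if h[2] <= start < h[3]),
                None,
            )
            results.append((line_no, start, prior))
        else:
            history.append((line_no, event, start, end))
    return results
```

With something like this, a fault whose most recent covering event is
an unmap would suggest the device touched a buffer after it was
unmapped, while a fault with no covering event at all would suggest an
address that was never mapped in the captured window. Usage would be
along the lines of last_activity_before_faults(open(path)) on the
saved trace file.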
DMA/IOMMU trace: https://cmphys.com/ath12k/iommu_dma_trace-20250701.log
Kernel log with additional logging:
https://cmphys.com/ath12k/dmesg-6.12.35-20250701.log
Diff showing extra logging added to v6.12.35:
https://cmphys.com/ath12k/ath12k-extra-logging-6.12.35-20250701.diff
Thanks,
Matt