On 6/6/25 9:31 PM, Paul Gevers wrote:
> As is custom for the Release Team, I'm asking you what your plans are
> with respect to testing the upgrade of DSA-maintained machines to
> trixie. If my information is correct, in the past you'd first upgrade
> a non-critical machine to see if anything broken in trixie prevents
> you from maintaining the machines in a decent manner. When things look
> OK, I understand it's custom to upgrade at least one buildd for every
> release architecture to check that the buildds for all architectures
> keep working as they should on trixie.
>
> I'd like to warn you on this front already, based on my experience
> upgrading the ci.d.n machines this week (see the backlog of #d-devel
> if you have it). Due to changes in systemd that raise the limit on
> open file descriptors [1], some builds and tests may time out or use
> absurd amounts of RAM, e.g. [2]. I had to limit fs.nr_open [3] to the
> bookworm values to prevent that behavior from taking the service down,
> even though things are fixed in unstable/trixie.
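
If we run into the same file-descriptor-limit issue on the buildds, I'd
expect the workaround to look roughly like the following; this is only a
sketch, the file name is made up, and I believe the bookworm value of
fs.nr_open is the kernel default of 1048576, but that should be
double-checked on a bookworm host before copying it anywhere:

  # /etc/sysctl.d/90-nr-open.conf (hypothetical file name)
  # Pin fs.nr_open back to the bookworm value so programs that size
  # buffers or close-all-fds loops by the fd limit don't blow up.
  fs.nr_open = 1048576

  # apply without a reboot:
  #   sysctl -p /etc/sysctl.d/90-nr-open.conf
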
Current progress: arm-conova-01, x86-grnet-01 and ppc64el-conova-01 are
upgraded. I haven't touched s390x yet. riscv64 is all trixie anyway (and
physical machines). mips64el I cannot upgrade.
All of the hosts upgraded so far are VMs; we should also try to upgrade
physical hosts. But AFAICS all the physical arm64 and ppc64el machines
we have today are part of Ganeti clusters, and it would be unwise to
upgrade them individually. For x86 I'm not that worried, but we could do
that.
So I think we ultimately need to figure out whether there's a working
Ganeti in trixie and then upgrade individual clusters, preferably
starting with an x86 cluster so that we don't run into
architecture-specific issues on top of that on arm64 or ppc64el (where
we only have single-host clusters).
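
On the Ganeti side, what I have in mind is roughly the following; a
rough sketch only, the node name is a placeholder and the gnt-* steps
just describe the usual migrate-then-upgrade flow, nothing verified
against trixie yet:

  # Is there a ganeti in trixie at all, and in which version?
  rmadison ganeti         # archive overview (devscripts)
  apt policy ganeti       # as seen from a trixie host or chroot

  # Rough per-node flow on an x86 test cluster:
  gnt-cluster verify                     # cluster healthy before we start?
  gnt-node migrate -f node1.example.org  # move primary instances off the node
  # ... dist-upgrade node1 to trixie, reboot, check it rejoins ...
  gnt-cluster verify                     # does the mixed cluster still behave?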