Stats on packages not on Salsa (Was: Bits from DPL)
- To: Stuart Prescott <stuart@debian.org>
- Cc: debian-devel@lists.debian.org, Debian QA List <debian-qa@lists.debian.org>
- Subject: Stats on packages not on Salsa (Was: Bits from DPL)
- From: Andreas Tille <tille@debian.org>
- Date: Thu, 9 Jan 2025 17:30:53 +0100
- Message-id: <Z3_5vVzCGH3sUO5U@an3as.eu>
- In-reply-to: <a15353fc-9927-44c7-87f2-6ede275937a1@debian.org>
- References: <Z3lMP_R9ccE4YZuF@an3as.eu> <CAOU6tAAvJ9ZY0YVfUXakjs8QrVQ=ekt80H4vhL_HXKAS1tfwJg@mail.gmail.com> <c0953837-d3b3-43f0-8b40-63ffd5df190e@debian.org> <Z316foUEWHMe0hYy@an3as.eu> <1b5757c9-0f58-44e2-b431-93a5279d01d6@debian.org> <Z36f-I9KglaYJOeQ@an3as.eu> <a15353fc-9927-44c7-87f2-6ede275937a1@debian.org>
Hi Stuart,
I'm changing the subject and suggest moving the topic to the Debian QA
list, where it probably belongs.
On Thu, Jan 09, 2025 at 11:54:47AM +1100, Stuart Prescott wrote:
> Good point on anonscm as well... that really does blow out the numbers.
Unfortunately yes.
> However... some of them still work via the aliasing mechanism that was
> introduced at the time of migration to salsa.
During the migration from Alioth to Salsa I maintained lists of
packages for the Debian Med and Debian Science teams. In my practical
experience, finding a working alias is a rare exception. I also think
this alias mechanism was a temporary solution that should not survive
for more than 5 years.
> Duck used to check them all
> but I don't think it is running any more, unfortunately. vcswatch still
> does, more on that later.
Vcswatch is a good hint.
> > > The vast majority of these packages have seen post-alioth uploads but with
> > > the broken Vcs fields still in place.
> >
> > Do you have numbers backing up this "vast majority" statement?
>
> Yes, that's in the table below. Of those 161 packages, 145 have been
> uploaded since salsa launched and alioth stopped. (updated data with anonscm
> at the bottom - the story is still the same, although not all those anonscm
> links are broken)
Ahhh, now I get your point. The Bug of the Day criteria select
packages that have not been uploaded for a long time, so my experience
is different.
> > > (I accidentally found 2 python-team packages without Vcs URLs yesterday -
> > > the repos were on salsa, just not listed in d/control)
> >
> > Not so nice. Did you just inject these? If not, would you mind naming
> > the packages?
>
> One got uploaded because I was sorting other changes for qtpy, the other is
> fixed in git. Having looked at 20-something packages in the last 2 days, I'm
> not sure I could actually name which ones at this stage...
OK, as long as these are fixed now.
> In pursuing this, you might also find the vcswatch table in udd useful - it lists
> 1533 packages where the VCS fields might need fixing. Some of the errors
> there are transient, but this also picks up typos in the VCS fields
> ('debain', 'debian/packages/') and repos that simply don't exist.
Good point.
> Updated queries and data appended. (and btw postgres can do regex matches
> which simplifies the sql quite a lot)
I'm aware of the regexp feature in principle. Unfortunately, in my day
job I have to deal with SQL databases that lack this kind of feature,
so I usually try to avoid PostgreSQL-only features.
> Majority of packages with invalid vcs_url uploaded post salsa:
>
-- year of the upload of the current sid version, for packages whose
-- Vcs URL still points at an Alioth-era host
SELECT
    DATE_PART('year', date) AS year,
    COUNT(*)
FROM
    sources AS s
    JOIN upload_history AS h
        ON s.source = h.source AND s.version = h.version
WHERE
    release = 'sid'
    AND vcs_url ~ '/(git|svn|alioth|anonscm)\.debian\.org'
GROUP BY
    year
ORDER BY
    year ASC;
>
> year | count
> -----+-------
> 2011 | 2
> 2012 | 5
> 2013 | 7
> 2014 | 9
> 2015 | 9
> 2016 | 20
> 2017 | 102
> 2018 | 85 ← (salsa.d.o general availability)
> 2019 | 10
> 2020 | 77
> 2021 | 411
> 2022 | 115
> 2023 | 13
> 2024 | 31
> 2025 | 3
> (15 rows)
>
>
>
> Teams with packages to fix - and the packages are probably already on salsa
> so this is just metadata, not lots of work.
>
-- team-maintained sid packages with Alioth-era Vcs URLs, per team
SELECT
    maintainer_name, COUNT(*)
FROM sources
WHERE
    release = 'sid'
    AND vcs_url ~ '/(git|svn|alioth|anonscm)\.debian\.org'
    AND maintainer ~ '(team|group|lists)'
GROUP BY
    maintainer_name
ORDER BY
    count DESC;
>
> maintainer_name | count
> ---------------------------------+-------
> Debian Ruby Extras Maintainers | 196 (+2 that are in Uploaders)
> Debian Java Maintainers | 178
> Debian Go Packaging Team | 105
> Debian Perl Group | 83
> pkg-go | 25
> Debian Javascript Maintainers | 20
> Debian Fonts Task Force | 15
> Debian PHP PEAR Maintainers | 14
> Debian X Strike Force | 12
> Debian Science Maintainers | 11
> Debian XML/SGML Group | 5
> Debichem Team | 4
> Debian VDR Team | 4
> Debian CLI Applications Team | 2
> Debian Games Team | 2
> Debian Java maintainers | 2
> Debian Tasktools Packaging Team | 2
> Debian VoIP Team | 2
> Debian Astronomy Maintainers | 2
> Debian Privacy Tools Maintainers | 2
> Debian Clojure Maintainers | 2
> Debian Astronomy Team | 2
> Debian Telepathy maintainers | 2
> Live Systems Maintainers | 1
> The Debian Lua Team | 1
> Pulseaudio maintenance team | 1
> Android Tools Maintainers | 1
> Debian PhotoTools Maintainers | 1
> Puppet Package Maintainers | 1
> ClamAV Team | 1
> Debian-IN Team | 1
> Debian CLI Libraries Team | 1
> Debian Islamic Maintainers | 1
> Debian GNOME Maintainers | 1
> Debian Science Team | 1
> Debian Sugar Team | 1
> Debian GNUKhata Team | 1
> Debian Emacs addons team | 1
> Debian Med Packaging Team | 1
> Debian Salt Team | 1
> NeuroDebian Team | 1
>
>
>
> Find packages in your favourite team that you want to work on...
SELECT
    source, vcs_url
FROM sources
WHERE
    release = 'sid'
    AND vcs_url ~ '/(git|svn|alioth|anonscm)\.debian\.org'
    AND maintainer ~ 'science'  -- adjust to match your team's Maintainer field
ORDER BY
    source;
Thank you for publishing these data. I hope this will encourage people
to look into the problem.
> The vcswatch table has lots of interesting things... Note that the salsa
> error "could not read Username" in the table is not a misconfiguration - it
> means that the repo couldn't be obtained anonymously, which could be that it
> doesn't exist, or that it needs permissions - both are wrong for Debian.
>
SELECT
    source, url, error
FROM
    vcswatch
WHERE
    error IS NOT NULL
ORDER BY
    source;
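The vcswatch errors described above roughly fall into three groups: typos in the URL (such as 'debain' or 'debian/packages/'), repos that cannot be read anonymously (the "could not read Username" case), and the rest, some of which are transient. A minimal triage sketch, assuming the rows from the vcswatch query have been exported as (source, url, error) tuples; the function name, bucket labels and sample rows are hypothetical, and the substring checks are a heuristic, not how vcswatch itself classifies anything.

```python
from collections import defaultdict

# Substrings taken from the typos mentioned in this thread; extend as needed.
TYPO_MARKERS = ("debain", "debian/packages/")
# Git's message when a repo cannot be fetched without credentials, which on
# salsa means the repo is missing or not public.
NO_ANON_ACCESS = "could not read Username"


def classify_vcswatch_rows(rows):
    """Group (source, url, error) rows from the vcswatch query by likely cause."""
    buckets = defaultdict(list)
    for source, url, error in rows:
        if any(marker in url for marker in TYPO_MARKERS):
            buckets["typo in Vcs URL"].append(source)
        elif NO_ANON_ACCESS in (error or ""):
            buckets["missing or non-public repo"].append(source)
        else:
            buckets["other (may be transient)"].append(source)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical rows, not real UDD output.
    sample = [
        ("foo", "https://salsa.debain.org/med-team/foo.git",
         "Could not resolve host: salsa.debain.org"),
        ("bar", "https://salsa.debian.org/science-team/bar.git",
         "fatal: could not read Username for 'https://salsa.debian.org': "
         "terminal prompts disabled"),
        ("baz", "https://salsa.debian.org/debian/baz.git", "timeout"),
    ]
    for cause, sources in sorted(classify_vcswatch_rows(sample).items()):
        print(f"{cause}: {', '.join(sources)}")
```

The typo check runs first because a mistyped path on salsa also ends in the Username error, and those packages only need a metadata fix rather than a missing repo being pushed.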
I've removed the quotation markers from the SQL queries to make
copy-and-paste easier for readers. I can confirm that a couple of
Debian Science packages will no longer show up tomorrow (though some
are not simple metadata fixes, since a lot has changed in Git and the
code currently does not build; in those cases I at least pinged the
team).
Kind regards
Andreas.
--
https://fam-tille.de