Bug#1057878: qa.debian.org: UDD upload_history has truncated email addresses
On 10/12/23 at 12:10 +1100, Stuart Prescott wrote:
> Package: qa.debian.org
> Severity: normal
> X-Debbugs-Cc: stuart@debian.org
>
> The 'maintainer' and 'maintainer_email' columns of the upload_history table
> in UDD have truncated email addresses. Somewhere the 'maintainer' data
> is being truncated and then the maintainer_email is consequently broken.
>
> udd=> SELECT maintainer, maintainer_email FROM upload_history WHERE maintainer_email LIKE '%=' LIMIT 10;
> maintainer | maintainer_email
> ----------------------------------------------------------------+----------------------------------------------
> Maintainers of GStreamer packages <pkg-gstreamer-maintainers@= | pkg-gstreamer-maintainers@=
> Maintainers of GStreamer packages <pkg-gstreamer-maintainers@= | pkg-gstreamer-maintainers@=
> Zenoss Packaging Team <pkg-zenoss-team@lists.alioth.debian.or= | pkg-zenoss-team@lists.alioth.debian.or=
> Debian GNOME Maintainers <pkg-gnome-maintainers@lists.alioth.= | pkg-gnome-maintainers@lists.alioth.=
> Debian Perl Group <pkg-perl-maintainers@lists.alioth.debian.o= | pkg-perl-maintainers@lists.alioth.debian.o=
> Debian VoIP Team <pkg-voip-maintainers@lists.alioth.debian.or= | pkg-voip-maintainers@lists.alioth.debian.or=
> Debian Python Modules Team <python-modules-team@lists.alioth.= | python-modules-team@lists.alioth.=
> Debian Python Modules Team <python-modules-team@lists.alioth.= | python-modules-team@lists.alioth.=
> Debian Firebird Group <pkg-firebird-general@lists.alioth.debi= | pkg-firebird-general@lists.alioth.debi=
> Debian Samba Maintainers <pkg-samba-maint@lists.alioth.debian= | pkg-samba-maint@lists.alioth.debian=
> (10 rows)
>
>
> The input data from the d-d-c mailing list looks fine in the web archive,
> but I can imagine this being due to line wrapping in the mbox files.
>
> Looking at one specific example:
>
> https://lists.debian.org/debian-devel-changes/2007/12/msg00466.html
>
> udd=> SELECT maintainer, maintainer_email FROM upload_history WHERE maintainer_email LIKE '%=' AND source = 'libxml-rss-perl' AND version = '1.31-3';
> maintainer | maintainer_email
> ----------------------------------------------------------------+---------------------------------------------
> Debian Perl Group <pkg-perl-maintainers@lists.alioth.debian.o= | pkg-perl-maintainers@lists.alioth.debian.o=
> (1 row)
>
> This particular example is quite old, but the problem also exists in
> recent uploads; at the time of writing the most recent one is
> libgetdata (0.11.0-9), which was uploaded today.
>
> This data issue affects 70k of the 850k rows in upload_history.
Hi,
I made some changes to the email decoding that fixed most cases. We are
down to 1162 badly processed emails (from the 70k you reported):
udd=> SELECT count(*) FROM upload_history WHERE maintainer_email LIKE '%=';
count
-------
1162
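For context, a trailing '=' is what a quoted-printable soft line break
looks like when a body is read line by line without being decoded first.
The following is a minimal sketch of that failure mode and of the fix,
not the actual UDD importer code; the sample text and the function names
naive_maintainer and decoded_maintainer are made up for illustration.

```python
import quopri

# Hypothetical body fragment: the Maintainer field is wrapped by a
# quoted-printable soft line break ("=" immediately before the newline).
RAW_BODY = (
    b"Source: example\n"
    b"Maintainer: Example Packaging Team <pkg-example-maintainers@lists.example.o=\n"
    b"rg>\n"
)

PREFIX = "Maintainer: "


def naive_maintainer(body: bytes) -> str:
    """Read the field from the first matching line, without any decoding."""
    for line in body.decode("ascii").splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):]
    return ""


def decoded_maintainer(body: bytes) -> str:
    """Undo the quoted-printable encoding first, then read the field."""
    return naive_maintainer(quopri.decodestring(body))


if __name__ == "__main__":
    print(naive_maintainer(RAW_BODY))    # value cut off at the soft line break
    print(decoded_maintainer(RAW_BODY))  # value with the line rejoined
```

Rows produced by the first function would match the LIKE '%=' pattern
used in the queries in this report; rows produced by the second would not.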
The remaining cases all date from 2022-08-27 or later, which coincides
with dak adding a detached signature, so there might still be something
to fix in the code for that case.
udd=> select source, version, date from upload_history where maintainer_email LIKE '%=' order by date asc limit 10;
source | version | date
----------------------------+---------------+------------------------
libsweble-common-java | 3.0.8-3 | 2022-08-27 20:49:34+00
xeus | 2.4.0-2 | 2022-08-27 20:49:43+00
systemd | 251.4-3 | 2022-08-27 22:05:51+00
cross-toolchain-base-ports | 53 | 2022-08-28 10:04:10+00
opencascade | 7.6.3+dfsg1-3 | 2022-08-28 10:36:28+00
wvkbd | 0.10-1 | 2022-08-28 10:36:40+00
gobject-introspection | 1.73.0+ds-1 | 2022-08-28 10:49:10+00
yade | 2022.01a-11 | 2022-08-28 11:05:40+00
ruby-em-http-request | 1.1.7-1 | 2022-08-28 12:29:29+00
ruby-rails-i18n | 7.0.5-1 | 2022-08-28 14:51:31+00
(10 rows)
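One possible explanation for the remaining cases, offered as an
assumption rather than a diagnosis: if the signed mails are multipart,
the quoted-printable encoding is declared on the text part rather than
on the top-level message, so decoding has to happen per part. A minimal
sketch using Python's standard email module follows; the sample message,
the boundary and the function names are hypothetical, and this is not
the UDD code.

```python
from email import message_from_bytes

# Hypothetical signed mail: a quoted-printable text part followed by a
# detached signature part. The real layout of dak's mails may differ.
RAW_MAIL = (
    b"From: sender@example.org\n"
    b"Subject: Accepted example 1.0-1 (source) into unstable\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/signed; protocol="application/pgp-signature"; boundary="b1"\n'
    b"\n"
    b"--b1\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Source: example\n"
    b"Maintainer: Example Packaging Team <pkg-example-maintainers@lists.example.o=\n"
    b"rg>\n"
    b"\n"
    b"--b1\n"
    b'Content-Type: application/pgp-signature; name="signature.asc"\n'
    b"\n"
    b"(signature placeholder)\n"
    b"--b1--\n"
)


def body_text(raw: bytes) -> str:
    """Return the first text/plain part, decoded per its own headers."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True applies the part's Content-Transfer-Encoding.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def maintainer_field(raw: bytes) -> str:
    """Extract the Maintainer value from the decoded text part."""
    for line in body_text(raw).splitlines():
        if line.startswith("Maintainer: "):
            return line.split(": ", 1)[1]
    return ""


if __name__ == "__main__":
    print(maintainer_field(RAW_MAIL))
```

Walking the parts keeps the signature attachment out of the parsed text
and leaves the choice of decoder to each part's own headers.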
Lucas