Bug#966649: marked as done (UDD: 'upload_history' importer broken; needs porting to Python3)



Your message dated Tue, 25 Aug 2020 22:52:56 -0700
with message-id <CAMumaChEOPvfjTO54pVnig432Sbr1RMgAs3A+xaN+=3fbq2Q=Q@mail.gmail.com>
and subject line upload_history is back
has caused the Debian Bug report #966649,
regarding UDD: 'upload_history' importer broken; needs porting to Python3
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
966649: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966649
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: qa.debian.org
User: qa.debian.org@packages.debian.org
Usertags: udd

Hi,

The upload_history importer works as follows:

1) /srv/udd.debian.org/email-archives/debian-devel-changes/ contains a copy
of the email archives, copied manually from master.debian.org. The
latest emails are received directly on ullmann, in /srv/udd.debian.org/email-archives/debian-devel-changes/debian-devel-changes.current.
This part is mostly OK, but it would be better if DSA provided a way to
access those archives from ullmann without having to copy them manually
from time to time.

2) When started, the importer first runs 'make' in /srv/udd.debian.org/upload-history/. This:
2.1) updates local copies of keyrings
2.2) using 'munge_ddc.py', converts email archives into summarized versions, stored as, e.g.:
/srv/udd.debian.org/upload-history/debian-devel-changes.201209.gz.out

3) Then the importer reads the *.out files and imports them into postgres.
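To make step 2.2 more concrete, here is a rough Python 3 sketch of summarizing an mbox into upload records. It is not munge_ddc.py (whose code is not shown here), and the real *.out format is not described, so this simply returns records whose fields mirror the date/source/version columns of the upload_history table. It assumes each debian-devel-changes message body carries the .changes fields "Source:", "Version:" and "Date:" at the start of a line; the names Upload, split_mbox, summarize and summarize_mbox are hypothetical.

```python
import email
import email.utils
import re
from email.message import Message
from typing import Iterator, List, NamedTuple, Optional


class Upload(NamedTuple):
    """One summarized upload (hypothetical type; fields mirror upload_history)."""
    date: str
    source: str
    version: str


# Assumption: the .changes fields appear at the start of lines in the body.
FIELD_RE = re.compile(r"^(Source|Version|Date):[ \t]*(\S.*)$", re.MULTILINE)


def split_mbox(data: bytes) -> Iterator[bytes]:
    """Yield raw messages, splitting on mbox 'From ' separator lines."""
    current: List[bytes] = []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            yield b"".join(current)
            current = []
        current.append(line)
    if current:
        yield b"".join(current)


def body_text(msg: Message) -> str:
    """Return the first text/plain part of a message, decoded to str."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            try:
                return payload.decode(part.get_content_charset() or "utf-8", errors="replace")
            except LookupError:  # unknown charset name: fall back losslessly
                return payload.decode("latin-1")
    return ""


def summarize(raw: bytes) -> Optional[Upload]:
    """Extract (date, source, version) from one message, or None if absent."""
    if raw.startswith(b"From "):
        _, _, raw = raw.partition(b"\n")  # drop the mbox separator line
    fields = {}
    for name, value in FIELD_RE.findall(body_text(email.message_from_bytes(raw))):
        fields.setdefault(name, value.strip())  # keep the first occurrence
    if not all(key in fields for key in ("Source", "Version", "Date")):
        return None
    try:
        when = email.utils.parsedate_to_datetime(fields["Date"])
    except (TypeError, ValueError):  # unparseable Date: field
        return None
    return Upload(when.isoformat(), fields["Source"], fields["Version"])


def summarize_mbox(data: bytes) -> List[Upload]:
    """Summarize every upload announcement found in a raw mbox."""
    return [u for u in map(summarize, split_mbox(data)) if u is not None]
```

Working on bytes rather than str keeps 8-bit mail in mixed charsets intact until a single part is decoded, and messages without the three fields are skipped rather than treated as errors.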

'munge_ddc.py' has the following issues:
- it's not version-controlled
- it doesn't support xz email archives, so it's broken for recent
  archives
- it's Python 2 (but the standard-library lzma module, needed for xz, is Python 3-only)
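Because Python 3 ships gzip and lzma in its standard library, a port could open compressed and uncompressed archives (such as debian-devel-changes.current) through a single helper. This is a minimal sketch, assuming detection by the files' leading magic bytes rather than by file name; open_archive is a hypothetical helper, not part of munge_ddc.py.

```python
import gzip
import lzma
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"        # first two bytes of any gzip stream
XZ_MAGIC = b"\xfd7zXZ\x00"      # six-byte xz stream header


def open_archive(path: str) -> BinaryIO:
    """Open a mail archive for binary reading, transparently decompressing
    gzip or xz based on the file's leading magic bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(XZ_MAGIC):
        return lzma.open(path, "rb")
    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, "rb")
    return open(path, "rb")  # plain, uncompressed mbox
```

Returning a binary stream leaves charset handling to the email parser (e.g. email.message_from_bytes), and sniffing magic bytes means an archive whose name does not match its compression still opens correctly.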

Help porting it to Python 3 and resolving the other issues would be
welcome. Also, the data files around the upload_history gatherer should
probably be reorganized with a cleaner separation between code (which
should be versioned in UDD) and data.

Lucas

--- End Message ---
--- Begin Message ---
Thanks to Lucas for reviewing, merging, and QA-ing UDD merge request https://salsa.debian.org/qa/udd/-/merge_requests/26 and a few follow-up pushes and merge requests.

I went with the approach Lucas suggested, where we still read the mbox files that were mentioned at the start of the bug.

Test it yourself with, e.g., this command (it queries the public UDD mirror; you can use the real UDD instead if you can connect to ullmann.debian.org):

$ echo 'select date,source,version from upload_history order by date desc limit 5;' | psql "postgresql://udd-mirror:udd-mirror@udd-mirror.debian.net/udd"
          date          |       source        |     version      
------------------------+---------------------+------------------
 2020-08-26 00:33:28+00 | folding-mode-el     | 0+20200825.748-1
 2020-08-25 23:50:23+00 | supervisor          | 4.2.1-1
 2020-08-25 23:21:29+00 | php-doctrine-bundle | 2.1.2-1
 2020-08-25 23:21:09+00 | firefox-esr         | 68.12.0esr-1
 2020-08-25 22:33:30+00 | pandas              | 1.0.5+dfsg-1
(5 rows)

--- End Message ---