Bug#966649: Request for feedback on upload_history re-implementation
Hi Asheesh,
I'm currently testing your code from the git repository. Interestingly,
it also covers the future ;-) :
...
Computed upload history for 2020-05
Computed upload history for 2020-06
Computed upload history for 2020-07
Computed upload history for 2020-08
Computed upload history for 2020-09
Computed upload history for 2020-10
Computed upload history for 2020-11
Computed upload history for 2020-12
Computed upload history for 2021-01
Computed upload history for 2021-02
Computed upload history for 2021-03
Computed upload history for 2021-04
Computed upload history for 2021-05
Computed upload history for 2021-06
Computed upload history for 2021-07
Computed upload history for 2021-08
Computed upload history for 2021-09
Computed upload history for 2021-10
Computed upload history for 2021-11
Computed upload history for 2021-12
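The runs for months after the current date could be avoided by capping the month loop at today's month. Below is a minimal sketch. `months_until` is a hypothetical helper and is not part of the extractor's code.

```python
from datetime import date


def months_until(start: date, today: date) -> list[str]:
    """Return 'YYYY-MM' strings from start's month through today's month, inclusive.

    Hypothetical helper: the extractor's real month loop may look different.
    """
    months = []
    year, month = start.year, start.month
    # Stop at the current month instead of running on into the future.
    while (year, month) <= (today.year, today.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month == 13:  # roll over into January of the next year
            year, month = year + 1, 1
    return months
```

For example, `months_until(date(2020, 5, 1), date(2020, 8, 19))` returns `['2020-05', '2020-06', '2020-07', '2020-08']`.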
At first glance the result looks sensible:
sqlite> select * from upload_history where maintainer like '%debian-med-packaging%' limit 2 ;
E1JAWxz-000605-6N@ries.debian.org|1199391582|gnumed-client|0.2.8.1-1|Andreas Tille <tille@debian.org>|Andreas Tille|tille@debian.org|Debian-Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>|Debian-Med Packaging Team|debian-med-packaging@lists.alioth.debian.org|0|
gnumed-client (0.2.8.1-1) unstable; urgency=low
.
* New upstream version
E1JApSm-0006Xr-2E@ries.debian.org|1199462003|probcons|1.12-4|Charles Plessy <charles-debian-nospam@plessy.org>|Charles Plessy|charles-debian-nospam@plessy.org|Debian-Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>|Debian-Med Packaging Team|debian-med-packaging@lists.alioth.debian.org|0|
probcons (1.12-4) unstable; urgency=low
.
- Allowed upload by Debian Maintainers.
- Checked the compliance with Policy 3.7.3
* debian/patches:
- swiched to quilt
- added a fix to build with GCC 4.3 (Closes: #455625)
* debian/rules:
- modify Main-RNA.cc so that it uses Defaults-RNA.h (Closes: #458926)
* debian/copyright:
- converted to machine-readable format.
.
[ David Paleino ]
* debian/probcons.1, debian/probcons-RNA.1, debian/pc-compare.1,
debian/pc-makegnuplot.1, debian/pc-project.1 added - these
have been statically built.
* debian/control:
- B-D updated
- added myself to Uploaders
* debian/rules:
- manpages statically built
- minor changes
But I guess you consider this table to be partly in a debugging state;
otherwise I do not see a good reason to store the full changelog
paragraph. You are also storing message_id. That's OK from a data
consumption point of view, but I do not see any real use for this field
at the moment.
I would love to see the same table structure as in UDD:
source | version | date | changed_by | changed_by_name | changed_by_email | maintainer | maintainer_name | maintainer_email | nmu | signed_by | signed_by_name | signed_by_email | key_id | distribution | file | fingerprint
What I'm missing is signed_by*. I have no idea what key_id means because
I never used it. Distribution might be good to have as well. I have no
idea what file might have contained. Fingerprint also seems sensible
because it could serve as a link to the carnivore table.
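For comparison, here is a sketch of the UDD column layout above as a SQLite table, built with Python's standard sqlite3 module. The column names are copied from the list above. The SQL types are assumptions for illustration: TEXT for a date stored as an ISO-8601 string and BOOLEAN for nmu. They are not necessarily the types UDD uses.

```python
import sqlite3

# Column names copied from the UDD upload_history layout quoted above.
# The SQL types are illustrative assumptions, not UDD's actual definitions.
UDD_COLUMNS = [
    ("source", "TEXT"),
    ("version", "TEXT"),
    ("date", "TEXT"),  # assumed: ISO-8601 datetime string
    ("changed_by", "TEXT"),
    ("changed_by_name", "TEXT"),
    ("changed_by_email", "TEXT"),
    ("maintainer", "TEXT"),
    ("maintainer_name", "TEXT"),
    ("maintainer_email", "TEXT"),
    ("nmu", "BOOLEAN"),
    ("signed_by", "TEXT"),
    ("signed_by_name", "TEXT"),
    ("signed_by_email", "TEXT"),
    ("key_id", "TEXT"),
    ("distribution", "TEXT"),
    ("file", "TEXT"),
    ("fingerprint", "TEXT"),  # could be joined against carnivore data
]


def create_upload_history(conn: sqlite3.Connection) -> None:
    """Create an upload_history table with the UDD-style columns."""
    cols = ",\n    ".join(f"{name} {typ}" for name, typ in UDD_COLUMNS)
    conn.execute(f"CREATE TABLE upload_history (\n    {cols}\n)")


conn = sqlite3.connect(":memory:")
create_upload_history(conn)
# PRAGMA table_info returns one row per column; index 1 is the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(upload_history)")]
```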
Regarding the decision to parse the web archives rather than mboxes: I
don't know which is better. I agree that accessing public data is an
advantage, but if it comes at the expense of more complex code I would
rather stick to the mbox parsing.
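For reference, here is a minimal sketch of the mbox route, using Python's standard mailbox and email.utils modules. It assumes that each debian-devel-changes message body carries the .changes fields (Source, Version, Distribution, Changed-By, Maintainer) as plain "Field: value" lines. The helper names are hypothetical and are not taken from the extractor. Splitting "Name <email>" with parseaddr produces the *_name/*_email columns of the UDD layout.

```python
import mailbox
from email.utils import parseaddr

# .changes fields that map onto upload_history columns (assumed subset).
WANTED = ("Source", "Version", "Distribution", "Changed-By", "Maintainer")


def parse_changes_fields(body: str) -> dict:
    """Extract top-level 'Field: value' lines from a .changes body.

    Simplification: continuation lines (leading whitespace) are skipped,
    and a PGP-signed body is scanned as plain text without verification.
    """
    fields = {}
    for line in body.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if key in WANTED and key not in fields:
            fields[key] = value.strip()
    return fields


def to_row(fields: dict) -> dict:
    """Map parsed fields onto UDD-style columns, splitting 'Name <email>'."""
    row = {
        "source": fields.get("Source"),
        "version": fields.get("Version"),
        "distribution": fields.get("Distribution"),
    }
    for field, col in (("Changed-By", "changed_by"), ("Maintainer", "maintainer")):
        raw = fields.get(field, "")
        name, email = parseaddr(raw)
        row[col] = raw
        row[col + "_name"] = name
        row[col + "_email"] = email
    return row


def rows_from_mbox(path: str):
    """Yield one row per message in a debian-devel-changes mbox file."""
    for msg in mailbox.mbox(path):
        payload = msg.get_payload(decode=True)
        if payload is None:  # multipart message: skipped in this sketch
            continue
        body = payload.decode("utf-8", errors="replace")
        yield to_row(parse_changes_fields(body))
```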
BTW, the data formerly went back to at least 2000. Here is the graph
for pkg-perl:
http://blends.debian.net/liststats/uploaders_pkg-perl.png
Currently you encode the date as an integer in sqlite, so I need to think
about how to translate this. For the query I want to run for my talk, it
would be convenient to have date or datetime values.
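This sketch assumes the integer is a Unix timestamp (seconds since 1970-01-01 UTC), and the sample values above are consistent with that. Under that assumption, SQLite's built-in date functions can convert the value at query time, so the stored column does not have to change. The sketch uses Python's sqlite3 module and the two sample rows shown earlier, reduced to three columns for brevity.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upload_history (source TEXT, version TEXT, date INTEGER)")
conn.executemany(
    "INSERT INTO upload_history VALUES (?, ?, ?)",
    [("gnumed-client", "0.2.8.1-1", 1199391582), ("probcons", "1.12-4", 1199462003)],
)

# Option 1: convert at query time. The 'unixepoch' modifier tells
# datetime() that the integer is seconds since 1970-01-01 UTC.
rows = conn.execute(
    "SELECT source, datetime(date, 'unixepoch') FROM upload_history ORDER BY date"
).fetchall()

# Monthly buckets, e.g. for per-month upload counts.
per_month = conn.execute(
    "SELECT strftime('%Y-%m', date, 'unixepoch') AS month, count(*) "
    "FROM upload_history GROUP BY month"
).fetchall()


# Option 2: convert in Python, keeping the value timezone-aware (UTC).
def to_datetime(ts: int) -> datetime:
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

With the sample rows this yields 2008-01-03 20:19:42 and 2008-01-04 15:53:23 (UTC). Converting at query time keeps the compact integer, which sorts and compares cheaply, while still giving date values for grouping.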
So much for my review.
Thanks a lot for your work on this. It's really appreciated!
Kind regards
Andreas.
On Wed, Aug 19, 2020 at 11:03:40PM -0700, Asheesh Laroia wrote:
> Hi Andreas & Lucas & all,
>
> Lucas -- I'm making progress on re-implementing this. I'd love your input
> by email or IRC about my approach, but if you're busy, feel free to ignore
> this and I'll mention you again when I submit a patch.
>
> Andreas -- The codebase at
> https://github.com/paulproteus/debian-devel-changes-history-extractor can
> be run on your system and generate an "upload_history" table. Would you be
> willing to try it out and let me know if it meets your needs?
>
> The README at the URL above has some information about how to use it.
>
> https://drive.google.com/drive/folders/1hF_zuc_03m3a_VwOO5hpjp5vETNjVxMx?usp=sharing
> is a Google Drive folder (owned by me) which contains an
> upload_history.sqlite file you can use. This would allow you to query the
> current database without using the code to create it. (Feel free to also
> use the code to create your own DB.)
>
> I'm happy to discuss by IRC or private email or BTS email what you would
> need next. I do hope to resolve the issues listed in the bug tracker on
> GitHub, but I haven't yet, and feedback will help me prioritize.
>
> Per the info in the README, I'd like to get this merged into UDD in the
> long run, and I would be happy to discuss the best way to do so.
> There are a few issues I want to fix before formally submitting it -- see
> https://github.com/paulproteus/debian-devel-changes-history-extractor/issues
> for a list.
>
> Cheers,
>
> Asheesh.
--
http://fam-tille.de