
Bug#966649: Request for feedback on upload_history re-implementation



Hi Asheesh,

I'm currently testing your code from the git repository.  Interestingly,
it also computes results for months in the future ;-) :

...
Computed upload history for 2020-05
Computed upload history for 2020-06
Computed upload history for 2020-07
Computed upload history for 2020-08
Computed upload history for 2020-09
Computed upload history for 2020-10
Computed upload history for 2020-11
Computed upload history for 2020-12
Computed upload history for 2021-01
Computed upload history for 2021-02
Computed upload history for 2021-03
Computed upload history for 2021-04
Computed upload history for 2021-05
Computed upload history for 2021-06
Computed upload history for 2021-07
Computed upload history for 2021-08
Computed upload history for 2021-09
Computed upload history for 2021-10
Computed upload history for 2021-11
Computed upload history for 2021-12
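
The log above suggests the extractor loops over months past the current
date.  A minimal sketch of one way to stop at the current month follows;
it assumes the tool iterates over "YYYY-MM" labels, and the helper name
months_up_to is hypothetical (it is not taken from the repository).

```python
from datetime import date


def months_up_to(start_year, start_month, today=None):
    """Yield 'YYYY-MM' labels from the start month through the current month.

    Hypothetical helper: `today` can be passed in so the cut-off is testable.
    """
    today = today or date.today()
    year, month = start_year, start_month
    # Tuple comparison orders (year, month) pairs chronologically.
    while (year, month) <= (today.year, today.month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Example: with a reference date in August 2020, nothing after 2020-08 is produced.
months = list(months_up_to(2020, 5, today=date(2020, 8, 20)))
print(months)
```

Passing the reference date as a parameter keeps the cut-off explicit and
makes the behaviour easy to check without depending on the system clock.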


At first glance the result looks sensible:

sqlite> select * from upload_history where maintainer like '%debian-med-packaging%' limit 2 ;
E1JAWxz-000605-6N@ries.debian.org|1199391582|gnumed-client|0.2.8.1-1|Andreas Tille <tille@debian.org>|Andreas Tille|tille@debian.org|Debian-Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>|Debian-Med Packaging Team|debian-med-packaging@lists.alioth.debian.org|0|
 gnumed-client (0.2.8.1-1) unstable; urgency=low
 .
   * New upstream version
E1JApSm-0006Xr-2E@ries.debian.org|1199462003|probcons|1.12-4|Charles Plessy <charles-debian-nospam@plessy.org>|Charles Plessy|charles-debian-nospam@plessy.org|Debian-Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>|Debian-Med Packaging Team|debian-med-packaging@lists.alioth.debian.org|0|
 probcons (1.12-4) unstable; urgency=low
 .
     - Allowed upload by Debian Maintainers.
     - Checked the compliance with Policy 3.7.3
   * debian/patches:
     - swiched to quilt
     - added a fix to build with GCC 4.3 (Closes: #455625)
   * debian/rules:
     - modify Main-RNA.cc so that it uses Defaults-RNA.h (Closes: #458926)
   * debian/copyright:
     - converted to machine-readable format.
 .
   [ David Paleino ]
   * debian/probcons.1, debian/probcons-RNA.1, debian/pc-compare.1,
     debian/pc-makegnuplot.1, debian/pc-project.1 added - these
     have been statically built.
   * debian/control:
     - B-D updated
     - added myself to Uploaders
   * debian/rules:
     - manpages statically built
     - minor changes

But I guess you consider this table to be partly in a debugging state;
otherwise I do not see a good reason to store the full changelog
paragraph.  You are also storing message_id.  That's fine from a data
consumption point of view, but I do not see any real use for this field
at the moment.

I would love to see the same table structure as in UDD:

   source | version | date | changed_by | changed_by_name | changed_by_email | maintainer | maintainer_name | maintainer_email | nmu | signed_by | signed_by_name | signed_by_email | key_id | distribution | file | fingerprint

What I'm missing are the signed_by* fields.  I have no idea what key_id
means; I have never used it.  Distribution might be good to have as
well, but I have no idea what file might have contained.  Fingerprint
also seems sensible since it could serve as a link to the carnivore
table.


Regarding the decision to parse the web archives rather than mboxes: I
don't know what is better.  I agree that accessing public data is an
advantage but if it is at the expense of more complex code I would
rather stick to the mbox parsing.

BTW, formerly the data went at least back to 2000.  Here is the graph
for pkg-perl:

   http://blends.debian.net/liststats/uploaders_pkg-perl.png

Currently you encode the date as an integer in sqlite, so I need to think
about how to translate this.  For the query I want to run for my talk it
would be more convenient to have date or datetime values.
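
A minimal sketch of one possible translation, assuming the integers are
Unix timestamps in seconds (the two sample rows above are consistent
with uploads in January 2008) and assuming the column is called "date".
The reduced three-column table below is only for illustration; the real
upload_history table has more fields.

```python
import sqlite3

# In-memory stand-in for upload_history; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE upload_history (source TEXT, version TEXT, date INTEGER)"
)
conn.executemany(
    "INSERT INTO upload_history VALUES (?, ?, ?)",
    [
        ("gnumed-client", "0.2.8.1-1", 1199391582),
        ("probcons", "1.12-4", 1199462003),
    ],
)

# SQLite's datetime() with the 'unixepoch' modifier renders epoch seconds
# as a 'YYYY-MM-DD HH:MM:SS' string in UTC.
rows = conn.execute(
    "SELECT source, version, datetime(date, 'unixepoch') AS uploaded_at "
    "FROM upload_history ORDER BY date"
).fetchall()

for row in rows:
    print(row)
```

With the sample values this prints 2008-01-03 20:19:42 for gnumed-client
and 2008-01-04 15:53:23 for probcons.  The same datetime(...) expression
could be wrapped in a view so that queries see a datetime column without
changing the stored data.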

So much for my review.

Thanks a lot for your work on this.  It's really appreciated!

Kind regards

      Andreas.

On Wed, Aug 19, 2020 at 11:03:40PM -0700, Asheesh Laroia wrote:
> Hi Andreas & Lucas & all,
> 
> Lucas -- I'm making progress on re-implementing this. I'd love your input
> by email or IRC about my approach, but if you're busy, feel free to ignore
> this and I'll mention you again when I submit a patch.
> 
> Andreas -- The codebase at
> https://github.com/paulproteus/debian-devel-changes-history-extractor can
> be run on your system and generate a "upload_history" table. Would you be
> willing to try it out and let me know if it meets your needs?
> 
> The README at the URL above has some information about how to use it.
> 
> https://drive.google.com/drive/folders/1hF_zuc_03m3a_VwOO5hpjp5vETNjVxMx?usp=sharing
> is a Google Drive folder (owned by me) which contains an
> upload_history.sqlite file you can use. This would allow you to query the
> current database without using the code to create it. (Feel free to also
> use the code to create your own DB.)
> 
> I'm happy to discuss by IRC or private email or BTS email what you would
> need next. I do hope to resolve the issues listed in the bug tracker on
> GitHub, but I haven't yet, and feedback will help me prioritize.
> 
> Per the info in the README, I'd like to get this merged into UDD in the
> long run, and be happy to have a discussion about the best way to do so.
> There are a few issues I want to fix before formally submitting it -- see
> https://github.com/paulproteus/debian-devel-changes-history-extractor/issues
> for a list.
> 
> Cheers,
> 
> Asheesh.

-- 
http://fam-tille.de

