Re: [UDD] Fixing (most) email addresses in upload_history table



On 22/01/11 at 15:19 +0100, Andreas Tille wrote:
> Hi,
> 
> I was asked to do some investigation into who is active in maintainer
> teams and thus I looked deeper into upload_history to find out who is
> responsible for the changelog entries in the uploaded packages.  When
> doing so I noticed that a lot of email addresses in this table are
> broken.  This basically concerns:
> 
> udd# SELECT DISTINCT changed_by, changed_by_name, changed_by_email,
>        regexp_replace(changed_by, E'^[^<]\+<\([.\\w]\+@[.\\w]\+\)>.*', E'\\1') AS changed_by_email_new,
>        regexp_replace(changed_by, E'^[^\\w]*\([^<]\+[.\\w]\) *<[.\\w]\+@[.\\w]\+>.*', E'\\1') AS changed_by_name_new
>      FROM upload_history
>      WHERE changed_by_email NOT LIKE '%@%'
>        AND changed_by       LIKE '%<%@%.%>%'
>      ;
>                  changed_by                  | changed_by_name | changed_by_email |  changed_by_email_new  |  changed_by_name_new  
> ---------------------------------------------+-----------------+------------------+------------------------+-----------------------
>  Thomas Bushnell, BSG <tb@becket.net>        |                 | Thomas           | tb@becket.net          | Thomas Bushnell, BSG
>  Ivan, Wong Yat Cheung <email@ivanwong.info> |                 | Ivan             | email@ivanwong.info    | Ivan, Wong Yat Cheung
>  Thomas Bushnell, BSG <tb@debian.org>        |                 | Thomas           | tb@debian.org          | Thomas Bushnell, BSG
>  Adam C. Powell, IV <hazelsct@debian.org>    |                 | Adam             | hazelsct@debian.org    | Adam C. Powell, IV
>  RISKO, Gergely <risko@debian.org>           |                 | RISKO            | risko@debian.org       | RISKO, Gergely
>  Eric Gillespie, Jr. <epg@progeny.com>       |                 | Eric             | epg@progeny.com        | Eric Gillespie, Jr.
>  TransNexus, Inc. <support@transnexus.com>   |                 | TransNexus       | support@transnexus.com | TransNexus, Inc.
>  Eric Gillespie, Jr. <epg@debian.org>        |                 | Eric             | epg@debian.org         | Eric Gillespie, Jr.
>  John H. Robinson, IV <jaqque@debian.org>    |                 | John             | jaqque@debian.org      | John H. Robinson, IV
>  $"Jaldhar H. Vyas <jaldhar@debian.org>"     |                 | $                | jaldhar@debian.org     | Jaldhar H. Vyas
> 
> And it turns out that basically the comma (',') in the maintainer name
> has caused the parser to fail.  UDD contains more than 700 entries of
> this type (here hidden by the DISTINCT).  I'm quite interested in
> getting this fixed - at least the email field -  because otherwise I
> have no good chance to reliably get the stats for the Debian Science Team
> (Adam C. Powell is quite active here and he would simply vanish from
> the stats if this is not fixed).
> 
> The question is now:  As you can see, there is a quite simple fix with
> regular expressions which can probably be ported easily to the importer
> code and which helps reliably in > 700 cases.  There are four cases which
> cannot be caught by this means (see bottom of the mail), but this would get
> the vast majority of broken but easy-to-guess e-mail addresses fixed.
> 
> Is there anything wrong with my argument that this should be fixed?  If
> not, I will provide a patch for the importer.

Yes, please fix this in the importer.
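
For illustration only, here is a minimal sketch of how the two regular
expressions from the query above might be ported, assuming the importer
is written in Python (the thread does not say so).  The names EMAIL_RE,
NAME_RE and split_changed_by are hypothetical and not taken from the UDD
code; the patterns are the ones from the query with the E'' string
escaping removed.

```python
import re

# Same patterns as in the SQL query, without the PostgreSQL E'' escaping.
# Hypothetical names; not taken from the actual UDD importer.
EMAIL_RE = re.compile(r'^[^<]+<([.\w]+@[.\w]+)>.*')
NAME_RE = re.compile(r'^[^\w]*([^<]+[.\w]) *<[.\w]+@[.\w]+>.*')


def split_changed_by(changed_by):
    """Return (name, email) from a Changed-By value, or None if no match.

    Like the SQL version, this only accepts addresses made of word
    characters and dots, so e.g. an address containing '-' is not matched.
    """
    email = EMAIL_RE.match(changed_by)
    name = NAME_RE.match(changed_by)
    if email is None or name is None:
        return None
    return name.group(1), email.group(1)


if __name__ == "__main__":
    for value in ('Adam C. Powell, IV <hazelsct@debian.org>',
                  '$"Jaldhar H. Vyas <jaldhar@debian.org>"'):
        print(split_changed_by(value))
```

Returning None for anything the patterns do not match leaves those rows
for manual inspection rather than guessing at an address.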

L.

