Re: UDD contains names where spaces and quotes are not stripped
On Thu, Dec 07, 2023 at 08:36:12PM +0100, Lucas Nussbaum wrote:
> On 07/12/23 at 20:24 +0100, Andreas Tille wrote:
> > On Thu, Dec 07, 2023 at 07:59:38PM +0100, Lucas Nussbaum wrote:
> > > On 07/12/23 at 09:58 +0100, Andreas Tille wrote:
> > > >
> > > > udd=> select '"' || u.name || '"' as name_with_spaces, uploader from uploaders u where name like '% ' or name like ' %' ;
> > > > name_with_spaces | uploader
> > > > --------------------------+-------------------------------------------
> > > > " Mehdi Dogguy" | Mehdi Dogguy <mehdi@debian.org>
> > > > " David Paleino" | David Paleino <dapal@debian.org>
> > > > " Stéphane Glondu" | Stéphane Glondu <glondu@debian.org>
> > > > " Stefano Zacchiroli" | Stefano Zacchiroli <zack@debian.org>
> > > > " Stefano Zacchiroli" | Stefano Zacchiroli <zack@debian.org>
> > > > " Stefano Zacchiroli" | Stefano Zacchiroli <zack@debian.org>
> > > > " Stefano Zacchiroli" | Stefano Zacchiroli <zack@debian.org>
> > > > " Stefano Zacchiroli" | Stefano Zacchiroli <zack@debian.org>
> > > > "Andreas Tille " | Andreas Tille <tille@debian.org>
> > > > " LI Daobing" | LI Daobing <lidaobing@debian.org>
> > > > " David Paleino" | David Paleino <dapal@debian.org>
> > > > " Stefano Zacchiroli" | Stefano Zacchiroli <zack@debian.org>
> > > > " Nikita V. Youshchenko" | Nikita V. Youshchenko <yoush@debian.org>
> > > > " Nikita V. Youshchenko" | Nikita V. Youshchenko <yoush@debian.org>
> > > > " Nikita V. Youshchenko" | Nikita V. Youshchenko <yoush@debian.org>
> > > > " Nikita V. Youshchenko" | Nikita V. Youshchenko <yoush@debian.org>
> > > > " Nikita V. Youshchenko" | Nikita V. Youshchenko <yoush@debian.org>
> > > > "Colin Tuckley " | Colin Tuckley <colint@debian.org>
> > > > "Colin Tuckley " | Colin Tuckley <colint@debian.org>
> > > > "Colin Tuckley " | Colin Tuckley <colint@debian.org>
> > > > (20 rows)
> > > > ...
> > > > UPDATE uploaders SET name = trim(name), uploader = trim(name) || ' <' || email || '>' WHERE name like ' %' or name like '% ' ;
> > > >
> >
> >
> > BTW: I found
> >
> > udd=> SELECT count(*), name FROM (SELECT CASE WHEN changed_by_name = '' THEN maintainer_name ELSE changed_by_name END AS name FROM upload_history) uh WHERE name ilike '%tille%' group by name;
> > count | name
> > -------+---------------
> > 16524 | Andreas Tille
> > (1 row)
> >
> > So why do I have 8707 uploads in uploaders but 16524 in upload_history?
???
> > Is my assumption wrong that both values should match (modulo some misspelled
> > names)?
Could you please comment on these different results?
> If you look at the uploaders table, there are three columns:
> - 'uploader', which contains the raw data
> - 'name' and 'email' that contain the parsed (and trimmed) data
>
> udd=> select uploader, name, email, count(*) from uploaders where uploader ilike '%tille%' group by 1,2,3;
> uploader | name | email | count
> ------------------------------------+-----------------+------------------+-------
> Andreas Tille <tille@debian.org> | Andreas Tille | tille@debian.org | 8785
> Andreas Tille <andreas@an3as.eu> | Andreas Tille | andreas@an3as.eu | 1
> Andreas Tille <tille@debian.org> | Andreas Tille | tille@debian.org | 1
>
> So, just use name and/or email?
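To illustrate why grouping on the raw 'uploader' column can split one person into several rows while grouping on the parsed columns does not, here is a minimal Python sketch. The sample strings and their whitespace differences are hypothetical (psql does not show which invisible characters differ between the two "Andreas Tille <tille@debian.org>" rows above), and parseaddr() from Python's standard email.utils only stands in for whatever parser UDD actually uses.

```python
from collections import Counter
from email.utils import parseaddr

# Hypothetical raw 'uploader' values for one person. The whitespace variants
# are assumed for illustration only.
raw_uploaders = [
    "Andreas Tille <tille@debian.org>",
    "Andreas Tille <tille@debian.org> ",
    " Andreas Tille <tille@debian.org>",
]

# Grouping on the raw string keeps every variant in its own bucket.
by_raw = Counter(raw_uploaders)

# Grouping on the parsed (name, email) pair merges them, because parseaddr()
# discards the whitespace around the display name and the address.
by_parsed = Counter(parseaddr(value) for value in raw_uploaders)

for (name, email), count in by_parsed.items():
    print(f"{name} | {email} | {count}")
```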
Well, I am not looking for a workaround for this (non-)problem. I simply
think that not stripping spaces from values before injecting them into
UDD is wrong. I only stumbled upon this when I ran the query above.
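A minimal sketch of the normalization argued for here, assuming an importer written in Python that receives a comma-separated Uploaders value. The function name split_uploaders() is hypothetical and this is not UDD's actual gatherer code; it only shows that parsing each address, rather than splitting on ',' alone, yields trimmed name and email values from which the uploader string can be rebuilt, in the spirit of the UPDATE statement quoted at the top of this thread.

```python
from email.utils import getaddresses


def split_uploaders(field: str) -> list[tuple[str, str, str]]:
    """Split a comma-separated Uploaders value into trimmed rows.

    Hypothetical helper: returns (name, email, uploader) tuples where
    'uploader' is rebuilt from the cleaned parts, so the three columns agree.
    """
    rows = []
    for name, email in getaddresses([field]):
        if not email:
            continue  # skip empty or unparseable entries
        name = name.strip()  # defensive; getaddresses() already trims
        uploader = f"{name} <{email}>" if name else email
        rows.append((name, email, uploader))
    return rows


field = "Andreas Tille <tille@debian.org>, Mehdi Dogguy <mehdi@debian.org>"

# Splitting on ',' alone keeps the space after the comma ...
print(repr(field.split(",")[1]))
# ... whereas parsing each address yields trimmed values.
for row in split_uploaders(field):
    print(row)
```

Keeping the space after a comma is one plausible, unverified explanation for names such as " Mehdi Dogguy" in the first query of this thread.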
I stumbled upon another issue which might be even worse:
select distinct done, done_name, done_email, owner, owner_name, owner_email from archived_bugs where done_name like '%"%' or owner_name like '%"%' order by done_name;
done | done_name | done_email | owner | owner_name | owner_email
---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+-------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------+----------------------------------------------
<dererk@debian.org> | | dererk@debian.org | "vanecgs@gmail.com" <vanecgs@gmail.com> | "vanecgs@gmail.com" | vanecgs@gmail.com
<twerner@debian.org> | | twerner@debian.org | "Varun Hiremath" <varunhiremath@gmail.com> | "Varun Hiremath" | varunhiremath@gmail.com
alexander@belikoff.net (Alexander L. Belikoff) | | alexander@belikoff.net | "Alexander L. Belikoff" <alexander@belikoff.net> | "Alexander L. Belikoff" | alexander@belikoff.net
andi@debian.org (Andreas B. Mundt) | | andi@debian.org | "Andreas B. Mundt" <andi@debian.org> | "Andreas B. Mundt" | andi@debian.org
antoine.romain.dumont@gmail.com (Antoine R. Dumont (@ardumont)) | | antoine.romain.dumont@gmail.com | "Antoine R. Dumont" <antoine.romain.dumont@gmail.com> | "Antoine R. Dumont" | antoine.romain.dumont@gmail.com
antoine.romain.dumont@gmail.com (Antoine R. Dumont) | | antoine.romain.dumont@gmail.com | "Antoine R. Dumont" <antoine.romain.dumont@gmail.com> | "Antoine R. Dumont" | antoine.romain.dumont@gmail.com
arturcz@hell.pl (Artur R. Czechowski) | | arturcz@hell.pl | "Artur R. Czechowski" <arturcz@hell.pl> | "Artur R. Czechowski" | arturcz@hell.pl
...
We have lots of names, probably in more tables than just archived_bugs,
from which the '"' quotes are not stripped. The very same names can
always be found without the quotes in the same table. I think this is
similarly wrong and even more annoying than the spaces.
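The quoted names could be normalized in a similar way. A minimal sketch, assuming Python's standard email.utils rather than anything UDD currently does: parseaddr() drops the RFC 5322 quotes around a display name and, for the "address (Name)" form seen in the done column, returns the comment as the name; unquote() strips one surrounding pair of quotes from a value that has already been stored. The sample values are copied from the archived_bugs output above.

```python
from email.utils import parseaddr, unquote

# Raw values copied from the archived_bugs output above.
done = "alexander@belikoff.net (Alexander L. Belikoff)"
owner = '"Alexander L. Belikoff" <alexander@belikoff.net>'

# parseaddr() removes the double quotes around a display name ...
owner_name, owner_email = parseaddr(owner)
# ... and, for the legacy "address (Name)" form, uses the comment as the name.
# Treating the comment as a name is a heuristic, not something UDD does today.
done_name, done_email = parseaddr(done)

# For names already stored with quotes, unquote() strips one surrounding pair.
stored = unquote('"Varun Hiremath"')

print(owner_name, "|", done_name, "|", stored)
```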
I wonder where we could sensibly discuss issues like these, which I
consider bugs in UDD. Would it make sense to add a udd category to
`reportbug other`?
Kind regards
Andreas.
--
http://fam-tille.de