
Re: [UDD] Splitting email phrase and address in bugs.submitter ?



On 18/03/09 at 16:36 +0100, Olivier Berger wrote:
> Hi.
> 
> (I'm still not subscribed, as I'm mostly interested only in UDD at the
> moment, so please CC me on responses.)
> 
> To test the UDD-to-RDF export described earlier, I had to split the
> email address components in bugs.submitter so that I could extract the
> various properties of a foaf:Person, i.e. foaf:name and foaf:mbox.
> 
> I've done that by adding mail-splitting PL/Perl functions to PostgreSQL,
> used in a view:
> 
> # plperlu (untrusted PL/Perl) is needed to load Email::Address
> $ createlang plperlu UDD
> 
> CREATE FUNCTION email_phrase (text) RETURNS text AS $$
>   use Email::Address;
>   my $value = $_[0];
>   # keep only the first address found in the string
>   my $address = ( Email::Address->parse($value) )[0];
>   return $address ? $address->phrase : '';
> $$ LANGUAGE plperlu;
> 
> CREATE FUNCTION email_address (text) RETURNS text AS $$
>   use Email::Address;
>   my $value = $_[0];
>   # keep only the first address found in the string
>   my $address = ( Email::Address->parse($value) )[0];
>   return $address ? $address->address : '';
> $$ LANGUAGE plperlu;
> 
> Then:
> CREATE VIEW d2r_bugsubmitter AS
>   SELECT DISTINCT bugs.submitter,
>          email_phrase(bugs.submitter)  AS email_phrase,
>          email_address(bugs.submitter) AS email_address
>   FROM bugs;
> 
> Maybe this could be tuned so that such calculations are cached, since
> the functions' results are constant, but at the moment it runs really
> slowly, and considering there are more than 20000 bug reporters in
> UDD... it's not really efficient.
> 
> I'm no PostgreSQL expert, and maybe there are better ways to do that...
> but there's obviously at least the option of doing it as early as
> possible, i.e. while filling the UDD bugs table.
> 
> Do you think such a change could be made in UDD?

I would prefer to do that at import time, as is done for the sources
and uploaders tables (maintainer being split into maintainer_name and
maintainer_email).

The reason it is not done for the bugs importer is just that people have
been lazy :-) I would very much appreciate a patch.
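A minimal sketch of what such an import-time split could look like, in Perl like the functions quoted above. The field names submitter_name and submitter_email are hypothetical, chosen by analogy with maintainer_name/maintainer_email. The regex parser is a deliberately simplified stand-in for Email::Address: it only handles the `Name <addr>`, `"Name" <addr>`, `<addr>` and bare `addr` forms. Each distinct submitter string is parsed only once and cached in a hash, since many bugs share the same submitter.

```perl
use 5.010;
use strict;
use warnings;

# Simplified parser (assumption: a stand-in for Email::Address).
# Handles `Name <addr>`, `"Name" <addr>`, `<addr>` and a bare `addr`;
# anything else yields empty strings.
sub split_email {
    my ($raw) = @_;
    return ('', '') unless defined $raw;
    if ($raw =~ /^\s*"?(.*?)"?\s*<([^<>\s]+)>\s*$/) {
        return ($1, $2);    # (phrase, address)
    }
    if ($raw =~ /^\s*([^\s<>"]+@[^\s<>"]+)\s*$/) {
        return ('', $1);    # bare address, no phrase
    }
    return ('', '');
}

# Parse each distinct submitter string only once.
my %cache;
sub split_email_cached {
    my ($raw) = @_;
    return ('', '') unless defined $raw;
    $cache{$raw} //= [ split_email($raw) ];
    return @{ $cache{$raw} };
}

# Hypothetical rows as a bugs importer might see them.
my @bugs = (
    { id => 1, submitter => 'Jane Doe <jane@example.org>' },
    { id => 2, submitter => '"Jane Doe" <jane@example.org>' },
    { id => 3, submitter => 'john@example.net' },
);

# Fill the hypothetical submitter_name / submitter_email fields.
for my $bug (@bugs) {
    @{$bug}{qw(submitter_name submitter_email)} =
        split_email_cached($bug->{submitter});
}

printf "%d: name='%s' email='%s'\n",
    @{$_}{qw(id submitter_name submitter_email)} for @bugs;
```

In a real importer the two fields would be written to the bugs table next to submitter. Email::Address itself, as used in the PL/Perl functions, remains the more robust parser; the regex only keeps this sketch free of non-core modules.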

> Maybe also link to the carnivore-related tables for bug reporters who
> are already present in carnivore, then (once the email is split
> apart)?

Once the email is split, joining the data with carnivore is a simple
JOIN, so I don't think there's a need to duplicate the carnivore_id
into the bugs table. Also, that would break the "data from different
sources must not be inter-dependent in UDD" rule.
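To illustrate why no carnivore_id column is needed in bugs: once submitter emails are split out, matching them against carnivore is an ordinary inner join on the email. The sketch below performs that join in memory in Perl. The row data, the %carnivore_id_by_email mapping and the carnivore_emails(id, email) table named in the comment are assumptions for illustration, not a confirmed UDD schema.

```perl
use strict;
use warnings;

# Roughly equivalent SQL, assuming a split bugs.submitter_email column
# and a carnivore_emails(id, email) table (both assumptions):
#   SELECT b.id, c.id AS carnivore_id
#   FROM bugs b JOIN carnivore_emails c ON c.email = b.submitter_email;

# Hypothetical, already-split bug rows (as produced at import time).
my @bugs = (
    { id => 101, submitter_email => 'jane@example.org' },
    { id => 102, submitter_email => 'nobody@example.net' },
);

# Hypothetical stand-in for carnivore's email -> carnivore id mapping.
my %carnivore_id_by_email = (
    'jane@example.org' => 42,
);

# Inner join on the email: keep only bugs whose submitter email is known
# to carnivore, attaching the id to the result row only, so the bugs
# data itself never depends on carnivore.
sub join_with_carnivore {
    my ($bugs, $carnivore) = @_;
    return map {
        my $cid = $carnivore->{ $_->{submitter_email} };
        defined $cid ? +{ %$_, carnivore_id => $cid } : ();
    } @$bugs;
}

my @joined = join_with_carnivore(\@bugs, \%carnivore_id_by_email);
printf "bug %d -> carnivore %d\n", $_->{id}, $_->{carnivore_id} for @joined;
# prints "bug 101 -> carnivore 42"
```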
-- 
| Lucas Nussbaum
| lucas@lucas-nussbaum.net   http://www.lucas-nussbaum.net/ |
| jabber: lucas@nussbaum.fr             GPG: 1024D/023B3F4F |

