Re: [UDD] Splitting email phrase and address in bugs.submitter ?
On 18/03/09 at 16:36 +0100, Olivier Berger wrote:
> Hi.
>
> (I'm still not subscribed, as I'm mostly interested only in UDD at the
> moment, so please CC me on responses).
>
> In order to test the UDD to RDF export already described, I have had to
> split the email address components in bugs.submitter in order to be able
> to extract the various components of a foaf:Person, i.e. foaf:name and
> foaf:mbox.
>
> I've done that by adding mail-splitting plperl functions to postgres,
> used in a view :
>
> # this is needed for use of Email::Address in plperl
> $ createlang plperlu UDD
>
> CREATE FUNCTION email_phrase (text) RETURNS text AS $$
> use Email::Address;
> my $value = $_[0];
> my $address = ( Email::Address->parse($value) )[0];
> if ($address) {
> return $address->phrase;
> } else {return '';};
> $$ LANGUAGE plperlu;
>
> CREATE FUNCTION email_address (text) RETURNS text AS $$
> use Email::Address;
> my $value = $_[0];
> my $address = ( Email::Address->parse($value) )[0];
> if ($address) {
> return $address->address;
> } else {return '';};
> $$ LANGUAGE plperlu;
>
> Then :
> CREATE VIEW d2r_bugsubmitter AS
> SELECT DISTINCT bugs.submitter, email_phrase(bugs.submitter), email_address(bugs.submitter) FROM bugs
>
> Maybe this could be tuned so that such calculations are cached, as the
> functions' results are constant for a given input, but at the moment it
> runs really slowly, and considering there are > 20000 bug reporters in
> UDD... it's not really efficient.
>
> I'm no postgres expert, and maybe there would be better ways to do
> that... but obviously there's at least the possibility to do it as early
> as possible, i.e. during UDD bugs table filling.
>
> Do you think that such a change might be done on UDD ?
I would prefer to do that at import-time, as is done for the
sources and uploaders tables (where maintainer is split into
maintainer_name and maintainer_email).
The reason it is not done for the bugs importer is just that people have
been lazy :-) I would very much appreciate a patch.
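
As a rough illustration of what splitting at import time could look like,
here is a minimal sketch in Python using the standard library's
email.utils.parseaddr. It is not taken from the actual UDD bugs importer;
the function name split_submitter is hypothetical, and the empty-string
fallback simply mirrors the plperlu functions quoted above.

```python
from email.utils import parseaddr


def split_submitter(submitter):
    """Split a raw submitter string into (name, email).

    Hypothetical helper, not part of UDD. Both parts are empty strings
    when the value is empty or cannot be parsed, matching the behaviour
    of the email_phrase/email_address functions quoted above.
    """
    name, email = parseaddr(submitter or "")
    return name, email


# Example: the two values an importer could store in separate columns.
name, email = split_submitter("Olivier Berger <olivier@example.org>")
```

Doing this once per row while filling the bugs table avoids calling a
parsing function for every row each time the view is queried.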
> Maybe also linking to the carnivore-related tables for bug reporters who
> are already present in carnivore, then (once the email is split
> apart) ?
Once the email is split, joining the data with carnivore is a simple
JOIN, so I don't think that there's a need to duplicate the
carnivore_id into the bugs table. Also, that would break the "data from
different sources must not be inter-dependent in UDD" rule.
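
To make the "simple JOIN" concrete, here is a small self-contained sketch
using Python's sqlite3 module with an in-memory database. The table and
column names (bugs.submitter_email, carnivore_emails.email and
carnivore_emails.id) are assumptions made for the example, not the real
UDD schema, and SQLite stands in for PostgreSQL only so that the example
runs on its own.

```python
import sqlite3


def carnivore_ids_for_bugs(conn):
    """Return (bug id, submitter email, carnivore id or None) per bug.

    The carnivore id is looked up at query time with a JOIN on the email
    address, so nothing from carnivore is stored in the bugs table.
    """
    return conn.execute(
        """
        SELECT b.id, b.submitter_email, c.id
        FROM bugs AS b
        LEFT JOIN carnivore_emails AS c ON c.email = b.submitter_email
        ORDER BY b.id
        """
    ).fetchall()


# Toy data with hypothetical table layouts (not the real UDD schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bugs (id INTEGER PRIMARY KEY,
                       submitter_name TEXT, submitter_email TEXT);
    CREATE TABLE carnivore_emails (id INTEGER, email TEXT);
    INSERT INTO bugs VALUES (1, 'A Reporter', 'a@example.org');
    INSERT INTO bugs VALUES (2, 'B Reporter', 'b@example.org');
    INSERT INTO carnivore_emails VALUES (42, 'a@example.org');
    """
)
rows = carnivore_ids_for_bugs(conn)
```

A LEFT JOIN keeps reporters who are not known to carnivore; their
carnivore id simply comes back as NULL (None in Python).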
--
| Lucas Nussbaum
| lucas@lucas-nussbaum.net http://www.lucas-nussbaum.net/ |
| jabber: lucas@nussbaum.fr GPG: 1024D/023B3F4F |