
Re: Status bibref gatherer (Was: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically)



On Sat, Mar 10, 2012 at 02:37:43PM +0900, Charles Plessy wrote:
> I agreed to provide flat files in a way that they can be parsed by anybody,
> because I agree that such a repository has some value, especially since it
> includes the copyright files as well.

Fully ACK.

> But for feeding the UDD with the bibliographic information, I feel that your
> effort is redundant with my work.

Yes, there is some redundancy.  However, in this thread I tried to find
out your motivation for creating an intermediate database, and perhaps I
failed to understand your mails, but it did not become clear to me to
what extent this is helpful for a UDD importer.  When I realised that we
get duplicated keys from the intermediate file format and tried to track
down the reasons, debugging the problem became very opaque (read: close
to impossible for me).

> Most UDD tables are fed by downloading the
> data from other sites,

As far as I can see, most tables are fed from the most direct source:
packages, sources, bugs, ddtp, ftpnew and screenshots use the original
files, which are also used unchanged in other applications.  So I cannot
buy this argument.

> and while your approach is more direct in appearance,

Which is totally intended.

> I do not think that it is necessary.

As I said: to use a reasonable primary key, I felt that this is very
necessary.  Moreover, it is very simple to parse those YAML files, and
keeping things simple makes perfect sense to me.

> Moreover, it uses the files collected by
> umegaya, and therefore produces the same output, with the same limitations.

I'm afraid I have problems understanding this sentence.  Is umegaya, for
you, the effort to assemble debian/upstream files for every package, or
is it rather the creation of the database?  Moreover, I do not
understand which limitations you mean.

> ... 
> Also, your importer is inducing changes in the syntax of the files, that I
> would prefer to be discussed first.

I do not think that the importer is inducing any changes.  The reasons
for my changes were given in my last mail: they were basically fixing
misspellings and removing duplicates, as well as some aesthetic changes
(which are not needed but seemed quite reasonable to me).

> The debian/upstream files are a good
> medium to feed the UDD, but they can have other uses.  The more we take ad-hoc
> decisions with only the UDD in mind, the more we close doors for other uses.

What exact uses do you have in mind?

> For the references in particular, I think that we need to think twice before
> introducing arrays.  As I explained earlier, I would prefer that the fields
> remain flat.  With the array syntax, packages for which there is only one
> reference, and they are the majority, would need to use an array as well:
> 
>   Reference:
>    - Title: foo
>      Name: bar
> 
> instead of
> 
>   Reference:
>     Title: foo
>     Name: bar
> 
> or
> 
>   Reference-Title: foo
>   Reference-Name: bar
> 
> With the array syntax, I expect a lot of confusion and errors.  This could be
> solved as simply as keeping the original syntax for the main reference, and
> using an array for all the other ones.  I think that this example underlines
> that we need to discuss first before making choices.  Otherwise, it becomes
> your personal project.

I would like to stress that at first I resisted supporting more than one
reference.  However, at least two people (Laszlo and Yaroslav) preferred
allowing more than one reference, and there are practical use cases.  If
this were not the case, I would definitely stay away from making things
more complicated.
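For illustration, an importer does not necessarily have to force one
syntax on maintainers: it could accept all three forms quoted above
(array, single nested mapping, flat Reference-* fields) and normalise
them into a list.  A minimal sketch in Python, assuming the
debian/upstream file has already been loaded into a dict by a YAML
parser; the function name normalize_references is hypothetical and not
part of any existing importer:

```python
from typing import Any, Dict, List


def normalize_references(upstream: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return every reference as a list of flat dicts,
    whichever of the three discussed syntaxes the file uses."""
    refs = upstream.get("Reference")
    if isinstance(refs, list):
        # Array syntax: "Reference:" followed by "- Title: ..." items.
        return [dict(r) for r in refs if isinstance(r, dict)]
    if isinstance(refs, dict):
        # Single nested mapping: "Reference:" with indented "Title: ...".
        return [dict(refs)]
    # Flat syntax: "Reference-Title: ...", "Reference-Name: ...".
    prefix = "Reference-"
    flat = {k[len(prefix):]: v for k, v in upstream.items() if k.startswith(prefix)}
    return [flat] if flat else []


if __name__ == "__main__":
    print(normalize_references({"Reference": {"Title": "foo", "Name": "bar"}}))
```

A single reference then simply becomes a one-element list, so code
further down only ever has to handle one shape.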

> Regarding the use of hashes to represent complex field names, this is not a
> feature of YAML.

What hashes do you mean?

Kind regards

        Andreas.

-- 
http://fam-tille.de

