
Re: Status bibref gatherer (Was: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically)



Hi Andreas,

In the first round, when I wrote the UDD importer, you accepted the format I
proposed.  Writing this importer by myself took me countless hours, as I did
not know Python programming, and as the handling of Unicode in the UDD was not
so intuitive.  Then you found bugs and decided to throw everything in the bin.
But why don't you report and track the bugs instead?

This monster email thread with many point-to-point comments is an information
black hole.  This, plus the fact that there are only 24 hours in a day, is the
reason why some of the issues that matter to you are not fixed.  But that does
not mean that the only way to fix them is to do everything by yourself without
sharing your code and without communicating your design requirements.

If we want to achieve something together, we need to be more organised.  Let's
use at least a wiki page or the TODO file of the umegaya repository to keep
track of open problems.  Also if we keep using Umegaya (the engine behind
upstream-metadata.debian.net, which provides consolidated tables and the pool
of upstream and copyright files), I will upload the Debian package, so that we
have a bug tracker for free.

So far, we have on your side:

 - UDD tables must have a primary key.
 - Bibliographic data must support the loading of more than one reference.

On my side:

 - The syntax of debian/upstream is documented on the Debian wiki
   (http://wiki.debian.org/UpstreamMetadata) and changes must be discussed
   in advance.
 - The current syntax does not support complex structures such as arrays,
   and your way of loading bibliographic references is therefore not supported.
 - The use of YAML mappings (hashes) is a syntax hack that you made me
   deeply regret.  If you start to depend on them, you break the equivalence
   between Foo-Bar: Baz and Foo: {Bar: Baz}.  
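
To illustrate the equivalence above, here is a minimal sketch in Python.  It
assumes the file has already been parsed into nested dictionaries; the
function name `flatten` is hypothetical and only serves this example.

```python
def flatten(mapping, prefix=""):
    """Collapse nested mappings into flat, hyphen-joined field names.

    Hypothetical helper: it treats Foo: {Bar: Baz} as a mere spelling
    variant of Foo-Bar: Baz, which is the equivalence discussed above.
    """
    flat = {}
    for key, value in mapping.items():
        name = f"{prefix}-{key}" if prefix else str(key)
        if isinstance(value, dict):
            # A mapping adds no structure of its own: recurse and join names.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


nested = {"Foo": {"Bar": "Baz"}}
flat = {"Foo-Bar": "Baz"}

# Both spellings end up as the same flat record.
assert flatten(nested) == flatten(flat) == flat
```

A consumer that only ever sees the flattened form cannot tell the two
spellings apart, and that property is lost as soon as a loader starts to
depend on the mapping itself.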

I have been working on this project since 2009.  My goal is not only to provide
bibliographic data to the UDD, but also to provide a machine-readable file that
is convenient for general use.  While I have picked YAML for that file, I
really do not want to support the full YAML syntax, but rather a minimal subset
that is close to the Debian control data files.  Currently, this is done by
only supporting YAML scalars (except for the hash hack).
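
As a rough sketch of what "scalars only, except for the hash hack" could mean
in practice, the check below works on already-parsed data.  The function name
`non_scalar_fields` and the sample field names are assumptions made for this
example, not part of the documented syntax.

```python
# Python types that a YAML scalar can be parsed into.
SCALARS = (str, int, float, bool, type(None))


def non_scalar_fields(data):
    """Return the top-level fields that fall outside the minimal subset.

    Assumption for this sketch: a value is accepted if it is a scalar, or
    a single-level mapping whose values are all scalars (the hash hack).
    """
    offending = []
    for key, value in data.items():
        if isinstance(value, dict):
            if not all(isinstance(v, SCALARS) for v in value.values()):
                offending.append(key)
        elif not isinstance(value, SCALARS):
            offending.append(key)
    return offending


# Illustrative data only: the list under "Homepage" is an array, which the
# minimal subset does not allow.
sample = {
    "Name": "example",
    "Reference": {"Author": "A. Author", "Year": 2011},
    "Homepage": ["http://example.org/a", "http://example.org/b"],
}
print(non_scalar_fields(sample))
```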

The UDD loader is the first serious use of this data, but I think we should not
overfit the debian/upstream syntax to this sole use.  In particular, I do think
that the gatherer I made has some value, as it gives easy access to the contents
of these files without downloading the source package or visiting a VCS web
interface.  For instance I recently used it to browse our watch files, in order
to find examples of direct detection in Google Code or SourceForge
(http://upstream-metadata.debian.net/table/watch).  It will also be more
efficient than the UDD for quick on-demand requests about single packages or
single fields.

In summary, let's track issues and solve them one by one, and avoid long
point-to-point threads.  On my side, I cannot work efficiently with them.

Have a nice weekend,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

