
Re: Status bibref gatherer (Was: Tasks pages (close to) fixed; Bibref does not seem to be updated automatically)



On Sun, Mar 11, 2012 at 12:42:38PM +0900, Charles Plessy wrote:
> in the first round where I wrote the UDD importer you accepted the format I
> proposed.  Writing this importer by myself took me countless hours, as I did
> not know Python programming, and as the handling of Unicode in the UDD was not
> so intuitive.  Then you found bugs and decided to throw everything to the bin.
> But why don't you report and track the bugs instead ?

Perhaps I was not verbose enough about this, but I tried to give an
answer in my mail from yesterday.  I first tried to fix your bibref
importer to enable a primary key.  When diving into this I noticed that
obtaining the data via the detour of the web service is not always
reliable (at some point it returned zero bytes, and I worked around the
removal of existing data in UDD in your code - r2159).  After fixing
this I realised that package references were missing and started a
longish thread about this here which, to my understanding, resulted in
the consensus to parse the upstream files directly - which certainly
implied a rewrite of the code.  Sorry if I misunderstood you on this.
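
To illustrate the two points above (a primary key, and not wiping
existing data when the fetch returns nothing), here is a minimal sketch.
It is not the code from r2159 or from the UDD tree: the table layout,
the column names and the function reload_bibref are hypothetical, and
sqlite3 merely stands in for the real UDD database.

```python
import sqlite3

# Hypothetical table layout; the real UDD schema may differ.
SCHEMA = """
CREATE TABLE bibref (
    source TEXT    NOT NULL,  -- source package name
    rank   INTEGER NOT NULL,  -- 0 for the first reference, 1 for the next, ...
    key    TEXT    NOT NULL,  -- field name, e.g. 'title' or 'year'
    value  TEXT    NOT NULL,
    PRIMARY KEY (source, rank, key)
)
"""


def reload_bibref(conn, rows):
    """Replace the table contents with rows, unless rows is empty."""
    if not rows:
        # An empty result (e.g. a zero-byte response) keeps the old data.
        return False
    # One transaction: committed on success, rolled back on any error,
    # so a failed import also leaves the previous contents in place.
    with conn:
        conn.execute("DELETE FROM bibref")
        conn.executemany("INSERT INTO bibref VALUES (?, ?, ?, ?)", rows)
    return True


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
reload_bibref(conn, [("foo", 0, "title", "First paper"),
                     ("foo", 1, "title", "Second paper")])
reload_bibref(conn, [])  # empty fetch: the two rows above are kept
```

The rank column is one possible way to let a package carry more than one
reference while still having a primary key.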
 
> This monster email thread with many point-to-point comments is an information
> blackhole.  This, plus the fact that there are only 24 hours in a day, are the
> reason why some of the issues that count for you are not fixed.  But that does
> not mean that the only way to fix them is to do everything by yourself without
> sharing your code and without communicating your design requirements.

There are two simple reasons why I have not (yet) shared the code.
First, there is currently a parallel change happening in the UDD SVN
tree and I did not want to spoil it with unfinished / not fully tested
code.  (BTW, the code is installed on blends.debian.net in
/org/udd/udd.)  Second, you admitted that you are not comfortable with
Python programming, so I did not regard you as very keen on taking part
in the coding, nor did I expect anybody else to join - so there was no
need to push unfinished code.  The code is based on the specification
we agreed upon (or at least what I considered an agreement, because you
proposed this pool-like layout for organising upstream files).
 
> If we want to achieve something together, we need to be more organised.  Let's
> use at least a wiki page or the TODO file of the umegaya repository to track
> down open problems.  Also if we keep using Umegaya (the engine behind
> upstream-metadata.debian.net, which provides consolidated tables and the pool
> of upstream and copyright files), I will upload the debian package, so that we
> have a bug tracker for free.
> 
> So far, we have on your side:
> 
>  - UDD tables must have a primary key.
>  - Bibliographic data must support the loading of more than one reference.
> 
> On my side:
> 
>  - The syntax of debian/upstream is documented on the Debian wiki
>    (http://wiki.debian.org/UpstreamMetadata) and changes must be discussed
>    in advance.

Yes.  I have the very strong feeling that we are discussing orthogonal
issues here.  Please help me understand by pointing to an example commit
where I deviated from this syntax.  Otherwise I simply fail to understand
your problem, sorry.

>  - The current syntax does not support complex structures such as arrays,
>    and your way of loading Bibliographic references is therefore not supported.

I was under the impression that debian/upstream files are YAML, and if
we are talking about this YAML specification

   http://en.wikipedia.org/wiki/YAML

I fail to understand what part is not supported.  Maybe I
misunderstood your specification, but the wiki page you quoted does not
mention that we intend to restrict ourselves to a subset of YAML.

>  - The use of YAML mappings (hashes) is a syntax hack that you made me
>    deeply regret.  If you start to depend on them, you break the equivalence
>    between Foo-Bar: Baz and Foo: {Bar: Baz}.  

I admit I know very little about YAML and basically trust what the
Python parser provides, and I can confirm that the parser does not
regard these as equivalent.  I do not know to what extent one or the
other is a hack, but the very first sample on the Wikipedia page for a
YAML document implies that what you call a hack is the usual way to use
YAML.
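
As a small illustration of the difference (a sketch only, not code from
the importer): a YAML loader hands back a flat mapping for the first
form and a nested mapping for the second, written below as plain Python
dicts so that no YAML library is needed.  The helper flatten is
hypothetical; it shows one way the nested form could be folded back
into hyphenated keys if the equivalence is meant to hold.

```python
def flatten(mapping, prefix=""):
    """Fold nested mappings into a flat one with hyphen-joined keys."""
    flat = {}
    for key, value in mapping.items():
        name = f"{prefix}-{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into the nested mapping, carrying the key path along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# What a loader would return for "Foo-Bar: Baz" ...
hyphenated = {"Foo-Bar": "Baz"}
# ... and for "Foo: {Bar: Baz}".
nested = {"Foo": {"Bar": "Baz"}}

print(hyphenated == nested)           # the raw structures differ
print(hyphenated == flatten(nested))  # equal once the nesting is folded
```

Lists (YAML sequences) are passed through untouched here, since folding
them into flat keys would need a separate convention.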

Please, I really do not want to break anything or spoil your work, but I
obviously lack background information about what you regard as
important and what is necessary for umegaya.  I was under the
impression that I was parsing files with perfectly valid YAML syntax
(based on the definition of YAML and what the YAML parser provides) and
just tried to fix syntax errors and data duplication.  I still fail to
understand your bad feelings about my changes, so please point me to
changes I made that you consider wrong, and sorry if I am testing your
patience that hard.

> I have been working on this project since 2009.  My goal is not only to provide
> bibliographic data to the UDD, but also to provide a machine-readable file that
> is convenient for general use.  While I have picked YAML for that file, I
> really do not want to support the full YAML syntax, but rather a minimal subset
> that is close to the Debian control data files.  Currently, this is done by
> only supporting YAML scalars (except for the hash hack).

I was not aware of this restriction, but even so it is not violated by
my changes.

> The UDD loader is the first serious use of this data, but I think we should not
> overfit the debian/upstream syntax to this sole use.

ACK.

> In particular, I do think
> that the gatherer I made has some value, to give an easy access to the contents
> of these files without downloading the source package or visiting a VCS web
> interface.  For instance I recently used it to browse our watch files, in order
> to find examples of direct detection in Google code or SourceForge.  (
> http://upstream-metadata.debian.net/table/watch ).  It will also be more
> efficient than the UDD for quick on-demand requests about single packages or
> single fields.

I have not dived into upstream-metadata.debian.net because of time
constraints, but in principle I do not see a difference in the kind of
data storage needed for this.  So why do you think that UDD could not
be used for storing watch information as well?  (This should be
discussed in a different thread on debian-qa - the thread about
bibliography is long enough.)

> In summary, let's track down issues and solve them one by one, and avoid long
> point-to-point threads.  On my side, I do not manage to be efficient with them.  

I agree that this thread has become unproductive.  I would suggest the
following:

  - Please point me to commit IDs where I might have broken anything, so
    that I understand and can avoid this in the future.
  - I will try to finish the UDD bibref gatherer based on the upstream
    files you gathered in SVN (unfortunately no real time this week,
    though) and commit the finished code to SVN (feel free to inspect
    the intermediate stuff on blends.debian.net, which fills the
    bibref_new table).

Kind regards

       Andreas. 

-- 
http://fam-tille.de

