Re: BibTeX file from debian/upstream data (Was: New Debian Science metapackages)
On Tue, Apr 24, 2012 at 03:44:12PM +0900, Charles Plessy wrote:
> Many of the references I entered in the debian/upstream files are pasted from
> the publisher's site (after removing the abstract, which is copyrighted work).
> I see a lot of heterogeneity there.
Yes, there is currently heterogeneity in our data set which is most
probably caused by the (at this time) reasonable approach to copy the
> We can require or recommend that
> maintainers normalise the syntax of the Authors field, but we need to consider
> that it is extra work and an extra point of failure.
For sure. I simply started a (private) policy of converting author data
to BibTeX syntax (and I even added this to the Wiki). If I stumble upon
debian/upstream files with a different syntax, I convert them.
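To illustrate the kind of conversion I mean, here is a minimal Python
sketch. The function name `to_bibtex_authors` and the input formats it
accepts are my assumptions for illustration, not something defined by
debian/upstream; it only joins names with " and " and deliberately
leaves ambiguous comma-separated input untouched.

```python
import re


def to_bibtex_authors(raw: str) -> str:
    """Join author names with ' and ', the separator BibTeX expects.

    Hypothetical helper: assumes names are separated by ';' or ','.
    """
    text = " ".join(raw.split())  # collapse runs of whitespace

    # Already BibTeX-like: names separated by the word "and".
    if re.search(r"\sand\s", text):
        return text

    # Assumption: ';' separates names if present, otherwise ','.
    sep = ";" if ";" in text else ","
    names = [n.strip() for n in text.split(sep) if n.strip()]

    # "Doe, Jane" could be one author in "Last, First" form, so a
    # comma-separated piece without a space is treated as ambiguous
    # and the input is returned unchanged for manual review.
    if sep == "," and any(" " not in n for n in names):
        return text

    return " and ".join(names)
```

For example, `to_bibtex_authors("Jane Doe; John Q. Public")` gives
"Jane Doe and John Q. Public", while "Doe, Jane" is left as it is.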
> Or an extra target for a
> yet-to-write extension to Lintian.
We need to add some Lintian checks anyway - the authors format is just
one of them.
> Is there an international standard, or a standard recognised by popular
> parsers, which would be easy to justify ?
The fact that several BibTeX styles exist which can be adapted to the
standards of the popular journals lets me assume that there is no such
thing as a single standard (in case you do not know the saying: The good
thing about standards is that you can have so many of them.)
I would rather follow this consideration: for what purpose are we
using the data? The only current use is the Web sentinel. For this
purpose the format does not really matter, even if some unification would
not hurt. So for the moment the current status fits its application.
Another *potential* use of the data is feeding it to BibTeX, which actually
is *one* standard you can rely on because it can be easily converted by
existing tools. Once we really *honestly* use the data this way
there will be interest, and if there is real interest people will
start converting what does not fit.
In short: From an application point of view a "Warning" about not using
BibTeX syntax is appropriate which later might be turned into an "Error"
once we are really using BibTeX.
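A rough Python sketch of that severity switch follows. It is not
Lintian's actual check interface (Lintian itself is written in Perl);
the function `check_authors`, the flag `bibtex_in_use` and the
heuristic for "looks like BibTeX" are all assumptions made up for this
example.

```python
def check_authors(value: str, bibtex_in_use: bool = False):
    """Return None if value looks like a BibTeX author list, else a severity.

    Heuristic (an assumption, not a full BibTeX name parser): names are
    separated by ' and ', contain no ';', and have at most two commas
    each (BibTeX allows "von Last, Jr, First").
    """
    names = [n.strip() for n in value.split(" and ")]
    looks_ok = all(n and ";" not in n and n.count(",") <= 2 for n in names)
    if looks_ok:
        return None
    # Only a warning today; an error once BibTeX export is really used.
    return "error" if bibtex_in_use else "warning"
```

With the default setting a field like "Jane Doe; John Public" yields
"warning"; passing `bibtex_in_use=True` turns the same finding into
"error". The heuristic is loose and will let some non-BibTeX lists
through.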
> There are similar issues for other fields, for instance, some original BibTeX
> entries used the LaTeX representation of the en dash (--), which I converted to
> the proper Unicode character (–), which was later replaced by the ASCII dash
> (-) for the sake of normalisation (which is good).
Good point. For me the same as above holds: for the moment we
could stick to the ASCII dash in titles. When it comes to BibTeX usage we
might consider substituting it in title strings in the BibTeX export,
after inspecting the database to see whether this makes sense. I would not
consider this a real design flaw. I'm more concerned about how we could
motivate more maintainers to actually provide citation information at