
Re: GSoC status: classification, output format and more



Jordà Polo <jorda@ettin.org> writes:
> On Tue, Jul 22, 2008 at 10:42:12PM -0700, Russ Allbery wrote:

>> Yes, standardization would be excellent here, as well as adding more
>> keywords to the translator that turns them into nice descriptions for
>> the web and for -i output.
>> 
>> The one thing this doesn't give us is distinguishing between the
>> "sources" of the various tags that don't have meaningful Ref values.
>> There are a few different cases even if the tag isn't based on some
>> external source.  "The resulting package would be broken" vs. "request
>> of relevant maintainer" vs. "generally accepted best practice" comes to
>> mind.  But we could handle this through keywords in Ref.
>
> I'm not sure what exactly you mean by "The resulting package would
> be broken"... that looks like Severity to me, not a Source.
>
> You are right though, some detail is lost, but wouldn't it be a bit
> confusing to have both Source and Ref? How would you handle that?

So, here's my thought process, which may very well be bogus, but it's what
was sitting in the back of my mind.

The goal is to be able to select tests by source, so if you want to find
Policy violations, you can select only policy as a source, or if you want
to audit against the devref, you can select only devref as a source.  This
addresses a concern that was raised on debian-devel where people wanted to
be able to see only tags with a certain level of "blessing" based on their
opinions of the authority of different documents.
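
To make that concrete, here is a rough sketch of selection by source.  It
assumes a Ref value is a comma-separated list whose entries each start
with a source name (such as policy or devref); the tag names, Ref values,
and function names are all made up for illustration and are not Lintian's
actual data or code.

```python
# Hypothetical tag -> Ref data; illustrative only, not copied from Lintian.
TAGS = {
    "example-policy-tag": "policy 4.4",
    "example-devref-tag": "devref 6.3.2, policy 4.4",
    "example-unreferenced-tag": "",
}


def sources(ref):
    """Return the set of source names found in a comma-separated Ref value."""
    # The first word of each non-empty entry is assumed to name the source.
    return {entry.split()[0] for entry in ref.split(",") if entry.strip()}


def select_by_source(tags, wanted):
    """Return the sorted names of tags whose Ref mentions the wanted source."""
    return sorted(name for name, ref in tags.items() if wanted in sources(ref))


if __name__ == "__main__":
    print(select_by_source(TAGS, "policy"))
    print(select_by_source(TAGS, "devref"))
```

A tag with an empty Ref never matches any source in this sketch, which is
exactly the gap for the tags that have no Ref information.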

In that world, there are a whole pile of tags in Lintian right now that
have no Ref information at all.  These tags fall roughly into the
following categories:

* Tags we issue for things that are just obviously broken, even if not
  specifically mentioned in any document.  Example:
  library-in-debug-or-profile-should-not-be-stripped

* "You're doing it wrong" sorts of errors based on the documentation of
  the program used.  This may be the same sort of case.  Example:
  dh_testversion-is-deprecated

* Tags for things that someone noticed are usually a warning sign that
  something is broken about the package, even though they are not in and
  of themselves a violation of any document.  (This may be the same thing
  as the previous one, but at a lower severity.)  Example:
  library-not-linked-against-libc

* Best practices and style issues that aren't specifically mentioned in
  any document but seem to be the consensus of the project.  Example:
  debug-file-should-use-detached-symbols or diff-contains-cvs-control-dir.
  In many cases, these possibly should be added to the devref, but it's
  often easier to add a check to Lintian than write up text and find a
  document into which to put it.

* Tags from random other documents that don't produce enough tags to be a
  very good selection criterion.  Example: syntax-error-in-symbols-file

My guess is that we have as many tags without Ref as with, and while in
some cases that's an oversight, in many cases there just isn't a
reference.  So the question is: is it worthwhile to break those into
different categories for people to select, or do severities cover it?  I
can see, for example, someone wanting to see the "this is obviously
broken" tags but not the "best practice" tags.

> Making Source a mandatory field for all tags, even if they already have
> a Ref?  The relevant distinction is probably between policy and
> non-policy tags, so I'm not sure adding another field is worth it. I'd
> rather keep only one and use keywords, as suggested, for non-external
> sources. (Source may be more "semantically" correct than Ref if we
> include these keywords.)

Right, I don't think adding a separate Source field is needed.  I like the
idea of reusing Ref.  I'm just not sure if we should be adding a Ref:
just-broken or Ref: best-practice sort of tag to the non-Ref tags we have
now.

> Hmm, I'm not so sure about that ;)
>
> Most of those are probably tags that I didn't classify properly using
> Severity and Certainty. Anyway, the list was already available, just
> forgot the link:
>   http://ettin.org/tmp/lintian/transtats-v.out (last 3 paragraphs)

| Code I (69.23%)
|   I: 18
|   W: 8
|     xs-vcs-header-in-debian-control

This is probably wishlist since you can't tell the difference in the
resulting package.

|     diff-contains-editor-backup-file

This is an interesting case.  If I were filing a bug about this, it would
indeed be a minor bug, and the certainty is certain.  I think this is a
correct change, the more I think about it; this is probably a warning.  It
doesn't break anything about the package, though, which was probably the
argument for info.

|     debconf-error-requires-versioned-depends
|     package-needs-python-policy-debhelper
|     file-in-usr-something-x11-without-pre-depends

These tags go away completely with the lenny release.  They're all really
wishlist priority at the moment because they only affect oldstable
backports or upgrades from oldstable, which just aren't a priority for
anyone.

|     unknown-field-in-dsc

Looking at lintian.d.o, I think your change is correct.  This is catching
a ton of typos (VCS-Browse being a popular one) and should probably be
more visible.  The only false positive appears to be Comment, and I'm not
sure why some of the debian-med packages are using that.

|     non-us-spelling

Kind of a weird case since there's no non-US archive any more, and it's
hence basically obsolete.  I've been wondering for a while if we should
just delete all of the non-US Lintian tags and analysis.

|     script-in-usr-share-doc

This is probably wishlist.

| Code W (96.70%)
|   I: 1
|     no-upstream-changelog

This tag is never actually issued right now because of the false
positives.  Your change is correct here.

|   E: 6
|     changelog-file-not-compressed

Ah, this is W currently because it's a "should," but important/certain is
mapped to E.  Hm.  Interesting, and worth thinking about.  The question
is, do policy "shoulds" ever get to be errors instead of warnings?  If
not, we should change important/certain to W.  I think I'm okay with this
being an E, despite it only being a should.
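
The two options can be put side by side in a small sketch.  It assumes the
tag code is looked up from a (severity, certainty) pair; the table and
function names are invented here, and only the important/certain entry is
filled in because that is the only pair this tag settles.

```python
# Hypothetical lookup tables; only the pair under discussion is listed.
PROPOSED = {("important", "certain"): "E"}     # a policy "should" can be an error
ALTERNATIVE = {("important", "certain"): "W"}  # a policy "should" stays a warning


def tag_code(severity, certainty, table):
    """Look up the one-letter code for a severity/certainty pair."""
    key = (severity, certainty)
    if key not in table:
        # Refuse to guess for pairs the sketch does not cover.
        raise KeyError(f"no code assumed for {severity}/{certainty}")
    return table[key]


if __name__ == "__main__":
    print(tag_code("important", "certain", PROPOSED))
    print(tag_code("important", "certain", ALTERNATIVE))
```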

|     wrong-name-for-debian-changelog-file

This should probably be possible rather than certain, but I'm not sure.
I'm only aware of one false positive at present (perl, which is an odd
special case), but the check does use a heuristic.

|     possible-missing-colon-in-closes

This is normal rather than important.

|     non-etc-file-marked-as-conffile
|     debian-revision-should-not-be-zero

I think your change to these is correct.

|     new-essential-package

This one is a warning mostly because people don't tend to do this by
mistake, and hence when the tag is issued, it's almost always a bug in
Lintian rather than in the package.  I think I'd mark this one
important/possible because of that.

| Code E (99.03%)
|   W: 2
|     library-in-debug-or-profile-should-not-be-stripped
|     essential-no-not-needed
|   E: 204

Yeah, I think those changes are correct.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

