Bug#424746: lintian: please detect duplicate words in the description



On Monday 30 June 2008, you wrote:
> Raphael Geissert <atomo64@gmail.com> writes:
> > tag 424746 patch
> > thanks
> >
> > On Wednesday 16 May 2007, Justin Pryzby wrote:
> >> Package: lintian
> >> Version: 1.23.28
> >> Severity: wishlist
> >>
> >> apt-cache dumpavail |grep -wice 'the  *the'
> >
> > Attached is a patch adding such check.
>
> Thanks!
>
> Unless someone objects, I'm inclined to make this info-level instead of a
> warning, since there are valid English constructs where this is a false
> positive and it's a fairly minor bug.

What kind of English constructs use duplicated words and are likely to appear
in a package description? I believe there are none (but I'm always open to
other opinions :)

>
> I think also requiring \s instead of \W on either end of the repeated
> words would be safer; that way we wouldn't warn on "foo foo", and the
> general rule of thumb I've been applying with description checks is that
> if they're quoting it, it's probably intentional.

If that's the case, please see the attached patch (it applies on top of the
previous one).
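
For anyone who wants to try the logic outside Lintian, here is a rough Python
transliteration of the check as modified by the follow-up patch. It is only a
sketch: the real check is the Perl code in checks/description, and the names
find_duplicated_word, QUOTED and DUPLICATE are made up for this illustration.

```python
import re

# Assumed equivalent of the patch's  s,(\"|\').*(\1),,
# It drops one quoted span per line (greedy, first match only).
QUOTED = re.compile(r"""(["']).*(\1)""")

# Same pattern as the Perl check: a word, whitespace, then the same word
# again, bounded by a non-word character or the start/end of the line.
DUPLICATE = re.compile(r"((?:\W|^)(\w+)\s+(\2)(?:\W|$))", re.IGNORECASE)


def find_duplicated_word(line):
    """Return the matched text (what the patch reports as $1), or None."""
    without_quotes = QUOTED.sub("", line, count=1)
    match = DUPLICATE.search(without_quotes)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "The the package",
        'And the old man said "he he is the one!"',
        '"No, I am am not", he replied',
        "I am am not",
    ]
    for sample in samples:
        print(repr(sample), "->", repr(find_duplicated_word(sample)))
```

Because the quote-stripping step is greedy and runs once, it removes everything
from the first quote character to the last matching one on the line, so any
duplicate sitting between two separate quoted spans would be hidden as well.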

Cheers,
-- 
Atomo64 - Raphael

Please avoid sending me Word, PowerPoint or Excel attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
diff --git a/checks/description b/checks/description
index b06196c..0d5d550 100644
--- a/checks/description
+++ b/checks/description
@@ -116,7 +116,9 @@ while (<IN>) {
         tag "description-contains-homepage";
     }
 
-    if (m,((?:\W|^)(\w+)\s+(\2)(?:\W|$)),i) {
+    my $wo_quotes = $_;
+    $wo_quotes =~ s,(\"|\').*(\1),,;
+    if ($wo_quotes =~ m,((?:\W|^)(\w+)\s+(\2)(?:\W|$)),i) {
         tag "description-contains-duplicated-word", "$1";
     }
 
diff --git a/testset/description/debian/control b/testset/description/debian/control
index bf24c8c..6ce5767 100644
--- a/testset/description/debian/control
+++ b/testset/description/debian/control
@@ -32,6 +32,9 @@ Description:
  . and please avoid control statements in the long description.
  The line in an extended description should be less than 80 characters, otherwise you'll get
  a Lintian warning.
+ .
+ And the old man said "he he is the one!"
+ "No, I am am not", he replied
 
 Package: description-baz
 Architecture: all
diff --git a/testset/tags.description b/testset/tags.description
index 87eae2e..9ef6bf0 100644
--- a/testset/tags.description
+++ b/testset/tags.description
@@ -21,5 +21,5 @@ W: description-foo: description-starts-with-leading-spaces
 W: description-foo: possible-unindented-list-in-extended-description
 W: description: changelog-not-compressed-with-max-compression changelog.Debian.gz
 W: description: debian-changelog-file-contains-obsolete-user-emacs-settings
-W: description: description-contains-duplicated-word The the
+W: description: description-contains-duplicated-word The the 
 W: description: description-synopsis-might-not-be-phrased-properly

Attachment: signature.asc
Description: This is a digitally signed message part.