Bug#721252: Not pending + new patches



Package: lintian
Version: 2.5.17
control: tags -1 - pending

OK, this is harder than expected. Patch 3 needs review.

From 4af4536da46bc9867096f1d3244936ae4cd645a7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bastien=20ROUCARI=C3=88S?= <roucaries.bastien@gmail.com>
Date: Sun, 8 Sep 2013 00:22:05 +0200
Subject: [PATCH 4/4] Allow # to be considered like a space

The # character starts a comment in po files and shell scripts. Replace it with a space.
---
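A minimal sketch of the effect, for reviewers. It assumes a cut-down cleaning step limited to the character class touched here; strip_markers is a hypothetical helper for illustration, not a function in checks/cruft.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of lintian): only the character class
# changed by this patch, followed by whitespace normalization.
sub strip_markers {
    my ($text) = @_;
    $text =~ s{[%\*\"\|\\\#]}{ }g;    # marker characters become spaces
    $text =~ s{\s+}{ }g;              # collapse runs of whitespace
    $text =~ s{\A\s+|\s+\z}{}g;       # trim both ends
    return $text;
}

my $po_comment
  = "#      with no Invariant Sections, with no Front-Cover Texts, and with\n"
  . "#      no Back-Cover.\n";

# The leading "#" of each comment line no longer splits the sentence.
print strip_markers($po_comment), "\n";
```

With "#" in the class, the comment block collapses to a single sentence that the later "no Invariant Sections" pattern can match across the line break.
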
 checks/cruft.pm                                                  | 3 ++-
 .../debian/src/oldfalsepositive/comments.po                      | 9 +++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/comments.po

diff --git a/checks/cruft.pm b/checks/cruft.pm
index bb31cdc..3bad624 100644
--- a/checks/cruft.pm
+++ b/checks/cruft.pm
@@ -573,8 +573,9 @@ sub find_cruft {
                       \s*[,\.;]\s*\Z               |  # final punctuation
                       \A\s*[,\.;]\s*               |  # punctuation at the beginning
                       (?:``|'')                    |  # quote like
-                      [%\*\"\|\\]                     # String, C-style comment/javadoc indent, 
+                      [%\*\"\|\\\#]                   # String, C-style comment/javadoc indent, 
                                                       # quotes for strings, pipe and antislash in some txt
+                                                      # shell or po file comments
                     )}{ }gxms;
 
                     # delete double spacing now and normalize spacing
diff --git a/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/comments.po b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/comments.po
new file mode 100644
index 0000000..d11e67b
--- /dev/null
+++ b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/comments.po
@@ -0,0 +1,9 @@
+# French translation for SANE backend options
+#
+#      Permission is granted to copy, distribute and/or modify this document
+#      under the terms of the GNU Free Documentation License, Version 1.1
+#      or any later version published by the Free Software Foundation;
+#      with no Invariant Sections, with no Front-Cover Texts, and with
+#      no Back-Cover.
+#      A copy of the license is included in the section entitled "GNU
+#      Free Documentation License".
-- 
1.8.4.rc3

From 9583d6b069f14818fd54c690e40b2a51001aa6e4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bastien=20ROUCARI=C3=88S?= <roucaries.bastien@gmail.com>
Date: Thu, 5 Sep 2013 23:30:37 +0200
Subject: [PATCH 1/4] Fix another false positive in gtk-doc

Fix a false positive and extend the test suite with new cases.
---
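A minimal sketch of the one-character regexp change in this patch, assuming the text has already been cleaned and trimmed. The variable names are illustrative and the patterns are shortened to the last clause only.

```perl
use strict;
use warnings;

# Last clause of a gtk-doc po notice after cleaning (illustrative input).
my $cleaned = 'and with the <_:link-3/> being list';

# Before: whitespace is mandatory between "<_:" and "link-N".
my $old = qr{(?:and\s)? with \s the \s? <_:\s  link-\d+ \s? /> \s? being \s list \z}xi;

# After: that whitespace is optional.
my $new = qr{(?:and\s)? with \s the \s? <_:\s? link-\d+ \s? /> \s? being \s list \z}xi;

printf "old pattern: %s\n", ($cleaned =~ $old ? 'match' : 'no match');
printf "new pattern: %s\n", ($cleaned =~ $new ? 'match' : 'no match');
```

Only the second pattern accepts "<_:link-3/>" written without a space after the colon, which is the form used in the new gtk-doc.po test case.
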
 checks/cruft.pm                                    | 37 +++++++++++-----------
 .../debian/src/oldfalsepositive/citetitle.po       | 16 ++++++++++
 .../debian/src/oldfalsepositive/gtk-doc.po         | 21 ++++++++++++
 3 files changed, 56 insertions(+), 18 deletions(-)
 create mode 100644 t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/citetitle.po

diff --git a/checks/cruft.pm b/checks/cruft.pm
index 2a9e3b0..5901a34 100644
--- a/checks/cruft.pm
+++ b/checks/cruft.pm
@@ -514,23 +514,24 @@ sub find_cruft {
 
                     # replace some common comment-marker/markup with space
                     $gfdlsections =~ s{(?:
-                      ^[-\+!<>]       |  # diff/patch lines
-                      ^\.\\\"         |  # man comments
-                      \@c(?:omment)?  |  # Tex info comment
-                      \@var\{         |  # Tex info emphasis
-                      \}              |  # Tex info end tag (could be more clever but brute force is fast)
-                      \"\s*,          |  # String array (e.g. "line1",\n"line2")
-                      ,\s*\"          |  # String array (e.g. "line1"\n ,"line2"), seen in findutils
-                      <br\s*/?>       |  # (X)HTML line breaks
-                      </?link[^>]*?>  |  # xml link
-                      </?a[^>]*?>     |  # a link
-                      </?p[^>]*?>     |  # html paragraph
-                      </?var[^>]*?>   |  # var tag used by html from texinfo
-                      \(\*note.*?::\) |  # info file note
-                      \\n             |  # Verbatim \n in string array
-                      \s*[,\.;]\s*\Z  |  # final punctuation
-                      \A\s*[,\.;]\s*  |  # punctuation at the beginning
-                      [%\*\"\|\\]        # String, C-style comment/javadoc indent, quotes for strings, pipe and antislash in some txt
+                      ^[-\+!<>]            |  # diff/patch lines
+                      ^\.\\\"              |  # man comments
+                      \@c(?:omment)?       |  # Tex info comment
+                      \@var\{              |  # Tex info emphasis
+                      \}                   |  # Tex info end tag (could be more clever but brute force is fast)
+                      \"\s*,               |  # String array (e.g. "line1",\n"line2")
+                      ,\s*\"               |  # String array (e.g. "line1"\n ,"line2"), seen in findutils
+                      <br\s*/?>            |  # (X)HTML line breaks
+                      </?link[^>]*?>       |  # xml link
+                      </?a[^>]*?>          |  # a link
+                      </?citetitle[^>]*?>  |  # citation title in docbook
+                      </?p[^>]*?>          |  # html paragraph
+                      </?var[^>]*?>        |  # var tag used by html from texinfo
+                      \(\*note.*?::\)      |  # info file note
+                      \\n                  |  # Verbatim \n in string array
+                      \s*[,\.;]\s*\Z       |  # final punctuation
+                      \A\s*[,\.;]\s*       |  # punctuation at the beginning
+                      [%\*\"\|\\]             # String, C-style comment/javadoc indent, quotes for strings, pipe and antislash in some txt
                     )}{ }gxms;
 
                     # delete double spacing now and normalize spacing
@@ -629,7 +630,7 @@ sub find_cruft {
                             being \s list \s their \s titles \s?[,\.;]?\s?
                             with \s the \s? <_: \s* link-\d+ \s? /> \s?
                             being \s list \s?[,\.;]?\s?
-                            (?:and\s)? with \s the \s? <_:\s link-\d+ \s? /> \s?
+                            (?:and\s)? with \s the \s? <_:\s? link-\d+ \s? /> \s?
                             being \s list \Z}xiso
                       ) {
                         # fix a false positive in .po file
diff --git a/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/citetitle.po b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/citetitle.po
new file mode 100644
index 0000000..73f9c3a
--- /dev/null
+++ b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/citetitle.po
@@ -0,0 +1,16 @@
+#: C/index.docbook:65(legalnotice/para)
+msgid ""
+"Permission is granted to copy, distribute and/or modify this document under "
+"the terms of the <citetitle>GNU Free Documentation License</citetitle>, "
+"Version 1.1 or any later version published by the Free Software Foundation "
+"with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A "
+"copy of the license is <link linkend=\"fdl\">included</link>."
+msgstr ""
+"Das vorliegende Dokument kann gemäß den Bedingungen der GNU Free "
+"Documentation License (GFDL), Version 1.1 oder jeder späteren, von der Free "
+"Software Foundation veröffentlichten Version ohne unveränderbare Abschnitte "
+"sowie ohne Texte auf dem vorderen und hinteren Buchdeckel kopiert, verteilt "
+"und/oder modifiziert werden. Eine Kopie der GFDL finden Sie unter diesem "
+"<ulink type=\"help\" url=\"ghelp:fdl\">Link</ulink> oder in der mit diesem "
+"Handbuch gelieferten Datei COPYING-DOCS."
+
diff --git a/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/gtk-doc.po b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/gtk-doc.po
index 2f6f58d..07b3003 100644
--- a/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/gtk-doc.po
+++ b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/gtk-doc.po
@@ -19,3 +19,24 @@ msgstr ""
 "l'historique du Document, des sources documentaires, des dispositions "
 "légales, commerciales, philosophiques, ou des positions éthiques ou "
 "politiques susceptibles de concerner le sujet traité."
+
+
+#: C/fdl-appendix.xml:632(blockquote/para)
+#, fuzzy
+msgid ""
+"Permission is granted to copy, distribute and/or modify this document under "
+"the terms of the GNU Free Documentation License, Version 1.1 or any later "
+"version published by the Free Software Foundation; with the <_:link-1/> being "
+"LIST THEIR TITLES, with the <_:link-2/> being LIST, and with the <_:link-3/> "
+"being LIST. A copy of the license is included in the section entitled <_:"
+"quote-4/>."
+msgstr ""
+"Es wird die Erlaubnis gegeben, dieses Dokument zu kopieren, verteilen und/"
+"oder zu verändern unter den Bedingungen der GNU Free Documentation License, "
+"Version 1.1 oder einer späteren, von der Free Software Foundation "
+"veröffentlichten Version; mit den <link linkend=\"fdl-invariant"
+"\">Unveränderlichen Abschnitten</link>. DEREN TITEL AUFGEZÄHLT sind, mit den "
+"<link linkend=\"fdl-cover-texts\">Vorderseitentexten</link>, die AUFGEZÄHLT "
+"sind, und mit den <link linkend=\"fdl-cover-texts\">Rückseitentexten</link>, "
+"die AUFGEZÄHLT sind. Eine Kopie dieser Lizenz ist in dem Abschnitt enthalten, "
+"der mit <quote>GNU Free Documentation License</quote> betitelt ist."
-- 
1.8.4.rc3

From d2f69b03819b9025f5686d2a3f45e47a4bf8cf33 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bastien=20ROUCARI=C3=88S?= <roucaries.bastien@gmail.com>
Date: Sat, 7 Sep 2013 15:33:42 +0200
Subject: [PATCH 2/4] Improve detection of gfdl

Run a search and replace before the regexp match so that a highlighted
"GNU" word is detected robustly.
---
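A minimal sketch of the idea, assuming the block has already been lowercased as the index() checks suggest. normalize_gnu is a hypothetical wrapper for illustration; the span alternative here uses [^>]*? for the opening tag attributes.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not in lintian): strip markup around "gnu" so the
# main GFDL pattern only has to deal with plain whitespace.
sub normalize_gnu {
    my ($block) = @_;
    $block =~ s{
        (?:<span\s*[^>]*?>)?\s*gnu\s*</span\s*[^>]*?>  |  # html span
        (?:\@[[:alpha:]]*?\{)?\s*gnu\s*\}                 # Texinfo command
    }{ gnu }gxms;
    return $block;
}

my $texi  = 'under the terms of the @acronym{gnu} free documentation license';
my $clean = normalize_gnu($texi);

print "matched\n"
  if $clean =~ m/gnu \s+ free \s+ documentation \s+ license/xms;
```

The main pattern can then stay a plain "gnu \s+ free" instead of enumerating every markup variant inline.
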
 checks/cruft.pm                                    | 159 +++++++++++----------
 .../debian/src/oldfalsepositive/texignu.texi       |   6 +
 2 files changed, 93 insertions(+), 72 deletions(-)
 create mode 100644 t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/texignu.texi

diff --git a/checks/cruft.pm b/checks/cruft.pm
index 5901a34..aeec3ea 100644
--- a/checks/cruft.pm
+++ b/checks/cruft.pm
@@ -502,18 +502,32 @@ sub find_cruft {
             # if the "redeeming" part is in the next block.
             #
             # See cruft-gfdl-fp-sliding-win for the test case
-            if (
-                index($block, 'license') > -1
-                && $block =~ m/gnu (?:\s+|\s*<\/span>\s*|\s*\}\s+)? free \s+
+            if (   index($block, 'license') > -1
+                && index($block,'documentation') > -1
+                && index($block,'gnu') > -1
+                && index($block,'copy') >-1) {
+
+                my $cleanedblock = $block;
+
+               # gnu word is often highlighted
+               # do a minimal replace in order to do the hard work only in case
+               # of positively matched GFDL
+                $cleanedblock =~ s{
+                 (?:<span\s*[^>]*?>)?\s*gnu\s*</span\s*[^>]*?>            |   # html span
+                 (?:@[[:alpha:]]*?\{)?\s*gnu\s*\}                             # Tex info command
+                }{ gnu }gxms;
+
+                if(
+                    $cleanedblock =~ m/gnu \s+ free \s+
                      documentation \s+ license (?'rawgfdlsections'.{0,1024}?)
                      a \s+ copy \s+ of \s+ the \s+ license \s+ is/xsm
-              ) {
-                if (!exists $licenseproblemhash{'gfdl-invariants'}) {
-                    my $rawgfdlsections = $+{rawgfdlsections};
-                    my $gfdlsections = $rawgfdlsections;
+                  ) {
+                    if (!exists $licenseproblemhash{'gfdl-invariants'}) {
+                        my $rawgfdlsections = $+{rawgfdlsections};
+                        my $gfdlsections = $rawgfdlsections;
 
-                    # replace some common comment-marker/markup with space
-                    $gfdlsections =~ s{(?:
+                        # replace some common comment-marker/markup with space
+                        $gfdlsections =~ s{(?:
                       ^[-\+!<>]            |  # diff/patch lines
                       ^\.\\\"              |  # man comments
                       \@c(?:omment)?       |  # Tex info comment
@@ -534,113 +548,114 @@ sub find_cruft {
                       [%\*\"\|\\]             # String, C-style comment/javadoc indent, quotes for strings, pipe and antislash in some txt
                     )}{ }gxms;
 
-                    # delete double spacing now and normalize spacing
-                    # to space character
-                    $gfdlsections =~ s{\s++}{ }gsm;
-                    strip($gfdlsections);
+                        # delete double spacing now and normalize spacing
+                        # to space character
+                        $gfdlsections =~ s{\s++}{ }gsm;
+                        strip($gfdlsections);
 
-                    # remove version information
-                    $gfdlsections =~ s/
+                        # remove version information
+                        $gfdlsections =~ s/
                             \A version \s \d+(?:\.\d+)? \s
                             (?:or \s any \s later \s version \s)?
                             published \s by \s the \s Free \s Software \s Foundation
                             (?: \s? [,\.;])? \s?
                             //xism;
 
-                    # GFDL license, assume it is bad unless it
-                    # explicitly states it has no "bad sections".
-                    if (
-                        $gfdlsections =~ m/
+                        # GFDL license, assume it is bad unless it
+                        # explicitly states it has no "bad sections".
+                        if (
+                            $gfdlsections =~ m/
                             no \s? Invariant \s+ Sections? \s? [,\.;]?
                                \s? (?:with\s)? (?:the\s)? no \s
                                Front(?:\s?\\?-)?\s?Cover (?:\s Texts?)? \s? [,\.;]? \s? (?:and\s)?
                                (?:with\s)? (?:the\s)? no
                                \s Back(?:\s?\\?-)?\s?Cover/xiso
-                      ) {
-                        # no invariant
-                    } elsif (
-                        $gfdlsections =~ m/
+                          ) {
+                            # no invariant
+                        } elsif (
+                            $gfdlsections =~ m/
                             no \s Invariant \s Sections? \s? [,\.;]?
                                \s? (?:no\s)? Front(?:\s?[\\]?-)? \s or
                                \s (?:no\s)? Back(?:\s?[\\]?-)?\s?Cover \s Texts?/xiso
-                      ) {
-                        # no invariant variant (dict-foldoc)
-                    } elsif (
-                        $gfdlsections =~ m/
+                          ) {
+                            # no invariant variant (dict-foldoc)
+                        } elsif (
+                            $gfdlsections =~ m/
                             \A There \s are \s no \s invariants? \s sections? \Z
                           /xiso
-                      ) {
-                        # no invariant libnss-pgsql version
-                    } elsif (
-                        $gfdlsections =~ m/
+                          ) {
+                            # no invariant libnss-pgsql version
+                        } elsif (
+                            $gfdlsections =~ m/
                             \A without \s any \s Invariant \s Sections? \Z
                           /xiso
-                      ) {
-                        # no invariant parsewiki version
-                    } elsif (
-                        $gfdlsections=~ m/
+                          ) {
+                            # no invariant parsewiki version
+                        } elsif (
+                            $gfdlsections=~ m/
                             \A with \s no \s invariants? \s sections? \Z
                          /xiso
-                      ) {
-                        # no invariant lilypond version
-                    } elsif (
-                        $gfdlsections =~ m/\A
+                          ) {
+                            # no invariant lilypond version
+                        } elsif (
+                            $gfdlsections =~ m/\A
                             with \s the \s Invariant \s Sections \s being \s
                             LIST (?:\s THEIR \s TITLES)? \s? [,\.;]? \s?
                             with \s the \s Front(?:\s?[\\]?-)\s?Cover \s Texts \s being \s
                             LIST (?:\s THEIR \s TITLES)? \s? [,\.;]? \s?
                             (?:and\s)? with \s the \s Back(?:\s?[\\]?-)\s?Cover \s Texts \s being \s
                             LIST (?:\s THEIR \s TITLES)? \Z/xiso
-                      ) {
-                        # verbatim text of license is ok
-                    } elsif ($gfdlsections eq '') {
-                        # empty text is ambiguous
-                        tag 'license-problem-gfdl-invariants-empty',$name;
-                        $licenseproblemhash{'gfdl-invariants'} = 1;
-                    } elsif (
-                        $gfdlsections =~ m/
+                          ) {
+                            # verbatim text of license is ok
+                        } elsif ($gfdlsections eq '') {
+                            # empty text is ambiguous
+                            tag 'license-problem-gfdl-invariants-empty',$name;
+                            $licenseproblemhash{'gfdl-invariants'} = 1;
+                        } elsif (
+                            $gfdlsections =~ m/
                             with \s \&FDLInvariantSections; \s? [,\.;]? \s?
                             with \s+\&FDLFrontCoverText; \s? [,\.;]? \s?
                             and \s with \s \&FDLBackCoverText;/xiso
-                      ) {
-                        # fix #708957 about FDL entities in template
-                        unless (
-                            $name =~ m{
+                          ) {
+                            # fix #708957 about FDL entities in template
+                            unless (
+                                $name =~ m{
                                 /customization/[^/]+/entities/[^/]+\.docbook \Z
                               }xsm
-                          ) {
-                            tag 'license-problem-gfdl-invariants',$name;
-                            $licenseproblemhash{'gfdl-invariants'} = 1;
-                        }
-                    } elsif (
-                        # fix a false positive in maintain.texi
-                        $gfdlsections =~ m/\A
+                              ) {
+                                tag 'license-problem-gfdl-invariants',$name;
+                                $licenseproblemhash{'gfdl-invariants'} = 1;
+                            }
+                        } elsif (
+                            # fix a false positive in maintain.texi
+                            $gfdlsections =~ m/\A
                             Following \s is \s an \s example \s of \s the \s license \s notice \s
                             to \s use \s after \s the \s copyright \s line\(s\) \s using \s all \s the \s
                             features \s of \s the \s GFDL/xiso
-                      ) {
-                        # allow only one text
-                        unless ($name =~ m/maintain/) {
-                            tag 'license-problem-gfdl-invariants',$name;
-                            $licenseproblemhash{'gfdl-invariants'} = 1;
-                        }
-                    } elsif (
-                        $gfdlsections =~ m{
+                          ) {
+                            # allow only one text
+                            unless ($name =~ m/maintain/) {
+                                tag 'license-problem-gfdl-invariants',$name;
+                                $licenseproblemhash{'gfdl-invariants'} = 1;
+                            }
+                        } elsif (
+                            $gfdlsections =~ m{
                             \A with \s the \s? <_: \s? link-\d+ \s? /> \s?
                             being \s list \s their \s titles \s?[,\.;]?\s?
                             with \s the \s? <_: \s* link-\d+ \s? /> \s?
                             being \s list \s?[,\.;]?\s?
                             (?:and\s)? with \s the \s? <_:\s? link-\d+ \s? /> \s?
                             being \s list \Z}xiso
-                      ) {
-                        # fix a false positive in .po file
-                        unless ($name =~ m,\.po$,) {
+                          ) {
+                            # fix a false positive in .po file
+                            unless ($name =~ m,\.po$,) {
+                                tag 'license-problem-gfdl-invariants', $name;
+                                $licenseproblemhash{'gfdl-invariants'} = 1;
+                            }
+                        } else {
                             tag 'license-problem-gfdl-invariants', $name;
                             $licenseproblemhash{'gfdl-invariants'} = 1;
                         }
-                    } else {
-                        tag 'license-problem-gfdl-invariants', $name;
-                        $licenseproblemhash{'gfdl-invariants'} = 1;
                     }
                 }
             }
diff --git a/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/texignu.texi b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/texignu.texi
new file mode 100644
index 0000000..9327024
--- /dev/null
+++ b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/texignu.texi
@@ -0,0 +1,6 @@
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the @acronym{GNU} Free Documentation License,
+Version 1.3 or any later version published by the Free Software
+Foundation; with no Invariant Sections, with no Front-Cover texts
+and with no Back-Cover Texts.  A copy of the license is included in the section entitled
+``@acronym{GNU} Free Documentation License.''
-- 
1.8.4.rc3

From 25ec9d12b99e8dcc779d444790c0c8b9b775d950 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bastien=20ROUCARI=C3=88S?= <roucaries.bastien@gmail.com>
Date: Sat, 7 Sep 2013 23:43:07 +0200
Subject: [PATCH 3/4] Fix a false positive in gtk-doc

This is a really hard case this time. The example looks like this:
> GNU Free Documentation license
> Swedish text
> GNU Free Documentation license
> English text of invariants
> a copy is included.

To handle it, we must match the interesting section ($gfdlsections in the
code, here the English text of invariants) only if that section does not
itself contain "GNU Free Documentation license". So we need to replace
.*? with (?:(?!gnu \s+ free \s+ documentation \s+ license).)*?.

Unfortunately this causes a regression in one of the test cases,
maintain.texi, which is an example text. To fix this regression we need to
capture more context around the GNU Free Documentation license pattern.

So if the context before the pattern indicates an example, do not tag.

Last but not least, there is a corner case. Because we do a sliding-window
match, we could lose the context before the pattern when it lies in the
recently discarded block. Thus, except for the first block, always require
exactly 1024 characters of context.
---
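A minimal sketch of the difference between the plain lazy match and the tempered one described above, assuming lowercased input. The sample text and variable names are illustrative, and the context-before handling is left out.

```perl
use strict;
use warnings;

# Illustrative input: a translated notice followed by the English one.
my $text = join "\n",
  'gnu free documentation license',
  'swedish text',
  'gnu free documentation license',
  'with no invariant sections',
  'a copy of the license is included';

my $gfdl = qr/gnu \s+ free \s+ documentation \s+ license/xms;
my $end  = qr/a \s+ copy \s+ of \s+ the \s+ license \s+ is/xms;

# Plain lazy match: starts at the first license name and swallows the second.
my ($plain) = $text =~ m/$gfdl (.{0,1024}?) $end/xms;

# Tempered match: each character may not begin another license name,
# so the capture starts after the last one before "a copy of ...".
my ($tempered) = $text =~ m/$gfdl ((?:(?!$gfdl).){0,1024}?) $end/xms;

for my $pair (['plain', $plain], ['tempered', $tempered]) {
    (my $shown = $pair->[1]) =~ s/\s+/ /g;
    print "$pair->[0]: [$shown]\n";
}
```

The plain capture still contains the Swedish text and the second license name, while the tempered capture holds only the English invariants clause.
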
 checks/cruft.pm                                    | 161 +++++++++++++--------
 .../debian/src/oldfalsepositive/maintain.html      |  28 ++++
 .../src/oldfalsepositive/partialtranslation.po     |  15 ++
 3 files changed, 147 insertions(+), 57 deletions(-)
 create mode 100644 t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/maintain.html
 create mode 100644 t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/partialtranslation.po

diff --git a/checks/cruft.pm b/checks/cruft.pm
index aeec3ea..bb31cdc 100644
--- a/checks/cruft.pm
+++ b/checks/cruft.pm
@@ -463,6 +463,7 @@ sub find_cruft {
 
         my @queue = ('', '');
         my %licenseproblemhash = ();
+        my $blocknumber = 0;
 
         # we try to read this file in block and use a sliding window
         # for efficiency.  We store two blocks in @queue and the whole
@@ -503,9 +504,9 @@ sub find_cruft {
             #
             # See cruft-gfdl-fp-sliding-win for the test case
             if (   index($block, 'license') > -1
-                && index($block,'documentation') > -1
-                && index($block,'gnu') > -1
-                && index($block,'copy') >-1) {
+                && index($block, 'documentation') > -1
+                && index($block, 'gnu') > -1
+                && index($block, 'copy') >-1) {
 
                 my $cleanedblock = $block;
 
@@ -517,49 +518,94 @@ sub find_cruft {
                  (?:@[[:alpha:]]*?\{)?\s*gnu\s*\}                             # Tex info command
                 }{ gnu }gxms;
 
-                if(
-                    $cleanedblock =~ m/gnu \s+ free \s+
-                     documentation \s+ license (?'rawgfdlsections'.{0,1024}?)
-                     a \s+ copy \s+ of \s+ the \s+ license \s+ is/xsm
-                  ) {
-                    if (!exists $licenseproblemhash{'gfdl-invariants'}) {
-                        my $rawgfdlsections = $+{rawgfdlsections};
-                        my $gfdlsections = $rawgfdlsections;
-
-                        # replace some common comment-marker/markup with space
-                        $gfdlsections =~ s{(?:
-                      ^[-\+!<>]            |  # diff/patch lines
-                      ^\.\\\"              |  # man comments
-                      \@c(?:omment)?       |  # Tex info comment
-                      \@var\{              |  # Tex info emphasis
-                      \}                   |  # Tex info end tag (could be more clever but brute force is fast)
-                      \"\s*,               |  # String array (e.g. "line1",\n"line2")
-                      ,\s*\"               |  # String array (e.g. "line1"\n ,"line2"), seen in findutils
-                      <br\s*/?>            |  # (X)HTML line breaks
-                      </?link[^>]*?>       |  # xml link
-                      </?a[^>]*?>          |  # a link
-                      </?citetitle[^>]*?>  |  # citation title in docbook
-                      </?p[^>]*?>          |  # html paragraph
-                      </?var[^>]*?>        |  # var tag used by html from texinfo
-                      \(\*note.*?::\)      |  # info file note
-                      \\n                  |  # Verbatim \n in string array
-                      \s*[,\.;]\s*\Z       |  # final punctuation
-                      \A\s*[,\.;]\s*       |  # punctuation at the beginning
-                      [%\*\"\|\\]             # String, C-style comment/javadoc indent, quotes for strings, pipe and antislash in some txt
+                # classical gfdl matching pattern
+                my $normalgfdlpattern = qr/
+                 (?'rawcontextbefore'(?:
+                    (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
+                    (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)))
+                 gnu \s+ free \s+ documentation \s+ license
+                 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
+                 a \s+ copy \s+ of \s+ the \s+ license \s+ is
+                /xsmo;
+
+                # for first block we get context from the beginning
+                my $firstblockgfdlpattern = qr/
+                 (?'rawcontextbefore'(?:
+                    (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
+                  \A(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){0,1024}|
+                    (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)
+                  )
+                 )
+                 gnu \s+ free \s+ documentation \s+ license
+                 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
+                 a \s+ copy \s+ of \s+ the \s+ license \s+ is
+                 /xsmo;
+
+                my $gfdlpattern
+                  =  $blocknumber ? $normalgfdlpattern : 
+                    $firstblockgfdlpattern;
+
+                local *cleanmatch = sub {
+                    my $tocleanmatch = $_[0];
+
+                    # replace some common comment-marker/markup with space
+                    $tocleanmatch =~ s{(?:
+                      ^\.\\\"                      |  # man comments
+                      \@c(?:omment)?\s+            |  # Tex info comment
+                      \@var\{                      |  # Tex info emphasis
+                      \@(?:small)?example\s+       |  # Tex info example
+                      \@end\h+(?:small)?example\s+ |  # Tex info end (small) example tag
+                      \@group\s+                   |  # Tex info group
+                      \@end\h+group\s+             |  # Tex info end group    
+                      \}                           |  # Tex info end tag (could be more clever but brute force is fast)
+                      \"\s*,                       |  # String array (e.g. "line1",\n"line2")
+                      ,\s*\"                       |  # String array (e.g. "line1"\n ,"line2"), seen in findutils
+                      <br\s*/?>                    |  # (X)HTML line breaks
+                      </?link[^>]*?>               |  # xml link
+                      </?a[^>]*?>                  |  # a link
+                      </?citetitle[^>]*?>          |  # citation title in docbook
+                      </?div[^>]*?>                |  # html style
+                      </?p[^>]*?>                  |  # html paragraph
+                      </?var[^>]*?>                |  # var tag used by html from texinfo
+                      ^[-\+!<>]                    |  # diff/patch lines (should be after html tag)
+                      \(\*note.*?::\)              |  # info file note
+                      \\n                          |  # Verbatim \n in string array
+                      \s*[,\.;]\s*\Z               |  # final punctuation
+                      \A\s*[,\.;]\s*               |  # punctuation at the beginning
+                      (?:``|'')                    |  # quote like
+                      [%\*\"\|\\]                     # String, C-style comment/javadoc indent, 
+                                                      # quotes for strings, pipe and antislash in some txt
                     )}{ }gxms;
 
-                        # delete double spacing now and normalize spacing
-                        # to space character
-                        $gfdlsections =~ s{\s++}{ }gsm;
-                        strip($gfdlsections);
+                    # delete double spacing now and normalize spacing
+                    # to space character
+                    $tocleanmatch =~ s{\s++}{ }gsm;
+                    strip($tocleanmatch);
 
-                        # remove version information
-                        $gfdlsections =~ s/
-                            \A version \s \d+(?:\.\d+)? \s
-                            (?:or \s any \s later \s version \s)?
-                            published \s by \s the \s Free \s Software \s Foundation
-                            (?: \s? [,\.;])? \s?
-                            //xism;
+                    return $tocleanmatch;
+                };
+
+                if($cleanedblock =~ $gfdlpattern) {
+                    if (!exists $licenseproblemhash{'gfdl-invariants'}) {
+                        my $rawgfdlsections = $+{rawgfdlsections} || '';
+                        my $rawcontextbefore = $+{rawcontextbefore} || '';
+
+                        # replace some common comment-marker/markup with space
+                        my $gfdlsections = cleanmatch($rawgfdlsections);
+                        my $contextbefore = cleanmatch($rawcontextbefore);
+
+                        # remove the standard boilerplate part of the matched string
+                        $gfdlsections =~ s{
+                          \A version \s \d+(?:\.\d+)? \s
+                           (?:or \s any \s later \s version \s)?
+                           published \s by \s the \s Free \s Software \s Foundation
+                           \s?[,\.;]?\s?}{}xismo;
+                        $contextbefore =~ s{
+                          \s? (?:[,\.;]? \s?)?
+                           permission \s is \s granted \s to \s copy \s?[,\.;]?\s?
+                           distribute \s?[,\.;]?\s? and\s?/?\s?or \s modify \s
+                           this \s document \s under \s the \s terms \s of \s the\Z}
+                        {}xismo;
 
                         # GFDL license, assume it is bad unless it
                         # explicitly states it has no "bad sections".
@@ -627,18 +673,6 @@ sub find_cruft {
                                 $licenseproblemhash{'gfdl-invariants'} = 1;
                             }
                         } elsif (
-                            # fix a false positive in maintain.texi
-                            $gfdlsections =~ m/\A
-                            Following \s is \s an \s example \s of \s the \s license \s notice \s
-                            to \s use \s after \s the \s copyright \s line\(s\) \s using \s all \s the \s
-                            features \s of \s the \s GFDL/xiso
-                          ) {
-                            # allow only one text
-                            unless ($name =~ m/maintain/) {
-                                tag 'license-problem-gfdl-invariants',$name;
-                                $licenseproblemhash{'gfdl-invariants'} = 1;
-                            }
-                        } elsif (
                             $gfdlsections =~ m{
                             \A with \s the \s? <_: \s? link-\d+ \s? /> \s?
                             being \s list \s their \s titles \s?[,\.;]?\s?
@@ -653,12 +687,25 @@ sub find_cruft {
                                 $licenseproblemhash{'gfdl-invariants'} = 1;
                             }
                         } else {
-                            tag 'license-problem-gfdl-invariants', $name;
-                            $licenseproblemhash{'gfdl-invariants'} = 1;
+                            if (
+                                $contextbefore =~ m/
+                                  Following \s is \s an \s example
+                                  (?:\s of \s the \s license \s notice \s to \s use
+                                    (?:\s after \s+ the \s copyright (?:\s line\(s\))?
+                                      (?:\s using \s all \s the \s features? \s of \s the \s GFDL)?
+                                    )?
+                                  )? \s? [,:]?/xiso
+                              ) {
+                                # it is an example notice: do not emit the tag
+                            } else {
+                                tag 'license-problem-gfdl-invariants', $name;
+                                $licenseproblemhash{'gfdl-invariants'} = 1;
+                            }
                         }
                     }
                 }
             }
+            $blocknumber++;
         }
         close($F);
     }
diff --git a/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/maintain.html b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/maintain.html
new file mode 100644
index 0000000..2cb2af4
--- /dev/null
+++ b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/maintain.html
@@ -0,0 +1,28 @@
+<html>
+<body>
+<p>Documentation files should have license notices also.  Manuals should
+use the GNU Free Documentation License.  Following is an example of the
+license notice to use after the copyright line(s) using all the
+features of the GFDL.
+</p>
+<div class="smallexample">
+<pre class="smallexample">Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being ``GNU General Public License'', with the
+Front-Cover Texts being ``A GNU Manual'', and with the Back-Cover Texts
+as in (a) below.  A copy of the license is included in the section
+entitled ``GNU Free Documentation License''.
+
+(a) The FSF's Back-Cover Text is: ``You have the freedom to
+copy and modify this GNU manual.  Buying copies from the FSF
+supports it in developing GNU and promoting software freedom.''
+</pre></div>
+
+<p>If the FSF does not publish this manual on paper, then omit the last
+sentence in (a) that talks about copies from GNU Press.  If the FSF is
+not the copyright holder, then replace &lsquo;<samp>FSF</samp>&rsquo; with the appropriate
+name.
+</p>
+</body>
+</html>
\ No newline at end of file
diff --git a/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/partialtranslation.po b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/partialtranslation.po
new file mode 100644
index 0000000..dccb85b
--- /dev/null
+++ b/t/tests/cruft-gfdl-invariants/debian/src/oldfalsepositive/partialtranslation.po
@@ -0,0 +1,15 @@
+<para>För att använda GNU Free Documentation License för ett dokument du har skrivit, inkludera en kopia av licensen [det engelska originalet] i dokumentet och placera följande copyrightklausul omedelbart efter titelsidan:</para>
+
+<blockquote>
+  <para>
+	 Permission is granted to copy, distribute and/or modify this
+	document under the terms of the GNU Free Documentation
+	License, Version 1.1 or any later version published by the
+	Free Software Foundation; with the <link linkend="fdl-invariant">Invariant Sections</link> being LIST
+	THEIR TITLES, with the <link linkend="fdl-cover-texts">Front-Cover Texts</link> being LIST,
+	and with the <link linkend="fdl-cover-texts">Back-Cover
+	Texts</link> being LIST.  A copy of the license is included in
+	the section entitled <quote>GNU Free Documentation
+	License</quote>.
+      </para>
+</blockquote>
\ No newline at end of file
-- 
1.8.4.rc3

