[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#962601: manpage-section-mismatch doesn't take into account manpages with multiple binaries



On Wednesday, July 08 2020, Felix Lechner wrote:

> Hi Sergio,

Hey Felix,

> On Wed, Jun 10, 2020 at 9:57 AM Sergio Durigan Junior
> <sergiodj@debian.org> wrote:
>>
>> after calling Text::ParseWords::parse_line, we check to
>> see if the first package name has a comma as the last char.  If it does,
>> then we assume that there will be at least one other package name
>> listed, and advance an index.  When we reach a package name whose last
>> char is not a comma, we then assume that the next field is the manpage
>> section number.
>
> Something in that patch is not quite working. I previously added a
> safeguard for an undefined value warning, but that was not enough:
>
>     https://salsa.debian.org/lintian/lintian/-/commit/d3c64d8ab40de6e38c96334e2515550df1957a4a
>
> In an archive-wide run, the modified patch still produced the warnings
> below. I show the complete list for the record, and not to intimidate
> anyone. It's no big deal.
>
> You may want to check out kde-dev-scripts, which generated a lot of warnings.

Ouch, thanks for the report, and sorry about the breakage.  My Perl-foo
is very limited.

So, I think I found the problem, and I have a possible solution.
Apparently, some manpages have malformed .TH headers, and Perl's
Text::ParseWords::parse_line doesn't cope well with them.  For example,
kde-dev-scripts's /usr/share/man/ca/man1/create_makefiles.1.gz file has:

.TH "\FBCREATE_MAKEFILES\" "1" "8 de mar\(,c del 2003" "[FIXME: source]" "[FIXME: manual]"

Not pretty, huh?  If we make simple Perl program to try to parse this
line:

  use Text::ParseWords;
  @words = parse_line('\s+', 0, q{.TH "\FBCREATE_MAKEFILES\" "1" "8 de mar\(,c del 2003" "[FIXME: source]" "[FIXME: manual]"});
  $i = 0;
  foreach (@words) {
      print "$i: <$_>\n";
      $i++;
  }

and run it, you will noticed that it doesn't return anything!  Now, if
we tweak the line a little bit, by removing some of the backslashes for
example, we start getting somewhere:

  ...
  @words = parse_line('\s+', 0, q{.TH "FBCREATE_MAKEFILES" "1" "8 de mar\(,c del 2003" "[FIXME: source]" "[FIXME: manual]"});
  ...

Now run it:

  $ perl parseline.pl 
  0: <.TH>
  1: <FBCREATE_MAKEFILES>
  2: <1>
  3: <8 de mar(,c del 2003>
  4: <[FIXME: source]>
  5: <[FIXME: manual]>

So yeah, there's a problem here.  I honestly don't feel like spending
too much time investigating Perl's internals, so I think it's possible
to detect when parse_line failed and act accordingly.

I'm attaching a patch that does just that, and prevents the
warnings/failures from happening.  The idea is to check whether the size
of the @th_fields array is bigger than 0, and just perform the checks if
they are.  I also took the liberty to remove the // EMPTY part, because
it shouldn't be necessary anymore.

What do you think?

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
https://sergiodj.net/

diff --git a/checks/documentation/manual.pm b/checks/documentation/manual.pm
index b30bf6081..350dee927 100644
--- a/checks/documentation/manual.pm
+++ b/checks/documentation/manual.pm
@@ -297,21 +297,23 @@ sub files {
             next if $line =~ /^\.\\\"/; # comments .\"
             if ($line =~ /^\.TH\s/) { # header
                 my @th_fields= Text::ParseWords::parse_line('\s+', 0, $line);
-                my $pkgname_idx = 1;
-                # Iterate over possible package names.  If there is
-                # more than one, they will be separated by a comma and
-                # a whitespace.  In case we find the comma, we advance
-                # $pkgname_idx.
-                while ((substr($th_fields[$pkgname_idx], -1) // EMPTY) eq ','){
-                    $pkgname_idx++;
-                }
-                # We're now at the last package, so we should be able
-                # to obtain the manpage section number by incrementing
-                # 1 to the index.
-                my $th_section = $th_fields[++$pkgname_idx];
-                if ($th_section && (lc($fn_section) ne lc($th_section))) {
-                    $self->tag('wrong-manual-section',
-                        "$file:$lc $fn_section != $th_section");
+                if ($#th_fields > 0) {
+                    my $pkgname_idx = 1;
+                    # Iterate over possible package names.  If there is
+                    # more than one, they will be separated by a comma and
+                    # a whitespace.  In case we find the comma, we advance
+                    # $pkgname_idx.
+                    while ((substr($th_fields[$pkgname_idx], -1)) eq ','){
+                        $pkgname_idx++;
+                    }
+                    # We're now at the last package, so we should be able
+                    # to obtain the manpage section number by incrementing
+                    # 1 to the index.
+                    my $th_section = $th_fields[++$pkgname_idx];
+                    if ($th_section && (lc($fn_section) ne lc($th_section))) {
+                        $self->tag('wrong-manual-section',
+                                   "$file:$lc $fn_section != $th_section");
+                    }
                 }
             }
             if (   ($line =~ m,(/usr/(dict|doc|etc|info|man|adm|preserve)/),)

Attachment: signature.asc
Description: PGP signature


Reply to: