[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1019980: lintian: source-is-missing check for HTML is much too sensitive



Package: lintian
Version: 2.115.3
Severity: normal

Lintian issues these errors for putty 0.77-1:

  E: putty source: source-is-missing [doc/html/AppendixA.html]
  E: putty source: source-is-missing [doc/html/AppendixB.html]
  E: putty source: source-is-missing [doc/html/AppendixE.html]
  E: putty source: source-is-missing [doc/html/Chapter10.html]
  E: putty source: source-is-missing [doc/html/Chapter2.html]
  E: putty source: source-is-missing [doc/html/Chapter3.html]
  E: putty source: source-is-missing [doc/html/Chapter4.html]
  E: putty source: source-is-missing [doc/html/Chapter5.html]
  E: putty source: source-is-missing [doc/html/Chapter7.html]
  E: putty source: source-is-missing [doc/html/Chapter8.html]
  E: putty source: source-is-missing [doc/html/Chapter9.html]
  E: putty source: source-is-missing [doc/html/IndexPage.html]

This is pretty oversensitive.  Firstly, it's HTML, which is still often
enough written by hand anyway.  As it happens, these particular HTML
files are generated from halibut input that's also provided in the
source package, though I can't see how Lintian could possibly expect to
know that.

I tried to work out whether I should be overriding this or whether it's
a bug in Lintian, and I think it's the latter.  The current relevant
code is this in lib/Lintian/Check/Files/SourceMissing.pm:

    sub visit_patched_files {
        my ($self, $item) = @_;
    
        return
          unless $item->is_file;
        [...]
        return
          if !defined $longest || $line_length{$longest} <= $VERY_LONG_LINE_LENGTH;
        [...]
    
        if ($item->basename =~ /\.(?:x?html?\d?|xht)$/i) {
    
            # html file
            $self->pointed_hint('source-is-missing', $item->pointer)
              unless $self->find_source($item, {'.fragment.js' => $DOLLAR});
        }
    
        return;
    }

So it issues a diagnostic for every HTML file with a somewhat long line
(over 512 characters) unless it has an associated .fragment.js somewhere
(I think - the find_source sub is undocumented and a bit obscure to me)?
That doesn't sound right - surely that would catch far too many false
positives.

Next, I went looking through git history to try to figure out where this
was introduced.  I found this commit:

  https://salsa.debian.org/lintian/lintian/-/commit/4f24ab7fca

The commit message makes it sound as though it was probably just
refactoring, but it wasn't.  The corresponding bit of code there was
previously in a warn_prebuilt_javascript sub called from a
warn_long_lines sub, which in turn was called in two places: once for
certain kinds of .js files, and once from this sub:

    # check javascript in html file
    sub check_html_cruft {
        my ($self, $item, $lowercase) = @_;
    
        my $blockscript = $lowercase;
        my $indexscript;
    
        while (($indexscript = index($blockscript, '<script')) > $ITEM_NOT_FOUND) {
    
            $blockscript = substr($blockscript,$indexscript);
    
            # sourced script ok
            if ($blockscript =~ m{\A<script\s+[^>]*?src="[^"]+?"[^>]*?>}sm) {
    
                $blockscript = substr($blockscript,$+[0]);
                next;
            }
    
            # extract script
            if ($blockscript =~ m{<script[^>]*?>(.*?)</script>}sm) {
    
                $blockscript = substr($blockscript,$+[0]);
    
                my $lcscript = $1;
                $self->check_js_script($item, $lcscript);
    
                return 0
                  if $self->warn_long_lines($item, $lcscript);
    
                next;
            }
    
            # here we know that we have partial script. Do the check nevertheless
            # first check if we have the full <script> tag and do the check
            # if we get <script src="  "
            # then skip
            if ($blockscript =~ /\A<script[^>]*?>/sm) {
    
                $blockscript = substr($blockscript,$+[0]);
                $self->check_js_script($item, $blockscript);
            }
    
            return 0;
        }
    
        return 1;
    }

This made much more sense!  I could get on board with issuing a
diagnostic for <script> tags in HTML files that look like unminified
JavaScript, and that appears to be what this check was originally meant
to do.  Unfortunately, it looks like that extra logic was dropped in
this "Further rationalize cruft check; separate concerns" commit, and
now we have a very much broader check on HTML files with no indication
that this change was intentional.  Something like this <script> check is
still present in Lintian, but in a different context, and it's no longer
used for the source-is-missing check.

I suggest restoring something like this code to check for <script> tags
around the source-is-missing check for HTML files.  I suspect that this
might also deal with reports such as #1017094 and #1017966, though I've
filed this separately as I'm not sure of that.

Thanks,

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]


Reply to: