
Bug#778955: lintian: suggest check html <img>s included in package



Package: lintian
Version: 2.5.30+deb8u3
Severity: wishlist
Tags: patch

If a .html file is in a package then usually its <img> files should be
in the package too so it displays nicely.  I suggest the few lines below
to check this.

Without picking on any particular maintainers, missing images can be
found in, for example:
* whizzytex where /usr/share/doc/whizzytex/whizzytex.html is missing
  whizzytex001.png (and two others)
* texlive-pictures-doc (very big) where
  /usr/share/doc/texlive-doc/latex/mathspic/sourcecode113.html is
  missing a fig1.jpg deep in its detailed description

I'm unsure whether my code notices images supplied by packages that the
checked package depends on.  I added a group lookup like the manpages
and symlinks checks have, but I don't really understand when packages
form a group.  E.g. per the html.pm comments, texlive-lang-french uses
images from texlive-base and correctly declares the dependency, but I
couldn't find the right incantation to have it recognised :-(.

Incidentally, HTML::Parser would of course give a more reliable HTML
parse.  But are lintian dependencies supposed to be kept down?  I see
another rough HTML parse in files.pm for privacy breaches.  A proper
parse might improve accuracy there against obscure quoting or escaping.
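
For comparison, here is a minimal sketch of what an HTML::Parser based
extraction might look like, assuming libhtml-parser-perl is acceptable
as a dependency.  The function name media_src_targets is made up for
illustration and is not part of the proposed check.  The sample input
includes a javascript chunk of the kind that confuses a rough regexp.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper, not part of the proposed check: return the src
# attribute of every <img> and <video> start tag in an HTML string.
sub media_src_targets {
    my ($html) = @_;
    my @targets;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # With the default settings, tagname and the attr hash keys
        # are lower-cased and attribute values are entity-decoded.
        start_h => [
            sub {
                my ($tagname, $attr) = @_;
                return unless $tagname eq 'img' || $tagname eq 'video';
                push @targets, $attr->{src} if defined $attr->{src};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return @targets;
}

# Sample input: the <script> content is handed to the parser as text,
# so the quoted "<img" inside it should not produce a start tag event.
my @found = media_src_targets(<<'HTML');
<script>document.write("<img src='" + name + "'>");</script>
<p><IMG SRC="foo.png"> <img alt=x src=bar/baz.jpg></p>
HTML
print "$_\n" for @found;
```

The filtering on ":" and "&" and the path resolution would stay as in
the regexp version; only the tag and attribute extraction changes.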

I made this a separate html.pm script to leave room for other checks
based on an HTML parse (by whatever method).  Maybe similar treatment of
CSS or JavaScript (though I don't rate those highly), or even some href
checking.  Not a full link checker, just detection of document parts
apparently missing from a package.
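
As a rough illustration of how the same "is it in the package" test
might extend to href or other link-like attributes, here is a
stand-alone sketch that decides which targets are worth checking and
resolves them against the directory of the HTML file.  The name
resolve_local_target is hypothetical, the handling of fragments,
queries and absolute paths is my assumption, and the path code is a
plain-Perl stand-in rather than Lintian's normalize_pkg_path.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper.  Given the package-relative directory of an HTML
# file and a src/href value, return the package-relative path to look
# up, or undef if the target should not be checked (external, fragment
# only, or escaping the package root).
sub resolve_local_target {
    my ($dirname, $target) = @_;

    # Same rough rule as the image check: a ':' suggests a URL scheme
    # and an '&' suggests misparsed literal text.
    return if $target =~ /[:&]/;

    # Drop any #fragment or ?query, they don't name a file.
    $target =~ s/[#?].*//s;
    return if $target eq '';

    # Assumption: absolute targets are taken from the package root.
    my $path = $target =~ m{^/} ? $target : "$dirname/$target";

    my @parts;
    foreach my $part (split m{/+}, $path) {
        next if $part eq '' || $part eq '.';
        if ($part eq '..') {
            return unless @parts;    # would escape the package root
            pop @parts;
            next;
        }
        push @parts, $part;
    }
    return join '/', @parts;
}

say resolve_local_target('usr/share/doc/foo/', 'img/a.png') // 'skip';
say resolve_local_target('usr/share/doc/foo/', 'http://example.org/a.png')
  // 'skip';
```

The result would then be looked up in the package index in the same
way as an image target.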

# html -- lintian check script

# Copyright 2015 Kevin Ryde
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 2 of the License, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program.  If not go to <http://www.gnu.org/licenses/>.


# ENHANCE-ME: snd-doc /usr/share/doc/snd-doc/HTML/manual/snd-contents.html
# has a javascript chunk in the <head> which tricks the rough regexp below
# into reporting src='".  HTML::Parser could likely do a better job.
#
# ENHANCE-ME: texlive-lang-french
# /usr/share/doc/texlive-doc/texlive/texlive-fr/texlive-fr.html has
# src="../texlive-common/install-lnx-main.png" which is in its declared
# dependency texlive-base but they're different source packages.  Will they
# show up in $ginfo->direct_dependencies($proc)?  (If so then amend the note
# in html.desc, if not then try something for arbitrary dependencies.)
# 

package Lintian::html;
use 5.010;
use strict;
use warnings;

use Lintian::Tags qw(tag);
use Lintian::Util qw(normalize_pkg_path);

use File::Basename qw(fileparse);

sub run {
my (undef, undef, $info, $proc, $group) = @_;

# Read each HTML file in the package...
foreach my $file ($info->sorted_index) {
    next unless $file =~ /\.html?$/i && $file->is_file;
    my ($basename, $dirname) = fileparse($file);

    my $str = $file->file_contents;
    while ($str =~ /<([^>]+)>/g) {
        my $body = $1;
        $body =~ /^(img|video)\b/i or next;
        #                     $1   $2       $3
        $body =~ /\bsrc\s*=\s*(['"]([^"']+)|([^ \t\r\n>]+))/i or next;
        my $target = $2 // $3;
        # <img src="foo.png"> results in $target="foo.png"

        # Skip anything external http: etc with a :
        # Skip anything with an & as probably literal text which the rough
        # parse has misinterpreted
        next if $target =~ /[:&]/;

        # If $target is relative then resolve against $dirname of the html.
        my $target_fullname = normalize_pkg_path($dirname, $target);

        if (! target_exists($info, $proc, $group, $target_fullname)) {
            tag 'html-missing-image-file', $file, $target;
        }
    }
}

return;
}

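# Return true if $target_fullname is in this package, or is a file in
# one of its direct dependencies within the same group.
sub target_exists {
    my ($info, $proc, $group, $target_fullname) = @_;

    # normalize_pkg_path() may give undef when the path cannot be
    # normalised, eg. too many ".." components.  Treat that as missing.
    return 0 if not defined $target_fullname;

    if ($info->index_resolved_path($target_fullname)) {
        return 1;
    }

    # Check our dependencies:
    my $ginfo = $group->info;
    my $deps = $ginfo->direct_dependencies($proc);
    foreach my $depproc (@{$deps}) {
        my $dep_info = $depproc->info;
        my $f = $dep_info->index_resolved_path($target_fullname);
        if ($f && $f->is_file) {
            return 1;
        }
    }

    return 0;
}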

1;

# Local Variables:
# indent-tabs-mode: nil
# cperl-indent-level: 4
# End:
# vim: syntax=perl sw=4 sts=4 sr et

Check-Script: html
Type: binary
Needs-Info: unpacked, file-info
Info: This script checks HTML file content.

Tag: html-missing-image-file
Severity: normal
Certainty: possible
Info: This HTML file has an &lt;img&gt; whose file is not in the package.
 Generally an HTML file in a package should have its image files
 packaged too, and in the right place.
 .
 If an image is only eye candy then missing it doesn't matter very
 much, but the aim would still be to have the packaged page look good.
 If an image is something important like a technical diagram then
 missing it might make the HTML almost useless.
 .
 If a logo or similar is not freely redistributable then it will be
 deliberately omitted.  Lintian can't distinguish that from mistaken
 omission.
 .
 If some HTML is a template then its links might not exist yet.
 Lintian can't distinguish that from links that ought to have been
 filled in by a configure script etc.  The suggestion would be to
 ignore reports on templates or add lintian overrides.
 .
 Beware absolute paths like src="/foo.png".  This is common in HTML
 written for a web site but fails when copied elsewhere like a Debian
 package.  Relative links are more helpful so that a document is
 displayable from under a different mount point etc.
 .
 Images supplied by a package this one depends on might give false
 positives.  Packages from the same source should work if checked as a
 group.

-- System Information:
Debian Release: 8.0
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 3.16.0-4-686-pae (SMP w/1 CPU core)
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages lintian depends on:
ii  binutils                       2.25-4
ii  bzip2                          1.0.6-7+b2
ii  diffstat                       1.58-1
ii  file                           1:5.22+15-1
ii  gettext                        0.19.3-2
ii  hardening-includes             2.7
ii  intltool-debian                0.35.0+20060710.1
ii  libapt-pkg-perl                0.1.29+b2
ii  libarchive-zip-perl            1.39-1
ii  libclass-accessor-perl         0.34-1
ii  libclone-perl                  0.37-1+b1
ii  libdpkg-perl                   1.17.23
ii  libemail-valid-perl            1.195-1
ii  libfile-basedir-perl           0.03-1
ii  libipc-run-perl                0.92-1
ii  liblist-moreutils-perl         0.33-2+b1
ii  libparse-debianchangelog-perl  1.2.0-1.1
ii  libtext-levenshtein-perl       0.11-1
ii  libtimedate-perl               2.3000-2
ii  liburi-perl                    1.64-1
ii  man-db                         2.7.0.2-5
ii  patchutils                     0.3.3-1
ii  perl [libdigest-sha-perl]      5.20.1-5
ii  t1utils                        1.38-3

Versions of packages lintian recommends:
ii  libperlio-gzip-perl             0.18-3+b1
ii  perl                            5.20.1-5
ii  perl-modules [libautodie-perl]  5.20.1-5

Versions of packages lintian suggests:
pn  binutils-multiarch     <none>
ii  dpkg-dev               1.17.23
ii  libhtml-parser-perl    3.71-1+b3
ii  libtext-template-perl  1.46-1
ii  libyaml-perl           1.13-1
ii  xz-utils               5.1.1alpha+20120614-2+b3

-- no debconf information

-- debsums errors found:
debsums: changed file /usr/share/lintian/profiles/debian/main.profile (from lintian package)
