
Re: Fix and improve reference links (was: GSoC status: classification, output format and more)



On Sat, Aug 02, 2008 at 09:28:25PM +0200, Jordà Polo wrote:
> The status of links on lintian.d.o has been bothering me for a while as
> well. lib/manual_refs is terribly outdated and manual_refs_update.pl is
> broken, so I tried to fix the problems (and implement some new features
> as well).

I have updated the patch, since it no longer applied cleanly after
9c27064d66, which introduced changes that my first patch had largely
implemented already.

I made a few more improvements, but tried to keep the first four patches
unchanged, since you may have already reviewed them. The updated patches
are attached and should apply cleanly again. The only new split is the
regeneration of manual_refs, which is now a separate patch [1] and, for
obvious reasons, is not attached.

 1. http://ettin.org/tmp/lintian/manual-refs/v2/0007-Regenerate-manual-references-file.patch
From 1c0d1a13333c7516ce5a9bc90f70fe30b7d95ba9 Mon Sep 17 00:00:00 2001
From: Jordà Polo <jorda@ettin.org>
Date: Fri, 1 Aug 2008 07:46:23 +0200
Subject: [PATCH] Standardize Debian Menu manual references
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.5.6.3"

This is a multi-part message in MIME format.
--------------1.5.6.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

---
 checks/menu-format.desc |   10 +++++-----
 checks/menus.desc       |    2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)
--------------1.5.6.3
Content-Type: text/x-patch; name="1c0d1a13333c7516ce5a9bc90f70fe30b7d95ba9.diff"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="1c0d1a13333c7516ce5a9bc90f70fe30b7d95ba9.diff"

diff --git a/checks/menu-format.desc b/checks/menu-format.desc
index b7845fe..0b4a0ce 100644
--- a/checks/menu-format.desc
+++ b/checks/menu-format.desc
@@ -42,7 +42,7 @@ Info: This menu item doesn't test to see if the package containing it is
  This error usually indicates a misspelling of the package name in the
  menu entry or a copied menu entry from another package that doesn't apply
  to this one.
-Ref: menu manual 3.2
+Ref: menu 3.2
 
 Tag: duplicated-tag-in-menu-item
 Type: warning
@@ -114,7 +114,7 @@ Info: The menu item has a line that specifies a new section to put a menu
 Tag: menu-icon-not-in-xpm-format
 Type: error
 Info: Icons in the Debian menu system should be in XPM format.
-Ref: menu manual 3.7
+Ref: menu 3.7
 
 Tag: menu-icon-missing
 Type: warning
@@ -125,13 +125,13 @@ Info: This icon file couldn't be found.  If the path to the icon in the
  .
  If the icon is in a package this package depends on, add a lintian
  override for this warning.  lintian cannot check icons in other packages.
-Ref: menu manual 3.7
+Ref: menu 3.7
 
 Tag: menu-icon-too-big
 Type: error
 Info: Icons in the Debian menu system should be at most 32x32 pixels
  (icon16x16 icons should of course be at most 16x16 pixels)
-Ref: menu manual 3.7
+Ref: menu 3.7
 
 Tag: menu-icon-cannot-be-parsed
 Type: warning
@@ -188,7 +188,7 @@ Tag: unquoted-string-in-menu-item
 Type: warning
 Info: The menu item includes a tag with an unquoted string like section=Games
  instead of section="Games". This is deprecated. Use a quoted string instead.
-Ref: menu manual 3.2
+Ref: menu 3.2
 
 Tag: menu-command-not-in-package
 Type: warning
diff --git a/checks/menus.desc b/checks/menus.desc
index 6aff4f2..2fe99de 100644
--- a/checks/menus.desc
+++ b/checks/menus.desc
@@ -312,4 +312,4 @@ Tag: menu-method-should-include-menu-h
 Type: error
 Info: A menu-method file must include the menu.h configuration file
  (using "!include menu.h").
-Ref: Debian Menu System manual section 5
+Ref: menu 5

--------------1.5.6.3--


From 53d7e53ab98039523a20c74061618b45a4787364 Mon Sep 17 00:00:00 2001
From: Jordà Polo <jorda@ettin.org>
Date: Fri, 1 Aug 2008 09:12:39 +0200
Subject: [PATCH] Rewrite manual reference generator
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.5.6.3"

This is a multi-part message in MIME format.
--------------1.5.6.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


This rewrite of the script introduces two major changes. First, the regexes
can now be specified per manual, which makes non-DebianDoc manuals possible.
Second, the output is now a '::'-separated list (since some URLs contain
blank spaces) and also includes the title of each reference.

The script has also been simplified: it no longer tries to "merge" reference
files, but simply collects the references and prints the result.

All previously available manuals (policy, devref, menu) have been converted,
and a new manual (FHS) has been added.
---
 private/manual_refs_update.pl |  120 ++++++++++++++++++++---------------------
 1 files changed, 58 insertions(+), 62 deletions(-)
--------------1.5.6.3
Content-Type: text/x-patch; name="53d7e53ab98039523a20c74061618b45a4787364.diff"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="53d7e53ab98039523a20c74061618b45a4787364.diff"

diff --git a/private/manual_refs_update.pl b/private/manual_refs_update.pl
index a329da0..fe647be 100755
--- a/private/manual_refs_update.pl
+++ b/private/manual_refs_update.pl
@@ -1,6 +1,7 @@
 #!/usr/bin/perl -w
 
-# Copyright (C) 2001 Colin Watson
+# Copyright © 2001 Colin Watson
+# Copyright © 2008 Jordà Polo
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -18,87 +19,82 @@
 # Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
 # MA 02110-1301, USA.
 
-# Invoke as ./manual_refs_update.pl manual_refs > manual_refs.new
+# Invoke as ./manual_refs_update.pl > manual_refs.new
 # You need copies of all the relevant manuals installed in the standard
 # places locally.
 
-# Currently, this is only likely to work with the HTML output by
-# DebianDoc-SGML. This seems to be OK for all the necessary manuals for now.
-
 use strict;
 
-# Location of the manual directory on the local filesystem, and base URL for
-# the eventual target of the reference.
+# For each manual, we need:
+#  * Location of the manual directory on the local filesystem
+#  * Base URL for the eventual target of the reference
+#  * Regex to match the title
+#  * Regex to match the possible references
+#  * Mapping from regex fields to reference fields
+
+my $ddoc_title = '<title>(.+?)<\/title>';
+my $ddoc_ref = '<a href="(.+?)">([A-Z]|[A-Z]?[\d\.]+?)\.?\s+'.
+               '([\w\s[:punct:]]+?)<\/a>';
+my @ddoc_fields = [ [ 'url' ], [ 'section' ], [ 'title' ] ];
 
 my %manuals = (
-    'policy'    => [ '/usr/share/doc/debian-policy/policy.html',
-                     'http://www.debian.org/doc/debian-policy' ],
-    'devref'    => [ '/usr/share/doc/developers-reference/' .
-                        'developers-reference.html',
-                     'http://www.debian.org/doc/packaging-manuals/' .
-                        'developers-reference' ],
-    'menu'      => [ '/usr/share/doc/menu/html',
-                     'http://www.debian.org/doc/packaging-manuals/menu.html' ],
+    'policy' => [ '/usr/share/doc/debian-policy/policy.html/index.html',
+                  'http://www.debian.org/doc/debian-policy/',
+                  $ddoc_title, $ddoc_ref, @ddoc_fields ],
+    'devref' => [ '/usr/share/doc/developers-reference/index.html',
+                  'http://www.debian.org/doc/developers-reference/',
+                  $ddoc_title, $ddoc_ref, @ddoc_fields ],
+    'menu'   => [ '/usr/share/doc/menu/html/index.html',
+                  'http://www.debian.org/doc/packaging-manuals/menu.html/',
+                  $ddoc_title, $ddoc_ref, @ddoc_fields ],
+    'fhs'    => [ '/usr/share/doc/debian-policy/fhs/fhs-2.3.html',
+                  'http://www.pathname.com/fhs/pub/fhs-2.3.html',
+                  '<title\s?>(.+?)<\/title\s?>',
+                  '<a\s+href="(#.+?)"\s?>([\w\s[:punct:]]+?)<\/a\s?>',
+                  [ [ 'section', 'url' ], [ 'title'] ] ],
 );
 
-my %refs;
+# Collect all possible references from avilable manuals.
 
 for my $manual (keys %manuals) {
-    my ($dir, $url) = @{$manuals{$manual}};
-    my @chapter_refs;
+    my ($index, $url, $title_re, $ref_re, $fields) = @{$manuals{$manual}};
+    my $title = 0;
 
-    unless (-d $dir) {
+    unless (-f $index) {
         print STDERR "Manual '$manual' not installed; not updating.\n";
         next;
     }
-    $refs{$manual} = [ "$manual $url/index.html" ];
 
-    local *DIR;
-    opendir DIR, $dir or die "Couldn't open $dir: $!";
-    while (defined(my $file = readdir DIR)) {
-        next unless -f "$dir/$file";
-        my $chapter;
-        local *FILE;
-        open FILE, "< $dir/$file" or
-            die "Couldn't open $dir/$file: $!";
-        while (<FILE>) {
-            if (m/^Chapter (\d+)/ and not defined $chapter) {
-                $chapter = $1;
-                push @{$chapter_refs[$chapter]}, "$manual-$1 $url/$file";
-            }
-            elsif (m/<a name="(.+?)">(\d.*?) /) {
-                if (defined $chapter) {
-                    push @{$chapter_refs[$chapter]},
-                         "$manual-$2 $url/$file#$1";
-                } else {
-                    print STDERR "No 'Chapter' line in $dir/$file; ",
-                                 "ignoring this file.\n";
-                    next;
+    open(INDEX, "$index") or die "Couldn't open $index: $!";
+
+    # Read until there are 2 newlines. This hack is needed since some lines in
+    # the Developer's Reference are cut in the middle of <a>...</a>.
+    local $/ = "\n\n";
+
+    while (<INDEX>) {
+        if (not $title and m/$title_re/i) {
+            $title = 1;
+            my @out = ( $manual, '', $1, $url );
+            print join('::', @out) . "\n";
+        }
+        while (m/$ref_re/gi) {
+            my %ref;
+            for(my $i = 0; $i < scalar @{$fields}; $i++) {
+                foreach my $c (@{$fields->[$i]}) {
+                    my $v = $i + 1;
+                    $ref{$c} = eval '$' . $v;
                 }
             }
+
+            $ref{section} =~ s/^\#(.+)$/\L$1/;
+            $ref{title} =~ s/\n//g;
+            $ref{url} = "$url$ref{url}";
+            my @out = ( $manual, $ref{section}, $ref{title}, "$ref{url}" );
+            print join('::', @out) . "\n";
         }
-        close FILE;
     }
-    closedir DIR;
 
-    for my $chapter_ref (@chapter_refs) {
-        next unless defined $chapter_ref;
-        push @{$refs{$manual}}, @$chapter_ref;
-    }
+    close(INDEX);
 }
 
-# Replace all lines for manuals for which we have up-to-date information.
-
-my %seen;
-
-while (<>) {
-    next unless m/^(\w+)/;
-    my $manual = $1;
-    next if $seen{$manual};
-    if (exists $manuals{$manual} and exists $refs{$manual}) {
-        $seen{$manual} = 1;
-        print join("\n", @{$refs{$manual}}), "\n";
-    } else {
-        print;
-    }
-}
+# vim: sw=4 sts=4 ts=4 et sr

--------------1.5.6.3--

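For illustration, the '::'-separated output described in the "Rewrite manual reference generator" patch can be read back by matching exactly four fields per line, with an empty section standing for the manual as a whole. This is a minimal standalone sketch under that assumption: the sample lines (titles and URLs) are made up, and the nested-hash layout borrows the $refs{manual}{section} shape used by the updated lib/Manual_refs.pm.

```perl
#!/usr/bin/perl
# Sketch: read lines in the '::'-separated manual_refs format
# (manual::section::title::url) into a nested hash.
use strict;
use warnings;

# Made-up sample lines; the titles and URLs are illustrative only.
my @lines = (
    'policy::::Debian Policy Manual::http://www.debian.org/doc/debian-policy/',
    'policy::12.5::Example section title::http://www.debian.org/doc/debian-policy/ch-example.html#s12.5',
    'fhs::usrsharemanmanualpages::Example FHS title::http://www.pathname.com/fhs/pub/fhs-2.3.html#EXAMPLE',
    'this line is not a reference and is skipped',
);

my %refs;
for my $line (@lines) {
    # Four fields; only the section may be empty (the manual's own entry).
    next unless $line =~ m/^(.+?)::(.*?)::(.+?)::(.+)$/;
    my ($man, $section, $title, $url) = ($1, $2, $3, $4);
    $section = '0' if $section eq '';
    $refs{$man}{$section} = { title => $title, url => $url };
}

# Print one summary line per collected reference.
for my $man (sort keys %refs) {
    for my $section (sort keys %{ $refs{$man} }) {
        printf "%s %s => %s\n", $man, $section, $refs{$man}{$section}{url};
    }
}
```

Since URLs such as "http://..." never contain "::", the lazy matches pick the field boundaries unambiguously as long as titles avoid "::" too.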

From eb2d0a4f0d987a9359988936bdf3713273fd153c Mon Sep 17 00:00:00 2001
From: Jordà Polo <jorda@ettin.org>
Date: Fri, 1 Aug 2008 13:07:24 +0200
Subject: [PATCH] Use IDs for FHS references
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.5.6.3"

This is a multi-part message in MIME format.
--------------1.5.6.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

---
 checks/binaries.desc |    4 ++--
 checks/files.desc    |    8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)
--------------1.5.6.3
Content-Type: text/x-patch; name="eb2d0a4f0d987a9359988936bdf3713273fd153c.diff"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="eb2d0a4f0d987a9359988936bdf3713273fd153c.diff"

diff --git a/checks/binaries.desc b/checks/binaries.desc
index 6a84e3e..ef70c4a 100644
--- a/checks/binaries.desc
+++ b/checks/binaries.desc
@@ -98,13 +98,13 @@ Info: The listed shared library doesn't include information about which
 
 Tag: arch-dependent-file-in-usr-share
 Type: error
-Ref: fhs 4.11
+Ref: fhs usrsharearchitectureindependentdata
 Info: This package installs an ELF binary in the <tt>/usr/share</tt>
  hierarchy, which is reserved for architecture-independent files.
 
 Tag: binary-in-etc
 Type: error
-Ref: fhs 3.7
+Ref: fhs etchostspecificsystemconfiguration
 Info: This package installs an ELF binary in <tt>/etc</tt>.
  The Filesystem Hierarchy Standard forbids this.
 
diff --git a/checks/files.desc b/checks/files.desc
index f7f3245..4939170 100644
--- a/checks/files.desc
+++ b/checks/files.desc
@@ -146,7 +146,7 @@ Type: error
 Info: This package installs a directory under <tt>/usr/share/man</tt> or
  <tt>/usr/X11R6/man</tt> that isn't a manual section directory or locale
  directory.
-Ref: fhs 4.11.5
+Ref: fhs usrsharemanmanualpages
 
 Tag: executable-manpage
 Type: error
@@ -377,7 +377,7 @@ Info: Documentation files should be owned by <tt>root/root</tt>.
 
 Tag: dir-or-file-in-var-www
 Type: error
-Ref: fhs 5
+Ref: fhs thevarhierarchy
 Info: Debian packages should not install files under <tt>/var/www</tt>.
  This is not one of the <tt>/var</tt> directories in the File Hierarchy
  Standard and is under the control of the local administrator.  Packages
@@ -410,7 +410,7 @@ Info: Debian packages should not install into <tt>/opt</tt>, because it
 
 Tag: dir-or-file-in-srv
 Type: error
-Ref: fhs 3
+Ref: fhs therootfilesystem
 Info: Debian packages should not install into <tt>/srv</tt>.  The
  specification of <tt>/srv</tt> states that its structure is at the
  discretion of the local administrator and no package should rely on any
@@ -688,7 +688,7 @@ Ref: policy 10.4
 
 Tag: file-in-usr-lib-sgml
 Type: warning
-Ref: fhs 4
+Ref: fhs theusrhierarchy
 Info: This package installs a file in <tt>/usr/lib/sgml</tt>.  This was
  the old location for SGML catalogs and similar flies.  All those files
  should now go into <tt>/usr/share/sgml</tt>.

--------------1.5.6.3--


From f6bfe231111f5f62e889d27bbec87f9ba2ff48fd Mon Sep 17 00:00:00 2001
From: Jordà Polo <jorda@ettin.org>
Date: Wed, 6 Aug 2008 21:56:05 +0200
Subject: [PATCH] Implement support for new manual references
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.5.6.3"

This is a multi-part message in MIME format.
--------------1.5.6.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


This change not only makes it possible to display titles in addition to
chapters/sections, but it also allows IDs in references (e.g. Ref: fhs
usrsharemanmanualpages).
---
 lib/Manual_refs.pm  |   20 +++++-------
 lib/Read_taginfo.pm |   84 ++++++++++++++++++++++++++++----------------------
 2 files changed, 55 insertions(+), 49 deletions(-)
--------------1.5.6.3
Content-Type: text/x-patch; name="f6bfe231111f5f62e889d27bbec87f9ba2ff48fd.diff"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="f6bfe231111f5f62e889d27bbec87f9ba2ff48fd.diff"

diff --git a/lib/Manual_refs.pm b/lib/Manual_refs.pm
index 2e19012..5ea2e40 100644
--- a/lib/Manual_refs.pm
+++ b/lib/Manual_refs.pm
@@ -18,15 +18,7 @@
 
 use strict;
 
-# define hash for manuals
-my %manual =
-(
- 'policy' => 'Policy Manual',
- 'devref' => 'Developers Reference',
- 'fhs'    => 'FHS',
-);
-
-my %url;
+our %refs;
 
 my $lib = defined $ENV{LINTIAN_ROOT} ?  "$ENV{LINTIAN_ROOT}/" : "";
 
@@ -35,12 +27,16 @@ open (REFS, '<', "${lib}lib/manual_refs")
 
 while(<REFS>) {
     chomp;
-    next if m/^\s*\#/;
+    next if not m/^(.+?)::(.*?)::(.+?)::(.+?)$/;
 
-    my ($key, $data) = split;
-    $url{$key} = $data;
+    my ($man, $section, $title, $u) = split(/::/);
+    $section = '0' if $section eq "";
+    $refs{$man}{$section}{title} = $title;
+    $refs{$man}{$section}{url} = $u;
 }
 
 close REFS;
 
 1;
+
+# vim: sw=4 sts=4 ts=4 et sr
diff --git a/lib/Read_taginfo.pm b/lib/Read_taginfo.pm
index 70fff22..4e6e55c 100644
--- a/lib/Read_taginfo.pm
+++ b/lib/Read_taginfo.pm
@@ -26,17 +26,10 @@ use lib "$ENV{'LINTIAN_ROOT'}/lib";
 use Util;
 use Text_utils;
 use Manual_refs;
-use vars qw(%url); # from the above
+use vars qw(%refs); # from the above
 
 use strict;
 
-# define hash for manuals
-my %manual = (
-	      'policy' => 'Policy Manual',
-	      'devref' => 'Developers Reference',
-	      'fhs' => 'FHS',
-	     );
-
 srand;
 
 # load information about checker scripts
@@ -83,42 +76,59 @@ sub read_tag_info {
     return \%tag_info;
 }
 
-sub format_ref {
-    my ($ref) = @_;
+sub manual_ref {
+    my ($man, $sub) = @_;
+    my $numbered = ($sub =~ /[A-Z\d\.]+/) ? 1 : 0;
+    my $chapter = ($sub =~ /^[\d]+$/) ? 1 : 0;
+    my $appendix = ($sub =~ /^[A-Z]+$/) ? 1 : 0;
 
-    my @foo = split(/\s*,\s*/o,$ref);
-    my $u;
-    for ($u=0; $u<=$#foo; $u++) {
-	if ($foo[$u] =~ m,^\s*(policy|devref|fhs)\s*([\d\.]+)?\s*$,oi) {
-	    my ($man,$sec) = ($1,$2);
+    return "" if not exists $refs{$man}{0};
 
-	    $foo[$u] = $manual{lc $man};
+    my $man_title = $refs{$man}{0}{title};
+    my $man_url = $refs{$man}{0}{url};
+    my $text = "<a href='$man_url'>$man_title</a>";
 
-	    if ($sec =~ m,^\d+$,o) {
-		$foo[$u] .= ", chapter $sec";
-	    } elsif ($sec) {
-		$foo[$u] .= ", section $sec";
-	    }
+    my $div = '';
+    $div = "section $sub " if $numbered;
+    $div = "chapter $sub " if $chapter;
+    $div = "appendix $sub " if $appendix;
 
-	    if (exists $url{"$man-$sec"}) {
-		$foo[$u] = "<a href=\"$url{\"$man-$sec\"}\">$foo[$u]</a>";
-	    } elsif (exists $url{$man}) {
-		$foo[$u] = "<a href=\"$url{$man}\">$foo[$u]</a>";
-	    }
-	} elsif ($foo[$u] =~ m,^\s*((?:ftp|https?)://[\S~-]+?/?)\s*$,i) {
-	    $foo[$u] = "<a href=\"$1\">$1</a>";
-	} elsif ($foo[$u] =~ m,\s*([\w_-]+\(\d+\w*\))\s*$,i) {
-	    $foo[$u] = "the $foo[$u] manual page";
-	}
+    if (exists $refs{$man}{$sub}) {
+        my $sub_title = $refs{$man}{$sub}{title};
+        my $sub_url = $refs{$man}{$sub}{url};
+        $text .= " $div(<a href='$sub_url'>$sub_title</a>)";
+    }
+
+    return $text;
+}
+
+sub format_ref {
+    my ($header) = @_;
+    my $text = '';
+    my @list;
+
+    foreach my $ref (split(/,\s?/, $header)) {
+        if ($ref =~ /^([\w-]+)\s(.+)$/) {
+            $text = manual_ref($1, $2);
+        } elsif ($ref =~ /^[\w_-]+\(\d+\w*\)$/) {
+            $text = "the $ref manual page";
+        } elsif ($ref =~ /^(?:ftp|https?):\/\//) {
+            $text = "<a href='$ref'>$ref</a>";
+        }
+        push(@list, $text) if $text;
     }
-	
-    if ($#foo+1 > 2) {
-	$ref = sprintf "Refer to %s, and %s for details.",join(', ',splice(@foo,0,$#foo)),@foo;
-    } elsif ($#foo+1 > 0) {
-	$ref = sprintf "Refer to %s for details.",join(' and ',@foo);
+
+    if ($#list >= 2) {
+        $text = join(', ', splice(@list , 0, $#list));
+        $text = "Refer to $text, and @list for details.";
+    } elsif ($#list >= 0) {
+        $text = join(' and ', @list);
+        $text = "Refer to $text for details.";
     }
 
-    return $ref;
+    return $text;
 }
 
 1;
+
+# vim: sw=4 sts=4 ts=4 et sr

--------------1.5.6.3--

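To show how a Ref: header is expected to render with the "Implement support for new manual references" patch, here is a self-contained sketch of manual_ref() and format_ref() run against a small hand-written %refs hash (the titles and URLs in it are hypothetical). It follows the patched logic with two deliberate deviations: the per-item text is reset on every iteration, so an item that matches none of the patterns adds nothing, and manual pages with section suffixes such as (3pm) are accepted, as the old code did.

```perl
#!/usr/bin/perl
# Sketch of how "Ref:" headers could be rendered with the new reference data.
use strict;
use warnings;

# Hypothetical data in the shape built by lib/Manual_refs.pm:
# $refs{manual}{section}{title|url}, with section '0' for the manual itself.
our %refs = (
    policy => {
        0      => { title => 'Debian Policy Manual',
                    url   => 'http://www.debian.org/doc/debian-policy/' },
        '12.5' => { title => 'Example section',
                    url   => 'http://www.debian.org/doc/debian-policy/ch-example.html#s12.5' },
    },
    fhs => {
        0 => { title => 'Filesystem Hierarchy Standard',
               url   => 'http://www.pathname.com/fhs/pub/fhs-2.3.html' },
        usrsharemanmanualpages => { title => 'Example FHS section',
               url   => 'http://www.pathname.com/fhs/pub/fhs-2.3.html#EXAMPLE' },
    },
);

sub manual_ref {
    my ($man, $sub) = @_;
    # Unknown manuals produce no text (and no autovivified entries).
    return '' unless exists $refs{$man} && exists $refs{$man}{0};

    my $text = "<a href='$refs{$man}{0}{url}'>$refs{$man}{0}{title}</a>";

    # Same labelling rules as the patch; plain IDs get no label.
    my $div = '';
    $div = "section $sub "  if $sub =~ /[A-Z\d.]+/;
    $div = "chapter $sub "  if $sub =~ /^\d+$/;
    $div = "appendix $sub " if $sub =~ /^[A-Z]+$/;

    if (exists $refs{$man}{$sub}) {
        my ($title, $url) = @{ $refs{$man}{$sub} }{qw(title url)};
        $text .= " $div(<a href='$url'>$title</a>)";
    }
    return $text;
}

sub format_ref {
    my ($header) = @_;
    my @list;
    for my $ref (split /,\s*/, $header) {
        my $text = '';    # reset for every item
        if ($ref =~ /^([\w-]+)\s(.+)$/) {
            $text = manual_ref($1, $2);
        } elsif ($ref =~ /^[\w-]+\(\d+\w*\)$/) {
            $text = "the $ref manual page";
        } elsif ($ref =~ m{^(?:ftp|https?)://}) {
            $text = "<a href='$ref'>$ref</a>";
        }
        push @list, $text if $text;
    }
    return '' unless @list;
    return 'Refer to ' . join(' and ', @list) . ' for details.' if @list <= 2;
    my $last = pop @list;
    return 'Refer to ' . join(', ', @list) . ", and $last for details.";
}

print format_ref('fhs usrsharemanmanualpages'), "\n";
print format_ref('policy 12.5, dpkg(1), bogus'), "\n";
```

With this data, the FHS ID reference renders with no "section" label in front of the linked title, while "policy 12.5" is labelled "section 12.5"; the unmatched "bogus" item is silently dropped.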

From c3899fde1eaabb96ee585e9609b76a7b8f6d4d3b Mon Sep 17 00:00:00 2001
From: Jordà Polo <jorda@ettin.org>
Date: Wed, 6 Aug 2008 18:58:47 +0200
Subject: [PATCH] Standardize remaining references
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.5.6.3"

This is a multi-part message in MIME format.
--------------1.5.6.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

---
 checks/binaries.desc       |    4 ++--
 checks/copyright-file.desc |    2 +-
 checks/fields.desc         |    4 ++--
 checks/files.desc          |    2 +-
 checks/menu-format.desc    |    2 +-
 checks/menus.desc          |   32 ++++++++++++++++----------------
 6 files changed, 23 insertions(+), 23 deletions(-)
--------------1.5.6.3
Content-Type: text/x-patch; name="c3899fde1eaabb96ee585e9609b76a7b8f6d4d3b.diff"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="c3899fde1eaabb96ee585e9609b76a7b8f6d4d3b.diff"

diff --git a/checks/binaries.desc b/checks/binaries.desc
index ef70c4a..4f4936c 100644
--- a/checks/binaries.desc
+++ b/checks/binaries.desc
@@ -124,7 +124,7 @@ Info: The package name of a library package should usually reflect
  from the library file name with the following code snippet:
  .
   $ objdump -p /path/to/libfoo-bar.so.1.2.3 | sed -n -e's/^[[:space:]]*SONAME[[:space:]]*//p' | sed -e's/\([0-9]\)\.so\./\1-/; s/\.so\.//'
-Ref: Library Packaging guide 5
+Ref: libpkg-guide 5
 
 Tag: binary-with-bad-dynamic-table
 Type: error
@@ -156,7 +156,7 @@ Info: The listed file appears to be linked against the C library, but the
 
 Tag: missing-dependency-on-perlapi
 Type: error
-Ref: Perl policy 4.4.2
+Ref: perl-policy 4.4.2
 Info: This package includes a *.so file in <tt>/usr/lib/perl5</tt>,
  normally indicating that it includes a binary Perl module.  Binary Perl
  modules must depend on perlapi-$Config{version} (from the Config module).
diff --git a/checks/copyright-file.desc b/checks/copyright-file.desc
index 1977198..5f05d72 100644
--- a/checks/copyright-file.desc
+++ b/checks/copyright-file.desc
@@ -195,7 +195,7 @@ Info: There is "Upstream Author(s)" in your copyright file. This was most
 
 Tag: copyright-has-url-from-dh_make-boilerplate
 Type: warning
-Ref: 12.5
+Ref: policy 12.5
 Info: There is "url://example.com" in your copyright file. This was most
  likely a remnant from the dh_make template.
  .
diff --git a/checks/fields.desc b/checks/fields.desc
index a0fa3fd..1f9c1a4 100644
--- a/checks/fields.desc
+++ b/checks/fields.desc
@@ -63,7 +63,7 @@ Tag: magic-arch-in-arch-list
 Type: error
 Info: The special architecture values `all' and `any' only make sense if
  they occur alone.
-Ref:  policy 5.6.8
+Ref: policy 5.6.8
 
 Tag: unknown-architecture
 Type: warning
@@ -700,7 +700,7 @@ Info: You depend on the build-essential package, which is only a
 
 Tag: malformed-python-version
 Type: error
-Ref: Python policy 2.3
+Ref: python-policy 2.3
 Info: The Python-Version control field is not in one of the valid
  formats.  It should be in one of the following formats:
  .
diff --git a/checks/files.desc b/checks/files.desc
index 4939170..ebf201b 100644
--- a/checks/files.desc
+++ b/checks/files.desc
@@ -628,7 +628,7 @@ Info: Architecture-independent Perl code should be placed in
 
 Tag: file-in-usr-lib-site-python
 Type: error
-Ref: Python policy 1.4
+Ref: python-policy 1.4
 Info: The directory /usr/lib/site-python has been deprecated as a
  location for installing Python modules and may be dropped from Python's
  module search path in a future version.  Most likely this module is a
diff --git a/checks/menu-format.desc b/checks/menu-format.desc
index 0b4a0ce..6836219 100644
--- a/checks/menu-format.desc
+++ b/checks/menu-format.desc
@@ -102,7 +102,7 @@ Info: The menu item has a line that specifies an unknown section or uses a
  applications should use directly.  Check the spelling of the section and
  check the section against the list in the menu policy.  (The menu
  sections changed as of June of 2007.)
-Ref: Debian Menu sub-policy 2.1
+Ref: menu-policy 2.1
 
 Tag: menu-item-creates-new-root-section
 Type: error
diff --git a/checks/menus.desc b/checks/menus.desc
index 2fe99de..c60fb8f 100644
--- a/checks/menus.desc
+++ b/checks/menus.desc
@@ -161,7 +161,7 @@ Type: error
 Info: The Index field in a doc-base file should reference the single index
  file for that document.  Any other files belonging to the same document
  should be listed in the Files field.
-Ref: Debian doc-base Manual section 2.3.2.2
+Ref: doc-base 2.3.2.2
 
 Tag: doc-base-file-references-missing-file
 Type: error
@@ -175,19 +175,19 @@ Type: warning
 Info: The Format field in this doc-base control file declares a format
  that is not supported.  Recognized formats are "HTML", "Text", "PDF",
  "PostScript", "Info", "DVI", and "DebianDoc-SGML" (case-insensitive).
-Ref: Debian doc-base Manual section 2.3.2.2
+Ref: doc-base 2.3.2.2
 
 Tag: doc-base-file-no-format
 Type: error
 Info: A format section of this doc-base control file didn't specify a
  format.  Each section after the first must specify a format.
-Ref: Debian doc-base Manual section 2.3.2.2
+Ref: doc-base 2.3.2.2
 
 Tag: doc-base-file-no-format-section
 Type: error
 Info: This doc-base control file didn't specify any format
  section.
-Ref: Debian doc-base Manual section 2.3.2.2
+Ref: doc-base 2.3.2.2
 
 Tag: doc-base-file-no-index
 Type: error
@@ -195,7 +195,7 @@ Info: Format sections in doc-base control files for HTML or Info documents
  must contain an Index field specifying the starting document for the
  documentation.  Even if the documentation is a single file, this field
  must be present.
-Ref: Debian doc-base Manual section 2.3.2.2
+Ref: doc-base 2.3.2.2
 
 Tag: doc-base-document-field-ends-in-whitespace
 Type: error
@@ -210,7 +210,7 @@ Info: The Document field in doc-base control file must be located at
  first line of the file.  While unregistering documents, doc-base 0.8
  and later parses only the first line of the control file for performance
  reasons.
-Ref: Debian doc-base Manual section 2.3.2.1
+Ref: doc-base 2.3.2.1
 
 Tag: doc-base-file-unknown-field
 Type: error
@@ -218,7 +218,7 @@ Info: The doc-base control file contains field which is either unknown
  or not valid for the section where was found.  Possible reasons for this
  error are: a typo in field name, missing empty line between control file
  sections, or an extra empty line separating sections.
-Ref: Debian doc-base Manual sections 2.3.2.1 and 2.3.2.2
+Ref: doc-base 2.3.2.1, doc-base 2.3.2.2
 
 Tag: doc-base-file-duplicated-field
 Type: error
@@ -228,20 +228,20 @@ Tag: doc-base-file-duplicated-format
 Type: error
 Info: The doc-base control file contains a duplicated format.  Doc-base
  files must not register different documents in one control file.
-Ref: Debian doc-base Manual section 2.3.2.2
+Ref: doc-base 2.3.2.2
 
 Tag: doc-base-file-lacks-required-field
 Type: error
 Info: The doc-base control file does not contain a required field for the
  appropriate section.
-Ref: Debian doc-base Manual sections 2.3.2.1 and 2.3.2.2
+Ref: doc-base 2.3.2.1, doc-base 2.3.2.2
 
 Tag: doc-base-invalid-document-field
 Type: error
 Info: The Document field should consists only of letters (a-z), digits
  (0-9), plus (+) or minus (-) signs, and dots (.).  In particular,
  uppercase letters are not allowed.
-Ref: Debian doc-base Manual section 2.2
+Ref: doc-base 2.2
 
 Tag: doc-base-abstract-field-is-template
 Type: warning
@@ -254,14 +254,14 @@ Type: warning
 Info: Continuation lines of the Abstract field of doc-base control file
  should start with only one space unless they are meant to be displayed
  verbatim by frontends.
-Ref: Debian doc-base Manual section 2.3.2
+Ref: doc-base 2.3.2
 
 Tag: doc-base-abstract-field-separator-extra-whitespaces
 Type: warning
 Info: Unnecessary spaces were found in the paragraph separator line of the
  doc-base's Abstract field.  The separator line should consist of a single
  space followed by a single dot.
-Ref: Debian doc-base Manual section 2.3.2
+Ref: doc-base 2.3.2
 
 Tag: spelling-error-in-doc-base-title-field
 Type: warning
@@ -280,14 +280,14 @@ Info: Lintian found a spelling or capitalization error in the Abstract
 Tag: doc-base-file-syntax-error
 Type: error
 Info: Lintian found a syntax error in the doc-base control file.
-Ref: Debian doc-base Manual section 2.3.2.2
+Ref: doc-base 2.3.2.2
 
 Tag: doc-base-file-separator-extra-whitespaces
 Type: warning
 Info: Unnecessary spaces were found in the doc-base file sections'
  separator.  The section separator is an empty line and should not contain
  any whitespace.
-Ref: Debian doc-base Manual section 2.3.2
+Ref: doc-base 2.3.2
 
 Tag: doc-base-file-uses-obsolete-national-encoding
 Type: error
@@ -299,14 +299,14 @@ Info: doc-base files must be valid UTF-8, an encoding of the Unicode
  .
   $ iconv -f ISO-8859-1 -t UTF-8 doc-base &gt; doc-base.new
   $ mv doc-base.new doc-base
-Ref: Debian doc-base Manual section 2.3.2
+Ref: doc-base 2.3.2
 
 Tag: doc-base-unknown-section
 Type: warning
 Info: The section indicated in this doc-base control file is not one of
  the standard doc-base sections.  The doc-base sections are based on the
  menu sections but are not exactly the same.
-Ref: Debian doc-base Manual section 2.3.3
+Ref: doc-base 2.3.3
 
 Tag: menu-method-should-include-menu-h
 Type: error

--------------1.5.6.3--


From 63772cd15615da73b2726886b73fa6f194bd3717 Mon Sep 17 00:00:00 2001
From: Jordà Polo <jorda@ettin.org>
Date: Wed, 6 Aug 2008 19:34:53 +0200
Subject: [PATCH] Support reference generation for additional manuals
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.5.6.3"

This is a multi-part message in MIME format.
--------------1.5.6.3
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


manual_refs_update.pl is now able to generate references for menu-policy,
perl-policy, python-policy and libpkg-guide. Note that for the Library
Packaging guide only chapters are generated, since sections differ slightly
between the available HTML-formatted manuals.

Reference detection has been improved and now uses a <link>-based regex if it
is available. Title detection has been simplified.
---
 private/manual_refs_update.pl |   74 ++++++++++++++++++++++++++++++-----------
 1 files changed, 54 insertions(+), 20 deletions(-)
--------------1.5.6.3
Content-Type: text/x-patch; name="63772cd15615da73b2726886b73fa6f194bd3717.diff"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="63772cd15615da73b2726886b73fa6f194bd3717.diff"

diff --git a/private/manual_refs_update.pl b/private/manual_refs_update.pl
index fe647be..26a3e4d 100755
--- a/private/manual_refs_update.pl
+++ b/private/manual_refs_update.pl
@@ -28,36 +28,70 @@ use strict;
 # For each manual, we need:
 #  * Location of the manual directory on the local filesystem
 #  * Base URL for the eventual target of the reference
-#  * Regex to match the title
 #  * Regex to match the possible references
-#  * Mapping from regex fields to reference fields
+#  * Mapping from regex fields to reference fields (an array of arrays of
+#    keywords: url, section, title; the position of each inner array
+#    determines which regex group it corresponds to).
 
-my $ddoc_title = '<title>(.+?)<\/title>';
-my $ddoc_ref = '<a href="(.+?)">([A-Z]|[A-Z]?[\d\.]+?)\.?\s+'.
+my $title_re = '<title\s?>(.+?)<\/title\s?>';
+my $link_re = '<link href="(.+?)" rel="[\w]+" '.
+              'title="([A-Z]|[A-Z]?[\d\.]+?)\.?\s+([\w\s[:punct:]]+?)">';
+my $index_re = '<a href="(.+?)">([A-Z]|[A-Z]?[\d\.]+?)\.?\s+'.
                '([\w\s[:punct:]]+?)<\/a>';
-my @ddoc_fields = [ [ 'url' ], [ 'section' ], [ 'title' ] ];
+my @fields = [ [ 'url' ], [ 'section' ], [ 'title' ] ];
 
 my %manuals = (
-    'policy' => [ '/usr/share/doc/debian-policy/policy.html/index.html',
-                  'http://www.debian.org/doc/debian-policy/',
-                  $ddoc_title, $ddoc_ref, @ddoc_fields ],
-    'devref' => [ '/usr/share/doc/developers-reference/index.html',
-                  'http://www.debian.org/doc/developers-reference/',
-                  $ddoc_title, $ddoc_ref, @ddoc_fields ],
-    'menu'   => [ '/usr/share/doc/menu/html/index.html',
-                  'http://www.debian.org/doc/packaging-manuals/menu.html/',
-                  $ddoc_title, $ddoc_ref, @ddoc_fields ],
-    'fhs'    => [ '/usr/share/doc/debian-policy/fhs/fhs-2.3.html',
-                  'http://www.pathname.com/fhs/pub/fhs-2.3.html',
-                  '<title\s?>(.+?)<\/title\s?>',
-                  '<a\s+href="(#.+?)"\s?>([\w\s[:punct:]]+?)<\/a\s?>',
-                  [ [ 'section', 'url' ], [ 'title'] ] ],
+    'policy' => [
+        '/usr/share/doc/debian-policy/policy.html/index.html',
+        'http://www.debian.org/doc/debian-policy/',
+        $link_re, @fields
+    ],
+    'menu-policy' => [
+        '/usr/share/doc/debian-policy/menu-policy.html/index.html',
+        'http://www.debian.org/doc/packaging-manuals/menu-policy/',
+        $link_re, @fields
+    ],
+    'perl-policy' => [
+        '/usr/share/doc/debian-policy/perl-policy.html/index.html',
+        'http://www.debian.org/doc/packaging-manuals/perl-policy/',
+        $link_re, @fields
+    ],
+    'python-policy' => [
+        '/usr/share/doc/python/python-policy.html/index.html',
+        'http://www.debian.org/doc/packaging-manuals/python-policy/',
+        $link_re, @fields
+    ],
+    'devref' => [
+        '/usr/share/doc/developers-reference/index.html',
+        'http://www.debian.org/doc/developers-reference/',
+        $index_re, @fields
+    ],
+    'menu' => [
+        '/usr/share/doc/menu/html/index.html',
+        'http://www.debian.org/doc/packaging-manuals/menu.html/',
+        $index_re, @fields
+    ],
+    # Extract chapters only, since the HTML available at netfort.gr.jp
+    # does not use exactly the same section IDs as the version included
+    # in the package.
+    'libpkg-guide' => [
+        '/usr/share/doc/libpkg-guide/libpkg-guide.html',
+        'http://www.netfort.gr.jp/~dancer/column/libpkg-guide/libpkg-guide.html',
+        'class="chapter"><a href="(.+?)">([\d\.]+?)\.? ([\w\s[:punct:]]+?)<\/a>',
+        @fields
+    ],
+    'fhs' => [
+        '/usr/share/doc/debian-policy/fhs/fhs-2.3.html',
+        'http://www.pathname.com/fhs/pub/fhs-2.3.html',
+        '<a\s+href="(#.+?)"\s?>([\w\s[:punct:]]+?)<\/a\s?>',
+        [ [ 'section', 'url' ], [ 'title'] ]
+    ],
 );
 
 # Collect all possible references from avilable manuals.
 
 for my $manual (keys %manuals) {
-    my ($index, $url, $title_re, $ref_re, $fields) = @{$manuals{$manual}};
+    my ($index, $url, $ref_re, $fields) = @{$manuals{$manual}};
     my $title = 0;
 
     unless (-f $index) {

--------------1.5.6.3--

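As a quick check of the <link>-based detection in the "Support reference generation for additional manuals" patch, the sketch below applies $link_re and the default field mapping, both copied from that patch, to a made-up fragment of a DebianDoc-style index page and prints manual_refs lines. Unlike the script, it collects the capture groups into a list instead of using eval '$' . $v; that is an illustrative simplification, and the manual name and base URL are only examples.

```perl
#!/usr/bin/perl
# Sketch: apply the <link>-based regex to sample HTML and emit
# manual::section::title::url lines.
use strict;
use warnings;

# Regex copied from the patch: captures href, section number and title.
my $link_re = '<link href="(.+?)" rel="[\w]+" '.
              'title="([A-Z]|[A-Z]?[\d\.]+?)\.?\s+([\w\s[:punct:]]+?)">';

# Field mapping: regex group N+1 feeds every keyword in $fields->[N].
my $fields = [ [ 'url' ], [ 'section' ], [ 'title' ] ];

# Made-up fragment of a DebianDoc-style index page.
my $html = '<link href="ch-binary.html" rel="chapter" title="3 Binary packages">' . "\n"
         . '<link href="ap-pkg-scope.html" rel="appendix" '
         . 'title="A Introduction and scope of these appendices">';

my $base = 'http://www.debian.org/doc/debian-policy/';
my @out;
while ($html =~ m/$link_re/gi) {
    my @groups = ($1, $2, $3);
    my %ref;
    for my $i (0 .. $#$fields) {
        $ref{$_} = $groups[$i] for @{ $fields->[$i] };
    }
    push @out, join('::', 'policy', $ref{section}, $ref{title}, "$base$ref{url}");
}
print "$_\n" for @out;
```

Both a numbered chapter ("3") and a lettered appendix ("A") are picked up by the same alternation in the section group.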

