
Bug#1091751: patgen.1: Some remarks and a patch with editorial changes for this man page



Package: texlive-binaries
Version: 2024.20240313.70630+ds-5+b1
Severity: minor
Tags: patch

   * What led up to the situation?

     Checking for defects with a new version

test-[g|n]roff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z < "man page"

  [Use "grep -n -e ' $' <file>" to find trailing spaces.]

  ["test-groff" is a script in the repository for "groff"; it is not shipped
(local copy and "troff" slightly changed by me).]

  [The fate of "test-nroff" was decided in groff bug #55941.]

   * What was the outcome of this action?

troff:<stdin>:21: warning: trailing space in the line


Bad use of \s0 in a string definition, the string "X" could be resized,
for example with "\s-1\*X\s0".


8:.if t .ds BX \fRB\s-2IB\s0\fP\*(TX
11:.if t .ds LX \fRL\\h'-0.36m'\\v'-0.15v'\s-2A\s0\\h'-0.15m'\\v'0.15v'\fP\*(TX
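
  A listing like the two lines above can be approximated with grep.
This is only a sketch: the function name "find_s0_in_strings" is made
up for illustration, and the pattern only covers ".ds" definitions
that contain "\s0" on the same input line.
In roff, "\s0" returns to the previous type size rather than undoing a
relative change, which is why a relative "\s+2" after "\s-2" is the
safer choice inside a string.

```shell
# Hypothetical helper: list string definitions (.ds) that contain \s0.
# Output format is "line-number:line", as in the listing above.
find_s0_in_strings() {
    # (^|[[:space:]])\.ds matches ".ds" at the start of the line
    # or after a condition such as ".if t ".
    grep -n -E '(^|[[:space:]])\.ds[[:space:]].*\\s0' "$1"
}
```

Usage: find_s0_in_strings patgen.1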


   * What outcome did you expect instead?

     No output (no warnings).

-.-

  General remarks and further material, including a diff file if one
exists, are in the attachments.


-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.6-amd64 (SMP w/2 CPU threads; PREEMPT)
Locale: LANG=is_IS.iso88591, LC_CTYPE=is_IS.iso88591 (charmap=ISO-8859-1), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages texlive-binaries depends on:
ii  libc6            2.40-4
ii  libcairo2        1.18.2-2
ii  libfontconfig1   2.15.0-1.1+b1
ii  libfreetype6     2.13.3+dfsg-1
ii  libgcc-s1        14.2.0-8
ii  libgraphite2-3   1.3.14-2+b1
ii  libharfbuzz0b    10.1.0-1
ii  libicu72         72.1-5+b1
ii  libkpathsea6     2024.20240313.70630+ds-5+b1
ii  libmpfi0         1.5.4+ds-4
ii  libmpfr6         4.2.1-1+b2
ii  libpaper2        2.2.5-0.3
ii  libpixman-1-0    0.44.0-3
ii  libpng16-16t64   1.6.44-3
ii  libpotrace0      1.16-2+b2
ii  libptexenc1      2024.20240313.70630+ds-5+b1
ii  libstdc++6       14.2.0-8
ii  libsynctex2      2024.20240313.70630+ds-5+b1
ii  libteckit0       2.5.12+ds1-1+b1
ii  libtexlua53-5    2024.20240313.70630+ds-5+b1
ii  libx11-6         2:1.8.10-2
ii  libxaw7          2:1.0.16-1
ii  libxi6           2:1.8.2-1
ii  libxmu6          2:1.1.3-3+b3
ii  libxpm4          1:3.5.17-1+b2
ii  libxt6t64        1:1.2.1-1.2+b1
ii  libzzip-0-13t64  0.13.72+dfsg.1-1.2+b1
ii  perl             5.40.0-8
ii  t1utils          1.41-4
ii  tex-common       6.18
ii  zlib1g           1:1.3.dfsg+really1.3.1-1+b1

Versions of packages texlive-binaries recommends:
pn  dvisvgm       <none>
ii  texlive-base  2024.20241115-1

Versions of packages texlive-binaries suggests:
pn  hintview               <none>
pn  texlive-binaries-sse2  <none>

Versions of packages tex-common depends on:
ii  ucf  3.0046

Versions of packages tex-common suggests:
pn  debhelper  <none>

Versions of packages texlive-binaries is related to:
ii  tex-common    6.18
ii  texlive-base  2024.20241115-1

-- no debconf information
Input file is patgen.1

  Any program (or person) that produces man pages should check the output
for defects by using (both groff and nroff)

[gn]roff -mandoc -t -ww -b -z -K utf8  <man page>

  The same goes for man pages that are used as an input.

  For a style guide use

  mandoc -T lint

-.-

  So any 'generator' should check its products with the above mentioned
'groff', 'mandoc',  and additionally with 'nroff ...'.

  This is just a simple quality control measure.

  The 'generator' may have to be corrected to get a better man page,
and so may the source file and any additional file.

  Common defects:

  Input text line longer than 80 bytes.

  Not removing trailing spaces (in input and output).
  The reason for these trailing spaces should be found and eliminated.

  Not beginning each input sentence on a new line.
Lines should thus be shorter.

  See man-pages(7), item 'semantic newline'.
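
  Two of the defects above (long input lines and trailing spaces) can
be found without groff.
The following is a minimal sketch, not part of any existing tool;
the function name "check_man_source" and the message texts are
assumptions made for illustration.

```shell
# Hypothetical helper: report two of the common defects listed above.
check_man_source() {
    # Input lines longer than 80 bytes (LC_ALL=C makes awk count bytes).
    LC_ALL=C awk 'length($0) > 80 { printf "%d: line longer than 80 bytes\n", FNR }' "$1"
    # Lines that end in a space or a tab.
    grep -n -E '[[:space:]]+$' "$1" | sed 's/:.*$/: trailing whitespace/'
}
```

Usage: check_man_source patgen.1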

-.-

The difference between the formatted output of the original and patched file
can be seen with:

  nroff -mandoc <file1> > <out1>
  nroff -mandoc <file2> > <out2>
  diff -u <out1> <out2>

and for groff, using

"printf '%s\n%s\n' '.kern 0' '.ss 12 0' | groff -mandoc -Z - "

instead of 'nroff -mandoc'

  Add the option '-t', if the file contains a table.

  Read the output of 'diff -u' with 'less -R' or similar.
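
  The comparison above can be wrapped in a small shell function.
This is a sketch under assumptions: the name "compare_formatted" and
the FORMATTER variable are invented here; FORMATTER defaults to
'nroff -mandoc' as in the text, and can be set to add '-t' for tables.

```shell
# Hypothetical helper: diff the formatted output of two man page sources.
# FORMATTER is an assumed override hook; it defaults to "nroff -mandoc".
compare_formatted() {
    out1=$(mktemp) && out2=$(mktemp) || return 2
    # Left unquoted on purpose so that "nroff -mandoc" splits into words.
    ${FORMATTER:-nroff -mandoc} "$1" > "$out1"
    ${FORMATTER:-nroff -mandoc} "$2" > "$out2"
    diff -u "$out1" "$out2"
    status=$?            # 0: identical, 1: different, 2: trouble
    rm -f "$out1" "$out2"
    return "$status"
}
```

Usage: compare_formatted patgen.1 patgen.1.new | less -R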

-.-.

  If 'man' (man-db) is used to check the manual for warnings,
the following must be set:

  The option "--warnings=w"

  The environment variable:

export MAN_KEEP_STDERR=yes (or any non-empty value)

  or

  (produce only warnings):

export MANROFFOPT="-ww -b -z"

export MAN_KEEP_STDERR=yes (or any non-empty value)


-.-.

Output from "mandoc -T lint  patgen.1": (shortened list)

      2 whitespace at end of input line

-.-.

Output from "test-groff -mandoc -t -ww -z patgen.1": (shortened list)

      1 trailing space in the line

-.-.

Remove space characters (whitespace) at the end of lines.
Use "git apply ... --whitespace=fix" to fix extra space issues, or use
global configuration "core.whitespace".

Number of lines affected is

1
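
  Outside of git, the trailing whitespace can also be stripped with sed.
A minimal sketch; the function name "strip_trailing_space" is made up
for illustration, and the result is written to standard output so
that the original file stays untouched.

```shell
# Hypothetical helper: print a file with trailing whitespace removed.
strip_trailing_space() {
    sed 's/[[:space:]]*$//' "$1"
}
```

Usage: strip_trailing_space patgen.1 > patgen.1.new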

-.-.

Change '-' (\-) to '\(en' (en-dash) for a numeric range.
GNU gnulib has recently (2023-06-18) updated its
"build_aux/update-copyright" to recognize "\(en" in man pages.

patgen.1:113:in columns 1-2,
patgen.1:115:in columns 3-4, and either a blank or the replacement for one of the
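
  Candidates for such ranges can be listed with grep.
This is a rough sketch only; the function name "find_hyphen_ranges"
is an assumption, and the pattern deliberately skips longer
hyphenated digit groups such as an ISBN.
Each hit still has to be reviewed by hand, as not every
"digit-digit" pair is a range.

```shell
# Hypothetical helper: list lines with a "number-number" pair that is
# not part of a longer hyphenated group (e.g. "0-201-13447-0").
find_hyphen_ranges() {
    grep -n -E '(^|[^0-9-])[0-9]+-[0-9]+($|[^0-9-])' "$1"
}
```

Usage: find_hyphen_ranges patgen.1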

-.-.

Wrong distance between sentences in the input file.

  Separate the sentences and subordinate clauses; each begins on a new
line.  See man-pages(7) ("Conventions for source file layout") and
"info groff" ("Input Conventions").

  The best procedure is to always start a new sentence on a new line,
at least, if you are typing on a computer.

Remember coding: Only one command ("sentence") on each (logical) line.

E-mail: Easier to quote exactly the relevant lines.

Generally: Easier to edit the sentence.

Patches: Less unaffected text.

Searching for two adjacent words is easier when they belong to the same
line and the same phrase.

  The amount of space between sentences in the output can then be
controlled with the ".ss" request.

Mark a final abbreviation point as such by suffixing it with "\&".


37:language. The
43:language. Further details of the pattern generation process such as
45:the user's terminal. Optionally
57:for use in hyphenating words. For a real-life example of
77:of a word), and letters. In pattern files for non-English languages
86:per line starting in column 1. A digit in column 1 indicates a global
88:next global word weight. A digit at some intercharacter position
116:"hyphen" characters \`-', \`*', and \`.' in columns 5, 6, and 7. (Input
144:generated. The value of
156:Finally the decision (\`y' or \`Y' vs. anything else) whether or not to
181:Stanford University Ph.D. thesis, 1983,
184:Donald E. Knuth,
194:technical report. Howard Trickey originally ported it to Unix.
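
  A list like the one above can be approximated with awk.
This is a sketch, not the script that produced the list; the function
name "find_midline_sentences" is made up here.
It also reports abbreviations ("Ph.D. thesis"), which need "\&"
instead of a line break.

```shell
# Hypothetical helper: list text lines in which ".", "!" or "?" is
# followed by spaces and more text on the same line.
# Lines starting with "." (roff requests and macro calls) are skipped.
find_midline_sentences() {
    awk '!/^\./ && /[.!?] +[^ ]/ { printf "%d:%s\n", FNR, $0 }' "$1"
}
```

Usage: find_midline_sentences patgen.1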

-.-.

Put a parenthetical sentence or phrase on a separate line,
if it is not part of code.
See man-pages(7), item "semantic newline".

patgen.1:34:language (not a complete TeX source file; see below), and produces the
patgen.1:36:with (previously- plus newly-generated) hyphenation patterns for that
patgen.1:76:must entirely consist of digits (hyphenation levels), dots (\`.', edge
patgen.1:87:word weight (initially =1) applicable to all following words up to the
patgen.1:93:found, \`good' hyphens (correctly found by the patterns), and \`bad'
patgen.1:94:hyphens (erroneously found by the patterns) respectively; when reading a
patgen.1:117:lines are padded with blanks as for many \*(TX related programs.)
patgen.1:121:of that character (first the \`lower' case one used for output), each
patgen.1:156:Finally the decision (\`y' or \`Y' vs. anything else) whether or not to

-.-.

Output from "test-groff  -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z ":

troff:<stdin>:21: warning: trailing space in the line

Bad use of \s0 in a string definition, the string "X" could be resized,
for example with "\s-1\*X\s0".

8:.if t .ds BX \fRB\s-2IB\s0\fP\*(TX
11:.if t .ds LX \fRL\\h'-0.36m'\\v'-0.15v'\s-2A\s0\\h'-0.15m'\\v'0.15v'\fP\*(TX

-.-.

  Additionally (general):

  Abbreviations get a '\&' added after their final full stop (.) to mark them
as such and not as an end of a sentence.
--- patgen.1	2024-12-30 19:59:09.999922383 +0000
+++ patgen.1.new	2024-12-30 20:19:04.960125126 +0000
@@ -5,10 +5,10 @@
 .ie t .ds OX \fIT\v'+0.25m'E\v'-0.25m'X\fP
 .el .ds OX TeX
 .\" BX definition must follow TX so BX can use TX
-.if t .ds BX \fRB\s-2IB\s0\fP\*(TX
+.if t .ds BX \fRB\s-2IB\s+2\fP\*(TX
 .if n .ds BX BibTeX
 .\" LX definition must follow TX so LX can use TX
-.if t .ds LX \fRL\\h'-0.36m'\\v'-0.15v'\s-2A\s0\\h'-0.15m'\\v'0.15v'\fP\*(TX
+.if t .ds LX \fRL\\h'-0.36m'\\v'-0.15v'\s-2A\s+2\\h'-0.15m'\\v'0.15v'\fP\*(TX
 .if n .ds LX LaTeX
 .\"=====================================================================
 .SH NAME
@@ -18,9 +18,9 @@ patgen \- generate patterns for TeX hyph
 .I dictionary_file pattern_file patout_file translate_file
 .\"=====================================================================
 .SH DESCRIPTION
-This manual page is not meant to be exhaustive.  
+This manual page is not meant to be exhaustive.
 See also the Info file or manual
-.I "Web2C: A TeX implementation" 
+.I "Web2C: A TeX implementation"
 available as part of the TeX Live distribution or at
 .IR http://tug.org/web2c .
 .PP
@@ -30,19 +30,27 @@ program reads the
 .I dictionary_file
 containing a list of hyphenated words and the
 .I pattern_file
-containing previously-generated patterns (if any) for a particular
-language (not a complete TeX source file; see below), and produces the
+containing previously-generated patterns
+(if any)
+for a particular language
+(not a complete TeX source file; see below),
+and produces the
 .I patout_file
-with (previously- plus newly-generated) hyphenation patterns for that
-language. The
+with
+(previously- plus newly-generated)
+hyphenation patterns for that language.
+The
 .I translate_file
 defines language specific values for the parameters
 .IR left_hyphen_min " and " right_hyphen_min
-used by \*(TX's hyphenation algorithm and the external representation
+used by \*(TX's hyphenation algorithm
+and the external representation
 of the lower and upper case version(s) of all \`letters' of that
-language. Further details of the pattern generation process such as
-hyphenation levels and pattern lengths are requested interactively from
-the user's terminal. Optionally
+language.
+Further details of the pattern generation process
+such as hyphenation levels and pattern lengths
+are requested interactively from the user's terminal.
+Optionally
 .I patgen
 creates a new dictionary file
 .BI pattmp. n
@@ -54,7 +62,8 @@ The patterns generated by
 .I patgen
 can be read by
 .B initex
-for use in hyphenating words. For a real-life example of
+for use in hyphenating words.
+For a real-life example of
 .IR patgen 's
 output, see
 .IR $TEXMFMAIN/tex/generic/hyphen/hyphen.tex ,
@@ -73,9 +82,13 @@ extensions or path searching is done.
 When
 .B initex
 digests hyphenation patterns, \*(TX first expands macros and the result
-must entirely consist of digits (hyphenation levels), dots (\`.', edge
-of a word), and letters. In pattern files for non-English languages
-letters are often represented by macros or other expandable constructs.
+must entirely consist of digits
+(hyphenation levels),
+dots (\`.', edge of a word),
+and letters.
+In pattern files for non-English languages
+letters are often represented by macros
+or other expandable constructs.
 For the purpose of
 .I patgen
 these are just character sequences, subject to the condition that no
@@ -83,16 +96,24 @@ such sequence is a prefix of another one
 .TP \w'@@'u+2n
 .B Dictionary file
 A dictionary file contains a weighted list of hyphenated words, one word
-per line starting in column 1. A digit in column 1 indicates a global
-word weight (initially =1) applicable to all following words up to the
-next global word weight. A digit at some intercharacter position
+per line starting in column 1.
+A digit in column 1 indicates a global word weight
+(initially =1)
+applicable to all following words up to the
+next global word weight.
+A digit at some intercharacter position
 indicates a weight for that position only.
 
-The hyphens in a word are indicated by \`-', \`*', or \`.' (or their
-replacements as defined in the translate file) for hyphens yet to be
-found, \`good' hyphens (correctly found by the patterns), and \`bad'
-hyphens (erroneously found by the patterns) respectively; when reading a
-dictionary file \`*' is treated like \`-' and \`.' is ignored.
+The hyphens in a word are indicated by \`-', \`*', or \`.'
+(or their replacements as defined in the translate file)
+for hyphens yet to be found,
+\`good' hyphens
+(correctly found by the patterns),
+and \`bad' hyphens
+(erroneously found by the patterns)
+respectively;
+when reading a dictionary file \`*' is treated like \`-'
+and \`.' is ignored.
 .TP
 .B Pattern file
 A pattern file contains only patterns in the format above, e.g., from a
@@ -110,17 +131,18 @@ It can only contain the actual patterns,
 .B Translate file
 A translate file starts with a line containing the values of
 .I left_hyphen_min
-in columns 1-2,
+in columns 1\(en2,
 .I right_hyphen_min
-in columns 3-4, and either a blank or the replacement for one of the
-"hyphen" characters \`-', \`*', and \`.' in columns 5, 6, and 7. (Input
-lines are padded with blanks as for many \*(TX related programs.)
+in columns 3\(en4, and either a blank or the replacement for one of the
+"hyphen" characters \`-', \`*', and \`.' in columns 5, 6, and 7.
+(Input lines are padded with blanks as for many \*(TX related programs.)
 
 Each following line defines one \`letter': an arbitrary delimiter
 character in column 1, followed by one or more external representations
-of that character (first the \`lower' case one used for output), each
-one terminated by the delimiter and the whole sequence terminated by
-another delimiter.
+of that character
+(first the \`lower' case one used for output),
+each one terminated by the delimiter
+and the whole sequence terminated by another delimiter.
 
 If the translate file is empty, the values
 .IR left_hyphen_min "=2, " right_hyphen_min "=3,"
@@ -141,7 +163,8 @@ requests input from the user's terminal.
 First the integer values of
 .IR hyph_start " and " hyph_finish ,
 the lowest and highest hyphenation level for which patterns are to be
-generated. The value of
+generated.
+The value of
 .I hyph_start
 should be larger than any hyphenation level already present in
 .IR pattern_file .
@@ -153,8 +176,9 @@ the smallest and largest pattern length
 the weights for good and bad hyphens and a weight threshold for useful
 patterns.
 
-Finally the decision (\`y' or \`Y' vs. anything else) whether or not to
-produce a hyphenated word list.
+Finally the decision
+(\`y' or \`Y' vs.\& anything else)
+whether or not to produce a hyphenated word list.
 .\"=====================================================================
 .SH FILES
 .TP \w'@@'u+2n
@@ -178,10 +202,10 @@ patgen.web.
 Frank Liang,
 .IR "Word hy-phen-a-tion by com-puter" ,
 STAN-CS-83-977,
-Stanford University Ph.D. thesis, 1983,
+Stanford University Ph.D.\& thesis, 1983,
 http://tug.org/docs/liang.
 .PP
-Donald E. Knuth,
+Donald E.\& Knuth,
 .IR "The \*(OXbook" ,
 Addison-Wesley, 1986, ISBN 0-201-13447-0, Appendix H.
 .\"=====================================================================
@@ -191,4 +215,5 @@ Breitenlohner made a
 substantial revision in 1991 for \*(TX 3.
 The first version was published as the appendix to the
 .I \*(OXware
-technical report. Howard Trickey originally ported it to Unix.
+technical report.
+Howard Trickey originally ported it to Unix.
