Re: Please review changed man-file of w3m

To: debian-l10n-english@lists.debian.org
Cc: markus.hiereth@freenet.de
Subject: Re: Please review changed man-file of w3m
From: Justin B Rye <justin.byam.rye@gmail.com>
Date: Sun, 26 Oct 2014 23:12:13 +0000
Message-id: <20141026231213.GA28245@xibalba.demon.co.uk>
Mail-followup-to: debian-l10n-english@lists.debian.org, markus.hiereth@freenet.de
In-reply-to: <20141026183410.GA3844@lune>
References: <20141023215519.GA10013@lune> <20141024104226.GA27381@xibalba.demon.co.uk> <20141026183410.GA3844@lune>
markus.hiereth@freenet.de wrote:
>>> \fBw3m\fP [-t | -r | -M | -config \fIfile\fP | -I \fIcharset\fP | -O \fIcharset\fP ]
>>> .SS 
>> 
>> This fails to mention that it takes a file argument (or STDIN), and
>> falsely implies that -t, -r etc are mutually exclusive.  You want
>> something like 
>> 
>>  w3m [-M] [-r] [-config file] [-I charset] [-O charset] [-t tab] [filename]
>> 
>> with appropriate nroff highlighting; but why mention these particular
>> options and not, for instance, "+<num>" or "-W" or "-S"?  (Oh, wait,
>> the man page claims "-S" is "squeeze blanks", but no, that's "-s".)
> 
> I'm still undecided whether it makes sense to have the options
> explicitly listed instead of "[OPTIONS]" in the synopsis. 
> 
> The separate treatment of a "pager mode" and a "browser" mode was
> primarily meant to stress that w3m expects data from STDIN if no
> target file is given. One remaining problem: I never saw
> constructions like
> 
>   cat file | w3m
> or
>   STDIN | w3m
> or
>   w3m < STDIN
> 
> in the synopsis of a manpage. 

It's true, this is more EXAMPLES material (though I wouldn't use any
of these examples; if you really want a use of cat, make it something
slightly less useless like "cat *.txt | w3m").

In a synopsis you'd just say "program [ file | URL ]..." and explain
later that without an explicit argument or arguments it falls back on
STDIN (though in fact, wait, it falls back on $WWW_HOME, and *then*
STDIN).
 
>>> browse and file view mode 
> 
>> Unclear.  Do you mean "browser mode"?  What kinds of files can w3m
>> "view" that don't already fall under "pager" or "browser" (given that
>> those may have fancy MIME-handling)?
> 
> My assumptions for classification:  
> 
> capability:
> Read from STDIN     -,
> Display plain text  -+--,
> Display HTML        -+--+--,
> Follow URLs         -+--+--+--,
>                      |  |  |  |
>                      v  v  v  v
> 
> typical pager        y  y  n  n
> typical browser      n  n  y  y
> 

Or less contortedly,
 browserlike:	follow URLs and render HTML
 pagerlike:	read from STDIN and display it verbatim

But really it's slightly more complicated...

 if given an argument or arguments (or special startup option):
	for URLs, use "webby" mode (working out MIME types via headers)
	for path arguments, use "local" mode (relying on the filename)
 if given no argument:
	fall back on $WWW_HOME (may be a URL or path), *then* STDIN
	(with reduced ability to guess MIME types), *then* error.

"Webby" mode can still browse to .txt files, "local" mode can still
open foo.html as a rendered web page, and a single invocation of w3m
can handle separate targets in different "modes" at the same time.
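
The selection order above can be sketched as a small shell function.
This only models the behaviour as described in this thread; it is not
w3m's actual logic. The name input_source and the labels it prints are
made up for illustration, and treating anything containing "://" as a
URL is a simplifying assumption.

```shell
# Model of the input selection described above; not w3m's real code.
# input_source is a hypothetical helper that prints which source would be used.
input_source() {
    if [ "$#" -gt 0 ]; then
        for arg in "$@"; do
            case $arg in
                *://*) echo "webby: $arg" ;;   # URL: MIME type from headers
                *)     echo "local: $arg" ;;   # path: MIME type from filename
            esac
        done
    elif [ -n "${WWW_HOME:-}" ]; then
        echo "WWW_HOME: $WWW_HOME"             # may itself be a URL or a path
    elif [ ! -t 0 ]; then
        echo "STDIN"                           # piped or redirected input
    else
        echo "error: no input" >&2
        return 1
    fi
}

# Mixed targets in one invocation are each classified separately.
input_source http://localhost/foo.txt foo.html
```

Special startup options are left out of the sketch; it only covers the
argument, $WWW_HOME and STDIN cases.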

(I notice that you can't say "w3m -"; that isn't accepted either as
meaning STDIN or as the file "./-".  Not that I'm claiming anyone's
ever likely to want that functionality.)

[...]
>> Also of course you can use "pager" options in "browser" mode and
>> vice versa, but I suppose this is the sort of simplification that's
>> accepted.
> 
> You're right. There is no separation between pager mode and browser
> mode options. But there are combinations of little or no
> use. E.g. option -N when you use w3m as a pager for STDIN data.

It could be useful for "diffing" multiple versions of one text file...
though I'll probably forget it's possible before the next time I need
it.

> Such
> background made me mention the options -I and -O in the filter mode,
> because it is quite probable there that the user needs control over
> the decoding of (maybe unspecified) incoming data and the choice of
> how outgoing data are encoded.

I'm a bit suspicious of the -I and -O modes; Lynx used to have a lot
of charset-munging features which were basically workarounds for poor
locale handling, and had a lot of Japanese-centric assumptions.  See
also the -s, -e, and -j (expired) options.
 
>>> remote target data mode 
>  
>> Technically the header-dumping and so on also work on a local URL...
>> maybe it should be something like "HTTP data mode"?  But then again
>> what exactly *does* -m do?
>  
>>> \fBw3m\fP [-m | -dump_source | -dump_head | -dump_both | -dump_extra ]  \fIURL\fP 
>  
>> At least these presumably are mutually exclusive and do deserve the
>> vertical lines.
> 
> Yes, a combination of these dump-options makes no sense. Using the "|"
> was a mistake because it stands for OR.

As an argument in favour of having at least some of the options
explicitly explained like this in the synopsis rather than summarised
along the lines of "w3m [-opts] [arg]...", note that you can't say
"w3m -rM"; it has to be "w3m -r -M".

> What do You think of "target and HTTP meta data mode"?

I don't follow; it sounds as if it means "the mode of the target and
the HTTP meta data" or something.
 
> To avoid difficult questions of how to label "modes" of usage, one
> could shorten the section "SYNOPSIS" and extend section "EXAMPLES".

Or indeed the "third option" of grouping option-flags under explicit
subheadings in the OPTIONS section.

>>> .SS 
>>> filter mode 
>>> \fBw3m\fP [ -dump | -cols \fIN\fP | -I \fIcharset\fP | -O \fIcharset\fP | -T \fItype\fP ]  [ \fIfile\fP | \fIURL\fP ]
>> 
>> "-cols N" but not "-t tab"?
> 
> I tested it:
>   -t N only affects displayed files

It only affects text that hasn't been rendered as HTML, since that
squashes whitespace anyway.  But "w3m -t 1 http://localhost/foo.txt"
affects the tabs in foo.txt.

>   -cols N only affects how html-files are rendered to STDOUT 

Yes.  Odd that you can't do it "live" - I mean, except by adjusting
the actual window.

[...]
>>> and Japanese help files and an option menu and can be configured to
>>
>> Isn't your objective to create further help files?
> 
> This would be an argument to omit references to English and Japanese
> as available languages, wouldn't it?

Yes, we want to rip out the bits that were obviously written in a
previous millennium *and* the bits that might change next month!
 
>>> use either language. It will display hypertext markup language (HTML)
>>> documents containing links to files residing on the local system, as
> 
>> Again, this millennium we can just say "HTML documents", but there's a
>> bigger problem: it can display them even if they don't contain links
>> to files anywhere else at all.  This is a piece of gibberish that also
>> used to be in the package description for lynx-cur, but I managed to
>> get that fixed back in 2009 -
>> https://lists.debian.org/debian-l10n-english/2009/07/msg00054.html
>> 
>> All it's trying to say is something like
>> 
>>                        It can render local or remote web pages and
>>   follow links.
>> 
>> But once you've boiled it down to this it's fairly pointless.  That's
>> the minimum functionality required before we'll describe something as
>> a "web browser".
>> 
>>> well as files residing on remote systems. It can display HTML tables
>> 
>> The part about tables and frames is worth having, in that it makes it
>> clear that unlike Lynx it supports HTML 3.  On the other hand we
>> might mention that it has no support for CSS, JavaScript, etc.

(And let's not forget the "tabbed browsing", "graphics in an xterm",
and "iceweasel handover" selling points.)

>>> and frames.  In addition, it can be used as a "pager" in much the same
>>> manner as "more" or "less".  Current versions of \fBw3m\fP run on
>>> Unix (Solaris, SunOS, HP-UX, Linux, FreeBSD, and EWS4800) and on
>>> Microsoft Windows 9x/NT.
>  
>> How many of these operating systems still exist?
> 
> I don't know. I assumed that You regard this statement as obsolete too
> and dropped it.

If we knew enough to update it, I'd keep it - upstream might still
consider portability a major feature of w3m.
  
[...]
>>> .TP
>>> \fB-l \fIN\fR
>>> preserve N lines of STDIN input (default 10000)
>  
>> In English text that should be "10,000".  But what does it preserve
>> it against?
> 
> Is this separation with commas obligatory in English? The German
> equivalent with dots is obligatory. Introducing the comma would force
> the integer to be "translated" too. A strong reason to avoid the dubious
> helpers.

It's optional for four-digit numbers but normal for five-digit ones.
Some standards allow "10 000" for international compatibility - that's
supposed to be a thin non-breaking space, but you could get away with
a plain space.
 
> I did not test this setting, assuming that additional lines could get
> lost.

"printf '%s\n' {0..99999} | w3m" seems to get all of the input even
without -l.
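
To put a number on that instead of eyeballing the pager, one could
count lines in and out. A rough sketch: count_through is a made-up
helper, seq stands in for the bash-only {0..99999}, and it is an
assumption that w3m's non-interactive -dump path reads STDIN the same
way the interactive pager does.

```shell
# count_through CMD [ARGS...]: pipe 100000 numbered lines through CMD
# and print how many lines come out the other end.
count_through() {
    seq 0 99999 | "$@" | wc -l | tr -d ' '
}

# Baseline: a filter that passes everything through unchanged.
count_through cat
```

The comparison of interest would then be "count_through w3m -dump"
against "count_through w3m -l 10 -dump".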

MANUAL.html says:

    Specify line number preserved internally when reading text/plain
    document fron standard input. Default is 10000.

which sounds as if it's talking about... I don't know, maybe guessing
how long the file is going to be?

[...]
>>> .TP
>>> \fB-v\fP
>>> allows starting with no defined input via STDIN, file or URL
>  
>> Well, that's better than the nonsensical "visual mode", but it could
>> be clearer.  For a start, "w3m -v" isn't the same as saying plain
>> "w3m" (which goes to WWW_HOME).
> 
> As far as I know there is no WWW_HOME. Without target file or STDIN
> input, w3m puts out help lines.

Several text browsers honour $WWW_HOME if it's set in the environment;
it's obscure, but I don't think it's Debian-specific.  Aha, no indeed:

http://info.cern.ch/hypertext/WWW/LineMode/Defaults/Customisation.html

[...]
>>> .TP
>>> \fB-S\fP
>>> squeeze multiple blank lines
>> 
>> No, that's -s (which upstream claim has something to do with Japanese
>> legacy charsets.)
> 
> -e, -j and -s commandline options are not accepted by w3m here. It
> seems MANUAL.html is outdated.
> 
> Would You prefer a more detailed description? E.g:
> 
>  "replaces two and more blank lines of plain text files with a single one"

more(1) seems to think that "Squeeze multiple blank lines into one"
is clear enough; cat(1) is equally brief but completely different with
"suppress repeated empty output lines"; or if you want a more explicit
model to copy there are less(1) and most(1).
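
Whichever wording wins, the behaviour being described can be pinned
down with a small filter. A sketch in awk, assuming "blank" means
completely empty lines (as in cat -s); squeeze_blank is an invented
name, and this says nothing about how w3m implements its own option.

```shell
# squeeze_blank: collapse each run of empty lines into a single empty line.
squeeze_blank() {
    awk '
        /^$/ { if (!prev_empty) print; prev_empty = 1; next }
             { print; prev_empty = 0 }
    '
}

# Three empty lines between "one" and "two" become one.
printf 'one\n\n\n\ntwo\n\nthree\n' | squeeze_blank
```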
 
>>> .TP
>>> \fB-W\fP
>>> toggle wrap search mode
>> 
>>  toggle wrapping in searches
> 
> OK
> 
>>> .TP
>>> \fB-title\fP
>>> use the buffer name as terminal title string. 
>>> If TERM is specified, TERM style title configuration is used
>  
>> So it needs to say something like "\fB-title\fP [=TERM]".
> 
> On my system, I do not have the means to investigate what an addition
> of =TERM does. Here, only xterm is installed and the title bar
> indicates the content w3m is about to display. In addition, I have no
> idea about all this terminal stuff.
> 
> Therefore, I would omit [=TERM]. It just confuses.
> 
> Trials of Usage: 
> 
> input:               Title appears in the terminal window's title bar: 
> w3m -title ~         yes
> w3m -title=xterm ~   yes
> w3m -title=lxterm ~  no
> w3m -title=uxterm ~  no

Well, "=rxvt" works, but it's exactly the same as =xterm...
  
>>> .TP
>>> \fB-X\fP
>>> do not use termcap init/deinit
>  
>> Termcap doesn't exist on GNU/Linux, so maybe it should say
>>   do not use terminfo/termcap init/deinit
>> or
>>   do not initialize/deinitialize the terminal
> 
> I used Your description. Though I wonder about "initialization" of the
> terminal due to w3m. Usually, the terminal already runs when w3m is
> started. Are the terminal settings affected by a program that starts
> within the terminal?

They certainly are if the program is "setterm -reverse".  I'm not sure
what difference the initialisation makes, but it's easy to spot the
difference when w3m doesn't clean up after itself.

>>> height of N pixels per line. Range of 4.0 to 64.0.
>> 
>> Ditto.
> 
> I wonder whether it makes sense to mention options with no proof
> that they really do what they promise. I found no effect either
> for the options -ppc and -ppl or for the pixel_per_character and
> pixel_per_line parameters in the panel for setting options within
> w3m. And this applies to both xterm windows and tty
> terminals.
>
> As the output of w3m -version does not indicate that these scaling
> functions have been excluded from compilation, I would conclude that
> they are not at all correctly implemented in the source code itself.
> 
> Shall we ask the maintainer Tatsuya?

They certainly do *something*, for pages containing text in tables -
mainly "mess things up".  This makes no sense to me, since w3m seems
to cope perfectly adequately if you change the number of pixels per
character by changing your terminal font's point size.  Maybe it has
something to do with the way CJK characters come in "halfwidth" and
"fullwidth" variants?

>>> .TP
>>> \fB-dump\fP
>>> dump formatted/rendered page into STDOUT
>>> .\" bugreport 285251
>> 
>> Omit.
> 
> What is Your preference about the verbs to format and to render? I'm
> inclined to the latter and changed the text accordingly.

w3.org's HTML specs talk about "rendering" throughout.

>>> .TP
>>> \fB-dump_source\fP
>>> dump source code into STDOUT
>> 
>> The SourceForge versions are clearer:
>>   Read document specified by URL and dump the source.
> 
>> 
>>> .TP
>>> \fB-dump_head\fP
>>> dump response of HEAD request into STDOUT
>> 
>>    Read document specified by URL and dump the HEAD
> 
> According to the request.log file, w3m really uses a HEAD instead of a
> GET request in this context. Therefore I would not change the given
> explanation.

Oh, okay, I hadn't noticed that.

[...]
[snipping more and more as I go on]
[...]
>>> .TP
>>> Conversion of HTML content by \fBw3m\fP
>>> $ cat foo.html | w3m -dump -T text/html >foo.txt
>> 
>> More Useless Use Of Cat, plus UUO "-dump -T text/html" - you'd get the
>> same result from just:
>> 
>>   $ w3m foo.html > foo.txt
>> 
>> For a more plausible scenario try:
>> 
>>   web page rendering
>>   $ find -name "*.html" | xargs w3m | mail $USER
> 
> I tried two versions of
> 
>  $ find -name "*.html" | xargs w3m | mail $USER
> 
> Both led to error messages.

It's only going to work if you've got .html files lying around locally
and an MTA for mail to be delivered to - it works for me.

>  I think this is too extreme in the other
> direction, too ambitious.

If we want a *portable* example of w3m usefully taking multiple input
web pages, rendering them, and sending them to some other program...
well, the output end is easy enough; just make it " | pager".  But on
the input end the problem is that our reader may not have any HTML
files lying about locally, whereas if they're remote URLs it would
make more sense for w3m to fetch them itself rather than depend on an
external program on STDIN.
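
One way round the input-end problem would be for the example to create
its own throwaway pages first. This is a sketch only: make_demo_pages
and the file names are invented, ${PAGER:-cat} stands in for Debian's
"pager" command, and the w3m step is skipped if w3m isn't installed.

```shell
# make_demo_pages DIR: write two tiny HTML files into DIR (hypothetical helper).
make_demo_pages() {
    for n in 1 2; do
        printf '<html><head><title>Page %s</title></head>\n' "$n"
        printf '<body><h1>Page %s</h1><p>Some <b>bold</b> text.</p></body></html>\n' "$n"
    done > /dev/null
    for n in 1 2; do
        {
            printf '<html><head><title>Page %s</title></head>\n' "$n"
            printf '<body><h1>Page %s</h1><p>Some <b>bold</b> text.</p></body></html>\n' "$n"
        } > "$1/page$n.html"
    done
}

demo_dir=$(mktemp -d)
make_demo_pages "$demo_dir"

# Render every page in one w3m invocation and hand the text to a pager.
if command -v w3m >/dev/null 2>&1; then
    find "$demo_dir" -name '*.html' | xargs w3m -dump | ${PAGER:-cat}
fi
```

That keeps the example self-contained, at the cost of being rather
longer than a one-line EXAMPLES entry.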

[...]
>> (In fact if upstream were more active I'd suggest adopting
>> $XDG_CONFIG_HOME.)
> 
> I'm afraid that w3m is an orphan. On Friday, I explored how it treats
> domain data in HTTP communication and in cookies. (Our
> domain_avoid_wrong_number_of_dot parameter problem.) I got the
> impression that applying patches finally produced patchwork.
>  
>>> .TP
>>> \fC${HOME}/.w3m/keymap\fP
>>> user defined key binding, overrides default key binding
>>                    bindings;
>> The comma in all of these would be better as a semicolon (or stop).
> 
> OK
> Don't we need an additional s as well in "default key binding"? 

Oops, yes.
 
[...]
>>> Please see the MANUAL.html file distributed with \fBw3m\fP for
>>> more detailed documentation.
>> 
>> Actually this makes the man page more detailed than the MANUAL.html
>> file for a lot of stuff.
> 
> I share Your view on MANUAL.html. I aimed to improve the description
> of the options. MANUAL.html is poor where the manpage was poor too. 

There might be extra secrets we could winkle out of
/usr/share/doc/w3m/ja/MANUAL.html, if only I could even work out what
legacy character set that's using.

> A reason for still mentioning it are the links to example files (key
> bindings) and README-files it contains.
> 
> With respect to these README-files, it would make sense to refer to
> README.mouse, README.pre_form, README.cookie etc. in the section FILES
> above, where the respective configuration files are presented.

True.  The keymap.default file is useful, too...

[...]

Sorry, no time for a second round yet!
-- 
JBR	with qualifications in linguistics, experience as a Debian
	sysadmin, and probably no clue about this particular package