Re: Re: Please review changed man-file of w3m

To: debian-l10n-english@lists.debian.org
Subject: Re: Re: Please review changed man-file of w3m
From: markus.hiereth@freenet.de
Date: Wed, 29 Oct 2014 09:39:02 +0100
Message-id: <20141029083902.GA3951@lune>
In-reply-to: <20141026231213.GA28245@xibalba.demon.co.uk>
Hello Justin,

Just for your information: the changes in version _3 should also
reflect the hints and corrections from your second mail of Monday 27th.

Best Regards
Markus 

----

>>>> \fBw3m\fP [-t | -r | -M | -config \fIfile\fP | -I \fIcharset\fP | -O \fIcharset\fP ]
>>>> .SS
>>>
>>> This fails to mention that it takes a file argument (or STDIN), and
>>> falsely implies that -t, -r etc are mutually exclusive.  You want
>>> something like
>>>
>>>  w3m [-M] [-r] [-config file] [-I charset] [-O charset] [-t tab] [filename]
>>>
>>> with appropriate nroff highlighting; but why mention these particular
>>> options and not, for instance, "+<num>" or "-W" or "-S"?  (Oh, wait,
>>> the man page claims "-S" is "squeeze blanks", but no, that's "-s".)
>>
>> I'm still undecided whether it makes sense to have the options
>> explicitly listed instead of "[OPTIONS]" in the synopsis.
>>
>> The separate treatment of a "pager mode" and a "browser" mode was
>> primarily meant to stress that w3m expects data from STDIN if no
>> target file is given. One remaining problem: I never saw
>> constructions like
>>
>>   cat file | w3m
>> or
>>   STDIN | w3m
>> or
>>   w3m < STDIN
>>
>> in the synopsis of a manpage.
>
>It's true, this is more EXAMPLES material (though I wouldn't use any
>of these examples; if you really want a use of cat, make it something
>slightly less useless like "cat *.txt | w3m").
>
>In a synopsis you'd just say "program [ file | URL ]..." and explain
>later that without an explicit argument or arguments it falls back on
>STDIN (though in fact, wait, it falls back on $WWW_HOME, and *then*
>STDIN).

The SYNOPSIS section is now cut back to options and target.

I set WWW_HOME as a local variable in xterm. w3m ignores it.
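One possible explanation, offered as an assumption and not as a verified diagnosis: a variable that is set in the shell but not exported never reaches a child process such as w3m. The sketch below shows the difference, using "sh -c" as a stand-in for the child process; the URL is only a placeholder.

```shell
# The URL is a hypothetical home page; any URL or path would do.
unset WWW_HOME
WWW_HOME=http://www.debian.org/     # plain shell variable, not exported
seen_local=$(sh -c 'printf "%s" "${WWW_HOME:-unset}"')

export WWW_HOME                     # now part of the environment
seen_exported=$(sh -c 'printf "%s" "${WWW_HOME:-unset}"')

echo "child process without export: $seen_local"
echo "child process with export:    $seen_exported"
```

If the same holds for the xterm session, "export WWW_HOME" before starting w3m would be worth a try.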


>>>> browse and file view mode
>>
>>> Unclear.  Do you mean "browser mode"?  What kinds of files can w3m
>>> "view" that don't already fall under "pager" or "browser" (given that
>>> those may have fancy MIME-handling)?
>>
>> My assumptions for classification:
>>
>> capability:
>> Read from STDIN     -,
>> Display plain text  -+--,
>> Display HTML        -+--+--,
>> Follow URLs         -+--+--+--,
>>                      |  |  |  |
>>                      v  v  v  v
>>
>> typical pager        y  y  n  n
>> typical browser      n  n  y  y
>>
>
>Or less contortedly,
> browserlike:   follow URLs and render HTML
> pagerlike: read from STDIN and display it verbatim
>
>But really it's slightly more complicated...
>
> if given an argument or arguments (or special startup option):
>    for URLs, use "webby" mode (working out MIME types via headers)
>    for path arguments, use "local" mode (relying on the filename)
> if given no argument:
>    fall back on $WWW_HOME (may be a URL or path), *then* STDIN
>    (with reduced ability to guess MIME types), *then* error.
>
>"Webby" mode can still browse to .txt files, "local" mode can still
>open foo.html as a rendered web page, and a single invocation of w3m
>can handle separate targets in different "modes" at the same time.
>
>(I notice that you can't say "w3m -"; that isn't accepted either as
>meaning STDIN or as the file "./-".  Not that I'm claiming anyone's
>ever likely to want that functionality.)

I introduced these explanations of the possible sources of data.
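
The order of fallbacks quoted above can be sketched as a small shell function. This is only an illustration of the description, not of w3m's source code: pick_source is a hypothetical helper, and the test "contains ://" is a simplifying assumption about what counts as a URL.

```shell
# pick_source is a hypothetical helper, not part of w3m; it only mirrors
# the order of fallbacks described in the quoted text.
pick_source() {
    if [ "$#" -gt 0 ]; then
        for target in "$@"; do
            case $target in
                *://*) echo "webby: $target" ;;  # URL: MIME type from headers
                *)     echo "local: $target" ;;  # path: type from the file name
            esac
        done
    elif [ -n "${WWW_HOME:-}" ]; then
        echo "home: $WWW_HOME"                   # may be a URL or a path
    elif [ ! -t 0 ]; then
        echo "stdin"                             # data is piped or redirected
    else
        echo "error: no target" >&2
        return 1
    fi
}
```

Called as "pick_source http://www.debian.org/ notes.txt" it reports one line per target, which matches the remark that a single invocation can handle targets in different "modes".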


>>>> remote target data mode
>
>>> Technically the header-dumping and so on also work on a local URL...
>>> maybe it should be something like "HTTP data mode"?  But then again
>>> what exactly *does* -m do?
>
>>>> \fBw3m\fP [-m | -dump_source | -dump_head | -dump_both | -dump_extra ]  \fIURL\fP
>>
>>> At least these presumably are mutually exclusive and do deserve the
>>> vertical lines.
>>
>> Yes, a combination of these dump-options makes no sense. Using the "|"
>> was a mistake because it stands for OR.
>
>As an argument in favour of having at least some of the options
>explicitly explained like this in the synopsis rather than summarised
>along the lines of "w3m [-opts] [arg]...", note that you can't say
>"w3m -rM"; it has to be "w3m -r -M".
>
>> What do You think of "target and HTTP meta data mode"?
>
>I don't follow; it sounds as if it means "the mode of the target and
>the HTTP meta data" or something.
>
>> To avoid difficult questions of how to label "modes" of usage, one
>> could shorten the section "SYNOPSIS" and extend section "EXAMPLES".
>
>Or indeed the "third option" of grouping option-flags under explicit
>subheadings in the OPTIONS section.

With version _3, we try it this way.

>>>> and Japanese help files and an option menu and can be configured to
>>>
>>> Isn't your objective to create further help files?
>>
>> This would be an argument to omit references to English and Japanese
>> as available languages, wouldn't it?
>
>Yes, we want to rip out the bits that were obviously written in a
>previous millennium *and* the bits that might change next month!

I erased the reference to the available languages.


>>>> use either language. It will display hypertext markup language (HTML)
>>>> documents containing links to files residing on the local system, as
>>
>>> Again, this millennium we can just say "HTML documents", but there's a
>>> bigger problem: it can display them even if they don't contain links
>>> to files anywhere else at all.  This is a piece of gibberish that also
>>> used to be in the package description for lynx-cur, but I managed to
>>> get that fixed back in 2009 -
>>> https://lists.debian.org/debian-l10n-english/2009/07/msg00054.html
>>>
>>> All it's trying to say is something like
>>>
>>>                        It can render local or remote web pages and
>>>   follow links.
>>>
>>> But once you've boiled it down to this it's fairly pointless.  That's
>>> the minimum functionality required before we'll describe something as
>>> a "web browser".
>>>
>>>> well as files residing on remote systems. It can display HTML tables
>>>
>>> The part about tables and frames is worth having, in that it makes it
>>> clear that unlike Lynx it supports HTML 3.  On the other hand we
>>> might mention that it has no support for CSS, JavaScript, etc.
>
>(And let's not forget the "tabbed browsing", "graphics in an xterm",
>and "iceweasel handover" selling points.)

These points are mentioned now too.


>>>> and frames.  In addition, it can be used as a "pager" in much the same
>>>> manner as "more" or "less".  Current versions of \fBw3m\fP run on
>>>> Unix (Solaris, SunOS, HP-UX, Linux, FreeBSD, and EWS4800) and on
>>>> Microsoft Windows 9x/NT.
>>
>>> How many of these operating systems still exist?
>>
>> I don't know. I assumed that You regard this statement as obsolete too
>> and dropped it.

>If we knew enough to update it, I'd keep it - upstream might still
>consider portability a major feature of w3m.

I wrote two further mails, hoping for support from one of the
developers.


>[...]
>>>> .TP
>>>> \fB-l \fIN\fR
>>>> preserve N lines of STDIN input (default 10000)
>>
>>> In English text that should be "10,000".  But what does it preserve
>>> it against?
>>
>> Is this separation with commas obligatory in English? The German
>> equivalent with dots is obligatory. Introducing the comma would force
>> the integer to be translated too. A strong reason to avoid the dubious
>> helpers.

>It's optional for four-digit numbers but normal for five-digit ones.
>Some standards allow "10 000" for international compatibility - that's
>supposed to be a thin non-breaking space, but you could get away with
>a plain space.

We should use a plain space.

Correction: The German equivalent is not obligatory! Dots are
sometimes used, but more often the digits simply appear in groups of
three.


>> I did not test this setting, assuming that additional lines could get
>> lost.
>
>"printf '%s\n' {0..99999} | w3m" seems to get all of the input even
>without -l.
>
>MANUAL.html says:
>
>    Specify line number preserved internally when reading text/plain
>    document fron standard input. Default is 10000.
>
>which sounds as if it's talking about... I don't know, maybe guessing
>how long the file is going to be?

Changed to follow the explanation in MANUAL.html.


>>>> .TP
>>>> \fB-v\fP
>>>> allows starting with no defined input via STDIN, file or URL
>>
>>> Well, that's better than the nonsensical "visual mode", but it could
>>> be clearer.  For a start, "w3m -v" isn't the same as saying plain
>>> "w3m" (which goes to WWW_HOME).
>>
>> As far as I know there is no WWW_HOME. Without target file or STDIN
>> input, w3m puts out help lines.
>
>Several text browsers honour $WWW_HOME if it's set in the environment;
>it's obscure, but I don't think it's Debian-specific.  Aha, no indeed:
>
>http://info.cern.ch/hypertext/WWW/LineMode/Defaults/Customisation.html

I noticed who wrote that piece. With this manpage, we proceed at a
turtle's pace. On the other hand, with your reference to Tim
Berners-Lee's explanation of what is to be expected from a browser, I
somehow had the pleasure of picking up a stone from the stone age of
the web.


>[...]
>>>> .TP
>>>> \fB-S\fP
>>>> squeeze multiple blank lines
>>>
>>> No, that's -s (which upstream claim has something to do with Japanese
>>> legacy charsets.)
>>
>> -e, -j and -s commandline options are not accepted by w3m here. It
>> seems MANUAL.html is outdated.
>>
>> Would You prefer a more detailed description? E.g:
>>
>>  "replaces two and more blank lines of plain text files with a single one"

>more(1) seems to think that "Squeeze multiple blank lines into one"
>is clear enough; cat(1) is equally brief but completely different with
>"suppress repeated empty output lines"; or if you want a more explicit
>model to copy there are less(1) and most(1).

I copied the explanation given in more(1). I just noticed that we are
dealing with -s here, and that -S is no longer accepted.

I filed a bug report about this.
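
For comparison, the behaviour that the more(1) wording describes can be shown with cat -s, which is available in the GNU and BSD versions of cat. That w3m -s treats plain text the same way is an assumption based on the description, not something tested here.

```shell
# Three consecutive empty lines sit between "first" and "second";
# cat -s squeezes them into a single empty line.
sample=$(printf 'first\n\n\n\nsecond\n' | cat -s)
printf '%s\n' "$sample"
```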


>> Therefore, I would omit [=TERM]. It just confuses.
>>
>> Trials of Usage:
>>
>> input:               Title appears in the terminal window's title bar:
>> w3m -title ~         yes
>> w3m -title=xterm ~   yes
>> w3m -title=lxterm ~  no
>> w3m -title=uxterm ~  no

>Well, "=rxvt" works, but it's exactly the same as =xterm...

I still tend towards removing [=TERM].

[...]

>>>> height of N pixels per line. Range of 4.0 to 64.0.
>>>
>>> Ditto.
>>
>> I wonder whether it makes sense to mention options with no proof
>> that they really do what was promised. I found no effect for the
>> options -ppc and -ppl, nor for the pixel_per_character and
>> pixel_per_line parameters in the panel for setting options within
>> w3m. And this applies to both xterm windows and tty terminals.
>>
>> As the output of w3m -version does not indicate that these scaling
>> functions have been excluded from compilation, I would conclude that
>> they are not at all correctly implemented in the source code itself.
>>
>> Shall we ask the maintainer Tatsuya?
>
>They certainly do *something*, for pages containing text in tables -
>mainly "mess things up".  This makes no sense to me, since w3m seems
>to cope perfectly adequately if you change the number of pixels per
>character by changing your terminal font's point size.  Maybe it has
>something to do with the way CJK characters come in "halfwidth" and
>"fullwidth" variants?

-ppc and -ppl will be moved to the group of options labelled "not
tested, DO NOT USE".


>>>> .TP
>>>> \fB-dump\fP
>>>> dump formatted/rendered page into STDOUT
>>>> .\" bugreport 285251
>>>
>>> Omit.
>>
>> What is Your preference about the verbs to format and to render? I'm
>> inclined to the latter and changed the text accordingly.
>
>w3.org's HTML specs talk about "rendering" throughout.

OK

>>>> Conversion of HTML content by \fBw3m\fP
>>>> $ cat foo.html | w3m -dump -T text/html >foo.txt
>>>
>>> More Useless Use Of Cat, plus UUO "-dump -T text/html" - you'd get the
>>> same result from just:
>>>
>>>   $ w3m foo.html > foo.txt
>>>
>>> For a more plausible scenario try:
>>>
>>>   web page rendering
>>>   $ find -name "*.html" | xargs w3m | mail $USER
>>
>> I tried two versions of
>>
>>  $ find -name "*.html" | xargs w3m | mail $USER
>>
>> Both led to error messages.
>
>It's only going to work if you've got .html files lying around locally
>and an MTA for mail to be delivered to - it works for me.

Later, I noticed that two mails had been created. So it worked, but it
yielded error messages as well.


>> I think this is too extreme in the other
>> way, too ambitious.
>
>If we want a *portable* example of w3m usefully taking multiple input
>web pages, rendering them, and sending them to some other program...
>well, the output end is easy enough; just make it " | pager".  But on
>the input end the problem is that our reader may not have any HTML
>files lying about locally, whereas if they're remote URLs it would
>make more sense for w3m to fetch them itself rather than depend on an
>external program on STDIN.

We should deliver a couple of examples showing
- pager-like, browser-like, filter-like usage
- usage with options that affect character encoding and type of content
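
A first, untested sketch of what such examples might look like. The options -dump, -T, -I and -O are the ones discussed in this thread; the file names and charset names are hypothetical, and whether -I and -O accept these names depends on how w3m was compiled.

```shell
# All file names are hypothetical; the sample page is created here so
# that the sketch does not depend on existing files.
workdir=$(mktemp -d)
printf '<html><body><h1>Title</h1><p>Some text.</p></body></html>\n' \
    > "$workdir/sample.html"

if command -v w3m >/dev/null 2>&1; then
    # filter-like: render HTML from STDIN, write plain text to STDOUT
    w3m -dump -T text/html < "$workdir/sample.html" > "$workdir/sample.txt"

    # character encoding: declare the input charset, choose the output one
    w3m -dump -I ISO-8859-1 -O UTF-8 "$workdir/sample.html" \
        > "$workdir/sample-utf8.txt"
else
    echo "w3m not found; skipping the filter-like examples" >&2
fi

# Pager-like and browser-like usage is interactive, so it is only shown
# as comments:
#   w3m "$workdir/sample.txt"
#   w3m http://www.debian.org/
```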
References:
- Re: Please review changed man-file of w3m
  - From: Justin B Rye <justin.byam.rye@gmail.com>