
Re: Re: Please review changed man-file of w3m



markus.hiereth@freenet.de wrote:
> Hello Justin,

I'm re-merging some frayed threads here, skimming through and just
answering the parts that still seem to be relevant.

>> (I've never quite understood how this is useful without -N; the only
>> way I know of skipping between arguments is via the history function.
>> Am I missing something?  Maybe it's an emacsism?)
> 
> I regard the stack of buffers as useful. It is an emacsism. You get
> the buffers listed with s and select one with arrow-up or arrow-down
> and return.

Is that when you run w3m out of Emacs?  Not being an Emacs user I'd
never tried that.  But now I notice that the default w3m config lets
you skip backwards in the bufferlist with "delete" - I'd reassigned
that keymapping years ago when w3m acquired tabbed-browsing.
 
>>>        -B     starts with default bookmark file ~/.w3m/bookmark.html
>>
>> The convention is to use uninflected (imperative?) verb forms: "start"
>> (cf "set tab width").
> 
> OK

It has finally occurred to me that I can definitely say these verb
forms are imperatives rather than just sentence-fragment infinitives
on the grounds that people say
	-foo		use foo
	-no-foo		don't use foo [never "not use foo"]
 
>> And maybe *all* of these "user defined" options should say what the
>> default is - or just say "see FILES".
> 
> I tend to mention files containing user data with the default
> information, i.e. to mention ~/.w3m/bookmark.html here. In contrast,
> the section FILES would only contain configuration files.

The bookmarks file lives in the config directory, so I would say
FILES is entitled to mention them.

>> This reminds me that unless we mean the package or the binary (which
>> are both named in lowercase) we should be talking about W3M.
> 
> Strange. Does this uppercase W3M then refer to this software in
> general, i.e. the source code delivered by the w3m project? Note that
> in my previous versions of the manpage, 'w3m' inside text paragraphs
> was already marked with bold letters.

A lot of software has a pattern of capitalisation in the "brand name"
that's lost in the command name ("GNU Emacs", "JavaScript", "LaTeX").
W3M isn't very insistent on this, but the upstream homepage does use
the allcaps version in its title line.  It also occurs in some
dialogues and page titles like the "W3M startup page".
 
[...]
>>>        -dump  dump rendered page into STDOUT
>> 
>> Add the implied "and exit"?  Maybe even group them with -help?
> 
> I do not think it necessary to mention that the dump options imply an exit.

Probably not for each one, but there needs to be some mention
somewhere of the distinction between persistent ("pager/browser")
terminal UI and the "and-exit" modes.
 
[...]
>>>        -cols N
>>>         combined  with -dump, HTML input is rendered to lines of N char-
>>>         acters length
>>
>> This probably shouldn't say "input", and it's also true for output
>> into a pipe even without -dump.
>>
>>    -cols N
>>        for rendered HTML output to a pipe or via -dump, use line
>>        widths of N characters
> 
> I used input because this means html code from all sources.

Yes, but the trouble is that "input" tends to mean only particular
sources.

> As far as
> I can see output into a pipe only works in combination with one of the
> dump options.

I'm not sure I follow.  For me, "w3m $URL | head" looks different
from "w3m -cols 5 $URL | head" even without a -dump option.

> My problem with "HTML output": I regard it as plain text output.

True: it should say something like "rendered output".  In fact it even
strips some of the rendering (colours, for instance).

> -cols
> acts on the HTML block element <p>, it is part of the rendering.

It acts on any element for which the renderer needs to know the width
of the terminal.

> -cols
> combined with -dump_source has no effect.

True - another reason for saying it affects "rendered output".
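
One way to check the effect of -cols on piped output is to compare the
longest rendered line with and without it.  This is only a sketch: the
helper name longest_line and the sample HTML are made up for the
illustration, and the w3m part is skipped if w3m isn't installed.

```shell
# Hypothetical helper: print the length of the longest line on STDIN.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Made-up sample input; any HTML with a long paragraph would do.
page='<p>one two three four five six seven eight nine ten eleven twelve</p>'

# Only attempt the comparison if w3m is actually available.
if command -v w3m >/dev/null 2>&1; then
    # Rendered output to a pipe, default width
    printf '%s\n' "$page" | w3m -T text/html -dump | longest_line
    # The same, but asking for a 20-column rendering
    printf '%s\n' "$page" | w3m -T text/html -cols 20 -dump | longest_line
fi
```

Swapping -dump for -dump_source in the second command should show that
-cols leaves the source untouched, as noted above.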
 
>>>        -post file
>>>         use POST method with file content
> 
> Meanwhile, I think this option also deserves a place among our
> mystery options. With it, w3m displays usage information, apparently
> taking no notice of anything else on the command line.

It works (or at least ignores the useless "-post file") for me.  The
usage info only appears if I do "w3m -post" without an argument (or
"w3m -post $URL" without a WWW_HOME, so that it takes $URL as my post
parameter and falls over on the lack of a target).

[...]
>>    TMPDIR
>>    WWW_HOME
>>    etc
>> I don't know if many of them are worth mentioning, but WWW_HOME makes
>> a major difference to the program's usefulness.
> 
> Just mentioning these variables would be ok. But I fear the effort of
> analysing their dependencies on other configuration approaches: Why
> does w3m invoke mutt as mail user agent instead of mailx although
> there is no trace of it in .w3m/config?

In my case W3M doesn't seem to launch mutt for "mailto:" links, which
is okay since I don't want to send mail out of my web browser anyway.
I don't see any mention of mutt in the upstream or Debian sources...

[-------------------------------suture-------------------------------]

In the options table runthrough:

>> "w3m -dump x.html" renders the page to STDOUT, which is just like what
>> "lynx -dump x.html" does; but the w3m equivalent of "lynx -dump x.html
>> | head" is simply "w3m x.html | head".  Sending stuff to STDOUT works
>> automatically; what "-dump" means is "and exit without starting the
>> browser", which is more like a "special startup".
> 
> As mentioned above. Your strict view on the option -dump is
> correct. Taking into account all the situations where data are
> delivered to STDOUT, a clear separation between special startups and
> filter mode is not possible anymore.

On further consideration there are really two kinds of "special":
 1) special targeting - options that affect W3M's target acquisition
	(its explicit CLI argument(s), then STDIN, then -B, then -v,
	then $WWW_HOME, then error)
 2) special output - options that cause W3M to send data to STDOUT
	(and exit) instead of presenting it in a persistent
	pager/browser/viewer; the data may be w3m's usual view of its
	target(s) or may be something triggered by the option
	(-version, -dump_source, etc).
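
The targeting order in (1) can be sketched as a shell function.  This
is purely an illustration of the order as I've described it, not W3M's
actual code; the function name pick_target and its five positional
parameters are invented for the example.

```shell
# Hypothetical illustration of the target-acquisition order described
# above; it does not reproduce w3m's real implementation.
#   $1 = explicit CLI argument (may be empty)
#   $2 = "yes" if there is data waiting on STDIN
#   $3 = "yes" if -B was given
#   $4 = "yes" if -v was given
#   $5 = value of WWW_HOME (may be empty)
pick_target() {
    if [ -n "$1" ]; then
        echo "argument: $1"
    elif [ "$2" = yes ]; then
        echo "stdin"
    elif [ "$3" = yes ]; then
        echo "bookmarks"
    elif [ "$4" = yes ]; then
        echo "startup page"
    elif [ -n "$5" ]; then
        echo "WWW_HOME: $5"
    else
        echo "error: no target" >&2
        return 1
    fi
}

pick_target "" no yes no example.org    # -B wins over WWW_HOME
```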

It's interesting that "w3m -v -dump_both" is a no-op!
 
>> | -cols		int	no	yes	no	yes	no	no	no	---	yes
>> 
>> The one that's like setting $COLUMNS for rendered output to STDOUT.
> 
> You wrote to export such variables. A necessary hint, because at first
> I had only defined them within xterm. After exporting, WWW_HOME was
> effective as you described. 

It's also possible to define them just for a single command:
 WWW_HOME=example.org w3m
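
To make the three cases concrete, here's a small sketch in which a
child "sh" stands in for w3m and just reports what it inherits.  The
helper name show_home is made up for the example.

```shell
# Hypothetical helper: a child process that reports the WWW_HOME it sees.
show_home() {
    sh -c 'echo "${WWW_HOME:-unset}"'
}

unset WWW_HOME
WWW_HOME=example.org            # plain shell variable: not inherited
plain=$(show_home)

export WWW_HOME                 # exported: inherited by later commands
exported=$(show_home)

unset WWW_HOME
# Defined for this one command only
once=$(WWW_HOME=example.org sh -c 'echo "${WWW_HOME:-unset}"')
after=$(show_home)              # and gone again afterwards

echo "plain=$plain exported=$exported once=$once after=$after"
```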
 
> But with COLUMNS, the following commands and requests for the actual
> value would indicate, that it is not kept. After one invocation of
> w3m, a value set to 40 is back to the previous system value of 80.
> 
> hiereth@lune:/tmp$ export COLUMNS=40
> hiereth@lune:/tmp$ export | grep COLUMNS
> declare -x COLUMNS="40"
> hiereth@lune:/tmp$ cat 1.txt | w3m | more
> 1
> hiereth@lune:/tmp$ export | grep COLUMNS
> declare -x COLUMNS="80"
> 
> I am again confused on which level these variables exist and are
> available.

COLUMNS and LINES are a confusing special case - the shell (bash) tries
to keep them constantly updated.  You can disable that behaviour with
 shopt -u checkwinsize
But it turns out W3M doesn't pay any attention to COLUMNS anyway.
If you want to see something that does, try:
 dpkg -l
 COLUMNS=100 dpkg -l
 
[...]
>> Strangely, w3m -dump_source nntp://server/group/ gives me a page of
>> HTML tags (though going to nntp://server/group/msgnum gives just the
>> individual message).
> 
> Probably, this is due to a built in procedure that goes through
> directories and shows the files therein. This procedure supports
> browsing in local directories, in ftp archives. It appears that the
> gathered information is drawn into an HTML file.

Oh, so it is.  Come to think of it we haven't said anything yet about
the "pager" being able to handle "w3m ./"
  
>> | -post		string	---	---	---	---	yes	---	---	---	---
> 
>> The point of this seems to be to let you submit data through a
>> <form method="post"> without ever actually visiting that page.
>> Apparently that's a useful thing to be able to do, since lynx has
>> the similar -post_data...
> 
> Nevertheless. w3m just gives me error messages with -post. I just
> tried to use it the way it is described for lynx, i.e. with data from STDIN and the local apache2.
> 
>   cat 1.txt | w3m -post http://lune/testen/test-formular.html
> 
> The file access.log shows no POST request. Therefore, this option
> should not be presented. Additionally, there is no documentation about
> the syntax of this file containing values for POST variables.

With lynx it's "-opt < ./file", but with w3m it needs a parameter
("string" above, though to be specific it has to be a filename).

I think I understand how it works, but I'm having trouble finding an
example of a page with a POSTable form that I could reasonably test
this on.

>> | -header	string	---	---	---	---	no*1	no	yes	---	---
>> 
>> A general-purpose override for things like user-agent strings?  
> 
> In this case, and apparently in every case where a header is given,
> it is sent in addition to the usual ones. In reqlog.txt there is an
> extra line, provided the string matched the syntax var: value. 

So another case where we understand it in principle, we just don't
have an obvious use-case.
 
[...]
>> | -no-mouse	---	---	---	yes	no	no	yes	yes	yes	---
>> 
>> I suppose this exists for when you want to be able to mouseclick on a
>> web page to select text without having to worry that you might click
>> on a link.
> 
> Yes. With this option, you can select text, which is static in the
> terminal window. Otherwise, the content moves with the pressed mouse
> button and mouse movement.

You *can* select in w3m, you just need to hold shift at the same time.
  
[...]
>> | -title	?	yes	yes	yes	no	yes	yes	yes	yes	no
[...]
>> Oh! -title=screen does something different and innovatively broken.
> 
> screen is one of the filenames listed in the output of ls -R /lib/terminfo/.
> It is you who knows something about terminfo and termcap.

I know termcap is the old UNIX one and terminfo is the newer
replacement (invented in 1981!), but not much else.  "screen" has to
do some clever messing around with terminfo features since it's
effectively a "virtual terminal manager".

[...]

[-------------------------------suture-------------------------------]

> 1.
[...]
> I just had a look on 
> 
>   https://lists.debian.org/debian-l10n-english/2014/10/msg00045.html
> 
> and it is the first time that I have seen the consequences of the
> implementation of character markup appear on my screen.
> 
>   NNAAMMEE
> 
> In contrast
> 
>   $ man w3m | w3m -r
>   $ man w3m | w3m   
> 
> made no difference. 

Oh, I forgot to check "man w3m | w3m"!  "man" must be another piece of
software that sanitises its output automatically when it's being fed
into a pipe, so the -r makes no difference here.
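
A quick way to test that guess is to count the backspaces that actually
come down the pipe.  This is only a sketch: the helper name
count_backspaces is made up, and the MAN_KEEP_FORMATTING variable is an
assumption that the man in question is man-db (other implementations
may ignore it, or may emit ANSI escapes rather than backspaces).

```shell
# Hypothetical helper: count the backspace characters arriving on STDIN.
count_backspaces() {
    tr -cd '\b' | wc -c | tr -d ' '
}

# Only try this where a man command exists.
if command -v man >/dev/null 2>&1; then
    # Default behaviour when the output goes into a pipe
    man man 2>/dev/null | count_backspaces
    # Assumption: man-db, which keeps the formatting when this is set
    MAN_KEEP_FORMATTING=1 man man 2>/dev/null | count_backspaces
fi
```

If the first count is zero and the second isn't, the sanitising is
happening in man rather than in w3m.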

> My plain text dump of the man-page draft, created with 
> 
>   groff -Tascii -man file.1 > file.1.txt 
> 
> apparently contains these markup constructions but I never noticed them. 
> 
> use of   in xterm         in tty
> less -r  bold/underlined  bold/cyan
> less -R  bold/underlined  bold/cyan
> less -u  no markup        no markup
> less -U  N^HN             N^HN

(For some reason I long ago forgot, my default pager is "most", which
I've got configured to show bold as blue and italics as red.)

> w3m      bold/cyan        bold/cyan

That's odd - I get that on a TTY, but in an xterm I only see bold and
underline, no cyan.

> w3m -r   N^HN             N^HN
> emacs    N^HN             N^HN
> nano     N^HN             N^HN
> 
> With this experience, I would replace
> 
>  -r ignore underline or bolding markup constructions that use
>     backspace (e.g. in nroff)
> 
> with
> 
>  -r display markup constructions with backspace characters verbatim
>     (default is to vary font, e.g. printing bold or underlined)

You're right, I hadn't noticed W3M does this "backwards".  "Verbatim"
is a bit confusing - which is more verbatim, "^H" or a literal
backspace-and-doublestrike? - so maybe we should copy less's phrasing
more directly:

   -r use caret notation to display backspace characters in nroff-style
      markup (default is to vary font, e.g. printing bold or underlined)

Oh, wait, another test shows me that w3m supports ANSI colour escapes
too, and again -r converts the special characters into caret notation
(in this case ESCAPE to "^[").

   -r use caret notation to display special escape characters in text
      (such as ANSI escapes or nroff-style backspaces) instead of
      processing them as colored or otherwise highlighted text.
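
For anyone who wants to see the backspace case without a manpage to
hand, here's a minimal sketch that builds an nroff-style bold "NAME" by
hand and shows it three ways.  It stands in for what the tools do and
doesn't call w3m itself; the variable names are just for illustration.

```shell
# Build nroff-style bold "NAME": each letter is typed, backspaced
# over, and typed again.
overstruck=$(printf 'N\bNA\bAM\bME\bE')
bs=$(printf '\b')

# Caret notation, roughly what "w3m -r" or "less -U" displays
caret=$(printf '%s' "$overstruck" | cat -v)

# Backspaces simply dropped, which gives the doubled letters seen
# in the web archive
doubled=$(printf '%s' "$overstruck" | tr -d '\b')

# Each "letter + backspace" pair removed, leaving plain text
plain=$(printf '%s' "$overstruck" | sed "s/.${bs}//g")

printf '%s\n' "$caret" "$doubled" "$plain"
```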

[...]
> 3. I forgot to introduce a section ENVIRONMENT VARIABLES 
> 
> WWW_HOME is certainly worth being mentioned. What else?

I don't know if there are any others that matter.  Mind you, with all
the surprises I've found when I doublecheck how w3m behaves, I begin
to daydream that setting CENTURY=21 might deactivate Gopher support in
favour of CSS3.
-- 
JBR	with qualifications in linguistics, experience as a Debian
	sysadmin, and probably no clue about this particular package

