Re: Problems with wget

To: tomas@tuxteam.de
Cc: debian-user <debian-user@lists.debian.org>
Subject: Re: Problems with wget
From: Richard Owlett <rowlett@cloud85.net>
Date: Mon, 26 Feb 2018 07:00:24 -0600
Message-id: <[🔎] 85a5a7bf-73fb-58b7-54ae-970143087654@cloud85.net>
In-reply-to: <[🔎] 20180226125044.GD19876@tuxteam.de>
References: <[🔎] c3ccc372-efcb-b3b2-0a99-b738a35cd398@cloud85.net> <[🔎] 20180226125044.GD19876@tuxteam.de>

On 02/26/2018 06:50 AM, tomas@tuxteam.de wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Feb 26, 2018 at 06:40:02AM -0600, Richard Owlett wrote:

I'm attempting to download a site which is an instruction manual.
Its URL is of the form
    http://example.com/index.html
That page has several lines whose target URLs are of form
    http://example.com/page1.html
    http://example.com/page2.html
    http://example.com/page3.html
   etc.

I wish a single HTML file consisting of all the pages of the site.
Where <http://example.com/index.html> points to
<http://example.com/pageN.html> I wish my local file to have
appropriate internal references.
There are references of form
    http://some_where_else.com/pagex.html
which I do not wish to download.



I tried
wget  -l 2 -O owl.html ‐‐no-parent http://example.com/index.html
It *almost* worked as intended.
I did get all the text of the site.

HOWEVER:
   1. I also got the text of <http://some_where_else.com/pagex.html>
   2. Where <http://example.com/index.html> referenced
      <http://example.com/pageN.html> there were still references to
      the original site rather than a relative link within owl.html .



Ad (1): this is strange. By default wget doesn't "span" hosts,
   i.e. doesn't follow links to other hosts unless you specify
   that with -H (--span-hosts).

Ad (2) you want option -k. Quoth the man page:

        -k
        --convert-links
            After the download is complete, convert the links in
            the document to make them suitable for local viewing...



However, that usage causes wget to complain saying:

Cannot specify both -k or --convert-file-only and -O if multiple URLs are given, or in combination
with -p or -r. See the manual for details.


<frown;>

Reply to:

Follow-Ups:
- Re: Problems with wget
  - From: tomas@tuxteam.de

References:
- Problems with wget
  - From: Richard Owlett <rowlett@cloud85.net>
- Re: Problems with wget
  - From: <tomas@tuxteam.de>

Prev by Date: Re: Problems with wget
Next by Date: Re: Problems with wget
Previous by thread: Re: Problems with wget
Next by thread: Re: Problems with wget
Index(es):
- Date
- Thread