
Re: Convert HTML document to use relative links?



Owen Townend wrote:
2008/9/3 Andre Majorel <aym-naibed@teaser.fr>:
Is there a program to make all links relative in HTML documents
saved in wget -x fashion? (http://foo.com/a/b.html saved as
./foo.com/a/b.html.)

For example,

- if ./foo.com/a/b.html contains <img src="/images/d.jpg">
 and                             ./foo.com/images/d.jpg
 exists, replace that tag by     <img src="../images/d.jpg">

- if ./foo.com/a/b.html contains <a href="http://bar.org/c.html">
 and                                          ./bar.org/c.html
 exists, replace that tag by     <a href="../../bar.org/c.html">
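The rule in the two examples can be sketched in Python, assuming the page path is given relative to the mirror root in wget -x layout (host directory, then URL path) and that the script runs from that root. `relative_link` is a hypothetical helper, not part of wget; it ignores query strings, ports and percent-encoding, and it assumes a URL ending in "/" was saved as index.html.

```python
import os
import posixpath
from urllib.parse import urljoin, urlsplit


def relative_link(page, ref, exists=os.path.exists):
    """Return a relative link from `page` to the file `ref` points at, or None.

    page:   path of the referring file under the mirror root,
            e.g. "foo.com/a/b.html" (wget -x layout: host/path)
    ref:    raw href/src value found in that file
    exists: predicate telling whether a host/path file is in the mirror
            (default: os.path.exists, i.e. run from the mirror root)
    """
    # Rebuild the page's original URL so urljoin resolves "/x", "x", "../x"
    # and absolute URLs the same way a browser would.
    absolute = urljoin("http://" + page, ref)
    parts = urlsplit(absolute)

    # Skip mailto:, javascript:, ftp: and URLs with a query string, whose
    # saved file names this sketch does not try to reproduce.
    if parts.scheme not in ("http", "https") or not parts.hostname or parts.query:
        return None

    # Assumption: a URL ending in "/" was saved as index.html.
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    target = parts.hostname + path

    if not exists(target):
        return None  # not mirrored: leave the link alone

    # Relative path from the page's directory to the target, keeping any #fragment.
    rel = posixpath.relpath(target, posixpath.dirname(page))
    return rel + ("#" + parts.fragment if parts.fragment else "")
```

With a mirror containing foo.com/a/b.html, foo.com/images/d.jpg and bar.org/c.html, it maps "/images/d.jpg" to "../images/d.jpg" and "http://bar.org/c.html" to "../../bar.org/c.html", matching the examples above; links to files that were not mirrored come back as None and can be left untouched.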

I know about wget -k and it doesn't do what I need. My goal is to use
wget or some such to have an exact mirror of the web site and then
make a _copy_ of the mirror that can be navigated off-line.
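Since the point is to keep the exact mirror untouched, the rewriting can run on a copy of it. The following is a minimal sketch of that pass in Python; `copy_and_rewrite`, `rewrite_html` and the `resolve(page, value)` callback are hypothetical names. `resolve` is expected to return the new link, or None to keep the attribute as it is, for instance by applying the rule from the two examples above. Being regex-based, it only touches quoted href and src attributes in .html/.htm files; unquoted attributes, srcset, CSS url(...), <base href> and links built by JavaScript are left alone, and a real HTML parser would be the more robust choice.

```python
import os
import re
import shutil

# Quoted href="..." / src='...' attributes only; group 3 is the link value.
ATTR_RE = re.compile(r"""(\b(?:href|src)\s*=\s*)(["'])(.*?)\2""",
                     re.IGNORECASE | re.DOTALL)


def rewrite_html(text, page, resolve):
    """Replace each href/src value for which resolve(page, value) is not None."""
    def repl(m):
        new = resolve(page, m.group(3))
        if new is None:
            return m.group(0)  # keep the attribute exactly as it was
        return m.group(1) + m.group(2) + new + m.group(2)
    return ATTR_RE.sub(repl, text)


def copy_and_rewrite(mirror, offline, resolve):
    """Copy the exact mirror to `offline`, then rewrite links in the copy only."""
    shutil.copytree(mirror, offline)  # raises if `offline` already exists
    for dirpath, _dirs, files in os.walk(offline):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            # Page path relative to the mirror root, e.g. "foo.com/a/b.html".
            page = os.path.relpath(path, offline).replace(os.sep, "/")
            # latin-1 maps every byte to one character and newline="" disables
            # line-ending translation, so apart from the rewritten attributes
            # the bytes of the file are written back unchanged.
            with open(path, encoding="latin-1", newline="") as f:
                text = f.read()
            with open(path, "w", encoding="latin-1", newline="") as f:
                f.write(rewrite_html(text, page, resolve))
```

For example, `copy_and_rewrite("mirror", "offline", resolve)` would copy ./mirror to ./offline and edit only the files under ./offline. The resolver should check for existence against one of the two trees, e.g. with a predicate such as `lambda p: os.path.exists(os.path.join("mirror", p))`.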

One way to do this, without downloading everything twice, might be
something like this:

1) wget from foo.com to bar.local as an exact mirror
2) Apache virtual host serving the exact mirror as foo.com
3) temporary hosts line / DNS entry, either on bar.local or on your
workstation, aliasing foo.com to bar.local
4) wget -k foo.com would then pull from the local exact copy, producing a local mirror with relative links.


An easier way would be to just run wget with the -k and -nc options against the main site. Just make sure you are in the same starting directory as when you ran the original command. wget will not download any file that is already present, but will instead read it from the local disk and make the link conversions.

Better make a backup first, though: -k rewrites the files in place, so the exact mirror would be lost if something goes wrong.


--

If you can't explain it simply, you don't understand it well enough.
                                       -- Albert Einstein

