
Re: Link checker



On Tue, Jul 17, 2001 at 11:23:45PM +0200, Denis Barbier wrote:
> On Mon, Jul 09, 2001 at 12:28:13PM -0400, James A. Treacy wrote:
> [... some words about linklint ...]
> > Could you send a message here when it is packaged? I'll recompile it
> > and install it on klecker.
> 
> It is in.

I have installed it on klecker and done some tests. A few remarks.

Linklint does not do the entire web site in just a few minutes. The
default is to stop after 500 pages. I had it do 10000 pages (still
not the whole site) and that took quite a while (I can time it later).

By default, external links are not checked. Checking them is where all
the time is spent. If the current program (urlchecker.py) only did internal
links, it would go through the entire site in a jiffy. I just started
linklint checking all the external links (something like 26000 of them (*))
and it looks like it may be slower than the current program. As with
the current program, it defaults to timing out after 15 seconds.
In the last 10 minutes it has gone through a whopping 280 external
sites.
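
To give an idea of where the time goes, here is a rough sketch (not
linklint or urlchecker.py, just an illustration, with a made-up URL) of
what a single external check with a 15 second timeout looks like in
Python:

    import urllib.request

    def check_url(url, timeout=15):
        # Fetch the URL and report the HTTP status. HTTP errors, network
        # errors and timeouts all show up as exceptions here and count as
        # bad links.
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.status, None
        except Exception as exc:
            return None, str(exc)

    status, error = check_url("http://www.example.org/somepage.html")

A dead or slow site can eat most of that 15 seconds on its own, so a
single-threaded checker working through 26000 external links is never
going to be fast.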

Sure, linklint tells you what the bad links are, but it doesn't tell you
which page they are on. I have had complaints about the output from
urlchecker.py, but everything you need is right there: each page is
listed, followed by its bad links (indented), including the error for
each. Links in urlchecker.py are cached so each one is checked only
once. Only multithreading would speed it up.
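
To make the caching and per-page output concrete, here is a rough
sketch of the idea (again, not the real urlchecker.py; the pages
dictionary is a made-up stand-in for whatever gets gathered while
crawling the site):

    import urllib.request

    def check_url(url, timeout=15):
        # Same idea as the sketch above: any exception means a bad link.
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.status, None
        except Exception as exc:
            return None, str(exc)

    # Each URL is checked at most once, no matter how many pages link to it.
    cache = {}

    def check_cached(url):
        if url not in cache:
            cache[url] = check_url(url)
        return cache[url]

    # pages maps each local page to the links found on it (hypothetical data).
    def report(pages):
        for page, links in sorted(pages.items()):
            bad = []
            for url in links:
                status, error = check_cached(url)
                if error is not None:
                    bad.append((url, error))
            if bad:
                print(page)
                for url, error in bad:
                    print("    %s: %s" % (url, error))

Running the individual checks from a pool of worker threads, while
keeping the shared cache, would be the obvious way to get the speedup
mentioned above.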

I initially considered urlchecker.py to be a quick hack. With each
link checker program I try, though, my opinion of it has gone up.

(*) Don't be surprised by this number. Every bug page and link into the
list archives is considered to be an external link.

-- 
James (Jay) Treacy
treacy@debian.org
