
Re: New Debian Site



On Thu, Oct 19, 2006 at 03:01:24PM +0200, Augusto Frausin wrote:
> >Sorry, I do not know anything about web crumblers, is it something
> >similar (maybe clickable)? I use the up button of Konqueror to go up
> >but Firefox misses such a functionality IIRC.
> 
> Basically, if you go in a section then into a subsection there will be links
> saying:
> Home (link) / Section (link) / Subsection (link)
> Sort of an implementation of the site map inside each page.

This could easily be implemented at website generation time (i.e. through
WML), since the page header is produced as part of that generation.
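
To illustrate the kind of trail described above, here is a minimal sketch in
Python (not WML, and not taken from the website's build system). The function
name `breadcrumbs`, the "Home" label and the example path are assumptions made
for illustration only.

```python
from html import escape
from posixpath import join


def breadcrumbs(path, root_label="Home"):
    """Return an HTML breadcrumb trail for a site-relative path.

    Hypothetical helper: `path` is something like
    'devel/website/index.html', relative to the site root.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    # Drop a trailing file name; the trail lists directories only.
    if parts and "." in parts[-1]:
        parts.pop()
    links = ['<a href="/">%s</a>' % escape(root_label)]
    href = "/"
    for part in parts:
        # Each level links to its own directory, e.g. /devel/ then /devel/website/
        href = join(href, part) + "/"
        links.append('<a href="%s">%s</a>' % (escape(href), escape(part)))
    return " / ".join(links)


if __name__ == "__main__":
    print(breadcrumbs("devel/website/index.html"))
```

A real implementation would take the link labels from the page titles rather
than from the directory names.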

> >Ensure that all pages are easily accessible starting from the Homepage.
> >I think there are a few pages not reachable from it. Important pages
> >should not require more than 2 or 3 clicks.
> 
> How do we know which pages are or are not accessible? Should I work with
> a web crawler (spider) to see if I can track them down?

That's easy. Since the website doesn't have any fancy JavaScript that would
break spiders, you can:

a) build the site (from a local CVS checkout) in dir A
b) send a spider (like wget) to retrieve a mirror of the site into dir B
c) compare the files in dir A vs. dir B. The files that are in A but not in B
   are the ones that no other file links to.
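
Step c) could be sketched roughly as follows in Python. This assumes both
trees share the same layout, for instance a mirror fetched with something like
`wget --mirror -nH -P B <url of the local build>` so that no host-name
directory is created. The function names and the directories "A" and "B" are
placeholders, not part of any existing tool.

```python
import os


def relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def unlinked_files(built_dir, mirrored_dir):
    """Files present in the built tree that the spider never retrieved."""
    return sorted(relative_files(built_dir) - relative_files(mirrored_dir))


if __name__ == "__main__":
    # "A" and "B" are the placeholder directories from steps a) and b).
    for path in unlinked_files("A", "B"):
        print(path)
```

Files the spider renames or rewrites on disk would show up as false positives,
so the output is a list of candidates to review rather than a final answer.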

Or, you can write a tool that parses the generated files in A, determines
the local paths their links point to, and then checks whether all the files
in A are being pointed to.
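
A minimal sketch of such a tool, using only the Python standard library. It
assumes that links ending in a directory resolve to `index.html` and that link
targets match file names on disk exactly; if the built tree uses
language-suffixed file names, the mapping would need adjusting. All names here
(`LinkCollector`, `local_target`, `orphans`) are made up for illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def local_target(root, page, link):
    """Map a link found in `page` to a path relative to `root`.

    Returns None for external links and for pure fragments ("#top").
    """
    url, _fragment = urldefrag(link)
    parts = urlsplit(url)
    if parts.scheme or parts.netloc or not parts.path:
        return None
    path = unquote(parts.path)
    if path.startswith("/"):
        target = os.path.join(root, path.lstrip("/"))
    else:
        target = os.path.join(os.path.dirname(page), path)
    if os.path.isdir(target):
        # Assumption: a directory link is served as its index.html
        target = os.path.join(target, "index.html")
    return os.path.relpath(os.path.normpath(target), root)


def orphans(root):
    """Return files under `root` that no HTML file under `root` links to."""
    all_files, linked = set(), set()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            page = os.path.join(dirpath, name)
            all_files.add(os.path.relpath(page, root))
            if not name.endswith((".html", ".htm")):
                continue
            parser = LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as fh:
                parser.feed(fh.read())
            for link in parser.links:
                target = local_target(root, page, link)
                if target is not None:
                    linked.add(target)
    return sorted(all_files - linked)
```

Note that this checks whether a file is pointed to by any other file, not
whether it is reachable from the homepage, so a group of pages that only link
to each other would not be reported.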

Or you can ask the operators of a heavily used mirror to provide you with the
access logs of their Debian web server and extract from there which pages
are being visited. Those not being visited are, most probably, pages
which are not being pointed to (externally or internally).
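
As a sketch of the log-based approach, assuming the mirror provides logs in
the usual Apache common or combined format (the actual format is an assumption
and would have to be confirmed with the mirror operators). The function names
are hypothetical.

```python
import re

# Matches the request and status fields of a common/combined log line,
# e.g. "GET /index.html HTTP/1.0" 200
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def visited_paths(log_lines):
    """Return the set of URL paths requested successfully (2xx or 3xx)."""
    seen = set()
    for line in log_lines:
        match = REQUEST.search(line)
        if match and match.group(2).startswith(("2", "3")):
            # Ignore any query string when recording the path.
            seen.add(match.group(1).split("?", 1)[0])
    return seen


def never_visited(site_paths, log_lines):
    """Paths of the site that do not appear in the log at all."""
    return sorted(set(site_paths) - visited_paths(log_lines))
```

Here `site_paths` would be the URL paths of the files built in dir A. The
longer the period the logs cover, the more confident you can be that a page
missing from them is really not linked from anywhere.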

Notice that this problem is orthogonal to the problem most "link checking
tools" [1] try to fix (finding links on your site that point to *external*
web sites which are no longer available).

HTH

Javier

[1] Such as (available in Debian) checkbot, htcheck, linkchecker, linklint,
webcheck...
