
Use of our external site embedded into a Debian file



REF: http://db.debian.net/lurker/message/20070707.195201.8e2c00a8.en.html
Author: Varun Hiremath
Date: 2007-07-07 12:52 -700
To: 423669
CC: control, Torsten Werner
New-Topics: Processed: uscan: https support
Subject: Bug#423669: uscan: https support

We noticed a weird use of our SiteTruth.com site in a
Debian bug report.  Bug report #423669 apparently patched a problem
by using a link to a CGI script on our site.

We have a system that rates web pages, and as a service for webmasters,
we have a little utility, "viewer.cgi", which is used to show users how
our crawler saw a page.  Somebody stuck this into a Debian watchfile
because it can be used to read an HTTPS page via HTTP, something they needed.

But "viewer.cgi" does more than that.  It's not a transparent proxy.
It truncates pages at 1MB, parses the HTML into a tree, converts
to Unicode/UTF-8, makes all the links absolute, removes embedded
content (JavaScript, Flash, etc.), and outputs the result as cleaned-up
and properly indented HTML.  What you get out isn't quite what went in.
So this probably isn't what you want.
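
To illustrate the kind of difference described above, here is a rough
Python sketch of a cleaning step.  It is not the viewer.cgi code; the
names clean, CleaningRewriter and MAX_BYTES are hypothetical, and it
covers only three of the listed steps (the size cap, absolute links,
and script removal), assuming UTF-8 input.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

MAX_BYTES = 1_000_000  # assumed cap, standing in for the 1MB truncation


class CleaningRewriter(HTMLParser):
    """Hypothetical cleaner: makes href links absolute, drops <script>."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        parts = [tag]
        for name, value in attrs:
            if name == "href" and value is not None:
                # Relative links are resolved against the page URL.
                value = urljoin(self.base_url, value)
            parts.append(name if value is None else f'{name}="{value}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag == "script":
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside a dropped <script> element is discarded.
        if not self.skip_depth:
            self.out.append(data)


def clean(raw: bytes, base_url: str) -> str:
    """Truncate, decode, and re-emit the page in cleaned form."""
    text = raw[:MAX_BYTES].decode("utf-8", errors="replace")
    parser = CleaningRewriter(base_url)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


page = b'<p><a href="foo-1.0.tar.gz">dl</a><script>var x=1;</script></p>'
result = clean(page, "https://example.org/files/")
```

Even in this toy version the relative link comes back as an absolute
URL and the script is gone, so a pattern written against the original
page's links may no longer match the cleaned output.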

SiteTruth really shouldn't be part of some Debian build procedure.
We suggest finding some other way to read HTTPS pages with HTTP.
Wrong tool for the job.  Thanks.

				John Nagle
				SiteTruth
				http://www.sitetruth.com
				nagle@sitetruth.com


