Bug#918002: patch including salvaged .html's
Package: xml2
Version: 0.5-4
Followup-For: Bug #918002
Control: tags -1 patch
Dear Maintainer,
here is a patch that adds the HTML docs salvaged from archive.org. Including it
should be trivial.
diff --git a/debian/html/examples.html b/debian/html/examples.html
new file mode 100644
index 0000000..ccf3286
--- /dev/null
+++ b/debian/html/examples.html
@@ -0,0 +1,147 @@
+<html><head>
+<meta http-equiv="content-type" content="text/html; charset=windows-1252">
+ <title>xml2: Examples</title>
+ <link rel="stylesheet" type="text/css" href="style.css">
+ </head>
+ <body>
+<h1>Examples</h1>
+
+<p>Common GNU tools (<em>wget</em>, <em>grep</em>, ...) are assumed.</p>
+
+<h2>XML</h2>
+
+<h4>Use the <a href="http://slashdot.org/">Slashdot</a> backend.</h4>
+
+<pre>% <b>wget -q -O - http://slashdot.org/slashdot.xml | xml2</b>
+/backslash/@xmlns:backslash=http://slashdot.org/backslash.dtd
+/backslash/story/title=More on Athlon Overclocking
+/backslash/story/url=http://slashdot.org/articles/00/03/04/1441248.shtml
+/backslash/story/time=2000-03-05 03:40:47
+/backslash/story/author=Hemos
+/backslash/story/department=better-faster-strong
+/backslash/story/topic=amd
+/backslash/story/comments=56
+/backslash/story/section=articles
+/backslash/story/image=topicamd.gif
+/backslash/story
+/backslash/story/title=New Atari Jaguar Game Running $1,225 on eBay
+/backslash/story/url=http://slashdot.org/articles/00/03/02/1430232.shtml
+<b>...</b>
+</pre>
+
+<h4>Now, just the headlines.</h4>
+
+<pre>% <b>wget -q -O - http://slashdot.org/slashdot.xml | xml2 |
+ grep story/title= | cut -d= -f 2-</b>
+More on Athlon Overclocking
+New Atari Jaguar Game Running $1,225 on eBay
+AT&amp;T's Korn Shell Source Code Released
+TheBench.org: Community Cartooning
+OpenGL for Palm OS Environment
+Banner Ads on Your Cell Phone
+Burning Money on Open Source
+Embedded OpenBSD Running the Stallion ePipe
+Bezos Responds to Tim O'Reilly's Open Letter
+Update on 'Blame Canada' and the Oscars
+</pre>
+
+<h4>How big is the Red Hat 6.1 libxml RPM?</h4>
+
+(For variety, we use <em>awk</em> rather than <em>grep</em> and <em>cut</em>.)
+
+<pre>% <b>wget -q -O - http://rpmfind.net/linux/RDF/redhat/6.1/i386/libxml-1.4.0-1.i386.rdf |
+ xml2 | awk -F= '/RPM:Size/ {print $2}'</b>
+704399
+</pre>
+
+<h4>What is the melting point of silicon?</h4>
+
+More awkitude. Don't let your CPU get hotter than this!
+
+<pre>% <b>wget -q -O - http://metalab.unc.edu/xml/examples/periodic_table/allelements.xml |
+ xml2 | awk '/ATOM\/NAME=Silicon/,!/ATOM\//' |
+ awk -F\= '/MELTING_POINT/ {print $2}'</b>
+Kelvin
+1683
+</pre>
+
+<em>(1683&deg;K is 2570&deg;F, by the way.)</em>
+
+<h2>HTML</h2>
+
+<h4>Fetch the <a href="http://web.archive.org/web/20160322165048/http://slashdot.org/">Slashdot</a> news page.</h4>
+
+You'll probably see some warnings. (Slashdot has some of the worst HTML I've
+ever seen...)
+
+<pre>% <b>wget -q -O - http://slashdot.org/ | html2</b>
+/html/head/title=Slashdot:News for Nerds. Stuff that Matters.
+/html/head=
+/html=
+/html/body/@bgcolor=#000000
+/html/body/@text=#000000
+/html/body/@link=#006666
+/html/body/@vlink=#000000
+/html/body=
+/html/body/center/a/@href=http://209.207.224.220/redir.pl?1789
+/html/body/center/a/@target=_top
+<b>...</b>
+</pre>
+
+<h4>Find all the links.</h4>
+
+If you find the warnings distracting, redirect the standard error of
+<em>html2</em> to /dev/null.
+
+<pre>% <b>wget -q -O - http://slashdot.org/ | html2 | grep 'a/@href' |
+ cut -d\= -f 2- | sort | uniq</b>
+/about.shtml
+/advertising.shtml
+/article.pl?sid=99/03/31/0137221
+/article.pl?sid=99/04/25/1438249
+/article.pl?sid=99/04/27/0310247
+/article.pl?sid=99/04/29/0124247
+/article.pl?sid=99/08/24/1327256&amp;mode=thread
+/awards.shtml
+/cheesyportal.shtml
+/code.shtml
+<b>...</b>
+</pre>
+
+<h4>Change some colors.</h4>
+
+This pipeline uses both <em>html2</em> and <em>2html</em> to effect a
+round-trip. In the middle, <em>sed</em> applies a transformation, turning
+the background of every colored table on the page yellow. Yuck, huh?
+
+<pre>% <b>wget -q -O - http://slashdot.org/ |
+ html2 | sed 's|table/@bgcolor=\(.*\)$|table/@bgcolor=yellow|' |
+ 2html > slashdot.html</b>
+% <b>netscape slashdot.html</b>
+</pre>
+
+<h4>Strip JavaScript from a
+<a href="http://web.archive.org/web/20160322165048/http://www.geocities.com/SiliconValley/Peaks/5957/xml.html">Geocities
+home page</a>.</h4>
+
+Geocities uses JavaScript to create an annoying little brand popup in the
+corner of their members' home pages. Let's delete it.
+
+<pre>% <b>wget -q -O - http://www.geocities.com/SiliconValley/Peaks/5957/xml.html |
+ html2 | grep -vi '^[^=]*/script[/=]' |
+ 2html > xml.html</b>
+% <b>netscape xml.html</b>
+</pre>
+
+
+
+<hr>
+<a href="">XML/Unix Processing Tools</a>
+<!--
+ FILE ARCHIVED ON 16:50:48 Mar 22, 2016 AND RETRIEVED FROM THE
+ INTERNET ARCHIVE ON 07:47:16 Feb 10, 2024.
+ JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
+
+ ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
+ SECTION 108(a)(3)).
+-->
diff --git a/debian/html/ref.html b/debian/html/ref.html
new file mode 100644
index 0000000..6cc705f
--- /dev/null
+++ b/debian/html/ref.html
@@ -0,0 +1,205 @@
+<html><head>
+<meta http-equiv="content-type" content="text/html; charset=windows-1252">
+ <title>xml2: Reference</title>
+ <link rel="stylesheet" type="text/css" href="style.css">
+ </head>
+ <body>
+
+ <h1>XML/Unix Processing Tools Documentation</h1>
+
+ <h3>Usage</h3>
+
+ <p>There are six tools. They are all simple filters, reading information
+ from standard input in one format and writing the same information to
+ standard output in a different format.</p>
+
+ <table>
+ <tbody><tr><th>Tool name</th><th>Input</th><th>Output</th>
+ </tr><tr><td>xml2</td> <td>XML</td> <td>Flat</td>
+ </tr><tr><td>html2</td> <td>HTML</td> <td>Flat</td>
+ </tr><tr><td>csv2</td> <td>CSV</td> <td>Flat</td>
+ </tr><tr><td>2xml</td> <td>Flat</td> <td>XML</td>
+ </tr><tr><td>2html</td> <td>Flat</td> <td>HTML</td>
+ </tr><tr><td>2csv</td> <td>Flat</td> <td>CSV</td>
+ </tr></tbody></table>
+
+ <p>The ``Flat'' format is specific to these tools. It is a syntax for
+ representing structured markup in a way that makes it easy to process with
+ line-oriented tools. The same format is used for HTML, XML, and CSV; in
+ fact, you can think of <em>html2</em> as converting HTML to XHTML and
+ running <em>xml2</em> on the result; likewise <em>2html</em> and
+ <em>2xml</em>.</p>
+
+ <p>CSV (comma-separated value) files are less expressive than XML or HTML
+ (CSV has no hierarchy), so <em>xml2 | 2csv</em> is a lossy conversion.</p>
+
+ <h3>File Format</h3>
+
+ <p>To use these tools effectively, it's important to understand the ``Flat''
+ format. Unfortunately, I'm lazy and sloppy; rather than provide a precise
+ definition of the relationship between XML and ``Flat'', I will simply
+ give you a pile of examples and hope you can generalize correctly.
+ (Good luck!)</p>
+
+ <table>
+ <tbody><tr><th>XML</th><th>Flat equivalent</th>
+
+ </tr><tr>
+ <td>&lt;thing/&gt;</td>
+ <td>/thing</td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>&lt;thing&gt;&lt;subthing/&gt;&lt;/thing&gt;</td>
+ <td>/thing/subthing</td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>&lt;thing&gt;stuff&lt;/thing&gt;</td>
+ <td>/thing=stuff</td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>
+ &lt;thing&gt;<br>
+ <span class="items">
+ &lt;subthing&gt;substuff&lt;/subthing&gt;<br>
+ stuff<br>
+ </span>
+ &lt;/thing&gt;<br>
+ </td>
+ <td>
+ /thing/subthing=substuff<br>
+ /thing=stuff<br>
+ </td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>
+ &lt;person&gt;<br>
+ <span class="items">
+ &lt;name&gt;Juan Do&eacute;&lt;/name&gt;<br>
+ &lt;occupation&gt;Zillionaire&lt;/occupation&gt;<br>
+ &lt;pet&gt;Dogcow&lt;/pet&gt;<br>
+ &lt;address&gt;<br>
+ <span class="items">
+ 123 Camino Real<br>
+ &lt;city&gt;El Dorado&lt;/city&gt;<br>
+ &lt;state&gt;AZ&lt;/state&gt;<br>
+ &lt;zip&gt;12345&lt;/zip&gt;<br>
+ </span>
+ &lt;/address&gt;<br>
+ &lt;important/&gt;<br>
+ </span>
+ &lt;/person&gt;<br>
+ </td>
+ <td>
+ /person/name=Juan Do&eacute;<br>
+ /person/occupation=Zillionaire<br>
+ /person/pet=Dogcow<br>
+ /person/address=123 Camino Real<br>
+ /person/address/city=El Dorado<br>
+ /person/address/state=AZ<br>
+ /person/address/zip=12345<br>
+ /person/important
+ </td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>
+ &lt;collection&gt;<br>
+ <span class="items">
+ &lt;group&gt;<br>
+ <span class="items">
+ &lt;thing&gt;stuff&lt;/thing&gt;<br>
+ &lt;thing&gt;stuff&lt;/thing&gt;<br>
+ </span>
+ &lt;/group&gt;<br>
+ </span>
+ &lt;/collection&gt;<br>
+ </td>
+ <td>
+ /collection/group/thing=stuff<br>
+ /collection/group/thing<br>
+ /collection/group/thing=stuff<br>
+ </td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>
+ &lt;collection&gt;<br>
+ <span class="items">
+ &lt;group&gt;<br>
+ <span class="items">
+ &lt;thing&gt;stuff&lt;/thing&gt;<br>
+ </span>
+ &lt;/group&gt;<br>
+ &lt;group&gt;<br>
+ <span class="items">
+ &lt;thing&gt;stuff&lt;/thing&gt;<br>
+ </span>
+ &lt;/group&gt;<br>
+ </span>
+ &lt;/collection&gt;<br>
+ </td>
+ <td>
+ /collection/group/thing=stuff<br>
+ /collection/group<br>
+ /collection/group/thing=stuff<br>
+ </td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>
+ &lt;thing&gt;<br>
+ <span class="items">
+ stuff<br>
+ <br>
+ more stuff<br>
+ &amp;lt;other stuff&amp;gt;
+ </span>
+ &lt;/thing&gt;<br>
+ </td>
+ <td>
+ /thing=stuff<br>
+ /thing=<br>
+ /thing=more stuff<br>
+ /thing=&lt;other stuff&gt;<br>
+ </td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>&lt;thing flag="value"&gt;stuff&lt;/thing&gt;</td>
+ <td>
+ /thing/@flag=value<br>
+ /thing=stuff<br>
+ </td>
+
+ </tr><tr><td colspan="2"><hr></td>
+ </tr><tr>
+ <td>
+ &lt;?processing instruction?&gt;<br>
+ &lt;thing/&gt;
+ </td><td>
+ /?processing=instruction<br>
+ /thing<br>
+ </td>
+
+ </tr></tbody></table>
+
+ <p>(TO DO: Add equivalent examples for CSV files.)</p>
+
+ <hr>
+ <a href="">XML/Unix Processing Tools</a>
+
+
+
+</body></html>
+<!--
+ FILE ARCHIVED ON 03:46:55 Aug 12, 2016 AND RETRIEVED FROM THE
+ INTERNET ARCHIVE ON 06:30:49 Feb 10, 2024.
+ JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
+
+ ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
+ SECTION 108(a)(3)).
+-->
diff --git a/debian/html/style.css b/debian/html/style.css
new file mode 100644
index 0000000..71dce1b
--- /dev/null
+++ b/debian/html/style.css
@@ -0,0 +1,22 @@
+BODY { color: black; background-color: white; background-image: none; }
+H1 { color: #0000FF; }
+H2 { color: #0000BF; }
+H3 { color: #00009F; }
+H4 { color: #00009F; }
+/* A:link { color: #004080; } */
+/* A:visited { color: #804080; } */
+.heading { color: #00009F; font-weight: bold; }
+.divider { margin-top: 6pt; font-weight: bold; }
+.items { margin-left: 10pt; }
+
+TH { background: lightgrey; font-weight: bold; text-align: center; }
+TD { vertical-align: top; }
+
+/*
+ FILE ARCHIVED ON 17:00:13 Mar 22, 2016 AND RETRIEVED FROM THE
+ INTERNET ARCHIVE ON 06:30:51 Feb 10, 2024.
+ JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
+
+ ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
+ SECTION 108(a)(3)).
+*/
diff --git a/debian/xml2.docs b/debian/xml2.docs
new file mode 100644
index 0000000..7547334
--- /dev/null
+++ b/debian/xml2.docs
@@ -0,0 +1 @@
+debian/html/