
Bug#918002: patch including salvaged .html's



Package: xml2
Version: 0.5-4
Followup-For: Bug #918002
Control: tags -1 patch

Dear Maintainer,

Here is a patch adding the HTML docs salvaged from archive.org. Including it
should be trivial.
diff --git a/debian/html/examples.html b/debian/html/examples.html
new file mode 100644
index 0000000..ccf3286
--- /dev/null
+++ b/debian/html/examples.html
@@ -0,0 +1,147 @@
+<html><head>
+<meta http-equiv="content-type" content="text/html; charset=windows-1252">
+    <title>xml2: Examples</title>
+    <link rel="stylesheet" type="text/css" href="style.css">
+  </head>
+  <body>
+<h1>Examples</h1>
+
+<p>Common GNU tools (<em>wget</em>, <em>grep</em>, ...) are assumed.</p>
+
+<h2>XML</h2>
+
+<h4>Use the <a href="http://slashdot.org/">Slashdot</a> backend.</h4>
+
+<pre>% <b>wget -q -O - http://slashdot.org/slashdot.xml | xml2</b>
+/backslash/@xmlns:backslash=http://slashdot.org/backslash.dtd
+/backslash/story/title=More on Athlon Overclocking
+/backslash/story/url=http://slashdot.org/articles/00/03/04/1441248.shtml
+/backslash/story/time=2000-03-05 03:40:47
+/backslash/story/author=Hemos
+/backslash/story/department=better-faster-strong
+/backslash/story/topic=amd
+/backslash/story/comments=56
+/backslash/story/section=articles
+/backslash/story/image=topicamd.gif
+/backslash/story
+/backslash/story/title=New Atari Jaguar Game Running $1,225 on eBay
+/backslash/story/url=http://slashdot.org/articles/00/03/02/1430232.shtml
+<b>...</b>
+</pre>
+
+<h4>Now, just the headlines.</h4>
+
+<pre>% <b>wget -q -O - http://slashdot.org/slashdot.xml | xml2 | 
+  grep story/title= | cut -d= -f 2-</b>
+More on Athlon Overclocking
+New Atari Jaguar Game Running $1,225 on eBay
+AT&amp;T's Korn Shell Source Code Released
+TheBench.org: Community Cartooning
+OpenGL for Palm OS Environment
+Banner Ads on Your Cell Phone
+Burning Money on Open Source
+Embedded OpenBSD Running the Stallion ePipe
+Bezos Responds to Tim O'Reilly's Open Letter
+Update on 'Blame Canada' and the Oscars
+</pre>
+
+<h4>How big is the Red Hat 6.1 libxml RPM?</h4>
+
+(For variety, we use <em>awk</em> rather than <em>grep</em> and <em>cut</em>.)
+
+<pre>% <b>wget -q -O - http://rpmfind.net/linux/RDF/redhat/6.1/i386/libxml-1.4.0-1.i386.rdf | 
+  xml2 | awk -F= '/RPM:Size/ {print $2}'</b>
+704399
+</pre>
+
+<h4>What is the melting point of silicon?</h4>
+
+More awkitude.  Don't let your CPU get hotter than this!
+
+<pre>% <b>wget -q -O - http://metalab.unc.edu/xml/examples/periodic_table/allelements.xml | 
+  xml2 | awk '/ATOM\/NAME=Silicon/,!/ATOM\//' | 
+         awk -F\= '/MELTING_POINT/ {print $2}'</b>
+Kelvin
+1683
+</pre>
+
+<em>(1683&deg;K is 2570&deg;F, by the way.)</em>
+
+<h2>HTML</h2>
+
+<h4>Fetch the <a href="http://web.archive.org/web/20160322165048/http://slashdot.org/">Slashdot</a> news page.</h4>
+
+You'll probably see some warnings.  (Slashdot has some of the worst HTML I've
+ever seen...)
+
+<pre>% <b>wget -q -O - http://slashdot.org/ | html2</b>
+/html/head/title=Slashdot:News for Nerds. Stuff that Matters.
+/html/head=
+/html=
+/html/body/@bgcolor=#000000
+/html/body/@text=#000000
+/html/body/@link=#006666
+/html/body/@vlink=#000000
+/html/body=
+/html/body/center/a/@href=http://209.207.224.220/redir.pl?1789
+/html/body/center/a/@target=_top
+<b>...</b>
+</pre>
+
+<h4>Find all the links.</h4>
+
+If you find the warnings distracting, redirect the standard error of 
+<em>html2</em> to /dev/null.
+
+<pre>% <b>wget -q -O - http://slashdot.org/ | html2 | grep 'a/@href' | 
+  cut -d\= -f 2- | sort | uniq</b>
+/about.shtml
+/advertising.shtml
+/article.pl?sid=99/03/31/0137221
+/article.pl?sid=99/04/25/1438249
+/article.pl?sid=99/04/27/0310247
+/article.pl?sid=99/04/29/0124247
+/article.pl?sid=99/08/24/1327256&amp;mode=thread
+/awards.shtml
+/cheesyportal.shtml
+/code.shtml
+<b>...</b>
+</pre>
+
+<h4>Change some colors.</h4>
+
+This pipeline uses both <em>html2</em> and <em>2html</em> to effect a 
+round-trip.  In the middle, <em>sed</em> applies a transformation, turning
+the background of every colored table on the page yellow.  Yuck, huh?
+
+<pre>% <b>wget -q -O - http://slashdot.org/ | 
+  html2 | sed 's|table/@bgcolor=\(.*\)$|table/@bgcolor=yellow|' | 
+  2html &gt; slashdot.html</b>
+% <b>netscape slashdot.html</b>
+</pre>
+
+<h4>Strip JavaScript from a 
+<a href="http://web.archive.org/web/20160322165048/http://www.geocities.com/SiliconValley/Peaks/5957/xml.html">Geocities 
+home page</a>.</h4>
+
+Geocities uses JavaScript to create an annoying little brand popup in the 
+corner of their members' home pages.  Let's delete it.
+
+<pre>% <b>wget -q -O - http://www.geocities.com/SiliconValley/Peaks/5957/xml.html | 
+  html2 | grep -vi '^[^=]*/script[/=]' | 
+  2html &gt; xml.html</b>
+% <b>netscape xml.html</b>
+</pre>
+
+  
+
+<hr>
+<a href="">XML/Unix Processing Tools</a>
+<!--
+     FILE ARCHIVED ON 16:50:48 Mar 22, 2016 AND RETRIEVED FROM THE
+     INTERNET ARCHIVE ON 07:47:16 Feb 10, 2024.
+     JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
+
+     ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
+     SECTION 108(a)(3)).
+-->
diff --git a/debian/html/ref.html b/debian/html/ref.html
new file mode 100644
index 0000000..6cc705f
--- /dev/null
+++ b/debian/html/ref.html
@@ -0,0 +1,205 @@
+<html><head>
+<meta http-equiv="content-type" content="text/html; charset=windows-1252">
+    <title>xml2: Reference</title>
+    <link rel="stylesheet" type="text/css" href="style.css">
+  </head>
+  <body>
+ 
+    <h1>XML/Unix Processing Tools Documentation</h1>
+
+    <h3>Usage</h3>
+
+    <p>There are six tools.  They are all simple filters, reading information
+    from standard input in one format and writing the same information to
+    standard output in a different format.</p>
+
+    <table>
+    <tbody><tr><th>Tool name</th><th>Input</th><th>Output</th>
+    </tr><tr><td>xml2</td>     <td>XML</td>  <td>Flat</td>
+    </tr><tr><td>html2</td>    <td>HTML</td> <td>Flat</td>
+    </tr><tr><td>csv2</td>     <td>CSV</td> <td>Flat</td>
+    </tr><tr><td>2xml</td>     <td>Flat</td> <td>XML</td>
+    </tr><tr><td>2html</td>    <td>Flat</td> <td>HTML</td>
+    </tr><tr><td>2csv</td>     <td>Flat</td> <td>CSV</td>
+    </tr></tbody></table>
+
+    <p>The ``Flat'' format is specific to these tools.  It is a syntax for
+    representing structured markup in a way that makes it easy to process with
+    line-oriented tools.  The same format is used for HTML, XML, and CSV; in
+    fact, you can think of <em>html2</em> as converting HTML to XHTML and
+    running <em>xml2</em> on the result; likewise <em>2html</em> and
+    <em>2xml</em>.</p>
+
+    <p>CSV (comma-separated value) files are less expressive than XML or HTML
+    (CSV has no hierarchy), so <em>xml2 | 2csv</em> is a lossy conversion.</p>
+
+    <h3>File Format</h3>
+
+    <p>To use these tools effectively, it's important to understand the ``Flat''
+    format.  Unfortunately, I'm lazy and sloppy; rather than provide a precise 
+    definition of the relationship between XML and ``Flat'', I will simply 
+    give you a pile of examples and hope you can generalize correctly.
+    (Good luck!)</p>
+
+    <table>
+    <tbody><tr><th>XML</th><th>Flat equivalent</th>
+
+    </tr><tr>
+    <td>&lt;thing/&gt;</td>
+    <td>/thing</td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>&lt;thing&gt;&lt;subthing/&gt;&lt;/thing&gt;</td>
+    <td>/thing/subthing</td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>&lt;thing&gt;stuff&lt;/thing&gt;</td>
+    <td>/thing=stuff</td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>
+      &lt;thing&gt;<br>
+    <span class="items">
+      &lt;subthing&gt;substuff&lt;/subthing&gt;<br>
+      stuff<br>
+    </span>
+      &lt;/thing&gt;<br>
+    </td>
+    <td>
+      /thing/subthing=substuff<br>
+      /thing=stuff<br>
+    </td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>
+      &lt;person&gt;<br>
+    <span class="items">
+      &lt;name&gt;Juan Do&eacute;&lt;/name&gt;<br>
+      &lt;occupation&gt;Zillionaire&lt;/occupation&gt;<br>
+      &lt;pet&gt;Dogcow&lt;/pet&gt;<br>
+      &lt;address&gt;<br>
+    <span class="items">
+        123 Camino Real<br>
+        &lt;city&gt;El Dorado&lt;/city&gt;<br>
+        &lt;state&gt;AZ&lt;/state&gt;<br>
+        &lt;zip&gt;12345&lt;/zip&gt;<br>
+    </span>
+      &lt;/address&gt;<br>
+      &lt;important/&gt;<br>
+    </span>
+      &lt;/person&gt;<br>
+    </td>
+    <td>
+      /person/name=Juan Do&eacute;<br>
+      /person/occupation=Zillionaire<br>
+      /person/pet=Dogcow<br>
+      /person/address=123 Camino Real<br>
+      /person/address/city=El Dorado<br>
+      /person/address/state=AZ<br>
+      /person/address/zip=12345<br>
+      /person/important
+    </td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>
+      &lt;collection&gt;<br>
+    <span class="items">
+        &lt;group&gt;<br>
+    <span class="items">
+          &lt;thing&gt;stuff&lt;/thing&gt;<br>
+          &lt;thing&gt;stuff&lt;/thing&gt;<br>
+    </span>
+        &lt;/group&gt;<br>
+    </span>
+      &lt;/collection&gt;<br>
+    </td>
+    <td>
+      /collection/group/thing=stuff<br>
+      /collection/group/thing<br>
+      /collection/group/thing=stuff<br>
+    </td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>
+      &lt;collection&gt;<br>
+    <span class="items">
+        &lt;group&gt;<br>
+    <span class="items">
+          &lt;thing&gt;stuff&lt;/thing&gt;<br>
+    </span>
+        &lt;/group&gt;<br>
+        &lt;group&gt;<br>
+    <span class="items">
+          &lt;thing&gt;stuff&lt;/thing&gt;<br>
+    </span>
+        &lt;/group&gt;<br>
+    </span>
+      &lt;/collection&gt;<br>
+    </td>
+    <td>
+      /collection/group/thing=stuff<br>
+      /collection/group<br>
+      /collection/group/thing=stuff<br>
+    </td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>
+      &lt;thing&gt;<br>
+    <span class="items">
+        stuff<br>
+        <br>
+        more stuff<br>
+        &amp;lt;other stuff&amp;gt;
+    </span>
+      &lt;/thing&gt;<br>
+    </td>
+    <td>
+      /thing=stuff<br>
+      /thing=<br>
+      /thing=more stuff<br>
+      /thing=&lt;other stuff&gt;<br>
+    </td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>&lt;thing flag="value"&gt;stuff&lt;/thing&gt;</td>
+    <td>
+      /thing/@flag=value<br>
+      /thing=stuff<br>
+    </td>
+
+    </tr><tr><td colspan="2"><hr></td>
+    </tr><tr>
+    <td>
+      &lt;?processing instruction?&gt;<br>
+      &lt;thing/&gt;
+    </td><td>
+      /?processing=instruction<br>
+      /thing<br>
+    </td>
+
+    </tr></tbody></table>
+
+    <p>(TO DO: Add equivalent examples for CSV files.)</p>
+
+    <hr>
+    <a href="">XML/Unix Processing Tools</a>
+  
+
+
+</body></html>
+<!--
+     FILE ARCHIVED ON 03:46:55 Aug 12, 2016 AND RETRIEVED FROM THE
+     INTERNET ARCHIVE ON 06:30:49 Feb 10, 2024.
+     JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
+
+     ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
+     SECTION 108(a)(3)).
+-->
diff --git a/debian/html/style.css b/debian/html/style.css
new file mode 100644
index 0000000..71dce1b
--- /dev/null
+++ b/debian/html/style.css
@@ -0,0 +1,22 @@
+BODY { color: black; background-color: white; background-image: none; }
+H1 { color: #0000FF; }
+H2 { color: #0000BF; }
+H3 { color: #00009F; }
+H4 { color: #00009F; }
+/* A:link     { color: #004080; } */
+/* A:visited  { color: #804080; } */
+.heading { color: #00009F; font-weight: bold; }
+.divider { margin-top: 6pt; font-weight: bold; }
+.items   { margin-left: 10pt; }
+
+TH { background: lightgrey; font-weight: bold; text-align: center; }
+TD { vertical-align: top; }
+
+/*
+     FILE ARCHIVED ON 17:00:13 Mar 22, 2016 AND RETRIEVED FROM THE
+     INTERNET ARCHIVE ON 06:30:51 Feb 10, 2024.
+     JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
+
+     ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
+     SECTION 108(a)(3)).
+*/
diff --git a/debian/xml2.docs b/debian/xml2.docs
new file mode 100644
index 0000000..7547334
--- /dev/null
+++ b/debian/xml2.docs
@@ -0,0 +1 @@
+debian/html/
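
For anyone reviewing the salvaged ref.html without the tools at hand, the
mapping it describes can be approximated in a few lines of Python. This is
only a rough sketch of the ``Flat'' format and not how xml2 itself is
implemented: the function name flatten is hypothetical, and the sketch
deliberately ignores text that follows a child element, the bare-path
separator lines between repeated siblings, multi-line text, and processing
instructions.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return Flat-style lines for an XML string (simplified sketch)."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        before = len(lines)
        # Attributes become <path>/@name=value lines.
        for name, value in elem.attrib.items():
            lines.append(f"{here}/@{name}={value}")
        # Leading text of the element becomes <path>=text.
        text = (elem.text or "").strip()
        if text:
            lines.append(f"{here}={text}")
        for child in elem:
            walk(child, here)
        # An element that produced nothing at all is emitted as a bare path.
        if len(lines) == before:
            lines.append(here)

    walk(root, "")
    return lines


if __name__ == "__main__":
    for line in flatten('<thing flag="value">stuff</thing>'):
        print(line)
```

The output is one path per line, which is what lets grep, cut and awk pick
out individual fields in the examples.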
