Bug#768817: unblock: geoip/1.6.2-2

To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Bug#768817: unblock: geoip/1.6.2-2
From: Patrick Matthäi <pmatthaei@debian.org>
Date: Sun, 09 Nov 2014 14:34:09 +0100
Message-id: <[🔎] 20141109133409.20869.30070.reportbug@srv1.linux-dev.org>
Reply-to: Patrick Matthäi <pmatthaei@debian.org>, 768817@bugs.debian.org
Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock

Hello,

on the 14.10.2014 I have recieved patches from two contributors to support the
building of the GeoIP City and ASN database in a DFSG free way, which is realy great!
Sadly I just had got time to adapt them from the beginning of the 27.10.2014, where I
thought it is enough time. After I have uploaded the changes I were told, that packages
in sid do not get a "free-unblock-ack" since jessie, fu.... I didn't get this change..

Anyway I realy hope that we can make an exception here to include more DFSG free (required)
geoip databases within jessie (an unblock request for geoip-database is after that also
required).

$ diff -Naur 1.6.2-1/ 1.6.2-2/ --exclude=.svn | diffstat
 changelog                    |   11
 control                      |    2
 geoip-bin.install            |    1
 geoip-bin.lintian-overrides  |    1
 rules                        |    2
 src/geoip-asn-csv-to-dat.cpp |  889 ++++++++++++++++++++++++++++++++++
 src/geoip-csv-to-dat.cpp     | 1122 +++++++++++++++++++++++++++++++++----------
 7 files changed, 1786 insertions(+), 242 deletions(-)



diff -Naur '--exclude=.svn' 1.6.2-1/debian/changelog 1.6.2-2/debian/changelog
--- 1.6.2-1/debian/changelog    2014-07-24 19:29:11.241627982 +0200
+++ 1.6.2-2/debian/changelog    2014-10-27 19:31:48.626784609 +0100
@@ -1,3 +1,14 @@
+geoip (1.6.2-2) unstable; urgency=low
+
+  * Add patch for geoip-csv-to-dat to add support for building GeoIP city DB.
+    Many thanks to Andrew Moise for contributing!
+  * Add and install geoip-generator-asn, which is able to build the ASN DB. It
+    is a modified version from the original geoip-generator. Much thanks for
+    contributing also to Aaron Gibson!
+  * Bump Standards-Version to 3.9.6 (no changes required).
+
+ -- Patrick Matthäi <pmatthaei@debian.org>  Mon, 27 Oct 2014 19:15:25 +0100
+
 geoip (1.6.2-1) unstable; urgency=medium
 
   * New upstream release.
diff -Naur '--exclude=.svn' 1.6.2-1/debian/control 1.6.2-2/debian/control
--- 1.6.2-1/debian/control      2014-07-24 19:29:11.245627903 +0200
+++ 1.6.2-2/debian/control      2014-10-27 19:31:48.626784609 +0100
@@ -9,7 +9,7 @@
  autotools-dev,
  dh-autoreconf,
  libtool
-Standards-Version: 3.9.5
+Standards-Version: 3.9.6
 
 Package: libgeoip1
 Section: libs
diff -Naur '--exclude=.svn' 1.6.2-1/debian/geoip-bin.install 1.6.2-2/debian/geoip-bin.install
--- 1.6.2-1/debian/geoip-bin.install    2014-07-24 19:29:11.245627903 +0200
+++ 1.6.2-2/debian/geoip-bin.install    2014-10-27 19:31:48.626784609 +0100
@@ -1,3 +1,4 @@
 debian/tmp/usr/bin/* usr/bin/
 debian/tmp/geoip-generator usr/lib/geoip/
+debian/tmp/geoip-generator-asn usr/lib/geoip/
 debian/src/v4-to-v6-layout.pl usr/lib/geoip/
diff -Naur '--exclude=.svn' 1.6.2-1/debian/geoip-bin.lintian-overrides 1.6.2-2/debian/geoip-bin.lintian-overrides
--- 1.6.2-1/debian/geoip-bin.lintian-overrides  2014-07-24 19:29:11.245627903 +0200
+++ 1.6.2-2/debian/geoip-bin.lintian-overrides  2014-10-27 19:31:48.626784609 +0100
@@ -1 +1,2 @@
 geoip-bin: hardening-no-fortify-functions usr/lib/geoip/geoip-generator
+geoip-bin: hardening-no-fortify-functions usr/lib/geoip/geoip-generator-asn
diff -Naur '--exclude=.svn' 1.6.2-1/debian/rules 1.6.2-2/debian/rules
--- 1.6.2-1/debian/rules        2014-07-24 19:29:11.241627982 +0200
+++ 1.6.2-2/debian/rules        2014-10-27 19:31:48.626784609 +0100
@@ -13,6 +13,8 @@
        # Build the build script.
        $(CXX) $(CPPFLAGS) $(LDFLAGS) -g debian/src/geoip-csv-to-dat.cpp -o debian/tmp/geoip-generator -lGeoIP \
                -I $(CURDIR)/debian/tmp/usr/include/ -L $(CURDIR)/debian/tmp/usr/lib/$(DEB_HOST_MULTIARCH)/
+       $(CXX) $(CPPFLAGS) $(LDFLAGS) -g debian/src/geoip-asn-csv-to-dat.cpp -o debian/tmp/geoip-generator-asn -lGeoIP \
+               -I $(CURDIR)/debian/tmp/usr/include/ -L $(CURDIR)/debian/tmp/usr/lib/$(DEB_HOST_MULTIARCH)/
        chrpath -d -k debian/tmp/usr/bin/geoip*
 
 override_dh_strip:
diff -Naur '--exclude=.svn' 1.6.2-1/debian/src/geoip-asn-csv-to-dat.cpp 1.6.2-2/debian/src/geoip-asn-csv-to-dat.cpp
--- 1.6.2-1/debian/src/geoip-asn-csv-to-dat.cpp 1970-01-01 01:00:00.000000000 +0100
+++ 1.6.2-2/debian/src/geoip-asn-csv-to-dat.cpp 2014-10-27 19:31:48.626784609 +0100
@@ -0,0 +1,889 @@
+/* geoip-csv-to-dat - convert a country database from CSV to GeoIP binary format
+ *
+ * Copyright (c) 2009 Kalle Olavi Niemitalo.
+ * Copyright (c) 2011 Patrick Matthäi
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#define _GNU_SOURCE 1
+#include <algorithm>
+#include <arpa/inet.h>
+#include <cerrno>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <error.h>
+#include <fstream>
+#include <getopt.h>
+#include <iostream>
+#include <queue>
+#include <set>
+#include <stack>
+#include <string>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <sysexits.h>
+#include <vector>
+#include <GeoIP.h>
+#include <map>
+
+// Format of GeoIP Country database files
+// ======================================
+//
+// 1. Binary trie mapping IP addresses to countries.
+// 2. Optional unused data.
+// 3. Optional database-info block.
+// 4. Optional structure-info block.
+//
+// Binary trie
+// -----------
+//
+// The trie treats IP addresses as bit sequences and maps them to
+// numbers.  In the country database, each such number is 0xFFFF00 +
+// the the country ID that GeoIP_id_by_ipnum() would return.  The
+// meanings of country IDs are hardcoded in libGeoIP and cannot be
+// overridden by the database.
+//
+// The root node of the trie is at the beginning of the file, and the
+// other nodes then follow it. Each node has the same size and
+// consists of two little-endian pointers that correspond to the two
+// possible values of a bit.  In the country database, the pointers are
+// 24-bit, and each node is thus 6 bytes long.
+//
+// Each pointer is one of:
+// - The number that the whole lookup should return, i.e. 0xFFFF00 + id
+//   in the country database.
+// - The number of the node that should be examined next, counting from
+//   0 at the beginning of the file.  Pointing back to nodes with
+//   smaller numbers is allowed, but loops are not allowed.
+//
+// Optional unused data
+// --------------------
+//
+// The file format seems to permit extra data between the binary trie
+// and the optional blocks.
+//
+// Optional database-info block
+// ----------------------------
+//
+// Near the end of the file, there may be a three-byte tag (0x00 0x00
+// 0x00) followed by at most DATABASE_INFO_MAX_SIZE - 1 = 99 bytes of
+// text that describes the database.  GeoIP_database_info() returns
+// this text and appends a terminating '\0'.
+//
+// The GeoLite Country IPv4 database downloadable from MaxMind
+// includes this database-info block.
+//
+// Optional structure-info block
+// -----------------------------
+//
+// At the very end of the file, there may be a three-byte tag (0xFF
+// 0xFF 0xFF) followed by at most STRUCTURE_INFO_MAX_SIZE - 1 = 19
+// bytes.  The first byte is the database type,
+// e.g. GEOIP_COUNTRY_EDITION = 1 or GEOIP_COUNTRY_EDITION_V6 = 12,
+// possibly with 105 added to it.  Type-specific information then
+// follows.  There is no type-specific information for the country
+// editions.
+//
+// The GeoLite Country IPv4 database downloadable from MaxMind does
+// not include this structure-info block.
+
+namespace {
+       class binary_trie
+       {
+       public:
+               typedef uint_fast32_t edge_type;
+
+               explicit binary_trie(edge_type leaf);
+               void set_range(
+                       const uint8_t range_min[],
+                       const uint8_t range_max[],
+                       std::size_t bit_count,
+                       edge_type leaf);
+               void reorder_depth_first();
+               void reorder_in_blocks(std::size_t bytes_per_block);
+               void write_binary(std::ostream &dat_stream) const;
+               void write_segment(std::ostream &dat_stream) const;
+               void update_records();
+
+       private:
+               struct node
+               {
+                       edge_type edges[2];
+               };
+               std::vector<node> nodes;
+
+               // This could be std::vector<bool> but that seems slower.
+               typedef std::vector<uint8_t> bits_vector;
+
+               void set_range_in_node(
+                       const bits_vector *min_bits,
+                       const bits_vector *max_bits,
+                       std::size_t bit_pos,
+                       edge_type edit_node,
+                       edge_type leaf);
+               void set_range_in_edge(
+                       const bits_vector *min_bits,
+                       const bits_vector *max_bits,
+                       std::size_t bit_pos,
+                       edge_type edit_node,
+                       bool bit,
+                       edge_type leaf);
+               void reorder(
+                       const std::vector<edge_type> &old_to_new,
+                       const std::vector<edge_type> &new_to_old);
+       };
+}
+
+/** Construct a binary trie and its root node.
+ *
+ * \param leaf
+ * Both edges of the root node will initially point to this leaf.
+ * The caller should provide a value that means nothing was found.  */
+binary_trie::binary_trie(edge_type leaf)
+{
+       const node node = {{ leaf, leaf }};
+       nodes.push_back(node);
+}
+
+/** Edit the trie so it maps a range of bit sequences to the same
+ * leaf.
+ *
+ * \param range_min
+ * The first bit sequence in the range.  Eight bits are packed in each
+ * byte.  The most significant bit of the whole sequence is in the
+ * most significant bit of the first byte.
+ *
+ * \param range_max
+ * The last bit sequence in the range.
+ *
+ * \param bit_count
+ * The number of bits in both sequences.
+ *
+ * \param leaf
+ * The leaf to which all the bit sequences in the range should be
+ * mapped.  */
+void
+binary_trie::set_range(
+       const uint8_t range_min[],
+       const uint8_t range_max[],
+       std::size_t bit_count,
+       edge_type leaf)
+{
+       bits_vector min_bits(bit_count);
+       bits_vector max_bits(bit_count);
+       for (std::size_t i = 0; i < bit_count; ++i) {
+               std::size_t byte_pos = i / 8;
+               uint8_t mask = 1 << ((~i) % 8);
+               min_bits[i] = ((range_min[byte_pos] & mask) != 0);
+               max_bits[i] = ((range_max[byte_pos] & mask) != 0);
+       }
+       set_range_in_node(&min_bits, &max_bits, 0, 0, leaf);
+}
+
+/** Edit a node in the trie so it maps a range of bit sequences to the
+ * same leaf.
+ *
+ * \param min_bits
+ * The first bit sequence in the range, or NULL if unbounded.
+ *
+ * \param max_bits
+ * The last bit sequence in the range, or NULL if unbounded.
+ *
+ * \param bit_pos
+ * Which bit in the sequences corresponds to \a edit_node.
+ *
+ * \param edit_node
+ * The node to be modified.
+ *
+ * \param leaf
+ * The leaf to which all the bit sequences in the range should be
+ * mapped.  */
+void
+binary_trie::set_range_in_node(
+       const bits_vector *min_bits,
+       const bits_vector *max_bits,
+       std::size_t bit_pos,
+       edge_type edit_node,
+       edge_type leaf)
+{
+       if (!min_bits || (*min_bits)[bit_pos] == false) {
+               set_range_in_edge(min_bits,
+                                 (max_bits && (*max_bits)[bit_pos] == false)
+                                 ? max_bits : NULL,
+                                 bit_pos + 1, edit_node, false, leaf);
+       }
+       if (!max_bits || (*max_bits)[bit_pos] == true) {
+               set_range_in_edge((min_bits && (*min_bits)[bit_pos] == true)
+                                 ? min_bits : NULL,
+                                 max_bits,
+                                 bit_pos + 1, edit_node, true, leaf);
+       }
+}
+
+/** Edit an edge in the trie so it maps a range of bit sequences to
+ * the same leaf.
+ *
+ * \param min_bits
+ * The first bit sequence in the range, or NULL if unbounded.
+ *
+ * \param max_bits
+ * The last bit sequence in the range, or NULL if unbounded.
+ *
+ * \param bit_pos
+ * Which bit in the sequences corresponds to \a bit.
+ *
+ * \param edit_node
+ * The node in which the edge to be modified is located.
+ *
+ * \param bit
+ * Which edge of \a edit_node should be modified.
+ *
+ * \param leaf
+ * The leaf to which all the bit sequences in the range should be
+ * mapped.  */
+void
+binary_trie::set_range_in_edge(
+       const bits_vector *min_bits,
+       const bits_vector *max_bits,
+       std::size_t bit_pos,
+       edge_type edit_node,
+       bool bit,
+       edge_type leaf)
+{
+       // Check if the range fills this edge entirely.
+       bool entire = true;
+       if (min_bits
+           && std::find(min_bits->begin() + bit_pos, min_bits->end(),
+                        true) != min_bits->end())
+               entire = false;
+       if (max_bits
+           && std::find(max_bits->begin() + bit_pos, max_bits->end(),
+                        false) != max_bits->end())
+               entire = false;
+
+       if (entire) {
+               nodes[edit_node].edges[bit] = leaf;
+       } else {
+               edge_type next = nodes[edit_node].edges[bit];
+               if (next >= nodes.size()) {
+                       const node new_node = {{ next, next }};
+                       next = nodes.size();
+                       nodes.push_back(new_node);
+                       nodes[edit_node].edges[bit] = next;
+               }
+
+               set_range_in_node(min_bits, max_bits, bit_pos,
+                                 next, leaf);
+       }
+}
+
+/** Renumber the nodes in depth-first order.  */
+void
+binary_trie::reorder_depth_first()
+{
+       std::vector<edge_type> old_to_new, new_to_old;
+       std::stack<edge_type> depth_first;
+       old_to_new.resize(nodes.size(), -1);
+       new_to_old.reserve(nodes.size());
+       depth_first.push(0);
+       while (!depth_first.empty()) {
+               const edge_type edge = depth_first.top();
+               depth_first.pop();
+               if (edge < nodes.size()) {
+                       old_to_new[edge] = new_to_old.size();
+                       new_to_old.push_back(edge);
+                       depth_first.push(nodes[edge].edges[1]);
+                       depth_first.push(nodes[edge].edges[0]);
+               }
+       }
+       reorder(old_to_new, new_to_old);
+}
+
+/** Renumber the nodes to make lookups use CPU and disk caches more
+ * effectively.
+ *
+ * First group the nodes into blocks so that each block contains the
+ * root of a subtrie and as many levels of its descendants as will
+ * fit.  This way, after the root is paged in, the next few lookup
+ * steps need not page in anything else.  Then, sort the nodes of each
+ * block in depth-first order.  That should give each lookup almost
+ * 1/2 chance to find the next node immediately adjacent.
+ *
+ * With a block size of 1024 bytes, this renumbering reduces the time
+ * required for random lookups by about 1.1%, compared to a plain
+ * depth-first order.  However, it's still 2.3% slower than the
+ * database optimized by MaxMind.  */
+void
+binary_trie::reorder_in_blocks(
+       std::size_t bytes_per_block)
+{
+       const edge_type none = -1;
+       std::vector<edge_type> old_to_new, new_to_old;
+       ssize_t bytes_left = bytes_per_block;
+       old_to_new.resize(nodes.size(), none);
+       new_to_old.reserve(nodes.size());
+       for (edge_type subtrie = 0; subtrie < nodes.size(); ++subtrie) {
+               // If subtrie has already been added to the output,
+               // ignore it.
+               if (old_to_new[subtrie] != none)
+                       continue;
+
+               // Walk breadth-first from subtrie until we have a
+               // block full of nodes or the subtrie runs out.  Don't
+               // add these nodes immediately to the output, however.
+               // Instead just list them in nodes_in_block.
+               std::set<edge_type> nodes_in_block;
+               std::queue<edge_type> breadth_first;
+               breadth_first.push(subtrie);
+               if (bytes_left <= 0)
+                       bytes_left += bytes_per_block;
+               while (bytes_left > 0 && !breadth_first.empty()) {
+                       edge_type edge = breadth_first.front();
+                       breadth_first.pop();
+                       if (edge >= nodes.size())
+                               continue;
+
+                       // Let the last node of the block straddle the
+                       // block boundary.  That's better than making
+                       // the hotter first node do so.
+                       bytes_left -= 6;
+                       nodes_in_block.insert(edge);
+
+                       breadth_first.push(nodes[edge].edges[0]);
+                       breadth_first.push(nodes[edge].edges[1]);
+               }
+
+               // Add the nodes from nodes_in_block to the output in
+               // depth-first order.  This assumes they are all
+               // reachable from subtrie.
+               std::stack<edge_type> depth_first;
+               depth_first.push(subtrie);
+               while (!depth_first.empty()) {
+                       edge_type edge = depth_first.top();
+                       depth_first.pop();
+                       if (nodes_in_block.find(edge)
+                           == nodes_in_block.end())
+                               continue;
+
+                       old_to_new[edge] = new_to_old.size();
+                       new_to_old.push_back(edge);
+
+                       depth_first.push(nodes[edge].edges[1]);
+                       depth_first.push(nodes[edge].edges[0]);
+               }
+       }
+       reorder(old_to_new, new_to_old);
+}
+
+void
+binary_trie::reorder(
+       const std::vector<edge_type> &old_to_new,
+       const std::vector<edge_type> &new_to_old)
+{
+       std::vector<node> new_nodes;
+       new_nodes.reserve(new_to_old.size());
+       for (std::vector<edge_type>::const_iterator
+                    it = new_to_old.begin();
+            it != new_to_old.end(); ++it) {
+               node new_node;
+               for (int bit = 0; bit <= 1; ++bit) {
+                       edge_type old_edge = nodes[*it].edges[bit];
+                       if (old_edge < nodes.size())
+                               new_node.edges[bit] = old_to_new[old_edge];
+                       else
+                               new_node.edges[bit] = old_edge;
+               }
+               new_nodes.push_back(new_node);
+       }
+       swap(new_nodes, nodes);
+}
+
+/** Add the size of the trie (number of nodes) to data records*/
+void
+binary_trie::update_records()
+{
+       for (std::vector<node>::iterator it = nodes.begin();
+            it != nodes.end(); ++it) {
+       // previously, we commandeered the MSB in order to indicate which records
+       // were data records, rather than pointers to other nodes in the trie.
+       // Here, we remove that bit, and increment the record by the number of nodes,
+       // because this is how libGeoIP determines whether a node points to an entry
+       // inside the data section or another node.
+               for (int i = 0;i<1;++i) {
+                       if (it->edges[i] & 0x80000000) // msb means data record pointer
+                       {
+                               it->edges[i] = (it->edges[i] & 0x7FFFFFFF) + nodes.size();
+                       }
+               }
+       }
+}
+
+/** Write the 3 byte segment offset. **/
+void
+binary_trie::write_segment(std::ostream &dat_stream) const
+{
+       int len = nodes.size();
+       dat_stream << (char) (0xFF & len);
+       dat_stream << (char) (0xFF & (len >> 8));
+       dat_stream << (char) (0xFF & (len >> 16));
+}
+
+/** Write the trie to a stream in GeoIP binary format. */
+void
+binary_trie::write_binary(std::ostream &dat_stream) const
+{
+       for (std::vector<node>::const_iterator it = nodes.begin();
+            it != nodes.end(); ++it) {
+               union {
+                       uint8_t bytes[6];
+                       char chars[6];
+               } binary = {{
+                       (it->edges[0]      ) & 0xFF,
+                       (it->edges[0] >>  8) & 0xFF,
+                       (it->edges[0] >> 16) & 0xFF,
+                       (it->edges[1]      ) & 0xFF,
+                       (it->edges[1] >>  8) & 0xFF,
+                       (it->edges[1] >> 16) & 0xFF
+               }};
+               dat_stream.write(binary.chars, 6);
+               if (dat_stream.bad())
+                       return;
+       }
+}
+
+
+namespace {
+       void
+       v4_csv_line_to_vector(
+               const std::string line,
+               std::vector<std::string> &fields)
+       {
+               std::string buf(line);
+               std::string delim = ",";
+               std::size_t fs;
+               for(int i = 0; i<2;++i) {
+                       fs = buf.find(delim);
+                       fields.push_back(buf.substr(0,fs));
+                       buf.erase(0,fs + 1);
+               }
+               fields.push_back(buf.substr(1, buf.length() - 2));
+       }
+
+       void
+       v6_csv_line_to_vector(
+               const std::string line,
+               std::vector<std::string> & fields)
+       {
+               std::string buf(line);
+               std::string delim = ", ";
+               std::size_t fs;
+               for(int i = 0; i<3;++i) {
+                       fs = buf.rfind(delim);
+                       fields.push_back(buf.substr(fs+2, buf.length()));
+                       buf.erase(fs,buf.length());
+               }
+               fields.push_back(buf.substr(0, buf.length()));
+       }
+
+       /** Load ranges of IP addresses from a CSV-formatted stream to
+        * a trie.
+        *
+        * \param trie
+        * Load the ranges to this trie, overwriting original values.
+        *
+        * \param csv_file_name
+        * The name of the file that \a csv_stream is reading.
+        * This string is used only for error messages.
+        *
+        * \param csv_stream
+        * Load the ranges from this stream.
+        *
+        * \param address_family
+        * The type of IP addresses in the CSV data: either AF_INET * for IPv4 or AF_INET6 for IPv6.  */
+       void
+       csv_stream_to_trie_db(
+               binary_trie &trie,
+               std::string &database_segment,
+               const char *csv_file_name,
+               std::istream &csv_stream,
+               int address_family)
+       {
+               enum {
+                       V4_CSV_FIELD_MIN_DECIMAL,
+                       V4_CSV_FIELD_MAX_DECIMAL,
+                       V4_CSV_FIELD_ASNUM_DESCRIPTION,
+                       V4_CSV_FIELDS
+               };
+
+               enum {
+                       V6_CSV_FIELD_NET_BITS,
+                       V6_CSV_FIELD_MAX_TEXT,
+                       V6_CSV_FIELD_MIN_TEXT,
+                       V6_CSV_FIELD_ASNUM_DESCRIPTION,
+                       V6_CSV_FIELDS
+               };
+
+               std::string csv_line;
+               std::map<std::string,int> segment_offset;
+               std::vector<std::string> csv_fields;
+               // create a map to track which as descriptions are added already
+               int csv_line_number = 0;
+               database_segment += '\0'; // padding so that record 0 is not at the start of the db.
+               while (getline(csv_stream, csv_line)) {
+                       ++csv_line_number;
+                       std::string as;
+                       switch (address_family) {
+                       case AF_INET:
+                               v4_csv_line_to_vector(csv_line, csv_fields);
+                               if (csv_fields.size() != V4_CSV_FIELDS) {
+                                       error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                                                     "Wrong number of fields");
+                               }
+                               as = csv_fields[V4_CSV_FIELD_ASNUM_DESCRIPTION];
+                               if (segment_offset.find(as) == segment_offset.end()) { // no key found
+                                       segment_offset[as] = database_segment.length(); // start of record
+                                       database_segment += csv_fields[V4_CSV_FIELD_ASNUM_DESCRIPTION] + '\x00';
+                               }
+                               break;
+                       case AF_INET6:
+                               v6_csv_line_to_vector(csv_line, csv_fields);
+                               if (csv_fields.size() != V6_CSV_FIELDS) {
+                                       error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                                                     "Wrong number of fields");
+                               }
+                               as = csv_fields[V6_CSV_FIELD_ASNUM_DESCRIPTION];
+                               if (segment_offset.find(as) == segment_offset.end()) { // no key found
+                                       segment_offset[as] = database_segment.length(); // start of record
+                                       database_segment += csv_fields[V6_CSV_FIELD_ASNUM_DESCRIPTION] + '\x00';
+                               }
+                               break;
+                       default:
+                               abort();
+                       }
+
+                       // use the MSB to indicate that this is a data record. Later, the field
+                       // is set to the segment offset + the nubmer of nodes in the trie.
+                       const binary_trie::edge_type leaf = 0x80000000 | segment_offset[as];
+                       union {
+                               struct in_addr inet;
+                               uint8_t inetbytes[4];
+                               struct in6_addr inet6;
+                       } minaddr, maxaddr;
+
+                       switch (address_family) {
+                       case AF_INET:
+                               minaddr.inet.s_addr = htonl(atoi(csv_fields[V4_CSV_FIELD_MIN_DECIMAL].c_str()));
+                               maxaddr.inet.s_addr = htonl(atoi(csv_fields[V4_CSV_FIELD_MAX_DECIMAL].c_str()));
+                               trie.set_range(minaddr.inetbytes, maxaddr.inetbytes,
+                                              32, leaf);
+                               break;
+                       case AF_INET6:
+                               if (inet_pton(address_family, csv_fields[V6_CSV_FIELD_MIN_TEXT].c_str(), &minaddr) <= 0) {
+                                       error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                                                     "Cannot parse minimum address: %s",
+                                                     csv_fields[V6_CSV_FIELD_MIN_TEXT].c_str());
+                               }
+                               if (inet_pton(address_family, csv_fields[V6_CSV_FIELD_MAX_TEXT].c_str(), &maxaddr) <= 0) {
+                                       error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                                                     "Cannot parse maximum address: %s",
+                                                     csv_fields[V6_CSV_FIELD_MAX_TEXT].c_str());
+                               }
+                               trie.set_range(minaddr.inetbytes, maxaddr.inetbytes,
+                                              128, leaf);
+                               break;
+
+                       default:
+                               abort();
+                       }
+                       csv_fields.clear();
+               }
+               if (csv_stream.bad()) {
+                       error(EX_IOERR, errno, "%s", csv_file_name);
+               }
+       }
+
+       /** Load ranges of IP addresses from a CSV-formatted file or
+        * standard input to a trie.
+        *
+        * \param trie
+        * Load the ranges to this trie, overwriting original values.
+        *
+        * \param csv_file_name
+        * The name of the CSV file that should be read, or "-" for
+        * standard input.
+        *
+        * \param address_family
+        * The type of IP addresses in the CSV data: either AF_INET
+        * for IPv4 or AF_INET6 for IPv6.  */
+       void
+       csv_file_to_trie_db(
+               binary_trie &trie,
+               std::string &database_segment,
+               const char *csv_file_name,
+               int address_family)
+       {
+               if (std::strcmp(csv_file_name, "-") == 0) {
+                       csv_stream_to_trie_db(trie, database_segment, csv_file_name, std::cin, address_family);
+               } else {
+                       std::ifstream csv_stream(csv_file_name, std::ios::in);
+                       if (!csv_stream) {
+                               error(EX_NOINPUT, errno, "%s", csv_file_name);
+                       }
+                       csv_stream_to_trie_db(trie, database_segment, csv_file_name, csv_stream, address_family);
+               }
+       }
+
+       /** Write a GeoIP binary database to a stream.
+        *
+        * \param trie
+        * Mapping from IP addresses to country codes or other values.
+        *
+        * \param dat_file_name
+        * The name of the file that \a dat_stream is writing.
+        * This string is used only for error messages.
+        *
+        * \param dat_stream
+        * Write the database to this stream.
+        *
+        * \param database_info
+        * Copyright or other information about the database, or NULL.
+        * GeoIP_database_info() will return this string.
+        *
+        * \param address_family
+        * The type of IP addresses in the database: either AF_INET
+        * for IPv4 or AF_INET6 for IPv6.  */
+       void
+       write_dat_stream(
+               const binary_trie &trie,
+               const char *dat_file_name,
+               std::ostream &dat_stream,
+               const char *database_info,
+               std::string database_segment,
+               int address_family)
+       {
+
+               trie.write_binary(dat_stream);
+               if (dat_stream.bad()) {
+                       error(EX_IOERR, errno, "%s", dat_file_name);
+               }
+
+               // or open the file and read the length
+               if (database_segment.length() > 0) {
+                       dat_stream << database_segment;
+                       if (dat_stream.bad()) {
+                               error(EX_IOERR, errno, "%s", dat_file_name);
+                       }
+
+               }
+
+               // write the metadata section
+               if (database_info) {
+                       const char tag[3] = { 0, 0, 0 };
+                       dat_stream.write(tag, 3);
+                       dat_stream.write(database_info, std::strlen(database_info));
+                       if (dat_stream.bad()) {
+                               error(EX_IOERR, errno, "%s", dat_file_name);
+                       }
+               }
+
+               switch (address_family) {
+               case AF_INET: {
+                       const char structure_info[4] = { 0xFF, 0xFF, 0xFF, 9 };
+                       dat_stream.write(structure_info, 4);
+                       break;
+               }
+               case AF_INET6: {
+                       const char structure_info[4] = { 0xFF, 0xFF, 0xFF, 21 };
+                       dat_stream.write(structure_info, 4);
+                       break;
+               }
+               default:
+                       abort();
+               }
+
+               trie.write_segment(dat_stream);
+               if (dat_stream.bad()) {
+                       error(EX_IOERR, errno, "%s", dat_file_name);
+               }
+       }
+
+       /** Write a GeoIP binary database to a file or standard output.
+        *
+        * \param trie
+        * Mapping from IP addresses to country codes or other values.
+        *
+        * \param csv_file_name
+        * The name of the file that should be written, or "-" for
+        * standard output.
+        *
+        * \param database_info
+        * Copyright or other information about the database, or NULL.
+        * GeoIP_database_info() will return this string.
+        *
+        * \param address_family
+        * The type of IP addresses in the database: either AF_INET
+        * for IPv4 or AF_INET6 for IPv6.  */
+       void
+       write_dat_file(
+               const binary_trie &trie,
+               const char *dat_file_name,
+               const char *database_info,
+               std::string  database_segment,
+               int address_family)
+       {
+               if (std::strcmp(dat_file_name, "-") == 0) {
+                       write_dat_stream(trie, dat_file_name, std::cout,
+                                        database_info, database_segment, address_family);
+               } else {
+                       std::ofstream dat_stream(
+                               dat_file_name,
+                               std::ios::out | std::ios::binary);
+                       if (!dat_stream) {
+                               error(EX_CANTCREAT, errno, "%s", dat_file_name);
+                       }
+                       write_dat_stream(trie, dat_file_name, dat_stream,
+                                        database_info, database_segment, address_family);
+               }
+       }
+
+       struct cmdline {
+               const char *csv_file_name;
+               const char *dat_file_name;
+               int address_family;
+               const char *database_info;
+               bool verbose;
+
+               cmdline(int argc, char **argv);
+       };
+}
+
+cmdline::cmdline(int argc, char **argv):
+       csv_file_name("-"),
+       dat_file_name("-"),
+       address_family(AF_INET),
+       database_info(NULL),
+       verbose(false)
+{
+       enum {
+               OPT_HELP = -2
+       };
+
+       static const struct option long_options[] = {
+               { "info", required_argument, NULL, 'i' },
+               { "output", required_argument, NULL, 'o' },
+               { "verbose", no_argument, NULL, 'v' },
+               { "help", no_argument, NULL, OPT_HELP },
+               { NULL, 0, NULL, 0 }
+       };
+       static const char *const usage = "\
+Usage: %s [OPTION] [CSV-FILE]...\n\
+Convert a country database from CSV to GeoIP binary format.\n\
+\n\
+  -i, --info=TEXT       add copyright or other info TEXT to output\n\
+  -o, --output=FILE     write the binary data to FILE, not stdout\n\
+  -v, --verbose         show what is going on\n\
+      --help            display this help and exit\n";
+
+       for (;;) {
+               int optret = getopt_long(argc, argv, "46i:o:v", long_options, NULL);
+
+               if (optret == -1)
+                       break;
+               switch (optret) {
+               case '4':
+                       address_family = AF_INET;
+                       break;
+               case '6':
+                       address_family = AF_INET6;
+                       break;
+               case 'i':
+                       database_info = optarg;
+                       if (std::strlen(database_info) > 99) {
+                               error(EX_USAGE, 0,
+                                     "Database info must not be longer than 99 bytes");
+                       }
+                       break;
+               case 'o':
+                       dat_file_name = optarg;
+                       break;
+               case 'v':
+                       verbose = true;
+                       break;
+               case OPT_HELP:
+                       std::printf(usage, program_invocation_name);
+                       std::exit(EX_OK);
+               case '?':
+                       std::fprintf(stderr,
+                                    "Try `%s --help' for more information.\n",
+                                    program_invocation_name);
+                       std::exit(EX_USAGE);
+               default:
+                       std::abort();
+               }
+       }
+
+       if (optind < argc)
+               csv_file_name = argv[optind++];
+
+       if (optind < argc) {
+               error(EX_USAGE, 0,
+                     "Only one non-option argument is allowed");
+       }
+}
+
+int
+main(int argc, char **argv)
+{
+       cmdline cmdline(argc, argv);
+
+       std::ostream *verbose_stream;
+       if (!cmdline.verbose)
+               verbose_stream = NULL;
+       else if (strcmp(cmdline.dat_file_name, "-") == 0)
+               verbose_stream = &std::cerr;
+       else
+               verbose_stream = &std::cout;
+
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Reading CSV and building the trie"
+                               << std::endl;
+       }
+       /* Initialize the trie with a value that will point to the start of the
+        * data section, e.g. an empty record. See binary_trie::update_records()
+        */
+       binary_trie trie(0x80000000);
+       std::string database_segment;
+       csv_file_to_trie_db(trie, database_segment, cmdline.csv_file_name, cmdline.address_family);
+
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Optimizing" << std::endl;
+       }
+       trie.reorder_depth_first();
+       trie.reorder_in_blocks(1024);
+       trie.update_records();
+
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Writing output" << std::endl;
+       }
+       write_dat_file(trie, cmdline.dat_file_name, cmdline.database_info,
+                       database_segment, cmdline.address_family);
+
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": All done" << std::endl; }
+}
diff -Naur '--exclude=.svn' 1.6.2-1/debian/src/geoip-csv-to-dat.cpp 1.6.2-2/debian/src/geoip-csv-to-dat.cpp
--- 1.6.2-1/debian/src/geoip-csv-to-dat.cpp     2014-07-24 19:29:11.233628140 +0200
+++ 1.6.2-2/debian/src/geoip-csv-to-dat.cpp     2014-10-27 19:31:48.626784609 +0100
@@ -2,6 +2,7 @@
  *
  * Copyright (c) 2009 Kalle Olavi Niemitalo.
  * Copyright (c) 2011 Patrick Matthäi
+ * Copyright (c) 2014 Andrew Moise
  *
  * Permission to use, copy, modify, and/or distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -29,6 +30,7 @@
 #include <iostream>
 #include <queue>
 #include <set>
+#include <sstream>
 #include <stack>
 #include <string>
 #include <sys/socket.h>
@@ -37,41 +39,98 @@
 #include <vector>
 #include <GeoIP.h>
 
-// Format of GeoIP Country database files
+// Format of GeoIP Database files
 // ======================================
 //
-// 1. Binary trie mapping IP addresses to countries.
-// 2. Optional unused data.
-// 3. Optional database-info block.
-// 4. Optional structure-info block.
+// 1. Binary trie mapping IP addresses to locations.
+// 2. Information about each location (only in city DBs).
+// 3. Optional unused data.
+// 4. Optional database-info block.
+// 5. Optional structure-info block.
 //
 // Binary trie
 // -----------
 //
 // The trie treats IP addresses as bit sequences and maps them to
-// numbers.  In the country database, each such number is 0xFFFF00 +
-// the the country ID that GeoIP_id_by_ipnum() would return.  The
-// meanings of country IDs are hardcoded in libGeoIP and cannot be
-// overridden by the database.
+// location numbers.
+//
+// In the country database format, each such number is a country ID
+// that GeoIP_id_by_ipnum() would return.  The meanings of country IDs
+// are hardcoded in libGeoIP and cannot be overridden by the database.
+//
+// In the city database format, each such number is a position to seek
+// to, within section #2 (information about each location), in order
+// to find a record giving information about the location associated
+// with that IP.
 //
 // The root node of the trie is at the beginning of the file, and the
 // other nodes then follow it. Each node has the same size and
 // consists of two little-endian pointers that correspond to the two
-// possible values of a bit.  In the country database, the pointers are
-// 24-bit, and each node is thus 6 bytes long.
+// possible values of a bit.  The pointers are 24-bit, and each node
+// is thus 6 bytes long.
+//
+// To traverse the trie, you go one bit at a time through the IP
+// address you're looking up -- starting at bit 0, and starting at the
+// root node of the trie. For each bit, you look at the current node,
+// and take the left branch (the first of the two pointers) if the bit
+// is 0, otherwise take the right branch. It is allowed for a node
+// pointer to refer back to earlier nodes in the file, but loops are
+// not allowed.
+//
+// If the pointer you're looking at is less than the total number of
+// nodes in the trie, it indicates the next node you should
+// examine. If the pointer you're looking at is greater than or equal
+// to the total number of nodes, it indicates a leaf -- the end of
+// your search.
+//
+// The meaning of the leaf pointers depends on the database type:
+//
+// In country databases, 0xFFFF?? indicates that the country ID
+// for that IP address is equal to the ?? part. 0xFFFF00 indicates
+// that the IP address you queried is not in the database.
+//
+// City databases contain an extra segment which contains location
+// information records. A leaf pointer equal to (number of nodes in
+// the trie + X) indicates that the location information for the IP
+// address you queried can be found in the location information
+// segment, at offset X. A leaf pointer exactly equal to the number of
+// nodes in the trie indicates that the location information for the
+// IP address you queried is not stored in this database.
+//
+// Location segment
+// --------------------
+//
+// City databases contain a location segment. Each record in this
+// segment contains information about a single location which IP
+// addresses may be mapped to. Pointers to records in this segment are
+// contained within leaf nodes of the trie; each record contains, in
+// this order:
+//
+// * A country ID, as a single byte
+// * A "region" (generally state or province), as a NULL-terminated string
+// * A city name, as a NULL-terminated string
+// * A postal code, as a NULL-terminated string
+// * Encoded latitude, as a little-endian 3-byte integer. To convert
+//   back to actual latitude, divide by 10000 and subtract 180.
+// * Encoded longitude, as a little-endian 3-byte integer. To convert
+//   back to actual longitude, divide by 10000 and subtract 180.
+// * Area code and metro code (ONLY if the country is US). These are
+//   encoded into a single little-endian 3-byte integer. The area code
+//   is the encoded value modulo 1000, and the metro code is the
+//   encoded value divided by 1000. If the country is not US, this
+//   field is not present.
 //
-// Each pointer is one of:
-// - The number that the whole lookup should return, i.e. 0xFFFF00 + id
-//   in the country database.
-// - The number of the node that should be examined next, counting from
-//   0 at the beginning of the file.  Pointing back to nodes with
-//   smaller numbers is allowed, but loops are not allowed.
+// The string fields may be equal to empty strings, but all fields are
+// always included (except area/metro code, which is included if and
+// only if the country is US).
+//
+// All strings seem to be in ISO-8859-1 encoding.
 //
 // Optional unused data
 // --------------------
 //
-// The file format seems to permit extra data between the binary trie
-// and the optional blocks.
+// The file format permits any amount of extra data between the binary
+// trie and the optional blocks.
 //
 // Optional database-info block
 // ----------------------------
@@ -98,11 +157,24 @@
 // The GeoLite Country IPv4 database downloadable from MaxMind does
 // not include this structure-info block.
 
+
+
+/*************************************************
+ * Binary trie
+ *
+ * This section implements a data structure representing the trie
+ * which, within a .dat file, maps IP address ranges to locations.
+ *************************************************/
+
 namespace {
        class binary_trie
        {
        public:
                typedef uint_fast32_t edge_type;
+               struct node
+               {
+                       edge_type edges[2];
+               };
 
                explicit binary_trie(edge_type leaf);
                void set_range(
@@ -112,13 +184,11 @@
                        edge_type leaf);
                void reorder_depth_first();
                void reorder_in_blocks(std::size_t bytes_per_block);
-               void write_binary(std::ostream &dat_stream) const;
+
+               std::vector<node>::iterator nodes_begin() { return nodes.begin(); }
+               std::vector<node>::iterator nodes_end() { return nodes.end(); }
 
        private:
-               struct node
-               {
-                       edge_type edges[2];
-               };
                std::vector<node> nodes;
 
                // This could be std::vector<bool> but that seems slower.
@@ -407,30 +477,30 @@
        swap(new_nodes, nodes);
 }
 
-/** Write the trie to a stream in GeoIP binary format. */
-void
-binary_trie::write_binary(std::ostream &dat_stream) const
-{
-       for (std::vector<node>::const_iterator it = nodes.begin();
-            it != nodes.end(); ++it) {
-               union {
-                       uint8_t bytes[6];
-                       char chars[6];
-               } binary = {{
-                       (it->edges[0]      ) & 0xFF,
-                       (it->edges[0] >>  8) & 0xFF,
-                       (it->edges[0] >> 16) & 0xFF,
-                       (it->edges[1]      ) & 0xFF,
-                       (it->edges[1] >>  8) & 0xFF,
-                       (it->edges[1] >> 16) & 0xFF
-               }};
-               dat_stream.write(binary.chars, 6);
-               if (dat_stream.bad())
-                       return;
-       }
+/*************************************************
+ * CSV file support
+ *
+ * This section implements reading from .csv files.
+ *************************************************/
+
+namespace {
+       /** Interface for classes interested in .csv data -- this should be
+        * implemented and then passed to csv_read_file(), which will then
+        * call read_csv_line(), providing the data in the .csv file. */
+       class csv_data_reader
+       {
+       public:
+               virtual ~csv_data_reader() {}
+
+               virtual void read_csv_line(const char *csv_file_name,
+                                          int csv_line_number,
+                                          std::vector<std::string> &fields) = 0;
+       };
 }
 
 namespace {
+       /** Convert a line from a .csv file into a vector of
+        *  tokens. For internal use by the .csv reading code. */
        void
        csv_line_to_vector(
                const std::string &line,
@@ -458,227 +528,358 @@
                fields.push_back(std::string(field.begin(), field.end()));
        }
 
-       /** Load ranges of IP addresses from a CSV-formatted stream to
-        * a trie.
+       /** Load data from a CSV-formatted stream.
         *
-        * \param trie
-        * Load the ranges to this trie, overwriting original values.
+        * \param reader
+        * The reader to call for each line of the CSV
         *
         * \param csv_file_name
         * The name of the file that \a csv_stream is reading.
         * This string is used only for error messages.
         *
         * \param csv_stream
-        * Load the ranges from this stream.
-        *
-        * \param address_family
-        * The type of IP addresses in the CSV data: either AF_INET
-        * for IPv4 or AF_INET6 for IPv6.  */
+        * The stream to read from.   */
        void
-       csv_stream_to_trie(
-               binary_trie &trie,
+       csv_read_stream(
+               csv_data_reader &reader,
                const char *csv_file_name,
-               std::istream &csv_stream,
-               int address_family)
+               std::istream &csv_stream)
        {
-               enum {
-                       CSV_FIELD_MIN_TEXT,
-                       CSV_FIELD_MAX_TEXT,
-                       CSV_FIELD_MIN_DECIMAL,
-                       CSV_FIELD_MAX_DECIMAL,
-                       CSV_FIELD_COUNTRY_CODE,
-                       CSV_FIELD_COUNTRY_NAME,
-                       CSV_FIELDS
-               };
-
                std::string csv_line;
                std::vector<std::string> csv_fields;
                int csv_line_number = 0;
                while (getline(csv_stream, csv_line)) {
                        ++csv_line_number;
                        csv_line_to_vector(csv_line, csv_fields);
-                       if (csv_fields.size() != CSV_FIELDS) {
-                               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
-                                             "Wrong number of fields");
-                       }
-
-                       if (csv_fields[CSV_FIELD_COUNTRY_CODE] == "AN") {
-                               csv_fields[CSV_FIELD_COUNTRY_CODE] = "CW";
-                       }
-
-                       const int countryid = GeoIP_id_by_code(csv_fields[CSV_FIELD_COUNTRY_CODE].c_str());
-                       if (countryid == 0) {
-                               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
-                                             "Unrecognized country code: %s",
-                                             csv_fields[CSV_FIELD_COUNTRY_CODE].c_str());
-                       }
-                       const binary_trie::edge_type leaf = 0xFFFF00 + countryid;
-
-                       union {
-                               struct in_addr inet;
-                               uint8_t inetbytes[4];
-                               struct in6_addr inet6;
-                       } minaddr, maxaddr;
-                       if (inet_pton(address_family, csv_fields[CSV_FIELD_MIN_TEXT].c_str(), &minaddr) <= 0) {
-                               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
-                                             "Cannot parse minimum address: %s",
-                                             csv_fields[CSV_FIELD_MIN_TEXT].c_str());
-                       }
-                       if (inet_pton(address_family, csv_fields[CSV_FIELD_MAX_TEXT].c_str(), &maxaddr) <= 0) {
-                               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
-                                             "Cannot parse maximum address: %s",
-                                             csv_fields[CSV_FIELD_MAX_TEXT].c_str());
-                       }
-                       switch (address_family) {
-                       case AF_INET:
-                               trie.set_range(minaddr.inetbytes, maxaddr.inetbytes,
-                                              32, leaf);
-                               break;
-                       case AF_INET6:
-                               trie.set_range(minaddr.inet6.s6_addr, maxaddr.inet6.s6_addr,
-                                              128, leaf);
-                               break;
-                       default:
-                               abort();
-                       }
+                       reader.read_csv_line(csv_file_name, csv_line_number, csv_fields);
                }
                if (csv_stream.bad()) {
                        error(EX_IOERR, errno, "%s", csv_file_name);
                }
        }
 
-       /** Load ranges of IP addresses from a CSV-formatted file or
-        * standard input to a trie.
+       /** Load data from a CSV-formatted file or standard input.
         *
-        * \param trie
-        * Load the ranges to this trie, overwriting original values.
+        * \param reader
+        * The reader to call for each line of the CSV.
         *
         * \param csv_file_name
         * The name of the CSV file that should be read, or "-" for
-        * standard input.
-        *
-        * \param address_family
-        * The type of IP addresses in the CSV data: either AF_INET
-        * for IPv4 or AF_INET6 for IPv6.  */
+        * standard input.   */
        void
-       csv_file_to_trie(
-               binary_trie &trie,
-               const char *csv_file_name,
-               int address_family)
+       csv_read_file(
+               csv_data_reader &reader,
+               const char *csv_file_name)
        {
                if (std::strcmp(csv_file_name, "-") == 0) {
-                       csv_stream_to_trie(trie, csv_file_name, std::cin, address_family);
+                       csv_read_stream(reader, csv_file_name, std::cin);
                } else {
                        std::ifstream csv_stream(csv_file_name, std::ios::in);
                        if (!csv_stream) {
                                error(EX_NOINPUT, errno, "%s", csv_file_name);
                        }
-                       csv_stream_to_trie(trie, csv_file_name, csv_stream, address_family);
+                       csv_read_stream(reader, csv_file_name, csv_stream);
                }
        }
+}
 
-       /** Write a GeoIP binary database to a stream.
-        *
-        * \param trie
-        * Mapping from IP addresses to country codes or other values.
-        *
-        * \param dat_file_name
-        * The name of the file that \a dat_stream is writing.
-        * This string is used only for error messages.
+/*************************************************
+ * .dat file support
+ *
+ * This section implements support code for writing out .dat files in
+ * Maxmind DB format.
+ *************************************************/
+
+namespace {
+
+       /** .dat file writer class
         *
-        * \param dat_stream
-        * Write the database to this stream.
+        * To write out a .dat file, construct a dat_writer, then call
+        * (in this order):
         *
-        * \param database_info
-        * Copyright or other information about the database, or NULL.
-        * GeoIP_database_info() will return this string.
+        * write_trie()
+        * write_database_info (optional)
+        * write_structure_info()
         *
-        * \param address_family
-        * The type of IP addresses in the database: either AF_INET
-        * for IPv4 or AF_INET6 for IPv6.  */
-       void
-       write_dat_stream(
-               const binary_trie &trie,
-               const char *dat_file_name,
-               std::ostream &dat_stream,
-               const char *database_info,
-               int address_family)
-       {
-               trie.write_binary(dat_stream);
-               if (dat_stream.bad()) {
-                       error(EX_IOERR, errno, "%s", dat_file_name);
-               }
-
-               if (database_info) {
-                       const char tag[3] = { 0, 0, 0 };
-                       dat_stream.write(tag, 3);
-                       dat_stream.write(database_info, std::strlen(database_info));
-                       if (dat_stream.bad()) {
-                               error(EX_IOERR, errno, "%s", dat_file_name);
-                       }
-               }
+        * Setting dat_file_name to "-" will write to standard output;
+        * otherwise, a file will be created, and closed when the
+        * dat_writer is deleted. */
 
-               switch (address_family) {
-               case AF_INET: {
-                       const char structure_info[4] = { 0xFF, 0xFF, 0xFF, 1 };
-                       dat_stream.write(structure_info, 4);
-                       break;
-               }
-               case AF_INET6: {
-                       const char structure_info[4] = { 0xFF, 0xFF, 0xFF, 12 };
-                       dat_stream.write(structure_info, 4);
-                       break;
-               }
-               default:
-                       abort();
-               }
-               if (dat_stream.bad()) {
-                       error(EX_IOERR, errno, "%s", dat_file_name);
+       class dat_writer
+       {
+       public:
+               dat_writer(const char *dat_file_name, GeoIPDBTypes database_type);
+               virtual ~dat_writer();
+
+               void write_trie(binary_trie &trie);
+               void write_database_info(const char *database_info);
+               virtual void write_structure_info();
+
+       protected:
+               std::ostream *dat_stream;
+               bool need_to_delete_stream;
+               std::string dat_file_name;
+               GeoIPDBTypes database_type;
+       };
+
+}
+
+dat_writer::dat_writer(const char *dat_file_name, GeoIPDBTypes database_type):
+       dat_file_name(dat_file_name),
+       database_type(database_type)
+{
+       if (std::strcmp(dat_file_name, "-") == 0) {
+               dat_stream = &std::cout;
+               need_to_delete_stream = false;
+       } else {
+               dat_stream = new std::ofstream(dat_file_name, std::ios::out | std::ios::binary);
+               if (!dat_stream) {
+                       error(EX_CANTCREAT, errno, "%s", dat_file_name);
                }
+               need_to_delete_stream = true;
        }
+}
 
-       /** Write a GeoIP binary database to a file or standard output.
-        *
-        * \param trie
-        * Mapping from IP addresses to country codes or other values.
-        *
-        * \param csv_file_name
-        * The name of the file that should be written, or "-" for
-        * standard output.
-        *
-        * \param database_info
-        * Copyright or other information about the database, or NULL.
-        * GeoIP_database_info() will return this string.
-        *
-        * \param address_family
-        * The type of IP addresses in the database: either AF_INET
-        * for IPv4 or AF_INET6 for IPv6.  */
-       void
-       write_dat_file(
-               const binary_trie &trie,
-               const char *dat_file_name,
-               const char *database_info,
-               int address_family)
-       {
-               if (std::strcmp(dat_file_name, "-") == 0) {
-                       write_dat_stream(trie, dat_file_name, std::cout,
-                                        database_info, address_family);
-               } else {
-                       std::ofstream dat_stream(
-                               dat_file_name,
-                               std::ios::out | std::ios::binary);
-                       if (!dat_stream) {
-                               error(EX_CANTCREAT, errno, "%s", dat_file_name);
-                       }
-                       write_dat_stream(trie, dat_file_name, dat_stream,
-                                        database_info, address_family);
-               }
+dat_writer::~dat_writer()
+{
+       if (need_to_delete_stream)
+               delete dat_stream;
+}
+
+void dat_writer::write_trie(binary_trie &trie)
+{
+       for (std::vector<binary_trie::node>::iterator it = trie.nodes_begin();
+            it != trie.nodes_end(); ++it)
+       {
+               union {
+                       uint8_t bytes[6];
+                       char chars[6];
+               } binary = {{
+                       (it->edges[0]      ) & 0xFF,
+                       (it->edges[0] >>  8) & 0xFF,
+                       (it->edges[0] >> 16) & 0xFF,
+                       (it->edges[1]      ) & 0xFF,
+                       (it->edges[1] >>  8) & 0xFF,
+                       (it->edges[1] >> 16) & 0xFF
+               }};
+               dat_stream->write(binary.chars, 6);
+               if (dat_stream->bad())
+                       error(EX_IOERR, errno, "%s", dat_file_name.c_str());
+       }
+}
+
+void dat_writer::write_database_info(const char *database_info)
+{
+       const char tag[3] = { 0, 0, 0 };
+       dat_stream->write(tag, 3);
+       dat_stream->write(database_info, std::strlen(database_info));
+       if (dat_stream->bad()) {
+               error(EX_IOERR, errno, "%s", dat_file_name.c_str());
+       }
+}
+
+void dat_writer::write_structure_info()
+{
+       const char structure_info[4] = { 0xFF, 0xFF, 0xFF, database_type };
+       dat_stream->write(structure_info, 4);
+}
+
+/*************************************************
+ * .dat file writer class, extended for city DBs
+ *************************************************/
+
+namespace
+{
+
+       class city_dat_writer : public dat_writer
+       {
+       public:
+               // All serialized location information
+               std::stringstream location_stream;
+
+               // Seek offset of each location within
+               // location_stream (relative to the beginning of
+               // location_stream)
+               std::vector<int> location_pos;
+
+               city_dat_writer(const char *dat_file_name, GeoIPDBTypes database_type);
+
+               void serialize_location_info(std::vector<std::string> &info,
+                                            const char *input_file_name,
+                                            int input_line_number);
+
+               void finalize_location_offsets(binary_trie &trie);
+               void write_locations();
+               virtual void write_structure_info(binary_trie &trie);
+       };
+
+}
+
+city_dat_writer::city_dat_writer(const char *dat_file_name, GeoIPDBTypes database_type)
+       : dat_writer(dat_file_name, database_type)
+{ }
+
+void city_dat_writer::finalize_location_offsets(binary_trie &trie)
+{
+       // We're going to convert the location numbers in the trie
+       // into the final location numbers we're going to want to
+       // write to disk. Previous to this call, leaf nodes in the
+       // trie have the value:
+       //
+       // 0x1000000 + the location number
+       //
+       // After this call, leaf nodes in the trie have the value:
+       //
+       // (total number of nodes in the trie) + (offset of location
+       // record in the location segment)
+       //
+       // Absence of a record is indicated by the value 0x1000000
+       // before this call, and by the value (total number of nodes
+       // in the trie) after this call.
+
+       int trie_size = std::distance(trie.nodes_begin(), trie.nodes_end());
+
+       for(std::vector<binary_trie::node>::iterator it = trie.nodes_begin();
+           it != trie.nodes_end(); ++it)
+       {
+               if (it->edges[0] == 0x1000000) // No data
+                       it->edges[0] = trie_size;
+               else if (it->edges[0] > 0x1000000) // Ptr to location block
+                       it->edges[0] = location_pos[it->edges[0] - 0x1000000 - 1] + trie_size;
+               // Any other value would indicate a non-leaf node
+
+               if (it->edges[1] == 0x1000000) // No data
+                       it->edges[1] = trie_size;
+               else if (it->edges[1] > 0x1000000) // Ptr to location block
+                       it->edges[1] = location_pos[it->edges[1] - 0x1000000 - 1] + trie_size;
+               // Any other value would indicate a non-leaf node
+       }
+}
+
+void city_dat_writer::write_locations()
+{
+       *dat_stream << location_stream.rdbuf();
+
+       if (dat_stream->bad())
+       {
+               error(EX_IOERR, errno, "%s", dat_file_name.c_str());
+       }
+}
+
+void city_dat_writer::write_structure_info(binary_trie &trie)
+{
+       int trie_size = std::distance(trie.nodes_begin(), trie.nodes_end());
+
+       const char structure_info[7] = { 0xFF,
+                                        0xFF,
+                                        0xFF,
+                                        database_type,
+                                        (trie_size      ) & 0xFF,
+                                        (trie_size >> 8 ) & 0xFF,
+                                        (trie_size >> 16) & 0xFF};
+       dat_stream->write(structure_info, 7);
+}
+
+/** Convert location info into on-disk format
+ *
+ *  \param info the location info read from the .csv file:
+ *
+ *  info[CSV_LOCATION_FIELD_COUNTRY] is the country id
+ *  info[CSV_LOCATION_FIELD_REGION] is the region
+ *  info[CSV_LOCATION_FIELD_CITY] is the city
+ *
+ *  ... and so on.
+ *
+ *  \param result a vector to append the on-disk converted information
+ *  to.
+ *
+ *  \param input_line_number input file line number (for error
+ *  notifications)
+ **/
+
+void city_dat_writer::serialize_location_info(std::vector<std::string> &info,
+                                             const char *input_file_name,
+                                             int input_line_number)
+{
+       // There's nothing wrong with location info being out of
+       // order, but currently we don't support it (i.e. the code
+       // won't work if the info's out of order). Sanity check that
+       // things are in order.
+       int loc_id = ::atoi(info[0].c_str());
+       if (loc_id != location_pos.size() + 1) {
+               error_at_line(EX_DATAERR, 0, input_file_name, input_line_number,
+                             "Location info not in order (currently not supported)");
+               return;
        }
 
+       // First, we save the offset of this location block.
+       location_pos.push_back(location_stream.tellp());
+
+       // Country ID
+       int country_id;
+       if (info[1] != "AN")
+               country_id = GeoIP_id_by_code(info[1].c_str());
+       else
+               country_id = GeoIP_id_by_code("CW");
+
+       if (country_id == 0) {
+               error(EX_DATAERR, 1, dat_file_name.c_str(), input_line_number,
+                     "Unrecognized country code: %s", info[1].c_str());
+       }
+       location_stream.put(country_id);
+
+       // Region
+       location_stream << info[2];
+       location_stream.put('\0');
+
+       // City
+       location_stream << info[3];
+       location_stream.put('\0');
+
+       // Postal code
+       location_stream << info[4];
+       location_stream.put('\0');
+
+       // Latitude
+       double latitude_dbl = ::atof(info[5].c_str());
+       int latitude_int = (latitude_dbl + 180) * 10000;
+       location_stream.put((latitude_int >>  0) & 0xFF);
+       location_stream.put((latitude_int >>  8) & 0xFF);
+       location_stream.put((latitude_int >> 16) & 0xFF);
+
+       // Longitude
+       double longitude_dbl = ::atof(info[6].c_str());
+       int longitude_int = (longitude_dbl + 180) * 10000;
+       location_stream.put((longitude_int >>  0) & 0xFF);
+       location_stream.put((longitude_int >>  8) & 0xFF);
+       location_stream.put((longitude_int >> 16) & 0xFF);
+
+       // Area code and metro code
+       if (info[1] == "US") {
+               int area_code = ::atoi(info[7].c_str());
+               int metro_code = ::atoi(info[8].c_str());
+               int area_metro_combined = metro_code * 1000 + area_code;
+               location_stream.put((area_metro_combined >>  0) & 0xFF);
+               location_stream.put((area_metro_combined >>  8) & 0xFF);
+               location_stream.put((area_metro_combined >> 16) & 0xFF);
+       }
+}
+
+/*************************************************
+ * Command line and options
+ *
+ * This section implements the command line parsing and stores the
+ * options for controlling the program's behavior.
+ *************************************************/
+
+namespace {
+
        struct cmdline {
-               const char *csv_file_name;
+               const char *ip_block_csv_file_name;
+               const char *location_csv_file_name;
                const char *dat_file_name;
                int address_family;
+               GeoIPDBTypes database_type;
                const char *database_info;
                bool verbose;
 
@@ -687,9 +888,11 @@
 }
 
 cmdline::cmdline(int argc, char **argv):
-       csv_file_name("-"),
+       ip_block_csv_file_name("-"),
+       location_csv_file_name(NULL),
        dat_file_name("-"),
        address_family(AF_INET),
+       database_type(GEOIP_COUNTRY_EDITION),
        database_info(NULL),
        verbose(false)
 {
@@ -701,24 +904,28 @@
                { "inet", no_argument, NULL, '4' },
                { "inet6", no_argument, NULL, '6' },
                { "info", required_argument, NULL, 'i' },
+               { "location-csv", required_argument, NULL, 'l' },
                { "output", required_argument, NULL, 'o' },
+               { "type", required_argument, NULL, 't' },
                { "verbose", no_argument, NULL, 'v' },
                { "help", no_argument, NULL, OPT_HELP },
                { NULL, 0, NULL, 0 }
        };
        static const char *const usage = "\
 Usage: %s [OPTION] [CSV-FILE]...\n\
-Convert a country database from CSV to GeoIP binary format.\n\
+Convert a GeoIP database from CSV to GeoIP binary format.\n\
 \n\
-  -4, --inet            addresses are IPv4 (default)\n\
-  -6, --inet6           addresses are IPv6\n\
-  -i, --info=TEXT       add copyright or other info TEXT to output\n\
-  -o, --output=FILE     write the binary data to FILE, not stdout\n\
-  -v, --verbose         show what is going on\n\
-      --help            display this help and exit\n";
+  -4, --inet              set database type to GEOIP_COUNTRY_EDITION, v4 addresses (default)\n\
+  -6, --inet6             set database type to GEOIP_COUNTRY_EDITION_V6, v6 addresses\n\
+  -t, --type=TYPE         set database type explicitly (e.g. to GEOIP_CITY_EDITION_REV1)\n\
+  -i, --info=TEXT         add copyright or other info TEXT to output\n\
+  -l, --location-csv=FILE set location CSV file name (required for GEOIP_CITY_EDITION_REV1)\n\
+  -o, --output=FILE       write the binary data to FILE, not stdout\n\
+  -v, --verbose           show what is going on\n\
+      --help              display this help and exit\n";
 
        for (;;) {
-               int optret = getopt_long(argc, argv, "46i:o:v", long_options, NULL);
+               int optret = getopt_long(argc, argv, "46i:l:o:t:v", long_options, NULL);
 
                if (optret == -1)
                        break;
@@ -736,9 +943,26 @@
                                      "Database info must not be longer than 99 bytes");
                        }
                        break;
+               case 'l':
+                       location_csv_file_name = optarg;
+                       break;
                case 'o':
                        dat_file_name = optarg;
                        break;
+               case 't':
+                       if (!strcmp(optarg, "GEOIP_COUNTRY_EDITION")) {
+                               database_type = GEOIP_COUNTRY_EDITION;
+                       } else if (!strcmp(optarg, "GEOIP_COUNTRY_EDITION_V6")) {
+                               database_type = GEOIP_COUNTRY_EDITION_V6;
+                               address_family = AF_INET6;
+                       } else if (!strcmp(optarg, "GEOIP_CITY_EDITION_REV1")) {
+                               database_type = GEOIP_CITY_EDITION_REV1;
+                       } else {
+                               error(EX_USAGE, 0,
+                                     "Unrecognized database type (we support GEOIP_COUNTRY_EDITION, GEOIP_COUNTRY_EDITION_V6, \
+GEOIP_CITY_EDITION_REV1)");
+                       }
+                       break;
                case 'v':
                        verbose = true;
                        break;
@@ -756,7 +980,12 @@
        }
 
        if (optind < argc)
-               csv_file_name = argv[optind++];
+               ip_block_csv_file_name = argv[optind++];
+
+       if (database_type == GEOIP_CITY_EDITION_REV1 && location_csv_file_name == NULL) {
+               error(EX_USAGE, 0,
+                     "Must specify -l option when type is GEOIP_CITY_EDITION_REV1");
+       }
 
        if (optind < argc) {
                error(EX_USAGE, 0,
@@ -764,26 +993,109 @@
        }
 }
 
-int
-main(int argc, char **argv)
+/*************************************************
+ * Country DB reading and writing
+ *
+ * This section contains code implementing coverting a country .csv
+ * file to a country .dat file.
+ *************************************************/
+
+namespace {
+
+       class country_db_impl : public csv_data_reader
+       {
+       public:
+               binary_trie trie;
+               struct cmdline &cmdline;
+
+               enum {
+                       CSV_FIELD_MIN_TEXT,
+                       CSV_FIELD_MAX_TEXT,
+                       CSV_FIELD_MIN_DECIMAL,
+                       CSV_FIELD_MAX_DECIMAL,
+                       CSV_FIELD_COUNTRY_CODE,
+                       CSV_FIELD_COUNTRY_NAME,
+                       CSV_FIELDS
+               };
+
+               country_db_impl(struct cmdline &cmdline);
+               void convert_db(std::ostream *verbose_stream);
+               void read_csv_line(const char *csv_file_name,
+                                  int csv_line_number,
+                                  std::vector<std::string> &fields);
+       };
+
+}
+
+country_db_impl::country_db_impl(struct cmdline &in_cmdline):
+       cmdline(in_cmdline),
+       trie(0xFFFF00)
+{ }
+
+/** Callback for receiving .csv data (see csv_read_file()) */
+
+void country_db_impl::read_csv_line(const char *csv_file_name,
+                                   int csv_line_number,
+                                   std::vector<std::string> &csv_fields)
 {
-       cmdline cmdline(argc, argv);
+       if (csv_fields.size() != CSV_FIELDS) {
+               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                             "Wrong number of fields");
+       }
 
-       std::ostream *verbose_stream;
-       if (!cmdline.verbose)
-               verbose_stream = NULL;
-       else if (strcmp(cmdline.dat_file_name, "-") == 0)
-               verbose_stream = &std::cerr;
-       else
-               verbose_stream = &std::cout;
+       if (csv_fields[CSV_FIELD_COUNTRY_CODE] == "AN") {
+               csv_fields[CSV_FIELD_COUNTRY_CODE] = "CW";
+       }
+
+       const int countryid = GeoIP_id_by_code(csv_fields[CSV_FIELD_COUNTRY_CODE].c_str());
+       if (countryid == 0) {
+               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                             "Unrecognized country code: %s",
+                             csv_fields[CSV_FIELD_COUNTRY_CODE].c_str());
+       }
+       const binary_trie::edge_type leaf = 0xFFFF00 + countryid;
 
+       union {
+               struct in_addr inet;
+               uint8_t inetbytes[4];
+               struct in6_addr inet6;
+       } minaddr, maxaddr;
+       if (inet_pton(cmdline.address_family, csv_fields[CSV_FIELD_MIN_TEXT].c_str(), &minaddr) <= 0) {
+               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                             "Cannot parse minimum address: %s",
+                             csv_fields[CSV_FIELD_MIN_TEXT].c_str());
+       }
+       if (inet_pton(cmdline.address_family, csv_fields[CSV_FIELD_MAX_TEXT].c_str(), &maxaddr) <= 0) {
+               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                             "Cannot parse maximum address: %s",
+                             csv_fields[CSV_FIELD_MAX_TEXT].c_str());
+       }
+       switch (cmdline.address_family) {
+       case AF_INET:
+               trie.set_range(minaddr.inetbytes, maxaddr.inetbytes,
+                              32, leaf);
+               break;
+       case AF_INET6:
+               trie.set_range(minaddr.inet6.s6_addr, maxaddr.inet6.s6_addr,
+                              128, leaf);
+               break;
+       default:
+               abort();
+       }
+}
+
+/** Convert a country DB from .csv to .dat. Parameters are mainly
+ *  controlled by the cmdline object. verbose_stream is (if non-NULL)
+ *  the stream to write verbose information to. */
+
+void country_db_impl::convert_db(std::ostream *verbose_stream)
+{
        if (verbose_stream) {
                *verbose_stream << program_invocation_name
                                << ": Reading CSV and building the trie"
                                << std::endl;
        }
-       binary_trie trie(0xFFFF00);
-       csv_file_to_trie(trie, cmdline.csv_file_name, cmdline.address_family);
+       csv_read_file(*this, cmdline.ip_block_csv_file_name);
 
        if (verbose_stream) {
                *verbose_stream << program_invocation_name
@@ -796,11 +1108,339 @@
                *verbose_stream << program_invocation_name
                                << ": Writing output" << std::endl;
        }
-       write_dat_file(trie, cmdline.dat_file_name, cmdline.database_info,
-                      cmdline.address_family);
+
+       dat_writer writer(cmdline.dat_file_name, cmdline.database_type);
+       writer.write_trie(trie);
+       if (cmdline.database_info)
+               writer.write_database_info(cmdline.database_info);
+       writer.write_structure_info();
+}
+
+/*************************************************
+ * City DB conversion
+ *
+ * This section implements converting the two .csv files storing city
+ * data to a city .dat file.
+ *************************************************/
+
+namespace {
+
+       /** Implementation for converting a city DB
+        */
+
+       class city_db_impl : public csv_data_reader
+       {
+       public:
+               // Trie mapping IP ranges to location blocks
+               binary_trie trie;
+
+               // Writer for .dat file
+               city_dat_writer writer;
+
+               enum {
+                       STAGE_READING_BLOCKS,
+                       STAGE_READING_LOCATIONS
+               };
+
+               // Which stage of CSV reading we're at (out of above
+               // enum)
+               int which_stage;
+
+               struct cmdline &cmdline;
+
+               enum {
+                       CSV_BLOCK_FIELD_MIN_DECIMAL,
+                       CSV_BLOCK_FIELD_MAX_DECIMAL,
+                       CSV_BLOCK_FIELD_LOC,
+                       CSV_BLOCK_FIELDS
+               };
+
+               enum {
+                       CSV_LOCATION_FIELD_ID,
+                       CSV_LOCATION_FIELD_COUNTRY,
+                       CSV_LOCATION_FIELD_REGION,
+                       CSV_LOCATION_FIELD_CITY,
+                       CSV_LOCATION_FIELD_POSTALCODE,
+                       CSV_LOCATION_FIELD_LATITUDE,
+                       CSV_LOCATION_FIELD_LONGITUDE,
+                       CSV_LOCATION_FIELD_METROCODE,
+                       CSV_LOCATION_FIELD_AREACODE,
+                       CSV_LOCATION_FIELDS
+               };
+
+               city_db_impl(struct cmdline &cmdline);
+               void convert_db(std::ostream *verbose_stream);
+               void read_csv_line(const char *csv_file_name,
+                                  int csv_line_number,
+                                  std::vector<std::string> &fields);
+
+               void read_location_line(const char *csv_file_name,
+                                       int csv_line_number,
+                                       std::vector<std::string> &fields);
+               void read_block_line(const char *csv_file_name,
+                                    int csv_line_number,
+                                    std::vector<std::string> &fields);
+
+               // Check that a token within the "header" of the CSV
+               // files is what we expect it to be, and cause a data
+               // error if not.
+               void check_csv_header_token(std::vector<std::string> &tokens,
+                                           int token_number,
+                                           const char *token_expected,
+                                           const char *csv_file_name,
+                                           int csv_line_number);
+       };
+
+}
+
+city_db_impl::city_db_impl(struct cmdline &in_cmdline):
+       trie(0x1000000), // We use 0x1000000 as the beginning of the
+                        // location information, since we don't know
+                        // the real value and we'll need to remap all
+                        // the offsets later anyway.
+       writer(in_cmdline.dat_file_name, in_cmdline.database_type),
+       cmdline(in_cmdline),
+       which_stage(STAGE_READING_BLOCKS)
+{ }
+
+/** Convert a city DB from .csv to .dat. Parameters are mainly
+ *  controlled by the cmdline object. verbose_stream is (if non-NULL)
+ *  the stream to write verbose information to. */
+
+void
+city_db_impl::convert_db(std::ostream *verbose_stream)
+{
+       // Read the block data from CSV
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Reading block CSV and building the trie"
+                               << std::endl;
+       }
+
+       which_stage = STAGE_READING_BLOCKS;
+       csv_read_file(*this, cmdline.ip_block_csv_file_name);
+
+       if (verbose_stream) {
+               int trie_size = std::distance(trie.nodes_begin(), trie.nodes_end());
+
+               *verbose_stream << program_invocation_name
+                               << ": Done reading blocks, trie size is "
+                               << trie_size
+                               << std::endl;
+       }
+
+       // Read the location data from CSV
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Reading location CSV"
+                               << std::endl;
+       }
+
+       which_stage = STAGE_READING_LOCATIONS;
+       csv_read_file(*this, cmdline.location_csv_file_name);
+
+       // Optimize
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Optimizing" << std::endl;
+       }
+
+       trie.reorder_depth_first();
+       trie.reorder_in_blocks(1024);
+
+       // Finalize offsets
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Linking location and block data" << std::endl;
+       }
+
+       writer.finalize_location_offsets(trie);
+
+       // Write
+       if (verbose_stream) {
+               *verbose_stream << program_invocation_name
+                               << ": Writing output" << std::endl;
+       }
+
+       writer.write_trie(trie);
+       writer.write_locations();
+       if (cmdline.database_info)
+               writer.write_database_info(cmdline.database_info);
+       writer.write_structure_info(trie);
+}
+
+/** Callback for receiving CSV data (see csv_read_file()). We use
+ *  this both for reading the location CSV and the block CSV; which
+ *  stage we're at is indicated by the which_stage variable. */
+
+void city_db_impl::read_csv_line(const char *csv_file_name,
+                                int csv_line_number,
+                                std::vector<std::string> &csv_fields)
+{
+       switch(which_stage) {
+       case STAGE_READING_BLOCKS:
+               read_block_line(csv_file_name, csv_line_number, csv_fields);
+               break;
+       case STAGE_READING_LOCATIONS:
+               read_location_line(csv_file_name, csv_line_number, csv_fields);
+               break;
+       default:
+               error(EX_SOFTWARE, 1, "Invalid which_stage value: %d", which_stage);
+       }
+}
+
+/** Callback for reading one line of the block CSV. */
+
+void city_db_impl::read_block_line(const char *csv_file_name,
+                                  int csv_line_number,
+                                  std::vector<std::string> &csv_fields)
+{
+       if (csv_line_number == 1)
+               return; // Assume this is copyright information and
+                       // skip doing anything to it
+
+       if (csv_fields.size() != CSV_BLOCK_FIELDS) {
+               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                             "Wrong number of fields");
+               return;
+       }
+
+       if (csv_line_number == 2) {
+               // Assume this is header information -- we check it to
+               // make sure we're looking at the right format of file.
+               check_csv_header_token(csv_fields, CSV_BLOCK_FIELD_MIN_DECIMAL, "startIpNum",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_BLOCK_FIELD_MAX_DECIMAL, "endIpNum",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_BLOCK_FIELD_LOC, "locId",
+                                      csv_file_name, csv_line_number);
+
+               // Format checks out, we're now done with this line
+               return;
+       }
+
+       const int locid = atoi(csv_fields[CSV_BLOCK_FIELD_LOC].c_str());
+       const binary_trie::edge_type leaf = 0x1000000 + locid;
+
+       if (cmdline.address_family != AF_INET) {
+               error(EX_SOFTWARE, 1, "IPv6 with city database is unimplemented.");
+       }
+
+       union {
+               struct in_addr inet;
+               uint8_t inetbytes[4];
+       } minaddr, maxaddr;
+
+       if (inet_aton(csv_fields[CSV_BLOCK_FIELD_MIN_DECIMAL].c_str(), &(minaddr.inet)) == 0) {
+               error_at_line(EX_DATAERR, 1, csv_file_name, csv_line_number,
+                             "Invalid min IP address");
+       }
+       if (inet_aton(csv_fields[CSV_BLOCK_FIELD_MAX_DECIMAL].c_str(), &(maxaddr.inet)) == 0) {
+               error_at_line(EX_DATAERR, 1, csv_file_name, csv_line_number,
+                             "Invalid max IP address");
+       }
+
+       trie.set_range(minaddr.inetbytes, maxaddr.inetbytes, 32, leaf);
+}
+
+/** Callback for reading one line of the location CSV. */
+
+void city_db_impl::read_location_line(const char *csv_file_name,
+                                     int csv_line_number,
+                                     std::vector<std::string> &csv_fields)
+{
+       if (csv_line_number == 1)
+               return; // Assume this is copyright information and
+                       // skip it entirely
+
+       if (csv_fields.size() != CSV_LOCATION_FIELDS) {
+               error_at_line(EX_DATAERR, 0, csv_file_name, csv_line_number,
+                             "Wrong number of fields");
+               return;
+       }
+
+       if (csv_line_number == 2) {
+               // Assume this is header information -- we check it to
+               // make sure we're looking at the right format of file.
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_ID, "locId",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_COUNTRY, "country",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_REGION, "region",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_CITY, "city",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_POSTALCODE, "postalCode",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_LATITUDE, "latitude",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_LONGITUDE, "longitude",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_METROCODE, "metroCode",
+                                      csv_file_name, csv_line_number);
+               check_csv_header_token(csv_fields, CSV_LOCATION_FIELD_AREACODE, "areaCode",
+                                      csv_file_name, csv_line_number);
+
+               // Format checks out, we're now done with this line
+               return;
+       }
+
+       writer.serialize_location_info(csv_fields, csv_file_name, csv_line_number);
+}
+
+void city_db_impl::check_csv_header_token(std::vector<std::string> &tokens,
+                                         int token_number,
+                                         const char *token_expected,
+                                         const char *csv_file_name,
+                                         int csv_line_number)
+{
+       if (tokens[token_number] != token_expected) {
+               error_at_line(EX_DATAERR, 1, csv_file_name, csv_line_number,
+                             "Incorrect format: field %d is \"%s\", but we expected \"%s\"",
+                             token_number, tokens[token_number].c_str(), token_expected);
+       }
+}
+
+/*************************************************
+ * Main program
+ *
+ * This is the entry point.
+ *************************************************/
+
+int
+main(int argc, char **argv)
+{
+       cmdline cmdline(argc, argv);
+
+       std::ostream *verbose_stream;
+       if (!cmdline.verbose)
+               verbose_stream = NULL;
+       else if (strcmp(cmdline.dat_file_name, "-") == 0)
+               verbose_stream = &std::cerr;
+       else
+               verbose_stream = &std::cout;
+
+       switch(cmdline.database_type) {
+       case GEOIP_COUNTRY_EDITION:
+       case GEOIP_COUNTRY_EDITION_V6:
+               {
+                       country_db_impl country_db(cmdline);
+                       country_db.convert_db(verbose_stream);
+                       break;
+               }
+
+       case GEOIP_CITY_EDITION_REV1:
+               {
+                       city_db_impl city_db(cmdline);
+                       city_db.convert_db(verbose_stream);
+                       break;
+               }
+       }
 
        if (verbose_stream) {
                *verbose_stream << program_invocation_name
                                << ": All done" << std::endl;
        }
+
+       return 0;
 }




unblock geoip/1.6.2-2

-- System Information:
Debian Release: 7.7
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Reply to:
Follow-Ups:
- Bug#768817: unblock: geoip/1.6.2-2
  - From: "Adam D. Barratt" <adam@adam-barratt.org.uk>
- Processed: Re: Bug#768817: unblock: geoip/1.6.2-2
  - From: owner@bugs.debian.org (Debian Bug Tracking System)
- Bug#768817: marked as done (unblock: geoip/1.6.2-2)
  - From: owner@bugs.debian.org (Debian Bug Tracking System)
Prev by Date: Bug#768814: RM: sshm/0.4.3-2
Next by Date: Bug#768818: unblock: cython/0.21.1-1
Previous by thread: Bug#768814: marked as done (RM: sshm/0.4.3-2)
Next by thread: Bug#768817: unblock: geoip/1.6.2-2
Index(es):
- Date
- Thread