[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

[PATCH] Add DEP-11 / AppStream support to dak



Hello!
It has been a while since we last talked about DEP-11, and I finally
think that it makes sense to talk about merging the necessary code
into DAK.
While there are still some (minor) things I don't like about the dak
patch, in my opinion it doesn't make sense to delay this any further,
especially since there is high demand for it and several people keep
asking me about AppStream / DEP-11 support in Debian ;-)
So, now is the right time to request feedback on this change.

If you don't know about DEP-11 yet, take a look at [1]. The wiki page
is outdated, however: while DEP-11 was initially different from
AppStream[2] in some regards, the goals of both efforts are identical
now (thanks to some upstream work I did).

The DEP-11 code was mainly created by Abhishek Bhattacharjee (CC'ed)
during his Google Summer of Code internship 2014, and polished up by
me.

So, since I don't want to write a huge blob of text, here are some
answers to the questions I assume you will have about this thing in a
FAQ-like style.

== What is this all about? Can you summarize it in a few sentences? ==

AppStream is an effort to enhance the metadata provided about software
components in our FLOSS ecosystem. It consists of a simple XML file
format which describes software and is shipped by upstream, and of
another XML specification for distributions to ship to their users.
AppStream is what powers tools like GNOME-Software and Muon, and it is
also behind shiny new developments like automatic UEFI firmware
updates.
DEP-11 is the Debian implementation of the distribution-specific part
of AppStream.
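As a rough illustration of the upstream side (the tag names follow the
AppStream spec[2], but the component values here are made up), such a
metainfo file can be read with any XML parser:

```python
import xml.etree.ElementTree as ET

# Illustrative upstream AppStream metainfo XML; the tag names follow
# the AppStream spec at Freedesktop, but the values are made up.
metainfo_xml = """\
<component type="desktop">
  <id>org.example.app.desktop</id>
  <name>Example App</name>
  <summary>A made-up application used as an example</summary>
</component>
"""

root = ET.fromstring(metainfo_xml)
print(root.find("id").text)  # org.example.app.desktop
```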


== Isn't AppStream XML-based? I only see YAML in DEP-11. ==

Yes, as you can see in the AppStream spec at Freedesktop[2], AppStream
is entirely XML-based. Debian uses YAML, however: when I initially
talked about its inclusion into Debian with our ftpmasters, I got
generally positive feedback about the idea, but XML was rejected (in a
"never ever" style; I really tried to get XML accepted). Back then,
since the AppStream data was supposed to be cached in a Xapian
database, I also didn't see large problems in using YAML (later, GNOME
wrote libappstream-glib, which doesn't use Xapian).
Today, however, every library out there parsing AppStream
(libappstream, libappstream-qt, libappstream-glib) can read the DEP-11
YAML file format, so shipping it is not an issue at all.
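To illustrate what consumers see (a hand-written sketch, not actual
dak output), DEP-11 data is a stream of YAML documents, a header
followed by one document per component, which any YAML parser can
consume:

```python
import yaml

# Illustrative DEP-11-style stream: a header document followed by one
# document per software component. Field names follow the DEP-11
# draft; the concrete values here are made up.
dep11_stream = """\
---
File: DEP-11
Version: '0.8'
Origin: debian-unstable-main
---
Type: desktop-app
ID: org.example.app.desktop
Package: example-app
Name:
  C: Example App
"""

docs = list(yaml.safe_load_all(dep11_stream))
header, components = docs[0], docs[1:]
print(header["File"])            # DEP-11
print(components[0]["Package"])  # example-app
```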


== Where do I get the python-dep11 module required by the patch? ==

You can get it at Github[3]. I plan to move it to Alioth later.


== Why does the dep11 dak patch require another Python module? ==

Initially, all the parsing & extracting code was in dak. I split it
out because there is interest from other parties who don't use dak for
repository management to use DEP-11, and having a common Python
library for that makes sense.


== Will you provide backports to Jessie for python-dep11? ==

Yes, as soon as the patch for DEP-11 is in dak, I will do whatever the
ftpmasters want to make python-dep11 available easily. ;-)


== I see the new "bin_dep11" table stores a YAML document directly. Why? ==

Storing YAML in the SQL database seems odd at first, but splitting an
AppStream component into the table layout of a database would result
in great complexity, since there are lots of translatable keys, nested
entries (screenshots, provided elements, ...) and also an
infrastructure for error/issue reporting about the metadata. Adding
all of this to projectb seemed insane. With the current solution, we
can easily rebuild the metadata files, and since we don't have to
regenerate the YAML data in each step, this process is also *much*
faster.
In theory, we could use JSON as the storage format, since that is
something Postgres understands natively. But at the time the code was
written, this feature was still under development, and if we stored
JSON, we would still have to convert it back into YAML, which has the
performance issues mentioned above.
So, in summary: storing the YAML data that way is just a pragmatic
choice and a performance optimization.
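To sketch that performance point (with a hypothetical in-memory
stand-in for the bin_dep11 table; the real export lives in
write_component_files in the patch), exporting becomes plain string
concatenation of the stored documents instead of re-serializing
objects:

```python
# Sketch of the pragmatic choice described above: with per-component
# YAML already stored, writing the suite-wide data file is plain
# string concatenation. (The rows below are a hypothetical in-memory
# stand-in for the bin_dep11 table.)
stored_rows = [
    "---\nID: org.example.a.desktop\nPackage: a\n",
    "---\nID: org.example.b.desktop\nPackage: b\n",
]
dep11_header = "---\nFile: DEP-11\nVersion: '0.8'\n"

def write_components_data(header, rows):
    # Each row is already a complete YAML document, so joining suffices.
    return header + "".join(rows)

data = write_components_data(dep11_header, stored_rows)
print(data.count("---"))  # 3 documents in the stream
```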


== How well does the code perform? ==

While it uses quite a lot of memory and CPU due to extracting packages
in parallel, it is very fast: one Debian suite with one architecture
can be processed in less than 15 min (HDD, but a Xeon E3-1231, W-LAN
download speed ~210 kb/s).
That being said, a full reprocessing will not be necessary, since the
code will only handle new packages which have not yet been processed.
So the overall load is expected to be very low. A run without any new
packages takes about 20 sec.


== In which phase should the generate-metadata command be executed? ==

Initially I was going for a weekly update of the data, but I now think
we can update it with every dinstall run.
Dinstall would need to execute the following:
 $ dak generate-metadata --expire --write-hints -s <suite>
(for each suite)
This command will expire existing metadata and refresh the hints file
about broken metadata.
But yeah, how often and when this is executed is up to the ftp-masters.


== What testing did you do with the code? ==

I am running it locally on a suite of the Tanglu Debian derivative,
which is roughly equivalent to the current state of Debian testing.
The code was also test-run in the Tanglu instance of dak.


== What does the data which the code exports look like? Do you have an
example? ==

Yes, sure!
For the screenshots, icons and other stuff the code will
download/extract, take a look at this URL:
http://metadata.tanglu.org/dep11/bartholomea/
The directory also contains the hints files, which list detected
issues with found (or even not-found) metadata.
Tanglu does not use the dak-dep11 code in production yet, since some
Apt changes needed to download it are still missing. So for the time
being, the final DEP-11 data and icons are shipped as a .deb package.
You can find it at
http://packages.tanglu.org/bartholomea-updates/appstream-data
Just extract the package to look at the generated data.


== How does the data reach the user's machine? Did you talk to the Apt team? ==

Yes, but we last discussed this matter a while ago.
The general idea is that Apt downloads the metadata and icons and
places them in the appropriate locations for other tools to pick up.
The behaviour can be controlled by a config snippet, so the metadata
and the icons (the icons package for HiDPI displays is quite large)
can be enabled separately by people who only want one of them.
In general, this feature is expected to be opt-in in Apt.


== Are there other benefits of adding DEP-11 support? ==

Oh yes! This code will greatly improve our QA, since it catches all
kinds of issues in upstream's code and in the packaging. Examples of
the issues it finds (incomplete list):
 * Encoding issues in .desktop files
 * Broken icon links in .desktop files, even across packages
 * Packaging bugs where the AppStream metadata is placed in the wrong package
 * Lots of file-corruption issues
 * Packages shipping only XPM icons for their applications (not good
enough for modern desktop environments)
 * etc.
I think adding this will greatly improve the quality of our packages.
I plan to add code to python-dep11 later which will parse the hints
files generated by dak-dep11 and output nice HTML pages with the
issues found. This data can then be referenced e.g. from the QA pages
or the developer dashboard.
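The planned hints-to-HTML step could look roughly like this (a sketch
only; the 'package'/'hints' field names are my assumption, not the
authoritative python-dep11 hints format):

```python
from html import escape

# Sketch: turn parsed hint entries into a simple HTML issue list.
# The 'package'/'hints' field names are assumptions about the hints
# file layout, not the authoritative python-dep11 format.
def hints_to_html(hint_entries):
    lines = ["<ul>"]
    for entry in hint_entries:
        pkg = escape(entry["package"])
        for hint in entry["hints"]:
            lines.append("  <li><b>%s</b>: %s</li>" % (pkg, escape(hint)))
    lines.append("</ul>")
    return "\n".join(lines)

example = [{"package": "example-app", "hints": ["icon-not-found"]}]
print(hints_to_html(example))
```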


== This is a pretty large amount of code.... Will it be maintained? ==

Yup, I hereby volunteer to maintain the code in the future - so no
worries about that. I am also the upstream maintainer of AppStream, so
I know what's coming up and can adapt the code on Debian's side. As
soon as it is in dak, I also expect to find a few more places to tweak
the code to extract more metadata.
Help is of course appreciated, and any advice from the ftp-masters who
are dealing with dak for a much longer time is very valuable.


== Are there known bugs? ==

None so far :) There are a few annoyances, though; for example, the
current code can export files with two components having the same ID.
This is not allowed by the specification, but it is also something we
shouldn't paper over by "fixing" it in dak. Therefore, I think it's
best solved in the respective broken packages.

Also, the whole icon-finding code is something I don't like much; for
example, it contains hardcoded icon packages. But there is no way
around this, since upstream projects sometimes rely on icons in an
icon theme (KDE, for example, is completely designed that way), so we
have to support it.
All other distributions implementing AppStream have the same set of
workarounds, which we will only be able to phase out over time by
adapting the upstream projects. This is nothing we can enforce now.

So, in summary: no known bugs in dak-dep11/python-dep11, but some ugly
workarounds for issues in upstream projects.


== Can the DEP-11 data be validated? ==

Yes, of course! Check out the dep11-validate.py script at [4]. Some
adaptations will have to be made to it to support the 0.8 spec of
AppStream, but that is something I want to delay until the existing
code has been ACK'ed by the people in charge of dak.
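For reference, a minimal structural check in the spirit of
dep11-validate.py might look like this (a sketch under the assumption
that Type, ID and Package are required fields; the real validator at
[4] is far more thorough):

```python
# A minimal structural check in the spirit of dep11-validate.py.
# The required-field set below is an assumption for illustration.
REQUIRED_FIELDS = ("Type", "ID", "Package")

def missing_fields(component):
    """Return the required fields missing from one component dict."""
    return [f for f in REQUIRED_FIELDS if f not in component]

good = {"Type": "desktop-app", "ID": "org.example.app.desktop",
        "Package": "example-app", "Name": {"C": "Example App"}}
bad = {"Type": "desktop-app"}

print(missing_fields(good))  # []
print(missing_fields(bad))   # ['ID', 'Package']
```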


== Are there any breaking changes to any part of the Debian archive? ==

Eww, no - the DEP-11 stuff is just additional data, and will not break
anything or require any other change. In fact, the whole thing will be
optional and opt-in even for our users.
(Although installing a package like muon, gnome-software or apper
might of course auto-activate the feature.)


== Where is the patch? ==

It's attached, as it is quite long. If desired, I can also send it
inline for review, or push it as one commit to Github / $service for
easy review. Whatever you like.


== I found <issue> in the code ==

I'll fix it as soon as I can :-)

####

And that's it - if I forgot anything, please let me know.

Have a great weekend!
Cheers,
    Matthias

[1]: https://wiki.debian.org/DEP-11
[2]: http://www.freedesktop.org/software/appstream/docs/
[3]: https://github.com/ximion/dep11
[4]: https://github.com/ximion/appstream/tree/master/contrib/dep11

-- 
I welcome VSRE emails. See http://vsre.info/
commit 08a961284af34d6aa25902d4f59893aa06d1b893
Author: Matthias Klumpp <matthias@tenstral.net>
Date:   Sun Mar 8 14:42:06 2015 +0100

    Add support for DEP-11
    
    DEP-11 is Debian's implementation of the AppStream specification.
    To find out more about AppStream, take a look at[1].
    This code requires python-dep11, which is currently available at[2].
    
    [1]: http://www.freedesktop.org/software/appstream/docs/
    [2]: https://github.com/ximion/dep11

diff --git a/config/debian/dak.conf b/config/debian/dak.conf
index d620a86..5dbd88d 100644
--- a/config/debian/dak.conf
+++ b/config/debian/dak.conf
@@ -210,6 +210,8 @@ Dir
   Holding "/srv/ftp-master.debian.org/queue/holding/";
   Done "/srv/ftp-master.debian.org/queue/done/";
   Reject "/srv/ftp-master.debian.org/queue/reject/";
+  MetaInfo "/srv/ftp-master.debian.org/export/metainfo/";
+  MetaInfoHints "/srv/ftp-master.debian.org/export/metainfo/";
 };
 
 Queue-Report
@@ -289,3 +291,18 @@ Command::DM-Admin {
     "309911BEA966D0613053045711B4E5FF15B0FD82"; // mhy
   };
 };
+
+DEP11
+{
+  Url "http://metadata.ftp-master.debian.org/dep11";
+  IconSizes
+  {
+    128x128;
+    64x64;
+  };
+  IconThemePackages
+  {
+    oxygen-icon-theme;
+    gnome-icon-theme;
+  };
+};
diff --git a/dak/dak.py b/dak/dak.py
index 7cb80f4..b84348f 100755
--- a/dak/dak.py
+++ b/dak/dak.py
@@ -151,6 +151,8 @@ def init():
          "Generate a list of override disparities"),
         ("external-overrides",
          "Modify external overrides"),
+        ("generate-metadata",
+         "Extract DEP-11 metadata about components shipped with packages"),
         ]
     return functionality
 
diff --git a/dak/dakdb/update107.py b/dak/dakdb/update107.py
new file mode 100644
index 0000000..9b98259
--- /dev/null
+++ b/dak/dakdb/update107.py
@@ -0,0 +1,69 @@
+#!/usr/bin/env python
+
+"""
+Adds bin_dep11 table. Stores DEP-11 metadata per binary
+"""
+
+# Copyright (C) 2014 Abhishek Bhattacharjee <abhishek.bhattacharjee11@gmail.com>
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+############################################################################
+
+# the script is part of project under Google Summer of Code '14
+# Project: AppStream/DEP-11 for the Debian Archive
+# Mentor: Matthias Klumpp
+
+############################################################################
+
+
+import psycopg2
+from daklib.dak_exceptions import DBUpdateError
+from daklib.config import Config
+from daklib.dbconn import *
+
+statements = [
+    """
+    CREATE TABLE bin_dep11(
+        id SERIAL PRIMARY KEY,
+        binary_id INTEGER NOT NULL,
+        cpt_id TEXT NOT NULL,
+        metadata TEXT NOT NULL,
+        hints TEXT,
+        ignore boolean NOT NULL
+    );
+    """,
+
+    """
+    ALTER TABLE bin_dep11 ADD CONSTRAINT binaries_bin_dep11
+    FOREIGN KEY (binary_id) REFERENCES binaries (id) ON DELETE CASCADE;
+    """
+]
+
+##############################################################################
+
+def do_update(self):
+    print __doc__
+    try:
+        c = self.db.cursor()
+        for stmt in statements:
+            c.execute(stmt)
+
+        c.execute("UPDATE config SET value = '107' WHERE name = 'db_revision'")
+        self.db.commit()
+
+    except psycopg2.ProgrammingError as msg:
+        self.db.rollback()
+        raise DBUpdateError("Unable to apply sick update 107, rollback issued. Error message: {0}".format(msg))
diff --git a/dak/find_metainfo.py b/dak/find_metainfo.py
new file mode 100644
index 0000000..f115256
--- /dev/null
+++ b/dak/find_metainfo.py
@@ -0,0 +1,227 @@
+#!/usr/bin/env python
+
+"""
+Checks binaries with a .desktop file or an AppStream upstream XML file.
+Generates a dict with package name and associated appdata in
+a list as value.
+Finds icons for packages with missing icons.
+"""
+
+# Copyright (c) 2014 Abhishek Bhattacharjee <abhishek.bhattacharjee11@gmail.com>
+# Copyright (c) 2014 Matthias Klumpp <mak@debian.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+
+import os
+import glob
+from shutil import rmtree
+from daklib.dbconn import *
+from daklib.config import Config
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from dep11.component import IconSize
+from dep11.extractor import AbstractIconFinder
+
+class MetaInfoFinder:
+    def __init__(self, session):
+        '''
+        Initialize the variables and create a session.
+        '''
+
+        self._session = session
+
+    def find_meta_files(self, component, suitename):
+        '''
+        Find binaries with .desktop and/or .xml files.
+        '''
+
+        params = {
+            'component': component,
+            'suitename': suitename
+            }
+
+        # SQL logic:
+        # select all the binaries that have a .desktop and xml files
+        # do not repeat processing of deb files that are already processed
+
+        sql = """
+        with
+        req_data as
+        ( select distinct on(b.package) f.filename, c.name, b.id,
+        a.arch_string, b.package
+        from
+        binaries b, bin_associations ba, suite s, files f, override o,
+        component c, architecture a
+        where b.type = 'deb' and b.file = f.id and b.package = o.package
+        and o.component = c.id and c.name = :component and b.id = ba.bin
+        and ba.suite = s.id and s.suite_name = :suitename and
+        b.architecture = a.id order by b.package, b.version desc)
+
+        select bc.file,rd.filename,rd.name,rd.id,rd.arch_string,rd.package
+        from bin_contents bc,req_data rd
+        where (bc.file like 'usr/share/appdata/%.xml' or
+        bc.file like 'usr/share/applications/%.desktop')
+        and bc.binary_id = rd.id and rd.id not in
+        (select binary_id from bin_dep11)
+        """
+
+        result = self._session.query("file", "filename", "name", "id",
+                                     "arch_string", "package")\
+                              .from_statement(sql).params(params)
+
+        # create a dict with packagename:[.desktop and/or .xml files]
+
+        interesting_pkgs = dict()
+        for r in result:
+            fname = '%s/%s' % (r[2], r[1])
+            pkg_name = r[5]
+            arch_name = r[4]
+            if not interesting_pkgs.get(pkg_name):
+                interesting_pkgs[pkg_name] = dict()
+            pkg = interesting_pkgs[pkg_name]
+            if not pkg.get(arch_name):
+                pkg[arch_name] = dict()
+
+            pkg[arch_name]['filename'] = fname
+            pkg[arch_name]['binid'] = r[3]
+            if not pkg[arch_name].get('files'):
+                pkg[arch_name]['files'] = list()
+            ifiles = pkg[arch_name]['files']
+            ifiles.append(r[0])
+
+        return interesting_pkgs
+
+
+###########################################################################
+
+
+class IconFinder(AbstractIconFinder):
+    '''
+    To be used when an icon is not found through the regular method.
+    This class searches for icons in similar packages. Ignores the
+    package with the given binid.
+    '''
+    def __init__(self, suitename, component):
+        self._suite_name = suitename
+        self._component = component
+
+        cnf = Config()
+        self._icon_theme_packages = cnf.value_list('DEP11::IconThemePackages')
+        self._pool_dir = cnf["Dir::Pool"]
+
+        self._allowed_exts = (".png", )  # must be a tuple, not a bare string
+
+    def query_icon(self, size, package, icon, binid):
+        '''
+        function to query icon files from similar packages.
+        Returns path of the icon
+        '''
+
+        # we need our own session, since we use multiprocessing and an icon can be queried
+        # at any time, and even in parallel
+        session = DBConn().session()
+
+        if size:
+            params = {
+                'package': package + '%',
+                'icon': 'usr/share/icons/hicolor/' + size + '/%' + icon + '%',
+                'id': binid,
+                'suitename': self._suite_name,
+                'component': self._component,
+            }
+        else:
+            params = {
+                'package': package + '%',
+                'icon': 'usr/share/pixmaps/' + icon + '%',
+                'id': binid,
+                'suitename': self._suite_name,
+                'component': self._component
+            }
+
+        sql = """ select bc.file, f.filename
+        from
+        binaries b, bin_contents bc, files f,
+        suite s, override o, component c, bin_associations ba
+        where b.package like :package and b.file = f.id
+        and (bc.file like :icon) and
+        (bc.file not like '%.xpm' and bc.file not like '%.tiff')
+        and b.id <> :id and b.id = bc.binary_id
+        and  c.name = :component and c.id = o.component
+        and o.package = b.package and b.id = ba.bin
+        and ba.suite = s.id and s.suite_name = :suitename"""
+
+        result = session.execute(sql, params)
+        rows = result.fetchall()
+
+        if (size) and (size != "scalable") and (not rows):
+            for pkg in self._icon_theme_packages:
+                # See if an icon-theme contains the icon.
+                # Especially KDE software is packaged that way
+                # FIXME: Make the hardcoded package-names a config option
+                params = {
+                    'package': pkg,
+                    'icon': 'usr/share/icons/%/' + size + '/%' + icon + '%',
+                    'id': binid,
+                    'suitename': self._suite_name,
+                    'component': self._component
+                }
+                result = session.execute(sql, params)
+                rows = result.fetchall()
+                if rows:
+                    break
+
+        # we don't need the session anymore beyond this point
+        session.close()
+
+        for r in rows:
+            path = str(r[0])
+            deb_fname = os.path.join(self._pool_dir, self._component, str(r[1]))
+            if path.endswith(icon):
+                return {'icon_fname': path, 'deb_fname': deb_fname}
+            for ext in self._allowed_exts:
+                if path.endswith(icon+ext):
+                    return {'icon_fname': path, 'deb_fname': deb_fname}
+
+        return False
+
+    def get_icons(self, package, icon, sizes, binid):
+        '''
+        Returns the best possible icon available
+        '''
+        size_map_flist = dict()
+
+        for size in sizes:
+            flist = self.query_icon(str(size), package, icon, binid)
+            if (flist):
+                size_map_flist[size] = flist
+
+        if '64x64' not in size_map_flist:
+            # see if we can find a scalable vector graphic as icon
+            # we assume "64x64" as size here, and resize the vector
+            # graphic later.
+            flist = self.query_icon("scalable", package, icon, binid)
+            if (flist):
+                size_map_flist = {'64x64': flist}
+            else:
+                # some software doesn't store icons in sized XDG directories.
+                # catch these here, and assume that the size is 64x64
+                flist = self.query_icon(None, package, icon, binid)
+                if (flist):
+                    size_map_flist = {'64x64': flist}
+
+        return size_map_flist
+
+    def set_allowed_icon_extensions(self, exts):
+        self._allowed_exts = exts
diff --git a/dak/generate_metadata.py b/dak/generate_metadata.py
new file mode 100644
index 0000000..20613d6
--- /dev/null
+++ b/dak/generate_metadata.py
@@ -0,0 +1,366 @@
+#!/usr/bin/env python
+
+"""
+Processes all packages in a given suite to extract interesting metadata
+(mainly AppStream metainfo data). The data will be stored in
+the "bin_dep11" table.
+Additionally, a screenshot cache and tarball of all the icons of packages
+belonging to a given suite will be created.
+"""
+
+# Copyright (c) 2014 Abhishek Bhattacharjee <abhishek.bhattacharjee11@gmail.com>
+# Copyright (c) 2014-2015 Matthias Klumpp <mak@debian.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+import sys
+import tarfile
+import shutil
+import apt_pkg
+import os
+import yaml
+import uuid
+import glob
+
+from find_metainfo import *
+from dep11.extractor import MetadataExtractor
+from dep11.component import DEP11Component, DEP11YamlDumper, get_dep11_header
+
+from daklib import daklog
+from daklib.daksubprocess import call, check_call
+from daklib.filewriter import DEP11DataFileWriter, DEP11HintsFileWriter
+from daklib.config import Config
+from daklib.dbconn import *
+from daklib.dakmultiprocessing import DakProcessPool, PROC_STATUS_SUCCESS, PROC_STATUS_SIGNALRAISED
+
+def usage():
+    print("""Usage: dak generate-metadata -s <suitename> [OPTION]
+Extract DEP-11 metadata for the specified suite.
+
+  -e, --expire       Clear stale data from the icon/screenshot cache.
+  -h, --write-hints  Export YAML documents with issues found while processing the packages.
+    """)
+
+class MetadataPool:
+    '''
+    Keeps a pool of component metadata per arch per component
+    '''
+
+    def __init__(self, values):
+        '''
+        Initialize the metadata pool.
+        '''
+        self._values = values
+        self._mcpts = dict()
+
+    def append_cptdata(self, arch, cptlist):
+        '''
+        Makes a list of all the DEP11Component objects in an arch pool.
+        '''
+        cpts = self._mcpts.get(arch)
+        if not cpts:
+            self._mcpts[arch] = list()
+            cpts = self._mcpts[arch]
+        for c in cptlist:
+            # TODO: Maybe check for duplicates here?
+            # Right now, we can easily filter them out later and complain about it at the maintainer side,
+            # so a hard-check on duplicate ids might not be necessary.
+            cpts.append(c)
+
+    def export(self, session):
+        """
+        Saves metadata in db (serialized to YAML)
+        """
+        for arch, cpts in self._mcpts.items():
+            values = self._values
+            values['architecture'] = arch
+            dep11 = DEP11Metadata(session)
+            for cpt in cpts:
+                # get the metadata in YAML format
+                metadata = cpt.to_yaml_doc()
+                hints_yml = cpt.get_hints_yaml()
+                if not hints_yml:
+                    hints_yml = ""
+
+                # store metadata in database
+                dep11.insert_data(cpt._binid, cpt.cid, metadata, hints_yml, cpt.has_ignore_reason())
+        # commit all changes
+        session.commit()
+
+##############################################################################
+
+def make_icon_tar(suitename, component):
+    '''
+    Creates icons-%(component)_%(size).tar.gz for each component.
+    '''
+    cnf = Config()
+    sizes  = cnf.value_list('DEP11::IconSizes')
+    for size in sizes:
+        icon_location_glob = os.path.join (cnf["Dir::MetaInfo"], suitename,  component, "*", "icons", size, "*.*")
+        tar_location = os.path.join (cnf["Dir::Root"], "dists", suitename, component)
+
+        icon_tar_fname = os.path.join(tar_location, "icons-%s_%s.tar.gz" % (component, size))
+        tar = tarfile.open(icon_tar_fname, "w:gz")
+
+        for filename in glob.glob(icon_location_glob):
+            icon_name = os.path.basename (filename)
+            tar.add(filename,arcname=icon_name)
+
+        tar.close()
+
+def extract_metadata(mde, sn, pkgname, metainfo_files, binid, package_fname, arch):
+    cpts = mde.process(pkgname, package_fname, metainfo_files, binid)
+
+    data = dict()
+    data['arch'] = arch
+    data['cpts'] = cpts
+    data['message'] = "Processed package: %s (%s/%s)" % (pkgname, sn, arch)
+    return (PROC_STATUS_SUCCESS, data)
+
+def process_suite(session, suite, logger, force=False):
+    '''
+    Extract new metadata for a given suite.
+    '''
+    path = Config()["Dir::Pool"]
+
+    if suite.untouchable and not force:
+        import daklib.utils
+        daklib.utils.fubar("Refusing to touch %s (untouchable and not forced)" % suite.suite_name)
+        return
+
+    for component in [ c.component_name for c in suite.components ]:
+        mif = MetaInfoFinder(session)
+        pkglist = mif.find_meta_files(component=component, suitename=suite.suite_name)
+
+        values = {
+            'archive': suite.archive.path,
+            'suite': suite.suite_name,
+            'component': component,
+        }
+
+        pool = DakProcessPool()
+        dpool = MetadataPool(values)
+
+        def parse_results(message):
+            # Split out into (code, msg)
+            code, msg = message
+            if code == PROC_STATUS_SUCCESS:
+                # we abuse the message return value here...
+                logger.log([msg['message']])
+                dpool.append_cptdata(msg['arch'], msg['cpts'])
+            elif code == PROC_STATUS_SIGNALRAISED:
+                logger.log(['E: Subprocess received signal ', msg])
+            else:
+                logger.log(['E: ', msg])
+
+        cnf = Config()
+        iconf = IconFinder(suite.suite_name, component)
+        mde = MetadataExtractor(suite.suite_name, component,
+                        cnf["Dir::MetaInfo"],
+                        cnf["DEP11::Url"],
+                        cnf.value_list('DEP11::IconSizes'),
+                        iconf)
+
+        for pkgname, pkg in pkglist.items():
+            for arch, data in pkg.items():
+                package_fname = os.path.join (path, data['filename'])
+                if not os.path.exists(package_fname):
+                    print('Package not found: %s' % (package_fname))
+                    continue
+                pool.apply_async(extract_metadata,
+                            (mde, suite.suite_name, pkgname, data['files'], data['binid'], package_fname, arch), callback=parse_results)
+        pool.close()
+        pool.join()
+
+        # save new metadata to the database
+        dpool.export(session)
+        make_icon_tar(suite.suite_name, component)
+
+        logger.log(["Completed metadata extraction for suite %s/%s" % (suite.suite_name, component)])
+
+def write_component_files(session, suite, logger):
+    '''
+    Writes the metadata into Component-<arch>.yml.xz
+    Ignores if ignore is True in the db
+    '''
+
+    # SQL to fetch metadata
+    sql = """
+        select distinct bd.metadata
+        from
+        bin_dep11 bd, binaries b, bin_associations ba,
+        override o
+        where bd.ignore = FALSE and bd.binary_id = b.id and b.package = o.package
+        and o.component = :component_id and b.id = ba.bin
+        and ba.suite = :suite_id and b.architecture = :arch_id
+        """
+
+    logger.log(["Writing DEP-11 files for %s" % (suite.suite_name)])
+    for c in suite.components:
+        # writing per <arch>
+        for arch in suite.architectures:
+            if arch.arch_string == "source":
+                continue
+
+            head_string = get_dep11_header(suite.suite_name, c.component_name)
+
+            values = {
+                'archive'  : suite.archive.path,
+                'suite_id' : suite.suite_id,
+                'suite'    : suite.suite_name,
+                'component_id' : c.component_id,
+                'component'    : c.component_name,
+                'arch_id' : arch.arch_id,
+                'arch'    : arch.arch_string
+            }
+
+            writer = DEP11DataFileWriter(**values)
+            ofile = writer.open()
+            ofile.write(head_string)
+
+            result = session.execute(sql, values)
+            for doc in result:
+                ofile.write(doc[0])
+            writer.close()
+
+def write_hints_files(session, suite, logger):
+    '''
+    Writes the DEP-11 hints files (with issues and hints to improve the metadata)
+    into DEP11Hints-<component>_<arch>.yml.gz in Dir::MetaInfoHints.
+    '''
+
+    # SQL to fetch hints
+    sql = """
+        select distinct bd.hints
+        from
+        bin_dep11 bd, binaries b, bin_associations ba,
+        override o
+        where bd.binary_id = b.id and b.package = o.package
+        and o.component = :component_id and b.id = ba.bin
+        and ba.suite = :suite_id and b.architecture = :arch_id
+        """
+
+    logger.log(["Writing DEP-11 hints files for %s" % (suite.suite_name)])
+    for c in suite.components:
+        # writing per arch
+        for arch in suite.architectures:
+            if arch.arch_string == "source":
+                continue
+
+            head_string = get_dep11_header(suite.suite_name, c.component_name)
+
+            values = {
+                'archive'  : suite.archive.path,
+                'suite_id' : suite.suite_id,
+                'suite'    : suite.suite_name,
+                'component_id' : c.component_id,
+                'component'    : c.component_name,
+                'arch_id' : arch.arch_id,
+                'arch'    : arch.arch_string
+            }
+
+            writer = DEP11HintsFileWriter(Config()["Dir::MetaInfoHints"], **values)
+            ofile = writer.open()
+            ofile.write(head_string)
+
+            result = session.execute(sql, values)
+            for doc in result:
+                ofile.write(doc[0])
+            writer.close()
+
+def expire_dep11_data_cache(session, suitename, logger):
+    '''
+    Clears stale cache items per suite.
+    '''
+
+    # list for metadata we want to keep
+    keep = list()
+
+    # select all binary ids together with their package names
+    sql = """select bd.binary_id,b.package
+    from bin_dep11 bd, binaries b
+    where b.id = bd.binary_id"""
+
+    q = session.execute(sql)
+    result = q.fetchall()
+    for r in result:
+        keep.append("%s-%s" % (r[1], r[0]))
+
+    glob_tmpl = "%s/*/*" % (os.path.join(Config()["Dir::MetaInfo"], suitename))
+    for fname in glob.glob(glob_tmpl):
+        if not os.path.basename(fname) in keep:
+            logger.log(["Expiring DEP-11 cache directory: %s" % (fname)])
+            rmtree(fname)
+
+def main():
+    cnf = Config()
+
+    Arguments = [('h',"help","DEP11::Options::Help"),
+                 ('s',"suite","DEP11::Options::Suite", "HasArg"),
+                 ('e',"expire","DEP11::Options::ExpireCache"),
+                 ('w',"write-hints","DEP11::Options::WriteHints"),
+                 ]
+    for i in ["Help", "Suite", "ExpireCache", "WriteHints"]:
+        if not cnf.has_key("DEP11::Options::%s" % (i)):
+            cnf["DEP11::Options::%s" % (i)] = ""
+
+    arguments = apt_pkg.parse_commandline(cnf.Cnf, Arguments, sys.argv)
+    Options = cnf.subtree("DEP11::Options")
+
+    if Options["Help"]:
+        usage()
+        return
+
+    suitename = Options["Suite"]
+    if not suitename:
+        print("You need to specify a suite!")
+        sys.exit(1)
+
+    # check if we have some important config options set
+    if not cnf.has_key("Dir::MetaInfo"):
+        print("You need to specify a metadata export directory (Dir::MetaInfo)")
+        sys.exit(1)
+    if not cnf.has_key("DEP11::Url"):
+        print("You need to specify a metadata public web URL (DEP11::Url)")
+        sys.exit(1)
+    if not cnf.has_key("DEP11::IconSizes"):
+        print("You need to specify a list of allowed icon-sizes (DEP11::IconSizes)")
+        sys.exit(1)
+    if Options["WriteHints"] and not cnf.has_key("Dir::MetaInfoHints"):
+        print("You need to specify an export directory for DEP-11 hints files (Dir::MetaInfoHints)")
+        sys.exit(1)
+
+    logger = daklog.Logger('generate-metadata')
+
+    from daklib.dbconn import Component, DBConn, get_suite, Suite
+    session = DBConn().session()
+    suite = get_suite(suitename.lower(), session)
+    if suite is None:
+        print("Unknown suite '%s'!" % (suitename))
+        sys.exit(1)
+
+    if Options["ExpireCache"]:
+        expire_dep11_data_cache(session, suitename, logger)
+
+    process_suite(session, suite, logger)
+    # export database content as Components-<arch>.xz YAML documents
+    write_component_files(session, suite, logger)
+
+    if Options["WriteHints"]:
+        write_hints_files(session, suite, logger)
+
+    # we're done
+    logger.close()
+
+if __name__ == "__main__":
+    main()
diff --git a/daklib/dbconn.py b/daklib/dbconn.py
index 3fab31b..123760e 100644
--- a/daklib/dbconn.py
+++ b/daklib/dbconn.py
@@ -2568,6 +2568,34 @@ __all__.append('get_version_checks')
 
 ################################################################################
 
+class DEP11Metadata():
+
+    def __init__(self, session):
+        self._session = session
+
+    def insert_data(self, binid, cid, yamldoc, hints, ignore):
+        d = {"bin_id": binid,
+             "cpt_id": cid,
+             "yaml_data": yamldoc,
+             "hints": hints,
+             "ignore": ignore}
+
+        sql = """insert into bin_dep11(binary_id,cpt_id,metadata,hints,ignore)
+        VALUES (:bin_id, :cpt_id, :yaml_data, :hints, :ignore)"""
+        self._session.execute(sql, d)
+
+    def remove_data(self, suitename):
+        sql = """delete from bin_dep11 where binary_id in
+        (select distinct(b.id) from binaries b,override o,suite s
+        where b.package = o.package and o.suite = s.id
+        and s.suite_name= :suitename)"""
+        self._session.execute(sql, {"suitename": suitename})
+        self._session.commit()
+
+__all__.append('DEP11Metadata')
+
+################################################################################
+
 class DBConn(object):
     """
     database module init.
diff --git a/daklib/filewriter.py b/daklib/filewriter.py
index 7db4208..01bd989 100644
--- a/daklib/filewriter.py
+++ b/daklib/filewriter.py
@@ -164,3 +164,29 @@ class TranslationFileWriter(BaseFileWriter):
         flags.update(keywords)
         template = "%(archive)s/dists/%(suite)s/%(component)s/i18n/Translation-%(language)s"
         super(TranslationFileWriter, self).__init__(template, **flags)
+
+class DEP11DataFileWriter(BaseFileWriter):
+    def __init__(self, **keywords):
+        '''
+        The values of the keywords suite, component and arch are strings.
+        Output files are xz compressed only.
+        '''
+        flags = {
+            'compression': ['xz'],
+        }
+        flags.update(keywords)
+        template = "%(archive)s/dists/%(suite)s/%(component)s/Components-%(arch)s.yml"
+        BaseFileWriter.__init__(self, template, **flags)
+
+class DEP11HintsFileWriter(BaseFileWriter):
+    def __init__(self, destination, **keywords):
+        '''
+        The values of the keywords suite, component and arch are strings.
+        Output files are gzip compressed only.
+        '''
+        flags = {
+            'compression': ['gzip'],
+        }
+        flags.update(keywords)
+        template = destination + "/%(suite)s/DEP11Hints-%(component)s_%(arch)s.yml"
+        BaseFileWriter.__init__(self, template, **flags)

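For readers unfamiliar with dak's filewriter pattern: the Components files are produced by expanding a path template from a keyword dict and streaming concatenated YAML documents through a compressor, header document first. A minimal standalone sketch of that idea, independent of dak itself (the paths and YAML fields below are purely illustrative, not dak's actual output):

```python
import lzma
import os
import tempfile

# Path templating as used by DEP11DataFileWriter above; all values hypothetical.
values = {
    'archive': '/srv/mirror',
    'suite': 'unstable',
    'component': 'main',
    'arch': 'amd64',
}
template = "%(archive)s/dists/%(suite)s/%(component)s/Components-%(arch)s.yml"
path = template % values  # -> /srv/mirror/dists/unstable/main/Components-amd64.yml

# Writing pattern: one header YAML document, then one YAML document per
# package, all in a single xz-compressed stream.
out = os.path.join(tempfile.mkdtemp(), "Components-amd64.yml.xz")
with lzma.open(out, "wt") as f:
    f.write("---\nFile: DEP-11\nOrigin: debian-unstable-main\n")
    f.write("---\nID: org.example.app\nType: desktop-app\n")
```

The same pattern applies to the gzip-compressed hints files, only with a different template and compressor.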
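The cache expiry in expire_dep11_data_cache boils down to building a keep-list of "&lt;package&gt;-&lt;binary_id&gt;" directory names from the database and pruning everything else under the per-suite export directory. A self-contained sketch of that pattern (directory and package names here are hypothetical):

```python
import glob
import os
import shutil
import tempfile

# Simulate a per-suite metadata cache with two live entries and one stale one.
root = tempfile.mkdtemp()
for d in ("firefox-101", "gimp-202", "stale-999"):
    os.makedirs(os.path.join(root, d))

# keep-list as derived from bin_dep11/binaries in the patch: "<package>-<binid>"
keep = set("%s-%s" % (pkg, binid) for pkg, binid in [("firefox", 101), ("gimp", 202)])

# prune every cache directory whose name is not in the keep-list
for fname in glob.glob(os.path.join(root, "*")):
    if os.path.basename(fname) not in keep:
        shutil.rmtree(fname)  # expire stale cache directory
```

After the loop only the directories backed by database rows remain, which mirrors what the patch does with rmtree under Dir::MetaInfo.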