
Bug#773294: tracker.debian.org: add support for the vcswatch service



On Wednesday 22 November 2017 at 10:38:04 +0100, Raphael Hertzog wrote:
> Hi Pierre-Elliott,
> 
> On Wed, 22 Nov 2017, Pierre-Elliott Bécue wrote:
> > Please, feel free to review and comment the patch.
> > 
> > It lacks tests for the task, I'll work on that by the end of the week in a
> > fourth patch.
> 
> First of all, I would strongly suggest following the principles of
> test-driven development:
> https://www.obeythetestinggoat.com/
> 
> IOW, you write a failing test first, then you code what's necessary to
> make the test pass and so on until you have all the features that you
> are looking for.
> 
> Adding tests afterwards is hard and you are likely to miss some important
> tests. And it's just not fun to code tests separately when you already
> have something working... it will be hard to stick to it in the long run.
> 
> Then on the compression feature, I have to agree with pabs, we should not
> have to look into the data-flow to find the compression method. I would
> suggest:
> - either the user is explicit (compression="gzip" attribute, or
>   compression=None/False for no compression)
> - or the user relies on compression="auto" (default value) in which
>   case it should just rely on the filename extension that we should pass
>   along in some way.
> 
> In the case of something retrieved through the web, it might be
> interesting to check the Content-Type to infer the compression too:
> $ HEAD https://qa.debian.org/data/vcswatch/vcswatch.json.gz |grep Content-Type
> Content-Type: application/x-gzip
> X-Content-Type-Options: nosniff
> 
> (But it might be overkill since I guess we have a usable extension in 99% of the cases,
> feel free to skip this)
> 
> Also it seems to me that the correct API for generic decompression should
> take a file object as input and provide a file object as output... as
> that's what you want in order to process really big files without reading
> them entirely into memory (should the need arise).
> 
> Now onto the vcswatch code:
> 
> > --- a/distro_tracker/core/panels.py
> > +++ b/distro_tracker/core/panels.py
> > @@ -235,6 +235,12 @@ class GeneralInformationPanel(BasePanel):
> >              # There is no general info for the package
> >              return
> >  
> > +        try:
> > +            vcswatch = PackageExtractedInfo.objects.get(
> > +                package=self.package, key='vcswatch').value
> > +        except PackageExtractedInfo.DoesNotExist:
> > +            vcswatch = {}
> 
> I don't like that we modify a general panel for a feature that is currently
> vendor-specific. I would like to see an intermediate abstraction in the
> form of "vcs_extra_links". It can still be in PackageExtractedInfo, but
> you would not be alone in having the right to create/modify those keys.
> It would be something like this:
> {
> 	'QA': 'https://qa.debian.org/cgi-bin/vcswatch?package=foo',
> 	'Browse': '...',
> }
> 
> And the code would display the links in alphabetical order. (Yes I'm suggesting
> that we might want to move the code that stores the Vcs-Browser link to use
> this new structure)
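With such a mapping, displaying the links boils down to iterating over the sorted items (an illustrative sketch; the Browse URL is a placeholder):

```python
# Hypothetical vcs_extra_links content for a package "foo".
vcs_extra_links = {
    'QA': 'https://qa.debian.org/cgi-bin/vcswatch?package=foo',
    'Browse': 'https://example.org/git/foo.git',  # placeholder URL
}

# sorted() gives the stable, alphabetical display order suggested above.
rendered = ', '.join(
    '<a href="%s">%s</a>' % (url, label)
    for label, url in sorted(vcs_extra_links.items())
)
```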
> 
> Also, if we have to extract two keys out of PackageExtractedInfo, we want
> to do it in a single query to avoid round-trips to the database. The
> package page already generates a very large number of queries.
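With the Django ORM that is a single call such as PackageExtractedInfo.objects.filter(package=self.package, key__in=['general', 'vcs']), collapsed into a dict afterwards. The sketch below simulates the returned rows with stand-in objects (key names as used in the patches; values illustrative):

```python
from types import SimpleNamespace

# Stand-ins for the rows a single
# filter(package=..., key__in=['general', 'vcs']) query would return.
rows = [
    SimpleNamespace(key='general', value={'name': 'dummy'}),
    SimpleNamespace(
        key='vcs',
        value={'watch_url':
               'https://qa.debian.org/cgi-bin/vcswatch?package=dummy'}),
]

# One database round-trip, then dict lookups instead of a second .get().
infos = {info.key: info.value for info in rows}
general = infos.get('general', {})
vcs = infos.get('vcs', {})
```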
> 
> > --- a/distro_tracker/core/templates/core/panels/general.html
> > +++ b/distro_tracker/core/templates/core/panels/general.html
> > @@ -95,7 +95,7 @@
> >         <a href="{{ ctx.vcs.url }}">{{ vcs }}</a>
> >         {% endif %}
> >         {% if ctx.vcs.browser %}
> > -       (<a href="{{ ctx.vcs.browser }}">Browse</a>)
> > +       (<a href="{{ ctx.vcs.browser }}">Browse</a>{% if ctx.vcs.watch %}, <a href="{{ ctx.vcs.watch }}">
> > QA</a>{% endif %})
> >         {% endif %}
> >         {% endwith %}
> 
> You can have vcswatch data even if you don't have any browse link: vcswatch
> works as soon as you have a VCS URL. But here the code would be very
> different (and cleaner) if you implemented the abstraction suggested above.
> 
> > --- /dev/null
> > +++ b/distro_tracker/vendor/debian/templates/debian/vcswatch-action-item.html
> > @@ -0,0 +1,9 @@
> > +{% with description=item.extra_data.description %}
> > +{% with error=item.extra_data.error %}
> > +
> > +<a href="{{item.extra_data.vcswatch_url}}">VCSwatch</a> reports
> > +that {{description}}<br/><br/>
> > +{% if error %}
> > +<span>{{error}}</span>
> > +{% endif %}
> > +{% endwith %}{% endwith %}
> 
> This looks wrong. The long descriptions should really be embedded here
> (with multiple ifs checking the vcswatch status), using whatever data
> they need out of extra_data.
> 
> > +        __out = {}
> 
> Please don't use variable names with leading underscores. A single
> underscore can be useful for private method or function names (or possibly
> for module-level variables), but it's not really useful for local
> variables within methods.
> 
> > +    @staticmethod
> > +    def get_data_checksum(data):
> > +        json_dump = json.dumps(data, sort_keys=True)
> > +        if json_dump is not six.binary_type:
> > +            json_dump = json_dump.encode('UTF-8')
> > +        return hashlib.md5(json_dump).hexdigest()
> 
> This checksum mechanism, used to see whether we have updated data, should
> likely be factored out into a function so that it can be used by other
> objects. To make it possible to store the checksum alongside the data, you
> might want to checksum a copy of the data with the pre-defined key
> "data_checksum" dropped.
> 
> We use Python 3 only now, so it's no longer required to use "six.binary_type".
> But in fact the whole check is not required as I believe that json.dumps will
> always return text (not bytes).
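Indeed, json.dumps() always returns str on Python 3, so the factored-out helper only needs a single encode. A sketch, dropping a reserved "data_checksum" key as suggested here (the fourth patch below ends up naming the key "checksum"):

```python
import hashlib
import json


def get_data_checksum(data):
    """Checksum a dict deterministically, ignoring any checksum already
    stored alongside the data."""
    to_hash = {k: v for k, v in data.items() if k != 'data_checksum'}
    json_dump = json.dumps(to_hash, sort_keys=True)  # always a str
    return hashlib.md5(json_dump.encode('utf-8')).hexdigest()
```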
> 
> > +    def get_vcswatch_data(self):
> > +        url = 'https://qa.debian.org/data/vcswatch/vcswatch.json.gz'
> > +        data = json.loads(get_resource_content(url, compression=True).decode('utf-8'))
> 
> get_resource_content() might return None in case of network issues (which
> will trigger an exception here). Also, the need to decode the output is
> really annoying; we might want to introduce "get_resource_text()" which
> does this for us automatically.
> 
> This is a pretty good job so far, thank you!
> 
> Cheers,

Taking all remarks and suggestions into account, here are 6 patches on top
of master implementing vcswatch support and moving the VCS information from
the general PackageExtractedInfo key to a dedicated vcs PackageExtractedInfo
key.

Remarks welcome.

-- 
PEB
From d8f5579fd131c10a40e81cc864726aa7fc2b426f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pierre-Elliott=20B=C3=A9cue?= <becue@crans.org>
Date: Mon, 27 Nov 2017 17:06:45 +0100
Subject: [PATCH 1/6] Implements a basic compression feature to support
 vcswatch compressed file

---
 distro_tracker/core/tests/tests_utils.py | 97 ++++++++++++++++++++++++++++++++
 distro_tracker/core/utils/compression.py | 77 +++++++++++++++++++++++++
 2 files changed, 174 insertions(+)
 create mode 100644 distro_tracker/core/utils/compression.py

diff --git a/distro_tracker/core/tests/tests_utils.py b/distro_tracker/core/tests/tests_utils.py
index 1b46502..dc79985 100644
--- a/distro_tracker/core/tests/tests_utils.py
+++ b/distro_tracker/core/tests/tests_utils.py
@@ -36,6 +36,8 @@ from distro_tracker.core.utils import now
 from distro_tracker.core.utils import SpaceDelimitedTextField
 from distro_tracker.core.utils import PrettyPrintList
 from distro_tracker.core.utils import verify_signature
+from distro_tracker.core.utils.compression import uncompress_content
+from distro_tracker.core.utils.compression import guess_compression_method
 from distro_tracker.core.utils.packages import AptCache
 from distro_tracker.core.utils.packages import extract_vcs_information
 from distro_tracker.core.utils.packages import extract_dsc_file_name
@@ -1537,3 +1539,98 @@ class UtilsTests(TestCase):
     def test_now(self):
         """Ensure distro_tracker.core.utils.now() exists"""
         self.assertIsInstance(now(), datetime.datetime)
+
+
+class CompressionTests(TestCase):
+    def setUp(self):
+        # Set up a cache directory to use in the tests
+        _handler, self.temporary_bzip2_file = tempfile.mkstemp(suffix='.bz2')
+        os.write(
+            _handler,
+            (
+                b'BZh91AY&SY\x03X\xf5w\x00\x00\x01\x15\x80`\x00\x00@\x06\x04'
+                b'\x90\x80 \x001\x06LA\x03L"\xe0\x8bb\xa3\x9e.\xe4\x8ap\xa1 '
+                b'\x06\xb1\xea\xee'
+            ),
+        )
+        os.close(_handler)
+        _handler, self.temporary_gzip_file = tempfile.mkstemp(suffix='.gz')
+        os.write(
+            _handler,
+            (
+                b"\x1f\x8b\x08\x08\xca\xaa\x14Z\x00\x03helloworld\x00\xf3H"
+                b"\xcd\xc9\xc9W(\xcf/\xcaIQ\x04\x00\x95\x19\x85\x1b\x0c\x00"
+                b"\x00\x00"
+            ),
+        )
+        os.close(_handler)
+        _handler, self.temporary_xz_file = tempfile.mkstemp(suffix='.xz')
+        os.write(
+            _handler,
+            (
+                b"\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00"
+                b"\x00t/\xe5\xa3\x01\x00\x0bHello world!\x00\nc\xd6\xf3\xf6"
+                b"\x80[\xd3\x00\x01$\x0c\xa6\x18\xd8\xd8\x1f\xb6\xf3}\x01\x00"
+                b"\x00\x00\x00\x04YZ"
+            ),
+        )
+        os.close(_handler)
+        _handler, self.temporary_plain_file = tempfile.mkstemp()
+        os.write(_handler, b"Hello world!")
+        os.close(_handler)
+
+    def tearDown(self):
+        os.unlink(self.temporary_bzip2_file)
+        os.unlink(self.temporary_gzip_file)
+        os.unlink(self.temporary_xz_file)
+        os.unlink(self.temporary_plain_file)
+
+    def get_uncompressed_text(self, file_path, compression):
+        """Calls the uncompress function and performs the work shared by
+        each subtest."""
+
+        handler = open(file_path, 'rb')
+        handler = uncompress_content(handler, compression)
+        return handler.read().decode('ascii')
+
+    def test_bzip2_file(self):
+        """Tests the decompression of a bzip2 file"""
+        output = self.get_uncompressed_text(
+            self.temporary_bzip2_file, compression="bzip2")
+        self.assertEqual(output, "Hello world!")
+
+    def test_gzip_file(self):
+        """Tests the decompression of a gzip file"""
+        output = self.get_uncompressed_text(
+            self.temporary_gzip_file, compression="gzip")
+        self.assertEqual(output, "Hello world!")
+
+    def test_xz_file(self):
+        """Tests the decompression of a lzma-xz file"""
+        output = self.get_uncompressed_text(
+            self.temporary_xz_file, compression="xzip")
+        self.assertEqual(output, "Hello world!")
+
+    def test_no_compression_file(self):
+        """Tests if a plain file is correctly decompressed."""
+        output = self.get_uncompressed_text(
+            self.temporary_plain_file, compression="plain")
+        self.assertEqual(output, "Hello world!")
+
+    def test_compression_guess(self):
+        """As the compression is given explicitely in the previous tests
+        because tempfiles have no extension, this test checks if the
+        guess_compression_method function in compression utils works fine.
+
+        """
+
+        for (ext, method) in [
+                ("gz", "gzip"),
+                ("bz2", "bzip2"),
+                ("xz", "xzip"),
+                ("txt", "plain"),
+        ]:
+            filename = "%s.%s" % ("test", ext)
+            self.assertEqual(
+                guess_compression_method(filename),
+                method)
diff --git a/distro_tracker/core/utils/compression.py b/distro_tracker/core/utils/compression.py
new file mode 100644
index 0000000..e7d6db4
--- /dev/null
+++ b/distro_tracker/core/utils/compression.py
@@ -0,0 +1,77 @@
+# Copyright 2013 The Distro Tracker Developers
+# See the COPYRIGHT file at the top-level directory of this distribution and
+# at https://deb.li/DTAuthors
+#
+# This file is part of Distro Tracker. It is subject to the license terms
+# in the LICENSE file found in the top-level directory of this
+# distribution and at https://deb.li/DTLicense. No part of Distro Tracker,
+# including this file, may be copied, modified, propagated, or distributed
+# except according to the terms contained in the LICENSE file.
+"""
+Utilities for handling compression
+"""
+
+import os
+
+
+def guess_compression_method(filepath):
+    """Given filepath, tries to determine the compression of the file."""
+
+    filepath = filepath.lower()
+
+    extensions_to_method = {
+        "gz": "gzip",
+        "bz2": "bzip2",
+        "xz": "xzip",
+        "txt": "plain",
+    }
+
+    for (ext, method) in extensions_to_method.items():
+        if filepath.endswith("." + ext):
+            return method
+
+    return "plain"
+
+
+def uncompress_content(file_handle, compression_method="auto"):
+    """Receiving a file_handle, guess if it's compressed and then
+    *CLOSES* it. Returns a new uncompressed handle.
+
+    :param compression: The compression type. If not `auto`, then
+    do not guess the compression and assume it's what referenced.
+    :type compression: str
+
+    """
+
+    # A file_handle being provided, let's extract an absolute path.
+    file_path = os.path.abspath(file_handle.name)
+    if compression_method == "auto":
+        # Guess the compression method from file_path
+        compression_method = guess_compression_method(file_path)
+    elif compression_method is None:
+        compression_method = "plain"
+
+    # Dispatch to the appropriate opener for the compression method.
+
+    def _open(fobj, mode):
+        """Proxy open function depending on the compression method"""
+        if compression_method == "gzip":
+            import gzip
+            return gzip.open(filename=fobj, mode=mode)
+        elif compression_method == "bzip2":
+            import bz2
+            return bz2.open(filename=fobj, mode=mode)
+        elif compression_method == "xzip":
+            import lzma
+            return lzma.open(filename=fobj, mode=mode)
+        elif compression_method == "plain":
+            return fobj
+        else:
+            raise NotImplementedError(
+                (
+                    "The compression method %r is not known or not yet "
+                    "implemented."
+                ) % (compression_method,)
+            )
+
+    return _open(file_handle, "rb")
-- 
2.11.0

From 43ded90eb11035828cb0404a20fad3f293a7247a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pierre-Elliott=20B=C3=A9cue?= <becue@crans.org>
Date: Mon, 27 Nov 2017 17:08:56 +0100
Subject: [PATCH 2/6] Implements compression utility into http utils

---
 distro_tracker/core/utils/http.py | 40 ++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/distro_tracker/core/utils/http.py b/distro_tracker/core/utils/http.py
index 3efe371..05d0ee7 100644
--- a/distro_tracker/core/utils/http.py
+++ b/distro_tracker/core/utils/http.py
@@ -21,6 +21,8 @@ import json
 from requests.structures import CaseInsensitiveDict
 import requests
 
+from .compression import guess_compression_method, uncompress_content
+
 
 def parse_cache_control_header(header):
     """
@@ -87,15 +89,31 @@ class HttpCache(object):
         # If there is no cache freshness date consider the item expired
         return True
 
-    def get_content(self, url):
-        """
-        Returns the content of the cached response for the given URL.
+    def get_content(self, url, compression="auto"):
+        """Returns the content of the cached response for the given URL.
+
+        If the file is compressed, uncompress it; otherwise, treat it
+        as a plain file.
+
+        :param compression: Specifies the compression method used to generate
+            the resource, and thus the compression method one should use to
+            decompress it.
+        :type compression: str
 
         :rtype: :class:`bytes`
+
         """
         if url in self:
-            with open(self._content_cache_file_path(url), 'rb') as content_file:
-                return content_file.read()
+            file_handle = None
+
+            if compression == "auto":
+                compression = guess_compression_method(url)
+
+            with open(self._content_cache_file_path(url), 'rb') as temp_file:
+                file_handle = uncompress_content(temp_file, compression)
+                content = file_handle.read()
+
+            return content
 
     def get_headers(self, url):
         """
@@ -169,9 +187,8 @@ class HttpCache(object):
         return md5(url.encode('utf-8')).hexdigest()
 
 
-def get_resource_content(url, cache=None):
-    """
-    A helper function which returns the content of the resource found at the
+def get_resource_content(url, cache=None, compression="auto"):
+    """A helper function which returns the content of the resource found at the
     given URL.
 
     If the resource is already cached in the ``cache`` object and the cached
@@ -187,9 +204,14 @@ def get_resource_content(url, cache=None):
         ``DISTRO_TRACKER_CACHE_DIRECTORY`` cache directory
         is used.
     :type cache: :class:`HttpCache` or an object with an equivalent interface
+    :param compression: Specifies the compression method used to generate the
+        resource, and thus the compression method one should use to decompress
+        it. If auto, then guess it from the url file extension.
+    :type compression: str
 
     :returns: The bytes representation of the resource found at the given url
     :rtype: bytes
+
     """
     if cache is None:
         cache_directory_path = settings.DISTRO_TRACKER_CACHE_DIRECTORY
@@ -198,6 +220,6 @@ def get_resource_content(url, cache=None):
     try:
         if cache.is_expired(url):
             cache.update(url)
-        return cache.get_content(url)
+        return cache.get_content(url, compression)
     except:
         pass
-- 
2.11.0

From 3fa2d3bcb4d551727b97c01567aa6f464ab7fd23 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pierre-Elliott=20B=C3=A9cue?= <becue@crans.org>
Date: Mon, 27 Nov 2017 17:09:18 +0100
Subject: [PATCH 3/6] Implements a get_resource_text

 It's a simple decoding wrapper around get_resource_content
---
 distro_tracker/core/utils/http.py | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/distro_tracker/core/utils/http.py b/distro_tracker/core/utils/http.py
index 05d0ee7..a6479da 100644
--- a/distro_tracker/core/utils/http.py
+++ b/distro_tracker/core/utils/http.py
@@ -223,3 +223,39 @@ def get_resource_content(url, cache=None, compression="auto"):
         return cache.get_content(url, compression)
     except:
         pass
+
+
+def get_resource_text(url, cache=None, encoding="utf-8", compression="auto"):
+    """A helper function which returns the text from a resource found at
+    the given URL.
+
+    If the resource is already cached in the ``cache`` object and the cached
+    content has not expired, the function will not do any HTTP requests and
+    will return the cached content.
+
+    If the resource is stale or not cached at all, it is retrieved from the Web.
+
+    :param url: The URL of the resource to be retrieved
+    :param cache: A cache object which should be used to look up and store
+        the cached resource. If it is not provided, an instance of
+        :class:`HttpCache` with a
+        ``DISTRO_TRACKER_CACHE_DIRECTORY`` cache directory
+        is used.
+    :type cache: :class:`HttpCache` or an object with an equivalent interface
+    :param encoding: Specifies an encoding to decode the resource_content.
+    :type encoding: str
+    :param compression: Specifies the compression method used to generate the
+        resource, and thus the compression method one should use to decompress
+        it.
+    :type compression: str
+
+    :returns: The text representation of the resource found at the given url
+    :rtype: str
+
+    """
+
+    content = get_resource_content(url, cache, compression)
+
+    # If content is None, there's nothing to do.
+    if content is not None:
+        return content.decode(encoding)
-- 
2.11.0

From ba7aabd1c9508690ff3a71bb2ab0b5b1ebb85664 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pierre-Elliott=20B=C3=A9cue?= <becue@crans.org>
Date: Mon, 27 Nov 2017 17:11:01 +0100
Subject: [PATCH 4/6] Adds a generic get_data_checksum util, and removes it
 from tracker_tasks

 With tests updates, of course.
---
 distro_tracker/core/tests/tests_utils.py      |  6 ++++++
 distro_tracker/core/utils/misc.py             | 27 +++++++++++++++++++++++++++
 distro_tracker/vendor/debian/tests.py         |  4 ----
 distro_tracker/vendor/debian/tracker_tasks.py | 11 +++--------
 4 files changed, 36 insertions(+), 12 deletions(-)
 create mode 100644 distro_tracker/core/utils/misc.py

diff --git a/distro_tracker/core/tests/tests_utils.py b/distro_tracker/core/tests/tests_utils.py
index dc79985..8df0f52 100644
--- a/distro_tracker/core/tests/tests_utils.py
+++ b/distro_tracker/core/tests/tests_utils.py
@@ -59,6 +59,7 @@ from distro_tracker.core.utils.http import get_resource_content
 from distro_tracker.test import TestCase, SimpleTestCase
 from distro_tracker.test.utils import set_mock_response
 from distro_tracker.test.utils import make_temp_directory
+from distro_tracker.core.utils.misc import get_data_checksum
 
 
 class VerpModuleTest(SimpleTestCase):
@@ -1540,6 +1541,11 @@ class UtilsTests(TestCase):
         """Ensure distro_tracker.core.utils.now() exists"""
         self.assertIsInstance(now(), datetime.datetime)
 
+    def test_get_data_checksum(self):
+        """Ensures get_data_checksum behaves as expected."""
+        checksum = get_data_checksum({})
+        self.assertEqual(checksum, '99914b932bd37a50b983c5e7c90ae93b')
+
 
 class CompressionTests(TestCase):
     def setUp(self):
diff --git a/distro_tracker/core/utils/misc.py b/distro_tracker/core/utils/misc.py
new file mode 100644
index 0000000..1e12f74
--- /dev/null
+++ b/distro_tracker/core/utils/misc.py
@@ -0,0 +1,27 @@
+# Copyright 2013 The Distro Tracker Developers
+# See the COPYRIGHT file at the top-level directory of this distribution and
+# at https://deb.li/DTAuthors
+#
+# This file is part of Distro Tracker. It is subject to the license terms
+# in the LICENSE file found in the top-level directory of this
+# distribution and at https://deb.li/DTLicense. No part of Distro Tracker,
+# including this file, may be copied, modified, propagated, or distributed
+# except according to the terms contained in the LICENSE file.
+"""
+Miscellaneous utilities that don't require their own python module.
+"""
+
+import json
+import hashlib
+
+
+def get_data_checksum(data):
+    """Checksums a dict, without its prospective 'checksum' key/value."""
+    to_hash = dict(data)
+
+    # Checksum has to be done without taking the checksum
+    # into account.
+    if 'checksum' in to_hash:
+        to_hash.pop('checksum')
+    json_dump = json.dumps(to_hash, sort_keys=True)
+    return hashlib.md5(json_dump.encode('utf-8', 'ignore')).hexdigest()
diff --git a/distro_tracker/vendor/debian/tests.py b/distro_tracker/vendor/debian/tests.py
index 4558aa1..a060234 100644
--- a/distro_tracker/vendor/debian/tests.py
+++ b/distro_tracker/vendor/debian/tests.py
@@ -2398,10 +2398,6 @@ class UpdateSecurityIssuesTaskTests(TestCase):
         self.assertTrue(stats['dummy-package']['jessie']['open'], 2)
         self.assertTrue(stats['dummy-package']['jessie']['nodsa'], 1)
 
-    def test_get_data_checksum(self):
-        checksum = self.task.get_data_checksum({})
-        self.assertEqual(checksum, '99914b932bd37a50b983c5e7c90ae93b')
-
     def test_execute_create_data(self):
         self.mock_json_data('open')
         self.run_task()
diff --git a/distro_tracker/vendor/debian/tracker_tasks.py b/distro_tracker/vendor/debian/tracker_tasks.py
index 0a550a6..541db22 100644
--- a/distro_tracker/vendor/debian/tracker_tasks.py
+++ b/distro_tracker/vendor/debian/tracker_tasks.py
@@ -34,6 +34,7 @@ from distro_tracker.vendor.debian.models import PackageExcuses
 from distro_tracker.vendor.debian.models import UbuntuPackage
 from distro_tracker.core.utils.http import HttpCache
 from distro_tracker.core.utils.http import get_resource_content
+from distro_tracker.core.utils.misc import get_data_checksum
 from distro_tracker.core.utils.packages import package_hashdir
 from .models import DebianContributor
 from distro_tracker import vendor
@@ -43,7 +44,6 @@ import io
 import os
 import re
 import json
-import hashlib
 import itertools
 
 from debian import deb822
@@ -1312,11 +1312,6 @@ class UpdateSecurityIssuesTask(BaseTask):
             stats[pkg] = cls.get_issues_summary(issues)
         return stats
 
-    @staticmethod
-    def get_data_checksum(data):
-        json_dump = json.dumps(data, sort_keys=True).encode('utf-8')
-        return hashlib.md5(json_dump).hexdigest()
-
     def _get_short_description(self, key, action_item):
         count = action_item.extra_data['security_issues_count']
         url = 'https://security-tracker.debian.org/tracker/source-package/{}'
@@ -1354,7 +1349,7 @@ class UpdateSecurityIssuesTask(BaseTask):
         return {
             'details': issues,
             'stats': cls.get_issues_summary(issues),
-            'checksum': cls.get_data_checksum(issues)
+            'checksum': get_data_checksum(issues)
         }
 
     def process_pkg_action_items(self, pkgdata, existing_action_items):
@@ -1410,7 +1405,7 @@ class UpdateSecurityIssuesTask(BaseTask):
         for pkgname, issues in content.items():
             if pkgname in all_data:
                 # Check if we need to update the existing data
-                checksum = self.get_data_checksum(issues)
+                checksum = get_data_checksum(issues)
                 if all_data[pkgname].value.get('checksum', '') == checksum:
                     continue
                 # Update the data
-- 
2.11.0

From f3a362196420565731810b4865b3ebf5c9ae6e62 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pierre-Elliott=20B=C3=A9cue?= <becue@crans.org>
Date: Mon, 27 Nov 2017 17:13:12 +0100
Subject: [PATCH 5/6] Adds support for vcswatch QA service

 Closes: #773294
---
 .../templates/debian/vcswatch-action-item.html     |  28 ++
 distro_tracker/vendor/debian/tests.py              | 213 ++++++++++++
 distro_tracker/vendor/debian/tracker_tasks.py      | 371 +++++++++++++++++++++
 3 files changed, 612 insertions(+)
 create mode 100644 distro_tracker/vendor/debian/templates/debian/vcswatch-action-item.html

diff --git a/distro_tracker/vendor/debian/templates/debian/vcswatch-action-item.html b/distro_tracker/vendor/debian/templates/debian/vcswatch-action-item.html
new file mode 100644
index 0000000..0fd203c
--- /dev/null
+++ b/distro_tracker/vendor/debian/templates/debian/vcswatch-action-item.html
@@ -0,0 +1,28 @@
+{% with name=item.extra_data.name %}
+{% with status=item.extra_data.status %}
+{% with error=item.extra_data.error %}
+
+<a href="{{item.extra_data.vcswatch_url}}">vcswatch</a> reports that
+{% if status == "NEW" %}
+this package has a new version ready in the VCS. You should consider uploading
+into the archive.
+{% elif status == "COMMITS" %}
+this package seems to have new commits in its VCS. You should consider updating
+the debian/changelog and uploading this new version into the archive.
+{% elif status == "OLD" %}
+the current version of the package is NOT in its VCS. You should push your
+commits immediately.
+{% elif status == "UNREL" %}
+this package has been uploaded into the archive but the debian/changelog in the
+VCS is still UNRELEASED.  You should consider updating the VCS.
+{% elif status == "ERROR" %}
+there is an error with this package's VCS, or the debian/changelog file inside
+it. Either you should create the VCS or you should fix whatever issue there is.
+{% elif status == "DEFAULT" %}
+a new type of VCS status has been added. Please <a href="mailto:submit@bugs.debian.org?Subject=tracker.debian.org%3A%20{{name}}%3A%20new%20vcswatch%20status%3A%20{{status}}&Body=Package%3A%20tracker.debian.org%0AUser%3A%20tracker.debian.org@packages.debian.org%0AUsertags%3A%20vcswatch%0A%0AThe%20vcswatch%20status%20for%20{{name}}%20is%20{{status}}%20and%0Athis%20is%20not%20known%20by%20the%20tracker%20website%3A%0Ahttps%3A//tracker.debian.org/pkg/{{name}}";>report</a> a bug about this so
+that the maintainers can describe it.
+{% endif %}<br/><br/>
+{% if error %}
+<span>{{error}}</span>
+{% endif %}
+{% endwith %}{% endwith %}{% endwith %}
diff --git a/distro_tracker/vendor/debian/tests.py b/distro_tracker/vendor/debian/tests.py
index a060234..ed78a26 100644
--- a/distro_tracker/vendor/debian/tests.py
+++ b/distro_tracker/vendor/debian/tests.py
@@ -69,6 +69,7 @@ from distro_tracker.vendor.debian.tracker_tasks \
     import UpdatePackageScreenshotsTask
 from distro_tracker.vendor.debian.tracker_tasks \
     import UpdateBuildReproducibilityTask
+from distro_tracker.vendor.debian.tracker_tasks import UpdateVcsWatchTask
 from distro_tracker.vendor.debian.models import DebianContributor
 from distro_tracker.vendor.debian.models import UbuntuPackage
 from distro_tracker.vendor.debian.tracker_tasks import UpdateLintianStatsTask
@@ -5081,3 +5082,215 @@ class UpdateBuildReproducibilityTaskTest(TestCase):
         info = self.dummy_package.packageextractedinfo_set.get(key='general')
 
         self.assertEqual(info.value['name'], 'dummy')
+
+
+@mock.patch('distro_tracker.core.utils.http.requests')
+class UpdateVcsWatchTaskTest(TestCase):
+    """
+    Tests for the :class:`distro_tracker.vendor.debian.tracker_tasks.
+    UpdateVcsWatchTask` task.
+    """
+    def setUp(self):
+        self.json_data = """[{
+ "commits": "46",
+ "package": "dummy",
+ "error": null,
+ "status": "COMMITS"}]"""
+        self.other_json_data = """[]"""
+        self.dummy_package = SourcePackageName.objects.create(name='dummy')
+        self.other_dummy_package = SourcePackageName.objects.create(
+            name='other-dummy')
+
+        # Useful for last test
+        PackageExtractedInfo.objects.create(
+            package=self.dummy_package,
+            key='general',
+            value={
+                'name': 'dummy',
+                'maintainer': {
+                    'email': 'jane@example.com',
+                }
+            }
+        )
+
+    def run_task(self):
+        """
+        Runs the vcswatch status update task.
+        """
+        task = UpdateVcsWatchTask()
+        # Hijacks the url to avoid messy decompression.
+        task.VCSWATCH_DATA_URL = (
+            'https://qa.debian.org/data/vcswatch/vcswatch.json'
+        )
+        task.execute()
+
+    def test_extractedinfo_without_vcswatch(self, mock_requests):
+        """
+        Tests that packages without vcswatch info don't claim to have
+        them.
+        """
+        set_mock_response(mock_requests, text=self.json_data)
+
+        self.run_task()
+
+        with self.assertRaises(PackageExtractedInfo.DoesNotExist):
+            self.other_dummy_package.packageextractedinfo_set.get(key='vcs')
+
+    def test_no_extractedinfo_for_unknown_package(self, mock_requests):
+        """
+        Tests that UpdateVcsWatchTask doesn't fail with an unknown
+        package.
+        """
+        set_mock_response(mock_requests, text="[]")
+
+        self.run_task()
+
+        count = PackageExtractedInfo.objects.filter(
+            key='vcs').count()
+        self.assertEqual(0, count)
+
+    def test_extractedinfo_with_vcswatch(self, mock_requests):
+        """
+        Tests that PackageExtractedInfo for a package with vcswatch info
+        is correct.
+        """
+        set_mock_response(mock_requests, text=self.json_data)
+
+        self.run_task()
+
+        theoretical_extra_data = {
+            'name': 'dummy',
+            'status': 'COMMITS',
+            'error': None,
+            'vcswatch_url': (
+                'https://qa.debian.org/cgi-bin/vcswatch?package=dummy'
+            ),
+            'commits': 46,
+        }
+        theoretical_package_info = {
+            "checksum": '1b2cab38521286a355f8394ff4a1adfc',
+            "watch_url": (
+                'https://qa.debian.org/cgi-bin/vcswatch?package=dummy'
+            ),
+        }
+
+        info = self.dummy_package.packageextractedinfo_set.get(
+            key='vcs')
+
+        for key in ['checksum', 'watch_url']:
+            self.assertEqual(info.value[key], theoretical_package_info[key])
+        action_items = self.dummy_package.action_items
+        self.assertEqual(action_items.count(), 1)
+        action_item = action_items.first()
+        self.assertEqual(action_item.item_type.type_name,
+                         UpdateVcsWatchTask.ACTION_ITEM_TYPE_NAME)
+        for key in ['name', 'status', 'error', 'commits', 'vcswatch_url']:
+            self.assertEqual(
+                action_item.extra_data[key],
+                theoretical_extra_data[key])
+
+    def test_extractedinfo_is_updated_if_needed(self, mock_requests):
+        """
+        Tests that PackageExtractedInfo is updated if vcswatch_url changes.
+        """
+        set_mock_response(mock_requests, text=self.json_data)
+        self.run_task()
+
+        # Alters the info so that it's not destroyed when we
+        # remove vcswatch data.
+        dummy_pi = self.dummy_package.packageextractedinfo_set.get(
+            key='vcs')
+        dummy_pi.value['test_useless_entry'] = True
+
+        # No need to change the checksum as we test a case where it's
+        # re-computed.
+        dummy_pi.save()
+
+        # Now it should be good.
+        set_mock_response(mock_requests, text=self.other_json_data)
+        self.run_task()
+
+        # The package should no longer have a watch_url entry.
+        dummy_pi = self.dummy_package.packageextractedinfo_set.get(
+            key='vcs')
+        self.assertNotIn('watch_url', dummy_pi.value)
+
+        # Re-adding the vcswatch data exercises the update path.
+        set_mock_response(mock_requests, text=self.json_data)
+        self.run_task()
+
+        dummy_pi = self.dummy_package.packageextractedinfo_set.get(
+            key='vcs').value
+        self.assertIn('watch_url', dummy_pi)
+        self.assertEqual(
+            dummy_pi['watch_url'],
+            'https://qa.debian.org/cgi-bin/vcswatch?package=dummy')
+
+    def test_extractedinfo_is_dropped_when_data_is_gone(self, mock_requests):
+        """
+        Tests that PackageExtractedInfo is dropped if vcswatch info
+        goes away.
+        """
+        set_mock_response(mock_requests, text=self.json_data)
+        self.run_task()
+
+        set_mock_response(mock_requests, text=self.other_json_data)
+        self.run_task()
+
+        with self.assertRaises(PackageExtractedInfo.DoesNotExist):
+            self.dummy_package.packageextractedinfo_set.get(
+                key='vcs')
+
+    def test_action_item_is_dropped_when_status_is_ok(self,
+                                                      mock_requests):
+        """
+        Ensure the action item is dropped when status switches from
+        not "OK" to "OK".
+        """
+        set_mock_response(mock_requests, text=self.json_data)
+        self.run_task()
+        self.assertEqual(self.dummy_package.action_items.count(), 1)
+        json_data = """[{
+ "commits": "46",
+ "package": "dummy",
+ "error": null,
+ "status": "OK"}]"""
+        set_mock_response(mock_requests, text=json_data)
+        self.run_task()
+
+        self.assertEqual(self.dummy_package.action_items.count(), 0)
+
+    def test_action_item_is_updated_when_extra_data_changes(self,
+                                                            mock_requests):
+        """
+        Ensures that the action item is updated when extra_data changes.
+        """
+        set_mock_response(mock_requests, text=self.json_data)
+        self.run_task()
+        self.assertEqual(self.dummy_package.action_items.count(), 1)
+        json_data = """[{
+ "commits": "47",
+ "package": "dummy",
+ "error": null,
+ "status": "COMMITS"}]"""
+        set_mock_response(mock_requests, text=json_data)
+        self.run_task()
+
+        ai = self.dummy_package.action_items.first()
+        extra_data = ai.extra_data
+        self.assertEqual(extra_data['commits'], 47)
+
+    def test_other_extractedinfo_keys_not_dropped(self, mock_requests):
+        """
+        Ensure that other PackageExtractedInfo keys are not dropped when
+        deleting the vcs key.
+        """
+        set_mock_response(mock_requests, text=self.json_data)
+        self.run_task()
+
+        set_mock_response(mock_requests, text=self.other_json_data)
+        self.run_task()
+
+        info = self.dummy_package.packageextractedinfo_set.get(key='general')
+
+        self.assertEqual(info.value['name'], 'dummy')
diff --git a/distro_tracker/vendor/debian/tracker_tasks.py b/distro_tracker/vendor/debian/tracker_tasks.py
index 541db22..17d5b0f 100644
--- a/distro_tracker/vendor/debian/tracker_tasks.py
+++ b/distro_tracker/vendor/debian/tracker_tasks.py
@@ -34,6 +34,7 @@ from distro_tracker.vendor.debian.models import PackageExcuses
 from distro_tracker.vendor.debian.models import UbuntuPackage
 from distro_tracker.core.utils.http import HttpCache
 from distro_tracker.core.utils.http import get_resource_content
+from distro_tracker.core.utils.http import get_resource_text
 from distro_tracker.core.utils.misc import get_data_checksum
 from distro_tracker.core.utils.packages import package_hashdir
 from .models import DebianContributor
@@ -2432,3 +2433,373 @@ class MultiArchHintsTask(BaseTask):
 
             ActionItem.objects.delete_obsolete_items([self.action_item_type],
                                                      packages.keys())
+
+
+class UpdateVcsWatchTask(BaseTask):
+    """
+    Updates packages' vcswatch stats.
+    """
+    ACTION_ITEM_TYPE_NAME = 'vcswatch-warnings-and-errors'
+    ITEM_DESCRIPTION = 'This package <a href="{url}">{report}</a>.'
+    ITEM_FULL_DESCRIPTION_TEMPLATE = 'debian/vcswatch-action-item.html'
+    VCSWATCH_URL = 'https://qa.debian.org/cgi-bin/vcswatch?package=%(package)s'
+    VCSWATCH_DATA_URL = 'https://qa.debian.org/data/vcswatch/vcswatch.json.gz'
+
+    VCSWATCH_STATUS_DICT = {
+        "NEW": {
+            "brief": "has a new version in the VCS",
+            "severity": ActionItem.SEVERITY_NORMAL,
+        },
+        "COMMITS": {
+            "brief": "has {commits} new commits in its VCS",
+            "severity": ActionItem.SEVERITY_NORMAL,
+        },
+        "OLD": {
+            "brief": "VCS is NOT up to date",
+            "severity": ActionItem.SEVERITY_HIGH,
+        },
+        "UNREL": {
+            "brief": "VCS has unreleased changelog",
+            "severity": ActionItem.SEVERITY_HIGH,
+        },
+        "ERROR": {
+            "brief": "VCS has an error",
+            "severity": ActionItem.SEVERITY_HIGH,
+        },
+        "DEFAULT": {
+            "brief": "\"Huh, this is weird\"",
+            "severity": ActionItem.SEVERITY_HIGH,
+        },
+    }
+
+    def __init__(self, force_update=False, *args, **kwargs):
+        super(UpdateVcsWatchTask, self).__init__(*args, **kwargs)
+        self.force_update = force_update
+        self.vcswatch_ai_type = ActionItemType.objects.create_or_update(
+            type_name=self.ACTION_ITEM_TYPE_NAME,
+            full_description_template=self.ITEM_FULL_DESCRIPTION_TEMPLATE
+        )
+
+    def set_parameters(self, parameters):
+        if 'force_update' in parameters:
+            self.force_update = parameters['force_update']
+
+    def get_vcswatch_data(self):
+        text = get_resource_text(self.VCSWATCH_DATA_URL, encoding="utf-8")
+
+        if text is None:
+            return
+
+        # There's some text, let's load!
+        data = json.loads(text)
+
+        out = {}
+        # Index the entries by package name to avoid repeated list
+        # searches later.
+        for entry in data:
+            out[entry[u'package']] = entry
+
+        return out
+
+    def clean_package_info(self, package_infos_without_watch, todo):
+        """Takes a list of :class:`PackageExtractedInfo` which do not
+        have a watch entry and cleans it. Then schedule in todo what
+        to do with them.
+        """
+        for package_info in package_infos_without_watch:
+            if 'watch_url' in package_info.value:
+                package_info.value.pop('watch_url')
+                if (list(package_info.value.keys()) == ['checksum'] or
+                        not package_info.value.keys()):
+                    todo['drop']['package_infos'].append(package_info)
+                else:
+                    package_info.value['checksum'] = get_data_checksum(
+                        package_info.value
+                    )
+                    todo['update']['package_infos'].append(package_info)
+
+    def update_action_item(self, package, vcswatch_data, action_item, todo):
+        """
+        For a given :class:`ActionItem` and the matching vcswatch data,
+        updates the `todo` dict as required.
+
+        Returns `True` if something is to be updated, `False` if
+        nothing is, and `None` if the calling loop should `continue`
+        to the next package.
+
+        :rtype: bool or `None`
+        """
+
+        package_status = vcswatch_data[u'status']
+
+        if package_status == "OK":
+            # Everything is fine, let's purge the action item. Not the
+            # package extracted info as its QA url is still relevant.
+            if action_item:
+                todo['drop']['action_items'].append(action_item)
+
+            # Nothing more to do!
+            return None
+
+        # NOT BEFORE "OK" check!!
+        if package_status not in self.VCSWATCH_STATUS_DICT:
+            package_status = "DEFAULT"
+
+        # If we are here, then something is not OK. Let's check if we
+        # already had some intel regarding the current package status.
+        if action_item is None:
+            action_item = ActionItem(
+                package=package,
+                item_type=self.vcswatch_ai_type)
+            todo['add']['action_items'].append(action_item)
+        else:
+            todo['update']['action_items'].append(action_item)
+
+        # Computes the watch URL
+        vcswatch_url = self.VCSWATCH_URL % {'package': package.name}
+
+        if action_item.extra_data:
+            extra_data = action_item.extra_data
+        else:
+            extra_data = {}
+
+        # Fetches the long description and severity from
+        # the VCSWATCH_STATUS_DICT dict.
+        action_item.severity = \
+            self.VCSWATCH_STATUS_DICT[package_status]['severity']
+
+        nb_commits = vcswatch_data["commits"]
+        if nb_commits is None:
+            nb_commits = 0
+        else:
+            nb_commits = int(nb_commits)
+
+        # The new data
+        new_extra_data = {
+            'name': package.name,
+            'status': package_status,
+            'error': vcswatch_data["error"],
+            'vcswatch_url': vcswatch_url,
+            'commits': nb_commits,
+        }
+
+        extra_data_match = all([
+            new_extra_data[key] == extra_data.get(key, None)
+            for key in new_extra_data
+        ])
+
+        # If everything is fine and we are not forcing the update
+        # then we proceed to the next package.
+        if extra_data_match and not self.force_update:
+            # Remove from the todolist
+            todo['update']['action_items'].remove(action_item)
+            return False
+        else:
+            action_item.extra_data = new_extra_data
+
+            # Report for short description of the :class:`ActionItem`
+            report = self.VCSWATCH_STATUS_DICT[package_status]['brief']
+
+            # If COMMITS, then string format the report.
+            if package_status == 'COMMITS':
+                report = report.format(commits=nb_commits)
+
+            action_item.short_description = self.ITEM_DESCRIPTION.format(
+                url=vcswatch_url,
+                report=report,
+            )
+            return True
+
+    def update_package_info(self, package, vcswatch_data, package_info, todo):
+        """
+        """
+
+        # Same thing with PackageExtractedInfo
+        if package_info is None:
+            package_info = PackageExtractedInfo(
+                package=package,
+                key='vcs',
+            )
+            todo['add']['package_infos'].append(package_info)
+        else:
+            todo['update']['package_infos'].append(package_info)
+
+        # Computes the watch URL
+        vcswatch_url = self.VCSWATCH_URL % {'package': package.name}
+
+        new_value = dict(package_info.value)
+        new_value['watch_url'] = vcswatch_url
+        new_value['checksum'] = get_data_checksum(new_value)
+
+        package_info_match = (
+            new_value['checksum'] == package_info.value.get(
+                'checksum',
+                None,
+            )
+        )
+
+        if package_info_match and not self.force_update:
+            todo['update']['package_infos'].remove(package_info)
+            return False
+        else:
+            package_info.value = new_value
+            return True
+
+    def update_packages_item(self, packages, vcswatch_datas):
+        """Generates the lists of :class:`ActionItem` to be added,
+        deleted or updated regarding the status of their packages.
+
+        Categories of statuses are:
+        {u'COMMITS', u'ERROR', u'NEW', u'OK', u'OLD', u'UNREL'}
+
+        It first fetches all :class:`PackageExtractedInfo` entries with
+        key='vcs'. Those whose package no longer appears in
+        vcswatch_datas are cleaned up: entries left with no relevant
+        data are scheduled for deletion, the others for an update.
+
+        The :class:`PackageExtractedInfo` entries that do match
+        vcswatch_datas are indexed by package name, as are the
+        :class:`ActionItem` instances of this task's type.
+
+        It then iterates over the packages in vcswatch_datas and checks
+        whether anything changed; if so, it updates the corresponding
+        :class:`ActionItem` and :class:`PackageExtractedInfo` and
+        schedules them for an update. If they did not exist yet, it
+        creates them and schedules them to be added to the database.
+
+        At the end, this function returns a dict of all instances of
+        :class:`ActionItem` and :class:`PackageExtractedInfo` stored
+        in subdicts depending on their class and what is to be done
+        with them.
+
+        :rtype: dict
+
+        """
+
+        todo = {
+            'drop': {
+                'action_items': [],
+                'package_infos': [],
+            },
+            'update': {
+                'action_items': [],
+                'package_infos': [],
+            },
+            'add': {
+                'action_items': [],
+                'package_infos': [],
+            },
+        }
+
+        # Fetches all PackageExtractedInfo entries with the 'vcs' key.
+        # As the pair (package, key) is unique, each package has at
+        # most one such entry, so we can index them by package name.
+        package_infos = {
+            package_info.package.name: package_info
+            for package_info in PackageExtractedInfo.objects.select_related(
+                'package'
+            ).filter(key='vcs').only('package__name', 'value')
+        }
+
+        # The 'vcs' :class:`PackageExtractedInfo` key is shared with
+        # other tasks, so only clean the watch_url of entries whose
+        # package is no longer in vcswatch_datas.
+        package_infos_without_watch = PackageExtractedInfo.objects.filter(
+            key='vcs').exclude(
+            package__name__in=vcswatch_datas.keys()).only('value')
+
+        # Do the actual clean.
+        self.clean_package_info(package_infos_without_watch, todo)
+
+        # Fetches all :class:`ActionItem` for packages concerned by a vcswatch
+        # action.
+        action_items = {
+            action_item.package.name: action_item
+            for action_item in ActionItem.objects.select_related(
+                'package'
+            ).filter(item_type=self.vcswatch_ai_type)
+        }
+
+        for package in packages:
+            # Get the vcswatch_data from the whole vcswatch_datas
+            vcswatch_data = vcswatch_datas[package.name]
+
+            # Get the old action item for this warning, if it exists.
+            action_item = action_items.get(package.name, None)
+            package_info = package_infos.get(package.name, None)
+
+            # Updates the :class:`ActionItem`. If _ai_continue is None,
+            # then there is nothing more to do with this package.
+            # If it is False, then no update is pending for the
+            # :class:`ActionItem`, else there is an update to do.
+            _ai_continue = self.update_action_item(
+                package,
+                vcswatch_data,
+                action_item,
+                todo)
+
+            if _ai_continue is None:
+                continue
+
+            _pi_continue = self.update_package_info(
+                package,
+                vcswatch_data,
+                package_info,
+                todo)
+
+            if not _ai_continue and not _pi_continue:
+                continue
+
+        return todo
+
+    def execute(self):
+        # Get the actual vcswatch json file from qa.debian.org
+        vcs_data = self.get_vcswatch_data()
+
+        # Only fetch the packages that are in the json dict.
+        packages = PackageName.objects.filter(name__in=vcs_data.keys())
+
+        # Faster than fetching the action items one by one in a loop
+        # when handling each package.
+        packages = packages.prefetch_related('action_items')
+
+        # Determine whether something is to be kept or dropped.
+        todo = self.update_packages_item(packages, vcs_data)
+
+        with transaction.atomic():
+            # Delete the :class:`ActionItem` that are obsolete, and the
+            # :class:`PackageExtractedInfo` of the same packages.
+            ActionItem.objects.delete_obsolete_items(
+                [self.vcswatch_ai_type],
+                vcs_data.keys())
+            PackageExtractedInfo.objects.filter(
+                key='vcs',
+                id__in=[
+                    package_info.id
+                    for package_info in todo['drop']['package_infos']
+                ]
+            ).delete()
+
+            # Then delete the :class:`ActionItem` that are to be deleted.
+            ActionItem.objects.filter(
+                item_type__type_name=self.vcswatch_ai_type.type_name,
+                id__in=[
+                    action_item.id
+                    for action_item in todo['drop']['action_items']
+                ]
+            ).delete()
+
+            # Then bulk_create the :class:`ActionItem` to add and the
+            # :class:`PackageExtractedInfo`
+            ActionItem.objects.bulk_create(todo['add']['action_items'])
+            PackageExtractedInfo.objects.bulk_create(
+                todo['add']['package_infos']
+            )
+
+            # Update existing entries
+            for action_item in todo['update']['action_items']:
+                action_item.save()
+            for package_info in todo['update']['package_infos']:
+                package_info.save()
-- 
2.11.0

From fcf2940676ad3ce8006d0305024c69672aa53c5e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pierre-Elliott=20B=C3=A9cue?= <becue@crans.org>
Date: Mon, 27 Nov 2017 17:16:18 +0100
Subject: [PATCH 6/6] VCS general info moved from general to vcs
 PackageExtractedInfo

---
 distro_tracker/core/panels.py                      | 16 ++++--
 distro_tracker/core/retrieve_data.py               | 60 +++++++++++++++++++++-
 .../core/templates/core/panels/general.html        | 30 +++++++----
 3 files changed, 91 insertions(+), 15 deletions(-)

diff --git a/distro_tracker/core/panels.py b/distro_tracker/core/panels.py
index f49df55..85c96ef 100644
--- a/distro_tracker/core/panels.py
+++ b/distro_tracker/core/panels.py
@@ -241,15 +241,23 @@ class GeneralInformationPanel(BasePanel):
         })
         if implemented and url:
             general['url'] = url
-        # Map the VCS type to its name.
-        if 'vcs' in general and 'type' in general['vcs']:
-            shorthand = general['vcs']['type']
-            general['vcs']['full_name'] = get_vcs_name(shorthand)
+
         # Add mailing list archive URLs
         self._add_archive_urls(general)
         # Add developer information links and any other vendor-specific extras
         self._add_developer_extras(general)
 
+        try:
+            vcs_info = PackageExtractedInfo.objects.get(
+                package=self.package, key='vcs').value
+        except PackageExtractedInfo.DoesNotExist:
+            vcs_info = {}
+
+        if vcs_info:
+            if 'type' in vcs_info:
+                vcs_info['full_name'] = get_vcs_name(vcs_info['type'])
+            general["vcs"] = vcs_info
+
         return general
 
     @property
diff --git a/distro_tracker/core/retrieve_data.py b/distro_tracker/core/retrieve_data.py
index 14f762c..548c731 100644
--- a/distro_tracker/core/retrieve_data.py
+++ b/distro_tracker/core/retrieve_data.py
@@ -27,6 +27,7 @@ from distro_tracker.core.utils.packages import (
 from distro_tracker.core.tasks import BaseTask
 from distro_tracker.core.tasks import clear_all_events_on_exception
 from distro_tracker.core.models import SourcePackageName, Architecture
+from distro_tracker.core.utils.misc import get_data_checksum
 from distro_tracker.accounts.models import UserEmail
 from django.db import transaction
 from django.db import models
@@ -846,7 +847,6 @@ class UpdatePackageGeneralInformation(PackageUpdateTask):
             'architectures': list(
                 map(str, srcpkg.architectures.order_by('name'))),
             'standards_version': srcpkg.standards_version,
-            'vcs': srcpkg.vcs,
         }
 
         return general_information
@@ -1059,3 +1059,61 @@ class UpdateTeamPackagesTask(BaseTask):
                 # Add the package to all the uploaders' teams packages
                 for uploader in source_package.uploaders.all():
                     self.add_package_to_maintainer_teams(package, uploader)
+
+
+class UpdatePackageVcsInformationTask(PackageUpdateTask):
+    """
+    Updates the vcs information regarding packages.
+    """
+    DEPENDS_ON_EVENTS = (
+        'new-source-package-version-in-repository',
+        'lost-source-package-version-in-repository',
+    )
+
+    def __init__(self, *args, **kwargs):
+        super(UpdatePackageVcsInformationTask, self).__init__(*args, **kwargs)
+        self.packages = set()
+
+    def process_event(self, event):
+        self.packages.add(event.arguments['name'])
+
+    def _get_info_from_entry(self, entry):
+        srcpkg = entry.source_package
+        return srcpkg.vcs
+
+    @clear_all_events_on_exception
+    def execute(self):
+        package_names = set(
+            event.arguments['name']
+            for event in self.get_all_events()
+        )
+
+        with transaction.atomic():
+            if self.is_initial_task():
+                self.log("Updating VCS info of all packages")
+                qs = SourcePackageName.objects.all()
+            else:
+                self.log("Updating VCS info of %d packages",
+                         len(package_names))
+                qs = SourcePackageName.objects.filter(name__in=package_names)
+            for package in qs:
+                entry = package.main_entry
+                if entry is None:
+                    continue
+
+                vcs_info = self._get_info_from_entry(entry)
+                vcs_pkginfo, _ = PackageExtractedInfo.objects.get_or_create(
+                    key='vcs',
+                    package=package
+                )
+                for key in ['type', 'url', 'browser']:
+                    if key in vcs_info:
+                        vcs_pkginfo.value[key] = vcs_info[key]
+
+                if (list(vcs_pkginfo.value.keys()) == ['checksum'] or
+                        not vcs_pkginfo.value.keys()):
+                    vcs_pkginfo.delete()
+                else:
+                    vcs_pkginfo.value['checksum'] = get_data_checksum(
+                        vcs_pkginfo.value)
+                    vcs_pkginfo.save()
diff --git a/distro_tracker/core/templates/core/panels/general.html b/distro_tracker/core/templates/core/panels/general.html
index e55c564..8ce4bd5 100644
--- a/distro_tracker/core/templates/core/panels/general.html
+++ b/distro_tracker/core/templates/core/panels/general.html
@@ -86,20 +86,30 @@
     {% endif %}
 
     {% if ctx.vcs %}
+    {% with vcs=ctx.vcs %}
     <li class="list-group-item">
         <span class="list-item-key"><b>VCS:</b></span>
-	{% with vcs=ctx.vcs.full_name|default:ctx.vcs.type %}
-	{% if vcs|lower == "cvs" %}
-	<span title="{{ ctx.vcs.url }}">{{ vcs }}</span>
-	{% else %}
-	<a href="{{ ctx.vcs.url }}">{{ vcs }}</a>
-	{% endif %}
-	{% if ctx.vcs.browser %}
-	(<a href="{{ ctx.vcs.browser }}">Browse</a>)
-	{% endif %}
-	{% endwith %}
+        {% if vcs.full_name or vcs.type %}
+            {% with vcs_name=vcs.full_name|default:vcs.type %}
+            {% if vcs_name|lower == "cvs" %}
+            <span title="{{ vcs.url }}">{{ vcs_name }}</span>
+            {% else %}
+            <a href="{{ vcs.url }}">{{ vcs_name }}</a>
+            {% endif %}
+            {% endwith %}
+        {% endif %}
+
+        {% if vcs.browser %}
+        <a href="{{ vcs.browser }}">Browse</a>
+        {% endif %}
+
+        {% if vcs.watch_url %}
+        <a href="{{ vcs.watch_url }}">QA</a>
+        {% endif %}
     </li>
+    {% endwith %}
     {% endif %}
+
 </ul>
 {% endwith %}
 {% endblock %}
-- 
2.11.0

Attachment: signature.asc
Description: PGP signature

