[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#773294: tracker.debian.org: add support for the vcswatch service



Le mercredi 22 novembre 2017 à 10:38:04+0100, Raphael Hertzog a écrit :
> Hi Pierre-Elliott,
> 
> On Wed, 22 Nov 2017, Pierre-Elliott Bécue wrote:
> > Please, feel free to review and comment the patch.
> > 
> > It lacks tests for the task, I'll work on that by the end of the week in a
> > fourth patch.
> 
> First of all, I would highly suggest to try to follow the principles of
> test-driven development:
> https://www.obeythetestinggoat.com/
> 
> IOW, you write a failing test first, then you code what's necessary to
> make the test pass and so on until you have all the features that you
> are looking for.
> 
> Adding tests afterwards is hard and you are likely to miss some important
> tests. And it's just not fun to code tests separately when you already
> have something working... it will be hard to stick to it in the long run.

I agree on the principle, but I had just no clue of where I was headed, and
no clue of how to design a test for this functionality. So I thought about
trying to develop something that analyzes the json file and then I was
set up to finish.

I'll design tests by the end of the day.

> Then on the compression feature, I have to agree with pabs, we should not
> have to look into the data-flow to find the compression method. I would
> suggest:
> - either the user is explicit (compression="gzip" attribute, or
>   compression=None/False for no compression)
> - or the user relies on compression="auto" (default value) in which
>   case it should just rely on the filename extension that we should pass
>   along in some way.

pabs was also suggesting to have something doing the decompression
implicitly and thus providing no argument for compression.

Which one would you prefer?

> Also it seems to me that the correct API for generic decompression should
> rely on getting a file object as input and providing a file object as
> output... as that's what you want to be able to process really big files
> without reading them all at once in memory (shall the need arise).

Yes.

> Now onto the vcswatch code:
> 
> > --- a/distro_tracker/core/panels.py
> > +++ b/distro_tracker/core/panels.py
> > @@ -235,6 +235,12 @@ class GeneralInformationPanel(BasePanel):
> >              # There is no general info for the package
> >              return
> >  
> > +        try:
> > +            vcswatch = PackageExtractedInfo.objects.get(
> > +                package=self.package, key='vcswatch').value
> > +        except PackageExtractedInfo.DoesNotExist:
> > +            vcswatch = {}
> 
> I don't like that we modify a general panel for a feature that is currently
> vendor specific. I would like to see an intermediary abstraction under
> the form of "vcs_extra_links". It can still be in PackageExtractedInfo but
> you would not be alone in having the right to create/modify those keys.
> It would be something like this:
> {
> 	'QA': 'https://qa.debian.org/cgi-bin/vcswatch?package=foo',
> 	'Browse': '...',
> }
> 
> And the code would display the links in alphabetical order. (Yes I'm suggesting
> that we might want to move the code that stores the Vcs-Browser link to use
> this new structure)
> 
> Also if we have to extract two keys out of  PackageExtractedInfo we want to do
> it in a single query to avoid round-trips with the database. The package page
> is already very heavy in numbers of queries.

I'll see what I can do.

> > --- a/distro_tracker/core/templates/core/panels/general.html
> > +++ b/distro_tracker/core/templates/core/panels/general.html
> > @@ -95,7 +95,7 @@
> >         <a href="{{ ctx.vcs.url }}">{{ vcs }}</a>
> >         {% endif %}
> >         {% if ctx.vcs.browser %}
> > -       (<a href="{{ ctx.vcs.browser }}">Browse</a>)
> > +       (<a href="{{ ctx.vcs.browser }}">Browse</a>{% if ctx.vcs.watch %}, <a href="{{ ctx.vcs.watch }}">
> > QA</a>{% endif %})
> >         {% endif %}
> >         {% endwith %}
> 
> You can have vcswatch data even if you don't have any browse link. vcswatch works as soon
> as you have a vcs url. But here the code would be very different (and cleaner) if you
> implement the abstraction suggested above.

Yep, I'll merge this comment with the previous one.

> > --- /dev/null
> > +++ b/distro_tracker/vendor/debian/templates/debian/vcswatch-action-item.html
> > @@ -0,0 +1,9 @@
> > +{% with description=item.extra_data.description %}
> > +{% with error=item.extra_data.error %}
> > +
> > +<a href="{{item.extra_data.vcswatch_url}}">VCSwatch</a> reports
> > +that {{description}}<br/><br/>
> > +{% if error %}
> > +<span>{{error}}</span>
> > +{% endif %}
> > +{% endwith %}{% endwith %}
> 
> This looks wrong. The long descriptions should really be embedded here
> (with multiple if to check for the vcswatch status) and
> use whatever data they need out of extra_data.

Okay.

> > +        __out = {}
> 
> Please don't use variable names with underscores. A single underscore
> might be useful for private method or function names (or for variables defined at
> the module level possibly). But it's not really useful for private
> variables within methods.

Sure.

> > +    @staticmethod
> > +    def get_data_checksum(data):
> > +        json_dump = json.dumps(data, sort_keys=True)
> > +        if json_dump is not six.binary_type:
> > +            json_dump = json_dump.encode('UTF-8')
> > +        return hashlib.md5(json_dump).hexdigest()
> 
> This checksum mechanism to see whether we have updated data should likely
> be factored out in a function where it can be used by other objects. To make
> it possible to store the checksum alongside the data, you might want to checksum
> a copy of the data where you drop the pre-defined key "data_checksum".
> 
> We use Python 3 only now, so it's no longer required to use "six.binary_type".
> But in fact the whole check is not required as I believe that json.dumps will
> always return text (not bytes).

I saw your commits of the 20th of Nov a little too late. :)

> > +    def get_vcswatch_data(self):
> > +        url = 'https://qa.debian.org/data/vcswatch/vcswatch.json.gz'
> > +        data = json.loads(get_resource_content(url, compression=True).decode('utf-8'))
> 
> get_resource_content() might return None in case of network issue (which will
> trigger an excteption here). Also the need to decode the output is really
> annoying, we might want to introduce "get_resource_text()" which does this for
> us automatically.

Sure. I'll also look into it!

> This is a pretty-good job so far, thank you!

Thanks, and you're welcome.

-- 
PEB

Attachment: signature.asc
Description: PGP signature


Reply to: