
Bug#715216: qa.debian.org: collab-qa/upload-history: Software trusts "Date" headers which are sometimes set wrong



(Sorry about the half-formed thought in item 2 of the numbered list in my last message. I was typing and thinking too fast.)

Anyway, as an update to this: further research indicates that upload-history is simply reporting the date from the email, which it carries in a "Message-Date" field. I can't blame the upload-history code for passing along the garbage it got as input data.

However, in udd/upload_history_gatherer.py in udd.git, there is machinery that inserts the value of Message-Date into the table's date field, rather than the changes file's Date header.

I suggest the following untested patch:

diff --git a/udd/upload_history_gatherer.py b/udd/upload_history_gatherer.py
index 4091eec..fe485e6 100644
--- a/udd/upload_history_gatherer.py
+++ b/udd/upload_history_gatherer.py
@@ -54,5 +54,5 @@ class upload_history_gatherer(gatherer):
         VALUES ($1, $2, $3, $4)" % (self.my_config['table'] + '_closes'))

- query = "EXECUTE uh_insert(%(Source)s, %(Version)s, %(Message-Date)s, \
+    query = "EXECUTE uh_insert(%(Source)s, %(Version)s, %(Date)s, \
       %(Changed-By)s, %(Changed-By_name)s, %(Changed-By_email)s, \
%(Maintainer)s, %(Maintainer_name)s, %(Maintainer_email)s, %(NMU)s, \

(sorry about some patch mangling here by email)

The key question for us is: Are we okay with changing the definition of "date" in the upload-history table to mean the date within the changes file, rather than the email message's date?

One downside to this is that for sponsored packages, we would see when the sponsoree did the work, rather than when the package got uploaded. The current behavior, where the upload_history table stores the date of the message, is a best-effort attempt to measure when the upload actually got processed. So I like the current behavior, and I would now reject my patch.


Okay.


Given that, if the goal is, "The date field represents, to the best effort we can approximate, the time that the upload was successfully processed by the Debian servers", we have a few different options. I'm going to do some further research here and get back to the bug with a recommendation, but as a surely-incomplete list of options:

1. We could try to implement a very conservative fixup strategy like, "If the Message-Date field differs from the Date field by more than one year, go with whichever of (Message-Date, Date) is closest to the date on the envelope From line".

2. We could create a "fixups" list by hand of package uploads and actual dates, manually maintained, that overrides the data in the mbox files. That is probably the easiest, since I suspect there are not very many packages with this problem.

3. We could store both Message-Date and Date in UDD, and then tell users of UDD that they will have to deal with this problem of bad data. (This is the option I like the least.)

Of these, I like option 1 the most. I will work on implementing that.
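
As a rough, untested sketch of what option 1 could look like, assuming all three dates have already been parsed into timezone-aware datetime objects. The names pick_upload_date and MAX_SKEW are made up for illustration and do not exist in udd.git, and "one year" is approximated as 365 days:

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Assumption: "more than one year" is approximated as 365 days.
MAX_SKEW = timedelta(days=365)


def pick_upload_date(message_date, changes_date, envelope_date):
    """Choose the date to store for one upload (hypothetical helper).

    message_date  -- Date header of the mail (Message-Date)
    changes_date  -- Date field of the .changes file
    envelope_date -- date taken from the mbox envelope "From " line

    All three must be timezone-aware datetime objects.
    """
    # Normal case: the two dates roughly agree, so keep the current
    # behavior and trust the message date.
    if abs(message_date - changes_date) <= MAX_SKEW:
        return message_date
    # Otherwise one of them is badly off; prefer whichever is closer
    # to the envelope date.
    return min((message_date, changes_date),
               key=lambda d: abs(d - envelope_date))


# Example: a sender whose clock was set seven years into the future.
msg = parsedate_to_datetime("Sun, 05 Jul 2020 10:00:00 +0000")
chg = parsedate_to_datetime("Thu, 04 Jul 2013 09:00:00 +0000")
env = parsedate_to_datetime("Fri, 05 Jul 2013 10:00:05 +0000")
chosen = pick_upload_date(msg, chg, env)  # picks chg here
```

When the two dates are within a year of each other this leaves the stored value unchanged, so only the clearly broken entries would be touched. Parsing the envelope line itself is left out, since its date format differs from the mail headers.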

-- Asheesh.

