Bug#908678: Testing the filter-branch scripts
On 2018-11-12 12:22:58, Antoine Beaupré wrote:
> I'll start a run on the whole history to see if I can find any problems,
> as soon as a first clone finishes resolving those damn deltas. ;)
The Python job finished successfully here after 10 hours.
I did some tests on the new git repository. Cloning it from scratch
takes around 2 minutes (the original repo: 21 minutes). The new
repository is 145MB, while the original is 1.6GB.
Running git annotate on data/CVE/list.2018 takes about 26 seconds, while
it takes basically forever to annotate the original data/CVE/list. (It's
been running for 10 minutes here already.)
So that's about it. I have not done a thorough job of checking the
actual *integrity* of the results. It's difficult, considering CVE
identifiers are not sequential in the data/CVE/list file, so a naive
diff like this will fail:
$ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} ) data/CVE/list | diffstat
list |106562 +++++++++++++++++++++++++++++++++----------------------------------
1 file changed, 53281 insertions(+), 53281 deletions(-)
But at least the numbers add up: insertions match deletions, so it looks
like no line is lost. And indeed, all CVE identifiers seem to be
accounted for:
$ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} | grep ^CVE | sort -n ) <( grep ^CVE data/CVE/list | sort -n ) | diffstat
0 files changed
Still, a cursory look at the diff seems to indicate it is clean.
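
A stricter check would compare whole entries rather than just the
header lines. Here is a minimal sketch of that idea, assuming each entry
is a line starting with "CVE" followed by its indented lines. The
function names are hypothetical and this is not something I have run on
the real history:

```shell
# Hypothetical helper: join each CVE entry (a header line starting with
# "CVE" plus the lines that follow it) onto a single line, so entries
# can be sorted and compared as units. Lines before the first header
# are ignored.
flatten_entries() {
    awk '
        /^CVE/ { if (entry != "") print entry; entry = $0; next }
        entry != "" { entry = entry "\037" $0 }
        END { if (entry != "") print entry }
    ' "$@"
}

# Hypothetical helper: $1 is the original combined file, the remaining
# arguments are the per-year files. Returns 0 when both sides contain
# exactly the same entries, regardless of order.
compare_entries() {
    original=$1; shift
    sorted=$(mktemp)
    flatten_entries "$original" | LC_ALL=C sort > "$sorted"
    status=0
    flatten_entries "$@" | LC_ALL=C sort | diff "$sorted" - || status=$?
    rm -f "$sorted"
    return $status
}
```

It could be called as `compare_entries data/CVE/list ../other-clone/data/CVE/list.*`
(the second path being a placeholder). Unlike the grep above, this would
also catch an indented line that ended up under the wrong CVE.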
I looked at splitting that file per CVE. That did not scale and just
created new problems. But splitting by *year* seems like a very
efficient switch, and I think that idea is worth pursuing further.
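
To illustrate what the per-year layout means for a single revision of
the file, here is a rough sketch. It is not the filter-branch script
itself, the function name and arguments are made up, and it assumes
every entry starts with a "CVE-YYYY-" header:

```shell
# Hypothetical helper: $1 is the combined list file, $2 an existing
# output directory. Each entry is written to list.YYYY according to the
# year in its "CVE-YYYY-" header; following lines go to the same file.
split_by_year() {
    awk -v outdir="$2" '
        /^CVE-[0-9][0-9][0-9][0-9]-/ { out = outdir "/list." substr($0, 5, 4) }
        out != "" { print > out }
    ' "$1"
}
```

For example, `split_by_year data/CVE/list /tmp/split` would write
/tmp/split/list.2018 and so on, with /tmp/split being a placeholder
directory that must already exist.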
A.
--
There is no cloud, it's just someone else's computer.
- Chris Watterson