Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)
On Sun, 12 Apr 2009, Martijn van Oosterhout wrote:
SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5
FROM packages WHERE ...
Ok, I see why you're having trouble now; you're splitting up the
description in your DB and thus need to stick it back together.
That's the format other tables in UDD are using. But that is not
really the worst part of the problem - as you can see, it can be
joined again perfectly. It is just the md5 sum calculation which
slows things down, and the calculation of the version number is
not reliable in all cases - which I regard as a problem.
That does indeed make the process a bit less reliable.
I don't think that it is the split which causes the problem. I was
able to reproduce the correct description the way I described above.
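
To illustrate the point, here is a minimal Python sketch of the same
reassembly as the SQL expression quoted at the top. The function names
and the sample text are made up for illustration, and the split shown
is only an assumption about how the two columns are filled (cut at the
first newline, drop the final newline, keep everything else verbatim).

```python
import hashlib


def split_description(field):
    # Hypothetical split: short description up to the first newline,
    # the rest (minus the single trailing newline) as long description.
    short, _, rest = field.partition("\n")
    if rest.endswith("\n"):
        rest = rest[:-1]
    return short, rest


def description_md5(description, long_description):
    # Same concatenation as the SQL expression:
    # description, newline, long_description, newline.
    joined = description + "\n" + long_description + "\n"
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Made-up Description field as it would appear in a Packages file.
raw = "short summary\n first line\n .\n second paragraph\n"
short, long_part = split_description(raw)
assert description_md5(short, long_part) == hashlib.md5(raw.encode("utf-8")).hexdigest()
```

As long as the split keeps the continuation lines byte for byte, the
sum over the rejoined string equals the sum over the original field.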
treats the description as a single string, the exact string in the
Packages file (the Description field is a single entry in the file) so
we had no issues. By doing extra processing like splitting/stripping
parts of the string it's quite possible you're doing a non-invertible
conversion, which would make matching later harder.
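
A small Python sketch of what a non-invertible conversion could look
like. The lossy_split function is purely hypothetical (nothing in this
thread says any importer actually works this way); it strips the
whitespace of every continuation line, so two different Description
fields end up with the same stored value although their MD5 sums
differ.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def lossy_split(field):
    # Hypothetical lossy conversion: strip each continuation line.
    lines = field.splitlines()
    return lines[0], "\n".join(line.strip() for line in lines[1:])


# Two made-up fields differing only in the indentation of one line.
raw_a = "foo\n bar\n  baz\n"
raw_b = "foo\n bar\n baz\n"

# Both collapse to the same stored pair ...
assert lossy_split(raw_a) == lossy_split(raw_b)
# ... while the sums of the original fields differ.
assert md5_hex(raw_a) != md5_hex(raw_b)
```

Once two inputs map to the same stored value, no way of joining the
parts can recover both original sums.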
In what way? This is done in UDD with all descriptions and has never
caused a problem.
It might actually be easier to write a script which simply collected
Packages files from say snapshot.debian.org, calculated all the MD5
sums (you can extract the description field using a regex so it's easy
enough in Perl) and built a database of description MD5s and version
numbers. That would give a reliable mapping, far more reliable than
the DDTP/DDTSS is ever likely to do.
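
A rough sketch of such a script, in Python rather than Perl. It works
on Packages text that has already been downloaded; fetching from
snapshot.debian.org is left out. The names FIELD_RE and index_packages
and the sample stanzas are made up, and hashing the field value plus a
trailing newline is an assumption taken from the SQL expression quoted
at the top of this mail.

```python
import hashlib
import re
from collections import defaultdict

# One control field: name, colon, value, plus any continuation lines
# (lines starting with a space or tab).
FIELD_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9-]+):[ \t]*(?P<value>.*(?:\n[ \t].*)*)",
    re.MULTILINE,
)


def index_packages(text):
    """Map description MD5 -> set of (package, version) pairs."""
    index = defaultdict(set)
    # Stanzas in a Packages file are separated by blank lines.
    for stanza in re.split(r"\n{2,}", text.strip()):
        fields = {m.group("name"): m.group("value")
                  for m in FIELD_RE.finditer(stanza)}
        if not {"Package", "Version", "Description"} <= fields.keys():
            continue
        # Assumption: the sum covers the field value plus a final newline.
        desc = fields["Description"] + "\n"
        digest = hashlib.md5(desc.encode("utf-8")).hexdigest()
        index[digest].add((fields["Package"], fields["Version"]))
    return index


# Made-up sample: two versions sharing one description.
SAMPLE = (
    "Package: hello\nVersion: 2.2-2\n"
    "Description: The classic greeting\n A friendly greeting program.\n"
    "\n"
    "Package: hello\nVersion: 2.4-1\n"
    "Description: The classic greeting\n A friendly greeting program.\n"
)
index = index_packages(SAMPLE)
for digest, versions in index.items():
    print(digest, sorted(versions))
```

Running this over several Packages snapshots and merging the results
would give, for every description MD5, the versions it appeared in.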
Can you elaborate a bit more on why you regard adding a version
number to DDTP Translation files as unreliable?