
Re: Equivalent packages between Linux distributions



Hi, thanks for the response and interest.
 
I should note the package names I have listed in the dataset are source package names, not binary package names.
 
The method I am using is based on similarity between filename lists of source packages. I use the Jaccard index (http://en.wikipedia.org/wiki/Jaccard_index) between sets of filenames to calculate similarity. This was done as an offshoot from the PhD research I'm currently undertaking at Deakin University.
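
For readers who have not met the Jaccard index, here is a minimal sketch of the idea in Python. It is not the code used to produce the dataset: the function names, the example file lists, and the choice to compare bare file names rather than full paths are all assumptions made for illustration.

```python
def jaccard_index(files_a, files_b):
    """Jaccard index |A & B| / |A | B| of two collections of filenames."""
    a, b = set(files_a), set(files_b)
    union = a | b
    if not union:
        # Two empty file lists: treat as "no evidence of similarity".
        return 0.0
    return len(a & b) / len(union)


def best_match(filenames, candidates):
    """Return (name, score) of the candidate package most similar to filenames.

    candidates is a hypothetical mapping: source package name -> filenames.
    """
    best_name, best_score = None, 0.0
    for name, files in candidates.items():
        score = jaccard_index(filenames, files)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Made-up file lists, purely for illustration.
debian_pkg = ["main.c", "util.c", "util.h", "README"]
fedora_pkgs = {
    "foo": ["main.c", "util.c", "util.h", "COPYING"],
    "bar": ["bar.py", "README"],
}
match = best_match(debian_pkg, fedora_pkgs)
```

In practice a cutoff score would probably be needed to reject weak matches; no particular threshold is implied here.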
 
--
Silvio

On Fri, Jan 21, 2011 at 9:02 AM, Enrico Zini <enrico@enricozini.org> wrote:
On Wed, Jan 19, 2011 at 10:54:44AM +1100, Silvio Cesare wrote:

>    I have generated a list of roughly equivalent packages between Linux
>    distributions (currently Debian 5 and Fedora 13). The list is
>    automatically generated.
[...]

Hi Silvio,

thank you for your work, it is extremely valuable.  I'm currently
at a cross-distro meeting on app installers[1] and it's precisely
something we've been working on today. I'd be greatly interested to
exchange algorithms with you.

The main use case we have in mind is to be able to fall back on other
distros when a package doesn't have some piece of information. For
example:

 - does package $foo have a screenshot in Debian?
 - if no, how about in Fedora?
 - if no, how about in OpenSUSE?
 - if no, how about in Mandriva?

The example uses screenshots, but it could be other kinds of metadata,
like categories (it's a way for example to port at least some of Debtags
to other distros), ratings or user comments.
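
The fallback chain above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the `equivalents` and `metadata` mappings, their layout, and the function name are hypothetical, and the package names and values in the example are invented.

```python
def find_metadata(package, home_distro, fallback_distros, equivalents, metadata):
    """Look up a piece of metadata, falling back on other distros in order.

    equivalents: hypothetical mapping, home package name -> {distro: name there}
    metadata:    hypothetical mapping, (distro, package name) -> value
    Returns (distro, value), or None if no distro has the information.
    """
    value = metadata.get((home_distro, package))
    if value is not None:
        return home_distro, value
    for distro in fallback_distros:
        other_name = equivalents.get(package, {}).get(distro)
        if other_name is None:
            continue  # no known equivalent package in this distro
        value = metadata.get((distro, other_name))
        if value is not None:
            return distro, value
    return None


# Invented example data.
equivalents = {"foo": {"fedora": "foo", "opensuse": "foo-app"}}
metadata = {("opensuse", "foo-app"): "foo-screenshot.png"}
result = find_metadata("foo", "debian", ["fedora", "opensuse", "mandriva"],
                       equivalents, metadata)
```

The same lookup works for categories, ratings or comments, since it only depends on having a package equivalence table.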

The heuristics I've been implementing so far are:

 - trivial package name matching
 - 'stemming' specific kinds of package names (debian:libfoo-dev->foo;
  fedora:foo-devel->foo)
 - matching packages that contain the same .desktop files or the same
  pkg-config files
 - similarity matching of file lists
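
As a rough illustration of the first two heuristics, here is a small Python sketch. It assumes the usual naming conventions for development packages (`lib<name>-dev` on Debian, `<name>-devel` on Fedora); the function names and the sample package names are made up, and real package names have many more variations (sonames, version suffixes) that this ignores.

```python
import re


def stem_debian(name):
    """Reduce a Debian -dev package name to a stem: libfoo-dev -> foo."""
    m = re.fullmatch(r"lib(.+)-dev", name)
    return m.group(1) if m else name


def stem_fedora(name):
    """Reduce a Fedora -devel package name to a stem: foo-devel -> foo."""
    m = re.fullmatch(r"(.+)-devel", name)
    return m.group(1) if m else name


def match_by_stem(debian_names, fedora_names):
    """Pair packages whose stems are equal; unstemmed names match trivially."""
    fedora_by_stem = {}
    for name in fedora_names:
        fedora_by_stem.setdefault(stem_fedora(name), []).append(name)
    pairs = []
    for name in debian_names:
        for other in fedora_by_stem.get(stem_debian(name), []):
            pairs.append((name, other))
    return pairs


# Invented package names.
pairs = match_by_stem(["libfoo-dev", "bar", "baz"], ["foo-devel", "bar", "qux"])
```

Because names that match neither pattern are left unchanged, identical names still pair up, so the trivial name match falls out of the same function.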

I still don't have results because the implementation is not complete,
but I should have something in a day or two. You have something *today*,
which is, wow. Tomorrow (Friday) I'll download your dataset and try to
add another heuristic that just uses it. It'll also be interesting to use
all these methods to cross-validate each other.

[1] http://distributions.freedesktop.org/wiki/Meetings/AppInstaller2011


Ciao,

Enrico

--
GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini <enrico@enricozini.org>



