Hello,
This is my first progress report on the PyPI to Debian
Repository Converter project[0], mentored by
Piotr Ożarowski.
----------------
Work: I’ve worked mainly on issues related to the PyPI
repository[1] and its XML-RPC interface[2]. My goal was to
download the sources of the available Python 3 packages.
In the course of this work I’ve dealt with the following tasks:
----------------
1) Selection of packages intended for Python 3 (as agreed with
my mentor, I will work on packages for Python 3 first; once
that is ready, I’ll try to add support for Python 2 packages
as well).
After reading the Python Packaging chapter of “The Architecture
of Open Source Applications”[3], I used the browse method of
PyPI's XML-RPC interface, which makes it possible to search
for packages matching given classifiers[4]. Unfortunately, it
is not possible to query for the minimum and/or maximum
required version of Python. You can list specific versions or
use the "Programming Language :: Python :: 3" classifier, but
“Python :: 3.2” does not imply “Python :: 3”. For this reason
I have to call this method once for each specific version, but
in the end I get a list of unique packages, each with a list
of its releases available for Python 3.
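A minimal sketch of the per-classifier calls described above,
not the project's actual code. The name collect_releases and the
exact classifier list are assumptions made for illustration; the
sketch also assumes that browse takes a list of classifiers and
returns (name, version) pairs.

```python
# Classifiers queried one by one, because "Python :: 3.2" does not
# imply "Python :: 3". The exact list used here is an assumption.
PY3_CLASSIFIERS = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.0",
    "Programming Language :: Python :: 3.1",
    "Programming Language :: Python :: 3.2",
]


def collect_releases(browse, classifiers=PY3_CLASSIFIERS):
    """Merge the results of one browse() call per classifier.

    `browse` is any callable that takes a list of classifiers and
    returns (name, version) pairs. The result maps each package
    name to the set of its matching versions, so duplicates that
    appear under several classifiers are collapsed.
    """
    releases = {}
    for classifier in classifiers:
        for name, version in browse([classifier]):
            releases.setdefault(name, set()).add(version)
    return releases


# Usage against the real service (needs network access):
#   import xmlrpc.client
#   client = xmlrpc.client.ServerProxy("http://pypi.python.org/pypi")
#   releases = collect_releases(client.browse)
```

Passing browse in as a callable keeps the merging logic testable
without network access.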
From my point of view it would be helpful if the
browse function provided the ability to select packages
using wildcards in these criteria, or to look for packages not
meeting given conditions. I have added this to my TODO list
and, if time permits, I will prepare patches for PyPI’s rpc.py.
I’ve decided to reject packages whose classifiers include
'Development Status :: 1 - Planning', simply because they
usually don’t have source files yet. A Debian package for a
project in the planning phase is also not the best idea.
I’ve acquainted myself with PEP 386[5], but while sorting a
list of versions (harvested from real releases) using the
distutils library[6] (in order to select the latest available
version), I came across a problem which, at my mentor's
suggestion, I reported to Python’s bug tracker[7]. It was the
first time I had reported a bug there, and I enjoyed an
immediate response. Moreover, my mentor suggested that I look
at the library sources and propose an appropriate patch, which
I did :-)
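For illustration only, here is one naive way to pick the latest
of several version strings without ever comparing an integer
with a string, which Python 3 refuses to do. This is not the
distutils implementation and not the patch mentioned above;
version_key is a hypothetical helper, and it does not implement
PEP 386 ordering (in particular, a suffix such as "b2" sorts
after the plain release here).

```python
import re

# Split a version string into runs of digits and runs of letters.
_PART = re.compile(r"\d+|[a-zA-Z]+")


def version_key(version):
    """Return a sort key whose elements are always comparable.

    Numeric parts become (0, number, "") and alphabetic parts become
    (1, 0, text), so an int is never compared with a str.
    """
    key = []
    for part in _PART.findall(version):
        if part.isdigit():
            key.append((0, int(part), ""))
        else:
            key.append((1, 0, part.lower()))
    return tuple(key)


versions = ["1.2", "1.10", "1.9"]
latest = max(versions, key=version_key)  # numeric, not lexical, order
```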
2) Download the relevant source files. To obtain links to the
sources I’ve decided to use the release_urls method, which
returns a list of download URLs for a given package release.
Unfortunately, this method doesn't accept a list as input,
so calling it successively for each package is relatively
slow. As long as it keeps this shape, further optimization is
difficult, so I'm considering modifying this method and
sending patches for it as well.
From the list of files returned by release_urls I’ve chosen
those which have python_version set to source. For Python 3
packages the archives may come in different formats, so I set
the download priority to tar.xz, then tar.bz2, tar.gz and zip.
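The selection rule above could look roughly like this. It is a
sketch under assumptions, not the tool's real code: pick_source
is a hypothetical name, and only the python_version and filename
keys of each entry are used.

```python
# Preferred archive formats, best first (order taken from the text).
PRIORITY = (".tar.xz", ".tar.bz2", ".tar.gz", ".zip")


def pick_source(files):
    """Return the preferred source entry from `files`, or None.

    `files` is a list of dicts shaped like the entries returned by
    release_urls. Entries that are not marked as source, or whose
    file name has an unknown extension, are ignored.
    """
    best = None
    best_rank = len(PRIORITY)
    for entry in files:
        if entry.get("python_version") != "source":
            continue
        name = entry.get("filename", "").lower()
        for rank, ext in enumerate(PRIORITY):
            if name.endswith(ext):
                if rank < best_rank:
                    best, best_rank = entry, rank
                break
    return best
```

A release with no matching entry yields None, which corresponds
to the "packages without source" case.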
Python programmers have many unusual ideas about how to name
their files, so it took me some time to make sure I'm
downloading the appropriate ones. Eventually I've (hopefully)
reached the state where only the right archive is downloaded.
My algorithm doesn’t skip other sources included in releases
(e.g. additional plugins like the ones in Pythomnic3k[a]), but
I had to add special cases for packages such as waferslim[b]
or tuxmodule[c] (i.e. check the comment_text field).
Statistics for downloaded packages at this moment
are as follows:
packages for Python 3:
~~~~~~~~~~~~~~~~~
unique packages: 1016
packages without source: 138
packages for Python 2:
~~~~~~~~~~~~~~~~~
unique packages: 2930
packages without source: 457
NOTE: I’m aware that there are about 15k packages that match
"Programming Language :: Python", but most of them don’t have
any further version classifiers, so I’ll assume that they
support Python 2 only.
NOTE: The packages described as “packages without
source” are those for which the release_urls method doesn’t
return links to the source. In the metadata dictionary
(returned by the release_data method) there’s a download_url
field available, but this link often leads to 3rd party
websites like sourceforge.net[8] that do not link to the
archive directly.
3) Update to the newest version of packages. To check whether
there are new versions of packages in PyPI or new packages
have been added, the list of unique packages which meet my
criteria is generated again and checked against the already
downloaded files. It seemed unnecessary to use release_urls to
check the exact file name again at this point - as I wrote
earlier, calling it takes a lot of time. I realize that this
is not optimal and will try to change it a bit soon.
Developers usually stick to the package_name-version
convention, but there are also exceptions such as Python
Bytecode Verifier[d] or tmdb[e].
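A check of the regenerated list against local files, relying
only on the package_name-version convention, might be sketched
as follows. The helper name missing_releases, the extension
list and the case-insensitive comparison are all assumptions
made for this example.

```python
# Archive extensions considered when guessing a file name (assumed).
EXTENSIONS = (".tar.xz", ".tar.bz2", ".tar.gz", ".zip")


def missing_releases(latest, downloaded):
    """Return {name: version} for releases with no matching local file.

    `latest` maps a package name to its newest version; `downloaded`
    is an iterable of local archive file names. A release counts as
    present when a file is named "<name>-<version><extension>".
    """
    have = {filename.lower() for filename in downloaded}
    missing = {}
    for name, version in latest.items():
        stem = "{}-{}".format(name, version).lower()
        expected = {stem + ext for ext in EXTENSIONS}
        if expected.isdisjoint(have):
            missing[name] = version
    return missing
```

Packages that break the naming convention would be reported as
missing on every run, so they would still need special handling.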
With the help of my mentor, I located the PyPI
sources[9]. There I found the updated_releases method,
which is not mentioned in the documentation but seems to be
useful - I compare my results with it.
-----------------
Summary: My tool is able to find and download the newest
versions of Python 3 packages available on PyPI. It was a
fairly tedious part of the job and I’m glad to have it behind
me. Right now my code works as expected; I'll check how it
behaves after another round of PyPI updates and make
modifications if needed.
-----------------
Plans: In the next few days the most important
task is to design a detailed API for the plugin system, which
will convert the packages into a repository for Debian. I have
to think about how to integrate stdeb[10] and pkgme[11] (the
first two plugins) and to add Python 3 support to both of
them. One of the biggest challenges will be to determine the
build dependencies.
I think that during the last 2 weeks my knowledge of PyPI has
increased dramatically and I can't wait until my knowledge
of Debian packages also becomes a bit fuller :-)
My repository can be followed at:
https://gitorious.org/pypi2deb
----
Natalia Frydrych
----------------
[0] http://wiki.debian.org/SummerOfCode2012/StudentApplications/NataliaFrydrych
[1] http://pypi.python.org/
[2] http://wiki.python.org/moin/PyPiXmlRpc
[3] http://www.aosabook.org/en/packaging.html
[4] http://pypi.python.org/pypi?%3Aaction=list_classifiers
[5] http://www.python.org/dev/peps/pep-0386/
[6] http://docs.python.org/dev/distutils/introduction
[7] http://bugs.python.org/issue14894
[8] http://sourceforge.net
[9] https://bitbucket.org/loewis/pypi/src/3d39a7bcfc26/rpc.py
[10] https://github.com/astraw/stdeb
[11] https://launchpad.net/pkgme
----------------
[a] http://pypi.python.org/pypi/Pythomnic3k/1.2
[b] http://pypi.python.org/pypi/waferslim/1.0.2
[c] http://pypi.python.org/pypi/tuxmodule/1.0 - http://paste.debian.net/172552/
[d] http://pypi.python.org/pypi/Python%20Bytecode%20Verifier/0.1
[e] http://pypi.python.org/pypi/tmdb/0.9