
Report 1 - PyPI to Debian repository converter



Hello,


This is my first progress report on the PyPI to Debian Repository Converter project[0], mentored by Piotr Ożarowski.


----------------
Work: I’ve worked mainly on issues related to the PyPI repository[1] and its XML-RPC interface[2]. My goal was to download the sources of the available Python 3 packages.

In the course of this work I’ve dealt with the following tasks:

----------------
1) Selection of packages intended for Python 3 (as agreed with my mentor, I will work on packages for Python 3 first - once that is ready, I’ll try to add support for Python 2 packages as well).

After reading the Python Packaging chapter of the book “The Architecture of Open Source Applications”[3], I used the browse method from PyPI's XML-RPC interface, which makes it possible to search for packages matching given classifiers[4]. Unfortunately, the classifiers do not make it possible to determine the minimum and/or maximum required version of Python. A package can list specific versions or use the "Programming Language :: Python :: 3" classifier, but “Python :: 3.2” does not imply “Python :: 3”. For this reason I have to call this method once for each specific version, but in the end I’m able to get a list of unique packages, each with a list of its releases available for Python 3.
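The per-classifier querying described above can be sketched roughly as follows. This is only an illustration, not the code from my repository: the function name collect_releases and the exact list of classifiers are assumptions, and browse is passed in as a callable that takes a list of classifiers and returns [name, version] pairs, so the sketch can be tried without network access.

```python
from collections import defaultdict

# Assumed set of classifiers to query; extend it as new versions appear.
PY3_CLASSIFIERS = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.0",
    "Programming Language :: Python :: 3.1",
    "Programming Language :: Python :: 3.2",
    "Programming Language :: Python :: 3.3",
]


def collect_releases(browse, classifiers=PY3_CLASSIFIERS):
    """Call browse once per classifier and merge the results.

    browse is any callable that accepts a list of classifiers and
    returns an iterable of [name, version] pairs.
    """
    releases = defaultdict(set)
    for classifier in classifiers:
        # One call per classifier, because "3.2" does not imply "3".
        for name, version in browse([classifier]):
            releases[name].add(version)
    # Plain dict of name -> list of versions (sorted as strings only,
    # not by version semantics).
    return {name: sorted(versions) for name, versions in releases.items()}
```

With the real service, the callable would presumably be the browse attribute of an xmlrpc.client.ServerProxy pointed at the PyPI XML-RPC endpoint.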

From my point of view it would be helpful if the browse function allowed selecting packages using wildcards in these criteria, or looking for packages that do not meet given conditions. I have added this to my TODO list and, if time permits, I will prepare patches for PyPI’s rpc.py.


I’ve decided to reject packages whose classifiers include 'Development Status :: 1 - Planning', simply because they usually don’t have source files yet. A Debian package for a project in the planning phase is also not the best idea.
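One possible way to apply this rejection is to query the planning classifier separately and subtract the result. The sketch below is an assumption about how it could be done, not a description of my actual code; drop_planning is a made-up name, releases is a dict of package name to list of versions, and browse is again any callable taking a list of classifiers and returning [name, version] pairs.

```python
PLANNING = "Development Status :: 1 - Planning"


def drop_planning(releases, browse):
    """Remove releases that are classified as being in the planning phase.

    releases maps a package name to a list of versions; browse is a
    callable taking a list of classifiers and returning [name, version]
    pairs.
    """
    planned = {(name, version) for name, version in browse([PLANNING])}
    kept = {}
    for name, versions in releases.items():
        remaining = [v for v in versions if (name, v) not in planned]
        if remaining:  # skip packages with no releases left
            kept[name] = remaining
    return kept
```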

I’ve acquainted myself with PEP 386[5], but while sorting a list of versions (harvested from real releases) using the distutils library[6] (in order to select the latest available version), I came across a problem which, at my mentor's suggestion, I reported to Python’s bug tracker[7]. It was the first time I had reported a bug there, and I enjoyed an immediate response. Moreover, my mentor suggested that I look into the library sources and propose an appropriate patch, which I did :-)
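To illustrate the kind of sorting involved, here is a small stand-alone sketch of a tolerant sort key for picking the latest version. It is neither the distutils implementation nor the patch I proposed, and it does not follow PEP 386 ordering (for example, it ranks "1.0a" after "1.0"); version_key and latest are hypothetical names. Numeric and alphabetic components are tagged so that integers and strings are never compared with each other directly.

```python
import re

# A version is split into runs of digits and runs of letters;
# separators such as "." or "-" are ignored.
_COMPONENT = re.compile(r"[0-9]+|[a-zA-Z]+")


def version_key(version):
    """Build a sort key whose elements are always mutually comparable."""
    key = []
    for part in _COMPONENT.findall(version):
        if part.isdigit():
            key.append((1, int(part), ""))      # numeric component
        else:
            key.append((0, 0, part.lower()))    # alphabetic component
    return key


def latest(versions):
    """Return the version with the greatest key."""
    return max(versions, key=version_key)
```

For example, latest(["0.9", "0.10", "0.2"]) picks "0.10", which a plain string comparison would not.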

2) Download the relevant source files. In order to obtain links to the sources I’ve decided to use the release_urls method, which returns a list of download URLs for the given package release. Unfortunately, this method doesn't accept a list as input, so calling it successively for each package is relatively slow. As long as it keeps this shape, further optimization is difficult, so I'm considering modifying this method and sending patches for it as well.


From the list of files returned by release_urls I’ve chosen those which have python_version set to source. In Python 3 it is possible to provide archives in different formats, so I set the download priority to tar.xz, then tar.bz2, tar.gz and zip. Python programmers have many unusual ideas for naming their files, so it took me some time to make sure I'm downloading the appropriate files. Eventually I've (hopefully) reached the state where only the right archive is downloaded. My algorithm doesn’t skip other sources (e.g. additional plugins like the ones in Pythomnic3k[a]) included in releases, but I had to add special cases for packages such as waferslim[b] or tuxmodule[c] (i.e. check the comment_text field).
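The format priority can be sketched like this. It is a simplified illustration that leaves out the special cases mentioned above; pick_source is a hypothetical name, and the input is assumed to be a list of dicts with at least the python_version and filename keys, as returned by release_urls.

```python
# Preferred archive formats, most preferred first.
PRIORITY = (".tar.xz", ".tar.bz2", ".tar.gz", ".zip")


def pick_source(files):
    """Return the preferred source archive entry, or None if there is none.

    files is a list of dicts with "python_version" and "filename" keys.
    """
    sources = [f for f in files if f.get("python_version") == "source"]
    for extension in PRIORITY:
        for entry in sources:
            if entry["filename"].lower().endswith(extension):
                return entry
    return None
```

A release that offers both foo-1.0.zip and foo-1.0.tar.gz as sources would yield the tar.gz entry, and a release with only binary files would yield None.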


Statistics for downloaded packages at this moment are as follows:

packages for Python 3:

~~~~~~~~~~~~~~~~~

unique packages: 1016
packages without source: 138


packages for Python 2:

~~~~~~~~~~~~~~~~~
unique packages: 2930
packages without source: 457

NOTE: I’m aware that there are about 15k packages that match "Programming Language :: Python", but most of them don’t have any further version classifiers, so I’ll assume that they support Python 2 only.


NOTE: The packages described as “packages without source” are those for which the release_urls method doesn’t return links to the source. In the metadata dictionary (obtained by the release_data method) there’s a download_url field available, but this link often leads to 3rd party websites like sourceforge.net[8] and does not point to the archive directly.

3) Update to the newest versions of packages. To check whether there are new versions of packages in PyPI or whether new packages were added, the list of unique packages which meet my criteria is generated again and checked against the already downloaded files. It seemed unnecessary to use release_urls to check the exact file name again at this point - as I wrote earlier, calling it takes a lot of time. I realize that this is not optimal and will try to change it a bit soon. Developers usually stick to the package_name-version convention, but there are also exceptions such as Python Bytecode Verifier[d] or tmdb[e].
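A rough sketch of this check, relying only on the package_name-version convention, could look as follows. The name missing_releases and the list of extensions are assumptions made for this illustration, and packages that do not follow the convention would be reported as missing every time.

```python
# Archive extensions that a downloaded source file may have (assumed).
SOURCE_EXTENSIONS = (".tar.xz", ".tar.bz2", ".tar.gz", ".zip")


def missing_releases(latest, downloaded_files):
    """List (name, version) pairs that have no matching downloaded file.

    latest maps a package name to its newest version; downloaded_files
    is an iterable of file names already present on disk.
    """
    have = {name.lower() for name in downloaded_files}
    missing = []
    for name, version in sorted(latest.items()):
        stem = "{}-{}".format(name, version).lower()
        # A release counts as downloaded if name-version plus any known
        # archive extension is already present.
        if not any(stem + ext in have for ext in SOURCE_EXTENSIONS):
            missing.append((name, version))
    return missing
```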


With the help of my mentor, I located the PyPI sources[9]. There I found the updated_releases method, which is not mentioned in the documentation but seems to be useful - I compare my results with it.


-----------------
Summary: My tool is able to find and download the newest versions of Python 3 packages available in PyPI. It was a fairly tedious part of the job and I’m glad that I have it behind me. Right now my code works as expected; I'll check how it behaves after another round of PyPI updates and make modifications if needed.


-----------------

Plans: In the next few days the most important task is to design a detailed API for the plugin system, which will convert the packages into a Debian repository. I have to think about how to integrate stdeb[10] and pkgme[11] (the first two plugins) and how to add Python 3 support to both of them. One of the biggest challenges will be determining the build dependencies.


I think that during the last 2 weeks my knowledge of PyPI has increased dramatically, and I can't wait until my knowledge of Debian packages also becomes a bit fuller :-)


My repository can be followed at: https://gitorious.org/pypi2deb
----
Natalia Frydrych


----------------
[0] http://wiki.debian.org/SummerOfCode2012/StudentApplications/NataliaFrydrych
[1] http://pypi.python.org/
[2] http://wiki.python.org/moin/PyPiXmlRpc
[3] http://www.aosabook.org/en/packaging.html
[4] http://pypi.python.org/pypi?%3Aaction=list_classifiers
[5] http://www.python.org/dev/peps/pep-0386/
[6] http://docs.python.org/dev/distutils/introduction
[7] http://bugs.python.org/issue14894
[8] http://sourceforge.net
[9] https://bitbucket.org/loewis/pypi/src/3d39a7bcfc26/rpc.py
[10] https://github.com/astraw/stdeb
[11] https://launchpad.net/pkgme


----------------

[a] http://pypi.python.org/pypi/Pythomnic3k/1.2

[b] http://pypi.python.org/pypi/waferslim/1.0.2

[c] http://pypi.python.org/pypi/tuxmodule/1.0 - http://paste.debian.net/172552/

[d] http://pypi.python.org/pypi/Python%20Bytecode%20Verifier/0.1

[e] http://pypi.python.org/pypi/tmdb/0.9

