
Re: [Distutils] formencode as .egg in Debian ??



At 12:54 PM 11/25/2005 +1100, David Arnold wrote:
So, if a system package, shipped by the upstream developer as an egg, is
"unpacked" into a directory structure, and its metadata is maintained
in a .egg-info file somewhere in sys.path, non-system eggs will have all
they need to operate correctly?

Yes, with a few clarifications. The internal structure of an egg, let's say foobar-1.2-py23.egg, would look something like:

    foobar/
           __init__.py
           baz.py
           # plus .pyc files, etc.

    EGG-INFO/
             PKG-INFO  # distutils metadata like description/version
             requires.txt   # optional and required dependencies
             # plus other metadata files, either setuptools-defined or
             # project specific

If you unpack this as-is, but rename EGG-INFO to foobar.egg-info (today) or foobar-1.2.egg-info (when I release 0.6a9 of setuptools), and the whole tree above is in a directory on sys.path, this egg is good to go.
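As a rough illustration of that layout, here is a minimal sketch using only the standard library. It builds the unpacked tree in a scratch directory and reads the name and version back out of PKG-INFO. The helper names (unpack_egg_layout, read_pkg_info) and the file contents are made up for the example; this is not how setuptools itself installs or reads anything.

```python
import email
import tempfile
from pathlib import Path


def unpack_egg_layout(site_dir: Path) -> None:
    """Create the unpacked foobar 1.2 layout directly inside site_dir."""
    package = site_dir / "foobar"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "baz.py").write_text("GREETING = 'hello from baz'\n")

    # EGG-INFO renamed to <project>.egg-info, sitting next to the package.
    info = site_dir / "foobar.egg-info"
    info.mkdir()
    (info / "PKG-INFO").write_text(
        "Metadata-Version: 1.0\nName: foobar\nVersion: 1.2\n"
    )
    (info / "requires.txt").write_text("")  # no dependencies in this sketch


def read_pkg_info(site_dir: Path, project: str) -> tuple[str, str]:
    """Return (name, version) from <project>.egg-info/PKG-INFO."""
    text = (site_dir / f"{project}.egg-info" / "PKG-INFO").read_text()
    headers = email.message_from_string(text)  # PKG-INFO uses RFC 822 style headers
    return headers["Name"], headers["Version"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        site_dir = Path(tmp)
        unpack_egg_layout(site_dir)
        print(read_pkg_info(site_dir, "foobar"))
```

If site_dir were a directory on sys.path, "import foobar" would work as for any ordinary package, and the .egg-info directory next to it is what identifies the project and its version.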

I would like to clarify the phrase "shipped as an egg", though. To me, that would mean that the developer is distributing a binary .egg file, and I'm assuming that Debian is primarily interested in *source* packages, being a Free Software distribution. (A binary .egg doesn't have to contain source code at all; you can specifically build it with the source stripped if you desire.) The plan for setuptools 0.6a9 is to provide an option to "setup.py install" that will basically install the layout described above, with the correctly named .egg-info directory automatically created. (Normally, the whole tree above is instead nested in an .egg file or directory.)

I think I should also clarify that whether the upstream developer sets out to package their project as an egg or not, it's possible to create an .egg-info directory and PKG-INFO file to identify that distribution, using setuptools' "easy_install" program and the source distribution. So if the developer of 'foobar' did not choose to create an egg or use setuptools, this doesn't stop a developer who wants to *use* foobar from simply running easy_install to create an .egg file for it. So, this is what I mean when I say there's no such thing as a non-egg package for an egg developer. Someone who depends on a package can simply say they depend on it, and when they build their package, they'll get eggs for their dependencies as a side effect.


So that's another goal of eggs?  To provide information to a package
maintainer to assist in determining if it's the user's PYTHONPATH or
.pth files that are causing a bug?

More specifically, what versions of what packages they're *actually* using, as opposed to what they think they installed or have on their system. PYTHONPATH and .pth files can of course be a factor in that, but also just people thinking they installed something, or not knowing that a bug is fixed in a particular version. Part of it too is finding out whether they're reporting a regression or whether they're just still using a version that has a bug that's been fixed. In the case of the TurboGears mailing list, it's often been the case that TurboGears users flush out a bug in a dependency, which then gets fixed, but then a new TurboGears user maybe reports the same problem, and then it's obvious from their error message whether or not they upgraded.

I realize this is stuff you guys probably do all day for system packages, but eggs make the support job easier upstream too.


I can see that this is *nice*; I'd debate "need".  But I'm happy to
accept that for egg-based stuff, this is a nice feature.

Well, need is relative. A project like TurboGears "needs" this, because otherwise it would be uneconomical to provide the current level of support on as many platforms. So, one project's "nice to have" may be another project's lifeblood, depending on available resources. They've also made it easier for the authors of TurboGears' dependencies to assist in support as well. For me, I'm glad that these features have helped to make something like TurboGears possible and practical.


I'm not going to try to assert "Unix values" here.  My observation is
that historically, Unix has installed things into one of a couple of
directory hierarchies (/usr, /usr/local, /opt).  Within those
hierarchies, there has been scope for only one version of any given
thing.

Um, sure. Not sure what this has to do with the present discussion. As a practical matter, only *one* version of an egg can be *active* (i.e. importable) on sys.path within a given process anyway. It's also clearly not going to be the case on a Debian system that somebody would have multiple versions of something living in /usr/lib, although they might do it for /usr/local or in a user-private directory.

So, I think maybe I lost the train of thought on this point. I was under the impression that the consensus of the Debian-Python folks so far was that, of the possible egg formats, the "single version externally managed" one using .egg-info directories was preferred, since it is basically the same as your current layout. (It's also convenient for me to implement, because it's basically the same as the format already used by the "setup.py develop" command for temporarily adding a project's source checkout to sys.path.)


  Phillip> And we'd like all this to cleanly work with any
  Phillip> locally-installed non-Debian eggs that might be in the mix,
  Phillip> since we need to do development, beta testing, etc.

  >>  And non-egg packages as well, right?

  Phillip> There isn't any such thing, from an egg developer's
  Phillip> perspective.

Really?  So if I use one egg, everything has to be an egg?

I'm not sure I follow you. If I'm an egg developer, and I want to use other Python packages in my project, I add their project names and versions to my setup.py, and then I get them installed for free. If an .egg-info on sys.path indicates that the project I want is already on my system, then the tools don't go hunting on PyPI and the runtime doesn't gripe about missing dependencies.
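To make the "already on my system" check concrete, here is a simplified sketch of the idea, not the actual setuptools logic. It only looks at marker names, ignores version constraints, and assumes project names contain no hyphen; the function names are invented for the example.

```python
from pathlib import Path


def installed_projects(search_path: list[str]) -> set[str]:
    """Collect project names advertised by *.egg-info entries on search_path."""
    found = set()
    for entry in search_path:
        directory = Path(entry)
        if not directory.is_dir():
            continue
        for marker in directory.glob("*.egg-info"):
            # "foobar.egg-info" or "foobar-1.2.egg-info": take the part
            # before the first "-" as the project name (a simplification).
            stem = marker.name[: -len(".egg-info")]
            found.add(stem.split("-", 1)[0].lower())
    return found


def needs_download(requirement: str, search_path: list[str]) -> bool:
    """True when no marker for the required project is found on search_path."""
    return requirement.lower() not in installed_projects(search_path)
```

Called as needs_download("foobar", sys.path), this returns False as soon as any directory on the path carries a foobar marker, which is the situation where the tools have no reason to go looking on PyPI.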

Note again that the dependencies *don't* need to be distributed as eggs. They can be distributed as source, eggs, .exe installers (Windows only), or Subversion URLs, as long as either PyPI has a usable link or I supply one in my project configuration. These dependencies' authors don't even need to have heard of the concept of eggs; they just need a reasonably standard Python distutils package with a setup.py.

Thus, if I'm developing an egg, yes, all my dependencies have to be eggs, but this doesn't imply that I'm pushing eggification upstream, it just means that I can install their package as an egg locally, which essentially amounts to adding the PKG-INFO file in either an EGG-INFO or .egg-info directory. (The distutils normally generate this PKG-INFO file as part of creating a source distribution, so it's not even an egg-specific file format.)

So, projects using setuptools get to take advantage of most any project using distutils, and the upstream projects are modified only by adding the egg-info, in order to allow the tools and runtime to know when a dependency has already been satisfied.

While I don't advocate changing all Debian Python packages to add this metadata, I do suggest it's a practical way to deal with certain dependency issues. For example, TurboGears depends on ElementTree, which is not packaged as an egg by its author. (I think that Kid, which is also an egg-packaged TurboGears dependency, may depend on ElementTree as well.) Anyway, the quickest way to get all this stuff working without a lot of hacks to the dependency metadata would be to install an .egg-info marker with the ElementTree package, so that the egg tools and runtime on any user's machine will simply know what version of ElementTree is present, and be happy.

I know - you can think of other ways to deal with this. However, most of the ways that have been suggested to date fail in the use case where a user has been using the Debian package, and Kevin moves to requiring a new version of ElementTree or some other dependency, perhaps a new SVN revision that hasn't been released -- foobar-1.3.dev-r4262, let's say. (Setuptools users can have their builds tagged with a repository revision number.) This release of foobar isn't going to be in Debian unless you're tracking subversion revisions of experimental projects daily - and maybe you are, I don't know. The point is that when the Debian package no longer satisfies the dependency, the egg tools move smoothly to downloading and installing wherever the user has configured their development environment to install it, say their ~/pydev directory. So now we've segued smoothly into "multiple versions" being installed, but the "system version" is still intact.

A month later, a stable package is released and I upgrade my Debian install. This is a later version than the development version I have in ~/pydev, so the egg tools switch back to the Debian version as the preferred one, unless I have a .pth file specifically requesting the ~/pydev version as the active version for the other work I'm doing. (And even then they'll still prefer the Debian version if I don't have a ~/pydev version that satisfies something's dependency.)

These transitions can only be so seamless if the Debian-installed version of foobar includes the egg-info marker so that the tools know what version is sitting in /usr/lib, as opposed to the version(s) I have hanging in my ~/pydev.
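The switching described above can be sketched roughly as follows. This is an illustration only: version_key is a deliberately crude ordering (numeric parts, with ".dev" builds sorting before the matching release) and not the real setuptools version comparison, the "pinned" argument stands in for a .pth file requesting a particular location, and dependency requirements are ignored.

```python
import re
from typing import Optional


def version_key(version: str) -> tuple:
    """Crude sort key: release numbers, then final-vs-dev, then revision."""
    release, dev_marker, rest = version.partition(".dev")
    numbers = tuple(int(n) for n in re.findall(r"\d+", release))
    revision = int("".join(re.findall(r"\d+", rest)) or 0)
    is_final = 0 if dev_marker else 1  # a dev build sorts before its release
    return (numbers, is_final, revision)


def pick_active(candidates: dict[str, str], pinned: Optional[str] = None) -> str:
    """candidates maps install location -> version found there."""
    if pinned in candidates:
        return pinned  # an explicit request wins
    return max(candidates, key=lambda loc: version_key(candidates[loc]))


if __name__ == "__main__":
    before_upgrade = {"/usr/lib": "1.2", "~/pydev": "1.3.dev-r4262"}
    after_upgrade = {"/usr/lib": "1.3", "~/pydev": "1.3.dev-r4262"}
    print(pick_active(before_upgrade))
    print(pick_active(after_upgrade))
    print(pick_active(after_upgrade, pinned="~/pydev"))
```

None of this can work unless the version in /usr/lib is discoverable in the first place, which is the job of the egg-info marker.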


  Phillip> Any distutils package can be made into an egg, because all of
  Phillip> the metadata needed is supplied by the standard distutils
  Phillip> setup script.  So, if you have the source, you can make it an
  Phillip> egg.

What if I don't have the source (or setup.py) ?

What do you have instead? There really aren't many formats for shipping binary Python packages. The only ones provided by the distutils are bdist_dumb, bdist_wininst, and bdist_rpm. It seems to me that all of these formats except bdist_dumb include enough metadata to be able to get the project name and version, which is all you need to create enough metadata to make a usable egg. The "easy_install" tool actually supports turning bdist_wininst packages into eggs directly. I'm not sure if you could do it with a bdist_dumb. A bdist_rpm probably has most of what you need just in the filename alone, at least if you're doing it manually. (Distutils-built distributions' filenames are too ambiguously formatted for automated parsing, alas, even though a human reader can usually tell what they mean.)
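To show the kind of ambiguity meant here, a small sketch: given a hyphen-separated basename, it lists every way of splitting it into a name and a version where the version starts with a digit. The basename used in the comment below is hypothetical, and the digit rule is just an assumption made for the illustration.

```python
def candidate_splits(basename: str) -> list[tuple[str, str]]:
    """Every (name, version) reading in which the version starts with a digit."""
    parts = basename.split("-")
    splits = []
    for i in range(1, len(parts)):
        if parts[i][:1].isdigit():
            splits.append(("-".join(parts[:i]), "-".join(parts[i:])))
    return splits


# A hypothetical basename such as "foo-2-1.0" yields more than one reading:
# project "foo" at version "2-1.0", or project "foo-2" at version "1.0".
```

A human usually knows which reading is right; a tool looking only at the filename does not.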

Anyway, all you need to make a non-egg package into an egg is its project name and version number. If you have those two things, you can make a PKG-INFO file, and that's all you need for today's egg runtime. For 0.6a9, you won't even need to put the data in a file; the filename alone will carry it.
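For illustration, both forms can be produced from nothing but a name and a version. The following is a sketch under the assumptions stated in the comments; write_marker is a made-up helper, not a setuptools API, and the PKG-INFO it writes contains only the bare minimum of fields.

```python
from pathlib import Path


def write_marker(site_dir: Path, name: str, version: str,
                 filename_only: bool = False) -> Path:
    """Write an egg-info marker for an already-installed package."""
    if filename_only:
        # Form described for 0.6a9: an empty file whose name carries
        # the project name and version.
        marker = site_dir / f"{name}-{version}.egg-info"
        marker.touch()
        return marker

    # Form usable today: <name>.egg-info/PKG-INFO with name and version inside.
    info_dir = site_dir / f"{name}.egg-info"
    info_dir.mkdir()
    pkg_info = info_dir / "PKG-INFO"
    pkg_info.write_text(
        f"Metadata-Version: 1.0\nName: {name}\nVersion: {version}\n"
    )
    return pkg_info
```

For example, write_marker(Path("/some/dir/on/sys.path"), "foobar", "1.2") would create foobar.egg-info/PKG-INFO there; the directory shown is a placeholder.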


Accepting that there will be parallel (I hesitate to say "competing")
systems, and that keeping them in sync is both hard and necessary seems
to be the open issue.

I think this may actually be an illusion, perhaps brought about by preconceptions based on experiences with other packaging systems. All we need is that:

1. For Debian packages of setuptools-using packages (i.e., projects like FormEncode that explicitly set themselves up to be eggs), all the included metadata is installed in an .egg-info directory alongside the package. This is nothing more than including all the package's required contents, so there's no "parallel" anything going on here.

2. For Debian packages of non-setuptools packages, that are a dependency of a setuptools-using package, add an empty .egg-info file named for the dependency's project name and version number, as specified in its setup.py name/version options. This is just a simple addition to the packaging, and again doesn't seem to create any "parallel" anything. You do not need to go back and repackage every single Debian-Python package unless you feel that that's a more efficient way to handle it. You can simply add the .egg-info on an as-needed basis, when you package setuptools-using projects.

Now, there is the separate issue of whether you want to create a separate pyegg or python-pypi namespace for these packages, so that you can keep a closer match between package names and PyPI project names. That's for you guys to decide, as that's a matter of policy and process. But I don't see anything forcing you to make such a split, so again I don't get the "parallel" part.


