Re: [Distutils] formencode as .egg in Debian ??

To: "David Arnold" <david@mantara.com>
Cc: Distutils-Sig@python.org, debian-python@lists.debian.org
Subject: Re: [Distutils] formencode as .egg in Debian ??
From: "Phillip J. Eby" <pje@telecommunity.com>
Date: Fri, 25 Nov 2005 01:33:05 -0500
Message-id: <5.1.1.6.0.20051125003059.02c5de70@mail.telecommunity.com>
In-reply-to: <E1EfSmy-0004aw-00@server.0x1.org>
References: <Your message of "Thu, 24 Nov 2005 19:16:44 CDT." <5.1.1.6.0.20051124184858.01f9c2c8@mail.telecommunity.com>

At 12:54 PM 11/25/2005 +1100, David Arnold wrote:

So, if a system package, shipped by the upstream developer as an egg, is
"unpacked" into a directory structure, and its metadata is maintained
in a .egg-info file somewhere in sys.path, non-system eggs will have all
they need to operate correctly?

Yes, with a few clarifications. The internal structure of an egg, let's say foobar-1.2-py23.egg, would look something like:


    foobar/
           __init__.py
           baz.py
           # plus .pyc files, etc.

    EGG-INFO/
             PKG-INFO  # distutils metadata like description/version
             requires.txt   # optional and required dependencies
             # plus other metadata files, either setuptools-defined or
             # project specific

If you unpack this as-is, but rename EGG-INFO to foobar.egg-info (today) or foobar-1.2.egg-info (when I release 0.6a9 of setuptools), and the whole tree above is in a directory on sys.path, this egg is good to go.
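As a rough illustration of the unpacked layout just described, here is a minimal sketch that builds foobar/ next to foobar-1.2.egg-info/ in a scratch directory and then asks Python what is installed there. It assumes a modern interpreter with the standard library's importlib.metadata (Python 3.8 or later), which is not the setuptools 0.6 runtime discussed in this message; the helper names build_unpacked_egg and installed_projects are made up for the example.

```python
import os
import tempfile
from importlib import metadata  # modern stdlib module; an assumption, not the 2005 egg runtime


def build_unpacked_egg(root):
    """Create the unpacked layout: foobar/ alongside foobar-1.2.egg-info/."""
    pkg_dir = os.path.join(root, "foobar")
    info_dir = os.path.join(root, "foobar-1.2.egg-info")
    os.makedirs(pkg_dir)
    os.makedirs(info_dir)
    # An empty __init__.py is enough to make foobar an importable package.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    # PKG-INFO holds the distutils metadata; name and version are the minimum.
    with open(os.path.join(info_dir, "PKG-INFO"), "w") as f:
        f.write("Metadata-Version: 1.0\nName: foobar\nVersion: 1.2\n")


def installed_projects(root):
    """Map project name -> version for every metadata directory found in root."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions(path=[root])
    }


# Usage: build the layout in a temporary directory and list what is found.
with tempfile.TemporaryDirectory() as root:
    build_unpacked_egg(root)
    print(installed_projects(root))
```

The point of the sketch is only that the metadata directory sitting beside the package is what lets tools answer "which version of foobar is this?" without touching the package code itself.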

I would like to clarify the phrase "shipped as an egg", though. To me, that would mean that the developer is distributing a binary .egg file, and I'm assuming that Debian is primarily interested in *source* packages, being a Free Software distribution. (A binary .egg doesn't have to contain source code at all; you can specifically build it with the source stripped if you desire.) The plan for setuptools 0.6a9 is to provide an option to "setup.py install" that will basically install the layout described above, with the correctly named .egg-info directory automatically created. (Normally, the whole tree above is instead nested in an .egg file or directory.)

I think I should also clarify that whether the upstream developer sets out to package their project as an egg or not, it's possible to create an .egg-info directory and PKG-INFO file to identify that distribution, using setuptools' "easy_install" program and the source distribution. So if the developer of 'foobar' did not choose to create an egg or use setuptools, this doesn't stop a developer who wants to *use* foobar from simply running easy_install to create an .egg file for it. So, this is what I mean when I say there's no such thing as a non-egg package for an egg developer. Someone who depends on a package can simply say they depend on it, and when they build their package, they'll get eggs for their dependencies as a side effect.

So that's another goal of eggs?  To provide information to a package
maintainer to assist in determining if it's the user's PYTHONPATH or
.pth files that are causing a bug?

More specifically, what versions of what packages they're *actually* using, as opposed to what they think they installed or have on their system. PYTHONPATH and .pth files can of course be a factor in that, but also just people thinking they installed something, or not knowing that a bug is fixed in a particular version. Part of it too is finding out whether they're reporting a regression or whether they're just still using a version that has a bug that's been fixed. In the case of the TurboGears mailing list, it's often been the case that TurboGears users flush out a bug in a dependency, which then gets fixed, but then a new TurboGears user maybe reports the same problem, and then it's obvious from their error message whether or not they upgraded.

I realize this is stuff you guys probably do all day for system packages, but eggs make the support job easier upstream too.

I can see that this is *nice*; I'd debate "need".  But I'm happy to
accept that for egg-based stuff, this is a nice feature.

Well, need is relative. A project like TurboGears "needs" this, because otherwise it would be uneconomical to provide the current level of support on as many platforms. So, one project's "nice to have" may be another project's lifeblood, depending on available resources. They've also made it easier for the authors of TurboGears' dependencies to assist in support as well. For me, I'm glad that these features have helped to make something like TurboGears possible and practical.

I'm not going to try to assert "Unix values" here.  My observation is
that historically, Unix has installed things into one of a couple of
directory hierarchies (/usr, /usr/local, /opt).  Within those
hierarchies, there has been scope for only one version of any given
thing.

Um, sure. Not sure what this has to do with the present discussion. As a practical matter, only *one* version of an egg can be *active* (i.e. importable) on sys.path within a given process anyway. It's also clearly not going to be the case on a Debian system that somebody would have multiple versions of something living in /usr/lib, although they might do it for /usr/local or in a user-private directory.

So, I think maybe I lost the train of thought on this point here. I was under the impression that the consensus of the Debian-Python folks so far was that of any egg format, the "single version externally managed" one using .egg-info directories was preferred, since it is basically the same as your current layout. (It's also convenient for me to implement, because it's basically the same as the format already used by the "setup.py develop" command for temporarily adding a project's source checkout to sys.path.)

  Phillip> And we'd like all this to cleanly work with any
  Phillip> locally-installed non-Debian eggs that might be in the mix,
  Phillip> since we need to do development, beta testing, etc.

  >>  And non-egg packages as well, right?

  Phillip> There isn't any such thing, from an egg developer's
  Phillip> perspective.

Really?  So if I use one egg, everything has to be an egg?

I'm not sure I follow you. If I'm an egg developer, and I want to use other Python packages in my project, I add their project names and versions to my setup.py, and then I get them installed for free. If an .egg-info on sys.path indicates that the project I want is already on my system, then the tools don't go hunting on PyPI and the runtime doesn't gripe about missing dependencies.

Note again that the dependencies *don't* need to be distributed as eggs. They can be distributed as source, eggs, .exe installers (Windows only), or Subversion URLs, as long as either PyPI has a usable link, or if I supply one in my project configuration. These dependencies' authors don't even need to have heard of the concept of eggs, they just need a reasonably-standard Python distutils package with a setup.py.

Thus, if I'm developing an egg, yes, all my dependencies have to be eggs, but this doesn't imply that I'm pushing eggification upstream, it just means that I can install their package as an egg locally, which essentially amounts to adding the PKG-INFO file in either an EGG-INFO or .egg-info directory. (The distutils normally generate this PKG-INFO file as part of creating a source distribution, so it's not even an egg-specific file format.)

So, projects using setuptools get to take advantage of most any project using distutils, and the upstream projects are modified only by adding the egg-info, in order to allow the tools and runtime to know when a dependency has already been satisfied.

While I don't advocate changing all Debian Python packages to add this metadata, I do suggest it's a practical way to deal with certain dependency issues. For example, TurboGears depends on ElementTree, which is not packaged as an egg by its author. (I think that Kid, which is also an egg-packaged TurboGears dependency, may depend on ElementTree as well.) Anyway, the quickest way to get all this stuff working without a lot of hacks to the dependency metadata would be to install an .egg-info marker with the ElementTree package, so that the egg tools and runtime on any user's machine will simply know what version of ElementTree is present, and be happy.

I know - you can think of other ways to deal with this. However, most of the ways that have been suggested to date fail in the use case where a user has been using the Debian package, and Kevin moves to requiring a new version of ElementTree or some other dependency, perhaps a new SVN revision that hasn't been released -- foobar-1.3.dev-r4262, let's say. (Setuptools users can have their builds tagged with a repository revision number.) This release of foobar isn't going to be in Debian unless you're tracking subversion revisions of experimental projects daily - and maybe you are, I don't know. The point is that when the Debian package no longer satisfies the dependency, the egg tools move smoothly to downloading and installing wherever the user has configured their development environment to install it, say their ~/pydev directory. So now we've segued smoothly into "multiple versions" being installed, but the "system version" is still intact.

A month later, a stable package is released and I upgrade my Debian install. This is a later version than the development version I have in ~/pydev, so the egg tools switch back to that as the preferred version unless I have a .pth file specifically requesting activation of the ~/pydev version as the active version for the other work I'm doing. (And even then it'll still prefer the Debian version if I don't have a ~/pydev version that satisfies something's dependency.)

These transitions can only be so seamless if the Debian-installed version of foobar includes the egg-info marker so that the tools know what version is sitting in /usr/lib, as opposed to the version(s) I have hanging in my ~/pydev.
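The back-and-forth between the system copy and the ~/pydev copy can be sketched roughly as below. This is only an illustration of the preference rule described above, not the actual setuptools resolution code: the function names are hypothetical, versions are assumed to be plain dotted numbers (so a tag like 1.3.dev-r4262 is out of scope), and the "pinned" argument stands in for a .pth file that explicitly requests one location.

```python
def parse_version(text):
    """Turn '1.2.1' into (1, 2, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def pick_version(candidates, minimum, pinned=None):
    """Choose which installed copy to activate.

    candidates: list of (version_string, location) pairs that the tools
                know about because each location carries an egg-info marker.
    minimum:    lowest version that satisfies the dependency.
    pinned:     optional location the user has explicitly requested.
    Returns the chosen (version_string, location), or None if nothing fits.
    """
    def key(candidate):
        return parse_version(candidate[0])

    satisfying = [c for c in candidates if key(c) >= parse_version(minimum)]
    if not satisfying:
        return None  # nothing installed will do; a download would be needed
    if pinned is not None:
        pinned_ok = [c for c in satisfying if c[1] == pinned]
        if pinned_ok:
            return max(pinned_ok, key=key)
        # The pinned location has nothing suitable, so fall through.
    return max(satisfying, key=key)


# Before the Debian upgrade: only the ~/pydev copy is new enough.
before = [("1.2", "/usr/lib"), ("1.3", "~/pydev")]
print(pick_version(before, minimum="1.3"))

# After the Debian upgrade: the system copy is newer and wins by default.
after = [("1.4", "/usr/lib"), ("1.3", "~/pydev")]
print(pick_version(after, minimum="1.3"))
print(pick_version(after, minimum="1.3", pinned="~/pydev"))
```

Without a marker for the /usr/lib copy, that entry would simply be missing from the candidate list, and the choice could not take the system version into account at all.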

  Phillip> Any distutils package can be made into an egg, because all of
  Phillip> the metadata needed is supplied by the standard distutils
  Phillip> setup script.  So, if you have the source, you can make it an
  Phillip> egg.

What if I don't have the source (or setup.py) ?

What do you have instead? There really aren't many formats for shipping binary Python packages. The only ones provided by the distutils are bdist_dumb, bdist_wininst, and bdist_rpm. It seems to me that all of these formats except bdist_dumb include enough metadata to be able to get the project name and version, which is all you need to create enough metadata to make a usable egg. The "easy_install" tool actually supports turning bdist_wininst packages into eggs directly. I'm not sure if you could do it with a bdist_dumb. A bdist_rpm probably has most of what you need just in the filename alone, at least if you're doing it manually. (Distutils-built distributions' filenames are too ambiguously formatted for automated parsing, alas, even though a human reader can usually tell what they mean.)

Anyway, all you need to make a non-egg package into an egg is its project name and version number. If you have those two things, you can make a PKG-INFO file, and that's all you need for today's egg runtime. For 0.6a9, you won't even need to put the data in a file, just the filename.

Accepting that there will be parallel (I hesitate to say "competing")
systems, and that keeping them in sync is both hard and necessary seems
to be the open issue.

I think this may actually be an illusion, perhaps brought about by preconceptions based on experiences with other packaging systems. All we need is that:

1. For Debian packages of setuptools-using packages (i.e., projects like FormEncode that explicitly set themselves up to be eggs), all the included metadata is installed in an .egg-info directory alongside the package. This is nothing more than including all the package's required contents, so there's no "parallel" anything going on here.

2. For Debian packages of non-setuptools packages, that are a dependency of a setuptools-using package, add an empty .egg-info file named for the dependency's project name and version number, as specified in its setup.py name/version options. This is just a simple addition to the packaging, and again doesn't seem to create any "parallel" anything. You do not need to go back and repackage every single Debian-Python package unless you feel that that's a more efficient way to handle it. You can simply add the .egg-info on an as-needed basis, when you package setuptools-using projects.
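To make the "empty file named for project and version" idea in item 2 concrete, here is a small sketch of how such a marker name could be built and read back. The helper names and the ElementTree version number are illustrative only, and the sketch assumes names and versions contain no hyphens; real tools escape such characters, which this example does not attempt to reproduce.

```python
SUFFIX = ".egg-info"


def marker_filename(name, version):
    """Build a marker name such as 'ElementTree-1.2.6.egg-info'."""
    if "-" in name or "-" in version:
        # Hyphens would make the name ambiguous to split; refuse them here.
        raise ValueError("this sketch does not handle hyphens")
    return name + "-" + version + SUFFIX


def parse_marker(filename):
    """Recover (name, version) from a marker name.

    An unversioned marker such as 'foobar.egg-info' yields (name, None).
    """
    if not filename.endswith(SUFFIX):
        raise ValueError("not an .egg-info marker: " + filename)
    stem = filename[:-len(SUFFIX)]
    name, sep, version = stem.partition("-")
    return (name, version) if sep else (name, None)


# Usage: the version number here is only an example value.
marker = marker_filename("ElementTree", "1.2.6")
print(marker)
print(parse_marker(marker))
```

Because the file can be empty, adding it to a Debian package is a packaging change only; nothing in the dependency's own code or layout has to move.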

Now, there is the separate issue of whether you want to create a separate pyegg or python-pypi namespace for these packages, so that you can keep a closer match between package names and PyPI project names. That's for you guys to decide, as that's a matter of policy and process. But I don't see anything forcing you to make such a split, so again I don't get the "parallel" part.
