
Re: [Distutils] formencode as .egg in Debian ??



At 11:08 AM 11/23/2005 +0100, Martin v. Löwis wrote:
As for terminology, you seem to suggest to use "distribution" where
Debian uses "package". So "Debian package" would become "Debian
distribution".

No, I'm fine with "Debian package"; I was using "distribution" in the sense of "distutils distribution", such that you can have a "Debian package" of a "distutils distribution". The issue is that a "Python package" is not 1:1 with either a "Debian package" or a "distutils distribution". An "egg" is a "distutils distribution" that may or may not contain "Python packages", but also contains "egg metadata" which is specific to the "distribution", not to any individual Python module or Python package contained within that distribution.


I'll try to use "project" in your sense and "package" in the
Python sense whenever I can.

Great - and let's use "Debian package" to mean the thing that manages the installation of a project containing packages. :)


Phillip J. Eby wrote:
An "egg" is a "distribution" of a "project" that is importable and can carry both standardized and individualized metadata that can be read by the pkg_resources module. There are various distribution *formats* in which an "egg" may be physically manifested, but the "egg" itself is a logical concept, not a physical one. It is therefore, as I said, "not merely a distribution format". Is that any clearer?

Yes. When I said "an egg", I meant "a zipfile with a .egg extension,
or a directory with a .egg extension". In response to

# [...] who will quite simply need eggs for many packages.
# If Debian doesn't provide them, the users will be forced to obtain
# them elsewhere.

I meant

"Debian should provide the distributions, but not as .egg files";
it should provide the distribution as a deb file. So users are provided
with the project, but in a form that is not one of the three forms
an egg could have.

I was referring to how the distribution is *installed*. You don't use things directly from a deb file; they have to be installed on the system. When you install an egg, you must use one of the three forms, or the system as a whole will not function. Eggs that depend on the egg will not be able to find it, nor use any plugins it contains. Eggs that define a plugin system of their own will usually define self-plugins in their own metadata, as this is considered good style as well as being more convenient. Such eggs will not function without their *own* metadata installed. (Setuptools is an example of this, and I believe Trac 1.0 will be similar; some of the Paste projects may be using this already, too.)

So, when I say it is a contradiction in terms to install an egg in a non-egg form, I mean that it is nonsensical to say that you have installed it, because it will be unusable (by other eggs), nonfunctional (by itself), or both.


The "contradiction in terms" was that I took your meaning of "package" to be the same as my term "project" - i.e., a functional collection of Python resources. Projects that *are* eggs can't be provided "but not as eggs". They *are* eggs, so not providing them as eggs means not providing them at all.

I would expect that you can "unegg" a project.

For projects that make use of eggs, you expect wrong. Try it with setuptools, and you will find that it is unable to even run its own tests, because the "test" command is registered via an entry point. Entry points are just one kind of project metadata that can be registered; other projects like Trac and SQLObject have their own kinds of metadata as well. None of this metadata is accessible without the EGG-INFO or .egg-info directory; removing it is like removing the JavaBean metadata or the deployment descriptors from Java jars, rendering the jar useless in many contexts, despite the fact that all the "code" remains.
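To make the role of this metadata concrete, here is a minimal sketch of reading an entry_points.txt-style file using only the standard library. The file contents below are illustrative, and `read_entry_points` is a hypothetical helper, not part of setuptools or pkg_resources; the point is only that, once the metadata is gone, there is nothing left to look a command up in.

```python
import configparser

# Illustrative contents of an EGG-INFO/entry_points.txt file.
# Each section is an entry point group; each line maps a name to "module:attr".
ENTRY_POINTS_TXT = """
[distutils.commands]
test = setuptools.command.test:test

[console_scripts]
easy_install = setuptools.command.easy_install:main
"""


def read_entry_points(text):
    """Return {group: {name: 'module:attr'}} parsed from entry_points.txt-style text.

    Hypothetical helper for illustration; the real lookup is done by pkg_resources.
    """
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return {group: dict(parser.items(group)) for group in parser.sections()}


entry_points = read_entry_points(ENTRY_POINTS_TXT)
print(entry_points["distutils.commands"]["test"])

# With the metadata removed, the table is simply empty: the code is still
# importable, but nothing says which object implements the "test" command.
print(read_entry_points(""))
```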

The only projects that can be "unegged", then, are ones that no egg project depends on, and which do not themselves depend on any eggs. The number of projects that are not depended on by other projects will be smaller and smaller over time, as will the number that do not depend on other eggs.

Hm, that reminds me. One of the newer setuptools features for egg projects is automatic script generation using entry points. A developer can designate a function in some module as the implementation for a script, and a platform-appropriate script to invoke that function is automatically generated during installation. (On Windows, an .exe is created alongside a .py or .pyw; on all other platforms it's a simple #!python script with no extension.)

However, these generated scripts contain only a couple of lines that invoke the function via the project's entry point table - which is part of its egg metadata. So, if you remove the metadata, any scripts of this type that are installed by the project will fail to operate as well. Since there is no script in the original source, you would have to manually copy information from the project's setup.py in order to create scripts with equivalent functionality.
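As a rough sketch of why such a script cannot work without the table it reads from: the script body amounts to "look up a name, resolve the `module:attr` string it maps to, call the result". In the example below, `SCRIPT_TABLE`, `resolve`, and `run_script` are hypothetical names, and a plain dict stands in for the project's entry point table; the real scripts go through pkg_resources rather than anything like this.

```python
import importlib

# Hypothetical stand-in for a project's console_scripts entry point table.
# In a real egg this mapping lives in the egg metadata, not in the script.
SCRIPT_TABLE = {"join-path": "posixpath:join"}


def resolve(spec):
    """Resolve a 'package.module:attr.path' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    target = importlib.import_module(module_name)
    for part in filter(None, attr_path.split(".")):
        target = getattr(target, part)
    return target


def run_script(name, *args, table=SCRIPT_TABLE):
    """Invoke the function registered under `name`, as a generated script would."""
    try:
        spec = table[name]
    except KeyError:
        # This is the failure mode when the metadata has been stripped.
        raise LookupError("no entry point named %r; is the metadata installed?" % name)
    return resolve(spec)(*args)


print(run_script("join-path", "a", "b"))
```

Calling `run_script("join-path", "a", "b", table={})` raises `LookupError`, which mirrors what happens to a generated script once the metadata it depends on is removed.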

In essence, trying to work around the absence of egg metadata is a bottomless pit, because over time there will be an ever-increasing amount of functionality in the field that is based on the use of metadata.


You can distribute the
project as a collection of Python modules, not as a collection of
Python resources. The Debian developer could (and I was suggesting
he should) just ignore the entire egg structure, and distribute
the code of the library only.

Sure, just like you could delete the metadata files and directories from jar files, if you had some policy that required it. However, this wouldn't make any more sense than what you're proposing here. The projects would be unusable by other projects and/or nonfunctional in themselves, just like eggs.


If so, Debian should not distribute them.

This is what I don't understand, as it has nothing to do with whether or not an egg is a distribution format, at least not that I can see. My statement was that eggs are not merely a distribution format; they are a logical concept that can be physically packaged in various ways, and if it's necessary to invent yet another physical layout, well, we can do that too.

Yes, but this logical concept is in the way of Debian packages/distributions (at least if done naively by the Debian
developer). This is what started the entire discussion: Matthias
Urlichs complained that Bob Tanner included the egg structure
in the formencode Debian package/distribution.

It's in the way of not changing the policy, sure. However, the policy's restriction in this case is not providing any functional benefit to anyone. Eggs, on the other hand, are a functional technical construct with actual usefulness in the field. To choose the policy over your users' needs in this case is like choosing to eat the restaurant's menu because the food in the pictures is more neatly arranged than the food on your plate. :)


The specific initial complaints were:
- you can't use it with a simple "import formencode",
- pydoc does not work on "eggs".

These are both incorrect. First, if you install a .pth file (as easy_deb does, and any extra_path distutils distributions do), the first is moot. Second, pydoc works fine on all varieties of eggs, with a single exception: it does not work with zipped packages - the modules in the package can be documented, but not the parent package itself. This is a clear and obvious bug in pydoc (failure to update for PEP 302), and it is easily fixed. Nonetheless, it is trivially avoided by using either the unzipped or .egg-info installation formats.

(Detail: PEP 302 specifically allows strings in a package __path__ to not be directories, and it also allows __path__ to be empty. pydoc assumes that it is non-empty and that its first element is a directory.)
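The PEP 302 detail can be checked with a small experiment. The sketch below builds a throwaway zip archive containing a package, imports the package from it, and shows that the first `__path__` entry is not a filesystem directory. The package name `zipdemo_pkg` and the helper `import_from_zip` are made up for this illustration, and the example says nothing about how any particular pydoc version behaves.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(package_name="zipdemo_pkg"):
    """Create a zip containing a tiny package and import the package from it."""
    tmpdir = tempfile.mkdtemp()
    archive = os.path.join(tmpdir, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(package_name + "/__init__.py", '"""Package docstring."""\n')
        zf.writestr(package_name + "/mod.py", "VALUE = 1\n")
    sys.path.insert(0, archive)      # the archive itself goes on sys.path
    importlib.invalidate_caches()
    return importlib.import_module(package_name)


pkg = import_from_zip()
first_entry = pkg.__path__[0]

# The entry points *inside* the archive, so it is not a real directory,
# yet submodules remain importable through the zip importer.
print(first_entry, os.path.isdir(first_entry))
print(importlib.import_module("zipdemo_pkg.mod").VALUE)
```

Any tool that walks `__path__[0]` with directory functions such as `os.listdir` will therefore fail on a zipped package, even though the import system itself handles it.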


I would add the complaint:
- it increases sys.path for no good reason.

It is only true that it increases the length in the case of the two .egg forms, not the .egg-info form.

The "no good reason" part is an interesting opinion, although in my view it is rather narrow-minded. Being able to support multi-version importing is a very good reason indeed, as is avoiding the need for a platform-specific package management tool in order to manage Python projects.

Of course, you can safely ignore these points if you are looking at it strictly from the point of view of a package management tool that doesn't support installing multiple versions of things. You are blocked from these eminently "good reasons", however, by something that has nothing to do with eggs, so putting the "no good reason" on eggs is inappropriate. There are quite good reasons; you are simply blocked from taking advantage of them by the limitations of your chosen packaging tool.

In any case, this complaint is moot in the case of the .egg-info form, since it does not affect the length of sys.path.


Which would be the same as saying you wouldn't distribute, say, setuptools itself. Setuptools is an egg, and can't function except as an egg, because it is more than a Python package. Again, an "egg" is some specific release of a project and its introspectable metadata.

I could rewrite setuptools to function as a regular Python package.
After a shallow inspection, there aren't many places where it really
needs the pkg_resources functionalities for itself - I could only
identify the part that locates cli.exe. As this is used on Windows
only, a Debian port of setuptools could simply ignore this code.

Your "shallow inspection" is just that. Try this experiment. Delete the "setuptools.egg-info" directory, and then try to run "setup.py test" or "setup.py bdist_egg". After you figure out how to fix that, and install your setuptools in a "non-egg" form, I encourage you to try to build and install SQLObject and buildutils, or any other package that adds setup commands to setuptools, and see whether those commands work when the provider is lacking its metadata. For an encore, see if you can figure out how to get PasteDeploy configuration files to work - they're a format that allows users to deploy arbitrary WSGI applications as long as they're importable... and installed as an egg, with egg metadata.

Eggs (and their metadata) exist because they provide functionality that is not practical to provide without them, and the scope of the deployed functionality that relies on the metadata is increasing rather quickly.


If "setup.py install" does other things, like editing an
existing file, it is not so easy anymore.

I'm thinking that perhaps I should add an option like '--single-version-externally-managed' to the install command so that you can indicate that you are installing for the sake of an external package manager that will manage conflicts and uninstallation needs. This would then allow installation using the .egg-info form and no .pth files.

The only issues remaining then are namespace packages and other inter-project overlaps, which of course you have to deal with now. (Example: the PyDispatcher and RuleDispatch projects both contain a 'dispatch' package, with unrelated contents.)


That is not true. Usability also suffers if sys.path becomes long.

How?  I don't understand this.

People will often inspect sys.path to understand where Python
is looking for their code.

As I pointed out, eggs give you much better information on this.  For example:

python -c "import pkg_resources; print pkg_resources.require('kid')"

[kid 0.7a (c:\cygwin\home\pje\chandlerstuff\chandler\release\bin\lib\site-packages\kid-0.7a-py2.4.egg), elementtree 1.2.6 (c:\cygwin\home\pje\chandlerstuff\chandler\release\bin\lib\site-packages\elementtree-1.2.6-py2.4.egg)]

I get the versions along with the paths, and the versions and paths of all dependencies. This information is not available in a cross-platform way without eggs. (And again, I mean the logical egg, not the .egg format; the above command would've listed any projects in .egg-info format as well as .egg files and directories.)


What I would suggest here is having a namespace (e.g. pyegg2.4-whatever) for naming packages based on their PyPI names, so that there can be an automated relationship between setuptools dependencies and Debian ones.

That would be a policy change (I think). Whether it would be agreeable,
I have no idea.

I understand that, on both points. I was simply suggesting it would be useful, not trying to debate what the policy currently is.


Anyway, I don't see any obvious reasons why this can't be an automated process, even for the system library dependencies. easy_deb even has a simple configuration file that can augment the setuptools-style dependencies with explicit Debian dependencies.

Debian policy currently seems to require that the dependencies are
provided as plain text in a patch to the upstream sources(*). So the
idea certainly is that dependencies are managed by the developer,
not automatically.

I'm only interested in what's helpful or useful to Debian developers and users, not what the current policy is. Policies tend to adapt to fit things that are useful, or else they become more of a drawback than a benefit. I mention these things because they may allow the process and policy to be improved, to everyone's benefit.

If the policy doesn't change, however, then it should suffice to use .egg-info format to allow the distribution of egg projects as Debian packages conforming to the existing policy, assuming the policy does not prohibit including non-package directories in site-packages. The fact that .egg-info packaging may inconvenience packagers is a pain caused by the policy, however, not by eggs. I do intend, though, to update setuptools and easy_install to make using .egg-info form easier, and I will probably also fix it so that running e.g. bdist_rpm on a setuptools-based package will produce an .egg-info format egg wrapped in an RPM.

I remain concerned about how such packages will work with namespace packages, since namespace packages mean that two different distributions may be supplying the same __init__.py files, and some package managers may not be able to deal with two system packages (e.g. Debian packages, RPMs, etc.) supplying the same file, even if it has identical contents in each system package.
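The conflict described here is easy to detect mechanically. The following sketch assumes each system package can be reduced to a list of installed file paths; the package names, the file lists, and the `find_shared_files` helper are all hypothetical and do not correspond to any real Debian or RPM tooling.

```python
from collections import defaultdict


def find_shared_files(manifests):
    """Return {path: [packages]} for every path supplied by more than one package.

    `manifests` maps a system package name to the file paths it installs.
    """
    owners = defaultdict(list)
    for package, files in manifests.items():
        for path in files:
            owners[path].append(package)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical manifests: two system packages contributing to one namespace
# package, so both ship the same nsdemo/__init__.py.
manifests = {
    "python-nsdemo-a": ["nsdemo/__init__.py", "nsdemo/a/__init__.py"],
    "python-nsdemo-b": ["nsdemo/__init__.py", "nsdemo/b/__init__.py"],
}

conflicts = find_shared_files(manifests)
print(conflicts)
```

A package manager that refuses duplicate file ownership would reject the second package here, regardless of whether the two `__init__.py` files are byte-for-byte identical.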


