
Re: [Distutils] formencode as .egg in Debian ??



At 10:00 PM 11/23/2005 +0100, Martin v. Löwis wrote:
Phillip J. Eby wrote:
I was referring to how the distribution is *installed*. You don't use things directly from a deb file, they have to be installed on the system. When you install an egg, you must use one of the three forms, or the system as a whole will not function.

That depends on whether the "system" (pkg_resources, I assume) is used
at all. If the project is just a Python library, you can install it
as a Python package in site-python, not as an egg.

Eggs that depend on the egg will not be able to find it, nor use any plugins it contains.

Not sure what an egg plugin is, so I cannot comment on that.
As for other eggs finding the one: In Debian, there normally shouldn't
be any need to, since there will be also a Debian package providing
the other project, and then a plain "import" will be sufficient to
find the Python package.

No, it won't, because...  oh never mind.  I'll explain again below.

What you seem to keep missing, though, is that eggs and their metadata are a *feature*, not a bug. The rapid uptake of setuptools by developers trying to build more powerful frameworks and platforms for Python is sufficient evidence that they provide useful features that Python developers desire to have, precisely because they can be used to wrap non-setuptools based packages without code changes and without reinventing wheels - either the wheels provided by setuptools, or the wheels provided by other projects when wrapped by setuptools. Removing the metadata gives them neither option.


Of course, any usage of the pkg_resource API would break. One way
to deal with that is to encourage upstream authors to have a fallback
mode where they can work without pkg_resource; another is to provide
a fallback implementation of pkg_resource.

Yes, and while we're at it, let's encourage developers to have fallbacks so their code can run on Python 1.5.2. Heck, why stop there? Anything that requires features introduced after Python 1.0 would obviously only be an impossible attempt to improve upon perfection. For that matter, let's not have any dependencies on other packages at all! Clearly it would be better for everybody to write their own modules and not use something written by some random person on the Internet. :)

All joking aside, one of the central points of having setuptools in the first place is that it allows people to avoid duplicating code. Code like, say, the pkg_resources module. This is another example of what I'm calling a contradiction in terms, because I keep saying that the purpose of all this is to allow X, and then you propose, "well, do it without X", and I say, "but X is the whole point! Doing it without X isn't actually doing 'it' because X is what 'it' is." And then you say, "Ah, but what if you do it with Y?", and so we go round the loop again.


So, when I say it is a contradiction in terms to install an egg in a non-egg form, I mean that it is nonsensical to say that you have installed it, because it will be unusable (by other eggs), nonfunctional (by itself), or both.

That makes me not like the egg infrastructure: too many subtle
dependencies, and you are too much forced into using the structures
that the setuptools authors came up with.

[boggle] Um, what is Debian but a collection of subtle dependencies forced into the structures that its authors came up with? Perhaps your point here is just too subtle for me. :)


Of course, the pragmatic view is just to bite the bitter pill (is
this the idiom?)

The idioms are to "bite the bullet", or "swallow the bitter pill". The former is from the one-time medical practice of biting on a bullet to avoid screaming during procedures performed without anesthetic. The latter of course is also a medical idiom, in the sense that a medicine may be bitter but nonetheless good for one's health. :) In any case, both idioms imply a desire to get an unpleasant but beneficial task over with, so mixing them is quite understandable, albeit odd-sounding. :)


 and find some strategy that makes pkg_resource
work, without any of the drawbacks of setuptools.

Just as I'm trying to help find a way to make Debian be able to provide something useful for setuptools-based projects, despite the drawbacks of the current Debian arrangements. ;)

The degree of negativity from the Debian side at the outset of this conversation (virtually all of it from you) has not been conducive to making this happen. As a simple matter of practicality, I can't afford to leave your comments unanswered, not because I feel any need to convince you personally of anything, but because I don't want to leave anyone else with the impression that your portrayal of these so-called "drawbacks" is a fair one. Otherwise, I would have just ignored your comments and focused on working with the people who seem more interested in finding solutions than in finding ways to declare the problem nonexistent. As it is, I feel forced to spend time replying to your comments point by point, time that I could otherwise spend on actually helping to resolve the issues.

If I were to adopt your tone, I would be calling Debian a fragile and broken system that is unable to deal well with simple matters like editing a file upon installation, or having multiple versions of a package installed at the same time. Sure, the limitation might exist, but is it fair to call Debian fragile or broken because of it? Not a bit! I've therefore been very careful to describe any such tradeoffs that Debian makes in neutral terms rather than categorically pejorative ones. I would prefer that you extend me the same courtesy, and not describe every design tradeoff I make as "non-standard", a "drawback", or "for no good reason".

(Even though I have referred to the existing Debian policy as "outdated", I meant it only in the sense that it does not deal explicitly with the issue of eggs, which is a neutral statement, not a judgment of the condition. It would be stupid and unreasonable for me to imply that Debian's policy must be updated to include eggs, as setuptools is alpha software that is very much still in development. Which is why it isn't me who approached the Debian developers about this, as opposed to the other way around. However, once contacted about the matter, I'm certainly going to point out that ignoring the existence of eggs and their likely rapid increase in popularity (e.g. TurboGears claims 40,000 eggs served) is also unreasonable.)


I would expect that you can "unegg" a project.

For projects that make use of eggs, you expect wrong. Try it with setuptools, and you will find that it is unable to even run its own tests, because the "test" command is registered via an entry point.

I would have to rewrite the code, of course. I do all registration
that needs to be done in __init__.py

That registration can't be done until a package is imported, so even if you did the significant patching this would require, your effort will fail as soon as you bring extensions into the picture, such as buildutils or SQLObject, as I already explained.


Entry points are just one kind of project metadata that can be registered; other projects like Trac and SQLObject have their own kinds of metadata as well. None of this metadata is accessible without the EGG-INFO or .egg-info directory; removing it is like removing the JavaBean metadata or the deployment descriptors from Java jars, rendering the jar useless in many contexts, despite the fact that all the "code" remains.

Sure, *just* removing it would be wrong. I have to replace it with
Python code.

Which will *never be imported* and will therefore never execute, because the project it needs to *plug into* won't know it exists. A project "foo" that extends the functionality of project "bar" can't be statically known about by project "bar". The dependency is that foo requires bar, but bar must be able to "discover" at runtime that foo exists.

The idea is that project "bar" can be extensible by other projects, by providing entry point groups that other projects can add themselves to (via published metadata). These other projects do not need to be imported; they are found by their metadata, which describes them as offering entry points in the "bar"-supplied entry point groups. Thus, new projects like "foo" can hook in to the infrastructure provided by "bar".

For example, SQLObject and buildutils are project "foo" with respect to setuptools; setuptools doesn't depend on them, or know about their existence a priori. But their mere presence on sys.path (or more precisely, the presence of egg metadata in well-defined locations relative to sys.path entries) is enough to allow setuptools to find them.

The "Trac" web-based project management application is an example of project "bar" - it offers a sophisticated plugin capability to allow people to customize its database, web interface, and so on. The mere existence of a plugin project on sys.path, or its presence in the Trac plugins directory, is sufficient to allow that project's code to be *dynamically imported* on an as-needed basis whenever a particular notification hook is invoked.

These things are not practical without some kind of metadata. You cannot simply replace the metadata with code, because the code has to be imported, which means that you would have to import every module and package on sys.path in order to be sure you found all the metadata.
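To make the "found by metadata, not by import" point concrete, here is a rough sketch in plain Python. It is not the pkg_resources implementation, just an illustration of the idea: it scans directories for `.egg-info` directories and reads an `entry_points.txt` file from each, without importing any of the projects involved. The function name `find_entry_points`, the group name "bar.plugins", and the "foo" project in the demo are all made up for this example.

```python
import configparser
import os
import tempfile


def find_entry_points(search_path, group):
    """Return {entry_name: "module:attr"} for `group`, using metadata only.

    Hypothetical helper for illustration; no project code is imported.
    """
    found = {}
    for directory in search_path:
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if not entry.endswith(".egg-info"):
                continue
            metadata_file = os.path.join(directory, entry, "entry_points.txt")
            if not os.path.isfile(metadata_file):
                continue
            # Only "=" separates name from target, so "module:attr" stays intact.
            parser = configparser.ConfigParser(
                interpolation=None, delimiters=("=",)
            )
            parser.optionxform = str  # keep entry point names case-sensitive
            parser.read(metadata_file)
            if parser.has_section(group):
                found.update(parser.items(group))
    return found


if __name__ == "__main__":
    # Fake "site-packages" holding metadata for a made-up plugin project "foo".
    with tempfile.TemporaryDirectory() as site_dir:
        info_dir = os.path.join(site_dir, "foo-1.0.egg-info")
        os.mkdir(info_dir)
        with open(os.path.join(info_dir, "entry_points.txt"), "w") as f:
            f.write("[bar.plugins]\nhello = foo.plugin:main\n")
        # "bar" learns that "foo" offers a plugin, yet "foo" was never imported.
        print(find_entry_points([site_dir], "bar.plugins"))
```

The host project ("bar") would only import `foo.plugin` at the moment it actually needs to call the plugin. Replacing the metadata file with registration code inside foo's `__init__.py` would remove the only thing this scan can see.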


The only projects that can be "unegged", then, are ones that no egg project depends on, and which do not themselves depend on any eggs. The number of projects that are not depended on by other projects will be smaller and smaller over time, as will the number that do not depend on other eggs.

Define "depends on". If this is "imports", I don't see a problem with
unegging the package.

As you said, a false proposition implies any conclusion. It is you who is assuming "depends on" means "imports". Plugins are the simplest example of a "depends on" that goes beyond importing.


In essence, trying to work around the absence of egg metadata is a bottomless pit, because over time there will be an ever-increasing amount of functionality in the field that is based on the use of metadata.

That is really sad.

Yes, we should all go back to C like real programmers. :) No, wait, then we would have to deal with all those messy .h files. But who needs interfaces and metadata like argument types? We should just put the memory addresses of the functions directly in our code, because then there will be fewer processing steps and we won't have all those .h files messing up the place. Plus, that whole concept of a "linker" seems awfully fragile to me. Who knows what address it might put my code at? Besides, I don't need a linker if I only use the code that I write, and those people who use other people's code are obviously just too lazy to write their own or even copy and paste it. Can you imagine? :)


I would add the complaint:
- it increases sys.path for no good reason.

It is only true that it increases the length in the case of the two .egg forms, not the .egg-info form.

Ok, then I think this is what Debian should use.

Great! At least we are making some progress here. For non-setuptools packages (like ElementTree), it will suffice to place an empty 'projectname-version.egg-info' file or directory in site-packages alongside the installed package. I will modify setuptools 0.6a9 to parse the version from the file or directory name, and to accept a file instead of a directory. (Currently, it requires a PKG-INFO file inside an .egg-info directory and parses the Version: header from PKG-INFO.)

If Debian adds this metadata marker for its non-setuptools Python packages, then the Python packages will be "eggs" in the sense that other eggs will be able to discover them via the pkg_resources API, and thus TurboGears users will be able to use the Debian-supplied versions of ElementTree and the like.
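As a rough sketch of what "parse the version from the file or directory name" could look like (this is not the actual setuptools code, and `parse_marker` is a hypothetical name), the marker name can be split at the first '-'. This assumes the escaped project name never contains a '-' itself, and it does not try to handle any extra suffix after the version. The ElementTree version number in the demo is just a placeholder.

```python
import os


def parse_marker(filename):
    """Split 'projectname-version.egg-info' into (name, version).

    Returns None when the name does not look like a marker.
    Hypothetical helper for illustration only.
    """
    base, ext = os.path.splitext(filename)
    if ext != ".egg-info":
        return None
    # Assumes the escaped project name contains no '-', so the first '-'
    # is the boundary between name and version.
    name, sep, version = base.partition("-")
    if not sep or not name or not version:
        return None
    return name, version


if __name__ == "__main__":
    # Placeholder version number, not taken from any real package listing.
    print(parse_marker("ElementTree-1.2.6.egg-info"))
    print(parse_marker("README.txt"))
```

Because only the name is inspected, the marker can be an empty file or an empty directory; nothing inside it needs to be read.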

Note, however, that the 'projectname-version' string has some precise escaping rules; the distutils are quite inconsistent about their processing of names and escaping, so I had to devise more specific rules for setuptools, because setuptools has to actually *use* the project names and versions, and parse them out of filenames:

1. The project name in a file or directory name is the setup(name=...) argument, with all runs of one or more non-alphanumeric characters replaced with '_'. (Note that this means there is never more than one '_' in a row in the filename.) So a project like "FooBar Tools" or "FooBar-Tools" would become "FooBar_Tools" in the filename.

2. The rules for the version are the same as for the name, *except* that the '.' character is allowed to remain unescaped, and spaces are converted to '.' before compacting non-alphanumeric runs. So, version '1.2 rc5' becomes '1.2.rc5', while '1.2-pl5' becomes '1.2_pl5'.
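The two rules above can be written down in a few lines of Python. This is only a sketch of the rules as stated in this message, not a copy of the setuptools code, and the function names (`escape_name`, `escape_version`, `marker_filename`) are invented for the example.

```python
import re


def escape_name(name):
    """Rule 1: each run of non-alphanumeric characters becomes a single '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name)


def escape_version(version):
    """Rule 2: spaces become '.', then every remaining run of characters
    that are neither alphanumeric nor '.' becomes a single '_'."""
    return re.sub(r"[^A-Za-z0-9.]+", "_", version.replace(" ", "."))


def marker_filename(name, version):
    """Build the 'projectname-version.egg-info' marker name."""
    return "%s-%s.egg-info" % (escape_name(name), escape_version(version))


if __name__ == "__main__":
    print(escape_name("FooBar Tools"))      # FooBar_Tools
    print(escape_name("FooBar-Tools"))      # FooBar_Tools
    print(escape_version("1.2 rc5"))        # 1.2.rc5
    print(escape_version("1.2-pl5"))        # 1.2_pl5
    print(marker_filename("FooBar Tools", "1.2 rc5"))
```

Since neither escaped part can contain a '-', the single '-' in the resulting filename always marks the boundary between name and version.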


The "no good reason" part is an interesting opinion, although in my view it is rather narrow-minded. Being able to support multi-version importing is a very good reason indeed, as is avoiding the need for a platform-specific package management tool in order to manage Python projects.

I don't see why multi-version support necessarily requires to
increase sys.path. In the case of eggs, version dependencies are
expressed explicitly in the code (through require() calls),

Actually, they're expressed in the egg metadata, and the wrappers on a project's scripts execute the require() calls, so that the code doesn't have to contain explicit require() calls except for more-dynamic situations, such as plugins and "optional extra features" that require additional projects to be present.


 so
that essentially replace the standard Python import search algorithm.
Because of that, you could have a default version inside site-packages,
and additional versions elsewhere, only found when require() is
called.

That's correct, and setuptools actually supports that scenario, but it doesn't currently provide tools for creating that arrangement on disk, since the "default version" you propose would be hard to manage without an external packaging tool, like Debian. (The proposed addition for 0.6a9 would be to make it possible to install such a thing, for use with external packaging tools.)

Note that setuptools is in release 0.6a8 at the moment - it is obviously not a polished product, but it provides enough functionality to be quite useful to many Python developers. To this point, directly working on integration with external packaging tools has not been a focus, although I always have given top priority to responding to questions and requests from people working on integration with those tools (e.g. the volunteers who worked on easy_deb and the Gentoo stuff). I can't reasonably learn the technical details of every packaging system, so it is best to let volunteers familiar with individual packaging systems tell me what they need in order to effectively wrap the system.

Up until now, my interactions with such volunteers have been most pleasant and positive. To my knowledge, it's not usual for packaging system developers to spew FUD at a project and look for ways to exclude or break the work of developers who've chosen to use it. I'm therefore more than a little surprised by some of the attitude I've received. I hope, though, that we can get past that soon, if only because it means I'll have more time to work on implementing and documenting whatever the resolution is. ;)


