Re: Go (golang) packaging, part 2

Okay, fortunately, no bands are practicing tonight and no kids need homework
help, so let's see if I can answer some of these questions. :)

On Feb 07, 2013, at 08:54 AM, Paul Wise wrote:

>On Thu, Feb 7, 2013 at 8:19 AM, Barry Warsaw wrote:
>> Speaking with many hats on, I think Debian Python has done a very admirable
>> job of integrating the Python ecosystem with Debian.
>One of the pain points for users (I've had folks ask me this face-to-face)
>with that stuff was site-packages vs dist-packages. With your various Python
>hats on, can you explain why not just use "packages" instead of
>"site-packages" and "dist-packages"?

Fundamentally, this comes down to a conflict between Python's historical
defaults and Debian's interpretation of the FHS.  Let me just stipulate that
I'm not casting blame, or saying that anybody is doing anything wrong.  I'm
not interested in that discussion, though I've had it many times.  It is what
it is.

Old timers like me will remember the days when *nix systems reserved
/usr/local for stuff you downloaded and installed from source (i.e. most
everything on a usable system :).  There was no /opt or FHS.  This was
codified in the first auto-configuration scripts.  I don't remember when
Python adopted the configure regime, but as long as I can remember (going back
at least to 1994), a default build-from-source of Python installed into
/usr/local.  When site-packages was added,
/usr/local/lib/pythonX.Y/site-packages was the most logical place to put it.
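
If you want to check which of these locations a particular interpreter uses, you can ask it directly. Here's a minimal Python 3 sketch using only the standard `sys`, `sysconfig`, and `site` modules. What it prints depends entirely on how that interpreter was built and patched, so treat any path in the comments as an example only.

```python
import site
import sys
import sysconfig


def describe_install_locations() -> dict:
    """Report where this interpreter lives and where third-party packages go."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,  # e.g. /usr/local for a default from-source build
        # Default install directory for pure-Python packages.
        "purelib": sysconfig.get_path("purelib"),
        # Site directories added to sys.path at startup.
        "site_dirs": site.getsitepackages(),
    }


if __name__ == "__main__":
    for key, value in describe_install_locations().items():
        print(f"{key}: {value}")
```

Running it once with /usr/bin/python3 and once with a from-source /usr/local/bin/python3 and comparing the `site_dirs` lists shows whether the two share any third-party directory.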

Predating my involvement with Debian, I remember problem reports where
developers of Python, and others who install from source for various reasons,
would break their systems when they used the wrong Python executable to
install third party packages outside of the Debian packaging system.  This was
because Debian allowed /usr/local/lib/pythonX.Y/site-packages to be used for
third party packages installed outside the Debian packaging system, using the
*system* Python, i.e. /usr/bin/python.  This meant that if I installed
something for /usr/bin/python into /usr/local/lib/pythonX.Y/site-packages it
could easily break my /usr/local/bin/python, and possibly vice versa.

I think it was at a Pycon years ago that Matthias and I discussed this
problem.  At the time (and probably still so), it didn't seem like either
Debian or Python was going to change its policy, so we had to find a way to
avoid the conflict and let both communities live in peace.  Matthias's
solution was the use of dist-packages for Debian's system Python, which would
be ignored by a /usr/local/bin Python.  Also, system Python would ignore
/usr/local/lib/pythonX.Y/site-packages (but not .../dist-packages), thus
avoiding all conflict.  It seemed elegant at the time, and I still think this
is a reasonable compromise, even though it does cause some tooling problems,
which have to be patched in Debian.
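
The effect of that compromise can be written down as a small, purely illustrative model. The function and constant names are hypothetical, and the directory lists are simplified to the paths mentioned in this thread (a real Debian sys.path has more entries). The point is only that the two interpreters end up with no third-party directory in common.

```python
from __future__ import annotations

from pathlib import PurePosixPath

SYSTEM_PYTHON = "/usr/bin/python"       # Debian's system interpreter
LOCAL_PYTHON = "/usr/local/bin/python"  # default from-source build


def package_dirs(interpreter: str, version: str) -> list[PurePosixPath]:
    """Hypothetical model: third-party dirs each interpreter searches."""
    if interpreter == SYSTEM_PYTHON:
        # Debian's system Python only looks in dist-packages directories.
        return [
            PurePosixPath(f"/usr/local/lib/python{version}/dist-packages"),
            PurePosixPath(f"/usr/lib/python{version}/dist-packages"),
        ]
    if interpreter == LOCAL_PYTHON:
        # Upstream's default layout uses site-packages under its own prefix.
        return [PurePosixPath(f"/usr/local/lib/python{version}/site-packages")]
    raise ValueError(f"unknown interpreter: {interpreter}")


def shared_dirs(version: str) -> set[PurePosixPath]:
    """Directories searched by both interpreters; empty means no conflict."""
    return set(package_dirs(SYSTEM_PYTHON, version)) & set(
        package_dirs(LOCAL_PYTHON, version)
    )


if __name__ == "__main__":
    print(shared_dirs("2.7"))  # set()
```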

>The right way (IMO) would have been to put site packages in
>/usr/local/lib/pythonX.Y/packages and dist ones in
>/usr/lib/pythonX.Y/packages. Right now I have
>/usr/local/lib/pythonX.Y/dist-packages and /usr/lib/pythonX.Y/dist-packages,
>why is /usr/local dist-packages instead of site-packages? /usr/local is
>clearly not the location for distro installed packages.

That was my position, i.e. that system Python shouldn't have any path from
/usr/local on sys.path, but that was very strongly (at the time) disputed by
Debian users.  To be fair, the Debian users at the time said (and maybe still
say) that the right solution is for a default from-source build of Python to
install into /opt/local and not /usr/local, but again, that would conflict
with years of established use by upstream.

That's the historical background as I remember it anyway.

>Why did Debian have to invent /usr/share/pyshared and symlink farms in
>/usr/lib/pythonX.Y instead of upstream having something like that in
>the default install and search paths?

Because upstream doesn't really care (or didn't until my PEPs 3147 and 3149 in
Python 3.2) about multiple versions of packages co-existing in harmony, and
because upstream Python requires .pyc files to live next to (FSVO, see below)
the .py files.

Debian was the first place that I recall where multiple versions of Python
could be co-installed.  Let's say you have both Python 2.6 and 2.7 installed,
and you have a module called foo.py that is source-level compatible with both.
The problem is that Python has never guaranteed that .pyc files would be
compatible across Python versions.  It's never said they wouldn't be, but in
practice the byte code cached in .pyc files always changes, due to new
features or bug fixes in the interpreter between major version numbers.

So in Debian you have a situation where you want to share foo.py across all
supported and installed Pythons, but where you cannot share .pyc files because
they aren't compatible.  You want to share .py files 1) to keep package sizes
smaller, 2) to consume less disk space, 3) because you don't actually know
which versions of Python the target system has installed.  Just because
version W of Debian supports PythonX.Y and PythonA.B doesn't mean your system
has both installed, so you'd rather not pay the penalty of packaging up
two identical foo.py's for both of them, just because they'll live in
different locations on the file system.  And they'd have to live in different
paths because of Python's requirement for nearby .pyc files combined with
cross-version incompatibility of .pyc files.
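
The incompatibility is visible in the file itself: every .pyc starts with a four-byte "magic number" identifying the bytecode format of the interpreter that wrote it, and an interpreter that finds a different magic number ignores that file and recompiles the source. A minimal Python 3.4+ sketch (it uses `importlib.util.MAGIC_NUMBER` and `py_compile`, with a throwaway module named foo.py) that compiles a module and checks the header:

```python
import importlib.util
import pathlib
import py_compile
import tempfile


def pyc_header_matches_interpreter() -> bool:
    """Compile a throwaway module and compare its .pyc header with the
    magic number of the running interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp, "foo.py")
        source.write_text("answer = 42\n")
        # py_compile.compile() returns the path of the .pyc it wrote.
        pyc_path = py_compile.compile(str(source), doraise=True)
        header = pathlib.Path(pyc_path).read_bytes()[:4]
    return header == importlib.util.MAGIC_NUMBER


if __name__ == "__main__":
    print(importlib.util.MAGIC_NUMBER)       # typically differs between X.Y releases
    print(pyc_header_matches_interpreter())  # True
```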

(Aside: there's no getting around paying this cost for extension modules since
they are binary .so files, but there are *way* fewer of these than
pure-Python, theoretically cross-version compatible source files.)

There have been several regimes to manage this, all of them to the best of my
knowledge using symlink farms to manage the sharing of .py files with the
version-specific-ness of .pyc files.  IMHO, dh_python2 takes the best approach
to this, but previous regimes, such as python-support and probably
python-central, are still in use.
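
Stripped down to its essence, a symlink farm keeps one real copy of each pure-Python file and gives every interpreter version its own directory of symlinks pointing at it, so each version can write its own incompatible .pyc next to its own link. This is a hypothetical sketch: the function name and the pyshared/pythonX.Y layout under a scratch root are illustrative, real tools like python-support and dh_python2 differ in the details, and it needs a platform where unprivileged symlinks work, such as Linux.

```python
from __future__ import annotations

import pathlib
import tempfile


def build_symlink_farm(root: pathlib.Path, module: str, source: str,
                       versions: list[str]) -> list[pathlib.Path]:
    """Hypothetical sketch: one shared copy of `module`, one symlink per version."""
    shared_dir = root / "pyshared"
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir / f"{module}.py"
    target.write_text(source)  # the only real copy of the source

    links = []
    for version in versions:
        version_dir = root / f"python{version}"
        version_dir.mkdir(parents=True, exist_ok=True)
        link = version_dir / f"{module}.py"
        link.symlink_to(target)  # pythonX.Y/foo.py -> pyshared/foo.py
        links.append(link)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for link in build_symlink_farm(pathlib.Path(tmp), "foo", "x = 1\n",
                                       ["2.6", "2.7"]):
            print(link, "->", link.resolve())
```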

This was finally solved by my work on PEPs 3147 and 3149, which introduced the
__pycache__ directory in Python 3.2 and tagged .so and .pyc file names.
(Aside: __pycache__ isn't strictly necessary to support this, but was a nice
additional feature suggested by Guido.)
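
In Python 3 the standard library exposes the PEP 3147 naming rule directly, so you can see the tagged file name without importing anything. A short sketch using `importlib.util.cache_from_source()` and `source_from_cache()` (Python 3.5+ for the `optimization` argument); foo.py is just an example name and the file need not exist.

```python
from __future__ import annotations

import importlib.util
import sys


def pep3147_paths(source: str) -> tuple[str, str]:
    """Map a source path to its PEP 3147 bytecode path, and back again."""
    # optimization="" asks for the plain name, without an .opt-N tag.
    cached = importlib.util.cache_from_source(source, optimization="")
    return cached, importlib.util.source_from_cache(cached)


if __name__ == "__main__":
    print(sys.implementation.cache_tag)  # e.g. cpython-311
    print(pep3147_paths("foo.py"))       # e.g. ('__pycache__/foo.cpython-311.pyc', 'foo.py')
```

A different interpreter computes a different tag, so its cached file gets a different name in the same __pycache__ directory.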

Now in the Python 3 world, you *can* co-install multiple versions and even
though the .pyc and .so files are still version-specific, they can co-exist
peacefully.  PythonX.Y will only try to load foo.cpython-XY.pyc and ignore
foo.cpython-AB.pyc, rather than overwriting it as would have happened before.
Unfortunately, this work came too late to be included for Python 2, so we
still need the symlink farms for that (obsolete <wink>) version.

But if you look at how we do Python 3 packages now, you'll see
/usr/lib/python3/dist-packages with shared .py source files, version-specific
.pyc inside __pycache__ directories, and ABI tagged .so files co-existing with
no symlink farms.  Three cheers for progress!
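
To make the co-existing part concrete, here's an illustrative sketch of how the running interpreter can tell its own files apart in a shared directory, using `sys.implementation.cache_tag` for bytecode and `sysconfig.get_config_var("EXT_SUFFIX")` for ABI-tagged extension modules. The function name and file names are hypothetical, and for simplicity it ignores the untagged `.so` and stable-ABI `.abi3.so` suffixes that `importlib.machinery.EXTENSION_SUFFIXES` also lists on Linux builds.

```python
from __future__ import annotations

import importlib.machinery
import sys
import sysconfig


def files_for_this_interpreter(listing: list[str]) -> list[str]:
    """From a shared dist-packages listing, keep what this interpreter would use."""
    tag = sys.implementation.cache_tag                   # e.g. "cpython-311"
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")  # e.g. ".cpython-311-x86_64-linux-gnu.so"
    mine = []
    for name in listing:
        if name.endswith(".py"):
            mine.append(name)  # shared source, usable by every interpreter
        elif name.startswith("__pycache__/") and f".{tag}." in name:
            mine.append(name)  # this interpreter's PEP 3147 bytecode
        elif ext_suffix and name.endswith(ext_suffix):
            mine.append(name)  # this interpreter's PEP 3149 extension module
    return mine


if __name__ == "__main__":
    print(importlib.machinery.EXTENSION_SUFFIXES)
    tag = sys.implementation.cache_tag
    ext = sysconfig.get_config_var("EXT_SUFFIX")
    listing = [
        "foo.py",
        f"__pycache__/foo.{tag}.pyc",
        "__pycache__/foo.otherpy-99.pyc",  # hypothetical other interpreter
        f"_speedups{ext}",
        "_speedups.otherpy-99-fake.so",    # hypothetical other ABI
    ]
    print(files_for_this_interpreter(listing))
```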

(Aside: you'll still see a /usr/lib/python3.X but that's for version-specific
stdlib only.)

>The location of .pyc files that are built at install time doesn't feel
>FHS-correct to me, /var/cache/python/X.Y/ seems better.

It probably is, but upstream Python can only handle .pyc files living next to
(or in the post PEP 3147 world, very nearby) their .py files.  I suppose you
could use Python 3.3's importlib to write an importer that codified this
policy, but leaving aside whether it would be worth it, you'd probably have a
similar (or worse) tooling problem as with dist-packages, since there are
probably many packages that assume the .pyc lives near the .py (and some have
even had bugs caused by the PEP 3147 reorganization alone, not all of which
are fixed I'm sure).
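
For what it's worth, CPython 3.8 (which postdates this thread) added a knob along these lines: `sys.pycache_prefix`, settable with the `PYTHONPYCACHEPREFIX` environment variable or `-X pycache_prefix=PATH`, redirects bytecode into a separate tree that mirrors the source paths. The tooling concern above would presumably still apply to code that expects a nearby .pyc. A minimal sketch showing where the bytecode would go, using /var/cache/python only as an example prefix and restoring the previous setting afterwards (POSIX paths assumed):

```python
import importlib.util
import sys


def cache_path_under_prefix(source: str, prefix: str) -> str:
    """Where Python 3.8+ caches bytecode for `source` when sys.pycache_prefix
    is `prefix`. Temporarily changes the global setting, then restores it."""
    saved = sys.pycache_prefix
    sys.pycache_prefix = prefix
    try:
        # optimization="" asks for the plain name, without an .opt-N tag.
        return importlib.util.cache_from_source(source, optimization="")
    finally:
        sys.pycache_prefix = saved


if __name__ == "__main__":
    print(cache_path_under_prefix("/usr/lib/python3/dist-packages/foo.py",
                                  "/var/cache/python"))
    # e.g. /var/cache/python/usr/lib/python3/dist-packages/foo.cpython-311.pyc
```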

>Debian's Python build helper tools are still breeding like rabbits,
>there is a new one in experimental. I guess because the current ones
>dh_python2/dh_python3 don't handle packages that contain only code
>that runs on both python2 and python3 without changes.

Not exactly.  dh_python2 and dh_python3 are really good IMHO, but one problem
is that while dh has a lot of helpers to make it easy to write d/rules files
for common-case, setup.py-based Python 2 packages, it doesn't know anything
about Python 3.  Take a look at all the overrides you have to add for
libraries that are both Python 2 and 3 compatible, as described in

Among the things that pybuild improves is dh support for Python 3, so you
really can almost always write just a three-line d/rules file, even for
libraries that support both Python 2 and 3, with automatic running of unit
tests, etc.
That's win enough, IMHO.

Piotr can perhaps speak in more detail about it, but pybuild is more ambitious
still, and IIUC, really just builds on top of dh_python2 and dh_python3 by
supporting several different upstream build systems (default is to
auto-detect, e.g. distutils-based, configure-based, etc.) with lots of
overrides and customization possible.  For example, there are several ways
that a library's test suite can be invoked, so good auto-detection of that,
along with convenient ways to customize it, is important.
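
As a rough illustration of what auto-detection can mean here, the sketch below picks a build system by looking for a marker file in the unpacked source tree. It's hypothetical: the marker table, names, and precedence are assumptions made for this example, not pybuild's actual rules.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical marker files, checked in order (NOT pybuild's real rules).
BUILD_SYSTEM_MARKERS = [
    ("distutils", "setup.py"),
    ("configure", "configure"),
]


def detect_build_system(source_dir: Path) -> str | None:
    """Return the first build system whose marker file exists, else None."""
    for name, marker in BUILD_SYSTEM_MARKERS:
        if (source_dir / marker).is_file():
            return name
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        (src / "configure").write_text("#!/bin/sh\n")
        print(detect_build_system(src))  # configure
        (src / "setup.py").write_text("")
        print(detect_build_system(src))  # distutils
```

Checking setup.py first means a tree shipping both files is treated as distutils-based; a real tool also needs an override for when that guess is wrong, which is where the customization hooks come in.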

The pybuild manpage has all the gory details, but I think with pybuild, we're
finally able to promote really easy-to-write d/rules files for the majority of
Python packages across both major version stacks.

Hope that helps!
