
Ending/reducing bytecode compilation, loosening dependencies



Hi,

About a month ago Steve Langasek and I discussed the state of Python
packages on IRC, in particular the effects of bytecode compilation; the
effectiveness (or lack thereof) of it, and how it tightens Python
dependencies. I'd like to propose three changes to how Python modules
are handled.

All three can be summarized as: Python should not compile modules by
default; byte-compilation is a premature optimization that wastes time
and disk space, and it doesn't solve the problems it's meant to anyway.

1. Stop compiling .pyo files, entirely (I'm hoping for little argument
on this).

Rationale: .pyo files are a joke. They aren't optimized in any
meaningful sense; all that happens is that asserts are removed. Examples
from several non-trivial files:

$ md5sum stock.pyc stock.pyo widgets.pyc widgets.pyo formats/_audio.pyc formats/_audio.pyo
5ca1a79bf036e9eddf97028c00f1d0c7  stock.pyc
5ca1a79bf036e9eddf97028c00f1d0c7  stock.pyo
f6c17acdf8043bb8524834f9a5f5c747  widgets.pyc
f6c17acdf8043bb8524834f9a5f5c747  widgets.pyo
dea672e99bb57f7e7585378886eb3cb0  formats/_audio.pyc
dea672e99bb57f7e7585378886eb3cb0  formats/_audio.pyo

They also aren't even loaded unless you run python with -O, which I
don't think any Python programs in Debian do.

How?: compileall.py:57,
-                cfile = fullname + (__debug__ and 'c' or 'o')
+                cfile = fullname + 'c'
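(For anyone checking this claim on a current interpreter: the same
identity can be demonstrated with the optimize argument to Python 3's
compile() builtin, which corresponds to running with -O. This is just an
illustrative sketch, not part of the proposal.)

```python
import marshal

# A module with no asserts: "optimized" bytecode is byte-identical
# to the normal bytecode, just as the md5sums above show.
plain = "def f(x):\n    return x + 1\n"
same = (marshal.dumps(compile(plain, "<m>", "exec", optimize=0)) ==
        marshal.dumps(compile(plain, "<m>", "exec", optimize=1)))
print(same)  # True -- optimization changed nothing

# Only code that actually uses assert differs between the two levels.
checked = "def g(x):\n    assert x > 0\n    return x + 1\n"
differs = (marshal.dumps(compile(checked, "<m>", "exec", optimize=0)) !=
           marshal.dumps(compile(checked, "<m>", "exec", optimize=1)))
print(differs)  # True -- the assert was stripped, nothing else
```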

2. Stop compiling .pyc files (this I expect to be contentious), unless a
package wants to.

Rationale: .pyc files have a minimal gain, and numerous failings.

Advantages of .pyc files:
* .pyc files make Python imports go marginally faster. However,
   for nontrivial Python programs, the import time is dwarfed
   by other startup code. Some quick benchmarks show about 20% gains
   for importing a .pyc over a .py. But even then, the wall-clock time
   is on the order of 0.5 seconds. Lars Wirzenius mentioned that
   this time matters for enemies-of-carlotta, and it probably also
   matters for some CGI scripts.

* Generating them at compile-time means they won't accidentally
  get generated some other time.
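(The rough shape of that 20% figure is easy to reproduce: what a .pyc
saves you is the parse-and-compile step, replacing it with an unmarshal.
A small sketch, with a synthetic module standing in for a real one;
the numbers are illustrative only:)

```python
import marshal
import time

# A synthetic "nontrivial" module: 500 small function definitions.
source = "".join(
    "def f{0}(x):\n    return x * {0}\n".format(i) for i in range(500)
)

t0 = time.perf_counter()
code = compile(source, "<mod>", "exec")   # the cost of importing a .py
t1 = time.perf_counter()

data = marshal.dumps(code)                # this is what a .pyc contains

t2 = time.perf_counter()
marshal.loads(data)                       # the cost of importing a .pyc
t3 = time.perf_counter()

print("compile: %.4fs  unmarshal: %.4fs" % (t1 - t0, t3 - t2))
```

Unmarshalling is faster, but both are a small slice of real program
startup.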

Disadvantages:
* They waste disk space; a .pyc is about as large as the source it
   was compiled from.

* It's still far too easy for modules to be regenerated for the
   wrong version of Python; just run the program as root.

* .pyc files are not really architecture independent. The integer
   constant 4294967296 will be a long in .pyc files compiled on 32 bit
   architectures, and an int when compiled on 64 bit architectures.
   The resulting module will run on both architectures, but won't
   behave in the same way as a module from that machine. To be fair,
   I don't know of any real-world examples that will break because
   of this.

* .pyc files result in strange bugs if they are not cleaned up
   properly, since Python will import them regardless of whether
   or not an equivalent .py is present.

* If we don't care about byte-compilation, the multi-version
   support suggested in 2.2.3 section 2 becomes much easier --
   just add that directory to sys.path (or use the existing
   unversioned /usr/lib/site-python). .pyc files are the rationale
   behind tight dependencies on Python versions, which is the last
   of my suggested changes.
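(The stale-.pyc disadvantage above is easy to demonstrate: Python will
happily import a leftover .pyc even when the source it came from is
long gone. A sketch on a modern Python 3, using a throwaway module name
"ghost" chosen purely for illustration:)

```python
import importlib
import os
import py_compile
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "ghost.py")
    with open(src, "w") as f:
        f.write("VALUE = 42\n")
    # Compile next to the source, as packages on sys.path are laid out.
    py_compile.compile(src, cfile=os.path.join(d, "ghost.pyc"))
    os.remove(src)  # the .py is gone; only the orphaned bytecode remains

    sys.path.insert(0, d)
    importlib.invalidate_caches()
    try:
        mod = importlib.import_module("ghost")
        print(mod.VALUE)  # 42 -- imported with no source present at all
    finally:
        sys.path.remove(d)
        del sys.modules["ghost"]
```

If a package is removed but its .pyc files aren't cleaned up, the old
code keeps getting imported silently.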

Another note: Currently, Python policy is based around the assumption
that .pyc files are valid within a single minor Python revision. I don't
find any evidence to support this in the Python documentation. In fact,
the marshal module documentation specifically says there are no such
guarantees. However, I don't think this has ever been a problem in
practice (if it was, we wouldn't notice, because Python just ignores
invalid pyc files).
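(The mechanism behind that silent ignoring: every .pyc starts with a
magic number specific to the interpreter that wrote it, and the import
machinery discards files whose magic doesn't match its own. A sketch on
a modern Python 3, where the tag is exposed as
importlib.util.MAGIC_NUMBER:)

```python
import importlib.util
import os
import py_compile
import tempfile

# Byte-compile a trivial module and inspect the .pyc header.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "mod.py")
    with open(src, "w") as f:
        f.write("X = 1\n")
    pyc = py_compile.compile(src, cfile=os.path.join(d, "mod.pyc"))
    with open(pyc, "rb") as f:
        magic = f.read(4)
    # The interpreter only trusts .pyc files whose tag matches its own;
    # anything else is treated as stale and recompiled or ignored.
    print(magic == importlib.util.MAGIC_NUMBER)  # True
```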

How?: dh_python should not call compileall.py unless given some special
flag. Python policy 2.5 should change "should be generated" to "may be
generated." On the other hand, the removal code should be a "must", to
avoid littering the filesystem if .pyc files do get accidentally
generated.

I'm willing to write the patch for dh_python if there's agreement on
this.

The Python standard library should still compile .pyc files, because
this is a prerequisite for any program to make good use of .pyc files.
The problems don't apply here, because it's easy to keep the interpreter
and standard library in sync. 

3. Python dependencies should be loosened (and here I expect a
flamewar).

Rationale: Python migrations in Debian suck, period. One reason for this
is that every Python program and module has a strict dependency on
python >> 2.x, << 2.x+1, so during a Python migration absolutely
everything must be rebuilt. But most pure-Python programs and modules
are upward-compatible, especially these days when Debian is a minor
version behind.

Tools like dh_python do make this easier, by making backporting (or
sideporting to e.g. Ubuntu) simply a rebuild. But why bother with even
that, when it's not necessary?

Without .pyc files, there's no reason for this tight dependency at all.
Even if we keep .pyc files, I think loosening this requirement is a good
idea. Programs will still run perfectly fine with mis-versioned .pyc
files; the worst we'll see is some slightly longer startup times.

How?: Strike the third paragraph from 3.1.1. This would also negate the
fifth paragraph, which outlines a hypothetical overcomplicated solution
to the same problem.
-- 
Joe Wreschnig <piman@debian.org>
