
Re: Debian (would like) to do list

Drew Scott Daniels wrote:
> I would like to become a Debian developer to help accomplish these tasks,
> but my time is limited and I do not need to be a developer to help if some
> developers pick up these tasks. Also my computing resources are limited so
> projects like scanning source code and brute force "now" checking of
> packages would be too time consuming without help or more resources. I
> also haven't done all the checking that may be necessary, and thus some of
> these tasks may be irrelevant or already underway. Having three exams
> coming up, this may not be the best time for me to discuss anything, but
> I got tired of waiting and, as you can see my list is growing long.

Good list. It reminds me of the list I had when I became a developer, 7
years ago, except it looks like it could easily keep someone busy for
20, rather than just 10 years.

You should not find it hard to become a developer given your quality of
thought on Debian, attention to detail, and the wide variety of stuff you
want to see fixed -- if you follow through on it. I'm going to annotate
this list to note where we are on some things. There is also a ton of
stuff here that falls under the heading of QA and can easily be worked on
by non-developers.

> I'm not sure what the appropriate forum for discussing my list is, but
> debian-user seemed to be the best fit as I am a user.

debian-devel (or debian-qa) would really be better, cc'd.

> First some clarification. When I say "before, after, now" I mean that the
> uploader should(?) check this before uploading (perhaps this can be
> automated), the archive maintainers or upload procedure should(?) check
> this after it has been uploaded (perhaps this can be automated), and this
> should(?) be checked for now to catch any violations that have been missed
> (perhaps this can be automated).
> Debian related tasks:
> QA and improvements:
> Continue the spell check campaign and look at improving it (before, after,
> now)
> Add grammar checking (before, after, now).
> Add watch files to as many package sources (or diffs) as possible (before,
> now, after may be unnecessarily complex).

Some kind of semi-automated tool to do this might help. I have watch
files on all my packages that can have watch files, but it seems to take
at least 5 minutes to add and test one.

> Add Trove descriptions to packages and source (before, after, now). This
> would be nice. Perhaps this may help improve the trove format.
> Why are packages removed from the archives? There are many reasons, but
> sometimes it's hard to find out. There should be some way of recording
> this especially for those who track unstable on an infrequent basis.
> Perhaps an entry into the Debian BTS under the package name?

There is a file on the archive whose name I cannot remember that lists
removed packages and why.

> Packages should purge configuration files before purging directories
> otherwise empty directories can be left behind. (now)

It is rumored that an in-progress rewrite of dpkg-deb may address this.

> Scanning package descriptions, documentation and other package related
> areas for URL's and seeing if they are active URL's. (before, now)

Good idea and even somewhat easy.
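
A rough sketch of how such a scan could start, in Python. The function
names and the URL pattern are made up for illustration, and the pattern
is deliberately simple; it will miss or mangle some unusual URLs.

```python
import re
import urllib.error
import urllib.request

# Simple pattern: http/https/ftp URLs up to whitespace, a quote or a closing bracket.
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s<>"')\]]+""")


def extract_urls(text):
    """Return the URLs found in text, in order, with trailing punctuation removed."""
    return [match.group(0).rstrip(".,;:") for match in URL_RE.finditer(text)]


def is_alive(url, timeout=10):
    """Best-effort liveness check: True if the URL can be opened without an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError, OSError):
        return False
```

Running extract_urls() over each package description and then is_alive()
over the results would give a first list of candidates; a failure may
only mean the server was down at that moment, so anything flagged still
needs a second look.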

> Check to see if a package depends on a pseudopackage, transitional (also
> dummy packages?) or other package that will be removed from the archives.
> (before, after, now) Should Debian have a way to mark packages that are
> going to be removed from the archives, pseudopackages, transitional and
> dummy packages? A common word to describe such packages may help users to
> better identify these packages and deal with them (users may want to
> remove them, developers must want to depend on other packages).

I think the word is "deprecated", but maybe you mean a formal control
file field.

> Check for bash specific pieces of shell scripts where it may cause
> problems such as in install scripts. (before, after, now)

Partly done already by lintian, a full check is hard.

> Checking for policy violations or better fits:
> Section 2.3.4 of the policy manual says:
> "Packages are not required to declare any dependencies they have on other
> packages which are marked Essential (see below), and should not do so
> unless they depend on a particular version of that package.", this should
> be checked for (before, after, now).

I think there is something on qa.debian.org that lists these.
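
For illustration, a minimal Python sketch of that check. The function
name is invented, and the set of Essential package names is passed in by
the caller; a real check would collect it from the "Essential: yes"
entries in the Packages file.

```python
def unneeded_essential_deps(depends_field, essential):
    """Return Essential packages named in a Depends field without a version constraint."""
    flagged = []
    for clause in depends_field.split(","):
        # Each comma-separated clause may list alternatives separated by "|".
        for alternative in clause.split("|"):
            alternative = alternative.strip()
            if not alternative:
                continue
            name = alternative.split("(")[0].strip()
            versioned = "(" in alternative
            if name in essential and not versioned:
                flagged.append(name)
    return flagged
```

Versioned dependencies are left alone, since policy allows those.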

> Check for packages that use old policies (before, after, now) and see if
> the policy version can be updated or what needs to be done and file bug
> reports against the package.

The check is already done by lintian, and the rest is already done
occasionally for the very old policy versions. Any work toward tightening
it up to more recent versions is all to the good.

> Check for contrib packages that can be moved to main (before, after, now).
> Check for non-free packages that can be moved (before, after, now).

Generally done by their respective maintainers, I think we do a pretty
good job here.

> From dpkg (1.10.1) unstable's changelog:
> "* Add conflict with dpkg-iasearch which intruded on our namespace." by:
> -- Wichert Akkerman <wakkerma@debian.org>  Tue,  2 Jul 2002 12:34:07
> +0200. Is this a policy violation? Did dpkg-iasearch violate a policy?

No, it's just Wichert being inconsistent (dpkg-repack intruded on that
"namespace" long ago, and need I mention debhelper?)

> Automate testing of policy musts and where approval must be met create an
> automated system for approval by people (may require authority structure
> to be created). Many parts of Debian policy say to get approval from
> debian-devel. I would like to avoid having people upload packages without
> explicit approval which an automated mechanism could check for. (after,
> now)

I'm not sure I understand this one.

> Reducing the size of the distribution & packages, cleaning up, and backing
> up:
> Look at not only gziping documentation but also compressing other files
> such as png files using pngcrush or other files using other utilities.
> (before, after, now)
> Why not bzip2 instead of gzip? New upcoming algorithms are being worked on
> and there are known deficiencies in bzip2. See the bzip2 homepage and read
> about how the author thinks that he can make some significant
> improvements. Also see http://www.compression.ca for some comparisons of
> archives and note that PPM variants compress things more. CTW is pretty
> good too, but the algorithm that bzip2 is based on is lower on the list
> for compression ratio. Using bzip2 on source files is a wishlist item for
> Debian policy. I'm arguing that it's a good idea to look at algorithms
> other than gzip, but jumping on bzip2 may be a large transition that may
> be made unnecessary by another large transition to a new compression
> format. I'm hoping to help in the development of new compression formats
> some of which should have better performance than bzip2.

It seems certain that the new source format will support bzip2'd source
packages. There are as you note performance issues, and those have been
used to shoot down suggestions to use bzip2 in binary packages in the

> Section 2.4.1 of the policy manual says:
> "only the first three components of the policy version are significant in
> the Standards-Version control field, and so either these three components
> or the all four components may be specified." As this is a may, I would
> prefer the saved space over the acknowledgement of cosmetic differences.
> If the cosmetic difference is found to cause a meaning to change then a
> higher version number will be changed.

I'm ambivalent.

> A policy for reducing the length of changelogs may help reduce package
> sizes. "Before, after, now" only after a policy has been chosen. I know
> changelogs can be needed and useful. Changelogs can also be useless and
> consume precious space, especially on minimal installations. Perhaps
> packages could have a ranking of what files in them are necessary? This
> may imply splitting the archives, but you can't split some files like
> changelogs as they are required(2.4.4) for every package.

I'd love to see some kind of a formal policy on this. I think we should
keep old Debian changelogs forever in the source package, but there is
little value to the user in most cases in the three-hundredth changelog
entry down being in a binary package. There are exceptions, though (or
maybe I'm the only one who goes trawling through ancient bits of
debhelper's changelog, I don't know). I know that some trimming is
already happening on an ad-hoc basis, and it would be useful if this
were formalized. Perhaps there should be a mark that can be placed in a
changelog, below which it is truncated when being put in a binary
package. Or perhaps changelogs could be truncated to X years back after
installation, based on the user's preferences. Just making sure that we
symlink changelogs together when possible between binary packages that
share a source package would save a lot of space.
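
As a sketch of the truncation idea only (not an existing tool), here is
a small Python function that keeps the newest N entries of a changelog.
The function name is hypothetical, and it assumes the usual layout where
each entry's header line starts in column 0 and everything else in the
entry is indented.

```python
def truncate_changelog(text, keep):
    """Keep only the first `keep` (most recent) entries of a Debian changelog."""
    kept = []
    entries = 0
    for line in text.splitlines(keepends=True):
        # Entry headers start in column 0; body and trailer lines are indented.
        if line[:1] not in ("", " ", "\t", "\n"):
            entries += 1
            if entries > keep:
                break
        kept.append(line)
    return "".join(kept).rstrip("\n") + "\n"
```

Cutting at a date or at a marker line would work the same way, with a
different test deciding where to stop.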

> Optimize ordering of files in tar archives. tar files are usually
> compressed, but if files of similar types are put closer together they can
> compress better. I am looking at a simple method using 'file' and sorting
> by 'file' type first, then looking at mime types, and then looking at
> doing some statistical testing for file information. I may also create a
> utility for using brute force to try every combination and then
> compressing them and checking for the best order. Note that this may be
> affected by concepts discussed in the gzip/bzip argument above as
> compression methods do prefer different orderings in different cases.
> (before, after, now)

Heh, interesting idea, not very debian specific.
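
A minimal Python sketch for experimenting with this. It uses the file
extension as a crude stand-in for the output of 'file', and both
function names are invented for the example. Whether a given ordering
actually compresses better has to be measured per archive and per
compressor.

```python
import io
import os
import tarfile


def order_by_type(paths):
    """Order paths so files with the same extension end up next to each other."""
    return sorted(paths, key=lambda p: (os.path.splitext(p)[1].lower(), p))


def compressed_size(files, order):
    """Size in bytes of a gzip-compressed tar holding `files` (name -> bytes) in `order`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in order:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return len(buf.getvalue())
```

Comparing compressed_size(files, list(files)) with
compressed_size(files, order_by_type(files)) shows whether grouping
helped for that particular set of files.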

> Removing unnecessary directories from package listings. Some .deb's
> contain lists of directories that they need. Even when it is not required
> that they list certain directories, they are still allowed to. (before,
> and now, but as this is a 'may' then not after)

Do you refer to debs that happen to include an empty directory that is
already on the system, like a random deb that puts programs in /usr/bin
having an empty /sbin directory?

> Detecting the 'want' for virtual packages (when many "depend" and/or
> "require" have or's, or a virtual package is provided by few packages).
> This may cause virtual packages to be either created or removed. (before,
> after, now)
> Using upx or alike for minimal installs, boot disks, base? Making it an
> option? Perhaps this could be an option integrated into apt.

It has been discussed on debian-boot in the past; there are plenty of
issues. See archives.

> Some programs use static code for things like regex expressions and
> handling tar archives. A program to go through the source code of all the
> programs (or a developer effort) may help to find common code that could
> be put into a library or that already is in a library. This could make
> packages smaller, but if we're not careful, creating new libraries could
> increase the overall installed size for a program. (before, now) An
> additional benefit would be fewer places to change code (good for
> security, good for efficiency, good for all updates). Are there any
> security issues to exporting code from packages? (This should be looked at
> whenever code's exported.)

Very cute idea.

> Searching for more ways of removing unnecessary content from debs.
> Using a thesaurus such as Aiksaurus may help to reduce the size of
> descriptions. Shorter descriptions and more clear descriptions would be a
> good project (aka laconic's good). Automated tools could help (before,
> now).

I wouldn't trust an automated tool, but neat idea.

> http://lists.debian.org/debian-mentors/1999/debian-mentors-199901/msg00051.html
> talks about putting datasets into Debian or non-free. I wonder what has
> become of this particular dataset and if there has been a policy developed
> for datasets. I would like to see astronomical, meteorological,
> geographical and other data sets easily available. If a data set is
> DFSG-free then I feel it should be put in main, but segregated somehow (in
> extra?). Data sets may require maintenance too. For example, recently new,
> more accurate data was collected about the distance certain stars are from
> our sun. When I did some more investigation, I found out that a vote to
> include a dataset section was made and it was decided to create a dataset
> section. No such section was created and the astronomical data is sitting
> with the person who made this proposal. Special handling of datasets may be
> required to reduce the impact on Debian distribution infrastructure. I
> recommend updates and distribution only be allowed through diffs or some
> other method that uses less bandwidth than is used now.

Yes, that idea seems to be stalled.

> findimagedups and other such packages could be used to search for
> duplicate or near duplicate files in Debian packages. Then 'common' packages
> which have these files can be created and/or symbolic links may be used to
> save space. (before, now) Perhaps a program that makes symbolic links to
> common files where necessary?

Beware of overfragmentation of packages though.
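
To get a feel for how much exact duplication there is, something along
these lines would do. It is a Python sketch with a made-up function
name, it assumes the packages have been unpacked under one directory,
and it only catches byte-identical files, not the near duplicates a tool
like findimagedups looks for.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Map content digest -> sorted paths, for regular files under root with identical content."""
    by_digest = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```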

> Create Debian cleanup procedure and program(s) (cruft, deborphan...), now.
> Create Debian backup procedure and program(s) (debian cleanup, cruft to
> backup, dpkg --getselections > myselections, backup config files possibly
> checking md5's which more than should be in every package), now.
> Creating a version of Debian that binds a writeable filesystem onto a read
> only filesystem (floppy writeable with a readonly CD). I would love to
> have this to carry around and run Linux on any machine with a CDROM drive
> that I could boot with. upx may be useful. A compressed filesystem for
> writing may be useful. Support for umsdos, NFS, samba, and/or mounting
> file systems, creating a file and mounting the file could be useful.
> http://www.debian.org/CD/faq/index#live-cd is something I later found. I
> would like to see more development, and official Debian development. Upon
> further investigation bootcd seems to be a start, but how much of this can
> it do? Maybe these features should be wishlist bugs, but the CD faq needs
> to be updated, and I would still like to see an official CD image.

Somebody please do this, mmmkay? IIRC I saw Robster and someone
discussing this on irc the other day.

> Should CD images be optimized for space? I saw an option to optimize CD
> images for space in Roxio's Ez-cdcreator (formerly by Adaptec).

Depends on how much slower or less portable it makes them and how much
space is gained, I imagine.

> Security, Policy and other bug stuff:
> Automated rough security audit of all source code (rats, splint & other
> programs can be used, before, after, now).
> Programs that use keyahead or mouseahead routines may be a security risk
> or cause other undesirable results. One example is my apt-get using
> readline has keyahead so if I accidentally hit enter, the enter is saved
> until the next question and then inputted. Instead I'd prefer it be
> disregarded so I can read the arbitrary (it's hard to predict the order)
> question that appears next. Mouseahead can be very dangerous if the
> program hasn't updated the interface, the user will likely have no idea
> what they will have clicked on ("it didn't work. I'll click again. What? I
> didn't select that second option."). Yes, these are probably wishlist
> bugs, but they could be a normal bug as this can affect the desired
> functionality of programs. These bugs may also need to be tagged security in
> some situations (the default password gets set by accident, etc). This may
> tie into scanning code for security vulnerabilities. (With scanning before
> and after, but this should be checked for now, especially where system
> security can be involved).

I think that in all cases except for new password prompts, it can be
successfully argued that _someone_ will be able to anticipate what prompts
are coming up and type ahead, and so that will get shot down. Maybe
modifying a terminal emulator to let typeahead be turned off, or
displayed at the bottom of the screen and cleared with a keypress, or
something, would be more effective?

> 'popularity-contest' and other methods can be very helpful in finding out
> what users are interested in seeing being developed and maintained.
> Perhaps this, archive (mirrors too?) statistics and other methods can be
> used to create a priority list for the qa group. Perhaps a system should
> be put in place to allow user input into package importance/maintenance
> priority levels. Currently I would assume that a good system would be by
> the subtype such as essential, optional...
> Campaigning for signed debs to be a must (if not already). Signed debs
> more than should be used (before, after, now).
> Campaigning for md5 lists in debs to be a must. md5 values for all files
> in packages more than should be done (before, after, now).

If the dpkg people ever get around to making this part of the package
format for real, it will happen. People seem to take malicious delight
in shooting down the idea that the current files become required by
policy since they are not perfect, while ignoring the fact that they are
very useful to a lot of people. Sigh.

> A procedure should be put in place to ensure installation starvation due
> to dependencies does not occur in the unstable distribution. (perhaps
> waiting a day for dependencies to catch up?) I feel this could be
> automated or automated better (before, after, now).

Testing already does this, and putting it in unstable would just add
another hurdle to making big changes to unstable. It's really not a good
idea, and it has been discussed before.

> Find a way to reduce the chance of bad NMU's (accidental, malicious,
> poorly done, etc.). I haven't looked into how this is done now (if at
> all), but the developer making the NMU should be warned that it's an NMU.
> It may be good to list NMU policy for the first time for an NMU by a
> specific developer and ask for confirmation. It may be good to have an
> automated system where maintainers can block NMU's except by permission of
> an authority such as the security or qa group.

Social solutions are the right ones here. We do a pretty good job.

> Joey says at http://www.debian.org/devel/website/todo#misc that security
> updates are on the same server as the signatures for the updates. This
> could be a potential security issue as if one method is exploited to
> change the files, it can be used to change the signatures at the same
> time. wyrmbait at debianplanet.org says in his article Security with apt
> (found at http://www.debianplanet.org/article.php?sid=643 ), that apt can
> be viewed as a single point of failure. While his arguments may not quite
> be thorough, he does bring up some issues of security. Why not have a
> package for keys/certificates, then have dpkg complain if a new package
> has not been correctly signed. Also packages in the archive should be
> signed by a public key that is available on many public key servers and
> available offline (on CD perhaps). Changes to the keyring packages would
> need to have the appropriate signature(s).

We just need an end-to-end package signing infrastructure with no holes,

> Packages being signed by multiple people and allowing users to assign
> trust levels (checked before installing an upgrade) to people could
> improve security.
> I would like to encourage distribution of public key server media. Having
> keys stored online lends them to potential man in the middle attacks even
> if multiple protocols are used. It's much more difficult to circumvent an
> offline signature.

debian-keyring.deb, you mean? It's on the non-US cd's anyway, and will
be on the regular cd's for sarge I hope.

> One of the reasons for the delay of the release of Woody was said to have
> been security concerns. It has also been reported (see the glibc example
> at http://www.debianplanet.org/article.php?sid=568 ) that it takes a long
> time for security patches to get through due to the compiling and testing
> on the 68k and arm architectures. I would like to bring forward the idea
> of using emulation to help speed things up. There was recently (March?) a
> patch for UAE (a 68k emulator) to support running Linux. There are also
> emulators for the arm architecture such as arcem (
> http://bugs.debian.org/cgi-bin/bugreport.cgi?archive=no\&bug=136844 for
> details ). arcem is said to be quite fast on Intel architecture. Emulation
> of old architecture brings two advantages and one disadvantage: it's
> usually faster, it's easier to get, it could have trouble being a
> completely accurate emulation of the original hardware (bugs not emulated
> or new bugs not yet found/patched).
> Many of the patches and programs found at
> http://www.theaimsgroup.com/~hlein/haqs/ can be quite useful. The programs
> can be packaged. The patches, when useful, should be added to existing
> packages or modified to make them run time options. For example the idle
> connection traffic patch for ssh may be a useful option that may be
> possible to be chosen at runtime.
> Performance issues:
> Someone mentioned the idea of ordering the startup scripts into a
> dependency tree, and have programs startup in parallel. I feel this would
> be useful for many people running many startup scripts such as myself.
> Perhaps this should be a before, after, now. If nothing else, it should be
> looked at now and a policy document regarding this may be useful. I forget
> which Debian developer I read this idea from.

Henrique de Moraes Holschuh. He has this well in hand I think.
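
The core of the idea fits in a few lines. This Python sketch is not
taken from his work; the function name and the shape of the input are
assumptions made for the example. It groups scripts into batches where
everything in one batch can start in parallel once the earlier batches
have finished.

```python
def parallel_batches(deps):
    """Group scripts into batches; each batch depends only on earlier batches.

    deps maps a script name to the set of scripts that must start before it.
    """
    remaining = {name: set(needs) for name, needs in deps.items()}
    # Scripts that appear only as prerequisites have no prerequisites of their own.
    for needs in deps.values():
        for name in needs:
            remaining.setdefault(name, set())
    batches = []
    while remaining:
        ready = sorted(name for name, needs in remaining.items() if not needs)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        batches.append(ready)
        for name in ready:
            del remaining[name]
        for needs in remaining.values():
            needs.difference_update(ready)
    return batches
```

The hard part in practice is getting correct dependency information for
every script, not the ordering itself.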

> Having package install in parallel may speed up installation. This may be
> a wishlist item for dpkg or apt.

Very unlikely. On a system fast enough to not run into CPU problems
doing this, you'd probably quickly become disk bound, and this would
just churn the disk more. It's also rather hard to do right.

> Using programs like cmix, the performance of programs may be able to be
> optimized (before and now, but not after upload as optimizing programs may
> not work desirably).
> Sometimes threads come up about performance optimization done at compile
> time. Yes, numbers have not been provided, but some numbers should be. As
> such a comparison of compilers of gcc, lc (Intel's compiler if allowed),
> tcc and any other compilers should be done. Binaries could be compiled
> with each available compiler and then checked to see which produces the
> best results in application performance, binary size and perhaps compile
> time. (Smallest binary size usually means high compile time and better
> application performance, or so I've been told.) This should be done before
> upload and now.

By all means I'd like to see some real numbers.

> Other sometimes bigger ideas:
> Should there be a method to force retirement of developers? I don't
> believe so, I believe that a new category should be created for developers
> who are not active. Why separate inactive developers? To limit the
> security risk and make managing developers easier.

It has been discussed, search for "MIA" and "emeritus" around the DPL
campaigning times.

> A restructuring of the online distribution protocol is needed. Recently in
> the Debian Weekly news this was mentioned and this has been discussed. A
> BTS location may be a good place to start putting won't fix, wishlist and
> other bug information regarding the distribution protocol(s). Personally,
> I'd like to see low server loads, compressed files, deltas, and have
> upgrade priorities visible before downloading the package/archive.
> It might be nice to make debian/watch files separately available and to
> have a watch file for all upstream sources even when it's version
> specific. It would also be nice to carry md5's for upstream sources (last
> known version of course) so when upstream sources get modified (like the
> dsniff security issue), users of watch files to grab the current source
> get a heads up that there may be something wrong.

So long as packages use pristine upstream sources, we already have such
md5's in the .dsc files.
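
As a sketch of how a watch-file user could make use of that, in Python:
read the Files field of the .dsc and compare a freshly downloaded
tarball against the recorded md5 and size. Both function names are made
up for the example, and signature checking of the .dsc itself is left
out.

```python
import hashlib


def parse_dsc_files(dsc_text):
    """Return {filename: (md5, size)} from the Files: field of a .dsc file."""
    files = {}
    in_files = False
    for line in dsc_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
        elif in_files and line.startswith(" "):
            # Continuation lines have the form " <md5> <size> <filename>".
            parts = line.split()
            if len(parts) == 3:
                md5, size, name = parts
                files[name] = (md5, int(size))
        else:
            in_files = False
    return files


def matches(data, md5, size):
    """True if data has the recorded size and md5 digest."""
    return len(data) == size and hashlib.md5(data).hexdigest() == md5
```

A mismatch does not say what changed upstream, only that the tarball is
no longer the one the package was built from.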

see shy jo
