
Debian (would like) to do list



I would like to become a Debian developer to help accomplish these tasks,
but my time is limited and I do not need to be a developer to help if some
developers pick up these tasks. Also my computing resources are limited so
projects like scanning source code and brute force "now" checking of
packages would be too time consuming without help or more resources. I
also haven't done all the checking that may be necessary, and thus some of
these tasks may be irrelevant or already underway. Having three exams
coming up, this may not be the best time for me to discuss anything, but
I got tired of waiting and, as you can see, my list is growing long.

I'm not sure what the appropriate forum is for discussing my list, but
debian-user seemed to be the best fit as I am a user.

First some clarification. When I say "before, after, now" I mean that the
uploader should(?) check this before uploading (perhaps this can be
automated), the archive maintainers or upload procedure should(?) check
this after it has been uploaded (perhaps this can be automated), and this
should(?) be checked for now to catch any violations that have been missed
(perhaps this can be automated).


Debian related tasks:

QA and improvements:
Continue the spell check campaign and look at improving it (before, after,
now).

Add grammar checking (before, after, now).

Add watch files to as many package sources (or diffs) as possible (before,
now, after may be unnecessarily complex).

Add Trove descriptions to packages and source (before, after, now). This
would be nice. Perhaps this may help improve the trove format.

Why are packages removed from the archives? There are many reasons, but
sometimes it's hard to find out. There should be some way of recording
this especially for those who track unstable on an infrequent basis.
Perhaps an entry into the Debian BTS under the package name?

Packages should purge configuration files before purging directories
otherwise empty directories can be left behind. (now)

Scan package descriptions, documentation and other package related
areas for URL's and check whether they are still active. (before, now)
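
A minimal sketch of such a URL scan, assuming Python and a deliberately
rough regular expression. The names extract_urls and is_active are made
up for illustration, and a real check would need rate limiting and retry
handling before being run over the whole archive:

```python
import re
import urllib.error
import urllib.request

# Rough pattern: an http(s) scheme, then everything up to whitespace,
# a quote, or a closing bracket. This is an assumption, not a full URL grammar.
URL_RE = re.compile(r"""https?://[^\s<>"')\]]+""")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def is_active(url, timeout=10):
    """Return True if the server answers a HEAD request without an error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

extract_urls can be run over a description or a README, and is_active
called on each result to build a list of dead links to report.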

Check to see if a package depends on a pseudopackage, transitional (also
dummy packages?) or other package that will be removed from the archives.
(before, after, now) Should Debian have a way to mark packages that are
going to be removed from the archives, pseudopackages, transitional and
dummy packages? A common word to describe such packages may help users to
better identify these packages and deal with them (users may want to
remove them, developers may want to depend on other packages).

Check for bash specific pieces of shell scripts where it may cause
problems such as in install scripts. (before, after, now)
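
A rough sketch of such a check, assuming Python. It only looks at
scripts that start with #!/bin/sh and only knows a handful of bash-only
constructs. The pattern list and the name find_bashisms are illustrative
assumptions, nowhere near a complete test:

```python
import re

# A few well-known bash-only constructs; an illustrative, incomplete list.
BASHISMS = [
    (re.compile(r"(^|\s)\[\[\s"), "[[ ... ]] test"),
    (re.compile(r"^\s*function\s+\w+"), "'function' keyword"),
    (re.compile(r"^\s*source\s+\S"), "'source' instead of '.'"),
    (re.compile(r"\$\{\w+/"), "${var/pattern/replacement} substitution"),
]


def find_bashisms(script_text):
    """Return (line number, description) pairs for a #!/bin/sh script."""
    lines = script_text.splitlines()
    if not lines or not re.match(r"#!\s*/bin/sh\b", lines[0]):
        return []  # not a /bin/sh script, so bash syntax is not a problem here
    findings = []
    for number, line in enumerate(lines, start=1):
        if line.strip().startswith("#"):
            continue  # skip the shebang and comment lines
        for pattern, description in BASHISMS:
            if pattern.search(line):
                findings.append((number, description))
    return findings
```

Run over the maintainer scripts of a package, any non-empty result is a
candidate for a closer look rather than proof of a bug.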



Checking for policy violations or better fits:
Section 2.3.4 of the policy manual says:
"Packages are not required to declare any dependencies they have on other
packages which are marked Essential (see below), and should not do so
unless they depend on a particular version of that package.", this should
be checked for (before, after, now).
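
As a sketch of how this could be checked, assuming Python and assuming
the caller already has the Depends field as a string and a set of
Essential package names. The function name and the sample set in the
usage note are hypothetical:

```python
def unversioned_essential_deps(depends, essential):
    """Return Essential packages that a Depends field names without a version.

    depends   -- the raw Depends field, e.g. "libc6 (>= 2.2.5), bash"
    essential -- a set of package names marked Essential (supplied by caller)
    """
    flagged = []
    for group in depends.split(","):
        for alternative in group.split("|"):
            alternative = alternative.strip()
            if not alternative:
                continue
            name = alternative.split("(")[0].strip()
            has_version = "(" in alternative
            if name in essential and not has_version:
                flagged.append(name)
    return flagged
```

For example, with essential = {"bash", "dpkg", "sed"}, the field
"libc6 (>= 2.2.5), bash, dpkg (>= 1.10), perl | sed" would flag bash and
sed but not dpkg, since dpkg carries a version.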

Check for packages that use old policies (before, after, now) and see if
the policy version can be updated or what needs to be done and file bug
reports against the package.
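
A small sketch of the comparison, assuming Python. Only the first three
components of Standards-Version are compared, since those are the
significant ones. The function names are illustrative, and the version
numbers in the usage note are examples rather than a statement about
the current policy:

```python
def significant(version):
    """Return the first three components of a Standards-Version as ints."""
    return tuple(int(part) for part in version.strip().split(".")[:3])


def is_outdated(package_version, current_version):
    """True if the package declares an older policy than current_version."""
    return significant(package_version) < significant(current_version)
```

Under these assumptions is_outdated("3.5.2", "3.5.6.1") is true, while
"3.5.6.0" against "3.5.6.1" is not flagged because the fourth component
is ignored.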

Check for contrib packages that can be moved to main (before, after, now).

Check for non-free packages that can be moved (before, after, now).

From dpkg (1.10.1) unstable's changelog:
"* Add conflict with dpkg-iasearch which intruded on our namespace." by:
-- Wichert Akkerman <wakkerma@debian.org>  Tue,  2 Jul 2002 12:34:07
+0200. Is this a policy violation? Did dpkg-iasearch violate a policy?

Automate testing of policy musts and, where approval is required, create an
automated system for approval by people (may require an authority structure
to be created). Many parts of Debian policy say to get approval from
debian-devel. I would like to avoid having people upload packages without
explicit approval which an automated mechanism could check for. (after,
now)



Reducing the size of the distribution & packages, cleaning up, and backing
up:
Look at not only gzipping documentation but also compressing other files
such as png files using pngcrush or other files using other utilities.
(before, after, now)

Why not bzip2 instead of gzip? New upcoming algorithms are being worked on
and there are known deficiencies in bzip2. See the bzip2 homepage and read
about how the author thinks that he can make some significant
improvements. Also see http://www.compression.ca for some comparisons of
archives and note that PPM variants compress things more. CTW is pretty
good too, but the algorithm that bzip2 is based on is lower on the list
for compression ratio. Using bzip2 on source files is a wishlist item for
Debian policy. I'm arguing that it's a good idea to look at algorithms
other than gzip, but jumping on bzip2 may be a large transition that may
be made unnecessary by another large transition to a new compression
format. I'm hoping to help in the development of new compression formats
some of which should have better performance than bzip2.

Section 2.4.1 of the policy manual says:
"only the first three components of the policy version are significant in
the Standards-Version control field, and so either these three components
or the all four components may be specified." As this is a may, I would
prefer the saved space over the acknowledgement of cosmetic differences.
If a cosmetic difference is found to change the meaning, then a
higher component of the version number will be changed.

A policy for reducing the length of changelogs may help reduce package
sizes. "Before, after, now" only after a policy has been chosen. I know
changelogs can be needed and useful. Changelogs can also be useless and
consume precious space, especially on minimal installations. Perhaps
packages could have a ranking of what files in them are necessary? This
may imply splitting the archives, but you can't split some files like
changelogs as they are required (2.4.4) for every package.

Optimize ordering of files in tar archives. tar files are usually
compressed, but if files of similar types are put closer together they can
compress better. I am looking at a simple method using 'file' and sorting
by 'file' type first, then looking at mime types, and then looking at
doing some statistical testing for file information. I may also create a
utility for using brute force to try every combination and then
compressing them and checking for the best order. Note that this may be
affected by concepts discussed in the gzip/bzip argument above as
compression methods do prefer different orderings in different cases.
(before, after, now)
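
A sketch of both ideas, assuming Python, gzip as the compressor, and
the file extension as a crude stand-in for the output of 'file'. All
function names are made up for illustration. The brute force variant
tries every permutation, so it is only usable on a handful of files:

```python
import gzip
import io
import itertools
import os
import tarfile


def tar_gz_size(files, order):
    """Build an in-memory tar of files (name -> bytes) in the given order
    and return the size of the gzip-compressed result."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in order:
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            archive.addfile(info, io.BytesIO(files[name]))
    return len(gzip.compress(buffer.getvalue()))


def order_by_type(files):
    """Group names by extension so similar files sit next to each other."""
    return sorted(files, key=lambda name: (os.path.splitext(name)[1], name))


def best_order_brute_force(files):
    """Try every ordering and return the one that compresses smallest."""
    return list(min(itertools.permutations(files),
                    key=lambda order: tar_gz_size(files, order)))
```

Comparing tar_gz_size for the original order, order_by_type and (for
small inputs) best_order_brute_force gives a first idea of how much
there is to gain for a given package.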

Removing unnecessary directories from package listings. Some .deb's
contain lists of directories that they need. Even when it is not required
that they list certain directories, they are still allowed to. (before,
and now, but as this is a 'may' then not after)

Detecting the 'want' for virtual packages (when many "depend" and/or
"require" have or's, or a virtual package is provided by few packages).
This may cause virtual packages to be either created or removed. (before,
after, now)
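
One way to look for such candidates, sketched in Python under the
assumption that the Depends fields of all packages are available as
plain strings. The function name and the threshold are hypothetical:

```python
from collections import Counter


def common_alternatives(depends_fields, min_count=2):
    """Count how often the same group of or'ed alternatives appears.

    Returns (group, count) pairs for groups seen at least min_count times;
    frequently repeated groups are candidates for a virtual package.
    """
    counts = Counter()
    for field in depends_fields:
        for group in field.split(","):
            names = [alt.split("(")[0].strip() for alt in group.split("|")]
            names = sorted(name for name in names if name)
            if len(names) > 1:
                counts[tuple(names)] += 1
    return [(group, n) for group, n in counts.most_common() if n >= min_count]
```

The reverse check, a virtual package provided by only a few packages,
would need the Provides fields as well and is not covered here.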

Using upx or the like for minimal installs, boot disks, base? Making it an
option? Perhaps this could be an option integrated into apt.

Some programs use static code for things like regular expressions and
handling tar archives. A program to go through the source code of all the
programs (or a developer effort) may help to find common code that could
be put into a library or that already is in a library. This could make
packages smaller, but if we're not careful, creating new libraries could
increase the overall installed size for a program. (before, now) An
additional benefit would be fewer places to change code (good for
security, good for efficiency, good for all updates). Are there any
security issues to exporting code from packages? (This should be looked at
whenever code is exported.)

Searching for more ways of removing unnecessary content from debs.

Using a thesaurus such as Aiksaurus may help to reduce the size of
descriptions. Shorter and clearer descriptions would be a
good project (aka laconic's good). Automated tools could help (before,
now).

http://lists.debian.org/debian-mentors/1999/debian-mentors-199901/msg00051.html
talks about putting datasets into Debian or non-free. I wonder what has
become of this particular dataset and if there has been a policy developed
for datasets. I would like to see astronomical, meteorological,
geographical and other data sets easily available. If a data set is
DFSG-free then I feel it should be put in main, but segregated somehow (in
extra?). Data sets may require maintenance too. For example, recently new,
more accurate data was collected about the distance certain stars are from
our sun. When I did some more investigation, I found out that a vote to
include a dataset section was made and it was decided to create a dataset
section. No such section was created and the astronomical data is sitting
with the person who made this proposal. Special handling of datasets may be
required to reduce the impact on Debian distribution infrastructure. I
recommend updates and distribution only be allowed through diffs or some
other method that uses less bandwidth than is used now.

findimagedups and other such packages could be used to search for
duplicate or near duplicate files in Debian packages. Then 'common' packages
which have these files can be created and/or symbolic links may be used to
save space. (before, now) Perhaps a program that makes symbolic links to
common files where necessary?
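
For exact duplicates a content hash is enough. The following is a
minimal sketch, assuming Python and a directory tree of unpacked
packages; the name find_duplicates is made up. Near duplicates, such as
the similar images findimagedups looks for, need a different technique
and are not handled here:

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    by_digest = defaultdict(list)
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # ignore symlinks and special files
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a candidate for moving into a 'common' package
or for replacing all but one copy with a symbolic link.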

Create Debian cleanup procedure and program(s) (cruft, deborphan...), now.

Create Debian backup procedure and program(s) (debian cleanup, cruft to
backup, dpkg --get-selections > myselections, backup config files possibly
checking md5's, which ought to be mandatory in every package), now.

Creating a version of Debian that binds a writeable filesystem onto a read
only filesystem (floppy writeable with a readonly CD). I would love to
have this to carry around and run Linux on any machine with a CDROM drive
that I could boot with. upx may be useful. A compressed filesystem for
writing may be useful. Support for umsdos, NFS, samba, and/or mounting
file systems, creating a file and mounting the file could be useful.
http://www.debian.org/CD/faq/index#live-cd is something I later found. I
would like to see more development, and official Debian development. Upon
further investigation bootcd seems to be a start, but how much of this can
it do? Maybe these features should be wishlist bugs, but the CD faq needs
to be updated, and I would still like to see an official CD image.

Should CD images be optimized for space? I saw an option to optimize CD
images for space in Roxio's Ez-cdcreator (formerly by Adaptec).


Security, Policy and other bug stuff:
Automated rough security audit of all source code (rats, splint & other
programs can be used, before, after, now).

Programs that use keyahead or mouseahead routines may be a security risk
or cause other undesirable results. One example is my apt-get using
readline has keyahead so if I accidentally hit enter, the enter is saved
until the next question and then inputted. Instead I'd prefer it be
disregarded so I can read the arbitrary (it's hard to predict the order)
question that appears next. Mouseahead can be very dangerous if the
program hasn't updated the interface, the user will likely have no idea
what they will have clicked on ("it didn't work. I'll click again. What? I
didn't select that second option."). Yes, these are probably wishlist
bugs, but they could be a normal bug as this can affect the desired
functionality of programs. These bugs may also need to be tagged security in
some situations (the default password gets set by accident, etc). This may
tie into scanning code for security vulnerabilities. (With scanning before
and after, but this should be checked for now, especially where system
security can be involved).

'popularity-contest' and other methods can be very helpful in finding out
what users are interested in seeing being developed and maintained.
Perhaps this, archive (mirrors too?) statistics and other methods can be
used to create a priority list for the qa group. Perhaps a system should
be put in place to allow user input into package importance/maintenance
priority levels. Currently I would assume that a good system would be by
the subtype such as essential, optional...

Campaigning for signed debs to be a must (if not already). Signing debs
ought to be stronger than a "should" (before, after, now).

Campaigning for md5 lists in debs to be a must. Providing md5 values for all
files in packages ought to be stronger than a "should" (before, after, now).
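
To show what such a list makes possible, here is a sketch in Python
that checks files against a list in the usual md5sum format (checksum,
whitespace, path relative to the root). The function name and the
"changed"/"missing" labels are assumptions for illustration:

```python
import hashlib
import os


def verify_md5sums(md5sums_text, root):
    """Check files under root against lines of '<md5>  <relative path>'.

    Returns (path, problem) pairs, where problem is 'changed' or 'missing'.
    """
    problems = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(None, 1)
        relative_path = relative_path.strip()
        try:
            with open(os.path.join(root, relative_path), "rb") as handle:
                actual = hashlib.md5(handle.read()).hexdigest()
        except OSError:
            problems.append((relative_path, "missing"))
            continue
        if actual != expected.lower():
            problems.append((relative_path, "changed"))
    return problems
```

The same routine could serve the backup idea above: only files that
come back as changed need to be saved.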

A procedure should be put in place to ensure installation starvation due
to dependencies does not occur in the unstable distribution. (perhaps
waiting a day for dependencies to catch up?) I feel this could be
automated or automated better (before, after, now).

Find a way to reduce the chance of bad NMU's (accidental, malicious,
poorly done, etc.). I haven't looked into how this is done now (if at
all), but the developer making the NMU should be warned that it's an NMU.
It may be good to list NMU policy for the first time for an NMU by a
specific developer and ask for confirmation. It may be good to have an
automated system where maintainers can block NMU's except by permission of
an authority such as the security or qa group.

Joey says at http://www.debian.org/devel/website/todo#misc that security
updates are on the same server as the signatures for the updates. This
could be a potential security issue as if one method is exploited to
change the files, it can be used to change the signatures at the same
time. wyrmbait at debianplanet.org says in his article Security with apt
(found at http://www.debianplanet.org/article.php?sid=643 ), that apt can
be viewed as a single point of failure. While his arguments may not quite
be thorough, he does bring up some issues of security. Why not have a
package for keys/certificates, then have dpkg complain if a new package
has not been correctly signed? Also packages in the archive should be
signed by a public key that is available on many public key servers and
available offline (on CD perhaps). Changes to the keyring packages would
need to have the appropriate signature(s).

Packages being signed by multiple people and allowing users to assign
trust levels (checked before installing an upgrade) to people could
improve security.

I would like to encourage distribution of public key server media. Having
keys stored online exposes them to potential man-in-the-middle attacks even
if multiple protocols are used. It's much more difficult to circumvent an
offline signature.

One of the reasons for the delay of the release of Woody was said to have
been security concerns. It has also been reported (see the glibc example
at http://www.debianplanet.org/article.php?sid=568 ) that it takes a long
time for security patches to get through due to the compiling and testing
on the 68k and arm architectures. I would like to bring forward the idea
of using emulation to help speed things up. There was recently (March?) a
patch for UAE (a 68k emulator) to support running Linux. There are also
emulators for the arm architecture such as arcem (
http://bugs.debian.org/cgi-bin/bugreport.cgi?archive=no\&bug=136844 for
details ). arcem is said to be quite fast on Intel architecture. Emulation
of old architectures brings two advantages and one disadvantage: it's
usually faster and it's easier to get, but it could have trouble being a
completely accurate emulation of the original hardware (bugs not emulated
or new bugs not yet found/patched).

Many of the patches and programs found at
http://www.theaimsgroup.com/~hlein/haqs/ can be quite useful. The programs
can be packaged. The patches, when useful, should be added to existing
packages or modified to make them run time options. For example the idle
connection traffic patch for ssh may be a useful option that could be
chosen at runtime.



Performance issues:
Someone mentioned the idea of ordering the startup scripts into a
dependency tree, and having programs start up in parallel. I feel this would
be useful for many people running many startup scripts such as myself.
Perhaps this should be a before, after, now. If nothing else, it should be
looked at now and a policy document regarding this may be useful. I forget
which Debian developer I read this idea from.
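
The dependency tree idea can be sketched with a topological sort. This
is only an illustration in Python (3.9 or later, for graphlib); the
function name and the script names in the usage note are hypothetical,
and a real implementation would also have to start and supervise the
scripts:

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group scripts into batches that could be started in parallel.

    dependencies maps each script to the set of scripts it must wait for.
    Every script in a batch depends only on scripts in earlier batches.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For example, if networking and syslog both wait for mountall, and ssh
waits for networking and syslog, the batches are mountall, then
networking and syslog together, then ssh.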

Having packages install in parallel may speed up installation. This may be
a wishlist item for dpkg or apt.

Should CD images be optimized for speed? I saw an option to optimize CD
images for creation speed in Roxio's Ez-cdcreator (formerly by Adaptec).
Also speed of installation or other reads of the CD. Seek time might also
be a consideration when choosing what order to put data on CD's.

Using programs like cmix, it may be possible to optimize the performance
of programs (before and now, but not after upload as optimizing programs may
not work as desired).

Sometimes threads come up about performance optimization done at compile
time. Yes, numbers have not been provided, but some numbers should be. As
such, a comparison of gcc, icc (Intel's compiler if allowed),
tcc and any other compilers should be done. Binaries could be compiled
with each available compiler and then checked to see which produces the
best results in application performance, binary size and perhaps compile
time. (Smallest binary size usually means high compile time and better
application performance, or so I've been told.) This should be done before
upload and now.


Other sometimes bigger ideas:
Should there be a method to force retirement of developers? I don't
believe so, I believe that a new category should be created for developers
who are not active. Why separate inactive developers? To limit the
security risk and make managing developers easier.

A restructuring of the online distribution protocol is needed. Recently in
the Debian Weekly News this was mentioned and this has been discussed. A
BTS location may be a good place to start putting won't fix, wishlist and
other bug information regarding the distribution protocol(s). Personally,
I'd like to see low server loads, compressed files, deltas, and have
upgrade priorities visible before downloading the package/archive.

It might be nice to make debian/watch files separately available and to
have a watch file for all upstream sources even when it's version
specific. It would also be nice to carry md5's for upstream sources (last
known version of course) so when upstream sources get modified (like the
dsniff security issue), users of watch files to grab the current source
get a heads up that there may be something wrong.

Support for installing Debian via a netboot/bootp by distributing an
official netboot image.

A comparison of xwindows terminals (or is it terminal emulators?) is
desirable. xvt seems to have a smaller footprint than rxvt which, I
thought, was supposed to be a reduced xvt.
http://dickey.his.com/xterm/xterm.faq.html has some starting information.
This would be useful for creating a small RAM xwindows install.


Other related projects that aren't Debian specific:
RATS for GNU assembly (note: intel2gas) would be useful if it existed,
but it doesn't, yet.

An open source grammar checker (not EBNF or the like) doesn't seem to exist.
Openoffice lacks a grammar checker and does not plan to add one. A grammar
checker is a major proofing tool that would be extremely useful to many
people. I did find one open source grammar checker called Link Grammar
http://www.link.cs.cmu.edu/link/ . I disagree that English evolves too
fast for a static grammar checker to be created, as many in the
commercial world have created one.

Update the database of 'file'. This would be useful for my projects looking at
reordering files in tar archives and other compression projects of mine.
(This may have to be a Debian thing as I don't see updates to the database
very often.)



Other related projects (to those discussed):
I'm working on some compression algorithms (Charles Bloom at
http://www.cbloom.com and xiph have some starting work of what I can do).
I believe I can improve existing compression. I don't have much time as
I'm a full time student and I need money to pay rent. I will be graduating
with a computer science degree in December.

     Drew Daniels

PS: Any help at finding me a good job would be appreciated (contract or
otherwise). A reasonable version of my resume is available at
http://home.cc.umanitoba.ca/~umdanie8/resume.html

