
Re: RFC: Bug handling policy



Hi Ben,

Ack on everything below; a few comments inline.


On Sat, 17 Oct 2009 17:14:54 +0100
Ben Hutchings <ben@decadent.org.uk> wrote:

> I've drafted a policy based on what I believe to be best practice.
> Please comment - is anything wrong, or anything missing?  I also left
> some questions in square brackets.
> 
> Ben.
> 
> ---
> 
> 1. Required information
> 
> Submitters are expected to run reportbug or other tool that runs our
> 'bug' script under the kernel version in question.  The response to
> reports without this information should be a request to follow-up
> using reportbug.  If we do not receive this the bug may be closed
> [after how long?].

As Dann said, 1 month seems reasonable.
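
To make the timeout concrete, here is a rough sketch of the check a triager would apply. The function name is made up, and treating "1 month" as 30 days is my assumption; nothing in the BTS does this for us.

```python
from datetime import date, timedelta

# Assumption: "1 month" is approximated as 30 days.
MOREINFO_TIMEOUT = timedelta(days=30)


def may_close(requested_on: date, today: date, submitter_replied: bool) -> bool:
    """Return True if a report still lacking reportbug output may be closed.

    requested_on is the day we asked the submitter to follow up.
    """
    if submitter_replied:
        return False
    return today - requested_on >= MOREINFO_TIMEOUT
```

For example, a request sent on 17 Oct 2009 with no reply would make the bug eligible for closing from 16 Nov 2009 onwards.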

> 
> Exceptions:
> * If the kernel does not boot or is very unstable, instead of the
>   usual system information we need the console messages in a clear
>   photograph or via serial console or netconsole (or even retyping as a
>   last resort).  Ask the submitter to remove 'quiet' and add
>   'vga=6' (where applicable) to the kernel command line.

Documentation should be available telling users (or pointing them to
docs about) how to capture output via serial console, netconsole, etc.
It should presumably live at
http://wiki.debian.org/DebianKernelReportingBugs.  When users don't
provide enough information, they can simply be pointed there (ideally
with an anchor that takes them directly to the correct section).
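
The 'quiet'/'vga=6' request quoted above is the sort of thing that page could spell out.  As a small sketch of the transformation we are asking submitters to make (the function name is hypothetical, and leaving an existing vga= setting alone is my assumption, not something the draft says):

```python
def debug_cmdline(cmdline: str, add_vga: bool = True) -> str:
    """Drop 'quiet' and, where applicable, append 'vga=6'."""
    # Remove every standalone 'quiet' argument.
    args = [arg for arg in cmdline.split() if arg != "quiet"]
    # Assumption: if the user already set a vga= mode, keep it as is.
    if add_vga and not any(arg.startswith("vga=") for arg in args):
        args.append("vga=6")
    return " ".join(args)
```

So "root=/dev/sda1 ro quiet" would become "root=/dev/sda1 ro vga=6".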

For users who are willing to do it, documentation describing how to
bisect would also be quite helpful for diagnosing such critical bugs.
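
For readers unfamiliar with the idea, bisecting is just a binary search over an ordered list of versions.  A generic sketch follows; it is not git bisect itself, and is_bad stands in for the user booting and testing a given build.  It assumes the first version is known good, the last is known bad, and every version after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad version in an ordered list of versions.

    Assumes versions[0] is good and versions[-1] is bad.
    """
    lo, hi = 0, len(versions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the first bad version is at mid or earlier
        else:
            lo = mid  # the first bad version is after mid
    return versions[hi]
```

The point worth stressing to users is that the number of test boots grows only with the logarithm of the number of candidate versions.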

I'd also recommend explicitly disallowing photographs of oopses, as
they tend to miss a lot of information that's scrolled off the screen.
Even if the entire oops is included, it may be an oops that resulted
from an earlier oops (causing someone to chase a non-bug).


> * If the report is relaying information about a bug acknowledged
>   upstream, we do not need system information but we do need specific
>   references (bugzilla.kernel.org or git commit id).
> * If the bug is clearly not hardware-specific (e.g. packaging error),
>   we do not need system information.
> 
> 2. Severities
> 
> Many submitters believe that their bug meets one of the following
> criteria for high severity.  We interpret them as follows and will
> downgrade as appropriate:
> 
> 'critical: makes unrelated software on the system (or the whole
> system) break...'
>    The bug must make the kernel unbootable or unstable on common
>    hardware or all systems that a specific flavour is supposed to
>    support.  There is no 'unrelated software' since everything
>    depends on the kernel.
> 
> 'grave: makes the package in question unusable or mostly so...'
>    If the kernel is unusable, this already qualifies as critical.
> 
> [Alternately: given that the user can normally reboot into an earlier
> kernel version, does that mean the bug is 'grave', not 'critical'?]

No.  Rebooting into an earlier kernel means that the user ends up with
known security holes.  That should never be something that's encouraged.

> 
> 'grave: ...or causes data loss...'
>    We exclude loss of data in memory due to a crash.  Only corruption
>    of data in storage or communication, or silent failure to write
>    data, qualifies.

It happens rarely, but it does happen - you might want to also mention
hardware damage here.  Examples include overheating due to ACPI bugs
where fans don't get turned on, and filesystems trashing flash memory
through numerous writes to the same area.

> 
> 3. Tagging
> 
> We do not use user-tags, but in order to aid bug triage we should:
> 
> * Add 'moreinfo' whenever we are waiting for a response from the
>   submitter and remove it when we are not.
> * Add 'upstream', 'fixed-upstream', 'patch', 'help' where appropriate
> * Not add 'unreproducible', since bugs are commonly hardware-dependent
> 
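
For the tagging rules above, the commands sent to the BTS control address are simple enough to generate.  A minimal sketch, with made-up helper names and a made-up bug number in the example:

```python
from typing import Iterable


def tag_command(bug: int, tag: str, add: bool = True) -> str:
    """Build one BTS 'tags' command that adds or removes a single tag."""
    sign = "+" if add else "-"
    return f"tags {bug} {sign} {tag}"


def control_message(commands: Iterable[str]) -> str:
    """Join commands into a control message body, ended by 'thanks'."""
    return "\n".join(list(commands) + ["thanks"]) + "\n"
```

For example, tag_command(123456, "moreinfo") gives "tags 123456 + moreinfo", and passing add=False gives the matching removal once the submitter has replied.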
> 4. Analysis by maintainers
> 
> Generally we should not expect to be able to reproduce bugs without
> having similar hardware.  We should consider:
> 
> * Searching bugzilla.kernel.org (including closed bugs) or other
>   relevant bug tracker
> * Searching kernel mailing lists (of the many archives,
>   http://news.gmane.org seems to suck least)

http://patchwork.kernel.org/ is pretty awesome, too.  Useful when gmane
mangles a patch (which it has been known to do).


> * Viewing git commit logs for relevant source files
>   - In case of a regression, from the known good to the bad version
>   - In other cases, from the bad version forwards, in case the bug
>     has been fixed since
> * Searching kerneloops.org for similar oopses
> * Matching the machine code and registers in an 'oops' against the
>   source and deducing how the impossible happened (this doesn't work
>   that often but when it does you look like a genius ;-)
> 
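
The two commit-log cases above map directly onto git revision ranges.  A small sketch of building the command line; the helper name is hypothetical, and the version tags and path in the example are only illustrative:

```python
from typing import List, Optional


def git_log_args(path: str, bad: str, good: Optional[str] = None) -> List[str]:
    """Build a 'git log' command for the commits worth reading.

    Regression: commits between the known good and the bad version.
    Otherwise: commits after the bad version, in case of a later fix.
    """
    rev_range = f"{good}..{bad}" if good else f"{bad}..HEAD"
    return ["git", "log", "--oneline", rev_range, "--", path]
```

For a regression between v2.6.26 and v2.6.30 in some driver directory, this yields "git log --oneline v2.6.26..v2.6.30 -- <path>"; the resulting list can be handed to subprocess.run inside a kernel tree.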
> 5. Testing by submitter
> 
> Depending on the technical sophistication of the submitter and the
> service requirements of the system in question (e.g. whether it's a
> production server) we can request one or more of the following:
> 
> * Gathering more information passively (e.g. further logging,
>   reporting contents of files in procfs or sysfs)
> * Upgrading to the current stable/stable-proposed-updates/
>   stable-security version, if it includes a fix for a similar bug
> * Adding debug or fallback options to the kernel command line or
>   module parameters
> * Installing the unstable [or backports?] version temporarily
> * Rebuilding and installing the kernel with a specific patch added
>   [I think we should add a script to the source to make this easier]
> 
> When a bug occurs in what upstream considers the current or previous
> stable release, and we cannot fix it, we ask the submitter to report
> it upstream at bugzilla.kernel.org under a specific Product and
> Component, and to tell us the upstream bug number.  We do not report
> bugs directly because follow-up questions from upstream need to go to
> the submitter, not to us.  Given the upstream bug number, we mark the
> bug as forwarded. bts-link then updates its status.
> 
> 6. Keeping bugs separate
> 
> Many submitters search for a characteristic error message and treat
> this as indicating a specific bug.  This can lead to many 'me too'
> follow-ups where, for example, the message indicates a driver bug and
> the second submitter is using a different driver from the original
> submitter.  We should try to respond to such a follow-up quickly,
> requesting a separate bug report.  Otherwise the original report is
> likely to turn into a mess of conflicting information about two or
> more different bugs.
> 

Of course, this may happen anyway if it's not handled quickly enough.
What should be done then?  Cloning means that each clone still carries
lots of unrelated reports.  Perhaps the process should be to close that
bug and open new bugs with the relevant information in each?


> Where the original report describes more than one bug ('...and other
> thing...'), we should clone it and deal with each separately.
> 
> 7. Applying patches
> 
> Patches should normally be reviewed and accepted by the relevant
> upstream maintainer (aside from necessary adjustments for an older
> kernel version) before being applied.
> 
> 8. Talking to submitters
> 
> We should always be polite to submitters.  Not only is this implied by
> the Social Contract, but it is likely to lead to a faster resolution
> of the bug.  If a submitter overrated the severity, quietly downgrade
> it. If a submitter has done something stupid, request that they undo
> that and report back.  'Sorry', and 'please' make a big difference in
> tone.
> 
