
RFC: Bug handling policy



I've drafted a policy based on what I believe to be best practice.
Please comment - is anything wrong, or anything missing?  I also left
some questions in square brackets.

Ben.

---

1. Required information

Submitters are expected to run reportbug or another tool that runs our
'bug' script under the kernel version in question.  The response to
reports without this information should be a request to follow up using
reportbug.  If we do not receive this, the bug may be closed [after how
long?].

Exceptions:
* If the kernel does not boot or is very unstable, instead of the usual
  system information we need the console messages in a clear photograph
  or via serial console or netconsole (or even retyping as a last
  resort).  Ask the submitter to remove 'quiet' and add 'vga=6' (where
  applicable) to the kernel command line.
* If the report is relaying information about a bug acknowledged
  upstream, we do not need system information but we do need specific
  references (bugzilla.kernel.org or git commit id).
* If the bug is clearly not hardware-specific (e.g. packaging error), we
  do not need system information.
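
The command-line change in the first exception can be sketched as a
small string transformation.  This is only an illustration: the function
name and the sample command line are hypothetical, and in practice the
submitter would edit the line in the boot loader by hand.

```python
def debug_cmdline(cmdline, add_vga=True):
    """Return a kernel command line with 'quiet' removed and 'vga=6' added."""
    # Drop 'quiet' so that boot messages reach the console.
    params = [p for p in cmdline.split() if p != "quiet"]
    # 'vga=6' only applies on some systems, so it is optional here;
    # an existing vga= setting is replaced rather than duplicated.
    if add_vga:
        params = [p for p in params if not p.startswith("vga=")]
        params.append("vga=6")
    return " ".join(params)


# Hypothetical command line, as a submitter might see it in the boot menu.
print(debug_cmdline("root=/dev/sda1 ro quiet"))
```

Passing add_vga=False covers the 'where applicable' case, leaving any
existing video mode setting alone.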

2. Severities

Many submitters believe that their bug meets one of the following
criteria for high severity.  We interpret them as follows and will
downgrade as appropriate:

'critical: makes unrelated software on the system (or the whole system)
break...'
   The bug must make the kernel unbootable or unstable on common
   hardware or all systems that a specific flavour is supposed to
   support.  There is no 'unrelated software' since everything
   depends on the kernel.

'grave: makes the package in question unusable or mostly so...'
   If the kernel is unusable, this already qualifies as critical.

[Alternately: given that the user can normally reboot into an earlier
kernel version, does that mean the bug is 'grave', not 'critical'?]

'grave: ...or causes data loss...'
   We exclude loss of data in memory due to a crash.  Only corruption
   of data in storage or communication, or silent failure to write data,
   qualifies.

3. Tagging

We do not use user-tags, but in order to aid bug triage we should:

* Add 'moreinfo' whenever we are waiting for a response from the
  submitter and remove it when we are not.
* Add 'upstream', 'fixed-upstream', 'patch', 'help' where appropriate
* Not add 'unreproducible', since bugs are commonly hardware-dependent
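
As a rough sketch of the 'moreinfo' rule, the following builds the
'tags' command that would be sent to the BTS control address.  The
helper name and the bug number are made up for illustration.

```python
def moreinfo_command(bug, waiting):
    """Build the BTS control command that adds or removes 'moreinfo'."""
    # '+' adds the tag while we wait for the submitter,
    # '-' removes it once they have replied.
    sign = "+" if waiting else "-"
    return "tags %d %s moreinfo" % (bug, sign)


# 123456 is a placeholder bug number.
print(moreinfo_command(123456, waiting=True))
print(moreinfo_command(123456, waiting=False))
```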

4. Analysis by maintainers

Generally we should not expect to be able to reproduce bugs without
having similar hardware.  We should consider:

* Searching bugzilla.kernel.org (including closed bugs) or another
  relevant bug tracker
* Searching kernel mailing lists (of the many archives,
  http://news.gmane.org seems to suck least)
* Viewing git commit logs for relevant source files
  - In case of a regression, from the known good to the bad version
  - In other cases, from the bad version forwards, in case the bug
    has been fixed since
* Searching kerneloops.org for similar oopses
* Matching the machine code and registers in an 'oops' against the
  source and deducing how the impossible happened (this doesn't work
  that often but when it does you look like a genius ;-)
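
The two ways of viewing commit logs listed above might translate into
git revision ranges roughly as follows.  This is a sketch only: the
helper name, the tags and the path are examples, not part of any
existing script.

```python
def git_log_command(path, bad, good=None):
    """Build a 'git log' command line limited to one source path."""
    if good is not None:
        # Regression: commits after the known good version, up to the bad one.
        rev_range = "%s..%s" % (good, bad)
    else:
        # Otherwise: commits after the bad version, up to the current HEAD,
        # in case the bug has been fixed since.
        rev_range = "%s.." % bad
    return ["git", "log", rev_range, "--", path]


# Example tags and path, chosen only for illustration.
print(" ".join(git_log_command("drivers/net/e1000e/", "v2.6.30", good="v2.6.26")))
print(" ".join(git_log_command("drivers/net/e1000e/", "v2.6.30")))
```

The list form can be passed directly to subprocess.call() inside a
kernel git tree.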

5. Testing by submitter

Depending on the technical sophistication of the submitter and the
service requirements of the system in question (e.g. whether it's a
production server) we can request one or more of the following:

* Gathering more information passively (e.g. further logging, reporting
  contents of files in procfs or sysfs)
* Upgrading to the current stable/stable-proposed-updates/
  stable-security version, if it includes a fix for a similar bug
* Adding debug or fallback options to the kernel command line or
  module parameters
* Installing the unstable [or backports?] version temporarily
* Rebuilding and installing the kernel with a specific patch added
  [I think we should add a script to the source to make this easier]

When a bug occurs in what upstream considers the current or previous
stable release, and we cannot fix it, we ask the submitter to report it
upstream at bugzilla.kernel.org under a specific Product and Component,
and to tell us the upstream bug number.  We do not report bugs directly
because follow-up questions from upstream need to go to the submitter,
not to us.  Given the upstream bug number, we mark the bug as forwarded.
bts-link then updates its status.
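
Marking a bug as forwarded could look roughly like this.  The helper
name and both bug numbers are placeholders, and the URL form is an
assumption about what bts-link expects for bugzilla.kernel.org.

```python
def forwarded_command(bug, upstream_id):
    """Build the BTS control command recording the upstream bug report."""
    # Assumed URL form for an upstream bugzilla.kernel.org report.
    url = "https://bugzilla.kernel.org/show_bug.cgi?id=%d" % upstream_id
    return "forwarded %d %s" % (bug, url)


# Both numbers are placeholders.
print(forwarded_command(123456, 9999))
```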

6. Keeping bugs separate

Many submitters search for a characteristic error message and treat this
as indicating a specific bug.  This can lead to many 'me too' follow-ups
where, for example, the message indicates a driver bug and the second
submitter is using a different driver from the original submitter.  We
should try to respond to such a follow-up quickly, requesting a separate
bug report.  Otherwise the original report is likely to turn into a mess
of conflicting information about two or more different bugs.

Where the original report describes more than one bug ('...and another
thing...'), we should clone it and deal with each separately.
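
Cloning might be done with control commands along these lines.  This is
a sketch: the helper name, bug number and title are invented for the
example.

```python
def clone_commands(bug, new_title):
    """Build BTS control commands that split a second bug off a report."""
    return [
        # '-1' is a temporary ID naming the clone within this message.
        "clone %d -1" % bug,
        # Give the clone its own title so the two bugs stay distinct.
        "retitle -1 %s" % new_title,
    ]


# Placeholder bug number and title.
for line in clone_commands(123456, "e1000e: link drops after resume"):
    print(line)
```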

7. Applying patches

Patches should normally be reviewed and accepted by the relevant
upstream maintainer (aside from necessary adjustments for an older
kernel version) before being applied.

8. Talking to submitters

We should always be polite to submitters.  Not only is this implied by
the Social Contract, but it is likely to lead to a faster resolution of
the bug.  If a submitter overrated the severity, quietly downgrade it.
If a submitter has done something stupid, request that they undo that
and report back.  'Sorry' and 'please' make a big difference in tone.

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.
