
Re: Why is troubleshooting Linux so hard?

My suggestion: can't we create a troubleshooting database?
On Sun, Aug 15, 2010 at 11:30 AM, Borden Rhodes <jrvp@bordenrhodes.com> wrote:
Good morning,

I'm going to list some of the frustrations I've been having with
troubleshooting Linux's quirks, crashes and problems in hopes that someone may
be able to help me (and the community) become better bug reporters and
troubleshooters.  I'll make comparisons to Windows only because I am used to
fixing the same problems in Windows a certain way - maybe there are analogies
in Linux or maybe I'm approaching these problems the wrong way.  I'm not
trying to troll or flame-bait.  I'm using Debian Squeeze, by the way.

1) Is there a way to apply debugging symbols retroactively to a dump? A few
times I've had Linux crash on me and spit out a debugging dump.  I do my best
to install debugging symbols for all 1400 packages I have on my system (when I
can find them) but this requires a huge amount of hard disk space and,
invariably, the odd dump is missing symbols.  Recreating the crash isn't
always possible.  Is there (or could someone invent) a way to save a dump
without the symbols, download the symbol tables and then regenerate the dump
with the symbols so it's useful to developers?

2) I find that the logs contain lots of facts but not a whole lot of useful
information (if any) when something goes wrong.  I've had KDE go black-screen
on me, for example, and force a hard reboot but there's no mention whatsoever
(that I can find) in xorg.log, kdm.log, messages, syslog or dmesg.  Windows
seems to be fairly good at making its last breath a stop error before it dies
which means when I get back into the system (or when I'm looking at a client's
computer days after) I can find that stop error, look it up and figure out what
went wrong.  Are Linux's logs designed for troubleshooting or only for
monitoring?  Are proper troubleshooting logs kept somewhere else or in a
special file? Is there a guide on how to read Linux's logs so I can make sense
out of them like I can Windows' logs?

3) Linux needs better troubleshooting and recovery systems.  The answer I
usually get when I get an unexplained error is to run the program inside gdb
or with valgrind.  I'm not convinced that this is a practical way to
troubleshoot serious problems (like kernel panics) and it requires a certain
amount of foresight that a problem will occur.  According to this logic, the
only way that someone can produce useful reports and feedback (or even get a
clue as to what happened) on the day-to-day crashes and bugs is to start Linux
and all of its subprocesses inside valgrind and/or gdb.  This is obviously not
an intended use of these programs.

This is what would make it easier (at least for me) to troubleshoot Linux
problems.  If these features exist, please let me know so I can start using
them (they should probably be documented in the man pages too).

1) Logs need to have useful information.  When I look at a client's Windows
box days after they report something going wrong, the logs tell me at what
time the problem happened, which process failed and what error it threw just
before it blew.  I can look those error codes up and (usually) fix the problem
within an hour.  When something dies on Linux, the log entry (assuming it even
makes one) only tells me how many seconds into that particular boot the
problem occurred. I've never been able to go back a few days later and find the
log entries related to a particular crash - maybe because they've been purged.
I know that the Linux tradition is to identify processes only by ID but surely
there must be a way that it can print a file or package name or anything more
useful than memory addresses and registers so at least I know where to start
pointing fingers.  Several people have told me that it's pointless trying to
debug a dump in the logs.  What's the point of dumping it in the first place if
nobody can read it?
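
For what it's worth, the seconds-into-boot stamps can be turned into clock
times if the boot time of that particular session is known.  Below is a
minimal Python sketch of the arithmetic only.  It assumes lines carry the
bracketed "[seconds.fraction]" prefix; the function name to_wall_clock, the
sample line and the boot time are made up for illustration, and the boot time
has to come from somewhere else because the log line does not carry it.

```python
import re
from datetime import datetime, timedelta

# Assumed prefix layout: "[  seconds.fraction] message"
KERNEL_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def to_wall_clock(line, boot_time):
    """Return (datetime, message) for a stamped line, or None if unstamped."""
    match = KERNEL_STAMP.match(line)
    if match is None:
        return None
    # The stamp counts seconds since boot, so add it to the boot time.
    offset = timedelta(seconds=float(match.group(1)))
    return boot_time + offset, match.group(2)


if __name__ == "__main__":
    boot = datetime(2010, 8, 14, 9, 0, 0)  # hypothetical boot time
    sample = "[ 3600.500000] example: something failed"  # made-up line
    print(to_wall_clock(sample, boot))
```

This only helps while the messages from that boot are still on disk; it does
nothing for entries that have already been rotated away.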

2) I wish error logs had simple codes or messages (which have documentation)
like Windows Stop errors so I can look them up and figure out why something
died.  Oftentimes I try to Google the whole error message and either get
directed to source code or totally irrelevant postings (since it seems that
many messages are reused for all kinds of problems).  For example, 'segfault'
gets thrown so much that it only tells you that the program crashed -
something I already know.

3) Logs need better organisation.  I'm looking at the most recent dump and
each message is printed on its own line.  The problem is that interspersed in
those individual lines may be other entries from other events not related to
the problem in question.  When I look at a Windows log, each event is entirely
contained in one entry.  It doesn't make one entry for "Stop", another entry
for the Stop number, another 4 entries for the parameters and more entries for
whatever other information usually is in them - whilst having other entries
amid the list with what other things were doing at the time.  I find Linux logs
very frustrating to read for that reason since I don't know when an event is
finished reporting or which items are relevant.
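
One rough way to untangle interleaved lines is to regroup them by the program
tag that precedes each message.  The Python sketch below assumes the
traditional "Mon DD HH:MM:SS host tag: message" layout; the function name
group_by_tag and the sample lines are made up, and grouping by tag is only a
heuristic since it still cannot tell where one event ends and the next begins.

```python
import re
from collections import OrderedDict

# Assumed line layout: "Mon DD HH:MM:SS host tag: message"
SYSLOG_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]{8})\s+(\S+)\s+([^:]+):\s*(.*)$"
)


def group_by_tag(lines):
    """Collect (timestamp, message) pairs per tag, in first-seen order."""
    groups = OrderedDict()
    for line in lines:
        match = SYSLOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        stamp, _host, tag, message = match.groups()
        groups.setdefault(tag, []).append((stamp, message))
    return groups


if __name__ == "__main__":
    sample = [  # made-up lines, not taken from a real log
        "Aug 15 02:00:01 box kernel: first part of a dump",
        "Aug 15 02:00:01 box dhclient[812]: unrelated network chatter",
        "Aug 15 02:00:02 box kernel: second part of a dump",
    ]
    for tag, entries in group_by_tag(sample).items():
        print(tag)
        for stamp, message in entries:
            print("   ", stamp, message)
```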

4) Logs need to focus on reporting on one thing and making sure it reports
that one thing well.  Other than formatting, I can't see many differences
between syslog, dmesg and messages.  Xorg.log is some help for troubleshooting
misconfigured xorg.conf files (which are deprecated anyway) but not very useful
if your X session burns down.  kdm.log seems identical to Xorg.log except for
a few KDE-specific entries.  I had to uninstall my firewall because it kept
writing firewall entries to messages (and stdout) and I couldn't figure out how
to get it to stop.  Why isn't there one log that only deals with hardware
status and changes, another one that only deals with network status and
firewall logging, another one which only deals with dumps and crashes and so
on?

Maybe I just haven't found the right manual yet that has all of these answers
so I'd appreciate any direction.

With regards,

Borden Rhodes

To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/201008150200.52677.jrvp@bordenrhodes.com

Wishing you the very best of everything, always!!!
Kousik Maiti(কৌশিক মাইতি)
Registered Linux User #474025
Registered Ubuntu User # 28654
