Re: Why is troubleshooting Linux so hard?

Thank you for the response.  You are right that my problem isn't
specific to troubleshooting the Linux kernel (although I could dedicate
a website to things that don't work there); it concerns the software
that runs on Debian in general.

To clarify, the problem is this: the computer freezes or crashes, I
force a restart, and then I try to trace what caused the problem and
cannot.  I pick through the dozens of files in /var/log/ and find no
clues about what caused the crash.
Even if I can find a suspicious log entry or two, Googling them directs
me to bug reports and forum posts from 2006.  Almost none of these are
relevant to tracing what caused the problem.

I'm not asking that every piece of software that's ever been written be
fixed overnight as the proposed 'solution' implies.  Rather, I want to
have the information to be able to troubleshoot problems.  This will
also help the package maintainers and volunteers who dedicate their time
to helping plebs like me.

Why are there so many duplicate and incomplete bug reports and forum
threads that ask the same questions over and over?  I've been guilty of
submitting duplicate bug reports even after I spent an hour searching
Google to make sure it hadn't been reported or solved already.  I'm not
asking to be able to understand the error messages.  I'm asking for them
to be useful in a search or forum post so we can solve the problem and
help the other Linux users.

But how would such a utopian scheme be implemented?  Well, my training
is in accounting, so I'll tell you how that profession solves these
problems.  A governing body, like the SEC or AICPA, recognises a problem in its
standards and rules which, for example, allowed Enron to get away with
what it did for as long as it did.  They sit down and they say 'this
shouldn't happen again if accountants do this.'  They pass a regulation
and they say 'anyone who wants to issue compliant financial statements
needs to play by these rules.' They don't chase down every practising
accountant and every registered company and convince them to use the new
standards.  They just tell them that, to be part of the club, they have
to play by the new rules.  Debian, to my understanding, works that way.
A package that doesn't follow the rules has a release-critical bug
filed against it and isn't included in the next release until it's
fixed.  Why does it have to be any more complicated for making error
messages useful?

The suggestion is that a PhD-level mastery of computer science is not
necessary to find a problem in open source software; a thorough
understanding of source code, languages, architectures, engineering and
the esoteric disciplines which software is supposed to simplify should
suffice.  Ironically, those are precisely the topics on which PhD
candidates write their dissertations, so I don't see the difference.  Is the conclusion
that the only people who use GNU/Linux/FOSS software should also be able
to write the software themselves?  I have a working knowledge of C, Java
and a few other languages, yet I can't even read the source code of the
simplest projects, let alone figure out why one crashed on me!  And, no,
valgrind is not a solution to this problem either.  Valgrind is for
debugging programs in development, not as a shell in which to run every
program in case it crashes.

Example: Evolution just closed on me whilst I was writing this
e-mail.  .xsession-errors reports "(evolution:5186): Gtk-WARNING **: A
floating object was finalized. This means that someone
called g_object_unref() on an object that had only a floating
reference; the initial floating reference is not owned by anyone
and must be removed with g_object_ref_sink()."  The entry isn't
timestamped, so I don't know whether it's relevant to the crash.  A
Google search on this message (predictably) produces no results.  A
modified Google search reveals 9 results, one from 2007, one from 2009
and one talking about pinning the calendar on the Ubuntu Netbook Remix
or something.  How much easier would it be to trace this crash if the
entry said "16 November 2010 16:12: Evolution: Illegal call to xyz_();
Error 0x1EE7: Debian hates you too" or something to that effect? That
way, I wouldn't have to burden the mailing list and bug reports with a
"now what do I do? This happens randomly but happened several times this
week" message!
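For illustration, the kind of line proposed above could come from a small helper like the one below. This is a minimal Python sketch, not anything Evolution or GTK actually provides: `format_error` is a hypothetical name, and `0x1EE7` and `xyz_()` are just the placeholders from the example. The idea is that a fixed timestamp, program name and error code give a search engine something stable to match, while the free-text detail after them can vary.

```python
from datetime import datetime


def format_error(program: str, code: int, detail: str, when: datetime) -> str:
    """Build one searchable log line: timestamp, program, error code, detail.

    Hypothetical helper for illustration; not part of any real library.
    """
    # Day, full month name, year, then 24-hour time, e.g. "16 November 2010 16:12".
    stamp = when.strftime("%d %B %Y %H:%M")
    # Render the code as fixed-width upper-case hex so the same failure
    # always produces the same search string.
    return f"{stamp}: {program}: Error 0x{code:04X}: {detail}"


if __name__ == "__main__":
    print(format_error("Evolution", 0x1EE7, "illegal call to xyz_()",
                       datetime(2010, 11, 16, 16, 12)))
```

Run as a script, this prints `16 November 2010 16:12: Evolution: Error 0x1EE7: illegal call to xyz_()`; the English month name assumes the default C locale.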

I'm sorry for the length of this message.  I would use fewer words if I
knew which ones to cut and still retain the point.