[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Yet another patch + how to check for stuff in shell scripts



Raphael Geissert <atomo64+debian@gmail.com> writes:

> Of course, $LEADIN isn't suitable for every check lintian is doing.  My
> main point was that several checks are actually very prone to false
> positives and false negatives.

Yeah, that's part of the problem and one of the things that makes the code
hard to maintain: some of the existing patterns are actually intentionally
tuned to avoid false positives that we found in practice, whereas some are
just randomly the way they are for no particularly good reason, and
there's no good way to tell the difference.

Lintian has a very effective feedback mechanism for false positives and I
regularly look through lintian.d.o trying to find them and fix them, but
it has no such feedback mechanism for false negatives.

For a lot of the checks (such as the unescaped period in invoke-rc.d), the
false positive risk is just theoretical since "no one does that."  Which
doesn't mean we shouldn't improve it, of course, if we can without
creating problems for ourselves.

> I tried to design the module to be lintian friendly and I've even
> started to implement it in checks/scripts; but I have to confess that
> some of the checks being done right after the bashisms checks in
> checks/scripts are painful and require special care.

Yeah, there's a lot of that in Lintian.

> The idea is to also perform all, or at least most, of the non-bashisms
> checks also in Devscripts::Checkbashisms and later separate the results
> and emit the necessary tags. That's what the add_hash method is for.

I'll have to see the patch to really comment.  Please do bear in mind that
we have to keep things running in stable, which means that adding new
package dependencies, while possible, is annoying to deal with.  I'm not
sure which way would be best -- to have devscripts depend on lintian and
use a module it provides, to have lintian depend on devscripts and use a
module it provides, or to package an independent package like Frank did
with libparse-debianchangelog-perl (although that's a special case since
it's also used by other things, like packages.debian.org).

What I think Lintian really needs is a module that can do the following:

* Read a shell script and provide the caller with a stream of logical
  lines, dealing with things like backslash escapes so that each caller
  doesn't have to.

* Recognize heredocs and filter them out of the command stream so that the
  caller only sees the contents of heredocs as arguments and not as
  possible separate commands.

* Recognize conditionals so that the caller can get simple information
  like "is this line inside a conditional or would it always execute?".
  This is important in Lintian in several places for false positive
  suppression; we often can assume that if code is conditional, the
  maintainer probably knows what they're doing and we shouldn't bug them
  about something we're unsure about.

* Recognize common patterns that Lintian needs to be concerned with, such
  as whether a conditional probes for the existence of a command using the
  regular which / command -v methods.  There are a bunch of stock shell
  script parsing patterns like that in Lintian right now.

* Optionally do simple canonicalization on commands before returning them
  to the caller for analysis, doing things like removing unnecessary
  quotes or filtering out command prefixes that don't affect whether the
  command would run (such as your time example, or more commonly, things
  like @ and - in Makefiles).

* Parse commands into a list of command and arguments so that code can do
  simple things like ask "was the command given this argument" without
  doing error-prone regex matching on the whole command line.

You could incorporate bashism checks into the same module, although to
some extent a bashism checker is a consumer of this sort of parsing
framework.

This is a chunk of work, yes, but bits of Lintian already do a lot of
this, and the checkbashisms code in particular already does quite a lot.
Working through the API to make sure the code is clean and readable is
probably the hardest part.  My off-the-cuff estimate is that it would take
me about eight hours of work, but I've been working on Policy recently and
need to work on Kerberos packages next and won't have those eight hours
for probably a month or two at least.

Once we had something like this, a ton of code in Lintian could be
simplified by using it, far more broadly than just check/scripts.  We look
for patterns in maintainer scripts in all sorts of places in Lintian, and
do it in a different way almost every time right now.

> Weren't shell scripts supposed to do that? :) 
> IMHO using yet another command language is *not* the solution.

Well, I obviously don't agree, but unless I or someone else has a chance
to write the code and specification for how it would work and see what it
looks like, it's an academic argument.  :)

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>


Reply to: