Re: Yet another patch + how to check for stuff in shell scripts

To: debian-lint-maint@lists.debian.org
Subject: Re: Yet another patch + how to check for stuff in shell scripts
From: Raphael Geissert <atomo64+debian@gmail.com>
Date: Mon, 07 Jul 2008 18:59:44 -0500
Message-id: <[🔎] g4ualh$tlu$1@ger.gmane.org>
References: <[🔎] g4r533$182$1@ger.gmane.org> <[🔎] 87wsjyaj0i.fsf@windlord.stanford.edu> <[🔎] g4rmvq$icg$1@ger.gmane.org> <[🔎] 87y74e8que.fsf@windlord.stanford.edu> <[🔎] g4rqij$qoi$1@ger.gmane.org> <[🔎] 87abgu8hl0.fsf@windlord.stanford.edu>

Russ Allbery wrote:

> Raphael Geissert <atomo64+debian@gmail.com>
> writes:
> 
>> Of course, $LEADIN isn't suitable for every check lintian is doing.  My
>> main point was that several checks are actually very prone to false
>> positives and false negatives.
> 
> Yeah, that's part of the problem and one of the things that makes the code
> hard to maintain: some of the existing patterns are actually intentionally
> tuned to avoid false positives that we found in practice, whereas some are
> just randomly the way they are for no particularly good reason, and
> there's no good way to tell the difference.
> 
> Lintian has a very effective feedback mechanism for false positives and I
> regularly look through lintian.d.o trying to find them and fix them, but
> it has no such feedback mechanism for false negatives.

Maybe experimental tags can be (ab)used with tags where 'greedier' regexes
are used, resulting in more data that can be used to improve the regexes.

> 
> For a lot of the checks (such as the unescaped period in invoke-rc.d), the
> false positive risk is just theoretical since "no one does that."  Which
> doesn't mean we shouldn't improve it, of course, if we can without
> creating problems for ourselves.
> 
>> I tried to design the module to be lintian friendly and I've even
>> started to implement it in checks/scripts; but I have to confess that
>> some of the checks being done right after the bashisms checks in
>> checks/scripts are painful and require special care.
> 
> Yeah, there's a lot of that in Lintian.
> 
>> The idea is to also perform all, or at least most, of the non-bashisms
>> checks also in Devscripts::Checkbashisms and later separate the results
>> and emit the necessary tags. That's what the add_hash method is for.
> 
> I'll have to see the patch to really comment.  Please do bear in mind that
> we have to keep things running in stable, which means that adding new
> package dependencies, while possible, is annoying to deal with.  I'm not
> sure which way would be best -- to have devscripts depend on lintian and
> use a module it provides, to have lintian depend on devscripts and use a
> module it provides, or to package an independent package like Frank did
> with libparse-debianchangelog-perl (although that's a special case since
> it's also used by other things, like packages.debian.org).

I was actually thinking about shipping a copy of the module in lintian
itself. Because I don't think lintian wants to depend on devscripts, the
same way devscripts doesn't want to depend on another package just for
checkbashisms to work.

> 
> What I think Lintian really needs is a module that can do the following:
> 
> * Read a shell script and provide the caller with a stream of logical
>   lines, dealing with things like backslash escapes so that each caller
>   doesn't have to.
> 
> * Recognize heredocs and filter them out of the command stream so that the
>   caller only sees the contents of heredocs as arguments and not as
>   possible separate commands.

Both already done in some way by checkbashisms.

> 
> * Recognize conditionals so that the caller can get simple information
>   like "is this line inside a conditional or would it always execute?".
>   This is important in Lintian in several places for false positive
>   suppression; we often can assume that if code is conditional, the
>   maintainer probably knows what they're doing and we shouldn't bug them
>   about something we're unsure about.

IMO this is one of the most complex and complicated things to do, as it
requires more than a "regexes logic".

> 
> * Recognize common patterns that Lintian needs to be concerned with, such
>   as whether a conditional probes for the existence of a command using the
>   regular which / command -v methods.  There are a bunch of stock shell
>   script parsing patterns like that in Lintian right now.

Something like $LEADIN?

> 
> * Optionally do simple canonicalization on commands before returning them
>   to the caller for analysis, doing things like removing unnecessary
>   quotes or filtering out command prefixes that don't affect whether the
>   command would run (such as your time example, or more commonly, things
>   like @ and - in Makefiles).

Already done by checkbashisms.

> 
> * Parse commands into a list of command and arguments so that code can do
>   simple things like ask "was the command given this argument" without
>   doing error-prone regex matching on the whole command line.

Not impossible, but won't be easy either.

> 
> You could incorporate bashism checks into the same module, although to
> some extent a bashism checker is a consumer of this sort of parsing
> framework.

Sure; but I have to say that checkbashisms is currently are both things
together and the perl module allows bashisms + whatever you want.

> 
> This is a chunk of work, yes, but bits of Lintian already do a lot of
> this, and the checkbashisms code in particular already does quite a lot.
> Working through the API to make sure the code is clean and readable is
> probably the hardest part.  My off-the-cuff estimate is that it would take
> me about eight hours of work, but I've been working on Policy recently and
> need to work on Kerberos packages next and won't have those eight hours
> for probably a month or two at least.
> 
> Once we had something like this, a ton of code in Lintian could be
> simplified by using it, far more broadly than just check/scripts.  We look
> for patterns in maintainer scripts in all sorts of places in Lintian, and
> do it in a different way almost every time right now.
> 
>> Weren't shell scripts supposed to do that? :)
>> IMHO using yet another command language is *not* the solution.
> 
> Well, I obviously don't agree, but unless I or someone else has a chance
> to write the code and specification for how it would work and see what it
> looks like, it's an academic argument.  :)
> 

:)


Cheers,
-- 
Atomo64 - Raphael

Please avoid sending me Word, PowerPoint or Excel attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

Reply to:

Follow-Ups:
- Re: Yet another patch + how to check for stuff in shell scripts
  - From: Russ Allbery <rra@debian.org>

References:
- Yet another patch + how to check for stuff in shell scripts
  - From: Raphael Geissert <atomo64+debian@gmail.com>
- Re: Yet another patch + how to check for stuff in shell scripts
  - From: Russ Allbery <rra@debian.org>
- Re: Yet another patch + how to check for stuff in shell scripts
  - From: Raphael Geissert <atomo64+debian@gmail.com>
- Re: Yet another patch + how to check for stuff in shell scripts
  - From: Russ Allbery <rra@debian.org>
- Re: Yet another patch + how to check for stuff in shell scripts
  - From: Raphael Geissert <atomo64+debian@gmail.com>
- Re: Yet another patch + how to check for stuff in shell scripts
  - From: Russ Allbery <rra@debian.org>

Prev by Date: [SCM] Debian package checker branch, master, updated. 4d78609fc20f6a8d682b9a8f0b9321f91ed4db46
Next by Date: Re: Yet another patch + how to check for stuff in shell scripts
Previous by thread: Re: Yet another patch + how to check for stuff in shell scripts
Next by thread: Re: Yet another patch + how to check for stuff in shell scripts
Index(es):
- Date
- Thread