[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH] fix regexp for blacklist devices



05/05/2012 11:20 PM, intrigeri:
> anonym wrote (03 May 2012 12:44:47 GMT) :
>> See the attached patch. The is_in_list() function therein:
>> * interprets white-spaces and newlines as separators; all other
>>   chacaters form "words", so there should be no more surprises.
> I find this design decision sane to make this function useful and as
> robust as possible in the context of live-boot.
>
> However, the function semantics is different from how argument lists
> work in shell usually: the shell has a slightly more elaborate vision
> of what a list is, e.g. it supports list elements that contain whitespace.

Right, but such lists are fragile, error prone and confusing, and I
think (hope?) we all agree that they should be avoided. In fact, that
has been avoided in several parts of the code already. For instance,
snapshot data are returned as a shell list of colon-separated "lists" in
probe_for_filename(), even though shell lists of shell lists could have
been used.

> Therefore, I think the function's name should reflect the "list" it's
> about is *not* a "normal" shell list. Perhaps something like
> is_in_whitespace_separated_list. Yes, that's ugly. But with the
> current function name, while I am in a shell programming mindset, the
> way it is implemented would probably surprise me more than once.

Sure. I'll pick is_in_space_sep_list() for something with a bit more
brevity. What you suggest is also necessary because we also need a
is_in_comma_sep_list() for comma separated lists.

>> * doesn't depend on quoting input to form "lists".
> What do you mean exactly?

You don't have to do quote the lists elements to form the list parameter
to send to is_in_list().

>> * doesn't use any fancy grep functionality (no -w, no \< or \>),
>>   just basic regex stuff that ought to be correctly implemented
>>   in busybox.
> Just to be sure, have you verified it works properly with
> busybox grep?

Of course.

>> * makes the code much more readable.
> Sure.
>
>
> Also, I was surprised to discover that `is_in_list a b a` is wrong in
> dash, while it's true in bash and in busybox sh. After `local l=${@}'
> is evaluated, $l contains "b" in dash, while it contains "b a" in bash
> and busybox sh. The rest of the code (where you migrate a few `grep
> "$list"' to `is_in_list elt $list', removing quotes along the way)
> assumes bash and busybox behaviour. For what it's worth, replacing
> that line with `local l="${*}"' makes all these three shells think, in
> perfect agreement, that `is_in_list a b a' is true, which I would find
> much saner, given the idea is to avoid a certain class of subtle, hard
> to understand future bugs :)

Thanks!

>> +	echo ${l} | grep -qe "^\(.*[[:space:]]\)\?${e}\([[:space:]].*\)\?$"
> If the regexp language in use knows about them, these should really be
> non-capturing, grouping parenthesis -- e.g. in Perl regexp syntax, one
> would write the first group like this: (?:.*[[:space:]])?
>
> (It looks like the busybox package only installed busybox(1) in terms
> of busybox documentation on my Debian sid, and alas, this manpage does
> not document what kind of pattern is supported by their grep, so
> I can't check myself right now.)

Non-capturing groups are not part of POSIX regular expressions so they
shouldn't be implemented in busybox grep (which btw also lacks GNU
grep's -P parameter for perl regular expressions, which otherwise
could've been used).

Cheers!


Attachment: signature.asc
Description: OpenPGP digital signature


Reply to: