Re: Stanford IDG internal Perl style
On 2013-04-01 20:57, Russ Allbery wrote:
> Niels Thykier <niels@thykier.net> writes:
>> On 2013-03-29 18:24, Russ Allbery wrote:
>
>>> In general, follow the rules in the perlstyle man page and Perl Best
>>> Practices, except we use cuddled elses (else on the same line after }).
>>> And, of course, follow rules below rather than perlstyle if they
>>> conflict.
>
>> I don't think I have read the Perl Best Practises (assuming its the
>> O'Reilly one[1]), so I have to come back on that.
>
>> [1] http://refcards.com/docs/vromansj/perl-best-practices/refguide.pdf
>
> This refers to the book by Damian Conway. Not available for free so far
> as I know, although that's a summary.
>
AFAICT that would be the O'Reilly book (Damian Conway is writer of it
and O'Reilly published it).
> Thanks for all the typo fixes!
>
>[...]
You are welcome. :)
>> I have been considering to use autodie before (possible exception being
>> system/exec where it is always not compatible). I tend to prefer
>> exception based I/O instead of checking return values (which gets old
>> really soon), so I could be convinced to use autodie by default. Too
>> bad it does not cover all cases (print et al).
>
> Yeah, I really like and prefer autodie. One does have to be careful in
> situations where the file may not exist for some legitimate reason or one
> wants to present a decent error message to the user, but there are just so
> many cases where a missing file indicates some sort of serious internal
> error and it's better to just throw an exception.
>
> [...]
>
Okay, anyone else having comments on using autodie by default?
I am pro autodie at this point. Also I suspect it would allow us to
adopt certain perlcritic policies sooner (e.g.
InputOutput::RequireCheckedSyscalls)
>> (where partX is a "complex" computation) Or is it about splitting it
>> into temporary variables?
>
> It's the latter: splitting long calculations into intermediate variables.
>
Ok, thanks for clarifying.
>>> * Dereference variables with arrows for readability. (ie:
>>> "$record->{name}".
>
>> [...]
> No, just the first dereference (so your second example is correct). In
> other words, the goal is to avoid doubled sigils like $$record{name} and
> instead prefer $record->{name}. This is largely because the precedence
> for $$record{name} is always confusing and can break easily when the
> expression becomes a bit more complex.
>
Ah, thanks. I had been considering to "convert" my default to the
second example (because typing all those arrows gets boring).
>>> [...]
>>> * For clarity in some fonts, try to make single-character strings more
>>> obvious as to their intent:
>>>
>>> '' q{}
>>> ' ' q{ }
>>> ',' q{,}
>
>> I do not think I understand this one. Is it to use the second column
>> variants instead of the first?
>
> Yes.
>
> The problem here is that '' in a lot of fonts looks like ", and the
> difference between '' and ' ' can be hard to see. It's also easy to
> misread strings like ',' and ';' and think the comma or semicolon is part
> of the syntax. Consider, for example:
>
> print $foo, ',', $bar, ',', $baz;
>
> [...]
>
Ah, I guess the font(s) I use do not have this problem a lot. This
particular policy appears to be implemented in perlcritic as:
ValuesAndExpressions::ProhibitNoisyQuotes
We are not doing too well on that in Lintian, but I suspect we could
enable that policy with 30 minutes work (plus the time for running the
full test suite)
> [...]
>>> * Use a space between keywords (if, elsif, while) and functions and
>>> parenthesized arguments. Do not put a space between the opening or
>>> closing parentheses and the contents. For example:
>>>
>>> if ($foo) {
>>> bar ($baz);
>>> }
>
>> In your comment to #704197, you suggested you had dropped the space
>> between the function/sub name and its parentheses? i.e.
>
>> if ($foo) {
>> bar($baz);
>> }
>
>> Or did I misunderstand you here?
>
> No, that's just a mistake. Jon Robertson (who wrote up the style guide)
> and I are both so used to putting a space there that we keep forgetting.
> But everyone else felt strongly about omitting the space.
>
>[...]
>> Is that still map(EXPR, LIST) for the alternative call variant?
>
> I don't think we've ever used that style in any of our code. I find the
> non-block form of map really weird, and would rather write:
>
> @chars = map { chr($_) } @numbers;
>
> than:
>
> @chars = map(chr, @numbers);
>
> (from the perlfunc documentation). The latter looks like a bareword to me
> and looks quite confusing.
>
Sounds like we should consider using the "block" variant of these
builtins exclusively. I know I changed some of our grep(EXPR, LIST)
usage when I deployed the "don't use grep in bool context" patch.
>>> # Control Structures
>>>
>>> * Only use postfix if or unless for flow control statements (next,
>>> last, return, etc) where there is nothing or almost nothing between the
>>> flow control statement and the if/unless. Code between the flow
>>> control statement and the if/unless can make the fact that it's a
>>> conditional less obvious and confuse some users.
>
>> I might be inclined to use postfix if/unless for "tag" and die as well?
>
> I would too. We decided not to internally due to the feelings of folks
> other than me who felt that always using the block if structure was
> simpler, but there is *so much* of that code in Lintian that I'd be rather
> reluctant to rewrite it all as block ifs.
>
> On the other hand, people made a really good argument that code like:
>
> die "with some error message" if $error;
>
> is really confusing to people who come from languages that don't use
> statement modifier forms heavily, and pushes the arguably most important
> part (the error condition) out to the right margin past the less
> interesting error message. The same thing could be argued as applying to
> tags as well.
>
On the other hand, it moves the consequence to the left. :) But yes,
the error message can get in the way. Maybe a more "assert-like"
operator would have been better
die_if $condition, "...";
die_unless $condition, "...";
At least for simple conditions and if complex conditions are to be avoid
anyway ... :)
>>> # Subroutines
>>>
>>> [...]
>>> * Always return with an explicit return, to make certain that you are
>>> not returning an implicit value that may not be what you expect. Use a
>>> bare return to return failure, as 'return undef' will not do what you
>>> expect if the function was called in a list context.
>
>> exception being one-line anonymous subs/blocks?
>
> perlcritic makes it hard to make exceptions here without adding magic
> comments. I usually stick an explicit return in anyway when it's a
> one-line anonymous sub; it usually isn't too hard to work in. (Blocks are
> different; blocks don't need a return.)
>
Hmm, it seems that perlcritic does not detect anonymous subs at all.
Anyhow, I think this might be a good policy to add as well - although
Lintian is, erh, lacking a bit behind here as well. :)
> [...]
>>> * Use m{}xsm for matches by default. x is very useful for readability
>>> in longer regular expressions. m will match multiline strings better,
>>> and s will improve the handling of "." to match newlines in any
>>> multiline string. In many cases they aren't needed, but we want to
>>> encourage them as a default.
>
>> Given how rarely we seem to be maching multiline strings, I am not sure
>> it makes sense to use this rule in Lintian?
>
> The argument in Perl Best Practices is that xms should just be the default
> regex flavor and you should just use that everywhere. I've started doing
> that, and I admit that I kind of like it. There is tons of other advice
> on how to structure regexes using /x too (such as always using character
> classes instead of \ for metacharacters).
>
> It does make all the regexes more verbose and it's a daunting amount of
> conversion work with an existing code base. I don't think it's a horribly
> high priority.
I am no issue with enabling them eventually; I was just a bit put off by
the amount of hits we get on that policy.
> But note that, for example, file names *can* contain
> newlines in UNIX, so there's something to be said for avoiding surprises.
>
Which Lintian will not handle those correctly (at least not in the
general case). I know we have code that looks like it will work with
newlines, but I am fairly sure that our index will break these days and
so will a couple of deserialization of other collection data (e.g. readelf).
>>> * Do not use interspersed POD to document functions. All POD should
>>> start after the code is done, with an __END__ statement before it.
>
>> I have never quite understood this; especially not with the requirement
>> of having a second description at the function itself.
>
> I wouldn't adopt this for Lintian. The primary objection to intermixed
> POD that I've seen is that a lot of editors deal very poorly with it,
> since the syntax is completely different than the surrounding code.
> (Personally, I use mmm-mode in Emacs, which does okay, although emacs24
> seems likely to make that more of a problem.)
Okay, wasn't aware of editor issues here.
> There are other objections
> in Perl Best Practices, which mostly amount to Conway's opinion that
> internal documentation and user-facing documentation are different and
> should be written in different styles with different audiences, something
> that I buy in part but not enough to make the duplicate work for a case
> like Lintian's Perl modules seem worth it.
>
Right, personally I am usually working in a project where the project
itself is a freq. user of (a substational part of) its own public API.
Unless Conway's point is that you can get away with less documentation
for internal stuff, I am not sure I can agree with his view on that.
Though then again, these projects are rarely written in Perl... :)
> [...]
>>> * Use Getopt::Long::Descriptive for option parsing in any script that
>>> takes options. Give each option a corresponding single-character option
>>> unless it's rarely used. When listing the options, give the short
>>> option first, then |, and then the long option, as in:
>>>
>>> my ($opt, $usage = describe_options(
> [...]
>>> '$0 $o <args>',
>>> [ 'd|delete', 'delete the given object' ],
>>> [ 'e|exclude=s', 'exclude any matching names' ],
>>> [ 'h|help', 'display help' ],
>>> ) or exit 1;
>
>> Perhaps we should convert Lintian to use this? If only so I don't have
>> to remember to update the usage with changes to cmd options. :P
>
> Yeah, exactly. :) I'd at least consider it. We did a survey of all of
> the available modules, and while Getopt::Euclid looked rather neat in
> terms of not having to duplicate information from the documentation, it's
> also full of scary black magic and had some other issues.
>
> [...]
Getopt::Euclid does look like an interesting concept, but I don't mind
using Getopt::Long::Descriptive if it is more reliable. It appears to
be a pretty decent alternative to Getopt::Long and would probably let us
remove "sub syntax" (~70 lines) from the frontend. Maybe even a bit
more with that validation stuff.
~Niels
Reply to: