Re: Stanford IDG internal Perl style

To: debian-lint-maint@lists.debian.org
Subject: Re: Stanford IDG internal Perl style
From: Niels Thykier <niels@thykier.net>
Date: Mon, 01 Apr 2013 23:39:15 +0200
Message-id: <[🔎] 5159FE83.4040907@thykier.net>
In-reply-to: <[🔎] 87sj3am62z.fsf@windlord.stanford.edu>
References: <87k3oqdsq7.fsf@windlord.stanford.edu> <515620EA.70104@thykier.net> <[🔎] 87sj3am62z.fsf@windlord.stanford.edu>
On 2013-04-01 20:57, Russ Allbery wrote:
> Niels Thykier <niels@thykier.net> writes:
>> On 2013-03-29 18:24, Russ Allbery wrote:
> 
>>> In general, follow the rules in the perlstyle man page and Perl Best
>>> Practices, except we use cuddled elses (else on the same line after }).
>>> And, of course, follow rules below rather than perlstyle if they
>>> conflict.
> 
>> I don't think I have read the Perl Best Practises (assuming its the
>> O'Reilly one[1]), so I have to come back on that.
> 
>> [1] http://refcards.com/docs/vromansj/perl-best-practices/refguide.pdf
> 
> This refers to the book by Damian Conway.  Not available for free so far
> as I know, although that's a summary.
> 

AFAICT that would be the O'Reilly book (Damian Conway is writer of it
and O'Reilly published it).

> Thanks for all the typo fixes!
> 
>[...]

You are welcome.  :)

>> I have been considering to use autodie before (possible exception being
>> system/exec where it is always not compatible).  I tend to prefer
>> exception based I/O instead of checking return values (which gets old
>> really soon), so I could be convinced to use autodie by default.  Too
>> bad it does not cover all cases (print et al).
> 
> Yeah, I really like and prefer autodie.  One does have to be careful in
> situations where the file may not exist for some legitimate reason or one
> wants to present a decent error message to the user, but there are just so
> many cases where a missing file indicates some sort of serious internal
> error and it's better to just throw an exception.
> 
> [...]
> 

Okay, anyone else having comments on using autodie by default?

I am pro autodie at this point.  Also I suspect it would allow us to
adopt certain perlcritic policies sooner (e.g.
InputOutput::RequireCheckedSyscalls)

>> (where partX is a "complex" computation)  Or is it about splitting it
>> into temporary variables?
> 
> It's the latter: splitting long calculations into intermediate variables.
> 

Ok, thanks for clarifying.

>>> * Dereference variables with arrows for readability.  (ie:
>>> "$record->{name}".
> 
>> [...]
> No, just the first dereference (so your second example is correct).  In
> other words, the goal is to avoid doubled sigils like $$record{name} and
> instead prefer $record->{name}.  This is largely because the precedence
> for $$record{name} is always confusing and can break easily when the
> expression becomes a bit more complex.
> 

Ah, thanks.  I had been considering to "convert" my default to the
second example (because typing all those arrows gets boring).

>>> [...]
>>> * For clarity in some fonts, try to make single-character strings more
>>> obvious as to their intent:
>>>
>>>         ''      q{}
>>>         ' '     q{ }
>>>         ','     q{,}
> 
>> I do not think I understand this one.  Is it to use the second column
>> variants instead of the first?
> 
> Yes.
> 
> The problem here is that '' in a lot of fonts looks like ", and the
> difference between '' and ' ' can be hard to see.  It's also easy to
> misread strings like ',' and ';' and think the comma or semicolon is part
> of the syntax.  Consider, for example:
> 
>     print $foo, ',', $bar, ',', $baz;
> 
> [...]
> 
Ah, I guess the font(s) I use do not have this problem a lot.  This
particular policy appears to be implemented in perlcritic as:
  ValuesAndExpressions::ProhibitNoisyQuotes

We are not doing too well on that in Lintian, but I suspect we could
enable that policy with 30 minutes work (plus the time for running the
full test suite)

> [...]
>>> * Use a space between keywords (if, elsif, while) and functions and
>>> parenthesized arguments. Do not put a space between the opening or
>>> closing parentheses and the contents. For example:
>>>
>>>         if ($foo) {
>>>             bar ($baz);
>>>         }
> 
>> In your comment to #704197, you suggested you had dropped the space
>> between the function/sub name and its parentheses?  i.e.
> 
>>         if ($foo) {
>>             bar($baz);
>>         }
> 
>> Or did I misunderstand you here?
> 
> No, that's just a mistake.  Jon Robertson (who wrote up the style guide)
> and I are both so used to putting a space there that we keep forgetting.
> But everyone else felt strongly about omitting the space.
> 
>[...]
>> Is that still map(EXPR, LIST) for the alternative call variant?
> 
> I don't think we've ever used that style in any of our code.  I find the
> non-block form of map really weird, and would rather write:
> 
>     @chars = map { chr($_) } @numbers;
> 
> than:
> 
>     @chars = map(chr, @numbers);
> 
> (from the perlfunc documentation).  The latter looks like a bareword to me
> and looks quite confusing.
> 

Sounds like we should consider using the "block" variant of these
builtins exclusively.  I know I changed some of our grep(EXPR, LIST)
usage when I deployed the "don't use grep in bool context" patch.


>>> # Control Structures
>>>
>>> * Only use postfix if or unless for flow control statements (next,
>>> last, return, etc) where there is nothing or almost nothing between the
>>> flow control statement and the if/unless.  Code between the flow
>>> control statement and the if/unless can make the fact that it's a
>>> conditional less obvious and confuse some users.
> 
>> I might be inclined to use postfix if/unless for "tag" and die as well?
> 
> I would too.  We decided not to internally due to the feelings of folks
> other than me who felt that always using the block if structure was
> simpler, but there is *so much* of that code in Lintian that I'd be rather
> reluctant to rewrite it all as block ifs.
> 
> On the other hand, people made a really good argument that code like:
> 
>     die "with some error message" if $error;
> 
> is really confusing to people who come from languages that don't use
> statement modifier forms heavily, and pushes the arguably most important
> part (the error condition) out to the right margin past the less
> interesting error message.  The same thing could be argued as applying to
> tags as well.
> 

On the other hand, it moves the consequence to the left. :)  But yes,
the error message can get in the way.  Maybe a more "assert-like"
operator would have been better

  die_if $condition, "...";
  die_unless $condition, "...";

At least for simple conditions and if complex conditions are to be avoid
anyway ... :)

>>> # Subroutines
>>>
>>> [...]
>>> * Always return with an explicit return, to make certain that you are
>>> not returning an implicit value that may not be what you expect.  Use a
>>> bare return to return failure, as 'return undef' will not do what you
>>> expect if the function was called in a list context.
> 
>> exception being one-line anonymous subs/blocks?
> 
> perlcritic makes it hard to make exceptions here without adding magic
> comments.  I usually stick an explicit return in anyway when it's a
> one-line anonymous sub; it usually isn't too hard to work in.  (Blocks are
> different; blocks don't need a return.)
> 

Hmm, it seems that perlcritic does not detect anonymous subs at all.

Anyhow, I think this might be a good policy to add as well - although
Lintian is, erh, lacking a bit behind here as well.  :)

> [...]
>>> * Use m{}xsm for matches by default.  x is very useful for readability
>>> in longer regular expressions.  m will match multiline strings better,
>>> and s will improve the handling of "." to match newlines in any
>>> multiline string.  In many cases they aren't needed, but we want to
>>> encourage them as a default.
> 
>> Given how rarely we seem to be maching multiline strings, I am not sure
>> it makes sense to use this rule in Lintian?
> 
> The argument in Perl Best Practices is that xms should just be the default
> regex flavor and you should just use that everywhere.  I've started doing
> that, and I admit that I kind of like it.  There is tons of other advice
> on how to structure regexes using /x too (such as always using character
> classes instead of \ for metacharacters).
> 
> It does make all the regexes more verbose and it's a daunting amount of
> conversion work with an existing code base.  I don't think it's a horribly
> high priority.

I am no issue with enabling them eventually; I was just a bit put off by
the amount of hits we get on that policy.

>  But note that, for example, file names *can* contain
> newlines in UNIX, so there's something to be said for avoiding surprises.
> 

Which Lintian will not handle those correctly (at least not in the
general case).  I know we have code that looks like it will work with
newlines, but I am fairly sure that our index will break these days and
so will a couple of deserialization of other collection data (e.g. readelf).

>>> * Do not use interspersed POD to document functions.  All POD should
>>> start after the code is done, with an __END__ statement before it.
> 
>> I have never quite understood this; especially not with the requirement
>> of having a second description at the function itself.
> 
> I wouldn't adopt this for Lintian.  The primary objection to intermixed
> POD that I've seen is that a lot of editors deal very poorly with it,
> since the syntax is completely different than the surrounding code.
> (Personally, I use mmm-mode in Emacs, which does okay, although emacs24
> seems likely to make that more of a problem.)

Okay, wasn't aware of editor issues here.

>  There are other objections
> in Perl Best Practices, which mostly amount to Conway's opinion that
> internal documentation and user-facing documentation are different and
> should be written in different styles with different audiences, something
> that I buy in part but not enough to make the duplicate work for a case
> like Lintian's Perl modules seem worth it.
> 

Right, personally I am usually working in a project where the project
itself is a freq. user of (a substational part of) its own public API.
Unless Conway's point is that you can get away with less documentation
for internal stuff, I am not sure I can agree with his view on that.
Though then again, these projects are rarely written in Perl... :)

> [...]
>>> * Use Getopt::Long::Descriptive for option parsing in any script that
>>> takes options. Give each option a corresponding single-character option
>>> unless it's rarely used. When listing the options, give the short
>>> option first, then |, and then the long option, as in:
>>>
>>>         my ($opt, $usage = describe_options(
> [...]
>>>           '$0 $o <args>',
>>>           [ 'd|delete', 'delete the given object' ],
>>>           [ 'e|exclude=s', 'exclude any matching names' ],
>>>           [ 'h|help',      'display help' ],
>>>         ) or exit 1;
> 
>> Perhaps we should convert Lintian to use this?  If only so I don't have
>> to remember to update the usage with changes to cmd options. :P
> 
> Yeah, exactly.  :)  I'd at least consider it.  We did a survey of all of
> the available modules, and while Getopt::Euclid looked rather neat in
> terms of not having to duplicate information from the documentation, it's
> also full of scary black magic and had some other issues.
> 
> [...]

Getopt::Euclid does look like an interesting concept, but I don't mind
using Getopt::Long::Descriptive if it is more reliable.  It appears to
be a pretty decent alternative to Getopt::Long and would probably let us
remove "sub syntax" (~70 lines) from the frontend.  Maybe even a bit
more with that validation stuff.


~Niels
Reply to:
References:
- Re: Stanford IDG internal Perl style
  - From: Russ Allbery <rra@debian.org>
Prev by Date: [SCM] Debian package checker branch, master, updated. 2.5.11-210-g0dce35e
Next by Date: Bug#704197: Please review: systemd checks
Previous by thread: Re: Stanford IDG internal Perl style
Next by thread: Re: Stanford IDG internal Perl style
Index(es):
- Date
- Thread