Re: Stanford IDG internal Perl style

To: debian-lint-maint@lists.debian.org
Subject: Re: Stanford IDG internal Perl style
From: Niels Thykier <niels@thykier.net>
Date: Sat, 30 Mar 2013 00:16:58 +0100
Message-id: <[🔎] 515620EA.70104@thykier.net>
In-reply-to: <[🔎] 87k3oqdsq7.fsf@windlord.stanford.edu>
References: <[🔎] 87k3oqdsq7.fsf@windlord.stanford.edu>
On 2013-03-29 18:24, Russ Allbery wrote:
> This is the evolution of my original style document, much modified by
> contact with Perl Best Practices.  There are some things here that I'd
> personally do differently (I prefer omitting the parens around arguments
> to Perl built-ins, for example), but this was the compromise reached
> across the group among people with very different styles and different
> "first" languages.
> 

Great :)  I was just considering whether to codify a Lintian style guide.

> [[!meta title="Perl Coding Style"]]
> 
> In general, follow the rules in the perlstyle man page and Perl Best
> Practices, except we use cuddled elses (else on the same line after }).
> And, of course, follow rules below rather than perlstyle if they
> conflict.
> 

I don't think I have read the Perl Best Practises (assuming its the
O'Reilly one[1]), so I have to come back on that.

[1] http://refcards.com/docs/vromansj/perl-best-practices/refguide.pdf

> Note that these guidelines are for internal projects.  If we release
> something as open source that needs to be compatible with Perl 5.8
> rather than 5.10 (which the document guidelines assump), there are
> the following exceptions:
> 

s/assump/assume/ ?

> * Do not use autodie.
> * 'use base' instead of 'use parent'
> * Do not include Stanford::Infrared::*.
> * Other things to be added.
> 
> 
> # General Guidelines
> 
> [...]
> 
> * Always check the results of operations. For many functions this can
> be done with 'use autodie' at the start of a script or module.  For
> anything not covered by autodie, you should add 'or die "some error:
> $!\n"' to the end of function calls that mail fail.  For print and say,
> death checking is is provided by the Stanford::Infrared::Wrappers
> module with the functions print_fh, say_fh, print_stdout, and
> say_stdout.
> 

s/is is/is/;

I have been considering to use autodie before (possible exception being
system/exec where it is always not compatible).  I tend to prefer
exception based I/O instead of checking return values (which gets old
really soon), so I could be convinced to use autodie by default.  Too
bad it does not cover all cases (print et al).

> [...]
> 
> * Factor out long expressions in the middle of statements.  It's better
> to read three clear statements than to have them all combined into a
> more complicated all-in-one longer line.
> 

Is this:

   my $result = part1
      + part2
      + part3;

 vs:

  my $result = part1 + part2 + part3;

(where partX is a "complex" computation)  Or is it about splitting it
into temporary variables?

> * Use perltidy and perlcritic.  Default [perltidyrc](perltidyrc) and
> [perlcriticrc](perlcriticrc) files are available.
> 

If we can codify our style in a fashion these can check for that would
definitely help in maintaining the style.  Also, we have a "perlcritic"
check already but it is skipped by default (and I never remember to set
the proper variables to enable it... >.>)

> [...]
> 
> * Dereference variables with arrows for readability.  (ie:
> "$record->{name}".
> 

Also in "chaining"?
  $record->{$last_name}->{$first_name}
  $matrix->[$i]->[$j]

vs

  $record->{$last_name}{$first_name}
  $matrix->[$i][$j]


> [...]
> * For clarity in some fonts, try to make single-character strings more
> obvious as to their intent:
> 
>         ''      q{}
>         ' '     q{ }
>         ','     q{,}
> 

I do not think I understand this one.  Is it to use the second column
variants instead of the first?  I don't see how any of those expresses
their "intent" more than the other (but I might be a missing a reference
to the Perl Best Practises here?).

> [...]
> 
> # Naming
> 
> [...]
> 
> * Name modules using Noun::Adjective::Adjective.
> (Disk::DVD::Rewritable)
> 

Mmm... Is that strictly "one Noun followed by adjective(s)".  Also the
example appear to be "Noun::Noun::Adjective" (or is a DVD an
adjective?).  Anyway I assume "namespacing" is allowed (e.g.
Lintian::$module)

> [...]
> 
> # Formatting and Indentation
> 
> Don't use tabs in perl scripts, since they can expand differently in
> different environments. In particular, please try not to use the mix of
> tabs and spaces that is the default in Emacs.
> 
> Please follow these guidelines for spacing and formatting:
> 
> * Each block is indented four spaces. Continuations of commands should
> be indented two more spaces unless parenthesized, or indented to line
> up one space after the opening parentheses if parenthesized.
> 

So if parentheses are used it should be lined up as... ?

  really_long_name(argument1,
                    argument2,
                    argument3)

  if (really_long_condition1
       && really_long_condition2
       && really_long_condition3) {

Or does this rule not apply to sub arguments?

> * Lines should be 79 characters or less. Continue long lines on the
> next line. Continue long strings by breaking the string before a space
> and using string concatenation (.) to combine shorter strings.
> 
> * Use a space between keywords (if, elsif, while) and functions and
> parenthesized arguments. Do not put a space between the opening or
> closing parentheses and the contents. For example:
> 
>         if ($foo) {
>             bar ($baz);
>         }
> 

In your comment to #704197, you suggested you had dropped the space
between the function/sub name and its parentheses?  i.e.


        if ($foo) {
            bar($baz);
        }

Or did I misunderstand you here?

> [...]
>
> * Parentheses are optional around print, die, and warn.  Other
> built-ins should include the parentheses, as should any user-defined
> functions or methods.  The empty parenthesis should be admitted from
> any method call taking no arguments, but should exist for regular
> functions.  Any parentheses should have no space between the keyword
> and the open parenthesis.
> 

s/admitted/omitted/ ?

> * map, grep, and sort are a special case; their first argument is a
> code block and should be set off by { } with spaces between the
> brackets and the code. This code block should never be within
> parentheses.
> 

Is that still map(EXPR, LIST) for the alternative call variant?


> [...]
> 
> 
> # Control Structures
> 
> * Only use postfix if or unless for flow control statements (next,
> last, return, etc) where there is nothing or almost nothing between the
> flow control statement and the if/unless.  Code between the flow
> control statement and the if/unless can make the fact that it's a
> conditional less obvious and confuse some users.
> 

I might be inclined to use postfix if/unless for "tag" and die as well?

> [...]
> 
> 
> # Subroutines
> 
> [...]
> * Always return with an explicit return, to make certain that you are
> not returning an implicit value that may not be what you expect.  Use a
> bare return to return failure, as 'return undef' will not do what you
> expect if the function was called in a list context.
> 

exception being one-line anonymous subs/blocks?

> * Each function should have a leading comment that briefly describes
> what the function does and what arguments it takes.  The format should
> be the following:
> 
> [...]
> 

This one sounds like double work with the POD documentation requirement
as well (except for internal functions)?

> 
> [...]
> # Regular Expressions
> 
> [...]
> * Use m{}xsm for matches by default.  x is very useful for readability
> in longer regular expressions.  m will match multiline strings better,
> and s will improve the handling of "." to match newlines in any
> multiline string.  In many cases they aren't needed, but we want to
> encourage them as a default.
> 

Given how rarely we seem to be maching multiline strings, I am not sure
it makes sense to use this rule in Lintian?

> [...]
> 
> # Documentation
> 
> [...]
> 
> * Do not use interspersed POD to document functions.  All POD should
> start after the code is done, with an __END__ statement before it.
> 
> [...]
> 

I have never quite understood this; especially not with the requirement
of having a second description at the function itself.

> # Writing Modules
> 
> [...]
> 
> * Use Exporter to export functions, and only do so by request (via
> @EXPORT_OK).  Exporting all functions by default can clutter namespaces
> and lead to conflicts if a module and program add new functions.  The
> recommended way to do module inheritence with Perl 5.10 and later would
> be to use Exporter by:
> 
>         use parent qw(Exporter);
> 

Why this over "use Exporter qw(import);", which has been possible since
perl 5.8.3?

> [...]
> 
> # Using Modules
> 
> [...]
> 
> * Use Getopt::Long::Descriptive for option parsing in any script that
> takes options. Give each option a corresponding single-character option
> unless it's rarely used. When listing the options, give the short
> option first, then |, and then the long option, as in:
> 
>         my ($opt, $usage = describe_options(
                          ^
Missing a ")" ?

>           '$0 $o <args>',
>           [ 'd|delete', 'delete the given object' ],
>           [ 'e|exclude=s', 'exclude any matching names' ],
>           [ 'h|help',      'display help' ],
>         ) or exit 1;
> 

Perhaps we should convert Lintian to use this?  If only so I don't have
to remember to update the usage with changes to cmd options. :P

> [...]
> 
> * If you need to cache, use the Memoize module.  It allows you to cache
> entire functions easily without changing the actual function, and with
> Memoize::Expire, create your own expiration policies for the cache.
> 

Got anything on lazy loading stuff?  L::Collect is basically one big
lazy-loader and it could do without requiring us to implement the lazy
loading itself.  Maybe some smart use of Moose?

> * Look at the Stanford::Infrared::* modules to see if they cover any
> generalized cases you need, or to see if there are any cases you think
> they should cover.  Stanford::Infrared::General covers general needs
> used in several places, with functions specific to Stanford ITS
> infrastructure.  Stanford::Infrared::MySQL covers interfacing with the
> remctl commands in stanford-server-mysql, and other MySQL setup
> specific to that infrastructure.  Stanford::Infrared::Wrappers covers
> things that are not as specific to Stanford, but are useful for our
> needs (such as an IPC::Open3 wrapper).
> 

A URL for these ... Yeah, too lazy to google for it. :)

> 
> # Recommendations
> 
> [...]
> 
> * Include the name of the script in error messages. This is
> particularly important for scripts run from cron or other scripts,
> although it's useful to have in general and never hurts. The easiest
> way to do this is to put something like the following.  You should then
> prefix any warn, die, carp, or croak messages with '$0: ', and use $0
> in any messages that warn of incorrect option usage.  If you use this
> format, the first character of the rest of the message should not be
> capitalized.
> 
>         my $fullpath = $0;
>         $0 =~ s{ ^ .* / }{}xms;
> 
> [...]
> 


Alternatively, basename $0.  What do you do about croak/exceptions from
module functions ?

  eval { function(....); }
  if ($@) {
	die "$0: $@";
  }

Or do you use "$0" inside function where it calls croak?



~Niels
Reply to:
Follow-Ups:
- Re: Stanford IDG internal Perl style
  - From: Niels Thykier <niels@thykier.net>
References:
- Stanford IDG internal Perl style
  - From: Russ Allbery <rra@debian.org>
Prev by Date: [SCM] Debian package checker branch, master, updated. 2.5.11-197-g69e0ac5
Next by Date: [SCM] Debian package checker branch, master, updated. 2.5.11-201-ge323d72
Previous by thread: Stanford IDG internal Perl style
Next by thread: Re: Stanford IDG internal Perl style
Index(es):
- Date
- Thread