Stanford IDG internal Perl style

To: debian-lint-maint@lists.debian.org
Subject: Stanford IDG internal Perl style
From: Russ Allbery <rra@debian.org>
Date: Fri, 29 Mar 2013 10:24:32 -0700
Message-id: <87k3oqdsq7.fsf@windlord.stanford.edu>

This is the evolution of my original style document, much modified by
contact with Perl Best Practices.  There are some things here that I'd
personally do differently (I prefer omitting the parens around arguments
to Perl built-ins, for example), but this was the compromise reached
across the group among people with very different styles and different
"first" languages.

[[!meta title="Perl Coding Style"]]

In general, follow the rules in the perlstyle man page and Perl Best
Practices, except we use cuddled elses (else on the same line after }).
And, of course, follow rules below rather than perlstyle if they
conflict.

Note that these guidelines are for internal projects.  If we release
something as open source that needs to be compatible with Perl 5.8
rather than 5.10 (which these guidelines assume), the following
exceptions apply:

* Do not use autodie.
* Use 'use base' instead of 'use parent'.
* Do not include Stanford::Infrared::*.
* Other things to be added.


# General Guidelines

* Always use strict.

* Always use warnings.

* Always check the results of operations. For many functions this can
be done with 'use autodie' at the start of a script or module.  For
anything not covered by autodie, you should add 'or die "some error:
$!\n"' to the end of function calls that may fail.  For print and say,
death checking is provided by the Stanford::Infrared::Wrappers
module with the functions print_fh, say_fh, print_stdout, and
say_stdout.

* Global variables should be set at the top of the script and have
names in all caps. They must either be declared with my (recommended)
or with Readonly. All other variables should use my to restrict their
scope as appropriate.

* All Perl code should pass perl -wc cleanly.

* Don't use the English module ('use English'); it confuses experienced
Perl programmers.

* Place a semicolon after every statement.

* Code in paragraphs, with statements serving one function grouped
together in a block and a comment explaining that block before.

* Factor out long expressions in the middle of statements.  It's better
to read three clear statements than to have them all combined into a
more complicated all-in-one longer line.

* Use perltidy and perlcritic.  Default [perltidyrc](perltidyrc) and
[perlcriticrc](perlcriticrc) files are available.

* On list and hash assignments that span more than one line, always
have a comma after the last item, and put the closing paren on a new
line aligned with the starting line.  This helps if you need to add
more items to the list or hash in the future, or need to re-order the
items.

        my %server_to_admin = (
            paradox  => 'jonrober',
            frankoz1 => 'adamhl',
            windlord => 'rra',
            mysql05  => 'sfeng',
        );

* Align items vertically for readability.

        my $uname    = 'jonrober';
        my $fullname = 'Jon Robertson';
        my $uid      = 1034;

* Dereference variables with arrows for readability.  (ie:
"$record->{name}".)

* When you must dereference with a prefix, use braces to make it more
obvious how the variable is being interpreted.  (ie: "@{$list_ref}"
instead of "@$list_ref".)

* Only use "" or other interpolating string delimiters for strings that
need them.  If you are printing out plain text with no special
characters, having "" can make the reader pause to look for
nonexistent variables in the string.

* For clarity in some fonts, try to make single-character strings more
obvious as to their intent:

        ''      q{}
        ' '     q{ }
        ','     q{,}

* Use 'local' when you have to modify a package variable, or special
variable such as %SIG, %ENV, or $_.  This will localize any
changes to avoid unpleasant side-effects, and ensure that other things
are not also affecting the value you are using.

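A few of the general guidelines above, shown together as a small
sketch: strict and warnings, a global in all caps, a multi-line hash
with a trailing comma, and local for a temporary change to %ENV.  The
function names admin_for and with_timezone are made up for the example.

```perl
use strict;
use warnings;

# Global set at the top of the script, named in all caps.
my $DEFAULT_ADMIN = 'root';

# Multi-line hash assignment: trailing comma after the last item and
# the closing paren on its own line, aligned with the starting line.
my %server_to_admin = (
    paradox  => 'jonrober',
    windlord => 'rra',
);

# Look up the admin for a server, falling back to the default.
sub admin_for {
    my ($server) = @_;
    if (exists $server_to_admin{$server}) {
        return $server_to_admin{$server};
    }
    return $DEFAULT_ADMIN;
}

# Run a code reference with TZ temporarily changed.  local restores the
# previous value of $ENV{TZ} when this function returns.
sub with_timezone {
    my ($zone, $code_ref) = @_;
    local $ENV{TZ} = $zone;
    return $code_ref->();
}
```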

# Naming

* Use underscores to separate words in multiword identifiers.

* Modules should have mixed case, normally with each word starting
uppercase and otherwise lowercase.  Constants should be all uppercase,
and other variables should be all lowercase.

* When you have to abbreviate for a sensible variable name, do so by
cutting off the end rather than the middle, to make a sensible prefix.
Make certain that the end result is actually unambiguous.  Avoid words
that are inherently ambiguous or are homographs, like 'record'.

* Name modules using Noun::Adjective::Adjective.
(Disk::DVD::Rewritable)

* Name scalar variables as [adjective_]*noun.  ($next_client, $total,
$final_total, $final_office_total).

* Name booleans after their associated test.  ($done_loading,
$found_bad_record, sub is_valid, sub invalid_record)

* Name hashes as %noun_to_noun, describing the mapping of objects
together, when it fits.  (%server_to_admin, %title_to_author)

* Name arrays in the plural.

* Name a scalar reference to hash or array with an ending _ref.

* Name functions meant to be only internal to a module and never used
outside it with a leading underscore.  This makes it obvious which
functions are meant to be exported or used by an object, and POD
coverage testing will treat them as private so that you do not have to
document them.

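The naming rules above can be seen together in a short sketch.  The
server names come from the earlier example in this document; the
function names (is_known_server, _short_name) are made up for
illustration.

```perl
use strict;
use warnings;

# Hash named as %noun_to_noun, array in the plural, reference with _ref.
my %server_to_admin = (
    paradox  => 'jonrober',
    windlord => 'rra',
);
my @servers     = sort keys %server_to_admin;
my $servers_ref = \@servers;

# Boolean function named after its test.
sub is_known_server {
    my ($server) = @_;
    my $short_name = _short_name($server);
    return exists $server_to_admin{$short_name};
}

# Internal helper, marked with a leading underscore.  Lowercases the
# hostname and strips everything from the first dot onward.
sub _short_name {
    my ($hostname) = @_;
    my $short_name = lc($hostname);
    $short_name =~ s{ [.] .* \z }{}xms;
    return $short_name;
}
```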

# Formatting and Indentation

Don't use tabs in Perl scripts, since they can expand differently in
different environments. In particular, please try not to use the mix of
tabs and spaces that is the default in Emacs.

Please follow these guidelines for spacing and formatting:

* Each block is indented four spaces. Continuations of commands should
be indented two more spaces unless parenthesized, or indented to line
up one space after the opening parentheses if parenthesized.

* Lines should be 79 characters or less. Continue long lines on the
next line. Continue long strings by breaking the string before a space
and using string concatenation (.) to combine shorter strings.

* Use a space between keywords (if, elsif, while) and functions and
parenthesized arguments. Do not put a space between the opening or
closing parentheses and the contents. For example:

        if ($foo) {
            bar ($baz);
        }

* Prefer the above statement on three lines to:

        if ($foo) { bar ($baz) }

    unless there are multiple consecutive conditions and actions that
    you're aligning vertically.

* Whether to put if before or after the statement depends on the
logical flow of what you're testing; see perlstyle for some guidelines.

* As mentioned in perlstyle, when dereferencing, add a space between
curly brackets and a complex expression, such as @{ $foo{bar} }. Also
add a space around operators (+, =, and so forth).

* Parentheses are optional around print, die, and warn.  Other
built-ins should include the parentheses, as should any user-defined
functions or methods.  The empty parentheses should be omitted from
any method call taking no arguments, but should be present for regular
functions.  Any parentheses should have no space between the keyword
and the open parenthesis.

* map, grep, and sort are a special case; their first argument is a
code block and should be set off by { } with spaces between the
brackets and the code. This code block should never be within
parentheses.

* Break long if statements before an operator such as && or || or +,
not after as suggested in perlstyle. Starting the line with an operator
makes it easy to see its relation with the previous line without
excessive eye movement.

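Several of the formatting rules above (breaking before operators,
spacing in map and grep blocks, spaces inside a complex dereference,
and breaking long strings before a space) might look like the
following.  This is only a sketch; the host data layout and the names
is_usable and usable_names are assumptions made for the example.

```perl
use strict;
use warnings;

# Long string broken before a space and joined with concatenation.
my $NOTICE = 'This host is locked or low on disk space and will be'
  . ' skipped until an administrator clears the condition.';

# Long condition broken before each && rather than after it.
sub is_usable {
    my ($host_ref) = @_;
    if ($host_ref->{status} eq 'active'
        && $host_ref->{disk_free} > 10
        && !$host_ref->{locked}) {
        return 1;
    }
    return 0;
}

# grep and map take a code block set off by { } with spaces inside and
# no surrounding parentheses.  The complex dereference gets spaces too.
sub usable_names {
    my ($group_ref) = @_;
    my @usable = grep { is_usable($_) } @{ $group_ref->{hosts} };
    my @names  = map { $_->{name} } @usable;
    my @sorted = sort @names;
    return @sorted;
}
```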

# Control Structures

* Only use postfix if or unless for flow control statements (next,
last, return, etc) where there is nothing or almost nothing between the
flow control statement and the if/unless.  Code between the flow
control statement and the if/unless can make the fact that it's a
conditional less obvious and confuse some users.

* Don't use postfix for, while, or until at all.

* Use for rather than foreach, for consistency.

* Avoid C-style for statements when possible.  Unless you have a need
for a subscript when iterating with for, operate on the object
directly instead.  (ie: "for my $item (@array)" rather than "for my $i
(0..$#array)".)  If you do need to use a subscript, store the values in
temporary variables at the start of the loop rather than using the
subscript for the same values multiple times.

* In a loop, assign $_ to a named variable rather than using it
directly.  This makes it clearer what data $_ contains, and makes it
more obvious when you are modifying a value.

* In a loop, try to do whatever checking for rejecting an iteration
(ie: "next if $line =~ m{^#};") at the start of the loop, to avoid
unneeded calculations. This also puts all of the conditions under which
we skip an iteration of the loop together, for better readability.

* Avoid overcascading an if.  Some code can use cascading if and elses
to the point where it's not obvious what the if actually does.  Using a
table lookup, or given/when, can simplify by putting all of the logic
together where it's easier to read.

* Don't use the ternary operator (?:) except on the right-hand side of
an assignment. Specifically, it is not a replacement for an if/then
block. It's harder for inexperienced programmers to read when used that
way and it looks odd.

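As a rough illustration of the loop and branching rules above, the
following sketch rejects iterations at the top of the loop, names the
loop variable, replaces an if/elsif cascade with a table lookup, and
uses the ternary operator only on the right-hand side of an assignment.
The command format and all names here are hypothetical.

```perl
use strict;
use warnings;

# Table lookup in place of a long if/elsif cascade.
my %ACTION_TO_HANDLER = (
    start => sub { my ($name) = @_; return "starting $name"; },
    stop  => sub { my ($name) = @_; return "stopping $name"; },
);

# Process "action name" lines, skipping comments and blank lines.
sub run_commands {
    my (@lines) = @_;
    my @results;
    for my $line (@lines) {

        # All of the reasons to skip an iteration are grouped here.
        next if $line =~ m{ \A \s* \# }xms;
        next if $line !~ m{ \S }xms;

        my ($action, $name) = split(m{ \s+ }xms, $line);
        my $handler = $ACTION_TO_HANDLER{$action};
        my $result
          = defined($handler)
          ? $handler->($name)
          : "unknown action $action";
        push(@results, $result);
    }
    return @results;
}
```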

# Subroutines

* Avoid prototypes.

* Unless there's some exceptional reason not to do so (such as needing
to pass unknown partial parameters on to another function, or when
implementing AUTOLOAD routines), the first line of any Perl function
should be of the following form, for whatever variables that function
takes. This documents the parameters in a consistent way.

        my ($var1, $var2) = @_;

* You should then immediately resolve default values if needed, and
check for any missing arguments (with defined or exists) that you need
to validate.  This puts all the default argument checking together.

* If you have more than three parameters in a subroutine, use a hash
instead.  This avoids having to recall the correct order of the
arguments being passed to the function, and sets arguments more apart
from other local variables in the function itself.

* Always return with an explicit return, to make certain that you are
not returning an implicit value that may not be what you expect.  Use a
bare return to return failure, as 'return undef' will not do what you
expect if the function was called in a list context.

* Each function should have a leading comment that briefly describes
what the function does and what arguments it takes.  The format should
be the following:

        # The first area of a pre-function comment is a free-form description of what
        # the sub does, using however many lines as needed and written in normal
        # English sentences, but not repeating the information included below.
        #
        # For complicated subs, it may be multiple paragraphs.
        #
        # $param   - Description
        # $another - Description of the other
        # @stuff   - Description of the array values
        #
        # Returns: return value
        #  Throws: Description of any exceptions thrown
        sub example {

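A complete function following the subroutine rules above might look
like this sketch.  It assumes Perl 5.10 or later for the // operator,
passes the named arguments as a hash reference (one possible reading of
"use a hash"), and uses a made-up account_summary function and argument
names.

```perl
use strict;
use warnings;

# Build a passwd-style summary line for an account.  Takes its arguments
# as a hash reference because there are more than three of them.
#
# $args_ref - Reference to a hash with the following keys:
#               name  - Username (required)
#               uid   - Numeric UID (required)
#               shell - Login shell (default: /bin/sh)
#               group - Primary group (default: users)
#
# Returns: Summary string, or nothing if a required argument is missing
sub account_summary {
    my ($args_ref) = @_;

    # Check required arguments and resolve defaults in one place.
    if (!defined($args_ref->{name}) || !defined($args_ref->{uid})) {
        return;
    }
    my $shell = $args_ref->{shell} // '/bin/sh';
    my $group = $args_ref->{group} // 'users';

    return join(q{:}, $args_ref->{name}, $args_ref->{uid}, $group, $shell);
}
```

The bare return yields an empty list in list context and undef in
scalar context, so a caller can test the result either way.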

# I/O

* Rather than using bareword filehandles, assign the filehandle to a
variable.

        open(my $fh, '<', $fname);
        while (my $line = <$fh>) { }
        close($fh);

* Use the three-argument form of open, as above, rather than combining
the second and third arguments.  It's more readable and avoids worries
about the contents of the filename having special characters.

* Always close filehandles explicitly, as soon as possible after you're
done with the file.

* Use autodie to make certain that failures on open or close are fatal.

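Put together, the I/O rules above could look like the following sketch.
The count_lines function is hypothetical; autodie is what makes a
failed open or close fatal here.

```perl
use strict;
use warnings;
use autodie;

# Count the lines in a file.  autodie throws an exception if open or
# close fails, so no explicit "or die" is needed on those calls.
sub count_lines {
    my ($fname) = @_;
    my $count = 0;

    # Lexical filehandle and three-argument open.
    open(my $fh, '<', $fname);
    while (my $line = <$fh>) {
        $count++;
    }

    # Close explicitly as soon as we're done with the file.
    close($fh);
    return $count;
}
```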

# Regular Expressions

* Use m{} as the regular expression delimiter.  // can cause problems
with filenames and needing to be escaped, while {} is very clear and
not in normal use.

* Use m{}xsm for matches by default.  x is very useful for readability
in longer regular expressions, since it allows whitespace and comments
inside the pattern.  m makes ^ and $ match at the start and end of each
line in a multiline string, and s makes "." match newlines as well.  In
many cases they aren't needed, but we want to encourage them as a
default.

* Always pull match captures into named variables as soon as possible.
$1, $2, etc are not that useful for understanding the actual content of
the variable, and so are less readable.  Also, make sure that a match
has succeeded before trying to do this, as the numbered match variables
will only reset on a successful match.  If you use them without making
sure a match was successful, you can end up using values from an
earlier match.

* Do not use capturing parentheses when you're not intending to
capture, as that can be confusing.  If you need parens for grouping but
not capturing, use "(?:...)" instead of "(...)".

* If you are only using a regular expression for the case-insensitive
i flag, it's more efficient to do a string comparison instead: "if (lc
$var eq 'help')" rather than "if ($var =~ m{^help$}i)".

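The regular expression rules above can be combined as in the sketch
below.  The "key = value" line format and the function names are
assumptions made for the example.

```perl
use strict;
use warnings;

# Parse a "key = value" or "key: value" line.  Returns the key and value
# as a list, or nothing if the line does not match.
sub parse_setting {
    my ($line) = @_;

    # (?: ... ) groups the alternation without capturing it.
    if ($line =~ m{ \A \s* (\w+) \s* (?: = | : ) \s* (.*?) \s* \z }xms) {

        # Only read $1 and $2 after a successful match, and copy them
        # into named variables immediately.
        my ($key, $value) = ($1, $2);
        return ($key, $value);
    }
    return;
}

# A case-insensitive comparison done as a string match, not a regex.
sub is_help_request {
    my ($word) = @_;
    return lc($word) eq 'help';
}
```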

# Output

* Use warn rather than print for debugging.

* When writing modules, use carp and croak via the Carp module rather
than warn and die.

* In a print or say statement, always put filehandles in braces to
better call out that they are a filehandle being printed to rather than
a variable being printed.  Also always check the return value of the
print statement in order to verify that there were no problems.  These
are both handled by Stanford::Infrared::Wrappers and the functions
print_fh, say_fh, print_stdout, and say_stdout.

* For short multiline strings, form them over multiple lines with an
explicit "\n" at the end of each, rather than leaving open quotes over
multiple lines.  This is more readable.

* For longer multiline strings, use a heredoc.  However, you should not
put heredocs in the middle of normal code, as they break the visual
flow.  Either define them during the general declarations area of the
program, or put them in their own subroutine.  Heredocs should have a
quoted terminator, to make it obvious whether you're intending to
interpolate values within it or not.

* Name heredocs as one unspaced string, in all caps with a standard
prefix or other name.  This makes the end of the heredoc stand out more
against the heredoc itself.

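The output rules above might be applied as in this sketch.  Since
Stanford::Infrared::Wrappers is an internal module, print_checked below
is a hypothetical stand-in showing the same idea (braces around the
filehandle plus a checked return value), not that module's actual code.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Longer text lives in a heredoc in the declarations area.  The quoted
# terminator makes it explicit that $HOME is not interpolated.
my $USAGE_TEXT = <<'END_USAGE';
Usage: example [-h] <file>
Set $HOME before running.
END_USAGE

# Short multiline string built line by line with explicit newlines.
my $BANNER = "Example tool\n"
  . "For internal use only\n";

# Print to a filehandle and croak if the print fails.
sub print_checked {
    my ($fh, @args) = @_;
    print {$fh} @args or croak("print failed: $!");
    return;
}
```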

# Documentation

* All scripts must have POD documentation including a NAME section
suitable for turning into a man whatis entry, a SYNOPSIS section giving
the options for invoking the script, and a DESCRIPTION section that
explains what the script does. Any script that takes command-line
options must have an OPTIONS section documenting them. Follow the
layout in the pod2man man page.

* Similarly, all modules must have POD documentation including NAME,
SYNOPSIS, and DESCRIPTION sections. Preferably, modules should also
have a METHODS or FUNCTIONS section describing each interface provided.
POD documentation for modules may be collected at the end of the module
or interspersed with the function or method definitions as preferred.

* All options must be documented in the OPTIONS section of the POD
documentation. List the short and long options in the =item header
separated by a comma and space.

* Do not use interspersed POD to document functions.  All POD should
start after the code is done, with an __END__ statement before it.

* Section headings should be 78 # characters, a line starting with one
or more #s and a space giving the contents of the section (no need to
capitalize every word here), and then another line of 78 # characters.
This heading style doesn't require adjustment or realignment after
changing the text of the heading.

        #############################################################################
        # Database functions
        #############################################################################


# Writing Modules

* When putting code in more than one script, strongly consider moving
it to a module.

* Use Exporter to export functions, and only do so by request (via
@EXPORT_OK).  Exporting all functions by default can clutter namespaces
and lead to conflicts if a module and program add new functions.  The
recommended way to do module inheritance with Perl 5.10 and later would
be to use Exporter by:

        use parent qw(Exporter);

* Module variables should be internal to the module, and not set
directly save in tests.

* If creating an OO module, use standard names for accessors. (ie: new
as a creator, destroy to clean up an object).

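A minimal module following the export rules above could look like the
sketch below.  Example::Greeting and its functions are hypothetical,
and the module is kept in the same file as its caller only so that the
example is self-contained.

```perl
use strict;
use warnings;

package Example::Greeting;    # hypothetical module name

use parent qw(Exporter);

# Export only on request.
our @EXPORT_OK = qw(greeting);

# Return a greeting for the given name.
sub greeting {
    my ($name) = @_;
    return 'Hello, ' . _trim($name);
}

# Internal helper: strip leading and trailing whitespace.
sub _trim {
    my ($text) = @_;
    $text =~ s{ \A \s+ | \s+ \z }{}gxms;
    return $text;
}

package main;

# With the module in its own file, this would instead be written as
# "use Example::Greeting qw(greeting);".
Example::Greeting->import(qw(greeting));
```

Only greeting is available to the caller; _trim stays internal to the
module.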

# Using Modules

* Prefer to use core modules when available, then Debian-packaged CPAN
modules, then other CPAN modules.

* Use Getopt::Long::Descriptive for option parsing in any script that
takes options. Give each option a corresponding single-character option
unless it's rarely used. When listing the options, give the short
option first, then |, and then the long option, as in:

        my ($opt, $usage) = describe_options(
            '%c %o <args>',
            [ 'd|delete',    'delete the given object' ],
            [ 'e|exclude=s', 'exclude any matching names' ],
            [ 'h|help',      'display help' ],
        ) or exit 1;

* For loading configuration files, use Config::Any in most situations.
Some older files use AppConfig, but Config::Any is the current
standard.

* Use Test::More for writing tests.  It provides a number of useful
checking functions and works very well with other Perl test harness
tools, allowing us to write standard tests that integrate well with
many tools.

* Look at Scalar::Util and List::Util for useful functions that can
simplify some needs.  Both are in core.  List::MoreUtils is not in
core, but also contains useful functions.

* If you need to cache, use the Memoize module.  It allows you to cache
entire functions easily without changing the actual function, and with
Memoize::Expire, create your own expiration policies for the cache.

* Look at the Stanford::Infrared::* modules to see if they cover any
generalized cases you need, or to see if there are any cases you think
they should cover.  Stanford::Infrared::General covers general needs
used in several places, with functions specific to Stanford ITS
infrastructure.  Stanford::Infrared::MySQL covers interfacing with the
remctl commands in stanford-server-mysql, and other MySQL setup
specific to that infrastructure.  Stanford::Infrared::Wrappers covers
things that are not as specific to Stanford, but are useful for our
needs (such as an IPC::Open3 wrapper).

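For the core modules mentioned above, a brief sketch of List::Util and
Memoize in use.  The data and the expensive_lookup function are
invented for the example; the counter exists only to show that the
second call is served from the cache.

```perl
use strict;
use warnings;
use List::Util qw(first max sum);
use Memoize qw(memoize);

my @sizes = (12, 250, 7, 131);

# List::Util replaces several hand-written loops.
my $total     = sum(@sizes);
my $largest   = max(@sizes);
my $first_big = first { $_ > 100 } @sizes;

# Counts how often the real function body runs.
my $lookup_count = 0;

# Stand-in for a slow function whose results are worth caching.
sub expensive_lookup {
    my ($key) = @_;
    $lookup_count++;
    return uc($key);
}

# Cache results without changing the function itself.
memoize('expensive_lookup');
```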

# Recommendations

These are optional rules that you may break when you have good reason,
but should try to adhere to whenever possible.

* Scripts should support -h and --help options that display usage
information for the script, via Getopt::Long::Descriptive. They should
also support -m, --man, and --manual options to show the full
documentation for the script, generally by running perldoc -t on the
script.

* Include the name of the script in error messages. This is
particularly important for scripts run from cron or other scripts,
although it's useful to have in general and never hurts. The easiest
way to do this is to include something like the code below.  You should
then prefix any warn, die, carp, or croak messages with "$0: ", and use
$0 in any messages that warn of incorrect option usage.  If you use
this format, the first character of the rest of the message should not
be capitalized.

        my $fullpath = $0;
        $0 =~ s{ ^ .* / }{}xms;

* You can then use $fullpath as the path to the script for perldoc.

* When writing SQL code, set AutoCommit to 0 and RaiseError to 1 and
then use the following pattern:

        eval {
            my $sql = '...';
            $dbh->do ($sql, undef, $param1, $param2);
            $dbh->commit;
        };
        if ($@) {
            $dbh->rollback;
            # Other error handling goes here.
        }

* Always use placeholders and bind values (see the DBI man page)
whenever possible so that variable values will be escaped properly,
preventing SQL injection attacks.

* If using extensive SQL, you might instead wish to use DBIx::Class,
though it can be overkill for short occasional scripts.

* It's generally a good idea to set 'STDOUT->autoflush' near the top of
the script to flush output after every print so that regular output and
error messages are intermixed in the proper sequence. This should be
omitted, however, if the script produces copious output and benefits
from system I/O buffering.

* Include an AUTHOR or AUTHORS section as the final section in the POD
file to document the maintainer and previous authors of the script or
module.

* Don't leave commented-out code in production scripts or modules;
instead, just delete it. Retrieving no-longer-used code that may be
needed again later is what the revision control system is for. An
exception may be places where the code is useful as part of a comment,
in which case indent it four spaces relative to the surrounding text.

* my $o = Module::Name->new, not my $o = new Module::Name. new is a
method and the second form uses magical barewords and lots of other
squirrelly areas of Perl syntax and tends to break in odd and
surprising ways if you need to pass arguments and have more complex
syntax. The first form always works and is clear that you're calling a
class method.

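The script-name and autoflush recommendations above fit together as in
this sketch.  The script_error helper is hypothetical and exists only
to show the message format.

```perl
use strict;
use warnings;
use IO::Handle;

# Keep the full path for perldoc, then reduce $0 to the script name.
my $fullpath = $0;
$0 =~ s{ ^ .* / }{}xms;

# Flush output after every print so that regular output and error
# messages appear in the proper sequence.
STDOUT->autoflush;

# Format an error message with the script name as a prefix.  The text
# after the prefix is not capitalized.
sub script_error {
    my ($text) = @_;
    return "$0: $text\n";
}
```

A caller would then write something like die script_error('cannot open
configuration file').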

# Editor Configuration

The recommended Emacs configuration, suitable for including in your
.emacs file, is:

        (setq cperl-close-paren-offset -4)
        (setq cperl-continued-statement-offset 2)
        (setq cperl-indent-level 4)
        (setq cperl-indent-parens-as-block t)
        (setq cperl-lineup-step 1)

The section headings may be generated with the following Emacs function.

        ;; Insert a header for a Perl program (using # comment notation).
        (defun rra-insert-script-header (header)
          "Insert my Perl section header, prompting for the header content."
          (interactive "sHeader: ")
          (insert (make-string 78 ?#) "\n# " header "\n"
                  (make-string 78 ?#) "\n"))

You may want to bind that to a key.

The recommended vim configuration, suitable for including in your .vimrc
file, is:

        set tabstop=4
        set expandtab

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>