
Bug#738342: lintian: checks/cruft - GFDL check is slow



Package: lintian
Version: 2.5.21
Severity: normal

A quick benchmark suggests that lintian spends nearly 2 minutes on the
Linux source package (I tested with linux/3.10~rc7-1~exp1).  Profiling
Lintian with perl -d:NYTProf suggests that the vast majority of the time
is spent in:

"""
            if ($cleanedblock =~ $gfdlpattern) {
"""

Where $gfdlpattern is one of:

"""
            # classical gfdl matching pattern
            my $normalgfdlpattern = qr/
                 (?'contextbefore'(?:
                    (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
                    (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)))
                 gnu \s+ free \s+ documentation \s+ license
                 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
                 a \s+ copy \s+ of \s+ the \s+ license \s+ is
                /xsmo;

            # for first block we get context from the beginning
            my $firstblockgfdlpattern = qr/
                 (?'rawcontextbefore'(?:
                    (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
                  \A(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){0,1024}|
                    (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)
                  )
                 )
                 gnu \s+ free \s+ documentation \s+ license
                 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
                 a \s+ copy \s+ of \s+ the \s+ license \s+ is
                 /xsmo;
"""


The profiler suggests that 60% of the runtime is spent in the
"CORE:match" operations inside "license_check" from checks/cruft.  The
regex appears to be hit "only" 2452 times, but each match takes 55.9ms
on average, totalling 137s.
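
One cheap thing to try is a literal pre-check that skips the regex
for blocks that cannot match.  The following is only a minimal sketch,
not a tested fix: "block_mentions_gfdl" is a hypothetical helper, and
the pattern in it is a simplified stand-in for the two real patterns
quoted above.  It assumes $cleanedblock is already lower-cased, as the
patterns themselves imply.

```perl
use strict;
use warnings;

# Simplified stand-in for the expensive patterns (assumption: the real
# ones additionally capture up to 1024 characters of leading context).
my $gfdlpattern = qr/
     gnu \s+ free \s+ documentation \s+ license
     (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
     a \s+ copy \s+ of \s+ the \s+ license \s+ is
    /xsm;

# Hypothetical helper: returns 1 if the block matches, 0 otherwise.
sub block_mentions_gfdl {
    my ($cleanedblock) = @_;

    # Every match must contain these literal words, so a block that
    # lacks any of them cannot match and the regex can be skipped.
    # Single words are used because the pattern allows arbitrary
    # whitespace (including newlines) between words.
    for my $word (qw(documentation license copy)) {
        return 0 if index($cleanedblock, $word) == -1;
    }

    # Only blocks that pass the cheap check pay for the full regex.
    return $cleanedblock =~ $gfdlpattern ? 1 : 0;
}
```

Most blocks in a source tree like Linux presumably never mention the
GFDL, so they would be rejected by index() alone.  Whether that removes
enough of the 137s would need to be confirmed with another profiler run.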

Bastian, do you have any ideas for reducing the cost of the regex?

~Niels

