Re: First approximation of source line count for potato

Lars wrote:
> I wrote a new script and ran it on a potato mirror a couple of days
> old. The temporary results are in http://liw.iki.fi/liw/foo.html
> (WATCH OUT! it's an 800 kilobyte table, which is quite slow to display
> on Netscape). The totals are below:
>       Files  Size      Lines    AWK  C       C++    Perl  Python
>       714315 7497103.3 228096.6 39.0 80457.0 7500.0 693.0 595.0
> Size is in kilobytes, line counts are in units of 1000 lines. That is,
> there are about 7.5 gigabytes of files in source packages, making about
> 230 million lines, of which about 80 million lines are C.
> If anyone has suggestions for better statistics, let me hear them.

I finished off my line counter enough to get a second measurement.
My counter also works off of the file suffixes.  I covered main, non-free,
and contrib.
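
For readers who want to try the same idea, here is a minimal sketch of
suffix-based counting. It is not the counter described above: the suffix
table and the names SUFFIX_TO_LANGUAGE, classify and count_lines are
assumptions made up for illustration, and a real table would need many
more entries.

```python
import os
from collections import Counter

# Hypothetical suffix table; the actual counter's rules are not shown
# in this message.
SUFFIX_TO_LANGUAGE = {
    ".c": "C", ".h": "C",
    ".cc": "C++", ".cpp": "C++",
    ".pl": "perl", ".pm": "perl",
    ".sh": "Shell",
    ".awk": "awk",
}

def classify(filename):
    """Return a language label for the file suffix, or None if unknown."""
    _, suffix = os.path.splitext(filename)
    return SUFFIX_TO_LANGUAGE.get(suffix)

def count_lines(root):
    """Walk an unpacked source tree and total the lines per language."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            language = classify(name)
            if language is None:
                continue  # unrecognised suffix: skipped as non-source
            # Read as bytes so odd encodings cannot raise decode errors.
            with open(os.path.join(dirpath, name), "rb") as handle:
                totals[language] += sum(1 for _ in handle)
    return totals
```

Calling count_lines on a directory returns a Counter keyed by language.
Anything the table does not recognise is silently skipped, which is one
way two counters can end up with very different totals.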

Our counts correlate well for C and C++; they were closer before I put in
contrib and non-free.  I'm guessing Lars did just main?  Suffix matching is
fuzzier for the other file types, and there we apparently disagree.

One large note: my total line count is much smaller, 154M against 228M.
Many things I skipped as non-source were evidently counted by Lars.

Interested folk can drill over to http://folk.federated.com/~jim/debcount/
to see the language breakdown, see how their favorite packages fared, and
suggest new regexps so their favorite baby gets counted correctly.
I will set this up to regenerate nightly after the archive updates.

The current tally sits like...

  1,212,171  ASM
        837  Audio  <-- files, not lines
 86,469,798  C
  8,038,974  C++
 29,682,236  Docs
      6,280  DscFile
  2,039,701  Fortran
     66,539  Image  <-- files, not lines
  3,218,926  Makefiles
    403,837  Pascal
      1,408  Postscript  <-- files, not lines
     25,926  SQL
  8,051,677  Shell
    748,732  TCL
     42,214  awk
     79,362  debian/ <-- not in a diff
  5,306,637  diff    <-- just the debian .diff files
    631,862  java
  4,727,692  lisp
  1,630,342  misc
  1,491,742  perl
     59,427  sed
153,936,320  Grand Total
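
A quick way to check the tally is to re-add the rows. The sketch below
copies the figures from the list above; the names TALLY, FILE_COUNT_ROWS,
grand_total and line_total are assumptions for illustration only. The sum
of all rows equals the Grand Total, so the three rows that count files
rather than lines are included in it.

```python
# Figures copied from the tally above.
TALLY = {
    "ASM": 1_212_171, "Audio": 837, "C": 86_469_798, "C++": 8_038_974,
    "Docs": 29_682_236, "DscFile": 6_280, "Fortran": 2_039_701,
    "Image": 66_539, "Makefiles": 3_218_926, "Pascal": 403_837,
    "Postscript": 1_408, "SQL": 25_926, "Shell": 8_051_677,
    "TCL": 748_732, "awk": 42_214, "debian/": 79_362, "diff": 5_306_637,
    "java": 631_862, "lisp": 4_727_692, "misc": 1_630_342,
    "perl": 1_491_742, "sed": 59_427,
}

# Rows marked "files, not lines" in the tally.
FILE_COUNT_ROWS = {"Audio", "Image", "Postscript"}

grand_total = sum(TALLY.values())
line_total = sum(n for name, n in TALLY.items()
                 if name not in FILE_COUNT_ROWS)

print(f"{grand_total:,}")  # 153,936,320
print(f"{line_total:,}")   # 153,867,536
```

The difference is 68,784 files, small next to 154M, so the rounded total
is unaffected either way.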

                                     Jim Studt, President
                                     The Federated Software Group, Inc.