
Re: FOSS tool to do general stats from text indata




Emanuel Berg wrote:
> Is there a CLI and FOSS tool that creates stats from text
> indata - e.g.,
>
> $ txt2stats path/to/indata/*.txt
>
> I mean a general tool, but with options to tweak the report
> included, of course.

As "stats" is a grab bag larger inside than the Tardis, I suspect that
only on that other ship with the infinite improbability drive is a stats
babelfish interpreter to be found.

For the last 30+ years, I've just thrown together a few lines of Awk
to generate the initially required stats, then tweaked the C-like code
and regexes to add the inevitable nice-to-haves. Some result is
immediate, and dissatisfaction with completeness motivates the
tweaking/temporary_satisfaction cycle. Options are limitless, as is
needed for an undefined task.

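There is no need for looping code; a list of

/pattern/  { action }

statements is sufficient, since Awk applies each one to every input
line in turn. Two special patterns bracket the run:

BEGIN  { action }   # Runs first, before any input is read.
END    { action }   # Runs last; this is where you postprocess and print.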
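
A minimal sketch of that layout, wrapped in a shell function so it can
be pasted at a prompt. The name line_stats and the /error/ pattern are
placeholders of mine, not anything standard; swap in whatever you are
actually counting.

```shell
# Hypothetical helper: count all lines, and lines matching /error/,
# from the named files or from stdin when no files are given.
line_stats() {
    awk '
        BEGIN   { matched = 0 }    # runs once, before any input is read
        /error/ { matched++ }      # runs for every line that matches the regex
        END     { printf "lines=%d matched=%d\n", NR, matched }  # runs after the last line
    ' "$@"
}
```

Usage would be something like `line_stats path/to/indata/*.txt`. NR still
holds the total record count when END runs, which saves keeping a
separate line counter.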

Awk's associative arrays take string subscripts, so

/elephant/   { animals["elephant"]++ }

accumulates that stat. If you have prefilled the array
animal_list so that its subscripts are the names of all animals
of interest, then in an action,

for (i=1;i<=NF;i++)      # Iterate over the line's fields.
{  if ($i in animal_list) animals[$i]++  }

should accumulate a frequency histogram of 'em all.
Job done. In essentially one line of script.
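
Put together, the whole thing might look like the sketch below. The
function name animal_hist and the three-animal list are made up for
illustration, and fields are split on whitespace only, so "elephant,"
with trailing punctuation would not be counted without further tweaking.

```shell
# Hypothetical helper: frequency histogram for a fixed list of animals,
# read from the named files or from stdin when no files are given.
animal_hist() {
    awk '
        BEGIN {
            # Prefill the lookup array; only its subscripts matter to "in".
            n = split("elephant giraffe zebra", names, " ")
            for (k = 1; k <= n; k++) animal_list[names[k]] = 1
        }
        {
            for (i = 1; i <= NF; i++)       # iterate over the fields of this line
                if ($i in animal_list) animals[$i]++
        }
        END {
            # for-in order is unspecified, so walk the names in list order.
            for (k = 1; k <= n; k++)
                printf "%s %d\n", names[k], animals[names[k]]
        }
    ' "$@"
}
```

Run as `animal_hist path/to/indata/*.txt`; animals that never show up
are reported with a count of 0 because an unset array element formats
as zero under %d.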

A quick search for "GAWK: Effective AWK Programming"
should snarf more know-how than most folk desire.

And if you'd like to run it as a daemon, crunching data
coming from a coprocess or a network socket, there's gawkinet,
the "TCP/IP Internetworking with gawk" manual.

It does not seem worthwhile to wade into a swamp after
alligators, shod only in ill-fitting boots made for someone
else. Go for one with steel toecaps and the Swiss army
knife in the heel.
 
Good luck.

Erik

P.S. It's 15 years since I did this stuff for money, so it's
worth checking the syntax of the old wetware dredgings,
above.
