Re: FOSS tool to do general stats from text indata
Emanuel Berg wrote:
> Is there a CLI and FOSS tool that creates stats from text
> indata - e.g.,
>
> $ txt2stats path/to/indata/*.txt
>
> I mean a general tool, but with options to tweak the report
> included, of course.
As "stats" is a grab bag larger inside than the Tardis, I suspect that
only on that other ship with the infinite improbability drive is a stats
babelfish interpreter to be found.
For the last 30+ years, I've just thrown together a few lines of Awk
to generate the initially required stats, then tweaked the C-like code
and regexes to add the inevitable nice-to-haves. Some result is
immediate, and dissatisfaction with completeness motivates the
tweaking/temporary_satisfaction cycle. Options are limitless, as is
needed for an undefined task.
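There is no need for code to loop over the input; Awk reads each
line itself, so a list of:
    /pattern/ { action }
statements is sufficient, along with two special patterns:
    BEGIN { action }     # Runs first, before any input is read.
    END   { actions }    # Is where you postprocess and print.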
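
A minimal sketch of that layout, assuming a made-up log format: the
function name count_errors, the ERROR/WARN patterns and the sample
input are all hypothetical, chosen only to show BEGIN, pattern and END
blocks working together.

```shell
# Hypothetical helper: counts lines matching two illustrative patterns.
count_errors() {
    awk '
        BEGIN   { errors = 0; warnings = 0 }   # runs once, before any input
        /ERROR/ { errors++ }                   # runs for every line matching the regex
        /WARN/  { warnings++ }
        END     { printf "lines=%d errors=%d warnings=%d\n", NR, errors, warnings }
    ' "$@"
}

# Made-up sample input, fed on stdin.
printf 'ERROR disk full\nWARN low memory\nok\nERROR timeout\n' | count_errors
```

File names given as arguments are passed straight through to awk, so
count_errors path/to/indata/*.txt would work the same way.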
Awk's associative arrays take string subscripts, so
    /elephant/ { animals["elephant"]++ }
accumulates that stat. If you have prefilled the array
animal_list with the names of all animals of interest, then
in an action,
    for (i = 1; i <= NF; i++)    # Iterate over the line's fields.
        if ($i in animal_list) animals[$i]++
should accumulate a frequency histogram of 'em all.
Job done. In essentially one line of script.
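
Put together as something runnable, it might look like the sketch
below. The function name animal_histogram, the three animal names and
the sample input are assumptions for illustration only. Fields are
split on whitespace and compared exactly, so "Lion" or "lion," would
not be counted without extra cleanup.

```shell
# Hypothetical helper: frequency histogram of a fixed word list.
animal_histogram() {
    awk '
        BEGIN {
            # Prefill the lookup table; only the subscripts matter.
            n = split("elephant zebra lion", names, " ")
            for (k = 1; k <= n; k++) animal_list[names[k]] = 1
        }
        {
            for (i = 1; i <= NF; i++)      # iterate over the fields of the line
                if ($i in animal_list) animals[$i]++
        }
        END {
            for (a in animals) print a, animals[a]
        }
    ' "$@" | sort    # for-in order is unspecified, so sort for stable output
}

# Made-up sample input, fed on stdin.
printf 'elephant lion elephant\nzebra cat elephant\n' | animal_histogram
```

The prefill lives in BEGIN so the word list is built once; reading it
from a file instead would be the obvious next tweak.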
A quick search for "GAWK: Effective AWK Programming"
should snarf more know-how than most folk desire.
And if you'd like to run it as a daemon, crunching data
coming from a coprocess, there's gawkinet.
It does not seem worthwhile to wade into a swamp after
alligators, shod only in ill-fitting boots made for someone
else. Go for one with steel toecaps and the Swiss army
knife in the heel.
Good luck.
Erik
P.S. It's 15 years since I did this stuff for money, so it's
worth checking the syntax of the old wetware dredgings,
above.