
Re: FOSS tool to do general stats from text indata



On Sat, Jun 24, 2023 at 10:00:05PM +0200, Emanuel Berg wrote:
> tomas wrote:
> 
> >> Is there a CLI and FOSS tool that creates stats from text
> >> indata - e.g.,
> >> 
> >>   $ txt2stats path/to/indata/*.txt
> >> 
> >> I mean a general tool, but with options to tweak the report
> >> included, of course.
> >
> > If you can bear some tweaking, R is it.
> 
> Sure! Let's run R on this e-mail. Does it work and if so, what
> does it say?

To a generic question -- a generic answer. I don't even know what
you mean by "general stats" -- the sports example you put in the
other mail suggests that you want statistics gathered about a
subject from written text: this is far more than "just" stats
and involves "understanding texts written in human languages",
another big can of worms (which has become somewhat fashionable
as of late).

If it's text statistics, good statistics packages have lots of
resources. R is a good statistics package with a big community,
so there is plenty of material on text analysis with it:

  https://towardsdatascience.com/a-light-introduction-to-text-analysis-in-r-ea291a9865a8?gi=001414a39e96
  https://www.r-bloggers.com/2021/02/text-analysis-with-r/
  https://bookdown.org/jdholster1/idsr/text-analysis.html
  https://m-clark.github.io/text-analysis-with-R/intro.html
  https://towardsdatascience.com/r-packages-for-text-analysis-ad8d86684adb?gi=4a426e671fe6
  https://www.springboard.com/blog/data-science/text-mining-in-r/
  https://m-clark.github.io/text-analysis-with-R/string-theory.html

That said, there are others. In the Python galaxy, there is
the Natural Language Toolkit (NLTK):

  https://www.nltk.org/
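
If "general stats" only means counting things, the Python standard
library alone may get you most of the way. A minimal sketch, assuming
you want line, word and character counts plus the most frequent words
per file. The name text_stats and the regex that decides what counts
as a "word" are my own illustrative choices, not an existing tool:

```python
import re
import sys
from collections import Counter


def text_stats(text, top=5):
    """Return basic counts for a string (hypothetical helper, not a real tool)."""
    # Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    return {
        "lines": len(text.splitlines()),
        "words": len(words),
        "chars": len(text),
        "unique_words": len(set(words)),
        "top_words": Counter(words).most_common(top),
    }


if __name__ == "__main__":
    # Print one dictionary of counts per file named on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(path, text_stats(fh.read()))
```

Saved as, say, stats.py, it would be run as
"python3 stats.py path/to/indata/*.txt". Anything beyond counting
(stemming, tagging, extracting facts about a subject) is where R or
NLTK would come in.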

But your question was posed in such a way that I don't even know
whether I'm wasting both our time with this answer.

Cheers
-- 
t
