Package that quickly reports the closest line in a text file?
O' big brains o' bioinformatics!
If I ask you nicely, may I please have the benefit
of your huge science brain thoughts?
Do you happen to know of a Debian package with a
nice, quick command line tool that says which line
of a text file most closely matches a string?
I imagine it having a syntax like grep:
$ stupendous-tool "hello world" file
But instead of reporting every line in file
containing the string "hello world", it would
return the line closest to "hello world".
For example, it might return the line "hello word".
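To make the request concrete, here is a minimal
Python sketch of the behaviour I have in mind. It
is only an illustration: it assumes the standard
library's difflib.SequenceMatcher as the similarity
measure (a stand-in, not the efficient algorithm I
am hoping for), and the closest_line name and the
sample lines are made up.

```python
import difflib


def closest_line(query, lines):
    """Return the line most similar to query, or None if there are no lines."""
    best, best_ratio = None, -1.0
    for line in lines:
        text = line.rstrip("\n")
        # ratio() is a similarity score in [0, 1]; higher means more alike.
        ratio = difflib.SequenceMatcher(None, query, text).ratio()
        if ratio > best_ratio:
            best, best_ratio = text, ratio
    return best


# Hypothetical file contents, for illustration only.
sample = ["goodbye world\n", "hello word\n", "help wanted\n"]
print(closest_line("hello world", sample))
```

This compares the query against every line in turn,
so it is fine for small files but likely too slow
for large ones.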
Debian already has an approximate grep package
(tre-agrep), but, But, BUT!
a.) it uses a slow algorithm, the Levenshtein
distance, and
b.) it is limited to differences of 9 or fewer
characters.
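For reference, the Levenshtein distance counts the
fewest single-character insertions, deletions, and
substitutions needed to turn one string into
another. A minimal sketch of the textbook dynamic
programming version follows; it is not taken from
tre-agrep, and the function name is my own. It
fills a table of len(a) x len(b) cells, which is
where the cost comes from on long strings.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute cost 1 each)."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; only two rows are kept in memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("hello world", "hello word"))
```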
I believe comparing long strings of DNA is a
well-known chore in bioinformatics.
I read at
https://stackoverflow.com/questions/5859561/getting-the-closest-string-match/5859823
and
https://stackoverflow.com/questions/49263/approximate-string-matching-algorithms
that better algorithms are available.
Debian's packages named "ncbi-blast+" and "neobio"
look close, but I have no personal experience with
either.
My question?
Can you recommend a computationally efficient
Debian package that reports which line of a text
file most closely matches a string?
Thanks,
Kingsley
--
Time is the fire in which we all burn.