Re: Slickest way to do "joins"/lookups in text files
On Fri, Dec 17, 2004 at 08:39:37PM -0500, William Ballard wrote:
> In SQL you join tables. If it's indexed it's fast.
> Can you do this using text files from the prompt?
>
> Example:
> I'd like to investigate the most popular RSS readers Debian offers.
>
> $apt-cache search rss | cut -d' ' -f1 > rss.txt
> $wget http://popcon.debian.org/main/by_vote.gz
Yes. There is a standard unix tool called "join" that does it.
The syntax is a bit strange, but for example:
join -t $'\t' -j1 1 -j2 1 file1 file2
This will join two TSV files on their first column.
For example, to solve your problem, you might do this:
zgrep -v '^#' by_vote.gz | perl -pe '/^-/ && exit' |
perl -pe 's/ +/\t/g; 1 while s/(\(.*?)\t/$1 /' >by_vote.1
to clean it up into TSV, then:
<by_vote.1 sort -t $'\t' -k2,2 >by_vote.2
to sort it by the second column (package name), then:
join -t $'\t' -j1 1 -j2 2 rss.txt by_vote.2 > rss_votes.tsv
and to make it human-readable again:
< rss_votes.tsv tsv2txt > rss_votes
tsv2txt is a script I wrote, you can get it at:
http://nipl.net/hacks/tsv2txt
There is a program called "nosql" which provides many relational functions
for TSV files, it's quite good. Also, I have written several other scripts
that do useful database-like things with TSV files.
Sam
Reply to: