
Re: delete lines that contain duplicated column items



On Tue, Apr 03, 2007 at 08:19:10PM +0800, Jeff Zhang wrote:
> I have a simple txt file, like:
> ...
> a a
> aa a
> b b
> ba b
> ...
> 
> I want to keep only the first line in which each column-2 value appears,
> and delete the following lines that duplicate it in column 2.
> then it will like:
> ...
> a a
> b b
> ...
> 
> I've tried `uniq -f1`  but it didn't work.

`uniq' only removes *adjacent* duplicate lines, so unless the input is
sorted on that field it will miss most of them.  awk's associative arrays
(or Perl's hashes) are a better fit for this sort of thing, e.g.,

  $ awk '!seen[$2]{print; seen[$2]=1}' < file

awk scans the input line by line, tests each line against the pattern(s),
and runs the associated action(s).  Here, if the 2nd-column value hasn't
been seen yet, the line is printed and recorded as seen.
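To see it in action, here is the one-liner run against the sample data
from the question (the printf is just a stand-in for the real file):

```shell
# Sample input from the question: column 2 repeats across lines.
printf 'a a\naa a\nb b\nba b\n' |
# Print a line only the first time its 2nd-column value appears.
awk '!seen[$2]{print; seen[$2]=1}'
# prints:
# a a
# b b
```

A common shorthand for the same pattern is `awk '!seen[$2]++'`, which
bumps the counter and tests it in one expression.  The Perl equivalent
uses a hash the same way: `perl -ane 'print unless $seen{$F[1]}++' file`
(-a autosplits each line into @F, which is zero-indexed).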

-- 
Ken Irving


