
Bug#457778: ITP: blm -- compute set operations on line-oriented files: and, or, xor, and more.



Hi Florian,

Thanks for letting me know about comm.  I appreciate comm's streaming
capability (it does not have to read the whole file into memory), which
blm cannot duplicate.  You have given me a good opportunity to explain
the advantages and disadvantages of blm versus comm.

1) comm works only with exactly two input files.  blm works with any
number of files, from one upward.

2) blm supports four operations: and, or, exclusive-or, and set-difference.
comm does not support exclusive-or directly (it requires merging its
first two output columns).
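
As a rough illustration of points 1 and 2, the four operations can be
sketched in Python by treating each file as a set of lines.  This is not
blm's actual C++ code; the names read_lines and combine are hypothetical,
and the meaning of exclusive-or for more than two files (lines present in
an odd number of inputs) is an assumption of this sketch.

```python
from functools import reduce


def read_lines(path):
    """Return the set of distinct lines in a file, without line terminators."""
    with open(path, encoding="utf-8") as handle:
        return {line.rstrip("\n") for line in handle}


def combine(operation, line_sets):
    """Fold one set operation across one or more sets of lines."""
    operations = {
        "and": set.intersection,
        "or": set.union,
        # Assumption: folding xor keeps lines found in an odd number of inputs.
        "xor": set.symmetric_difference,
        # The first set minus every later set.
        "difference": set.difference,
    }
    # The result is a single sorted column, one entry per line.
    return sorted(reduce(operations[operation], line_sets))


# In-memory stand-ins for essential.txt, wanted.txt and prevented.txt.
essential = {"a", "b", "c"}
wanted = {"b", "c", "d"}
prevented = {"c"}

# Lines in essential and wanted, but not in prevented.
in_both = set(combine("and", [essential, wanted]))
answer = combine("difference", [in_both, prevented])
```

With real files, each set would come from read_lines(path) instead of a
literal.  Holding whole files in memory is what makes this approach
simple, and also why it cannot stream the way comm does.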

3) comm produces three columns by default, even when you want only one
answer list; the others have to be suppressed with -1, -2, or -3.  comm
also uses tabs (\t) to delimit columns.  This means that if the input
files contain tabs, the output columns will be misaligned and will break
further shell processing.

blm is designed so that it always provides exactly one (simple) column
of output, regardless of how the file uses (or does not use) whitespace
such as tabs.  I consider this simpler than the behavior of comm, which
delimits columns with tabs (not to my taste) and does so without
explicit mention in the manual page.
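
The tab problem can be shown with a small Python sketch.  It only
emulates comm's default layout (no leading tab for column 1, one tab for
column 2, two tabs for column 3); the helper names comm_style and
guess_column are hypothetical, and the ordering of rows is simplified.

```python
def comm_style(only_first, only_second, both):
    """Lay out three line lists in comm's default tab-indented format."""
    rows = [(line, 0) for line in only_first]
    rows += [(line, 1) for line in only_second]
    rows += [(line, 2) for line in both]
    # One output line per input line, indented by its column depth.
    return ["\t" * depth + line for line, depth in sorted(rows)]


def guess_column(output_line):
    """Naive downstream parser: column number is leading tabs plus one."""
    leading_tabs = len(output_line) - len(output_line.lstrip("\t"))
    return leading_tabs + 1


# A line unique to the first file that itself begins with a tab.
output = comm_style(["\tindented"], ["plain"], [])
columns = [guess_column(line) for line in output]
```

Here both output lines start with exactly one tab, so the naive parser
assigns both to column 2, even though "\tindented" came from the first
file only.  A single-column output avoids that ambiguity.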

4) comm assumes its input files are sorted.  blm does not, and it sorts
its output automatically.
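
The trade-off between streaming and requiring sorted input can be
sketched as follows.  This is the general merge technique written in
Python, not comm's source code, and sorted_intersection is a
hypothetical name.  It reads one line at a time from each input, which
is why it needs little memory, but it silently gives wrong answers when
the inputs are not sorted.

```python
def sorted_intersection(first, second):
    """Yield lines common to two iterables, assuming both are sorted ascending."""
    first, second = iter(first), iter(second)
    a, b = next(first, None), next(second, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(first, None), next(second, None)
        elif a < b:
            # a cannot appear later in second, so skip it.
            a = next(first, None)
        else:
            # b cannot appear later in first, so skip it.
            b = next(second, None)


# Sorted inputs: the streaming merge finds every common line.
from_sorted = list(sorted_intersection(["a", "b", "c"], ["b", "c", "d"]))

# Unsorted first input: the merge misses "a".
from_unsorted = list(sorted_intersection(["c", "a"], ["a", "c"]))

# A set-based intersection does not depend on input order.
from_sets = sorted(set(["c", "a"]) & set(["a", "c"]))
```

The set-based version has to hold both inputs in memory, which is the
cost of not requiring sorted files.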

Merry Christmas,

Rudi

On Dec 25, 2007 2:29 PM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Rudi Cilibrasi:
>
> > * Package name    : blm
> >   Version         : 0.9.0
> >   Upstream Author : Rudi Cilibrasi <cilibrar@debian.org>
> > * URL             : http://cilibrar.com/~cilibrar/projsup/blm-0.9.0.tar.gz
> > * License         : BSD
> >   Programming Lang: C++
> >   Description     : compute set operations on line-oriented files: and, or, xor, and more.
> >
> > Line oriented files may be considered to represent sets of strings.
> > blm computes set intersection, set union, set difference, and set
> > "exclusive or" via a convenient command-line interface.  blm stands
> > for Boolean line manipulator.  blm lets you easily and quickly answer
> > questions like which lines appear in essential.txt and wanted.txt but
> > not prevented.txt.
>
> Uhm, sort & comm is probably faster and also works when the input files
> are larger than the available address space.
>



-- 
"Our lives are determined by what we pay attention to; the quality of
our lives is determined by the quality of our attention." -- Michael
Wells


