
Re: request for review: lbzip2

László Érsek wrote:
>> Meanwhile the archives already have pbzip2, "parallel bzip2
>> implementation".  As far as I can see lbzip2 is just another
>> pthreads-using bzip2 indistinguishable from pbzip2.
> This is factually wrong. If this was the case, I would have never asked  
> for a sponsor, and lbzip2 wouldn't have gotten past my sponsor. See [0].

I actively try to forget detailed background knowledge I acquired
elsewhere when I'm evaluating a package description; it needs to be
intelligible on its own. 

>> There's nothing in either package description that would help me decide 
>> which to install, except that pbzip2 gives me the hint that there's no  
>> point having either on my old uniprocessor desktop.
> The long description of lbzip2 details the exact performance gap left by  
> pbzip2 that lbzip2 covers.

It compares lbzip2 to "standard bzip2"; pbzip2 isn't mentioned.

> There's a use for lbzip2 on single-core machines too, because it contains
> internal buffering for single-worker modes too. Try to tar+bz2 a large  
> source tree, like the kernel tree, and watch the processor load on some  
> desktop applet closely. bzip2 reads (per default) in blocks of around  
> 900K, then goes to work for a long time and doesn't read, doesn't write  
> during that period. Then it emits the compressed data.
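
For readers who want to see the read/compress/write cycle described in the quoted paragraph, here is a rough sketch. It is not bzip2's or lbzip2's actual code. It uses Python's standard bz2 module, and the names compress_serially and BLOCK_SIZE are made up for illustration. Each block is compressed as its own bzip2 stream, which is an assumption of this sketch and not how bzip2 itself lays out its output.

```python
import bz2
import io

# Roughly the default block size mentioned above (an approximation).
BLOCK_SIZE = 900 * 1000


def compress_serially(src, dst, block_size=BLOCK_SIZE):
    """Read a block, compress it, write it, and repeat.

    The three phases never overlap: while the block is being
    compressed, nothing is read and nothing is written.
    Returns the list of phases in the order they ran.
    """
    phases = []
    while True:
        chunk = src.read(block_size)   # I/O phase: the CPU is mostly idle
        if not chunk:
            break
        phases.append("read")
        packed = bz2.compress(chunk)   # CPU phase: no I/O happens here
        phases.append("compress")
        dst.write(packed)              # I/O phase again
        phases.append("write")
    return phases


if __name__ == "__main__":
    data = b"example data " * 200000
    out = io.BytesIO()
    print(compress_serially(io.BytesIO(data), out))
```

Buffering input and output in separate threads, as described for lbzip2's single-worker mode, would let the read and write phases run while a block is being compressed; the sketch above deliberately does not do that.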

Does this mean it's also taking advantage of my PC's 21st-century

> [...]
> Notice how your (perfectly valid) remark:
>> There's nothing in either package description that would help me decide
>> which to install
> could be fixed by copying *more* technical information from the  
> documentation into the long package description, while Ben advises 
> exactly against that.

More technical information about the program's internal workings
wouldn't fix it, because it's answering the wrong question.  The
question the package description should be answering isn't "how
cunningly is it implemented?", it's "what good will it do me?"

To you it's obvious whether "an approximate 900k block size" is a
good or bad thing.  But the program can also be useful for people
who have no idea what a block size is!

>> I see it's described there as "a multi-threaded
>> bzip2/bunzip2 filter that doesn't depend on the lseek() system call
>> and so isn't restricted to regular files."  It had never occurred to
>> me that ordinary bzip2 couldn't compress block devices (etc.); if
>> that's an important difference between lbzip2 and other
>> implementations it should probably be emphasised.
> It is, quoting from the long desc:
>   "It isn't restricted to regular files on input, nor output."

That's only helpful if I know some of its competitors do have this

> And this is not a distinguishing feature in contrast to standard bzip2, 
> it is one in contrast to the other parallel bzip2 implementations. They  
> cannot decompress with multiple threads from a pipe. See [0].

So in fact standard bzip2 has the same feature, and this is only an
advantage when you compare lbzip2 to something that isn't mentioned?

The end-user benefits of lbzip2 don't need to be described at great
length; they just need to stand out.  One obvious approach would be
to make each advantage of lbzip2 a short "headline" and back up each
one with a few sentences of technical details.
JBR	with qualifications in linguistics, experience as a Debian
	sysadmin, and probably no clue about this particular package
