
Re: MP3 decoder packaged with XMMS

On 7/19/05, Monty <monty@xiph.org> wrote:
> On Tue, Jul 19, 2005 at 04:05:59PM -0700, Michael K. Edwards wrote:
> > That's mighty cool.  Can you say anything about the Mercora encoder's
> > psycho-acoustic bits
> In fact, I can't say much about it (I know all about it but am under
> NDA).

That's what I expected.  Such is life.

> > or about how you approach the risk that loading
> > a particular codebook into the Vorbis decoder would result in
> > something patent-infringing?
> The codebooks are huffman trees + a value per leaf: just data.  The
> code that applies them may infringe, but I doubt very much that raw
> data itself can, genomics stupidity notwithstanding.

That's a little like saying that no software can possibly infringe a
patent because the object code is just data consumed by a Von Neumann
machine.  Only a little, of course; the codebook abstraction is hardly
Turing complete.  But suppose that the Vorbis decoder fit most of the
claims of a patent, and that a certain pattern of codebook usage
completed the fit.  Then combining the two would be a
patent-infringing use, and the suppliers of one or both (not to
mention of the encoder) could be held liable depending on criteria
such as whether there are substantial non-infringing uses.
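To illustrate the "data versus code" split, here is a toy sketch of a
decoder whose behaviour depends entirely on the codebook it is handed.
It is not the Vorbis codebook format (real Vorbis codebooks are packed
differently and feed vector lookups); the names decode, CODEBOOK_A and
CODEBOOK_B are made up for this example.

```python
# Toy prefix-code decoder: the codebook is plain data (bit string -> value),
# and the same decoding code gives different output for different codebooks.
# Hypothetical illustration only; this is not the Vorbis I codebook layout.

CODEBOOK_A = {"0": 0.0, "10": 1.0, "11": -1.0}
CODEBOOK_B = {"0": 0.0, "10": 0.5, "11": -0.5}


def decode(bits, codebook):
    """Walk the bit string, emitting a value each time a full codeword is seen."""
    values = []
    code = ""
    for bit in bits:
        code += bit
        if code in codebook:
            values.append(codebook[code])
            code = ""
    if code:
        # Leftover bits that never completed a codeword.
        raise ValueError("bitstream ends in the middle of a codeword")
    return values


if __name__ == "__main__":
    stream = "0100110"
    print(decode(stream, CODEBOOK_A))  # [0.0, 1.0, 0.0, -1.0, 0.0]
    print(decode(stream, CODEBOOK_B))  # [0.0, 0.5, 0.0, -0.5, 0.0]
```

The decoder is identical in both calls; only the supplied data changes
what comes out, which is the sense in which a codebook could "complete
the fit" in the hypothetical above.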

Let me make that a little more concrete.  Lucent's patent #5,341,457
(at issue in the Dolby suit) has four independent claims:
   #1 ("method of processing ... audio signals", i. e., encoding)
   #10 ("storage medium" to which is applied a "recording signal", i.
e., the data format put in a "physical" form according to the
patent-agent shibboleth of the day)
   #13 ("method of transmitting audio signals", i. e., streaming encoder)
   #17 ("method for generating signals", i. e., the encoding process
again, but this time stated all in one claim and hewing a little more
closely to the preferred embodiment than #1 does)

The disclosure also describes the decoder for these "signals".  It is
wholly plausible to me (IANAL, TINLA) that the history of the patent
application would support a claim either that the act of decoding such
a "storage medium" is an infringing use or that the examiner
erroneously insisted on the "storage medium" lingo when the proper
subject matter of the invention is the encoding and decoding

Now, my reading of this patent is that the "novel" bit of each
independent claim is the use of "at least one tonality value
reflecting the degree to which said time sequence of audio signals
comprises tone-like quality" to control the noise masking threshold
used when quantizing.  The rest is vanilla blockwise transform coding
(in the disclosure, a 2048-point FFT).  In the preferred embodiment, the
"tonality value" is a "Spectral Flatness Measure", a relatively
inexpensive-to-calculate (given a cheap floating point multiply,
anyway) proxy for a true statistical measure of tone strength.  The
disclosure is quite articulate on the scientific basis for varying the
noise threshold, and hence the quantization, based on the degree of
tonality in a given critical band.
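For readers who haven't met it, a spectral flatness measure is the
ratio of the geometric mean to the arithmetic mean of the power
spectrum: near 1 for noise-like input, near 0 for tone-like input.
The sketch below is a minimal illustration computed over a whole
frame rather than per critical band.  The tonality() mapping and its
-60 dB reference point are assumptions in the style of the published
perceptual-coding literature, not something taken from the '457
claims, and all function names are mine.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, floor=1e-12):
    """Geometric mean divided by arithmetic mean of the power values."""
    p = [v + floor for v in power]  # small floor avoids log(0)
    log_mean = sum(math.log(v) for v in p) / len(p)
    return math.exp(log_mean) / (sum(p) / len(p))


def tonality(sfm, sfm_db_max=-60.0):
    """Map flatness to [0, 1]: 0 = noise-like, 1 = tone-like.

    ASSUMPTION: the -60 dB reference is illustrative, not from the patent.
    """
    sfm_db = 10.0 * math.log10(sfm)
    return max(0.0, min(sfm_db / sfm_db_max, 1.0))


if __name__ == "__main__":
    n = 64
    impulse = [1.0] + [0.0] * (n - 1)  # flat spectrum
    tone = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]  # one bin
    for name, frame in (("impulse", impulse), ("tone", tone)):
        sfm = spectral_flatness(power_spectrum(frame))
        print(name, sfm, tonality(sfm))
```

An encoder along the lines described would then use such a tonality
value to raise or lower the masking threshold before quantizing.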

A range of noise thresholds would presumably translate, in the Vorbis
codec as it does in the "entropy-coded case" of the '457 preferred
embodiment, to a range of Huffman codebooks.  Without going into
needless detail, I submit that one could easily construct a Vorbis
encoder that selected codebooks for residue encoding using
substantially the method taught in the '457 patent.  Would its output
be meaningfully distinguishable from that of the reference Vorbis
encoder or of the Mercora encoder?  I have not studied either enough
to be able to answer that question.

Note that I turned first to the '457 patent, not least because its
claim structure is simpler, but also because its claimed invention
appears to me to be a little closer to the heart of the Vorbis system.
A quick glance at #5,579,430 (the principal MP3 patent) persuades me
that I could go through a similar exercise, not with claim 1 (since
Vorbis doesn't appear to provide an escape mechanism from codebook
into "PCM", i. e., raw data for rare entries), but with each of the
other independent claims 19 and 22.

Personally I think both of these claims are very weak on both the
originality and non-obviousness fronts.  In my unqualified opinion, if
they were ever litigated they would have to have dependent claims
containing non-trivial psycho-acoustic results or other engineering
benefits folded into them, or else they could well be invalidated
altogether.  The claims dependent on 22 make it clear that it is about
re-establishing sync in mid-stream, and hence outside the domain of
Vorbis proper.  But 19, 20, and 21 together represent a
psycho-acoustic tactic that I wouldn't immediately dismiss as unfit
for patenting, and could easily be embodied in an alternate Vorbis
encoder.

> >  Have you tried, just for kicks, mapping
> > the AC-3 and/or MP3 techniques onto the Vorbis framework?
> Vorbis isn't a framework, it's a codec.  A more flexible codec than the
> others, but still just a codec.
> The techniques used by both mp3 and AC3 are, to put it bluntly,
> ancient.  Although there was once some 'cargo cult' tendency to try
> out what the other encoders did, for the most part the external
> techniques turned out to be obsolete or inappropriate. Floor 0 is the
> most visible example of taking a cue from outside research without
> thinking it through (LSP is a *terrible* idea for wideband encodings).

AC-3 and DTS are really very different from the music-oriented codecs.
 They use an impressive amount of ad-hockery to handle the vagaries of
film sound (pop and classical music, speech, quiet ambient sounds,
Foley work, explosions, subsonics, comfort noise, and most
combinations of the above, spread across six channels or so with
different purposes and frequency responses, _plus_ markup to support
variant post-processing such as alternate voice dubs and dynamic range
compression for non-theatre listening environments).  Whatever the
respects may be in which the Vorbis design may reflect newer and
better fundamental research, it was silly for me to suggest that
typical AC-3 media could be losslessly transcoded into a Vorbis
bitstream without a considerable increase in bitrate.  (Almost as
silly, actually, as claiming that an AC-3 encoder is just a formalized
chunk of pure math.)

Although I do not by any means know all of the ins and outs of the MP3
format, I think there is more reason to believe that a lossless
transcoder from MP3 to Vorbis might be possible, at least for some
flavors of MP3.  Your codebook might bloat out because you have to
shoehorn all the values into it that the MP3 coded as PCM escapes, and
you might not be able to represent all of the joint encoding variants
used to improve the Huffman efficiency, and for various other reasons
you would doubtless get a less efficiently coded Vorbis stream than
the output of a native Vorbis encoder at the same perceptual quality,
but that's not the point.

Your decoder is more flexible than those with baked-in equivalents to
the codebooks that you prepend to the encoded data.  This means that
you might be able to embody in your encoder the moral equivalent of
the psycho-acoustic techniques used in The Other Guy's.  In your
shoes, I wouldn't want to wait until the discovery phase of a lawsuit
to find out that The Other Guy's expert witness has figured out a way
to coax your encoder into producing a stream whose codebooks will look
like a smoking gun in the eyes of judge and jury.

> In general, the 'lock you up tight' patents that the other firms go
> for are not ones that strictly affect encoding or the raw bitstream
> itself; they attempt to patent sufficient algorithms around the data
> that it's impossible to encode/decode the bitstream itself without
> infringing.  This is another reason I feel relatively secure about
> Vorbis; the bitstream looks/works nothing like the competition.
> Should, God forbid, Vorbis be accused of using some specific technique
> that is not central to handling the bitstream, we could sidestep it
> easily.  The only worrisome patents are the abusive, overly-broad
> ones.

The other firms have a variety of patent agents and attorneys at their
disposal, thinking about the problem of securing legal barriers to
competition from a variety of angles.  Some of them focus on blocking
unauthorized interoperable implementations, and others think more
about how broad a swathe of techniques they can presumptively encumber
for horse-trading purposes.

AFAICT the differences between the various music-oriented bitstream
types, including Ogg Vorbis, are more quantitative than qualitative --
except where sync strategies are concerned, which AIUI is more of an
Ogg thing.  I agree with you that some patents as granted, and
occasionally even as litigated, are overly broad, and that the
incidence of these failures is higher in the digital arena than in
some others.  But the fact that your bitstreams are not trivially
interoperable does not mean that you are automatically safe from being
found to infringe a patent of the limited scope typical of those in
other industries.

> However, the biggest reason I feel secure is that most of the world is
> currently using and shipping Vorbis daily.  Even Microsoft ships it in
> games (where it's not obvious that it's there, but it is nonetheless).

I'm glad that it's commercially successful!  But note that "most of
the [media-producing] world", by revenues, engages in some kind of
patent horse-trading.  You can't be sure that, say, Microsoft is
comfortable using your format because their lawyers judge it to be
patent-free rather than because they already have blanket licenses or
no-sue agreements with the holders of patents that would otherwise
concern them.  There's really no substitute for the opinion of your
own competent counsel (which I am not).

> > It would be kind of fun to write a lossless transcoder to Vorbis from
> > one or more patent-encumbered formats and to see if there are any
> > discernible patterns in the codebooks.
> Can't happen.  The transform domains are not compatible.

Looks to me like MPEG Audio Decoder (libmad) uses an IMDCT, just like
the decoder in the Vorbis I spec -- which is no surprise given that
you cite (unless I am gravely mistaken) Fraunhofer's Dr. Brandenburg
for its definition.  Or is there some other fundamental
incompatibility that I'm missing?

> > It might also be a prudent
> > defensive measure so that you can demonstrate what a potentially
> > infringing Vorbis stream would look like and evaluate to what extent
> > you can distinguish them from Mercora streams.
> Mercora is 100% real Vorbis. Aside from a different vendor string I
> don't believe they are distinguishable from streams produced by our
> reference encoder.

I understand that they are interoperable.  But I presume, based on
what you have written, that the Mercora encoder uses psycho-acoustic
techniques that are both more bit-efficient and substantially less
processor-intensive than the reference encoder.  This comment and
those that followed were predicated on the assumption that you and/or
Xiph.org were involved in the design and implementation of the Mercora
encoder, or at least have some interest in the question of whether it
or the bitstreams it produces are potentially patent-encumbered; my
apologies if that is not the case.

> > Could be doubly
> > prudent if there's anything about the Mercora internals that you
> > wouldn't want to have to divulge into the public record during a court
> > proceeding, since presumably in the absence of a patent you have no
> > way of retaining proprietary rights to that encoder's methods of
> > operation other than trade secret law.
> The Mercora encoder isn't ours and we have no rights to it, but I will
> say it doesn't do anything the reference encoder doesn't.  Aside from
> that, I'm not sure what your point actually is; the worry that third
> parties using Vorbis would be exposing themselves to being forced to
> violate NDA?

No, not at all.  I was conflating your interests and Mercora's here. 
My thought was that it could be difficult to defend a charge of patent
infringement, hypothetically supported by evidence of similarity
between a pattern of bit allocation in Mercora output and the
analogous pattern in the output of a patented encoder, without
divulging some details about the Mercora encoder's internals that are
currently not public.  The fact that you are under NDA about the
Mercora psycho-acoustics suggests that they are held as a trade
secret.  That's a perfectly valid strategy; but it means that Mercora
only has legal reinforcement for its efforts to retain a technological
edge over its competitors so long as the techniques remain secret.

A patent infringement proceeding is one of the easier ways for such a
secret to be forced into the open, at which point it's available for
use by all comers (apart from copyright, but that's usually no barrier
to reimplementation).  Contrariwise, in the event that Mercora is not
commercially successful, its techniques could wind up in a sort of
legal limbo in which no one who knows them is ever legally permitted
to disclose them or use them elsewhere.  I mean this in the nicest
possible way, but those are exactly the risks that the patent system
is designed to avert.

> > I'm just trying to understand
> > how deliberately eschewing patents works out in a field littered with
> > them.
> If I was going to be worried about patents to the level of paranoia
> some suggest, I'd have to give up computers and become a blacksmith or
> machinist, or something (perhaps a hooligan, that's always appealed,
> but I hate soccer and cheap booze).  You can't demonstrate
> conclusively that a single piece of software, anywhere, does not
> infringe any patent.  How many patents does GCC 'infringe'?  100?
> 1000? 10,000?  The only answer is: "The courts have not awarded any
> infringement claim against the FSF regarding GCC" and that is the
> closest practical definition we have of "does not infringe".  Vorbis
> meets the same definition and, honestly, is really not any more likely
> than GCC to see an infringement claim (eg, Microsoft is not 'at war'
> with us the way they are with the FSF.  Microsoft is about as
> aggressive as software companies get, yet for some reason they're not
> using the patent card).

Has there ever been a cease-and-desist letter, let alone an
infringement proceeding, claiming patent rights against a compiler for
a language that GCC supports?  I'm not saying there hasn't, nor have I
even researched the question.  But the solutions to some problems
require little dollops of ingenuity and large amounts of grunt work
rather than the sort of quantum of novelty that patents are designed
to encourage.  Such problems are no less worthy of skill and design
rigor, but they're closer to architecture than to applied science. 
Compilers may or may not be in this category, but there's no question
that a lot of other software is.

I've done other work of which I'm much prouder than the one patent I
(successfully) applied for, but I would have to say that I haven't
reduced any other invention to practice within the statutory
definition as I understand it.  That one patent, seen through the lens
of time, is right smack in this signal-perception nexus (video motion
estimation rather than psycho-acoustics); the closest I ever came to
reducing a second invention to practice was encryption-related.  If
there were a third candidate to date, it would be outside software
altogether -- and I've spent practically my whole working life so far
doing software, relatively little of it in the above areas.  That
doesn't feel like a coincidence to me.

> The only suggestion, at any time, that there may be an infringement
> claim against Vorbis was an off-the-cuff remark from Henri Linde of
> Thomson years ago when he was under the impression that 'Vorbis' was
> just a tweaked mp3 encoder.  He was corrected and retracted his
> remarks (but that followup was not widely reported).

I'm glad to hear that at least one of the sleeping dogs has considered
attempting to bite but decided against it.  I very much doubt that
anything I write here adds to your risks, since I have no special
knowledge or skills in this area and your competitors have more
qualified analysts and attorneys than either of us can shake a stick
at.

> > I am going on the press release at
> > http://investor.dolby.com/ReleaseDetail.cfm?ReleaseID=161066 ; I
> [...]
> At this point a lawyer who knows what actually happened has to weigh
> in and let us know; anything else is guessing, hearsay and uninformed
> speculation I fear :-( Not that it's ever stopped Debian legal before,
> but I'm not personally going to get involved in such a discussion
> myself.

Nor I, unless I get around to dropping by the law library and shelling
out for the whole PACER docket.  Last time I did that I was a bit
disappointed, but the Dolby case is a lot fresher.

- Michael
