[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#682010: [mumble] Communication failures due to CELT codec library removal

On Tue, Jul 24, 2012 at 01:17:10PM +0100, Ian Jackson wrote:
> Ron writes ("Re: Bug#682010: [mumble] Communication failures due to CELT codec library removal"):
> > My primary concern is with the fact we would be shipping very complicated
> > code, that only about 3 people in the world really understand, and which
> > has no committed ongoing maintainer from among them or anyone else.
> I don't think this is really a huge issue.  As I understand it the
> code in the celt codec has been reused in the implementation of opus -
> obviously not quite identically, but that means that it's not really
> right to say that no-one understands this code and that it's dead
> upstream.

I understand your line of thinking there, and for 99% of the code in the
world, I'd be in complete agreement.  I'm not someone who is afraid of
code, or of getting my hands dirty in it, but we're not talking about
simple programming errors here - if and where there are problems, they
are in the boundary conditions of some very deep math that often still
confuses the people who created it until they stop and think very hard.

There's a simplified description of some of that here:

(be sure to mouse-over the block diagram too)

If there is anybody reading this who thinks they understand all the
concepts there enough to analyse code implementing them, then please
do put your hand up, we may need your expert attention at some point
in the future. (and we have other codecs we'd love you to help out
with too :)

For those who don't, I probably should note that was described as the
"lies for children" simplification by the people who wrote it.  Which
wasn't meant to be rude to anyone else, it just really does skip over
an enormous amount of the actual complexity that is really there, to
try and give ordinary people some broad idea of how it works, and
things to go research if they want to learn more.

I agree that saying "nobody" understands it is also an oversimplification
on my part, but I'm not aware of there being anybody at present in the
intersection set of "people who care about mumble using old celt" and
"people who do understand this".  If that wasn't an empty set, I'd also
be less concerned.  (and I did spend quite some time asking around to
try to find someone who might fill that void)

I have no reason to doubt that the upstream maintainers who said they
have no further interest in this codebase really do mean that.  I've
been involved with them long enough to see them move from one project
to the next before and it really will be "our problem, and our problem
alone" if we continue using this.

> It's been incorporated as a key part of opus, renamed and
> developed.  So celt 0.7.11 is really best seen as an old, pre-release,
> version of opus.

There is a sense in which you are 100% correct there.  But it is also
the same sense which gave us the aphorism:

 "This is Ned Kelly's original axe."
 (only the handle has been replaced 5 times and the head twice)

There's a 'spiritual' connection between these codebases, but so much
has been rewritten and reinvented so many times now, that backporting
anything from the new code to the old won't be a case of understanding
the code.  It will likely require reverse engineering the math and
then completely re-analysing the problem in a very different domain,
just to see if it still applies, let alone to figure out a fix.

If that wasn't the case, the problems identified in later code would
have already had fixes backported to the code we have.  As it is, nobody
has yet figured out how those things really map together, to confirm
or deny the old code is affected -- all the people who cared enough to
try got lost at the very first step.

For a more empirical clarification of how much has changed, see the
diffstat below.

> > If there is a consensus among the members of the TC and the security
> > team that the risk of doing that is justified by other factors, then
> > I'll consider the peer decision making process to have worked as it
> > should, and quite the opposite of being 'irritated', I'll be quite
> > relieved that this decision and its possible consequences do not fall
> > on my head alone.
> Well I asked this question of the security team, and while they
> weren't particularly positive about it they did not object to the
> inclusion in wheezy.

Yes, I did see nion's earlier response to that:

> > So ...  really the only decision I see to be made here, is will we
> > ship with celt 0.7.1 enabled or not.  If -ctte and -security weighs
> > up the risks and tells me they are happy doing that, then I'm happy
> > to make that happen with no further delay.
> I don't speak for the whole TC, but my personal view at the moment is
> that this tradeoff is worthwhile.

I think I've made my concerns are clear as I can, so in my mind then,
if Don and Steve agree with your assessment, then I'm happy to consider
that a sufficient 'consensus' of the TC (since they have already been
following this, and it seems cruel and unnecessary to inflict reading
all of it on the rest of the -ctte members if they don't of their own
free will wish to do so and chime in with an opinion).

If Phil is ok with this from his perspective as SRM, then let's all
get beer and move on to the wrap party :)


Diffstat of celt 0.7.1 to opus 0.9.14 currently in Wheezy:

 _kiss_fft_guts.h     |  153 --
 arch.h               |  138 --
 bands.c              | 1790 +++++++++++++++----------
 bands.h              |   85 -
 c64_fft.c            |  344 ----
 c64_fft.h            |   58 
 celt.c               | 3513 ++++++++++++++++++++++++++++++++++-----------------
 celt.h               |  265 ---
 celt_header.h        |   70 -
 celt_lpc.c           |  188 ++
 celt_lpc.h           |   53 
 celt_types.h         |  140 --
 cwrs.c               |  515 +------
 cwrs.h               |   25 
 dump_modes.c         |  223 ---
 ecintrin.h           |   99 -
 entcode.c            |   54 
 entcode.h            |  132 +
 entdec.c             |  258 ++-
 entdec.h             |   87 -
 entenc.c             |  283 +++-
 entenc.h             |  108 -
 fixed_c5x.h          |   26 
 fixed_c6x.h          |   26 
 fixed_debug.h        |  383 ++++-
 fixed_generic.h      |   85 -
 float_cast.h         |  169 +-
 header.c             |  129 -
 kfft_double.h        |   79 -
 kfft_single.c        |   44 
 kfft_single.h        |   84 -
 kiss_fft.c           |  777 +++++------
 kiss_fft.h           |  158 +-
 kiss_fftr.c          |  165 --
 kiss_fftr.h          |   82 -
 laplace.c            |  168 +-
 laplace.h            |   26 
 match-test.sh        |   30 
 mathops.c            |  206 ++
 mathops.h            |  285 ----
 mdct.c               |  200 +-
 mdct.h               |   41 
 mfrngcod.h           |   18 
 mfrngdec.c           |  248 ---
 mfrngenc.c           |  236 ---
 modes.c              |  551 +++----
 modes.h              |   91 -
 opus_custom_demo.c   |  210 +++
 os_support.h         |   99 -
 pitch.c              |  326 +++-
 pitch.h              |   28 
 plc.c                |  136 -
 psy.c                |  211 ---
 psy.h                |   57 
 quant_bands.c        |  523 +++++--
 quant_bands.h        |   49 
 rangedec.c           |  210 ---
 rangeenc.c           |  209 ---
 rate.c               |  654 +++++++--
 rate.h               |  135 -
 stack_alloc.h        |   44 
 static_modes_fixed.h |  595 ++++++++
 static_modes_float.h |  599 ++++++++
 testcelt.c           |  209 ---
 vq.c                 |  396 +++--
 vq.h                 |   47 
 70 files changed, 9563 insertions(+), 9178 deletions(-)

And while that may not look like the most horrifyingly large diff that
has ever been sent to -release as a minimal 'harmless' proposed fix,
let's put it into perspective as a proportion of the total codebase:

 $ cat libcelt/*.[ch] | wc -l

I'll leave trying to understand and review the diff itself as an
exercise for the truly dedicated reader :)

Reply to: