
Bug#682010: Interoperability with speex patch



Ok, where to begin ...  I guess first off, the IRC meeting was a big help
for me in getting a sense of what was already clear and what was not.
I fairly deliberately didn't lurk that because I was more interested in
listening to what other people were "thinking out loud" at this stage,
and it was a great window into that.  Keep doing those :)

There are some open questions from that which do have straightforward
answers, so I'll try to sort this from easy to Hard ...


 <dondelelcaro> and the server aparently isn't capable of transcoding or whatever
 <Diziet> dondelelcaro: AFAICT the server doesn't ever transcode so every client
          needs to send a codec everyone understands.

The server is basically just a packet amplifier, it takes whatever each
client sends, and forwards a copy to each other participant.  It doesn't
do much more than that.
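
To make the "packet amplifier" idea concrete, here is a minimal sketch of
that forwarding behaviour.  It is not the real server code: the class and
method names (RelayServer, join, relay) are made up for illustration, and
the packet is treated as opaque bytes precisely because nothing on the
server side decodes it.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical stand-in for a connected client."""
    name: str
    inbox: list = field(default_factory=list)


class RelayServer:
    """Forwards opaque audio packets; never decodes or transcodes them."""

    def __init__(self):
        self.clients = []

    def join(self, client):
        self.clients.append(client)

    def relay(self, sender, packet: bytes):
        # Copy the packet, untouched, to every participant except the sender.
        for client in self.clients:
            if client is not sender:
                client.inbox.append((sender.name, packet))


server = RelayServer()
alice, bob, carol = Client("alice"), Client("bob"), Client("carol")
for c in (alice, bob, carol):
    server.join(c)

server.relay(alice, b"\x01speex-frame")
```

Since the server never looks inside the payload, whatever codec the sender
picked arrives unchanged at every other client.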

Clients don't even need to be sending with the same codec.  If you
configure your client bandwidth for <= 32kbs it will send Speex, and all
other clients will decode that fine.  But it's not symmetrical, so they
may still be sending you Celt at 96kbs, or Opus at any rate, regardless
of your own client configuration.
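
As a rough sketch of that asymmetry: each client picks its outgoing codec
from its own configuration only.  The function name and the "preferred"
parameter are hypothetical, the 32000 figure is just the 32kbs mentioned
above, and this models an older client that still has a Speex encoder.

```python
SPEEX_MAX_BPS = 32_000  # the "<= 32kbs" threshold from the text above


def outgoing_codec(configured_bps: int, preferred: str = "celt") -> str:
    """Pick what this client sends, based only on its own settings.

    'preferred' is an illustrative stand-in for whatever the client would
    otherwise use above the Speex threshold.
    """
    return "speex" if configured_bps <= SPEEX_MAX_BPS else preferred


# Two clients in the same channel, each deciding independently.
mine = outgoing_codec(32_000)     # low bandwidth setting -> Speex
theirs = outgoing_codec(96_000)   # they may still send Celt at 96kbs
```

Nothing in my own setting changes what the other side chooses to send.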

There isn't really any "negotiation" in the normal protocol sense of that
word, except when the client supports multiple versions of Celt, then the
server will indicate which version the connected clients have in common.
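
The one bit of "negotiation" that does exist amounts to a set
intersection.  This is only an illustration of the idea, not the actual
protocol handling; the function name and the "newer" version label are
placeholders I made up.

```python
def common_celt_versions(clients: dict[str, set[str]]) -> set[str]:
    """Return the Celt versions that every connected client supports."""
    supported = list(clients.values())
    if not supported:
        return set()
    return set.intersection(*supported)


connected = {
    "old-client": {"0.7.1"},
    "mixed-client": {"0.7.1", "newer"},   # "newer" is a placeholder label
}
shared = common_celt_versions(connected)
```

If the intersection comes back empty, there is no Celt version that
everyone in the room can decode.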


 <vorlon> Debian runs a public mumble server, easy enough to attack

Because of the above, the server itself isn't vulnerable to problems in
the codec code.  It may have its own problems, but I'm not aware of any
of those if they exist.  It's only the client code that's exposed here,
but anyone connecting to a server has a 'direct' connection to all of
the other clients there.


 <rra> In other words, we don't *know* of any security problems;
       we (for some value of we) just think the code is horrible.
 <Diziet> No.  "We" think the code is dead upstream, is the main point.
 <vorlon> my understanding was that someone concocted a proof of concept
          against later versions of the code, and that the opus code has
          been proofed against it

It's not that the code is even particularly horrible (as code written by
Heavy Math people goes :)  Mostly it's that there is nobody analysing the
problems found and corrected in later code, and/or backporting any needed
fixes to the version that mumble wants to continue using.

People are saying they want to keep using it, but nobody is taking on the
role of actually being responsible for it - or indicating anything other
than that they are explicitly _not_ prepared to take on that task so far.


 <Diziet> Not that it's horrible.  I haven't seen anyone claim the opus
          code is much better than the celt code from a security pov.

I'll make that claim unambiguously now (if I hadn't done so already :).

Celt was an entirely experimental codebase, each 'version' of it that
was tagged existed *only* for testing the quality of the audio that it
produced.  Next to no attention was paid to the normal "release issues"
of a piece of "production" software beyond what other people submitted
as patches.  Once the listening test results were in, code was changed,
the bitstream broken as/if needed, and audio quality testing continued.

Being free software, and an experiment conducted fully in the open,
people were free to do with it what they wished -- but if they did,
the onus was *entirely* on them to worry about release quality and
maintenance issues.  That's not something the upstream developers
devoted any real attention to at all before the bitstream freeze.

Opus by comparison has its C code as the normative part of an IETF
proposed standard.  An utterly insane number of hours went into
QA testing it for "release issues" after the final bitstream freeze,
and vetting it for precisely these sorts of problems (and that work
is still ongoing).  There are slides from one of the IETF meetings
documenting some of that process -- and there are things that should
be obvious from even a cursory look at the code - like Opus actually
has a test suite, with near complete code coverage, that fuzzes the
code intelligently on every run etc. etc.

I won't go so far as to claim it's completely bug free.  But people
actually care if there is even a hint that it isn't.  We're in an
entirely different phase of the development now, where release
polishing is at the forefront of What Matters to the maintainers.
No version of celt had that sort of attention, especially not one
as old as 0.7.1.


 <Diziet> What's weird is why don't we have references to this vuln ?

It's not really that weird.  As per the above, Thorvald and I became
solely responsible for celt 0.7.1 when we decided to include it in
squeeze - so there is nobody else spending any time on this - and
you never before asked me, or apparently the mumble upstream people
you said you spoke to, for any such further detail :)

 <Diziet> If so we could see "can we apply the patch to celt" which
          might be interesting info.

The mumble upstream folk were given a (not exhaustive) list of commits
to look at when this first came to our attention - and I asked them
about their progress with those again last week.  I got the same reply
as I did initially though:

The patches don't directly apply to the older code, and far too much
had changed for there to be any trivial mapping back to it that they
were able to follow.  Which doesn't mean the problems don't exist in
the old code, just that the places where a fix was later applied did
not make this easy to answer with any confidence.

If somebody has time to volunteer to analyse this in more detail, then
I'm sure we can get them more information.  I'd be delighted if that
resulted in a plausible belief that the old code really is safe still,
or patches to make it so.  I just don't buy people telling me "pfft,
it's fine" when they haven't looked at all - after a person who had
done much of the insanely thorough testing of this code told me that
they thought it wasn't ...


 <rra> Backing up a little bit: Assume that we all decide that it's
       okay to reintroduce celt.  Do we actually have someone who is
       willing to do the work of reintroducing celt into the archive?
       I mean, is Ron willing to do that, or is someone else willing
       and capable to do it if Ron doesn't want to be stuck supporting
       it because he doesn't agree with it?

We don't really want to reintroduce celt as a public package whatever
is decided here.  There really is nothing except mumble with any excuse
to still be using this now.  So the main question is, are we comfortable
shipping mumble with it enabled as a private lib?

Simply uploading that is a no-brainer, anyone can do it, and I won't
refuse to do that if that's where consensus lands and Thorvald doesn't
have time to do so.

But I only committed to being responsible for celt 0.7.1 until we had a
bitstream frozen version to ship, which we now do, and Thorvald doesn't
appear to have the time to commit to that for another release cycle
either.  So my big concern is that we have nobody stepping in to fill
the gap of an "upstream" maintainer, who will diligently investigate
issues like this rather than just say "I'll worry about it when someone
else sends me a patch" ...


 <rra> vorlon: My understanding was that we were unsure whether the
       existing clients out there in the world that speak celt would
       actually negotiate speex.

As I mentioned above, there is no negotiation for this.  If you have a
client that can encode speex, it can just send it and any other client
will be able to decode it.  But that's kind of orthogonal to what they
will send you.  They could send you Opus in return, and if you don't
have a client that can decode Opus, then you won't be able to hear them.
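
Put another way, whether you hear someone depends only on whether you can
decode what they chose to send.  A tiny sketch, with made-up client
capability sets for illustration:

```python
def audible(sender_codec: str, receiver_decoders: set[str]) -> bool:
    """True if the receiver can decode what the sender chose to send."""
    return sender_codec in receiver_decoders


# Hypothetical capability sets, for illustration only.
old_decoders = {"speex", "celt"}
new_decoders = {"speex", "celt", "opus"}

new_hears_old = audible("speex", new_decoders)  # old client sends Speex
old_hears_new = audible("opus", old_decoders)   # new client replies in Opus
```

The two directions are evaluated independently, which is why one side can
be heard fine while the other is silent.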

The lack of real bi-directional negotiation is part of why this is such
a mess in the transition period, but that's sort of fundamental to the
way the server operates and can't trivially be fixed.


 <Diziet> But AIUI that would involve downgrading all the clients in
          a channel to speex which might well be unacceptable to the
          userbase effectively making our version of the mumble client
          unuseable in those contexts.

Talking about "downgrading" to speex is only meaningful when comparing
it to opus.  Celt isn't a speech coder, so it doesn't perform well under
conditions where speex does, and vice versa.  Neither is clearly "down"
from the other, at least not when comparing with celt 0.7.1.  They are
different tools, specialised for different jobs.

It would be just as valid to say that "downgrading to celt 0.7.1" would
have the effect you mention.  And that's empirically true because there
are already people blocking its use on their servers, and the number of
people doing that will only grow over the lifetime of Wheezy now that
they can do it just by setting opusthreshold instead of hacking at the
code to change the permitted celt versions.

Celt 0.7.1 gives poor results at bitrates where both opus and speex shine.


 <Diziet> dondelelcaro: I think we know that if we reenable the embedded
          celt it will work as intended with existing clients.

We do gain an extra possible dimension of interoperability.  Unfortunately
that's not enough on its own to ensure it will work with other existing
clients and servers - either at all, or with acceptable results.  And even
mumble upstream is hoping to be able to phase out all codecs other than
opus in a shorter time than the lifecycle of Wheezy.

I agree the backward compatibility issue is important.  But "existing
clients" is currently not a stationary target either.  If we lock this
into another stable release, we are likely to be the last ones left
carrying the hot potato alone, long after it stops actually being useful
to anyone.


 <vorlon> I want to see what actually happens when two clients connect
          to a server having only speex in common
 <vorlon> right now we don't have such a test

If you set the requested bandwidth in both clients to 32kbs or less,
then that's exactly what will happen (if you have old enough clients
to still have speex encoding support :)

Lying about the celt version, so that clients think they don't have
anything else in common, was the crux of the trick Thorvald thought we
could pull, though, as I understood it, yeah.


Of which I sadly have no new news at this stage ... :(

Thorvald is back, but the other mumble upstream folk and I only got to
talk to him for a couple of minutes before "Argh. Work. gotta go. bbl."
And he hasn't yet been back again ...


On the brighter side, he has got upstream snapshots rolling again, so
there are opus enabled clients getting around and more testing has been
happening on those.  And the other upstream folk and I have had some
reasonably productive discussions on mitigation and interop issues.

We're still talking about getting speex put back into the client, which
seems to be getting some agreement, but we're not quite all said and done
on that just yet.  There are a couple of things that still need looking at.

They've added a config option for the client now, which permits users to
disable celt - which at least gives people an option to turn it off fast
should that be needed faster than we can push code updates.

And the SECCOMP sandboxed version of celt has been pushed upstream now.
The guy who worked on that sounds like he's pretty happy with it, but I
looked at the code and do worry that it's probably too big a change to
push into a stable release without more live testing, and probably a
few other pairs of eyes auditing it.  It seems to be a good answer that
I don't totally want to take off the table, but probably not something
that could seriously be considered for wheezy proper without more people
taking an intense interest in it very soon.


So whichever way you slice it - we still don't have a one-true killer
solution here yet that's clearly without fault.  My gut feeling is
still kind of saying we should target bpo, where we can push upstream
fixes as fast as they come, since it looks like that is going to be
needed for a while - but that answer sucks for another group of people
too ...


Anyhow, way too many words, sorry about that - and there's still lots
I haven't covered - but I'm playing this with an open hand, so if
there are questions, do ask them, please.  And I'll try to give shorter
answers ...


 <rra> Damn, can't get someone else to do the work for us.  :)

That's kind of the perfect summary of the mess we're immersed in, yeah :)


 Ron

