Re: What to do when DD considers policy to be optional? [kubernetes]



On Wednesday, 25 March 2020 6:08:23 AM AEDT Janos LENART wrote:
> I know Dimitry was fighting an uphill battle with kubernetes between 2016
> and 2018 and he experienced first hand the problems posed by vendored code.

No. This is incorrect. The largest chunk of work that I did on Kubernetes was
in 2015-2016, when I was packaging and introducing many new libraries and
stabilizing their relationships. A lot of this work was done with
various upstreams, and a lot of it was in Kubernetes' dependency packages such
as docker.io. Licensing/copyright documentation and addressing numerous DFSG
problems took a lot of time and effort to meet requirements for new packages.
This effort took more than a year of work. Between 2016 and 2018 I barely
touched Kubernetes as it was already orphaned.

The challenge is not with "vendored code" as such, or at least not in the way
you think.


> We see more and more software making excessive use of vendored code. Pretty
> much everything that is written in Go. Some of these are crucially
> important, like Docker or Kubernetes. So I understand the concern everyone
> has about how this fits with the Debian Policy.

No, you don't understand.

Vendored code is a nuisance but not a problem if/when you can throw it away in
favor of packaged, tested, reusable libraries.

You've made no effort to reuse any existing packaged libraries. Most of them 
could be used without much effort. And there are many advantages of doing so 
beyond obvious benefits for security and de-duplication.

There is integrity of the build to care about, and automated QA/CI checks for
libraries to appreciate. Most of the libraries are tested on more
architectures than upstreams ever test. We know when there is a problem and
can do something about it. The concept of good quality reusable components is
too powerful to throw away just because you think it is inconvenient.

Vendored Golang libraries run no tests on build. IMHO the Kubernetes bundle
(with its vendor directory) is unmaintainable, and you can see how much
upstream struggles with it, to the extent that they would not upgrade a
component to fix a problem for fear that everything might fall apart.

Using properly packaged libraries is an advantage to the maintainer and a
benefit to the whole ecosystem.


> Debian Policy, paragraph 4.13 states:

Before we discuss the policy let me focus on your attitude first.

Kubernetes -- one of the most sophisticated packages -- was your _second_ ever
upload to Debian. You are a woefully inexperienced maintainer, knowing little
if anything about teamwork, packaging practices, their meanings and
implications. IMHO your interpretation of policy is mostly irrelevant as you
are merely trying to use the policy to justify what you did.

There are several problems with how you did it too. You did not take anyone's
advice, ignored the Salsa repository, threw away _everything_ and made no
effort to understand how and why things were implemented, let alone to
appreciate prior work or try to improve it. What you did is a technological
hijack of the package, a gross violation of practices.

Imagine I upload a package to NEW, get it reviewed and accepted for what
it is, then re-upload it as something entirely different, bundled with 500+
dependencies that were not reviewed, and then claim that policy allows it?


> =================
> I think this is the part that has the most bearing on the vendored code
> problem, especially the footnote. I agree with this principle. But we
> should apply it to the state of affairs in 2020, and to this specific
> situation.

Nonsense. In 2020 using packaged libraries is much easier than before.
You simply have no excuse not to do so. Again, you do not understand what
the problem is.


> Keeping all that in mind, here are the reasons why I think it is acceptable
> for now to package Kubernetes with the vendored code, and even the best
> solution that is available currently:

I keep telling you that it is not the best solution but the sloppiest and the
most inferior one.


> 1. OTHER EXAMPLES.
> [...]
> - docker.io (58, including some that are vendored more than once within the
> same source package, but not including the fact that docker.io itself is
> made up of 7 tarballs)

Docker.io is a special, exceptionally difficult case, which should be an
example to you that even with Docker it is possible to leverage packaged
libraries. Docker upstream is one of the worst with their abuse of versioning
practices. On top of that, the Docker code base is shipped in several
namespaces that make it impossible to package some components separately due
to mutual circular dependencies. In Docker we use strategically bundled
components only when necessary.


> - kubernetes (20 for the previous version, 200 now)
> - prometheus (4)
> - golang (4)
> None of these were REJECTed, and please don't sabotage these packages now

Perhaps Nomad or the recently accepted Vault would serve as a better example.
Vault is one of the packages with the most dependencies, and it maintains a
good balance between vendored sub-components and re-use of packaged libraries.
This is how Kubernetes should have been maintained, and it is not too hard to
do so if you know what you are doing and why.


> :-D The idea was only to show that, at least for now, vendoring is a fact
> in Debian. There is an effort to improve the situation but in the meantime
> we just go on. Not great, not terrible..

We are striving to eliminate vendoring whenever possible, for good reasons.
You would have made a good point back in 2015 or 2016, when the Golang
community had yet to discover software versioning practices. These days the
Golang community has embraced versioning and is learning to appreciate API
stability, so the situation is not ideal but it is getting better.


> 2. MAINTAINABILITY. Having every single vendored repo available as a
> separate package in Debian is not feasible.

I disagree. You are not even trying, and you are drawing such a strong
conclusion with virtually no experience.


> Dimitry and a few others worked
> hard on trying to pull this off but even they could not do it.

Incorrect. Not only did I do it successfully, but I have kept doing it for
various very sophisticated packages ever since, only walking away from
Kubernetes for unrelated reasons.

You don't understand that you have to own the whole dependency tree and
maintain packages separately. You don't have to care for every single
component, but you do for most of them.


> Since 2016
> a total of 3 Kubernetes releases made it into Debian/unstable, but there
> have been 17 major and countless minor upstream releases of Kubernetes.

I doubt we need every single release.
The Kubernetes situation was unfortunate due to the lack of maintainers.
After I kick-started Kubernetes packaging I expected that some maintainers
would join once I had done the hardest, most laborious chunk of work.
It is true that I could not sustain continuous contribution to Kubernetes,
but that was due to other priorities.


> Thousands of issues were fixed upstream, including serious security flaws,
> these never made it into Debian.

Some fixes were not accepted upstream, and some of those security flaws were
in 3rd party libraries, where they were fixed in separate uploads.


> Exactly because the packaging was too difficult to maintain.

It was not easy to maintain, but that's no excuse to do sloppy work, cutting
corners whenever you feel like it.

It was because there was nobody to maintain Kubernetes, but it was perfectly
possible to maintain in a sane way when at least some system libraries are
used.


> 3. NO FORKS. Debian developers hacking Kubernetes source code, so it
> compiles with a lucky enough version of a dependency that made it into
> Debian, makes the Debian version of Kubernetes different from the standard
> one that everyone expects. This is totally unwelcome by almost every user.
> No sane cluster admin would dare to use this "fork", ever. There were some
> attempts to get the Kubernetes contributors to update dependencies to a
> specific version: https://github.com/kubernetes/kubernetes/issues/27543 .
> Reading the whole thread helps to put some perspective on this. The
> Kubernetes contributors were actually quite helpful throughout but they
> have made it clear that they will not update dependencies for update's
> sake. Maybe with some projects Debian would have the upper hand, but not
> with Kubernetes.

That upstream bug was an example of upstream failure (and unwillingness) to
incorporate a simple, straightforward and tested patch. It is an example of
how inefficient and dysfunctional upstream development is. As a cluster admin
I don't have enough confidence in upstream governance to use binaries provided
by that project, because upstream does a very poor job and, to make things
worse, resists cooperation.


> 4. TESTING. The Kubernetes releases are meticulously tested, with far
> greater technical resources than Debian can collectively muster. The
> Kubernetes project runs e2e tests regularly on thousands of nodes (donated
> compute time). If we were to continue to have a fork we would be obliged to
> do the same. Even if we could run such extensive tests on our fork, and
> these e2e tests revealed a problem, who is going to interpret the results
> and fix our snowflake? The debian fork was never tested this way and it
> seems unlikely that it could ever be.

Good point. However, there is a catch. Kubernetes probably has several times
more untested 3rd party code compared to the size of its own code base.

Unlike upstream, we test individual libraries but not the compiled
combination of those (in the case of Kubernetes).


> 5. SECURITY. The strongest and most applicable point from the DP footnote
> is about security vulnerabilities in the duplicated code. This is
> completely valid. But again, with the maintainability issues (see 2) we
> won't be able to roll out security fixes in time.

Maintainability issues cannot be fixed without a maintainer. Of course no
issues were fixed in an unmaintained package. But one thing is certain: the
graveyard of obsolete libraries in "vendor" is unmaintainable, and upstream
struggles with it too, with many unfixed problems. IMHO our approach is better
despite all the difficulties.


> 7. DFSG.
> [...]
> I have checked the licences of every dependency and have compiled it in a
> container with no network access, so what's in the .orig.tar.gz is exactly
> what was compiled, nothing more.

You would have less work to do if you had removed the private vendored
libraries. A good balance can and should be achieved without vendoring
everything.


> I do think there is a good case for Kubernetes to be an exception from 4.13
> for now, just like other Go packages effectively are.

The whole thread is about how the current state of Kubernetes packaging is not
like other Golang packages.


> It is a massively
> popular project topped only by the Linux kernel. We cannot afford not to
> have up to date versions in Debian, or have forks that no one can use.
> 
> So let's find a way to make this happen!

It would be nice to have Kubernetes in Debian, but not in the sloppy way you
want to maintain it. Upstream binaries are already available -- they are
not great, and yours are not better than upstream's.

-- 
Best wishes,
 Dmitry Smirnov.

---

There are two different types of people in the world, those who want to
know, and those who want to believe.
        -- Friedrich Nietzsche
