
Re: What to do when DD considers policy to be optional? [kubernetes]



Hi Dimitry, FTP masters and others,

I know Dimitry was fighting an uphill battle with kubernetes between 2016 and 2018, and he experienced first-hand the problems posed by vendored code.

We see more and more software making excessive use of vendored code, including pretty much everything that is written in Go. Some of these projects are crucially important, like Docker or Kubernetes. So I understand the concern everyone has about how this fits with the Debian Policy.

Debian Policy, paragraph 4.13 states:
(for your convenience I include it below :) )
https://www.debian.org/doc/debian-policy/ch-source.html#convenience-copies-of-code

=================
4.13 Convenience copies of code

Some software packages include in their distribution convenience copies of code from other software packages, generally so that users compiling from source don’t have to download multiple packages. Debian packages should not make use of these convenience copies unless the included package is explicitly intended to be used in this way. [17] If the included code is already in the Debian archive in the form of a library, the Debian packaging should ensure that binary packages reference the libraries already in Debian and the convenience copy is not used. If the included code is not already in Debian, it should be packaged separately as a prerequisite if possible. [18]

[18] Having multiple copies of the same code in Debian is inefficient, often creates either static linking or shared library conflicts, and, most importantly, increases the difficulty of handling security vulnerabilities in the duplicated code.
=================

I think this is the part that has the most bearing on the vendored code problem, especially the footnote. I agree with this principle. But we should apply it to the state of affairs in 2020, and to this specific situation.

Keeping all that in mind, here are the reasons why I think it is acceptable for now to package Kubernetes with the vendored code, and even the best solution that is available currently:

1. OTHER EXAMPLES. If we take this paragraph completely literally and to the extreme, then other packages are also in violation of it. True, the current packaging of kubernetes does this to a greater extent than, for example, its predecessor, but perhaps this shows that this section was always open to interpretation. Examples of some prominent packages in Debian that bundle and use vendored code (in parentheses is the estimated number of Go packages bundled):
- docker.io (58, including some that are vendored more than once within the same source package, but not including the fact that docker.io itself is made up of 7 tarballs)
- kubernetes (20 for the previous version, 200 now)
- prometheus (4)
- golang (4)
None of these were REJECTed, and please don't sabotage these packages now :-D The idea is only to show that, at least for now, vendoring is a fact in Debian. There is an effort to improve the situation but in the meantime we just go on. Not great, not terrible.

2. MAINTAINABILITY. Having every single vendored repo available as a separate package in Debian is not feasible. It is true that some of them are already packaged. But the expectation that all of them will be (with the exact version that is needed for Kubernetes) is not realistic. Also, the golang-* packages have a number of different maintainers. Hundreds of such packages would be required to build Kubernetes. So one can rest assured that every future release in Debian will be blocked waiting for dozens of these packages to be updated. Dimitry and a few others worked hard on trying to pull this off but even they could not do it. Since 2016 a total of 3 Kubernetes releases made it into Debian/unstable, but there have been 17 major and countless minor upstream releases of Kubernetes. Thousands of issues were fixed upstream, including serious security flaws, and these fixes never made it into Debian, precisely because the packaging was too difficult to maintain. So, how maintainable was that solution then, despite the huge amount of effort put in? In my opinion this shows that the maintainability reasoning in Debian Policy does not apply here.

3. NO FORKS. When Debian developers hack the Kubernetes source code so that it compiles with whichever version of a dependency happened to make it into Debian, the Debian version of Kubernetes becomes different from the standard one that everyone expects. This is totally unwelcome to almost every user. No sane cluster admin would dare to use this "fork", ever. There were some attempts to get the Kubernetes contributors to update dependencies to a specific version: https://github.com/kubernetes/kubernetes/issues/27543 . Reading the whole thread helps to put some perspective on this. The Kubernetes contributors were actually quite helpful throughout, but they have made it clear that they will not update dependencies for update's sake. Maybe with some projects Debian would have the upper hand, but not with Kubernetes.

4. TESTING. The Kubernetes releases are meticulously tested, with far greater technical resources than Debian can collectively muster. The Kubernetes project runs e2e tests regularly on thousands of nodes (donated compute time). If we were to continue to have a fork we would be obliged to do the same. Even if we could run such extensive tests on our fork, and these e2e tests revealed a problem, who is going to interpret the results and fix our snowflake? The Debian fork was never tested this way and it seems unlikely that it ever could be.

5. SECURITY. The strongest and most applicable point from the Debian Policy footnote is about security vulnerabilities in the duplicated code. This is completely valid. But again, with the maintainability issues (see 2) we won't be able to roll out security fixes in time. How did security in the Debian forks created by DDs work in the past? https://www.explainxkcd.com/wiki/index.php/424:_Security_Holes . It is true that without listing the bundled dependencies in Built-Using, it is harder to find out if a vulnerability in one of them affects the binary. (Hint: it is hard anyway.) In the case of Kubernetes, and other Go programs in general, an automated tool could be made that extracts go.mod/go.sum to monitor the dependencies for security vulnerability reports. Doing the whole dance of packaging and maintaining hundreds of dependencies, just so that we have a machine-readable Built-Using instead of a machine-readable go.mod/go.sum, seems to do a lot more harm than good for security. Furthermore, the current situation forces users to add third-party repos to sources.list to get up-to-date Kubernetes releases and/or download who-knows-what's-in-it binaries. So this is not great, but the alternative is terrible.
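
To make the go.mod idea a bit more concrete, here is a rough sketch of what such a tool could start from. It is only an illustration and not an existing Debian tool: the function names, the sample go.mod content and the advisory table are all made up, and the matching is by exact version only. A real tool would also have to handle version ranges, "replace" directives and go.sum, and would pull advisories from a real feed.

```python
import re


def parse_go_mod_requires(text):
    """Return a sorted list of (module_path, version) from 'require' directives."""
    deps = set()
    in_block = False
    for raw in text.splitlines():
        # Drop trailing comments such as "// indirect".
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        if in_block:
            if line == ")":
                in_block = False
                continue
            parts = line.split()
            if len(parts) == 2:
                deps.add((parts[0], parts[1]))
        elif re.fullmatch(r"require\s*\(", line):
            # Start of a multi-line "require ( ... )" block.
            in_block = True
        elif re.match(r"require\s+", line):
            # Single-line form: "require <path> <version>".
            parts = line.split()
            if len(parts) == 3:
                deps.add((parts[1], parts[2]))
    return sorted(deps)


def find_affected(deps, advisories):
    """Return the dependencies whose exact version is listed in 'advisories'.

    'advisories' is a hypothetical mapping: module path -> set of bad versions.
    """
    return [(mod, ver) for mod, ver in deps if ver in advisories.get(mod, ())]


# Made-up go.mod content, for illustration only.
SAMPLE_GO_MOD = """\
module example.com/hypothetical

go 1.14

require (
    github.com/example/liba v1.2.3
    github.com/example/libb v0.4.0 // indirect
)

require github.com/example/libc v2.0.0+incompatible
"""

# Made-up advisory data, for illustration only.
SAMPLE_ADVISORIES = {"github.com/example/libb": {"v0.4.0"}}

if __name__ == "__main__":
    deps = parse_go_mod_requires(SAMPLE_GO_MOD)
    for mod, ver in find_affected(deps, SAMPLE_ADVISORIES):
        print("possibly affected:", mod, ver)
```

The point is only that the dependency list is already machine readable as shipped in the source package, so the monitoring does not depend on each dependency being a separate Debian package.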

6. EFFICIENCY. Go libraries, vendored or not, are essentially statically linked into the binary. This is still the case when the result is a "dynamic" Go binary, e.g. one linked to libc. While there are some experiments with shared libraries in Go, there is no real-world use. This means that vendoring has no effect on linking behavior, so the footnote's point about linking conflicts does not apply here.

7. DFSG. I am not aware of any DFSG issues in the vendored packages. No funny licenses, blobs, network downloaded stuff, etc. If there are any, please point them out specifically, and they will be fixed with high priority. I have checked the licences of every dependency and have compiled the package in a container with no network access, so what's in the .orig.tar.gz is exactly what was compiled, nothing more.

I do think there is a good case for Kubernetes to be an exception from 4.13 for now, just like other Go packages effectively are. It is a massively popular project topped only by the Linux kernel. We cannot afford not to have up to date versions in Debian, or have forks that no one can use.

So let's find a way to make this happen!

Regards,
-- 
LENART, János
<ocsi@debian.org>
