On Wednesday, 25 March 2020 6:08:23 AM AEDT Janos LENART wrote:
> I know Dimitry was fighting an uphill battle with kubernetes between 2016
> and 2018 and he experienced first hand the problems posed by vendored code.

No, this is incorrect. The largest chunk of work that I did on Kubernetes was in 2015..2016, when I was packaging and introducing many new libraries and stabilizing their relationships. A lot of this work was done with various upstreams, and a lot of it was in Kubernetes' dependency packages such as docker.io. Documenting licensing/copyright and addressing numerous DFSG problems took a lot of time and effort to meet the requirements for new packages. This effort took more than a year of work. Between 2016 and 2018 I barely touched Kubernetes, as it was already orphaned.

The challenge is not with "vendored code" as such, or at least not in the way you think.

> We see more and more software making excessive use of vendored code. Pretty
> much everything that is written in Go. Some of these are crucially
> important, like Docker or Kubernetes. So I understand the concern everyone
> has about how this fits with the Debian Policy.

No, you don't understand. Vendored code is a nuisance but not a problem if/when you can throw it away in favor of packaged, tested, reusable libraries. You've made no effort to reuse any existing packaged libraries. Most of them could be used without much effort. And there are many advantages to doing so beyond the obvious benefits for security and de-duplication. There is build integrity to care about, and automated QA/CI checks for libraries to appreciate. Most of the libraries are tested on more architectures than upstreams ever test. We know when there is a problem and can do something about it. The concept of good-quality reusable components is too powerful to throw away just because you think it is inconvenient. Vendored Golang libraries run no tests on build.
IMHO the Kubernetes bundle (with its vendor directory) is unmaintainable, and you can see how much upstream struggles with it, to the extent that they would not upgrade a component to fix a problem for fear that everything might fall apart. Using properly packaged libraries is an advantage to the maintainer and a benefit to the whole ecosystem.

> Debian Policy, paragraph 4.13 states:

Before we discuss the policy, let me focus on your attitude first. Kubernetes, one of the most sophisticated packages, was your _second_ ever upload to Debian. You are a woefully inexperienced maintainer, knowing little if anything about team work, packaging practices, and their meanings and implications. IMHO your interpretation of the policy is mostly irrelevant, as you are merely trying to use the policy to justify what you did.

There are several problems with how you did it, too. You did not take anyone's advice, ignored the Salsa repository, threw away _everything_ and made no effort to understand how and why things were implemented, let alone appreciate prior work or try to improve it. What you did is a technological hijack of the package, a gross violation of practices. Imagine I upload a package to NEW, get it reviewed and accepted for what it is, then re-upload it as something entirely different, bundled with 500+ dependencies that were not reviewed, and then claim that the policy allows it?

> =================
> I think this is the part that has the most bearing on the vendored code
> problem, especially the footnote. I agree with this principle. But we
> should apply it to the state of affairs in 2020, and to this specific
> situation.

Nonsense. In 2020 using packaged libraries is much easier than before. You simply have no excuse not to do so. Again, you do not understand what the problem is.
> Keeping all that in mind, here are the reasons why I think it is acceptable
> for now to package Kubernetes with the vendored code, and even the best
> solution that is available currently:

I keep telling you that it is not the best solution but the sloppiest and most inferior one.

> 1. OTHER EXAMPLES.
> [...]
> - docker.io (58, including some that are vendored more than once within the
> same source package, but not including the fact that docker.io itself is
> made up of 7 tarballs)

Docker.io is a special, exceptionally difficult case, which should show you that even with Docker it is possible to leverage packaged libraries. Docker upstream is one of the worst in its abuse of versioning practices. On top of that, the Docker code base is shipped in several namespaces, which makes it impossible to package some components separately due to mutual circular dependencies. In Docker we use strategically bundled components only when necessary.

> - kubernetes (20 for the previous version, 200 now)
> - prometheus (4)
> - golang (4)
> None of these were REJECTed, and please don't sabotage these packages now

Perhaps Nomad or the recently accepted Vault would serve as a better example. Vault, one of the packages with the most dependencies, maintains a good balance between vendored sub-components and re-use of packaged libraries. This is how Kubernetes should have been maintained, and it is not too hard to do so if you know what you are doing and why.

> :-D The idea was only to show that, at least for now, vendoring is a fact
> in Debian. There is an effort to improve the situation but in the meantime
> we just go on. Not great, not terrible..

We are striving to eliminate vendoring whenever possible, for good reasons. You would have made a good point back in 2015 or 2016, when the Golang community was yet to discover software versioning practices.
These days the Golang community has embraced versioning and is learning to appreciate API stability, so the situation is not ideal, but it is getting better.

> 2. MAINTAINABILITY. Having every single vendored repo available as a
> separate package in Debian is not feasible.

I disagree. You are not even trying, and with virtually no experience you are drawing such a strong conclusion.

> Dimitry and a few others worked
> hard on trying to pull this off but even they could not do it.

Incorrect. Not only did I do it successfully, but I have kept doing it for various very sophisticated packages ever since, walking away from Kubernetes only for unrelated reasons. You don't understand that you have to own the whole dependency tree and maintain the packages separately. You don't have to care for every single component, but you do for most of them.

> Since 2016
> a total of 3 Kubernetes releases made it into Debian/unstable, but there
> have been 17 major and countless minor upstream releases of Kubernetes.

I doubt we need every single release. The Kubernetes situation was unfortunate due to a lack of maintainers. After I kick-started Kubernetes packaging, I expected that some maintainers would join once I had done the hardest, most laborious chunk of work. It is true that I could not sustain continuous contributions to Kubernetes, but that was due to other priorities.

> Thousands of issues were fixed upstream, including serious security flaws,
> these never made it into Debian.

Some fixes were not accepted upstream, and some of those security flaws were in 3rd-party libraries, where they were fixed in separate uploads.

> Exactly because the packaging was too difficult to maintain.

It was not easy to maintain, but that's no excuse for sloppy work, cutting corners whenever you feel like it. It was because there was nobody to maintain Kubernetes, but it was perfectly possible to maintain it in a sane way when at least some system libraries are used.

> 3. NO FORKS.
> Debian developers hacking Kubernetes source code, so it
> compiles with a lucky enough version of a dependency that made it into
> Debian, makes the Debian version of Kubernetes different from the standard
> one that everyone expects. This is totally unwelcome by almost every user.
> No sane cluster admin would dare to use this "fork", ever. There were some
> attempts to get the Kubernetes contributors to update dependencies to a
> specific version: https://github.com/kubernetes/kubernetes/issues/27543 .
> Reading the whole thread helps to put some perspective on this. The
> Kubernetes contributors were actually quite helpful throughout but they
> have made it clear that they will not update dependencies for update's
> sake. Maybe with some projects Debian would have the upper hand, but not
> with Kubernetes.

That upstream bug was an example of upstream's failure (and unwillingness) to incorporate a simple, straightforward and tested patch, and an example of how inefficient and dysfunctional upstream development is. As a cluster admin, I don't have enough confidence in upstream governance to use the binaries provided by that project, because upstream does a very poor job and, to make things worse, resists cooperation.

> 4. TESTING. The Kubernetes releases are meticulously tested, with far
> greater technical resources that Debian can collectively muster. The
> Kubernetes project runs e2e tests regularly on thousands of nodes (donated
> compute time). If we were to continue to have a fork we would be obliged to
> do the same. Even if we could run such extensive tests on our fork, and
> these e2e tests revealed a problem, who is going to interpret the results
> and fix our snowflake? The debian fork was never tested this way and it
> seems unlikely that it could ever be.

Good point. However, there is a catch. Kubernetes probably has several times more untested 3rd-party code than the size of its own code base.
Unlike upstream, we test the individual libraries, but not the compiled combination of them (in the case of Kubernetes).

> 5. SECURITY. The strongest and most applicable point from the DP footnote
> is about security vulnerabilities in the duplicated code. This is
> completely valid. But again, with the maintainability issues (see 2) we
> won't be able to roll out security fixes in time.

Maintainability issues cannot be fixed without a maintainer. Of course no issues were fixed in an unmaintained package. But one thing is certain: the graveyard of obsolete libraries in "vendor" is unmaintainable, and upstream struggles with it too, with many unfixed problems. IMHO our approach is better, despite all the difficulties.

> 7. DFSG.
> [...]
> I have checked the licences of every dependency and have compiled it in a
> container with no network access, so what's in the .orig.tar.gz is exactly
> what was compiled, nothing more.

You would have had less work to do if you had removed the private vendored libraries. A good balance can and should be achieved without vendoring everything.

> I do think there is a good case for Kubernetes to be an exception from 4.13
> for now, just like other Go packages effectively are.

The whole thread is about how the current state of Kubernetes packaging is not like that of other Golang packages.

> It is a massively
> popular project topped only by the Linux kernel. We cannot afford not to
> have up to date versions in Debian, or have forks that no one can use.
>
> So let's find a way to make this happen!

It would be nice to have Kubernetes in Debian, but not in the sloppy way you want to maintain it. Upstream binaries are already available; they are not great, and yours are not better than upstream's.

-- 
Best wishes,
 Dmitry Smirnov.

---

There are two different types of people in the world, those who want to
know, and those who want to believe.
        -- Friedrich Nietzsche