
Re: Bits from DPL



On Wed, Jan 08, 2025 at 10:19:34AM +0100, Julien Plissonneau Duquène wrote:
> Le 2025-01-07 21:52, Peter Pentchev a écrit :
> > 
> > Hm. That sounds interesting, but I think the Debian project cannot
> > protect such a mirror from automatically bringing in non-DFSG content
> > that appears in the remote repository. One might even take this one step
> > further and go to content forbidden by law in various jurisdictions.
> 
> Then we are going to have the same issue implementing automated upstream
> release imports in packaging repositories, e.g. with the Janitor, and this
> is a service I would very much like to have.

Unfortunately you are correct that the same problem would arise.

> I would worry more about malicious content getting automatically pulled in.
> But anyway this can probably be mitigated the way large platforms do: make
> it possible to easily report abuse and being diligent in investigating them,
> eventually putting the repository offline until the issue is cleared.

Hm, I would be really, really surprised if there was even one "large
platform" that did not shift the responsibility to the user by having
them sign a terms of service document upon account registration.
Also, I'm not sure that some issues can really be cleared; see below.

> Additional automated checks could be implemented to suspend updates and
> require human review e.g. with LICENSE changes unless the file contents
> matches a whitelist.

That would put the responsibility on the uploader to review not only
the actual changes (as in a diff) between the releases, but each and every
individual file in each and every commit between the two releases.
I don't think this is completely realistic.

Why each and every individual file? Well, consider this:
- version 3.14.1 is tagged
- version 3.14.1 is uploaded to Debian
- somebody pushes a commit to the upstream repo that adds a file that
  really does not belong there
- two more "real" commits are pushed
- somebody pushes a commit that reverts the "add a bad file" one
- three more "real" commits are pushed
- version 3.14.2 is tagged
- version 3.14.2 is uploaded to Debian

...so, if at this point the mirror pulls in the Git commits between
versions 3.14.1 and 3.14.2, it will publish several commits whose trees
still contain the file that really does not belong there, even though
the diff between the two releases never shows it.
Clearing the issue would require rewriting Git history, squashing commits or
dropping them altogether, which would mean that the Debian copy of
the "upstream" Git repository is no longer a mirror.
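
To make the scenario above concrete, here is a rough sketch in Python. It is
a toy model and not real Git: a "tree" is just a dict of path to content, and
the file names (bad.bin, src/a.c and so on) are made up for illustration.
It compares what a release-to-release diff shows with what the intermediate
commits expose.

```python
def apply_change(tree, change):
    """Return a copy of tree (path -> content) with one commit applied.

    A value of None in the change means "delete this path".
    """
    new_tree = dict(tree)
    for path, content in change.items():
        if content is None:
            new_tree.pop(path, None)
        else:
            new_tree[path] = content
    return new_tree


# Hypothetical commits between the two tags, in the order listed above.
changes = [
    {"bad.bin": "does not belong here"},  # the "add a bad file" commit
    {"src/a.c": "1"},                     # two "real" commits
    {"src/b.c": "1"},
    {"bad.bin": None},                    # the revert
    {"src/a.c": "2"},                     # three more "real" commits;
    {"src/c.c": "1"},
    {"README": "3.14.2"},                 # the last one is tagged 3.14.2
]

# trees[0] is the tree at tag 3.14.1, trees[-1] the tree at tag 3.14.2.
trees = [{"README": "3.14.1"}]
for change in changes:
    trees.append(apply_change(trees[-1], change))

old, new = trees[0], trees[-1]

# What a reviewer sees when diffing the two releases.
release_diff = sorted(
    path for path in set(old) | set(new) if old.get(path) != new.get(path)
)

# Which intermediate commits still carry the bad file in their tree.
exposed_in = [i for i, tree in enumerate(trees) if "bad.bin" in tree]

print(release_diff)
print(exposed_in)
```

In this model the release diff never mentions bad.bin, yet three of the
mirrored commits would still serve it.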

> Alternatively the mirroring could be implemented to pull only the release
> tags after a package is uploaded to the archive (which means that someone
> reviewed the changes), and dealt with on a case-by-case basis for non-free
> packages or packages that have +dfsg repacking.

In Git repositories, pulling the release tag involves pulling (and making
available) all the commits leading up to it, even the reverted ones, so...
see above.
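
Again only as a sketch, assuming a full (non-shallow) fetch and a made-up
linear history with hypothetical commit names: fetching a tag means fetching
everything reachable from it through the parent links, which can be modelled
as a simple graph walk.

```python
# Hypothetical commit graph: each commit maps to the list of its parents.
parents = {
    "v3.14.1": [],
    "add-bad-file": ["v3.14.1"],
    "real-1": ["add-bad-file"],
    "real-2": ["real-1"],
    "revert-bad-file": ["real-2"],
    "real-3": ["revert-bad-file"],
    "real-4": ["real-3"],
    "v3.14.2": ["real-4"],
}


def ancestors(graph, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


# Everything a mirror would have to store to serve the 3.14.2 tag.
needed = ancestors(parents, "v3.14.2")
print(sorted(needed))
```

Both the commit that adds the bad file and the one that reverts it end up in
the set, so limiting the mirror to release tags does not by itself keep them
out.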

In general, automatically mirroring Git repository content is... fraught
with various issues.

G'luck,
Peter

-- 
Peter Pentchev  roam@ringlet.net roam@debian.org peter@morpheusly.com
PGP key:        https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115  C354 651E EFB0 2527 DF13
