
Re: Validating tarballs against git repositories



On Sat, 30 Mar 2024 at 06:29, Russ Allbery <rra@debian.org> wrote:
>
> Antonio Russo <antonio.e.russo@gmail.com> writes:
>
> > The way I see it, there are two options in handling a buildable package:
>
> > 1. That file would have been considered a build artifact, consequently
> > removed and then regenerated.  No backdoor.
>
> > 2. The file would not have been scrubbed, and a difference between the
> > git version and the released tar version would have been noticed.
> > Backdoor found.
>
> > Either of these is, in my mind, dramatically better than what happened.
>
> I think the point that you're assuming (probably because you quite
> reasonably think it's too obvious to need to be stated, but I'm not sure
> it's that obvious to everyone) is that malicious code injected via a
> commit is significantly easier to detect than malicious code that is only
> in the release tarball.
>
> This is not *always* correct; it really depends on how many eyes are on
> the upstream repository and how complex or unreadable the code upstream
> writes normally is.  (For example, I am far from confident that I can
> eyeball the difference between valid and malicious procmail-style C code
> or random M4 files.)  I think it's clearly at least *sometimes* correct,
> though, so I'm sympathetic, particularly given that it's already Debian
> practice to regenerate the build system files anyway.
>
> In other words, we should make sure that breaking the specific tactics
> *this* attacker used truly make the attacker's life harder, as opposed to
> making life harder for Debian packagers while only forcing a one-time,
> minor shift in attacker tactics.  I *think* I'm mostly convinced that
> forcing the attacker into Git commits is a useful partial defense, but I'm
> not sure this is obviously true.

While it's of course true that avoiding massaged tarballs as orig.tar
is not a panacea, and that obfuscated malicious code can be, and is,
checked into git, I am pretty sure it is undeniable that having
everything tracked in git makes it _easier_ to audit and investigate.
Not perfect, not fool-proof, but easier, compared to manually diffing
tarballs. And given we are talking about malicious actors using
subterfuge to attack us, I think we could use all the help we can get,
even if there's no perfect solution.
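
To make "manually diffing tarballs" concrete, here is a rough sketch of
the kind of check I have in mind. It is not an existing Debian tool;
the function names and result keys are made up for illustration, and it
assumes the tarball wraps everything in a single top-level directory
(e.g. "pkg-1.0/") and that the git checkout is already at the release
tag. It hashes every regular file on both sides and reports what was
added, dropped or changed in the tarball.

```python
import hashlib
import os
import tarfile


def sha256_bytes(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def tarball_hashes(tar_path):
    """Map each regular file in the tarball to its SHA-256.

    Assumption: all members live under one top-level directory,
    which is dropped so paths line up with the checkout.
    """
    hashes = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, devices
            _, _, rel = member.name.partition("/")
            if rel:
                hashes[rel] = sha256_bytes(tar.extractfile(member).read())
    return hashes


def tree_hashes(root):
    """Map each regular file under root to its SHA-256, ignoring .git."""
    hashes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into git metadata
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are skipped on the tarball side too
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                hashes[rel] = sha256_bytes(fh.read())
    return hashes


def compare_tarball_to_checkout(tar_path, checkout):
    """Report files added, dropped or changed in the tarball."""
    tar_files = tarball_hashes(tar_path)
    git_files = tree_hashes(checkout)
    common = set(tar_files) & set(git_files)
    return {
        "only_in_tarball": sorted(set(tar_files) - set(git_files)),
        "only_in_checkout": sorted(set(git_files) - set(tar_files)),
        "differs": sorted(p for p in common if tar_files[p] != git_files[p]),
    }
```

With an autotools project, "only_in_tarball" will normally be full of
generated files (configure, Makefile.in and friends), which is exactly
the noise that makes eyeballing these diffs painful. Anything under
"differs" is a file that exists in git but does not match it, and that
is the part I would want a human to look at.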

In the end, massaged tarballs were needed to avoid rerunning
autoconfery on twelve thousand different proprietary and
non-proprietary Unix variants, back in the day. In 2024, we run
dh_autoreconf by default so it's all moot anyway. When using
Meson/CMake/home-grown makefiles there's no meaningful difference
between the release tarball and the git tree on average, although I'm
sure there are corner cases and exceptions here and there.

