[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#977994: apt: Output from sandboxed methods should not be trusted



On Wed, Dec 30, 2020 at 09:32:32PM -0500, Demi M. Obenour wrote:
> That is true.  Nevertheless, if we are going to put in the work to
> confine the methods, we should also make sure they cannot escape
> their confinement.

The confinement can never be perfect given that we always have to let
data escape to actually work with it. Its also fatal to go into
discussion with an attitude of demanding perfectism as nothing else
registers as improvement over the status quo.

Is APT perfect? Certainly not. Was it a good idea to not run the http
method as root? It surely was work, but we hope it was an improvement
even if not bulletproof and hence not to your liking. Can we do more?
Yes, if time and energy permits.


> 1. libapt-pkg spawns its methods as normal.
> 
> 2. When libapt-pkg requests that a file be downloaded, it creates
>    a socketpair and passes one end to the method using SCM_RIGHTS.
>    libapt-pkg reads from the other end in a background thread.
> 
> 3. The method downloads the file as normal, but instead of writing it
>    to a file, it writes it to the file descriptor it received above.
>    libapt-pkg reads this data, hashes it, and writes it to a temporary
>    file.

You fail to mention who is (e.g. in the case of https) creating the
connection to the server, keeping it alive, handling handshakes (https),
reading/running proxy configuration and pipelining(http1.1).

The later can actually cause the method to change the file it is working
with if it detects a server has messed up pipelining. I presume
a potential http2 can do that on purpose due to their streams.

Other things to consider are partial downloads (one optional input file),
pdiff patches (potentially infinite amount of input files) and methods
who do not produce a verifiable stream of data as output (e.g. gpgv).

They all produce countless different messages accompanying the raw data
though, like progress reports and error messages – both based at least
tangibly if not directly on (untrusted) data from the server.

You also fail to mention how you handle e.g. redirects, which was the
start of this whole CVE message passing ordeal.


> 4. The method returns its response and closes the file descriptor.

Oh, so you are passing a message from the method to libapt. How is that
message secured given it is data coming from a source (= the method) you
do not trust?


> 5. libapt-pkg reads the response.  It reads and closes the file
>    descriptor, and checks the hash against the signed manifest.

The size of unprocessed data in a pipe is not infinite. You make it
sound like you could download a 4 GB deb to an open file descriptor and
leave it there until the process is done. In reality you have to
constantly process that data (= calculate hash) and store on disk long
before the process is done. And don't even think of handing over that
4 GB blob to hashing only after… nobody wants to wait for that expensive
refetching of that data from a slow disk on an already slow machine
while you had all the time in the world to process that data already
while writing it to disk.

(Also, just so we be clear: There is no "the" hash. APT supports
 multiple, the involved methods support multiple, the repository
 supports multiple, and these (at least) three sets hopefully have
 a non-empty intersection.)


>    If the file being downloaded is a diff, then the hash of the diff
>    is checked.

As said, e.g. the method "store" takes e.g. a Contents.xz (which passed
hashes check), extracts it (= subject to e.g. an extraction bomb) and
produces a Contents.lz4. The result can not be verified. The
intermediate product (= the uncompressed data) only this method sees can.
Or are you saying that because the file passed hashes once, its
absolutely no longer hostile (I mentioned extraction bomb for a reason)
and no mistakes could have been made?

Similar, why only check the hash of the diff if we can and should also
check the hash of the file it produced (potentially as an intermediate
as its on disk storage is potentially again a compressed file).

With that train of thought we can just use https and be done with it.


> This is safe.  If the method returns malicious data, libapt-pkg will
> detect it.  The hashing and writing can be performed in a background
> thread, so there is very little performance penalty.

God forbid that someone runs `sed -i -e 's#background thread#method#'` on
this sentence. The word "thread" doesn't make it magically more secure,
actually less as a thread has a lot of shared context.



That said, sure, individual methods could benefit from "getting
subordinates" (as I said in the last message). That includes `http`
having dedicated subprocesses for directly interacting with the server
and for the more finer points of operation. It is "just" a matter of
actually implementing it as you are of course not the first person to
name-drop socketpair and SCM_RIGHTS here [0].

Thankfully you don't have to implement it for the whole of the acquire
system in one go through. Given that apts "background threads" are
trusted in your design and we provide them already as "methods" which can
be implemented however they want including a multiprocess architecture
utilizing socketpair or what not with the benefit of not requiring to
change all other existing methods at the same time.


Happily awaiting a soon upcoming merge request from you now,

David Kalnischkies

[0] https://salsa.debian.org/apt-team/apt/-/commit/da6f750f4586af16f2df3fb85cd6eb35a2a8227d

Attachment: signature.asc
Description: PGP signature


Reply to: