Re: Reviving schroot as used by sbuild



On Fri, 13 Sep 2024 at 11:15:55 +0200, Helmut Grohne wrote:
> My initial experiments indicate that we're in
> for a factor two [slowdown] whereas we could get this down significantly
> by using an overlayfs approach that we cannot shoehorn into podman.

Er, podman does use overlayfs, in at least some circumstances?

$ podman run --rm -it debian:sid-slim grep ' / / ' /proc/self/mountinfo
464 131 0:101 / / rw,relatime - overlay overlay rw,lowerdir=/home/smcv/.local/share/containers/storage/overlay/l/[…],upperdir=/home/smcv/.local/share/containers/storage/overlay/[…]/diff,workdir=/home/smcv/.local/share/containers/storage/overlay/[…]/work,redirect_dir=nofollow,uuid=on,volatile,userxattr

In unstable (and I think also bookworm but I haven't checked
recently), /usr/share/containers/storage.conf defaults to the
"overlay" driver - but the real default is whatever already exists in
~/.local/share/containers/storage, with the configured driver only used
for new setups, unless forced.

I think the performance characteristics you describe probably mean that
you have container storage that is already using the "vfs" driver, which
is indeed based on quite a lot of copying.
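One way to tell which case you are in is to look at the filesystem type
of the container's root mount, as in the mountinfo line above. A minimal
sketch in Python, assuming the text of /proc/self/mountinfo has been
captured from inside the container (the function name root_fs_type is
made up for illustration):

```python
def root_fs_type(mountinfo_text):
    """Return the filesystem type mounted at "/", or None if not found.

    Expects the contents of /proc/self/mountinfo. Each line looks like:
      ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTIONS [optional...] - FSTYPE SOURCE SUPEROPTS
    """
    fs_type = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 5 (index 4) is the mount point; skip everything except "/".
        if len(fields) < 7 or fields[4] != "/":
            continue
        try:
            # Zero or more optional fields start at index 6 and are
            # terminated by a lone "-".
            separator = fields.index("-", 6)
        except ValueError:
            continue
        if separator + 1 < len(fields):
            # If several mounts are stacked on "/", the last one listed wins.
            fs_type = fields[separator + 1]
    return fs_type
```

For the output shown above this returns "overlay". With the "vfs" driver
I would expect it to report the type of the underlying host filesystem
instead, but that is an assumption rather than something checked here.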

> podman
> upstream insists on CAP_SYS_ADMIN being a no go while systemd upstream
> insists on CAP_SYS_ADMIN being a requirement

Sorry, this is just not true, in either direction.

podman can be configured to allow CAP_SYS_ADMIN inside the container
(podman run --cap-add=CAP_SYS_ADMIN), but it isn't the default, because
it likely[1] means that "containers don't contain" (no effective security
boundary between root in the container, and the user whose uid was mapped
to the container's uid 0). I suspect the same is true for anything
that retains CAP_SYS_ADMIN and maps your real uid to
a container uid, but having a uid in common is usually desirable if you
want to be able to provide files to the container, or provide a place
where the container can write files back out.
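To see whether a process inside a given container actually holds
CAP_SYS_ADMIN, one option is to decode the CapEff line of
/proc/self/status. A rough sketch in Python, assuming the status text
has already been read (has_capability is a hypothetical helper name; the
bit number 21 is CAP_SYS_ADMIN in <linux/capability.h>):

```python
CAP_SYS_ADMIN = 21  # bit number from <linux/capability.h>


def has_capability(status_text, cap=CAP_SYS_ADMIN):
    """Return True if capability bit `cap` is set in the CapEff mask.

    Expects the contents of /proc/<pid>/status, in which CapEff is a
    hexadecimal bitmask of the effective capability set.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << cap))
    raise ValueError("no CapEff line found")
```

This only reports what the process holds in its own user namespace; it
says nothing about how much of a security boundary is left.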

systemd doesn't "insist on" CAP_SYS_ADMIN either - it specifically
doesn't require it! - but some individual systemd features do require
it. At the moment, systemd will fail closed (services like polkitd whose
security-hardening settings need CAP_SYS_ADMIN fail to start), which
surprised me, because other systemd security-hardening settings tend to
fail open (if systemd doesn't have all of the necessary capabilities
or kernel features then the service still starts, but the rest of the
containerized system is less protected from the service than it could
have been).

[1] I asked podman upstream and the answer can be summarized as
    "it's complicated, but probably"

> I have reached the
> conclusion that doing a persistent namespace requires a background
> process and an IPC mechanism. (This requirement rules out
> podman/docker/crun/runc.)

podman/docker can certainly run a background process that accepts
commands via IPC. They don't do this by default, sure, but if you
make the container payload include a process that accepts commands -
perhaps on an AF_UNIX or TCP socket, or through pipes - then they won't
stand in the way of doing that.

(Proof of concept 1: a podman container with an init system and
sshd. Proof of concept 2: the persistent process is a shell inside the
container, and the IPC mechanism is a pipe on stdin and another pipe on
stdout. Obviously an interactive shell makes a really bad IPC protocol,
as we already knew from autopkgtest-virt-qemu and LAVA, and for production
use it would be better to use a more structured protocol with proper
framing and error handling, like the D-Bus interface that systemd-run
uses - but that's an implementation detail.)
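Proof of concept 2 can be sketched roughly as follows, in Python. This
is an illustration under stated assumptions, not a tested sbuild
backend: the PipeShell class and its marker-based framing are invented
here, and the persistent process is whatever argv you pass in. For a
container that might be something like
["podman", "run", "--rm", "-i", "debian:sid-slim", "sh"]; a plain
["sh"] behaves the same way as far as the pipes are concerned.

```python
import subprocess
import uuid


class PipeShell:
    """A long-lived shell driven through pipes on stdin and stdout."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def run(self, command):
        """Run one command line; return (exit_status, stdout_text).

        Framing is improvised: after the command, the shell is asked to
        print a newline, a random marker and "$?". stderr is not
        captured, and a command that reads stdin would swallow the
        marker line, which is exactly why a shell is a poor protocol.
        """
        marker = "END-" + uuid.uuid4().hex
        self.proc.stdin.write(
            f"{command}\nprintf '\\n%s %s\\n' {marker} \"$?\"\n"
        )
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("shell exited unexpectedly")
            if line.startswith(marker + " "):
                status = int(line.split()[1])
                break
            lines.append(line)
        # Drop the newline that printf added in front of the marker.
        return status, "".join(lines)[:-1]

    def close(self):
        """Send EOF to the shell and wait for it to exit."""
        self.proc.stdin.close()
        status = self.proc.wait()
        self.proc.stdout.close()
        return status
```

State such as variables and the working directory survives between
run() calls because the same shell process handles all of them, which
is the "persistent" part.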

    smcv

