Bug#1107620: vcswatch: e2fsprogs: repository blocked even though it is nowhere near 500MiB
Control: tag -1 patch
Hi!
On Mon, 2025-08-11 at 01:51:50 +0200, Guillem Jover wrote:
> I was looking at this again just now, and I think the subsequent git
> fetches are causing the problem. On my server the dpkg.git repo is
> 180 MiB (and I've not run «git gc --aggressive» for a while).
>
> Trying to replicate what the vcswatch data gathering script is doing,
> I got the following:
>
> ,---
> # Initial clone
> $ git clone --quiet --bare --mirror --depth 50 --filter tree:0 \
> --no-single-branch --template '' \
> https://git.dpkg.org/git/dpkg/dpkg.git dpkg.git
> warning: filtering not recognized by server, ignoring
> $ cd dpkg.git
> $ du -sh .
> 57M .
> # Iterative fetch 1
> $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
> […]
> $ du -sh
> 115M .
> # Iterative fetch 2
> $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
> […]
> $ du -sh
> 173M .
> # Iterative fetch 3
> $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
> […]
> $ du -sh
> 231M .
> `---
>
> Which I guess increases until reaching the 500 MiB limit. I notice
> that under objects/pack/ there is one set of similarly sized packs
> (57 MiB) per each iteration.
>
> Running «git gc» on the repo makes things go back to a more normal
> size, as would be expected.
Ok, I think the attached patch should help with this, as it will force
an automatic «git gc» once more than 4 packs accumulate on disk.
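To illustrate the effect, here is a local sketch (a throwaway demo
repository, not the vcswatch code; the per-commit «git repack» is just a
way to fabricate many packs, and gc.autoDetach=false keeps the gc in the
foreground so the pack count can be checked right after):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init --quiet demo; cd demo
git config user.email demo@example.org
git config user.name Demo
# Make 6 packs, one per commit, to mimic packs stacking up after fetches.
for i in 1 2 3 4 5 6; do
  echo "$i" >file; git add file; git commit --quiet -m "c$i"
  git repack --quiet          # pack the new loose objects into their own pack
done
before=$(ls .git/objects/pack/*.pack | wc -l)
# With more than gc.autoPackLimit packs on disk, «git gc --auto» kicks in
# and consolidates them, even though there are no loose objects at all.
git -c gc.auto=200 -c gc.autoPackLimit=4 -c gc.autoDetach=false \
    gc --auto --quiet
after=$(ls .git/objects/pack/*.pack | wc -l)
echo "packs: $before -> $after"
```

This mirrors what the patch does: the loose-object threshold (gc.auto)
never fires in the vcswatch case, so the pack-count threshold is the one
that matters.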
The fetching, though, still seems rather inefficient: at least for
dpkg, it will keep re-downloading a pack that is currently 57 MiB,
instead of asking only for the few new objects that would usually be
needed.
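The pack stacking itself can be reproduced with a toy repository (all
names below are made up; fetch.unpackLimit=1 is forced so that even a
tiny fetched pack is kept as a pack, as the 57 MiB dpkg packs are
naturally; and the file:// transport may negotiate differently from
HTTPS, so this only demonstrates the accumulation, not the re-download
of unchanged objects):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init --quiet --bare server.git
git clone --quiet "file://$tmp/server.git" work 2>/dev/null
cd work
git config user.email demo@example.org
git config user.name Demo
for i in 1 2 3; do echo "$i" >f; git add f; git commit --quiet -m "c$i"; done
git push --quiet origin HEAD:main
cd "$tmp"
# Shallow mirror clone, as the vcswatch data gathering script does.
git clone --quiet --bare --mirror --depth 2 --no-single-branch \
    "file://$tmp/server.git" mirror.git
p1=$(ls mirror.git/objects/pack/*.pack | wc -l)
# One new upstream commit, then another shallow fetch.
cd work; echo 4 >f; git commit --quiet -am c4
git push --quiet origin HEAD:main
cd "$tmp/mirror.git"
git -c fetch.unpackLimit=1 fetch --quiet --depth 2 --prune --force \
    origin '*:*'
p2=$(ls objects/pack/*.pack | wc -l)
echo "packs: $p1 -> $p2"
```

Each fetch that brings in objects leaves a new pack under objects/pack/,
and nothing consolidates them until a «git gc» runs.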
Thanks,
Guillem
From 2f3e9d591a2ea698932ab7af4bbabd5fed8e3dad Mon Sep 17 00:00:00 2001
From: Guillem Jover <guillem@debian.org>
Date: Sat, 16 Aug 2025 17:57:50 +0200
Subject: [PATCH] =?UTF-8?q?vcswatch:=20Force=20a=20=C2=ABgit=20gc=C2=BB=20?=
=?UTF-8?q?after=20fetch=20to=20avoid=20hitting=20repo=20quotas?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The current code requests a depth of 50 commits, and requests auto
garbage collection once 200 loose objects are in the repository.
The problem is that on each fetch we might get a pack with all the
relevant objects, duplicating the packs from the previous fetches,
while leaving no loose objects at all. These packs keep stacking until
we hit the repository quota, after which further fetching is completely
disabled.
Instead, request that we want no more than 4 packs (not marked with
.keep) on disk before an auto garbage collection is triggered.
Closes: #1107620
Ref: #1072498
---
data/vcswatch/vcswatch | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/data/vcswatch/vcswatch b/data/vcswatch/vcswatch
index a31bf079..954e4cc7 100755
--- a/data/vcswatch/vcswatch
+++ b/data/vcswatch/vcswatch
@@ -296,7 +296,9 @@ sub process_package ($) {
}
runcmd ('darcs', 'pull', '-a');
} elsif ($pkg->{vcs} eq 'Git') {
- runcmd ('git', '-c', 'gc.auto=200',
+ runcmd ('git',
+ '-c', 'gc.auto=200',
+ '-c', 'gc.autoPackLimit=4',
'fetch',
($pkg->{dumb_http} ? () : ('--depth', '50')),
'--prune', '--force', 'origin', '*:*');
--
2.50.1