Bug#1107620: vcswatch: e2fsprogs: repository blocked even though it is nowhere near 500MiB



Hi!

On Wed, 2025-06-11 at 10:22:50 +0200, Guillem Jover wrote:
> On Tue, 2025-06-10 at 15:14:11 -0400, Theodore Y. Ts'o wrote:
> > Package: qa.debian.org
> > Severity: normal
> 
> > I was recently looking at the tracker page for e2fsprogs, and I
> > noticed that there was the VCS watch error: "Failed to analyze the VCS
> > repository. Please troubleshoot and fix the issue."
> > 
> > Looking at this page [1], the complaint seems to be: "Repository
> > size 531054592 exceeds 500 MiB, blocking it".  However, looking at the
> > e2fsprogs git repo, the bare repo (just the git packs) is only 43
> > megs, and the checked out working tree is 41 megs.  I'm not sure why
> > the repo on VCS watch is over 500 MiB, but could it be that it's in
> > desperate need of having "git gc" run on it?
> > 
> > [1] https://qa.debian.org/cgi-bin/vcswatch?package=e2fsprogs
> > 
> > Note, this isn't the first time this has come up.  Last time, I
> > reported it to the debian-qa mailing list on July 25, 2024.  Perhaps
> > there needs to be an automated "git gc" run on the git clones of the
> > repos which are used by vcswatch?
> 
> Yes, the same happened again with dpkg (a long time ago), for which
> I had also filed a previous report (#1072498):
> 
>   https://qa.debian.org/cgi-bin/vcswatch?package=dpkg
> 
> On my local tree with no gc for a while the .git dir takes ~100 MiB,
> after a gc --aggressive now it takes 66 MiB.
> 
> There is definitely something wrong with how these repos are being
> handled I think (maybe gc is being prevented or something?).

I was looking at this again just now, and I think the subsequent git
fetches are causing the problem. On my server the dpkg.git repo is
180 MiB (and I've not run «git gc --aggressive» for a while).

Trying to replicate what the vcswatch data gathering script is doing,
I got the following:

  ,---
  # Initial clone
  $ git clone --quiet --bare --mirror --depth 50 --filter tree:0 \
      --no-single-branch --template '' \
      https://git.dpkg.org/git/dpkg/dpkg.git dpkg.git
  warning: filtering not recognized by server, ignoring
  $ cd dpkg.git
  $ du -sh .
  57M     .
  # Iterative fetch 1
  $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
  […]
  $ du -sh
  115M    .
  # Iterative fetch 2
  $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
  […]
  $ du -sh
  173M    .
  # Iterative fetch 3
  $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
  […]
  $ du -sh
  231M    .
  `---

I guess this keeps growing until it reaches the 500 MiB limit. I notice
that under objects/pack/ there is one set of similarly sized packs
(57 MiB) per iteration.

Running «git gc» on the repo brings it back to a more normal size, as
would be expected.
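
In case it helps, here is a minimal sketch of how the pack pile-up could
be detected and cleaned up after each fetch. The helper names
(pack_count, repack_if_needed) and the default threshold of 2 packs are
made up for illustration; I do not know whether the vcswatch script has
a suitable hook for something like this.

```shell
# Hypothetical helper: print the number of *.pack files found in the
# objects/pack directory of the bare repository given as $1.
pack_count() {
    find "$1/objects/pack" -maxdepth 1 -name '*.pack' | wc -l | tr -d ' '
}

# Hypothetical wrapper: run "git gc" only when more packs have piled up
# than the threshold in $2 (defaults to 2, an arbitrary choice).
repack_if_needed() {
    repo=$1
    limit=${2:-2}
    if [ "$(pack_count "$repo")" -gt "$limit" ]; then
        git -C "$repo" gc --quiet
    fi
}
```

Calling «repack_if_needed dpkg.git» after each iterative fetch should
then collapse the per-fetch packs before they add up.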

Thanks,
Guillem