
Bug#977006: packages.debian.org: file list broken for some packages



Control: tags -1 + patch

Hi Rebecca,
Hi Louis-Philippe,

> On the packages.d.o pages of arch:all packages from sid, bullseye or
> experimental, the "list of files" link gives the error message "No such
> package in this suite on this architecture."

First of all, thank you both for the excellent detective work! The
issue was triggered by commit 81824d23 in daklib [1]. With that commit,
the archive started to provide separate Contents files containing the
file paths in Arch:all packages. The change took effect on Oct 25, 2020
at 7:08am PDT and applies to releases past buster.

From what I can tell, the code generating the web pages for
packages.d.o did not read those files for releases post buster.
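
The gap described above can be sketched as follows. This is only a
minimal illustration in Python, not the Perl code of packages.d.o or of
the merge request. The function name, the list of older suites and the
dists/ path layout are assumptions made for the example.

```python
# Illustrative sketch only; the real generator is Perl (bin/parse-contents).
# Assumption: suites up to and including buster ship no Contents-all file.
SUITES_WITHOUT_CONTENTS_ALL = {"jessie", "stretch", "buster"}  # hypothetical list


def contents_files(suite, section, arch):
    """Return the Contents files to read for one suite/section/architecture."""
    # An empty section means the Contents file sits directly at the dist root.
    base = f"dists/{suite}/{section}" if section else f"dists/{suite}"
    files = [f"{base}/Contents-{arch}.gz"]
    # Newer suites list Arch:all file paths separately, so read that file too.
    if suite not in SUITES_WITHOUT_CONTENTS_ALL and arch != "all":
        files.append(f"{base}/Contents-all.gz")
    return files
```

With these assumptions, contents_files("sid", "main", "amd64") yields
both the amd64 file and the Arch:all file, while the same call for
buster yields only the amd64 file.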

I filed a merge request that I believe solves the issue. [2] It was
tested on the Debian node that generates the web pages
(picconi.debian.org). It comes with two caveats:

(1) Due to insufficient permissions I created an improvised
environment, described further below, that may not fully mimic
production runs.

(2) The second commit addresses a condition that should have prevented
the code from running at all, although apparently it didn't. That
opens up the possibility that I misunderstood the existing code and,
for my tests, created a runtime environment that differed appreciably
from production.

To test the MR, I cloned my feature branch into my home directory on
picconi.debian.org. I then applied the local patch below this message.

Next I ran the command './bin/setup-site /home/lechner/packages
packages.debian.org' as suggested in ./INSTALL and started the test
with '/home/lechner/packages/cron.d/200process_archive'. (I also
created the folders './files/db' and './tmp' in the base directory of
the Git repo, which was my working directory.) The run finished
without errors and produced the attached log.

Now the databases are more even in size across architectures. Here is
a partial listing of the relevant folder ./files/db: (The full listing
for *.db is attached.)

   0 filelists_sid_all.db
129M filelists_sid_alpha.db
132M filelists_sid_amd64.db
208M filelists_sid_arm64.db
126M filelists_sid_armel.db
128M filelists_sid_armhf.db
128M filelists_sid_hppa.db
131M filelists_sid_i386.db
128M filelists_sid_m68k.db
127M filelists_sid_mips64el.db
127M filelists_sid_mipsel.db
123M filelists_sid_powerpcspe.db
132M filelists_sid_ppc64.db
129M filelists_sid_ppc64el.db
130M filelists_sid_riscv64.db
127M filelists_sid_s390x.db
125M filelists_sid_sh4.db
129M filelists_sid_sparc64.db
130M filelists_sid_x32.db

The databases for Arch:all are symbolic links (hence the
human-readable size of zero). I am not sure why arm64 is so large.
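
A check like the one above can also be scripted. The following is a
minimal sketch in Python, assuming the filelists_<suite>_<arch>.db
naming seen in the listing. The function names and the 1.5x median
threshold are arbitrary choices for illustration.

```python
import os
import statistics


def db_sizes(db_dir, suite="sid"):
    """Map each filelists database name to (size in bytes, is_symlink)."""
    result = {}
    prefix = f"filelists_{suite}_"
    for name in sorted(os.listdir(db_dir)):
        if name.startswith(prefix) and name.endswith(".db"):
            path = os.path.join(db_dir, name)
            # lstat does not follow symlinks; for a link, st_size is the
            # length of the target path rather than the target's size.
            result[name] = (os.lstat(path).st_size, os.path.islink(path))
    return result


def outliers(sizes, factor=1.5):
    """Return names of regular files larger than factor times the median."""
    regular = {n: s for n, (s, link) in sizes.items() if not link}
    if not regular:
        return []
    median = statistics.median(regular.values())
    return [n for n, s in regular.items() if s > factor * median]
```

Applied to sizes like those in the listing, where the median is about
128M, the threshold is roughly 192M, so only the 208M arm64 database
would be reported.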

Perhaps someone with the appropriate user privileges could pull my
feature branch from the merge request [2] into
/srv/packages.debian.org and test it on the live system. The cron run
can be triggered by hand.

A better long-term solution would be to produce separate transfer
files for Arch:all, but that may not work until buster is dropped
from the archive. Thank you both for your hard work!

Kind regards
Felix Lechner

[1] https://salsa.debian.org/ftp-team/dak/-/commit/81824d2326f5cc50fdcb95c81f9f26864aebaa15
[2] https://salsa.debian.org/webmaster-team/packages/-/merge_requests/20

* * *

[local patch]

lechner@picconi:~/packages$ git diff
diff --git a/bin/parse-contents b/bin/parse-contents
index a1bfc35..7c5f166 100755
--- a/bin/parse-contents
+++ b/bin/parse-contents
@@ -51,6 +51,9 @@ my @sections = @SECTIONS;
 # Add empty section, need to search Contents directly at dist root, for debports compat
 push(@sections, "");

+$DBDIR = "/home/lechner/packages/files/db";
+my $TMPDIR = "/home/lechner/packages/tmp";
+
 my %debports_hash;
 # copy from config.sh ${arch_debports}
 @debports_hash{qw( alpha hppa ia64 m68k powerpcspe ppc64 riscv64 sh4 sparc64 x32 )} = ();
@@ -166,9 +169,9 @@ for my $suite (@suites) {

     # Piping from sort's output doesn't really scale with 16 GB worth
     # of input, so let's store in a temporary file:
-    my $rev_path_file = "$TOPDIR/tmp/${suite}.sorted";
+    my $rev_path_file = "$TMPDIR/${suite}.sorted";
     print "Merging reverse path lists for ${suite}...\n";
-    system("sort -T $TOPDIR/tmp -m $DBDIR/reverse_${suite}_*.txt -o ${rev_path_file}") == 0
+    system("sort -T $TMPDIR -m $DBDIR/reverse_${suite}_*.txt -o ${rev_path_file}") == 0
        or die "Failed to build merged list";
     my $rev_path_size = stat($rev_path_file)->size;

diff --git a/cron.d/200process_archive b/cron.d/200process_archive
index 29a7385..eecd412 100755
--- a/cron.d/200process_archive
+++ b/cron.d/200process_archive
@@ -5,13 +5,13 @@
 cd "$topdir"

 date
-./bin/parse-translations --english-only
-date
-./bin/parse-packages
-date
-./bin/parse-sources
-date
-./bin/parse-translations
-date
-./bin/parse-contents
+#./bin/parse-translations --english-only
+#date
+#./bin/parse-packages
+#date
+#./bin/parse-sources
+#date
+#./bin/parse-translations
+#date
+/home/lechner/packages/bin/parse-contents
 date

* * *

Attachment: parse-contents.log.xz
Description: Binary data

Attachment: db-listing.txt.xz
Description: application/xz

