[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH v3] Add by-hash support



Julien Cristau writes:
> --- a/dak/clean_suites.py
> +++ b/dak/clean_suites.py
> @@ -384,6 +384,33 @@ SELECT f.id, f.fingerprint FROM fingerprint f
> +def clean_byhash(now_date, session):
> +    Logger.log(["Cleaning out unused by-hash files..."])
> +
> +    q = session.execute("""
> +        DELETE FROM hashfile h
> +        USING suite s, archive a
> +        WHERE s.id = h.suite_id
> +          AND a.id = s.archive_id
> +          AND h.unreferenced + a.stayofexecution < CURRENT_TIMESTAMP
> +        RETURNING a.path, s.suite_name, h.path""")
> +    count = q.rowcount
> +
> +    if not Options["No-Action"]:
> +        for base, suite, path in q.fetchall():

The fetchall() is not necessary: `for row in q` will iterate over all
rows returned.

> +            filename = os.path.join(base, 'dists', suite, path)
> +            if os.path.isfile(filename):
> +                Logger.log(['delete hashfile', suite, path])
> +                os.unlink(filename)
> +            else:
> +                Logger.log(['database referred to non-existing file', filename])

I would prefer just trying to unlink the file and catching the
does-not-exist error.  Besides `os.path.isfile(filename)` returns
false if `filename` is a dangling symlink which we still want to remove.

In case of `No-Action`, the transaction should be rolled back.

> --- a/dak/generate_releases.py
> +++ b/dak/generate_releases.py
> +        if suite.byhash:
> +            query = """
> +                UPDATE hashfile SET unreferenced = CURRENT_TIMESTAMP
> +                WHERE suite_id = :id AND unreferenced IS NULL"""
> +            session.execute(query, {'id': suite.suite_id})
> +
> +            hashes = filter(lambda h: h in hashfuncs, ('MD5Sum', 'SHA1', 'SHA256'))
> +            strong_hash = hashes[-1]

Hmm, I don't like this assumption, but then I have nothing better for
now.

> +                    query = "SELECT 1 FROM hashfile WHERE path = :p AND suite_id = :id"
> +                    q = session.execute(
> +                            query,
> +                            {'p': hashfile, 'id': suite.suite_id})
> +                    if q.rowcount:
> +                        session.execute('''
> +                            UPDATE hashfile SET unreferenced = NULL
> +                            WHERE path = :p and suite_id = :id''',
> +                            {'p': hashfile, 'id': suite.suite_id})
> +                    else:
> +                        session.execute('''
> +                            INSERT INTO hashfile (path, suite_id, unreferenced)
> +                            VALUES (:p, :id, NULL)''',
> +                            {'p': hashfile, 'id': suite.suite_id})

I look forward to `INSERT ... ON CONFLICT UPDATE ...` in Postgres 9.5 :)

The default value for `unreferenced` should not be specified: it's fine
to take the one from the table definition.

> +                    if h == strong_hash:
> +                        with open(filename, 'rb') as src:
> +                            contents = src.read()
> +                        with open(hashfile + '.new', 'wb') as dst:
> +                            dst.write(contents)
> +                        os.rename(hashfile + '.new', hashfile)
> +                        os.unlink(filename)

This is just `os.rename(filename, hashfile)`?  Or `shutil.move` for
the theoretic case of `filename` and `hashfile` being on different
filesystems.

> +                        os.symlink(os.path.join('by-hash', h, fileinfo[filename][h]),
> +                                   filename)
> +                    elif not os.path.exists(hashfile):
> +                        os.symlink(os.path.join('..', strong_hash, fileinfo[filename][strong_hash]),
> +                                   hashfile)

I wonder if hardlinks are the better choice (no risk of dangling
symlinks), but symlinks are probably better for mirrors that don't
mirror hardlinks as such, but duplicate the data?

Ansgar


Reply to: