
Speeding up dpkg triggers with a list of changes



I had a fun little weekend project.

I tried using inotify to speed up dpkg file triggers, with man-db as
a test case, though the approach isn't limited to man-db.

My code consists of two programs: one parses a .triggers file and
collects the events in the background, and the other asks for those
changes.  I didn't need to recompile any programs for this; I only
had to add a few lines to man-db's postinst file.
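
To illustrate the first step, here is a minimal sketch (not the attached
code) of pulling the watched paths out of a triggers file.  It assumes the
deb-triggers format of one directive per line with "#" comments; the
function name parse_interests and the sample file contents are made up
for this example.

```python
def parse_interests(text):
    """Return the file-trigger paths named by interest directives."""
    directives = {"interest", "interest-await", "interest-noawait"}
    paths = []
    for raw in text.splitlines():
        # Drop everything after the first '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        # File triggers are the ones whose name starts with '/'.
        if len(fields) == 2 and fields[0] in directives and fields[1].startswith("/"):
            paths.append(fields[1])
    return paths


# Illustrative input only, not copied from any real package.
sample = """\
# example triggers file
interest-noawait /usr/share/man
activate-noawait some-named-trigger
"""
print(parse_interests(sample))
```

Each returned path would then be the root of a directory tree to put
inotify watches on.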

Some benchmarks, to motivate this.  I chose "the" as the test package
since it's a simple package with a single man page in section 1, and
man1 is a directory with a few thousand entries.

# time { for a in {1..10}; do dpkg --remove the; dpkg -i /var/cache/apt/archives/the_3.3~rc1-2_amd64.deb ; done }

Plain old man-db trigger:
real    0m39.733s
user    0m11.949s
sys     0m7.768s

With inotify and using mandb -f:
real    0m26.910s
user    0m11.081s
sys     0m4.428s

That's a lot of stat calls left uncalled.

The reason my test code handles just a single man page is that mandb
accepts only one -f parameter.  I didn't try changing that for this
test.  mandb still has a nontrivial startup time, so as it stands I
wouldn't call it in a loop.
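
As a rough sketch of that constraint: the trigger can use "mandb -f" only
when exactly one page changed, and otherwise does a full run.  The
function name mandb_command is hypothetical, and the bare "mandb" stands
in for whatever full run the postinst already performs.

```python
def mandb_command(changed):
    """Pick the mandb invocation for a list of changed man pages.

    `changed` is a list of paths, or None when no usable list exists.
    """
    if changed is not None and len(changed) == 1:
        # mandb takes a single -f argument, so this only covers one page.
        return ["mandb", "-f", changed[0]]
    # Zero, several, or unknown changes: fall back to a full run
    # (placeholder for the options the existing trigger uses).
    return ["mandb"]


print(mandb_command(["/usr/share/man/man1/the.1.gz"]))
```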

I'd say that doing this is a worthwhile thing, but I'd like to discuss
the specifics.  How closely should this be associated with dpkg
itself?  Starting the collection process takes about 200ms so I'm not
quite sure how well launching it at the same time as dpkg itself would
work.  With apt-get or aptitude that'd pose no problem.  On the other
hand, man is an example where we could cut that delay by applying some
domain-specific knowledge: stop readdir early as soon as a directory
turns out to contain a non-directory, since we know that, for man, such
a directory has no subdirectories.  We're only adding inotify watches
on directories.
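
The early-stop idea could look roughly like this.  It is a sketch under
the man-specific assumption stated above (a directory holding files has
no subdirectories), not the attached code, and watch_dirs is a made-up
name.

```python
import os


def watch_dirs(root):
    """Collect the directories under root that would get a watch."""
    found = [root]
    subdirs = []
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                # A file here means a leaf like man1: stop reading the
                # directory instead of looking at thousands of entries.
                break
            subdirs.append(entry.path)
    # Subdirectories seen before the break (normally none) are still visited.
    for sub in subdirs:
        found.extend(watch_dirs(sub))
    return found
```

Calling watch_dirs("/usr/share/man") would return the list of
directories to pass to inotify_add_watch.  How much time it saves
depends on how soon readdir returns a file in each leaf directory.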

Who should decide what packages have inotify data collection enabled?
I don't expect this level of detail to be useful for all packages.
How configurable should this be?  I doubt any trigger would benefit
from getting a list of a hundred files or so and would be better off
just doing a full run of whatever they're doing.

I'd keep this information optional, with triggers falling back to what
they currently do when it isn't available.  There's a chance (however
small) that inotify fills up its event queue, in which case any data
collection routine has no choice but to bail out, and we have
non-Linux systems to consider too.
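
A sketch of the bail-out case: decode a raw buffer as read from an
inotify file descriptor and give up as soon as the kernel reports
IN_Q_OVERFLOW.  The struct layout and the flag value come from
<sys/inotify.h>; the function name changed_names and the wd_to_dir
mapping are assumptions for this example.

```python
import struct

IN_Q_OVERFLOW = 0x4000
# struct inotify_event header: int wd; uint32_t mask, cookie, len;
HEADER = struct.Struct("iIII")


def changed_names(buf, wd_to_dir):
    """Return sorted changed paths, or None if events were lost."""
    paths = set()
    offset = 0
    while offset + HEADER.size <= len(buf):
        wd, mask, _cookie, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        # The name field is NUL-padded to `length` bytes.
        raw = buf[offset:offset + length].split(b"\0", 1)[0]
        offset += length
        if mask & IN_Q_OVERFLOW:
            # The queue overflowed: the list is incomplete, so the
            # caller has to do its usual full run instead.
            return None
        if raw and wd in wd_to_dir:
            name = raw.decode(errors="surrogateescape")
            paths.add(wd_to_dir[wd] + "/" + name)
    return sorted(paths)
```

A None result means "no list available", which is the same answer a
trigger would get on a system without inotify.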

I'm not entirely sure this couldn't produce false negatives, with
changes going unnoticed.  But triggers are supposed to cope with
that already.

I haven't tried looking at dpkg's source to see what it does to decide
to call a file trigger and why it won't make a file list available, or
what would need to be done to expose that.  I know that it doesn't use
inotify.

Strictly speaking, none of what I did really necessitates dpkg's, apt's
or anyone's cooperation: I could make it an independent daemon and let
a package's postinst trigger optionally use it whenever it is active.

I've attached my test code.  I don't know what earlier attempts there
have been at this sort of thing.  Most file alteration monitors (e.g.
fam, gamin, incron) are geared towards running actions when files
change, not recording the changes.

Attachment: inotify-interest.tar.gz
Description: Binary data

