Bug#803342: dhelp: Weekly cron job terminates, doesn't create 'documents.index'
Hi,
I am also affected by this bug. Here is the e-mail I receive every week (the complete e-mail is 16 MB!):
/etc/cron.weekly/dhelp:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::erase: __pos (which is 18446744073709551615) > this->size() (which is 0)
Dhelp::IndexerError: Broken pipe indexing /usr/share/doc/muscle/muscle.html
, /usr/share/doc/muscle/muscle.html
[...]
, /usr/share/doc/libatlas-doc/atlas_devel.pdf.gz
, using /usr/bin/index++ --config-file /usr/share/dhelp/config/swish++.conf --index-file /var/lib/dhelp/documents.index --follow-links - (/usr/lib/ruby/vendor_ruby/dhelp.rb:616:in `rescue in index'
/usr/lib/ruby/vendor_ruby/dhelp.rb:609:in `index'
/usr/sbin/dhelp_parse:171:in `do_deferred_indexing'
/usr/sbin/dhelp_parse:205:in `main'
/usr/sbin/dhelp_parse:221:in `<main>')
A quick way to reproduce this bug is to run this command:
# echo /usr/share/doc/libatlas-doc/atlas_devel.pdf.gz | /usr/bin/index++ --config-file /usr/share/dhelp/config/swish++.conf --index-file index -
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::erase: __pos (which is 18446744073709551615) > this->size() (which is 0)
Aborted (core dumped)
This should make the bug easier to debug for someone familiar with swish++.
My guess is that files with a double extension (.pdf.gz) are not handled correctly, so index++ ends up reading binary data, gets confused, and crashes.
This can be seen by running:
# echo /usr/share/doc/libatlas-doc/atlas_devel.pdf.gz | strace -f /usr/bin/index++ --config-file /usr/share/dhelp/config/swish++.conf --index-file index -
and in the strace log, I see:
read(0, "/usr/share/doc/libatlas-doc/atla"..., 4096) = 47
stat("/usr/share/doc/libatlas-doc/atlas_devel.pdf.gz", {st_mode=S_IFREG|0644, st_size=253989, ...}) = 0
lstat("/usr/share/doc/libatlas-doc/atlas_devel.pdf.gz", {st_mode=S_IFREG|0644, st_size=253989, ...}) = 0
unlink("/var/lib/dhelp/tmp/atlas_devel.pdf") = -1 ENOENT (No such file or directory)
The unlink() fails with ENOENT, which means that /var/lib/dhelp/tmp/atlas_devel.pdf is never created.
--
Laurent.