[gopher] Improved binary file detection in Bucktooth 0.2.2
I'm using buckd to serve binary files, and noticed that several of
them (mostly older PDFs with a lot of text in the file header) were
being identified as item type "0" rather than "9". It turns out
that buckd uses the Perl -B operator to detect binary files. To do
this, Perl examines the first block or so of the file for certain
characteristics (NUL bytes, high-order bits set, etc.), and if the
proportion of such bytes exceeds roughly 30% (perlfunc says "more than
a third"), Perl identifies the file as binary.
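To illustrate why a text-heavy header can fool this kind of test, here
is a rough approximation of the heuristic. It is only a sketch: the
real check is built into perl, and the 512-byte block size, the exact
set of "odd" bytes, and the name looks_binary are my assumptions, not
anything taken from Perl or Bucktooth.

```perl
use strict;
use warnings;

# Rough stand-in for -B (assumption: not perl's actual implementation).
# Returns 1 for "binary", 0 for "text", undef if the file can't be opened.
sub looks_binary {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return;
    my $n = read($fh, my $buf, 512);    # block size is an assumption
    close($fh);
    return 0 unless $n;                 # empty/unreadable: "text" in this sketch
    return 1 if $buf =~ /\0/;           # a NUL byte in the block means binary
    # Count "odd" bytes: control codes other than common whitespace,
    # backspace and escape, plus anything with the high bit set.
    my $odd = () = $buf =~ /[^\t\n\r\f\x08\x1b\x20-\x7e]/g;
    return ($odd / $n > 1/3) ? 1 : 0;   # more than a third odd => binary
}
```

A file whose first block is almost entirely printable text comes back
as 0 here no matter what follows later in the file, which matches what
I was seeing with those PDFs.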
This wasn't accurate enough for my purposes, so I modified buckd.in so
that it calls the UNIX "file" command and greps for the string "text"
(guaranteed to be returned if a file is identified as a text file).
I just want to emphasize that this is *not* a problem with Bucktooth,
but rather an issue with Perl.
Here's the patchfile with the change. I opted to modify buckd.in and
simply regenerate buckd.
--- buckd.in 2007-12-28 01:21:30.000000000 -0600
+++ buckd.in.new 2007-12-28 01:20:58.000000000 -0600
@@ -289,7 +289,7 @@
 ($xentr =~ /\.jpe?g$/i) ? "I" :
 ($xentr =~ /\.html?$/i) ? "h" :
 ($xentr =~ /\.hqx$/i) ? "4" :
- (-B $xentr) ? "9" :
+ (grep(!/text/, `file $xentr`)) ? "9" :
 $xentr =~ s/^$DIR//;
 return ($itype, ($pentr eq $xentr) ? '' : $xentr);
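
One caveat with the one-liner in the patch: "file" prints the filename
before the description, so a path that itself contains "text" would
match, and the backticks pass the path through the shell. A slightly
more defensive variant might look like the sketch below. The helper
names desc_is_binary and file_says_binary are hypothetical (they are
not part of Bucktooth), and it assumes a "file" that accepts -b (brief
output, no filename prefix).

```perl
use strict;
use warnings;

# Hypothetical helper: classify a description string printed by file(1).
# Returns 0 if it mentions "text", 1 otherwise.
sub desc_is_binary {
    my ($desc) = @_;
    return ($desc =~ /text/) ? 0 : 1;
}

# Hypothetical helper: run "file -b" on a path and classify the result.
# The list form of open skips the shell, so spaces or metacharacters
# in the path are passed through untouched.
sub file_says_binary {
    my ($path) = @_;
    open(my $fh, '-|', 'file', '-b', '--', $path) or return;
    my $desc = do { local $/; <$fh> };  # slurp the whole description
    close($fh);
    return unless defined $desc;        # no output: let the caller decide
    return desc_is_binary($desc);
}
```

In buckd.in the replaced line could then read something like
(file_says_binary($xentr)) ? "9" :, though I haven't tested that form
against the rest of the script.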