[gopher] Improved binary file detection in Bucktooth 0.2.2

To: gopher@complete.org
Subject: [gopher] Improved binary file detection in Bucktooth 0.2.2
From: brian@pongonova.net
Date: Fri, 28 Dec 2007 01:23:39 -0600
Message-id: <20071228072339.GA25327@pongonova.net>
Reply-to: gopher@complete.org

I'm using buckd to serve up binary files, and noticed that several
binary files (mostly older PDFs with a lot of text in the file header)
were being identified as item type "0" rather than "9". It turns out
that buckd uses the Perl -B operator to detect binary files. To do
this, Perl examines the first block of the file for "odd" characters
(control codes, bytes with the high bit set, etc.). A NUL byte anywhere
in that block makes the file binary; otherwise Perl identifies it as
binary only if more than 30% of the examined bytes are odd.
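For anyone who wants to see why a text-heavy PDF slips through, here is a rough Perl approximation of the -B heuristic described above. It is a sketch, not perl's actual implementation: the 512-byte block size, the exact set of bytes counted as "odd", and the helper names looks_binary() and file_looks_binary() are assumptions for illustration.

```perl
use strict;
use warnings;

# Rough approximation of perl's -B heuristic. Hypothetical helper names;
# the block size and the "odd" byte set are assumptions, not perl's exact rules.
sub looks_binary {
    my ($buf) = @_;
    # perl's own -B is true for an empty file; this sketch says "not binary".
    return 0 if length($buf) == 0;
    # Any NUL byte in the examined block => binary.
    return 1 if index($buf, "\0") >= 0;
    # Count bytes that are neither printable ASCII nor common whitespace
    # or control characters (tab, newline, CR, form feed, backspace, escape).
    my $odd = () = $buf =~ /[^\x20-\x7E\t\n\r\f\x08\x1B]/g;
    # More than 30% odd bytes => binary.
    return ($odd / length($buf) > 0.30) ? 1 : 0;
}

# Examine only the first block of a file, as -B does.
sub file_looks_binary {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    my $n = read($fh, my $buf, 512);
    die "cannot read $path: $!" unless defined $n;
    close($fh);
    return looks_binary($buf);
}
```

A buffer such as "%PDF-1.2\n" followed by a few hundred bytes of plain ASCII contains no NUL and almost no odd bytes, so looks_binary() returns 0, which is how such a PDF ends up served as type "0".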

This wasn't accurate enough for my purposes, so I modified buckd.in so
that it calls the UNIX "file" command and greps its output for the
string "text" (file includes that word in its description of files it
identifies as text, e.g. "ASCII text").
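One thing to watch with this approach: the backtick call in the patch further down runs `file` through a shell and matches /text/ against the full output line, which by default begins with the path itself ("PATH: description"). A file stored under a directory like texts/ would therefore always be typed "0". The sketch below is a hedged alternative, not part of the posted patch: it uses `file -b` (brief mode, which omits the path) and the list form of open() so no shell is involved. The helper names description_is_binary() and file_says_binary() are hypothetical.

```perl
use strict;
use warnings;

# Classify a file(1) description: anything without the word "text"
# is served as item type 9, mirroring the /text/ test in the patch.
sub description_is_binary {
    my ($desc) = @_;
    return ($desc =~ /text/) ? 0 : 1;
}

# Run `file -b PATH` without a shell. -b (brief) omits the leading
# "PATH: " so the path itself cannot match /text/. Returns 1 or 0,
# or undef if file(1) could not be run or printed nothing.
sub file_says_binary {
    my ($path) = @_;
    open(my $fh, '-|', 'file', '-b', $path) or return;
    my $desc = do { local $/; <$fh> };
    close($fh);
    return unless defined $desc && length $desc;
    return description_is_binary($desc);
}
```

Inside buckd, one possible (untested) way to use it is to fall back to the old test when file(1) isn't available: `my $bin = file_says_binary($xentr); $bin = -B $xentr unless defined $bin;`.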

I just want to emphasize that this is *not* a problem with Bucktooth,
but rather a limitation of Perl's -B heuristic.

Here's the patchfile with the change.  I opted to modify buckd.in and
simply regenerate buckd.

--- buckd.in    2007-12-28 01:21:30.000000000 -0600
+++ buckd.in.new        2007-12-28 01:20:58.000000000 -0600
@@ -289,7 +289,7 @@
                ($xentr =~ /\.jpe?g$/i) ? "I" :
                ($xentr =~ /\.html?$/i) ? "h" :
                ($xentr =~ /\.hqx$/i) ? "4" :
-               (-B $xentr) ? "9" :
+               (grep(!/text/, `file $xentr`)) ? "9" :
        "0";
        $xentr =~ s/^$DIR//;
        return ($itype, ($pentr eq $xentr) ? '' : $xentr);

  --Brian


Follow-Ups:
- [gopher] Re: Improved binary file detection in Bucktooth 0.2.2
  - From: Cameron Kaiser <spectre@floodgap.com>
