
Re: document archiving w/ scanner



On Fri, 9 Jul 2004, martin f krafft wrote:

> also sprach Andrew Perrin <clists@perrin.socsci.unc.edu> [2004.07.09.1752 +0200]:
> > I use an Epson SU1640 Office, which includes a document feeder and
> > can be connected via either USB or SCSI.  It works fine under
> > debian, using the SANE backends, although the one put out by epson
> > (the "epkowa" driver) works better than the epson one included
> > with SANE.  I wrote a simple script to turn documents into PDF's;
> > it's not exactly perfect, but it does the job:
> > http://www.unc.edu/home/aperrin/tips/src/pdfscan-pl.txt
>
> Since I assume PNM files to be graphic files, `convert`ing them and
> `ps2pdf`ing them will result in PDFs storing image data. These PDFs
> are not going to be searchable. Can you confirm this?
>

Correct - the PDFs contain only image data, so if you want searchable text
you need to run the scans through an OCR program. I've used gocr with
moderate success, but it's by no means perfect. Others have recommended
clara, which is probably better but requires too much user involvement for
my taste!
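
For what it's worth, here is a rough sketch of how the per-page steps could
be chained, with gocr writing a plain-text file next to the image-only PDF so
that at least the text file can be grepped. This is not what the pdfscan
script does; the function names and the file naming scheme are made up for
illustration, and it assumes `convert` (ImageMagick), `ps2pdf` (Ghostscript)
and `gocr` are on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(pnm_path):
    """Return the command lines for one scanned page; nothing is run here.

    Hypothetical naming scheme: page.pnm -> page.ps, page.pdf, page.txt
    """
    pnm = Path(pnm_path)
    ps = pnm.with_suffix(".ps")
    pdf = pnm.with_suffix(".pdf")
    txt = pnm.with_suffix(".txt")
    return [
        # PNM image -> PostScript (still just a picture of the page)
        ["convert", str(pnm), str(ps)],
        # PostScript -> PDF (image data only, so not searchable)
        ["ps2pdf", str(ps), str(pdf)],
        # OCR the original image into a separate text file
        ["gocr", "-i", str(pnm), "-o", str(txt)],
    ]


def archive_page(pnm_path):
    """Run the three steps in order, stopping at the first failure."""
    for cmd in build_commands(pnm_path):
        subprocess.run(cmd, check=True)
```

Calling `archive_page("page1.pnm")` would leave page1.pdf and page1.txt
alongside the scan. The OCR text is not embedded in the PDF, so the PDF
itself stays unsearchable; the text file is only as good as gocr's output.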

ap

----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu




