Re: document archiving w/ scanner

To: debian users <debian-user@lists.debian.org>
Subject: Re: document archiving w/ scanner
From: William Ballard <40618.nospam@comcast.net>
Date: Fri, 9 Jul 2004 15:41:28 -0700
Message-id: <20040709224128.GA25081@comcast.net>
Mail-followup-to: debian users <debian-user@lists.debian.org>
In-reply-to: <20040709221528.GB26806@cirrus.madduck.net>
References: <20040709140136.GC1906@cirrus.madduck.net> <Pine.LNX.4.53.0407091149590.24135@perrin.socsci.unc.edu> <20040709174707.GD14880@cirrus.madduck.net> <Pine.LNX.4.53.0407091621010.24135@perrin.socsci.unc.edu> <20040709221528.GB26806@cirrus.madduck.net>

On Sat, Jul 10, 2004 at 12:15:28AM +0200, martin f krafft wrote:
> also sprach Andrew Perrin <clists@perrin.socsci.unc.edu> [2004.07.09.2221 +0200]:
> > Correct - if you want searchable text you need some OCR filter.
> > I've used gocr with some, moderate, success, but it's by no means
> > perfect. Others have recommended clara, which is probably better
> > but requires too much user involvement for my taste!
> 
> Yes, I am starting to notice that we need to get into the OCR
> domain. I am new to scanning, so please excuse me not making that
> jump before posting.
> 
> So far it sounds like HP has open source drivers for their
> all-in-ones... if I can find one with automated pagefeeding, I am
> off to try clara...

Search the archives for my and others' discussions of Project
Gutenberg's tests with gocr and other open source OCR programs.  They are
all perfect with perfect texts, but basically unusable with
"typical" texts.  If the text is not perfectly straight and set in a
great big font, i.e., printed with OCR in mind, gocr does an abysmal
job -- whereas closed source OCR software reached 95% accuracy on these
"typical" texts in, oh, I don't know, 1996.

The OCR software that comes with Microsoft Office beats the crap out of 
GOCR, even with cleanly printed books with nice fonts that you'd expect 
to be easy to scan.

What's missing in GOCR is a "slanted text straightener" (deskewing) algorithm.
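
To make concrete what such a straightening step has to do, here is a
minimal sketch of one common approach, a projection-profile skew
estimate.  It is not taken from GOCR or from any commercial package;
the function names, the 10 degree search range and the 0.5 degree step
are all assumptions chosen for illustration.  The idea: try a range of
candidate angles, rotate the ink pixels by each one, and keep the angle
at which the ink piles up into the fewest, densest horizontal rows.

```python
import math


def row_profile_score(pixels, angle_deg):
    """Score how well ink pixels line up in rows after a trial rotation.

    pixels is an iterable of (x, y) ink coordinates (y grows downward).
    The score is the sum of squared per-row pixel counts, which is
    largest when the ink is concentrated in a few dense rows.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    counts = {}
    for x, y in pixels:
        # Row index of this pixel after undoing a skew of angle_deg.
        row = round(y * cos_t - x * sin_t)
        counts[row] = counts.get(row, 0) + 1
    return sum(c * c for c in counts.values())


def estimate_skew(pixels, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) with the best row score.

    Hypothetical helper: a brute-force search over
    [-max_angle, +max_angle] in increments of step.
    """
    pixels = list(pixels)
    steps = int(round(2 * max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(steps + 1):
        angle = -max_angle + i * step
        score = row_profile_score(pixels, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

A caller would threshold the scan to get the ink pixels, call
estimate_skew() on them, and rotate the page back by the estimated
angle before handing it to the OCR program.  The brute-force search is
slow on full-page scans; it is only meant to show the principle.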
