
Re: document archiving w/ scanner



On Wed, Jul 14, 2004 at 12:38:04AM -0400, Mark Roach wrote:
> On Sat, 2004-07-10 at 01:14 +0200, martin f krafft wrote:
> > also sprach William Ballard <40618.nospam@comcast.net> [2004.07.10.0041 +0200]:
> > > Search the archives for my and other's discussions about project 
> > > gutenbergs tests with gocr and other open source OCR programs.
> > 
> > great pointer. I guess the conclusion here is that gocr and clara
> > pretty much suck and for any serious work, I have to go with
> > OmniPage or other commercial products. Damn.
> 
> At my last employer, I used Ascent Capture (on windows) to scan images
> and index them against a postgresql+debian server and used a wxPython
> application I wrote to search and view them. We used indexing info
> (date, names, etc.) instead of the text of the documents, but Ascent
> Capture can do that too. Obviously there are non-free parts to that
> solution, but that was the best I was able to come up with. If you'd
> like some more info on that setup feel free to drop me a line off-list.

I scan every piece of paper with my name on it, and every receipt (even
for bubble gum), then shred the originals.  All you need is a clever
directory structure and some hacky little scripts.  I scan most things
at 150dpi as .png and produce half-size (75dpi) images for eyeballing.
A script builds web pages with <img> tags for the 75dpi images, each
wrapped in an <a> link to the full-size scan.  It works well enough.  I
produce about 3GB of scans per year.
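
For anyone who wants to try the same idea, here is a rough sketch of what such a page-building script could look like.  It is not my actual script: the function name build_index and the "thumbs" subdirectory holding the 75dpi copies are assumptions made up for the example, and it only writes the HTML, it does not resize anything.

```python
import html
from pathlib import Path


def build_index(scan_dir, thumb_subdir="thumbs"):
    """Return an HTML page linking each half-size image to its full scan.

    Assumes (hypothetically) that every full-size NAME.png in scan_dir
    has a reduced copy at thumb_subdir/NAME.png.
    """
    scan_dir = Path(scan_dir)
    links = []
    # Sort so that pages 01.png, 02.png, ... appear in scan order.
    for full in sorted(scan_dir.glob("*.png")):
        thumb = f"{thumb_subdir}/{full.name}"
        links.append(
            f'<a href="{html.escape(full.name)}">'
            f'<img src="{html.escape(thumb)}" alt="{html.escape(full.stem)}"></a>'
        )
    title = html.escape(scan_dir.name)
    body = "\n".join(links)
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Writing the returned string to index.html inside each document directory gives a page you can open straight from the filesystem, with no web server needed.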

The hardest part is shelling around a directory structure like

/paper/d4/BigOldBank/40713/0{1,2,3}.png
/paper/d4/BigOldBank/Slips/Cash/40713-McDonalds.png
/paper/d4/PowerCompany/40614/...
/paper/d4/E-Broker/31231~30101/01.png

That's a lot of keystrokes when you're scanning.  I wrote a little GUI
app, which I plan to put on SourceForge, that lets me pick the path
elements from lists and enter dates with a calendar, then renames the
scanned files accordingly.
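
The renaming step can be sketched without any GUI.  This is only an illustration, not the app itself: it assumes the date stamps in the example paths are the last digit of the year followed by MMDD (so 40713 would be 2004-07-13), which is a guess from the examples, and it treats "d4" as an opaque label passed in by the caller.  The names date_stamp and scan_path are made up for the example.

```python
from datetime import date
from pathlib import PurePosixPath


def date_stamp(d):
    """Encode a date as last digit of the year + MMDD (assumed scheme)."""
    return f"{d.year % 10}{d.month:02d}{d.day:02d}"


def scan_path(root, volume, source, d, page):
    """Build root/volume/source/STAMP/NN.png for one scanned page.

    'volume' is an opaque label such as "d4"; its meaning is not
    interpreted here.
    """
    return PurePosixPath(root, volume, source, date_stamp(d), f"{page:02d}.png")


if __name__ == "__main__":
    # Target names for a three-page statement dated 2004-07-13.
    for n in (1, 2, 3):
        print(scan_path("/paper", "d4", "BigOldBank", date(2004, 7, 13), n))
```

A front end then only has to collect the source name and the date, and move each fresh scan to the path this returns.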

I find things like PaperPort constraining.  I like my hacky scripts:
like a lot of Linux things, they are a bit hacky, but they make you happy!


