Re: How can I convert old scanned documents to OCR?
Mon May 23, 2016 1:23 am
Trying out OmniPage, which I lucked out on for $59 for the entry-level version. I have an aversion to anything from Adobe.
In the first document the photocopy quality was very poor, so there were a lot of hits requiring correction, and just as many hits that didn't need correcting but still needed visual review and confirmation. It took several hours to get through my first pass using the proofreading function, which also doubled as a training session.
The software flagged a lot of stuff that would otherwise have been printed as a swearword or racial epithet. Needless to say, I'll have to spend LOTS of time proofreading and re-proofreading to keep embarrassing results out of the output.
Certain pages will have to just be carried forward without conversion, because the OCR engine wants to translate things that should remain image-based.
Signatures, crests, logos, and notary imprints, for example. And some of the speckles on the pages have been translated into characters that really aren't there.
Sort of like the software sees commas, semicolons, periods, and hyphens in poor photocopies just like we see puppy dogs and snowmen in a sky of puffy clouds. The software did quite well considering the poor quality of the copies.
I've already put nearly an entire workday into this project, and probably have another 20-30 hours to go, but I can see now that even with the imperfect source material, this software is going to save me a lot of labor hours; maybe three weeks' worth of time or more.
I also see now that using Dragon to dictate this project would not have been a good use of my time. I have made the better choice, and now I just need to settle in and work it to conclusion.