[Mageia-discuss] Reading payment forms with a scanner

Juergen Harms juergen.harms at unige.ch
Fri Feb 15 17:47:04 CET 2013

> In the display I suggest the user be presented with the original _scanned image_
> and the end result.

Right, that is planned. For now, the entire display is a quick 
hack. So far I only display the middle field (middle of 3), which I am 
sure holds the reference data, the data most likely to be subject to 
typos. I will probably rearrange the display to show all fields. 
Another option to consider, if experience shows that it is worthwhile, 
is to help interactively when parsing lines that result from poor OCR 
conversion (get rid of garbage, separate lines into fields); but: small 
is beautiful. I need to get a clearer understanding of the syntax and 
semantics of the fields of scanned lines.
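The "get rid of garbage, separate lines into fields" step could look roughly like the minimal sketch below. It assumes the OCR line is the coding line of an orange ESR/BVR slip, with three fields terminated by `>`, `+` and `>` respectively, and that the reference (middle) field ends with a recursive modulo-10 check digit, which would give a cheap way to catch typos in exactly the field most exposed to them. The function names are hypothetical; this is not the code described in the post.

```python
import re

# Table for the recursive modulo-10 check digit
# (assumption: the slips are Swiss ESR/BVR inpayment slips).
_MOD10_TABLE = [0, 9, 4, 6, 8, 2, 7, 1, 3, 5]


def mod10_recursive(digits: str) -> int:
    """Return the recursive modulo-10 check digit for a string of digits."""
    carry = 0
    for ch in digits:
        carry = _MOD10_TABLE[(carry + int(ch)) % 10]
    return (10 - carry) % 10


def reference_is_valid(reference: str) -> bool:
    """True if the last digit of `reference` matches the check digit of the rest."""
    if not re.fullmatch(r"[0-9]{2,}", reference):
        return False
    return mod10_recursive(reference[:-1]) == int(reference[-1])


def split_coding_line(raw: str) -> list:
    """Drop OCR garbage and split a coding line into its three fields.

    Assumed layout: <field1>field2+ field3>, i.e. the fields end with
    '>', '+' and '>' respectively. Anything that is not a digit or one
    of those two separators (spaces, stray marks) is discarded first.
    """
    cleaned = re.sub(r"[^0-9>+]", "", raw)
    match = re.fullmatch(r"([0-9]+)>([0-9]+)\+([0-9]+)>", cleaned)
    if match is None:
        raise ValueError(f"unrecognised coding line: {raw!r}")
    return list(match.groups())
```

With this, the display could flag a middle field for which `reference_is_valid` returns False instead of silently accepting it. Because the table is a permutation, a single mistyped or misread digit always changes the check digit.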

In the meantime I have focused on experimenting, and had a hard time 
with the poor reproducibility of the results. But the explanation has 
become clear: tiling several slips on the scanner tends to produce bad 
angular registration, and tesseract (and gocr even more so) is very 
sensitive to this and gets confused if lines are badly aligned, which 
is understandable. I hope I won't hit more problems of that kind, but 
it is too early to be sure.
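Besides careful placement, misalignment can also be corrected in software before OCR. A common technique (not something the post says was tried) is projection-profile deskewing: rotate the ink pixels by candidate angles and keep the angle at which the row histogram is most sharply peaked. The minimal pure-Python sketch below assumes the scan has already been thresholded into a list of (x, y) ink-pixel coordinates; the function name, the ±5° search range and the 0.1° step are hypothetical choices.

```python
import math


def estimate_skew(dark_pixels, max_angle=5.0, step=0.1):
    """Return the angle (degrees) that best brings text rows horizontal.

    dark_pixels: iterable of (x, y) coordinates of ink pixels.
    Each candidate angle rotates the points with the usual rotation
    formula and bins them by their new row; aligned text gives a
    "spiky" row histogram, so the angle with the largest sum of
    squared row counts wins.
    """
    pixels = list(dark_pixels)
    best_angle, best_score = 0.0, -1
    n_steps = int(round(max_angle / step))
    for i in range(-n_steps, n_steps + 1):
        angle = i * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        rows = {}
        for x, y in pixels:
            # Row index of the point after rotation by theta.
            r = round(x * sin_t + y * cos_t)
            rows[r] = rows.get(r, 0) + 1
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The image would then be rotated by the returned angle before being handed to tesseract. For real scans, ImageMagick's `-deskew` option is a ready-made alternative to writing this by hand.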

Interim results:
- no problem if I scan a single slip,
- no problem if I take great care when tiling multiple slips on the 
scanner (worthwhile since scanning is so slow, and simple because the 
line meant for OCR is at the bottom of the slip and has plenty of white 
space above and below it); maybe I will make a mechanical contraption 
to help get the alignment right,
- overall: it looks good, so I decided to put in some more time,
- tesseract clearly gives better results than gocr,
- the choice of parameters (resolution, resizing etc.) is important, but 
what I have (partly the result of googling) is close to optimal,
- the xsane parameters are painful to handle (at present, my .sane 
directory is a symlink that I switch between a configuration for 
straightforward scanning and one for slip handling).
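Switching the .sane symlink by hand could be reduced to one call with a small helper. A minimal sketch, assuming two hypothetical profile directories such as ~/.sane-plain and ~/.sane-slips; the helper name is also hypothetical.

```python
import os
from pathlib import Path


def switch_profile(link: Path, target: Path) -> None:
    """Point the symlink `link` at the directory `target`.

    A temporary symlink is created next to `link` and renamed over it;
    os.replace() is atomic on POSIX when both paths are on the same
    filesystem, so `link` never disappears in between.
    """
    if link.exists() and not link.is_symlink():
        # Refuse to clobber a real directory or file.
        raise RuntimeError(f"{link} is not a symlink, refusing to replace it")
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # leftover from an interrupted earlier switch
    tmp.symlink_to(target, target_is_directory=True)
    os.replace(tmp, link)
```

For example, `switch_profile(Path.home() / ".sane", Path.home() / ".sane-slips")` before a batch of slips, and the same call with the plain profile afterwards. Renaming a fresh symlink over the old one, rather than deleting and recreating it, means xsane never finds .sane missing.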

I will re-post once I have reached some kind of "interim product" and 
am confident that it is solid (sorry, it is for Swiss payment slips for 
now, but I am keeping extensibility in mind; there won't be tons of 
code).

