OCR Improvement Through
Copyright 1997, Picture Elements, Inc.
Picture Elements is working with its partners in the OCR arena to exploit three OCR performance advantages offered by Picture Elements capture technologies.
OCR performance can be improved as measured in the following ways:
- higher recognition rates
- lower substitution rates
- higher confidence levels
The Picture Elements No Re-Scan Scanning (tm) architecture enables a variety of approaches to OCR improvement. The relevant Picture Elements technologies are:
- Edge Thresholding
- Multiple Binary Imaging
- Simultaneous Binary and Grayscale Imaging
Edge thresholding, as produced by the VST-1000 chip, results in more precisely formed characters, with accurate stroke widths, which are more completely separated from their background and neighboring characters. These characteristics apply even in cases of lower contrast characters or higher contrast backgrounds.
These benefits accrue to all systems using the VST-1000 chip for producing binary images without any additional work being required by the OCR vendor or systems integrator.
Several scanner manufacturers use these chips in their scanners. This is the lowest cost solution. Ask your scanner vendor if they use Picture Elements technology for their binary images.
Other scanners, if they have grayscale outputs, can be attached to the Picture Elements ISE Board to produce binary images thresholded by the VST-1000 chip.
The ISE Board can accept the D3MST Daughterboard, which uses three VST-1000 chips to simultaneously produce three binary images, called Normal (N), High (H) and Low (L).
- The Normal image, at normal contrast sensitivity, is suitable for 95 to 99 percent of an average document flow.
- The High image, at a higher contrast sensitivity, produces better rendering of characters created with low-contrast marking devices, such as faint pencil or light-colored pens.
- The Low image, at a lower contrast sensitivity, produces a less cluttered image for documents where the background is of high contrast.
This permits the OCR engine to request the High and Low sensitivity images from the scanning subsystem (on an exception basis) for documents where statistically lower confidence levels are seen in the recognition results from the Normal sensitivity image. If this is done for only 1% to 5% of all documents, the increase in needed OCR capacity or network throughput is small (2% to 10%, since each exception document adds two extra images). This is the preferred method.
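The exception flow described above can be sketched as follows. This is only an illustration under stated assumptions, not a Picture Elements or OCR vendor API: `fetch_image`, `recognize`, and the 0.90 confidence threshold are hypothetical names and values chosen for the example.

```python
from typing import Callable, Tuple

# Hypothetical result type: (recognized text, confidence in the range 0..1).
Result = Tuple[str, float]

def recognize_with_fallback(
    doc_id: str,
    fetch_image: Callable[[str, str], object],  # (doc_id, "N" | "H" | "L") -> image
    recognize: Callable[[object], Result],      # image -> (text, confidence)
    min_confidence: float = 0.90,               # assumed threshold, tune per engine
) -> Tuple[str, Result]:
    """Try the Normal image first; request High and Low only on low confidence."""
    best_kind = "N"
    best = recognize(fetch_image(doc_id, "N"))
    if best[1] >= min_confidence:
        return best_kind, best  # common case: no extra images requested

    # Exception case: request the other two sensitivities and keep the best result.
    for kind in ("H", "L"):
        candidate = recognize(fetch_image(doc_id, kind))
        if candidate[1] > best[1]:
            best_kind, best = kind, candidate
    return best_kind, best
```

The function returns both the winning sensitivity and its result, so a caller could log how often the High or Low image was actually needed.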
Alternatively, all three binary images can always be sent on to OCR. While this results in a need for triple the OCR and network throughput, it could result in somewhat better aggregate OCR results, since a greater number of marginal cases will see improvement.
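The throughput trade-off between the two methods is simple arithmetic. The sketch below assumes that every image costs the same to transmit and recognize, and that each exception document adds exactly two extra images (High and Low); `relative_load` is a name introduced only for this example.

```python
def relative_load(exception_rate: float, extra_images: int = 2) -> float:
    """Images processed per document, relative to Normal-only processing."""
    return 1.0 + exception_rate * extra_images

# 1% and 5% exception rates, then the "always send all three" case (100%).
for rate in (0.01, 0.05, 1.0):
    print(f"exception rate {rate:.0%}: load x{relative_load(rate):.2f}")
```

Under these assumptions, exception rates of 1% and 5% raise the load by 2% and 10%, while always sending all three images triples it.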
The ISE Board can accept the DJPEG Daughterboard, which provides hardware support for JPEG grayscale or color compression, offering full resolution capture of grayscale images simultaneous with the production of binary images from the same data. Both images can be captured to disk on the scanning workstation.
This permits sophisticated OCR algorithms to normally use the binary data, but to request the use of the compressed grayscale for a given document, whenever a particular binary image gives a low confidence level.
This can enable three possible techniques for exploiting the grayscale data.
- The OCR engine (or an alternative engine for use on these exception items) can directly use the grayscale to segment the characters in the region of the low confidence characters, reverting to the binary image for subsequent steps. For a description of a grayscale segmentation algorithm, see Seong-Whan Lee, et al., "A New Methodology for Gray-Scale Character Segmentation and Recognition," IEEE Trans. PAMI, Vol. 18, No. 10, Oct. 1996, pp. 1045-1050. After segmentation in grayscale, the corresponding binary data can be used for recognition, since the registration between the gray and binary images is perfect.
- The OCR engine can request, on an exception basis, that the ISE Board in the scanning workstation re-process new binary images (or snippets of binary images) from the grayscale. Several of these can be produced, with the recognition results having the highest confidence being used.
- The OCR engine can (for low-confidence exception items) directly use the grayscale image for the recognition step, not just for segmentation. Some vendors claim to do this, but in fact apply a software thresholding algorithm (perhaps iteratively) to the grayscale data before working with it. In such a case, the sophisticated VST-1000 algorithm, operating at several billion operations per second, will give much better results when used to re-process binary images as described in the previous item.
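The second technique, producing several binary candidates from the stored grayscale and keeping the one recognized with the highest confidence, can be sketched as below. Note the assumptions: a plain global threshold stands in for the VST-1000 edge thresholding (whose algorithm is not described here), and `recognize`, the pixel convention, and the threshold levels are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Gray = List[List[int]]    # rows of pixel values 0..255 (0 = black), assumed layout
Binary = List[List[int]]  # rows of 0/1 values (1 = ink)

def threshold(gray: Gray, level: int) -> Binary:
    """Plain global threshold (a stand-in only): darker than `level` becomes ink."""
    return [[1 if px < level else 0 for px in row] for row in gray]

def best_rethreshold(
    gray: Gray,
    recognize: Callable[[Binary], Tuple[str, float]],  # hypothetical OCR callback
    levels: Sequence[int] = (96, 128, 160),            # assumed candidate settings
) -> Tuple[int, str, float]:
    """Recognize each candidate binary and keep the highest-confidence result."""
    results = []
    for level in levels:
        text, confidence = recognize(threshold(gray, level))
        results.append((level, text, confidence))
    return max(results, key=lambda r: r[2])
```

Because only a snippet around the low-confidence characters needs re-processing, `gray` would typically be a small region rather than the whole page.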
Note that the key to this architecture is the use of compressed grayscale. Uncompressed grayscale (being perhaps 80 times larger than a Group 4 compressed binary image) cannot be written to disk at full production scanner speeds, forcing the scanner to slow down sharply as soon as memory is full. Compressed grayscale (at, say, 20:1 compression) is only a few times larger than the compressed binary image, requiring a much less heroic effort to increase disk throughput.
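The size comparison above works out as follows. The sketch takes the 80:1 and 20:1 figures from the text as rough, illustrative values; `gray_to_binary_ratio` is a name introduced only for this example.

```python
def gray_to_binary_ratio(uncompressed_ratio: float, jpeg_compression: float) -> float:
    """Size of JPEG-compressed grayscale relative to the Group 4 binary image."""
    return uncompressed_ratio / jpeg_compression

# Uncompressed grayscale is roughly 80x the Group 4 binary image (figure from the text).
print(gray_to_binary_ratio(80, 20))  # at 20:1 JPEG compression
print(gray_to_binary_ratio(80, 10))  # at 10:1 JPEG compression
```

With these figures the compressed grayscale is about 4 times the size of the binary image at 20:1 compression, and about 8 times at 10:1.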
Picture Elements has shown that the thresholding results achievable from 10:1 to 20:1 JPEG compressed grayscale are indistinguishable from those produced from raw scanner data. Therefore we call this grayscale image the Surrogate Original (tm), since it can stand in for the original paper document to support re-processing without physically re-scanning. It is the key to the No Re-Scan Scanning (tm) architecture which results in significant scanning labor savings and improved OCR performance.
Note: No Re-Scan Scanning and Surrogate Original are trademarks of Picture Elements, Inc.
For more information, please contact:
Lou Sharpe, firstname.lastname@example.org
Picture Elements, Inc.
Copyright © 1996-1997 Picture Elements, Inc.