Document imaging is an important component of Enterprise Content Management systems, helping to capture paper-based documents. We look at the major document imaging tools used today.
A scanner is the primary document imaging tool. It is a device that converts paper images, printed text, handwriting, or even an object such as an ornament into a digital image.
A scanner reads red-green-blue (RGB) color data, which is then processed by the scanner's algorithm to adjust for different exposure conditions. Image quality depends on color depth, resolution, and density range, as well as on the quality of the algorithm. Continued research has fine-tuned these algorithms to the extent that scanned images can now look better than the originals.
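To make the role of resolution and color depth concrete, here is a rough sketch of how the two determine the uncompressed size of a scan. The helper name and the example page (US letter at 300 dpi) are illustrative assumptions, not figures from any particular scanner, and the calculation ignores file headers, row padding, and compression.

```python
def raw_scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size of a scan in bytes (hypothetical helper)."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8        # round up to whole bytes


# Assumed example: a US letter page (8.5 x 11 inches) scanned at 300 dpi.
color = raw_scan_size_bytes(8.5, 11, 300, 24)   # 24-bit RGB color
gray = raw_scan_size_bytes(8.5, 11, 300, 8)     # 8-bit grayscale
bitonal = raw_scan_size_bytes(8.5, 11, 300, 1)  # 1-bit black and white

print(color, gray, bitonal)
```

Under these assumptions, the same page takes roughly 25 MB in 24-bit color but only about 1 MB in 1-bit black and white, which is one reason text-only documents are often scanned at a lower color depth.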
OCR, or Optical Character Recognition, is a technology for converting images of text documents into machine-readable text. A high degree of accuracy has been achieved in recognizing printed or typewritten text, but the recognition of different kinds of handwriting is still imperfect.
A human review is usually needed to ensure 100-percent accuracy of the converted document.
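One common way to organize that review is to send only the doubtful words to a person. The sketch below assumes the OCR engine reports a confidence score between 0 and 1 for each recognized word; the function name, the 0.90 threshold, and the sample data are all hypothetical.

```python
def words_needing_review(ocr_words, threshold=0.90):
    """Return (position, word) pairs whose confidence falls below the threshold.

    ocr_words is a list of (word, confidence) tuples; this input format is an
    assumption for illustration, not the output of a specific OCR product.
    """
    return [
        (position, word)
        for position, (word, confidence) in enumerate(ocr_words)
        if confidence < threshold
    ]


# Made-up OCR output for one line of an invoice.
sample = [
    ("Invoice", 0.99),
    ("No.", 0.97),
    ("1O45", 0.62),     # letter O misread for a zero
    ("Total", 0.98),
    ("$l20.00", 0.71),  # lowercase L misread for a one
]

flagged = words_needing_review(sample)
print(flagged)
```

Here only two of the five words would be shown to the reviewer. A higher threshold catches more errors at the cost of more manual work.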
Computer systems store scanned document images, along with any machine-readable text converted from them, in their repositories. But how can a particular document be retrieved quickly?
Where there are only a few documents, this might not be much of a problem. However, when the number runs into thousands or millions, finding a particular document does indeed become a problem.
It is here that indexing comes into the picture. Index data attached to the documents allows search-engine style queries to locate specific documents among the mass of stored documents.
Search-engine indexing can take the form of full-text indexing, where every word in a document is indexed, or keyword (tag) indexing, where only a few tags attached to the document are indexed. These tags are selected to correctly identify the contents of the document.
Tag indexing assumes that all relevant tags will be attached to each document. If this is not the case, the document might not be found for many of the search queries to which it is relevant.
Even images and other kinds of non-text files can be tagged and indexed.
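The difference between the two indexing styles can be sketched with a small in-memory inverted index. This is a minimal illustration, not how any particular document management product works; the class, method names, and sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocumentIndex:
    """Toy index that keeps a full-text index and a tag index side by side."""

    def __init__(self):
        self.full_text = defaultdict(set)  # word -> ids of documents containing it
        self.tags = defaultdict(set)       # tag  -> ids of documents carrying it

    def add(self, doc_id, text="", tags=()):
        for word in tokenize(text):
            self.full_text[word].add(doc_id)
        for tag in tags:
            self.tags[tag.lower()].add(doc_id)

    def search_text(self, query):
        """Return ids of documents that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.full_text.get(words[0], set()))
        for word in words[1:]:
            result &= self.full_text.get(word, set())
        return result

    def search_tag(self, tag):
        """Return ids of documents carrying the given tag."""
        return set(self.tags.get(tag.lower(), set()))


index = DocumentIndex()
index.add("inv-001", text="Invoice from Acme Corp for office chairs",
          tags=["invoice", "acme"])
# The "acme" tag was never attached to this contract.
index.add("con-007", text="Service contract with Acme Corp", tags=["contract"])
# A scanned photo has no text, but it can still be tagged.
index.add("img-042", tags=["photo", "warehouse"])

print(index.search_text("acme corp"))  # both text documents match
print(index.search_tag("acme"))        # only the invoice matches
print(index.search_tag("warehouse"))   # the image is found by its tag
```

The contract mentions Acme in its text, so the full-text search finds it, but the tag search misses it because the tag was left off. The image, which has no text at all, is reachable only through its tags.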
Beyond tools that produce digital content (images or text), document imaging tools also exist to produce non-digital output.
A photocopier is a document imaging tool that produces paper copies. Such copies might be needed for distribution or reference purposes.
Computer Output Microfilm produces images of documents on microfilm, which is excellent for archiving.
While the above are the basic document imaging tools, modern document imaging tools come in many complex configurations and capability levels.
There is multifunction equipment, commonly known as MFPs (multi-function peripherals), that can scan, print, copy, and fax. Advanced mailroom equipment can extract documents from envelopes, scan the documents, convert images into machine-readable text, generate metadata, and index the documents.
Advanced modern scanners use many new technologies and algorithms to produce very high-quality images, often better than the original. Production-level scan equipment controls many scan stations working on the same or different projects.
Conclusion
Document imaging tools include scanners, OCR technologies, photocopiers, and computer-output microfilm. There are many vendors offering similar equipment, with differing configurations and capabilities.
Combined with indexing algorithms, document imaging can create searchable content repositories more easily and speedily than data entry or document transcription.
About the Author: Ademero, Inc. develops document imaging software. Based largely on user experience, the company's flagship product, Content Central, is a browser-based document management software system created to provide businesses and other organizations with a convenient way to capture, retrieve, and manage information originating in hard copy or digital form. Access a live preview of this document management solution by visiting the Ademero web site. beata shea
Tuesday, March 18, 2008