Abstract
We live in the century of technology, where the enormous growth of data and science has recently fostered a strong interest in processing, transmitting, and storing information. While in the past only a human mind could extract meaningful information from image data, decades of dedicated research have enabled scientists to build complex systems that can identify different regions, tables, and text in scanned documents, making the extracted information easy to access and share. Books, newspapers, maps, letters, drawings: all types of documents can be scanned and processed so that they become available in digital format. Digital documents require far less storage space than physical ones, so these applications can replace millions of old paper volumes with a single storage disk, making them simultaneously accessible to anyone with an Internet connection and free from the risk of deterioration. Document image analysis systems can also help solve other problems, such as ecological concerns and constraints on accessibility and flexibility. This article presents the methods and techniques used to process paper documents and convert them into electronic ones, from the pixel level up to the level of the entire document. The main purpose of Document Image Analysis Systems is to recognize text and graphical elements in images, and to extract, format, and present the information they contain according to people's needs. We also aim to provide solid ground for practitioners who implement systems in this category, helping them enhance unsupervised processing features so that physical documents become easily available to the masses.