A new visual signature for content-based indexing of low resolution documents
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | 2012 |
Subjects: | |
Online Access: | http://eprints.uthm.edu.my/7097/1/J14168_5130d0b6fdee9bb0e61a4edec1d3837d.pdf http://eprints.uthm.edu.my/7097/ |
Institution: | Universiti Tun Hussein Onn Malaysia |
Summary: | This paper proposes a new visual signature for content-based indexing of low-resolution documents. Camera-Based Document Analysis and Recognition (CBDAR) is an established field that deals with textual information in scene images taken by low-cost handheld devices such as digital cameras and cell phones. Many applications, such as text translation, reading text for visually impaired and blind people, information retrieval from media documents, and e-learning, can be built using techniques developed in the CBDAR domain. The proposed approach to extracting textual information consists of three steps: image segmentation, text localization and extraction, and Optical Character Recognition (OCR). In pre-processing, the resolution of each image is checked so that it can be re-sampled to a common resolution format (720 × 540). The resulting image is then converted to grayscale and binarized using the Otsu segmentation method for further processing. In addition, the mean horizontal run lengths of both black and white pixels are examined to check that foreground objects have been segmented properly. In the post-processing step, the text localizer validates the candidate text regions proposed by the text detector. We employed a connected-component approach for text localization. The extracted text is then successfully recognized using ABBYY FineReader for OCR. Apart from OCR, we created novel feature vectors from the textual information for Content-Based Image Retrieval (CBIR). |
---|
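The binarization, run-length check, and connected-component steps named in the summary can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`otsu_threshold`, `binarize`, `mean_run_lengths`, `connected_components`) are hypothetical, dark text on a light background and 8-connectivity are assumptions, and re-sampling to 720 × 540 and the OCR stage are omitted.

```python
from collections import deque


def otsu_threshold(gray):
    """Return the Otsu threshold t for a 2-D list of ints in 0..255.

    Pixels <= t form one class and pixels > t the other; t maximizes
    the between-class variance of the two classes.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the class <= t
    sum0 = 0  # intensity sum of the class <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, t):
    """Map dark pixels (<= t) to 1 (assumed text) and the rest to 0."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def mean_run_lengths(binary):
    """Return (mean foreground run, mean background run) along rows."""
    runs = {0: [], 1: []}
    for row in binary:
        start = 0
        for x in range(1, len(row) + 1):
            if x == len(row) or row[x] != row[start]:
                runs[row[start]].append(x - start)
                start = x
    return tuple(sum(r) / len(r) if r else 0.0 for r in (runs[1], runs[0]))


def connected_components(binary):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected foreground blobs."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

In a pipeline like the one described, the bounding boxes would serve as candidate text regions, and unusually long or short mean run lengths could flag a poorly segmented image. The thresholds for such a check are not given in the summary and would need to be chosen empirically.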