Document image segmentation and classification

Bibliographic Details
Main Author: Chang, Kim Wah.
Other Authors: Shao, Lejun
Format: Theses and Dissertations
Language: English
Published: 2009
Online Access: http://hdl.handle.net/10356/19756
Institution: Nanyang Technological University
Description
Summary: A fast and robust document image segmentation and classification algorithm based on a bottom-up strategy is proposed. Several techniques are used to overcome the slow speed and large memory requirement of the traditional bottom-up strategy. In line segment extraction, byte-based operations are used instead of bit-based operations: each data byte of the document image serves as an index into precomputed tables, which return the attributes of the line segment(s) contained in that byte, and a state machine is used in conjunction with the look-up tables to form linked lists of line segments. In the connected component forming process, line segments formed in two consecutive scan lines are merged into connected components immediately, which greatly reduces the memory requirement. In the classification stage, the attributes extracted from the data bytes during segmentation are reused, which simplifies classification.
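
The byte-indexed table lookup and the line-by-line merging described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several details are assumptions made only for illustration: pixels are packed MSB-first with 1 meaning black, the table stores (start, end) bit offsets as the only segment attributes, a single `pending` variable stands in for the state machine, plain lists replace linked lists, and a union-find structure with 8-connectivity is swapped in for the merging step. The names `RUN_TABLE`, `extract_runs`, and `label_components` are hypothetical.

```python
from typing import Iterable, List, Tuple

Run = Tuple[int, int]  # (start, end) pixel columns, inclusive


def _runs_in_byte(b: int) -> List[Run]:
    """Runs of 1-bits in one byte as bit offsets 0..7 (assumed MSB = leftmost)."""
    runs, start = [], None
    for i in range(8):
        bit = (b >> (7 - i)) & 1
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, 7))
    return runs


# Precomputed once: the data byte is the index, the entry lists its runs.
RUN_TABLE = [_runs_in_byte(b) for b in range(256)]


def extract_runs(row: bytes) -> List[Run]:
    """Black runs of one packed scan line, found one byte at a time."""
    runs: List[Run] = []
    pending = None  # start of a run that reached the previous byte's last bit
    for k, b in enumerate(row):
        base = 8 * k
        entries = RUN_TABLE[b]
        # The pending run ends at the byte boundary unless this byte starts black.
        if pending is not None and (not entries or entries[0][0] != 0):
            runs.append((pending, base - 1))
            pending = None
        for s, e in entries:
            if s == 0 and pending is not None:
                start, pending = pending, None  # continue across the boundary
            else:
                start = base + s
            if e == 7:
                pending = start  # may continue into the next byte
            else:
                runs.append((start, base + e))
    if pending is not None:
        runs.append((pending, 8 * len(row) - 1))
    return runs


def label_components(rows: Iterable[bytes]) -> List[Tuple[int, int, int, int]]:
    """Merge runs of consecutive scan lines; return sorted bounding boxes."""
    parent: List[int] = []
    boxes: List[List[int]] = []  # per label: [x0, y0, x1, y1]

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    prev: List[Tuple[int, int, int]] = []  # only the previous line's runs are kept
    for y, row in enumerate(rows):
        cur = []
        for s, e in extract_runs(row):
            label = None
            for ps, pe, pl in prev:
                if ps <= e + 1 and pe >= s - 1:  # touching under 8-connectivity
                    root = find(pl)
                    if label is None:
                        label = root
                    elif root != label:
                        parent[root] = label  # two components meet: merge them
                        a, r = boxes[label], boxes[root]
                        boxes[label] = [min(a[0], r[0]), min(a[1], r[1]),
                                        max(a[2], r[2]), max(a[3], r[3])]
            if label is None:
                label = len(parent)  # no neighbour above: new component
                parent.append(label)
                boxes.append([s, y, e, y])
            else:
                a = boxes[label]
                boxes[label] = [min(a[0], s), a[1], max(a[2], e), y]
            cur.append((s, e, label))
        prev = cur
    return sorted(tuple(boxes[i]) for i in range(len(parent)) if parent[i] == i)
```

For example, `extract_runs(bytes([0x0F, 0xF0]))` yields the single run `(4, 11)`, because the run that reaches the last bit of the first byte is carried into the second byte rather than being split at the boundary. In `label_components`, only the runs of the previous scan line are retained between iterations, which mirrors the memory saving the summary attributes to immediate merging.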