Table detection and recognition from image-based document and implementation of software application

Bibliographic Details
Main Author: Kong, Alson
Other Authors: Loke Yuan Ren
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access: https://hdl.handle.net/10356/148071
Institution: Nanyang Technological University
Description
Summary: Table Detection and Recognition refers to the detection and recognition of tables in documents and images while preserving their layout and structure. As the volume of digital files and content grows, and as many customers upload documents via scanners and camera-equipped mobile devices, there is increasing demand for automated table detection and recognition, both for direct consumption and in support of advanced applications related to Natural Language Processing, summarization, and information retrieval. This project proposes an automated pipeline for table detection and recognition using a transfer learning model, CascadeTabNet, an improved deep learning-based approach that solves both table detection and recognition with a single Convolutional Neural Network (CNN) model. A web-based software application is then implemented using Django to make the table detection system available for user interaction. The report first reviews the existing literature on table detection and recognition. It then introduces the datasets collected for training and evaluation, the image augmentation method, the architecture of the model used, and the experiments carried out. The evaluation metric and results are then presented and discussed, along with the default method for standardizing the bounding box format for evaluation. Finally, the design and implementation of the web application are discussed.
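
The summary mentions standardizing the bounding box format before evaluation but does not give the details in this record. The following is a minimal sketch of what such a step might look like, assuming boxes arrive either as (x, y, width, height) or as corner coordinates, and assuming an Intersection over Union (IoU) comparison, which is common in table detection benchmarks but is not confirmed here as the report's metric. All function names are hypothetical and are not taken from the report.

```python
from typing import Tuple

# A box is four numbers; the interpretation depends on the format (assumption).
Box = Tuple[float, float, float, float]


def xywh_to_xyxy(box: Box) -> Box:
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def normalize_xyxy(box: Box) -> Box:
    """Reorder corner coordinates so that the minimum corner comes first."""
    x1, y1, x2, y2 = box
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two boxes given in corner format."""
    ax1, ay1, ax2, ay2 = normalize_xyxy(a)
    bx1, by1, bx2, by2 = normalize_xyxy(b)

    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = xywh_to_xyxy((10, 20, 30, 40))  # hypothetical detector output
    ground_truth = (10, 20, 40, 60)             # hypothetical annotation
    print(iou(predicted, ground_truth))
```

Converting every box to a single corner format before comparison means predictions and annotations from different sources can be scored with the same function.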