A novel pipeline for table extraction using deep learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/136597 |
Institution: | Nanyang Technological University |
Summary: | Table extraction refers to the detection and extraction of tables from documents and images while preserving their structural layout and content. With the ever-growing volume of digital files and content, there is an increasing demand for the automated extraction of tables for consumption in a programmatic format, as well as in support of advanced applications such as information retrieval and natural language processing.
This project proposes an automated pipeline for table extraction using convolutional neural networks (CNNs). The pipeline consists of a table detection module, which detects the presence of tables and extracts the table regions using a CNN object detection model, and a table structure recognition module, which extracts table cells and their contents before reconstructing the table structure. To improve the performance of the table detection module, modifications were made to the table detection model and evaluated against the unmodified versions.
The report first reviews the existing literature on table detection and table structure recognition. It then introduces the datasets used for training, the data augmentation methods, the architectures used to evaluate single-stage approaches, and the experiments on modifications made to improve performance. The evaluation metrics and results are then presented and discussed. Several of the modifications tested in this project showed promising results over their unmodified counterparts. Additionally, the pipeline successfully performed table extraction, demonstrating the viability of the overall process. |
---|