A novel pipeline for table extraction using deep learning

Bibliographic Details
Main Author: Lee, Seng Cheong
Affiliation: School of Computer Science and Engineering
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/136597
Institution: Nanyang Technological University
Contributor: Loke Yuan Ren (yrloke@ntu.edu.sg)
Degree: Bachelor of Engineering (Computer Science)
Date issued: 2019
Date deposited: 2020-01-06
Collection: DR-NTU (NTU Library, Singapore)
File format: application/pdf
Record ID: sg-ntu-dr.10356-136597

Full description

Table extraction refers to the detection and extraction of tables from documents and images while preserving their structural layout and content. With the ever-growing volume of digital files and content, there is increasing demand for automated table extraction, both to make tabular data available in a programmatic format and to support advanced applications such as information retrieval and natural language processing. This project proposes an automated pipeline for table extraction using convolutional neural networks (CNNs). The pipeline consists of two modules: a table detection module, which detects the presence of tables and extracts the table regions using an object detection CNN model, and a table structure recognition module, which extracts table cells and their contents and then reconstructs the table structure. To improve the performance of the table detection module, modifications were made to the table detection model and evaluated against its non-modified versions. The report first reviews existing literature on table detection and table structure recognition. It then introduces the datasets used for training, the data augmentation methods, the architectures used to evaluate single-stage approaches, and the experiments on modifications made to improve performance. The evaluation metrics and results are then presented and discussed. Several of the modifications tested in this project showed promising results compared with their non-modified counterparts. Additionally, the pipeline was successfully demonstrated performing table extraction, showing the viability of the overall process.
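The two-module flow described in the abstract (detect table regions, then recognize cells and rebuild the table) can be illustrated with a minimal Python sketch. This is not the project's implementation: the functions `detect_tables`, `crop` and `recognize_cells` are hypothetical stand-ins for the object detection CNN and the structure recognition module, and the row-grouping heuristic in `reconstruct_grid` (cluster cells by vertical centre within a tolerance, then sort each row left to right) is an assumption chosen for illustration, not a method taken from the report.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Box:
    # Axis-aligned bounding box in page coordinates (y grows downward).
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Cell:
    box: Box
    text: str


def _centre_y(cell: Cell) -> float:
    return (cell.box.y0 + cell.box.y1) / 2


def reconstruct_grid(cells: Sequence[Cell], row_tol: float = 5.0) -> List[List[str]]:
    """Assumed heuristic: group cells into rows by vertical centre, then order each row by x."""
    rows: List[List[Cell]] = []
    for cell in sorted(cells, key=_centre_y):
        # Join the current row if this cell's centre is close to the row's first cell.
        if rows and abs(_centre_y(cell) - _centre_y(rows[-1][0])) <= row_tol:
            rows[-1].append(cell)
        else:
            rows.append([cell])
    return [[c.text for c in sorted(row, key=lambda c: c.box.x0)] for row in rows]


def extract_tables(
    page: Any,
    detect_tables: Callable[[Any], List[Box]],      # hypothetical: detection CNN
    crop: Callable[[Any, Box], Any],                # hypothetical: cut out a table region
    recognize_cells: Callable[[Any], List[Cell]],   # hypothetical: structure recognition
) -> List[List[List[str]]]:
    """Stage 1 finds table regions; stage 2 recognizes cells and rebuilds each table."""
    tables = []
    for region in detect_tables(page):
        cells = recognize_cells(crop(page, region))
        tables.append(reconstruct_grid(cells))
    return tables


if __name__ == "__main__":
    demo_cells = [
        Cell(Box(60, 30, 100, 40), "3"),
        Cell(Box(10, 10, 50, 20), "Name"),
        Cell(Box(60, 11, 100, 21), "Qty"),
        Cell(Box(10, 31, 50, 41), "Pen"),
    ]
    # Stub models: one detected region, no-op crop, cells read straight from the "page".
    print(extract_tables(
        {"cells": demo_cells},
        detect_tables=lambda p: [Box(0, 0, 200, 100)],
        crop=lambda p, box: p,
        recognize_cells=lambda region: region["cells"],
    ))  # [[['Name', 'Qty'], ['Pen', '3']]]
```

Passing the models in as callables keeps the two stages decoupled, so a different detector or structure recognizer could be swapped in without touching the reconstruction step. A fixed `row_tol` is the simplest choice here; a real system would likely need something more robust for skewed scans or cells spanning several rows.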