Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents
© 2020 Elsevier Ltd. Advances in search engines for traditional text documents have enabled the effective retrieval of massive amounts of textual information in a resource-efficient manner. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when document...
Main Authors: | Iqra Safder, Saeed Ul Hassan, Anna Visvizi, Thanapon Noraset, Raheel Nawaz, Suppawong Tuarob |
---|---|
Other Authors: | Information Technology University |
Format: | Article |
Published: | 2020 |
Subjects: | Computer Science, Decision Sciences, Engineering, Social Sciences |
Online Access: | https://repository.li.mahidol.ac.th/handle/123456789/57817 |
Institution: | Mahidol University |
id |
th-mahidol.57817 |
---|---|
record_format |
dspace |
spelling |
th-mahidol.57817 2020-08-25T18:51:43Z Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents Iqra Safder Saeed Ul Hassan Anna Visvizi Thanapon Noraset Raheel Nawaz Suppawong Tuarob Information Technology University American College of Greece Manchester Metropolitan University Mahidol University Computer Science Decision Sciences Engineering Social Sciences © 2020 Elsevier Ltd. Advances in search engines for traditional text documents have enabled the effective retrieval of massive amounts of textual information in a resource-efficient manner. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized and deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, has been proposed; it extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system fails to support user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose a set of methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using a set of machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based, and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves a 93.32% F1-score, outperforming the state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively. 2020-08-25T09:35:14Z 2020-08-25T09:35:14Z 2020-11-01 Article Information Processing and Management. Vol.57, No.6 (2020) 10.1016/j.ipm.2020.102269 03064573 2-s2.0-85085523063 https://repository.li.mahidol.ac.th/handle/123456789/57817 Mahidol University SCOPUS https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85085523063&origin=inward |
institution |
Mahidol University |
building |
Mahidol University Library |
continent |
Asia |
country |
Thailand |
content_provider |
Mahidol University Library |
collection |
Mahidol University Institutional Repository |
topic |
Computer Science Decision Sciences Engineering Social Sciences |
spellingShingle |
Computer Science Decision Sciences Engineering Social Sciences Iqra Safder Saeed Ul Hassan Anna Visvizi Thanapon Noraset Raheel Nawaz Suppawong Tuarob Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents |
description |
© 2020 Elsevier Ltd. Advances in search engines for traditional text documents have enabled the effective retrieval of massive amounts of textual information in a resource-efficient manner. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized and deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, has been proposed; it extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system fails to support user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose a set of methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using a set of machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based, and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves a 93.32% F1-score, outperforming the state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively. |
author2 |
Information Technology University |
author_facet |
Information Technology University Iqra Safder Saeed Ul Hassan Anna Visvizi Thanapon Noraset Raheel Nawaz Suppawong Tuarob |
format |
Article |
author |
Iqra Safder Saeed Ul Hassan Anna Visvizi Thanapon Noraset Raheel Nawaz Suppawong Tuarob |
author_sort |
Iqra Safder |
title |
Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents |
title_short |
Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents |
title_full |
Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents |
title_fullStr |
Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents |
title_full_unstemmed |
Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents |
title_sort |
deep learning-based extraction of algorithmic metadata in full-text scholarly documents |
publishDate |
2020 |
url |
https://repository.li.mahidol.ac.th/handle/123456789/57817 |
_version_ |
1763487506817351680 |
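
The description in this record mentions content-based features computed per text line for pseudo-code extraction, but it does not list them. As a rough illustration only, the sketch below computes a handful of hypothetical content-based features for a single line. The keyword list, the feature names, and the function `content_features` are assumptions made for this example and are not the 60 features introduced in the paper.

```python
import re

# Hypothetical keyword list (an assumption for illustration; the record
# does not enumerate the paper's actual features or vocabulary).
PSEUDOCODE_KEYWORDS = {
    "for", "while", "if", "else", "then", "do", "end",
    "return", "repeat", "until", "input", "output",
}

# Characters that tend to appear more often in pseudo-code than in prose.
SYMBOLS = "=<>+-*/()[]{}:;←"


def content_features(line: str) -> dict:
    """Compute a few illustrative content-based features for one text line."""
    tokens = re.findall(r"[A-Za-z_]+", line.lower())
    n_chars = len(line)
    n_symbols = sum(1 for ch in line if ch in SYMBOLS)
    return {
        # Pseudo-code lines are often numbered, e.g. "3: for i = 1 to n do".
        "starts_with_number": bool(re.match(r"\s*\d+[.:)]?\s", line)),
        # Number of tokens that are control-flow style keywords.
        "keyword_count": sum(1 for t in tokens if t in PSEUDOCODE_KEYWORDS),
        # Share of characters that are operator or bracket symbols.
        "symbol_ratio": n_symbols / n_chars if n_chars else 0.0,
        # Leading spaces, a crude proxy for block indentation.
        "indent": len(line) - len(line.lstrip(" ")),
        # Prose sentences usually end with a period; pseudo-code rarely does.
        "ends_with_period": line.rstrip().endswith("."),
    }


if __name__ == "__main__":
    for text in ("3: for i = 1 to n do", "The algorithm converges quickly."):
        print(text, content_features(text))
```

Feature dictionaries like these could be fed to any standard classifier to label each line as pseudo-code or not. The font-style-based and structure-based feature groups named in the description would need layout information from the PDF, which this sketch does not attempt.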