DeepCite: tool for systematic annotation of scientific literature enabling machine learning-based aggregation of research results

Bibliographic Details
Main Author: Low, Hector Chong Hao
Other Authors: Vanessa Evers
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/156647
Institution: Nanyang Technological University
Description
Summary: A research paper gathers and interprets information in writing and shares results and findings so that others can learn from them. However, the volume and rate of publication have increased exponentially over the last decade, and the growing amount of information is becoming overwhelming to consume. This project implements a machine learning text classification model together with an easy-to-use tool for systematic annotation of scientific literature. The classification model determines whether a sentence in a research paper is important or not important. The tool facilitates collection of the data the model requires. The goal of the project is a model that helps researchers identify important sentences in papers, saving the time and effort of reading through long research reports. The tool is developed as a Chrome extension and can export the user's annotations in table format. The data collected by the tool is then processed and passed to the model for training. The model is transformer-based, that is, a deep learning model that relies heavily on self-attention to compute a representation of the input sequence. The model has demonstrated that it is possible to classify important sentences in research papers.
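
The self-attention mechanism mentioned in the summary can be illustrated with a minimal NumPy sketch of scaled dot-product attention over one sentence. This is not the project's model: the dimensions, the random weights, the function names, and the mean-pooling step are all assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X  : (n_tokens, d_model) token embeddings of one sentence
    Wq, Wk, Wv : (d_model, d_head) projection matrices
    Returns the context vectors and the attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Each token scores every token in the sentence, including itself.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


# Hypothetical sizes and random values, for illustration only.
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 8, 4
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

context, weights = self_attention(X, Wq, Wk, Wv)

# Assumed pooling step: average the token vectors into one sentence vector,
# which a classifier head could then label as important or not important.
sentence_vector = context.mean(axis=0)
```

In a trained transformer the projection matrices are learned, several attention heads and layers are stacked, and the pooled representation feeds a classification layer. The sketch shows only the core computation under the assumptions stated above.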