Information extraction from bibliography data
Main Author: | Ng, Jian Cheng |
---|---|
Other Authors: | Ke Yiping, Kelly |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
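Subjects: | Engineering::Computer science and engineering::Data; Engineering::Computer science and engineering::Software |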
Online Access: | https://hdl.handle.net/10356/137909 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-137909 |
record_format |
dspace |
spelling |
sg-ntu-dr.10356-137909 2020-04-18T03:23:37Z Information extraction from bibliography data Ng, Jian Cheng Ke Yiping, Kelly School of Computer Science and Engineering ypke@ntu.edu.sg Engineering::Computer science and engineering::Data; Engineering::Computer science and engineering::Software Bachelor of Engineering (Computer Science) 2020-04-18T03:23:37Z 2020-04-18T03:23:37Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/137909 en SCE19-0333 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Data; Engineering::Computer science and engineering::Software |
description |
The Digital Bibliography and Library Project (DBLP) is an online service that provides rich information on a wide range of Computer Science publications. This project aims to build a sentiment analysis model that classifies the polarity of an author’s comment on a citation, using the publications in the DBLP dataset. This aim was achieved in the following steps.
First, the DBLP XML file was parsed with a StAX (Streaming API for XML) parser to extract the relevant features, which were then loaded into a MySQL database. Second, data analytics was conducted on the DBLP data to discover interesting insights. These covered the distribution of publications, authors’ experience, collaborator analysis and prediction, and topic modelling.
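A minimal sketch of the streaming idea behind the parsing step follows. StAX itself is a Java API; as a swapped-in stand-in, this example uses Python's `xml.etree.ElementTree.iterparse`, which likewise reads the file incrementally and reports each element as it completes, so a record can be handled and then discarded. It also loads into an in-memory `sqlite3` database instead of MySQL. The table names (`publication`, `authorship`), the chosen record tags and the sample records are hypothetical, and the real `dblp.xml` declares character entities in `dblp.dtd`, which this sketch does not resolve.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical subset of the record types that appear under <dblp>.
RECORD_TAGS = {"article", "inproceedings", "proceedings", "book",
               "incollection", "phdthesis", "mastersthesis"}


def stream_records(source):
    """Yield (key, title, year, authors) for each record, one at a time."""
    # An 'end' event fires once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue
        title_el = elem.find("title")
        # itertext() keeps text nested in inline markup such as <i>.
        title = "".join(title_el.itertext()) if title_el is not None else None
        year_text = elem.findtext("year")
        year = int(year_text) if year_text else None
        authors = [a.text for a in elem.findall("author")]
        yield elem.get("key"), title, year, authors
        elem.clear()  # free the record's children to keep memory bounded


def load(conn, source):
    """Load streamed records into two illustrative tables (sqlite3 stand-in for MySQL)."""
    conn.execute("CREATE TABLE IF NOT EXISTS publication "
                 "(dblp_key TEXT PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS authorship (dblp_key TEXT, author TEXT)")
    for key, title, year, authors in stream_records(source):
        conn.execute("INSERT INTO publication VALUES (?, ?, ?)", (key, title, year))
        conn.executemany("INSERT INTO authorship VALUES (?, ?)",
                         [(key, name) for name in authors])
    conn.commit()


# Hypothetical DBLP-style sample (keys and names are made up).
SAMPLE = b"""<?xml version="1.0"?>
<dblp>
  <article key="journals/x/Doe20" mdate="2020-01-01">
    <author>Jane Doe</author>
    <author>John Roe</author>
    <title>A Study of <i>Things</i>.</title>
    <year>2020</year>
  </article>
  <inproceedings key="conf/y/Roe19" mdate="2019-05-01">
    <author>John Roe</author>
    <title>Another Paper.</title>
    <year>2019</year>
  </inproceedings>
</dblp>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, io.BytesIO(SAMPLE))
    print(conn.execute("SELECT dblp_key, title, year FROM publication ORDER BY year").fetchall())
```

Writing each record as soon as its closing tag is seen is what keeps memory flat on a multi-gigabyte dump; building the full tree first would not scale.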
Third, the sentiment analysis model was built. Before building it, sentiment text was collected from the publications in the DBLP dataset, and its polarity was determined either from direct mentions of another paper or from a list of common positive and negative unigrams and bigrams.
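The unigram/bigram rule in the labelling step could look roughly like the sketch below. The word lists, the tokenisation and the scoring rule (count distinct positive matches, subtract distinct negative matches) are assumptions for illustration; the project's actual lists are not given, and the rule based on direct mentions of another paper is not modelled here.

```python
import re

# Hypothetical seed lists; the project's actual unigram/bigram lists are not given.
POSITIVE_UNIGRAMS = {"effective", "efficient", "robust", "outperforms", "novel"}
NEGATIVE_UNIGRAMS = {"poor", "fails", "limited", "inefficient", "drawback"}
POSITIVE_BIGRAMS = {("significantly", "better"), ("more", "accurate")}
NEGATIVE_BIGRAMS = {("suffers", "from"), ("less", "accurate")}


def label_polarity(sentence):
    """Label a citation sentence as positive, negative or neutral."""
    tokens = re.findall(r"[a-z]+", sentence.lower())  # letters only; drops "[3]"
    unigrams = set(tokens)
    bigrams = set(zip(tokens, tokens[1:]))             # adjacent token pairs
    score = (len(unigrams & POSITIVE_UNIGRAMS) + len(bigrams & POSITIVE_BIGRAMS)
             - len(unigrams & NEGATIVE_UNIGRAMS) - len(bigrams & NEGATIVE_BIGRAMS))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(label_polarity("The approach of [3] is effective and significantly better than prior work."))
    print(label_polarity("The method in [5] suffers from poor scalability."))
```

Counting distinct matches stops a repeated word from dominating the score. The rule ignores negation, so a phrase such as "not effective" would still count as positive.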
Once the dataset had been collected, models were built using three kinds of approach: a lexicon-based approach using TextBlob and VADER Sentiment, a deep learning approach using LSTM, and a machine learning approach using Decision Tree, Logistic Regression and Naïve Bayes. The parameters of each model were fine-tuned for the best accuracy, and the models were compared using precision and recall.
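For reference, precision for a class is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN count that class's true positives, false positives and false negatives. The sketch below computes them per class and macro-averages them over the classes. The abstract does not say which averaging the project used, so macro-averaging and the toy labels are assumptions.

```python
def precision_recall(y_true, y_pred, label):
    """Per-class precision and recall, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision and recall over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall(y_true, y_pred, lab) for lab in labels]
    return (sum(p for p, _ in scores) / len(scores),
            sum(r for _, r in scores) / len(scores))


# Hypothetical gold labels and one model's predictions.
gold = ["positive", "negative", "positive", "neutral", "positive"]
pred = ["positive", "positive", "negative", "neutral", "positive"]

if __name__ == "__main__":
    print(precision_recall(gold, pred, "positive"))
    print(macro_average(gold, pred))
```

Running the same functions on every model's predictions over a shared test set gives directly comparable numbers.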
Lastly, a GUI was built to support querying publications by name, author, field of study or year of publication. Where a publication’s PDF is publicly available, it is downloaded and the sentences containing citations are analysed; the sentiment analysis model then classifies the polarity of each of these sentences. |
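A rough sketch of the citation-sentence step, assuming the PDF text has already been extracted to a plain string (downloading and PDF text extraction are out of scope here). The regular expressions cover only numeric markers such as [4] or [3, 5] and simple author-year markers such as (Smith et al., 2019); they, the naive sentence splitter and the sample text are hypothetical. Each returned sentence would then be passed to the trained sentiment model.

```python
import re

# Hypothetical citation patterns: numeric ([4], [3, 5], [2-4]) or author-year.
CITATION = re.compile(
    r"\[\d+(?:\s*[,\u2013-]\s*\d+)*\]"
    r"|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}[a-z]?\)"
)
# Naive splitter: break after . ! or ? when the next word starts with a capital.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def citation_sentences(text):
    """Return the sentences of `text` that contain at least one citation marker."""
    sentences = SENTENCE_END.split(text.strip())
    return [s for s in sentences if CITATION.search(s)]


SAMPLE_TEXT = ("Deep models have been widely studied. "
               "The LSTM of [4] is effective for this task. "
               "Smith et al. proposed a lexicon (Smith et al., 2019) that we reuse. "
               "Our results follow.")

if __name__ == "__main__":
    for sentence in citation_sentences(SAMPLE_TEXT):
        print(sentence)
```

Requiring a capital letter after the full stop keeps "et al. proposed" in one sentence, but a splitter tuned for scientific text would handle abbreviations and figure references more reliably.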
author2 |
Ke Yiping, Kelly |
format |
Final Year Project |
author |
Ng, Jian Cheng |
title |
Information extraction from bibliography data |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/137909 |