Information extraction and analysis of DBLP data

A great deal of information can be extracted and analysed from the large DBLP collection of bibliography data. However, it can be difficult to extract useful information from such a large data set, which has more than 3 million entries. The purpose of this report is to highlight the work done...

Bibliographic Details
Main Author: Neo, Lynette Shi Yun
Other Authors: Ke Yiping, Kelly
Format: Final Year Project
Language: English
Published: 2017
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/70552
Institution: Nanyang Technological University
id sg-ntu-dr.10356-70552
record_format dspace
spelling sg-ntu-dr.10356-70552 2023-03-03T20:23:22Z Information extraction and analysis of DBLP data Neo, Lynette Shi Yun Ke Yiping, Kelly School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering A great deal of information can be extracted and analysed from the large DBLP collection of bibliography data. However, it can be difficult to extract useful information from such a large data set, which has more than 3 million entries. The purpose of this report is to highlight the work done during the course of this final year project. The project has two main objectives. The first is to parse an XML file containing DBLP bibliography data into CSV files and load them into a relational database. The second is to perform data analytics and mining on the data to extract useful information. For the second objective, three data analytics tasks were carried out, all aimed at analysing the collaboration between authors in the DBLP community. First, the collaboration network of the authors was analysed to show the trend in collaboration between authors. Next, the collaborators of individual authors were obtained to analyse whether there was a relation between the authors and their collaborators. Lastly, topic modelling was performed on the titles of the publications, and the topics were used to suggest collaborators for authors based on the topics on which each author had previously published. The report then discusses the results of these analyses and concludes with suggestions for future work. Bachelor of Engineering (Computer Science) 2017-04-27T06:16:43Z 2017-04-27T06:16:43Z 2017 Final Year Project (FYP) http://hdl.handle.net/10356/70552 en Nanyang Technological University 44 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Neo, Lynette Shi Yun
Information extraction and analysis of DBLP data
description A great deal of information can be extracted and analysed from the large DBLP collection of bibliography data. However, it can be difficult to extract useful information from such a large data set, which has more than 3 million entries. The purpose of this report is to highlight the work done during the course of this final year project. The project has two main objectives. The first is to parse an XML file containing DBLP bibliography data into CSV files and load them into a relational database. The second is to perform data analytics and mining on the data to extract useful information. For the second objective, three data analytics tasks were carried out, all aimed at analysing the collaboration between authors in the DBLP community. First, the collaboration network of the authors was analysed to show the trend in collaboration between authors. Next, the collaborators of individual authors were obtained to analyse whether there was a relation between the authors and their collaborators. Lastly, topic modelling was performed on the titles of the publications, and the topics were used to suggest collaborators for authors based on the topics on which each author had previously published. The report then discusses the results of these analyses and concludes with suggestions for future work.
author2 Ke Yiping, Kelly
author_facet Ke Yiping, Kelly
Neo, Lynette Shi Yun
format Final Year Project
author Neo, Lynette Shi Yun
author_sort Neo, Lynette Shi Yun
title Information extraction and analysis of DBLP data
title_short Information extraction and analysis of DBLP data
title_full Information extraction and analysis of DBLP data
title_fullStr Information extraction and analysis of DBLP data
title_full_unstemmed Information extraction and analysis of DBLP data
title_sort information extraction and analysis of dblp data
publishDate 2017
url http://hdl.handle.net/10356/70552
_version_ 1759855004388163584