Information extraction and analysis of DBLP data

Bibliographic Details
Main Author: Neo, Lynette Shi Yun
Other Authors: Ke Yiping, Kelly
Format: Final Year Project
Language: English
Published: 2017
Subjects:
Online Access: http://hdl.handle.net/10356/70552
Institution: Nanyang Technological University
Description
Summary: A great deal of information can be extracted and analysed from the large collection of DBLP bibliography data. However, extracting useful information from a data set of more than 3 million entries can be difficult. This report describes the work done during the course of this final year project, which had two main objectives. The first was to parse an XML file containing DBLP bibliography data into CSV files and load them into a relational database. The second was to perform data analytics and mining on the data to extract useful information, with the aim of analysing collaboration between authors in the DBLP community. Three data analytics tasks were carried out for this second objective. First, the collaboration network of the authors was analysed to show the trend in collaboration between authors. Next, the collaborators of individual authors were obtained to analyse whether there was a relation between the authors and their collaborators. Lastly, topic modelling was performed on the titles of the publications, and the resulting topics were used to suggest collaborators for authors based on the topics in which they had previously published. The report then discusses the results of these analyses and concludes with suggestions for future work.
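
The first objective (parsing DBLP XML into CSV files ready for loading into a relational database) might be sketched roughly as follows. This is an illustration only, not the project's actual code: the sample records, the two-table layout (publications and authors), the `RECORD_TAGS` subset and the `xml_to_csv` helper are all assumptions made for this example. The inline sample also avoids the DTD-defined character entities found in the real DBLP file, which would need extra handling.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the general shape of DBLP records (not real data).
SAMPLE_XML = """<dblp>
  <article key="journals/x/A1">
    <author>Alice Tan</author>
    <author>Bob Lim</author>
    <title>Mining Graphs</title>
    <year>2015</year>
  </article>
  <inproceedings key="conf/y/B2">
    <author>Alice Tan</author>
    <title>Topic Models for Titles</title>
    <year>2016</year>
  </inproceedings>
</dblp>"""

# Assumed subset of record types to export.
RECORD_TAGS = {"article", "inproceedings"}


def xml_to_csv(source, pub_out, author_out):
    """Stream records from `source` and write two CSV tables."""
    pubs = csv.writer(pub_out)
    authors = csv.writer(author_out)
    pubs.writerow(["key", "type", "title", "year"])
    authors.writerow(["key", "author"])
    # iterparse yields each element once it is fully read, so the whole
    # file never has to be held in memory as a single tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue
        key = elem.get("key")
        pubs.writerow([key, elem.tag,
                       elem.findtext("title", ""),
                       elem.findtext("year", "")])
        # One row per (publication, author) pair, suitable for a join table.
        for author in elem.findall("author"):
            authors.writerow([key, author.text])
        elem.clear()  # drop the record's children once it has been written


pub_csv, author_csv = io.StringIO(), io.StringIO()
xml_to_csv(io.BytesIO(SAMPLE_XML.encode("utf-8")), pub_csv, author_csv)

# Read the CSV text back to inspect the rows that would be loaded.
pub_rows = list(csv.reader(io.StringIO(pub_csv.getvalue())))
author_rows = list(csv.reader(io.StringIO(author_csv.getvalue())))
```

Keeping authors in a separate table keyed by publication is one plausible design for the later collaboration analysis, since co-authors of a publication are then the rows sharing a key.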