Information extraction and analysis of DBLP data
There is much information that can be extracted and analysed from the large collection of DBLP bibliography data. However, it can be difficult to extract useful information from such a large data set, which has more than 3 million entries. The purpose of this report is to highlight the work done...
Main Author: | Neo, Lynette Shi Yun |
---|---|
Other Authors: | Ke Yiping, Kelly |
Format: | Final Year Project |
Language: | English |
Published: | 2017 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/70552 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-70552 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-70552 2023-03-03T20:23:22Z Information extraction and analysis of DBLP data Neo, Lynette Shi Yun Ke Yiping, Kelly School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering There is much information that can be extracted and analysed from the large collection of DBLP bibliography data. However, it can be difficult to extract useful information from such a large data set, which has more than 3 million entries. The purpose of this report is to highlight the work done during the course of this final year project. There are two main objectives for this project. The first is to parse an XML file containing DBLP bibliography data into CSV files and load them into a relational database. The second is to perform data analytics and mining on the data to extract useful information. For the second objective, three data analytics tasks were carried out, all aimed at analysing the collaboration between authors in the DBLP community. First, the collaboration network of the authors was analysed to show the trend in collaboration between authors. Next, the collaborators of individual authors were obtained to analyse whether there was a relation between the authors and their collaborators. Lastly, topic modelling was done on the titles of the publications, and the topics were used to suggest collaborators for authors based on the topics in which each author had previously published. The report then discusses the results of these analyses and concludes with suggestions for future work. Bachelor of Engineering (Computer Science) 2017-04-27T06:16:43Z 2017-04-27T06:16:43Z 2017 Final Year Project (FYP) http://hdl.handle.net/10356/70552 en Nanyang Technological University 44 p. application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Computer science and engineering |
spellingShingle | DRNTU::Engineering::Computer science and engineering Neo, Lynette Shi Yun Information extraction and analysis of DBLP data |
description | There is much information that can be extracted and analysed from the large collection of DBLP bibliography data. However, it can be difficult to extract useful information from such a large data set, which has more than 3 million entries. The purpose of this report is to highlight the work done during the course of this final year project. There are two main objectives for this project. The first is to parse an XML file containing DBLP bibliography data into CSV files and load them into a relational database. The second is to perform data analytics and mining on the data to extract useful information. For the second objective, three data analytics tasks were carried out, all aimed at analysing the collaboration between authors in the DBLP community. First, the collaboration network of the authors was analysed to show the trend in collaboration between authors. Next, the collaborators of individual authors were obtained to analyse whether there was a relation between the authors and their collaborators. Lastly, topic modelling was done on the titles of the publications, and the topics were used to suggest collaborators for authors based on the topics in which each author had previously published. The report then discusses the results of these analyses and concludes with suggestions for future work. |
author2 | Ke Yiping, Kelly |
author_facet | Ke Yiping, Kelly Neo, Lynette Shi Yun |
format | Final Year Project |
author | Neo, Lynette Shi Yun |
author_sort | Neo, Lynette Shi Yun |
title | Information extraction and analysis of DBLP data |
title_short | Information extraction and analysis of DBLP data |
title_full | Information extraction and analysis of DBLP data |
title_fullStr | Information extraction and analysis of DBLP data |
title_full_unstemmed | Information extraction and analysis of DBLP data |
title_sort | information extraction and analysis of dblp data |
publishDate | 2017 |
url | http://hdl.handle.net/10356/70552 |
_version_ | 1759855004388163584 |
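
The first objective named in the description (parsing DBLP XML into CSV files) and the collaboration analysis can be illustrated with a minimal Python sketch. This is not the project's actual code: the sample XML, the CSV column names, and the helper names `parse_records` and `export_csv_and_pairs` are assumptions made for illustration. The real DBLP dump is far larger and relies on a DTD for character entities, which this sketch does not handle.

```python
import csv
import io
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature stand-in for the DBLP XML dump (assumed structure).
SAMPLE_XML = """<dblp>
  <article key="journals/x/A1">
    <author>Alice</author>
    <author>Bob</author>
    <title>Graph mining.</title>
    <year>2015</year>
  </article>
  <inproceedings key="conf/y/B2">
    <author>Alice</author>
    <author>Bob</author>
    <author>Carol</author>
    <title>Topic models.</title>
    <year>2016</year>
  </inproceedings>
</dblp>"""

# Record types handled by this sketch; the real dump contains more.
RECORD_TAGS = ("article", "inproceedings")


def parse_records(xml_source):
    """Stream publication records from a binary file-like object."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            yield {
                "key": elem.get("key"),
                "title": elem.findtext("title", default=""),
                "year": elem.findtext("year", default=""),
                "authors": [a.text for a in elem.findall("author")],
            }
            elem.clear()  # drop the finished element's children to save memory


def export_csv_and_pairs(records):
    """Return (publications CSV, authorship CSV, co-author pair counts)."""
    pub_buf, auth_buf = io.StringIO(), io.StringIO()
    pub_writer = csv.writer(pub_buf, lineterminator="\n")
    auth_writer = csv.writer(auth_buf, lineterminator="\n")
    pub_writer.writerow(["key", "title", "year"])
    auth_writer.writerow(["key", "author"])
    pair_counts = Counter()
    for rec in records:
        pub_writer.writerow([rec["key"], rec["title"], rec["year"]])
        for author in rec["authors"]:
            auth_writer.writerow([rec["key"], author])
        # Each unordered pair of co-authors on a paper is one collaboration edge.
        for pair in itertools.combinations(sorted(set(rec["authors"])), 2):
            pair_counts[pair] += 1
    return pub_buf.getvalue(), auth_buf.getvalue(), pair_counts


if __name__ == "__main__":
    source = io.BytesIO(SAMPLE_XML.encode("utf-8"))
    pubs_csv, authors_csv, pairs = export_csv_and_pairs(parse_records(source))
    print(pubs_csv)
    print(authors_csv)
    print(pairs.most_common(3))
```

Splitting publications and authorships into two CSV files mirrors a normalised relational layout, so each file could be bulk-loaded into its own table. Streaming with `iterparse` is one way to avoid holding millions of entries in memory at once.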