Information extraction from bibliography data
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/74009 |
Institution: | Nanyang Technological University |
Summary: | DBLP is a computer science bibliography hosted by the University of Trier in Germany. It contains bibliographic information on major computer science journals and proceedings. As of December 2017, it listed 4,004,065 publications, 2,012,222 authors, 5,263 conferences and 1,566 journals. Given the volume of this data, it is tedious for users to extract valuable insights from it. To bridge this gap, this report pursues four main objectives. First, it parses the large DBLP XML file and other datasets into a relational database to support efficient querying. Second, it explores techniques for extracting each author's career length, ethnicity, area of specialization and gender from the DBLP data, and it mines the resulting data to discover new knowledge. Third, it models the data for link prediction, predicting whom an author might collaborate with in the future, and improves existing link prediction methods with the concept of homophily. Fourth, it introduces a web application developed for analysis and visualization of the DBLP data, which helps users gain insight into and make sense of the data. Finally, the report discusses the link prediction results and interprets the newly discovered insights. |
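The summary does not say which link prediction measure or which homophily attributes the project used. A minimal sketch, assuming Adamic-Adar as the base score and a hypothetical per-author attribute dictionary (for example, area of specialization), shows one way to fold homophily into co-authorship link prediction: pairs of non-collaborators that share attributes get their score boosted by a factor `1 + beta * similarity`. All function names, the `attrs` layout and the `beta` weight are illustrative assumptions, not the report's method.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Build an undirected co-authorship graph.

    papers: iterable of author-name lists, one list per publication
    (a hypothetical input format, e.g. rows pulled from a relational DB).
    """
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_pairs(graph):
    """Return unlinked author pairs that share at least one co-author.

    Restricting to two-hop pairs avoids scoring all O(n^2) author pairs.
    """
    pairs = set()
    for nbrs in graph.values():
        for u, v in combinations(sorted(nbrs), 2):
            if v not in graph[u]:
                pairs.add((u, v))
    return pairs


def adamic_adar(graph, u, v):
    """Sum 1 / log(degree) over the co-authors shared by u and v.

    Every shared co-author has degree >= 2, so log(degree) > 0.
    """
    shared = graph.get(u, set()) & graph.get(v, set())
    return sum(1.0 / math.log(len(graph[w])) for w in shared)


def homophily(attrs, u, v, keys):
    """Fraction of the given attribute keys on which u and v agree."""
    a, b = attrs.get(u, {}), attrs.get(v, {})
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in a and a[k] == b.get(k))
    return matches / len(keys)


def predict_links(graph, attrs, keys, beta=0.5, top_k=5):
    """Rank candidate collaborations by homophily-weighted Adamic-Adar."""
    scored = []
    for u, v in candidate_pairs(graph):
        base = adamic_adar(graph, u, v)
        score = base * (1.0 + beta * homophily(attrs, u, v, keys))
        scored.append((u, v, score))
    # Highest score first; break ties alphabetically for stable output.
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored[:top_k]


if __name__ == "__main__":
    papers = [["A", "B"], ["B", "C"], ["A", "D"], ["D", "C"], ["C", "E"]]
    attrs = {"B": {"area": "DB"}, "E": {"area": "DB"}, "D": {"area": "ML"}}
    g = build_coauthor_graph(papers)
    for u, v, s in predict_links(g, attrs, keys=["area"], top_k=10):
        print(f"{u}-{v}: {s:.3f}")
```

In this toy graph, the pairs B-E and D-E each share only co-author C, so plain Adamic-Adar ties them. The shared "DB" area lifts B-E above D-E, which is the effect a homophily term is meant to capture. The multiplicative boost leaves pairs with no shared co-authors at zero. An additive term would instead let homophily alone propose links, and that would be a separate design choice.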