Information extraction from bibliography data
DBLP is a computer science bibliography hosted by the University of Trier in Germany. It contains bibliographic information on major computer science journals and proceedings. As of Dec 2017, there were 4,004,065 publications, 2,012,222 authors, 5,263 conferences and 1,566 journals. Due to the magn...
Main Author: | Toh, Joel Zhu Er |
---|---|
Other Authors: | Kong Wai-Kin Adams |
Format: | Final Year Project |
Language: | English |
Published: | 2018 |
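Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing; DRNTU::Engineering::Computer science and engineering::Information systems::Information interfaces and presentation |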
Online Access: | http://hdl.handle.net/10356/74009 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-74009 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-74009 2023-03-03T20:59:07Z Information extraction from bibliography data Toh, Joel Zhu Er Kong Wai-Kin Adams School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing DRNTU::Engineering::Computer science and engineering::Information systems::Information interfaces and presentation Bachelor of Engineering (Computer Science) 2018-04-23T05:58:18Z 2018-04-23T05:58:18Z 2018 Final Year Project (FYP) http://hdl.handle.net/10356/74009 en Nanyang Technological University 84 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing DRNTU::Engineering::Computer science and engineering::Information systems::Information interfaces and presentation |
description |
DBLP is a computer science bibliography hosted by the University of Trier in Germany. It contains bibliographic information on major computer science journals and proceedings. As of Dec 2017, there were 4,004,065 publications, 2,012,222 authors, 5,263 conferences and 1,566 journals. Because of the sheer volume of information, it is tedious for users to draw useful insights from the data. To bridge this gap, this report pursues four main objectives. First, it parses the large DBLP XML file and other datasets into a relational database to support efficient querying. Second, it explores techniques for extracting an author's career length, ethnicity, area of specialization and gender from the DBLP data, and mines the data to discover knowledge. Third, it models the data for link prediction, that is, predicting whom an author might collaborate with in the future; this includes improving existing link prediction methods with the concept of homophily. Fourth, it introduces a web application developed for analysis and visualization of the DBLP data, which helps users gain insight into and make sense of the data. Finally, the report discusses the link prediction results and interprets the newly discovered insights. |
author2 |
Kong Wai-Kin Adams |
author_facet |
Kong Wai-Kin Adams Toh, Joel Zhu Er |
format |
Final Year Project |
author |
Toh, Joel Zhu Er |
title |
Information extraction from bibliography data |
publishDate |
2018 |
url |
http://hdl.handle.net/10356/74009 |
_version_ |
1759856890308722688 |
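
The first objective in the description, parsing the large DBLP XML file into a relational database, might be sketched as follows. This is a minimal illustration only, not the report's implementation: the table layout, the `load_dblp` function, the `RECORD_TAGS` selection and the inline `SAMPLE` data are all assumptions made for the example. It streams records with the standard library's `iterparse` so the file need not be held in memory at once. The real DBLP dump also relies on a DTD for character entities, which this sketch does not handle.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Record types to load (assumption: only journal and conference papers).
RECORD_TAGS = {"article", "inproceedings"}

# Tiny made-up sample in the general shape of DBLP records.
SAMPLE = b"""<dblp>
<article key="journals/x/A1"><author>Ann Lee</author><author>Bo Tan</author>
<title>Paper One</title><year>2016</year></article>
<inproceedings key="conf/y/B2"><author>Ann Lee</author>
<title>Paper <i>Two</i></title><year>2017</year></inproceedings>
</dblp>"""


def load_dblp(source, conn):
    """Stream publication records from `source` into two SQLite tables."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS publication (
            pub_key TEXT PRIMARY KEY, kind TEXT, title TEXT, year INTEGER);
        CREATE TABLE IF NOT EXISTS authorship (pub_key TEXT, author TEXT);
        """
    )
    # "end" events fire once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue
        pub_key = elem.get("key")
        title_el = elem.find("title")
        # Titles may contain inline markup, so join all nested text.
        title = "".join(title_el.itertext()) if title_el is not None else None
        year = elem.findtext("year")
        conn.execute(
            "INSERT OR REPLACE INTO publication VALUES (?, ?, ?, ?)",
            (pub_key, elem.tag, title, int(year) if year else None),
        )
        conn.executemany(
            "INSERT INTO authorship VALUES (?, ?)",
            [(pub_key, a.text) for a in elem.findall("author")],
        )
        elem.clear()  # release the record's children once stored
    conn.commit()


conn = sqlite3.connect(":memory:")
load_dblp(io.BytesIO(SAMPLE), conn)
# Example query: number of publications per author.
for author, n in conn.execute(
    "SELECT author, COUNT(*) FROM authorship GROUP BY author ORDER BY author"
):
    print(author, n)
```

Keeping authorship in its own table is one way to make co-author queries, such as those needed for link prediction, expressible as plain SQL joins.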