Information extraction from bibliography data

DBLP is a computer science bibliography hosted by the University of Trier in Germany. It contains bibliographic information on major computer science journals and proceedings. As of December 2017, it listed 4,004,065 publications, 2,012,222 authors, 5,263 conferences and 1,566 journals. Due to the magn...

Bibliographic Details
Main Author: Toh, Joel Zhu Er
Other Authors: Kong Wai-Kin Adams
Format: Final Year Project
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10356/74009
Institution: Nanyang Technological University
id sg-ntu-dr.10356-74009
record_format dspace
spelling sg-ntu-dr.10356-74009 2023-03-03T20:59:07Z Information extraction from bibliography data Toh, Joel Zhu Er Kong Wai-Kin Adams School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing DRNTU::Engineering::Computer science and engineering::Information systems::Information interfaces and presentation Bachelor of Engineering (Computer Science) 2018-04-23T05:58:18Z 2018-04-23T05:58:18Z 2018 Final Year Project (FYP) http://hdl.handle.net/10356/74009 en Nanyang Technological University 84 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
DRNTU::Engineering::Computer science and engineering::Information systems::Information interfaces and presentation
description DBLP is a computer science bibliography hosted by the University of Trier in Germany. It contains bibliographic information on major computer science journals and proceedings. As of December 2017, it listed 4,004,065 publications, 2,012,222 authors, 5,263 conferences and 1,566 journals. Because of this volume, it is tedious for users to draw useful insights from the data. To bridge this gap, this report pursues four main objectives. First, it parses the large DBLP XML file and other datasets into a relational database to support efficient querying. Second, it explores techniques for extracting an author's career length, ethnicity, area of specialization and gender from the DBLP data, and explores the data to discover knowledge. Third, it models the data for link prediction, that is, predicting whom an author might collaborate with in the future; this includes improving existing link prediction methods with the concept of homophily. Fourth, it introduces a web application developed for analysis and visualization of the DBLP data, which helps users gain insight and make sense of the data. Finally, the report discusses the link prediction results and interprets the newly discovered insights.
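The first objective in the description (loading the DBLP XML into a relational database for querying) can be illustrated with a minimal sketch. It is not taken from the report: the table layout, the names `load_dblp` and `career_lengths`, the two-record sample, and the definition of career length as last minus first publication year plus one are all assumptions made for illustration. The real DBLP file also relies on a DTD with named character entities, which this sketch does not handle.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-record stand-in for the DBLP dump (the real file is far larger).
SAMPLE = b"""<dblp>
<article key="journals/x/A1"><author>Ann Lee</author><author>Bo Tan</author>
<title>Paper one</title><year>2016</year><journal>J. X</journal></article>
<inproceedings key="conf/y/B2"><author>Ann Lee</author>
<title>Paper two</title><year>2017</year><booktitle>Conf Y</booktitle></inproceedings>
</dblp>"""

# Assumption: only these two record types are loaded in this sketch.
RECORD_TAGS = {"article", "inproceedings"}


def load_dblp(xml_stream, conn):
    """Stream records from a DBLP-style XML file into two SQLite tables."""
    conn.executescript("""
        CREATE TABLE publication (
            pub_key TEXT PRIMARY KEY, kind TEXT, title TEXT,
            year INTEGER, venue TEXT);
        CREATE TABLE authorship (pub_key TEXT, author TEXT);
    """)
    # iterparse yields elements incrementally, so the whole file is never
    # held in memory at once.
    context = ET.iterparse(xml_stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag in RECORD_TAGS:
            pub_key = elem.get("key")
            year = elem.findtext("year")
            venue = elem.findtext("journal") or elem.findtext("booktitle")
            conn.execute(
                "INSERT INTO publication VALUES (?, ?, ?, ?, ?)",
                (pub_key, elem.tag, elem.findtext("title"),
                 int(year) if year else None, venue),
            )
            conn.executemany(
                "INSERT INTO authorship VALUES (?, ?)",
                [(pub_key, a.text) for a in elem.iter("author")],
            )
            root.clear()  # discard records that have already been stored
    conn.commit()


def career_lengths(conn):
    """Career length per author, assumed here to be last year - first year + 1."""
    return conn.execute("""
        SELECT a.author, MAX(p.year) - MIN(p.year) + 1
        FROM authorship a JOIN publication p ON p.pub_key = a.pub_key
        GROUP BY a.author ORDER BY a.author
    """).fetchall()


conn = sqlite3.connect(":memory:")
load_dblp(io.BytesIO(SAMPLE), conn)
print(career_lengths(conn))
```

Once the records are in tables, attributes such as career length become a single aggregate query, which is the kind of efficient querying the description refers to.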
author2 Kong Wai-Kin Adams
author_facet Kong Wai-Kin Adams
Toh, Joel Zhu Er
format Final Year Project
author Toh, Joel Zhu Er
author_sort Toh, Joel Zhu Er
title Information extraction from bibliography data
publishDate 2018
url http://hdl.handle.net/10356/74009