On exploring and visualizing conference relationship

Bibliographic Details
Main Author: Lim, Jia Xing.
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2012
Online Access: http://hdl.handle.net/10356/48781
Institution: Nanyang Technological University
Description
Summary: The Digital Bibliography & Library Project (DBLP) is a computer science bibliography that holds records of millions of publications. An XML copy of the DBLP data is available for download from its official web page. Because the DBLP XML file is large, loading it into memory on every run of every program that uses it would be very inefficient. Furthermore, manipulating the DBLP XML data in memory would require a system with a large amount of memory. In this report, we propose a database design based on a study of the structure used to store data in the DBLP XML. Using this design, a program is created to migrate the data from the DBLP XML to a MySQL database. Once the data is in a database, various programs can be written to perform different kinds of data manipulation. In the context of this project, the report discusses how a program is written to extract conference-related information from the database and build a Lucene index on the extracted information. With the index created, similarities between conferences are computed based on the papers published and the authors of those papers. A graph based on the conference similarities is then generated and visualized.
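
The similarity and graph steps described in the summary can be illustrated with a minimal sketch. The summary does not say which similarity measure the report uses, so Jaccard similarity over author sets is an assumption here. The function names, the threshold value, and the toy conference data are also hypothetical. The report itself works from a Lucene index rather than in-memory sets.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(authors_by_conf, threshold=0.2):
    """Return (conf1, conf2, score) for each conference pair scoring >= threshold.

    authors_by_conf maps a conference name to the set of its authors.
    The threshold is an illustrative assumption, not a value from the report.
    """
    edges = []
    for c1, c2 in combinations(sorted(authors_by_conf), 2):
        score = jaccard(authors_by_conf[c1], authors_by_conf[c2])
        if score >= threshold:
            edges.append((c1, c2, score))
    return edges


# Hypothetical toy data standing in for author lists extracted from the database.
authors_by_conf = {
    "ConfA": {"alice", "bob", "carol"},
    "ConfB": {"bob", "carol", "dave"},
    "ConfC": {"erin"},
}

edges = similarity_edges(authors_by_conf)
for c1, c2, score in edges:
    print(f"{c1} -- {c2}: {score:.2f}")
```

Each returned tuple can be read as a weighted edge between two conference nodes, which is the form a graph drawing tool would typically consume.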