Document reference and citation analysis

With knowledge growing rapidly as more papers are published, researchers find retrieving papers relevant to their research domain a time-consuming task. The aim of this project is to ease the search for relevant papers, with the idea that a researcher should just...

Bibliographic Details
Main Author: Yap, Lina.
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2013
Subjects:
Online Access: http://hdl.handle.net/10356/52087
Institution: Nanyang Technological University
id sg-ntu-dr.10356-52087
record_format dspace
spelling sg-ntu-dr.10356-52087 2023-03-03T20:29:24Z Document reference and citation analysis Yap, Lina. Sun Aixin School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Bachelor of Engineering (Computer Science) 2013-04-22T06:38:21Z 2013-04-22T06:38:21Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/52087 en Nanyang Technological University 58 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Yap, Lina.
Document reference and citation analysis
description With knowledge growing rapidly as more papers are published, researchers find retrieving papers relevant to their research domain a time-consuming task. The aim of this project is to ease the search for relevant papers, with the idea that a researcher should just retrieve an article of interest and the system should recommend related articles based on it. With this idea in mind, three main goals were outlined: first, retrieve articles within two hops of the selected document; then, find clusters of similar papers defined by a set of features; and finally, rank and recommend papers. For experimental purposes, the data set used in this project was from the field of Biomedical and Life Sciences, downloaded from the PMC Open Access Subset. The features selected to cluster the data set were the bibliographic coupling degree, the degree of similarity between the titles' topic vectors, and the degree of similarity between the abstracts' topic vectors. Bibliographic coupling degree is defined as the number of matching outgoing citations between two articles. Topic vectors, obtained through a topic modelling library, were compared by the cosine of the angle between them to determine the degree of similarity. This project also employed a NoSQL database, MongoDB, for persisting articles. One limitation of this project was evaluating the relevance of the recommendations. While the PMC Open Access Subset provided a data set large enough to gather sufficient articles from two hops, evaluating whether the recommended articles were relevant would require personnel highly knowledgeable in the field. Moreover, the prototype was limited to only three features and could have been enhanced by using better criteria to select better results. For example, the weight of each citation in the feature vector could be represented by the number of times it is referenced in an article, instead of representing the feature vector in binary form.
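The two similarity measures named in the description can be illustrated with a minimal Python sketch. This is not the project's code: the function names, the list-of-identifiers input format, and the min-count form of the weighted variant are all assumptions made for illustration. The record does not say which topic modelling library was used, so topic vectors are represented here as plain lists of floats.

```python
import math
from collections import Counter


def coupling_degree(refs_a, refs_b):
    """Bibliographic coupling degree: number of distinct outgoing
    citations that two articles share (binary representation)."""
    return len(set(refs_a) & set(refs_b))


def weighted_coupling(refs_a, refs_b):
    """Hypothetical weighted variant: each shared citation contributes
    the smaller of its in-text mention counts in the two articles.
    The min-count rule is an assumption, not taken from the record."""
    counts_a, counts_b = Counter(refs_a), Counter(refs_b)
    shared = counts_a.keys() & counts_b.keys()
    return sum(min(counts_a[r], counts_b[r]) for r in shared)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length topic vectors.
    Returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With reference lists given as sequences of citation identifiers, `coupling_degree` counts each shared reference once, while `weighted_coupling` also reflects how often each shared reference is mentioned, which is one possible reading of the enhancement suggested at the end of the description.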
author2 Sun Aixin
author_facet Sun Aixin
Yap, Lina.
format Final Year Project
author Yap, Lina.
author_sort Yap, Lina.
title Document reference and citation analysis
title_short Document reference and citation analysis
title_full Document reference and citation analysis
title_fullStr Document reference and citation analysis
title_full_unstemmed Document reference and citation analysis
title_sort document reference and citation analysis
publishDate 2013
url http://hdl.handle.net/10356/52087
_version_ 1759854551087710208