Document reference and citation analysis

With knowledge growing rapidly as more papers are published, researchers find retrieving papers relevant to their research domain time-consuming. This project aims to ease the search for relevant papers, with the idea that a researcher should just...

Bibliographic Details
Main Author:	Yap, Lina.
Other Authors:	Sun Aixin
Format:	Final Year Project
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Online Access:	http://hdl.handle.net/10356/52087
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-52087
record_format	dspace
spelling	sg-ntu-dr.10356-520872023-03-03T20:29:24Z Document reference and citation analysis Yap, Lina. Sun Aixin School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Bachelor of Engineering (Computer Science) 2013-04-22T06:38:21Z 2013-04-22T06:38:21Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/52087 en Nanyang Technological University 58 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
spellingShingle	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Yap, Lina. Document reference and citation analysis
description	With knowledge growing rapidly as more papers are published, researchers find retrieving papers relevant to their research domain time-consuming. This project aims to ease the search for relevant papers, with the idea that a researcher should just retrieve one article of interest and the system should recommend related articles based on it. With this in mind, three main goals were outlined: first, retrieve two hops of articles from the selected document; then, find clusters of similar papers defined by a set of features; and finally, rank and recommend papers. For experimental purposes, the data set was drawn from the field of Biomedical and Life Sciences and downloaded from the PMC Open Access Subset. Three features were selected to cluster the data set: bibliographic coupling degree, the degree of similarity between the titles' topic vectors, and the degree of similarity between the abstracts' topic vectors. Bibliographic coupling degree is defined as the number of matching outgoing citations between two articles. Topic vectors, obtained through a topic modelling library, were compared by the cosine of the angle between them to determine the degree of similarity. The project also used a NoSQL database, MongoDB, to persist articles. One limitation of the project was evaluating the relevance of the recommendations. While the PMC Open Access Subset provided a data set large enough to gather sufficient articles within two hops, evaluating whether the recommended articles were relevant would require personnel highly knowledgeable in the field. Moreover, the prototype was limited to three features and could be enhanced with better selection criteria. For example, the weight of each citation in the feature vector could be the number of times it is referenced in an article, instead of a binary value.
author2	Sun Aixin
author_facet	Sun Aixin Yap, Lina.
format	Final Year Project
author	Yap, Lina.
author_sort	Yap, Lina.
title	Document reference and citation analysis
title_short	Document reference and citation analysis
title_full	Document reference and citation analysis
title_fullStr	Document reference and citation analysis
title_full_unstemmed	Document reference and citation analysis
title_sort	document reference and citation analysis
publishDate	2013
url	http://hdl.handle.net/10356/52087
_version_	1759854551087710208
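
The description defines two of the pairwise measures used for clustering: bibliographic coupling degree (the number of matching outgoing citations) and cosine similarity between topic vectors. A minimal Python sketch of those definitions follows. It is an illustration only, not the project's code: the function names, the sample reference IDs, and the count-weighted variant (one possible reading of the enhancement suggested at the end of the description) are all assumptions.

```python
import math
from collections import Counter


def coupling_degree(refs_a, refs_b):
    """Bibliographic coupling degree: number of outgoing citations
    shared by two articles (binary presence, duplicates ignored)."""
    return len(set(refs_a) & set(refs_b))


def weighted_coupling(counts_a, counts_b):
    """Hypothetical count-weighted variant: each shared reference
    contributes the product of how often each article cites it."""
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[ref] * counts_b[ref] for ref in shared)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length topic vectors.
    Returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up reference IDs and in-text citation counts for two articles.
article_a = Counter({"ref1": 1, "ref2": 3, "ref3": 1})
article_b = Counter({"ref2": 2, "ref3": 1, "ref4": 5})

binary = coupling_degree(article_a.keys(), article_b.keys())
weighted = weighted_coupling(article_a, article_b)
```

With binary weights the two sample articles share two references; the weighted variant additionally rewards references that both articles cite repeatedly.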
