Document reference and citation analysis
With knowledge growing rapidly as more papers are published, researchers find retrieving papers relevant to their research domain a time-consuming task. This project aims to ease the search for relevant papers, based on the idea that a researcher should just...
Main Author: | Yap, Lina. |
---|---|
Other Authors: | Sun Aixin |
Format: | Final Year Project |
Language: | English |
Published: | 2013 |
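Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |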
Online Access: | http://hdl.handle.net/10356/52087 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-52087 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-52087 2023-03-03T20:29:24Z Document reference and citation analysis Yap, Lina. Sun Aixin School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Bachelor of Engineering (Computer Science) 2013-04-22T06:38:21Z 2013-04-22T06:38:21Z 2012 2012 Final Year Project (FYP) http://hdl.handle.net/10356/52087 en Nanyang Technological University 58 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Yap, Lina. Document reference and citation analysis |
description |
With knowledge growing rapidly as more papers are published, researchers find retrieving papers relevant to their research domain a time-consuming task. This project aims to ease the search for relevant papers, based on the idea that a researcher should just retrieve an article of interest, and the system should recommend related articles based on that article.
With this idea in mind, the main goals were to first retrieve two hops of articles starting from the selected document, then find clusters of similar papers defined by a set of features, and finally rank and recommend papers. For experimental purposes, the data set used in this project was drawn from the field of Biomedical and Life Sciences and downloaded from the PMC Open Access Subset.
Three features were selected to cluster the data set: the bibliographic coupling degree, the degree of similarity of the title's topic vector, and the degree of similarity of the abstract's topic vector. The bibliographic coupling degree is defined as the number of matching outgoing citations between two articles. Topic vectors were obtained through a topic-modelling library, and the cosine of the angle between two vectors was computed to determine their degree of similarity. This project also employed a NoSQL database, MongoDB, for persisting articles.
One of the limitations of this project was evaluating the relevance of the recommendations. While the PMC Open Access Subset provided a data set large enough to gather sufficient articles from two hops, evaluating whether the recommended articles were relevant would require personnel who are highly knowledgeable in this field. Moreover, the prototype was limited to only three features and could have been further enhanced by using better criteria to select better results. For example, the weight of each citation in the feature vector could be represented by a count of the number of times it is referenced in an article, instead of representing the feature vector in binary form. |
author2 |
Sun Aixin |
author_facet |
Sun Aixin Yap, Lina. |
format |
Final Year Project |
author |
Yap, Lina. |
author_sort |
Yap, Lina. |
title |
Document reference and citation analysis |
title_short |
Document reference and citation analysis |
title_full |
Document reference and citation analysis |
title_fullStr |
Document reference and citation analysis |
title_full_unstemmed |
Document reference and citation analysis |
title_sort |
document reference and citation analysis |
publishDate |
2013 |
url |
http://hdl.handle.net/10356/52087 |
_version_ |
1759854551087710208 |
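
The description above defines the bibliographic coupling degree as the number of matching outgoing citations between two articles, and compares topic vectors by the cosine of the angle between them. The following is a minimal Python sketch of those two measures, not the project's actual implementation. All function names and the sample identifiers are hypothetical. The weighted variant is only one possible reading of the count-based enhancement suggested in the description: taking the smaller of the two reference counts for each shared citation is an assumption made here for illustration.

```python
import math
from collections import Counter


def coupling_degree(refs_a, refs_b):
    """Binary form: number of distinct outgoing citations shared by two articles."""
    return len(set(refs_a) & set(refs_b))


def weighted_coupling_degree(refs_a, refs_b):
    """Count-based variant (assumption): each shared citation contributes the
    smaller of the number of times it is referenced in either article."""
    counts_a, counts_b = Counter(refs_a), Counter(refs_b)
    # Counter & Counter keeps, for each shared key, the minimum of the two counts.
    return sum((counts_a & counts_b).values())


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length topic vectors."""
    if len(u) != len(v):
        raise ValueError("topic vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical citation lists; repeated entries model repeated in-text references.
    article_a = ["p1", "p2", "p2", "p3"]
    article_b = ["p2", "p2", "p2", "p3", "p4"]
    print(coupling_degree(article_a, article_b))
    print(weighted_coupling_degree(article_a, article_b))
    print(cosine_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```

In this sketch the binary form treats every shared citation equally, while the count-based form gives more weight to a citation that both articles reference repeatedly.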