Information retrieval in blogs

Blogs have grown explosively, which makes the study of information retrieval (IR) in blogs increasingly important: how can required and meaningful information be found effectively in huge, raw blog datasets? TREC (Text Retrieval Conference) is an annual conferenc...

Bibliographic Details
Main Author:	Tan, Kia Poh.
Other Authors:	Tsai Flora S
Format:	Final Year Project
Language:	English
Published:	2009
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Online Access:	http://hdl.handle.net/10356/16693
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-16693
record_format	dspace
spelling	sg-ntu-dr.10356-16693 2023-07-07T16:06:00Z Information retrieval in blogs Tan, Kia Poh. Tsai Flora S School of Electrical and Electronic Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Bachelor of Engineering 2009-05-28T02:16:25Z 2009-05-28T02:16:25Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/16693 en Nanyang Technological University 72 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
spellingShingle	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Tan, Kia Poh. Information retrieval in blogs
description	Blogs have grown explosively, which makes information retrieval (IR) in blogs an increasingly important research area: how can required and meaningful information be found effectively in huge, raw blog datasets? TREC (Text Retrieval Conference) is an annual conference with several tracks, each of which researches a particular domain of text retrieval. The TREC Blog Track was created in 2006 to investigate information-seeking behavior in the blog domain, and it now comprises several tasks covering different aspects of blogs. This project focuses on the blog distillation (feed search) task, which was designed to find feeds that have a principal and recurring interest in a particular topic (query), so that the user may want to subscribe to those feeds in a feed reader. Most groups participating in this task use the Terrier search engine, which is designed to handle most of the TREC datasets. In this project, however, the author tries a novel approach that does not involve Terrier at all. Instead, all the data involved is converted from file format to database format for higher reusability, portability and extensibility. As a result, any existing program or algorithm that can access a database can work with this approach. The well-known Rocchio algorithm is implemented to test the performance of this approach, and the results are quite promising. Further study is required to substantiate the idea, and the anticipated outcome is rewarding.
author2	Tsai Flora S
author_facet	Tsai Flora S Tan, Kia Poh.
format	Final Year Project
author	Tan, Kia Poh.
author_sort	Tan, Kia Poh.
title	Information retrieval in blogs
title_sort	information retrieval in blogs
publishDate	2009
url	http://hdl.handle.net/10356/16693
_version_	1772825844297760768
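
The description mentions that the Rocchio algorithm was implemented to test the approach, but this record does not include the project's code. As a rough illustration only, the textbook form of Rocchio relevance feedback can be sketched as follows. The function name `rocchio`, the dictionary-based term vectors, and the weights `alpha`, `beta` and `gamma` (set here to commonly quoted defaults) are assumptions made for this sketch and are not taken from the project.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector as a dict of term -> weight.

    query:        dict of term -> weight for the original query
    relevant:     list of dicts (term -> weight) judged relevant
    non_relevant: list of dicts (term -> weight) judged non-relevant
    alpha, beta, gamma: illustrative weights (assumed, not from the source)
    """
    updated = Counter()

    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        updated[term] += alpha * weight

    # Add the centroid of relevant documents and subtract the centroid
    # of non-relevant documents.
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue  # an empty set contributes nothing
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)

    # Terms whose weight is not positive are dropped, a common convention.
    return {term: weight for term, weight in updated.items() if weight > 0}


if __name__ == "__main__":
    new_query = rocchio(
        query={"blog": 1.0},
        relevant=[{"blog": 1.0, "feed": 2.0}],
        non_relevant=[{"spam": 1.0}],
    )
    print(new_query)
```

In a database-backed setup like the one the description outlines, the term vectors would presumably be read from tables rather than built by hand. That part is omitted here because the record gives no schema.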
