Information retrieval in blogs

Bibliographic Details
Main Author:	Tan, Kia Poh.
Other Authors:	Tsai, Flora S.
Format:	Final Year Project
Language:	English
Published:	2009
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Online Access:	http://hdl.handle.net/10356/16693
Institution:	Nanyang Technological University

Description
Summary:	The number of blogs has grown explosively, which makes information retrieval (IR) in blogs an increasingly important research area: how can required and meaningful information be found effectively in large, raw blog datasets? TREC (Text REtrieval Conference) is an annual conference with several tracks, each of which investigates a particular domain of text retrieval. The TREC Blog Track was created in 2006 to investigate information-seeking behavior in the blog domain, and it now comprises several tasks covering different aspects of blogs. This project focuses on the blog distillation (feed search) task, which aims to find feeds that have a principal and recurring interest in a particular topic (query), so that the user may want to subscribe to them in a feed reader. Most groups participating in this task use the Terrier search engine, which is designed to handle most of the TREC datasets. In this project, the author instead tries a novel approach that does not involve Terrier at all: all the data is converted from file format to database format for greater reusability, portability and extensibility. As a result, any existing program or algorithm that can access a database can work with this approach. The well-known Rocchio algorithm is implemented to test the performance of this approach, and the results are quite promising. Further study is required to substantiate the idea, and the outcome is expected to be rewarding.
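
For readers unfamiliar with the Rocchio algorithm named in the summary, the following is a minimal sketch of its standard relevance-feedback form, in which a query vector is moved toward the centroid of relevant documents and away from the centroid of non-relevant ones. It is not the project's implementation: the function name `rocchio`, the dictionary-based term vectors, and the default weights (alpha=1.0, beta=0.75, gamma=0.15) are assumptions chosen for illustration, and the record does not say how the author parameterized or applied the algorithm.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector as a dict of term -> weight.

    Hypothetical helper for illustration only. `query` is a dict of
    term weights; `relevant` and `non_relevant` are lists of such dicts.
    The default alpha/beta/gamma values are assumed, not from the source.
    """
    updated = defaultdict(float)

    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        updated[term] += alpha * weight

    # Add the relevant centroid (scaled by beta) and subtract the
    # non-relevant centroid (scaled by gamma).
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue  # an empty set contributes nothing
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)

    # Terms whose weight is not positive are dropped, a common convention.
    return {term: weight for term, weight in updated.items() if weight > 0}


if __name__ == "__main__":
    query = {"python": 1.0}
    relevant = [{"python": 1.0, "django": 2.0}, {"python": 1.0}]
    non_relevant = [{"snake": 4.0}]
    print(rocchio(query, relevant, non_relevant))
```

In a database-backed setup like the one the summary describes, the term vectors would presumably be loaded from database tables rather than built by hand, but that detail is not given in the record.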
