Information retrieval in blogs

The explosive growth of blogs has made information retrieval (IR) in blogs an increasingly important research area: how to search effectively for relevant, meaningful information in large, raw blog datasets. TREC (Text Retrieval Conference) is an annual conferenc...

Bibliographic Details
Main Author: Tan, Kia Poh.
Other Authors: Tsai Flora S
Format: Final Year Project
Language: English
Published: 2009
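Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval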
Online Access: http://hdl.handle.net/10356/16693
Institution: Nanyang Technological University
id sg-ntu-dr.10356-16693
record_format dspace
spelling sg-ntu-dr.10356-16693 2023-07-07T16:06:00Z Information retrieval in blogs Tan, Kia Poh. Tsai Flora S School of Electrical and Electronic Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Bachelor of Engineering 2009-05-28T02:16:25Z 2009-05-28T02:16:25Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/16693 en Nanyang Technological University 72 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
description The explosive growth of blogs has made information retrieval (IR) in blogs an increasingly important research area: how to search effectively for relevant, meaningful information in large, raw blog datasets. TREC (Text Retrieval Conference) is an annual conference with several tracks, each devoted to a particular domain of text retrieval. The TREC Blog Track was created in 2006 to investigate information-seeking behavior in the blog domain, and it now comprises several tasks covering different aspects of blogs. This project focuses on the blog distillation (feed search) task, which aims to find feeds with a principal and recurring interest in a given topic (query), so that a user may want to subscribe to them in a feed reader. Most groups participating in this task use the Terrier search engine, which is built to handle most of the TREC datasets. In this project, by contrast, the author tries a novel approach that does not involve Terrier at all. Instead, all the data involved is converted from file format to database format for greater reusability, portability and extensibility, so that any existing program or algorithm able to access a database can work with it. The well-known Rocchio algorithm is implemented to test the performance of this approach, and the results are quite promising. Further study is required to substantiate the idea, and the anticipated outcome is rewarding.
author2 Tsai Flora S
author_facet Tsai Flora S
Tan, Kia Poh.
format Final Year Project
author Tan, Kia Poh.
author_sort Tan, Kia Poh.
title Information retrieval in blogs
title_sort information retrieval in blogs
publishDate 2009
url http://hdl.handle.net/10356/16693
_version_ 1772825844297760768