Information retrieval in blogs
Blogs have grown explosively, making information retrieval (IR) in blogs an increasingly important research area: how to search huge, raw blog datasets effectively for required and meaningful information. TREC (Text Retrieval Conference) is an annual conferenc...
Main Author: | Tan, Kia Poh. |
---|---|
Other Authors: | Tsai Flora S |
Format: | Final Year Project |
Language: | English |
Published: | 2009 |
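Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |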
Online Access: | http://hdl.handle.net/10356/16693 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-16693 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-16693 2023-07-07T16:06:00Z Information retrieval in blogs Tan, Kia Poh. Tsai Flora S School of Electrical and Electronic Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Bachelor of Engineering 2009-05-28T02:16:25Z 2009-05-28T02:16:25Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/16693 en Nanyang Technological University 72 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
description |
Blogs have grown explosively, making information retrieval (IR) in blogs an increasingly important research area: how to search huge, raw blog datasets effectively for required and meaningful information. TREC (Text Retrieval Conference) is an annual conference with several tracks, each of which studies a particular domain of text retrieval. The TREC Blog Track was created in 2006 to investigate information-seeking behavior in the blog domain, and it now comprises several tasks covering different aspects of blogs. This project studies the blog distillation (feed search) task, which aims to find feeds that have a principal and recurring interest in a particular topic (the query), so that a user may want to subscribe to them in a feed reader. Most groups participating in this task use the Terrier search engine, which is designed to handle most of the TREC datasets. This project instead tries a novel approach that does not involve Terrier at all: all of the data is converted from file format to database format for greater reusability, portability and extensibility, so that any existing program or algorithm that can access a database can work with it. The well-known Rocchio algorithm is implemented to test the performance of this approach, and the results are quite promising. Further study is required to substantiate the idea, and the anticipated outcome is rewarding. |
author2 |
Tsai Flora S |
format |
Final Year Project |
author |
Tan, Kia Poh. |
title |
Information retrieval in blogs |
publishDate |
2009 |
url |
http://hdl.handle.net/10356/16693 |
_version_ |
1772825844297760768 |
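
The description outlines the approach (blog data held in a database, Rocchio relevance feedback, ranking at feed level) but this record gives no schema, parameters, or scoring details. The following is only a minimal sketch of how those pieces might fit together, not the project's implementation. The `posts` table layout, the sample rows, the term-frequency vectors, the dot-product scoring, the sum-over-posts feed aggregation, and the weights 1.0/0.75/0.15 are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter, defaultdict

# Assumed textbook-style weights; the record does not state the values used.
ALPHA, BETA, GAMMA = 1.0, 0.75, 0.15


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def rocchio(query_vec, relevant, non_relevant,
            alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Rocchio update: q' = alpha*q + beta*mean(rel) - gamma*mean(nonrel).

    Terms whose final weight is not positive are dropped.
    """
    new_q = defaultdict(float)
    for term, weight in query_vec.items():
        new_q[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new_q[term] += coeff * weight / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}


def score(query_vec, doc_vec):
    """Dot product between a query vector and a document vector."""
    return sum(w * doc_vec.get(t, 0) for t, w in query_vec.items())


def rank_feeds(conn, query_vec):
    """Sum post scores per feed; return feed ids, best first (ties by id)."""
    totals = defaultdict(float)
    for feed_id, body in conn.execute("SELECT feed_id, body FROM posts"):
        totals[feed_id] += score(query_vec, tf_vector(body))
    return sorted(totals, key=lambda f: (-totals[f], f))


def build_demo_db():
    """Hypothetical schema and rows standing in for the converted blog data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "post_id INTEGER PRIMARY KEY, feed_id TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts (feed_id, body) VALUES (?, ?)",
        [
            ("feed_a", "jazz concert review"),
            ("feed_a", "new jazz album saxophone"),
            ("feed_b", "football match report"),
            ("feed_c", "saxophone lessons for beginners"),
        ],
    )
    return conn
```

In this toy setup, a query for "jazz" initially matches only `feed_a`. After feedback that marks the saxophone post as relevant and the football post as non-relevant, the expanded query also gives `feed_c` a positive score, so it moves ahead of `feed_b`. Because the data sits behind plain SQL, any other scoring function that can read the `posts` table could be swapped in for `score`.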