Information retrieval in blogs
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2009 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/16693 |
Institution: | Nanyang Technological University |
Summary: | Blogs have grown explosively in recent years, which makes information retrieval (IR) in blogs an increasingly important research area: how can required and meaningful information be found effectively in huge, raw blog datasets? TREC (Text REtrieval Conference) is an annual conference with several tracks, each of which investigates a particular domain of text retrieval. The TREC Blog Track was created in 2006 to investigate information-seeking behavior in the blog domain, and it now comprises several tasks covering different aspects of blogs. This project focuses on the blog distillation (feed search) task, which aims to find feeds that have a principal and recurring interest in a particular topic (query), so that the user may want to subscribe to those feeds in a feed reader. Most groups participating in this task use the Terrier search engine, which is geared to handling most of the TREC datasets. In this project, however, the author tries a novel approach that does not involve Terrier at all. Instead, all the data involved is converted from file format to database format for greater reusability, portability and extensibility. As a result, any existing program or algorithm that can access a database can work with this approach. The well-known Rocchio algorithm is implemented to test the performance of this approach, and the results are quite promising. Further study is required to substantiate the idea, and the anticipated outcome is rewarding. |
---|
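
The summary describes running the Rocchio algorithm over blog data held in a database rather than in files. As a rough illustration only, the sketch below shows what that combination could look like with Python's built-in `sqlite3` module. The record does not describe the author's schema, term weighting or parameter values, so the `term_weights` table, its column names, the helper function names and the `alpha`, `beta` and `gamma` values are all assumptions made for this example.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create an in-memory database with a hypothetical term_weights table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term_weights (doc_id TEXT, term TEXT, weight REAL)")
    conn.executemany(
        "INSERT INTO term_weights VALUES (?, ?, ?)",
        [
            ("feed1", "python", 1.0),
            ("feed1", "django", 2.0),
            ("feed2", "python", 2.0),
            ("feed2", "snake", 4.0),
        ],
    )
    return conn


def load_vectors(conn):
    """Read (doc_id, term, weight) rows into {doc_id: {term: weight}}."""
    vectors = defaultdict(dict)
    rows = conn.execute("SELECT doc_id, term, weight FROM term_weights")
    for doc_id, term, weight in rows:
        vectors[doc_id][term] = weight
    return dict(vectors)


def centroid(docs):
    """Average a list of sparse term-weight vectors; empty list gives {}."""
    if not docs:
        return {}
    total = defaultdict(float)
    for vec in docs:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*q + beta*centroid(rel) - gamma*centroid(non_rel).

    Terms whose updated weight is not positive are dropped, a common
    convention that is assumed here rather than taken from the record.
    """
    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    updated = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (
            alpha * query.get(term, 0.0)
            + beta * rel_c.get(term, 0.0)
            - gamma * non_c.get(term, 0.0)
        )
        if weight > 0:
            updated[term] = weight
    return updated


if __name__ == "__main__":
    vectors = load_vectors(build_demo_db())
    # Relevance judgments are hard-coded for the demo.
    new_query = rocchio(
        {"python": 1.0},
        relevant=[vectors["feed1"]],
        non_relevant=[vectors["feed2"]],
        alpha=1.0, beta=0.5, gamma=0.25,
    )
    print(sorted(new_query.items()))
```

Because the vectors are read through an ordinary SQL query, the same `rocchio` function would work unchanged against any database that exposes comparable rows, which is the kind of reuse the summary attributes to the database format.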