Spatial aware information retrieval for community-based Q&A (SIRCQA)
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2012 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/50820 |
Institution: | Nanyang Technological University |
Summary: | As social media grow in popularity on the web, an abundance of user-generated content has become available that Information Retrieval (IR) can exploit to satisfy information needs. One particular form of social media, Community-based Question and Answering (CQA), is a promising source to explore. It provides an online platform for users to ask questions and for other users to post answers. Because similar questions may be asked by multiple users, the CQA archive can be searched for previously answered questions when a later user expresses a similar query. One increasingly popular technique for improving IR is Spatial Aware Retrieval. In addition to matching the keywords of the query terms, this method compares the spatial context found in a document with the spatial context intended by the query. This further increases user satisfaction by promoting more relevant search results for queries that require spatial context-dependent information. Although much research has been done on spatial aware retrieval for web documents, very little has investigated CQA entries. In this report, the techniques of Spatial Aware Retrieval were therefore analyzed to propose a suitable retrieval operation for Spatial Aware Information Retrieval for CQA (SIRCQA). Three considerations were examined. First, traditional IR is known to treat every term equally (bag of words) when performing keyword matching; this study analyzes the effect of placing different emphasis on the location terms identified in the query. Next, a retrieval operation was proposed that builds on the IR similarity function by combining the weighted scores of the documents (CQA entries) from both textual and spatial relevance. Lastly, the study examined whether to incorporate query expansion to add other similar or implied location terms deduced from the spatial relationship cue (term) found in queries. From the findings, effective measures were constructed and used to improve the search results. The proposed retrieval operations were evaluated on the Yahoo! Answers dataset using the Precision@K and Mean Average Precision (MAP) performance metrics. The experimental results showed that the proposed retrieval operation performed significantly better in precision (t-test: p<0.05) than conventional "bag of words" retrieval, and that it offered improved flexibility in handling multiple locations. |
---|
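
The summary names Precision@K and Mean Average Precision (MAP) as the evaluation metrics. As a rough illustration only, the sketch below computes both using their standard textbook definitions; it is not taken from the report. The function names, the document identifiers, and the convention of dividing by K even when fewer than K results are returned are all assumptions made for this example.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant.

    Assumed convention: the denominator is always k, even when the
    ranking contains fewer than k items.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Mean of the precision values taken at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Average of average_precision over a list of (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical CQA entry identifiers, for illustration only.
ranking_1 = ["q3", "q7", "q1", "q9"]
relevant_1 = {"q3", "q1"}
ranking_2 = ["a", "b"]
relevant_2 = {"b"}

p_at_2 = precision_at_k(ranking_1, relevant_1, 2)
ap_1 = average_precision(ranking_1, relevant_1)
map_score = mean_average_precision([(ranking_1, relevant_1), (ranking_2, relevant_2)])
```

Under these assumptions, a query whose relevant entries sit at ranks 1 and 3 scores a Precision@2 of 0.5, because only one of the top two results is relevant.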