Evaluation of query reformulation strategies for domain-specific information searches: a case study of the durian fruit domain / Azilawati Azizan
Although search engine technologies have made great strides in helping users find information on the Web, search results are only as good as the keywords and phrases that users use in the search query. Hence, search queries need to be precisely formulated. However, users often fail to accurately transl...
Main Author: | Azizan, Azilawati |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | Programming. Rule-based programming. Backtrack programming |
Online Access: | https://ir.uitm.edu.my/id/eprint/78545/1/78545.pdf https://ir.uitm.edu.my/id/eprint/78545/ |
Institution: | Universiti Teknologi Mara |
id |
my.uitm.ir.78545 |
---|---|
record_format |
eprints |
spelling |
my.uitm.ir.78545 2023-05-29T04:20:16Z https://ir.uitm.edu.my/id/eprint/78545/ Azizan, Azilawati Programming. Rule-based programming. Backtrack programming 2022 Thesis NonPeerReviewed text en https://ir.uitm.edu.my/id/eprint/78545/1/78545.pdf Evaluation of query reformulation strategies for domain-specific information searches: a case study of the durian fruit domain / Azilawati Azizan. (2022) PhD thesis, Universiti Teknologi MARA (UiTM). |
institution |
Universiti Teknologi Mara |
building |
Tun Abdul Razak Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Mara |
content_source |
UiTM Institutional Repository |
url_provider |
http://ir.uitm.edu.my/ |
language |
English |
topic |
Programming. Rule-based programming. Backtrack programming |
description |
Although search engine technologies have made great strides in helping users find information on the Web, search results are only as good as the keywords and phrases that users use in the search query. Hence, search queries need to be precisely formulated. However, users often fail to accurately translate their information needs into correct query words or phrases for a search engine to utilize. This becomes harder when users search for domain-specific information because, in most cases, they are unable to identify keywords that are appropriate for the domain. As a result, the search engine is unable to locate the relevant documents, and users reformulate the query multiple times in the hope of retrieving a more relevant set of search results. To address this issue, many researchers propose the use of query reformulation, query refinement, query expansion, or query disambiguation to intentionally build better queries and retrieve more relevant results. However, most of the strategies employed to tackle this issue, such as query logs, rhetorical structure, thesauri, WordNet, ontologies, and user profiles, require extensive sources, are risky, and are time-consuming. Therefore, more effective and simpler techniques are needed to obtain better search results as well as reduce the need for query reformulation (QR). To that end, this study applied a search engine framework that employs standard Information Retrieval (IR) methodology to evaluate several reformulation strategies, and it proposes an operative and effective QR strategy to locate domain-specific information. The fruit domain, specifically durian, was chosen as the case study. An investigation was first conducted to confirm that the issues, as well as the selected domain, were still pertinent at the time of the study. Several popular commercial search engines were examined to determine their current search performance in locating domain-specific information on the Web. A group of users was then selected to conduct a task-based search to examine how users structured their queries to fulfil their search intent. The results indicated that the most popular search engine (Google) had an average P@10 score of only 0.463 and a mean average precision (MAP) score of 0.649 when searching for durian-related information. The results of the task-based search showed that 84.82% of users reformulated their queries, clearly indicating that users do not obtain relevant search results on the first few tries. As such, several QR strategies that may produce better search results were investigated. Nine strategies were examined using features such as query keywords, ontology, the characteristic category of the domain, and the domain name. These features were manipulated using techniques such as ‘generalization’, ‘specification’, and ‘new’. Of the nine strategies examined, three outperformed the baseline. Combining query keywords with ontology significantly surpassed the baseline MAP score by 2.65%. More interestingly, the characteristic category of the domain, which is considerably simpler and easier to use, also outperformed the baseline MAP score by 2.63%. The findings of this study contribute to the field of IR through the performance of search engines, user behaviour, test collection, and reformulation strategies in searching for domain-specific information. |
format |
Thesis |
author |
Azizan, Azilawati |
title |
Evaluation of query reformulation strategies for domain-specific information searches: a case study of the durian fruit domain / Azilawati Azizan |
publishDate |
2022 |
url |
https://ir.uitm.edu.my/id/eprint/78545/1/78545.pdf https://ir.uitm.edu.my/id/eprint/78545/ |
_version_ |
1768011716012015616 |
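
The description above reports results as P@10 and mean average precision (MAP). As a reading aid, the following is a minimal Python sketch of how these two measures are conventionally computed from binary relevance judgments. It is not taken from the thesis: the function names and the two example judgment lists are hypothetical, and the sketch assumes that all relevant documents for a query appear in the judged ranking unless `total_relevant` is supplied.

```python
from typing import Optional, Sequence


def precision_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Fraction of the top-k ranked results judged relevant (1) rather than not (0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Standard P@k divides by k even if fewer than k results were returned.
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int],
                      total_relevant: Optional[int] = None) -> float:
    """Mean of the precision values taken at each rank holding a relevant result.

    Assumption: if total_relevant is not given, the number of relevant
    results found in the ranking is used as the denominator.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Average of the per-query average precision over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical judgments for two queries (1 = relevant, 0 = not relevant).
query_1 = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
query_2 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print("P@10 (query 1):", precision_at_k(query_1, k=10))
print("MAP (both queries):", mean_average_precision([query_1, query_2]))
```

P@10 rewards any relevant result within the first ten positions equally, whereas MAP also rewards placing relevant results nearer the top, which is one reason the two scores for the same search engine can differ.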