Knowledge discovery from forum data

Bibliographic Details
Main Author: Li, Jun
Other Authors: Sun, Aixin
Format: Theses and Dissertations
Language: English
Published: 2015
Online Access: http://hdl.handle.net/10356/62935
Institution: Nanyang Technological University
Description
Summary: Advances in information retrieval and data mining have produced increasingly useful mechanisms for retrieving the most relevant information from documents and for discovering knowledge in them. The knowledge embedded in online forums, a knowledge-rich data source, has yet to be fully utilized because most existing forum platforms provide only limited search functionality. This project provides a prototype solution that improves the search functions of online forums. Specifically, a multithreaded Crawler and a Parser were implemented to download and parse the posts, in HTML format, published in a local forum. A Topic Modeler built on the MALLET package generates the high-level topics of the forum data. An Indexer and a Searcher, both built on Lucene, support searching over the forum data. A web search interface that supports sophisticated search requests and faceted visualization of search results lets users discover knowledge in online forums. As a result, users can search for relevant information with simple (e.g. single-keyword) as well as sophisticated queries. The interface also presents a high-level view of the search results in aggregated, multi-facet visual form and, through topic modeling, helps users understand the high-level topics of those results. Together, these features help users find relevant information more effectively and efficiently. The study closes by identifying a few limitations that were not tackled because of the project's scope and time constraints, and recommends ways to address them as future work.
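
The index, search, and facet steps described in the summary can be illustrated with a small sketch. This is not the thesis implementation, which is built on Lucene and MALLET; it is a plain-Python stand-in that assumes whitespace tokenization and AND-style matching. The sample posts, the field names (`board`, `author`, `text`), and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical parsed posts; field names are illustrative, not from the thesis.
POSTS = [
    {"id": 1, "board": "Hardware", "author": "alice", "text": "Laptop battery drains fast"},
    {"id": 2, "board": "Hardware", "author": "bob", "text": "Best laptop for students"},
    {"id": 3, "board": "Travel", "author": "alice", "text": "Cheap flights and laptop bags"},
]


def build_index(posts):
    """Map each lowercased token to the set of post ids containing it."""
    index = defaultdict(set)
    for post in posts:
        for token in post["text"].lower().split():
            index[token].add(post["id"])
    return index


def search(index, query):
    """Return ids of posts containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


def facet_counts(posts, hits, field):
    """Count how many matching posts fall under each value of a facet field."""
    return Counter(post[field] for post in posts if post["id"] in hits)


INDEX = build_index(POSTS)
```

Calling `facet_counts(POSTS, search(INDEX, "laptop"), "board")` groups the matching posts by board, which is the kind of aggregated view a faceted search interface would display next to the result list.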