Extract, integrate and search healthcare knowledge from the web
Format: | Final Year Project |
---|---|
Language: | English |
Published: | 2013 |
Online Access: | http://hdl.handle.net/10356/52527 |
Institution: | Nanyang Technological University |
Summary: | In recent years, user-generated content has become increasingly popular on the World Wide Web. Community-based question-answering (CQA) portals, such as Yahoo! Answers and Wiki Answers, are a particular form of user-generated content where users can ask and answer questions and discover questions that have already been resolved.
As the use of different CQA portals grows, there is a need for a common avenue that aggregates information from these websites, so that users can retrieve answers without the hassle of visiting many sites. This matters especially in health care, where time can be a crucial factor. In addition, a better search technique for retrieving the most helpful health care information is important. Therefore, in this project, we look at how to extract and distill high-quality CQA knowledge from the Web to build a health care database, integrate the different types of data, and search the data to answer user queries.
To extract information, web crawlers written in Java are implemented to retrieve a total of six hundred thousand QA pairs from various websites. The pairs are stored as XML files, and questions without answers are removed before the remaining useful data is indexed with Lucene, a Java information retrieval library, so that it can be searched. |
---|
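The filtering step described in the summary, removing questions without answers from the stored XML before indexing, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the record does not describe the XML schema, so the element names `qaPairs`, `qa`, `question` and `answer`, the class `QaFilter`, and the placeholder sample data are all assumptions. It uses only the JDK's built-in XML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class QaFilter {

    /** One question-answer pair (hypothetical structure, for illustration only). */
    static final class QaPair {
        final String question;
        final String answer;

        QaPair(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    /** Returns the trimmed text of the first descendant element with the given tag, or "" if absent. */
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return "";
        }
        return nodes.item(0).getTextContent().trim();
    }

    /** Parses the XML and keeps only pairs that have a non-empty question and a non-empty answer. */
    static List<QaPair> answeredPairs(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Assumed schema: each <qa> element holds one <question> and optionally one <answer>.
        NodeList items = doc.getElementsByTagName("qa");
        List<QaPair> kept = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String question = childText(item, "question");
            String answer = childText(item, "answer");
            if (!question.isEmpty() && !answer.isEmpty()) {
                kept.add(new QaPair(question, answer));
            }
        }
        return kept;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder data: the second pair has no answer and is dropped.
        String xml = "<qaPairs>"
                + "<qa><question>Sample question 1</question><answer>Sample answer 1</answer></qa>"
                + "<qa><question>Sample question 2</question></qa>"
                + "</qaPairs>";
        for (QaPair pair : answeredPairs(xml)) {
            System.out.println(pair.question + " -> " + pair.answer);
        }
    }
}
```

In a pipeline like the one summarized above, the pairs that survive this filter would then be passed to the indexing stage (Lucene, according to the summary). That stage is left out here because it depends on an external library.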