Extract, integrate and search healthcare knowledge from the web

In recent years, user-generated content has become increasingly popular on the World Wide Web. Community-based question-answering (CQA) portals, such as Yahoo! Answers and Wiki Answers, are a particular form of user-generated content where users can ask and answer questions, and also discover questi...

Bibliographic Details
Main Author: Tan, Kang Zhuang.
Other Authors: School of Computer Engineering
Format: Final Year Project
Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/52527
Institution: Nanyang Technological University
id sg-ntu-dr.10356-52527
record_format dspace
spelling sg-ntu-dr.10356-52527 2023-03-03T20:40:55Z Extract, integrate and search healthcare knowledge from the web Tan, Kang Zhuang. School of Computer Engineering Cong Gao DRNTU::Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2013-05-15T04:15:32Z 2013-05-15T04:15:32Z 2013 2013 Final Year Project (FYP) http://hdl.handle.net/10356/52527 en Nanyang Technological University 49 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
spellingShingle DRNTU::Engineering::Computer science and engineering
Tan, Kang Zhuang.
Extract, integrate and search healthcare knowledge from the web
description In recent years, user-generated content has become increasingly popular on the World Wide Web. Community-based question-answering (CQA) portals, such as Yahoo! Answers and Wiki Answers, are a particular form of user-generated content where users can ask and answer questions and discover questions that have already been resolved. As the use of different CQA portals grows, there is a need for a common place that gathers information from these websites so that users can retrieve answers without the hassle of visiting many sites, especially in health care, where time can be a crucial factor. A better search technique for retrieving the most helpful health care information is also important. This project therefore looks at how to extract and distill high-quality CQA knowledge from the Web to build a health care database, integrate the different types of data, and search the data to answer user queries. To extract information, web crawlers written in Java were implemented to retrieve a total of six hundred thousand QA pairs from various websites. The pairs were stored as XML files, and questions without answers were removed before the remaining data was indexed with Lucene, a Java information retrieval library, so that it could be searched.
author2 School of Computer Engineering
author_facet School of Computer Engineering
Tan, Kang Zhuang.
format Final Year Project
author Tan, Kang Zhuang.
author_sort Tan, Kang Zhuang.
title Extract, integrate and search healthcare knowledge from the web
title_short Extract, integrate and search healthcare knowledge from the web
title_full Extract, integrate and search healthcare knowledge from the web
title_fullStr Extract, integrate and search healthcare knowledge from the web
title_full_unstemmed Extract, integrate and search healthcare knowledge from the web
title_sort extract, integrate and search healthcare knowledge from the web
publishDate 2013
url http://hdl.handle.net/10356/52527
_version_ 1759853440944570368