Extract, integrate and search healthcare knowledge from the web

In recent years, user-generated content has become increasingly popular on the World Wide Web. Community-based question-answering (CQA) portals, such as Yahoo! Answers and Wiki Answers, are a particular form of user-generated content where users can ask and answer questions, and also discover questi...

Bibliographic Details
Main Author:	Tan, Kang Zhuang.
Other Authors:	School of Computer Engineering
Format:	Final Year Project
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	http://hdl.handle.net/10356/52527
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-52527
record_format	dspace
spelling	sg-ntu-dr.10356-52527 2023-03-03T20:40:55Z Extract, integrate and search healthcare knowledge from the web Tan, Kang Zhuang. School of Computer Engineering Cong Gao DRNTU::Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2013-05-15T04:15:32Z 2013-05-15T04:15:32Z 2013 2013 Final Year Project (FYP) http://hdl.handle.net/10356/52527 en Nanyang Technological University 49 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
spellingShingle	DRNTU::Engineering::Computer science and engineering Tan, Kang Zhuang. Extract, integrate and search healthcare knowledge from the web
description	In recent years, user-generated content has become increasingly popular on the World Wide Web. Community-based question-answering (CQA) portals, such as Yahoo! Answers and Wiki Answers, are a particular form of user-generated content where users can ask and answer questions, and also discover questions that have already been resolved. With the increased usage of different CQA portals, there is a need for a common platform that aggregates information from these websites, so that users can retrieve answers without the hassle of going through many sites, especially in the field of health care, where time can be a crucial factor. In addition, a better search technique for retrieving the most helpful health care information is important. Therefore, in this project we look at how to extract and distill high-quality CQA knowledge from the Web to build a database on health care, integrate the different types of data, and search the data to answer queries from users. To extract information, web crawlers written in Java were implemented to retrieve a total of six hundred thousand QA pairs from various websites. The pairs are stored as XML files, and questions without answers are removed before the remaining useful data is indexed using Lucene, a Java information retrieval library, so that it can be searched.
author2	School of Computer Engineering
author_facet	School of Computer Engineering Tan, Kang Zhuang.
format	Final Year Project
author	Tan, Kang Zhuang.
author_sort	Tan, Kang Zhuang.
title	Extract, integrate and search healthcare knowledge from the web
title_short	Extract, integrate and search healthcare knowledge from the web
title_full	Extract, integrate and search healthcare knowledge from the web
title_fullStr	Extract, integrate and search healthcare knowledge from the web
title_full_unstemmed	Extract, integrate and search healthcare knowledge from the web
title_sort	extract, integrate and search healthcare knowledge from the web
publishDate	2013
url	http://hdl.handle.net/10356/52527
_version_	1759853440944570368


Similar Items