Extracting integrate and search healthcare knowledge from the web (III)

Bibliographic Details
Main Author: Lim, Lionel Guan Chuan.
Other Authors: School of Computer Engineering
Format: Final Year Project
Language: English
Published: 2013
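Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing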
Online Access: http://hdl.handle.net/10356/51991
Institution: Nanyang Technological University
id sg-ntu-dr.10356-51991
record_format dspace
datestamp 2023-03-03T20:34:52Z
contributor Gao Cong
degree Bachelor of Engineering (Computer Science)
date_available 2013-04-19T02:42:20Z
extent 43 p.
mimetype application/pdf
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description Users increasingly post and edit questions on online websites known as Community Question Answering (CQA) sites. CQA sites benefit web users because of the valuable knowledge they accumulate from people around the world. However, useful as CQA sites are, it is difficult to extract only the information that is relevant to a given web user. This project aims to consolidate healthcare information and allow web users to extract the information that is useful to them. To do so, web crawlers written in Java retrieve the URL, category, question and answer from the health category of CQA sites. The crawled question-answer pairs are then saved in XML format, and Lucene, a Java information retrieval (IR) library, is used to index the XML documents quickly. Another goal is to design a centralised search engine that can retrieve relevant healthcare information from the CQA data. As this project continues the work of Senior Lee Qian Hui, I am tasked with using Information Retrieval models to crawl data from more CQA sites that resemble WikiAnswers, which Senior Lee had previously implemented.
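The abstract describes a pipeline in which crawled question-answer pairs are stored as XML and then indexed for retrieval. The following Java sketch illustrates that flow under stated assumptions and is not the project's implementation. The XML element names (`qapairs`, `qa`, `url`, `category`, `question`, `answer`) and the class and method names are hypothetical. The crawling step is omitted. In place of Lucene and whichever IR model the project used, it builds a small in-memory inverted index and ranks pairs with a smoothed TF-IDF score, tf × ln(1 + N/df), so that it runs with the JDK alone.

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: the XML layout and TF-IDF ranking are assumptions, not the project's code.
class QaSearchSketch {

    // One crawled pair: URL, category, question and answer (the fields the abstract lists).
    static final class QaPair {
        final String url, category, question, answer;
        QaPair(String url, String category, String question, String answer) {
            this.url = url; this.category = category;
            this.question = question; this.answer = answer;
        }
    }

    // Reads every <qa> element from a hypothetical <qapairs> document.
    static List<QaPair> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("qa");
            List<QaPair> pairs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                pairs.add(new QaPair(text(e, "url"), text(e, "category"),
                        text(e, "question"), text(e, "answer")));
            }
            return pairs;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Malformed QA XML", ex);
        }
    }

    // Trimmed text of the first child element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    // Lower-cases and splits on runs of non-alphanumeric characters.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns pair indices ranked by the sum of tf * ln(1 + N/df) over the query terms.
    static List<Integer> search(List<QaPair> pairs, String query) {
        int n = pairs.size();
        // Inverted index: term -> (pair index -> term frequency in question + answer).
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        for (int d = 0; d < n; d++) {
            QaPair p = pairs.get(d);
            for (String t : tokenize(p.question + " " + p.answer)) {
                index.computeIfAbsent(t, k -> new HashMap<>()).merge(d, 1, Integer::sum);
            }
        }
        double[] scores = new double[n];
        for (String t : tokenize(query)) {
            Map<Integer, Integer> postings = index.get(t);
            if (postings == null) continue; // term never seen in the collection
            double idf = Math.log(1.0 + (double) n / postings.size());
            for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
                scores[e.getKey()] += e.getValue() * idf;
            }
        }
        List<Integer> ranked = new ArrayList<>();
        for (int d = 0; d < n; d++) { if (scores[d] > 0) ranked.add(d); }
        // Highest score first; List.sort is stable, so ties keep document order.
        ranked.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return ranked;
    }

    public static void main(String[] args) {
        List<QaPair> pairs = parse("<qapairs>"
                + "<qa><url>http://example.com/q1</url><category>Health</category>"
                + "<question>What causes a migraine headache?</question>"
                + "<answer>Common triggers include stress and lack of sleep.</answer></qa>"
                + "<qa><url>http://example.com/q2</url><category>Health</category>"
                + "<question>How much sleep do adults need?</question>"
                + "<answer>Most adults need seven to nine hours.</answer></qa></qapairs>");
        for (int d : search(pairs, "migraine headache")) {
            System.out.println(pairs.get(d).url + "  " + pairs.get(d).question);
        }
    }
}
```

Running `main` prints only the migraine question, because the second pair contains neither query term. In a Lucene-based version, `search` would be replaced by Lucene's analyzers, on-disk index and scoring (BM25 by default since Lucene 6); the hand-rolled index here only shows the idea of mapping terms to the pairs that contain them.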