Extracting integrate and search healthcare knowledge from the web (III)
Users increasingly post and edit questions on websites known as Community Question Answering (CQA) sites. CQA sites benefit web users because they accumulate valuable knowledge contributed by people around the world. H...
Main Author: | Lim, Lionel Guan Chuan. |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Final Year Project |
Language: | English |
Published: | 2013 |
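Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |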
Online Access: | http://hdl.handle.net/10356/51991 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-51991 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-51991 2023-03-03T20:34:52Z Extracting integrate and search healthcare knowledge from the web (III) Lim, Lionel Guan Chuan. School of Computer Engineering Gao Cong DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Users increasingly post and edit questions on websites known as Community Question Answering (CQA) sites. CQA sites benefit web users because they accumulate valuable knowledge contributed by people around the world. However, it is difficult to extract only the information that is relevant to a particular user. This project aims to consolidate healthcare information and allow web users to extract the information that is useful to them. To do so, web crawlers written in Java retrieve the URL, category, question and answer of each entry in the health category of CQA sites. The crawled question-answer pairs are then saved in XML format. Lucene, a Java information retrieval (IR) library, is used for fast indexing of the XML documents. Another goal is to design a centralised search engine that can retrieve relevant healthcare information from CQA data. As this project continues the work of a senior, Lee Qian Hui, I am tasked with applying Information Retrieval models and crawling data from more CQA sites that resemble WikiAnswers, the site Lee's work had previously covered. Bachelor of Engineering (Computer Science) 2013-04-19T02:42:20Z 2013-04-19T02:42:20Z 2013 2013 Final Year Project (FYP) http://hdl.handle.net/10356/51991 en Nanyang Technological University 43 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Lim, Lionel Guan Chuan. Extracting integrate and search healthcare knowledge from the web (III) |
description |
Users increasingly post and edit questions on websites known as Community Question Answering (CQA) sites. CQA sites benefit web users because they accumulate valuable knowledge contributed by people around the world. However, it is difficult to extract only the information that is relevant to a particular user.
This project aims to consolidate healthcare information and allow web users to extract the information that is useful to them. To do so, web crawlers written in Java retrieve the URL, category, question and answer of each entry in the health category of CQA sites. The crawled question-answer pairs are then saved in XML format. Lucene, a Java information retrieval (IR) library, is used for fast indexing of the XML documents. Another goal is to design a centralised search engine that can retrieve relevant healthcare information from CQA data. As this project continues the work of a senior, Lee Qian Hui, I am tasked with applying Information Retrieval models and crawling data from more CQA sites that resemble WikiAnswers, the site Lee's work had previously covered. |
author2 |
School of Computer Engineering |
author_facet |
School of Computer Engineering Lim, Lionel Guan Chuan. |
format |
Final Year Project |
author |
Lim, Lionel Guan Chuan. |
author_sort |
Lim, Lionel Guan Chuan. |
title |
Extracting integrate and search healthcare knowledge from the web (III) |
title_short |
Extracting integrate and search healthcare knowledge from the web (III) |
title_full |
Extracting integrate and search healthcare knowledge from the web (III) |
title_fullStr |
Extracting integrate and search healthcare knowledge from the web (III) |
title_full_unstemmed |
Extracting integrate and search healthcare knowledge from the web (III) |
title_sort |
extracting integrate and search healthcare knowledge from the web (iii) |
publishDate |
2013 |
url |
http://hdl.handle.net/10356/51991 |
_version_ |
1759853975998300160 |
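
The description above says that each crawled entry (URL, category, question, answer) is saved in XML format before being indexed. The record does not give the project's actual schema or code, so the following is only a minimal sketch of that one step, using the XML APIs bundled with the JDK. The class name `QaPairXml` and the element names `qapair`, `url`, `category`, `question` and `answer` are assumptions made for illustration. The crawling and Lucene indexing steps are left out because they depend on external libraries and site-specific details that the record does not describe.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serialises one crawled question-answer pair to XML.
// The element names are illustrative assumptions, not the project's schema.
class QaPairXml {

    public static String toXml(String url, String category,
                               String question, String answer) throws Exception {
        // Build an in-memory DOM tree with one child element per field.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("qapair");
        doc.appendChild(root);
        addField(doc, root, "url", url);
        addField(doc, root, "category", category);
        addField(doc, root, "question", question);
        addField(doc, root, "answer", answer);

        // Serialise the tree to a string; special characters in the
        // text (such as & and <) are escaped by the serialiser.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void addField(Document doc, Element parent,
                                 String name, String value) {
        Element field = doc.createElement(name);
        field.setTextContent(value);
        parent.appendChild(field);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample values, not data from the project.
        System.out.println(toXml(
                "http://example.org/question/1",
                "Health",
                "Is green tea safe to drink daily?",
                "Generally yes, in moderation."));
    }
}
```

Building the document through the DOM API rather than by string concatenation means that user-written question and answer text cannot break the XML, which matters when the files are later parsed for indexing.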