Review of feature extraction approaches on biomedical text classification
The overwhelming volume of online biomedical literature causes data congestion and makes it difficult to organize documents and to retrieve the required ones, especially from the Medline database. One solution to this overload is to apply...
Main Authors: | Dollah, R.; Jafni, T. I.; Hashim, H.; Othman, M. S.; Rasib, A. W. |
---|---|
Format: | Article |
Published: | Inst Advanced Science Extension 2020 |
Subjects: | QA Mathematics |
Online Access: | http://eprints.utm.my/id/eprint/87028/ http://www.dx.doi.org/10.21833/ijaas.2020.04.001 |
Institution: | Universiti Teknologi Malaysia |
id |
my.utm.87028 |
---|---|
record_format |
eprints |
spelling |
my.utm.87028 2020-10-31T12:16:41Z http://eprints.utm.my/id/eprint/87028/ Review of feature extraction approaches on biomedical text classification Dollah, R. Jafni, T. I. Hashim, H. Othman, M. S. Rasib, A. W. QA Mathematics The overwhelming volume of online biomedical literature causes data congestion and makes it difficult to organize documents and to retrieve the required ones, especially from the Medline database. One solution to this overload is to apply classification. However, each document must first be represented by a set of terms or feature vectors. Identifying terms or features in biomedical literature is one of the most important and challenging tasks in text classification, owing to the large number of new features and entities that appear in the biomedical domain. In addition, combining feature sets from different terminological resources leads to naming conflicts, such as homonymous use of names, and to terminological ambiguities. The purpose of this research is therefore to investigate and evaluate effective ways of extracting relevant and meaningful features in order to increase classification accuracy and improve the performance of web searches. To this end, we conduct several classification experiments to evaluate and compare the effectiveness of feature extraction approaches at extracting relevant and informative features from the biomedical literature. Our experiments use two feature sets: one extracted with the Genia Tagger tool and one extracted by medical experts from Pusat Perubatan Universiti Kebangsaan Malaysia (PPUKM). The results show that classification using the features extracted by medical experts outperforms classification using the features extracted with the Genia Tagger tool when a feature selection method is applied.
Inst Advanced Science Extension 2020-04 Article PeerReviewed Dollah, R. and Jafni, T. I. and Hashim, H. and Othman, M. S. and Rasib, A. W. (2020) Review of feature extraction approaches on biomedical text classification. International Journal of Advanced And Applied Sciences, 7 (4). pp. 1-8. http://www.dx.doi.org/10.21833/ijaas.2020.04.001 DOI:10.21833/ijaas.2020.04.001 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
topic |
QA Mathematics |
spellingShingle |
QA Mathematics Dollah, R. Jafni, T. I. Hashim, H. Othman, M. S. Rasib, A. W. Review of feature extraction approaches on biomedical text classification |
description |
The overwhelming volume of online biomedical literature causes data congestion and makes it difficult to organize documents and to retrieve the required ones, especially from the Medline database. One solution to this overload is to apply classification. However, each document must first be represented by a set of terms or feature vectors. Identifying terms or features in biomedical literature is one of the most important and challenging tasks in text classification, owing to the large number of new features and entities that appear in the biomedical domain. In addition, combining feature sets from different terminological resources leads to naming conflicts, such as homonymous use of names, and to terminological ambiguities. The purpose of this research is therefore to investigate and evaluate effective ways of extracting relevant and meaningful features in order to increase classification accuracy and improve the performance of web searches. To this end, we conduct several classification experiments to evaluate and compare the effectiveness of feature extraction approaches at extracting relevant and informative features from the biomedical literature. Our experiments use two feature sets: one extracted with the Genia Tagger tool and one extracted by medical experts from Pusat Perubatan Universiti Kebangsaan Malaysia (PPUKM). The results show that classification using the features extracted by medical experts outperforms classification using the features extracted with the Genia Tagger tool when a feature selection method is applied. |
format |
Article |
author |
Dollah, R. Jafni, T. I. Hashim, H. Othman, M. S. Rasib, A. W. |
author_sort |
Dollah, R. |
title |
Review of feature extraction approaches on biomedical text classification |
title_sort |
review of feature extraction approaches on biomedical text classification |
publisher |
Inst Advanced Science Extension |
publishDate |
2020 |
url |
http://eprints.utm.my/id/eprint/87028/ http://www.dx.doi.org/10.21833/ijaas.2020.04.001 |
_version_ |
1683230692621680640 |