Xgboost-based framework for smoking-induced noncommunicable disease prediction

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. Smoking-induced noncommunicable diseases (SiNCDs) have become a significant public health threat and cause of death globally. In the last decade, numerous studies have used artificial intelligence techniques to predict the r...

Bibliographic Details
Main Authors: Khishigsuren Davagdorj, Van Huy Pham, Nipon Theera-Umpon, Keun Ho Ryu
Format: Journal
Published: 2020
Subjects: Environmental Science, Medicine
Online Access: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85090613582&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/70607
Institution: Chiang Mai University
id th-cmuir.6653943832-70607
record_format dspace
spelling th-cmuir.6653943832-70607 2020-10-14T08:41:01Z Xgboost-based framework for smoking-induced noncommunicable disease prediction Khishigsuren Davagdorj Van Huy Pham Nipon Theera-Umpon Keun Ho Ryu Environmental Science Medicine © 2020 by the authors. Licensee MDPI, Basel, Switzerland. Smoking-induced noncommunicable diseases (SiNCDs) have become a significant public health threat and cause of death globally. In the last decade, numerous studies have used artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most significant features and developing interpretable models remain challenging in such systems. In this study, we propose an efficient extreme gradient boosting (XGBoost) based framework that incorporates a hybrid feature selection (HFS) method for SiNCDs prediction among the general population in South Korea and the United States. Initially, HFS is performed in three stages: (I) significant features are selected by t-test and chi-square test; (II) multicollinearity analysis serves to obtain dissimilar features; (III) final selection of the best representative features is based on the least absolute shrinkage and selection operator (LASSO). Then, the selected features are fed into the XGBoost predictive model. The experimental results show that our proposed model outperforms several existing baseline models. In addition, the proposed model reports feature importance in order to enhance the interpretability of the SiNCDs prediction model. Consequently, the XGBoost-based framework is expected to contribute to the early diagnosis and prevention of SiNCDs in public health. 2020-10-14T08:35:24Z 2020-10-14T08:35:24Z 2020-09-02 Journal 16604601 16617827 2-s2.0-85090613582 10.3390/ijerph17186513 https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85090613582&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/70607
institution Chiang Mai University
building Chiang Mai University Library
continent Asia
country Thailand
content_provider Chiang Mai University Library
collection CMU Intellectual Repository
topic Environmental Science
Medicine
spellingShingle Environmental Science
Medicine
Khishigsuren Davagdorj
Van Huy Pham
Nipon Theera-Umpon
Keun Ho Ryu
Xgboost-based framework for smoking-induced noncommunicable disease prediction
description © 2020 by the authors. Licensee MDPI, Basel, Switzerland. Smoking-induced noncommunicable diseases (SiNCDs) have become a significant public health threat and cause of death globally. In the last decade, numerous studies have used artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most significant features and developing interpretable models remain challenging in such systems. In this study, we propose an efficient extreme gradient boosting (XGBoost) based framework that incorporates a hybrid feature selection (HFS) method for SiNCDs prediction among the general population in South Korea and the United States. Initially, HFS is performed in three stages: (I) significant features are selected by t-test and chi-square test; (II) multicollinearity analysis serves to obtain dissimilar features; (III) final selection of the best representative features is based on the least absolute shrinkage and selection operator (LASSO). Then, the selected features are fed into the XGBoost predictive model. The experimental results show that our proposed model outperforms several existing baseline models. In addition, the proposed model reports feature importance in order to enhance the interpretability of the SiNCDs prediction model. Consequently, the XGBoost-based framework is expected to contribute to the early diagnosis and prevention of SiNCDs in public health.
format Journal
author Khishigsuren Davagdorj
Van Huy Pham
Nipon Theera-Umpon
Keun Ho Ryu
author_facet Khishigsuren Davagdorj
Van Huy Pham
Nipon Theera-Umpon
Keun Ho Ryu
author_sort Khishigsuren Davagdorj
title Xgboost-based framework for smoking-induced noncommunicable disease prediction
title_short Xgboost-based framework for smoking-induced noncommunicable disease prediction
title_full Xgboost-based framework for smoking-induced noncommunicable disease prediction
title_fullStr Xgboost-based framework for smoking-induced noncommunicable disease prediction
title_full_unstemmed Xgboost-based framework for smoking-induced noncommunicable disease prediction
title_sort xgboost-based framework for smoking-induced noncommunicable disease prediction
publishDate 2020
url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85090613582&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/70607
_version_ 1681752933492850688
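
Illustrative note: the three-stage hybrid feature selection (HFS) summarized in the description field can be sketched roughly as below. This is a minimal, standard-library-only sketch on synthetic data, not the authors' implementation. The assumptions are: stage I uses only a Welch t-statistic with a cutoff of |t| >= 2 as a rough stand-in for a p-value threshold (the chi-square test for categorical features is omitted); stage II drops a feature when its absolute Pearson correlation with an already kept feature reaches 0.9; stage III fits LASSO by coordinate descent, treating the 0/1 label as a regression target. All names, thresholds, and data here are hypothetical and do not come from the record.

```python
import math
import random
from statistics import mean, pstdev, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lasso_weights(cols, y, alpha, sweeps=100):
    """Coordinate-descent LASSO on standardized columns and a centered target."""
    n = len(y)
    xs = []
    for c in cols:
        m, s = mean(c), pstdev(c)
        xs.append([(v - m) / s for v in c])
    y_mean = mean(y)
    resid = [v - y_mean for v in y]  # residual while all weights are zero
    w = [0.0] * len(xs)
    for _ in range(sweeps):
        for j, xj in enumerate(xs):
            # Covariance of column j with the residual excluding j's own term.
            rho = sum(x * (r + w[j] * x) for x, r in zip(xj, resid)) / n
            new_w = math.copysign(max(abs(rho) - alpha, 0.0), rho)  # soft threshold
            if new_w != w[j]:
                delta = new_w - w[j]
                resid = [r - delta * x for x, r in zip(xj, resid)]
                w[j] = new_w
    return w


def hybrid_select(features, labels, t_min=2.0, r_max=0.9, alpha=0.05):
    """features: dict of name -> list of floats; labels: list of 0/1 values."""
    # Stage I: univariate screening of each feature against the class label.
    t_abs = {}
    for name, col in features.items():
        pos = [v for v, lab in zip(col, labels) if lab == 1]
        neg = [v for v, lab in zip(col, labels) if lab == 0]
        t_abs[name] = abs(welch_t(pos, neg))
    stage1 = sorted((k for k in t_abs if t_abs[k] >= t_min), key=t_abs.get, reverse=True)

    # Stage II: greedy multicollinearity pruning, strongest features first.
    stage2 = []
    for name in stage1:
        if all(abs(pearson(features[name], features[k])) < r_max for k in stage2):
            stage2.append(name)

    # Stage III: keep the features whose LASSO weight is non-zero.
    w = lasso_weights([features[k] for k in stage2], labels, alpha)
    return [k for k, wk in zip(stage2, w) if wk != 0.0]


# Synthetic data: x1 is a near-copy of x0, x2 is informative, x3 is pure noise.
rng = random.Random(0)
n = 400
x0 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [v + 0.05 * rng.gauss(0, 1) for v in x0]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
labels = [1 if 1.5 * a + b + 0.5 * rng.gauss(0, 1) > 0 else 0 for a, b in zip(x0, x2)]
features = {"x0": x0, "x1": x1, "x2": x2, "x3": x3}

selected = hybrid_select(features, labels)
print(selected)
```

In a full pipeline the columns named in `selected` would then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`; that step is left out here so the sketch runs without third-party packages.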