Degree and centrality-based approaches in network-based variable selection : insights from the Singapore Longitudinal Aging Study

We describe a network-based method to obtain a subset of representative variables from clinical data of subjects of the second Singapore Longitudinal Aging Study (SLAS-2), while preserving to a good extent the predictive performance of the full set with regard to a multi-faceted index of successful...

Bibliographic Details
Main Authors: Valenzuela, Jesus Felix Bayta, Monterola, Christopher, Tong, Victor Joo Chuan, Fülöp, Tamàs, Ng, Tze Pin, Larbi, Anis
Other Authors: Mariño, Inés P.
Format: Article
Language: English
Published: 2019
Subjects: Science::Biological sciences; Variables; Network-based
Online Access: https://hdl.handle.net/10356/85645
http://hdl.handle.net/10220/49823
Institution: Nanyang Technological University
id sg-ntu-dr.10356-85645
record_format dspace
spelling sg-ntu-dr.10356-85645 2023-02-28T17:01:04Z
School of Biological Sciences
ASTAR (Agency for Sci., Tech. and Research, S’pore)
Published version
2019-08-30T02:56:51Z 2019-12-06T16:07:47Z
Journal Article
Valenzuela, J. F. B., Monterola, C., Tong, V. J. C., Fülöp, T., Ng, T. P., & Larbi, A. (2019). Degree and centrality-based approaches in network-based variable selection : insights from the Singapore Longitudinal Aging Study. PLOS ONE, 14(7), e0219186. doi:10.1371/journal.pone.0219186
en
PLOS ONE
© 2019 Valenzuela et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
19 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Science::Biological sciences
Variables
Network-based
description We describe a network-based method to obtain a subset of representative variables from clinical data of subjects of the second Singapore Longitudinal Aging Study (SLAS-2), while preserving to a good extent the predictive performance of the full set with regard to a multi-faceted index of successful aging, SAGE. To examine differences in predictive performance between high-degree nodes (“hubs”) and high-centrality ones (“cores”), we implement four subsetting strategies (two degree-based, two centrality-based) and obtain four surrogate sets of variables, which we use as input features for machine learning models to predict the SAGE index of subjects. All four models have variables belonging to the physical, cardiovascular, cognitive and immunological domains among their fifteen most important predictors. A fifth domain (leisure-time activities, LTA) is also present in some form. From a comparison of the surrogate sets’ size and predictive performance, a centrality-based approach (selection of the most central variable-nodes within each cluster) yielded the smallest surrogate set, while having high prediction accuracy (measured by its model’s area-under-curve, AUC) in comparison to its analogous degree-based strategy (selection of the highest-degree nodes per cluster). Inclusion of the next most-central variables yielded negligible changes in predictive performance while more than doubling the surrogate set size. The centrality-based approach thus yields a surrogate set which offers a good balance between number of variables and prediction performance, and can act as a representative subset of the SLAS-2 clinical dataset.
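The per-cluster selection contrasted in the description (highest-degree "hub" versus most central "core") can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy edge list, the node names, the cluster labels, the use of closeness centrality (the description does not name the centrality measure), and the choice to compute both scores on the within-cluster subgraph. It is not the authors' implementation.

```python
from collections import deque

# Hypothetical toy network: nodes stand for variables, edges for associations.
EDGES = [
    ("a1", "a2"), ("a1", "a3"), ("a1", "a4"), ("a1", "a5"),
    ("a5", "a6"), ("a5", "a7"), ("a7", "a8"), ("a7", "a9"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"), ("b1", "b4"),
    ("a5", "b1"),  # bridge between clusters, ignored within a cluster
]
# Hypothetical cluster labels; in practice they would come from a clustering step.
CLUSTER_OF = {f"a{i}": "A" for i in range(1, 10)}
CLUSTER_OF.update({f"b{i}": "B" for i in range(1, 5)})


def induced_adjacency(edges, members):
    """Adjacency sets of the subgraph restricted to `members`."""
    adj = {v: set() for v in members}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def closeness(adj, source):
    """Closeness centrality via breadth-first search.

    Assumes the cluster subgraph is connected (true for the toy data).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def select_representatives(edges, cluster_of):
    """Return (hubs, cores): one highest-degree and one most-central node per cluster."""
    clusters = {}
    for node, label in cluster_of.items():
        clusters.setdefault(label, []).append(node)
    hubs, cores = {}, {}
    for label, members in clusters.items():
        adj = induced_adjacency(edges, set(members))
        ordered = sorted(members)  # deterministic tie-breaking by name
        hubs[label] = max(ordered, key=lambda v: len(adj[v]))
        cores[label] = max(ordered, key=lambda v: closeness(adj, v))
    return hubs, cores


hubs, cores = select_representatives(EDGES, CLUSTER_OF)
print("hubs:", hubs)
print("cores:", cores)
```

On this toy graph the two criteria disagree in cluster A: a1 has the most within-cluster neighbours, while a5 has the smallest total distance to the other nodes. In cluster B both criteria pick b1. The selected nodes would then serve as the surrogate feature set for a predictive model.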
author2 Mariño, Inés P.
format Article
author Valenzuela, Jesus Felix Bayta
Monterola, Christopher
Tong, Victor Joo Chuan
Fülöp, Tamàs
Ng, Tze Pin
Larbi, Anis
author_sort Valenzuela, Jesus Felix Bayta
title Degree and centrality-based approaches in network-based variable selection : insights from the Singapore Longitudinal Aging Study
publishDate 2019
url https://hdl.handle.net/10356/85645
http://hdl.handle.net/10220/49823