Correlation and causation analysis for cross-sectional and panel data
This dissertation investigates how data, algorithms, and expert knowledge can be harnessed to better understand human behavior and enhance well-being. It emphasizes the critical importance of interdisciplinary collaboration to bridge knowledge gaps and foster insights that support preventive care, c...
Main Author: | |
---|---|
Format: | text |
Language: | English |
Published: |
Institutional Knowledge at Singapore Management University
2024
|
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/etd_coll/665 https://ink.library.smu.edu.sg/context/etd_coll/article/1663/viewcontent/GPIS_AY2019_PhD_Barry_Nuqoba.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.etd_coll-1663 |
---|---|
record_format |
dspace |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Correlation Causality Cross-Sectional Panel Data Interdisciplinary Social Sciences Databases and Information Systems |
spellingShingle |
Correlation Causality Cross-Sectional Panel Data Interdisciplinary Social Sciences Databases and Information Systems NUQOBA, Barry Correlation and causation analysis for cross-sectional and panel data |
description |
This dissertation investigates how data, algorithms, and expert knowledge can be harnessed to better understand human behavior and enhance well-being. It emphasizes the critical importance of interdisciplinary collaboration to bridge knowledge gaps and foster insights that support preventive care, causal theory advancement, and policy development.
The first study, part of the SHINESeniors project, sheds light on the potential usefulness of passive, unobtrusive sensors for detecting nocturia and poor sleep quality, symptoms commonly observed in chronic diseases, thereby enabling older adults who live alone to age in place. By applying machine learning techniques to sensor-derived features, the study can identify nocturia and poor sleep in this demographic. However, the reliability of the results is limited by the small number of samples. Subsequently, through correlation analysis, the study reveals associations between nocturia, poor sleep quality, frailty, low energy, and sedentary behavior. While these correlational insights are useful for prediction, their value for decision-making is limited because the discovered relationships carry no causal direction. These findings inform the practical usefulness of passive, unobtrusive sensors for preventive care, offering valuable implications for aging-in-place strategies.
To complement the correlational analysis, the second study explores causal discovery algorithms that are potentially beneficial for social science applications. By evaluating several methods, the research finds that deep-learning-based algorithms hold promise for generating temporal and individual-level causal graphs from social science panel data. However, the study highlights the challenges in applying these algorithms due to the unique characteristics of social science datasets. Therefore, future studies should investigate how to improve the generalizability of the algorithms for social science applications. In the subsequent exploration, the study identifies that constraint-based and hybrid algorithms can assist social scientists in causal exploration. However, the two case studies demonstrate that applying these algorithms alone can yield inconclusive causal relationships and miss important ones. The case studies also show that involving domain experts can compensate for these limitations. Therefore, the application of causal discovery algorithms in social science should be combined with domain expertise. These findings shed light on the potential usefulness of causal discovery algorithms and the adaptations required to apply them in social science, offering valuable implications for advancing causal analysis in the social sciences.
The third study explores a further use of causal discovery algorithms: assisting index derivation. The study proposes a causal-based feature selection approach to identify important indicators of well-being among the hundreds available in the Singapore Life Panel (SLP). After conducting experiments, the study finds that causal discovery can be helpful to some extent in selecting important indicators. However, factor analysis shows that the models for deriving the index have fit below ideal values. The study identifies the misspecification of several variables during data preparation as the root cause of the unexpected results. Although it could not identify critical indicators that lead to reliable indexes, the study highlights several lessons, including the importance of domain expertise in guiding data preparation to facilitate the generation of reliable indexes. In addition, the proposed causal-based feature selection can be useful for identifying important variables when the target variable is latent, such as well-being. These findings shed light on the potential usefulness of causal discovery algorithms for index derivation, offering valuable implications for well-being assessment and monitoring.
Building upon earlier findings, the final study proposes a causal-driven collaborative framework that integrates domain expertise, causal discovery algorithms, structural causal models (SCM), and subjective perspectives from optional actors (i.e., data respondents and policymakers). The framework is demonstrated through a case study analyzing the more plausible causal direction between social activities and life satisfaction among Singaporean older adults. The findings suggest that the causal direction from social activities to life satisfaction is more plausible, with some activities having direct effects while others are mediated by different variables. Moreover, longitudinal analysis indicates that the effects of social activities on life satisfaction expire over time. Additionally, simulating two policies using SCM reveals that multifaceted initiatives targeting various social activities yield greater improvements in life satisfaction than single-faceted approaches. Other findings show how subjective judgments and theoretical understanding are important for contextualizing and assessing the results of the algorithms. Despite its potential, the study acknowledges that the framework is still in its infancy. Therefore, future studies should investigate how to balance data-driven and theory-driven approaches when conflicting results arise.
In conclusion, this dissertation presents a promising model for collaboration between computer and social sciences and paves the way for future advancements in the field. The integration of computational algorithms to extract causal information from large datasets is poised to become an essential resource for social scientists, empowering them to unravel complex social phenomena. As this area continues to evolve, the frameworks and concepts presented here aim to serve as a valuable guide for researchers and practitioners alike, fostering interdisciplinary projects that bridge the gap between these two vital domains. By embracing these innovative approaches, we can unlock new insights and drive meaningful change in our understanding of human behavior and well-being. |
format |
text |
author |
NUQOBA, Barry |
author_facet |
NUQOBA, Barry |
author_sort |
NUQOBA, Barry |
title |
Correlation and causation analysis for cross-sectional and panel data |
title_short |
Correlation and causation analysis for cross-sectional and panel data |
title_full |
Correlation and causation analysis for cross-sectional and panel data |
title_fullStr |
Correlation and causation analysis for cross-sectional and panel data |
title_full_unstemmed |
Correlation and causation analysis for cross-sectional and panel data |
title_sort |
correlation and causation analysis for cross-sectional and panel data |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2024 |
url |
https://ink.library.smu.edu.sg/etd_coll/665 https://ink.library.smu.edu.sg/context/etd_coll/article/1663/viewcontent/GPIS_AY2019_PhD_Barry_Nuqoba.pdf |
_version_ |
1827070759351091200 |
spelling |
sg-smu-ink.etd_coll-1663 2025-02-13T05:51:12Z Correlation and causation analysis for cross-sectional and panel data NUQOBA, Barry 2024-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/665 https://ink.library.smu.edu.sg/context/etd_coll/article/1663/viewcontent/GPIS_AY2019_PhD_Barry_Nuqoba.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Correlation Causality Cross-Sectional Panel Data Interdisciplinary Social Sciences Databases and Information Systems |