What can scatterplots teach us about doing data science better?
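Main Authors: | Goh, Wilson Wen Bin; Foo, Reuben Jyong Kiat; Wong, Limsoon |
---|---|
Other Authors: | Lee Kong Chian School of Medicine (LKCMedicine) |
Format: | Article |
Language: | English |
Published: | 2022 |
Subjects: | Science::Mathematics; Scatterplots; Visualization |
Online Access: | https://hdl.handle.net/10356/163629 |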
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-163629 |
---|---|
record_format | dspace |
schools | Lee Kong Chian School of Medicine (LKCMedicine); School of Chemical and Biomedical Engineering; School of Biological Sciences; Centre for Biomedical Informatics |
funding | Ministry of Education (MOE). This work is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier-1 (RG35/20) to WWBG. This work is also supported in part by a Singapore Ministry of Education Tier-2 Grant (MOE2019-T21-042) to LW and WWBG. |
type | Journal Article |
journal | International Journal of Data Science and Analytics (ISSN 2364-415X) |
citation | Goh, W. W. B., Foo, R. J. K. & Wong, L. (2022). What can scatterplots teach us about doing data science better? International Journal of Data Science and Analytics. https://dx.doi.org/10.1007/s41060-022-00362-9 |
doi | 10.1007/s41060-022-00362-9 |
scopus eid | 2-s2.0-85137559546 |
date available | 2022-12-13T01:59:51Z |
record datestamp | 2022-12-13T01:59:52Z |
rights | © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2022. |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Science::Mathematics; Scatterplots; Visualization |
description | A scatterplot is often the graph of choice for displaying the relationship between two variables. Scatterplots are useful for exploratory analysis, but they can do much more than identify correlations. As data sets grow larger and more complex, relying on “eye power” alone may cause us to miss interesting associations or, worse, to misinterpret the data. We show that by combining scatterplots with statistical and logical reasoning (the sliding window and two-axis median bisection), we may identify interesting associations in case studies of Graduate Record Examination (GRE) scores at admission versus graduation outcomes, and of whether the low detectability of proteins in a biological sample is truly associated with low abundance. Because visual interpretation is subjective, we recommend graphing the data using a variety of visual variables and graph types before concluding that an association is absent. Finally, even when associations are demonstrable, developing causal models that could explain the observed fuzziness and lack of apparent correlation in the scatterplot helps decision-making and interpretation. |
author2 | Lee Kong Chian School of Medicine (LKCMedicine) |
format | Article |
author | Goh, Wilson Wen Bin; Foo, Reuben Jyong Kiat; Wong, Limsoon |
title | What can scatterplots teach us about doing data science better? |
publishDate | 2022 |
url | https://hdl.handle.net/10356/163629 |
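
The abstract names the "sliding window" and "two-axis median bisection" but this record does not define them. The sketch below is a minimal illustration under two assumptions that are not taken from the paper: median bisection is read as splitting the points at the median of x and the median of y and counting the four resulting quadrants, and the sliding window is read as a moving average of y over points ordered by x. The function names `median_bisection_counts` and `sliding_window_means` are hypothetical; only `statistics.median` comes from the Python standard library. This is not the authors' implementation.

```python
"""Hypothetical sketch: two scatterplot diagnostics named in the abstract.

ASSUMPTIONS (not taken from the paper):
- "two-axis median bisection" = split points at the median of x and the
  median of y, then count how many fall in each of the four quadrants;
- "sliding window" = moving average of y over points ordered by x.
Function names are illustrative only.
"""
from statistics import median


def median_bisection_counts(xs, ys):
    """Count points in the four quadrants cut by the x- and y-medians.

    Points lying exactly on either median are skipped (an assumption).
    Keys are (x_side, y_side) with sides "low" or "high".
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = median(xs), median(ys)
    counts = {(a, b): 0 for a in ("low", "high") for b in ("low", "high")}
    for x, y in zip(xs, ys):
        if x == mx or y == my:
            continue  # on a bisecting line: belongs to no quadrant
        counts[("low" if x < mx else "high", "low" if y < my else "high")] += 1
    return counts


def sliding_window_means(xs, ys, window):
    """Mean of y over each run of `window` consecutive points ordered by x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not 1 <= window <= len(xs):
        raise ValueError("window must be between 1 and len(xs)")
    # Sort by x only, so ties in x keep their original relative order.
    ordered = [y for _, y in sorted(zip(xs, ys), key=lambda p: p[0])]
    return [sum(ordered[i:i + window]) / window
            for i in range(len(ordered) - window + 1)]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [2, 1, 4, 3, 6, 5, 8, 7]
    print(median_bisection_counts(xs, ys))
    print(sliding_window_means(xs, ys, 4))
```

With this toy data, all eight points fall in the low/low and high/high quadrants and the window means rise from 2.5 to 6.5, both consistent with a positive association. Balanced quadrant counts or flat window means would not by themselves prove that no association exists, which is in line with the abstract's advice to try several visual variables and graph types first.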