From footprint to evidence: An exploratory study of mining social data for credit scoring
With the booming popularity of online social networks like Twitter and Weibo, online user footprints are accumulating rapidly on the social web. Simultaneously, the question of how to leverage the large-scale user-generated social media data for personal credit scoring comes into the sight of both r...
Main Authors: | GUO, Guangming; ZHU, Feida; CHEN, Enhong; LIU, Qi; WU, Le; GUAN, Chu |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2016 |
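Subjects: | Consumer finance; Features; P2P lending; Personal credit scoring; Social data; User profiling; Databases and Information Systems; Digital Communications and Networking; Social Media |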
Online Access: | https://ink.library.smu.edu.sg/sis_research/3455 https://ink.library.smu.edu.sg/context/sis_research/article/4456/viewcontent/FootprintEvidenceMiningSocialData_2016.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-4456 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-4456; 2020-03-30T01:59:10Z; 2016-12-01T08:00:00Z; text; application/pdf; info:doi/10.1145/2996465; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Consumer finance Features P2P lending Personal credit scoring Social data User profiling Databases and Information Systems Digital Communications and Networking Social Media |
description |
With the booming popularity of online social networks like Twitter and Weibo, online user footprints are accumulating rapidly on the social web. At the same time, the question of how to leverage large-scale user-generated social media data for personal credit scoring has drawn the attention of both researchers and practitioners. It has also become a topic of great importance and growing interest in the P2P lending industry. However, compared with traditional financial data, heterogeneous social data presents both opportunities and challenges for personal credit scoring. In this article, we seek a deep understanding of how to learn users’ credit labels from social data in a comprehensive and efficient way. In particular, we explore the social-data-based credit scoring problem in the micro-blogging setting because of its open, simple, and real-time nature. To identify credit-related evidence hidden in social data, we conduct an analytical and empirical study on a large-scale dataset from Weibo, the largest and most popular tweet-style website in China. Summarizing results from the existing credit scoring literature, we first propose three social-data-based credit scoring principles as guidelines for in-depth exploration. In addition, we glean six credit-related insights from empirical observations of the testbed dataset. Based on the proposed principles and insights, we extract prediction features mainly from three categories of users’ social data: demographics, tweets, and networks. To harness this broad range of features, we put forward a two-tier, stacking- and boosting-enhanced ensemble learning framework. Quantitative investigation of the extracted features shows that online social media data has good potential for discriminating good-credit users from bad ones. Furthermore, we perform experiments on a real-world Weibo dataset consisting of more than 7.3 million tweets and 200,000 users whose credit labels are known through our third-party partner. Experimental results show that (i) our approach achieves an AUC of roughly 0.625 with all the proposed social features as input, and (ii) our learning algorithm can outperform traditional credit scoring methods by as much as 17% for social-data-based personal credit scoring. |
format |
text |
author |
GUO, Guangming ZHU, Feida CHEN, Enhong LIU, Qi WU, Le GUAN, Chu |
title |
From footprint to evidence: An exploratory study of mining social data for credit scoring |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2016 |
url |
https://ink.library.smu.edu.sg/sis_research/3455 https://ink.library.smu.edu.sg/context/sis_research/article/4456/viewcontent/FootprintEvidenceMiningSocialData_2016.pdf |
_version_ |
1770573205824602112 |
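
The abstract in this record reports its results as an AUC value (roughly 0.625) and as an improvement of up to 17% over traditional methods. As a reading aid, the sketch below shows one common way such numbers are computed: AUC as the fraction of (good, bad) user pairs that a scorer ranks correctly, and improvement as a relative gain over a baseline. This is an illustration under stated assumptions, not the authors' evaluation code. The function names `auc` and `relative_gain` and the toy data are hypothetical, and the abstract does not say whether the 17% figure is relative or absolute.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a good-credit user, 0 for a bad-credit user.
    scores: model outputs where a higher value means "more likely good".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Count (good, bad) pairs ranked correctly; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction.

    Assumption: improvement is measured relative to the baseline score.
    """
    return (new - baseline) / baseline


# Hypothetical toy data for illustration only; not taken from the paper.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.5, 0.2]
toy_auc = auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value near 0.625 indicates a modest but real signal in the features.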