How to find a perfect data scientist : a distance-metric learning approach

The title of data scientist has been described as one of the sexiest jobs of the 21st century. Numerous efforts have been made to define the job of a data scientist in a qualitative manner by, for example, listing the job functions and required skill sets of data scientists. However, to the best of...

Bibliographic Details
Main Authors: Hu, Han; Luo, Yong; Wen, Yonggang; Ong, Yew-Soon; Zhang, Xinwen
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2018
Subjects: Natural Language Processing; DRNTU::Engineering::Computer science and engineering; Data Scientist
Online Access: https://hdl.handle.net/10356/103255
http://hdl.handle.net/10220/47276
Institution: Nanyang Technological University
id sg-ntu-dr.10356-103255
record_format dspace
spelling sg-ntu-dr.10356-103255 2020-03-07T11:50:49Z
How to find a perfect data scientist : a distance-metric learning approach
Hu, Han
Luo, Yong
Wen, Yonggang
Ong, Yew-Soon
Zhang, Xinwen
School of Computer Science and Engineering
Data Science and Artificial Intelligence Research Centre
Natural Language Processing
DRNTU::Engineering::Computer science and engineering
Data Scientist
Published version
2018-12-28T06:18:18Z 2019-12-06T21:08:28Z 2018-12-28T06:18:18Z 2019-12-06T21:08:28Z
2018
Journal Article
Hu, H., Luo, Y., Wen, Y., Ong, Y.-S., & Zhang, X. (2018). How to find a perfect data scientist : a distance-metric learning approach. IEEE Access, 6, 60380-60395. doi:10.1109/ACCESS.2018.2870535
https://hdl.handle.net/10356/103255
http://hdl.handle.net/10220/47276
10.1109/ACCESS.2018.2870535
en
IEEE Access
© 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
16 p.
application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Natural Language Processing
DRNTU::Engineering::Computer science and engineering
Data Scientist
spellingShingle Natural Language Processing
DRNTU::Engineering::Computer science and engineering
Data Scientist
Hu, Han
Luo, Yong
Wen, Yonggang
Ong, Yew-Soon
Zhang, Xinwen
How to find a perfect data scientist : a distance-metric learning approach
description The title of data scientist has been described as one of the sexiest jobs of the 21st century. Numerous efforts have been made to define the job of a data scientist in a qualitative manner by, for example, listing the job functions and required skill sets of data scientists. However, to the best of our knowledge, no attempt has been made to define the term data scientist in a scientific manner. In this paper, we address this issue by using a data-driven approach to answer three questions: 1) What is a proper definition of the term data scientist from a market-demand perspective? 2) Do self-described data scientists meet the market demand? and 3) How can companies efficiently recruit data scientists that match their openings? To answer these questions, we crawl two data sets, one each for the supply and demand sides. For the former, we collect a set of data scientist user profiles from LinkedIn; for the latter, we collect a set of data scientist job descriptions from Monster. First, we parse the set of data scientist job descriptions via natural language processing techniques and derive a scientific definition of the job of a data scientist via a clustering algorithm. Second, we use the same approach to determine that, under the aforementioned definition, self-claimed data scientists on the market would meet the market demand with a high probability. Finally, we introduce a distance-metric learning approach that can be used by companies to find data scientist candidates that match their openings. We achieve an average precision of 12.31%; i.e., roughly one in eight candidates with matching qualifications would accept a given offer. The application of this quantitative approach could significantly reduce the human-resource costs incurred by companies in recruiting matching data scientists.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Hu, Han
Luo, Yong
Wen, Yonggang
Ong, Yew-Soon
Zhang, Xinwen
format Article
author Hu, Han
Luo, Yong
Wen, Yonggang
Ong, Yew-Soon
Zhang, Xinwen
author_sort Hu, Han
title How to find a perfect data scientist : a distance-metric learning approach
title_short How to find a perfect data scientist : a distance-metric learning approach
title_full How to find a perfect data scientist : a distance-metric learning approach
title_fullStr How to find a perfect data scientist : a distance-metric learning approach
title_full_unstemmed How to find a perfect data scientist : a distance-metric learning approach
title_sort how to find a perfect data scientist : a distance-metric learning approach
publishDate 2018
url https://hdl.handle.net/10356/103255
http://hdl.handle.net/10220/47276
_version_ 1681042484146208768
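
The matching step named in the abstract (ranking candidates for an opening under a learned distance, then scoring the ranking by average precision) can be illustrated with a minimal sketch. This is not the authors' method: the record does not say which metric-learning algorithm the paper uses, so the sketch swaps in a simple diagonal Mahalanobis metric fitted from known matching pairs. All function names and the toy feature vectors are hypothetical.

```python
import math


def fit_diagonal_metric(job_vecs, cand_vecs, reg=1e-3):
    """Estimate per-feature weights from known matching (job, candidate) pairs.

    job_vecs[i] and cand_vecs[i] are assumed to be a known match. Each weight is
    the inverse of the regularised mean squared difference on that feature, so
    features on which matching pairs agree closely count for more.
    """
    n, dim = len(job_vecs), len(job_vecs[0])
    weights = []
    for j in range(dim):
        mean_sq = sum((job_vecs[i][j] - cand_vecs[i][j]) ** 2 for i in range(n)) / n
        weights.append(1.0 / (mean_sq + reg))
    return weights


def weighted_distance(x, y, weights):
    """Diagonal Mahalanobis distance: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def rank_candidates(job_vec, cand_vecs, weights):
    """Return candidate indices ordered from closest to farthest."""
    return sorted(
        range(len(cand_vecs)),
        key=lambda i: weighted_distance(job_vec, cand_vecs[i], weights),
    )


def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Toy data, made up for illustration: two skill features per vector.
jobs = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.5], [0.0, 0.5]]
weights = fit_diagonal_metric(jobs, matched)

candidates = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.2]]
ranking = rank_candidates([1.0, 0.0], candidates, weights)
print(ranking, average_precision(ranking, relevant={1, 2}))
```

In this toy setting the first feature never differs between matched pairs, so it receives a large weight and candidates that disagree with the opening on it are pushed to the bottom of the ranking. A full metric-learning method would typically learn a dense matrix rather than per-feature weights.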