Data pricing in machine learning pipelines

Machine learning is disruptive. At the same time, it can succeed only through collaboration among many parties across multiple steps, organized naturally as pipelines in an ecosystem, such as collecting data for possible machine learning applications, collaboratively training models by multiple parties a...

Bibliographic Details
Main Authors: CONG, Zicun, LUO, Xuan, PEI, Jian, ZHU, Feida, ZHANG, Yong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects: Data pricing; Data asset; Data governance; Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/7755
https://ink.library.smu.edu.sg/context/sis_research/article/8758/viewcontent/Data_pricing_in_machine_learning_pipelines.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8758
record_format dspace
spelling sg-smu-ink.sis_research-8758
2023-01-19T10:14:28Z
2022-05-01T07:00:00Z
text
application/pdf
info:doi/10.1007/s10115-022-01679-4
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Data pricing
Data asset
Data governance
Databases and Information Systems
description Machine learning is disruptive. At the same time, it can succeed only through collaboration among many parties across multiple steps, organized naturally as pipelines in an ecosystem: collecting data for possible machine learning applications, collaboratively training models among multiple parties, and delivering machine learning services to end users. Data are critical and pervasive throughout machine learning pipelines. Because these pipelines involve many parties and must form a constructive and dynamic ecosystem to succeed, marketplaces and data pricing are fundamental in connecting and facilitating those parties. In this article, we survey the principles and the latest research developments of data pricing in machine learning pipelines. We start with a brief review of data marketplaces and pricing desiderata. Then, we focus on pricing in three important steps of machine learning pipelines. To understand pricing in the training data collection step, we review the pricing of raw data sets and data labels. We also investigate pricing in the collaborative training of machine learning models and give an overview of pricing machine learning models for end users in the deployment step. Finally, we discuss a series of possible future directions.
format text
author CONG, Zicun
LUO, Xuan
PEI, Jian
ZHU, Feida
ZHANG, Yong
author_sort CONG, Zicun
title Data pricing in machine learning pipelines
title_sort data pricing in machine learning pipelines
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7755
https://ink.library.smu.edu.sg/context/sis_research/article/8758/viewcontent/Data_pricing_in_machine_learning_pipelines.pdf
_version_ 1770576435541442560