Duties for datasets

Bibliographic Details
Main Author: SOH, Jerrold Tsin Howe
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Online Access:https://ink.library.smu.edu.sg/sol_research/4443
Institution: Singapore Management University
Description
Summary: Machine learning (ML) systems are increasingly being deployed in contexts, such as law, medicine and finance, where system errors present serious and foreseeable risks. As ML system behaviour is largely determined by their training inputs, should dataset providers owe duties of care to victims? Using the ImageNet dataset and the Generative Pre-trained Transformer (GPT) models as case studies, this chapter argues that the conventional approach of centralising duties on system providers alone yields insufficient safeguards. Dataset-specific duties should also be considered to incentivise precaution in the preparation of crucial ML inputs. The chapter analyses how dataset duties may be encompassed within existing tort law, surfacing situations where such duties are more appropriate: for instance, where a dataset is intended for use in a risky context, where the dataset provider actively influences system outputs, and where the dataset is published without safety restrictions or warnings.