Distributed classification with variable distributions

Bibliographic Details
Main Author:	Quach, Vinh Thanh
Other Authors:	Vivekanand Gopalkrishnan
Format:	Theses and Dissertations
Language:	English
Published:	2015
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access:	https://hdl.handle.net/10356/62213
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-62213
record_format	dspace
timestamp	2023-03-04T00:46:12Z
contributor	Vivekanand Gopalkrishnan
contributor	Ng Wee Keong
school	School of Computer Engineering
research_centre	Centre for Advanced Information Systems
degree	Master of Engineering (SCE)
date_available	2015-02-25T08:47:31Z
date_issued	2014
type	Thesis
citation	Quach, V. T. (2014). Distributed classification with variable distributions. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/62213
doi	10.32657/10356/62213
extent	72 p.
mimetype	application/pdf
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
description	When the data at one location is insufficient, a naive solution is to gather data from other (remote) locations and classify it with a centralized algorithm. Although this approach performs well, it is often infeasible because of high communication overheads and the limited scalability of the centralized solution. These concerns have led to the emergence of distributed classification, whose promise is to improve the classification accuracy of a learning agent (called a party) on its own local data by using the knowledge of other parties in the distributed network. However, existing work implicitly assumes that all parties receive data drawn from exactly the same distribution. We show that this scenario is too simple: in reality, data at different parties may differ in the distribution of the inputs (observations), of the outputs (labels), or of both. We remove this simplifying assumption by allowing parties to draw data from arbitrary distributions, thus formalizing a new and challenging problem: distributed classification with variable data distributions. We show that this problem is difficult because state-of-the-art solutions for (conventional) distributed classification do not apply to it. After posing the problem and illustrating its difficulty, we present a list of notable research challenges (sub-problems) that this field should address, and for each of them we suggest potential research directions. Finally, as a first attempt at this new problem, we present VarDist, a simple, easy-to-implement algorithm that efficiently solves the problem when the data distribution varies across the participating parties. Although VarDist is not a complete or sophisticated solution, it has low communication costs and yields a more accurate classifier than local learning by exploiting auxiliary classifiers from the other parties.

