Distributed classification with variable distributions

When the data at a location is insufficient, one may apply a naive solution to gather data from other (remote) places and classify it using a centralized algorithm. Although this approach has good performance, it is often infeasible due to high communication overheads and lack of scalability of the...

Bibliographic Details
Main Author: Quach, Vinh Thanh
Other Authors: Vivekanand Gopalkrishnan
Format: Theses and Dissertations
Language: English
Published: 2015
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access: https://hdl.handle.net/10356/62213
Institution: Nanyang Technological University
id sg-ntu-dr.10356-62213
record_format dspace
spelling sg-ntu-dr.10356-62213 2023-03-04T00:46:12Z Distributed classification with variable distributions Quach, Vinh Thanh Vivekanand Gopalkrishnan Ng Wee Keong School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition MASTER OF ENGINEERING (SCE) 2015-02-25T08:47:31Z 2015-02-25T08:47:31Z 2014 2014 Thesis Quach, V. T. (2014). Distributed classification with variable distributions. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/62213 10.32657/10356/62213 en 72 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
description When the data at one location is insufficient, a naive solution is to gather data from other (remote) locations and classify it with a centralized algorithm. Although this approach performs well, it is often infeasible because of high communication overhead and the poor scalability of the centralized solution. These concerns have led to the emergence of distributed classification, which promises to improve the classification accuracy of a learning agent (called a party) on its own local data by using the knowledge of other parties in the distributed network. However, current work implicitly assumes that all parties receive data from exactly the same distribution. We show that this scenario is too simple: in reality, data may differ across parties in the distribution of the inputs (observations), the outputs (labels), or both. We remove this simplifying assumption by allowing parties to draw data from arbitrary distributions, thus formalizing a new and challenging problem: distributed classification with variable data distributions. We show that the problem is difficult because state-of-the-art solutions for conventional distributed classification do not carry over to it. After posing the problem and illustrating its difficulty, we present a list of notable research challenges (or sub-problems) that should be addressed in this field and, for each, suggest potential research directions. Finally, as a first attempt at this new problem, we present VarDist, a simple-to-implement yet working algorithm that efficiently handles the case where the data distribution varies across the participating parties. Although VarDist is not a complete or sophisticated solution, it has low communication costs and yields a more accurate classifier than local learning by benefiting from the auxiliary classifiers of the other parties.
author2 Vivekanand Gopalkrishnan
format Theses and Dissertations
author Quach, Vinh Thanh
title Distributed classification with variable distributions
publishDate 2015
url https://hdl.handle.net/10356/62213
_version_ 1759858289515954176