Mixture of experts based on confusion matrix and distribution

The parameters and computational complexity of neural networks have grown in pursuit of better performance. Conditional computation has been proposed to increase model efficiency, with minor losses in performance, by activating only parts of the network on a per-example basis. However, great practical challenges remain in terms of both performance and algorithm design. In this dissertation, we review the related work and propose a Mixture of Experts (MoE) method to address these challenges in a flexible manner. We introduce confusion-matrix and distribution analysis: each expert is trained on a specific class grouping derived from the confusion matrix, and the trained model's output confidence for each example is predicted by distribution analysis. Based on the distribution-analysis result, a sparse combination of experts is activated for each case. We test this MoE method on classification tasks, where computational efficiency and accuracy are critical. We also evaluate the model on five datasets and test the effect of the number of experts. The results show that the network's FLOPs are reduced by at least 10% (Fashion MNIST with 10 experts) with minor losses in (or even improvement of) accuracy.
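The two ideas in the abstract, grouping confusable classes via a confusion matrix and then activating only a sparse subset of experts per example, can be sketched roughly as below. All function names, the greedy union-find grouping heuristic, and the probability-mass gating rule are illustrative assumptions for exposition; they are not the thesis's actual algorithm.

```python
# Illustrative sketch (not the thesis's algorithm): derive class groups
# from a confusion matrix, then pick a sparse set of experts per example.

def group_classes(confusion, threshold=5):
    """Greedily merge classes that are frequently confused with each other.

    confusion[i][j] = count of class-i examples predicted as class j.
    Returns a list of class groups; one expert would be trained per group.
    """
    n = len(confusion)
    parent = list(range(n))

    def find(x):
        # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # merge classes whose symmetric confusion count is high
            if confusion[i][j] + confusion[j][i] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for c in range(n):
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())


def select_experts(class_probs, groups, top_k=2):
    """Sparse activation: pick the top_k expert groups whose classes
    carry the most predicted probability mass for this example."""
    scores = [(sum(class_probs[c] for c in g), gi)
              for gi, g in enumerate(groups)]
    scores.sort(reverse=True)
    return [gi for _, gi in scores[:top_k]]
```

For example, a 4-class confusion matrix in which classes 0/1 and classes 2/3 are mutually confused yields two groups, and a gating pass then runs only the experts responsible for the most probable groups, which is the mechanism by which such a scheme trades a small accuracy change for reduced FLOPs.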


Bibliographic Details
Main Author: Wang, Zhisheng
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/141136
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-141136
record_format dspace
spelling sg-ntu-dr.10356-1411362023-07-04T16:42:05Z Mixture of experts based on confusion matrix and distribution Wang, Zhisheng Mao Kezhi School of Electrical and Electronic Engineering EKZMao@ntu.edu.sg Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity The parameters and computational complexity of neural networks have grown in pursuit of better performance. Conditional computation has been proposed to increase model efficiency, with minor losses in performance, by activating only parts of the network on a per-example basis. However, great practical challenges remain in terms of both performance and algorithm design. In this dissertation, we review the related work and propose a Mixture of Experts (MoE) method to address these challenges in a flexible manner. We introduce confusion-matrix and distribution analysis: each expert is trained on a specific class grouping derived from the confusion matrix, and the trained model's output confidence for each example is predicted by distribution analysis. Based on the distribution-analysis result, a sparse combination of experts is activated for each case. We test this MoE method on classification tasks, where computational efficiency and accuracy are critical. We also evaluate the model on five datasets and test the effect of the number of experts. The results show that the network's FLOPs are reduced by at least 10% (Fashion MNIST with 10 experts) with minor losses in (or even improvement of) accuracy. Master of Science (Computer Control and Automation) 2020-06-04T06:01:52Z 2020-06-04T06:01:52Z 2020 Thesis-Master by Coursework https://hdl.handle.net/10356/141136 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
spellingShingle Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
Wang, Zhisheng
Mixture of experts based on confusion matrix and distribution
description The parameters and computational complexity of neural networks have grown in pursuit of better performance. Conditional computation has been proposed to increase model efficiency, with minor losses in performance, by activating only parts of the network on a per-example basis. However, great practical challenges remain in terms of both performance and algorithm design. In this dissertation, we review the related work and propose a Mixture of Experts (MoE) method to address these challenges in a flexible manner. We introduce confusion-matrix and distribution analysis: each expert is trained on a specific class grouping derived from the confusion matrix, and the trained model's output confidence for each example is predicted by distribution analysis. Based on the distribution-analysis result, a sparse combination of experts is activated for each case. We test this MoE method on classification tasks, where computational efficiency and accuracy are critical. We also evaluate the model on five datasets and test the effect of the number of experts. The results show that the network's FLOPs are reduced by at least 10% (Fashion MNIST with 10 experts) with minor losses in (or even improvement of) accuracy.
author2 Mao Kezhi
author_facet Mao Kezhi
Wang, Zhisheng
format Thesis-Master by Coursework
author Wang, Zhisheng
author_sort Wang, Zhisheng
title Mixture of experts based on confusion matrix and distribution
title_short Mixture of experts based on confusion matrix and distribution
title_full Mixture of experts based on confusion matrix and distribution
title_fullStr Mixture of experts based on confusion matrix and distribution
title_full_unstemmed Mixture of experts based on confusion matrix and distribution
title_sort mixture of experts based on confusion matrix and distribution
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/141136
_version_ 1772827400908832768