Mixture of experts based on confusion matrix and distribution

Bibliographic Details
Main Author: Wang, Zhisheng
Other Authors: Mao, Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2020
Online Access: https://hdl.handle.net/10356/141136
Description
Summary: Neural networks have grown in parameters and computational complexity in pursuit of better performance. Conditional computation has been proposed to increase model efficiency, with only minor losses in performance, by activating parts of the network on a per-example basis; however, significant practical challenges remain in both performance and algorithm design. In this dissertation, we review the related work and propose a Mixture of Experts (MoE) method that addresses these challenges in a flexible manner. We introduce confusion matrix and distribution analysis: each expert is trained on a specific class grouping derived from the confusion matrix, and distribution analysis predicts the trained model's output confidence for each example. Based on the distribution analysis, a sparse combination of experts is activated for each example. We test this MoE method on classification tasks, where computational efficiency and accuracy are both critical, evaluate the model on five datasets, and study the effect of the number of experts. The results show that the network's FLOPs are reduced by at least 10% (Fashion-MNIST with 10 experts) with little or no loss, and in some cases an improvement, in accuracy.
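
For illustration, the routing idea described in the summary might be sketched as below. This is a minimal, hypothetical reconstruction, not the thesis implementation: the class groups, the TOP_K parameter, and the linear stand-ins for the base model and experts are all assumptions introduced here, with random weights in place of trained subnetworks.

    # Illustrative sketch only -- not the author's code. Assumes a trained
    # base classifier, class groups precomputed from its confusion matrix,
    # and one expert network per group.
    import numpy as np

    rng = np.random.default_rng(0)
    NUM_CLASSES, NUM_FEATURES, TOP_K = 10, 16, 1

    # Hypothetical groups of frequently-confused classes, one expert per
    # group; the dissertation derives such groupings from the confusion
    # matrix of the base model.
    CLASS_GROUPS = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

    # Random stand-ins for the trained base classifier and the experts.
    W_base = rng.standard_normal((NUM_FEATURES, NUM_CLASSES))
    W_experts = [rng.standard_normal((NUM_FEATURES, NUM_CLASSES))
                 for _ in CLASS_GROUPS]

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def predict(x):
        """Route each example to a sparse set of experts chosen from the
        base model's output distribution (the 'distribution analysis')."""
        p = softmax(x @ W_base)  # base confidences, shape (batch, classes)
        # Probability mass falling on each expert's class group, per example.
        mass = np.stack([p[:, g].sum(axis=1) for g in CLASS_GROUPS], axis=1)
        final = p.copy()
        for i in range(x.shape[0]):
            top = np.argsort(mass[i])[::-1][:TOP_K]
            # Only the selected experts are evaluated; skipping the rest is
            # where the per-example FLOPs saving comes from.
            final[i] = np.mean([softmax(x[i] @ W_experts[e]) for e in top],
                               axis=0)
        return final.argmax(axis=1)

    print(predict(rng.standard_normal((4, NUM_FEATURES))))

In this sketch, the saving reported in the summary comes from never evaluating the unselected experts; a full implementation would train each expert only on its confusion-matrix group and calibrate the gating on held-out data.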