Mixture of experts based on confusion matrix and distribution
| Main Author: | |
|---|---|
| Other Authors: | |
| Format: | Thesis-Master by Coursework |
| Language: | English |
| Published: | Nanyang Technological University, 2020 |
| Subjects: | |
| Online Access: | https://hdl.handle.net/10356/141136 |
| Institution: | Nanyang Technological University |
| Summary: | The parameters and computational complexity of neural networks have been steadily increased to achieve better performance. Conditional computation has been proposed to increase model efficiency, with only minor losses in performance, by activating parts of the network on a per-example basis. However, great challenges remain in practice, both in performance and in algorithm design. In this dissertation, we review the related work and propose a Mixture of Experts (MoE) method that addresses these challenges in a flexible manner. We introduce confusion-matrix and distribution analysis: each expert is trained to process a specific grouping of classes derived from the confusion matrix, and the confidence of the trained model's output for each example is predicted by distribution analysis. Based on the distribution-analysis result, a sparse combination of experts is activated for each case. We test this MoE method on classification tasks, where computational efficiency and accuracy are critical. We also evaluate the model on five datasets and test the effect of the number of experts. The results show that the network's FLOPs are reduced by at least 10% (Fashion-MNIST with 10 experts) with minor losses in, or even improvement of, accuracy. |
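
The summary describes the two ingredients of the method only at a high level. The sketch below is a minimal illustration of how they could fit together, assuming a greedy grouping heuristic over the confusion matrix and a gate that activates the experts receiving the most softmax mass; the function names, the grouping rule, and the gating rule are all illustrative assumptions, not the dissertation's actual algorithm.

```python
import numpy as np

def group_classes_by_confusion(conf_mat, n_experts):
    """Greedily merge the most mutually confused classes into
    n_experts groups, so each expert specialises on one group.
    (Illustrative heuristic, not the thesis's exact procedure.)"""
    n = conf_mat.shape[0]
    # Symmetric off-diagonal confusion counts as a similarity score.
    sim = conf_mat.astype(float) + conf_mat.T
    np.fill_diagonal(sim, 0.0)
    groups = [{c} for c in range(n)]
    while len(groups) > n_experts:
        # Merge the two groups with the highest total confusion between them.
        best, pair = -1.0, (0, 1)
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                s = sum(sim[a, b] for a in groups[i] for b in groups[j])
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        groups[i] |= groups.pop(j)  # j > i, so index i stays valid
    return groups

def sparse_gate(probs, groups, k=1):
    """Activate the k experts whose class groups receive the most
    probability mass from a base model's softmax output."""
    mass = np.array([sum(probs[c] for c in g) for g in groups])
    return list(np.argsort(mass)[::-1][:k])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 6-class confusion matrix from a hypothetical base model.
    cm = rng.integers(0, 5, size=(6, 6))
    np.fill_diagonal(cm, 50)
    groups = group_classes_by_confusion(cm, n_experts=3)
    probs = rng.dirichlet(np.ones(6))  # softmax output for one example
    print("class groups:", groups)
    print("expert(s) activated:", sparse_gate(probs, groups, k=1))
```

With a gate of this shape, only the selected experts run a forward pass per example, which is the mechanism by which conditional computation reduces FLOPs while the confusion-based grouping keeps each expert focused on classes the base model struggles to separate.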