Dynamic configuration and approximation capabilities of extreme learning machines

Bibliographic Details
Main Author: Zhang, Rui
Other Authors: Huang, Guangbin
Format: Theses and Dissertations
Language: English
Published: 2012
Online Access: https://hdl.handle.net/10356/50729
Institution: Nanyang Technological University
Description
Summary: Computational intelligence techniques have been extensively explored in a wide range of applications over the past three decades. Among these techniques, neural networks have played a dominant role. However, neural networks typically face challenging issues such as local minima, slow learning rates, and tedious human intervention. Extreme Learning Machines (ELMs), an emergent technology that overcomes some of the challenges faced by other techniques, study a much wider class of "generalized" single-hidden-layer feedforward networks (GSLFNs). The essence of ELMs is that (1) the hidden layer need not be iteratively tuned, which dramatically reduces training time and eliminates human intervention; (2) the hidden-layer feature mapping needs to satisfy the universal approximation condition, which guarantees the approximation capability of ELMs; and (3) both the training error and the norm of the output weights are minimized, which yields better generalization performance.

Although plenty of simulation results have shown that the performance of ELMs is relatively stable over a wide range of hidden-node counts, it can be hard to tell what the exact and proper range of hidden nodes should be for a specific task. Moreover, for equal performance, the fewer hidden nodes a network has, the more efficient and less computationally expensive the ELM becomes, and the faster it converges. Finding an appropriate, parsimonious network best suited to a given problem, and further exploring its approximation capability, therefore remain important issues in the study of ELMs.

The original incremental extreme learning machine (I-ELM) adds random hidden nodes one by one and freezes the output weights of the existing hidden nodes whenever a new node is added. The convex incremental extreme learning machine (CI-ELM) adopts another incremental scheme, in which the output weights of the existing hidden nodes are properly adjusted when a new node is added; CI-ELM achieves a faster convergence rate than I-ELM while retaining I-ELM's simplicity. However, such a convex combination of the existing hidden nodes and the newly added node may not be the optimal way to reduce the residual error the most. Based on this observation, this thesis proposes an incremental extreme learning machine with optimal incremental coefficients (OCI-ELM), which not only provides an optimal tradeoff between the existing and newly added hidden nodes but also works as a universal approximator.

Although various architecture-adjustable ELMs, using either pruning or constructive methods, address the problem of network acquisition, they investigate only restricted topological subsets rather than the complete class of network architectures. Motivated by this observation, this thesis introduces an extreme learning machine with adaptive growth of hidden nodes (AG-ELM), a novel approach that determines the network architecture in an adaptive manner. AG-ELM realizes the automatic design of networks within a search space of structures suited to the problem.
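As background for the variants summarized above, the following is a minimal sketch of the basic ELM training step they all build on: the hidden-layer parameters are drawn at random and left untuned, and the output weights are obtained analytically as the minimum-norm least-squares solution. This is an illustrative sketch, not code from the thesis; names such as elm_train and the choice of a tanh activation are assumptions.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Basic ELM: random, untuned hidden layer; analytic output weights."""
    if rng is None:
        rng = np.random.default_rng(0)
    # (1) Hidden-layer parameters are random and never iteratively tuned.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)           # hidden-layer output matrix
    # (3) pinv(H) @ T is the minimum-norm least-squares solution, so both
    # the training error and the norm of the output weights are minimized.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

The tanh activation here is just one common choice satisfying the universal approximation condition in point (2); any nonlinear piecewise-continuous activation meeting that condition would serve.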
Furthermore, the universal approximation theory obtained for AG-ELM not only strengthens existing results but also provides a fundamental theory for studying the approximation capability of algorithms that use random mechanisms for parameter search in neural networks. Besides I-ELMs, EM-ELM is another efficient algorithm; it adds random hidden nodes one by one or group by group (with varying group size) and incrementally updates the output weights as the network grows. However, in both I-ELMs and EM-ELM the number of hidden nodes increases monotonically as learning proceeds, and the final number of hidden nodes equals the number of learning steps. A large network is therefore obtained whenever many iterative steps are needed, even though some of the hidden nodes may play only a minor role in the network output. On the other hand, although AG-ELM can automatically determine the network architecture, it is not as efficient as expected. Therefore, to choose a right-sized network automatically while improving the efficiency of ELMs, this thesis proposes a dynamic extreme learning machine (D-ELM) that performs dynamic growth of hidden nodes and incremental updating of output weights. On the basis of the theory obtained for AG-ELM, it is proved that D-ELM can approximate any Lebesgue $p$-integrable function as long as the hidden activation function is Lebesgue $p$-integrable.
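To make the constructive schemes above concrete, here is a hedged sketch of I-ELM-style growth for a scalar target, following the standard I-ELM rule of freezing the existing output weights and fitting each new node to the current residual error. The names ielm_train, max_nodes, and tol are illustrative assumptions; this is not the thesis's OCI-ELM, AG-ELM, or D-ELM, all of which go beyond this frozen-weight scheme.

```python
import numpy as np

def ielm_train(X, t, max_nodes=200, tol=1e-3, rng=None):
    """I-ELM-style constructive training for a scalar target t."""
    if rng is None:
        rng = np.random.default_rng(0)
    Ws, bs, betas = [], [], []
    e = np.array(t, dtype=float)        # current residual error
    for _ in range(max_nodes):
        w = rng.standard_normal(X.shape[1])
        b = rng.standard_normal()
        h = np.tanh(X @ w + b)          # output of the new hidden node
        beta = (e @ h) / (h @ h)        # best weight for this node alone;
        e -= beta * h                   # existing weights stay frozen
        Ws.append(w)
        bs.append(b)
        betas.append(beta)
        if np.linalg.norm(e) < tol:     # stop once the residual is small
            break
    return np.array(Ws), np.array(bs), np.array(betas)

def ielm_predict(X, Ws, bs, betas):
    return np.tanh(X @ Ws.T + bs) @ betas
```

With beta chosen this way the residual norm is non-increasing at every step, but, as the abstract notes, the node count still grows monotonically with the learning steps; CI-ELM and OCI-ELM additionally rescale the existing weights at each step, and D-ELM allows the architecture itself to be adjusted dynamically rather than only grown.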