Natural two-view learning
Main Author: Rastogi, Raghav
Other Authors: School of Computer Engineering; Wu Jianxin
Format: Final Year Project
Language: English
Published: 2013
Degree: Bachelor of Engineering (Computer Science)
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/52032
Physical Description: 56 p.
Institution: Nanyang Technological University
Description:
Semi-supervised learning has attracted much attention because in many data mining applications unlabelled training examples are easily available, while labelled training examples are costly to obtain. This report discusses a MATLAB implementation of a new co-training style semi-supervised algorithm, named tri-training. Tri-training requires neither sufficient and redundant views nor a partition of the instance space into equivalence classes, so it can be applied to a wide range of data mining applications.
The algorithm first trains three classifiers on the original labelled training data set. These classifiers are then refined with unlabelled examples during the tri-training process.
In each round of tri-training, an unlabelled example is labelled for a classifier if the other two classifiers agree on its label, subject to certain conditions on their estimated error rates. This scheme settles both how to label the unlabelled examples and how to produce the final hypothesis, and it makes the algorithm more efficient and more broadly applicable than earlier co-training style algorithms.
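The report's own MATLAB code is not reproduced in this record. The following is a minimal sketch of one tri-training round, assuming numeric class labels and MATLAB's Statistics and Machine Learning Toolbox (fitctree, predict, and randsample are toolbox functions); the function name triTrainRound and all variable names are hypothetical, and the error-rate conditions that gate the adoption of newly labelled examples are omitted for brevity.

```matlab
% Sketch of one tri-training round (hypothetical names, numeric labels).
function models = triTrainRound(XL, yL, XU)
    % Train three classifiers on bootstrap samples of the labelled set.
    n = numel(yL);
    models = cell(3, 1);
    for i = 1:3
        idx = randsample(n, n, true);              % bootstrap resample
        models{i} = fitctree(XL(idx, :), yL(idx));
    end

    % Predict labels for the unlabelled pool with all three classifiers.
    preds = zeros(size(XU, 1), 3);
    for i = 1:3
        preds(:, i) = predict(models{i}, XU);
    end

    % For each classifier, adopt the unlabelled examples on which the
    % other two classifiers agree, then retrain. (The error-rate
    % conditions used to gate this step are omitted here.)
    for i = 1:3
        others = setdiff(1:3, i);
        agree = preds(:, others(1)) == preds(:, others(2));
        Xi = [XL; XU(agree, :)];
        yi = [yL; preds(agree, others(1))];
        models{i} = fitctree(Xi, yi);
    end
end
```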
Experiments on UCI data sets [1] show that tri-training can exploit unlabelled data effectively and improve the learning process. The generalization ability of its final hypothesis is strong, sometimes even exceeding that of an ensemble of three classifiers given the true labels of all the unlabelled examples. The experiments also showed that error rates increase as the percentage of unlabelled data in the training data increases. Finally, the report examines how the training/testing split affects running time: training time grows with the amount of labelled data in the training set, while testing time stays constant when the test set is unchanged.
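As a hypothetical usage example building on the sketch above (Xtest and ytest are placeholder variables, not the report's actual experiment code), the final hypothesis can be taken as the majority vote of the three refined classifiers and its error rate measured on a held-out test set:

```matlab
% Hypothetical evaluation: majority vote of the three refined
% classifiers on a held-out test set (numeric labels assumed).
models = triTrainRound(XL, yL, XU);
votes = zeros(size(Xtest, 1), 3);
for i = 1:3
    votes(:, i) = predict(models{i}, Xtest);
end
yhat = mode(votes, 2);              % final hypothesis: majority vote
errRate = mean(yhat ~= ytest);      % error rate on the test set
fprintf('Test error rate: %.3f\n', errRate);
```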