Video Based Fish Species Classification Using Convolutional Neural Network

Bibliographic Details
Main Author: Naufal Rachmatullah, Muhammad
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/36295
Institution: Institut Teknologi Bandung
id id-itb.:36295
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Fish recognition and classification is a challenging research problem in the marine domain, and also in agriculture, and is regarded as promising work for advancing the field. Although real-time data collection in this area has progressed rapidly, recognizing and classifying fish from underwater images still needs improvement. This is because recognition and classification must cope with variation in the size and shape of fish, poor image or video quality, changes in the environment, and similar difficulties. One way to address these problems is a feature learning approach. A deep learning architecture often used for image detection and classification is the convolutional neural network (CNN). The purpose of this study is to address the problem of video-based fish species classification by applying feature-learning-based feature extraction with a CNN. Video-based fish classification requires two main stages: detection and classification. In the detection phase, the moving average pixel and adaptive Gaussian mixture model (GMM) algorithms are used. Testing shows a detection accuracy of 38.70% for the moving average pixel method and 55.39% for the adaptive GMM method. In the classification phase, a model is built from the image dataset, with 70% of the data used for training and 30% for testing. Model development proceeds by testing the CNN architecture under a variety of settings. The first setting tests the use of dropout to reduce overfitting.
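The detection stage described above depends on separating moving fish from the background. The following is a minimal sketch of one common reading of a "moving average pixel" detector, assuming a per-pixel running-average background and a fixed difference threshold. The function names, `alpha` and `threshold` are illustrative assumptions and are not taken from the thesis.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the per-pixel running-average background."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def detect(frames, alpha=0.05, threshold=25):
    """Yield one foreground mask per frame after the first, which seeds the background."""
    frames = iter(frames)
    background = [[float(p) for p in row] for row in next(frames)]
    for frame in frames:
        # Compare against the background built from earlier frames only,
        # then fold the current frame into the model.
        yield foreground_mask(background, frame, threshold)
        background = update_background(background, frame, alpha)
```

Frames here are grayscale images given as nested lists of pixel intensities. An adaptive GMM detector replaces the single running average with several Gaussians per pixel, which is one plausible reason for its higher detection accuracy on scenes with changing water and lighting.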
The results show that the CNN model with dropout achieves higher accuracy on the testing data than the model without dropout. The second setting varies the number of layers and the mini-batch size. The best model in this test is a CNN with 4 convolution layers and a mini-batch size of 16, with a testing accuracy of 98.7%. The next experiment applies data augmentation to the training data to address class imbalance. Here the CNN model with 2 convolution layers and a mini-batch size of 32 achieves the highest recall, precision and F1 scores: 99.3%, 99.65% and 99.48% respectively. To measure how well the model generalizes, the best model is then tested on video datasets. The CNN model with 2 layers and a mini-batch size of 32 yields precision, recall and F1 scores of 42%, 47% and 36% respectively. To improve testing accuracy on the video dataset, a model is then built by combining the image dataset with the video dataset; this experiment uses 20 of the 77 videos. Testing with the combined data gives precision, recall and F1 scores of 48.67%, 58.73% and 49.4% respectively. The low precision, recall and F1 scores of the CNN model are due to the very large variation between the training data and the testing data. To overcome this problem, an architecture is built using transfer learning, based on Faster R-CNN and ResNet. This architecture handles two tasks at once: detection and classification. The transfer learning model is trained on 20 videos and tested on the other 57 videos.
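The video results above report an F1 score (36%) below both precision (42%) and recall (47%). A single harmonic mean always lies between its two inputs, so one reading consistent with these figures is that the scores are averaged per class. The abstract does not state the averaging method, so the macro-averaging below is an assumption, and `macro_scores` is a hypothetical helper name.

```python
def macro_scores(confusion):
    """Return macro-averaged (precision, recall, f1) from a square confusion matrix.

    confusion[i][j] counts samples of true class i that were predicted as class j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column total
        actual = sum(confusion[k])                          # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Two imbalanced-error classes: class 0 is rarely predicted, class 1 is over-predicted.
precision, recall, f1 = macro_scores([[1, 9], [0, 10]])
```

In this toy matrix the macro F1 falls below both the macro precision and the macro recall, because each class has one strong and one weak score and the per-class harmonic means are pulled toward the weak one.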
The transfer learning model is evaluated with mean average precision (mAP), a metric for combined detection and classification, and achieves a value of 84%.
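As a rough illustration of the mAP metric mentioned above, the sketch below computes average precision for one class from ranked detections and then averages over classes. The abstract does not give the IoU threshold or the interpolation scheme, so the all-point interpolation used here, the pre-computed true-positive flags, and the function names are all assumptions.

```python
def average_precision(detections, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, (_, hit) in enumerate(ranked, start=1):
        tp += 1 if hit else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Replace each precision with the maximum precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (detections, num_ground_truth)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Whether a detection counts as a true positive is normally decided beforehand by matching predicted boxes to ground-truth boxes at a chosen overlap threshold; that matching step is left out of this sketch.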
spelling id-itb.:36295 2019-03-11T14:05:19Z
keywords Convolutional Neural Network, deep learning, fish classification, transfer learning