Unsupervised action segmentation in videos with clustering algorithms

Bibliographic Details
Main Author: Lim, Isaac Sheng Yang
Other Authors: Yeo Chai Kiat
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/174834
Institution: Nanyang Technological University
Description
Summary: This project sought to develop a system that performs unsupervised action segmentation in videos. Users can run feature extraction on a raw video file as a preprocessing step and then segment unlabelled instructional videos according to the distinct actions they contain (action segmentation). The project used three datasets, the Breakfast Actions dataset, the 50 Salads dataset, and the YouTube Instruction Videos dataset, to test and measure the performance of five clustering algorithms. The videos were preprocessed by resampling them to a common video codec and to a frame rate at which the number of frames matched the length of their respective ground-truth label sequences. Features were then extracted from the resampled videos with a Bag-of-Features model, in which two feature extraction methods, Oriented FAST and Rotated BRIEF (ORB) and Scale-Invariant Feature Transform (SIFT), were compared to determine which was better suited to this action segmentation task. The extracted features were passed to five clustering algorithms, whose performance was analysed and compared: Temporally Weighted First NN Clustering Hierarchy (TW-FINCH), Action Boundary Detection (ABD), Spectral Clustering (SPECTRAL), Ordering Points To Identify the Clustering Structure (OPTICS), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Of the five, TW-FINCH performed best on the action segmentation task. The ORB descriptors were also found to be fast without drastically reducing accuracy, even though prior research suggests that ORB is less robust than SIFT.
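As an illustration of the Bag-of-Features step, the sketch below assumes each frame's local descriptors (for example SIFT's 128-dimensional vectors) are already available as a NumPy array. It learns a small visual-word codebook with plain k-means and encodes each frame as a normalised histogram of visual words. The function names `build_codebook` and `encode_frame`, the codebook size `k`, and the use of squared Euclidean distance are assumptions for illustration, not the project's actual code. Binary ORB descriptors are usually compared with Hamming distance instead, so treating their bytes as real numbers here is a simplification.

```python
import numpy as np


def build_codebook(descriptors, k, iters=20, seed=0):
    """Learn k visual words with plain Lloyd's k-means (a simplified, hypothetical stand-in)."""
    x = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centres with k distinct descriptors picked at random.
    centres = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every centre.
        d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members):  # keep the old centre if a cluster empties
                centres[c] = members.mean(axis=0)
    return centres


def encode_frame(descriptors, codebook):
    """Turn one frame's local descriptors into an L1-normalised visual-word histogram."""
    c = np.asarray(codebook, dtype=float)
    k = len(c)
    x = np.asarray(descriptors, dtype=float)
    if x.size == 0:  # frames with no detected keypoints get an all-zero histogram
        return np.zeros(k)
    d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest visual word for each descriptor
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()
```

Stacking the histograms, e.g. `np.stack([encode_frame(d, codebook) for d in per_frame_descriptors])` with a hypothetical list `per_frame_descriptors`, gives one fixed-length vector per frame, which is the kind of input a clustering algorithm expects.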
Based on these explorations and comparisons, a system was proposed for unsupervised action segmentation of raw video files, whose output is the labelled video after clustering.
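To show how clustering can turn per-frame features into frame labels, here is a simplified sketch of the first-neighbour linking step that FINCH-style methods such as TW-FINCH build on. It assumes a feature matrix with one row per frame and a temporally weighted distance formed by multiplying a cosine feature distance by the time gap between frames. The exact weighting and the later hierarchical merging of TW-FINCH are not reproduced, and `first_neighbour_partition` is a hypothetical name.

```python
import numpy as np


def first_neighbour_partition(features, timestamps=None):
    """One round of temporally weighted first-neighbour linking (simplified sketch).

    Returns one integer cluster label per frame, numbered in order of first appearance.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    if n < 2:
        return np.zeros(n, dtype=int)
    if timestamps is None:
        timestamps = np.arange(n)
    # Assumption: timestamps normalised by the number of frames.
    t = np.asarray(timestamps, dtype=float) / n

    # Cosine distance between every pair of frames, clipped to avoid tiny negatives.
    xn = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    feat_dist = np.clip(1.0 - xn @ xn.T, 0.0, None)

    # Assumed temporal weighting: scale the feature distance by the time gap.
    dist = feat_dist * np.abs(t[:, None] - t[None, :])
    np.fill_diagonal(dist, np.inf)  # a frame may not pick itself
    first_nn = dist.argmin(axis=1)

    # Link each frame to its first neighbour; connected components become clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(first_nn):
        ri, rj = find(i), find(int(j))
        if ri != rj:
            parent[ri] = rj

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    mapping = {}
    return np.array([mapping.setdefault(find(i), len(mapping)) for i in range(n)])
```

For example, five frames with features `[1, 0]` followed by five with `[0, 1]` yield the labels `[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]`; writing such labels back onto the frames is what produces a labelled video. In FINCH-style clustering this linking is then repeated on cluster averages to build a hierarchy of coarser partitions.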