Unsupervised action segmentation in videos with clustering algorithms
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/174834 |
Institution: | Nanyang Technological University |
Summary: | This project sought to develop a system that performs unsupervised action segmentation in
videos. Users are able to perform feature extraction on a desired raw video file as
preprocessing for action segmentation, followed by segmenting unlabelled instructional videos
according to distinct actions (action segmentation).
The project utilised 3 datasets: the Breakfast Actions dataset, the 50 Salads dataset, and the
YouTube Instruction Videos dataset, to perform tests and measure the performance of 5
different clustering algorithms. The videos from the datasets were pre-processed by
re-encoding them with a common video codec and resampling them to a frame rate that matched
the dimensions of their respective ground-truth labels.
The features from the resampled videos in these datasets were extracted using a
Bag-of-Features model, in which 2 feature extraction methods, Oriented FAST and Rotated
BRIEF (ORB) and Scale-Invariant Feature Transform (SIFT), were compared to find the
better feature extraction algorithm for this action segmentation task.
The extracted features were then passed to 5 clustering algorithms to analyse and compare
their performance during exploration. These 5 clustering algorithms were: Temporally
Weighted First NN Clustering Hierarchy (TW-FINCH), Action Boundary Detection (ABD),
Spectral Clustering (SPECTRAL), Ordering Points To Identify the Clustering Structure
(OPTICS), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
Of the 5 clustering algorithms, TW-FINCH was found to have the best performance for the
action segmentation task. It was also found that the ORB feature descriptors provided good
performance in terms of speed without drastically reducing accuracy, despite research
suggesting that ORB descriptors are less robust than SIFT descriptors.
Based on the exploration and comparisons made, a system was proposed for unsupervised
action segmentation on raw video files, with the output being the labelled videos after
clustering. |
---|
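As a rough illustration of the pipeline the summary describes, the sketch below shows two of its steps: quantising a frame's local descriptors into a Bag-of-Features histogram, and collapsing per-frame cluster labels into contiguous action segments. It is not the project's implementation. The function names `bag_of_features` and `labels_to_segments`, the use of Euclidean distance, and the pre-built `codebook` are all assumptions made for illustration. Descriptor extraction (ORB or SIFT) and the clustering step itself are left out.

```python
from itertools import groupby

import numpy as np


def bag_of_features(descriptors, codebook):
    """Return a normalised histogram of nearest-codeword assignments.

    descriptors: (N, D) array of local descriptors for one frame (hypothetical input).
    codebook:    (K, D) array of visual words, assumed to be built beforehand.
    """
    # Squared Euclidean distance between every descriptor and every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    # Assign each descriptor to its nearest visual word.
    words = dist2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    # A frame with no descriptors yields an all-zero histogram.
    return hist / max(hist.sum(), 1.0)


def labels_to_segments(labels):
    """Collapse per-frame cluster labels into (start, end, label) runs.

    Frame indices are 0-based and `end` is inclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, label))
        start += length
    return segments
```

In such a sketch, the histograms for all frames would be stacked into a feature matrix and passed to a clustering algorithm of choice, and the resulting frame labels would then be passed to `labels_to_segments`.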