Weakly-supervised learning for video understanding

Weakly supervised video segmentation is a challenging but meaningful task, which aims to achieve automatic video segmentation by using less annotation data, such as segment-level annotation of video or region-level annotation of video. Although the annotation workload is reduced, the existing weakly...

Bibliographic Details
Main Author: Deng, Dingfan
Other Authors: Tan Yap Peng
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2023
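Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision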
Online Access: https://hdl.handle.net/10356/168218
Institution: Nanyang Technological University
id sg-ntu-dr.10356-168218
record_format dspace
spelling sg-ntu-dr.10356-168218 2023-07-04T16:41:56Z
Weakly-supervised learning for video understanding
Deng, Dingfan
Tan Yap Peng
School of Electrical and Electronic Engineering
EYPTan@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Master of Science (Communications Engineering)
2023-05-23T06:26:04Z
2023
Thesis-Master by Coursework
Deng, D. (2023). Weakly-supervised learning for video understanding. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168218
en
ISM-DISS-03590
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description Weakly supervised video segmentation is a challenging but meaningful task that aims to segment video automatically using less annotation data, such as segment-level or region-level annotations. Although this reduces the annotation workload, existing weakly supervised learning methods have limitations compared with traditional video segmentation methods, including limited accuracy, dependence on annotation quality, and poor scalability. In the field of robot imitation of human behavior, video segmentation is useful for partitioning a demonstration video into key steps. This dissertation aims to partition the key steps of the human actions in a video based on 3D hand pose (hand joint detection and analysis), and to use the result to provide high-quality demonstration videos for robots learning to imitate human behavior. The main work of this dissertation is as follows:
1. After studying recent articles on video segmentation and weakly supervised learning, I summarize the relevant cutting-edge techniques and compare their advantages and disadvantages, including model structure, sample recognition accuracy, time complexity, space complexity, and other indicators.
2. Based on an existing data set, I use weakly supervised learning to train a model for estimating 3D hand pose, recover the 3D hand pose by detecting hand joints, and track the changes in the spatial coordinates of the hand joints. I then analyze the change curves of the hand joint coordinates and segment the video accordingly.
3. I use the trained model to extract hand joint data from the input video and partition the key steps of the hand pose by analyzing how the joint coordinates change over the video's time series. These key-step partitions, together with the input video, serve as a demonstration for training a robot to imitate human behaviors.
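The idea of partitioning key steps from the change curve of hand joint coordinates can be illustrated with a minimal sketch. This is not the method from the thesis, which the record does not detail. It substitutes a simple heuristic: threshold the mean joint speed and treat each sustained run of motion as one step. The function name `segment_by_joint_motion`, its parameters, the 21-joint layout, and the synthetic trajectory are all assumptions made for illustration.

```python
import numpy as np


def segment_by_joint_motion(joints, fps=30.0, speed_threshold=0.05, min_len=5):
    """Split a hand-joint trajectory into segments of sustained motion.

    joints: array of shape (T, J, 3) with 3D coordinates of J joints over T frames.
    Returns (start, end) pairs indexing frame transitions, end exclusive.
    All parameter names and defaults are hypothetical.
    """
    joints = np.asarray(joints, dtype=float)
    # Displacement of every joint between consecutive frames: shape (T-1, J, 3).
    disp = np.diff(joints, axis=0)
    # Mean joint speed per frame transition, in coordinate units per second.
    speed = np.linalg.norm(disp, axis=2).mean(axis=1) * fps
    moving = speed > speed_threshold

    segments = []
    start = None
    for t, is_moving in enumerate(moving):
        if is_moving and start is None:
            start = t  # motion begins
        elif not is_moving and start is not None:
            if t - start >= min_len:  # drop very short bursts
                segments.append((start, t))
            start = None
    # Close a segment that is still open at the end of the video.
    if start is not None and len(moving) - start >= min_len:
        segments.append((start, len(moving)))
    return segments


# Synthetic example: 100 frames, 21 joints, two bursts of motion.
velocity = np.zeros(99)
velocity[20:50] = 0.01  # first burst, units per frame
velocity[70:99] = 0.01  # second burst
position = np.concatenate([[0.0], np.cumsum(velocity)])
joints = position[:, None, None] * np.ones((1, 21, 3))

print(segment_by_joint_motion(joints))  # [(20, 50), (70, 99)]
```

A fixed threshold is only a starting point. Real joint estimates are noisy, so smoothing the speed curve or using a change-point detector would likely be needed before the segments are reliable.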