Weakly-supervised learning for video understanding

Weakly supervised video segmentation is a challenging but meaningful task, which aims to achieve automatic video segmentation by using less annotation data, such as segment-level annotation of video or region-level annotation of video. Although the annotation workload is reduced, the existing weakly...

Bibliographic Details
Main Author:	Deng, Dingfan
Other Authors:	Tan Yap Peng
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:	https://hdl.handle.net/10356/168218
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-168218
record_format	dspace
spelling	sg-ntu-dr.10356-168218 2023-07-04T16:41:56Z Weakly-supervised learning for video understanding Deng, Dingfan Tan Yap Peng School of Electrical and Electronic Engineering EYPTan@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Master of Science (Communications Engineering) 2023-05-23T06:26:04Z 2023-05-23T06:26:04Z 2023 Thesis-Master by Coursework Deng, D. (2023). Weakly-supervised learning for video understanding. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168218 en ISM-DISS-03590 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description	Weakly supervised video segmentation is a challenging but meaningful task that aims to segment video automatically using less annotation data, such as segment-level or region-level annotation. Although the annotation workload is reduced, existing weakly supervised methods have limitations compared with traditional video segmentation methods, including limited accuracy, dependence on annotation quality, and poor scalability. When weakly supervised video segmentation is applied to robot imitation of human behavior, it is useful for partitioning a demonstration video into key steps. In this dissertation, we aim to partition the key steps of human actions in a video based on 3D hand pose (hand joint detection and analysis), and to use this partition to provide high-quality demonstration videos from which robots can learn to imitate humans. The main work of this dissertation is as follows: 1. After studying recent literature on video segmentation and weakly supervised learning, I summarize the relevant cutting-edge techniques and compare their advantages and disadvantages in terms of model structure, sample recognition accuracy, time complexity, space complexity, and other indicators. 2. Based on an existing dataset, I use weakly supervised learning to train a model for estimating 3D hand pose, recover the 3D hand pose by detecting hand joints, and track the changes in the spatial coordinates of the hand joints. I then segment the video according to the change curve of the hand joint spatial coordinates. 3. I use the trained model to extract hand joint data from the input video, and partition the key steps of the hand pose by analyzing how the joint coordinates change over the video's time series. The resulting partition of key steps is used together with the input video as a demonstration for training a robot to imitate human behaviors.
author2	Tan Yap Peng
author_facet	Tan Yap Peng Deng, Dingfan
format	Thesis-Master by Coursework
author	Deng, Dingfan
author_sort	Deng, Dingfan
title	Weakly-supervised learning for video understanding
title_short	Weakly-supervised learning for video understanding
title_full	Weakly-supervised learning for video understanding
title_fullStr	Weakly-supervised learning for video understanding
title_full_unstemmed	Weakly-supervised learning for video understanding
title_sort	weakly-supervised learning for video understanding
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/168218
_version_	1772825382131597312
