Weakly-supervised learning for video understanding
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/168218 |
Institution: | Nanyang Technological University |
Summary: | Weakly supervised video segmentation is a challenging but meaningful task that aims to segment videos automatically from less annotation data, such as segment-level or region-level video annotations. Although the annotation workload is reduced, existing weakly supervised learning methods have limitations compared with traditional video segmentation methods, such as limited accuracy, dependence on annotation quality, and poor scalability.
When weakly supervised video segmentation is applied to robot imitation of human behavior, it can be used to partition a demonstration video into key steps. In this dissertation, I aim to partition the key steps of the human actions in a video based on 3D hand pose (hand joint detection and analysis), and to use this partition to provide high-quality demonstration videos from which robots can learn to imitate humans.
The main work of this dissertation includes the following:
1. After studying articles on video segmentation and weakly supervised learning from the past few years, I summarized the relevant cutting-edge techniques and compared their advantages and disadvantages, including model structure, sample recognition accuracy, and the time and space complexity of the algorithms, among other indicators.
2. Based on an existing dataset, I use weakly supervised learning to train a model for estimating 3D hand pose, recover the 3D hand pose by detecting hand joints, and track how the spatial coordinates of the hand joints change over time. I then segment the video by analyzing the change curves of these hand joint coordinates.
3. I use the trained model to extract hand joint data from the input video, and partition the key steps of the hand pose by analyzing how the joint coordinates change over the video's time series. These key-step partitions, together with the input video, serve as a demonstration for training the robot to imitate human behaviors. |
---|
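The summary describes partitioning a video into key steps from the change curve of the hand joint coordinates, but this record does not give the actual algorithm. The following is only a minimal sketch of one plausible reading, assuming that key steps are separated by pauses in hand motion. The function name `segment_by_hand_motion`, its parameters, and the pause-threshold rule are all hypothetical and are not taken from the dissertation.

```python
import numpy as np


def segment_by_hand_motion(joints, fps=30.0, smooth_window=5,
                           pause_ratio=0.2, min_step_frames=10):
    """Split a hand-joint trajectory into motion segments (hypothetical rule).

    joints: array of shape (T, J, 3), the 3D coordinates of J hand joints
            in each of T frames (e.g. the output of a hand pose estimator).
    Returns a list of (start, end) index pairs, end exclusive. Index i
    refers to the transition from frame i to frame i + 1.
    """
    joints = np.asarray(joints, dtype=float)

    # Change curve: mean joint displacement between consecutive frames,
    # scaled by the frame rate to give a speed per transition.
    speed = np.linalg.norm(np.diff(joints, axis=0), axis=2).mean(axis=1) * fps

    # Moving average to suppress frame-to-frame jitter in the estimates.
    kernel = np.ones(smooth_window) / smooth_window
    speed = np.convolve(speed, kernel, mode="same")

    # Assumption: the hand is "moving" when its speed exceeds a fraction
    # of the peak speed; everything else is treated as a pause.
    moving = speed > pause_ratio * speed.max()

    # Locate the boundaries of each contiguous run of moving transitions.
    padded = np.concatenate(([False], moving, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[0::2], changes[1::2]

    # Drop runs too short to count as a step.
    return [(int(s), int(e)) for s, e in zip(starts, ends)
            if e - s >= min_step_frames]
```

Calling `segment_by_hand_motion(joints)` on a `(T, 21, 3)` array returns one index pair per detected motion segment. The threshold and minimum length would need tuning for real data, and a learned or curve-fitting approach may be what the dissertation actually uses.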