Efficient video object co-localization with co-saliency activated tracklets
Video object co-localization is the task of jointly localizing common visual objects across videos. Due to the large variations both across the videos and within each video, it is quite challenging to identify and track the common objects jointly. Unlike the previous joint frameworks that use a large number of bounding box proposals to attack the problem, we propose to leverage co-saliency activated tracklets to efficiently address the problem. To highlight the common object regions, we first explore inter-video commonness, intra-video commonness, and motion saliency to generate the co-saliency maps for a small number of selected key frames at regular intervals. Object proposals of high objectness and co-saliency scores in those frames are tracked across each interval to build tracklets. Finally, the best tube for a video is obtained through selecting the optimal tracklet from each interval with the help of confidence and smoothness constraints. Experimental results on the benchmark YouTube-objects dataset show that the proposed method outperforms the state-of-the-art methods in terms of accuracy and speed under both weakly supervised and unsupervised settings. Moreover, noticing that the existing benchmark dataset lacks sufficient annotations for object localization (only one annotated frame per video), we further annotate more than 15k frames of the YouTube videos and develop a new benchmark dataset for video co-localization.
Main Authors: | Jerripothula, Koteswar Rao; Cai, Jianfei; Yuan, Junsong |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2020 |
Subjects: | Engineering::Computer science and engineering; Tracklets; Video |
Online Access: | https://hdl.handle.net/10356/142175 |
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-142175 |
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1421752020-10-05T05:49:07Z Efficient video object co-localization with co-saliency activated tracklets Jerripothula, Koteswar Rao Cai, Jianfei Yuan, Junsong School of Computer Science and Engineering Engineering::Computer science and engineering Tracklets Video Video object co-localization is the task of jointly localizing common visual objects across videos. Due to the large variations both across the videos and within each video, it is quite challenging to identify and track the common objects jointly. Unlike the previous joint frameworks that use a large number of bounding box proposals to attack the problem, we propose to leverage co-saliency activated tracklets to efficiently address the problem. To highlight the common object regions, we first explore inter-video commonness, intra-video commonness, and motion saliency to generate the co-saliency maps for a small number of selected key frames at regular intervals. Object proposals of high objectness and co-saliency scores in those frames are tracked across each interval to build tracklets. Finally, the best tube for a video is obtained through selecting the optimal tracklet from each interval with the help of confidence and smoothness constraints. Experimental results on the benchmark YouTube-objects dataset show that the proposed method outperforms the state-of-the-art methods in terms of accuracy and speed under both weakly supervised and unsupervised settings. Moreover, noticing that the existing benchmark dataset lacks sufficient annotations for object localization (only one annotated frame per video), we further annotate more than 15k frames of the YouTube videos and develop a new benchmark dataset for video co-localization. NRF (Natl Research Foundation, S’pore) 2020-06-16T09:26:57Z 2020-06-16T09:26:57Z 2018 Journal Article Jerripothula, K. R., Cai, J., & Yuan, J. (2019). Efficient video object co-localization with co-saliency activated tracklets. IEEE Transactions on Circuits and Systems for Video Technology, 29(3), 744-755. doi:10.1109/tcsvt.2018.2805811 1051-8215 https://hdl.handle.net/10356/142175 10.1109/TCSVT.2018.2805811 2-s2.0-85042105124 3 29 744 755 en IEEE Transactions on Circuits and Systems for Video Technology © 2018 IEEE. All rights reserved. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering; Tracklets; Video |
spellingShingle |
Engineering::Computer science and engineering; Tracklets; Video; Jerripothula, Koteswar Rao; Cai, Jianfei; Yuan, Junsong; Efficient video object co-localization with co-saliency activated tracklets |
description |
Video object co-localization is the task of jointly localizing common visual objects across videos. Due to the large variations both across the videos and within each video, it is quite challenging to identify and track the common objects jointly. Unlike the previous joint frameworks that use a large number of bounding box proposals to attack the problem, we propose to leverage co-saliency activated tracklets to efficiently address the problem. To highlight the common object regions, we first explore inter-video commonness, intra-video commonness, and motion saliency to generate the co-saliency maps for a small number of selected key frames at regular intervals. Object proposals of high objectness and co-saliency scores in those frames are tracked across each interval to build tracklets. Finally, the best tube for a video is obtained through selecting the optimal tracklet from each interval with the help of confidence and smoothness constraints. Experimental results on the benchmark YouTube-objects dataset show that the proposed method outperforms the state-of-the-art methods in terms of accuracy and speed under both weakly supervised and unsupervised settings. Moreover, noticing that the existing benchmark dataset lacks sufficient annotations for object localization (only one annotated frame per video), we further annotate more than 15k frames of the YouTube videos and develop a new benchmark dataset for video co-localization. |
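The final selection step in the description above is essentially a best-path problem over per-interval tracklet candidates. The sketch below is a minimal, hypothetical illustration (not the authors' implementation): it assumes a dynamic program in which each candidate carries a confidence score and smoothness is approximated by the IoU between the end box of one tracklet and the start box of the next; the names `select_tube`, `smooth_weight`, and `iou` are invented for this example, and the paper's actual confidence and smoothness measures may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes, used here as a smoothness proxy."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_tube(tracklets: List[List[Tuple[Box, Box, float]]],
                smooth_weight: float = 1.0) -> List[int]:
    """tracklets[t] lists the candidate tracklets of interval t as
    (start_box, end_box, confidence); returns one chosen index per interval."""
    n = len(tracklets)
    # score[t][j]: best accumulated score of any tube ending with candidate j
    score = [[conf for (_, _, conf) in tracklets[0]]]
    back = []  # back[t-1][j]: best predecessor of candidate j at interval t
    for t in range(1, n):
        row, brow = [], []
        for (start_box, _, conf) in tracklets[t]:
            # smoothness: the previous tracklet's end box should overlap
            # the current tracklet's start box
            cands = [score[t - 1][k] + smooth_weight * iou(prev_end, start_box)
                     for k, (_, prev_end, _) in enumerate(tracklets[t - 1])]
            k_best = max(range(len(cands)), key=lambda k: cands[k])
            row.append(conf + cands[k_best])
            brow.append(k_best)
        score.append(row)
        back.append(brow)
    # backtrack the highest-scoring path to recover the tube
    j = max(range(len(score[-1])), key=lambda k: score[-1][k])
    path = [j]
    for t in range(n - 1, 0, -1):
        j = back[t - 1][j]
        path.append(j)
    return path[::-1]
```

Because only a small set of co-saliency activated candidates is kept per interval, such a pass scales linearly with the number of intervals, which is consistent with the efficiency emphasized in the abstract.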
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering; Jerripothula, Koteswar Rao; Cai, Jianfei; Yuan, Junsong |
format |
Article |
author |
Jerripothula, Koteswar Rao; Cai, Jianfei; Yuan, Junsong |
author_sort |
Jerripothula, Koteswar Rao |
title |
Efficient video object co-localization with co-saliency activated tracklets |
title_short |
Efficient video object co-localization with co-saliency activated tracklets |
title_full |
Efficient video object co-localization with co-saliency activated tracklets |
title_fullStr |
Efficient video object co-localization with co-saliency activated tracklets |
title_full_unstemmed |
Efficient video object co-localization with co-saliency activated tracklets |
title_sort |
efficient video object co-localization with co-saliency activated tracklets |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/142175 |
_version_ |
1681056167447494656 |