Neighbourhood structure preserving cross-modal embedding for video hyperlinking

Video hyperlinking is a task aiming to enhance the accessibility of large archives by establishing links between fragments of videos. The links model the aboutness between fragments for efficient traversal of video content. This paper addresses the problem of link construction from the perspective of cross-modal embedding. To this end, a generalized multi-modal auto-encoder is proposed. The encoder learns two embeddings from the visual and speech modalities, respectively, and each embedding performs self-modal and cross-modal translation. Furthermore, to preserve the neighbourhood structure of fragments, which is important for video hyperlinking, the auto-encoder is devised to model the data distribution of fragments in a dataset. Experiments are conducted on the Blip10000 dataset using the anchor fragments provided by the TRECVid Video Hyperlinking (LNK) task over the years 2016 and 2017. This paper shares empirical insights on a number of issues in cross-modal learning for video hyperlinking, including the preservation of neighbourhood structure in embeddings, model fine-tuning, and missing modalities.
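The abstract's two-branch auto-encoder, where each modality's embedding is decoded into both its own and the other modality (self-modal and cross-modal translation), can be sketched as below. This is an illustrative toy only, not the authors' implementation; all dimensions, weight shapes, and the squared-error loss form are assumptions.

```python
# Toy sketch of a multi-modal auto-encoder with self-modal and
# cross-modal translation, in the spirit of the abstract.
# Dimensions, names, and the loss form are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D_V, D_S, D_E = 8, 6, 4   # visual dim, speech dim, embedding dim (assumed)

# Linear encoders/decoders for each modality (untrained toy weights).
W_enc_v = rng.standard_normal((D_E, D_V)) * 0.1
W_enc_s = rng.standard_normal((D_E, D_S)) * 0.1
W_dec_v = rng.standard_normal((D_V, D_E)) * 0.1
W_dec_s = rng.standard_normal((D_S, D_E)) * 0.1

def forward(v, s):
    """Encode both modalities, then decode each embedding into BOTH
    modalities: two self-modal and two cross-modal reconstructions."""
    e_v, e_s = W_enc_v @ v, W_enc_s @ s
    recons = {
        "v_from_v": W_dec_v @ e_v,  # self-modal reconstruction
        "s_from_s": W_dec_s @ e_s,  # self-modal reconstruction
        "s_from_v": W_dec_s @ e_v,  # cross-modal translation
        "v_from_s": W_dec_v @ e_s,  # cross-modal translation
    }
    return e_v, e_s, recons

def loss(v, s):
    # Sum of squared errors over all four reconstruction targets.
    _, _, r = forward(v, s)
    return (np.sum((r["v_from_v"] - v) ** 2) + np.sum((r["s_from_s"] - s) ** 2)
            + np.sum((r["s_from_v"] - s) ** 2) + np.sum((r["v_from_s"] - v) ** 2))

v = rng.standard_normal(D_V)   # toy visual feature of one fragment
s = rng.standard_normal(D_S)   # toy speech feature of the same fragment
print(f"total reconstruction + translation loss: {loss(v, s):.3f}")
```

Training such a model would minimize this combined loss so that either modality's embedding can stand in for the other, which is what makes the embeddings usable for linking fragments even when one modality is missing.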

Bibliographic Details
Main Authors: HAO, Yanbin, NGO, Chong-wah, HUET, Benoit
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2020
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/6305
https://ink.library.smu.edu.sg/context/sis_research/article/7308/viewcontent/08736841.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7308
record_format dspace
spelling sg-smu-ink.sis_research-7308 2021-11-23T06:59:30Z Neighbourhood structure preserving cross-modal embedding for video hyperlinking HAO, Yanbin NGO, Chong-wah HUET, Benoit Video hyperlinking is a task aiming to enhance the accessibility of large archives by establishing links between fragments of videos. The links model the aboutness between fragments for efficient traversal of video content. This paper addresses the problem of link construction from the perspective of cross-modal embedding. To this end, a generalized multi-modal auto-encoder is proposed. The encoder learns two embeddings from the visual and speech modalities, respectively, and each embedding performs self-modal and cross-modal translation. Furthermore, to preserve the neighbourhood structure of fragments, which is important for video hyperlinking, the auto-encoder is devised to model the data distribution of fragments in a dataset. Experiments are conducted on the Blip10000 dataset using the anchor fragments provided by the TRECVid Video Hyperlinking (LNK) task over the years 2016 and 2017. This paper shares empirical insights on a number of issues in cross-modal learning for video hyperlinking, including the preservation of neighbourhood structure in embeddings, model fine-tuning, and missing modalities. 2020-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6305 info:doi/10.1109/TMM.2019.2923121 https://ink.library.smu.edu.sg/context/sis_research/article/7308/viewcontent/08736841.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Task analysis Visualization Joining processes Gallium nitride Benchmark testing Feature extraction Neural networks Video hyperlinking cross-modal translation structure-preserving learning Graphics and Human Computer Interfaces OS and Networks
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Task analysis
Visualization
Joining processes
Gallium nitride
Benchmark testing
Feature extraction
Neural networks
Video hyperlinking
cross-modal translation
structure-preserving learning
Graphics and Human Computer Interfaces
OS and Networks
description Video hyperlinking is a task aiming to enhance the accessibility of large archives by establishing links between fragments of videos. The links model the aboutness between fragments for efficient traversal of video content. This paper addresses the problem of link construction from the perspective of cross-modal embedding. To this end, a generalized multi-modal auto-encoder is proposed. The encoder learns two embeddings from the visual and speech modalities, respectively, and each embedding performs self-modal and cross-modal translation. Furthermore, to preserve the neighbourhood structure of fragments, which is important for video hyperlinking, the auto-encoder is devised to model the data distribution of fragments in a dataset. Experiments are conducted on the Blip10000 dataset using the anchor fragments provided by the TRECVid Video Hyperlinking (LNK) task over the years 2016 and 2017. This paper shares empirical insights on a number of issues in cross-modal learning for video hyperlinking, including the preservation of neighbourhood structure in embeddings, model fine-tuning, and missing modalities.
format text
author HAO, Yanbin
NGO, Chong-wah
HUET, Benoit
author_facet HAO, Yanbin
NGO, Chong-wah
HUET, Benoit
author_sort HAO, Yanbin
title Neighbourhood structure preserving cross-modal embedding for video hyperlinking
title_short Neighbourhood structure preserving cross-modal embedding for video hyperlinking
title_full Neighbourhood structure preserving cross-modal embedding for video hyperlinking
title_fullStr Neighbourhood structure preserving cross-modal embedding for video hyperlinking
title_full_unstemmed Neighbourhood structure preserving cross-modal embedding for video hyperlinking
title_sort neighbourhood structure preserving cross-modal embedding for video hyperlinking
publisher Institutional Knowledge at Singapore Management University
publishDate 2020
url https://ink.library.smu.edu.sg/sis_research/6305
https://ink.library.smu.edu.sg/context/sis_research/article/7308/viewcontent/08736841.pdf
_version_ 1770575931136540672