Look, read and feel : benchmarking ads understanding with multimodal multitask learning
Given the massive advertising market and the sharp increase in online multimedia content (such as videos), it has become common to promote advertisements (ads) together with multimedia content. However, manually finding relevant ads to match a given piece of content is labor-intensive, so automatic advertising techniques have been developed. Because ads are usually hard to understand from their visual appearance alone, owing to the visual metaphors they contain, other modalities, such as the embedded text, should also be exploited. To further improve user experience, it is necessary to understand both an ad's topic and its sentiment. This motivates us to develop a novel deep multimodal multitask framework that integrates multiple modalities to predict topic and sentiment simultaneously for ads understanding. In our framework, termed Deep$M^2$Ad, we first extract multimodal information from ads and learn high-level, comparable representations; the visual metaphor of an ad is decoded in an unsupervised manner. The obtained representations are then fed into the proposed hierarchical multimodal attention modules to learn task-specific representations for the final prediction. A multitask loss function is designed to jointly train the topic and sentiment prediction models in an end-to-end manner, with bottom-layer parameters shared to alleviate over-fitting. We conduct extensive experiments on a large-scale advertisement dataset and achieve state-of-the-art performance on both prediction tasks. The results can serve as a benchmark for ads understanding.
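The abstract outlines a shared-bottom multitask design: per-modality encoders produce comparable representations, task-specific attention modules re-weight them, and a joint loss trains the topic and sentiment heads end-to-end. The record does not include the paper's code, so the snippet below is only a minimal PyTorch sketch of that idea; the feature dimensions, class counts, the single-level attention fusion (a simplified stand-in for the paper's hierarchical multimodal attention modules), and the loss weight `alpha` are illustrative assumptions, not the authors' Deep$M^2$Ad implementation.

```python
# Minimal sketch only (not the authors' code): a shared-bottom multimodal
# multitask model with a joint topic/sentiment loss, as described in the abstract.
# Dimensions, class counts and the simple attention fusion are placeholder assumptions.
import torch
import torch.nn as nn


class MultimodalMultitaskAds(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512,
                 n_topics=10, n_sentiments=5):  # class counts are placeholders
        super().__init__()
        # Shared bottom: project each modality into a common space (shared by both tasks).
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        # One attention module per task to weight the modalities
        # (a simplified stand-in for hierarchical multimodal attention).
        self.topic_attn = nn.Linear(hidden, 1)
        self.sent_attn = nn.Linear(hidden, 1)
        # Task-specific prediction heads.
        self.topic_head = nn.Linear(hidden, n_topics)
        self.sent_head = nn.Linear(hidden, n_sentiments)

    def _fuse(self, feats, attn):
        # feats: (batch, n_modalities, hidden) -> attention-weighted sum over modalities.
        weights = torch.softmax(attn(feats), dim=1)
        return (weights * feats).sum(dim=1)

    def forward(self, img_feat, txt_feat):
        feats = torch.stack([torch.relu(self.img_proj(img_feat)),
                             torch.relu(self.txt_proj(txt_feat))], dim=1)
        topic_logits = self.topic_head(self._fuse(feats, self.topic_attn))
        sent_logits = self.sent_head(self._fuse(feats, self.sent_attn))
        return topic_logits, sent_logits


def multitask_loss(topic_logits, sent_logits, topic_y, sent_y, alpha=0.5):
    # Joint loss so both tasks are trained end-to-end over the shared bottom layers.
    ce = nn.CrossEntropyLoss()
    return alpha * ce(topic_logits, topic_y) + (1 - alpha) * ce(sent_logits, sent_y)
```

Sharing the bottom projections across both tasks is what the abstract credits with alleviating over-fitting; only the attention modules and prediction heads are task-specific.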
Saved in: DR-NTU (NTU Library, Nanyang Technological University)
Main Authors: Zhang, Huaizheng; Luo, Yong; Ai, Qiming; Wen, Yonggang; Hu, Han
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2020 (deposited in DR-NTU on 2021-10-27)
Subjects: Engineering::Computer science and engineering; Ads Understanding; Multimodal Learning
Online Access: https://hdl.handle.net/10356/152993
Institution: Nanyang Technological University
Conference: 28th ACM International Conference on Multimedia
Citation: Zhang, H., Luo, Y., Ai, Q., Wen, Y. & Hu, H. (2020). Look, read and feel : benchmarking ads understanding with multimodal multitask learning. 28th ACM International Conference on Multimedia, 430-438. https://dx.doi.org/10.1145/3394171.3413582
DOI: 10.1145/3394171.3413582
ISBN: 9781450379885
Scopus ID: 2-s2.0-85104159119
Pages: 430-438
Funding: Supported in part and jointly by the National Research Foundation, Singapore, and the Energy Market Authority, under its Energy Programme (EP Award Ref. NRF2017EWT-EP003-023) and a project fund from NTU (Ref. NTU-ACE2020-01); also partially supported by the National Natural Science Foundation of China (NSFC) under No. 61971457 and No. 7191101302.
Funding Agencies: Energy Market Authority (EMA); Nanyang Technological University; National Research Foundation (NRF)
Rights: © 2020 Association for Computing Machinery. All rights reserved.
Record ID: sg-ntu-dr.10356-152993