Interpretable tensor fusion

Bibliographic Details
Main Authors: VARSHNEYA, Saurabh, LEDENT, Antoine, LIZNERSKI, Philipp, BALINSKYY, Andriy, MEHTA, Purvanshi, MUSTAFA, Waleed, KLOFT, Marius
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Online Access: https://ink.library.smu.edu.sg/sis_research/9305
https://ink.library.smu.edu.sg/context/sis_research/article/10305/viewcontent/0557__1_.pdf
Institution: Singapore Management University
Description
Summary: Conventional machine learning methods are predominantly designed to predict outcomes from a single data type. However, practical applications may involve data of diverse types, such as text, images, and audio. We introduce interpretable tensor fusion (InTense), a multimodal learning method that trains a neural network to simultaneously learn multiple data representations and their interpretable fusion. InTense separately captures both linear combinations and multiplicative interactions of the data types, thereby disentangling higher-order interactions from the individual effects of each modality. InTense provides interpretability out of the box by assigning relevance scores to individual modalities and to their associations. The approach is theoretically grounded and yields meaningful relevance scores on multiple synthetic and real-world datasets. Experiments on four real-world datasets show that InTense outperforms existing state-of-the-art interpretable multimodal approaches in terms of accuracy and interpretability.
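
To make the idea of separating per-modality effects from multiplicative interactions concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes two hypothetical modalities ("text" and "image") with fixed 2-dimensional representations, models their interaction as a flattened outer product, and uses the share of each block's weight norm as a stand-in relevance score. The function names, the weights, and the scoring rule are all assumptions made for illustration; the record does not specify how InTense computes its scores.

```python
import math

def outer_flat(u, v):
    """Flattened outer product: all pairwise products u[i] * v[j]."""
    return [a * b for a in u for b in v]

def fuse(x_text, x_image):
    """Return named feature blocks: one per modality plus their interaction."""
    return {
        "text": list(x_text),
        "image": list(x_image),
        "text*image": outer_flat(x_text, x_image),
    }

def predict(blocks, weights, bias=0.0):
    """Linear score over all blocks; each block has its own weight vector."""
    return bias + sum(
        w * f
        for name in blocks
        for w, f in zip(weights[name], blocks[name])
    )

def relevance_scores(weights):
    """Share of the total weight norm per block (an assumed scoring rule)."""
    norms = {
        name: math.sqrt(sum(w * w for w in ws))
        for name, ws in weights.items()
    }
    total = sum(norms.values())
    return {name: n / total for name, n in norms.items()}

# Hypothetical representations and hand-picked weights, for illustration only.
blocks = fuse([1.0, 2.0], [3.0, 4.0])
weights = {
    "text": [3.0, 4.0],                    # norm 5
    "image": [0.0, 0.0],                   # norm 0: modality unused on its own
    "text*image": [0.0, 0.0, 0.0, 5.0],    # norm 5
}
score = predict(blocks, weights)
scores = relevance_scores(weights)
```

In this toy setting the image modality receives no relevance on its own but still contributes through the interaction block, which is the kind of distinction between individual and higher-order effects that the summary describes.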