MDA: Multimodal Data Augmentation Framework for Boosting Performance on Sentiment/Emotion Classification Tasks

Nan Xu; Wenji Mao; Penghui Wei; Daniel Zeng

doi:10.1109/mis.2020.3026715

IS 2021

Journal Article · Artificial Intelligence · Intelligent Systems

Abstract

Multimodal data analysis has drawn increasing attention with the explosive growth of multimedia data. Although traditional unimodal data analysis tasks have accumulated abundant labeled datasets, labeled multimodal datasets remain scarce because multimodal annotation is difficult and complex, and unimodal knowledge is not easy to transfer directly to multimodal data. Moreover, there is little related work on data augmentation in the multimodal domain, especially for image–text data. In this article, to address the scarcity of labeled multimodal data, we propose a Multimodal Data Augmentation (MDA) framework for boosting performance on multimodal image–text classification tasks. Our framework learns a cross-modality matching network to select image–text pairs from existing unimodal datasets, treats the selected pairs as a synthetic multimodal dataset, and uses this dataset to enhance the performance of classifiers. We take multimodal sentiment analysis and multimodal emotion analysis as the experimental tasks, and the results show the effectiveness of our framework in boosting performance on multimodal classification tasks.
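The pair-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: cosine similarity stands in for the learned cross-modality matching network, and the same-label restriction, the `threshold` parameter, the `select_pairs` helper, and the toy embeddings are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; an assumed stand-in for a learned matching score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_pairs(
    images: List[Tuple[Vector, str]],
    texts: List[Tuple[Vector, str]],
    threshold: float = 0.8,
) -> List[Tuple[int, int, str]]:
    """Pair each image with its best-scoring text of the same label.

    Returns (image_index, text_index, label) triples whose score
    reaches the threshold. Restricting to equal labels is an assumption.
    """
    pairs = []
    for i, (img_vec, img_label) in enumerate(images):
        best_j, best_score = -1, threshold
        for j, (txt_vec, txt_label) in enumerate(texts):
            if txt_label != img_label:
                continue  # only combine items that share a unimodal label
            score = cosine(img_vec, txt_vec)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j >= 0:
            pairs.append((i, best_j, img_label))
    return pairs


# Toy embeddings (hypothetical values, assumed to live in a shared space).
images = [([1.0, 0.0], "positive"), ([0.0, 1.0], "negative")]
texts = [
    ([0.9, 0.1], "positive"),
    ([0.1, 0.9], "negative"),
    ([1.0, 0.0], "negative"),
]
synthetic = select_pairs(images, texts)
print(synthetic)
```

The selected triples would then be added to the training data of a multimodal classifier. In a real system the scoring function would be a trained network over image and text encoder outputs, not a fixed similarity.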

Keywords

Task analysis
Data analysis
Boosting
Social networking (online)
Annotations
Sentiment analysis
Automation
Classification Task
Data Augmentation
Augmented Framework
Data Augmentation Framework
Training Data
Transfer Learning
Diverse Data
Training Strategy
Analysis Tasks
Image Texture
Matching Network
Multimodal Analysis
Traditional Tasks
Scarcity Problem
Multimodal Dataset
Challenges In Data Analysis
Data Analysis Tasks
Multimodal Tasks
Computer Vision Area
Matching Stage
Matching Strategy
Target Dataset
Pre-training Process
Image Encoder
Text Encoder
Image Dataset
Text Dataset
Generative Adversarial Networks
Base Classifiers
cross-modality matching
synthetic dataset
multimodal classification

Context

Venue: IEEE Intelligent Systems