IJCAI Conference 2022 Conference Paper
Targeted Multimodal Sentiment Classification based on Coarse-to-Fine Grained Image-Target Matching
- Jianfei Yu
- Jieming Wang
- Rui Xia
- Junjie Li
Targeted Multimodal Sentiment Classification (TMSC) aims to identify the sentiment polarities over each target mentioned in a pair of sentence and image. Existing methods to TMSC failed to explicitly capture both coarse-grained and fine-grained image-target matching, including 1) the relevance between the image and the target and 2) the alignment between visual objects and the target. To tackle this issue, we propose a new multi-task learning architecture named coarse-to-fine grained Image-Target Matching network (ITM), which jointly performs image-target relevance classification, object-target alignment, and targeted sentiment classification. We further construct an Image-Target Matching dataset by manually annotating the image-target relevance and the visual object aligned with the input target. Experiments on two benchmark TMSC datasets show that our model consistently outperforms the baselines, achieves state-of-the-art results, and presents interpretable visualizations.