Author name cluster

Linxin Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

1 paper
1 author row

Possible papers

AAAI Conference 2024 Conference Paper

Comprehensive Visual Grounding for Video Description

  • Wenhui Jiang
  • Yibo Cheng
  • Linxin Liu
  • Yuming Fang
  • Yuxin Peng
  • Yang Liu

The grounding accuracy of existing video captioners still falls short of expectations. Most existing methods perform grounded video captioning on sparse entity annotations, and their captioning accuracy often suffers from degraded object appearance in the annotated area, such as motion blur and video defocus. Moreover, these methods seldom consider the complex interactions among entities. In this paper, we propose a comprehensive visual grounding network to improve video captioning by explicitly linking entities and actions to visual clues across video frames. Specifically, the network consists of spatial-temporal entity grounding and action grounding. The proposed entity grounding encourages the attention mechanism to focus on informative spatial areas across video frames, even though each entity is annotated in only one frame of a video. The action grounding dynamically associates verbs with their related subjects and the corresponding context, which preserves fine-grained spatial and temporal details for action prediction. Both entity grounding and action grounding are formulated as a unified task guided by soft grounding supervision, which simplifies the architecture and improves training efficiency. We conduct extensive experiments on two challenging datasets and demonstrate significant performance improvements of +2.3 CIDEr on ActivityNet-Entities and +2.2 CIDEr on MSR-VTT compared to the state of the art.
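
The abstract does not spell out how soft grounding supervision is computed. As a rough illustration only, one common way to supervise attention with a soft target is a cross-entropy between a target distribution over candidate regions and the model's attention distribution. The sketch below assumes that formulation; the function names `softmax` and `soft_grounding_loss`, the four-region setup, and the target values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_grounding_loss(attn_scores, soft_target, eps=1e-12):
    """Cross-entropy between a soft target over regions and the attention.

    Assumption: the supervision is a distribution over regions rather than
    a single hard label. This is an illustrative choice, not the paper's
    confirmed loss.
    """
    attn = softmax(attn_scores)
    return -sum(t * math.log(a + eps) for t, a in zip(soft_target, attn))


# Hypothetical soft target: most of the mass on region 0, some on neighbors.
target = [0.7, 0.2, 0.1, 0.0]

aligned_loss = soft_grounding_loss([2.0, 1.0, 0.0, -1.0], target)
misaligned_loss = soft_grounding_loss([-1.0, 0.0, 1.0, 2.0], target)
print(aligned_loss < misaligned_loss)
```

Under this assumed loss, attention that concentrates on the regions favored by the soft target is penalized less than attention that concentrates elsewhere, which is the general behavior a grounding supervision signal is meant to encourage.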