TIST 2026 Journal Article
Cascade Transformer for Hierarchical Semantic Reasoning in Text-Based Visual Question Answering
- Yuan Gao
- Dezhen Feng
- Laurence T. Yang
- Jing Yang
- Xiaowen Jiang
- Jieming Yang
Text-based visual question answering (TextVQA) aims to answer questions by understanding scene text in images. However, many current methods depend heavily on the accuracy of Optical Character Recognition (OCR) systems while overlooking the role of visual objects, and they tend to perform poorly when a question involves the relationships between visual objects and scene text. To address these issues, we elevate the role of visual objects and propose a hierarchical semantic reasoning network based on a cascade transformer architecture (CT-HSR), which achieves fine-grained cross-modal reasoning and visual semantic enhancement. Specifically, visual representations enriched with the semantics of the question modality are first obtained through a cross-modal transformer-based vision-language pre-training model. A uni-modal transformer then performs unified modality encoding to capture the visual objects most semantically related to the OCR text. In addition, a feature filtering strategy further alleviates cross-modal noise. Finally, we align the three modalities by introducing TextVQA pre-training tasks and generate answers through multi-step iterative prediction during fine-tuning. Extensive experiments on the TextVQA, ST-VQA, and OCR-VQA datasets demonstrate the effectiveness of the proposed model compared with state-of-the-art methods. The code will be released at https://github.com/FTFWO/CT-HSR.
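The abstract describes a four-stage cascade: cross-modal encoding of question and visual features, uni-modal encoding that relates objects to OCR text, a feature filtering step, and multi-step iterative answer prediction. The following is a minimal data-flow sketch of that cascade; all function names, interfaces, and placeholder logic are assumptions for illustration and do not reflect the paper's actual modules.

```python
# Hypothetical sketch of the CT-HSR cascade described in the abstract.
# Every stage here is a stand-in; the real model uses transformer modules.

def cross_modal_encode(question, visual_feats):
    """Stage 1 (assumed): infuse question semantics into visual features."""
    return [f"{v}|q:{question}" for v in visual_feats]

def unimodal_encode(object_feats, ocr_feats):
    """Stage 2 (assumed): jointly encode objects and OCR tokens so that
    objects semantically related to the scene text are paired with it."""
    return [(obj, tok) for obj in object_feats for tok in ocr_feats]

def feature_filter(pairs, keep=2):
    """Stage 3 (assumed): discard noisy cross-modal pairs, keeping the top-k."""
    return pairs[:keep]

def iterative_decode(filtered_pairs, steps=3):
    """Stage 4 (assumed): predict the answer token-by-token over several steps."""
    answer_tokens = []
    for step in range(steps):
        # A real decoder would attend over filtered_pairs at each step.
        answer_tokens.append(f"tok{step}")
    return " ".join(answer_tokens)

# Example end-to-end pass with dummy features.
fused = cross_modal_encode("what is the sign?", ["obj_a", "obj_b"])
pairs = unimodal_encode(fused, ["ocr_stop", "ocr_exit"])
answer = iterative_decode(feature_filter(pairs, keep=2), steps=2)
```

The sketch only captures the order of the stages and the data handed between them, which is the structural point the abstract makes about the cascade design.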