Author name cluster

Pijian Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

AAAI Conference 2024 Conference Paper

Rethinking Two-Stage Referring Expression Comprehension: A Novel Grounding and Segmentation Method Modulated by Point

Peizhi Zhao
Shiyi Zheng
Wenye Zhao
Dongsheng Xu
Pijian Li
Yi Cai
Qingbao Huang

As a fundamental and challenging task in the vision and language domain, Referring Expression Comprehension (REC) has shown impressive improvements recently. However, for a complex task that couples the comprehension of abstract concepts with the localization of concrete instances, one-stage approaches are bottlenecked by computing and data resources. To obtain a low-cost solution, the prevailing two-stage approaches decouple REC into localization (region proposal) and comprehension (region-expression matching) at the region level, but a solution based on isolated regions cannot sufficiently utilize the context and is usually limited by the quality of the proposals. It is therefore necessary to rebuild an efficient two-stage solution. In this paper, we propose a point-based two-stage framework for REC, in which the two stages are redefined as point-based cross-modal comprehension and point-based instance localization. Specifically, we reconstruct the raw bounding box and segmentation mask into center and mass scores that serve as soft ground truth for measuring point-level cross-modal correlations. With this soft ground truth, REC can be approximated as a binary classification problem, which fundamentally avoids the impact of isolated regions on the optimization process. Remarkably, the consistent metrics between center and mass scores allow our system to directly optimize grounding and segmentation using the same architecture. Experiments on multiple benchmarks show the feasibility and potential of our point-based paradigm. Our code is available at https://github.com/VILAN-Lab/PBREC-MT.


AAAI Conference 2023 Conference Paper

Linking People across Text and Images Based on Social Relation Reasoning

Yang Lei
Peizhi Zhao
Pijian Li
Yi Cai
Qingbao Huang

As a sub-task of visual grounding, linking people across text and images aims to localize target people in images given the corresponding sentences. Existing approaches tend to capture superficial features of people (e.g., dress and location), which suffer from incomplete information across text and images. We observe that humans are adept at exploring social relations to help identify people. Therefore, we propose a Social Relation Reasoning (SRR) model to address the aforementioned issues. First, we design a Social Relation Extraction (SRE) module to extract social relations between people in the input sentence. Specifically, the SRE module, which is based on zero-shot learning, can extract social relations even when they are not defined in the existing datasets. A Reasoning-based Cross-modal Matching (RCM) module then generates matching matrices by reasoning over the social relations and visual features. Experimental results show that our proposed SRR model outperforms state-of-the-art models in accuracy on the challenging datasets Who's Waldo and FL: MSRE by more than 5% and 7%, respectively. Our source code is available at https://github.com/VILAN-Lab/SRR.


AAAI Conference 2021 Conference Paper

Entity Guided Question Generation with Contextual Structure and Sequence Information Capturing

Qingbao Huang
Mingyi Fu
Linzhang Mo
Yi Cai
Jingyun Xu
Pijian Li
Qing Li
Ho-fung Leung

Question generation is a challenging task that has attracted widespread attention in recent years. Although previous studies have made great progress, two main shortcomings remain. First, previous work did not simultaneously capture the sequence information and structure information hidden in the context, which leads to poor-quality generated questions. Second, the generated questions cannot be answered from the given context. To tackle these issues, we propose an entity-guided question generation model that captures contextual structure and sequence information. We use a Graph Convolutional Network and a Bidirectional Long Short-Term Memory network to capture the structure information and sequence information of the context simultaneously. In addition, to improve the answerability of the generated questions, we use an entity-guided approach to obtain the question type from the answer, and jointly encode the answer and question type. Both automatic and manual metrics show that our model can generate questions comparable to those of state-of-the-art models. Our code is available at https://github.com/VISLANG-Lab/EGSS.


AAAI Conference 2021 Conference Paper

Story Ending Generation with Multi-Level Graph Convolutional Networks over Dependency Trees

Qingbao Huang
Linzhang Mo
Pijian Li
Yi Cai
Qingguang Liu
Jielong Wei
Qing Li
Ho-fung Leung

As an interesting and challenging task, story ending generation aims to generate a reasonable and coherent ending for a given story context. The key challenge of the task is to comprehend the context sufficiently and capture the hidden logical information effectively, which has not been well explored by most existing generative models. To tackle this issue, we propose context-aware Multi-level Graph Convolutional Networks over Dependency Parse (MGCN-DP) trees to capture dependency relations and context clues more effectively. We use dependency parse trees to implicitly capture relations and events in the context, and multi-level Graph Convolutional Networks to update and propagate representations across levels to obtain richer contextual information. Both automatic and manual evaluations show that MGCN-DP achieves performance comparable to state-of-the-art models. Our source code is available at https://github.com/VISLANG-Lab/MLGCN-DP.
