Arrow Research search

Author name cluster

Changjian Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

6

AAAI Conference 2026 Conference Paper

Cross-Granularity Hypergraph Retrieval-Augmented Generation for Multi-hop Question Answering

  • Changjian Wang
  • Weihong Deng
  • Weili Guan
  • Quan Lu
  • Ning Jiang

Multi-hop question answering (MHQA) requires integrating knowledge scattered across multiple passages to derive the correct answer. Traditional retrieval-augmented generation (RAG) methods primarily focus on coarse-grained textual semantic similarity and ignore structural associations among dispersed knowledge, which limits their effectiveness in MHQA tasks. GraphRAG methods address this by leveraging knowledge graphs (KGs) to capture structural associations, but they tend to overly rely on structural information and fine-grained word- or phrase-level retrieval, resulting in an underutilization of textual semantics. In this paper, we propose a novel RAG approach called HGRAG for MHQA that achieves cross-granularity integration of structural and semantic information via hypergraphs. Structurally, we construct an entity hypergraph where fine-grained entities serve as nodes and coarse-grained passages as hyperedges, and establish knowledge association through shared entities. Semantically, we design a hypergraph retrieval method that integrates fine-grained entity similarity and coarse-grained passage similarity via hypergraph diffusion. Finally, we employ a retrieval enhancement module, which further refines the retrieved results both semantically and structurally, to obtain the most relevant passages as context for answer generation with the LLM. Experimental results on benchmark datasets demonstrate that our approach outperforms state-of-the-art methods in QA performance, and achieves a 6× speedup in retrieval efficiency.
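The hypergraph-diffusion retrieval the abstract describes can be sketched as score propagation across an entity–passage incidence matrix. This is an illustrative simplification, not HGRAG's exact formulation: the update rule, the `alpha` damping factor, and the normalization scheme are all assumptions.

```python
import numpy as np

def hypergraph_diffusion(H, entity_scores, passage_scores, alpha=0.5, steps=2):
    """Blend fine-grained entity similarity with coarse-grained passage
    similarity by diffusing scores over the entity-passage incidence.

    H[i, j] = 1 when entity i (node) appears in passage j (hyperedge)."""
    H = np.asarray(H, dtype=float)
    # Normalize so each propagation step averages neighbour scores.
    ent_to_pass = H / np.maximum(H.sum(axis=0, keepdims=True), 1e-9)  # cols sum to 1
    pass_to_ent = H / np.maximum(H.sum(axis=1, keepdims=True), 1e-9)  # rows sum to 1

    e = np.asarray(entity_scores, dtype=float)
    p = np.asarray(passage_scores, dtype=float)
    for _ in range(steps):
        # Passages absorb the scores of their member entities, and vice versa.
        p = alpha * p + (1 - alpha) * ent_to_pass.T @ e
        e = alpha * e + (1 - alpha) * pass_to_ent @ p
    return p  # final passage relevance used to select RAG context

# Toy example: 3 entities, 2 passages; entity 1 is shared by both passages.
H = [[1, 0], [1, 1], [0, 1]]
scores = hypergraph_diffusion(H, entity_scores=[0.9, 0.5, 0.1],
                              passage_scores=[0.6, 0.4])
```

The shared entity lets the strongly matching entity 0 raise the relevance of passage 0 beyond its initial coarse-grained score.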

AAAI Conference 2025 Conference Paper

Debiased Active Learning with Variational Gradient Rectifier

  • Weiguo Chen
  • Changjian Wang
  • Shijun Li
  • Kele Xu
  • Yanru Bai
  • Wei Chen
  • Shanshan Li

The strategy of selecting "most informative" hard samples in active learning has proven a boon for alleviating the challenges of few-shot learning and costly data annotation in deep learning. However, this very preference towards hard samples engenders bias issues, thereby impeding the full potential of active learning. A growing number of works attempt to mitigate this stubborn problem, yet most neglect the quantification of bias itself and the direct rectification of dynamically evolving biases. Revisiting the bias issue, this paper presents an active learning approach based on the Variational Gradient Rectifier (VaGeRy). First, we employ variational methods to quantify bias at the level of latent state representations. Then, harnessing historical training dynamics, we introduce Uncertainty Consistency Regularization and Fluctuation Restriction, which asynchronously iterate to rectify gradient backpropagation. Extensive experiments demonstrate that our proposed methodology effectively counteracts bias phenomena in a majority of active learning scenarios.

ICLR Conference 2024 Conference Paper

At Which Training Stage Does Code Data Help LLMs Reasoning?

  • Yingwei Ma
  • Yue Liu
  • Yue Yu 0001
  • Yuanliang Zhang
  • Yu Jiang 0001
  • Changjian Wang
  • Shanshan Li 0001

Large language models (LLMs) have exhibited remarkable reasoning capabilities and become the foundation of language technologies. Inspired by the great success of code data in training LLMs, we naturally wonder at which training stage introducing code data can really help LLMs' reasoning. To this end, this paper systematically explores the impact of code data on LLMs at different training stages. Concretely, we introduce code data at the pre-training stage, the instruction-tuning stage, and both, respectively. The reasoning capability of LLMs is then comprehensively and fairly evaluated via six reasoning tasks. We critically analyze the experimental results and provide conclusions with insights. First, pre-training LLMs with a mixture of code and text can significantly enhance their general reasoning capability, almost without negative transfer to other tasks. Second, at the instruction-tuning stage, code data endows LLMs with task-specific reasoning capability. Moreover, a dynamic mixing strategy of code and text data helps LLMs learn reasoning capability step by step during training. These insights deepen the understanding of LLMs' reasoning ability for applications such as scientific question answering and legal support.
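One way to realise the dynamic code/text mixing the abstract mentions is a schedule that anneals the share of code examples per batch over training. The linear ramp, the start/end fractions, and the batch-sampling API below are illustrative assumptions, not the paper's recipe.

```python
import random

def code_fraction(step, total_steps, start=0.1, end=0.5):
    """Linearly anneal the share of code examples in each training batch."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * t

def sample_batch(code_pool, text_pool, step, total_steps, batch_size=8, rng=None):
    """Draw a batch whose code/text ratio follows the current schedule."""
    rng = rng or random.Random(0)
    n_code = round(code_fraction(step, total_steps) * batch_size)
    batch = [rng.choice(code_pool) for _ in range(n_code)]
    batch += [rng.choice(text_pool) for _ in range(batch_size - n_code)]
    rng.shuffle(batch)
    return batch

# Early batches are mostly text; late batches are half code.
early = sample_batch(["code"] * 3, ["text"] * 3, step=0, total_steps=100)
late = sample_batch(["code"] * 3, ["text"] * 3, step=100, total_steps=100)
```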

AAAI Conference 2022 Conference Paper

Exploring Relational Semantics for Inductive Knowledge Graph Completion

  • Changjian Wang
  • Xiaofei Zhou
  • Shirui Pan
  • Linhua Dong
  • Zeliang Song
  • Ying Sha

Knowledge graph completion (KGC) aims to infer missing information in incomplete knowledge graphs (KGs). Most previous works consider only the transductive scenario, where all entities already exist in the KG, and cannot work effectively in the inductive scenario, which involves emerging entities. Recently, some graph neural network-based methods have been proposed for inductive KGC that aggregate neighborhood information to capture uncertainty semantics from neighboring auxiliary triples. However, these methods ignore the more general relational semantics underlying all known triples, which can provide richer information for representing emerging entities in the inductive scenario. In this paper, we propose a novel model called CFAG, which utilizes two granularity levels of relational semantics in a coarse-grained aggregator (CG-AGG) and a fine-grained generative adversarial net (FG-GAN), for inductive KGC. The CG-AGG first generates entity representations with multiple semantics through a hypergraph neural network-based global aggregator and a graph neural network-based local aggregator, and the FG-GAN further enhances entity representations with specific semantics through conditional generative adversarial nets. Experimental results on benchmark datasets show that our model outperforms state-of-the-art models for inductive KGC.
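The core inductive idea, building a representation for an emerging entity from the relations of its auxiliary triples rather than from a pretrained entity embedding, can be sketched minimally. The mean-pooling aggregation below is an illustrative stand-in for CFAG's learned aggregators, and all names are hypothetical.

```python
import numpy as np

def embed_emerging_entity(aux_triples, relation_emb):
    """Represent an unseen entity by averaging the embeddings of the
    relations incident to it, so no entity embedding is required."""
    vecs = [relation_emb[r] for (_, r, _) in aux_triples]
    return np.mean(vecs, axis=0)

relation_emb = {"works_at": np.array([1.0, 0.0]),
                "lives_in": np.array([0.0, 1.0])}
# Auxiliary triples mentioning an entity unseen at training time.
aux = [("e_new", "works_at", "acme"), ("e_new", "lives_in", "paris")]
vec = embed_emerging_entity(aux, relation_emb)
```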

IJCAI Conference 2022 Conference Paper

Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

  • Boqing Zhu
  • Kele Xu
  • Changjian Wang
  • Zheng Qin
  • Tao Sun
  • Huaimin Wang
  • Yuxing Peng

We present an approach to learning voice-face representations from talking-face videos without any identity labels. Previous works employ cross-modal instance discrimination tasks to establish the correlation between voice and face. These methods neglect the semantic content of different videos, introducing false-negative pairs as training noise. Furthermore, positive pairs are constructed based on the natural correlation between audio clips and visual frames. However, this correlation may be weak or inaccurate in a large amount of real-world data, which introduces deviated positives into the contrastive paradigm. To address these issues, we propose cross-modal prototype contrastive learning (CMPC), which takes advantage of contrastive methods while resisting the adverse effects of false negatives and deviated positives. On one hand, CMPC learns intra-class invariance by constructing semantic-wise positives via unsupervised clustering in different modalities. On the other hand, by comparing the similarities of cross-modal instances with those of cross-modal prototypes, we dynamically recalibrate unlearnable instances' contribution to the overall loss. Experiments show that the proposed approach outperforms state-of-the-art unsupervised methods on various voice-face association evaluation protocols. Additionally, in the low-shot supervision setting, our method also improves significantly over previous instance-wise contrastive learning.
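The prototype-contrast objective can be sketched as an InfoNCE-style loss between one modality's embeddings and cluster prototypes from the other modality. This is a minimal illustration, not CMPC's full method: the clustering step is assumed to have already produced the prototypes and assignments, and the temperature value is an assumption.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def prototype_contrast_loss(queries, prototypes, assignments, tau=0.1):
    """InfoNCE-style loss pulling each sample (e.g. a voice embedding)
    toward its assigned cross-modal prototype (e.g. a face cluster centre)."""
    q = l2_normalize(np.asarray(queries, dtype=float))
    c = l2_normalize(np.asarray(prototypes, dtype=float))
    logits = q @ c.T / tau                       # similarity to every prototype
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each sample's assigned prototype.
    return -log_probs[np.arange(len(q)), assignments].mean()

# Voice embeddings score a lower loss when paired with the face prototypes
# matching their cluster assignment than with shuffled assignments.
protos = [[1.0, 0.0], [0.0, 1.0]]
voices = [[0.9, 0.1], [0.1, 0.9]]
good = prototype_contrast_loss(voices, protos, assignments=[0, 1])
bad = prototype_contrast_loss(voices, protos, assignments=[1, 0])
```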