Arrow Research search

Author name cluster

Xiang Zhao

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

12 papers
1 author row

Possible papers

12

AAAI Conference 2026 Conference Paper

Iterative Multi-Granular RAG with Contextual Hierarchical Graph

  • Yanli Hu
  • Teng Liu
  • Zhuangyi Zhou
  • Weixin Zeng
  • Zhen Tan
  • Xiang Zhao

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) with external knowledge retrieval, improving factual accuracy and knowledge coverage. However, existing RAG approaches face a fundamental trade-off when handling complex reasoning: while traditional iterative retrieval methods offer flexibility, their local perspective limits their ability to establish global knowledge connections. In contrast, structure-augmented RAG methods capture global relationships but incur significant construction costs. To bridge this gap, we propose MGranRAG, an innovative framework designed to integrate precise local retrieval with structured global reasoning. Our approach circumvents expensive semantic extraction by employing a lightweight contextual hierarchical graph, effectively combining the local adaptability of iterative retrieval with the global consistency of structured knowledge. The framework adopts a novel iterative optimization scheme: at the local level, the LLM identifies multi-granular contextual evidence, such as key sentences and phrases, within retrieved passages to refine retrieval. At the global level, these multi-granular evidence nodes are then mapped and propagated within the structured hierarchical graph, enabling the diffusion of rich contextual information at different levels to introduce global semantic constraints and reorder retrieval results. This coordination between local and global iterative processes dynamically balances retrieval accuracy and contextual coherence. Experimental results on challenging multi-hop and open-domain question answering datasets show that our proposal achieves new state-of-the-art performance in both retrieval and answer accuracy.

AAAI Conference 2026 Conference Paper

Multi-granularity Temporal Knowledge Editing over Large Language Models

  • Simiao Zhao
  • Ning Pang
  • Zhen Tan
  • Yanli Hu
  • Weidong Xiao
  • Xiang Zhao

The constantly evolving state of the world necessitates continuous revision and updating of the knowledge within Large Language Models (LLMs), driving the development of Knowledge Editing (KE) techniques. Recently, a novel paradigm of Temporal Knowledge Editing (TKE) has been proposed, emphasizing that models deployed in dynamic environments should integrate new information while retaining historical knowledge. However, we observe that current definitions and methods for TKE are insufficient, as they do not effectively capture or adapt to the fine-grained temporal dynamics inherent in real-world knowledge evolution. In this paper, we introduce the notion of multi-granularity TKE, encompassing temporal knowledge across yearly, monthly, and daily granularities, and propose a corresponding dataset, named MTKE. We argue that comprehending and retaining knowledge across different temporal granularities is crucial for LLMs to accurately reflect real-world changes. The key challenge lies in integrating new temporal knowledge at various granularities while also preserving relevant historical knowledge, thus ensuring LLMs maintain a consistent and accurate understanding over time. To achieve this, we propose a Sparse Parameter-Injected Knowledge Editing method, dubbed SPIKE, which anchors both temporal knowledge and subject positions within the model. Experiments demonstrate that our method effectively preserves historical knowledge performance while accurately incorporating dynamic temporal knowledge across multi-granularity temporal scenarios.

AAAI Conference 2026 Conference Paper

NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models

  • Feng Liang
  • Weixin Zeng
  • Runhao Zhao
  • Xiang Zhao

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, temporal reasoning, particularly under complex temporal constraints, remains a major challenge. To address this, existing approaches have explored symbolic methods, which encode temporal structure explicitly, and reflective mechanisms, which revise reasoning errors through multi-step inference. Nonetheless, symbolic approaches often underutilize the reasoning capabilities of LLMs, while reflective methods typically lack structured temporal representations, which can result in inconsistent or hallucinated reasoning. As a result, even when the correct temporal context is available, LLMs may still misinterpret or misapply time-related information, leading to incomplete or inaccurate answers. To address these limitations, in this work, we propose Neuro-Symbolic Temporal Reasoning (NeSTR), a novel framework that integrates structured symbolic representations with hybrid reflective reasoning to enhance the temporal sensitivity of LLM inference. NeSTR preserves explicit temporal relations through symbolic encoding, enforces logical consistency via verification, and corrects flawed inferences using abductive reflection. Extensive experiments on diverse temporal question answering benchmarks demonstrate that NeSTR achieves superior zero-shot performance and consistently improves temporal reasoning without any fine-tuning, showcasing the advantage of neuro-symbolic integration in enhancing temporal understanding in large language models.

AAAI Conference 2025 Conference Paper

Cross-modal Multi-task Learning for Multimedia Event Extraction

  • Jianwei Cao
  • Yanli Hu
  • Zhen Tan
  • Xiang Zhao

Multimedia event extraction aims to jointly extract event structural knowledge from multiple modalities, thus improving the comprehension and utilization of events in the growing multimedia content (e.g., multimedia news). A key challenge in multimedia event extraction is to establish cross-modal correlations during training without multimedia event annotations. Considering the complexity and cost of annotation across modalities, the multimedia event extraction task only provides parallel annotated data for evaluation. Previous works attempt to learn implicit correlations directly from unlabeled image-text pairs, but do not yield substantially better performance for event-centric tasks. To address this problem, we propose a cross-modal multi-task learning framework, X-MTL, to establish cross-modal correlations at the task level, which can simultaneously address four key tasks of multimedia event extraction: trigger detection, argument extraction, verb classification, and role classification. Specifically, to process inputs from different modalities and tasks, we utilize two separate modality-specific encoders and a modality-shared encoder to learn joint task representations, and introduce textual and visual prompt learning methods to enrich and unify task inputs. To resolve task conflict in cross-modal multi-task learning, we propose a pseudo-label-based knowledge distillation method, combined with a dynamic weight adjustment method, which effectively lifts performance beyond that of separately trained models. On the Multimedia Event Extraction benchmark M2E2, experimental results show that X-MTL surpasses the current state-of-the-art (SOTA) methods by 4.1% for multimedia event mention and 8.2% for multimedia argument role.

AAAI Conference 2025 Conference Paper

Each Fake News Is Fake in Its Own Way: An Attribution Multi-Granularity Benchmark for Multimodal Fake News Detection

  • Hao Guo
  • Zihan Ma
  • Zhi Zeng
  • Minnan Luo
  • Weixin Zeng
  • Jiuyang Tang
  • Xiang Zhao

Social platforms, while facilitating access to information, have also become saturated with fake news, resulting in negative consequences. Automatic multimodal fake news detection is therefore a worthwhile pursuit. Existing multimodal fake news datasets only provide binary labels of real or fake. However, real news is alike, while each fake news is fake in its own way. These datasets fail to reflect the mixed nature of various types of multimodal fake news. To bridge the gap, we construct AMG, an attribution multi-granularity multimodal fake news detection dataset that reveals the inherent fake patterns. Furthermore, we propose MGCA, a multi-granularity clue alignment model, to achieve multimodal fake news detection and attribution. Experimental results demonstrate that AMG is a challenging dataset, and its attribution setting opens up new avenues for future research.

AAAI Conference 2025 Conference Paper

Logic Induced High-Order Reasoning Network for Event-Event Relation Extraction

  • Peixin Huang
  • Xiang Zhao
  • Minghao Hu
  • Zhen Tan
  • Weidong Xiao

To understand a document with multiple events, event-event relation extraction (ERE) emerges as a crucial task, aiming to discern how natural events temporally or structurally associate with each other. To achieve this goal, our work addresses the problems of temporal event relation extraction (TRE) and subevent relation extraction (SRE). The latest methods for such problems have commonly built document-level event graphs for global reasoning across sentences. However, the edges between events are usually derived heuristically from external tools, which are not always reliable and may introduce noise. Moreover, they are not capable of preserving logical constraints among event relations, e.g., the coreference constraint, symmetry constraint, and conjunction constraint. These constraints guarantee coherence between different relation types, enabling the generation of a unified event evolution graph. In this work, we propose a novel method named LogicERE, which performs high-order event relation reasoning by modeling logic constraints. Specifically, different from conventional event graphs, we design a logic constraint induced graph (LCG) without any external tools. LCG involves event nodes, where the interactions among them can model the coreference constraint, and event pair nodes, where the interactions among them can retain the symmetry constraint and conjunction constraint. Then we perform high-order reasoning on LCG with a relational graph transformer to obtain enhanced event and event pair embeddings. Finally, we further incorporate logic constraint information via a joint logic learning module. Extensive experiments demonstrate the effectiveness of the proposed method, with state-of-the-art performance on benchmark datasets.

AAAI Conference 2025 Conference Paper

RDPI: A Refine Diffusion Probability Generation Method for Spatiotemporal Data Imputation

  • Zijin Liu
  • Xiang Zhao
  • You Song

Spatiotemporal data imputation plays a crucial role in various fields such as traffic flow monitoring, air quality assessment, and climate prediction. However, spatiotemporal data collected by sensors often suffer from temporal incompleteness, and the sparse and uneven distribution of sensors leads to missing data in the spatial dimension. Among existing methods, autoregressive approaches are prone to error accumulation, while simple conditional diffusion models fail to adequately capture the spatiotemporal relationships between observed and missing data. To address these issues, we propose a novel two-stage Refined Diffusion Probability Imputation (RDPI) framework based on an initial network and a conditional diffusion model. In the initial stage, deterministic imputation methods are used to generate preliminary estimates of the missing data. In the refinement stage, residuals are treated as the diffusion target, and observed values are innovatively incorporated into the forward process. This results in a conditional diffusion model better suited for spatiotemporal data imputation, bridging the gap between the preliminary estimates and the true values. Experiments on multiple datasets demonstrate that RDPI not only achieves state-of-the-art imputation performance but also significantly reduces sampling computational costs.

AAAI Conference 2020 Conference Paper

HAMNER: Headword Amplified Multi-Span Distantly Supervised Method for Domain Specific Named Entity Recognition

  • Shifeng Liu
  • Yifang Sun
  • Bing Li
  • Wei Wang
  • Xiang Zhao

To tackle Named Entity Recognition (NER) tasks, supervised methods need to obtain sufficient cleanly annotated data, which is labor-intensive and time-consuming. In contrast, distantly supervised methods acquire automatically annotated data using dictionaries to alleviate this requirement. Unfortunately, dictionaries hinder the effectiveness of distantly supervised methods for NER due to their limited coverage, especially in specific domains. In this paper, we address the limitations of dictionary usage and mention boundary detection. We generalize distant supervision by extending the dictionary with headword-based non-exact matching, and apply a function to better weight the matched entity mentions. We propose a span-level model, which classifies all possible spans and then infers the selected spans with a proposed dynamic programming algorithm. Experiments on all three benchmark datasets demonstrate that our method outperforms previous state-of-the-art distantly supervised methods.

AAAI Conference 2020 Conference Paper

Recursively Binary Modification Model for Nested Named Entity Recognition

  • Bing Li
  • Shifeng Liu
  • Yifang Sun
  • Wei Wang
  • Xiang Zhao

Recently, there has been an increasing interest in identifying named entities with nested structures. Existing models only make independent typing decisions on the entire entity span while ignoring strong modification relations between sub-entity types. In this paper, we present a novel Recursively Binary Modification model for nested named entity recognition. Our model utilizes the modification relations among sub-entity types to infer the head component on top of a Bayesian framework and uses the entity head as strong evidence to determine the type of the entity span. The process is recursive, allowing lower-level entities to help better model those on the outer level. To the best of our knowledge, our work is the first effort that uses modification relations in the nested NER task. Extensive experiments on four benchmark datasets demonstrate that our model outperforms state-of-the-art models in nested NER tasks, and delivers competitive results with state-of-the-art models in flat NER tasks, without relying on any extra annotations or NLP tools.

AAAI Conference 2019 Conference Paper

Antonym-Synonym Classification Based on New Sub-Space Embeddings

  • Muhammad Asif Ali
  • Yifang Sun
  • Xiaoling Zhou
  • Wei Wang
  • Xiang Zhao

Distinguishing antonyms from synonyms is a key challenge for many NLP applications focused on lexical-semantic relation extraction. Existing solutions relying on large-scale corpora yield low performance because of the huge contextual overlap between antonym and synonym pairs. We propose a novel approach based entirely on pre-trained embeddings. We hypothesize that pre-trained embeddings comprehend a blend of lexical-semantic information, and that we may distill the task-specific information using Distiller, a model proposed in this paper. A classifier is then trained on features constructed from the distilled sub-spaces, along with some word-level features, to distinguish antonyms from synonyms. Experimental results show that the proposed model outperforms existing research on antonym-synonym distinction in both speed and performance.

AAAI Conference 2019 Conference Paper

Jointly Extracting Multiple Triplets with Multilayer Translation Constraints

  • Zhen Tan
  • Xiang Zhao
  • Wei Wang
  • Weidong Xiao

Triplet extraction is an essential and pivotal step in automatic knowledge base construction, which captures structural information from unstructured text corpora. Conventional extraction models use a pipeline of named entity recognition and relation classification to extract entities and relations, respectively, which ignores the connection between the two tasks. Recently, several neural network-based models were proposed to tackle the problem and achieved state-of-the-art performance. However, most of them are unable to extract multiple triplets from a single sentence, which are yet commonly seen in real-life scenarios. To close the gap, we propose in this paper a joint neural extraction model for multiple triplets, namely TME, which is capable of adaptively discovering multiple triplets simultaneously in a sentence via ranking with a translation mechanism. In experiments, TME exhibits superior performance and achieves an improvement of 37.6% on F1 score over state-of-the-art competitors.

AAAI Conference 2019 Conference Paper

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

  • Dongliang He
  • Xiang Zhao
  • Jizhou Huang
  • Fu Li
  • Xiao Liu
  • Shilei Wen

The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding window over the entire video or exhaustively ranking all possible clip-sentence pairs in a presegmented video, which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate this task as a problem of sequential decision making by learning an agent which regulates the temporal grounding boundaries progressively based on its policy. Specifically, we propose a reinforcement learning based framework improved by multi-task learning and it shows steady performance gains by considering additional supervised boundary information during training. Our proposed framework achieves state-ofthe-art performance on ActivityNet’18 DenseCaption dataset (Krishna et al. 2017) and Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or less clips per video.