Arrow Research search

Author name cluster

Qi Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

156 papers
2 author rows

Possible papers

156

AAAI Conference 2026 Conference Paper

Channel-masked Asymmetric Distribution Matching for Cross-Domain Generalized Dataset Distillation

  • Qi Liu
  • Chenghao Xu
  • Jiexi Yan
  • Guangtao Lyu
  • Erkun Yang
  • Guihai Chen
  • Yanhua Yang

Dataset distillation has achieved remarkable progress as an effective approach for data compression. However, real-world data often comes from diverse domains, leading to potential mismatches between the domains of synthesized images and those of the evaluation set. Existing methods primarily assume domain alignment between them, which limits their generalization ability in the above cross-domain scenarios. In this paper, we aim to ensure that images synthesized from known domains maintain robust performance on unseen domains and propose a novel framework called Channel-masked Asymmetric Distribution Matching (CADM). During asymmetric distribution matching, domain-sensitive channels of real data are selectively masked at different layers to extract domain-invariant features that guide synthetic data optimization. To further improve synthetic data representation, we introduce a class-focused domain-agnostic regularization to capture class-relevant knowledge while ignoring domain-specific information. Experiments show that our method produces domain-robust synthetic data and substantially improves generalization performance on unseen domains.
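As a rough intuition for the channel-masking step, the sketch below marks channels whose per-domain mean activations vary most across domains as "domain-sensitive" and zeroes them out; the variance criterion and the `mask_ratio` parameter are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def domain_sensitive_mask(feats_by_domain, mask_ratio=0.25):
    """Build a binary channel mask that zeroes out the channels whose
    per-domain mean activations vary most across domains (a simple,
    hypothetical proxy for 'domain-sensitive' channels)."""
    # (num_domains, num_channels): mean activation of each channel per domain
    means = np.stack([f.mean(axis=0) for f in feats_by_domain])
    sensitivity = means.var(axis=0)  # high variance => domain-sensitive
    num_channels = means.shape[1]
    k = int(num_channels * mask_ratio)
    mask = np.ones(num_channels)
    mask[np.argsort(sensitivity)[::-1][:k]] = 0.0
    return mask
```

Multiplying real-data features by such a mask before distribution matching would retain only the low-variance (domain-invariant) channels to guide synthetic-data optimization.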

AAAI Conference 2026 Conference Paper

Conversational Learning Diagnosis via Reasoning Multi-Turn Interactive Learning

  • Fangzhou Yao
  • Sheng Chang
  • Weibo Gao
  • Qi Liu

Learning diagnosis is a critical task that monitors students' cognitive state during educational activities, with the goal of enhancing learning outcomes. With advancements in language models (LMs), many AI-driven educational studies have shifted towards conversational learning scenarios, where students engage in multi-turn interactive dialogues with tutors. However, conversational learning diagnosis remains underdeveloped, and most existing techniques acquire students' cognitive state through intuitive instructional prompts on LMs to analyze the dialogue text. This direct prompting approach lacks a solid psychological foundation and fails to ensure the reliability of the generated analytical text. In this study, we introduce ParLD, a preview-analyze-reason framework for conversational learning diagnosis, which leverages multi-agent collaboration to diagnose students' cognitive state over multiple dialogue turns. Specifically, ParLD comprises three main components: (1) Behavior Previewer, which generates a student behavior schema based on previous states and learning content; (2) State Analyzer, which analyzes the tutor-student dialogue and behavior schema to update the cognitive state; and (3) Performance Reasoner, which predicts the student's future responses and provides verifiable feedback to support ParLD's self-reflection with the Chain Reflector. They operate sequentially and iteratively during each interaction turn to diagnose the student's cognitive state. We conduct experiments to evaluate both performance prediction and tutoring support, emphasizing the effectiveness of ParLD in providing reliable and insightful learning diagnosis.

AAAI Conference 2026 Conference Paper

DMGIN: How Multimodal LLMs Enhance Large Recommendation Models for Lifelong User Post-click Behaviors

  • Zhuoxing Wei
  • Qingchen Xie
  • Qi Liu
  • Jingsong Yu

Modeling user interest based on lifelong user behavior sequences is crucial for enhancing Click-Through Rate (CTR) prediction. However, long post-click behavior sequences themselves pose severe performance issues: the sheer volume of data leads to high computational costs and inefficiencies in model training and inference. Traditional methods address this by introducing two-stage approaches, but this compromises model effectiveness due to incomplete utilization of the full sequence context. More importantly, integrating multimodal embeddings into existing large recommendation models (LRMs) presents significant challenges: these embeddings often exacerbate computational burdens and mismatch with LRM architectures. To address these issues and enhance the model's efficiency and accuracy, we introduce the Deep Multimodal Group Interest Network (DMGIN). Given the observation that user post-click behavior sequences contain a large number of repeated items with varying behaviors and timestamps, DMGIN employs multimodal LLMs (MLLMs) for grouping to reorganize complete lifelong post-click behavior sequences more effectively, with almost no additional computational overhead, as opposed to directly introducing multimodal embeddings. To mitigate the potential information loss from grouping, we implement two key strategies. First, we analyze behaviors within each group using both interest statistics and intra-group transformers to capture group traits. Second, we apply inter-group transformers to temporally ordered groups to capture the evolution of user group interests. Our extensive experiments on both industrial and public datasets confirm the effectiveness and efficiency of DMGIN. The A/B test in our LBS advertising system shows that DMGIN improves CTR by 4.7% and Revenue per Mille by 2.3%.

AAAI Conference 2026 Conference Paper

From Diagnosis to Generalization: A Cognitive Approach to Data Selection for Educational LLMs

  • Yuxiang Guo
  • Yan Zhuang
  • Qi Liu
  • Zhenya Huang
  • Xianquan Wang
  • Liyang He
  • Jiatong Li
  • Rui Li

Specializing Large Language Models for educational domains is a key frontier in creating personalized learning tools. The central challenge is not data scarcity but its abundance: efficiently selecting a curated data subset from vast corpora to enhance specialized skills and foster generalization, without degrading existing abilities. Existing data selection paradigms, relying on superficial semantic similarity or model training dynamics, often lack a principled framework to identify data that promotes true cognitive growth. Our work proposes a paradigm shift from leveraging indirect proxies of learning value, such as semantic similarity and training dynamics, towards a framework that performs a direct, cognitive-level modeling of the learner's state. We introduce CASS, a novel framework that implements this cognitive approach through a clear pipeline, moving from an initial Diagnosis to the ultimate goal of expanding the model's cognitive frontier. First, CASS diagnoses the LLM's cognitive frontier using Multidimensional Item Response Theory. Leveraging this diagnosis, it then employs Fisher Information to select a data subset situated at the LLM's cognitive frontier that offers maximum informational gain. Finally, the model is fine-tuned on this curated data using a structured, easy-to-hard curriculum to ensure effective learning. Experiments on our new multi-subject dataset show that models trained with CASS not only achieve superior accuracy in the target domain but also exhibit enhanced generalization. CASS provides a more efficient, effective, and theoretically grounded paradigm for building expert educational LLMs.

AAAI Conference 2026 Conference Paper

Generic Adversarial Attack Framework Against Graph-based Vertical Federated Learning

  • Yimin Liu
  • Peng Jiang
  • Qi Liu
  • Liehuang Zhu

Graph-based vertical federated learning (GVFL) enables multiple parties to collaboratively train and infer over aligned nodes, where each party contributes its own local embedding derived from different attributes and adjacency relations. Adversarial inputs injected by an attacker can skew the joint prediction toward its desired outcomes while diminishing the influence and contributions of benign parties. However, most existing attacks rest on pre-set assumptions, such as access to the server architecture, model queries, or in-domain auxiliary graphs. In this paper, we propose SGAC, an attack framework that enables domination of joint inference without relying on the above assumptions. SGAC learns label-indicative embeddings and class-transferable probabilities to generate a surrogate that closely mimics the server-side classification behavior by exploiting auxiliary graphs from non-training domains. SGAC then leverages saliency over node attributes and edges on the auxiliary graphs to construct a diverse set of shadow inputs resembling highly influential test instances. With this surrogate fidelity and input diversity, SGAC crafts transferable contribution-monopoly adversarial inputs that hijack GVFL incentives. Extensive experiments across diverse model architectures validate SGAC's effectiveness.

AAAI Conference 2026 Conference Paper

Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning

  • Shuochen Liu
  • Pengfei Luo
  • Chao Zhang
  • Yuhao Chen
  • Haotian Zhang
  • Qi Liu
  • Xin Kou
  • Tong Xu

Aiming to identify precise evidence sources from visual documents, visual evidence attribution for visual document retrieval-augmented generation (VD-RAG) ensures reliable and verifiable predictions from vision-language models (VLMs) in multimodal question answering. Most existing methods adopt end-to-end training to facilitate intuitive answer verification. However, they lack fine-grained supervision and progressive traceability throughout the reasoning process. In this paper, we introduce the Chain-of-Evidence (CoE) paradigm for VD-RAG. CoE unifies Chain-of-Thought (CoT) reasoning and visual evidence attribution by grounding reference elements in reasoning steps to specific regions with bounding boxes and page indexes. To enable VLMs to generate such evidence-grounded reasoning, we propose Look As You Think (LAT), a reinforcement learning framework that trains models to produce verifiable reasoning paths with consistent attribution. During training, LAT evaluates the attribution consistency of each evidence region and provides rewards only when the CoE trajectory yields correct answers, encouraging process-level self-verification. Experiments on vanilla Qwen2.5-VL-7B-Instruct with the Paper- and Wiki-VISA benchmarks show that LAT consistently improves the vanilla model in both single- and multi-image settings, yielding average gains of 8.23% in soft exact match (EM) and 47.0% in [email protected]. Meanwhile, LAT not only outperforms the supervised fine-tuning baseline, which is trained to directly produce answers with attribution, but also exhibits stronger generalization across domains.

AAAI Conference 2026 Conference Paper

Melodia: Training-Free Music Editing Guided by Attention Probing in Diffusion Models

  • Yi Yang
  • Haowen Li
  • Tianxiang Li
  • Boyu Cao
  • Xiaohan Zhang
  • Liqun Chen
  • Qi Liu

Text-to-music generation technology is progressing rapidly, creating new opportunities for musical composition and editing. However, existing music editing methods often fail to preserve the source music's temporal structure, including melody and rhythm, when altering particular attributes like instrument, genre, and mood. To address this challenge, this paper conducts an in-depth probing analysis on attention maps within AudioLDM 2, a diffusion-based model commonly used as the backbone for existing music editing methods. We reveal a key finding: cross-attention maps encompass details regarding distinct musical characteristics, and interventions on these maps frequently result in ineffective modifications. In contrast, self-attention maps are essential for preserving the temporal structure of the source music during its conversion into the target music. Building upon this understanding, we present Melodia, a training-free technique that selectively manipulates self-attention maps in particular layers during the denoising process and leverages an attention repository to store source music information, achieving accurate modification of musical characteristics while preserving the original structure without requiring textual descriptions of the source music. Additionally, we propose two novel metrics to better evaluate music editing methods. Both objective and subjective experiments demonstrate that our approach achieves superior results in terms of textual adherence and structural integrity across various datasets. This research enhances comprehension of internal mechanisms within music generation models and provides improved control for music creation.

AAAI Conference 2026 Conference Paper

QueryCraft: Transformer-Guided Query Initialization for Enhanced Human-Object Interaction Detection

  • Yuxiao Wang
  • Wolin Liang
  • Yu Lei
  • Weiying Xue
  • Nan Zhuang
  • Qi Liu

Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions in images. Although DETR-based methods have recently emerged as the mainstream framework for HOI detection, they still suffer from a key limitation: randomly initialized queries lack explicit semantics, leading to suboptimal detection performance. To address this challenge, we propose QueryCraft, a novel plug-and-play HOI detection framework that incorporates semantic priors and guided feature learning through transformer-based query initialization. Central to our approach is ACTOR (Action-aware Cross-modal TransfORmer), a cross-modal Transformer encoder that jointly attends to visual regions and textual prompts to extract action-relevant features. Rather than merely aligning modalities, ACTOR leverages language-guided attention to infer interaction semantics and produce semantically meaningful query representations. To further enhance object-level query quality, we introduce a Perceptual Distilled Query Decoder (PDQD), which distills object category awareness from a pre-trained detector to serve as object query initialization. This dual-branch query initialization enables the model to generate more interpretable and effective queries for HOI detection. Extensive experiments on the HICO-Det and V-COCO benchmarks demonstrate that our method achieves state-of-the-art performance and strong generalization.

AAAI Conference 2026 Conference Paper

TEMPLE: Incentivizing Temporal Understanding of Video Large Language Models via Progressive Pre-SFT Alignment

  • Shicheng Li
  • Lei Li
  • Kun Ouyang
  • Shuhuai Ren
  • Yuanxin Liu
  • Yuanxing Zhang
  • Fuzheng Zhang
  • Lingpeng Kong

Video Large Language Models (Video LLMs) have achieved significant success by adopting the paradigm of large-scale pre-training followed by supervised fine-tuning (SFT). However, existing approaches struggle with temporal reasoning due to weak temporal correspondence in the data and over-reliance on the next-token prediction paradigm, which collectively result in an absence of explicit temporal supervision. To address these limitations, we propose TEMPLE (TEMporal Preference Learning), a systematic framework that enhances temporal reasoning capabilities through Direct Preference Optimization (DPO). To address temporal information scarcity in data, we introduce an automated pipeline for systematically constructing temporality-intensive preference pairs comprising three steps: selecting temporally rich videos, designing video-specific perturbation strategies, and evaluating model responses on clean and perturbed inputs. Complementing this data pipeline, we provide additional supervision signals via preference learning and propose a novel Progressive Pre-SFT Alignment strategy featuring two key innovations: a curriculum learning strategy that progressively increases perturbation difficulty to maximize data efficiency, and applying preference optimization before instruction tuning to incentivize fundamental temporal alignment. Extensive experiments demonstrate that our approach consistently improves Video LLM performance across multiple benchmarks with a relatively small set of self-generated DPO data. Our findings highlight TEMPLE as a scalable and efficient complement to SFT-based methods, paving the way for developing reliable Video LLMs.

AAAI Conference 2026 Conference Paper

Themis: Automated Constraint-Aware Test Synthesis Framework for Code Reinforcement Learning

  • Shengyu Ye
  • Qi Liu
  • Hao Jiang
  • Zheng Zhang
  • Heng Yu
  • Zhenya Huang

Reinforcement learning (RL) has shown promise for enhancing code generation capabilities in large language models (LLMs), yet its effectiveness critically depends on high-quality test suites for reliable reward signals. Current approaches suffer from inadequate test case quantity and quality, leading to false positives (incorrect solutions passing verification) and slow positives (valid but suboptimal implementations), which corrupt RL training dynamics. We address these challenges through three key contributions: (1) we systematically analyze how low-quality test suites degrade Code RL performance via reward misalignment; (2) we propose Themis, an automated framework that transforms test case generation into code synthesis—first extracting problem constraints via template-guided parsing, then generating executable test generators through LLM-powered code synthesis, and finally validating tests through constraint-aware filtering; (3) we develop an error-guided test case reduction method that preserves error detection efficacy while reducing test set cardinality, thereby enhancing reinforcement learning training efficiency. Evaluated on programming competition datasets, Themis achieves a 95% error detection rate, outperforming the original test suites in most cases. When integrated into RL pipelines, models trained with Themis-generated tests demonstrate consistent 3-5% improvements across HumanEval, MBPP, and LiveCodeBench compared to the baseline, matching performance levels achieved with manually curated test suites. Our constraint-aware test synthesis framework ensures full automation while preserving semantic validity—critical for scaling RL training to complex code generation tasks. The framework's modular design also enables seamless integration with existing code data synthesis frameworks.
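As a toy illustration of treating test generation as code synthesis, the sketch below parses a bound from a problem statement and returns an executable input generator. The `1 <= n <= X` constraint format and both helper names are hypothetical, and the real framework uses LLM-powered synthesis rather than a regex.

```python
import random
import re

def parse_constraints(statement):
    """Toy template-guided parse: pull an upper bound from a
    '1 <= n <= X' clause (hypothetical statement format)."""
    m = re.search(r"1 <= n <= (\d+)", statement)
    return {"n_max": int(m.group(1)) if m else 100}

def make_generator(constraints, seed=0):
    """Return an executable test-input generator respecting the bound."""
    rng = random.Random(seed)
    def gen():
        return rng.randint(1, constraints["n_max"])
    return gen
```

Because the generator is itself code, every emitted input can be re-checked against the parsed constraints, mirroring the constraint-aware filtering step.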

AAAI Conference 2026 Conference Paper

What-Meets-Where: Unified Learning of Action and Contact Localization in Images

  • Yuxiao Wang
  • Yu Lei
  • Wolin Liang
  • Weiying Xue
  • Zhenao Wei
  • Nan Zhuang
  • Qi Liu

People control their bodies to establish contact with the environment. To comprehensively understand actions across diverse visual contexts, it is essential to simultaneously consider what action is occurring and where it is happening. Current methodologies, however, often inadequately capture this duality, typically failing to jointly model both action semantics and their spatial contextualization within scenes. To bridge this gap, we introduce a novel vision task that simultaneously predicts high-level action semantics and fine-grained body-part contact regions. Our proposed framework, PaIR-Net, comprises three key components: the Contact Prior Aware Module (CPAM) for identifying contact-relevant body parts, the Prior-Guided Concat Segmenter (PGCS) for pixel-wise contact segmentation, and the Interaction Inference Module (IIM) responsible for integrating global interaction relationships. To facilitate this task, we present PaIR (Part-aware Interaction Representation), a comprehensive dataset containing 13,979 images that encompass 654 actions, 80 object categories, and 17 body parts. Experimental evaluation demonstrates that PaIR-Net significantly outperforms baseline approaches, while ablation studies confirm the efficacy of each architectural component.

NeurIPS Conference 2025 Conference Paper

A Closed-Form Solution for Fast and Reliable Adaptive Testing

  • Yan Zhuang
  • Chenye Ke
  • Zirui Liu
  • Qi Liu
  • Yuting Ning
  • Zhenya Huang
  • Weizhe Huang
  • Qingyang Mao

Human ability estimation is essential for educational assessment, career advancement, and professional certification. Adaptive Testing systems can improve estimation efficiency by selecting fewer, targeted questions, and are widely used in exams, e.g., the GRE, GMAT, and Duolingo English Test. However, selecting an optimal subset of questions remains a challenging nested optimization problem. Existing methods rely on costly approximations or data-intensive training, making them unsuitable for today's large-scale and complex testing environments. Thus, we propose a Closed-Form solution for question subset selection in Adaptive Testing. It directly minimizes ability estimation error by reducing the ability parameter's gradient bias while maintaining Hessian stability, which enables a simple greedy algorithm for question selection. Moreover, it can quantify the impact of human behavioral perturbations on ability estimation. Extensive experiments on large-scale educational datasets demonstrate that it reduces the number of required questions by 10% compared to SOTA methods while maintaining the same estimation accuracy.
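For intuition about greedy question selection, the sketch below implements the textbook maximum-Fisher-information rule under a 2PL IRT model; the paper's closed-form criterion additionally controls the gradient bias and Hessian stability of the ability estimate, which this illustration omits.

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item with discrimination a and
    difficulty b, evaluated at ability estimate theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def greedy_select(theta_hat, a, b, k):
    """Greedily pick the k items carrying the most information
    at the current ability estimate theta_hat."""
    info = item_information(theta_hat, np.asarray(a), np.asarray(b))
    return np.argsort(info)[::-1][:k].tolist()
```

With equal discriminations this rule simply prefers items whose difficulty sits closest to the current ability estimate, which is why adaptive tests converge with fewer questions than a fixed form.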

NeurIPS Conference 2025 Conference Paper

Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models

  • Zekai Zhao
  • Qi Liu
  • Kun Zhou
  • Zihan Liu
  • Yifei Shao
  • Zhiting Hu
  • Biwei Huang

Despite their remarkable reasoning performance, eliciting the long chain-of-thought (CoT) ability of large language models (LLMs) typically requires costly reinforcement learning or supervised fine-tuning on high-quality distilled data. We investigate the internal mechanisms behind this capability and show that a small set of high-impact activations in the last few layers largely governs long-form reasoning attributes, e.g., output length and self-reflection. By simply amplifying these activations and adding "wait" tokens, the long CoT ability can be invoked without training, leading to significantly increased self-reflection rates and accuracy. We also find that the activation changes follow predictable trajectories, i.e., a sharp rise after special tokens followed by an exponential decay. Based on these insights, we introduce a general training-free activation control technique. It uses a few contrastive examples to identify the relevant activations, and then applies simple analytic functions to adjust their values at inference time to elicit long CoTs. Extensive experiments verify the effectiveness of our methods in efficiently eliciting the long CoT ability of LLMs and improving performance. Furthermore, we propose a parameter-efficient fine-tuning method that trains only the last-layer activation amplification module and a few LoRA layers, outperforming LoRA on reasoning benchmarks with far fewer parameters. Our code and data will be fully publicly released.
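The contrastive identification step can be sketched in a few lines; ranking dimensions by their mean-activation gap and applying a uniform scale factor are illustrative assumptions standing in for the analytic adjustment functions described above.

```python
import numpy as np

def find_high_impact_dims(acts_long, acts_short, top_k=8):
    """Rank activation dimensions by the gap between their mean values
    on long-CoT vs. short-CoT contrastive examples (illustrative)."""
    gap = acts_long.mean(axis=0) - acts_short.mean(axis=0)
    return np.argsort(np.abs(gap))[::-1][:top_k]

def amplify(hidden, dims, scale=2.0):
    """Scale the selected last-layer activations at inference time."""
    out = np.array(hidden, copy=True)
    out[..., dims] *= scale
    return out
```

In a real model the `amplify` step would run inside a forward hook on the final layers rather than on raw arrays.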

AAAI Conference 2025 Conference Paper

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

  • Weibo Gao
  • Qi Liu
  • Linan Yue
  • Fangzhou Yao
  • Rui Lv
  • Zheng Zhang
  • Hao Wang
  • Zhenya Huang

Personalized learning represents a promising educational strategy within intelligent educational systems, aiming to enhance learners' practice efficiency. However, the scarcity of offline practice response data (e.g., answer correctness) and potential biases in human online practice create a significant gap between offline metrics and the actual online performance of personalized learning services. To address this challenge, we introduce Agent4Edu, a novel personalized learning simulator leveraging recent advancements in human intelligence through large language models (LLMs). Agent4Edu features LLM-powered generative agents equipped with learner profile, memory, and action modules tailored to personalized learning algorithms. The learner profiles are initialized using real-world response data, capturing practice styles and cognitive factors. Inspired by psychology theory, the memory module records practice facts and high-level summaries, integrating reflection mechanisms. The action module supports various behaviors, including exercise understanding, analysis, and response generation. Each agent can interact with personalized learning algorithms, such as computerized adaptive testing, enabling a multifaceted evaluation and enhancement of customized services. Through a comprehensive assessment, we explore the strengths and weaknesses of Agent4Edu, emphasizing the consistency and discrepancies in responses between agents and human learners.

NeurIPS Conference 2025 Conference Paper

Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization

  • Jingfeng Guo
  • Jian Liu
  • Jinnan Chen
  • Shiwei Mao
  • Changrong Hu
  • Puhua Jiang
  • Junlin Yu
  • Jing Xu

We introduce Auto-Connect, a novel approach for automatic rigging that explicitly preserves skeletal connectivity through a connectivity-preserving tokenization scheme. Unlike previous methods that predict bone positions represented as two joints or first predict points before determining connectivity, our method employs special tokens to define endpoints for each joint's children and for each hierarchical layer, effectively automating connectivity relationships. This approach significantly enhances topological accuracy by integrating connectivity information directly into the prediction framework. To further guarantee high-quality topology, we implement a topology-aware reward function that quantifies topological correctness, which is then utilized in a post-training phase through reward-guided Direct Preference Optimization. Additionally, we incorporate implicit geodesic features for latent top-k bone selection, which substantially improves skinning quality. By leveraging geodesic distance information within the model's latent space, our approach intelligently determines the most influential bones for each vertex, effectively mitigating common skinning artifacts. This combination of connectivity-preserving tokenization, reward-guided fine-tuning, and geodesic-aware bone selection enables our model to consistently generate more anatomically plausible skeletal structures with superior deformation properties.

NeurIPS Conference 2025 Conference Paper

Bootstrapping Hierarchical Autoregressive Formal Reasoner with Chain-of-Proxy-Autoformalization

  • Qi Liu
  • Xinhao Zheng
  • Renqiu Xia
  • Qinxiang Cao
  • Junchi Yan

Deductive formal problem-solving (D-FPS) enables process-verified, human-aligned problem-solving by implementing deductive solving processes within formal theorem proving (FTP) environments. However, current methods fail to address the misalignment between informal and formal reasoning granularity and suffer from inefficiency due to backtracking and error propagation. Moreover, the extreme scarcity of formal problem-solution pairs further hinders progress. For the first gap, we propose HAR (Hierarchical Autoregressive Formal Reasoner), a novel reasoning pipeline. HAR decouples informal-aligned drafting from detailed proving, and formulates solution construction as autoregressive generation with per-step feedback. For the second, we propose CoPA (Chain-of-Proxy-Autoformalization), a data generation pipeline that cascades statement autoformalization, proof drafting, and proof search as a proxy autoformalization path. Experiments demonstrate significant improvements: trained on data bootstrapped by CoPA, HAR achieves superior performance on FormalMath500 (15.50% → 44.09%) and MiniF2F-Solving (21.87% → 56.58%) with a lower computational budget. Explorations reveal promising directions in formal solution pruning and informal dataset denoising.

IJCAI Conference 2025 Conference Paper

CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models

  • Yi Zhan
  • Qi Liu
  • Weibo Gao
  • Zheng Zhang
  • Tianfu Wang
  • Shuanghong Shen
  • Junyu Lu
  • Zhenya Huang

Personalized programming tutoring, such as exercise recommendation, can enhance learners' efficiency, motivation, and outcomes, which is increasingly important in modern digital education. However, the lack of sufficient and high-quality programming data, combined with the mismatch between offline evaluation and real-world learning, hinders the practical deployment of such systems. To address this challenge, many approaches attempt to simulate learner practice data, yet they often overlook the fine-grained, iterative nature of programming learning, resulting in a lack of interpretability and granularity. To fill this gap, we propose an LLM-based agent, CoderAgent, to simulate students' programming processes in a fine-grained manner without relying on real data. Specifically, we equip each human learner with an intelligent agent, the core of which lies in capturing the cognitive states of the human programming practice process. Inspired by ACT-R, a cognitive architecture framework, we design the structure of CoderAgent to align with human cognitive architecture by focusing on the mastery of programming knowledge and the application of coding ability. Recognizing the inherent patterns in multi-layered cognitive reasoning, we introduce the Programming Tree of Thought (PTOT), which breaks down the process into four steps: why, how, where, and what. This approach enables a detailed analysis of iterative problem-solving strategies. Finally, experimental evaluations on real-world datasets demonstrate that CoderAgent provides interpretable insights into learning trajectories and achieves accurate simulations, paving the way for personalized programming education.

IROS Conference 2025 Conference Paper

Deep Learning-based Proactive Hazard Prediction for Human-Robot Collaboration with Sensor Malfunctions

  • Yuliang Ma
  • Zilin Jin
  • Qi Liu
  • Ilshat Mamaev
  • Andrey Morozov 0001

Safety is a critical concern in human-robot collaboration (HRC). As collaborative robots take on increasingly complex tasks in human environments, their systems have become more sophisticated through the integration of multimodal sensors, including force-torque sensors, cameras, LiDARs, and IMUs. However, existing studies on HRC safety primarily focus on ensuring safety under normal operating conditions, overlooking scenarios where internal sensor faults occur. While anomaly detection modules can help identify sensor errors and mitigate hazards, two key challenges remain: (1) no anomaly detector is flawless, and (2) not all sensor malfunctions directly threaten human safety. Relying solely on anomaly detection can lead to missed errors or excessive false alarms. To enhance safety in real-world HRC applications, this paper introduces a deep learning-based method that proactively predicts hazards following the detection of sensory anomalies. We simulate two common types of faults—bias and noise—affecting joint sensors and monitor abnormal manipulator behaviors that could pose risks in fenceless HRC environments. A dataset of 2,400 real-world samples is collected to train the proposed hazard prediction model. The approach leverages multimodal inputs, including RGB-D images, human pose, joint states, and planned robot paths, to assess whether sensor malfunctions could lead to hazardous events. Experimental results show that the proposed method outperforms state-of-the-art models, while offering faster inference speed. Additionally, cross-scenario testing confirms its strong generalization capabilities. The code and datasets are available at: DL-based-Hazard-Prediction.

AAAI Conference 2025 Conference Paper

Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship

  • Junfeng Kang
  • Rui Li
  • Qi Liu
  • Zhenya Huang
  • Zheng Zhang
  • Yanjiang Chen
  • Linbo Zhu
  • Yu Su

Dense retrieval has emerged as the leading approach in information retrieval, aiming to find semantically relevant documents based on natural language queries. Given that a single document can be retrieved by multiple distinct queries, existing methods aim to represent a document with multiple vectors. Each vector is aligned with a different query to model the many-to-one relationship between queries and documents. However, these multiple vector-based approaches encounter challenges such as Increased Storage, Vector Collapse, and Search Efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval framework (DDR). Specifically, we use vectors to represent queries and distributions to represent documents. This approach not only captures the relationships between multiple queries corresponding to the same document but also avoids the need to use multiple vectors to represent the document. Furthermore, to ensure search efficiency for DDR, we propose a dot product-based computation method to calculate the similarity between documents represented by distributions and queries represented by vectors. This allows for seamless integration with existing approximate nearest neighbor (ANN) search algorithms for efficient search. Finally, we conduct extensive experiments on real-world datasets, which demonstrate that our method significantly outperforms traditional dense retrieval methods.

IROS Conference 2025 Conference Paper

Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making

  • Kaifeng Wang
  • Yinsong Chen
  • Qi Liu
  • Xueyuan Li
  • Xin Gao

In multi-agent safety-critical scenarios, traditional autonomous driving frameworks face significant challenges in balancing safety constraints and task performance. These frameworks struggle to quantify dynamic interaction risks in real-time and depend heavily on manual rules, resulting in low computational efficiency and conservative strategies. To address these limitations, we propose a Dynamic Residual Safe Reinforcement Learning (DRS-RL) framework grounded in a safety-enhanced networked Markov decision process. This is the first time that weak-to-strong theory has been introduced into multi-agent decision-making, enabling lightweight dynamic calibration of safety boundaries via a weak-to-strong safety correction paradigm. Based on the multi-agent dynamic conflict zone model, our framework accurately captures spatiotemporal coupling risks among heterogeneous traffic participants and surpasses the static constraints of conventional geometric rules. Moreover, a risk-aware prioritized experience replay mechanism mitigates data distribution bias by mapping risk to sampling probability. Experimental results reveal that the proposed method significantly outperforms traditional RL algorithms in safety, efficiency, and comfort. Specifically, it reduces the collision rate by up to 92.17%, while the safety model accounts for merely 27% of the main model’s parameters.

JBHI Journal 2025 Journal Article

Fusing Micro- and Macro-Scale Information to Predict Anticancer Synergistic Drug Combinations

  • Xiaowen Wang
  • Hongming Zhu
  • Qi Liu
  • Qin Liu

Drug combination therapy is highly regarded in cancer treatment. Computational methods offer a time- and cost-effective opportunity to explore the vast combination space. Although deep learning-based prediction methods lead the field, their generalization ability remains unsatisfactory. Few previous studies have the ability to finely characterize drugs and cell lines at both the micro-scale and macro-scale. Furthermore, the interaction of cross-scale information is often overlooked. These two points limit models' ability to predict the synergism of drug combinations in cell lines. To address these issues, we propose a novel anticancer synergistic drug combination prediction method termed MMFSynergy in this article. The construction of MMFSynergy involves three phases. First, MMFSynergy pretrains two micro encoders and a macro graph encoder, which can capture micro- or macro-scale information from large volumes of unlabeled data and generate generic features for drugs and proteins. Second, it represents drugs and proteins by fusing cross-scale information through a self-supervised task. Finally, it employs a Transformer Encoder-based model to predict synergy scores, taking representations of drugs in the combinations and the associated proteins of cell lines as input. We compared our method with eight advanced methods across three typical scenarios based on two public datasets. The results consistently demonstrated that the proposed method's generalization ability surpasses that of six advanced methods. We also conducted experiments, including but not limited to an ablation study and a case study, to further exhibit the effectiveness of MMFSynergy.

AAAI Conference 2025 Conference Paper

GenAL: Generative Agent for Adaptive Learning

  • Rui Lv
  • Qi Liu
  • Weibo Gao
  • Haotian Zhang
  • Junyu Lu
  • Linbo Zhu

Adaptive learning, also known as adaptive teaching, relies on learning path recommendations that sequentially suggest personalized learning items (such as lectures and exercises) to meet the unique needs of each learner. Despite the extensive research in this field, previous approaches have primarily modeled the interaction sequences between learners and items using simple indexing, leading to three issues: (1) The utilization of information from both learners and items is not sufficient. For instance, these models are unable to leverage the semantic information contained within the textual content of the items. (2) Models need to be retrained on different datasets separately, which makes it difficult to adapt to the continuously expanding item pool in online educational scenarios. (3) The existing recommendation paradigm, based on trained reinforcement learning frameworks, suffers from unstable recommendation performance on sparse learning logs. To address these challenges, we propose a generalized Generative Agent for Adaptive Learning (GenAL), which integrates educational tools with LLMs' semantic understanding to enable effective and generalizable learning path recommendations across diverse data distributions. Specifically, our framework consists of two components: the Global Thinking Agent, which updates the learner profile and reflects on recommendation outcomes based on the learner's historical learning records, and the Local Teaching Agent, which recommends items using educational prior knowledge. Leveraging the LLM's robust semantic understanding, our framework does not rely on item indexing but instead extracts relevant information from the textual content. We evaluated our approach on three real-world datasets, and the experimental results demonstrate that our GenAL not only consistently outperforms all baselines but also exhibits strong generalization ability.

AAAI Conference 2025 Conference Paper

Geometry-Aware 3D Salient Object Detection Network

  • Chen Wang
  • Liyuan Zhang
  • Le Hui
  • Qi Liu
  • Yuchao Dai

Point cloud salient object detection has attracted the attention of researchers in recent years. Since existing works do not fully utilize the geometry context of 3D objects, blurry boundaries are generated when segmenting objects with complex backgrounds. In this paper, we propose a geometry-aware 3D salient object detection network that explicitly clusters points into superpoints to enhance the geometric boundaries of objects, thereby segmenting complete objects with clear boundaries. Specifically, we first propose a simple yet effective superpoint partition module to cluster points into superpoints. In order to improve the quality of superpoints, we present a point cloud class-agnostic loss to learn discriminative point features for clustering superpoints from the object. After obtaining superpoints, we then propose a geometry enhancement module that utilizes superpoint-point attention to aggregate geometric information into point features for predicting the salient map of the object with clear boundaries. Extensive experiments show that our method achieves new state-of-the-art performance on the PCSOD dataset.

TAAS Journal 2025 Journal Article

GSFL: A Privacy-Preserving Grouping-Split Federated Learning Approach in Resource-Constrained Edge Computing Scenarios

  • Qi Liu
  • Zhilu Wang
  • Xiaokang Zhou
  • Yonghong Zhang
  • Xiaodong Liu
  • Haiyang Lin

The advancement of mobile multimedia communications, 5G, and the Internet of Things (IoT) has led to the widespread use of edge devices, including sensors, smartphones, and wearables. This has generated a large amount of distributed data, leading to new prospects for deep learning. However, this data is confined within data silos and contains sensitive information, making it difficult to process in a centralized manner, particularly under stringent data privacy regulations. Federated learning (FL) offers a solution by enabling collaborative learning while ensuring privacy. Nonetheless, data and device heterogeneity complicate FL implementation. This research presents a specialized FL algorithm for heterogeneous edge computing. It integrates a lightweight grouping strategy for homogeneous devices, a scheduling algorithm within groups, and a Split Learning (SL) approach. These contributions enhance model accuracy and training speed, alleviate the burden on resource-constrained devices, and strengthen privacy. Experimental results demonstrate that GSFL outperforms FedAvg and SplitFed by 6.53× and 1.18×, respectively. Under experimental conditions with \(\alpha=0.05\), representing a highly heterogeneous data distribution typical of extreme Non-IID scenarios, GSFL improved accuracy over FedAvg by 10.64%, over HACCS by 4.53%, and over Cluster-HSFL by 1.16%. GSFL effectively balances privacy protection and computational efficiency for real-world applications in mobile multimedia communications.

TIST Journal 2025 Journal Article

Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification

  • Xiaoyu Tao
  • Tingyue Pan
  • Mingyue Cheng
  • Yucong Luo
  • Qi Liu
  • Enhong Chen

Time series classification plays a fundamental role in a wide range of real-world applications. Recently, large language models (LLMs) have demonstrated strong generalization and reasoning capacities, but directly applying them to time series classification remains non-trivial due to the representation gap between numerical sequences and linguistic semantics. In this paper, we propose HiTime, a hierarchical LLM-based framework for multimodal time series classification that bridges structured temporal representations with semantic reasoning in a generative paradigm. Specifically, we design a hierarchical sequence feature encoding module composed of a data-specific encoder and a task-specific encoder to extract complementary temporal features. To mitigate the embedding gap between time series representations and textual semantics, we further introduce a semantic space alignment module that jointly performs coarse-grained global modeling and fine-grained cross-modal correspondence. Building upon the above representations, we employ a parameter-efficient supervised fine-tuning strategy to activate the generative classification capability of the aligned LLMs, thereby transforming conventional discriminative time series classification into a generative task. Extensive experiments on multiple benchmarks demonstrate that the proposed framework consistently outperforms state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

Improving Time Series Forecasting via Instance-aware Post-hoc Revision

  • Zhiding Liu
  • Mingyue Cheng
  • Guanhao Zhao
  • Jiqian Yang
  • Qi Liu
  • Enhong Chen

Time series forecasting plays a pivotal role in various real-world applications and has attracted significant attention in recent decades. While recent methods have achieved remarkable accuracy by incorporating advanced inductive biases and training strategies, we observe that instance-level variations remain a significant challenge. These variations—stemming from distribution shifts, missing data, and long-tail patterns—often lead to suboptimal forecasts for specific instances, even when overall performance appears strong. To address this issue, we propose a model-agnostic framework, PIR, designed to enhance forecasting performance through Post-forecasting Identification and Revision. Specifically, PIR first identifies biased forecast instances by estimating their predictive accuracy. Based on this, the framework revises the forecasts using contextual information, including covariates and historical time series, from both local and global perspectives in a post-processing fashion. Extensive experiments on real-world datasets with mainstream forecasting models demonstrate that PIR effectively mitigates instance-level errors and significantly improves forecasting reliability.

AAAI Conference 2025 Conference Paper

LLM Agents Can Be Choice-Supportive Biased Evaluators: An Empirical Study

  • Nan Zhuang
  • Boyu Cao
  • Yi Yang
  • Jing Xu
  • Mingda Xu
  • Yuxiao Wang
  • Qi Liu

With Large Language Model (LLM) agents taking on more evaluation responsibilities in decision-making, it is essential to recognize their possible biases to guarantee fair and trustworthy AI-supported decisions. This study is the first to thoroughly examine the choice-supportive bias in LLM agents, a cognitive bias that is known to impact human decision-making and evaluation. We conduct experiments across 19 open- and closed-source LLMs in up to five scenarios, employing both memory-based and evaluation-based tasks adapted and redesigned from human cognitive studies. Our findings show that LLM agents may exhibit biased attribution or evaluation that supports their initial choices, and such bias may persist even if contextual hallucination is not observable. Key findings show that bias manifestation can differ greatly depending on prompt construction and context preservation, and the bias may be mitigated in larger models. Significantly, we observe that the bias increases when the agents perceive they are in control. Our extensive study involving 284 well-educated humans shows that, despite bias, certain LLM agents can still perform better than humans in similar evaluation tasks. This research contributes to the growing area of AI psychology, and the findings underscore the importance of addressing cognitive biases in LLM Agent systems, with wide-ranging implications spanning from improving AI-assisted decision-making to advancing AI safety and ethics.

NeurIPS Conference 2025 Conference Paper

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

  • Daoguang Zan
  • Zhirong Huang
  • Wei Liu
  • Hanwu Chen
  • Shulin Xin
  • Linhao Zhang
  • Qi Liu
  • Li Aoyan

The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 high-quality instances, carefully curated by 68 expert annotators, ensuring a reliable and accurate evaluation of LLMs on the issue-resolving task. Based on human-annotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, utilizing both procedural and agent-based frameworks for issue resolving. Our experiments reveal three key findings: (1) Limited generalization across languages: While existing LLMs perform well on Python issues, their ability to generalize across other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolution rates decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: The performance of current methods significantly deteriorates when handling cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.

NeurIPS Conference 2025 Conference Paper

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

  • Ling Fu
  • Zhebin Kuang
  • Jiajun Song
  • Mingxin Huang
  • Biao Yang
  • Yuzhe Li
  • Linghao Zhu
  • Qidi Luo

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks ($4\times$ more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios ($31$ diverse scenarios), and thorough evaluation metrics, with $10,000$ human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with $1,500$ manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below $50$ ($100$ in total) and suffer from five types of limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-Liu/MultimodalOCR.

NeurIPS Conference 2025 Conference Paper

Personalized Visual Content Generation in Conversational Systems

  • Xianquan Wang
  • Zhaocheng Du
  • Huibo Xu
  • Shukang Yin
  • Yupeng Han
  • Jieming Zhu
  • Kai Zhang
  • Qi Liu

With the rapid progress of large language models (LLMs) and diffusion models, there has been growing interest in personalized content generation. However, current conversational systems often present the same recommended content to all users, falling into the dilemma of "one-size-fits-all." To break this limitation and boost user engagement, in this paper, we introduce PCG (Personalized Visual Content Generation), a unified framework for personalizing item images within conversational systems. We tackle two key bottlenecks: the depth of personalization and the fidelity of generated images. Specifically, an LLM-powered Inclinations Analyzer is adopted to capture user likes and dislikes from context to construct personalized prompts. Moreover, we design a dual-stage LoRA mechanism: Global LoRA for understanding task-specific visual style, and Local LoRA for capturing preferred visual elements from conversation history. During training, we introduce the visual content condition method to ensure LoRA learns historical visual context while maintaining fidelity to the original item images. Extensive experiments on benchmark conversational datasets, including objective metrics and GPT-based evaluations, demonstrate that our framework outperforms strong baselines, highlighting its potential to redefine personalization in visual content generation for conversational scenarios such as e-commerce and real-world recommendation.

NeurIPS Conference 2025 Conference Paper

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

  • Shi Qiu
  • Shaoyang Guo
  • Zhuo-Yang Song
  • Yunbo Sun
  • Zeyu Cai
  • Jiashen Wei
  • Tianyu Luo
  • Yixuan Yin

Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models compared to other baselines like AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy compared to human experts' 61.9%. To further enhance evaluation precision, we introduce the Expression Edit Distance (EED) Score for mathematical expression assessment, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively elicits multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.

AAAI Conference 2025 Conference Paper

Precision-Enhanced Human-Object Contact Detection via Depth-Aware Perspective Interaction and Object Texture Restoration

  • Yuxiao Wang
  • Wenpeng Neng
  • Zhenao Wei
  • Yu Lei
  • Weiying Xue
  • Nan Zhuang
  • Yanwu Xu
  • Xinyu Jiang

Human-object contact (HOT) detection is designed to accurately identify the areas where humans and objects come into contact. Current methods often fail to account for scenarios where objects block the view, resulting in inaccurate identification of contact areas. To tackle this problem, we propose a perspective interaction HOT detector called PIHOT, which utilizes a depth map generation model to provide depth information of humans and objects relative to the camera, thereby preventing false interaction detection. Furthermore, we use mask dilation and object restoration techniques to restore the texture details in covered areas, improve the boundaries between objects, and enhance the perception of humans interacting with objects. Moreover, a spatial awareness perception mechanism is designed to concentrate on the characteristic features close to the points of contact. The experimental results show that the PIHOT algorithm achieves state-of-the-art performance on three benchmark datasets for HOT detection tasks. Compared to the most recent DHOT, our method enjoys an average improvement of 13%, 27.5%, 16%, and 18.5% on SC-Acc., C-Acc., mIoU, and wIoU metrics, respectively.

ICLR Conference 2025 Conference Paper

Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach

  • Qi Liu
  • Xinhao Zheng
  • Xudong Lu
  • Qinxiang Cao
  • Junchi Yan

As a central component in formal verification, statement autoformalization has been widely studied, including recent efforts from the machine learning community, but it remains a widely recognized difficult and open problem. In this paper, we delve into two critical yet under-explored gaps: 1) absence of faithful and universal automated evaluation for autoformalization results; 2) agnosia of contextual information, inducing severe hallucination of formal definitions and theorems. To address the first issue, we propose **BEq** (_**B**idirectional **E**xtended Definitional E**q**uivalence_), an automated neuro-symbolic method to determine the equivalence between two formal statements, which is formal-grounded and well-aligned with human intuition. For the second, we propose **RAutoformalizer** (_**R**etrieval-augmented **Autoformalizer**_), augmenting statement autoformalization by _Dependency Retrieval_, retrieving potentially dependent objects from formal libraries. We parse the dependencies of libraries and propose to _structurally informalise_ formal objects by the topological order of dependencies. To evaluate OOD generalization and research-level capabilities, we build a novel benchmark, _Con-NF_, consisting of 961 informal-formal statement pairs from frontier mathematical research. Experiments validate the effectiveness of our approaches: BEq is evaluated on 200 diverse formal statement pairs with expert-annotated equivalence labels, exhibiting significantly improved accuracy ($82.50\% \mapsto 90.50\%$) and precision ($70.59\% \mapsto 100.0\%$). For dependency retrieval, a strong baseline is devised. Our RAutoformalizer substantially outperforms SOTA baselines on both the in-distribution ProofNet benchmark ($12.83\% \mapsto 18.18\%$, BEq@8) and the OOD Con-NF scenario ($4.58\% \mapsto 16.86\%$, BEq@8).

AAAI Conference 2025 Conference Paper

Semi-IIN: Semi-Supervised Intra-Inter Modal Interaction Learning Network for Multimodal Sentiment Analysis

  • Jinhao Lin
  • Yifei Wang
  • Yanwu Xu
  • Qi Liu

Despite multimodal sentiment analysis being a fertile research ground that merits further investigation, current approaches incur high annotation costs and suffer from label ambiguity, hindering the acquisition of high-quality labeled data. Furthermore, choosing the right interactions is essential because the significance of intra- or inter-modal interactions can differ among various samples. To this end, we propose Semi-IIN, a Semi-supervised Intra-inter modal Interaction learning Network for multimodal sentiment analysis. Semi-IIN integrates masked attention and gating mechanisms, enabling effective dynamic selection after independently capturing intra- and inter-modal interactive information. Combined with the self-training approach, Semi-IIN fully utilizes the knowledge learned from unlabeled data. Experimental results on two public datasets, MOSI and MOSEI, demonstrate the effectiveness of Semi-IIN, establishing a new state-of-the-art on several metrics.

NeurIPS Conference 2025 Conference Paper

SGN: Shifted Window-Based Hierarchical Variable Grouping for Multivariate Time Series Classification

  • Zenan Ying
  • Zhi Zheng
  • huijun hou
  • Tong Xu
  • Qi Liu
  • Jinke Wang
  • Wei Chen

Multivariate time series (MTS) classification has attracted increasing attention across various domains. Existing methods either decompose MTS into separate univariate series, ignoring inter-variable dependencies, or jointly model all variables, which may lead to over-smoothing and loss of semantic structure. These limitations become particularly pronounced when dealing with complex and heterogeneous variable types. To address these challenges, we propose SwinGroupNet (SGN), which explores a novel perspective for constructing variable interaction and temporal dependency. Specifically, SGN processes multi-scale time series using (1) Variable Group Embedding (VGE), which partitions variables into groups and performs independent group-wise embedding; (2) Multi-Scale Group Window Mixing (MGWM), which reconstructs variable interactions by modeling both intra-group and inter-group dependencies while extracting multi-scale temporal features; and (3) Periodic Window Shifting and Merging (PWSM), which exploits inherent periodic patterns to enable hierarchical temporal interaction and feature aggregation. Extensive experiments on diverse benchmark datasets from multiple domains demonstrate that SGN consistently achieves state-of-the-art performance, with an average improvement of 4.2% over existing methods. We release the source code at https://anonymous.4open.science/r/SGN.

AAAI Conference 2025 Conference Paper

SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning

  • Zhongjian Qiao
  • Jiafei Lyu
  • Kechen Jiao
  • Qi Liu
  • Xiu Li

The performance of offline reinforcement learning (RL) suffers from the limited size and quality of static datasets. Model-based offline RL addresses this issue by generating synthetic samples through a dynamics model to enhance overall performance. To evaluate the reliability of the generated samples, uncertainty estimation methods are often employed. However, model ensemble, the most commonly used uncertainty estimation method, is not always the best choice. In this paper, we propose a Search-based Uncertainty estimation method for Model-based Offline RL (SUMO) as an alternative. SUMO characterizes the uncertainty of synthetic samples by measuring their cross entropy against the in-distribution dataset samples, and uses an efficient search-based method for implementation. In this way, SUMO can achieve trustworthy uncertainty estimation. We integrate SUMO into several model-based offline RL algorithms including MOPO and Adapted MOReL (AMOReL), and provide theoretical analysis for them. Extensive experimental results on D4RL datasets demonstrate that SUMO can provide accurate uncertainty estimation and boost the performance of base algorithms. These indicate that SUMO could be a better uncertainty estimator for model-based offline RL when used in either reward penalty or trajectory truncation.

IJCAI Conference 2025 Conference Paper

TCDM: A Temporal Correlation-Empowered Diffusion Model for Time Series Forecasting

  • Huibo Xu
  • Likang Wu
  • Xianquan Wang
  • Zhiding Liu
  • Qi Liu

Although previous studies have applied diffusion models to time series forecasting, these efforts have struggled to preserve the intrinsic temporal correlations within the series, leading to suboptimal predictive outcomes. This failure primarily results from the introduction of independent, identically distributed (i.i.d.) noise. In the forward process, the addition of i.i.d. noise to the time series gradually diminishes these temporal correlations. The reverse process starts with i.i.d. noise and lacks priors related to temporal correlations, which can result in directional biases during sampling. From a frequency-domain perspective, noise disrupts the low-frequency-dominated structure of trend components, making it difficult for the model to learn long-term temporal dependencies. To address these limitations, we introduce a decomposition prediction framework to complement the novel Temporal Correlation-Empowered Diffusion Model. Overall, we decompose the time series into trend and residual components, predict them using a base model and a diffusion model, and then combine the results. Specifically, a frequency-domain MLP model is adopted as the base model because it does not distort the original sequence and better captures long-range temporal dependencies. The diffusion model incorporates two key modules to capture short- and mid-range temporal correlations: the Maintaining Temporal Correlation Module and the Redesigned Initial Module. Extensive experiments across multiple datasets demonstrate that the proposed method significantly outperforms related strong baselines.

AAAI Conference 2025 Conference Paper

Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories

  • Xiaohan Zhang
  • Zhenyu Sun
  • Yukui Qiu
  • Junyan Su
  • Qi Liu

Currently, 3D rendering for large-scale free camera trajectories, namely, arbitrary input camera trajectories, poses significant challenges: 1) The distribution and observation angles of the cameras are irregular, and various types of scenes are included in the free trajectories; 2) Processing the entire point cloud and all images at once for large-scale scenes requires a substantial amount of GPU memory. This paper presents a Toy-GS method for accurately rendering large-scale free camera trajectories. Specifically, we propose an adaptive spatial division approach for free trajectories to divide cameras and the sparse point cloud of the entire scene into various regions according to camera poses. Training each local Gaussian in parallel for each area enables us to concentrate on texture details and minimize GPU memory usage. Next, we use the multi-view constraint and position-aware point adaptive control (PPAC) to improve the rendering quality of texture details. In addition, our regional fusion approach combines local and global Gaussians to enhance rendering quality with an increasing number of divided areas. Extensive experiments have been carried out to confirm the effectiveness and efficiency of Toy-GS, leading to state-of-the-art results on two public large-scale datasets as well as our SCUTic dataset. Our proposal demonstrates an enhancement of 1.19 dB in PSNR and conserves 7 GB of GPU memory when compared to various benchmarks.

AAAI Conference 2025 Conference Paper

VERSE: Verification-based Self-Play for Code Instructions

  • Hao Jiang
  • Qi Liu
  • Rui Li
  • Yuze Zhao
  • Yixiao Ma
  • Shengyu Ye
  • Junyu Lu
  • Yu Su

Instruction-tuned Code Large Language Models (Code LLMs) have excelled in diverse code-related tasks, such as program synthesis, automatic program repair, and code explanation. To collect training datasets for instruction-tuning, a popular method involves having models autonomously generate instructions and corresponding responses. However, the direct generation of responses does not ensure functional correctness, a crucial requirement for generating responses to code instructions. To overcome this, we present Verification-Based Self-Play (VERSE), aiming to enhance model proficiency in generating correct responses. VERSE establishes a robust verification framework that covers various code instructions. Employing VERSE, Code LLMs engage in self-play to generate instructions and corresponding verifications. They evaluate execution results and self-consistency as verification outcomes, using them as scores to rank generated data for self-training. Experiments show that VERSE improves multiple base Code LLMs (average 7.6%) across various languages and tasks on many benchmarks, affirming its effectiveness.

IJCAI Conference 2025 Conference Paper

WDMIR: Wavelet-Driven Multimodal Intent Recognition

  • Weiyin Gong
  • Kai Zhang
  • Yanghai Zhang
  • Qi Liu
  • Xinjie Sun
  • Junyu Lu
  • Linbo Zhu

Multimodal intent recognition (MIR) seeks to accurately interpret user intentions by integrating verbal and non-verbal information across video, audio and text modalities. While existing approaches prioritize text analysis, they often overlook the rich semantic content embedded in non-verbal cues. This paper presents a novel Wavelet-Driven Multimodal Intent Recognition (WDMIR) framework that enhances intent understanding through frequency-domain analysis of non-verbal information. To be more specific, we propose: (1) a wavelet-driven fusion module that performs synchronized decomposition and integration of video-audio features in the frequency domain, enabling fine-grained analysis of temporal dynamics; (2) a cross-modal interaction mechanism that facilitates progressive feature enhancement from bimodal to trimodal integration, effectively bridging the semantic gap between verbal and non-verbal information. Extensive experiments on MIntRec demonstrate that our approach achieves state-of-the-art performance, surpassing previous methods by 1.13% on accuracy. Ablation studies further verify that the wavelet-driven fusion module significantly improves the extraction of semantic information from non-verbal sources, with a 0.41% increase in recognition accuracy when analyzing subtle emotional cues.

NeurIPS Conference 2025 Conference Paper

Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering

  • Yangfu Li
  • Hongjian Zhan
  • Tianyi Chen
  • Qi Liu
  • Yu-Jie Xiong
  • Yue Lu

Existing visual token pruning methods target prompt alignment and visual preservation with static strategies, overlooking the varying relative importance of these objectives across tasks, which leads to inconsistent performance. To address this, we derive the first closed-form error bound for visual token pruning based on the Hausdorff distance, uniformly characterizing the contributions of both objectives. Moreover, leveraging $\epsilon$-covering theory, we reveal an intrinsic trade-off between these objectives and quantify their optimal attainment levels under a fixed budget. To practically handle this trade-off, we propose Multi-Objective Balanced Covering (MoB), which reformulates visual token pruning as a bi-objective covering problem. In this framework, the attainment trade-off reduces to budget allocation via greedy radius trading. MoB offers a provable performance bound and linear scalability with respect to the number of input visual tokens, enabling adaptation to challenging pruning scenarios. Extensive experiments show that MoB preserves 96.4\% of performance for LLaVA-1.5-7B using only 11.1\% of the original visual tokens and accelerates LLaVA-Next-7B by 1.3-1.5$\times$ with negligible performance loss. Additionally, evaluations on Qwen2-VL and Video-LLaVA confirm that MoB integrates seamlessly into advanced MLLMs and diverse vision-language tasks. The code will be made available soon.

NeurIPS Conference 2024 Conference Paper

3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration

  • Liyuan Zhang
  • Le Hui
  • Qi Liu
  • Bo Li
  • Yuchao Dai

Multi-instance point cloud registration aims to estimate the pose of all instances of a model point cloud in the whole scene. Existing methods all adopt the strategy of first obtaining the global correspondence and then clustering to obtain the pose of each instance. However, due to the cluttered and occluded objects in the scene, it is difficult to obtain an accurate correspondence between the model point cloud and all instances in the scene. To this end, we propose a simple yet powerful 3D focusing-and-matching network for multi-instance point cloud registration by learning the multiple pair-wise point cloud registration. Specifically, we first present a 3D multi-object focusing module to locate the center of each object and generate object proposals. By using self-attention and cross-attention to associate the model point cloud with structurally similar objects, we can locate potential matching instances by regressing object centers. Then, we propose a 3D dual-masking instance matching module to estimate the pose between the model point cloud and each object proposal. It applies an instance mask and an overlap mask to accurately predict the pair-wise correspondence. Extensive experiments on two public benchmarks, Scan2CAD and ROBI, show that our method achieves a new state-of-the-art performance on the multi-instance point cloud registration task.

IJCAI Conference 2024 Conference Paper

A Teacher Classroom Dress Assessment Method Based on a New Assessment Dataset

  • Ming Fang
  • Qi Liu
  • Yunpeng Zhou
  • Xinning Du
  • Qiwen Liang
  • Shuhua Liu

Proper attire is a professional requirement for teachers, and teachers' dress influences students' perceptions of teacher quality. Therefore, evaluating teacher attire can better regulate and improve the teacher's dress. However, the lack of a dataset on teacher attire hinders the development of this field. For this purpose, this paper constructs a Teachers' Classroom Dress Assessment (TCDA) dataset. To our knowledge, it is the first dataset focused on teacher attire. This dataset is entirely from the classroom environment, covering 25 teacher attributes, with a total of 11879 teacher dress samples and sufficient positive and negative examples. Therefore, the TCDA dataset is a challenging evaluation dataset with characteristics such as data diversity. In order to verify the effectiveness of the dataset, this paper systematically explores a new perspective on human attribute information and proposes for the first time a Teachers' Dress Assessment Method (TDAM), aiming to use predicted teacher attributes to score the overall attire of each teacher, thereby promoting the development of the teacher's classroom teaching field. The experimental results demonstrate the rationality of the TCDA dataset and the effectiveness of the TDAM method. The dataset and code can be openly obtained at https://github.com/MingZier/TCDA-dataset.

AAAI Conference 2024 Conference Paper

AT4CTR: Auxiliary Match Tasks for Enhancing Click-Through Rate Prediction

  • Qi Liu
  • Xuyang Hou
  • Defu Lian
  • Zhe Wang
  • Haoran Jin
  • Jia Cheng
  • Jun Lei

Click-through rate (CTR) prediction is a vital task in industrial recommendation systems. Most existing methods focus on the network architecture design of the CTR model for better accuracy and suffer from the data sparsity problem. Especially in industrial recommendation systems, the widely applied negative sample down-sampling technique, adopted due to resource limitations, worsens the problem, resulting in a decline in performance. In this paper, we propose Auxiliary Match Tasks for enhancing Click-Through Rate (AT4CTR) prediction accuracy by alleviating the data sparsity problem. Specifically, we design two match tasks inspired by collaborative filtering to enhance the relevance modeling between user and item. As the "click" action is a strong signal that directly indicates the user's preference towards the item, we make the first match task aim at pulling closer the representations of the user and the item for positive samples. Since the user's past click behaviors can also be treated as a representation of the user, we apply next item prediction as the second match task. For both match tasks, we choose InfoNCE as the loss function. The two match tasks provide meaningful training signals that speed up the model's convergence and alleviate data sparsity. We conduct extensive experiments on one public dataset and one large-scale industrial recommendation dataset. The results demonstrate the effectiveness of the proposed auxiliary match tasks. AT4CTR has been deployed in a real industrial advertising system and has gained remarkable revenue.

NeurIPS Conference 2024 Conference Paper

Collaborative Cognitive Diagnosis with Disentangled Representation Learning for Learner Modeling

  • Weibo Gao
  • Qi Liu
  • Linan Yue
  • Fangzhou Yao
  • Hao Wang
  • Yin Gu
  • Zheng Zhang

Learners sharing similar implicit cognitive states often display comparable observable problem-solving performances. Leveraging collaborative connections among such similar learners proves valuable in comprehending human learning. Motivated by the success of collaborative modeling in various domains, such as recommender systems, we aim to investigate how collaborative signals among learners contribute to the diagnosis of human cognitive states (i.e., knowledge proficiency) in the context of intelligent education. The primary challenges lie in identifying implicit collaborative connections and disentangling the entangled cognitive factors of learners for improved explainability and controllability in learner Cognitive Diagnosis (CD). However, there has been no work on CD capable of simultaneously modeling collaborative and disentangled cognitive states. To address this gap, we present Coral, a $\underline{Co}$llabo$\underline{ra}$tive cognitive diagnosis model with disentang$\underline{l}$ed representation learning. Specifically, Coral first introduces a disentangled state encoder to achieve the initial disentanglement of learners' states. Subsequently, a meticulously designed collaborative representation learning procedure captures collaborative signals. It dynamically constructs a collaborative graph of learners by iteratively searching for optimal neighbors in a context-aware manner. Using the constructed graph, collaborative information is extracted through node representation learning. Finally, a decoding process aligns the initial cognitive states and collaborative states, achieving co-disentanglement with practice performance reconstructions. Extensive experiments demonstrate the superior performance of Coral, showcasing significant improvements over state-of-the-art methods across several real-world datasets. Our code is available at https://github.com/bigdata-ustc/Coral.

NeurIPS Conference 2024 Conference Paper

Computerized Adaptive Testing via Collaborative Ranking

  • Zirui Liu
  • Yan Zhuang
  • Qi Liu
  • Jiatong Li
  • Yuren Zhang
  • Zhenya Huang
  • Jinze Wu
  • Shijin Wang

With the deep integration of machine learning and intelligent education, Computerized Adaptive Testing (CAT) has received more and more research attention. Compared to traditional paper-and-pencil tests, CAT can deliver both personalized and interactive assessments by automatically adjusting testing questions according to the performance of students during the test process. Therefore, CAT has been recognized as an efficient testing methodology capable of accurately estimating a student's ability with a minimal number of questions, leading to its widespread adoption in mainstream selective exams such as the GMAT and GRE. However, just improving the accuracy of ability estimation is far from satisfactory in real-world scenarios, since an accurate ranking of students is usually more important (e.g., in high-stakes exams). Considering the shortage of existing CAT solutions in student ranking, this paper emphasizes the importance of aligning test outcomes (student ranks) with the true underlying abilities of students. Along this line, different from the conventional independent testing paradigm among students, we propose a novel collaborative framework, Collaborative Computerized Adaptive Testing (CCAT), that leverages inter-student information to enhance student ranking. By using collaborative students as anchors to assist in ranking test-takers, CCAT can give both theoretical guarantees and experimental validation for ensuring ranking consistency.

AAAI Conference 2024 Conference Paper

CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework

  • Rui Li
  • Liyang He
  • Qi Liu
  • Yuze Zhao
  • Zheng Zhang
  • Zhenya Huang
  • Yu Su
  • Shijin Wang

Multilingual code retrieval aims to find code snippets relevant to a user's query from a multilingual codebase, which plays a crucial role in software development and expands their application scenarios compared to classical monolingual code retrieval. Despite the performance improvements achieved by previous studies, two crucial problems are overlooked in the multilingual scenario. First, certain programming languages face data scarcity in specific domains, resulting in limited representation capabilities within those domains. Second, different programming languages can be used interchangeably within the same domain, making it challenging for multilingual models to accurately identify the intended programming language of a user's query. To address these issues, we propose the CommONalities and SpecIalties Driven Multilingual CodE Retrieval Framework (CONSIDER), which includes two modules. The first module enhances the representation of various programming languages by modeling pairwise and global commonalities among them. The second module introduces a novel contrastive learning negative sampling algorithm that leverages language confusion to automatically extract specific language features. Through our experiments, we confirm the significant benefits of our model in real-world multilingual code retrieval scenarios in various aspects. Furthermore, an evaluation demonstrates the effectiveness of our proposed CONSIDER framework in monolingual scenarios as well. Our source code is available at https://github.com/smsquirrel/consider.

NeurIPS Conference 2024 Conference Paper

Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle

  • Shangzi Xue
  • Zhenya Huang
  • Jiayu Liu
  • Xin Lin
  • Yuting Ning
  • Binbin Jin
  • Xin Li
  • Qi Liu

In this paper, we introduce DeAR (Decompose-Analyze-Rethink), a framework that iteratively builds a reasoning tree to tackle intricate problems within a single large language model (LLM). Unlike approaches that extend or search for rationales, DeAR is featured by 1) adopting a tree-based question decomposition manner to plan the organization of rationales, which mimics the logical planning inherent in human cognition; 2) globally updating the rationales at each reasoning step through natural language feedback. Specifically, the Decompose stage decomposes the question into simpler sub-questions, storing them as new nodes; the Analyze stage generates and self-checks rationales for sub-questions at each node level; and the Rethink stage updates parent-node rationales based on feedback from their child nodes. By generating and updating the reasoning process from a more global perspective, DeAR constructs more adaptive and accurate logical structures for complex problems, facilitating timely error correction compared to rationale-extension and search-based approaches such as Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT). We conduct extensive experiments on three reasoning benchmarks, including ScienceQA, StrategyQA, and GSM8K, which cover a variety of reasoning tasks, demonstrating that our approach significantly reduces logical errors and enhances performance across various LLMs. Furthermore, we validate that DeAR is an efficient method that achieves a superior trade-off between accuracy and reasoning time compared to ToT and GoT.

NeurIPS Conference 2024 Conference Paper

DeltaDock: A Unified Framework for Accurate, Efficient, and Physically Reliable Molecular Docking

  • Jiaxian Yan
  • Zaixi Zhang
  • Jintao Zhu
  • Kai Zhang
  • Jianfeng Pei
  • Qi Liu

Molecular docking, a technique for predicting ligand binding poses, is crucial in structure-based drug design for understanding protein-ligand interactions. Recent advancements in docking methods, particularly those leveraging geometric deep learning (GDL), have demonstrated significant efficiency and accuracy advantages over traditional sampling methods. Despite these advancements, current methods are often tailored for specific docking settings, and limitations such as the neglect of protein side-chain structures, difficulties in handling large binding pockets, and challenges in predicting physically valid structures exist. To accommodate various docking settings and achieve accurate, efficient, and physically reliable docking, we propose a novel two-stage docking framework, DeltaDock, consisting of pocket prediction and site-specific docking. We innovatively reframe the pocket prediction task as a pocket-ligand alignment problem rather than direct prediction in the first stage. Then we follow a bi-level coarse-to-fine iterative refinement process to perform site-specific docking. Comprehensive experiments demonstrate the superior performance of DeltaDock. Notably, in the blind docking setting, DeltaDock achieves a 31\% relative improvement in docking success rate compared with the previous state-of-the-art GDL model DiffDock. With the consideration of physical validity, this improvement increases to about 300\%.

NeurIPS Conference 2024 Conference Paper

FlexSBDD: Structure-Based Drug Design with Flexible Protein Modeling

  • Zaixi Zhang
  • Mengdi Wang
  • Qi Liu

Structure-based drug design (SBDD), which aims to generate 3D ligand molecules binding to target proteins, is a fundamental task in drug discovery. Existing SBDD methods typically treat proteins as rigid and neglect protein structural change when binding with ligand molecules, leading to a significant gap with real-world scenarios and inferior generation qualities (e.g., many steric clashes). To bridge the gap, we propose FlexSBDD, a deep generative model capable of accurately modeling the flexible protein-ligand complex structure for ligand molecule generation. FlexSBDD adopts an efficient flow matching framework and leverages an E(3)-equivariant network with scalar-vector dual representation to model dynamic structural changes. Moreover, novel data augmentation schemes based on structure relaxation/sidechain repacking are adopted to boost performance. Extensive experiments demonstrate that FlexSBDD achieves state-of-the-art performance in generating high-affinity molecules and effectively modeling the protein's conformation change to increase favorable protein-ligand interactions (e.g., hydrogen bonds) and decrease steric clashes.

NeurIPS Conference 2024 Conference Paper

Generalized Protein Pocket Generation with Prior-Informed Flow Matching

  • Zaixi Zhang
  • Marinka Zitnik
  • Qi Liu

Designing ligand-binding proteins, such as enzymes and biosensors, is essential in bioengineering and protein biology. One critical step in this process involves designing protein pockets, the protein interface binding with the ligand. Current approaches to pocket generation often suffer from time-intensive physical computations or template-based methods, as well as compromised generation quality due to the overlooking of domain knowledge. To tackle these challenges, we propose PocketFlow, a generative model that incorporates protein-ligand interaction priors based on flow matching. During training, PocketFlow learns to model key types of protein-ligand interactions, such as hydrogen bonds. In the sampling, PocketFlow leverages multi-granularity guidance (overall binding affinity and interaction geometry constraints) to facilitate generating high-affinity and valid pockets. Extensive experiments show that PocketFlow outperforms baselines on multiple benchmarks, e.g., achieving an average improvement of 1.29 in Vina Score and 0.05 in scRMSD. Moreover, modeling interactions makes PocketFlow a generalized generative model across multiple ligand modalities, including small molecules, peptides, and RNA.

IJCAI Conference 2024 Conference Paper

Making LLMs as Fine-Grained Relation Extraction Data Augmentor

  • Yifan Zheng
  • Wenjun Ke
  • Qi Liu
  • Yuting Yang
  • Ruizhuo Zhao
  • Dacheng Feng
  • Jianwei Zhang
  • Zhi Fang

Relation Extraction (RE) identifies relations between entities in text, typically relying on supervised models that demand abundant high-quality data. Various approaches, including Data Augmentation (DA), have been proposed as promising solutions for addressing low-resource challenges in RE. However, existing DA methods in RE often struggle to ensure consistency and contextual diversity in generated data due to the fine-grained nature of RE. Inspired by the extensive generative capabilities of large language models (LLMs), we introduce a novel framework named ConsistRE, aiming to maintain context consistency in RE. ConsistRE initiates by collecting a substantial corpus from external resources and employing statistical algorithms and semantics to identify keyword hints closely related to relation instances. These keyword hints are subsequently integrated as contextual constraints in sentence generation, ensuring the preservation of relation dependence and diversity with LLMs. Additionally, we implement syntactic dependency selection to enhance the syntactic structure of the generated sentences. Experimental results from the evaluation of SemEval, TACRED, and TACREV datasets unequivocally demonstrate that ConsistRE outperforms other baselines in F1 values by 1.76%, 3.92%, and 2.53%, respectively, particularly when operating under low-resource experimental conditions.

TIST Journal 2024 Journal Article

Model-Agnostic Adaptive Testing for Intelligent Education Systems via Meta-learned Gradient Embeddings

  • Haoyang Bi
  • Qi Liu
  • Han Wu
  • Weidong He
  • Zhenya Huang
  • Yu Yin
  • Haiping Ma
  • Yu Su

The field of education has undergone a significant revolution with the advent of intelligent systems and technology, which aim to personalize the learning experience, catering to the unique needs and abilities of individual learners. In this pursuit, a fundamental challenge is designing proper tests for assessing students' cognitive status on knowledge and skills accurately and efficiently. One promising approach, referred to as Computerized Adaptive Testing (CAT), is to administer computer-automated tests that alternately select the next item for each examinee and estimate their cognitive states given their responses to the selected items. Nevertheless, existing CAT systems suffer from inflexibility in item selection and ineffectiveness in cognitive state estimation. In this article, we propose a Model-Agnostic adaptive testing framework via Meta-learned Gradient Embeddings, MAMGE for short, improving both item selection and cognitive state estimation simultaneously. For item selection, we design a Gradient Embedding-based Item Selector (GEIS) which incorporates the concept of gradient embeddings to represent items and selects the best ones that are both informative and representative. For cognitive state estimation, we propose a Meta-learned Cognitive State Estimator (MCSE) to automatically control the estimation process by learning to learn a proper initialization and dynamically inferred updates. Both MCSE and GEIS are inherently model-agnostic, and the two modules have an ingenious connection via meta-learned gradient embeddings. Finally, extensive experiments evaluate the effectiveness and flexibility of MAMGE.

NeurIPS Conference 2024 Conference Paper

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

  • Jiatong Li
  • Renjun Hu
  • Kunzhe Huang
  • Yan Zhuang
  • Qi Liu
  • Mengxiao Zhu
  • Xing Shi
  • Wei Lin

Expert-designed close-ended benchmarks are indispensable in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations. These perturbations employ human-like restatement techniques to generate on-the-fly test samples from static benchmarks, meticulously retaining knowledge-critical content while altering irrelevant details. Our toolkit further includes a suite of response consistency analyses that compare performance on raw vs. perturbed test sets to precisely assess LLMs' genuine knowledge capacity. Six representative LLMs are re-evaluated using PertEval. Results reveal significantly inflated performance of the LLMs on raw benchmarks, including an absolute 25.8% overestimation for GPT-4. Additionally, through a nuanced response pattern analysis, we discover that PertEval retains LLMs' uncertainty to specious knowledge, and reveals their potential rote memorization of correct options, which leads to overestimated performance. We also find that the detailed response consistency analyses by PertEval could illuminate various weaknesses in existing LLMs' knowledge mastery and guide their refinement. Our findings provide insights for advancing more robust and genuinely knowledgeable LLMs. Our code is available at https://github.com/aigc-apps/PertEval.

JMLR Journal 2024 Journal Article

Pygmtools: A Python Graph Matching Toolkit

  • Runzhong Wang
  • Ziao Guo
  • Wenzheng Pan
  • Jiale Ma
  • Yikai Zhang
  • Nan Yang
  • Qi Liu
  • Longxuan Wei

Graph matching aims to find node-to-node matching among multiple graphs, which is a fundamental yet challenging problem. To facilitate graph matching in scientific research and industrial applications, pygmtools is released, which is a Python graph matching toolkit that implements a comprehensive collection of two-graph matching and multi-graph matching solvers, covering both learning-free solvers and learning-based neural graph matching solvers. Our implementation supports numerical backends including NumPy, PyTorch, Jittor, and Paddle, runs on Windows, macOS and Linux, and is friendly to install and configure. Comprehensive documentation covering a beginner's guide, API reference and examples is available online. pygmtools is open-sourced under the Mulan PSL v2 license.

AAAI Conference 2024 Conference Paper

R3CD: Scene Graph to Image Generation with Relation-Aware Compositional Contrastive Control Diffusion

  • Jinxiu Liu
  • Qi Liu

Image generation tasks have achieved remarkable performance using large-scale diffusion models. However, these models are limited in capturing the abstract relations (viz., interactions excluding positional relations) among multiple entities of complex scene graphs. Two main problems exist: 1) failure to depict more concise and accurate interactions via abstract relations; 2) failure to generate complete entities. To address these, we propose a novel Relation-aware Compositional Contrastive Control Diffusion method, dubbed R3CD, that leverages large-scale diffusion models to learn abstract interactions from scene graphs. Herein, a scene graph transformer based on node and edge encoding is first designed to perceive both local and global information from input scene graphs, whose embeddings are initialized by a T5 model. Then a joint contrastive loss based on attention maps and denoising steps is developed to control the diffusion model to understand and further generate images, whose spatial structures and interaction features are consistent with the prior relations. Extensive experiments are conducted on two datasets, Visual Genome and COCO-Stuff, and demonstrate that the proposal outperforms existing models in both quantitative and qualitative metrics, generating more realistic and diverse images according to different scene graph specifications.

NeurIPS Conference 2024 Conference Paper

SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models

  • Jiayu Liu
  • Zhenya Huang
  • Tong Xiao
  • Jing Sha
  • Jinze Wu
  • Qi Liu
  • Shijin Wang
  • Enhong Chen

Large language models (LLMs) are considered a crucial technology for advancing intelligent education since they exhibit the potential for an in-depth understanding of teaching scenarios and providing students with personalized guidance. Nonetheless, current LLM-based applications in personalized teaching predominantly follow a "Question-Answering" paradigm, where students are passively provided with answers and explanations. In this paper, we propose SocraticLM, which achieves a Socratic "Thought-Provoking" teaching paradigm that fulfills the role of a real classroom teacher in actively engaging students in the thought process required for genuine problem-solving mastery. To build SocraticLM, we first propose a novel "Dean-Teacher-Student" multi-agent pipeline to construct a new dataset, SocraTeach, which contains $35$K meticulously crafted Socratic-style multi-round (equivalent to $208$K single-round) teaching dialogues grounded in fundamental mathematical problems. Our dataset simulates authentic teaching scenarios, interacting with six representative types of simulated students with different cognitive states, and strengthening four crucial teaching abilities. SocraticLM is then fine-tuned on SocraTeach with three strategies balancing its teaching and reasoning abilities. Moreover, we contribute a comprehensive evaluation system encompassing five pedagogical dimensions for assessing the teaching quality of LLMs. Extensive experiments verify that SocraticLM achieves significant improvements in teaching performance, outperforming GPT-4 by more than 12\%. Our dataset and code are available at https://github.com/Ljyustc/SocraticLM.

NeurIPS Conference 2024 Conference Paper

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

  • Weichao Zhao
  • Hao Feng
  • Qi Liu
  • Jingqun Tang
  • Shu Wei
  • Binghong Wu
  • Lei Liao
  • Yongjie Ye

Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks to work in harmony, as they can effectively leverage the needed clues from the corresponding source perception embeddings. Furthermore, to better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs. Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. The superior performance further confirms the feasibility of using LLMs for understanding visual tables when all concepts work in synergy. The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. The source code and model have also been released at https://github.com/zhaowc-ustc/TabPedia.

NeurIPS Conference 2024 Conference Paper

Towards Accurate and Fair Cognitive Diagnosis via Monotonic Data Augmentation

  • Zheng Zhang
  • Wei Song
  • Qi Liu
  • Qingyang Mao
  • Yiyan Wang
  • Weibo Gao
  • Zhenya Huang
  • Shijin Wang

Intelligent education stands as a prominent application of machine learning. Within this domain, cognitive diagnosis (CD) is a key research focus that aims to diagnose students' proficiency levels in specific knowledge concepts. As a crucial task within the field of education, cognitive diagnosis encompasses two fundamental requirements: accuracy and fairness. Existing studies have achieved significant success by primarily utilizing observed historical logs of student-exercise interactions. However, real-world scenarios often present a challenge, where a substantial number of students engage with a limited number of exercises. This data sparsity issue can lead to both inaccurate and unfair diagnoses. To this end, we introduce a monotonic data augmentation framework, CMCD, to tackle the data sparsity issue and thereby achieve accurate and fair CD results. Specifically, CMCD integrates the monotonicity assumption, a fundamental educational principle in CD, to establish two constraints for data augmentation. These constraints are general and can be applied to the majority of CD backbones. Furthermore, we provide theoretical analysis to guarantee the accuracy and convergence speed of CMCD. Finally, extensive experiments on real-world datasets showcase the efficacy of our framework in addressing the data sparsity issue with accurate and fair CD results.

AAAI Conference 2024 Conference Paper

ViTree: Single-Path Neural Tree for Step-Wise Interpretable Fine-Grained Visual Categorization

  • Danning Lao
  • Qi Liu
  • Jiazi Bu
  • Junchi Yan
  • Wei Shen

As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to explain the decision-making process, which can be indirect and lack intrinsic illustration. In this research, we introduce ViTree, a novel approach for fine-grained visual categorization that combines the popular vision transformer as a feature extraction backbone with neural decision trees. By traversing the tree paths, ViTree effectively selects patches from transformer-processed features to highlight informative local regions, thereby refining representations in a step-wise manner. Unlike previous tree-based models that rely on soft distributions or ensembles of paths, ViTree selects a single tree path, offering a clearer and simpler decision-making process. This patch and path selectivity enhances model interpretability of ViTree, enabling better insights into the model's inner workings. Remarkably, extensive experimentation validates that this streamlined approach surpasses various strong competitors and achieves state-of-the-art performance while maintaining exceptional interpretability which is proved by multi-perspective methods. Code can be found at https://github.com/SJTU-DeepVisionLab/ViTree.

AAAI Conference 2024 Conference Paper

Zero-1-to-3: Domain-Level Zero-Shot Cognitive Diagnosis via One Batch of Early-Bird Students towards Three Diagnostic Objectives

  • Weibo Gao
  • Qi Liu
  • Hao Wang
  • Linan Yue
  • Haoyang Bi
  • Yin Gu
  • Fangzhou Yao
  • Zheng Zhang

Cognitive diagnosis seeks to estimate the cognitive states of students by exploring their logged practice quiz data. It plays a pivotal role in personalized learning guidance within intelligent education systems. In this paper, we focus on an important, practical, yet often underexplored task: domain-level zero-shot cognitive diagnosis (DZCD), which arises due to the absence of student practice logs in newly launched domains. Recent cross-domain diagnostic models have been demonstrated to be a promising strategy for DZCD. These methods primarily focus on how to transfer student states across domains. However, they might inadvertently incorporate non-transferable information into student representations, thereby limiting the efficacy of knowledge transfer. To tackle this, we propose Zero-1-to-3, a domain-level zero-shot cognitive diagnosis framework via one batch of early-bird students towards three diagnostic objectives. Our approach initiates with pre-training a diagnosis model with dual regularizers, which decouples student states into domain-shared and domain-specific parts. The shared cognitive signals can be transferred to the target domain, enriching the cognitive priors for the new domain, which ensures the cognitive state propagation objective. Subsequently, we devise a strategy to generate simulated practice logs for cold-start students by analyzing the behavioral patterns of early-bird students, fulfilling the domain-adaptation goal. Consequently, we refine the cognitive states of cold-start students as diagnostic outcomes via virtual data, aligning with the diagnosis-oriented goal. Finally, extensive experiments on six real-world datasets highlight the efficacy of our model for DZCD and its practical application in question recommendation. The code is publicly available at https://github.com/bigdata-ustc/Zero-1-to-3.

AAAI Conference 2024 Conference Paper

π-Light: Programmatic Interpretable Reinforcement Learning for Resource-Limited Traffic Signal Control

  • Yin Gu
  • Kai Zhang
  • Qi Liu
  • Weibo Gao
  • Longfei Li
  • Jun Zhou

The recent advancements in Deep Reinforcement Learning (DRL) have significantly enhanced the performance of adaptive Traffic Signal Control (TSC). However, DRL policies are typically represented by neural networks, which are over-parameterized black-box models. As a result, the learned policies often lack interpretability and cannot be deployed directly on real-world edge hardware due to resource constraints. In addition, DRL methods often exhibit limited generalization performance, struggling to generalize the learned policy to other geographical regions. These factors limit the practical application of learning-based approaches. To address these issues, we suggest the use of an inherently interpretable program for representing the control policy. We present a new approach, Programmatic Interpretable reinforcement learning for traffic signal control (π-Light), designed to autonomously discover non-differentiable programs. Specifically, we define a Domain Specific Language (DSL) and transformation rules for constructing programs, and utilize Monte Carlo Tree Search (MCTS) to find the optimal program in a discrete space. Extensive experiments demonstrate that our method consistently outperforms baseline approaches. Moreover, π-Light exhibits superior generalization capabilities compared to DRL, enabling training and evaluation across intersections from different cities. Finally, we analyze how the learned program policies can be directly deployed on edge devices with extremely limited resources.

NeurIPS Conference 2023 Conference Paper

A Bounded Ability Estimation for Computerized Adaptive Testing

  • Yan Zhuang
  • Qi Liu
  • Guanhao Zhao
  • Zhenya Huang
  • Weizhe Huang
  • Zachary Pardos
  • Enhong Chen
  • Jinze Wu

Computerized adaptive testing (CAT), as a tool that can efficiently measure a student's ability, has been widely used in various standardized tests (e.g., GMAT and GRE). The adaptivity of CAT refers to the selection of the most informative questions for each student, reducing test length. Existing CAT methods do not explicitly target ability estimation accuracy since there is no ground truth for a student's true ability; therefore, these methods cannot be guaranteed to make the estimate converge to the true ability given such limited responses. In this paper, we analyze the statistical properties of estimation and find a theoretical approximation of the true ability: the ability estimated from full responses to the question bank. Based on this, a Bounded Ability Estimation framework for CAT (BECAT) is proposed in a data-summary manner, which selects a question subset that closely matches the gradient of the full responses. Thus, we develop an expected gradient difference approximation to design a simple greedy selection algorithm, and show rigorous theoretical and error upper-bound guarantees for its ability estimate. Experiments on both real-world and synthetic datasets show that it can reach the same estimation accuracy using 15% fewer questions on average, significantly reducing test length.
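The "select a subset whose gradient matches the full responses" step can be illustrated with a toy greedy sketch (all names hypothetical; BECAT's actual algorithm works with an expected gradient difference approximation and carries error-bound guarantees, whereas this sketch just matches mean per-question gradient vectors in squared Euclidean distance):

```python
from statistics import fmean

def greedy_gradient_match(grads, k):
    """Greedily pick k questions whose mean gradient vector best
    approximates the mean gradient over the full question bank.

    grads: list of per-question gradient vectors (lists of floats).
    Returns the indices of the chosen questions, in selection order.
    """
    full_mean = [fmean(col) for col in zip(*grads)]
    chosen, remaining = [], list(range(len(grads)))
    for _ in range(k):
        def error_if_added(idx):
            # Squared distance between the candidate subset's mean
            # gradient and the full-bank mean gradient.
            pick = chosen + [idx]
            mean = [fmean(col) for col in zip(*(grads[j] for j in pick))]
            return sum((a - b) ** 2 for a, b in zip(mean, full_mean))
        best = min(remaining, key=error_if_added)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With one-dimensional "gradients" `[[1.0], [3.0], [2.0]]` the full-bank mean is 2.0, so a single-question budget picks index 2, whose gradient matches the mean exactly.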

IJCAI Conference 2023 Conference Paper

Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization

  • Jun Yu
  • Yingshuai Zheng
  • Shulan Ruan
  • Qi Liu
  • Zhiyuan Cheng
  • Jinze Wu

The key to video action detection lies in understanding the interaction between persons and background objects in a video. Current methods usually employ object detectors to extract objects directly or use grid features to represent objects in the environment, which underestimates the great potential of multi-scale context information (e.g., objects and scenes of different sizes). How to exactly represent the multi-scale context and make full use of it still remains an unresolved challenge for spatial-temporal action localization. In this paper, we propose a novel Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network (AMCRNet) that extracts multi-scale context through multiple pooling layers of different sizes. Specifically, we develop an Interactive Relation Extraction module to model the higher-order relation between the target person and the context (e.g., other persons and objects). Along this line, we further propose a History Feature Bank and Interaction method to achieve better performance by modeling such relations across consecutive video clips. Extensive experimental results on AVA2.2 and UCF101-24 demonstrate the superiority and rationality of our proposed AMCRNet.

NeurIPS Conference 2023 Conference Paper

Adaptive Normalization for Non-stationary Time Series Forecasting: A Temporal Slice Perspective

  • Zhiding Liu
  • Mingyue Cheng
  • Zhi Li
  • Zhenya Huang
  • Qi Liu
  • Yanhu Xie
  • Enhong Chen

Deep learning models have progressively advanced time series forecasting due to their powerful capacity for capturing sequence dependence. Nevertheless, it is still challenging to make accurate predictions due to the non-stationarity of real-world data, whereby the data distribution changes rapidly over time. To mitigate this dilemma, several efforts have been made to reduce non-stationarity via normalization operations. However, these methods typically overlook the distribution discrepancy between the input series and the horizon series, and assume that all time points within the same instance share the same statistical properties, which is too idealistic and may lead to suboptimal relative improvements. To this end, we propose Slice-level Adaptive Normalization (SAN), a novel scheme for empowering time series forecasting with more flexible normalization and denormalization. SAN includes two crucial designs. First, SAN eliminates the non-stationarity of time series in units of a local temporal slice (i.e., sub-series) rather than a global instance. Second, SAN employs a lightweight network module to independently model the evolving trends of the statistical properties of the raw time series. Consequently, SAN can serve as a general model-agnostic plugin and better alleviate the impact of the non-stationary nature of time series data. We instantiate the proposed SAN on four widely used forecasting models and test their prediction results on benchmark datasets to evaluate its effectiveness. We also report some insightful findings to deeply analyze and understand our proposed SAN. We make our code publicly available.
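The slice-level normalization idea described above can be sketched minimally (hypothetical names; SAN itself additionally uses a small network to predict each future slice's statistics, which this sketch omits):

```python
from statistics import fmean, pstdev

def slice_normalize(series, slice_len):
    """Normalize each local temporal slice (sub-series) independently,
    instead of using one global mean/std for the whole instance."""
    slices, stats = [], []
    for i in range(0, len(series), slice_len):
        chunk = series[i:i + slice_len]
        mu = fmean(chunk)
        sigma = pstdev(chunk) or 1.0  # guard against constant slices
        stats.append((mu, sigma))
        slices.append([(v - mu) / sigma for v in chunk])
    return slices, stats

def slice_denormalize(slices, stats):
    """Invert the per-slice normalization; SAN would apply this to the
    model's predictions using predicted future statistics."""
    out = []
    for chunk, (mu, sigma) in zip(slices, stats):
        out.extend(v * sigma + mu for v in chunk)
    return out
```

After normalization every slice has zero mean and unit variance regardless of how the distribution drifts across slices, and denormalization restores the original scale.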

NeurIPS Conference 2023 Conference Paper

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

  • Yang Yu
  • Qi Liu
  • Kai Zhang
  • Yuren Zhang
  • Chao Song
  • Min Hou
  • Yuqing Yuan
  • Zhihao Ye

User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., reflecting similar characteristics or interests of the user, and thus maximize their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in a negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive Self-Supervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training. Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.

IJCAI Conference 2023 Conference Paper

Beyond Homophily: Robust Graph Anomaly Detection via Neural Sparsification

  • Zheng Gong
  • Guifeng Wang
  • Ying Sun
  • Qi Liu
  • Yuting Ning
  • Hui Xiong
  • Jingyu Peng

Recently, graph-based anomaly detection (GAD) has attracted rising attention due to its effectiveness in identifying anomalies in relational and structured data. Unfortunately, the performance of most existing GAD methods suffers from the inherent structural noises of graphs induced by hidden anomalies connected with considerable benign nodes. In this work, we propose SparseGAD, a novel GAD framework that sparsifies the structures of target graphs to effectively reduce noises and collaboratively learns node representations. It then robustly detects anomalies by uncovering the underlying dependency among node pairs in terms of homophily and heterophily, two essential connection properties of GAD. Extensive experiments on real-world datasets of GAD demonstrate that the proposed framework achieves significantly better detection quality compared with the state-of-the-art methods, even when the graph is heavily attacked. Code will be available at https://github.com/KellyGong/SparseGAD.git.

NeurIPS Conference 2023 Conference Paper

Evaluating Self-Supervised Learning for Molecular Graph Embeddings

  • Hanchen Wang
  • Jean Kaddour
  • Shengchao Liu
  • Jian Tang
  • Joan Lasenby
  • Qi Liu

Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a variety of downstream tasks. This broad applicability complicates their evaluation. Addressing this challenge, we present "Molecular Graph Representation Evaluation" (MOLGRAPHEVAL), generating detailed profiles of molecular graph embeddings with interpretable and diversified attributes. MOLGRAPHEVAL offers a suite of probing tasks grouped into three categories: (i) generic graph, (ii) molecular substructure, and (iii) embedding space properties. By leveraging MOLGRAPHEVAL to benchmark existing GSSL methods against both current downstream datasets and our suite of tasks, we uncover significant inconsistencies between inferences drawn solely from existing datasets and those derived from more nuanced probing. These findings suggest that current evaluation methodologies fail to capture the entirety of the landscape.

IJCAI Conference 2023 Conference Paper

Exploiting Non-Interactive Exercises in Cognitive Diagnosis

  • Fangzhou Yao
  • Qi Liu
  • Min Hou
  • Shiwei Tong
  • Zhenya Huang
  • Enhong Chen
  • Jing Sha
  • Shijin Wang

Cognitive Diagnosis aims to quantify the proficiency level of students on specific knowledge concepts. Existing studies merely leverage observed historical student-exercise interaction logs to assess proficiency levels. Despite their effectiveness, observed interactions usually exhibit a power-law distribution, where the long tail consisting of students with few records lacks supervision signals. This phenomenon leads to inferior diagnosis for students with few records. In this paper, we propose the Exercise-aware Informative Response Sampling (EIRS) framework to address the long-tail problem. EIRS is a general framework that explores the partial order between observed and unobserved responses as auxiliary ranking-based training signals to supplement cognitive diagnosis. Considering the abundance and complexity of unobserved responses, we first design an Exercise-aware Candidates Selection module, which helps our framework produce reliable potential responses for effective supplementary training. Then, we develop an Expected Ability Change-weighted Informative Sampling strategy to adaptively sample informative potential responses that contribute greatly to model training. Experiments on real-world datasets demonstrate the superiority of our framework on long-tailed data.

NeurIPS Conference 2023 Conference Paper

FairLISA: Fair User Modeling with Limited Sensitive Attributes Information

  • Zheng Zhang
  • Qi Liu
  • Hao Jiang
  • Fei Wang
  • Yan Zhuang
  • Le Wu
  • Weibo Gao
  • Enhong Chen

User modeling techniques profile users' latent characteristics (e.g., preference) from their observed behaviors, and play a crucial role in decision-making. Unfortunately, traditional user models may unconsciously capture biases related to sensitive attributes (e.g., gender) from behavior data, even when this sensitive information is not explicitly provided. This can lead to unfairness and discrimination against certain groups based on these sensitive attributes. Recent studies have proposed to improve fairness by explicitly decorrelating user modeling results and sensitive attributes. However, most existing approaches assume that full sensitive attribute labels are available in the training set, which is unrealistic due to collection limitations like privacy concerns, and hence suffer from limited performance. In this paper, we focus on a practical situation with limited sensitive data and propose a novel FairLISA framework, which can efficiently utilize data with known and unknown sensitive attributes to facilitate fair model training. We first propose a novel theoretical perspective to build the relationship between data with both known and unknown sensitive attributes and the fairness objective. Then, based on this, we provide a general adversarial framework to effectively leverage the whole user data for fair user modeling. We conduct experiments on representative user modeling tasks including recommender systems and cognitive diagnosis. The results demonstrate that our FairLISA can effectively improve fairness while retaining high accuracy in scenarios with different ratios of missing sensitive attributes.

NeurIPS Conference 2023 Conference Paper

Full-Atom Protein Pocket Design via Iterative Refinement

  • Zaixi Zhang
  • Zepu Lu
  • Hao Zhongkai
  • Marinka Zitnik
  • Qi Liu

The design of de novo functional proteins that bind with specific ligand molecules is crucial in various domains like therapeutics and bio-engineering. One vital yet challenging step is to design the protein pocket, the cavity region of the protein where the ligand binds. Existing methods suffer from inefficient generation, insufficient context modeling (of the ligand molecule), and an inability to generate sidechain atoms. To overcome these limitations, we propose a Full-Atom Iterative Refinement framework (FAIR) for protein pocket sequence (i.e., residue types) and 3D structure co-design. Generally, FAIR consists of two steps that follow a coarse-to-fine pipeline (backbone atoms to full atoms including sidechains) for full-atom generation. For efficiency, all residue types and structures are updated together in each round (i.e., full-shot refinement). In the first step, the residue types and backbone coordinates are updated with a hierarchical context encoder and two structure refinement modules capturing inter-residue and pocket-ligand interactions. The second step further models the sidechain atoms of pockets and updates residue types to achieve sequence-structure consistency. The structure of the binding ligand is also updated along with the above refinement iterations, accounting for its flexibility. Finally, extensive evaluations show that FAIR outperforms baselines in efficiently designing high-quality pocket sequences and structures. Specifically, the average improvements on AAR and RMSD are over 10%.

NeurIPS Conference 2023 Conference Paper

GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning

  • Haiteng Zhao
  • Shengchao Liu
  • Ma Chang
  • Hannan Xu
  • Jie Fu
  • Zhihong Deng
  • Lingpeng Kong
  • Qi Liu

Molecule property prediction has gained significant attention in recent years. The main bottleneck is the label insufficiency caused by expensive lab experiments. In order to alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples encoding of the graph from task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to supervised GNN models on tasks such as ToxCast and MUV.

JBHI Journal 2023 Journal Article

ICL-Net: Global and Local Inter-Pixel Correlations Learning Network for Skin Lesion Segmentation

  • Weiwei Cao
  • Gang Yuan
  • Qi Liu
  • Chengtao Peng
  • Jing Xie
  • Xiaodong Yang
  • Xinye Ni
  • Jian Zheng

Skin lesion segmentation is a fundamental procedure in computer-aided melanoma diagnosis. However, due to the diverse shape, variable size, blurry boundary, and noise interference of lesion regions, existing methods may struggle with the challenge of inconsistency within classes and indiscrimination between classes. In view of this, we propose a novel method to learn and model inter-pixel correlations from both global and local aspects, which can increase inter-class variances and intra-class similarities. Specifically, under the encoder-decoder architecture, we first design a pyramid transformer inter-pixel correlations (PTIC) module, aiming at capturing the non-local context information of different levels and further exploring the global pixel-level relationship to deal with the large variance of shape and size. Further, we devise a local neighborhood metric learning (LNML) module to strengthen the local semantic correlations learning capability and increase the separability between classes in the feature space. These two modules can complementarily strengthen the feature representation capability via exploiting the inter-pixel semantic correlations, thus further improving intra-class consistency and inter-class variance. Comprehensive experiments are performed on public skin lesion segmentation datasets: ISIC 2018, ISIC 2016, and PH2, and experimental results demonstrate that the proposed method achieves better segmentation performance than other state-of-the-art methods.

IJCAI Conference 2023 Conference Paper

Keep Skills in Mind: Understanding and Implementing Skills in Commonsense Question Answering

  • Meikai Bao
  • Qi Liu
  • Kai Zhang
  • Ye Liu
  • Linan Yue
  • Longfei Li
  • Jun Zhou

Commonsense Question Answering (CQA) aims to answer questions that require human commonsense. Closed-book CQA, as one of its subtasks, requires the model to answer questions without retrieving external knowledge, which emphasizes the importance of the model's problem-solving ability. Most previous methods relied on large-scale pre-trained models to generate question-related knowledge while ignoring the crucial role of skills in the process of answering commonsense questions. Generally, skills refer to the learned ability to perform a specific task or activity, derived from knowledge and experience. In this paper, we introduce a new approach named Dynamic Skill-aware Commonsense Question Answering (DSCQA), which transcends the limitations of traditional methods by informing the model about the need for each skill in questions and by utilizing skills as a critical driver in the CQA process. To be specific, DSCQA first employs a commonsense skill extraction module to generate various skill representations. Then, DSCQA utilizes a dynamic skill module to generate dynamic skill representations. Finally, in the perception and emphasis module, the various skill and dynamic skill representations are used to support the question-answering process. Experimental results on two publicly available CQA datasets show the effectiveness of our proposed model and the considerable impact of introducing skills.

AAAI Conference 2023 Conference Paper

Learning by Applying: A General Framework for Mathematical Reasoning via Enhancing Explicit Knowledge Learning

  • Jiayu Liu
  • Zhenya Huang
  • ChengXiang Zhai
  • Qi Liu

Mathematical reasoning is one of the crucial abilities of general artificial intelligence, which requires machines to master mathematical logic and knowledge from solving problems. However, existing approaches are not transparent (thus not interpretable) in terms of what knowledge has been learned and applied in the reasoning process. In this paper, we propose a general Learning by Applying (LeAp) framework to enhance existing models (backbones) in a principled way by explicit knowledge learning. In LeAp, we perform knowledge learning in a novel problem-knowledge-expression paradigm, with a Knowledge Encoder to acquire knowledge from problem data and a Knowledge Decoder to apply knowledge for expression reasoning. The learned mathematical knowledge, including word-word relations and word-operator relations, forms an explicit knowledge graph, which bridges the knowledge “learning” and “applying” organically. Moreover, for problem solving, we design a semantics-enhanced module and a reasoning-enhanced module that apply knowledge to improve the problem comprehension and symbol reasoning abilities of any backbone, respectively. We theoretically prove the superiority of LeAp's autonomous learning mechanism. Experiments on three real-world datasets show that LeAp improves all backbones' performances, learns accurate knowledge, and achieves a more interpretable reasoning process.

AILAW Journal 2023 Journal Article

LK-IB: a hybrid framework with legal knowledge injection for compulsory measure prediction

  • Xiang Zhou
  • Qi Liu
  • Yiquan Wu
  • Qiangchao Chen
  • Kun Kuang

The interpretability of AI is just as important as its performance. In the LegalAI field, there have been efforts to enhance the interpretability of models, but a trade-off between interpretability and prediction accuracy remains inevitable. In this paper, we introduce a novel framework called LK-IB for compulsory measure prediction (CMP), one of the critical tasks in LegalAI. LK-IB leverages Legal Knowledge and combines an Interpretable model and a Black-box model to balance interpretability and prediction performance. Specifically, LK-IB involves three steps: (1) inputting cases into the first module, where first-order logic (FOL) rules are used to make predictions and output them directly if possible; (2) sending cases to the second module if FOL rules are not applicable, where a case distributor categorizes them as either “simple” or “complex”; and (3) sending simple cases to an interpretable model with strong interpretability and complex cases to a black-box model with outstanding performance. Experimental results demonstrate that the LK-IB framework provides more interpretable and accurate predictions than other state-of-the-art models. Given that the majority of cases in LegalAI are simple, the idea of model combination has significant potential for practical applications.

AAMAS Conference 2023 Conference Paper

Multi-Agent Path Finding with Time Windows: Preliminary Results

  • Jianqi Gao
  • Qi Liu
  • Shiyu Chen
  • Kejian Yan
  • Xinyi Li
  • Yanjie Li

We formalize the problem of multi-agent path finding with time windows (MAPF-TW). The optimization objective is to maximize the average customer satisfaction for all agents when they reach their respective goal vertices without path conflicts. We first prove that solving MAPF-TW optimally is NP-hard. We then reduce the MAPF-TW problem into a multi-commodity flow problem and propose an integer linear programming (ILP) model. Next, we propose the conflict-based search with time windows (CBS-TW) for the MAPF-TW problem, which is also optimal. Finally, we conduct simulation experiments on two different maps with random obstacles.

IJCAI Conference 2023 Conference Paper

Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer

  • Wenjun Ke
  • Zongkai Tian
  • Qi Liu
  • Peng Wang
  • Jinhua Gao
  • Rui Qi

Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. To help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or end-to-end generation with PLMs. However, these methods focus only on the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. This lack of correlation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During the insertion procedure, new tokens are added taking both semantic and syntactic factors into account. Hence, the resulting sentence retains syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., OntoNotes and WikiAnn, demonstrate performance of SAINT comparable to that of the state-of-the-art baselines.

AAAI Conference 2023 Conference Paper

Untargeted Attack against Federated Recommendation Systems via Poisonous Item Embeddings and the Defense

  • Yang Yu
  • Qi Liu
  • Likang Wu
  • Runlong Yu
  • Sanshi Lei Yu
  • Zaixi Zhang

Federated recommendation (FedRec) can train personalized recommenders without collecting user data, but the decentralized nature makes it susceptible to poisoning attacks. Most previous studies focus on the targeted attack to promote certain items, while the untargeted attack that aims to degrade the overall performance of the FedRec system remains less explored. In fact, untargeted attacks can disrupt the user experience and bring severe financial loss to the service provider. However, existing untargeted attack methods are either inapplicable or ineffective against FedRec systems. In this paper, we delve into the untargeted attack and its defense for FedRec systems. (i) We propose ClusterAttack, a novel untargeted attack method. It uploads poisonous gradients that converge the item embeddings into several dense clusters, which make the recommender generate similar scores for these items in the same cluster and perturb the ranking order. (ii) We propose a uniformity-based defense mechanism (UNION) to protect FedRec systems from such attacks. We design a contrastive learning task that regularizes the item embeddings toward a uniform distribution. Then the server filters out these malicious gradients by estimating the uniformity of updated item embeddings. Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack.
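UNION's idea of regularizing item embeddings "toward a uniform distribution" can be pictured with the standard Gaussian-potential uniformity objective of Wang and Isola (2020). The sketch below is under that assumption; the paper's exact contrastive task and the toy embeddings are not from the source:

```python
import math

def normalize(v):
    """Project an embedding onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def uniformity_loss(embeddings, t=2.0):
    """Log of the mean pairwise Gaussian potential on the sphere.
    Lower values mean the embeddings are spread more evenly, which is
    the property a clustering attack destroys."""
    points = [normalize(e) for e in embeddings]
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += math.exp(-t * d2)
            count += 1
    return math.log(total / count)

# A collapsed cluster scores worse (higher) than well-spread embeddings,
# so estimating this quantity can flag poisoned updates.
clustered = [[1.0, 0.01], [1.0, 0.02], [1.0, 0.03]]
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
assert uniformity_loss(clustered) > uniformity_loss(spread)
```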

AAAI Conference 2022 Conference Paper

Anisotropic Additive Quantization for Fast Inner Product Search

  • Jin Zhang
  • Qi Liu
  • Defu Lian
  • Zheng Liu
  • Le Wu
  • Enhong Chen

Maximum Inner Product Search (MIPS) plays an important role in many applications, ranging from information retrieval and recommender systems to natural language processing and machine learning. However, exhaustive MIPS is often expensive and impractical when there are a large number of candidate items. The state-of-the-art approximated MIPS is product quantization with a score-aware loss, which weighs more heavily on items with larger inner product scores. However, it is challenging to extend the score-aware loss for additive quantization due to the parallel-orthogonal decomposition of the residual error. Learning additive quantization with respect to this loss is important since additive quantization can achieve a lower approximation error than product quantization. To this end, we propose a quantization method called Anisotropic Additive Quantization to combine the score-aware anisotropic loss and additive quantization. To efficiently update the codebooks in this algorithm, we develop a new alternating optimization algorithm. The proposed algorithm is extensively evaluated on three real-world datasets. The experimental results show that it outperforms the state-of-the-art baselines with respect to approximate search accuracy while guaranteeing a similar retrieval efficiency.
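The parallel-orthogonal decomposition mentioned above is concrete enough to sketch: split the quantization residual into components parallel and orthogonal to the datapoint and weight the parallel component more heavily, as score-aware anisotropic losses do. The weights and toy vectors below are illustrative assumptions, not the paper's settings:

```python
def anisotropic_loss(x, x_hat, h_par=4.0, h_orth=1.0):
    """Score-aware anisotropic quantization loss: penalize the residual
    component parallel to the datapoint more, since that component
    perturbs inner-product scores the most."""
    r = [a - b for a, b in zip(x_hat, x)]
    x_sq = sum(v * v for v in x)
    # Projection of the residual onto the direction of x.
    scale = sum(a * b for a, b in zip(r, x)) / x_sq
    r_par = [scale * v for v in x]
    r_orth = [a - b for a, b in zip(r, r_par)]
    par = sum(v * v for v in r_par)
    orth = sum(v * v for v in r_orth)
    return h_par * par + h_orth * orth

x = [1.0, 0.0]
# Same residual norm, but a parallel error costs more than an orthogonal one.
parallel_err = anisotropic_loss(x, [1.1, 0.0])
orthogonal_err = anisotropic_loss(x, [1.0, 0.1])
assert parallel_err > orthogonal_err
```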

NeurIPS Conference 2022 Conference Paper

DARE: Disentanglement-Augmented Rationale Extraction

  • Linan Yue
  • Qi Liu
  • Yichao Du
  • Yanqing An
  • Li Wang
  • Enhong Chen

Rationale extraction can be considered a straightforward method of improving model explainability, where rationales are a subsequence of the original inputs that can be extracted to support the prediction results. Existing methods mainly cascade a selector, which extracts the rationale tokens, with a predictor, which makes the prediction based on the selected tokens. Since previous works fail to fully exploit the original input, ignoring the information of non-selected tokens, in this paper we propose a Disentanglement-Augmented Rationale Extraction (DARE) method, which encapsulates more information from the input to extract rationales. Specifically, it first disentangles the input into rationale representations and non-rationale ones, and then learns more comprehensive rationale representations for extraction by minimizing the mutual information (MI) between the two disentangled representations. Besides, to improve the performance of MI minimization, we develop a new MI estimator by exploring existing MI estimation methods. Extensive experimental results on three real-world datasets and simulation studies clearly validate the effectiveness of our proposed method. Code is released at https://github.com/yuelinan/DARE.

AAAI Conference 2022 Conference Paper

Fully Adaptive Framework: Neural Computerized Adaptive Testing for Online Education

  • Yan Zhuang
  • Qi Liu
  • Zhenya Huang
  • Zhi Li
  • Shuanghong Shen
  • Haiping Ma

Computerized Adaptive Testing (CAT) refers to an efficient and personalized test mode in online education, aiming to accurately measure a student's proficiency level on the required subject/domain. The key component of CAT is the “adaptive” question selection algorithm, which automatically selects the best-suited question for a student based on his/her current estimated proficiency, reducing the test length. Existing algorithms rely on manually designed and pre-fixed informativeness/uncertainty metrics of questions for selection, which is labor-intensive and insufficient for capturing the complex relations between students and questions. In this paper, we propose a fully adaptive framework named Neural Computerized Adaptive Testing (NCAT), which formally redefines CAT as a reinforcement learning problem and directly learns the selection algorithm from real-world data. Specifically, a bilevel optimization is defined and simplified under CAT's application scenarios to make the algorithm learnable. Furthermore, to address the CAT task effectively, we tackle it as an equivalent reinforcement learning problem and propose an attentive neural policy to model complex non-linear interactions. Extensive experiments on real-world datasets demonstrate the effectiveness and robustness of NCAT compared with several state-of-the-art methods.

NeurIPS Conference 2022 Conference Paper

Hierarchical Graph Transformer with Adaptive Node Sampling

  • Zaixi Zhang
  • Qi Liu
  • Qingyong Hu
  • Chee-Kong Lee

The Transformer architecture has achieved remarkable success in a number of domains including natural language processing and computer vision. However, when it comes to graph-structured data, transformers have not achieved competitive performance, especially on large graphs. In this paper, we identify the main deficiencies of current graph transformers: (1) Existing node sampling strategies in Graph Transformers are agnostic to the graph characteristics and the training process. (2) Most sampling strategies only focus on local neighbors and neglect the long-range dependencies in the graph. We conduct experimental investigations on synthetic datasets to show that existing sampling strategies are sub-optimal. To tackle the aforementioned problems, we formulate the optimization strategies of node sampling in Graph Transformer as an adversarial bandit problem, where the rewards are related to the attention weights and can vary in the training procedure. Meanwhile, we propose a hierarchical attention scheme with graph coarsening to capture the long-range interactions while reducing computational complexity. Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of our method over existing graph transformers and popular GNNs.
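The bandit formulation above builds, in spirit, on the classic EXP3 algorithm for adversarial bandits. The generic sketch below is not the paper's sampler: the three "sampling heuristic" arms and the 0/1 reward are toy assumptions (the paper derives rewards from attention weights):

```python
import math
import random

def exp3_choose(weights):
    """Sample an arm in proportion to the current weights."""
    total = sum(weights)
    probs = [w / total for w in weights]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i, p
    return len(probs) - 1, probs[-1]

def exp3_update(weights, arm, prob, reward, eta=0.1):
    """EXP3 update: importance-weight the observed reward by the
    sampling probability (keeping the estimate unbiased), then
    exponentiate to reweight the chosen arm."""
    weights[arm] *= math.exp(eta * reward / prob)
    return weights

random.seed(0)
weights = [1.0, 1.0, 1.0]  # three hypothetical sampling heuristics
for _ in range(200):
    arm, prob = exp3_choose(weights)
    reward = 1.0 if arm == 2 else 0.0  # heuristic 2 is secretly best
    weights = exp3_update(weights, arm, prob, reward)
assert max(range(3), key=lambda i: weights[i]) == 2
```

Because the rewards may drift as training proceeds, an adversarial (rather than stochastic) bandit is the natural fit, which is the point the abstract makes.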

AAAI Conference 2022 Conference Paper

ProtGNN: Towards Self-Explaining Graph Neural Networks

  • Zaixi Zhang
  • Qi Liu
  • Hao Wang
  • Chengqiang Lu
  • Cheekong Lee

Despite the recent progress in Graph Neural Networks (GNNs), it remains challenging to explain the predictions made by GNNs. Existing explanation methods mainly focus on post-hoc explanations where another explanatory model is employed to provide explanations for a trained GNN. The fact that post-hoc methods fail to reveal the original reasoning process of GNNs raises the need of building GNNs with built-in interpretability. In this work, we propose Prototype Graph Neural Network (ProtGNN), which combines prototype learning with GNNs and provides a new perspective on the explanations of GNNs. In ProtGNN, the explanations are naturally derived from the case-based reasoning process and are actually used during classification. The prediction of ProtGNN is obtained by comparing the inputs to a few learned prototypes in the latent space. Furthermore, for better interpretability and higher efficiency, a novel conditional subgraph sampling module is incorporated to indicate which part of the input graph is most similar to each prototype in ProtGNN+. Finally, we evaluate our method on a wide range of datasets and perform concrete case studies. Extensive results show that ProtGNN and ProtGNN+ can provide inherent interpretability while achieving accuracy on par with the non-interpretable counterparts.

NeurIPS Conference 2021 Conference Paper

Causal Effect Inference for Structured Treatments

  • Jean Kaddour
  • Yuchen Zhu
  • Qi Liu
  • Matt J. Kusner
  • Ricardo Silva

We address the estimation of conditional average treatment effects (CATEs) for structured treatments (e.g., graphs, images, texts). Given a weak condition on the effect, we propose the generalized Robinson decomposition, which (i) isolates the causal estimand (reducing regularization bias), (ii) allows one to plug in arbitrary models for learning, and (iii) possesses a quasi-oracle convergence guarantee under mild assumptions. In experiments with small-world and molecular graphs we demonstrate that our approach outperforms prior work in CATE estimation.
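The generalized Robinson decomposition extends the classic residual-on-residual recipe to structured treatments. A scalar-treatment toy version (simulated data and linear nuisance fits are my assumptions, not the paper's estimator) shows how residualizing isolates the causal estimand:

```python
import random

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y ≈ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b = cov / var
    return lambda x, a=my - b * mx, b=b: a + b * x

random.seed(1)
n = 2000
X = [random.gauss(0, 1) for _ in range(n)]
T = [x + random.gauss(0, 1) for x in X]        # treatment confounded by X
tau = 1.5                                      # true constant effect
Y = [2 * x + tau * t + random.gauss(0, 0.1) for x, t in zip(X, T)]

# Robinson decomposition: residualize both Y and T on X, then regress
# the outcome residual on the treatment residual to recover tau.
m = fit_linear(X, Y)   # m(x) ≈ E[Y | X=x]
e = fit_linear(X, T)   # e(x) ≈ E[T | X=x]
Yr = [y - m(x) for x, y in zip(X, Y)]
Tr = [t - e(x) for x, t in zip(X, T)]
tau_hat = sum(a * b for a, b in zip(Yr, Tr)) / sum(b * b for b in Tr)
assert abs(tau_hat - tau) < 0.1
```

The residualization step is what "(i) isolates the causal estimand" refers to: the nuisance fits absorb the confounding through X, leaving only the treatment effect in the residual relationship.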

AAAI Conference 2021 Conference Paper

Coupling Macro-Sector-Micro Financial Indicators for Learning Stock Representations with Less Uncertainty

  • Guifeng Wang
  • Longbing Cao
  • Hongke Zhao
  • Qi Liu
  • Enhong Chen

While stock movement prediction has been intensively studied, existing work suffers from weak generalization because of uncertainty in both data and modeling. On one hand, training a stock representation on stochastic stock data in an end-to-end manner may lead to excessive modeling, which introduces model uncertainty. On the other hand, the analysis of correlating stock data with its relevant factors involves data uncertainty. To simultaneously address such uncertainty from both data and modeling perspectives, a fundamental yet challenging task is to learn a better stock representation with less uncertainty by considering hierarchical couplings from the macro-level to the sector- and micro-levels. Accordingly, we propose a copula-based contrastive predictive coding (Co-CPC) method. Co-CPC first models the dependence between a certain stock sector and relevant macroeconomic variables that are sequential and heterogeneous, e.g., macro-variables are associated with different time intervals, scales, and distributions. Then, by involving a macro-sector context, stock representations are learned in a self-supervised way and can further be used for downstream tasks like stock movement prediction. Extensive experiments on two typical stock datasets verify the effectiveness of our Co-CPC method.

AAAI Conference 2021 Conference Paper

Cross-Oilfield Reservoir Classification via Multi-Scale Sensor Knowledge Transfer

  • Zhi Li
  • Zhefeng Wang
  • Zhicheng Wei
  • Xiangguang Zhou
  • Yijun Wang
  • Baoxing Huai
  • Qi Liu
  • Nicholas Jing Yuan

Reservoir classification is an essential step for the exploration and production process in the oil and gas industry. An appropriate automatic reservoir classification will not only reduce the manual workloads of experts, but also help petroleum companies to make optimal decisions efficiently, which in turn will dramatically reduce the costs. Existing methods mainly focused on generating reservoir classification in a single geological block but failed to work well on a new oilfield block. Indeed, how to transfer the subsurface characteristics and make accurate reservoir classification across the geological oilfields is a very important but challenging problem. To that end, in this paper, we present a focused study on the cross-oilfield reservoir classification task. Specifically, we first propose a Multi-scale Sensor Extraction (MSE) module to extract the multi-scale feature representations of geological characteristics from multivariate well logs. Furthermore, we design an encoder-decoder module, i.e., Specific Feature Learning (SFL), to take advantage of specific information of both oilfields. Then, we develop a Knowledge-Attentive Transfer (KAT) module to learn the feature-invariant representation and transfer the geological knowledge from a source oilfield to a target oilfield. Finally, we evaluate our approaches by conducting extensive experiments with real-world industrial datasets. The experimental results clearly demonstrate the effectiveness of our proposed approaches to transfer the geological knowledge and generate the cross-oilfield reservoir classifications.

IJCAI Conference 2021 Conference Paper

GraphMI: Extracting Private Graph Data from Graph Neural Networks

  • Zaixi Zhang
  • Qi Liu
  • Zhenya Huang
  • Hao Wang
  • Chengqiang Lu
  • Chuanren Liu
  • Enhong Chen

As machine learning becomes more widely used for critical applications, the need to study its privacy implications becomes urgent. Given access to the target model and auxiliary information, a model inversion attack aims to infer sensitive features of the training dataset, which raises serious privacy concerns. Despite its success in the grid domain, directly applying model inversion techniques to non-grid domains such as graphs achieves poor attack performance due to the difficulty of fully exploiting the intrinsic properties of graphs and the attributes of graph nodes used in GNN models. To bridge this gap, we present the Graph Model Inversion attack, which aims to infer edges of the training graph by inverting Graph Neural Networks, one of the most popular graph analysis tools. Specifically, the projected gradient module in our method can tackle the discreteness of graph edges while preserving the sparsity and smoothness of graph features. Moreover, a well-designed graph autoencoder module can efficiently exploit graph topology, node attributes, and target model parameters. With the proposed method, we study the connection between model inversion risk and edge influence and show that edges with greater influence are more likely to be recovered. Extensive experiments over several public datasets demonstrate the effectiveness of our method. We also show that differential privacy in its canonical form can hardly defend against our attack while preserving decent utility.
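A projected gradient step for discrete edges typically alternates a continuous gradient update on a relaxed adjacency with a projection back onto a feasible set. The helper below is a loose, hypothetical illustration of the projection half only (the clip-to-[0,1] plus top-k sparsity rule is my simplification, not the paper's exact projection):

```python
def project_adjacency(a, budget):
    """Project relaxed edge weights onto a feasible set: clip each
    weight to [0, 1], then keep only the `budget` largest entries so
    the relaxed graph stays sparse."""
    clipped = [min(1.0, max(0.0, v)) for v in a]
    if sum(v > 0 for v in clipped) <= budget:
        return clipped
    top = sorted(range(len(clipped)), key=lambda i: clipped[i], reverse=True)[:budget]
    keep = set(top)
    return [v if i in keep else 0.0 for i, v in enumerate(clipped)]

print(project_adjacency([1.5, -0.2, 0.7, 0.4], budget=2))
```

After optimization converges, the surviving relaxed weights can be thresholded or sampled to recover a discrete edge set.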

IJCAI Conference 2021 Conference Paper

Guided Attention Network for Concept Extraction

  • Songtao Fang
  • Zhenya Huang
  • Ming He
  • Shiwei Tong
  • Xiaoqing Huang
  • Ye Liu
  • Jie Huang
  • Qi Liu

Concept extraction aims to find words or phrases describing a concept in massive texts. Recently, researchers have proposed many neural network-based methods to automatically extract concepts. Although these methods show promising results, they ignore structured information in the raw textual data (e.g., title, topic, and clue words). In this paper, we propose a novel model, named Guided Attention Concept Extraction Network (GACEN), which uses title, topic, and clue words as additional supervision to provide guidance directly. Specifically, GACEN comprises two attention networks: one gathers the relevant title and topic information for each context word in the document; the other models the implicit connection between informative words (clue words) and concepts. Finally, we aggregate information from the two networks as input to a Conditional Random Field (CRF) to model dependencies in the output. We collected clue words for three well-studied datasets. Extensive experiments demonstrate that our model outperforms the baseline models by a large margin, especially when labeled data is insufficient.

AAAI Conference 2021 Conference Paper

HMS: A Hierarchical Solver with Dependency-Enhanced Understanding for Math Word Problem

  • Xin Lin
  • Zhenya Huang
  • Hongke Zhao
  • Enhong Chen
  • Qi Liu
  • Hao Wang
  • Shijin Wang

Automatically solving math word problems is a crucial task for exploring the intelligence levels of machines in the general AI domain. It is highly challenging since it requires not only natural language understanding but also mathematical expression inference. Existing solutions usually explore sequence-to-sequence models to generate expressions, where the problems are simply encoded sequentially. However, such models generally fall far short of human-like problem understanding and lead to incorrect answers. To this end, in this paper, we propose a novel Hierarchical Math Solver (HMS) for deep understanding and exploitation of problems. In problem understanding, imitating human reading habits, we propose a hierarchical word-clause-problem encoder. Specifically, we first split each problem into several clauses and learn problem semantics from the local clause level to the global problem level. Then, in clause understanding, we propose a dependency-based module to enhance clause semantics with the dependency structure of the problem. Next, in expression inference, we propose a novel tree-based decoder to generate the mathematical expression for the answer. In the decoder, we apply a hierarchical attention mechanism to enhance the problem semantics with context from different levels, and a pointer-generator network to guide the model to copy existing information and infer extra knowledge. Extensive experimental results on two widely used datasets demonstrate that HMS achieves not only better answers but also more reasonable inference.

AAAI Conference 2021 Conference Paper

Ideography Leads Us to the Field of Cognition: A Radical-Guided Associative Model for Chinese Text Classification

  • Hanqing Tao
  • Shiwei Tong
  • Kun Zhang
  • Tong Xu
  • Qi Liu
  • Enhong Chen
  • Min Hou

Cognitive psychology research shows that humans have the instinct for abstract thinking, where association plays an essential role in language comprehension. Especially for Chinese, its ideographic writing system allows radicals to trigger semantic association without the need of phonetics. In fact, subconsciously using the associative information guided by radicals is a key for readers to ensure the robustness of semantic understanding. Fortunately, many basic and extended concepts related to radicals are systematically included in Chinese language dictionaries, which leaves a handy but unexplored way for improving Chinese text representation and classification. To this end, we draw inspiration from cognitive principles between ideography and human associative behavior to propose a novel Radical-guided Associative Model (RAM) for Chinese text classification. RAM comprises two coupled spaces, namely Literal Space and Associative Space, which imitates the real process in people’s mind when understanding a Chinese text. To be specific, we first devise a serialized modeling structure in Literal Space to thoroughly capture the sequential information of Chinese text. Then, based on the authoritative information provided by Chinese language dictionaries, we design an association module and put forward a strategy called Radical-Word Association to use ideographic radicals as the medium to associate prior concept words in Associative Space. Afterwards, we design an attention module to imitate people’s matching and decision-making between Literal Space and Associative Space, which can balance the importance of each associative word under specific contexts. Finally, extensive experiments on two real-world datasets prove the effectiveness and rationality of RAM, with good cognitive insights for future language modeling.

IJCAI Conference 2021 Conference Paper

Item Response Ranking for Cognitive Diagnosis

  • Shiwei Tong
  • Qi Liu
  • Runlong Yu
  • Wei Huang
  • Zhenya Huang
  • Zachary A. Pardos
  • Weijie Jiang

Cognitive diagnosis, a fundamental task in the education domain, aims at providing an approach to reveal the proficiency level of students on knowledge concepts. Monotonicity is one of the basic conditions in cognitive diagnosis theory, which assumes that a student's proficiency is monotonic with the probability of giving the right response to a test item. However, few previous methods consider monotonicity during optimization. To this end, we propose the Item Response Ranking framework (IRR), aiming at introducing pairwise learning into cognitive diagnosis to well model the monotonicity between item responses. Specifically, we first use an item-specific sampling method to sample item responses and construct response pairs based on their partial order, where we propose two-branch sampling methods to handle unobserved responses. After that, we use a pairwise objective function to exploit the monotonicity in the pair formulation. In fact, IRR is a general framework which can be applied to most contemporary cognitive diagnosis models. Extensive experiments demonstrate the effectiveness and interpretability of our method.
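The pairwise objective can be pictured with a BPR-style loss over response pairs: for two students who answered the same item, the one who answered correctly should have the higher predicted proficiency. A minimal sketch, assuming a simple proficiency lookup (the student names and values are made up, and the paper's item-specific two-branch sampling is omitted):

```python
import math

def pairwise_loss(theta, pairs):
    """BPR-style objective on response pairs: for each (correct, wrong)
    pair on the same item, push P(correct) above P(wrong) via the
    students' proficiency gap. `theta` maps student -> proficiency."""
    loss = 0.0
    for winner, loser in pairs:
        gap = theta[winner] - theta[loser]
        loss += -math.log(1.0 / (1.0 + math.exp(-gap)))
    return loss / len(pairs)

theta = {"alice": 1.2, "bob": -0.3}
# Alice answered the item correctly, Bob did not: the ordering that
# respects monotonicity incurs the lower loss.
good_order = [("alice", "bob")]
bad_order = [("bob", "alice")]
assert pairwise_loss(theta, good_order) < pairwise_loss(theta, bad_order)
```

Minimizing this loss drives the diagnosis model toward parameters where proficiency and response correctness are consistently ordered, which is the monotonicity condition the abstract emphasizes.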

NeurIPS Conference 2021 Conference Paper

Motif-based Graph Self-Supervised Learning for Molecular Property Prediction

  • Zaixi Zhang
  • Qi Liu
  • Hao Wang
  • Chengqiang Lu
  • Chee-Kong Lee

Predicting molecular properties with data-driven methods has drawn much attention in recent years. Particularly, Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks. In cases where labeled data is scarce, GNNs can be pre-trained on unlabeled molecular data to first learn the general semantic and structural information before being fine-tuned for specific tasks. However, most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level tasks. These approaches cannot capture the rich information in subgraphs or graph motifs. For example, functional groups (frequently occurring subgraphs in molecular graphs) often carry indicative information about the molecular properties. To bridge this gap, we propose Motif-based Graph Self-supervised Learning (MGSSL) by introducing a novel self-supervised motif generation framework for GNNs. First, for motif extraction from molecular graphs, we design a molecule fragmentation method that leverages the retrosynthesis-based algorithm BRICS and additional rules for controlling the size of the motif vocabulary. Second, we design a general motif-based generative pre-training framework in which GNNs are asked to make topological and label predictions. This generative framework can be implemented in two different ways, i.e., breadth-first or depth-first. Finally, to take the multi-scale information in molecular graphs into consideration, we introduce a multi-level self-supervised pre-training. Extensive experiments on various downstream benchmark tasks show that our methods outperform all state-of-the-art baselines.

AAAI Conference 2021 Conference Paper

NeuralAC: Learning Cooperation and Competition Effects for Match Outcome Prediction

  • Yin Gu
  • Qi Liu
  • Kai Zhang
  • Zhenya Huang
  • Runze Wu
  • Jianrong Tao

Match outcome prediction in group comparison setting is a challenging but important task. Existing works mainly focus on learning individual effects or mining limited interactions between teammates, which is not sufficient for capturing complex interactions between teammates as well as between opponents. Besides, the importance of interacting with different characters is still largely underexplored. To this end, we propose a novel Neural Attentional Cooperation-competition model (NeuralAC), which incorporates weighted-cooperation effects (i.e., intra-team interactions) and weighted-competition effects (i.e., inter-team interactions) for predicting match outcomes. Specifically, we first project individuals to latent vectors and learn complex interactions through deep neural networks. Then, we design two novel attention-based mechanisms to capture the importance of intra-team and inter-team interactions, which enhance NeuralAC with both accuracy and interpretability. Furthermore, we demonstrate NeuralAC can generalize several previous works. To evaluate the performances of NeuralAC, we conduct extensive experiments on four E-sports datasets. The experimental results clearly verify the effectiveness of NeuralAC compared with several state-of-the-art methods.

IJCAI Conference 2021 Conference Paper

Preference-Adaptive Meta-Learning for Cold-Start Recommendation

  • Li Wang
  • Binbin Jin
  • Zhenya Huang
  • Hongke Zhao
  • Defu Lian
  • Qi Liu
  • Enhong Chen

In recommender systems, the cold-start problem is a critical issue. To alleviate this problem, an emerging direction adopts meta-learning frameworks and achieves success. Most existing works aim to learn globally shared prior knowledge across all users so that it can be quickly adapted to a new user with sparse interactions. However, globally shared prior knowledge may be inadequate to discern users’ complicated behaviors and causes poor generalization. Therefore, we argue that prior knowledge should be locally shared by users with similar preferences who can be recognized by social relations. To this end, in this paper, we propose a Preference-Adaptive Meta-Learning approach (PAML) to improve existing meta-learning frameworks with better generalization capacity. Specifically, to address two challenges imposed by social relations, we first identify reliable implicit friends to strengthen a user’s social relations based on our defined palindrome paths. Then, a coarse-fine preference modeling method is proposed to leverage social relations and capture the preference. Afterwards, a novel preference-specific adapter is designed to adapt the globally shared prior knowledge to the preference-specific knowledge so that users who have similar tastes share similar knowledge. We conduct extensive experiments on two publicly available datasets. Experimental results validate the power of social relations and the effectiveness of PAML.

IJCAI Conference 2021 Conference Paper

Towards a New Generation of Cognitive Diagnosis

  • Qi Liu

Cognitive diagnosis is a type of assessment for automatically measuring individuals' proficiency profiles from their observed behaviors, e.g., quantifying the mastery level of examinees on specific knowledge concepts/skills. As one of the fundamental research tasks in domains like intelligent education, a number of Cognitive Diagnosis Models (CDMs) have been developed in the past decades. Though these solutions are usually well designed based on psychometric theories, they still suffer from the limited ability of the handcrafted diagnosis functions, especially when dealing with heterogeneous data. In this paper, I will share my personal understanding of cognitive diagnosis and review our recent developments of CDMs mostly from a machine learning perspective. Meanwhile, I will show the wide applications of cognitive diagnosis.

AAAI Conference 2020 Conference Paper

Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach

  • Yang Liu
  • Qi Liu
  • Hongke Zhao
  • Zhen Pan
  • Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and finding the balance between exploration and exploitation for the trading agent with AI techniques. To address these challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies via an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). Also, we introduce imitation learning to leverage classical trading strategies, which help balance exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequency data. Experimental results demonstrate that our model can extract robust market features and adapt to different markets.

AAAI Conference 2020 Conference Paper

Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach

  • Jun Wang
  • Hefu Zhang
  • Qi Liu
  • Zhen Pan
  • Hanqing Tao

Recent years have witnessed increasing interest in research on crowdfunding mechanisms. In this area, dynamics tracking is a significant issue but is still under exploration. Existing studies either fit the fluctuations of time series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address the problem, in this paper, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict the funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as an agent that could interact with the environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characteristics) in funding series, we propose to subdivide them into fast-growing and slow-growing ones. Moreover, for the purpose of switching between different kinds of patterns, the actor component of TC3 is extended with a structure of options, which leads to TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed a U-shaped one.

AAAI Conference 2020 Conference Paper

Estimating Early Fundraising Performance of Innovations via Graph-Based Market Environment Model

  • Likang Wu
  • Zhi Li
  • Hongke Zhao
  • Zhen Pan
  • Qi Liu
  • Enhong Chen

Well begun is half done. In the crowdfunding market, the early fundraising performance of a project is a key concern for both creators and platforms. However, estimating early fundraising performance before the project is published is very challenging and still under-explored. To that end, in this paper, we present a focused study on this important problem from a market modeling view. Specifically, we propose a Graph-based Market Environment model (GME) for estimating the early fundraising performance of the target project by exploiting the market environment. In addition, we discriminatively model the market competition and market evolution by designing two graph-based neural network architectures and incorporating them into the joint optimization stage. Finally, we conduct extensive experiments on real-world crowdfunding data collected from Indiegogo.com. The experimental results clearly demonstrate the effectiveness of our proposed model for modeling and estimating the early fundraising performance of the target project.

IJCAI Conference 2020 Conference Paper

Learning the Compositional Visual Coherence for Complementary Recommendations

  • Zhi Li
  • Bo Wu
  • Qi Liu
  • Likang Wu
  • Hongke Zhao
  • Tao Mei

Complementary recommendations, which aim to provide users with product suggestions that are supplementary to and compatible with their obtained items, have become a hot topic in both academia and industry in recent years. Existing work has mainly focused on modeling the co-purchase relations between two items, but the compositional associations of item collections remain largely unexplored. Actually, when a user chooses complementary items for purchased products, it is intuitive that she will consider the visual semantic coherence (such as color collocations and texture compatibilities) in addition to global impressions. Towards this end, in this paper, we propose a novel Content Attentive Neural Network (CANN) to model comprehensive compositional coherence on both global contents and semantic contents. Specifically, we first propose a Global Coherence Learning (GCL) module based on multi-head attention to model global compositional coherence. Then, we generate semantic-focal representations from different semantic regions and design a Focal Coherence Learning (FCL) module to learn focal compositional coherence from the different semantic-focal representations. Finally, we optimize CANN with a novel compositional optimization strategy. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of CANN compared with several state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Multi-Task Self-Supervised Learning for Disfluency Detection

  • Shaolei Wang
  • Wanxiang Che
  • Qi Liu
  • Pengda Qin
  • Ting Liu
  • William Yang Wang

Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words; (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
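The pseudo-data construction the abstract describes can be sketched roughly as follows; the corruption probabilities and the exact insert/delete procedure here are illustrative assumptions, not taken from the paper:

```python
import random

def corrupt(sentence, vocab, p=0.15, seed=0):
    """Build one pseudo-disfluent training example by randomly inserting
    or deleting words, in the spirit of the self-supervised data
    construction above.  Returns (tokens, tags): tag 1 marks an inserted
    noise word that the tagging task should detect."""
    rng = random.Random(seed)
    tokens, tags = [], []
    for w in sentence.split():
        r = rng.random()
        if r < p:                    # insert a random noise word before w
            tokens.append(rng.choice(vocab))
            tags.append(1)
            tokens.append(w)
            tags.append(0)
        elif r < 2 * p:              # delete w entirely
            continue
        else:                        # keep w unchanged
            tokens.append(w)
            tags.append(0)
    return tokens, tags

toks, tags = corrupt("uh i want to book a flight to denver",
                     ["well", "um", "the"], p=0.2)
```

The sentence-classification task would then label any corrupted token sequence as "not original".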

AAAI Conference 2020 Conference Paper

Neural Cognitive Diagnosis for Intelligent Education Systems

  • Fei Wang
  • Qi Liu
  • Enhong Chen
  • Zhenya Huang
  • Yuying Chen
  • Yu Yin
  • Zai Huang
  • Shijin Wang

Cognitive diagnosis is a fundamental issue in intelligent education, which aims to discover the proficiency level of students on specific knowledge concepts. Existing approaches usually mine linear interactions of the student exercising process with manually designed functions (e.g., the logistic function), which is not sufficient for capturing the complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex exercising interactions, to obtain both accurate and interpretable diagnosis results. Specifically, we project students and exercises to factor vectors and leverage multiple neural layers to model their interactions, where the monotonicity assumption is applied to ensure the interpretability of both factors. Furthermore, we propose two implementations of NeuralCD by specializing the required concepts of each exercise, i.e., NeuralCDM with a traditional Q-matrix and the improved NeuralCDM+ exploring rich text content. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.
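A minimal sketch of a monotone interaction layer of the kind NeuralCD describes; shapes, layer sizes, and the use of absolute-value weights are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def neuralcd_forward(theta, beta, disc, W1, W2):
    """theta: student proficiency per concept; beta: exercise difficulty;
    disc: exercise discrimination (non-negative).  Keeping every network
    weight non-negative makes the output monotonically non-decreasing in
    theta, which is the interpretability (monotonicity) assumption."""
    x = disc * (theta - beta)            # factor-wise student-exercise interaction
    h = np.maximum(0.0, x @ np.abs(W1))  # ReLU layer with non-negative weights
    logit = float(h @ np.abs(W2))
    return 1.0 / (1.0 + np.exp(-logit))  # probability of a correct answer

# A more proficient student never gets a lower predicted probability:
rng = np.random.default_rng(0)
K = 4
beta, disc = rng.random(K), rng.random(K)
W1, W2 = rng.standard_normal((K, 8)), rng.standard_normal(8)
p_lo = neuralcd_forward(np.zeros(K), beta, disc, W1, W2)
p_hi = neuralcd_forward(np.ones(K), beta, disc, W1, W2)
```

Because the non-negative weights compose monotone functions, increasing any component of `theta` can only raise the predicted probability.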

AAAI Conference 2020 Conference Paper

Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution

  • Yang You
  • Yujing Lou
  • Qi Liu
  • Yu-Wing Tai
  • Lizhuang Ma
  • Cewu Lu
  • Weiming Wang

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand-new point-set learning framework, PRIN, namely Pointwise Rotation-Invariant Network, focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density-Aware Adaptive Sampling to deal with distorted point distributions in spherical space. In addition, we propose Spherical Voxel Convolution and Point Re-sampling to extract rotation-invariant features for each point. Our network can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. We show that, on datasets with randomly rotated point clouds, PRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide theoretical analysis of the rotation-invariance achieved by our methods.

NeurIPS Conference 2020 Conference Paper

Sampling-Decomposable Generative Adversarial Recommender

  • Binbin Jin
  • Defu Lian
  • Zheng Liu
  • Qi Liu
  • Jianhui Ma
  • Xing Xie
  • Enhong Chen

Recommendation techniques are important approaches for alleviating information overload. Often trained on implicit user feedback, many recommenders suffer from the sparsity challenge due to the lack of explicitly negative samples. GAN-style recommenders (i.e., IRGAN) address the challenge by learning a generator and a discriminator adversarially, such that the generator produces increasingly difficult samples for the discriminator, accelerating the optimization of the discrimination objective. However, producing samples from the generator is very time-consuming, and our empirical study shows that the discriminator performs poorly in top-k item recommendation. To this end, a theoretical analysis is made for the GAN-style algorithms, showing that a generator of limited capacity diverges from the optimal generator. This may explain the limitation of the discriminator's performance. Based on these findings, we propose a Sampling-Decomposable Generative Adversarial Recommender (SD-GAR). In this framework, the divergence between the generator and the optimum is compensated by self-normalized importance sampling, and the efficiency of sample generation is improved with a sampling-decomposable generator, such that each sample can be generated in O(1) time with the Vose-Alias method. Interestingly, due to the decomposability of sampling, the generator can be optimized with closed-form solutions in an alternating manner, differing from the policy gradient used in GAN-style algorithms. We extensively evaluate the proposed algorithm on five real-world recommendation datasets. The results show that SD-GAR outperforms IRGAN by 12.4% and the SOTA recommender by 10% on average. Moreover, discriminator training can be 20x faster on the dataset with more than 120K items.
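The O(1)-per-sample property the abstract attributes to the Vose-Alias method comes from a precomputed alias table; this is the standard construction, shown for reference, not code from SD-GAR itself:

```python
import random

def build_alias(probs):
    """Vose's alias method: O(n) table construction, after which each
    draw from the discrete distribution costs O(1)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob, alias = [0.0] * n, [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]            # donate mass to fill column s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                     # leftovers are exactly full columns
        prob[i] = 1.0
    return prob, alias

def draw(prob, alias, rng=random):
    """O(1): pick a column uniformly, then flip one biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

With more than 120K items, replacing an O(n) (or even O(log n)) sampler with this table is what makes per-sample generation constant-time.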

IJCAI Conference 2020 Conference Paper

Smart Contract Vulnerability Detection using Graph Neural Network

  • Yuan Zhuang
  • Zhenguang Liu
  • Peng Qian
  • Qi Liu
  • Xiang Wang
  • Qinming He

The security problems of smart contracts have drawn extensive attention due to the enormous financial losses caused by vulnerabilities. Existing methods on smart contract vulnerability detection heavily rely on fixed expert rules, leading to low detection accuracy. In this paper, we explore using graph neural networks (GNNs) for smart contract vulnerability detection. Particularly, we construct a contract graph to represent both syntactic and semantic structures of a smart contract function. To highlight the major nodes, we design an elimination phase to normalize the graph. Then, we propose a degree-free graph convolutional neural network (DR-GCN) and a novel temporal message propagation network (TMP) to learn from the normalized graphs for vulnerability detection. Extensive experiments show that our proposed approach significantly outperforms state-of-the-art methods in detecting three different types of vulnerabilities.

AAAI Conference 2019 Conference Paper

A Radical-Aware Attention-Based Model for Chinese Text Classification

  • Hanqing Tao
  • Shiwei Tong
  • Hongke Zhao
  • Tong Xu
  • Binbin Jin
  • Qi Liu

In recent years, Chinese text classification has attracted more and more research attention. However, most existing techniques, which specifically target English materials, may lose effectiveness on this task due to the huge differences between Chinese and English. Actually, as a special kind of hieroglyphics, Chinese characters and radicals are semantically useful but still unexplored for text classification. To that end, in this paper, we first analyze the motives for using multi-granularity features to represent a Chinese text by inspecting the characteristics of radicals, characters and words. To better represent Chinese text and then implement Chinese text classification, we propose a novel Radical-aware Attention-based Four-Granularity (RAFG) model to take full advantage of Chinese characters, words, character-level radicals, and word-level radicals simultaneously. Specifically, RAFG applies a serialized BLSTM structure which is context-aware and able to capture long-range information, modeling the character-sharing property of Chinese and the sequence characteristics of texts. Further, we design an attention mechanism to enhance the effects of radicals and thus model the radical-sharing property when integrating granularities. Finally, we conduct extensive experiments, where the experimental results not only show the superiority of our model, but also validate the effectiveness of radicals in the task of Chinese text classification.

AAAI Conference 2019 Conference Paper

Estimating the Days to Success of Campaigns in Crowdfunding: A Deep Survival Perspective

  • Binbin Jin
  • Hongke Zhao
  • Enhong Chen
  • Qi Liu
  • Yong Ge

Crowdfunding is an emerging mechanism for entrepreneurs or individuals to solicit funding from the public for their creative ideas. However, on these platforms, quite a large proportion of campaigns (projects) fail to raise enough money from backers' support by the declared expiration date, so it is urgent to predict the exact success time of campaigns. This problem has not been well explored due to a series of domain and technical challenges. In this paper, we observe that the distribution of backing behaviors, an implicit factor, has a positive impact on estimating the success time of a campaign. Therefore, we present a focused study on two specific tasks, i.e., backing distribution prediction and success time prediction of campaigns. Specifically, we propose a Seq2seq-based model with Multi-facet Priors (SMP), which can integrate heterogeneous features to jointly model the backing distribution and success time. Additionally, to keep the change of backing distributions smoother as backing behaviors accumulate, we develop a linear evolutionary prior for backing distribution prediction. Furthermore, due to the high failure rate, the success time of most campaigns is unobservable. We model this censoring phenomenon from a survival analysis perspective and also develop a non-increasing prior and a partial prior for success time prediction. Finally, we conduct extensive experiments on a real-world dataset from Indiegogo. Experimental results clearly validate the effectiveness of SMP.

IJCAI Conference 2019 Conference Paper

Explainable Fashion Recommendation: A Semantic Attribute Region Guided Approach

  • Min Hou
  • Le Wu
  • Enhong Chen
  • Zhi Li
  • Vincent W. Zheng
  • Qi Liu

In fashion recommender systems, each product usually consists of multiple semantic attributes (e.g., sleeves, collar, etc.). When making clothing decisions, people usually show preferences for different semantic attributes (e.g., clothes with a v-neck collar). Nevertheless, most previous fashion recommendation models comprehend clothing images with a global content representation and lack a detailed understanding of users' semantic preferences, which usually leads to inferior recommendation performance. To bridge this gap, we propose a novel Semantic Attribute Explainable Recommender System (SAERS). Specifically, we first introduce a fine-grained interpretable semantic space. We then develop a Semantic Extraction Network (SEN) and a Fine-grained Preferences Attention (FPA) module to project items and users into this space, respectively. With SAERS, we are capable of not only providing clothing recommendations for users, but also explaining why we recommend an item through intuitive visual attribute semantic highlights in a personalized manner. Extensive experiments conducted on real-world datasets clearly demonstrate the effectiveness of our approach compared with state-of-the-art methods.

NeurIPS Conference 2019 Conference Paper

Hyperbolic Graph Neural Networks

  • Qi Liu
  • Maximilian Nickel
  • Douwe Kiela

Learning from graph-structured data is an important task in machine learning and artificial intelligence, for which Graph Neural Networks (GNNs) have shown great promise. Motivated by recent advances in geometric representation learning, we propose a novel GNN architecture for learning representations on Riemannian manifolds with differentiable exponential and logarithmic maps. We develop a scalable algorithm for modeling the structural properties of graphs, comparing Euclidean and hyperbolic geometry. In our experiments, we show that hyperbolic GNNs can lead to substantial improvements on various benchmark datasets.
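The differentiable exponential and logarithmic maps mentioned here have simple closed forms at the origin of the Poincaré ball with curvature -1; this sketch shows those standard maps for reference, not the paper's full GNN architecture:

```python
import numpy as np

def exp0(v, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -1):
    carries a Euclidean tangent vector onto the manifold (norm < 1)."""
    n = np.linalg.norm(v) + eps
    return np.tanh(n) * (v / n)

def log0(x, eps=1e-9):
    """Logarithmic map at the origin: the inverse of exp0, carrying a
    point of the ball back to the tangent space."""
    n = np.linalg.norm(x) + eps
    return np.arctanh(n) * (x / n)
```

A hyperbolic GNN layer can then aggregate in the (Euclidean) tangent space via `log0`, apply ordinary linear maps, and return to the manifold via `exp0`.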

AAAI Conference 2019 Conference Paper

Interactive Attention Transfer Network for Cross-Domain Sentiment Classification

  • Kai Zhang
  • Hefu Zhang
  • Qi Liu
  • Hongke Zhao
  • Hengshu Zhu
  • Enhong Chen

Cross-domain sentiment classification refers to utilizing useful knowledge from a source domain to help sentiment classification in a target domain which has few or no labeled data. Most existing methods mainly concentrate on extracting common features between domains. Unfortunately, they cannot fully consider the effects of the aspect information of sentences (e.g., the battery life in a review of an electronic product). To better solve this problem, we propose an Interactive Attention Transfer Network (IATN) for cross-domain sentiment classification. IATN provides an interactive attention transfer mechanism, which can better transfer sentiment across domains by incorporating information from both sentences and aspects. Specifically, IATN comprises two attention networks: one identifies the common features between domains through domain classification, and the other extracts information from the aspects by using the common features as a bridge. Then, we conduct interactive attention learning for these two networks so that both the sentences and the aspects can influence the final sentiment representation. Extensive experiments on the Amazon reviews dataset and a crowdfunding reviews dataset not only demonstrate the effectiveness and universality of our method, but also give an interpretable way to track the attention information for sentiment.

AAAI Conference 2019 Conference Paper

Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective

  • Chengqiang Lu
  • Qi Liu
  • Chao Wang
  • Zhenya Huang
  • Peize Lin
  • Lixin He

Predicting molecular properties (e.g., atomization energy) is an essential issue in quantum chemistry, and it could speed up much research progress, such as drug design and substance discovery. Traditional approaches based on density functional theory (DFT) in physics have proven time-consuming for predicting large numbers of molecules. Recently, machine learning methods, which incorporate much rule-based information, have also shown potential for this issue. However, the complex inherent quantum interactions of molecules are still largely underexplored by existing solutions. In this paper, we propose a generalizable and transferable Multilevel Graph Convolutional neural Network (MGCN) for molecular property prediction. Specifically, we represent each molecule as a graph to preserve its internal structure. Moreover, the well-designed hierarchical graph neural network directly extracts features from the conformation and spatial information, followed by the multilevel interactions. As a consequence, the multilevel overall representations can be utilized to make the prediction. Extensive experiments on datasets of both equilibrium and off-equilibrium molecules demonstrate the effectiveness of our model. Furthermore, the detailed results also prove that MGCN is generalizable and transferable for the prediction.

IJCAI Conference 2019 Conference Paper

Multi-Group Encoder-Decoder Networks to Fuse Heterogeneous Data for Next-Day Air Quality Prediction

  • Yawen Zhang
  • Qin Lv
  • Duanfeng Gao
  • Si Shen
  • Robert Dick
  • Michael Hannigan
  • Qi Liu

Accurate next-day air quality prediction is essential to enable warning and prevention measures for cities and individuals to cope with potential air pollution, such as vehicle restriction, factory shutdown, and limiting outdoor activities. The problem is challenging because air quality is affected by a diverse set of complex factors. There has been prior work on short-term (e.g., next 6 hours) prediction; however, there is limited research on modeling local weather influences or fusing heterogeneous data for next-day air quality prediction. This paper tackles this problem through three key contributions: (1) we leverage multi-source data, especially high-frequency grid-based weather data, to model air pollutant dynamics at the station level; (2) we add convolution operators on grid weather data to capture the impacts of various weather parameters on air pollutant variations; and (3) we automatically group (cross-domain) features based on their correlations, and propose multi-group Encoder-Decoder networks (MGED-Net) to effectively fuse multiple feature groups for next-day air quality prediction. The experiments with real-world data demonstrate the improved prediction performance of MGED-Net over state-of-the-art solutions (4.2% to 9.6% improvement in MAE and 9.2% to 16.4% improvement in RMSE).

NeurIPS Conference 2019 Conference Paper

Quaternion Knowledge Graph Embeddings

  • Shuai Zhang
  • Yi Tay
  • Lina Yao
  • Qi Liu

In this work, we move beyond traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings. More specifically, quaternion embeddings, hypercomplex-valued embeddings with three imaginary components, are utilized to represent entities, and relations are modelled as rotations in quaternion space. The advantages of the proposed approach are: (1) latent inter-dependencies (between all components) are aptly captured with the Hamilton product, encouraging a more compact interaction between entities and relations; (2) quaternions enable expressive rotation in four-dimensional space and have more degrees of freedom than rotation in the complex plane; (3) the proposed framework is a generalization of ComplEx on hypercomplex space while offering better geometrical interpretations, concurrently satisfying the key desiderata of relational representation learning (i.e., modeling symmetry, anti-symmetry and inversion). Experimental results demonstrate that our method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks.
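The Hamilton product underlying these quaternion embeddings is fully determined by the identities i² = j² = k² = ijk = -1; a direct implementation (illustrative, not the authors' code):

```python
import numpy as np

def hamilton(q, p):
    """Hamilton product of quaternions (a, b, c, d) = a + bi + cj + dk.
    In quaternion embeddings of this kind, a normalized relation
    quaternion rotates the head-entity quaternion via this product."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,   # k component
    ])
```

Since |qp| = |q||p|, multiplying by a unit quaternion preserves norms, which is the sense in which relations act as rotations in four-dimensional space.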

AAAI Conference 2018 Conference Paper

Confidence-Aware Matrix Factorization for Recommender Systems

  • Chao Wang
  • Qi Liu
  • Runze Wu
  • Enhong Chen
  • Chuanren Liu
  • Xunpeng Huang
  • Zhenya Huang

Collaborative filtering (CF), particularly matrix factorization (MF) based methods, have been widely used in recommender systems. The literature has reported that matrix factorization methods often produce superior accuracy of rating prediction in recommender systems. However, existing matrix factorization methods rarely consider confidence of the rating prediction and thus cannot support advanced recommendation tasks. In this paper, we propose a Confidence-aware Matrix Factorization (CMF) framework to simultaneously optimize the accuracy of rating prediction and measure the prediction confidence in the model. Specifically, we introduce variance parameters for both users and items in the matrix factorization process. Then, prediction interval can be computed to measure confidence for each predicted rating. These confidence quantities can be used to enhance the quality of recommendation results based on Confidence-aware Ranking (CR). We also develop two effective implementations of our framework to compute the confidence-aware matrix factorization for large-scale data. Finally, extensive experiments on three real-world datasets demonstrate the effectiveness of our framework from multiple perspectives.
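One standard way to turn learned user and item variance parameters into a prediction interval is shown below; this is a hedged sketch assuming additive Gaussian noise, and the paper's exact formulation may differ:

```python
import math

def prediction_interval(r_hat, var_user, var_item, z=1.96):
    """If a predicted rating r_hat carries additive user- and
    item-specific variance terms, an approximately 95% Gaussian
    prediction interval follows from the combined standard deviation."""
    sd = math.sqrt(var_user + var_item)
    return r_hat - z * sd, r_hat + z * sd

# A narrower interval marks a more confident prediction, which a
# confidence-aware ranking can then prioritize.
lo, hi = prediction_interval(3.5, 0.04, 0.05)
```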

NeurIPS Conference 2018 Conference Paper

Constrained Graph Variational Autoencoders for Molecule Design

  • Qi Liu
  • Miltiadis Allamanis
  • Marc Brockschmidt
  • Alexander Gaunt

Graphs are ubiquitous data structures for representing interactions between entities. With an emphasis on applications in chemistry, we explore the task of learning to generate graphs that conform to a distribution observed in training data. We propose a variational autoencoder model in which both encoder and decoder are graph-structured. Our decoder assumes a sequential ordering of graph extension steps and we discuss and analyze design choices that mitigate the potential downsides of this linearization. Experiments compare our approach with a wide range of baselines on the molecule generation task and show that our method is successful at matching the statistics of the original dataset on semantically important metrics. Furthermore, we show that by using appropriate shaping of the latent space, our model allows us to design molecules that are (locally) optimal in desired properties.

AAAI Conference 2018 Conference Paper

Exercise-Enhanced Sequential Modeling for Student Performance Prediction

  • Yu Su
  • Qingwen Liu
  • Qi Liu
  • Zhenya Huang
  • Yu Yin
  • Enhong Chen
  • Chris Ding
  • Si Wei

In online education systems, to offer proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented by its manually labeled knowledge concepts, and the richer information contained in the text descriptions of exercises is still underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction that takes full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description without any expert labeling or information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) in their sequential exercising process in combination with the exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with the Markov property and EERNNA with an Attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of the EERNN framework. Moreover, by incorporating exercise correlations, EERNN can well handle the cold-start problem from both the student and exercise perspectives.

TIST Journal 2018 Journal Article

Fuzzy Cognitive Diagnosis for Modelling Examinee Performance

  • Qi Liu
  • Runze Wu
  • Enhong Chen
  • Guandong Xu
  • Yu Su
  • Zhigang Chen
  • Guoping Hu

Recent decades have witnessed the rapid growth of educational data mining (EDM), which aims at automatically extracting valuable information from large repositories of data generated by or related to people's learning activities in educational settings. One of the key EDM tasks is cognitive modelling with examination data, and cognitive modelling tries to profile examinees by discovering their latent knowledge state and cognitive level (e.g., the proficiency of specific skills). However, to the best of our knowledge, the problem of extracting information from both objective and subjective examination problems to achieve more precise and interpretable cognitive analysis remains underexplored. To this end, we propose a fuzzy cognitive diagnosis framework (FuzzyCDF) for examinees' cognitive modelling with both objective and subjective problems. Specifically, to handle the partially correct responses on subjective problems, we first fuzzify the skill proficiency of examinees. Then we combine fuzzy set theory and educational hypotheses to model the examinees' mastery on the problems based on their skill proficiency. Finally, we simulate the generation of examination scores on each problem by considering slip and guess factors. In this way, the whole diagnosis framework is built. For further comprehensive verification, we apply our FuzzyCDF to three classical cognitive assessment tasks, i.e., predicting examinee performance, slip and guess detection, and cognitive diagnosis visualization. Extensive experiments on three real-world datasets for these assessment tasks prove that FuzzyCDF can reveal the knowledge states and cognitive level of examinees effectively and interpretably.
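The slip-and-guess score generation described above follows the classical response model used throughout cognitive diagnosis (the DINA family), here with mastery allowed to be fuzzy; a minimal sketch:

```python
def correct_prob(mastery, slip, guess):
    """Slip-and-guess response model: an examinee who has mastered the
    required skills answers correctly unless they slip, while one who
    has not may still guess right.  `mastery` may be fuzzy, i.e. any
    value in [0, 1] rather than binary."""
    return (1.0 - slip) * mastery + guess * (1.0 - mastery)
```

For example, with slip 0.1 and guess 0.2, a full master answers correctly with probability 0.9 and a complete non-master with probability 0.2; fuzzy mastery interpolates between the two.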

AAAI Conference 2018 Conference Paper

Multi-Modal Multi-Task Learning for Automatic Dietary Assessment

  • Qi Liu
  • Yue Zhang
  • Zhenguang Liu
  • Ye Yuan
  • Li Cheng
  • Roger Zimmermann

We investigate the task of automatic dietary assessment: given meal images and descriptions uploaded by real users, our task is to automatically rate the meals and deliver advisory comments for improving users’ diets. To address this practical yet challenging problem, which is multi-modal and multi-task in nature, an end-to-end neural model is proposed. In particular, comprehensive meal representations are obtained from images, descriptions and user information. We further introduce a novel memory network architecture to store meal representations and reason over the meal representations to support predictions. Results on a real-world dataset show that our method outperforms two strong image captioning baselines significantly.

IJCAI Conference 2018 Conference Paper

Patent Litigation Prediction: A Convolutional Tensor Factorization Approach

  • Qi Liu
  • Han Wu
  • Yuyang Ye
  • Hongke Zhao
  • Chuanren Liu
  • Dongfang Du

Patent litigation is an expensive legal process faced by many companies. To reduce the cost of patent litigation, one effective approach is proactive management based on predictive analysis. However, automatic prediction of patent litigation is still an open problem due to the complexity of lawsuits. In this paper, we propose a data-driven framework, Convolutional Tensor Factorization (CTF), to identify the patents that may cause litigation between two companies. Specifically, CTF is a hybrid modeling approach, where the content features of the patents are represented by a Network embedding-combined Convolutional Neural Network (NCNN) and the lawsuit records of companies are summarized in a tensor, respectively. Then, CTF integrates NCNN and tensor factorization to systematically exploit both content information and collaborative information from large amounts of data. Finally, the risky patents are returned by a learning-to-rank strategy. Extensive experimental results on real-world data demonstrate the effectiveness of our framework.

AAAI Conference 2017 Conference Paper

A Context-Enriched Neural Network Method for Recognizing Lexical Entailment

  • Kun Zhang
  • Enhong Chen
  • Qi Liu
  • Chuanren Liu
  • Guangyi Lv

Recognizing lexical entailment (RLE) always plays an important role in natural language inference, i.e., identifying whether one word entails another; for example, fox entails animal. In the literature, automatically recognizing lexical entailment for word pairs relies deeply on the words' contextual representations. However, as a "prototype" vector, a single representation cannot reveal the multifaceted aspects of words due to their homonymy and polysemy. In this paper, we propose a supervised Context-Enriched Neural Network (CENN) method for recognizing lexical entailment. To be specific, we first utilize multiple embedding vectors from different contexts to represent the input word pairs. Then, through different combination methods and an attention mechanism, we integrate the different embedding vectors and optimize their weights to predict whether there are entailment relations in word pairs. Moreover, our proposed framework is flexible and open to handling different word contexts and entailment perspectives in the text corpus. Extensive experiments on five datasets show that our approach significantly improves the performance of automatic RLE in comparison with several state-of-the-art methods.

IJCAI Conference 2017 Conference Paper

Enhancing Campaign Design in Crowdfunding: A Product Supply Optimization Perspective

  • Qi Liu
  • Guifeng Wang
  • Hongke Zhao
  • Chuanren Liu
  • Tong Xu
  • Enhong Chen

Crowdfunding is an emerging Internet application in which creators design campaigns (projects) to collect funds from public investors. Usually, the limited budget of the creator is manually divided into several perks (reward options), which should fit various market demands and in turn bring different monetary contributions to the campaign. Therefore, it is very challenging for each creator to design an effective campaign. To this end, in this paper, we aim to enhance the funding performance of newly proposed campaigns, with a focus on optimizing the product supply of perks. Specifically, given the expected budget and the perks of a campaign, we propose a novel solution to automatically recommend the optimal product supply for every perk, balancing the expected return of the campaign against its risk. Along this line, we formulate it as a constrained portfolio selection problem, where the risk of each campaign is measured by a multi-task learning method. Finally, experimental results on real-world crowdfunding data clearly show that the optimized product supply can significantly improve campaign performance, and meanwhile, our multi-task learning method can more precisely estimate the risk of each campaign.
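The constrained portfolio view described above can be written, in a standard Markowitz-style form (the exact constraints and risk estimator used in the paper may differ from this sketch), as:

```latex
\max_{x \ge 0} \;\; \mu^{\top} x \;-\; \lambda \, x^{\top} \Sigma \, x
\quad \text{s.t.} \quad \sum_{j} c_j \, x_j \le B,
```

where $x_j$ is the product supply allocated to perk $j$, $\mu$ the expected returns, $\Sigma$ the estimated risk covariance (here, learned via multi-task learning), $c_j$ the unit cost of perk $j$, $B$ the budget, and $\lambda$ the risk-aversion trade-off.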

IJCAI Conference 2017 Conference Paper

Incremental Matrix Factorization: A Linear Feature Transformation Perspective

  • Xunpeng Huang
  • Le Wu
  • Enhong Chen
  • Hengshu Zhu
  • Qi Liu
  • Yijun Wang

Matrix Factorization (MF) is among the most widely used techniques for collaborative filtering based recommendation. Along this line, a critical demand is to incrementally refine the MF models when new ratings come in an online scenario. However, most of existing incremental MF algorithms are limited by specific MF models or strict use restrictions. In this paper, we propose a general incremental MF framework by designing a linear transformation of user and item latent vectors over time. This framework shows a relatively high accuracy with a computation and space efficient training process in an online scenario. Meanwhile, we explain the framework with a low-rank approximation perspective, and give an upper bound on the training error when this framework is used for incremental learning in some special cases. Finally, extensive experimental results on two real-world datasets clearly validate the effectiveness, efficiency and storage performance of the proposed framework.

TIST Journal 2017 Journal Article

P2P Lending Survey

  • Hongke Zhao
  • Yong Ge
  • Qi Liu
  • Guifeng Wang
  • Enhong Chen
  • Hefu Zhang

P2P lending is an emerging Internet-based application where individuals can directly borrow money from each other. The past decade has witnessed the rapid development and prevalence of online P2P lending platforms, examples of which include Prosper, LendingClub, and Kiva. Meanwhile, extensive research has been done that mainly focuses on the studies of platform mechanisms and transaction data. In this article, we provide a comprehensive survey of the research on P2P lending, which, to the best of our knowledge, is the first focused effort in this field. Specifically, we first provide a systematic taxonomy for P2P lending by summarizing the different types of mainstream platforms and comparing their working mechanisms in detail. Then, we review and organize the recent advances in P2P lending from various perspectives (e.g., the economics and sociology perspective, and the data-driven perspective). Finally, we offer our opinions on the prospects of P2P lending and suggest some future research directions in this field. Meanwhile, throughout this article, some analyses of real-world data collected from Prosper and Kiva are also conducted.

AAAI Conference 2017 Conference Paper

Question Difficulty Prediction for READING Problems in Standard Tests

  • Zhenya Huang
  • Qi Liu
  • Enhong Chen
  • Hongke Zhao
  • Mingyong Gao
  • Si Wei
  • Yu Su
  • Guoping Hu

Standard tests aim to evaluate the performance of examinees using different tests with consistent difficulties. Thus, a critical demand is to predict the difficulty of each test question before the test is conducted. Existing studies are usually based on the judgments of education experts (e.g., teachers), which may be subjective and labor-intensive. In this paper, we propose a novel Test-aware Attention-based Convolutional Neural Network (TACNN) framework to automatically solve this Question Difficulty Prediction (QDP) task for READING problems (a typical problem style in English tests) in standard tests. Specifically, given the abundant historical test logs and text materials of questions, we first design a CNN-based architecture to extract sentence representations for the questions. Then, we utilize an attention strategy to quantify the difficulty contribution of each sentence to a question. Considering the incomparability of question difficulties across different tests, we propose a test-dependent pairwise strategy for training TACNN and generating the difficulty prediction value. Extensive experiments on a real-world dataset not only show the effectiveness of TACNN, but also give interpretable insights by tracking the attention information for questions.

AAAI Conference 2016 Conference Paper

Modeling Users’ Preferences and Social Links in Social Networking Services: A Joint-Evolving Perspective

  • Le Wu
  • Yong Ge
  • Qi Liu
  • Enhong Chen
  • Bai Long
  • Zhenya Huang

Researchers have long agreed that the evolution of a Social Networking Service (SNS) platform is driven by the interplay between users’ preferences (reflected in user-item consumption behavior) and the social network structure (reflected in user-user interaction behavior), with both kinds of user behavior changing over time. However, traditional approaches either modeled these two kinds of behavior in isolation or relied on a static assumption about the SNS. Thus, it is still unclear how users’ historical preferences and the dynamic social network structure jointly affect the evolution of SNSs. Furthermore, can jointly modeling users’ temporal behaviors in SNSs benefit both behavior prediction tasks? In this paper, we leverage the underlying social theories (i.e., social influence and the homophily effect) to investigate the interplay and evolution of SNSs. We propose a probabilistic approach that fuses these social theories for jointly modeling users’ temporal behaviors in SNSs, so the proposed model has both explanatory ability and predictive power. Experimental results on two real-world datasets demonstrate the effectiveness of our proposed model.

IJCAI Conference 2016 Conference Paper

Natural Supervised Hashing

  • Qi Liu
  • Hongtao Lu

Among learning-based hashing methods, supervised hashing tries to find hash codes that preserve the semantic similarities of the original data. Recent years have witnessed much effort devoted to designing objective functions and optimization methods for supervised hashing, in order to improve search accuracy and reduce training cost. In this paper, we propose a very straightforward supervised hashing algorithm and demonstrate its superiority over several state-of-the-art methods. The key idea of our approach is to treat label vectors as binary codes and to learn target codes that have a similar structure to the label vectors. To circumvent direct optimization on large Gram matrices, we identify an inner-product-preserving transformation and use it to bring label vectors and hash codes close together without changing the structure. The optimization process is very efficient and scales well. In our experiments, training 16-bit and 96-bit codes on NUS-WIDE cost only 3 and 6 minutes, respectively.
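
The "labels as target codes" idea can be illustrated in its simplest form: encode each label as a ±1 indicator vector and fit a linear hash function toward it by ridge regression. This is a toy sketch in which the code length equals the number of classes; the paper's inner-product-preserving transformation for longer codes is not shown, and all data here is synthetic:

```python
import numpy as np

def learn_hash(X, labels, n_classes, reg=1.0):
    """Treat +/-1 label-indicator vectors as target codes and fit a
    linear hash function toward them by ridge regression."""
    Xb = np.hstack([X, np.ones((len(X), 1))])       # append a bias column
    Y = -np.ones((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0         # +/-1 label codes
    return np.linalg.solve(Xb.T @ Xb + reg * np.eye(Xb.shape[1]), Xb.T @ Y)

def encode(X, W):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ W)                          # one bit per class

# Synthetic 3-class data around well-separated centers.
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
labels = rng.integers(0, 3, size=300)
X = centers[labels] + rng.normal(scale=0.5, size=(300, 2))
W = learn_hash(X, labels, n_classes=3)
codes = encode(X, W)    # points of the same class should share a code
```

Because the targets are the label vectors themselves, points of the same class collapse onto (nearly) identical codes, which is exactly the structure-preservation property the abstract describes.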

AAAI Conference 2016 Conference Paper

Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

  • Guangyi Lv
  • Tong Xu
  • Enhong Chen
  • Qi Liu
  • Yi Zheng

Recent years have witnessed the boom of online media-content sharing, which raises significant challenges for effective management and retrieval. Though a large amount of effort has been made, precise retrieval of video shots on certain topics has been largely ignored. At the same time, thanks to the popularity of novel time-sync comments, or so-called “bullet-screen comments”, video semantics can now be combined with timestamps to support further research on temporal video labeling. In this paper, we propose a novel video understanding framework to assign temporal labels to highlighted video shots. Specifically, due to the informal expression of bullet-screen comments, we first propose a temporal deep structured semantic model (T-DSSM) to represent comments as semantic vectors by taking advantage of their temporal correlation. Then, video highlights are recognized and labeled via these semantic vectors in a supervised way. Extensive experiments on a real-world dataset prove that our framework can label video highlights with a significant margin over baselines, which clearly validates the potential of our framework for video understanding as well as bullet-screen comment interpretation.

TIST Journal 2016 Journal Article

Relevance Meets Coverage

  • Le Wu
  • Qi Liu
  • Enhong Chen
  • Nicholas Jing Yuan
  • Guangming Guo
  • Xing Xie

Collaborative filtering (CF) models offer users personalized recommendations by measuring the relevance between the active user and each individual candidate item. Following this idea, user-based collaborative filtering (UCF) usually selects the locally popular items from the like-minded neighbor users. However, these traditional relevance-based models only consider the individuals (i.e., each neighbor user and candidate item) separately during neighbor set selection and recommendation set generation, thus usually incurring highly similar recommendations that lack diversity. While many researchers have recognized the importance of diversified recommendations, the proposed solutions either needed additional semantic information about items or sacrificed accuracy in the process. In this article, we describe how to generate both accurate and diversified recommendations from a new perspective. Along this line, we first introduce a simple measure of coverage that quantifies the usefulness of the whole set, that is, the neighbor user set and the recommended item set as a complete entity. Then we propose a recommendation framework named REC that considers both traditional relevance-based scores and the new coverage measure based on UCF. Under REC, we further prove that the goals of maximizing relevance and coverage measures simultaneously in both the neighbor set selection step and the recommendation set generation step are NP-hard. Luckily, we can solve them effectively and efficiently by exploiting the inherent submodular property. Furthermore, we generalize the coverage notion and the REC framework from both a data perspective and an algorithm perspective. Finally, extensive experimental results on three real-world datasets show that the REC-based recommendation models can naturally generate more diversified recommendations without decreasing accuracy compared to some state-of-the-art models.

IROS Conference 2015 Conference Paper

Adaptive motor patterns and reflexes for bipedal locomotion on rough terrain

  • Qi Liu
  • Jie Zhao 0001
  • Steffen Schütz
  • Karsten Berns

The Bio-inspired Behavior-Based Bipedal Locomotion Control (B4LC) system consists of control units encapsulating feed-forward and feedback mechanisms, namely motor patterns and reflexes. To optimize the performance of motor patterns and reflexes for stable locomotion on both even and uneven terrain, we present a learning scheme embedded in the B4LC system. Combining the Particle Swarm Optimization (PSO) method and the Expectation-Maximization-based Reinforcement Learning (EM-RL) method, the learning unit comprises an optimization module and a learning module embedded in the hierarchical control structure. The optimization module optimizes the motor patterns at the hip and ankle joints with respect to energy consumption, stability, and velocity control. The learning module generates compensating torques against disturbances at the ankle joints by combining basis functions derived from state information with the policy parameters. The optimization and learning procedures are conducted on a simulated robot with 21 DoFs. The simulation results show that the robot with optimized motor patterns and learned reflexes achieves more robust and stable locomotion on even and uneven terrain.

IJCAI Conference 2015 Conference Paper

Cognitive Modelling for Predicting Examinee Performance

  • Runze Wu
  • Qi Liu
  • Yuping Liu
  • Enhong Chen
  • Yu Su
  • Zhigang Chen
  • Guoping Hu

Cognitive modelling can discover the latent characteristics of examinees for predicting their performance (i.e., scores) on each problem. As cognitive modelling is important for numerous applications, e.g., personalized remedy recommendation, some solutions have been designed in the literature. However, the problem of extracting information from both objective and subjective problems to obtain more precise and interpretable cognitive analysis is still underexplored. To this end, we propose a fuzzy cognitive diagnosis framework (FuzzyCDF) for examinees’ cognitive modelling with both objective and subjective problems. Specifically, to handle the partially correct responses on subjective problems, we first fuzzify the skill proficiency of examinees. Then, we combine fuzzy set theory and educational hypotheses to model the examinees’ mastery of the problems. Further, we simulate the generation of examination scores by considering both slip and guess factors. Extensive experiments on three real-world datasets prove that FuzzyCDF can predict examinee performance more effectively, and that its output is also interpretable.
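
The slip-and-guess scoring step can be illustrated with a DINA-style formula: a fuzzy mastery level is aggregated from the proficiencies of the required skills, then blended with slip and guess probabilities. This is a minimal sketch with illustrative numbers; the paper's full generative model, including partial credit on subjective problems, is not reproduced here:

```python
import numpy as np

def mastery(proficiency, q_row, conjunctive=True):
    """Fuzzy mastery of a problem: intersection (min) over the required
    skills for conjunctive skills, union (max) for compensatory ones."""
    required = proficiency[q_row.astype(bool)]
    return required.min() if conjunctive else required.max()

def p_correct(eta, slip, guess):
    """Probability of a correct response given mastery eta,
    with slip and guess factors."""
    return (1 - slip) * eta + guess * (1 - eta)

prof = np.array([0.9, 0.4, 0.8])   # fuzzy proficiency on three skills
q = np.array([1, 1, 0])            # the problem requires skills 0 and 1
eta = mastery(prof, q)             # limited by the weakest skill: 0.4
prob = p_correct(eta, slip=0.1, guess=0.2)   # 0.9*0.4 + 0.2*0.6 = 0.48
```

Fuzzifying proficiency makes eta continuous in [0, 1], which is what allows partially correct responses to be modeled rather than only right/wrong outcomes.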

IJCAI Conference 2015 Conference Paper

Matrix Factorization with Scale-Invariant Parameters

  • Guangxiang Zeng
  • Hengshu Zhu
  • Qi Liu
  • Ping Luo
  • Enhong Chen
  • Tong Zhang

Tuning hyper-parameters for large-scale matrix factorization (MF) is very time-consuming and sometimes unacceptable. Intuitively, we would like to tune hyper-parameters on a small sub-matrix sample and then apply them to the original large-scale matrix. However, most existing MF methods are scale-variant, meaning the optimal hyper-parameters usually change with the scale of the matrix. To this end, in this paper we propose a scale-invariant parametric MF method, where a set of scale-invariant parameters is defined for model-complexity regularization. The proposed method thus frees us from tuning hyper-parameters on the large-scale matrix and achieves good performance more efficiently. Extensive experiments on real-world datasets clearly validate both the effectiveness and the efficiency of our method.

IJCAI Conference 2015 Conference Paper

Maximizing the Coverage of Information Propagation in Social Networks

  • Zhefeng Wang
  • Enhong Chen
  • Qi Liu
  • Yu Yang
  • Yong Ge
  • Biao Chang

Social networks, due to their popularity, have been studied extensively in recent years. A rich body of this work concerns influence maximization, which aims to select a set of seed nodes that maximizes the expected number of active nodes at the end of the propagation process. However, the set of active nodes cannot fully represent the true coverage of information propagation: a node may be informed of the information when any of its neighbours becomes active and tries to activate it, even though this node (namely, an informed node) remains inactive. Therefore, we need to consider both active nodes and informed nodes that are aware of the information when studying the coverage of information propagation in a network. Along this line, in this paper we propose a new problem called Information Coverage Maximization, which aims to maximize the expected number of both active and informed nodes. After proving that this problem is NP-hard and that its objective is submodular under the independent cascade model, we design two algorithms to solve it. Extensive experiments on three real-world data sets demonstrate the performance of the proposed algorithms.
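
The active-versus-informed distinction can be made concrete with a Monte-Carlo simulation of the independent cascade model that counts both kinds of nodes. This is a generic sketch of the coverage objective, not the paper's algorithms; the graph and parameters are toy assumptions:

```python
import random

def ic_coverage(graph, seeds, p=0.2, runs=2000, rng=None):
    """Monte-Carlo estimate of information coverage under the independent
    cascade model: expected number of active + informed nodes.
    graph: dict mapping node -> list of neighbours."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(runs):
        active, informed, frontier = set(seeds), set(), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph[u]:
                    if v in active:
                        continue
                    informed.add(v)            # v hears about it either way
                    if rng.random() < p:       # activation succeeds w.p. p
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active | informed)
    return total / runs

# Line graph 0-1-2-3 seeded at node 0; with p=0.5 the expected coverage
# is 1 + 1 + 0.5 + 0.25 = 2.75, larger than the expected active count.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
est = ic_coverage(line, seeds=[0], p=0.5)
```

Note that node 1 is always informed even when its activation fails, which is exactly why coverage exceeds the classic influence spread.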

ICRA Conference 2014 Conference Paper

Experimental verification of an approach for disturbance estimation and compensation on a simulated biped during perturbed stance

  • Jie Zhao 0001
  • Qi Liu
  • Steffen Schütz
  • Karsten Berns

Humans show remarkable skill in reactive balance control against unknown disturbances while standing and walking. Though current bipedal robots can walk, run, and step over obstacles, they normally perform in a well-controlled environment. Unexpected perturbations can cause bipedal robots to tumble, as they possess only a limited capability to reject disturbances. Studies of the neurology and psychophysics of human stance attempt to trace how humans deal with external disturbances. This paper introduces a methodology for disturbance estimation and compensation (DEC) for a bipedal robot during standing. Previous psychophysical studies of human self-motion perception indicate that humans estimate and compensate disturbances as follows: first, multi-sensory inputs are fused to provide explicit measures, and estimates of the external disturbances are computed from them; these estimates are then fed into a local feedback control loop that compensates the disturbances. An approach for disturbance estimation and compensation is thus developed according to these psychophysical aspects of human balance. Various experiments are implemented, for instance standing on a rotating plate with varying frequencies and amplitudes, and applying continuous external contact forces to the torso of a bipedal robot. By analyzing and verifying the experimental results on a simulated biped, one can state that the DEC approach successfully serves the bipedal robot during perturbed stance.

TIST Journal 2014 Journal Article

Object-Oriented Travel Package Recommendation

  • Chang Tan
  • Qi Liu
  • Enhong Chen
  • Hui Xiong
  • Xiang Wu

Providing better travel services for tourists is one of the important applications in urban computing. Though many recommender systems have been developed to enhance the quality of travel service, most of them lack a systematic and open framework for dynamically incorporating the multiple types of additional context information that exist in the tourism domain, such as the travel area, season, and price of travel packages. To that end, in this article, we propose an open framework, the Object-Oriented Recommender System (ORS), for developers performing personalized travel package recommendation for tourists. This framework can import all the available additional context information into the travel package recommendation process in a cost-effective way. Specifically, the different types of additional information are extracted and uniformly represented as feature-value pairs. Then, we define the Object, which is the collection of these feature-value pairs. We propose two models that can be used in the ORS framework for extracting the implicit relationships among Objects. The Object-Oriented Topic Model (OTM) can extract the topics conditioned on the intrinsic feature-value pairs of the Objects. The Object-Oriented Bayesian Network (OBN) can effectively infer the co-travel probability of two tourists by counting the co-occurrences of feature-value pairs belonging to different kinds of Objects. Based on the relationships mined by OTM or OBN, the recommendation list is generated by the collaborative filtering method. Finally, we evaluate these two models and the ORS framework on real-world travel package data, and the experimental results show that the ORS framework is more flexible in incorporating additional context information and thus leads to better performance for travel package recommendation. Meanwhile, for feature selection in ORS, we define the feature information entropy, and the experimental results demonstrate that using features with lower entropies usually leads to better recommendation results.

IJCAI Conference 2013 Conference Paper

PageRank with Priors: An Influence Propagation Perspective

  • Biao Xiang
  • Qi Liu
  • Enhong Chen
  • Hui Xiong
  • Yi Zheng
  • Yu Yang

Recent years have witnessed increased interest in measuring authority and modelling influence in social networks. For a long time, PageRank has been widely used for authority computation and has also been adopted as a solid baseline for evaluating social-influence-related applications. However, the connection between authority measurement and influence modelling has not been clearly established. To this end, in this paper, we provide a focused study on the understanding of PageRank as well as the relationship between PageRank and social influence analysis. Along this line, we first propose a linear social influence model and reveal that this model is essentially PageRank with priors. We also show that the authority computation by PageRank can be enhanced with more generalized priors. Moreover, to deal with the computational challenge of PageRank with general priors, we provide an upper bound for identifying the top authoritative nodes. Finally, experimental results on a scientific collaboration network validate the effectiveness of the proposed social influence model.
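
PageRank with a prior is the familiar personalized-PageRank recursion, where the teleport vector plays the role of the prior. A generic power-iteration sketch (the toy graph and parameter values are illustrative, not the paper's linear influence model):

```python
import numpy as np

def pagerank_with_prior(adj, prior, alpha=0.85, iters=200):
    """Power iteration for PageRank with a prior (teleport) vector:
    r <- alpha * P^T r + (1 - alpha) * prior, with P row-stochastic."""
    P = adj / adj.sum(axis=1, keepdims=True)   # assumes no dangling rows
    prior = prior / prior.sum()
    r = prior.copy()
    for _ in range(iters):
        r = alpha * P.T @ r + (1 - alpha) * prior
    return r

# 4-node toy graph (row u lists u's out-links).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 0, 0]], dtype=float)
uniform = pagerank_with_prior(adj, np.ones(4))                 # classic PageRank
biased = pagerank_with_prior(adj, np.array([1.0, 0.0, 0.0, 0.0]))
```

Concentrating the prior on one node shifts authority mass toward that node and its out-neighbourhood, which is the sense in which generalized priors turn PageRank into an influence measure.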