Arrow Research

Author name cluster

Kaiqi Huang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers (40)

AAAI Conference 2026 Conference Paper

CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos

  • Xuchen Li
  • Xuzhao Li
  • Shiyu Hu
  • Kaiqi Huang
  • Wentao Zhang

Recent advances in large language models (LLMs) have improved reasoning in text and image domains, yet achieving robust video reasoning remains a significant challenge. Existing video benchmarks mainly assess shallow understanding and reasoning and allow models to exploit global context, failing to rigorously evaluate true causal and stepwise reasoning. We present CausalStep, a benchmark designed for explicit stepwise causal reasoning in videos. CausalStep segments videos into causally linked units and enforces a strict stepwise question-answer (QA) protocol, requiring sequential answers and preventing shortcut solutions. Each question includes carefully constructed distractors based on error type taxonomy to ensure diagnostic value. The benchmark features 100 videos across six categories and 1,852 multiple-choice QA pairs. We introduce seven diagnostic metrics for comprehensive evaluation, enabling precise diagnosis of causal reasoning capabilities. Experiments with leading proprietary and open-source models, as well as human baselines, reveal a significant gap between current models and human-level stepwise reasoning. CausalStep provides a rigorous benchmark to drive progress in robust and interpretable video reasoning.
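The strict stepwise protocol is the benchmark's key mechanism: the model must answer each question using only the causally preceding segments, so it cannot lean on global context. Below is a minimal sketch of such an evaluation loop; the data layout and the `model_answer` callable are hypothetical, not CausalStep's actual harness.

```python
# Hypothetical sketch of a strict stepwise QA protocol: the model may only
# see video segments up to the current step and must answer each question
# before the next segment is revealed (no global context, no skipping).

def evaluate_stepwise(video_segments, qa_pairs, model_answer):
    """qa_pairs[i] = (question, options, correct_idx) for segment i.
    model_answer(context_segments, question, options) -> chosen option index.
    """
    transcript, num_correct = [], 0
    for step, (question, options, correct_idx) in enumerate(qa_pairs):
        context = video_segments[: step + 1]        # causal prefix only
        choice = model_answer(context, question, options)
        transcript.append((step, choice, choice == correct_idx))
        num_correct += int(choice == correct_idx)
    return num_correct / len(qa_pairs), transcript
```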

AAAI Conference 2026 Conference Paper

ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

  • Meiqi Wu
  • Jiashu Zhu
  • Xiaokun Feng
  • Chubin Chen
  • Chen Zhu
  • Bingze Song
  • Fangyuan Mao
  • Jiahong Wu

Video generation models have achieved remarkable progress, particularly excelling in realistic scenarios; however, their performance degrades notably in imaginative scenarios. These prompts often involve rarely co-occurring concepts with long-distance semantic relationships, falling outside training distributions. Existing methods typically apply test-time scaling for improving video quality, but their fixed search spaces and static reward designs limit adaptability to imaginative scenarios. To fill this gap, we propose ImagerySearch, a dynamic test-time scaling law strategy inspired by imagery that adaptively adjusts the inference search space and reward guided by prompts, effectively enhancing generation quality in imaginative scenarios. Furthermore, we introduce LDT-Bench, the first benchmark targeting long-distance semantic prompts, designed to evaluate the creativity of video generation models. It comprises 2,839 challenging concept pairs from diverse recognition datasets and incorporates an automatic evaluation protocol to assess creative capacity. Extensive experiments on LDT-Bench demonstrate that our approach consistently outperforms general generation models and test-time scaling approaches. Additionally, ImagerySearch achieves strong performance on VBench, confirming its effectiveness in improving video generation quality under diverse conditions.

AAAI Conference 2026 Conference Paper

No-Regret Strategy Solving in Imperfect-Information Games via Pre-Trained Embedding

  • Yanchang Fu
  • Shengda Liu
  • Pei Xu
  • Kaiqi Huang

High-quality information set abstraction remains a core challenge in solving large-scale imperfect-information extensive-form games (IIEFGs) such as no-limit Texas Hold'em, where finite spatial (memory) resources preclude solving the full game directly. State-of-the-art AI methods rely on pre-trained discrete clustering for abstraction, yet this hard classification irreversibly discards critical information: the quantifiable, subtle differences between information sets that are vital for strategy solving, thereby compromising solution quality. Inspired by the word embedding paradigm in natural language processing, this paper proposes the Embedding CFR algorithm, a novel approach for solving strategies in IIEFGs within an embedding space. The algorithm pre-trains and embeds the features of individual information sets into an interconnected low-dimensional continuous space, where the resulting vectors capture both the distinctions and connections between information sets more precisely. Embedding CFR introduces a strategy-solving process driven by regret accumulation and strategy updates in this embedding space, with supporting theoretical analysis verifying its ability to reduce cumulative regret. Experiments on poker show that, with the same spatial overhead, Embedding CFR achieves significantly faster exploitability convergence than cluster-based abstraction algorithms, confirming its effectiveness. Furthermore, to our knowledge, it is the first algorithm in poker AI that pre-trains information set abstractions via low-dimensional embedding for strategy solving.
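The core idea, accumulating regret over an embedding of information sets rather than over hard clusters, can be illustrated with a toy variant. The sketch below replaces the paper's pre-trained embedder with a fixed random projection and stores regrets per quantized embedding cell; all names and constants are illustrative, not the paper's algorithm.

```python
import numpy as np
from collections import defaultdict

class EmbeddingRegretTable:
    """Toy sketch: accumulate counterfactual regrets keyed by a quantized
    low-dimensional embedding of the information set, then derive a strategy
    by regret matching. The embedder here is a fixed random projection; the
    paper pre-trains it, which this sketch does not attempt."""

    def __init__(self, feat_dim, embed_dim, n_actions, grid=0.25, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((feat_dim, embed_dim)) / np.sqrt(feat_dim)
        self.grid = grid
        self.n_actions = n_actions
        self.regret = defaultdict(lambda: np.zeros(n_actions))

    def _key(self, infoset_features):
        z = infoset_features @ self.proj              # embed into low-dim space
        return tuple(np.round(z / self.grid).astype(int))  # nearby infosets share a cell

    def strategy(self, infoset_features):
        r = np.maximum(self.regret[self._key(infoset_features)], 0.0)
        return r / r.sum() if r.sum() > 0 else np.full(self.n_actions, 1.0 / self.n_actions)

    def update(self, infoset_features, counterfactual_regrets):
        self.regret[self._key(infoset_features)] += counterfactual_regrets

# Usage sketch: query the current strategy, then add observed regrets.
table = EmbeddingRegretTable(feat_dim=20, embed_dim=4, n_actions=3)
feats = np.random.default_rng(1).standard_normal(20)
sigma = table.strategy(feats)
table.update(feats, np.array([0.5, -0.2, 0.1]))
```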

AAAI Conference 2026 Conference Paper

RefRea: Reference-Guided Reasoning with Meta-Cognition for Accurate Language Model Agents

  • Yuxiang Mai
  • Qiyue Yin
  • Wancheng Ni
  • Jianwei Guo
  • Xiaogang Ouyang
  • Pei Xu
  • Kaiqi Huang

In recent years, with the rapid development of large language models (LLMs), LLM-based agents have achieved remarkable progress across a wide range of tasks. However, reasoning inconsistencies in LLMs still significantly limit the performance of agents in complex decision-making scenarios. Cognitive science research suggests that individuals can benefit from observing others' explicit thinking processes to improve their strategy-making. Inspired by this mechanism, we propose Reference-guided Reasoning with meta-cognition (RefRea), a novel approach that enhances decision-making by introducing a reference language model to guide and calibrate the reasoning model's actions. RefRea enhances reasoning accuracy and stability by integrating a reference model and a meta-cognition module. The reference model relies solely on validated meta-cognition for consistent guidance, while the reasoning model interacts with the environment using both validated and exploratory meta-cognition. Guidance is provided by comparing the action similarity between the reference and reasoning models. This process is supported by the meta-cognition module, which generates summary knowledge by reflecting on action history and environmental feedback, leading to more adaptive and reliable behavior. We evaluate our algorithm in the text-based reasoning environment ScienceWorld. Experimental results demonstrate that RefRea outperforms state-of-the-art methods. Comprehensive ablation studies further highlight the effectiveness of both the reference model and the meta-cognition module.

IJCAI Conference 2025 Conference Paper

Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity

  • Yuxiang Mai
  • Qiyue Yin
  • Wancheng Ni
  • Pei Xu
  • Kaiqi Huang

In recent years, diversity has emerged as a useful mechanism to enhance the efficiency of multi-agent reinforcement learning (MARL). However, existing methods predominantly focus on designing policies based on individual agent characteristics, often neglecting the interplay and mutual influence among agents during policy formation. To address this gap, we propose Competitive Diversity through Constructive Conflict (CoDiCon), a novel approach that incorporates competitive incentives into cooperative scenarios to encourage policy exchange and foster strategic diversity among agents. Drawing inspiration from sociological research, which highlights the benefits of moderate competition and constructive conflict in group decision-making, we design an intrinsic reward mechanism using ranking features to introduce competitive motivations. A centralized intrinsic reward module generates and distributes varying reward values to agents, ensuring an effective balance between competition and cooperation. By optimizing the parameterized centralized reward module to maximize environmental rewards, we reformulate the constrained bilevel optimization problem to align with the original task objectives. We evaluate our algorithm against state-of-the-art methods in the SMAC and GRF environments. Experimental results demonstrate that CoDiCon achieves superior performance, with competitive intrinsic rewards effectively promoting diverse and adaptive strategies among cooperative agents.
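As a rough illustration of the ranking-based intrinsic reward idea, the sketch below converts per-agent scores into normalized rank features and maps them through a small centralized network; the architecture, the score definition, and the mixing coefficient are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class RankingIntrinsicReward(nn.Module):
    """Sketch of a centralized intrinsic-reward module: each agent's scalar
    performance score is turned into a normalized rank feature, and a shared
    MLP maps (rank, score) to a per-agent intrinsic reward, injecting mild
    competition into a cooperative task."""

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, agent_scores):                  # shape: (n_agents,)
        n = agent_scores.shape[0]
        ranks = agent_scores.argsort().argsort().float() / max(n - 1, 1)  # in [0, 1]
        feats = torch.stack([ranks, agent_scores], dim=-1)
        return self.net(feats).squeeze(-1)            # per-agent intrinsic reward

# Usage: mix a shared environment reward with the intrinsic term.
module = RankingIntrinsicReward()
r_int = module(torch.tensor([0.3, 1.2, -0.5]))
r_total = 1.0 + 0.1 * r_int                           # beta = 0.1 (illustrative)
```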

ICML Conference 2025 Conference Paper

CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features

  • Xiaokun Feng
  • Dailing Zhang
  • Shiyu Hu
  • Xuchen Li 0001
  • Meiqi Wu
  • Jing Zhang 0110
  • Xiaotang Chen
  • Kaiqi Huang

Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (e.g., depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design. Existing methods often employ two parallel branches to separately process the RGB and X input streams, requiring the model to simultaneously handle two dispersed feature spaces, which complicates both the model structure and computation process. More critically, intra-modality spatial modeling within each dispersed space incurs substantial computational overhead, limiting resources for inter-modality spatial modeling and temporal modeling. To address this, we propose a novel tracker, CSTrack, which focuses on modeling Compact Spatiotemporal features to achieve simple yet effective tracking. Specifically, we first introduce an innovative Spatial Compact Module that integrates the RGB-X dual input streams into a compact spatial feature, enabling thorough intra- and inter-modality spatial modeling. Additionally, we design an efficient Temporal Compact Module that compactly represents temporal features by constructing the refined target distribution heatmap. Extensive experiments validate the effectiveness of our compact spatiotemporal modeling method, with CSTrack achieving new SOTA results on mainstream RGB-X benchmarks. The code and models will be released at: https://github.com/XiaokunFeng/CSTrack.

ICML Conference 2025 Conference Paper

LLM Data Selection and Utilization via Dynamic Bi-level Optimization

  • Yang Yu 0056
  • Kai Han 0002
  • Hang Zhou
  • Yehui Tang 0001
  • Kaiqi Huang
  • Yunhe Wang 0001
  • Dacheng Tao

While large-scale training data is fundamental for developing capable large language models (LLMs), strategically selecting high-quality data has emerged as a critical approach to enhance training efficiency and reduce computational costs. Current data selection methodologies predominantly rely on static, training-agnostic criteria, failing to account for the dynamic interactions between model training and data. In this paper, we propose a new Data Weighting Model (DWM) that adjusts the weight of selected data within each batch to achieve dynamic data utilization during LLM training. Specifically, to better capture the dynamic data preferences of the trained model, a bi-level optimization framework is implemented to update the weighting model. Our experiments demonstrate that DWM enhances the performance of models trained with randomly-selected data, and the learned weighting model can be transferred to enhance other data selection methods and models of different sizes. Moreover, we further analyze how a model's data preferences evolve throughout training, providing new insights into the data preference of the model during training.
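The bi-level update can be sketched as one differentiable "virtual" step: a weighting network scores each training example, the model takes a virtual SGD step on the weighted loss, and the weighting network is then updated to reduce the virtually-updated model's loss on held-out data. The toy linear model, sizes, and learning rates below are illustrative, not DWM's actual setup.

```python
import torch
import torch.nn as nn

d = 8
W = torch.zeros(d, 1, requires_grad=True)                  # toy model parameters
weight_net = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
opt_w = torch.optim.Adam(weight_net.parameters(), lr=1e-3)
lr_inner = 0.1

def per_example_loss(params, x, y):
    return ((x @ params - y) ** 2).squeeze(-1)             # shape: (batch,)

x_tr, y_tr = torch.randn(32, d), torch.randn(32, 1)
x_val, y_val = torch.randn(32, d), torch.randn(32, 1)

weights = torch.softmax(weight_net(x_tr).squeeze(-1), dim=0)        # per-example weights
inner = (weights * per_example_loss(W, x_tr, y_tr)).sum()           # weighted train loss
(grad_W,) = torch.autograd.grad(inner, W, create_graph=True)        # keep graph for outer step
W_virtual = W - lr_inner * grad_W                                   # one unrolled SGD step
outer = per_example_loss(W_virtual, x_val, y_val).mean()            # held-out loss
opt_w.zero_grad(); outer.backward(); opt_w.step()                   # meta-update of the weighter
with torch.no_grad():
    W -= lr_inner * grad_W                                          # real model step
```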

AAAI Conference 2025 Conference Paper

Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling

  • Xingzhou Lou
  • Junge Zhang
  • Jian Xie
  • Lifeng Liu
  • Dong Yan
  • Kaiqi Huang

Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g., helpfulness and harmlessness) or struggle with the complexity of managing multiple reward models. To address these issues, we propose Sequential Preference Optimization (SPO), a method that sequentially fine-tunes LLMs to align with multiple dimensions of human preferences. SPO avoids explicit reward modeling, directly optimizing the models to align with nuanced human preferences. We theoretically derive the closed-form optimal SPO policy and loss function. A gradient analysis is conducted to show how SPO manages to fine-tune the LLMs while maintaining alignment on previously optimized dimensions. Empirical results on LLMs of different sizes and multiple evaluation datasets demonstrate that SPO successfully aligns LLMs across multiple dimensions of human preferences and significantly outperforms the baselines.
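Since SPO avoids explicit reward modeling, its per-stage objective is naturally close to a DPO-style implicit-reward loss. A hedged sketch of such a stage loss follows, with the sequential twist that each stage's reference log-probabilities come from the previous stage's model; the exact loss derived in the paper may differ.

```python
import torch
import torch.nn.functional as F

def spo_stage_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style implicit-reward preference loss for one alignment dimension.
    In a sequential scheme, ref_* are log-probs under the model produced by
    the previous stage, so earlier dimensions anchor later ones. The form
    and the constant beta are illustrative, not the paper's derivation."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

# Stage 1: align helpfulness against the SFT model as reference.
# Stage 2: align harmlessness, with the stage-1 model as the new reference.
```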

AAMAS Conference 2025 Conference Paper

Uncertainty-Aware Opponent Modeling for Deep Reinforcement Learning

  • Likun Yang
  • Pei Xu
  • Shiyue Cao
  • Yongjian Ren
  • Xiaotang Chen
  • Kaiqi Huang

The ability to model opponent behavior is essential for autonomous decision-making in multi-agent games. Although stochastic behavior is universal in real-world situations, previous works have struggled to model opponents with high stochasticity, such as humans. The issue arises because stochasticity in opponent behavior introduces significant uncertainty into the opponent modeling process, which existing methods have not adequately addressed. We introduce a novel Uncertainty-Aware Opponent Modeling (UAOM) method that addresses two key sources of uncertainty stemming from the inherent randomness of the opponent's actions. The first pertains to the uncertainty in constructing the opponent model, while the second concerns the uncertainty in applying the model during decision-making. For the first, UAOM uses a hybrid behavior modeling module to learn a more powerful opponent-aware representation by ensembling deterministic and probabilistic models, addressing both aleatoric and epistemic uncertainties in opponent modeling. For the second, UAOM uses an opponent-aware dynamic modeling module to learn a dynamic-aware representation. We further provide a theoretical analysis showing that jointly optimizing the two modules can enhance downstream reinforcement learning performance while ensuring system convergence. We evaluate UAOM in both simulated settings and human-agent interaction scenarios. Our experimental results show that the proposed method significantly enhances performance when facing opponents with varying degrees of stochastic behavior, while efficiently managing the uncertainties introduced by such opponents.
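The ensemble described above suggests the standard aleatoric/epistemic decomposition: predicted variances capture the opponent's inherent randomness, while disagreement between ensemble members captures our ignorance of the opponent. The sketch below shows that decomposition with probabilistic Gaussian heads only; how UAOM actually consumes the two signals inside its modules is not reproduced here, and all shapes are illustrative.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """One probabilistic opponent model: predicts a Gaussian over the
    opponent's next action features."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_var = nn.Linear(hidden, act_dim)
    def forward(self, obs):
        h = self.body(obs)
        return self.mu(h), self.log_var(h)

def ensemble_uncertainty(heads, obs):
    """Mean predicted variance ~ aleatoric uncertainty (opponent randomness);
    variance of the predicted means ~ epistemic uncertainty (model ignorance)."""
    mus, variances = zip(*[(mu, lv.exp()) for mu, lv in (h(obs) for h in heads)])
    mus, variances = torch.stack(mus), torch.stack(variances)
    aleatoric = variances.mean(dim=0)
    epistemic = mus.var(dim=0, unbiased=False)
    return mus.mean(dim=0), aleatoric, epistemic

heads = [GaussianHead(10, 4) for _ in range(5)]
mean_act, alea, epi = ensemble_uncertainty(heads, torch.randn(1, 10))
```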

NeurIPS Conference 2025 Conference Paper

Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards

  • Honghao Chen
  • Xingzhou Lou
  • Xiaokun Feng
  • Kaiqi Huang
  • Xinlong Wang

Chain of thought reasoning has demonstrated remarkable success in large language models, yet its adaptation to vision-language reasoning remains an open challenge with unclear best practices. Existing attempts typically employ reasoning chains at a coarse-grained level, which struggle to perform fine-grained structured reasoning and, more importantly, make it difficult to evaluate the reward and quality of intermediate reasoning. In this work, we delve into chain of step reasoning for vision-language models, enabling accurate assessment of reasoning step quality and leading to effective reinforcement learning and inference-time scaling with fine-grained rewards. We present a simple, effective, and fully transparent framework, including the step-level reasoning data, process reward model (PRM), and reinforcement learning training. With the proposed approaches, our models set strong baselines with consistent improvements on challenging vision-language benchmarks. More importantly, we conduct a thorough empirical analysis and ablation study, unveiling the impact of each component and several intriguing properties of inference-time scaling. We believe this paper serves as a baseline for vision-language models and offers insights into more complex multimodal reasoning. Our dataset, PRM, and code are available at https://github.com/baaivision/CoS.

IJCAI Conference 2024 Conference Paper

ADMN: Agent-Driven Modular Network for Dynamic Parameter Sharing in Cooperative Multi-Agent Reinforcement Learning

  • Yang Yu
  • Qiyue Yin
  • Junge Zhang
  • Pei Xu
  • Kaiqi Huang

Parameter sharing is a common strategy in multi-agent reinforcement learning (MARL) to make training more efficient and scalable. However, applying parameter sharing among agents indiscriminately hinders the emergence of agent diversity and degrades the final cooperative performance. To better balance parameter sharing and agent diversity, we propose a novel Agent-Driven Modular Network (ADMN), where agents share a base network consisting of multiple specialized modules, and each agent has its own routing to connect these modules. In ADMN, modules are shared among agents to improve training efficiency, while the combination of different modules brings rich diversity. The agent routing at different time steps is learned end-to-end to achieve a dynamic and adaptive balance. We also propose an information-theoretical regularization between the routing of agents and their behavior to further guarantee the identifiability of different routings. We evaluated ADMN in challenging StarCraft micromanagement games and Google Research Football games, and the results demonstrate the superior performance of ADMN, particularly in larger or heterogeneous cooperative tasks.
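A compressed sketch of the shared-modules-plus-per-agent-routing structure: agents share a bank of modules, and each agent mixes module outputs with its own softmax routing weights. For brevity the routing here is static per agent, whereas ADMN learns it per time step end-to-end; all shapes and the module architecture are illustrative.

```python
import torch
import torch.nn as nn

class ModularRoutingNet(nn.Module):
    """Shared module bank with per-agent soft routing: every agent uses the
    same specialized modules, but its own routing weights decide how their
    outputs are combined, trading parameter sharing against diversity."""

    def __init__(self, n_agents, n_modules, in_dim, out_dim):
        super().__init__()
        self.module_bank = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))
            for _ in range(n_modules))
        self.route_logits = nn.Parameter(torch.zeros(n_agents, n_modules))

    def forward(self, agent_id, obs):
        outs = torch.stack([m(obs) for m in self.module_bank])      # (M, B, out)
        route = torch.softmax(self.route_logits[agent_id], dim=0)   # (M,)
        return torch.einsum("m,m...->...", route, outs)             # routed mixture

net = ModularRoutingNet(n_agents=4, n_modules=3, in_dim=10, out_dim=5)
out = net(agent_id=2, obs=torch.randn(8, 10))
```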

NeurIPS Conference 2024 Conference Paper

Beyond Accuracy: Tracking more like Human via Visual Search

  • Dailing Zhang
  • Shiyu Hu
  • Xiaokun Feng
  • Xuchen Li
  • Meiqi Wu
  • Jing Zhang
  • Kaiqi Huang

Human visual search ability enables efficient and accurate tracking of an arbitrary moving target, which is a significant research interest in cognitive neuroscience. The recently proposed Central-Peripheral Dichotomy (CPD) theory sheds light on how humans effectively process visual information and track moving targets in complex environments. However, existing visual object tracking algorithms still fall short of matching human performance in maintaining tracking over time, particularly in complex scenarios requiring robust visual search skills. These scenarios often involve Spatio-Temporal Discontinuities (i.e., STDChallenge), prevalent in long-term tracking and global instance tracking. To address this issue, we conduct research from a human-like modeling perspective: (1) Inspired by the CPD, we propose a new tracker named CPDTrack to achieve human-like visual search ability. The central vision of CPDTrack leverages the spatio-temporal continuity of videos to introduce priors and enhance localization precision, while the peripheral vision improves global awareness and detects object movements. (2) To further evaluate and analyze STDChallenge, we create the STDChallenge Benchmark. Besides, by incorporating human subjects, we establish a human baseline, creating a high-quality environment specifically designed to assess trackers' visual search abilities in videos across STDChallenge. (3) Our extensive experiments demonstrate that the proposed CPDTrack not only achieves state-of-the-art (SOTA) performance in this challenge but also narrows the behavioral differences with humans. Additionally, CPDTrack exhibits strong generalizability across various challenging benchmarks. In summary, our research underscores the importance of human-like modeling and offers strategic insights for advancing intelligent visual target tracking. Code and models are available at https://github.com/ZhangDailing8/CPDTrack.

AAAI Conference 2024 Conference Paper

DDAE: Towards Deep Dynamic Vision BERT Pretraining

  • Honghao Chen
  • Xiangwen Kong
  • Xiangyu Zhang
  • Xin Zhao
  • Kaiqi Huang

Recently, masked image modeling (MIM) has demonstrated promising prospects in self-supervised representation learning. However, existing MIM frameworks recover all masked patches equivalently, ignoring that the reconstruction difficulty of different patches can vary sharply due to their diverse distances from visible patches. In this paper, we propose a novel deep dynamic supervision to enable MIM methods to dynamically reconstruct patches with different degrees of difficulty at different pretraining phases and depths of the model. Our deep dynamic supervision helps to provide more locality inductive bias for ViTs, especially in deep layers, which inherently makes up for the absence of a local prior in the self-attention mechanism. Built upon the deep dynamic supervision, we propose the Deep Dynamic AutoEncoder (DDAE), a simple yet effective MIM framework that utilizes dynamic mechanisms for pixel regression and feature self-distillation simultaneously. Extensive experiments across a variety of vision tasks, including ImageNet classification, semantic segmentation on ADE20K and object detection on COCO, demonstrate the effectiveness of our approach.
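The notion that reconstruction difficulty grows with distance from visible patches can be made concrete with a distance-based loss weight. The sketch below weights each masked patch by its Chebyshev distance to the nearest visible patch and exposes a temperature `tau` to shift emphasis over pretraining; the exact weighting and schedule used in the paper are not reproduced here.

```python
import torch

def distance_weights(mask, tau):
    """mask: (H, W) bool grid over patches, True = masked; assumes at least
    one visible patch. Each masked patch is weighted by exp(-d / tau), where
    d is its Chebyshev distance to the nearest visible patch, so a small tau
    emphasizes easy (near) patches and a large tau flattens the weights."""
    H, W = mask.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    vis_y, vis_x = (~mask).nonzero(as_tuple=True)            # visible coordinates
    d = torch.maximum((ys[..., None] - vis_y).abs(),
                      (xs[..., None] - vis_x).abs()).min(dim=-1).values
    w = torch.exp(-d.float() / tau)
    return torch.where(mask, w, torch.zeros_like(w))

# Usage: multiply the per-patch reconstruction loss by these weights, and
# grow tau over pretraining so harder (distant) patches count more later.
mask = torch.rand(14, 14) < 0.75
w = distance_weights(mask, tau=2.0)
```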

NeurIPS Conference 2024 Conference Paper

MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts

  • Xiaokun Feng
  • Xuchen Li
  • Shiyu Hu
  • Dailing Zhang
  • Meiqi Wu
  • Jing Zhang
  • Xiaotang Chen
  • Kaiqi Huang

Vision-language tracking (VLT) enhances traditional visual object tracking by integrating language descriptions, requiring the tracker to flexibly understand complex and diverse text in addition to visual information. However, most existing vision-language trackers still overly rely on initial fixed multimodal prompts, which struggle to provide effective guidance for dynamically changing targets. Fortunately, the Complementary Learning Systems (CLS) theory suggests that the human memory system can dynamically store and utilize multimodal perceptual information, thereby adapting to new scenarios. Inspired by this, (i) we propose a Memory-based Vision-Language Tracker (MemVLT). By incorporating memory modeling to adjust static prompts, our approach can provide adaptive prompts for tracking guidance. (ii) Specifically, the memory storage and memory interaction modules are designed in accordance with CLS theory. These modules facilitate the storage and flexible interaction between short-term and long-term memories, generating prompts that adapt to target variations. (iii) Finally, we conduct extensive experiments on mainstream VLT datasets (e.g., MGIT, TNL2K, LaSOT and LaSOT_ext). Experimental results show that MemVLT achieves new state-of-the-art performance. Impressively, it achieves 69.4% AUC on MGIT and 63.3% AUC on TNL2K, improving the existing best results by 8.4% and 4.7%, respectively.

IJCAI Conference 2024 Conference Paper

Population-Based Diverse Exploration for Sparse-Reward Multi-Agent Tasks

  • Pei Xu
  • Junge Zhang
  • Kaiqi Huang

Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. Although population-based learning shows its potential in producing diverse behaviors, most previous works still focus on improving the exploration of a single joint policy. In this paper, we show that with a suitable exploration method, maintaining a population of joint policies rather than one joint policy can significantly improve exploration. Our key idea is to guide each member of the population to explore different regions of the environment. To this end, we propose a member-aware exploration objective which explicitly guides each member to maximize deviation from the explored regions of other members, thus forcing the members to explore different regions. In addition, we propose an exploration-enhanced policy constraint to guide each member to learn a joint policy that is both different from other members and promotes exploration, thus increasing the probability of exploring different regions. Under the reward-free setting, our method achieves a 72% average improvement in the number of explored states compared to classical exploration methods in the multiple-particle environment. Moreover, under the sparse-reward setting, we show that the proposed method significantly outperforms the state-of-the-art methods in the multiple-particle environment, Google Research Football, and StarCraft II micromanagement tasks.
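A toy count-based rendering of the member-aware idea follows: a member's exploration bonus decays with its own visit count and additionally rewards states the other members have not covered, pushing members toward disjoint regions. The paper's actual objective and policy constraint are more involved; this only illustrates the "deviate from others' explored regions" mechanism.

```python
from collections import Counter

class MemberAwareBonus:
    """Per-member count-based bonus with a deviation term: the first term is
    classical novelty w.r.t. the member's own visits, the second grows when
    no *other* member has visited the state, encouraging disjoint coverage."""

    def __init__(self, n_members):
        self.counts = [Counter() for _ in range(n_members)]

    def bonus(self, member, state):
        own = self.counts[member][state]
        others = sum(c[state] for j, c in enumerate(self.counts) if j != member)
        self.counts[member][state] += 1
        return 1.0 / ((1 + own) ** 0.5) + 1.0 / (1 + others)   # novelty + deviation
```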

ICML Conference 2024 Conference Paper

Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

  • Honghao Chen
  • Yurong Zhang
  • Xiaokun Feng
  • Xiangxiang Chu
  • Kaiqi Huang

Robustness is a vital aspect to consider when deploying deep learning models into the wild. Numerous studies have been dedicated to the robustness of vision transformers (ViTs), which have dominated as the mainstream backbone choice for vision tasks since the dawn of the 2020s. Recently, some large kernel convnets have made a comeback with impressive performance and efficiency. However, it remains unclear whether large kernel networks are robust and what their robustness can be attributed to. In this paper, we first conduct a comprehensive evaluation of large kernel convnets' robustness and their differences from typical small kernel counterparts and ViTs on six diverse robustness benchmark datasets. Then, to analyze the underlying factors behind their strong robustness, we design experiments from both quantitative and qualitative perspectives to reveal intriguing properties of large kernel convnets that are completely different from typical convnets. Our experiments demonstrate for the first time that pure CNNs can achieve exceptional robustness comparable or even superior to that of ViTs. Our analysis of occlusion invariance, kernel attention patterns and frequency characteristics provides novel insights into the source of robustness. Code is available at: https://github.com/Lauch1ng/LKRobust.

AAMAS Conference 2024 Conference Paper

Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models

  • Xingzhou Lou
  • Junge Zhang
  • Ziyan Wang
  • Kaiqi Huang
  • Yali Du

Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily-understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise for the conversion of language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose to use pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints and allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to given constraints. The usage of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without the need for a ground-truth cost at any stage of training or evaluation. Extensive ablation studies are conducted to demonstrate the efficacy of each part of our method.

AAAI Conference 2024 Conference Paper

TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient

  • Xingzhou Lou
  • Junge Zhang
  • Timothy J. Norman
  • Kaiqi Huang
  • Yali Du

Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means sub-optimal actions by some agents will affect other agents' policy learning. While using individual critics for policy updates can avoid this issue, it severely limits cooperation among agents. To address this issue, we propose an agent topology framework, which decides whether other agents should be considered in the policy gradient and achieves a compromise between facilitating cooperation and alleviating the CDM issue. The agent topology allows agents to use coalition utility as the learning objective, instead of the global utility given by centralized critics or the local utility given by individual critics. To constitute the agent topology, various models are studied. We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods. We prove the policy improvement theorem for stochastic TAPE and give a theoretical explanation for the improved cooperation among agents. Experiment results on several benchmarks show that the agent topology is able to facilitate agent cooperation and alleviate the CDM issue, improving the performance of TAPE. Finally, multiple ablation studies and a heuristic graph search algorithm are devised to show the efficacy of the agent topology.
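The agent-topology interpolation between centralized and individual critics can be written down directly: an adjacency matrix selects which agents' advantages enter each agent's policy-gradient objective. A minimal per-transition sketch follows; shapes and the adjacency choice are illustrative, not the paper's learned topology models.

```python
import torch

def topology_pg_loss(log_probs, advantages, adjacency):
    """log_probs and advantages are (n_agents,) for one transition;
    adjacency[i, j] = 1 means agent j's utility enters agent i's coalition
    objective (diagonal = 1, so each agent always counts itself). A full
    matrix recovers a centralized-critic-style objective; the identity
    recovers fully individual critics."""
    coalition_adv = adjacency.float() @ advantages           # (n_agents,)
    return -(log_probs * coalition_adv.detach()).sum()

adj = torch.tensor([[1, 1, 0], [1, 1, 0], [0, 0, 1]])        # agents 0,1 coupled; 2 alone
loss = topology_pg_loss(torch.randn(3), torch.randn(3), adj)
```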

NeurIPS Conference 2023 Conference Paper

A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship

  • Shiyu Hu
  • Dailing Zhang
  • Meiqi Wu
  • Xiaokun Feng
  • Xuchen Li
  • Xin Zhao
  • Kaiqi Huang

Tracking an arbitrary moving target in a video sequence is the foundation for high-level tasks like video understanding. Although existing visual-based trackers have demonstrated good tracking capabilities in short video sequences, they always perform poorly in complex environments, as represented by the recently proposed global instance tracking task, which consists of longer videos with more complicated narrative content. Recently, several works have introduced natural language into object tracking, desiring to address the limitations of relying only on a single visual modality. However, these selected videos are still short sequences with uncomplicated spatio-temporal and causal relationships, and the provided semantic descriptions are too simple to characterize video content. To address these issues, we (1) first propose a new multi-modal global instance tracking benchmark named MGIT. It consists of 150 long video sequences with a total of 2.03 million frames, aiming to fully represent the complex spatio-temporal and causal relationships coupled in longer narrative content. (2) Each video sequence is annotated with three semantic grains (i.e., action, activity, and story) to model the progressive process of human cognition. We expect this multi-granular annotation strategy can provide a favorable environment for multi-modal object tracking research and long video understanding. (3) Besides, we execute comparative experiments on existing multi-modal object tracking benchmarks, which not only explore the impact of different annotation methods, but also validate that our annotation method is a feasible solution for coupling human understanding into semantic labels. (4) Additionally, we conduct detailed experimental analyses on MGIT, and hope the explored performance bottlenecks of existing algorithms can support further research in multi-modal object tracking. The proposed benchmark, experimental results, and toolkit will be released gradually on http://videocube.aitestunion.com/.

IJCAI Conference 2023 Conference Paper

Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks

  • Pei Xu
  • Junge Zhang
  • Kaiqi Huang

Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. Previous works argue that complex dynamics between agents and the huge exploration space in MARL scenarios amplify the vulnerability of classical count-based exploration methods when combined with agents parameterized by neural networks, resulting in inefficient exploration. In this paper, we show that introducing constrained joint policy diversity into a classical count-based method can significantly improve exploration when agents are parameterized by neural networks. Specifically, we propose a joint policy diversity measure that captures the difference between the current joint policy and previous joint policies, and then use a filtering-based exploration constraint to further refine the joint policy diversity. Under the sparse-reward setting, we show that the proposed method significantly outperforms the state-of-the-art methods in the multiple-particle environment, Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on the hard 3s_vs_5z task, which needs non-trivial strategies to defeat enemies, our method is the first to learn winning strategies without domain knowledge under the sparse-reward setting.

AAMAS Conference 2023 Conference Paper

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

  • Xingzhou Lou
  • Jiaxian Guo
  • Junge Zhang
  • Jun Wang
  • Kaiqi Huang
  • Yali Du

Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) the diversity of a population with finite partners is limited, thereby limiting the capacity of the trained ego agent to collaborate with a novel human; 2) current methods only provide a common best response for every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or humans. To address these issues, we first propose the policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives so that it can take different actions accordingly. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments on the Overcooked environment, and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework on the Overcooked for the convenience of future studies. Codes and demo videos are available at https://sites.google.com/view/pecan-overcooked.

AAMAS Conference 2023 Conference Paper

Prioritized Tasks Mining for Multi-Task Cooperative Multi-Agent Reinforcement Learning

  • Yang Yu
  • Qiyue Yin
  • Junge Zhang
  • Kaiqi Huang

Multi-task learning improves data efficiency in cooperative multi-agent reinforcement learning, since agents can learn multiple related tasks simultaneously and the cooperation knowledge in one task can be utilized by others. However, existing methods mainly learn multiple cooperation tasks uniformly, regardless of their complexity and significance. In this paper, we propose a new framework called Prioritized Tasks Mining (PTM) for multi-task cooperation problems, which helps agents identify and mine higher-priority cooperation tasks, so as to learn more effective coordinated strategies across multiple cooperation tasks. Specifically, agents use hindsight during training to identify the priority of different tasks, and explore and exploit higher-priority cooperative tasks to mine more sophisticated coordinated strategies. We evaluate PTM in challenging multi-task StarCraft micromanagement games of different scales, and the results demonstrate that our method consistently outperforms all strong baselines.

ICLR Conference 2023 Conference Paper

Re-parameterizing Your Optimizers rather than Architectures

  • Xiaohan Ding
  • Honghao Chen
  • Xiangyu Zhang 0005
  • Kaiqi Huang
  • Jungong Han
  • Guiguang Ding

The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers such as SGD. In this paper, we propose to incorporate model-specific prior knowledge into optimizers by modifying the gradients according to a set of model-specific hyper-parameters. Such a methodology is referred to as Gradient Re-parameterization, and the optimizers are named RepOptimizers. For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with or better than the recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed and training efficiency. Compared to Structural Re-parameterization, which adds priors into models via constructing extra training-time structures, RepOptimizers require no extra forward/backward computations and solve the problem of quantization. We hope to spark further research beyond the realms of model structure design. Code and models are available at https://github.com/DingXiaoH/RepOptimizers.
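The Gradient Re-parameterization idea, moving the structural prior into per-parameter gradient multipliers, can be sketched as a small custom optimizer. The multipliers below are supplied by hand and a single parameter group is assumed; deriving the multipliers from a model's priors is the part of the paper this sketch omits.

```python
import torch

class ScaledSGD(torch.optim.Optimizer):
    """Sketch of a RepOptimizer-style update: each parameter carries a fixed
    multiplier, and the optimizer rescales the gradient by it before a plain
    SGD step, so the prior lives in the optimizer rather than in extra
    training-time branches."""

    def __init__(self, params_with_scales, lr=0.01):
        params, scales = zip(*params_with_scales)     # list of (param, scale)
        super().__init__(list(params), dict(lr=lr))
        self.scales = list(scales)                    # order matches the single group

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p, s in zip(group["params"], self.scales):
                if p.grad is not None:
                    p.add_(p.grad * s, alpha=-group["lr"])   # rescaled gradient step

# Usage: hand-picked multipliers per parameter (illustrative values).
model = torch.nn.Linear(4, 2)
opt = ScaledSGD([(p, 2.0 if p.ndim > 1 else 1.0) for p in model.parameters()], lr=0.05)
loss = model(torch.randn(3, 4)).sum()
loss.backward(); opt.step()
```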

AAAI Conference 2023 Conference Paper

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

  • Pei Xu
  • Junge Zhang
  • Qiyue Yin
  • Chao Yu
  • Yaodong Yang
  • Kaiqi Huang

Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. One possible solution to this issue is to exploit inherent task structures to accelerate exploration. In this paper, we present a novel exploration approach, which encodes a special structural prior on the reward function into exploration, for sparse-reward multi-agent tasks. Specifically, a novel entropic exploration objective which encodes the structural prior is proposed to accelerate the discovery of rewards. By maximizing the lower bound of this objective, we then propose an algorithm with moderate computational cost, which can be applied to practical tasks. Under the sparse-reward setting, we show that the proposed algorithm significantly outperforms the state-of-the-art algorithms in the multiple-particle environment, Google Research Football and StarCraft II micromanagement tasks. To the best of our knowledge, on some hard tasks (such as 27m_vs_30m) which have a relatively larger number of agents and need non-trivial strategies to defeat enemies, our method is the first to learn winning strategies under the sparse-reward setting.

NeurIPS Conference 2022 Conference Paper

InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

  • Fei He
  • Haoyang Zhang
  • Naiyu Gao
  • Jian Jia
  • Yanhu Shan
  • Xin Zhao
  • Kaiqi Huang

Video instance segmentation (VIS) aims at segmenting and tracking objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them by either additional tracking heads or complex instance matching algorithms. This explicit instance association approach increases system complexity and fails to fully exploit temporal cues in videos. In this paper, we design a simple, fast and yet effective query-based framework for online VIS. Relying on an instance query and proposal propagation mechanism with several specially developed components, this framework can perform accurate instance association implicitly. Specifically, we generate frame-level object instances based on a set of instance query-proposal pairs propagated from previous frames. This instance query-proposal pair is learned to bind with one specific object across frames through conscientiously developed strategies. When using such a pair to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model gets a good prior for predicting the same object. In this way, we naturally achieve implicit instance association in parallel with segmentation and elegantly take advantage of temporal clues in videos. To show the effectiveness of our method InsPro, we evaluate it on two popular VIS benchmarks, i.e., YouTube-VIS 2019 and YouTube-VIS 2021. Without bells-and-whistles, our InsPro with a ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks respectively, outperforming all other online VIS methods.

AAAI Conference 2022 Conference Paper

Learning Disentangled Attribute Representations for Robust Pedestrian Attribute Recognition

  • Jian Jia
  • Naiyu Gao
  • Fei He
  • Xiaotang Chen
  • Kaiqi Huang

Although various methods have been proposed for pedestrian attribute recognition, most studies follow the same feature learning mechanism, i.e., learning a shared pedestrian image feature to classify multiple attributes. However, this mechanism leads to low-confidence predictions and non-robustness of the model in the inference stage. In this paper, we investigate why this is the case. We mathematically discover that the central cause is that the optimal shared feature cannot maintain high similarities with multiple classifiers simultaneously in the context of minimizing classification loss. In addition, this feature learning mechanism ignores the spatial and semantic distinctions between different attributes. To address these limitations, we propose a novel disentangled attribute feature learning (DAFL) framework to learn a disentangled feature for each attribute, which exploits the semantic and spatial characteristics of attributes. The framework mainly consists of learnable semantic queries, a cascaded semantic-spatial cross-attention (SSCA) module, and a group attention merging (GAM) module. Specifically, based on learnable semantic queries, the cascaded SSCA module iteratively enhances the spatial localization of attribute-related regions and aggregates region features into multiple disentangled attribute features, used for classification and updating learnable semantic queries. The GAM module splits attributes into groups based on spatial distribution and utilizes reliable group attention to supervise query attention maps. Experiments on PETA, RAPv1, PA100k, and RAPv2 show that the proposed method performs favorably against state-of-the-art methods.

AAAI Conference 2022 Conference Paper

QueryProp: Object Query Propagation for High-Performance Video Object Detection

  • Fei He
  • Naiyu Gao
  • Jian Jia
  • Xin Zhao
  • Kaiqi Huang

Video object detection has been an important yet challenging topic in computer vision. Traditional methods mainly focus on designing the image-level or box-level feature propagation strategies to exploit temporal information. This paper argues that with a more effective and efficient feature propagation framework, video object detectors can gain improvement in terms of both accuracy and speed. For this purpose, this paper studies object-level feature propagation, and proposes an object query propagation (QueryProp) framework for high-performance video object detection. The proposed QueryProp contains two propagation strategies: 1) query propagation is performed from sparse key frames to dense non-key frames to reduce the redundant computation on non-key frames; 2) query propagation is performed from previous key frames to the current key frame to improve feature representation by temporal context modeling. To further facilitate query propagation, an adaptive propagation gate is designed to achieve flexible key frame selection. We conduct extensive experiments on the ImageNet VID dataset. QueryProp achieves comparable accuracy with state-of-the-art methods and strikes a decent accuracy/speed trade-off.

AAAI Conference 2021 Conference Paper

Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

  • Wenzhen Huang
  • Qiyue Yin
  • Junge Zhang
  • Kaiqi Huang

Model-based reinforcement learning (RL) is more sample-efficient than model-free RL because it uses imaginary trajectories generated by the learned dynamics model. When the model is inaccurate or biased, however, imaginary trajectories may be deleterious for training the action-value and policy functions. To alleviate this problem, this paper proposes to adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories. More specifically, we evaluate the effect of an imaginary transition by calculating the change in the loss computed on real samples when the transition is used to train the action-value and policy functions. Based on this evaluation criterion, we reweight each imaginary transition with a well-designed meta-gradient algorithm. Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. Visualization of the changing weights further validates the necessity of the reweighting scheme.
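The evaluation criterion in the abstract, the change in real-sample loss caused by training on an imaginary transition, can be probed directly with a finite-difference version, sketched below. The paper instead learns the weights with a meta-gradient; the `loss_fn` callable, the sigmoid mapping, and all constants here are illustrative assumptions.

```python
import copy
import torch

def imaginary_transition_weight(model, loss_fn, imag_batch, real_batch,
                                lr=1e-2, k=10.0):
    """Take a virtual gradient step on one imaginary batch, measure how the
    loss on real samples changes, and map that change to a weight in (0, 1):
    transitions that lower the real loss get weights near 1, harmful ones
    near 0. loss_fn(model, batch) -> scalar loss (hypothetical interface)."""
    with torch.no_grad():
        before = loss_fn(model, real_batch).item()
    probe = copy.deepcopy(model)                       # leave the real model untouched
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    opt.zero_grad()
    loss_fn(probe, imag_batch).backward()
    opt.step()                                         # virtual update on imaginary data
    with torch.no_grad():
        after = loss_fn(probe, real_batch).item()
    return torch.sigmoid(torch.tensor(k * (before - after))).item()
```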

AAAI Conference 2020 Conference Paper

GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking

  • Lianghua Huang
  • Xin Zhao
  • Kaiqi Huang

A key capability of a long-term tracker is to search for targets in very large areas (typically the entire image) to handle possible target absences or tracking failures. However, currently there is a lack of such a strong baseline for global instance search. In this work, we aim to bridge this gap. Specifically, we propose GlobalTrack, a pure global instance search based tracker that makes no assumption on the temporal consistency of the target's positions and scales. GlobalTrack is developed based on two-stage object detectors, and it is able to perform full-image and multi-scale search of arbitrary instances with only a single query as the guide. We further propose a cross-query loss to improve the robustness of our approach against distractors. With no online learning, no punishment on position or scale changes, no scale smoothing and no trajectory refinement, our pure global instance search based tracker achieves comparable, sometimes much better performance on four large-scale tracking benchmarks (i.e., 52.1% AUC on LaSOT, 63.8% success rate on TLP, 60.3% MaxGM on OxUvA and 75.4% normalized precision on TrackingNet), compared to state-of-the-art approaches that typically require complex post-processing. More importantly, our tracker runs without cumulative errors, i.e., any type of temporary tracking failures will not affect its performance on future frames, making it ideal for long-term tracking. We hope this work will be a strong baseline for long-term tracking and will stimulate future works in this area.

AAAI Conference 2020 Conference Paper

Temporal Context Enhanced Feature Aggregation for Video Object Detection

  • Fei He
  • Naiyu Gao
  • Qiaozhe Li
  • Senyao Du
  • Xin Zhao
  • Kaiqi Huang

Video object detection is a challenging task because of the presence of appearance deterioration in certain video frames. One typical solution is to aggregate neighboring features to enhance per-frame appearance features. However, such a method ignores the temporal relations between the aggregated frames, which is critical for improving video recognition accuracy. To handle the appearance deterioration problem, this paper proposes a temporal context enhanced network (TCENet) to exploit temporal context information by temporal aggregation for video object detection. To handle the displacement of objects in videos, a novel DeformAlign module is proposed to align spatial features from frame to frame. Instead of adopting a fixed-length window fusion strategy, a temporal stride predictor is proposed to adaptively select video frames for aggregation, which facilitates exploiting variable temporal information and requires fewer video frames for aggregation to achieve better results. Our TCENet achieves state-of-the-art performance on the ImageNet VID dataset and has a faster runtime. Without bells and whistles, our TCENet achieves 80.3% mAP by aggregating only 3 frames.

AAAI Conference 2019 Conference Paper

3D Object Detection Using Scale Invariant and Feature Reweighting Networks

  • Xin Zhao
  • Zhe Liu
  • Ruolan Hu
  • Kaiqi Huang

3D object detection plays an important role in a large number of real-world applications. It requires us to estimate the localizations and orientations of 3D objects in real scenes. In this paper, we present a new network architecture which focuses on utilizing front-view images and frustum point clouds to generate 3D detection results. On the one hand, a PointSIFT module is utilized to improve the performance of 3D segmentation. It can capture information from different orientations in space and is robust to shapes of different scales. On the other hand, our network obtains the useful features and suppresses the features with less information via a SENet module. This module reweights channel features and estimates the 3D bounding boxes more effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes and the SUN-RGBD dataset for indoor scenes. The experimental results illustrate that our method achieves better performance than the state-of-the-art methods, especially when point clouds are highly sparse.

AAAI Conference 2019 Conference Paper

Bootstrap Estimated Uncertainty of the Environment Model for Model-Based Reinforcement Learning

  • Wenzhen Huang
  • Junge Zhang
  • Kaiqi Huang

Model-based reinforcement learning (RL) methods attempt to learn a dynamics model to simulate the real environment and utilize the model to make better decisions. However, the learned environment simulator often has some degree of model error, which can disturb decision-making and reduce performance. We propose a bootstrapped model-based RL method which bootstraps the modules at each depth of the planning tree. This method can quantify the uncertainty of the environment model on different state-action pairs and lead the agent to explore the pairs with higher uncertainty, reducing potential model errors. Moreover, we sample target values from their bootstrap distribution to connect the uncertainties at current and subsequent time steps, and introduce a prior mechanism to improve exploration efficiency. Experiment results demonstrate that our method efficiently decreases model error and outperforms TreeQN and other state-of-the-art methods on multiple Atari games.

IJCAI Conference 2019 Conference Paper

Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation

  • Qiaozhe Li
  • Xin Zhao
  • Ran He
  • Kaiqi Huang

Pedestrian attribute recognition in surveillance is a challenging task in computer vision due to significant pose variation, viewpoint change and poor image quality. To achieve effective recognition, this paper presents a graph-based global reasoning framework to jointly model potential visual-semantic relations of attributes and distill auxiliary human parsing knowledge to guide the relational learning. The reasoning framework models attribute groups on a graph and learns a projection function to adaptively assign local visual features to the nodes of the graph. After feature projection, graph convolution is utilized to perform global reasoning between the attribute groups to model their mutual dependencies. Then, the learned node features are projected back to visual space to facilitate knowledge transfer. An additional regularization term is proposed by distilling human parsing knowledge from a pre-trained teacher model to enhance feature representations. The proposed framework is verified on three large-scale pedestrian attribute datasets including PETA, RAP, and PA-100k. Experiments show that our method achieves state-of-the-art results.

AAAI Conference 2019 Conference Paper

Visual-Semantic Graph Reasoning for Pedestrian Attribute Recognition

  • Qiaozhe Li
  • Xin Zhao
  • Ran He
  • Kaiqi Huang

Pedestrian attribute recognition in surveillance is a challenging task due to poor image quality, significant appearance variations and diverse spatial distribution of different attributes. This paper treats pedestrian attribute recognition as a sequential attribute prediction problem and proposes a novel visual-semantic graph reasoning framework to address this problem. Our framework contains a spatial graph and a directed semantic graph. By performing reasoning using the Graph Convolutional Network (GCN), one graph captures spatial relations between regions and the other learns potential semantic relations between attributes. An end-to-end architecture is presented to perform mutual embedding between these two graphs to guide the relational learning for each other. We verify the proposed framework on three large-scale pedestrian attribute datasets including PETA, RAP, and PA-100k. Experiments show the superiority of the proposed method over state-of-the-art methods and the effectiveness of our joint GCN structures for sequential attribute prediction.

AAAI Conference 2018 Conference Paper

Deep Semantic Structural Constraints for Zero-Shot Learning

  • Yan Li
  • Zhen Jia
  • Junge Zhang
  • Kaiqi Huang
  • Tieniu Tan

Zero-shot learning aims to classify unseen image categories by learning a visual-semantic embedding space. In most cases, the traditional methods adopt a separated two-step pipeline that extracts image features from pre-trained CNN models. Then the fixed image features are utilized to learn the embedding space. It leads to the lack of specific structural semantic information of image features for zero-shot learning task. In this paper, we propose an end-to-end trainable Deep Semantic Structural Constraints model to address this issue. The proposed model contains the Image Feature Structure constraint and the Semantic Embedding Structure constraint, which aim to learn structure-preserving image features and endue the learned embedding space with stronger generalization ability respectively. With the assistance of semantic structural information, the model gains more auxiliary clues for zero-shot learning. The state-of-the-art performance certifies the effectiveness of our proposed method.

IJCAI Conference 2018 Conference Paper

Densely Cascaded Shadow Detection Network via Deeply Supervised Parallel Fusion

  • Yupei Wang
  • Xin Zhao
  • Yin Li
  • Xuecai Hu
  • Kaiqi Huang

Shadow detection is an important and challenging problem in computer vision. Recently, single image shadow detection has achieved major progress with the development of deep convolutional networks. However, existing methods are still vulnerable to background clutter, and often fail to capture the global context of an input image. These global contextual and semantic cues are essential for accurately localizing the shadow regions. Moreover, rich spatial details are required to segment shadow regions with precise shapes. To this end, this paper presents a novel model characterized by a deeply supervised parallel fusion (DSPF) network and a densely cascaded learning scheme. The DSPF network achieves a comprehensive fusion of global semantic cues and local spatial details by multiple stacked parallel fusion branches, which are learned in a deeply supervised manner. Moreover, the densely cascaded learning scheme is employed to refine the spatial details. Our method is evaluated on two widely used shadow detection benchmarks. Experimental results show that our method outperforms the state of the art by a large margin.

AAAI Conference 2018 Conference Paper

DF2Net: Discriminative Feature Learning and Fusion Network for RGB-D Indoor Scene Classification

  • Yabei Li
  • Junge Zhang
  • Yanhua Cheng
  • Kaiqi Huang
  • Tieniu Tan

This paper focuses on the task of RGB-D indoor scene classification. It is a very challenging task for two reasons: 1) learning robust representations for indoor scenes is difficult because of the variety of objects and layouts; 2) fusing the complementary cues in RGB and Depth is nontrivial, since there are large semantic gaps between the two modalities. Most existing works learn representations for classification by training a deep network with a softmax loss and fuse the two modalities by simply concatenating their features. However, these pipelines do not explicitly consider intra-class and inter-class similarity, or inter-modal intrinsic relationships. To address these problems, this paper proposes a Discriminative Feature Learning and Fusion Network (DF2Net) with two-stage training. In the first stage, to better represent scenes in each modality, a deep multi-task network is constructed to simultaneously minimize the structured loss and the softmax loss. In the second stage, we design a novel discriminative fusion network which is able to learn correlative features of multiple modalities and distinctive features of each modality. Extensive analysis and experiments on the SUN RGB-D Dataset and NYU Depth Dataset V2 show the superiority of DF2Net over other state-of-the-art methods in the RGB-D indoor scene classification task.

AAAI Conference 2017 Conference Paper

A Multi-Task Deep Network for Person Re-Identification

  • Weihua Chen
  • Xiaotang Chen
  • Jianguo Zhang
  • Kaiqi Huang

Person re-identification (ReID) focuses on identifying people across different scenes in video surveillance, which is usually formulated as a binary classification task or a ranking task in current person ReID approaches. In this paper, we take both tasks into account and propose a multi-task deep network (MTDnet) that makes use of their own advantages and jointly optimize the two tasks simultaneously for person ReID. To the best of our knowledge, we are the first to integrate both tasks in one network to solve the person ReID. We show that our proposed architecture significantly boosts the performance. Furthermore, deep architecture in general requires a sufficient dataset for training, which is usually not met in person ReID. To cope with this situation, we further extend the MTDnet and propose a cross-domain architecture that is capable of using an auxiliary set to assist training on small target sets. In the experiments, our approach outperforms most of existing person ReID algorithms on representative datasets including CUHK03, CUHK01, VIPeR, iLIDS and PRID2011, which clearly demonstrates the effectiveness of the proposed approach.

IJCAI Conference 2016 Conference Paper

FastLCD: Fast Label Coordinate Descent for the Efficient Optimization of 2D Label MRFs

  • Kangwei Liu
  • Junge Zhang
  • Peipei Yang
  • Kaiqi Huang

Recently, MRFs with two-dimensional (2D) labels have proved useful to many applications, such as image matching and optical flow estimation. Due to the huge 2D label set in these problems, existing optimization algorithms tend to be slow for the inference of 2D label MRFs, and this greatly limits the practical use of 2D label MRFs. To solve the problem, this paper presents an efficient algorithm, named FastLCD. Unlike previous popular move-making algorithms (e.g., α-expansion) that visit all the labels exhaustively in each step, FastLCD optimizes the 2D label MRFs by performing label coordinate descents alternately in horizontal, vertical and diagonal directions, and in this way, it does not need to visit all the labels exhaustively. FastLCD greatly reduces the search space of the label set and benefits from a lower time complexity. Experimental results show that FastLCD is much faster, while it still yields high-quality results.

IJCAI Conference 2016 Conference Paper

Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition

  • Yanhua Cheng
  • Xin Zhao
  • Rui Cai
  • Zhiwei Li
  • Kaiqi Huang
  • Yong Rui

This paper studies the problem of RGB-D object recognition. Inspired by the great success of deep convolutional neural networks (DCNN) in AI, researchers have tried to apply it to improve the performance of RGB-D object recognition. However, DCNN always requires a large-scale annotated dataset to supervise its training. Manually labeling such a large RGB-D dataset is expensive and time consuming, which prevents DCNN from quickly promoting this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework to train DCNN effectively based on very limited labeled data and massive unlabeled data. The core of our framework is a novel diversity preserving co-training algorithm, which can successfully guide DCNN to learn from the unlabeled RGB-D data by making full use of the complementary cues of the RGB and depth data in object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves competitive performance for object recognition compared with those state-of-the-art results reported by fully-supervised methods.