Author name cluster

Xiangqi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

AAAI Conference 2026 Conference Paper

SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization

Yue Huang
Xiangqi Wang
Xiangliang Zhang

In high-stakes scenarios—such as self-harm, legal, or medical queries—LLMs must be both trustworthy and helpful. However, these goals often conflict. We propose priority alignment, a new alignment paradigm that enforces a strict “trustworthy-before-helpful” ordering: optimization of helpfulness is conditioned on first meeting trustworthy thresholds (e.g., harmlessness or honesty). To realize this, we introduce Self-Priority Alignment (SPA)—a fully unsupervised framework that generates diverse responses, self-evaluates them and refines them by the model itself, and applies dual-criterion denoising to remove inconsistency and control variance. From this, SPA constructs lexicographically ordered preference pairs and fine-tunes the model using an uncertainty-weighted alignment loss that emphasizes high-confidence, high-gap decisions. Experiments across multiple benchmarks show that SPA improves helpfulness without compromising safety, outperforming strong baselines while preserving general capabilities. Our results demonstrate that SPA provides a scalable and interpretable alignment strategy for critical LLM applications.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

Xiangqi Wang
Yue Huang
Yanbo Wang
Xiaonan Luo
Kehan Guo
Yujun Zhou
Xiangliang Zhang

LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work “well enough” across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and experiments of fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yield gains on knowledge-intensive tasks through tailored prompts.

PDF Details

NeurIPS Conference 2025 Conference Paper

DyFlow: Dynamic Workflow Framework for Agentic Reasoning

Yanbo Wang
Zixiang Xu
Yue Huang
Xiangqi Wang
Zirui Song
Lang Gao
Chenxi Wang
Robert Tang

Agent systems based on large language models (LLMs) have shown great potential in complex reasoning tasks, but building efficient and generalizable workflows remains a major challenge. Most existing approaches rely on manually designed processes, which limits their adaptability across different tasks. While a few methods attempt automated workflow generation, they are often tied to specific datasets or query types and make limited use of intermediate feedback, reducing system robustness and reasoning depth. Moreover, their operations are typically predefined and inflexible. To address these limitations, we propose DyFlow, a dynamic workflow generation framework that adaptively constructs and adjusts reasoning procedures based on task requirements and real-time intermediate feedback, thereby enhancing cross-task generalization. DyFlow consists of two core components: a designer and an executor. The designer decomposes complex problems into a sequence of sub-goals defined by high-level objectives and dynamically plans the next steps based on intermediate outputs and feedback. These plans are then carried out by the executor, which executes each operation using dynamic operators with context-aware parameterization, enabling flexible and semantically grounded reasoning. We systematically evaluate DyFlow across diverse domains, including social reasoning, biomedical tasks, mathematical problem solving, and code generation. Results demonstrate that DyFlow significantly outperforms existing baselines, achieving substantial Pass@k improvements and exhibiting robust generalization across diverse domains.

PDF Details

TMLR Journal 2025 Journal Article

Fair Online Influence Maximization

Xiangqi Wang
Shaokun Zhang
Jose Efraim Aguilar Escamilla
Qingyun Wu
Xiangliang Zhang
Jian Kang
Huazheng Wang

Fair influence maximization in networks has been actively studied to ensure equity in fields like viral marketing and public health. Existing studies often assume an offline setting, meaning that the learner identifies a set of seed nodes with known per-edge activation probabilities. In this paper, we study the problem of fair online influence maximization, i.e., without knowing the ground-truth activation probabilities. The learner in this problem aims to maximally propagate the information among demographic groups, while interactively selecting seed nodes and observing the activation feedback on the fly. We propose Fair Online Influence Maximization (FOIM) framework that can solve the online influence maximization problem under a wide range of fairness notions. Given a fairness notion, FOIM solves the problem with a combinatorial multi-armed bandit algorithm for balancing exploration-exploitation and an offline fair influence maximization oracle for seed nodes selection. FOIM enjoys sublinear regret when the fairness notion satisfies two mild conditions, i.e., monotonicity and bounded smoothness. Our analyses show that common fairness notions, including maximin fairness, diversity fairness, and welfare function, all satisfy the condition, and we prove the corresponding regret upper bounds under these notions. Extensive empirical evaluations on three real-world networks demonstrate the efficacy of our proposed framework.

PDF Details