Arrow Research search

Author name cluster

Qiang He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

EnViT: Enhancing the Performance of Early-Exit Vision Transformers via Exit-Aware Structured Dropout-Enabled Self-Distillation

  • Yonghao Dong
  • Qiang He
  • Penghong Rui
  • Zhenzhe Zheng
  • Zhao Li
  • Feifei Chen
  • Hai Jin
  • Yun Yang

Vision Transformers (ViTs) have gained significant attention and widespread adoption due to their impressive performance in various computer vision tasks. However, in practice, their substantial computational overhead often leads to high inference latency when deployed on resource-constrained edge devices like smartphones, autonomous vehicles, and robots. To address these challenges, Early Exit (EE) has emerged as a promising approach for lightweight inference on edge devices. It accelerates inference and reduces computational overhead by adaptively producing predictions through early exits based on sample complexity. Existing EE methods typically suffer from substantial accuracy decreases in late exits while providing only marginal accuracy improvements to early exits. This paper presents EnViT, an exit-aware structured dropout-enabled self-distillation approach that enhances the performance of early exits without compromising late exits. EnViT leverages structured dropout to enable self-distillation, where the full model serves as the teacher and its own virtual sub-models generated by structured dropout serve as students. This mechanism effectively distills knowledge from the full model to early exits and avoids performance degradation in late exits by mitigating parameter conflicts across exits during training. Evaluation on five datasets shows that EnViT achieves accuracy improvements ranging from 0.36% to 7.92% while maintaining competitive speed-up ratios of 1.72x to 2.23x.

TMLR Journal 2026 Journal Article

One Model for All: Multi-Objective Controllable Language Models

  • Qiang He
  • Yucheng Yang
  • Tianyi Zhou
  • Meng Fang
  • Mykola Pechenizkiy
  • Setareh Maghsudi

Aligning large language models (LLMs) with human preferences is critical for enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned from average human ratings, which may weaken adaptability and controllability across varying preferences. However, creating personalized LLMs requires aligning LLMs with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences in multi-objective trade-offs, varying from emphasizing empathy in certain contexts to demanding efficiency and precision in others. Can we train one LLM to produce personalized outputs across different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains a single LLM to directly generate responses in the preference-defined regions of the Pareto front. Our approach introduces multi-objective optimization (MOO) principles into RLHF to train an LLM as a preference-conditioned policy network. We improve the computational efficiency of MOC by applying MOO at the policy level, enabling us to fine-tune a 7B-parameter model on a single A6000 GPU. Extensive experiments demonstrate the advantages of MOC over baselines in three aspects: (i) controllability of LLM outputs w.r.t. user preferences on the trade-off among multiple rewards; (ii) quality and diversity of LLM outputs, measured by the hyper-volume of multiple solutions achieved; and (iii) generalization to unseen preferences. These results highlight MOC's potential for real-world applications requiring scalable and customizable LLMs.

AAAI Conference 2026 Conference Paper

Understanding and Enhancing Differentiable Architecture Search from Information Bottleneck Perspective

  • Haidong Kang
  • Lianbo Ma
  • Pengjun Chen
  • Qiang He
  • Bo Yi

Performance collapse is an intractable issue in Differentiable Architecture Search (DAS), where DAS suffers severe performance degradation when trained on different search spaces or datasets. We theoretically analyze the issue from the information bottleneck (IB) perspective, and disclose that a solution to this problem is to seek the bifurcation point of the IB tradeoff between compression and prediction in the supernet. To this end, we propose a simple yet highly effective method, namely Batch Entropy-decay Regularization (BER), to guide the learning of DAS, which restricts compression in DAS by imposing a penalty on the architecture parameters. Comprehensive theoretical analyses demonstrate that BER is able to completely resolve DAS's performance collapse issue. Compared with a number of state-of-the-art DAS variants, BER shows overwhelmingly better performance on 7 search spaces (i.e., NAS-Bench-201, DARTS, S1-S4, MobileNet-like) and 5 popular datasets (i.e., CIFAR-10, CIFAR-100, ImageNet1k, PASCAL VOC 2007, and MS COCO 2017).

AAAI Conference 2025 Conference Paper

DiffGrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model

  • Yonghao Zhang
  • Qiang He
  • Yanguang Wan
  • Yinda Zhang
  • Xiaoming Deng
  • Cuixia Ma
  • Hongan Wang

Generating high-quality whole-body human object interaction motion sequences is becoming increasingly important in various fields such as animation, VR/AR, and robotics. The main challenge of this task lies in determining the level of involvement of each hand given the complex shapes of objects in different sizes and their different motion trajectories, while ensuring strong grasping realism and guaranteeing the coordination of movement in all body parts. Contrasting with existing work, which either generates human interaction motion sequences without detailed hand grasping poses or only models a static grasping pose, we propose a simple yet effective framework that jointly models the relationship between the body, hands, and the given object motion sequences within a single diffusion model. To guide our network in perceiving the object's spatial position and learning more natural grasping poses, we introduce novel contact-aware losses and incorporate a data-driven, carefully designed guidance. Experimental results demonstrate that our approach outperforms the state-of-the-art method and generates plausible results.

AAAI Conference 2025 Conference Paper

HHAN: Comprehensive Infectious Disease Source Tracing via Heterogeneous Hypergraph Neural Network

  • Qiang He
  • Yunting Bao
  • Hui Fang
  • Yuting Lin
  • Hao Sun

Infectious diseases have historically had profound effects on global health, economies, and social structures. Effective tracing of infectious diseases is essential not only for immediate public health responses but also for shaping future prevention strategies. Traditional tracing methods often emphasize homogeneous networks, overlooking the diverse transmission characteristics of heterogeneous populations. This research addresses two critical challenges: the heterogeneity of transmission across various media and modes, and the significant yet underexplored influence of community structures on epidemic spread and tracing. We propose a Heterogeneous Hypergraph Attention Network (HHAN) model that accounts for multiple transmission pathways and patterns within heterogeneous networks. HHAN integrates a heterogeneous graph neural network module to handle the complexity of communication among different populations, and an Agent-Based Modeling Module that combines agent-based ideas to model individual behaviors. This approach effectively captures complex interactions within community structures and addresses individual variability. Experimental results on three real-world datasets demonstrate that the HHAN model significantly outperforms other state-of-the-art methods in tackling the complex challenge of tracing infectious diseases in heterogeneous populations.

ICML Conference 2025 Conference Paper

Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning

  • Lianbo Ma
  • Jianlun Ma
  • Yuee Zhou
  • Guoyang Xie
  • Qiang He
  • Zhichao Lu

Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive search for quantization strategies on large-scale datasets. To resolve this issue, we introduce a novel approach that first searches for quantization strategies on small datasets and then generalizes them to large-scale datasets. This approach simplifies the process, eliminating the need for large-scale quantization fine-tuning and only necessitating model weight adjustment. Our method is characterized by three key techniques: sharpness-aware minimization for enhanced quantized model generalization, implicit gradient direction alignment to handle gradient conflicts among different optimization objectives, and an adaptive perturbation radius to accelerate optimization. It offers advantages such as requiring no intricate computation of feature maps and achieving high search efficiency. Both theoretical analysis and experimental results validate our approach. Using the CIFAR10 dataset (just 0.5% of the size of the ImageNet training data) for MPQ policy search, we achieved equivalent accuracy on ImageNet at a significantly lower computational cost, while improving efficiency by up to 150% over the baselines.

NeurIPS Conference 2025 Conference Paper

Sim-LLM: Optimizing LLM Inference at the Edge through Inter-Task KV Reuse

  • Ruikun Luo
  • Changwei Gu
  • Qiang He
  • Feifei Chen
  • Song Wu
  • Hai Jin
  • Yun Yang

KV cache technology, by storing key-value pairs, helps reduce the computational overhead incurred by large language models (LLMs). It facilitates their deployment on resource-constrained edge computing nodes like edge servers. However, as the complexity and size of tasks increase, KV cache usage leads to substantial GPU memory consumption. Existing research has focused on mitigating KV cache memory usage through sequence length reduction, task-specific compression, and dynamic eviction policies. However, these methods are computationally expensive for resource-constrained edge computing nodes. To tackle this challenge, this paper presents Sim-LLM, a novel inference optimization mechanism that leverages task similarity to reduce KV cache memory consumption for LLMs. By caching KVs from processed tasks and reusing them for subsequent similar tasks during inference, Sim-LLM significantly reduces memory consumption while boosting system throughput and increasing maximum batch size, all with minimal accuracy degradation. Evaluated on both A40 and A100 GPUs, Sim-LLM achieves a system throughput improvement of up to 39.40% and a memory reduction of up to 34.65%, compared to state-of-the-art approaches. Our source code is available at https://github.com/CGCL-codes/SimLLM.

ICLR Conference 2024 Conference Paper

Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation

  • Qiang He
  • Tianyi Zhou 0001
  • Meng Fang
  • Setareh Maghsudi

Representation rank is an important concept for understanding the role of Neural Networks (NNs) in Deep Reinforcement Learning (DRL), which measures the expressive capacity of value networks. Existing studies focus on unboundedly maximizing this rank; nevertheless, that approach would introduce overly complex models during learning, thus undermining performance. Hence, fine-tuning representation rank presents a challenging and crucial optimization problem. To address this issue, we find a guiding principle for adaptive control of the representation rank. We employ the Bellman equation as a theoretical foundation and derive an upper bound on the cosine similarity of the representations of consecutive state-action pairs in value networks. We then leverage this upper bound to propose a novel regularizer, namely the BEllman Equation-based automatic rank Regularizer (BEER). This regularizer adaptively regularizes the representation rank, thus improving the DRL agent's performance. We first validate the effectiveness of automatic rank control in illustrative experiments. Then, we scale up BEER to complex continuous control tasks by combining it with the deterministic policy gradient method. Across 12 challenging DeepMind control tasks, BEER outperforms the baselines by a large margin. Besides, BEER demonstrates significant advantages in Q-value approximation. Our code is available at https://github.com/sweetice/BEER-ICLR2024.

ICML Conference 2024 Conference Paper

Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

  • Chen Zhang
  • Qiang He
  • Yuan Zhou
  • Elvis S. Liu
  • Hong Wang
  • Jian Zhao 0010
  • Yang Wang

Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a popular fighting game with over 100 million registered users. Shūkai quantifies the state to enhance generalizability, introducing Heterogeneous League Training (HELT) to achieve balanced competence, generalizability, and training efficiency. Furthermore, Shūkai implements specific rewards to align the agent’s behavior with human expectations. Shūkai’s ability to generalize is demonstrated by its consistent competence across all characters, even though it was trained on only 13% of them. Additionally, HELT exhibits a remarkable 22% improvement in sample efficiency. Shūkai serves as a valuable training partner for players in Naruto Mobile, enabling them to enhance their abilities and skills.

ICLR Conference 2024 Conference Paper

Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning

  • Yucheng Yang
  • Tianyi Zhou 0001
  • Qiang He
  • Lei Han 0001
  • Mykola Pechenizkiy
  • Meng Fang

Unsupervised reinforcement learning (URL) aims to learn general skills for unseen downstream tasks. Mutual Information Skill Learning (MISL) addresses URL by maximizing the mutual information between states and skills but lacks sufficient theoretical analysis, e.g., how well its learned skills can initialize a downstream task's policy. Our new theoretical analysis shows that the diversity and separability of learned skills are fundamentally critical to downstream task adaptation but MISL does not necessarily guarantee them. To improve MISL, we propose a novel disentanglement metric LSEPIN and build an information-geometric connection between LSEPIN and downstream task adaptation cost. For better geometric properties, we investigate a new strategy that replaces the KL divergence in information geometry with the Wasserstein distance. We extend the geometric analysis to it, which leads to a novel skill-learning objective, WSEP. It is theoretically justified to be helpful to task adaptation and it is capable of discovering more initial policies for downstream tasks than MISL. We further propose a Wasserstein distance-based algorithm, PWSEP, which can theoretically discover all potentially optimal initial policies.

AAMAS Conference 2023 Conference Paper

Centralized Cooperative Exploration Policy for Continuous Control Tasks

  • Chao Li
  • Chen Gong
  • Qiang He
  • Xinwen Hou
  • Yu Liu

Despite recent works making great progress in continuous control tasks, exploration in these tasks has remained insufficiently investigated. This paper proposes CCEP (Centralized Cooperative Exploration Policy), which utilizes the estimation biases of value functions to contribute to exploration capacity. CCEP keeps two value functions initialized with different parameters, and generates diverse policies with multiple exploration styles from this pair of value functions. In addition, a centralized policy framework ensures that CCEP achieves message delivery between multiple policies, further contributing to exploring the environment cooperatively. Extensive experimental results demonstrate that CCEP achieves higher exploration capacity. Empirical analysis shows diverse exploration styles in the policies learned by CCEP, reaping benefits in more exploration regions. Besides, the exploration capabilities of CCEP have been demonstrated to outperform current state-of-the-art methods on multiple continuous control tasks.

NeurIPS Conference 2023 Conference Paper

Diffusion Model for Graph Inverse Problems: Towards Effective Source Localization on Complex Networks

  • Xin Yan
  • Hui Fang
  • Qiang He

Information diffusion problems, such as the spread of epidemics or rumors, are widespread in society. The inverse problems of graph diffusion, which involve locating the sources and identifying the paths of diffusion based on currently observed diffusion graphs, are crucial to controlling the spread of information. The problem of localizing the source of diffusion is highly ill-posed, presenting a major obstacle in accurately assessing the uncertainty involved. Besides, while comprehending how information diffuses through a graph is crucial, there is a scarcity of research on reconstructing the paths of information propagation. To tackle these challenges, we propose a probabilistic model called DDMSL (Discrete Diffusion Model for Source Localization). Our approach is based on the natural diffusion process of information propagation over complex networks, which can be formulated using a message-passing function. First, we model the forward diffusion of information using Markov chains. Then, we design a reversible residual network to construct a denoising-diffusion model in discrete space for both source localization and reconstruction of information diffusion paths. We provide rigorous theoretical guarantees for DDMSL and demonstrate its effectiveness through extensive experiments on five real-world datasets.

NeurIPS Conference 2023 Conference Paper

Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control

  • Chao Li
  • Chen Gong
  • Qiang He
  • Xinwen Hou

The combination of deep reinforcement learning (DRL) with ensemble methods has proven highly effective in addressing complex sequential decision-making problems. This success can be primarily attributed to the utilization of multiple models, which enhances both the robustness of the policy and the accuracy of value function estimation. However, there has been limited analysis of the empirical success of current ensemble RL methods thus far. Our new analysis reveals that the sample efficiency of previous ensemble DRL algorithms may be limited by sub-policies that are not as diverse as they could be. Motivated by these findings, our study introduces a new ensemble RL algorithm, termed Trajectories-awarE Ensemble exploratioN (TEEN). The primary goal of TEEN is to maximize the expected return while promoting more diverse trajectories. Through extensive experiments, we demonstrate that TEEN not only enhances the sample diversity of the ensemble policy compared to using sub-policies alone but also improves performance over ensemble RL algorithms. On average, TEEN outperforms the baseline ensemble DRL algorithms by 41% on the tested representative environments.

TIST Journal 2022 Journal Article

Algorithms for Trajectory Points Clustering in Location-based Social Networks

  • Nan Han
  • Shaojie Qiao
  • Kun Yue
  • Jianbin Huang
  • Qiang He
  • Tingting Tang
  • Faliang Huang
  • Chunlin He

Recent advances in localization techniques have fundamentally enhanced social networking services, allowing users to share their locations and location-related contents. This has further increased the popularity of location-based social networks (LBSNs) and produces a huge amount of trajectories composed of continuous and complex spatio-temporal points from people’s daily lives. How to accurately aggregate large-scale trajectories is an important and challenging task. Conventional clustering algorithms (e.g., k-means or k-medoids) cannot be directly employed to process trajectory data due to its serialization, triviality and redundancy. Aiming to overcome the drawbacks of the traditional k-means and k-medoids algorithms, including their sensitivity to the selection of the initial k value and cluster centers, and their easy convergence to locally optimal solutions, we first propose an optimized k-means algorithm (namely OKM) to obtain k optimal initial clustering centers based on the density of trajectory points. Second, because k-means is sensitive to noisy points, we propose an improved k-medoids algorithm called IKMD based on an acceptable radius r, considering users’ geographic locations in LBSNs. The value of k can be calculated based on r, and the k optimal points with high densities are selected as the initial clustering centers to reduce the cost of distance calculation. Third, we thoroughly analyze the advantages of IKMD by comparing it with the commonly used clustering approaches through illustrative examples. Last, we conduct extensive experiments to evaluate the performance of IKMD against seven clustering approaches, including the proposed optimized k-means algorithm, the k-medoids algorithm, the traditional density-based k-medoids algorithm and state-of-the-art trajectory clustering methods. The results demonstrate that IKMD significantly outperforms existing algorithms in the cost of distance calculation and the convergence speed. The proposed methods contribute to a larger effort aimed at advancing the study of intelligent trajectory data analytics.

TIST Journal 2021 Journal Article

A Dynamic Convolutional Neural Network Based Shared-Bike Demand Forecasting Model

  • Shaojie Qiao
  • Nan Han
  • Jianbin Huang
  • Kun Yue
  • Rui Mao
  • Hongping Shu
  • Qiang He
  • Xindong Wu

Bike-sharing systems are becoming popular and generate a large volume of trajectory data. In a bike-sharing system, users can borrow and return bikes at different stations. In particular, a bike-sharing system will be affected by weather, the time period, and other dynamic factors, which challenges the scheduling of shared bikes. In this article, a new shared-bike demand forecasting model based on dynamic convolutional neural networks, called SDF, is proposed to predict the demand of shared bikes. SDF chooses the most relevant weather features from real weather data by using the Pearson correlation coefficient and transforms them into a two-dimensional dynamic feature matrix, taking into account the states of stations from historical data. The feature information in the matrix is extracted, learned, and trained with a newly proposed dynamic convolutional neural network to predict the demand of shared bikes in a dynamical and intelligent fashion. The phase of parameter update is optimized from three aspects: the loss function, optimization algorithm, and learning rate. Then, an accurate shared-bike demand forecasting model is designed based on the basic idea of minimizing the loss value. By comparing with classical machine learning models, the weight sharing strategy employed by SDF reduces the complexity of the network. It allows a high prediction accuracy to be achieved within a relatively short period of time. Extensive experiments are conducted on real-world bike-sharing datasets to evaluate SDF. The results show that SDF significantly outperforms classical machine learning models in prediction accuracy and efficiency.

IJCAI Conference 2018 Conference Paper

A Fast Algorithm for Optimally Finding Partially Disjoint Shortest Paths

  • Longkun Guo
  • Yunyun Deng
  • Kewen Liao
  • Qiang He
  • Timos Sellis
  • Zheshan Hu

The classical disjoint shortest path problem has recently attracted renewed interest from researchers in the network planning and optimization community. However, the requirement that the shortest paths be completely vertex- or edge-disjoint may be too restrictive and demands far more resources in a network. Partially disjoint shortest paths, in which a bounded number of shared vertices or edges is allowed, balance the degree of disjointness against the occupied network resources. In this paper, we consider the problem of finding k shortest paths which are edge disjoint but partially vertex disjoint. For a pair of distinct vertices in a network graph, the problem aims to optimally find k edge-disjoint shortest paths among which at most a bounded number of vertices are shared by at least two paths. In particular, we present novel techniques for exactly solving the problem with a runtime that significantly improves the current best result. The proposed algorithm is also validated by computer experiments on both synthetic and real networks, which demonstrate its superior efficiency, running up to three orders of magnitude faster than the state of the art.