Arrow Research search

Author name cluster

Qi Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
1 author row

Possible papers

20

EAAI Journal 2026 Journal Article

A bidirectional cross global attention based network for aeroengine remaining useful life prediction

  • Hao Dong
  • Yankun Sheng
  • Shuyang Luo
  • Jiexiang Hu
  • Qi Zhou

A major challenge in aeroengine remaining useful life (RUL) prediction is to effectively capture both the temporal dynamics and the complex inter-sensor dependencies inherent in multivariate time series (MTS) monitoring data. Existing methods often fail to fully exploit correlations between temporal sequences and sensor channels. To address this limitation, a bidirectional cross-global attention-based network (BCGA) is developed. The proposed network integrates a bidirectional Gated Recurrent Unit (BiGRU), a cross-attention module, and a global-attention module. First, the BiGRU learns dynamic patterns along both the temporal and sensor dimensions, capturing temporal evolution and inter-sensor associations. Next, the cross-attention module explicitly models cross-dependencies between time steps and sensors, enabling bidirectional information exchange and stronger modeling of inter-channel coupling. Finally, the global-attention module adaptively reweights time steps and sensor outputs to extract the most diagnostic global features, improving the robustness and interpretability of RUL prediction. Compared with state-of-the-art networks, the BCGA network reduces mean Root Mean Square Error (RMSE) by 0.40 % and mean Score by 3 % on the NASA-provided turbofan engine data (C-MAPSS) across diverse operating conditions. These results show that BCGA effectively captures degradation patterns, improves prediction accuracy, and exhibits strong robustness and generalizability, highlighting its potential for practical predictive maintenance.

YNIMG Journal 2026 Journal Article

Gestational age-specific DTI templates of the neonatal brain: Application in preterm developmental study

  • Xiaochen Jiang
  • Mengyi Wang
  • Ying Liu
  • Tianhao Zhang
  • Guangjuan Mao
  • Qi Zhou
  • Shilun Zhao
  • Baoci Shan

Due to significant differences in brain volume, morphology, and white matter integrity among neonates of varying gestational ages, using a single full-term template for preterm analysis inevitably introduces analytical errors. To address this, we aimed to develop gestational-age-specific stereotaxic DTI templates using retrospective diffusion MRI scans from 161 neonates acquired between August 2021 and January 2024. The cohort was stratified into four WHO-defined subgroups: extremely preterm (n = 31), very preterm (n = 29), moderate to late preterm (n = 28), and full-term (n = 73). Templates were constructed via iterative registration, with corresponding atlases transformed from JHU space and manually corrected. Quantitative evaluation using the Jacobian determinant and standard deviation revealed that our age-specific templates demonstrated significantly lower deformation magnitude and registration error compared to a standard full-term template. When applied to investigate developmental differences, we observed progressively more extensive fractional anisotropy reductions from moderate-to-late to extremely preterm neonates. Notably, commissural fibers, particularly the corpus callosum body (0.194 ± 0.005 in extremely preterm vs. 0.230 ± 0.003 in full-term, p < 0.001), exhibited significant developmental gradients. Consequently, these constructed gestational-age-specific DTI templates offer a robust tool to improve the accuracy of morbidity risk predictions and facilitate multicenter studies of preterm neonates.

AAAI Conference 2026 Conference Paper

Reconstruction Attack-Resistant Inference Paradigm for LLM Cloud Services

  • Zipeng Ye
  • Wenjian Luo
  • Qi Zhou
  • Yubo Tang

Large language models (LLMs) have seen remarkable growth in recent years. To leverage convenient LLM cloud services, users inevitably have to upload their prompts. Additionally, tasks such as translation, reading comprehension, and summarization inherently require associated files or context, whether or not they contain users' private information. Despite the rapid progress in LLM capabilities, research on preserving user privacy during inference has been relatively scarce. To this end, this paper conducts exploratory research in this domain. First, we show that (1) the embedding space of tokens is highly sparse, and (2) LLMs primarily function in the orthogonal subspace of the embedding space; these two factors make privacy extremely vulnerable. Then, we analyze the structural characteristics of LLMs and design a distributed privacy-preserving inference paradigm that can effectively resist privacy attacks. Finally, we perform a thorough evaluation of the defended models on mainstream tasks and find that low-bit quantization techniques can be effectively combined with our inference paradigm, achieving a balance between privacy, utility, and runtime memory efficiency.

NeurIPS Conference 2025 Conference Paper

A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers

  • Zhixiao Wu
  • Yao Lu
  • Jie Wen
  • Hao Sun
  • Qi Zhou
  • Guangming Lu

Poison-only Clean-label Backdoor Attacks (PCBAs) aim to covertly inject attacker-desired behavior into DNNs by merely poisoning the dataset without changing the labels. To effectively implant a backdoor, multiple triggers have been proposed for various attack requirements of Attack Success Rate (ASR) and stealthiness. Additionally, sample selection enhances clean-label backdoor attacks' ASR by meticulously selecting "hard" samples instead of random samples to poison. Current methods, however, (1) usually handle sample selection and triggers in isolation, leading to severely limited improvements on both ASR and stealthiness. Consequently, attacks exhibit unsatisfactory performance on evaluation metrics when converted to PCBAs via a mere stacking of methods. Therefore, we explore the bidirectional collaborative relations between sample selection and triggers to address this dilemma. (2) Due to the strong specificity of triggers, simply combining sample selection and triggers fails to substantially enhance both evaluation metrics while preserving generalization across various attacks. Therefore, we propose a set of components that significantly improve both stealthiness and ASR based on the commonalities of attacks. Specifically, Component A ascertains two critical selection factors and combines them appropriately based on the trigger scale to select more reasonable "hard" samples for improving ASR. Component B selects samples similar to relevant trigger-implanted samples to promote stealthiness. Component C reassigns trigger poisoning intensity across RGB colors, exploiting the human visual system's distinct sensitivity to each channel, for higher ASR, with stealthiness ensured by sample selection including Component B. Furthermore, all components can be strategically integrated into diverse PCBAs, enabling tailored solutions that balance ASR and stealthiness enhancement for specific attack requirements. Extensive experiments demonstrate the superiority of our components in stealthiness, ASR, and generalization. Our code will be released as soon as possible.

NeurIPS Conference 2025 Conference Paper

InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models

  • Yanggan Gu
  • Yuanyi Wang
  • Zhaoyi Yan
  • Yiming Zhang
  • Qi Zhou
  • Fei Wu
  • Hongxia Yang

Model fusion combines multiple Large Language Models (LLMs) with different strengths into a more powerful, integrated model through lightweight training methods. Existing work on model fusion focuses primarily on supervised fine-tuning (SFT), leaving preference alignment (PA), a critical phase for enhancing LLM performance, largely unexplored. The few existing fusion methods for the PA phase, such as WRPO, simplify the process by utilizing only response outputs from source models while discarding their probability information. To address this limitation, we propose InfiFPO, a preference optimization method for implicit model fusion. InfiFPO replaces the reference model in Direct Preference Optimization (DPO) with a fused source model that synthesizes multi-source probabilities at the sequence level, circumventing the complex vocabulary alignment challenges of previous works while maintaining the probability information. By introducing probability clipping and max-margin fusion strategies, InfiFPO enables the pivot model to align with human preferences while effectively distilling knowledge from source models. Comprehensive experiments on 11 widely used benchmarks demonstrate that InfiFPO consistently outperforms existing model fusion and preference optimization methods. When using Phi-4 as the pivot model, InfiFPO improves its average performance from 79.95 to 83.33 on the 11 benchmarks, significantly improving its capabilities in mathematics, coding, and reasoning tasks.

NeurIPS Conference 2025 Conference Paper

InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion

  • Yuanyi Wang
  • Zhaoyi Yan
  • Yiming Zhang
  • Qi Zhou
  • Yanggan Gu
  • Fei Wu
  • Hongxia Yang

Recent advances in large language models (LLMs) have intensified efforts to fuse heterogeneous open-source models into a unified system that inherits their complementary strengths. Existing logit-based fusion methods maintain inference efficiency but treat vocabulary dimensions independently, overlooking semantic dependencies encoded by cross-dimension interactions. These dependencies reflect how token types interact under a model's internal reasoning and are essential for aligning models with diverse generation behaviors. To explicitly model these dependencies, we propose InfiGFusion, the first structure-aware fusion framework with a novel Graph-on-Logits Distillation (GLD) loss. Specifically, we retain the top-k logits per output and aggregate their outer products across sequence positions to form a global co-activation graph, where nodes represent vocabulary channels and edges quantify their joint activations. To ensure scalability and efficiency, we design a sorting-based closed-form approximation that reduces the original O(n^4) cost of the Gromov-Wasserstein distance to O(n log n), with provable approximation guarantees. Experiments across multiple fusion settings show that GLD consistently improves fusion quality and stability. InfiGFusion outperforms SOTA models and fusion baselines across 11 benchmarks spanning reasoning, coding, and mathematics. It shows particular strength in complex reasoning tasks, with a +35.6 improvement on Multistep Arithmetic and +37.06 on Causal Judgement over SFT, demonstrating superior multi-step and relational inference.

AAAI Conference 2024 Conference Paper

High-Fidelity Gradient Inversion in Distributed Learning

  • Zipeng Ye
  • Wenjian Luo
  • Qi Zhou
  • Yubo Tang

Distributed learning frameworks aim to train global models by sharing gradients among clients while preserving the data privacy of each individual client. However, extensive research has demonstrated that these learning frameworks do not absolutely ensure privacy, as training data can be reconstructed from shared gradients. Nevertheless, existing privacy-breaking attack methods have certain limitations. Some are applicable only to small models, while others can only recover images at small batch sizes and low resolutions, or with low fidelity. Furthermore, when a training batch contains several samples with the same label, existing attack methods usually perform poorly. In this work, we address the limitations of existing attacks in two steps. First, we model the coefficient of variation (CV) of features and design an evolutionary algorithm based on the minimum CV to accurately reconstruct the labels of all training data. Second, we propose a stepwise gradient inversion attack, which dynamically adapts the objective function, thereby effectively and rationally promoting the convergence of attack results towards an optimal solution. With these two steps, our method is able to recover high-resolution images (224×224 pixels, from ImageNet and the Web) with high fidelity in distributed learning scenarios involving complex models and larger batch sizes. Experimental results demonstrate the superiority of our approach, reveal the potential vulnerabilities of the distributed learning paradigm, and emphasize the necessity of developing more secure mechanisms. Source code is available at https://github.com/MiLab-HITSZ/2023YeHFGradInv.

AAAI Conference 2024 Conference Paper

Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving

  • Junkai Xu
  • Liang Peng
  • Haoran Cheng
  • Linxuan Xia
  • Qi Zhou
  • Dan Deng
  • Wei Qian
  • Wenxiao Wang

Multi-camera perception tasks have gained significant attention in the field of autonomous driving. However, existing frameworks based on Lift-Splat-Shoot (LSS) in the multi-camera setting cannot produce suitable dense 3D features due to the projection nature and uncontrollable densification process. To resolve this problem, we propose to regulate intermediate dense 3D features with the help of volume rendering. Specifically, we employ volume rendering to process the dense 3D features to obtain corresponding 2D features (e.g., depth maps, semantic maps), which are supervised by associated labels during training. This regulates the generation of dense 3D features at the feature level, providing appropriately dense and unified features for multiple perception tasks. Our approach is therefore termed Vampire, which stands for "Volume rendering As Multi-camera Perception Intermediate feature REgulator". Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features, and is competitive with existing SOTA methods across diverse downstream perception tasks such as 3D occupancy prediction, LiDAR segmentation, and 3D object detection, while utilizing moderate GPU resources. We provide a video demonstration in the supplementary materials, and code is available at github.com/cskkxjk/Vampire.

AAAI Conference 2023 Conference Paper

Efficient Exploration in Resource-Restricted Reinforcement Learning

  • Zhihai Wang
  • Taoxing Pan
  • Qi Zhou
  • Jie Wang

In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable within each episode. Typical applications include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as soft actor-critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources quickly, after which subsequent exploration is severely restricted by the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable use of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving sample efficiency by up to an order of magnitude.

AAAI Conference 2023 Conference Paper

Robust Representation Learning by Clustering with Bisimulation Metrics for Visual Reinforcement Learning with Distractions

  • Qiyuan Liu
  • Qi Zhou
  • Rui Yang
  • Jie Wang

Recent work has shown that representation learning plays a critical role in sample-efficient reinforcement learning (RL) from pixels. Unfortunately, in real-world scenarios, representation learning is usually fragile to task-irrelevant distractions such as variations in background or viewpoint. To tackle this problem, we propose a novel clustering-based approach, namely Clustering with Bisimulation Metrics (CBM), which learns robust representations by grouping visual observations in the latent space. Specifically, CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments. Computing cluster assignments with bisimulation metrics enables CBM to capture task-relevant information, as bisimulation metrics quantify the behavioral similarity between observations. Moreover, CBM encourages the consistency of representations within each group, which helps filter out task-irrelevant information and thus induces representations that are robust to distractions. An appealing feature is that CBM can achieve sample-efficient representation learning even when multiple distractions exist simultaneously. Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms and achieves state-of-the-art performance in both multiple- and single-distraction settings. The code is available at https://github.com/MIRALab-USTC/RL-CBM.

AAAI Conference 2022 Conference Paper

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

  • Yufei Kuang
  • Miao Lu
  • Jie Wang
  • Qi Zhou
  • Bin Li
  • Houqiang Li

Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm—namely, State-Conservative Policy Optimization (SCPO)—to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO include that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.

AAAI Conference 2022 Conference Paper

Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic

  • Zhihai Wang
  • Jie Wang
  • Qi Zhou
  • Bin Li
  • Houqiang Li

Model-based reinforcement learning algorithms, which aim to learn a model of the environment to make decisions, are more sample-efficient than their model-free counterparts. The sample efficiency of model-based approaches relies on whether the model can approximate the environment well. However, learning an accurate model is challenging, especially in complex and noisy environments. To tackle this problem, we propose the conservative model-based actor-critic (CMBAC), a novel approach that achieves high sample efficiency without relying strongly on accurate learned models. Specifically, CMBAC learns multiple estimates of the Q-value function from a set of inaccurate models and uses the average of the bottom-k estimates, a conservative estimate, to optimize the policy. An appealing feature of CMBAC is that the conservative estimates effectively encourage the agent to avoid unreliable "promising actions", whose values are high in only a small fraction of the models. Experiments demonstrate that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks, and the proposed method is more robust than previous methods in noisy environments.

AAAI Conference 2021 Conference Paper

EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation

  • Qi Zhou
  • Haipeng Chen
  • Yitao Zheng
  • Zhen Wang

As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval, and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks such as sentiment analysis and peer-reviewer assignment that are based on LDA. In this paper, we are interested in whether LDA models are vulnerable to adversarial perturbations of benign document examples at inference time. We formalize the evasion attack on LDA models as an optimization problem and prove it to be NP-hard. We then propose a novel and efficient algorithm, EvaLDA, to solve it. We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, on the NIPS dataset, EvaLDA can on average promote the rank of a target topic from 10 to around 7 by replacing only 1% of the words in a victim document with similar words. Our work provides significant insights into the power and limitations of evasion attacks on LDA models.

AAAI Conference 2020 Conference Paper

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

  • Qi Zhou
  • Houqiang Li
  • Jie Wang

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose a Policy Optimization method with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve asymptotic performance using the uncertainty in Q-values. We derive an upper bound on the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of the policy to inaccurate models. Experiments show that POMBU outperforms existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.

NeurIPS Conference 2020 Conference Paper

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

  • Qi Zhou
  • Yufei Kuang
  • Zherui Qiu
  • Houqiang Li
  • Jie Wang

Many recent reinforcement learning (RL) methods learn stochastic policies with entropy regularization for exploration and robustness. However, in continuous action spaces, integrating entropy regularization with expressive policies is challenging and usually requires complex inference procedures. To tackle this problem, we propose a novel regularization method that is compatible with a broad range of expressive policy architectures. An appealing feature is that the estimation of our regularization terms is simple and efficient even when the policy distributions are unknown. We show that our approach can effectively promote exploration in continuous action spaces. Based on our regularization, we propose an off-policy actor-critic algorithm. Experiments demonstrate that the proposed algorithm outperforms state-of-the-art regularized RL methods in continuous control tasks.