Arrow Research search

Author name cluster

Guozheng Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

TMLR Journal 2025 Journal Article

Are Large Language Models Really Robust to Word-Level Perturbations?

  • Haoyu Wang
  • Guozheng Ma
  • Cong Yu
  • Ning Gui
  • Linrui Zhang
  • Zhiqi Huang
  • Suwei Ma
  • Yongzhe Chang

The swift advancement in the scales and capabilities of Large Language Models (LLMs) positions them as promising tools for a variety of downstream tasks. In addition to the pursuit of better performance and the avoidance of violent feedback on a certain prompt, to ensure the responsibility of the LLMs, much attention is drawn to the robustness of LLMs. However, existing evaluation methods mostly rely on traditional question answering datasets with predefined supervised labels, potentially ignoring the superior generation capabilities of contemporary LLMs. To investigate the robustness of LLMs while using their generation ability, we propose a novel rational evaluation pipeline that leverages reward models as diagnostic tools to evaluate the long conversation generated from more challenging open questions by LLMs, which we refer to as the Reward Model for Reasonable Robustness Evaluation (TREvaL). Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions, a capability not entirely encompassed by individual words or letters.Our extensive empirical experiments demonstrate that TREvaL provides an identification for the lack of robustness of nowadays LLMs.Notably, we are surprised to discover that robustness tends to decrease as fine-tuning (SFT and RLHF) is conducted, calling for more attention on the robustness during alignment process.

ICML Conference 2025 Conference Paper

Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer

  • Yilun Kong
  • Guozheng Ma
  • Qi Zhao
  • Haoyu Wang 0018
  • Li Shen 0008
  • Xueqian Wang 0001
  • Dacheng Tao

Despite recent advancements in offline multi-task reinforcement learning (MTRL) have harnessed the powerful capabilities of the Transformer architecture, most approaches focus on a limited number of tasks, with scaling to extremely massive tasks remaining a formidable challenge. In this paper, we first revisit the key impact of task numbers on current MTRL method, and further reveal that naively expanding the parameters proves insufficient to counteract the performance degradation as the number of tasks escalates. Building upon these insights, we propose M3DT, a novel mixture-of-experts (MoE) framework that tackles task scalability by further unlocking the model’s parameter scalability. Specifically, we enhance both the architecture and the optimization of the agent, where we strengthen the Decision Transformer (DT) backbone with MoE to reduce task load on parameter subsets, and introduce a three-stage training mechanism to facilitate efficient training with optimal performance. Experimental results show that, by increasing the number of experts, M3DT not only consistently enhances its performance as model expansion on the fixed task numbers, but also exhibits remarkable task scalability, successfully extending to 160 tasks with superior performance.

ICML Conference 2025 Conference Paper

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

  • Guozheng Ma
  • Lu Li
  • Zilin Wang 0002
  • Li Shen 0008
  • Pierre-Luc Bacon
  • Dacheng Tao

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.

AAMAS Conference 2024 Conference Paper

Normalization Enhances Generalization in Visual Reinforcement Learning

  • Lu Li
  • Jiafei Lyu
  • Guozheng Ma
  • Zilin Wang
  • Zhenjie Yang
  • Xiu Li
  • Zhiheng Li

Recent advances in visual reinforcement learning (RL) have led to impressive success in handling complex tasks. However, these methods have demonstrated limited generalization capability to visual disturbances, which poses a significant challenge to their real-world application and adaptability. Though normalization techniques have demonstrated huge success in supervised and unsupervised learning, their applications in visual RL are still scarce. In this paper, we explore the potential benefits of integrating normalization into visual RL methods with respect to generalization performance. We find that, perhaps surprisingly, incorporating suitable normalization techniques is sufficient to enhance the generalization capabilities, without any additional special design. We utilize the combination of two normalization techniques, CrossNorm and SelfNorm, for generalizable visual RL. Extensive experiments are conducted on DMControl Generalization Benchmark, CARLA, and ProcGen Benchmark to validate the effectiveness of our method. We show that our method significantly improves generalization capability while only marginally affecting sample efficiency. In particular, when integrated with DrQ-v2, our method enhances the test performance of DrQ-v2 on CARLA across various scenarios, from 14% of the training performance to 97%. Our project page: https: //sites. google. com/view/norm-generalization-vrl/home

ICLR Conference 2024 Conference Paper

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

  • Guozheng Ma
  • Lu Li
  • Sen Zhang 0006
  • Zixuan Liu 0002
  • Zhen Wang 0030
  • Yixin Chen 0001
  • Li Shen 0008
  • Xueqian Wang 0001

Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influences of various components within the VRL framework on the agent's plasticity are still poorly understood. In this work, we conduct a systematic empirical exploration focusing on three primary underexplored facets and derive the following insightful conclusions: (1) data augmentation is essential in maintaining plasticity; (2) the critic's plasticity loss serves as the principal bottleneck impeding efficient training; and (3) without timely intervention to recover critic's plasticity in the early stages, its loss becomes catastrophic. These insights suggest a novel strategy to address the high replay ratio (RR) dilemma, where exacerbated plasticity loss hinders the potential improvements of sample efficiency brought by increased reuse frequency. Rather than setting a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic’s plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.

NeurIPS Conference 2023 Conference Paper

Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning

  • Guozheng Ma
  • Linrui Zhang
  • Haoyu Wang
  • Lu Li
  • Zilin Wang
  • Zhen Wang
  • Li Shen
  • Xueqian Wang

Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra auxiliary representation tasks or pre-trained encoders. However, it remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL. To investigate this issue and further explore the potential of DA, this work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy and provides the following insights and improvements: (1) For individual DA operations, we reveal that both ample spatial diversity and slight hardness are indispensable. Building on this finding, we introduce Random PadResize (Rand PR), a new DA operation that offers abundant spatial diversity with minimal hardness. (2) For multi-type DA fusion schemes, the increased DA hardness and unstable data distribution result in the current fusion schemes being unable to achieve higher sample efficiency than their corresponding individual operations. Taking the non-stationary nature of RL into account, we propose a RL-tailored multi-type DA fusion scheme called Cycling Augmentation (CycAug), which performs periodic cycles of different DA operations to increase type diversity while maintaining data distribution consistency. Extensive evaluations on the DeepMind Control suite and CARLA driving simulator demonstrate that our methods achieve superior sample efficiency compared with the prior state-of-the-art methods.

IJCAI Conference 2022 Conference Paper

Don’t Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

  • Zhecheng Yuan
  • Guozheng Ma
  • Yao Mu
  • Bo Xia
  • Bo Yuan
  • Xueqian Wang
  • Ping Luo
  • Huazhe Xu

One of the key challenges in visual Reinforcement Learning (RL) is to learn policies that can generalize to unseen environments. Recently, data augmentation techniques aiming at enhancing data diversity have demonstrated proven performance in improving the generalization ability of learned policies. However, due to the sensitivity of RL training, naively applying data augmentation, which transforms each pixel in a task-agnostic manner, may suffer from instability and damage the sample efficiency, thus further exacerbating the generalization performance. At the heart of this phenomenon is the diverged action distribution and high-variance value estimation in the face of augmented images. To alleviate this issue, we propose Task-aware Lipschitz Data Augmentation (TLDA) for visual RL, which explicitly identifies the task-correlated pixels with large Lipschitz constants, and only augments the task-irrelevant pixels for stability. We verify the effectiveness of our approach on DeepMind Control suite, CARLA and DeepMind Manipulation tasks. The extensive empirical results show that TLDA improves both sample efficiency and generalization; it outperforms previous state-of-the-art methods across 3 different visual control benchmarks.