Arrow Research

Author name cluster

Ran Cheng

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

11 papers
2 author rows

Possible papers (11)

ICRA 2025 · Conference Paper

Beyond Traversing in a Thin Pipe: Self-Sensing Odometry of a Pipeline Robot Driven by High-Frequency Dielectric Elastomer Actuators

  • Ran Cheng
  • Qi Shao
  • Xin-Jun Liu
  • Huichan Zhao

In this paper, we propose an earthworm-inspired miniature pipeline robot capable of self-sensing odometry. The robot features a dielectric elastomer actuator (DEA) as its elongation body and two specially designed passive anchors to achieve unidirectional motion without slipping. Odometry is achieved through the self-sensing scheme of DEAs and the summation of all step sizes over a period. The careful implementation of the self-sensing method resulted in a fine sensing resolution of 0.05 mm at a high actuation frequency of 20 Hz for a cylindrical DEA. Finally, the robot achieved self-sensing odometry in a pipe, showing good consistency with the ground truth. This work paves a new way for a miniature in-pipe robot to sense its own state without additional sensors, saving space and power.
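The odometry itself is just an accumulation of per-cycle step sizes recovered from the actuator's self-sensing readout. A minimal sketch, assuming a linear capacitance-to-elongation model (the gain, function names, and values below are illustrative stand-ins, not the paper's sensing circuit):

```python
# Hypothetical sketch: odometry as the sum of per-cycle step sizes.
# The linear capacitance-to-elongation model and the gain are assumptions.

def estimate_step_size(c_rest: float, c_peak: float,
                       gain_mm_per_farad: float) -> float:
    """Map the per-cycle capacitance swing of the DEA to a step size in mm
    (linear model assumed for illustration only)."""
    return gain_mm_per_farad * (c_peak - c_rest)

def odometry(capacitance_cycles, gain_mm_per_farad=1.0e10):
    """Sum the step sizes of all actuation cycles over a period."""
    distance_mm = 0.0
    for c_rest, c_peak in capacitance_cycles:
        distance_mm += estimate_step_size(c_rest, c_peak, gain_mm_per_farad)
    return distance_mm

# Example: three 20 Hz cycles, each with a ~5 pF capacitance swing.
cycles = [(100.0e-12, 105.0e-12)] * 3
print(f"traveled {odometry(cycles):.2f} mm")  # traveled 0.15 mm
```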

NeurIPS 2025 · Conference Paper

Diversity-Aware Policy Optimization for Large Language Model Reasoning

  • Jian Yao
  • Ran Cheng
  • Xingyu Wu
  • Jibin Wu
  • KC Tan

The reasoning capabilities of large language models (LLMs) have advanced rapidly, particularly following the release of DeepSeek-R1, which has inspired a surge of research into data quality and reinforcement learning (RL) algorithms. Despite the pivotal role diversity plays in RL, its influence on LLM reasoning remains largely underexplored. To bridge this gap, this work presents a systematic investigation into the impact of diversity in RL-based training for LLM reasoning, and proposes a novel diversity-aware policy optimization method. Across evaluations on 12 LLMs, we observe a strong positive correlation between solution diversity and potential@k (a novel metric quantifying an LLM’s reasoning potential) in high-performing models. This finding motivates our method to explicitly promote diversity during RL training. Specifically, we design a token-level diversity measure, reformulate it into a practical objective, and selectively apply it to positive samples. Integrated into the R1-zero training framework, our method achieves a 3.5% average improvement across four mathematical reasoning benchmarks, while generating more diverse and robust solutions.
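As one concrete reading of such an objective, the PyTorch sketch below adds a token-level diversity bonus to a policy-gradient loss and applies it only to positive samples, as the abstract describes. The entropy-based bonus and the coefficient are illustrative assumptions; the paper's exact objective is not reproduced here.

```python
# Hedged sketch: policy-gradient loss plus a token-level diversity bonus
# applied only to positive samples. The entropy bonus is an illustrative
# stand-in for the paper's "token-level diversity" objective.
import torch

def pg_loss_with_diversity(logits, actions, advantages, div_coef=0.01):
    """logits: (B, T, V); actions: (B, T); advantages: (B,)."""
    logp = torch.log_softmax(logits, dim=-1)                         # (B, T, V)
    token_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)
    pg = -(advantages.unsqueeze(-1) * token_logp).mean()

    # Token-level entropy as a diversity proxy, masked so it only acts
    # on positive samples (advantage > 0).
    entropy = -(logp.exp() * logp).sum(-1)                           # (B, T)
    positive = (advantages > 0).float().unsqueeze(-1)                # (B, 1)
    diversity = (entropy * positive).mean()
    return pg - div_coef * diversity

loss = pg_loss_with_diversity(torch.randn(2, 5, 11),
                              torch.randint(11, (2, 5)),
                              torch.tensor([1.0, -0.5]))
print(loss.item())
```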

NeurIPS 2025 · Conference Paper

MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization

  • Zeyuan Ma
  • Yue-Jiao Gong
  • Hongshu Guo
  • Wenjie Qiu
  • Sijie Ma
  • Hongqiao Lian
  • Jiajun Zhan
  • Kaixu Chen

Meta-Black-Box Optimization (MetaBBO) streamlines the automation of optimization algorithm design through meta-learning. It typically employs a bi-level structure: the meta-level policy undergoes meta-training to reduce the manual effort required in developing algorithms for low-level optimization tasks. The original MetaBox (2023) provided the first open-source framework for reinforcement learning-based single-objective MetaBBO. However, its relatively narrow scope no longer keeps pace with the swift advancement of this field. In this paper, we introduce MetaBox-v2 (https://github.com/MetaEvo/MetaBox) as a milestone upgrade with four novel features: 1) a unified architecture supporting RL, evolutionary, and gradient-based approaches, by which we reproduce 23 up-to-date baselines; 2) efficient parallelization schemes, which reduce training/testing time by 10-40x; 3) a comprehensive benchmark suite of 18 synthetic/realistic tasks (1900+ instances) spanning single-objective, multi-objective, multi-model, and multi-task optimization scenarios; 4) plentiful and extensible interfaces for custom analysis/visualization and integration with external optimization tools/benchmarks. To show the utility of MetaBox-v2, we carry out a systematic case study that evaluates the built-in baselines in terms of optimization performance, generalization ability, and learning efficiency. Valuable insights for practitioners and newcomers to the field are drawn from thorough and detailed analysis.
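The bi-level structure the abstract refers to can be illustrated independently of the MetaBox-v2 API (which is not reproduced here). In the toy sketch below, a hand-written meta-level policy adapts the step size of a low-level random-search optimizer; in MetaBBO, this policy would instead be meta-trained with RL, evolutionary, or gradient-based approaches.

```python
# Generic, hypothetical sketch of bi-level MetaBBO: a meta-level policy
# configures a low-level black-box optimizer step by step. Names and the
# toy random-search optimizer are illustrative only; this is not MetaBox.
import random

def sphere(x):  # toy black-box objective
    return sum(v * v for v in x)

def meta_policy(stalled_steps: int) -> float:
    """Meta-level decision: shrink the search radius as progress stalls.
    A learned policy would replace this hand-written rule."""
    return 0.5 ** stalled_steps  # step-size multiplier

def low_level_optimize(f, dim=5, iters=200):
    x = [random.uniform(-5, 5) for _ in range(dim)]
    best, stalled = f(x), 0
    for _ in range(iters):
        radius = meta_policy(stalled)
        cand = [v + random.gauss(0, radius) for v in x]
        fc = f(cand)
        if fc < best:
            x, best, stalled = cand, fc, 0
        else:
            stalled += 1
    return best

print(f"best objective: {low_level_optimize(sphere):.4f}")
```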

ICRA 2025 · Conference Paper

Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation

  • Minjie Zhu
  • Yichen Zhu 0001
  • Jinming Li
  • Junjie Wen
  • Zhiyuan Xu
  • Ning Liu 0007
  • Ran Cheng
  • Chaomin Shen 0001

Diffusion Policy is a powerful tool for learning end-to-end visuomotor robot control. It is expected to possess scalability, a key attribute of deep neural networks, whereby increasing model size leads to enhanced performance. However, our observations indicate that Diffusion Policy in a transformer architecture (DP-T) struggles to scale effectively; even minor additions of layers can deteriorate training outcomes. To address this issue, we introduce a Scalable Diffusion Transformer Policy for visuomotor learning. Our proposed method, named ScaleDP, introduces two modules that improve the training dynamics of Diffusion Policy and allow the network to better handle multimodal action distributions. First, we identify that DP-T suffers from large-gradient issues, which make the optimization of Diffusion Policy unstable. To resolve this, we factorize the feature embedding of the observation into multiple affine layers and integrate them into the transformer blocks. Additionally, we utilize non-causal attention, which allows the policy network to “see” future actions during prediction, helping to reduce compounding errors. We demonstrate that our proposed method successfully scales Diffusion Policy from 10 million to 1 billion parameters, with improved performance and generalization. We benchmark ScaleDP across 50 different tasks from MetaWorld and find that our largest ScaleDP outperforms DP-T with an average improvement of 21.6%. Across 7 real-world robot tasks, ScaleDP demonstrates an average improvement of 36.25% over DP-T on four single-arm tasks and 75% on three bimanual tasks. We believe our work paves the way for scaling up models for visuomotor learning. The project page is available at https://scaling-diffusion-policy.github.io/.
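A hedged PyTorch sketch of the two ingredients named above: per-block affine (scale/shift) conditioning derived from the observation embedding, and non-causal self-attention over the action sequence. This is an AdaLN-style reading of the abstract, not the authors' exact implementation; all dimensions are illustrative.

```python
# Sketch, assuming AdaLN-style conditioning: the observation embedding is
# factorized into per-block scale/shift parameters, and attention carries
# no causal mask so action tokens can attend to "future" actions.
import torch
import torch.nn as nn

class AffineConditionedBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_affine = nn.Linear(dim, 2 * dim)  # per-block affine params

    def forward(self, tokens, obs_emb):
        scale, shift = self.to_affine(obs_emb).chunk(2, dim=-1)  # (B, D) each
        h = self.norm(tokens) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        # Non-causal: no attn_mask, every token attends to every other.
        out, _ = self.attn(h, h, h, need_weights=False)
        return tokens + out

block = AffineConditionedBlock(dim=64)
acts = torch.randn(2, 16, 64)   # (batch, action horizon, dim)
obs = torch.randn(2, 64)        # observation embedding
print(block(acts, obs).shape)   # torch.Size([2, 16, 64])
```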

IROS 2024 · Conference Paper

MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation

  • Jiaqi Yang
  • Yucong Chen
  • Xiangting Meng
  • Chenxin Yan
  • Min Li
  • Ran Cheng
  • Lige Liu
  • Tao Sun

Recently, there has been growing interest in category-level object pose and size estimation, and prevailing methods commonly rely on single-view RGB-D images. However, one disadvantage of such methods is that they require accurate depth maps, which consumer-grade sensors cannot produce. Furthermore, many practical real-world situations involve a moving camera that continuously observes its surroundings, and the temporal information in the input video streams is simply overlooked by single-view methods. We propose a novel solution that makes use of RGB video streams. Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph optimizer. The SLAM module utilizes a video stream and additional scale-sensitive readings to estimate camera poses and metric depth. The object pose predictor then generates canonical object representations from RGB images. The object pose is estimated through geometric registration of these canonical object representations with estimated object depth points. All per-view estimates finally undergo optimization within a pose graph, culminating in the output of robust and accurate canonical object poses. Our experimental results demonstrate that, when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits performance comparable to state-of-the-art RGB-D methods. We also collect and evaluate new datasets containing depth maps of varying quality to further benchmark the proposed method quantitatively alongside previous RGB-D based methods, and demonstrate a significant advantage in scenarios where depth input is absent or depth sensing quality is limited.
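The geometric registration step, aligning canonical object representations with estimated object depth points, can be illustrated with a standard closed-form similarity alignment (Umeyama). This is a generic stand-in for the paper's registration module, with synthetic data in place of real predictions.

```python
# Umeyama-style similarity registration: a standard algorithm standing in
# for the paper's canonical-to-depth registration step.
import numpy as np

def umeyama(src: np.ndarray, dst: np.ndarray):
    """Closed-form similarity transform: dst ~= s * R @ src + t.
    src, dst: (N, 3) corresponding points."""
    n = len(src)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / ((xs ** 2).sum() / n)
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic check: recover a known scale/rotation/translation.
rng = np.random.default_rng(0)
canonical = rng.normal(size=(100, 3))              # canonical object points
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1.0
observed = 2.0 * canonical @ R_true.T + np.array([0.1, -0.3, 0.5])
s, R, t = umeyama(canonical, observed)
print(round(s, 3))  # ~2.0
```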

IROS 2023 · Conference Paper

Scale Jump-Aware Pose Graph Relaxation for Monocular SLAM with Re-Initializations

  • Runze Yuan
  • Ran Cheng
  • Lige Liu
  • Tao Sun
  • Laurent Kneip

Pose graph relaxation has become an indispensable addition to SLAM, enabling efficient global registration of sensor reference frames under the objective of satisfying pair-wise relative transformation constraints. These constraints may be given by incremental motion estimation or by global place recognition. While the place-recognition case enables loop closures and drift compensation, care has to be taken in the monocular case, in which local estimates of structure and displacement can differ from reality not just in terms of noise but also by a scale factor. Owing to the accumulation of scale propagation errors, this scale factor drifts over time; hence, scale-drift-aware pose graph relaxation has been introduced. We extend this idea to cases in which the relative scale between subsequent sensor frames is unknown, a situation that can easily occur if monocular SLAM enters re-initialization and no reliable overlap between successive local maps can be identified. The approach is realized by a hybrid pose graph formulation that combines the regular similarity consistency terms with novel, scale-blind constraints. We apply the technique to the practically relevant case of small indoor service robots capable of effectuating purely rotational displacements, a condition that can easily cause tracking failures. We demonstrate that globally consistent trajectories can be recovered even if multiple re-initializations occur along the loop, and present an in-depth study of success and failure cases.
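A schematic numpy sketch of the hybrid formulation: regular similarity edges constrain both relative pose and relative (log-)scale, while the scale-blind edges used across re-initializations drop the scale-consistency term and instead let the estimated scale rescale the measured displacement. Poses are reduced to 2D position plus log-scale here; the paper's full Sim(3) residuals are not reproduced.

```python
# Schematic residuals for a hybrid pose graph with similarity edges and
# scale-blind edges. Poses are simplified to (x, y, log_scale).
import numpy as np

def similarity_edge_residual(pose_i, pose_j, meas_dp, meas_dlogs):
    """Regular edge: penalize both position and log-scale drift."""
    dp = pose_j[:2] - pose_i[:2]
    dlogs = pose_j[2] - pose_i[2]
    return np.concatenate([dp - meas_dp, [dlogs - meas_dlogs]])

def scale_blind_edge_residual(pose_i, pose_j, meas_dp):
    """Edge across a re-initialization: the relative scale is unknown, so
    the measured displacement (expressed in frame i's local scale) is
    rescaled by the estimated scale instead of being constrained."""
    dp = pose_j[:2] - pose_i[:2]
    return dp - np.exp(pose_i[2]) * meas_dp

a = np.array([0.0, 0.0, 0.0])        # pose at unit scale
b = np.array([1.0, 0.0, np.log(2)])  # pose whose map scale drifted to 2x
print(similarity_edge_residual(a, b, np.array([1.0, 0.0]), 0.0))  # flags drift
print(scale_blind_edge_residual(a, b, np.array([1.0, 0.0])))      # scale-agnostic
```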

IROS 2021 · Conference Paper

Latent Attention Augmentation for Robust Autonomous Driving Policies

  • Ran Cheng
  • Christopher Agia
  • Florian Shkurti
  • David Meger
  • Gregory Dudek

Model-free reinforcement learning has become a viable approach for vision-based robot control. However, sample complexity and adaptability to domain shifts remain persistent challenges when operating in high-dimensional observation spaces (images, LiDAR), such as those involved in autonomous driving. In this paper, we propose a flexible framework in which a policy’s observations are augmented with robust attention representations in the latent space to guide the agent’s attention during training. Our method encodes local and global descriptors of the augmented state representations into a compact latent vector, and scene dynamics are approximated by a recurrent network that processes the latent vectors in sequence. We outline two approaches for constructing attention maps: a supervised pipeline leveraging semantic segmentation networks, and an unsupervised pipeline relying only on classical image processing techniques. We conduct our experiments in simulation and test the learned policy against varying seasonal effects and weather conditions. Our design decisions are supported by a series of ablation studies. The results demonstrate that our state augmentation method both improves learning efficiency and encourages robust domain adaptation when compared to common end-to-end frameworks and methods that learn directly from intermediate representations.
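The unsupervised branch can be sketched with classical image processing alone. Below, a gradient-magnitude map stands in for the attention map and is appended to the observation as an extra channel; the operators and shapes are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: an unsupervised attention map from classical image
# processing, stacked onto the RGB observation before encoding.
import numpy as np

def gradient_attention(gray: np.ndarray) -> np.ndarray:
    """Normalized gradient-magnitude map as a crude saliency/attention proxy."""
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return mag / (mag.max() + 1e-8)

def augment_observation(rgb: np.ndarray) -> np.ndarray:
    """Append the attention map as a 4th channel: (H, W, 3) -> (H, W, 4)."""
    gray = rgb.mean(axis=-1)
    attn = gradient_attention(gray)
    return np.concatenate([rgb, attn[..., None]], axis=-1)

frame = np.random.rand(64, 64, 3)
print(augment_observation(frame).shape)  # (64, 64, 4)
```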

ICRA 2021 · Conference Paper

Lite-HDSeg: LiDAR Semantic Segmentation Using Lite Harmonic Dense Convolutions

  • Ryan Razani
  • Ran Cheng
  • Ehsan Taghavi
  • Bingbing Liu

Autonomous driving vehicles and robotic systems rely on accurate perception of their surroundings, and scene understanding is one of the crucial components of their perception modules. Among available sensors, LiDARs are an essential sensing modality of autonomous driving systems due to their active sensing nature and high-resolution readings. Accurate and fast semantic segmentation methods are needed to fully utilize LiDAR sensors for scene understanding. In this paper, we present Lite-HDSeg, a novel real-time convolutional neural network for semantic segmentation of full 3D LiDAR point clouds. Lite-HDSeg achieves the best accuracy vs. computational complexity trade-off on the SemanticKITTI benchmark and is designed on the basis of a new encoder-decoder architecture with light-weight harmonic dense convolutions as its core. Moreover, we introduce ICM, an improved global contextual module to capture multi-scale contextual features, and MCSPN, a multi-class Spatial Propagation Network to further refine the semantic boundaries. Our experimental results show that the proposed method outperforms state-of-the-art semantic segmentation approaches that run in real time, making it suitable for robotic and autonomous driving applications.
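The "harmonic dense" connectivity underlying the core convolutions can be sketched in HarDNet style: layer k receives skip inputs from layers k - 2^i, giving dense-style feature reuse at a fraction of the connections. The PyTorch block below is an illustrative reading of that pattern, with hypothetical channel counts, not the Lite-HDSeg architecture itself.

```python
# Hedged sketch of harmonic dense connectivity (HarDNet-style): layer k
# concatenates the outputs of layers k-1, k-2, k-4, ... as its input.
import torch
import torch.nn as nn

class HarmonicDenseBlock(nn.Module):
    def __init__(self, channels: int = 32, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        for k in range(1, n_layers + 1):
            n_inputs = len(self._links(k))
            self.layers.append(
                nn.Conv2d(n_inputs * channels, channels, 3, padding=1))

    @staticmethod
    def _links(k: int):
        """Indices of earlier outputs feeding layer k: k - 2^i while valid."""
        links, step = [], 1
        while k - step >= 0:
            links.append(k - step)
            step *= 2
        return links

    def forward(self, x):
        outs = [x]
        for k, conv in enumerate(self.layers, start=1):
            inp = torch.cat([outs[i] for i in self._links(k)], dim=1)
            outs.append(torch.relu(conv(inp)))
        return outs[-1]

block = HarmonicDenseBlock()
print(block(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```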

ICRA 2021 · Conference Paper

S3Net: 3D LiDAR Sparse Semantic Segmentation Network

  • Ran Cheng
  • Ryan Razani
  • Yuan Ren
  • Bingbing Liu

Semantic segmentation is a crucial component in the perception systems of many applications, such as robotics and autonomous driving, that rely on accurate environmental perception and understanding. In the literature, several approaches have been introduced for the LiDAR semantic segmentation task, including projection-based (range-view or bird's-eye-view) and voxel-based approaches. However, they either abandon valuable 3D topology and geometric relations, suffering from the information loss introduced by the projection process, or are inefficient. Therefore, there is a need for accurate models capable of processing the 3D driving-scene point cloud in 3D space. In this paper, we propose S3Net, a novel convolutional neural network for LiDAR point cloud semantic segmentation. It adopts an encoder-decoder backbone consisting of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM) to emphasize fine details both within each feature map and among nearby feature maps. To extract global contexts in deeper layers, we introduce a Sparse Residual Tower built upon sparse convolutions that suit the varying sparsity of LiDAR point clouds. In addition, a geo-aware anisotropic loss is leveraged to emphasize semantic boundaries and penalize noise within each predicted region, leading to robust predictions. Our experimental results show that the proposed method yields a large improvement (12%) over its baseline counterpart (MinkNet42 [1]) on the SemanticKITTI [2] test set and achieves state-of-the-art mIoU among semantic segmentation approaches.
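The boundary-emphasizing intent of the geo-aware anisotropic loss can be illustrated with a simplified stand-in: a cross-entropy whose per-point weight is raised near label boundaries. Dense 1D tensors replace the sparse point cloud here, and the boundary test is a toy heuristic; the paper's exact formulation is not reproduced.

```python
# Hedged sketch: boundary-weighted cross-entropy as a simplified stand-in
# for a loss that emphasizes semantic boundaries.
import torch
import torch.nn.functional as F

def boundary_weighted_ce(logits, labels, boundary_weight=4.0):
    """logits: (N, C); labels: (N,), points assumed ordered along a scanline."""
    ce = F.cross_entropy(logits, labels, reduction="none")   # (N,)
    # Toy boundary test: a point whose label differs from the next point.
    boundary = torch.zeros_like(labels, dtype=torch.float32)
    boundary[:-1] = (labels[:-1] != labels[1:]).float()
    weights = 1.0 + (boundary_weight - 1.0) * boundary
    return (weights * ce).mean()

logits = torch.randn(8, 3)
labels = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])
print(boundary_weighted_ce(logits, labels).item())
```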

IROS 2018 · Conference Paper

Vision-Based Autonomous Underwater Swimming in Dense Coral for Combined Collision Avoidance and Target Selection

  • Travis Manderson
  • Juan Camilo Gamboa Higuera
  • Ran Cheng
  • Gregory Dudek

We address the problem of learning vision-based, collision-avoiding, and target-selecting controllers in 3D, specifically in underwater environments densely populated with coral reefs. Using a highly maneuverable, dynamic, six-legged (or flippered) vehicle to swim underwater, we exploit real-time visual feedback to make close-range navigation decisions that would be hard to achieve with other sensors. Our approach uses computer vision as the sole mechanism for both collision avoidance and visual target selection. In particular, we seek to swim close to the reef to make observations while avoiding both collisions and barren, coral-deprived regions. To carry out path selection while avoiding collisions, we use monocular image data processed in real time. The proposed system uses a convolutional neural network that takes an image from a forward-facing camera as input and predicts unscaled, relative path changes. The network is trained to encode our desired obstacle-avoidance and reef-exploration objectives via supervised learning from human-labeled data. The predictions from the network are transformed into absolute path changes via a combination of a temporally smoothed proportional controller for heading targets and a low-level motor controller. This system enables safe and autonomous coral reef navigation in underwater environments. We validate our approach using an untethered and fully autonomous robot swimming through coral reefs in the open ocean. Our robot successfully traverses 1000 m of the ocean floor collision-free while collecting close-up footage of coral reefs.
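The final control stage, turning unscaled relative predictions into absolute heading commands, can be sketched as below. The exponential smoothing and the gain values are illustrative assumptions, not the paper's tuned controller.

```python
# Hedged sketch: temporally smoothed proportional controller that converts
# relative path-change predictions into absolute heading commands.
class SmoothedHeadingController:
    def __init__(self, kp: float = 0.8, alpha: float = 0.2):
        self.kp = kp          # proportional gain (illustrative)
        self.alpha = alpha    # exponential smoothing factor (illustrative)
        self.target = 0.0     # smoothed absolute heading target (rad)

    def step(self, heading: float, relative_change: float) -> float:
        """Blend the new network prediction into the heading target, then
        return a proportional yaw command toward that target."""
        self.target = ((1 - self.alpha) * self.target
                       + self.alpha * (heading + relative_change))
        return self.kp * (self.target - heading)

ctrl = SmoothedHeadingController()
for pred in [0.3, 0.25, -0.1]:            # network outputs (relative yaw)
    print(round(ctrl.step(heading=0.0, relative_change=pred), 3))
```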