Arrow Research search

Author name cluster

Bowen Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

10

AAAI 2026 · Conference Paper

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

  • Yuqi Pang
  • Bowen Yang
  • Yun Cao
  • Rong Fan
  • Xiaoyu Li
  • Chen He

Vision large language models (VLLMs) focus primarily on handling complex and fine-grained visual information by incorporating advanced vision encoders and scaling up visual models. However, these approaches face high training and inference costs, as well as challenges in extracting visual details and effectively bridging across modalities. In this work, we propose a novel visual framework, MoCHA, to address these issues. Our framework integrates four vision backbones (i.e., CLIP, SigLIP, DINOv2 and ConvNeXt) to extract complementary visual features and is equipped with a sparse Mixture of Experts Connectors (MoECs) module that dynamically selects experts tailored to different visual dimensions. To mitigate redundant or insufficient use of the visual information encoded by the MoECs module, we further design a Hierarchical Group Attention (HGA) with intra- and inter-group operations and an adaptive gating strategy for encoded visual features. We train MoCHA on two mainstream LLMs (Phi2-2.7B and Vicuna-7B) and evaluate its performance across various benchmarks. Notably, MoCHA outperforms state-of-the-art open-weight models on various tasks. For example, compared to CuMo (Mistral-7B), our MoCHA (Phi2-2.7B) shows a stronger ability to mitigate hallucination, improving POPE by 3.25%, and to follow visual instructions, gaining 153 points on MME. Finally, ablation studies further confirm the effectiveness and robustness of the proposed MoECs and HGA in improving the overall performance of MoCHA.
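
The abstract does not specify the MoEC routing rule; a minimal sketch of a generic sparse top-k mixture-of-experts connector in the same spirit, with all names, shapes, and the softmax-over-selected-experts mixing being assumptions rather than the paper's actual design:

```python
import numpy as np

def moe_connect(features, gate_w, expert_ws, top_k: int = 2):
    """Route each visual token to its top-k experts and mix their
    outputs by renormalized gate weights.

    features:  (n_tokens, d_in) visual token features
    gate_w:    (d_in, n_experts) gating projection
    expert_ws: list of (d_in, d_out) per-expert projections
    """
    logits = features @ gate_w                       # (n_tokens, n_experts)
    top = np.argsort(logits, axis=1)[:, -top_k:]     # chosen experts per token
    out = np.zeros((features.shape[0], expert_ws[0].shape[1]))
    for t in range(features.shape[0]):
        g = np.exp(logits[t, top[t]])
        g /= g.sum()                                 # softmax over chosen experts
        for w, e in zip(g, top[t]):
            out[t] += w * (features[t] @ expert_ws[e])
    return out
```

Sparsity here comes from evaluating only `top_k` of the experts per token, which is what keeps the connector cheap relative to a dense mixture.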

ICLR 2025 · Conference Paper

Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function

  • Linlin Yu
  • Bowen Yang
  • Tianhao Wang
  • Kangshuo Li
  • Feng Chen 0001

The fusion of raw sensor data to create a Bird's Eye View (BEV) representation is critical for autonomous vehicle planning and control. Despite the growing interest in using deep learning models for BEV semantic segmentation, anticipating segmentation errors and enhancing the explainability of these models remain underexplored. This paper introduces a comprehensive benchmark for predictive uncertainty quantification in BEV segmentation, evaluating multiple uncertainty quantification methods across three popular datasets with three representative network architectures. Our study focuses on the effectiveness of quantified uncertainty in detecting misclassified and out-of-distribution (OOD) pixels while also improving model calibration. Through empirical analysis, we uncover challenges in existing uncertainty quantification methods and demonstrate the potential of evidential deep learning techniques, which capture both aleatoric and epistemic uncertainty. To address these challenges, we propose a novel loss function, Uncertainty-Focal-Cross-Entropy (UFCE), specifically designed for highly imbalanced data, along with a simple uncertainty-scaling regularization term that improves both uncertainty quantification and model calibration for BEV segmentation.
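
The exact UFCE formulation is not given in the abstract; as a reference point, the standard focal modulation of cross-entropy that it builds on, which down-weights well-classified pixels so that rare classes dominate the gradient (a sketch, not the paper's loss):

```python
import numpy as np

def focal_cross_entropy(probs, targets, gamma: float = 2.0):
    """Focal variant of cross-entropy for imbalanced pixel labels.

    probs:   (n_pixels, n_classes) predicted class probabilities
    targets: (n_pixels,) integer class labels
    gamma:   focusing parameter; gamma=0 recovers plain cross-entropy
    """
    p_t = probs[np.arange(len(targets)), targets]     # prob of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

With `gamma > 0`, confident correct predictions (`p_t` near 1) contribute almost nothing, which is the property that makes focal-style losses attractive for the highly imbalanced BEV segmentation setting the abstract describes.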

NeurIPS 2025 · Conference Paper

Rope to Nope and Back Again: A New Hybrid Attention Strategy

  • Bowen Yang
  • Bharat Venkitesh
  • Dwaraknath Gnaneshwar Talupuru
  • Hangyu Lin
  • David Cairuz
  • Phil Blunsom
  • Acyr Locatelli

Long-context large language models (LLMs) have achieved remarkable advancements, driven by techniques like Rotary Position Embedding (RoPE) (Su et al., 2023) and its extensions (Chen et al., 2023; Liu et al., 2024c; Peng et al., 2023). By adjusting RoPE parameters and incorporating training data with extended contexts, we can train performant models with considerably longer input sequences. However, existing RoPE-based methods exhibit performance limitations when applied to extended context lengths. This paper presents a comprehensive analysis of various attention mechanisms, including RoPE, No Positional Embedding (NoPE), and Query-Key Normalization (QK-Norm), identifying their strengths and shortcomings in long-context modeling. Our investigation identifies distinctive attention patterns in these methods and highlights their impact on long-context performance, providing valuable insights for architectural design. Building on these findings, we propose a novel architecture featuring a hybrid attention mechanism that integrates global and local attention spans. This design not only surpasses conventional RoPE-based transformer models with full attention in both long and short context tasks but also delivers substantial efficiency gains during training and inference.
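
One common way to realize a global/local hybrid is to interleave a few full-attention layers among sliding-window layers; the abstract does not give the paper's layer pattern, so the interleaving ratio and mask construction below are illustrative assumptions:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal mask restricted to the most recent `window` tokens."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def causal_mask(seq_len: int) -> np.ndarray:
    """Unrestricted causal (global) attention mask."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i

def layer_masks(n_layers: int, seq_len: int, window: int, global_every: int):
    """One global-attention layer per `global_every` layers; the rest
    attend only within a local sliding window."""
    return [causal_mask(seq_len) if (l + 1) % global_every == 0
            else sliding_window_mask(seq_len, window)
            for l in range(n_layers)]
```

The efficiency gain comes from the local layers: each row of a sliding-window mask has at most `window` active positions, so their attention cost is linear rather than quadratic in sequence length.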

NeurIPS 2024 · Conference Paper

Rethinking Imbalance in Image Super-Resolution for Efficient Inference

  • Wei Yu
  • Bowen Yang
  • Qinglin Liu
  • Jianing Li
  • Shengping Zhang
  • Xiangyang Ji

Existing super-resolution (SR) methods optimize all model weights equally using $\mathcal{L}_1$ or $\mathcal{L}_2$ losses by uniformly sampling image patches without considering dataset imbalances or parameter redundancy, which limits their performance. To address this, we formulate the image SR task as an imbalanced distribution transfer learning problem from a statistical probability perspective, proposing a plug-and-play Weight-Balancing framework (WBSR) to achieve balanced model learning without changing the original model structure and training data. Specifically, we develop a Hierarchical Equalization Sampling (HES) strategy to address data distribution imbalances, enabling better feature representation from texture-rich samples. To tackle model optimization imbalances, we propose a Balanced Diversity Loss (BDLoss) function, focusing on learning texture regions while disregarding redundant computations in smooth regions. After joint training of HES and BDLoss to rectify these imbalances, we present a gradient projection dynamic inference strategy to facilitate accurate and efficient inference. Extensive experiments across various models, datasets, and scale factors demonstrate that our method achieves comparable or superior performance to existing approaches with about 34% reduction in computational cost.
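
The abstract's HES strategy biases patch sampling toward texture-rich regions; a minimal sketch of that idea using gradient energy as a texture proxy (the scoring function and sampling scheme here are assumptions, not the paper's hierarchical procedure):

```python
import numpy as np

def texture_score(patch: np.ndarray) -> float:
    """Gradient-energy proxy for how texture-rich a patch is."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.mean(gx ** 2 + gy ** 2))

def equalized_sample(patches, n_draw: int, rng=None):
    """Draw training patches with probability proportional to texture
    score, so texture-rich patches are seen more often than smooth ones."""
    rng = rng or np.random.default_rng(0)
    scores = np.array([texture_score(p) for p in patches]) + 1e-8
    probs = scores / scores.sum()
    return rng.choice(len(patches), size=n_draw, p=probs)
```

Under uniform sampling, large smooth regions (sky, walls) dominate the loss; weighting draws by texture score shifts gradient updates toward the detail-bearing patches that SR quality actually depends on.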

ICRA 2024 · Conference Paper

Rethinking Imitation-based Planners for Autonomous Driving

  • Jie Cheng 0008
  • Yingbing Chen
  • Xiaodong Mei 0001
  • Bowen Yang
  • Bo Li
  • Ming Liu 0001

In recent years, imitation-based driving planners have reported considerable success. However, due to the absence of a standardized benchmark, the effectiveness of various designs remains unclear. The newly released nuPlan addresses this issue by offering a large-scale real-world dataset and a standardized closed-loop benchmark for equitable comparisons. Utilizing this platform, we conduct a comprehensive study on two fundamental yet underexplored aspects of imitation-based planners: the essential features for ego planning and the effective data augmentation techniques to reduce compounding errors. Furthermore, we highlight an imitation gap that has been overlooked by current learning systems. Finally, integrating our findings, we propose a strong baseline model, PlanTF. Our results demonstrate that a well-designed, purely imitation-based planner can achieve highly competitive performance compared to state-of-the-art methods involving hand-crafted rules and exhibit superior generalization capabilities in long-tail cases. Our models and benchmarks are publicly available. Project website: https://jchengai.github.io/planTF.
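
A standard augmentation for reducing compounding errors in imitation-based planners is to perturb the ego history so the model learns to recover from off-distribution states; a generic sketch of that idea (the noise model, magnitudes, and state layout are hypothetical, not PlanTF's actual augmentation):

```python
import numpy as np

def perturb_ego_history(history, pos_std=0.3, yaw_std=0.05, rng=None):
    """Jitter past ego states so the planner sees slightly
    off-distribution inputs during training and learns to correct them.

    history: (T, 3) array of past ego states (x, y, yaw)
    """
    rng = rng or np.random.default_rng(0)
    noisy = history.copy()
    noisy[:, :2] += rng.normal(0.0, pos_std, size=(len(history), 2))
    noisy[:, 2] += rng.normal(0.0, yaw_std, size=len(history))
    return noisy
```

Without such perturbation, a planner trained purely on expert states drifts at test time: each small error puts it in a state it never saw during training, and errors compound over the closed-loop rollout.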

NeurIPS 2024 · Conference Paper

SnapKV: LLM Knows What You are Looking for Before Generation

  • Yuhong Li
  • Yingbing Huang
  • Bowen Yang
  • Bharat Venkitesh
  • Acyr Locatelli
  • Hanchen Ye
  • Tianle Cai
  • Patrick Lewis

Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache in response to increasing input length poses challenges to memory and time efficiency. To address this problem, this paper introduces SnapKV, an innovative and fine-tuning-free approach that efficiently minimizes KV cache size while still delivering comparable performance in real-world applications. We discover that each attention head in the model consistently focuses on specific prompt attention features during generation. Meanwhile, this robust pattern can be obtained from an "observation" window located at the end of the prompts. Drawing on this insight, SnapKV automatically compresses KV caches by selecting clustered important KV positions for each attention head. Our approach significantly reduces the growing computational overhead and memory footprint when processing long input sequences. Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens. At the same time, it maintains comparable performance to baseline models across 16 long sequence datasets. Moreover, SnapKV can process up to 380K context tokens on a single A100-80GB GPU using HuggingFace implementation with minor changes, exhibiting only a negligible accuracy drop in the Needle-in-a-Haystack test. Further comprehensive studies suggest SnapKV's potential for practical applications.
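
The core selection step described in the abstract can be sketched as scoring prompt positions by the attention mass they receive from the trailing observation window and keeping the top-k per head. This omits SnapKV's clustering/pooling of selected positions, and all shapes and names are illustrative:

```python
import numpy as np

def snapkv_select(keys, obs_queries, keep: int):
    """Keep the prompt KV positions that receive the most attention
    from an observation window of queries at the end of the prompt.

    keys:        (seq_len, d) prompt keys for one attention head
    obs_queries: (obs_len, d) queries from the trailing observation window
    keep:        number of KV positions to retain
    """
    d = keys.shape[1]
    logits = obs_queries @ keys.T / np.sqrt(d)          # (obs_len, seq_len)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # per-query softmax
    scores = weights.sum(axis=0)                        # mass per prompt position
    return np.sort(np.argsort(scores)[-keep:])          # kept, in original order

rng = np.random.default_rng(0)
keys = rng.normal(size=(128, 16))
obs = keys[-8:] + 0.1 * rng.normal(size=(8, 16))        # window resembles recent keys
kept = snapkv_select(keys, obs, keep=32)
```

Because selection is done per attention head, each head retains the positions it actually uses, which is why the compressed cache preserves generation quality.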

IROS 2022 · Conference Paper

A Centaur System for Assisting Human Walking with Load Carriage

  • Ping Yang
  • Haoyun Yan
  • Bowen Yang
  • Jianquan Li
  • Kailin Li 0002
  • Yuquan Leng
  • Chenglong Fu 0001

Walking with a load is a common task in daily life and disaster rescue, and long-term load carriage may cause irreversible damage to the human body. Although remarkable progress has been made in the field of wearable robots, existing designs still struggle to avoid interfering with the wearer's legs, which increases energy consumption. In this paper, a novel wearable robot, Centaur, for assisting load carriage is proposed. The Centaur system consists of two rigid robotic legs with two degrees of freedom (DOFs) each that transfer the load weight to the ground. Unlike exoskeletons, the robotic legs of the Centaur are placed behind the human rather than attached to the human limbs, which provides a larger support polygon and avoids additional interference with the wearer. Additionally, the Centaur attains the locomotion stability of a quadruped while maintaining the motion agility of the biped itself. This paper also presents an interactive motion control strategy based on the human-robot interaction force. This strategy incorporates a legged-robotics walking controller and real-time walking trajectory planning to realize cooperative walking with the human. Finally, experiments on human walking with load carriage were conducted on flat terrain to verify the concept of the Centaur system. The results demonstrate that the Centaur system can effectively offload 70.03% of the load weight during the single-stance phase, indicating that the Centaur system provides a new solution for assisting human walking with load carriage.
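
Interaction-force-based following is often implemented with an admittance law that maps the measured human-robot force to a commanded follower velocity; a one-dimensional sketch with hypothetical mass and damping parameters (the paper's actual controller combines this kind of interaction model with legged walking control and trajectory planning):

```python
def admittance_step(force, vel, dt, mass=10.0, damping=25.0):
    """One integration step of a simple admittance law,
        mass * dv/dt + damping * v = force,
    mapping interaction force to desired follower velocity."""
    acc = (force - damping * vel) / mass
    return vel + acc * dt

# A constant forward pull converges to the steady-state speed force/damping.
v = 0.0
for _ in range(2000):
    v = admittance_step(force=50.0, vel=v, dt=0.01)
```

The damping term sets the steady-state speed per unit force, while the virtual mass sets how quickly the follower responds, so the robot trails the wearer compliantly instead of fighting the interaction force.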

IROS 2022 · Conference Paper

An Online Interactive Approach for Crowd Navigation of Quadrupedal Robots

  • Bowen Yang
  • Jianhao Jiao
  • Lujia Wang 0001
  • Ming Liu 0001

Robot navigation in human crowds remains challenging because it requires understanding human behaviors in different scenarios. We present an approach for interactive and human-friendly crowd navigation in complex static environments. The planner models the online interactions among the robot, humans, and the static environment based on game theory. It recurrently expands and optimizes the estimated trajectories of the robot and neighboring agents and provides human-friendly navigation commands. We use various indicators to evaluate the social awareness of the planners and show that our method outperforms existing approaches in goal-reaching success rate and compatibility with humans while maintaining low navigation times. The planner is successfully deployed on a real-world quadrupedal robot, demonstrating safe and interactive crowd navigation with real-time performance.

ICRA 2021 · Conference Paper

Real-time Optimal Navigation Planning Using Learned Motion Costs

  • Bowen Yang
  • Lorenz Wellhausen
  • Takahiro Miki
  • Ming Liu 0001
  • Marco Hutter 0001

Navigation on challenging terrain topographies requires an understanding of the robot's locomotion capabilities to produce optimal solutions. We present an integrated framework for real-time autonomous navigation of mobile robots based on elevation maps. The framework performs rapid global path planning and optimization that is aware of the locomotion capabilities of the robot. A GPU-aided, sampling-based path planner combined with a gradient-based path optimizer provides optimal paths by using a neural-network-based locomotion cost predictor trained in simulation. We show that our approach plans and optimizes paths three orders of magnitude faster than RRT* on GPU-enabled hardware, enabling real-time deployment on mobile platforms. We successfully evaluate the framework on the ANYmal C quadrupedal robot in both simulation and real-world environments for path-planning tasks on multiple complex terrains.
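
The planner's key idea is to rank candidate paths by a learned locomotion cost rather than raw distance; a minimal sketch where the learned predictor is abstracted as any callable on path segments (the sampling, GPU batching, and gradient-based refinement of the real framework are omitted):

```python
def path_cost(path, cost_fn):
    """Total traversal cost of a path under a per-segment cost model
    (in the paper this would be the learned locomotion cost predictor)."""
    return sum(cost_fn(path[i], path[i + 1]) for i in range(len(path) - 1))

def best_path(candidates, cost_fn):
    """Pick the minimum-cost candidate among sampled paths, mimicking
    cost-aware selection instead of shortest-distance selection."""
    return min(candidates, key=lambda p: path_cost(p, cost_fn))
```

Swapping the segment cost from geometric length to a terrain-aware learned predictor is what lets the planner prefer a longer but traversable route over a short one across terrain the robot handles poorly.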