Arrow Research search

Author name cluster

Yibin Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers

9

AAAI Conference 2025 Conference Paper

Dynamic Operator Optimization for Efficient Multi-Tenant LoRA Model Serving

  • Changhai Zhou
  • Yuhua Zhou
  • Shiyang Zhang
  • Yibin Wang
  • Zekai Liu

Low-Rank Adaptation (LoRA) has become increasingly popular for efficiently fine-tuning large language models (LLMs) with minimal resources. However, traditional methods that serve multiple LoRA models independently result in redundant computation and low GPU utilization. This paper addresses these inefficiencies by introducing Dynamic Operator Optimization (Dop), an advanced automated optimization technique designed to dynamically optimize the Segmented Gather Matrix-Vector Multiplication (SGMV) operator based on specific scenarios. SGMV's unique design enables batching GPU operations for different LoRA models, significantly improving computational efficiency. The Dop approach leverages a Search Space Constructor to create a hierarchical search space, dividing the program space into high-level structural sketches and low-level implementation details, ensuring diversity and flexibility in operator implementation. Furthermore, an Optimization Engine refines these implementations using evolutionary search, guided by a cost model that estimates program performance. This iterative optimization process ensures that SGMV implementations can dynamically adapt to different scenarios to maintain high performance. We demonstrate that Dop improves throughput by 1.30-1.46x over a state-of-the-art multi-tenant LoRA serving system.
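The cost-model-guided evolutionary search the abstract describes can be sketched in miniature. The analytic cost model and mutation scheme below are hypothetical stand-ins, not Dop's actual implementation:

```python
import random

def cost_model(tile_size, batch, rank):
    """Toy analytic cost: work per tile plus padding waste when the batch
    does not divide evenly. A hypothetical stand-in for Dop's learned
    performance estimator."""
    waste = (-batch) % tile_size               # padded rows per launch
    return (batch + waste) * rank / tile_size + 0.1 * tile_size

def evolutionary_search(batch, rank, generations=30, pop_size=8, seed=0):
    """Evolve a tile-size choice for a batched SGMV-like kernel."""
    rng = random.Random(seed)
    pop = [rng.choice([1, 2, 4, 8, 16, 32]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: cost_model(t, batch, rank))
        survivors = pop[: pop_size // 2]       # keep the cheapest half
        children = [max(1, s * rng.choice([1, 2]) // rng.choice([1, 2]))
                    for s in survivors]        # halve, double, or keep a survivor
        pop = survivors + children
    return min(pop, key=lambda t: cost_model(t, batch, rank))
```

The hierarchical part of Dop's search space (structural sketches vs. implementation details) is omitted here; only the cost-model-in-the-loop evolution is illustrated.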

NeurIPS Conference 2025 Conference Paper

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

  • Yifan Sun
  • Jingyan Shen
  • Yibin Wang
  • Tianyu Chen
  • Zhendong Wang
  • Mingyuan Zhou
  • Huan Zhang

Reinforcement learning (RL) has become an effective approach for fine-tuning large language models (LLMs), particularly to enhance their reasoning capabilities. However, RL fine-tuning remains highly resource-intensive, and existing work has largely overlooked the problem of data efficiency. In this paper, we propose two techniques to improve data efficiency in LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay. We introduce the notion of adaptive difficulty to guide online data selection, prioritizing questions of moderate difficulty that are more likely to yield informative learning signals. To estimate adaptive difficulty efficiently, we develop an attention-based framework that requires rollouts for only a small reference set of questions. The adaptive difficulty of the remaining questions is then estimated based on their similarity to this set. To further reduce rollout cost, we introduce a rollout replay mechanism inspired by experience replay in traditional RL. This technique reuses recent rollouts, lowering per-step computation while maintaining stable updates. Experiments across 6 LLM-dataset combinations show that our method reduces RL fine-tuning time by 23% to 62% while reaching the same level of performance as the original GRPO algorithm. Our code repository is available at https://github.com/ASTRAL-Group/data-efficient-llm-rl/.
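The rollout replay idea, reusing recent rollouts instead of regenerating them each step, can be sketched as a small buffer. Class and method names here are illustrative, not the authors' API:

```python
import random
from collections import deque

class RolloutReplay:
    """Reuse recently generated rollouts to cut per-step generation cost.
    A simplified sketch of experience-replay-style reuse for RL fine-tuning."""
    def __init__(self, capacity=256, reuse_prob=0.5, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest entries fall off
        self.reuse_prob = reuse_prob
        self.rng = random.Random(seed)

    def add(self, question_id, rollouts):
        self.buffer.append((question_id, rollouts))

    def sample_or_generate(self, generate_fn, question_id):
        """Return cached rollouts with probability reuse_prob, else generate
        fresh ones and store them. Returns (rollouts, reused_flag)."""
        cached = [r for qid, r in self.buffer if qid == question_id]
        if cached and self.rng.random() < self.reuse_prob:
            return cached[-1], True            # reuse the most recent rollouts
        fresh = generate_fn(question_id)
        self.add(question_id, fresh)
        return fresh, False
```

In the paper's setting, `generate_fn` would be the expensive LLM sampling step; the buffer trades a little staleness for a large reduction in generation cost.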

NeurIPS Conference 2025 Conference Paper

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

  • Haizhou Shi
  • Yibin Wang
  • Ligong Han
  • Huan Zhang
  • Hao Wang

Estimating the uncertainty of responses from Large Language Models (LLMs) remains a critical challenge. While recent Bayesian methods have demonstrated effectiveness in quantifying uncertainty through low-rank weight updates, they typically require complex fine-tuning or post-training procedures. In this paper, we propose Training-Free Bayesianization (TFB), a simple yet theoretically grounded framework that efficiently transforms trained low-rank adapters into Bayesian ones without additional training. TFB systematically searches for the maximally acceptable level of variance in the weight posterior, constrained within a family of low-rank isotropic Gaussian distributions. Our theoretical analysis shows that under mild conditions, this search process is equivalent to KL-regularized variational optimization, a generalized form of variational inference. Through comprehensive experiments, we show that TFB achieves superior uncertainty estimation and generalization compared to existing methods while eliminating the need for complex Bayesianization training procedures.
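The core search, finding the maximal acceptable posterior variance, can be sketched as a bisection over a scalar standard deviation, assuming the validation score degrades monotonically as variance grows. This is a simplification of TFB's actual procedure:

```python
def max_acceptable_std(score_fn, tol_drop=0.01, lo=0.0, hi=1.0, iters=20):
    """Bisect for the largest posterior std whose validation score stays
    within tol_drop of the deterministic (std = 0) score. Assumes score_fn
    decreases monotonically in std -- a toy analogue of TFB's search over
    low-rank isotropic Gaussian posteriors."""
    base = score_fn(0.0)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if base - score_fn(mid) <= tol_drop:
            lo = mid                  # acceptable: try more variance
        else:
            hi = mid                  # too lossy: shrink the interval
    return lo
```

Here `score_fn` would be a cheap validation metric evaluated with noise of the given std injected into the adapter weights; no gradient steps are needed, which is what makes the approach training-free.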

NeurIPS Conference 2025 Conference Paper

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

  • Yibin Wang
  • Zhimin Li
  • Yuhang Zang
  • Chunyu Wang
  • Qinglin Lu
  • Cheng Jin
  • Jiaqi Wang

Recent advances in multimodal Reward Models (RMs) have shown significant promise in delivering reward signals to align vision models with human preferences. However, current RMs are generally restricted to providing direct responses or engaging in shallow reasoning processes with limited depth, often leading to inaccurate reward signals. We posit that incorporating explicit long chains of thought (CoT) into the reward reasoning process can significantly strengthen their reliability and robustness. Furthermore, we believe that once RMs internalize CoT reasoning, their direct response accuracy can also be improved through implicit reasoning capabilities. To this end, this paper proposes UnifiedReward-Think, the first unified multimodal CoT-based reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks. Specifically, we adopt an exploration-driven reinforcement fine-tuning approach to elicit and incentivize the model's latent complex reasoning ability: (1) We first use a small amount of image generation preference data to distill the reasoning process of GPT-4o, which is then used for the model's cold start to learn the format and structure of CoT reasoning. (2) Subsequently, by leveraging the model's prior knowledge and generalization capabilities, we prepare large-scale unified multimodal preference data to elicit the model's reasoning process across various vision tasks; during this phase, correct reasoning outputs are retained via rejection sampling to refine the model. (3) Finally, incorrectly predicted samples are used for Group Relative Policy Optimization (GRPO)-based reinforcement fine-tuning, enabling the model to explore diverse reasoning paths and optimize for correct and robust solutions. Extensive experiments confirm that incorporating long CoT reasoning significantly enhances the accuracy of reward signals. Notably, after mastering CoT reasoning, the model exhibits implicit reasoning capabilities, allowing it to surpass existing baselines even without explicit reasoning traces.
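The "group-relative" signal in GRPO, where each sampled response is scored against its group's mean and standard deviation, reduces to a short normalization. This is a generic sketch of that standard formula; the small epsilon is a common stability convention, not a detail from the paper:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled response's reward
    by the mean and (population) std of its sampling group, so responses
    better than their group get positive advantage and worse ones negative."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]
```

Because the baseline is the group mean, no learned value function is needed, which is the usual motivation for GRPO over PPO-style critics.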

NeurIPS Conference 2024 Conference Paper

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

  • Yibin Wang
  • Haizhou Shi
  • Ligong Han
  • Dimitris Metaxas
  • Hao Wang

Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters learned during training. In this paper, we go beyond post-training Bayesianization and propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters throughout the whole fine-tuning process. Our empirical results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.
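Jointly adapting a Gaussian posterior's mean and variance by backpropagation can be illustrated on a single scalar weight via the reparameterization trick. This is a toy analogue of the idea, not BLoB's low-rank parameterization:

```python
import math
import random

def train_gaussian_weight(target=3.0, steps=500, lr=0.05, seed=0):
    """Fit q(w) = N(mu, softplus(rho)^2) to minimize E[(w - target)^2],
    backpropagating through the sample w = mu + sigma * eps so the mean
    and the (here scalar) variance are updated jointly at every step."""
    rng = random.Random(seed)
    mu, rho = 0.0, 0.0
    for _ in range(steps):
        eps = rng.gauss(0, 1)
        sigma = math.log1p(math.exp(rho))        # softplus keeps sigma > 0
        w = mu + sigma * eps                     # reparameterized sample
        grad_w = 2 * (w - target)                # dL/dw for L = (w - target)^2
        mu -= lr * grad_w                        # dL/dmu = dL/dw
        rho -= lr * grad_w * eps / (1 + math.exp(-rho))  # chain rule: d softplus = sigmoid
    return mu, math.log1p(math.exp(rho))
```

For this loss the optimal posterior collapses toward the target with vanishing variance; in BLoB a KL term against the prior keeps the variance from collapsing, which this sketch omits.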

IROS Conference 2024 Conference Paper

Millipede-Inspired Multi-legged Magnetic Soft Robots for Targeted Locomotion in Tortuous Environments

  • Yibin Wang
  • Yiting Xiong
  • Kaiwen Fang
  • Jiangfan Yu

Miniature robots capable of untethered operation hold great promise for performing diagnostic and therapeutic procedures in hard-to-reach regions within the human body. Nonetheless, navigating these complex and diverse physiological environments remains a significant challenge. To effectively navigate the tortuous pathways inside the human body, it is essential to equip miniature robots with flexible body structures that can adapt to complex geometries and develop efficient actuation strategies for deformed robots. In this study, we present a miniature soft robot featuring a zigzag body structure, imparting the robot with remarkable deformation capabilities that enable it to adapt to confined and tortuous spaces. This robot is equipped with arrays of magnetic legs, enabling robust locomotion propelled by traveling metachronal waves. We demonstrate that the robot can crawl on both flat surfaces and slopes. Leveraging its in-plane flexibility and discrete actuation system, this robot can navigate through intricate environments with precise control using magnetic fields. Our work provides valuable insights into the development of crawling robots with enhanced agility and adaptability, creating opportunities for their future use in a wide range of biomedical applications.
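The traveling metachronal wave that propels the robot can be illustrated by driving each leg with a fixed phase lag relative to its neighbor. Parameter names and values here are illustrative, not the paper's actuation parameters:

```python
import math

def leg_lift(leg_index, t, n_legs=8, freq=1.0, wave_number=1.0):
    """Lift command for one magnetic leg under a traveling metachronal wave:
    neighbouring legs are driven with a constant phase lag, so the contact
    pattern sweeps along the body and propels the robot forward."""
    phase = 2 * math.pi * (freq * t - wave_number * leg_index / n_legs)
    return max(0.0, math.sin(phase))   # the leg lifts on the positive half-cycle
```

In the actual robot the wave is imposed by a rotating external magnetic field acting on the magnetized leg arrays rather than by per-leg commands; the sketch only captures the phase-lag structure.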

ICRA Conference 2024 Conference Paper

Weakly-Supervised Depth Completion during Robotic Micromanipulation from a Monocular Microscopic Image

  • Han Yang
  • Yufei Jin
  • Guanqiao Shan
  • Yibin Wang
  • Yongbin Zheng
  • Jiangfan Yu
  • Yu Sun 0001
  • Zhuoran Zhang 0001

Obtaining three-dimensional information, especially the z-axis depth information, is crucial for robotic micromanipulation. Due to the unavailability of depth sensors such as lidars in micromanipulation setups, traditional depth acquisition methods such as depth from focus or depth from defocus directly infer depth from microscopic images and suffer from poor resolution. Alternatively, micromanipulation tasks obtain accurate depth information by detecting the contact between an end-effector and an object (e.g., a cell). Despite its high accuracy, only sparse depth data can be obtained due to its low efficiency. This paper aims to address the challenge of acquiring dense depth information during robotic cell micromanipulation. A weakly-supervised depth completion network is proposed to take cell images and sparse depth data obtained by contact detection as input to generate a dense depth map. A two-stage data augmentation method is proposed to augment the sparse depth data, and the depth map is optimized by a network refinement method. The experimental results show that the MAE value of the depth prediction error is less than 0.3 µm, which proves the accuracy and effectiveness of the method. This deep learning network pipeline can be seamlessly integrated with the robotic micromanipulation tasks to provide accurate depth information.
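The sparse-to-dense completion problem itself can be illustrated with a classical inverse-distance-weighting baseline, a non-learned stand-in for the paper's weakly-supervised network:

```python
def idw_depth_completion(sparse, grid_w, grid_h, power=2):
    """Fill a dense depth map from sparse (x, y, z) contact measurements by
    inverse-distance weighting: each pixel's depth is a weighted average of
    the sparse points, with weight 1 / distance**power. A classical baseline,
    not the paper's learned method."""
    dense = [[0.0] * grid_w for _ in range(grid_h)]
    for y in range(grid_h):
        for x in range(grid_w):
            num = den = 0.0
            for sx, sy, sz in sparse:
                d2 = (x - sx) ** 2 + (y - sy) ** 2
                if d2 == 0:                  # pixel coincides with a measurement
                    num, den = sz, 1.0
                    break
                w = 1.0 / d2 ** (power / 2)
                num += w * sz
                den += w
            dense[y][x] = num / den
    return dense
```

Unlike this interpolation, the paper's network also conditions on the cell image, which is what lets it recover depth structure between the sparse contact points.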

AAAI Conference 2023 Conference Paper

Multi-View MOOC Quality Evaluation via Information-Aware Graph Representation Learning

  • Lu Jiang
  • Yibin Wang
  • Jianan Wang
  • Pengyang Wang
  • Minghao Yin

In this paper, we study the problem of MOOC quality evaluation, which is essential for improving course materials, promoting students' learning efficiency, and benefiting user services. While achieving promising performance, current works still suffer from the complicated interactions and relationships of entities in MOOC platforms. To tackle these challenges, we formulate the problem as a course representation learning task and develop Information-aware Graph Representation Learning (IaGRL) for multi-view MOOC quality evaluation. Specifically, we first build a MOOC Heterogeneous Information Network (HIN) to represent the interactions and relationships among entities in MOOC platforms. We then decompose the MOOC HIN into multiple single-relation graphs based on meta-paths to depict multi-view semantics of courses, so that course representation learning can be cast as a multi-view graph representation task. Unlike traditional graph representation learning, the learned course representations are expected to satisfy three types of validity: (1) agreement in expressiveness between the raw course portfolio and the learned course representations; (2) consistency between the representations in each view and the unified representations; (3) alignment between the course and MOOC platform representations. We therefore propose to exploit mutual information to preserve the validity of course representations. We conduct extensive experiments on real-world MOOC datasets to demonstrate the effectiveness of our proposed method.
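The meta-path decomposition step has a simple algebraic form: composing the bipartite relations along a meta-path yields a single-relation view. A toy course-unit-course example, with hypothetical entity names:

```python
def matmul(a, b):
    """Plain matrix product for small integer adjacency matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def metapath_adjacency(course_unit):
    """Course-Unit-Course meta-path view: two courses are linked when they
    share a unit, and the entry counts how many units they share. Sketch of
    how a heterogeneous MOOC network decomposes into single-relation graphs."""
    unit_course = [list(row) for row in zip(*course_unit)]  # transpose
    return matmul(course_unit, unit_course)
```

Each chosen meta-path (course-unit-course, course-teacher-course, ...) produces one such view, and IaGRL then learns a representation per view plus a unified one.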

TIST Journal 2023 Journal Article

Reinforced Explainable Knowledge Concept Recommendation in MOOCs

  • Lu Jiang
  • Kunpeng Liu
  • Yibin Wang
  • Dongjie Wang
  • Pengyang Wang
  • Yanjie Fu
  • Minghao Yin

In this article, we study knowledge concept recommendation in Massive Open Online Courses (MOOCs) in an explainable manner. Knowledge concepts, composing course units (e.g., videos) in MOOCs, refer to topics and skills that students are expected to master. Compared to traditional course recommendation in MOOCs, knowledge concept recommendation has drawn more attention because students' interests over knowledge concepts can better reveal students' real intention at a more refined granularity. However, there are three unique challenges in knowledge concept recommendation: (1) How to design an appropriate data structure to capture complex relationships between knowledge concepts, course units, and other participants (e.g., students, teachers)? (2) How to model interactions between students and knowledge concepts? (3) How to make recommendation results explainable to students? To tackle these challenges, we formulate knowledge concept recommendation as a reinforcement learning task integrated with a MOOC knowledge graph (KG). Specifically, we first construct the MOOC KG as the environment to capture all the relationships and behavioral histories by considering all the entities (e.g., students, teachers, videos, courses, and knowledge concepts) on the MOOC platform. Then, to model the interactions between students and knowledge concepts, we train an agent to mimic students' learning behavioral patterns facing the complex environment. Moreover, to provide explainable recommendation results, we generate recommended knowledge concepts in the format of a path from the MOOC KG to indicate semantic reasons. Finally, we conduct extensive experiments on a real-world MOOC dataset to demonstrate the effectiveness of our proposed method.
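The path-as-explanation idea can be illustrated over a toy knowledge graph: the route from a student to a recommended concept doubles as the semantic reason for the recommendation. The BFS below and all entity names are illustrative stand-ins for the paper's learned RL policy:

```python
from collections import deque

def explain_path(kg, start, target):
    """Breadth-first search from a student node to a knowledge concept in a
    toy MOOC KG (dict: node -> list of (relation, neighbor)). The returned
    node-relation-node path serves as the recommendation explanation."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for rel, nxt in kg.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [rel, nxt])
    return None                       # no explanatory path exists
```

In the paper, the path is produced by the trained agent's walk rather than exhaustive search, so the explanation also reflects the learned student behavior model.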