Author name cluster

YI Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

2 author rows

AAAI Conference 2026 Conference Paper

JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics

Simindokht Jahangard
Mehrzad Mohammadi
YI Shen
Zhixi Cai
Hamid Rezatofighi

Recent advances in Vision-Language Models (VLMs) and large language models (LLMs) have greatly enhanced visual reasoning, a key capability for embodied AI agents like robots. However, existing visual reasoning benchmarks often suffer from several limitations: they lack a clear definition of reasoning complexity, offer no control over question difficulty or task customization, and fail to provide structured, step-by-step reasoning annotations (workflows). To bridge these gaps, we formalize reasoning complexity, introduce an adaptive query engine that generates customizable questions of varying complexity with detailed intermediate annotations, and extend the JRDB dataset with human-object interaction and geometric relationship annotations to create JRDB-Reasoning, a benchmark tailored for visual reasoning in human-crowded environments. Our engine and benchmark enable fine-grained evaluation of visual reasoning frameworks and dynamic assessment of visual-language models across reasoning levels.


IROS Conference 2025 Conference Paper

Hierarchical Collision-Free Configuration Planning for a Soft Manipulator

Yi Shen
Ruochen Tai
Feiyu Hu
Zhe Liu

Soft manipulators (SMs) have shown great potential for interactive tasks in confined environments. However, obstacle avoidance for SMs may conflict with the manipulator’s configuration, the planned trajectory for tracking control, and the target position for grasping. To coordinate configuration planning, tracking control, and target grasping in obstacle avoidance, this study proposes a hierarchical configuration planning framework with three levels: behavior planning, configuration planning, and shape/position control. At the behavior planning level, a Discrete Event System (DES)-based planner is designed to orchestrate mode transitions among obstacle avoidance, tracking control, and target grasping. The configuration planning level adopts a Bézier curve to model the SM backbone curve and constructs a repulsive potential field to quantify obstacle effects on the entire manipulator configuration. Under the constraints of grasping distance and material physical limits, the control points of the Bézier curve corresponding to the optimal configuration that minimizes the repulsive potential energy are computed. Experiments demonstrate the effectiveness of the proposed framework in achieving collision-free configuration planning for object grasping and placement in confined operational scenarios.
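The configuration-planning idea in this abstract (a Bézier backbone scored by a repulsive potential) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 2D setting, the function names `bezier_point` and `repulsive_energy`, the classic inverse-distance potential with an influence radius, and the averaging over sampled backbone points. The abstract does not give the paper's actual potential or its constrained optimization, so this only evaluates an energy for a given set of control points.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bezier_point(control: Sequence[Point], t: float) -> Point:
    """Evaluate a Bezier curve at t in [0, 1] using De Casteljau's algorithm."""
    pts: List[Point] = list(control)
    while len(pts) > 1:
        # Repeatedly interpolate between neighbouring points.
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]


def repulsive_energy(
    control: Sequence[Point],
    obstacles: Sequence[Point],
    influence: float = 1.0,   # assumed radius beyond which an obstacle is ignored
    gain: float = 1.0,        # assumed scaling constant
    samples: int = 50,
) -> float:
    """Average an assumed inverse-distance repulsive potential over the backbone."""
    total = 0.0
    for i in range(samples + 1):
        p = bezier_point(control, i / samples)
        for obs in obstacles:
            d = max(math.dist(p, obs), 1e-9)  # avoid division by zero
            if d < influence:
                total += 0.5 * gain * (1.0 / d - 1.0 / influence) ** 2
    return total / (samples + 1)
```

Under these assumptions, a backbone that bends away from an obstacle scores lower than one passing close to it, which is the quantity an optimizer over the control points would try to reduce.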


JBHI Journal 2025 Journal Article

MCBTNet: Multi-Feature Fusion CNN and Bi-Level Routing Attention Transformer-Based Medical Image Segmentation Network

Boheng Zhang
Zelin Zheng
Yanqi Zhao
YI Shen
Mingjian Sun

Accurate medical image segmentation is crucial for precise diagnosis and treatment in clinical pathology analysis and surgical navigation. While Convolutional Neural Network (CNN)-based approaches excel in capturing and analyzing local features, they often lose key global context. Transformers, utilizing self-attention mechanisms, address this issue but often overlook localized and multi-scale features while also requiring significant computational resources. To integrate the advantages of CNNs and Transformers to achieve efficient and precise medical image segmentation, we propose a segmentation framework based on multi-feature fusion CNN and Bi-level Routing Attention Transformer (MCBTNet). MCBTNet integrates CNNs and Transformers within a U-shaped encoder-decoder architecture. This configuration not only extracts multi-scale features via the U-shaped structure but also efficiently captures global contextual information through the dynamic sparsity of the Bi-Level Routing Attention Transformer. Our novel Frequency-Channel-Spatial multi-dimensional attention mechanism is implemented on skip connections, enhancing segmentation accuracy and speed by maximizing multi-scale feature utilization. Finally, MCBTNet obtains the segmentation result by fusing the predictions of different scales. Experimental results on five public datasets demonstrate that MCBTNet outperforms state-of-the-art methods in Dice and HD metrics, with lower computational and memory requirements.


IROS Conference 2025 Conference Paper

YOLO-MARL: You Only LLM Once for Multi-Agent Reinforcement Learning

Yuan Zhuang
Yi Shen
Zhili Zhang
Yuxiao Chen 0008
Fei Miao

Advancements in deep multi-agent reinforcement learning (MARL) have positioned it as a promising approach for decision-making in cooperative games. However, it remains challenging for MARL agents to learn cooperative strategies in some game environments. Recently, large language models (LLMs) have demonstrated emergent reasoning capabilities, making them promising candidates for enhancing coordination among the agents. However, because of their size, frequently querying LLMs for the actions that agents can take can be expensive. In this work, we propose You Only LLM Once for MARL (YOLO-MARL), a novel framework that leverages the high-level task planning capabilities of LLMs to improve the policy learning process of multiple agents in cooperative games. Notably, for each game environment, YOLO-MARL requires only a one-time interaction with LLMs in the proposed strategy generation, state interpretation and planning function generation modules, before the MARL policy training process. This avoids the ongoing costs and computational time associated with frequent LLM API calls during training. Moreover, trained decentralized policies based on normal-sized neural networks operate independently of the LLM. We evaluate our method across two different environments and demonstrate that YOLO-MARL outperforms traditional MARL algorithms. The GitHub repository for our code can be found at https://github.com/paulzyzy/YOLO-MARL.


NeurIPS Conference 2024 Conference Paper

Outlier-Robust Distributionally Robust Optimization via Unbalanced Optimal Transport

Zifan Wang
YI Shen
Michael M. Zavlanos
Karl H. Johansson

Distributionally Robust Optimization (DRO) accounts for uncertainty in data distributions by optimizing the model performance against the worst possible distribution within an ambiguity set. In this paper, we propose a DRO framework that relies on a new distance inspired by Unbalanced Optimal Transport (UOT). The proposed UOT distance employs a soft penalization term instead of hard constraints, enabling the construction of an ambiguity set that is more resilient to outliers. Under smoothness conditions, we establish strong duality of the proposed DRO problem. Moreover, we introduce a computationally efficient Lagrangian penalty formulation for which we show that strong duality also holds. Finally, we provide empirical results that demonstrate that our method offers improved robustness to outliers and is computationally less demanding for regression and classification tasks.
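For readers new to DRO, the worst-case objective described in the first sentence of this abstract is commonly written in the generic form below. The notation is assumed here for illustration and is not taken from the paper: \theta denotes the model parameters, \ell the loss, \hat{P}_n the empirical data distribution, D a distance or divergence between distributions, and \rho the radius of the ambiguity set. The paper's UOT-inspired distance would play the role of D; its exact definition is not given in the abstract.

```latex
\min_{\theta} \; \sup_{Q \in \mathcal{U}_{\rho}(\hat{P}_n)} \; \mathbb{E}_{\xi \sim Q}\big[\ell(\theta; \xi)\big],
\qquad
\mathcal{U}_{\rho}(\hat{P}_n) = \{\, Q : D(Q, \hat{P}_n) \le \rho \,\}
```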


TMLR Journal 2024 Journal Article

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

YI Shen
Pan Xu
Michael Zavlanos

Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stroke trial.


EAAI Journal 2023 Journal Article

A systematic empirical study on word embedding based methods in discovering Chinese black keywords

Chenyang Wang
YI Shen
Yuwei Li
Min Zhang
Miao Hu
Jinghua Zheng


AAAI Conference 2022 Conference Paper

Hybrid Curriculum Learning for Emotion Recognition in Conversation

Lin Yang
YI Shen
Yue Mao
Longjun Cai

Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on “emotion shift” frequency within a conversation; the conversations are then scheduled in an “easy to hard” schema according to the difficulty score returned by the difficulty measurer. UC is implemented from an emotion-similarity perspective, which progressively strengthens the model’s ability to identify confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.
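The conversation-level curriculum described in this abstract can be sketched roughly as follows. This is an illustrative guess, not the paper's implementation: it assumes an "emotion shift" is any change of label between adjacent utterances (ignoring speakers), that the difficulty score is the fraction of adjacent pairs that shift, and the function names `emotion_shift_frequency` and `order_easy_to_hard` are hypothetical.

```python
from typing import List, Sequence


def emotion_shift_frequency(labels: Sequence[str]) -> float:
    """Fraction of adjacent utterance pairs whose emotion labels differ (assumed measure)."""
    if len(labels) < 2:
        return 0.0
    shifts = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return shifts / (len(labels) - 1)


def order_easy_to_hard(conversations: List[Sequence[str]]) -> List[Sequence[str]]:
    """Schedule conversations by ascending difficulty score (stable sort)."""
    return sorted(conversations, key=emotion_shift_frequency)


if __name__ == "__main__":
    convs = [
        ["joy", "sad", "joy"],   # two shifts over two pairs
        ["joy", "joy", "joy"],   # no shifts
        ["joy", "joy", "sad"],   # one shift over two pairs
    ]
    for conv in order_easy_to_hard(convs):
        print(emotion_shift_frequency(conv), conv)
```

Under these assumptions, conversations with a stable emotion are presented first and those with frequent shifts last.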


ICRA Conference 2022 Conference Paper

Sen-Glove: A Lightweight Wearable Glove for Hand Assistance with Soft Joint Sensing

Linan Deng
Yi Shen
Yang Hong
Yunlong Dong
Xin He 0016
Ye Yuan 0002
Zhi Li 0039
Han Ding 0001

Perception and portability are critical issues for wearable gloves in hand assistive engineering. However, available wearable gloves either lack flexible sensing or are bulky. In this paper, we present a tendon-driven lightweight wearable glove with soft joint sensing, Sen-Glove. Sen-Glove is equipped with 14 soft strain sensors, which enable full bending motion monitoring of 14 joints of five fingers and greatly reduce the weight of the glove. In addition, the modular design makes Sen-Glove more compact; it weighs 161 g in total, reducing the burden on the hand. A series of mechanical tests are conducted to evaluate the characteristics of Sen-Glove. Experimental results show that Sen-Glove can withstand 500 bending cycles, assist the subject in grasping 21 multi-scale objects, and recognize 11 gestures. The classification accuracy of 11 different gestures reaches 98.6%, which verifies the efficacy of the strain sensors.


AAAI Conference 2021 Conference Paper

A Joint Training Dual-MRC Framework for Aspect Based Sentiment Analysis

Yue Mao
YI Shen
Chao Yu
Longjun Cai

Aspect based sentiment analysis (ABSA) involves three fundamental subtasks: aspect term extraction, opinion term extraction, and aspect-level sentiment classification. Early works only focused on solving one of these subtasks individually. Some recent work focused on solving a combination of two subtasks, e.g., extracting aspect terms along with sentiment polarities or extracting the aspect and opinion terms pair-wise. More recently, the triple extraction task has been proposed, i.e., extracting the (aspect term, opinion term, sentiment polarity) triples from a sentence. However, previous approaches fail to solve all subtasks in a unified end-to-end framework. In this paper, we propose a complete solution for ABSA. We construct two machine reading comprehension (MRC) problems and solve all subtasks by jointly training two BERT-MRC models with parameter sharing. We conduct experiments on these subtasks, and results on several benchmark datasets demonstrate the effectiveness of our proposed framework, which significantly outperforms existing state-of-the-art methods.


ICRA Conference 2019 Conference Paper

A Hierarchical Framework for Coordinating Large-Scale Robot Networks

Zhe Liu 0022
Shunbo Zhou
Hesheng Wang 0001
Yi Shen
Haoang Li
Yun-Hui Liu 0001

In this paper, we study the cooperative path planning and motion coordination problems of a multi-robot system with a large number of robots, aiming for practical applications in robotic warehouses and automated transportation systems. In particular, we solve the life-long planning problem and guarantee the coordination performance in the presence of robot motion uncertainties. A hierarchical path planning and motion coordination structure is presented. The environment is divided into several sectors and a traffic heat-map is presented to describe the current sector-level traffic condition. At the path planning level, the sector-level path is calculated by considering the path distance, the current traffic condition and the current robot uncertainty. At the motion coordination level, a local cooperative A* algorithm and a conflict-based searching strategy are utilized within each sector to generate the collision-free local path of each robot in a rolling planning manner. The effectiveness and practical applicability of the proposed approach are validated by simulations with more than one thousand robots and real experiments.
