Author name cluster

Yanyan Wei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
2 author rows

Possible papers

IROS Conference 2025 Conference Paper

Generalizable and Actionable Part Detection and Manipulation with SAM-rectified Segmentation and Iterative Pose Refinement

  • Sucheng Qian
  • Li Zhang 0104
  • Yanyan Wei
  • Liu Liu 0012
  • Cewu Lu

The ability to perform cross-category object perception and manipulation is highly desirable in building intelligent robots. One promising approach is to define the concept of Generalizable and Actionable Parts (GAParts), such as buttons and handles, on both seen and unseen object categories. However, accurate cross-category perception of GAParts remains challenging due to the large inter-category variations in object shape. To address this issue, we introduce SAMIR, a novel framework using SAM-rectified segmentation and Iterative pose Refinement for GAPart detection and manipulation. First, we introduce a Segment Anything (SAM) segmentation prior to rectify low-confidence, fragmented GAPart instance proposals. Second, in addition to the zero-shot generalization of the SAM foundation model, we further fine-tune it with a lightweight adaptor model on our task dataset. Finally, we propose an iterative pose refinement procedure that improves the accuracy of GAPart pose estimation. Our perception experiments on the GAPartNet dataset show that SAMIR consistently outperforms the baseline method on instance segmentation and pose estimation tasks. Our manipulation experiments in the SAPIEN simulator show that SAMIR leads to an improved manipulation success rate. We also deploy our method on a real robot for real-world manipulation. Our code and video are available at sites.google.com/view/samir-gapart.
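
The abstract does not say how the SAM prior rectifies fragmented proposals, so the following is only an illustrative sketch of one plausible rule, not the SAMIR implementation: within each class-agnostic mask, confident points vote (weighted by score) for an instance id, and the winning id is assigned to every point in the mask. The function name, the `min_score` threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def rectify_with_masks(point_labels, point_scores, masks, min_score=0.5):
    """Relabel points so each mask carries a single instance id.

    point_labels: per-point instance id, with -1 meaning "no instance".
    point_scores: per-point confidence in [0, 1].
    masks:        list of sets of point indices; each set stands in for one
                  class-agnostic region (assumed to come from a model such as SAM).
    min_score:    hypothetical threshold below which a point may not vote.
    """
    rectified = list(point_labels)
    for mask in masks:
        votes = Counter()
        for i in mask:
            # Only confident, labeled points contribute to the vote.
            if point_labels[i] >= 0 and point_scores[i] >= min_score:
                votes[point_labels[i]] += point_scores[i]
        if not votes:
            continue  # no confident evidence: leave this mask untouched
        winner, _ = votes.most_common(1)[0]
        for i in mask:
            rectified[i] = winner
    return rectified


# Toy example: the first region is fragmented into ids 1 and 2 plus an
# unlabeled point; the second region has one low-confidence point.
labels = [1, 1, 2, -1, 3, 3]
scores = [0.9, 0.8, 0.3, 0.0, 0.7, 0.2]
masks = [{0, 1, 2, 3}, {4, 5}]
rectified = rectify_with_masks(labels, scores, masks)
```

In this toy case the low-confidence fragment and the unlabeled point in the first mask are absorbed into instance 1, while the second mask keeps instance 3.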

AAAI Conference 2025 Conference Paper

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

  • Shengeng Tang
  • Jiayi He
  • Dan Guo
  • Yanyan Wei
  • Feng Li
  • Richang Hong

Sign Language Production (SLP) aims to generate semantically consistent sign videos from textual statements, where the conversion from textual glosses to sign poses (G2P) is a crucial step. Existing G2P methods typically treat sign poses as discrete three-dimensional coordinates and directly fit them, which overlooks the relative positional relationships among joints. To this end, we provide a new perspective, constraining joint associations and gesture details by modeling the limb bones to improve the accuracy and naturalness of the generated poses. In this work, we propose a pioneering iconicity disentangled diffusion framework, termed Sign-IDD, specifically designed for SLP. Sign-IDD incorporates a novel Iconicity Disentanglement (ID) module to capture the relative positions among joints. The ID module disentangles the conventional 3D joint representation into a 4D bone representation, comprising the 3D spatial direction vector and the 1D spatial distance between adjacent joints. Additionally, an Attribute Controllable Diffusion (ACD) module is introduced to further constrain joint associations, in which the attribute separation layer separates the bone direction and length attributes, and the attribute control layer guides pose generation by leveraging these attributes. The ACD module uses the gloss embeddings as semantic conditions and generates sign poses from noise embeddings. Extensive experiments on the PHOENIX14T and USTC-CSL datasets validate the effectiveness of our method.
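
The 4D bone representation described above can be illustrated with a minimal sketch. It assumes a skeleton given as a parent-index list with parents listed before children; the function names, the skeleton layout, and the toy coordinates are hypothetical and are not taken from the Sign-IDD code. Each bone is stored as a unit direction vector (3 values) plus a length (1 value), and the joints can be rebuilt from the root position and the bones.

```python
import math


def joints_to_bones(joints, parents):
    """Convert 3D joint coordinates into 4D bone features.

    joints:  list of (x, y, z) tuples, one per joint.
    parents: parents[i] is the parent joint index of joint i, or -1 for the root.
    Returns a dict mapping child index -> (dx, dy, dz, length), where
    (dx, dy, dz) is the unit direction from parent to child.
    """
    bones = {}
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the root joint has no incoming bone
        vec = [c - p for c, p in zip(joints[child], joints[parent])]
        length = math.sqrt(sum(v * v for v in vec))
        # Guard against coincident joints to avoid division by zero.
        direction = [v / length for v in vec] if length > 0 else [0.0, 0.0, 0.0]
        bones[child] = (*direction, length)
    return bones


def bones_to_joints(root, bones, parents):
    """Rebuild joint positions from the root position and the 4D bones.

    Assumes every parent index is smaller than its child's index.
    """
    joints = [None] * len(parents)
    for child, parent in enumerate(parents):
        if parent < 0:
            joints[child] = tuple(root)
            continue
        dx, dy, dz, length = bones[child]
        px, py, pz = joints[parent]
        joints[child] = (px + dx * length, py + dy * length, pz + dz * length)
    return joints


# Toy three-joint chain (illustrative values only).
joints = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 2.0)]
parents = [-1, 0, 1]
bones = joints_to_bones(joints, parents)
rebuilt = bones_to_joints(joints[0], bones, parents)
```

Separating direction from length in this way means the two attributes can be constrained independently, which is the property the attribute separation layer in the abstract relies on.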