
Author name cluster

Qi Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

NeurIPS Conference 2025 Conference Paper

OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model

  • Zhenhao Zhang
  • Ye Shi
  • Lingxiao Yang
  • Suting Ni
  • Qi Ye
  • Jingya Wang

Understanding and synthesizing realistic 3D hand-object interactions (HOI) is critical for applications ranging from immersive AR/VR to dexterous robotics. Existing methods struggle with generalization, performing well on closed-set objects and predefined tasks but failing to handle unseen objects or open-vocabulary instructions. We introduce OpenHOI, the first framework for open-world HOI synthesis, capable of generating long-horizon manipulation sequences for novel objects guided by free-form language commands. Our approach integrates a 3D Multimodal Large Language Model (MLLM) fine-tuned for joint affordance grounding and semantic task decomposition, enabling precise localization of interaction regions (e.g., handles, buttons) and breakdown of complex instructions (e.g., “Find a water bottle and take a sip”) into executable sub-tasks. To synthesize physically plausible interactions, we propose an affordance-driven diffusion model paired with a training-free physics refinement stage that minimizes penetration and optimizes affordance alignment. Evaluations across diverse scenarios demonstrate OpenHOI’s superiority over state-of-the-art methods in generalizing to novel object categories, multi-stage tasks, and complex language instructions.
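The training-free physics refinement stage described in the abstract lends itself to a simple illustration. Below is a minimal, hypothetical sketch of such a refinement loop: it assumes an object_sdf callable and a rigid-translation-only correction of the generated hand vertices, which are simplifications for illustration rather than the authors' released method.

# Hedged sketch of a training-free physics refinement step, in the spirit of the
# penetration / affordance-alignment objective described above. The object SDF
# interface, the rigid-translation parameterization, and the loss weights are
# illustrative assumptions, not the authors' code.
import torch

def refine_hand_pose(hand_verts, object_sdf, affordance_center,
                     steps=100, lr=1e-2, w_pen=1.0, w_aff=0.1):
    """hand_verts: (V, 3) hand surface points from the diffusion model.
    object_sdf: callable (N, 3) -> (N,) signed distance to the object surface.
    affordance_center: (3,) grounded interaction region (e.g. a handle) center."""
    offset = torch.zeros(3, requires_grad=True)          # optimize a rigid translation only
    opt = torch.optim.Adam([offset], lr=lr)
    for _ in range(steps):
        verts = hand_verts + offset
        sdf = object_sdf(verts)
        pen_loss = torch.relu(-sdf).sum()                 # penalize points inside the object
        aff_loss = (verts.mean(0) - affordance_center).pow(2).sum()  # stay near the affordance
        loss = w_pen * pen_loss + w_aff * aff_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return (hand_verts + offset).detach()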

NeurIPS Conference 2025 Conference Paper

RFMPose: Generative Category-level Object Pose Estimation via Riemannian Flow Matching

  • Wenzhe Ouyang
  • Qi Ye
  • Jinghua Wang
  • Zenglin Xu
  • Jiming Chen

We introduce RFMPose, a novel generative framework for category-level 6D object pose estimation that learns deterministic pose trajectories through Riemannian Flow Matching (RFM). Existing discriminative approaches struggle with multi-hypothesis predictions (e.g., symmetry ambiguities) and often require specialized network architectures. RFMPose advances this paradigm through three key innovations: (1) Ensuring geometric consistency via geodesic interpolation on Riemannian manifolds combined with bi-invariant metric constraints; (2) Alleviating symmetry-induced ambiguities through Riemannian Optimal Transport for probability mass redistribution without ad-hoc design; (3) Enabling end-to-end likelihood estimation through Hutchinson trace approximation, thereby eliminating auxiliary model dependencies. Extensive experiments on Omni6DPose demonstrate state-of-the-art performance of the proposed method, with significant improvements of $\textbf{+4.1}$ in $\mathrm{\textbf{IoU}_{25}}$ and $\textbf{+2.4}$ in $\textbf{5°2cm}$ metrics compared to prior generative approaches. Furthermore, the proposed RFM framework exhibits robust sim-to-real transfer capabilities and facilitates pose tracking extensions with minimal architectural adaptation.
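For readers unfamiliar with Riemannian Flow Matching on SO(3), the core ingredient is geodesic interpolation between a noise rotation and the target pose. The sketch below, using SciPy's rotation utilities, shows that interpolation and the constant tangent-space velocity a flow-matching network would typically regress; it is an assumed, generic formulation, not the paper's implementation.

# Minimal sketch of geodesic interpolation on SO(3), the kind of Riemannian
# flow-matching path the abstract refers to. Function names and the training
# target shown here are illustrative assumptions.
import numpy as np
from scipy.spatial.transform import Rotation as R

def so3_log(Rm):
    return R.from_matrix(Rm).as_rotvec()        # tangent vector (axis-angle)

def so3_exp(v):
    return R.from_rotvec(v).as_matrix()

def geodesic_point(R0, R1, t):
    """Point at time t on the geodesic from R0 (noise) to R1 (target pose)."""
    rel = so3_log(R0.T @ R1)                     # log map of the relative rotation
    return R0 @ so3_exp(t * rel)

def fm_target(R0, R1):
    """Constant tangent-space velocity a flow-matching network could regress."""
    return so3_log(R0.T @ R1)

# toy usage: interpolate halfway between a random rotation and the identity
R0, R1 = R.random().as_matrix(), np.eye(3)
Rt = geodesic_point(R0, R1, 0.5)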

IJCAI Conference 2024 Conference Paper

Error-aware Sampling in Adaptive Shells for Neural Surface Reconstruction

  • Qi Wang
  • Yuchi Huo
  • Qi Ye
  • Rui Wang
  • Hujun Bao

Neural implicit surfaces with signed distance functions (SDFs) achieve superior quality in 3D geometry reconstruction. However, training SDFs is time-consuming because it requires a great number of samples to calculate accurate weight distributions and a considerable number of samples drawn from those distributions to integrate the rendering results. Some existing sampling strategies focus on this problem. During training, they assume a spatially consistent convergence speed of the kernel size, thus still suffering from slow convergence or errors. Instead, we introduce an error-aware sampling method based on thin intervals of valid weight distributions, dubbed adaptive shells, to reduce the number of samples while still maintaining the reconstruction accuracy. To this end, we first extend Laplace-based neural implicit surfaces with learned spatially-varying kernel sizes, which indicate the range of valid weight distributions. Then, the adaptive shell for each ray is determined by an efficient double-clipping strategy with spatially-varying SDF values and kernel sizes, fitting larger kernel sizes to wider shells. Finally, we calculate the error-bounded cumulative distribution functions (CDFs) of shells to conduct efficient importance sampling, achieving low-variance rendering with fewer calculations. Extensive results in various scenes demonstrate the superiority of our sampling technique, including significantly reduced sample counts and training time, and even improved reconstruction quality. The code is available at https://github.com/erernan/ESampling.
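The error-bounded CDF importance sampling can be pictured as standard inverse-transform sampling restricted to each ray's shell. The snippet below is a rough sketch under that reading; weight_fn, the shell bounds, and the sample counts are placeholders, and the paper's actual error bound and kernel-size handling are more involved.

# Rough sketch of inverse-CDF importance sampling restricted to a per-ray shell,
# in the spirit of the adaptive-shell idea above. The weight function and shell
# bounds are placeholder assumptions.
import numpy as np

def sample_in_shell(weight_fn, t_near, t_far, n_coarse=32, n_fine=16, rng=None):
    """weight_fn: maps depths t -> unnormalized rendering weights along one ray."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(t_near, t_far, n_coarse)                 # coarse grid inside the shell only
    w = np.maximum(weight_fn(t), 1e-8)
    cdf = np.cumsum(w)
    cdf /= cdf[-1]                                           # piecewise-constant CDF over the shell
    u = (np.arange(n_fine) + rng.random(n_fine)) / n_fine    # stratified uniforms in [0, 1)
    idx = np.searchsorted(cdf, u)
    return t[np.clip(idx, 0, n_coarse - 1)]                  # fine samples concentrated where weight is high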

AAAI Conference 2024 Conference Paper

In-Hand 3D Object Reconstruction from a Monocular RGB Video

  • Shijian Jiang
  • Qi Ye
  • Rengan Xie
  • Yuchi Huo
  • Xiang Li
  • Yang Zhou
  • Jiming Chen

Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera. Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieved compelling results in the visible part of the object. However, these methods falter in accurately capturing the shape within the hand-object contact region due to occlusion. In this paper, we propose a novel method that deals with surface reconstruction under occlusion by incorporating priors of 2D occlusion elucidation and physical contact constraints. For the former, we introduce an object amodal completion network to infer the 2D complete mask of objects under occlusion. To ensure the accuracy and view consistency of the predicted 2D amodal masks, we devise a joint optimization method for both amodal mask refinement and 3D reconstruction. For the latter, we impose penetration and attraction constraints on the local geometry in contact regions. We evaluate our approach on HO3D and HOD datasets and demonstrate that it outperforms the state-of-the-art methods in terms of reconstruction surface quality, with an improvement of 52% on HO3D and 20% on HOD. Project webpage: https://east-j.github.io/ihor.
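The physical contact constraints mentioned above can be read as two complementary loss terms on a learned object SDF: a penetration term keeping hand points outside the surface and an attraction term pulling the surface onto estimated contact points. The following is a hedged sketch of that idea; the sdf_net interface, point sets, and weights are assumptions for illustration.

# Hedged sketch of contact-region constraints: penetration keeps hand points
# outside the reconstructed surface, attraction pulls the surface onto likely
# contact points. Interfaces and weights are assumed, not the paper's code.
import torch

def contact_losses(sdf_net, hand_pts, contact_pts, w_pen=1.0, w_att=0.5):
    """sdf_net: (N, 3) -> (N,) learned object SDF.
    hand_pts: (H, 3) hand surface points; contact_pts: (C, 3) estimated contact points."""
    pen = torch.relu(-sdf_net(hand_pts)).mean()      # hand should not lie inside the object
    att = sdf_net(contact_pts).abs().mean()          # surface should pass through contact points
    return w_pen * pen + w_att * att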

IJCAI Conference 2023 Conference Paper

Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint

  • Haoming Li
  • Xinzhuo Lin
  • Yang Zhou
  • Xiang Li
  • Yuchi Huo
  • Jiming Chen
  • Qi Ye

3D grasp synthesis generates grasping poses for a given input object. Existing works tackle the problem by learning a direct mapping from objects to distributions of grasping poses. However, because physical contact is sensitive to small changes in pose, the highly nonlinear mapping from 3D object representations to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle this challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) we then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.
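The two-stage factorization plus penetration-aware refinement can be outlined as below. Module names (contact_gen, pose_net, hand_model) and the refinement objective are illustrative placeholders standing in for the paper's components, not its released code.

# Schematic of a contact-map-first, pose-second pipeline with a penetration-aware
# refinement step, as a sketch of the factorization described above.
import torch

def synthesize_grasp(contact_gen, pose_net, hand_model, obj_pts, obj_sdf,
                     steps=50, lr=1e-2, w_contact=1.0, w_pen=10.0):
    contact_map = contact_gen(obj_pts)                    # stage 1: per-point contact probability
    theta = pose_net(obj_pts, contact_map)                # stage 2: initial hand pose parameters
    theta = theta.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):                                # penetration-aware optimization
        hand_verts = hand_model(theta)                    # e.g. a differentiable hand mesh layer
        pen = torch.relu(-obj_sdf(hand_verts)).sum()      # no hand vertices inside the object
        # keep the realized contact consistent with the generated contact map
        d = torch.cdist(obj_pts, hand_verts).min(dim=1).values
        contact_err = (contact_map * d).sum()
        loss = w_pen * pen + w_contact * contact_err
        opt.zero_grad(); loss.backward(); opt.step()
    return theta.detach()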

ICRA Conference 2023 Conference Paper

Implementation and Optimization of Grasping Learning with Dual-modal Soft Gripper

  • Lei Zhao
  • Haoyue Liu
  • Feihan Li
  • Xingyu Ding
  • Yuhao Sun
  • Fuchun Sun 0001
  • Jianhua Shan
  • Qi Ye

Robust and efficient grasping of different objects is still an open problem due to the difficulty of integrating multidisciplinary knowledge such as gripper ontology design, perception, control, and learning. In recent years, learning-based methods have achieved excellent results in grasping various novel objects. However, current methods are usually limited to a single grasping mode or rely on different end effectors to grasp objects of different shapes. For human beings, our hands are capable of grasping various objects with changes in grasping methods and form of hands. In light of this, developing a gripper with similar performance could possibly improve the robot's gripping ability. In this paper, we design a dual-modal soft gripper (DSG) and propose a deep reinforcement learning (DRL) framework to implement the operations. Both of our grasping modes, namely enveloping and pinching, are achieved through the tendon drive system and the deformation of the spring steel plate, which enables the gripper to switch between the two grasping modes in real time. We also combined the cutting-edge achievements of deep learning and reinforcement learning to design an autonomous grasping algorithm based on Q-learning and a deep Q network. Moreover, to fully utilize the visual input from the sensor, we added semantic embeddings of target objects to facilitate the learning, which is especially useful in deciding the grasping method for objects previously unseen. We also evaluate our DRL framework in different scenarios, offering a detailed comparison of each grasping mode and the mixed method (with or without semantic information). Our design has proved efficient in reducing the number of failing grasping actions and improving the success rate when facing novel and tricky objects.