Arrow Research

Author name cluster

Yuchi Huo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

AAAI Conference 2026 · Conference Paper

LiDAR-GS++: Improving LiDAR Gaussian Reconstruction via Diffusion Priors

  • Qifeng Chen
  • Jiarun Liu
  • Rengan Xie
  • Tao Tang
  • Sicong Du
  • Yiru Zhao
  • Yuchi Huo
  • Sheng Yang

Recent Gaussian Splatting (GS) based rendering has made significant progress for LiDAR, surpassing Neural Radiance Fields (NeRF) in both quality and speed. However, these methods exhibit artifacts in extrapolated novel view synthesis due to the incomplete reconstruction from single-traversal scans. To address this limitation, we present LiDAR-GS++, a LiDAR Gaussian Splatting reconstruction method enhanced by diffusion priors for real-time and high-fidelity re-simulation on public urban roads. Specifically, we introduce a controllable LiDAR generation model conditioned on coarsely extrapolated renderings to produce extra geometry-consistent scans and employ an effective distillation mechanism for expansive LiDAR Gaussian reconstruction. By extending reconstruction to under-fitted regions, our approach ensures global geometric consistency for extrapolated novel views while preserving detailed scene surfaces captured by sensors. Experiments on multiple public datasets demonstrate that LiDAR-GS++ achieves state-of-the-art performance for both interpolated and extrapolated viewpoints, surpassing existing GS- and NeRF-based methods.
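
As a rough illustration of the distillation idea only (nothing below is the paper's pipeline; the "diffusion prior" is a toy stand-in), a minimal sketch in which generated pseudo-scans supply supervision for regions a single traversal never observed:

```python
import numpy as np

rng = np.random.default_rng(0)
true_scene = np.sin(np.linspace(0, np.pi, 32)) * 5 + 10   # toy depth profile

real_scan = true_scene + rng.normal(0, 0.1, 32)           # observed scan
real_mask = np.zeros(32, dtype=bool)
real_mask[:20] = True                       # single traversal misses the tail

# "Diffusion prior" stand-in: produce a geometry-consistent pseudo-scan from
# the coarse extrapolation (here it simply blends toward the true profile).
coarse = np.where(real_mask, real_scan, real_scan[real_mask].mean())
pseudo_scan = 0.5 * coarse + 0.5 * true_scene

model = np.full(32, 10.0)                   # stand-in reconstruction state
for _ in range(300):                        # distill: fit real + pseudo scans
    grad = np.where(real_mask, model - real_scan, model - pseudo_scan)
    model -= 0.1 * grad                     # gradient step on the L2 loss
```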

AAAI Conference 2026 · Conference Paper

OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding

  • Dianbing Xi
  • Jiepeng Wang
  • Yuanzhi Liang
  • Xi Qiu
  • Yuchi Huo
  • Rui Wang
  • Chi Zhang
  • Xuelong Li

In this paper, we propose a novel framework for controllable video diffusion, OmniVDiff, aiming to synthesize and comprehend multiple visual modalities of video in a single diffusion model. To achieve this, OmniVDiff treats all video visual modalities in the color space to learn a joint distribution, while employing an adaptive control strategy that dynamically adjusts the role of each visual modality during the diffusion process, either as a generation modality or a conditioning modality. Our framework supports three key capabilities: (1) text-conditioned video generation, where all modalities are jointly synthesized from a textual prompt; (2) video understanding, where structural modalities are predicted from RGB inputs in a coherent manner; and (3) X-conditioned video generation, where video synthesis is guided by fine-grained inputs such as depth, Canny edges, and segmentation. Extensive experiments demonstrate that OmniVDiff achieves state-of-the-art performance in video generation tasks and competitive results in video understanding. Its flexibility and scalability make it well suited for downstream applications such as video-to-video translation, modality adaptation for visual tasks, and scene reconstruction.
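
A minimal sketch of what such an adaptive role assignment could look like, assuming per-modality 3-channel "color space" tensors and a standard DDPM-style noising step; the modality names, role assignment, and schedule are illustrative, not the paper's implementation:

```python
import torch

modalities = {
    "rgb":   torch.randn(1, 3, 64, 64),
    "depth": torch.randn(1, 3, 64, 64),
    "seg":   torch.randn(1, 3, 64, 64),
}

def make_diffusion_input(mods, roles, t, schedule):
    """Noise the 'generate' modalities at step t; keep 'condition' ones clean."""
    alpha = schedule[t]
    chans = []
    for name, x in mods.items():
        if roles[name] == "generate":
            x = alpha.sqrt() * x + (1 - alpha).sqrt() * torch.randn_like(x)
        chans.append(x)
    return torch.cat(chans, dim=1)  # joint "color space" input: 9 channels

schedule = torch.linspace(0.999, 0.01, 1000)  # toy cumulative alpha schedule
roles = {"rgb": "condition", "depth": "generate", "seg": "generate"}
x_t = make_diffusion_input(modalities, roles, t=500, schedule=schedule)
```

Flipping the roles dict (noising RGB, conditioning on depth/segmentation) turns the same input constructor from an understanding setup into an X-conditioned generation setup.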

AAAI Conference 2026 · Conference Paper

PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos

  • Dianbing Xi
  • Guoyuan An
  • Jingsen Zhu
  • Zhijian Liu
  • Yuan Liu
  • Ruiyuan Zhang
  • Jiayuan Lu
  • Yuchi Huo

We propose PFAvatar (Pose-Fusion Avatar), a new method that reconstructs high-quality 3D avatars from Outfit-of-the-Day (OOTD) photos, which exhibit diverse poses, occlusions, and complex backgrounds. Our method consists of two stages: (1) fine-tuning a pose-aware diffusion model from few-shot OOTD examples and (2) distilling a 3D avatar represented by a neural radiance field (NeRF). In the first stage, unlike previous methods that segment images into assets (e.g., garments, accessories) for 3D assembly, which is prone to inconsistency, we avoid decomposition and directly model the full-body appearance. By integrating a pre-trained ControlNet for pose estimation and a novel Condition Prior Preservation Loss (CPPL), our method enables end-to-end learning of fine details while mitigating language drift in few-shot training. Our method completes personalization in just 5 minutes, achieving a 48× speed-up compared to previous approaches. In the second stage, we introduce a NeRF-based avatar representation optimized by canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared to mesh-based representations that suffer from resolution-dependent discretization and erroneous occluded geometry, our continuous radiance field can preserve high-frequency textures (e.g., hair) and handle occlusions correctly through transmittance. Experiments demonstrate that PFAvatar outperforms state-of-the-art methods in terms of reconstruction fidelity, detail preservation, and robustness to occlusions/truncations, advancing practical 3D avatar generation from real-world OOTD albums. In addition, the reconstructed 3D avatars support downstream applications such as virtual try-on, animation, and human video reenactment, further demonstrating the versatility and practical value of our approach.
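
A minimal sketch of the generic score distillation sampling (SDS) update that such a second stage builds on, with a toy denoiser standing in for the fine-tuned diffusion model and a differentiable render standing in for the NeRF output; the weighting and schedule are assumptions, not the paper's Multi-Resolution 3D-SDS:

```python
import torch

class ToyDenoiser(torch.nn.Module):
    """Placeholder for the fine-tuned, pose-aware diffusion model."""
    def forward(self, x_t, t):
        return 0.1 * x_t  # fake noise prediction

denoiser = ToyDenoiser()
render = torch.randn(1, 3, 32, 32, requires_grad=True)  # differentiable render
opt = torch.optim.Adam([render], lr=1e-2)
alphas = torch.linspace(0.999, 0.01, 1000)  # toy cumulative alpha schedule

for _ in range(100):
    t = torch.randint(20, 980, (1,)).item()
    a = alphas[t]
    noise = torch.randn_like(render)
    x_t = a.sqrt() * render + (1 - a).sqrt() * noise  # forward diffusion
    eps_pred = denoiser(x_t, t)
    # SDS trick: inject (eps_pred - noise) as the render's gradient,
    # skipping the denoiser Jacobian via detach.
    grad = (1 - a) * (eps_pred - noise)
    opt.zero_grad()
    render.backward(gradient=grad.detach())
    opt.step()
```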

ICLR Conference 2025 · Conference Paper

Inverse Rendering using Multi-Bounce Path Tracing and Reservoir Sampling

  • Yuxin Dai
  • Qi Wang 0111
  • Jingsen Zhu
  • Dianbing Xi
  • Yuchi Huo
  • Chen Qian 0006
  • Ying He 0001

We introduce MIRReS, a novel two-stage inverse rendering framework that jointly reconstructs and optimizes explicit geometry, materials, and lighting from multi-view images. Unlike previous methods that rely on implicit irradiance fields or oversimplified ray tracing, our method begins with an initial stage that extracts an explicit triangular mesh. In the second stage, we refine this representation using a physically-based inverse rendering model with multi-bounce path tracing and Monte Carlo integration. This enables our method to accurately estimate indirect illumination effects, including self-shadowing and internal reflections, leading to a more precise intrinsic decomposition of shape, material, and lighting. To address the noise issue in Monte Carlo integration, we incorporate reservoir sampling, improving convergence and enabling efficient gradient-based optimization with low sample counts. Through both qualitative and quantitative assessments across various scenarios, especially those with complex shadows, we demonstrate that our method achieves state-of-the-art decomposition performance. Furthermore, our optimized explicit geometry seamlessly integrates with modern graphics engines, supporting downstream applications such as scene editing, relighting, and material editing.
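
Reservoir sampling in rendering usually follows the resampled importance sampling (RIS) pattern; a minimal streaming sketch with toy candidate and target densities (illustrative, not the paper's estimator):

```python
import random

class Reservoir:
    """Keep one sample from a weighted stream, in O(1) memory."""
    def __init__(self):
        self.sample = None
        self.w_sum = 0.0
        self.count = 0

    def update(self, candidate, weight):
        self.w_sum += weight
        self.count += 1
        if weight > 0 and random.random() < weight / self.w_sum:
            self.sample = candidate

random.seed(0)
res = Reservoir()
for i in range(32):                      # stream of candidate light samples
    target_pdf = 1.0 + (i % 5)           # toy unnormalized target density
    source_pdf = 1.0                     # uniform candidate density
    res.update(candidate=i, weight=target_pdf / source_pdf)

# Unbiased contribution weight of the surviving sample (RIS estimator):
W = res.w_sum / (res.count * (1.0 + (res.sample % 5)))
```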

IJCAI Conference 2025 · Conference Paper

Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly

  • Ruiyuan Zhang
  • Qi Wang
  • Jiaxiang Liu
  • Yuchi Huo
  • Chao Wu

3D part assembly aims to understand part relationships and predict their 6-DoF poses to construct realistic 3D shapes, addressing the growing demand for autonomous assembly, which is crucial for robots. Existing methods mainly estimate the transformation of each part by training neural networks under supervision, which requires a substantial quantity of manually labeled data. However, the high cost of data collection and the immense variability of real-world shapes and parts make traditional methods impractical for large-scale applications. In this paper, we propose the first zero-shot part assembly method, which utilizes pre-trained point cloud diffusion models as discriminators in the assembly process, guiding the manipulation of parts to form realistic shapes. Specifically, we theoretically demonstrate that utilizing a diffusion model for zero-shot part assembly can be transformed into an Iterative Closest Point (ICP) process. Then, we propose a novel pushing-away strategy to handle overlapping parts, thereby further enhancing the robustness of the method. To verify our work, we conduct extensive experiments and quantitative comparisons with several strong baseline methods, demonstrating the effectiveness of the proposed approach, which even surpasses supervised learning methods. The code has been released at https://github.com/Ruiyuan-Zhang/Zero-Shot-Assembly.
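
Since the paper reduces diffusion-guided assembly to an ICP-like process, a minimal ICP step (brute-force correspondences plus Kabsch/SVD rigid alignment) on toy point clouds may help fix ideas; everything here is a generic textbook sketch, not the paper's procedure:

```python
import numpy as np

def icp_step(src, dst):
    """One rigid alignment of src onto its nearest neighbors in dst."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]            # brute-force correspondences
    mu_s, mu_d = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_d)       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                          # optimal rotation (Kabsch)
    t = mu_d - R @ mu_s
    return src @ R.T + t

rng = np.random.default_rng(1)
dst = rng.normal(size=(100, 3))                 # target part placement
theta = np.deg2rad(15)
R0 = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
src = dst @ R0.T + 0.1                          # misplaced copy of the part
for _ in range(20):
    src = icp_step(src, dst)                    # converges onto dst here
```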

IJCAI Conference 2024 · Conference Paper

Error-aware Sampling in Adaptive Shells for Neural Surface Reconstruction

  • Qi Wang
  • Yuchi Huo
  • Qi Ye
  • Rui Wang
  • Hujun Bao

Neural implicit surfaces with signed distance functions (SDFs) achieve superior quality in 3D geometry reconstruction. However, training SDFs is time-consuming because it requires a large number of samples to calculate accurate weight distributions and many more samples drawn from those distributions to integrate the rendering results. Some existing sampling strategies address this problem, but they assume a spatially consistent convergence speed of the kernel size during training and thus still suffer from slow convergence or errors. Instead, we introduce an error-aware sampling method based on thin intervals of valid weight distributions, dubbed adaptive shells, to reduce the number of samples while maintaining reconstruction accuracy. To this end, we first extend Laplace-based neural implicit surfaces with learned spatially varying kernel sizes, which indicate the range of valid weight distributions. Then, the adaptive shell for each ray is determined by an efficient double-clipping strategy with spatially varying SDF values and kernel sizes, fitting larger kernel sizes to wider shells. Finally, we calculate the error-bounded cumulative distribution functions (CDFs) of the shells to conduct efficient importance sampling, achieving low-variance rendering with fewer calculations. Extensive results across various scenes demonstrate the superiority of our sampling technique, which significantly reduces sample counts and training time while even improving reconstruction quality. The code is available at https://github.com/erernan/ESampling.
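
A minimal sketch of the final step in isolation: inverse-CDF importance sampling from a piecewise-constant weight function restricted to one ray's shell [t0, t1]. The weight function, bounds, and bin count are all illustrative, not the paper's error-bounded construction:

```python
import numpy as np

rng = np.random.default_rng(0)

t0, t1, bins = 0.4, 0.9, 64                       # shell bounds on the ray
edges = np.linspace(t0, t1, bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
weights = np.exp(-((centers - 0.65) ** 2) / 0.002)  # toy weight distribution

pdf = weights / weights.sum()
cdf = np.concatenate([[0.0], np.cumsum(pdf)])

u = rng.random(16)                                 # 16 samples, not hundreds
idx = np.searchsorted(cdf, u, side="right") - 1    # invert the CDF per sample
frac = (u - cdf[idx]) / np.maximum(cdf[idx + 1] - cdf[idx], 1e-12)
samples = edges[idx] + frac * (edges[idx + 1] - edges[idx])
```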

AAAI Conference 2024 · Conference Paper

In-Hand 3D Object Reconstruction from a Monocular RGB Video

  • Shijian Jiang
  • Qi Ye
  • Rengan Xie
  • Yuchi Huo
  • Xiang Li
  • Yang Zhou
  • Jiming Chen

Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera. Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieved compelling results in the visible part of the object. However, these methods falter in accurately capturing the shape within the hand-object contact region due to occlusion. In this paper, we propose a novel method that deals with surface reconstruction under occlusion by incorporating priors of 2D occlusion elucidation and physical contact constraints. For the former, we introduce an object amodal completion network to infer the 2D complete mask of objects under occlusion. To ensure the accuracy and view consistency of the predicted 2D amodal masks, we devise a joint optimization method for both amodal mask refinement and 3D reconstruction. For the latter, we impose penetration and attraction constraints on the local geometry in contact regions. We evaluate our approach on HO3D and HOD datasets and demonstrate that it outperforms the state-of-the-art methods in terms of reconstruction surface quality, with an improvement of 52% on HO3D and 20% on HOD. Project webpage: https://east-j.github.io/ihor.
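
A minimal sketch of how the two contact constraints can be written as losses, with a unit-sphere SDF standing in for the reconstructed object; the contact-selection threshold and loss weights are assumptions, not the paper's formulation:

```python
import torch

def object_sdf(p):
    """Toy object SDF: unit sphere at the origin."""
    return p.norm(dim=-1) - 1.0

hand_pts = torch.randn(256, 3, requires_grad=True)  # stand-in hand surface

sdf = object_sdf(hand_pts)

# Penetration: points with negative SDF are inside the object.
penetration_loss = torch.relu(-sdf).mean()

# Attraction: points flagged as "in contact" should lie on the surface.
contact_mask = sdf.abs() < 0.1                      # toy contact selection
attraction_loss = (sdf[contact_mask].abs().mean()
                   if contact_mask.any() else sdf.sum() * 0.0)

loss = penetration_loss + 0.5 * attraction_loss     # toy weighting
loss.backward()                                     # gradients w.r.t. hand_pts
```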

ICRA Conference 2024 · Conference Paper

TPGP: Temporal-Parametric Optimization with Deep Grasp Prior for Dexterous Motion Planning

  • Haoming Li 0004
  • Qi Ye 0001
  • Yuchi Huo
  • Qingtao Liu
  • Shijian Jiang
  • Tao Zhou
  • Xiang Li
  • Yang Zhou

Grasping motion planning aims to find a feasible grasping trajectory in the configuration space given an input target grasp. While optimizing grasp motion with two- or three-fingered grippers has been well studied, natural grasp motion planning with a dexterous hand remains a very challenging problem due to the high-dimensional working space. In this work, we propose a novel temporal-parametric grasp prior (TPGP) optimization method that simplifies grasping trajectory optimization for the dexterous hand while maintaining the smooth and natural properties of the grasping motion. Specifically, we reformulate the discrete trajectory parameters as a temporal parameterization, where a prior constraint, provided by a hand poser network, is introduced to ensure that the hand pose is natural and reasonable throughout the trajectory. Finally, we present a joint target optimization strategy that enhances the target pose for more feasible trajectories. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp motion on various metrics.
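
A minimal sketch of the temporal-parameterization idea: optimize a few control points of a smooth curve in joint space instead of every discrete waypoint. The cubic Bezier basis and the 22-DoF hand are assumptions for illustration, not the paper's parameterization:

```python
import numpy as np

def bezier(ctrl, t):
    """Evaluate a cubic Bezier at t in [0, 1]; ctrl has shape (4, dof)."""
    b = np.array([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t), t ** 3])
    return b @ ctrl

dof = 22                                   # e.g. a dexterous hand's joints
q_start = np.zeros(dof)
q_grasp = np.ones(dof) * 0.6               # target grasp configuration

ctrl = np.stack([q_start,                  # endpoints are pinned; only the
                 q_start + 0.1,            # two interior control points are
                 q_grasp - 0.1,            # free optimization variables
                 q_grasp])

trajectory = np.stack([bezier(ctrl, t) for t in np.linspace(0, 1, 50)])
```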

IJCAI Conference 2023 · Conference Paper

Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint

  • Haoming Li
  • Xinzhuo Lin
  • Yang Zhou
  • Xiang Li
  • Yuchi Huo
  • Jiming Chen
  • Qi Ye

3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because physical contact is sensitive to small changes in pose, the highly nonlinear mapping from 3D object representations to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle this challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) we then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.
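
A minimal sketch of the two-stage factorization with toy MLPs; the architectures, dimensions, and the 61-dimensional pose are assumptions, not the paper's networks:

```python
import torch

n_pts, pose_dim, latent = 512, 61, 16        # dimensions are assumptions

contact_net = torch.nn.Sequential(           # stage 1: points + noise -> map
    torch.nn.Linear(3 + latent, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1), torch.nn.Sigmoid())
pose_net = torch.nn.Sequential(              # stage 2: contact map -> pose
    torch.nn.Linear(n_pts, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, pose_dim))

obj = torch.randn(n_pts, 3)                  # object point cloud
z = torch.randn(n_pts, latent)               # sample code: vary z, vary grasp
contact_map = contact_net(torch.cat([obj, z], dim=-1)).squeeze(-1)  # (n_pts,)
grasp_pose = pose_net(contact_map)           # pose is fully determined
                                             # by the sampled contact map
```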

ICRA Conference 2023 · Conference Paper

ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions

  • Anjun Chen
  • Xiangyu Wang
  • Kun Shi 0003
  • Shaohao Zhu
  • Bin Fang
  • Yingfeng Chen
  • Jiming Chen 0001
  • Yuchi Huo

3D human reconstruction from RGB images achieves decent results in good weather conditions but degrades dramatically in rough weather. As a complement, mmWave radars have been employed to reconstruct 3D human joints and meshes in rough weather. However, combining RGB and mmWave signals for robust all-weather 3D human reconstruction is still an open challenge, given the sparse nature of mmWave signals and the vulnerability of RGB images. In this paper, we present ImmFusion, the first mmWave-RGB fusion solution to robustly reconstruct 3D human bodies in all weather conditions. Specifically, ImmFusion consists of image and point backbones for token feature extraction and a Transformer module for token fusion. The image and point backbones refine global and local features from the original data, and the fusion Transformer module achieves effective information fusion of the two modalities by dynamically selecting informative tokens. Extensive experiments on a large-scale dataset, mmBody, captured in various environments demonstrate that ImmFusion can efficiently utilize the information of both modalities to achieve robust 3D human body reconstruction in all weather conditions. In addition, our method's accuracy is significantly superior to that of state-of-the-art Transformer-based LiDAR-camera fusion methods.
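
A minimal sketch of Transformer token fusion with a learned score for dynamic token selection, using off-the-shelf PyTorch layers; the token counts, scoring head, and top-k rule are illustrative stand-ins for the paper's module:

```python
import torch

d = 128
img_tokens = torch.randn(1, 196, d)        # e.g. 14x14 image patch features
pt_tokens = torch.randn(1, 64, d)          # mmWave point cloud features

encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=2)
scorer = torch.nn.Linear(d, 1)             # learned "informativeness" score

tokens = torch.cat([img_tokens, pt_tokens], dim=1)    # (1, 260, d)
fused = encoder(tokens)                               # cross-modal attention
scores = scorer(fused).squeeze(-1)                    # (1, 260)
keep = scores.topk(k=128, dim=1).indices              # keep top tokens only
selected = fused.gather(1, keep.unsqueeze(-1).expand(-1, -1, d))
```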

NeurIPS Conference 2023 · Conference Paper

Topological RANSAC for instance verification and retrieval without fine-tuning

  • Guoyuan An
  • Ju-hyeong Seon
  • Inkyu An
  • Yuchi Huo
  • Sung-Eui Yoon

This paper presents an innovative approach to enhancing explainable image retrieval, particularly in situations where a fine-tuning set is unavailable. The widely-used SPatial verification (SP) method, despite its efficacy, relies on a spatial model and the hypothesis-testing strategy for instance recognition, leading to inherent limitations, including the assumption of planar structures and neglect of topological relations among features. To address these shortcomings, we introduce a pioneering technique that replaces the spatial model with a topological one within the RANSAC process. We propose bio-inspired saccade and fovea functions to verify the topological consistency among features, effectively circumventing the issues associated with SP's spatial model. Our experimental results demonstrate that our method significantly outperforms SP, achieving state-of-the-art performance in non-fine-tuning retrieval. Furthermore, our approach can enhance performance when used in conjunction with fine-tuned features. Importantly, our method retains high explainability and is lightweight, offering a practical and adaptable solution for a variety of real-world applications.
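
A generic RANSAC skeleton with a pluggable consistency test shows where a topological check would replace the usual planar spatial model; the toy 1-D fit below is purely illustrative, not the paper's saccade/fovea functions:

```python
import random

def ransac(matches, fit, consistent, n_min, iters=200, seed=0):
    """Hypothesize from minimal sets; keep the model with most inliers."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(matches, n_min)
        model = fit(sample)                  # hypothesis from a minimal set
        inliers = [m for m in matches if consistent(model, m)]
        if len(inliers) > len(best_inliers):
            best, best_inliers = model, inliers
    return best, best_inliers

# Toy 1-D use: matches are (x, y) pairs, the model is an offset y = x + b;
# swapping `consistent` for a topological test changes the verification only.
matches = [(x, x + 3.0) for x in range(20)] + [(5, 40.0), (7, -9.0)]
fit = lambda s: s[0][1] - s[0][0]
consistent = lambda b, m: abs(m[1] - m[0] - b) < 0.5
model, inliers = ransac(matches, fit, consistent, n_min=1)
```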

NeurIPS Conference 2021 · Conference Paper

Hypergraph Propagation and Community Selection for Objects Retrieval

  • Guoyuan An
  • Yuchi Huo
  • Sung-Eui Yoon

Spatial verification is a crucial technique for particular object retrieval. It utilizes spatial information to accurately detect true-positive images. However, existing query expansion and diffusion methods cannot efficiently propagate spatial information in an ordinary graph with scalar edge weights, resulting in low recall or precision. To tackle these problems, we propose a novel hypergraph-based framework that efficiently propagates spatial information at query time and accurately retrieves objects in the database. Additionally, we propose using the image graph's structural information through a community selection technique to measure the accuracy of the initial search result and to provide correct starting points for hypergraph propagation without heavy spatial verification computations. Experimental results on ROxford and RParis show that our method significantly outperforms existing query expansion and diffusion methods.
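
One simple reading of query-time propagation on a hypergraph, given as an image-by-hyperedge incidence matrix: scores flow from images through shared hyperedges and back, damped toward the initial retrieval. All values are toy and this is an assumption-laden sketch, not the paper's algorithm:

```python
import numpy as np

H = np.array([[1, 0, 1],        # image 0 lies in hyperedges 0 and 2
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)   # 5 images x 3 hyperedges

Dv = np.diag(1.0 / H.sum(axis=1))        # image-degree normalization
De = np.diag(1.0 / H.sum(axis=0))        # hyperedge-degree normalization
P = Dv @ H @ De @ H.T                    # one image->edge->image step
                                         # (row-stochastic by construction)

seed = np.array([1.0, 0, 0, 0, 0])       # initial retrieval: image 0 only
scores, alpha = seed.copy(), 0.8
for _ in range(10):                      # damped propagation to convergence
    scores = (1 - alpha) * seed + alpha * scores @ P
ranking = np.argsort(-scores)            # re-ranked database images
```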