Arrow Research

Author name cluster

Ruihuang Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

AAAI 2026 · Conference Paper

Fast Multi-view Consistent 3D Editing with Video Priors

  • Liyi Chen
  • Ruihuang Li
  • Guowen Zhang
  • Pengfei Wang
  • Lei Zhang

Text-driven 3D editing enables user-friendly editing of 3D objects or scenes with text instructions. Due to the lack of multi-view consistency priors, existing methods typically resort to employing 2D generation or editing models to process each view individually, followed by iterative 2D-3D-2D updating. However, these methods are not only time-consuming but also prone to yielding over-smoothed results, since the iterative process averages the different editing signals gathered from different views. In this paper, we propose ViP3DE (short for generative Video Prior based 3D Editing), an early and pioneering work that repurposes the temporal consistency priors of pre-trained video generation models to achieve consistent 3D editing within a single forward pass. Our key insight is to condition the video generation model on a single edited view so that it directly generates other consistently edited views for 3D updating, thereby bypassing the iterative editing paradigm. First, 3D updating requires edited views to be paired with specific camera poses. To this end, we propose motion-preserved noise blending, which enables the video model to generate edited views at predefined camera poses. In addition, we introduce geometrically aware denoising to further enhance multi-view consistency by integrating 3D geometric priors into video models. Extensive experiments demonstrate that ViP3DE achieves high-quality 3D editing results within a single forward pass, significantly outperforming existing methods in both editing quality and editing time.
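
The abstract names motion-preserved noise blending as the mechanism that ties the generated frames to predefined camera poses. Below is a minimal, self-contained sketch of one plausible reading: blending noise inverted from the original camera-trajectory renders (which carries the camera motion) with fresh Gaussian noise (which gives the model freedom to synthesize the edit). The linear blend, the alpha weight, and the variance rescaling are illustrative assumptions, not the authors' implementation.

```python
import torch


def motion_preserved_noise(inverted_noise: torch.Tensor, alpha: float = 0.7) -> torch.Tensor:
    """inverted_noise: (T, C, H, W) noise recovered by inverting the original
    renders along the predefined camera path (e.g., via DDIM inversion), so it
    carries the camera motion. Returns initial noise for the video model."""
    fresh = torch.randn_like(inverted_noise)  # freedom to synthesize the edit
    blended = alpha * inverted_noise + (1 - alpha) * fresh
    # Rescale so the blend keeps roughly unit variance (assuming the two noise
    # sources are independent and each has unit variance), as a diffusion model
    # expects at the first sampling step.
    return blended / (alpha**2 + (1 - alpha) ** 2) ** 0.5
```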

ICLR 2025 · Conference Paper

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

  • Zhengqiang Zhang
  • Ruihuang Li
  • Lei Zhang 0006

While image generation with diffusion models has achieved great success, generating images at resolutions higher than the training size remains challenging due to the high computational cost. Current methods typically perform the entire sampling process at full resolution and process all frequency components simultaneously, contradicting the inherent coarse-to-fine nature of latent diffusion models and wasting computation on premature high-frequency details at early diffusion stages. To address this issue, we introduce an efficient Frequency-aware Cascaded Sampling framework, FreCaS in short, for higher-resolution image generation. FreCaS decomposes the sampling process into cascaded stages of gradually increasing resolution, progressively expanding frequency bands and refining the corresponding details. We propose a frequency-aware classifier-free guidance (FA-CFG) strategy that assigns different guidance strengths to different frequency components, directing the diffusion model to add new details in the frequency bands expanded at each stage. Additionally, we fuse the cross-attention maps of the previous and current stages to avoid synthesizing unfaithful layouts. Experiments demonstrate that FreCaS significantly outperforms state-of-the-art methods in image quality and generation speed. In particular, FreCaS is about 2.86× and 6.07× faster than ScaleCrafter and DemoFusion, respectively, in generating a 2048×2048 image with a pretrained SDXL model, and achieves FID_b improvements of 11.6 and 3.7. FreCaS can be easily extended to more complex models such as SD3. The source code of FreCaS can be found at https://github.com/xtudbxk/FreCaS.
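
A minimal sketch of the FA-CFG idea as the abstract describes it: the classifier-free guidance difference is split into frequency bands, and each band receives its own guidance strength. The band splitting via a circular Fourier mask and the names and values split_frequency_bands, w_low, w_high, and cutoff are illustrative assumptions, not the released implementation.

```python
import torch
import torch.fft


def split_frequency_bands(x: torch.Tensor, cutoff: float):
    """Split a latent (B, C, H, W) into low- and high-frequency parts via a
    circular low-pass mask in the 2D Fourier domain."""
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h, device=x.device),
        torch.linspace(-1, 1, w, device=x.device),
        indexing="ij",
    )
    mask = ((xx**2 + yy**2).sqrt() <= cutoff).to(freq.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    return low, x - low


def fa_cfg(eps_uncond, eps_cond, w_low=5.0, w_high=9.0, cutoff=0.5):
    """Weaker guidance for the already-refined low band, stronger guidance for
    the newly expanded high band at the current cascade stage."""
    diff = eps_cond - eps_uncond
    diff_low, diff_high = split_frequency_bands(diff, cutoff)
    return eps_uncond + w_low * diff_low + w_high * diff_high
```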

AAAI 2025 · Conference Paper

SyncNoise: Geometrically Consistent Noise Prediction for Instruction-based 3D Editing

  • Ruihuang Li
  • Liyi Chen
  • Zhengqiang Zhang
  • Varun Jampani
  • Vishal M. Patel
  • Lei Zhang

Text-based 2D diffusion models have demonstrated impressive capabilities in image generation and editing, and they also exhibit substantial potential for 3D editing tasks. However, achieving consistent edits across multiple viewpoints remains a challenge. While the iterative dataset update method can achieve global consistency, it suffers from slow convergence and over-smoothed textures. We propose SyncNoise, a novel geometry-guided, multi-view-consistent noise editing approach for high-fidelity 3D scene editing. SyncNoise synchronously edits multiple views with 2D diffusion models while enforcing the multi-view noise predictions to be geometrically consistent, which ensures global consistency in both semantic structure and low-frequency appearance. To further enhance local consistency in high-frequency details, we select a group of anchor views and propagate their edits to neighboring frames through cross-view reprojection. To improve the reliability of multi-view correspondences, we introduce depth supervision during training to enhance the reconstruction of precise geometries. By enhancing geometric consistency at both the noise and pixel levels, our method achieves high-quality 3D editing results that respect the textual instructions, especially in scenes with complex textures.
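
A minimal sketch of the core synchronization idea as the abstract states it: noise predictions at pixels that correspond across views (obtained from depth and camera geometry) are forced to agree, here by simple averaging. The correspondence format, the function name synchronize_noise, and the averaging rule are assumptions for illustration.

```python
import torch


def synchronize_noise(noise_preds: torch.Tensor, correspondences) -> torch.Tensor:
    """noise_preds: (V, C, H, W) per-view noise predictions.
    correspondences: list of groups, each a list of (view, y, x) pixel
    coordinates that project to the same 3D surface point."""
    synced = noise_preds.clone()
    for group in correspondences:
        values = torch.stack([noise_preds[v, :, y, x] for v, y, x in group])
        mean = values.mean(dim=0)  # one shared noise value per 3D point
        for v, y, x in group:
            synced[v, :, y, x] = mean
    return synced
```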

IJCAI 2019 · Conference Paper

Flexible Multi-View Representation Learning for Subspace Clustering

  • Ruihuang Li
  • Changqing Zhang
  • Qinghua Hu
  • Pengfei Zhu
  • Zheng Wang

In recent years, numerous multi-view subspace clustering methods have been proposed to exploit the complementary information from multiple views. Most of them perform data reconstruction within each single view, which limits the subspace representation and thus fails to capture the underlying relationships among the data. In this paper, we propose to conduct subspace clustering based on Flexible Multi-view Representation (FMR) learning, which avoids using only partial information for data reconstruction. The latent representation is flexibly constructed by enforcing it to be close to the different views, which implicitly makes it more comprehensive and well adapted to subspace clustering. With the introduction of a kernel dependence measure, the latent representation can flexibly encode complementary information from different views and explore the nonlinear, high-order correlations among them. We employ the Alternating Direction Minimization (ADM) method to solve the resulting optimization problem. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.
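
A minimal sketch of the flexible multi-view representation idea in the abstract: learn one latent representation that stays close to every view through view-specific projections, with a kernel dependence term (HSIC, here normalized CKA-style for scale invariance). The plain gradient-descent solver stands in for the paper's ADM optimizer, and all names and weights (learn_latent, beta, dim) are assumptions for illustration.

```python
import torch


def hsic(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Empirical HSIC with linear kernels between two (n, d) matrices."""
    n = x.shape[0]
    h = torch.eye(n) - torch.ones(n, n) / n  # centering matrix
    kx, ky = x @ x.T, y @ y.T                # linear kernel Gram matrices
    return torch.trace(h @ kx @ h @ ky) / (n - 1) ** 2


def cka(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalized HSIC (CKA): bounded in [0, 1] and scale-invariant."""
    return hsic(x, y) / (hsic(x, x).sqrt() * hsic(y, y).sqrt() + eps)


def learn_latent(views, dim=32, beta=0.1, steps=500):
    """views: list of (n, d_v) feature matrices. Returns a shared latent that
    reconstructs every view while staying statistically dependent on each."""
    n = views[0].shape[0]
    h_latent = torch.randn(n, dim, requires_grad=True)
    projections = [torch.randn(dim, v.shape[1], requires_grad=True) for v in views]
    opt = torch.optim.Adam([h_latent, *projections], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        recon = sum(((v - h_latent @ p) ** 2).mean() for v, p in zip(views, projections))
        dep = sum(cka(h_latent, v) for v in views)  # dependence on every view
        loss = recon - beta * dep                   # reconstruct all views, reward dependence
        loss.backward()
        opt.step()
    return h_latent.detach()
```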