Arrow Research

Author name cluster

Sheng-Ping Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
2 author rows

Possible papers (2)

AAAI Conference 2025 · Conference Paper

Memory-Augmented Re-Completion for 3D Semantic Scene Completion

  • Yu-Wen Tseng
  • Sheng-Ping Yang
  • Jhih-Ciang Wu
  • I-Bin Liao
  • Yung-Hui Li
  • Hong-Han Shuai
  • Wen-Huang Cheng

Semantic Scene Completion (SSC) aims to reconstruct a 3D voxel representation occupied by semantic classes from ordinary inputs such as 2D RGB images, depth maps, or point clouds. Given its cost-effectiveness and promising applications in autonomous driving, camera-based SSC has attracted considerable attention, prompting the development of various approaches. However, current methods focus mainly on precise 2D-to-3D projection while overlooking the challenge of completing invisible regions, leading to numerous false negatives and suboptimal SSC performance. To address this issue, we propose a novel architecture, Memory-augmented Re-completion (MARE), designed to enhance completion capability. Our MARE model encapsulates regional relationships by incorporating a memory bank that stores vital region-tokens, with two protocols concerning diversity and age adopted to optimize the bank adversarially. Additionally, we introduce a Re-completion pipeline with an Information Spreading module that progressively completes the invisible regions while bridging the scale gap between region-level and voxel-level information. Extensive experiments on the SSCBench-KITTI-360 and SemanticKITTI datasets validate the effectiveness of our approach.
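As a rough illustration of the memory-bank idea described in the abstract, the sketch below maintains a fixed-capacity pool of region-tokens, admits only sufficiently dissimilar tokens (a diversity rule) and evicts the oldest entry when full (an age rule). The class name, similarity threshold, and update rules are assumptions for illustration, not MARE's actual algorithm.

```python
import torch
import torch.nn.functional as F

class RegionMemoryBank:
    """Hypothetical fixed-size memory bank of region-tokens.

    Illustrates the diversity/age idea from the abstract; the
    concrete rules here are assumptions, not the paper's method.
    """

    def __init__(self, capacity: int, dim: int, sim_threshold: float = 0.9):
        self.capacity = capacity
        self.sim_threshold = sim_threshold
        self.tokens = torch.empty(0, dim)   # stored region-tokens, shape (N, dim)
        self.ages = torch.empty(0)          # steps since each token was admitted

    def update(self, new_tokens: torch.Tensor) -> None:
        """Admit a batch of candidate region-tokens, shape (B, dim)."""
        self.ages += 1
        for tok in new_tokens:
            if self.tokens.shape[0] > 0:
                # Diversity rule: skip tokens too similar to stored ones.
                sims = F.cosine_similarity(self.tokens, tok.unsqueeze(0))
                if sims.max() > self.sim_threshold:
                    continue
            if self.tokens.shape[0] >= self.capacity:
                # Age rule: evict the oldest entry to make room.
                oldest = torch.argmax(self.ages)
                keep = torch.arange(self.tokens.shape[0]) != oldest
                self.tokens, self.ages = self.tokens[keep], self.ages[keep]
            self.tokens = torch.cat([self.tokens, tok.unsqueeze(0)])
            self.ages = torch.cat([self.ages, torch.zeros(1)])
```

In this reading, keeping the bank both diverse and fresh is what lets stored region-tokens supply plausible context for regions the camera never sees; how MARE actually optimizes the bank "adversarially" is not recoverable from the abstract alone.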

ICML Conference 2025 · Conference Paper

MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners

  • Fang-Duo Tsai
  • Shih-Lun Wu
  • Weijaw Lee
  • Sheng-Ping Yang
  • Bo-Rui Chen
  • Hao-Chung Cheng
  • Yi-Hsuan Yang

We propose MuseControlLite, a lightweight mechanism designed to fine-tune text-to-music generation models for precise conditioning using various time-varying musical attributes and reference audio signals. The key finding is that positional embeddings, which have been seldom used by text-to-music generation models in the conditioner for text conditions, are critical when the condition of interest is a function of time. Using melody control as an example, our experiments show that simply adding rotary positional embeddings to the decoupled cross-attention layers increases control accuracy from 56.6% to 61.1%, while requiring 6.75 times fewer trainable parameters than state-of-the-art fine-tuning mechanisms, using the same pre-trained diffusion Transformer model of Stable Audio Open. We evaluate various forms of musical attribute control, audio inpainting, and audio outpainting, demonstrating improved controllability over MusicGen-Large and Stable Audio Open ControlNet at a significantly lower fine-tuning cost, with only 85M trainable parameters. Source code, model checkpoints, and demo examples are available at: https://MuseControlLite.github.io/web/
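To make the abstract's key finding concrete, here is a generic sketch of rotary positional embeddings (RoPE), as one might apply them to the queries and keys of a decoupled cross-attention layer so a time-varying condition stays aligned by position. This is a standard RoPE formulation, not MuseControlLite's released code; the function name and tensor shapes are assumptions.

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings along the sequence axis.

    x: (batch, seq_len, dim) with even dim. Generic RoPE sketch,
    not the paper's conditioner implementation.
    """
    b, n, d = x.shape
    half = d // 2
    # Per-pair rotation frequencies: base^(-i / (d/2)) for i in [0, d/2).
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)      # (half,)
    angles = torch.arange(n, dtype=x.dtype)[:, None] * freqs[None]   # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) feature pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Applying the same rotation to both queries and keys makes their dot products depend on relative position, which plausibly explains why it helps when the condition (e.g., a melody contour) is a function of time.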