Prompt-guided Precise Audio Editing with Diffusion Models

Manjie Xu; Chenxing Li; Duzhen Zhang; Dan Su; Wei Liang; Dong Yu

ICML 2024

Conference Paper · Accept (Poster) · Artificial Intelligence · Machine Learning

Details

Abstract

Audio editing involves the arbitrary manipulation of audio content through precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still face challenges in finding a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusion models and enables precise audio editing. The editing is based on the input textual prompt only and is entirely training-free. We exploit the cross-attention maps of diffusion models to facilitate accurate local editing and employ a hierarchical local-global pipeline to ensure a smoother editing process. Experimental results highlight the effectiveness of our method in various editing tasks.
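
The abstract says that local edits are driven by the cross-attention maps between the text prompt and the diffusion latents. The following is only a minimal sketch of that general idea, not the PPAE implementation: it computes a toy cross-attention map with NumPy and strengthens one prompt token's influence. The function names, the array shapes, the scaling factor, and the row renormalisation step are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_map(queries, keys):
    """Attention weights of shape (n_latent, n_tokens); each row sums to 1."""
    d = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(d), axis=-1)

def reweight_token(attn, token_index, scale):
    """Scale one token's column, then renormalise rows (an illustrative choice)."""
    edited = attn.copy()
    edited[:, token_index] *= scale
    return edited / edited.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
queries = rng.normal(size=(6, 8))  # 6 hypothetical latent audio positions
keys = rng.normal(size=(4, 8))     # 4 hypothetical prompt tokens
values = rng.normal(size=(4, 8))   # token values mixed by the attention weights

attn = cross_attention_map(queries, keys)
edited = reweight_token(attn, token_index=2, scale=2.0)
out = edited @ values              # latent update driven by the edited map
```

In this toy setup, raising the weight of token 2 increases its share of every latent position's attention while the other tokens keep their relative proportions, which is the sense in which an attention-map edit can stay local to one described event.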

Context

Venue: International Conference on Machine Learning
Archive span: 1993-2025
Indexed papers: 16471
Paper id: 1048544971877741537