EAAI 2026
Strengthening temporal action segmentation through diffusion models
Abstract
This paper addresses temporal action segmentation with a joint approach that combines low-level and high-level video analysis. We introduce diffusion models to improve the quality and accuracy of segmentation: by combining text information with visual features drawn from different levels of a video, we build a multi-stage pipeline in which a denoising network generates diverse features and improves the quality of the visual representation. Exploiting the interplay between denoising and high-level segmentation yields several advantages: a feedback loop for iterative refinement, improved overall performance through synergy between the two tasks, and enhanced robustness and generalization. Our approach achieves accuracies of 83.4%, 90.7% and 77.6% on the Georgia Tech Egocentric Activities, 50 Salads and Breakfast datasets, respectively.
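The abstract describes iterative refinement of segmentation via a reverse denoising process. A minimal sketch of that idea is shown below; the function name, the linear blending "denoiser", and the toy conditioning scores are all illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def refine_segmentation(cond_logits, steps=10, seed=0):
    """Toy reverse-diffusion pass over per-frame action scores.

    cond_logits: (T, C) conditioning scores from a hypothetical backbone.
    Starts from Gaussian noise and moves it toward the conditioning
    signal over `steps` iterations, mimicking iterative denoising.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond_logits.shape)  # pure noise at the start
    for step in range(steps):
        alpha = (step + 1) / steps              # simple noise schedule
        # stand-in "denoiser": blend current sample with conditioning
        x = (1 - alpha) * x + alpha * cond_logits
    return x.argmax(axis=1)                     # per-frame class labels

# toy conditioning: 6 frames, 3 classes
cond = np.array([[2., 0, 0], [2, 0, 0], [0, 2, 0],
                 [0, 2, 0], [0, 0, 2], [0, 0, 2]])
labels = refine_segmentation(cond)  # → [0, 0, 1, 1, 2, 2]
```

In the paper's full method the blending step would be a learned denoising network conditioned on multi-level visual and text features; here it is replaced by a deterministic interpolation purely to make the iterative-refinement loop concrete.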
Authors
Keywords
Context
- Venue
- Engineering Applications of Artificial Intelligence
- Archive span
- 1988-2026
- Indexed papers
- 13269
- Paper id
- 964414540287589770