EAAI 2026
Strengthening temporal action segmentation through diffusion models
Abstract
This paper addresses temporal action segmentation with a joint approach that combines low-level and high-level video analysis. We introduce diffusion models to improve the quality and accuracy of segmentation: by combining text information with visual features drawn from different levels of a video, we build a multi-stage pipeline in which a denoising network generates diverse features and improves the quality of the visual representation. Exploiting the interplay between denoising and high-level segmentation yields several advantages: a feedback loop for iterative refinement, improved overall performance through synergy between the two tasks, and enhanced robustness and generalization. Our approach achieves accuracies of 83.4%, 90.7% and 77.6% on the Georgia Tech Egocentric Activities, 50 Salads and Breakfast datasets, respectively.
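The abstract describes iterative refinement of segmentation via a reverse denoising process. A minimal sketch of that idea is shown below; the function name, the linear blending "denoiser", and the toy conditioning scores are all illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def refine_segmentation(cond_logits, steps=10, seed=0):
    """Toy reverse-diffusion pass over per-frame action scores.

    cond_logits: (T, C) conditioning scores from a hypothetical backbone.
    Starts from Gaussian noise and moves it toward the conditioning
    signal over `steps` iterations, mimicking iterative denoising.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond_logits.shape)  # pure noise at the start
    for step in range(steps):
        alpha = (step + 1) / steps              # simple noise schedule
        # stand-in "denoiser": blend current sample with conditioning
        x = (1 - alpha) * x + alpha * cond_logits
    return x.argmax(axis=1)                     # per-frame class labels

# toy conditioning: 6 frames, 3 classes
cond = np.array([[2., 0, 0], [2, 0, 0], [0, 2, 0],
                 [0, 2, 0], [0, 0, 2], [0, 0, 2]])
labels = refine_segmentation(cond)  # → [0, 0, 1, 1, 2, 2]
```

In the paper's full method the blending step would be a learned denoising network conditioned on multi-level visual and text features; here it is replaced by a deterministic interpolation purely to make the iterative-refinement loop concrete.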
Authors
Keywords
Context
- Venue
- Engineering Applications of Artificial Intelligence
- Archive span
- 1988-2026
- Indexed papers
- 13269
- Paper id
- 964414540287589770