
EAAI 2026

Strengthening temporal action segmentation through diffusion models

Journal Article · Applied Artificial Intelligence · Artificial Intelligence

Abstract

This paper addresses temporal action segmentation and presents a joint approach that combines low-level and high-level video analysis techniques, introducing diffusion models to enhance the quality and accuracy of the segmentation. By blending text information with visual features drawn from different levels of a video, we build a multi-stage process in which a denoising network generates diverse features and improves the quality of the visual representation. Leveraging the interplay between denoising and high-level segmentation offers several advantages: a feedback loop for iterative refinement, improved overall performance through the synergy between the two tasks, and enhanced robustness and generalization. Our approach achieves accuracies of 83.4%, 90.7% and 77.6% on the Georgia Tech Egocentric Activities, 50 Salads and Breakfast datasets, respectively.
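The abstract describes an iterative, diffusion-style refinement loop: per-frame action estimates start from noise and are progressively denoised, conditioned on frame features. The sketch below illustrates that general idea only; it is not the paper's architecture. The linear conditioning `W`, the step schedule, and the blending rule are all simplifying assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax over class scores."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def iterative_denoise(features, W, num_steps=10, seed=0):
    """Diffusion-style sketch: refine per-frame action probabilities,
    starting from pure noise and blending toward a feature-conditioned
    prediction (a hypothetical linear stand-in for the learned
    denoising network)."""
    rng = np.random.default_rng(seed)
    num_frames, num_classes = features.shape[0], W.shape[1]
    probs = softmax(rng.standard_normal((num_frames, num_classes)))  # noise init
    for t in range(num_steps):
        conditioned = softmax(features @ W)  # feature-conditioned estimate
        alpha = (t + 1) / num_steps          # trust the estimate more each step
        probs = (1 - alpha) * probs + alpha * conditioned
    return probs.argmax(axis=1)              # per-frame action labels
```

In a real diffusion segmenter the blend would be replaced by a learned denoising network applied over a noise schedule; here the loop only shows how the feedback between noisy estimates and conditioned predictions yields iterative refinement.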

Authors

Keywords

  • Temporal action segmentation
  • Multimodal feature extraction
  • Diffusion-based model

Context

Venue
Engineering Applications of Artificial Intelligence
Archive span
1988-2026
Indexed papers
13269
Paper id
964414540287589770