
AAAI 2026

Invariant Feature Learning for Counterfactual Watch-time Prediction in Video Recommendation

Conference Paper · AAAI Technical Track on Data Mining & Knowledge Management I · Artificial Intelligence

Abstract

Video recommendation systems rely heavily on user watch-time feedback, making accurate watch-time prediction a crucial task. However, this task inherently suffers from bias, as recommendation models tend to favor long-duration videos to maximize watch time. This issue, known as duration bias in the watch-time prediction context, can be explained from a causal perspective, where video duration acts as a confounder. Recent works address this bias using backdoor adjustment, isolating the direct effect of content on watch time from observational data. These methods typically discretize video duration into groups, estimate group-wise effects, and then aggregate them via a unified prediction model. However, this aggregation strategy is prone to model misspecification due to feature distribution shift across groups. In this paper, we reinterpret the problem through the lens of invariant learning and propose a novel framework: Duration-Invariant Feature Learning (DIFL). DIFL employs a kernel-based regularization that enforces representation invariance across duration groups, reducing sensitivity to group design and improving generalization. This enables more accurate modeling of the direct causal effect and supports counterfactual inference. Extensive experiments on both public datasets and a large-scale real-world production dataset demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.
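The abstract does not specify the exact form of the kernel-based invariance regularizer, but a common instantiation of this idea penalizes the Maximum Mean Discrepancy (MMD) between feature representations of different duration groups. The sketch below is a minimal NumPy illustration of that pattern, not the paper's actual DIFL objective; all function names (`rbf_kernel`, `mmd2`, `invariance_penalty`) are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of squared MMD between the samples X and Y:
    # large when the two representation distributions differ.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

def invariance_penalty(reps, groups, gamma=1.0):
    # Sum of pairwise squared MMDs between representations belonging to
    # different (discretized) duration groups; adding this penalty to the
    # watch-time prediction loss pushes the encoder toward features whose
    # distribution is invariant to video duration.
    ids = np.unique(groups)
    penalty = 0.0
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            penalty += mmd2(reps[groups == ids[i]],
                            reps[groups == ids[j]], gamma)
    return penalty
```

In a training loop, this penalty would be scaled by a trade-off coefficient and added to the supervised watch-time loss, so the encoder is jointly optimized for prediction accuracy and duration-group invariance.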

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
565525917238992977