Arrow Research search

Author name cluster

Pengwei Yin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
1 author row

Possible papers

3

IJCAI 2025 Conference Paper

Denoising Diffusion Models are Good General Gaze Feature Learners

  • Guanzhong Zeng
  • Jingjing Wang
  • Pengwei Yin
  • Zefu Xu
  • Mingyang Zhou

Since collecting labeled gaze data is laborious and time-consuming, methods that can learn generalizable features from large-scale unlabeled data are desirable. In recent years, diffusion models have shown tremendous capability in image generation as well as potential for feature representation learning. In this paper, we investigate whether they can acquire discriminative representations for gaze estimation via generative pre-training. To this end, we propose GazeDiff, a self-supervised learning framework with diffusion models for gaze estimation. Specifically, as the pre-training task, we use a conditional diffusion model to generate a target image whose gaze direction is specified by a reference image. To encourage the diffusion model to learn gaze-related features as the condition, we propose a disentangling feature-learning strategy, which first learns appearance, head-pose, and eye-direction features separately and then combines them into the conditional features. Extensive experiments demonstrate that denoising diffusion models are also good general gaze feature learners.
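As a rough illustration (not the authors' code), the pre-training setup pairs a standard DDPM-style noised target image with a condition vector built from disentangled reference-image features. A minimal numpy sketch, with the appearance/head-pose/eye-direction encoders mocked as fixed slices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock disentangled feature extractors (hypothetical stand-ins for the
# appearance, head-pose, and eye-direction encoders in the abstract).
def extract_condition(ref_image, dims=(8, 4, 4)):
    appearance = ref_image[: dims[0]]
    head_pose = ref_image[dims[0]: dims[0] + dims[1]]
    eye_dir = ref_image[-dims[2]:]
    return np.concatenate([appearance, head_pose, eye_dir])

# Standard DDPM forward noising: x_t = sqrt(a_bar)*x_0 + sqrt(1-a_bar)*eps
def add_noise(x0, t, alpha_bar):
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal(32)     # target image (flattened toy example)
ref = rng.standard_normal(32)    # reference image carrying the gaze direction
cond = extract_condition(ref)    # 16-dim condition fed to the denoiser
xt, eps = add_noise(x0, t=50, alpha_bar=alpha_bar)

# The denoiser (omitted here) would take (xt, t, cond) and regress eps;
# its learned representations become the gaze features after pre-training.
denoiser_input = np.concatenate([xt, cond])
print(denoiser_input.shape)
```

The denoiser network itself is omitted; the sketch only shows how the noised target and the combined condition would be assembled into a training example.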

AAAI 2025 Conference Paper

Gaze Label Alignment: Alleviating Domain Shift for Gaze Estimation

  • Guanzhong Zeng
  • Jingjing Wang
  • Zefu Xu
  • Pengwei Yin
  • Wenqi Ren
  • Di Xie
  • Jiang Zhu

Gaze estimation methods suffer significant performance deterioration when evaluated across domains, because of the domain gap between the testing and training data. Existing methods try to solve this issue by reducing the deviation in the data distribution; however, they ignore the label deviation present in the data, which stems from the gaze-label acquisition mechanism and individual physiological differences. In this paper, we first show that the influence of this label deviation cannot be ignored, and we propose a gaze label alignment algorithm (GLA) to eliminate the label distribution deviation. Specifically, we first train the feature extractor on all domains to obtain domain-invariant features, and then select an anchor domain on which to train the gaze regressor. We predict gaze labels on the remaining domains and use a mapping function to align the labels. These aligned labels can then be used to train gaze estimation models, so our method can be combined with any existing method. Experimental results show that GLA effectively alleviates the label distribution shift and noticeably improves state-of-the-art gaze estimation methods.
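The alignment step described above can be sketched as a simple affine label mapping fitted by least squares; this is a simplification on toy data, and the paper's actual mapping function and training pipeline may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Anchor-domain gaze labels (yaw/pitch in radians, toy data).
anchor_labels = rng.uniform(-0.5, 0.5, size=(200, 2))

# A second domain whose labels carry a systematic scale + offset deviation,
# standing in for acquisition and physiological bias.
true_scale, true_offset = 1.1, 0.05
domain_labels = true_scale * anchor_labels + true_offset

# The feature extractor and anchor-trained regressor are omitted; we mock the
# regressor's predictions on the other domain as anchor labels plus noise.
predicted = anchor_labels + 0.01 * rng.standard_normal(anchor_labels.shape)

# Fit an affine mapping  domain_label ~ a * predicted + b  by least squares,
# then invert it to bring the domain's labels onto the anchor scale.
A = np.column_stack([predicted.ravel(), np.ones(predicted.size)])
(a, b), *_ = np.linalg.lstsq(A, domain_labels.ravel(), rcond=None)

aligned = (domain_labels - b) / a   # labels aligned to the anchor domain
print(round(a, 2), round(b, 3))
```

On this toy data the fit recovers the injected scale and offset, and the aligned labels match the anchor-domain labels up to the prediction noise.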

AAAI 2024 Conference Paper

CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model

  • Pengwei Yin
  • Guanzhong Zeng
  • Jingjing Wang
  • Di Xie

Gaze estimation methods often experience significant performance degradation when evaluated across domains, due to the domain gap between the testing and training data. Existing methods try to address this issue with various domain generalization approaches, but with little success because of the limited diversity of gaze datasets in appearance, wearables, and image quality. To overcome these limitations, we propose CLIP-Gaze, a novel framework that leverages the transferable knowledge of a pre-trained vision-language model. Our framework is the first to apply a vision-and-language cross-modal approach to the gaze estimation task. Specifically, we extract gaze-relevant features by pushing them away from gaze-irrelevant features, which can be flexibly constructed via language descriptions. To learn more suitable prompts, we propose a personalized context optimization method for text prompt tuning. Furthermore, we exploit the relationships among gaze samples to refine the distribution of gaze-relevant features, thereby improving the generalization capability of the gaze estimation model. Extensive experiments demonstrate the excellent performance of CLIP-Gaze over existing methods on four cross-domain evaluations.
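One simple way to read the "pushing away" idea is as removing language-derived gaze-irrelevant directions from an image feature by orthogonal projection; a minimal numpy sketch, with the vision and text encoders mocked as random vectors (the paper's actual mechanism is learned, not a fixed projection):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16

# Image feature from a (mocked) vision encoder.
img_feat = rng.standard_normal(d)

# Gaze-irrelevant factors described by language prompts (e.g. "a photo of a
# person wearing glasses"); their text embeddings are mocked here.
irrelevant = rng.standard_normal((3, d))

# Orthonormalize the irrelevant directions and subtract their span from the
# image feature; the residual is treated as the gaze-relevant component.
Q, _ = np.linalg.qr(irrelevant.T)                 # d x 3 orthonormal basis
gaze_relevant = img_feat - Q @ (Q.T @ img_feat)

# The residual is orthogonal to every gaze-irrelevant direction.
print(np.abs(irrelevant @ gaze_relevant).max())
```

In the real framework the text prompts would come from a CLIP-style text encoder and the separation would be learned end to end; the projection here only illustrates the geometric intuition.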