Arrow Research search

Author name cluster

Yi Jin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
1 author row

Possible papers

16

AAAI Conference 2026 Conference Paper

Beyond Single-Speed Reasoning: Coordinating Fast and Slow Dynamics for Efficient World Modeling

  • Hongwei Wang
  • Yangru Huang
  • Guangyao Chen
  • Xu Wang
  • Yi Jin

Model-based reinforcement learning (MBRL) enables efficient decision-making by learning predictive world models of environment dynamics. Despite recent advances, existing models often struggle to reconcile accurate short-term transitions with coherent long-term planning, especially in partially observable or long-horizon settings. We argue that this limitation often stems from modeling all transitions at a single temporal resolution, which makes it challenging to simultaneously capture fine-grained local dynamics and abstract global structures. To this end, we propose SF-RSSM (Slow-Fast Recurrent State-Space Model), a novel method that decouples short-term and long-term dynamics via a dual-branch design. The fast branch captures short-horizon transitions using residual prediction, while the slow branch models long-range dependencies with a GRU-based recurrent pathway. A distillation mechanism is developed to enable cooperation across timescales, with the slow model providing soft targets to guide the fast model. Additionally, a curiosity module encourages exploration by promoting learning in regions where the fast and slow branches exhibit divergent dynamics. Experiments on CARLA, DMControl and Atari benchmarks show that SF-RSSM outperforms strong baselines in policy performance.

JBHI Journal 2026 Journal Article

Fundus Image Enhancement With Pyramid Conditional Flow

  • Kai Xu
  • Zhen Liang
  • Wenjun Wei
  • Huaian Chen
  • Yi Jin

Deep learning-based approaches, which learn pixel-to-pixel mappings from input to output images, have demonstrated exceptional performance in enhancing low-quality fundus images. However, due to the ambiguous definition of the ground-truth high-quality image, the pixel-to-pixel mapping encounters an ill-posed problem arising from the complex one-to-many relationship between low-quality fundus images and their corresponding high-quality versions. To address this problem, this work proposes PCFlow, the first normalizing-flow method that learns the complex distributions of high-quality fundus images rather than a pixel-to-pixel mapping. Unlike existing natural image enhancement methods, which aim to restore images with comfortable visual quality, PCFlow enhances fundus images by prioritizing clinically significant information. To this end, we design a condition module that utilizes retinal structure as a conditioning factor to constrain the optimization of PCFlow, and then build an invertible coupling layer that employs a pyramid structure for identifying each frequency component of retinal features. With the cooperation and interaction of these key components, the proposed PCFlow preserves the retinal structures and pathological characteristics essential for clinical applications. Extensive experiments on real and synthetic fundus datasets demonstrate that our method achieves better performance.

NeurIPS Conference 2025 Conference Paper

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

  • Hongbo Liu
  • Jingwen He
  • Yi Jin
  • Dian Zheng
  • Yuhao Dong
  • Fan Zhang
  • Ziqi Huang
  • Yinan He

Recent Vision-Language Models (VLMs) have shown strong performance in general-purpose visual understanding and reasoning, but their ability to comprehend the visual grammar of movie shots remains underexplored and insufficiently evaluated. To bridge this gap, we present ShotBench, a dedicated benchmark for assessing VLMs’ understanding of cinematic language. ShotBench includes 3,049 still images and 500 video clips drawn from more than 200 films, with each sample annotated by trained annotators or curated from professional cinematography resources, resulting in 3,608 high-quality question-answer pairs. We conduct a comprehensive evaluation of over 20 state-of-the-art VLMs across eight core cinematography dimensions. Our analysis reveals clear limitations in the fine-grained perception and cinematic reasoning of current VLMs. To improve VLMs’ capability in cinematography understanding, we construct a large-scale multimodal dataset, named ShotQA, which contains about 70k question-answer pairs derived from movie shots. We further propose ShotVL, a VLM trained with a two-stage strategy that integrates supervised fine-tuning and Group Relative Policy Optimization (GRPO). Experimental results demonstrate that our model achieves substantial improvements, surpassing the strongest existing open-source and proprietary models evaluated on ShotBench and establishing a new state-of-the-art performance.

NeurIPS Conference 2024 Conference Paper

DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment

  • Gongpei Zhao
  • Tao Wang
  • Congyan Lang
  • Yi Jin
  • Yidong Li
  • Haibin Ling

Graph neural networks (GNNs) are recognized for their strong performance across various applications, with the backpropagation (BP) algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. While several non-backpropagation (non-BP) training algorithms, such as direct feedback alignment (DFA), have been successfully applied to fully-connected and convolutional network components for handling Euclidean data, directly adapting these non-BP frameworks to manage non-Euclidean graph data in GNN models presents significant challenges. These challenges primarily arise from the violation of the independent and identically distributed (i.i.d.) assumption in graph data and the difficulty of accessing prediction errors for all samples (nodes) within the graph. To overcome these obstacles, in this paper we propose DFA-GNN, a novel forward learning framework tailored for GNNs, with a case study of semi-supervised learning. The proposed method breaks the limitations of BP by using a dedicated forward training mechanism. Specifically, DFA-GNN extends the principles of DFA to adapt to graph data and the unique architecture of GNNs, incorporating graph-topology information into the feedback links to accommodate the non-Euclidean characteristics of graph data. Additionally, for semi-supervised graph learning tasks, we develop a pseudo-error generator that spreads residual errors from the training data to create a pseudo error for each unlabeled node. These pseudo errors are then utilized to train GNNs using DFA. Extensive experiments on 10 public benchmarks reveal that our learning framework outperforms not only previous non-BP methods but also the standard BP methods, and it exhibits excellent robustness against various types of noise and attacks.

NeurIPS Conference 2024 Conference Paper

Masked Pre-training Enables Universal Zero-shot Denoiser

  • Xiaoxiao Ma
  • Zhixiang Wei
  • Yi Jin
  • Pengyang Ling
  • Tianle Liu
  • Ben Wang
  • Junkang Dai
  • Huaian Chen

In this work, we observe that a model trained on vast collections of general images via a masking strategy naturally embeds their distribution knowledge, and thus spontaneously attains the underlying potential for strong image denoising. Based on this observation, we propose a novel zero-shot denoising paradigm, i.e., Masked Pre-train then Iterative fill (MPI). MPI first trains a model via masking and then employs the pre-trained weights for high-quality zero-shot denoising on a single noisy image. Concretely, MPI comprises two key procedures: 1) Masked Pre-training trains the model to reconstruct massive natural images under random masking for generalizable representations, gathering the potential for valid zero-shot denoising on images with varying noise degradations and even of distinct image types. 2) Iterative filling exploits the pre-trained knowledge for effective zero-shot denoising: it iteratively optimizes the image by leveraging the pre-trained weights, focusing on alternate reconstruction of different image parts, and gradually assembles the fully denoised image within a limited number of iterations. Comprehensive experiments across various noisy scenarios underscore the notable advances of MPI over previous approaches, with a marked reduction in inference time.

AAAI Conference 2023 Conference Paper

MetaZSCIL: A Meta-Learning Approach for Generalized Zero-Shot Class Incremental Learning

  • Yanan Wu
  • Tengfei Liang
  • Songhe Feng
  • Yi Jin
  • Gengyu Lyu
  • Haojun Fei
  • Yang Wang

Generalized zero-shot learning (GZSL) aims to recognize samples whose categories may not have been seen during training. Standard GZSL cannot handle the dynamic addition of new seen and unseen classes. To address this limitation, some recent attempts have been made to develop continual GZSL methods. However, these methods require end-users to continuously collect and annotate numerous seen-class samples, which is unrealistic and hampers real-world applicability. Accordingly, in this paper, we propose a more practical and challenging setting named Generalized Zero-Shot Class Incremental Learning (CI-GZSL). Our setting aims to incrementally learn unseen classes without any training samples, while recognizing all classes previously encountered. We further propose a bi-level meta-learning based method, called MetaZSCIL, that directly optimizes the network to learn how to learn incrementally. Specifically, we sample sequential tasks from seen classes during offline training to simulate the incremental learning process. For each task, the model is learned using a meta-objective such that it is capable of fast adaptation without forgetting. Note that our optimization can be flexibly equipped with most existing generative methods to tackle CI-GZSL. This work introduces a feature-generative framework that leverages visual feature distribution alignment to produce replayed samples of previously seen classes, reducing catastrophic forgetting. Extensive experiments conducted on five widely used benchmarks demonstrate the superiority of our proposed method.

NeurIPS Conference 2022 Conference Paper

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation

  • Lin Chen
  • Zhixiang Wei
  • Xin Jin
  • Huaian Chen
  • Miao Zheng
  • Kai Chen
  • Yi Jin

In unsupervised domain adaptation (UDA), directly adapting from the source to the target domain usually suffers from significant discrepancies and leads to insufficient alignment. Thus, many UDA works attempt to close the domain gap gradually and softly via various intermediate spaces, dubbed domain bridging (DB). However, for dense prediction tasks such as domain adaptive semantic segmentation (DASS), existing solutions have mostly relied on rough style transfer, and how to elegantly bridge domains remains under-explored. In this work, we resort to data mixing to establish a deliberated domain bridging (DDB) for DASS, through which the joint distributions of the source and target domains are aligned and interact with each other in the intermediate space. At the heart of DDB lies a dual-path domain bridging step that generates two intermediate domains using coarse-wise and fine-wise data mixing techniques, alongside a cross-path knowledge distillation step that takes two complementary models trained on the generated intermediate samples as ‘teachers’ to develop a superior ‘student’ in a multi-teacher distillation manner. These two optimization steps work in an alternating way and reinforce each other, giving rise to DDB with strong adaptation power. Extensive experiments on adaptive segmentation tasks with different settings demonstrate that our DDB significantly outperforms state-of-the-art methods.

IJCAI Conference 2021 Conference Paper

GM-MLIC: Graph Matching based Multi-Label Image Classification

  • Yanan Wu
  • He Liu
  • Songhe Feng
  • Yi Jin
  • Gengyu Lyu
  • Zizhang Wu

Multi-Label Image Classification (MLIC) aims to predict the set of labels present in an image. The key to this problem is mining the associations between image contents and labels to obtain correct assignments between images and their labels. In this paper, we treat each image as a bag of instances and reformulate MLIC as an instance-label matching selection problem. To model this problem, we propose a novel deep learning framework named Graph Matching based Multi-Label Image Classification (GM-MLIC), where a Graph Matching (GM) scheme is introduced owing to its excellent capability of excavating instance-label relationships. Specifically, we first construct an instance spatial graph and a label semantic graph, and then incorporate them into a constructed assignment graph by connecting each instance to all labels. Subsequently, a graph network block is adopted to aggregate and update all node and edge states on the assignment graph to form structured representations for each instance and label. Our network finally derives a prediction score for each instance-label correspondence and optimizes these correspondences with a weighted cross-entropy loss. Extensive experiments conducted on various datasets demonstrate the superiority of our proposed method.

AAAI Conference 2021 Conference Paper

Unsupervised Domain Adaptation for Person Re-identification via Heterogeneous Graph Alignment

  • Minying Zhang
  • Kai Liu
  • Yidong Li
  • Shihui Guo
  • Hongtao Duan
  • Yimin Long
  • Yi Jin

Unsupervised person re-identification (re-ID) is becoming increasingly popular due to its power in real-world systems such as public security and intelligent transportation. However, the person re-ID task is challenged by data distribution discrepancies across cameras and a lack of label information. In this paper, we propose a coarse-to-fine heterogeneous graph alignment (HGA) method that finds cross-camera person matches by characterizing the unlabeled data as a heterogeneous graph for each camera. In the coarse-alignment stage, we assign a projection to each camera and utilize an adversarial learning based method to align coarse-grained node groups from different cameras into a shared space, which alleviates the distribution discrepancy between cameras. In the fine-alignment stage, we exploit potential fine-grained node groups in the shared space and introduce conservative alignment loss functions to constrain the graph aligning process, resulting in reliable pseudo labels as learning guidance. The proposed domain adaptation framework not only improves model generalization on the target domain, but also facilitates mining and integrating potential discriminative information across different cameras. Extensive experiments on benchmark datasets demonstrate that the proposed approach outperforms the state of the art.

AAAI Conference 2020 Conference Paper

Domain Adaptive Attention Learning for Unsupervised Person Re-Identification

  • Yangru Huang
  • Peixi Peng
  • Yi Jin
  • Yidong Li
  • Junliang Xing

Person re-identification (Re-ID) across multiple datasets is challenging for two main reasons: large cross-dataset distinctions and the absence of annotated target instances. To address these two issues, this paper proposes a domain adaptive attention learning approach to reliably transfer discriminative representations from the labeled source domain to the unlabeled target domain. In this approach, a domain adaptive attention model is learned to separate the feature map into a domain-shared part and a domain-specific part. The domain-shared part captures transferable cues that compensate for cross-dataset distinctions and contribute positively to the target task, while the domain-specific part models the noisy information to avoid the negative transfer caused by domain diversity. A soft label loss is further employed to make full use of unlabeled target data by estimating pseudo labels. Extensive experiments on the Market-1501, DukeMTMC-reID and MSMT17 benchmarks demonstrate that the proposed approach outperforms the state of the art.

TIST Journal 2020 Journal Article

HERA

  • Gengyu Lyu
  • Songhe Feng
  • Yidong Li
  • Yi Jin
  • Guojun Dai
  • Congyan Lang

Partial label learning (PLL) aims to learn from data where each training instance is associated with a set of candidate labels, among which only one is correct. Most existing methods deal with this type of problem by either treating each candidate label equally or identifying the ground-truth label iteratively. In this article, we propose a novel PLL approach named HERA, which simultaneously incorporates the HeterogEneous Loss and the SpaRse and Low-rAnk procedure to estimate the labeling confidence for each instance while training the desired model. Specifically, the heterogeneous loss integrates the strengths of both the pairwise ranking loss and the pointwise reconstruction loss to provide informative label ranking and reconstruction information for label identification, whereas the embedded sparse and low-rank scheme constrains the sparsity of the ground-truth label matrix and the low rank of the noise label matrix to explore the global label relevance across the whole training data, improving the learned model. A comprehensive ablation study demonstrates the effectiveness of the heterogeneous loss, and extensive experiments on both artificial and real-world datasets demonstrate that our method achieves superior or comparable performance against state-of-the-art methods.

AAAI Conference 2019 Conference Paper

Partial Multi-Label Learning by Low-Rank and Sparse Decomposition

  • Lijuan Sun
  • Songhe Feng
  • Tao Wang
  • Congyan Lang
  • Yi Jin

Multi-Label Learning (MLL) aims to learn from training data where each example is represented by a single instance while associated with a set of candidate labels. Most existing MLL methods are designed to handle the problem of missing labels. However, in many real-world scenarios the labeling information for multi-label data is redundant, which classical MLL methods cannot resolve; the Partial Multi-label Learning (PML) framework was therefore proposed to cope with this problem, i.e., removing the noisy labels from the multi-label sets. In this paper, to further improve the denoising capability of the PML framework, we utilize a low-rank and sparse decomposition scheme and propose a novel Partial Multi-label Learning by Low-Rank and Sparse decomposition (PML-LRS) approach. Specifically, we first reformulate the observed label set into a label matrix, and then decompose it into a ground-truth label matrix and an irrelevant label matrix, where the former is constrained to be low rank and the latter is assumed to be sparse. Next, we utilize a feature mapping matrix to explore label correlations, while constraining this matrix to be low rank to prevent the proposed method from overfitting. Finally, we obtain the ground-truth labels by minimizing the label loss, where the Augmented Lagrange Multiplier (ALM) algorithm is incorporated to solve the optimization problem. Extensive experimental results demonstrate that PML-LRS achieves superior or competitive performance against other state-of-the-art methods.

AAAI Conference 2007 Conference Paper

Mutual Belief Revision: Semantics and Computation

  • Yi Jin

This paper presents both a semantic and a computational model for multi-agent belief revision. We show that these two models are equivalent but serve different purposes. The semantic model displays the intuition and construction of the belief revision operation in multi-agent environments, especially in case of just two agents. The logical properties of this model provide strong justifications for it. The computational model enables us to reassess the operation from a computational perspective. A complexity analysis reveals that belief revision between two agents is computationally no more demanding than single agent belief revision.

IJCAI Conference 2005 Conference Paper

Iterated Belief Revision, Revised

  • Yi Jin
  • Michael Thielscher

The AGM postulates for belief revision, augmented by the DP postulates for iterated belief revision, provide generally accepted criteria for the design of operators by which intelligent agents adapt their beliefs incrementally to new information. These postulates alone, however, are too permissive: They support operators by which all newly acquired information is canceled as soon as an agent learns a fact that contradicts some of its current beliefs. In this paper, we present a formal analysis of the deficiency of the DP postulates, and we show how to solve the problem by an additional postulate of independence. We give a representation theorem for this postulate and prove that it is compatible with AGM and DP.