Author name cluster

Jingjun Yi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

1 author row

NeurIPS Conference 2025 Conference Paper

Degradation-Aware Dynamic Schrödinger Bridge for Unpaired Image Restoration

Jingjun Yi
Qi Bi
Hao Zheng
Huimin Huang
Yixian Shen
Haolan Zhan
Wei Ji
Yawen Huang

Image restoration is a fundamental task in computer vision and machine learning, which learns a mapping between clear images and degraded images under various conditions (e.g., blur, low light, haze). Yet most existing image restoration methods are heavily restricted by the requirement of paired degraded and clear images, which limits their generalization and applicability to the many real-world scenarios without paired images. To address this bottleneck, we propose a Degradation-aware Dynamic Schrödinger Bridge (DDSB) for unpaired image restoration. Its general idea is to learn a Schrödinger Bridge between the clear and degraded image distributions, while at the same time emphasizing the physical degradation priors to reduce the accumulation of errors during the restoration process. A Degradation-aware Optimal Transport (DOT) learning scheme is devised accordingly. Training a degradation model to learn the inverse restoration process is particularly challenging, as it must be applicable across different stages of the iterative restoration process. A Dynamic Transport with Consistency (DTC) learning objective is further proposed to reduce the loss of image details in the early iterations and thereby refine the degradation model. Extensive experiments on multiple image degradation tasks show state-of-the-art performance over prior art.


AAAI Conference 2025 Conference Paper

DGFamba: Learning Flow Factorized State Space for Visual Domain Generalization

Qi Bi
Jingjun Yi
Hao Zheng
Haolan Zhan
Wei Ji
Yawen Huang
Yuexiang Li

Domain generalization aims to learn a representation from the source domain that can be generalized to arbitrary unseen target domains. A fundamental challenge for visual domain generalization is the domain gap caused by dramatic style variation while the image content remains stable. Selective state space models, exemplified by VMamba, offer a global receptive field for representing the content. However, how to exploit the domain-invariant property in selective state space models has rarely been explored. In this paper, we propose a novel Flow Factorized State Space model, dubbed DGFamba, for visual domain generalization. To maintain domain consistency, we map the style-augmented and the original state embeddings by flow factorization. In this latent flow space, each state embedding from a certain style is specified by a latent probability path. By aligning these probability paths in the latent space, the state embeddings are able to represent the same content distribution regardless of the style differences. Extensive experiments conducted on various visual domain generalization settings show its state-of-the-art performance.


NeurIPS Conference 2025 Conference Paper

Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

Hao Zheng
Jingjun Yi
Qi Bi
Huimin Huang
Haolan Zhan
Yawen Huang
Yuexiang Li
Xian Wu

Domain generalization aims to train models that perform robustly on unseen target domains without access to target data. Vision-language foundation models have opened a new avenue owing to their inherent out-of-distribution generalization capability. However, static alignment to class-level textual anchors remains insufficient to handle the dramatic distribution discrepancy among diverse domain-specific visual features. In this work, we propose a novel cross-domain Schrödinger Bridge (SB) method, namely SBGen, to handle this challenge. It explicitly formulates the stochastic semantic evolution to gain better generalization to unseen domains. Technically, the proposed SBGen consists of three key components: (1) text-guided domain-aware feature selection to isolate semantically aligned image tokens; (2) stochastic cross-domain evolution to simulate the SB dynamics via a learnable time-conditioned drift; and (3) stochastic domain-agnostic interpolation to construct semantically grounded feature trajectories. Empirically, SBGen achieves state-of-the-art performance on domain generalization in both classification and segmentation. This work highlights the importance of modeling domain shifts as structured stochastic processes grounded in semantic alignment.


AAAI Conference 2025 Conference Paper

Learning Fine-grained Domain Generalization via Hyperbolic State Space Hallucination

Qi Bi
Jingjun Yi
Haolan Zhan
Wei Ji
Gui-Song Xia

Fine-grained domain generalization (FGDG) aims to learn a fine-grained representation that generalizes well to unseen target domains when trained only on source domain data. Compared with generic domain generalization, FGDG is particularly challenging in that the fine-grained categories can only be discerned by subtle and tiny patterns. Such patterns are particularly fragile under the cross-domain style shifts caused by illumination, color, etc. To push this frontier, this paper presents a novel Hyperbolic State Space Hallucination (HSSH) method. It consists of two key components, namely, state space hallucination (SSH) and hyperbolic manifold consistency (HMC). SSH enriches the style diversity of the state embeddings by first extrapolating and then hallucinating the source images. Then, the state embeddings before and after style hallucination are projected into the hyperbolic manifold. The hyperbolic state space models the high-order statistics and allows better discernment of the fine-grained patterns. Finally, the hyperbolic distance is minimized, so that the impact of style variation on fine-grained patterns can be eliminated. Experiments on three FGDG benchmarks demonstrate its state-of-the-art performance.
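The abstract above minimizes a hyperbolic distance between paired embeddings but does not say which model of hyperbolic space is used. As a minimal sketch, assuming the Poincaré ball model (an assumption, not a detail from the paper), the distance between two points inside the unit ball can be computed as follows. The function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumption: the Poincare ball model is used here only for illustration;
    the abstract does not name a specific hyperbolic model.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))
```

In this model the distance grows rapidly as points approach the boundary of the ball, which is one reason hyperbolic embeddings are often used when small differences need to be well separated.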


NeurIPS Conference 2024 Conference Paper

Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation

Qi Bi
Jingjun Yi
Hao Zheng
Haolan Zhan
Yawen Huang
Wei Ji
Yuexiang Li
Yefeng Zheng

The emerging vision foundation model (VFM) has inherited the ability to generalize to unseen images. Nevertheless, the key challenge of domain-generalized semantic segmentation (DGSS) lies in the domain gap attributed to cross-domain styles, i.e., the variance of urban landscapes and environment dependencies. Hence, maintaining the style-invariant property under varying domain styles becomes the key bottleneck in harnessing VFMs for DGSS. The frequency space after the Haar wavelet transformation provides a feasible way to decouple the style information from the domain-invariant content, since the content and style information are retained in the low- and high-frequency components of the space, respectively. To this end, we propose a novel Frequency-Adapted (FADA) learning scheme to advance the frontier. Its overall idea is to tackle the content and style information separately with frequency tokens throughout the learning process. In particular, the proposed FADA consists of two branches, i.e., low- and high-frequency branches. The former stabilizes the scene content, while the latter learns the scene styles and eliminates their impact on DGSS. Experiments conducted on various DGSS settings show the state-of-the-art performance of our FADA and its versatility across a variety of VFMs. Source code is available at https://github.com/BiQiWHU/FADA.
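To make the low/high frequency split concrete, here is a minimal sketch of a single-level 2D Haar transform on a small grayscale image stored as a list of rows. It is a generic textbook transform with orthonormal scaling, not code from the FADA repository. The function name `haar2d_level1` and the band names are hypothetical.

```python
def haar2d_level1(image):
    """Single-level 2D Haar transform of an image with even height and width.

    Returns four half-resolution bands: one low-frequency band (local
    averages) and three high-frequency bands (local differences).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must both be even")
    low, high_rows, high_cols, high_diag = [], [], [], []
    for r in range(0, rows, 2):
        lo, hr, hc, hd = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            lo.append((a + b + d + e) / 2.0)  # scaled 2x2 block average
            hr.append((a + b - d - e) / 2.0)  # top row minus bottom row
            hc.append((a - b + d - e) / 2.0)  # left column minus right column
            hd.append((a - b - d + e) / 2.0)  # diagonal difference
        low.append(lo)
        high_rows.append(hr)
        high_cols.append(hc)
        high_diag.append(hd)
    return low, high_rows, high_cols, high_diag
```

A perfectly flat image produces zeros in all three high-frequency bands, so any non-zero detail coefficients reflect local intensity changes such as edges and texture.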


AAAI Conference 2024 Conference Paper

Learning Generalized Medical Image Segmentation from Decoupled Feature Queries

Qi Bi
Jingjun Yi
Hao Zheng
Wei Ji
Yawen Huang
Yuexiang Li
Yefeng Zheng

Domain generalized medical image segmentation requires models to learn from multiple source domains and generalize well to arbitrary unseen target domains. Such a task is both technically challenging and clinically practical, due to the domain shift problem (i.e., images are collected from different hospitals and scanners). Existing methods focus on either learning shape-invariant representations or reaching consensus among the source domains. An ideal generalized representation is supposed to show similar pattern responses within the same channel for cross-domain images. However, to deal with the significant distribution discrepancy, the network tends to capture similar patterns with multiple channels, while different cross-domain patterns are also allowed to rest in the same channel. To address this issue, we propose to leverage channel-wise decoupled deep features as queries. With the aid of a cross-attention mechanism, the long-range dependency between deep and shallow features can be fully mined via self-attention and then guides the learning of the generalized representation. Besides, a relaxed deep whitening transformation is proposed to learn channel-wise decoupled features in a feasible way. The proposed decoupled feature query (DFQ) scheme can be seamlessly integrated into a Transformer segmentation model in an end-to-end manner. Extensive experiments show its state-of-the-art performance, notably outperforming the runner-up by 1.31% and 1.98% on the DSC metric on generalized fundus and prostate benchmarks, respectively. Source code is available at https://github.com/BiQiWHU/DFQ.
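The DSC metric quoted above is the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch on flattened binary masks follows. The function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction with two foreground pixels, of which one overlaps a one-pixel ground truth, scores 2 * 1 / (2 + 1), or about 0.667.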


NeurIPS Conference 2024 Conference Paper

Samba: Severity-aware Recurrent Modeling for Cross-domain Medical Image Grading

Qi Bi
Jingjun Yi
Hao Zheng
Wei Ji
Haolan Zhan
Yawen Huang
Yuexiang Li
Yefeng Zheng

Disease grading is a crucial task in medical image analysis. Due to the continuous progression of diseases, i.e., the variability within the same level and the similarity between adjacent stages, accurate grading is highly challenging. Furthermore, in real-world scenarios, models trained on limited source domain datasets should also be capable of handling data from unseen target domains. Due to cross-domain variations, the feature distributions of the source and unseen target domains can be dramatically different, leading to a substantial decrease in model performance. To address these challenges in cross-domain disease grading, we propose a Severity-aware Recurrent Modeling (Samba) method in this paper. As the core objective of most staging tasks is to identify the most severe lesions, which may occupy only a small portion of the image, we propose to encode image patches in a sequential and recurrent manner. Specifically, a state space model is tailored to store and transport the severity information through hidden states. Moreover, to mitigate the impact of cross-domain variations, an Expectation-Maximization (EM) based state recalibration mechanism is designed to map the patch embeddings into a more compact space. We model the feature distributions of different lesions with a Gaussian Mixture Model (GMM) and reconstruct the intermediate features based on learnable severity bases. Extensive experiments show that the proposed Samba outperforms the VMamba baseline by an average accuracy of 23.5%, 5.6% and 4.1% on the cross-domain grading of fatigue fracture, breast cancer and diabetic retinopathy, respectively. Source code is available at https://github.com/BiQiWHU/Samba.
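The abstract names EM and a Gaussian Mixture Model without giving details of the recalibration mechanism. As a minimal sketch of the underlying idea only, the following fits a one-dimensional GMM to scalar values with plain EM. It is not the authors' mechanism, which works on patch embeddings with learnable severity bases. The function name `em_gmm_1d`, its parameters and the variance floor are all hypothetical.

```python
import math


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1D Gaussian mixture with EM, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [
                weights[j]
                * math.exp(-((x - means[j]) ** 2) / (2.0 * variances[j]))
                / math.sqrt(2.0 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens)
            # Fall back to uniform responsibilities if all densities underflow.
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed variance
    return means, variances, weights
```

With two well-separated groups of values and reasonable starting means, the estimated means settle near the group centers and the weights near the group proportions.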
