Arrow Research

Author name cluster

Yunhe Feng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers (5)

AAAI 2026 (Conference Paper)

OrgaCast: A Trustworthy Spatiotemporal Diffusion Model for Fluorescence Organoid Forecasting

  • Dawei Gao
  • Angello Huerta Gomez
  • Mingchen Li
  • Marcel El-Mokahal
  • Huaxiao Yang
  • Yunhe Feng

Accurately forecasting the spatiotemporal dynamics of biological systems, such as human pluripotent stem cell (hPSC)-derived cardiac organoids, from microscopy time-series is a critical challenge in biomedicine with profound implications for drug discovery. Existing generative models often fail to capture the intricate dynamics of organoid development, struggling with their irregular morphology, indistinct boundaries, and complex spatiotemporal patterns. To overcome these limitations, we introduce OrgaCast, a novel multimodal conditional diffusion model for high-fidelity organoid forecasting. OrgaCast uniquely conditions the generative process on three synergistic modalities: (i) historical image sequences, captured by a dedicated spatiotemporal control module; (ii) structured numerical metadata defining experimental conditions; and (iii) descriptive text captions summarizing the biological context. This comprehensive conditioning enables the generation of forecasts with high visual accuracy and biological plausibility. Furthermore, to enhance the model's utility in critical research settings, we introduce a post-hoc uncertainty quantification method that produces intuitive confidence maps, bolstering the interpretability and trustworthiness of predictions. Extensive experiments on a challenging cardiac organoid dataset demonstrate that OrgaCast outperforms baselines in metrics such as SSIM, PSNR, and LPIPS. Our framework presents a robust solution for biological forecasting, promising to accelerate research discovery while minimizing experimental costs and manual effort.
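To make the three-way conditioning concrete, here is a minimal sketch of how a frame history, numerical metadata, and a caption embedding could be fused into a single conditioning signal for a denoiser. All module names, dimensions, and the additive fusion are illustrative assumptions, not OrgaCast's published architecture.

```python
# Hypothetical sketch of tri-modal conditioning for a diffusion denoiser.
# Shapes and fusion scheme are assumptions made for illustration only.
import torch
import torch.nn as nn

class TriModalCondition(nn.Module):
    """Fuse (i) historical frames, (ii) numerical metadata, and
    (iii) a pre-computed text embedding into one conditioning vector."""
    def __init__(self, frame_ch=1, meta_dim=6, text_dim=32, cond_dim=128):
        super().__init__()
        # (i) spatiotemporal control: a small 3D conv over the frame history
        self.hist_enc = nn.Sequential(
            nn.Conv3d(frame_ch, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(16, cond_dim),
        )
        # (ii) structured experimental metadata
        self.meta_enc = nn.Sequential(nn.Linear(meta_dim, cond_dim), nn.ReLU())
        # (iii) caption embedding (e.g., from a frozen text encoder)
        self.text_enc = nn.Linear(text_dim, cond_dim)

    def forward(self, history, metadata, caption_emb):
        # history: (B, C, T, H, W); metadata: (B, meta_dim); caption_emb: (B, text_dim)
        return self.hist_enc(history) + self.meta_enc(metadata) + self.text_enc(caption_emb)

cond = TriModalCondition()
h = torch.randn(2, 1, 8, 64, 64)   # 8 historical frames
m = torch.randn(2, 6)              # experimental conditions
t = torch.randn(2, 32)             # caption embedding
print(cond(h, m, t).shape)         # torch.Size([2, 128])
```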

IROS 2025 (Conference Paper)

Efficient and Accurate Low-Resolution Transformer Tracking

  • Shaohua Dong
  • Yunhe Feng
  • James Liang
  • Qing Yang 0003
  • Yuewei Lin
  • Heng Fan 0001

High-performance Transformer trackers have exhibited excellent results, yet they often bear a heavy computational load. Observing that a smaller input can immediately and conveniently reduce computations without changing the model, an easy solution is to adopt a low-resolution input for efficient Transformer tracking. Albeit faster, this considerably hurts tracking accuracy due to the information loss in low-resolution tracking. In this paper, we aim to mitigate such information loss to boost the performance of low-resolution Transformer tracking via dual knowledge distillation from a frozen high-resolution (but not larger) Transformer tracker. The core lies in two simple yet effective distillation modules, query-key-value knowledge distillation (QKV-KD) and discrimination knowledge distillation (Disc-KD), across resolutions. The former, from the global view, allows the low-resolution tracker to inherit features and interactions from the high-resolution tracker, while the latter, from the target-aware view, enhances the target-background distinguishing capacity by imitating discriminative regions of its high-resolution counterpart. With dual knowledge distillation, our Low-Resolution Transformer Tracker, dubbed LoReTrack, enjoys not only high efficiency owing to reduced computation but also enhanced accuracy by distilling knowledge from the high-resolution tracker. In extensive experiments, LoReTrack at a 256² resolution consistently improves over the baseline at the same resolution, and shows competitive or better results than the 384² high-resolution Transformer tracker, while running 52% faster and saving 56% of the MACs. Moreover, LoReTrack is resolution-scalable: at a 128² resolution, it runs at 25 fps on a CPU with SUC scores of 64.9%/46.4% on LaSOT/LaSOText, surpassing other CPU real-time trackers. Code is released at https://github.com/ShaohuaDong2021/LoReTrack.
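As a rough illustration of the two distillation objectives named above, the sketch below aligns student and teacher query/key/value tokens across resolutions (QKV-KD) and matches resized discriminative response maps (Disc-KD). The MSE losses, token interpolation, and toy shapes are assumptions for illustration, not the paper's exact formulations.

```python
# Hypothetical sketch of cross-resolution dual knowledge distillation.
import torch
import torch.nn.functional as F

def qkv_kd_loss(student_qkv, teacher_qkv):
    """Match student q/k/v token features to a frozen high-res teacher's.
    Each entry: (B, N_tokens, D); token counts differ with resolution,
    so teacher tokens are interpolated down to the student's length."""
    loss = 0.0
    for s, t in zip(student_qkv, teacher_qkv):
        t = F.interpolate(t.transpose(1, 2), size=s.shape[1],
                          mode="linear", align_corners=False).transpose(1, 2)
        loss = loss + F.mse_loss(s, t.detach())  # teacher is frozen
    return loss

def disc_kd_loss(student_map, teacher_map):
    """Imitate the teacher's discriminative (target vs. background) regions
    by aligning resized spatial response maps. Maps: (B, 1, H, W)."""
    t = F.interpolate(teacher_map, size=student_map.shape[-2:],
                      mode="bilinear", align_corners=False)
    return F.mse_loss(student_map, t.detach())

# Toy shapes: a 256^2 student with 16x16 tokens, a 384^2 teacher with 24x24.
s_qkv = [torch.randn(2, 256, 64) for _ in range(3)]  # q, k, v
t_qkv = [torch.randn(2, 576, 64) for _ in range(3)]
total = qkv_kd_loss(s_qkv, t_qkv) + disc_kd_loss(torch.randn(2, 1, 16, 16),
                                                 torch.randn(2, 1, 24, 24))
print(total.item())
```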

IROS 2024 (Conference Paper)

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

  • Shaohua Dong
  • Yunhe Feng
  • Qing Yang 0003
  • Yan Huang 0002
  • Dongfang Liu
  • Heng Fan 0001

Multimodal (e.g., RGB-Depth/RGB-Thermal) fusion has shown great potential for improving semantic segmentation in complex scenes (e.g., indoor or low-light conditions). Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy to achieve multimodal semantic segmentation, which is costly to train due to the massive parameter updates in feature extraction and fusion. To address this issue, we propose a surprisingly simple yet effective dual-prompt learning network (dubbed DPLNet) for training-efficient multimodal (e.g., RGB-D/T) semantic segmentation. The core of DPLNet is to directly adapt a frozen pre-trained RGB model to multimodal semantic segmentation, reducing parameter updates. For this purpose, we present two prompt learning modules: a multimodal prompt generator (MPG) and a multimodal feature adapter (MFA). MPG fuses the features from different modalities in a compact manner and is inserted from shallow to deep stages to generate multi-level multimodal prompts that are injected into the frozen backbone, while MFA adapts the prompted multimodal features in the frozen backbone for better multimodal semantic segmentation. Since both MPG and MFA are lightweight, only a few trainable parameters (3.88M, 4.4% of the pre-trained backbone parameters) are introduced for multimodal feature fusion and learning. Using a simple decoder (3.27M parameters), DPLNet achieves new state-of-the-art performance or is on par with other complex approaches on four RGB-D/T semantic segmentation datasets while maintaining parameter efficiency. Moreover, we show that DPLNet is general and applicable to other multimodal segmentation tasks. Without special design, DPLNet outperforms many complicated models. The source code can be found at https://github.com/ShaohuaDong2021/DPLNet.
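The training-efficiency pattern described here, a frozen backbone plus tiny trainable prompt modules, can be sketched as follows. The backbone stand-in, stage count, and 1x1-conv fusion are placeholders; only the freeze-and-prompt structure mirrors the MPG/MFA description above.

```python
# Hypothetical sketch of prompt injection into a frozen backbone.
import torch
import torch.nn as nn

class MPG(nn.Module):
    """Multimodal prompt generator: compactly fuse RGB and auxiliary
    (depth/thermal) features into a prompt."""
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)
    def forward(self, rgb_feat, aux_feat):
        return self.fuse(torch.cat([rgb_feat, aux_feat], dim=1))

class PromptedBackbone(nn.Module):
    def __init__(self, dim=32, stages=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Conv2d(dim, dim, 3, padding=1) for _ in range(stages))
        for p in self.blocks.parameters():
            p.requires_grad = False               # frozen pre-trained RGB backbone
        self.mpgs = nn.ModuleList(MPG(dim) for _ in range(stages))       # trainable
        self.mfas = nn.ModuleList(                                        # trainable
            nn.Conv2d(dim, dim, 1) for _ in range(stages))

    def forward(self, rgb_feat, aux_feat):
        x = rgb_feat
        for block, mpg, mfa in zip(self.blocks, self.mpgs, self.mfas):
            x = block(x) + mpg(x, aux_feat)  # inject multi-level multimodal prompt
            x = x + mfa(x)                   # adapt the prompted features
        return x

net = PromptedBackbone()
out = net(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
total = sum(p.numel() for p in net.parameters())
print(out.shape, f"trainable: {trainable}/{total}")  # only MPG/MFA train
```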

IJCAI 2023 (Conference Paper)

Addressing Weak Decision Boundaries in Image Classification by Leveraging Web Search and Generative Models

  • Preetam Prabhu Srikar Dammu
  • Yunhe Feng
  • Chirag Shah

Machine learning (ML) technologies are known to be riddled with ethical and operational problems; however, we are witnessing an increasing thrust by businesses to deploy them in sensitive applications. One major issue among many is that ML models do not perform equally well for underrepresented groups, putting vulnerable populations in an even more disadvantaged and unfavorable position. We propose an approach that leverages the power of web search and generative models to alleviate some of the shortcomings of discriminative models. We demonstrate our method on an image classification problem using ImageNet's People Subtree subset, and show that it is effective in enhancing robustness and mitigating bias in certain classes that represent vulnerable populations (e.g., female doctor of color). Our new method is able to (1) identify weak decision boundaries for such classes; (2) construct search queries for Google as well as text prompts for generating images through DALL-E 2 and Stable Diffusion; and (3) show how these newly captured training samples can alleviate the population bias issue. While still improving the model's overall performance considerably, we achieve a significant reduction (77.30%) in the model's gender accuracy disparity. In addition to these improvements, we observed a notable enhancement in the classifier's decision boundary, which is characterized by fewer weak spots and an increased separation between classes. Although we showcase our method on vulnerable populations in this study, the proposed technique is extendable to a wide range of problems and domains.
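Steps (1) and (2) of the pipeline can be pictured with a small sketch: score each class by its mean softmax margin to locate weak decision boundaries, then build search or text-to-image prompts for the weakest class. The margin criterion and the query template are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch of "find weak classes, then build augmentation queries".
import numpy as np

def class_margin_scores(probs, labels, n_classes):
    """Mean margin (top-1 prob minus runner-up) per true class; a low mean
    margin flags a class whose decision boundary is weak."""
    sorted_p = np.sort(probs, axis=1)
    margins = sorted_p[:, -1] - sorted_p[:, -2]
    return np.array([margins[labels == c].mean() if (labels == c).any() else np.nan
                     for c in range(n_classes)])

def build_queries(class_name, attributes):
    """Search / text-to-image prompts targeting underrepresented variants."""
    return [f"{attr} {class_name}" for attr in attributes]

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=100)   # toy softmax outputs
labels = rng.integers(0, 5, size=100)
scores = class_margin_scores(probs, labels, 5)
weakest = int(np.nanargmin(scores))
print(weakest, build_queries("doctor", ["female", "female of color", "elderly"]))
```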

AAAI 2022 (Conference Paper)

Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search

  • Yunhe Feng
  • Chirag Shah

Gender bias is one of the most common and well-studied demographic biases in information retrieval, and in AI systems in general. After it was discovered and reported that gender bias for certain professions could change searchers’ worldviews, mainstream image search engines, such as Google, quickly took action to correct such bias. However, given that these systems are opaque, it is unclear whether they addressed unequal gender representation and gender stereotypes in image search results systematically and in a sustainable way. In this paper, we propose adversarial attack queries composed of professions and countries (e.g., ‘CEO United States’) to investigate whether gender bias has been thoroughly mitigated by image search engines. Our experiments on Google, Baidu, Naver, and Yandex Image Search show that the proposed attack can trigger high levels of gender bias in image search results very effectively. To defend against such attacks and mitigate gender bias, we design and implement three novel re-ranking algorithms, an epsilon-greedy algorithm, a relevance-aware swapping algorithm, and a fairness-greedy algorithm, to re-rank returned images for given image queries. Experiments on both simulated (three typical gender distributions) and real-world datasets demonstrate that the proposed algorithms can mitigate gender bias effectively.
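Of the three re-ranking algorithms, the epsilon-greedy variant lends itself to a compact sketch: with probability epsilon, promote an image of the currently underrepresented gender; otherwise keep relevance order. This is one plausible reading of the algorithm's name, not the paper's exact procedure.

```python
# Hypothetical epsilon-greedy fairness re-ranker for image search results.
import random

def epsilon_greedy_rerank(images, epsilon=0.5, seed=0):
    """images: list of (image_id, gender) in descending relevance order."""
    rng = random.Random(seed)
    remaining, ranked = list(images), []
    counts = {"f": 0, "m": 0}  # genders shown so far
    while remaining:
        if rng.random() < epsilon:
            # promote the underrepresented gender if any such image remains
            minority = min(counts, key=counts.get)
            pick = next((x for x in remaining if x[1] == minority), remaining[0])
        else:
            pick = remaining[0]            # most relevant remaining image
        remaining.remove(pick)
        ranked.append(pick)
        counts[pick[1]] += 1
    return ranked

# A skewed result list (6 male-coded, 2 female-coded images) gets interleaved.
results = [(i, "m") for i in range(6)] + [(i + 6, "f") for i in range(2)]
print(epsilon_greedy_rerank(results))
```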