Arrow Research search

Author name cluster

Yang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

98 papers
2 author rows

Possible papers

98

JBHI Journal 2026 Journal Article

Airs-Net: Adversarial-Improved Reversible Steganography Network for CT Images in the Internet of Medical Things and Telemedicine

  • Kai Chen
  • Mu Nie
  • Jean-Louis Coatrieux
  • Yang Chen
  • Shipeng Xie

Medical imaging has developed from an auxiliary means of clinical examination into a significant method and intuitive basis for the clinical diagnosis of diseases, providing comprehensive, full-cycle health protection. The Internet of Medical Things (IoMT) allows medical equipment, intelligent terminals, medical infrastructure, and other elements of medical production to be interconnected, eliminating information silos and data fragmentation. Medical images disseminated in the IoMT contain a wide diversity of sensitive patient information, so protecting patients' personal information is vital. In this work, an adversarial-improved reversible steganography network (Airs-Net) for computed tomography (CT) images in the IoMT is presented. Specifically, Airs-Net adopts a prediction-embedding strategy and mainly consists of an image restoration network, an embedded-pixel location network, and a discriminator. The image restoration network effectively restores the pixel prediction error of the restoration set in integer- and non-integer-scaled images of arbitrary size when information is concealed. The embedded-information location network automatically selects pixel locations for information embedding based on the interpolated image features of the degraded image. The restored image, embedding location map, and embedding information are fed into the embedder, and the discriminator then continuously optimizes the quality of the resulting secret-carrying image. Quantitative results show that Airs-Net outperforms state-of-the-art methods in both PSNR and SSIM. Further, qualitative and quantitative results and analyses under specific clinical application scenarios, and across multiple types of medical image information hiding, demonstrate the strong generalization and practical applicability of Airs-Net.
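The prediction-embedding strategy Airs-Net builds on has a classical analogue, prediction-error expansion (PEE); the minimal sketch below illustrates that underlying idea only — the function names are illustrative, and Airs-Net learns its predictor and embedding locations with networks rather than using fixed rules.

```python
# Minimal sketch of classical prediction-error expansion (PEE), the idea
# behind prediction-embedding reversible steganography (not Airs-Net itself).

def embed(pixel: int, predicted: int, bit: int) -> int:
    """Expand the prediction error to carry one bit reversibly."""
    error = pixel - predicted
    return predicted + 2 * error + bit

def extract(stego: int, predicted: int) -> tuple[int, int]:
    """Recover the hidden bit and the original pixel exactly."""
    expanded = stego - predicted
    bit = expanded & 1
    error = (expanded - bit) // 2
    return predicted + error, bit

# Round-trip: original pixel 120, predictor says 118, hide bit 1.
stego = embed(120, 118, 1)
pixel, bit = extract(stego, 118)
assert (pixel, bit) == (120, 1)
```

Reversibility holds because the receiver can regenerate the same prediction, so the expanded error (and with it the payload bit and the exact original pixel) is fully recoverable.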

JBHI Journal 2026 Journal Article

CoRe: An End-to-End Collaborative Refinement Network for Medical Image Segmentation

  • Xiao Ke
  • Yang Chen
  • Wenzhong Guo

The anatomical information obtained from medical image segmentation provides a crucial decision-making basis for clinical diagnosis and treatment. Recently proposed deep networks with encoder-decoder architectures have achieved impressive results. However, these networks have some inherent flaws, e.g., network depth and downsampling operators jointly determine the loss of spatial detail in deep features. We find that it is the lack of targeted solutions to these inherent flaws that makes it difficult to further improve segmentation performance. Therefore, based on these findings, we propose an end-to-end collaborative refinement method (CoRe). Specifically, we first generate an Error-Prone Region (EPR) by predicting an uncertainty map and a foreground boundary map to simulate the error region; after locating pixels with high error proneness, we propose a feature refinement module (FRM) based on neighborhood-aware features and foreground-boundary-enhanced features to refine the upsampled features of the decoder, so as to better reconstruct the lost spatial detail. In addition, a segmentation refinement module (SRM) is proposed to refine the coarse segmentation prediction by establishing highly representative global class centers that comprehensively contain the intrinsic properties of each segmentation target. Finally, we conduct extensive experiments on five datasets with different modalities and segmentation targets. The results show that our method achieves significant improvements and competes favorably with current state-of-the-art methods.

JBHI Journal 2026 Journal Article

Direct PET-to-CT Generation for Attenuation Correction: A Slice-to-Slice Continual Transformer Segmentation-Aware Network

  • Rongjun Ge
  • Hanyuan Zheng
  • Yuxin Liu
  • Liutao Yang
  • Li Wang
  • Xu Ji
  • Jingtao Shen
  • Nan Li

Direct synthetic computed tomography (CT) generation from positron emission tomography (PET) plays a crucial role in PET attenuation correction while providing detailed structural information to complement functional imaging. Compared to the widely used PET/CT and indirect PET/MR-CT, the direct PET-to-CT translation method (denoted as PET-to-CT) offers several advantages: 1) The CT required for PET-to-CT is directly obtained from PET, thereby avoiding the intermediate errors generated in the inter-step processes of multimodal scanning in PET/CT and PET/MR-CT. 2) Furthermore, direct PET-to-CT eliminates the requirement for supplementary imaging equipment, thereby reducing complexity and scan duration in contrast to PET/CT and PET/MR-CT imaging. Thus, direct PET-to-CT is highly promising for clinical applications. However, it faces challenges, including spatial resolution mismatches between PET and CT, as well as voxel-wise semantic differences arising from functional and structural imaging. To address these challenges, this paper proposes a 2D hierarchical method called the S2SCT (Slice-to-Slice Continual Transformer)-SA (Segmentation-aware) Network. It uses a slice-continual network to acquire semantic transformation knowledge from each PET slice to a CT slice, facilitating the conversion between functional and structural imaging domains. Subsequently, the segmentation-aware network is designed to further capture spatial correlations both between slices and within each slice, resulting in improved CT spatial resolution. The experimental results demonstrate that our proposed method outperforms mainstream methods in both CT generation and attenuation correction, as evidenced by both visual results and metric values.

AAAI Conference 2026 Conference Paper

Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

  • Jianhao Chen
  • Zishuo Xun
  • Bocheng Zhou
  • Han Qi
  • Hangfan Zhang
  • Qiaosheng Zhang
  • Yang Chen
  • Wei Hu

This paper presents a simple, effective, and cost-efficient strategy, named ModelSwitch, to improve LLM performance by scaling test-time compute. ModelSwitch builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths that potentially arise from diverse training data and paradigms. By using sample consistency as a signal, our strategy dynamically switches between models. Theoretical analysis highlights the efficiency and performance advantages of our strategy. Extensive experiments on seven datasets demonstrate that our strategy not only outperforms self-consistency and state-of-the-art multi-agent debate approaches, but also significantly reduces inference costs. Additionally, our strategy requires only a few comparable LLMs to achieve optimal performance and can be extended with verification methods, demonstrating the potential of leveraging multiple LLMs in the generation-verification paradigm.
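The consistency-gated switching described above can be sketched in a few lines; the `models` callables, sample count `k`, and `threshold` below are illustrative assumptions, not the paper's actual settings.

```python
import itertools
from collections import Counter

def model_switch(models, prompt, k=5, threshold=0.6):
    """Sample k answers per model; accept the majority answer as soon as one
    model's samples are consistent enough, else switch to the next model.
    Falls back to a global vote over everything sampled so far.
    (Sketch of the idea only; parameters are illustrative.)"""
    pool = []
    for model in models:
        samples = [model(prompt) for _ in range(k)]
        pool.extend(samples)
        answer, count = Counter(samples).most_common(1)[0]
        if count / k >= threshold:          # consistency signal: stop early
            return answer
    return Counter(pool).most_common(1)[0][0]   # fallback: global vote

# Toy demo: a flaky stub model that disagrees with itself, and a steady one.
answers = itertools.cycle(["a", "b", "c", "a", "b"])
flaky = lambda p: next(answers)
steady = lambda p: "42"
result = model_switch([flaky, steady], "q")   # flaky is inconsistent → switch
```

In the demo, the flaky model's five samples never reach 60% agreement, so the strategy switches to the second model and returns its consistent answer without needing any extra samples from the first.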

JBHI Journal 2026 Journal Article

Edge-Aware Diffusion Segmentation Model With Hessian Priors for Automated Diaphragm Thickness Measurement in Ultrasound Imaging

  • Chen-long Miao
  • Yikang He
  • Baike Shi
  • Zhongkai Bian
  • Wenxue Yu
  • Yang Chen
  • Guang-Quan Zhou

The thickness of the diaphragm serves as a crucial biometric indicator, particularly in assessing rehabilitation and respiratory dysfunction. However, measuring diaphragm thickness from ultrasound images mainly depends on manual delineation of the fascia, which is subjective, time-consuming, and sensitive to the inherent speckle noise. In this study, we introduce an edge-aware diffusion segmentation model (ESADiff), which incorporates prior structural knowledge of the fascia to improve the accuracy and reliability of diaphragm thickness measurements in ultrasound imaging. We first apply a diffusion model, guided by annotations, to learn the image features while preserving edge details through an iterative denoising process. Specifically, we design an anisotropic edge-sensitive annotation refinement module that corrects inaccurate labels by integrating Hessian geometric priors with a backtracking shortest-path connection algorithm, further enhancing model accuracy. Moreover, a curvature-aware deformable convolution and edge-prior ranking loss function are proposed to leverage the shape prior knowledge of the fascia, allowing the model to selectively focus on relevant linear structures while mitigating the influence of noise on feature extraction. We evaluated the proposed model on an in-house diaphragm ultrasound dataset, a public calf muscle dataset, and an internal tongue muscle dataset to demonstrate robust generalization. Extensive experimental results demonstrate that our method achieves finer fascia segmentation and significantly improves the accuracy of thickness measurements compared to other state-of-the-art techniques, highlighting its potential for clinical applications.

JBHI Journal 2026 Journal Article

ESIP: Explicit Surgical Instrument Prompting for Surgical Workflow Recognition

  • Yixuan Qiu
  • Mengxing Liu
  • Siyuan He
  • Guangquan Zhou
  • Fei Lyu
  • Yang Chen
  • Ping Zhou

Surgical workflow recognition (SWR) stands as a pivotal component in computer-assisted surgery and is dedicated to identifying phases from surgical videos. Many deep learning-based methods have been proposed for this task and achieved acceptable SWR results. However, these methods usually extract and aggregate spatio-temporal features implicitly, making it challenging for them to adequately use spatial information that is strongly relevant to the surgical phase, such as information from the surgical instruments. To address this issue, an Explicit Surgical Instrument Prompting (ESIP) approach is proposed for the SWR task. ESIP leverages surgical instrument segmentation to generate instrument-specific visual prompts, which explicitly guide the extraction of crucial intra-frame spatial features through a frozen pre-trained backbone, then enable effective inter-frame spatio-temporal feature extraction and aggregation. Unlike multi-task approaches that jointly perform SWR with auxiliary tasks within a shared network framework, ESIP is a single-task SWR approach dedicated to optimizing the framework itself for more adequate feature extraction. Furthermore, to accomplish the segmentation prompting efficiently, this paper presents a SAM-based segmentation with a prompt-tuning strategy to explicitly integrate segmentation features into spatial features. Experimental results on the Cholec80, M2CAI, and AutoLaparo datasets demonstrate that our ESIP method achieves the best performance in comparison with 16 SOTA methods, with a Precision of 91.8%, 89.5%, and 89.6%, Recall of 92.2%, 89.5%, and 76.9%, and Jaccard of 83.3%, 77.0%, and 67.3%, respectively.

AAAI Conference 2026 Conference Paper

Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment

  • Yang Chen
  • Xiaowei Xu
  • Shuai Wang
  • Chenhui Zhu
  • Ruxue Wen
  • Xubin Li
  • Tiezheng Ge
  • Limin Wang

Normalizing Flows (NFs) are a class of generative models distinguished by a mathematically invertible architecture, where the forward pass transforms data into a latent space for density estimation, and the reverse pass generates new samples from this space. This characteristic creates an intrinsic synergy between representation learning and data generation. However, the generative quality of standard NFs is limited by poor semantic representations from log-likelihood optimization. To remedy this, we propose a novel alignment strategy that creatively leverages the invertibility of NFs: instead of regularizing the forward pass, we align the intermediate features of the generative (reverse) pass with representations from a powerful vision foundation model, demonstrating superior effectiveness over naive alignment. We also introduce a novel training-free, test-time optimization algorithm for classification, which provides a more intrinsic evaluation of the NF's embedded semantic knowledge. Comprehensive experiments demonstrate that our approach accelerates the training of NFs by over 3.3x, while simultaneously delivering significant improvements in both generative quality and classification accuracy. New state-of-the-art results for NFs are established on ImageNet 64 x 64 and 256 x 256.
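The invertibility that the reverse-alignment strategy exploits can be illustrated with a single affine coupling layer, a standard NF building block (not the paper's architecture); the toy `w`, `b` "networks" below stand in for learned scale/shift functions.

```python
import numpy as np

def coupling_forward(x, w, b):
    """One affine coupling layer: transform x2 conditioned on x1.
    Invertible by construction; log|det J| is the sum of the log-scales."""
    x1, x2 = np.split(x, 2)
    s, t = np.tanh(w @ x1), b @ x1          # toy scale/shift "networks"
    z2 = x2 * np.exp(s) + t
    return np.concatenate([x1, z2]), s.sum()

def coupling_inverse(z, w, b):
    """Exact inverse: recompute s, t from the untouched half z1 = x1."""
    z1, z2 = np.split(z, 2)
    s, t = np.tanh(w @ z1), b @ z1
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2])

rng = np.random.default_rng(0)
x = rng.normal(size=4)
w, b = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
z, logdet = coupling_forward(x, w, b)
assert np.allclose(coupling_inverse(z, w, b), x)   # exact invertibility
```

Because the first half passes through unchanged, the inverse can recompute the exact same scale and shift, which is what lets a trained flow run features backwards through the generative pass.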

JBHI Journal 2026 Journal Article

HSD: Hough-Based Structure-Aware Detection of B-Lines in Lung Ultrasound

  • Tuo Liu
  • Hao Zhou
  • Jia-Hao Wang
  • Yu Zhang
  • Chen Chen
  • Yang Chen
  • Guang-Quan Zhou
  • Lin Li

B-lines are artifacts produced by the interaction of ultrasound with small air-liquid interfaces, which often serve as crucial biomarkers for evaluating lung pathology, such as the presence of liquid. However, due to the reverberation phenomenon, B-lines manifest as blurred, strip-like comet tails perpendicularly originating from the pleural line, making their automatic identification in speckle-noisy ultrasound images particularly challenging. This study proposes a Hough-based structure-aware detection framework, dubbed HSD, which leverages structural priors and the intrinsic relationship between the pleural line and B-lines to enhance B-line detection in ultrasound images. First, the proposed method adopts a shared encoder and two collaborative decoders to improve B-line identification with auxiliary pleural line detection, ensuring effective representation learning of linear structural features under inherent prior constraints. Specifically, one decoder incorporates Hough-based regression to reinforce the modeling of the global linear nature for B-line detection, alleviating the appearance influences of the fuzzy comet tail. Simultaneously, another pathway enhances the exploration of the slender, curved morphology by integrating semantic context learning with linear heatmap regression, thereby facilitating the detection of the pleural line for calibration of B-lines. Second, we introduce a position-aware rectification module to ensure the consistency of the pleural line and its perpendicular alignment with B-lines. This post-processing module reduces the influence of ambiguous pixels, improving the robustness of B-line detection. Extensive experimental results on an in-house ultrasound dataset demonstrate the superiority of the proposed approach, which achieves a precision of 0.743, a recall of 0.953, and an F-measure of 0.837, substantially ahead of other methods, suggesting its potential for detecting pathological indicators in lung ultrasound.
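For orientation, the classical Hough transform that gives HSD its name votes each point into a (theta, rho) accumulator; this numpy sketch of plain Hough line voting (not the paper's learned Hough-based regression) recovers a vertical line much like an idealized B-line.

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=100, rho_max=50.0):
    """Classical Hough voting: each point votes for every (theta, rho)
    line through it; the accumulator peak is the detected line."""
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)   # one rho per theta
        idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        ok = (idx >= 0) & (idx < n_rho)
        acc[np.arange(n_theta)[ok], idx[ok]] += 1
    t, r = np.unravel_index(acc.argmax(), acc.shape)
    return thetas[t], (2 * rho_max) * r / (n_rho - 1) - rho_max

# Points on the vertical line x = 10 (a crude stand-in for a B-line).
pts = [(10.0, float(y)) for y in range(20)]
theta, rho = hough_lines(pts)
# The peak lands near theta = 0, rho = 10 (within one accumulator bin).
```

The global voting is what makes the representation robust to the blurred, fragmented appearance of comet tails: every pixel on the line contributes to the same accumulator cell even when local evidence is weak.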

AAAI Conference 2026 Conference Paper

MIGDiff: Multi-attributes Imputations for Attribute-missing Graphs via Graph Denoising Diffusion Model

  • Ye Liu
  • Yang Chen
  • Hongmin Cai

Missing graph attributes pose a significant challenge in graph representation learning. Some existing graph attribute completion methods adopt the shared-space hypothesis or employ end-to-end frameworks to perform single-attribute imputation. However, these models can only generate one single attribute with a few specific patterns that either adhere to prior knowledge or are optimal for downstream tasks, making it difficult to capture the full range of variations in the target attribute distribution. This limitation negatively impacts the model's generalizability and efficiency. Therefore, to address this issue, we propose a new method based on a graph denoising diffusion model, called the Multi-attribute Imputation Graph Denoising Diffusion Model (MIGDiff), which can generate multiple high-quality attributes. Specifically, it employs a Dual-source Auto-encoder on existing attributes and graph topology to extract reliable knowledge, which serves as a condition for training the diffusion module. Within diffusion, noise is added to the structural embeddings of nodes without attributes in the forward process. In the reverse process, a Structure-aware Denoising Network is devised to integrate feature and structural information via an attention mechanism and to perform neighbor-guided refinement based on graph connectivity, thereby enhancing denoising and accurately recovering missing attributes while effectively maintaining structural consistency and distributional fidelity. During generation, multiple initial values are sampled to produce diverse attribute imputations, avoiding focusing on a few easy-to-learn patterns. Extensive experiments conducted on four public datasets highlight the state-of-the-art performance of MIGDiff in both attribute imputation and node classification tasks.
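The forward noising step described here follows the standard denoising-diffusion formulation, which admits a closed form at any timestep; a sketch under that standard formulation (the linear beta schedule and embedding size are illustrative, not MIGDiff's configuration):

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Closed-form DDPM forward step: q(x_t | x_0) = N(sqrt(abar_t) * x0,
    (1 - abar_t) * I), so x_t is sampled directly at any timestep t."""
    abar = np.cumprod(1.0 - betas)[t]
    eps = rng.normal(size=x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # standard linear schedule
x0 = rng.normal(size=8)                 # stands in for a node's structural embedding
xt, eps = forward_noise(x0, 999, betas, rng)
# At t = 999 the cumulative product abar is tiny, so x_t is almost pure noise.
```

Sampling diverse imputations then amounts to drawing several independent noise initializations and denoising each one, which is how the abstract's "multiple initial values" yield multiple attribute candidates.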

JBHI Journal 2026 Journal Article

Multi-Scale Temporal Analysis With a Dual-Branch Attention Network for Interpretable Gait-Based Classification of Neurodegenerative Diseases

  • Wei Zeng
  • Zhangbo Peng
  • Yang Chen
  • Shaoyi Du

The accurate diagnosis of neurodegenerative diseases (NDDs), such as Amyotrophic Lateral Sclerosis (ALS), Huntington’s Disease (HD), and Parkinson’s Disease (PD), remains a clinical challenge due to the complexity and subtlety of gait abnormalities. This paper proposes the Dual-Branch Attention-Enhanced Residual Network (DAERN), a novel deep learning architecture that integrates Dilated Causal Convolutions (DCCBlock) for local gait pattern extraction and Multi-Head Self-Attention (MHSA) for long-range dependency modeling. A Cross-Attention Fusion module enhances feature integration, while SHapley Additive exPlanations (SHAP) and Integrated Gradients (IG) improve interpretability, providing clinically relevant insights into gait-based NDD classification. Uniform Manifold Approximation and Projection (UMAP) visualizations reveal well-separated clusters corresponding to distinct NDD categories, demonstrating the model’s ability to capture discriminative features. Comprehensive ablation studies validate the contributions of model components and preprocessing strategies, highlighting the significance of each in achieving state-of-the-art classification performance. Experimental evaluations on the Gait in Neurodegenerative Disease (GaitNDD) dataset demonstrate that DAERN achieves an accuracy of 99.64%, an F1-score of 99.65%, and an AUC of 0.9997, significantly outperforming conventional deep learning and machine learning baselines. These findings suggest that DAERN could be a valuable and interpretable tool for clinical gait assessment, aiding in early-stage monitoring and automated screening of NDDs, with potential applications in real-time wearable sensor-based gait analysis.
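A dilated causal convolution, the DCCBlock's core operation, can be sketched directly; the kernel and toy "gait signal" below are illustrative values, not the paper's learned filters.

```python
import numpy as np

def dilated_causal_conv(x, kernel, dilation):
    """1D dilated causal convolution: output[t] depends only on inputs at
    t, t-d, t-2d, ..., so no future gait samples leak into the output."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])     # left-pad only: causality
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(6, dtype=float)            # toy gait signal [0, 1, ..., 5]
y = dilated_causal_conv(x, [1.0, -1.0], dilation=2)
# y[t] = x[t] - x[t-2]  →  [0, 1, 2, 2, 2, 2]
```

Stacking such layers with growing dilation widens the receptive field exponentially while keeping the computation strictly causal, which is why they suit streaming gait signals from wearables.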

JBHI Journal 2026 Journal Article

OrthoDetNet: An Enhanced YOLO-Based Framework for Detection of Orthopedic Surgical Instruments

  • Peng Xu
  • Guangquan Zhou
  • Mengxing Liu
  • Chu Guo
  • Yixuan Qiu
  • Yang Chen
  • Ping Zhou

Accurate detection of surgical instruments is critical for both routine surgical procedures and surgical robotics research. To the best of our knowledge, there is a notable lack of datasets and dedicated detection studies specifically addressing orthopedic surgical instruments. Detecting orthopedic surgical instruments presents particular challenges, including significant size variations, highly similar shapes, and frequent, severe occlusions due to instrument intersections. To address these issues, we propose an orthopedic surgical instrument detection method (OrthoDetNet) incorporating three specialized modules. The FilterUnit mitigates occlusion effects via an adaptive feature filtering mechanism that dynamically adjusts its filtering strategy based on context, prioritizing features from key regions while suppressing distracting interference features. The DEUnit enhances fine-grained feature discrimination in local regions to distinguish instruments with high shape similarity, and the BDFusion module improves multi-scale detection performance through bi-directional feature fusion between deep and shallow-level feature maps. A dataset for orthopedic surgical instrument detection is created based on the proximal femoral nail antirotation (PFNA) instrument package manufactured by Shenzhen Mindray Bio-Medical Electronics Co., Ltd. Images were captured in a controlled, simulated experimental environment, ensuring no patient privacy or ethical concerns. We obtained explicit authorization from the manufacturer for instrument use. Experimental results on this dataset demonstrate the effectiveness of OrthoDetNet and its constituent modules.

JBHI Journal 2026 Journal Article

SkeDiff: Skeleton 3D CT Diffusion Reconstruction using 2D X-ray

  • Yuan Gao
  • Rongjun Ge
  • Yunbo Gu
  • Zhan Wu
  • Yuanhang Li
  • Mingle Zhou
  • Kai Chen
  • Jean-Louis Coatrieux

For orthopedic diagnostics, both 2D X-ray and 3D CT imaging play essential roles. X-ray imaging is widely accessible, clinically effective, easy to operate, and has lower radiation exposure than CT. However, its inherent 2D nature limits comprehensive visualization of skeletal structures, which 3D CT provides. To bridge this gap, we propose SkeDiff, an algorithm for reconstructing 3D CT images of the skeleton from orthogonal 2D X-ray projections. To fully leverage the information in X-ray images for guiding the diffusion process, we design a cross-dimensional conditional encoder, $E_{Cond}$, to extract 2D priors for the 3D diffusion model, $DM_{3DL}$. This encoder integrates a CNN-Mamba hybrid architecture to enhance feature extraction and nonlinear mapping. Additionally, we introduce a 3D UKAN diffusion backbone, which employs Kolmogorov-Arnold network (KAN) to improve feature representation through learnable nonlinear activations. Furthermore, we propose a diffusion-based scoliosis classifier, $D_{SC}$, enabling scoliosis classification during the 3D CT reconstruction process. Experiments show that SkeDiff outperforms recent algorithms on spine, hip, and knee datasets.

TMLR Journal 2026 Journal Article

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

  • Guibin Zhang
  • Hejia Geng
  • Xiaohang Yu
  • Zhenfei Yin
  • Zaibin Zhang
  • Zelin Tan
  • Heng Zhou
  • Zhong-Zhi Li

The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM RL with the temporally extended Partially Observable Markov Decision Processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.

JBHI Journal 2026 Journal Article

USRMamba: Adaptive Routing-Guided State Space Model for Ultrasound Super-Resolution

  • Tao Wang
  • Zihan Zhou
  • Chufeng Jin
  • Tianyi Liu
  • Baike Shi
  • Guangquan Zhou
  • Rongjun Ge
  • Jean-Louis Coatrieux

In ultrasound (US) imaging, resolution degradation caused by the acoustic diffraction limit and transducer array density can significantly reduce image quality, which has negative impacts on clinical diagnosis. Super-resolution (SR) reconstruction is a more flexible and cost-effective measure compared to system upgrades. However, the complexity and diversity of tissue acoustic properties make it difficult to establish a unified model for US image SR reconstruction. In this context, this paper pioneers a Mamba-based single US image SR method, referred to as USRMamba. Firstly, a simple and efficient Enhanced Transform Combine Module (ETCM) is designed for shallow feature extraction, which achieves multi-scale decoupling through Laplacian sharpening and wavelet transform to counter the interference of high-frequency information loss and speckle noise in US images. More importantly, an Adaptive Top-k Prompt Module (ATPM) is proposed, whose core is to generate semantic prompts through an adaptive routing-guided strategy to suppress the interference of fuzzy region labels caused by attenuation on detail reconstruction. In addition, a Frequency Channel Attention Module (FCAM) is developed, forming a “frequency-spatial domain reconstruction” modeling strategy in parallel with the ATPM, further optimizing fidelity for US image SR reconstruction. Qualitative and quantitative experiments demonstrate that USRMamba exhibits superior performance on several US datasets. Especially with scale factor ×2, the proposed method achieves an average PSNR 1.31 dB higher than state-of-the-art (SOTA) methods.
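The multi-scale decoupling via wavelet transform mentioned for the ETCM rests on a standard sub-band split; this single-level Haar transform illustrates the general idea of a lossless low/high-frequency separation, not the module itself.

```python
import numpy as np

def haar_dwt(x):
    """Single-level 1D Haar transform: split a signal into a low-pass
    approximation and a high-pass detail band (multi-scale decoupling)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)    # approximation (low frequency)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)    # detail (high frequency / edges)
    return a, d

def haar_idwt(a, d):
    """Perfect reconstruction from the two sub-bands."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt(x)
assert np.allclose(haar_idwt(a, d), x)   # the split is lossless
```

Because the split is lossless, a network can process edges (detail band) and smooth structure (approximation band) separately and still recombine them without introducing reconstruction error.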

JBHI Journal 2026 Journal Article

WOADNet: A Wavelet-Inspired Orientational Adaptive Dictionary Network for CT Metal Artifact Reduction

  • Tong Jin
  • Jin Liu
  • Diandian Wang
  • Kun Wang
  • Chenlong Miao
  • Yikun Zhang
  • Dianlin Hu
  • Zhan Wu

In computed tomography (CT), metal artifacts pose a persistent challenge to achieving high-quality imaging. Despite advancements in metal artifact reduction (MAR) techniques, many existing approaches have not fully leveraged the intrinsic a priori knowledge related to metal artifacts, improved model interpretability, or addressed the complex texture of CT images effectively. To address these limitations, we propose a novel and interpretable framework, the wavelet-inspired oriented adaptive dictionary network (WOADNet). WOADNet builds on sparse coding with orientational information in the wavelet domain. By exploring the discriminative features of artifacts and anatomical tissues, we adopt a high-precision filter parameterization strategy that incorporates multiangle rotations. Furthermore, we integrate a reweighted sparse constraint framework into the convolutional dictionary learning process and employ a cross-space, multiscale attention mechanism to construct an adaptive convolutional dictionary unit for the artifact feature encoder. This innovative design allows for flexible adjustment of weights and convolutional representations, resulting in significant image quality improvements. The experimental results using synthetic and clinical datasets demonstrate that WOADNet outperforms both traditional and state-of-the-art MAR methods in terms of suppressing artifacts.

NeurIPS Conference 2025 Conference Paper

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

  • Yang Chen
  • Zhuolin Yang
  • Zihan Liu
  • Chankyu Lee
  • Peng Xu
  • Mohammad Shoeybi
  • Bryan Catanzaro
  • Wei Ping

Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of frontier models, such as DeepSeek-R1, including data curation strategies and RL training recipes, are often omitted. Moreover, recent research indicates distillation remains more effective than RL for smaller models. In this work, we demonstrate that large-scale RL can significantly enhance the reasoning capabilities of strong, small- and mid-sized models, achieving results that surpass those of state-of-the-art distillation-based models. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts. Notably, we find that math-only RL not only significantly enhances the performance of strong distilled models on math benchmarks (e.g., +14.6% / +17.2% on AIME 2025 for the 7B / 14B models), but also on code reasoning tasks (e.g., +6.8% / +5.8% on LiveCodeBench for the 7B / 14B models). In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We develop a robust data curation pipeline to collect challenging prompts with high-quality, verifiable answers and test cases to enable verification-based RL across both domains. The dataset will be released to support open research. Finally, we identify key experimental insights, including curriculum learning with progressively increasing response lengths and the stabilizing effect of on-policy parameter updates. We find that RL not only elicits the foundational reasoning capabilities acquired during pretraining and supervised fine-tuning (SFT), but also pushes the limits of the model’s reasoning ability, enabling it to solve problems that were previously unsolvable.

IROS Conference 2025 Conference Paper

Achieving Lift-to-Weight Ratio >3.5 in Piezoelectric Direct-Driven Insect-Scale Flapping-Wing MAVs

  • Xiang Lu
  • Jie Chen
  • Yang Chen
  • Zixin Deng
  • Yulie Wu
  • Xuezhong Wu
  • Dingbang Xiao

Insect-scale flapping-wing micro aerial vehicles (FWMAVs) employing piezoelectric direct-drive configurations eliminate traditional kinematic chains through direct coupling of the wing and actuator. While this design approach significantly reduces structural complexity and manufacturing costs compared to transmission-dependent systems, it inherently limits wing stroke amplitude and consequent lift generation. This paper presents a novel lift-enhancement strategy for piezoelectric direct-drive FWMAVs, effectively improving payload capacity through optimized aerodynamic performance. The redesigned X-configuration prototype demonstrates outstanding metrics: a 68 mm wingspan with 212 mg total mass achieves 7.47 mN maximum lift (exceeding a 3.5:1 lift-to-weight ratio) and a 1.25 m/s takeoff speed. Experimental validation confirms a 39% payload capacity improvement and a 34% lift-to-weight ratio enhancement compared to baseline designs. This enhancement establishes our robot as the current state-of-the-art in piezoelectric direct-drive FWMAVs regarding lift-to-weight ratio.
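The headline ratio can be checked from the abstract's own numbers (assuming g = 9.81 m/s²):

```python
# Sanity-check the reported lift-to-weight ratio from the abstract's figures.
mass_kg = 212e-6          # 212 mg total mass
lift_n = 7.47e-3          # 7.47 mN maximum lift
g = 9.81                  # gravitational acceleration, m/s^2
weight_n = mass_kg * g    # ≈ 2.08 mN
ratio = lift_n / weight_n
print(round(ratio, 2))    # → 3.59, consistent with the claimed >3.5
```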

AAAI Conference 2025 Conference Paper

Acting Beyond Learning: Imagination-Assisted Decision-Making in the Visual-based Multi-Agent Cooperative Scenarios

  • Huanhuan Yang
  • Dianxi Shi
  • Songchang Jin
  • Guojun Xie
  • Yang Chen
  • Chunping Qiu
  • Shaowu Yang

Learning optimal policies in multi-agent cooperative settings with visual observations is significant and challenging. Agents must first perform state representation learning for their image observations and then learn policies in the abstracted state space. Aiming at this problem, we propose a novel model-based MARL method named Contrastive Latent World for Policy Optimization (CLWPO). In CLWPO, we first design a state representation model to facilitate learning in the latent state space. With the support of this model, we construct the latent world and introduce a contrastive variational bound (CVB) to optimize it. Subsequently, we develop a heuristic policy optimization (HPO) scheme, incorporating model-free learning with model-based planning to obtain robust policies that predict future behaviors. In particular, in the planning, we maintain a queue of teammate models and calculate an adaptive rollout length for each agent to support their self-imagination and reduce the model-based return discrepancy. Finally, we conducted extensive experiments in the PettingZoo benchmark, and results show that CLWPO significantly enhances learning efficiency and improves agent performance compared to state-of-the-art MARL methods.

NeurIPS Conference 2025 Conference Paper

Complete Structure Guided Point Cloud Completion via Cluster- and Instance-Level Contrastive Learning

  • Yang Chen
  • Yirun Zhou
  • Weizhong Zhang
  • Cheng Jin

Point cloud completion, which aims to reconstruct the missing parts of incomplete point clouds, is a pivotal task in 3D computer vision. Traditional supervised approaches often necessitate complete point clouds for training supervision, which are not readily accessible in real-world applications. Recent studies have attempted to mitigate this dependency by employing self-supervised mechanisms. However, these approaches frequently yield suboptimal results due to the absence of complete structure in the point cloud data during training. To address these issues, in this paper we propose an effective framework to complete the point cloud under the guidance of a self-learned complete structure. A key contribution of our work is the development of a novel self-supervised complete structure reconstruction module, which can learn the complete structure explicitly from incomplete point clouds and thus eliminate the reliance on training data from complete point clouds. Additionally, we introduce a contrastive learning approach at both the cluster and instance levels to extract shape features guided by the complete structure and to capture style features, respectively. This dual-level learning design ensures that the generated point clouds are both shape-complete and detail-preserving. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach significantly outperforms state-of-the-art self-supervised methods.

JBHI Journal 2025 Journal Article

CPGNet: Multimodal Graph Learning with Hierarchical Category Guidance for Multi-Label Whole Slide Image Classification

  • Haoyun Zhao
  • Dapeng Tao
  • Yibing Zhan
  • Jun Ni
  • Yang Chen

The analysis of WSI categories in digital pathology is critical for clinician decision making regarding the diagnosis, treatment, and prognosis of cancer patients. However, current automated methods for cancer type identification are predominantly formulated as single-label classification problems. These methods typically rely on datasets with relatively balanced and abundant samples, where each WSI belongs to a single category. This approach does not fully align with real-world clinical scenarios, where cancer subtypes often exhibit multi-label characteristics and class imbalance, posing significant challenges. To address this issue, this paper proposes CPGNet, a category-prompted graph network designed as a multi-label WSI classifier better suited for clinical applications. CPGNet employs the MaskSLIC algorithm for superpixel segmentation of WSIs, effectively capturing the nonlinear spatial distribution of cellular and tissue structures. The segmented superpixels are then encoded as graph nodes with their corresponding features, while edges and edge features are constructed to abstractly model WSIs as graphs. Furthermore, the method introduces a GLGFI module, which aggregates features from neighboring nodes and edges via a GNN to capture local information, while simultaneously leveraging a multi-head self-attention mechanism to model global dependencies, mimicking the diagnostic behavior of pathologists. Additionally, a VCI module exploits semantic relationships between categories to guide visual feature classification, providing supplementary cues for accurate predictions. To enhance the model's focus on hard-to-classify positive samples, we also implement a reweighting strategy. The proposed approach is evaluated on a private dataset (YNLUAD) and two public challenge datasets (BCNB and AGGC22). The experimental results demonstrate the superiority, universality, and robustness of CPGNet. The code is available at https://github.com/zhy1312/CPGNet.

JBHI Journal 2025 Journal Article

DAM: Degradation-Aware Model for Ultrasound Image Quality Assessment

  • Tuo Liu
  • Xuan Zhang
  • Xiuzhu Ma
  • Shuang Chen
  • Xuejuan Wang
  • Ping Zhou
  • Yang Chen
  • Guangquan Zhou

One of the core challenges in ultrasound image quality assessment (IQA) is the entanglement of semantic content and quality-related information, such as blurring and shadows. Insufficient attention to the latter can easily lead to biased IQA results. Furthermore, fine-grained quality inconsistencies, i.e., subtle variations in ultrasound images that can impact quality interpretations, may further complicate IQA tasks. To address these challenges, we propose a novel degradation-aware model (DAM) for ultrasound IQA, which effectively perceives varied and subtle quality patterns, accurately assessing the quality of ultrasound images. The advanced degradation-derived augmentation (DDA) in DAM incorporates degradations that clinicians may focus on during IQA into the synthesis of appearance changes, promoting the disentanglement of quality-related representations from semantic contents. Subsequently, we present fine-grained degradation learning (FGDL), which encourages distinctions between image versions with diminishing quality inconsistencies, boosting the awareness of quality nuances from easy to hard for better ultrasound IQA performance. A universal boundary acquisition operator (UBAO) is also developed to suppress interference from redundant information, achieving the standardization of ultrasound images from various devices. Extensive experimental results on an in-house ultrasound dataset demonstrate that DAM outperforms 14 baseline methods, achieving a PLCC of 0.760 and an SROCC of 0.766. The code is available at this URL.

JMLR Journal 2025 Journal Article

Determine the Number of States in Hidden Markov Models via Marginal Likelihood

  • Yang Chen
  • Cheng-Der Fuh
  • Chu-Lan Michael Kao

Hidden Markov models (HMM) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain, and the observations are noisy realizations of the underlying process. Determining the number of hidden states for an HMM is a model selection problem which is yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covariance. In this paper, we propose a consistent method for determining the number of hidden states of HMM based on the marginal likelihood, which is obtained by integrating out both the parameters and hidden states. Moreover, we show that the model selection problem of HMM includes the order selection problem of finite mixture models as a special case. We give rigorous proof of the consistency of the proposed marginal likelihood method and provide an efficient computation method for practical implementation. We numerically compare the proposed method with the Bayesian information criterion (BIC), demonstrating the effectiveness of the proposed marginal likelihood method.
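The paper's method integrates out both parameters and hidden states, which is not reproduced here. As a minimal sketch of the order-selection setting it addresses, the code below computes the log-likelihood of a 1-D Gaussian-emission HMM via the forward algorithm and scores candidate state counts with the BIC baseline the paper compares against; the parameter values and the parameter count are illustrative assumptions.

```python
import numpy as np

def hmm_loglik(obs, pi, A, means, stds):
    """Log-likelihood of a 1-D Gaussian-emission HMM via the forward
    algorithm, with per-step scaling for numerical stability."""
    def gauss(x, mu, sd):
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    alpha = pi * gauss(obs[0], means, stds)
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ A) * gauss(x, means, stds)
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

def bic(loglik, k_states, n_obs):
    """BIC for a 1-D heterogeneous-variance Gaussian HMM with k states:
    (k-1) initial probs + k(k-1) transition probs + k means + k variances."""
    n_params = (k_states - 1) + k_states * (k_states - 1) + 2 * k_states
    return -2.0 * loglik + n_params * np.log(n_obs)
```

On data drawn from a well-separated two-state model, the two-state BIC should beat the one-state BIC despite the larger parameter penalty; the paper argues the marginal likelihood is a more principled criterion for the same comparison.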

JBHI Journal 2025 Journal Article

Dual-Level Imbalance Mitigation for Single-FoV Colorectal Histopathology Image Classification

  • Lingling Yuan
  • Yang Chen
  • Md Rahaman
  • Hongzan Sun
  • Haoyuan Chen
  • Marcin Grzegorzek
  • Chen Li
  • Xiaoyan Li

Single-field-of-view (FoV) histopathological image classification is vital for colorectal cancer (CRC) diagnosis in mid- to low-tier hospitals lacking whole-slide imaging (WSI) scanners and storage, yet suffers from severe class imbalance and degraded performance. To address this, we propose a dual-level imbalance mitigation (DIM) framework integrating data-level and algorithm-level approaches. Specifically: (1) A global context generative adversarial network (GCGAN) generates realistic minority-class images for augmentation to balance the dataset. (2) A frequency-aware adaptive focal loss (FAFL) applies a frequency-aware offset and adaptive modulation to better separate overlapping classes. (3) A lightweight receptive field-based convolutional neural network (LRF-CNN) is trained under DIM to leverage both augmentation and loss modulation for improved classification. Extensive experiments on the single-FoV colorectal histopathology dataset demonstrate that DIM-equipped LRF-CNN outperforms five state-of-the-art models (SOTA) across multiple metrics. Furthermore, each DIM component enhances performance when applied individually to those SOTA models, and additional validation on six single-FoV histopathological datasets confirms the generalizability and effectiveness of the proposed DIM framework. Our code is available at https://github.com/Lingling-Yuan/DIM.
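The abstract does not specify FAFL's exact offset and modulation, so the sketch below shows only the generic idea of a class-frequency-weighted focal loss: per-class weights inversely proportional to class frequency, combined with the standard focal down-weighting of easy examples. The function name and weighting scheme are assumptions for illustration, not the paper's FAFL.

```python
import numpy as np

def frequency_weighted_focal_loss(probs, labels, class_counts, gamma=2.0):
    """Focal loss with per-class weights inversely proportional to class
    frequency. probs: (N, C) predicted probabilities; labels: (N,) ints;
    class_counts: per-class training-set frequencies."""
    labels = np.asarray(labels)
    counts = np.asarray(class_counts, dtype=float)
    # Rarer classes receive larger weights (normalized inverse frequency).
    alpha = counts.sum() / (len(counts) * counts)
    p_t = probs[np.arange(len(labels)), labels]  # probability of true class
    # (1 - p_t)**gamma down-weights well-classified (easy) examples.
    return float(np.mean(alpha[labels] * (1.0 - p_t) ** gamma * -np.log(p_t)))
```

Under this weighting, a hard example from a minority class dominates the loss, which is the behavior the paper's algorithm-level component is designed to achieve for imbalanced single-FoV data.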

ICLR Conference 2025 Conference Paper

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

  • Ruichen Shao
  • Bei Li
  • Gangao Liu
  • Yang Chen
  • ZhouXiang
  • Jingang Wang
  • Xunliang Cai
  • Peng Li

Direct Preference Optimization (DPO) has gained attention as an efficient alternative to reinforcement learning from human feedback (RLHF) for aligning large language models (LLMs) with human preferences. Despite its advantages, DPO suffers from a length bias, generating responses longer than those from the reference model. Existing solutions like SimPO and SamPO address this issue but uniformly treat the contribution of rewards across sequences, overlooking temporal dynamics. To this end, we propose an enhanced preference optimization method that incorporates a temporal decay factor controlled by a gamma parameter. This dynamic weighting mechanism adjusts the influence of each reward based on its position in the sequence, prioritizing earlier tokens that are more critical for alignment. By adaptively focusing on more relevant feedback, our approach mitigates overfitting to less pertinent data and remains responsive to evolving human preferences. Experimental results on several benchmarks show that our approach consistently outperforms vanilla DPO by 5.9-8.8 points on AlpacaEval 2 and 3.3-9.7 points on Arena-Hard across different model architectures and sizes. Furthermore, additional experiments on mathematical and reasoning benchmarks (MMLU, GSM8K, and MATH) confirm that our method enhances performance without compromising general capabilities. Our codebase would be available at \url{https://github.com/LotuSrc/D2PO}.
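The gamma-controlled temporal decay described above can be illustrated numerically: each per-token reward is scaled by a weight that shrinks with position, so earlier tokens contribute more. The pure gamma^t schedule, the function names, and the per-token margin inputs below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def temporal_decay_weights(seq_len, gamma=0.95):
    """Per-token weights gamma**t: earlier positions get weights nearer 1."""
    return gamma ** np.arange(seq_len)

def decayed_dpo_margin(logp_chosen, logp_rejected, gamma=0.95):
    """Decay-weighted sum of per-token log-prob margins (policy minus
    reference) for the chosen and rejected responses."""
    w_c = temporal_decay_weights(len(logp_chosen), gamma)
    w_r = temporal_decay_weights(len(logp_rejected), gamma)
    return float(np.dot(w_c, logp_chosen) - np.dot(w_r, logp_rejected))

# With gamma < 1, an identical reward placed earlier contributes more:
early = decayed_dpo_margin([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], gamma=0.5)
late = decayed_dpo_margin([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5)
assert early > late
```

Setting gamma = 1 recovers a uniform treatment of positions, i.e., the behavior the abstract attributes to vanilla DPO-style weighting.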

JBHI Journal 2025 Journal Article

EDG-Net: Encryption and Decryption based Gan-attention Network for CT images in the Internet of Medical Things and Telemedicine

  • Kai Chen
  • Yuchen Li
  • Shipeng Xie
  • Zhan Wu
  • Yikun Zhang
  • Jean-Louis Coatrieux
  • Wei Yan
  • Yang Chen

CT images provide medical practitioners with a scientific and intuitive rationale for the diagnosis of clinical diseases. The Internet of Medical Things (IoMT) and telemedicine facilitate the preservation, transmission, and application of medical data and drive the sharing of medical data, especially medical images. Encryption and decryption of CT images distributed in the IoMT and telemedicine are becoming critical because they contain a large amount of private patient-sensitive information and are vulnerable to third-party attacks, resulting in information exposure and privacy leakage. In this paper, we propose an Encryption and Decryption based Gan-attention network (EDG-Net) for CT images in the IoMT and telemedicine. EDG-Net consists of a generator, two discriminators, a domain transfer of attention, and adaptive normalization. In addition, a double encryption and decryption strategy is introduced by EDG-Net to effectively improve the security of the ciphertext image and the fidelity of the decrypted plaintext image. Specifically, during the encryption or decryption phase, the generator transforms the CT images mutually between the plaintext and ciphertext domains. Two discriminators identify and correct the differences between these two domain transformations, in particular improving the accuracy of reconstruction during decryption. The parameters of the trained encryption and decryption network are treated as the secret keys for encryption and decryption. Qualitative and quantitative analysis on public and private datasets demonstrates the superior performance of EDG-Net regarding encryption security and robustness as well as decryption accuracy.

AAAI Conference 2025 Conference Paper

Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder

  • Yuhan Zhang
  • Guoqing Ma
  • Guangfu Hao
  • Liangxuan Guo
  • Yang Chen
  • Shan Yu

While Reinforcement Learning (RL) agents can successfully learn to handle complex tasks, effectively generalizing acquired skills to unfamiliar settings remains a challenge. One reason is that the visual encoders used are task-dependent, preventing effective feature extraction in different settings. To address this issue, recent studies have tried to pretrain encoders with diverse visual inputs in order to improve their performance. However, they rely on existing pretrained encoders without further exploring the impact of the pretraining period. In this work, we propose APE: efficient reinforcement learning through Adaptively Pretrained visual Encoder—a framework that utilizes an adaptive augmentation strategy during the pretraining phase and extracts useful features with only a few interactions within the task environments in the policy learning period. Experiments are conducted across various domains, including the DeepMind Control Suite, Atari Games, and Memory Maze benchmarks, to verify the effectiveness of our method. Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE. In addition, APE significantly improves sampling efficiency during learning, approaching the efficiency of state-based methods while using only visual inputs in several control tasks. These findings demonstrate the potential of adaptively pretraining the encoder to enhance the generalization ability and efficiency of visual RL algorithms.

IJCAI Conference 2025 Conference Paper

Exploring Transferable Homogenous Groups for Compositional Zero-Shot Learning

  • Zhijie Rao
  • Jingcai Guo
  • Miaoge Li
  • Yang Chen
  • Mengzhu Wang

Conditional dependency presents one of the trickiest problems in Compositional Zero-Shot Learning, leading to significant property variations of the same state (object) across different objects (states). To address this problem, existing approaches often adopt either all-to-one or one-to-one representation paradigms. However, these extremes create an imbalance in the seesaw between transferability and discriminability, favoring one at the expense of the other. Comparatively, humans are adept at analogizing and reasoning in a hierarchical clustering manner, intuitively grouping categories with similar properties to form cohesive concepts. Motivated by this, we propose Homogeneous Group Representation Learning (HGRL), a new perspective that formulates state (object) representation learning as multiple homogeneous sub-group representation learning. HGRL seeks to achieve a balance between semantic transferability and discriminability by adaptively discovering and aggregating categories with shared properties, learning distributed group centers that retain group-specific discriminative features. Our method integrates three core components designed to simultaneously enhance both the visual and prompt representation capabilities of the model. Extensive experiments on three benchmark datasets validate the effectiveness of our method. Code is available at https://github.com/zjrao/HGRL.

JBHI Journal 2025 Journal Article

Frequency-Phase Guided Attention Complex-Valued Network for Ultrasound Image Segmentation

  • Wen-Bo Zhang
  • Ping Zhou
  • Yang Chen
  • Guang-Quan Zhou

Ultrasound imaging has emerged as an effective tool for aiding diagnosis. The automatic segmentation of ultrasound images is crucial in identifying the lesion target and evaluating clinical indicators for accurate diagnosis and prognosis. However, the segmentation problems are challenging due to the inherent speckle noise interference and low contrast of ultrasound images. Complex-value-based neural networks can directly deal with phase components, offering a potential solution for better perceiving structure in ultrasound image segmentation. In this study, we develop a Frequency Phase-Guided Attention Network (FPGANet) for ultrasound image segmentation by exploring the properties of the complex-valued model under the guidance of phase and frequency perspectives. First, our proposed method transforms images into the complex domain as the input to an advanced complex-valued model consisting of pure complex-valued convolutions and operations. In particular, this model can effectively scrutinize phase information to better distinguish target areas from similar backgrounds. Moreover, we introduce a complex hybrid attention module following the complex convolutions to selectively adjust the perception of phase components and the model's bias. We also design a frequency-adaptive separation module to emphasize frequency features prioritized by the encoder and decoder using a combination of wavelet decomposition and frequency channel attention. We evaluate the proposed FPGANet on three publicly available ultrasound datasets of breast, cardiac, and thyroid-nodule images and a private abdominal effusion ultrasound dataset. Comparative experiments were also conducted with state-of-the-art methods. The results demonstrate the superior performance of FPGANet, implying its potential for advancing ultrasound image segmentation.

JBHI Journal 2025 Journal Article

LADDA: Latent Diffusion-based Domain-adaptive Feature Disentangling for Unsupervised Multi-modal Medical Image Registration

  • Peng Yuan
  • Jianmin Dong
  • Wei Zhao
  • Fei Lyu
  • Cheng Xue
  • Yudong Zhang
  • Chunfeng Yang
  • Zhan Wu

Deformable image registration (DIR) is critical for accurate clinical diagnosis and effective treatment planning. However, patient movement, significant intensity differences, and large breathing deformations hinder accurate anatomical alignment in multi-modal image registration. These factors exacerbate the entanglement of anatomical and modality-specific style information, thereby severely limiting the performance of multi-modal registration. To address this, we propose a novel LAtent Diffusion-based Domain-Adaptive feature disentangling (LADDA) framework for unsupervised multi-modal medical image registration, which explicitly addresses the representation disentanglement. First, LADDA extracts reliable anatomical priors from the Latent Diffusion Model (LDM), facilitating downstream content-style disentangled learning. A Domain-Adaptive Feature Disentangling (DAFD) module is proposed to promote anatomical structure alignment further. This module disentangles image features into content and style information, boosting the network to focus on cross-modal content information. Next, a Neighborhood-Preserving Hashing (NPH) is constructed to further perceive and integrate hierarchical content information through local neighborhood encoding, thereby maintaining cross-modal structural consistency. Furthermore, a Unilateral-Query-Frozen Attention (UQFA) module is proposed to enhance the coupling between upstream prior and downstream content information. The feature interaction within intra-domain consistent structures improves the fine recovery of detailed textures. The proposed framework is extensively evaluated on large-scale multi-center datasets, demonstrating superior performance across diverse clinical scenarios and strong generalization on out-of-distribution (OOD) data.

ICML Conference 2025 Conference Paper

Low-Dimension-to-High-Dimension Generalization and Its Implications for Length Generalization

  • Yang Chen
  • Long Yang 0004
  • Yitao Liang
  • Zhouchen Lin

Low-Dimension-to-High-Dimension (LDHD) generalization, a subset of Out-of-Distribution (OOD) generalization, involves training on a low-dimensional subspace and testing in a high-dimensional space. Assuming instances are generated from latent variables reflecting problem scale, LDHD generalization captures the inherent scaling challenge of length generalization. We theoretically show that LDHD generalization is unattainable without appropriate inductive bias. Focusing on Boolean functions, we demonstrate that different architectures trained with (S)GD converge to min-degree interpolators w.r.t. different linearly independent sets, achieving LDHD generalization only when the target function aligns with this bias. From the perspective of LDHD generalization for length generalization, we explain the success of CoT in restructuring latent space for improved LDHD generalization. We further propose a principle for designing position embeddings to address both LDHD generalization and data format nuisances separately. Following the principle, we introduce RPE-Square, a novel embedding that enhances RPE to better handle data formats.

AAMAS Conference 2025 Conference Paper

Mitigating Non-Stationarity in Deep Reinforcement Learning with Clustering Orthogonal Weight Modification

  • Guoqing Ma
  • Yuhan Zhang
  • Yuming Dai
  • Guangfu Hao
  • Yang Chen
  • Shan Yu

RL agents often operate under the assumption of environmental stationarity, which poses a great challenge to learning efficiency since many environments are inherently non-stationary in state distribution. To address this issue, we introduce the Clustering Orthogonal Weight Modified (COWM) layer, which can be integrated into the policy network of any RL algorithm and mitigates non-stationarity effectively. By employing clustering techniques and a projection matrix, the COWM layer stabilizes the learning process. Empirically, the COWM layer is integrated into various RL methods and outperforms state-of-the-art methods on the DMControl benchmark, highlighting its robustness and generality across various tasks and algorithms.

IJCAI Conference 2025 Conference Paper

Modular Deep Reinforcement Learning for Multi-Workload Offloading in Edge Networks

  • Hongchang Ke
  • Yan Ding
  • Lin Pan
  • Yang Chen
  • Jia Zhao

Dynamic edge networks revolutionize mobile edge computing by enabling real-time applications in intelligent transportation, augmented reality, and industrial Internet of Things (IoT). Efficient workload offloading in dynamic edge networks is crucial for addressing the increasing demands of time-varying workloads while contending with limited computational and communication resources. Existing deep reinforcement learning (DRL)-based offloading decision-making schemes are inadequate for managing scenarios involving multiple workloads and edge servers, particularly when faced with time-varying workload arrivals and fluctuating channel states. To this end, we propose a flexible module weighted fusion DRL framework (DRL-MWF) for scalable and robust multi-workload offloading in edge environments. Unlike traditional monolithic networks, DRL-MWF employs a weighted fusion modular architecture that adapts flexibly to diverse workload distributions. Specifically, DRL-MWF introduces a state representation and normalization strategy to model state and workload characteristics, enabling precise and adaptive decision-making. Furthermore, we design two key mechanisms: a weighted policy correction method to stabilize learning and a prioritized experience replay with weighted importance sampling to accelerate convergence by emphasizing critical transitions. Extensive evaluations on real-world datasets demonstrate that DRL-MWF consistently outperforms state-of-the-art baselines. These results reveal DRL-MWF's potential to transform workload offloading in next-generation edge computing systems, ensuring high performance in dynamic scenarios.

AAAI Conference 2025 Conference Paper

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

  • Yufan Shen
  • Chuwei Luo
  • Zhaoqing Zhu
  • Yang Chen
  • Qi Zheng
  • Zhi Yu
  • Jiajun Bu
  • Cong Yao

Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on document visual question answering (VQA) task, particularly after training on document instruction datasets. An effective evaluation method for document instruction data is crucial in constructing instruction data with high efficacy, which, in turn, facilitates the training of LLMs and MLLMs for document VQA. However, most existing evaluation methods for instruction data are limited to the textual content of the instructions themselves, thereby hindering the effective assessment of document instruction datasets and constraining their construction. In this paper, we propose ProcTag, a data-oriented method that assesses the efficacy of document instruction data. ProcTag innovatively performs tagging on the execution process of instructions rather than the instruction text itself. By leveraging the diversity and complexity of these tags to assess the efficacy of the given dataset, ProcTag enables selective sampling or filtering of document instructions. Furthermore, DocLayPrompt, a novel semi-structured layout-aware document prompting strategy, is proposed for effectively representing documents. Experiments demonstrate that sampling existing open-sourced and generated document VQA/instruction datasets with ProcTag significantly outperforms current methods for evaluating instruction data. Impressively, with ProcTag-based sampling in the generated document datasets, only 30.5 percent of the document instructions are required to achieve 100 percent efficacy compared to the complete dataset.

JBHI Journal 2025 Journal Article

SiamFSA: Optical Flow-driven Structural-aware Siamese Network for Ultrasound Videos Landmark Tracking

  • Guang-Quan Zhou
  • Yifan Hu
  • Qinghan Yang
  • Ruo-Li Wang
  • Yang Chen

Accurate anatomical landmark tracking within ultrasound video is a crucial analysis task with many clinical applications. However, the non-rigid deformations caused by motion and probe extrusion lead to intra-object semantic and scale variations, resulting in inaccurate landmark tracking. Additionally, the inevitable intrinsic speckle noise and imaging artifacts exacerbate the dissimilarity of targets, further complicating landmark tracking. In this study, we propose a novel Optical Flow-driven Structural-aware Siamese Network, SiamFSA, for landmark tracking in continuous ultrasound images. This approach implicitly incorporates structure and motion priors into the Siamese model to compensate for the influence of intra-object variations caused by protean tissue deformation. Specifically, we add an auxiliary fine-grained heatmap regression branch to the Siamese-based backbone to couple anatomical landmarks with position-sensitive templates for better localization. Moreover, we propose a structural drift correction mechanism to align relative positions across continuous frames, guided by the optical flow from the template. This mechanism ensures intra-object semantic similarity of anatomical deformations and facilitates better salient feature extraction under landmark-centered alignment constraints. Meanwhile, a structural prior affine transformation module is designed to optimize the template view for landmark tracking by exploring intra-object scale variations during motion, thereby enhancing foreground semantic perception. Extensive experiments on both public and in-house ultrasound datasets demonstrate that our SiamFSA handles protean anatomical landmark tracking more effectively than other state-of-the-art methods, showing its potential in clinical analysis tasks.

IJCAI Conference 2025 Conference Paper

Situational-Constrained Sequential Resources Allocation via Reinforcement Learning

  • Libo Zhang
  • Yang Chen
  • Toru Takisaka
  • Kaiqi Zhao
  • Weidong Li
  • Jiamou Liu

Sequential Resource Allocation with situational constraints presents a significant challenge in real-world applications, where resource demands and priorities are context-dependent. This paper introduces a novel framework, SCRL, to address this problem. We formalize situational constraints as logic implications and develop a new algorithm that dynamically penalizes constraint violations. To handle situational constraints effectively, we propose a probabilistic selection mechanism to overcome limitations of traditional constraint reinforcement learning (CRL) approaches. We evaluate SCRL across two scenarios: medical resource allocation during a pandemic and pesticide distribution in agriculture. Experiments demonstrate that SCRL outperforms existing baselines in satisfying constraints while maintaining high resource efficiency, showcasing its potential for real-world, context-sensitive decision-making tasks.

JBHI Journal 2025 Journal Article

TD-SAM: Temporal and Distance-Guided Adaptations of SAM for Accurate Surgical Instrument Segmentation

  • Cheng Xue
  • Shiyu Zhao
  • Danqiong Wang
  • Cheng Chen
  • Guanyu Yang
  • Yang Chen

Accurate automatic surgical instrument segmentation plays a crucial role in robot-assisted surgery, but analyzing surgical videos remains challenging due to factors such as rapid instrument movements, high inter-category similarity, and frequent object occlusions. Current surgical instrument segmentation models struggle to capture both inter-frame variations and intra-frame details in complex surgical scenarios. The Segment Anything Model (SAM) has shown significant potential in various segmentation tasks. However, it has not fully addressed the unique challenges posed by surgical videos. To tackle these issues, we propose a Temporal and Distance-Guided SAM model (TD-SAM) for accurate surgical instrument segmentation. Specifically, we introduce a dynamic cross-frame attention module that effectively captures temporal information across frames, allowing the model to track the dynamic changes of surgical instruments and their environment, thus improving segmentation accuracy. In addition, we present a distance-guided instance refinement module, which enhances the model's ability to distinguish between similar categories, mitigating the class ambiguity caused by inter-category similarity. Extensive experiments on the EndoVis18 and EndoVis17 datasets show that the proposed TD-SAM model outperforms existing models, achieving state-of-the-art performance without using any prompts.

IROS Conference 2025 Conference Paper

Three-DOF controlled flight in palm-scale micro robotic blimp driven by flapping wings

  • Jie Chen
  • Xiang Lu
  • Yulie Wu
  • Yang Chen
  • Dingbang Xiao
  • Xuezhong Wu

Micro blimps exhibit significant potential for applications in environmental monitoring and disaster rescue. Nonetheless, traditional propulsion methods for micro blimps encounter challenges such as complex mechanical structures, intricate attitude control, and large volumes. This paper presents a novel compact and lightweight bio-inspired flapping-wing-driven micro robotic blimp with piezoelectric (PZT) actuation, featuring a simplified structure and achieving three-degree-of-freedom (DOF) motion control with only two flapping-wing thruster units. We present a high-voltage drive-sense-control circuit and an adaptive control strategy, enabling wireless remote control, onboard attitude sensing, and closed-loop yaw control. The proposed micro robotic blimp, powered by an onboard battery, measures 15 cm along its major axis, weighs 1.53 g, achieves a maneuvering speed of 17 cm/s and an angular velocity of 12°/s, with a yaw-angle control accuracy of 0.5°. As the smallest and lightest known self-powered micro blimp capable of stable yaw control, the platform demonstrates excellent endurance and environmental stealth characteristics and advances the design of micro aerial vehicles by offering a novel and efficient approach.

NeurIPS Conference 2025 Conference Paper

Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm

  • Yang Chen
  • Menglin Zou
  • Jiaqi Zhang
  • Yitan Zhang
  • Junyi Yang
  • Gaël Gendron
  • Libo Zhang
  • Jiamou Liu

Inverse Reinforcement Learning (IRL) learns a reward function to explain expert demonstrations. Modern IRL methods often use an adversarial (minimax) formulation that alternates between reward and policy optimization, which often leads to unstable training. Recent non-adversarial IRL approaches improve stability by jointly learning the reward and policy via energy-based formulations, but lack formal guarantees. This work bridges this gap. We first present a unified view showing that canonical non-adversarial methods explicitly or implicitly maximize the likelihood of expert behavior, which is equivalent to minimizing the expected return gap. This insight leads to our main contribution: Trust Region Reward Optimization (TRRO), a framework that guarantees monotonic improvement in this likelihood via a Minorization-Maximization process. We instantiate TRRO into Proximal Inverse Reward Optimization (PIRO), a practical and stable IRL algorithm. Theoretically, TRRO provides the IRL counterpart to the stability guarantees of Trust Region Policy Optimization (TRPO) in forward RL. Empirically, PIRO matches or surpasses state-of-the-art baselines in reward recovery and policy imitation with high sample efficiency on MuJoCo and Gym-Robotics benchmarks and a real-world animal behavior modeling task.

AAAI Conference 2024 Conference Paper

Adaptive Meta-Learning Probabilistic Inference Framework for Long Sequence Prediction

  • Jianping Zhu
  • Xin Guo
  • Yang Chen
  • Yao Yang
  • Wenbo Li
  • Bo Jin
  • Fei Wu

Long sequence prediction has broad and significant application value in fields such as finance, wind power, and weather. However, the complex long-term dependencies of long sequence data and the potential domain shift problems limit the effectiveness of traditional models in practical scenarios. To this end, we propose an Adaptive Meta-Learning Probabilistic Inference Framework (AMPIF) based on sequence decomposition, which can effectively enhance the long sequence prediction ability of various basic models. Specifically, first, we decouple complex sequences into seasonal and trend components through a frequency domain decomposition module. Then, we design an adaptive meta-learning task construction strategy, which divides the seasonal and trend components into different tasks through a clustering-matching approach. Finally, we design a dual-stream amortized network (ST-DAN) to capture shared information between seasonal-trend tasks and use the support set to generate task-specific parameters for rapid generalization learning on the query set. We conducted extensive experiments on six datasets, including wind power and finance scenarios, and the results show that our method significantly outperforms baseline methods in prediction accuracy, interpretability, and algorithm stability and can effectively enhance the long sequence prediction capabilities of base models. The source code is publicly available at https://github.com/Zhu-JP/AMPIF.

AAMAS Conference 2024 Conference Paper

Behaviour Modelling of Social Animals via Causal Structure Discovery and Graph Neural Networks

  • Gaël Gendron
  • Yang Chen
  • Mitchell Rogers
  • Yiping Liu
  • Mihailo Azhar
  • Shahrokh Heidari
  • David Arturo Soriano Valdez
  • Kobe Knowles

Better understanding the natural world is a crucial task with a wide range of applications. In environments with close proximity between humans and animals, such as zoos, it is essential to better understand the causes behind animal behaviour to predict unusual changes, mitigate their detrimental effects and increase the well-being of animals. However, the complex social behaviours of mammalian groups remain largely unexplored. In this work, we propose a method to build behavioural models using causal structure discovery and graph neural networks for time series. We apply this method to a mob of meerkats in a zoo environment and study its ability to predict future actions and model the behaviour distribution at an individual level and at a group level. We show that our method can match and outperform standard deep learning architectures and generate more realistic data, while using fewer parameters and providing increased interpretability.

AAAI Conference 2024 Conference Paper

Continuous Rotation Group Equivariant Network Inspired by Neural Population Coding

  • Zhiqiang Chen
  • Yang Chen
  • Xiaolong Zou
  • Shan Yu

Neural population coding can represent continuous information by neurons with a series of discrete preferred stimuli, and we find that the bell-shaped tuning curve plays an important role in this mechanism. Inspired by this, we incorporate a bell-shaped tuning curve into the discrete group convolution to achieve continuous group equivariance. Specifically, we modulate group convolution kernels by Gauss functions to obtain bell-shaped tuning curves. Benefiting from the modulation, kernels also gain smooth gradients on geometric dimensions (e.g., location dimension and orientation dimension). This allows us to generate group convolution kernels from sparse weights with learnable geometric parameters, which can achieve both competitive performance and parameter efficiency. Furthermore, we quantitatively prove that discrete group convolutions with proper tuning curves (bigger than 1x sampling step) can achieve continuous equivariance. Experimental results show that 1) our approach achieves very competitive performance on MNIST-rot with at least 75% fewer parameters compared with previous SOTA methods, demonstrating its parameter efficiency; 2) especially with small sample sizes, our approach exhibits more pronounced performance improvements (up to 24%); and 3) it also has excellent rotation generalization ability on various datasets such as MNIST, CIFAR, and ImageNet with both plain and ResNet architectures.

IROS Conference 2024 Conference Paper

Design and Control of a Novel Six-Degree-of-Freedom Hybrid Robotic Arm

  • Yang Chen
  • Zhonghua Miao
  • Yuanyue Ge
  • Sen Lin 0008
  • Liping Chen
  • Ya Xiong

Robotic arms are key components in fruit-harvesting robots. In agricultural settings, conventional serial or parallel robotic arms often fall short in meeting the demands for a large workspace, rapid movement, enhanced capability of obstacle avoidance and affordability. This study proposes LingXtend, a novel hybrid six-degree-of-freedom (DoF) robotic arm that combines the advantages of parallel and serial mechanisms. Inspired by yoga, we designed two sliders capable of moving independently along a single rail, acting as two feet. These sliders are interconnected with linkages and a meshed-gear set, allowing the parallel mechanism to lower itself and perform a split to pass under obstacles. This unique feature allows the arm to avoid obstacles such as pipes, tables and beams typically found in greenhouses. Integrated with serially mounted joints, the patented hybrid arm is able to maintain the end’s pose even when it moves with a mobile platform, facilitating fruit picking with the optimal pose in dynamic conditions. Moreover, the hybrid arm’s workspace is substantially larger, being almost three times the volume of UR3 serial arms and fourteen times that of the ABB IRB parallel arms. Experiments show that the repeatability errors are 0.017 mm, 0.03 mm and 0.109 mm for the two sliders and the arm’s end, respectively, providing sufficient precision for agricultural robots.

JBHI Journal 2024 Journal Article

Image Domain Multi-Material Decomposition Noise Suppression Through Basis Transformation and Selective Filtering

  • Xu Ji
  • Xu Zhuo
  • Yuchen Lu
  • Weilong Mao
  • Shiyu Zhu
  • Guotao Quan
  • Yan Xi
  • Tianling Lyu

Spectral CT can provide material characterization ability to offer more precise material information for diagnosis purposes. However, the material decomposition process generally leads to amplification of noise which significantly limits the utility of the material basis images. To mitigate such problem, an image domain noise suppression method was proposed in this work. The method performs basis transformation of the material basis images based on a singular value decomposition. The noise variances of the original spectral CT images were incorporated in the matrix to be decomposed to ensure that the transformed basis images are statistically uncorrelated. Due to the difference in noise amplitudes in the transformed basis images, a selective filtering method was proposed with the low-noise transformed basis image as guidance. The method was evaluated using both numerical simulation and real clinical dual-energy CT data. Results demonstrated that compared with existing methods, the proposed method performs better in preserving the spatial resolution and the soft tissue contrast while suppressing the image noise. The proposed method is also computationally efficient and can realize real-time noise suppression for clinical spectral CT images.

JBHI Journal 2024 Journal Article

LeSAM: Adapt Segment Anything Model for Medical Lesion Segmentation

  • Yunbo Gu
  • Qianyu Wu
  • Hui Tang
  • Xiaoli Mai
  • Huazhong Shu
  • Baosheng Li
  • Yang Chen

The Segment Anything Model (SAM) is a foundational model that has demonstrated impressive results in the field of natural image segmentation. However, its performance remains suboptimal for medical image segmentation, particularly when delineating lesions with irregular shapes and low contrast. This can be attributed to the significant domain gap between medical images and natural images on which SAM was originally trained. In this paper, we propose an adaptation of SAM specifically tailored for lesion segmentation termed LeSAM. LeSAM first learns medical-specific domain knowledge through an efficient adaptation module and integrates it with the general knowledge obtained from the pre-trained SAM. Subsequently, we leverage this merged knowledge to generate lesion masks using a modified mask decoder implemented as a lightweight U-shaped network design. This modification enables better delineation of lesion boundaries while facilitating ease of training. We conduct comprehensive experiments on various lesion segmentation tasks involving different image modalities such as CT scans, MRI scans, ultrasound images, dermoscopic images, and endoscopic images. Our proposed method achieves superior performance compared to previous state-of-the-art methods in 8 out of 12 lesion segmentation tasks while achieving competitive performance in the remaining 4 datasets. Additionally, ablation studies are conducted to validate the effectiveness of our proposed adaptation modules and modified decoder.

AAAI Conference 2024 Conference Paper

Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables

  • Yang Chen
  • Xiao Lin
  • Bo Yan
  • Libo Zhang
  • Jiamou Liu
  • Neset Özkan Tan
  • Michael Witbrock

Designing suitable reward functions for numerous interacting intelligent agents is challenging in real-world applications. Inverse reinforcement learning (IRL) in mean field games (MFGs) offers a practical framework to infer reward functions from expert demonstrations. While promising, the assumption of agent homogeneity limits the capability of existing methods to handle demonstrations with heterogeneous and unknown objectives, which are common in practice. To this end, we propose a deep latent variable MFG model and an associated IRL method. Critically, our method can infer rewards from different yet structurally similar tasks without prior knowledge about underlying contexts or modifying the MFG model itself. Our experiments, conducted on simulated scenarios and a real-world spatial taxi-ride pricing problem, demonstrate the superiority of our approach over state-of-the-art IRL methods in MFGs.

AAAI Conference 2024 Conference Paper

Point Cloud Part Editing: Segmentation, Generation, Assembly, and Selection

  • Kaiyi Zhang
  • Yang Chen
  • Ximing Yang
  • Weizhong Zhang
  • Cheng Jin

Ideal part editing should guarantee the diversity of edited parts, the fidelity to the remaining parts, and the quality of the results. However, previous methods do not disentangle each part completely, which means the edited parts will affect the others, resulting in poor diversity and fidelity. In addition, some methods lack constraints between parts, which need manual selections of edited results to ensure quality. Therefore, we propose a four-stage process for point cloud part editing: Segmentation, Generation, Assembly, and Selection. Based on this process, we introduce SGAS, a model for part editing that employs two strategies: feature disentanglement and constraint. By independently fitting part-level feature distributions, we realize the feature disentanglement. By explicitly modeling the transformation from object-level distribution to part-level distributions, we realize the feature constraint. Considerable experiments on different datasets demonstrate the efficiency and effectiveness of SGAS on point cloud part editing. In addition, SGAS can be pruned to realize unsupervised part-aware point cloud generation and achieves state-of-the-art results.

JBHI Journal 2024 Journal Article

RED-Net: Residual and Enhanced Discriminative Network for Image Steganalysis in the Internet of Medical Things and Telemedicine

  • Kai Chen
  • Zhengyuan Zhou
  • Yuchen Li
  • Xu Ji
  • Jiasong Wu
  • Jean-Louis Coatrieux
  • Yang Chen
  • Gouenou Coatrieux

Internet of Medical Things (IoMT) and telemedicine technologies utilize computers, communications, and medical devices to facilitate off-site exchanges between specialists and patients, among specialists, and with medical staff. If the information communicated in IoMT is illegally hidden via steganography, tampered with, or leaked during transmission and storage, it will directly impact patient privacy or the consultation results, with possibly serious medical incidents. Steganalysis is of great significance for the identification of medical images transmitted illegally in IoMT and telemedicine. In this article, we propose a Residual and Enhanced Discriminative Network (RED-Net) for image steganalysis in the Internet of Medical Things and telemedicine. RED-Net consists of a steganographic information enhancement module, a deep residual network, and a steganographic information discriminative mechanism. Specifically, the steganographic information enhancement module is adopted by RED-Net to boost the illegal steganographic signal in texturally complex high-dimensional medical image features. The deep residual network is utilized for steganographic feature extraction and compression. The steganographic information discriminative mechanism is employed by the deep residual network to enable it to recalibrate the steganographic features and drop high-frequency features that are mistaken for steganographic information. Experiments conducted on public and private datasets with data hiding payloads ranging from 0.1 to 0.5 bpp/bpnzac in the spatial and JPEG domains led to RED-Net's steganalysis error $P_{\mathrm{E}}$ in the ranges 0.0732-0.0010 and 0.231-0.026, respectively. In general, qualitative and quantitative results on public and private datasets demonstrate that RED-Net outperforms 8 state-of-the-art steganography detectors.

ICML Conference 2024 Conference Paper

Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

  • Yang Chen
  • Cong Fang 0001
  • Zhouchen Lin
  • Bing Liu

Foundation Models (FMs) have demonstrated remarkable insights into the relational dynamics of the world, leading to the crucial question: how do these models acquire an understanding of world hybrid relations? Traditional statistical learning, particularly for prediction problems, may overlook the rich and inherently structured information from the data, especially regarding the relationships between objects. We introduce a mathematical model that formalizes relational learning as hypergraph recovery to study pre-training of FMs. In our framework, the world is represented as a hypergraph, with data abstracted as random samples from hyperedges. We theoretically examine the feasibility of a Pre-Trained Model (PTM) to recover this hypergraph and analyze the data efficiency in a minimax near-optimal style. By integrating rich graph theories into the realm of PTMs, our mathematical framework offers powerful tools for an in-depth understanding of pre-training from a unique perspective and can be used under various scenarios. As an example, we extend the framework to entity alignment in multimodal learning.

AAAI Conference 2024 Conference Paper

Robust Node Classification on Graph Data with Graph and Label Noise

  • Yonghua Zhu
  • Lei Feng
  • Zhenyun Deng
  • Yang Chen
  • Robert Amor
  • Michael Witbrock

Current research for node classification focuses on dealing with either graph noise or label noise, but few studies consider both of them. In this paper, we propose a new robust node classification method to simultaneously deal with graph noise and label noise. To do this, we design a graph contrastive loss to conduct local graph learning and employ self-attention to conduct global graph learning. They enable us to improve the expressiveness of node representation by using comprehensive information among nodes. We also utilize pseudo graphs and pseudo labels to deal with graph noise and label noise, respectively. Furthermore, we numerically validate the superiority of our method in terms of robust node classification compared with all comparison methods.

JBHI Journal 2024 Journal Article

SBCNet: Scale and Boundary Context Attention Dual-Branch Network for Liver Tumor Segmentation

  • Kai-Ni Wang
  • Sheng-Xiao Li
  • Zhenyu Bu
  • Fu-Xing Zhao
  • Guang-Quan Zhou
  • Shou-Jun Zhou
  • Yang Chen

Automated segmentation of liver tumors in CT scans is pivotal for diagnosing and treating liver cancer, offering a valuable alternative to labor-intensive manual processes and ensuring the provision of accurate and reliable clinical assessment. However, the inherent variability of liver tumors, coupled with the challenges posed by blurred boundaries in imaging characteristics, presents a substantial obstacle to achieving their precise segmentation. In this paper, we propose a novel dual-branch liver tumor segmentation model, SBCNet, to address these challenges effectively. Specifically, our proposed method introduces a contextual encoding module, which enables a better identification of tumor variability using an advanced multi-scale adaptive kernel. Moreover, a boundary enhancement module is designed for the counterpart branch to enhance the perception of boundaries by incorporating contour learning with the Sobel operator. Finally, we propose a hybrid multi-task loss function, concurrently concerning tumors' scale and boundary features, to foster interaction across different tasks of dual branches, further improving tumor segmentation. Experimental validation on the publicly available LiTS dataset demonstrates the practical efficacy of each module, with SBCNet yielding competitive results compared to other state-of-the-art methods for liver tumor segmentation.

JBHI Journal 2023 Journal Article

Adaptive Frequency Learning Network With Anti-Aliasing Complex Convolutions for Colon Diseases Subtypes

  • Kaini Wang
  • Shuaishuai Zhuang
  • Juzheng Miao
  • Yang Chen
  • Jie Hua
  • Guang-Quan Zhou
  • Xiaopu He
  • Shuo Li

The automatic and dependable identification of colonic disease subtypes by colonoscopy is crucial. Once successful, it will facilitate clinically more in-depth disease staging analysis and the formulation of more tailored treatment plans. However, inter-class confusion and brightness imbalance are major obstacles to colon disease subtyping. Notably, the Fourier-based image spectrum, with its distinctive frequency features and brightness insensitivity, offers a potential solution. To effectively leverage its advantages to address the existing challenges, this article proposes a framework capable of thorough learning in the frequency domain based on four core designs: the position consistency module, the high-frequency self-supervised module, the complex number arithmetic model, and the feature anti-aliasing module. The position consistency module enables the generation of spectra that preserve local and positional information while compressing the spectral data range to improve training stability. Through band masking and supervision, the high-frequency autoencoder module guides the network to learn useful frequency features selectively. The proposed complex number arithmetic model allows direct spectral training while avoiding the loss of phase information caused by current general-purpose real-valued operations. The feature anti-aliasing module embeds filters in the model to prevent spectral aliasing caused by down-sampling and improve performance. Experiments are performed on the collected five-class dataset, which contains 4591 colorectal endoscopic images. The outcomes show that our proposed method produces state-of-the-art results with an accuracy rate of 89.82%.

AAMAS Conference 2023 Conference Paper

Adversarial Inverse Reinforcement Learning for Mean Field Games

  • Yang Chen
  • Libo Zhang
  • Jiamou Liu
  • Michael Witbrock

Goal-based agents respond to environments and adjust behaviour accordingly to reach objectives. Understanding incentives of interacting agents from observed behaviour is a core problem in multi-agent systems. Inverse reinforcement learning (IRL) solves this problem, which infers underlying reward functions by observing the behaviour of rational agents. Despite IRL being principled, it becomes intractable when the number of agents grows because of the curse of dimensionality and the explosion of agent interactions. The formalism of mean field games (MFGs) has gained momentum as a mathematically tractable paradigm for studying large-scale multi-agent systems. By grounding IRL in MFGs, recent research attempts to push the limits of the agent number in IRL. However, the study of IRL for MFGs is far from mature, as existing methods assume strong rationality, while real-world agents often exhibit bounded rationality due to limited cognitive or computational capacity. Towards a more general and practical IRL framework for MFGs, this paper proposes Mean-Field Adversarial IRL (MF-AIRL), a novel framework capable of tolerating bounded rationality. We build it upon the maximum entropy principle, adversarial learning, and a new equilibrium concept for MFGs. We evaluate our machinery on simulated tasks with imperfect demonstrations resulting from bounded rationality. Experimental results demonstrate the superiority of MF-AIRL over existing methods in reward recovery.

JBHI Journal 2023 Journal Article

dMIL-Transformer: Multiple Instance Learning Via Integrating Morphological and Spatial Information for Lymph Node Metastasis Classification

  • Yang Chen
  • Zhuchen Shao
  • Hao Bian
  • Zijie Fang
  • Yifeng Wang
  • Yuanhao Cai
  • Haoqian Wang
  • Guojun Liu

Automated classification of lymph node metastasis (LNM) plays an important role in the diagnosis and prognosis. However, it is very challenging to achieve satisfactory performance in LNM classification, because both the morphology and spatial distribution of tumor regions should be taken into account. To address this problem, this article proposes a two-stage dMIL-Transformer framework, which integrates both the morphological and spatial information of the tumor regions based on the theory of multiple instance learning (MIL). In the first stage, a double Max-Min MIL (dMIL) strategy is devised to select the suspected top-K positive instances from each input histopathology image, which contains tens of thousands of patches (primarily negative). The dMIL strategy enables a better decision boundary for selecting the critical instances compared with other methods. In the second stage, a Transformer-based MIL aggregator is designed to integrate all the morphological and spatial information of the selected instances from the first stage. The self-attention mechanism is further employed to characterize the correlation between different instances and learn the bag-level representation for predicting the LNM category. The proposed dMIL-Transformer can effectively deal with the thorny classification in LNM with great visualization and interpretability. We conduct various experiments over three LNM datasets, and achieve 1.79%-7.50% performance improvement compared with other state-of-the-art methods.

JBHI Journal 2023 Journal Article

DREAM-Net: Deep Residual Error Iterative Minimization Network for Sparse-View CT Reconstruction

  • Yikun Zhang
  • Dianlin Hu
  • Shilei Hao
  • Jin Liu
  • Guotao Quan
  • Yi Zhang
  • Xu Ji
  • Yang Chen

Sparse-view Computed Tomography (CT) has the ability to reduce radiation dose and shorten the scan time, while the severe streak artifacts will compromise anatomical information. How to reconstruct high-quality images from sparsely sampled projections is a challenging ill-posed problem. In this context, we propose the unrolled Deep Residual Error iterAtive Minimization Network (DREAM-Net) based on a novel iterative reconstruction framework to synergize the merits of deep learning and iterative reconstruction. DREAM-Net performs constraints using deep neural networks in the projection domain, residual space, and image domain simultaneously, which is different from the routine practice in deep iterative reconstruction frameworks. First, a projection inpainting module completes the missing views to fully explore the latent relationship between projection data and reconstructed images. Then, the residual awareness module attempts to estimate the accurate residual image after transforming the projection error into the image space. Finally, the image refinement module learns a non-standard regularizer to further fine-tune the intermediate image. There is no need to empirically adjust the weights of different terms in DREAM-Net because the hyper-parameters are embedded implicitly in network modules. Qualitative and quantitative results have demonstrated the promising performance of DREAM-Net in artifact removal and structural fidelity.

JBHI Journal 2023 Journal Article

DSANet: Dual-Branch Shape-Aware Network for Echocardiography Segmentation in Apical Views

  • Guang-Quan Zhou
  • Wen-Bo Zhang
  • Zhong-Qing Shi
  • Zhan-Ru Qi
  • Kai-Ni Wang
  • Hong Song
  • Jing Yao
  • Yang Chen

Echocardiography is an essential examination for cardiac disease diagnosis, from which anatomical structures segmentation is the key to assessing various cardiac functions. However, the obscure boundaries and large shape deformations due to cardiac motion make it challenging to accurately identify the anatomical structures in echocardiography, especially for automatic segmentation. In this study, we propose a dual-branch shape-aware network (DSANet) to segment the left ventricle, left atrium, and myocardium from the echocardiography. Specifically, the elaborate dual-branch architecture integrating shape-aware modules boosts the corresponding feature representation and segmentation performance, which guides the model to explore shape priors and anatomical dependence using an anisotropic strip attention mechanism and cross-branch skip connections. Moreover, we develop a boundary-aware rectification module together with a boundary loss to regulate boundary consistency, adaptively rectifying the estimation errors nearby the ambiguous pixels. We evaluate our proposed method on the publicly available and in-house echocardiography dataset. Comparative experiments with other state-of-the-art methods demonstrate the superiority of DSANet, which suggests its potential in advancing echocardiography segmentation.

JMLR Journal 2023 Journal Article

Group SLOPE Penalized Low-Rank Tensor Regression

  • Yang Chen
  • Ziyan Luo

This article aims to seek a selection and estimation procedure for a class of tensor regression problems with multivariate covariates and matrix responses, which can provide theoretical guarantees for model selection in finite samples. Considering the frontal slice sparsity and low-rankness inherited in the coefficient tensor, we formulate the regression procedure as a group SLOPE penalized low-rank tensor optimization problem based on an orthogonal decomposition, namely TgSLOPE. This procedure provably controls the newly introduced tensor group false discovery rate (TgFDR), provided that the predictor matrix is column-orthogonal. Moreover, we establish the asymptotically minimax convergence with respect to the TgSLOPE estimate risk. For efficient problem resolution, we equivalently transform the TgSLOPE problem into a difference-of-convex (DC) program with a level-coercive objective function. This allows us to solve the reformulated TgSLOPE problem with an efficient proximal DC algorithm (DCA) with global convergence. Numerical studies conducted on synthetic data and real human brain connection data illustrate the efficacy of the proposed TgSLOPE estimation procedure.

AAAI Conference 2023 Conference Paper

HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image

  • Zhuchen Shao
  • Yang Chen
  • Hao Bian
  • Jian Zhang
  • Guojun Liu
  • Yongbing Zhang

Survival prediction based on whole slide images (WSIs) is a challenging task for patient-level multiple instance learning (MIL). Due to the vast amount of data for a patient (one or multiple gigapixels WSIs) and the irregularly shaped property of WSI, it is difficult to fully explore spatial, contextual, and hierarchical interaction in the patient-level bag. Many studies adopt random sampling pre-processing strategy and WSI-level aggregation models, which inevitably lose critical prognostic information in the patient-level bag. In this work, we propose a hierarchical vision Transformer framework named HVTSurv, which can encode the local-level relative spatial information, strengthen WSI-level context-aware communication, and establish patient-level hierarchical interaction. Firstly, we design a feature pre-processing strategy, including feature rearrangement and random window masking. Then, we devise three layers to progressively obtain patient-level representation, including a local-level interaction layer adopting Manhattan distance, a WSI-level interaction layer employing spatial shuffle, and a patient-level interaction layer using attention pooling. Moreover, the design of hierarchical network helps the model become more computationally efficient. Finally, we validate HVTSurv with 3,104 patients and 3,752 WSIs across 6 cancer types from The Cancer Genome Atlas (TCGA). The average C-Index is 2.50-11.30% higher than all the prior weakly supervised methods over 6 TCGA datasets. Ablation study and attention visualization further verify the superiority of the proposed HVTSurv. Implementation is available at: https://github.com/szc19990412/HVTSurv.

AAMAS Conference 2023 Conference Paper

Learning Density-Based Correlated Equilibria for Markov Games

  • Libo Zhang
  • Yang Chen
  • Toru Takisaka
  • Bakh Khoussainov
  • Michael Witbrock
  • Jiamou Liu

Correlated Equilibrium (CE) is a well-established solution concept that captures coordination among agents and enjoys good algorithmic properties. In real-world multi-agent systems, in addition to being in equilibrium, agents’ policies are often expected to meet requirements with respect to safety and fairness. Such additional requirements can often be expressed in terms of the state density, which measures the state-visitation frequencies during the course of a game. However, existing CE notions or CE-finding approaches cannot explicitly specify a CE with particular properties concerning state density; they do so implicitly by either modifying reward functions or using value functions as the selection criteria. The resulting CE may thus not fully fulfil the state-density requirements. In this paper, we propose Density-Based Correlated Equilibria (DBCE), a new notion of CE that explicitly takes state density as a selection criterion. Concretely, we instantiate DBCE by specifying different state-density requirements motivated by real-world applications. To compute DBCE, we put forward the Density Based Correlated Policy Iteration algorithm for the underlying control problem. We perform experiments on various games where results demonstrate the advantage of our CE-finding approach over existing methods in scenarios with state-density concerns.

AAAI Conference 2023 Conference Paper

MSDC: Exploiting Multi-State Power Consumption in Non-intrusive Load Monitoring Based on a Dual-CNN Model

  • Jialing He
  • Jiamou Liu
  • Zijian Zhang
  • Yang Chen
  • Yiwei Liu
  • Bakh Khoussainov
  • Liehuang Zhu

Non-intrusive load monitoring (NILM) aims to decompose an aggregated electrical usage signal into appliance-specific power consumption, a classical example of blind source separation. Leveraging recent progress in deep learning, we design a new neural NILM model, Multi-State Dual CNN (MSDC). Unlike previous models, MSDC explicitly extracts information about an appliance's multiple states and state transitions, which in turn regulates the prediction of appliance signals. More specifically, we employ a dual-CNN architecture: one CNN outputs state distributions and the other predicts the power of each state. A new technique is introduced that uses conditional random fields (CRF) to capture state transitions. Experiments on two real-world datasets, REDD and UK-DALE, demonstrate that our model significantly outperforms state-of-the-art models while generalizing well, achieving 6%-10% MAE gains and 33%-51% SAE gains on unseen appliances.
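One natural way to combine the two CNN heads described above (a state distribution and a per-state power) is an expected-power mixture. This is an assumed combination rule for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def expected_power(state_probs, state_levels):
    """Combine a state-distribution head with a per-state power head.

    state_probs:  (T, K) per-timestep state distribution (softmax output)
    state_levels: (T, K) per-timestep predicted power for each of K states
    Returns (T,) expected appliance power.
    """
    return (state_probs * state_levels).sum(axis=1)

probs = np.array([[0.9, 0.1], [0.2, 0.8]])        # off/on probabilities
levels = np.array([[0.0, 100.0], [0.0, 100.0]])   # watts per state
power = expected_power(probs, levels)             # -> [10., 80.]
```

The state distribution gates the power estimate, which is how explicit multi-state modelling can regularize the regression.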

JBHI Journal 2023 Journal Article

Multi-Task Learning for Pulmonary Arterial Hypertension Prognosis Prediction Via Memory Drift and Prior Prompt Learning on 3D Chest CT

  • Guanyu Yang
  • Yuting He
  • Yang Lv
  • Yang Chen
  • Jean-Louis Coatrieux
  • Xiaoxuan Sun
  • Qiang Wang
  • Yongyue Wei

Pulmonary arterial hypertension (PAH) prognosis prediction on 3D non-contrast CT images is one of the most important tasks for PAH treatment. It helps clinicians stratify patients into different groups for early diagnosis and timely intervention by automatically extracting potential biomarkers of PAH to predict mortality. However, it remains a task of great challenge due to the large volumes and low-contrast regions of interest in 3D chest CT images. In this paper, we propose the first multi-task learning-based PAH prognosis prediction framework, P²-Net, which effectively optimizes the model and powerfully represents task-dependent features via our Memory Drift (MD) and Prior Prompt Learning (PPL) strategies. 1) Our MD maintains a large memory bank to provide a dense sampling of the deep biomarkers' distribution. Therefore, although the batch size is very small owing to our large volumes, a reliable (negative log partial) likelihood loss can still be calculated on a representative probability distribution for robust optimization. 2) Our PPL simultaneously learns an additional manual-biomarker prediction task to embed clinical prior knowledge into our deep prognosis prediction task in hidden and explicit ways. Therefore, it prompts the prediction of deep biomarkers and improves the perception of task-dependent features in our low-contrast regions. Our P²-Net achieves a high prognostic correlation and strong generalization, with the highest C-index of 70.19% and an HR of 2.14. Extensive experiments with promising results on PAH prognosis prediction reveal powerful prognosis performance and great clinical significance for PAH treatment. All of our code will be made publicly available online.

NeurIPS Conference 2023 Conference Paper

Task-Robust Pre-Training for Worst-Case Downstream Adaptation

  • Jianghui Wang
  • Yang Chen
  • Xingyu Xie
  • Cong Fang
  • Zhouchen Lin

Pre-training has achieved remarkable success when transferred to downstream tasks. In machine learning, we care not only about a model's good performance but also about its behavior under reasonable shifts of condition. The same philosophy holds when pre-training a foundation model. However, a foundation model may not behave uniformly well across a series of related downstream tasks. This happens, for example, in mask-recovery regression, where the recovery abilities required by different training instances diverge: pattern features are extracted dominantly during pre-training, but semantic features are also required by a downstream task. This paper considers pre-training a model that guarantees uniformly good performance over the downstream tasks. We call this goal downstream-task robustness. Our method first separates the upstream task into several representative ones and applies a simple minimax loss for pre-training. We then design an efficient algorithm to solve the minimax loss and prove its convergence in the convex setting. In experiments on both large-scale natural language processing and computer vision datasets, our method improves the metrics on worst-case downstream tasks. Additionally, we provide theoretical explanations for why our loss is beneficial. Specifically, we show that in some cases fewer samples are inherently required for the most challenging downstream task.
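The minimax objective min_w max_i L_i(w) admits a simple subgradient scheme: at each step, descend the gradient of whichever task currently has the largest loss. A toy sketch under that assumption (not the paper's actual optimizer):

```python
import numpy as np

def minimax_step(w, task_grads, task_losses, lr=0.1):
    """One subgradient step on max_i L_i(w): pick the currently worst
    task and move against its gradient.

    task_losses: list of scalar losses; task_grads: list of gradient arrays.
    """
    worst = int(np.argmax(task_losses))
    return w - lr * task_grads[worst]

w = np.zeros(3)
losses = [0.5, 1.2, 0.8]                    # task 1 is currently worst
grads = [np.ones(3), 2 * np.ones(3), np.ones(3)]
w = minimax_step(w, grads, losses)          # steps against task 1's gradient
```

Repeating this alternation between "find the hardest task" and "improve on it" is what drives the worst-case metric up.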

ICML Conference 2023 Conference Paper

Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis

  • Hu Sun
  • Ward Manchester
  • Meng Jin
  • Yang Liu
  • Yang Chen

Multi-channel imaging data is a prevalent data format in scientific fields such as astronomy and biology. The structured information and the high dimensionality of these 3-D tensor data make the analysis an intriguing but challenging topic for statisticians and practitioners. The low-rank scalar-on-tensor regression model, in particular, has received widespread attention and has been re-formulated as a tensor Gaussian Process (Tensor-GP) model with a multi-linear kernel in Yu et al. (2018). In this paper, we extend the Tensor-GP model by introducing an integrative dimensionality reduction technique, called tensor contraction, with a Tensor-GP for a scalar-on-tensor regression task with multi-channel imaging data. This is motivated by the solar flare forecasting problem with high-dimensional multi-channel imaging data. We first estimate a latent, reduced-size tensor for each data tensor and then apply a multi-linear Tensor-GP on the latent tensor data for prediction. We introduce an anisotropic total-variation regularization when conducting the tensor contraction to obtain a sparse and smooth latent tensor. We then propose an alternating proximal gradient descent algorithm for estimation. We validate our approach via extensive simulation studies and by applying it to the solar flare forecasting problem.
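The tensor contraction step can be sketched as a multi-linear projection of each mode onto a smaller basis. A minimal illustration with random factor matrices (shapes are assumptions, not from the paper):

```python
import numpy as np

def contract(X, U1, U2, U3):
    """Multi-linear contraction of a 3-D tensor: project each mode onto a
    smaller basis, yielding the latent tensor fed to the Tensor-GP.

    X: (I, J, K) data tensor; U1: (I, r1), U2: (J, r2), U3: (K, r3).
    Returns the (r1, r2, r3) latent tensor.
    """
    return np.einsum('ijk,ia,jb,kc->abc', X, U1, U2, U3)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 8, 3))              # e.g. an 8x8, 3-channel image
Z = contract(X,
             rng.normal(size=(8, 2)),
             rng.normal(size=(8, 2)),
             rng.normal(size=(3, 2)))       # latent tensor, shape (2, 2, 2)
```

The Gaussian-process regression then operates on the small latent tensor Z rather than the full-size X, which is where the dimensionality reduction pays off.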

AAAI Conference 2023 Conference Paper

Weakly-Supervised Semantic Segmentation for Histopathology Images Based on Dataset Synthesis and Feature Consistency Constraint

  • Zijie Fang
  • Yang Chen
  • Yifeng Wang
  • Zhi Wang
  • Xiangyang Ji
  • Yongbing Zhang

Tissue segmentation is a critical task in computational pathology due to its desirable ability to indicate the prognosis of cancer patients. Currently, numerous studies attempt to use image-level labels to achieve pixel-level segmentation to reduce the need for fine annotations. However, most of these methods are based on class activation maps, which suffer from inaccurate segmentation boundaries. To address this problem, we propose a novel weakly-supervised tissue segmentation framework named PistoSeg, which is trained in a fully-supervised manner by transferring tissue-category labels to pixel-level masks. Firstly, a dataset synthesis method based on Mosaic transformation is proposed to generate synthesized images with pixel-level masks. Next, considering the difference between synthesized and real images, this paper devises an attention-based feature consistency constraint, which directs the training process of a proposed pseudo-mask refining module. Finally, the refined pseudo-masks are used to train a precise segmentation model for testing. Experiments on WSSS4LUAD and BCSS-WSSS validate that PistoSeg outperforms the state-of-the-art methods. The code is released at https://github.com/Vison307/PistoSeg.

JBHI Journal 2022 Journal Article

Automatic Video Analysis Framework for Exposure Region Recognition in X-Ray Imaging Automation

  • Jiarui Sun
  • Zhan Wu
  • Zechen Yu
  • Huanji Chen
  • Changping Du
  • Liang Xu
  • Jian Zhong
  • Juan Feng

The deep learning-based automatic recognition of the scanning or exposure region in medical imaging automation is a promising new technique, which can decrease the radiographers' heavy workload, optimize the imaging workflow, and improve image quality. However, there has been little related research and practice in X-ray imaging. In this paper, we focus on two key problems in X-ray imaging automation: automatic recognition of the exposure moment and of the exposure region. We propose an automatic video analysis framework based on a hybrid model, approaching real-time performance. The framework consists of three interdependent components: Body Structure Detection, Motion State Tracing, and Body Modeling. Body Structure Detection disassembles the patient's body to obtain the corresponding body keypoints and body bounding boxes (Bboxes). Combining and analyzing these two types of body structure representations yields rich spatial location information about the patient's body structure. Motion State Tracing focuses on the motion-state analysis of the exposure region to recognize the appropriate exposure moment. The exposure region is calculated by Body Modeling when the exposure moment arrives. A large-scale dataset of X-ray examination scenes is built to validate the performance of the proposed method. Extensive experiments demonstrate the superiority of the proposed method in automatically recognizing the exposure moment and exposure region. This paradigm provides the first method that can automatically and accurately recognize the exposure region in X-ray imaging without the help of a radiographer.

AAAI Conference 2022 Conference Paper

Delving into the Local: Dynamic Inconsistency Learning for DeepFake Video Detection

  • Zhihao Gu
  • Yang Chen
  • Taiping Yao
  • Shouhong Ding
  • Jilin Li
  • Lizhuang Ma

The rapid development of facial manipulation techniques has aroused public concerns in recent years. Existing deepfake video detection approaches attempt to capture the discriminative features between real and fake faces based on temporal modelling. However, these works impose supervision on sparsely sampled video frames but overlook the local motions among adjacent frames, which encode rich inconsistency information that can serve as an efficient indicator for DeepFake video detection. To mitigate this issue, we delve into local motion and propose a novel sampling unit named a snippet, which contains a few successive video frames, for local temporal inconsistency learning. Moreover, we elaborately design an Intra-Snippet Inconsistency Module (Intra-SIM) and an Inter-Snippet Interaction Module (Inter-SIM) to establish a dynamic inconsistency modelling framework. Specifically, the Intra-SIM applies bi-directional temporal difference operations and a learnable convolution kernel to mine the short-term motions within each snippet. The Inter-SIM is then devised to promote cross-snippet information interaction to form global representations. The Intra-SIM and Inter-SIM work in an alternating manner and can be plugged into existing 2D CNNs. Our method outperforms state-of-the-art competitors on four popular benchmark datasets, i.e., FaceForensics++, Celeb-DF, DFDC and WildDeepfake. Besides, extensive experiments and visualizations are also presented to further illustrate its effectiveness.

AAAI Conference 2022 Conference Paper

Exploiting Fine-Grained Face Forgery Clues via Progressive Enhancement Learning

  • Qiqi Gu
  • Shen Chen
  • Taiping Yao
  • Yang Chen
  • Shouhong Ding
  • Ran Yi

With the rapid development of facial forgery techniques, forgery detection has attracted more and more attention due to security concerns. Existing approaches attempt to use frequency information to mine subtle artifacts in high-quality forged faces. However, their exploitation of frequency information is coarse-grained, and, more importantly, their vanilla learning process struggles to extract fine-grained forgery traces. To address this issue, we propose a progressive enhancement learning framework to exploit both RGB and fine-grained frequency clues. Specifically, we perform a fine-grained decomposition of RGB images to completely decouple the real and fake traces in the frequency space. Subsequently, we propose a progressive enhancement learning framework based on a two-branch network, combined with self-enhancement and mutual-enhancement modules. The self-enhancement module captures the traces in different input spaces based on spatial noise enhancement and channel attention. The mutual-enhancement module concurrently enhances RGB and frequency features by communicating in the shared spatial dimension. The progressive enhancement process facilitates the learning of discriminative features with fine-grained face forgery clues. Extensive experiments on several datasets show that our method outperforms state-of-the-art face forgery detection methods.

AAMAS Conference 2022 Conference Paper

Individual-Level Inverse Reinforcement Learning for Mean Field Games

  • Yang Chen
  • Libo Zhang
  • Jiamou Liu
  • Shuyue Hu

The recent mean field game (MFG) formalism has enabled the application of inverse reinforcement learning (IRL) methods in large-scale multi-agent systems, with the goal of inferring reward signals that can explain demonstrated behaviours of large populations. The existing IRL methods for MFGs are built upon reducing an MFG to a Markov decision process (MDP) defined on the collective behaviours and average rewards of the population. However, this paper reveals that the reduction from MFG to MDP holds only in the fully cooperative setting. This limitation invalidates existing IRL methods on MFGs in non-cooperative environments. To measure more general behaviours in large populations, we study the use of individual behaviours to infer ground-truth reward functions for MFGs. We propose Mean Field IRL (MFIRL), the first dedicated IRL framework for MFGs that can handle both cooperative and non-cooperative environments. Based on this theoretically justified framework, we develop a practical algorithm effective for MFGs with unknown dynamics. We evaluate MFIRL on both cooperative and mixed cooperative-competitive scenarios with many agents. Results demonstrate that MFIRL excels in reward recovery, sample efficiency and robustness in the face of changing dynamics.

IJCAI Conference 2022 Conference Paper

Interpretable AMR-Based Question Decomposition for Multi-hop Question Answering

  • Zhenyun Deng
  • Yonghua Zhu
  • Yang Chen
  • Michael Witbrock
  • Patricia Riddle

Effective multi-hop question answering (QA) requires reasoning over multiple scattered paragraphs and providing explanations for answers. Most existing approaches cannot provide an interpretable reasoning process to illustrate how these models arrive at an answer. In this paper, we propose a Question Decomposition method based on Abstract Meaning Representation (QDAMR) for multi-hop QA, which achieves interpretable reasoning by decomposing a multi-hop question into simpler sub-questions and answering them in order. Since annotating the decomposition is expensive, we first delegate the complexity of understanding the multi-hop question to an AMR parser. We then achieve decomposition of a multi-hop question via segmentation of the corresponding AMR graph based on the required reasoning type. Finally, we generate sub-questions using an AMR-to-Text generation model and answer them with an off-the-shelf QA model. Experimental results on HotpotQA demonstrate that our approach is competitive for interpretable reasoning and that the sub-questions generated by QDAMR are well-formed, outperforming existing question-decomposition-based multi-hop QA approaches.

IJCAI Conference 2022 Conference Paper

MNet: Rethinking 2D/3D Networks for Anisotropic Medical Image Segmentation

  • Zhangfu Dong
  • Yuting He
  • Xiaoming Qi
  • Yang Chen
  • Huazhong Shu
  • Jean-Louis Coatrieux
  • Guanyu Yang
  • Shuo Li

The nature of thick-slice scanning causes severe inter-slice discontinuities in 3D medical images, and vanilla 2D/3D convolutional neural networks (CNNs) fail to represent sparse inter-slice information and dense intra-slice information in a balanced way, leading to severe underfitting to inter-slice features (for vanilla 2D CNNs) and overfitting to noise from long-range slices (for vanilla 3D CNNs). In this work, a novel mesh network (MNet) is proposed to balance the spatial representation across axes via learning. 1) Our MNet latently fuses plenty of representation processes by embedding multi-dimensional convolutions deeply into basic modules, making the selection of representation processes flexible and thus adaptively balancing the representation of sparse inter-slice information and dense intra-slice information. 2) Our MNet latently fuses multi-dimensional features inside each basic module, simultaneously taking advantage of 2D (high segmentation accuracy for regions easily recognized in the 2D view) and 3D (high smoothness of 3D organ contours) representations, thus obtaining more accurate modeling of target regions. Comprehensive experiments are performed on four public datasets (CT & MR); the results consistently demonstrate that the proposed MNet outperforms the other methods. The code and datasets are available at: https://github.com/zfdong-code/MNet

JBHI Journal 2022 Journal Article

MVSGAN: Spatial-Aware Multi-View CMR Fusion for Accurate 3D Left Ventricular Myocardium Segmentation

  • Xiaoming Qi
  • Yuting He
  • Guanyu Yang
  • Yang Chen
  • Jian Yang
  • Wangyag Liu
  • Yinsu Zhu
  • Yi Xu

Accurate 3D left ventricular (LV) myocardium segmentation in the short-axis (SAX) view of cardiac magnetic resonance (CMR) is challenged by the sparse spatial structure of CMR. Multi-view CMR fusion can provide fine-grained spatial structure for accurate segmentation. However, the strategy is limited by the large information misalignment and the lack of dense 3D CMR as a fusion target in multi-view CMR fusion, and by the different spatial resolutions of the fusion result and the ground truth in segmentation. In this study, we propose a multi-view spatial-aware adversarial network (MVSGAN). It studies the perception of fine-grained cardiac structure for accurate segmentation via spatial-aware multi-view CMR fusion. It consists of three modules: (1) A residual adversarial fusion (RAF) module takes inter-slice deep correlations and anatomical priors to refine the spatial structures by residual supplement and adversarial optimization. (2) A structural perception-aggregation (SPA) module establishes the spatial correlation between the dense cardiac model and the sparse label for accurate CMR LV myocardium segmentation. (3) A joint training strategy utilizes the dense SAX volume as explicit and implicit goals to jointly optimize the framework. Experiments on a public dataset and a clinical dataset evaluate the performance of MVSGAN. The average Dice and Jaccard scores of LV myocardium segmentation obtained by MVSGAN are the highest among seven existing state-of-the-art methods, reaching up to 0.92 and 0.75, respectively. It is concluded that spatial-aware multi-view CMR fusion can provide meaningful spatial correlation for accurate LV myocardium segmentation.

JBHI Journal 2022 Journal Article

Online Hard Patch Mining Using Shape Models and Bandit Algorithm for Multi-Organ Segmentation

  • Jianan He
  • Guangquan Zhou
  • Shoujun Zhou
  • Yang Chen

Hard sample selection can effectively improve model convergence by extracting the most representative samples from a training set. However, due to the large size of medical images, existing sampling strategies suffer from insufficient exploitation of hard samples or a high time cost for sample selection when adopted by 3D patch-based models for multi-organ segmentation. In this paper, we present a novel and effective online hard patch mining (OHPM) algorithm. In our method, an average shape model that can be mapped to all training images is constructed to guide the exploration of hard patches and aggregate feedback from predicted patches. The hard-mining process is formalized as a multi-armed bandit problem and solved with bandit algorithms. With the shape model, OHPM incurs negligible time consumption and can intuitively locate difficult anatomical areas during training. The use of bandit algorithms ensures online and sufficient hard mining. We integrate OHPM with advanced segmentation networks and evaluate them on two datasets containing different anatomical structures. Comparative experiments with other sampling strategies demonstrate the superiority of OHPM in boosting segmentation performance and improving model convergence. The results on each dataset with each network suggest that OHPM significantly outperforms other sampling strategies, by nearly 2% in average Dice score.
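The multi-armed bandit formulation above can be illustrated with a standard UCB1 rule: treat each anatomical region as an arm, treat observed patch difficulty (e.g. patch loss) as the reward, and sample the region with the highest upper confidence bound. This is a generic UCB sketch under those assumptions, not the paper's exact algorithm:

```python
import math

def ucb_select(counts, rewards, t, c=1.0):
    """UCB1 arm selection: prefer regions with high average difficulty
    plus an exploration bonus.

    counts[i]:  times region i was sampled
    rewards[i]: summed difficulty signal observed for region i
    t:          total pulls so far
    """
    def score(i):
        if counts[i] == 0:
            return float('inf')        # sample every region at least once
        mean = rewards[i] / counts[i]
        return mean + c * math.sqrt(2 * math.log(t) / counts[i])
    return max(range(len(counts)), key=score)

# region 2 has never been sampled, so it is picked first
arm = ucb_select(counts=[3, 3, 0], rewards=[0.9, 2.4, 0.0], t=6)
```

Once all regions have been visited, the rule keeps returning to regions whose patches remain hard while still occasionally re-checking the others.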

JBHI Journal 2022 Journal Article

PRIOR: Prior-Regularized Iterative Optimization Reconstruction For 4D CBCT

  • Dianlin Hu
  • Yikun Zhang
  • Jin Liu
  • Yi Zhang
  • Jean Louis Coatrieux
  • Yang Chen

4D cone-beam computed tomography (CBCT) is an important imaging modality in image-guided radiation therapy to address the motion-induced artifacts caused by organ movements during the respiratory process. However, due to the extremely sparse projection data for each temporal phase, 4D CBCT reconstructions suffer from severe streaking artifacts. Therefore, to tackle the streak artifacts and provide high-quality images, we propose a framework termed Prior-Regularized Iterative Optimization Reconstruction (PRIOR) for 4D CBCT. The PRIOR framework combines the physics-based model and the data-driven method simultaneously, with powerful feature-extracting capacity, significantly promoting image quality compared to single model-based or deep learning-based methods. Besides, we design a specialized deep learning model named PRIOR-Net, which can effectively excavate, at the encoding stage, the static information in the prior image reconstructed from the fully-sampled projections to improve the reconstruction performance for individual phase-resolved images. Experiments on both simulated and clinical 4D CBCT datasets were performed to evaluate the PRIOR-Net and the PRIOR framework. Compared with advanced 4D CBCT reconstruction methods, the proposed methods achieve promising quantitative and qualitative results in streak artifact suppression, soft-tissue restoration, and tiny-detail preservation.

AAAI Conference 2022 Conference Paper

Proximal PanNet: A Model-Based Deep Network for Pansharpening

  • Xiangyong Cao
  • Yang Chen
  • Wenfei Cao

Recently, deep learning techniques have been extensively studied for pansharpening, which aims to generate a high-resolution multispectral (HRMS) image by fusing a low-resolution multispectral (LRMS) image with a high-resolution panchromatic (PAN) image. However, existing deep learning-based pansharpening methods directly learn the mapping from LRMS and PAN to HRMS. These network architectures always lack sufficient interpretability, which limits further performance improvements. To alleviate this issue, we propose a novel deep network for pansharpening by combining the model-based methodology with the deep learning method. Firstly, we build an observation model for pansharpening using the convolutional sparse coding (CSC) technique and design a proximal gradient algorithm to solve this model. Secondly, we unfold the iterative algorithm into a deep network, dubbed Proximal PanNet, by learning the proximal operators with convolutional neural networks. Finally, all the learnable modules can be automatically learned in an end-to-end manner. Experimental results on some benchmark datasets show that our network performs better than other advanced methods both quantitatively and qualitatively.
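The proximal-gradient iteration that such unrolled networks start from can be shown on the classic L1 case, where the proximal operator is soft-thresholding; unrolling replaces that operator with a learned CNN. A minimal ISTA sketch (generic, not the paper's CSC model):

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 -- the step an unrolled network
    replaces with a learned module."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, b, lam=0.1, lr=None, n_iter=200):
    """Proximal gradient (ISTA) for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    if lr is None:
        lr = 1.0 / np.linalg.norm(A, 2) ** 2   # step size <= 1/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)               # gradient of the data term
        x = soft_threshold(x - lr * grad, lr * lam)
    return x

A = np.eye(3)
x = ista(A, np.array([1.0, 0.05, -2.0]))   # shrinks small entries to zero
```

Each network "stage" in an unrolled design corresponds to one such gradient-plus-proximal iteration, which is what gives the architecture its interpretability.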

IJCAI Conference 2022 Conference Paper

Region-Aware Temporal Inconsistency Learning for DeepFake Video Detection

  • Zhihao Gu
  • Taiping Yao
  • Yang Chen
  • Ran Yi
  • Shouhong Ding
  • Lizhuang Ma

The rapid development of face forgery techniques has drawn growing attention due to security concerns. Existing deepfake video detection methods attempt to capture discriminative features by directly exploiting static temporal convolutions to mine temporal inconsistency, without explicitly exploring the diverse temporal dynamics of different forged regions. To effectively and comprehensively capture the various inconsistencies, in this paper we propose a novel Region-Aware Temporal Filter (RATF) module which automatically generates corresponding temporal filters for different spatial regions. Specifically, we decouple the dynamic temporal kernel into a set of region-agnostic basic filters and region-sensitive aggregation weights. The different weights guide the corresponding regions to adaptively learn temporal inconsistency, which greatly enhances the overall representational ability. Moreover, to cover long-term temporal dynamics, we divide the video into multiple snippets and propose a Cross-Snippet Attention (CSA) to promote cross-snippet information interaction. Extensive experiments and visualizations on several benchmarks demonstrate the effectiveness of our method against state-of-the-art competitors.

AAAI Conference 2022 Conference Paper

Unpaired Multi-Domain Stain Transfer for Kidney Histopathological Images

  • Yiyang Lin
  • Bowei Zeng
  • Yifeng Wang
  • Yang Chen
  • Zijie Fang
  • Jian Zhang
  • Xiangyang Ji
  • Haoqian Wang

As an essential step in pathological diagnosis, histochemical staining can reveal specific tissue structure information and, consequently, assist pathologists in making accurate diagnoses. Clinical kidney histopathological analyses usually employ more than one type of staining: H&E, MAS, PAS, PASM, etc. However, due to the interference of colors among multiple stains, it is not easy to perform multiple stainings simultaneously on one biological tissue. To address this problem, we propose a network based on unpaired training data to virtually generate multiple types of staining from one staining. Our method can preserve the content of input images while transferring them to multiple target styles accurately. To efficiently control the direction of stain transfer, we propose a style-guided normalization (SGN). Furthermore, a multiple style encoding (MSE) is devised to dynamically represent the relationships among different staining styles. An improved one-hot label is also proposed to enhance the generalization ability and extendibility of our method. Extensive experiments have demonstrated that our model can achieve superior performance on a tiny dataset. The results exhibit not only good performance but also great visualization and interpretability. In particular, our method also achieves satisfactory results in cross-tissue, cross-staining, and cross-task settings. We believe that our method will significantly influence clinical stain transfer and greatly reduce the workload for pathologists. Our code and supplementary materials are available at https://github.com/linyiyang98/UMDST.

AAAI Conference 2021 Conference Paper

Local Relation Learning for Face Forgery Detection

  • Shen Chen
  • Taiping Yao
  • Yang Chen
  • Shouhong Ding
  • Jilin Li
  • Rongrong Ji

With the rapid development of facial manipulation techniques, face forgery detection has received considerable attention in digital media forensics due to security concerns. Most existing methods formulate face forgery detection as a classification problem and utilize binary labels or manipulated-region masks as supervision. However, without considering the correlation between local regions, these global supervisions are insufficient for learning a generalized feature and are prone to overfitting. To address this issue, we propose a novel perspective on face forgery detection via local relation learning. Specifically, we propose a Multi-scale Patch Similarity Module (MPSM), which measures the similarity between features of local regions and forms a robust and generalized similarity pattern. Moreover, we propose an RGB-Frequency Attention Module (RFAM) to fuse information in both the RGB and frequency domains for a more comprehensive local feature representation, which further improves the reliability of the similarity pattern. Extensive experiments show that the proposed method consistently outperforms the state-of-the-art methods on widely used benchmarks. Furthermore, detailed visualization shows the robustness and interpretability of our method.

NeurIPS Conference 2021 Conference Paper

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

  • Zhuchen Shao
  • Hao Bian
  • Yang Chen
  • Yifeng Wang
  • Jian Zhang
  • Xiangyang Ji
  • Yongbing Zhang

Multiple instance learning (MIL) is a powerful tool for solving weakly supervised classification in whole slide image (WSI) based pathology diagnosis. However, current MIL methods are usually based on the independent and identically distributed hypothesis and thus neglect the correlation among different instances. To address this problem, we propose a new framework, called correlated MIL, and provide a proof of convergence. Based on this framework, we devise a Transformer-based MIL (TransMIL), which explores both morphological and spatial information. The proposed TransMIL can effectively deal with unbalanced/balanced and binary/multiple classification, with great visualization and interpretability. We conducted various experiments on three different computational pathology problems and achieved better performance and faster convergence compared with state-of-the-art methods. The test AUC for binary tumor classification reaches up to 93.09% on the CAMELYON16 dataset, and the AUC for cancer-subtype classification reaches up to 96.03% and 98.82% on the TCGA-NSCLC and TCGA-RCC datasets, respectively. Implementation is available at: https://github.com/szc19990412/TransMIL.

AAAI Conference 2020 Conference Paper

One-Shot Learning for Long-Tail Visual Relation Detection

  • Weitao Wang
  • Meng Wang
  • Sen Wang
  • Guodong Long
  • Lina Yao
  • Guilin Qi
  • Yang Chen

The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene, and how they relate to each other, in the form of subject-predicate-object triplets. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions and, thus, the limited availability of training samples hampers the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems with feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of the one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to center on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model significantly improved the performance on the two tasks PredCls and SGCls, by 2.8% to 12.2% compared with state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Pay Attention to Devils: A Photometric Stereo Network for Better Details

  • Yakun Ju
  • Kin-Man Lam
  • Yang Chen
  • Lin Qi
  • Junyu Dong

We present an attention-weighted loss for photometric stereo neural networks that improves 3D surface recovery accuracy in complex-structured areas, such as edges and crinkles, where existing learning-based methods often fail. Instead of applying a uniform penalty to all pixels, our method learns an attention-weighted loss for each pixel in a self-supervised manner, avoiding blurry reconstruction results in such difficult regions. The network first estimates a surface normal map and an adaptive attention map; the latter is then used to compute a pixel-wise attention-weighted loss that focuses on complex regions. In these regions, the attention-weighted loss assigns higher weight to the detail-preserving gradient loss to produce clear surface reconstructions. Experiments on real datasets show that our approach significantly outperforms traditional photometric stereo algorithms and state-of-the-art learning-based methods.
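
The blending described in the abstract, where a per-pixel attention map shifts weight from a base error toward a detail-preserving gradient term, can be sketched in a few lines of numpy. This is a hypothetical simplification, not the paper's actual loss: the exact base and gradient terms, and the `attention_weighted_loss` name, are assumptions.

```python
import numpy as np

def attention_weighted_loss(pred, gt, attention):
    """Blend a base per-pixel error with a gradient (detail) error.

    `attention` in [0, 1] is assumed to be predicted by the network;
    high values shift weight toward the gradient term so edges and
    crinkles are penalized more sharply.
    """
    base = np.abs(pred - gt)                           # per-pixel reconstruction error
    gy_p, gx_p = np.gradient(pred)
    gy_g, gx_g = np.gradient(gt)
    grad = np.abs(gx_p - gx_g) + np.abs(gy_p - gy_g)   # detail-preserving term
    per_pixel = (1.0 - attention) * base + attention * grad
    return per_pixel.mean()
```

With `attention` near 0 the loss reduces to a plain L1 penalty; near 1 it mostly penalizes gradient mismatch, which is what sharpens fine structure.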

AAAI Conference 2013 Conference Paper

A Generalized Student-t Based Approach to Mixed-Type Anomaly Detection

  • Yen-Cheng Lu
  • Feng Chen
  • Yang Chen
  • Chang-Tien Lu

Anomaly detection for mixed-type data is an important problem that has not been well addressed in the machine learning field. There are two challenging issues for mixed-type datasets, namely modeling mutual correlations between mixed-type attributes and capturing large variations due to anomalies. This paper presents BuffDetect, a robust error buffering approach for anomaly detection in mixed-type datasets. A new variant of the generalized linear model is proposed to model the dependency between mixed-type attributes. The model incorporates an error buffering component based on the Student-t distribution to absorb the variations caused by anomalies. However, because of the non-Gaussian design, the problem becomes analytically intractable. We propose a novel Bayesian inference approach, which integrates Laplace approximation and several computational optimizations, and is able to efficiently approximate the posterior of high-dimensional latent variables by iteratively updating the latent variables in groups. Extensive experimental evaluations based on 13 benchmark datasets demonstrate the effectiveness and efficiency of BuffDetect.
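
The error-buffering intuition, a heavy-tailed Student-t residual model that absorbs large anomalous deviations instead of letting them distort the fit, can be illustrated with iteratively reweighted least squares on a plain linear model. This is a toy stand-in for the idea only; it is not BuffDetect's generalized linear model or its Laplace-approximation inference, and the `student_t_irls` name and degrees-of-freedom default are assumptions.

```python
import numpy as np

def student_t_irls(X, y, nu=3.0, iters=50):
    """Robust linear fit with Student-t residuals via IRLS.

    The heavy-tailed t likelihood 'buffers' large residuals: points
    with big errors receive low weight w = (nu + 1) / (nu + r^2), so
    anomalies can be flagged by their final low weight.
    """
    w = np.ones(len(y))
    for _ in range(iters):
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)     # weighted normal equations
        r = y - X @ beta
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12  # robust residual scale
        w = (nu + 1.0) / (nu + (r / scale) ** 2)       # t-distribution weights
    return beta, w                                     # low w -> likely anomaly
```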

ICRA Conference 1991 Conference Paper

Object modeling by registration of multiple range images

  • Yang Chen
  • Gérard G. Medioni

The problem of creating a complete model of a physical object is studied. Although this may be possible using intensity images, the authors use range images, which directly provide access to three-dimensional information. The first problem that needs to be solved is to find the transformation between the different views. Previous approaches have either assumed this transformation to be known (which is extremely difficult for a complete model) or computed it with feature matching (which is not accurate enough for integration). The authors propose an approach that works on range data directly and registers successive views with enough overlapping area to get an accurate transformation between views. This is performed by minimizing a functional that does not require point-to-point matches. Details are given of the registration method and modeling procedure, and they are illustrated on range images of complex objects.
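
The functional that avoids point-to-point matches is commonly formulated as a point-to-plane distance, and one linearized alignment step can be sketched in numpy. This is a minimal sketch under assumptions, not the paper's implementation: it presumes the correspondences and destination normals are already available, uses the standard small-angle approximation R ≈ I + [r]×, and the `point_to_plane_step` name is invented here.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized point-to-plane alignment step.

    Solves for a small rotation r = (rx, ry, rz) and translation t
    minimizing sum(((R @ s + t - d) . n)^2), where `src`/`dst` are
    pre-associated (N, 3) points and `normals` are unit normals at
    the destination points.
    """
    A = np.hstack([np.cross(src, normals), normals])   # (N, 6) linear system
    b = np.einsum('ij,ij->i', dst - src, normals)      # residual along each normal
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    r, t = x[:3], x[3:]
    R = np.eye(3) + np.array([[0.0, -r[2],  r[1]],     # I + skew(r)
                              [r[2],  0.0, -r[0]],
                              [-r[1], r[0],  0.0]])
    return R, t
```

In practice such a step is iterated, re-associating points between iterations, until the views converge to a consistent transformation.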