Arrow Research search

Author name cluster

Guang Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

43 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

GEMA-Score: Granular Explainable Multi-Agent Scoring Framework for Radiology Report Evaluation

  • Zhenxuan Zhang
  • KinHei Lee
  • Peiyuan Jing
  • Weihang Deng
  • Huichi Zhou
  • Zihao Jin
  • Jiahao Huang
  • Zhifan Gao

Automatic medical report generation has the potential to support clinical diagnosis, reduce the workload of radiologists, and enhance diagnostic consistency. However, current evaluation metrics often fail to reflect the clinical reliability of generated reports. Overlap-based methods overlook fine-grained details (e.g., location, severity); diagnostic metrics are limited by fixed vocabularies or templates, reducing their ability to capture diverse clinical expressions; and LLM-based metrics lack interpretable reasoning, limiting trust in clinical settings. Therefore, we propose the Granular Explainable Multi-Agent Score (GEMA-Score) in this paper, which conducts both objective quantification and subjective evaluation through a large language model-based multi-agent workflow. Our GEMA-Score parses structured reports and employs stable calculations through interactive exchanges of information among agents to assess disease diagnosis, location, severity, and uncertainty. Additionally, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback. Extensive experiments show that GEMA-Score achieves the highest correlation with human experts on public datasets (Kendall = 0.69 on ReXVal; 0.45 on RadEvalX), demonstrating improved clinical scoring reliability.
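
The agreement statistic the abstract reports can be illustrated with a minimal, self-contained Kendall rank correlation (tau-a, no tie handling); this is an illustrative sketch, not the authors' evaluation code:

```python
def kendall_tau(x, y):
    """Kendall rank correlation (tau-a; assumes no tied ranks)."""
    assert len(x) == len(y)
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Perfect agreement gives 1.0; one swapped pair lowers tau.
print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.666...
```

In practice a tie-aware variant (tau-b) is usually preferred for discrete human ratings.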

JBHI Journal 2026 Journal Article

MSAFed: Generalized Multi-Stage and Adaptive Federated Learning for Test-Time Medical Segmentation

  • Jiajie Jin
  • Xuanmin Chen
  • Liyan Ma
  • Shihui Ying
  • Guang Yang
  • Tieyong Zeng

Federated learning (FL) enables collaborative model training across multiple medical centers without sharing data, offering significant promise for privacy-preserving AI in healthcare. However, FL models often lack generalization across all participating clients (inside FL) and perform poorly when deployed to unseen clients (outside FL), particularly in heterogeneous domains. Current test-time adaptation methods for outside FL fail to address biases in personalized models toward source distributions, limiting their clinical applications. To tackle these challenges, we propose MSAFed, a generalized multi-stage adaptive FL framework that enhances both inside generalization and outside test-time adaptation. During pretraining, intra-client and inter-client contrastive learning with prototype-aware aggregation produces a generalized global model. An adaptive learning rate strategy further improves inside FL generalization. For unseen clients, source knowledge, including adaptive learning rates and prototypes, is leveraged to dynamically adapt the network architecture during test time. Experiments on three real-world multi-center medical datasets demonstrate the effectiveness of MSAFed, achieving superior performance on both inside and outside FL tasks.

AAAI Conference 2026 Conference Paper

Nighttime Flare Removal via Wavelet-Guided and Gated-Enhanced Spatial-Frequency Fusion Network

  • Yun Liu
  • Guang Yang
  • Tao Li
  • Weisi Lin

Nighttime flares, caused by complex scattering and reflections from artificial light sources, significantly degrade image quality and hinder downstream visual tasks. Existing deflare networks usually struggle to jointly capture and fuse latent spatial and frequency features. In this paper, we propose a novel Wavelet-guided and Gated-enhanced Spatial-frequency Fusion Network (WGSF-Net) for nighttime flare removal. WGSF-Net is primarily composed of two key modules: Wavelet-guided Fusion Block (WFB) and Local-Global Block (LGB). Specifically, WFB integrates a Multi-level Wavelet Enhancement Block (MWEB) and a Spatial-Frequency Fusion Network (SFFN) to effectively extract hierarchical spatial and frequency features through a coarse-to-fine strategy based on multi-level wavelet decomposition. To better suppress flare artifacts, LGB is designed to jointly capture local and global information: a Gated-Enhanced Attention Block (GEAB) selectively amplifies critical local features through a gated network and a difference network, and the subsequent SFFN performs global spatial-frequency fusion via depthwise separable convolution and partial Fourier convolution. This design enables LGB to effectively disentangle flare-corrupted regions and restore fine-grained details, making it particularly suited for challenging real-world flare scenarios. Extensive experiments on both synthetic and real datasets show that WGSF-Net achieves state-of-the-art performance in nighttime flare removal, outperforming existing methods across five evaluation metrics.

ICML Conference 2025 Conference Paper

Arbitrarily-Conditioned Multi-Functional Diffusion for Multi-Physics Emulation

  • Da Long
  • Zhitong Xu
  • Guang Yang
  • Akil Narayan 0001
  • Shandian Zhe

Modern physics simulation often involves multiple functions of interest, and traditional numerical approaches are known to be complex and computationally costly. While machine learning-based surrogate models can offer significant cost reductions, most focus on a single task, such as forward prediction, and typically lack uncertainty quantification, an essential component in many applications. To overcome these limitations, we propose Arbitrarily-Conditioned Multi-Functional Diffusion (ACM-FD), a versatile probabilistic surrogate model for multi-physics emulation. ACM-FD can perform a wide range of tasks within a single framework, including forward prediction, various inverse problems, and simulating data for entire systems or subsets of quantities conditioned on others. Specifically, we extend the standard Denoising Diffusion Probabilistic Model (DDPM) to multi-functional generation by modeling noise as Gaussian processes (GPs). We propose a random-mask-based, zero-regularized denoising loss to achieve flexible and robust conditional generation. We induce a Kronecker product structure in the GP covariance matrix, substantially reducing the computational cost and enabling efficient training and sampling. We demonstrate the effectiveness of ACM-FD across several fundamental multi-physics systems.
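
The Kronecker structure the abstract mentions can be sketched in a few lines: for a product kernel on a grid, the Cholesky factor of K1 ⊗ K2 is chol(K1) ⊗ chol(K2), so GP noise can be sampled without ever forming the full covariance. This is an illustrative numpy sketch under assumed kernels, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(x, ls=1.0):
    """Squared-exponential kernel matrix on a 1-D grid."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

# Two small per-dimension covariances; the full covariance is their Kronecker product.
x1, x2 = np.linspace(0, 1, 5), np.linspace(0, 1, 4)
K1 = rbf(x1) + 1e-8 * np.eye(5)
K2 = rbf(x2) + 1e-8 * np.eye(4)

# chol(K1 (x) K2) = chol(K1) (x) chol(K2), so a draw from N(0, K1 (x) K2)
# is L1 @ Z @ L2.T in factored form, never touching the 20x20 matrix.
L1, L2 = np.linalg.cholesky(K1), np.linalg.cholesky(K2)
Z = rng.standard_normal((5, 4))
sample = (L1 @ Z @ L2.T).ravel()

# Sanity check: the factorized Cholesky reproduces the dense covariance.
print(np.allclose(np.kron(L1, L2) @ np.kron(L1, L2).T, np.kron(K1, K2)))  # True
```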

IJCAI Conference 2025 Conference Paper

Cyclic Vision-Language Manipulator: Towards Reliable and Fine-Grained Image Interpretation for Automated Report Generation

  • Yingying Fang
  • Zihao Jin
  • Shaojie Guo
  • Jinda Liu
  • Zhiling Yue
  • Yijian Gao
  • Junzhi Ning
  • Zhi Li

Despite significant advancements in automated report generation, the limited interpretability of the generated text continues to cast doubt on the reliability of the content produced. This paper introduces a novel approach to identifying the specific image features in X-ray images that influence the outputs of report generation models. Specifically, we propose the Cyclic Vision-Language Manipulator (CVLM), a module that generates a manipulated X-ray from an original X-ray and the report produced for it by a designated report generator. The essence of CVLM is that cycling manipulated X-rays back through the report generator produces altered reports aligned with the alterations pre-injected into the reports used for X-ray generation, hence the term "cyclic manipulation". This process allows direct comparison between original and manipulated X-rays, clarifying the critical image features driving changes in reports and enabling model users to assess the reliability of the generated texts. Empirical evaluations demonstrate that CVLM identifies more precise and reliable features than existing explanation methods, significantly enhancing the transparency and applicability of AI-generated reports.

AAAI Conference 2025 Conference Paper

Dehaze-RetinexGAN: Real-World Image Dehazing via Retinex-based Generative Adversarial Network

  • Xinran Wang
  • Guang Yang
  • Tian Ye
  • Yun Liu

Deep learning-based dehazing networks trained on paired synthetic data have shown impressive performance, but they suffer significant degradation in generalization on real-world hazy scenes. In this paper, we propose Dehaze-RetinexGAN, a lightweight Retinex-based generative adversarial network for real-world image dehazing using unpaired data. Our Dehaze-RetinexGAN consists of two stages: self-supervised pre-training and weakly-supervised fine-tuning. During pre-training, we reduce the image dehazing task to an illumination-reflectance decomposition task based on the duality between Retinex and dehazing. Specifically, a decomposition network named DecomNet is constructed to obtain an illumination map and a reflectance map simultaneously. Moreover, a self-supervised learning strategy is developed to connect the preliminary dehazed result with the input hazy image, which constrains the solution space of DecomNet and accelerates training, leading to a more realistic dehazed result. In the fine-tuning stage, we develop a dual DTCWT-based attention module and embed it into the U-Net architecture to further improve the quality of the preliminary result in the frequency domain. In addition, adversarial learning is employed to constrain the relevance between the clean image and the final dehazed result in a weakly supervised manner, promoting more natural results. Extensive experiments on several real-world datasets demonstrate that our proposed framework performs favorably against state-of-the-art dehazing methods in visual quality and quantitative evaluation.

JBHI Journal 2025 Journal Article

Enhancing Visual Reasoning With LLM-Powered Knowledge Graphs for Visual Question Localized-Answering in Robotic Surgery

  • Pengfei Hao
  • Hongqiu Wang
  • Guang Yang
  • Lei Zhu

Expert surgeons often have heavy workloads and cannot promptly respond to queries from medical students and junior doctors about surgical procedures. Thus, research on Visual Question Localized-Answering in Surgery (Surgical-VQLA) is essential to assist medical students and junior doctors in understanding surgical scenarios. Surgical-VQLA aims to generate accurate answers and locate relevant areas in the surgical scene, requiring models to identify and understand surgical instruments, operative organs, and procedures. A key issue is the model's ability to accurately distinguish surgical instruments. Current Surgical-VQLA models rely primarily on sparse textual information, limiting their visual reasoning capabilities. To address this issue, we propose a framework called Enhancing Visual Reasoning with LLM-Powered Knowledge Graphs (EnVR-LPKG) for the Surgical-VQLA task. This framework enhances the model's understanding of the surgical scenario by utilizing knowledge graphs of surgical instruments constructed by a Large Language Model (LLM). Specifically, we design a Fine-grained Knowledge Extractor (FKE) to extract the most relevant information from knowledge graphs and perform contrastive learning between the extracted knowledge graphs and local image features. Furthermore, we design a Multi-attention-based Surgical Instrument Enhancer (MSIE) module, which employs knowledge graphs to obtain an enhanced representation of the corresponding surgical instrument in the global scene. Through the MSIE module, the model can learn how to fuse visual features with knowledge graph text features, thereby strengthening the understanding of surgical instruments and further improving visual reasoning capabilities. Extensive experimental results on the EndoVis-17-VQLA and EndoVis-18-VQLA datasets demonstrate that our proposed method outperforms other state-of-the-art methods. We will release our code for future research.

IROS Conference 2025 Conference Paper

Mobile Manipulator For Robotic Lacrosse: Learning to Pass the Ball

  • Xinchi Huang
  • Yifan Mao
  • Guang Yang
  • Yi Guo

This paper introduces the lacrosse mobile manipulator, a robotic system designed to play lacrosse. We focus on the task of ball passing between two robots, which poses challenges including managing a small ball in the soft lacrosse head and interacting with a fast-moving ball. In this study, we develop innovative neural-network-based learning approaches to enhance performance in dynamic environments. The robots are autonomously controlled in a decentralized manner. A combination of analytical physics-based and machine learning methods is employed to refine motion planning and ball-landing prediction in both the throwing and catching processes. The system achieves satisfactory performance in real-world ball-passing experiments.
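
As a rough illustration of the analytical side of landing prediction, a drag-free projectile model gives a closed-form estimate that a learned predictor could then refine. Values and the function are hypothetical, not the paper's implementation:

```python
import math

def landing_range(v0, angle_deg, h0=0.0, g=9.81):
    """Horizontal distance a ball launched at speed v0 (m/s), at angle_deg
    (degrees), from release height h0 (m) travels before hitting the
    ground. Air drag is ignored in this sketch."""
    th = math.radians(angle_deg)
    vx, vy = v0 * math.cos(th), v0 * math.sin(th)
    # Time of flight: positive root of h0 + vy*t - g*t^2/2 = 0.
    t = (vy + math.sqrt(vy**2 + 2 * g * h0)) / g
    return vx * t

# Launch at 45 degrees from ground level: the textbook maximum-range case v0^2/g.
print(landing_range(10.0, 45.0))  # ~10.19 m
```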

NeurIPS Conference 2025 Conference Paper

Multi-Agent Reinforcement Learning with Communication-Constrained Priors

  • Guang Yang
  • Tianpei Yang
  • Jingwen Qiao
  • Yanqing Wu
  • Jing Huo
  • Xingguo Chen
  • Yang Gao

Communication is an effective means of improving the learning of cooperative policies in multi-agent systems. However, in most real-world scenarios, lossy communication is a prevalent issue. Existing multi-agent reinforcement learning methods with communication, due to their limited scalability and robustness, struggle to apply to complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model that uniformly characterizes communication conditions across different scenarios, and we use it as a learning prior to distinguish between lossy and lossless messages in specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making, drawing on a dual mutual information estimator, and introduce a communication-constrained multi-agent reinforcement learning framework that quantifies the impact of communication messages in the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.

JBHI Journal 2025 Journal Article

Multi-Level Noise Sampling From Single Image for Low-Dose Tomography Reconstruction

  • Weiwen Wu
  • Yifei Long
  • Zhifan Gao
  • Guang Yang
  • Fangxiao Cheng
  • Jianjia Zhang

Low-dose digital radiography (DR) and computed tomography (CT) have become increasingly popular due to their reduced radiation dose. However, they often produce degraded images with lower signal-to-noise ratios, creating an urgent need for effective denoising techniques. Recent advances in single-image-based denoising provide a promising solution without requiring paired training data, which are scarce in medical imaging. These methods typically rely on sampling image pairs from a noisy image for inter-supervised denoising. Although simple, the generated image pairs are at the same noise level and include only partial information about the input image. This study argues that generating image pairs at different noise levels while fully using the information of the input image is preferable, since it provides richer multi-perspective clues to guide the denoising process. To this end, we present a novel Multi-Level Noise Sampling (MNS) method for low-dose tomography denoising. Specifically, the MNS method generates multi-level noisy sub-images by partitioning the high-dimensional input space into multiple low-dimensional sub-spaces with a simple yet effective strategy. The superiority of the MNS method over competing single-image-based denoising methods is investigated and verified theoretically. Moreover, to bridge the gap between self-supervised and supervised denoising networks, we introduce an optimization function that leverages prior knowledge of the multi-level noisy sub-images to guide the training process. Through extensive quantitative and qualitative experiments on large-scale clinical low-dose CT and DR datasets, we validate the effectiveness and superiority of our MNS approach over state-of-the-art supervised and self-supervised methods.
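
The idea of sampling sub-images from a single noisy image can be sketched with the common strided 2x2 partition used in single-image denoising; the paper's multi-level scheme is more general, so treat this only as an illustrative sketch:

```python
import numpy as np

def subsample_quads(img):
    """Partition an HxW image into four half-resolution sub-images by
    taking one pixel from each 2x2 block. Pairs of these sub-images can
    then supervise each other; MNS generalizes this idea to multiple
    noise levels."""
    return [img[0::2, 0::2], img[0::2, 1::2], img[1::2, 0::2], img[1::2, 1::2]]

noisy = np.arange(16, dtype=float).reshape(4, 4)
subs = subsample_quads(noisy)
print([s.shape for s in subs])                     # [(2, 2), (2, 2), (2, 2), (2, 2)]
print(sum(s.sum() for s in subs) == noisy.sum())   # True: an exact partition
```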

IJCAI Conference 2025 Conference Paper

Multimodal Cancer Survival Analysis via Hypergraph Learning with Cross-Modality Rebalance

  • Mingcheng Qu
  • Guang Yang
  • Donglin Di
  • Tonghua Su
  • Yue Gao
  • Yang Song
  • Lei Fan

Multimodal pathology-genomic analysis has become increasingly prominent in cancer survival prediction. However, existing studies mainly utilize multi-instance learning to aggregate patch-level features, neglecting the information loss of contextual and hierarchical details within pathology images. Furthermore, the disparity in data granularity and dimensionality between pathology and genomics leads to a significant modality imbalance. The high spatial resolution inherent in pathology data renders it a dominant role while overshadowing genomics in multimodal integration. In this paper, we propose a multimodal survival prediction framework that incorporates hypergraph learning to effectively capture both contextual and hierarchical details from pathology images. Moreover, it employs a modality rebalance mechanism and an interactive alignment fusion strategy to dynamically reweight the contributions of the two modalities, thereby mitigating the pathology-genomics imbalance. Quantitative and qualitative experiments are conducted on five TCGA datasets, demonstrating that our model outperforms advanced methods by over 3.4% in C-Index performance. Code: https://github.com/MCPathology/MRePath.

ICML Conference 2025 Conference Paper

Toward Efficient Kernel-Based Solvers for Nonlinear PDEs

  • Zhitong Xu
  • Da Long
  • Yiming Xu
  • Guang Yang
  • Shandian Zhe
  • Houman Owhadi

We introduce a novel kernel learning framework toward efficiently solving nonlinear partial differential equations (PDEs). In contrast to the state-of-the-art kernel solver that embeds differential operators within kernels, posing challenges with a large number of collocation points, our approach eliminates these operators from the kernel. We model the solution using a standard kernel interpolation form and differentiate the interpolant to compute the derivatives. Our framework obviates the need for complex Gram matrix construction between solutions and their derivatives, allowing for a straightforward implementation and scalable computation. As an instance, we allocate the collocation points on a grid and adopt a product kernel, which yields a Kronecker product structure in the interpolation. This structure enables us to avoid computing the full Gram matrix, reducing costs and scaling efficiently to a large number of collocation points. We provide a proof of the convergence and rate analysis of our method under appropriate regularity assumptions. In numerical experiments, we demonstrate the advantages of our method in solving several benchmark PDEs.
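
The core recipe, interpolate with a plain kernel and then differentiate the interpolant, can be sketched in 1-D. This is an illustrative sketch with an assumed squared-exponential kernel and lengthscale, not the authors' solver:

```python
import numpy as np

# Kernel interpolation of f on a grid, then analytic differentiation of the
# interpolant -- no derivative operators embedded inside the kernel itself.
ls = 0.5

def k(a, b):   # squared-exponential kernel matrix
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def dk(a, b):  # derivative of the kernel with respect to its first argument
    return -(a[:, None] - b[None, :]) / ls**2 * k(a, b)

X = np.linspace(0, 2 * np.pi, 50)                       # collocation grid
alpha = np.linalg.solve(k(X, X) + 1e-8 * np.eye(50), np.sin(X))

xs = np.array([1.0])
u = (k(xs, X) @ alpha)[0]                               # interpolant value
du = (dk(xs, X) @ alpha)[0]                             # interpolant derivative

print(abs(u - np.sin(1.0)), abs(du - np.cos(1.0)))      # both errors are small
```

In a PDE solve, the residual would be written in terms of such interpolant derivatives at the collocation points; a product kernel on a grid then gives the Kronecker structure the abstract describes.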

IROS Conference 2025 Conference Paper

TVFET-VD: Time-Varying Formation Encircling and Tracking Control Based on Visual Detection

  • Guang Yang
  • Juntong Qi
  • Mingming Wang
  • Hailong Huang 0001
  • Yan Peng
  • Chong Wu 0004
  • Yuan Ping 0001

This paper proposes a whole-process method for multi-quadrotor systems, from detecting and locating a target to encircling and tracking it. The reconnaissance quadrotor achieves accurate target detection using a one-stage convolutional neural network detector. Then, based on a pinhole camera projection model, the target is localized from 2D pixel coordinates to 3D North-East-Down (NED) world coordinates. Finally, the hunter quadrotors achieve target encircling and time-varying formation tracking based on consensus theory, and we prove the stability of the time-varying formation tracking control. We built a multi-quadrotor platform composed of one reconnaissance quadrotor and four hunter quadrotors, and deployed the method on it to conduct a series of validation experiments with a minibus as the target. The results indicate that the reconnaissance quadrotor can accurately detect the target with small localization errors in the north and east directions, and that the hunter quadrotors can encircle and track the target in a time-varying formation based on the target information it provides. The experiments demonstrate that the method achieves high-speed and accurate target encirclement.
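
The pixel-to-NED localization step can be sketched with a pinhole model for a downward-facing camera at a known altitude. The intrinsics, the flat-ground assumption, and the axis alignment below are all hypothetical simplifications, not the paper's calibration:

```python
import numpy as np

# Hypothetical intrinsics: focal lengths and principal point.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_ground(u, v, altitude, K):
    """Back-project pixel (u, v) to a ground point for a camera looking
    straight down from `altitude` metres (flat-ground assumption).
    Returns (north, east) offsets from the point below the camera,
    assuming the camera x/y axes are aligned with north/east."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction, camera frame
    scale = altitude / ray[2]                       # ground plane at depth = altitude
    x, y, _ = scale * ray
    return x, y

# Round trip: project a known ground point, then recover it from the pixel.
n, e, h = 3.0, -2.0, 50.0
u, v, w = K @ np.array([n, e, h])
print(pixel_to_ground(u / w, v / w, h, K))  # (3.0, -2.0) up to float error
```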

NeurIPS Conference 2024 Conference Paper

4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on RDBs

  • Minjie Wang
  • Quan Gan
  • David Wipf
  • Zhenkun Cai
  • Ning Li
  • Jianheng Tang
  • Yanlin Zhang
  • Zizhao Zhang

Given a relational database (RDB), how can we predict missing column values in some target table of interest? Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and evaluation purposes. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks, or on the relational side, graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks. From a delivery standpoint, we operationalize the above four dimensions (4D) of exploration within a unified, scalable open-source toolbox called 4DBInfer; please see https://github.com/awslabs/multi-table-benchmark.

NeurIPS Conference 2024 Conference Paper

Attention boosted Individualized Regression

  • Guang Yang
  • Yuan Cao
  • Long Feng

Different from the classical one-model-fits-all strategy, individualized models allow parameters to vary across samples and are gaining popularity in various fields, particularly personalized medicine. Motivated by medical imaging analysis, this paper introduces a novel individualized modeling framework for matrix-valued data that does not require additional information on sample similarity for the individualized coefficients. Under our framework, the model individualization stems from an optimal internal relation map within the samples themselves. We refer to the proposed method as Attention boosted Individualized Regression, due to its close connections with the self-attention mechanism; our approach thus provides a new interpretation of attention from the perspective of individualized modeling. Comprehensive numerical experiments and a real brain MRI analysis on an ADNI dataset demonstrate the superior performance of our model.
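
The "internal relation map" can be illustrated with the scaled dot-product form it is connected to, softmax(X Xᵀ / √d) over samples. This is only an illustrative sketch of the self-attention connection, not the authors' estimator:

```python
import numpy as np

def relation_map(X):
    """Row-stochastic sample-similarity map via scaled dot-product
    attention over samples: softmax(X X^T / sqrt(d)). Row i weights how
    much each other sample informs sample i's individualized coefficients."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 8))                   # 6 samples, 8 features
A = relation_map(X)
print(A.shape, np.allclose(A.sum(axis=1), 1.0))   # (6, 6) True
```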

JBHI Journal 2024 Journal Article

Is Attention all You Need in Medical Image Analysis? A Review

  • Giorgos Papanastasiou
  • Nikolaos Dikaios
  • Jiahao Huang
  • Chengjia Wang
  • Guang Yang

Medical imaging is a key component in clinical diagnosis, treatment planning and clinical trial design, accounting for almost 90% of all healthcare data. CNNs have achieved performance gains in medical image analysis (MIA) in recent years. CNNs can efficiently model local pixel interactions and be trained on small-scale MI data. Despite these important advances, typical CNNs have relatively limited capabilities in modelling "global" pixel interactions, which restricts their generalisation ability to out-of-distribution data with different "global" information. The recent progress of Artificial Intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments ("Transf/Attention"), which retain the capacity to model global relationships, have been proposed as lighter alternatives to full Transformers. Recently, there has been an increasing trend to co-pollinate complementary local-global properties from CNN and Transf/Attention architectures, which has led to a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduce an analysis framework on generalisation opportunities of scientific and clinical impact, based on which new data-driven domain generalisation and adaptation methods can be stimulated.

JBHI Journal 2024 Journal Article

Real-Time Non-Invasive Imaging and Detection of Spreading Depolarizations through EEG: An Ultra-Light Explainable Deep Learning Approach

  • Yinzhe Wu
  • Sharon Jewell
  • Xiaodan Xing
  • Yang Nan
  • Anthony J. Strong
  • Guang Yang
  • Martyn G. Boutelle

A core aim of neurocritical care is to prevent secondary brain injury. Spreading depolarizations (SDs) have been identified as an important independent cause of secondary brain injury. SDs are usually detected using invasive electrocorticography recorded at a high sampling frequency. Recent pilot studies suggest a possible utility of scalp-electrode electroencephalogram (EEG) for non-invasive SD detection. However, the noise and attenuation of EEG signals make this detection task extremely challenging. Previous methods focus on detecting temporal power changes of EEG over a fixed high-density map of scalp electrodes, which is not always clinically feasible. Using a specialized spectrogram as input to the automatic SD detection model, this study is the first to transform SD identification from a detection task on a 1-D time-series wave into a task on sequential 2-D rendered images. This study presents a novel ultra-lightweight multi-modal deep-learning network that fuses EEG spectrogram images and temporal power vectors to enhance SD identification accuracy over each single electrode, allowing a flexible EEG map and paving the way for SD detection on ultra-low-density EEG with variable electrode positioning. Our proposed model has an ultra-fast processing speed (<0.3 s); compared to conventional methods (2 hours), this is a huge advancement towards early SD detection and instant brain injury prognosis. By viewing SDs in an additional dimension, frequency, on spectrograms, we demonstrated that this added dimension can improve SD detection accuracy, providing preliminary evidence to support the hypothesis that SDs may show implicit features in the frequency profile.
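
The 1-D-to-2-D transformation described above can be sketched with a basic Hann-windowed STFT spectrogram. The window, hop, and sampling rate below are illustrative placeholders; the paper's "specialized spectrogram" is its own design:

```python
import numpy as np

def spectrogram(sig, win=64, hop=32):
    """Magnitude spectrogram of a 1-D signal via a Hann-windowed STFT --
    the kind of 2-D time-frequency image a detector could consume."""
    w = np.hanning(win)
    frames = [sig[i:i + win] * w for i in range(0, len(sig) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # (freq, time)

fs = 256
t = np.arange(fs * 2) / fs                 # 2 s of toy signal at 256 Hz
sig = np.sin(2 * np.pi * 8 * t)            # a single 8 Hz component
S = spectrogram(sig)
peak_bin = S.mean(axis=1).argmax()
print(S.shape, peak_bin * fs / 64)         # (33, 15) and a peak at 8.0 Hz
```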

AAAI Conference 2024 Conference Paper

Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

  • Shengzhe Zhou
  • Zejian Li
  • Shengyuan Zhang
  • Lefan Hou
  • Changyuan Yang
  • Guang Yang
  • Zhiyuan Yang
  • Lingyun Sun

Denoising Diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process but causes degraded generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model in the distillation. Accordingly, we propose Spatial Fitting-Error Reduction Distillation model (SFERD). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64x64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models.

RLJ Journal 2024 Journal Article

ROIL: Robust Offline Imitation Learning without Trajectories

  • Gersi Doko
  • Guang Yang
  • Daniel S. Brown
  • Marek Petrik

We study the problem of imitation learning via inverse reinforcement learning where the agent attempts to learn an expert's policy from a dataset of collected state, action tuples. We derive a new Robust model-based Offline Imitation Learning method (ROIL) that mitigates covariate shift by avoiding estimating the expert's occupancy frequency. Frequently in offline settings, there is insufficient data to reliably estimate the expert's occupancy frequency and this leads to models that do not generalize well. Our proposed approach, ROIL, is a method that is guaranteed to recover the expert's occupancy frequency and is efficiently solvable as an LP. We demonstrate ROIL's ability to achieve minimal regret in large environments under covariate shift, such as when the state visitation frequency of the demonstrations does not come from the expert.

RLC Conference 2024 Conference Paper

ROIL: Robust Offline Imitation Learning without Trajectories

  • Gersi Doko
  • Guang Yang
  • Daniel S. Brown
  • Marek Petrik

We study the problem of imitation learning via inverse reinforcement learning where the agent attempts to learn an expert's policy from a dataset of collected state, action tuples. We derive a new Robust model-based Offline Imitation Learning method (ROIL) that mitigates covariate shift by avoiding estimating the expert's occupancy frequency. Frequently in offline settings, there is insufficient data to reliably estimate the expert's occupancy frequency and this leads to models that do not generalize well. Our proposed approach, ROIL, is a method that is guaranteed to recover the expert's occupancy frequency and is efficiently solvable as an LP. We demonstrate ROIL's ability to achieve minimal regret in large environments under covariate shift, such as when the state visitation frequency of the demonstrations does not come from the expert.

TMLR Journal 2024 Journal Article

The Missing U for Efficient Diffusion Models

  • Sergio Calvo Ordoñez
  • Chun-Wun Cheng
  • Jiahao Huang
  • Lipei Zhang
  • Guang Yang
  • Carola-Bibiane Schönlieb
  • Angelica I Aviles-Rivero

Diffusion Probabilistic Models stand as a critical tool in generative modelling, enabling the generation of complex data distributions. This family of generative models yields record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs. In this paper, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with Denoising Diffusion Probabilistic Models (DDPMs), our framework operates with approximately a quarter of the parameters and ~30% of the Floating Point Operations (FLOPs) compared to standard U-Nets in DDPMs. Furthermore, our model is notably faster in inference than the baseline when measured in fair and equal conditions. We also provide a mathematical intuition as to why our proposed reverse process is faster as well as a mathematical discussion of the empirical tradeoffs in the denoising downstream task. Finally, we argue that our method is compatible with existing performance enhancement techniques, enabling further improvements in efficiency, quality, and speed.

JBHI Journal 2023 Journal Article

Adversarial Transformer for Repairing Human Airway Segmentation

  • Zeyu Tang
  • Yang Nan
  • Simon Walsh
  • Guang Yang

Accurate segmentation of airway structures is essential for the diagnosis and prognosis of lung diseases. However, automated airway segmentation models often suffer from discontinuities in peripheral bronchioles, which limits their clinical applicability, and data heterogeneity across centres together with pathological abnormalities poses significant challenges to accurate and robust segmentation of the distal small airways. To address these issues, we propose a patch-scale adversarial refinement network that takes a preliminary segmentation and the original CT image as input and outputs a refined mask of the airway structure. Our method is validated on three datasets, covering healthy cases, pulmonary fibrosis, and COVID-19 cases, and is quantitatively evaluated using seven metrics. It achieves more than a 15% increase in the detected length ratio and detected branch ratio compared to previously proposed models, demonstrating its promising performance. The visual results show that our refinement approach, guided by a patch-scale discriminator and centreline objective functions, effectively detects discontinuities and missing bronchioles. We also demonstrate the generalizability of our refinement pipeline on three previous models, significantly improving their segmentation completeness. Our method provides a robust and accurate airway segmentation tool that can help improve diagnosis and treatment planning for lung diseases.

AAAI Conference 2023 Conference Paper

ERASER: AdvERsArial Sensitive Element Remover for Image Privacy Preservation

  • Guang Yang
  • Juan Cao
  • Danding Wang
  • Peng Qi
  • Jintao Li

The daily practice of online image sharing enriches our lives, but also raises a severe issue of privacy leakage. To mitigate the privacy risks during image sharing, some researchers modify the sensitive elements in images with visual obfuscation methods, including traditional ones like blurring and pixelating as well as generative ones based on deep learning. However, images processed by such methods may be recovered or recognized by models, so privacy cannot be guaranteed. Furthermore, traditional methods make the images look unnatural and degrade image quality. Although generative methods produce better images, most of them suffer from insufficiency in the frequency domain, which harms image quality. Therefore, we propose the AdvERsArial Sensitive Element Remover (ERASER) to guarantee both image privacy and image quality. 1) To preserve image privacy, ERASER guarantees that the regions containing sensitive elements differ sufficiently after adversarial modification. Specifically, we take both the regional and global content into consideration with a Prior Transformer and obtain the corresponding region prior and global prior. Based on these priors, ERASER is trained with an adversarial Difference Loss to make the content in the regions different. As a result, ERASER can preserve the main structure and change the texture of the target regions for image privacy preservation. 2) To guarantee image quality, ERASER addresses the frequency insufficiency of current generative methods. Specifically, the region prior and global prior are processed with Fast Fourier Convolution to capture characteristics and achieve consistency in both the pixel and frequency domains. Quantitative analyses demonstrate that the proposed ERASER achieves a balance between image quality and image privacy preservation, while qualitative analyses demonstrate that ERASER indeed reduces the privacy risk from the visual perception aspect.

JBHI Journal 2023 Journal Article

HDL: Hybrid Deep Learning for the Synthesis of Myocardial Velocity Maps in Digital Twins for Cardiac Analysis

  • Xiaodan Xing
  • Javier Del Ser
  • Yinzhe Wu
  • Yang Li
  • Jun Xia
  • Lei Xu
  • David Firmin
  • Peter Gatehouse

Synthetic digital twins based on medical data accelerate the acquisition, labelling, and decision-making procedures in digital healthcare. A core part of digital healthcare twins is model-based data synthesis, which permits the generation of realistic medical signals without having to cope with the modelling complexity of the anatomical and biochemical phenomena that produce them in reality. Unfortunately, algorithms for cardiac data synthesis have so far been scarcely studied in the literature. An important imaging modality in cardiac examination is three-directional CINE multi-slice myocardial velocity mapping (3Dir MVM), which provides a quantitative assessment of cardiac motion in three orthogonal directions of the left ventricle. The long acquisition time and complex acquisition procedure make it all the more pressing to produce synthetic digital twins of this imaging modality. In this study, we propose a hybrid deep learning (HDL) network, specifically for synthesising 3Dir MVM data. Our algorithm features a hybrid U-Net and a Generative Adversarial Network with a foreground-background generation scheme. The experimental results show that from six-fold temporally down-sampled magnitude CINE images, our proposed algorithm can still successfully synthesise high temporal resolution 3Dir MVM CMR data (PSNR = 42.32) with precise left ventricle segmentation (Dice = 0.92). These performance scores indicate that our proposed HDL algorithm can be implemented in real-world digital twins for myocardial velocity mapping data simulation. To the best of our knowledge, this is the first work to investigate digital twins of 3Dir MVM CMR, which shows great potential for improving the efficiency of clinical studies via synthesised cardiac data.

IJCAI Conference 2023 Conference Paper

Learning Object Consistency and Interaction in Image Generation from Scene Graphs

  • Yangkang Zhang
  • Chenye Meng
  • Zejian Li
  • Pei Chen
  • Guang Yang
  • Changyuan Yang
  • Lingyun Sun

This paper is concerned with synthesizing images conditioned on a scene graph (SG), a set of object nodes and the edges of their interactive relations. We divide existing works into image-oriented and code-oriented methods. In our analysis, the image-oriented methods do not consider object interaction in the spatial hidden features. On the other hand, our empirical study shows that the code-oriented methods lose object consistency, as their generated images miss certain objects in the input scene graph. To alleviate these two issues, we propose Learning Object Consistency and Interaction (LOCI). To preserve object consistency, we design a consistency module with a weighted augmentation strategy for objects that are easily ignored and a matching loss between scene graphs and image codes. To learn object interaction, we design an interaction module consisting of three kinds of message propagation between the input scene graph and the learned image code. Experiments on the COCO-stuff and Visual Genome datasets show that our proposed method alleviates the omission of objects and outperforms the state of the art in the visual fidelity of generated images and objects.

JBHI Journal 2023 Journal Article

Multiple Adversarial Learning Based Angiography Reconstruction for Ultra-Low-Dose Contrast Medium CT

  • Weiwei Zhang
  • Zhen Zhou
  • Zhifan Gao
  • Guang Yang
  • Lei Xu
  • Weiwen Wu
  • Heye Zhang

Iodinated contrast medium (ICM) dose reduction is beneficial for decreasing the potential health risk to renal-insufficiency patients in CT scanning. However, because of the low vessel intensity in ultra-low-dose-ICM CT angiography, such scans cannot support the clinical diagnosis of vascular diseases. Angiography reconstruction for ultra-low-dose-ICM CT can enhance vascular intensity for direct diagnosis of vascular diseases. The reconstruction is challenging, however, owing to individual patient differences and the diversity of vascular diseases. In this paper, we propose a Multiple Adversarial Learning based Angiography Reconstruction (MALAR) framework to enhance vascular intensity. Specifically, a bilateral learning mechanism is developed to map a relationship between the source and target domains rather than an image-to-image mapping. Then, a dual correlation constraint is introduced to simultaneously characterize distribution uniformity across cross-domain features and sample inconsistency within a domain. Finally, an adaptive fusion module combining multi-scale information and long-range interactive dependency is explored to alleviate the interference of high-noise metal. Experiments are performed on CT sequences with different ICM doses. Quantitative results based on multiple metrics demonstrate the effectiveness of MALAR for angiography reconstruction, and qualitative assessments by radiographers confirm its potential for the clinical diagnosis of vascular diseases.

AAAI Conference 2023 Conference Paper

Video Event Extraction via Tracking Visual States of Arguments

  • Guang Yang
  • Manling Li
  • Jiajie Zhang
  • Xudong Lin
  • Heng Ji
  • Shih-Fu Chang

Video event extraction aims to detect salient events from a video and identify the arguments for each event as well as their semantic roles. Existing methods focus on capturing the overall visual scene of each frame, ignoring fine-grained argument-level information. Inspired by the definition of events as changes of states, we propose a novel framework to detect video events by tracking the changes in the visual states of all involved arguments, which are expected to provide the most informative evidence for the extraction of video events. In order to capture the visual state changes of arguments, we decompose them into changes in pixels within objects, displacements of objects, and interactions among multiple arguments. We further propose Object State Embedding, Object Motion-aware Embedding, and Argument Interaction Embedding to encode and track these changes respectively. Experiments on various video event extraction tasks demonstrate significant improvements compared to state-of-the-art models. In particular, on verb classification, we achieve 3.49% absolute gains (19.53% relative gains) in F1@5 on Video Situation Recognition. Our code is publicly available at https://github.com/Shinetism/VStates for research purposes.

AAAI Conference 2022 Conference Paper

DRAG: Dynamic Region-Aware GCN for Privacy-Leaking Image Detection

  • Guang Yang
  • Juan Cao
  • Qiang Sheng
  • Peng Qi
  • Xirong Li
  • Jintao Li

The daily practice of sharing images on social media raises a severe issue of privacy leakage. To address the issue, privacy-leaking image detection has recently been studied with the goal of automatically identifying images that may leak privacy. Recent advances on this task benefit from focusing on crucial objects via pretrained object detectors and modeling their correlation. However, these methods have two limitations: 1) they neglect other important elements like scenes, textures, and objects beyond the capacity of pretrained object detectors; 2) the correlation among objects is fixed, and a fixed correlation is not appropriate for all images. To overcome these limitations, we propose the Dynamic Region-Aware Graph Convolutional Network (DRAG), which dynamically finds crucial regions, including objects and other important elements, and models their correlation adaptively for each input image. To find crucial regions, we cluster spatially correlated feature channels into several region-aware feature maps. Further, we dynamically model the correlation with the self-attention mechanism and explore the interaction among the regions with a graph convolutional network. DRAG achieved an accuracy of 87% on the largest dataset for privacy-leaking image detection, 10 percentage points higher than the state of the art. A further case study demonstrates that it finds crucial regions containing not only objects but also other important elements such as textures. The code and more details are available at https://github.com/guangyanng/DRAG.

JBHI Journal 2022 Journal Article

JAS-GAN: Generative Adversarial Network Based Joint Atrium and Scar Segmentations on Unbalanced Atrial Targets

  • Jun Chen
  • Guang Yang
  • Habib Khan
  • Heye Zhang
  • Yanping Zhang
  • Shu Zhao
  • Raad Mohiaddin
  • Tom Wong

Automated and accurate segmentation of the left atrium (LA) and atrial scars from late gadolinium-enhanced cardiac magnetic resonance (LGE CMR) images is in high demand for quantifying atrial scars. Previous quantification of atrial scars relies on a two-phase segmentation for the LA and atrial scars due to their large volume difference (unbalanced atrial targets). In this paper, we propose an inter-cascade generative adversarial network, namely JAS-GAN, to segment the unbalanced atrial targets from LGE CMR images automatically, accurately, and in an end-to-end way. First, JAS-GAN investigates an adaptive attention cascade to automatically correlate the segmentation tasks of the unbalanced atrial targets. The adaptive attention cascade mainly models the inclusion relationship of the two unbalanced atrial targets, where the estimated LA acts as an attention map to adaptively focus on the small atrial scars. Then, an adversarial regularization is applied to the segmentation tasks of the unbalanced atrial targets to enforce a consistent optimization; it mainly forces the estimated joint distribution of the LA and atrial scars to match the real one. We evaluated the performance of JAS-GAN on a 3D LGE CMR dataset with 192 scans. Compared with state-of-the-art methods, our proposed approach yielded better segmentation performance (average Dice Similarity Coefficient (DSC) values of 0.946 and 0.821 for the LA and atrial scars, respectively), which indicates the effectiveness of our proposed approach for segmenting unbalanced atrial targets.

JBHI Journal 2022 Journal Article

Ultrasound Entropy Imaging for Detection and Monitoring of Thermal Lesion During Microwave Ablation of Liver

  • Xiejing Li
  • Xin Jia
  • Ting Shen
  • Mengke Wang
  • Guang Yang
  • Hua Wang
  • Qinli Sun
  • Mingxi Wan

Ultrasonic B-mode imaging offers non-invasive and real-time monitoring of thermal ablation treatment in clinical use; however, it faces challenges of moderate lesion-normal contrast and detection accuracy. Quantitative ultrasound imaging techniques have been proposed as promising tools to evaluate the microstructure of ablated tissue. In this study, we introduced Shannon entropy, a non-model-based statistical measurement of disorder, to quantitatively detect and monitor microwave-induced ablation in porcine livers. The performance of typical Shannon entropy (TSE), weighted Shannon entropy (WSE), and horizontally normalized Shannon entropy (hNSE) was explored and compared with conventional B-mode imaging. TSE, estimated from non-normalized probability distribution histograms, was found to have insufficient ability to discriminate between different degrees of disorder in the data. WSE, which improves on TSE by adding signal amplitudes as weights, obtained an area under the receiver operating characteristic (AUROC) curve of 0.895, although it underestimated the periphery of the lesion region. hNSE provided superior ablated-area prediction, with a correlation coefficient of 0.90 against the ground truth, an AUROC of 0.868, and remarkable lesion-normal contrast with a contrast-to-noise ratio of 5.86, significantly higher than the other imaging methods. Data distributions shown in horizontally normalized probability distribution histograms indicated that the disorder of the backscattered envelope signal from the ablated region increased as the treatment progressed. These findings suggest that hNSE imaging could be a promising technique to assist ultrasound-guided percutaneous thermal ablation.
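As a rough illustration of the idea behind entropy imaging (a plain histogram-based Shannon entropy; the paper's exact TSE/WSE/hNSE estimators, window sizes, and normalization are not reproduced here), a disordered speckle-like envelope yields higher entropy than a perfectly ordered signal:

```python
import numpy as np

def shannon_entropy(window, n_bins=32):
    """Histogram-based Shannon entropy of a signal window (in bits).

    Illustrative estimator only; not the paper's TSE/WSE/hNSE definitions.
    """
    hist, _ = np.histogram(window, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]                        # drop empty bins: 0 * log(0) := 0
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
ordered = np.ones(1024)                 # perfectly ordered signal
disordered = rng.rayleigh(size=1024)    # Rayleigh-like speckle envelope
print(shannon_entropy(ordered) < shannon_entropy(disordered))  # True
```

In an imaging context, this statistic would be computed over a sliding window across the backscattered envelope to form a parametric entropy map.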

IROS Conference 2020 Conference Paper

Learning Human Navigation Behavior Using Measured Human Trajectories in Crowded Spaces

  • Muhammad Fahad 0003
  • Guang Yang
  • Yi Guo 0004

As humans and mobile robots increasingly coexist in public spaces, their close proximity demands that robots follow navigation strategies similar to those exhibited by humans. This can be achieved by learning directly from human demonstration trajectories in a machine learning framework. In this paper, we present a method to learn human navigation behaviors using an imitation learning approach based on generative adversarial imitation learning (GAIL), which can directly extract a navigation policy. Specifically, we use a large open human trajectory dataset that was experimentally collected in a crowded public space. We recreate these human trajectories in a 3D robotic simulator and generate demonstration data using a LIDAR sensor onboard a robot that follows the measured human trajectories. We then propose a GAIL-based algorithm that takes occupancy maps generated from the LIDAR data as input and outputs the navigation policy for robot navigation. Simulation experiments are conducted, and performance evaluation shows that the learned navigation policy generates trajectories qualitatively and quantitatively similar to human trajectories. Compared with existing works that use analytical models (such as the social force model) to generate human demonstration trajectories, our method learns directly from intrinsic human trajectories and thus exhibits more human-like navigation behaviors.

AAMAS Conference 2016 Conference Paper

Posting Prices for a Perishable Item (Extended Abstract)

  • Bo Zheng
  • Li Xiao
  • Guang Yang
  • Tao Qin

In this paper, we study the problem of posting take-it-or-leave-it prices for a perishable item to maximize the seller's expected revenue. Agents arrive independently over time, and each of them makes a purchase decision according to the price. We study the case in which the seller has no prior information about agents' valuations, whereas the benchmark in the competitive analysis is an optimal mechanism that knows the value distribution. We propose a mechanism Γ1 that obtains constant competitive ratios under various valuation models. We also conduct numerical simulations and validate the empirical performance of Γ1.

IROS Conference 2002 Conference Paper

Multi-agent control algorithms for chemical cloud detection and mapping using unmanned air vehicles

  • Michael A. Kovacina
  • Daniel W. Palmer
  • Guang Yang
  • Ravi Vaidyanathan

Traditional control approaches fall well short of the flexibility and efficiency needed to meet the commercial and military demands placed upon UAV swarms. Effective coordination of these swarms requires control strategies based on emergent behavior. We have developed a rule-based, decentralized control algorithm that relies on constrained randomized behavior and respects UAV restrictions on sensors, computation, and flight envelope. To demonstrate and evaluate the effectiveness of our approach, we have created a simulation of an air-vehicle swarm searching for and mapping a chemical cloud within a patrolled region, and we consider several detection and mapping strategies based on emergent behavior. We establish an inverse linear relation between the size of the swarm and the time to detect the cloud, regardless of the size of the cloud, and further show that swarm size is linearly related to successful detection of the cloud.
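The reported inverse relation between swarm size and detection time can be illustrated with a toy Monte Carlo model (a hypothetical memoryless search over a grid of cells, not the paper's swarm simulator): if each of n UAVs independently probes one of 100 cells per time step and the cloud covers 10 cells, the expected detection time shrinks roughly as 1/n.

```python
import random

def time_to_detect(n_uavs, n_cells=100, cloud=range(45, 55), seed=0):
    """Steps until any of n_uavs UAVs samples a cloud cell.

    Toy memoryless search: each UAV probes one uniformly random cell
    per step (hypothetical model, not the paper's swarm simulation).
    """
    rng = random.Random(seed)
    t = 0
    while True:
        t += 1
        if any(rng.randrange(n_cells) in cloud for _ in range(n_uavs)):
            return t

# Averaged over many runs, detection time falls as the swarm grows.
mean_1 = sum(time_to_detect(1, seed=s) for s in range(300)) / 300
mean_10 = sum(time_to_detect(10, seed=s) for s in range(300)) / 300
print(mean_1 > mean_10)  # larger swarms detect the cloud sooner
```

Under this model a single UAV detects per step with probability 0.1 (expected time ~10 steps), while ten UAVs detect with probability 1 - 0.9^10 ≈ 0.65 (expected time ~1.5 steps), consistent with the inverse trend the paper reports.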