Arrow Research search

Author name cluster

Yue Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

AAAI Conference 2026 Conference Paper

EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion

  • Tong Chen
  • Xinyu Ma
  • Long Bai
  • Wenyang Wang
  • Yue Sun
  • Luping Zhou

Endoscopic images often suffer from diverse and co-occurring degradations such as low lighting, smoke, and bleeding, which obscure critical clinical details. Existing restoration methods are typically task-specific and often require prior knowledge of the degradation type, limiting their robustness in real-world clinical use. We propose EndoIR, an all-in-one, degradation-agnostic diffusion-based framework that restores multiple degradation types using a single model. EndoIR introduces a Dual-Domain Prompter that extracts joint spatial–frequency features, coupled with an adaptive embedding that encodes both shared and task-specific cues as conditioning for denoising. To mitigate feature confusion in conventional concatenation-based conditioning, we design a Dual-Stream Diffusion architecture that processes clean and degraded inputs separately, with a Rectified Fusion Block integrating them in a structured, degradation-aware manner. Furthermore, a Noise-Aware Routing Block improves efficiency by dynamically selecting only noise-relevant features during denoising. Experiments on the SegSTRONG-C and CEC datasets demonstrate that EndoIR achieves state-of-the-art performance across multiple degradation scenarios while using fewer parameters than strong baselines, and downstream segmentation experiments confirm its clinical utility.
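
The abstract does not detail the routing mechanism itself; purely as an illustration of the general idea of noise-aware top-k feature routing (the linear gate and all names below are hypothetical, not the paper's module), a minimal sketch:

```python
def noise_aware_route(features, noise_level, gate_w, k=2):
    """Toy top-k routing: a linear gate scores each feature channel from
    the current noise level, and only the k best-scoring channels pass
    through; the rest are zeroed, which is what saves compute in a real
    denoising network."""
    # One (bias, slope) pair per channel: score = w0 + w1 * noise_level.
    scores = [w0 + w1 * noise_level for (w0, w1) in gate_w]
    # Stable sort, so tied channels keep their original order.
    keep = set(sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k])
    return [f if i in keep else 0.0 for i, f in enumerate(features)]
```

Because the gate depends on the noise level, different channels survive at different points in the denoising trajectory.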

AAAI Conference 2026 Conference Paper

HiFi-Mesh: High-Fidelity Efficient 3D Mesh Generation via Compact Autoregressive Dependence

  • Yanfeng Li
  • Tao Tan
  • Qinquan Gao
  • Zhiwen Cao
  • Xiaohong Liu
  • Yue Sun

High-fidelity 3D meshes can be tokenized into one-dimensional (1D) sequences and directly modeled using autoregressive approaches over faces and vertices. However, existing methods suffer from insufficient resource utilization, resulting in slow inference and the ability to handle only small-scale sequences, which severely constrains the expressible structural details. We introduce the Latent Autoregressive Network (LANE), which incorporates compact autoregressive dependencies in the generation process, achieving a 6× improvement in maximum generatable sequence length compared to existing methods. To further accelerate inference, we propose the Adaptive Computation Graph Reconfiguration (AdaGraph) strategy, which effectively overcomes the efficiency bottleneck of traditional serial inference through spatiotemporal decoupling in the generation process. Experimental validation demonstrates that LANE achieves superior performance across generation speed, structural detail, and geometric consistency, providing an effective solution for high-quality 3D mesh generation.

JBHI Journal 2026 Journal Article

M³SegNet: A Multi-Modal and Multi-Branch Framework for Nasopharyngeal Carcinoma Segmentation in Radiotherapy Planning

  • Junqiang Ma
  • Luyi Han
  • Henry H. Y. Tong
  • DengQiang Jia
  • Hui Xie
  • Anne W. M. Lee
  • Hing Ming Hung
  • Tao Tan

Accurate and simultaneous labeling of multiple structures, including gross tumor volumes, clinical target volumes, and organs at risk, is a fundamental multi-task requirement for radiotherapy planning in nasopharyngeal carcinoma. However, conventional manual labeling is labor-intensive and suffers from substantial inter-observer variability. This variability poses a significant challenge to the multi-modal interpretation of CT and MRI scans. Against this backdrop, automated approaches, particularly multi-modal and multi-task learning, are promising solutions. However, their clinical adoption is limited by three urgent needs: attention mechanisms that fuse multi-modal information at both local and global views, explicit incorporation of anatomical priors to regularize predictions, and a unified framework that enables concurrent segmentation of all desired structures. To overcome these limitations, we propose M³SegNet, a novel multi-modal and multi-branch framework that concurrently performs all clinically relevant segmentation tasks, integrating feature fusion and anatomical guidance. Our primary contributions are threefold. First, we introduce the Synergistic Global-Local Attention that extracts informative features from various imaging modalities (CT, T1-weighted, T2-weighted, and T1 contrast). Second, we propose an Anatomy-Aware Hierarchical Learning strategy that uses OAR spatial information to guide tumor segmentation. We also integrate Random Modality Dropout to enhance robustness against missing modalities. We validated M³SegNet on an internal 257-patient NPC dataset and confirmed its generalizability on three external datasets. In experiments, our framework significantly outperformed state-of-the-art methods. By providing a mechanism to leverage multi-modal information and anatomical priors, our M³SegNet offers a reliable, automated, and clinically translatable solution for NPC radiotherapy planning.

AAAI Conference 2026 Conference Paper

SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition

  • Qilang Ye
  • Yu Zhou
  • Lian He
  • Jie Zhang
  • Xuanming Guo
  • Jiayu Zhang
  • Mingkui Tan
  • Weicheng Xie

Large Language Models (LLMs) hold rich implicit knowledge and powerful transferability. In this paper, we explore the combination of LLMs with the human skeleton to perform action classification and description. However, when treating the LLM as a recognizer, two questions arise: 1) How can LLMs understand the skeleton? 2) How can LLMs distinguish among actions? To address these problems, we introduce a novel paradigm named learning Skeleton representation with visual-motion knowledge for Action Recognition (SUGAR). In our pipeline, we first utilize off-the-shelf large-scale video models as a knowledge base to generate visual and motion information related to actions. Then, we propose to supervise skeleton learning through this prior knowledge to yield discrete representations. Finally, we use the LLM with untouched pre-training weights to understand these representations and generate the desired action targets and descriptions. Notably, we present a Temporal Query Projection (TQP) module to continuously model skeleton signals with long sequences. Experiments on several skeleton-based action classification benchmarks demonstrate the efficacy of our SUGAR. Moreover, experiments in zero-shot scenarios show that SUGAR is more versatile than linear-based methods.
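
The abstract states that skeleton features are supervised into discrete representations for a frozen LLM but does not specify the tokenizer; one standard way to discretize continuous features is nearest-codebook (VQ-style) quantization. A minimal sketch under that assumption (the codebook and function names are hypothetical):

```python
def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest
    codebook entry (squared Euclidean distance), yielding the discrete
    tokens that a frozen language model could consume as pseudo-words."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: sq_dist(f, codebook[i]))
            for f in features]
```

In a VQ setup the codebook itself is learned jointly with the encoder; here it is treated as given for clarity.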

JBHI Journal 2025 Journal Article

3MT-Net: A Multi-Modal Multi-Task Model for Breast Cancer and Pathological Subtype Classification Based on a Multicenter Study

  • Yaofei Duan
  • Patrick Cheong-Iao Pang
  • Ping He
  • Rongsheng Wang
  • Yue Sun
  • Chuntao Liu
  • Xiaorong Zhang
  • Xirong Yuan

Breast cancer poses a significant threat to women's health, and ultrasound plays a critical role in the assessment of breast lesions. This study introduces a prospective deep learning architecture, termed the “Multi-modal Multi-task Network” (3MT-Net), which integrates clinical data with B-mode and color Doppler ultrasound images. Specifically, an AM-CapsNet is employed to extract key features from ultrasound images, while a cascaded cross-attention mechanism is utilized to fuse clinical data. Moreover, an ensemble learning approach with an optimization algorithm is adopted to dynamically assign weights to different modalities, accommodating both high-dimensional and low-dimensional data. The 3MT-Net performs binary classification of benign versus malignant lesions and further classifies the pathological subtypes. Data were retrospectively collected from nine medical centers to ensure the broad applicability of the 3MT-Net. Two separate test sets were created and extensive experiments were conducted. Comparative analyses demonstrated that the 3MT-Net outperforms the industry-standard computer-aided detection product S-Detect by 1.4% to 3.8% in AUC.

JBHI Journal 2025 Journal Article

BRPDNet: A BioRegion Prompt Distillation Network for Physiological Monitoring

  • Zhengxuan Chen
  • Bin Huang
  • Kangyang Cao
  • Tao Tan
  • Bingsheng Huang
  • Chan-Tong Lam
  • Yue Sun

Physiological signal extraction from video data is challenging in dynamic and occluded environments, requiring both accuracy and real-time performance. Existing methods struggle to balance accuracy with model efficiency, particularly under partial facial occlusion or redundant signals. We propose BRPDNet, a novel framework for efficient physiological signal extraction that includes a BioRegion Prompt module for adaptive convolution and a Hyper Distillation module to reduce signal redundancy, ensuring high accuracy and robustness, especially in dynamic and occluded environments. Additionally, the teacher-student network structure enhances the model's adaptability to occlusions and reduces computational complexity without relying on explicit segmentation. Experimental results show that BRPDNet outperforms state-of-the-art models in accuracy, robustness, and efficiency across multiple datasets. For instance, BRPDNet achieves a Mean Absolute Error (MAE) of 1.55 beats per minute (bpm) and a Pearson Correlation Coefficient (PCC) of 0.76 on the PURE and UBFC-rPPG datasets with fewer parameters than existing models, ensuring efficient real-time performance.

JBHI Journal 2025 Journal Article

HRMamba: Fusing Luminance Information for Remote Physiological Measurement in Varied Lighting Conditions

  • Kaiwen Yang
  • Nuoer Long
  • Wei Ke
  • Chan-Tong Lam
  • Tao Tan
  • Zitong Yu
  • Yue Sun

Camera-based photoplethysmography (cbPPG) represents a non-invasive technique for capturing physiological parameters through facial videos, enabling the extraction of vital signs such as heart rate, respiration rate, and blood oxygen saturation without direct physical contact. Existing deep learning methods face two core challenges when dealing with cbPPG: firstly, extracting weak PPG signals from video segments with large spatial and temporal redundancy and understanding their periodic patterns in long contexts; secondly, accurately extracting PPG signals in complex lighting environments, especially in low-light conditions. To address these issues, this paper proposes an end-to-end method based on Mamba, named HRMamba. This method employs a temporal-difference Mamba to process temporal signals and combines a bidirectional state space to enable Mamba to robustly understand the scene and learn the periodic patterns of PPG. Furthermore, a luminance post-processing module is designed to extract luminance information from the video, without enhancing lighting or altering the original video data, and embed it into the PPG signal. Experimental results demonstrate that HRMamba achieves state-of-the-art performance, and the designed luminance post-processing module can be applied in various lighting environments, significantly enhancing performance in dark environments without degrading performance in normal-light scenes.

JBHI Journal 2025 Journal Article

UniMRISegNet: Universal 3D Network for Various Organs and Cancers Segmentation on Multi-Sequence MRI

  • Zhuoneng Zhang
  • Luyi Han
  • Tianyu Zhang
  • Zehui Lin
  • Qinquan Gao
  • Tong Tong
  • Yue Sun
  • Tao Tan

Three-dimensional organ and cancer segmentation based on multi-sequence MRI is crucial for assisting clinical diagnosis. However, current automated segmentation methods often focus on specific sequences, specific organs, and specific cancers, i.e., they lack generality. To address this issue, we propose a universal segmentation network for multi-sequence MRI (UniMRISegNet) that can segment multiple organs and cancers. UniMRISegNet features a shared encoder-decoder architecture equipped with contextual prompt generation (CPG) and prompt-conditioned dynamic convolution (PCDC) modules. The CPG module encodes sequence-specific, position-specific, and organ/cancer-specific text prompts as prior information to inform UniMRISegNet about the specific task to be executed. The PCDC module can adaptively generate model weights based on the assigned prompts, enhancing the segmentation capabilities of UniMRISegNet for specific tasks. To mitigate discrepancies between different sequences of the same organ and capture similarities between related sequences, we design a novel loss function called Semantic-Aware Cosine Similarity Loss (SACSL), which integrates the cosine similarity of text embeddings to reconcile discrepancies and similarities between MRI sequences of the same organ. We created a large-scale annotated multi-sequence, multi-organ, and multi-cancer segmentation workflow (MSOCS), and demonstrated that our UniMRISegNet outperforms other universal networks and single-task networks on MSOCS. Furthermore, the universal weights from MSOCS can be transferred to never-before-seen downstream tasks, achieving superior performance compared to training from scratch.
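
As a hedged illustration of the prompt-conditioned dynamic convolution idea (a toy 1-D version, not the paper's actual PCDC module; the linear weight generator and all names are hypothetical), a short sketch:

```python
def dynamic_conv1d(signal, prompt, proj, k=3):
    """Prompt-conditioned dynamic convolution, 1-D toy version: a linear
    'weight generator' maps the prompt embedding to the convolution
    kernel, so the same layer applies different weights per task."""
    # kernel[j] = sum_i prompt[i] * proj[i][j] -- the generated weights.
    kernel = [sum(p * row[j] for p, row in zip(prompt, proj)) for j in range(k)]
    pad = k // 2
    # Zero padding keeps the output the same length as the input.
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(signal))]
```

With a one-hot prompt, the generated kernel is simply one row of the projection, so each task effectively selects its own filter.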

IJCAI Conference 2025 Conference Paper

Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction

  • Zhi Sheng
  • Daisy Yuan
  • Jingtao Ding
  • Qi Yan
  • Xi Zheng
  • Yue Sun
  • Yong Li

Accurate prediction of mobile traffic, i.e., network traffic from cellular base stations, is crucial for optimizing network performance and supporting urban development. However, the non-stationary nature of mobile traffic, driven by human activity and environmental changes, leads to both regular patterns and abrupt variations. Diffusion models excel at capturing such complex temporal dynamics due to their ability to capture the inherent uncertainties. Most existing approaches prioritize designing novel denoising networks but often neglect the critical role of noise itself, potentially leading to sub-optimal performance. In this paper, we introduce a novel perspective by emphasizing the role of noise in the denoising process. Our analysis reveals that noise fundamentally shapes mobile traffic predictions, exhibiting distinct and consistent patterns. We propose NPDiff, a framework that decomposes noise into prior and residual components, with the prior derived from data dynamics, enhancing the model's ability to capture both regular and abrupt variations. NPDiff can seamlessly integrate with various diffusion-based prediction models, delivering predictions that are effective, efficient, and robust. Extensive experiments demonstrate that it achieves superior performance with an improvement of over 30%, offering a new perspective on leveraging diffusion models in this domain. We provide code and data at https://github.com/tsinghua-fib-lab/NPDiff.
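
NPDiff's exact construction is in the linked repository; purely as an illustration of the prior-plus-residual idea, here is a minimal sketch in which the noise prior is derived from the normalized periodic mean of the historical series (the blending weight `alpha` and all names are hypothetical, not the paper's formulation):

```python
import math

def decompose_noise(noise, history, period, alpha=0.7):
    """Split a noise sample into a data-driven prior and a residual.
    The prior repeats the normalized periodic mean of the history (one
    toy way to encode the 'regular patterns' in mobile traffic); the
    residual is whatever the prior does not explain."""
    # Per-phase mean over the history: one value per position in the cycle.
    pattern = [sum(history[i::period]) / len(history[i::period]) for i in range(period)]
    # Normalize to zero mean / unit scale so the prior lives on the same
    # scale as the Gaussian noise used during diffusion.
    mu = sum(pattern) / period
    sd = math.sqrt(sum((v - mu) ** 2 for v in pattern) / period) or 1.0
    prior = [alpha * (pattern[t % period] - mu) / sd for t in range(len(noise))]
    residual = [n - p for n, p in zip(noise, prior)]
    return prior, residual
```

By construction prior + residual reconstructs the original noise exactly, so the decomposition can be dropped into any diffusion sampler without changing its marginal behavior at `alpha = 0`.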

ECAI Conference 2024 Conference Paper

Diversity-Enhanced Learning for Unsupervised Syntactically Controlled Paraphrase Generation

  • Shaojuan Wu
  • Jitong Li
  • Yue Sun
  • Xiaowang Zhang
  • Zhiyong Feng 0002

Syntactically controlled paraphrase generation aims to produce diverse sentences that preserve the semantics of the given original sentence while conforming to a target syntactic structure. A key opportunity to enhance diversity is to make word substitutions during rephrasing under syntactic control. Existing unsupervised methods have made great progress in syntactic control, but their paraphrases rarely contain substitutions due to limitations of the training data. In this paper, we propose a Diversity syntactically controlled Paraphrase generation framework (DiPara), in which a novel training strategy is designed to obtain semantically faithful sentences while using the given sentence as the training target. As substituted words vary the syntactic structure around them, we propose a phrase-aware attention mechanism to capture the syntactic structure associated with the current word; to support this, a linearized triple sequence is introduced to represent the structure on its own. Experimental results on two datasets show that DiPara outperforms strong baselines; in particular, diversity (Self-BLEU4) is improved by 10.18% on ParaNMT-Small.

NeurIPS Conference 2021 Conference Paper

Towards Sample-efficient Overparameterized Meta-learning

  • Yue Sun
  • Adhyyan Narang
  • Ibrahim Gulluk
  • Samet Oymak
  • Maryam Fazel

An overarching goal in machine learning is to build a generalizable model with few samples. To this end, overparameterization has been the subject of immense interest to explain the generalization ability of deep nets even when the size of the dataset is smaller than that of the model. While the prior literature focuses on the classical supervised setting, this paper aims to demystify overparameterization for meta-learning. Here we have a sequence of linear-regression tasks and we ask: (1) Given earlier tasks, what is the optimal linear representation of features for a new downstream task? and (2) How many samples do we need to build this representation? This work shows that surprisingly, overparameterization arises as a natural answer to these fundamental meta-learning questions. Specifically, for (1), we first show that learning the optimal representation coincides with the problem of designing a task-aware regularization to promote inductive bias. We leverage this inductive bias to explain how the downstream task actually benefits from overparameterization, in contrast to prior works on few-shot learning. For (2), we develop a theory to explain how feature covariance can implicitly help reduce the sample complexity well below the degrees of freedom and lead to small estimation error. We then integrate these findings to obtain an overall performance guarantee for our meta-learning algorithm. Numerical experiments on real and synthetic data verify our insights on overparameterized meta-learning.
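
As a toy instance of the covariance-as-inductive-bias idea described above (my own illustration, not the paper's estimator: a ridge penalty weighted by per-feature prior variance learned from earlier tasks, solved by plain gradient descent):

```python
def task_aware_ridge(X, y, prior_var, lam=1.0, lr=0.005, steps=5000):
    """Linear regression with a task-aware diagonal regularizer:
        minimize ||Xw - y||^2 + lam * sum_j w_j^2 / prior_var[j].
    Directions with low variance across earlier tasks (prior_var must be
    positive) are penalized harder, steering the scarce-data solution
    toward the shared representation."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        resid = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [2.0 * sum(resid[i] * X[i][j] for i in range(n))
                + 2.0 * lam * w[j] / prior_var[j] for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

With a single sample and two perfectly correlated features, the fit loads almost entirely on the feature that varied across earlier tasks, which is the inductive-bias effect the abstract describes.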

JBHI Journal 2021 Journal Article

Using BI-RADS Stratifications as Auxiliary Information for Breast Masses Classification in Ultrasound Images

  • Jie Xing
  • Chao Chen
  • Qinyang Lu
  • Xun Cai
  • Aijun Yu
  • Yi Xu
  • Xiaoling Xia
  • Yue Sun

Breast Ultrasound (BUS) imaging has been recognized as an essential imaging modality for breast mass classification in China. Current deep learning (DL) based solutions for BUS classification feed ultrasound (US) images into deep convolutional neural networks (CNNs) to learn a hierarchical combination of features for discriminating malignant and benign masses. One existing problem in DL-based BUS classification is the lack of spatial and channel-wise feature weighting, which inevitably allows interference from redundant features and lowers sensitivity. In this study, we aim to incorporate the instructive information provided by the Breast Imaging Reporting and Data System (BI-RADS) within DL-based classification. A novel DL-based BI-RADS Vector-Attention Network (BVA Net), trained with both texture information and decoded information from BI-RADS stratifications, is proposed for the task. Three baseline models, pre-trained DenseNet-121, ResNet-50, and Residual-Attention Network (RA Net), were included for comparison. Experiments were conducted on a large-scale private main dataset and two public datasets, UDIAT and BUSI. On the main dataset, BVA Net outperformed the other models in terms of AUC (area under the receiver operating characteristic curve, 0.908), ACC (accuracy, 0.865), sensitivity (0.812), and precision (0.795). BVA Net also achieved high AUC (0.87 and 0.882) and ACC (0.859 and 0.843) on UDIAT and BUSI, respectively. Moreover, we propose an integrated classification method that combines BVA Net binary classification with BI-RADS stratification estimation. The integrated classification helped improve overall sensitivity while maintaining a high specificity.

JBHI Journal 2020 Journal Article

Adaptive-Guided-Coupling-Probability Level Set for Retinal Layer Segmentation

  • Yue Sun
  • Sijie Niu
  • Xizhan Gao
  • Jie Su
  • Jiwen Dong
  • Yuehui Chen
  • Li Wang

Quantitative assessment of retinal layer thickness in spectral-domain optical coherence tomography (SD-OCT) images is vital for clinicians to determine the degree of ophthalmic lesions. However, due to complex retinal tissues, high levels of speckle noise, and low intensity contrast, accurately recognizing the retinal layer structure remains a challenge. To overcome this problem, this paper proposes an adaptive-guided-coupling-probability level set method for retinal layer segmentation in SD-OCT images. Specifically, based on Bayes's theorem, each voxel's probability representation is composed of two probability terms in our method. The first term is constructed as a neighborhood Gaussian fitting distribution to characterize intensity information for each intra-retinal layer. The second is a boundary probability map generated by combining anatomical priors and adaptive thickness information to ensure that surfaces evolve within a proper range. Then, the voxel probability representation is introduced into the proposed segmentation framework, based on a coupling-probability level set, to detect layer boundaries. A total of 1792 retinal B-scan images from 4 SD-OCT cubes of healthy eyes, 5 cubes of abnormal eyes with central serous chorioretinopathy, and 5 cubes of abnormal eyes with age-related macular disease are used to evaluate the proposed method. The experiments demonstrate that the segmentation results obtained by the proposed method show good consistency with the ground truth, and that the proposed method outperforms six other methods in layer segmentation of uneven retinal SD-OCT images.

NeurIPS Conference 2019 Conference Paper

Escaping from saddle points on Riemannian manifolds

  • Yue Sun
  • Nicolas Flammarion
  • Maryam Fazel

We consider minimizing a nonconvex, smooth function $f$ on a Riemannian manifold $\mathcal{M}$. We show that a perturbed version of the gradient descent algorithm converges to a second-order stationary point for this problem (and hence is able to escape saddle points on the manifold). While the unconstrained problem is well-studied, our result is the first to prove such a rate for nonconvex, manifold-constrained problems. The rate of convergence depends as $1/\epsilon^2$ on the accuracy $\epsilon$, which matches a rate known only for unconstrained smooth minimization. The convergence rate also has a polynomial dependence on the parameters denoting the curvature of the manifold and the smoothness of the function.
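
A hedged Euclidean sketch of the perturbation idea (taking $\mathcal{M} = \mathbb{R}^n$ so the exponential map reduces to vector addition; the constants and the escape test below are simplified illustrations, not the paper's algorithm or guarantees):

```python
import random

def perturbed_gd(grad, x, lr=0.01, eps=1e-3, radius=0.05, rounds=10, inner=3000, seed=0):
    """Gradient descent that, whenever the gradient is tiny (a possible
    saddle), injects a small random perturbation and descends again.
    If the perturbation fails to move us far, the point is declared an
    approximate second-order stationary point and returned."""
    rng = random.Random(seed)

    def descend(z):
        # Plain gradient descent until the gradient is small.
        for _ in range(inner):
            g = grad(z)
            if max(abs(v) for v in g) < eps:
                break
            z = [zi - lr * gi for zi, gi in zip(z, g)]
        return z

    x = descend(x)
    for _ in range(rounds):
        # Small gradient: minimum or saddle? Perturb in a small ball and retry.
        x_try = descend([xi + rng.uniform(-radius, radius) for xi in x])
        if all(abs(a - b) < 10 * radius for a, b in zip(x, x_try)):
            return x_try  # perturbation did not escape: likely a local minimum
        x = x_try  # escaped a saddle; continue from the new basin
    return x
```

On $f(x, y) = x^2 + y^4 - y^2$, which has a strict saddle at the origin, plain gradient descent started at $(0, 0)$ never moves, while the perturbed version reaches one of the minima at $y = \pm 1/\sqrt{2}$.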

IROS Conference 2018 Conference Paper

Unmanned Aerial Auger for Underground Sensor Installation

  • Yue Sun
  • Adam Plowcha
  • Mark Nail
  • Sebastian G. Elbaum
  • Benjamin Terry
  • Carrick Detweiler

Using an Unmanned Aerial System (UAS) to autonomously deploy soil sensors enables their installation in otherwise hard-to-access locations. In this paper, we present a system that integrates a UAS and a digging mechanism which can carry, secure, and install a small sensor into soil effectively and efficiently. The integrated system includes 1) a low-profile, lightweight, inexpensive auger mechanism, 2) a sensor carrying and deploying mechanism with low power consumption, and 3) sensors and software that control and evaluate auger performance during digging. When tested on a suite of target soils at a target depth of 120 mm, the system achieved a success rate of 100% in indoor tests and 92.5% outdoors, verifying the potential of the approach.