Arrow Research search

Author name cluster

Yang Gu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

JBHI Journal 2026 Journal Article

APSevLM: Acute Pancreatitis Severity Language Model

  • Leqi Zheng
  • Jiajun Fang
  • Hongyi Chen
  • Naiqing Li
  • Yunyuan Huang
  • Qiulin Ge
  • Yang Gu
  • Tao Yu

Approximately one-fifth of patients with acute pancreatitis (AP) develop severe forms, which are associated with high mortality rates, making early prediction of severity crucial for effective patient management. In this study, we present APSevLM (Acute Pancreatitis Severity Language Model), a large language model (LLM)-based approach that integrates admission-time clinical data, imaging reports, and expert knowledge to predict AP severity at an early stage. Through a comprehensive evaluation using data from over five hundred patients, APSevLM outperforms traditional scoring systems (BISAP and MCTSI), conventional machine learning algorithms, and state-of-the-art deep learning models, achieving an AUC of 0.857. Attention visualizations reveal how the model dynamically weighs different information modalities according to case severity. Furthermore, a systematic feature importance analysis identifies key predictive factors, particularly hematological parameters and cardiac markers, offering valuable insights for clinical practice. Our study positions APSevLM as an accurate predictive model and highlights potential biomarkers for the early diagnosis of severe AP.

AAAI Conference 2026 Conference Paper

State Mamba: Spatiotemporal EEG State-Space Model with Dynamic Brain Alignment for Cross-Subject Representation

  • Weining Weng
  • Yang Gu
  • Yuan Ma
  • Yuchen Liu
  • Yingwei Zhang
  • Yiqiang Chen

Cross-subject EEG decoding remains a fundamental challenge due to substantial inter-subject variability in brain activity, which hinders the development of subject-independent EEG models. Despite progress in extracting cross-subject invariant features, existing studies neglect the shared neural responses that arise under similar cognitive or emotional states across individuals, limiting their ability to learn generalized and consistent EEG representations. To address these challenges, we propose State Mamba, a novel spatiotemporal EEG state-space model that explicitly models and aligns neural responses and their spatiotemporal state transitions to learn consistent and generalizable representations across subjects. State Mamba formulates a multi-channel Mamba architecture that jointly models spatial and temporal brain state transitions, supporting principled analysis of neural responses. To enhance spatiotemporal feature coupling, we introduce the LGANN module, which adopts global-local attention to integrate long- and short-term brain activity into a compact EEG representation. Furthermore, we design two self-supervised pretext tasks to extract consistent neural patterns across subjects: (1) representation alignment, which aligns EEG representations, and (2) pattern alignment, which aligns their transition rules under identical conditions, jointly promoting subject-invariant EEG representations. Extensive experiments on three benchmark datasets, FACED, DEAP, and ISRUC, demonstrate the superior performance of State Mamba in cross-subject emotion and sleep recognition tasks, validating its robust generalization capability.

AAAI Conference 2025 Conference Paper

ADELA: Accelerating Evolutionary Design of Machine Learning Pipelines with the Accompanying Surrogate Model

  • Yang Gu
  • Jian Cao
  • Hengyu You
  • Nengjun Zhu
  • Shiyou Qian

The end-to-end automated design of machine learning (ML) pipelines significantly reduces the workload for data scientists and democratizes ML for non-experts. Evolutionary algorithm (EA)-based automated ML (AutoML) systems, a prominent category of AutoML, often face inefficiencies due to the costly fitness evaluation of candidate ML pipelines. Although surrogate models have been employed to approximate the true performance of pipelines more quickly, a key challenge remains in effectively bridging the semantic gap between the heterogeneous features of datasets and pipelines. To address this issue, we propose ADELA, a novel accompanying surrogate-based optimization strategy that accelerates EA-based AutoML while retaining the performance of the resulting pipelines. ADELA operates in two phases: Offline, leveraging a high-quality curated pipeline corpus to meta-learn an accompanying surrogate model; and Online, selecting the accompanying pipeline and using the learned model to predict the performance of evaluation pipelines instead of executing them. The accompanying mechanism effectively mitigates the semantic gap between datasets and pipelines, enabling ADELA to reduce computation times by an average of 73.66% while retaining 98.78% of the final pipeline performance, as demonstrated in extensive experimental evaluations.
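The core acceleration pattern described in this abstract, an evolutionary search whose fitness evaluations are served by a surrogate model instead of actually training each pipeline, can be sketched in miniature. Everything below is a hypothetical stand-in: the `(n_features, model_depth)` search space, the analytic `surrogate`, and the mutation rule are illustrations, not ADELA's meta-learned accompanying surrogate or pipeline corpus.

```python
import random

random.seed(0)

# Toy search space: each "pipeline" is a (n_features, model_depth) config.
SEARCH_SPACE = [(f, d) for f in range(1, 11) for d in range(1, 11)]

def surrogate(pipeline):
    # Stand-in for the meta-learned accompanying surrogate: it predicts
    # pipeline quality cheaply instead of training and evaluating it.
    f, d = pipeline
    return -((f - 6) ** 2 + (d - 4) ** 2)  # hypothetical optimum at (6, 4)

def mutate(pipeline):
    # Perturb each configuration slot by at most one step, staying in range.
    f, d = pipeline
    f = min(10, max(1, f + random.choice([-1, 0, 1])))
    d = min(10, max(1, d + random.choice([-1, 0, 1])))
    return (f, d)

def evolve(generations=30, pop=10):
    population = random.sample(SEARCH_SPACE, pop)
    for _ in range(generations):
        population.sort(key=surrogate, reverse=True)
        parents = population[: pop // 2]              # elitist selection
        population = parents + [mutate(p) for p in parents]
    return max(population, key=surrogate)

best = evolve()
```

Because every fitness call is a cheap prediction rather than a pipeline execution, the loop runs orders of magnitude faster, which is the trade-off the reported 73.66% time reduction quantifies.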

JBHI Journal 2025 Journal Article

PhysCL: Knowledge-Aware Contrastive Learning of Physiological Signal Models for Cuff-Less Blood Pressure Estimation

  • Renju Liu
  • Jianfei Shen
  • Yang Gu
  • Yiqiang Chen
  • Jiling Zhang
  • Qingyu Wu
  • Chenyang Xu
  • Feiyi Fan

Training deep learning models for photoplethysmography (PPG)-based cuff-less blood pressure estimation often requires a substantial amount of labeled data collected through sophisticated medical instruments, posing significant challenges in practical applications. To address this issue, we propose Physiological Knowledge-Aware Contrastive Learning (PhysCL), a novel approach designed to reduce the dependence on labeled PPG data while improving blood pressure estimation accuracy. Specifically, PhysCL tackles the semantic consistency problem in contrastive learning by introducing a knowledge-aware augmentation bank, which generates positive physiological signal pairs under knowledge-based constraints during contrastive pair generation. Additionally, we propose a contrastive feature reconstruction method to enhance feature diversity and prevent model collapse through feature re-sampling and re-weighting. We evaluate PhysCL on data from 106 subjects across the MIMIC III, MIMIC IV, and UQVS datasets under cross-dataset validation settings, comparing it against state-of-the-art contrastive learning methods and blood pressure estimation models. PhysCL achieves an average mean absolute error of 9.5/5.9 mmHg (systolic/diastolic) across the three datasets, using only 2% labeled data combined with 98% unlabeled data for pre-training and 5 samples for personalization, which represents a 6.2%/4.3% improvement, respectively, over the current best supervised methods. The ablation study provides further evidence that unlabeled data can improve existing cuff-less blood pressure estimation models and sheds light on unsupervised contrastive learning for physiological signals.
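The knowledge-aware augmentation idea, generating positive contrastive views only through transformations that keep the signal physiologically plausible, can be sketched roughly as below. The specific transformations (mild amplitude scaling and a small circular time shift) and their bounds are illustrative assumptions; the paper's augmentation bank and its constraints are richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def knowledge_aware_positive(ppg, max_scale=0.05, max_shift=5):
    """Sketch of the augmentation-bank idea: produce a positive view of a
    PPG window using only physiologically plausible perturbations, so the
    pair stays semantically consistent for contrastive learning."""
    scale = 1.0 + rng.uniform(-max_scale, max_scale)  # mild amplitude change
    shift = rng.integers(-max_shift, max_shift + 1)   # sub-beat phase shift
    return np.roll(ppg * scale, shift)

# A synthetic quasi-periodic signal standing in for a real PPG window.
t = np.linspace(0, 4 * np.pi, 200)
view = knowledge_aware_positive(np.sin(t))
```

An unconstrained augmentation (e.g., a large time warp that changes the apparent heart rate) would alter the blood-pressure-relevant semantics of the window, which is exactly the consistency problem the constraints guard against.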

ICRA Conference 2024 Conference Paper

Microrobotic Flight Enabled by Ultralight Ion Thrusters with High Thrust-to-Weight Ratio and Low Fabrication Cost

  • Yang Gu
  • Xianfa Cai
  • Khadga Thakuri
  • Wenyu Yang
  • Yufeng Guo
  • Wei Li 0003

Flying microrobots have garnered growing research interest owing to their technological intricacies and suitability for various applications leveraging miniaturized size. Electrohydrodynamic (EHD) thrust offers advantages by generating propulsion without moving parts, but real-world use is limited by insufficient thrust generation, manufacturing challenges, fragility, and cost. This work presents the design and development of an optimized ion-propelled flying microrobot that excels in low weight, high thrust-to-weight ratio, and cost efficiency. Regarding design, multiphysics simulations guided structural optimization to increase thrust while decreasing weight. For materials, metal-coated polyethylene terephthalate (PET) film was selected to leverage the combined merits of metal conductivity and polymer flexibility, light weight, and low cost, enabling further weight reduction, easy assembly, robustness, and cost-effectiveness. Various experiments, including voltage-current measurements, ionic wind speed, thrust quantification, and airflow visualization, directed design refinements and validated performance. Through structural optimization, the maximum wind speed attained 2.25 m/s. Flight demonstrations with payloads evidenced that the microrobot can stably fly at an inherent 16 mg weight while carrying an additional 72 mg load, achieving a record 5.5 thrust-to-weight ratio. These results open possibilities to incorporate microelectronics, enabling autonomous flight functionality.

AAAI Conference 2020 Conference Paper

An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos

  • Sicheng Zhao
  • Yunsheng Ma
  • Yang Gu
  • Jufeng Yang
  • Tengfei Xing
  • Pengfei Xu
  • Runbo Hu
  • Hua Chai

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e., polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
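The polarity-consistent loss idea, adding a penalty on top of ordinary cross-entropy when the prediction lands in the wrong polarity group of the emotion hierarchy, can be sketched as below. The emotion-to-polarity map and the additive penalty form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical emotion-to-polarity map for illustration (1 = positive,
# 0 = negative); the actual hierarchy follows the datasets in the paper.
POLARITY = {"joy": 1, "surprise": 1, "anger": 0, "fear": 0, "sadness": 0, "disgust": 0}
EMOTIONS = list(POLARITY)

def polarity_consistent_ce(probs, true_emotion, penalty=1.0):
    """Cross-entropy plus a penalty when the argmax prediction falls in
    the wrong polarity group -- a sketch of the loss idea only."""
    probs = np.asarray(probs, dtype=float)
    ce = -np.log(probs[EMOTIONS.index(true_emotion)])
    pred = EMOTIONS[int(probs.argmax())]
    return ce + penalty * float(POLARITY[pred] != POLARITY[true_emotion])

# Same cross-entropy term in both cases, but the second prediction sits in
# the wrong polarity group ("anger" vs. true "joy") and is penalized harder.
l_same = polarity_consistent_ce([0.1, 0.6, 0.1, 0.1, 0.05, 0.05], "joy")
l_cross = polarity_consistent_ce([0.1, 0.1, 0.6, 0.1, 0.05, 0.05], "joy")
```

The extra term makes polarity-crossing mistakes more expensive than within-polarity ones, which is how the constraint can steer attention generation during training.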

AAAI Conference 2020 Conference Paper

Multi-Source Distilling Domain Adaptation

  • Sicheng Zhao
  • Guangzhi Wang
  • Shanghang Zhang
  • Yang Gu
  • Yaxian Li
  • Zhichao Song
  • Pengfei Xu
  • Runbo Hu

Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. However, in practice, labeled data may be collected from multiple sources, while naive application of the single-source DA algorithms may lead to suboptimal solutions. In this paper, we propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones. Specifically, the proposed MDDA includes four stages: (1) pre-train the source classifiers separately using the training data from each source; (2) adversarially map the target into the feature space of each source respectively by minimizing the empirical Wasserstein distance between source and target; (3) select the source training samples that are closer to the target to fine-tune the source classifiers; and (4) classify each encoded target feature by the corresponding source classifier, and aggregate the different predictions using respective domain weights, which correspond to the discrepancy between each source and the target. Extensive experiments are conducted on public DA benchmarks, and the results demonstrate that the proposed MDDA significantly outperforms the state-of-the-art approaches. Our source code is released at: https://github.com/daoyuan98/MDDA.
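Stage (4), aggregating per-source predictions with discrepancy-derived domain weights, can be sketched as below. The abstract does not give the exact weighting rule; a softmax over negative Wasserstein distances is one plausible, hypothetical choice.

```python
import numpy as np

def aggregate_predictions(source_probs, source_distances):
    """Stage (4) sketch: combine per-source classifier outputs using
    domain weights derived from each source's discrepancy to the target.
    The softmax-over-negative-distance weighting is an assumption."""
    d = np.asarray(source_distances, dtype=float)
    w = np.exp(-d) / np.exp(-d).sum()              # closer source -> larger weight
    probs = np.asarray(source_probs, dtype=float)  # shape: (n_sources, n_classes)
    return w @ probs                               # fused class distribution

# Two sources: the first is closer to the target, so it dominates the vote.
fused = aggregate_predictions(
    [[0.9, 0.1], [0.2, 0.8]],  # per-source class probabilities for one sample
    [0.5, 2.0],                # empirical Wasserstein distances to the target
)
```

Weighting by discrepancy lets a source that matches the target distribution well outvote a distant one, rather than averaging all sources uniformly.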

NeurIPS Conference 2019 Conference Paper

Multi-source Domain Adaptation for Semantic Segmentation

  • Sicheng Zhao
  • Bo Li
  • Xiangyu Yue
  • Yang Gu
  • Pengfei Xu
  • Runbo Hu
  • Hua Chai
  • Kurt Keutzer

Simulation-to-real domain adaptation for semantic segmentation has been actively studied for various applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle a more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel-level cycle-consistently towards the target. Second, we propose sub-domain aggregation discriminator and cross-domain cycle discriminator to make different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and target domain while training the segmentation network. Extensive experiments from synthetic GTA and SYNTHIA to real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.

ICRA Conference 2008 Conference Paper

Learning tactic-based motion models with fast particle smoothing

  • Yang Gu
  • Manuela Veloso

Learning parameters of a motion model is an important challenge for autonomous robots. We address the particular instance of parameter learning when tracking motions with a switching state-space model. We present a general algorithm for dealing simultaneously with both unknown fixed model parameters and state variables. Using an Expectation-Maximization approach, we apply a tactic-based multi-model particle filter to estimate the state variables in the E-step, and use particle smoothing to update the parameters in the M-step. We test our algorithm both in simulation and in a team robot soccer environment, as a substrate for applying the learned models to object tracking in a team. One of the soccer robots learns the actuation model of its teammate. The experimental results show that the particle smoothing efficiency is substantially increased and the tracking performance is significantly improved using the learned teammate actuation model.
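The EM scheme described above can be illustrated on a toy 1-D constant-velocity model: the E-step runs a bootstrap particle filter under the current parameter estimate, and the M-step re-estimates the unknown motion parameter from the filtered trajectory. The real algorithm uses a tactic-based multi-model filter with particle smoothing; this stand-in replaces smoothing with filtered means and is a sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)

def em_learn_velocity(obs, n_particles=500, iters=10, obs_noise=1.0):
    """Toy EM loop: alternate particle filtering (E-step) with a velocity
    re-estimate from successive state means (M-step)."""
    v = 0.0                                   # unknown model parameter
    for _ in range(iters):
        x = np.zeros(n_particles)             # particles at t = 0
        means = []
        for z in obs:                         # E-step: filter with current v
            x = x + v + rng.normal(0, 0.5, n_particles)    # propagate
            w = np.exp(-0.5 * ((z - x) / obs_noise) ** 2)  # observation weights
            w /= w.sum()
            x = rng.choice(x, n_particles, p=w)            # resample
            means.append(np.average(x))
        # M-step: velocity that best explains the successive state estimates.
        v = float(np.mean(np.diff([0.0] + means)))
    return v

true_v = 2.0
observations = true_v * np.arange(1, 21) + rng.normal(0, 1.0, 20)
v_hat = em_learn_velocity(observations)
```

Each EM pass tracks more tightly because the propagation step uses the improved parameter, which mirrors the paper's finding that the learned teammate actuation model improves both smoothing efficiency and tracking.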

ICRA Conference 2006 Conference Paper

Multi-model Tracking using Team Actuation Models

  • Yang Gu
  • Manuela Veloso

Robots need to track objects. Object tracking efficiency depends critically on the accuracy of the motion model and of the sensory information. Interestingly, when multiple team members can actuate the object being tracked, the motion can become highly discontinuous and nonlinear. We have previously developed a successful tracking approach that switches among target motion models as a function of one robot's actions. In this paper, we report on a tracking approach that can use a dynamic multiple motion model based on a team coordination plan. We present the multi-model probabilistic tracking algorithms in detail and present empirical results both in simulation and in a human-robot Segway soccer team. The team coordination plan allows the robot to track mobile targets much more effectively.

IROS Conference 2006 Conference Paper

Team-Driven Multi-Model Motion Tracking with Communication

  • Yang Gu
  • Manuela Veloso

Interactions between the robot and the targets being tracked are frequently seen in the robotics community. Modeling these interactions using knowledge of robot cognition improves the performance of the tracker, and communication improves the performance of a multi-agent system. The focus of this paper is to present our solution for integrating communication information into our team-driven multi-model motion tracking. We present the probabilistic tracking algorithm in detail and present empirical results both in simulation and in a Segway soccer team. The information from team communication allows the robot to track mobile targets much more effectively.

AAAI Conference 2005 Conference Paper

Tactic-Based Motion Modeling and Multi-Sensor Tracking

  • Yang Gu

Tracking in essence consists of using sensory information combined with a motion model to estimate the position of a moving object. Tracking efficiency completely depends on the accuracy of the motion model and of the sensory information. For a vision sensor like a camera, the estimation is translated into a command to guide the camera where to look. In this paper, we contribute a method to achieve efficient tracking through using a tactic-based motion model, combined vision and infrared sensory information. We use a supervised learning technique to map the state being tracked to the commands that lead the camera to consistently track the object. We present the probabilistic algorithms in detail and present empirical results both in simulation experiment and from their effective execution in a Segway RMP robot.