
Author name cluster

Yang Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

Mamba-CorRL: Mamba-correlation graph convolutional networks with reinforcement learning for traffic flow prediction

  • Yan Chen
  • Dawen Xia
  • Yanmin Liu
  • Fuchu Zhang
  • Wenyong Zhang
  • Yang Hu
  • Yantao Li
  • Huaqing Li

Accurate traffic flow prediction can support effective decision-making by traffic management departments, but traditional methods face significant challenges, such as inadequate spatiotemporal modeling, the limitations of static adjacency matrices, and inefficiency in capturing complex nonlinear patterns. To this end, this paper proposes Mamba-Correlation Graph Convolutional Networks with Reinforcement Learning (Mamba-CorRL) for traffic flow prediction. Specifically, the Mamba and correlation graph convolutional networks (Mamba-CorGCN) framework combines Mamba with correlation graph convolutional networks (CorGCN) to flexibly explore spatiotemporal dependencies and accurately capture the interactions of traffic flows on different road segments. Moreover, the high-low frequency attention and correlation attention (HiLo CorAttention) module combines the correlation attention mechanism with the HiLo attention mechanism to adaptively weight high- and low-frequency features, improving the ability to capture short-term fluctuations and long-term trends. Finally, the deep deterministic policy gradient (DDPG) module optimizes the prediction strategy by interacting with the environment and adaptively adjusts the adjacency matrix, so that the Mamba-CorGCN framework maintains high accuracy in complex and changing traffic scenarios.
Experimental results demonstrate that Mamba-CorRL outperforms baselines, including historical average (HA), autoregressive integrated moving average (ARIMA), long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN), diffusion convolutional recurrent neural network (DCRNN), graph multi-attention network (GMAN), temporal graph convolutional network (T-GCN), spatio-temporal graph convolutional network (STGCN), attention-based spatial-temporal graph convolutional network (ASTGCN), spatial-temporal synchronous graph convolutional network (STSGCN), adaptive graph convolutional recurrent network (AGCRN), spatial-temporal graph ordinary differential equation network (STGODE), spatial-temporal complex graph convolution network (STCGCN), bidirectional spatial-temporal adaptive transformer (Bi-STAT), attention-based spatial-temporal multi-graph ordinary differential equation network (ASTMGODE), adaptive spatial-temporal transformer network (ASTTN), direction- and distance-aware graph transformer (DDGformer), and dynamic graph convolutional networks with temporal representation learning (DGCN-TRL).
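
The abstract does not say how CorGCN builds its correlation graph. As a rough illustration only, one common way to derive a correlation-based adjacency matrix from per-node traffic series is to threshold pairwise Pearson correlations. The function names and the `threshold` parameter below are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def correlation_adjacency(series, threshold=0.5):
    """Hypothetical correlation graph: edge weight |r| kept only if |r| >= threshold."""
    n = len(series)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0  # self-loop, as is common for GCN inputs
            else:
                r = abs(pearson(series[i], series[j]))
                adj[i][j] = r if r >= threshold else 0.0
    return adj
```

Unlike a static road-distance adjacency, such a matrix changes whenever the input window changes, which is the kind of flexibility the abstract attributes to learned, adaptive adjacency.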

AAAI Conference 2026 Conference Paper

Pano-GS: Perception-Aware Gaussian Optimization with Gradient Consistency and Multi-Criteria Densification for High-Quality Rendering

  • Yang Deng
  • Zhanke Wang
  • Jiahao Wu
  • Jie Liang
  • Jingui Ma
  • Yang Hu
  • Ronggang Wang

Reconstructing 3D scenes from multi-view image sequences remains a significant challenge in practical applications. While recent advances in 3D Gaussian Splatting have enabled high-quality rendering, existing methods rely heavily on a pixel-level L1 loss, which is poorly aligned with human perception and leads to missing high-frequency details and visible artifacts. Additionally, the position-gradient-based densification strategy often leaves Gaussian primitives under-densified, degrading rendering quality. To address these challenges, we propose Pano-GS, a perception-aware Gaussian optimization framework. Specifically, we introduce a gradient consistency-constrained loss to capture high-frequency details, mitigating the inherent shortcomings of the traditional L1 loss and enhancing reconstruction fidelity. In addition, we use a multi-criteria densification strategy to reduce exclusive reliance on average position gradients. Extensive experiments demonstrate that Pano-GS achieves state-of-the-art performance, confirming its effectiveness and robust generalization across diverse real-world scenes.
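
A minimal sketch of why a gradient-based term can complement a pixel-level L1 loss: it compares finite-difference image gradients, so a blurred edge is penalized more than under plain L1, while a uniform brightness offset is not penalized at all. The exact loss used in Pano-GS is not given in the abstract; `gradient_consistency_loss` is a hypothetical stand-in operating on small grayscale images stored as lists of rows.

```python
def image_gradients(img):
    """Forward-difference gradients (dx, dy) of a 2-D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    dx = [[img[y][x + 1] - img[y][x] for x in range(w - 1)] for y in range(h)]
    dy = [[img[y + 1][x] - img[y][x] for x in range(w)] for y in range(h - 1)]
    return dx, dy


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally shaped 2-D lists."""
    count = sum(len(row) for row in a)
    if count == 0:
        return 0.0  # e.g. dy of a single-row image
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / count


def l1_loss(render, target):
    return mean_abs_diff(render, target)


def gradient_consistency_loss(render, target):
    """Hypothetical gradient term: L1 distance between horizontal and vertical gradients."""
    rdx, rdy = image_gradients(render)
    tdx, tdy = image_gradients(target)
    return mean_abs_diff(rdx, tdx) + mean_abs_diff(rdy, tdy)
```

For a sharp edge `[0, 0, 1, 1]` rendered as `[0, 0.5, 0.5, 1]`, the pixel L1 loss is 0.25 while the gradient term is about 0.67, so the gradient term pushes harder toward restoring the edge.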

JBHI Journal 2026 Journal Article

Radar HRV Monitoring With Physiological Prior Inspired Deep Neural Networks

  • Haoyu Wang
  • Jinbo Chen
  • Dongheng Zhang
  • Zhi Lu
  • Yang Hu
  • Qibin Sun
  • Yan Chen

Radar sensing has emerged as a promising solution for the contactless monitoring of Heart Rate Variability (HRV), a crucial indicator of the cardiovascular and autonomic nervous systems. However, due to signal noise and interference that easily obscure heartbeat details, along with variations in heartbeat across different physiological conditions, existing methods remain restricted to laboratory settings with healthy subjects and fail in real-world scenarios involving more complex physiological conditions. In this study, we propose a physiological prior-inspired deep learning framework for robust radar-based HRV monitoring. Specifically, we leverage the prior that internal heartbeats drive movements across the entire torso surface and design a hybrid deep neural network to model the spatio-temporal relationship between full-body radio reflections and heartbeats, effectively mitigating interference. Then, we incorporate the cardiac motion's self-similarity prior to establish a signal augmentation strategy, effectively remodeling the HRV distribution and enhancing performance across diverse physiological conditions. We build and validate our method on a large-scale dataset comprising 7,150 outpatients with complex physiological conditions in real-world scenarios. The experimental results demonstrate that our method achieves a mean IBI error of 19.21 ms, an RMSSD error of 16.23 ms, an SDSD error of 16.70 ms, and a pNN50 error of 7.28%. We further validate the performance by classifying five common cardiac conditions based on HRV results, demonstrating performance comparable to ECG-based methods. These results highlight the great potential of our approach for accurate, contactless HRV monitoring in real-world applications.
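
For reference, the four reported metrics are standard time-domain HRV statistics computed from a series of inter-beat intervals (IBIs). The sketch below uses common textbook definitions; the helper names are hypothetical, and using the population standard deviation for SDSD and a strict `> 50 ms` threshold for pNN50 are assumptions, since conventions vary slightly across tools.

```python
import math


def hrv_metrics(ibi_ms):
    """Time-domain HRV metrics from inter-beat intervals (IBIs) in milliseconds."""
    if len(ibi_ms) < 2:
        raise ValueError("need at least two IBIs")
    # Successive differences between adjacent beats.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    n = len(diffs)
    mean_ibi = sum(ibi_ms) / len(ibi_ms)
    # RMSSD: root mean square of successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / n)
    # SDSD: standard deviation of successive differences (population form assumed).
    mean_diff = sum(diffs) / n
    sdsd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / n)
    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / n
    return {"mean_ibi": mean_ibi, "rmssd": rmssd, "sdsd": sdsd, "pnn50": pnn50}


def metric_errors(estimated, reference):
    """Absolute error per metric, e.g. radar-derived vs. ECG-derived IBI series."""
    est, ref = hrv_metrics(estimated), hrv_metrics(reference)
    return {k: abs(est[k] - ref[k]) for k in ref}
```

For example, the IBIs `[800, 860, 800, 820]` give successive differences `[60, -60, 20]`, a mean IBI of 820 ms, and a pNN50 of about 66.7%.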

AAAI Conference 2026 Conference Paper

TraceTrans: Translation and Spatial Tracing for Surgical Prediction

  • Xiyu Luo
  • Haodong Li
  • Xinxing Cheng
  • He Zhao
  • Yang Hu
  • Xuan Song
  • Tianyang Zhang

Image-to-image translation models have achieved notable success in converting images across visual domains and are increasingly used for medical tasks such as predicting post-operative outcomes and modeling disease progression. However, most existing methods primarily aim to match the target distribution and often neglect spatial correspondences between the source and translated images. This limitation can lead to structural inconsistencies and hallucinations, undermining the reliability and interpretability of the predictions. These challenges are accentuated in clinical applications by the stringent requirement for anatomical accuracy. In this work, we present TraceTrans, a novel deformable image translation model designed for post-operative prediction that generates images aligned with the target distribution while explicitly revealing spatial correspondences with the pre-operative input. The framework employs an encoder for feature extraction and dual decoders for predicting spatial deformations and synthesizing the translated image. The predicted deformation field imposes spatial constraints on the generated output, ensuring anatomical consistency with the source. Extensive experiments on medical cosmetology and brain MRI datasets demonstrate that TraceTrans delivers accurate and interpretable post-operative predictions, highlighting its potential for reliable clinical deployment.

JBHI Journal 2026 Journal Article

WN-Sleep: Modeling Whole-Night Data for Improved Sleep Staging Classification

  • Fang Zhou
  • Zhi Lu
  • Zhi Wu
  • Gaohan Ye
  • Lingjie Shu
  • Yu Pu
  • Beilei Wang
  • Dong Zhang

Sleep staging, crucial for diagnosing sleep disorders, requires precise recognition of physiological signals within 30-second epochs, a task fundamentally different from managing long-term semantic dependencies in natural language processing (NLP). Our model aims to refine the integration of local and global features for more accurate sleep stage classification. Following the American Academy of Sleep Medicine (AASM) guidelines, it focuses on rigorous intra-epoch feature extraction to ensure reliable identification of sleep stages. Moreover, our approach incorporates a global perspective by analyzing whole-night data, which is essential for handling transitional periods and ambiguities. Existing sequential modeling techniques often overlook the unique requirements of sleep staging, and their performance declines when sequences extend beyond approximately 200 epochs. Our model addresses this by processing local and global information in a structured way, balancing detailed intra-epoch analysis against an overarching view of sleep cycles through a gating mechanism. This gating mechanism selectively integrates long-term dependencies, optimizing the balance between local accuracy and global context. This approach represents a significant advancement over existing models, offering more accurate, reliable, and clinically relevant sleep staging. Extensive experiments on the SHHS, SleepEDF-20, and SleepEDF-78 datasets demonstrate that our method outperforms state-of-the-art approaches.
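
The abstract does not specify the form of the gate. A minimal sketch of a common pattern, assuming a per-dimension sigmoid gate that blends an epoch's local (intra-epoch) features with whole-night context features; the weights would normally be learned, and all names here are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(local, global_ctx, w_local, w_global, bias):
    """Per-dimension gated fusion: out = g * local + (1 - g) * global, with g in (0, 1).

    local: intra-epoch features for one 30-second epoch.
    global_ctx: whole-night context features for the same epoch.
    """
    fused = []
    for loc, ctx, wl, wg, b in zip(local, global_ctx, w_local, w_global, bias):
        gate = sigmoid(wl * loc + wg * ctx + b)  # near 1: rely on local evidence
        fused.append(gate * loc + (1.0 - gate) * ctx)
    return fused
```

With all weights and biases at zero the gate is 0.5 and the output is the plain average; a large positive bias drives the gate toward 1 so the local features pass through, which is how such a gate can admit long-range context only where it helps.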

EAAI Journal 2025 Journal Article

Applying deep learning and automated machine learning for enhanced state monitoring and health assessment of high-pressure heater in thermal power units

  • Guoxiong Zhu
  • Yang Hu
  • Xiaoning Zhang
  • Jiyu Chen
  • Jizhen Liu

High-pressure heaters are essential components in thermal power plants, where effective monitoring can significantly enhance safety and cost efficiency. This study proposes a novel method for monitoring and state prediction of high-pressure heaters, addressing challenges such as rapid load variations and extended low-load conditions that affect model accuracy. Using historical data for feature selection, the system employs a Convolutional Neural Network-bidirectional Long Short-term Memory (CNN-BiLSTM) network for state prediction. Hyperparameters are optimized using Bayesian methods, and automated machine learning enhances the model’s adaptability. Additionally, an improved Support Vector Data Description (SVDD) algorithm is used for both precise anomaly detection and health assessment. Validated with real-world data from a coal-fired power plant, the model achieved mean squared errors (MSE) of 1.3260 for the shell-side drain temperature and 1.4968 for the tube-side outlet temperature. The self-updating capability further reduced MSE by up to 56.98%, and anomaly detection accuracy reached 96.63%. These results highlight the model’s adaptability and effectiveness in improving monitoring and forecasting for high-pressure heaters.

IROS Conference 2025 Conference Paper

AVIP: Acoustic-Visual-Inertial-Pressure Fusion-based Underwater Localization System with Multi-Centric Calibration

  • Yuanbo Xue
  • Yang Hu
  • Dejin Zhang
  • Chih-Yung Wen
  • Bing Wang

Underwater localization is a crucial capability for robust and accurate vehicle navigation. Although various well-developed localization systems exist, they primarily target ground and aerial applications. The challenges posed by underwater environments, such as sparse textures and dynamic disturbances, make multi-modal fusion a promising solution for localization. This paper presents AVIP, a localization method that fuses Acoustic, Visual, Inertial, and Pressure modalities for underwater applications. To integrate the information from all modalities during initialization, the visual and inertial modalities are alternately assigned as centric sensors to pairwise predict and update the estimates of the other modalities. The multi-centric calibration problem is addressed through factor graph optimization, which is fully integrated into the graph-based AVIP system as a calibration factor. The proposed method is evaluated and compared with state-of-the-art approaches on semi-physical datasets recorded by a BlueROV2 robot and on public real-world datasets. Extensive experiments demonstrate that AVIP achieves superior localization accuracy and adapts to a range of sensor configurations.

IROS Conference 2025 Conference Paper

ComDrive: Comfort-Oriented End-to-End Autonomous Driving

  • Junming Wang
  • Xingyu Zhang
  • Zebin Xing
  • Songen Gu
  • Xiaoyang Guo
  • Yang Hu
  • Ziying Song
  • Qian Zhang 0001

We propose ComDrive, the first comfort-oriented end-to-end autonomous driving system to generate temporally consistent and comfortable trajectories. Recent studies have demonstrated that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select safe trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers tend to generate temporally inconsistent and uncomfortable trajectories. To address these issues, ComDrive first extracts 3D spatial representations through sparse perception, which then serve as conditional inputs. These inputs are used by a Conditional Denoising Diffusion Probabilistic Model (DDPM)-based motion planner to generate temporally consistent multi-modal trajectories. A dual-stream adaptive trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle. Experiments demonstrate that ComDrive achieves state-of-the-art performance in both comfort and safety, outperforming UniAD by 17% in driving comfort and reducing collision rates by 25% compared to SparseDrive. More results are available on our project page: https://jmwang0117.github.io/ComDrive/.

STOC Conference 2025 Conference Paper

Constant Approximation for Weighted Nash Social Welfare with Submodular Valuations

  • Yuda Feng
  • Yang Hu
  • Shi Li 0001
  • Ruilong Zhang 0001

We study the problem of assigning items to agents so as to maximize the weighted Nash Social Welfare (NSW) under submodular valuations. The best-known result for the problem is an $O(n w_{\max})$-approximation due to Garg, Husic, Li, Vegh, and Vondrak (STOC’23), where $w_{\max}$ is the maximum weight over all agents. Obtaining a constant approximation algorithm is an open problem in the field that has recently attracted considerable attention. We give the first such algorithm for the problem, thus resolving the open problem in the affirmative. Our algorithm is based on the natural Configuration LP for the problem, which was introduced recently by Feng and Li (ICALP’24) for the additive valuation case. Our rounding algorithm is similar to that of Li (SODA’25), developed for the unrelated machine scheduling problem to minimize weighted completion time. Roughly speaking, we designate the largest item in each configuration as a large item and the remaining items as small items. Thus, every agent gets precisely 1 fractional large item in the Configuration LP solution. With the rounding algorithm of Li (SODA’25), we can ensure that in the obtained solution, every agent gets precisely 1 large item, and the assignments of small items are negatively correlated.
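
To make the objective concrete: with nonnegative weights $w_i$, weighted NSW is commonly defined as $\left(\prod_i v_i(S_i)^{w_i}\right)^{1/\sum_i w_i}$. The sketch below evaluates that objective and brute-forces the optimum on a toy instance with coverage valuations, a standard example of submodular functions. It is exponential in the number of items and is not the Configuration-LP rounding algorithm described above; all names are hypothetical.

```python
import itertools
import math


def coverage_valuation(item_sets):
    """Submodular valuation: number of ground elements covered by a bundle of items."""
    def value(bundle):
        covered = set()
        for item in bundle:
            covered |= item_sets[item]
        return len(covered)
    return value


def weighted_nsw(valuations, weights, allocation):
    """(prod_i v_i(S_i)^{w_i})^{1 / sum_i w_i}; zero if some agent gets value 0."""
    log_sum = 0.0
    for v, w, bundle in zip(valuations, weights, allocation):
        val = v(bundle)
        if val <= 0:
            return 0.0
        log_sum += w * math.log(val)  # work in log space to avoid overflow
    return math.exp(log_sum / sum(weights))


def brute_force_max_nsw(valuations, weights, num_items):
    """Exhaustive search over all n^m allocations; only viable for tiny instances."""
    n = len(valuations)
    best, best_alloc = 0.0, None
    for owners in itertools.product(range(n), repeat=num_items):
        alloc = [[j for j in range(num_items) if owners[j] == i] for i in range(n)]
        score = weighted_nsw(valuations, weights, alloc)
        if score > best:
            best, best_alloc = score, alloc
    return best, best_alloc
```

With two equally weighted agents sharing the coverage valuation `{0: {1, 2}, 1: {2, 3}, 2: {4}}`, the best split gives values 3 and 2, for a weighted NSW of $\sqrt{6}$.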

EAAI Journal 2025 Journal Article

DRL-ED: A deep reinforcement learning with encoder–decoder method for traffic flow prediction

  • Dawen Xia
  • Yan Chen
  • Wenyong Zhang
  • Xiaoduo Wei
  • Yang Hu
  • Yantao Li
  • Huaqing Li

Accurate traffic flow prediction helps transport authorities address transportation problems. However, most forecasting methods struggle to simultaneously model dynamic topologies and capture long-range spatiotemporal dependencies. To this end, this paper proposes a Deep Reinforcement Learning with Encoder-Decoder (DRL-ED) method for traffic flow prediction. The proposed method uses an encoder-decoder architecture equipped with an attention mechanism, a gated recurrent unit (GRU) network, and a correlation graph convolutional network (CorrGCN). Specifically, the attention mechanism helps the GRU and CorrGCN networks capture spatiotemporal relationships, improving the accuracy and robustness of traffic flow prediction. Moreover, the GRU network contains short- and long-term memory units, enabling the discovery of temporal correlations across different time scales. This allows the method to consider both short-term fluctuations and long-term trends and thus predict traffic flow changes more accurately. Finally, the CorrGCN network aggregates node features through multi-layer graph convolution, propagating information step by step to extract both local and global spatial features and thereby capture inter-site traffic flow relationships comprehensively. The deep deterministic policy gradient (DDPG) algorithm enables the CorrGCN network to automatically learn and update the dynamic neighbor matrix through an actor-critic framework combined with soft updates and experience replay. The network can thus adaptively adjust the adjacency matrix according to traffic data in different periods or scenarios, capturing spatiotemporal correlations in graph data. Experimental results demonstrate that DRL-ED outperforms 15 baseline methods in prediction accuracy.
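
The two DDPG ingredients named above are small enough to sketch. Soft (Polyak) updates move target-network parameters a fraction `tau` toward the online network, and experience replay stores past transitions for random minibatch sampling. This is a generic illustration with parameters flattened into plain lists of floats, not DRL-ED's implementation; the class and function names are hypothetical.

```python
import random
from collections import deque


def soft_update(target, source, tau):
    """Polyak averaging as in DDPG: target <- tau * source + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


class ReplayBuffer:
    """Fixed-capacity experience replay; the oldest transitions are evicted first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation in minibatches.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A small `tau` (values around 0.001 to 0.01 are typical) makes the target network change slowly, which stabilizes the critic's bootstrapped targets.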

ECAI Conference 2025 Conference Paper

Exploring Boundary-Aware Spatial-Frequency Fusion for Camouflaged Object Detection

  • Song Yu
  • Yang Hu
  • Haokang Ding
  • Zhifang Liao
  • Yucheng Song

Camouflaged Object Detection (COD) is challenging due to the high degree of similarity between camouflaged objects and their surrounding backgrounds. Current COD methods mainly rely on edge extraction in the spatial domain and local pixel-level information, neglecting the importance of global structural features. Additionally, they fail to effectively leverage phase spectrum information within frequency-domain features. To this end, we propose BASFNet, a COD framework based on boundary-aware fusion of frequency-domain and spatial-domain features. The method uses dual-guided integration of frequency-domain and spatial-domain features. A phase-spectrum-based frequency-enhanced edge exploration module (FEEM) and a spatial core segmentation module (SCSM) are introduced to jointly capture the boundary and object features of camouflaged objects. These features are then integrated through a spatial-frequency fusion interaction module (SFFIM). Furthermore, boundary detection is optimized through a boundary-aware training strategy. BASFNet outperforms existing state-of-the-art methods on three benchmark datasets, validating the effectiveness of fusing frequency- and spatial-domain information in COD tasks.

JBHI Journal 2025 Journal Article

Hyperbolic Geometry-Driven Robustness Enhancement for Rare Skin Disease Diagnosis

  • Yang Hu
  • Yuanyuan Chen
  • Xiaohan Xing
  • Jingfeng Zhang
  • Bolysbek Murat Yerzhanuly
  • Bazargul Matkerim
  • Yong Xia

The automated diagnosis of rare skin diseases from dermoscopy images, a few-shot learning (FSL) problem, remains challenging, since traditional FSL research tends to disregard the intrinsic hierarchical nature of rare diseases and data uncertainty. To address these issues, we propose to conduct rare skin disease diagnosis in hyperbolic space, whose key geometrical properties support implicit class hierarchies and precise uncertainty measurement. We propose a Hyperbolic Geometry-driven Robustness Enhancement (HGRE) framework specifically tailored for diagnosing rare skin diseases. The HGRE framework uses implicit hierarchical relations in hyperbolic space to better represent the features of rare diseases. Moreover, the framework incorporates an Adversarial Proxy Construction (APC) module to address the problem of data uncertainty. Specifically, the APC module uses the distance to the origin of hyperbolic space as an indicator of uncertainty to filter uncertain prototypes and construct adversarial proxies for each of them, enabling adversarially robust training. Leveraging these two geometrical properties, the HGRE framework addresses both the insufficient use of hierarchical relations and data uncertainty in FSL-based rare skin disease diagnosis. Extensive empirical validation on two skin lesion datasets corroborates the improved training robustness, with HGRE notably surpassing existing state-of-the-art FSL methods.
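
The "distance to the origin" used as an uncertainty indicator has a closed form in the Poincaré ball model (curvature −1): $d(0, x) = 2\,\operatorname{artanh}(\lVert x \rVert)$. The sketch below checks that closed form against the general geodesic distance and adds a hypothetical filter; it assumes, as is common in hyperbolic embedding work, that points nearer the origin are treated as more uncertain. The threshold and function names are illustrative, not HGRE's.

```python
import math


def _sq_norm(x):
    return sum(c * c for c in x)


def poincare_distance(u, v):
    """Geodesic distance between two points of the unit Poincare ball (curvature -1)."""
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - _sq_norm(u)) * (1.0 - _sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


def distance_to_origin(x):
    """Closed form of poincare_distance(0, x): 2 * artanh(||x||), valid for ||x|| < 1."""
    return 2.0 * math.atanh(math.sqrt(_sq_norm(x)))


def select_uncertain(prototypes, threshold):
    """Hypothetical filter: indices of prototypes closer to the origin than `threshold`."""
    return [i for i, p in enumerate(prototypes) if distance_to_origin(p) < threshold]
```

Because the distance grows without bound as $\lVert x \rVert \to 1$, the ball offers far more "room" near its boundary, which is what makes it suited to embedding tree-like class hierarchies.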

EAAI Journal 2025 Journal Article

Lightweight train image fault detection model based on location information enhancement

  • Longxin Zhang
  • Yang Hu
  • Runti Tan
  • Wenliang Zeng
  • Jianguo Chen

In addition to accurate detection, the computational and storage costs of defect detection models must often be considered in resource-constrained computing environments, such as mobile devices and edge nodes. Consequently, a lightweight fault detection network model based on location information enhancement (Light-LIENet) is proposed. It achieves real-time, high-precision detection while minimizing computational overhead and parameter count. Light-LIENet uses a convolutional neural network as its backbone for extracting relevant image features. Furthermore, it integrates a location feature network (LFNet) to accomplish multiscale feature fusion and enhance object location information. LFNet includes a location-sensitive block, a feature enhancement block, and a lightweight convolution block, which together enhance location information, strengthen feature extraction, and reduce computational cost. Experimental results on the freight train image fault dataset show that Light-LIENet performs strongly compared to state-of-the-art lightweight detectors. With a detection speed of 62 frames per second, only 1.611 million model parameters, and a computational requirement of just 3.984 G, it demonstrates its effectiveness and practicality.

AAAI Conference 2025 Conference Paper

Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters

  • WenZheng Zhang
  • Yang Hu
  • Jing Shi
  • Xiaoying Bai

Scaling Deep Neural Networks (DNNs) requires significant computational resources in terms of GPU quantity and compute capacity. In practice, large numbers of heterogeneous GPU devices often coexist because of the rapid release cycle of GPU products. Harnessing heterogeneous GPUs efficiently and economically is therefore essential to meet the requirements of DNN research and development. This paper introduces Poplar, a distributed training system that extends the Zero Redundancy Optimizer (ZeRO) with heterogeneity-aware capabilities. We explore a broader spectrum of GPU heterogeneity, including compute capability, memory capacity, quantity, and combinations of them. To achieve high computational efficiency across all heterogeneous conditions, Poplar conducts fine-grained measurements of GPUs in each ZeRO stage. We propose a novel batch allocation method and a search algorithm to optimize the utilization of heterogeneous GPU clusters. Furthermore, Poplar implements fully automated parallelism, eliminating the manual effort of deploying on heterogeneous hardware and finding suitable batch sizes. Extensive experiments on three heterogeneous clusters, comprising six different types of GPUs, demonstrate that Poplar achieves a training throughput improvement of 1.02-3.92x over current state-of-the-art heterogeneous training systems.
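
Poplar's batch allocation method and search algorithm are not described in detail here. As a simplified baseline for the underlying idea, the sketch below splits a global batch across GPUs in proportion to measured throughput (samples per second), using largest-remainder rounding so the sizes sum exactly to the global batch. It ignores per-GPU memory limits and ZeRO-stage effects, and the function name is hypothetical.

```python
def allocate_batch(global_batch, throughputs):
    """Split a global batch across GPUs in proportion to measured samples/sec.

    Largest-remainder rounding keeps the per-GPU sizes summing to global_batch.
    Memory capacity limits are deliberately ignored in this sketch.
    """
    total = sum(throughputs)
    shares = [global_batch * t / total for t in throughputs]
    sizes = [int(s) for s in shares]  # floor, since shares are non-negative
    leftover = global_batch - sum(sizes)
    # Hand the remaining samples to the GPUs with the largest fractional parts.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```

With throughputs of 3 and 1 samples per unit time, a batch of 100 splits into 75 and 25, so both GPUs finish a step in the same time (75 / 3 = 25 / 1) instead of the faster one idling.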

FOCS Conference 2025 Conference Paper

Static Retrieval Revisited: To Optimality and Beyond

  • Yang Hu
  • William Kuszmaul
  • Jingxun Liang
  • Huacheng Yu
  • Junkai Zhang
  • Renfei Zhou

In the static retrieval problem, a data structure must answer retrieval queries mapping a set of $n$ keys in a universe $[U]$ to $v$-bit values. Information-theoretically, retrieval data structures can use as little as $nv$ bits of space. For small value sizes $v$, it is possible to achieve $O(1)$ query time while using space $nv+o(n)$ bits; whether or not such a result is possible for larger values of $v$ (e.g., $v=\Theta(\log n)$) has remained open. In this paper, we obtain a tight lower bound (as well as matching upper bounds) for the static retrieval problem. In the case where values are large, we show that there is actually a significant tension between time and space. It is not possible, for example, to get $O(1)$ query time using $nv+o(n)$ bits of space when $v=\Theta(\log n)$ (assuming the word RAM model with $O(\log n)$-bit words). At first glance, our lower bound would seem to render retrieval unusable in many settings that aim to achieve very low redundancy. However, our second result offers a way around this: we show that, whenever a retrieval data structure $D_{1}$ is stored along with another data structure $D_{2}$ (whose size is similar to or larger than the size of $D_{1}$), it is possible to implement the combined data structure $D_{1} \cup D_{2}$ so that queries to $D_{1}$ take $O(1)$ time, operations on $D_{2}$ take the same asymptotic time as if $D_{2}$ were stored on its own, and the total space is $nv+\operatorname{Space}(D_{2})+n^{0.67}$ bits.
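
The retrieval problem can be made concrete with a classic, far-from-optimal construction: hash each key to three table slots and choose slot contents so that the XOR of a key's three slots equals its value, building the table by hypergraph peeling (the idea behind Bloomier and xor filters). It uses roughly $1.23nv$ bits rather than $nv+o(n)$ and is not the construction from this paper; the function names, the BLAKE2 hashing, and the 1.23 factor plus slack are illustrative choices. Note the retrieval semantics: keys are never stored, so querying a non-key returns an arbitrary value.

```python
import hashlib


def _positions(key, seed, seg):
    """Three slot indices for `key`, one in each table segment, so they are distinct."""
    h = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=12).digest()
    return [i * seg + int.from_bytes(h[4 * i:4 * i + 4], "little") % seg for i in range(3)]


def build_retrieval(mapping, max_tries=64):
    """Build a table T with T[a] ^ T[b] ^ T[c] == mapping[key] for every stored key."""
    n = len(mapping)
    seg = (int(1.23 * n) + 32) // 3  # roughly 1.23 slots per key, plus slack
    size = 3 * seg
    for seed in range(max_tries):
        pos = {k: _positions(k, seed, seg) for k in mapping}
        count = [0] * size
        slot_keys = [[] for _ in range(size)]
        for k, ps in pos.items():
            for p in ps:
                count[p] += 1
                slot_keys[p].append(k)
        # Peeling: repeatedly detach a key that is the only remaining user of some slot.
        removed, order = set(), []
        queue = [p for p in range(size) if count[p] == 1]
        while queue:
            p = queue.pop()
            if count[p] != 1:
                continue
            k = next(key for key in slot_keys[p] if key not in removed)
            removed.add(k)
            order.append((k, p))
            for q in pos[k]:
                count[q] -= 1
                if count[q] == 1:
                    queue.append(q)
        if len(order) != n:
            continue  # the hash hypergraph has a 2-core; retry with another seed
        table = [0] * size
        # Assign in reverse peeling order; slot p is still 0 when key k is handled.
        for k, p in reversed(order):
            a, b, c = pos[k]
            table[p] = mapping[k] ^ table[a] ^ table[b] ^ table[c]
        return seed, seg, table
    raise RuntimeError("peeling failed; increase max_tries or the table size")


def query(structure, key):
    """O(1) lookup: XOR of three slots; arbitrary output for keys never stored."""
    seed, seg, table = structure
    a, b, c = _positions(key, seed, seg)
    return table[a] ^ table[b] ^ table[c]
```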

EAAI Journal 2025 Journal Article

Thoughtful and cautious reasoning: A fine-tuned knowledge graph-based multi-hop question answering framework

  • Yinghao Zheng
  • Ling Lu
  • Yang Hu
  • Yinong Chen
  • Aijuan Wang

The aim of Knowledge Graph Question Answering (KGQA) is to find the answer entity by utilizing the Knowledge Graph (KG). Despite remarkable successes in recent years, existing multi-hop KGQA research still faces numerous challenges. First, a multi-hop question often contains multiple entities and their relationships, and its semantic information is complex. Current methods extract the semantics of the question through an encoder that cannot fully capture the complex and rich semantic information in multi-hop questions. Second, current question answering models use a coarse information filtering mechanism during reasoning, which leads to the loss of useful information and introduces additional noise. To address these issues, we propose a Thoughtful and Cautious Reasoning framework for Knowledge Graph Question Answering (TCR-KGQA). We design a new question encoder that extracts and fully fuses the local semantic information of the question at different levels, focusing on the unique local features of multi-hop question text. Building on the strengths of the Gated Recurrent Unit (GRU) for information filtering, we propose a loop instruction update framework based on residual-GRU to effectively capture key information in the reasoning process. Extensive experiments on three broad benchmark datasets demonstrate the effectiveness of our model on KGQA tasks, and it also yields excellent results in the case of incomplete knowledge graphs with missing question–answer pairs.

NeurIPS Conference 2024 Conference Paper

Dissect Black Box: Interpreting for Rule-Based Explanations in Unsupervised Anomaly Detection

  • Yu Zhang
  • Ruoyu Li
  • Nengwu Wu
  • Qing Li
  • Xinhan Lin
  • Yang Hu
  • Tao Li
  • Yong Jiang

In high-stakes sectors such as network security and IoT security, accurately distinguishing between normal and anomalous data is critical due to its significant implications for operational success and safety in decision-making. The complexity is exacerbated by the presence of unlabeled data and the opaque nature of black-box anomaly detection models, which obscure the rationale behind their predictions. In this paper, we present a novel method to interpret the decision-making processes of these models, which are essential for detecting malicious activities without labeled attack data. We put forward the Segmentation Clustering Decision Tree (SCD-Tree), designed to dissect and understand the structure of normal data distributions. The SCD-Tree integrates predictions from the anomaly detection model into its splitting criteria, enhancing the clustering process with the model's insights into anomalies. To further refine these segments, the Gaussian Boundary Delineation (GBD) algorithm is employed to define boundaries within each segmented distribution, effectively delineating normal from anomalous data points. This approach addresses the curse of dimensionality by segmenting high-dimensional data and ensures resilience to data drift and perturbations through flexible boundary fitting. We transform the intricate operations of anomaly detection into an interpretable rule format, constructing a comprehensive set of rules for understanding. Evaluation on diverse datasets and models demonstrates that our method achieves superior explanation accuracy, fidelity, and robustness over existing methods, proving its efficacy in environments where interpretability is paramount.
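
The final step, turning a Gaussian boundary around one segment of normal data into a human-readable rule, can be illustrated with a deliberately simple stand-in: per-feature intervals of mean ± k standard deviations. This is not the GBD algorithm itself; the segmentation step is omitted, and `k`, the rule syntax, and the function names are hypothetical.

```python
import statistics


def gaussian_box_rule(points, k=3.0):
    """Per-feature bounds [mean - k*std, mean + k*std] for one segment of normal data."""
    bounds = []
    for j in range(len(points[0])):
        col = [p[j] for p in points]
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        bounds.append((mu - k * sd, mu + k * sd))
    return bounds


def is_normal(x, bounds):
    """A point is normal for this segment if every feature lies inside its interval."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, bounds))


def rule_text(bounds, names):
    """Render the bounds as a conjunctive, human-readable rule."""
    return " AND ".join(f"{lo:.2f} <= {n} <= {hi:.2f}" for n, (lo, hi) in zip(names, bounds))
```

Because each rule is a conjunction of per-feature intervals, an analyst can read exactly which feature pushed a flagged sample outside the normal region.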

IJCAI Conference 2024 Conference Paper

Learning-Based Tracking-before-Detect for RF-Based Unconstrained Indoor Human Tracking

  • Zhi Wu
  • Dongheng Zhang
  • Zixin Shang
  • Yuqin Yuan
  • Hanqin Gong
  • Binquan Wang
  • Zhi Lu
  • Yadong Li

Existing efforts on human tracking using wireless signals primarily focus on constrained scenarios with only a few individuals in empty spaces. However, in practical unconstrained scenarios with severe interference and attenuation, accurate multi-person tracking has remained intractable. In this paper, we propose NeuralTBD, which combines the capability of deep models with the Tracking-Before-Detect (TBD) methodology to achieve accurate human tracking. TBD is a classical signal-processing tracking methodology that accumulates measurements over time to distinguish target traces from interference; however, it relies on handcrafted shape/motion models, limiting its efficacy in complex indoor scenarios. To tackle this challenge, we build an end-to-end learning-based TBD framework that leverages the advanced modeling capabilities of deep models to significantly enhance the performance of TBD. To evaluate NeuralTBD, we collect an RF-based tracking dataset in unconstrained scenarios, which encompasses 4 million annotated radar frames with up to 19 individuals acting in 6 different scenarios. NeuralTBD achieves a 70% performance improvement over conventional TBD methods. To our knowledge, this is the first attempt at RF-based unconstrained human tracking. The code and dataset will be released.
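
The core TBD idea of accumulating unthresholded measurements over time before deciding can be shown with a toy batch version: score every constant-velocity path through a stack of 1-D intensity frames and keep the path with the largest summed intensity. This is a handcrafted-motion-model baseline of the kind the abstract says NeuralTBD replaces with learned models; the function name and `max_speed` parameter are hypothetical.

```python
def batch_tbd(frames, max_speed=1):
    """Toy 1-D batch track-before-detect over constant-velocity paths.

    frames: list of T frames, each a list of per-cell intensities (no thresholding).
    Returns (score, start_cell, velocity) of the path with the largest summed intensity.
    """
    num_cells = len(frames[0])
    best = (float("-inf"), None, None)
    for start in range(num_cells):
        for vel in range(-max_speed, max_speed + 1):
            end = start + vel * (len(frames) - 1)
            if not 0 <= end < num_cells:
                continue  # path would leave the observed area
            # Integrate intensity along the hypothesized trajectory.
            score = sum(frame[start + vel * t] for t, frame in enumerate(frames))
            if score > best[0]:
                best = (score, start, vel)
    return best
```

In the example used for testing, a weak target (intensity 0.6 per frame) moving one cell per frame accumulates a score of 2.4 and beats a single strong clutter spike of 1.5, even though a per-frame threshold of 1.0 would have detected only the spike.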

ICLR Conference 2024 Conference Paper

NeuroBack: Improving CDCL SAT Solving using Graph Neural Networks

  • Wenxi Wang
  • Yang Hu
  • Mohit Tiwari
  • Sarfraz Khurshid
  • Kenneth L. McMillan
  • Risto Miikkulainen

Propositional satisfiability (SAT) is an NP-complete problem that impacts many research fields, such as planning, verification, and security. Mainstream modern SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) algorithm. Recent work aimed to enhance CDCL SAT solvers using Graph Neural Networks (GNNs). However, so far this approach either has not made solving more effective or has required substantial GPU resources for frequent online model inferences. Aiming to make GNN improvements practical, this paper proposes an approach called NeuroBack, which builds on two insights: (1) predicting phases (i.e., values) of variables appearing in the majority (or even all) of the satisfying assignments is essential for CDCL SAT solving, and (2) it is sufficient to query the neural model only once for the predictions before the SAT solving starts. Once trained, the offline model inference allows NeuroBack to execute exclusively on the CPU, removing its reliance on GPU resources. To train NeuroBack, a new dataset called DataBack containing 120,286 data samples is created. Finally, NeuroBack is implemented as an enhancement to a state-of-the-art SAT solver called Kissat. As a result, it allowed Kissat to solve 5.2% more problems on the recent SAT competition problem set, SATCOMP-2022. NeuroBack therefore shows how machine learning can be harnessed to improve SAT solving in an effective and practical manner.

NeurIPS Conference 2024 Conference Paper

Probing Social Bias in Labor Market Text Generation by ChatGPT: A Masked Language Model Approach

  • Lei Ding
  • Yang Hu
  • Nicole Denier
  • Enze Shi
  • Junxi Zhang
  • Qirui Hu
  • Karen D. Hughes
  • Linglong Kong

As generative large language models (LLMs) such as ChatGPT gain widespread adoption across domains, their potential to propagate and amplify social biases, particularly in high-stakes areas such as the labor market, has become a pressing concern. Not only are AI algorithms widely used to select job applicants, but individual job seekers may also use generative LLMs to help develop their job application materials. Against this backdrop, this research uses a novel experimental design to examine social biases in ChatGPT-generated job applications written in response to real job advertisements. By simulating the process of job application creation, we examine the language patterns and biases that emerge when the model is prompted with diverse job postings. Notably, we present a novel bias evaluation framework based on Masked Language Models that quantitatively assesses social bias using validated inventories of social cues/words, enabling a systematic analysis of the language used. Our findings show that the increasing adoption of generative AI, not only by employers but also by individual job seekers, can reinforce and exacerbate gender and social inequalities in the labor market through biased and gendered language.

ICLR Conference 2024 Conference Paper

Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity

  • Runyu Zhang 0001
  • Yang Hu
  • Na Li 0002

Robust Markov Decision Processes (MDPs) and risk-sensitive MDPs are both powerful tools for decision making under uncertainty. Previous efforts have aimed to establish their connections, revealing equivalences in specific formulations. This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner from the classical Markov risk measure [Ruszczyński 2010], and establishes its equivalence with a class of soft robust MDP (RMDP) problems that includes the standard RMDP as a special case. Leveraging this equivalence, we further derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method under the tabular setting with direct parameterization. This stands in sharp contrast to the Markov risk measure, which is known to be potentially non-gradient-dominant [Huang et al. 2021]. We also propose a sample-based offline learning algorithm, the robust fitted-Z iteration (RFZI), for a specific soft RMDP problem with a KL-divergence regularization term (or, equivalently, the risk-sensitive MDP with an entropy risk measure). We showcase its streamlined design and the less stringent assumptions enabled by the equivalence, and analyze its sample complexity.
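For a single step, the link between a KL-regularized soft robust problem and the entropic risk measure follows from the standard Donsker-Varadhan variational identity $\sup_q \{\mathbb{E}_q[X] - \tfrac{1}{\beta}\mathrm{KL}(q\|p)\} = \tfrac{1}{\beta}\log \mathbb{E}_p[e^{\beta X}]$, whose maximizer is the tilted distribution $q^* \propto p\,e^{\beta X}$. The Python sketch below checks this numerically. It assumes an adversary that maximizes a cost $X$ and uses illustrative values for $p$, the costs and $\beta$; it is not the paper's RFZI algorithm.

```python
import numpy as np


def entropic_risk(costs, p, beta):
    """Entropic risk (1/beta) * log E_p[exp(beta * X)], computed with a max shift."""
    z = beta * np.asarray(costs, dtype=float)
    m = z.max()
    return (m + np.log(np.dot(p, np.exp(z - m)))) / beta


def kl_regularized_worst_case(costs, p, beta):
    """Solve max_q E_q[X] - (1/beta) * KL(q || p) in closed form.

    The maximizer is the exponentially tilted distribution q* ∝ p * exp(beta * X).
    Returns the optimal value and q*.
    """
    costs = np.asarray(costs, dtype=float)
    w = p * np.exp(beta * (costs - costs.max()))   # shift avoids overflow
    q = w / w.sum()
    kl = np.sum(q * np.log(q / p))
    return np.dot(q, costs) - kl / beta, q


p = np.array([0.5, 0.3, 0.2])       # nominal next-state probabilities (illustrative)
costs = np.array([1.0, 2.0, 5.0])   # cost of each next state (illustrative)
beta = 0.7
robust_value, q_star = kl_regularized_worst_case(costs, p, beta)
print(robust_value, entropic_risk(costs, p, beta))  # the two numbers coincide
```

Larger $\beta$ makes the KL penalty weaker, so the adversary can tilt $q^*$ further toward high-cost states and the value moves from $\mathbb{E}_p[X]$ toward $\max X$.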

YNIMG Journal 2024 Journal Article

Uncovering individual variations in bystander intervention of injustice through intrinsic brain connectivity patterns

  • Yancheng Tang
  • Yang Hu
  • Jie Zhuang
  • Chunliang Feng
  • Xiaolin Zhou

When confronted with injustice, individuals often intervene as third parties to restore justice by either punishing the perpetrator or helping the victim, even at their own expense. However, little is known about how individual differences in third-party intervention propensity relate to inter-individual variability in intrinsic brain connectivity patterns, and how these associations differ between help and punishment interventions. To address these questions, we employed a novel behavioral paradigm in combination with resting-state fMRI and inter-subject representational similarity analysis (IS-RSA). Participants acted as third-party bystanders and had to decide whether to maintain the status quo or intervene, at a specific cost, by either helping the disadvantaged recipient (Help condition) or punishing the proposer (Punish condition). Our analyses focused on three brain networks proposed in the third-party punishment (TPP) model: the salience (e.g., dorsal anterior cingulate cortex, dACC), central executive (e.g., dorsolateral prefrontal cortex, dlPFC), and default mode (e.g., dorsomedial prefrontal cortex, dmPFC; temporoparietal junction, TPJ) networks. IS-RSA showed that individual differences in resting-state functional connectivity (rs-FC) patterns within these networks were associated with the general third-party intervention propensity. Moreover, rs-FC patterns of the right dlPFC and right TPJ were more strongly associated with individual differences in helping propensity than with punishment propensity, whereas the opposite pattern was observed for the dmPFC. Post-hoc predictive modeling confirmed the predictive power of rs-FC in these regions for intervention propensity across individuals. Collectively, these findings shed light on the shared and distinct roles of key regions in TPP brain networks at rest in accounting for individual variations in justice-restoring intervention behaviors.

NeurIPS Conference 2022 Conference Paper

Bounded-Regret MPC via Perturbation Analysis: Prediction Error, Constraints, and Nonlinearity

  • Yiheng Lin
  • Yang Hu
  • Guannan Qu
  • Tongxin Li
  • Adam Wierman

We study Model Predictive Control (MPC) and propose a general analysis pipeline to bound its dynamic regret. The pipeline first requires deriving a perturbation bound for a finite-time optimal control problem. Then, the perturbation bound is used to bound the per-step error of MPC, which leads to a bound on the dynamic regret. Thus, our pipeline reduces the study of MPC to the well-studied problem of perturbation analysis, enabling the derivation of regret bounds of MPC under a variety of settings. To demonstrate the power of our pipeline, we use it to generalize existing regret bounds on MPC in linear time-varying (LTV) systems to incorporate prediction errors on costs, dynamics, and disturbances. Further, our pipeline leads to regret bounds on MPC in systems with nonlinear dynamics and constraints.

YNIMG Journal 2022 Journal Article

Growth charts of brain morphometry for preschool children

  • Hongxi Zhang
  • Jia Li
  • Xiaoli Su
  • Yang Hu
  • Tianmei Liu
  • Shaoqing Ni
  • Haifeng Li
  • Xi-Nian Zuo

Brain development from 1 to 6 years of age anchors a wide range of functional capabilities and carries early signs of neurodevelopmental disorders. However, quantitative models for depicting brain morphology changes and making individualized inferences are lacking, preventing the identification of early brain atypicality during this period. Using a sample of 285 children, we characterized the age dependence of cortical thickness and subcortical volume in neurologically normal children and constructed quantitative growth charts of all brain regions for preschool children. While the cortical thickness of most brain regions decreased with age, the entorhinal and parahippocampal regions displayed an inverted-U-shaped age dependence. Compared to cortical thickness, the normalized volume of subcortical regions exhibited more divergent trends, with some regions increasing, some decreasing, and some displaying inverted-U-shaped trends. The growth curve models for all brain regions demonstrated utility in identifying brain atypicality. The percentile measures derived from the growth curves facilitated the identification of children with developmental speech and language disorders with an accuracy of 0.875 (area under the receiver operating characteristic curve: 0.943). Our results fill the knowledge gap in brain morphometrics in a critical developmental period and provide an avenue for individualized evaluation of brain developmental status with demonstrated sensitivity. The brain growth charts are shared with the public (http://phi-group.top/resources.html).

AAAI Conference 2022 Conference Paper

Learning Token-Based Representation for Image Retrieval

  • Hui Wu
  • Min Wang
  • Wengang Zhou
  • Yang Hu
  • Houqiang Li

In image retrieval, deep local features learned in a data-driven manner have been shown to improve retrieval performance. To enable efficient retrieval on large image databases, some approaches quantize deep local features with a large codebook and match images with an aggregated match kernel. However, these approaches have nontrivial complexity and a large memory footprint, which limits their ability to perform feature learning and aggregation jointly. To generate compact global representations while maintaining regional matching capability, we propose a unified framework that jointly learns local feature representation and aggregation. In our framework, we first extract deep local features using CNNs. Then, we design a tokenizer module that aggregates them into a few visual tokens, each corresponding to a specific visual pattern. This helps remove background noise and capture the more discriminative regions of the image. Next, a refinement block enhances the visual tokens with self-attention and cross-attention. Finally, the visual tokens are concatenated to generate a compact global representation. The whole framework is trained end-to-end with image-level labels. Extensive experiments show that our approach outperforms state-of-the-art methods on the Revisited Oxford and Paris datasets.
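One common way to aggregate local CNN features into a handful of visual tokens is attention pooling, in which each token has a query vector that attends over all spatial positions. The numpy sketch below illustrates that idea under stated assumptions: the token queries are random stand-ins for learned parameters, scaled dot-product softmax attention is an assumed design choice, and the refinement block (self- and cross-attention) is omitted. It should not be read as the paper's exact tokenizer.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tokenize(local_features, token_queries):
    """Aggregate (HW, D) local features into (L, D) visual tokens.

    Each of the L token queries produces a softmax attention map over the HW
    spatial positions and pools the local features with it.
    """
    d = local_features.shape[1]
    logits = token_queries @ local_features.T / np.sqrt(d)   # (L, HW)
    attn = softmax(logits, axis=1)                             # each row sums to 1
    return attn @ local_features, attn                         # tokens: (L, D)


def global_descriptor(tokens):
    """Concatenate the tokens and L2-normalize into one compact global vector."""
    g = tokens.reshape(-1)
    return g / np.linalg.norm(g)


rng = np.random.default_rng(0)
features = rng.standard_normal((7 * 7, 128))   # toy CNN feature map flattened to (HW, D)
queries = rng.standard_normal((4, 128))        # 4 hypothetical "learned" token queries
tokens, attn = tokenize(features, queries)
descriptor = global_descriptor(tokens)
print(tokens.shape, descriptor.shape)          # (4, 128) (512,)
```

Whatever the image size, the descriptor has L·D entries, which is what makes such a representation compact compared with storing every local feature.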

NeurIPS Conference 2022 Conference Paper

On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory

  • Yang Hu
  • Adam Wierman
  • Guannan Qu

Stabilizing an unknown dynamical system is one of the central problems in control theory. In this paper, we study the sample complexity of the learn-to-stabilize problem in Linear Time-Invariant (LTI) systems on a single trajectory. Current state-of-the-art approaches require a sample complexity linear in $n$, the state dimension, which incurs a state norm that blows up exponentially in $n$. We propose a novel algorithm based on spectral decomposition that only needs to learn "a small part" of the dynamical matrix, namely the part acting on its unstable subspace. We show that, under proper assumptions, our algorithm stabilizes an LTI system on a single trajectory with $O(k \log n)$ samples, where $k$ is the instability index of the system. This is the first sub-linear sample complexity result for the stabilization of LTI systems in the regime where $k = o(n)$.
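The instability index $k$ counts the eigenvalues of the dynamics matrix on or outside the unit circle (the convention assumed here). The toy Python sketch below illustrates why a single trajectory can expose the $k$-dimensional unstable subspace: with no input, stable modes decay while unstable ones grow, so the span of the last $k$ states approaches that subspace. It assumes a diagonalizable $A$ with real eigenvalues that is known only so we can simulate and check the answer; it is an intuition aid, not the paper's algorithm.

```python
import numpy as np


def instability_index(A):
    """k = number of eigenvalues of A with modulus >= 1 (open-loop unstable modes)."""
    return int(np.sum(np.abs(np.linalg.eigvals(A)) >= 1.0))


def unstable_basis_from_trajectory(A, x0, k, steps):
    """Simulate x_{t+1} = A x_t (no input) for `steps` steps and return an
    orthonormal basis for the span of the last k states. Stable modes decay
    while unstable ones grow, so this span approaches the unstable subspace."""
    xs = [x0]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    X = np.stack(xs[-k:], axis=1)      # n x k matrix of the last k states
    Q, _ = np.linalg.qr(X)
    return Q


# Illustrative system: n = 6, two unstable modes (1.3, 1.2), four stable ones.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))        # orthonormal eigenvectors
A = V @ np.diag([1.3, 1.2, 0.5, 0.4, 0.3, 0.2]) @ V.T
k = instability_index(A)                                  # 2
Q = unstable_basis_from_trajectory(A, rng.standard_normal(6), k, steps=40)
# distance of the true unstable eigenvectors from the estimated subspace
residual = np.linalg.norm(V[:, :k] - Q @ (Q.T @ V[:, :k]))
```

When $k \ll n$, only a $k$-dimensional object has to be identified, which is the intuition behind a sample complexity that scales with $k$ rather than $n$.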

JBHI Journal 2022 Journal Article

Task-Coupling Elastic Learning for Physical Sign-Based Medical Image Classification

  • Yingxue Xu
  • Guihua Wen
  • Pei Yang
  • Baochao Fan
  • Yang Hu
  • Mingnan Luo
  • Changjun Wang

Physical signs of patients provide crucial evidence for diagnosing both the location and the nature of a disease, and there is a sequential relationship between these two tasks. Learning them jointly can therefore exploit their intrinsic association by transferring related knowledge across tasks. Choosing the right time to transfer is a critical problem for joint learning. However, how to dynamically adjust when the tasks interact, so as to capture the right time for transferring related knowledge, remains an open issue. To this end, we propose a Task-Coupling Elastic Learning (TCEL) framework that models task relatedness for classifying disease location and disease nature from physical sign images. The main idea is to transfer relevant knowledge dynamically by progressively shifting task coupling from loose to tight during multi-stage training. In the early stage of training, we relax the constraints on modeling relations to focus more on learning generic task-common features. In the later stage, semantic guidance is strengthened to learn task-specific features. Specifically, a dynamic sequential module (DSM) is proposed to explicitly model the sequential relationship and enable multi-stage training. Moreover, a new loss regularization is proposed to address a side effect of the DSM. Extensive experiments on two clinical datasets show the superiority of the proposed method over the baselines and demonstrate the effectiveness of the proposed task-coupling elastic mechanism.

TIST Journal 2022 Journal Article

What Can Knowledge Bring to Machine Learning?—A Survey of Low-shot Learning for Structured Data

  • Yang Hu
  • Adriane Chapman
  • Guihua Wen
  • Dame Wendy Hall

Supervised machine learning has several drawbacks that make it difficult to use in many situations, including heavy reliance on massive training data, limited generalizability, and poor expressiveness of high-level semantics. Low-shot learning attempts to address these drawbacks: it allows a model to achieve good predictive power with very little or no training data, with structured knowledge playing a key role as a high-level semantic representation of human knowledge. This article reviews the fundamental factors of low-shot learning technologies, focusing on how structured knowledge operates under different low-shot conditions. We also introduce other techniques relevant to low-shot learning. Finally, we point out the limitations of low-shot learning, the prospects and gaps in industrial applications, and future research directions.

AIIM Journal 2021 Journal Article

Fully-channel regional attention network for disease-location recognition with tongue images

  • Yang Hu
  • Guihua Wen
  • Mingnan Luo
  • Pei Yang
  • Dan Dai
  • Zhiwen Yu
  • Changjun Wang
  • Wendy Hall

Objective: To use deep learning to recognize disease location from tongue images, focusing on two problems: (1) general convolutional networks have a weak ability to model detailed regional tongue features; and (2) they ignore the group relationship between convolution channels, which causes high model redundancy. Methods: To enhance convolutional neural networks, this paper proposes a stochastic region pooling method to obtain detailed regional features, as well as an inner-imaging channel relationship modeling method to model multi-region relations on all channels. Moreover, we combine it with a spatial attention mechanism. Results: A tongue image dataset with clinical disease-location labels is established, and extensive experiments are carried out on it. The experimental results show that the proposed method can effectively model the regional details of tongue images and improve disease-location recognition performance. Conclusion: We construct a tongue image dataset with disease-location labels to mine the relationship between tongue images and disease locations, and propose a novel fully-channel regional attention network that models local detailed tongue features and improves modeling efficiency. Significance: The application of deep learning to tongue image disease-location recognition and the proposed models offer guidance for other assistant diagnostic tasks; in particular, the proposed model provides an example of efficiently modeling detailed tongue features for other auxiliary diagnosis applications.

YNIMG Journal 2021 Journal Article

Impact of inter-individual variability on the estimation of default mode network in temporal concatenation group ICA

  • Yang Hu
  • Zhi Yang

Temporal concatenation group ICA (TC-GICA) is a widely used data-driven method to extract common functional brain networks among individuals. TC-GICA concatenates the time series of individual fMRI data and applies dimension reduction and ICA algorithms to decompose the data into group-level components. The default mode network (DMN) estimated using TC-GICA at relatively high model orders (i.e., large numbers of components) is split into multiple components. The split DMNs are topographically different from those estimated using other methods (e.g., seed-based correlation, clustering, graph theoretical analysis, and other ICA methods like gRAICAR and IVA-GL) and are inconsistent with existing knowledge of the DMN. We hypothesize that the "DMN-splitting" phenomenon reflects the impact of inter-individual variability in data, which is propagated into the ICA decomposition via the data-concatenation step of TC-GICA. By systematically manipulating the amount of variability involved in the temporal concatenation in both simulated and several realistic datasets, we observed that as more variability was involved, the estimated DMN became less similar to the averaged functional connectivity (FC) pattern obtained using seed-based correlation analysis. The performance of the DMN estimation in TC-GICA also exhibited remarkable dependence on the model order settings. Further analyses revealed that the "DMN-splitting" in TC-GICA could be reproduced when involving large variability in the data-concatenation and performing ICA at high model orders. These results were replicated across multiple datasets and various software implementations. When applying ICA approaches that avoid temporal concatenation, such as gRAICAR and IVA-GL, to the same datasets, the estimated group-level DMN was more consistent with the seed-based FC pattern and was more robust to various model order settings.
This study calls for caution when applying TC-GICA to datasets expected to have large inter-individual variability, such as pooling different experimental groups of subjects.
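The temporal concatenation step of TC-GICA can be sketched as follows: each subject's (time x voxel) data are normalized, stacked along the time axis, and reduced to a small number of group-level spatial patterns. This minimal Python sketch assumes voxel-wise z-scoring and a single SVD-based PCA; many implementations add a subject-level PCA before concatenation and then run ICA on the reduced data, and both steps are omitted here. The function name is hypothetical.

```python
import numpy as np


def temporal_concatenation_pca(subject_data, n_components):
    """Temporal concatenation followed by group-level PCA.

    subject_data: list of (time x voxel) arrays, one per subject (time lengths
    may differ). Returns the concatenated data and n_components orthonormal
    spatial patterns (rows) on which a group ICA would then be run.
    """
    # z-score every voxel's time series within each subject before stacking
    normalized = [(X - X.mean(axis=0)) / X.std(axis=0) for X in subject_data]
    group = np.concatenate(normalized, axis=0)    # (sum of time points) x voxels
    # right singular vectors = principal spatial patterns of the stacked data
    _, _, vt = np.linalg.svd(group, full_matrices=False)
    return group, vt[:n_components]


rng = np.random.default_rng(0)
subjects = [rng.standard_normal((120, 500)) for _ in range(3)]  # 3 toy subjects
group, maps = temporal_concatenation_pca(subjects, n_components=20)
print(group.shape, maps.shape)   # (360, 500) (20, 500)
```

Because every subject contributes rows to one matrix that shares a single set of spatial maps, differences between subjects cannot be represented as subject-specific maps and must instead be absorbed into the shared components, which is the mechanism the hypothesis above points to.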

NeurIPS Conference 2021 Conference Paper

Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems

  • Yiheng Lin
  • Yang Hu
  • Guanya Shi
  • Haoyuan Sun
  • Guannan Qu
  • Adam Wierman

We study predictive control in a setting where the dynamics are time-varying and linear, and the costs are time-varying and well-conditioned. At each time step, the controller receives exact predictions of costs, dynamics, and disturbances for the next $k$ time steps. We show that when the prediction window $k$ is sufficiently large, predictive control is input-to-state stable and achieves a dynamic regret of $O(\lambda^k T)$, where $0 < \lambda < 1$ is a constant. This is the first dynamic regret bound on the predictive control of linear time-varying systems. We also show that a variant of predictive control obtains the first competitive bound for the control of linear time-varying systems: $1 + O(\lambda^k)$. Our results are derived using a novel proof framework based on a perturbation bound that characterizes how a small change to the system parameters impacts the optimal trajectory.

AIIM Journal 2020 Journal Article

Grouping attributes zero-shot learning for tongue constitution recognition

  • Guihua Wen
  • Jiajiong Ma
  • Yang Hu
  • Huihui Li
  • Lijun Jiang

Traditional Chinese Medicine (TCM) holds that personal constitution determines the occurrence trend and therapeutic effects of certain diseases, and constitution can be recognized from tongue images by machine learning. However, current machine learning methods face two challenges. First, no large tongue image databases are available. Second, they do not use TCM domain knowledge, so they cannot address the imbalance of constitution categories. Therefore, this paper proposes a new constitution recognition method based on zero-shot learning with TCM knowledge. To further improve performance, a new zero-shot learning method is proposed that groups attributes and learns discriminant latent features, which better addresses the imbalance of constitution categories. Experimental results on our constructed databases validate the proposed methods.

YNIMG Journal 2020 Journal Article

Individualized psychiatric imaging based on inter-subject neural synchronization in movie watching

  • Zhi Yang
  • Jinfeng Wu
  • Lihua Xu
  • Zhengzheng Deng
  • Yingying Tang
  • Jiaqi Gao
  • Yang Hu
  • Yiwen Zhang

Individual heterogeneity challenges the promise of cutting-edge neuroimaging techniques for better diagnosis and early detection of psychiatric disorders, because similar clinical manifestations may arise from very different pathophysiology. Conventional approaches based on comparing group averages provide insufficient information to support individualized diagnosis. Here we present an individualized imaging methodology that combines naturalistic imaging with a normative model. This paradigm uses video clips with rich cognitive, social, and emotional content to evoke synchronized brain dynamics in healthy participants and builds a spatiotemporal response norm. By comparing individual brain responses with the response norm, patients can be recognized using machine learning techniques. We applied this methodology to recognize first-episode drug-naïve schizophrenia patients in a dataset containing 72 patients and 54 healthy controls. Some segments of the video evoked more synchronized brain activity in the healthy controls than in the schizophrenia patients. We built a spatiotemporal response norm by averaging the brain responses of the healthy controls in a training set, and trained a classifier to recognize patients based on the differences between individual brain responses and the norm. The performance of the classifier was then evaluated on an independent test set. The mean accuracies from a 5-fold cross-validation were 0.71–0.78, depending on parameters such as the number of features and the width of the sliding windows. These findings reflect the potential of this methodology as a clinical tool for individualized diagnosis.
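The norm-based pipeline described above can be sketched in Python: average the healthy controls' responses in the training set to form the spatiotemporal norm, then describe each individual by how closely their response tracks the norm within sliding windows. The deviation measure (sliding-window Pearson correlation), the array shapes, and the function names are assumptions for illustration, and the classifier trained on these features is not shown.

```python
import numpy as np


def response_norm(control_responses):
    """Average brain responses of healthy controls in the training set.
    control_responses: (subjects, regions, time) -> norm: (regions, time)."""
    return control_responses.mean(axis=0)


def deviation_features(response, norm, window, step):
    """Sliding-window Pearson correlation between one individual's response
    (regions, time) and the norm; returns a (regions, n_windows) array."""
    n_regions, n_time = response.shape
    starts = range(0, n_time - window + 1, step)
    feats = np.empty((n_regions, len(starts)))
    for w, s in enumerate(starts):
        x = response[:, s:s + window]
        y = norm[:, s:s + window]
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        feats[:, w] = (xc * yc).sum(axis=1) / np.sqrt(
            (xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return feats


rng = np.random.default_rng(0)
controls = rng.standard_normal((10, 4, 100))   # toy: 10 controls, 4 regions, 100 time points
norm = response_norm(controls)
feats = deviation_features(rng.standard_normal((4, 100)), norm, window=20, step=10)
print(feats.shape)   # (4, 9)
```

Flattening `feats` gives one feature vector per individual; the window width here plays the role of the sliding-window width on which the reported accuracies depend.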

YNIMG Journal 2020 Journal Article

Reliability map of individual differences reflected in inter-subject correlation in naturalistic imaging

  • Jiaqi Gao
  • Gang Chen
  • Jinfeng Wu
  • YinShan Wang
  • Yang Hu
  • Ting Xu
  • Xi-Nian Zuo
  • Zhi Yang

Understanding individual differences in brain function is an essential aim of neuroscience. Naturalistic imaging links neural activity to real-life contexts and reflects individual differences in brain responses. These unique features make it a promising tool for individualized psychiatry. An essential prerequisite for the extensive use of this paradigm is the reliable representation of inter-individual relationships. We used a test–retest approach to examine whether the naturalistic paradigm reliably represents inter-individual differences, which brain regions have superior capability, and whether this ability changes with the content of the stimuli. We quantified the reliability of the inter-subject relationships in repeated scans of two movie clips: a natural sight view and an emotion-evoking story. Besides statistical inference, we included resting-state scans, behavioral tests, and questionnaires as references for comparison. The results showed that over one-third of the brain could reliably characterize inter-individual relationships, and the superior temporal lobe showed reliability comparable to that of the State and Trait Anxiety Inventory. Furthermore, the temporal lobe regions retained this capability across emotional movies with different content. This study provides a basis for pushing the naturalistic imaging paradigm toward clinical applications and proposes reliable target brain regions for future studies.
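One simple way to ask whether inter-subject relationships are reliable is to compute, for a region, the subjects x subjects inter-subject correlation (ISC) matrix in each of two scans and then correlate the upper triangles of the two matrices. The Python sketch below does this on simulated data in which each subject has a stable, assumed responsiveness to a shared stimulus signal; the reliability statistic used in the study may differ (for example, an intraclass correlation), and the function names are hypothetical.

```python
import numpy as np


def isc_matrix(ts):
    """ts: (subjects, time) regional time series. Returns the subjects x subjects
    matrix of pairwise Pearson correlations (inter-subject correlation)."""
    return np.corrcoef(ts)


def relationship_reliability(ts_scan1, ts_scan2):
    """Correlate the upper triangles of the ISC matrices from two scans of the
    same subjects; high values mean the inter-individual relationships repeat."""
    a, b = isc_matrix(ts_scan1), isc_matrix(ts_scan2)
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]


rng = np.random.default_rng(0)
n_subj, n_time = 20, 300
stimulus = rng.standard_normal(n_time)              # shared, stimulus-driven signal
weights = rng.uniform(0.2, 2.0, n_subj)             # stable individual responsiveness (assumed)
scan1 = weights[:, None] * stimulus + rng.standard_normal((n_subj, n_time))
scan2 = weights[:, None] * stimulus + rng.standard_normal((n_subj, n_time))
print(relationship_reliability(scan1, scan2))
```

Because the same subjects watch the same clip twice, only the noise differs between scans, so the pattern of pairwise ISC values, and hence the inter-individual relationships, is largely reproduced.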

IJCAI Conference 2017 Conference Paper

Sampling for Approximate Maximum Search in Factorized Tensor

  • Zhi Lu
  • Yang Hu
  • Bing Zeng

Factorization models have been extensively used to recover the missing entries of a matrix or tensor. However, directly computing all entries with the learned factorization model is prohibitive when the matrix/tensor is large. On the other hand, in many applications, such as collaborative filtering, we are interested only in the few largest entries. In this work, we propose a sampling-based approach for finding the top entries of a tensor decomposed by the CANDECOMP/PARAFAC model. We develop an algorithm that samples entries with probabilities proportional to their values. We further extend it to make the sampling proportional to the $k$-th power of the values, amplifying the focus on the top entries. We provide a theoretical analysis of the sampling algorithm and evaluate its performance on several real-world data sets. Experimental results indicate that the proposed approach is orders of magnitude faster than exhaustive computation. When applied to the special case of searching in a matrix, it also requires fewer samples than the existing state-of-the-art method.
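For a third-order CP tensor $T_{ijk} = \sum_r A_{ir} B_{jr} C_{kr}$ with nonnegative factors, value-proportional sampling can be done without forming $T$: pick a rank-one term $r$ with probability proportional to its mass $m_r = a_r b_r c_r$ (the column sums of $A$, $B$, $C$), then draw $i$, $j$, $k$ independently from column $r$ of each factor. The probability of $(i,j,k)$ is then $\sum_r \frac{m_r}{M}\frac{A_{ir}}{a_r}\frac{B_{jr}}{b_r}\frac{C_{kr}}{c_r} = T_{ijk}/M$ with $M = \sum_r m_r$. The Python sketch below implements this basic construction under the nonnegativity assumption; it does not reproduce the paper's $k$-th power extension or analysis, and all function names are illustrative.

```python
import numpy as np


def cp_tensor(A, B, C):
    """Dense CP tensor T[i,j,k] = sum_r A[i,r] B[j,r] C[k,r] (only for checking small cases)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def sample_cp_entries(A, B, C, n_samples, rng):
    """Draw (i, j, k) with probability T[i,j,k] / T.sum() for nonnegative factors,
    without forming T: pick a rank-one term r with probability proportional to its
    total mass, then draw each index independently from column r of its factor."""
    col_a, col_b, col_c = A.sum(axis=0), B.sum(axis=0), C.sum(axis=0)
    mass = col_a * col_b * col_c
    rs = rng.choice(len(mass), size=n_samples, p=mass / mass.sum())
    return [
        (rng.choice(A.shape[0], p=A[:, r] / col_a[r]),
         rng.choice(B.shape[0], p=B[:, r] / col_b[r]),
         rng.choice(C.shape[0], p=C[:, r] / col_c[r]))
        for r in rs
    ]


def approximate_top_entry(A, B, C, n_samples, rng):
    """Score only the sampled candidates exactly and return the best one found."""
    candidates = set(sample_cp_entries(A, B, C, n_samples, rng))
    return max(candidates,
               key=lambda ijk: float(np.sum(A[ijk[0]] * B[ijk[1]] * C[ijk[2]])))


rng = np.random.default_rng(0)
A, B, C = rng.random((30, 3)), rng.random((40, 3)), rng.random((20, 3))  # toy nonnegative factors
best = approximate_top_entry(A, B, C, n_samples=2000, rng=rng)
```

Only the sampled candidates are scored, so the cost grows with the number of samples rather than with the 24,000 entries of this toy tensor; large entries are sampled most often, which is why the true maximum tends to appear among the candidates.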