Arrow Research search

Author name cluster

Xu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

50 papers
2 author rows

Possible papers

50

AAAI Conference 2026 Conference Paper

Ambiguity-Tolerant Cross-Modal Hashing with Partial Labels

  • Chao Su
  • Yanan Li
  • Xu Wang
  • Yingke Chen
  • Huiming Zheng
  • Dezhong Peng
  • Yuan Sun

Cross-modal hashing (CMH) has achieved remarkable success in large-scale cross-modal retrieval due to its low storage cost and high computational efficiency. However, most existing CMH methods rely on accurately annotated training data, which is often impractical in real-world applications due to the high cost and limited scalability of data annotation. In practice, annotators typically assign a candidate label set rather than a single precise label to each sample pair, resulting in partial labels with inherent ambiguity. Such ambiguous supervision poses significant challenges to conventional CMH methods that assume reliable and unambiguous labels. In this paper, we investigate a less-touched yet meaningful problem, i.e., cross-modal hashing with partial labels (PLCMH). PLCMH faces two major challenges: label ambiguity and modality-alignment barriers induced by misleading supervision. To address these issues, we propose a new approach named Ambiguity-Tolerant Cross-Modal Hashing (ATCH). Specifically, ATCH presents a Local Consensus Disambiguation (LCD) mechanism that resolves label ambiguity by effectively inferring stable and accurate label confidence based on local consensus within the Hamming space. Moreover, ATCH proposes a Confidence-Aware Contrastive Hashing (CACH) mechanism that derives both pseudo labels and trustworthiness scores from the label confidence vectors to learn discriminative hash codes, leading to effective modality alignment. Extensive experiments on three multimodal datasets demonstrate the superiority of ATCH.
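As a loose illustration of the local-consensus idea (not the paper's actual LCD formulation), confidence over a sample's candidate labels can be estimated by pooling confidence vectors from its Hamming-space neighbors and renormalizing over the candidate set; `local_consensus` and its inputs are hypothetical names:

```python
def local_consensus(candidates, neighbor_confidences):
    """Estimate label confidence for one sample from its Hamming-space
    neighbors: sum each candidate label's confidence over the neighbors,
    then renormalize so the candidate set sums to 1."""
    pooled = {c: sum(nc.get(c, 0.0) for nc in neighbor_confidences)
              for c in candidates}
    total = sum(pooled.values()) or 1.0  # guard against all-zero consensus
    return {c: v / total for c, v in pooled.items()}

# Two neighbors both lean toward label 0, so it wins most of the mass.
print(local_consensus({0, 1}, [{0: 0.8, 1: 0.2}, {0: 0.6, 1: 0.4}]))
```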

AAAI Conference 2026 Conference Paper

Beyond Single-Speed Reasoning: Coordinating Fast and Slow Dynamics for Efficient World Modeling

  • Hongwei Wang
  • Yangru Huang
  • Guangyao Chen
  • Xu Wang
  • Yi Jin

Model-based reinforcement learning (MBRL) enables efficient decision-making by learning predictive world models of environment dynamics. Despite recent advances, existing models often struggle to reconcile accurate short-term transitions with coherent long-term planning, especially in partially observable or long-horizon settings. We argue that this limitation often stems from modeling all transitions at a single temporal resolution, which makes it challenging to simultaneously capture fine-grained local dynamics and abstract global structures. To this end, we propose SF-RSSM (Slow-Fast Recurrent State-Space Model), a novel method that decouples short-term and long-term dynamics via a dual-branch design. The fast branch captures short-horizon transitions using residual prediction, while the slow branch models long-range dependencies with a GRU-based recurrent pathway. A distillation mechanism is developed to enable cooperation across timescales, with the slow model providing soft targets to guide the fast model. Additionally, a curiosity module encourages exploration by promoting learning in regions where the fast and slow branches exhibit divergent dynamics. Experiments on CARLA, DMControl and Atari benchmarks show that SF-RSSM outperforms strong baselines in policy performance.

AAAI Conference 2026 Conference Paper

DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Semantic Instance Segmentation

  • Xuexun Liu
  • Xiaoxu Xu
  • Qiudan Zhang
  • Lin Ma
  • Xu Wang

Weakly supervised 3D instance segmentation is essential for 3D scene understanding, especially given the growing scale of data and the high annotation cost of fully supervised approaches. Existing methods primarily rely on two forms of weak supervision: one-thing-one-click annotations and bounding box annotations, both of which aim to reduce labeling effort. However, these approaches still encounter limitations, including labor-intensive annotation processes, high complexity, and reliance on expert annotators. To address these challenges, we propose DBGroup, a two-stage weakly supervised 3D instance segmentation framework that leverages scene-level annotations as a more efficient and scalable alternative. In the first stage, we introduce a Dual-Branch Point Grouping module to generate pseudo labels guided by semantic and mask cues extracted from multi-view images. To further improve label quality, we develop two refinement strategies: Granularity-Aware Instance Merging and Semantic Selection and Propagation. The second stage involves multi-round self-training of an end-to-end instance segmentation network on the refined pseudo labels. Additionally, we introduce an Instance Mask Filter strategy to address inconsistencies within the pseudo labels. Extensive experiments demonstrate that DBGroup achieves competitive performance compared to sparse-point-level supervised 3D instance segmentation methods, while surpassing state-of-the-art scene-level supervised 3D semantic segmentation approaches.

AAAI Conference 2026 Conference Paper

Learning Beyond Domains: Misleading Prompts and Pseudo-Label Contrast for Text Domain Generalization

  • Qizhi Li
  • Xuyang Wang
  • Yingke Chen
  • Ming Yan
  • Dezhong Peng
  • Xi Peng
  • Xu Wang

Recent advancements in Pre-trained Language Models (PLMs) have significantly enhanced performance across various Natural Language Processing (NLP) tasks. However, the variability in data distributions across different domains presents challenges in generalizing these models to unseen domains. Domain generalization offers a promising solution, but existing text domain generalization methods typically rely on adversarial training to learn domain-invariant features, which often leads to models with high computational and memory overhead. To address this issue, this paper proposes a novel solution named Generalization via Prompts and Contrastive Learning (GenPromptCL) to enhance the generalization capability in domain generalization. GenPromptCL consists of two key components: Domain-Misleading Prompt Learning (DMPL) and Pseudo Label-based Contrastive Learning (PCL). Specifically, DMPL disrupts domain labels randomly, misleading the model into producing incorrect domain labels. This forces the model to learn domain-invariant features. Meanwhile, PCL generates pseudo labels within a single mini-batch, enabling the model to learn both intra-class and inter-class discriminative representations with low time and space complexity. Extensive experimental results demonstrate that GenPromptCL achieves state-of-the-art performance on three distinct text classification tasks (sentiment analysis, rumor detection, and natural language inference) while significantly improving model operation efficiency.

EAAI Journal 2026 Journal Article

Multi-agent reinforcement learning tuned proportional-integral-derivative control with frequency-attentive finite impulse response filtering for robust seat vibration suppression under system parameter uncertainty

  • Yuli Zhao
  • Yihe Zhang
  • Xu Wang

Vehicle seat vibration strongly affects ride comfort. Its control becomes more challenging when biodynamic parameters of the human body, such as stiffness and damping, vary with posture and alter the vibration transmission characteristics of the seat-occupant system. To address this dynamic uncertainty, this paper proposes an artificial intelligence-based control framework for active seat vibration control, in which a multi-agent reinforcement learning tuned proportional-integral-derivative (PID) controller is developed based on the deep deterministic policy gradient (DDPG) algorithm. The multi-agent framework decomposes the coupled PID gain tuning task into three coordinated subtasks, thereby improving learning efficiency and training stability. To further improve comfort-oriented control performance, a finite impulse response (FIR) based reward formulation is introduced to emphasize the biodynamically sensitive low-frequency band relevant to human body vibration response. This design guides the controller to place greater emphasis on suppressing vibrations in the comfort-relevant frequency range while preserving adaptability to posture-induced parameter variation. Simulation results show that, compared with the RL-tuned PID controller without FIR-based reward formulation, the proposed framework reduces the root mean square (RMS) head displacement by 12% under sinusoidal excitation and by 16.4% and 20.3% under narrowband and wideband random excitations, respectively. Under impulse excitation, it also achieves a 98.9% further magnitude reduction relative to conventional PID control. When the biodynamic parameters vary with posture, the performance degradation remains below 1%, indicating strong robustness of the learned controller to parameter variation. Overall, the proposed framework improves vibration suppression performance and maintains stable control under biodynamic parameter variation, indicating the potential of artificial intelligence-based active seat vibration control.
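The loop being tuned is a standard discrete PID controller; a minimal sketch with illustrative gains (the paper's RL agents would adjust `kp`, `ki`, `kd` online, and the FIR-weighted reward is not shown here):

```python
class PID:
    """Textbook discrete PID: u = kp*e + ki*(integral of e) + kd*(de/dt)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        self.integral += err * self.dt            # accumulate integral term
        deriv = (err - self.prev_err) / self.dt   # finite-difference derivative
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

pid = PID(kp=2.0, ki=1.0, kd=0.5, dt=0.1)  # illustrative gains only
print(pid.step(1.0))  # control output for a unit error at the first step
```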

AAAI Conference 2026 Conference Paper

PointSLAM++: Robust Dense Neural Gaussian Point Cloud-based SLAM

  • Xu Wang
  • Boyao Han
  • Xiaojun Chen
  • Ying Liu
  • Ruihui Li

Real-time 3D reconstruction is crucial for robotics and augmented reality, yet current simultaneous localization and mapping (SLAM) approaches often struggle to maintain structural consistency and robust pose estimation in the presence of depth noise. This work introduces PointSLAM++, a novel RGB-D SLAM system that leverages a hierarchically constrained neural Gaussian representation to preserve structural relationships while generating Gaussian primitives for scene mapping. It also employs progressive pose optimization to mitigate depth sensor noise, significantly enhancing localization accuracy. Furthermore, it utilizes a dynamic neural representation graph that adjusts the distribution of Gaussian nodes based on local geometric complexity, enabling the map to adapt to intricate scene details in real time. This combination yields high-precision 3D mapping and photorealistic scene rendering. Experimental results show PointSLAM++ outperforms existing 3DGS-based SLAM methods in reconstruction accuracy and rendering quality, demonstrating its advantages for large-scale AR and robotics.

AAAI Conference 2026 Conference Paper

Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval

  • Likang Peng
  • Chao Su
  • Wenyuan Wu
  • Yuan Sun
  • Dezhong Peng
  • Xi Peng
  • Xu Wang

Cross-modal hashing (CMH) facilitates efficient retrieval across different modalities (e.g., image and text) by encoding data into compact binary representations. While recent methods have achieved remarkable performance, they often rely heavily on fully annotated datasets, which are costly and labor-intensive to obtain. In real-world scenarios, particularly in multi-label datasets, label noise is prevalent and severely degrades retrieval performance. Moreover, existing CMH approaches typically overlook the partial semantic overlaps inherent in multi-label data, limiting their robustness and generalization. To tackle these challenges, we propose a novel framework named Semantic-Consistent Bidirectional Contrastive Hashing (SCBCH). The framework comprises two complementary modules: (1) Cross-modal Semantic-Consistent Classification (CSCC), which leverages cross-modal semantic consistency to estimate sample reliability and reduce the impact of noisy labels; (2) Bidirectional Soft Contrastive Hashing (BSCH), which dynamically generates soft contrastive sample pairs based on multi-label semantic overlap, enabling adaptive contrastive learning between semantically similar and dissimilar samples across modalities. Extensive experiments on four widely-used cross-modal retrieval benchmarks validate the effectiveness and robustness of our method, consistently outperforming state-of-the-art approaches under noisy multi-label conditions.
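One way to read the "soft contrastive pairs from multi-label semantic overlap" idea: rather than a hard positive/negative split, each cross-modal pair gets a continuous target, e.g. the Jaccard overlap of the two label sets (an illustrative proxy, not the paper's exact weighting):

```python
def soft_pair_target(labels_a, labels_b):
    """Continuous similarity target in [0, 1] from multi-label overlap
    (Jaccard index): 1.0 for identical label sets, 0.0 for disjoint ones."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(soft_pair_target([1, 2, 3], [2, 3, 4]))  # partial overlap -> 0.5
```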

JBHI Journal 2026 Journal Article

Wavelet-Transformer Attention Network for Accurate Fetal ECG Estimation from Multi-Channel Abdominal Signals

  • Xu Wang
  • Zhaoshui He
  • Zhijie Lin
  • Yang Han
  • Shengli Xie

Accurate fetal electrocardiogram extraction from abdominal recordings remains challenging due to strong maternal electrocardiogram artifacts and low signal quality. To address these issues, a Wavelet-Transformer Attention Network (WTA-Net) is proposed for fetal electrocardiogram extraction, where the Cross-Attention Transformer (CAT) module is devised to suppress maternal interference by modeling cross-modal interactions, and the Residual Shrinkage (RS) module is designed to attenuate noise artifacts through adaptive thresholding. Validation findings reveal that the proposed WTA-Net outperforms state-of-the-art methods, achieving positive predictive values of 99.82% and 99.87% for fetal QRS detection on the ADFECGDB and B2_LABOUR databases, respectively, further enhancing the reliability of prenatal monitoring.

IJCAI Conference 2025 Conference Paper

A Fast and Accurate ANN-SNN Conversion Algorithm with Negative Spikes

  • Xu Wang
  • Dongchen Zhu
  • Jiamao Li

A spiking neural network (SNN) is an event-driven neural network that can greatly reduce the power consumption of conventional artificial neural networks (ANNs). Many ANN models can be converted to SNN models when the activation function is ReLU. For ANN models with other activation functions, such as the Leaky ReLU function, the converted SNN models either suffer from serious accuracy degradation or require a long time step. In this paper, we propose a fast and accurate ANN-SNN conversion algorithm for models with the Leaky ReLU function. We design a novel neuron model that supports negative spikes. To address the problem of long tail distribution in the activation values, we propose a threshold optimization algorithm based on the variance of the activation values. To avoid the problem of error accumulation, we jointly calibrate all layers in the SNN model with adaptive weighting. Experiment results verify the effectiveness of the proposed algorithm.
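A toy version of the kind of unit the paper describes: an integrate-and-fire neuron that emits negative as well as positive spikes, with a soft reset that preserves residual charge (the threshold and reset rule here are illustrative, not the paper's exact neuron model):

```python
def simulate_neuron(inputs, threshold=1.0):
    """Integrate inputs over time; fire +1 when the membrane potential
    crosses +threshold and -1 when it crosses -threshold (soft reset)."""
    v = 0.0
    spikes = []
    for x in inputs:
        v += x
        if v >= threshold:
            spikes.append(1)
            v -= threshold      # soft reset keeps the residual charge
        elif v <= -threshold:
            spikes.append(-1)
            v += threshold
        else:
            spikes.append(0)
    return spikes

print(simulate_neuron([0.6, 0.6, -1.5, -0.4, 0.9]))  # -> [0, 1, -1, 0, 0]
```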

EAAI Journal 2025 Journal Article

A generative design method of airfoil based on conditional variational autoencoder

  • Xu Wang
  • Weiqi Qian
  • Tun Zhao
  • Hai Chen
  • Lei He
  • Haisheng Sun
  • Yuan Tian

The challenges in multi-objective and multi-dimensional optimization design of airfoils, marked by prolonged optimization cycles and low accuracy, call for an efficient solution to expedite airfoil design. This study presents an innovative airfoil generative design model based on a conditional variational autoencoder (CVAE). Initially, to overcome the limitation of insufficient training data, the model leverages the variational autoencoder (VAE) to learn the spatial distribution of University of Illinois at Urbana-Champaign (UIUC) airfoils, enabling the generation of a diverse set of airfoils with similar distributions. Subsequently, two CVAE-based airfoil generation models, the airfoil freedom design model and the airfoil precision design model, are proposed, which can realize diverse airfoil design under different conditions, such as shape and aerodynamic conditions. Furthermore, two measurements of roughness and diversity are introduced to evaluate the quality of the generated airfoils. The impact of different conditions and network parameters on the model’s generation performance is thoroughly analyzed. Results indicate that our proposed model achieves a 65% lower error compared to physics-guided conditional Wasserstein generative adversarial networks (PG-cWGAN) when generating airfoils that satisfy a specific lift coefficient and a 99.99% lower error compared to airfoil pressure distributions generative adversarial networks (Airfoil-Cp-GAN) when generating airfoils that satisfy specific pressure distributions. This method introduces a more creative and accurate approach for aircraft designers in the realm of airfoil design. The code used for this paper is available at https://github.com/liujun39/airfoilvae.

ECAI Conference 2025 Conference Paper

AIRES: A General Framework for Efficient Intrinsic Rewards Based on Attention Mechanisms

  • Xin Liu
  • Jie Tan
  • Li Shen
  • Xu Wang
  • Guoli Wu
  • Xiaoguang Ren
  • Huadong Dai

Efficient exploration in high-dimensional observation spaces remains a critical challenge in deep reinforcement learning, particularly in scenarios with sparse extrinsic rewards. A promising approach is to encourage exploration by estimating intrinsic rewards based on the novelty of observations. However, there is a gap between the observed novelty and the actual effectiveness of exploration, as both environmental stochasticity and the agent’s actions may influence observations. To accurately evaluate the novelty contributed by agent exploration in intrinsic rewards, we propose the AIRES (Attention-driven Intrinsic Reward for Exploration Strategy) framework. AIRES leverages the attention mechanisms to analyze the relationship within trajectory sequences generated by agent-environment interactions, employing attention weights to quantify the relevance of observations to actions. By applying attention weights to intrinsic rewards, the novelty brought by agent exploration is enhanced and the impact of environmental stochasticity is reduced. Extensive experiments demonstrate that AIRES significantly enhances the performance of prominent intrinsic reward methods, establishing it as a robust and scalable solution for efficient exploration.

AAAI Conference 2025 Conference Paper

DiCA: Disambiguated Contrastive Alignment for Cross-Modal Retrieval with Partial Labels

  • Chao Su
  • Huiming Zheng
  • Dezhong Peng
  • Xu Wang

Cross-modal retrieval aims to retrieve relevant data across different modalities. Driven by costly massive labeled data, existing cross-modal retrieval methods achieve encouraging results. To reduce annotation costs while maintaining performance, this paper focuses on an untouched but challenging problem, i.e., cross-modal retrieval with partial labels (PLCMR). PLCMR faces the dual challenges of annotation ambiguity and modality gap. To address these challenges, we propose a novel method termed disambiguated contrastive alignment (DiCA) for cross-modal retrieval with partial labels. Specifically, DiCA proposes a novel non-candidate boosted disambiguation learning mechanism (NBDL), which elaborately balances the trade-off between the losses on candidate and non-candidate labels that eliminate label ambiguity and narrow the modality gap. Moreover, DiCA presents an instance-prototype representation learning mechanism (IPRL) to enhance the model by further eliminating the modality gap at both the instance and prototype levels. Thanks to NBDL and IPRL, our DiCA effectively addresses the issues of annotation ambiguity and modality gap for cross-modal retrieval with partial labels. Experiments on four benchmarks validate the effectiveness of our proposed method, which demonstrates enhanced performance over existing state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Drawing Informative Gradients from Sources: A One-stage Transfer Learning Framework for Cross-city Spatiotemporal Forecasting

  • Yudong Zhang
  • Xu Wang
  • Xuan Yu
  • Zhaoyang Sun
  • Kai Wang
  • Yang Wang

Spatiotemporal forecasting (STF) is pivotal in urban computing, yet data scarcity in developing cities hampers robust model training. Addressing this, recent studies leverage transfer learning to migrate knowledge from data-rich (source) to data-poor (target) cities. This strategy, while effective, faces challenges as pre-trained models risk absorbing noise and harmful information due to data distribution disparities, potentially undermining the accuracy of forecasts for target cities. To address this issue, we propose a one-stage STF framework named Target-Skewed Joint Training (TSJT). Central to TSJT is a novel Target-Skewed Backward training strategy that selectively refines gradients from source city data, preserving only the elements that positively impact the target city. To further enhance the quality of these gradients, we have designed a Node Prompting Module (NPM). TSJT is crafted for seamless integration with existing STF models, endowing them with the capability to efficiently tackle challenges stemming from data scarcity. Experimental results on several real-world datasets from multiple cities substantiate the efficacy of TSJT in the realm of cross-city transfer learning.

ICLR Conference 2025 Conference Paper

Efficient and Robust Neural Combinatorial Optimization via Wasserstein-Based Coresets

  • Xu Wang
  • Fuyou Miao 0001
  • Wenjie Liu 0008
  • Yan Xiong 0001

Combinatorial optimization (CO) is a fundamental tool in many fields. Many neural combinatorial optimization (NCO) methods have been proposed to solve CO problems. However, existing NCO methods typically require significant computational and storage resources, and face challenges in maintaining robustness to distribution shifts between training and test data. To address these issues, we model CO instances into probability measures, and introduce Wasserstein-based metrics to quantify the difference between CO instances. We then leverage a popular data compression technique, the coreset, to construct a small-size proxy for the original large dataset. However, the time complexity of constructing a coreset is linearly dependent on the size of the dataset. Consequently, it becomes challenging when datasets are particularly large. Further, we accelerate the coreset construction by adapting it to the merge-and-reduce framework, enabling parallel computing. Additionally, we prove that our coreset is a good representation in theory. Subsequently, to speed up the training process for existing NCO methods, we propose an efficient training framework based on the coreset technique. We train the model on a small-size coreset rather than on the full dataset, and thus save substantial computational and storage resources. Inspired by hierarchical Gonzalez’s algorithm, our coreset method is designed to capture the diversity of the dataset, which consequently improves robustness to distribution shifts. Finally, experimental results demonstrate that our training framework not only enhances robustness to distribution shifts but also achieves better performance with reduced resource requirements.
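The Gonzalez's algorithm the abstract mentions is the classic greedy farthest-point heuristic for k-center; a flat, generic-distance sketch (the paper's version is hierarchical and uses Wasserstein-based metrics between CO instances):

```python
def gonzalez_kcenter(points, k, dist):
    """Greedy farthest-point selection: repeatedly add the point that is
    farthest from its nearest already-chosen center (a 2-approximation
    for the k-center objective)."""
    centers = [points[0]]  # seed with an arbitrary point
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers

# On a 1-D toy set, the outlier at 10.0 is picked as the second center.
print(gonzalez_kcenter([0.0, 1.0, 2.0, 10.0], 2, lambda a, b: abs(a - b)))
```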

AAAI Conference 2025 Conference Paper

Fast Omni-Directional Image Super-Resolution: Adapting the Implicit Image Function with Pixel and Semantic-Wise Spherical Geometric Priors

  • Xuelin Shen
  • Yitong Wang
  • Silin Zheng
  • Kang Xiao
  • Wenhan Yang
  • Xu Wang

In the context of Omni-Directional Image (ODI) Super-Resolution (SR), the unique challenge arises from the non-uniform oversampling characteristics caused by EquiRectangular Projection (ERP). Considerable efforts in designing complex spherical convolutions or polyhedron reprojection offer significant performance improvements but at the expense of cumbersome processing procedures and slower inference speeds. Under these circumstances, this paper proposes a new ODI-SR model characterized by its capacity to perform Fast and Arbitrary-scale ODI-SR processes, denoted as FAOR. The key innovation lies in adapting the implicit image function from the planar image domain to the ERP image domain by incorporating spherical geometric priors at both the latent representation and image reconstruction stages, in a low-overhead manner. Specifically, at the latent representation stage, we adopt a pair of pixel-wise and semantic-wise sphere-to-planar distortion maps to perform affine transformations on the latent representation, thereby incorporating it with spherical properties. Moreover, during the image reconstruction stage, we introduce a geodesic-based resampling strategy, aligning the implicit image function with spherical geometry without introducing additional parameters. As a result, the proposed FAOR outperforms the state-of-the-art ODI-SR models with a much faster inference speed. Extensive experimental results and ablation studies have demonstrated the effectiveness of our design.

AAAI Conference 2025 Conference Paper

GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation

  • Xu Wang
  • Zilei Wang
  • Zihan Lin

Incremental object detection (IOD) is a challenging task that requires detection models to continuously learn from newly arriving data. This work focuses on incremental learning for vision-language detectors (VLDs), an underexplored domain. Existing research typically adopts a local alignment paradigm to avoid label conflicts, where different tasks are learned separately without interaction. However, we reveal that this practice fails to effectively preserve the semantic structure. Specifically, aligned relationships between objects and texts would collapse when handling novel categories, ultimately leading to catastrophic forgetting. Though knowledge distillation (KD) is a common approach for tackling this, traditional KD performs poorly when directly applied to VLDs, as a natural knowledge gap exists between phases in both the encoding and decoding processes. To address the above issues, we propose a novel method called Global alignment and Correspondence Distillation (GCD). Differently, we first integrate knowledge across phases within the same embedding space to construct a global semantic structure. We then enable effective knowledge distillation in VLDs through a semantic correspondence mechanism, ensuring consistent proposal generation and decoding. On top of that, we distill the teacher model’s informative predictions and topological relationships to maintain a stable local semantic structure. Extensive experiments on COCO 2017 demonstrate that our method significantly outperforms existing approaches, achieving new state-of-the-art results in various IOD scenarios.

NeurIPS Conference 2025 Conference Paper

Less but More: Linear Adaptive Graph Learning Empowering Spatiotemporal Forecasting

  • Jiaming Ma
  • Binwu Wang
  • Guanjun Wang
  • Kuo Yang
  • Zhengyang Zhou
  • Pengkun Wang
  • Xu Wang
  • Yang Wang

The effectiveness of Spatiotemporal Graph Neural Networks (STGNNs) critically hinges on the quality of the underlying graph topology. While end-to-end adaptive graph learning methods have demonstrated promising results in capturing latent spatiotemporal dependencies, they often suffer from high computational complexity and limited expressive capacity. In this paper, we propose MAGE for efficient spatiotemporal forecasting. We first conduct a theoretical analysis demonstrating that the ReLU activation function employed in existing methods amplifies edge-level noise during graph topology learning, thereby compromising the fidelity of the learned graph structures. To enhance model expressiveness, we introduce a sparse yet balanced mixture-of-experts strategy, where each expert perceives the unique underlying graph through kernel-based functions and operates with linear complexity relative to the number of nodes. The sparsity mechanism ensures that each node interacts exclusively with compatible experts, while the balancing mechanism promotes uniform activation across all experts, enabling diverse and adaptive graph representations. Furthermore, we theoretically establish that a single graph convolution using the learned graph in MAGE is mathematically equivalent to multiple convolutional steps under conventional graphs. We evaluate MAGE against advanced baselines on multiple real-world spatiotemporal datasets. MAGE achieves competitive performance while maintaining strong computational efficiency.

JBHI Journal 2025 Journal Article

Multi-Gate Mixture of Multi-View Graph Contrastive Learning on Electronic Health Record

  • Yu Cao
  • Qian Wang
  • Xu Wang
  • Dezhong Peng
  • Peilin Li

Electronic Health Record (EHR) is the digital form of patient visits that contains various medical data, including diagnosis, treatment, and lab events. Representation learning of EHR with deep learning methods has been beneficial for patient-related prediction tasks. Recently, studies have focused on revealing the inherent graph structure between medical events in EHR. Graph neural network (GNN) methods are prevalent and perform well in various prediction tasks. However, the inherent relationships between various medical events must be manually marked, which is complicated and time-consuming. Most research works adopt the straightforward structure of GNN models on a single prediction task, which cannot fully exploit the potential of EHR representations. Compared with previous work, multi-task prediction can utilize the latent information of concealed correlations between different prediction tasks. In addition, self-contrastive learning on graphs can improve the representation learned by GNN. We propose a multi-gate mixture of multi-view graph contrastive learning (MMMGCL) method, aiming to get a more reasonable EHR representation and improve the performance of downstream tasks. First, each patient visit is represented as a graph with a well-designed hierarchically fully-connected pattern. Second, node features in the manually constructed graph are pre-trained via the GloVe method with hierarchical ontology knowledge. Finally, MMMGCL processes the pre-trained graph and adopts a joint learning strategy to simultaneously optimize task and contrastive losses. We verify our method on two large open-source medical datasets, the Medical Information Mart for Intensive Care (MIMIC-III) and the eICU Collaborative Research Database (eICU). Experiment results show that our method improves performance compared to straightforward graph-based methods on prediction tasks of patient readmission, mortality, and length of stay.

EAAI Journal 2025 Journal Article

Multisource aerodynamic data reconstruction method using an enhanced multifidelity neural network

  • Xu Wang
  • Huailu Li
  • Haitao Lin
  • Hui Tang
  • Weiwei Zhang

The acquisition of aircraft aerodynamic pressure distribution typically relies on wind tunnel tests or numerical simulations. However, discrepancies in accuracy and efficiency between multisource data present challenges for aerodynamic data reconstruction. In the present work, we propose a novel multifidelity architecture to enhance the model’s applicability to nonlinear inconsistency problems commonly encountered in transonic aerodynamic problems. By incorporating difference operations into the multifidelity neural network, the model can adaptively find suitable mapping relationships from potential low fidelity data. This method can reduce modeling errors when there are trend inconsistencies between high and low fidelity data, which is a challenge that traditional multifidelity models struggle to address. To demonstrate the efficiency of our proposed ideas, we conducted multifidelity modeling on classic numerical examples and aerodynamic cases. Predictive results indicate that inconsistencies between multifidelity data can significantly affect traditional models, whereas the proposed multifidelity approach can enhance generalization performance. The reconstruction results of transonic pressure distribution for airfoils and the Office National d’Études et de Recherches Aérospatiales (ONERA) M6 wing indicate that the enhanced multifidelity model can effectively capture shock wave locations, thereby improving modeling accuracy. Analysis of the reconstruction results for pressure distribution indicates that the proposed method can reduce reconstruction error by over 30% compared to deep neural networks and multifidelity neural networks. This method is also applicable for the data fusion of experimental and simulation data in various engineering problems.

NeurIPS Conference 2025 Conference Paper

Neighbor-aware Contrastive Disambiguation for Cross-Modal Hashing with Redundant Annotations

  • Chao Su
  • Likang Peng
  • Yuan Sun
  • Dezhong Peng
  • Xi Peng
  • Xu Wang

Cross-modal hashing aims to efficiently retrieve information across different modalities by mapping data into compact hash codes. However, most existing methods assume access to fully accurate supervision, which rarely holds in real-world scenarios. In fact, annotations are often redundant, i.e., each sample is associated with a set of candidate labels that includes both ground-truth labels and redundant noisy labels. Treating all annotated labels as equally valid introduces two critical issues: (1) the sparse presence of true labels within the label set is not explicitly addressed, leading to overfitting on redundant noisy annotations; (2) redundant noisy labels induce spurious similarities that distort semantic alignment across modalities and degrade the quality of the hash space. To address these challenges, we propose that effective cross-modal hashing requires explicitly identifying and leveraging the true label subset within all candidate annotations. Based on this insight, we present Neighbor-aware Contrastive Disambiguation (NACD), a novel framework designed for robust learning under redundant supervision. NACD consists of two key components. The first, Neighbor-aware Confidence Reconstruction (NACR), refines label confidence by aggregating information from cross-modal neighbors to distinguish true labels from redundant noisy ones. The second, Class-aware Robust Contrastive Hashing (CRCH), constructs reliable positive and negative pairs based on label confidence scores, thereby significantly enhancing robustness against noisy supervision. Moreover, to effectively reduce the quantization error, we incorporate a quantization loss that enforces binary constraints on the learned hash representations. Extensive experiments conducted on three large-scale multimodal benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches, thereby establishing a new standard for cross-modal hashing with redundant annotations. Code is available at https://github.com/Rose-bud/NACD.
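The Neighbor-aware Confidence Reconstruction idea can be illustrated with a minimal sketch (an assumed form; the paper's actual NACR aggregates cross-modal neighbors in a learned space): each sample's confidence vector is refined by a neighbor vote, masked to its candidate label set, and renormalized.

```python
import numpy as np

# Hedged sketch of neighbor-based label-confidence refinement.
def refine_confidence(conf, candidates, neighbors):
    """conf: (n, c) confidence over classes; candidates: (n, c) 0/1 candidate
    mask; neighbors: per-sample list of neighbor indices."""
    out = np.zeros_like(conf)
    for i, nbrs in enumerate(neighbors):
        agg = conf[nbrs].mean(axis=0) * candidates[i]  # neighbor vote, masked
        out[i] = agg / max(agg.sum(), 1e-12)           # renormalize
    return out

conf = np.array([[0.5, 0.5, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.8, 0.1, 0.1]])
cand = np.array([[1, 1, 0],
                 [1, 1, 0],
                 [1, 0, 1]])
nbrs = [[1, 2], [0, 2], [0, 1]]
refined = refine_confidence(conf, cand, nbrs)
```

Mass can only land on candidate labels, so a label outside the candidate set keeps zero confidence regardless of what the neighbors vote.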

NeurIPS Conference 2025 Conference Paper

ReinAD: Towards Real-world Industrial Anomaly Detection with a Comprehensive Contrastive Dataset

  • Xu Wang
  • Jingyuan Zhuo
  • Zhiyuan You
  • Zhiyu Tan
  • Yikuan Yu
  • Siyu Wang
  • Xinyi Le

Recent years have witnessed significant advancements in industrial anomaly detection (IAD) thanks to existing anomaly detection datasets. However, the large performance gap between these benchmarks and real industrial practice reveals critical limitations in existing datasets. We argue that the mismatch between current datasets and real industrial scenarios becomes the primary barrier to practical IAD deployment. To this end, we propose ReinAD dataset, a comprehensive contrastive dataset towards Real-world industrial Anomaly Detection. Our dataset prioritizes three critical real-world requirements: 1) Contrast-based anomaly definition that is essential for industrial practice, 2) Fine-grained unaligned image pairs reflecting real inspections, and 3) Large-scale data from active production lines spanning multiple industrial categories. Based on our dataset, we introduce the ReinADNet. It takes both normal reference and test images as inputs, achieving anomaly detection through normal-anomaly comparison. To address the fine-grained and unaligned properties of real industrial scenes, our method integrates pyramidal similarity aggregation for comprehensive anomaly characterization and global-local feature fusion for spatial misalignment tolerance. Our method outperforms all baselines on the ReinAD dataset (e.g., 64.5% vs. 59.5% in 1-shot image-level AP) under all settings. Extensive experiments across several datasets demonstrate our dataset's challenging nature and our method's superior generalization. This work provides a solid foundation for practical industrial anomaly detection. Dataset and code are available at https://tocmac.github.io/ReinAD.

AAAI Conference 2025 Conference Paper

RoDA: Robust Domain Alignment for Cross-Domain Retrieval Against Label Noise

  • Ziniu Yin
  • Yanglin Feng
  • Ming Yan
  • Xiaomin Song
  • Dezhong Peng
  • Xu Wang

This paper studies the complex challenge of cross-domain image retrieval under the condition of noisy labels (NCIR), a scenario that not only includes the inherent obstacles of traditional cross-domain image retrieval (CIR) but also requires alleviating the adverse effects of label noise. To address this challenge, this paper introduces a novel Robust Domain Alignment framework (RoDA), specifically designed for the NCIR task. At the heart of RoDA is the Selective Division and Adaptive Learning mechanism (SDAL), a key component crafted to shield the model from overfitting the noisy labels. SDAL effectively learns discriminative knowledge by dividing the dataset into clean and noisy parts, subsequently rectifying the labels for the latter based on information drawn from the clean one. This process involves adaptively weighting the relabeled samples and leveraging both the clean and relabeled data to bootstrap model training. Moreover, to bridge the domain gap further, we introduce the Accumulative Class Center Alignment (ACCA), a novel approach that fosters domain alignment through an accumulative domain loss mechanism. Thanks to SDAL and ACCA, our RoDA demonstrates its superiority in overcoming label noise and domain discrepancies within the NCIR paradigm. The effectiveness and robustness of our RoDA framework are comprehensively validated through extensive experiments across three multi-domain benchmarks.

IJCAI Conference 2025 Conference Paper

Time-Frequency Disentanglement Boosted Pre-Training: A Universal Spatio-Temporal Modeling Framework

  • Yudong Zhang
  • Zhaoyang Sun
  • Xu Wang
  • Xuan Yu
  • Kai Wang
  • Yang Wang

Current spatio-temporal modeling techniques largely rely on the abundant data and the design of task-specific models. However, many cities lack well-established digital infrastructures, making data scarcity and the high cost of model development significant barriers to application deployment. Therefore, this work aims to enable spatio-temporal learning to cope with the problems of few-shot data modeling and model generalizability. To this end, we propose a Universal Spatio-Temporal Correlationship pre-training framework (USTC), for spatio-temporal modeling across different cities and tasks. To enhance the spatio-temporal representations during pre-training, we propose to decouple the time-frequency patterns within data, and leverage contrastive learning to maintain the time-frequency consistency. To further improve the adaptability to downstream tasks, we design a prompt generation module to mine personalized spatio-temporal patterns on the target city, which can be integrated with the learned common spatio-temporal representations to collaboratively serve downstream tasks. Extensive experiments conducted on real-world datasets demonstrate that USTC significantly outperforms the advanced baselines in forecasting, imputation, and extrapolation across cities.
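Time-frequency decoupling can be sketched with a plain FFT mask (an assumed simplification of USTC's module, for illustration only): a series is split into a low-frequency "trend-like" part and a high-frequency residual, and the two parts sum back to the original signal.

```python
import numpy as np

# Minimal sketch of time-frequency decoupling via an FFT low-pass mask.
def freq_split(x, keep=3):
    spec = np.fft.rfft(x)
    low_spec = spec.copy()
    low_spec[keep:] = 0.0                   # keep only the lowest frequency bins
    low = np.fft.irfft(low_spec, n=len(x))  # slow component
    return low, x - low                     # (low-freq part, high-freq residual)

t = np.linspace(0, 1, 128, endpoint=False)
x = np.sin(2 * np.pi * t) + 0.2 * np.sin(2 * np.pi * 30 * t)
low, high = freq_split(x, keep=4)
```

On this synthetic signal the low branch isolates the 1 Hz component and the residual carries the 30 Hz component, which is the kind of disentangled pair a contrastive consistency loss could then compare.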

NeurIPS Conference 2025 Conference Paper

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

  • Bowen Chen
  • Brynn zhao
  • Haomiao Sun
  • Li Chen
  • Xu Wang
  • Daniel Du
  • Xinglong Wu

Achieving fine-grained control over subject identity and semantic attributes (pose, style, lighting) in text-to-image generation, particularly for multiple subjects, often undermines the editability and coherence of Diffusion Transformers (DiTs). Many approaches introduce artifacts or suffer from attribute entanglement. To overcome these challenges, we propose a novel multi-subject controlled generation model XVerse. By transforming reference images into offsets for token-specific text-stream modulation, XVerse allows for precise and independent control for specific subject without disrupting image latents or features. Consequently, XVerse offers high-fidelity, editable multi-subject image synthesis with robust control over individual subject characteristics and semantic attributes. This advancement significantly improves personalized and complex scene generation capabilities.

TIST Journal 2024 Journal Article

A Survey on Evaluation of Large Language Models

  • Yupeng Chang
  • Xu Wang
  • Jindong Wang
  • Yuan Wu
  • Linyi Yang
  • Kaijie Zhu
  • Hao Chen
  • Xiaoyuan Yi

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the ‘where’ and ‘how’ questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey

EAAI Journal 2024 Journal Article

An estimation method for multidimensional urban street walkability based on panoramic semantic segmentation and domain adaptation

  • Jiaxuan Li
  • Xuan Zhang
  • Linyu Li
  • Xu Wang
  • Jing Cheng
  • Chen Gao
  • Jun Ling

Urban walkability is a critical aspect of urban planning. Since traditional measurement methods are constrained by cost, time, and scalability, researchers have turned to computer-assisted audits based on Panoramic Street View Images (PSVIs). However, these audits often overlook image distortion and data annotation issues, impacting predictive accuracy. Current evaluations often lack a holistic approach, making comparison challenging. In response, a multidimensional evaluation approach is proposed through three indices: street red quality, physical walkability, and perceived walkability. To enhance accuracy, a Transformer-based Doubly Deformable Panoramic semantic segmentation Network (TDDPassNet) is introduced to calculate key metrics informing ecological quality and spatial layout evaluations. An unsupervised domain adaptation method is proposed to address insufficient labeled data. Furthermore, a Geographic Information System (GIS) analysis was conducted to assess the physical walkability index. Human–machine adversarial technology and a random forest model evaluate the perceived walkability index. A comprehensive evaluation framework is presented using the Analytic Hierarchy Process to assign weights to assessment indices across the three dimensions. A case study was conducted in Lijiang City, China, to demonstrate the practical application of the methodology. Extensive experiments show that TDDPassNet exhibits an average mIoU improvement of 5.3% across diverse datasets compared to prevailing models. This study evaluated 47,758 sampling sites, providing insights into urban planning and development in similar contexts.
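The Analytic Hierarchy Process weighting step can be sketched as the standard principal-eigenvector computation (the pairwise-comparison matrix below is hypothetical, not the paper's):

```python
import numpy as np

# Hedged AHP sketch: index weights are the normalized principal
# eigenvector of a pairwise-comparison matrix.
def ahp_weights(pairwise):
    vals, vecs = np.linalg.eig(pairwise)
    k = np.argmax(vals.real)                # principal eigenvalue
    w = np.abs(vecs[:, k].real)             # Perron vector is sign-consistent
    return w / w.sum()                      # normalize to sum to 1

# Hypothetical comparisons among the three walkability dimensions.
A = np.array([[1.0,   2.0, 3.0],
              [1/2.0, 1.0, 2.0],
              [1/3.0, 1/2.0, 1.0]])
w = ahp_weights(A)
```

In practice the consistency ratio of the comparison matrix would also be checked before using the weights.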

AAAI Conference 2024 Conference Paper

AvatarVerse: High-Quality & Stable 3D Avatar Creation from Text and Pose

  • Huichao Zhang
  • Bowen Chen
  • Hao Yang
  • Liao Qu
  • Xu Wang
  • Li Chen
  • Chao Long
  • Feida Zhu

Creating expressive, diverse and high-quality 3D avatars from highly customized text descriptions and pose guidance is a challenging task, due to the intricacy of modeling and texturing in 3D that ensure details and various styles (realistic, fictional, etc). We present AvatarVerse, a stable pipeline for generating expressive high-quality 3D avatars from nothing but text descriptions and pose guidance. Specifically, we introduce a 2D diffusion model conditioned on DensePose signal to establish 3D pose control of avatars through 2D images, which enhances view consistency from partially observed scenarios. It addresses the infamous Janus Problem and significantly stabilizes the generation process. Moreover, we propose a progressive high-resolution 3D synthesis strategy, which obtains substantial improvement over the quality of the created 3D avatars. To this end, the proposed AvatarVerse pipeline achieves zero-shot 3D modeling of 3D avatars that are not only more expressive, but also in higher quality and fidelity than previous works. Rigorous qualitative evaluations and user studies showcase AvatarVerse's superiority in synthesizing high-fidelity 3D avatars, leading to a new standard in high-quality and stable 3D avatar creation. Our project page is: https://avatarverse3d.github.io/.

EAAI Journal 2024 Journal Article

CFENet: Cost-effective underwater image enhancement network via cascaded feature extraction

  • Xun Ji
  • Xu Wang
  • Li-Ying Hao
  • Cheng-Tao Cai

Due to the inevitable light absorption and scattering, underwater images always suffer from severe quality degradation, leading to significant performance decline for various maritime engineering-related applications. In recent years, deep neural networks (DNNs) have been proven to achieve high-quality enhancement of underwater images, which aims to extract and learn abstract features to exhibit superior performance. However, existing solutions often stack multilayer processing units to enrich features, which not only significantly burdens computational load but also encounters difficulties in fully utilizing and interacting between features at different depths. To address the above problems, this paper presents a cost-effective underwater image enhancement network via cascaded feature extraction, termed CFENet. Specifically, we develop a cost-effective cascaded structure for sufficient feature extraction and multiscale establishment of pixel-level long-range dependencies. In addition, we construct a dual-branch structure for effective feature fusion, so that the detailed texture and semantic information in the image can be simultaneously enhanced. Extensive experiments reveal the superiority of our proposed CFENet in both underwater image enhancement effects and computational complexity. Sufficient ablation study is conducted to demonstrate the effectiveness of each component in the network.

AAAI Conference 2024 Conference Paper

DiDA: Disambiguated Domain Alignment for Cross-Domain Retrieval with Partial Labels

  • Haoran Liu
  • Ying Ma
  • Ming Yan
  • Yingke Chen
  • Dezhong Peng
  • Xu Wang

Driven by generative AI and the Internet, there is an increasing availability of a wide variety of images, leading to the significant and popular task of cross-domain image retrieval. To reduce annotation costs and increase performance, this paper focuses on an untouched but challenging problem, i.e., cross-domain image retrieval with partial labels (PCIR). Specifically, PCIR faces great challenges due to the ambiguous supervision signal and the domain gap. To address these challenges, we propose a novel method called disambiguated domain alignment (DiDA) for cross-domain retrieval with partial labels. In detail, DiDA elaborates a novel prototype-score unitization learning mechanism (PSUL) to extract common discriminative representations by simultaneously disambiguating the partial labels and narrowing the domain gap. Additionally, DiDA proposes a prototype-based domain alignment mechanism (PBDA) to further bridge the inherent cross-domain discrepancy. Attributed to PSUL and PBDA, our DiDA effectively excavates domain-invariant discrimination for cross-domain image retrieval. We demonstrate the effectiveness of DiDA through comprehensive experiments on three benchmarks, comparing it to existing state-of-the-art methods. Code available: https://github.com/lhrrrrrr/DiDA.

EAAI Journal 2024 Journal Article

Electric bikes charging anomaly detection from alternating current side based on big data

  • Fang Yang
  • Yang Yang
  • Xu Wang
  • Xin Ouyang
  • Chunyan Shuai

With the widespread use of electric bikes (E-bikes), charging safety incidents occur frequently, even causing serious hazards. However, detecting and warning of unsafe charging from the E-bike side is challenging due to the lack of full-featured battery management systems and of communication between vehicles and chargers in the majority of E-bikes. To address this, a diagnosis scheme is proposed to detect E-bikes’ abnormal charging from the alternating current (AC) side of the charging pile. Initially, 91,282 charging records are collected from charging piles to analyze in depth the correlations between the current features and the battery working principle, charging mode, and user behavior. Then, ten current features and six feature sequences are formulated, and two algorithms based on first-order difference and pattern matching are proposed to recognize and extract these features and feature sequences. A feature-based random forest model is presented to identify abnormal charging. Empirical studies show that the anomaly recognition performance of the proposed framework exceeds that of the baselines, achieving a recognition precision of 0.89 and an F1-score of 0.86. The application of this scheme can provide early warning of unsafe charging from the charging pile side without modification of existing E-bikes, and can be extended to diagnose the charging safety of other battery-powered systems, such as electric vehicles. Meanwhile, the analysis of internal and external factors that lead to abnormal charging can help charging operation companies and governments develop charging security specifications and regulate charging behaviors.
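The first-order-difference idea can be sketched on a synthetic AC current trace (an illustrative fragment, not the paper's full feature set or classifier): abrupt changes in the current mark transitions between charging stages.

```python
import numpy as np

# Hedged sketch: segment a charging-current series by locating abrupt
# changes with a first-order difference and a threshold.
def change_points(current, thresh):
    diff = np.diff(current)                  # first-order difference
    return np.where(np.abs(diff) > thresh)[0] + 1

# Synthetic current trace: a constant-current stage, then a tapering stage.
cur = np.concatenate([np.full(50, 2.0), np.linspace(1.2, 0.1, 50)])
cps = change_points(cur, thresh=0.5)
```

Segment boundaries like these would then feed downstream features (stage durations, slopes) for a model such as the paper's random forest.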

NeurIPS Conference 2024 Conference Paper

Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework

  • Zhongchao Yi
  • Zhengyang Zhou
  • Qihe Huang
  • Yanjiang Chen
  • Liheng Yu
  • Xu Wang
  • Yang Wang

Spatiotemporal learning has become a pivotal technique to enable urban intelligence. Traditional spatiotemporal models mostly focus on a specific task by assuming the same distribution between training and testing sets. However, given that urban systems are usually dynamic, multi-sourced with imbalanced data distributions, current task-specific models fail to generalize to new urban conditions and adapt to new domains without explicitly modeling interdependencies across various dimensions and types of urban data. To this end, we argue that it is essential to propose a Continuous Multi-task Spatio-Temporal learning framework (CMuST) to empower collective urban intelligence, which reforms the urban spatiotemporal learning from single-domain to cooperatively multi-dimensional and multi-task learning. Specifically, CMuST proposes a new multi-dimensional spatiotemporal interaction network (MSTI) to allow cross-interactions between context and main observations as well as self-interactions within spatial and temporal aspects to be exposed, which is also the core for capturing task-level commonality and personalization. To ensure continuous task learning, a novel Rolling Adaptation training scheme (RoAda) is devised, which not only preserves task uniqueness by constructing data summarization-driven task prompts, but also harnesses correlated patterns among tasks by iterative model behavior modeling. We further establish a benchmark of three cities for multi-task spatiotemporal learning, and empirically demonstrate the superiority of CMuST via extensive evaluations on these datasets. Impressive improvements over existing SOTA methods are achieved on both few-shot streaming data and new-domain tasks. Code is available at https://github.com/DILab-USTCSZ/CMuST.

NeurIPS Conference 2024 Conference Paper

LESS: Label-Efficient and Single-Stage Referring 3D Segmentation

  • Xuexun Liu
  • Xiaoxu Xu
  • Jinlong Li
  • Qiudan Zhang
  • Xu Wang
  • Nicu Sebe
  • Lin Ma

Referring 3D Segmentation is a visual-language task that segments all points of the specified object from a 3D point cloud described by a query sentence. Previous works perform a two-stage paradigm, first conducting language-agnostic instance segmentation and then matching with the given text query. However, the semantic concepts from the text query and the visual cues interact only separately during training, and both instance and semantic labels for each object are required, which is time-consuming and labor-intensive. To mitigate these issues, we propose a novel Referring 3D Segmentation pipeline, Label-Efficient and Single-Stage, dubbed LESS, which requires only supervision from an efficient binary mask. Specifically, we design a Point-Word Cross-Modal Alignment module for aligning the fine-grained features of points and textual embedding. Query Mask Predictor module and Query-Sentence Alignment module are introduced for coarse-grained alignment between masks and query. Furthermore, we propose an area regularization loss, which coarsely reduces irrelevant background predictions on a large scale. Besides, a point-to-point contrastive loss is proposed concentrating on distinguishing points with subtly similar features. Through extensive experiments, we achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU using only binary labels. Code is available at https://github.com/mellody11/LESS.

IJCAI Conference 2024 Conference Paper

NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli

  • Xu Wang
  • Cheng Li
  • Yi Chang
  • Jindong Wang
  • Yuan Wu

Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of 45 tasks. The results are revealing: NegativePrompt markedly enhances the performance of LLMs, evidenced by relative improvements of 12.89% in Instruction Induction tasks and 46.25% in BIG-Bench tasks. Moreover, we conduct attention visualization experiments to decipher the underlying mechanisms of NegativePrompt's influence. Our research contributes significantly to the understanding of LLMs and emotion interaction, demonstrating the practical efficacy of NegativePrompt as an emotion-driven method and offering novel insights for the enhancement of LLMs in real-world applications. The code is available at https://github.com/wangxu0820/NegativePrompt.
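The mechanics of stimulus-augmented prompting are simple to sketch. The stimulus text below is a made-up placeholder, not one of the paper's ten stimuli, and the paper may position stimuli differently than this prepend-only sketch:

```python
# Hedged sketch of emotion-augmented prompting: attach an emotional
# stimulus to the original task instruction before querying an LLM.
NEGATIVE_STIMULI = [
    # Hypothetical placeholder, NOT taken from the paper.
    "If you fail at this task, the consequences will be serious.",
]

def negative_prompt(task_prompt, stimulus_id=0):
    # Prepend the chosen stimulus to the task instruction.
    return f"{NEGATIVE_STIMULI[stimulus_id]} {task_prompt}"

p = negative_prompt("Translate 'bonjour' to English.")
```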

AAAI Conference 2024 Conference Paper

NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching

  • Hongbo Zhang
  • Guang Wang
  • Xu Wang
  • Zhengyang Zhou
  • Chen Zhang
  • Zheng Dong
  • Yang Wang

One of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Recent order dispatching has achieved a significant improvement due to the advance of reinforcement learning, which has been proven to effectively address sequential decision-making problems like order dispatching. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with environments online, which is challenging or impractical for real-world deployment due to high costs or safety concerns. For example, due to the spatiotemporally unbalanced supply and demand, online reinforcement learning-based order dispatching may significantly impact the revenue of the ride-hailing platform and passenger experience during the policy learning period. Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns policy from only the accumulated logged data to avoid costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce the algorithm extrapolation error and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is utilized to improve the model generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.
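The Random Ensemble Mixture idea can be sketched as mixing K Q-heads with random convex weights, so each update sees a different mixture of value estimates (an assumed simplified form: one state, no training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hedged REM sketch: a random convex combination of per-head Q-values.
def rem_q(q_heads, rng):
    """q_heads: (K, n_actions) Q-values of K heads for one state."""
    alphas = rng.random(len(q_heads))
    alphas /= alphas.sum()                  # random convex weights
    return alphas @ q_heads                 # mixed Q-values, shape (n_actions,)

# Three hypothetical heads over two actions.
q_heads = np.array([[1.0, 2.0],
                    [3.0, 0.0],
                    [2.0, 2.0]])
mixed = rem_q(q_heads, rng)
```

Because the weights are convex, the mixed estimate always stays inside the per-action range spanned by the heads, which is what gives the ensemble its regularizing effect.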

AAAI Conference 2024 Conference Paper

Towards Dynamic Spatial-Temporal Graph Learning: A Decoupled Perspective

  • Binwu Wang
  • Pengkun Wang
  • Yudong Zhang
  • Xu Wang
  • Zhengyang Zhou
  • Lei Bai
  • Yang Wang

With the progress of urban transportation systems, a significant amount of high-quality traffic data is continuously collected through streaming manners, which has propelled the prosperity of the field of spatial-temporal graph prediction. In this paper, rather than solely focusing on designing powerful models for static graphs, we shift our focus to spatial-temporal graph prediction in the dynamic scenario, which involves a continuously expanding and evolving underlying graph. To address inherent challenges, a decoupled learning framework (DLF) is proposed in this paper, which consists of a spatial-temporal graph learning network (DSTG) with a specialized decoupling training strategy. Incorporating inductive biases of time-series structures, DSTG can interpret time dependencies into latent trend and seasonal terms. To enable prompt adaptation to the evolving distribution of the dynamic graph, our decoupling training strategy is devised to iteratively update these two types of patterns. Specifically, for learning seasonal patterns, we conduct thorough training for the model using a long time series (e.g., three months of data). To enhance the learning ability of the model, we also introduce the masked auto-encoding mechanism. During this period, we frequently update trend patterns to expand new information from dynamic graphs. Considering both effectiveness and efficiency, we develop a subnet sampling strategy to select a few representative nodes for fine-tuning the weights of the model. These sampled nodes cover unseen patterns and previously learned patterns. Experiments on dynamic spatial-temporal graph datasets further demonstrate the competitive performance, superior efficiency, and strong scalability of the proposed framework.
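The trend/seasonal split that DSTG's inductive bias refers to can be sketched with a plain moving-average decomposition (an assumed textbook form, not the paper's network):

```python
import numpy as np

# Hedged sketch: split a series into a smooth trend term (moving average)
# and a seasonal residual; the two terms sum back to the input.
def decompose(x, window):
    left = window // 2
    right = window - 1 - left
    padded = np.pad(x, (left, right), mode="edge")   # edge-pad to keep length
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")  # moving-average trend
    return trend, x - trend                            # seasonal residual

t = np.arange(64, dtype=float)
x = 0.1 * t + np.sin(2 * np.pi * t / 8)   # linear trend + period-8 season
trend, season = decompose(x, window=8)
```

With the window matched to the seasonal period, the average cancels the sinusoid in the interior, leaving (a half-sample-shifted copy of) the linear trend.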

AAAI Conference 2023 Conference Paper

Correspondence-Free Domain Alignment for Unsupervised Cross-Domain Image Retrieval

  • Xu Wang
  • Dezhong Peng
  • Ming Yan
  • Peng Hu

Cross-domain image retrieval aims at retrieving images across different domains to excavate cross-domain classificatory or correspondence relationships. This paper studies a less-touched problem of cross-domain image retrieval, i.e., unsupervised cross-domain image retrieval, considering the following practical assumptions: (i) no correspondence relationship, and (ii) no category annotations. It is challenging to align and bridge distinct domains without cross-domain correspondence. To tackle the challenge, we present a novel Correspondence-free Domain Alignment (CoDA) method to effectively eliminate the cross-domain gap through In-domain Self-matching Supervision (ISS) and Cross-domain Classifier Alignment (CCA). To be specific, ISS is presented to encapsulate discriminative information into the latent common space by elaborating a novel self-matching supervision mechanism. To alleviate the cross-domain discrepancy, CCA is proposed to align distinct domain-specific classifiers. Thanks to the ISS and CCA, our method could encode the discrimination into the domain-invariant embedding space for unsupervised cross-domain image retrieval. To verify the effectiveness of the proposed method, extensive experiments are conducted on four benchmark datasets compared with six state-of-the-art methods.

TIST Journal 2023 Journal Article

Obfuscating the Dataset: Impacts and Applications

  • Guangsheng Yu
  • Xu Wang
  • Caijun Sun
  • Ping Yu
  • Wei Ni
  • Ren Ping Liu

Obfuscating a dataset by adding random noises to protect the privacy of sensitive samples in the training dataset is crucial to prevent data leakage to untrusted parties when dataset sharing is essential. We conduct comprehensive experiments to investigate how the dataset obfuscation can affect the resultant model weights—in terms of the model accuracy, ℓ2-distance-based model distance, and level of data privacy—and discuss the potential applications with the proposed Privacy, Utility, and Distinguishability (PUD)-triangle diagram to visualize the requirement preferences. Our experiments are based on the popular MNIST and CIFAR-10 datasets under both independent and identically distributed (IID) and non-IID settings. Significant results include a tradeoff between the model accuracy and privacy level and a tradeoff between the model difference and privacy level. The results indicate broad application prospects for training outsourcing and guarding against attacks in federated learning, both of which have been increasingly attractive in many areas, particularly learning in edge computing.
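The obfuscation operation itself is simple to sketch: perturb each sample with random noise before sharing, with the noise scale acting as the privacy-utility knob (a minimal sketch, not the paper's experimental pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hedged sketch: obfuscate a dataset by adding Gaussian noise.
# Larger sigma -> more privacy, less utility for the downstream model.
def obfuscate(data, sigma, rng):
    return data + rng.normal(0.0, sigma, size=data.shape)

data = rng.normal(size=(100, 16))
weak = obfuscate(data, sigma=0.01, rng=rng)    # mild obfuscation
strong = obfuscate(data, sigma=1.0, rng=rng)   # strong obfuscation

dist_weak = np.linalg.norm(weak - data)        # distortion introduced
dist_strong = np.linalg.norm(strong - data)
```

The distortion norms computed here are the data-side analogue of the paper's ℓ2 model-distance axis: more noise moves the shared dataset, and hence the trained weights, further from the originals.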

JMLR Journal 2023 Journal Article

On the Optimality of Nuclear-norm-based Matrix Completion for Problems with Smooth Non-linear Structure

  • Yunhua Xiang
  • Tianyu Zhang
  • Xu Wang
  • Ali Shojaie
  • Noah Simon

Nuclear-norm-based matrix completion was originally developed for imputing missing entries in low rank, or approximately low rank matrices. However, it has proven widely effective in many problems where there is no reason to assume low-dimensional linear structure in the underlying matrix, as would be imposed by rank constraints. In this manuscript we show that nuclear-norm-based matrix completion attains within a log factor of the minimax rate for estimating the mean structure of matrices that are not necessarily low-rank, but lie in a low-dimensional non-linear manifold, when observations are missing completely at random. In particular, we give upper bounds on the rate of convergence as a function of the number of rows, columns, and observed entries in the matrix, as well as the smoothness and dimension of the non-linear embedding. We additionally give a minimax lower bound: This lower bound agrees with our upper bound (up to a logarithmic factor), which shows that nuclear-norm penalization is (up to log terms) minimax rate optimal for these problems.
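Nuclear-norm-based completion is commonly computed by iterative soft-thresholded SVD (a SoftImpute-style sketch under assumed hyperparameters, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hedged sketch of nuclear-norm-regularized matrix completion:
# alternate between an SVD with shrunken singular values and
# re-imposing the observed entries.
def soft_impute(M, mask, lam, n_iter=200):
    X = np.where(mask, M, 0.0)                 # zero-fill missing entries
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - lam, 0.0)           # soft-threshold singular values
        Z = (U * s) @ Vt                       # low-nuclear-norm estimate
        X = np.where(mask, M, Z)               # keep observed entries fixed
    return Z

# Rank-1 ground truth with roughly 30% of entries hidden.
u, v = rng.normal(size=(20, 1)), rng.normal(size=(1, 20))
M = u @ v
mask = rng.random(M.shape) > 0.3
Z = soft_impute(M, mask, lam=0.1)
rel = np.linalg.norm(Z - M) / np.linalg.norm(M)
```

On this well-posed low-rank example the relative error is small; the paper's point is that the same estimator remains near-minimax even when the truth is only a smooth non-linear manifold rather than low-rank.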

AAAI Conference 2023 Conference Paper

SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines

  • Shizun Wang
  • Weihong Zeng
  • Xu Wang
  • Hao Yang
  • Li Chen
  • Chuang Zhang
  • Ming Wu
  • Yi Yuan

The creation of a parameterized stylized character involves careful selection of numerous parameters, also known as the "avatar vectors", that can be interpreted by the avatar engine. Existing unsupervised avatar vector estimation methods that auto-create avatars for users, however, often fail because of the domain gap between realistic faces and stylized avatar images. To this end, we propose SwiftAvatar, a novel avatar auto-creation framework that is evidently superior to previous works. SwiftAvatar introduces dual-domain generators to create pairs of realistic faces and avatar images using shared latent codes. The latent codes can then be bridged with the avatar vectors as pairs, by performing GAN inversion on the avatar images rendered from the engine using avatar vectors. In this way, we can synthesize as much high-quality paired data as possible, consisting of avatar vectors and their corresponding realistic faces. We also propose semantic augmentation to improve the diversity of synthesis. Finally, a lightweight avatar vector estimator is trained on the synthetic pairs to implement efficient auto-creation. Our experiments demonstrate the effectiveness and efficiency of SwiftAvatar on two different avatar engines. The superiority and advantageous flexibility of SwiftAvatar are also verified in both subjective and objective evaluations.

JBHI Journal 2022 Journal Article

3DCANN: A Spatio-Temporal Convolution Attention Neural Network for EEG Emotion Recognition

  • Shuaiqi Liu
  • Xu Wang
  • Ling Zhao
  • Bing Li
  • Weiming Hu
  • Jie Yu
  • Yu-Dong Zhang

Since electroencephalogram (EEG) signals can truly reflect human emotional state, emotion recognition based on EEG has become a critical branch in the field of artificial intelligence. Addressing the disparity of EEG signals across emotional states, we propose a new deep learning model named the three-dimensional convolution attention neural network (3DCANN) for EEG emotion recognition. The 3DCANN model is composed of a spatio-temporal feature extraction module and an EEG channel attention weight learning module, which together extract both the dynamic relations among multi-channel EEG signals and their internal spatial relations over continuous time periods. In this model, the spatio-temporal features are fused with the weights of dual attention learning, and the fused features are input into a softmax classifier for emotion classification. In addition, we use the SJTU Emotion EEG Dataset (SEED) to appraise the feasibility and effectiveness of the proposed algorithm. Finally, experimental results show that the 3DCANN method outperforms state-of-the-art models in EEG emotion recognition.

NeurIPS Conference 2022 Conference Paper

Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation

  • Jinlong Li
  • Zequn Jie
  • Xu Wang
  • Xiaolin Wei
  • Lin Ma

Generating precise class-aware pseudo ground-truths, a.k.a. class activation maps (CAMs), is essential for Weakly-Supervised Semantic Segmentation. The original CAM method usually produces incomplete and inaccurate localization maps. To tackle this issue, this paper proposes an Expansion and Shrinkage scheme based on offset learning in deformable convolution, to sequentially improve the recall and precision of the located object in two respective stages. In the Expansion stage, an offset learning branch in a deformable convolution layer, referred to as the "expansion sampler", seeks to sample increasingly less discriminative object regions, driven by an inverse supervision signal that maximizes the image-level classification loss. The more complete object region located in the Expansion stage is then gradually narrowed down to the final object region during the Shrinkage stage. In the Shrinkage stage, the offset learning branch of another deformable convolution layer, referred to as the "shrinkage sampler", is introduced to exclude the false positive background regions attended in the Expansion stage, improving the precision of the localization maps. We conduct various experiments on PASCAL VOC 2012 and MS COCO 2014 to demonstrate the superiority of our method over other state-of-the-art methods for Weakly-Supervised Semantic Segmentation. The code is available at https://github.com/TyroneLi/ESOL_WSSS.
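The CAMs this paper refines are, in the standard formulation, a class-weighted sum over the last convolutional feature maps; a minimal NumPy sketch of that baseline computation (shapes and names are illustrative, not this paper's code):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weight each spatial feature map by the classifier
    weight of the target class, sum over channels, and normalize to [0, 1].

    features   : (C, H, W) feature maps from the last conv layer
    fc_weights : (num_classes, C) weights of the global-pooling classifier
    class_idx  : target class index
    """
    w = fc_weights[class_idx]                     # (C,)
    cam = np.tensordot(w, features, axes=(0, 0))  # contract channels -> (H, W)
    cam = np.maximum(cam, 0.0)                    # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                     # normalize to [0, 1]
    return cam

# Toy example with random features and classifier weights
feats = np.random.default_rng(2).random((8, 7, 7))
fc_w = np.random.default_rng(3).standard_normal((10, 8))
cam = class_activation_map(feats, fc_w, class_idx=4)
```

The Expansion/Shrinkage scheme then adjusts where such maps fire, first recalling more of the object, then pruning false-positive background.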

JBHI Journal 2020 Journal Article

Disrupting Healthcare Silos: Addressing Data Volume, Velocity and Variety With a Cloud-Native Healthcare Data Ingestion Service

  • Rohit Ranchal
  • Paul Bastide
  • Xu Wang
  • Aris Gkoulalas-Divanis
  • Maneesh Mehra
  • Senthil Bakthavachalam
  • Hui Lei
  • Ajay Mohindra

Healthcare enterprises are starting to adopt cloud computing due to its numerous advantages over traditional infrastructures. This has become a necessity because of the increased volume, velocity and variety of healthcare data, and the need to facilitate data correlation and large-scale analysis. Cloud computing infrastructures have the power to offer continuous acquisition of data from multiple heterogeneous sources, efficient data integration, and big data analysis. At the same time, security, availability, and disaster recovery are critical factors in the adoption of cloud computing. However, the migration of healthcare workloads to the cloud is not straightforward due to the vagueness of healthcare data standards, the heterogeneity and sensitive nature of healthcare data, and the many regulations that govern its usage. This paper highlights the need for healthcare data acquisition using cloud infrastructures and presents the challenges, requirements, use-cases, and best practices for building a state-of-the-art healthcare data ingestion service on the cloud.

AAAI Conference 2020 Short Paper

HGMAN: Multi-Hop and Multi-Answer Question Answering Based on Heterogeneous Knowledge Graph (Student Abstract)

  • Xu Wang
  • Shuai Zhao
  • Bo Cheng
  • Jiale Han
  • Yingting Li
  • Hao Yang
  • Guoshun Nan

Multi-hop question answering models based on knowledge graphs have been extensively studied. Most existing models predict a single answer with the highest probability by ranking candidate answers. However, this ranking method prevents them from predicting all of the correct answers. In this paper, we propose a novel model that converts the ranking of candidate answers into individual predictions for each candidate, named the heterogeneous knowledge graph based multi-hop and multi-answer model (HGMAN). HGMAN is capable of capturing more informative representations for relations, assisted by our heterogeneous graph, which consists of multiple entity nodes and relation nodes. We rely on a graph convolutional network for multi-hop reasoning and then binary classification for each node to obtain multiple answers. Experimental results on the MetaQA dataset show that our proposed model outperforms all baselines.

AAAI Conference 2020 Short Paper

Hypergraph Convolutional Network for Multi-Hop Knowledge Base Question Answering (Student Abstract)

  • Jiale Han
  • Bo Cheng
  • Xu Wang

Graph convolutional networks (GCN) have been applied to the knowledge base question answering (KBQA) task. However, the pairwise connections between nodes in a GCN limit its ability to represent high-order data correlations. Furthermore, most previous work does not fully utilize semantic relation information, which is vital to reasoning. In this paper, we propose a novel multi-hop KBQA model based on a hypergraph convolutional network. By constructing a hypergraph, pairwise connections between nodes are converted into higher-order connections between nodes and hyperedges, which effectively encodes complex related data. To better exploit the semantic information of relations, we apply a co-attention method to learn the similarity between each relation and the query, and assign weights to different relations. Experimental results demonstrate the effectiveness of the model.
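The hypergraph convolution underlying this line of work is commonly written, in the HGNN formulation, as X' = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Θ, where H is the node-hyperedge incidence matrix; a minimal sketch under that assumed formulation (it assumes no isolated nodes or empty hyperedges, since their degrees are inverted):

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution step in the common HGNN form:
        X' = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Theta

    X     : (N, F) node features
    H     : (N, E) incidence matrix (H[i, j] = 1 if node i is in hyperedge j)
    Theta : (F, F') projection matrix (learnable in a real model)
    """
    Dv = np.diag(1.0 / np.sqrt(H.sum(axis=1)))  # node degree normalization
    De = np.diag(1.0 / H.sum(axis=0))           # hyperedge degree normalization
    # Information flows node -> hyperedge -> node, so one step mixes
    # all nodes sharing a hyperedge, not just pairwise neighbors.
    return Dv @ H @ De @ H.T @ Dv @ X @ Theta

# Toy hypergraph: 4 nodes, 2 hyperedges (each connecting 3 nodes or 2 nodes)
H = np.array([[1., 0.],
              [1., 1.],
              [0., 1.],
              [1., 1.]])
X = np.random.default_rng(4).random((4, 3))
Theta = np.random.default_rng(5).random((3, 2))
out = hypergraph_conv(X, H, Theta)
```

In a KBQA setting, one hyperedge per relation can group all entities connected by that relation, which is the high-order correlation the abstract refers to.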

IJCAI Conference 2020 Conference Paper

Two-Phase Hypergraph Based Reasoning with Dynamic Relations for Multi-Hop KBQA

  • Jiale Han
  • Bo Cheng
  • Xu Wang

Multi-hop knowledge base question answering (KBQA) aims at finding the answers to a factoid question by reasoning across multiple triples. Note that when humans perform multi-hop reasoning, they tend to concentrate on a specific relation at each hop and pinpoint the group of entities connected by that relation. Hypergraph convolutional networks (HGCN) can simulate this behavior by leveraging hyperedges to connect more than two nodes, going beyond pairwise connections. However, HGCN is designed for undirected graphs and does not consider the direction of information transmission. We introduce the directed HGCN (DHGCN) to adapt to knowledge graphs with directionality. Inspired by humans' hop-by-hop reasoning, we propose an interpretable KBQA model based on DHGCN, namely two-phase hypergraph based reasoning with dynamic relations, which explicitly updates relation information and dynamically pays attention to different relations at different hops. Moreover, the model predicts relations hop-by-hop to generate an intermediate relation path. We conduct extensive experiments on two widely used multi-hop KBQA datasets to prove the effectiveness of our model.

NeurIPS Conference 2019 Conference Paper

Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations

  • Xu Wang
  • Jingming He
  • Lin Ma

In this paper, we propose one novel model for point cloud semantic segmentation, which exploits both the local and global structures within the point cloud based on the contextual point representations. Specifically, we enrich each point representation by performing one novel gated fusion on the point itself and its contextual points. Afterwards, based on the enriched representation, we propose one novel graph pointnet module, relying on the graph attention block to dynamically compose and update each point representation within the local point cloud structure. Finally, we resort to the spatial-wise and channel-wise attention strategies to exploit the point cloud global structure and thereby yield the resulting semantic label for each point. Extensive results on the public point cloud databases, namely the S3DIS and ScanNet datasets, demonstrate the effectiveness of our proposed model, outperforming the state-of-the-art approaches. Our code for this paper is available at https://github.com/fly519/ELGS.
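The gated fusion mentioned above can be sketched in a generic form: a learned gate decides, per feature dimension, how much of the aggregated contextual feature to mix into the point's own feature. The parameterization below (a single gate projection `Wg` applied to the concatenated features) is a hypothetical illustration, not the paper's actual layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(point_feat, ctx_feat, Wg):
    """Generic gated fusion of a point feature with its context feature.

    point_feat : (F,) feature of one point
    ctx_feat   : (F,) aggregated feature of its contextual points
    Wg         : (F, 2F) gate projection (assumed parameterization)
    """
    gate = sigmoid(Wg @ np.concatenate([point_feat, ctx_feat]))  # (F,) in (0, 1)
    # Per-dimension convex combination of the point and its context
    return gate * point_feat + (1.0 - gate) * ctx_feat

# With zero gate weights the gate is exactly 0.5, i.e. an even blend
point = np.array([0.0, 1.0, 2.0])
ctx = np.array([1.0, 0.0, 3.0])
fused = gated_fusion(point, ctx, np.zeros((3, 6)))
```

Because the gate lies in (0, 1), the fused feature always stays between the point and context features dimension-wise, which keeps the enrichment stable.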

YNICL Journal 2017 Journal Article

Anatomical brain difference of subthreshold depression in young and middle-aged individuals

  • Jing Li
  • Zengjian Wang
  • JiWon Hwang
  • Bingcong Zhao
  • Xinjing Yang
  • Suicheng Xin
  • Yu Wang
  • Huili Jiang

BACKGROUND: Subthreshold depression (StD) is associated with substantial functional impairments due to depressive symptoms that do not fully meet the diagnosis of major depressive disorder (MDD). Its high incidence in the general population and debilitating symptoms have recently put it at the forefront of mood disorder research. AIM: In this study we investigated common volumetric brain changes in both young and middle-aged StD patients. METHODS: Participants (n = 76) underwent voxel-based morphometry (VBM). RESULTS: VBM analysis found that: 1) compared with healthy controls, StD patients showed decreased gray matter volume (GMV) in the bilateral globus pallidus and precentral gyrus, as well as increased GMV in the left thalamus and right rostral anterior cingulate cortex/medial prefrontal cortex; 2) there is a significant association between Center for Epidemiological Studies Depression Scale scores and the bilateral globus pallidus (negative) and left thalamus (positive); 3) there is no interaction between age (young vs. middle-aged) and group (StD vs. controls). CONCLUSIONS: Our findings indicate significant VBM brain changes in both young and middle-aged individuals with StD. Individuals with StD, regardless of age, may share common neural characteristics.

YNIMG Journal 2015 Journal Article

Neural correlates of psychological resilience and their relation to life satisfaction in a sample of healthy young adults

  • Feng Kong
  • Xu Wang
  • Siyuan Hu
  • Jia Liu

Psychological resilience refers to the ability to thrive in the face of risk and adversity, which is crucial for individuals' mental and physical health. However, its precise neural correlates are still largely unknown. Here we used resting-state functional magnetic resonance imaging (rs-fMRI) to identify the brain regions underlying this construct by correlating individuals' psychological resilience scores with regional homogeneity (ReHo), and then examined how these resilience-related regions predicted life satisfaction in a sample of healthy young adults. We found that the ReHo in the bilateral insula, right dorsal anterior cingulate cortex (dACC) and right rostral ACC (rACC) negatively predicted individual differences in psychological resilience, revealing the critical role of the salience network (SN) in psychological resilience. Crucially, the ReHo in the dACC within the SN mediated the effects of psychological resilience on life satisfaction. In summary, these findings suggest that spontaneous activity of the human brain reflects the efficiency of psychological resilience, and highlight the dACC within the SN as a neural substrate linking psychological resilience and life satisfaction.

YNIMG Journal 2015 Journal Article

Neural correlates of the happy life: The amplitude of spontaneous low frequency fluctuations predicts subjective well-being

  • Feng Kong
  • Siyuan Hu
  • Xu Wang
  • Yiying Song
  • Jia Liu

Subjective well-being is assumed to be distributed in the hedonic hotspots of subcortical and cortical structures. However, the precise neural correlates underlying this construct, especially how it is maintained during the resting state, are still largely unknown. Here, we explored the neural basis of subjective well-being by correlating the regional fractional amplitude of low frequency fluctuations (fALFF) with the self-reported subjective well-being of healthy individuals. Behaviorally, we demonstrated that subjective well-being contained two related but distinct components: cognitive and affective well-being. Neurally, we showed that the fALFF in the bilateral posterior superior temporal gyrus (pSTG), right posterior mid-cingulate cortex (pMCC), right thalamus, left postcentral gyrus (PCG), right lingual gyrus, and left planum temporale (PT) positively predicted cognitive well-being, whereas the fALFF in the bilateral superior frontal gyrus (SFG), right orbitofrontal cortex (OFC), and left inferior temporal gyrus (ITG) negatively predicted cognitive well-being. In contrast, only the fALFF in the right amygdala reliably predicted affective well-being. Furthermore, emotional intelligence partially mediated the effects of the right pSTG and thalamus on cognitive well-being, as well as the effect of the right amygdala on affective well-being. In summary, we provide the first evidence that spontaneous brain activity in multiple regions associated with sensation, social perception, cognition, and emotion contributes to cognitive well-being, whereas spontaneous brain activity in only one emotion-related region contributes to affective well-being, suggesting that the spontaneous activity of the human brain reflects the efficiency of subjective well-being.

YNIMG Journal 2015 Journal Article

Quantifying interindividual variability and asymmetry of face-selective regions: A probabilistic functional atlas

  • Zonglei Zhen
  • Zetian Yang
  • Lijie Huang
  • Xiang-Zhen Kong
  • Xu Wang
  • Xiaobin Dang
  • Yangyue Huang
  • Yiying Song

Face-selective regions (FSRs) are among the most widely studied functional regions in the human brain. However, individual variability of the FSRs has not been well quantified. Here we use functional magnetic resonance imaging (fMRI) to localize the FSRs and quantify their spatial and functional variabilities in 202 healthy adults. The occipital face area (OFA), posterior and anterior fusiform face areas (pFFA and aFFA), posterior continuation of the superior temporal sulcus (pcSTS), and posterior and anterior STS (pSTS and aSTS) were delineated for each individual with a semi-automated procedure. A probabilistic atlas was constructed to characterize their interindividual variability, revealing that the FSRs were highly variable in location and extent across subjects. The variability of FSRs was further quantified on both functional (i.e., face selectivity) and spatial (i.e., volume, location of peak activation, and anatomical location) features. Considerable interindividual variability and rightward asymmetry were found in all FSRs on these features. Taken together, our work presents the first effort to characterize comprehensively the variability of FSRs in a large sample of healthy subjects, and invites future work on the origin of the variability and its relation to individual differences in behavioral performance. Moreover, the probabilistic functional atlas will provide an adequate spatial reference for mapping the face network.