Author name cluster

Di Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

63 papers

2 author rows

TCS Journal 2026 Journal Article

Better guarantees for individual fairness k-median

Di Wu
Qilong Feng
Jianxin Wang

• This paper extends and refines the conference version that appeared in Proceedings of the 17th International Conference on Combinatorial Optimization and Applications by strengthening the theoretical analysis and broadening the scope of the results.
• First, we include full and detailed proofs for nearly all lemmas and claims that were previously omitted (Sections 3-4).
• Second, we extend our techniques to the k-means clustering formulation and establish analogous approximation guarantees (Section 4).
• Specifically, we show that our dynamic programming approach yields a (1 + ε)-approximation algorithm for the individual fairness k-means problem with a fairness violation of at most (2 + ε).

The individual fairness k-median problem is a commonly encountered problem in applications involving center location. It generalizes the standard k-median problem by assigning each point a fairness radius, allowing connections only to centers within a constant factor of this radius. In this paper, we present a randomized (1 + ε)-approximation algorithm with a (2 + ε)-fairness violation for the individual fairness k-median problem, improving upon the previous best approximation ratio of 7.081 + ε and fairness violation of 3. We propose a new dynamic programming approach to deal with the challenges caused by the individual fairness requirements, which is the crucial step in getting the improved ratio.
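The fairness constraint in this abstract can be illustrated with a small check. This is a minimal sketch and not the paper's dynamic programming algorithm: the function names, the 1-D toy points, and the radii are all hypothetical. It computes the k-median cost when each point connects to its nearest center, and the smallest factor alpha such that every point has a center within alpha times its fairness radius.

```python
def nearest_center_distance(point, centers):
    """Distance from a 1-D point to its closest center."""
    return min(abs(point - c) for c in centers)


def kmedian_cost(points, centers):
    """Sum of distances from each point to its nearest center."""
    return sum(nearest_center_distance(p, centers) for p in points)


def fairness_violation(points, radii, centers):
    """Smallest alpha such that every point has a center within alpha * radius."""
    return max(nearest_center_distance(p, centers) / r for p, r in zip(points, radii))


# Hypothetical toy instance: four points on a line with per-point fairness radii.
points = [0.0, 1.0, 10.0, 12.0]
radii = [1.0, 1.0, 2.0, 2.0]
centers = [0.0, 10.0]

cost = kmedian_cost(points, centers)
alpha = fairness_violation(points, radii, centers)
```

In these terms, a solution with a (2 + ε)-fairness violation is one whose `alpha` is at most 2 + ε.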


AAAI Conference 2026 Conference Paper

Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models

Kunhao Li
Wenhao Li
Di Wu
Lei Yang
Jun Bai
Ju Jia
Jason Xue

Multimodal Large Language Models (MLLMs) extend foundation models to real-world applications by integrating inputs such as text and vision. However, their broad knowledge capacity raises growing concerns about privacy leakage, toxicity mitigation, and intellectual property violations. Machine Unlearning (MU) offers a practical solution by selectively forgetting targeted knowledge while preserving overall model utility. When applied to MLLMs, existing neuron-editing-based MU approaches face two fundamental challenges: (i) forgetting becomes inconsistent across modalities because existing point-wise attribution methods fail to capture the structured, layer-by-layer information flow that connects different modalities; and (ii) general knowledge performance declines when sensitive neurons that also support important reasoning paths are pruned, as this disrupts the model’s ability to generalize. To alleviate these limitations, we propose a multimodal influential neuron path editor (MIP-Editor) for MU. Our approach introduces modality-specific attribution scores to identify influential neuron paths responsible for encoding forget-set knowledge and applies influential-path-aware neuron-editing via representation misdirection. This strategy also enables effective and coordinated forgetting across modalities while preserving the model's general capabilities. Experimental results demonstrate that MIP-Editor achieves a superior unlearning performance on multimodal tasks, with a maximum forgetting rate of 87.75% and up to 54.26% improvement in general knowledge retention. On textual tasks, MIP-Editor achieves up to 80.65% forgetting and preserves 77.90% of general performance.


EAAI Journal 2026 Journal Article

Distributed Lyapunov-based model predictive formation control for unmanned surface vehicles with flexible-time prescribed performance

Zengyang Yan
Di Wu
Lei Qiao
Baozhu Du
Guoqing Zhang
Vincenzo Lippiello

This paper investigates the cooperative maneuvering control of unmanned surface vehicle (USV) formations in complex and time-varying marine environments subject to nonlinear disturbances and input constraints. To address the coupling between formation evolution and velocity regulation, a path-parameterized formation model is established to decouple the spatial and temporal dynamics, thereby enabling coordinated control of position and speed while preserving geometric consistency. To improve transient and steady-state performance, a flexible-time prescribed performance (FTPP) mechanism is developed to construct asymmetric and time-varying performance bounds. Different from conventional prescribed performance designs, the transformed error generated by FTPP is further embedded into the contraction constraint of the distributed Lyapunov-based model predictive control (DLMPC) framework. This coupling enables the predictive controller to inherit the convergence characteristics of FTPP and the stability of the auxiliary controller while optimizing constrained control inputs, thereby improving tracking accuracy, accelerating convergence, and reducing overshoot. To enhance disturbance estimation capability, a feature-enhanced neural predictor (FENP) is integrated into the DLMPC framework. In contrast to conventional radial basis function neural predictors, the proposed FENP introduces a Hadamard-product-based feature enhancement mechanism to enrich the neural-network input representation and improve the approximation of complex nonlinear uncertainties. Simulation results demonstrate that the proposed DLMPC–FTPP–FENP framework achieves accurate formation maneuvering, fast convergence, and robust closed-loop performance under complex marine disturbances.


EAAI Journal 2026 Journal Article

Graph channel receptive field transformer for multi-agent trajectory prediction

Jiankun Peng
Jiakang Wang
Nan Zhang
Di Wu
Chunye Ma

Multi-agent trajectory prediction is critical for safe autonomous driving. However, existing vectorized methods face limitations in modeling local interactions and capturing global dependencies, struggling with interaction uncertainties and long-range dependencies in complex traffic. To address these challenges, we propose Graph Channel-Receptive Field Transformer (GCRFormer). The framework models the traffic scene as a heterogeneous graph to uniformly represent agent trajectories and map features. It integrates a Graph Channel Weight Tuning (GCWT) mechanism to aggregate local interactions and lane constraints. By combining GCWT with a Dilated Graph Receptive Field (DGRF) module, it captures long-range dependencies and generates multimodal candidate embeddings. A decoder then fuses these hierarchical features to output future trajectories and their associated probabilities for all agents. Experiments conducted on the Argoverse1 benchmark confirm that the proposed GCRFormer architecture outperforms existing state-of-the-art methods, showcasing its enhanced capability in modeling complex interactions for accurate trajectory prediction.


AAAI Conference 2026 Conference Paper

Introducing Visual Scenes and Reasoning: A More Realistic Benchmark for Spoken Language Understanding

Di Wu
Liting Jiang
Ruiyu Fang
Bianjing
Hongyan Xie
Haoxiang Su
Hao Huang
Zhongjiang He

Spoken Language Understanding (SLU) consists of two sub-tasks: intent detection (ID) and slot filling (SF). Given its broad range of real-world applications, enhancing SLU for practical deployment is increasingly critical. Profile-based SLU addresses ambiguous user utterances by incorporating context awareness (CA), user profiles (UP), and knowledge graphs (KG) to support disambiguation, thereby advancing SLU research toward real-world applicability. However, existing SLU datasets still fall short in representing real-world scenarios. Specifically, (1) CA uses one-hot vectors for representation, which is overly idealized, and (2) models typically focus solely on predicting intents and slot labels, neglecting the reasoning process that could enhance performance and interpretability. To overcome these limitations, we introduce VRSLU, a novel SLU dataset that integrates both Visual images and explicit Reasoning. For over-idealized CA, we use GPT-4o and FLUX.1-dev to generate images reflecting users’ environments and statuses, followed by human verification to ensure quality. For reasoning, GPT-4o is employed to generate explanations for predicted labels, which are then refined by human annotators to ensure accuracy and coherence. Additionally, we propose an instructional template, LR-Instruct, which first predicts labels and then generates corresponding reasoning. This two-step approach helps mitigate the influence of reasoning bias on label prediction. Experimental results confirm the effectiveness of incorporating visual information and highlight the promise of explicit reasoning in advancing SLU.


AAAI Conference 2026 Conference Paper

Learning Dynamics as Feedback: An Adaptive Entropy Flow Dynamics Framework for Long-tailed Human Action Recognition

Yuan Dong
Zhe Zhao
Liheng Yu
Di Wu
Pengkun Wang

Deep human action recognition models trained on real-world data are often challenged by long-tailed distributions, where performance on rare classes is severely degraded. Current solutions typically apply static or heuristic interventions that are disconnected from the model's evolving internal state. To overcome this limitation, we reconceptualize long-tailed human action recognition as a closed-loop, self-regulating system, inspired by ecological theory. We further introduce an Adaptive Ecological Entropy Dynamics (AEED) framework, which is built upon three synergistic components. First, AEED perceives the learning state through entropy flow, providing a robust and directional signal of learning progress. Second, this signal drives an adaptation mechanism, which dynamically adjusts class-specific loss weights to allocate more learning resources to underperforming classes. Finally, AEED facilitates intelligent knowledge transfer via Confidence-Guided Symbiosis (CS-Mix). Extensive experiments demonstrate that AEED achieves state-of-the-art performance on challenging skeleton-based action recognition benchmarks, including NTU-60-LT and Kinetics-400-LT.


JBHI Journal 2026 Journal Article

Neuro-BERT: Rethinking Masked Autoencoding for Self-Supervised Neurological Pretraining

Di Wu
Siyuan Li
Jie Yang
Mohamad Sawan

Deep learning associated with neurological signals is poised to drive major advancements in diverse fields such as medical diagnostics, neurorehabilitation, and brain-computer interfaces. The challenge in harnessing the full potential of these signals lies in the dependency on extensive, high-quality annotated data, which is often scarce and expensive to acquire, requiring specialized infrastructure and domain expertise. To address the appetite for data in deep learning, we present Neuro-BERT, a self-supervised pre-training framework for neurological signals based on masked autoencoding in the Fourier domain. The intuition behind our approach is simple: the frequency and phase distribution of neurological signals can reveal intricate neurological activities. We propose a novel pre-training task dubbed Fourier Inversion Prediction (FIP), which randomly masks out a portion of the input signal and then predicts the missing information using the Fourier inversion theorem. Pre-trained models can potentially be used for various downstream tasks such as sleep stage classification and gesture recognition. Unlike contrastive-based methods, which rely strongly on carefully hand-crafted augmentations and Siamese structures, our approach works reasonably well with a simple transformer encoder with no augmentation requirements. By evaluating our method on several benchmark datasets, we show that Neuro-BERT improves downstream neurological-related tasks by a large margin.


JBHI Journal 2026 Journal Article

NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis

Chengkai Wang
Di Wu
Yunsheng Liao
Wenyao Zheng
Ziyi Zeng
Xurong Gao
Hemmings Wu
Zhoule Zhu

Methamphetamine dependence poses a significant global health challenge, yet its assessment and the evaluation of treatments like repetitive transcranial magnetic stimulation (rTMS) frequently depend on subjective self-reports, which may introduce uncertainties. While objective neuroimaging modalities such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) offer alternatives, their individual limitations and the reliance on conventional, often hand-crafted, feature extraction can compromise the reliability of derived biomarkers. To overcome these limitations, we propose NeuroCLIP, a novel deep learning framework integrating simultaneously recorded EEG and fNIRS data through a progressive learning strategy. This approach offers a robust and trustworthy data-driven biomarker for methamphetamine addiction. Validation experiments show that NeuroCLIP significantly improves discrimination between methamphetamine-dependent individuals and healthy controls compared to models using EEG or fNIRS alone. Furthermore, the proposed framework facilitates objective, brain-based evaluation of rTMS treatment efficacy, demonstrating measurable shifts in neural patterns towards healthy control profiles after treatment. Critically, we establish the trustworthiness of the multimodal data-driven biomarker by showing its strong correlation with psychometrically validated craving scores. These findings suggest that the biomarker derived from EEG-fNIRS data via NeuroCLIP offers enhanced robustness and reliability over single-modality approaches, providing a valuable tool for addiction neuroscience research and potentially improving clinical assessments.


AAAI Conference 2026 Conference Paper

PAGPL: Privacy-Aware Graph Prompt Learning Scheme via Adaptive Perturbation-Estimated Topology Recovery

Ju Jia
Jiansen Song
Jingxuan Yu
Jiabao Guo
Xiaoshuang Jia
Di Wu
Yali Yuan
Guang Cheng

Graph prompt learning (GPL) serves as a crucial framework for mitigating the knowledge transfer by reconciling the substantial mismatch between pre-training models and downstream tasks. However, prevalent GPL paradigms fail to accommodate graph data affected by privacy-induced noise. Specifically, 1) GPL typically relies on the stability of original graph structures for the design of effective prompt templates; 2) the construction of prompts lacks explicit guidance to suppress noise introduced by privacy perturbations; 3) prompt optimization on single disturbed graphs can easily lead to overfitting to noise patterns. To address these issues, we propose a novel privacy-aware graph prompt learning (PAGPL) scheme, which alleviates spurious clues caused by privacy noise injection. Initially, an adaptive structure-wise Bayesian estimation is applied to reconstruct the privacy-perturbed graphs. Subsequently, to suppress the impact of residual perturbation, a noise-resilient prompt generation is employed to filter unreliable structural and signals. Ultimately, we incorporate a multi-view-based progressive privacy consistency to promote the robustness of prompts against semantic misalignment while improving task-specific consistency. The experimental results reveal that our scheme outperforms state-of-the-art (SOTA) GPL approaches with a 10%–60% improvement in accuracy under various real-world privacy-perturbed scenarios.


AAAI Conference 2026 Conference Paper

Rethinking Crystal Symmetry Prediction: A Decoupled Perspective

Liheng Yu
Zhe Zhao
Xucong Wang
Di Wu
Pengkun Wang

Efficiently and accurately determining the symmetry is a crucial step in the structural analysis of crystalline materials. Existing methods usually mindlessly apply deep learning models while ignoring the underlying chemical rules. More importantly, experiments show that they face a serious sub-property confusion (SPC) problem. To address the above challenges, from a decoupled perspective, we introduce the XRDecoupler framework, a problem-solving arsenal specifically designed to tackle the SPC problem. Imitating the thinking process of chemists, we innovatively incorporate multidimensional crystal symmetry information as superclass guidance to ensure that the model's prediction process aligns with chemical intuition. We further design a hierarchical PXRD pattern learning model and a multi-objective optimization approach to achieve high-quality representation and balanced optimization. Comprehensive evaluations on three mainstream databases (CCDC, CoREMOF, and InorganicData) demonstrate that XRDecoupler excels in performance, interpretability, and generalization.


AAAI Conference 2026 Conference Paper

Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective

Wang Luo
Di Wu
Hengyuan Na
Yinlin Zhu
Miao Hu
Guocong Quan

Point cloud completion aims to reconstruct complete 3D shapes from partial observations, which is a challenging problem due to severe occlusions and missing geometry. Despite recent advances in multimodal techniques that leverage complementary RGB images to compensate for missing geometry, most methods still follow a Completion-by-Inpainting paradigm, synthesizing missing structures from fused latent features. We empirically show that this paradigm often results in structural inconsistencies and topological artifacts due to limited geometric and semantic constraints. To address this, we rethink the task and propose a more robust paradigm, termed Completion-by-Correction, which begins with a topologically complete shape prior generated by a pretrained image-to-3D model and performs feature-space correction to align it with the partial observation. This paradigm shifts completion from unconstrained synthesis to guided refinement, enabling structurally consistent and observation-aligned reconstruction. Building upon this paradigm, we introduce PGNet, a multi-stage framework that conducts dual-feature encoding to ground the generative prior, synthesizes a coarse yet structurally aligned scaffold, and progressively refines geometric details via hierarchical correction. Experiments on the ShapeNetViPC dataset demonstrate the superiority of PGNet over state-of-the-art baselines in terms of average Chamfer Distance (-23.5%) and F-score (+7.1%).


JBHI Journal 2026 Journal Article

Schizophrenia Recognition Based on Gramian Angular Field Combining Activation Features: A Functional Near-Infrared Spectroscopy Study

Mingxi Yang
Yulu Yang
Meiyun Xia
Huiting Qiao
Daifa Wang
Yajing Meng
Di Wu

In recent years, the incidence of schizophrenia has been increasing globally. The combination of functional near-infrared spectroscopy with a verbal fluency task (fNIRS-VFT) provides psychiatrists with an objective neurofunctional assessment tool for auxiliary diagnosis. However, current methods face challenges such as insufficient time-series feature extraction, inadequate feature utilization, and the unstable robustness of single or few features. In this study, we designed an fNIRS classification strategy based on Gramian angular field (GAF) coding that integrates activation information. This strategy uses GAF normalized by the global maximum and minimum to convert fNIRS signals into 2D virtual-image representations for schizophrenia (SCZ) recognition. Specifically, an fNIRS dataset of 200 participants was constructed and processed into virtual images. Then, the classification performance of five different processing methods was compared: directly using the activation sequence, recurrence plot, Markov transition field, GAF image without activation, and traditional fNIRS features. Finally, this study additionally collected 40 cases of fNIRS-VFT data as an external test set. Compared with EfficientNetV2 on GAF images without activation information (73.4% accuracy) and CNN classification of HbO signals (75.5% accuracy), the SCZ recognition accuracy based on GAF images combining activation information improved by 7.6% and 4.9%. ShuffleNetV2 achieved the best performance, with an accuracy of 81.0% on the cross-validation dataset and 72.0% on the external test set. Our findings indicate that the GAF virtual-image coding approach integrating activation information offers a new strategy for supporting SCZ screening and diagnosis. It further promotes the application of fNIRS technology to clinical psychiatric disorders.
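The GAF encoding step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses the summation variant of the Gramian angular field (the abstract does not say which variant was used), it omits the activation information entirely, and the function name and toy signal are hypothetical. The global minimum and maximum are passed in explicitly so that all series share one normalization range.

```python
import math


def gramian_angular_summation_field(series, global_min, global_max):
    """Encode a 1-D series as a GASF matrix using a shared (global) min and max."""
    span = global_max - global_min
    # Rescale to [-1, 1] with the global extrema, clamping to guard against round-off.
    scaled = [max(-1.0, min(1.0, 2.0 * (x - global_min) / span - 1.0)) for x in series]
    # Polar encoding: each rescaled value becomes an angle phi = arccos(x).
    angles = [math.acos(x) for x in scaled]
    # GASF entry (i, j) is cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in angles] for a in angles]


# Hypothetical three-sample signal; real fNIRS channels would be far longer.
signal = [0.0, 0.5, 1.0]
image = gramian_angular_summation_field(signal, global_min=0.0, global_max=1.0)
```

The result is a symmetric square matrix with one row and column per time step, which is what allows an image classifier to consume a time series.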


AAAI Conference 2026 Conference Paper

SIAM: Towards Generalizable Articulated Object Modeling via Single Robot-Object Interaction

Yuyan Liu
Li Zhang
Di Wu
Yan Zhang
Anran Huang
Zhi Wang
Liu Liu
Dan Guo

Articulated object modeling, which represents interconnected rigid bodies with their geometry, part segmentation, articulation tree, and physical properties, is crucial for robotic perception and manipulation. Recent methods such as SAGCI leverage Interactive Perception (IP) to refine models through robot interaction. However, SAGCI suffers from prior-dependency (requiring initialization), neglects kinematic/dynamic constraints, and generates non-watertight meshes. To overcome these limitations, we propose SIAM, a novel framework for efficient and generalizable Single-Interaction Articulated Modeling. Given an initial point cloud, SIAM first enables minimal robot interaction to trigger object motion. It then precisely segments parts by analyzing point cloud differences pre- and post-interaction. For joint parameter estimation, we introduce an optimization incorporating novel kinematic energy constraints, enhancing physical consistency. Finally, we reconstruct a high-quality, topologically watertight mesh by learning 3D Gaussian Primitives from multi-view RGB-D observations under deformation. Extensive experiments on the PartNet-Mobility benchmark demonstrate state-of-the-art articulation modeling performance. Successful real-world deployment with an xArm robot further validates the framework's practicality and transferability. SIAM achieves accurate, prior-free modeling with significantly reduced interaction cost.


IJCAI Conference 2025 Conference Paper

A Novel Sparse Active Online Learning Framework for Fast and Accurate Streaming Anomaly Detection Over Data Streams

Zhong Chen
Yi He
Di Wu
Chen Zhao
Meikang Qiu

Online Anomaly Detection (OAD) is critical for identifying rare yet important data points in large, dynamic, and complex data streams. A key challenge lies in achieving accurate and consistent detection of anomalies while maintaining computational and memory efficiency. Conventional OAD approaches, which depend on distributional deviations and static thresholds, struggle with model update delays and catastrophic forgetting, leading to missed detections and high false positive rates. To address these limitations, we propose a novel Streaming Anomaly Detection (SAD) method, grounded in a sparse active online learning framework. Our approach uniquely integrates ℓ1,2-norm sparse online learning with CUR decomposition-based active learning, enabling simultaneous fast feature selection and dynamic instance selection. The efficient CUR decomposition further supports real-time residual analysis for anomaly scoring, eliminating the need for manual threshold settings about temporal data distributions. Extensive experiments on diverse streaming datasets demonstrate SAD's superiority, achieving a 14.06% reduction in detection error rates compared to five state-of-the-art competitors.


AAAI Conference 2025 Conference Paper

ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling

Jianan Jiang
Hao Tang
Zhilin Jiang
Weiren Yu
Di Wu

Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract nature of fine-grained sketches. In this paper, we propose an effective approach to narrow the gap between the two domains. It mainly facilitates unified mutual information sharing both intra- and inter-samples, rather than treating them as a single feature alignment problem between modalities. Specifically, our approach includes: (i) Employing dual weight-sharing networks to optimize alignment within the sketch and image domain, which also effectively mitigates model learning saturation issues. (ii) Introducing an objective optimization function based on contrastive loss to enhance the model's ability to align features in both intra- and inter-samples. (iii) Presenting a self-supervised Multi-Scale Token Recycling (MSTR) Module featured by recycling discarded patch tokens in multi-scale features, further enhancing representation capability and retrieval performance. Our framework achieves excellent results on CNN- and ViT-based backbones. Extensive experiments demonstrate its superiority over existing methods. We also introduce Cloths-V1, the first professional fashion sketch-image dataset, which we use to validate our method and which will be beneficial for other applications.


NeurIPS Conference 2025 Conference Paper

Calibrating Translation Decoding with Quality Estimation on LLMs

Di Wu
Yibin Lei
Christof Monz

Neural machine translation (NMT) systems typically employ maximum a posteriori (MAP) decoding to select the highest-scoring translation from the distribution. However, recent evidence highlights the inadequacy of MAP decoding, often resulting in low-quality or even pathological hypotheses as the decoding objective is only weakly aligned with real-world translation quality. This paper proposes to directly calibrate hypothesis likelihood with translation quality from a distributional view by directly optimizing their Pearson correlation, thereby enhancing decoding effectiveness. With our method, translation with large language models (LLMs) improves substantially after limited training (2K instances per direction). This improvement is orthogonal to those achieved through supervised fine-tuning, leading to substantial gains across a broad range of metrics and human evaluations. This holds even when applied to top-performing translation-specialized LLMs fine-tuned on high-quality translation data, such as Tower, or when compared to recent preference optimization methods, like CPO. Moreover, the calibrated translation likelihood can directly serve as a strong proxy for translation quality, closely approximating or even surpassing some state-of-the-art translation quality estimation models, like CometKiwi. Lastly, our in-depth analysis demonstrates that calibration enhances the effectiveness of MAP decoding, thereby enabling greater efficiency in real-world deployment. The resulting state-of-the-art translation model, which covers 10 languages, along with the accompanying code and human evaluation data, has been released: https://github.com/moore3930/calibrating-llm-mt.
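The calibration objective named in this abstract, the Pearson correlation between hypothesis likelihood and translation quality, can be sketched as a loss. This is only an illustration of the quantity involved, assuming a negated correlation as the loss; the function names and sample values are hypothetical, and the released repository should be consulted for the actual training procedure.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def calibration_loss(log_likelihoods, quality_scores):
    """Negative correlation, so minimizing the loss increases the correlation."""
    return -pearson_correlation(log_likelihoods, quality_scores)


# Hypothetical values for four sampled translations of one source sentence.
log_likelihoods = [-1.0, -2.0, -3.0, -4.0]
quality_scores = [0.9, 0.7, 0.5, 0.3]
loss = calibration_loss(log_likelihoods, quality_scores)
```

When likelihood and quality rank the hypotheses identically and linearly, as in this toy case, the loss reaches its minimum of about -1.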


ICLR Conference 2025 Conference Paper

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Seth Aycock
David Stap
Di Wu
Christof Monz
Khalil Sima'an

Extremely low-resource (XLR) languages lack substantial corpora for training NLP models, motivating the use of all available resources such as dictionaries and grammar books. Machine Translation from One Book (Tanzer et al., 2024) suggests that prompting long-context LLMs with one grammar book enables English–Kalamang translation, an XLR language unseen by LLMs—a noteworthy case of linguistics helping an NLP task. We investigate the source of this translation ability, finding almost all improvements stem from the book’s parallel examples rather than its grammatical explanations. We find similar results for Nepali and Guarani, seen low-resource languages, and we achieve performance comparable to an LLM with a grammar book by simply fine-tuning an encoder-decoder translation model. We then investigate where grammar books help by testing two linguistic tasks, grammaticality judgment and gloss prediction, and we explore what kind of grammatical knowledge helps by introducing a typological feature prompt that achieves leading results on these more relevant tasks. We thus emphasise the importance of task-appropriate data for XLR languages: parallel examples for translation, and grammatical data for linguistic tasks. As we find no evidence that long-context LLMs can make effective use of grammatical explanations for XLR translation, we conclude data collection for multilingual XLR tasks such as translation is best focused on parallel data over linguistic description.


JBHI Journal 2025 Journal Article

CDAF-Net: A Contextual Contrast Detail Attention Feature Fusion Network for Low-Dose CT Denoising

Yaoyao Ma
Jing Wang
Chao Xu
Yuling Huang
Minghang Chu
Zhiwei Fan
Yishen Xu
Di Wu

Low-dose computed tomography (LDCT) is a specialized CT scan with a lower radiation dose than normal-dose CT. However, the reduced radiation dose can introduce noise and artifacts, affecting diagnostic accuracy. To enhance the LDCT image quality, we propose a Contextual Contrast Detail Attention Feature Fusion Network (CDAF-Net) for LDCT denoising. Firstly, the LDCT image, with dimensions 1 × H × W, is mapped to a feature map with dimensions C × H × W, and it is processed through the Contextual Contrast Detail Attention (CCDA) module and the Selective Kernel Feature Fusion (SKFF) module. The CCDA module combines a global contextual attention mechanism with detail-enhanced differential convolutions to better understand the overall semantics and structure of the LDCT image, capturing subtle changes and details. The SKFF module effectively merges shallow features extracted by the encoder with deep features from the decoder, integrating feature representations from different levels. This process is repeated across four different resolution feature maps, and the denoised LDCT image is output through a skip connection. We conduct experiments on the Mayo dataset, the LDCT-and-Projection-Data dataset, and the Piglet dataset. Specifically, the CDAF-Net achieves the optimal metrics with a PSNR of 33.7262 dB, an SSIM of 0.9254, and an RMSE of 5.3731 on the Mayo dataset. Improvements are also observed in head CT and ultra-low-dose chest CT images of the LDCT-and-Projection-Data dataset and the Piglet dataset. Experimental results show that the proposed CDAF-Net algorithm provides superior denoising performance compared with the state-of-the-art (SOTA) algorithms.


EAAI Journal 2025 Journal Article

DPMF-Net: A dual-path perceptive multi-stage fusion network for skin lesion segmentation

Yuling Huang
Yaoyao Ma
Jing Wang
Chao Xu
Zhiwei Fan
Di Wu

Accurate segmentation of skin lesions in dermoscopic images is crucial for skin cancer detection and treatment. Despite progress in deep learning-based methods, challenges remain due to diverse skin lesion shapes, colors, and blurred boundaries. We propose a novel Dual-path Perceptive Multi-stage Fusion Network (DPMF-Net) for skin lesion segmentation. DPMF-Net integrates multiple feature refinement modules. It aims to gradually optimize lesion representations by leveraging the dual-path framework to perceive high-level contextual information. The Spatial Frequency Dual-path Cascaded Perception Module (SFDCP) synergizes spatial and frequency domains to model long-range dependencies and suppress noise, enhancing perception of low-contrast lesions. Subsequent to the SFDCP, the Spatial Channel Dual-path Parallel Perception Module (SCDPP) employs entropy-driven attention and multi-granularity convolutions in skip connections to dynamically select informative channels and extract lesion details across spatial scales. To verify the efficacy of our proposed DPMF-Net, extensive experimental assessments are carried out across four challenging datasets. The outcomes of these quantitative and qualitative experiments confirm that our approach significantly outperforms current state-of-the-art methods in terms of all evaluation metrics.

TMLR Journal 2025 Journal Article

DRDT3: Diffusion-Refined Decision Test-Time Training Model

Xingshuai Huang
Di Wu
Benoit Boulet

Decision Transformer (DT), a trajectory modelling method, has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches on various classic control tasks. However, it struggles to learn optimal policies from suboptimal, reward-labelled trajectories. In this study, we explore the use of conditional generative modelling to facilitate trajectory stitching given its high-quality data generation ability. Additionally, recent advancements in Recurrent Neural Networks (RNNs) have shown their linear complexity and competitive sequence modelling performance over Transformers. We leverage the Test-Time Training (TTT) layer, an RNN that updates hidden states during testing, to model trajectories in the form of DT. We introduce a unified framework, called Diffusion-Refined Decision TTT (DRDT3), to achieve performance beyond DT models. Specifically, we propose the Decision TTT (DT3) module, which harnesses the sequence modelling strengths of both self-attention and the TTT layer to capture recent contextual information and make coarse action predictions. DRDT3 iteratively refines the coarse action predictions through the generative diffusion model, progressively moving closer to the optimal actions. We further integrate DT3 with the diffusion model using a unified optimization objective. With experiments on multiple tasks in the D4RL benchmark, our DT3 model without diffusion refinement demonstrates improved performance over standard DT, while DRDT3 further achieves superior results compared to state-of-the-art DT-based and offline RL methods.

AAAI Conference 2025 Conference Paper

Fully-Scalable Massively Parallel Algorithm for k-center with Outliers

Di Wu
Qilong Feng
Junyu Huang
Jinhui Xu
Ziyun Huang
Jianxin Wang

In this paper, we consider the k-center problem with outliers (the (k, z)-center problem) in the context of Massively Parallel Computation (MPC). Existing MPC algorithms for the (k, z)-center problem typically require Ω(k) local space per machine. While this may be feasible when k is small, these algorithms become impractical for large k, where each machine may lack sufficient space for computation. This motivates the study of fully-scalable algorithms with sublinear local space. We propose the first fully-scalable MPC algorithm for the (k, z)-center problem. The main challenge is to design an MPC algorithm that operates with sublinear local space for finding the inliers close to the optimal clustering centers, and ensuring the approximation loss remains bounded. To address this issue, we propose an iterative sampling-based algorithm with sublinear local space in the data size. A key component of our approach is an outliers-removal algorithm that adjusts the sample size in each iteration to select inliers as clustering centers. However, the number of discarded inliers increases with the iteration of the outliers-removal algorithm, making it difficult to bound. To address this, we propose a self-adaptive method that can automatically adjust sample size to account for different data distributions on each machine, ensuring a lower bound on the sampling success probability. With these techniques, we present an O(log^*n)-approximation MPC algorithm for the (k, z)-center problem in constant-dimensional Euclidean space. The algorithm discards at most (1 + ε)z outliers, completing in O(log log n) computation rounds while using Θ(n^δ) local space per machine.
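
To make the (k, z)-center objective concrete, here is a small sketch that evaluates the cost of a given set of centers when up to z points may be discarded as outliers. It only illustrates the objective; it is not the MPC algorithm from the paper, and the function name and toy points are hypothetical.

```python
import math

def kz_center_cost(points, centers, z):
    """Radius needed to cover all but the z farthest points with the given centers."""
    # Distance from each point to its nearest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    keep = max(len(dists) - z, 0)  # the z largest distances are treated as outliers
    return dists[keep - 1] if keep > 0 else 0.0

# Hypothetical instance: two tight groups plus one far-away point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 100)]
centers = [(0, 0), (10, 0)]

print(kz_center_cost(points, centers, z=0))  # dominated by the far-away point
print(kz_center_cost(points, centers, z=1))  # far-away point discarded
```

In the result quoted above, the algorithm is allowed to discard up to (1 + ε)z points, which corresponds to calling the cost function with a slightly larger outlier budget than z.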

IJCAI Conference 2025 Conference Paper

Generative Multi-Agent Collaboration in Embodied AI: A Systematic Review

Di Wu
Xian Wei
Guang Chen
Hao Shen
Bo Jin

Embodied multi-agent systems (EMAS) have attracted growing attention for their potential to address complex, real-world challenges in areas such as logistics and robotics. Recent advances in foundation models pave the way for generative agents capable of richer communication and adaptive problem-solving. This survey provides a systematic examination of how EMAS can benefit from these generative capabilities. We propose a taxonomy that categorizes EMAS by system architectures and embodiment modalities, emphasizing how collaboration spans both physical and virtual contexts. Central building blocks (perception, planning, communication, and feedback) are then analyzed to illustrate how generative techniques bolster system robustness and flexibility. Through concrete examples, we demonstrate the transformative effects of integrating foundation models into embodied, multi-agent frameworks. Finally, we discuss challenges and future directions, underlining the significant promise of EMAS to reshape the landscape of AI-driven collaboration.

TMLR Journal 2025 Journal Article

Goal-Conditioned Data Augmentation for Offline Reinforcement Learning

Xingshuai Huang
Di Wu
Benoit Boulet

Offline reinforcement learning (RL) enables policy learning from pre-collected offline datasets, relaxing the need to interact directly with the environment. However, limited by the quality of offline datasets, it generally fails to learn well-qualified policies in suboptimal datasets. To address datasets with insufficient optimal demonstrations, we introduce Goal-cOnditioned Data Augmentation (GODA), a novel goal-conditioned diffusion-based method for augmenting samples with higher quality. Leveraging recent advancements in generative modelling, GODA incorporates a novel return-oriented goal condition with various selection mechanisms. Specifically, we introduce a controllable scaling technique to provide enhanced return-based guidance during data sampling. GODA learns a comprehensive distribution representation of the original offline datasets while generating new data with selectively higher-return goals, thereby maximizing the utility of limited optimal demonstrations. Furthermore, we propose a novel adaptive gated conditioning method for processing noisy inputs and conditions, enhancing the capture of goal-oriented guidance. We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness in enhancing data quality and superior performance compared to state-of-the-art data augmentation methods across various offline RL algorithms.

NeurIPS Conference 2025 Conference Paper

Greedy Sampling Is Provably Efficient For RLHF

Di Wu
Chengshuai Shi
Jing Yang
Cong Shen

Reinforcement Learning from Human Feedback (RLHF) has emerged as a key technique for post-training large language models. Despite its empirical success, the theoretical understanding of RLHF is still limited, as learning the KL-regularized target with only preference feedback poses additional challenges compared with canonical RL. Existing works mostly study the reward-based Bradley-Terry (BT) preference model, and extend classical designs utilizing optimism or pessimism. This work, instead, considers the general preference model (whose practical relevance has been observed recently) and obtains performance guarantees with major, order-wise improvements over existing ones. Surprisingly, these results are derived from algorithms that directly use empirical estimates (i.e., greedy sampling), as opposed to constructing optimistic or pessimistic estimates in previous works. This insight has a deep root in the unique structural property of the optimal policy class under the KL-regularized target, and we further specialize it to the BT model, highlighting the surprising sufficiency of greedy sampling in RLHF.
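
As background for the KL-regularized target, the sketch below uses the standard reward-based, single-state case: maximizing expected reward minus β times the KL divergence to a reference policy has the closed-form solution π(a) ∝ π_ref(a) · exp(r(a) / β). Plugging an empirical reward estimate directly into this form is one way to read "greedy" here. This is an illustration under those assumptions, not the paper's algorithm or its general preference setting; all names and numbers are hypothetical.

```python
import math

def kl_regularized_policy(ref_probs, rewards, beta):
    """Closed-form maximizer of E_pi[r] - beta * KL(pi || pi_ref) over a finite action set."""
    weights = [q * math.exp(r / beta) for q, r in zip(ref_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

def objective(policy, ref_probs, rewards, beta):
    """KL-regularized objective for a candidate policy."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(policy, ref_probs) if p > 0)
    return expected_reward - beta * kl

ref = [0.5, 0.3, 0.2]                # hypothetical reference policy
estimated_rewards = [1.0, 0.0, 2.0]  # hypothetical empirical reward estimates
beta = 0.5

greedy = kl_regularized_policy(ref, estimated_rewards, beta)
print(greedy)
print(objective(greedy, ref, estimated_rewards, beta))
print(objective(ref, ref, estimated_rewards, beta))
```

The closed-form policy scores at least as high on the regularized objective as any other distribution over the same actions, including the reference policy itself.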

EAAI Journal 2025 Journal Article

HMKRec: Optimize multi-user representation by hypergraph motifs for knowledge-aware recommendation

Di Wu
Mingjing Tang
Shu Zhang
Wei Gao

Knowledge graph-based recommender systems can explore users’ potential interests by learning user similarities, thereby further improving recommendation performance. However, existing methods focus only on the similarity between two users without considering the interaction patterns among multiple users, which overlooks the influence of other users in user representation modeling. In this paper, we propose a novel framework using Hypergraph Motifs to optimize Multi-users representation for Recommendation (HMKRec). Specifically, HMKRec constructs a user–item hypergraph and maps it into a user–user adjacency graph. Then, it utilizes hypergraph motifs to model the interaction patterns of multiple users and reconstructs an implicit relationship network with weights and directions to explore high-order associations among multiple users. To learn the features of items and user relationships, we design a hierarchical graph convolution that integrates hypergraph convolutional networks and graph convolutional networks to obtain high-order representations of users. Finally, we propagate user preferences in the knowledge graph using the attention mechanism to obtain high-order representations of items for recommendation. Extensive experiments on three real-world datasets indicate that our method achieves at least a 1% performance improvement over the best-performing state-of-the-art baselines.

JBHI Journal 2025 Journal Article

Memory-Efficient Intrinsic Gating Adaptation for Enhanced On-Device Epilepsy Diagnosis

Shanjin Li
Di Wu
Shiqi Zhao
Jie Yang
Mohamad Sawan

Recently, advances in neuroscience and the rise of artificial intelligence have significantly enhanced the capabilities of epilepsy diagnosis. While EEG-based diagnosis offers a promising avenue for detecting and predicting seizure activity, practical implementation in real-world scenarios remains hindered by the heterogeneity of epilepsy and the variability of patient-specific biomarkers over time. Conventional deep learning models, trained on historical EEG, often fail to adapt to such biomarker variations, leading to degraded performance. Moreover, the computational and memory constraints of edge devices further exacerbate the challenge of on-device learning. To address these challenges, we introduce a novel framework, Memory-Efficient Intrinsic Gating Adaptation (MEIGA), designed to enhance real-world epilepsy diagnosis on resource-constrained edge devices. Our approach pre-trains a model using historical EEG data and employs lightweight adapter networks for efficient on-device tuning across new sessions, addressing session-to-session variability. By leveraging Direct Feedback Alignment (DFA), MEIGA reduces memory usage and computational overhead while maintaining high classification accuracy. Extensive experiments on the CHB-MIT epilepsy dataset demonstrate that MEIGA outperforms the pretrained-only Vision Transformer baseline, raising seizure prediction accuracy from 47.88% to 86.77% with only 3,908 tunable parameters (5.05% of the backbone). For seizure detection, MEIGA improves accuracy from 85.06% to 96.29% by adapting 2,008 parameters (17.40% of the base architecture). Further experiments on the AES dataset demonstrate that MEIGA consistently delivers strong performance across subjects and scales effectively to larger networks.

NeurIPS Conference 2025 Conference Paper

REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints

Di Wu
Liu Liu
Zhou Linli
Anran Huang
Liangtu Song
Qiaojun Yu
Qi Wu
Cewu Lu

Articulated objects are prevalent in human life, and their 3D representations play crucial roles across various applications. However, achieving both high-fidelity textured surface reconstruction and dynamic generation for articulated objects remains challenging for existing methods. In this paper, we present REArtGS, a novel framework that introduces additional geometric and motion constraints to 3D Gaussian primitives, enabling realistic surface reconstruction and generation for articulated objects. Specifically, given multi-view RGB images of arbitrary two states of articulated objects, we first introduce an unbiased Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields, enhancing geometry constraints and improving surface reconstruction quality. Then we establish deformable fields for 3D Gaussians constrained by the kinematic structures of articulated objects, achieving unsupervised generation of surface meshes in unseen states. Extensive experiments on both synthetic and real datasets demonstrate our approach achieves high-quality textured surface reconstruction for given states, and enables high-fidelity surface generation for unseen states. Project site: https://sites.google.com/view/reartgs/home.

ECAI Conference 2025 Conference Paper

Refining Dataset Distillation via Critical Region Selection and Multiview Teacher Guidance

Wenqing Ye
Xinyu Li
Hui Liu
Di Wu
Xiaoyan Sun 0001
Ronald X. Xu
Mingzhai Sun

Dataset distillation (DD) aims to improve training efficiency by condensing large datasets into compact yet informative subsets. Existing DD methods primarily use optimization-based approaches for image synthesis, which are computationally intensive. While some recent studies have explored optimization-free alternatives, their simplistic region selection strategies result in poor representations of the original dataset and inefficient utilization of synthetic data during downstream training. To address these limitations, we propose a method called Selection-and-Guidance for Dataset Distillation (SGDD). This approach refines the distillation process through two key stages: region selection and guidance enhancement. Specifically, we first obtain a candidate set of regions from various locations within the images. Then, we utilize the Representative Region Selector and Diverse Region Selector to identify the critical regions for image classification. Furthermore, we generate multiview guidance information for the synthetic data to enhance the distillation process further. By selecting representative and diverse regions while incorporating multiview guidance, our method unleashes the potential of optimization-free DD. Experimental results substantiate the superiority of our approach across various datasets and network architectures.

NeurIPS Conference 2025 Conference Paper

Towards Effective Federated Graph Foundation Model via Mitigating Knowledge Entanglement

Yinlin Zhu
Xunkai Li
Jishuo Jia
Miao Hu
Di Wu
Meikang Qiu

Recent advances in graph machine learning have shifted to data-centric paradigms, driven by two emerging research fields: (1) Federated graph learning (FGL) facilitates multi-client collaboration but struggles with data and task heterogeneity, resulting in limited practicality; (2) Graph foundation model (GFM) enables desirable domain generalization but is typically confined to single-machine training, neglecting the potential of cross-silo data and computational resources. It is evident that these two paradigms are complementary, and their integration offers substantial advantages. Motivated by this, we present a pioneering study about the federated graph foundation model (FedGFM), a novel decentralized GFM training paradigm. Despite the promising vision of FedGFM, knowledge entanglement has emerged as a critical challenge, where multi-domain knowledge is encoded into indistinguishable representations, thereby limiting downstream adaptation. To this end, we propose FedGFM+, an effective FedGFM framework with two key modules to mitigate knowledge entanglement in a dual-pronged manner. (1) AncDAI: From a global perspective, we introduce a novel anchor-based domain-aware initialization strategy. Before pre-training, each client encodes its local graph into domain-specific prototypes, which serve as semantic anchors in the representation space. Around each anchor, we construct synthetic embeddings to initialize the global model. We theoretically show that these prototypes are distinguishable across domains, and the initialization provides a strong inductive bias that facilitates disentanglement of domain-specific knowledge. (2) AdaDPP: From a local perspective, during pre-training, each client independently learns a lightweight graph prompt that captures domain semantic preferences. During fine-tuning, prompts from all clients are aggregated into an adaptive domain-sensitive prompt pool, from which the GFM selects relevant prompts to augment the target graph’s attributes, thereby improving the downstream adaptation. FedGFM+ is extensively evaluated on 8 diverse benchmarks spanning multiple domains and tasks, outperforming 20 baselines from isolated supervised learning, FGL, and federated variants of centralized GFM paradigms.

TCS Journal 2024 Journal Article

Approximation algorithms for fair k-median problem without fairness violation

Di Wu
Qilong Feng
Jianxin Wang

The fair k-median problem is one of the important clustering problems. The current best approximation ratio is 4.675 for this problem with 1-fairness violation, which was proposed by Bercea et al. [APPROX-RANDOM'2019]. To the best of our knowledge, there is no available approximation algorithm for this problem without any fairness violation in doubling metrics. In this paper, we consider the fair k-median problem in doubling metrics and general metrics. We provide the first QPTAS for the fair k-median problem based on the hierarchical decomposition and dynamic programming process in doubling metrics. For applying a dynamic programming process to solve this problem, the distances from portals to facilities cannot be directly enumerated since each client may not be assigned to its closest open facility. To overcome the difficulties caused by the fairness constraints, we construct an auxiliary graph and use minimum weighted perfect matching to get the cost between the portals of each block and the ones in its children. In order to satisfy the fairness constraints, we bound the fairness constraints of each open facility in the leaves of the split-tree based on the relation between the subproblem and the subproblems of its children. To obtain the assignment of the given instance and remove the fairness violation, we construct a new b-value min-cost max-flow model based on the set of open facilities. For the fair k-median problem in general metrics, we present a polynomial-time approximation algorithm with ratio O(log k). Our approximation algorithm for the fair k-median problem in doubling metrics is the first result for the corresponding problem without any fairness violation in doubling metrics.

IJCAI Conference 2024 Conference Paper

BADFSS: Backdoor Attacks on Federated Self-Supervised Learning

Jiale Zhang
Chengcheng Zhu
Di Wu
Xiaobing Sun
Jianming Yong
Guodong Long

Self-supervised learning (SSL) is capable of learning remarkable representations from centrally available data. Recent works further implement federated learning with SSL to learn from rapidly growing decentralized unlabeled images (e.g., from cameras and phones), often resulting from privacy constraints. Extensive attention has been paid to designing new frameworks or methods that achieve better performance for the SSL-based FL. However, such an effort has not yet taken the security of SSL-based FL into consideration. We aim to explore backdoor attacks in the context of SSL-based FL via an in-depth empirical study. In this paper, we propose a novel backdoor attack BADFSS against SSL-based FL. First, BADFSS learns a backdoored encoder via supervised contrastive learning on poison datasets constructed based on local datasets. Then, BADFSS employs attention alignment to enhance the backdoor effect and maintain the consistency between backdoored and global encoders. Moreover, we perform empirical evaluations of the proposed backdoor attacks on four datasets and compare BADFSS with three existing backdoor attacks that are transferred into federated self-supervised learning. The experiments demonstrate that BADFSS outperforms baseline methods and is effective under various settings.

IJCAI Conference 2024 Conference Paper

FedTAD: Topology-aware Data-free Knowledge Distillation for Subgraph Federated Learning

Yinlin Zhu
Xunkai Li
Zhengyu Wu
Di Wu
Miao Hu
Rong-Hua Li

Subgraph federated learning (subgraph-FL) is a new distributed paradigm that facilitates the collaborative training of graph neural networks (GNNs) by multi-client subgraphs. Unfortunately, a significant challenge of subgraph-FL arises from subgraph heterogeneity, which stems from node and topology variation, causing the impaired performance of the global GNN. Despite various studies, they have not yet thoroughly investigated the impact mechanism of subgraph heterogeneity. To this end, we decouple node and topology variation, revealing that they correspond to differences in label distribution and structure homophily. Remarkably, these variations lead to significant differences in the class-wise knowledge reliability of multiple local GNNs, misguiding the model aggregation with varying degrees. Building on this insight, we propose topology-aware data-free knowledge distillation technology (FedTAD), enhancing reliable knowledge transfer from the local model to the global model. Extensive experiments on six public datasets consistently demonstrate the superiority of FedTAD over state-of-the-art baselines.

JBHI Journal 2024 Journal Article

Isolated Random Forest Assisted Spatio-Temporal Ant Colony Evolutionary Algorithm for Cell Tracking in Time-Lapse Sequences

Benlian Xu
Di Wu
Jian Shi
Jinliang Cong
Mingli Lu
Feng Yang
Brett Nener

Multi-object tracking in real-world environments is a tough problem, especially for cell morphogenesis with division. Most cell tracking methods struggle to achieve reliable mitosis detection, efficient inter-frame matching, and accurate state estimation simultaneously within a unified tracking framework. In this paper, we propose a novel unified framework that leverages a spatio-temporal ant colony evolutionary algorithm to track cells amidst mitosis under measurement uncertainty. Each Bernoulli ant colony representing a migrating cell is able to capture the occurrence of mitosis through the proposed Isolation Random Forest (IRF)-assisted temporal mitosis detection algorithm with the assumption that mitotic cells exhibit unique spatio-temporal features different from non-mitotic ones. Guided by prediction of a division event, multiple ant colonies evolve between consecutive frames according to an augmented assignment matrix solved by the extended Hungarian method. To handle dense cell populations, an efficient group partition between cells and measurements is exploited, which enables multiple assignment tasks to be executed in parallel with a reduction in matrix dimension. After inter-frame traversing, the ant colony transitions to a foraging stage in which it begins approximating the Bernoulli parameter to estimate cell state by iteratively updating its pheromone field. Experiments on multi-cell tracking in the presence of cell mitosis and morphological changes are conducted, and the results demonstrate that the proposed method outperforms state-of-the-art approaches, striking a balance between accuracy and computational efficiency.

AAAI Conference 2024 Conference Paper

MKG-FENN: A Multimodal Knowledge Graph Fused End-to-End Neural Network for Accurate Drug–Drug Interaction Prediction

Di Wu
Wu Sun
Yi He
Zhong Chen
Xin Luo

Taking multiple incompatible drugs together may cause adverse interactions and side effects on the body. Accurate prediction of drug-drug interaction (DDI) events is essential for avoiding this issue. Recently, various artificial intelligence-based approaches have been proposed for predicting DDI events. However, DDI events are associated with complex relationships and mechanisms among drugs, targets, enzymes, transporters, molecular structures, etc. Existing approaches either partially or loosely consider these relationships and mechanisms by a non-end-to-end learning framework, resulting in sub-optimal feature extractions and fusions for prediction. Different from them, this paper proposes a Multimodal Knowledge Graph Fused End-to-end Neural Network (MKG-FENN) that consists of two main parts: multimodal knowledge graph (MKG) and fused end-to-end neural network (FENN). First, MKG is constructed by comprehensively exploiting DDI events-associated relationships and mechanisms from four knowledge graphs of drugs-chemical entities, drug-substructures, drugs-drugs, and molecular structures. Correspondingly, a four-channel graph neural network is designed to extract high-order and semantic features from MKG. Second, FENN designs a multi-layer perceptron to fuse the extracted features by end-to-end learning. With such designs, the feature extractions and fusions of DDI events are guaranteed to be comprehensive and optimal for prediction. Through extensive experiments on real drug datasets, we demonstrate that MKG-FENN exhibits high accuracy and significantly outperforms state-of-the-art models in predicting DDI events. The source code and supplementary file of this article are available on: https://github.com/wudi1989/MKG-FENN.

IJCAI Conference 2024 Conference Paper

MMGNN: A Molecular Merged Graph Neural Network for Explainable Solvation Free Energy Prediction

Wenjie Du
Shuai Zhang
Di Wu
Jun Xia
Ziyuan Zhao
Junfeng Fang
Yang Wang

In this paper, we address the challenge of accurately modeling and predicting Gibbs free energy in solute-solvent interactions, a pivotal yet complex aspect in the field of chemical modeling. Traditional approaches, primarily relying on deep learning models, face limitations in capturing the intricate dynamics of these interactions. To overcome these constraints, we introduce a novel framework, molecular modeling graph neural network (MMGNN), which more closely mirrors real-world chemical processes. Specifically, MMGNN explicitly models atomic interactions such as hydrogen bonds by initially forming indiscriminate connections between intermolecular atoms, which are then refined using an attention-based aggregation method, tailoring to specific solute-solvent pairs. To address the challenges of non-interactive or repulsive atomic interactions, MMGNN incorporates interpreters for nodes and edges in the merged graph, enhancing explainability and reducing redundancy. MMGNN stands as the first framework to explicitly align with real chemical processes, providing a more accurate and scientifically sound approach to modeling solute-solvent interactions. The infusion of explainability allows for the extraction of key subgraphs, which are pivotal for further research in solute-solvent dynamics. Extensive experimental validation confirms the efficacy and enhanced explainability of MMGNN.

AAAI Conference 2024 Conference Paper

Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu

Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing works on offline RL theory primarily emphasize a few trivial settings, such as linear MDP or general function approximation with strong assumptions and independent data, which lack guidance for practical use. The coupling of deep learning and Bellman residuals makes this problem challenging, in addition to the difficulty of data dependence. In this paper, we establish a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation with C-mixing data regarding the structure of networks, the dimension of datasets, and the concentrability of data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. We utilize the empirical process tool for C-mixing sequences and the neural network approximation theory for the Hölder class to achieve this. We also develop methods to bound the Bellman estimation error caused by function approximation with empirical Bellman constraint perturbations. Additionally, we present a result that lessens the curse of dimensionality using data with low intrinsic dimensionality and function classes with low complexity. Our estimation provides valuable insights into the development of deep offline RL and guidance for algorithm model design.

NeurIPS Conference 2024 Conference Paper

Robot Policy Learning with Temporal Optimal Transport Reward

Yuwei Fu
Haichao Zhang
Di Wu
Wei Xu
Benoit Boulet

Reward specification is one of the most tricky problems in Reinforcement Learning, which usually requires tedious hand engineering in practice. One promising approach to tackle this challenge is to adopt existing expert video demonstrations for policy learning. Some recent work investigates how to learn robot policies from only a single/few expert video demonstrations. For example, reward labeling via Optimal Transport (OT) has been shown to be an effective strategy to generate a proxy reward by measuring the alignment between the robot trajectory and the expert demonstrations. However, previous work mostly overlooks that the OT reward is invariant to temporal order information, which could bring extra noise to the reward signal. To address this issue, in this paper, we introduce the Temporal Optimal Transport (TemporalOT) reward to incorporate temporal order information for learning a more accurate OT-based proxy reward. Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method. Our code is available at: https://github.com/fuyw/TemporalOT.
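
The order-invariance issue can be seen in a toy example. The sketch below computes an exact OT cost between two equal-length sequences with uniform weights by brute force over permutations, then shows that reversing the robot trajectory leaves the cost unchanged. It illustrates the problem the abstract describes, not the TemporalOT method itself; the 2D points stand in for frame embeddings and are hypothetical.

```python
import itertools
import math

def ot_cost(robot_frames, expert_frames):
    """Exact OT cost for equal-length sequences with uniform weights (brute force).

    With uniform marginals and equal sizes, an optimal coupling is a permutation,
    so enumerating permutations is exact for tiny inputs.
    """
    n = len(robot_frames)
    assert n == len(expert_frames)
    best = min(
        sum(math.dist(robot_frames[i], expert_frames[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n

expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
robot_forward = [(0.1, 0.0), (1.1, 0.0), (2.1, 0.0), (3.1, 0.0)]
robot_reversed = list(reversed(robot_forward))  # same frames, wrong order

print(ot_cost(robot_forward, expert))
print(ot_cost(robot_reversed, expert))
```

Both calls give the same cost, so a reward derived from it cannot distinguish a trajectory that follows the demonstration from one that visits the same states backwards.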

YNIMG Journal 2024 Journal Article

Trajectories and sex differences of brain structure, oxygenation and perfusion functions in normal aging

Di Wu
Yuanhao Li
Shun Zhang
Qiuyue Chen
Jiayu Fang
Junghun Cho
Yi Wang
Su Yan

BACKGROUND: Brain structure, oxygenation and perfusion are important factors in aging. Coupling between regional cerebral oxygen consumption and perfusion also reflects functions of the neurovascular unit (NVU). Their trajectories and sex differences during normal aging, which are important for clinical interpretation, are still not well defined. In this study, we aim to investigate the relationship between brain structure, functions and age, and examine the sex disparities. METHOD: < 0.05 was considered statistically significant. RESULTS: < 0.05). CONCLUSION: The sex disparities, age trajectories of brain structure and functions as well as the coupling of NVU in healthy individuals provide insights into normal aging which are potential targets for study of pathological conditions.

IJCAI Conference 2024 Conference Paper

Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling

Di Wu
Shicai Fan
Xue Zhou
Li Yu
Yuzhong Deng
Jianxiao Zou
Baihong Lin

Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image reconstruction and unexpectedly suffer from low reconstruction quality. To address these issues, this paper proposes a novel and highly-interpretable method named Masked Diffusion Posterior Sampling (MDPS). In MDPS, the problem of normal image reconstruction is mathematically modeled as multiple diffusion posterior sampling for normal images based on the devised masked noisy observation model and the diffusion-based normal image prior under Bayesian framework. Using a metric designed from pixel-level and perceptual-level perspectives, MDPS can effectively compute the difference map between each normal posterior sample and the given test image. Anomaly scores are obtained by averaging all difference maps for multiple posterior samples. Exhaustive experiments on MVTec and BTAD datasets demonstrate that MDPS can achieve state-of-the-art performance in normal image reconstruction quality as well as anomaly detection and localization.
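
The final scoring step, averaging difference maps over several posterior samples, can be sketched as follows. The per-pixel absolute difference is a stand-in assumption for the paper's combined pixel-level and perceptual-level metric, and the sample values are hypothetical; the diffusion sampling that would produce the samples is not shown.

```python
def difference_map(sample, test_image):
    """Per-pixel absolute difference (assumed stand-in for the paper's metric)."""
    return [[abs(s - t) for s, t in zip(row_s, row_t)]
            for row_s, row_t in zip(sample, test_image)]

def anomaly_map(posterior_samples, test_image):
    """Average the difference maps of all posterior samples, pixel by pixel."""
    maps = [difference_map(sample, test_image) for sample in posterior_samples]
    rows, cols = len(test_image), len(test_image[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(cols)]
            for i in range(rows)]

# Hypothetical 2x2 test image with an anomalous bottom-right pixel.
test_image = [[0.0, 0.0], [0.0, 1.0]]
# Two hypothetical "normal" reconstructions drawn as posterior samples.
samples = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.2], [0.0, 0.2]],
]

print(anomaly_map(samples, test_image))
```

Averaging damps differences that appear in only some samples, while a pixel that disagrees with every normal reconstruction keeps a high score.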

PDF Details DOI

EAAI Journal 2023 Journal Article

A general motion controller based on deep reinforcement learning for an autonomous underwater vehicle with unknown disturbances

Fei Huang
Jian Xu
Di Wu
Yunfei Cui
Zheping Yan
Wen Xing
Xun Zhang

This paper studies the application of deep Reinforcement Learning (RL) in the motion control of an underactuated autonomous underwater vehicle (AUV) with unknown disturbances. Firstly, a general state space, action space and reward function are designed for motion control problems rather than each specific motion control task, which ensures the generality of our method. Furthermore, a virtual AUV model with partial random disturbances is established, and on this basis, a simulation training method is developed to solve the problems of extremely high risk and extremely low efficiency caused by training in actual experiments. Then, in order to directly deploy the optimal control policy obtained through simulation training to an actual AUV, we employ Extended State Observers (ESOs) to estimate the unknown disturbances in five degrees of freedom, and give a deployment method using the estimated values as the disturbance state vector and compensation vector. Combining the above training method and deployment method, a novel general motion controller is proposed. Finally, four different AUV motion control simulations are carried out, and the results confirm the generality and effectiveness of our proposed controller.

Details DOI

IJCAI Conference 2023 Conference Paper

BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning

Yunchao Yang
Yipeng Zhou
Miao Hu
Di Wu
Quan Z. Sheng

Federated learning (FL) is a prospective distributed machine learning framework that can preserve data privacy. In particular, cross-silo FL can complete model training by making isolated data islands of different organizations collaborate with a parameter server (PS) via exchanging model parameters for multiple communication rounds. In cross-silo FL, an incentive mechanism is indispensable for motivating data owners to contribute their models to FL training. However, how to allocate the reward budget among different rounds is an essential but complicated problem largely overlooked by existing works. The challenge of this problem lies in the opaque feedback between reward budget allocation and model utility improvement of FL, making the optimal reward budget allocation complicated. To address this problem, we design an online reward budget allocation algorithm using Bayesian optimization named BARA (Budget Allocation for Reverse Auction). Specifically, BARA can model the complicated relationship between reward budget allocation and final model accuracy in FL based on historical training records so that the reward budget allocated to each communication round is dynamically optimized so as to maximize the final model utility. We further incorporate the BARA algorithm into reverse auction-based incentive mechanisms to illustrate its effectiveness. Extensive experiments are conducted on real datasets to demonstrate that BARA significantly outperforms competitive baselines by improving model utility with the same amount of reward budget.

PDF Details DOI

EAAI Journal 2023 Journal Article

Corrigendum to “Unsupervised Simple Siamese Representation Learning for Blind Super-Resolution” [Eng. Appl. Artif. Intell. 114 (2022) 105092]

Pengfei Yin
Zhonghua Liu
Di Wu
Hua Huo
Haijun Wang
Kaibing Zhang

Details DOI

AAAI Conference 2023 Conference Paper

DPAUC: Differentially Private AUC Computation in Federated Learning

Jiankai Sun
Xin Yang
Yuanshun Yao
Junyuan Xie
Di Wu
Chong Wang

Federated learning (FL) has gained significant attention recently as a privacy-enhancing tool to jointly train a machine learning model by multiple participants. The prior work on FL has mostly studied how to protect label privacy during model training. However, model evaluation in FL might also lead to the potential leakage of private label information. In this work, we propose an evaluation algorithm that can accurately compute the widely used AUC (area under the curve) metric when using the label differential privacy (DP) in FL. Through extensive experiments, we show our algorithms can compute accurate AUCs compared to the ground truth. The code is available at https://github.com/bytedance/fedlearner/tree/master/example/privacy/DPAUC

PDF Details DOI

JBHI Journal 2023 Journal Article

E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion

Minghang Chu
Mengtao Yang
Chao Xu
Yaoyao Ma
Jing Wang
Zhiwei Fan
Zhi Tao
Di Wu

In recent years, more and more people suffer from voice-related diseases. Current pathological speech conversion methods are limited in that each method can convert only a single kind of pathological voice. In this study, we propose a novel Encoder-Decoder Generative Adversarial Network (E-DGAN) to generate personalized speech for pathological to normal voice conversion, which is suitable for multiple kinds of pathological voices. Our proposed method can also solve the problem of improving the intelligibility and personalizing custom speech of pathological voices. Feature extraction is performed using a mel filter bank. The conversion network is an encoder-decoder structure, which is used to convert the mel spectrogram of pathological voices to the mel spectrogram of normal voices. After being converted by the residual conversion network, the personalized normal speech is synthesized by the neural vocoder. In addition, we propose a subjective evaluation metric named “content similarity” to evaluate the consistency between the converted pathological voice content and the reference content. The Saarbrücken Voice Database (SVD) is used to verify the proposed method. The intelligibility and content similarity of pathological voices are increased by 18.67% and 2.60%, respectively. In addition, an intuitive spectrogram-based analysis shows a significant improvement. The results show that our proposed method can improve the intelligibility of pathological voices and personalize the conversion of pathological voices into the normal voices of 20 different speakers. Compared with five other pathological voice conversion methods, our proposed method achieves the best evaluation results.

Details DOI

IJCAI Conference 2023 Conference Paper

FedDWA: Personalized Federated Learning with Dynamic Weight Adjustment

Jiahao Liu
Jiang Wu
Jinyu Chen
Miao Hu
Yipeng Zhou
Di Wu

Different from conventional federated learning, personalized federated learning (PFL) is able to train a customized model for each individual client according to its unique requirement. The mainstream approach is to adopt a kind of weighted aggregation method to generate personalized models, in which weights are determined by the loss value or model parameters among different clients. However, such methods require clients to download others' models. This not only sharply increases communication traffic but also potentially infringes data privacy. In this paper, we propose a new PFL algorithm called FedDWA (Federated Learning with Dynamic Weight Adjustment) to address the above problem, which leverages the parameter server (PS) to compute personalized aggregation weights based on collected models from clients. In this way, FedDWA can capture similarities between clients with much less communication overhead. More specifically, we formulate the PFL problem as an optimization problem by minimizing the distance between personalized models and guidance models, so as to customize aggregation weights for each client. Guidance models are obtained by local one-step-ahead adaptation on individual clients. Finally, we conduct extensive experiments using five real datasets and the results demonstrate that FedDWA can significantly reduce the communication traffic and achieve much higher model accuracy than the state-of-the-art approaches.

PDF Details DOI

JMLR Journal 2023 Journal Article

Gap Minimization for Knowledge Sharing and Transfer

Boyu Wang
Jorge A. Mendez
Changjian Shui
Fan Zhou
Di Wu
Gezheng Xu
Christian Gagné
Eric Eaton

Learning from multiple related tasks by knowledge sharing and transfer has become increasingly relevant over the last two decades. In order to successfully transfer information from one task to another, it is critical to understand the similarities and differences between the domains. In this paper, we introduce the notion of performance gap, an intuitive and novel measure of the distance between learning tasks. Unlike existing measures which are used as tools to bound the difference of expected risks between tasks (e.g., $\mathcal{H}$-divergence or discrepancy distance), we theoretically show that the performance gap can be viewed as a data- and algorithm-dependent regularizer, which controls the model complexity and leads to finer guarantees. More importantly, it also provides new insights and motivates a novel principle for designing strategies for knowledge sharing and transfer: gap minimization. We instantiate this principle with two algorithms: 1. gapBoost, a novel and principled boosting algorithm that explicitly minimizes the performance gap between source and target domains for transfer learning; and 2. gapMTNN, a representation learning algorithm that reformulates gap minimization as semantic conditional matching for multitask learning. Our extensive evaluation on both transfer learning and multitask learning benchmark data sets shows that our methods outperform existing baselines.

PDF Details

YNIMG Journal 2023 Journal Article

Human anterior thalamic stimulation evoked cortical potentials align with intrinsic functional connectivity

Di Wu
Frederic L.W.V.J. Schaper
Guangyuan Jin
Lei Qi
Jialin Du
Xiaopeng Wang
Yuke Wang
Cuiping Xu

Characterizing the human thalamocortical network is fundamental for understanding a vast array of human behaviors since the thalamus plays a central role in cortico-subcortical communication. Over the past few decades, advances in functional magnetic resonance imaging have allowed for spatial mapping of intrinsic resting-state functional connectivity (RSFC) between both cortical regions and in cortico-subcortical networks. Despite these advances, identifying the electrophysiological basis of human thalamocortical network architecture remains challenging. By leveraging stereoelectroencephalography electrodes temporarily implanted into distributed cortical regions and the anterior nucleus of the thalamus (ANT) of 10 patients with refractory focal epilepsy, we tested whether ANT stimulation-evoked cortical potentials align with RSFC from the stimulation site, derived from a normative functional connectome (n = 1000). Our study identifies spatial convergence of ANT stimulation-evoked cortical potentials and normative RSFC. Other than connections to the Papez circuit, the ANT was found to be closely connected to several distinct higher-order association cortices, including the precuneus, angular gyrus, dorsal lateral prefrontal cortex, and anterior insula. Remarkably, we found that the spatial distribution and magnitude of cortical-evoked responses to single-pulse electrical stimulation of the ANT aligned with the spatial pattern and strength of normative RSFC of the stimulation site. The present study provides electrophysiological evidence that stimulation-evoked electrical activity flows along intrinsic brain networks connected on a thalamocortical level.

Details DOI

AAMAS Conference 2023 Conference Paper

Intention Progression with Maintenance Goals

Di Wu
Yuan Yao
Natasha Alechina
Brian Logan
John Thangarajah

PDF

TMLR Journal 2023 Journal Article

Lightweight Learner for Shared Knowledge Lifelong Learning

Yunhao Ge
Yuecheng Li
Di Wu
Ao Xu
Adam M. Jones
Amanda Sofie Rios
Iordanis Fostiropoulos
shixian wen

In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentralized population of LL agents that each sequentially learn different tasks, with all agents operating independently and in parallel. After learning their respective tasks, agents share and consolidate their knowledge over a decentralized communication network, so that, in the end, all agents can master all tasks. We present one solution to SKILL which uses Lightweight Lifelong Learning (LLL) agents, where the goal is to facilitate efficient sharing by minimizing the fraction of the agent that is specialized for any given task. Each LLL agent thus consists of a common task-agnostic immutable part, where most parameters are, and individual task-specific modules that contain fewer parameters but are adapted to each task. Agents share their task-specific modules, plus summary information ("task anchors") representing their tasks in the common task-agnostic latent space of all agents. Receiving agents register each received task-specific module using the corresponding anchor. Thus, every agent improves its ability to solve new tasks each time new task-specific modules and anchors are received. If all agents can communicate with all others, eventually all agents become identical and can solve all tasks. On a new, very challenging SKILL-102 dataset with 102 image classification tasks (5,033 classes in total, 2,041,225 training, 243,464 validation, and 243,464 test images), we achieve much higher (and SOTA) accuracy over 8 LL baselines, while also achieving near perfect parallelization. Code and data can be found at https://github.com/gyhandy/Shared-Knowledge-Lifelong-Learning

PDF Details

AAAI Conference 2023 Conference Paper

Online Semi-supervised Learning with Mix-Typed Streaming Features

Di Wu
Shengda Zhuo
Yu Wang
Zhong Chen
Yi He

Online learning with feature spaces that are not fixed but can vary over time offers a flexible learning paradigm and has thus drawn much attention. Unfortunately, two restrictions prohibit a ubiquitous application of this learning paradigm in practice. First, whereas prior studies mainly assume a homogeneous feature type, data streams generated from real applications can be heterogeneous, with Boolean, ordinal, and continuous features co-existing. Existing methods that prescribe parametric distributions such as Gaussians would not suffice to model the correlation among such mix-typed features. Second, while full supervision seems to be a default setup, providing labels to all arriving data instances over a long time span is onerous, laborious, and economically unsustainable. A semi-supervised online learner that can deal with mix-typed, varying feature spaces is still missing. To fill the gap, this paper explores a novel problem, named Online Semi-supervised Learning with Mix-typed streaming Features (OSLMF), which strives to relax the restrictions on the feature type and supervision information. Our key idea to solve the new problem is to leverage a copula model to align the data instances with different feature spaces so as to make their distance measurable. A geometric structure underlying data instances is then established in an online fashion based on their distances, through which the limited labeling information is propagated from the scarce labeled instances to their close neighbors. Experimental results are documented to evidence the viability and effectiveness of our proposed approach. Code is released at https://github.com/wudi1989/OSLMF.

PDF Details DOI

NeurIPS Conference 2022 Conference Paper

A Closer Look at Offline RL Agents

Yuwei Fu
Di Wu
Benoit Boulet

Despite recent advances in the field of Offline Reinforcement Learning (RL), less attention has been paid to understanding the behaviors of learned RL agents. As a result, there remain some gaps in our understanding, i.e., why is one offline RL agent more performant than another? In this work, we first introduce a set of experiments to evaluate offline RL agents, focusing on three fundamental aspects: representations, value functions and policies. Counterintuitively, we show that a more performant offline RL agent can learn relatively low-quality representations and inaccurate value functions. Furthermore, we showcase that the proposed experiment setups can be effectively used to diagnose the bottleneck of offline RL agents. Inspired by the evaluation results, a novel offline RL algorithm is proposed by a simple modification of IQL and achieves SOTA performance. Finally, we investigate when a learned dynamics model is helpful to model-free offline RL agents, and introduce an uncertainty-based sample selection method to mitigate the problem of model noises. Code is available at: https://github.com/fuyw/RIQL.

PDF Details

IJCAI Conference 2022 Conference Paper

Constrained Adaptive Projection with Pretrained Features for Anomaly Detection

Xingtai Gui
Di Wu
Yang Chang
Shicai Fan

Anomaly detection aims to separate anomalies from normal samples, and pretrained networks are promising for anomaly detection. However, adapting the pretrained features risks pattern collapse when finetuning on one-class training data. In this paper, we propose an anomaly detection framework called constrained adaptive projection with pretrained features (CAP). Combined with pretrained features, a simple linear projection head applied on a specific input and its k most similar pretrained normal representations is designed for feature adaptation, and a reformed self-attention is leveraged to mine the inner-relationship among one-class semantic features. A loss function is proposed to avoid potential pattern collapse. Concretely, it considers the similarity between a specific data sample and its corresponding adaptive normal representation, and incorporates a constraint term that slightly aligns the pretrained and adaptive spaces. Our method achieves state-of-the-art anomaly detection performance on semantic anomaly detection and sensory anomaly detection benchmarks, including 96.5% AUROC on the CIFAR-100 dataset, 97.0% AUROC on the CIFAR-10 dataset and 89.9% AUROC on the MVTec dataset.

PDF Details DOI

AAAI Conference 2022 Conference Paper

Reinforcement Learning Based Dynamic Model Combination for Time Series Forecasting

Yuwei Fu
Di Wu
Benoit Boulet

Time series data appears in many real-world fields such as energy, transportation, and communication systems. Accurate modelling and forecasting of time series data can be of significant importance to improve the efficiency of these systems. Extensive research efforts have been devoted to time series problems. Different types of approaches, including both statistical-based methods and machine learning-based methods, have been investigated. Among these methods, ensemble learning has been shown to be effective and robust. However, how to determine weights for base models in the ensemble is still an open question. Sub-optimal weights may prevent the final model from reaching its full potential. To deal with this challenge, we propose a reinforcement learning (RL) based model combination (RLMC) framework for determining model weights in an ensemble for time series forecasting tasks. By formulating model selection as a sequential decision-making problem, RLMC learns a deterministic policy to output dynamic model weights for non-stationary time series data. RLMC further leverages deep learning to learn hidden features from raw time series data to adapt fast to the changing data distribution. Extensive experiments on multiple real-world datasets have been conducted to showcase the effectiveness of the proposed method.

PDF Details

EAAI Journal 2022 Journal Article

Unsupervised simple Siamese representation learning for blind super-resolution

Pengfeng Yin
Zhonghua Liu
Di Wu
Hua Huo
Haijun Wang
Kaibing Zhang

Deep convolutional neural networks have made unprecedented achievements in image super-resolution (SR) and dominated the field due to their remarkable performance. When the degradation pattern of the test images is inconsistent with the training images, it leads to poor model performance. For example, the degradation could happen after a dimensional stretching. In this case, the most common method is to take blurry, noisy, low-resolution (LR) images and reconstruct SR images by degradation estimation. However, the SR results for this method are highly dependent on the estimation accuracy. To overcome the difficulty with the degradation estimation, this paper designs a degradation representation attention network (DRAN) for image SR, in which we propose the use of a simple Siamese representation learning to extract the degradation information from various LR images. Specifically, DRAN distinguishes degradation information instead of performing degradation estimation, which can greatly reduce the difficulty. In other words, DRAN can avoid pixel-level operations, transform degradation computation problems into degradation classification problems and flexibly process LR images through degradation representation learning. Finally, DRAN also introduces a channel attention mechanism to enhance the performance of SR. Experimental results show that the proposed scheme can distinguish different degradation modes and obtain accurate degradation information. Meanwhile, experiments on synthetic and real images show that the DRAN achieves remarkable performance on blind SR tasks with good visual effects.

Details DOI

EAAI Journal 2021 Journal Article

Causal artificial neural network and its applications in engineering design

Di Wu
G. Gary Wang

To reduce the computational cost in engineering design, expensive high-fidelity simulation models are approximated by mathematical models called metamodels. Typical metamodeling methods assume that expensive simulation models are black-box functions. In this paper, in order to improve the accuracy of metamodels and reduce the cost of building metamodels, knowledge about engineering design problems is employed to help develop a novel metamodel, called the causal artificial neural network (causal-ANN). Cause–effect relations intrinsic to the design problem are employed to decompose an ANN into sub-networks and values of intermediate variables are utilized to train these sub-networks. By involving knowledge of the design problem, the accuracy of causal-ANN is higher than that of traditional metamodeling methods that assume black-box functions. Additionally, one can identify attractive subspaces from the causal-ANN by leveraging the structure of the causal-ANN and the theory of Bayesian Networks. The impacts of fidelity of causal graphs and design variable correlations are also discussed in the paper. The engineering case studies demonstrate that the causal-ANN can be accurately constructed with a small number of expensive simulations, and attractive design subspaces can be identified directly from the causal-ANN.

Details DOI

JBHI Journal 2021 Journal Article

Prediction of Synthetic Lethal Interactions in Human Cancers Using Multi-View Graph Auto-Encoder

Zhifeng Hao
Di Wu
Yuan Fang
Min Wu
Ruichu Cai
Xiaoli Li

Synthetic lethality (SL) is a very important concept for the development of targeted anticancer drugs. However, experimental methods for SL detection often suffer from various issues like high cost and low consistency across cell lines. Hence, computational methods for predicting novel SLs have recently emerged as complements for wet-lab experiments. In addition, SL data can be represented as a graph where nodes are genes and edges are the SL interactions. This motivates the design of advanced graph-based machine learning algorithms for SL prediction. In this paper, we propose a novel SL prediction method using Multi-view Graph Auto-Encoder (SLMGAE). We consider the SL graph as the main view and the graphs from other data sources (e.g., PPI, GO, etc.) as support views. Multiple Graph Auto-Encoders (GAEs) are implemented to reconstruct the graphs for different views. We further design an attention mechanism, which assigns different weights for support views, to combine all the reconstructed graphs for SL prediction. The overall SLMGAE model is then trained by minimizing both the reconstruction error and prediction error. Experimental results on the SynLethDB dataset show that SLMGAE outperforms state-of-the-art methods. The case studies on novel predicted SLs also illustrate the effectiveness of our SLMGAE method.

Details DOI

IJCAI Conference 2021 Conference Paper

Residential Electric Load Forecasting via Attentive Transfer of Graph Neural Networks

Weixuan Lin
Di Wu

An accurate short-term electric load forecasting is critical for modern electric power systems' safe and economical operation. Electric load forecasting can be formulated as a multi-variate time series problem. Residential houses in the same neighborhood may be affected by similar factors and share some latent spatial dependencies. However, most of the existing works on electric load forecasting fail to explore such dependencies. In recent years, graph neural networks (GNNs) have shown impressive success in modeling such dependencies. However, such GNN based models usually would require a large amount of training data. We may have a minimal amount of data available to train a reliable forecasting model for houses in a new neighborhood area. At the same time, we may have a large amount of historical data collected from other houses that can be leveraged to improve the new neighborhood's prediction performance. In this paper, we propose an attentive transfer learning-based GNN model that can utilize the learned prior knowledge to improve the learning process in a new area. The transfer process is achieved by an attention network, which generically avoids negative transfer by leveraging knowledge from multiple sources. Extensive experiments have been conducted on real-world data sets. Results have shown that the proposed framework can consistently outperform baseline models in different areas.

PDF Details DOI

IJCAI Conference 2019 Conference Paper

Online Learning from Capricious Data Streams: A Generative Approach

Yi He
Baijun Wu
Di Wu
Ege Beyazit
Sheng Chen
Xindong Wu

Learning with streaming data has received extensive attention during the past few years. Existing approaches assume the feature space is fixed or changes by following explicit regularities, limiting their applicability in dynamic environments where the data streams are described by an arbitrarily varying feature space. To handle such capricious data streams, we in this paper develop a novel algorithm, named OCDS (Online learning from Capricious Data Streams), which does not make any assumption on feature space dynamics. OCDS trains a learner on a universal feature space that establishes relationships between old and new features, so that the patterns learned in the old feature space can be used in the new feature space. Specifically, the universal feature space is constructed by leveraging the relatednesses among features. We propose a generative graphical model to model the construction process, and show that learning from the universal feature space can effectively improve performance with theoretical analysis. The experimental results demonstrate that OCDS achieves conspicuous performance on synthetic and real datasets.

PDF Details

AAAI Conference 2018 Conference Paper

Introducing AI to Undergraduate Students via Computer Vision Projects

Kaiman Zeng
Yancheng Li
Yida Xu
Di Wu
Nansong Wu

Computer vision, as a subfield of general artificial intelligence (AI), is a technology that can be visualized and is easily found in a large number of state-of-the-art applications. In this project, undergraduate students performed research on a landmark recognition task using computer vision techniques. The project focused on analyzing, designing, configuring, and testing the two core components in landmark recognition: feature detection and description. The project modeled the landmark recognition system as a tour guide for visitors to the campus and evaluated the performance in real-world circumstances. By analyzing real-world data and solving problems, students' cognitive skills and critical thinking skills were sharpened. Their knowledge and understanding of mathematical modeling and data processing were also enhanced.

PDF Details

AAAI Conference 2017 Conference Paper

Multi-Kernel Low-Rank Dictionary Pair Learning for Multiple Features Based Image Classification

Xiaoke Zhu
Xiao-Yuan Jing
Fei Wu
Di Wu
Li Cheng
Sen Li
Ruimin Hu

Dictionary learning (DL) is an effective feature learning technique, and has led to interesting results in many classification tasks. Recently, by combining DL with multiple kernel learning (which is a crucial and effective technique for combining different feature representation information), a few multi-kernel DL methods have been presented to solve the multiple feature representations based classification problem. However, how to improve the representation capability and discriminability of multi-kernel dictionary has not been well studied. In this paper, we propose a novel multi-kernel DL approach, named multi-kernel low-rank dictionary pair learning (MKLDPL). Specifically, MKLDPL jointly learns a kernel synthesis dictionary and a kernel analysis dictionary by exploiting the class label information. The learned synthesis and analysis dictionaries work together to implement the coding and reconstruction of samples in the kernel space. To enhance the discriminability of the learned multi-kernel dictionaries, MKLDPL imposes the low-rank regularization on the analysis dictionary, which can make samples from the same class have similar representations. We apply MKLDPL for multiple features based image classification task. Experimental results demonstrate the effectiveness of the proposed approach.

PDF Details

TCS Journal 2016 Journal Article

Approximation algorithm for the balanced 2-connected k-partition problem

Di Wu
Zhao Zhang
Weili Wu

For two positive integers $m$, $k$ and a connected graph $G = (V, E)$ with a nonnegative vertex weight function $w$, the balanced $m$-connected $k$-partition problem, denoted as $BC_mP_k$, is to find a partition of $V$ into $k$ disjoint nonempty vertex subsets $(V_1, V_2, \ldots, V_k)$ such that each $G[V_i]$ (the subgraph of $G$ induced by $V_i$) is $m$-connected, and $\min_{1 \le i \le k} \{w(V_i)\}$ is maximized. The optimal value of $BC_mP_k$ on graph $G$ is denoted as $\beta_m^*(G, k)$, that is, $\beta_m^*(G, k) = \max \min_{1 \le i \le k} \{w(V_i)\}$, where the maximum is taken over all $m$-connected $k$-partitions of $G$. In this paper, we study the $BC_2P_k$ problem on interval graphs, and obtain the following results. (1) For $k = 2$, a 4/3-approximation algorithm is given for $BC_2P_2$ on 4-connected interval graphs. (2) In the case that there exists a vertex $v$ with weight at least $W/k$, where $W$ is the total weight of the graph, we prove that the $BC_2P_k$ problem on a $2k$-connected interval graph $G$ can be reduced to the $BC_2P_{k-1}$ problem on the $(2k-1)$-connected interval graph $G - v$. In the case that every vertex has weight at most $W/k$, we prove a lower bound $\beta_2^*(G, k) \ge W/(2k-1)$ for a $2k$-connected interval graph $G$. (3) Assuming that the weight $w$ is integral, a pseudo-polynomial time algorithm is obtained. Combining this pseudo-polynomial time algorithm with the above lower bound, a fully polynomial time approximation scheme (FPTAS) is obtained for the $BC_2P_k$ problem on $2k$-connected interval graphs.

Details DOI

ICRA Conference 2014 Conference Paper

Sample path sharing in simulation-based policy improvement

Di Wu
Qing-Shan Jia
Chun-Hung Chen

Simulation-based policy improvement (SBPI) has been widely used to improve given base policies through simulation. The basic idea of SBPI is to estimate all the Q-factors for a given state using simulation, and then select the action that achieves the minimal cost. It is therefore of great importance to use the given budget efficiently in order to select the best action with high probability. Unlike existing budget allocation algorithms that estimate Q-factors by independent simulation, we share the sample paths to improve the probability of correctly selecting the best action. Our method can be combined with equal allocation, Successive Rejects, and optimal computing budget allocation to enhance their probabilities of correct selection as well as to achieve better policies in SBPI. The size of this improvement depends on the overlap in reachable states under different actions. Numerical results show that, with such overlap, combining our method with equal allocation, Successive Rejects, and optimal computing budget allocation produces a higher probability of correct selection as well as better policies in SBPI.
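
The basic SBPI step described in this abstract (estimate every Q-factor at a state by simulation, then pick the minimal-cost action) can be sketched as follows. This is a minimal illustration that uses independent rollouts with equal allocation, i.e. the baseline the abstract contrasts against, not the paper's sample path sharing method. The toy dynamics `toy_step`, the base policy `toy_base_policy`, and all parameter values are hypothetical.

```python
import random

def estimate_q(state, action, step, base_policy, horizon, n_paths, rng):
    """Monte Carlo estimate of a Q-factor: apply `action` first,
    then follow the base policy for the rest of the horizon."""
    total = 0.0
    for _ in range(n_paths):
        s, a, cost = state, action, 0.0
        for _ in range(horizon):
            s, c = step(s, a, rng)   # simulate one transition
            cost += c
            a = base_policy(s)       # base policy takes over after step one
        total += cost
    return total / n_paths

def improved_action(state, actions, step, base_policy, horizon, n_paths, rng):
    """Equal allocation: the same number of independent paths per action."""
    q = {a: estimate_q(state, a, step, base_policy, horizon, n_paths, rng)
         for a in actions}
    return min(q, key=q.get), q

# Hypothetical toy model: a noisy walk on the integers, cost = distance from 0.
def toy_step(s, a, rng):
    s_next = s + a + rng.choice((-1, 0, 1))
    return s_next, abs(s_next)

def toy_base_policy(s):
    return 0  # base policy: never move deliberately

rng = random.Random(0)
best, q = improved_action(3, (-1, 0, 1), toy_step, toy_base_policy,
                          horizon=5, n_paths=2000, rng=rng)
print(best, q)
```

In this toy model the improved action at state 3 moves toward the origin, since every later cost is the distance from 0. A path-sharing variant would reuse simulated trajectories across actions whose reachable states overlap instead of drawing each rollout independently.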

Details

AIJ Journal 2008 Journal Article

Reachability analysis of uncertain systems using bounded-parameter Markov decision processes

Di Wu
Xenofon Koutsoukos

Verification of reachability properties for probabilistic systems is usually based on variants of Markov processes. Current methods assume an exact model of the dynamic behavior and are not suitable for realistic systems that operate in the presence of uncertainty and variability. This research note extends existing methods for Bounded-parameter Markov Decision Processes (BMDPs) to solve the reachability problem. BMDPs are a generalization of MDPs that allows modeling uncertainty. Our results show that interval value iteration converges in the case of an undiscounted reward criterion that is required to formulate the problems of maximizing the probability of reaching a set of desirable states or minimizing the probability of reaching an unsafe set. Analysis of the computational complexity is also presented.
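
Interval value iteration for maximal reachability can be sketched as below. This is a minimal illustration assuming the standard order-based construction for BMDPs (each successor first receives its lower-bound probability, then the remaining mass goes to the worst- or best-valued successors first); it is not necessarily the exact procedure of the paper. The data layout `trans[state][action][successor] = (lo, hi)` and the three-state example are hypothetical.

```python
def extreme_expectation(intervals, values, pessimistic):
    """Expected value under the feasible distribution that is worst
    (pessimistic=True) or best (pessimistic=False) for `values`.
    `intervals` maps successor -> (lo, hi), with sum(lo) <= 1 <= sum(hi)."""
    order = sorted(intervals, key=lambda s: values[s], reverse=not pessimistic)
    prob = {s: intervals[s][0] for s in intervals}   # start from lower bounds
    slack = 1.0 - sum(prob.values())
    for s in order:                                  # spend the remaining mass greedily
        add = min(intervals[s][1] - prob[s], slack)
        prob[s] += add
        slack -= add
    return sum(prob[s] * values[s] for s in intervals)

def interval_value_iteration(states, trans, goal, iters=100):
    """Bounds on the maximal probability of reaching `goal` (undiscounted)."""
    lower = {s: 1.0 if s in goal else 0.0 for s in states}
    upper = dict(lower)
    for _ in range(iters):
        new_lower, new_upper = {}, {}
        for s in states:
            if s in goal:
                new_lower[s] = new_upper[s] = 1.0
                continue
            # States without actions are absorbing and never reach the goal.
            new_lower[s] = max((extreme_expectation(iv, lower, True)
                                for iv in trans[s].values()), default=0.0)
            new_upper[s] = max((extreme_expectation(iv, upper, False)
                                for iv in trans[s].values()), default=0.0)
        lower, upper = new_lower, new_upper
    return lower, upper

# Hypothetical example: two actions with interval-valued transitions.
states = ["s0", "goal", "fail"]
trans = {
    "s0": {
        "a": {"goal": (0.6, 0.8), "fail": (0.2, 0.4)},
        "b": {"goal": (0.5, 0.9), "fail": (0.1, 0.5)},
    },
    "goal": {},
    "fail": {},
}
lower, upper = interval_value_iteration(states, trans, goal={"goal"})
print(lower["s0"], upper["s0"])
```

In this example the maximal reachability probability from `s0` is bounded between 0.6 (action `a` under the least favorable transition probabilities) and 0.9 (action `b` under the most favorable ones). Minimizing the probability of reaching an unsafe set follows the same pattern with `min` over actions in place of `max`.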

Details DOI