EAAI Journal 2026 Journal Article
A three-dimensional multi-sensor fusion convolutional network for bearing fault diagnosis under complex small sample conditions
- Qiang Li
- Rundong Zhou
- Xinyu Zhai
- Jin Wang
- Qing Lv
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
EAAI Journal 2026 Journal Article
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Remote sensing change detection (CD) has achieved remarkable progress in recent years. However, little attention has been paid to generalizable change detection (GCD) methods that can effectively generalize to unseen scenarios or domains beyond the training distribution. The major challenges in GCD arise from domain diversity and bitemporal domain shifts in remote sensing images, caused by variations in imaging platforms, acquisition times, geographic regions, and observed events. To tackle these challenges, we propose GenCD, a GCD framework built upon vision foundation models (VFMs). Specifically, GenCD introduces two key components: (1) a Low-Rank Exchange Adaptation (LREA) strategy of VFMs that aligns bitemporal representations while preserving the generalization capacity of VFMs on single-temporal inputs; and (2) a Token-Guided Feature Refinement (TGFR) mechanism that leverages an input-independent token as a guide to refine difference features, improving the discrimination between changed and unchanged regions. We conduct extensive cross-dataset evaluations on eight diverse datasets across three binary CD tasks: land cover, land use, and building-only CD. The results consistently demonstrate the superior generalization of GenCD over SoTA methods, highlighting its effectiveness in GCD.
AAAI Conference 2026 Conference Paper
Text-guided image editing has advanced rapidly with the rise of diffusion models. While flow-based inversion-free methods offer high efficiency by avoiding latent inversion, they often fail to effectively integrate source information, leading to poor background preservation, spatial inconsistencies, and over-editing. In this paper, we present FIA-Edit, a novel inversion-free framework that achieves high-fidelity and semantically precise edits through a Frequency-Interactive Attention mechanism. Specifically, we design two key components: (1) a Frequency Representation Interaction (FRI) module that enhances cross-domain alignment by exchanging frequency components between source and target features within self-attention, and (2) a Feature Injection (FIJ) module that explicitly incorporates source-side queries, keys, values, and text embeddings into the target branch's cross-attention to preserve structure and semantics. Comprehensive and extensive experiments demonstrate that FIA-Edit supports high-fidelity editing at low computational cost (~6 s per 512×512 image on an RTX 4090) and consistently outperforms existing methods across diverse tasks in visual quality, background fidelity, and controllability. Furthermore, we are the first to extend text-guided image editing to clinical applications. By synthesizing anatomically coherent hemorrhage variations in surgical images, FIA-Edit opens new opportunities for medical data augmentation and delivers significant gains in downstream bleeding classification.
AAAI Conference 2026 Conference Paper
In the realm of autonomous driving, accurately detecting surrounding obstacles is crucial for effective decision-making. Traditional methods primarily rely on 3D bounding boxes to represent these obstacles, which often fail to capture the complexity of irregularly shaped, real-world objects. To overcome these limitations, we present GUIDE, a novel framework that utilizes 3D Gaussians for instance detection and occupancy prediction. Unlike conventional occupancy prediction methods, GUIDE also offers robust tracking capabilities. Our framework employs a sparse representation strategy, using Gaussian-to-Voxel Splatting to provide fine-grained, instance-level occupancy data without the computational demands associated with dense voxel grids. Experimental validation on the nuScenes dataset demonstrates GUIDE's performance, with an instance occupancy mAP of 21.61, marking a 50% improvement over existing methods, alongside competitive tracking capabilities. GUIDE establishes a new benchmark in autonomous perception systems, effectively combining precision with computational efficiency to better address the complexities of real-world driving environments.
JBHI Journal 2026 Journal Article
3D cerebrovascular segmentation poses a significant challenge, akin to locating a line within a vast 3D environment. This complexity can be substantially reduced by projecting the vessels onto a 2D plane, enabling easier segmentation. In this paper, we create a vessel-segmentation-friendly space using a clinical visualization technique called maximum intensity projection (MIP). Leveraging this, we propose a Dual-space Context-Aware Network (DCANet) for 3D vessel segmentation, designed to capture even the finest vessel structures accurately. DCANet begins by transforming a magnetic resonance angiography (MRA) volume into a 3D Regional-MIP volume, where each Regional-MIP slice is constructed by projecting adjacent MRA slices. This transformation highlights vessels as prominent continuous curves rather than the small circular or ellipsoidal cross-sections seen in MRA slices. DCANet encodes vessels separately in the MRA and the projected Regional-MIP spaces and introduces the Regional-MIP Image Fusion Block (MIFB) between these dual spaces to selectively integrate contextual features from Regional-MIP into MRA. Following dual-space encoding, DCANet employs a Dual-mask Spatial Guidance TransFormer (DSGFormer) decoder to focus on vessel regions while effectively excluding background areas, which reduces the learning burden and improves segmentation accuracy. We benchmark DCANet on four datasets: two public datasets, TubeTK and IXI-IOP, and two in-house datasets, Xiehe and IXI-HH. The results demonstrate that DCANet achieves superior performance, with improvements in average DSC values for thin vessels of at least 2.26%, 2.17%, 2.62%, and 2.58% on the four datasets, respectively.
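The Regional-MIP construction summarized in this abstract (each output slice is the voxel-wise maximum over a small neighborhood of adjacent MRA slices) can be sketched in a few lines of Python; the window size and list-of-lists volume layout below are illustrative assumptions, not details from the paper.

```python
def regional_mip(volume, window=2):
    """Regional maximum intensity projection.

    volume: list of 2D slices (each a list of rows), in axial order.
    Each output slice k is the voxel-wise max over slices
    [k - window, k + window], clipped at the volume boundary.
    """
    n = len(volume)
    out = []
    for k in range(n):
        lo, hi = max(0, k - window), min(n, k + window + 1)
        rows, cols = len(volume[k]), len(volume[k][0])
        mip = [[max(volume[s][i][j] for s in range(lo, hi))
                for j in range(cols)] for i in range(rows)]
        out.append(mip)
    return out
```

Projecting only a local neighborhood (rather than the whole volume) keeps a rough correspondence between each Regional-MIP slice and its source MRA slice, which is what lets the two encoding spaces be fused slice by slice.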
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Cold-start item recommendation is a significant challenge in recommendation systems, particularly when new items are introduced without any historical interaction data. While existing methods leverage multi-modal content to alleviate the cold-start issue, they often neglect the inherent multi-view structure of modalities, namely the distinction between shared and modality-specific features. In this paper, we propose Multi-Modal Multi-View Variational AutoEncoder (M²VAE), a generative model that addresses the challenges of modeling common and unique views in attribute and multi-modal features, as well as user preferences over single-typed item features. Specifically, we generate type-specific latent variables for item IDs, categorical attributes, and image features, and use Product-of-Experts (PoE) to derive a common representation. A disentangled contrastive loss decouples the common view from unique views while preserving feature informativeness. To model user inclinations, we employ a user-aware hierarchical Mixture-of-Experts (MoE) to adaptively fuse representations. We further incorporate co-occurrence signals via contrastive learning, eliminating the need for pretraining. Extensive experiments on real-world datasets validate the effectiveness of our approach.
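As a rough illustration of the Product-of-Experts fusion mentioned above, here is the standard PoE combination of 1-D Gaussian experts (precision-weighted averaging); the paper's latent variables are multivariate and learned, so this sketches only the fusion operation, not the model.

```python
def product_of_experts(mus, variances):
    """Fuse 1-D Gaussian experts: the product of Gaussians is a Gaussian
    whose precision is the sum of precisions and whose mean is the
    precision-weighted average of the expert means."""
    precisions = [1.0 / v for v in variances]
    var = 1.0 / sum(precisions)
    mu = var * sum(m * p for m, p in zip(mus, precisions))
    return mu, var
```

In a multi-modal VAE, each modality-specific encoder plays the role of one expert, so more confident (lower-variance) modalities dominate the common representation.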
AAAI Conference 2026 Conference Paper
White-Light Imaging (WLI) is the standard for endoscopic cancer screening, but Narrow-Band Imaging (NBI) offers superior diagnostic details. A key challenge is transferring knowledge from NBI to enhance WLI-only models, yet existing methods are critically hampered by their reliance on paired NBI-WLI images of the same lesion, a costly and often impractical requirement that leaves vast amounts of clinical data untapped. In this paper, we break this paradigm by introducing PaGKD, a novel Pairing-free Group-level Knowledge Distillation framework that that enables effective cross-modal learning using unpaired WLI and NBI data. Instead of forcing alignment between individual, often semantically mismatched image instances, PaGKD operates at the group level to distill more complete and compatible knowledge across modalities. Central to PaGKD are two complementary modules: (1) Group-level Prototype Distillation (GKD-Pro) distills compact group representations by extracting modality-invariant semantic prototypes via shared lesion-aware queries; (2) Group-level Dense Distillation (GKD-Den) performs dense cross-modal alignment by guiding group-aware attention with activation-derived relation maps. Together, these modules enforce global semantic consistency and local structural coherence without requiring image-level correspondence. Extensive experiments on four clinical datasets demonstrate that PaGKD consistently and significantly outperforms state-of-the-art methods, boosting AUC by 3.3%, 1.1%, 2.8%, and 3.2%, respectively, establishing a new direction for cross-modal learning from unpaired data.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Locating vertebral landmarks on anteroposterior (AP) X-ray images is challenging due to the tissue overlap. Despite the great progress of heatmap-based methods, they often predict missing/false points, which are intolerable in downstream applications like scoliosis assessment. In this paper, we instead modernize the classic point-regression scheme, and propose a novel model termed RouterNet to locate the 68 vertebral landmarks completely and accurately. RouterNet starts from an initial root point, and then gradually routes it onto more and more points with finer and finer semantics. RouterNet naturally couples such point routing process with its hierarchical and multi-scale feature learning. That is, lower-scale feature maps are utilized to regress points with coarser semantics, and the regressed points pilot a more focused local feature extraction on the next higher-scale map to route onto their subsequent positions with finer semantics. With this divide-and-conquer strategy, RouterNet alleviates the task difficulty, and can robustly localize by routing from the whole spinal center to 17 vertebral centers, and further to their 68 corner points. Extensive and comprehensive experiments on both public and private datasets demonstrate our superior performance over other state-of-the-art methods, decreasing NMSE by 73.8% for landmark localization, and SMAPE by 14.8% for the downstream scoliosis assessment.
EAAI Journal 2025 Journal Article
IROS Conference 2025 Conference Paper
This paper introduces a systematic approach to identifying a physically feasible set of robot dynamics parameters. The framework consists of four steps: 1) Identification of robot dynamics parameters using least squares combined with a linear friction model. 2) Construction of a weighting matrix based on the least squares identification error, and performing weighted least squares identification combined with the linear friction model. 3) Introduction of a nonlinear friction model to fit joint friction. 4) Optimization of the remaining robot dynamics parameters to adhere to physical feasibility constraints. Various combinations of identification methods with linear or nonlinear friction models are analyzed experimentally, using a 6-DoF industrial robot and a 7-DoF collaborative robot, respectively, to demonstrate the effectiveness of the proposed identification framework. Experimental results affirm that the proposed method provides accurate estimates of the robot joint torques while maintaining the physical feasibility of the dynamics.
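Step 2 of the framework (weighted least squares on the regressor form of the dynamics) follows the textbook normal-equations computation; the sketch below, with a hypothetical regressor matrix `Phi`, measured torques `tau`, and per-sample weights `w`, illustrates that computation only, not the paper's specific friction models.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def weighted_least_squares(Phi, tau, w):
    """x = (Phi^T W Phi)^-1 Phi^T W tau, with W = diag(w).

    In dynamics identification, Phi is the regressor matrix,
    tau the measured joint torques, and w down-weights samples
    with large residuals from the unweighted fit.
    """
    m, n = len(Phi), len(Phi[0])
    AtWA = [[sum(w[k] * Phi[k][i] * Phi[k][j] for k in range(m))
             for j in range(n)] for i in range(n)]
    AtWb = [sum(w[k] * Phi[k][i] * tau[k] for k in range(m)) for i in range(n)]
    return solve(AtWA, AtWb)
```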
JBHI Journal 2025 Journal Article
Few-shot semantic segmentation (FSS) of 3D medical images requires finding a 2D slice from the labeled volume as support to ‘query’ slices of the unlabeled one. Accurately determining support slices is crucial for learning representative prototypical features, thereby enhancing segmentation accuracy. The existing methods typically resort to the true position of the query target to align the query with support slices or simply exploit one key support slice to segment all query slices, which inevitably results in poor practicality and mis-segmentation. In this regard, we seek a practical and efficient solution by proposing a novel Collaborative Slice Alignment (CSA) module, which densely assigns each query slice its own fittest support without knowing the target prior. Concretely, our CSA first estimates the confidence scores of slices from the sorting task to implicitly reflect their physical location in the human body. The estimated scores are considered as spatial references for aligning support slices and query slices so that each matching pair shares the most similar image contents. Moreover, the self-learnable ranking objective allows CSA to transfer internal knowledge into both support and query features to further boost the FSS performance. Additionally, we introduce an Information Reconciliation (InRe) module to mitigate the inconsistent feature distribution caused by the individual differences between support and query images. Experimental results demonstrate that the combination of CSA and InRe achieves an average Dice score improvement of at least 8.61% across three datasets, consistently outperforming other state-of-the-art methods.
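The idea of using estimated confidence scores as spatial references for pairing slices could look roughly like this; `support_scores` and `query_scores` are hypothetical per-slice scores, and nearest-score matching is an assumption about the alignment rule, not the paper's exact procedure.

```python
def align_slices(support_scores, query_scores):
    """Assign each query slice the index of the support slice whose
    estimated confidence score (a proxy for axial position in the body)
    is closest, so each pair shares the most similar image content."""
    return [min(range(len(support_scores)),
                key=lambda i: abs(support_scores[i] - q))
            for q in query_scores]
```

The point of such score-based matching is that it needs no ground-truth target position at inference time, unlike alignment schemes that rely on the query target's true location.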
AIIM Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Low-Rank Adaptation (LoRA) methods have demonstrated considerable success in achieving parameter-efficient fine-tuning (PEFT) for Transformer-based foundation models. These methods typically fine-tune individual Transformer layers using independent LoRA adaptations. However, directly applying existing LoRA techniques to convolutional networks (ConvNets) yields unsatisfactory results due to the high correlation between the stacked sequential layers of ConvNets. To overcome this challenge, we introduce a novel framework called Correlated Low-Rank Adaptation (CoLoRA), which explicitly utilizes correlated low-rank matrices to model the inter-layer dependencies among convolutional layers. Additionally, to enhance tuning efficiency, we propose a parameter-free filtering method that enlarges the receptive field of LoRA, thus minimizing interference from non-informative local regions. Comprehensive experiments conducted across various mainstream vision tasks, including image classification, semantic segmentation, and object detection, illustrate that CoLoRA significantly advances the state-of-the-art PEFT approaches. Notably, our CoLoRA achieves superior performance with only 5% of trainable parameters, surpassing full fine-tuning in the image classification task on the VTAB-1k dataset using ConvNeXt-S. Code is available at https://github.com/VISION-SJTU/CoLoRA.
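The abstract does not spell out CoLoRA's correlated parameterization, so the sketch below shows only the generic LoRA update it builds on: a frozen weight matrix W perturbed by a low-rank product BA, where only B and A are trained.

```python
def lora_update(W, B, A, alpha=1.0):
    """Apply a LoRA perturbation: W' = W + alpha * (B @ A).

    W: frozen weights (d_out x d_in), B: d_out x r, A: r x d_in,
    with rank r much smaller than d_out and d_in, so the trainable
    parameter count is r * (d_out + d_in) instead of d_out * d_in.
    """
    r = len(A)
    return [[W[i][j] + alpha * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(len(W[0]))] for i in range(len(W))]
```

CoLoRA's contribution, per the abstract, is to correlate these low-rank factors across the stacked convolutional layers rather than learning an independent (B, A) pair per layer.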
JBHI Journal 2025 Journal Article
A key challenge in registering pre- and post-operative brain tumor images lies in the anatomical inconsistencies caused by pathological changes and surgical resections. Recent efforts have addressed this issue by masking affected regions during optimization, but such approaches discard contextual information and rely on CNN backbones that implicitly model deformation, often overfitting to distant normal tissues and failing to capture the severe nonlinear distortions near the tumor. Correlation-based alternatives enhance generalization to diverse deformation patterns by explicitly modeling geometric correspondences, yet they frequently yield unreliable matches in and around tumor regions, disrupting the deformation field. In this paper, we propose the Cross-correlation Rectification-based Registration Network (CRRNet), the first framework to introduce an active rectification mechanism for robust and structurally coherent pre- to post-operative brain tumor image registration. Specifically, CRRNet achieves this through two complementary modules: 1) a Cross-correlation Analysis-Based Inconsistency module that identifies invalid correspondences via bidirectional loop-closure evaluation on cross-correlations, and 2) a Dual-level Correspondence Rectification module that adaptively integrates contextually reliable correlations from local and long-range perspectives to restore structurally coherent matches. This synergistic design retains the strengths of cross-correlation while effectively mitigating correspondence mismatches. Extensive experiments on multiple tumor benchmarks demonstrate the superiority of CRRNet. Specifically, on the BraTS-Reg dataset, it reduces the mean registration errors by 8.18% in near-tumor regions and 3.95% in far-from-tumor regions, surpassing state-of-the-art methods.
IROS Conference 2025 Conference Paper
Underwater object detection (UOD) is crucial for monitoring marine ecosystems, underwater robotics, environmental protection, and autonomous underwater vehicles (AUVs). Despite progress, many models struggle under real-world conditions due to poor visibility, dynamic lighting, and domain shifts. Traditional methods like Faster R-CNN are computationally expensive, while YOLO-based models suffer in challenging underwater scenarios. The scarcity of large-scale annotated datasets further limits model generalization. To address these challenges, we introduce UOD-SZTU-2025, a new dataset of 3,133 high-quality underwater images, sourced primarily from video platforms. The dataset is used in EFCWM (Enhanced Feature Correction and Weighting Module) to extract and refine a feature material library for detection targets. We propose EFCWM-Mamba-YOLO, a lightweight, real-time detection model designed to enhance feature representation and adapt to diverse underwater environments. The EFCWM module incorporates domain adaptation for improved robustness. Additionally, a two-stage training strategy first trains on a source domain and fine-tunes with limited target domain samples to enhance generalization. Experiments show our approach surpasses existing lightweight UOD models in accuracy, real-time performance, and robustness. Our dataset, model, and benchmark establish a strong foundation for future UOD research. The dataset for EFCWM-Mamba-YOLO is available at https://github.com/wojiaosun/UOD-SZTU-2025.
JBHI Journal 2025 Journal Article
Esophagogastroduodenoscopy (EGD) requires inspecting plentiful upper gastrointestinal (UGI) sites completely for precise cancer screening. Automated temporal site monitoring for EGD assistance is thus in high demand, yet often fails when existing online action detection methods are applied directly. The key challenges are two-fold: 1) the global camera motion dominates, invalidating the temporal patterns derived from the object optical flows, and 2) the UGI sites are fine-grained, yielding highly homogenized appearances. In this paper, we propose an EGD-customized model, powered by two novel designs, i.e., Visual Time-aware Embedding plus Vision-text Asymmetric Coworking (VTE+VAC), for real-time accurate fine-grained UGI site monitoring. Concretely, VTE learns visual embeddings by differentiating frames via classification losses, and meanwhile by reordering the sampled time-agnostic frames to be temporally coherent via a ranking loss. Such a joint objective encourages VTE to capture the sequential relation without resorting to the inapplicable object optical flows, and thus to provide time-aware frame-wise embeddings. In the subsequent analysis, VAC uses a temporal sliding window, and extracts vision-text multimodal knowledge from each frame and its corresponding textualized prediction via the learned VTE and a frozen BERT. The text embeddings help provide more representative cues, but may also cause misdirection due to prediction errors. Thus, VAC randomly drops or replaces historical predictions to increase the error tolerance and avoid collapsing onto the last few predictions. Qualitative and quantitative experiments demonstrate that the proposed method achieves superior performance compared to other state-of-the-art methods, with an average F1-score improvement of at least 7.66%.
NeurIPS Conference 2025 Conference Paper
Latent Diffusion-based Text-to-Image (T2I) is a free image editing tool that typically reverses an image into noise, reconstructs it using its original text prompt, and then generates an edited version under a new target prompt. To preserve unaltered image content, features from the reconstruction are directly injected to replace selected features in the generation. However, this direct replacement often leads to feature incompatibility, compromising editing fidelity and limiting creative flexibility, particularly for non-rigid edits (e.g., structural or pose changes). In this paper, we aim to address these limitations by proposing FSI-Edit, a novel framework using frequency- and stochasticity-based feature injection for flexible image editing. First, FSI-Edit enhances feature consistency by injecting high-frequency components of reconstruction features into generation features, mitigating incompatibility while preserving the editing ability for major structures encoded in low-frequency information. Second, it introduces controlled noise into the replaced reconstruction features, expanding the generative space to enable diverse non-rigid edits beyond the original image's constraints. Experiments on non-rigid edits, e.g., addition, deletion, and pose manipulation, demonstrate that FSI-Edit outperforms existing baselines in target alignment, semantic fidelity and visual quality. Our work highlights the critical roles of frequency-aware design and stochasticity in overcoming rigidity in diffusion-based editing.
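A toy version of the high-frequency injection idea: keep the generation branch's low-frequency content and add the reconstruction branch's high-frequency residual. The moving-average low-pass filter over 1-D feature vectors is a stand-in for the paper's actual frequency decomposition, which operates on diffusion features.

```python
def lowpass(x, k=3):
    """Moving-average low-pass filter (window k, clipped at the edges)."""
    half = k // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def inject_high_freq(recon_feat, gen_feat, k=3):
    """Combine the generation branch's low frequencies (major edited
    structure) with the reconstruction branch's high-frequency residual
    (fine detail), instead of replacing features wholesale."""
    recon_high = [r - l for r, l in zip(recon_feat, lowpass(recon_feat, k))]
    gen_low = lowpass(gen_feat, k)
    return [g + h for g, h in zip(gen_low, recon_high)]
```

Compared with direct feature replacement, this keeps the edit (carried by low frequencies of the generation) while still transplanting source detail, which is the incompatibility-mitigation argument made in the abstract.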
IROS Conference 2025 Conference Paper
Sensor simulation is pivotal for scalable validation of autonomous driving systems, yet existing Neural Radiance Fields (NeRF) based methods face applicability and efficiency challenges in industrial workflows. This paper introduces a Gaussian Splatting (GS) based system to address these challenges: We first break down sensor simulator components and analyze the possible advantages of GS over NeRF. Then in practice, we refactor three crucial components through GS, to leverage its explicit scene representation and real-time rendering: (1) choosing the 2D neural Gaussian representation for physics-compliant scene and sensor modeling, (2) proposing a scene editing pipeline to leverage Gaussian primitives library for data augmentation, and (3) coupling a controllable diffusion model for scene expansion and harmonization. We implement this framework on a proprietary autonomous driving dataset supporting cameras and LiDAR sensors. We demonstrate through ablation studies that our approach reduces frame-wise simulation latency, achieves better geometric and photometric consistency, and enables interpretable explicit scene editing and expansion. Furthermore, we showcase how integrating such a GS-based sensor simulator with traffic and dynamic simulators enables full-stack testing of end-to-end autonomy algorithms. Our work provides both algorithmic insights and practical validation, establishing GS as a cornerstone for industrial-grade sensor simulation.
YNIMG Journal 2025 Journal Article
AAAI Conference 2025 Conference Paper
We propose MonoBox, an innovative box-supervised segmentation method constrained by monotonicity to liberate its training from the user-unfriendly box-tightness assumption. In contrast to conventional box-supervised segmentation, where the box edges must precisely touch the target boundaries, MonoBox leverages imprecisely-annotated boxes to achieve robust pixel-wise segmentation. The 'linchpin' is that, within the noisy zones around box edges, MonoBox discards the traditional misguiding multiple-instance learning loss, and instead optimizes a carefully-designed objective, termed monotonicity constraint. Along directions transitioning from the foreground to background, this new constraint steers responses to adhere to a trend of monotonically decreasing values. Consequently, the originally unreliable learning within the noisy zones is transformed into a correct and effective monotonicity optimization. Moreover, an adaptive label correction is introduced, enabling MonoBox to enhance the tightness of box annotations using predicted masks from the previous epoch and dynamically shrink the noisy zones as training progresses. We verify MonoBox in the box-supervised segmentation task of polyps, where satisfying box-tightness is challenging due to the vague boundaries between the polyp and normal tissues. Experiments on both public synthetic and in-house real noisy datasets demonstrate that MonoBox exceeds other anti-noise state-of-the-art methods, improving Dice by at least 5.5% and 3.3%, respectively.
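The monotonicity constraint can be illustrated as a hinge penalty on any response increase along a foreground-to-background ray; how MonoBox samples rays and weights this term is not given here, so the sketch shows only the core penalty.

```python
def monotonicity_penalty(responses):
    """Penalize violations of monotonic decrease along a ray that runs
    from foreground toward background: any increase between consecutive
    responses contributes a hinge term, a strict decrease contributes 0."""
    return sum(max(0.0, nxt - cur)
               for cur, nxt in zip(responses, responses[1:]))
```

A perfectly decaying response profile incurs zero loss, so inside the noisy zone the network is only pushed toward a plausible ordering of responses rather than toward possibly-wrong hard labels.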
EAAI Journal 2025 Journal Article
JBHI Journal 2025 Journal Article
The R-peak in electrocardiogram (ECG) signals is a critical physiological marker for the diagnosis of cardiovascular diseases. Although various R-peak detection methods have been proposed, their performance is often hindered by noise, especially in dynamic ECG monitoring. Furthermore, the potential of harnessing complementary information from 12-lead ECG signals has not been fully exploited. To address these challenges, this study conceptualized 12-lead ECG data as two-dimensional images and employed YOLOv5 as the model's backbone for R-peak detection, effectively transforming a signal segmentation task into an object detection task in images. Specifically, considering the characteristics of consistent R-peak positions across different leads, we proposed a strip attention mechanism to treat horizontal or vertical strips as tokens for computing inter- and intra-strip attention, enhancing the model's ability to capture R-peak positional information and likelihood. Additionally, a one-dimensional Manhattan distance-based NMS algorithm was used to minimize redundant detection frames, thereby enhancing model performance. The proposed model was rigorously evaluated on two publicly available datasets, INCART and LUDB, under varying noise conditions. On the INCART dataset, the model achieved F1 scores of 99.97%, 99.86%, 99.63%, and 98.00% at noise levels of Original, SNR = 10 dB, SNR = 5 dB, and SNR = 0 dB, respectively. Similarly, on the LUDB dataset, the F1 scores were 99.89%, 100%, 100%, and 99.86% for the corresponding noise levels. Extensive testing across multiple datasets and noise scenarios demonstrated that the proposed model outperformed existing state-of-the-art methods in terms of accuracy, noise robustness, and generalization capability.
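For R-peaks on a time axis, one-dimensional Manhattan-distance NMS reduces to greedy suppression of lower-scoring detections that fall within a minimum distance of an already-kept peak; the distance threshold below is an arbitrary illustrative value.

```python
def nms_1d(peaks, min_dist=50):
    """Greedy 1-D non-maximum suppression.

    peaks: list of (position, score) detections.
    Keeps detections in descending score order, discarding any whose
    Manhattan distance |p1 - p2| to a kept peak is below min_dist.
    Returns the survivors sorted by position.
    """
    kept = []
    for pos, score in sorted(peaks, key=lambda p: -p[1]):
        if all(abs(pos - kept_pos) >= min_dist for kept_pos, _ in kept):
            kept.append((pos, score))
    return sorted(kept)
```

Since R-peaks occupy nearly the same horizontal position in every lead, collapsing overlapping boxes by their 1-D position alone is cheaper than full 2-D IoU-based NMS.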
EAAI Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
ICRA Conference 2025 Conference Paper
To obtain good tactile sensing, traditional dexterous hands keep all installed sensing units enabled at all times, even when only a few are actually used, which makes the tactile sensing system wasteful of both resources and energy. To reduce this complexity by placing tactile sensing units only at critical locations, this work proposes an embodied tactile dexterous hand (ET-Hand) and a novel multimodal sensor placement framework that learns multiple tasks to generate optimal placement proposals. Furthermore, our ET-Hand can dynamically adjust the positions, types, and numbers of perceived tactile sensors during robotic manipulation, providing novel tools and methods for investigating the tactile channels and placement scale required for robot exploration. On object recognition and slip detection tasks, the results show that our proposed method performs close to, or even better than, traditional large-scale sensor placement.
AAAI Conference 2025 Conference Paper
Temporal action localization (TAL) involves dual tasks to classify and localize actions within untrimmed videos. However, the two tasks often have conflicting requirements for features. Existing methods typically employ separate heads for classification and localization tasks but share the same input feature, leading to suboptimal performance. To address this issue, we propose a novel TAL method with Cross Layer Task Decoupling and Refinement (CLTDR). Based on the feature pyramid of the video, the CLTDR strategy integrates semantically strong features from higher pyramid layers and detailed boundary-aware features from lower pyramid layers to effectively disentangle the action classification and localization tasks. Moreover, the multiple features from cross layers are also employed to refine and align the disentangled classification and regression results. Finally, a lightweight Gated Multi-Granularity (GMG) module is proposed to comprehensively extract and aggregate video features at instant, local, and global temporal granularities. Benefiting from the CLTDR and GMG modules, our method achieves state-of-the-art performance on five challenging benchmarks: THUMOS14, MultiTHUMOS, EPIC-KITCHENS-100, ActivityNet-1.3, and HACS. Code: https://github.com/LiQiang0307/CLTDR-GMG
AIIM Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
AIIM Journal 2024 Journal Article
EAAI Journal 2024 Journal Article
NeurIPS Conference 2024 Conference Paper
Federated graph learning (FedGL) is an emerging learning paradigm to collaboratively train graph data from various clients. However, during the development and deployment of FedGL models, they are susceptible to illegal copying and model theft. Backdoor-based watermarking is a well-known method for mitigating these attacks, as it offers ownership verification to the model owner. We take the first step to protect the ownership of FedGL models via backdoor-based watermarking. Existing techniques have challenges in achieving the goal: 1) they either cannot be directly applied or yield unsatisfactory performance; 2) they are vulnerable to watermark removal attacks; and 3) they lack formal guarantees. To address all the challenges, we propose FedGMark, the first certified robust backdoor-based watermarking for FedGL. FedGMark leverages the unique graph structure and client information in FedGL to learn customized and diverse watermarks. It also designs a novel GL architecture that facilitates defending against both the empirical and theoretically worst-case watermark removal attacks. Extensive experiments validate the promising empirical and provable watermarking performance of FedGMark. Source code is available at: https://github.com/Yuxin104/FedGMark.
JBHI Journal 2024 Journal Article
Background: Duchenne muscular dystrophy (DMD) is a neuromuscular disorder that affects ambulatory function. Quantitative ultrasound (QUS) imaging, utilizing envelope statistics, has proven effective in diagnosing DMD. Radiomics enables the extraction of detailed features from QUS images. This study further proposes a hybrid QUS radiomics and explores its value in characterizing DMD. Methods: Patients (n = 85) underwent ultrasound examinations of the gastrocnemius through Nakagami, homodyned K (HK), and information entropy imaging. The hybrid QUS radiomics extracted, selected, and integrated the retained features derived from each QUS image for classification of ambulatory function using a support vector machine. Nested five-fold cross-validation of the data was conducted, with the rotational process repeated 50 times. The performance was assessed by averaging the areas under the receiver operating characteristic curve (AUROC). Results: Radiomics enhanced the average AUROC of B-scan, Nakagami, HK, and entropy imaging to 0.790, 0.911, 0.869, and 0.890, respectively. By contrast, the hybrid QUS radiomics using HK and entropy images for diagnosing ambulatory function in DMD patients achieved a superior average AUROC of 0.971 (p < 0.001 compared with conventional radiomics analysis). Conclusions: The proposed hybrid QUS radiomics incorporates microstructure-related backscattering information from various envelope statistics models to effectively enhance the performance of DMD assessment.
JBHI Journal 2024 Journal Article
The heart sound reflects the movement status of the cardiovascular system and contains early pathological information on cardiovascular diseases. Automatic heart sound diagnosis plays an essential role in the early detection of cardiovascular diseases. In this study, we aim to develop a novel end-to-end heart sound abnormality detection and classification method, which can be adapted to different heart sound diagnosis tasks. Specifically, we developed a Multi-feature Decision Fusion Network (MDFNet) composed of a Multi-dimensional Feature Extraction (MFE) module and a Multi-dimensional Decision Fusion (MDF) module. The MFE module extracted spatial features, multi-level temporal features and spatial-temporal fusion features to learn heart sound characteristics from multiple perspectives. Through deep supervision and decision fusion, the MDF module made the multi-dimensional features extracted by the MFE module more discriminative, and fused the decision results of multi-dimensional features to integrate complementary information. Furthermore, attention modules were embedded in the MDFNet to emphasize the fundamental heart sounds containing effective feature information. Finally, we proposed an efficient data augmentation method to circumvent the diagnosis performance degradation caused by the lack of cardiac cycle segmentation in other end-to-end methods. The developed method achieved an overall accuracy of 94.44% and an F1-score of 86.90% on the binary classification task and an F1-score of 99.30% on the five-class classification task. Our method outperformed other state-of-the-art methods and has good clinical application prospects.
NeurIPS Conference 2024 Conference Paper
This paper studies a risk minimization problem with decision dependent data distribution. The problem pertains to the performative prediction setting in which a trained model can affect the outcome estimated by the model. Such dependency creates a feedback loop that influences the stability of optimization algorithms such as stochastic gradient descent (SGD). We present the first study on performative prediction with smooth but possibly non-convex loss. We analyze a greedy deployment scheme with SGD (SGD-GD). Note that in the literature, SGD-GD is often studied with strongly convex loss. We first propose the definition of stationary performative stable (SPS) solutions through relaxing the popular performative stable condition. We then prove that SGD-GD converges to a biased SPS solution in expectation. We consider two conditions of sensitivity on the distribution shifts: (i) the sensitivity is characterized by Wasserstein-1 distance and the loss is Lipschitz w.r.t. data samples, or (ii) the sensitivity is characterized by total variation (TV) divergence and the loss is bounded. In both conditions, the bias levels are proportional to the stochastic gradient's variance and sensitivity level. Our analysis is extended to a lazy deployment scheme where models are deployed once per several SGD updates, and we show that it converges to an SPS solution with reduced bias. Numerical experiments corroborate our theories.
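The greedy deployment loop described in the abstract can be illustrated on a toy one-dimensional problem. This is a minimal sketch, not the paper's setup: the quadratic loss, the linear mean-shift model of the distribution, and all constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.3        # assumed sensitivity: deploying theta shifts the data mean by eps * theta
base = 1.0       # mean of the data distribution when theta = 0
theta, lr = 0.0, 0.05

def sample(theta_deployed):
    # decision-dependent distribution: the deployed model shifts the data mean
    return base + eps * theta_deployed + rng.normal(0.0, 0.1)

# SGD-GD: redeploy the current model before every stochastic gradient step
for _ in range(2000):
    z = sample(theta)
    grad = theta - z          # gradient of the loss 0.5 * (theta - z)^2
    theta -= lr * grad

# a performative stable point solves theta = base + eps * theta
print(theta, base / (1 - eps))
```

Under this toy model the iterates hover near `base / (1 - eps)`, the fixed point where the model is optimal for the distribution it itself induces, which is the intuition behind the (stationary) performative stability conditions analyzed in the paper.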
JAAMAS Journal 2024 Journal Article
Belief propagation (BP) approaches, such as Max-sum and its variants, are important methods for solving large-scale Distributed Constraint Optimization Problems. However, these algorithms face a huge challenge since their computational complexity scales exponentially with the arity of each constraint function. Current accelerating techniques for BP use sorting or branch-and-bound (BnB) strategies to reduce the search space. However, the existing BnB-based methods are mainly designed for specific problems, which limits their applicability. On the other hand, though several generic sorting-based methods have been proposed, they incur significant preprocessing and memory overhead, which prohibits their adoption in some realistic scenarios. In this paper, we propose a series of generic and memory-efficient heuristic search techniques to accelerate belief propagation. Specifically, by leveraging dynamic programming, we efficiently build function estimations for every partial assignment scoped in a constraint function in the preprocessing phase. Then, by using these estimations to build upper bounds and employing branch-and-bound in a depth-first fashion to reduce the search space, we propose our first method, called FDSP. Next, we enhance FDSP by adopting a concurrent-search strategy and leveraging the upper bounds as guiding information, and propose its first heuristic variant framework, called CONC-FDSP. Finally, by choosing to expand the partial assignment with the highest upper bound in each step of exploration, we propose the second heuristic variant of FDSP, called BFS-FDSP. We prove the correctness of our methods theoretically, and our empirical evaluations indicate their superiority for accelerating Max-sum in terms of both time and memory, compared with the state-of-the-art.
JBHI Journal 2023 Journal Article
Automated Cobb angle estimation on X-ray images is crucial to scoliosis diagnosis. The existing efforts are typically two extremes, which either laboriously detect the raw vertebral landmarks or directly regress Cobb angles from the entire image. In this paper, we propose a novel two-stage end-to-end method as a balanced solution, to avoid vulnerability to false landmarks, and to preserve flexibility in clinical usage. Concretely, we cascade two stages sequentially for detecting vertebrae and then regressing their bending directions instead of raw landmarks. In the detection stage, we combine two networks called LocNet and SegNet to robustly localize vertebrae, and meanwhile to suppress the false positives by additionally segmenting the whole spine. In the subsequent stage, we introduce a regression network named RegNet to accurately regress bending directions of localized vertebrae. Furthermore, the vertebra-aligned local regions on LocNet's intermediate features are cropped via RoIAlign-pooling, and RegNet inherits the cropped regions to learn only feature residuals. By doing so, the regression difficulty can be dramatically alleviated, and the two stages are deeply coupled and mutually guided in an end-to-end training. Moreover, a random perturbation on the inherited features further enhances RegNet's robustness. We benchmark our method on both public and private datasets, and the errors are 2.92 $\pm$ 2.34$^{\circ}$ and 6.87 $\pm$ 6.26% in terms of CMAE and SMAPE on the widely-employed AASCE dataset, outperforming other state-of-the-art methods by at least 16.81% and 6.15%, respectively. Also, a clinical user study verifies our promising flexibility for allowing convenient rectifications to further decrease errors by a large margin.
IJCAI Conference 2023 Conference Paper
Recent advanced Table Structure Recognition (TSR) models adopt image-to-text solutions to parse table structure. These methods can be formulated as an image captioning problem, i.e., input a single-table image and output a table structure description in a specific text format, e.g., HTML. With the impressive success of Transformers in text generation tasks, these methods use the Transformer architecture to predict HTML table text in an autoregressive manner. However, tables always emerge with a large variety of shapes and sizes. Autoregressive models usually suffer from the error accumulation problem as the length of predicted text increases, which results in unsatisfactory performance for large tables. In this paper, we propose a novel image-to-text based TSR method that relieves the error accumulation problem and improves performance noticeably. At the core of our method is a cascaded two-step decoder architecture, with the former decoder predicting HTML table row tags non-autoregressively and the latter predicting HTML table cell tags of each row in a semi-autoregressive manner. Compared with existing methods that predict HTML text autoregressively, the superiority of our row-to-cell progressive table parsing is twofold: (1) it generates an HTML tag sequence with a vertical-and-horizontal two-step 'scanning', which better fits the inherent 2D structure of image data; (2) it performs substantially better for large tables (long sequence prediction) since it alleviates the error accumulation problem specific to autoregressive models. Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks.
IJCAI Conference 2023 Conference Paper
Mammogram images are important for breast cancer screening, and are typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information for clinical decisions. However, previous methods mostly learn features from the two views independently, which violates clinical knowledge and ignores the importance of dual-view correlation in feature learning. In this paper, we propose a dual-view correlation hybrid attention network (DCHA-Net) for robust holistic mammogram classification. Specifically, DCHA-Net is carefully designed to extract and reinvent deep feature maps for the two views, and meanwhile to maximize the underlying correlations between them. A hybrid attention module, consisting of local relation and non-local attention blocks, is proposed to alleviate the spatial misalignment of the paired views in the correlation maximization. A dual-view correlation loss is introduced to maximize the feature similarity between corresponding strip-like regions with equal distance to the chest wall, motivated by the fact that their features represent the same breast tissues, and thus should be highly correlated with each other. Experimental results on two public datasets, i.e., INbreast and CBIS-DDSM, demonstrate that DCHA-Net can well preserve and maximize feature correlations across views, and thus outperforms previous state-of-the-art methods for classifying a whole mammogram as malignant or not.
AAAI Conference 2023 Conference Paper
Generative Adversarial Networks (GANs) have demonstrated their powerful capability of synthesizing high-resolution images, and great efforts have been made to interpret the semantics in the latent spaces of GANs. However, existing works still have the following limitations: (1) the majority of works rely on either pretrained attribute predictors or large-scale labeled datasets, which are difficult to collect in most cases, and (2) some other methods are only suitable for restricted cases, such as focusing on interpretation of human facial images using prior facial semantics. In this paper, we propose a GAN-based method called FEditNet, aiming to discover latent semantics using very few labeled data without any pretrained predictors or prior knowledge. Specifically, we reuse the knowledge from the pretrained GANs, and by doing so, avoid overfitting during the few-shot training of FEditNet. Moreover, our layer-wise objectives which take content consistency into account also ensure the disentanglement between attributes. Qualitative and quantitative results demonstrate that our method outperforms the state-of-the-art methods on various datasets. The code is available at https://github.com/THU-LYJ-Lab/FEditNet.
AAAI Conference 2023 Conference Paper
One-shot segmentation of brain tissues is typically a dual-model iterative learning: a registration model (reg-model) warps a carefully-labeled atlas onto unlabeled images to initialize their pseudo masks for training a segmentation model (seg-model); the seg-model revises the pseudo masks to enhance the reg-model for a better warping in the next iteration. However, there is a key weakness in such dual-model iteration that the spatial misalignment inevitably caused by the reg-model could misguide the seg-model, which makes it converge on an inferior segmentation performance eventually. In this paper, we propose a novel image-aligned style transformation to reinforce the dual-model iterative learning for robust one-shot segmentation of brain tissues. Specifically, we first utilize the reg-model to warp the atlas onto an unlabeled image, and then employ the Fourier-based amplitude exchange with perturbation to transplant the style of the unlabeled image into the aligned atlas. This allows the subsequent seg-model to learn on the aligned and style-transferred copies of the atlas instead of unlabeled images, which naturally guarantees the correct spatial correspondence of an image-mask training pair, without sacrificing the diversity of intensity patterns carried by the unlabeled images. Furthermore, we introduce a feature-aware content consistency in addition to the image-level similarity to constrain the reg-model for a promising initialization, which avoids the collapse of image-aligned style transformation in the first iteration. Experimental results on two public datasets demonstrate 1) a competitive segmentation performance of our method compared to the fully-supervised method, and 2) a superior performance over other state-of-the-art methods, with an increase in average Dice of up to 4.67%. The source code is available at: https://github.com/JinxLv/One-shot-segmentation-via-IST.
NeurIPS Conference 2022 Conference Paper
We consider a scenario where multiple agents are learning a common decision vector from data which can be influenced by the agents' decisions. This leads to the problem of multi-agent performative prediction (Multi-PfD). In this paper, we formulate Multi-PfD as a decentralized optimization problem that minimizes a sum of loss functions, where each loss function is based on a distribution influenced by the local decision vector. We first prove the necessary and sufficient condition for the Multi-PfD problem to admit a unique multi-agent performative stable (Multi-PS) solution. We show that enforcing consensus leads to a laxer condition for the existence of a Multi-PS solution with respect to the distributions' sensitivities, compared to the single-agent case. Then, we study a decentralized extension to the greedy deployment scheme [Mendler-Dünner et al., 2020], called the DSGD-GD scheme. We show that DSGD-GD converges to the Multi-PS solution and analyze its non-asymptotic convergence rate. Numerical results validate our analysis.
JBHI Journal 2022 Journal Article
Paroxysmal atrial fibrillation (AF) is generally diagnosed by long-term dynamic electrocardiogram (ECG) monitoring. Identifying AF episodes from long-term ECG data can place a heavy burden on clinicians. Many machine-learning-based automatic AF detection methods have been proposed to solve this issue. However, these methods require numerous annotated data to train the model, and the annotation of AF in long-term ECG is extremely time-consuming. Reducing the demand for labeled data can effectively improve the clinical practicability of automatic AF detection methods. In this study, we developed a novel semi-supervised learning method that generated modified low-entropy labels of unlabeled samples for training a deep learning model to automatically detect paroxysmal AF in 24 h Holter monitoring data. Our method employed a 1D CNN-LSTM neural network with RR intervals as input and used few labeled training data with numerous unlabeled data for training the neural network. This method was evaluated using a 24 h Holter monitoring dataset collected from 1000 paroxysmal AF patients. Using labeled samples from only 10 patients for model training, our method achieved a sensitivity of 97.8%, specificity of 97.9%, and accuracy of 97.9% in five-fold cross-validation. Compared to the supervised learning method with complete labeled samples, the detection accuracy of our method was only 0.5% lower, while the workload of data annotation was significantly reduced by more than 98%. In general, this is the first study to apply semi-supervised learning techniques for automatic AF detection using ECG. Our method can effectively reduce the demand for AF data annotations and can improve the clinical practicability of automatic AF detection.
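The core semi-supervised idea above, training on confident (low-entropy) pseudo-labels for the unlabeled pool, can be sketched with a nearest-centroid classifier standing in for the CNN-LSTM. Everything here is an illustrative assumption: the synthetic RR-variability feature, the confidence threshold, and the temperature are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def make(n, af):
    # toy stand-in for RR-interval features: AF rhythms are more irregular,
    # so use the standard deviation of 30 simulated intervals as the feature
    std = 0.25 if af else 0.05
    return rng.normal(0.0, std, size=(n, 30)).std(axis=1, keepdims=True)

X_lab = np.vstack([make(10, False), make(10, True)])    # tiny labeled set
y_lab = np.array([0] * 10 + [1] * 10)
X_unl = np.vstack([make(200, False), make(200, True)])  # large unlabeled pool

def centroids(X, y):
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_proba(X, C):
    d = np.linalg.norm(X[:, None, :] - C[None], axis=2)
    w = np.exp(-10.0 * d)                  # sharpened scores -> lower entropy
    return w / w.sum(axis=1, keepdims=True)

C = centroids(X_lab, y_lab)
for _ in range(5):
    p = predict_proba(X_unl, C)
    keep = p.max(axis=1) > 0.6             # retain only low-entropy pseudo-labels
    X_aug = np.vstack([X_lab, X_unl[keep]])
    y_aug = np.concatenate([y_lab, p[keep].argmax(axis=1)])
    C = centroids(X_aug, y_aug)            # retrain on labeled + pseudo-labeled data

y_true = np.array([0] * 200 + [1] * 200)
acc = (predict_proba(X_unl, C).argmax(axis=1) == y_true).mean()
print(round(acc, 3))
```

The pattern mirrors the paper's setup at a high level: a handful of labeled patients seeds the model, and confident predictions on unlabeled recordings are recycled as training targets, iterating until the decision boundary stabilizes.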
NeurIPS Conference 2021 Conference Paper
Generative Adversarial Networks (GANs) have made a dramatic leap in high-fidelity image synthesis and stylized face generation. Recently, a layer-swapping mechanism has been developed to improve the stylization performance. However, this method is incapable of fitting arbitrary styles in a single model and requires hundreds of style-consistent training images for each style. To address the above issues, we propose BlendGAN for arbitrary stylized face generation by leveraging a flexible blending strategy and a generic artistic dataset. Specifically, we first train a self-supervised style encoder on the generic artistic dataset to extract the representations of arbitrary styles. In addition, a weighted blending module (WBM) is proposed to blend face and style representations implicitly and control the arbitrary stylization effect. By doing so, BlendGAN can gracefully fit arbitrary styles in a unified model while avoiding case-by-case preparation of style-consistent training images. To this end, we also present a novel large-scale artistic face dataset AAHQ. Extensive experiments demonstrate that BlendGAN outperforms state-of-the-art methods in terms of visual quality and style diversity for both latent-guided and reference-guided stylized face synthesis.
AAAI Conference 2021 Conference Paper
Online insurance is a new type of e-commerce with exponential growth. An effective recommendation model that maximizes the total revenue of insurance products listed in multiple customized sales scenarios is crucial for the success of online insurance business. Prior recommendation models are ineffective because they fail to characterize the complex relatedness of insurance products in multiple sales scenarios and maximize the overall conversion rate rather than the total revenue. Even worse, it is impractical to collect training data online for total revenue maximization due to the business logic of online insurance. We propose RevMan, a Revenue-aware Multi-task Network for online insurance recommendation. RevMan adopts an adaptive attention mechanism to allow effective feature sharing among complex insurance products and sales scenarios. It also designs an efficient offline learning mechanism to learn the rank that maximizes the expected total revenue, by reusing training data and model for conversion rate maximization. Extensive offline and online evaluations show that RevMan outperforms the state-of-the-art recommendation systems for e-commerce.
IJCAI Conference 2021 Conference Paper
For many Internet companies, improving user retention rate has been an important focus. To achieve this goal, we need to recommend proper services in order to meet the demands of users. Unlike conventional click-through rate (CTR) estimation, the data collected when modeling retention contain considerable noise, caused by two major issues: 1) implicit impression-revisit effect: users could revisit the APP even if they do not explicitly interact with the recommender system; 2) selection bias: the recommender system suffers from selection bias caused by users' self-selection. To address the above challenges, we propose a novel method named UR-IPW (User Retention Modeling with Inverse Propensity Weighting), which 1) makes full use of both explicit and implicit interactions in the observed data, and 2) models revisit rate estimation from a causal perspective, accounting for the selection bias problem. The experiments on both offline and online environments from different scenarios demonstrate the superiority of UR-IPW over previous methods. To the best of our knowledge, this is the first work to model user retention by estimating the revisit rate from a causal perspective.
YNIMG Journal 2019 Journal Article
AAMAS Conference 2019 Conference Paper
Asymmetric Distributed Constraint Optimization Problems (ADCOPs) have emerged as an important formalism in the multi-agent community due to their ability to capture personal preferences. However, the existing search-based complete algorithms for ADCOPs can only use local knowledge to compute lower bounds, which leads to inefficient pruning and prohibits them from solving large-scale problems. On the other hand, inference-based complete algorithms (e.g., DPOP) for Distributed Constraint Optimization Problems (DCOPs) require only a linear number of messages, but they cannot be directly applied to ADCOPs due to a privacy concern. Therefore, in this paper, we consider the possibility of combining inference and search to effectively solve ADCOPs at an acceptable loss of privacy. Specifically, we propose a hybrid complete algorithm called PT-ISABB, which uses a tailored inference algorithm to provide tight lower bounds and a tree-based complete search algorithm to exhaust the search space. We prove the correctness of our algorithm, and the experimental results demonstrate its superiority over other state-of-the-art complete algorithms.
YNIMG Journal 2019 Journal Article
AAAI Conference 2019 Short Paper
Lacking a sequence-preserving mechanism, existing heterogeneous information network (HIN) embedding methods discard the essential type sequence information during embedding. We propose a Type Sequence Preserving HIN Embedding model (SeqHINE) which extends HIN embedding to the sequence level. SeqHINE incorporates the type sequence information via a type-aware GRU and preserves representative sequence information with a decay function. Extensive experiments show that SeqHINE can outperform the state-of-the-art even with 50% less labeled data.
AAAI Conference 2018 Conference Paper
Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other high-order proximities. However, beyond objectives that capture network structural properties, most of them lack additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate their contribution to learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks. The source code will be available online.
ICRA Conference 2017 Conference Paper
In support of Cloud Robotics, Robotics and Automation as a Service (RAaaS) frameworks have the potential to reduce the complexity of software development, simplify software installation and maintenance, and facilitate data sharing for machine learning. In this proof-of-concept paper, we describe Berkeley Robotics and Automation as a Service (Brass), a RAaaS prototype that allows robots to access a remote server that hosts a robust grasp-planning system (Dex-Net 1. 0) that maintains data on hundreds of candidate grasps on thousands of 3D object meshes and uses perturbation sampling to estimate and update a stochastic robustness metric for each grasp. Results suggest that such a system can increase grasp reliability over naive locally-computed grasping strategies with network latencies of 30 and 200 msec for servers 500 and 6000 miles away, respectively. We also study how the system can use execution reports from robots in the field to update grasp recommendations over time.
AAAI Conference 2017 Conference Paper
In this work, we study the guaranteed delivery model which is widely used in online advertising. In the guaranteed delivery scenario, ad exposures (which are also called impressions in some works) to users are guaranteed by contracts signed in advance between advertisers and publishers. A crucial problem for the advertising platform is how to fully utilize the valuable user traffic to generate as much revenue as possible. Different from previous works which usually minimize the penalty of unsatisfied contracts and some other cost (e.g., representativeness), we propose the novel consumption minimization model, in which the primary objective is to minimize the user traffic consumed to satisfy all contracts. Under this model, we develop a near-optimal method to deliver ads to users. The main advantage of our method lies in that it consumes nearly the least possible user traffic to satisfy all contracts, so that more contracts can be accepted to produce more revenue. It also enables the publishers to estimate how much user traffic is redundant or short so that they can sell or buy this part of traffic in bulk in the exchange market. Furthermore, it is robust with regard to prior knowledge of the user type distribution. Finally, the simulation shows that our method outperforms the traditional state-of-the-art methods.
IJCAI Conference 2017 Conference Paper
Stochastic block models (SBMs) provide a statistical way of modeling network data, especially for representing clusters or community structures. However, most block models do not consider complex characteristics of networks such as the scale-free feature, making them incapable of handling the degree variation of vertices, which is ubiquitous in real networks. To address this issue, we introduce degree decay variables into the SBM, termed the power-law degree SBM (PLD-SBM), to model the varying probability of connections between node pairs. The scale-free feature is approximated by a power-law degree characteristic. Such a property allows PLD-SBM to correct the distortion of the degree distribution in the SBM, and thus improves the performance of cluster prediction. Experiments on both simulated networks and two real-world networks, including the Adolescent Health Data and the political blogs network, demonstrate the validity of the motivation of PLD-SBM and its practical superiority.
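The key modeling move, coupling block structure with per-node degree heterogeneity, can be illustrated by sampling from a generic degree-corrected block model. This is a sketch of the general idea, not PLD-SBM's exact formulation: the Pareto-style propensities, block matrix, and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 2
z = rng.integers(0, K, size=n)               # community assignments
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])                   # within/between-block connectivity
# heavy-tailed (Pareto-like) degree propensities, normalized into (0, 1]
theta = (1.0 - rng.random(n)) ** (-1.0 / 2.5)
theta /= theta.max()

# edge probability couples block structure with per-node degree propensities
P = B[z][:, z] * np.outer(theta, theta)
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T                                  # undirected, no self-loops

deg = A.sum(axis=1)
print(deg.min(), int(deg.mean()), deg.max())
```

Unlike a vanilla SBM, where all nodes in a block share one expected degree, the `theta` factors let nodes in the same community have wildly different degrees, which is the distortion PLD-SBM is designed to correct.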
NeurIPS Conference 2017 Conference Paper
Influence maximization is the problem of selecting $k$ nodes in a social network to maximize their influence spread. The problem has been extensively studied but most works focus on the submodular influence diffusion models. In this paper, motivated by empirical evidences, we explore influence maximization in the non-submodular regime. In particular, we study the general threshold model in which a fraction of nodes have non-submodular threshold functions, but their threshold functions are closely upper- and lower-bounded by some submodular functions (we call them $\varepsilon$-almost submodular). We first show a strong hardness result: there is no $1/n^{\gamma/c}$ approximation for influence maximization (unless P = NP) for all networks with up to $n^{\gamma}$ $\varepsilon$-almost submodular nodes, where $\gamma$ is in (0, 1) and $c$ is a parameter depending on $\varepsilon$. This indicates that influence maximization is still hard to approximate even though threshold functions are close to submodular. We then provide $(1-\varepsilon)^{\ell}(1-1/e)$ approximation algorithms when the number of $\varepsilon$-almost submodular nodes is $\ell$. Finally, we conduct experiments on a number of real-world datasets, and the results demonstrate that our approximation algorithms outperform other baseline algorithms.
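For context, the standard Monte-Carlo greedy baseline for submodular influence maximization, from which the paper's $(1-\varepsilon)^{\ell}(1-1/e)$ guarantee degrades gracefully, can be sketched as follows. The toy graph, propagation probability, and rollout count are illustrative assumptions; this is the classic greedy routine, not the paper's algorithm for $\varepsilon$-almost submodular nodes.

```python
import random

rng = random.Random(0)

def simulate_ic(graph, seeds, p=0.1):
    # one rollout of the independent cascade model; returns the spread size
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def greedy_im(graph, k, mc=200):
    # greedy seed selection: repeatedly add the node with the largest
    # Monte-Carlo-estimated marginal spread
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    seeds = []
    for _ in range(k):
        best = max((u for u in nodes if u not in seeds),
                   key=lambda u: sum(simulate_ic(graph, seeds + [u])
                                     for _ in range(mc)))
        seeds.append(best)
    return seeds

graph = {0: [1, 2, 3], 1: [4], 2: [4, 5], 5: [6], 6: [0]}
seeds = greedy_im(graph, 2)
print(seeds)
```

When all threshold functions are submodular this greedy routine enjoys the familiar $(1-1/e)$ guarantee; the paper's contribution is quantifying how far such guarantees can be preserved when a fraction of nodes are only $\varepsilon$-close to submodular.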
TIST Journal 2016 Journal Article
Heterogeneous face recognition, also known as cross-modality face recognition or intermodality face recognition, refers to matching two face images from alternative image modalities. Since face images from different image modalities of the same person are associated with the same face object, there should be mutual components that reflect those intrinsic face characteristics that are invariant to the image modalities. Motivated by this rationality, we propose a novel approach called Mutual Component Analysis (MCA) to infer the mutual components for robust heterogeneous face recognition. In the MCA approach, a generative model is first proposed to model the process of generating face images in different modalities, and then an Expectation Maximization (EM) algorithm is designed to iteratively learn the model parameters. The learned generative model is able to infer the mutual components (which we call the hidden factor, where hidden means the factor is unreachable and invisible, and can only be inferred from observations) that are associated with the person’s identity, thus enabling fast and effective matching for cross-modality face recognition. To enhance recognition performance, we propose an MCA-based multiclassifier framework using multiple local features. Experimental results show that our new approach significantly outperforms the state-of-the-art results on two typical application scenarios: sketch-to-photo and infrared-to-visible face recognition.
AAAI Conference 2016 Conference Paper
Noisy and incomplete data restoration is a critical preprocessing step in developing effective learning algorithms, which aims to reduce the effect of noise and missing values in data. By utilizing attribute correlations and/or instance similarities, various techniques have been developed for data denoising and imputation tasks. However, existing data restoration methods are either specifically designed for a particular task, or incapable of dealing with mixed-attribute data. In this paper, we develop a new probabilistic model to provide a general and principled method for restoring mixed-attribute data. The main contributions of this study are twofold: a) a unified generative model, utilizing a generic random mixed field (RMF) prior, is designed to exploit mixed-attribute correlations; and b) a structured mean-field variational approach is proposed to solve the challenging inference problem of simultaneous denoising and imputation. We evaluate our method with classification experiments on both synthetic data and real benchmark datasets. Experiments demonstrate that our approach can effectively improve the classification accuracy of noisy and incomplete data, compared with other data restoration methods.