EAAI Journal 2026 Journal Article
A self-explanatory deep learning-based soft sensor induced by a physical diffusion process and its application in an industrial process
- Xiao Wang
- Han Liu
- Xiaomei Qi
- Yong Zhang
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Session-based recommendation (SBR) aims to predict anonymous users' next interaction based on their interaction sessions. In practical recommendation scenarios, low-exposure items constitute the majority of interactions, creating a long-tail distribution that severely compromises recommendation diversity. Existing approaches attempt to address this issue by promoting tail items but incur accuracy degradation, exhibiting a "see-saw" effect between long-tail and accuracy performance. We attribute this conflict to session-irrelevant noise within the tail item set, which existing long-tail approaches fail to identify and constrain effectively. To resolve this fundamental conflict, we propose HID (Hybrid Intent-based Dual Constraint Framework), a plug-and-play framework that transforms the conventional "see-saw" into a "win-win" relationship by introducing hybrid intent-based dual constraints. Two key innovations are incorporated in this framework: (i) Hybrid Intent Learning, where we reformulate the intent extraction strategies by employing attribute-aware spectral clustering to reconstruct the item-to-intent mapping. Furthermore, discrimination of session-irrelevant noise is achieved through the assignment of both target and noise intents to each session. (ii) Intent Constraint Loss, where we propose two novel constraint paradigms regarding diversity and accuracy to regulate the representation learning process, and unify the two optimization objectives into a single loss. Extensive experiments across multiple SBR models and datasets demonstrate that HID enhances both long-tail performance and recommendation accuracy, establishing new state-of-the-art performance in long-tail recommender systems.
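The target/noise intent assignment described above could look like the following minimal sketch. Mean pooling of item embeddings, cosine similarity, and all names here are illustrative assumptions, not the paper's exact formulation; the intent centroids would come from the attribute-aware spectral clustering step.

```python
import numpy as np

def assign_intents(session_items, intent_centroids):
    """Toy sketch: pick a target intent (centroid most similar to the
    pooled session embedding) and a noise intent (least similar one).
    session_items: (n_items, d) item embeddings of one session.
    intent_centroids: (k, d) intent prototypes from clustering."""
    session_vec = session_items.mean(axis=0)  # mean-pool the session
    sims = intent_centroids @ session_vec
    sims /= (np.linalg.norm(intent_centroids, axis=1)
             * np.linalg.norm(session_vec) + 1e-8)  # cosine similarity
    return int(np.argmax(sims)), int(np.argmin(sims))
```

In the full framework, the two returned intents would then feed the dual constraint losses, pulling the session toward its target intent and away from the noise intent.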
AAAI Conference 2026 Conference Paper
While deep generative models have significantly advanced representation learning, they may inherit or amplify biases and fairness issues by encoding sensitive attributes alongside predictive features. Enforcing strict independence in disentanglement is often unrealistic when target and sensitive factors are naturally correlated. To address this challenge, we propose CAD-VAE (Correlation-Aware Disentangled VAE), which introduces a correlated latent code to capture the information shared between the target and sensitive attributes. Given this correlated latent, our method effectively separates overlapping factors without extra domain knowledge by directly minimizing the conditional mutual information between target and sensitive codes. A relevance-driven optimization strategy refines the correlated code by efficiently capturing essential correlated features and eliminating redundancy. Extensive experiments on benchmark datasets demonstrate that CAD-VAE produces fairer representations, realistic counterfactuals, and improved fairness-aware image editing.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Time-Series (TS) exhibits pronounced non-stationarity. Consequently, most forecasting methods display compromised robustness to concept drift, despite the prevalent application of instance normalization. We tackle this challenge by first analysing concept drift through a bias-variance lens and proving that a weighted ensemble reduces variance without increasing bias. These insights motivate DeepBooTS, a novel end-to-end dual-stream residual-decreasing boosting method that progressively reconstructs the intrinsic signal. In our design, each block of a deep model becomes an ensemble of learners with an auxiliary output branch forming a highway to the final prediction. The block-wise outputs correct the residuals of previous blocks, leading to a learning-driven decomposition of both inputs and targets. This method enhances versatility and interpretability while substantially improving robustness to concept drift. Extensive experiments, including those on large-scale datasets, show that the proposed method outperforms existing methods by a large margin, yielding an average performance improvement of 15.8% across various datasets and establishing a new benchmark for TS forecasting.
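The dual-stream residual-decreasing idea can be sketched as follows: each block reads the current input residual, emits a partial forecast on a highway to the final prediction, and removes its backcast from the residual. Linear blocks and the two-matrix parameterization are illustrative assumptions standing in for the paper's deep blocks.

```python
import numpy as np

def boosted_forecast(x, blocks):
    """Residual-decreasing boosting sketch.
    x: (d,) input window.
    blocks: list of (W_back, W_fore) pairs; W_back is (d, d),
    W_fore is (d, h) mapping the residual to a partial forecast."""
    residual, prediction = x, 0.0
    for W_back, W_fore in blocks:
        prediction = prediction + residual @ W_fore  # highway branch
        residual = residual - residual @ W_back      # shrink the residual
    return prediction
```

With an identity backcast, the first block absorbs the whole input, so later blocks see a zero residual and contribute nothing — the "residual-decreasing" property in miniature.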
JBHI Journal 2026 Journal Article
Noninvasive continuous blood pressure (BP) monitoring has become a critical requirement for effective health management in the general population. To address the challenge of accurate few-shot personalized BP estimation, a photoplethysmography (PPG)-based framework built on the unified multi-task time series model with a Transformer backbone is proposed. The framework comprises population-level pretraining and personalized fine-tuning with a pulse pressure segmented penalty (PPSP) loss. The PPSP couples systolic BP (SBP) and diastolic BP (DBP) outputs by penalizing pulse pressure values outside clinically accepted ranges, which enforces physiological consistency. In addition, a sampling-rate-robust low-rank adaptation (SRR-LoRA) is introduced to improve estimation accuracy when low-frequency PPG signals are employed. After rate alignment, SRR-LoRA prioritizes measurements over interpolated points, suppresses interpolation noise, and preserves cross-device generalization. Model performance was evaluated on the UCI cuffless BP estimation dataset, the University of Queensland vital signs dataset, and the CAS-BP dataset. 113,812 samples from 2,405 subjects were used for pretraining, and data from 316 subjects (each with 50 samples) were included for few-shot fine-tuning. The proposed method achieved mean absolute errors of 1.52/1.07 mmHg for SBP/DBP. These results fulfill the Association for the Advancement of Medical Instrumentation BP standard and correspond to Grade A performance according to the British Hypertension Society standard and IEEE 1708 standard, which demonstrates the framework's potential for practical personalized wearable BP monitoring.
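A pulse-pressure penalty of the kind described above could be sketched as a hinge that is zero inside an accepted band and grows linearly outside it. The band edges (20–80 mmHg) and the hinge shape are illustrative assumptions, not the paper's exact PPSP loss.

```python
def ppsp_penalty(sbp, dbp, lo=20.0, hi=80.0):
    """Toy pulse-pressure segmented penalty: pp = SBP - DBP is free
    inside the assumed clinical band [lo, hi] mmHg and penalized
    linearly outside it, coupling the two BP outputs."""
    pp = sbp - dbp
    return max(lo - pp, 0.0) + max(pp - hi, 0.0)
```

In training, a term like this would be added to the regression loss so that physiologically implausible SBP/DBP pairs are discouraged even when each value alone looks reasonable.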
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Conditional molecular generation, aiming to generate 2D and 3D molecules that satisfy given properties, has achieved remarkable progress, thanks to the advances in deep generative models such as graph diffusion. However, existing methods generally assume that the given conditions for training and testing are consistent, failing to handle the realistic challenge when there exist distribution shifts between training and testing conditions. Invariant learning is a mainstream paradigm for addressing distribution shifts, but fusing invariant learning principles with conditional molecular generation faces three core challenges: (1) existing invariant learning methods focus on discriminative tasks and cannot be directly adapted to molecule generative tasks; (2) how to distinguish between the invariant subgraph and variant subgraph of a molecular graph, which is treated as an integrated input; (3) how to fuse invariant subgraphs, variant subgraphs, and property conditions for effective generation. To tackle these challenges, we propose Invariant Conditional MOLecular generation (IC-MOL), a framework that combines invariant learning with graph diffusion to improve the generalization ability of conditional molecular generation under distribution shifts. Specifically, we first disentangle molecular graphs into invariant and variant subgraphs while maintaining SE(3) equivariance, an important inductive bias for molecular generation. On this basis, we further design a two-phase graph diffusion generation model. In the first phase, we generate an invariant molecular subgraph consistent with the target property. In the second phase, we propose a cross-attention mechanism to fuse variant subgraph representations and property conditions to guide the generation of complete molecules while maintaining property alignment. Extensive experiments on the benchmark dataset show that IC-MOL consistently outperforms state-of-the-art baselines across six property conditions under distribution shifts.
JBHI Journal 2026 Journal Article
The subcellular localization of messenger RNA (mRNA) is essential for the regulation of gene expression and plays a pivotal role in targeted drug development. Although several computational models have been developed to predict mRNA localization, these approaches still face challenges in sequence representation and exhibit limited performance in handling multi-localization tasks. In this paper, we propose mRSubLoc, a novel multi-label deep learning framework for predicting mRNA subcellular localization. The model integrates the RNA large language model RNAErnie with one-hot encoding and Word2Vec embeddings to construct a comprehensive representation of mRNA sequences. A text convolutional neural network (TextCNN) is employed to capture local feature patterns, while a bidirectional long short-term memory network (BiLSTM) is used to capture long-range dependencies. These features are fused using a multi-head self-attention mechanism to effectively capture localization-specific characteristics. Finally, a multi-layer perceptron (MLP) explores complex dependencies among multiple localization sites, facilitating accurate mRNA subcellular localization prediction. Experimental results on a testing set demonstrate that mRSubLoc significantly outperforms state-of-the-art methods across multiple metrics, including Aiming (0.7858), Coverage (0.6212), Accuracy (0.6161), Absolute-True (0.3070), and Absolute-False (0.1319). This study proposes a novel approach for predicting mRNA subcellular localization and provides new perspectives for advancing disease diagnosis and drug discovery in biomedical research.
AAAI Conference 2026 Conference Paper
Video object detection is a fundamental yet challenging task in computer vision. Recently, DETR-based methods have gained prominence in this domain owing to their powerful global modeling capabilities. However, these methods are still confronted with two key limitations: frame-agnostic initialization of object queries and scale-agnostic attention mechanisms, which hinder their capability to capture the appearance variations of dynamic objects and model the temporal consistency across frames. To alleviate these limitations, we propose a multiscale-aware transformer diffusion network (MSTDiff), a novel framework designed for the video object detection task, including two technical improvements over existing methods. First, we design a diffusion-driven adaptive query module, which models the object query distribution through a diffusion process conditioned on input frames, enabling an adaptive and content-aware initialization of object queries. Second, we develop a multiscale-aware transformer encoder module, which combines multi-head convolutional units with attention mechanisms to enhance multi-scale feature representations while preserving global dependence modeling. We conduct extensive experiments on the public ImageNet VID dataset, and the results demonstrate that our MSTDiff achieves 87.7% mAP with ResNet-101, outperforming most previous state-of-the-art video object detection methods.
TMLR Journal 2026 Journal Article
In computer vision, Visual Prompting (VP) and Visual Prompt Tuning (VPT) have recently emerged as lightweight and effective alternatives to full fine-tuning for adapting large-scale vision models within the "pretrain-then-finetune" paradigm. However, despite rapid progress, their conceptual boundaries remain blurred, as VP and VPT are frequently used interchangeably in current research, reflecting a lack of systematic distinction between these techniques and their respective applications. In this survey, we revisit the designs of VP and VPT from first principles, and conceptualize them within a unified framework termed Prompt-based Adaptation (PA). Within this framework, we distinguish methods based on their injection granularity: VP operates at the pixel level, while VPT injects prompts at the token level. We further categorize these methods by their generation mechanism into fixed, learnable, and generated prompts. Beyond the core methodologies, we examine PA's integrations across diverse domains, including medical imaging, 3D point clouds, and vision-language tasks, as well as its role in test-time adaptation and trustworthy AI. We also summarize current benchmarks and identify key challenges and future directions. To the best of our knowledge, this is the first comprehensive survey dedicated to PA's methodologies and applications in light of their distinct characteristics. Our survey aims to provide a clear roadmap for researchers and practitioners in all areas to understand and explore the evolving landscape of PA-related research.
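The injection-granularity distinction drawn above can be made concrete in a few lines. This is a schematic sketch, not any specific library's API: VP adds a learnable perturbation in pixel space, while VPT prepends learnable tokens to the patch-token sequence entering a transformer.

```python
import numpy as np

def apply_vp(image, pixel_prompt):
    """Pixel-level Visual Prompting: the learnable prompt lives in
    input space, so the prompted image keeps the original shape."""
    return image + pixel_prompt

def apply_vpt(patch_tokens, prompt_tokens):
    """Token-level Visual Prompt Tuning: learnable prompt tokens are
    prepended to the patch-token sequence, lengthening it."""
    return np.concatenate([prompt_tokens, patch_tokens], axis=0)
```

The contrast is visible in the shapes: VP preserves the input shape, whereas VPT changes the sequence length seen by the frozen backbone.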
JBHI Journal 2026 Journal Article
Inspired by the tremendous success of Large Language Models (LLMs), existing radiology report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image and then feed them into the LLM for text generation. How to extract more effective information for the LLM to improve the final results remains an urgent open problem. Additionally, the use of visual Transformer models also brings high computational complexity. To address these issues, this paper proposes a novel context-guided efficient radiology report generation framework. Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model. More importantly, we perform context retrieval from the training set for samples within each mini-batch during the training phase, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements to invoke the LLM for generating high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU X-Ray, MIMIC-CXR, CheXpert Plus) fully validated the effectiveness of our proposed model. The source code is available at https://github.com/Event-AHU/Medical_Image_Analysis.
AAAI Conference 2026 Conference Paper
Existing cross-modal pedestrian detection (CMPD) employs complementary information from RGB and thermal-infrared (TIR) modalities to detect pedestrians in 24-hour surveillance systems. RGB captures rich pedestrian details under daylight, while TIR excels at night. However, TIR focuses primarily on the person's silhouette, neglecting critical texture details essential for detection. Near-infrared (NIR), in contrast, captures texture under low-light conditions, effectively alleviating the performance issues of RGB and the detail loss of TIR, thereby reducing missed detections. To this end, we construct a new Triplet RGB–NIR–TIR (TRNT) dataset, comprising 8,281 pixel-aligned image triplets, establishing a comprehensive foundation for algorithmic research. However, due to the variable nature of real-world scenarios, imaging devices may not always capture all three modalities simultaneously. This results in input data with unpredictable combinations of modal types, which challenge existing CMPD methods that fail to extract robust pedestrian information under arbitrary input combinations, leading to significant performance degradation. To address these challenges, we propose the Adaptive Uncertainty-aware Network (AUNet) for accurately discriminating modal availability and fully utilizing the available information under uncertain inputs. Specifically, we introduce Unified Modality Validation Refinement (UMVR), which includes an uncertainty-aware router to validate modal availability and a semantic refinement to ensure the reliability of information within the modality. Furthermore, we design a Modality-Aware Interaction (MAI) module to adaptively activate or deactivate its internal interaction mechanisms per UMVR output, enabling effective complementary information fusion from available modalities. AUNet enables accurate modality validation and robust inference without fixed modality pairings, facilitating the effective fusion of RGB, NIR, and TIR information across diverse inputs.
AAAI Conference 2026 Conference Paper
Lens flare is a common nighttime artifact caused by strong light sources scattering within camera lenses, leading to hazy streaks, halos, and glare that degrade visual quality. However, existing methods usually fail to effectively address nonuniform scattered flares, which severely reduces their applicability to complex real-world scenarios with diverse lighting conditions. To address this issue, we propose SLCFormer, a novel spectral-local context transformer framework for effective nighttime lens flare removal. SLCFormer integrates two key modules: the Frequency Fourier and Excitation Module (FFEM), which captures efficient global contextual representations in the frequency domain to model flare characteristics, and the Directionally-Enhanced Spatial Module (DESM) for local structural enhancement and directional features in the spatial domain for precise flare removal. Furthermore, we introduce a ZernikeVAE-based scatter flare generation pipeline to synthesize physically realistic scatter flares with spatially varying PSFs, bridging optical physics and data-driven training. Extensive experiments on the Flare7K++ dataset demonstrate that our method achieves state-of-the-art performance, outperforming existing approaches in both quantitative metrics and perceptual visual quality, and generalizing robustly to real nighttime scenes with complex flare artifacts.
AAAI Conference 2026 Conference Paper
Long Chain-of-Thought (CoT) reasoning enhances large reasoning models' performance but suffers from severe inefficiencies, as models often overthink simple problems or underthink complex ones. Current sequence-level optimizations, like length penalties, are too coarse-grained to distinguish core logic from verbose language, precluding the necessary token-level control for efficient reasoning CoT. To overcome these limitations, we introduce Time-Frequency token Advantage Clipping (TFAC), a novel training framework designed to build efficient large reasoning models via token-level interventions. Specifically, TFAC functions along two dimensions: 1) The Frequency Dimension: It discourages inefficient loops and encourages deeper exploration by dynamically reducing the advantage scores of high-entropy tokens that are repeatedly generated within a single reasoning path. 2) The Time Dimension: It reduces excessive overthinking of the system by establishing a historical baseline for the occurrence count of each critical token in previously successful trajectories, and clipping the advantages of tokens that exceed this baseline during training. Crucially, to preserve the model's exploratory capabilities on novel problems, this suppression mechanism is automatically disabled when no historical record of success is available. Experiments conducted on the Deepseek-Distill-32B and Qwen3-8B models show that TFAC outperforms leading baseline methods, improving performance by 2.3 and 3.1 percentage points, respectively, while simultaneously reducing inference costs by 35% and 28% in scenarios where correct answers are generated. These results validate the significant efficacy of TFAC in training large reasoning models that are both powerful and highly efficient.
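The time-dimension mechanism described above can be sketched as a small bookkeeping routine. The dict-based representation, the fixed scaling factor, and all names are illustrative assumptions; the key behaviors from the abstract are that tokens exceeding their historical baseline get their advantage reduced, while tokens with no success history are left untouched to preserve exploration.

```python
def tfac_clip(advantages, counts, baseline, scale=0.5):
    """Time-dimension advantage clipping sketch.
    advantages: token -> advantage score in the current rollout.
    counts: token -> occurrence count in the current reasoning path.
    baseline: token -> historical count from successful trajectories;
    tokens absent from the baseline are never suppressed."""
    clipped = {}
    for tok, adv in advantages.items():
        base = baseline.get(tok)  # None => no success history yet
        if base is not None and counts.get(tok, 0) > base:
            adv = adv * scale  # suppress over-repeated critical tokens
        clipped[tok] = adv
    return clipped
```

A real implementation would operate on per-token advantage tensors inside the RL objective, but the gating logic is the same.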
AAAI Conference 2026 Conference Paper
We introduce the Probabilistic Coin Change Problem (PCCP), a novel variant of the classical Combination Coin Change Problem (CCCP), motivated by a real-world scientific inverse task. The goal of CCCP is to enumerate all unordered combinations of coin denominations that sum to a given target. In PCCP, each coin type’s value follows a discrete probability distribution, and the aggregate value of a combination of coins is thus stochastic. Given a set of such coin types and noisy observations of total sums, the task is to infer the most likely latent coin combination. To address the combinatorial and probabilistic complexity of PCCP, we propose DeepProReasoner (Deep Combinatorial Probabilistic Reasoning with Embedded Representations), an unsupervised, end-to-end, deep-learning framework that integrates combinatorial reasoning, latent-space modeling, and differentiable probabilistic reasoning. The model is trained using a reconstruction loss between the observed empirical distribution and a decoded probability mass function (PMF), enabling efficient gradient-based search over a continuous relaxation of the combinatorial space. We evaluate DeepProReasoner on two instances of PCCP: (1) a synthetic Candy Mix problem for ablation studies, and (2) a real-world task of molecular formula inference from ultrahigh resolution mass spectrometry (MS) data. Besides the two given instances, PCCP captures a wide range of inverse settings in biology, chemistry, environmental sciences, and medicine, where latent combinatorial structures give rise to noisy aggregate observations through stochastic processes. Our results show that DeepProReasoner achieves high accuracy and robustness, outperforming state-of-the-art methods.
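For a tiny PCCP instance, MAP inference can be done by brute force: enumerate coin-count vectors, compute the exact PMF of each combination's total by convolution, and pick the combination that gives the observed sum the highest probability. The `max_count` bound and noiseless observation are illustrative simplifications; the paper's DeepProReasoner replaces this enumeration with a differentiable, gradient-based search.

```python
from itertools import product
from collections import defaultdict

def pmf_of_sum(counts, value_pmfs):
    """Exact PMF of the total value of a coin multiset, by repeated
    convolution of each coin type's discrete value distribution."""
    total = {0: 1.0}
    for n, pmf in zip(counts, value_pmfs):
        for _ in range(n):
            new = defaultdict(float)
            for s, p in total.items():
                for v, q in pmf.items():
                    new[s + v] += p * q
            total = dict(new)
    return total

def map_combination(observed_sum, value_pmfs, max_count=3):
    """Brute-force MAP inference for a tiny PCCP instance: return the
    coin-count vector maximizing the probability of the observed sum."""
    best, best_p = None, -1.0
    for counts in product(range(max_count + 1), repeat=len(value_pmfs)):
        p = pmf_of_sum(counts, value_pmfs).get(observed_sum, 0.0)
        if p > best_p:
            best, best_p = counts, p
    return best, best_p
```

The exponential growth of this enumeration in the number of coin types is exactly the combinatorial complexity that motivates a learned, continuous relaxation.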
AAAI Conference 2026 Conference Paper
Cardiac magnetic resonance (CMR) imaging is widely used to characterize cardiac morphology and function. To accelerate CMR imaging, various methods have been proposed to recover high-quality spatiotemporal CMR images from highly undersampled k-t space data. However, current CMR reconstruction techniques either fail to achieve satisfactory image quality or are restricted by the scarcity of ground truth data, leading to limited applicability in clinical scenarios. In this work, we propose MoCo-INR, a new unsupervised method that integrates implicit neural representations (INRs) with the conventional motion-compensated (MoCo) framework. Using the explicit motion modeling and the continuous prior of INRs, MoCo-INR can produce accurate cardiac motion decomposition and high-quality CMR reconstruction. Moreover, we present a new INR network architecture tailored to the CMR problem, which can greatly stabilize model optimization. Experiments on retrospective (i.e., simulated) datasets demonstrate the superiority of MoCo-INR over state-of-the-art methods, achieving fast convergence and fine-detailed reconstructions at ultra-high acceleration factors (e.g., 20x in VISTA sampling). In addition, evaluations on prospective (i.e., real-acquired) free-breathing CMR scans highlight its clinical practicality for real-time imaging. Several ablation studies also confirm the effectiveness of critical components of MoCo-INR.
AAAI Conference 2026 Conference Paper
Owing to their promising performance and better balance in terms of privacy protection, event cameras have recently been proposed for person re-identification (ReID), and event camera-based person ReID has attracted significant attention. Currently, mainstream event-based person ReID algorithms primarily focus on fusing visible light and event streams, as well as preserving privacy. Although significant progress has been made, these methods are typically trained and evaluated on small-scale or simulated event camera datasets, making it difficult to assess their real identification performance and generalization ability. To address the issue of data scarcity, this paper introduces a large-scale RGB-event based person ReID dataset, called EvReID. The dataset contains 118,988 image pairs and covers 1200 pedestrian identities, with data collected across multiple seasons, scenes, and lighting conditions. We also evaluate 15 state-of-the-art person ReID algorithms, laying a solid foundation for future research in terms of both data and benchmarking. Based on our newly constructed dataset, this paper further proposes a pedestrian attribute-guided contrastive learning framework to enhance feature learning for person re-identification, termed TriPro-ReID. This framework not only effectively explores the visual features from both RGB frames and event streams, but also fully utilizes pedestrian attributes as mid-level semantic features. Extensive experiments on the EvReID and MARS datasets fully validated the effectiveness of our proposed RGB-Event person ReID framework.
EAAI Journal 2025 Journal Article
IROS Conference 2025 Conference Paper
The social robot’s open API allows users to customize open-domain interactions. However, it remains inaccessible to those without programming experience. We introduce AutoMisty, the first LLM-powered multi-agent framework that converts natural-language commands into executable Misty robot code by decomposing high-level instructions, generating sub-task code, and integrating everything into a deployable program. Each agent employs a two-layer optimization mechanism: first, a self-reflective loop that instantly validates and automatically executes the generated code, regenerating whenever errors emerge; second, human review for refinement and final approval, ensuring alignment with user preferences and preventing error propagation. To evaluate AutoMisty’s effectiveness, we designed a benchmark task set spanning four levels of complexity and conducted experiments in a real Misty robot environment. Extensive evaluations demonstrate that AutoMisty not only consistently generates high-quality code but also enables precise code control, significantly outperforming direct reasoning with ChatGPT-4o and ChatGPT-o1. All code, optimized APIs, and experimental videos will be publicly released through the webpage: AutoMisty.
ICML Conference 2025 Conference Paper
This paper provides a comprehensive analysis of variational inference in latent variable models for survival analysis, emphasizing the distinctive challenges associated with applying variational methods to survival data. We identify a critical weakness in the existing methodology, demonstrating how a poorly designed variational distribution may hinder the objective of survival analysis tasks—modeling time-to-event distributions. We prove that the optimal variational distribution, which perfectly bounds the log-likelihood, may depend on the censoring mechanism. To address this issue, we propose censor-dependent variational inference (CDVI), tailored for latent variable models in survival analysis. More practically, we introduce CD-CVAE, a V-structure Variational Autoencoder (VAE) designed for the scalable implementation of CDVI. Further discussion extends some existing theories and training techniques to survival analysis. Extensive experiments validate our analysis and demonstrate significant improvements in the estimation of individual survival distributions.
TMLR Journal 2025 Journal Article
Several variants of Variational Autoencoders have been developed to address inherent limitations. Specifically, $\sigma$-VAE utilizes a scaled identity matrix $\sigma^2 I$ in the decoder variance, while $\beta$-VAE introduces a hyperparameter $\beta$ to reweight the negative ELBO loss. However, a unified theoretical and practical understanding of model optimality remains unclear. For example, existing learning theories on the global optimality of VAE provide limited insight into their empirical success. Previous work showed the mathematical equivalence between the variance scalar $\sigma^2$ and the hyperparameter $\beta$ in shaping the loss landscape. While $\beta$-annealing is widely used, how to implement $\sigma$-annealing is still unclear. This paper presents a comprehensive analysis of $\sigma$-CVAE, highlighting its enhanced expressiveness in parameterizing conditional densities while addressing the associated estimation challenges arising from suboptimal variational inference. In particular, we propose Calibrated Robust $\sigma$-CVAE, a doubly robust algorithm that facilitates accurate estimation of $\sigma$ while effectively preventing the posterior collapse of $\phi$. Our approach, leveraging functional neural decomposition and KL annealing techniques, provides a unified framework to understand both $\sigma$-VAE and $\beta$-VAE regarding parameter optimality and training dynamics. Experimental results on synthetic and real-world datasets demonstrate the superior performance of our method across various conditional density estimation tasks, highlighting its significance for accurate and reliable probabilistic modeling.
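The equivalence between the decoder variance scalar and the KL weight mentioned above can be checked symbolically in a few lines. This is a per-dimension sketch under a Gaussian decoder with scaled-identity variance (constants independent of sigma dropped); the identification beta = 2*sigma^2, up to the log-variance term, is the known correspondence the paragraph alludes to.

```python
import math

def neg_elbo_sigma(recon_sq_err, kl, sigma):
    """Per-dimension Gaussian negative ELBO with decoder variance
    sigma^2: squared error scaled by 1/(2 sigma^2), plus the
    log-variance normalizer, plus the KL term."""
    return recon_sq_err / (2.0 * sigma ** 2) + math.log(sigma) + kl

def neg_elbo_beta(recon_sq_err, kl, beta):
    """beta-VAE style loss: reconstruction plus beta-weighted KL."""
    return recon_sq_err + beta * kl
```

Multiplying the sigma-form (minus its log-variance constant) by 2*sigma^2 recovers the beta-form exactly, so annealing sigma reshapes the loss landscape the same way annealing beta does.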
ECAI Conference 2025 Conference Paper
Vision-Language Foundation Models (VLFMs) demonstrate promise in zero-shot learning through joint visual-textual representations. However, in histopathology image analysis, their effectiveness is limited by weak image-text alignment due to coarse-grained textual descriptions that fail to capture critical fine-grained visual details. This misalignment introduces semantic noise and imprecision in zero-shot retrieval, hindering the identification of relevant cases and degrading downstream classification. To address this, we introduce Retrieval-based De-noising Causal Language Modelling (RDCLM), a novel framework that refines noisy retrieval outputs from pathology VLFMs. RDCLM constructs a pathology-specific knowledge base of fine-grained, discriminative tumour malignancy descriptions using a large language model (LLM). Given a query histopathology image, a pathology VLFM retrieves candidate descriptions from this knowledge base. Our de-noising module, leveraging a frozen language model, integrates visual features with these retrieved texts, filtering irrelevant content and enhancing semantic alignment. This significantly improves retrieval precision (by an average of 10% across datasets) and enables more accurate zero-shot image classification. To further bolster performance and generalization, we propose two retrieval augmentation strategies: Retrieval Negatives Replacement (RNR) and Description-wise Shuffling (DS). Extensive evaluations across four histopathology cancer datasets demonstrate that RDCLM significantly outperforms state-of-the-art methods in both zero-shot image-text retrieval and malignancy classification, achieving an average improvement of 12.7% in F1-score and 9.6% in accuracy over the second-best competitor. These results highlight the importance of retrieval de-noising for advancing VLFM-based zero-shot learning in histopathology. Our code is available at: https://github.com/xw18958/RDCLM
YNIMG Journal 2025 Journal Article
Fluid-attenuated inversion recovery (FLAIR) is indispensable in MRI-based head-and-neck assessments, but its quantitative counterpart remains clinically absent due to the influence of cerebrospinal fluid (CSF) dynamics and the lengthy acquisition time spent on a series of weighting-increasing images. This work implements and validates fast fluid-attenuated T2 (FLA-T2) mapping via inversion-recovery-prepared multiple overlapping-echo detachment imaging (IR-MOLED). The clinical value is prospectively investigated with a cohort of 54 meningioma patients (mean age: 56 years ± 11 [standard deviation]; 19 men). Fluid-attenuated proton density mapping was simultaneously fulfilled and therefore intrinsically co-registered, revealing notable benefits in identifying CSF inflow. In quantifying parenchymal T2, IR-MOLED yielded a mean absolute error of 1.22 ms referring to spin-echo, and in fluid suppression, IR-MOLED exhibited a high radiographic consistency with orthodox FLAIR imaging. Using first-level histogram analysis, the meningioma investigation showed, for the first time, that: (1) in grading meningiomas, FLA-T2 mapping (AUC = 0.814) outperformed FLAIR imaging (AUC = 0.685), contrast-enhanced T1-weighted imaging (insignificant), and T2 mapping (insignificant); and (2) in typing meningiomas, FLA-T2 distinguished transitional meningiomas from meningothelial or/and fibrous meningiomas, complementing the predictive ability of T2 mapping. In conclusion, with excluded parametric contribution from free water and standardized voxel value scales, FLA-T2 mapping permits a more precise description of brain parenchyma in both structural morphology and relaxation variables than T2 mapping and is fully superior to FLAIR imaging in preoperatively predicting the histopathologic heterogeneity of meningiomas.
IJCAI Conference 2025 Conference Paper
Heterogeneous Graph Neural Networks (HGNNs) are vulnerable, highlighting the need for tailored attacks to assess their robustness and ensure security. However, existing HGNN attacks often require complex retraining of parameters to generate specific perturbations for new scenarios. Recently, foundation models have opened new horizons for the generalization of graph neural networks by capturing shared semantics across various graph distributions. This leads us to ask: Can we design a foundation attack model for HGNNs that enables generalizable perturbations across different HGNNs, and quickly adapts to new heterogeneous graphs (HGs)? Empirical findings reveal that, despite significant differences in model design and parameter space, different HGNNs surprisingly share common vulnerability patterns from a relation-aware perspective. Therefore, we explore how to design foundation HGNN attack criteria by mining shared attack units. In this paper, we propose a novel relation-wise heterogeneous graph foundation attack model, HeTa. We introduce a foundation surrogate model to align heterogeneity and identify the importance of shared relation-aware attack units. Building on this, we implement a serialized relation-by-relation attack based on the identified relational weights. In this way, the perturbation can be transferred to various target HGNNs and easily fine-tuned for new HGs. Extensive experiments demonstrate the powerful attack performance and generalizability of our method.
AAAI Conference 2025 Conference Paper
Due to its effectiveness and efficiency, graph-based multi-view clustering has recently attracted much attention. However, multi-view data are often incomplete and unpaired in real-world applications as a consequence of data loss or corruption. Although efforts have been made through a series of methods to address the problems of incomplete or unpaired multi-view data, the following issues still persist: 1) Most existing methods only focus on incomplete multi-view data or unpaired multi-view data, and exhibit weaknesses when addressing both incomplete and unpaired multi-view data simultaneously. 2) Some methods neglect the graph information of the data from different views during the learning process. To tackle these issues, we propose the Multi-view Graph Clustering framework with Cross-view Feature Fusion (MGCCFF), a novel approach for clustering incomplete and unpaired multi-view data. Specifically, MGCCFF learns soft clustering label information from complete data and utilizes this to capture category-level cross-view correspondences. It then learns latent representations enriched with cross-view information based on the established mappings. To obtain a multi-view graph structure under conditions of incomplete and unpaired data, MGCCFF innovatively integrates the concept of self-expression with the autoencoder architecture and exploits the latent relationships between labels and the graph structure, thereby enabling the generation of sparse and accurate graph structures under multi-view conditions for the final clustering task. The experiments on incomplete and unpaired multi-view datasets demonstrate that MGCCFF outperforms state-of-the-art methods.
EAAI Journal 2025 Journal Article
AAAI Conference 2025 Conference Paper
We consider a nonconvex optimization problem over the simplex, and more generally, over a product of simplices. We provide an algorithm, Langevin Multiplicative Weights Update (LMWU), for solving global optimization problems by adding noise that scales with the non-Euclidean geometry of the simplex. Non-convex optimization has been extensively studied by the machine learning community due to its applications in various scenarios such as neural network approximation and finding Nash equilibria. Despite recent progress on provable guarantees for escaping and avoiding saddle points (convergence to local minima) and on the global convergence of Langevin gradient-based methods without constraints, global optimization with constraints is less studied. We show that the LMWU algorithm provably converges to interior global minima, with a non-asymptotic convergence analysis. We verify the efficiency of the proposed algorithm on a real-world dataset from polynomial portfolio management, where optimization of a highly non-linear objective function plays a crucial role.
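As a rough illustration of the idea, a multiplicative-weights (mirror-descent) step with Langevin-style noise keeps iterates in the interior of the simplex by construction. The sketch below uses assumed step size and noise scale and does not reproduce the paper's exact LMWU update:

```python
import numpy as np

def lmwu_step(x, grad, eta=0.05, sigma=0.1, rng=None):
    """One multiplicative-weights step with Langevin-style noise.

    x    : current point in the interior of the probability simplex
    grad : gradient of the objective at x
    eta and sigma are illustrative assumptions, not the paper's choices.
    """
    rng = rng or np.random.default_rng(0)
    noise = sigma * np.sqrt(eta) * rng.standard_normal(len(x))
    # Mirror-descent form: exponentiation keeps every coordinate positive,
    # and renormalization returns the iterate to the simplex.
    logits = np.log(x) - eta * grad + noise
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

x = np.full(4, 0.25)                   # start at the simplex center
g = np.array([1.0, 0.5, -0.5, -1.0])   # hypothetical gradient
x_new = lmwu_step(x, g)
```

Mass shifts toward coordinates with smaller gradient values, and the iterate never leaves the simplex, matching the constrained setting the paper studies.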
EAAI Journal 2025 Journal Article
EAAI Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Decoding visual experiences from fMRI offers a powerful avenue to understand human perception and develop advanced brain-computer interfaces. However, current progress often prioritizes maximizing reconstruction fidelity while overlooking interpretability, an essential aspect for deriving neuroscientific insight. To address this gap, we propose MoRE-Brain, a neuro-inspired framework designed for high-fidelity, adaptable, and interpretable visual reconstruction. MoRE-Brain uniquely employs a hierarchical Mixture-of-Experts architecture where distinct experts process fMRI signals from functionally related voxel groups, mimicking specialized brain networks. The experts are first trained to encode fMRI into the frozen CLIP space. A finetuned diffusion model then synthesizes images, guided by expert outputs through a novel dual-stage routing mechanism that dynamically weighs expert contributions across the diffusion process. MoRE-Brain offers three main advancements: First, it introduces a novel Mixture-of-Experts architecture grounded in brain network principles for neuro-decoding. Second, it achieves efficient cross-subject generalization by sharing core expert networks while adapting only subject-specific routers. Third, it provides enhanced mechanistic insight, as the explicit routing reveals precisely how different modeled brain regions shape the semantic and spatial attributes of the reconstructed image. Extensive experiments validate MoRE-Brain’s high reconstruction fidelity, with bottleneck analyses further demonstrating its effective utilization of fMRI signals, distinguishing genuine neural decoding from over-reliance on generative priors. Consequently, MoRE-Brain marks a substantial advance towards more generalizable and interpretable fMRI-based visual decoding.
EAAI Journal 2025 Journal Article
ECAI Conference 2025 Conference Paper
In this paper, we introduce the Perturbed Natural Adaptive Gradient Descent (PN-AdaGrad) method, a novel optimization algorithm that combines the principles of natural gradient descent and adaptive gradient descent on Riemannian manifolds. We provide a rigorous theoretical analysis of the PN-AdaGrad method, proving its convergence to a critical point of the objective function under mild assumptions. To validate the practical effectiveness of the PN-AdaGrad method, we evaluate our algorithm on real-world datasets in the context of portfolio optimization. Portfolio optimization involves selecting the optimal allocation of assets to maximize returns while minimizing risk. Our experiments show that the PN-AdaGrad method outperforms traditional gradient descent and other state-of-the-art optimization algorithms.
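For intuition only, the sketch below combines a per-coordinate adaptive (AdaGrad-style) scaling with an explicit random perturbation; the natural-gradient and Riemannian-manifold machinery of PN-AdaGrad is not reproduced, and all step sizes are illustrative assumptions:

```python
import numpy as np

def pn_adagrad_step(x, grad, state, eta=0.1, eps=1e-8, sigma=1e-3, rng=None):
    """One adaptive-gradient step with a small random perturbation.

    state accumulates squared gradients (AdaGrad); sigma controls the
    perturbation standing in for the noise used to escape saddle points.
    """
    rng = rng or np.random.default_rng(0)
    state = state + grad ** 2                    # per-coordinate accumulator
    step = eta * grad / (np.sqrt(state) + eps)   # shrinks as history grows
    return x - step + sigma * rng.standard_normal(len(x)), state

# Hypothetical usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x, state = np.array([3.0, -2.0]), np.zeros(2)
for _ in range(5000):
    x, state = pn_adagrad_step(x, 2 * x, state, sigma=0.0)
```

The adaptive denominator gives each coordinate its own effective step size, which is the shared ingredient between plain AdaGrad and the method described above.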
JMLR Journal 2025 Journal Article
In this paper, we establish tight lower bounds for Byzantine-robust distributed first-order stochastic methods in both strongly convex and non-convex stochastic optimization. We reveal that when the distributed nodes have heterogeneous data, the convergence error comprises two components: a non-vanishing Byzantine error and a vanishing optimization error. We establish the lower bounds on the Byzantine error and on the minimum number of queries to a stochastic gradient oracle for achieving an arbitrarily small optimization error. Nevertheless, we also identify significant discrepancies between our established lower bounds and the existing upper bounds. To fill this gap, we leverage the techniques of Nesterov's acceleration and variance reduction to develop novel Byzantine-robust distributed stochastic optimization methods that provably match these lower bounds, up to at most logarithmic factors, implying that our established lower bounds are tight.
AAAI Conference 2025 Conference Paper
Pedestrian Attribute Recognition (PAR) is one of the indispensable tasks in human-centered research. However, existing datasets neglect different domains (e.g., environments, times, populations, and data sources), only conducting simple random splits, and the performance of these datasets has already approached saturation. In the past five years, no large-scale dataset has been opened to the public. To address this issue, this paper proposes a new large-scale, cross-domain pedestrian attribute recognition dataset to fill the data gap, termed MSP60K. It consists of 60,122 images and 57 attribute annotations across eight scenarios. Synthetic degradation is also conducted to further narrow the gap between the dataset and real-world challenging scenarios. To establish a more rigorous benchmark, we evaluate 17 representative PAR models under both random and cross-domain split protocols on our dataset. Additionally, we propose an innovative Large Language Model (LLM) augmented PAR framework, named LLM-PAR. This framework processes pedestrian images through a Vision Transformer (ViT) backbone to extract features and introduces a multi-embedding query Transformer to learn partial-aware features for attribute classification. Significantly, we enhance this framework with LLM for ensemble learning and visual feature augmentation. Comprehensive experiments across multiple PAR benchmark datasets have thoroughly validated the efficacy of our proposed framework.
NeurIPS Conference 2025 Conference Paper
Diffusion models are increasingly deployed in real-world text-to-image services. These models, however, encode implicit assumptions about the world based on web-scraped image-caption pairs used during training. Over time, such assumptions may become outdated, incorrect, or socially biased, leading to failures where the generated images misalign with users' expectations or evolving societal norms. Identifying and fixing such failures is challenging and, thus, a valuable asset for service providers, as failures often emerge post-deployment and demand specialized expertise and resources to resolve them. In this work, we introduce SURE, the first end-to-end framework that SecUrely REpairs failures flagged by users of diffusion-based services. SURE enables the service provider to securely collaborate with an external third party specialized in model repairing (i.e., a Model Repair Institute) without compromising the confidentiality of user feedback, the service provider's proprietary model, or the Model Repair Institute's proprietary repairing knowledge. To achieve the best possible efficiency, we propose a co-design of a model editing algorithm with a customized two-party cryptographic protocol. Our experiments show that SURE is highly practical: SURE securely and effectively repairs all 32 layers of Stable Diffusion v1.4 in under 17 seconds (four orders of magnitude more efficient than a general baseline). Our results demonstrate that practical, secure model repair is attainable for large-scale, modern diffusion services.
AAAI Conference 2025 Conference Paper
To preserve user privacy in recommender systems, federated recommendation (FR) based on federated learning (FL) emerges, keeping the personal data on the local client and updating a model collaboratively. Unlike FL, FR has a unique sparse aggregation mechanism, where the embedding of each item is updated by only partial clients, instead of full clients in a dense aggregation of general FL. Recently, as an essential principle of FL, model security has received increasing attention, especially for Byzantine attacks, where malicious clients can send arbitrary updates. The problem of exploring the Byzantine robustness of FR is particularly critical since in the domains applying FR, e.g., e-commerce, malicious clients can be injected easily by registering new accounts. However, existing Byzantine works neglect the unique sparse aggregation of FR, making them unsuitable for our problem. Thus, we make the first effort to investigate Byzantine attacks on FR from the perspective of sparse aggregation, which is non-trivial: it is not clear how to define Byzantine robustness under sparse aggregations and design Byzantine attacks under limited knowledge/capability. In this paper, we reformulate the Byzantine robustness under sparse aggregation by defining the aggregation for a single item as the smallest execution unit. Then we propose a family of effective attack strategies, named Spattack, which exploit the vulnerability in sparse aggregation and are categorized along the adversary's knowledge and capability. Extensive experimental results demonstrate that Spattack can effectively prevent convergence and even break down defenses under a few malicious clients, raising alarms for securing FR systems.
NeurIPS Conference 2025 Conference Paper
Event cameras provide asynchronous, low-latency, and high-dynamic-range visual signals, making them ideal for real-time perception tasks such as object detection. However, effectively modeling the temporal dynamics of event streams remains a core challenge. Most existing methods follow frame-based detection paradigms, applying temporal modules only at high-level features, which limits early-stage temporal modeling. Transformer-based approaches introduce global attention to capture long-range dependencies, but often add unnecessary complexity and overlook fine-grained temporal cues. In this paper, we propose a CNN-RNN hybrid framework that rethinks temporal modeling for event-based object detection. Our approach is based on two key insights: (1) introducing recurrent modules at lower spatial scales to preserve detailed temporal information where events are most dense, and (2) utilizing Decoupled Deformable-enhanced Recurrent Layers specifically designed according to the inherent motion characteristics of event cameras to extract multiple spatiotemporal features, and performing independent downsampling at multiple spatiotemporal scales to enable flexible, scale-aware representation learning. These multi-scale features are then fused via a feature pyramid network to produce robust detection outputs. Experiments on the Gen1, 1 Mpx, and eTram datasets demonstrate that our approach achieves superior accuracy over recent transformer-based models, highlighting the importance of precise temporal feature extraction in early stages. This work offers a new perspective on designing architectures for event-driven vision beyond attention-centric paradigms. Code: https://github.com/BIT-Vision/SATE.
NeurIPS Conference 2025 Conference Paper
This paper presents a novel approach to addressing the long-sequence problem in high-resolution medical images for Vision Transformers (ViTs). Using smaller patches as tokens can enhance ViT performance, but quadratically increases computation and memory requirements. Therefore, the common practice for applying ViTs to high-resolution images is either to: (a) employ complex sub-quadratic attention schemes or (b) use large to medium-sized patches and rely on additional mechanisms within the model to capture the spatial hierarchy of details. We propose Symmetrical Hierarchical Forest (SHF), a lightweight approach that adaptively patches the input image to increase token information density and encode hierarchical spatial structures into the input embedding. We then apply a reverse depatching scheme to the output embeddings of the transformer encoder, eliminating the need for convolution-based decoders. Unlike previous methods that modify attention mechanisms or use a complex hierarchy of interacting models, SHF can be retrofitted to any ViT model to allow it to learn the hierarchical structure of details in high-resolution images without requiring architectural changes. Experimental results demonstrate significant gains in computational efficiency and performance: on the PAIP WSI dataset, we achieved a 3–32× speedup or a 2.95% to 7.03% increase in accuracy (measured by Dice score) at a 64K² resolution with the same computational budget, compared to state-of-the-art production models. On the 3D medical datasets BTCV and KiTS, training was 6× faster, with accuracy gains of 6.93% and 5.9%, respectively, compared to models without SHF.
AAAI Conference 2025 Conference Paper
Video object detection has made significant progress in recent years thanks to convolutional neural networks (CNNs) and vision transformers (ViTs). Typically, CNNs excel at capturing local features but struggle to model global representations. Conversely, ViTs are adept at capturing long-range global features but face challenges in representing local feature details. Off-the-shelf video object detection methods solely rely on CNNs or ViTs to conduct feature aggregation, which hampers their capability to simultaneously leverage global and local information, thereby resulting in limited detection performance. In this paper, we propose a Transformer-GraphFormer Blender Network (TGBFormer) for video object detection, with three key technical improvements to fully exploit the advantages of transformers and graph convolutional networks while compensating for their limitations. First, we develop a spatial-temporal transformer module to aggregate global contextual information, constituting global representations with long-range feature dependencies. Second, we introduce a spatial-temporal GraphFormer module that utilizes local spatial and temporal relationships to aggregate features, generating new local representations that are complementary to the transformer outputs. Third, we design a global-local feature blender module to adaptively couple transformer-based global representations and GraphFormer-based local representations. Extensive experiments demonstrate that our TGBFormer establishes new state-of-the-art results on the ImageNet VID dataset. Particularly, our TGBFormer achieves 86.5% mAP while running at around 41.0 FPS on a single Tesla A100 GPU.
AAAI Conference 2025 Conference Paper
Unsupervised visible-infrared person re-identification (US-VI-ReID) seeks to match infrared and visible images of the same individual without the use of annotations. Current methods typically derive cross-modal correspondences through a single global feature matching process for generating pseudo labels and learning modality-invariant features. However, this matching approach is hindered by both intra-modality and inter-modality discrepancies, which result in imprecise measurements. As a consequence, the clustering of individuals with a single global feature is often incomplete and unreliable, leading to suboptimal performance in cross-modal clustering tasks. To address these challenges and to extract cross-modality discriminative identity information, we propose TokenMatcher, which encompasses three key components: Diverse Tokens Matching (DTM), Diverse Tokens Neighbor Learning (DTNL), and the Homogeneous Fusion (HF) Module. DTM utilizes multiple class tokens within the visual transformer framework to capture diverse embedding representations, thereby facilitating the integration of fine-grained information essential for reliable cross-modality correspondences. DTNL enhances the intra-modality and inter-modality consistency among diverse tokens by refining neighborhood sets with insights from neighboring tokens and camera information, promoting robust neighborhood learning and fostering discriminative identity information. Additionally, the HF module consolidates clusters of the same identity while effectively separating those of different identities. Extensive experiments conducted on the publicly available SYSU-MM01 and RegDB datasets demonstrate the efficacy of the proposed method.
NeurIPS Conference 2025 Conference Paper
Graph Transformers (GTs) have emerged as a powerful paradigm for graph representation learning due to their ability to model diverse node interactions. However, existing GTs often rely on intricate architectural designs tailored to specific interactions, limiting their flexibility. To address this, we propose a unified hierarchical mask framework that reveals an underlying equivalence between model architecture and attention mask construction. This framework enables a consistent modeling paradigm by capturing diverse interactions through carefully designed attention masks. Theoretical analysis under this framework demonstrates that the probability of correct classification positively correlates with the receptive field size and label consistency, leading to a fundamental design principle: An effective attention mask should ensure both a sufficiently large receptive field and a high level of label consistency. While no single existing mask satisfies this principle across all scenarios, our analysis reveals that hierarchical masks offer complementary strengths, motivating their effective integration. Then, we introduce M$^3$Dphormer, a Mixture-of-Experts based Graph Transformer with Multi-Level Masking and Dual Attention Computation. M$^3$Dphormer incorporates three theoretically grounded hierarchical masks and employs a bi-level expert routing mechanism to adaptively integrate multi-level interaction information. To ensure scalability, we further introduce a dual attention computation scheme that dynamically switches between dense and sparse modes based on local mask sparsity. Extensive experiments across multiple benchmarks demonstrate that M$^3$Dphormer achieves state-of-the-art performance, validating the effectiveness of our unified framework and model design.
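The framework's central observation, that an architectural choice can be expressed as an attention-mask construction, can be illustrated with plain masked attention. The helper below is a generic sketch (not M$^3$Dphormer's implementation); swapping the boolean mask changes which node interactions are modeled:

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention restricted by a boolean mask.

    mask[i, j] = True allows node i to attend to node j; choosing the
    mask (local neighborhood, cluster-level, global) selects the
    interaction type without changing the architecture itself.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)   # blocked pairs get ~zero weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q = rng.random((3, 4)); K = rng.random((3, 4)); V = rng.random((3, 4))
out_self = masked_attention(Q, K, V, np.eye(3, dtype=bool))     # self-only mask
out_full = masked_attention(Q, K, V, np.ones((3, 3), dtype=bool))  # global mask
```

With the identity mask each node sees only itself; denser hierarchical masks enlarge the receptive field, which is the trade-off the design principle above formalizes.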
TMLR Journal 2025 Journal Article
Coreset selection, a technique for compressing large datasets while preserving performance, is crucial for modern machine learning. This paper presents a novel method for generating high-quality Wasserstein coresets using the Sinkhorn loss, a powerful tool with computational advantages. However, existing approaches suffer from numerical instability in Sinkhorn's algorithm. We address this by proposing stable algorithms for the computation and differentiation of the Sinkhorn optimization problem, including an analytical formula for the derivative of the Sinkhorn loss and a rigorous stability analysis of our method. Extensive experiments demonstrate that our approach significantly outperforms existing methods in terms of sample selection quality, computational efficiency, and achieving a smaller Wasserstein distance.
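The instability the paper targets typically arises from forming the kernel exp(-C/ε), which underflows for small ε. A standard remedy, shown here as a generic log-domain sketch rather than the paper's specific stabilized algorithm, runs every Sinkhorn iteration through log-sum-exp:

```python
import numpy as np

def logsumexp(a, axis):
    """Stable log-sum-exp along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_log(C, mu, nu, eps=0.1, iters=200):
    """Entropic optimal transport solved entirely in the log domain.

    Avoids forming K = exp(-C/eps), which underflows for small eps; all
    products of exponentials become sums inside a log-sum-exp.
    """
    f, g = np.zeros(len(mu)), np.zeros(len(nu))
    for _ in range(iters):
        f = -eps * logsumexp((g[None, :] - C) / eps + np.log(nu)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + np.log(mu)[:, None], axis=0)
    # Transport plan; its marginals match mu and nu at convergence.
    return np.exp((f[:, None] + g[None, :] - C) / eps) * mu[:, None] * nu[None, :]

mu = np.full(3, 1 / 3)
nu = np.full(5, 0.2)
C = np.random.default_rng(1).random((3, 5))   # hypothetical cost matrix
P = sinkhorn_log(C, mu, nu)
```

No exponential of the raw cost is ever formed, which is where the vanilla scaling iterations lose precision.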
AAAI Conference 2024 Conference Paper
Recent studies reveal the connection between GNNs and the diffusion process, which has motivated the proposal of many diffusion-based GNNs. However, since these two mechanisms are closely related, one fundamental question naturally arises: Is there a general diffusion framework that can formally unify these GNNs? The answer to this question can not only deepen our understanding of the learning process of GNNs, but may also open a new door to designing a broad new class of GNNs. In this paper, we propose a general diffusion equation framework with a fidelity term, which formally establishes the relationship between the diffusion process and a broader class of GNNs. Meanwhile, with this framework, we identify one characteristic of graph diffusion networks, i.e., the current neural diffusion process only corresponds to the first-order diffusion equation. However, through an experimental investigation, we show that the labels of high-order neighbors actually exhibit the monophily property, which induces label-based similarity among high-order neighbors without requiring similarity among first-order neighbors. This discovery motivates the design of a new high-order neighbor-aware diffusion equation, from which we derive a new type of graph diffusion network (HiD-Net) based on the framework. With the high-order diffusion equation, HiD-Net is more robust against attacks and works on both homophily and heterophily graphs. We not only theoretically analyze the relation between HiD-Net and high-order random walks, but also provide a theoretical convergence guarantee. Extensive experimental results well demonstrate the effectiveness of HiD-Net over state-of-the-art graph diffusion networks.
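The first-order diffusion-with-fidelity idea can be written as a single explicit Euler step; the update below is a generic sketch with assumed coefficients, not HiD-Net's exact high-order equation:

```python
import numpy as np

def diffusion_step(X, A_hat, X0, alpha=0.1, beta=0.1):
    """One explicit Euler step of graph diffusion with a fidelity term.

    Discretizes dX/dt = alpha * (A_hat X - X) + beta * (X0 - X).
    The first term smooths node features over the edges of the normalized
    adjacency A_hat; the fidelity term anchors the solution to the input X0.
    """
    return X + alpha * (A_hat @ X - X) + beta * (X0 - X)

# Hypothetical usage: a fully connected 3-node graph with self-loops,
# row-normalized, so pure diffusion (beta = 0) averages the features.
A_hat = np.full((3, 3), 1 / 3)
X0 = np.array([[1.0], [0.0], [2.0]])
X = X0.copy()
for _ in range(200):
    X = diffusion_step(X, A_hat, X0, alpha=0.1, beta=0.0)
```

Replacing A_hat with a power such as A_hat @ A_hat is the analogous way to let high-order neighbors enter the update, in the spirit of (though not identical to) the high-order equation described above.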
NeurIPS Conference 2024 Conference Paper
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code at https://github.com/richard-peng-xia/CARES.
AAAI Conference 2024 Conference Paper
As a bio-inspired vision sensor, the spike camera emulates the operational principles of the fovea, a compact retinal region, by employing spike discharges to encode the accumulation of per-pixel luminance intensity. Leveraging its high temporal resolution and bio-inspired neuromorphic design, the spike camera holds significant promise for advancing computer vision applications. Saliency detection mimics the behavior of human beings and captures the most salient regions of a scene. In this paper, we investigate visual saliency in the continuous spike stream for the first time. To effectively process the binary spike stream, we propose a Recurrent Spiking Transformer (RST) framework, which is based on a full spiking neural network. Our framework enables the extraction of spatio-temporal features from the continuous spatio-temporal spike stream while maintaining low power consumption. To facilitate the training and validation of our proposed model, we build a comprehensive real-world spike-based visual saliency dataset, enriched with numerous lighting conditions. Extensive experiments demonstrate the superior performance of our Recurrent Spiking Transformer framework in comparison to other spiking neural network-based methods. Our framework exhibits a substantial margin of improvement in capturing and highlighting visual saliency in the spike stream, which not only provides a new perspective for spike-based saliency segmentation but also shows a new paradigm for full SNN-based transformer models. The code and dataset are available at https://github.com/BIT-Vision/SVS.
EAAI Journal 2024 Journal Article
IROS Conference 2024 Conference Paper
In the context of 3D scene perception tasks, the significance of 3D occupancy prediction has been progressively growing, aiming to forecast the occupancy state of voxels in a discrete 3D space. However, existing methods typically exhibit several limitations, such as restricted adaptability to non-pinhole cameras due to fixed camera parameters, heavy reliance on 3D annotations because of the inability to project 3D output back to the camera plane, and inferior real-time inference performance resulting from the conversion process from 2D to 3D features. To address these constraints, we introduce GenerOcc, a self-supervised framework for real-time 3D occupancy prediction with monocular generic cameras. We have collected the fisheye Dominant dataset to confirm the compatibility of our ray-based camera model with non-pinhole cameras. By transforming the occupancy prediction task into a depth estimation task in a self-supervised manner, we eliminate dependency on 3D annotations. Furthermore, we propose a parametric voxel probability distribution module that leverages 2D features to quickly predict 3D occupancy without 3D representations of the scene. Additionally, our GenerOcc has been extensively evaluated on the public pinhole Occ3D-nuScenes dataset and our proprietary fisheye Dominant dataset, both yielding impressive performance.
AAAI Conference 2024 Conference Paper
Graph contrastive learning (GCL), learning the node representation by contrasting two augmented graphs in a self-supervised way, has attracted considerable attention. GCL is usually believed to learn the invariant representation. However, does this understanding always hold in practice? In this paper, we first study GCL from the perspective of causality. By analyzing GCL with the structural causal model (SCM), we discover that traditional GCL may not well learn the invariant representations due to the non-causal information contained in the graph. How can we fix it and encourage current GCL to learn better invariant representations? The SCM offers two requirements and motivates us to propose a novel GCL method. Particularly, we introduce the spectral graph augmentation to simulate the intervention upon non-causal factors. Then we design the invariance objective and independence objective to better capture the causal factors. Specifically, (i) the invariance objective encourages the encoder to capture the invariant information contained in causal variables, and (ii) the independence objective aims to reduce the influence of confounders on the causal variables. Experimental results demonstrate the effectiveness of our approach on node classification tasks.
AAAI Conference 2024 Conference Paper
The main streams of human activity recognition (HAR) algorithms are developed based on RGB cameras, which usually suffer from illumination changes, fast motion, privacy concerns, and large energy consumption. Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power. As it is a newly emerging sensor, there is not yet a realistic large-scale dataset for HAR. Considering its great practical value, in this paper, we propose a large-scale benchmark dataset to bridge this gap, termed HARDVS, which contains 300 categories and more than 100K event sequences. We evaluate and report the performance of multiple popular HAR algorithms, which provide extensive baselines for future works to compare against. More importantly, we propose a novel spatial-temporal feature learning and fusion framework, termed ESTF, for event-stream-based human activity recognition. It first projects the event streams into spatial and temporal embeddings using StemNet, then encodes and fuses the dual-view representations using Transformer networks. Finally, the dual features are concatenated and fed into a classification head for activity prediction. Extensive experiments on multiple datasets fully validate the effectiveness of our model. Both the dataset and source code will be released at https://github.com/Event-AHU/HARDVS.
TMLR Journal 2024 Journal Article
Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analysis. The significant impact of t-SNE and UMAP has catalyzed intense research interest, seeking further innovations toward visualizing not only the local but also the global structure information of the data. Moreover, there have been consistent efforts toward generalizable dimensional reduction that handles unseen data. In this paper, we first propose GLoMAP, a novel manifold learning method for dimensional reduction and high-dimensional data visualization. GLoMAP preserves locally and globally meaningful distance estimates and displays a progression from global to local formation during the course of optimization. Furthermore, we extend GLoMAP to its inductive version, iGLoMAP, which utilizes a deep neural network to map data to its lower-dimensional representation. This allows iGLoMAP to provide lower-dimensional embeddings for unseen points without needing to re-train the algorithm. iGLoMAP is also well-suited for mini-batch learning, enabling large-scale, accelerated gradient calculations. We have successfully applied both GLoMAP and iGLoMAP to the simulated and real-data settings, with competitive experiments against the state-of-the-art methods.
NeurIPS Conference 2024 Conference Paper
Image captioning was recently found to be an effective pretraining method similar to contrastive pretraining. This opens up the largely-unexplored potential of using natural language as a flexible and powerful interface for handling diverse pretraining tasks. In this paper, we demonstrate this with a novel visual pretraining paradigm, LocCa, that incorporates location-aware tasks into captioners to teach models to extract rich information from images. Specifically, LocCa employs two tasks, bounding box prediction and location-dependent captioning, conditioned on the image pixel input. Thanks to the multitask capabilities of an encoder-decoder architecture, we show that an image captioner can effortlessly handle multiple tasks during pretraining. LocCa significantly outperforms standard captioners on downstream localization tasks, achieving state-of-the-art results on RefCOCO/+/g, while maintaining comparable performance on holistic tasks. Our work paves the way for further exploration of natural language interfaces in visual pretraining.
YNICL Journal 2024 Journal Article
The long-term motor outcome of acute stroke patients may be correlated with the reorganization of the brain motor network. Abundant neuroimaging studies have contributed to understanding the pathological changes and recovery of motor networks after stroke. In this review, we summarize how current neuroimaging studies have increased understanding of reorganization and plasticity in post-stroke motor recovery. First, we discuss the changes in the motor network over time during the motor-activation and resting states, as well as the overall functional integration trend of the motor network. These studies indicate that the motor network undergoes dynamic bilateral hemispheric functional reorganization, as well as a trend toward network randomization. In the second part, we summarize the current progress in applying neuroimaging technology to the early prediction of post-stroke motor outcome. In the third part, we discuss the neuroimaging techniques commonly used in post-stroke recovery research. These methods provide direct or indirect visualization patterns for understanding the neural mechanisms of post-stroke motor recovery, opening up new avenues for studying spontaneous and treatment-induced recovery and plasticity after stroke.
NeurIPS Conference 2024 Conference Paper
Transformer models have gained significant attention due to their power in machine learning tasks. Their extensive deployment has raised concerns about the potential leakage of sensitive information during inference. However, when applied to Transformers, existing approaches based on secure two-party computation (2PC) face efficiency limitations in two respects: (1) resource-intensive matrix multiplications in linear layers, and (2) complex non-linear activation functions like $\mathsf{GELU}$ and $\mathsf{Softmax}$. This work presents a new two-party inference framework $\mathsf{Nimbus}$ for Transformer models. Specifically, we propose a new 2PC paradigm to securely compute matrix multiplications based on an outer-product insight, which achieves $2.9\times \sim 12.5\times$ performance improvements compared to the state-of-the-art (SOTA) protocol. Furthermore, through a new observation of utilizing the input distribution, we propose an approach of low-degree polynomial approximation for $\mathsf{GELU}$ and $\mathsf{Softmax}$, which improves the performance of the SOTA polynomial approximation by $2.9\times \sim 4.0\times$, where the average accuracy loss of our approach is 0.08\% compared to non-2PC inference without privacy. Compared with the SOTA two-party inference, $\mathsf{Nimbus}$ improves the end-to-end performance of $BERT_{base}$ inference by $2.7\times \sim 4.7\times$ across different network settings.
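The outer-product protocol itself is specific to Nimbus, but the additive-secret-sharing substrate that such 2PC matrix multiplications build on can be illustrated with the classic Beaver-triple trick. The sketch below is a minimal, dealer-assisted simulation in a single process; the names `share` and `beaver_matmul`, the modulus `P`, and the toy matrices are illustrative, not taken from the paper:

```python
import numpy as np

P = 1_000_003  # small prime modulus for additive secret sharing (keeps int64 safe)
rng = np.random.default_rng(0)

def share(m):
    """Split an integer matrix into two additive shares mod P."""
    r = rng.integers(0, P, size=m.shape)
    return r % P, (m - r) % P

def beaver_matmul(x0, x1, y0, y1):
    """Two-party secret-shared matrix product via a Beaver triple.
    Each party holds one share of X and Y; neither learns the other's share."""
    # A trusted dealer produces a random triple (A, B, C) with C = A @ B mod P.
    A = rng.integers(0, P, size=x0.shape)
    B = rng.integers(0, P, size=y0.shape)
    C = (A @ B) % P
    a0, a1 = share(A); b0, b1 = share(B); c0, c1 = share(C)
    # The masked differences E = X - A and F = Y - B are safe to open publicly.
    E = ((x0 - a0) + (x1 - a1)) % P
    F = ((y0 - b0) + (y1 - b1)) % P
    # Local computation of output shares; only party 0 adds the public E @ F term.
    z0 = (c0 + E @ b0 + a0 @ F + E @ F) % P
    z1 = (c1 + E @ b1 + a1 @ F) % P
    return z0, z1

X = rng.integers(0, 100, size=(2, 3))
Y = rng.integers(0, 100, size=(3, 2))
x0, x1 = share(X); y0, y1 = share(Y)
z0, z1 = beaver_matmul(x0, x1, y0, y1)
print(np.array_equal((z0 + z1) % P, (X @ Y) % P))  # True: shares recombine to X @ Y
```

Correctness follows from (A+E)(B+F) = C + EB + AF + EF with E = X−A and F = Y−B; all expensive multiplications happen on local shares.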
NeurIPS Conference 2024 Conference Paper
We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by - and even at odds with - the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on said popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using diverse data to create more inclusive multimodal systems and lays the groundwork for developing VLMs that better represent global perspectives.
NeurIPS Conference 2024 Conference Paper
Face feature fusion is indispensable for robust face recognition, particularly in scenarios involving long-range, low-resolution media (unconstrained environments) where not all frames or features are equally informative. Existing methods often rely on large intermediate feature maps or face metadata, making them incompatible with legacy biometric template databases that store pre-computed features. Additionally, real-time inference and generalization to large probe sets remain challenging. To address these limitations, we introduce a linear-time O(N) proxy-based sparse expert selection and pooling approach for context-driven feature-set attention. Our approach is order-invariant on the feature set, generalizes to large sets, is compatible with legacy template stores, and uses significantly fewer parameters, making it suitable for real-time inference and edge use-cases. Through qualitative experiments, we demonstrate that ProxyFusion learns discriminative information for importance weighting of face features without relying on intermediate features. Quantitative evaluations on challenging low-resolution face verification datasets such as IARPA BTS3.1 and DroneSURF show the superiority of ProxyFusion in unconstrained long-range face recognition settings. Our code and pretrained models are available at: https://github.com/bhavinjawade/ProxyFusion
ICML Conference 2024 Conference Paper
Optimization problems with access to only zeroth-order information of the objective function on Riemannian manifolds arise in various applications, spanning from statistical learning to robot learning. While various zeroth-order algorithms have been proposed in Euclidean space, they are not inherently designed to handle the challenging constraints imposed by Riemannian manifolds. The proper adaptation of zeroth-order techniques to Riemannian manifolds remained unknown until the pioneering work of (Li et al., 2023a). However, zeroth-order algorithms are widely observed to converge slowly and be unstable in practice. To alleviate these issues, we propose a Riemannian accelerated zeroth-order algorithm with improved robustness. Regarding efficiency, our accelerated algorithm has a function query complexity of $\mathcal{O}(\epsilon^{-7/4}d)$ for finding an $\epsilon$-approximate first-order stationary point. By introducing a small perturbation, it exhibits a function query complexity of $\tilde{\mathcal{O}}(\epsilon^{-7/4}d)$ for seeking a second-order stationary point with high probability, matching the state-of-the-art result in Euclidean space. Moreover, we further establish almost sure convergence in the asymptotic sense through the Stable Manifold Theorem. Regarding robustness, our algorithm requires larger smoothing parameters on the order of $\tilde{\mathcal{O}}(\epsilon^{7/8}d^{-1/2})$, improving the existing result by a factor of $\tilde{\mathcal{O}}(\epsilon^{3/4})$.
IJCAI Conference 2024 Conference Paper
Session-based recommendation (SBR) aims to predict the next-interacted item based on anonymous users' behavior sequences. The main challenge is how to recognize user intent from limited interactions to achieve a more accurate inference of user behavior. Existing works usually regard several consecutive items in the current session as intent. However, we argue that such intent generation based on temporal transitions ignores the fact that each item also has semantically connected items in the feature space, which can be regarded as spatial intent. This limited consideration of intent fails to capture complex behavioral patterns in real-world scenarios, leading to sub-optimal solutions. To address this issue, we propose the Hierarchical Intent Perceiving Contrastive Learning Framework (HearInt) for SBR, which considers intents hierarchically from both temporal and spatial perspectives. Specifically, we first propose that a user's temporal intents are mutually exclusive while spatial intents are mutually compatible. Following these analyses, we design a Temporal Intent Decoupling module to mitigate the mutual influence of long-term and short-term intents, and a Cross-scale Contrastive Learning task to enhance the consistency of intents across different spatial scales. Experimental results on three real-world datasets show that HearInt achieves state-of-the-art performance.
AAAI Conference 2024 Conference Paper
Understanding vehicles in images is important for various applications such as intelligent transportation and self-driving systems. Existing vehicle-centric works typically pre-train models on large-scale classification datasets and then fine-tune them for specific downstream tasks. However, they neglect the specific characteristics of vehicle perception in different tasks and might thus lead to sub-optimal performance. To address this issue, we propose a novel vehicle-centric pre-training framework called VehicleMAE, which incorporates structural information, including the spatial structure from vehicle profile information and the semantic structure from informative high-level natural language descriptions, for effective masked vehicle appearance reconstruction. To be specific, we explicitly extract the sketch lines of vehicles as a form of spatial structure to guide vehicle reconstruction. The more comprehensive knowledge distilled from the CLIP big model, based on the similarity between paired/unpaired vehicle image-text samples, is further taken into consideration to help achieve a better understanding of vehicles. A large-scale dataset is built to pre-train our model, termed Autobot1M, which contains about 1M vehicle images and 12,693 text descriptions. Extensive experiments on four vehicle-based downstream tasks fully validate the effectiveness of our VehicleMAE. The source code and pre-trained models will be released at https://github.com/Event-AHU/VehicleMAE.
NeurIPS Conference 2024 Conference Paper
Graph self-supervised learning, as a powerful pre-training paradigm for Graph Neural Networks (GNNs) without labels, has received considerable attention. We have witnessed the success of graph self-supervised learning in pre-training the parameters of GNNs, leading many to assume, without question, that all the learned GNN parameters are useful. In this paper, by presenting experimental evidence and analysis, we surprisingly discover that graph self-supervised learning models are highly redundant at both the neuron and layer levels; e.g., even when randomly removing 51.6\% of parameters, the performance of graph self-supervised learning models still retains at least 96.2\%. This discovery implies that the parameters of graph self-supervised models can be largely reduced, making it more feasible to simultaneously fine-tune both graph self-supervised learning models and prediction layers. Therefore, we further design a novel graph pre-training and fine-tuning paradigm called SLImming DE-correlation Fine-tuning (SLIDE). The effectiveness of SLIDE is verified through extensive experiments on various benchmarks, and performance can even be improved with fewer model parameters in most cases. For example, compared with fully fine-tuning GraphMAE on the Amazon-Computers dataset, even when randomly reducing 40\% of parameters, we can still achieve improvements of 0.24\% and 0.27\% in Micro-F1 and Macro-F1 scores, respectively.
JMLR Journal 2024 Journal Article
In this paper, we study approximation algorithms for several classes of DR-submodular optimization problems, where DR is short for diminishing returns. Following a newly introduced algorithmic framework for zeroth-order stochastic approximation methods, we first propose algorithms {\bf CG-ZOSA} and {\bf RG-ZOSA} for smooth DR-submodular optimization, based on the coordinate-wise gradient estimator and the randomized gradient estimator, respectively. Our theoretical analysis proves that {\bf CG-ZOSA} can reach a solution whose expected objective value exceeds $(1-e^{-1}-\epsilon^{2})$OPT$-\epsilon$ after $\mathcal{O}(\epsilon^{-2})$ iterations and $\mathcal{O}(N^{2/3}d\epsilon^{-2})$ oracle calls, where $d$ represents the problem dimension. On the other hand, {\bf RG-ZOSA} improves the approximation ratio to $(1-e^{-1}-\epsilon^{2}/d)$ while maintaining the same overall oracle complexity. For non-smooth up-concave maximization problems, we propose a novel auxiliary function based on a smoothed objective function and introduce the {\bf NZOSA} algorithm. This algorithm achieves an approximation ratio of $(1-e^{-1}-\epsilon \ln \epsilon^{-1}- \epsilon^{2}\ln \epsilon^{-1})$ with $\mathcal{O}(d\epsilon^{-2})$ iterations and $\mathcal{O}(N^{2/3}d^{3/2} \epsilon^{-3})$ oracle calls. We also extend {\bf NZOSA} to handle a class of robust DR-submodular maximization problems. To validate the effectiveness of our proposed algorithms, we conduct experiments on both synthetic and real-world problems. The results demonstrate the superior performance and efficiency of our methods in solving DR-submodular optimization problems.
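The randomized gradient estimator that RG-ZOSA-style methods rely on queries the objective along random unit directions and rescales the finite difference by the dimension. A minimal sketch, assuming a simple quadratic test function; the name `rand_grad_est`, the smoothing parameter, and the sample count are illustrative (practical algorithms use only one or two samples per iteration rather than averaging):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_grad_est(f, x, mu=1e-4, n_samples=20000):
    """Randomized zeroth-order gradient estimator:
    g_hat = (d / mu) * (f(x + mu*u) - f(x)) * u, with u uniform on the unit sphere.
    Averaging many samples approximates the true gradient."""
    d = x.size
    fx = f(x)
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)          # uniform direction on the unit sphere
        g += (d / mu) * (f(x + mu * u) - fx) * u
    return g / n_samples

f = lambda x: 0.5 * np.dot(x, x)        # gradient of f is x itself
x = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
g_hat = rand_grad_est(f, x)
print(np.max(np.abs(g_hat - x)))        # small estimation error
```

The coordinate-wise estimator replaces the random direction with the `d` standard basis vectors, trading more queries per step for lower variance.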
AAAI Conference 2023 Conference Paper
Estimating the structure of directed acyclic graphs (DAGs) of features (variables) plays a vital role in revealing the latent data generation process and providing causal insights in various applications. Although there have been many studies on structure learning with various types of data, structure learning on dynamic graphs has not yet been explored; we therefore study the problem of learning the node feature generation mechanism on such ubiquitous dynamic graph data. In a dynamic graph, we propose to simultaneously estimate contemporaneous relationships and time-lagged interaction relationships between the node features. These two kinds of relationships form a DAG, which can effectively characterize the feature generation process in a concise way. To learn such a DAG, we cast the learning problem as a continuous score-based optimization problem, which consists of a differentiable score function to measure the validity of the learned DAGs and a smooth acyclicity constraint to ensure the acyclicity of the learned DAGs. These two components are translated into an unconstrained augmented Lagrangian objective that can be minimized by mature continuous optimization techniques. The resulting algorithm, named GraphNOTEARS, outperforms baselines on simulated data across a wide range of settings that may be encountered in real-world applications. We also apply the proposed approach to two dynamic graphs constructed from the real-world Yelp dataset, demonstrating that our method can learn the connections between node features in a way that conforms with domain knowledge.
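The smooth acyclicity constraint used by NOTEARS-style methods, which GraphNOTEARS builds on, scores a weighted adjacency matrix $W$ by $h(W)=\mathrm{tr}(e^{W\circ W})-d$, which is zero exactly when $W$ encodes a DAG. A minimal sketch (the matrix values are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS-style smooth acyclicity score: h(W) = tr(exp(W * W)) - d.
    h(W) = 0 iff the weighted adjacency matrix W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # W * W is the elementwise (Hadamard) square

dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, 2.0],
                [0.0, 0.0, 0.0]])      # strictly upper-triangular: a DAG
cyc = dag.copy()
cyc[2, 0] = 0.7                        # adds the cycle 0 -> 1 -> 2 -> 0
print(acyclicity(dag))                 # ~0: no cycles
print(acyclicity(cyc))                 # > 0: the cycle is penalized
```

Because $h$ is differentiable, it can be folded into the augmented Lagrangian objective the abstract describes and minimized with standard continuous solvers.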
YNIMG Journal 2023 Journal Article
Long-term dance training offers numerous benefits, including improvements in physical health, posture, body coordination, and mental health and well-being. Since dance is an art form of body-to-body communication, professional dancers may share feelings and thoughts on dance with their partners, owing to their shared training experiences. Considering this perspective, one may expect that professional dancers would demonstrate pronounced neural similarities when viewing dancing videos, which could be associated with their training duration. To test these hypotheses, we collected functional magnetic resonance imaging (fMRI) data while presenting ballroom dancing and neutral video clips with long durations (∼100 s each) to 41 professional ballroom dancers (19 pairs of dance partners) and 39 age- and sex-matched nondancers. Our findings revealed that dancers exhibited broader and stronger neural similarities across the whole brain when watching dancing video clips, as compared to the control group. These increased neural similarities could be interpreted in at least two distinct ways. First, neural similarities in certain brain regions within the motor control circuit (i.e., frontal cortical-basal ganglia-thalamic circuit) were significantly correlated with dance-related information (e.g., dance partners' cooperation duration), which reinforced the impact of long-term dance training on neural synchronization. Second, neural similarities in other brain regions (e.g., memory-related brain regions) were significantly correlated with subjects' impression of the viewed videos (i.e., whether they have watched before, familiarity, and liking), which may not necessarily be directly linked to long-term dance training. Altogether, our study provided solid evidence for synchronized neural mechanisms in professional dancers due to long-term dance training.
IJCAI Conference 2023 Conference Paper
Graph neural network (GNN) based recommender systems have become one of the mainstream trends due to their powerful ability to learn from user behavior data. Understanding user intents from behavior data is the key to recommender systems, which poses two basic requirements for GNN-based recommender systems. One is how to learn complex and diverse intents, especially when user behavior is usually inadequate in reality. The other is that different behaviors have different intent distributions, raising the question of how to establish their relations for a more explainable recommender system. In this paper, we present Intent-aware Recommendation via Disentangled Graph Contrastive Learning (IDCL), which simultaneously learns interpretable intents and behavior distributions over those intents. Specifically, we first model the user behavior data as a user-item-concept graph, and design a GNN-based behavior disentangling module to learn the different intents. Then we propose intent-wise contrastive learning to enhance the intent disentangling and meanwhile infer the behavior distributions. Finally, coding rate reduction regularization is introduced to make the behaviors of different intents orthogonal. Extensive experiments demonstrate the effectiveness of IDCL in terms of substantial improvement and interpretability.
NeurIPS Conference 2023 Conference Paper
Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Specifically, we propose a strategy called ``first-encoding-then-separation'' to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates over-fitting to training data distributions while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which makes our method widely applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
NeurIPS Conference 2023 Conference Paper
Graph neural networks (GNNs) have become increasingly popular in modeling graph-structured data due to their ability to learn node representations by aggregating local structure information. However, it is widely acknowledged that the test graph structure may differ from the training graph structure, resulting in a structure shift. In this paper, we experimentally find that the performance of GNNs drops significantly when the structure shift happens, suggesting that the learned models may be biased towards specific structure patterns. To address this challenge, we propose the Cluster Information Transfer (\textbf{CIT}) mechanism, which can learn invariant representations for GNNs, thereby improving their generalization ability to various and unknown test graphs with structure shift. The CIT mechanism achieves this by combining different cluster information with the nodes while preserving their cluster-independent information. By generating nodes across different clusters, the mechanism significantly enhances the diversity of the nodes and helps GNNs learn the invariant representations. We provide a theoretical analysis of the CIT mechanism, showing that the impact of changing clusters during structure shift can be mitigated after transfer. Additionally, the proposed mechanism is a plug-in that can be easily used to improve existing GNNs. We comprehensively evaluate our proposed method on three typical structure shift scenarios, demonstrating its effectiveness in enhancing GNNs' performance.
IJCAI Conference 2023 Conference Paper
Graph-level contrastive learning, which aims to learn a representation for each graph by contrasting two augmented graphs, has attracted considerable attention. Previous studies usually simply assume that a graph and its augmented graph form a positive pair, and otherwise a negative pair. However, it is well known that graph structure is always complex and multi-scale, which gives rise to a fundamental question: after graph augmentation, does the previous assumption still hold in reality? Through an experimental analysis, we discover that the semantic information of an augmented graph structure may not be consistent with that of the original graph, and that whether two augmented graphs are positive or negative pairs is highly related to their multi-scale structures. Based on this finding, we propose a multi-scale subgraph contrastive learning architecture which is able to characterize fine-grained semantic information. Specifically, we generate global and local views at different scales based on subgraph sampling, and construct multiple contrastive relationships according to their semantic associations to provide richer self-supervised signals. Extensive experiments and parametric analyses on eight real-world graph classification datasets well demonstrate the effectiveness of the proposed method.
NeurIPS Conference 2023 Conference Paper
Last-iterate convergence has received extensive study in two-player zero-sum games, starting from bilinear and convex-concave settings up to settings that satisfy the MVI condition. Typical methods that exhibit last-iterate convergence in these games include extra-gradient (EG) and optimistic gradient descent ascent (OGDA). However, all the established last-iterate convergence results hold for the restrictive setting where the underlying repeated game does not change over time. Recently, a line of research has focused on regret analysis of OGDA in time-varying games, i.e., games where payoffs evolve with time; the last-iterate behavior of OGDA and EG in time-varying environments remains unclear, though. In this paper, we study the last-iterate behavior of various algorithms in two types of unconstrained, time-varying, bilinear zero-sum games: periodic and convergent perturbed games. These models expand upon the usual repeated game formulation and incorporate external environmental factors, such as seasonal effects on species competition and vanishing external noise. In periodic games, we prove that EG will converge while OGDA and the momentum method will diverge. This is quite surprising, as to the best of our knowledge, it is the first result indicating that EG and OGDA have qualitatively different last-iterate behaviors. In convergent perturbed games, we prove that all these algorithms converge as long as the game itself stabilizes at a rate faster than $1/t$.
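The qualitative gap between EG and plain gradient dynamics already shows up in the simplest static unconstrained bilinear game $f(x,y)=xy$: simultaneous gradient descent-ascent spirals outward, while extra-gradient's look-ahead step contracts toward the equilibrium. A toy sketch (static game and step size are illustrative; the paper's time-varying setting is not modeled here):

```python
import numpy as np

def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    return x - eta * y, y + eta * x

def eg_step(x, y, eta):
    """Extra-gradient: take a look-ahead step, then update using the
    gradient evaluated at the look-ahead point."""
    xm, ym = x - eta * y, y + eta * x
    return x - eta * ym, y + eta * xm

x1 = y1 = x2 = y2 = 1.0
for _ in range(200):
    x1, y1 = gda_step(x1, y1, 0.1)
    x2, y2 = eg_step(x2, y2, 0.1)
print(np.hypot(x1, y1))  # grows: GDA spirals away from the equilibrium (0, 0)
print(np.hypot(x2, y2))  # shrinks: EG converges toward (0, 0)
```

In complex notation $z=x+iy$, GDA multiplies $z$ by $1+i\eta$ (modulus $>1$) each step, while EG multiplies by $1+i\eta-\eta^2$ (modulus $<1$ for small $\eta$), which explains the opposite behaviors.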
IS Journal 2023 Journal Article
The recent debut and success of ChatGPT have brought up renewed debates and desires for artificial general intelligence (AGI) amid fears and anxieties of potential disruptions to our humanity and social values, as witnessed by the call from tech celebrities for a pause in the development of ChatGPT-style AGI tools. At the IEEE IS’ AI and CPSS Department, we would like to initiate cautious, balanced, hopefully deep investigations to address various related issues on the impact and significance of intelligent science and technology to our economy and society. Let’s start with the “three Bs” and “ACP” for parallel intelligence in CPSSs: Being by artificial systems (A), Becoming through computational experiments (C), and Believing with parallel execution (P).
NeurIPS Conference 2023 Conference Paper
Graph Contrastive Learning (GCL) has emerged as a popular training approach for learning node embeddings from augmented graphs without labels. Despite the key principle that maximizing the similarity between positive node pairs while minimizing it between negative node pairs is well established, some fundamental problems are still unclear. Considering the complex graph structure, are some nodes consistently well-trained and following this principle even with different graph augmentations? Or are there some nodes more likely to be untrained across graph augmentations and violate the principle? How can we distinguish these nodes and further guide the training of GCL? To answer these questions, we first present experimental evidence showing that the training of GCL is indeed imbalanced across nodes. To address this problem, we propose the metric "node compactness", a lower bound on how well a node follows the GCL principle over the range of augmentations. We further derive the form of node compactness theoretically through bound propagation, which can be integrated into binary cross-entropy as a regularization. To this end, we propose PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that follow the GCL principle better. Through extensive experiments on various benchmarks, POT consistently improves the existing GCL approaches, serving as a friendly plugin.
TMLR Journal 2023 Journal Article
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty, every time the safety verification is engaged, improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
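The three action-adaptation categories in the survey can be illustrated on a one-dimensional box-shaped safe set. A minimal sketch, where the safe interval [-1, 1], the fallback action, and the candidate actions are all hypothetical (real methods pair these rules with an online safety verifier):

```python
import numpy as np

safe_low, safe_high = -1.0, 1.0   # hypothetical verified-safe action interval

def action_projection(a):
    """Project an unsafe action onto the nearest point of the safe set."""
    return float(np.clip(a, safe_low, safe_high))

def action_replacement(a, fallback=0.0):
    """Replace any unsafe action with a pre-verified safe fallback action."""
    return a if safe_low <= a <= safe_high else fallback

def action_masking(candidates):
    """Restrict the agent's choice to the safe subset of candidate actions."""
    return [a for a in candidates if safe_low <= a <= safe_high]

print(action_projection(2.5))             # 1.0
print(action_replacement(2.5))            # 0.0
print(action_masking([-2.0, 0.3, 1.7]))   # [0.3]
```

Masking naturally fits discrete action spaces, while projection and replacement extend to continuous ones, which matches the paper's finding that the best choice depends on the action space and safety specification.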
NeurIPS Conference 2023 Conference Paper
Collaborative machine learning (ML) is widely used to enable institutions to learn better models from distributed data. While collaborative approaches to learning intuitively protect user data, they remain vulnerable to either the server, the clients, or both, deviating from the protocol. Indeed, because the protocol is asymmetric, a malicious server can abuse its power to reconstruct client data points. Conversely, malicious clients can corrupt learning with malicious updates. Thus, both clients and servers require a guarantee when the other cannot be trusted to fully cooperate. In this work, we propose a peer-to-peer (P2P) learning scheme that is secure against malicious servers and robust to malicious clients. Our core contribution is a generic framework that transforms any (compatible) algorithm for robust aggregation of model updates to the setting where servers and clients can act maliciously. Finally, we demonstrate the computational efficiency of our approach even with 1-million parameter models trained by 100s of peers on standard datasets.
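One standard example of a robust aggregation rule that such a framework could build on is the coordinate-wise median, which bounds the influence of a minority of malicious updates. A minimal sketch (the paper's framework is generic over compatible rules; this particular rule and the toy numbers are illustrative, not the paper's construction):

```python
import numpy as np

def robust_aggregate(updates):
    """Coordinate-wise median of client model updates: a standard robust
    aggregation rule that tolerates a minority of malicious clients."""
    return np.median(np.stack(updates), axis=0)

rng = np.random.default_rng(1)
true_update = np.array([1.0, -2.0])
honest = [true_update + 0.01 * rng.standard_normal(2) for _ in range(8)]
malicious = [np.array([100.0, 100.0])]          # one poisoned update
agg = robust_aggregate(honest + malicious)
print(np.max(np.abs(agg - true_update)))        # stays near the honest consensus
```

Averaging instead of taking the median would let the single poisoned update drag the result by roughly 11 units per coordinate, which is exactly the failure mode robust aggregation prevents.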
IS Journal 2023 Journal Article
This article explores the concept of human–autonomous organizations (HAOs) based on decentralized autonomous organizations (DAOs) and operations as well as human, artificial, natural, and organizational intelligence and their roles in shaping smart societies in the context of Industry 5.0 and Society 5.0. It discusses the potential of AI-generated content and prompt engineering in specific goal-guided manufacture and governance. Additionally, the article introduces the concept of the HAO as a framework for integrating human intelligence to achieve fair, transparent, and accountable decision making within DAOs. The proposed HAO reduces the risk of instability and unreliability in “human-in-the-loop” copilot systems and human–machine hybrid systems, leading to more reliable, secure, and flexible systems. It provides insights into the future management of smart societies and the symbiotic relationship between human ingenuity and the suite of emerging new AI technologies.
NeurIPS Conference 2023 Conference Paper
We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 2022) has recently shown performance gains from using pretrained classifier embeddings. However, LiT directly replaces the image tower with the frozen embeddings, excluding any potential benefits from training the image tower contrastively. With 3T, we propose a more flexible strategy that allows the image tower to benefit from both pretrained embeddings and contrastive training. To achieve this, we introduce a third tower that contains the frozen pretrained embeddings, and we encourage alignment between this third tower and the main image-text towers. Empirically, 3T consistently improves over LiT and the CLIP-style from-scratch baseline for retrieval tasks. For classification, 3T reliably improves over the from-scratch baseline, and while it underperforms relative to LiT for JFT-pretrained models, it outperforms LiT for ImageNet-21k and Places365 pretraining.
IJCAI Conference 2022 Conference Paper
We consider nonconvex optimization problems whose constraint set is a product of simplices. A commonly used algorithm for solving this type of problem is the Multiplicative Weights Update (MWU), an algorithm widely used in game theory, machine learning, and multi-agent systems. Although it is known that MWU avoids saddle points, one question remains unaddressed: ``Is there an accelerated version of MWU that provably avoids saddle points?'' In this paper we provide a positive answer to this question. We propose an accelerated MWU based on Riemannian Accelerated Gradient Descent, and prove that Riemannian Accelerated Gradient Descent, and thus the accelerated MWU, avoids saddle points.
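The MWU step itself is a one-line exponential reweighting followed by renormalization back onto the simplex. A minimal sketch with a fixed loss vector (step size, losses, and iteration count are illustrative):

```python
import numpy as np

def mwu_step(w, loss, eta=0.5):
    """Multiplicative Weights Update: exponentially down-weight coordinates
    with high loss, then renormalize so the iterate stays on the simplex."""
    w = w * np.exp(-eta * loss)
    return w / w.sum()

w = np.full(3, 1.0 / 3.0)             # start at the simplex center
loss = np.array([1.0, 0.2, 0.8])      # coordinate 1 has the lowest loss
for _ in range(50):
    w = mwu_step(w, loss)
print(w.round(3))  # mass concentrates on the lowest-loss coordinate
```

Viewing this update as mirror descent with the entropy mirror map is what connects MWU to Riemannian gradient methods on the simplex, the perspective the accelerated variant exploits.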
NeurIPS Conference 2022 Conference Paper
Most Graph Neural Networks (GNNs) predict the labels of unseen graphs by learning the correlation between the input graphs and labels. However, through a graph classification investigation on training graphs with severe bias, we surprisingly discover that GNNs tend to exploit spurious correlations to make decisions, even when a causal correlation exists. This implies that existing GNNs trained on such biased datasets will suffer from poor generalization capability. Analyzing this problem from a causal view, we find that disentangling and decorrelating the causal and bias latent variables from the biased graphs are both crucial for debiasing. Inspired by this, we propose a general disentangled GNN framework to learn the causal substructure and bias substructure, respectively. In particular, we design a parameterized edge mask generator to explicitly split the input graph into causal and bias subgraphs. Then two GNN modules, supervised by causal- and bias-aware loss functions respectively, are trained to encode the causal and bias subgraphs into their corresponding representations. With the disentangled representations, we synthesize counterfactual unbiased training samples to further decorrelate the causal and bias variables. Moreover, to better benchmark the severe bias problem, we construct three new graph datasets, which have controllable bias degrees and are easier to visualize and explain. Experimental results demonstrate that our approach achieves superior generalization performance over existing baselines. Furthermore, owing to the learned edge mask, the proposed model has appealing interpretability and transferability.
JBHI Journal 2022 Journal Article
The functional connectivity network (FCN) has been used to achieve several remarkable advancements in the diagnosis of neuro-degenerative disorders. Therefore, it is imperative to accurately estimate biologically meaningful FCNs. Several efforts have been dedicated to this purpose by encoding biological priors. However, owing to the high complexity of the human brain, the estimation of an 'ideal' FCN remains an open problem. To the best of our knowledge, almost all existing studies lack the integration of domain expert knowledge, which limits their performance. In this study, we focused on incorporating domain expert knowledge into the FCN estimation from a modularity perspective. To achieve this, we presented a human-guided modular representation (MR) FCN estimation framework. Specifically, we designed an adversarial low-rank constraint to describe the module structure of FCNs under the guidance of domain expert knowledge (i.e., a predefined participant index). The chronic tinnitus (TIN) identification task based on the estimated FCNs was conducted to examine the proposed MR methods. Remarkably, MR significantly outperformed the baseline and state-of-the-art (SOTA) methods, achieving an accuracy of 92.11%. Moreover, post-hoc analysis revealed that the FCNs estimated by the proposed MR could highlight more biologically meaningful connections, which is beneficial for exploring the underlying mechanisms of TIN and diagnosing early TIN.
IS Journal 2022 Journal Article
The concept of metaverses has received extensive attention recently, and cyber-physical-social systems (CPSS) are its academic foundation. In almost all applications of metaverses, the sensing system is an essential part and intelligent sensing capacity must be provided. However, due to the insufficient consideration of human factors in most studies, digital twins' sensing in cyber-physical systems cannot achieve smart sensing in metaverses. For this reason, a novel framework for intelligent sensing in metaverses, MetaSensing, is proposed based on parallel intelligence in CPSS. Within the framework of MetaSensing, there are four states of sensing: physical sensing, descriptive sensing, predictive sensing, and prescriptive sensing. To protect sensors' data privacy in metaverses, DAO-based decentralized sensing is introduced as a mechanism for the operation and maintenance of smart sensing industries.
IS Journal 2022 Journal Article
A total of 12 years have passed since this Department was created in 2010 as the first academic forum dedicated to cyber-physical-social systems (CPSS), with the first CPSS research article in the field: “The Emergence of Intelligent Enterprises: From CPS to CPSS.” What has happened and changed during the past decade? A brief reflection and review are presented here with a focus on digital twins in CPS versus parallel intelligence in CPSS, and their relationship to blockchain intelligence, smart contracts, metaverses, DAO, Web3, and decentralized science. The concept of DeMetaverses is thus introduced and interpreted as a DAO-based decentralized autonomous metaverse. The characteristics, mechanism, and impact of DeMetaverses are discussed with a vision for achieving an integrated human, artificial, natural, and organizational intelligence that would transform our world into “6S” societies.
AAAI Conference 2022 Conference Paper
Despite the remarkable performance of graph neural networks (GNNs) in semi-supervised learning, they are criticized for not making full use of unlabeled data and for suffering from overfitting. Recently, graph data augmentation, used to improve both the accuracy and generalization of GNNs, has received considerable attention. However, one fundamental question is: how should the quality of graph augmentations be evaluated in principle? In this paper, we propose two metrics, Consistency and Diversity, from the aspects of augmentation correctness and generalization. Moreover, we discover that existing augmentations fall into a dilemma between these two metrics. Can we find a graph augmentation satisfying both consistency and diversity? A well-informed answer can help us understand the mechanism behind graph augmentation and improve the performance of GNNs. To tackle this challenge, we analyze two representative semi-supervised learning algorithms: label propagation (LP) and consistency regularization (CR). We find that LP utilizes the prior knowledge of graphs to improve consistency, while CR adopts variable augmentations to promote diversity. Based on this discovery, we treat neighbors as augmentations to capture the prior knowledge embodying the homophily assumption, which promises high consistency of augmentations. To further promote diversity, we randomly replace the immediate neighbors of each node with its remote neighbors. After that, a neighbor-constrained regularization is proposed to enforce the predictions of the augmented neighbors to be consistent with each other. Extensive experiments on five real-world graphs validate the superiority of our method in improving the accuracy and generalization of GNNs.
AAAI Conference 2022 Conference Paper
Due to high-speed motion blur and challenging illumination, conventional frame-based cameras have encountered an important challenge in object detection tasks. Neuromorphic cameras, which output asynchronous visual streams instead of intensity frames, taking advantage of high temporal resolution and high dynamic range, have brought a new perspective to address this challenge. In this paper, we propose a novel problem setting, retinomorphic object detection, which is the first trial that integrates foveal-like and peripheral-like visual streams. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-Vidar-DVS) with over 215.5k spatio-temporally synchronized labels. Then, we design temporal aggregation representations to preserve the spatio-temporal information from asynchronous visual streams. Finally, we present a novel bio-inspired unifying framework to fuse the two sensing modalities via a dynamic interaction mechanism. Our experimental evaluation shows that our approach yields significant improvements over the state-of-the-art single-modality methods, especially in high-speed motion and low-light scenarios. We hope that our work will attract further research into this newly identified, yet crucial research direction. Our dataset is available at https://www.pkuml.org/resources/pku-vidar-dvs.html.
NeurIPS Conference 2022 Conference Paper
Graph Contrastive Learning (GCL), which learns node representations by augmenting graphs, has attracted considerable attention. Despite the proliferation of various graph augmentation strategies, some fundamental questions remain unclear: what information is essentially learned by GCL? Are there general augmentation rules behind different augmentations? If so, what are they and what insights can they bring? In this paper, we answer these questions by establishing the connection between GCL and the graph spectrum. Through an experimental investigation in the spectral domain, we first find the General grAph augMEntation (GAME) rule for GCL, i.e., the difference between the high-frequency parts of two augmented graphs should be larger than that of the low-frequency parts. This rule reveals the fundamental principle for revisiting current graph augmentations and designing new effective ones. We then theoretically prove, via a contrastive invariance theorem, that GCL is able to learn invariance information; together with our GAME rule, we uncover for the first time that the representations learned by GCL essentially encode low-frequency information, which explains why GCL works. Guided by this rule, we propose a spectral graph contrastive learning module (SpCo), a general and GCL-friendly plug-in. We combine it with different existing GCL models, and extensive experiments demonstrate that it can further improve the performance of a wide variety of GCL methods.
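The spectral view behind the GAME rule can be made concrete with a toy graph Fourier decomposition: eigenvectors of the normalized Laplacian with small eigenvalues carry the low-frequency part of a node signal, the rest carry the high-frequency part. The graph, signal, and cut-off below are arbitrary illustrations, not the SpCo module itself:

```python
import numpy as np

# A small undirected graph and its symmetrically normalized Laplacian.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))   # normalized Laplacian
evals, evecs = np.linalg.eigh(L)              # eigenvalues in ascending order

x = np.array([1.0, 2.0, 3.0, 4.0])            # a signal on the nodes
coeffs = evecs.T @ x                          # graph Fourier transform
low  = evecs[:, :2] @ coeffs[:2]              # low-frequency reconstruction
high = evecs[:, 2:] @ coeffs[2:]              # high-frequency remainder
```

The two parts sum exactly back to the original signal, and the smallest eigenvalue is 0 for a connected graph, which is what makes "low-frequency" a well-defined notion here.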
AAAI Conference 2022 Conference Paper
Heterogeneous Graph Neural Networks (HGNNs) have drawn increasing attention in recent years and achieved outstanding performance in many tasks. However, despite their wide use, there is currently no understanding of their robustness to adversarial attacks. In this work, we first systematically study the robustness of HGNNs and show that they can be easily fooled by adding an adversarial edge between the target node and a large-degree node (i.e., a hub). Furthermore, we identify two key reasons for such vulnerabilities of HGNNs: one is the perturbation enlargement effect, i.e., HGNNs, failing to encode transiting probability, enlarge the effect of the adversarial hub compared with GCNs; the other is the soft attention mechanism, which assigns positive attention values to obviously unreliable neighbors. Based on these two facts, we propose a novel robust HGNN framework, RoHe, against topology adversarial attacks by equipping an attention purifier, which can prune malicious neighbors based on topology and features. Specifically, to eliminate the perturbation enlargement, we introduce the metapath-based transiting probability as the prior criterion of the purifier, restraining the confidence of malicious neighbors from the adversarial hub. The purifier then learns to mask out neighbors with low confidence, and thus can effectively alleviate the negative effect of malicious neighbors in the soft attention mechanism. Extensive experiments on different benchmark datasets for multiple HGNNs are conducted, where the considerable improvement of HGNNs under adversarial attacks demonstrates the effectiveness and generalization ability of our defense framework.
IJCAI Conference 2022 Conference Paper
Traditional recommendation usually focuses on utilizing only one target user behavior (e.g., purchase) while ignoring other auxiliary behaviors (e.g., click, add to cart). Early efforts in multi-behavior recommendation often emphasize the differences between multiple behaviors, i.e., they aim to extract useful information by distinguishing different behaviors. However, the commonality between them, which reflects users' common preference for items associated with different behaviors, is largely ignored. Meanwhile, multi-behavior recommendation still severely suffers from a limited supervision signal. In this paper, we propose a novel self-supervised graph collaborative filtering model for multi-behavior recommendation named S-MBRec. Specifically, for each behavior, we execute GCNs to learn the user and item embeddings. Then we design a supervised task, distinguishing the importance of different behaviors, to capture the differences between embeddings. Meanwhile, we propose a star-style contrastive learning task to capture the embedding commonality between target and auxiliary behaviors, so as to alleviate the sparsity of the supervision signal, reduce the redundancy among auxiliary behaviors, and extract the most critical information. Finally, we jointly optimize the above two tasks. Extensive experiments, in comparison with state-of-the-arts, demonstrate the effectiveness of S-MBRec, where the maximum improvement reaches 20%.
IS Journal 2022 Journal Article
This article discusses the impact and significance of the autonomous science movement and the role and potential uses of intelligent technology in DAO-based decentralized science (DeSci) organizations and operations. What is DeSci? How does it relate to the science of team science? What are its potential contributions to multidisciplinary, interdisciplinary, and/or transdisciplinary studies? Does it have any correspondence to the social movement organizations in traditional social sciences or the cyber movement organizations in the new digital age? In particular, issues that DeSci raises for current professional communities, such as IEEE and its societies, conferences, and publications, are addressed, and the effort toward a framework and process of DAO-based DeSci for free, fair, and responsibility-sensitive sciences is reviewed.
NeurIPS Conference 2022 Conference Paper
Recent studies show that graph convolutional networks (GCNs) often perform worse for low-degree nodes, exhibiting the so-called structural unfairness for graphs with long-tailed degree distributions prevalent in the real world. Graph contrastive learning (GCL), which marries the power of GCN and contrastive learning, has emerged as a promising self-supervised approach for learning node representations. How does GCL behave in terms of structural fairness? Surprisingly, we find that representations obtained by GCL methods are already fairer with respect to degree bias than those learned by GCN. We theoretically show that this fairness stems from the intra-community concentration and inter-community scatter properties of GCL, resulting in a much clearer community structure that drives low-degree nodes away from the community boundary. Based on our theoretical analysis, we further devise a novel graph augmentation method, called GRAph contrastive learning for DEgree bias (GRADE), which applies different strategies to low- and high-degree nodes. Extensive experiments on various benchmarks and evaluation protocols validate the effectiveness of the proposed method.
AAAI Conference 2021 Conference Paper
Federated learning (FL) is a promising approach for training on decentralized data located on local client devices while improving efficiency and privacy. However, the distribution and quantity of the training data on the clients' side may lead to significant challenges such as class imbalance and non-IID (non-independent and identically distributed) data, which could greatly impact the performance of the common model. While much effort has been devoted to helping FL models converge when encountering non-IID data, the imbalance issue has not been sufficiently addressed. In particular, as FL training is executed by exchanging gradients in an encrypted form, the training data is not completely observable to either clients or server, and previous methods for class imbalance do not perform well for FL. Therefore, it is crucial to design new methods for detecting class imbalance in FL and mitigating its impact. In this work, we propose a monitoring scheme that can infer the composition of training data for each FL round, and design a new loss function, Ratio Loss, to mitigate the impact of the imbalance. Our experiments demonstrate the importance of acknowledging class imbalance and taking measures as early as possible in FL training, and the effectiveness of our method in mitigating the impact. Our method is shown to significantly outperform previous methods, while maintaining client privacy.
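As a point of reference, once the class composition of a round is known (the paper's monitoring scheme infers it from exchanged gradients), a common mitigation is to reweight the cross-entropy by inverse class frequency. The sketch below shows that generic reweighting idea, not the paper's Ratio Loss itself:

```python
import numpy as np

def weighted_ce(logits, labels, class_counts):
    """Cross-entropy with per-class weights inversely proportional to the
    estimated class frequency. With equal counts the weights are all 1 and
    this reduces to plain cross-entropy."""
    counts = np.asarray(class_counts, dtype=float)
    w = counts.sum() / (len(counts) * counts)           # inverse-frequency weights
    z = logits - logits.max(axis=1, keepdims=True)      # numerically stable log-softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(w[labels] * logp[np.arange(len(labels)), labels]).mean()
```

Under imbalance the minority class gets a weight above 1, so mistakes on minority samples contribute more to the loss.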
NeurIPS Conference 2021 Conference Paper
Although Graph Neural Networks (GNNs) have achieved remarkable accuracy, whether the results are trustworthy is still unexplored. Previous studies suggest that many modern neural networks are over-confident in their predictions; surprisingly, however, we discover that GNNs are primarily in the opposite direction, i.e., GNNs are under-confident. Therefore, confidence calibration for GNNs is highly desired. In this paper, we propose a novel trustworthy GNN model by designing a topology-aware post-hoc calibration function. Specifically, we first verify that the confidence distribution in a graph has the homophily property, and this finding inspires us to design a calibration GNN model (CaGCN) to learn the calibration function. CaGCN is able to obtain a unique transformation from the logits of GNNs to the calibrated confidence for each node; meanwhile, this transformation preserves the order between classes, satisfying the accuracy-preserving property. Moreover, we apply the calibration GNN to a self-training framework, showing that more trustworthy pseudo labels can be obtained with the calibrated confidence, which further improves performance. Extensive experiments demonstrate the effectiveness of our proposed model in terms of both calibration and accuracy.
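CaGCN learns a node-wise, topology-aware calibration function; the simplest accuracy-preserving baseline in this family is global temperature scaling, sketched here for contrast (the logits and temperatures are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scale(logits, T):
    """Post-hoc calibration by dividing logits by a temperature T > 0.
    T > 1 softens over-confident predictions; T < 1 sharpens under-confident
    ones (the GNN case reported in the abstract). Class order is preserved,
    so accuracy is unchanged."""
    return softmax(logits / T)
```

Because dividing by a positive scalar is monotone, the argmax class never changes; only the confidence mass shifts.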
AAAI Conference 2021 Conference Paper
Graph neural networks (GNNs) have been proven to be effective in various network-related tasks. Most existing GNNs usually exploit the low-frequency signals of node features, which gives rise to one fundamental question: is low-frequency information all we need in real-world applications? In this paper, we first present an experimental investigation assessing the roles of low-frequency and high-frequency signals, where the results clearly show that exploring the low-frequency signal only is far from sufficient for learning an effective node representation in different scenarios. How can we adaptively learn more information beyond low-frequency information in GNNs? A well-informed answer can help GNNs enhance their adaptability. We tackle this challenge and propose a novel Frequency Adaptation Graph Convolutional Network (FAGCN) with a self-gating mechanism, which can adaptively integrate different signals in the process of message passing. For a deeper understanding, we theoretically analyze the roles of low-frequency and high-frequency signals in learning node representations, which further explains why FAGCN can perform well on different types of networks. Extensive experiments on six real-world networks validate that FAGCN not only alleviates the over-smoothing problem, but also has advantages over the state-of-the-arts.
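The low/high-frequency distinction can be sketched with the two basic graph filters: a low-pass filter smooths a node toward its neighbors, a high-pass filter sharpens the differences. The single scalar gate below is an assumed simplification for illustration (FAGCN learns edge-wise gates):

```python
import numpy as np

def frequency_adaptive(A, X, alpha):
    """Combine low-pass (I + A_hat) and high-pass (I - A_hat) filtered
    signals with a gate alpha in [-1, 1]; alpha = 1 is pure low-pass,
    alpha = -1 pure high-pass."""
    d = A.sum(1)
    A_hat = A / np.sqrt(np.outer(d, d))       # symmetrically normalized adjacency
    I = np.eye(len(A))
    low = (I + A_hat) @ X                     # low-pass: averages with neighbors
    high = (I - A_hat) @ X                    # high-pass: differences from neighbors
    g = (1 + alpha) / 2                       # map alpha in [-1, 1] to a weight in [0, 1]
    return g * low + (1 - g) * high
```

On a homophilous graph a gate near 1 helps; on a heterophilous graph a gate near -1 preserves the discriminative differences between neighbors.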
IJCAI Conference 2021 Conference Paper
This paper proposes Characteristic Examples for effectively fingerprinting deep neural networks, featuring high robustness to the base model against model pruning as well as low transferability to unassociated models. This is the first work taking both robustness and transferability into consideration for generating realistic fingerprints, whereas current methods lack practical assumptions and may incur large false positive rates. To achieve a better trade-off between robustness and transferability, we propose three kinds of characteristic examples: vanilla C-examples, RC-examples, and LTRC-examples, to derive fingerprints from the original base model. To fairly characterize the trade-off between robustness and transferability, we propose the Uniqueness Score, a comprehensive metric that measures the difference between robustness and transferability, which also serves as an indicator of the false alarm problem. Extensive experiments demonstrate that the proposed characteristic examples achieve superior performance compared with existing fingerprinting methods. In particular, for VGG ImageNet models, using LTRC-examples gives a 4X higher uniqueness score than the baseline method and does not incur any false positives.
IJCAI Conference 2021 Conference Paper
Graph-level representation learning aims to learn low-dimensional representations for entire graphs, which has shown a large impact on real-world applications. Recently, limited by expensive labeled data, contrastive learning based graph-level representation learning has attracted considerable attention. However, these methods mainly focus on graph augmentation for positive samples, while the effect of negative samples is less explored. In this paper, we study the impact of negative samples on learning graph-level representations, and propose a novel curriculum contrastive learning framework for self-supervised graph-level representation, called CuCo. Specifically, we introduce four graph augmentation techniques to obtain the positive and negative samples, and utilize graph neural networks to learn their representations. Then a scoring function is proposed to sort negative samples from easy to hard, and a pacing function automatically selects the negative samples in each training procedure. Extensive experiments on fifteen real-world graph classification datasets, together with a parameter analysis, demonstrate that our proposed CuCo yields truly encouraging results in terms of classification performance and convergence.
AAAI Conference 2021 Conference Paper
Heterogeneous Graph Neural Networks (HGNNs) have drawn increasing attention in recent years and achieved outstanding performance in many tasks. The success of the existing HGNNs relies on one fundamental assumption, i.e., the original heterogeneous graph structure is reliable. However, this assumption is usually unrealistic, since the heterogeneous graph in reality is inevitably noisy or incomplete. Therefore, it is vital to learn the heterogeneous graph structure for HGNNs rather than rely only on the raw graph structure. In light of this, we make the first attempt towards learning an optimal heterogeneous graph structure for HGNNs and propose a novel framework HGSL, which jointly performs Heterogeneous Graph Structure Learning and GNN parameter learning for classification. Different from traditional homogeneous graph structure learning, considering the heterogeneity of different relations in heterogeneous graph, HGSL generates each relation subgraph separately. Specifically, in each generated relation subgraph, HGSL not only considers the feature similarity by generating feature similarity graph, but also considers the complex heterogeneous interactions in features and semantics by generating feature propagation graph and semantic graph. Then, these graphs are fused to a learned heterogeneous graph and optimized together with a GNN towards classification objective. Extensive experiments on real-world graphs demonstrate that the proposed framework significantly outperforms the state-of-the-art methods.
IJCAI Conference 2021 Conference Paper
Heterogeneous information network (HIN) embedding, which learns low-dimensional representations of multi-type nodes, has been applied widely and achieved excellent performance. However, most previous works focus on static heterogeneous networks or learning node embeddings within specific snapshots, and little attention has been paid to the whole evolution process and capturing all temporal dynamics. In order to fill the gap of obtaining multi-type node embeddings by considering all temporal dynamics during the evolution, we propose a novel temporal HIN embedding method (THINE). THINE not only uses an attention mechanism and meta-paths to preserve structures and semantics in HINs but also combines the Hawkes process to simulate the evolution of the temporal network. Our extensive evaluations on various real-world temporal HINs demonstrate that THINE achieves state-of-the-art performance in both static and dynamic tasks, including node classification, link prediction, and temporal link recommendation.
NeurIPS Conference 2021 Conference Paper
Graph Convolutional Networks (GCNs), which aim to obtain the representation of a node by aggregating its neighbors, have demonstrated great power in tackling various analytics tasks on graph (network) data. The remarkable performance of GCNs typically relies on the homophily assumption of networks, while such an assumption cannot always be satisfied, since heterophily or randomness is also widespread in the real world. This gives rise to one fundamental question: should networks with different structural properties adopt different propagation mechanisms? In this paper, we first conduct an experimental investigation. Surprisingly, we discover that there are actually segmentation rules for the propagation mechanism, i.e., 1-hop, 2-hop and $k$-nearest neighbor ($k$NN) neighbors are more suitable as neighborhoods of networks with complete homophily, complete heterophily and randomness, respectively. However, real-world networks are complex, and may present diverse structural properties, e.g., a network dominated by homophily may contain a small amount of randomness. So can we reasonably utilize these segmentation rules to design a universal propagation mechanism independent of the network structural assumption? To tackle this challenge, we develop a new universal GCN framework, namely U-GCN. It first introduces a multi-type convolution to extract information from 1-hop, 2-hop and $k$NN networks simultaneously, and then designs a discriminative aggregation to sufficiently fuse them with respect to the given learning objectives. Extensive experiments demonstrate the superiority of U-GCN over state-of-the-arts. The code and data are available at https://github.com/jindi-tju.
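The three neighborhood types the abstract names can be constructed as follows; this is only the graph-construction step (the multi-type convolution and discriminative aggregation of U-GCN are omitted, and the toy graph is an assumption for illustration):

```python
import numpy as np

def neighborhoods(A, X, k=2):
    """Build 1-hop, strict 2-hop, and feature-space kNN adjacency matrices."""
    n = len(A)
    A1 = (A > 0).astype(float)                     # 1-hop adjacency
    two_hop = (A1 @ A1 > 0).astype(float)          # reachable in exactly two steps
    A2 = two_hop * (1 - A1) * (1 - np.eye(n))      # exclude 1-hop edges and self-loops
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)     # pairwise feature distances
    np.fill_diagonal(d2, np.inf)                   # a node is not its own neighbor
    Aknn = np.zeros((n, n))
    idx = np.argsort(d2, axis=1)[:, :k]            # k nearest nodes in feature space
    np.put_along_axis(Aknn, idx, 1.0, axis=1)
    return A1, A2, Aknn
```

On a homophilous graph A1 is the informative neighborhood; on a heterophilous one A2 tends to reconnect same-class nodes; Aknn ignores the topology entirely, which helps when edges are close to random.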
AAAI Conference 2021 Conference Paper
This paper presents a new high-quality dataset for Very Important Person Localization (VIPLoc), named Unconstrained-7k. Generally, existing datasets are: 1) limited in scale; 2) built under simple and constrained conditions, where the number of disturbing non-VIPs is not large, the scene is relatively simple, and the face of the VIP is always in frontal view and salient. To tackle these problems, the proposed Unconstrained-7k dataset is featured in two aspects. First, it contains over 7,000 annotated images, making it the largest VIPLoc dataset under unconstrained conditions to date. Second, our dataset is collected freely from the Internet, covering multiple scenes, where images are captured in unconstrained conditions. VIPs in the new dataset appear in diverse settings, e.g., large view variation, varying sizes, occlusion, and complex scenes. Meanwhile, each image contains more persons (>20), making the dataset more challenging. As a minor contribution, motivated by the observation that VIPs are highly related not only to neighbors but also to iconic objects, this paper proposes Joint Social Relation and Individual Interaction Graph Neural Networks (JSRII-GNN) for VIPLoc. Experiments show that JSRII-GNN yields competitive accuracy on the NCAA (National Collegiate Athletic Association), MS (Multi-scene), and Unconstrained-7k datasets. https://github.com/xiaowang1516/VIPLoc.
AAAI Conference 2021 Conference Paper
The prosperous development of social e-commerce has spawned diverse recommendation demands, accompanied by a new recommendation paradigm: share recommendation. Significantly different from traditional binary recommendations (e.g., item recommendation and friend recommendation), share recommendation models ternary interactions among ⟨User, Item, Friend⟩, aiming to recommend the most likely friend to a user who would like to share a specific item, and is progressively becoming an indispensable service in social e-commerce. Seamlessly integrating social relations and purchase behaviours, share recommendation improves user stickiness and monetizes user influence, meanwhile encountering three unique challenges: rich heterogeneous information, complex ternary interaction, and asymmetric share action. In this paper, we first study the share recommendation problem and propose a heterogeneous graph neural network based share recommendation model, called HGSRec. Specifically, HGSRec delicately designs tripartite heterogeneous GNNs to describe the multifold characteristics of users and items, and then dynamically fuses them by capturing potential ternary dependency with a dual co-attention mechanism, followed by a transitive triplet representation to depict the asymmetry of the share action and predict whether it happens. Offline experiments demonstrate the superiority of the proposed HGSRec with significant improvements (11.7%–14.5%) over the state-of-the-arts, and online A/B testing on the Taobao platform further demonstrates the high industrial practicability and stability of HGSRec.
IJCAI Conference 2020 Conference Paper
Most existing clustering algorithms are proposed without considering selection bias in data. In many real applications, however, one cannot guarantee the data is unbiased. Selection bias might bring unexpected correlations between features, and ignoring those unexpected correlations will hurt the performance of clustering algorithms. Therefore, how to remove those unexpected correlations induced by selection bias is extremely important yet largely unexplored for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer aims to learn global sample weights that are capable of balancing the sample distribution, so as to remove unexpected correlations among features. Meanwhile, the learned weights are combined with k-means, which makes the reweighted k-means cluster on the inherent data distribution without the influence of unexpected correlations. Moreover, we derive the updating rules to effectively infer the parameters in DCKM. Extensive experimental results on real-world datasets demonstrate that our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias when clustering.
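The reweighting half of the idea can be sketched as a sample-weighted Lloyd iteration: each point carries a weight, and centroids become weighted means. This is a minimal sketch of the mechanism only; the decorrelation regularizer that actually *learns* the weights in DCKM is omitted:

```python
import numpy as np

def weighted_kmeans(X, w, k, iters=50, seed=0):
    """k-means where sample i carries weight w_i in the centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(iters):
        # assign each point to its nearest center
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        z = d2.argmin(1)
        # weighted centroid update per cluster
        for j in range(k):
            m = z == j
            if m.any():
                centers[j] = (w[m, None] * X[m]).sum(0) / w[m].sum()
    return z, centers
```

With uniform weights this reduces to ordinary k-means; non-uniform weights shift the centroids toward the samples the weighting scheme deems representative of the unbiased distribution.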
NeurIPS Conference 2020 Conference Paper
Sampling is a fundamental and arguably very important task with numerous applications in Machine Learning. One approach to sample from a high dimensional distribution $e^{-f}$ for some function $f$ is the Langevin Algorithm (LA). Recently, there has been a lot of progress in showing fast convergence of LA even in cases where $f$ is non-convex, notably \cite{VW19}, \cite{MoritaRisteski}, in which the former paper focuses on functions $f$ defined on $\mathbb{R}^n$ and the latter focuses on functions with symmetries (like matrix completion type objectives) with manifold structure. Our work generalizes the results of \cite{VW19} to the case where $f$ is defined on a manifold $M$ rather than $\mathbb{R}^n$. From a technical point of view, we show that the KL divergence decreases at a geometric rate whenever the distribution $e^{-f}$ satisfies a log-Sobolev inequality on $M$.
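The Euclidean Langevin Algorithm the abstract starts from is just gradient descent with injected Gaussian noise; the paper's contribution is extending the analysis to a manifold $M$. A minimal Euclidean sketch, sampling from a 1-D standard Gaussian ($f(x) = x^2/2$, so $\nabla f(x) = x$; the step size and chain count are illustrative):

```python
import numpy as np

def langevin_step(x, grad_f, eta, rng):
    """One step of the (unadjusted, Euclidean) Langevin Algorithm for
    sampling from exp(-f): descend the gradient, then add noise scaled
    by sqrt(2 * eta)."""
    noise = rng.standard_normal(x.shape)
    return x - eta * grad_f(x) + np.sqrt(2 * eta) * noise

# Run 5000 independent chains targeting the standard Gaussian exp(-x^2/2).
rng = np.random.default_rng(0)
x = np.zeros(5000)
for _ in range(200):
    x = langevin_step(x, lambda v: v, 0.05, rng)
```

After mixing, the empirical distribution of the chains is close to N(0, 1), up to the O(eta) discretization bias of the unadjusted scheme.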
AAAI Conference 2020 Conference Paper
We address the problem of disentangled representation learning with independent latent factors in graph convolutional networks (GCNs). Current methods usually learn node representations by describing the neighborhood as a perceptual whole in a holistic manner, ignoring the entanglement of the latent factors. However, a real-world graph is formed by the complex interaction of many latent factors (e.g., the same hobby, education, or work in a social network), and little effort has been made toward exploring disentangled representations in GCNs. In this paper, we propose a novel Independence Promoted Graph Disentangled Network (IPGDN) to learn disentangled node representations while enhancing the independence among them. In particular, we first present disentangled representation learning via a neighborhood routing mechanism, and then employ the Hilbert-Schmidt Independence Criterion (HSIC) to enforce independence between the latent representations, which is effectively integrated into a graph convolutional framework as a regularizer at the output layer. Experimental studies on real-world graphs validate our model and demonstrate that our algorithms outperform the state-of-the-arts by a wide margin in different network applications, including semi-supervised graph classification, graph clustering and graph visualization.
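The HSIC regularizer mentioned above has a simple empirical form: center two kernel Gram matrices and take the trace of their product. A minimal sketch of the biased estimator with Gaussian kernels (the bandwidth and usage here are illustrative, not IPGDN's training setup):

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between samples X and Y (rows are samples).
    Values near zero indicate (approximate) independence; larger values
    indicate dependence."""
    n = len(X)
    def gram(Z):
        sq = ((Z[:, None, :] - Z[None]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-sq / (2 * sigma ** 2))           # Gaussian kernel matrix
    H = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
    K, L = gram(X), gram(Y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Used as a regularizer, minimizing hsic between two latent factors pushes their representations toward statistical independence.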
IJCAI Conference 2020 Conference Paper
Network embedding, mapping nodes in a network to a low-dimensional space, achieves powerful performance. An increasing number of works focus on static network embedding; however, little attention has been paid to temporal network embedding, especially the effect of mesoscopic dynamics as the network evolves. In light of this, we concentrate on a particular motif --- the triad --- and its temporal dynamics to study temporal network embedding. Specifically, we propose MTNE, a novel embedding model for temporal networks. MTNE not only integrates the Hawkes process to model the triad evolution process, thereby preserving motif-aware high-order proximities, but also combines an attention mechanism to better distinguish the importance of different types of triads. Experiments on various real-world temporal networks demonstrate that, compared with several state-of-the-art methods, our model achieves the best performance in both static and dynamic tasks, including node classification, link prediction, and link recommendation.
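For readers unfamiliar with the Hawkes process that MTNE builds on, a univariate conditional intensity can be sketched as follows. The exponential kernel and the parameter values are generic textbook choices, not the paper's exact parameterization:

```python
import math

def hawkes_intensity(t, base_rate, history, alpha=0.5, delta=1.0):
    # Conditional intensity of a univariate Hawkes process:
    #   lambda(t) = mu + sum_{t_i < t} alpha * exp(-delta * (t - t_i)).
    # In an MTNE-style model, past triad-closure events play the role of
    # the history and excite the rate of future closures.
    excitation = sum(alpha * math.exp(-delta * (t - ti))
                     for ti in history if ti < t)
    return base_rate + excitation

events = [0.5, 1.0, 1.2]
lam = hawkes_intensity(2.0, base_rate=0.1, history=events)
```

Each past event contributes a decaying boost to the intensity, which is what lets the model capture bursty, self-exciting evolution of motifs.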
AAAI Conference 2020 Conference Paper
The interactions of users and items in a recommender system can be naturally modeled as a user-item bipartite graph. In recent years, we have witnessed an emerging research effort in exploring the user-item graph for collaborative filtering methods. Nevertheless, user-item interactions typically arise from highly complex latent purchasing motivations, such as high cost performance or eye-catching appearance, which are indistinguishably represented by the edges. Existing approaches leave the differences between various purchasing motivations unexplored, and are thus unable to capture fine-grained user preferences. Therefore, in this paper we propose a novel Multi-Component graph convolutional Collaborative Filtering (MCCF) approach to distinguish the latent purchasing motivations underneath the observed explicit user-item interactions. Specifically, MCCF contains two elaborately designed modules, a decomposer and a combiner. The former decomposes the edges in the user-item graph to identify the latent components that may cause the purchasing relationship; the latter then recombines these latent components automatically to obtain unified embeddings for prediction. Furthermore, a sparse regularizer and a weighted random sampling strategy are utilized to alleviate overfitting and accelerate optimization. Empirical results on three real datasets and a synthetic dataset not only show significant performance gains for MCCF, but also demonstrate the necessity of considering multiple components.
IJCAI Conference 2020 Conference Paper
As heterogeneous networks have become increasingly ubiquitous, Heterogeneous Information Network (HIN) embedding, aiming to project nodes into a low-dimensional space while preserving the heterogeneous structure, has drawn increasing attention in recent years. Many of the existing HIN embedding methods adopt meta-path guided random walks to retain both the semantics and structural correlations between different types of nodes. However, the selection of meta-paths remains an open problem, which either depends on domain knowledge or is learned from label information. As a uniform blueprint of a HIN, the network schema comprehensively embraces the high-order structure and contains rich semantics. In this paper, we make the first attempt to study network schema preserving HIN embedding, and propose a novel model named NSHE. In NSHE, a network schema sampling method is first proposed to generate sub-graphs (i.e., schema instances), and then a multi-task learning objective is built to preserve the heterogeneous structure of each schema instance. Besides preserving pairwise structure information, NSHE is able to retain high-order structure (i.e., the network schema). Extensive experiments on three real-world datasets demonstrate that NSHE significantly outperforms the state-of-the-art methods.
AAAI Conference 2020 Conference Paper
We introduce a novel and efficient algorithm called stochastic approximate gradient descent (SAGD), as an alternative to stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained. Traditional methods for such problems rely on general-purpose sampling techniques such as Markov chain Monte Carlo, which typically require manual parameter tuning and do not work efficiently in practice. Instead, SAGD makes use of the Langevin algorithm to construct stochastic gradients that are biased in finite steps but accurate asymptotically, enabling us to establish a theoretical convergence guarantee for SAGD. Inspired by our theoretical analysis, we also provide useful guidelines for its practical implementation. Finally, we show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and variational autoencoders.
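A toy version of the idea, with hypothetical names and a one-dimensional Gaussian target, might look like this: an inner Langevin chain provides a biased but asymptotically accurate estimate of an expectation appearing in the gradient. This is an illustrative reading, not the paper's algorithm verbatim:

```python
import numpy as np

def langevin_mean(theta, n_steps, eta, rng):
    # Short Langevin chain targeting N(theta, 1); the chain average is a
    # biased (finite-step) but asymptotically accurate estimate of E[z] = theta.
    z, total = theta, 0.0
    for _ in range(n_steps):
        z = z - eta * (z - theta) + np.sqrt(2.0 * eta) * rng.standard_normal()
        total += z
    return total / n_steps

# Toy objective 0.5 * (E_pi[z] - mu)^2 with pi = N(theta, 1): the exact
# gradient is (theta - mu); the SAGD-style loop replaces E_pi[z] with the
# chain estimate instead of computing the expectation in closed form.
rng = np.random.default_rng(0)
mu, theta = 3.0, 0.0
for _ in range(500):
    grad_est = langevin_mean(theta, n_steps=50, eta=0.1, rng=rng) - mu
    theta -= 0.1 * grad_est
```

Despite the finite-step bias of each inner chain, the outer iterates still settle near the minimizer `mu`.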
IJCAI Conference 2020 Conference Paper
Pedestrian detection at nighttime is a crucial and frontier problem in surveillance, but has not been well explored by the computer vision and artificial intelligence communities. Most existing methods detect pedestrians under favorable lighting conditions (e.g., daytime) and achieve promising performance. In contrast, they often fail under unstable lighting conditions (e.g., nighttime), and night is a critical time for criminal suspects to act in the field of security. The existing nighttime pedestrian detection dataset is captured by a car-mounted camera, specially designed for autonomous driving scenarios; a dataset for the nighttime surveillance scenario is still lacking, and there are vast differences between autonomous driving and surveillance, including viewpoint and illumination. In this paper, we build a novel pedestrian detection dataset from the nighttime surveillance aspect: NightSurveillance. As a benchmark dataset for pedestrian detection at nighttime, we compare the performance of state-of-the-art pedestrian detectors, and the results reveal that these methods cannot solve all the challenging problems of NightSurveillance. We believe that NightSurveillance can further advance the research of pedestrian detection, especially in the field of surveillance security at nighttime.
NeurIPS Conference 2019 Conference Paper
In a series of papers [Lee et al 2016], [Panageas and Piliouras 2017], [Lee et al 2019], it was established that some of the most commonly used first order methods, almost surely (under random initializations) and with a small enough step-size, avoid strict saddle points, as long as the objective function $f$ is $C^2$ and has Lipschitz gradient. The key observation was that first order methods can be studied from a dynamical systems perspective, in which instantiations of the Center-Stable Manifold Theorem allow for a global analysis. The results of the aforementioned papers were limited to the case where the step-size $\alpha$ is constant, i.e., does not depend on time (and is typically bounded by the inverse of the Lipschitz constant of the gradient of $f$). It remained an open question whether the results still hold when the step-size is time dependent and vanishes with time. In this paper, we resolve this question in the affirmative for gradient descent, mirror descent, manifold descent and proximal point. The main technical challenge is that the dynamical system induced by each first order method is time non-homogeneous, and the stable manifold theorem is not applicable in its classic form. By exploiting the dynamical systems structure of the aforementioned first order methods, we prove a stable manifold theorem that is applicable to time non-homogeneous dynamical systems and generalize the results in [Lee et al 2019] to time dependent step-sizes.
AAAI Conference 2019 Conference Paper
Heterogeneous information network (HIN) embedding, aiming to project a HIN into a low-dimensional space, has attracted considerable research attention. Most of the existing HIN embedding methods focus on preserving the inherent network structure and semantic correlations in Euclidean spaces. However, one fundamental question is whether Euclidean spaces are the appropriate or intrinsic isometric spaces of HINs. Recent research argues that complex networks may have an underlying hyperbolic geometry, because hyperbolic geometry naturally reflects some properties of complex networks, e.g., hierarchical and power-law structure. In this paper, we make the first effort toward HIN embedding in hyperbolic spaces. We analyze the structures of two real-world HINs and discover that some properties, e.g., the power-law distribution, also exist in HINs. Therefore, we propose a novel hyperbolic heterogeneous information network embedding model. Specifically, to capture the structural and semantic relations between nodes, we employ meta-path guided random walks to sample sequences for each node, and we exploit the distance in hyperbolic space as the proximity measurement. The hyperbolic distance satisfies the triangle inequality and well preserves transitivity in a HIN, and our model enables nodes and their neighborhoods to have small hyperbolic distances. We further derive an effective optimization strategy to update the hyperbolic embeddings iteratively. The experimental results, in comparison with the state-of-the-art, demonstrate that our proposed model not only has superior performance on network reconstruction and link prediction tasks but also shows its ability to capture hierarchical structure in HINs via visualization.
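The hyperbolic proximity measurement can be illustrated with the distance in the Poincare ball model (assuming the Poincare model for concreteness; the paper's choice of hyperbolic model may differ):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    # Distance in the Poincare ball model of hyperbolic space:
    # d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2) (1 - ||v||^2))).
    uu = 1.0 - np.dot(u, u)
    vv = 1.0 - np.dot(v, v)
    delta = np.dot(u - v, u - v)
    return np.arccosh(1.0 + 2.0 * delta / max(uu * vv, eps))

origin = np.zeros(2)
near = np.array([0.1, 0.0])
far = np.array([0.9, 0.0])  # near the boundary: distances grow rapidly
```

Points close to the boundary of the ball are exponentially far from the origin, which is what lets hyperbolic embeddings represent hierarchies and power-law degree distributions compactly.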
IJCAI Conference 2019 Conference Paper
Despite achieving remarkable success in various domains, recent studies have uncovered the vulnerability of deep neural networks to adversarial perturbations, raising concerns about model generalizability and new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation of layer inputs have been shown to be promising for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement comes at the cost of noticeable performance degradation on legitimate data, e.g., a large drop in test accuracy. This paper is motivated by the pursuit of a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose the Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts at the cost of a drop in test accuracy for any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. An HRS-protected model contains several blocks of randomly switching channels to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending against adversarial reprogramming, making it the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than that of current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off.
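One plausible ratio-style reading of such a score is sketched below. The formula and all names here are hypothetical; the paper's exact definition of DES may differ:

```python
def defense_efficiency_score(defended_fail_rate, baseline_fail_rate,
                             baseline_acc, defended_acc, eps=1e-6):
    # Hypothetical reading of a defense-efficiency score: gain in
    # unsuccessful attack attempts per unit drop in clean test accuracy.
    gain = defended_fail_rate - baseline_fail_rate
    cost = max(baseline_acc - defended_acc, eps)
    return gain / cost

# A defense that blocks 60% more attacks at a 2-point accuracy drop.
score = defense_efficiency_score(0.9, 0.3, 0.95, 0.93)
```

Under this reading, a defense is efficient when it buys a large increase in failed attacks for a small sacrifice in clean accuracy.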
IJCAI Conference 2019 Conference Paper
This paper presents a framework for norm-based capacity control with respect to an $l_{p,q}$-norm in weight-normalized Residual Neural Networks (ResNets). We first formulate the representation of each residual block. For the regression problem, we analyze the Rademacher complexity of the ResNet family, and we establish a tighter generalization upper bound for weight-normalized ResNets in a more general setting. Using $l_{p,q}$-norm weight normalization with $1/p + 1/q \ge 1$, we discuss the properties of a width-independent capacity control, which relies on the depth only through a square root term. Several comparisons suggest that our result is tighter than previous work. Parallel results for Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are included by introducing $l_{p,q}$-norm weight normalization for DNNs and $l_{p,q}$-norm kernel normalization for CNNs. Numerical experiments also verify that ResNet structures contribute to better generalization properties.
IJCAI Conference 2018 Conference Paper
Trust prediction, aiming to predict the trust relations between users in a social network, is key to helping users discover reliable information. Many trust prediction methods are based on the low-rank assumption of a trust network. However, one typical property of trust networks is that the trust relations follow a power-law distribution, i.e., few users are trusted by many other users, while most tail users have few trustors. Due to these tail users, the fundamental low-rank assumption made by existing methods is seriously violated and becomes unrealistic. In this paper, we propose a simple yet effective method to address the problem of the violated low-rank assumption. Instead of discovering the low-rank component of the trust network alone, we simultaneously learn a sparse component of the trust network to describe the tail users. With both the learned low-rank and sparse components, the trust relations in the whole network can be better captured. Moreover, the transitive closure structure of the trust relations is also integrated into our model. We then derive an effective iterative algorithm to infer the parameters of our model, along with a proof of correctness. Extensive experimental results on real-world trust networks demonstrate the superior performance of our proposed method over the state-of-the-art.
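A minimal alternating sketch of the low-rank-plus-sparse idea uses singular value thresholding for the low-rank part and entrywise soft-thresholding for the sparse part. This is a simplified stand-in for the paper's model, which additionally incorporates the transitive closure structure (omitted here); thresholds are arbitrary:

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: shrink singular values by tau.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    # Entrywise soft-thresholding, producing the sparse component.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_plus_sparse(T, tau_l=1.0, tau_s=0.1, n_iter=50):
    # T ~ L + S: L models the low-rank core of the trust network,
    # S absorbs the tail users that violate the low-rank assumption.
    L = np.zeros_like(T)
    S = np.zeros_like(T)
    for _ in range(n_iter):
        L = svt(T - S, tau_l)
        S = soft(T - L, tau_s)
    return L, S

# Rank-1 "core" trust pattern plus a few tail-user spikes.
T = np.ones((10, 10))
T[7, 0] = T[8, 1] = T[9, 2] = 6.0
L, S = lowrank_plus_sparse(T)
```

By construction of the final soft-thresholding step, the entrywise residual of T - L - S is bounded by the sparse threshold, while the spikes end up in S rather than distorting L.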
IJCAI Conference 2018 Conference Paper
Nonnegative matrix factorization (NMF), a well-known technique for finding parts-based representations of nonnegative data, has been widely studied. In reality, ordinal relations often exist among data, such as data point i being more related to j than to q. Such relative order is naturally available and, more importantly, truly reflects the latent data structure. Preserving the ordinal relations enables us to find structured representations of data that are faithful to the relative order, so that the learned representations become more discriminative. However, current NMFs pay no attention to this. In this paper, we make the first attempt to incorporate ordinal relations and propose a novel ranking preserving nonnegative matrix factorization (RPNMF) approach, which enforces the learned representations to be ranked according to the relations. We derive iterative updating rules to solve RPNMF's objective function with guaranteed convergence. Experimental results on several datasets for clustering and classification demonstrate that RPNMF outperforms the state-of-the-art, not only in terms of accuracy but also in interpreting the ordinal structure of the data.
AAAI Conference 2018 Conference Paper
Network embedding has recently attracted much attention in data mining. Existing network embedding methods mainly focus on networks with pairwise relationships. In the real world, however, relationships among data points can go beyond pairwise, i.e., three or more objects may be involved in each relationship, represented by a hyperedge, thus forming hyper-networks. These hyper-networks pose great challenges to existing network embedding methods when the hyperedges are indecomposable, that is, no subset of nodes in a hyperedge can form another hyperedge. Such indecomposable hyperedges are especially common in heterogeneous networks. In this paper, we propose a novel Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with indecomposable hyperedges. More specifically, we theoretically prove that any linear similarity metric in the embedding space commonly used by existing methods cannot maintain the indecomposability property of hyper-networks, and thus propose a new deep model to realize a non-linear tuplewise similarity function while preserving both local and global proximities in the formed embedding space. We conduct extensive experiments on four different types of hyper-networks, including a GPS network, an online social network, a drug network and a semantic network. The empirical results demonstrate that our method can significantly and consistently outperform the state-of-the-art algorithms.
AAAI Conference 2018 Conference Paper
Singular Value Decomposition (SVD) is a popular approach in various network applications, such as link prediction and network parameter characterization. Incremental SVD approaches have been proposed to process newly changed nodes and edges in dynamic networks. However, incremental SVD approaches inevitably suffer from serious error accumulation due to the approximation of incremental updates. Restarting the SVD is an effective way to reset the aggregated error, but when to restart SVD for dynamic networks has not been addressed in the literature. In this paper, we propose TIMERS, Theoretically Instructed Maximum-Error-bounded Restart of SVD, a novel approach that optimally sets the restart time in order to reduce error accumulation over time. Specifically, we monitor the margin between the reconstruction loss of incremental updates and the minimum loss of the SVD model. To reduce the complexity of monitoring, we theoretically develop a lower bound on the SVD minimum loss for dynamic networks and use this bound in place of the minimum loss in monitoring. By setting a maximum tolerated error as a threshold, we trigger an SVD restart automatically when the margin exceeds this threshold. We prove that the time complexity of our method is linear with respect to the number of local dynamic changes, and that our method is general across different types of dynamic networks. We conduct extensive experiments on several synthetic and real dynamic networks. The experimental results demonstrate that our proposed method significantly outperforms existing methods, reducing the maximum error of dynamic network reconstruction by 27% to 42% for a fixed number of restarts, and reducing the number of restarts by 25% to 50% for a fixed maximum tolerated error.
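The restart rule described above can be paraphrased as a simple monitoring loop. The numbers below are illustrative; in TIMERS the lower bound on the minimum loss is derived theoretically rather than supplied by hand:

```python
def first_restart(losses, bounds, threshold):
    # Trigger a restart at the first time step where the margin between the
    # incremental-update loss and the lower bound on the optimal SVD loss
    # exceeds the maximum tolerated error; return None if never triggered.
    for t, (loss, bound) in enumerate(zip(losses, bounds)):
        if loss - bound > threshold:
            return t
    return None

losses = [0.10, 0.14, 0.19, 0.27, 0.40]  # drifting incremental-update loss
bounds = [0.10, 0.10, 0.11, 0.11, 0.12]  # lower bound on the minimum loss
t_restart = first_restart(losses, bounds, threshold=0.1)
```

Monitoring the margin against a bound, rather than recomputing the exact minimum loss at every step, is what keeps the per-update cost low.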
NeurIPS Conference 2018 Conference Paper
This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight-normalized deep neural networks. We establish an upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $q\le p^*$ and $1/p+1/p^{*}=1$, we discuss properties of a width-independent capacity control, which depends on the depth only by a square root term. We further analyze the approximation properties of $L_{p,q}$ weight-normalized deep neural networks. In particular, for an $L_{1,\infty}$ weight-normalized network, the approximation error can be controlled by the $L_1$ norm of the output layer, and the corresponding generalization error depends on the architecture only through the square root of the depth.
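For finite $p$ and $q$, the $L_{p,q}$ norm of a weight matrix can be computed as below. This assumes the convention of an $l_p$ norm over each row followed by an $l_q$ norm over the vector of row norms; whether rows or columns are grouped first depends on the layer convention:

```python
import numpy as np

def lpq_norm(W, p, q):
    # ||W||_{p,q}: l_p norm of each row (e.g., the incoming weights of a
    # unit), then the l_q norm of the resulting vector of row norms.
    row_norms = np.sum(np.abs(W) ** p, axis=1) ** (1.0 / p)
    return np.sum(row_norms ** q) ** (1.0 / q)
```

The $L_{1,\infty}$ case in the abstract corresponds to taking the maximum of the row-wise $l_1$ norms, which this finite-$q$ sketch does not cover.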
AAAI Conference 2017 Conference Paper
Network embedding, aiming to learn low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and the community structure, and jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned node representations to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over the state-of-the-art.
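Modularity, the quantity the community detection term builds on, can be sketched as follows for an undirected graph. This is standard Newman modularity; M-NMF's exact coupling of this term with the NMF objective is more involved:

```python
import numpy as np

def modularity_matrix(A):
    # Newman's modularity matrix B = A - d d^T / (2m), where d is the
    # degree vector and m the number of edges.
    d = A.sum(axis=1)
    two_m = d.sum()
    return A - np.outer(d, d) / two_m

def modularity(A, communities):
    # Q = (1 / 2m) * sum of B_ij over ordered node pairs in the same community.
    B = modularity_matrix(A)
    two_m = A.sum()
    same = communities[:, None] == communities[None, :]
    return float((B * same).sum()) / two_m

# Two disconnected triangles: a perfect two-community split gives Q = 0.5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
```

Maximizing Q rewards partitions whose within-community edge density exceeds what a random graph with the same degrees would produce.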
IJCAI Conference 2017 Conference Paper
Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of the data and provides information the others do not have. Therefore, exploring the semantic information of multiple components, as well as the diversity among them, is of great benefit to understanding data comprehensively and in depth. However, this cannot be achieved by current nonnegative matrix factorization (NMF)-based methods, even though NMF has shown remarkable competitiveness in learning parts-based representations of data. To overcome this limitation, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking only one representation of the data, MCNMF learns multiple representations simultaneously, with the help of the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to a component. By integrating the multiple representations, a more comprehensive representation is then established. A new iterative updating optimization scheme is derived to solve the objective function of MCNMF, along with correctness and convergence guarantees. Extensive experimental results on real-world datasets show that MCNMF not only achieves more accurate performance than the state-of-the-art methods using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMFs can offer.
YNICL Journal 2016 Journal Article
Mild traumatic brain injury (mTBI) accounts for over one million emergency visits each year in the United States. The large-scale structural and functional network connectivity changes of mTBI are still unknown. This study was designed to determine the connectome-scale brain network connectivity changes in mTBI at both structural and functional levels. 40 mTBI patients at the acute stage and 50 healthy controls were recruited. A novel approach called Dense Individualized and Common Connectivity-based Cortical Landmarks (DICCCOLs) was applied for connectome-scale analysis of both diffusion tensor imaging and resting state functional MRI data. Among 358 networks identified on DICCCOL analysis, 41 networks were identified as structurally discrepant between patient and control groups. The involved major white matter tracts include the corpus callosum, and superior and inferior longitudinal fasciculi. Functional connectivity analysis identified 60 connectomic signatures that differentiate patients from controls with 93.75% sensitivity and 100% specificity. Analysis of functional domains showed decreased intra-network connectivity within the emotion network and among emotion-cognition interactions, and increased interactions among action-emotion and action-cognition as well as within perception networks. This work suggests that mTBI may result in changes of structural and functional connectivity on a connectome scale at the acute stage.
IJCAI Conference 2016 Conference Paper
Identification of module or community structures is important for characterizing and understanding complex systems. While designed with different objectives, i.e., stochastic models for regeneration and modularity maximization models for discrimination, both types of models look for a low-rank embedding that best represents and reconstructs the network topology. However, the mapping through such an embedding is linear, whereas real networks have various nonlinear features, making these models less effective in practice. Inspired by the strong representational power of deep neural networks, we propose a novel nonlinear reconstruction method that adopts deep neural networks for representation. We then extend the method to a semi-supervised community detection algorithm by incorporating pairwise constraints among graph nodes. Extensive experimental results on synthetic and real networks show that the new methods are effective, outperforming most state-of-the-art methods for community detection.
AAAI Conference 2016 Conference Paper
Identification of the modular or community structure of a network is key to understanding the semantics and functions of the network. While many network community detection methods have been developed that primarily explore network topologies, they provide little semantic information about the communities discovered. Although structures and semantics are closely related, little effort has been made to discover and analyze these two essential network properties together. By integrating network topology and semantic information on nodes, e.g., node attributes, we study the problems of detecting communities and inferring their semantics simultaneously. We propose a novel nonnegative matrix factorization (NMF) model with two sets of parameters, the community membership matrix and the community attribute matrix, and present efficient updating rules to evaluate the parameters with a convergence guarantee. The use of node attributes improves community detection and provides a semantic interpretation of the resultant network communities. Extensive experimental results on synthetic and real-world networks not only show the superior performance of the new method over state-of-the-art approaches, but also demonstrate its ability to semantically annotate the communities.
YNICL Journal 2015 Journal Article
Higher local carotid artery strain has previously been shown to be a characteristic of unstable carotid plaques. These plaques may be characterized by microvascular changes that predispose to intraplaque hemorrhage, increasing the likelihood of embolization. Little is known, however, about how these strain indices correspond with imaging markers of brain health and metrics of brain structure. White matter hyperintensities (WMHs), which are bright regions seen on T2-weighted brain MRI imaging, are postulated to result from cumulative ischemic vascular injury. Consequently, we hypothesized that plaques that are more prone to microvascular changes and embolization, represented by higher strain indices on ultrasound, would be associated with an increased amount of WMH lesion volume. This relationship would suggest not only emboli as a cause for the brain degenerative changes, but more importantly, a common microvascular etiology for large and small vessel contributions to this process. Subjects scheduled to undergo a carotid endarterectomy were recruited from a neurosurgery clinic. Prior to surgery, participating subjects underwent both ultrasound strain imaging and brain MRI scans as part of a larger clinical study on vascular health and cognition. A linear regression found that maximum absolute strain and peak-to-peak strain in the surgical-side carotid artery were predictive of WMH burden. Furthermore, the occurrence of microembolic signals monitored using transcranial Doppler (TCD) ultrasound examinations also correlated with increasing lesion burden. It is becoming increasingly recognized that cognitive decline is often multifactorial in nature. One contributing extra-brain factor may be changes in the microvasculature that produce unstable carotid artery plaques. In this study, we have shown that higher strain indices in carotid artery plaques are significantly associated with an increased WMH burden, a marker of vascular-mediated brain damage.
IS Journal 2013 Journal Article
Research on social media has been applied to various academic fields. During this year's Chinese National Holiday traffic congestion event, online users showed great enthusiasm on social media, such as forums, Weibo, communities, and other platforms. This article describes the construction of a dynamic evolution network, analyzes shifts in online users' attention, and studies the geographic distribution of travelers by analyzing online users' attributes.
AAAI Conference 2012 Conference Paper
As the popularity of social media increases, as evidenced by Twitter, Facebook and China's Renren, spamming activities have also picked up in number and variety. On social network sites, spammers often disguise themselves by creating fake accounts and hijacking normal users' accounts for personal gain. Unlike spammers in traditional systems such as SMS and email, spammers in social media behave like normal users and continually change their spamming strategies to fool anti-spamming systems. However, due to privacy and resource concerns, many social media websites cannot fully monitor all user content, making many previous approaches, such as topology-based and content-classification-based methods, infeasible. In this paper, we propose a Supervised Matrix Factorization method with Social Regularization (SMFSR) for spammer detection in social networks that exploits both social activities and users' social relations in an innovative and highly scalable manner. The proposed method detects spammers collectively based on users' social actions and social relations. We have empirically tested our method on data from Renren.com, one of the largest social networks in China, and demonstrated that our new method can improve detection performance significantly.