Arrow Research search

Author name cluster

Jie Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

98 papers
2 author rows

Possible papers (98)

TAAS Journal 2026 Journal Article

A Novel Physics-Informed Federated Learning Framework for Robust Bearing Fault Diagnosis

  • Jiaqi Chen
  • Jie Wang
  • Yongquan Jiang
  • ZhengHong Wang
  • Fan Zhang
  • Yan Yang

Rolling bearing failures are a primary cause of catastrophic machinery breakdowns, posing significant economic and safety risks. Effective fault diagnosis is frequently hindered by challenges inherent to modern industrial settings, including data privacy constraints, statistical heterogeneity across Non-Independent and Identically Distributed (Non-IID) datasets, and the prevalence of few-shot learning scenarios. To address these challenges, this paper introduces CARR-MgNet, a novel physics-informed federated learning framework. The framework utilizes a Multi-granularity fusion Network (MgNet) backbone, which enhances feature robustness by embedding physical fault characteristics directly into its convolutional kernels. To ensure stable federated training across heterogeneous clients, we then introduce a Class-Average Representation Regularization (CARR) mechanism to effectively mitigate client drift. Extensive experiments on four public industrial datasets validate the state-of-the-art performance of our proposed framework. Under challenging non-IID conditions, CARR-MgNet surpasses established baselines, including FedProx and MOON, by up to 8.2% in accuracy. Furthermore, it reduces the number of communication rounds required to reach 95% accuracy by 40% compared to FedAvg and reduces total communication overhead by 35%. These results demonstrate that our physics-informed federated approach provides a robust, communication-efficient, and privacy-preserving solution for real-world industrial fault diagnosis.
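The abstract does not spell out the CARR formulation; as a rough numpy sketch of the general idea only, a class-average representation penalty might compare each class's local mean feature against a server-provided global class mean (the function name, penalty form, and weight below are assumptions, not the paper's definition):

```python
import numpy as np

def carr_penalty(features, labels, global_class_means, weight=0.1):
    """Illustrative class-average representation regularizer.

    Penalizes the squared distance between each class's local mean
    feature and the global class mean supplied by the server, which
    discourages client drift under non-IID data.
    """
    penalty = 0.0
    for cls, global_mean in global_class_means.items():
        mask = labels == cls
        if mask.any():
            local_mean = features[mask].mean(axis=0)
            penalty += np.sum((local_mean - global_mean) ** 2)
    return weight * penalty
```

In a federated round, each client would add this term to its local task loss before sending model updates back to the server.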

EAAI Journal 2026 Journal Article

A short-term water demand forecasting method integrating wavelet stepwise decomposition and spatial-temporal features

  • Chenlei Xie
  • Jie Wang
  • Tao Chen
  • Qiansheng Fang
  • Shanshou Li
  • Xuelei Yang

Accurate short-term water demand forecasting is crucial for the management and scheduling of water distribution systems. However, existing decomposition-based prediction models face two major challenges: prevalent data leakage during global decomposition, which distorts model evaluation, and the inherent shift-variance in methods designed to avoid leakage, resulting in poor forecasting accuracy. To address these issues, this paper proposes an innovative forecasting framework integrating Wavelet stepwise decomposition (WSD) with spatial-temporal features. The core contributions of this work are threefold: First, the proposed WSD method employs a fixed-length sliding window for decomposition, fundamentally eliminating data leakage. Second, correlation analysis is introduced to optimize the selection of the mother wavelet, thereby minimizing errors caused by shift-variance. Third, a hybrid prediction model is constructed, where Extreme gradient boosting (XGBoost) fits the stable trends of low-frequency subseries, and an inverted Transformer (iTransformer) captures the dynamic dependencies within multi-dimensional spatial-temporal features of high-frequency subseries, significantly enhancing their prediction accuracy. Experimental results on a real-world water distribution network (WDN) demonstrate that the proposed method outperforms benchmark models, including Long short-term memory (LSTM) and graph-based models.
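The anti-leakage idea (decomposing only a fixed-length window of past samples at each time step, so no future value contaminates the features for time t) can be sketched with a one-level Haar split. This is illustrative only: the paper's WSD selects the mother wavelet via correlation analysis, whereas the Haar choice, window length, and function names here are assumptions.

```python
import numpy as np

def haar_step(window):
    """One-level Haar split of a window into low/high-frequency parts."""
    pairs = window[: len(window) // 2 * 2].reshape(-1, 2)
    approx = pairs.sum(axis=1) / np.sqrt(2.0)                # trend
    detail = np.diff(pairs, axis=1).ravel() / np.sqrt(2.0)   # fluctuation
    return approx, detail

def stepwise_decompose(series, window=8):
    """Decompose causally: each step sees only the trailing window,
    so features for time t never depend on samples after t."""
    feats = []
    for t in range(window - 1, len(series)):
        approx, detail = haar_step(series[t - window + 1 : t + 1])
        feats.append((approx[-1], detail[-1]))  # most recent coefficients
    return np.array(feats)
```

A global decomposition of the full series would, by contrast, let boundary handling mix future samples into past coefficients, which is exactly the leakage the stepwise scheme avoids.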

EAAI Journal 2026 Journal Article

Cross-domain attention guided multi-source domain adaptation method for machinery fault diagnosis

  • Jie Wang
  • Jianning Gou
  • Haidong Shao
  • Yiming Xiao
  • Ying Peng
  • Bin Liu

Compared with single-source approaches, multi-source domain adaptation (MSDA) for fault diagnosis integrates complementary information from various domains. This avoids the subjectivity and arbitrariness associated with selecting a single source. However, existing MSDA methods for fault diagnosis typically enforce global distribution alignment between the features of the source and target domains. Such alignment often leads to the loss of discriminative fault features in the target domain, resulting in negative transfer. To address the aforementioned issues, a cross-domain attention guided MSDA model (CDA-MSDA) is proposed in this paper. In this framework, a cross-domain attention module is constructed to dynamically fuse source and target domain features. This module effectively enhances the transfer of task-relevant features in the source domain and preserves discriminative features in the target domain. Then, a fault knowledge distillation module is developed to guide the feature extractor and classifier in achieving cross-domain fault category alignment. Finally, a multi-model dynamic collaborative decision module is designed. By aggregating prediction results from multiple classifiers, it addresses prediction conflicts arising from the varying reliability of different source domains. Extensive experiments on three benchmark datasets across 16 transfer tasks validate the effectiveness of the proposed method. Specifically, CDA-MSDA achieves an average diagnostic accuracy of 94.99%, outperforming state-of-the-art baselines by 2–10%, demonstrating superior robustness and stability in complex fault diagnosis scenarios.
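At its core, the cross-domain attention idea (target-domain features attending over source-domain features) resembles scaled dot-product attention. A minimal numpy sketch, without the learned query/key/value projections a real module would include, might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_domain_attention(target_feats, source_feats):
    """Fuse domains: each target sample attends over source samples
    via scaled dot-product attention, pulling in task-relevant source
    information weighted by feature similarity (illustrative sketch)."""
    d = target_feats.shape[-1]
    attn = softmax(target_feats @ source_feats.T / np.sqrt(d))
    return attn @ source_feats  # source knowledge weighted per target row
```

The attention weights concentrate on source samples whose features resemble the target sample, which is one way to transfer relevant source features without forcing a global distribution alignment.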

EAAI Journal 2026 Journal Article

Dual-stage interpretable domain generalization fault diagnosis: integrating prior knowledge and gradient-weighted class activation mapping

  • Ying Peng
  • Haidong Shao
  • Yiming Xiao
  • Jie Wang
  • Bin Liu

Recent advancements in domain generalization methods for fault diagnosis have achieved excellent performance. However, their inherent black-box characteristics seriously hinder practical deployment in critical industrial scenarios. In addition, current cross-domain interpretability research often focuses on a single stage, resulting in an incomplete and unreliable understanding of model behavior. To overcome the above bottlenecks, this article proposes a dual-stage interpretable domain generalization fault diagnosis framework. In the first stage, a prior knowledge-guided feature extractor is constructed to extract steady-state and transient features from low- and high-frequency directions, thereby improving the model's ante-hoc interpretability. In the second stage, gradient-weighted class activation mapping is employed to visualize the class activation maps, revealing the attention regions during signal processing and enabling post-hoc interpretability analysis. The proposed method is validated using two distinct gearbox datasets, demonstrating superior performance in diagnostic accuracy and model interpretability compared to conventional domain generalization fault diagnosis approaches. In addition, the prior knowledge-guided feature extractor proves effective when integrated into other domain generalization models, and gradient-weighted class activation mapping proves to be a valuable tool for post-hoc interpretability assessment in the field of domain generalization fault diagnosis.

AAAI Conference 2026 Conference Paper

Mitigating Hallucinations in Large Language Models via Causal Reasoning

  • Yuangang Li
  • Yiqing Shen
  • Yi Nian
  • Jiechao Gao
  • Ziyi Wang
  • Chenxiao Yu
  • Li Li
  • Jie Wang

Large language models (LLMs) exhibit logically inconsistent hallucinations that appear coherent yet violate reasoning principles, with recent research suggesting an inverse relationship between causal reasoning capabilities and such hallucinations. However, existing reasoning approaches in LLMs, such as Chain-of-Thought (CoT) and its graph-based variants, operate at the linguistic token level rather than modeling the underlying causal relationships between variables, lacking the ability to represent conditional independencies or satisfy causal identification assumptions. To bridge this gap, we introduce causal-DAG construction and reasoning (CDCR-SFT), a supervised fine-tuning framework that trains LLMs to explicitly construct a variable-level directed acyclic graph (DAG) and then perform reasoning over it. Moreover, we present a dataset comprising 25,368 samples (CausalDR), where each sample includes an input question, explicit causal DAG, graph-based reasoning trace, and validated answer. Experiments on four LLMs across eight tasks show that CDCR-SFT improves the causal reasoning capability with the state-of-the-art 95.33% accuracy on CLADDER (surpassing human performance of 94.8% for the first time) and reduces hallucination on HaluEval by 10%. It demonstrates that explicit causal structure modeling in LLMs can effectively mitigate logical inconsistencies in LLM outputs.

JBHI Journal 2026 Journal Article

RT-SAM: Visual-Prompt Fusion and Uncertainty Enhancement for Nasopharyngeal Carcinoma Radiotherapy Target Delineation

  • Hee Guan Khor
  • Xin Yang
  • Yihua Sun
  • Sijuan Huang
  • Yingni Wang
  • Jie Wang
  • Shaobin Wang
  • Lu Bai

Precise delineation of the clinical target volume (CTV) and nodal CTV (CTV$_{\mathit{nd}}$) is crucial for effective radiotherapy planning in nasopharyngeal carcinoma (NPC). Manual contouring is labor-intensive and subject to substantial inter-observer variability, particularly in regions with complex anatomy and indistinct boundaries. This study presents RT-SAM, a novel framework that adapts the Medical Segment Anything Model 2 (MedSAM-2) for automated CTV (i.e., primary CTV and CTV$_{\mathit{nd}}$) contouring in NPC computed tomography (CT) images. The framework synergistically integrates a generalist foundation model (MedSAM-2) with a domain-specific specialist network (2D U-Net) through three principal contributions: (1) automated generation of multi-modal prompts—comprising mask, bounding box, and point representations—derived from specialist network predictions to guide the generalist model; (2) a Visual-Prompt Fusion Attention (ViPFA) mechanism that optimizes feature-prompt interactions through bidirectional cross-modal attention; and (3) an Uncertainty-Enhanced Prediction Adjustment (UEPA) mechanism that enhances model robustness via confidence-based refinement and selective domain adaptation. Comprehensive evaluation on a multi-center cohort of 256 clinical NPC cases from Sun Yat-sen University Cancer Center and 212 public NPC cases from the SegRap2025 lymph node CTV dataset using 5-fold cross-validation demonstrates that RT-SAM achieves a mean DICE coefficient of 0.796 $\pm$ 0.033 (mean $\pm$ standard deviation), significantly outperforming current state-of-the-art methods. Clinical validation by eight radiation oncologists demonstrates that RT-SAM contours are clinically indistinguishable from expert delineations in blinded Turing assessments, achieve superior quality ratings in 75% of comparisons with mean scores of 2.73 for RT-SAM versus 2.66 for manual expert contours, and attain clinically acceptable ratings in over 97% of cases. These results demonstrate that RT-SAM is a clinically feasible solution for automated CTV contouring, with strong potential to standardize treatment planning and mitigate inter-observer variability in NPC radiotherapy.

AAAI Conference 2026 Conference Paper

S2-Boost: Synergistic Semantic Boosting for Coarse-to-Fine Ensemble Learning

  • Guanxiong He
  • Zheng Wang
  • Jie Wang
  • Liaoyuan Tang
  • Rong Wang
  • Feiping Nie

Neuroscientific evidence reveals that human visual recognition is not an instantaneous event but a hierarchical process, where the brain constructs a holistic perception by progressively integrating simple features like edges or texture into complex scenes. Ensemble learning successfully utilizes this principle, yet existing methods typically integrate models at the decision level, neglecting the rich, complementary information within the feature space itself and thus fundamentally limiting their potential. To address this, we introduce Synergistic Semantic Boosting (S2-Boosting), a framework that employs a self-supervised hierarchical semantic learning module to decompose an image into complementary, semantically meaningful parts autonomously. These parts guide a boosting procedure where a sequence of specialized learners, each focusing on a specific semantic partition, collaboratively corrects the ensemble's errors. We further present encouraging results on real-world image datasets, highlighting the framework's intrinsic interpretability and paving the way for more robust and transparent models.

AAAI Conference 2026 Conference Paper

Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework

  • Guanxiong He
  • Zheng Wang
  • Jie Wang
  • Liaoyuan Tang
  • Rong Wang
  • Feiping Nie

Federated clustering addresses the critical challenge of extracting patterns from decentralized, unlabeled data. However, current approaches are forced into a compromise between performance and privacy: transmitting embedding representations risks sensitive data leakage, while sharing only abstract cluster prototypes leads to diminished model accuracy. To resolve this dilemma, we propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that innovatively leverages local structural graphs as the primary medium for privacy-preserving knowledge sharing, thus moving beyond the limitations of conventional techniques. Our framework operates on a clear client–server logic: on the client side, each participant constructs a private structural graph that captures intrinsic data relationships, which the server then securely aggregates and aligns to form a comprehensive global graph from which a unified clustering structure is derived. The framework offers two distinct modes to suit different needs. SPP-FGC is designed as an efficient one-shot method that completes its task in a single communication round, ideal for rapid analysis. For more complex, unstructured data like images, SPP-FGC+ employs an iterative process where clients and the server collaboratively refine feature representations to achieve superior downstream performance. Extensive experiments demonstrate that our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10% (NMI) over federated baselines while maintaining provable privacy guarantees.
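The one-shot mode (clients share structural graphs, the server aggregates them and derives a clustering) can be caricatured in numpy under the simplifying assumption that clients describe a shared node set; the paper's alignment step, privacy machinery, and actual clustering routine are omitted, and the function names are invented for illustration:

```python
import numpy as np

def aggregate_graphs(client_graphs):
    """Server side: average the clients' structural (affinity) graphs
    into one global affinity matrix. Assumes a shared node set, a
    simplification of the paper's secure aggregation and alignment."""
    return np.mean(client_graphs, axis=0)

def fiedler_bipartition(affinity):
    """Derive a two-way clustering from the global graph: the sign of
    the graph Laplacian's second eigenvector (Fiedler vector) splits
    the nodes. A toy stand-in for the unified clustering structure."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)
```

Because only pairwise affinities leave each client, raw features or embeddings never reach the server, which is the privacy intuition behind sharing structural graphs.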

AAAI Conference 2025 System Paper

A Multi-Style Chinese Characters Writing Intelligent Tool Based on Small-scale Training Data

  • Zhen Zeng
  • Jie Wang
  • Xi Lyu

Chinese characters are a unique blend of language and art, featuring diverse artistic styles. Mastering these styles requires extensive practice and limits public participation. To encourage broader participation, we developed a real-time, interactive tool that supports multiple Chinese character art styles. This tool uses a diffusion model and several LoRA models to capture the diversity of Chinese character art. It generates personalized, visually striking Chinese character artworks in real-time by utilizing handwritten input, allowing users to adjust various stylistic parameters.

NeurIPS Conference 2025 Conference Paper

Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference

  • Zijie Geng
  • Jie Wang
  • Ziqi Liu
  • Feng Ju
  • Yiming Li
  • Xing Li
  • Mingxuan Yuan
  • Jianye Hao

Key-Value (KV) cache eviction---which retains the KV pairs of the most important tokens while discarding less important ones---is a critical technique for optimizing both memory usage and inference latency in large language models (LLMs). However, existing approaches often rely on simple heuristics---such as attention weights---to measure token importance, overlooking the spatial relationships between token value states in the vector space. This often leads to suboptimal token selections and thus performance degradation. To tackle this problem, we propose a novel method, namely **AnDPro** (**An**chor **D**irection **Pro**jection), which introduces a projection-based scoring function to more accurately measure token importance. Specifically, AnDPro operates in the space of value vectors and leverages the projections of these vectors onto an *``Anchor Direction''*---the direction of the pre-eviction output---to measure token importance and guide more accurate token selection. Experiments on $16$ datasets from the LongBench benchmark demonstrate that AnDPro can maintain $96.07\%$ of the full cache accuracy using only $3.44\%$ of the KV cache budget, reducing KV cache budget size by $46.0\%$ without compromising quality compared to previous state-of-the-art methods.
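The projection-based scoring is concrete enough to sketch: score each cached token by the projection of its value vector onto the normalized anchor direction and keep the top-scoring tokens. A toy numpy version, with function names and the top-k selection rule assumed for illustration rather than taken from the paper:

```python
import numpy as np

def andpro_scores(values, anchor):
    """Projection of each token's value vector onto the anchor
    direction (here, the normalized pre-eviction output)."""
    direction = anchor / np.linalg.norm(anchor)
    return values @ direction  # one projection length per token

def evict(values, anchor, budget):
    """Keep the `budget` tokens with the largest projections;
    return their indices in original order."""
    scores = andpro_scores(values, anchor)
    keep = np.argsort(scores)[-budget:]
    return np.sort(keep)
```

The intuition is that tokens whose value vectors contribute most along the direction of the pre-eviction output are the ones whose removal would perturb that output the most.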

NeurIPS Conference 2025 Conference Paper

ArchCAD-400K: A Large-Scale CAD drawings Dataset and New Baseline for Panoptic Symbol Spotting

  • Ruifeng Luo
  • Zhengjie Liu
  • Tianxiao Cheng
  • Jie Wang
  • Tongjie Wang
  • Fei Cheng
  • Fu Chai
  • Yanpeng Li

Recognizing symbols in architectural CAD drawings is critical for various advanced engineering applications. In this paper, we propose a novel CAD data annotation engine that leverages intrinsic attributes from systematically archived CAD drawings to automatically generate high-quality annotations, thus significantly reducing manual labeling efforts. Utilizing this engine, we construct ArchCAD-400K, a large-scale CAD dataset consisting of 413,062 chunks from 5538 highly standardized drawings, making it over 26 times larger than the largest existing CAD dataset. ArchCAD-400K boasts an extended drawing diversity and broader categories, offering line-grained annotations. Furthermore, we present a new baseline model for panoptic symbol spotting, termed Dual-Pathway Symbol Spotter (DPSS). It incorporates an adaptive fusion module to enhance primitive features with complementary image features, achieving state-of-the-art performance and enhanced robustness. Extensive experiments validate the effectiveness of DPSS, demonstrating the value of ArchCAD-400K and its potential to drive innovation in architectural design and construction.

NeurIPS Conference 2025 Conference Paper

AttentionPredictor: Temporal Patterns Matter for KV Cache Compression

  • Qingyue Yang
  • Jie Wang
  • Xing Li
  • Zhihai Wang
  • Chen Chen
  • Lei Chen
  • Xianzhi Yu
  • Wulong Liu

With the development of large language models (LLMs), efficient inference through Key-Value (KV) cache compression has attracted considerable attention, especially for long-context generation. To compress the KV cache, recent methods identify critical KV tokens through static modeling of attention scores. However, these methods often struggle to accurately determine critical tokens as they neglect the *temporal patterns* in attention scores, resulting in a noticeable degradation in LLM performance. To address this challenge, we propose **AttentionPredictor**, which is the **first learning-based method to directly predict attention patterns for KV cache compression and critical token identification**. Specifically, AttentionPredictor learns a lightweight, unified convolution model to dynamically capture spatiotemporal patterns and predict the next-token attention scores. An appealing feature of AttentionPredictor is that it accurately predicts the attention score and shares the unified prediction model, which consumes negligible memory, among all transformer layers. Moreover, we propose a cross-token critical cache prefetching framework that hides the token estimation time overhead to accelerate the decoding stage. By retaining most of the attention information, AttentionPredictor achieves **13$\times$** KV cache compression and **5.6$\times$** speedup in a cache offloading scenario with comparable LLM performance, significantly outperforming state-of-the-art methods. The code is available at https://github.com/MIRALab-USTC/LLM-AttentionPredictor.

NeurIPS Conference 2025 Conference Paper

Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

  • Zhihai Wang
  • Zijie Geng
  • Zhaojie Tu
  • Jie Wang
  • Yuxi Qian
  • Zhexuan Xu
  • Ziyan Liu
  • Siyuan Xu

Chip placement is a critical step in the Electronic Design Automation (EDA) workflow, which aims to arrange chip modules on the canvas to optimize the performance, power, and area (PPA) metrics of final designs. Recent advances show great potential of AI-based algorithms in chip placement. However, due to the lengthy EDA workflow, evaluations of these algorithms often focus on intermediate surrogate metrics, which are computationally efficient but often misalign with the final end-to-end performance (i.e., the final design PPA). To address this challenge, we propose to build ChiPBench, a comprehensive benchmark specifically designed to evaluate the effectiveness of AI-based algorithms in final design PPA metrics. Specifically, we generate a diverse evaluation dataset from $20$ circuits across various domains, such as CPUs, GPUs, and NPUs. We then evaluate six state-of-the-art AI-based chip placement algorithms on the dataset and conduct a thorough analysis of their placement behavior. Extensive experiments show that AI-based chip placement algorithms produce unsatisfactory final PPA results, highlighting the significant influence of often-overlooked factors like regularity and dataflow. We believe ChiPBench will effectively bridge the gap between academia and industry.

NeurIPS Conference 2025 Conference Paper

Can Class-Priors Help Single-Positive Multi-Label Learning?

  • Biao Liu
  • Ning Xu
  • Jie Wang
  • Xin Geng

Single-positive multi-label learning (SPMLL) is a weakly supervised multi-label learning problem, where each training example is annotated with only one positive label. Existing SPMLL methods typically assign pseudo-labels to unannotated labels with the assumption that prior probabilities of all classes are identical. However, the class-prior of each category may differ significantly in real-world scenarios, which prevents the predictive model from performing as well as expected because the assumption rarely holds in real-world applications. To alleviate this issue, a novel framework named Crisp, i.e., Class-pRiors Induced Single-Positive multi-label learning, is proposed. Specifically, a class-priors estimator is introduced, which can estimate the class-priors that are theoretically guaranteed to converge to the ground-truth class-priors. In addition, based on the estimated class-priors, an unbiased risk estimator for classification is derived, and the corresponding risk minimizer can be guaranteed to approximately converge to the optimal risk minimizer on fully supervised data. Experimental results on ten MLL benchmark datasets demonstrate the effectiveness and superiority of our method over existing SPMLL approaches.

AIIM Journal 2025 Journal Article

CATI: A medical context-enhanced framework for diagnosis code assignment in the UK Biobank study

  • Yue Shen
  • Jie Wang
  • Zhe Wang
  • Zhihao Shi
  • Hanzhu Chen
  • Zheng Wang
  • Yukang Jiang
  • Xiaopu Wang

Diagnosis codes are the standard format for coding diseases and medical conditions. This study is aimed at assigning diagnosis codes to patients in large-scale biobanks, particularly addressing the issue of missing codes for some patients. This is crucial for downstream disease-related tasks. While recent methods primarily rely on structured biobank data for code assignment, they often overlook the valuable medical context provided by textual information in the biobanks and hierarchical structure of the disease coding system. To address this gap, we have developed CATI, a medical context-enhanced framework for diagnosis Code Assignment by integrating Textual details derived from key features and disease hIerarchy. The study is based on the UK Biobank data and considers Phecodes and ICD-10 codes as standard disease formats. We start by representing ten informative codified features using their formal names and then integrate them into CATI as text embeddings, achieved through prompt tuning on the pre-trained language model BioBERT. Recognizing the hierarchical structure of diagnosis codes, we have developed a novel convolution layer in our method that effectively propagates logits between adjacent diagnosis codes. Evaluation results demonstrate that CATI outperforms existing state-of-the-art methods in terms of both Phecodes and ICD-10 codes, boasting at least a 5.16% improvement in average AUROC for unseen disease codes and an 8.68% rise in average AUPRC for disease codes with training instances ranging in (1000, 10000]. This framework contributes to the formation of well-defined cohorts for downstream studies and offers a unique perspective for addressing complex healthcare tasks by incorporating vital medical context.

EAAI Journal 2025 Journal Article

Comprehensive performance evaluation of valuable medical equipment based on cloud modelling and combined weighting methodologies

  • Xingtong Zhang
  • Saifeng Fang
  • Yongchun Jin
  • Ying Huang
  • Shucheng Wang
  • Jie Wang
  • Yunhua Xu

The construction of the performance evaluation index system for valuable medical equipment is the basis for measuring the use of medical equipment in hospitals. It is crucial to the management and evaluation of equipment. This study aimed to develop a comprehensive evaluation model that thoroughly assesses the operational status, utilization efficiency, and service quality of valuable medical equipment in hospitals. The performance evaluation index system for valuable medical equipment was constructed using four dimensions. The subjective weights were determined using the Analytical Hierarchy Process (AHP), while the objective weights were calculated using the Entropy Weight Method (EWM). The combined weights of the index system were derived by integrating both subjective and objective weights through game theory. The Delphi method was employed to establish a standard cloud model, which was subsequently integrated with the combined weights to construct a comprehensive evaluation cloud model. This research evaluated the performance of nine newly acquired pieces of valuable medical equipment that were operational in a hospital after 2020, thereby validating the reliability of the proposed model. The outcomes indicate that the model effectively addresses the problem of data uncertainty in fuzzy evaluations while alleviating the limitations associated with single weighting methods. The performance evaluation model for medical equipment proposed in this study provided an innovative and effective strategy for assessing valuable medical equipment in hospitals, thereby enhancing the scientific and effective management of medical equipment.

NeurIPS Conference 2025 Conference Paper

Dynamic Configuration for Cutting Plane Separators via Reinforcement Learning on Incremental Graph

  • Mingxuan Ye
  • Jie Wang
  • Fangzhou Fangzhou
  • Zhihai Wang
  • Yufei Kuang
  • Xijun Li
  • Weilin Luo
  • Jianye Hao

Cutting planes (cuts) are essential for solving mixed-integer linear programming (MILP) problems, as they tighten the feasible solution space and accelerate the solving process. Modern MILP solvers offer diverse cutting plane separators to generate cuts, enabling users to leverage their potential complementary strengths to tackle problems with different structures. Recent machine learning approaches learn to configure separators based on problem-specific features, selecting effective separators and deactivating ineffective ones to save unnecessary computing time. However, they ignore the dynamics of separator efficacy at different stages of cut generation and struggle to adapt the configurations for the evolving problems after multiple rounds of cut generation. To address this challenge, we propose a novel dynamic separator configuration (DynSep) method that models separator configuration in different rounds as a reinforcement learning task, making decisions based on an incremental triplet graph updated by iteratively added cuts. Specifically, we tokenize the incremental subgraphs and utilize a decoder-only Transformer as our policy to autoregressively predict when to halt separation and which separators to activate at each round. Evaluated on synthetic and large-scale real-world MILP problems, DynSep speeds up average solving time by 64% on easy and medium datasets, and reduces primal-dual gap integral within the given time limit by 16% on hard datasets. Moreover, experiments demonstrate that DynSep well generalizes to MILP instances of significantly larger sizes than those seen during training.

JBHI Journal 2025 Journal Article

Explainable End-to-End Seizure Prediction via Stationary Wavelet Transform-Driven Dynamic Multiscale Fuzzy Clustering

  • Jie Wang
  • Yingchao Wang
  • Weiwei Nie
  • Qi Yuan

Epileptic seizure prediction holds critical clinical significance for enhancing the quality of life in patients. Despite technological advances, existing approaches face persistent challenges arising from inter-subject variability in electroencephalogram (EEG) dynamics and the complex spatiotemporal coupling associated with ictal transitions. These issues compromise both feature discriminability and model explainability. To address these dual limitations, we propose a novel stationary wavelet transform (SWT)-driven dynamic multiscale fuzzy clustering (SD-MFC) framework, an explainable prediction pipeline that integrates EEG signal analysis with transparent clinical decision-making. Methodologically, the spectral-temporal decomposition of EEG signals via SWT is combined with a geometric attention mechanism to model cross-channel dependencies. To capture the dynamic nature of EEG signals, we develop a Riemannian manifold-based fuzzy clustering algorithm through covariance matrix optimization and time-variant manifold metrics. Hierarchical feature fusion is achieved through multiscale convolutional kernels within a three-layer convolutional network. Model training incorporates contrastive learning with a hybrid supervised/self-supervised strategy to enhance robustness. Notably, two explainability methods, a joint feature visualization strategy and an efficient feature ablation study, are proposed to bridge the adaptation gap between the “black-box” nature of deep learning models and the requirements of clinical diagnostics. Experimental results on both intracranial and extracranial datasets demonstrate that the SD-MFC framework not only exhibits superior predictive performance but also maintains a low FPR, offering a feasible scheme for clinical application of EEG-based seizure prediction. The code is available at https://github.com/JW-Image/SD-MFC.

AAAI Conference 2025 Conference Paper

Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-Learning

  • Zhuyang Xie
  • Yan Yang
  • Yankai Yu
  • Jie Wang
  • Yongquan Jiang
  • Xiao Wu

Dense video captioning aims to detect and describe all events in untrimmed videos. This paper presents a dense video captioning network called Multi-Concept Cyclic Learning (MCCL), which aims to: (1) detect multiple concepts at the frame level and leverage these concepts to provide temporal event cues; and (2) establish cyclic co-learning between the generator and the localizer within the captioning network to promote semantic perception and event localization. Specifically, weakly supervised concept detection is performed for each frame, and the detected concept embeddings are integrated into the video features to provide event cues. Additionally, video-level concept contrastive learning is introduced to produce more discriminative concept embeddings. In the captioning network, a cyclic co-learning strategy is proposed, where the generator guides the localizer for event localization through semantic matching, while the localizer enhances the generator’s event semantic perception through location matching, making semantic perception and event localization mutually beneficial. MCCL achieves state-of-the-art performance on the ActivityNet Captions and YouCook2 datasets. Extensive experiments demonstrate its effectiveness and interpretability.

EAAI Journal 2025 Journal Article

Hierarchical multi-scale matched masked autoencoder for industrial multi-rate time series modeling

  • Changqing Yuan
  • Yongfang Xie
  • Shiwen Xie
  • Jie Wang

In practical industrial processes, due to sensor hardware limitations, the sampling rates of different variables often vary, leading to multi-rate time series (MRTS) data. However, the distribution of multi-scale dynamics in MRTS data typically follows a step-like pattern, with intricate scale transitions from fine to coarse and complex scale-consistent dependencies across rates. Additionally, the inherent characteristics of MRTS data often result in label scarcity. Both factors present significant challenges for MRTS modeling. To address these issues, we propose a novel self-supervised learning strategy, called Hierarchical Multi-Scale Matched Masked Autoencoder (H3MAE). Specifically, we design a scale-matching input fusion mechanism where each layer is hierarchically aligned to a specific scale, with the scale-matching integration from two sources, effectively capturing the multi-scale dynamics and cross-rate scale-consistent dependencies in MRTS data. Besides, we introduce a novel auxiliary task that imputes masked positions in the encoded representation space at each layer, aiming to achieve MRTS representation learning and mitigate label scarcity. Furthermore, we propose a unique encoder-imputer structure in each layer to enable multi-scale self-supervised learning while generating temporally aligned features satisfying the input requirements of the next layer. Experimental results on three benchmark datasets and two industrial multi-rate tasks demonstrate that our framework yields better performance in MRTS modeling. The code is publicly available at https://github.com/monolithycq/H3MAE.

NeurIPS Conference 2025 Conference Paper

High-Performance Arithmetic Circuit Optimization via Differentiable Architecture Search

  • Xilin Xia
  • Jie Wang
  • Wanbo Zhang
  • Zhihai Wang
  • Mingxuan Yuan
  • Jianye Hao
  • Feng Wu

Arithmetic circuit optimization remains a fundamental challenge in modern integrated circuit design. Recent advances have cast this problem within the Learning to Optimize (L2O) paradigm, where intelligent agents autonomously explore high-performance design spaces with encouraging results. However, existing approaches predominantly target coarse-grained architectural configurations, while the crucial interconnect optimization stage is often relegated to oversimplified proxy models or heuristic approaches. This disconnect undermines design quality, leading to suboptimal solutions in the circuit topology search space. To bridge this gap, we present Arith-DAS, a Differentiable Architecture Search framework for Arithmetic circuits. To the best of our knowledge, Arith-DAS is the first to formulate interconnect optimization within arithmetic circuits as a differentiable edge prediction problem over a multi-relational directed acyclic graph, enabling fine-grained, proxy-free optimization at the interconnection level. We evaluate Arith-DAS on a suite of representative arithmetic circuits, including multipliers and multiply-accumulate units. Experiments show substantial improvements over state-of-the-art L2O and conventional methods, achieving up to a 27.05% gain in hypervolume of the area-delay Pareto front, a standard metric for evaluating multi-objective optimization performance. Moreover, integrating our optimized arithmetic units into large-scale AI accelerators yields up to a 6.59% delay reduction, demonstrating both scalability and real-world applicability.

TIST Journal 2025 Journal Article

Hire: Hybrid-Modal Interaction with Multiple Relational Enhancements for Image-Text Matching

  • Xuri Ge
  • Fuhai Chen
  • Songpei Xu
  • Fuxiang Tao
  • Jie Wang
  • Joemon M. Jose

Image-Text Matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-object relationships that match the corresponding sentences with rich contextual semantics. In this article, we propose a Hybrid-modal Interaction with multiple Relational Enhancements (termed Hire) for ITM, which correlates the intra- and inter-modal semantics between objects and words with implicit and explicit relationship modeling. In particular, the explicit intra-modal spatial-semantic graph-based reasoning network is designed to improve the contextual representation of visual objects with salient spatial and semantic relational connectivities, guided by the explicit relationships of the objects’ spatial positions and their scene graph. We use implicit relationship modeling for potential relationship interactions before explicit modeling to improve the fault tolerance of explicit relationship detection. Then the visual and textual semantic representations are refined jointly via inter-modal interactive attention and cross-modal alignment. To correlate the context of objects with the textual context, we further refine the visual semantic representation via cross-level object-sentence and word-image-based interactive attention. Extensive experiments validate that the proposed hybrid-modal interaction with implicit and explicit modeling is more beneficial for ITM, and the proposed Hire obtains new state-of-the-art results on the MS-COCO and Flickr30K benchmarks.

AAAI Conference 2025 System Paper

InstantPainting: Expanding GANs for Efficient Text-Conditioned Image Generation Platform

  • Bing-Kun Bao
  • Yefei Sheng
  • Jie Wang
  • Yaning Li
  • Sisi You

Text-conditioned image generation enables cross-modal comprehension. Many recently emerged platforms have found applications in diverse domains such as assisted design and video gaming. However, existing platforms still face challenges due to their expensive training and time-consuming generation processes. In this paper, we introduce an efficient text-conditioned image generation platform, termed InstantPainting. Unlike existing platforms based on large-scale pre-trained diffusion models, InstantPainting expands generative adversarial networks (GANs) to achieve efficient generation by using only about three percent of the pre-training data of other platforms. Compared to existing platforms, InstantPainting achieves the following functions at a very low deployment cost and approximately 4 to 5 times faster generation speeds: (1) multi-category and multi-size image generation; (2) image stylization and controlled generation; and (3) creative generation, including the generation of poetry pictures and counterfactual images. The proposed platform provides web application implementations for PC and mobile; users can create high-quality images directly through the user interface.

JBHI Journal 2025 Journal Article

Interpretable End to End Epileptic Seizure Detection via Linear and Nonlinear Filtering Networks

  • Jie Wang
  • Xianlei Zeng
  • Yingchao Wang
  • Jie Xu
  • Defu Zhai
  • Han Xiao
  • Weiwei Nie
  • Qi Yuan

Epilepsy is a prevalent neurological disorder marked by recurrent, unpredictable seizures. Electroencephalogram (EEG)-based seizure detection has become a key focus in clinical research due to its potential for identifying abnormal brain activity patterns. However, most current approaches rely on single-modal feature analysis and struggle to disentangle the complex linear and nonlinear dynamics of EEG signals, limiting their clinical utility. To address this limitation, we propose a novel contrastive learning framework with linear and nonlinear filtering networks (CL-LNFNet) for interpretable seizure detection. CL-LNFNet enhances explainability by tracing the full decision-making pathway from raw EEG signals to diagnostic outcomes. Through comparative analysis of feature evolution across six seizure types and non-seizure states, the model bridges the gap between the “black-box” nature of deep learning and the transparency required in clinical diagnostics. The framework first employs a recursive residual decomposition scheme to extract linear and nonlinear components using dual-branch decoupling networks. These features are then refined via two adaptive filtering networks equipped with feature selection gating mechanisms. A multi-scale convolutional module within a three-layer convolutional architecture hierarchically integrates the dual-stream outputs to improve classification performance. Furthermore, we introduce a hybrid learning strategy that combines supervised and self-supervised contrastive learning to enhance feature representation through the joint optimization of both loss functions. Experimental evaluations on both scalp and intracranial EEG datasets demonstrate that CL-LNFNet achieves over 95% accuracy in both cross-patient and patient-specific scenarios, outperforming existing state-of-the-art methods. The code is available at https://github.com/JW-Image/CL-LNFNet.

AAAI Conference 2025 Conference Paper

Language Pre-training Guided Masking Representation Learning for Time Series Classification

  • Liaoyuan Tang
  • Zheng Wang
  • Jie Wang
  • Guanxiong He
  • Zhezheng Hao
  • Rong Wang
  • Feiping Nie

The representation learning of time series has a wide range of downstream tasks and applications in many practical scenarios. However, due to the complexity, spatiotemporality, and continuity of sequential stream data, self-supervised representation learning for time series is even more challenging than for structured data such as images and videos. Moreover, directly applying existing contrastive learning and masked-autoencoder-based approaches to time series representation learning encounters inherent theoretical limitations, such as ineffective augmentation and masking strategies. To this end, we propose Language Pre-training guided Masking Representation Learning (LPMRL) for time series classification. Specifically, we first propose a novel language-pre-training-guided masking encoder that adaptively samples semantic spatiotemporal patches via natural language descriptions and improves the discriminability of latent representations. Furthermore, we present a dual-information contrastive learning mechanism that explores both local and global information by meticulously designing high-quality hard negative samples of time series data. We also design various experiments, such as visualizing masking positions and distributions and analyzing reconstruction errors, to verify the rationale of the proposed language-guided masking technique. Finally, we evaluate the learned representations on classification tasks over 106 time series datasets, demonstrating the effectiveness of the proposed method.

NeurIPS Conference 2025 Conference Paper

LogicTree: Improving Complex Reasoning of LLMs via Instantiated Multi-step Synthetic Logical Data

  • Zehao Wang
  • Lin Yang
  • Jie Wang
  • Kehan Wang
  • Hanzhu Chen
  • Bin Wang
  • Jianye Hao
  • Defu Lian

Despite their remarkable performance on various tasks, Large Language Models (LLMs) still struggle with logical reasoning, particularly in complex and multi-step reasoning processes. Among various efforts to enhance LLMs' reasoning capabilities, synthesizing large-scale, high-quality logical reasoning datasets has emerged as a promising direction. However, existing methods often rely on predefined templates for logical reasoning data generation, limiting their adaptability to real-world scenarios. To address the limitation, we propose LogicTree, a novel framework for efficiently synthesizing multi-step logical reasoning datasets that excel in both complexity and instantiation. By iteratively searching for applicable logic rules based on structural pattern matching to perform backward deduction, LogicTree constructs multi-step logic trees that capture complex reasoning patterns. Furthermore, we employ a two-stage LLM-based approach to instantiate various real-world scenarios for each logic tree, generating consistent real-world reasoning processes that carry contextual significance. This helps LLMs develop generalizable logical reasoning abilities across diverse scenarios rather than merely memorizing templates. Experiments on multiple benchmarks demonstrate that our approach achieves an average improvement of 9.4% in accuracy on complex logical reasoning tasks.

NeurIPS Conference 2025 Conference Paper

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

  • Bingquan Dai
  • Luo Li
  • Qihong Tang
  • Jie Wang
  • Xinyu Lian
  • Hao Xu
  • Minghan Qin
  • Xudong XU

Reconstructing 3D objects into editable programs is pivotal for applications like reverse engineering and shape editing. However, existing methods often rely on limited domain-specific languages (DSLs) and small-scale datasets, restricting their ability to model complex geometries and structures. To address these challenges, we introduce MeshCoder, a novel framework that reconstructs complex 3D objects from point clouds into editable Blender Python scripts. We develop a comprehensive set of expressive Blender Python APIs capable of synthesizing intricate geometries. Leveraging these APIs, we construct a large-scale paired object-code dataset, where the code for each object is decomposed into distinct semantic parts. Subsequently, we train a multimodal large language model (LLM) that translates 3D point clouds into executable Blender Python scripts. Our approach not only achieves superior performance in shape-to-code reconstruction tasks but also facilitates intuitive geometric and topological editing through convenient code modifications. Furthermore, our code-based representation enhances the reasoning capabilities of LLMs in 3D shape understanding tasks. Together, these contributions establish MeshCoder as a powerful and flexible solution for programmatic 3D shape reconstruction and understanding.

NeurIPS Conference 2025 Conference Paper

Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training

  • Hong Wang
  • Haiyang Xin
  • Jie Wang
  • Xuanze Yang
  • Fei Zha
  • huanshuo dong
  • Yan Jiang

Pre-training has proven effective in addressing data scarcity and performance limitations in solving PDE problems with neural operators. However, challenges remain due to the heterogeneity of PDE datasets in equation types, which leads to high errors in mixed training. Additionally, dense pre-training models that scale parameters by increasing network width or depth incur significant inference costs. To tackle these challenges, we propose a novel Mixture-of-Experts Pre-training Operator Transformer (MoE-POT), a sparse-activated architecture that scales parameters efficiently while controlling inference costs. Specifically, our model adopts a layer-wise router-gating network to dynamically select 4 routed experts from 16 expert networks during inference, enabling the model to focus on equation-specific features. Meanwhile, we also integrate 2 shared experts, aiming to capture common properties of PDE and reduce redundancy among routed experts. The final output is computed as the weighted average of the results from all activated experts. We pre-train models with parameters from 30M to 0.5B on 6 public PDE datasets. Our model with 90M activated parameters achieves up to a 40% reduction in zero-shot error compared with existing models with 120M activated parameters. Additionally, we conduct interpretability analysis, showing that dataset types can be inferred from router-gating network decisions, which validates the rationality and effectiveness of the MoE architecture.

NeurIPS Conference 2025 Conference Paper

OptiTree: Hierarchical Thoughts Generation with Tree Search for LLM Optimization Modeling

  • Haoyang Liu
  • Jie Wang
  • Yuyang Cai
  • Xiongwei Han
  • Yufei Kuang
  • Jianye Hao

Optimization modeling is one of the most crucial but technical parts of operations research (OR). To automate the modeling process, existing works have leveraged large language models (LLMs), prompting them to break down tasks into steps for generating variables, constraints, and objectives. However, due to the highly complex mathematical structures inherent in OR problems, standard fixed-step decomposition often fails to achieve high performance. To address this challenge, we introduce OptiTree, a novel tree search approach designed to enhance modeling capabilities for complex problems through adaptive problem decomposition into simpler subproblems. Specifically, we develop a modeling tree that organizes a wide range of OR problems based on their hierarchical problem taxonomy and complexity, with each node representing a problem category and containing relevant high-level modeling thoughts. Given a problem to model, we recurrently search the tree to identify a series of simpler subproblems and synthesize the global modeling thoughts by adaptively integrating the hierarchical thoughts. Experiments show that OptiTree significantly improves the modeling accuracy compared to the state-of-the-art, achieving over 10% improvements on the challenging benchmarks.

NeurIPS Conference 2025 Conference Paper

Real-World Reinforcement Learning of Active Perception Behaviors

  • Edward Hu
  • Jie Wang
  • Xingfang Yuan
  • Fiona Luo
  • Muyao Li
  • Gaspard Lambrechts
  • Oleh Rybkin
  • Dinesh Jayaraman

A robot's instantaneous sensory observations do not always reveal task-relevant state information. Under such partial observability, optimal behavior typically involves explicitly acting to gain the missing information. Today's standard robot learning techniques struggle to produce such active perception behaviors. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression (AAWR), exploits access to "privileged" extra sensors at training time. The privileged sensors enable training high-quality privileged value functions that aid in estimating the advantage of the target policy. Bootstrapping from a small number of potentially suboptimal demonstrations and an easy-to-obtain coarse policy initialization, AAWR quickly acquires active perception behaviors and boosts task performance. In evaluations on 8 manipulation tasks on 3 robots spanning varying degrees of partial observability, AAWR synthesizes reliable active perception behaviors that outperform all prior approaches. When initialized with a "generalist" robot policy that struggles with active perception tasks, AAWR efficiently generates information-gathering behaviors that allow it to operate under severe partial observability for manipulation tasks. Website: https://penn-pal-lab.github.io/aawr/

NeurIPS Conference 2025 Conference Paper

RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains

  • Tianle Pu
  • Zijie Geng
  • Haoyang Liu
  • Shixuan Liu
  • Jie Wang
  • Li Zeng
  • Chao Chen
  • Changjun Fan

Mixed-Integer Linear Programming (MILP) is a fundamental and powerful framework for modeling complex optimization problems across diverse domains. Recently, learning-based methods have shown great promise in accelerating MILP solvers by predicting high-quality solutions. However, most existing approaches are developed and evaluated in single-domain settings, limiting their ability to generalize to unseen problem distributions. This limitation poses a major obstacle to building scalable and general-purpose learning-augmented solvers. To address this challenge, we introduce RoME, a domain-Robust Mixture-of-Experts (MoE) framework for predicting MILP solutions across domains. RoME dynamically routes problem instances to specialized experts based on learned task embeddings. The model is trained using a two-level distributionally robust optimization strategy: inter-domain to mitigate global shifts across domains, and intra-domain to enhance local robustness by introducing perturbations on task embeddings. We reveal that cross-domain training not only enhances the model's generalization capability to unseen domains but also improves performance within each individual domain by encouraging the model to capture more general intrinsic combinatorial patterns. Specifically, a single RoME model trained on three domains achieves an average improvement of 67.7% when evaluated on five diverse domains. We further test the pretrained model on MIPLIB in a zero-shot setting, demonstrating its ability to deliver measurable performance gains on challenging real-world instances where existing learning-based approaches often struggle to generalize.

JBHI Journal 2025 Journal Article

Short-Term Longitudinal Study on Brain Network Informatics of Stroke Patients Under Acupuncture and Motor Imagery Intervention

  • Jing Qu
  • Yijun Du
  • Jing Jing
  • Jie Wang
  • Lingguo Bu
  • Yonghui Wang

Objective: The quest for scientifically effective rehabilitation methods for stroke recovery constitutes an urgent need. However, due to the inadequacies of longitudinal studies and multimodal assessment methods, the rehabilitation mechanisms of methods such as Acupuncture Treatment (AT) and Motor Imagery (MI) remain unclear. Consequently, this study presents both AT and Acupuncture Synchronized with MI (ASMI) therapies, utilizing a combination of subjective and objective approaches to evaluate the long-term impacts of these two treatment modalities. Methods: A longitudinal design was adopted for a duration of two weeks. Clinical improvement in patients was assessed using scale data, while Functional Near-infrared Spectroscopy (fNIRS) and Electroencephalogram (EEG) data were collected to analyze changes in brain function. This study proposes the Cluster-Span Threshold for Directed Networks (CSTDN) algorithm for identifying key connections within the brain network and conducts an in-depth analysis using graph theory metrics. Results: Scale data indicated improvements in behavioral capabilities in both groups post-treatment. EEG and fNIRS data revealed significant variations in specific frequency bands between the two groups. Conclusion: This study not only validates the efficacy of AT and ASMI in stroke rehabilitation but also unveils the underlying neurobiological mechanisms through multimodal data analysis. The proposed CSTDN algorithm and graph theory analysis offer new perspectives for understanding changes in the brain network. Significance: This research contributes to the optimization of future rehabilitation treatment strategies and the formulation of personalized treatment plans.

NeurIPS Conference 2025 Conference Paper

STNet: Spectral Transformation Network for Solving Operator Eigenvalue Problem

  • Hong Wang
  • Yixuan Jiang
  • Jie Wang
  • Xinyi Li
  • Jian Luo
  • huanshuo dong

Operator eigenvalue problems play a critical role in various scientific fields and engineering applications, yet numerical methods are hindered by the curse of dimensionality. Recent deep learning methods provide an efficient approach to address this challenge by iteratively updating neural networks. These methods' performance relies heavily on the spectral distribution of the given operator: larger gaps between the operator's eigenvalues will improve precision, thus tailored spectral transformations that leverage the spectral distribution can enhance their performance. Based on this observation, we propose the Spectral Transformation Network (STNet). During each iteration, STNet uses approximate eigenvalues and eigenfunctions to perform spectral transformations on the original operator, turning it into an equivalent but easier problem. Specifically, we employ deflation projection to exclude the subspace corresponding to already solved eigenfunctions, thereby reducing the search space and avoiding converging to existing eigenfunctions. Additionally, our filter transform magnifies eigenvalues in the desired region and suppresses those outside, further improving performance. Extensive experiments demonstrate that STNet consistently outperforms existing learning-based methods, achieving state-of-the-art performance in accuracy.

NeurIPS Conference 2025 Conference Paper

SymMaP: Improving Computational Efficiency in Linear Solvers through Symbolic Preconditioning

  • Hong Wang
  • Jie Wang
  • Minghao Ma
  • Haoran Shao
  • Haoyang Liu

Matrix preconditioning is a critical technique to accelerate the solution of linear systems, where performance heavily depends on the selection of preconditioning parameters. Traditional parameter selection approaches often define fixed constants for specific scenarios. However, they rely on domain expertise and fail to consider the instance-wise features for individual problems, limiting their performance. In contrast, machine learning (ML) approaches, though promising, are hindered by high inference costs and limited interpretability. To combine the strengths of both approaches, we propose a symbolic discovery framework, namely Symbolic Matrix Preconditioning (SymMaP), to learn efficient symbolic expressions for preconditioning parameters. Specifically, we employ a neural network to search the high-dimensional discrete space for expressions that can accurately predict the optimal parameters. The learned expression allows for high inference efficiency and excellent interpretability (expressed in concise symbolic formulas), making it simple and reliable for deployment. Experimental results show that SymMaP consistently outperforms traditional strategies across various benchmarks.

EAAI Journal 2025 Journal Article

Symmetric non-negative matrix factorization-based deep representation algorithm for multi-view clustering

  • Ping Deng
  • Xinying Zhou
  • Ji Xu
  • Wei Huang
  • Jie Wang
  • Dexian Wang
  • Tianrui Li

Symmetric Non-negative Matrix Factorization (SNMF) shows significant advantages in clustering tasks due to its unique mathematical properties. However, it still has several key limitations: (1) the single optimization scheme of the traditional multiplicative update rule limits the flexibility of the algorithm; (2) linear factorization leads to insufficient representation ability for complex nonlinear features; (3) there is no learning rate guidance mechanism. These factors together constrain the algorithm's representation learning ability on complex data. To address these issues, this paper proposes a SNMF-based Deep Representation algorithm for Multi-view Clustering (SNDRMvC). First, the matrix elements are decoupled, and stochastic gradient descent as well as a nonlinear activation function are used to implement the non-negative matrix update. Then, based on the corresponding gradients of the elements and the nonlinear function, the neural network learning mechanism is introduced into the SNMF update rule to construct a novel framework, the SNMF-based deep representation network, for optimizing SNMF. This network aims to update the elements in the low-dimensional matrix of each view and fuse the low-dimensional matrices of multiple views to derive a consensus matrix. Finally, extensive experiments conducted on several public datasets demonstrate that the proposed algorithm exhibits notable advantages in clustering performance. We provide the code at: https://github.com/Code706/SNDRMvC.

IJCAI Conference 2025 Conference Paper

Uncertainty-guided Graph Contrastive Learning from a Unified Perspective

  • Zhiqiang Li
  • Jie Wang
  • Jianqing Liang
  • Junbiao Cui
  • Xingwang Zhao
  • Jiye Liang

The success of current graph contrastive learning methods largely relies on the choice of data augmentation and contrastive objectives. However, most existing methods tend to optimize these two components independently, neglecting their potential interplay, which leads to suboptimal quality of the learned embeddings. To address this issue, we propose Uncertainty-guided Graph Contrastive Learning (UGCL) from a unified perspective. The core of our method is the introduction of sample uncertainty, a critical metric that quantifies the degree of class ambiguity within individual samples. On this basis, we design a novel multi-scale data augmentation strategy and a weighted graph contrastive loss function, both of which significantly enhance the quality of embeddings. Theoretically, we demonstrate that UGCL can coordinate overall optimization objectives through uncertainty, and through experiments, we show that it improves the performance of tasks such as node classification, node clustering, and link prediction, thereby verifying the effectiveness of our method.

UAI Conference 2025 Conference Paper

VADIS: Investigating Inter-View Representation Biases for Multi-View Partial Multi-Label Learning

  • Jie Wang
  • Ning Xu 0009
  • Xin Geng 0001

Multi-view partial multi-label learning (MVPML) deals with training data where each example is represented by multiple feature vectors and associated with a set of candidate labels, only a subset of which are correct. The diverse representation biases present in different views complicate the annotation process in MVPML, leading to the inclusion of incorrect labels in the candidate label set. Existing methods typically merge features from different views to identify the correct labels in the training data without addressing the representation biases inherent in different views. In this paper, we propose a novel MVPML method called Vadis, which investigates view-aware representations for disambiguation and predictive model learning. Specifically, we exploit the global common representation shared by all views, aligning it with a local semantic similarity matrix to estimate ground-truth labels via a low-rank mapping matrix. Additionally, to identify incorrect labels, the view-specific inconsistent representation is recovered by leveraging the sparsity assumption. Experiments on real-world datasets validate the superiority of our approach over other state-of-the-art methods.

ICRA Conference 2025 Conference Paper

ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos

  • Junyao Shi
  • Zhuolun Zhao
  • Tianyou Wang
  • Ian Pedroza
  • Amy Luo
  • Jie Wang
  • Yecheng Jason Ma 0001
  • Dinesh Jayaraman

Many recent advances in robotic manipulation have come through imitation learning, yet these rely largely on mimicking a particularly hard-to-acquire form of demonstrations: those collected on the same robot, in the same room, with the same objects that the trained policy must handle at test time. In contrast, large pre-recorded human video datasets demonstrating manipulation skills in-the-wild already exist, which contain valuable information for robots. Is it possible to distill a repository of useful robotic skill policies out of such data without any additional requirements on robot-specific demonstrations or exploration? We present ZeroMimic, the first such system, which generates immediately deployable image goal-conditioned skill policies for several common categories of manipulation tasks (opening, closing, pouring, pick&place, cutting, and stirring), each capable of acting upon diverse objects and across diverse unseen task setups. ZeroMimic is carefully designed to exploit recent advances in semantic and geometric visual understanding of human videos, together with modern grasp affordance detectors and imitation policy classes. After training ZeroMimic on the popular EpicKitchens dataset of egocentric human videos, we evaluate its out-of-the-box performance in varied real-world and simulated kitchen settings with two different robot embodiments, demonstrating its impressive abilities to handle these varied tasks. To enable plug-and-play reuse of ZeroMimic policies on other task setups and robots, we release software and policy checkpoints of our skill policies.

EAAI Journal 2024 Journal Article

A deep evidence fusion framework for apple leaf disease classification

  • Hang Wang
  • Jiaxu Zhang
  • Zhu Yin
  • Liucheng Huang
  • Jie Wang
  • Xiaojian Ma

Apple leaf disease is one of the main culprits of apple yield reduction, so accurate classification of apple leaf diseases is essential to reduce economic losses. However, current methods struggle to distinguish diseases with similar visual symptoms. This study provides a new solution from the perspective of multi-source evidence fusion. Specifically, we propose a deep evidence fusion framework that uses both multi-saliency maps in the Hue Saturation Value (HSV) color space and a belief Cauchy–Schwarz divergence. A new evidence fusion method based on the belief Cauchy–Schwarz divergence is then proposed, which fills the gap between evidence theory and apple leaf disease classification. Experimental results show that the proposed method can boost the accuracy of different classification backbone networks, achieving the best performance of 98.1% with the EfficientNetV2-S network and the highest improvement of 4.8% with the Van-T network. In addition, a series of experiments is conducted to evaluate the proposal's effectiveness and superiority. The proposed method is a suitable alternative for classifying apple leaf diseases with similar visual symptoms, and in the future more plant diseases will be extended to this fusion framework.

EAAI Journal 2024 Journal Article

Development of data-knowledge-driven predictive model and multi-objective optimization for intelligent optimal control of aluminum electrolysis process

  • Jie Wang
  • Yongfang Xie
  • Shiwen Xie
  • Xiaofang Chen

Operational optimization of the Hall-Héroult cell is essential for achieving high efficiency and cost-effectiveness in the aluminum electrolysis process. Due to the complicated mechanism and variable working conditions, manual operational decision-making is still widely used in practice, which challenges the reliable and optimal operation of the aluminum electrolysis process. In this paper, we develop a data-knowledge-driven decision-making support system (DMSS) to achieve operational optimization for the aluminum electrolysis process. DMSS consists of a prediction model, a multi-objective optimizer, and a knowledge-guided decision-making module. Specifically, we propose a working-conditions-based attention with exogenous inputs auto-regressive neural network (WCA-NARX) to construct a data-driven heat balance indicator (HBI) prediction model, where working-condition-related variables serve as covariates to enhance predictability. In addition, the designed structure of introducing working-condition information through an attention mechanism can decouple covariates from operational variables and autoregressive variables, facilitating subsequent operational optimization. Then, a novel knowledge-assigned reference vector evolutionary algorithm (KRVEA) is designed to solve the multi-objective optimization problem of the aluminum electrolysis process, in which Pareto front solutions can be found in the preferred region. Finally, we utilize a knowledge base that stores historical optimization cases to select a practical-requirement-based control scheme from the Pareto set. Real-world industrial experiments demonstrate that DMSS can effectively enhance control performance and achieve superior results compared to other competitive methods. The source code is available at https://github.com/wjiecsu/WCA-NARX.

IROS Conference 2024 Conference Paper

Efficient-PIP: Large-scale Pixel-level Aligned Image Pair Generation for Cross-time Infrared-RGB Translation

  • Jian Li 0003
  • Kexin Fei
  • Yi Sun
  • Jie Wang
  • Bokai Liu
  • Zongtan Zhou
  • Yongbin Zheng
  • Zhenping Sun

Generative models are gaining momentum in both academic and industrial applications, driven by the availability of large-scale datasets, especially in tasks involving Image-to-Image Translation. Meanwhile, poor human perception of nighttime environments has created demand for translation from night-vision infrared to day-vision RGB images. However, collecting such cross-modal training data at the same time is impossible due to the thermal imaging properties of infrared cameras; the challenge therefore lies in constructing image pairs captured during the day and at night respectively, where the requirement for data alignment poses significant difficulties. In this paper, we propose a Pixel-level aligned Image Pair generation framework, PIP, to explore efficient colorization of high-resolution infrared images. Specifically, we first construct a 3D high-precision point cloud map to establish the correlation between day and night scenes. Point clouds corresponding to the modal images are collected simultaneously during data acquisition to obtain image sensor poses via Global Matching with the map, which allows us to calculate the transformation from infrared to RGB image coordinate systems based on the sensor parameters and the depth information of the map. Leveraging this relationship, the pixel values of the RGB image are projected onto the infrared image and then optimized to produce the colored image. Accordingly, we present NUDT-PIP, the first dataset of its kind containing large-scale pixel-level aligned cross-time infrared-RGB image pairs of complicated real road scenes. Experimental results demonstrate the reliability and strong applicability of our dataset in Image-to-Image Translation. Our code will be released at https://github.com/wjjjjyourFA/NUDT-PIP.

NeurIPS Conference 2024 Conference Paper

FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic Understanding

  • Shuai Yuan
  • Guancong Lin
  • Lixian Zhang
  • Runmin Dong
  • Jinxiao Zhang
  • Shuang Chen
  • Juepeng Zheng
  • Jie Wang

Fine urban change segmentation using multi-temporal remote sensing images is essential for understanding human-environment interactions in urban areas. Although there have been advances in high-quality land cover datasets that reveal the physical features of urban landscapes, the lack of fine-grained land use datasets hinders a deeper understanding of how human activities are distributed across landscapes and the impact of these activities on the environment, thus constraining proper technique development. To address this, we introduce FUSU, the first fine-grained land use change segmentation dataset for Fine-grained Urban Semantic Understanding. FUSU features the most detailed land use classification system to date, with 17 classes and 30 billion pixels of annotations. It includes bi-temporal high-resolution satellite images with 0.2-0.5 m ground sample distance and monthly optical and radar satellite time series, covering 847 km^2 across five urban areas in southern and northern China with different geographical features. The fine-grained land use pixel-wise annotations and high spatial-temporal resolution data provide a robust foundation for developing proper deep learning models to provide contextual insights on human activities and urbanization. To fully leverage FUSU, we propose a unified time-series architecture for both change detection and segmentation. We benchmark FUSU on various methods for several tasks. Dataset and code are available at: https://github.com/yuanshuai0914/FUSU.

AAAI Conference 2024 Conference Paper

Learning to Stop Cut Generation for Efficient Mixed-Integer Linear Programming

  • Haotian Ling
  • Zhihai Wang
  • Jie Wang

Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), as they significantly tighten the dual bounds and improve solving performance. A key problem for cuts is when to stop cut generation, which is important for the efficiency of solving MILPs. However, many modern MILP solvers employ hard-coded heuristics to tackle this problem, which tend to neglect underlying patterns among MILPs from certain applications. To address this challenge, we formulate the cut generation stopping problem as a reinforcement learning problem and propose a novel hybrid graph representation model (HYGRO) to learn effective stopping strategies. An appealing feature of HYGRO is that it can effectively capture both the dynamic and static features of MILPs, enabling dynamic decision-making for the stopping strategies. To the best of our knowledge, HYGRO is the first data-driven method to tackle the cut generation stopping problem. By integrating our approach with modern solvers, experiments demonstrate that HYGRO significantly improves the efficiency of solving MILPs compared to competitive baselines, achieving up to 31% improvement.

NeurIPS Conference 2024 Conference Paper

MILP-StuDio: MILP Instance Generation via Block Structure Decomposition

  • Haoyang Liu
  • Jie Wang
  • Wanbo Zhang
  • Zijie Geng
  • Yufei Kuang
  • Xijun Li
  • Yongdong Zhang
  • Bin Li

Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. In practice, improving the performance of MILP solvers often requires a large amount of high-quality data, which can be challenging to collect. Researchers thus turn to generation techniques to generate additional MILP instances. However, existing approaches do not take into account specific block structures—which are closely related to the problem formulations—in the constraint coefficient matrices (CCMs) of MILPs. Consequently, they are prone to generate computationally trivial or infeasible instances due to the disruption of block structures and thus of the problem formulations. To address this challenge, we propose a novel MILP generation framework, called Block Structure Decomposition (MILP-StuDio), to generate high-quality instances by preserving the block structures. Specifically, MILP-StuDio begins by identifying the blocks in CCMs and decomposing the instances into block units, which serve as the building blocks of MILP instances. We then design three operators to construct new instances by removing, substituting, and appending block units in the original instances, enabling us to generate instances with flexible sizes. An appealing feature of MILP-StuDio is its strong ability to preserve the feasibility and computational hardness of the generated instances. Experiments on the commonly-used benchmarks demonstrate that using instances generated by MILP-StuDio is able to significantly reduce the solving time of learning-based solvers by over 10%.
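The block-unit operators can be pictured on a toy constraint coefficient matrix. The sketch below is purely illustrative (the block contents, the `block_diag` helper, and the way "append" reuses a block are assumptions, not the paper's implementation):

```python
import numpy as np

# Toy CCM made of block units along the diagonal; the "append" operator
# grows the instance by adding another block unit, keeping block structure.
def block_diag(blocks):
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

b1 = np.array([[1.0, 2.0], [0.0, 3.0]])   # made-up block unit
b2 = np.array([[4.0], [5.0]])             # made-up block unit
ccm = block_diag([b1, b2])                # original instance: two block units
appended = block_diag([b1, b2, b1])       # "append": reuse a block unit to grow

assert ccm.shape == (4, 3)
assert appended.shape == (6, 5)
```

"Remove" and "substitute" would drop or swap entries in the block list in the same way, which is why the generated instances keep the structure that the original formulation implies.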

NeurIPS Conference 2024 Conference Paper

Neural Krylov Iteration for Accelerating Linear System Solving

  • Jian Luo
  • Jie Wang
  • Hong Wang
  • huanshuo dong
  • Zijie Geng
  • Hanzhu Chen
  • Yufei Kuang

Solving large-scale sparse linear systems is essential in fields like mathematics, science, and engineering. Traditional numerical solvers, mainly based on the Krylov subspace iteration algorithm, suffer from the low-efficiency problem, which primarily arises from the less-than-ideal iteration. To tackle this problem, we propose a novel method, namely Neural Krylov Iteration (NeurKItt), for accelerating linear system solving. Specifically, NeurKItt employs a neural operator to predict the invariant subspace of the linear system and then leverages the predicted subspace to accelerate linear system solving. To enhance the subspace prediction accuracy, we utilize QR decomposition for the neural operator outputs and introduce a novel projection loss function for training. NeurKItt benefits the solving by using the predicted subspace to guide the iteration process, significantly reducing the number of iterations. We provide extensive experiments and comprehensive theoretical analyses to demonstrate the feasibility and efficiency of NeurKItt. In our main experiments, NeurKItt accelerates the solving of linear systems across various settings and datasets, achieving up to a 5.5× speedup in computation time and a 16.1× speedup in the number of iterations.
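The QR-then-project step described in the abstract can be sketched in a few lines. This is an illustrative outline only: the toy system, the random stand-in for the neural operator's output, and the Galerkin projection used to seed the solver are assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 10

A = np.diag(np.linspace(1.0, 100.0, n))   # toy SPD system
b = rng.standard_normal(n)

V_pred = rng.standard_normal((n, k))      # stand-in for the neural prediction
Q, _ = np.linalg.qr(V_pred)               # orthonormalize the predicted basis

# Galerkin projection: solve the small k-by-k projected system and lift
# the solution back to R^n as an initial guess for the Krylov iteration.
x0 = Q @ np.linalg.solve(Q.T @ A @ Q, Q.T @ b)

# By construction, the residual is orthogonal to the predicted subspace.
r0 = b - A @ x0
assert np.allclose(Q.T @ r0, 0.0, atol=1e-8)
```

If the predicted subspace is close to an invariant subspace of `A`, an iterative solver started from `x0` (or deflated against `Q`) has far less work left to do, which is the mechanism behind the reported iteration-count reductions.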

EAAI Journal 2024 Journal Article

Non-autoregressive transformer with fine-grained optimization for user-specified indoor layout

  • Chao Song
  • Jie Wang
  • Shujie Chen
  • Haidong Li
  • Zhaoyi Jiang
  • Bailin Yang

We present a novel framework that can generate plausible and diverse 3D (three-dimensional) indoor scenes based on room floor plans and user-specified layout objects. In the framework, we construct a generative neural network with a non-autoregressive transformer to generate a reasonable distribution of layout objects, and then apply a fine-grained optimization process to adjust the sampled layout objects to optimal positions. Our non-autoregressive generative network addresses the issue of error-chain accumulation, and the fine-grained optimization mitigates potential small collisions between layout objects. Furthermore, we trained the generative network on a publicly labeled 3D indoor dataset without additional manual processing, and provide detailed information on the capabilities of the generative network and the fine-grained optimization scheme. Extensive experiments demonstrate that our framework outperforms existing methods in learning the relationships between layout objects and layout rationality.

EAAI Journal 2024 Journal Article

Pos-DANet: A dual-branch awareness network for small object segmentation within high-resolution remote sensing images

  • Qianpeng Chong
  • Mengying Ni
  • Jianjun Huang
  • Zongbao Liang
  • Jie Wang
  • Ziyi Li
  • Jindong Xu

Advances in satellite and sensor optical imaging technology have enabled more detailed and accurate Earth observation, which poses both an opportunity and a challenge for the small object segmentation task. However, the inherent difficulty of the task and its inadequate consideration in prior work mean that small object segmentation still encounters a performance bottleneck. We analyze the longstanding but underestimated challenges in this task and give a targeted solution to address them. Specifically, we design a dual-branch awareness structure dedicated to small object segmentation, named Pos-DANet, which is composed of a small object activation branch and a fuzzy refinement branch. The small object activation branch attends to small objects and avoids the negative influence of redundant background. The fuzzy refinement branch utilizes fuzzy modeling to improve the segmentation accuracy of small objects. These two branches work collaboratively to make the whole structure focus more on small objects and achieve satisfying segmentation results. Finally, we propose a hierarchical unbiased loss to eliminate the bias against small objects in the regression process. Extensive experiments demonstrate that Pos-DANet exhibits higher qualitative and quantitative performance on small objects than advanced methods, achieving the best results in mIoU (71.12%, 83.33%) and sIoU (63.23%, 68.89%) on two datasets.

EAAI Journal 2024 Journal Article

Short-term wind power prediction framework using numerical weather predictions and residual convolutional long short-term memory attention network

  • Chenlei Xie
  • Xuelei Yang
  • Tao Chen
  • Qiansheng Fang
  • Jie Wang
  • Yan Shen

As a prominent global source of renewable energy, wind power generation has been experiencing rapid growth. More precise prediction of short-term wind power is essential to ensure the stable and cost-effective operation of power systems. In response, a wind power prediction framework using numerical weather predictions (NWPs) and a Residual Convolutional Long Short-Term Memory Attention (Res-ConvLSTM-Attention) network is proposed in this study. Addressing the issue of significant errors in individual NWPs, a Weighted Naive Bayes (WNB) model and a Multivariate Quadratic Nonlinear Regression (NR) model are employed to fuse the wind speed and wind direction characteristics of four NWPs respectively, aiming to obtain more accurate weather forecast data. Given the difficulty of accurate prediction due to the randomness of wind power, a Res-ConvLSTM-Attention network is proposed for short-term wind power prediction. The Res-ConvLSTM unit extracts deep spatiotemporal features while effectively alleviating the network degradation and gradient vanishing issues caused by network deepening. The Attention unit allocates higher weights to key features, and their combination enhances the accuracy of wind power prediction. Finally, using the data provided by Challenge Data for experimental analysis, the results show that the mean absolute error (MAE), root mean square error (RMSE), mean arctangent absolute percentage error (MAAPE) and coefficient of determination (R2) values are 0.0758, 0.1163, 0.4364 and 0.946 respectively, affirming the effectiveness of the wind power prediction framework.
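The four reported metrics have standard definitions; the sketch below computes them on made-up toy data (the arrays are illustrative, not the paper's results). MAAPE is the least common of the four: it replaces the unbounded percentage error with its arctangent, so it stays finite when the true value is near zero:

```python
import numpy as np

# Toy normalized wind-power values (illustrative only)
y_true = np.array([0.5, 0.8, 0.3, 0.9])
y_pred = np.array([0.45, 0.85, 0.35, 0.80])

mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
# MAAPE: mean of arctan(|relative error|), bounded in [0, pi/2]
maape = np.mean(np.arctan(np.abs((y_true - y_pred) / y_true)))
# R2: 1 minus residual sum of squares over total sum of squares
r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

assert rmse >= mae                 # always holds for these two norms
assert 0.0 <= maape <= np.pi / 2   # boundedness is MAAPE's selling point
```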

NeurIPS Conference 2024 Conference Paper

Target-Guided Adversarial Point Cloud Transformer Towards Recognition Against Real-world Corruptions

  • Jie Wang
  • Tingfa Xu
  • Lihe Ding
  • Jianan Li

Achieving robust 3D perception in the face of corrupted data presents a challenging hurdle within 3D vision research. Contemporary transformer-based point cloud recognition models, albeit advanced, tend to overfit to specific patterns, consequently undermining their robustness against corruption. In this work, we introduce the Target-Guided Adversarial Point Cloud Transformer, termed APCT, a novel architecture designed to augment global structure capture through an adversarial feature erasing mechanism predicated on patterns discerned at each step during training. Specifically, APCT integrates an Adversarial Significance Identifier and a Target-guided Promptor. The Adversarial Significance Identifier is tasked with discerning token significance by integrating global contextual analysis, utilizing a structural salience index algorithm alongside an auxiliary supervisory mechanism. The Target-guided Promptor is responsible for accentuating the propensity for token discard within the self-attention mechanism, utilizing the value derived above, consequently directing the model's attention towards alternative segments in subsequent stages. By iteratively applying this strategy in multiple steps during training, the network progressively identifies and integrates an expanded array of object-associated patterns. Extensive experiments demonstrate that our method achieves state-of-the-art results on multiple corruption benchmarks.

NeurIPS Conference 2024 Conference Paper

Towards Next-Generation Logic Synthesis: A Scalable Neural Circuit Generation Framework

  • Zhihai Wang
  • Jie Wang
  • Qingyue Yang
  • Yinqi Bai
  • Xing Li
  • Lei Chen
  • Jianye Hao
  • Mingxuan Yuan

Logic Synthesis (LS) aims to generate an optimized logic circuit satisfying a given functionality, which generally consists of circuit translation and optimization. It is a challenging and fundamental combinatorial optimization problem in integrated circuit design. Traditional LS approaches rely on manually designed heuristics to tackle the LS task, while machine learning recently offers a promising approach towards next-generation logic synthesis via neural circuit generation and optimization. In this paper, we first revisit the application of differentiable neural architecture search (DNAS) methods to circuit generation and find from extensive experiments that existing DNAS methods struggle to exactly generate circuits, scale poorly to large circuits, and exhibit high sensitivity to hyper-parameters. We then provide three major insights into these challenges from extensive empirical analysis: 1) DNAS tends to overfit to too many skip-connections, consequently wasting a significant portion of the network's expressive capabilities; 2) DNAS suffers from a structure bias between the network architecture and the circuit's inherent structure, leading to inefficient search; 3) the learning difficulty of different input-output examples varies significantly, leading to severely imbalanced learning. To address these challenges in a systematic way, we propose a novel regularized triangle-shaped circuit network generation framework, which leverages our key insights for completely accurate and scalable circuit generation. Furthermore, we propose an evolutionary algorithm assisted by a reinforcement-learning-agent restarting technique for efficient and effective neural circuit optimization. Extensive experiments on four different circuit benchmarks demonstrate that our method can precisely generate circuits with up to 1200 nodes.
Moreover, our synthesized circuits significantly outperform the state-of-the-art results from several competitive winners in IWLS 2022 and 2023 competitions.

NeurIPS Conference 2024 Conference Paper

Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

  • Rui Yang
  • Jie Wang
  • Guoping Wu
  • Bin Li

Real-world offline datasets are often subject to data corruptions (such as noise or adversarial attacks) due to sensor failures or malicious attacks. Despite advances in robust offline reinforcement learning (RL), existing methods struggle to learn robust agents under high uncertainty caused by the diverse corrupted data (i.e., corrupted states, actions, rewards, and dynamics), leading to performance degradation in clean environments. To tackle this problem, we propose a novel robust variational Bayesian inference for offline RL (TRACER). It introduces Bayesian inference for the first time to capture the uncertainty via offline data for robustness against all types of data corruptions. Specifically, TRACER first models all corruptions as the uncertainty in the action-value function. Then, to capture such uncertainty, it uses all offline data as the observations to approximate the posterior distribution of the action-value function under a Bayesian inference framework. An appealing feature of TRACER is that it can distinguish corrupted data from clean data using an entropy-based uncertainty measure, since corrupted data often induces higher uncertainty and entropy. Based on the aforementioned measure, TRACER can regulate the loss associated with corrupted data to reduce its influence, thereby enhancing robustness and performance in clean environments. Experiments demonstrate that TRACER significantly outperforms several state-of-the-art approaches across both individual and simultaneous data corruptions.

EAAI Journal 2024 Journal Article

Unsupervised heat balance indicator construction based on variational autoencoder and its application to aluminum electrolysis process monitoring

  • Jie Wang
  • Shiwen Xie
  • Yongfang Xie
  • Xiaofang Chen

Heat balance plays a significant role in reflecting the health state of the aluminum electrolysis process (AEP). However, current methods hardly consider quantitative Heat Balance Indicator (HBI) construction from unlabeled data. In addition, constructing the HBI by learning the complex relationship between degraded features and large-scale HBI labels in a supervised manner is limited, because labeled data are scarce and annotations are expensive in practice. To quantitatively construct the HBI from unlabeled data, this paper proposes an unsupervised HBI construction method based on a variational autoencoder (VAE). Firstly, we propose a fuzzy evaluation strategy to estimate the tendency of cell temperature to highlight the trend of heat balance. Rather than simply using the latent features, we extract a feature representation of the heat balance state that considers not only the latent features but also the reconstruction error. Finally, the HBI is constructed by calculating the distance between the feature representations of the normal heat balance state and the degraded state. Applications to heat balance monitoring in a real-world aluminum electrolysis plant verify its effectiveness. The experimental results demonstrate that our proposed HBI construction method can better represent the heat balance state of the AEP; the average fault detection rate reaches 80% for the monitored electrolytic cells, an increase of more than 3% over traditional monitoring statistics.

NeurIPS Conference 2023 Conference Paper

A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability

  • Zijie Geng
  • Xijun Li
  • Jie Wang
  • Xiao Li
  • Yongdong Zhang
  • Feng Wu

In the past few years, there has been an explosive surge in the use of machine learning (ML) techniques to address combinatorial optimization (CO) problems, especially mixed-integer linear programs (MILPs). Despite the achievements, the limited availability of real-world instances often leads to sub-optimal decisions and biased solver assessments, which motivates a suite of synthetic MILP instance generation techniques. However, existing methods either rely heavily on expert-designed formulations or struggle to capture the rich features of real-world instances. To tackle this problem, we propose G2MILP, the first deep generative framework for MILP instances. Specifically, G2MILP represents MILP instances as bipartite graphs, and applies a masked variational autoencoder to iteratively corrupt and replace parts of the original graphs to generate new ones. An appealing feature of G2MILP is that it can learn to generate novel and realistic MILP instances without prior expert-designed formulations, while simultaneously preserving the structures and computational hardness of real-world datasets. Thus the generated instances can facilitate downstream tasks for enhancing MILP solvers under limited data availability. We design a suite of benchmarks to evaluate the quality of the generated MILP instances. Experiments demonstrate that our method can produce instances that closely resemble real-world datasets in terms of both structures and computational hardness. The deliverables are released at https://miralab-ustc.github.io/L2O-G2MILP.
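The bipartite representation mentioned above is the standard one for MILPs: one node set for constraints, one for variables, with an edge per nonzero coefficient. A minimal sketch on a made-up constraint matrix (node naming is illustrative, not from the paper):

```python
import numpy as np

# Toy constraint matrix: 2 constraints, 3 variables (values are made up)
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 4.0]])

# Bipartite edge list: (constraint node, variable node, coefficient)
edges = [(f"c{i}", f"x{j}", A[i, j])
         for i in range(A.shape[0])
         for j in range(A.shape[1])
         if A[i, j] != 0.0]

# One edge per nonzero coefficient
assert len(edges) == np.count_nonzero(A)
```

A generative model over such graphs can corrupt and re-predict constraint nodes and their incident edges, which is how the masked-VAE scheme produces new instances without an explicit problem formulation.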

NeurIPS Conference 2023 Conference Paper

Contextual Stochastic Bilevel Optimization

  • Yifan Hu
  • Jie Wang
  • Yao Xie
  • Andreas Krause
  • Daniel Kuhn

We introduce contextual stochastic bilevel optimization (CSBO) -- a stochastic bilevel optimization framework with the lower-level problem minimizing an expectation conditioned on some contextual information and the upper-level decision variable. This framework extends classical stochastic bilevel optimization when the lower-level decision maker responds optimally not only to the decision of the upper-level decision maker but also to some side information and when there are multiple or even infinitely many followers. It captures important applications such as meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). Due to the presence of contextual information, existing single-loop methods for classical stochastic bilevel optimization are unable to converge. To overcome this challenge, we introduce an efficient double-loop gradient method based on the Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities. When specialized to stochastic nonconvex optimization, our method matches existing lower bounds. For meta-learning, the complexity of our method does not depend on the number of tasks. Numerical experiments further validate our theoretical results.
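In symbols, a formulation consistent with this description (the notation is illustrative, not copied from the paper) is:

```latex
\min_{x \in \mathcal{X}} \; \mathbb{E}_{\xi}\!\left[ f\big(x,\, y^{*}(x,\xi),\, \xi\big) \right]
\quad \text{s.t.} \quad
y^{*}(x,\xi) \in \operatorname*{arg\,min}_{y} \; \mathbb{E}_{\eta \mid \xi}\!\left[ g(x, y, \eta) \right].
```

Here $\xi$ is the contextual (side) information: each realization of $\xi$ induces its own lower-level problem, which is why the framework covers multiple or even infinitely many followers, and why single-loop methods built for a single lower-level problem break down.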

AAAI Conference 2023 Conference Paper

Efficient Exploration in Resource-Restricted Reinforcement Learning

  • Zhihai Wang
  • Taoxing Pan
  • Qi Zhou
  • Jie Wang

In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable in each episode. Typical applications include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as soft actor critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources fast, and thus the subsequent exploration is severely restricted due to the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable usage of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving the sample efficiency by up to an order of magnitude.

NeurIPS Conference 2023 Conference Paper

Learning Rule-Induced Subgraph Representations for Inductive Relation Prediction

  • Tianyu Liu
  • Qitan Lv
  • Jie Wang
  • Shuling Yang
  • Hanzhu Chen

Inductive relation prediction (IRP)---where entities can be different during training and inference---has shown great power for completing evolving knowledge graphs. Existing works mainly focus on using graph neural networks (GNNs) to learn the representation of the subgraph induced from the target link, which can be seen as an implicit rule-mining process to measure the plausibility of the target link. However, these methods are not able to differentiate the target link and other links during message passing, hence the final subgraph representation will contain irrelevant rule information to the target link, which reduces the reasoning performance and severely hinders the applications for real-world scenarios. To tackle this problem, we propose a novel $\textit{single-source edge-wise}$ GNN model to learn the $\textbf{R}$ule-induc$\textbf{E}$d $\textbf{S}$ubgraph represen$\textbf{T}$ations ($\textbf{REST}$), which encodes relevant rules and eliminates irrelevant rules within the subgraph. Specifically, we propose a $\textit{single-source}$ initialization approach to initialize edge features only for the target link, which guarantees the relevance of mined rules and target link. Then we propose several RNN-based functions for $\textit{edge-wise}$ message passing to model the sequential property of mined rules. REST is a simple and effective approach with theoretical support to learn the $\textit{rule-induced subgraph representation}$. Moreover, REST does not need node labeling, which significantly accelerates the subgraph preprocessing time by up to $\textbf{11.66}\times$. Experiments on inductive relation prediction benchmarks demonstrate the effectiveness of our REST.

AAAI Conference 2023 Conference Paper

Robust Representation Learning by Clustering with Bisimulation Metrics for Visual Reinforcement Learning with Distractions

  • Qiyuan Liu
  • Qi Zhou
  • Rui Yang
  • Jie Wang

Recent work has shown that representation learning plays a critical role in sample-efficient reinforcement learning (RL) from pixels. Unfortunately, in real-world scenarios, representation learning is usually fragile to task-irrelevant distractions such as variations in background or viewpoint. To tackle this problem, we propose a novel clustering-based approach, namely Clustering with Bisimulation Metrics (CBM), which learns robust representations by grouping visual observations in the latent space. Specifically, CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments. Computing cluster assignments with bisimulation metrics enables CBM to capture task-relevant information, as bisimulation metrics quantify the behavioral similarity between observations. Moreover, CBM encourages the consistency of representations within each group, which facilitates filtering out task-irrelevant information and thus induces robust representations against distractions. An appealing feature is that CBM can achieve sample-efficient representation learning even if multiple distractions exist simultaneously. Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms and achieves state-of-the-art performance on both multiple and single distraction settings. The code is available at https://github.com/MIRALab-USTC/RL-CBM.
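The two alternating steps described above follow a generic prototype-clustering loop. The sketch below shows only that alternation; the real method measures bisimulation distances in a learned latent space, and plain Euclidean distance is substituted here purely for a runnable illustration (all data and names are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
obs = rng.standard_normal((100, 4))                   # stand-in latent observations
protos = obs[rng.choice(100, 3, replace=False)].copy()  # 3 initial prototypes

for _ in range(10):
    # Step 1: assign each observation to its nearest prototype
    # (CBM would use a bisimulation distance here, not Euclidean)
    d = np.linalg.norm(obs[:, None, :] - protos[None, :, :], axis=-1)
    assign = d.argmin(axis=1)
    # Step 2: refit each prototype from its current cluster members
    for k in range(3):
        if np.any(assign == k):
            protos[k] = obs[assign == k].mean(axis=0)

assert assign.shape == (100,)
assert protos.shape == (3, 4)
```

The point of swapping in a behavioral (bisimulation) distance at step 1 is that observations differing only in task-irrelevant details (background, viewpoint) end up in the same cluster, so the representation consistency enforced within each group filters out the distraction.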

NeurIPS Conference 2023 Conference Paper

State Sequences Prediction via Fourier Transform for Representation Learning

  • Mingxuan Ye
  • Yufei Kuang
  • Jie Wang
  • Yang Rui
  • Wengang Zhou
  • Houqiang Li
  • Feng Wu

While deep reinforcement learning (RL) has been demonstrated effective in solving complex control tasks, sample efficiency remains a key challenge due to the large amounts of data required for remarkable performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data for learning expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One of the appealing features of SPF is that it is simple to implement while not requiring storage of infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
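The core idea of the abstract above — supervising a representation with frequency-domain features of a state sequence rather than raw future states — can be sketched minimally in numpy. This is an illustrative approximation, not the paper's method: a finite window of future states stands in for the infinite-step sequence, and `fourier_prediction_target` is a hypothetical helper name.

```python
import numpy as np

def fourier_prediction_target(states, k=8):
    """Build a frequency-domain prediction target from a (T, d) window of
    future states: keep the first k real-FFT coefficients per dimension and
    split them into real/imaginary parts so a network can regress them."""
    coeffs = np.fft.rfft(states, axis=0)[:k]                    # (k, d) complex
    return np.concatenate([coeffs.real, coeffs.imag], axis=0)   # (2k, d) real

# toy 1-D trajectory: a smooth periodic signal, as in regular control tasks
states = np.sin(np.linspace(0, 4 * np.pi, 64))[:, None]
target = fourier_prediction_target(states, k=8)
print(target.shape)  # (16, 1)
```

A fixed number of low-frequency coefficients gives a compact target that captures the periodic structure of the trajectory without storing the full future sequence.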

AAAI Conference 2023 Conference Paper

Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval

  • Yizhen Chen
  • Jie Wang
  • Lijian Lin
  • Zhongang Qi
  • Jin Ma
  • Ying Shan

Vision-language alignment learning for video-text retrieval has attracted much attention in recent years. Most existing methods either transfer the knowledge of an image-text pretraining model to the video-text retrieval task without fully exploring the multi-modal information of videos, or simply fuse multi-modal features in a brute-force manner without explicit guidance. In this paper, we integrate multi-modal information in an explicit manner by tagging, and use the tags as anchors for better video-text alignment. Various pretrained experts are utilized to extract the information of multiple modalities, including object, person, motion, audio, etc. To take full advantage of this information, we propose the TABLE (TAgging Before aLignmEnt) network, which consists of a visual encoder, a tag encoder, a text encoder, and a tag-guiding cross-modal encoder for jointly encoding multi-frame visual features and multi-modal tag information. Furthermore, to strengthen the interaction between video and text, we build a joint cross-modal encoder with the triplet input of [vision, tag, text] and perform two additional supervised tasks, Video Text Matching (VTM) and Masked Language Modeling (MLM). Extensive experimental results demonstrate that the TABLE model achieves State-Of-The-Art (SOTA) performance on various video-text retrieval benchmarks, including MSR-VTT, MSVD, LSMDC and DiDeMo.

YNIMG Journal 2022 Journal Article

A novel technology for in vivo detection of cell type-specific neural connection with AQP1-encoding rAAV2-retro vector and metal-free MRI

  • Ning Zheng
  • Mei Li
  • Yang Wu
  • Challika Kaewborisuth
  • Zhen Li
  • Zhu Gui
  • Jinfeng Wu
  • Aoling Cai

A mammalian brain contains numerous neurons with distinct cell types for complex neural circuits. Virus-based circuit tracing tools are powerful in tracking the interaction among different brain regions. However, detecting brain-wide neural networks in vivo remains challenging since most viral tracing systems rely on postmortem optical imaging. We developed a novel approach that enables in vivo detection of brain-wide neural connections based on metal-free magnetic resonance imaging (MRI). The recombinant adeno-associated virus (rAAV) with retrograde ability, the rAAV2-retro, encoding the human water channel aquaporin 1 (AQP1) MRI reporter gene was generated to label neural connections. The mouse was micro-injected with the virus at the Caudate Putamen (CPU) region and subjected to detection with diffusion-weighted MRI (DWI). The prominent structure of the CPU-connected network was clearly defined. In combination with a Cre-loxP system, rAAV2-retro expressing Cre-dependent AQP1 provides a CPU-connected network of specific types of neurons. Here, we established a sensitive, metal-free MRI-based strategy for in vivo detection of cell type-specific neural connections in the whole brain, which could visualize the dynamic changes of neural networks in rodents and potentially in non-human primates.

IJCAI Conference 2022 Conference Paper

C3-STISR: Scene Text Image Super-resolution with Triple Clues

  • Minyi Zhao
  • Miao Wang
  • Fan Bai
  • Bingjia Li
  • Jie Wang
  • Shuigeng Zhou

Scene text image super-resolution (STISR) has been regarded as an important pre-processing task for text recognition from low-resolution scene text images. Most recent approaches use the recognizer's feedback as clues to guide super-resolution. However, directly using the recognition clue has two problems: 1) Compatibility: it is in the form of a probability distribution and has an obvious modal gap with STISR, a pixel-level task; 2) Inaccuracy: it usually contains wrong information, which misleads the main task and degrades super-resolution performance. In this paper, we present a novel method, C3-STISR, that jointly exploits the recognizer's feedback, visual, and linguistic information as clues to guide super-resolution. Here, the visual clue comes from the images of texts predicted by the recognizer, which is informative and more compatible with the STISR task; the linguistic clue is generated by a pre-trained character-level language model, which is able to correct the predicted texts. We design effective extraction and fusion mechanisms for the triple cross-modal clues to generate a comprehensive and unified guidance for super-resolution. Extensive experiments on TextZoom show that C3-STISR outperforms the SOTA methods in fidelity and recognition performance. Code is available at https://github.com/zhaominyiz/C3-STISR.

AAAI Conference 2022 Conference Paper

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

  • Yufei Kuang
  • Miao Lu
  • Jie Wang
  • Qi Zhou
  • Bin Li
  • Houqiang Li

Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm—namely, State-Conservative Policy Optimization (SCPO)—to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO include that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
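The "simple gradient-based regularizer" mentioned in the SCPO abstract can be illustrated with a minimal numpy sketch, assuming (as the abstract suggests) that sensitivity of the value estimate to state perturbations is what gets penalized. The finite-difference approximation and the name `state_gradient_penalty` are illustrative choices, not the paper's implementation.

```python
import numpy as np

def state_gradient_penalty(q_func, state, eps=1e-4):
    """Approximate the gradient of Q with respect to the state via central
    finite differences and return its squared norm as a regularization term.
    Penalizing this term discourages policies whose values swing sharply
    under small state perturbations (i.e., under dynamics disturbance)."""
    grad = np.zeros_like(state)
    for i in range(state.size):
        e = np.zeros_like(state)
        e[i] = eps
        grad[i] = (q_func(state + e) - q_func(state - e)) / (2 * eps)
    return float(grad @ grad)

# toy value function Q(s) = ||s||^2, whose exact state gradient is 2s
q = lambda s: float(s @ s)
penalty = state_gradient_penalty(q, np.array([1.0, 2.0]))
print(penalty)  # ≈ ||2s||^2 = 20.0
```

Because the penalty needs only evaluations of the critic at perturbed states, it requires no model of the disturbance itself, matching the abstract's claim of being model-free.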

IJCAI Conference 2022 Conference Paper

Learning Unforgotten Domain-Invariant Representations for Online Unsupervised Domain Adaptation

  • Cheng Feng
  • Chaoliang Zhong
  • Jie Wang
  • Ying Zhang
  • Jun Sun
  • Yasuto Yokota

Existing unsupervised domain adaptation (UDA) studies focus on transferring knowledge in an offline manner. However, many tasks involve online requirements, especially in real-time systems. In this paper, we discuss Online UDA (OUDA), which assumes that the target samples arrive sequentially as small batches. OUDA tasks are challenging for prior UDA methods since online training suffers from catastrophic forgetting, which leads to poor generalization. Intuitively, a good memory is a crucial factor in the success of OUDA. We formalize this intuition theoretically with a generalization bound where the OUDA target error can be bounded by the source error, the domain discrepancy distance, and a novel metric on forgetting in continuous online learning. Our theory illustrates the tradeoffs inherent in learning and remembering representations for OUDA. To minimize the proposed forgetting metric, we propose a novel source feature distillation (SFD) method which utilizes the source-only model as a teacher to guide the online training. In the experiments, we modify three UDA algorithms, i.e., DANN, CDAN, and MCC, and evaluate their performance on OUDA tasks with real-world datasets. By applying SFD, the performance of all baselines is significantly improved.

NeurIPS Conference 2022 Conference Paper

MsSVT: Mixed-scale Sparse Voxel Transformer for 3D Object Detection on Point Clouds

  • Shaocong Dong
  • Lihe Ding
  • Haiyang Wang
  • Tingfa Xu
  • Xinli Xu
  • Jie Wang
  • Ziyang Bian
  • Ying Wang

3D object detection from the LiDAR point cloud is fundamental to autonomous driving. Large-scale outdoor scenes usually feature significant variance in instance scales, thus requiring features rich in long-range and fine-grained information to support accurate detection. Recent detectors leverage the power of window-based transformers to model long-range dependencies but tend to blur out fine-grained details. To mitigate this gap, we present a novel Mixed-scale Sparse Voxel Transformer, named MsSVT, which can well capture both types of information simultaneously by the divide-and-conquer philosophy. Specifically, MsSVT explicitly divides attention heads into multiple groups, each in charge of attending to information within a particular range. The outputs of all groups are merged to obtain the final mixed-scale features. Moreover, we provide a novel chessboard sampling strategy to reduce the computational complexity of applying a window-based transformer in 3D voxel space. To improve efficiency, we also implement the voxel sampling and gathering operations sparsely with a hash map. Endowed with the powerful capability and high efficiency of modeling mixed-scale information, our single-stage detector built on top of MsSVT surprisingly outperforms state-of-the-art two-stage detectors on Waymo. Our project page: https://github.com/dscdyc/MsSVT.

AAAI Conference 2022 Conference Paper

Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic

  • Zhihai Wang
  • Jie Wang
  • Qi Zhou
  • Bin Li
  • Houqiang Li

Model-based reinforcement learning algorithms, which aim to learn a model of the environment to make decisions, are more sample efficient than their model-free counterparts. The sample efficiency of model-based approaches relies on whether the model can well approximate the environment. However, learning an accurate model is challenging, especially in complex and noisy environments. To tackle this problem, we propose the conservative model-based actor-critic (CMBAC), a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models. Specifically, CMBAC learns multiple estimates of the Q-value function from a set of inaccurate models and uses the average of the bottom-k estimates—a conservative estimate—to optimize the policy. An appealing feature of CMBAC is that the conservative estimates effectively encourage the agent to avoid unreliable "promising actions"—whose values are high in only a small fraction of the models. Experiments demonstrate that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks, and the proposed method is more robust than previous methods in noisy environments.
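The bottom-k averaging described above is straightforward to sketch. This is a minimal illustration of the aggregation step only (the ensemble values `q_ens` below are made-up numbers, and `conservative_q` is an illustrative name, not the paper's API):

```python
import numpy as np

def conservative_q(q_estimates, k):
    """Average the bottom-k Q-value estimates across an ensemble axis.
    Shape: (n_models, n_actions) -> (n_actions,)."""
    q = np.sort(np.asarray(q_estimates), axis=0)  # ascending over models
    return q[:k].mean(axis=0)

# five model-induced Q estimates for two candidate actions; action 1 looks
# "promising" in a single model (9.0) but poor in the rest
q_ens = np.array([[1.0, 9.0],
                  [1.2, 0.5],
                  [0.9, 0.4],
                  [1.1, 0.6],
                  [1.0, 0.5]])
print(conservative_q(q_ens, k=3))  # bottom-3 mean per action
```

Note how the single optimistic outlier for action 1 is excluded by the bottom-k average, so the conservative estimate prefers action 0, which is consistently valued across the ensemble.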

AIIM Journal 2022 Journal Article

The interactive fuzzy linguistic term set and its application in multi-attribute decision making

  • Dan Peng
  • Jie Wang
  • Donghai Liu
  • Yu Cheng

In multi-attribute decision making problems, different pieces of decision information interact with each other. This paper proposes an interactive fuzzy linguistic term set to describe such interactive information in multi-attribute decision making problems. The properties of the interactive fuzzy linguistic term set and its advantages for improving the consistency of decision information are discussed and interpreted from a geometric point of view. Meanwhile, some numerical examples are given to illustrate its application in dealing with interactive information in multi-attribute decision making problems, which can improve the effectiveness of the decision results and promote the development of artificial intelligence.

NeurIPS Conference 2022 Conference Paper

Towards Video Text Visual Question Answering: Benchmark and Baseline

  • Minyi Zhao
  • Bingjia Li
  • Jie Wang
  • Wanqing Li
  • Wenjing Zhou
  • Lan Zhang
  • Shijie Xuyang
  • Zhihang Yu

Several text-based visual question answering (TextVQA) benchmarks have been proposed in recent years for developing machines' ability to answer questions based on texts in images. However, models developed on these benchmarks cannot work effectively in many real-life scenarios (e.g., traffic monitoring, shopping ads and e-learning videos) where temporal reasoning ability is required. To this end, we propose a new task named Video Text Visual Question Answering (ViteVQA in short) that aims at answering questions by spatiotemporally reasoning over texts and visual information in a given video. In particular, on the one hand, we build the first ViteVQA benchmark dataset named M4-ViteVQA --- the abbreviation of Multi-category Multi-frame Multi-resolution Multi-modal benchmark for ViteVQA, which contains 7,620 video clips of 9 categories (i.e., shopping, traveling, driving, vlog, sport, advertisement, movie, game and talking) and 3 kinds of resolutions (i.e., 720p, 1080p and 1176x664), and 25,123 question-answer pairs. On the other hand, we develop a baseline method named T5-ViteVQA for the ViteVQA task. T5-ViteVQA consists of five transformers. It first extracts optical character recognition (OCR) tokens, question features, and video representations via two OCR transformers, one language transformer and one video-language transformer, respectively. Then, a multimodal fusion transformer and an answer generation module are applied to fuse multimodal information and generate the final prediction. Extensive experiments on M4-ViteVQA demonstrate the superiority of T5-ViteVQA over the existing approaches to TextVQA and VQA tasks. The ViteVQA benchmark is available at https://github.com/bytedance/VTVQA.

NeurIPS Conference 2021 Conference Paper

ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs

  • Zhanqiu Zhang
  • Jie Wang
  • Jiajun Chen
  • Shuiwang Ji
  • Feng Wu

Query embedding (QE)---which aims to embed entities and first-order logical (FOL) queries in low-dimensional spaces---has shown great power in multi-hop reasoning over knowledge graphs. Recently, embedding entities and queries with geometric shapes has become a promising direction, as geometric shapes can naturally represent answer sets of queries and logical relationships among them. However, existing geometry-based models have difficulty in modeling queries with negation, which significantly limits their applicability. To address this challenge, we propose a novel query embedding model, namely \textbf{Con}e \textbf{E}mbeddings (ConE), which is the first geometry-based QE model that can handle all the FOL operations, including conjunction, disjunction, and negation. Specifically, ConE represents entities and queries as Cartesian products of two-dimensional cones, where the intersection and union of cones naturally model the conjunction and disjunction operations. By further noticing that the closure of the complement of cones remains cones, we design geometric complement operators in the embedding space for the negation operations. Experiments demonstrate that ConE significantly outperforms existing state-of-the-art methods on benchmark datasets.

AAAI Conference 2021 Conference Paper

Topology-Aware Correlations Between Relations for Inductive Link Prediction in Knowledge Graphs

  • Jiajun Chen
  • Huarui He
  • Feng Wu
  • Jie Wang

Inductive link prediction—where entities during training and inference stages can be different—has been shown to be promising for completing continuously evolving knowledge graphs. Existing models of inductive reasoning mainly focus on predicting missing links by learning logical rules. However, many existing approaches do not take into account semantic correlations between relations, which are commonly seen in real-world knowledge graphs. To address this challenge, we propose a novel inductive reasoning approach, namely TACT, which can effectively exploit Topology-Aware CorrelaTions between relations in an entity-independent manner. TACT is inspired by the observation that the semantic correlation between two relations is highly correlated to their topological structure in knowledge graphs. Specifically, we categorize all relation pairs into several topological patterns, and then propose a Relational Correlation Network (RCN) to learn the importance of the different patterns for inductive link prediction. Experiments demonstrate that TACT can effectively model semantic correlations between relations, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the inductive link prediction task.

AAAI Conference 2020 Conference Paper

An Iterative Polishing Framework Based on Quality Aware Masked Language Model for Chinese Poetry Generation

  • Liming Deng
  • Jie Wang
  • Hangming Liang
  • Hui Chen
  • Zhiqiang Xie
  • Bojin Zhuang
  • Shaojun Wang
  • Jing Xiao

Owing to its unique literary and aesthetic characteristics, automatic generation of Chinese poetry remains challenging in artificial intelligence and can hardly be realized straightforwardly by end-to-end methods. In this paper, we propose a novel iterative polishing framework for high-quality Chinese poetry generation. In the first stage, an encoder-decoder structure is utilized to generate a poem draft. Afterwards, our proposed Quality-Aware Masked Language Model (QA-MLM) is employed to polish the draft towards higher quality in terms of linguistics and literalness. Based on a multi-task learning scheme, QA-MLM is able to determine whether polishing is needed based on the poem draft. Furthermore, QA-MLM is able to localize improper characters of the poem draft and substitute them with newly predicted ones. Benefiting from the masked language model structure, QA-MLM incorporates global context information into the polishing process, which yields more appropriate polishing results than unidirectional sequential decoding. Moreover, the iterative polishing process terminates automatically when QA-MLM regards the processed poem as qualified. Both human and automatic evaluations have been conducted, and the results demonstrate that our approach effectively improves the performance of the encoder-decoder structure.

AAAI Conference 2020 Conference Paper

D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems

  • Taoxing Pan
  • Jun Liu
  • Jie Wang

Decentralized optimization algorithms have attracted intensive interest recently, as they have balanced communication patterns, especially when solving large-scale machine learning problems. The Stochastic Path Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm that achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a similar gradient computation cost—that is, $O(\epsilon^{-3})$ for finding an $\epsilon$-approximate first-order stationary point—to its centralized counterpart. To the best of our knowledge, D-SPIDER-SFO achieves the state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of the computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method.

AAAI Conference 2020 Conference Paper

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

  • Qi Zhou
  • Houqiang Li
  • Jie Wang

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose Policy Optimization with Model-Based Uncertainty (POMBU)—a novel model-based approach—that can effectively improve the asymptotic performance using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of the policy to inaccurate models. Experiments show that POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.

NeurIPS Conference 2020 Conference Paper

Duality-Induced Regularizer for Tensor Factorization Based Knowledge Graph Completion

  • Zhanqiu Zhang
  • Jianyu Cai
  • Jie Wang

Tensor factorization based models have shown great power in knowledge graph completion (KGC). However, their performance usually suffers seriously from the overfitting problem. This motivates various regularizers---such as the squared Frobenius norm and tensor nuclear norm regularizers---but their limited applicability significantly restricts their practical usage. To address this challenge, we propose a novel regularizer---namely, \textbf{DU}ality-induced \textbf{R}egul\textbf{A}rizer (DURA)---which is not only effective in improving the performance of existing models but also widely applicable to various methods. The major novelty of DURA is based on the observation that, for an existing tensor factorization based KGC model (\textit{primal}), there is often another distance based KGC model (\textit{dual}) closely associated with it.

AAAI Conference 2020 Conference Paper

Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

  • Zhanqiu Zhang
  • Jianyu Cai
  • Yongdong Zhang
  • Jie Wang

Knowledge graph embedding, which aims to represent entities and relations as low-dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model—namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)—which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
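The radial/angular split described above can be sketched as a triple-scoring function. The exact form below is an assumption for illustration (a modulus term for hierarchy levels plus a phase term for same-level separation); consult the paper for HAKE's actual score.

```python
import numpy as np

def hake_style_score(h_mod, h_phase, r_mod, r_phase, t_mod, t_phase, lam=1.0):
    """HAKE-style polar-coordinate score (assumed form, for illustration):
    a radial (modulus) distance penalizes hierarchy-level mismatch and an
    angular (phase) distance separates entities at the same level."""
    d_radial = np.linalg.norm(h_mod * r_mod - t_mod)                 # level term
    d_angular = np.abs(np.sin((h_phase + r_phase - t_phase) / 2)).sum()
    return -(d_radial + lam * d_angular)  # higher score = more plausible triple

# a triple whose moduli and phases match exactly scores 0; mismatches score < 0
perfect = hake_style_score(np.ones(4), np.zeros(4),
                           np.ones(4), np.zeros(4),
                           np.ones(4), np.zeros(4))
```

Using `sin` of the half phase difference makes the angular term periodic in 2π, so phases that differ by a full turn are treated as identical.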

AAAI Conference 2020 Short Paper

Learning Sense Representation from Word Representation for Unsupervised Word Sense Disambiguation (Student Abstract)

  • Jie Wang
  • Zhenxin Fu
  • Moxin Li
  • Haisong Zhang
  • Dongyan Zhao
  • Rui Yan

Unsupervised WSD methods do not rely on annotated training datasets and can use WordNet. Since each ambiguous word in the WSD task exists in WordNet and each sense of the word has a gloss, we propose SGM and MGM to learn sense representations for words in WordNet using the glosses. In the WSD task, we calculate the similarity between each sense of the ambiguous word and its context to select the sense with the highest similarity. We evaluate our method on several benchmark WSD datasets and achieve better performance than the state-of-the-art unsupervised WSD systems.

NeurIPS Conference 2020 Conference Paper

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

  • Qi Zhou
  • Yufei Kuang
  • Zherui Qiu
  • Houqiang Li
  • Jie Wang

Many recent reinforcement learning (RL) methods learn stochastic policies with entropy regularization for exploration and robustness. However, in continuous action spaces, integrating entropy regularization with expressive policies is challenging and usually requires complex inference procedures. To tackle this problem, we propose a novel regularization method that is compatible with a broad range of expressive policy architectures. An appealing feature is that the estimation of our regularization terms is simple and efficient even when the policy distributions are unknown. We show that our approach can effectively promote exploration in continuous action spaces. Based on our regularization, we propose an off-policy actor-critic algorithm. Experiments demonstrate that the proposed algorithm outperforms state-of-the-art regularized RL methods in continuous control tasks.

YNIMG Journal 2019 Journal Article

Detection of neural connections with ex vivo MRI using a ferritin-encoding trans-synaptic virus

  • Ning Zheng
  • Peng Su
  • Yue Liu
  • Huadong Wang
  • Binbin Nie
  • Xiaohui Fang
  • Yue Xu
  • Kunzhang Lin

The elucidation of neural networks is essential to understanding the mechanisms of brain functions and brain disorders. Neurotropic virus-based trans-synaptic tracing tools have become an effective method for dissecting the structure and analyzing the function of neural circuitry. However, these tracing systems rely on fluorescent signals, making it hard to visualize the panorama of the labeled networks in the mammalian brain in vivo. One MRI method, Diffusion Tensor Imaging (DTI), is capable of imaging the networks of the whole brain in live animals but provides no information about anatomical connections through synapses. In this report, a chimeric gene coding for ferritin and enhanced green fluorescent protein (EGFP) was integrated into Vesicular stomatitis virus (VSV), a neurotropic virus that is able to spread anterogradely in synaptically connected networks. Four days after the animal was injected with the recombinant VSV (rVSV), rVSV-Ferritin-EGFP, into the somatosensory cortex (SC), the labeled neural network was visualized in the postmortem whole brain with a T2-weighted MRI sequence. The modified virus transmitted from the SC to synaptically connected downstream regions. The results demonstrate that rVSV-Ferritin-EGFP could be used as a bimodal imaging vector for detecting synaptically connected neural networks with both ex vivo MRI and fluorescent imaging. The strategy in the current study has the potential to longitudinally monitor the global structure of a given neural network in living animals.

AAAI Conference 2019 Conference Paper

Gated Residual Recurrent Graph Neural Networks for Traffic Prediction

  • Cen Chen
  • Kenli Li
  • Sin G. Teo
  • Xiaofeng Zou
  • Kang Wang
  • Jie Wang
  • Zeng Zeng

Traffic prediction is of great importance to traffic management and public safety, and very challenging as it is affected by many complex factors, such as the spatial dependency of complicated road networks and temporal dynamics. These factors make traffic prediction a challenging task due to the uncertainty and complexity of traffic states. In the literature, many research works have applied deep learning methods to traffic prediction by combining convolutional neural networks (CNNs) with recurrent neural networks (RNNs), in which CNNs are utilized for spatial dependency and RNNs for temporal dynamics. However, such combinations cannot capture the connectivity and globality of traffic networks. In this paper, we first propose to adopt residual recurrent graph neural networks (Res-RGNN), which can capture graph-based spatial dependencies and temporal dynamics jointly. Due to gradient vanishing, RNNs struggle to capture periodic temporal correlations. Hence, we further propose a novel hop scheme within Res-RGNN to exploit the periodic temporal dependencies. Based on Res-RGNN and hop Res-RGNN, we finally propose a novel end-to-end framework of multiple Res-RGNNs, referred to as “MRes-RGNN”, for traffic prediction. Experimental results on two traffic datasets demonstrate that the proposed MRes-RGNN significantly outperforms state-of-the-art methods.

JMLR Journal 2019 Journal Article

Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

  • Bin Hong
  • Weizhong Zhang
  • Wei Liu
  • Jieping Ye
  • Deng Cai
  • Xiaofei He
  • Jie Wang

Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified inactive samples and features from the training phase, leading to substantial savings in the computational cost without sacrificing the accuracy. Moreover, we show that our method can be extended to multi-class sparse support vector machines. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVMs and multi-class sparse SVMs. Experiments on both synthetic and real data sets demonstrate that our approach significantly outperforms state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.

JMLR Journal 2019 Journal Article

Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets

  • Jie Wang
  • Zhanqiu Zhang
  • Jieping Ye

Sparse-Group Lasso (SGL) has been shown to be a powerful regression technique for simultaneously discovering group and within-group sparse patterns by using a combination of the $\ell_1$ and $\ell_2$ norms. However, in large-scale applications, the complexity of the regularizers entails great computational challenges. In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. The two-layer reduction is able to quickly identify the inactive groups and the inactive features, respectively, which are guaranteed to be absent from the sparse representation and can be removed from the optimization. Existing feature reduction methods are only applicable to sparse models with one sparsity-inducing regularizer. To the best of our knowledge, TLFre is the first method capable of dealing with multiple sparsity-inducing regularizers. Moreover, TLFre has a very low computational cost and can be integrated with any existing solvers. We also develop a screening method---called DPC (decomposition of convex set)---for nonnegative Lasso. Experiments on both synthetic and real data sets show that TLFre and DPC improve the efficiency of SGL and nonnegative Lasso by several orders of magnitude.

JMLR Journal 2015 Journal Article

Lasso Screening Rules via Dual Polytope Projection

  • Jie Wang
  • Peter Wonka
  • Jieping Ye

Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues proposed the SAFE rules, which are able to quickly identify the inactive predictors, i.e., predictors that have $0$ components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors correspond to inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no exact screening rule for group Lasso. We have evaluated our screening rule using synthetic and real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso.
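The geometric idea behind DPP fits in a few lines of NumPy. Because the dual optimum is the projection of $y/\lambda$ onto the dual feasible polytope, nonexpansiveness of projections confines $\theta^*(\lambda)$ to a ball around the known optimum at $\lambda_{\max}$. The sketch below is a hedged illustration of that ball-based test, not a verbatim reproduction of the paper's rules; the helper name `dpp_screen` and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Smallest lambda for which the all-zero vector already solves the Lasso.
lam_max = np.abs(X.T @ y).max()

# At lam_max the dual optimum is known in closed form: theta0 = y / lam_max.
theta0 = y / lam_max

def dpp_screen(lam):
    """Boolean mask of features certified inactive at regularization lam.

    theta*(lam) is the projection of y/lam onto the dual feasible polytope,
    so by nonexpansiveness of projections onto convex sets:
        ||theta*(lam) - theta0|| <= ||y|| * |1/lam - 1/lam_max|.
    Feature j is safely discarded whenever the implied upper bound on
    |x_j^T theta*(lam)| stays strictly below 1 (the KKT inactivity test).
    """
    r = np.linalg.norm(y) * abs(1.0 / lam - 1.0 / lam_max)
    upper = np.abs(X.T @ theta0) + np.linalg.norm(X, axis=0) * r
    return upper < 1.0

for frac in (0.95, 0.8, 0.5):
    mask = dpp_screen(frac * lam_max)
    print(f"lambda = {frac:.2f}*lambda_max: discarded {mask.sum()} of {p} features")
```

As lambda moves away from lambda_max the ball radius grows and the rule discards fewer features, which is why sequential variants of such rules re-center the ball at the previous solution along the regularization path.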

NeurIPS Conference 2015 Conference Paper

Multi-Layer Feature Reduction for Tree Structured Group Lasso via Hierarchical Projection

  • Jie Wang
  • Jieping Ye

Tree structured group Lasso (TGL) is a powerful technique for uncovering tree-structured sparsity over the features, where each node encodes a group of features. It has been applied successfully in many real-world applications. However, with extremely large feature dimensions, solving TGL remains a significant challenge due to its highly complicated regularizer. In this paper, we propose a novel Multi-Layer Feature reduction method (MLFre) to quickly identify the inactive nodes (the groups of features with zero coefficients in the solution) hierarchically in a top-down fashion; these nodes are guaranteed to be irrelevant to the response and can thus be removed from the optimization without sacrificing accuracy. The major challenge in developing such testing rules arises from the overlap between parent and child nodes. Via a novel hierarchical projection algorithm, MLFre is able to test each node independently of its ancestor nodes. Moreover, we can integrate MLFre---which has a low computational cost---with any existing solvers. Experiments on both synthetic and real data sets demonstrate that the speedup gained by MLFre can be orders of magnitude.

NeurIPS Conference 2014 Conference Paper

A Safe Screening Rule for Sparse Logistic Regression

  • Jie Wang
  • Jiayu Zhou
  • Jun Liu
  • Peter Wonka
  • Jieping Ye

The l1-regularized logistic regression (or sparse logistic regression) is a widely used method for simultaneous classification and feature selection. Although many recent efforts have been devoted to its efficient implementation, its application to high-dimensional data still poses significant challenges. In this paper, we present a fast and effective sparse logistic regression screening rule (Slores) to identify the zero components in the solution vector, which may lead to a substantial reduction in the number of features entered into the optimization. An appealing feature of Slores is that the data set needs to be scanned only once to run the screening, and its computational cost is negligible compared to that of solving the sparse logistic regression problem. Moreover, Slores is independent of solvers for sparse logistic regression and can thus be integrated with any existing solver to improve efficiency. We have evaluated Slores using high-dimensional data sets from different applications. Extensive experimental results demonstrate that Slores outperforms existing state-of-the-art screening rules and that the efficiency of solving sparse logistic regression is generally improved by an order of magnitude.
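Screening rules of this kind rest on the KKT condition for the l1-regularized logistic loss: a coefficient is zero whenever the corresponding gradient coordinate is strictly inside the interval $[-\lambda, \lambda]$, and the zero vector solves the whole problem once $\lambda \ge \lambda_{\max} = \|\nabla \ell(0)\|_\infty$. The NumPy sketch below illustrates only this underlying condition, not Slores itself; the toy data and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 100
X = rng.standard_normal((n, p))
y = (rng.random(n) < 0.5).astype(float)   # labels in {0, 1}

# Gradient of the logistic loss at beta = 0: the predicted probability
# sigma(0) = 0.5 for every sample, so grad0 = X^T (0.5 - y).
grad0 = X.T @ (0.5 - y)

# KKT: beta = 0 solves the l1-regularized problem iff lam >= ||grad0||_inf.
lam_max = np.abs(grad0).max()

# Features whose gradient coordinate is far below lam can only become active
# if the gradient drifts enough at the optimum; rules like Slores bound that
# drift from a single pass over the data. Here we just rank the features.
scores = np.abs(grad0)
print("lam_max =", lam_max)
print("10 weakest features:", np.argsort(scores)[:10])
```

A quick sanity check of the claim: for any lambda slightly above `lam_max`, perturbing any single coordinate of the zero vector cannot decrease the regularized objective, since the l1 penalty grows faster than the loss can fall.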

NeurIPS Conference 2014 Conference Paper

Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets

  • Jie Wang
  • Jieping Ye

Sparse-Group Lasso (SGL) has been shown to be a powerful regression technique for simultaneously discovering group and within-group sparse patterns by using a combination of the l1 and l2 norms. However, in large-scale applications, the complexity of the regularizers entails great computational challenges. In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. The two-layer reduction is able to quickly identify the inactive groups and the inactive features, respectively, which are guaranteed to be absent from the sparse representation and can be removed from the optimization. Existing feature reduction methods are only applicable to sparse models with one sparsity-inducing regularizer. To the best of our knowledge, TLFre is the first method capable of dealing with multiple sparsity-inducing regularizers. Moreover, TLFre has a very low computational cost and can be integrated with any existing solvers. Experiments on both synthetic and real data sets show that TLFre improves the efficiency of SGL by orders of magnitude.

IJCAI Conference 2013 Conference Paper

Cross-Domain Collaborative Filtering via Bilinear Multilevel Analysis

  • Liang Hu
  • Jian Cao
  • Guandong Xu
  • Jie Wang
  • Zhiping Gu
  • Longbing Cao

Cross-domain collaborative filtering (CDCF), which aims to leverage data from multiple domains to relieve the data sparsity issue, has become an emerging research topic in recent years. However, current CDCF methods mainly consider user and item factors but largely neglect the heterogeneity of domains, which may lead to improper knowledge transfer. To address this problem, we propose a novel CDCF model, the Bilinear Multilevel Analysis (BLMA), which seamlessly introduces multilevel analysis theory to the most successful collaborative filtering method, matrix factorization (MF). Specifically, BLMA addresses the determinants of ratings from a hierarchical view by jointly considering domain, community, and user effects, thereby overcoming the issues caused by traditional MF approaches. Moreover, a parallel Gibbs sampler is provided to learn these effects. Finally, experiments conducted on a real-world dataset demonstrate the superiority of BLMA over other state-of-the-art methods.

NeurIPS Conference 2013 Conference Paper

Lasso Screening Rules via Dual Polytope Projection

  • Jie Wang
  • Jiayu Zhou
  • Peter Wonka
  • Jieping Ye

Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues proposed the SAFE rules, which are able to quickly identify the inactive predictors, i.e., predictors that have $0$ components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors correspond to inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no "exact" screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso.

ICRA Conference 2004 Conference Paper

Calibrating Human Hand for Teleoperating the HIT/DLR Hand

  • Haiying Hu
  • Xiaohui Gao
  • Jiawei Li
  • Jie Wang
  • Hong Liu 0002

Using human actions to guide robot execution can greatly reduce planning complexity. We calibrate a human hand model and map its motion to a four-finger dexterous robot hand. The parameters of the human hand model are determined by an open-loop kinematic calibration method based on a vision system. We analyze the kinematic differences between the human hand and the dexterous robot hand, and present a modified fingertip mapping to handle the partial overlap of the fingertip workspaces. 3D graphic simulation and manipulation experiments show that the human hand model and the mapping method are sufficiently accurate for teleoperation tasks.