Arrow Research search

Author name cluster

Yan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

154 papers
2 author rows

Possible papers

154

JBHI Journal 2026 Journal Article

A Dynamic Multi-Scale Hypergraph Learning Framework Driven by Features and Structures for ceRNA-Disease Association Prediction

  • Xin-Fei Wang
  • Lan Huang
  • Yan Wang
  • Ren-Chu Guan
  • Zhu-Hong You
  • Feng-Feng Zhou
  • Yu-Qing Li
  • Yuan Fu

Competitive endogenous RNA (ceRNA) networks are pivotal for uncovering disease molecular mechanisms. Graph representation learning is a cornerstone for modeling biological regulatory networks and predicting disease-related biomarkers. However, current methods face challenges: traditional graph neural networks (GNNs) rely on low-order graph structures, which struggle to capture high-order molecular interactions, resulting in topological information loss; shallow GNNs fail to model long-range dependencies, while deep architectures suffer from over-smoothing, limiting complex regulatory expression; static embeddings overlook dynamic molecular interactions, reducing biomarker accuracy. These limitations highlight the need for advanced graph learning frameworks. To address these challenges, we propose DMHLF, a Dynamic Multi-scale Hypergraph Learning Framework for predicting disease-associated ceRNA biomarkers. The framework first integrates multiple regulatory relationships among miRNAs, lncRNAs, circRNAs, mRNAs, and diseases to construct disease-specific ceRNA regulatory networks, capturing local and global regulatory patterns through multi-hop hyperedges. Subsequently, we devise a Hypergraph-Weighted Dynamic Random Walk (HEDRW) method to dynamically extract node meta-embeddings that encode high-order regulatory information. Concurrently, we extend Eigen-GNN spectral analysis to hypergraph structures, incorporating a residual-enhanced hypergraph neural network to preserve the global topological properties of shallow hypergraphs. Finally, a cross-scale attention mechanism aligns and fuses multi-scale features to generate high-quality node embeddings for disease-ceRNA association prediction. Experiments on diverse datasets demonstrate that DMHLF significantly outperforms existing methods. A case study further validates the framework’s efficacy in identifying disease-related ceRNA biomarkers, providing a reliable predictive tool for biomedical research.

TMLR Journal 2026 Journal Article

AC-PKAN: Attention-Enhanced and Chebyshev Polynomial-Based Physics-Informed Kolmogorov–Arnold Networks

  • Hangwei Zhang
  • Zhimu Huang
  • Yan Wang

Kolmogorov–Arnold Networks (KANs) have recently shown promise for solving partial differential equations (PDEs). Yet their original formulation is computationally and memory intensive, motivating the introduction of Chebyshev Type-I-based KANs (Chebyshev1KANs). Although Chebyshev1KANs have outperformed the vanilla KAN architecture, our rigorous theoretical analysis reveals that they still suffer from rank collapse, ultimately limiting their expressive capacity. To overcome these limitations, we enhance Chebyshev1KANs by integrating wavelet-activated MLPs with learnable parameters and an internal attention mechanism. We prove that this design preserves a full-rank Jacobian and is capable of approximating solutions to PDEs of arbitrary order. Furthermore, to alleviate the loss instability and imbalance introduced by the Chebyshev polynomial basis, we externally incorporate a Residual Gradient Attention (RGA) mechanism that dynamically re-weights individual loss terms according to their gradient norms and residual magnitudes. By jointly leveraging internal and external attention, we present AC-PKAN, a novel architecture that constitutes an enhancement to weakly supervised Physics-Informed Neural Networks (PINNs) and extends the expressive power of KANs. Experimental results from nine benchmark tasks across three domains show that AC-PKAN consistently outperforms or matches state-of-the-art models such as PINNsFormer, establishing it as a highly effective tool for solving complex real-world engineering problems in zero-data or data-sparse regimes. The code will be made publicly available upon acceptance.

AAAI Conference 2026 Conference Paper

C-GNN-PRUNE: A Unified Graph-Based Framework for Structure-Aware Pruning of Mixture-of-Experts Models

  • Lin Li
  • Yan Wang
  • Zhuopeng Wang

The Mixture-of-Experts (MoE) architecture has emerged as a promising paradigm for scaling large language models (LLMs) by activating only a sparse subset of experts per input. However, its massive parameter size remains a major obstacle to efficient deployment. Existing pruning methods often ignore two key aspects: the intricate structural dependencies among experts and the heterogeneous importance of different layers. To tackle these issues, we propose C-GNN-PRUNE, a unified and structure-aware compression framework tailored for MoE models. Our method introduces an Entropy-Guided Allocation Module that dynamically assigns pruning budgets by leveraging expert activation entropy, enabling adaptive handling of inter-layer heterogeneity. To preserve structural collaboration patterns, we construct an expert interaction graph that fuses functional similarity and routing behavior, and employ a GNN-Based Embedding Module to learn structure-aware expert representations. These embeddings, along with co-activation patterns, are fed into a Community Detection Module to identify expert clusters for structured pruning. Finally, an Activation-Aware Selection Module retains the most critical experts in each community, balancing sparsity and expressiveness. Experiments on multiple open-source MoE models demonstrate that C-GNN-PRUNE consistently outperforms prior methods under various pruning ratios, achieving better trade-offs between compression and accuracy. This framework provides a modular and effective solution for structure-preserving compression of large-scale MoE models.

AAAI Conference 2026 Conference Paper

Commonality in Few: Few-Shot Multimodal Anomaly Detection via Hypergraph-Enhanced Memory

  • Yuxuan Lin
  • Hanjing Yan
  • Xuan Tong
  • Yang Chang
  • Huanzhen Wang
  • Ziheng Zhou
  • Shuyong Gao
  • Yan Wang

Few-shot multimodal industrial anomaly detection is a critical yet underexplored task, offering the ability to quickly adapt to complex industrial scenarios. In few-shot settings, insufficient training samples often fail to cover the diverse patterns present in test samples. This challenge can be mitigated by extracting structural commonality from a small number of training samples. In this paper, we propose a novel few-shot unsupervised multimodal industrial anomaly detection method based on structural commonality, CIF (Commonality In Few). To extract intra-class structural information, we employ hypergraphs, which are capable of modeling higher-order correlations, to capture the structural commonality within training samples, and use a memory bank to store this intra-class structural prior. Firstly, we design a semantic-aware hypergraph construction module tailored for single-semantic industrial images, from which we extract common structures to guide the construction of the memory bank. Secondly, we use a training-free hypergraph message passing module to update the visual features of test samples, reducing the distribution gap between test features and features in the memory bank. We further propose a hyperedge-guided memory search module, which utilizes structural information to assist the memory search process and reduce the false positive rate. Experimental results on the MVTec 3D-AD dataset and the Eyecandies dataset show that our method outperforms the state-of-the-art (SOTA) methods in few-shot settings.

TCS Journal 2026 Journal Article

Completely independent spanning trees in the line graph of complete multipartite graphs

  • Hao Wang
  • Yan Wang
  • Baolei Cheng
  • Jianxi Fan

Spanning trees T_1, T_2, …, T_t are completely independent spanning trees (CISTs) if and only if they are edge-disjoint and each node is internal in at most one tree. CISTs have a wide range of applications in routing protection, data transmission, etc., and can improve reliability, fault tolerance, and information security. Line graphs have received increasing attention in recent years for the construction of multiple CISTs, as they are more likely to satisfy the structural conditions required for their existence. This paper introduces an algorithm for constructing multiple CISTs in the line graph of the complete tripartite graph K_{n_3,n_2,n_1} (denoted by L(K_{n_3,n_2,n_1})), using multiple two-dimensional matrices to guide the construction process. Furthermore, it presents a method for constructing multiple CISTs in the line graph of the complete multipartite graph K_{n_φ,n_{φ−1},…,n_1} (denoted by L(K_{n_φ,n_{φ−1},…,n_1}), when φ ≥ 4), utilizing edge-disjoint Hamiltonian cycles of the complete graph. In our simulation study, we employed multiple CISTs as transmission paths and compared their performance with shortest-path routing in terms of transmission latency and resilience to node failures.
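The CIST definition in this abstract is directly checkable. As a minimal illustration (hypothetical, not from the paper), the sketch below verifies both conditions for trees given as edge lists: pairwise edge-disjointness, and each vertex being internal (degree ≥ 2) in at most one tree.

```python
# Hypothetical sketch of the CIST definition above; not the paper's algorithm.
from collections import Counter

def internal_vertices(tree_edges):
    """Vertices with degree >= 2 in the tree, i.e. non-leaves."""
    deg = Counter()
    for u, v in tree_edges:
        deg[u] += 1
        deg[v] += 1
    return {x for x, d in deg.items() if d >= 2}

def are_cists(trees):
    """Check both CIST conditions over a list of spanning trees
    (each an edge list on the same vertex set)."""
    seen_edges = set()
    seen_internal = set()
    for tree in trees:
        for u, v in tree:
            e = frozenset((u, v))
            if e in seen_edges:       # shared edge -> not edge-disjoint
                return False
            seen_edges.add(e)
        internal = internal_vertices(tree)
        if internal & seen_internal:  # vertex internal in two trees
            return False
        seen_internal |= internal
    return True

# Two path spanning trees of K4 on vertices 0..3:
t1 = [(0, 1), (1, 2), (2, 3)]  # internal vertices: {1, 2}
t2 = [(1, 3), (3, 0), (0, 2)]  # internal vertices: {0, 3}
print(are_cists([t1, t2]))     # -> True
```

This only checks the defining property; constructing CISTs in L(K_{n_3,n_2,n_1}), as the paper does, is the hard part.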

AAAI Conference 2026 Conference Paper

Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation

  • Hongyang Liu
  • Zhu Sun
  • Tianjun Wei
  • Yan Wang
  • Jiajie Zhu
  • Xinghua Qu

Recent advances in large language models (LLMs) have enabled realistic user simulators for developing and evaluating recommender systems (RSs). However, existing LLM-based simulators for RSs face two major limitations: (1) static and single-step prompt-based inference that leads to inaccurate and incomplete user profile construction; (2) an unrealistic, single-round recommendation-feedback interaction pattern that fails to capture real-world scenarios. To address these limitations, we propose DGDPO (Diagnostic-Guided Dynamic Profile Optimization), a novel framework that constructs user profiles through a dynamic and iterative optimization process to enhance simulation fidelity. Specifically, DGDPO incorporates two core modules within each optimization loop: first, a specialized LLM-based diagnostic module, calibrated through our novel training strategy, accurately identifies specific defects in the user profile. Subsequently, a generalized LLM-based treatment module analyzes the diagnosed defect and generates targeted suggestions to refine the profile. Furthermore, unlike existing LLM-based user simulators that are limited to single-round interactions, we are the first to integrate DGDPO with sequential recommenders, enabling a bidirectional evolution where user profiles and recommendation strategies adapt to each other over multi-round interactions. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness of our proposed framework.

AAAI Conference 2026 Conference Paper

DualFete: Revisiting Teacher-Student Interactions from a Feedback Perspective for Semi-supervised Medical Image Segmentation

  • Le Yi
  • Wei Huang
  • Lei Zhang
  • Kefu Zhao
  • Yan Wang
  • Zizhou Wang

The teacher-student paradigm has emerged as a canonical framework in semi-supervised learning. When applied to medical image segmentation, the paradigm faces challenges due to inherent image ambiguities, making it particularly vulnerable to erroneous supervision. Crucially, the student's iterative reconfirmation of these errors leads to self-reinforcing bias. While some studies attempt to mitigate this bias, they often rely on external modifications to the conventional teacher-student framework, overlooking its intrinsic potential for error correction. In response, this work introduces a feedback mechanism into the teacher-student framework to counteract error reconfirmations. Here, the student provides feedback on the changes induced by the teacher's pseudo-labels, enabling the teacher to refine these labels accordingly. We specify that this interaction hinges on two key components: the feedback attributor, which designates pseudo-labels triggering the student's update, and the feedback receiver, which determines where to apply this feedback. Building on this, a dual-teacher feedback model is further proposed, which allows more dynamics in the feedback loop and fosters more gains by resolving disagreements through cross-teacher supervision while avoiding consistent errors. Comprehensive evaluations on three medical image benchmarks demonstrate the method's effectiveness in addressing error propagation in semi-supervised medical image segmentation.

AAAI Conference 2026 Conference Paper

EchoMimicV3: 1.3B Parameters Are All You Need for Unified Multi-Modal and Multi-Task Human Animation

  • Rang Meng
  • Yan Wang
  • Weipeng Wu
  • Ruobing Zheng
  • Yuming Li
  • Chenguang Ma

Recent work on human animation usually incorporates large-scale video models, thereby achieving more vivid performance. However, the practical use of such methods is hindered by the slow inference speed and high computational demands. Moreover, traditional work typically employs separate models for each animation task, increasing costs in multi-task scenarios and worsening the dilemma. To address these limitations, we introduce EchoMimicV3, an efficient framework that unifies multi-task and multi-modal human animation. At the core of EchoMimicV3 lies a threefold design: a Soup-of-Tasks paradigm, a Soup-of-Modals paradigm, and a novel training and inference strategy. The Soup-of-Tasks leverages multi-task mask inputs and a counter-intuitive task allocation strategy to achieve multi-task gains without multi-model pains. Meanwhile, the Soup-of-Modals introduces a Coupled-Decoupled Multi-Modal Cross Attention module to inject multi-modal conditions, complemented by a Timestep Phase-aware Multi-Modal Allocation mechanism to dynamically modulate multi-modal mixtures. Besides, we propose Negative Direct Preference Optimization and Phase-aware Negative Classifier-Free Guidance, which ensure stable training and inference. Extensive experiments and analyses demonstrate that EchoMimicV3, with a minimal model size of 1.3 billion parameters, achieves competitive performance in both quantitative and qualitative evaluations. We are committed to open-sourcing our code for community use.

AIIM Journal 2026 Journal Article

EPPCMinerBen: A novel benchmark for evaluating large language models on electronic patient–provider communication via the patient portal

  • Samah Jamal Fodeh
  • Yan Wang
  • Linhai Ma
  • Srivani Talakokkul
  • Jordan M. Alpert
  • Sarah Schellhorn

Effective communication in health care is critical for treatment outcomes and adherence. With patient–provider exchanges shifting to secure messaging, analyzing electronic patient–provider communication (EPPC) data is both essential and challenging. We introduce EPPCMinerBen, a benchmark for evaluating LLMs in detecting communication patterns and extracting insights from electronic patient–provider messages. EPPCMinerBen includes three sub-tasks: Code Classification, Subcode Classification, and Evidence Extraction. Using 1933 expert-annotated sentences from 752 secure messages of the patient portal at Yale New Haven Hospital, it evaluates LLMs on identifying communicative intent and supporting text. Benchmarks span various LLMs under zero-shot and few-shot settings, with data to be released via the NCI Cancer Data Service. Model performance varied across tasks and settings. Llama-3.1-70B led in evidence extraction (F1: 82.84%) and performed well in classification. Llama-3.3-70B-Instruct outperformed all models in code classification (F1: 67.03%). DeepSeek-R1-Distill-Qwen-32B excelled in subcode classification (F1: 48.25%), while sdoh-llama-3-70B showed consistent performance. Smaller models underperformed, especially in subcode classification (>30% F1). Few-shot prompting improved most tasks. Our results indicate that large, instruction-tuned models tend to achieve higher performance on EPPCMinerBen tasks, particularly evidence extraction, while smaller models struggle with fine-grained reasoning. EPPCMinerBen provides a benchmark for discourse-level understanding of patient–provider communication, supporting future work on structured communication analysis and model evaluation.

TIST Journal 2026 Journal Article

From Hallucination to Certainty: Meta-Knowledge Guided Self-Correcting Large Language Models

  • Wei Zhang
  • Guojun Dai
  • Ding Luo
  • Yan Wang
  • Chen Ye

Recent advancements in Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. To further enhance their factual grounding and reasoning fidelity, integrating LLMs with Knowledge Graphs (KGs) has emerged as a promising direction. Significant progress has been made in leveraging KGs to augment LLM reasoning through methods like Retrieval-Augmented Generation. However, effectively harnessing the synergy between LLMs and KGs for robust and reliable reasoning still presents critical challenges. Specifically: (1) LLMs struggle to effectively interpret and utilize the structured nature of KGs, due to the discrepancy between their text-based training and KGs' symbolic representations; (2) querying and reasoning over structured knowledge in KGs remains inefficient for LLMs, hindering complex inference. To address these limitations, we introduce Meta-Knowledge enhanced Knowledge Graph (MKG), a novel framework that empowers LLMs to effectively leverage structured knowledge from KGs. MKG employs Meta-Knowledge, stored in a multi-store memory with a Self-Correcting Mechanism, to guide LLMs in KG retrieval and reasoning. Our experimental evaluations on complex question answering benchmarks demonstrate that MKG achieves significant performance gains, outperforming the baseline original LLM, Retrieval-Augmented Generation (RAG), ReAct, GraphRAG, and ToG frameworks by 25%, 17%, 11%, 3.3%, and 2.6%, respectively.

AAAI Conference 2026 Conference Paper

GaussianImage++: Boosted Image Representation and Compression with 2D Gaussian Splatting

  • Tiantian Li
  • Xinjie Zhang
  • Xingtong Ge
  • Tongda Xu
  • Dailan He
  • Jun Zhang
  • Yan Wang

Implicit neural representations (INRs) have achieved remarkable success in image representation and compression, but they require substantial training time and memory. Meanwhile, recent 2D Gaussian Splatting (GS) methods (e.g., GaussianImage) offer promising alternatives through efficient primitive-based rendering. However, these methods require excessive Gaussian primitives to maintain high visual fidelity. To exploit the potential of GS-based approaches, we present GaussianImage++, which utilizes limited Gaussian primitives to achieve impressive representation and compression performance. Firstly, we introduce a distortion-driven densification mechanism. It progressively allocates Gaussian primitives according to signal intensity. Secondly, we employ context-aware Gaussian filters for each primitive, which assist in the densification to optimize Gaussian primitives based on varying image content. Thirdly, we integrate attribute-separated learnable scalar quantizers and quantization-aware training, enabling efficient compression of primitive attributes. Experimental results demonstrate the effectiveness of our method. In particular, GaussianImage++ outperforms GaussianImage and the INR-based COIN in representation and compression performance while maintaining real-time decoding and low memory usage.

AAAI Conference 2026 Conference Paper

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

  • Haoran Wang
  • Xinji Mai
  • Zeng Tao
  • Junxiong Lin
  • Xuan Tong
  • Ivy Pan
  • Shaoqi Yan
  • Yan Wang

Affective Forecasting is a psychology task that involves predicting an individual's future emotional responses. It is often hampered by reliance on external factors leading to inaccuracies, and typically remains at a qualitative analysis stage. To address these challenges, we narrow the scope of Affective Forecasting by introducing the concept of Human-interaction-based Emotion Forecasting (EF). This task is set within the context of a two-party interaction, positing that an individual's emotions are significantly influenced by their interaction partner's emotional expressions and informational cues. This dynamic provides a structured perspective for exploring the patterns of emotional change, thereby enhancing the feasibility of emotion forecasting.

JBHI Journal 2026 Journal Article

MGTP: Multi-Granularity Textual Prompts for Low-Dose Brain PET Image Denoising via Adversarial Diffusion Model

  • Jiaqi Cui
  • Xinyi Zeng
  • Pinxian Zeng
  • Bo Liu
  • Xi Wu
  • Deng Xiong
  • Jiliu Zhou
  • Yan Wang

Positron emission tomography (PET) is an advanced nuclear imaging technique that has been widely applied in clinical practice. However, radiation risks associated with standard-dose PET imaging raise health concerns, whereas the quality of low-dose PET images fails to meet clinical requirements. To reduce the tracer dose while maintaining image quality, it is of great interest to estimate high-quality PET images from low-dose images. However, existing low-dose PET image denoising methods primarily focus on image data, overlooking crucial information in non-image textual data such as patients’ clinical tabular data and textual descriptions of general image quality. This neglect can lead to subpar denoising quality with inaccurate contexts and poor details. To address these problems, in this paper, we propose Multi-Granularity Textual Prompts, namely MGTP, to denoise low-dose PET images via an adversarial diffusion model. Different from prior methods that rely solely on image conditioning, our MGTP innovatively introduces textual prompts spanning diverse granularities to capture both high-level semantic-related contexts and low-level degradation-related details. To harmonize multi-granularity textual prompts with low-dose PET images, we design a Cross-Modality Selective Conditioning (CMSC) module, which prioritizes semantic- and detail-relevant information while eliminating irrelevant components. The resulting features are fed into the diffusion model as conditions, enforcing a more controlled diffusion process. In addition, we develop a Masked Prompt Reconstruction Network (MPR-Net) to enhance the preservation of semantics and details in denoised images, mitigating distortions brought by the random noise in the diffusion process. Experiments on clinical PET data show that our method achieves state-of-the-art performance.

AAAI Conference 2026 Conference Paper

Physics-Aware Accelerated Unrolling Model for Sparse-View CT Reconstruction

  • Shaojie Guo
  • Yingying Fang
  • Junkang Zhang
  • Yan Wang

Deep unrolling models (DUMs) have shown great potential in sparse-view CT reconstruction by combining iterative optimization and deep learning. However, most DUMs insufficiently account for physical degradation from sparse-view imaging, leading to slow convergence and persistent artifacts. To address this, we propose PAUM, a Physics-Aware Accelerated Unrolling Model explicitly incorporating CT imaging physics into the iterative reconstruction. PAUM first introduces a Dual-Domain Physics-Aware Extrapolation (DDPE) module. By modeling dual-domain degradations, it performs row-wise extrapolation in the sinogram domain to improve missing view recovery, and pixel-wise extrapolation in the image domain to address spatially variant degradation from incomplete backprojection. This physics-aware extrapolation aligns optimization dynamics with the underlying physical imaging degradation and significantly enhances structural updates, thereby accelerating convergence. Subsequently, we develop a lightweight Block-Attention Deformable Regularization Network (BDRN), leveraging deformable convolutions and block-wise attention to model the spatially variant and structured physical characteristics of artifacts. This enables spatially adaptive regularization on extrapolated results, effectively improving reconstruction quality. Extensive experiments demonstrate PAUM achieves over 1 dB improvement compared to SOTA methods, while reducing iteration count by 50%.

AAAI Conference 2026 Conference Paper

PromptEmo: Learning Emotion with Bilateral Textual Prompts in Multi-Domain Open-set Scenarios

  • Xinyi Zeng
  • Yuxiang Yang
  • Pinxian Zeng
  • Wenxia Yin
  • Bo Liu
  • Xi Wu
  • Yan Wang

Facial Expression Recognition (FER) is crucial to human-computer interaction. Existing cross-domain FER (CD-FER) methods mainly focus on single-source closed-set scenarios, transferring knowledge from a single source domain to a target domain with identical class sets. However, CD-FER faces two real-world challenges: 1) the need to leverage information from multiple sources, leading to multi-domain shift, and 2) the necessity to recognize unseen target classes, resulting in class shift. These issues give rise to a novel and challenging task, which we define as Multi-domain Open-set FER (MO-FER). In this paper, we propose PromptEmo, a novel CLIP-based framework that leverages bilateral textual prompts to address both shifts in the MO-FER task. Leveraging the generalizability of LLMs, PromptEmo constructs trainable positive prompts with LLM-generated emotion descriptions for seen classes, as well as template-derived negative prompts to enhance the reasoning for unseen classes. Then, we introduce a modal-task optimization paradigm organized from two perspectives: textual semantics and visual domains, yielding Intra-modal Space-specific Optimization (ISO) and Cross-modal Emotion-aware Interaction (CEI) strategies. ISO refines the CLIP-based textual space to ensure semantic separation between bilateral prompts and improves the latent visual space by promoting inter-domain alignment. Founded on ISO, CEI facilitates effective vision-language interactions, resulting in four joint loss terms that improve emotion recognition by shaping a domain-invariant, discriminative feature space. PromptEmo surpasses the current SOTA method by 7.7% AUC on unseen classes across four FER datasets, serving as a strong baseline for the MO-FER task.

AAAI Conference 2026 Conference Paper

Reference Recommendation Based Membership Inference Attack Against Hybrid-Based Recommender Systems

  • Xiaoxiao Chi
  • Xuyun Zhang
  • Yan Wang
  • Hongsheng Hu
  • Wanchun Dou

Recommender systems have been widely deployed across various domains such as e-commerce and social media, and intelligently suggest items like products and potential friends to users based on their preferences and interaction history, which are often privacy-sensitive. Recent studies have revealed that recommender systems are prone to membership inference attacks (MIAs), where an attacker aims to infer whether or not a user’s data has been used for training a target recommender system. However, existing MIAs fail to exploit the unique characteristics of recommender systems, and are therefore only applicable to mixed recommender systems consisting of two recommendation algorithms. This leaves a gap in investigating MIAs against hybrid-based recommender systems, where a single algorithm utilizing user-item historical interactions and the attributes of users and items produces personalised recommendations. To investigate how the personalisation in hybrid-based recommender systems influences MIA, we propose a novel metric-based MIA. Specifically, we leverage the characteristic of personalisation to obtain a reference recommendation for any target user. Then, a relative membership metric is proposed to exploit a target user’s historical interactions, target recommendation, and reference recommendation to infer the membership of the target user’s data. Finally, we theoretically and empirically demonstrate the efficacy of the proposed metric-based MIA on hybrid-based recommender systems.

JBHI Journal 2026 Journal Article

Refocal Loss in Transformer for Long-Tailed Multi-Granularity Cataract Classification

  • Qiong Wang
  • Yan Wang
  • Hongdi Sun
  • Yu Feng
  • Zhe Dong
  • Cong Bai

Different cataract types and various severities usually require different countermeasures. For automatic cataract diagnosis, existing cataract classification methods group cataracts into common types, such as nuclear cataract, cortical cataract, and posterior subcapsular cataract, while existing cataract grading works aim to achieve fine-grained evaluation of the severity of the most common types of cataract. The severity assessment differs among various types of cataracts. Existing work is limited in predicting various cataract types at different granularity levels. In order to improve diagnostic efficiency, our study explores this matter in the context of multi-granularity cataract classification. Firstly, a large-scale dataset called Multi-Granularity Long-Tailed Cataract is collected. Secondly, an end-to-end training network is proposed, in which the Transformer is investigated for the extraction of multi-granularity cataract features. What is more, considering the imbalanced cataract data with the long-tailed distribution, the Refocal loss is proposed to rebalance the loss contribution of different classes by enhancing the reciprocal value of the effective number of samples. Compared with state-of-the-art methods, the experiments conducted on the multi-granularity cataract classification dataset demonstrate that the proposed model achieves the highest Precision of 78.22%, F1-score of 68.35%, Kappa of 64.38% and MCC of 64.49%, indicating that the proposed framework is promising in offering physicians reliable quantitative evaluations for multi-granularity cataract classification, which can help guide appropriate treatment decisions before the patient’s cataracts worsen.

JBHI Journal 2026 Journal Article

Reliable Multimodal Cancer Survival Prediction With Confidence-Aware Risk Modeling

  • Xuping Xie
  • Ye Wang
  • Ziqi Zhao
  • Qixing Yang
  • Lan Huang
  • Fengfeng Zhou
  • Yan Wang

Multimodal survival methods that integrate histology whole-slide images and transcriptomic profiles hold significant promise for understanding patient prognostication and guiding personalized treatment strategies. However, existing approaches primarily focus on improving predictive performance through multimodal information fusion, often neglecting the reliability estimation of the prediction results and the inherent alignment noise across modalities. Thus, we propose ReCaSP, a novel and reliable cancer survival prediction framework that effectively integrates histology and transcriptomics data via multimodal alignment and fusion, providing auxiliary confidence levels for survival predictions through a confidence-aware risk modeling mechanism. Specifically, our approach incorporates a fine-grained risk classifier that models risk labels jointly over both multiple time intervals and censorship status, utilizing evidential deep learning to yield fine-grained risk predictions accompanied by confidence scores. Additionally, to mitigate the inherent noise in multimodal data alignment, we introduce a cross-attention alignment module that effectively aligns histology data with transcriptomics data prior to multimodal fusion, thereby facilitating cross-modal interaction learning. Extensive experiments on five datasets demonstrate that ReCaSP significantly outperforms state-of-the-art methods, achieving a 4.58% improvement in the overall C-Index.

EAAI Journal 2026 Journal Article

Research on personalized federated learning algorithm based on Kolmogorov–Smirnov test clustering

  • Rui Zhang
  • Qingao Liu
  • Siyan Yang
  • Yan Wang
  • Guodong You

Federated learning enables multiple clients to perform federated modeling to collaboratively solve machine learning tasks while protecting data privacy. However, studies have shown that when multiple clients have different data feature distributions, the performance of the obtained global model degrades and lacks generalization ability. To address this problem, this paper proposes a personalized federated learning algorithm based on Kolmogorov–Smirnov test fuzzy clustering. The algorithm adaptively clusters and tests clients with similar data feature distributions based on their local model parameter updates when learning the global model. Compared with existing methods, the proposed algorithm can dynamically estimate the number of clusters to which each client belongs and the fuzzy boundaries between clusters. We conducted experiments under conditions of independent and identically distributed, Dirichlet, and mixed triple distributions. The experiments show that compared with existing state-of-the-art clustering federated algorithms, the proposed algorithm demonstrates higher accuracy and stability in image classification tasks, and compensates for the drawbacks of the traditional federated learning algorithms, which are ineffective in directly aggregating data with differing feature distributions across clients, resulting in degraded performance.

EAAI Journal 2026 Journal Article

Semi-supervised medical image segmentation method via dual-view graph contrastive learning and latent space uncertainty rectification

  • Dongxu Cheng
  • Qiwei Dong
  • Yan Yang
  • Yan Wang
  • Ruian Zhu
  • Yuhui Zheng

As a typical application of artificial intelligence (AI), automated medical image segmentation plays a crucial role in numerous scenarios such as clinical diagnosis and auxiliary treatment. However, supervised medical image segmentation relies on abundant labeled data, which is expensive and time-consuming to obtain. Semi-supervised learning (SSL) has emerged as a promising engineering framework in this context to alleviate the annotation burden by leveraging abundant unlabeled data. However, existing SSL methods suffer from inadequate feature representation and high prediction uncertainty, which will cause blurred boundaries in complex anatomical structures and compromise diagnostic accuracy. To address these issues, we propose a semi-supervised medical image segmentation framework by integrating dual-view graph contrastive learning (DGCL) and latent space uncertainty rectification (LSUR) strategy. The DGCL constructs a topological graph in the feature space by adaptively linking high-entropy anchor points, which explicitly enforces the intra-class compactness and inter-class discriminability to enhance the structural semantic information extraction along decision boundaries. Complementarily, LSUR performs generative reconstruction and refinement in the latent space for high uncertainty regions, which significantly mitigates the error accumulation caused by pseudo-labels, and enhances model robustness and prediction accuracy. Extensive experiments on public datasets demonstrate that our proposed method outperforms state-of-the-art comparison methods on different evaluation metrics with 10% and 30% labeled data. This indicates that the proposed method provides an effective AI-driven solution for accurate clinical segmentation with limited annotations.

AAAI Conference 2026 Conference Paper

Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning

  • Wentao Wang
  • Chunyang Liu
  • Kehua Sheng
  • Bo Zhang
  • Yan Wang

The growing exploration of Large Language Models (LLMs) and Vision-Language Models (VLMs) has opened avenues for enhancing the effectiveness of reinforcement learning (RL). However, existing LLM-based RL methods often focus on the guidance of control policy and encounter the challenge of limited representations of the backbone networks. To tackle this problem, we introduce Enhanced Semantic Motion Representations (Semore), a new VLM-based framework for visual RL, which can simultaneously extract semantic and motion representations through a dual-path backbone from the RGB flows. Semore utilizes a VLM with common-sense knowledge to retrieve key information from observations, while using the pre-trained CLIP model to achieve text-image alignment, thereby embedding the ground-truth representations into the backbone. To efficiently fuse semantic and motion representations for decision-making, our method adopts a separately supervised approach to simultaneously guide the extraction of semantics and motion, while allowing them to interact spontaneously. Extensive experiments demonstrate that, under the guidance of the VLM at the feature level, our method exhibits greater efficiency and adaptability compared to state-of-the-art methods. All code is released.

YNIMG Journal 2026 Journal Article

Sleeping brain oscillates with intensity-induced auditory rhythm

  • Yan Wang
  • Lingxi Lu
  • Lingyan Ma
  • Qihong Zou
  • Jia-Hong Gao

The rhythmic patterns embedded in auditory stimuli hold considerable significance, as individuals often exhibit a subconscious tendency to synchronize bodily movements with perceived rhythms during wakefulness. However, the extent to which the sleeping brain can discern specific acoustic attributes, such as sound intensity, and entrain to intensity-induced rhythms remains largely unexplored. Here, we employed a frequency-tagging paradigm with electroencephalographic (EEG) and magnetoencephalographic (MEG) simultaneous recording to investigate whether the sleeping brain could perceive the intensity-induced rhythm. Human participants were presented with a sequence of vocal syllables at a certain frequency with changed sound intensity while asleep and awake. The results showed that cortical activity oscillated periodically at the intensity-changed frequency, with a selective enhancement of spectral power at this frequency observed in the MEG spectrum during both rapid eye movement (REM) sleep and non-REM (NREM) sleep, although weaker than during wakefulness. Our findings further revealed increased engagement of the superior frontal gyrus and inferior parietal lobe during rhythm processing in sleep compared to wakefulness, suggesting a sleep-dependent enhancement of higher-order temporal organization of rhythmic information. Across different sleep stages, we found that the neural signals in light sleep were stronger than those in deep sleep and REM sleep at the vocal syllable frequency, and there was no significant difference in the rhythm frequency among the three sleep stages. These results suggest that processing of intensity-induced rhythms is preserved during both NREM and REM sleep and is associated with activity across frontal and parietal cortical regions.

AAAI Conference 2026 Conference Paper

SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries

  • Chenxu Dang
  • Haiyan Liu
  • Jason Bao
  • Pei An
  • Xinyue Tang
  • An Pan
  • Jie Ma
  • Bingchuan Sun

Semantic occupancy has emerged as a powerful representation in world models for its ability to capture rich spatial semantics. However, most existing occupancy world models rely on static and fixed embeddings or grids, which inherently limit the flexibility of perception. Moreover, their "in-place classification" over grids exhibits a potential misalignment with the dynamic and continuous nature of real scenarios. In this paper, we propose SparseWorld, a novel 4D occupancy world model that is flexible, adaptive, and efficient, powered by sparse and dynamic queries. We propose a Range-Adaptive Perception module, in which learnable queries are modulated by the ego vehicle states and enriched with temporal-spatial associations to enable extended-range perception. To effectively capture the dynamics of the scene, we design a State-Conditioned Forecasting module, which replaces classification-based forecasting with a regression-guided formulation, precisely aligning the dynamic queries with the continuity of the 4D environment. In addition, we devise a Temporal-Aware Self-Scheduling training strategy to enable smooth and efficient training. Extensive experiments demonstrate that SparseWorld achieves state-of-the-art performance across perception, forecasting, and planning tasks. Comprehensive visualizations and ablation studies further validate the advantages of SparseWorld in terms of flexibility, adaptability, and efficiency.

JBHI Journal 2026 Journal Article

Structure and Semantics Aware Multi-View Contrastive Learning for predicting association among lncRNAs, miRNAs and diseases

  • Lan Huang
  • Yujuan Zhang
  • Chenghao Li
  • Yuan Fu
  • Yan Wang
  • Nan Sheng

Exploring associations among long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and diseases is crucial for biomarker discovery and precision medicine. Existing computational methods are hindered by sparse known associations and the complexity of biological networks. To address this challenge, we propose SSMVCL (Structure- and Semantic-aware Multi-View Contrastive Learning), a unified framework for predicting lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs), and lncRNA-miRNA interactions (LMIs). SSMVCL constructs a heterogeneous bioinformatics network from multi-source biological data and learns representations from two complementary views: a structure-aware view for local topology and a semantic-aware view using biologically meaningful meta-paths to capture high-order relationships. A cross-view contrastive alignment module with adaptive negative sampling enforces consistency between views and enhances discriminative capability. On two benchmark datasets, SSMVCL achieves state-of-the-art performance: for Dataset2, AUC/AUPR of 0.9736/0.9716 (LDA), 0.9364/0.9309 (MDA), and 0.9297/0.9234 (LMI). Case studies on gastric and prostate cancers further validated robustness and translational potential by identifying supported associations.
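For context (a generic illustration, not the authors' evaluation code): the AUC figures quoted above follow the standard ROC definition, which by the Mann-Whitney formulation equals the probability that a randomly chosen positive association is scored above a randomly chosen negative one. A stdlib-only sketch with a hypothetical function name:

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive is scored above a randomly chosen negative,
    counting score ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# perfectly separated scores give AUC 1.0
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # → 1.0
```

AUPR is computed analogously from the precision-recall curve and is the more informative metric when, as here, known associations are sparse relative to the candidate space.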

AAAI Conference 2026 Conference Paper

TR-DQ: Time-Rotation Diffusion Quantization

  • Yihua Shao
  • Deyang Lin
  • Minxi Yan
  • Siyu Chen
  • Fanhu Zeng
  • Minwen Liao
  • Ao Ma
  • Ziyang Yan

Diffusion models have been widely adopted in image and video generation. However, their complex network architecture leads to high inference overhead during generation. Existing diffusion quantization methods primarily focus on the quantization of the model structure while ignoring the impact of time-step variation during sampling. At the same time, most current approaches fail to account for significant activations that cannot be eliminated, resulting in substantial performance degradation after quantization. To address these issues, we propose Time-Rotation Diffusion Quantization (TR-DQ), a novel quantization method incorporating time-step and rotation-based optimization. TR-DQ first divides the sampling process based on time-steps and applies a rotation matrix to smooth activations and weights dynamically. For different time-steps, a dedicated hyperparameter is introduced for adaptive timing modeling, which enables dynamic quantization across different time steps. Additionally, we also explore the compression potential of Classifier-Free Guidance (CFG-wise) to establish a foundation for subsequent work. TR-DQ achieves state-of-the-art (SOTA) performance on image generation and video generation tasks and a 1.38-1.89× speedup and 1.97-2.58× memory reduction in inference compared to existing quantization methods.

AIIM Journal 2025 Journal Article

A cell-interacting and multi-correcting method for automatic circulating tumor cells detection

  • Xuan Zhang
  • Rensheng Lai
  • Ling Bai
  • Jianxin Ji
  • Ruihao Qin
  • Lihong Jiang
  • Bin Meng
  • Ying Zhang

Sensitive detection of circulating tumor cells (CTCs) from peripheral blood can serve as an effective tool in the early diagnosis and prognosis of cancer. Many methods based on modern object detectors have been proposed in recent years for automatic abnormal cell detection in slide images. Although these methods can also be applied to CTC detection, several practical difficulties lead to suboptimal performance, such as accurately capturing CTCs among a large number of mixed cells and distinguishing CTCs from CTC-like cells with similar visual characteristics. Here, we develop a new cell-interacting and multi-correcting detector called CMD, and apply H&E-stained slide images to detect CTCs automatically for the first time. Specifically, the proposed method incorporates two task-oriented novel modules: (1) a self-attention module for aggregating feature interactions between cells and allowing the model to pay more attention to key abnormal cells, and (2) a hard sample mining sampler for progressively correcting predictions of cells with ambiguous classification boundaries. Experiments conducted on a multi-center dataset of 1247 annotated slide images confirm the superiority of our method over state-of-the-art cell detection methods. Ablation experiments further confirm the effectiveness of the two modules. The source code of this paper is available at https://github.com/zx333445/CMD.

AIIM Journal 2025 Journal Article

A multi-stage multi-modal learning algorithm with adaptive multimodal fusion for improving multi-label skin lesion classification

  • Lihan Zuo
  • Zizhou Wang
  • Yan Wang

Skin cancer is frequently occurring and has become a major contributor to both cancer incidence and mortality. Accurate and timely diagnosis of skin cancer holds the potential to save lives. Deep learning-based methods have demonstrated significant advancements in the screening of skin cancers. However, most current approaches rely on a single modality input for diagnosis, thereby missing out on valuable complementary information that could enhance accuracy. Although some multimodal-based methods exist, they often lack adaptability and fail to fully leverage multimodal information. In this paper, we introduce a novel uncertainty-based hybrid fusion strategy for a multi-modal learning algorithm aimed at skin cancer diagnosis. Our approach specifically combines three different modalities: clinical images, dermoscopy images, and metadata, to make the final classification. For the fusion of the two image modalities, we employ an intermediate fusion strategy that considers the similarity between clinical and dermoscopy images to extract features containing both complementary and correlated information. To capture the correlated information, we utilize cosine similarity, and we employ concatenation as the means for integrating complementary information. In the fusion of image and metadata modalities, we leverage uncertainty to obtain confident late fusion results, allowing our method to adaptively combine the information from different modalities. We conducted comprehensive experiments using a popular publicly available skin disease diagnosis dataset, and the results of these experiments demonstrate the effectiveness of our proposed method. Our proposed fusion algorithm could enhance the clinical applicability of automated skin lesion classification, offering a more robust and adaptive way to make automatic diagnoses with the help of the uncertainty mechanism. Code is available at https://github.com/Zuo-Lihan/CosCatNet-Adaptive_Fusion_Algorithm.

IJCAI Conference 2025 Conference Paper

Adaptive Graph Unlearning

  • Pengfei Ding
  • Yan Wang
  • Guanfeng Liu
  • Jiajie Zhu

Graph unlearning, which deletes graph elements such as nodes and edges from trained graph neural networks (GNNs), is crucial for real-world applications where graph data may contain outdated, inaccurate, or privacy-sensitive information. However, existing methods often suffer from (1) incomplete or over-unlearning due to neglecting the distinct objectives of different unlearning tasks, and (2) inaccurate identification of neighbors affected by deleted elements across various GNN architectures. To address these limitations, we propose AGU, a novel Adaptive Graph Unlearning framework that flexibly adapts to diverse unlearning tasks and GNN architectures. AGU ensures the complete forgetting of deleted elements while preserving the integrity of the remaining graph. It also accurately identifies affected neighbors for each GNN architecture and prioritizes important ones to enhance unlearning performance. Extensive experiments on seven real-world graphs demonstrate that AGU outperforms existing methods in terms of effectiveness, efficiency, and unlearning capability.

NeurIPS Conference 2025 Conference Paper

Addressing Mark Imbalance in Integration-free Marked Temporal Point Processes

  • Sishun Liu
  • Ke Deng
  • Yongli Ren
  • Yan Wang
  • Xiuzhen Zhang

Marked Temporal Point Process (MTPP) has been well studied to model the event distribution in marked event streams, which can be used to predict the mark and arrival time of the next event. However, existing studies overlook that the distribution of event marks is highly imbalanced in many real-world applications, with some marks being frequent but others rare. The imbalance poses a significant challenge to the performance of the next event prediction, especially for events of rare marks. To address this issue, we propose a thresholding method, which learns thresholds to tune the mark probability normalized by the mark's prior probability to optimize mark prediction, rather than predicting the mark directly based on the mark probability as in existing studies. In conjunction with this method, we predict the mark first and then the time. In particular, we develop a novel neural MTPP model to support effective time sampling and estimation of mark probability without computationally expensive numerical improper integration. Extensive experiments on real-world datasets demonstrate the superior performance of our solution against various baselines for the next event mark and time prediction. The code is available at https://github.com/undes1red/IFNMTPP.

IJCAI Conference 2025 Conference Paper

Advancing Stain Transfer for Multi-Biomarkers: A Human Annotation-Free Method Based on Auxiliary Task Supervision

  • Siyuan Xu
  • Haofei Song
  • Yingjiao Deng
  • Jiansheng Wang
  • Yan Wang
  • Qingli Li

Histopathological examination primarily relies on hematoxylin and eosin (H&E) and immunohistochemical (IHC) staining. Though IHC provides more crucial molecular information for diagnosis, it is more costly than H&E staining. Stain transfer technology seeks to efficiently generate virtual IHC images from H&E images. While current deep learning-based methods have made progress, they still struggle to maintain pathological and structural consistency across biomarkers without pixel-level aligned references. To address the problem, we propose an Auxiliary Task supervision-based Stain Transfer method for multi-biomarkers (ATST-Net), which pioneeringly employs human annotation-free masks as ground truth (GT). ATST-Net ensures pathological consistency, structural preservation and style transfer. It automatically annotates H&E masks in a cost-effective manner by utilizing consecutive IHC sections. Multiple auxiliary tasks provide diverse supervisory information on the location and intensity of biomarker expression, ensuring model accuracy and interpretability. We design a pretrained model-based generator to extract deep features from H&E images, improving generalization performance. Extensive experiments demonstrate the effectiveness of ATST-Net's components. Compared to existing methods, ATST-Net achieves state-of-the-art (SOTA) accuracy on datasets with multiple biomarkers and intensity levels, while also reflecting high practical value. Code is available at https://github.com/SikangSHU/ATST-Net.

ICLR Conference 2025 Conference Paper

Block-Attention for Efficient Prefilling

  • Dongyang Ma
  • Yan Wang
  • Tian Lan

We introduce Block-attention, an attention mechanism designed to address the increased inference latency and cost in Retrieval-Augmented Generation (RAG) scenarios. Traditional approaches often encode the entire context in an auto-regressive manner. Instead, Block-attention divides retrieved documents into discrete blocks, with each block independently calculating key-value (KV) states except for the final block. In RAG scenarios, by defining each passage as a block, Block-attention enables us to reuse the KV states of passages that have been seen before, thereby significantly reducing the latency and the computation overhead during inference. The implementation of Block-attention involves block segmentation, position re-encoding, and fine-tuning the LLM to adapt to the Block-attention mechanism. Experiments on 11 diverse benchmarks, including RAG, ICL, and general domains, demonstrate that after block fine-tuning, the Block-attention model not only achieves performance comparable to that of full-attention models, but can also seamlessly switch between the block and full attention modes without any performance loss. Notably, Block-attention significantly reduces the time to first token (TTFT) and floating point operations (FLOPs) to a very low level. It only takes 45 ms to output the first token for an input sequence with a total length of 32K. Compared to the full-attention models, the TTFT and corresponding FLOPs are reduced by 98.7% and 99.8%, respectively. Additionally, in Appendix A, we elaborate on how Block-attention is applied in a Game AI scenario and the substantial potential benefits it entails. We strongly suggest researchers in the gaming field not to overlook this section.

AAAI Conference 2025 Conference Paper

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

  • Xinjie Zhang
  • Shenyuan Gao
  • Zhening Liu
  • Jiawei Shao
  • Xingtong Ge
  • Dailan He
  • Tongda Xu
  • Yan Wang

Existing learning-based stereo image codecs adopt sophisticated transformations with simple entropy models derived from single image codecs to encode latent representations. However, these entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework, named CAMSIC. CAMSIC independently transforms each image to a latent representation and employs a powerful decoder-free Transformer entropy model to capture both spatial and disparity dependencies, by introducing a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM facilitates efficient bidirectional interaction between prior information and estimated tokens, which naturally obviates the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets, Cityscapes and InStereo2K, with fast encoding and decoding speed.

TMLR Journal 2025 Journal Article

DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

  • Yang Sui
  • Huy Phan
  • Jinqi Xiao
  • Tianfang Zhang
  • Zijie Tang
  • Cong Shi
  • Yan Wang
  • Yingying Chen

In the exciting generative AI era, the diffusion model has emerged as a very powerful and widely adopted content-generation tool. Very recently, some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks, calling for in-depth analysis and investigation of the security challenges. In this paper, we explore the detectability of the poisoned noise input for the backdoored diffusion models, an important performance metric yet little explored in the existing works. Starting from the perspective of a defender, we first analyze the distribution discrepancy of the trigger pattern in the existing diffusion backdoor attacks. Based on this finding, we propose a trigger detection mechanism that can effectively identify the poisoned input noise. Then, from the attack side, we propose a backdoor attack strategy that can learn the unnoticeable trigger to evade our proposed detection scheme. Our empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution can achieve a 100% detection rate for the Trojan triggers used in the existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of poisoned noise input approach that of benign noise, enabling nearly 100% detection pass rate with very high attack and benign performance for the backdoored diffusion models.

JBHI Journal 2025 Journal Article

Estimation of Ankle Joint Moment From Plantar Pressure Through an Optimized Sensor Layout Using Genetic Algorithm and Deep Forest Regression

  • Mingxia Gong
  • Wenxuan Chen
  • Yih-Kuen Jan
  • Yu Zhao
  • Jie Yao
  • Yan Wang
  • Weiyan Ren
  • Fang Pu

Objective: Ankle joint moments are critical in gait analysis, with accurate assessments typically necessitating complex inverse dynamics modeling. Pressure insoles are widely used wearable devices that have shown feasibility in estimating joint angles. However, achieving cost-effective, high-precision estimation of ankle joint moment remains challenging. This study combines a genetic algorithm (GA) with deep forest regression (DFR) to optimize the number and layout of plantar pressure sensors, and estimates ankle joint moment based on plantar pressure. Methods: 26 healthy young participants were recruited to collect motion trajectories, ground reaction forces, and plantar pressure data while walking at fast, medium, and slow speeds. Ten gait cycles per speed per participant were analyzed for ankle joint moments using inverse dynamics, constituting the dataset. An optimization algorithm was constructed by combining GA with DFR, using the fitness function as the objective for sensor number and layout optimization. Leave-one-out cross-validation was employed to evaluate the precision of the model. Results: The highest fitness was achieved with an optimized layout using 9 sensors. The Pearson Correlation Coefficients for the sagittal, coronal, and transverse plane moments were 0.967 ± 0.014, 0.918 ± 0.027, and 0.894 ± 0.073. The optimized layout showed no significant difference in estimation accuracy across various walking speeds (P > 0.05). Conclusion: The proposed GA-DFR algorithm is capable of estimating ankle joint moment accurately and optimizing the number and layout of sensors. Significance: The algorithm and optimized sensor layout enable accurate and rapid estimation of ankle joint moment from plantar pressure insoles with a trade-off approach.

NeurIPS Conference 2025 Conference Paper

F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning

  • Hangwei Zhang
  • Chun Kang
  • Yan Wang
  • Difan Zou

Parameter-efficient fine-tuning (PEFT) of powerful pre-trained models for complex downstream tasks has proven effective in vision and language processing, yet this paradigm remains unexplored in scientific machine learning, where the objective is to model complex physical systems. We conduct the first systematic study of PEFT for pre-trained Large Operator Models (LOMs) obtained by scaling variants of the Fourier Neural Operator. We observe that the widely used Low-Rank Adaptation (LoRA) yields markedly poorer performance on LOMs than Adapter tuning. We further theoretically establish that stacked LoRA incurs a depth-amplified lower bound on approximation error within Fourier layers, whereas adapters retain universal approximation capacity and, by concentrating parameters on energy-dominant low-frequency modes, attain exponentially decaying error with bottleneck width in the Fourier domain. Motivated by the robust empirical gains of adapters and by our theoretical characterization of PDE solutions as spectrally sparse, we introduce the Frequency-Adaptive Adapter (F-Adapter). F-Adapter allocates adapter capacity based on spectral complexity, assigning higher-dimension modules to low-frequency components and lower-dimension modules to high-frequency components. Our F-Adapters establish state-of-the-art results on multiple challenging 3D Navier–Stokes benchmarks, markedly enhancing both generalization and spectral fidelity over LoRA and other PEFT techniques commonly used in LLMs. To the best of our knowledge, this work is the first to explore PEFT for scientific machine learning and establishes F-Adapter as an effective paradigm for this domain. The code will be made publicly available upon acceptance.

AAAI Conference 2025 Conference Paper

GapMatch: Bridging Instance and Model Perturbations for Enhanced Semi-Supervised Medical Image Segmentation

  • Wei Huang
  • Lei Zhang
  • Zizhou Wang
  • Yan Wang

Medical image segmentation provides detailed understanding and aids in diagnosis, treatment planning, and monitoring of diseases. Due to the high cost of acquiring labeled data in the field of medical image analysis, semi-supervised segmentation methods have garnered increasing attention. Benefiting from their simplicity and effectiveness, consistency regularization-based methods have emerged as a significant research focus by utilizing perturbations. However, existing methods typically consider perturbation strategies from only a single perspective: either instance perturbation or model perturbation, thus ignoring the potential benefit of effectively combining both. In response, we propose a unified perturbation framework named GapMatch, which bridges instance and model perturbations to broaden the perturbation space and employs dual perturbation to impose consistency regularization on the model. Specifically, GapMatch involves using instance perturbation to update the decision boundary and model perturbation to further optimize it. These two steps mutually reinforce each other in an iterative manner, effectively pushing the decision boundary towards low-density regions while maximizing the class margin. Extensive experimental results on two popular medical image benchmarks demonstrate the effectiveness and generality of the proposed method.

IJCAI Conference 2025 Conference Paper

In-Context Meta LoRA Generation

  • Yihua Shao
  • Minxi Yan
  • Yang Liu
  • Siyu Chen
  • Wenjie Chen
  • Xinwei Long
  • Ziyang Yan
  • Lei Li

Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. However, in scenarios that involve multiple tasks, training a separate LoRA model for each one results in considerable inefficiency in terms of storage and inference. Moreover, existing parameter generation methods fail to capture the correlations among these tasks, making multi-task LoRA parameter generation challenging. To address these limitations, we propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models (LLMs). Specifically, we use training data from all tasks to train a tailored generator, a Conditional Variational Autoencoder (CVAE). The CVAE takes task descriptions as inputs and produces task-aware LoRA weights as outputs. These LoRA weights are then merged with LLMs to create task-specialized models without the need for additional fine-tuning. Furthermore, we utilize in-context meta-learning for knowledge enhancement and task mapping, to capture the relationship between tasks and parameter distributions. As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using the CVAE. ICM-LoRA enables more accurate LoRA parameter reconstruction than current parameter reconstruction methods and is useful for implementing task-specific enhancements of LoRA parameters. At the same time, our method occupies only 283 MB, about 1% of the storage of the original LoRA. The code is available at https://github.com/YihuaJerry/ICM-LoRA.

AAAI Conference 2025 Conference Paper

LLM4RSR: Large Language Models as Data Correctors for Robust Sequential Recommendation

  • Yatong Sun
  • Xiaochun Yang
  • Zhu Sun
  • Yan Wang
  • Bin Wang
  • Xinghua Qu

Sequential Recommenders (SRs) are trained to predict the next item as the target given its preceding items as the input, assuming every input-target pair is matched and is reliable for training. However, users can be induced by external distractions to click on items inconsistent with their true preferences, resulting in unreliable training instances with mismatched input-target pairs. To resist unreliable data, researchers attempt to develop Robust SRs (RSRs). However, our data analysis unveils that existing RSRs are data-driven. That is, for most instances formed by infrequently co-occurred items, existing RSRs are uncertain about their reliability. To fill this gap, we propose a generic framework -- LLM4RSR (Large Language Models for Robust Sequential Recommendation) to semantically complement data-driven RSRs by correcting uncertain instances into reliable ones based on LLMs' semantic comprehension of items beyond co-occurrence. In this way, RSRs can be re-trained with the corrected data for better accuracy. This is a selective knowledge distillation procedure, where the LLM acts as a teacher guiding student RSRs via uncertain instances. To align LLMs with the data correction task and mitigate inherent hallucinations, we equip the LLM with profile, plan, and memory modules, which are automatically optimized via textual gradient descent, eliminating the need for human effort and expertise. Experiments on four real-world datasets spanning eight backbones verify the generality, effectiveness, and efficiency of LLM4RSR.

IJCAI Conference 2025 Conference Paper

MonoMixer: Marrying Convolution and Vision Transformer for Efficient Self-Supervised Monocular Depth Estimation

  • Zhiyong Chang
  • Yan Wang

Self-supervised monocular depth estimation, which does not require hard-to-source depth labels for training, has been widely studied in recent years. Due to its significant and growing needs, many lightweight but effective architectures have been designed for edge devices. Convolutional Neural Networks (CNNs) have shown extraordinary ability in monocular depth estimation. However, their limited receptive field restricts existing methods to reasoning only locally, inhibiting the effectiveness of the self-supervised paradigm. Recently, Transformers have achieved great success in estimating depth maps from monocular images. Nevertheless, the massive parameter counts of Transformers hinder deployment to edge devices. In this paper, we propose MonoMixer, a brand-new lightweight CNN-Transformer architecture with three main contributions: 1) The details-augmented (DA) block employs a graph reasoning unit to capture abundant local details, resulting in more precise depth prediction. 2) The self-modulate channel attention (SMCA) block adaptively adjusts the channel weights of feature maps, aiming to emphasize the crucial features and aggregate channel-wise feature maps of different patterns. 3) The global-guided Transformer (G2T) block integrates a global semantic token into multi-scale local features and exploits cross-attention to encode long-range dependencies. Furthermore, experimental results demonstrate the superiority of our proposed MonoMixer in both model size and inference speed; it achieves state-of-the-art performance on three datasets: KITTI, Make3D and Cityscapes. Specifically, our proposed MonoMixer outperforms MonoFormer by a large margin in accuracy, with about 95% fewer parameters.

AAAI Conference 2025 Conference Paper

Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering

  • Peize Li
  • Qingyi Si
  • Peng Fu
  • Zheng Lin
  • Yan Wang

The retrieval-based multi-image question answering (QA) task involves retrieving multiple question-related images and synthesizing them to generate an answer. Conventional "retrieve-then-answer" pipelines often suffer from cascading errors because the training objective of QA fails to optimize the retrieval stage. To address this issue, we propose a novel method to effectively introduce and reference retrieved information into the QA process. Given the image set to be retrieved, we employ a multimodal large language model (visual perspective) and a large language model (textual perspective) to obtain a multimodal hypothetical summary (MHyS) in question form and description form. By combining visual and textual perspectives, MHyS captures image content more specifically and replaces real images in retrieval, which eliminates the modality gap by transforming the task into text-to-text retrieval and thereby improves retrieval. To better couple retrieval with QA, we employ contrastive learning to align queries (questions) with MHyS. Moreover, we propose a coarse-to-fine strategy for calculating both sentence-level and word-level similarity scores, to further enhance retrieval and filter out irrelevant details. Our approach achieves a 3.7% absolute improvement over state-of-the-art methods on RETVQA and a 14.5% improvement over CLIP. Comprehensive experiments and detailed ablation studies demonstrate the superiority of our method.
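As a rough illustration of a coarse-to-fine similarity of this kind, the sketch below blends a sentence-level cosine score with a word-level score in which each query word is max-pooled over candidate words. The toy embeddings, the equal weighting, and all names are hypothetical, not the paper's exact formulation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as plain lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def coarse_to_fine_score(q_sent, q_words, c_sent, c_words, alpha=0.5):
    """Blend a coarse sentence-level score with a fine word-level score;
    each query word is max-pooled over the candidate's words."""
    sent_score = cosine(q_sent, c_sent)
    word_score = sum(max(cosine(qw, cw) for cw in c_words)
                     for qw in q_words) / len(q_words)
    return alpha * sent_score + (1 - alpha) * word_score

# Toy embeddings: identical sentence vectors, and one of the two query
# words matching the single candidate word.
score = coarse_to_fine_score([1.0, 0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]],
                             [1.0, 0.0, 1.0], [[1.0, 0.0]])
print(round(score, 6))  # 0.75
```

The word-level term rewards fine-grained matches that a single pooled sentence vector can wash out, which is the intuition behind scoring at both granularities.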

AAAI Conference 2025 Conference Paper

OUS: Bridging Scene Context and Facial Features to Overcome the Rigid Cognitive Problem

  • Xinji Mai
  • Haoran Wang
  • Zeng Tao
  • Junxiong Lin
  • Shaoqi Yan
  • Yan Wang
  • Jiawen Yu
  • Xuan Tong

Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing DFER methods tend to consider the scene as noise that needs to be filtered out, focusing solely on facial information. We refer to this as the Rigid Cognitive Problem. The Rigid Cognitive Problem can lead to discrepancies between the cognition of annotators and models in some samples. To align more closely with the human cognitive paradigm of emotions, we propose an Overall Understanding of the Scene DFER method (OUS). OUS effectively integrates scene and facial features, combining scene-specific emotional knowledge for DFER. Extensive experiments on the two largest datasets in the DFER field, DFEW and FERV39k, demonstrate that OUS significantly outperforms existing methods. By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.

EAAI Journal 2025 Journal Article

Path optimization of a flexible robot with a spatial compressed and a direction-guided exploring method

  • Yan Wang
  • Wensong Jiang
  • Zai Luo
  • Li Yang
  • Jiafu Li
  • Hongzhe Lu

Optimal path planning of a measuring robot is a crucial component in automatic measurement. However, it is hard to obtain a stable optimal path solution when dealing with multiple environments. To overcome this problem, a spatial compressed and direction-guided exploring method is proposed based on Rapidly-exploring Random Tree (RRT). First, to enhance the consistency of the search path, a spatial compaction strategy is incorporated into the direction guidance Rapidly-exploring Random Tree (DG_RRT) method, which optimizes the exploration space for initial path generation. Second, invalid spaces are simplified by model constraints. Third, to obtain more suitable spatial compression models for different environmental features, the generated compressed space is further optimized by adjusting the functional coefficient (FC) and iterations. Path optimization consists of two key steps: linearization for redundant node elimination and curve-fitting for path smoothing. Subsequent discretization is then applied to ensure compatibility with robotic execution constraints. To verify the suggested spatial compressed and direction-guided RRT (SC_DG RRT) method, both numerical simulation and experimental analysis are carried out. The experimental results reveal that the average path length (APL) of SC_DG RRT is 11.4% lower than that of the DG_RRT and 42.3% lower than that of Q-learning (QL). The standard deviation (SD) of the path length of SC_DG RRT is 74.3% lower than that of DG_RRT. It demonstrates that the SC_DG RRT is superior to other methods.
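The "linearization for redundant node elimination" step can be illustrated generically: drop interior waypoints that are collinear with their neighbors, keeping only the nodes where the path actually turns. A toy sketch, not the paper's implementation:

```python
def simplify_path(points, eps=1e-9):
    """Drop interior waypoints that are collinear with their original
    neighbors, keeping only the nodes where the path turns."""
    if len(points) <= 2:
        return list(points)
    out = [points[0]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        # 2D cross product of (cur - prev) and (nxt - cur):
        # zero means the three points are collinear.
        cross = ((cur[0] - prev[0]) * (nxt[1] - cur[1])
                 - (cur[1] - prev[1]) * (nxt[0] - cur[0]))
        if abs(cross) > eps:
            out.append(cur)
    out.append(points[-1])
    return out

path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (3, 3)]
print(simplify_path(path))  # [(0, 0), (2, 0), (2, 2), (3, 3)]
```

After this pruning, a smoothing pass (curve-fitting, as in the abstract) would operate on far fewer waypoints.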

AAAI Conference 2025 Conference Paper

Physical-aware Neural Radiance Fields for Efficient Exposure Correction

  • Kai Xu
  • Mingwen Shao
  • Yuanjian Qiao
  • Yan Wang

Neural Radiance Fields (NeRF) has achieved remarkable success in synthesizing impressive novel views. However, existing methods usually fail to handle scenes with adverse lighting conditions caused by external time variations and different camera settings, leading to poor visual quality. To address this challenge, we propose a physical-aware NeRF for efficient exposure correction, named PHY-NeRF. Specifically, we design Adaptive Lighting Particles inspired by the theory of light scattering and absorption, which can adjust the illumination intensity during volume rendering. Subsequently, we can handle scenes with different lighting conditions by jointly optimizing camera parameters and these lighting particles. Moreover, to promote natural brightness transitions, we devise a global illumination consistency module to control the lighting intensity across views at the feature level while completing more details. Benefiting from the above designs, our PHY-NeRF can tackle arbitrary low-light or overexposed scenes in an unsupervised manner. Extensive experiments show that our PHY-NeRF achieves state-of-the-art results in addressing adverse lighting problems while ensuring high rendering efficiency.

ICLR Conference 2025 Conference Paper

ProteinBench: A Holistic Evaluation of Protein Foundation Models

  • Fei Ye
  • Zaixiang Zheng
  • Dongyu Xue
  • Yuning Shen
  • Lihao Wang
  • Yiming Ma
  • Yan Wang
  • Xinyou Wang

Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics. However, the capabilities and limitations associated with these models remain poorly understood due to the absence of a unified evaluation framework. To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance. Our comprehensive evaluation of protein foundation models reveals several key findings that shed light on their current capabilities and limitations. To promote transparency and facilitate further research, we release the evaluation dataset, code, and a public leaderboard publicly for further analysis and a general modular toolkit. We intend for ProteinBench to be a living benchmark for establishing a standardized, in-depth evaluation framework for protein foundation models, driving their development and application while fostering collaboration within the field.

JBHI Journal 2025 Journal Article

Self-Supervised Contrastive Learning on Attribute and Topology Graphs for Predicting Relationships Among lncRNAs, miRNAs and Diseases

  • Lan Huang
  • Nan Sheng
  • Ling Gao
  • Lei Wang
  • Wenju Hou
  • Jie Hong
  • Yan Wang

Exploring associations between long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and diseases is crucial for disease prevention, diagnosis and treatment. While determining these relationships experimentally is resource-intensive and time-consuming, computational methods have emerged as an attractive alternative. However, existing computational methods tend to focus on single tasks, neglecting the benefits of leveraging multiple biomolecular interactions and domain-specific knowledge for multi-task prediction. Furthermore, the scarcity of labeled data for lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs) and lncRNA-miRNA interactions (LMIs) poses challenges for comprehensive node embedding learning. This paper proposes a multi-task prediction model (called SSCLMD) that employs self-supervised contrastive learning on attribute and topology graphs to identify potential LDAs, MDAs and LMIs. Firstly, domain knowledge of lncRNAs, miRNAs and diseases, as well as their interactions, is exploited to construct an attribute graph and a topology graph, respectively. Then, the nodes are encoded in the attribute and topology spaces to extract their specific and common features. Meanwhile, an attention mechanism is applied to adaptively fuse the embeddings from different views. SSCLMD incorporates contrastive self-supervised learning as a regularizer to guide node embedding learning in both attribute and topology spaces without relying on labels. Serving as a regularizer in the multi-task learning paradigm, it improves the model's generalization capabilities. Extensive experiments on 2 manually curated datasets demonstrate that SSCLMD significantly outperforms baseline methods in LDA, MDA and LMI prediction tasks. Case studies on both old and new datasets further support SSCLMD's ability to uncover novel disease-related lncRNAs and miRNAs.
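Contrastive regularizers of this kind are typically built on an InfoNCE-style objective: pull an anchor's embedding toward its positive view and away from negatives. A generic single-anchor sketch, assuming precomputed similarity scores as inputs; this is not SSCLMD's exact loss:

```python
import math

def info_nce(sim_pos, sim_negs, tau=0.1):
    """InfoNCE loss for one anchor: -log( exp(s+/tau) / sum_j exp(s_j/tau) ),
    computed with max-subtraction for numerical stability."""
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(sim_pos / tau - log_denom)

# A well-separated positive yields a lower loss than a borderline one.
print(info_nce(0.9, [0.1, 0.0]) < info_nce(0.2, [0.1, 0.0]))  # True
```

In a two-view setting like attribute vs. topology graphs, the positive pair would be the two views of the same node and the negatives other nodes' embeddings.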

ICLR Conference 2025 Conference Paper

Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

  • Xiang Li 0205
  • Pengfei Li 0007
  • Yupeng Zheng
  • Wei Sun
  • Yan Wang
  • Yilun Chen

Understanding world dynamics is crucial for planning in autonomous driving. Recent methods attempt to achieve this by learning a 3D occupancy world model that forecasts future surrounding scenes based on current observation. However, 3D occupancy labels are still required to produce promising results. Considering the high annotation cost for 3D outdoor scenes, we propose a semi-supervised vision-centric 3D occupancy world model, **PreWorld**, to leverage the potential of 2D labels through a novel two-stage training paradigm: the self-supervised pre-training stage and the fully-supervised fine-tuning stage. Specifically, during the pre-training stage, we utilize an attribute projection head to generate different attribute fields of a scene (e.g., RGB, density, semantic), thus enabling temporal supervision from 2D labels via volume rendering techniques. Furthermore, we introduce a simple yet effective state-conditioned forecasting module to recursively forecast future occupancy and ego trajectory in a direct manner. Extensive experiments on the nuScenes dataset validate the effectiveness and scalability of our method, and demonstrate that PreWorld achieves competitive performance across 3D occupancy prediction, 4D occupancy forecasting and motion planning tasks.

NeurIPS Conference 2025 Conference Paper

Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression

  • Yuning Shen
  • Lihao Wang
  • Huizhuo Yuan
  • Yan Wang
  • Bangji Yang
  • Quanquan Gu

Understanding protein dynamics is critical for elucidating their biological functions. The increasing availability of molecular dynamics (MD) data enables the training of deep generative models to efficiently explore the conformational space of proteins. However, existing approaches either fail to explicitly capture the temporal dependencies between conformations or do not support direct generation of time-independent samples. To address these limitations, we introduce ConfRover, an autoregressive model that simultaneously learns protein conformation and dynamics from MD trajectory data, supporting both time-dependent and time-independent sampling. At the core of our model is a modular architecture comprising: (i) an encoding layer, adapted from protein folding models, that embeds protein-specific information and conformation at each time frame into a latent space; (ii) a temporal module, a sequence model that captures conformational dynamics across frames; and (iii) an SE(3) diffusion model as the structure decoder, generating conformations in continuous space. Experiments on ATLAS, a large-scale protein MD dataset of diverse structures, demonstrate the effectiveness of our model in learning conformational dynamics and supporting a wide range of downstream tasks. ConfRover is the first model to sample both protein conformations and trajectories within a single framework, offering a novel and flexible approach for learning from protein MD data.

IJCAI Conference 2025 Conference Paper

Towards Comprehensive and Prerequisite-Free Explainer for Graph Neural Networks

  • Han Zhang
  • Yan Wang
  • Guanfeng Liu
  • Pengfei Ding
  • Huaxiong Wang
  • Kwok-Yan Lam

To enhance the reliability and credibility of graph neural networks (GNNs) and improve the transparency of their decision logic, a new field of explainability of GNNs (XGNN) has emerged. However, two major limitations severely degrade the performance and hinder the generalizability of existing XGNN methods: they (a) fail to capture the complete decision logic of GNNs across diverse distributions in the entire dataset's sample space, and (b) impose strict prerequisites on edge properties and GNN internal accessibility. To address these limitations, we propose OPEN, a novel cOmprehensive and Prerequisite-free Explainer for GNNs. OPEN, as the first work in the literature, can infer and partition the entire dataset's sample space into multiple environments, each containing graphs that follow a distinct distribution. OPEN further learns the decision logic of GNNs across different distributions by sampling subgraphs from each environment and analyzing their predictions, thus eliminating the need for strict prerequisites. Experimental results demonstrate that OPEN captures nearly complete decision logic of GNNs, outperforms state-of-the-art methods in fidelity while maintaining similar efficiency, and enhances robustness in real-world scenarios.

TCS Journal 2025 Journal Article

Unpaired set-to-set disjoint path routings in recursive match networks

  • Bai Yin
  • Qianru Zhou
  • Hai Liu
  • Yan Wang
  • Baolei Cheng
  • Jianxi Fan

The recursive match networks represent a family of networks, encompassing various types of network structures. Among these network structures, the bijective connection networks and BCube are all special cases of recursive match networks. On the other hand, the bijective connection networks also stand for a family of networks, encompassing the well-known hypercubes, twisted cubes, Möbius cubes, and crossed cubes. BCube, a promising candidate for the data center network model, contains as many as thousands (even millions) of servers. Recursive match networks integrate diverse known networks as well as potentially other future ones, underscoring the significance of their study. One of the key topics is finding vertex-disjoint paths in recursive match networks. The unpaired set-to-set disjoint paths problem is as follows: given a set of source vertices S = {s_1, s_2, …, s_p} and a set of sink vertices T = {t_1, t_2, …, t_q} in an r-connected graph G = (V(G), E(G)) with m ≤ min{p, q, r}, construct m vertex-disjoint paths P_i from source s_{a_i} to sink t_{b_i} (1 ≤ i ≤ m) such that {a_1, a_2, …, a_m} ⊆ {1, 2, …, p} and {b_1, b_2, …, b_m} ⊆ {1, 2, …, q}. In this paper, we give a proof of the existence of unpaired set-to-set disjoint paths in a k-order, n-dimensional recursive match network X_{k,n}, where the length of each path does not exceed 2n − 1. Then, we propose an O(N k^4 (log_k N)^3) algorithm to construct nk vertex-disjoint paths between any pair of nk-vertex sets in X_{k,n}, where N is the vertex number of X_{k,n}. Furthermore, we randomly generate multiple X_{k,n} with different parameters k and n, and apply the algorithm to simulate experiments on them. Finally, we evaluate the algorithm by comparing the maximum length of the obtained vertex-disjoint paths with the upper limit of the diameter of X_{k,n}. The experimental results show that the maximum length is close to the upper limit, with a deviation not exceeding 2.
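By Menger's theorem, the number of vertex-disjoint paths between two vertex sets in a general graph can be computed by reducing to maximum flow with unit vertex capacities (each vertex split into an in/out pair). The sketch below runs Edmonds-Karp on a small hypercube; it is a generic illustration of the unpaired set-to-set problem, not the paper's specialized construction for recursive match networks:

```python
from collections import deque, defaultdict

def max_vertex_disjoint_paths(adj, S, T):
    """Count vertex-disjoint S-to-T paths (unpaired endpoints) by
    reducing to max-flow: split each vertex v into v_in -> v_out with
    capacity 1, attach a super-source to S and a super-sink to T, then
    run Edmonds-Karp (BFS augmenting paths)."""
    cap = defaultdict(int)
    graph = defaultdict(set)

    def add_edge(u, v, c):
        cap[(u, v)] += c
        graph[u].add(v)
        graph[v].add(u)  # residual direction

    for v in adj:
        add_edge((v, 'in'), (v, 'out'), 1)
    for u in adj:
        for v in adj[u]:
            add_edge((u, 'out'), (v, 'in'), 1)
    SRC, SINK = 'SRC', 'SINK'
    for s in S:
        add_edge(SRC, (s, 'in'), 1)
    for t in T:
        add_edge((t, 'out'), SINK, 1)

    flow = 0
    while True:
        parent = {SRC: None}          # BFS for an augmenting path
        q = deque([SRC])
        while q and SINK not in parent:
            u = q.popleft()
            for v in graph[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if SINK not in parent:
            return flow
        v = SINK                       # push one unit along the path
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

# 3-dimensional hypercube Q3: vertices 0..7, neighbors differ in one bit.
adj = {v: [v ^ (1 << b) for b in range(3)] for v in range(8)}
print(max_vertex_disjoint_paths(adj, S=[0, 3], T=[5, 6]))  # 2
```

Dedicated constructions like the paper's exploit the recursive network structure to build the paths directly and bound their length, which a generic flow computation does not.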

TCS Journal 2025 Journal Article

Vertex-independent spanning trees in complete Josephus cubes

  • Qi He
  • Yan Wang
  • Jianxi Fan
  • Baolei Cheng

Vertex-independent spanning trees (VISTs for short) serve as pivotal constructs in numerous network algorithms and have been the subject of extensive research for three decades. The n-dimensional complete Josephus cube CJC_n, derived from the Josephus cube, was first proposed to achieve better fault tolerance without sacrificing routing efficiency. Compared to the Josephus cube, it exhibits enhanced symmetry, improved connectivity, and better fault tolerance while maintaining efficient embedding, incremental scalability, and a short diameter (⌈n/2⌉). This paper studies the existence and construction of n + 2 VISTs in CJC_n rooted at an arbitrary vertex. To determine the specific connection edge between a vertex v and its parent in the spanning tree T_i, three algorithms are first proposed to calculate the values of F_{v,i}, M_{v,i}, and H_{v,i}, respectively, where v ∈ V(CJC_n) and i ∈ {0, 1, …, n + 1}. Based on these algorithms, a parallel algorithm is proposed to construct n + 2 (n ≥ 4) VISTs in CJC_n using 2n processors. As CJC_n is (n + 2)-connected, our algorithm is designed to yield the optimal number of resulting VISTs for n ≥ 4. Finally, we present the theoretical proof of the parallel algorithm and demonstrate that its time complexity is O(n).

EAAI Journal 2025 Journal Article

Virtual workflows and adaptive optimization scheduling of production process with feedback constraints

  • Zhen Quan
  • Yan Wang
  • Xiang Liu
  • Zhicheng Ji

Feedback processes often exist in production processes with precise quality requirements. Feedback constraints are conditional and reversed, and these peculiarities increase the complexity of the scheduling problem. A dynamic scheduling model based on virtual workflows is designed, and a scheduling optimization method with a new adaptive sequencing rule strategy is proposed, for the flexible production scheduling problem with feedback constraints. Based on feedback structure virtualization and virtual nodes responding to feedback disturbances, the dynamic scheduling model implements a mechanism for synchronizing the updating of task workflows and machine workflows with the activities of feedback processes to trigger rescheduling. To adapt to the correlation between key dynamic features of the scheduling scenario and multi-objective balance, the adaptive sequencing rule strategy with mean tardiness as the dominant objective and makespan and energy consumption as regular objectives is proposed to improve the multi-objective dynamic trade-off optimization performance of the scheduling method based on a hybrid decision-making mechanism. Comparative tests verify that the hybrid decision-making mechanism incorporating the proposed adaptive rule strategy can effectively improve the comprehensive dominance level of the scheduling optimization results. Simulation tests show that the dynamic scheduling model provides a way to respond to feedback constraint disturbances and trigger rescheduling.

EAAI Journal 2024 Journal Article

A hierarchical integration scheduling method for flexible job shop with green lot splitting

  • Qingshan Gong
  • Junlin Li
  • Zhigang Jiang
  • Yan Wang

The integration of green scheduling and lot splitting scheduling is indispensable for ensuring the coordinated optimization of economic and environmental benefits in flexible job-shop scheduling (FJS). However, this integration involves not only the indicator of greenness and economy but also the process of production planning and scheduling, which is substantially complicated. To this end, a hierarchical integrated scheduling method is proposed by comprehensively considering the multilevel organizational structure and task configuration characteristics of flexible job-shop, as well as the differences in objectives on different scheduling levels: workshop level, process unit level, machine tool level. On the workshop level, a lot splitting model is presented to obtain the optimal processing task set for each production cycle with the minimum expected cost (startup cost, tardiness cost, and holding cost). On the process unit level, a task allocation model is given to allocate the optimal workload for each machine tool with the minimum processing energy consumption and maximum machine load. On the machine tool level, an operation sequencing model is established to obtain the optimal processing sequence for each machine tool with the minimum standby energy consumption and makespan. According to the solving characteristics of the hierarchical models, a multi-objective algorithm is applied. Finally, a case study is demonstrated to validate the proposed method.

AAAI Conference 2024 Conference Paper

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

  • Nailei Hei
  • Qianyu Guo
  • Zihao Wang
  • Yan Wang
  • Haofen Wang
  • Wenqiang Zhang

Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images. Although existing prompt engineering methods can provide high-level guidance, it is challenging for novice users to achieve the desired results by manually entering prompts due to a discrepancy between novice-user-input prompts and the model-preferred prompts. To bridge the distribution gap between user input behavior and model training datasets, we first construct a novel Coarse-Fine Granularity Prompts dataset (CFP) and propose a novel User-Friendly Fine-Grained Text Generation framework (UF-FGTG) for automated prompt optimization. For CFP, we construct a novel dataset for text-to-image tasks that combines coarse and fine-grained prompts to facilitate the development of automated prompt generation methods. For UF-FGTG, we propose a novel framework that automatically translates user-input prompts into model-preferred prompts. Specifically, we propose a prompt refiner that continually rewrites prompts to empower users to select results that align with their unique needs. Meanwhile, we integrate image-related loss functions from the text-to-image model into the training process of text generation to generate model-preferred prompts. Additionally, we propose an adaptive feature extraction module to ensure diversity in the generated results. Experiments demonstrate that our approach is capable of generating more visually appealing and diverse images than previous state-of-the-art methods, achieving an average improvement of 5% across six quality and aesthetic metrics. Data and code are available at https://github.com/Naylenv/UF-FGTG.

ICRA Conference 2024 Conference Paper

Automatic Captioning based on Visible and Infrared Images

  • Yan Wang
  • Shuli Lou
  • Kai Wang
  • Yunzhe Wang
  • Xiaohu Yuan
  • Huaping Liu

In this paper, we tackle the task of image captioning with the complementarity of visible light images and infrared images. To address this problem, we propose an RGBIR image fusion captioning model, which can take full advantage of visible light images and infrared images under different conditions. Meanwhile, we develop a wearable environment-assisted system. In addition, we collect and annotate a new dataset containing 3510 pairs of RGB-IR images to support model training. Finally, we conduct extensive experiments to evaluate the model and system. Experimental results show that our new method and system significantly outperform baselines on multiple metrics and have potential practical value.

ECAI Conference 2024 Conference Paper

Cliff: Leveraging Ambiguous Samples for Enhanced Test-Time Adaptation

  • Xiao Chen
  • Qihui Zhang
  • Yan Wang

Given the common scenario where a trained model confronts significant variations in data distributions different from the training data at test time, Test Time Adaptation (TTA) has emerged as a crucial field of study. Traditional methods in TTA have focused on filtering low-entropy samples to improve model performance, primarily through entropy minimization techniques. However, these approaches exhibit limitations as they often overlook the potential classes of high-entropy samples. This oversight can result in an inadequate utilization of available data, particularly under challenging conditions where model adaptability is critical. In contrast to conventional approaches, our work diverges from the sole emphasis on low-entropy samples by leveraging the rich information contained within ambiguous samples. We demonstrate that reliance solely on entropy minimization is detrimental when dealing with ambiguous samples. To address this, we introduce Cliff, a novel framework designed to learn from ambiguous samples effectively. Concretely, Cliff comprises two innovative components: Dynamic Recognition (DR) and Gap Raising Loss (GRL). DR proposes a method for identifying ambiguous samples and dynamically assigning weights to them, enhancing the model’s focus on potentially informative discrepancies. Whereas the proposed GRL, indeed theoretically proven to be beneficial to the model, guides the model in effectively distinguishing among potential classes by emphasizing the differences in their predictive probabilities. Extensive experiments conducted on CIFAR-10-C and CIFAR-100-C datasets demonstrate Cliff’s state-of-the-art performance. Our results show an average accuracy improvement of 20.24% and 21.12% over the direct use of source domain models on target domains, respectively.
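The entropy criterion underlying this line of work is easy to state: low-entropy predictions are confident, high-entropy ones ambiguous. A toy sketch of prediction entropy plus a threshold-based down-weighting of ambiguous samples; the weighting rule here is illustrative, not Cliff's actual DR scheme:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample_weight(probs, threshold):
    """Toy dynamic weighting: confident (low-entropy) samples get full
    weight; ambiguous ones get a weight that shrinks toward zero as the
    entropy approaches its maximum (the uniform distribution)."""
    h = entropy(probs)
    h_max = math.log(len(probs))
    if h <= threshold:
        return 1.0
    return max(0.0, 1.0 - (h - threshold) / (h_max - threshold))

confident = [0.97, 0.01, 0.01, 0.01]
ambiguous = [0.40, 0.35, 0.15, 0.10]
print(round(entropy(confident), 3))              # 0.168
print(sample_weight(confident, threshold=0.5))   # 1.0
```

Methods that simply filter at the threshold discard the ambiguous sample entirely; a continuous weight like the one above is one way to keep its gradient signal at reduced strength.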

IJCAI Conference 2024 Conference Paper

CoAtFormer: Vision Transformer with Composite Attention

  • Zhiyong Chang
  • Mingjun Yin
  • Yan Wang

Transformer has recently gained significant attention and achieved state-of-the-art performance in various computer vision applications, including image classification, instance segmentation, and object detection. However, the self-attention mechanism underlying the transformer leads to quadratic computational cost with respect to image size, limiting its widespread adoption in state-of-the-art vision backbones. In this paper we introduce an efficient and effective attention module we call Composite Attention. It features parallel branches, enabling the modeling of various global dependencies. In each composite attention module, one branch employs a dynamic channel attention module to capture global channel dependencies, while the other branch utilizes an efficient spatial attention module to extract long-range spatial interactions. In addition, we effectively blend the composite attention module with convolutions, and accordingly develop a simple hierarchical vision backbone, dubbed CoAtFormer, by simply repeating the basic building block over multiple stages. Extensive experiments show our CoAtFormer achieves state-of-the-art results on various different tasks. Without any pre-training and extra data, CoAtFormer-Tiny, CoAtFormer-Small, and CoAtFormer-Base achieve 84.4%, 85.3%, and 85.9% top-1 accuracy on ImageNet-1K with 24M, 37M, and 73M parameters, respectively. Furthermore, CoAtFormer also consistently outperforms prior work in other vision tasks such as object detection, instance segmentation, and semantic segmentation. When further pretraining on the larger dataset ImageNet-22K, we achieve 88.7% top-1 accuracy on ImageNet-1K.

NeurIPS Conference 2024 Conference Paper

CogVLM: Visual Expert for Pretrained Language Models

  • Weihan Wang
  • Qingsong Lv
  • Wenmeng Yu
  • Wenyi Hong
  • Ji Qi
  • Yan Wang
  • Junhui Ji
  • Zhuoyi Yang

We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular "shallow alignment" method, which maps image features into the input space of the language model, CogVLM bridges the gap between the frozen pretrained language model and the image encoder via a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables a deep fusion of vision-language features without sacrificing any performance on NLP tasks. CogVLM-17B achieves state-of-the-art performance on 17 classic cross-modal benchmarks, including 1) image captioning datasets: NoCaps, Flickr30k; 2) VQA datasets: OKVQA, TextVQA, OCRVQA, ScienceQA; 3) LVLM benchmarks: MM-Vet, MMBench, SEED-Bench, LLaVABench, POPE, MMMU, MathVista; 4) visual grounding datasets: RefCOCO, RefCOCO+, RefCOCOg, Visual7W. Code and checkpoints are available on GitHub.

AAAI Conference 2024 Conference Paper

Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning

  • Shuai Shao
  • Yu Bai
  • Yan Wang
  • Baodi Liu
  • Bin Liu

Open-World Few-Shot Learning (OFSL) is a crucial research field dedicated to accurately identifying target samples in scenarios where data is limited and labels are unreliable. This research holds significant practical implications and is highly relevant to real-world applications. Recently, the advancements in foundation models like CLIP and DINO have showcased their robust representation capabilities even in resource-constrained settings with scarce data. This realization has brought about a transformative shift in focus, moving away from “building models from scratch” towards “effectively harnessing the potential of foundation models to extract pertinent prior knowledge suitable for OFSL and utilizing it sensibly”. Motivated by this perspective, we introduce the Collaborative Consortium of Foundation Models (CO3), which leverages CLIP, DINO, GPT-3, and DALL-E to collectively address the OFSL problem. CO3 comprises four key blocks: (1) the Label Correction Block (LC-Block) corrects unreliable labels, (2) the Data Augmentation Block (DA-Block) enhances available data, (3) the Feature Extraction Block (FE-Block) extracts multi-modal features, and (4) the Text-guided Fusion Adapter (TeFu-Adapter) integrates multiple features while mitigating the impact of noisy labels through semantic constraints. Only the adapter's parameters are adjustable, while the others remain frozen. Through collaboration among these foundation models, CO3 effectively unlocks their potential and unifies their capabilities to achieve state-of-the-art performance on multiple benchmark datasets. https://github.com/The-Shuai/CO3.

NeurIPS Conference 2024 Conference Paper

DiffuBox: Refining 3D Object Detection with Point Diffusion

  • Xiangyu Chen
  • Zhenzhen Liu
  • Katie Z. Luo
  • Siddhartha Datta
  • Adhitya Polavaram
  • Yan Wang
  • Yurong You
  • Boyi Li

Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box's location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors. Our PyTorch implementation is available at https://github.com/cxy1997/DiffuBox.

TCS Journal 2024 Journal Article

Enhancing fault tolerance of balanced hypercube networks by the edge partition method

  • Xiaoqing Liu
  • Baolei Cheng
  • Yan Wang
  • Jia Yu
  • Jianxi Fan

Interconnection networks are a key component of parallel and distributed systems. As the network scale increases, a growing number of link failures becomes inevitable. Therefore, how to enhance the fault tolerance of the network has become a central research issue in parallel and distributed systems. Recently, two new metrics based on edge partition in networks have been introduced, called matroidal connectivity and conditional matroidal connectivity. Existing research indicates that they enhance the fault tolerance of alternating group networks and star networks significantly more than other indicators such as g-component edge connectivity and g-extra edge connectivity. In this work, we consider the matroidal connectivity and conditional matroidal connectivity of the n-dimensional balanced hypercube BH_n, a class of networks with vertex transitivity and edge transitivity, and theoretically demonstrate their high fault tolerance in BH_n. Furthermore, through numerical simulation, it is observed that the matroidal connectivity and conditional matroidal connectivity of BH_n are greater than the g-component edge connectivity and g-extra edge connectivity of BH_n, which implies that (conditional) matroidal connectivity can further improve BH_n's fault tolerance.

IJCAI Conference 2024 Conference Paper

FD-UAD: Unsupervised Anomaly Detection Platform Based on Defect Autonomous Imaging and Enhancement

  • Yang Chang
  • Yuxuan Lin
  • Boyang Wang
  • Qing Zhao
  • Yan Wang
  • Wenqiang Zhang

In industrial quality control, detecting defects is essential. However, manual checks and machine vision encounter challenges in complex conditions, as defects vary among products made of different materials and shapes. We present FD-UAD, an Unsupervised Anomaly Detection Platform Based on Defect Autonomous Imaging and Enhancement. It uses multi-sensor technology, combining RGB and infrared imaging with liquid lenses for adjustable focal lengths, and applies image fusion to capture multidimensional features. The system incorporates image restoration techniques such as enhancement, deblurring, denoising, and super-resolution, alongside an unsupervised anomaly detection model for enhanced accuracy. FD-UAD has been successfully deployed at a leading diesel engine manufacturer, demonstrating its value in AI-enhanced industrial applications.

AAAI Conference 2024 Conference Paper

Fine-Tuning Large Language Model Based Explainable Recommendation with Explainable Quality Reward

  • Mengyuan Yang
  • Mengying Zhu
  • Yan Wang
  • Linxun Chen
  • Yilei Zhao
  • Xiuyuan Wang
  • Bing Han
  • Xiaolin Zheng

Large language model-based explainable recommendation (LLM-based ER) systems can provide remarkable human-like explanations and have widely received attention from researchers. However, the original LLM-based ER systems face three low-quality problems in their generated explanations, i.e., lack of personalization, inconsistency, and questionable explanation data. To address these problems, we propose a novel LLM-based ER model denoted as LLM2ER to serve as a backbone and devise two innovative explainable quality reward models for fine-tuning such a backbone in a reinforcement learning paradigm, ultimately yielding a fine-tuned model denoted as LLM2ER-EQR, which can provide high-quality explanations. LLM2ER-EQR can generate personalized, informative, and consistent high-quality explanations learned from questionable-quality explanation datasets. Extensive experiments conducted on three real-world datasets demonstrate that our model can generate fluent, diverse, informative, and highly personalized explanations.

TCS Journal 2024 Journal Article

High fault-tolerant performance of the divide-and-swap cube network

  • Qianru Zhou
  • Jianxi Fan
  • Yan Wang
  • Baolei Cheng
  • Guijuan Wang

Two crucial metrics used to evaluate the fault tolerance of interconnection networks are connectivity and diagnosability. By improving the connectivity and diagnosability of an interconnection network, its fault tolerance can be enhanced. In this paper, we focus on determining the g-extra connectivity (0 ≤ g ≤ 10) of the divide-and-swap cube DSC_n, as well as its diagnosability based on the pessimistic diagnosis strategy and the g-extra precise diagnosis strategy, under the PMC and MM* models. The analysis suggests that, compared with other connectivity and diagnosability measures of DSC_n, such as classical connectivity, structure connectivity, super connectivity, and classical diagnosability, the extra connectivity/diagnosability and pessimistic diagnosability of DSC_n give it higher fault tolerance. Moreover, we propose two O(N log₂ N) effective diagnosis algorithms for DSC_n: the g-extra diagnosis algorithm (EX-Diagnosis-DSC_n) and the pessimistic diagnosis algorithm (PE-Diagnosis-DSC_n), where the EX-Diagnosis-DSC_n algorithm can accurately diagnose the state of all processors in DSC_n.

NeurIPS Conference 2024 Conference Paper

LCGen: Mining in Low-Certainty Generation for View-consistent Text-to-3D

  • Zeng Tao
  • Tong Yang
  • Junxiong Lin
  • Xinji Mai
  • Haoran Wang
  • Beining Wang
  • Enyu Zhou
  • Yan Wang

The Janus Problem is a common issue in SDS-based text-to-3D methods. Due to the view encoding approach and 2D diffusion prior guidance, the 3D representation model tends to learn content with higher certainty from each perspective, leading to view inconsistency. In this work, we first model and analyze the problem, visualizing the specific causes of the Janus Problem, which are associated with discrete view encoding and shared priors in 2D lifting. Based on this, we further propose the LCGen method, which guides text-to-3D generation to obtain priors with different certainty from various viewpoints, aiding view-consistent generation. Experiments show that our LCGen method can be directly applied to different SDS-based text-to-3D methods, alleviating the Janus Problem without introducing additional information, incurring excessive training burden, or compromising generation quality.

AAAI Conference 2024 Conference Paper

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

  • Yan Wang
  • Zhixuan Chu
  • Xin Ouyang
  • Simeng Wang
  • Hongyan Hao
  • Yue Shen
  • Jinjie Gu
  • Siqiao Xue

Recommendation systems aim to provide users with relevant suggestions, but often lack interpretability and fail to capture higher-level semantic relationships between user behaviors and profiles. In this paper, we propose a novel approach that leverages large language models (LLMs) to construct personalized reasoning graphs. These graphs link a user's profile and behavioral sequences through causal and logical inferences, representing the user's interests in an interpretable way. Our approach, LLM reasoning graphs (LLMRG), has four components: chained graph reasoning, divergent extension, self-verification and scoring, and knowledge base self-improvement. The resulting reasoning graph is encoded using graph neural networks, which serves as additional input to improve conventional recommender systems, without requiring extra user or item information. Our approach demonstrates how LLMs can enable more logical and interpretable recommender systems through personalized reasoning graphs. LLMRG allows recommendations to benefit from both engineered recommendation systems and LLM-derived reasoning graphs. We demonstrate the effectiveness of LLMRG on benchmarks and real-world scenarios in enhancing base recommendation models.

YNICL Journal 2024 Journal Article

Mapping grey matter and cortical thickness alterations associated with subjective cognitive decline and mild cognitive impairment among rural-dwelling older adults in China: A population-based study

  • Ziwei Chen
  • Qianqian Xie
  • Jiafeng Wang
  • Yan Wang
  • Huisi Zhang
  • Chunyan Li
  • Yongxiang Wang
  • Lin Cong

BACKGROUND: The structural brain alterations underlying subjective cognitive decline (SCD) and mild cognitive impairment (MCI) are poorly defined. We sought to characterize grey matter volume (GMV) and cortical thickness associated with SCD and MCI among rural-dwelling older adults in China. METHODS: This population-based cross-sectional study included 1072 dementia-free participants from the brain MRI sub-study of MIND-China (2018-2020). We defined MCI following Petersen's criteria, and SCD as a self-rated Ascertain Dementia 8-item Questionnaire score ≥ 2. Data were analyzed using voxel-based morphometry (VBM), surface-based morphometry (SBM) analysis, and logistic regression models. RESULTS: SCD was identified in 243 persons and MCI in 246. The VBM analysis showed that MCI (vs. normal cognition) was significantly associated with reduced GMV in brain regions such as the bilateral parahippocampus, bilateral hippocampus, and bilateral fusiform (P < 0.05). The ROI-wise SBM analysis revealed that SCD was significantly associated with cortical thinning in the right paracentral sulcus, left caudal middle frontal gyrus, and left entorhinal cortex (P < 0.05), and that MCI was significantly associated with cortical thinning in the left temporal lobe, left frontal lobe, bilateral parietal lobe, and bilateral fusiform (P < 0.05). CONCLUSIONS: The brain regions with reduced GMV or cortical thickness in older adults gradually expand from normal cognition through SCD to MCI, suggesting that characterizing structural brain alterations may help define the cognitive spectrum at the pre-dementia phase. These findings have potential implications for understanding the neuropathological process of cognitive deterioration in aging.

IROS Conference 2024 Conference Paper

MLPER: Multi-Level Prompts for Adaptively Enhancing Vision-Language Emotion Recognition

  • Yu Gao 0010
  • Weihong Ren
  • Xinglong Xu
  • Yan Wang
  • Zhiyong Wang 0009
  • Honghai Liu 0001

In the field of robotics, vision-based Emotion Recognition (ER) has achieved significant progress, but it still faces the challenge of poor generalization under unconstrained conditions (e.g., occlusions and pose variations). In this work, we propose the MLPER model, which introduces a Vision-Language Model for Emotion Recognition to learn discriminative representations adaptively. Specifically, rather than relying on a single hand-crafted prompt (e.g., "a photo of a [class] person"), we first establish Multi-Level Prompts covering three aspects: facial expression, human posture, and situational condition, using large language models such as ChatGPT. Correspondingly, we extract visual tokens at three levels: the face, the body, and the context. Further, to achieve fine-grained alignment at each level, we use textual tokens from the positive and hard negative prompts to query visual tokens, predicting whether an image-text pair is matched. Experimental results demonstrate that our MLPER model outperforms state-of-the-art methods on several ER benchmarks, especially under occlusions and pose variations.

AAAI Conference 2024 Conference Paper

Object Attribute Matters in Visual Question Answering

  • Peize Li
  • Qingyi Si
  • Peng Fu
  • Zheng Lin
  • Yan Wang

Visual question answering is a multimodal task that requires the joint comprehension of visual and textual information. However, integrating visual and textual semantics solely through attention layers is insufficient to comprehensively understand and align information from both modalities. Intuitively, object attributes can naturally serve as a bridge to unify them, which has been overlooked in previous research. In this paper, we propose a novel VQA approach from the perspective of utilizing object attributes, aiming to achieve better object-level visual-language alignment and multimodal scene understanding. Specifically, we design an attribute fusion module and a contrastive knowledge distillation module. The attribute fusion module constructs a multimodal graph neural network to fuse attributes and visual features through message passing. The enhanced object-level visual features contribute to solving fine-grained problems such as counting questions. The better object-level visual-language alignment aids in understanding multimodal scenes, thereby improving the model's robustness. Furthermore, to augment scene understanding and out-of-distribution performance, the contrastive knowledge distillation module introduces a series of implicit knowledge. We distill knowledge into attributes through contrastive loss, which further strengthens the representation learning of attribute features and facilitates visual-linguistic alignment. Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method.

AAAI Conference 2024 Conference Paper

Partial Label Learning with a Partner

  • Chongjie Si
  • Zekun Jiang
  • Xuehui Wang
  • Yan Wang
  • Xiaokang Yang
  • Wei Shen

In partial label learning (PLL), each instance is associated with a set of candidate labels among which only one is ground-truth. Most existing works focus on constructing robust classifiers to estimate the labeling confidence of candidate labels in order to identify the correct one. However, these methods usually struggle to rectify mislabeled samples. To help existing PLL methods identify and rectify mislabeled samples, in this paper, we introduce a novel partner classifier and propose a novel "mutual supervision" paradigm. Specifically, we instantiate the partner classifier predicated on the implicit fact that non-candidate labels of a sample should not be assigned to it, which is inherently accurate and has not been fully investigated in PLL. Furthermore, a novel collaborative term is formulated to link the base classifier and the partner one. During each stage of mutual supervision, both classifiers blur each other's predictions through a blurring mechanism to prevent overconfidence in a specific label. Extensive experiments demonstrate that the performance and disambiguation ability of several well-established stand-alone and deep-learning based PLL approaches can be significantly improved by coupling with this learning paradigm.

AAAI Conference 2024 Conference Paper

Probability-Polarized Optimal Transport for Unsupervised Domain Adaptation

  • Yan Wang
  • Chuan-Xian Ren
  • Yi-Ming Zhai
  • You-Wei Luo
  • Hong Yan

Optimal transport (OT) is an important methodology to measure distribution discrepancy, which has achieved promising performance in artificial intelligence applications, e.g., unsupervised domain adaptation. However, from the view of transportation, there are still limitations: 1) the local discriminative structures for downstream tasks, e.g., cluster structure for classification, cannot be explicitly admitted by the learned OT plan; 2) the entropy regularization induces a dense OT plan with increasing uncertainty. To tackle these issues, we propose a novel Probability-Polarized OT (PPOT) framework, which can characterize the structure of OT plan explicitly. Specifically, the probability polarization mechanism is proposed to guide the optimization direction of OT plan, which generates a clear margin between similar and dissimilar transport pairs and reduces the uncertainty. Further, a dynamic mechanism for margin is developed by incorporating task-related information into the polarization, which directly captures the intra/inter class correspondence for knowledge transportation. A mathematical understanding for PPOT is provided from the view of gradient, which ensures interpretability. Extensive experiments on several datasets validate the effectiveness and empirical efficiency of PPOT.

ICML Conference 2024 Conference Paper

Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

  • Yan Wang
  • Lihao Wang
  • Yuning Shen
  • Yiqun Wang
  • Huizhuo Yuan
  • Yue Wu
  • Quanquan Gu

The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided $\mathrm{SE}(3)$ diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.

TCS Journal 2024 Journal Article

Reliability evaluation for a class of recursive match networks

  • Qianru Zhou
  • Baolei Cheng
  • Jingya Zhou
  • Jia Yu
  • Yan Wang
  • Jianxi Fan

Because system failures occur endlessly, efforts are continually made to improve the reliability of such systems. As a fundamental problem in large-scale parallel and distributed systems, connectivity and diagnosability, especially g-extra connectivity and g-extra conditional diagnosability, have been widely studied. This work proposes a new class of recursive networks including the BCube: conditional recursive match networks (CRMNs). To explore the reliability of CRMNs, we derive the g-extra connectivity and g-extra conditional diagnosability of CRMNs under the PMC model and the MM* model for 0 ≤ g ≤ 3 and g = 2^(m+1) − 1 (m ≥ 0). These results can not only be applied to the BCube, but can also be used to directly obtain the corresponding g-extra connectivity and g-extra conditional diagnosability of other networks besides the BCube. In addition, we put forward conjectures regarding the g-extra connectivity and g-extra conditional diagnosability of CRMNs with g > 0.

RLJ Journal 2024 Journal Article

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

  • Gautham Vasan
  • Yan Wang
  • Fahim Shahriar
  • James Bergstra
  • Martin Jägersand
  • A. Rupam Mahmood

Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state (termed $\textit{minimum-time}$ tasks). Despite this simplicity, such formulations are often overlooked in favor of dense rewards due to their perceived difficulty and lack of informativeness. Our studies contrast the two reward paradigms, revealing that the minimum-time task specification not only facilitates learning higher-quality policies but can also surpass dense-reward-based policies on their own performance metrics. Crucially, we also identify the goal-hit rate of the initial policy as a robust early indicator for learning success in such sparse feedback settings. Finally, using four distinct real-robotic platforms, we show that it is possible to learn pixel-based policies from scratch within two to three hours using constant negative rewards. Our video demo can be found here: https://youtu.be/a6zlVUuKzBc
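The minimum-time task specification described in the abstract above can be summed up in a single per-step rule. The sketch below is only an illustration of that specification, not the authors' implementation; the function name and the `at_goal` flag are hypothetical:

```python
def minimum_time_step(at_goal: bool) -> tuple[float, bool]:
    """One environment step under the minimum-time specification:
    constant -1 reward, terminate only when the goal state is reached."""
    reward = -1.0          # every time step costs the same
    terminated = at_goal   # episode ends upon hitting the goal
    return reward, terminated

# The undiscounted return of an episode is the negative number of steps
# taken, so maximizing return minimizes time-to-goal.
episode = [False, False, False, True]  # goal reached on the 4th step
ret = 0.0
for flag in episode:
    r, done = minimum_time_step(flag)
    ret += r
    if done:
        break
# ret == -4.0
```

This makes concrete why the specification "aligns well with the intended goal": no reward shaping is needed, and the only learning signal is how quickly the episode ends.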

RLC Conference 2024 Conference Paper

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

  • Gautham Vasan
  • Yan Wang
  • Fahim Shahriar
  • James Bergstra
  • Martin Jägersand
  • A. Rupam Mahmood

Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state (termed $\textit{minimum-time}$ tasks). Despite this simplicity, such formulations are often overlooked in favor of dense rewards due to their perceived difficulty and lack of informativeness. Our studies contrast the two reward paradigms, revealing that the minimum-time task specification not only facilitates learning higher-quality policies but can also surpass dense-reward-based policies on their own performance metrics. Crucially, we also identify the goal-hit rate of the initial policy as a robust early indicator for learning success in such sparse feedback settings. Finally, using four distinct real-robotic platforms, we show that it is possible to learn pixel-based policies from scratch within two to three hours using constant negative rewards. Our video demo can be found here: https://youtu.be/a6zlVUuKzBc

AAAI Conference 2024 Conference Paper

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

  • Yifan Lu
  • Ziqi Zhang
  • Chunfeng Yuan
  • Peng Li
  • Yan Wang
  • Bing Li
  • Weiming Hu

Diverse video captioning aims to generate a set of sentences to describe the given video in various aspects. Mainstream methods are trained with independent pairs of a video and a caption from its ground-truth set without exploiting the intra-set relationship, resulting in low diversity of generated captions. Different from them, we formulate diverse captioning into a semantic-concept-guided set prediction (SCG-SP) problem by fitting the predicted caption set to the ground-truth set, where the set-level relationship is fully captured. Specifically, our set prediction consists of two synergistic tasks, i.e., caption generation and an auxiliary task of concept combination prediction providing extra semantic supervision. Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction. Furthermore, we apply a diversity regularization term on concepts to encourage the model to generate semantically diverse captions with various concept combinations. These two tasks share multiple semantics-specific encodings as input, which are obtained by iterative interaction between visual features and conceptual queries. The correspondence between the generated captions and specific concept combinations further guarantees the interpretability of our model. Extensive experiments on benchmark datasets show that the proposed SCG-SP achieves state-of-the-art (SOTA) performance under both relevance and diversity metrics.

IJCAI Conference 2024 Conference Paper

Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought

  • Xiaoxiao Chi
  • Xuyun Zhang
  • Yan Wang
  • Lianyong Qi
  • Amin Beheshti
  • Xiaolong Xu
  • Kim-Kwang Raymond Choo
  • Shuo Wang

Recommender systems have been successfully applied in many applications. Nonetheless, recent studies demonstrate that recommender systems are vulnerable to membership inference attacks (MIAs), leading to the leakage of users' membership privacy. However, existing MIAs relying on shadow training suffer a large performance drop when the attacker lacks knowledge of the training data distribution and the model architecture of the target recommender system. To better understand the privacy risks of recommender systems, we propose shadow-free MIAs that directly leverage a user's recommendations for membership inference. Without shadow training, the proposed attack can conduct MIAs efficiently and effectively under a practical scenario where the attacker is given only black-box access to the target recommender system. The proposed attack leverages the intuition that the recommender system personalizes a user's recommendations if the user's historical interactions were used to train it. Thus, an attacker can infer membership privacy by determining whether the recommendations are more similar to the user's interactions or to the generally popular items. We conduct extensive experiments on benchmark datasets across various recommender systems. Remarkably, our attack achieves far better attack accuracy with low false positive rates than baselines, at a much lower computational cost.
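The similarity-comparison intuition in the abstract above can be sketched as a simple decision rule. This is a hedged illustration only: Jaccard similarity over item sets and the greater-than test are illustrative choices, not the paper's exact attack, and all names here are hypothetical:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two item sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_member(recommended: set, interactions: set, popular: set) -> bool:
    """Guess 'member' when a user's recommendations track their own
    interaction history more closely than the generic popular-item list."""
    return jaccard(recommended, interactions) > jaccard(recommended, popular)

# A personalized list overlapping the user's history suggests membership;
# a list that just mirrors popular items suggests a non-member.
is_member = infer_member(
    recommended={"i1", "i2", "i9"},
    interactions={"i1", "i2", "i3"},
    popular={"i7", "i8", "i9"},
)
```

The appeal of the rule, as the abstract notes, is that it needs neither shadow models nor knowledge of the target system's architecture: only the returned recommendation list and public popularity statistics.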

NeurIPS Conference 2024 Conference Paper

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

  • Jiefeng Ma
  • Yan Wang
  • Chenyu Liu
  • Jun Du
  • Yu Hu
  • Zhenrong Zhang
  • Pengfei Hu
  • Qing Wang

Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously supplemented the original dataset with missing annotations at various levels of granularity and added detailed annotations for multi-item table regions within the forms. Additionally, we introduce global hierarchical structure dependencies for entity relation prediction tasks, surpassing traditional local key-value associations. The SRFUND dataset covers eight languages: English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese, making it a powerful tool for cross-lingual form understanding. Extensive experimental results demonstrate that the SRFUND dataset presents new challenges and significant opportunities in handling diverse layouts and global hierarchical structures of forms, thus providing deep insights into the field of form understanding. The original dataset and implementations of baseline methods are available at https://sprateam-ustc.github.io/SRFUND.

EAAI Journal 2024 Journal Article

Three-partition coevolutionary differential evolution algorithm for mixed-variable optimization problems

  • Guojun Gan
  • Hengzhou Ye
  • Minggang Dong
  • Wei Ye
  • Yan Wang

In both industrial and scientific fields, many optimization problems involve continuous and discrete decision variables. Such problems are called mixed-variable optimization problems (MVOPs). MVOPs remain challenging due to the different spatial distribution characteristics of continuous, ordinal, and categorical variables. In this study, a new variant of differential evolution (DE), called the three-partition coevolutionary DE algorithm for MVOPs (TCDEmv), is proposed. First, a mixed-variable three-partition coevolutionary scheme is proposed that can simultaneously handle MVOPs comprising continuous, ordinal, and categorical variables with the same evolution operator. Additionally, TCDEmv adopts a dynamic adaptive (DA) mechanism to maintain the balance between ordinal and categorical variables, avoiding the quantity dominance issue. Furthermore, to enhance the efficiency of TCDEmv, a statistical probability-based two-layer optimization strategy (SPT) is employed for ordinal and categorical variables. The experimental results on 34 artificial MVOPs show that TCDEmv obtains better solutions and convergence than seven representative algorithms. Compared with similar algorithms on three real-world MVOPs, TCDEmv also shows competitive performance.

NeurIPS Conference 2024 Conference Paper

Training an Open-Vocabulary Monocular 3D Detection Model without 3D Data

  • Rui Huang
  • Henry Zheng
  • Yan Wang
  • Zhuofan Xia
  • Marco Pavone
  • Gao Huang

Open-vocabulary 3D object detection has recently attracted considerable attention due to its broad applications in autonomous driving and robotics, which aims to effectively recognize novel classes in previously unseen domains. However, existing point cloud-based open-vocabulary 3D detection models are limited by their high deployment costs. In this work, we propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det, which trains detectors using only RGB images, making it both cost-effective and scalable to publicly available data. Unlike traditional methods, OVM3D-Det does not require high-precision LiDAR or 3D sensor data for either input or generating 3D bounding boxes. Instead, it employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors. However, training 3D models with labels directly derived from pseudo-LiDAR is inadequate due to imprecise boxes estimated from noisy point clouds and severely occluded objects. To address these issues, we introduce two innovative designs: adaptive pseudo-LiDAR erosion and bounding box refinement with prior knowledge from large language models. These techniques effectively calibrate the 3D labels and enable RGB-only training for 3D detectors. Extensive experiments demonstrate the superiority of OVM3D-Det over baselines in both indoor and outdoor scenarios. The code will be released.

TIST Journal 2024 Journal Article

Trustworthy Recommender Systems

  • Shoujin Wang
  • Xiuzhen Zhang
  • Yan Wang
  • Francesco Ricci

Recommender systems (RSs) aim at helping users to effectively retrieve items of their interests from a large catalogue. For quite a long time, researchers and practitioners have been focusing on developing accurate RSs. Recent years have witnessed an increasing number of threats to RSs, coming from attacks, system- and user-generated noise, and various types of biases. As a result, it has become clear that the focus on RS accuracy is too narrow, and the research must consider other important factors, particularly trustworthiness. A trustworthy recommender system (TRS) should not only be accurate but also transparent, unbiased, fair, and robust to noise and attacks. These observations have led to a paradigm shift in the research on RSs: from accuracy-oriented RSs to TRSs. However, there is a lack of a systematic overview and discussion of the literature in this novel and fast-developing field of TRSs. To this end, in this article, we provide an overview of TRSs, including a discussion of the motivation and basic concepts of TRSs, a presentation of the challenges in building TRSs, and a perspective on the future directions in this area. We also provide a novel conceptual framework to support the construction of TRSs.

EAAI Journal 2023 Journal Article

A nonlinear multi-label learning model based on Tanh mapping

  • Changzhong Wang
  • Yan Wang
  • Tingquan Deng
  • Yang Huang

The relationship between features and labels plays an important role in multi-label learning. The purpose of multi-label learning is to learn a mapping from the feature space to the label space. Many of the existing methods decompose the observed label matrix into the product of a low-rank latent label matrix and a projection matrix, and construct a linear mapping between the feature space and low-rank latent label space. However, in practice, the relationship between two high-dimensional spaces is often nonlinear. Consequently, these linear models may not sufficiently capture the data distribution and the true dependencies between features and labels. Herein, a novel classification model for multi-label learning is proposed, which considers nonlinear mapping from the feature space to the latent label space. Because of correlation between labels, the label matrix is usually of a low rank. To explore the low-rank structure, a latent label matrix is obtained using the decomposition of the original label matrix. To characterize the nonlinear relationship between features and latent labels, the tanh function is employed to learn a nonlinear classifier. The tanh function describes the distribution of labels by constraining the domain of the latent label space to the interval [−1, 1]. In addition, the Frobenius norm regularization constraint is imposed on the variables in the model for optimization. The experimental results demonstrate that the proposed method is robust and comparable to state-of-the-art multi-label learning methods.

TCS Journal 2023 Journal Article

A parallel algorithm to construct edge independent spanning trees on the line graphs of conditional bijective connection networks

  • Zhiyong Pan
  • Baolei Cheng
  • Jianxi Fan
  • Yan Wang
  • Xiajing Li

The line graphs of some well-known graphs, such as the generalized hypercube and the crossed cube, have been adopted as the logic graphs of data center networks (DCNs). Conditional bijective connection networks (conditional BC networks) are a class of networks which have been proved to include hypercubes, crossed cubes, locally twisted cubes, Möbius cubes, etc. Hence, research on the line graphs of conditional BC networks can be applied to DCNs. Edge independent spanning trees (EISTs) on a graph have received extensive attention because of their applications in reliable communication, fault-tolerant broadcasting and secure message distribution. Since the line graph of an n-dimensional conditional BC network, denoted as L(XC_n), is (2n−2)-regular, whether there exist 2n−2 EISTs on L(XC_n) is an open question. In this paper, we first propose a parallel algorithm to construct 2n−2 EISTs rooted at an arbitrary node on L(XC_n), where n ≥ 2. Then, we prove the correctness of our algorithm. Finally, we give a simulation result of our algorithm.

EAAI Journal 2023 Journal Article

An efficient multilayer adaptive self-organizing modeling methodology for improving the generalization ability of organic Rankine cycle (ORC) data-driven model

  • Xu Ping
  • Fubin Yang
  • HongGuang Zhang
  • Chengda Xing
  • Anren Yang
  • Yan Wang

The efficient and accurate model construction of an organic Rankine cycle (ORC) system is the key to its analysis, prediction, and optimization. As a typical multidisturbance nonlinear dynamic system, the ORC system always operates in a nonstationary state. Under uncertain disturbance, accurately identifying the thermal power conversion process state and quickly realizing the accurate association of various mappings are the key considerations of constructing the data-driven model of the ORC system. From the perspective of data selection, parameter association, and structural design, this study proposes a methodology for the efficient multilayer adaptive self-organizing modeling of ORC systems. This methodology can realize efficient autonomous modeling in the whole design process of the data-driven model of the ORC system. Moreover, the proposed methodology can minimize the structural risk of the model by balancing the empirical risk and structural complexity. By taking real data as the test base, the generalization ability and time cost of the data self-selection layer, information self-correlation layer and adaptive self-organizing part of the structure layer are evaluated. Compared with the direct construction of the ORC system data model, the proposed methodology can reduce the model construction time cost by 75.54% and improve the generalization ability by 61.88%. In addition, maximizing the generalization capability with minimum structural risk is an important part of data-driven model construction of ORC systems. In this study, a data-driven model structural reliability assessment approach for ORC systems is proposed. Then, the proposed adaptive self-organizing optimization methodology is verified on the basis of the structural reliability assessment model. The multilayer adaptive self-organizing modeling methodology proposed in this study can provide new ideas and necessary theoretical guidance.

TCS Journal 2023 Journal Article

Edge-independent spanning trees in folded crossed cubes

  • Huanwen Zhang
  • Yan Wang
  • Jianxi Fan
  • Chang Shu

Edge-independent spanning trees (EISTs for short) have widespread applications in fault tolerance to enhance the stability and security of networks, as well as in IP fast rerouting to prevent network breakdown caused by link failure. Accordingly, algorithms for constructing EISTs on many classes of graphs have been investigated. The folded crossed cube was proposed based on the folded cube and the crossed cube, and possesses such appealing properties as short diameter, short mean internode distance and very low message traffic density. In this paper, we study the existence and construction of EISTs with the same root r in the n-dimensional folded crossed cube (FCQ_n for short). For v ∈ V(FCQ_n) ∖ {r} and i ∈ {0, 1, ⋯, n−1}, we first propose two algorithms to obtain the sequence S_{v,i} and the set F_v, respectively. Then, based on them, an algorithm with time complexity O(n²) using N processors is proposed to construct n+1 EISTs rooted at any vertex r in FCQ_n, where N = 2^n. The corresponding theoretical proof and simulation experiments are presented to verify its validity. Since FCQ_n is (n+1)-regular, the result is optimal with respect to the number of EISTs constructed. Moreover, the performance of the proposed algorithm is evaluated experimentally in terms of the average distance and average distance-diameter ratio of the resulting EISTs.

EAAI Journal 2023 Journal Article

Ensemble learning-based nonlinear time series prediction and dynamic multi-objective optimization of organic rankine cycle (ORC) under actual driving cycle

  • Xu Ping
  • Fubin Yang
  • HongGuang Zhang
  • Chengda Xing
  • Zhuxian Liu
  • Hailong Yang
  • Yan Wang

Complicated road conditions make organic Rankine cycle (ORC) operation characteristics show hysteresis and uncertainty. Under the strong coupling of many operating parameters, how to realize the dynamic optimization of comprehensive ORC performance is the key to obtaining practical application potential. Based on an ensemble learning mechanism, neural network modeling, an ensemble system, unsupervised learning, partial mutual information and an optimization algorithm are integrated. This paper presents a nonlinear time series prediction and dynamic multi-objective optimization scheme. The average accuracy increased by at least 59.6%. Taking the thermodynamic performance and environmental impact as optimization objectives, dynamic multi-objective optimization is carried out under road conditions. The optimization scheme can effectively trade off the nonlinear correlation between thermal efficiency and emissions of CO2 equivalent.

NeurIPS Conference 2023 Conference Paper

Exploiting Contextual Objects and Relations for 3D Visual Grounding

  • Li Yang
  • Chunfeng Yuan
  • Ziqi Zhang
  • Zhongang Qi
  • Yan Xu
  • Wei Liu
  • Ying Shan
  • Bing Li

3D visual grounding, the task of identifying visual objects in 3D scenes based on natural language inputs, plays a critical role in enabling machines to understand and engage with the real-world environment. However, this task is challenging due to the necessity to capture 3D contextual information to distinguish target objects from complex 3D scenes. The absence of annotations for contextual objects and relations further exacerbates the difficulties. In this paper, we propose a novel model, CORE-3DVG, to address these challenges by explicitly learning about contextual objects and relations. Our method accomplishes 3D visual grounding via three sequential modular networks, including a text-guided object detection network, a relation matching network, and a target identification network. During training, we introduce a pseudo-label self-generation strategy and a weakly-supervised method to facilitate the learning of contextual objects and relations, respectively. The proposed techniques allow the networks to focus more effectively on referred objects within 3D scenes by understanding their context better. We validate our model on the challenging Nr3D, Sr3D, and ScanRefer datasets and demonstrate state-of-the-art performance. Our code will be public at https://github.com/yangli18/CORE-3DVG.

AAAI Conference 2023 Conference Paper

Generalizing Math Word Problem Solvers via Solution Diversification

  • Zhenwen Liang
  • Jipeng Zhang
  • Lei Wang
  • Yan Wang
  • Jie Shao
  • Xiangliang Zhang

Current math word problem (MWP) solvers are usually Seq2Seq models trained on (one-problem; one-solution) pairs, each of which is made of a problem description and a solution showing the reasoning flow to get the correct answer. However, one MWP naturally has multiple solution equations. Training an MWP solver with (one-problem; one-solution) pairs excludes other correct solutions, and thus limits the generalizability of the MWP solver. One feasible solution to this limitation is to augment multiple solutions to a given problem. However, it is difficult to collect diverse and accurate augmented solutions through human effort. In this paper, we design a new training framework for an MWP solver by introducing a solution buffer and a solution discriminator. The buffer includes solutions generated by an MWP solver to encourage training data diversity. The discriminator controls the quality of buffered solutions to participate in training. Our framework is flexibly applicable to a wide setting of fully, semi-weakly and weakly supervised training for all Seq2Seq MWP solvers. We conduct extensive experiments on a benchmark dataset Math23k and a new dataset named Weak12k, and show that our framework improves the performance of various MWP solvers under different settings by generating correct and diverse solutions.

NeurIPS Conference 2023 Conference Paper

Idempotent Learned Image Compression with Right-Inverse

  • Yanghao Li
  • Tongda Xu
  • Yan Wang
  • Jingjing Liu
  • Ya-Qin Zhang

We consider the problem of idempotent learned image compression (LIC). The idempotence of a codec refers to its stability under re-compression. To achieve idempotence, previous codecs adopt invertible transforms such as DCT and normalizing flow. In this paper, we first identify that invertibility of the transform is sufficient but not necessary for idempotence. Instead, it can be relaxed into right-invertibility, and such relaxation allows a wider family of transforms. Based on this identification, we implement an idempotent codec using our proposed blocked convolution and null-space enhancement. Empirical results show that we achieve state-of-the-art rate-distortion performance among idempotent codecs. Furthermore, our codec can be extended into a near-idempotent codec by relaxing the right-invertibility. This near-idempotent codec has significantly less quality decay after 50 rounds of re-compression compared with other near-idempotent codecs.

AAAI Conference 2023 Conference Paper

Intriguing Findings of Frequency Selection for Image Deblurring

  • Xintian Mao
  • Yiming Liu
  • Fengze Liu
  • Qingli Li
  • Wei Shen
  • Yan Wang

Blur was traditionally analyzed in the frequency domain, by estimating the latent sharp image and the blur kernel given a blurry image. Recent progress on image deblurring always designs end-to-end architectures and aims at learning the difference between blurry and sharp image pairs at the pixel level, which inevitably overlooks the importance of blur kernels. This paper reveals an intriguing phenomenon: simply applying a ReLU operation in the frequency domain of a blurry image followed by an inverse Fourier transform, i.e., frequency selection, provides faithful information about the blur pattern (e.g., the blur direction and blur level, implicitly showing the kernel pattern). Based on this observation, we attempt to leverage kernel-level information for image deblurring networks by inserting a Fourier transform, ReLU operation, and inverse Fourier transform into the standard ResBlock. A 1 × 1 convolution is further added to let the network modulate flexible thresholds for frequency selection. We term our newly built block the Res FFT-ReLU Block, which takes advantage of both kernel-level and pixel-level features via learning frequency-spatial dual-domain representations. Extensive experiments are conducted to provide a thorough analysis of the insights of the method. Moreover, after plugging the proposed block into NAFNet, we can achieve 33.85 dB in PSNR on the GoPro dataset. Our method noticeably improves backbone architectures without introducing many parameters, while maintaining low computational complexity. Code is available at https://github.com/DeepMed-Lab/DeepRFT-AAAI2023.

IJCAI Conference 2023 Conference Paper

LION: Label Disambiguation for Semi-supervised Facial Expression Recognition with Progressive Negative Learning

  • Zhongjing Du
  • Xu Jiang
  • Peng Wang
  • Qizheng Zhou
  • Xi Wu
  • Jiliu Zhou
  • Yan Wang

Semi-supervised deep facial expression recognition (SS-DFER) has recently attracted rising research interest due to its more practical setting of abundant unlabeled data. However, there are two main problems unconsidered in current SS-DFER methods: 1) label ambiguity, i.e., given labels mismatch with facial expressions; 2) inefficient utilization of unlabeled data with low confidence. In this paper, we propose a novel SS-DFER method, including a Label DIsambiguation module and a PrOgressive Negative Learning module, namely LION, to simultaneously address both problems. Specifically, the label disambiguation module operates on labeled data, including data with accurate labels (clear data) and ambiguous labels (ambiguous data). It first uses clear data to calculate prototypes for all the expression classes, and then re-assigns a candidate label set to all the ambiguous data. Based on the prototypes and the candidate label set, the ambiguous data can be relabeled more accurately. As for unlabeled data with low confidence, the progressive negative learning module is developed to iteratively mine more complete complementary labels, which can guide the model to reduce the association between data and corresponding complementary labels. Experiments on three challenging datasets show that our method significantly outperforms the current state-of-the-art approaches in SS-DFER and surpasses fully-supervised baselines. Code will be available at https://github.com/NUM-7/LION.

AAAI Conference 2023 Conference Paper

OMPQ: Orthogonal Mixed Precision Quantization

  • Yuexiao Ma
  • Taisong Jin
  • Xiawu Zheng
  • Yan Wang
  • Huixia Li
  • Yongjian Wu
  • Guannan Jiang
  • Wei Zhang

To bridge the ever-increasing gap between deep neural networks' complexity and hardware capability, network quantization has attracted more and more research attention. The latest trend of mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization. However, existing approaches rely heavily on an extremely time-consuming search process and various relaxations when seeking the optimal bit configuration. To address this issue, we propose to optimize a proxy metric of network orthogonality that can be efficiently solved with linear programming, which proves to be highly correlated with quantized model accuracy and bit-width. Our approach significantly reduces the search time and the required data amount by orders of magnitude, but without a compromise on quantization accuracy. Specifically, we achieve 72.08% Top-1 accuracy on ResNet-18 with 6.7Mb parameters, which does not require any searching iterations. Given the high efficiency and low data dependency of our algorithm, we use it for the post-training quantization, which achieves 71.27% Top-1 accuracy on MobileNetV2 with only 1.5Mb parameters.

AAAI Conference 2023 Conference Paper

Positive Distribution Pollution: Rethinking Positive Unlabeled Learning from a Unified Perspective

  • Qianqiao Liang
  • Mengying Zhu
  • Yan Wang
  • Xiuyuan Wang
  • Wanjia Zhao
  • Mengyuan Yang
  • Hua Wei
  • Bing Han

Positive Unlabeled (PU) learning, which has a wide range of applications, is becoming increasingly prevalent. However, it suffers from problems such as data imbalance, selection bias, and prior agnosticism in real scenarios. Existing studies focus on addressing part of these problems, and fail to provide a unified perspective to understand them. In this paper, we first rethink these problems by analyzing a typical PU scenario and come up with an insightful point of view that all these problems are inherently connected to one problem, i.e., positive distribution pollution, which refers to the inaccuracy in estimating the positive data distribution under very little labeled data. Then, inspired by this insight, we devise a variational model named CoVPU, which addresses all three problems in a unified perspective by targeting the positive distribution pollution problem. CoVPU not only accurately separates the positive data from the unlabeled data based on discrete normalizing flows, but also effectively approximates the positive distribution based on our derived unbiased rebalanced risk estimator and supervises the approximation based on a novel prior-free variational loss. Rigorous theoretical analysis proves the convergence of CoVPU to an optimal Bayesian classifier. Extensive experiments demonstrate the superiority of CoVPU over the state-of-the-art PU learning methods under these problems.

NeurIPS Conference 2023 Conference Paper

Prompt-augmented Temporal Point Process for Streaming Event Sequence

  • Siqiao Xue
  • Yan Wang
  • Zhixuan Chu
  • Xiaoming Shi
  • Caigao JIANG
  • Hongyan Hao
  • Gangwei Jiang
  • Xiaoyun Feng

Neural Temporal Point Processes (TPPs) are the prevalent paradigm for modeling continuous-time event sequences, such as user activities on the web and financial transactions. In real world applications, the event data typically comes in a streaming manner, where the distribution of the patterns may shift over time. Under the privacy and memory constraints commonly seen in real scenarios, how to continuously monitor a TPP to learn the streaming event sequence is an important yet under-investigated problem. In this work, we approach this problem by adopting Continual Learning (CL), which aims to enable a model to continuously learn a sequence of tasks without catastrophic forgetting. While CL for event sequence is less well studied, we present a simple yet effective framework, PromptTPP, by integrating the base TPP with a continuous-time retrieval prompt pool. In our proposed framework, prompts are small learnable parameters, maintained in a memory space and jointly optimized with the base TPP so that the model is properly instructed to learn event streams arriving sequentially without buffering past examples or task-specific attributes. We formalize a novel and realistic experimental setup for modeling event streams, where PromptTPP consistently sets state-of-the-art performance across two real user behavior datasets.

ICRA Conference 2023 Conference Paper

Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers

  • Yan Wang
  • Gautham Vasan
  • A. Rupam Mahmood

Real-time learning is crucial for robotic agents adapting to ever-changing, non-stationary environments. A common setup for a robotic agent is to have two different computers simultaneously: a resource-limited local computer tethered to the robot and a powerful remote computer connected wirelessly. Given such a setup, it is unclear to what extent the performance of a learning system can be affected by resource limitations and how to efficiently use the wirelessly connected powerful computer to compensate for any performance loss. In this paper, we implement a real-time learning system called the Remote-Local Distributed (ReLoD) system to distribute computations of two deep reinforcement learning (RL) algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), between a local and a remote computer. The performance of the system is evaluated on two vision-based control tasks developed using a robotic arm and a mobile robot. Our results show that SAC's performance degrades heavily on a resource-limited local computer. Strikingly, when all computations of the learning system are deployed on a remote workstation, SAC fails to compensate for the performance loss, indicating that, without careful consideration, using a powerful remote computer may not result in performance improvement. However, a carefully chosen distribution of computations of SAC consistently and substantially improves its performance on both tasks. On the other hand, the performance of PPO remains largely unaffected by the distribution of computations. In addition, when all computations happen solely on a powerful tethered computer, the performance of our system remains on par with an existing system that is well-tuned for using a single machine. ReLoD is the only publicly available system for real-time RL that applies to multiple robots for vision-based tasks. The source code can be found at https://github.com/rlai-lab/relod

TCS Journal 2023 Journal Article

Reliability evaluation of complete graph-based recursive networks

  • Yihong Wang
  • Jianxi Fan
  • Yuejuan Han
  • Yan Wang
  • Baolei Cheng

With the development of cloud computing and high-performance computing technologies, the scales of data center networks and interconnection networks are becoming larger and larger. As a result, the reliability evaluation of these two kinds of networks is very important, and most scholars study the reliability of specific networks. In this paper, we put forward a class of complete graph-based recursive networks (CGRNs for short) which includes the data center networks DCell and generalized DCell, as well as the interconnection network dragonfly, etc. We then further investigate the reliability of such networks: the connectivity, the diagnosability under the pessimistic diagnosis strategy based on the PMC model, the g-restricted connectivity, and the g-good-neighbor conditional diagnosabilities under the PMC model and the MM* model. As applications, previous results on DCell networks can be directly obtained, and new results on the connectivity, the diagnosability under the pessimistic diagnosis strategy based on the PMC model, the g-restricted connectivity, and the g-good-neighbor conditional diagnosabilities under the PMC model and the MM* model are derived for networks such as generalized DCell networks and dragonfly networks. These reliability results can even be obtained for some other networks different from DCell networks, generalized DCell networks, and dragonfly networks. In addition, it can be seen that the diagnosability under the pessimistic diagnosis strategy based on the PMC model, the g-restricted connectivity, and the g-good-neighbor conditional diagnosabilities under the PMC model and the MM* model of a CGRN are about 2 times its diagnosability, g+1 times its connectivity, and g+1 times its diagnosability, respectively.

JBHI Journal 2023 Journal Article

scGAMNN: Graph Autoencoder-Based Single-Cell RNA Sequencing Data Integration Algorithm Using Mutual Nearest Neighbors

  • Bai Zhang
  • Hanwen Wu
  • Yan Wang
  • Chenxu Xuan
  • Jie Gao

It is critical to correctly assemble high-dimensional single-cell RNA sequencing (scRNA-seq) datasets and downscale them for downstream analysis. However, given the complex relationships between cells, it remains a challenge to simultaneously eliminate batch effects between datasets and maintain the topology between cells within each dataset. Here, we propose scGAMNN, a deep learning model based on graph autoencoder, to simultaneously achieve batch correction and topology-preserving dimensionality reduction. The low-dimensional integrated data obtained by scGAMNN can be used for visualization, clustering and trajectory inference. By comparing it with the other five methods, multiple tasks show that scGAMNN consistently has comparable data integration performance in clustering and trajectory conservation.

AAAI Conference 2023 Conference Paper

SSDA3D: Semi-supervised Domain Adaptation for 3D Object Detection from Point Cloud

  • Yan Wang
  • Junbo Yin
  • Wei Li
  • Pascal Frossard
  • Ruigang Yang
  • Jianbing Shen

LiDAR-based 3D object detection is an indispensable task in advanced autonomous driving systems. Though impressive detection results have been achieved by superior 3D detectors, they suffer from significant performance degeneration when facing unseen domains, such as different LiDAR configurations, different cities, and weather conditions. The mainstream approaches tend to solve these challenges by leveraging unsupervised domain adaptation (UDA) techniques. However, these UDA solutions just yield unsatisfactory 3D detection results when there is a severe domain shift, e.g., from Waymo (64-beam) to nuScenes (32-beam). To address this, we present a novel Semi-Supervised Domain Adaptation method for 3D object detection (SSDA3D), where only a small amount of labeled target data is available, yet it can significantly improve the adaptation performance. In particular, our SSDA3D includes an Inter-domain Adaptation stage and an Intra-domain Generalization stage. In the first stage, an Inter-domain Point-CutMix module is presented to efficiently align the point cloud distribution across domains. The Point-CutMix generates mixed samples of an intermediate domain, thus encouraging the model to learn domain-invariant knowledge. Then, in the second stage, we further enhance the model for better generalization on the unlabeled target set. This is achieved by exploring Intra-domain Point-MixUp in semi-supervised learning, which essentially regularizes the pseudo label distribution. Experiments from Waymo to nuScenes show that, with only 10% labeled target data, our SSDA3D can surpass the fully-supervised oracle model trained with 100% of the target labels. Our code is available at https://github.com/yinjunbo/SSDA3D.

NeurIPS Conference 2023 Conference Paper

Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

  • Yan Wang
  • Huaiqing Wu
  • Dan Nettleton

We establish stability of random forests under the mild condition that the squared response (Y²) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like randomForest in R. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed Y². Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when Y is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in previous literature. Our work implies that random forests, with their stability property, are an effective machine learning method that can provide not only satisfactory point prediction but also justified interval prediction at almost no extra computational cost.

TCS Journal 2023 Journal Article

The t/m-diagnosis strategy of augmented k-ary n-cubes

  • Xueli Sun
  • Jianxi Fan
  • Baolei Cheng
  • Yan Wang

As the scale of interconnected networks grows, so does the number of processor failures in the network. When a processor fails, the information transmitted through the failed processor is unreliable. In fact, the higher a network's diagnosability, the more reliable it is. The t/m-diagnosis strategy was proposed [20], the primary principle of which is to sacrifice a little diagnostic accuracy to massively improve the diagnosability, with the goal of enhancing the reliability of the network. This paper presents a t/m-diagnosis algorithm and shows the t/m-diagnosability of the augmented k-ary n-cube AQ_{n,k}, which is an extension of k-ary n-cubes and augmented cubes. In detail, we show that AQ_{n,k} is [4n(1+m) − ⌊5(1+m)²/2⌋]/m-diagnosable under the PMC model (n ≥ 4, k ≥ 4 and 0 ≤ m ≤ n−2) and under the MM* model (n ≥ 4, k ≥ 4 and 1 ≤ m ≤ n−2), respectively. Based on these results, we further propose t/m-diagnosis algorithms, polynomial in the number of nodes of AQ_{n,k}, under the PMC and MM* models, respectively.

NeurIPS Conference 2023 Conference Paper

Theoretically Guaranteed Bidirectional Data Rectification for Robust Sequential Recommendation

  • Yatong Sun
  • Bin Wang
  • Zhu Sun
  • Xiaochun Yang
  • Yan Wang

Sequential recommender systems (SRSs) are typically trained to predict the next item as the target given its preceding (and succeeding) items as the input. Such a paradigm assumes that every input-target pair is reliable for training. However, users can be induced to click on items that are inconsistent with their true preferences, resulting in unreliable instances, i.e., mismatched input-target pairs. Current studies on mitigating this issue suffer from two limitations: (i) they discriminate instance reliability according to models trained with unreliable data, yet without theoretical guarantees that such a seemingly contradictory solution can be effective; and (ii) most methods can only tackle either unreliable input or targets but fail to handle both simultaneously. To fill the gap, we theoretically unveil the relationship between SRS predictions and instance reliability, whereby two error-bounded strategies are proposed to rectify unreliable targets and input, respectively. On this basis, we devise a model-agnostic Bidirectional Data Rectification (BirDRec) framework, which can be flexibly implemented with most existing SRSs for robust training against unreliable data. Additionally, a rectification sampling strategy is devised and a self-ensemble mechanism is adopted to reduce the (time and space) complexity of BirDRec. Extensive experiments on four real-world datasets verify the generality, effectiveness, and efficiency of our proposed BirDRec.

NeurIPS Conference 2023 Conference Paper

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

  • Cole Gulino
  • Justin Fu
  • Wenjie Luo
  • George Tucker
  • Eli Bronstein
  • Yiren Lu
  • Jean Harb
  • Xinlei Pan

Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of multi-agent interactive behaviors to be trustworthy, behaviors which can be highly nuanced and complex. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

NeurIPS Conference 2022 Conference Paper

A Contrastive Framework for Neural Text Generation

  • Yixuan Su
  • Tian Lan
  • Yan Wang
  • Dani Yogatama
  • Lingpeng Kong
  • Nigel Collier

Text generation is of great importance to many natural language processing applications. However, maximization-based decoding methods (e.g., beam search) of neural language models often lead to degenerate solutions---the generated text is unnatural and contains undesirable repetitions. Existing approaches introduce stochasticity via sampling or modify training objectives to decrease the probabilities of certain tokens (e.g., unlikelihood training). However, they often lead to solutions that lack coherence. In this work, we show that an underlying reason for model degeneration is the anisotropic distribution of token representations. We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method---contrastive search---to encourage diversity while maintaining coherence in the generated text. Extensive experiments and analyses on three benchmarks from two languages demonstrate that our proposed approach outperforms state-of-the-art text generation methods as evaluated by both human and automatic metrics.
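
The contrastive search rule described above balances model confidence against a degeneration penalty. A toy sketch of that scoring rule follows; it is an illustration only, not the authors' implementation, and the token names, probabilities and hidden vectors are invented:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_search_step(candidates, history, alpha=0.6):
    """Pick the next token from a list of top-k candidates.

    candidates: list of (token, model_prob, hidden_vec) triples
    history:    hidden vectors of the tokens generated so far
    Each candidate is scored as (1 - alpha) * model_prob minus
    alpha times its maximum cosine similarity to the history
    (the degeneration penalty); the best-scoring token wins.
    """
    def score(cand):
        _, prob, h = cand
        penalty = max(cosine(h, hp) for hp in history) if history else 0.0
        return (1 - alpha) * prob - alpha * penalty
    return max(candidates, key=score)[0]
```

With a nonzero penalty weight, a candidate whose representation duplicates the history loses out to a less likely but more diverse candidate, which is how repetition is discouraged without sampling.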

IJCAI Conference 2022 Conference Paper

AttExplainer: Explain Transformer via Attention by Reinforcement Learning

  • Runliang Niu
  • Zhepei Wei
  • Yan Wang
  • Qi Wang

Transformer and its variants, built on attention mechanisms, have recently achieved remarkable performance in many NLP tasks. Most existing works on Transformer explanation tend to reveal and utilize the attention matrix with human subjective intuitions in a qualitative manner. However, the huge dimensionality of the attention matrix makes it difficult for these methods to analyze it quantitatively. Therefore, in this paper, we propose a novel reinforcement learning (RL) based framework for Transformer explanation via the attention matrix, namely AttExplainer. The RL agent learns to perform step-by-step masking operations by observing the change in attention matrices. We have adapted our method to two scenarios, perturbation-based model explanation and text adversarial attack. Experiments on three widely used text classification benchmarks validate the effectiveness of the proposed method compared to state-of-the-art baselines. Additional studies show that our method is highly transferable and consistent with human intuition. The code of this paper is available at https://github.com/niuzaisheng/AttExplainer.

TCS Journal 2022 Journal Article

Connectivity and constructive algorithms of disjoint paths in dragonfly networks

  • Suying Wu
  • Jianxi Fan
  • Baolei Cheng
  • Jia Yu
  • Yan Wang

Dragonfly networks have been widely used in current High Performance Computing (HPC) systems due to their lower global network diameter and other communication advantages such as modularity and cost-effectiveness. The original definition of the dragonfly network was very loose on account of its uncertain and diversified global link arrangements. In this paper, we study the logical structure of the dragonfly network, which can be treated as a compound graph of complete graphs. Firstly, we give the general definition of the dragonfly network, named DF(n, h, g), and the specific definition of the dragonfly network under the relative global link arrangement, named D(n, h). Then, we prove that the connectivity of D(n, h) is n − 1 + h. In the end, we propose an O(n) algorithm to construct disjoint paths between any two distinct vertices in D(n, h) and show that the maximum length of these disjoint paths is no more than 7.

JBHI Journal 2022 Journal Article

Deep Supervised Domain Adaptation for Pneumonia Diagnosis From Chest X-Ray Images

  • Yangqin Feng
  • Xinxing Xu
  • Yan Wang
  • Xiaofeng Lei
  • Soo Kng Teo
  • Jordan Zheng Ting Sim
  • Yonghan Ting
  • Liangli Zhen

Pneumonia is one of the most common treatable causes of death, and early diagnosis allows for early intervention. Automated diagnosis of pneumonia can therefore improve outcomes. However, it is challenging to develop high-performance deep learning models due to the lack of well-annotated data for training. This paper proposes a novel method, called Deep Supervised Domain Adaptation (DSDA), to automatically diagnose pneumonia from chest X-ray images. Specifically, we propose to transfer the knowledge from a publicly available large-scale source dataset (ChestX-ray14) to a well-annotated but small-scale target dataset (the TTSH dataset). DSDA aligns the distributions of the source domain and the target domain according to the underlying semantics of the training samples. It includes two task-specific sub-networks for the source domain and the target domain, respectively. These two sub-networks share the feature extraction layers and are trained in an end-to-end manner. Unlike most existing domain adaptation approaches that perform the same tasks in the source domain and the target domain, we attempt to transfer the knowledge from a multi-label classification task in the source domain to a binary classification task in the target domain. To evaluate the effectiveness of our method, we compare it with several existing peer methods. The experimental results show that our method can achieve promising performance for automated pneumonia diagnosis.

YNIMG Journal 2022 Journal Article

Disrupted neural tracking of sound localization during non-rapid eye movement sleep

  • Yan Wang
  • Lingxi Lu
  • Guangyuan Zou
  • Li Zheng
  • Lang Qin
  • Qihong Zou
  • Jia-Hong Gao

Spatial hearing in humans is a high-level auditory process that is crucial to rapid sound localization in the environment. Both neurophysiological models with animals and neuroimaging evidence from human subjects in the wakefulness stage suggest that the localization of auditory objects is mainly located in the posterior auditory cortex. However, whether this cognitive process is preserved during sleep remains unclear. To fill this research gap, we investigated the sleeping brain's capacity to identify sound locations by recording simultaneous electroencephalographic (EEG) and magnetoencephalographic (MEG) signals during wakefulness and non-rapid eye movement (NREM) sleep in human subjects. Using the frequency-tagging paradigm, the subjects were presented with a basic syllable sequence at 5 Hz and a location change that occurred every three syllables, resulting in a sound localization shift at 1.67 Hz. The EEG and MEG signals were used for sleep scoring and neural tracking analyses, respectively. Neural tracking responses at 5 Hz reflecting basic auditory processing were observed during both wakefulness and NREM sleep, although the responses during sleep were weaker than those during wakefulness. Cortical responses at 1.67 Hz, which correspond to the sound location change, were observed during wakefulness regardless of attention to the stimuli but vanished during NREM sleep. These results for the first time indicate that sleep preserves basic auditory processing but disrupts the higher-order brain function of sound localization.

EAAI Journal 2022 Journal Article

Fire detection in video surveillances using convolutional neural networks and wavelet transform

  • Lida Huang
  • Gang Liu
  • Yan Wang
  • Hongyong Yuan
  • Tao Chen

Fire is one of the most frequent and common emergencies threatening public safety and social development. Recently, intelligent fire detection technologies represented by convolutional neural networks (CNNs) have attracted wide attention from academia and industry, substantially improving detection accuracy. However, CNN-based fire detection systems are still subject to the interference of false alarms and the limitation of computing power. In this paper, taking advantage of traditional spectral analysis in fire image detection technology, a novel Wavelet-CNN method is proposed, which applies the 2D Haar transform to extract spectral features of the image and inputs them into CNNs at different layer stages. Two classic backbone networks, ResNet50 and MobileNet v2 (MV2), are used to test our method, and experimental results on a benchmark fire dataset and a video dataset show that the method improves fire detection accuracy and reduces false alarms, especially for the lightweight MV2. Despite the low computational needs, the Wavelet-MV2 achieves accuracy that is comparable to state-of-the-art methods.
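
The 2D Haar transform named above is a standard decomposition; as a minimal sketch (normalization conventions vary, and this is not taken from the paper's code), one level splits an even-sized grayscale image into an approximation band and three detail bands:

```python
def haar2d_level(img):
    """One level of an (unnormalized) 2D Haar transform.

    Splits an even-sized grayscale image (list of rows) into four
    half-resolution sub-bands: LL (approximation) plus LH/HL/HH
    (horizontal, vertical and diagonal detail) -- the kind of
    spectral features a Wavelet-CNN style model feeds to the CNN.
    """
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll_row.append((a + b + c + d) / 4)  # local average
            lh_row.append((a - b + c - d) / 4)  # horizontal detail
            hl_row.append((a + b - c - d) / 4)  # vertical detail
            hh_row.append((a - b - c + d) / 4)  # diagonal detail
        LL.append(ll_row)
        LH.append(lh_row)
        HL.append(hl_row)
        HH.append(hh_row)
    return LL, LH, HL, HH
```

A vertical edge shows up only in the horizontal-detail band, which is why such sub-bands carry flame-texture cues that a plain RGB input does not make explicit.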

NeurIPS Conference 2022 Conference Paper

Flexible Neural Image Compression via Code Editing

  • Chenjian Gao
  • Tongda Xu
  • Dailan He
  • Yan Wang
  • Hongwei Qin

Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance. However, it usually requires a dedicated encoder-decoder pair for each point on the R-D curve, which greatly hinders its practical deployment. While some recent works have enabled bitrate control via conditional coding, they impose a strong prior during training and provide limited flexibility. In this paper we propose Code Editing, a highly flexible coding method for NIC based on semi-amortized inference and adaptive quantization. Our work is a new paradigm for variable-bitrate NIC, and experimental results show that our method surpasses existing variable-rate methods. Furthermore, our approach is so flexible that it can also achieve ROI coding and multi-distortion trade-offs with a single decoder. Our approach is compatible with all NIC methods that have a differentiable decoder, and it can even be directly adopted on existing pre-trained models.

AIIM Journal 2022 Journal Article

ISSMF: Integrated semantic and spatial information of multi-level features for automatic segmentation in prenatal ultrasound images

  • Yihao Sun
  • Hongjian Yang
  • Jiliu Zhou
  • Yan Wang

As an effective way of routine prenatal diagnosis, ultrasound (US) imaging has been widely used recently. Biometrics obtained from fetal segmentation shed light on fetal health monitoring. However, segmentation in US images places strict accuracy requirements on sonographers, making this task quite time-consuming and tedious. In this paper, we use DeepLabv3+ as the backbone and propose an Integrated Semantic and Spatial Information of Multi-level Features (ISSMF) based network to achieve automatic and accurate segmentation of four parts of the fetus in US images, while most previous works only segment one or two parts. Our contributions are threefold. First, to incorporate the semantic information of high-level features and the spatial information of low-level features of US images, we introduce a multi-level feature fusion module to integrate the features at different scales. Second, we propose to leverage the content-aware reassembly of features (CARAFE) upsampler to deeply explore the semantic and spatial information of multi-level features. Third, to alleviate the performance degradation caused by batch normalization (BN) when the batch size is small, we use group normalization (GN) instead. Experiments on four parts of the fetus in US images show that our method outperforms U-Net, DeepLabv3+ and U-Net++, and the biometric measurements based on our segmentation results are very close to those derived from sonographers with ten years of work experience.

JBHI Journal 2022 Journal Article

Multi-Modal MRI Image Synthesis via GAN With Multi-Scale Gate Mergence

  • Bo Zhan
  • Di Li
  • Xi Wu
  • Jiliu Zhou
  • Yan Wang

Multi-modal magnetic resonance imaging (MRI) plays a critical role in clinical diagnosis and treatment nowadays. Each modality of MRI presents its own specific anatomical features, which serve as complementary information to other modalities and can provide rich diagnostic information. However, due to long acquisition times and high cost, some image sequences of patients may be lost or corrupted, posing an obstacle to accurate diagnosis. Although current multi-modal image synthesis approaches are able to alleviate these issues to some extent, they still fall short of fusing modalities effectively. In light of this, we propose a multi-scale gate mergence based generative adversarial network model, namely MGM-GAN, to synthesize one modality of MRI from others. Notably, we have multiple down-sampling branches corresponding to the input modalities to specifically extract their unique features. In contrast to the generic multi-modal fusion approach of averaging or maximizing operations, we introduce a gate mergence (GM) mechanism to automatically learn the weights of different modalities across locations, enhancing the task-related information while suppressing the irrelevant information. As such, the feature maps of all the input modalities at each down-sampling level, i.e., at multiple scales, are integrated via the GM module. In addition, the adversarial loss, the pixel-wise loss, and the gradient difference loss (GDL) are all applied to train the network to produce the desired modality accurately. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art multi-modal image synthesis methods.

NeurIPS Conference 2022 Conference Paper

Multi-Sample Training for Neural Image Compression

  • Tongda Xu
  • Yan Wang
  • Dailan He
  • Chenjian Gao
  • Han Gao
  • Kunzan Liu
  • Hongwei Qin

This paper considers the problem of lossy neural image compression (NIC). Current state-of-the-art (SOTA) methods adopt a uniform posterior to approximate quantization noise, and a single-sample pathwise estimator to approximate the gradient of the evidence lower bound (ELBO). In this paper, we propose to train NIC with a multiple-sample importance weighted autoencoder (IWAE) target, which is tighter than the ELBO and converges to the log likelihood as the sample size increases. First, we identify that the uniform posterior of NIC has special properties, which affect the variance and bias of pathwise and score function estimators of the IWAE target. Moreover, we provide insights on a commonly adopted trick in NIC from a gradient-variance perspective. Based on these analyses, we further propose multiple-sample NIC (MS-NIC), an enhanced IWAE target for NIC. Experimental results demonstrate that it improves SOTA NIC methods. Our MS-NIC is plug-and-play, and can be easily extended to neural video compression.
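
The "tighter than the ELBO" claim rests on Jensen's inequality: for importance weights w_k = p(x, z_k)/q(z_k|x), the K-sample bound log((1/K) Σ_k w_k) is never below the average of the log-weights. A tiny sketch with invented weights (not the paper's estimator) makes the ordering concrete:

```python
import math

def elbo_estimate(weights):
    """Average of log importance weights (single-sample / ELBO style)."""
    return sum(math.log(w) for w in weights) / len(weights)

def iwae_estimate(weights):
    """Log of the average importance weight (K-sample IWAE bound)."""
    return math.log(sum(weights) / len(weights))
```

For any positive weights, `iwae_estimate(w) >= elbo_estimate(w)`, with equality only when all weights are identical; that gap is what the multi-sample target exploits.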

AAAI Conference 2022 System Paper

MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers

  • Yihuai Lan
  • Lei Wang
  • Qiyuan Zhang
  • Yunshi Lan
  • Bing Tian Dai
  • Yan Wang
  • Dongxiang Zhang
  • Ee-Peng Lim

While Math Word Problem (MWP) solving has emerged as a popular field of study and made great progress in recent years, most existing methods are benchmarked solely on one or two datasets and implemented with different configurations. In this paper, we introduce the first open-source library for solving MWPs, called MWPToolkit, which provides a unified, comprehensive, and extensible framework for research purposes. Specifically, we deploy 17 deep learning-based MWP solvers and 6 MWP datasets in our toolkit. These MWP solvers are advanced models for MWP solving, covering the categories of Seq2seq, Seq2Tree, Graph2Tree, and pre-trained language models. These MWP datasets are popular datasets that are commonly used as benchmarks in existing work. Our toolkit features highly modularized and reusable components, which can help researchers quickly get started and develop their own models. We have released the code and documentation of MWPToolkit at https://github.com/LYH-YF/MWPToolkit.

NeurIPS Conference 2022 Conference Paper

Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera

  • Hongrui Cai
  • Wanquan Feng
  • Xuetao Feng
  • Yan Wang
  • Juyong Zhang

We propose Neural-DynamicReconstruction (NDR), a template-free method to recover high-fidelity geometry and motions of a dynamic scene from a monocular RGB-D camera. In NDR, we adopt the neural implicit function for surface representation and rendering such that the captured color and depth can be fully utilized to jointly optimize the surface and deformations. To represent and constrain the non-rigid deformations, we propose a novel neural invertible deforming network such that the cycle consistency between any two frames is automatically satisfied. Considering that the surface topology of a dynamic scene might change over time, we employ a topology-aware strategy to construct the topology-variant correspondence for the fused frames. NDR also further refines the camera poses in a global optimization manner. Experiments on public datasets and our collected dataset demonstrate that NDR outperforms existing monocular dynamic reconstruction methods.

TIST Journal 2022 Journal Article

Toward Scalable and Privacy-preserving Deep Neural Network via Algorithmic-Cryptographic Co-design

  • Jun Zhou
  • Longfei Zheng
  • Chaochao Chen
  • Yan Wang
  • Xiaolin Zheng
  • Bingzhe Wu
  • Cen Chen
  • Li Wang

Deep Neural Networks (DNNs) have achieved remarkable progress in various real-world applications, especially when abundant training data are provided. However, data isolation has become a serious problem currently. Existing works build privacy-preserving DNN models from either algorithmic perspective or cryptographic perspective. The former mainly splits the DNN computation graph between data holders or between data holders and server, which demonstrates good scalability but suffers from accuracy loss and potential privacy risks. In contrast, the latter leverages time-consuming cryptographic techniques, which has strong privacy guarantee but poor scalability. In this article, we propose SPNN—a Scalable and Privacy-preserving deep Neural Network learning framework, from an algorithmic-cryptographic co-perspective. From algorithmic perspective, we split the computation graph of DNN models into two parts, i.e., the private-data-related computations that are performed by data holders and the rest heavy computations that are delegated to a semi-honest server with high computation ability. From cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption, for the isolated data holders to conduct private-data-related computations privately and cooperatively. Furthermore, we implement SPNN in a decentralized setting and introduce user-friendly APIs. Experimental results conducted on real-world datasets demonstrate the superiority of our proposed SPNN.

AAAI Conference 2022 Conference Paper

Weakly-Supervised Salient Object Detection Using Point Supervision

  • Shuyong Gao
  • Wei Zhang
  • Yan Wang
  • Qianyu Guo
  • Chenglong Zhang
  • Yangji He
  • Wenqiang Zhang

Current state-of-the-art saliency detection models rely heavily on large datasets of accurate pixel-wise annotations, but manually labeling pixels is time-consuming and labor-intensive. Some weakly supervised methods have been developed to alleviate this problem, using image labels, bounding-box labels, and scribble labels, while point labels have not yet been explored in this field. In this paper, we propose a novel weakly-supervised salient object detection method using point supervision. To infer the saliency map, we first design an adaptive masked flood filling algorithm to generate pseudo labels. Then we develop a transformer-based point-supervised saliency detection model to produce the first round of saliency maps. However, due to the sparseness of the labels, the weakly supervised model tends to degenerate into a general foreground detection model. To address this issue, we propose a Non-Salient Suppression (NSS) method to optimize the erroneous saliency maps generated in the first round and leverage them for the second round of training. Moreover, we build a new point-supervised dataset (P-DUTS) by relabeling the DUTS dataset. In P-DUTS, there is only one labeled point for each salient object. Comprehensive experiments on the five largest benchmark datasets demonstrate that our method outperforms previous state-of-the-art methods trained with stronger supervision and even surpasses several fully supervised state-of-the-art models. The code is available at https://github.com/shuyonggao/PSOD.

IJCAI Conference 2021 Conference Paper

An Adaptive News-Driven Method for CVaR-sensitive Online Portfolio Selection in Non-Stationary Financial Markets

  • Qianqiao Liang
  • Mengying Zhu
  • Xiaolin Zheng
  • Yan Wang

CVaR-sensitive online portfolio selection (CS-OLPS) is becoming increasingly important for investors because of its effectiveness in minimizing conditional value at risk (CVaR) and controlling extreme losses. However, the non-stationary nature of financial markets makes it very difficult to address the CS-OLPS problem effectively. To address the CS-OLPS problem in non-stationary markets, we propose an effective news-driven method, named CAND, which adaptively exploits news to determine the adjustment tendency and adjustment scale for tracking the dynamic optimal portfolio with minimal CVaR in each trading round. In addition, we devise a filtering mechanism to reduce the errors caused by noisy news, further improving CAND's effectiveness. We rigorously prove a sub-linear regret bound for CAND. Extensive experiments on three real-world datasets demonstrate CAND's superiority over state-of-the-art portfolio methods in terms of returns and risks.
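
The CVaR objective referred to above has a simple empirical form: the mean of the worst α-fraction of losses. A minimal sketch of that standard estimator (illustrative only; not the paper's method, and the function name is invented):

```python
def empirical_cvar(losses, alpha=0.05):
    """Empirical conditional value at risk at level alpha.

    Sorts the observed losses and averages the worst
    ceil-free alpha fraction of them (at least one sample),
    i.e. the expected loss given that the loss falls in the
    worst alpha tail -- the quantity a CVaR-sensitive
    portfolio selector tries to keep small.
    """
    k = max(1, int(len(losses) * alpha))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k
```

Unlike value at risk (the tail cutoff itself), this averages over the tail, so a single extreme loss moves the estimate, which is why CVaR is preferred for controlling extreme losses.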

IJCAI Conference 2021 Conference Paper

Cross-Domain Recommendation: Challenges, Progress, and Prospects

  • Feng Zhu
  • Yan Wang
  • Chaochao Chen
  • Jun Zhou
  • Longfei Li
  • Guanfeng Liu

To address the long-standing data sparsity problem in recommender systems (RSs), cross-domain recommendation (CDR) has been proposed to leverage the relatively richer information from a richer domain to improve the recommendation performance in a sparser domain. Although CDR has been extensively studied in recent years, there is a lack of a systematic review of the existing CDR approaches. To fill this gap, in this paper, we provide a comprehensive review of existing CDR approaches, including challenges, research progress, and prospects. Specifically, we first summarize existing CDR approaches into four types, including single-target CDR, single-target multi-domain recommendation (MDR), dual-target CDR, and multi-target CDR. We then present the definitions and challenges of these CDR approaches. Next, we propose a full-view categorization and new taxonomies on these approaches and report their research progress in detail. In the end, we share several promising prospects in CDR.

IJCAI Conference 2021 Conference Paper

Graph Learning based Recommender Systems: A Review

  • Shoujin Wang
  • Liang Hu
  • Yan Wang
  • Xiangnan He
  • Quan Z. Sheng
  • Mehmet A. Orgun
  • Longbing Cao
  • Francesco Ricci

Recent years have witnessed the fast development of the emerging topic of Graph Learning based Recommender Systems (GLRS). GLRS mainly employ advanced graph learning approaches to model users’ preferences and intentions as well as items’ characteristics and popularity for Recommender Systems (RS). Differently from other approaches, including content based filtering and collaborative filtering, GLRS are built on graphs where the important objects, e.g., users, items, and attributes, are either explicitly or implicitly connected. With the rapid development of graph learning techniques, exploring and exploiting homogeneous or heterogeneous relations in graphs is a promising direction for building more effective RS. In this paper, we provide a systematic review of GLRS, by discussing how they extract knowledge from graphs to improve the accuracy, reliability and explainability of the recommendations. First, we characterize and formalize GLRS, and then summarize and categorize the key challenges and main progress in this novel research area.

AAAI Conference 2021 Conference Paper

Sketch and Customize: A Counterfactual Story Generator

  • Changying Hao
  • Liang Pang
  • Yanyan Lan
  • Yan Wang
  • Jiafeng Guo
  • Xueqi Cheng

Recent text generation models easily generate relevant and fluent text for a given input, but lack causal reasoning ability when some parts of the given text are changed. Counterfactual story rewriting is a recently proposed task to test the causal reasoning ability of text generation models, which requires a model to predict the corresponding story ending when the condition is modified to a counterfactual one. Previous works have shown that the traditional sequence-to-sequence model cannot handle this problem well, as it often captures spurious correlations between the original and counterfactual endings, instead of the causal relations between conditions and endings. To address this issue, we propose a sketch-and-customize generation model guided by the causality implicated in the conditions and endings. In the sketch stage, a skeleton is extracted from the original ending by removing the words that conflict with the counterfactual condition. In the customize stage, a generation model is used to fill proper words into the skeleton under the guidance of the counterfactual condition. In this way, the obtained counterfactual ending is both relevant to the original ending and consistent with the counterfactual condition. Experimental results show that the proposed model generates much better endings than the traditional sequence-to-sequence model.

IJCAI Conference 2020 Conference Paper

A Graphical and Attentional Framework for Dual-Target Cross-Domain Recommendation

  • Feng Zhu
  • Yan Wang
  • Chaochao Chen
  • Guanfeng Liu
  • Xiaolin Zheng

The conventional single-target Cross-Domain Recommendation (CDR) only improves the recommendation accuracy on a target domain with the help of a source domain (with relatively richer information). In contrast, the novel dual-target CDR has been proposed to improve the recommendation accuracies on both domains simultaneously. However, dual-target CDR faces two new challenges: (1) how to generate more representative user and item embeddings, and (2) how to effectively optimize the user/item embeddings on each domain. To address these challenges, in this paper, we propose a graphical and attentional framework, called GA-DTCDR. In GA-DTCDR, we first construct two separate heterogeneous graphs based on the rating and content information from two domains to generate more representative user and item embeddings. Then, we propose an element-wise attention mechanism to effectively combine the embeddings of common users learned from both domains. Both steps significantly enhance the quality of user and item embeddings and thus improve the recommendation accuracy on each domain. Extensive experiments conducted on four real-world datasets demonstrate that GA-DTCDR significantly outperforms the state-of-the-art approaches.

AAAI Conference 2020 Conference Paper

Intention Nets: Psychology-Inspired User Choice Behavior Modeling for Next-Basket Prediction

  • Shoujin Wang
  • Liang Hu
  • Yan Wang
  • Quan Z. Sheng
  • Mehmet Orgun
  • Longbing Cao

Human behaviors are complex, and are often observed as a sequence of heterogeneous actions. In this paper, we take user choices for shopping baskets as a typical case to study the complexity of user behaviors. Most existing approaches model user behaviors in a mechanical way, namely treating a user action sequence as homogeneous sequential data, such as hourly temperatures, which fails to consider the complexity of user behaviors. In fact, users’ choices are driven by certain underlying intentions (e.g., feeding the baby or relieving pain) according to psychological theories. Moreover, the durations of the intentions that drive user actions are quite different; some of them may be persistent while others may be transient. Based on psychological theories, we develop a hierarchical framework to describe the goal, intentions and action sequences, on top of which we design Intention Nets (IntNet). In IntNet, multiple Action Chain Nets are constructed to model the user actions driven by different intentions, and a specially designed Persistent-Transient Intention Unit models the different intention durations. We apply IntNet to next-basket prediction, a recent challenging task in recommender systems. Extensive experiments on real-world datasets show the superiority of our psychology-inspired model IntNet over the state-of-the-art approaches.

IJCAI Conference 2020 Conference Paper

Intention2Basket: A Neural Intention-driven Approach for Dynamic Next-basket Planning

  • Shoujin Wang
  • Liang Hu
  • Yan Wang
  • Quan Z. Sheng
  • Mehmet Orgun
  • Longbing Cao

User purchase behaviours are complex and dynamic, and are usually observed as multiple choice actions across a sequence of shopping baskets. Most existing next-basket prediction approaches model user actions as homogeneous sequence data without considering complex and heterogeneous user intentions, impeding deep understanding of user behaviours from the perspective of human inner drivers and thus reducing prediction performance. Psychological theories have indicated that user actions are essentially driven by certain underlying intentions (e.g., diet and entertainment). Moreover, different intentions may influence each other, while different choices usually have different utilities for accomplishing an intention. Inspired by such psychological insights, we formalize next-basket prediction as an Intention Recognition, Modelling and Accomplishing problem and further design the Intention2Basket (Int2Ba for short) model. In Int2Ba, an Intention Recognizer, a Coupled Intention Chain Net, and a Dynamic Basket Planner are specifically designed to respectively recognize, model and accomplish the heterogeneous intentions behind a sequence of baskets to better plan the next basket. Extensive experiments on real-world datasets show the superiority of Int2Ba over the state-of-the-art approaches.

IJCAI Conference 2020 Conference Paper

Online Portfolio Selection with Cardinality Constraint and Transaction Costs based on Contextual Bandit

  • Mengying Zhu
  • Xiaolin Zheng
  • Yan Wang
  • Qianqiao Liang
  • Wenfang Zhang

Online portfolio selection (OLPS) is a fundamental and challenging problem in financial engineering, which faces two practical constraints during real trading, i.e., a cardinality constraint and non-zero transaction costs. In order to achieve greater feasibility in financial markets, in this paper, we propose a novel online portfolio selection method named LExp4.TCGP, with a theoretical guarantee of sublinear regret, to address the OLPS problem with the two constraints. In addition, we incorporate side information into our method based on contextual bandits, which further improves its effectiveness. Extensive experiments conducted on four representative real-world datasets demonstrate that our method significantly outperforms the state-of-the-art methods when a cardinality constraint and non-zero transaction costs co-exist.

NeurIPS Conference 2020 Conference Paper

Rotated Binary Neural Network

  • Mingbao Lin
  • Rongrong Ji
  • Zihan Xu
  • Baochang Zhang
  • Yan Wang
  • Yongjian Wu
  • Feiyue Huang
  • Chia-Wen Lin

Binary Neural Network (BNN) shows its predominance in reducing the complexity of deep neural networks, but it suffers severe performance degradation. One of the major impediments is the large quantization error between the full-precision weight vector and its binary vector. Previous works focus on compensating for the norm gap while leaving the angular bias hardly touched. In this paper, for the first time, we explore the influence of angular bias on the quantization error and then introduce a Rotated Binary Neural Network (RBNN), which considers the angle alignment between the full-precision weight vector and its binarized version. At the beginning of each training epoch, we propose to rotate the full-precision weight vector towards its binary vector to reduce the angular bias. To avoid the high complexity of learning a large rotation matrix, we further introduce a bi-rotation formulation that learns two smaller rotation matrices. In the training stage, we devise an adjustable rotated weight vector for binarization to escape potential local optima. Our rotation leads to around 50% weight flips, which maximizes the information gain. Finally, we propose a training-aware approximation of the sign function for backward gradient computation. Experiments on CIFAR-10 and ImageNet demonstrate the superiority of RBNN over many state-of-the-art methods. Our source code, experimental settings, training logs and binary models are available at https://github.com/lmbxmu/RBNN.
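The norm-gap versus angular-bias distinction in the abstract can be shown with a small numeric sketch (a toy illustration under standard sign binarization with a per-vector scale; it is not the paper's training procedure, and the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                 # full-precision weight vector
b = np.sign(w)                         # its binary vector in {-1, +1}^n

# Angular bias: the angle between w and its binarization.
cos_theta = w @ b / (np.linalg.norm(w) * np.linalg.norm(b))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

# With the usual scale alpha = mean(|w|), the squared quantization error
# ||w - alpha*b||^2 equals ||w||^2 - n*alpha^2, so compensating the norm
# alone cannot remove the angle-dependent part -- hence the rotation idea.
alpha = np.abs(w).mean()
err = np.linalg.norm(w - alpha * b) ** 2
print(theta, err)
```

Rotating `w` so that it aligns better with `sign(w)` shrinks `theta`, and with it the residual quantization error.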

NeurIPS Conference 2020 Conference Paper

Wasserstein Distances for Stereo Disparity Estimation

  • Divyansh Garg
  • Yan Wang
  • Bharath Hariharan
  • Mark Campbell
  • Kilian Q. Weinberger
  • Wei-Lun Chao

Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values. This leads to inaccurate results when the true depth or disparity does not match any of these values. The fact that this distribution is usually learned indirectly through a regression loss causes further problems in ambiguous regions around object boundaries. We address these issues using a new neural network architecture that is capable of outputting arbitrary depth values, and a new loss function that is derived from the Wasserstein distance between the true and the predicted distributions. We validate our approach on a variety of tasks, including stereo disparity and depth estimation, and the downstream 3D object detection. Our approach drastically reduces the error in ambiguous regions, especially around object boundaries that greatly affect the localization of objects in 3D, achieving the state-of-the-art in 3D object detection for autonomous driving.
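When the ground truth is a single true disparity (a Dirac mass) and the prediction is a discrete distribution over candidate disparities, the 1-Wasserstein distance has a simple closed form in 1D: the probability-weighted absolute deviation. A minimal sketch (the function name and toy values are illustrative, not the paper's full loss or architecture):

```python
import numpy as np

def w1_to_dirac(disparities, probs, d_true):
    """1D 1-Wasserstein distance between a discrete distribution over
    candidate disparities and a Dirac mass at the true disparity."""
    return float(np.sum(probs * np.abs(disparities - d_true)))

d = np.arange(0.0, 5.0)                  # candidate disparity values 0..4
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # predicted distribution
print(w1_to_dirac(d, p, 2.0))            # 0.8
```

Unlike a per-bin classification loss, this quantity decreases smoothly as probability mass moves toward the true value, which is what makes it attractive in ambiguous regions.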

AAAI Conference 2019 Conference Paper

Better Fine-Tuning via Instance Weighting for Text Classification

  • Zhi Wang
  • Wei Bi
  • Yan Wang
  • Xiaojiang Liu

Transfer learning for deep neural networks has achieved great success in many text classification applications. A simple yet effective transfer learning method is to fine-tune the pretrained model parameters. Previous fine-tuning works mainly focus on the pre-training stage and investigate how to pretrain a set of parameters that can help the target task most. In this paper, we propose an Instance Weighting based Finetuning (IW-Fit) method, which revises the fine-tuning stage to improve the final performance on the target domain. IW-Fit adjusts instance weights at each fine-tuning epoch dynamically to accomplish two goals: 1) identify and learn the specific knowledge of the target domain effectively; 2) well preserve the shared knowledge between the source and the target domains. The designed instance weighting metrics used in IW-Fit are model-agnostic, which are easy to implement for general DNN-based classifiers. Experimental results show that IW-Fit can consistently improve the classification accuracy on the target domain.
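The abstract does not spell out the weighting metrics, but the mechanics of dynamic per-epoch instance weighting can be sketched as follows (a hypothetical rule that down-weights instances the current model fits poorly; `reweight` and the temperature are assumptions for illustration, not IW-Fit's actual metrics):

```python
import numpy as np

def reweight(losses, temperature=1.0):
    """Hypothetical per-epoch instance weighting: softly down-weight
    high-loss instances, renormalized so the average weight stays 1."""
    w = np.exp(-np.asarray(losses, dtype=float) / temperature)
    return w * len(w) / w.sum()

# Per-instance losses observed at the end of a fine-tuning epoch.
epoch_losses = [0.1, 0.5, 2.0]
print(reweight(epoch_losses))
```

In a fine-tuning loop, such weights would multiply each instance's loss term in the next epoch and be recomputed every epoch, matching the dynamic scheme the abstract describes.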

IJCAI Conference 2019 Conference Paper

Modeling Multi-Purpose Sessions for Next-Item Recommendations via Mixture-Channel Purpose Routing Networks

  • Shoujin Wang
  • Liang Hu
  • Yan Wang
  • Quan Z. Sheng
  • Mehmet Orgun
  • Longbing Cao

A session-based recommender system (SBRS) suggests the next item by modeling the dependencies between items in a session. Most existing SBRSs assume the items inside a session are associated with one (implicit) purpose. However, this may not always be true in reality, and a session may often consist of multiple subsets of items for different purposes (e.g., breakfast and decoration). Specifically, items (e.g., bread and milk) in a subset have strong purpose-specific dependencies, whereas items (e.g., bread and vase) from different subsets have much weaker or even no dependencies due to the difference of purposes. Therefore, we propose a mixture-channel model to accommodate the multi-purpose item subsets for more precisely representing a session. Filling gaps in existing SBRSs, this model recommends more diverse items to satisfy different purposes. Accordingly, we design effective mixture-channel purpose routing networks (MCPRN) with a purpose routing network to detect the purposes of each item and assign it into the corresponding channels. Moreover, a purpose-specific recurrent network is devised to model the dependencies between items within each channel for a specific purpose. The experimental results show the superiority of MCPRN over the state-of-the-art methods in terms of both recommendation accuracy and diversity.

IJCAI Conference 2019 Conference Paper

ProNE: Fast and Scalable Network Representation Learning

  • Jie Zhang
  • Yuxiao Dong
  • Yan Wang
  • Jie Tang
  • Ming Ding

Recent advances in network embedding have revolutionized the field of graph and network mining. However, (pre-)training embeddings for very large-scale networks is computationally challenging for most existing methods. In this work, we present ProNE---a fast, scalable, and effective model, whose single-thread version is 10--400x faster than efficient network embedding benchmarks with 20 threads, including LINE, DeepWalk, node2vec, GraRep, and HOPE. As a concrete example, single-thread ProNE requires only 29 hours to embed a network of hundreds of millions of nodes, whereas LINE takes weeks and DeepWalk months using 20 threads. To achieve this, ProNE first initializes network embeddings efficiently by formulating the task as sparse matrix factorization. The second step of ProNE is to enhance the embeddings by propagating them in the spectrally modulated space. Extensive experiments on networks of various scales and types demonstrate that ProNE achieves both effectiveness and significant efficiency superiority when compared to the aforementioned baselines. In addition, ProNE's embedding enhancement step can also be generalized to improve other models at speed, e.g., offering >10% relative gains for the baselines used.
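The propagation step can be illustrated with a single symmetric-normalized smoothing pass (a simplified stand-in for ProNE's spectrally modulated propagation, which uses a band-pass filter rather than plain smoothing; the toy graph and function name are illustrative):

```python
import numpy as np

def propagate(adj, emb):
    """One pass of D^{-1/2} A D^{-1/2} @ emb: each node's embedding is
    replaced by a degree-normalized average of its neighbors' embeddings."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm_adj @ emb

# Toy star graph: node 0 connected to nodes 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
emb = np.eye(3)          # toy initial embeddings, one-hot per node
print(propagate(adj, emb))
```

ProNE applies a Chebyshev-expanded spectral filter instead of this plain one-hop averaging, but the data flow (normalize the adjacency, multiply it into the embedding matrix) is the same shape.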

IJCAI Conference 2019 Conference Paper

Sequential Recommender Systems: Challenges, Progress and Prospects

  • Shoujin Wang
  • Liang Hu
  • Yan Wang
  • Longbing Cao
  • Quan Z. Sheng
  • Mehmet Orgun

The emerging topic of sequential recommender systems (SRSs) has attracted increasing attention in recent years. Different from the conventional recommender systems (RSs) including collaborative filtering and content-based filtering, SRSs try to understand and model the sequential user behaviors, the interactions between users and items, and the evolution of users’ preferences and item popularity over time. SRSs involve the above aspects for more precise characterization of user contexts, intent and goals, and item consumption trend, leading to more accurate, customized and dynamic recommendations. In this paper, we provide a systematic review on SRSs. We first present the characteristics of SRSs, and then summarize and categorize the key challenges in this research area, followed by the corresponding research progress consisting of the most recent and representative developments on this topic. Finally, we discuss the important research directions in this vibrant area.

NeurIPS Conference 2019 Conference Paper

Variational Structured Semantic Inference for Diverse Image Captioning

  • Fuhai Chen
  • Rongrong Ji
  • Jiayi Ji
  • Xiaoshuai Sun
  • Baochang Zhang
  • Xuri Ge
  • Yongjian Wu
  • Feiyue Huang

Despite the exciting progress in image captioning, generating diverse captions for a given image remains an open problem. Existing methods typically apply generative models such as the Variational Auto-Encoder to diversify the captions, which however neglect two key factors of diverse expression, i.e., lexical diversity and syntactic diversity. To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema. VSSI-cap mainly innovates in a novel structure, i.e., a Variational Multi-modal Inferring tree (termed VarMI-tree). In particular, conditioned on the visual-textual features from the encoder, the VarMI-tree models the lexical and syntactic diversities by inferring their latent variables (with variations) in an approximate posterior inference guided by a visual semantic prior. Then, a reconstruction loss and the posterior-prior KL-divergence are jointly estimated to optimize the VSSI-cap model. Finally, diverse captions are generated from the visual features and the latent variables of this structured encoder-inferer-decoder model. Experiments on the benchmark dataset show that the proposed VSSI-cap achieves significant improvements over the state-of-the-art methods.

YNIMG Journal 2018 Journal Article

3D conditional generative adversarial networks for high-quality PET image estimation at low dose

  • Yan Wang
  • Biting Yu
  • Lei Wang
  • Chen Zu
  • David S. Lalush
  • Weili Lin
  • Xi Wu
  • Jiliu Zhou

Positron emission tomography (PET) is a widely used imaging modality, providing insight into both the biochemical and physiological processes of the human body. Usually, a full-dose radioactive tracer is required to obtain high-quality PET images for clinical needs. This inevitably raises concerns about potential health hazards. On the other hand, dose reduction may increase the noise in the reconstructed PET images, which impacts the image quality to a certain extent. In this paper, in order to reduce the radiation exposure while maintaining high PET image quality, we propose a novel method based on 3D conditional generative adversarial networks (3D c-GANs) to estimate high-quality full-dose PET images from low-dose ones. Generative adversarial networks (GANs) include a generator network and a discriminator network which are trained simultaneously with the goal of one beating the other. Similar to GANs, in the proposed 3D c-GANs, we condition the model on an input low-dose PET image and generate a corresponding output full-dose PET image. Specifically, to capture the shared underlying information between the low-dose and full-dose PET images, a 3D U-net-like deep architecture which can combine hierarchical features via skip connections is designed as the generator network to synthesize the full-dose image. In order to guarantee that the synthesized PET image is close to the real one, we take into account the estimation error loss in addition to the discriminator feedback to train the generator network. Furthermore, a concatenated 3D c-GANs based progressive refinement scheme is also proposed to further improve the quality of the estimated images. Validation was done on a real human brain dataset including both normal subjects and subjects diagnosed with mild cognitive impairment (MCI).
Experimental results show that our proposed 3D c-GANs method outperforms the benchmark methods and achieves much better performance than the state-of-the-art methods in both qualitative and quantitative measures.

IJCAI Conference 2018 Conference Paper

A Deep Framework for Cross-Domain and Cross-System Recommendations

  • Feng Zhu
  • Yan Wang
  • Chaochao Chen
  • Guanfeng Liu
  • Mehmet Orgun
  • Jia Wu

Cross-Domain Recommendation (CDR) and Cross-System Recommendation (CSR) are two promising solutions to address the long-standing data sparsity problem in recommender systems. They leverage the relatively richer information, e.g., ratings, from the source domain or system to improve the recommendation accuracy in the target domain or system. Therefore, finding an accurate mapping of the latent factors across domains or systems is crucial to enhancing recommendation accuracy. However, this is a very challenging task because of the complex relationships between the latent factors of the source and target domains or systems. To this end, in this paper, we propose a Deep framework for both Cross-Domain and Cross-System Recommendations, called DCDCSR, based on Matrix Factorization (MF) models and a fully connected Deep Neural Network (DNN). Specifically, DCDCSR first employs the MF models to generate user and item latent factors and then employs the DNN to map the latent factors across domains or systems. More importantly, we take into account the rating sparsity degrees of individual users and items in different domains or systems and use them to guide the DNN training process to utilize the rating data more effectively. Extensive experiments conducted on three real-world datasets demonstrate that the DCDCSR framework outperforms the state-of-the-art CDR and CSR approaches in terms of recommendation accuracy.

IJCAI Conference 2018 Conference Paper

Recurrent Collaborative Filtering for Unifying General and Sequential Recommender

  • Disheng Dong
  • Xiaolin Zheng
  • Ruixun Zhang
  • Yan Wang

General recommenders and sequential recommenders are two commonly applied modeling paradigms for recommendation tasks. A general recommender focuses on modeling the general user preferences, ignoring the sequential patterns in user behaviors, whereas a sequential recommender focuses on exploring the item-to-item sequential relations, failing to model the global user preferences. Better recommendation performance has recently been achieved by combining the two. However, previous approaches are unable to solve both tasks in a unified way and cannot capture the whole historical sequential information. In this paper, we propose a recommendation model named Recurrent Collaborative Filtering (RCF), which unifies both paradigms within a single model. Specifically, we combine a recurrent neural network (the sequential recommender part) and a matrix factorization model (the general recommender part) in a multi-task learning framework, where we perform joint optimization with shared model parameters, enforcing the two parts to regularize each other. Furthermore, we empirically demonstrate on the MovieLens and Netflix datasets that our model outperforms the state-of-the-art methods on the tasks of both sequential and general recommendation.

EAAI Journal 2018 Journal Article

Source term estimation of hazardous material releases using hybrid genetic algorithm with composite cost functions

  • Yan Wang
  • Hong Huang
  • Lida Huang
  • Xiaole Zhang

Source term estimation (STE) of atmospheric dispersion plays an important role in public safety, environmental protection and many other application fields. In this paper, several new composite cost functions for STE using a hybrid genetic algorithm are proposed and compared using the Nemenyi test based on 68 STE tasks from the Prairie Grass field experiment. Results show that one of the new composite cost functions, named WSD, has outstanding performance in estimating both source location and emission rate. Then the patterns in STE results using different cost functions are analyzed based on the 68 tasks mentioned above, which provides further insights into what to expect from STE. Finally, the relationship between composite cost functions and multi-objective optimization is analyzed to facilitate the understanding of composite cost functions. To summarize, composite cost functions such as WSD have the potential to achieve a better balance between the sensitivity and robustness of cost functions applied in STE, providing the most accurate estimates. Statistical algorithm comparison techniques like the Nemenyi test can help us better understand the characteristics and performance of specific settings in STE methods.

TCS Journal 2017 Journal Article

Edge-independent spanning trees in augmented cubes

  • Yan Wang
  • Hong Shen
  • Jianxi Fan

Edge-independent spanning trees (EISTs) have important applications in networks such as reliable communication protocols, one-to-all broadcasting, and secure message distribution, thus their designs in several classes of networks have been widely investigated. The n-dimensional augmented cube (AQ_n) is an important variant of the n-dimensional hypercube. It is (2n−1)-regular, (2n−1)-connected (n ≠ 3), vertex-symmetric and has diameter ⌈n/2⌉. In this paper, by proposing an O(N log N) algorithm that constructs 2n−1 EISTs in AQ_n, where N is the number of nodes in AQ_n, we solve the EIST problem for this class of graphs. Since AQ_n is (2n−1)-regular, the result is optimal with respect to the number of EISTs constructed.
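The (2n−1)-regularity behind the optimality claim follows directly from AQ_n's adjacency rule: two vertices are adjacent if they differ in exactly one bit, or in an entire suffix of bits, and the two rules coincide for a length-1 suffix. A small sketch verifying the degree (the function name is illustrative):

```python
def augmented_cube_neighbors(u):
    """Neighbors of vertex u (a bit tuple) in the augmented cube AQ_n:
    flip one bit i, or flip the whole suffix of bits starting at i.
    The two rules coincide when the suffix has length 1, so deg = 2n-1."""
    n = len(u)
    nbrs = set()
    for i in range(n):
        nbrs.add(u[:i] + (1 - u[i],) + u[i + 1:])        # hypercube edge
        nbrs.add(u[:i] + tuple(1 - b for b in u[i:]))    # complement edge
    return nbrs

print(len(augmented_cube_neighbors((0, 0, 0, 0))))   # 2*4 - 1 = 7
```

Since every vertex has exactly 2n−1 edge-disjoint incident edges, no more than 2n−1 edge-independent spanning trees can share a root, which is why constructing that many is optimal.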

AAMAS Conference 2016 Conference Paper

A Novel Incentive Mechanism for Truthful Performance Assessments of Cloud Services (Extended Abstract)

  • Lie Qu
  • Yan Wang
  • Mehmet Orgun

The performance evaluation of cloud services usually relies on continual assessments from cloud users. In order to elicit continual and truthful assessments, an effective incentive mechanism should allow users to provide uncertain assessments when they are not sure about the real performance of cloud services, rather than providing untruthful or arbitrary assessments. This paper proposes a novel uncertain-assessment-aware incentive mechanism. Under this mechanism, a rational user not only has sufficient incentives to continually provide truthful assessments, but also prefers providing uncertain assessments over untruthful or arbitrary ones, since uncertain assessments bring greater benefits.

NeurIPS Conference 2016 Conference Paper

A Powerful Generative Model Using Random Weights for the Deep Image Representation

  • Kun He
  • Yan Wang
  • John Hopcroft

To what extent is the success of deep visualization due to training? Could we do deep visualization using untrained, random weight networks? To address this issue, we explore new and powerful generative models for three popular deep visualization tasks using untrained, random weight convolutional neural networks. First, we invert representations in feature spaces and reconstruct images from white noise inputs. The reconstruction quality is statistically higher than that of the same method applied to well-trained networks with the same architecture. Next, we synthesize textures using scaled correlations of representations in multiple layers, and our results are almost indistinguishable from the original natural textures and from textures synthesized with a trained network. Third, by recasting the content of an image in the style of various artworks, we create artistic images with high perceptual quality, highly competitive with the prior work of Gatys et al. on pretrained networks. To our knowledge, this is the first demonstration of image representations using untrained deep neural networks. Our work provides a new and fascinating tool to study the representations of deep network architectures and offers new insights into deep visualization. It may possibly lead to a way to compare network architectures without training.

AAAI Conference 2016 Conference Paper

Capturing Semantic Correlation for Item Recommendation in Tagging Systems

  • Chaochao Chen
  • Xiaolin Zheng
  • Yan Wang
  • Fuxing Hong
  • Deren Chen

The popularity of tagging systems provides a great opportunity to improve the performance of item recommendation. Although existing approaches use topic modeling to mine the semantic information of items by grouping the tags labelled for items, they overlook an important property that tags link users and items as a bridge. Thus these methods cannot deal with the data sparsity without commonly rated items (DS-WO-CRI) problem, limiting their recommendation performance. Towards solving this challenging problem, we propose a novel tag and rating based collaborative filtering (CF) model for item recommendation, which first uses topic modeling to mine the semantic information of tags for each user and for each item respectively, and then incorporates the semantic information into matrix factorization to factorize rating information and to capture the bridging feature of tags and ratings between users and items. As a result, our model captures the semantic correlation between users and items, and is able to greatly improve recommendation performance, especially in DS-WO-CRI situations. Experiments conducted on two popular real-world datasets demonstrate that our proposed model significantly outperforms the conventional CF approach, the state-of-the-art social relation based CF approach, and the state-of-the-art topic modeling based CF approaches in terms of both precision and recall, and it is an effective approach to the DS-WO-CRI problem.

JBHI Journal 2016 Journal Article

Personalized Multilayer Daily Life Profiling Through Context Enabled Activity Classification and Motion Reconstruction: An Integrated System Approach

  • James Y. Xu
  • Yan Wang
  • Mick Barrett
  • Bruce Dobkin
  • Greg J. Pottie
  • William J. Kaiser

Profiling the daily activity of a physically disabled person in the community would enable healthcare professionals to monitor the type, quantity, and quality of their patients' compliance with recommendations for exercise, fitness, and practice of skilled movements, as well as enable feedback about performance in real-world situations. Based on our early research in in-community activity profiling, we present in this paper an end-to-end system capable of reporting a patient's daily activity at multiple levels of granularity: 1) at the highest level, information on the location categories a patient is able to visit; 2) within each location category, information on the activities a patient is able to perform; and 3) at the lowest level, motion trajectory, visualization, and metrics computation of each activity. Our methodology is built upon a physical activity prescription model coupled with MEMS inertial sensors and mobile device kits that can be sent to a patient at home. A novel context-guided activity-monitoring concept with categorical location context is used to achieve enhanced classification accuracy and throughput. The methodology is then seamlessly integrated with motion reconstruction and metrics computation to provide comprehensive layered reporting of a patient's daily life. We also present an implementation of the methodology featuring a novel location context detection algorithm using WiFi augmented GPS and overlays, with motion reconstruction and visualization algorithms for practical in-community deployment. Finally, we use a series of experimental field evaluations to confirm the accuracy of the system.

JBHI Journal 2015 Journal Article

Integrated Inertial Sensors and Mobile Computing for Real-Time Cycling Performance Guidance via Pedaling Profile Classification

  • James Y. Xu
  • Xiaomeng Nan
  • Victor Ebken
  • Yan Wang
  • Greg J. Pottie
  • William J. Kaiser

Today, the bicycle is utilized as a daily commute tool, a physical rehabilitation asset, and sporting equipment, prompting studies into the biomechanics of cycling. Of the number of important parameters that affect cycling efficiency, the foot angle profile is one of the most important, as it correlates directly with the effective force applied to the bike. However, there has been no compact and portable solution for measuring the foot angle and for providing the cyclist with real-time feedback, due to a number of difficulties of current tracking and sensing technologies and the myriad types of bikes available. This paper presents a novel sensing and mobile computing system for classifying the foot angle profiles during cycling and for providing real-time guidance to the user to achieve the correct profile. Continuous foot angle tracking is first converted into a discrete problem requiring only recognition of acceleration profiles of the foot, using a single shoe-mounted tri-axial accelerometer during each pedaling cycle. A classification method is then applied to identify the pedaling profile. Finally, a mobile solution is presented to provide real-time signal processing and guidance.

AAAI Conference 2014 Conference Paper

Context-Aware Collaborative Topic Regression with Social Matrix Factorization for Recommender Systems

  • Chaochao Chen
  • Xiaolin Zheng
  • Yan Wang
  • Fuxing Hong
  • Zhen Lin

Online social networking sites have become popular platforms on which users can link with each other and share information, not only basic rating information but also information such as contexts, social relationships, and item contents. However, as far as we know, no existing works systematically combine diverse types of information to build more accurate recommender systems. In this paper, we propose a novel context-aware hierarchical Bayesian method. First, we propose the use of spectral clustering for user-item subgrouping, so that users and items in similar contexts are grouped. We then propose a novel hierarchical Bayesian model that can make predictions for each user-item subgroup; our model incorporates not only topic modeling to mine item content but also social matrix factorization to handle ratings and social relationships. Experiments on an Epinions dataset show that our method significantly improves recommendation performance compared with six categories of state-of-the-art recommendation methods in terms of both prediction accuracy and recall. We have also conducted experiments to study the extent to which ratings, contexts, social relationships, and item contents contribute to recommendation performance in terms of prediction accuracy and recall.

AAAI Conference 2014 Conference Paper

Trust Prediction with Propagation and Similarity Regularization

  • Xiaoming Zheng
  • Yan Wang
  • Mehmet Orgun
  • Youliang Zhong
  • Guanfeng Liu

Online social networks have been used for a variety of rich activities in recent years, such as investigating potential employees and seeking recommendations of high quality services and service providers. In such activities, trust is one of the most critical factors for the decision-making of users. In the literature, the state-of-the-art trust prediction approaches focus on either dispositional trust tendency and propagated trust of the pair-wise trust relationships along a path or the similarity of trust rating values. However, there are other influential factors that should be taken into account, such as the similarity of the trust rating distributions. In addition, tendency, propagated trust and similarity are of different types, as either personal properties or interpersonal properties. But the difference has been neglected in existing models. Therefore, in trust prediction, it is necessary to take all the above factors into consideration in modeling, and process them separately and differently. In this paper we propose a new trust prediction model based on trust decomposition and matrix factorization, considering all the above influential factors and differentiating both personal and interpersonal properties. In this model, we first decompose trust into trust tendency and tendency-reduced trust. Then, based on tendency-reduced trust ratings, matrix factorization with a regularization term is leveraged to predict the tendency-reduced values of missing trust ratings, incorporating both propagated trust and the similarity of users’ rating habits. In the end, the missing trust ratings are composed with predicted tendency-reduced values and trust tendency values. Experiments conducted on a real-world dataset illustrate significant improvement delivered by our approach in trust prediction accuracy over the state-of-the-art approaches.

AAAI Conference 2013 Conference Paper

Supervised Coupled Dictionary Learning with Group Structures for Multi-modal Retrieval

  • Yue Zhuang
  • Yan Wang
  • Fei Wu
  • Yin Zhang
  • Wei Lu

A better similarity mapping function across heterogeneous high-dimensional features is very desirable for many applications involving multi-modal data. In this paper, we introduce coupled dictionary learning (DL) into supervised sparse coding for multi-modal (cross-media) retrieval. We call this Supervised coupled dictionary learning with group structures for Multi-Modal retrieval (SliM2). SliM2 formulates the multi-modal mapping as a constrained dictionary learning problem. By utilizing the intrinsic power of DL to deal with heterogeneous features, SliM2 extends unimodal DL to multi-modal DL. Moreover, the label information is employed in SliM2 to discover the shared structure inside each modality within the same class by a mixed norm (i.e., the ℓ1/ℓ2-norm). As a result, multi-modal retrieval is conducted via a set of jointly learned mapping functions across multi-modal data. The experimental results show the effectiveness of our proposed model when applied to cross-media retrieval.
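The mixed ℓ1/ℓ2 norm mentioned in the abstract is an ℓ1 sum of per-group ℓ2 norms, which pushes whole coefficient groups to zero together rather than individual entries. A minimal sketch (toy coefficients and group layout are illustrative):

```python
import numpy as np

def mixed_l1_l2(coeffs, groups):
    """l1/l2 mixed norm: the sum over groups of each group's l2 norm.
    Sparsity is induced at the group level rather than per coefficient."""
    return sum(float(np.linalg.norm(coeffs[list(g)])) for g in groups)

x = np.array([3.0, 4.0, 0.0, 0.0, 1.0])
groups = [(0, 1), (2, 3), (4,)]
print(mixed_l1_l2(x, groups))   # 5.0 + 0.0 + 1.0 = 6.0
```

Used as a regularizer during dictionary learning, this penalty encourages samples of the same class to reuse the same group of dictionary atoms, which is the shared intra-class structure the abstract refers to.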

TCS Journal 2012 Journal Article

An algorithm to construct independent spanning trees on parity cubes

  • Yan Wang
  • Jianxi Fan
  • Xiaohua Jia
  • He Huang

Independent spanning trees have applications in networks such as reliable communication protocols, one-to-all broadcasting, reliable broadcasting, and secure message distribution. Thus, the designs of independent spanning trees in several classes of networks have been widely investigated. However, there is a conjecture on independent spanning trees: any n-connected graph has n independent spanning trees rooted at an arbitrary vertex. This conjecture still remains open for n ≥ 5. In this paper, by proposing an algorithm to construct n independent spanning trees rooted at any vertex, we confirm the conjecture for the n-dimensional parity cube PQ_n, a variant of the n-dimensional hypercube. Furthermore, we prove that all independent spanning trees rooted at an arbitrary vertex constructed by our method are isomorphic and the height of each tree is n + 1 for any integer n ≥ 2.

AAAI Conference 2012 Conference Paper

Social Context-Aware Trust Network Discovery in Complex Contextual Social Networks

  • Guanfeng Liu
  • Yan Wang
  • Mehmet Orgun

Trust is one of the most important factors for participants’ decision-making in Online Social Networks (OSNs). The trust network from a source to a target without any prior interaction contains some important intermediate participants, the trust relations between the participants, and the social context, each of which has an important influence on trust evaluation. Thus, before performing any trust evaluation, the contextual trust network from a given source to a target needs to be extracted first, where constraints on the social context should also be considered to guarantee the quality of extracted networks. However, this problem has been proved to be NP-Complete. Towards solving this challenging problem, we first propose a complex contextual social network structure which considers social contextual impact factors. These factors have significant influences on both social interaction between participants and trust evaluation. Then, we propose a new concept called QoTN (Quality of Trust Network) and a social context-aware trust network discovery model. Finally, we propose a Social Context-Aware trust Network discovery algorithm (SCAN) by adopting the Monte Carlo method and our proposed optimization strategies. The experimental results illustrate that our proposed model and algorithm outperform the existing methods in both algorithm efficiency and the quality of the extracted trust network.

AAAI Conference 2011 Conference Paper

Trust Transitivity in Complex Social Networks

  • Guanfeng Liu
  • Yan Wang
  • Mehmet Orgun

In Online Social Networks (OSNs), participants can conduct rich activities, where trust is one of the most important factors for their decision making. This necessitates the evaluation of the trustworthiness between two unknown participants along the social trust paths between them, based on the trust transitivity properties (i.e., if A trusts B and B trusts C, then A can trust C to some extent). In order to compute more reasonable trust values between two unknown participants, a critical and challenging problem is to make clear how, and to what extent, trust is transitive along a social trust path. To address this problem, we first propose a new complex social network structure that takes into account not only trust but also social relationships, recommendation roles, and preference similarity between participants. These factors have a significant influence on trust transitivity. We then propose a general concept, called Quality of Trust Transitivity (QoTT), that takes any factor with an impact on trust transitivity as an attribute to illustrate the ability of a trust path to guarantee a certain level of quality in trust transitivity. Finally, we propose a novel Multiple QoTT Constrained Trust Transitivity (MQCTT) model. The results of our experiments demonstrate that our proposed MQCTT model follows the properties of trust and the principles illustrated in social psychology, and thus can compute more reasonable trust values than existing methods that consider neither the impact of social aspects nor the properties of trust.
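
For intuition, the simplest transitivity rule aggregates trust multiplicatively along a path. The MQCTT model is richer (it also weighs social relationships, recommendation roles, and preference similarity), so the sketch below, with made-up edge values, is only the baseline it improves on:

```python
from math import prod

def path_trust(edge_trust, path):
    # Baseline multiplicative transitivity: if A trusts B at 0.9 and
    # B trusts C at 0.8, then A trusts C at about 0.72 along A-B-C.
    return prod(edge_trust[(u, v)] for u, v in zip(path, path[1:]))

edge_trust = {("A", "B"): 0.9, ("B", "C"): 0.8}
trust_ac = path_trust(edge_trust, ["A", "B", "C"])  # ≈ 0.72
```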

AAAI Conference 2010 Conference Paper

Optimal Social Trust Path Selection in Complex Social Networks

  • Guanfeng Liu
  • Yan Wang
  • Mehmet Orgun

Online social networks are becoming increasingly popular and are being used as the means for a variety of rich activities. This demands the evaluation of the trustworthiness between two unknown participants along a certain social trust path between them in the social network. However, there are usually many social trust paths between participants. Thus, a challenging problem is finding which social trust path is the optimal one that can yield the most trustworthy evaluation result. In this paper, we first present a new complex social network structure and a new concept of Quality of Trust (QoT) to illustrate the ability to guarantee a certain level of trustworthiness in trust evaluation. We then model the optimal social trust path selection as a Multi-Constrained Optimal Path (MCOP) selection problem, which is NP-Complete. To solve this problem, we propose an efficient approximation algorithm, MONTE K, based on the Monte Carlo method. The results of our experiments conducted on a real dataset of social networks illustrate that our proposed algorithm significantly outperforms existing approaches in both efficiency and the quality of selected social trust paths.
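
The Monte Carlo idea can be sketched in a few lines: repeatedly sample random source-to-target paths and keep the best one under some utility. The tiny graph, the utility (a plain product of edge trust), and the sample budget below are illustrative stand-ins for the paper's QoT-based objective, not the MONTE K algorithm itself:

```python
import random
from math import prod

def random_path(adj, src, dst, rng, max_len=8):
    # Sample one simple walk from src toward dst; None on a dead end.
    path = [src]
    while path[-1] != dst and len(path) < max_len:
        options = [v for v in adj[path[-1]] if v not in path]
        if not options:
            return None
        path.append(rng.choice(options))
    return path if path[-1] == dst else None

def monte_carlo_best_path(adj, utility, src, dst, samples=2000, seed=0):
    # Keep the highest-utility path seen across the sample budget.
    rng = random.Random(seed)
    best, best_u = None, float("-inf")
    for _ in range(samples):
        p = random_path(adj, src, dst, rng)
        if p is not None and utility(p) > best_u:
            best, best_u = p, utility(p)
    return best

adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
trust = {("A", "B"): 0.9, ("B", "D"): 0.9, ("A", "C"): 0.5, ("C", "D"): 0.99}
utility = lambda p: prod(trust[e] for e in zip(p, p[1:]))
best = monte_carlo_best_path(adj, utility, "A", "D")  # A-B-D beats A-C-D
```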

AIIM Journal 2010 Journal Article

PMirP: A pre-microRNA prediction method based on structure–sequence hybrid features

  • Dongyu Zhao
  • Yan Wang
  • Di Luo
  • Xiaohu Shi
  • Liupu Wang
  • Dong Xu
  • Jun Yu
  • Yanchun Liang

Objective MicroRNAs are a class of small non-coding RNAs that usually have a stem-loop structure. As an important intermediate stage, the pre-microRNA is transported from the nucleus to the cytoplasm by exportin-5 and finally cleaved into mature microRNA. Structure–sequence features and the minimum free energy of the secondary structure have been used for predicting pre-microRNA. Meanwhile, the double-helix structure with free nucleotides and base-pairing features is used to identify pre-miRNA for the first time. Methods We applied a support vector machine to a novel hybrid coding scheme using the left-triplet method, the free nucleotides, the minimum free energy of the secondary structure, and base-pairing features. Data sets of human pre-microRNAs, 11 other species, and the latest pre-microRNA sequences were used for testing. Results In this study we developed an improved method for pre-microRNA prediction using a combination of various features and a web server called PMirP. The prediction specificity and sensitivity for real and pseudo human pre-microRNAs are as high as 98.4% and 94.9%, respectively. The web server is freely available to the public at http://ccst.jlu.edu.cn/ci/bioinformatics/MiRNA (accessed: 26 February 2010). Conclusions Experimental results show that the proposed method improves the prediction efficiency and accuracy over existing methods. In addition, PMirP has lower computational complexity and higher-throughput prediction capacity than the MiPred web server.
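
The structure–sequence coding can be illustrated with a toy triplet counter. The exact left-triplet definition used by PMirP may differ; this follows the common convention of pairing each nucleotide with the paired/unpaired pattern of its three-position window in the dot-bracket secondary structure:

```python
from collections import Counter

def triplet_features(seq, struct):
    # Structure-sequence triplet coding (illustrative, not PMirP's
    # exact scheme): for each interior position i, combine the
    # nucleotide seq[i] with the paired/unpaired pattern of positions
    # i-1, i, i+1, where '(' and ')' both count as paired.
    flat = struct.replace(")", "(")
    feats = Counter()
    for i in range(1, len(seq) - 1):
        feats[seq[i] + flat[i - 1:i + 2]] += 1
    return feats

# Toy hairpin: six nucleotides, two base pairs, a two-base loop.
feats = triplet_features("GCAUCG", "((..))")  # one feature per interior position
```

Feature vectors built this way (plus free-nucleotide counts, minimum free energy, and base-pairing statistics) would then be fed to an SVM classifier.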

AAMAS Conference 2010 Conference Paper

Quality of Trust for Social Trust Path Selection in Complex Social Networks

  • Guanfeng Liu
  • Yan Wang
  • Mehmet Orgun

In online social networks, there are usually many social trust paths between agents. Thus, a challenging problem is which social trust path is the optimal one that can yield the most trustworthy evaluation result. In this paper, we present a new complex social network structure and propose a new concept, Quality of Trust (QoT), for social trust path selection in complex social networks.

AAAI Conference 2010 Conference Paper

Subjective Trust Inference in Composite Services

  • Lei Li
  • Yan Wang

In Service-Oriented Computing (SOC) environments, the trustworthiness of each service is critical for a service client when selecting one from a large pool of services. The trust value of a service is usually in the range of [0, 1] and is evaluated from the ratings given by service clients, which represent the subjective belief of these service clients on the satisfaction of delivered services. So a trust value can be taken as the subjective probability with which one party believes that another party can perform an action in a certain situation. Hence, subjective probability theory should be adopted in trust evaluation. In addition, in SOC environments, a service usually invokes other services offered by different service providers, forming a composite service. Thus, the global trust of a composite service should be evaluated based on complex invocation structures. In this paper, firstly, based on Bayesian inference, we propose a novel method to evaluate the subjective trustworthiness of a service component from a series of ratings given by service clients. Secondly, we interpret the trust dependency caused by service invocations as conditional probability, which is evaluated based on the subjective trust values of service components. Furthermore, we propose a joint subjective probability method to evaluate the subjective global trust of a composite service on the basis of trust dependency. Finally, we present the results of our experiments to illustrate the properties of our proposed subjective global trust inference method.
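
A standard way to realize "trust value as subjective probability" is a Beta-Bernoulli update over binary satisfaction ratings. The sketch below shows that posterior-mean estimate for a single service component; it is a textbook construction, not the paper's full joint inference over composite invocation structures:

```python
def beta_trust(ratings, a0=1.0, b0=1.0):
    # Bayesian update of a Beta(a0, b0) prior from binary ratings
    # (1 = satisfied, 0 = unsatisfied); the posterior mean is the
    # subjective trust value in [0, 1].
    a = a0 + sum(ratings)
    b = b0 + len(ratings) - sum(ratings)
    return a / (a + b)

t = beta_trust([1, 1, 1, 0])  # 4/6 with a uniform Beta(1, 1) prior
```

With no ratings at all, the estimate falls back to the prior mean (0.5 for the uniform prior), which matches the intuition that an unrated service is neither trusted nor distrusted.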

AIIM Journal 2007 Journal Article

A multi-approaches-guided genetic algorithm with application to operon prediction

  • Shuqin Wang
  • Yan Wang
  • Wei Du
  • Fangxun Sun
  • Xiumei Wang
  • Chunguang Zhou
  • Yanchun Liang

Objective The prediction of operons is critical to the reconstruction of regulatory networks at the whole-genome level. Multiple genome features have been used for predicting operons. However, these features are usually handled with only a single method in the literature. The aim of this paper is to develop a combined method for operon prediction that preprocesses different genome features with different methods in order to exploit their unique characteristics. Methods A novel multi-approaches-guided genetic algorithm for operon prediction is presented. We exploit different methods for intergenic distance, cluster of orthologous groups (COG) gene functions, metabolic pathways, and microarray expression data. A novel local-entropy-minimization method is proposed to partition intergenic distance. Our program can be used for other newly sequenced genomes by transferring the knowledge obtained from Escherichia coli data. We calculate the log-likelihood for COG gene functions and the Pearson correlation coefficient for microarray expression data. The genetic algorithm is used to integrate the four types of data. Results The proposed method is examined on the E. coli K12, Bacillus subtilis, and Pseudomonas aeruginosa PAO1 genomes. The prediction accuracies for these three genomes are 85.9987%, 88.296%, and 81.2384%, respectively. Conclusion Simulation results demonstrate that, in the genetic algorithm, preprocessing genome data using multiple approaches ensures the effective utilization of different biological characteristics. The results also show that the proposed method is applicable for predicting operons in prokaryotes.
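
Of the four feature types, the microarray term is the most self-contained: adjacent genes in the same operon tend to be co-expressed, which the paper measures with the Pearson correlation coefficient of their expression profiles. A minimal stdlib version of that score (the example profiles are made up):

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation of two expression profiles; gene pairs in
    # the same operon tend to score close to +1.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

r = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # 1.0: perfectly co-expressed
```

Scores like this, together with the partitioned intergenic distances and COG log-likelihoods, are the per-feature signals the genetic algorithm then combines.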