Arrow Research search

Author name cluster

Wei Fan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

49 papers
2 author rows

Possible papers


JBHI Journal 2026 Journal Article

DPGOK: A Deep Learning-Based Method for Protein Function Prediction by Fusing GO Knowledge With Protein Features

  • Qiurong Yang
  • Wenkang Wang
  • Wei Fan
  • Ruiqing Zheng
  • Min Li

Accurately predicting protein functions is critical for understanding disease mechanisms and discovering potential drug targets. Gene Ontology (GO), with its hierarchical and semantic information, provides valuable context that can be integrated to improve prediction accuracy. Several methods have recently attempted to integrate GO knowledge with protein sequence features for function prediction. However, these methods ignore the fact that GO embeddings should be tailored to proteins to reflect protein-specific functional relevance. To address this limitation, we propose DPGOK, a deep learning-based method that fuses protein-aware GO representations with protein features for function prediction. DPGOK first learns GO semantic representations with a knowledge graph loss and then generates protein-aware GO embeddings under the guidance of protein features. Results show that DPGOK outperforms state-of-the-art methods across all GO domains. Additional experiments demonstrate that DPGOK is capable of discovering hierarchically deeper and more informative functions for target proteins. Ablation studies reveal that the knowledge graph loss we introduce contributes to more stable and semantically coherent GO representations across domains. Finally, we find that predictive performance can be further improved when DPGOK is combined with homology-based approaches.

AAAI Conference 2026 Conference Paper

iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference

  • Wei Fan
  • JinYi Yoon
  • Bo Ji

Large Language Model (LLM) agent systems have advanced rapidly, driven by their strong generalization in zero-shot settings. To further enhance reasoning and accuracy on complex tasks, Multi-Agent Debate (MAD) has emerged as a promising framework that engages multiple LLM agents in structured debates to encourage diverse reasoning. However, triggering MAD for every query is inefficient, as it incurs substantial computational (token) cost and may even degrade accuracy by overturning correct answers from a single agent. To address these limitations, we propose intelligent Multi-Agent Debate (iMAD), a token-efficient framework that selectively triggers MAD only when it is likely to be beneficial (i.e., correcting an initially wrong answer). To achieve this goal, iMAD learns generalizable model behaviors to make accurate debate decisions. Specifically, iMAD first prompts a single agent to produce a structured self-critique response, from which we extract 41 interpretable linguistic and semantic features capturing hesitation cues. Then, iMAD uses a lightweight debate decision classifier, trained using our proposed FocusCal loss without test-dataset-specific tuning, to make robust zero-shot debate decisions. Through extensive experiments using six (visual) question answering datasets against five competitive baselines, we show that iMAD significantly reduces token usage (by up to 92%) while also improving final answer accuracy (by up to 13.5%).

AAAI Conference 2026 Conference Paper

MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion

  • Haolong Xiang
  • Peisi Wang
  • Xiaolong Xu
  • Kun Yi
  • Xuyun Zhang
  • Quan Z. Sheng
  • Amin Beheshti
  • Wei Fan

With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, providing a strong foundation for ensuring safe travel, reducing traffic congestion, and optimizing urban mobility. Most existing methods for traffic time series modeling rely on the original data modality, i.e., direct numerical readings from city sensors. However, this unimodal approach overlooks the semantic information present in heterogeneous multimodal urban data, which hinders a comprehensive understanding of traffic signals and limits the accurate prediction of complex traffic dynamics. To address this problem, we propose a novel Multimodal framework, MTP, for urban Traffic Profiling, which learns multimodal features through numeric, visual, and textual perspectives in the frequency domain. The three branches provide complementary views of the traffic signals for augmentation, while frequency-domain learning strategies refine the extracted information. Specifically, we first conduct visual augmentation for the traffic time series, transforming the original modality into periodicity images and frequency images for visual learning. We also augment descriptive texts for the traffic time series based on the specific topic, background information, and item description for textual learning. To complement the numeric information, we utilize frequency multilayer perceptrons for learning on the original modality. We design a hierarchical contrastive learning scheme across the three branches to fuse the modalities. Finally, extensive experiments on six real-world datasets demonstrate superior performance compared with state-of-the-art approaches.

ICML Conference 2025 Conference Paper

Actor-Critics Can Achieve Optimal Sample Efficiency

  • Kevin Tan
  • Wei Fan
  • Yuting Wei 0001

Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work has successfully learned an $\epsilon$-optimal policy with a sample complexity of $O(1/\epsilon^2)$ trajectories with general function approximation when strategic exploration is necessary. We address this open problem by introducing a novel actor-critic algorithm that attains a sample complexity of $O(dH^5 \log|\mathcal{A}|/\epsilon^2 + d H^4 \log|\mathcal{F}|/ \epsilon^2)$ trajectories, and an accompanying $\sqrt{T}$ regret bound when the Bellman eluder dimension $d$ does not increase with $T$ at more than a $\log T$ rate. Here, $\mathcal{F}$ is the critic function class, and $\mathcal{A}$ is the action space. Our algorithm integrates optimism, off-policy critic estimation targeting the optimal Q-function, and rare-switching policy resets. We extend this to the setting of Hybrid RL, where we show that initializing the critic with offline data yields sample-efficiency gains, and we also provide a non-optimistic, provably efficient actor-critic algorithm, addressing another open problem in the literature. Numerical experiments support our theoretical findings.

AAAI Conference 2025 Conference Paper

Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting

  • Jingru Fei
  • Kun Yi
  • Wei Fan
  • Qi Zhang
  • Zhendong Niu

We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy restoration block returns the energy to its original level. Moreover, considering that the energy-amplified data typically displays two distinct energy peaks in the frequency spectrum, we integrate the energy amplification technique with a seasonal-trend forecaster to model the temporal relationships of these two peaks independently, serving as the backbone for our proposed model, Amplifier. Additionally, we propose a semi-channel interaction temporal relationship enhancement block for Amplifier, which enhances the model's ability to capture temporal relationships from the perspective of the commonality and specificity of each channel in the data. Extensive experiments on eight time series forecasting benchmarks consistently demonstrate our model's superiority in both effectiveness and efficiency compared to state-of-the-art methods.
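The amplify-then-restore idea described above can be sketched generically in the frequency domain. The gain rule below (boost bins whose energy is below the median) is an assumption chosen purely for illustration, not the paper's actual blocks; the point is only that recording the gains makes the amplification exactly invertible:

```python
import numpy as np

# Toy amplify/restore pair in the frequency domain (the gain rule is an
# assumed stand-in, not Amplifier's): boost low-energy bins, keep the
# gains, and divide them back out so the round trip is lossless.
rng = np.random.default_rng(1)
x = rng.normal(size=64)

spec = np.fft.rfft(x)
energy = np.abs(spec) ** 2
gains = np.where(energy < np.median(energy), 4.0, 1.0)  # assumed gain rule

amplified = np.fft.irfft(spec * gains, n=x.size)   # what a model would see
restored = np.fft.irfft(np.fft.rfft(amplified) / gains, n=x.size)

print(np.allclose(restored, x))  # → True
```

Because the gains are real and per-bin, restoration is a simple element-wise division; any invertible gain rule would work the same way.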

ICML Conference 2025 Conference Paper

De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

  • Wei Fan
  • Kejiang Chen
  • Chang Liu 0089
  • Weiming Zhang 0001
  • Nenghai Yu

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.

IJCAI Conference 2025 Conference Paper

Empowering Multimodal Road Traffic Profiling with Vision Language Models and Frequency Spectrum Fusion

  • Haolong Xiang
  • Xiaolong Xu
  • Guangdong Wang
  • Xuyun Zhang
  • Xiaoyong Li
  • Qi Zhang
  • Amin Beheshti
  • Wei Fan

With the rapid urbanization of the modern era, smart traffic profiling based on multimodal data sources has been playing a significant role in ensuring safe travel, reducing traffic congestion, and optimizing urban mobility. Most existing methods for traffic profiling at the road level utilize single-modality data, i.e., they mainly focus on image processing with deep vision models or auxiliary analysis of textual data. However, the joint modeling and multimodal fusion of the textual and visual modalities have rarely been studied in road traffic profiling, which largely hinders the accurate prediction or classification of traffic conditions. To address this issue, we propose a novel multimodal learning and fusion framework for road traffic profiling, named TraffiCFUS. Specifically, given the traffic images, our TraffiCFUS framework first introduces Vision Language Models (VLMs) to generate text and then creates tailored prompt instructions for refining this text according to the specific scene requirements of road traffic profiling. Next, we apply the discrete Fourier transform to convert multimodal data from the spatial domain to the frequency domain and perform a cross-modal spectrum transform to filter out information irrelevant to traffic profiling. Furthermore, the processed spatial multimodal data is combined to generate a fusion loss and an interaction loss with contrastive learning. Finally, extensive experiments on four real-world datasets illustrate superior performance compared with state-of-the-art approaches.

AAAI Conference 2025 Conference Paper

Enhancing Generalizability in Molecular Conformation Generation with METRIZATION-Informed Geometric Diffusion Pretraining

  • Xiaozhuang Song
  • Yuzhao Tu
  • Hangting Ye
  • Wei Fan
  • Qingquan Zhang
  • Xiaoxue Wang
  • Tianshu Yu

Diffusion-based generative models have recently excelled in generating molecular conformations but struggle with the generalization issue -- models trained on one dataset may produce meaningless conformations for out-of-distribution molecules. On the other hand, distance geometry serves as a generalizable tool in traditional computational chemistry methods for molecular conformation, predicated on the assumption that the set of all potential conformations of any non-rigid molecular system can be adequately defined using purely geometric constraints. In this work, we explicitly incorporate, for the first time, distance geometry constraints into the pretraining phase of diffusion-based molecular generation models to improve generalizability. Inspired by the classical distance geometry solution for the molecular distance geometry problem, we propose MiGDiff, a Metrization-Informed Geometric Diffusion framework. MiGDiff injects distance geometry constraints by pretraining the deep geometric diffusion backbone within the Metrization sampling approach, yielding a "Metrization-driven pretraining + Data-driven finetuning" paradigm. Experimental results demonstrate that MiGDiff outperforms state-of-the-art methods and possesses strong generalization capabilities, particularly in generating previously unseen molecules, revealing the vast untapped potential of combining traditional computational methods with deep generative models for 3D molecular generation.

IJCAI Conference 2025 Conference Paper

RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming

  • Hao Wang
  • Jindong Han
  • Wei Fan
  • Leilei Sun
  • Hao Liu

Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies reveal that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a semantic-oriented PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a semantic-oriented decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates the PLM's understanding of sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting. Code and the appendix are available at https://github.com/usail-hkust/REPST.

JBHI Journal 2025 Journal Article

TransScore: A Graph Model for Pose Scoring and Affinity Prediction Based on Transformer Convolution Network

  • Chuqi Lei
  • Wenkang Wang
  • Wei Fan
  • Zhangli Lu
  • Jing Tang
  • Min Li

Predicting protein-compound interactions is an important task in drug discovery. Molecular docking has been a fundamental and vital computer-aided tool for uncovering potential interactions of protein-compound pairs. With the recent great success of artificial intelligence (AI), the scoring function, a fundamental part of molecular docking, has achieved much better performance by incorporating AI-based models. However, AI-based models usually focus on a single prediction task (e.g., affinity prediction), which limits their extensibility. Moreover, the performance of AI-based models usually declines in cold-start scenarios, compromising their robustness. To this end, we propose TransScore, a novel deep learning-based graph model built on a transformer convolution network for pose scoring and affinity prediction. TransScore captures the intrinsic characteristics of protein-compound poses by employing the self-attention mechanism, achieving superior performance in both cold- and warm-start scenarios for the pose-scoring task. Strong performance is also shown on imbalanced datasets, which demonstrates the robustness of TransScore. In addition, the gated residual algorithm in TransScore enables the model to adapt to diverse related tasks. In particular, in the affinity prediction task, we have observed consistent improvements in warm/cold-start scenarios. Moreover, TransScore excels in both accuracy and precision, accurately predicting affinities and their relative ordering. We also conducted an analysis on carbonic anhydrase II, which bears out that TransScore can elucidate the interaction mechanism of a protein-ligand pair, suggesting the potential application of TransScore in drug discovery.

NeurIPS Conference 2025 Conference Paper

Vector Database Watermarking

  • Zhiwen Ren
  • Wei Fan
  • Qiyi Yao
  • Jing Qiu
  • Weiming Zhang
  • Nenghai Yu

Vector databases support machine learning tasks with Approximate Nearest Neighbour (ANN) query functionality, making them highly valuable digital assets. However, they also face security threats such as unauthorized replication. By stealthily embedding information, watermarking technology can be used for ownership authentication. This paper introduces a watermarking scheme specifically designed for vector databases. The scheme consists of four steps: generating identifiers, grouping, cryptographic mapping, and modification. Since watermark embedding requires modifying certain vectors, it may negatively affect ANN query results. Further investigation reveals that in the Hierarchical Navigable Small World (HNSW) indexing structure widely used for vector databases, heuristic edge selection and pruning strategies leave some vectors with few or even no edges. These vectors exhibit significantly lower query frequencies than others, so modifying them has less impact on query results. Based on this observation, we propose the Transparent Vector Priority (TVP) watermarking scheme, which prioritizes embedding the watermark in these low-query-frequency “transparent” vectors to minimize the impact of watermark embedding on query results. Experimental results show that, compared to the most effective relevant watermarking schemes, the TVP scheme significantly reduces the number of missed and false queries, by approximately 75%.

AAAI Conference 2025 Conference Paper

Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding

  • Guangyin Bao
  • Qi Zhang
  • Zixuan Gong
  • Jialei Zhou
  • Wei Fan
  • Kun Yi
  • Usman Naseem
  • Liang Hu

Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, the diversity in cortical parcellation and fMRI patterns across individuals has prompted the development of deep learning models tailored to each subject. This personalization limits the broader applicability of brain visual decoding in real-world scenarios. To address this issue, we introduce Wills Aligner, a novel approach designed to achieve multi-subject collaborative brain visual decoding. Wills Aligner begins by aligning the fMRI data from different subjects at the anatomical level. It then employs delicate mixture-of-brain-expert adapters and a meta-learning strategy to account for individual fMRI pattern differences. Additionally, Wills Aligner leverages the semantic relations of visual stimuli to guide the learning of inter-subject commonality, enabling visual decoding for each subject to draw insights from other subjects' data. We rigorously evaluate Wills Aligner across various visual decoding tasks, including classification, cross-modal retrieval, and image reconstruction. The experimental results demonstrate that Wills Aligner achieves promising performance.

IJCAI Conference 2024 Conference Paper

Decoupled Invariant Attention Network for Multivariate Time-series Forecasting

  • Haihua Xu
  • Wei Fan
  • Kun Yi
  • Pengyang Wang

To achieve more accurate prediction results in Time Series Forecasting (TSF), it is essential to distinguish between the valuable patterns of the spatial-temporal relationship (invariant patterns) and the patterns prone to distribution shift (variant patterns), and then combine them for forecasting. Existing works, such as transformer-based and GNN-based models, focus on capturing the main forecasting dependencies whether stable or not, and tend to overlook patterns that carry both useful information and distribution shift. In this paper, we propose a model for better time series forecasting: the Decoupled Invariant Attention Network (DIAN), which contains two modules to learn spatial and temporal relationships respectively: 1) Spatial Decoupled Invariant-Variant Learning (SDIVL), which decouples the spatial invariant and variant attention scores and then leverages convolutional networks to effectively integrate them for subsequent layers; 2) Temporal Augmented Invariant-Variant Learning (TAIVL), which decouples temporal invariant and variant patterns and combines them for further forecasting. In this module, we also design a Temporal Intervention Mechanism that creates multiple intervened samples by reassembling variant patterns across time stamps to eliminate the spurious impacts of variant patterns. In addition, we propose Joint Optimization to minimize a loss function covering all invariant, variant, and intervened patterns, so that our model gains a more stable predictive ability. Extensive experiments on five datasets have demonstrated our superior performance with higher efficiency compared with state-of-the-art methods.

IJCAI Conference 2024 Conference Paper

Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting

  • Wei Fan
  • Kun Yi
  • Hangting Ye
  • Zhiyuan Ning
  • Qi Zhang
  • Ning An

While most time series are non-stationary, models inevitably face the distribution shift issue in time series forecasting. Existing solutions manipulate statistical measures (usually the mean and standard deviation) to adjust the time series distribution. However, these operations can be theoretically seen as a transformation of only the zero-frequency component of the spectrum, which cannot reveal the full distribution information and further leads to an information-utilization bottleneck in normalization, thus hindering forecasting performance. To address this problem, we propose to utilize the whole frequency spectrum to transform time series, making full use of the data distribution from the frequency perspective. We present a deep frequency derivative learning framework, DERITS, for non-stationary time series forecasting. Specifically, DERITS is built upon a novel reversible transformation, namely the Frequency Derivative Transformation (FDT), which derives signals in the frequency domain to acquire more stationary frequency representations. Then, we propose the Order-adaptive Fourier Convolution Network to conduct adaptive frequency filtering and learning. Furthermore, we organize DERITS as a parallel-stacked architecture for multi-order derivation and fusion in forecasting. Finally, we conduct extensive experiments on several datasets, which show consistent superiority in both time series forecasting and shift alleviation.
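The claim that mean-based normalization acts only on the zero-frequency component follows from a standard DFT identity, and can be checked directly; a minimal NumPy sketch (not the authors' code):

```python
import numpy as np

# Subtracting the mean of a series zeroes only the DC (zero-frequency)
# bin of its spectrum; every other frequency bin is left untouched,
# which is why such normalization reveals no full-spectrum information.
rng = np.random.default_rng(0)
x = rng.normal(size=128)

spec_raw = np.fft.rfft(x)
spec_centered = np.fft.rfft(x - x.mean())

print(np.isclose(spec_centered[0], 0.0))             # → True (DC removed)
print(np.allclose(spec_raw[1:], spec_centered[1:]))  # → True (rest unchanged)
```

This is exactly the "transformation towards the zero-frequency component" the abstract refers to: the whole remainder of the spectrum, where non-stationarity also lives, is untouched by mean subtraction.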

IJCAI Conference 2024 Conference Paper

FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization

  • Zhiyuan Ning
  • Chunlin Tian
  • Meng Xiao
  • Wei Fan
  • Pengyang Wang
  • Li Li
  • Pengfei Wang
  • Yuanchun Zhou

Federated Learning faces significant challenges in statistical and system heterogeneity, along with high energy consumption, necessitating efficient client selection strategies. Traditional approaches, including heuristic and learning-based methods, fall short of addressing these complexities holistically. In response, we propose FedGCS, a novel generative client selection framework that innovatively recasts the client selection process as a generative task. Drawing inspiration from the methodologies used in large language models, FedGCS efficiently encodes abundant decision-making knowledge within a continuous representation space, enabling efficient gradient-based optimization to search for the optimal client selection, which is finally output via generation. The framework comprises four steps: (1) automatic collection of diverse “selection-score” pair data using classical client selection methods; (2) training an encoder-evaluator-decoder framework on this data to construct a continuous representation space; (3) employing gradient-based optimization in this space for optimal client selection; (4) generating the final optimal client selection via beam search over the well-trained decoder. FedGCS outperforms traditional methods by being more comprehensive, generalizable, and efficient, simultaneously optimizing for model performance, latency, and energy consumption. The effectiveness of FedGCS is proven through extensive experimental analyses.

NeurIPS Conference 2024 Conference Paper

FilterNet: Harnessing Frequency Filters for Time Series Forecasting

  • Kun Yi
  • Jingru Fei
  • Qi Zhang
  • Hui He
  • Shufeng Hao
  • Defu Lian
  • Wei Fan

Given the ubiquitous presence of time series data across various domains, precise forecasting of time series holds significant importance and finds widespread real-world applications in energy, weather, healthcare, and beyond. While numerous forecasters have been proposed using different network architectures, Transformer-based models achieve state-of-the-art performance in time series forecasting. However, Transformer-based forecasters still suffer from vulnerability to high-frequency signals, computational inefficiency, and a bottleneck in full-spectrum utilization, which are essentially the cornerstones of accurately predicting time series with thousands of points. In this paper, we explore deep time series forecasting from a signal-processing perspective. Inspired by the filtering process, we introduce a simple yet effective network, FilterNet, built upon our proposed learnable frequency filters, which extract key informative temporal patterns by selectively passing or attenuating certain components of time series signals. Concretely, we propose two kinds of learnable filters in FilterNet: (i) a plain shaping filter, which adopts a universal frequency kernel for signal filtering and temporal modeling; and (ii) a contextual shaping filter, which utilizes filtered frequencies examined for their compatibility with input signals for dependency learning. Equipped with these two filters, FilterNet can approximately surrogate the linear and attention mappings widely adopted in the time series literature, while enjoying superb abilities in handling high-frequency noise and utilizing the whole frequency spectrum, which is beneficial for forecasting. Finally, we conduct extensive experiments on eight time series forecasting benchmarks, and the results demonstrate superior performance in both effectiveness and efficiency compared with state-of-the-art methods. Our code is publicly available.
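The pass-or-attenuate mechanism the abstract describes can be illustrated with a fixed frequency mask; FilterNet's filters are learnable, so the sketch below is only a toy, non-learnable stand-in showing how multiplying the spectrum by a filter selects components:

```python
import numpy as np

# Toy frequency filtering (not FilterNet itself): a signal composed of a
# slow and a fast sinusoid is low-pass filtered by zeroing high bins of
# its real FFT; the fast component is removed, the slow one survives.
n = 256
t = np.arange(n)
slow = np.sin(2 * np.pi * 3 * t / n)    # 3 cycles over the window (bin 3)
fast = np.sin(2 * np.pi * 40 * t / n)   # 40 cycles over the window (bin 40)
x = slow + fast

spec = np.fft.rfft(x)
mask = np.zeros_like(spec)
mask[:10] = 1.0                          # pass only the 10 lowest bins
filtered = np.fft.irfft(spec * mask, n=n)

print(np.allclose(filtered, slow, atol=1e-8))  # → True
```

A learnable version would replace the hard 0/1 mask with trainable (complex) weights per bin, which is the sense in which such a filter can surrogate a linear mapping over the sequence.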

NeurIPS Conference 2024 Conference Paper

Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs

  • Kevin Tan
  • Wei Fan
  • Yuting Wei

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assumption. While Li et al. (2023) provided an affirmative answer to this question in the tabular PAC RL case, the question remains unsettled for both the regret-minimizing RL case and the non-tabular case. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.

IJCAI Conference 2024 Conference Paper

HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis

  • Zhuojia Wu
  • Qi Zhang
  • Duoqian Miao
  • Kun Yi
  • Wei Fan
  • Liang Hu

Multimodal Sentiment Analysis (MSA) aims to identify speakers' sentiment tendencies in multimodal video content, raising serious concerns about privacy risks associated with multimodal data, such as voiceprints and facial images. Recent distributed collaborative learning has been verified as an effective paradigm for privacy preservation in multimodal tasks. However, existing approaches often overlook the privacy distinctions among different modalities, struggling to strike a balance between performance and privacy preservation. This poses the intriguing question of how to maximize multimodal utilization for performance while protecting the necessary modalities. This paper makes the first attempt at modality-specified (i.e., audio and visual) privacy preservation in MSA tasks. We propose a novel Hybrid Distributed cross-modality cGAN framework (HyDiscGAN), which learns multimodality alignment to generate fake audio and visual features conditioned on shareable de-identified textual data. The objective is to leverage the fake features to approximate real audio and visual content, guaranteeing privacy preservation while effectively enhancing performance. Extensive experiments show that, compared with state-of-the-art MSA models, HyDiscGAN achieves superior or competitive performance while preserving privacy.

JBHI Journal 2024 Journal Article

MMCA-NET: A Multimodal Cross Attention Transformer Network for Nasopharyngeal Carcinoma Tumor Segmentation Based on a Total-Body PET/CT System

  • Wenjie Zhao
  • Zhenxing Huang
  • Si Tang
  • Wenbo Li
  • Yunlong Gao
  • Yingying Hu
  • Wei Fan
  • Chuanli Cheng

Nasopharyngeal carcinoma (NPC) is a malignant tumor primarily treated by radiotherapy. Accurate delineation of the target tumor is essential for improving the effectiveness of radiotherapy. However, the segmentation performance of current models is unsatisfactory due to poor boundaries, large-scale tumor volume variation, and the labor-intensive nature of manual delineation for radiotherapy. In this paper, MMCA-Net, a novel segmentation network for NPC using PET/CT images that incorporates an innovative multimodal cross attention transformer (MCA-Transformer) and a modified U-Net architecture, is introduced to enhance modal fusion by leveraging cross-attention mechanisms between CT and PET data. Our method, tested against ten algorithms via fivefold cross-validation on samples from Sun Yat-sen University Cancer Center and the public HECKTOR dataset, consistently topped all four evaluation metrics with average Dice similarity coefficients of 0.815 and 0.7944, respectively. Furthermore, ablation experiments were conducted to demonstrate the superiority of our method over multiple baseline and variant techniques. The proposed method has promising potential for application in other tasks.

IJCAI Conference 2024 Conference Paper

Reconstructing Missing Variables for Multivariate Time Series Forecasting via Conditional Generative Flows

  • Xuanming Hu
  • Wei Fan
  • Haifeng Chen
  • Pengyang Wang
  • Yanjie Fu

The Variable Subset Forecasting (VSF) problem, where the majority of variables are unavailable in the inference stage of multivariate forecasting, is an important but under-explored task with broad impacts in many real-world applications. Missing values, absent inter-correlations, and the impracticality of retraining largely hinder the ability of multivariate forecasting models to capture inherent relationships among variables, impacting their performance. Existing approaches to these issues either rely heavily on local temporal correlation or fall short of fully recovering missing information from the unavailable subset, while incurring notable computational expense. To address these problems, we propose a novel density estimation solution that recovers the information of missing variables via a flow-based generative framework. In particular, a novel generative network for time series, namely Time-series Reconstruction Flows (TRF), is proposed to estimate and reconstruct the missing variable subset. In addition, a novel meta-training framework, Variable-Agnostic Meta Learning, has been developed to enhance the generalization ability of TRF, enabling it to adapt to diverse missing-variable situations. Finally, extensive experiments are conducted to demonstrate the superiority of our proposed method over baseline methods.

AAAI Conference 2023 Conference Paper

Background-Mixed Augmentation for Weakly Supervised Change Detection

  • Rui Huang
  • Ruofei Wang
  • Qing Guo
  • Jieda Wei
  • Yuxiang Zhang
  • Wei Fan
  • Yang Liu

Change detection (CD) aims to decouple object changes (i.e., objects missing or appearing) from background changes (i.e., environment variations) like light and season variations in two images captured in the same scene over a long time span, with critical applications in disaster management, urban development, etc. In particular, the endless patterns of background changes require detectors to generalize well to unseen environment variations, making this task significantly challenging. Recent deep learning-based methods develop novel network architectures or optimization strategies with paired training examples, which do not handle the generalization issue explicitly and require huge manual pixel-level annotation efforts. In this work, for the first attempt in the CD community, we study the generalization issue of CD from the perspective of data augmentation and develop a novel weakly supervised training algorithm that only needs image-level labels. Different from general augmentation techniques for classification, we propose background-mixed augmentation, which is specifically designed for change detection: it augments examples under the guidance of a set of background-changing images, letting deep CD models see diverse environment variations. Moreover, we propose an augmented & real data consistency loss that significantly encourages generalization. Our method, as a general framework, can enhance a wide range of existing deep learning-based detectors. We conduct extensive experiments on two public datasets and enhance four state-of-the-art methods, demonstrating the advantages of our method. We release the code at https://github.com/tsingqguo/bgmix.

AAAI Conference 2023 Conference Paper

Dish-TS: A General Paradigm for Alleviating Distribution Shift in Time Series Forecasting

  • Wei Fan
  • Pengyang Wang
  • Dongkun Wang
  • Dongjie Wang
  • Yuanchun Zhou
  • Yanjie Fu

The distribution shift in Time Series Forecasting (TSF), i.e., series distributions changing over time, largely hinders the performance of TSF models. Existing work on distribution shift in time series is mostly limited in how it quantifies the distribution and, more importantly, overlooks the potential shift between lookback and horizon windows. To address the above challenges, we systematically summarize the distribution shift in TSF into two categories. Regarding lookback windows as input-space and horizon windows as output-space, there exist (i) intra-space shift, where the distribution within the input-space keeps shifting over time, and (ii) inter-space shift, where the distribution shifts between input-space and output-space. Then we introduce Dish-TS, a general neural paradigm for alleviating distribution shift in TSF. Specifically, for better distribution estimation, we propose the coefficient net (Conet), which can be instantiated as any neural architecture, to map input sequences into learnable distribution coefficients. To relieve intra-space and inter-space shift, we organize Dish-TS as a Dual-Conet framework to separately learn the distributions of input- and output-space, which naturally captures the distribution difference between the two spaces. In addition, we introduce a more effective training strategy for the intractable Conet learning. Finally, we conduct extensive experiments on several datasets coupled with different state-of-the-art forecasting models. Experimental results show Dish-TS consistently boosts them by more than 20% on average. Code is available at https://github.com/weifantt/Dish-TS.
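
A minimal sketch of the normalize-forecast-denormalize idea behind this paradigm: a coefficient net maps a window to distribution coefficients, the model forecasts in normalized space, and the forecast is denormalized with output-space coefficients. Here the `conet` stand-in simply computes the empirical mean and standard deviation rather than learning them, and all function names are hypothetical.

```python
import math

def conet(window):
    """Stand-in for a learnable coefficient net: maps a window to
    (level, scale) distribution coefficients. Dish-TS learns this mapping;
    here we simply use the empirical mean and std of the window."""
    mu = sum(window) / len(window)
    var = sum((x - mu) ** 2 for x in window) / len(window)
    return mu, math.sqrt(var) + 1e-8

def normalize(lookback):
    mu, sigma = conet(lookback)
    return [(x - mu) / sigma for x in lookback], (mu, sigma)

def denormalize(forecast, output_coeffs):
    # in the Dual-Conet setting these coefficients come from a separate
    # output-space Conet, which is what addresses inter-space shift
    mu, sigma = output_coeffs
    return [y * sigma + mu for y in forecast]

lookback = [10.0, 12.0, 11.0, 13.0]
normed, in_coeffs = normalize(lookback)
# a model would forecast in normalized space; identity "forecast" for illustration
restored = denormalize(normed, in_coeffs)
```

With the input-space coefficients reused, the round trip is exact; the paradigm's contribution is predicting a *different* pair of coefficients for the horizon window.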

NeurIPS Conference 2023 Conference Paper

FourierGNN: Rethinking Multivariate Time Series Forecasting from a Pure Graph Perspective

  • Kun Yi
  • Qi Zhang
  • Wei Fan
  • Hui He
  • Liang Hu
  • Pengyang Wang
  • Ning An
  • Longbing Cao

Multivariate time series (MTS) forecasting has shown great importance in numerous industries. Current state-of-the-art graph neural network (GNN)-based forecasting methods usually require both graph networks (e.g., GCN) and temporal networks (e.g., LSTM) to capture inter-series (spatial) dynamics and intra-series (temporal) dependencies, respectively. However, the uncertain compatibility of the two networks puts an extra burden on handcrafted model designs. Moreover, the separate spatial and temporal modeling naturally violates the unified spatiotemporal inter-dependencies of the real world, which largely hinders forecasting performance. To overcome these problems, we explore the interesting direction of directly applying graph networks and rethink MTS forecasting from a pure graph perspective. We first define a novel data structure, the hypervariate graph, which regards each series value (regardless of variate or timestamp) as a graph node and represents sliding windows as space-time fully-connected graphs. This perspective considers spatiotemporal dynamics in a unified way and reformulates classic MTS forecasting as prediction on hypervariate graphs. Then, we propose a novel architecture, Fourier Graph Neural Network (FourierGNN), which stacks our proposed Fourier Graph Operator (FGO) to perform matrix multiplications in Fourier space. FourierGNN accommodates adequate expressiveness and achieves much lower complexity, allowing it to accomplish forecasting both effectively and efficiently. Besides, our theoretical analysis reveals FGO's equivalence to graph convolutions in the time domain, which further verifies the validity of FourierGNN. Extensive experiments on seven datasets have demonstrated our superior performance with higher efficiency and fewer parameters compared with state-of-the-art methods. Code is available at this repository: https://github.com/aikunyi/FourierGNN.

NeurIPS Conference 2023 Conference Paper

Frequency-domain MLPs are More Effective Learners in Time Series Forecasting

  • Kun Yi
  • Qi Zhang
  • Wei Fan
  • Shoujin Wang
  • Pengyang Wang
  • Hui He
  • Ning An
  • Defu Lian

Time series forecasting has played a key role in many industrial domains, including finance, traffic, energy, and healthcare. While the existing literature has designed many sophisticated architectures based on RNNs, GNNs, or Transformers, another class of approaches based on multi-layer perceptrons (MLPs) has been proposed with simple structure, low complexity, and superior performance. However, most MLP-based forecasting methods suffer from point-wise mappings and an information bottleneck, which largely hinders forecasting performance. To overcome this problem, we explore a novel direction of applying MLPs in the frequency domain for time series forecasting. We investigate the learned patterns of frequency-domain MLPs and discover two inherent characteristics benefiting forecasting: (i) global view: the frequency spectrum gives MLPs a complete view of the signal, so they learn global dependencies more easily, and (ii) energy compaction: frequency-domain MLPs concentrate on the smaller key part of frequency components with compact signal energy. Then, we propose FreTS, a simple yet effective architecture built upon Frequency-domain MLPs for Time Series forecasting. FreTS mainly involves two stages: (i) Domain Conversion, which transforms time-domain signals into complex numbers in the frequency domain; (ii) Frequency Learning, which performs our redesigned MLPs to learn the real and imaginary parts of frequency components. The above stages, operated on both inter-series and intra-series scales, further contribute to channel-wise and time-wise dependency learning. Extensive experiments on 13 real-world benchmarks (including 7 benchmarks for short-term forecasting and 6 benchmarks for long-term forecasting) demonstrate our consistent superiority over state-of-the-art methods. Code is available at this repository: https://github.com/aikunyi/FreTS.
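
The Domain Conversion stage and a frequency-domain layer can be sketched as follows. The naive O(n^2) DFT stands in for the FFT, and the single shared complex affine map is only a toy stand-in for FreTS's redesigned MLPs (which learn separate weights for the real and imaginary parts); all names here are illustrative.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT; returns the real part since the input signal is real."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def freq_layer(X, w_re, w_im, b_re, b_im):
    """One frequency-domain layer: a shared complex affine map applied to
    every frequency component (a toy stand-in for a frequency-domain MLP)."""
    w = complex(w_re, w_im)
    b = complex(b_re, b_im)
    return [w * Xk + b for Xk in X]

signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
spectrum = dft(signal)                             # Domain Conversion: time -> frequency
learned = freq_layer(spectrum, 1.0, 0.0, 0.0, 0.0) # identity weights for illustration
reconstructed = idft(learned)                      # back to the time domain
```

With identity weights the round trip reproduces the input; a trained layer would instead reshape the spectrum, and because every frequency coefficient depends on the whole window, even this pointwise map has the "global view" the abstract describes.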

IJCAI Conference 2022 Conference Paper

Feature and Instance Joint Selection: A Reinforcement Learning Perspective

  • Wei Fan
  • Kunpeng Liu
  • Hao Liu
  • Hengshu Zhu
  • Hui Xiong
  • Yanjie Fu

Feature selection and instance selection are two important data processing techniques. However, such selections have mostly been studied separately, and existing work on joint selection conducts feature/instance selection coarsely, neglecting the latent fine-grained interaction between the feature space and the instance space. To address this challenge, we propose a reinforcement learning solution that accomplishes the joint selection task while simultaneously capturing the interaction between the selection of each feature and each instance. In particular, a sequential-scanning mechanism is designed as the action strategy of the agents, and a collaborative-changing environment is used to enhance agent collaboration. In addition, an interactive paradigm introduces prior selection knowledge to help agents explore more efficiently. Finally, extensive experiments on real-world datasets have demonstrated improved performance.

AAAI Conference 2021 Conference Paper

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention

  • Zhiqi Huang
  • Fenglin Liu
  • Xian Wu
  • Shen Ge
  • Helin Wang
  • Wei Fan
  • Yuexian Zou

While Machine Comprehension (MC) has attracted extensive research interest in recent years, existing approaches mainly belong to the category of Machine Reading Comprehension, which mines textual inputs (paragraphs and questions) to predict answers (choices or text spans). However, many MC tasks accept audio input in addition to textual input, e.g., English listening comprehension tests. In this paper, we target the problem of Audio-Oriented Multimodal Machine Comprehension, whose goal is to answer questions based on the given audio and textual information. To solve this problem, we propose a Dynamic Inter- and Intra-modality Attention (DIIA) model to effectively fuse the two modalities (audio and textual). DIIA can work as an independent component and thus be easily integrated into existing MC models. Moreover, we further develop a Multimodal Knowledge Distillation (MKD) module to enable our multimodal MC model to accurately predict answers based only on either the text or the audio. As a result, the proposed approach can handle various tasks, including Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension, and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and existing unimodal MC models. Experimental results and analysis prove the effectiveness of the proposed approaches. First, the proposed DIIA boosts the baseline models by up to 21.08% in terms of accuracy; second, under unimodal scenarios, the MKD module allows our multimodal MC model to significantly outperform unimodal models, which are trained and tested with only audio or textual data, by up to 18.87%.

AAAI Conference 2021 Conference Paper

U-BERT: Pre-training User Representations for Improved Recommendation

  • Zhaopeng Qiu
  • Xian Wu
  • Jingyue Gao
  • Wei Fan

Learning user representations is a critical task for recommendation systems, as these representations encode user preferences for personalized services. User representations are generally learned from behavior data, such as clicking interactions and review comments. However, for less popular domains, the behavior data is insufficient to learn precise user representations. To deal with this problem, a natural thought is to leverage content-rich domains to complement user representations. Inspired by the recent success of BERT in NLP, we propose a novel pre-training and fine-tuning based approach, U-BERT. Different from typical BERT applications, U-BERT is customized for recommendation and utilizes different frameworks in pre-training and fine-tuning. In pre-training, U-BERT focuses on content-rich domains and introduces a user encoder and a review encoder to model users' behaviors. Two pre-training strategies are proposed to learn general user representations. In fine-tuning, U-BERT focuses on the target content-insufficient domains. In addition to the user and review encoders inherited from the pre-training stage, U-BERT further introduces an item encoder to model item representations. Besides, a review co-matching layer is proposed to capture more semantic interactions between the reviews of the user and the item. Finally, U-BERT combines user representations, item representations, and review interaction information to improve recommendation performance. Experiments on six benchmark datasets from different domains demonstrate the state-of-the-art performance of U-BERT.

IJCAI Conference 2020 Conference Paper

Entity Synonym Discovery via Multipiece Bilateral Context Matching

  • Chenwei Zhang
  • Yaliang Li
  • Nan Du
  • Wei Fan
  • Philip S. Yu

Being able to automatically discover synonymous entities in an open-world setting benefits various tasks such as entity disambiguation and knowledge graph canonicalization. Existing works either only utilize entity features or rely on structured annotations from a single piece of context in which the entity is mentioned. To leverage the diverse contexts in which entities are mentioned, in this paper we generalize the distributional hypothesis to a multi-context setting and propose a synonym discovery framework that detects entity synonyms from free-text corpora with considerations of effectiveness and robustness. As one of the key components in synonym discovery, we introduce a neural network model, SynonymNet, to determine whether two given entities are synonyms of each other. Instead of using entity features, SynonymNet makes use of multiple pieces of contexts in which the entity is mentioned and compares the context-level similarity via a bilateral matching schema. Experimental results demonstrate that the proposed model is able to detect synonym sets that are not observed during training on both generic and domain-specific datasets: Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in terms of Area Under the Curve and 3.19% in terms of Mean Average Precision compared to the best baseline method.

AAAI Conference 2020 Conference Paper

Federated Learning for Vision-and-Language Grounding Problems

  • Fenglin Liu
  • Xian Wu
  • Shen Ge
  • Wei Fan
  • Yuexian Zou

Recently, vision-and-language grounding problems, e.g., image captioning and visual question answering (VQA), have attracted extensive interest from both academia and industry. However, given the similarity of these tasks, efforts to obtain better results by combining the merits of their algorithms have not been well studied. Inspired by the recent success of federated learning, we propose a federated learning framework to obtain various types of image representations from different tasks, which are then fused together to form fine-grained image representations. The representations merge useful features from different vision-and-language grounding problems, and are thus much more powerful than the original representations alone in individual tasks. To learn such image representations, we propose the Aligning, Integrating and Mapping Network (aimNet). The aimNet is validated in three federated learning settings: horizontal federated learning, vertical federated learning, and federated transfer learning. Experiments with the aimNet-based federated learning framework on two representative tasks, i.e., image captioning and VQA, demonstrate effective and universal improvements on all metrics over the baselines. In image captioning, we obtain relative gains of 14% and 13% on the task-specific metrics CIDEr and SPICE, respectively. In VQA, we also boost the performance of strong baselines by up to 3%.

AAAI Conference 2020 Conference Paper

Object-Guided Instance Segmentation for Biological Images

  • Jingru Yi
  • Hui Tang
  • Pengxiang Wu
  • Bo Liu
  • Daniel J. Hoeppner
  • Dimitris N. Metaxas
  • Lianyi Han
  • Wei Fan

Instance segmentation of biological images is essential for studying object behaviors and properties. Challenges such as clustering, occlusion, and adhesion of objects make instance segmentation a non-trivial task. Current box-free instance segmentation methods typically rely on local pixel-level information. Lacking a global view of the objects, these methods are prone to over- or under-segmentation. In contrast, box-based instance segmentation methods incorporate object detection into segmentation and perform better at identifying individual instances. In this paper, we propose a new box-based instance segmentation method. Specifically, we locate object bounding boxes from their center points. The object features are subsequently reused in the segmentation branch as a guide to separate clustered instances within an RoI patch. Along with instance normalization, the model is able to recover the target object distribution and suppress the distributions of neighboring attached objects. Consequently, the proposed model performs excellently in segmenting clustered objects while retaining the target object details. The proposed method achieves state-of-the-art performance on three biological datasets: cell nuclei, a plant phenotyping dataset, and neural cells.

AAAI Conference 2020 Conference Paper

On the Generation of Medical Question-Answer Pairs

  • Sheng Shen
  • Yaliang Li
  • Nan Du
  • Xian Wu
  • Yusheng Xie
  • Shen Ge
  • Tao Yang
  • Kai Wang

Question answering (QA) has achieved promising progress recently. However, answering a question in real-world scenarios like the medical domain is still challenging, due to the requirement of external knowledge and the insufficient quantity of high-quality training data. In light of these challenges, we study the task of generating medical QA pairs in this paper. With the insight that each medical question can be considered as a sample from the latent distribution of questions given answers, we propose an automated medical QA pair generation framework, consisting of an unsupervised key phrase detector that explores unstructured material for validity, and a generator that involves a multi-pass decoder to integrate structural knowledge for diversity. A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system.

NeurIPS Conference 2020 Conference Paper

Prophet Attention: Predicting Attention with Future Attention

  • Fenglin Liu
  • Xuancheng Ren
  • Xian Wu
  • Shen Ge
  • Wei Fan
  • Yuexian Zou
  • Xu Sun

Recently, attention-based models have been used extensively in many sequence-to-sequence learning systems. Especially for image captioning, attention-based models are expected to ground correct image regions with properly generated words. However, for each time step in the decoding process, attention-based models usually use the hidden state of the current input to attend to the image regions. Under this setting, these attention models have a "deviated focus" problem: they calculate the attention weights based on previous words instead of the one to be generated, impairing the performance of both grounding and captioning. In this paper, we propose Prophet Attention, which takes a form similar to self-supervision. In the training stage, this module utilizes future information to calculate the "ideal" attention weights over image regions. These calculated "ideal" weights are further used to regularize the "deviated" attention. In this manner, image regions are grounded with the correct words. The proposed Prophet Attention can be easily incorporated into existing image captioning models to improve their performance in both grounding and captioning. Experiments on the Flickr30k Entities and MSCOCO datasets show that the proposed Prophet Attention consistently outperforms baselines in both automatic metrics and human evaluations. It is worth noting that we set new state-of-the-art results on the two benchmark datasets and achieve first place on the leaderboard of the online MSCOCO benchmark in terms of the default ranking score, i.e., CIDEr-c40.

AAAI Conference 2020 Conference Paper

PSENet: Psoriasis Severity Evaluation Network

  • Yi Li
  • Zhe Wu
  • Shuang Zhao
  • Xian Wu
  • Yehong Kuang
  • YangTian Yan
  • Shen Ge
  • Kai Wang

Psoriasis is a chronic skin disease which affects hundreds of millions of people around the world. The disease cannot be fully cured and requires lifelong care. If the deterioration of psoriasis is not detected and properly treated in time, it can cause serious complications or even become life-threatening. Therefore, a quantitative measurement that can track psoriasis severity is necessary. Currently, PASI (Psoriasis Area and Severity Index) is the most frequently used measurement in clinical practice. However, PASI has the following disadvantages: (1) Time consuming: calculating PASI usually takes more than 30 minutes, which poses a heavy burden on dermatologists; and (2) Inconsistency: due to the complexity of PASI calculation, different or even the same dermatologists can give different scores for the same case. To overcome these drawbacks, we propose PSENet, which applies deep neural networks to estimate psoriasis severity based on skin lesion images. Different from typical deep learning frameworks for image processing, PSENet has the following characteristics: (1) PSENet introduces a score refine module which is able to capture the visual features of skin at both coarse and fine-grained granularities; (2) PSENet uses a siamese structure in training and accepts pairwise inputs, which reduces the dependency on large amounts of training data; and (3) PSENet can not only estimate severity but also locate skin lesion regions in the input image. To train and evaluate PSENet, we worked with professional dermatologists from a top hospital and spent years building a gold-standard dataset. The experimental results show that PSENet achieves a mean absolute error of 2.21 and an accuracy of 77.87% in pair comparison, outperforming baseline methods. Overall, PSENet not only relieves dermatologists from the dull PASI calculation but also enables patients to track psoriasis severity in a much more convenient manner.

AAAI Conference 2020 Short Paper

Self-Supervised, Semi-Supervised, Multi-Context Learning for the Combined Classification and Segmentation of Medical Images (Student Abstract)

  • Abdullah-Al-Zubaer Imran
  • Chao Huang
  • Hui Tang
  • Wei Fan
  • Yuan Xiao
  • Dingjun Hao
  • Zhen Qian
  • Demetri Terzopoulos

To tackle the problem of limited annotated data, semi-supervised learning is attracting attention as an alternative to fully supervised models. Moreover, optimizing a multiple-task model to learn "multiple contexts" can provide better generalizability compared to single-task models. We propose a novel semi-supervised multiple-task model leveraging self-supervision and adversarial training, namely self-supervised, semi-supervised, multi-context learning (S4MCL), and apply it to two crucial medical imaging tasks, classification and segmentation. Our experiments on spine X-rays reveal that the S4MCL model significantly outperforms semi-supervised single-task, semi-supervised multi-context, and fully-supervised single-task models, even with a 50% reduction in classification and segmentation labels.

AAAI Conference 2020 Conference Paper

Shape-Aware Organ Segmentation by Predicting Signed Distance Maps

  • Yuan Xue
  • Hui Tang
  • Zhi Qiao
  • Guanzhong Gong
  • Yong Yin
  • Zhen Qian
  • Chao Huang
  • Wei Fan

In this work, we address an issue in current deep learning based organ segmentation systems: they often produce results that do not capture the overall shape of the target organ and often lack smoothness. Since there is a rigorous mapping between the Signed Distance Map (SDM) calculated from object boundary contours and the binary segmentation map, we exploit the feasibility of learning the SDM directly from medical scans. By converting the segmentation task into predicting an SDM, we show that our proposed method retains superior segmentation performance and produces shapes with better smoothness and continuity. To leverage the complementary information in traditional segmentation training, we introduce an approximated Heaviside function to train the model by predicting SDMs and segmentation maps simultaneously. We validate our proposed models by conducting extensive experiments on a hippocampus segmentation dataset and the public MICCAI 2015 Head and Neck Auto Segmentation Challenge dataset with multiple organs. While our carefully designed backbone 3D segmentation network improves the Dice coefficient by more than 5% compared to the current state of the art, the proposed model with SDM learning produces smoother segmentation results with smaller Hausdorff distance and average surface distance, proving the effectiveness of our method.
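
The SDM-to-segmentation mapping can be sketched on a tiny 2D binary mask: a brute-force signed distance map (negative inside the object, positive outside; this sign convention is an assumption) and a smooth approximated Heaviside that converts the SDM back into a soft segmentation map, which is what allows SDM and segmentation losses to be trained jointly. This is an illustrative sketch, not the paper's 3D implementation.

```python
import math

def signed_distance_map(mask):
    """Brute-force SDM for a small binary mask: distance to the object
    boundary, negated inside the object. Assumes the mask has a boundary."""
    h, w = len(mask), len(mask[0])
    boundary = [(i, j) for i in range(h) for j in range(w)
                if mask[i][j] == 1 and any(
                    0 <= i + di < h and 0 <= j + dj < w and mask[i + di][j + dj] == 0
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))]
    sdm = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            d = min(math.hypot(i - bi, j - bj) for bi, bj in boundary)
            sdm[i][j] = -d if mask[i][j] == 1 else d
    return sdm

def approx_heaviside(sdm, k=10.0):
    """Smooth Heaviside (a sigmoid of the negated SDM): maps an SDM back to
    a soft segmentation map, so a segmentation loss can also be applied."""
    return [[1.0 / (1.0 + math.exp(k * v)) for v in row] for row in sdm]

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
sdm = signed_distance_map(mask)
soft = approx_heaviside(sdm)
```

Thresholding `soft` at 0.5 recovers the original mask, illustrating why predicting the SDM loses no segmentation information while encoding shape everywhere in the image.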

IJCAI Conference 2019 Conference Paper

MNN: Multimodal Attentional Neural Networks for Diagnosis Prediction

  • Zhi Qiao
  • Xian Wu
  • Shen Ge
  • Wei Fan

Diagnosis prediction plays a key role in the clinical decision support process and has attracted extensive research attention recently. Existing studies mainly utilize discrete medical codes (e.g., ICD codes and procedure codes) as the primary features in prediction. However, in real clinical settings, such medical codes can be incomplete or erroneous. For example, a missed diagnosis omits codes that should be included, while a misdiagnosis generates incorrect medical codes. To increase robustness to noisy data, we introduce textual clinical notes in addition to medical codes. Combining information from both sides leads to an improved understanding of clinical health conditions. To accommodate both textual notes and discrete medical codes in the same framework, we propose Multimodal Attentional Neural Networks (MNN), which integrate multi-modal data in a collaborative manner. Experimental results on real-world EHR datasets demonstrate the advantages of MNN in terms of both robustness and accuracy.

AAAI Conference 2019 Conference Paper

Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering

  • Yang Deng
  • Yuexiang Xie
  • Yaliang Li
  • Min Yang
  • Nan Du
  • Wei Fan
  • Kai Lei
  • Ying Shen

Answer selection and knowledge base question answering (KBQA) are two important tasks in question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlations between the tasks. In this paper, we tackle answer selection and KBQA simultaneously via multi-task learning (MTL), motivated by the following observations. First, both answer selection and KBQA can be regarded as ranking problems, one at the text level and the other at the knowledge level. Second, these two tasks can benefit each other: answer selection can incorporate external knowledge from the knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To jointly learn these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable the tasks to interact with each other and to learn more comprehensive sentence representations. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method: the performance of both answer selection and KBQA is improved. Also, the multi-view attention scheme proves effective in assembling attentive information from different representational perspectives.

NeurIPS Conference 2014 Conference Paper

Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion

  • Yuanyuan Liu
  • Fanhua Shang
  • Wei Fan
  • James Cheng
  • Hong Cheng

Low-rank tensor estimation has been frequently applied to many real-world problems. Despite successful applications, existing Schatten 1-norm minimization (SNM) methods can become very slow or even inapplicable for large-scale problems. To address this difficulty, we propose an efficient and scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion, with much lower computational complexity. We first establish the equivalence between the Schatten 1-norm of a low-rank tensor and that of its core tensor. The Schatten 1-norm of the core tensor is then used in place of that of the whole tensor, which leads to a much smaller-scale matrix SNM problem. Finally, an efficient algorithm with a rank-increasing scheme is developed to solve the proposed problem with a convergence guarantee. Extensive experimental results show that our method is usually more accurate than state-of-the-art methods, and is orders of magnitude faster.

AAAI Conference 2012 Conference Paper

Sembler: Ensembling Crowd Sequential Labeling for Improved Quality

  • Xian Wu
  • Wei Fan
  • Yong Yu

Many natural language processing tasks, such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation, can be formulated as sequential data labeling problems. Building a sound labeler requires a very large number of correctly labeled training examples, which may not always be available. On the other hand, crowdsourcing provides an inexpensive yet efficient alternative for collecting sequential labels from non-experts. However, the quality of crowd labeling cannot be guaranteed, and three kinds of errors are typical: (1) incorrect annotations due to lack of expertise (e.g., labeling gene names in plain text requires corresponding domain knowledge); (2) ignored or omitted annotations due to carelessness or low confidence; (3) noisy annotations due to cheating or vandalism. To correct these mistakes, we present Sembler, a statistical model for ensembling crowd sequential labelings. Sembler considers three types of statistical information: (1) majority agreement, which supports the correctness of an annotation; (2) correct annotations, which improve the credibility of the corresponding annotator; (3) correct annotations, which enhance the correctness of other annotations that share similar linguistic or contextual features. We evaluate the proposed model on a real Twitter dataset and a synthetic biological dataset, and find that Sembler is particularly accurate when more than half of the annotators make mistakes.

NeurIPS Conference 2009 Conference Paper

Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

  • Jing Gao
  • Feng Liang
  • Wei Fan
  • Yizhou Sun
  • Jiawei Han

Little work has been done to directly combine the outputs of multiple supervised and unsupervised models, yet doing so can increase the accuracy and applicability of ensemble methods. First, we can boost the diversity of a classification ensemble by incorporating multiple clustering outputs, each of which provides grouping constraints for the joint label predictions of a set of related objects. Second, combining at the output level extends ensembles of supervised models to applications that have access only to meta-level model outputs rather than raw data. In this paper, we aim to compute a consolidated classification solution for a set of objects by maximizing the consensus among both supervised predictions and unsupervised grouping constraints. We seek a globally optimal label assignment for the target objects, which differs from the result of traditional majority voting and model combination approaches. We cast the problem as an optimization problem on a bipartite graph, where the objective function favors smoothness of the conditional probability estimates over the graph and penalizes deviation from the initial labeling given by the supervised models. We solve the problem through iterative propagation of conditional probability estimates among neighboring nodes, and interpret the method both as a constrained embedding in a transformed space and as a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives.
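
The iterative propagation step can be sketched on a tiny bipartite graph: object nodes alternate updates with group nodes (a group being one classifier's predicted class or one clustering's cluster), with a penalty pulling objects back toward their supervised labels. This is a simplified sketch of that update style, not the paper's exact objective; `alpha`, the matrices, and the update form are illustrative:

```python
import numpy as np

def consensus(A, Y, alpha=0.5, iters=50):
    # A: object-by-group incidence matrix (1 if object belongs to group)
    # Y: initial class probabilities from supervised models (objects x classes)
    # alpha: weight pulling object estimates back toward the supervised labels
    U = Y.copy()                      # object-level class estimates
    deg_g = A.sum(axis=0)             # group degrees
    deg_o = A.sum(axis=1)             # object degrees
    for _ in range(iters):
        Q = (A.T @ U) / deg_g[:, None]                      # group = mean of members
        U = (A @ Q + alpha * Y) / (deg_o + alpha)[:, None]  # blend neighbors, priors
    return U

# three objects, two groups (one classifier's class, one clustering's cluster)
A = np.array([[1., 1.], [1., 1.], [0., 1.]])
Y = np.array([[1., 0.], [1., 0.], [0., 0.]])   # third object has no supervised label
U = consensus(A, Y)
print(U.argmax(axis=1))  # → [0 0 0]
```

The unlabeled third object inherits class 0 purely through the cluster it shares with the two labeled objects, which is the grouping-constraint effect the abstract describes.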

AAAI Conference 2004 Conference Paper

On the Optimality of Probability Estimation by Random Decision Trees

  • Wei Fan

A random decision tree ensemble is an ensemble of decision trees in which the feature at any node of a tree is chosen randomly from the remaining features. A discrete feature chosen on a decision path cannot be chosen again; a continuous feature can be chosen multiple times, each time with a different splitting value. During classification, each tree outputs a raw posterior probability, and the probabilities from all trees in the ensemble are averaged to form the final posterior estimate. Although remarkably simple and somewhat counter-intuitive, random decision trees have been shown to be highly accurate under 0-1 loss and cost-sensitive loss functions. A preliminary explanation attributed this accuracy to the “error-tolerance” property of probabilistic decision making. Our study shows that the actual reason for the random tree’s superior performance is its optimal approximation of each example’s true probability of belonging to a given class.
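
The prediction rule the abstract describes, each tree reporting the raw class proportions at its leaf and the ensemble taking a plain mean, is easy to isolate. A toy sketch of just that averaging step (the leaf counts are made up; tree construction itself is omitted):

```python
import numpy as np

leaf_counts = np.array([       # class counts at the leaf the example reaches in each tree
    [8, 2],                    # tree 1: 8 positives, 2 negatives at its leaf
    [5, 5],                    # tree 2
    [9, 1],                    # tree 3
])
# raw (unsmoothed) posterior from each tree: leaf class proportions
posteriors = leaf_counts / leaf_counts.sum(axis=1, keepdims=True)
p_hat = posteriors.mean(axis=0)   # ensemble posterior = plain average over trees
print(p_hat)   # → [0.73333333 0.26666667]
```

Averaging the raw proportions, rather than voting on hard labels, is what makes the ensemble output a probability estimate that can be thresholded differently under cost-sensitive loss.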

IJCAI Conference 2003 Conference Paper

Inductive Learning in Less Than One Sequential Data Scan

  • Wei Fan
  • Haixun Wang
  • Philip S. Yu
  • Shaw-Hwa Lo

Most recent research on scalable inductive learning over very large datasets, decision tree construction in particular, focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art decision tree construction algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. We first discuss a general inductive learning framework that scans the dataset exactly once. Then, we propose an extension based on Hoeffding's inequality that scans the dataset less than once. Our frameworks are applicable to a wide range of inductive learners.
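
The Hoeffding-inequality idea behind "less than one scan" is that a decision can be frozen once the quantity estimated from the examples seen so far is, with high probability, within ε of its true value. A generic sketch of that stopping rule (the bound is the standard Hoeffding bound; the gap value, batch size, and δ are illustrative, not from the paper):

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    # With probability at least 1 - delta, the true mean of a quantity with
    # range `value_range` lies within epsilon of its mean over n i.i.d. samples.
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

# e.g. commit to a decision once the observed gap between the best and
# runner-up choice exceeds epsilon, and stop reading further examples:
gap = 0.08       # observed criterion gap (illustrative)
n = 0
for _ in range(20):
    n += 200                               # examples consumed so far in the scan
    eps = hoeffding_epsilon(1.0, 1e-6, n)
    if gap > eps:                          # gap is now statistically reliable
        break
print(n, round(eps, 4))   # → 1200 0.0759
```

Because ε shrinks as 1/√n, confident decisions on easy choices are reached early, so the learner can stop before exhausting the scan.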

AAAI Conference 2002 Conference Paper

Pruning and Dynamic Scheduling of Cost-Sensitive Ensembles

  • Wei Fan

Previous research has shown that an averaging ensemble can scale up learning over very large cost-sensitive datasets with linear speedup, independent of the learning algorithm, while achieving the same or even better accuracy than a single model computed from the entire dataset. However, one major drawback is its inefficiency at prediction time, since every base model in the ensemble has to be consulted to produce a final prediction. In this paper, we propose several approaches to reduce the number of base classifiers. Among the various methods explored, our empirical studies show that the benefit-based greedy approach can safely remove more than 90% of the base models while maintaining or even exceeding the prediction accuracy of the original ensemble. Assuming that each base classifier consumes one unit of prediction time, removing 90% of the base classifiers translates to a 10-fold prediction speedup. On top of pruning, we propose a novel dynamic scheduling approach to further reduce the “expected” number of classifiers employed in prediction. It measures the confidence of a prediction made by a subset of classifiers in the pruned ensemble, and uses this confidence to decide whether more classifiers are needed to produce the same prediction as the original unpruned ensemble. This approach reduces the “expected” number of classifiers by another 25% to 75% without loss of accuracy.
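
The dynamic-scheduling idea, consulting base classifiers one at a time and stopping once the running vote is confident, can be sketched in a few lines. The function name, the 0.9 threshold, the minimum of three models, and the toy base models are all illustrative assumptions, not the paper's exact procedure:

```python
def scheduled_predict(classifiers, x, threshold=0.9, min_consulted=3):
    # Consult base classifiers in order; stop early once the confidence of the
    # running average is high enough, instead of evaluating the whole ensemble.
    votes = 0.0
    for consulted, clf in enumerate(classifiers, start=1):
        votes += clf(x)                      # each clf returns P(positive) for x
        p = votes / consulted                # running mean estimate
        margin = max(p, 1.0 - p)             # confidence in the leading class
        if margin >= threshold and consulted >= min_consulted:
            break
    return int(p >= 0.5), consulted

# ten toy base models that mostly agree on the positive class
models = [lambda x, b=b: b
          for b in [0.9, 0.95, 1.0, 0.8, 0.9, 1.0, 0.85, 0.9, 1.0, 0.95]]
label, used = scheduled_predict(models, x=None)
print(label, used)   # → 1 3
```

When the first few models agree strongly, the remaining ones are never evaluated, which is where the additional 25% to 75% reduction in the "expected" number of consulted classifiers comes from.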