Arrow Research search

Author name cluster

Wei Fan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

49 papers
2 author rows

Possible papers


JBHI Journal 2026 Journal Article

DPGOK: A Deep Learning-Based Method for Protein Function Prediction by Fusing GO Knowledge With Protein Features

  • Qiurong Yang
  • Wenkang Wang
  • Wei Fan
  • Ruiqing Zheng
  • Min Li

Accurately predicting protein functions is critical for understanding disease mechanisms and discovering potential drug targets. Gene Ontology (GO), with its hierarchical and semantic information, provides valuable context that can be integrated to improve prediction accuracy. Several methods have recently attempted to integrate GO knowledge with protein sequence features for function prediction. However, these methods ignore the fact that GO embeddings should be tailored to proteins to reflect protein-specific functional relevance. To address this limitation, we propose DPGOK, a deep learning-based method that fuses protein-aware GO representations with protein features for function prediction. DPGOK first learns GO semantic representations with a knowledge graph loss and then generates protein-aware GO embeddings under the guidance of protein features. Results show that DPGOK outperforms state-of-the-art methods across all GO domains. Additional experiments demonstrate that DPGOK is capable of discovering hierarchically deeper and more informative functions for target proteins. Ablation studies reveal that the knowledge graph loss we introduce contributes to more stable and semantically coherent GO representations across domains. Finally, we find that predictive performance can be further improved when DPGOK is combined with homology-based approaches.

AAAI Conference 2026 Conference Paper

iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference

  • Wei Fan
  • JinYi Yoon
  • Bo Ji

Large Language Model (LLM) agent systems have advanced rapidly, driven by their strong generalization in zero-shot settings. To further enhance reasoning and accuracy on complex tasks, Multi-Agent Debate (MAD) has emerged as a promising framework that engages multiple LLM agents in structured debates to encourage diverse reasoning. However, triggering MAD for every query is inefficient, as it incurs substantial computational (token) cost and may even degrade accuracy by overturning correct answers from a single agent. To address these limitations, we propose intelligent Multi-Agent Debate (iMAD), a token-efficient framework that selectively triggers MAD only when it is likely to be beneficial (i.e., correcting an initially wrong answer). To achieve this goal, iMAD learns generalizable model behaviors to make accurate debate decisions. Specifically, iMAD first prompts a single agent to produce a structured self-critique response, from which we extract 41 interpretable linguistic and semantic features capturing hesitation cues. Then, iMAD uses a lightweight debate decision classifier, trained using our proposed FocusCal loss without test-dataset-specific tuning, to make robust zero-shot debate decisions. Through extensive experiments using six (visual) question answering datasets against five competitive baselines, we show that iMAD significantly reduces token usage (by up to 92%) while also improving final answer accuracy (by up to 13.5%).

AAAI Conference 2026 Conference Paper

MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion

  • Haolong Xiang
  • Peisi Wang
  • Xiaolong Xu
  • Kun Yi
  • Xuyun Zhang
  • Quan Z. Sheng
  • Amin Beheshti
  • Wei Fan

With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, providing a strong foundation for ensuring safe travel, reducing traffic congestion, and optimizing urban mobility. Most existing methods for traffic time series modeling rely on the original data modality, i.e., direct numerical readings from city sensors. However, this unimodal approach overlooks the semantic information present in heterogeneous multimodal urban data, which hinders a comprehensive understanding of traffic signals and limits the accurate prediction of complex traffic dynamics. To address this problem, we propose a novel Multimodal framework, MTP, for urban Traffic Profiling, which learns multimodal features through numeric, visual, and textual perspectives in the frequency domain. The three branches provide complementary views of the traffic signals for augmentation, while frequency-domain learning strategies refine the extracted information. Specifically, we first conduct visual augmentation for the traffic time series, transforming the original modality into periodicity images and frequency images for visual learning. We also augment descriptive texts for the traffic time series based on the specific topic, background information, and item description for textual learning. To complement the numeric information, we utilize frequency multilayer perceptrons for learning on the original modality. We design a hierarchical contrastive learning scheme across the three branches to fuse the modalities. Finally, extensive experiments on six real-world datasets demonstrate superior performance compared with state-of-the-art approaches.

ICML Conference 2025 Conference Paper

Actor-Critics Can Achieve Optimal Sample Efficiency

  • Kevin Tan
  • Wei Fan
  • Yuting Wei 0001

Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work has successfully learned an $\epsilon$-optimal policy with a sample complexity of $O(1/\epsilon^2)$ trajectories with general function approximation when strategic exploration is necessary. We address this open problem by introducing a novel actor-critic algorithm that attains a sample complexity of $O(dH^5 \log|\mathcal{A}|/\epsilon^2 + d H^4 \log|\mathcal{F}|/ \epsilon^2)$ trajectories, and an accompanying $\sqrt{T}$ regret bound when the Bellman eluder dimension $d$ does not increase with $T$ at more than a $\log T$ rate. Here, $\mathcal{F}$ is the critic function class, and $\mathcal{A}$ is the action space. Our algorithm integrates optimism, off-policy critic estimation targeting the optimal Q-function, and rare-switching policy resets. We extend this to the setting of Hybrid RL, where we show that initializing the critic with offline data yields sample-efficiency gains, and we also provide a non-optimistic, provably efficient actor-critic algorithm, addressing another open problem in the literature. Numerical experiments support our theoretical findings.

AAAI Conference 2025 Conference Paper

Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting

  • Jingru Fei
  • Kun Yi
  • Wei Fan
  • Qi Zhang
  • Zhendong Niu

We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy restoration block returns the energy to its original level. Moreover, considering that the energy-amplified data typically displays two distinct energy peaks in the frequency spectrum, we integrate the energy amplification technique with a seasonal-trend forecaster to model the temporal relationships of these two peaks independently, serving as the backbone for our proposed model, Amplifier. Additionally, we propose a semi-channel interaction temporal relationship enhancement block for Amplifier, which enhances the model's ability to capture temporal relationships from the perspective of the commonality and specificity of each channel in the data. Extensive experiments on eight time series forecasting benchmarks consistently demonstrate our model's superiority in both effectiveness and efficiency compared to state-of-the-art methods.
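The amplify-then-restore idea described above can be sketched generically in the frequency domain. The gain rule below (boost bins whose energy is below the median) is an assumption chosen purely for illustration, not the paper's actual blocks; the point is only that recording the gains makes the amplification exactly invertible:

```python
import numpy as np

# Toy amplify/restore pair in the frequency domain (the gain rule is an
# assumed stand-in, not Amplifier's): boost low-energy bins, keep the
# gains, and divide them back out so the round trip is lossless.
rng = np.random.default_rng(1)
x = rng.normal(size=64)

spec = np.fft.rfft(x)
energy = np.abs(spec) ** 2
gains = np.where(energy < np.median(energy), 4.0, 1.0)  # assumed gain rule

amplified = np.fft.irfft(spec * gains, n=x.size)   # what a model would see
restored = np.fft.irfft(np.fft.rfft(amplified) / gains, n=x.size)

print(np.allclose(restored, x))  # → True
```

Because the gains are real and per-bin, restoration is a simple element-wise division; any invertible gain rule would work the same way.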

ICML Conference 2025 Conference Paper

De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

  • Wei Fan
  • Kejiang Chen
  • Chang Liu 0089
  • Weiming Zhang 0001
  • Nenghai Yu

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.

IJCAI Conference 2025 Conference Paper

Empowering Multimodal Road Traffic Profiling with Vision Language Models and Frequency Spectrum Fusion

  • Haolong Xiang
  • Xiaolong Xu
  • Guangdong Wang
  • Xuyun Zhang
  • Xiaoyong Li
  • Qi Zhang
  • Amin Beheshti
  • Wei Fan

With the rapid urbanization of the modern era, smart traffic profiling based on multimodal data sources has been playing a significant role in ensuring safe travel, reducing traffic congestion, and optimizing urban mobility. Most existing methods for traffic profiling at the road level utilize single-modality data, i.e., they mainly focus on image processing with deep vision models or auxiliary analysis of textual data. However, the joint modeling and multimodal fusion of the textual and visual modalities have rarely been studied in road traffic profiling, which largely hinders the accurate prediction or classification of traffic conditions. To address this issue, we propose a novel multimodal learning and fusion framework for road traffic profiling, named TraffiCFUS. Specifically, given the traffic images, our TraffiCFUS framework first introduces Vision Language Models (VLMs) to generate text and then creates tailored prompt instructions for refining this text according to the specific scene requirements of road traffic profiling. Next, we apply the discrete Fourier transform to convert multimodal data from the spatial domain to the frequency domain and perform a cross-modal spectrum transform to filter out information irrelevant to traffic profiling. Furthermore, the processed spatial multimodal data is combined to generate a fusion loss and an interaction loss with contrastive learning. Finally, extensive experiments on four real-world datasets illustrate superior performance compared with state-of-the-art approaches.

AAAI Conference 2025 Conference Paper

Enhancing Generalizability in Molecular Conformation Generation with METRIZATION-Informed Geometric Diffusion Pretraining

  • Xiaozhuang Song
  • Yuzhao Tu
  • Hangting Ye
  • Wei Fan
  • Qingquan Zhang
  • Xiaoxue Wang
  • Tianshu Yu

Diffusion-based generative models have recently excelled in generating molecular conformations but struggle with the generalization issue -- models trained on one dataset may produce meaningless conformations for out-of-distribution molecules. On the other hand, distance geometry serves as a generalizable tool in traditional computational chemistry methods for molecular conformation, predicated on the assumption that the set of all potential conformations of any non-rigid molecular system can be adequately defined using purely geometric constraints. In this work, we explicitly incorporate, for the first time, distance geometry constraints into the pretraining phase of diffusion-based molecular generation models to improve generalizability. Inspired by the classical distance geometry solution for the molecular distance geometry problem, we propose MiGDiff, a Metrization-Informed Geometric Diffusion framework. MiGDiff injects distance geometry constraints by pretraining the deep geometric diffusion backbone within the Metrization sampling approach, yielding a "Metrization-driven pretraining + Data-driven finetuning" paradigm. Experimental results demonstrate that MiGDiff outperforms state-of-the-art methods and possesses strong generalization capabilities, particularly in generating previously unseen molecules, revealing the vast untapped potential of combining traditional computational methods with deep generative models for 3D molecular generation.

IJCAI Conference 2025 Conference Paper

RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming

  • Hao Wang
  • Jindong Han
  • Wei Fan
  • Leilei Sun
  • Hao Liu

Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies reveal that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a semantic-oriented PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a semantic-oriented decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates the PLM's understanding of sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting. Code and the appendix are available at https://github.com/usail-hkust/REPST.

JBHI Journal 2025 Journal Article

TransScore: A Graph Model for Pose Scoring and Affinity Prediction Based on Transformer Convolution Network

  • Chuqi Lei
  • Wenkang Wang
  • Wei Fan
  • Zhangli Lu
  • Jing Tang
  • Min Li

Predicting protein-compound interactions is an important task in drug discovery. Molecular docking has been a fundamental and vital computer-aided tool for uncovering potential interactions of protein-compound pairs. With the recent great success of artificial intelligence (AI), the scoring function, a fundamental part of molecular docking, has achieved much better performance by incorporating AI-based models. However, AI-based models usually focus on a single prediction task (e.g., affinity prediction), which limits their extensibility. Moreover, the performance of AI-based models usually declines in cold-start scenarios, compromising their robustness. To this end, we propose TransScore, a novel deep learning-based graph model built on a transformer convolution network for pose scoring and affinity prediction. TransScore captures the intrinsic characteristics of protein-compound poses by employing the self-attention mechanism, achieving superior performance in both cold- and warm-start scenarios for the pose-scoring task. Strong performance is also shown on imbalanced datasets, which demonstrates the robustness of TransScore. In addition, the gated residual algorithm in TransScore enables the model to adapt to diverse related tasks. In particular, in the affinity prediction task, we have observed consistent improvements in warm/cold-start scenarios. Moreover, TransScore excels in both accuracy and precision, accurately predicting affinities and their relative ordering. We also conducted an analysis on carbonic anhydrase II, which bears out that TransScore can elucidate the interaction mechanism of a protein-ligand pair, suggesting the potential application of TransScore in drug discovery.

NeurIPS Conference 2025 Conference Paper

Vector Database Watermarking

  • Zhiwen Ren
  • Wei Fan
  • Qiyi Yao
  • Jing Qiu
  • Weiming Zhang
  • Nenghai Yu

Vector databases support machine learning tasks with Approximate Nearest Neighbour (ANN) query functionality, making them highly valuable digital assets. However, they also face security threats such as unauthorized replication. By stealthily embedding information, watermarking technology can be used for ownership authentication. This paper introduces a watermarking scheme specifically designed for vector databases. The scheme consists of four steps: generating identifiers, grouping, cryptographic mapping, and modification. Since watermark embedding requires modifying certain vectors, it may negatively affect ANN query results. Further investigation reveals that in the Hierarchical Navigable Small World (HNSW) indexing structure widely used for vector databases, heuristic edge selection and pruning strategies leave some vectors with few or even no edges. These vectors exhibit significantly lower query frequencies than others, so modifying them has less impact on query results. Based on this observation, we propose the Transparent Vector Priority (TVP) watermarking scheme, which prioritizes embedding the watermark in these low-query-frequency “transparent” vectors to minimize the impact of watermark embedding on query results. Experimental results show that, compared to the most effective relevant watermarking schemes, the TVP scheme significantly reduces the number of missed and false queries, by approximately 75%.

AAAI Conference 2025 Conference Paper

Wills Aligner: Multi-Subject Collaborative Brain Visual Decoding

  • Guangyin Bao
  • Qi Zhang
  • Zixuan Gong
  • Jialei Zhou
  • Wei Fan
  • Kun Yi
  • Usman Naseem
  • Liang Hu

Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, the diversity in cortical parcellation and fMRI patterns across individuals has prompted the development of deep learning models tailored to each subject. This personalization limits the broader applicability of brain visual decoding in real-world scenarios. To address this issue, we introduce Wills Aligner, a novel approach designed to achieve multi-subject collaborative brain visual decoding. Wills Aligner begins by aligning the fMRI data from different subjects at the anatomical level. It then employs delicate mixture-of-brain-expert adapters and a meta-learning strategy to account for individual fMRI pattern differences. Additionally, Wills Aligner leverages the semantic relations of visual stimuli to guide the learning of inter-subject commonality, enabling visual decoding for each subject to draw insights from other subjects' data. We rigorously evaluate Wills Aligner across various visual decoding tasks, including classification, cross-modal retrieval, and image reconstruction. The experimental results demonstrate that Wills Aligner achieves promising performance.

IJCAI Conference 2024 Conference Paper

Decoupled Invariant Attention Network for Multivariate Time-series Forecasting

  • Haihua Xu
  • Wei Fan
  • Kun Yi
  • Pengyang Wang

To achieve more accurate prediction results in Time Series Forecasting (TSF), it is essential to distinguish between the valuable patterns of the spatial-temporal relationship (invariant patterns) and the patterns prone to distribution shift (variant patterns), and then combine them for forecasting. Existing works, such as transformer-based and GNN-based models, focus on capturing the main forecasting dependencies whether stable or not, and tend to overlook patterns that carry both useful information and distribution shift. In this paper, we propose a model for better time series forecasting: the Decoupled Invariant Attention Network (DIAN), which contains two modules to learn spatial and temporal relationships respectively: 1) Spatial Decoupled Invariant-Variant Learning (SDIVL), which decouples the spatial invariant and variant attention scores and then leverages convolutional networks to effectively integrate them for subsequent layers; 2) Temporal Augmented Invariant-Variant Learning (TAIVL), which decouples temporal invariant and variant patterns and combines them for further forecasting. In this module, we also design a Temporal Intervention Mechanism that creates multiple intervened samples by reassembling variant patterns across time stamps to eliminate the spurious impacts of variant patterns. In addition, we propose Joint Optimization to minimize a loss function covering all invariant, variant, and intervened patterns, so that our model gains a more stable predictive ability. Extensive experiments on five datasets have demonstrated our superior performance with higher efficiency compared with state-of-the-art methods.

IJCAI Conference 2024 Conference Paper

Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting

  • Wei Fan
  • Kun Yi
  • Hangting Ye
  • Zhiyuan Ning
  • Qi Zhang
  • Ning An

While most time series are non-stationary, models inevitably face the distribution shift issue in time series forecasting. Existing solutions manipulate statistical measures (usually the mean and standard deviation) to adjust the time series distribution. However, these operations can be theoretically seen as a transformation of only the zero-frequency component of the spectrum, which cannot reveal the full distribution information and further leads to an information-utilization bottleneck in normalization, thus hindering forecasting performance. To address this problem, we propose to utilize the whole frequency spectrum to transform time series, making full use of the data distribution from the frequency perspective. We present a deep frequency derivative learning framework, DERITS, for non-stationary time series forecasting. Specifically, DERITS is built upon a novel reversible transformation, namely the Frequency Derivative Transformation (FDT), which derives signals in the frequency domain to acquire more stationary frequency representations. Then, we propose the Order-adaptive Fourier Convolution Network to conduct adaptive frequency filtering and learning. Furthermore, we organize DERITS as a parallel-stacked architecture for multi-order derivation and fusion in forecasting. Finally, we conduct extensive experiments on several datasets, which show consistent superiority in both time series forecasting and shift alleviation.
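The claim that mean-based normalization acts only on the zero-frequency component follows from a standard DFT identity, and can be checked directly; a minimal NumPy sketch (not the authors' code):

```python
import numpy as np

# Subtracting the mean of a series zeroes only the DC (zero-frequency)
# bin of its spectrum; every other frequency bin is left untouched,
# which is why such normalization reveals no full-spectrum information.
rng = np.random.default_rng(0)
x = rng.normal(size=128)

spec_raw = np.fft.rfft(x)
spec_centered = np.fft.rfft(x - x.mean())

print(np.isclose(spec_centered[0], 0.0))             # → True (DC removed)
print(np.allclose(spec_raw[1:], spec_centered[1:]))  # → True (rest unchanged)
```

This is exactly the "transformation towards the zero-frequency component" the abstract refers to: the whole remainder of the spectrum, where non-stationarity also lives, is untouched by mean subtraction.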

IJCAI Conference 2024 Conference Paper

FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization

  • Zhiyuan Ning
  • Chunlin Tian
  • Meng Xiao
  • Wei Fan
  • Pengyang Wang
  • Li Li
  • Pengfei Wang
  • Yuanchun Zhou

Federated Learning faces significant challenges in statistical and system heterogeneity, along with high energy consumption, necessitating efficient client selection strategies. Traditional approaches, including heuristic and learning-based methods, fall short of addressing these complexities holistically. In response, we propose FedGCS, a novel generative client selection framework that innovatively recasts the client selection process as a generative task. Drawing inspiration from the methodologies used in large language models, FedGCS efficiently encodes abundant decision-making knowledge within a continuous representation space, enabling efficient gradient-based optimization to search for the optimal client selection, which is finally output via generation. The framework comprises four steps: (1) automatic collection of diverse “selection-score” pair data using classical client selection methods; (2) training an encoder-evaluator-decoder framework on this data to construct a continuous representation space; (3) employing gradient-based optimization in this space for optimal client selection; (4) generating the final optimal client selection via beam search over the well-trained decoder. FedGCS outperforms traditional methods by being more comprehensive, generalizable, and efficient, simultaneously optimizing for model performance, latency, and energy consumption. The effectiveness of FedGCS is proven through extensive experimental analyses.

NeurIPS Conference 2024 Conference Paper

FilterNet: Harnessing Frequency Filters for Time Series Forecasting

  • Kun Yi
  • Jingru Fei
  • Qi Zhang
  • Hui He
  • Shufeng Hao
  • Defu Lian
  • Wei Fan

Given the ubiquitous presence of time series data across various domains, precise forecasting of time series holds significant importance and finds widespread real-world applications in energy, weather, healthcare, and beyond. While numerous forecasters have been proposed using different network architectures, Transformer-based models achieve state-of-the-art performance in time series forecasting. However, Transformer-based forecasters still suffer from vulnerability to high-frequency signals, computational inefficiency, and a bottleneck in full-spectrum utilization, which are essentially the cornerstones of accurately predicting time series with thousands of points. In this paper, we explore deep time series forecasting from a signal-processing perspective. Inspired by the filtering process, we introduce a simple yet effective network, FilterNet, built upon our proposed learnable frequency filters, which extract key informative temporal patterns by selectively passing or attenuating certain components of time series signals. Concretely, we propose two kinds of learnable filters in FilterNet: (i) a plain shaping filter, which adopts a universal frequency kernel for signal filtering and temporal modeling; and (ii) a contextual shaping filter, which utilizes filtered frequencies examined for their compatibility with input signals for dependency learning. Equipped with these two filters, FilterNet can approximately surrogate the linear and attention mappings widely adopted in the time series literature, while enjoying superb abilities in handling high-frequency noise and utilizing the whole frequency spectrum, which is beneficial for forecasting. Finally, we conduct extensive experiments on eight time series forecasting benchmarks, and the results demonstrate superior performance in both effectiveness and efficiency compared with state-of-the-art methods. Our code is publicly available.
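The pass-or-attenuate mechanism the abstract describes can be illustrated with a fixed frequency mask; FilterNet's filters are learnable, so the sketch below is only a toy, non-learnable stand-in showing how multiplying the spectrum by a filter selects components:

```python
import numpy as np

# Toy frequency filtering (not FilterNet itself): a signal composed of a
# slow and a fast sinusoid is low-pass filtered by zeroing high bins of
# its real FFT; the fast component is removed, the slow one survives.
n = 256
t = np.arange(n)
slow = np.sin(2 * np.pi * 3 * t / n)    # 3 cycles over the window (bin 3)
fast = np.sin(2 * np.pi * 40 * t / n)   # 40 cycles over the window (bin 40)
x = slow + fast

spec = np.fft.rfft(x)
mask = np.zeros_like(spec)
mask[:10] = 1.0                          # pass only the 10 lowest bins
filtered = np.fft.irfft(spec * mask, n=n)

print(np.allclose(filtered, slow, atol=1e-8))  # → True
```

A learnable version would replace the hard 0/1 mask with trainable (complex) weights per bin, which is the sense in which such a filter can surrogate a linear mapping over the sequence.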

NeurIPS Conference 2024 Conference Paper

Hybrid Reinforcement Learning Breaks Sample Size Barriers In Linear MDPs

  • Kevin Tan
  • Wei Fan
  • Yuting Wei

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assumption. While Li et al. (2023) provided an affirmative answer to this question in the tabular PAC RL case, the question remains unsettled for both the regret-minimizing RL case and the non-tabular case. In this work, building upon recent advancements in offline RL and reward-agnostic exploration, we develop computationally efficient algorithms for both PAC and regret-minimizing RL with linear function approximation, without requiring concentrability on the entire state-action space. We demonstrate that these algorithms achieve sharper error or regret bounds that are no worse than, and can improve on, the optimal sample complexity in offline RL (the first algorithm, for PAC RL) and online RL (the second algorithm, for regret-minimizing RL) in linear Markov decision processes (MDPs), regardless of the quality of the behavior policy. To our knowledge, this work establishes the tightest theoretical guarantees currently available for hybrid RL in linear MDPs.

IJCAI Conference 2024 Conference Paper

HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis

  • Zhuojia Wu
  • Qi Zhang
  • Duoqian Miao
  • Kun Yi
  • Wei Fan
  • Liang Hu

Multimodal Sentiment Analysis (MSA) aims to identify speakers' sentiment tendencies in multimodal video content, raising serious concerns about privacy risks associated with multimodal data, such as voiceprints and facial images. Recent distributed collaborative learning has been verified as an effective paradigm for privacy preservation in multimodal tasks. However, existing approaches often overlook the privacy distinctions among different modalities, struggling to strike a balance between performance and privacy preservation. This poses the intriguing question of how to maximize multimodal utilization for performance while protecting the necessary modalities. This paper makes the first attempt at modality-specified (i.e., audio and visual) privacy preservation in MSA tasks. We propose a novel Hybrid Distributed cross-modality cGAN framework (HyDiscGAN), which learns multimodality alignment to generate fake audio and visual features conditioned on shareable de-identified textual data. The objective is to leverage the fake features to approximate real audio and visual content, guaranteeing privacy preservation while effectively enhancing performance. Extensive experiments show that, compared with state-of-the-art MSA models, HyDiscGAN achieves superior or competitive performance while preserving privacy.

JBHI Journal 2024 Journal Article

MMCA-NET: A Multimodal Cross Attention Transformer Network for Nasopharyngeal Carcinoma Tumor Segmentation Based on a Total-Body PET/CT System

  • Wenjie Zhao
  • Zhenxing Huang
  • Si Tang
  • Wenbo Li
  • Yunlong Gao
  • Yingying Hu
  • Wei Fan
  • Chuanli Cheng

Nasopharyngeal carcinoma (NPC) is a malignant tumor primarily treated by radiotherapy. Accurate delineation of the target tumor is essential for improving the effectiveness of radiotherapy. However, the segmentation performance of current models is unsatisfactory due to poor boundaries, large-scale tumor volume variation, and the labor-intensive nature of manual delineation for radiotherapy. In this paper, MMCA-Net, a novel segmentation network for NPC using PET/CT images that incorporates an innovative multimodal cross attention transformer (MCA-Transformer) and a modified U-Net architecture, is introduced to enhance modal fusion by leveraging cross-attention mechanisms between CT and PET data. Our method, tested against ten algorithms via fivefold cross-validation on samples from Sun Yat-sen University Cancer Center and the public HECKTOR dataset, consistently topped all four evaluation metrics with average Dice similarity coefficients of 0.815 and 0.7944, respectively. Furthermore, ablation experiments were conducted to demonstrate the superiority of our method over multiple baseline and variant techniques. The proposed method has promising potential for application in other tasks.

IJCAI Conference 2024 Conference Paper

Reconstructing Missing Variables for Multivariate Time Series Forecasting via Conditional Generative Flows

  • Xuanming Hu
  • Wei Fan
  • Haifeng Chen
  • Pengyang Wang
  • Yanjie Fu

The Variable Subset Forecasting (VSF) problem, where the majority of variables are unavailable in the inference stage of multivariate forecasting, is an important but under-explored task with broad impacts in many real-world applications. Missing values, absent inter-correlations, and the impracticality of retraining largely hinder the ability of multivariate forecasting models to capture inherent relationships among variables, impacting their performance. Existing approaches to these issues either rely heavily on local temporal correlation or fall short of fully recovering missing information from the unavailable subset, while incurring notable computational expense. To address these problems, we propose a novel density estimation solution that recovers the information of missing variables via a flow-based generative framework. In particular, a novel generative network for time series, namely Time-series Reconstruction Flows (TRF), is proposed to estimate and reconstruct the missing variable subset. In addition, a novel meta-training framework, Variable-Agnostic Meta Learning, has been developed to enhance the generalization ability of TRF, enabling it to adapt to diverse missing-variable situations. Finally, extensive experiments are conducted to demonstrate the superiority of our proposed method over baseline methods.

AAAI Conference 2023 Conference Paper

Background-Mixed Augmentation for Weakly Supervised Change Detection

  • Rui Huang
  • Ruofei Wang
  • Qing Guo
  • Jieda Wei
  • Yuxiang Zhang
  • Wei Fan
  • Yang Liu

Change detection (CD) aims to decouple object changes (i.e., objects missing or appearing) from background changes (i.e., environment variations) like light and season variations in two images captured in the same scene over a long time span, with critical applications in disaster management, urban development, etc. In particular, the endless patterns of background changes require detectors to generalize well to unseen environment variations, making this task significantly challenging. Recent deep learning-based methods develop novel network architectures or optimization strategies with paired training examples, which do not handle the generalization issue explicitly and require huge manual pixel-level annotation efforts. In this work, for the first attempt in the CD community, we study the generalization issue of CD from the perspective of data augmentation and develop a novel weakly supervised training algorithm that only needs image-level labels. Different from general augmentation techniques for classification, we propose background-mixed augmentation, which is specifically designed for change detection: it augments examples under the guidance of a set of background-changing images, letting deep CD models see diverse environment variations. Moreover, we propose an augmented & real data consistency loss that significantly encourages generalization. Our method, as a general framework, can enhance a wide range of existing deep learning-based detectors. We conduct extensive experiments on two public datasets and enhance four state-of-the-art methods, demonstrating the advantages of our method. We release the code at https://github.com/tsingqguo/bgmix.

AAAI Conference 2023 Conference Paper

Dish-TS: A General Paradigm for Alleviating Distribution Shift in Time Series Forecasting

  • Wei Fan
  • Pengyang Wang
  • Dongkun Wang
  • Dongjie Wang
  • Yuanchun Zhou
  • Yanjie Fu

The distribution shift in Time Series Forecasting (TSF), i.e., series distributions changing over time, largely hinders the performance of TSF models. Existing work on distribution shift in time series is mostly limited in how it quantifies the distribution and, more importantly, overlooks the potential shift between lookback and horizon windows. To address the above challenges, we systematically summarize the distribution shift in TSF into two categories. Regarding lookback windows as input-space and horizon windows as output-space, there exist (i) intra-space shift, where the distribution within the input-space keeps shifting over time, and (ii) inter-space shift, where the distribution shifts between input-space and output-space. Then we introduce Dish-TS, a general neural paradigm for alleviating distribution shift in TSF. Specifically, for better distribution estimation, we propose the coefficient net (Conet), which can be instantiated as any neural architecture, to map input sequences into learnable distribution coefficients. To relieve intra-space and inter-space shift, we organize Dish-TS as a Dual-Conet framework to separately learn the distributions of input- and output-space, which naturally captures the distribution difference between the two spaces. In addition, we introduce a more effective training strategy for the intractable Conet learning. Finally, we conduct extensive experiments on several datasets coupled with different state-of-the-art forecasting models. Experimental results show Dish-TS consistently boosts them by more than 20% on average. Code is available at https://github.com/weifantt/Dish-TS.
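
A minimal sketch of the normalize-forecast-denormalize idea behind this paradigm: a coefficient net maps a window to distribution coefficients, the model forecasts in normalized space, and the forecast is denormalized with output-space coefficients. Here the `conet` stand-in simply computes the empirical mean and standard deviation rather than learning them, and all function names are hypothetical.

```python
import math

def conet(window):
    """Stand-in for a learnable coefficient net: maps a window to
    (level, scale) distribution coefficients. Dish-TS learns this mapping;
    here we simply use the empirical mean and std of the window."""
    mu = sum(window) / len(window)
    var = sum((x - mu) ** 2 for x in window) / len(window)
    return mu, math.sqrt(var) + 1e-8

def normalize(lookback):
    mu, sigma = conet(lookback)
    return [(x - mu) / sigma for x in lookback], (mu, sigma)

def denormalize(forecast, output_coeffs):
    # in the Dual-Conet setting these coefficients come from a separate
    # output-space Conet, which is what addresses inter-space shift
    mu, sigma = output_coeffs
    return [y * sigma + mu for y in forecast]

lookback = [10.0, 12.0, 11.0, 13.0]
normed, in_coeffs = normalize(lookback)
# a model would forecast in normalized space; identity "forecast" for illustration
restored = denormalize(normed, in_coeffs)
```

With the input-space coefficients reused, the round trip is exact; the paradigm's contribution is predicting a *different* pair of coefficients for the horizon window.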

NeurIPS Conference 2023 Conference Paper

FourierGNN: Rethinking Multivariate Time Series Forecasting from a Pure Graph Perspective

  • Kun Yi
  • Qi Zhang
  • Wei Fan
  • Hui He
  • Liang Hu
  • Pengyang Wang
  • Ning An
  • Longbing Cao

Multivariate time series (MTS) forecasting has shown great importance in numerous industries. Current state-of-the-art graph neural network (GNN)-based forecasting methods usually require both graph networks (e.g., GCN) and temporal networks (e.g., LSTM) to capture inter-series (spatial) dynamics and intra-series (temporal) dependencies, respectively. However, the uncertain compatibility of the two networks puts an extra burden on handcrafted model designs. Moreover, the separate spatial and temporal modeling naturally violates the unified spatiotemporal inter-dependencies of the real world, which largely hinders forecasting performance. To overcome these problems, we explore the interesting direction of directly applying graph networks and rethink MTS forecasting from a pure graph perspective. We first define a novel data structure, the hypervariate graph, which regards each series value (regardless of variate or timestamp) as a graph node and represents sliding windows as space-time fully-connected graphs. This perspective considers spatiotemporal dynamics in a unified way and reformulates classic MTS forecasting as prediction on hypervariate graphs. Then, we propose a novel architecture, Fourier Graph Neural Network (FourierGNN), which stacks our proposed Fourier Graph Operator (FGO) to perform matrix multiplications in Fourier space. FourierGNN accommodates adequate expressiveness and achieves much lower complexity, allowing it to accomplish forecasting both effectively and efficiently. Besides, our theoretical analysis reveals FGO's equivalence to graph convolutions in the time domain, which further verifies the validity of FourierGNN. Extensive experiments on seven datasets have demonstrated our superior performance with higher efficiency and fewer parameters compared with state-of-the-art methods. Code is available at this repository: https://github.com/aikunyi/FourierGNN.

NeurIPS Conference 2023 Conference Paper

Frequency-domain MLPs are More Effective Learners in Time Series Forecasting

  • Kun Yi
  • Qi Zhang
  • Wei Fan
  • Shoujin Wang
  • Pengyang Wang
  • Hui He
  • Ning An
  • Defu Lian

Time series forecasting has played a key role in many industrial domains, including finance, traffic, energy, and healthcare. While the existing literature has designed many sophisticated architectures based on RNNs, GNNs, or Transformers, another class of approaches based on multi-layer perceptrons (MLPs) has been proposed with simple structure, low complexity, and superior performance. However, most MLP-based forecasting methods suffer from point-wise mappings and an information bottleneck, which largely hinders forecasting performance. To overcome this problem, we explore a novel direction of applying MLPs in the frequency domain for time series forecasting. We investigate the learned patterns of frequency-domain MLPs and discover two inherent characteristics benefiting forecasting: (i) global view: the frequency spectrum gives MLPs a complete view of the signal, so they learn global dependencies more easily, and (ii) energy compaction: frequency-domain MLPs concentrate on the smaller key part of frequency components with compact signal energy. Then, we propose FreTS, a simple yet effective architecture built upon Frequency-domain MLPs for Time Series forecasting. FreTS mainly involves two stages: (i) Domain Conversion, which transforms time-domain signals into complex numbers in the frequency domain; (ii) Frequency Learning, which performs our redesigned MLPs to learn the real and imaginary parts of frequency components. The above stages, operated on both inter-series and intra-series scales, further contribute to channel-wise and time-wise dependency learning. Extensive experiments on 13 real-world benchmarks (including 7 benchmarks for short-term forecasting and 6 benchmarks for long-term forecasting) demonstrate our consistent superiority over state-of-the-art methods. Code is available at this repository: https://github.com/aikunyi/FreTS.
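
The Domain Conversion stage and a frequency-domain layer can be sketched as follows. The naive O(n^2) DFT stands in for the FFT, and the single shared complex affine map is only a toy stand-in for FreTS's redesigned MLPs (which learn separate weights for the real and imaginary parts); all names here are illustrative.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT; returns the real part since the input signal is real."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def freq_layer(X, w_re, w_im, b_re, b_im):
    """One frequency-domain layer: a shared complex affine map applied to
    every frequency component (a toy stand-in for a frequency-domain MLP)."""
    w = complex(w_re, w_im)
    b = complex(b_re, b_im)
    return [w * Xk + b for Xk in X]

signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
spectrum = dft(signal)                             # Domain Conversion: time -> frequency
learned = freq_layer(spectrum, 1.0, 0.0, 0.0, 0.0) # identity weights for illustration
reconstructed = idft(learned)                      # back to the time domain
```

With identity weights the round trip reproduces the input; a trained layer would instead reshape the spectrum, and because every frequency coefficient depends on the whole window, even this pointwise map has the "global view" the abstract describes.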

IJCAI Conference 2022 Conference Paper

Feature and Instance Joint Selection: A Reinforcement Learning Perspective

  • Wei Fan
  • Kunpeng Liu
  • Hao Liu
  • Hengshu Zhu
  • Hui Xiong
  • Yanjie Fu

Feature selection and instance selection are two important data processing techniques. However, such selections have mostly been studied separately, and existing work on joint selection conducts feature/instance selection coarsely, neglecting the latent fine-grained interaction between the feature space and the instance space. To address this challenge, we propose a reinforcement learning solution that accomplishes the joint selection task while simultaneously capturing the interaction between the selection of each feature and each instance. In particular, a sequential-scanning mechanism is designed as the action strategy of the agents, and a collaborative-changing environment is used to enhance agent collaboration. In addition, an interactive paradigm introduces prior selection knowledge to help agents explore more efficiently. Finally, extensive experiments on real-world datasets have demonstrated improved performance.

AAAI Conference 2021 Conference Paper

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention

  • Zhiqi Huang
  • Fenglin Liu
  • Xian Wu
  • Shen Ge
  • Helin Wang
  • Wei Fan
  • Yuexian Zou

While Machine Comprehension (MC) has attracted extensive research interest in recent years, existing approaches mainly belong to the category of Machine Reading Comprehension, which mines textual inputs (paragraphs and questions) to predict answers (choices or text spans). However, many MC tasks accept audio input in addition to textual input, e.g., English listening comprehension tests. In this paper, we target the problem of Audio-Oriented Multimodal Machine Comprehension, whose goal is to answer questions based on the given audio and textual information. To solve this problem, we propose a Dynamic Inter- and Intra-modality Attention (DIIA) model to effectively fuse the two modalities (audio and textual). DIIA can work as an independent component and thus be easily integrated into existing MC models. Moreover, we further develop a Multimodal Knowledge Distillation (MKD) module to enable our multimodal MC model to accurately predict answers based only on either the text or the audio. As a result, the proposed approach can handle various tasks, including Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension, and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and existing unimodal MC models. Experimental results and analysis prove the effectiveness of the proposed approaches. First, the proposed DIIA boosts the baseline models by up to 21.08% in terms of accuracy; second, under unimodal scenarios, the MKD module allows our multimodal MC model to significantly outperform unimodal models, which are trained and tested with only audio or textual data, by up to 18.87%.

AAAI Conference 2021 Conference Paper

U-BERT: Pre-training User Representations for Improved Recommendation

  • Zhaopeng Qiu
  • Xian Wu
  • Jingyue Gao
  • Wei Fan

Learning user representations is a critical task for recommendation systems, as these representations encode user preferences for personalized services. User representations are generally learned from behavior data, such as clicking interactions and review comments. However, for less popular domains, the behavior data is insufficient to learn precise user representations. To deal with this problem, a natural thought is to leverage content-rich domains to complement user representations. Inspired by the recent success of BERT in NLP, we propose a novel pre-training and fine-tuning based approach, U-BERT. Different from typical BERT applications, U-BERT is customized for recommendation and utilizes different frameworks in pre-training and fine-tuning. In pre-training, U-BERT focuses on content-rich domains and introduces a user encoder and a review encoder to model users' behaviors. Two pre-training strategies are proposed to learn general user representations. In fine-tuning, U-BERT focuses on the target content-insufficient domains. In addition to the user and review encoders inherited from the pre-training stage, U-BERT further introduces an item encoder to model item representations. Besides, a review co-matching layer is proposed to capture more semantic interactions between the reviews of the user and the item. Finally, U-BERT combines user representations, item representations, and review interaction information to improve recommendation performance. Experiments on six benchmark datasets from different domains demonstrate the state-of-the-art performance of U-BERT.

IJCAI Conference 2020 Conference Paper

Entity Synonym Discovery via Multipiece Bilateral Context Matching

  • Chenwei Zhang
  • Yaliang Li
  • Nan Du
  • Wei Fan
  • Philip S. Yu

Being able to automatically discover synonymous entities in an open-world setting benefits various tasks such as entity disambiguation and knowledge graph canonicalization. Existing works either only utilize entity features or rely on structured annotations from a single piece of context in which the entity is mentioned. To leverage the diverse contexts in which entities are mentioned, in this paper we generalize the distributional hypothesis to a multi-context setting and propose a synonym discovery framework that detects entity synonyms from free-text corpora with considerations of effectiveness and robustness. As one of the key components in synonym discovery, we introduce a neural network model, SynonymNet, to determine whether two given entities are synonyms of each other. Instead of using entity features, SynonymNet makes use of multiple pieces of contexts in which the entity is mentioned and compares the context-level similarity via a bilateral matching schema. Experimental results demonstrate that the proposed model is able to detect synonym sets that are not observed during training on both generic and domain-specific datasets: Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in terms of Area Under the Curve and 3.19% in terms of Mean Average Precision compared to the best baseline method.

AAAI Conference 2020 Conference Paper

Federated Learning for Vision-and-Language Grounding Problems

  • Fenglin Liu
  • Xian Wu
  • Shen Ge
  • Wei Fan
  • Yuexian Zou

Recently, vision-and-language grounding problems, e.g., image captioning and visual question answering (VQA), have attracted extensive interest from both academia and industry. However, given the similarity of these tasks, efforts to obtain better results by combining the merits of their algorithms have not been well studied. Inspired by the recent success of federated learning, we propose a federated learning framework to obtain various types of image representations from different tasks, which are then fused together to form fine-grained image representations. The representations merge useful features from different vision-and-language grounding problems, and are thus much more powerful than the original representations alone in individual tasks. To learn such image representations, we propose the Aligning, Integrating and Mapping Network (aimNet). The aimNet is validated in three federated learning settings: horizontal federated learning, vertical federated learning, and federated transfer learning. Experiments with the aimNet-based federated learning framework on two representative tasks, i.e., image captioning and VQA, demonstrate effective and universal improvements on all metrics over the baselines. In image captioning, we obtain relative gains of 14% and 13% on the task-specific metrics CIDEr and SPICE, respectively. In VQA, we also boost the performance of strong baselines by up to 3%.

AAAI Conference 2020 Conference Paper

Object-Guided Instance Segmentation for Biological Images

  • Jingru Yi
  • Hui Tang
  • Pengxiang Wu
  • Bo Liu
  • Daniel J. Hoeppner
  • Dimitris N. Metaxas
  • Lianyi Han
  • Wei Fan

Instance segmentation of biological images is essential for studying object behaviors and properties. Challenges such as clustering, occlusion, and adhesion of objects make instance segmentation a non-trivial task. Current box-free instance segmentation methods typically rely on local pixel-level information. Lacking a global view of the objects, these methods are prone to over- or under-segmentation. In contrast, box-based instance segmentation methods incorporate object detection into segmentation and perform better at identifying individual instances. In this paper, we propose a new box-based instance segmentation method. Specifically, we locate object bounding boxes from their center points. The object features are subsequently reused in the segmentation branch as a guide to separate clustered instances within an RoI patch. Along with instance normalization, the model is able to recover the target object distribution and suppress the distributions of neighboring attached objects. Consequently, the proposed model performs excellently in segmenting clustered objects while retaining the target object details. The proposed method achieves state-of-the-art performance on three biological datasets: cell nuclei, a plant phenotyping dataset, and neural cells.

AAAI Conference 2020 Conference Paper

On the Generation of Medical Question-Answer Pairs

  • Sheng Shen
  • Yaliang Li
  • Nan Du
  • Xian Wu
  • Yusheng Xie
  • Shen Ge
  • Tao Yang
  • Kai Wang

Question answering (QA) has achieved promising progress recently. However, answering a question in real-world scenarios like the medical domain is still challenging, due to the requirement of external knowledge and the insufficient quantity of high-quality training data. In light of these challenges, we study the task of generating medical QA pairs in this paper. With the insight that each medical question can be considered as a sample from the latent distribution of questions given answers, we propose an automated medical QA pair generation framework, consisting of an unsupervised key phrase detector that explores unstructured material for validity, and a generator that involves a multi-pass decoder to integrate structural knowledge for diversity. A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system.

NeurIPS Conference 2020 Conference Paper

Prophet Attention: Predicting Attention with Future Attention

  • Fenglin Liu
  • Xuancheng Ren
  • Xian Wu
  • Shen Ge
  • Wei Fan
  • Yuexian Zou
  • Xu Sun

Recently, attention-based models have been used extensively in many sequence-to-sequence learning systems. Especially for image captioning, attention-based models are expected to ground correct image regions with properly generated words. However, for each time step in the decoding process, attention-based models usually use the hidden state of the current input to attend to the image regions. Under this setting, these attention models have a "deviated focus" problem: they calculate the attention weights based on previous words instead of the one to be generated, impairing the performance of both grounding and captioning. In this paper, we propose Prophet Attention, which takes a form similar to self-supervision. In the training stage, this module utilizes future information to calculate the "ideal" attention weights over image regions. These calculated "ideal" weights are further used to regularize the "deviated" attention. In this manner, image regions are grounded with the correct words. The proposed Prophet Attention can be easily incorporated into existing image captioning models to improve their performance in both grounding and captioning. Experiments on the Flickr30k Entities and MSCOCO datasets show that the proposed Prophet Attention consistently outperforms baselines in both automatic metrics and human evaluations. It is worth noting that we set new state-of-the-art results on the two benchmark datasets and achieve first place on the leaderboard of the online MSCOCO benchmark in terms of the default ranking score, i.e., CIDEr-c40.

AAAI Conference 2020 Conference Paper

PSENet: Psoriasis Severity Evaluation Network

  • Yi Li
  • Zhe Wu
  • Shuang Zhao
  • Xian Wu
  • Yehong Kuang
  • YangTian Yan
  • Shen Ge
  • Kai Wang

Psoriasis is a chronic skin disease which affects hundreds of millions of people around the world. The disease cannot be fully cured and requires lifelong care. If the deterioration of psoriasis is not detected and properly treated in time, it can cause serious complications or even become life-threatening. Therefore, a quantitative measurement that can track psoriasis severity is necessary. Currently, PASI (Psoriasis Area and Severity Index) is the most frequently used measurement in clinical practice. However, PASI has the following disadvantages: (1) Time consuming: calculating PASI usually takes more than 30 minutes, which poses a heavy burden on dermatologists; and (2) Inconsistency: due to the complexity of PASI calculation, different or even the same dermatologists can give different scores for the same case. To overcome these drawbacks, we propose PSENet, which applies deep neural networks to estimate psoriasis severity based on skin lesion images. Different from typical deep learning frameworks for image processing, PSENet has the following characteristics: (1) PSENet introduces a score refine module which is able to capture the visual features of skin at both coarse and fine-grained granularities; (2) PSENet uses a siamese structure in training and accepts pairwise inputs, which reduces the dependency on large amounts of training data; and (3) PSENet can not only estimate severity but also locate skin lesion regions in the input image. To train and evaluate PSENet, we worked with professional dermatologists from a top hospital and spent years building a gold-standard dataset. The experimental results show that PSENet achieves a mean absolute error of 2.21 and an accuracy of 77.87% in pair comparison, outperforming baseline methods. Overall, PSENet not only relieves dermatologists from the dull PASI calculation but also enables patients to track psoriasis severity in a much more convenient manner.

AAAI Conference 2020 Short Paper

Self-Supervised, Semi-Supervised, Multi-Context Learning for the Combined Classification and Segmentation of Medical Images (Student Abstract)

  • Abdullah-Al-Zubaer Imran
  • Chao Huang
  • Hui Tang
  • Wei Fan
  • Yuan Xiao
  • Dingjun Hao
  • Zhen Qian
  • Demetri Terzopoulos

To tackle the problem of limited annotated data, semi-supervised learning is attracting attention as an alternative to fully supervised models. Moreover, optimizing a multiple-task model to learn "multiple contexts" can provide better generalizability compared to single-task models. We propose a novel semi-supervised multiple-task model leveraging self-supervision and adversarial training, namely self-supervised, semi-supervised, multi-context learning (S4MCL), and apply it to two crucial medical imaging tasks, classification and segmentation. Our experiments on spine X-rays reveal that the S4MCL model significantly outperforms semi-supervised single-task, semi-supervised multi-context, and fully-supervised single-task models, even with a 50% reduction in classification and segmentation labels.

AAAI Conference 2020 Conference Paper

Shape-Aware Organ Segmentation by Predicting Signed Distance Maps

  • Yuan Xue
  • Hui Tang
  • Zhi Qiao
  • Guanzhong Gong
  • Yong Yin
  • Zhen Qian
  • Chao Huang
  • Wei Fan

In this work, we address an issue in current deep learning based organ segmentation systems: they often produce results that do not capture the overall shape of the target organ and often lack smoothness. Since there is a rigorous mapping between the Signed Distance Map (SDM) calculated from object boundary contours and the binary segmentation map, we exploit the feasibility of learning the SDM directly from medical scans. By converting the segmentation task into predicting an SDM, we show that our proposed method retains superior segmentation performance and produces shapes with better smoothness and continuity. To leverage the complementary information in traditional segmentation training, we introduce an approximated Heaviside function to train the model by predicting SDMs and segmentation maps simultaneously. We validate our proposed models by conducting extensive experiments on a hippocampus segmentation dataset and the public MICCAI 2015 Head and Neck Auto Segmentation Challenge dataset with multiple organs. While our carefully designed backbone 3D segmentation network improves the Dice coefficient by more than 5% compared to the current state of the art, the proposed model with SDM learning produces smoother segmentation results with smaller Hausdorff distance and average surface distance, proving the effectiveness of our method.
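
The SDM-to-segmentation mapping can be sketched on a tiny 2D binary mask: a brute-force signed distance map (negative inside the object, positive outside; this sign convention is an assumption) and a smooth approximated Heaviside that converts the SDM back into a soft segmentation map, which is what allows SDM and segmentation losses to be trained jointly. This is an illustrative sketch, not the paper's 3D implementation.

```python
import math

def signed_distance_map(mask):
    """Brute-force SDM for a small binary mask: distance to the object
    boundary, negated inside the object. Assumes the mask has a boundary."""
    h, w = len(mask), len(mask[0])
    boundary = [(i, j) for i in range(h) for j in range(w)
                if mask[i][j] == 1 and any(
                    0 <= i + di < h and 0 <= j + dj < w and mask[i + di][j + dj] == 0
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))]
    sdm = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            d = min(math.hypot(i - bi, j - bj) for bi, bj in boundary)
            sdm[i][j] = -d if mask[i][j] == 1 else d
    return sdm

def approx_heaviside(sdm, k=10.0):
    """Smooth Heaviside (a sigmoid of the negated SDM): maps an SDM back to
    a soft segmentation map, so a segmentation loss can also be applied."""
    return [[1.0 / (1.0 + math.exp(k * v)) for v in row] for row in sdm]

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
sdm = signed_distance_map(mask)
soft = approx_heaviside(sdm)
```

Thresholding `soft` at 0.5 recovers the original mask, illustrating why predicting the SDM loses no segmentation information while encoding shape everywhere in the image.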

IJCAI Conference 2019 Conference Paper

MNN: Multimodal Attentional Neural Networks for Diagnosis Prediction

  • Zhi Qiao
  • Xian Wu
  • Shen Ge
  • Wei Fan

Diagnosis prediction plays a key role in the clinical decision support process and has attracted extensive research attention recently. Existing studies mainly utilize discrete medical codes (e.g., ICD codes and procedure codes) as the primary features in prediction. However, in real clinical settings, such medical codes can be incomplete or erroneous. For example, a missed diagnosis omits codes that should be included, while a misdiagnosis generates incorrect medical codes. To increase robustness to noisy data, we introduce textual clinical notes in addition to medical codes. Combining information from both sides leads to an improved understanding of clinical health conditions. To accommodate both textual notes and discrete medical codes in the same framework, we propose Multimodal Attentional Neural Networks (MNN), which integrate multi-modal data in a collaborative manner. Experimental results on real-world EHR datasets demonstrate the advantages of MNN in terms of both robustness and accuracy.

AAAI Conference 2019 Conference Paper

Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering

  • Yang Deng
  • Yuexiang Xie
  • Yaliang Li
  • Min Yang
  • Nan Du
  • Wei Fan
  • Kai Lei
  • Ying Shen

Answer selection and knowledge base question answering (KBQA) are two important tasks in question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlations between the tasks. In this paper, we tackle answer selection and KBQA simultaneously via multi-task learning (MTL), motivated by the following observations. First, both answer selection and KBQA can be regarded as ranking problems, one at the text level and the other at the knowledge level. Second, these two tasks can benefit each other: answer selection can incorporate external knowledge from the knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To jointly learn these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable the tasks to interact with each other and to learn more comprehensive sentence representations. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method: the performance of both answer selection and KBQA is improved. Also, the multi-view attention scheme proves effective in assembling attentive information from different representational perspectives.

NeurIPS Conference 2014 Conference Paper

Generalized Higher-Order Orthogonal Iteration for Tensor Decomposition and Completion

  • Yuanyuan Liu
  • Fanhua Shang
  • Wei Fan
  • James Cheng
  • Hong Cheng

Low-rank tensor estimation has been frequently applied to many real-world problems. Despite successful applications, existing Schatten 1-norm minimization (SNM) methods can become very slow or even inapplicable for large-scale problems. To address this difficulty, we propose an efficient and scalable core tensor Schatten 1-norm minimization method for simultaneous tensor decomposition and completion, with much lower computational complexity. We first establish the equivalence between the Schatten 1-norm of a low-rank tensor and that of its core tensor. The Schatten 1-norm of the core tensor is then used in place of that of the whole tensor, which leads to a much smaller-scale matrix SNM problem. Finally, an efficient algorithm with a rank-increasing scheme is developed to solve the proposed problem with a convergence guarantee. Extensive experimental results show that our method is usually more accurate than state-of-the-art methods, and is orders of magnitude faster.

AAAI Conference 2012 Conference Paper

Sembler: Ensembling Crowd Sequential Labeling for Improved Quality

  • Xian Wu
  • Wei Fan
  • Yong Yu

Many natural language processing tasks, such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation, can be formulated as sequential data labeling problems. Building a sound labeler requires a very large number of correctly labeled training examples, which may not always be available. On the other hand, crowdsourcing provides an inexpensive yet efficient alternative for collecting sequential labels from non-experts. However, the quality of crowd labeling cannot be guaranteed, and three kinds of errors are typical: (1) incorrect annotations due to lack of expertise (e.g., labeling gene names in plain text requires corresponding domain knowledge); (2) ignored or omitted annotations due to carelessness or low confidence; (3) noisy annotations due to cheating or vandalism. To correct these mistakes, we present Sembler, a statistical model for ensembling crowd sequential labelings. Sembler considers three types of statistical information: (1) majority agreement, which supports the correctness of an annotation; (2) correct annotations, which improve the credibility of the corresponding annotator; (3) correct annotations, which enhance the correctness of other annotations that share similar linguistic or contextual features. We evaluate the proposed model on a real Twitter dataset and a synthetic biological dataset, and find that Sembler is particularly accurate when more than half of the annotators make mistakes.

NeurIPS Conference 2009 Conference Paper

Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

  • Jing Gao
  • Feng Liang
  • Wei Fan
  • Yizhou Sun
  • Jiawei Han

Little work has been done to directly combine the outputs of multiple supervised and unsupervised models, yet doing so can increase the accuracy and applicability of ensemble methods. First, we can boost the diversity of a classification ensemble by incorporating multiple clustering outputs, each of which provides grouping constraints for the joint label predictions of a set of related objects. Second, combining at the output level extends ensembles of supervised models to applications that have access only to meta-level model outputs rather than raw data. In this paper, we aim to compute a consolidated classification solution for a set of objects by maximizing the consensus among both supervised predictions and unsupervised grouping constraints. We seek a globally optimal label assignment for the target objects, which differs from the result of traditional majority voting and model combination approaches. We cast the problem as an optimization problem on a bipartite graph, where the objective function favors smoothness of the conditional probability estimates over the graph and penalizes deviation from the initial labeling given by the supervised models. We solve the problem through iterative propagation of conditional probability estimates among neighboring nodes, and interpret the method both as a constrained embedding in a transformed space and as a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives.
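
The iterative propagation step can be sketched on a tiny bipartite graph: object nodes alternate updates with group nodes (a group being one classifier's predicted class or one clustering's cluster), with a penalty pulling objects back toward their supervised labels. This is a simplified sketch of that update style, not the paper's exact objective; `alpha`, the matrices, and the update form are illustrative:

```python
import numpy as np

def consensus(A, Y, alpha=0.5, iters=50):
    # A: object-by-group incidence matrix (1 if object belongs to group)
    # Y: initial class probabilities from supervised models (objects x classes)
    # alpha: weight pulling object estimates back toward the supervised labels
    U = Y.copy()                      # object-level class estimates
    deg_g = A.sum(axis=0)             # group degrees
    deg_o = A.sum(axis=1)             # object degrees
    for _ in range(iters):
        Q = (A.T @ U) / deg_g[:, None]                      # group = mean of members
        U = (A @ Q + alpha * Y) / (deg_o + alpha)[:, None]  # blend neighbors, priors
    return U

# three objects, two groups (one classifier's class, one clustering's cluster)
A = np.array([[1., 1.], [1., 1.], [0., 1.]])
Y = np.array([[1., 0.], [1., 0.], [0., 0.]])   # third object has no supervised label
U = consensus(A, Y)
print(U.argmax(axis=1))  # → [0 0 0]
```

The unlabeled third object inherits class 0 purely through the cluster it shares with the two labeled objects, which is the grouping-constraint effect the abstract describes.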

AAAI Conference 2004 Conference Paper

On the Optimality of Probability Estimation by Random Decision Trees

  • Wei Fan

A random decision tree ensemble is an ensemble of decision trees in which the feature at any node of a tree is chosen randomly from the remaining features. A discrete feature chosen on a decision path cannot be chosen again; a continuous feature can be chosen multiple times, each time with a different splitting value. During classification, each tree outputs a raw posterior probability, and the probabilities from all trees in the ensemble are averaged to form the final posterior estimate. Although remarkably simple and somewhat counter-intuitive, random decision trees have been shown to be highly accurate under 0-1 loss and cost-sensitive loss functions. A preliminary explanation attributed this accuracy to the “error-tolerance” property of probabilistic decision making. Our study shows that the actual reason for the random tree’s superior performance is its optimal approximation of each example’s true probability of belonging to a given class.
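
The prediction rule the abstract describes, each tree reporting the raw class proportions at its leaf and the ensemble taking a plain mean, is easy to isolate. A toy sketch of just that averaging step (the leaf counts are made up; tree construction itself is omitted):

```python
import numpy as np

leaf_counts = np.array([       # class counts at the leaf the example reaches in each tree
    [8, 2],                    # tree 1: 8 positives, 2 negatives at its leaf
    [5, 5],                    # tree 2
    [9, 1],                    # tree 3
])
# raw (unsmoothed) posterior from each tree: leaf class proportions
posteriors = leaf_counts / leaf_counts.sum(axis=1, keepdims=True)
p_hat = posteriors.mean(axis=0)   # ensemble posterior = plain average over trees
print(p_hat)   # → [0.73333333 0.26666667]
```

Averaging the raw proportions, rather than voting on hard labels, is what makes the ensemble output a probability estimate that can be thresholded differently under cost-sensitive loss.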

IJCAI Conference 2003 Conference Paper

Inductive Learning in Less Than One Sequential Data Scan

  • Wei Fan
  • Haixun Wang
  • Philip S. Yu
  • Shaw-Hwa Lo

Most recent research on scalable inductive learning over very large datasets, decision tree construction in particular, focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art decision tree construction algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. We first discuss a general inductive learning framework that scans the dataset exactly once. Then, we propose an extension based on Hoeffding's inequality that scans the dataset less than once. Our frameworks are applicable to a wide range of inductive learners.
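
The Hoeffding-inequality idea behind "less than one scan" is that a decision can be frozen once the quantity estimated from the examples seen so far is, with high probability, within ε of its true value. A generic sketch of that stopping rule (the bound is the standard Hoeffding bound; the gap value, batch size, and δ are illustrative, not from the paper):

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    # With probability at least 1 - delta, the true mean of a quantity with
    # range `value_range` lies within epsilon of its mean over n i.i.d. samples.
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

# e.g. commit to a decision once the observed gap between the best and
# runner-up choice exceeds epsilon, and stop reading further examples:
gap = 0.08       # observed criterion gap (illustrative)
n = 0
for _ in range(20):
    n += 200                               # examples consumed so far in the scan
    eps = hoeffding_epsilon(1.0, 1e-6, n)
    if gap > eps:                          # gap is now statistically reliable
        break
print(n, round(eps, 4))   # → 1200 0.0759
```

Because ε shrinks as 1/√n, confident decisions on easy choices are reached early, so the learner can stop before exhausting the scan.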

AAAI Conference 2002 Conference Paper

Pruning and Dynamic Scheduling of Cost-Sensitive Ensembles

  • Wei Fan

Previous research has shown that an averaging ensemble can scale up learning over very large cost-sensitive datasets with linear speedup, independent of the learning algorithm, while achieving the same or even better accuracy than a single model computed from the entire dataset. However, one major drawback is its inefficiency at prediction time, since every base model in the ensemble has to be consulted to produce a final prediction. In this paper, we propose several approaches to reduce the number of base classifiers. Among the various methods explored, our empirical studies show that the benefit-based greedy approach can safely remove more than 90% of the base models while maintaining or even exceeding the prediction accuracy of the original ensemble. Assuming that each base classifier consumes one unit of prediction time, removing 90% of the base classifiers translates to a 10-fold prediction speedup. On top of pruning, we propose a novel dynamic scheduling approach to further reduce the “expected” number of classifiers employed in prediction. It measures the confidence of a prediction made by a subset of classifiers in the pruned ensemble, and uses this confidence to decide whether more classifiers are needed to produce the same prediction as the original unpruned ensemble. This approach reduces the “expected” number of classifiers by another 25% to 75% without loss of accuracy.
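
The dynamic-scheduling idea, consulting base classifiers one at a time and stopping once the running vote is confident, can be sketched in a few lines. The function name, the 0.9 threshold, the minimum of three models, and the toy base models are all illustrative assumptions, not the paper's exact procedure:

```python
def scheduled_predict(classifiers, x, threshold=0.9, min_consulted=3):
    # Consult base classifiers in order; stop early once the confidence of the
    # running average is high enough, instead of evaluating the whole ensemble.
    votes = 0.0
    for consulted, clf in enumerate(classifiers, start=1):
        votes += clf(x)                      # each clf returns P(positive) for x
        p = votes / consulted                # running mean estimate
        margin = max(p, 1.0 - p)             # confidence in the leading class
        if margin >= threshold and consulted >= min_consulted:
            break
    return int(p >= 0.5), consulted

# ten toy base models that mostly agree on the positive class
models = [lambda x, b=b: b
          for b in [0.9, 0.95, 1.0, 0.8, 0.9, 1.0, 0.85, 0.9, 1.0, 0.95]]
label, used = scheduled_predict(models, x=None)
print(label, used)   # → 1 3
```

When the first few models agree strongly, the remaining ones are never evaluated, which is where the additional 25% to 75% reduction in the "expected" number of consulted classifiers comes from.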