Arrow Research search

Author name cluster

Fei Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

26

AAAI Conference 2026 Conference Paper

Learning Adaptive and Expandable Mixture Model for Continual Learning

  • Fei Ye
  • Yongcheng Zhong
  • Qihe Liu
  • Adrian G. Bors
  • Jingling Sun
  • Jinyu Guo
  • Shijie Zhou

Continual learning constitutes a fundamental capability of artificial intelligence systems, enabling them to incrementally assimilate novel information without succumbing to catastrophic forgetting. Recent research has leveraged Pre-Trained Models (PTMs) to enhance continual learning efficacy. Nevertheless, prevailing methodologies typically depend on a singular pre-trained backbone and freeze all pre-trained parameters to mitigate network forgetting, thereby constraining adaptability to emerging tasks. In this study, we introduce an innovative PTM-based framework featuring a Dual-Representation Backbone Architecture (DRBA), which integrates both invariant and evolved representation networks to concurrently capture static and dynamic features. Building upon DRBA, we propose an Adaptive and Expandable Mixture Model (AEMM) that incrementally incorporates new expert modules with minimal parameter overhead to accommodate the learning of each novel task. To further augment adaptability, we develop a Dynamic Adaptive Representation Fusion Mechanism (DARFM) that processes outputs from both representation networks and autonomously generates data-driven adaptive weights, optimizing the contribution of each representation. This mechanism yields an adaptive, semantically enriched composite representation, thereby maximizing positive knowledge transfer. Additionally, we propose a Dynamic Knowledge Calibration Mechanism (DKCM), comprising prediction and representation calibration processes, to ensure consistency in both predictions and feature representations. This approach achieves a balance between stability and plasticity, even when learning complex datasets. Empirical evaluations substantiate that the proposed approach attains state-of-the-art performance.

AAAI Conference 2025 Conference Paper

Continual Unsupervised Generative Modelling via Online Optimal Transport

  • Fei Ye
  • Adrian G. Bors
  • Kun Zhang

Lately, deep generative models have achieved excellent results after learning pre-defined and static data distributions. Meanwhile, their performance on continual learning suffers from degeneration, caused by catastrophic forgetting. In this paper, we study unsupervised generative modelling in a more realistic continual learning scenario, where class and task information are absent during both the training and inference phases. To implement this goal, the proposed memory approach consists of a temporary memory system, which stores data examples, and a dynamic expansion memory system, which gradually preserves those samples that are crucial for long-term memorization. A novel memory expansion mechanism is then proposed, employing optimal transport distances between the statistics of memorized samples and each newly seen datum. This paper proposes the Sinkhorn-based Dual Dynamic Memory (SDDM) method, by considering the Sinkhorn distance as an optimal transport measure, for evaluating the significance of the data to be stored in the memory buffer. The Sinkhorn transport algorithm leads to preserving a diversity of samples within a compact memory capacity. The memory buffering approach does not interact with the model's training process and can be optimized independently in both supervised and unsupervised learning without any modifications. Moreover, we also propose a novel dynamic model expansion mechanism to automatically increase the model's capacity whenever necessary, which can deal with infinite data streams and further improve the model's performance. Experimental results show that the proposed approach achieves state-of-the-art performance in both supervised and unsupervised learning.
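As a rough illustration of the Sinkhorn distance this abstract uses as its memory-significance measure, here is a minimal NumPy sketch of the entropy-regularized Sinkhorn iteration. This is an illustrative toy, not the SDDM implementation; the point clouds, the cost normalization, and the `eps` value are all assumptions.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport (Sinkhorn) distance.

    a, b: probability vectors over the two sample sets.
    cost: pairwise cost matrix between samples.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):         # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # transport plan
    return float(np.sum(P * cost))   # transport cost under the plan

# Toy usage: "distance" between a memorized buffer and newly seen data.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(5, 2))   # memorized samples
y = rng.normal(3.0, 1.0, size=(6, 2))   # new data
C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** 2
C = C / C.max()                          # normalize costs for stability
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)
d = sinkhorn_distance(a, b, C)
```

A large distance would signal that the new datum is poorly represented by the buffer and is worth storing.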

ICLR Conference 2025 Conference Paper

DPLM-2: A Multimodal Diffusion Protein Language Model

  • Xinyou Wang
  • Zaixiang Zheng
  • Fei Ye
  • Dongyu Xue
  • Shujian Huang
  • Quanquan Gu

Proteins are essential macromolecules defined by their amino acid sequences, which determine their three-dimensional structures and, consequently, their functions in all living organisms. Therefore, generative protein modeling necessitates a multimodal approach to simultaneously model, understand, and generate both sequences and structures. However, existing methods typically use separate models for each modality, limiting their ability to capture the intricate relationships between sequence and structure. This results in suboptimal performance in tasks that require joint understanding and generation of both modalities. In this paper, we introduce DPLM-2, a multimodal protein foundation model that extends the discrete diffusion protein language model (DPLM) to accommodate both sequences and structures. To enable structural learning with the language model, 3D coordinates are converted to discrete tokens using a lookup-free quantization-based tokenizer. By training on both experimental and high-quality synthetic structures, DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals. We also implement an efficient warm-up strategy to exploit the connection between large-scale evolutionary data and structural inductive biases from pre-trained sequence-based protein language models. Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures, eliminating the need for a two-stage generation approach. Moreover, DPLM-2 demonstrates competitive performance in various conditional generation tasks, including folding, inverse folding, and scaffolding with multimodal motif inputs.
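The lookup-free quantization mentioned in the abstract can be sketched in a few lines: each latent dimension is binarized by its sign, and the resulting bit-vector is read as an integer token id. This is an illustrative toy, not DPLM-2's tokenizer; the 3-dimensional latent is an assumption.

```python
import numpy as np

def lfq_tokenize(z):
    """Lookup-free quantization: binarize each latent dimension by sign,
    then read the bit-vector as an integer token index."""
    bits = (z > 0).astype(np.int64)        # per-dimension sign bit
    weights = 2 ** np.arange(z.shape[-1])  # binary place values
    return bits @ weights                  # one token id per latent vector

def lfq_dequantize(tokens, dim):
    """Map token ids back to the implied +/-1 codebook vectors."""
    bits = (tokens[:, None] >> np.arange(dim)) & 1
    return bits * 2.0 - 1.0

# Toy usage: two 3-dimensional latent vectors.
z = np.array([[0.3, -1.2, 0.7],
              [-0.5, 0.1, -0.9]])
tokens = lfq_tokenize(z)          # integer ids in [0, 2**3)
recon = lfq_dequantize(tokens, 3)
```

Because the codebook is implicit in the bit pattern, no learned lookup table is needed, which is the appeal of this style of tokenizer.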

AAAI Conference 2025 Conference Paper

Dynamic Expansion Diffusion Learning for Lifelong Generative Modelling

  • Fei Ye
  • Adrian G. Bors
  • Kun Zhang

The diffusion model has lately been shown to achieve remarkable performance through its ability to generate high-quality images. However, current diffusion model studies consider only learning from a single data distribution, resulting in catastrophic forgetting when attempting to learn new data. In this paper, we explore a more realistic learning scenario where training data is continuously acquired. We propose the Dynamic Expansion Diffusion Model (DEDM) for addressing catastrophic forgetting and data distribution shifts under the Online Task-Free Continual Learning (OTFCL) paradigm. New diffusion components are added to a mixture model following the evaluation of a criterion which compares the probabilistic representation of the new data with the existing knowledge of the DEDM model. In addition, to maintain an optimal architecture, we propose a component discovery approach that ensures the diversity of knowledge while minimizing the total number of parameters in the DEDM. Furthermore, we show how the proposed DEDM can be implemented as a teacher module in a unified framework for representation learning. In this approach, knowledge distillation is proposed for training a student module aiming to compress the teacher's knowledge into the latent space of the student.

NeurIPS Conference 2025 Conference Paper

Dynamic Siamese Expansion Framework for Improving Robustness in Online Continual Learning

  • Fei Ye
  • Yulong Zhao
  • Qihe Liu
  • Junlin Chen
  • Adrian G. Bors
  • Jingling Sun
  • Rongyao Hu
  • Shijie Zhou

Continual learning requires the model to continually capture novel information without forgetting prior knowledge. Nonetheless, existing studies predominantly address catastrophic forgetting, often neglecting enhancements in model robustness. Consequently, these methodologies fall short in real-time applications, such as autonomous driving, where data samples frequently exhibit noise due to environmental and lighting variations, thereby impairing model efficacy and causing safety issues. In this paper, we address robustness in continual learning systems by introducing an innovative approach, the Dynamic Siamese Expansion Framework (DSEF), which employs a Siamese backbone architecture comprising static and dynamic components to facilitate the learning of both global and local representations over time. Specifically, the proposed framework dynamically generates a lightweight expert for each novel task, leveraging the Siamese backbone to enable rapid adaptation. A novel Robust Dynamic Representation Optimization (RDRO) approach is proposed to incrementally update the dynamic backbone by maintaining all previously acquired representations and prediction patterns of historical experts, thereby fostering new task learning without inducing detrimental knowledge transfer. Additionally, we propose a novel Robust Feature Fusion (RFF) approach to incrementally amalgamate robust representations from all historical experts into the expert construction process. A novel mutual information-based technique is employed to derive adaptive weights for feature fusion by assessing the knowledge relevance between historical experts and the new task, thus maximizing positive knowledge transfer effects. A comprehensive experimental evaluation, benchmarking our approach against established baselines, demonstrates that our method achieves state-of-the-art performance even under adversarial attacks.

ICML Conference 2025 Conference Paper

Elucidating the Design Space of Multimodal Protein Language Models

  • Cheng-Yen Hsieh
  • Xinyou Wang
  • Daiheng Zhang
  • Dongyu Xue
  • Fei Ye
  • Shujian Huang
  • Zaixiang Zheng
  • Quanquan Gu

Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity about fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. The effective design methods dramatically improve the structure generation diversity and, notably, the folding abilities of our 650M model, reducing the RMSD from 5.52 to 2.36 on the PDB test set, even outperforming 3B baselines and on par with the specialized folding models. Project page and code: https://bytedance.github.io/dplm/dplm-2.1.

NeurIPS Conference 2025 Conference Paper

Learning Expandable and Adaptable Representations for Continual Learning

  • Ruilong Yu
  • Mingyan Liu
  • Fei Ye
  • Adrian G. Bors
  • Rongyao Hu
  • Jingling Sun
  • Shijie Zhou

Extant studies predominantly address catastrophic forgetting within a simplified continual learning paradigm, typically confined to a singular data domain. Conversely, real-world applications frequently encompass multiple, evolving data domains, wherein models often struggle to retain much critical past information, thereby leading to performance degradation. This paper addresses this complex scenario by introducing a novel dynamic expansion approach called Learning Expandable and Adaptable Representations (LEAR). This framework orchestrates a collaborative backbone structure, comprising global and local backbones, designed to capture both general and task-specific representations. Leveraging this collaborative backbone, the proposed framework dynamically creates a lightweight expert to delineate decision boundaries for each novel task, thereby facilitating the prediction process. To enhance new task learning, we introduce a novel Mutual Information-Based Prediction Alignment approach, which incrementally optimizes the global backbone via a mutual information metric, ensuring consistency in the prediction patterns of historical experts throughout the optimization phase. To mitigate network forgetting, we propose a Kullback–Leibler (KL) Divergence-Based Feature Alignment approach, which employs a probabilistic distance measure to prevent significant shifts in critical local representations. Furthermore, we introduce a novel Hilbert-Schmidt Independence Criterion (HSIC)-Based Collaborative Optimization approach, which encourages the local and global backbones to capture distinct semantic information in a collaborative manner, thereby mitigating information redundancy and enhancing model performance. Moreover, to accelerate new task learning, we propose a novel Expert Selection Mechanism that automatically identifies the most relevant expert based on data characteristics. This selected expert is then utilized to initialize a new expert, thereby fostering positive knowledge transfer. This approach also enables expert selection during the testing phase without requiring any task information. Empirical results demonstrate that the proposed framework achieves state-of-the-art performance.
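The HSIC criterion named in this abstract has a compact empirical estimator that measures statistical dependence between two sets of features. The sketch below is illustrative only, not the LEAR implementation; the RBF kernel, bandwidth, and toy data are assumptions.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Pairwise RBF (Gaussian) kernel matrix over rows of x."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n-1)^2,
    with centering matrix H = I - (1/n) 11^T. Larger => more dependent."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Toy usage: dependent features score higher than independent ones.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y_dep = x + 0.1 * rng.normal(size=(100, 2))   # strongly dependent on x
y_ind = rng.normal(size=(100, 2))             # independent of x
h_dep = hsic(x, y_dep)
h_ind = hsic(x, y_ind)
```

Minimizing such a dependence score between two backbones' features would push them toward capturing distinct semantic information, in the spirit of the collaborative optimization the abstract describes.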

NeurIPS Conference 2025 Conference Paper

Learning Multi-Source and Robust Representations for Continual Learning

  • Fei Ye
  • Yongcheng Zhong
  • Qihe Liu
  • Adrian G. Bors
  • Jingling Sun
  • Rongyao Hu
  • Shijie Zhou

Plasticity and stability denote the ability to assimilate new tasks while preserving previously acquired knowledge, representing two important concepts in continual learning. Recent research addresses stability by leveraging pre-trained models to provide informative representations, yet the efficacy of these methods is highly reliant on the choice of the pre-trained backbone, which may not yield optimal plasticity. This paper addresses this limitation by introducing a streamlined and potent framework that orchestrates multiple different pre-trained backbones to derive semantically rich multi-source representations. We propose an innovative Multi-Scale Interaction and Dynamic Fusion (MSIDF) technique to process and selectively capture the most relevant parts of multi-source features through a series of learnable attention modules, thereby helping to learn better decision boundaries to boost performance. Furthermore, we introduce a novel Multi-Level Representation Optimization (MLRO) strategy to adaptively refine the representation networks, offering adaptive representations that enhance plasticity. To mitigate over-regularization issues, we propose a novel Adaptive Regularization Optimization (ARO) method to manage and optimize a switch vector that selectively governs the updating process of each representation layer, which promotes new task learning. The proposed MLRO and ARO approaches are collectively optimized within a unified optimization framework to achieve an optimal trade-off between plasticity and stability. Our extensive experimental evaluations reveal that the proposed framework attains state-of-the-art performance. The source code of our algorithm is available at https://github.com/CL-Coder236/LMSRR.

AAAI Conference 2025 Conference Paper

Lifelong Scalable Generative System via Online Maximum Mean Discrepancy

  • Fei Ye
  • Adrian G. Bors

Diffusion-based models have been recently shown to be high-quality data generators. However, their performance severely degrades when training on non-stationary changing data distributions in an online manner, due to catastrophic forgetting. In this paper, we propose enabling the diffusion model with a novel Dynamic Expansion Memory Unit (DEMU) methodology that adaptively creates new memory buffers, to be added to a memory system, in order to preserve information deemed critical for training the model. Having a selective memory unit is essential for training diffusion networks, which are expensive to train, especially when deployed in resource-constrained environments. A Maximum Mean Discrepancy (MMD) based expansion mechanism, which evaluates probabilistic distances between each of the previously defined memory buffers and the newly given data and uses them as expansion signals, is employed for ensuring the diversity of information learning. We propose a new model expansion mechanism to automatically add new diffusion models as experts in a mixture system, which enhances the multi-domain image generation performance. Also, a novel memory compaction approach is proposed to automatically remove statistically overlapping memory units, through a graph relationship evaluation, preventing the limitless expansion of DEMU. Comprehensive results show that the proposed approach performs better than the state-of-the-art.
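The MMD-based expansion signal described above can be sketched with a standard RBF-kernel MMD estimate: expand the memory only when the new data is far from every existing buffer. This is a toy under an assumed kernel bandwidth and threshold, not the DEMU implementation.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Cross RBF kernel matrix between rows of x and rows of y."""
    sq = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Squared Maximum Mean Discrepancy between two sample sets
    (biased V-statistic estimate under an RBF kernel)."""
    return float(rbf(x, x, sigma).mean()
                 + rbf(y, y, sigma).mean()
                 - 2 * rbf(x, y, sigma).mean())

def should_expand(buffers, new_data, threshold=0.1):
    """Expansion signal: add a new buffer only if the new data is far
    (in MMD) from every existing memory buffer."""
    return all(mmd2(buf, new_data) > threshold for buf in buffers)

# Toy usage: one existing buffer, two candidate batches.
rng = np.random.default_rng(1)
buf = rng.normal(0.0, 1.0, size=(200, 2))
near = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution
far = rng.normal(3.0, 1.0, size=(200, 2))    # shifted distribution
expand_far = should_expand([buf], far)
expand_near = should_expand([buf], near)
```

The threshold is the key hyper-parameter: too low and every batch spawns a buffer, too high and distribution shifts go unnoticed.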

ICLR Conference 2025 Conference Paper

ProteinBench: A Holistic Evaluation of Protein Foundation Models

  • Fei Ye
  • Zaixiang Zheng
  • Dongyu Xue
  • Yuning Shen
  • Lihao Wang
  • Yiming Ma
  • Yan Wang
  • Xinyou Wang

Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics. However, the capabilities and limitations associated with these models remain poorly understood due to the absence of a unified evaluation framework. To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance. Our comprehensive evaluation of protein foundation models reveals several key findings that shed light on their current capabilities and limitations. To promote transparency and facilitate further research, we publicly release the evaluation dataset, code, a public leaderboard, and a general modular toolkit for further analysis. We intend for ProteinBench to be a living benchmark for establishing a standardized, in-depth evaluation framework for protein foundation models, driving their development and application while fostering collaboration within the field.

ICML Conference 2024 Conference Paper

Diffusion Language Models Are Versatile Protein Learners

  • Xinyou Wang
  • Zaixiang Zheng
  • Fei Ye
  • Dongyu Xue
  • Shujian Huang
  • Quanquan Gu

This paper introduces the diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework, which generalizes language modeling for proteins in a principled way. After pre-training, DPLM exhibits the ability to generate structurally plausible, novel and diverse protein sequences for unconditional generation. We further demonstrate that the proposed diffusion generative pre-training makes DPLM possess a better understanding of proteins, making it a superior representation learner, which can be fine-tuned for various predictive tasks, comparing favorably to ESM2. Moreover, DPLM can be tailored for various needs, which showcases its prowess of conditional generation in several ways: (1) conditioning on partial peptide sequences, e.g., generating scaffolds for functional motifs with a high success rate; (2) incorporating other modalities as conditioners, e.g., structure-conditioned generation for inverse folding; and (3) steering sequence generation towards desired properties, e.g., satisfying specified secondary structures, through a plug-and-play classifier guidance.

AAAI Conference 2024 Conference Paper

Task-Free Continual Generation and Representation Learning via Dynamic Expansionable Memory Cluster

  • Fei Ye
  • Adrian G. Bors

Human brains can continually acquire and learn new skills and knowledge over time from a dynamically changing environment without forgetting previously learnt information. Such a capacity can selectively transfer some important and recently seen information to the persistent knowledge regions of the brain. Inspired by this intuition, we propose a new memory-based approach for image reconstruction and generation in continual learning, consisting of a temporary and evolving memory, with two different storage strategies, corresponding to the temporary and permanent memorisation. The temporary memory aims to preserve up-to-date information while the evolving memory can dynamically increase its capacity in order to preserve permanent knowledge information. This is achieved by the proposed memory expansion mechanism that selectively transfers those data samples deemed as important from the temporary memory to new clusters defined within the evolved memory according to an information novelty criterion. Such a mechanism promotes the knowledge diversity among clusters in the evolved memory, resulting in capturing more diverse information by using a compact memory capacity. Furthermore, we propose a two-step optimization strategy for training a Variational Autoencoder (VAE) to implement generation and representation learning tasks, which updates the generator and inference models separately using two optimisation paths. This approach leads to a better trade-off between generation and reconstruction performance. We show empirically and theoretically that the proposed approach can learn meaningful latent representations while generating diverse images from different domains. The source code and supplementary material (SM) are available at https://github.com/dtuzi123/DEMC.

AAAI Conference 2024 Conference Paper

Task-Free Dynamic Sparse Vision Transformer for Continual Learning

  • Fei Ye
  • Adrian G. Bors

Vision Transformers (ViTs) represent self-attention-based network backbones shown to be efficient in many individual tasks, but which have not been explored in Task-Free Continual Learning (TFCL) so far. Most existing ViT-based approaches for Continual Learning (CL) rely on task information. In this study, we explore the advantages of the ViT in a more challenging CL scenario where the task boundaries are unavailable during training. To address this learning paradigm, we propose the Task-Free Dynamic Sparse Vision Transformer (TFDSViT), which can dynamically build new sparse experts, where each expert leverages sparsity to allocate the model's capacity for capturing different information categories over time. To avoid forgetting and ensure efficiency in reusing the previously learned knowledge in subsequent learning, we propose a new dynamic dual attention mechanism consisting of the Sparse Attention (SA') and Knowledge Transfer Attention (KTA) modules. The SA' refrains from updating some previously learned attention blocks for preserving prior knowledge. The KTA uses and regulates the information flow of all previously learned experts for learning new patterns. The proposed dual attention mechanism can simultaneously relieve forgetting and promote knowledge transfer for a dynamic expansion model in a task-free manner. We also propose an energy-based dynamic expansion mechanism using the energy as a measure of novelty for the incoming samples, which provides appropriate expansion signals leading to a compact network architecture for TFDSViT. Extensive empirical studies demonstrate the effectiveness of TFDSViT. The code and supplementary material (SM) are available at https://github.com/dtuzi123/TFDSViT.

AAAI Conference 2023 Conference Paper

Continual Variational Autoencoder via Continual Generative Knowledge Distillation

  • Fei Ye
  • Adrian G. Bors

Humans and other living beings are capable of both short- and long-term memorization throughout their lifespan. However, most existing Continual Learning (CL) methods can only account for short-term information when training on infinite streams of data. In this paper, we develop a new unsupervised continual learning framework consisting of two memory systems using Variational Autoencoders (VAEs). We develop a Short-Term Memory (STM), and a parameterised scalable memory implemented by a Teacher model aiming to preserve the long-term information. To incrementally enrich the Teacher's knowledge during training, we propose the Knowledge Incremental Assimilation Mechanism (KIAM), which evaluates the knowledge similarity between the STM and the already accumulated information as signals to expand the Teacher's capacity. Then we train a VAE as a Student module and propose a new Knowledge Distillation (KD) approach that gradually transfers generative knowledge from the Teacher to the Student module. To ensure the quality and diversity of knowledge in KD, we propose a new expert pruning approach that selectively removes the Teacher's redundant parameters, associated with unnecessary experts which have learnt overlapping information with other experts. This mechanism further reduces the complexity of the Teacher's module while ensuring the diversity of knowledge for the KD procedure. We show theoretically and empirically that the proposed framework can train a statistically diversified Teacher module for continual VAE learning which is applicable to learning infinite data streams.
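The Teacher-to-Student transfer above is a form of knowledge distillation; a standard soft-label distillation loss (a generic sketch, not this paper's generative KD objective; the temperature value and toy logits are assumptions) can be written as a temperature-scaled KL divergence between teacher and student outputs.

```python
import numpy as np

def softmax(x, t=1.0):
    """Numerically stable temperature-scaled softmax over the last axis."""
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, t=2.0):
    """Soft-label distillation loss: KL(teacher || student) at temperature t,
    scaled by t^2 as is conventional so gradients stay comparable."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)) * t * t)

# Toy usage: the loss vanishes when the student matches the teacher.
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.1, 1.5, 0.3]])
same = kd_loss(teacher, teacher)
diff = kd_loss(teacher, student)
```

Minimizing `diff` would pull the student's output distribution toward the teacher's, which is the basic mechanism behind compressing a teacher's knowledge into a student.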

AAAI Conference 2023 Conference Paper

Learning Dynamic Latent Spaces for Lifelong Generative Modelling

  • Fei Ye
  • Adrian G. Bors

Task Free Continual Learning (TFCL) aims to capture novel concepts from non-stationary data streams without forgetting previously learned knowledge. Mixture models, which add new components when certain conditions are met, have shown promising results in TFCL tasks. However, such approaches do not make use of the knowledge already accumulated for positive knowledge transfer. In this paper, we develop a new model, namely the Online Recursive Variational Autoencoder (ORVAE). ORVAE utilizes prior knowledge by selectively incorporating newly learnt information, adding new components according to the knowledge already accumulated from previously learnt data. We introduce a new attention mechanism to regularize the structural latent space in which the most important information is reused while the information that interferes with novel samples is inactivated. The proposed attention mechanism can maximize the benefit from the forward transfer for learning novel information without forgetting previously learnt knowledge. We perform several experiments which show that ORVAE achieves state-of-the-art results under TFCL.

ICLR Conference 2023 Conference Paper

Learning Harmonic Molecular Representations on Riemannian Manifold

  • Yiqun Wang
  • Yuning Shen
  • Shi Chen 0003
  • Lihao Wang
  • Fei Ye
  • Hao Zhou

Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of the molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical properties on 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms the state-of-the-art deep learning models for the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.
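On a discretized surface, the Laplace-Beltrami eigenfunctions that HMR builds on are approximated by eigenvectors of a mesh Laplacian. The following sketch uses a combinatorial graph Laplacian on a toy 6-node cycle standing in for a molecular surface mesh; it is an illustration of the underlying spectral idea, not the HMR code.

```python
import numpy as np

def graph_laplacian_eigs(adj, k):
    """First k eigenpairs of the combinatorial graph Laplacian L = D - A,
    a discrete stand-in for the Laplace-Beltrami operator on a surface."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    vals, vecs = np.linalg.eigh(lap)   # eigh: ascending eigenvalues
    return vals[:k], vecs[:, :k]

# Toy "surface": a 6-node cycle graph (each node linked to its neighbours).
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
vals, vecs = graph_laplacian_eigs(adj, 3)
```

The smallest eigenvalue is always 0 (constant eigenfunction), and the low-frequency eigenvectors give a multi-resolution basis for functions on the surface, which is what makes spectral message passing over the manifold possible.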

AAAI Conference 2023 Conference Paper

Lifelong Compression Mixture Model via Knowledge Relationship Graph

  • Fei Ye
  • Adrian G. Bors

Task-Free Continual Learning (TFCL) represents a challenging scenario for lifelong learning because the model, under this paradigm, does not access any task information. The Dynamic Expansion Model (DEM) has shown promising results in this scenario due to its scalability and generalisation power. However, DEM focuses only on addressing forgetting and ignores minimizing the model size, which limits its deployment in practical systems. In this work, we aim to simultaneously address network forgetting and model size optimization by developing the Lifelong Compression Mixture Model (LGMM) equipped with the Maximum Mean Discrepancy (MMD) based expansion criterion for model expansion. A diversity-aware sample selection approach is proposed to selectively store a variety of samples to promote information diversity among the components of the LGMM, which allows more knowledge to be captured with an appropriate model size. In order to avoid having multiple components with similar knowledge in the LGMM, we propose a data-free component discarding mechanism that evaluates a knowledge relation graph matrix describing the relevance between each pair of components. A greedy selection procedure is proposed to identify and remove the redundant components from the LGMM. The proposed discarding mechanism can be performed during or after the training. Experiments on different datasets show that LGMM achieves the best performance for TFCL.

AAAI Conference 2023 Conference Paper

Lifelong Variational Autoencoder via Online Adversarial Expansion Strategy

  • Fei Ye
  • Adrian G. Bors

The Variational Autoencoder (VAE) suffers from a significant loss of information when trained on a non-stationary data distribution. This loss in VAE models, called catastrophic forgetting, has not been studied theoretically before. We analyse the forgetting behaviour of a VAE in continual generative modelling by developing a new lower bound on the data likelihood, which interprets the forgetting process as an increase in the probability distance between the generator's distribution and the evolved data distribution. The proposed bound shows that a VAE-based dynamic expansion model can achieve better performance if its capacity increases appropriately considering the shift in the data distribution. Based on this analysis, we propose a novel expansion criterion that aims to preserve the information diversity among the VAE components, while ensuring that it acquires more knowledge with fewer parameters. Specifically, we implement this expansion criterion from the perspective of a multi-player game and propose the Online Adversarial Expansion Strategy (OAES), which considers all previously learned components as well as the currently updated component as multiple players in a game, while an adversary model evaluates their performance. The proposed OAES can dynamically estimate the discrepancy between each player and the adversary without accessing task information. This leads to the gradual addition of new components while ensuring the knowledge diversity among all of them. We show theoretically and empirically that the proposed extension strategy can enable a VAE model to achieve the best performance given an appropriate model size.

ICLR Conference 2023 Conference Paper

On Pre-training Language Model for Antibody

  • Danqing Wang
  • Fei Ye
  • Hao Zhou 0012

Antibodies are vital proteins offering robust protection for the human body from pathogens. Both general protein and antibody-specific pre-trained language models facilitate antibody prediction tasks. However, there have been few studies that comprehensively explore the representation capability of distinct pre-trained language models on different antibody tasks. To investigate this problem, we aim to answer several key questions in this paper, such as how pre-trained language models perform on antibody tasks with different specificity and how introducing specific biological mechanisms into the pre-training process can benefit the model. Additionally, we evaluate whether the learned antibody pre-trained representations can be applied to real-world antibody problems, such as drug discovery and immune process understanding. Previously, the absence of a benchmark largely hindered efforts to answer these questions. To aid in our investigation, we provide an AnTibody Understanding Evaluation (ATUE) benchmark. We comprehensively evaluate the performance of protein pre-trained language models in an empirical study, along with conclusions and new insights. Our ATUE and code are released at https://github.com/dqwang122/EATLM.

ICML Conference 2023 Conference Paper

Structure-informed Language Models Are Protein Designers

  • Zaixiang Zheng
  • Yifan Deng
  • Dongyu Xue
  • Yi Zhou 0018
  • Fei Ye
  • Quanquan Gu

This paper demonstrates that language models are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs), which have learned massive sequential evolutionary knowledge from the universe of natural protein sequences, to acquire an immediate capability to design preferable protein sequences for given folds. We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness. During inference, iterative refinement is performed to effectively optimize the generated protein sequences. Experiments show that LM-Design improves the state-of-the-art results by a large margin, leading to 4% to 12% accuracy gains in sequence recovery (e.g., 55.65%/56.63% on CATH 4.2/4.3 single-chain benchmarks, and >60% when designing protein complexes). We provide extensive and in-depth analyses, which verify that LM-Design can (1) indeed leverage both structural and sequential knowledge to accurately handle structurally non-deterministic regions, (2) benefit from scaling data and model size, and (3) generalize to other proteins (e.g., antibodies and de novo proteins).
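The iterative-refinement inference step is only described at a high level in the abstract; a toy sketch under the assumption of a hypothetical `logits_fn(seq) -> (L, V)` model interface, where the least-confident positions are re-predicted each round:

```python
import numpy as np

def iterative_refine(logits_fn, seq, n_iters=3, frac=0.3):
    """Toy iterative refinement: each round, rescore every position and
    replace the least-confident fraction with the model's argmax tokens.
    `logits_fn(seq) -> (len(seq), vocab)` is a hypothetical interface,
    not LM-Design's actual API."""
    seq = list(seq)
    for _ in range(n_iters):
        logits = np.asarray(logits_fn(seq))            # (L, V)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)          # softmax per position
        conf = probs.max(-1)                           # per-position confidence
        k = max(1, int(frac * len(seq)))
        for i in np.argsort(conf)[:k]:                 # least-confident first
            seq[i] = int(probs[i].argmax())
    return seq
```

The design choice sketched here (resampling a fixed fraction of low-confidence positions) is one common refinement schedule; the paper's exact schedule is not given in this abstract.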

JBHI Journal 2022 Journal Article

A Dynamic Bayesian Model for Breast Cancer Survival Prediction

  • Jing Teng
  • Honglei Zhang
  • Wuyi Liu
  • Xiao-Ou Shu
  • Fei Ye

Objective: Predicting breast cancer survival and targeting patients at high risk of mortality is of crucial importance. Methods: We built a Bayesian Dynamic Cox (BDCox) model for predicting 5-year overall survival in breast cancer patients using data of the SEER Cancer Registry with 12,840 women. Four feature selection methods were used to identify predictors and enhance parsimony: fast backward variable selection, elastic net, Bayesian Model Averaging (BMA), and clinical expertise. All resulting models and a baseline full model containing all features were internally validated via bootstrapping and externally validated in the Shanghai Breast Cancer Survival Study. Results: BMA outperformed the other feature selection methods in both internal and external validations. The BDCox model with 12 predictors had the best performance. Several predictors showed time-varying associations with survival that are in agreement with previous studies. Conclusion: The model developed using BDCox outperformed the other prognostic models considered in our study. The internal validation results indicate that the BDCox model is capable of achieving high prediction accuracy (C-statistic: 0.802), and the external validation results showed excellent generalizability of the BDCox model (C-statistic: 0.739). Significance: We built a dynamic Bayesian model from the large population-based SEER registry for predicting 5-year breast cancer overall survival. The prediction performance of the BDCox model is significantly better than that of other survival models.
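The C-statistics reported for the BDCox model are concordance indices; a minimal sketch of Harrell's C-statistic for right-censored survival data (toy inputs only, not the SEER data, and tie handling is a common convention rather than the paper's stated one):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-statistic: the fraction of comparable pairs in which the
    subject with the shorter survival time has the higher predicted risk.
    events[i] is 1 if subject i's event was observed, 0 if censored."""
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if subject i had an observed event
            # strictly before subject j's (event or censoring) time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable
```

A C-statistic of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the abstract's internal and external validation figures are read.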

AAAI Conference 2022 Conference Paper

Lifelong Generative Modelling Using Dynamic Expansion Graph Model

  • Fei Ye
  • Adrian G. Bors

Variational Autoencoders (VAEs) suffer from degenerated performance when learning several successive tasks. This is caused by catastrophic forgetting. In order to address the knowledge loss, VAEs use either Generative Replay (GR) mechanisms or Expanding Network Architectures (ENA). In this paper we study the forgetting behaviour of VAEs using a joint GR and ENA methodology, by deriving an upper bound on the negative marginal log-likelihood. This theoretical analysis provides new insights into how VAEs forget previously learnt knowledge during lifelong learning. The analysis indicates that the best performance is achieved when considering model mixtures under the ENA framework, where there are no restrictions on the number of components. However, an ENA-based approach may require an excessive number of parameters. This motivates us to propose a novel Dynamic Expansion Graph Model (DEGM). DEGM expands its architecture according to the novelty associated with each new database, when compared with the information the network has already learnt from previous tasks. DEGM training optimizes knowledge structuring, characterizing the joint probabilistic representations corresponding to past and more recently learned tasks. We demonstrate that DEGM guarantees optimal performance for each task while also minimizing the required number of parameters.

NeurIPS Conference 2022 Conference Paper

Task-Free Continual Learning via Online Discrepancy Distance Learning

  • Fei Ye
  • Adrian G. Bors

Learning from non-stationary data streams, also called Task-Free Continual Learning (TFCL), remains challenging due to the absence of explicit task information in most applications. Even though some algorithms have recently been proposed for TFCL, these methods lack theoretical guarantees. Moreover, there are no theoretical studies of forgetting during TFCL. This paper develops a new theoretical analysis framework that derives generalization bounds based on the discrepancy distance between the visited samples and the entire information made available for training the model. This analysis provides new insights into the forgetting behaviour in classification tasks. Inspired by this theoretical model, we propose a new approach equipped with a dynamic component expansion mechanism for a mixture model, namely Online Discrepancy Distance Learning (ODDL). ODDL estimates the discrepancy between the current memory and the already accumulated knowledge and uses it as an expansion signal, aiming to ensure a compact network architecture with optimal performance. We then propose a new sample selection approach that selectively stores samples in the memory buffer through the discrepancy-based measure, further improving performance. We perform several TFCL experiments with the proposed methodology, which demonstrate that the proposed approach achieves state-of-the-art performance.

JBHI Journal 2020 Journal Article

Bayesian Inference of Lymph Node Ratio Estimation and Survival Prognosis for Breast Cancer Patients

  • Jing Teng
  • Assem Abdygametova
  • Jing Du
  • Bian Ma
  • Rong Zhou
  • Yu Shyr
  • Fei Ye

Objective: We evaluated the prognostic value of lymph node ratio (LNR) for the survival of breast cancer patients using Bayesian inference. Methods: Data on 5,279 women with infiltrating duct and lobular carcinoma breast cancer, diagnosed from 2006 to 2010, was obtained from the NCI SEER Cancer Registry. A prognostic modeling framework was proposed using Bayesian inference to estimate the impact of LNR on breast cancer survival. Based on the proposed model, we then developed a web application for estimating LNR and predicting overall survival. Results: The final survival model with LNR outperformed the other models considered (C-statistic 0.71). Compared to directly measured LNR, estimated LNR slightly increased the accuracy of the prognostic model. Model diagnostics and predictive performance confirmed the effectiveness of Bayesian modeling and the prognostic value of the LNR in predicting breast cancer survival. Conclusion: The estimated LNR was found to have a significant predictive value for the overall survival of breast cancer patients. Significance: We used Bayesian inference to estimate LNR, which was then used to predict overall survival. The models were developed from a large population-based cancer registry. We also built a user-friendly web application for individual patient survival prognosis. The diagnostic value of the LNR and the effectiveness of the proposed model were evaluated by comparisons with existing prediction models.
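The Bayesian LNR estimator itself is not given in the abstract; a minimal Beta-Binomial sketch in which the raw ratio positive/examined is shrunk toward a prior (the Beta(1, 4) prior and the conjugate model are illustrative assumptions, not the paper's model):

```python
def lnr_posterior_mean(positive, examined, alpha=1.0, beta=4.0):
    """Posterior mean of the lymph node ratio under a Beta(alpha, beta)
    prior on the per-node positivity probability and a Binomial likelihood
    for the count of positive nodes among those examined."""
    if examined < 0 or positive < 0 or positive > examined:
        raise ValueError("need 0 <= positive <= examined")
    # Conjugate update: Beta(alpha + positive, beta + examined - positive).
    return (alpha + positive) / (alpha + beta + examined)
```

The appeal of an estimated (rather than raw) LNR is visible even in this toy form: with few examined nodes the estimate stays near the prior mean, and it approaches the raw ratio as the number of examined nodes grows.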

JMLR Journal 2010 Journal Article

Rate Minimaxity of the Lasso and Dantzig Selector for the ℓq Loss in ℓr Balls

  • Fei Ye
  • Cun-Hui Zhang

We consider the estimation of regression coefficients in a high-dimensional linear model. For regression coefficients in ℓr balls, we provide lower bounds for the minimax ℓq risk and minimax quantiles of the ℓq loss for all design matrices. Under an ℓ0 sparsity condition on a target coefficient vector, we sharpen and unify existing oracle inequalities for the Lasso and Dantzig selector. We derive oracle inequalities for target coefficient vectors with many small elements and smaller threshold levels than the universal threshold. These oracle inequalities provide sufficient conditions on the design matrix for the rate minimaxity of the Lasso and Dantzig selector for the ℓq risk and loss in ℓr balls, 0 ≤ r ≤ 1 ≤ q ≤ ∞. By allowing q = ∞, our risk bounds imply the variable selection consistency of threshold Lasso and Dantzig selectors.
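The oracle inequalities above concern the Lasso estimator; a minimal cyclic coordinate-descent sketch showing the soft-thresholding at level lambda that the threshold discussion refers to (the objective scaling and implementation details are illustrative, not from the paper):

```python
import numpy as np

def soft_threshold(z, lam):
    # Soft-thresholding operator: the univariate Lasso solution.
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iters=200):
    """Cyclic coordinate descent for the Lasso objective
    (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(0) / n          # per-coordinate curvature
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual excluding coordinate j's contribution.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b
```

On an orthogonal design this reduces exactly to coordinate-wise soft-thresholding, which is the simplest setting in which the threshold-level comparisons in the abstract can be read off directly.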