Arrow Research search

Author name cluster

Yifan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

23

TMLR Journal 2026 Journal Article

DiffKGW: Stealthy and Robust Diffusion Model Watermarking

  • Tianxin Wei
  • Ruizhong Qiu
  • Yifan Chen
  • Yunzhe Qi
  • Jiacheng Lin
  • Wenxuan Bao
  • Wenju Xu
  • Sreyashi Nag

Diffusion models are known for their exceptional capability to generate realistic images. However, ethical concerns, such as copyright protection and the generation of inappropriate content, pose significant challenges for the practical deployment of diffusion models. Recent work has proposed a flurry of watermarking techniques that inject artificial patterns into the initial latent representations of diffusion models, offering a promising solution to these issues. However, enforcing a specific pattern on selected elements can disrupt the Gaussian distribution of the initial latent representation. Inspired by watermarks for large language models (LLMs), we generalize the LLM KGW watermark to image diffusion models and propose DiffKGW, a stealthy probability adjustment approach that preserves the Gaussian distribution of the initial latent representation. In addition, we dissect the design principles of state-of-the-art watermarking techniques and introduce a unified framework. We identify a set of dimensions that explain the manipulation enforced by watermarking methods, including the distribution of individual elements, the specification of watermark shapes within each channel, and the choice of channels for watermark embedding. Through empirical studies on regular text-to-image applications and the first systematic attempt at watermarking image-to-image diffusion models, we thoroughly verify the effectiveness of our proposed framework through comprehensive evaluations. On all the diffusion models tested, including Stable Diffusion, our approach derived from the proposed framework not only preserves image quality but also outperforms existing methods in robustness against a wide range of attacks.

AAAI Conference 2026 Conference Paper

When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF

  • Yifan Xu
  • Xichen Ye
  • Yifan Chen
  • Qiaosheng Zhang

The quality of datasets plays an important role in large language model (LLM) alignment. In collecting human feedback, however, preference flipping is ubiquitous and corrupts data annotation; this issue necessitates alignment algorithms with improved robustness against potentially flipped pairs. To this end, this paper introduces a Flipping-Aware Direct Preference Optimization (FA-DPO) algorithm tailored to preference flipping from a reinforcement learning with human feedback (RLHF) perspective. We dissect the inherent human intention model and the preference flipping mechanism introduced by external factors as two distinct stages; in the latter, we introduce an instance-dependent flipping probability on the basis of the Bradley-Terry (BT) model. Further, by leveraging features relevant to preference annotation, we capture uncertainty in judgments and model preference flipping patterns. In practice, we design a simple yet efficient iterative optimization algorithm compatible with the original RLHF and Direct Preference Optimization (DPO) algorithms. In our experiments, we investigate the instance-dependent preference flipping model under multiple circumstances to evaluate our proposed method against baseline methods.

ICML Conference 2025 Conference Paper

A Recipe for Causal Graph Regression: Confounding Effects Revisited

  • Yujia Yin
  • Tianyi Qu
  • Zihao Wang
  • Yifan Chen

Through recognizing causal subgraphs, causal graph learning (CGL) has risen to be a promising approach for improving the generalizability of graph neural networks under out-of-distribution (OOD) scenarios. However, the empirical successes of CGL techniques are mostly exemplified in classification settings, while regression tasks, a more challenging setting in graph learning, are overlooked. We thus devote this work to tackling causal graph regression (CGR); to this end we reshape the processing of confounding effects in existing CGL studies, which mainly deal with classification. Specifically, we reflect on the predictive power of confounders in graph-level regression, and generalize classification-specific causal intervention techniques to regression through a lens of contrastive learning. Extensive experiments on graph OOD benchmarks validate the efficacy of our proposals for CGR. The model implementation and the code are provided at https://github.com/causal-graph/CGR.

JBHI Journal 2025 Journal Article

DrugKANs: A Paradigm to Enhance Drug-Target Interaction Prediction With KANs

  • Xiangzheng Fu
  • Zhenya Du
  • Yifan Chen
  • Haiting Chen
  • Linlin Zhuo
  • Aiping Lu
  • Dongsheng Cao
  • Xiaojun Yao

Identifying potential drug-target interactions (DTIs) is crucial for understanding drug mechanisms, and recent computational methods have yielded promising results in this area. However, these methods face several challenges, including limited model generalization due to heavy reliance on multiple similarity datasets and complex feature extraction, as well as a lack of interpretability by ignoring intrinsic information about drugs and targets. To address these challenges, we propose DrugKANs, a novel DTI prediction model that enhances both the quality and interpretability of DTI representations by integrating a dual-tower architecture with Kolmogorov-Arnold Network (KAN) technology. Our model involves utilizing a pre-trained model to derive initial representations of drugs and targets, and employing a lightweight attention mechanism to capture key features, thereby improving representation quality. We leverage the dual-tower architecture and a lightweight feature interaction mechanism to extract high-level representations separately for drugs and targets, aiming to reduce complex feature interactions and mitigate overfitting. Additionally, we incorporate a contrastive learning strategy within the drug-target bipartite graph to address sparse neighborhood effects and enhance topological information. The inclusion of KAN technology further improves the interpretability of the DTI prediction model. Experimental results on public datasets demonstrate that our model predicts DTIs effectively, underscoring its potential as a valuable tool in drug discovery. This comprehensive methodology presents a balanced approach to overcoming the identified challenges in DTI prediction. Our data and code are available at: https://github.com/Excelsior511/DrugKANs.

AAAI Conference 2025 Conference Paper

Optimized Gradient Clipping for Noisy Label Learning

  • Xichen Ye
  • Yifan Wu
  • Weizhong Zhang
  • Xiaoqiang Li
  • Yifan Chen
  • Cheng Jin

Previous research has shown that constraining the gradient of the loss function w.r.t. model-predicted probabilities can enhance model robustness against noisy labels. These methods typically specify a fixed optimal threshold for gradient clipping through validation data to obtain the desired robustness against noise. However, this common practice overlooks the dynamic distribution of gradients from both clean and noisy-labeled samples at different stages of training, significantly limiting the model's capability to adapt to the variable nature of gradients throughout the training process. To address this issue, we propose a simple yet effective approach called Optimized Gradient Clipping (OGC), which dynamically adjusts the clipping threshold based on the ratio of noise gradients to clean gradients after clipping, estimated by modeling the distributions of clean and noisy samples. This approach allows us to modify the clipping threshold at each training step, effectively controlling the influence of noise gradients. Additionally, we provide statistical analysis to certify the noise-tolerance ability of OGC. Our extensive experiments across various types of label noise, including symmetric, asymmetric, instance-dependent, and real-world noise, demonstrate the effectiveness of our approach.
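For intuition, the clipping mechanism the abstract describes can be sketched as follows. This is an illustrative toy, not the authors' OGC code: the threshold `tau` is a fixed placeholder here, whereas OGC would re-estimate it at every training step from the modeled clean/noisy gradient distributions.

```python
import numpy as np

def clipped_ce_grad(p, y, tau):
    """Per-element gradient of cross-entropy L = -y * log(p) w.r.t. the
    model-predicted probability p, clipped to magnitude at most tau.
    Illustrative sketch only; OGC adjusts tau dynamically each step."""
    g = -y / np.clip(p, 1e-12, None)                       # unclipped dL/dp
    scale = np.minimum(1.0, tau / np.maximum(np.abs(g), 1e-12))
    return g * scale

# A confidently wrong sample (p = 0.1 for a positive label) has a large
# raw gradient of -10 that gets clipped to -5; a well-fit sample (p = 0.9)
# stays below the threshold and is left untouched.
g = clipped_ce_grad(np.array([0.9, 0.1]), np.array([1.0, 1.0]), tau=5.0)
```

The clipping bounds the influence any single (possibly mislabeled) sample can exert on an update, which is the robustness lever a dynamic threshold then tunes over the course of training.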

NeurIPS Conference 2025 Conference Paper

Split Gibbs Discrete Diffusion Posterior Sampling

  • Wenda Chu
  • Zihui Wu
  • Yifan Chen
  • Yang Song
  • Yisong Yue

We study the problem of posterior sampling in discrete-state spaces using discrete diffusion models. While posterior sampling methods for continuous diffusion models have achieved remarkable progress, analogous methods for discrete diffusion models remain challenging. In this work, we introduce a principled plug-and-play discrete diffusion posterior sampling algorithm based on split Gibbs sampling, which we call SGDD. Our algorithm enables reward-guided generation and solving inverse problems in discrete-state spaces. We demonstrate the convergence of SGDD to the target posterior distribution and verify this through controlled experiments on synthetic benchmarks. Our method enjoys state-of-the-art posterior sampling performance on a range of benchmarks for discrete data, including DNA sequence design, discrete image inverse problems, and music infilling, achieving more than 30% improved performance compared to existing baselines.

NeurIPS Conference 2025 Conference Paper

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

  • Xeron Du
  • Yifan Yao
  • Kaijing Ma
  • Bingli Wang
  • Tianyu Zheng
  • Minghao Liu
  • Yiming Liang
  • Xiaolong Jin

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields (particularly in light industry, agriculture, and service-oriented disciplines) remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model Gemini-2.5-Pro achieved the highest accuracy of 63.56% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

JBHI Journal 2024 Journal Article

CiGNN: A Causality-Informed and Graph Neural Network Based Framework for Cuffless Continuous Blood Pressure Estimation

  • Lei Liu
  • Huiqi Lu
  • Maxine Whelan
  • Yifan Chen
  • Xiaorong Ding

Causality holds profound potential to dissipate confusion and improve accuracy in cuffless continuous blood pressure (BP) estimation, an area often neglected in current research. In this study, we propose a two-stage framework, CiGNN, that seamlessly integrates causality and graph neural networks (GNNs) for cuffless continuous BP estimation. The first stage concentrates on the generation of a causal graph between BP and wearable features from the perspective of causal inference, so as to identify features that are causally related to BP variations. This stage is pivotal for the identification of novel causal features from the causal graph beyond pulse transit time (PTT). We found that these causal features enable better tracking of BP changes compared to PTT. In the second stage, a spatio-temporal GNN (STGNN) is utilized to learn from the causal graph obtained in the first stage. The STGNN can exploit both the spatial information within the causal graph and temporal information from beat-by-beat cardiac signals for refined cuffless continuous BP estimation. We evaluated the proposed method with three datasets that include 305 subjects (102 hypertensive patients) with ages ranging from 20 to 90 and BP at different levels, with continuous Finapres BP as the reference. The mean absolute differences (MAD) for estimated systolic blood pressure (SBP) and diastolic blood pressure (DBP) were 3.77 mmHg and 2.52 mmHg, respectively, outperforming comparison methods. In all cases, including subjects of different age groups performing various maneuvers that induce BP changes at different levels, with or without hypertension, the proposed CiGNN method demonstrates superior performance for cuffless continuous BP estimation. These findings suggest that CiGNN is a promising approach for elucidating the causal mechanisms of cuffless BP estimation and can substantially enhance the precision of BP measurement.

NeurIPS Conference 2024 Conference Paper

Gliding over the Pareto Front with Uniform Designs

  • Xiaoyuan Zhang
  • Genghui Li
  • Xi Lin
  • Yichi Zhang
  • Yifan Chen
  • Qingfu Zhang

Multiobjective optimization (MOO) plays a critical role in various real-world domains. A major challenge therein is generating K uniform Pareto-optimal solutions to represent the entire Pareto front. To address this issue, this paper first introduces the fill distance to evaluate the K design points, which provides a quantitative metric for the representativeness of the design. However, directly specifying the optimal design that minimizes the fill distance is nearly intractable due to the nested min-max-min optimization problem. To address this, we propose a surrogate "max-packing" design for the fill distance design, which is easier to optimize and leads to a rate-optimal design with a fill distance at most 4× the minimum value. Extensive experiments on synthetic and real-world benchmarks demonstrate that our proposed paradigm efficiently produces high-quality, representative solutions and outperforms baseline methods.
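To make the fill-distance metric concrete: it is the largest distance from any point of the domain to its nearest design point, which can be approximated on a finite candidate grid. A minimal sketch, with a hypothetical toy design on [0, 1] and Euclidean distance (the choice of grid, design, and metric here is illustrative, not taken from the paper):

```python
import numpy as np

def fill_distance(design, candidates):
    """Fill distance of a design: the largest distance from any candidate
    point to its nearest design point (finite-grid approximation of the
    sup over the continuous domain)."""
    # pairwise distances, shape (n_candidates, n_design)
    d = np.linalg.norm(candidates[:, None, :] - design[None, :, :], axis=-1)
    return d.min(axis=1).max()

# Toy check on [0, 1]: the design {0.25, 0.75} leaves 0, 0.5, and 1 as the
# worst-covered points, each at distance 0.25 from the nearest design point.
grid = np.linspace(0.0, 1.0, 1001)[:, None]
design = np.array([[0.25], [0.75]])
print(round(fill_distance(design, grid), 3))  # 0.25
```

Minimizing this quantity over all K-point designs is the nested min-max-min problem the abstract calls nearly intractable, which motivates the easier-to-optimize max-packing surrogate.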

NeurIPS Conference 2024 Conference Paper

LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

  • Xiaoyuan Zhang
  • Liang Zhao
  • Yingying Yu
  • Xi Lin
  • Yifan Chen
  • Han Zhao
  • Qingfu Zhang

Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously, over models with thousands to millions of parameters. Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order or meta-heuristic methods that do not effectively utilize higher-order information from objectives and cannot scale to large-scale models with millions of parameters. In light of the above challenges, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair and comprehensive benchmark, and is open-sourced for the community.

NeurIPS Conference 2024 Conference Paper

Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors

  • Zihui Wu
  • Yu Sun
  • Yifan Chen
  • Bingliang Zhang
  • Yisong Yue
  • Katherine L. Bouman

Diffusion models (DMs) have recently shown outstanding capabilities in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation compared to existing DM-based imaging inverse methods.

ICML Conference 2024 Conference Paper

Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes

  • Yifan Chen
  • Mark Goldstein
  • Mengjian Hua
  • Michael Samuel Albergo
  • Nicholas Matthew Boffi
  • Eric Vanden-Eijnden

We propose a framework for probabilistic forecasting of dynamical systems based on generative modeling. Given observations of the system state over time, we formulate the forecasting problem as sampling from the conditional distribution of the future system state given its current state. To this end, we leverage the framework of stochastic interpolants, which facilitates the construction of a generative model between an arbitrary base distribution and the target. We design a fictitious, non-physical stochastic dynamics that takes as initial condition the current system state and produces as output a sample from the target conditional distribution in finite time and without bias. This process therefore maps a point mass centered at the current state onto a probabilistic ensemble of forecasts. We prove that the drift coefficient entering the stochastic differential equation (SDE) achieving this task is non-singular, and that it can be learned efficiently by square loss regression over the time-series data. We show that the drift and the diffusion coefficients of this SDE can be adjusted after training, and that a specific choice that minimizes the impact of the estimation error gives a Föllmer process. We highlight the utility of our approach on several complex, high-dimensional forecasting problems, including stochastically forced Navier-Stokes and video prediction on the KTH and CLEVRER datasets. The code is available at https://github.com/interpolants/forecasting.

TMLR Journal 2023 Journal Article

Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks

  • Yifan Chen
  • Tianning Xu
  • Dilek Hakkani-Tur
  • Di Jin
  • Yun Yang
  • Ruoqing Zhu

Multiple sampling-based methods have been developed for approximating and accelerating node embedding aggregation in graph convolutional networks (GCNs) training. Among them, a layer-wise approach recursively performs importance sampling to select neighbors jointly for existing nodes in each layer. This paper revisits the approach from a matrix approximation perspective, and identifies two issues in the existing layer-wise sampling methods: suboptimal sampling probabilities and estimation biases induced by sampling without replacement. To address these issues, we accordingly propose two remedies: a new principle for constructing sampling probabilities and an efficient debiasing algorithm. The improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks. Code and algorithm implementations are publicly available at https://github.com/ychen-stat-ml/GCN-layer-wise-sampling.
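The matrix-approximation view above can be illustrated with a minimal importance-sampling estimate of an aggregation A @ H. This is a generic sketch (squared-column-norm probabilities, sampling with replacement), not the calibrated and debiased scheme the paper actually proposes:

```python
import numpy as np

def sampled_aggregate(A, H, s, seed=0):
    """Unbiased importance-sampling estimate of A @ H: sample s columns of A
    (equivalently, rows of H) with probability proportional to the squared
    column norms of A, then reweight each draw by 1/(s * p).
    Generic sketch; not the paper's calibrated/debiased estimator."""
    rng = np.random.default_rng(seed)
    p = (A**2).sum(axis=0)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=s, replace=True, p=p)
    # reweighting makes the estimate unbiased: E[estimate] = A @ H
    return (A[:, idx] / (s * p[idx])) @ H[idx]

# Degenerate sanity check: if only column 0 of A is nonzero, the sampler
# always picks it and the estimate recovers A @ H exactly.
A = np.zeros((3, 4)); A[:, 0] = [1.0, 2.0, 3.0]
H = np.arange(8.0).reshape(4, 2)
est = sampled_aggregate(A, H, s=5)
```

Sampling with replacement keeps the estimator unbiased but inflates variance; the paper's two remedies target exactly the probability construction and the bias that arises when one instead samples without replacement.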

NeurIPS Conference 2023 Conference Paper

Hypervolume Maximization: A Geometric View of Pareto Set Learning

  • Xiaoyuan Zhang
  • Xi Lin
  • Bo Xue
  • Yifan Chen
  • Qingfu Zhang

This paper presents a novel approach to multiobjective algorithms aimed at modeling the Pareto set using neural networks. Whereas previous methods mainly focused on identifying a finite number of solutions, our approach allows for the direct modeling of the entire Pareto set. Furthermore, we establish an equivalence between learning the complete Pareto set and maximizing the associated hypervolume, which enables the convergence analysis of hypervolume (as a new metric) for Pareto set learning. Specifically, our new analysis framework reveals the connection between the learned Pareto solution and its representation in a polar coordinate system. We evaluate our proposed approach on various benchmark problems and real-world problems, and the encouraging results make it a potentially viable alternative to existing multiobjective algorithms. Code is available at \url{https: //github. com/xzhang2523/hvpsl/tree/master}.

JBHI Journal 2023 Journal Article

Predicting CircRNA-Disease Associations via Feature Convolution Learning With Heterogeneous Graph Attention Network

  • Li Peng
  • Cheng Yang
  • Yifan Chen
  • Wei Liu

Exploring the relationship between circular RNA (circRNA) and disease is beneficial for revealing the mechanisms of disease pathogenesis. However, a blind search for all possible associations between circRNAs and diseases through biological experiments is time-consuming. Although some prediction methods have been proposed, they still have limitations. In this study, a novel computational framework, called GATCL2CD, is proposed to forecast unknown circRNA-disease associations (CDAs). First, we calculate Gaussian interaction profile (GIP) kernel similarity and semantic similarity for diseases, and sequence similarity, function similarity, and GIP kernel similarity for circRNAs. Then, we combine them to construct a heterogeneous graph. Thereafter, GATCL2CD employs a feature convolution learning framework that uses a multi-head dynamic attention mechanism to obtain different aggregated representations of features corresponding to the nodes in the heterogeneous graph. Next, it extracts rich higher-order features from the stacked feature representations of each node using a single-layer convolutional neural network with filter kernels of different sizes. Finally, a pairwise element-wise product operation is implemented to capture the interactions of higher-order feature representations, and a multilayer perceptron neural network is introduced as an efficient classifier for inferring potential CDAs. Experimental results under 5-fold cross-validation (5-fold CV) on three different datasets show that GATCL2CD is superior to five other state-of-the-art methods. Furthermore, case studies demonstrate the suitability of GATCL2CD as a useful tool for identifying potential disease-related circRNAs.

JBHI Journal 2021 Journal Article

MI-UNet: Multi-Inputs UNet Incorporating Brain Parcellation for Stroke Lesion Segmentation From T1-Weighted Magnetic Resonance Images

  • Yue Zhang
  • Jiong Wu
  • Yilong Liu
  • Yifan Chen
  • Ed X. Wu
  • Xiaoying Tang

Stroke is a serious manifestation of various cerebrovascular diseases and one of the most dangerous diseases in the world today. Volume quantification and location detection of chronic stroke lesions provide vital biomarkers for stroke rehabilitation. Recently, deep learning has seen rapid growth, with great potential in segmenting medical images. In this work, unlike most deep learning-based segmentation methods utilizing only magnetic resonance (MR) images as the input, we propose and validate a novel stroke lesion segmentation approach named multi-inputs UNet (MI-UNet) that incorporates brain parcellation information, including gray matter (GM), white matter (WM), and the lateral ventricle (LV). The brain parcellation is obtained from 3D diffeomorphic registration and is concatenated with the original MR image to form the two-channel input to the subsequent MI-UNet. The effectiveness of the proposed pipeline is validated using a dataset consisting of 229 T1-weighted MR images. Experiments are conducted via five-fold cross-validation. The proposed MI-UNet performed significantly better than UNet in both 2D and 3D settings. Our best results, obtained by 3D MI-UNet, show superior segmentation performance, as measured by the Dice score, Hausdorff distance, average symmetric surface distance, and precision, over other state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method

  • Yifan Chen
  • Qi Zeng
  • Heng Ji
  • Yun Yang

Transformers are expensive to train due to the quadratic time and space complexity of the self-attention mechanism. On the other hand, although kernel machines suffer from the same computational bottleneck in pairwise dot products, several approximation schemes have been successfully incorporated to considerably reduce their computational cost without sacrificing too much accuracy. In this work, we leverage computation methods for kernel machines to alleviate the high computational cost and introduce Skyformer, which replaces the softmax structure with a Gaussian kernel to stabilize model training and adapts the Nyström method to a non-positive semidefinite matrix to accelerate the computation. We further conduct theoretical analysis, showing that the matrix approximation error of our proposed method is small in the spectral norm. Experiments on the Long Range Arena benchmark show that the proposed method achieves comparable or even better performance than full self-attention while requiring fewer computational resources.
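A minimal sketch of the two ingredients named in the abstract: a Gaussian kernel in place of softmax attention scores, and a Nyström-style low-rank approximation of the resulting kernel matrix. The unit bandwidth, random landmark selection, and pseudo-inverse here are illustrative choices for the sketch, not details of the paper's implementation:

```python
import numpy as np

def gaussian_kernel(A, B):
    """exp(-||a - b||^2 / 2) for all pairs of rows of A and B
    (unit bandwidth chosen for illustration)."""
    sq = (A**2).sum(-1)[:, None] + (B**2).sum(-1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq)

def nystrom_attention(Q, K, V, m=8, seed=0):
    """Nystrom-style approximation of Gaussian-kernel attention output.
    Sketch of the idea behind Skyformer, not the paper's implementation."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(K), size=min(m, len(K)), replace=False)
    L = K[idx]                          # m landmark keys
    C = gaussian_kernel(Q, L)           # (n, m) queries vs landmarks
    W = gaussian_kernel(L, L)           # (m, m) landmarks vs landmarks
    R = gaussian_kernel(L, K)           # (m, n) landmarks vs all keys
    scores = C @ np.linalg.pinv(W) @ R  # low-rank stand-in for the (n, n) kernel
    return scores @ V
```

The factorization never materializes the full n-by-n score matrix, so the cost scales with n*m rather than n^2; when m equals the sequence length the approximation reduces to the exact kernel.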