Arrow Research search

Author name cluster

Jianfeng Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

EAAI Journal 2026 Journal Article

A lightweight and real-time surgical action detection framework using multi-contextual and decoupled representations

  • Siming Zheng
  • A.S.M. Sharifuzzaman Sagar
  • Yu Chen
  • Jun Hoong Chan
  • Zehao Yu
  • Shi Ying
  • Jianfeng Lu

Accurate detection of surgical actions in minimally invasive procedures is a critical step toward developing intelligent operative assistance systems. In this work, we propose Surgical You Only Look Once detector (Surg-YOLO), an efficient and high-precision surgical action detection framework built upon the YOLO version 11 (YOLOv11) architecture, specifically optimized for the spatio-temporal complexities of surgical environments. Surg-YOLO integrates three key architectural innovations: the Enhanced Spatial Pyramid Pooling-Fast (ESPPF) module for capturing rich multi-scale spatial features; the Spatio-Temporal Multi-scale Context Aggregation Module (ST-MCAM), which enhances temporal reasoning and contextual awareness across frames; and the Decoupled Dual-Branch Prediction Head (DDPH) for independently refining classification and localization tasks. Extensive experiments on a large-scale surgical action dataset demonstrate that Surg-YOLO significantly outperforms existing baseline models, achieving superior detection accuracy across multiple evaluation thresholds. Qualitative visualizations further validate the model’s ability to localize subtle and concurrent surgical actions with high precision. These results highlight Surg-YOLO’s potential as a reliable solution for real-time surgical action detection.

AAAI Conference 2026 Conference Paper

FedCure: Mitigating Participation Bias in Semi-Asynchronous Federated Learning with Non-IID Data

  • Yue Chen
  • Jianfeng Lu
  • Shuqin Cao
  • Wei Wang
  • Gang Li
  • Guanghui Wen

While semi-asynchronous federated learning (SAFL) combines the efficiency of synchronous training with the flexibility of asynchronous updates, it inherently suffers from participation bias, which is further exacerbated by non-IID data distributions. More importantly, hierarchical architecture shifts participation from individual clients to client groups, thereby further intensifying this issue. Despite notable advancements in SAFL research, most existing works still focus on conventional cloud-end architectures while largely overlooking the critical impact of non-IID data on scheduling across the cloud–edge–client hierarchy. To tackle these challenges, we propose FedCure, an innovative semi-asynchronous Federated learning framework that leverages Coalition construction and participation-aware scheduling to mitigate participation bias with non-IID data. Specifically, FedCure operates through three key rules: (1) a preference rule that optimizes coalition formation by maximizing collective benefits and establishing theoretically stable partitions to reduce non-IID-induced performance degradation; (2) a scheduling rule that integrates the virtual queue technique with Bayesian-estimated coalition dynamics, mitigating efficiency loss while ensuring mean rate stability; and (3) a resource allocation rule that enhances computational efficiency by optimizing client CPU frequencies based on estimated coalition dynamics while satisfying delay requirements. Comprehensive experiments on four real-world datasets demonstrate that FedCure improves accuracy by up to 5.1x compared with four state-of-the-art baselines, while significantly enhancing efficiency with the lowest coefficient of variation (0.0223) for per-round latency and maintaining long-term balance across diverse scenarios.
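The coefficient of variation quoted for per-round latency is the standard deviation divided by the mean, so lower values mean steadier rounds. A minimal sketch follows, assuming the population standard deviation (the abstract does not say whether the population or sample form is used); the latency lists are hypothetical and are not data from the paper.

```python
import statistics


def coefficient_of_variation(latencies):
    """Return (population std) / mean for a list of per-round latencies."""
    mean = statistics.fmean(latencies)
    if mean == 0:
        raise ValueError("mean latency is zero; CV is undefined")
    # Assumption: population standard deviation; the paper may use another form.
    return statistics.pstdev(latencies) / mean


# Hypothetical per-round latencies in seconds, for illustration only.
steady = [10.0, 10.2, 9.8, 10.1, 9.9]
bursty = [5.0, 15.0, 8.0, 14.0, 8.0]

cv_steady = coefficient_of_variation(steady)
cv_bursty = coefficient_of_variation(bursty)
print(f"steady CV = {cv_steady:.4f}, bursty CV = {cv_bursty:.4f}")
```

Because the measure is dimensionless, it can be compared across runs whose average round time differs.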

JBHI Journal 2026 Journal Article

Joint Dynamic Brain Network Estimation and Graph Representation Learning for the Recognition of Neurological Disorders

  • Saqib Mamoon
  • Zhengwang Xia
  • Wang Jin
  • Amani Alfakih
  • Jianfeng Lu

Recently, Graph Neural Networks (GNNs) have shown significant improvements in the recognition of neurological disorders by incorporating brain networks/graphs. However, most existing approaches have three main limitations. First, these methodologies rely on precomputed brain networks as input, typically derived from statistical metrics (e.g., Pearson correlation), which are inherently not learnable. Second, methods often assume that the magnitude of the brain interactions remains constant across the whole scan duration. Third, representations produced by models often lack interpretability and robustness when applied across brain disorders. To address these limitations, we propose a novel model called the Effective Brain Inference Graph Neural Network (EBIGNN), which infers dynamic Effective Connectivity (dEC) to characterize brain networks trained with direct feedback from downstream tasks within a unified end-to-end framework. EBIGNN is highly flexible in learning the most relevant graph structures customized to the specific underlying brain condition. The proposed model offers strong interpretability, providing valuable insights into the temporal evolution and altered connectivity patterns essential for understanding brain disorders. The model is validated on three publicly available datasets, demonstrating superior performance compared to other state-of-the-art methods. Moreover, the findings are consistent with previous neuroimaging-derived evidence of biomarkers, underscoring the model’s robustness in clinical settings.

AAAI Conference 2026 Conference Paper

OPTION: An Online Pricing Strategy for Asynchronous Federated Learning Against Free-Riding Attacks

  • Bangqi Pan
  • Jianfeng Lu
  • Shuqin Cao
  • Xiao Zhang
  • Gang Li
  • Guanghui Wen

Asynchronous Federated Learning (AFL) is acclaimed for accelerating collaborative training on heterogeneous systems by eliminating the wait for stragglers. While current solutions focus on improving convergence amidst update delays, they neglect how delayed aggregation fosters free-riding attacks, allowing malicious clients to easily extract the global model without contribution. This behavior results in significant fairness issues and performance degradation. To address this challenge, we propose OPTION, the first online pricing strategy tailored to mitigate free-riding in AFL. OPTION establishes an economic model in which access to model updates is purchased using credits earned from verified contributions. Specifically, OPTION values each model update according to its marginal performance gain and training cost, and then charges each client a download fee based on the Hotelling model to prevent zero-cost acquisition. Moreover, OPTION rewards clients for successful updates under non-arbitrage constraints, effectively balancing individual utility and task budget. To maximize the average model performance while satisfying these conditions, OPTION leverages the Lyapunov drift framework and a probabilistic sampling-based algorithm to optimize the pricing parameters. Extensive experimental results on three real-world datasets demonstrate that OPTION effectively mitigates free-riding attacks in AFL, increases the number of valid updates by at least 23.97%, and achieves a model accuracy improvement of at least 3.01% compared to state-of-the-art baselines.

AAAI Conference 2026 Conference Paper

OursFed: Provable Group Fairness-Aware Federated Learning Against Distrust and Fragility

  • Yun Xin
  • Jianfeng Lu
  • Gang Li
  • Shuqin Cao
  • Guanghui Wen
  • Kehao Wang

With the increasing number of high-stakes decision-making applications in Federated Learning (FL), ensuring fairness across different populations to prevent biases against certain groups has become crucial. However, achieving group fairness (GF) in FL presents a formidable challenge due to its decentralization, which complicates the global GF estimation by the server. Moreover, distrust and fragility hinder the server from gathering GF values from unreliable clients. This challenge motivates our proposal of OursFed, a provable GF-aware FL framework that integrates a privacy pair-based contract and robust GF estimation method to address issues of distrust and fragility. Methodologically, we categorize client unreliability into two categories: active unreliability stemming from distrust and passive unreliability arising from fragility. To mitigate active unreliability, we design a privacy pair-based contract to guarantee truthful GF reporting, and enhance multivariate analysis by identifying relationships among multiple private data. To counteract passive unreliability, we develop a robust GF estimation using non-parametric techniques to smooth data and estimate probability densities and regression functions, improving per-client GF accuracy under multi-dimensional data perturbation. Theoretically, we demonstrate the efficacy of OursFed by analyzing its convergence, GF stability, and accuracy deviation. Experimentally, evaluations on two real datasets show that OursFed improves GF by 28.61% with at most 2.7% trade-off versus state-of-the-art baselines, and synthetic experiments further confirm its effectiveness in handling fragility and distrust.

AAAI Conference 2026 Conference Paper

Ripple Shapley: Data Influence Attribution in One Federated Training Run

  • Dewen Zeng
  • Wenlong Tian
  • Haozhao Wang
  • Jianfeng Lu
  • Weijun Xiao
  • Zhiyong Xu

Contribution evaluation is essential for incentivizing high-quality data sharing in federated learning (FL), yet existing Shapley-value-based methods are prohibitively expensive and overlook temporal influence propagation. In this paper, we propose Ripple Shapley, a novel attribution framework that enables accurate, real-time data valuation within a single federated training run. Our method decomposes each sample’s impact into an instantaneous drop term and a recursive ripple term, the latter capturing downstream influence via a Jacobian chain over global updates. To scale computation, we introduce a low-rank approximation of the Jacobian product and construct a shared subspace for efficient ripple accumulation. Extensive experiments on CIFAR-10 and MNIST show that Ripple Shapley achieves up to 62× speedup over existing Shapley-based FL methods while maintaining high attribution fidelity, significantly improving efficiency, robustness, and fairness in federated environments. We further demonstrate its effectiveness in dynamic federated learning scenarios and its potential for real-time data pricing.

EAAI Journal 2025 Journal Article

A friction temperature model for dynamic operating bearing based on CNN and CNNLSTM

  • Changcheng Deng
  • Jianfeng Lu
  • Linchao An
  • Zhiqiang Gao
  • Xueli Cheng
  • Changliu Tian

Bearing temperature is a critical parameter affecting operational longevity and indicating potential failure. Operating parameters such as speed, load, external conditions, grease content, and clearance influence the temperature rise and its distribution in rolling bearings. Traditionally, the SKF model has been utilized to estimate bearing temperature. This study developed a high-speed bearing test rig to experimentally evaluate two different bearings under various conditions, considering the aforementioned influencing factors. Convolutional Neural Networks (CNN) and a hybrid Convolutional Neural Network combined with Long Short-Term Memory (CNNLSTM) models were employed to enhance the accuracy of bearing temperature prediction. A comparative analysis of the performance of these models against the SKF model revealed that both CNN and CNNLSTM models outperformed the traditional SKF approach in terms of accuracy, stability, and robustness, with the CNNLSTM model demonstrating superior predictive performance. For bearing 1 under 9 steady thermal cases, the coefficients of determination (R²) for the SKF, Back Propagation (BP), Genetic Algorithm-Back Propagation (GABP), CNN, and CNNLSTM models are 0.6452, 0.856, 0.9051, 0.6558, and 0.917, respectively. The Root Mean Square Error (RMSE) values are 7.0034, 4.4608, 3.63, 6.3814, and 3.6211, respectively, and the Mean Absolute Percentage Error (MAPE) values are 0.0936, 0.088, 0.0579, 0.0779, and 0.0547, respectively. Bearing 2 yields a similar result, with a slight difference. Furthermore, the study investigated the effects of various operational factors on bearing temperature. In particular, an increase in both axial and radial loads results in higher bearing temperatures. This indicates a nonlinear relationship between temperature increase and load magnitude. As the rotational speed increases, the bearing temperature increases linearly with it.
The proposed algorithmic models for bearing temperature estimation eliminate the need for individual modeling of bearing dynamics, thermodynamics, and lubrication mechanisms while significantly improving performance metrics. This research offers substantial potential for advancing bearing condition monitoring and temperature regulation systems, contributing to more effective and efficient predictive maintenance strategies.

IJCAI Conference 2025 Conference Paper

DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning Under Two-sided Incomplete Information

  • Yun Xin
  • Jianfeng Lu
  • Shuqin Cao
  • Gang Li
  • Haozhao Wang
  • Guanghui Wen

Online Federated Learning (OFL) is a real-time learning paradigm that sequentially executes parameter aggregation immediately for each randomly arriving client. To motivate clients to participate in OFL, it is crucial to offer appropriate incentives to offset the training resource consumption. However, the design of incentive mechanisms in OFL is constrained by the dynamic variability of Two-sided Incomplete Information (TII) concerning resources, where the server is unaware of the clients’ dynamically changing computational resources, while clients lack knowledge of the real-time communication resources allocated by the server. To incentivize clients to participate in training by offering dynamic rewards to each arriving client, we design a novel Dynamic Bayesian persuasion pricing for online Federated learning (DaringFed) under TII. Specifically, we begin by formulating the interaction between the server and clients as a dynamic signaling and pricing allocation problem within a Bayesian persuasion game, and then demonstrate the existence of a unique Bayesian persuasion Nash equilibrium. By deriving the optimal design of DaringFed under one-sided incomplete information, we further analyze the approximate optimal design of DaringFed with a specific bound under TII. Finally, extensive evaluations conducted on real datasets demonstrate that DaringFed improves accuracy and convergence speed by 16.99%, while experiments with synthetic datasets validate the convergence of the estimated unknown values and the effectiveness of DaringFed in improving the server’s utility by up to 12.6%.

AAAI Conference 2025 Conference Paper

FedCross: Intertemporal Federated Learning Under Evolutionary Games

  • Jianfeng Lu
  • Ying Zhang
  • Riheng Jia
  • Shuqin Cao
  • Jing Liu
  • Hao Fu

Federated Learning (FL) mitigates privacy leakage in decentralized machine learning by allowing multiple clients to train collaboratively locally. However, dynamic mobile networks with high mobility, intermittent connectivity, and bandwidth limitation severely hinder model updates to the cloud server. Although previous studies have typically addressed the user mobility issue through task reassignment or predictive modeling, frequent migrations may result in high communication overhead. Addressing this challenge involves not only dealing with resource constraints, but also finding ways to mitigate the challenges posed by user migrations. We therefore propose an intertemporal incentive framework, FedCross, which ensures the continuity of FL tasks by migrating interrupted training tasks to feasible mobile devices. FedCross comprises two distinct stages. In Stage 1, we address the task allocation problem across regions under resource constraints by employing a multi-objective migration algorithm to quantify the optimal task receivers. Moreover, we adopt evolutionary game theory to capture the dynamic decision-making of users, forecasting the evolution of user proportions across different regions to mitigate frequent migrations. In Stage 2, we utilize a procurement auction mechanism to allocate rewards among base stations, ensuring that those providing high-quality models receive optimal compensation. This approach incentivizes sustained user participation, thereby ensuring the overall feasibility of FedCross. Finally, experimental results validate the theoretical soundness of FedCross and demonstrate its significant reduction in communication overhead.

AAAI Conference 2025 Conference Paper

TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

  • Gangqiang Hu
  • Jianfeng Lu
  • Jianmin Han
  • Shuqin Cao
  • Jing Liu
  • Hao Fu

Due to the sensitivity of data, Federated Learning (FL) is employed to enable distributed machine learning while safeguarding data privacy and accommodating the requirements of various devices. However, in the context of semi-decentralized FL, clients’ communication and training states are dynamic. This variability arises from local training fluctuations, heterogeneous data distributions, and intermittent client participation. Most existing studies primarily focus on stable client states, neglecting the dynamic challenges inherent in real-world scenarios. To tackle this issue, we propose a TRust-Aware clIent scheduLing mechanism called TRAIL, which assesses client states and contributions, enhancing model training efficiency through selective client participation. We focus on a semi-decentralized FL framework where edge servers and clients train a shared global model using unreliable intra-cluster model aggregation and inter-cluster model consensus. First, we propose an adaptive hidden semi-Markov model to estimate clients’ communication states and contributions. Next, we address a client-server association optimization problem to minimize global training loss. Using convergence analysis, we propose a greedy client scheduling algorithm. Finally, our experiments conducted on real-world datasets demonstrate that TRAIL outperforms state-of-the-art baselines, achieving an improvement of 8.7% in test accuracy and a reduction of 15.3% in training loss.

JMLR Journal 2024 Journal Article

Deep Network Approximation: Beyond ReLU to Diverse Activation Functions

  • Shijun Zhang
  • Jianfeng Lu
  • Hongkai Zhao

This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $3N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, albeit with slightly increased constants. Significantly, we establish that the (width,$\,$depth) scaling factors can be further reduced from $(3,2)$ to $(1,1)$ if $\varrho$ falls within a specific subset of $\mathscr{A}$. This subset includes activation functions such as $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, and $\mathtt{Mish}$.
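One way to build intuition for the $(1,1)$ case is the standard observation that a rescaled Softplus unit, $\mathtt{Softplus}(kx)/k$, differs from $\mathtt{ReLU}(x)$ by at most $\ln(2)/k$ everywhere, and the factor $k$ can be absorbed into the weights of the surrounding layers. The sketch below only checks that bound numerically; it is an illustration and not necessarily the construction used in the paper. The helper names `scaled_softplus` and `max_gap` are made up for this example.

```python
import math


def relu(x):
    return max(x, 0.0)


def softplus(z):
    # Numerically stable form of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def scaled_softplus(x, k):
    """Softplus(k * x) / k, which tends to ReLU(x) as k grows."""
    return softplus(k * x) / k


def max_gap(k, lo=-5.0, hi=5.0, steps=2001):
    """Largest |scaled_softplus - relu| on an evenly spaced grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(scaled_softplus(x, k) - relu(x)) for x in xs)


for k in (1, 10, 100):
    print(f"k={k}: max gap {max_gap(k):.6f}, bound ln(2)/k = {math.log(2) / k:.6f}")
```

The gap is largest at $x = 0$, so increasing $k$ tightens the approximation uniformly on any bounded set.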

AAAI Conference 2024 Conference Paper

Enhanced Fine-Grained Motion Diffusion for Text-Driven Human Motion Synthesis

  • Dong Wei
  • Xiaoning Sun
  • Huaijiang Sun
  • Shengxiang Hu
  • Bin Li
  • Weiqing Li
  • Jianfeng Lu

The emergence of text-driven motion synthesis techniques provides animators with great potential to create efficiently. However, in most cases, textual expressions only contain general and qualitative motion descriptions, while lacking fine depiction and sufficient intensity, leading to synthesized motions that are either (a) semantically compliant but uncontrollable over specific pose details, or (b) deviate from the provided descriptions, leaving animators with undesired results. In this paper, we propose DiffKFC, a conditional diffusion model for text-driven motion synthesis with KeyFrames Collaborated, enabling realistic generation with collaborative and efficient dual-level control: coarse guidance at semantic level, with only a few keyframes for direct and fine-grained depiction down to body posture level. Unlike existing inference-editing diffusion models that incorporate conditions without training, our conditional diffusion model is explicitly trained and can fully exploit correlations among texts, keyframes and the diffused target frames. To preserve the control capability of discrete and sparse keyframes, we customize dilated mask attention modules where only partial valid tokens participate in local-to-global attention, indicated by the dilated keyframe mask. Additionally, we develop a simple yet effective smoothness prior, which steers the generated frames towards seamless keyframe transitions at inference. Extensive experiments show that our model not only achieves state-of-the-art performance in terms of semantic fidelity, but more importantly, is able to satisfy animator requirements through fine-grained guidance without tedious labor.

IJCAI Conference 2024 Conference Paper

LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game

  • Jianfeng Lu
  • Yue Chen
  • Shuqin Cao
  • Longbiao Chen
  • Wei Wang
  • Yun Xin

Although Hierarchical Federated Learning (HFL) utilizes edge servers (ESs) to alleviate communication burdens, its model performance will be degraded by non-IID data and limited communication resources. Current works often assume that data is uniformly distributed, which however contradicts the heterogeneity of IoT. Solutions involving additional model training to check the data distribution inevitably increase computational costs and the risk of privacy leakage. The challenges in solving these issues are how to reduce the impact of non-IID data without involving raw data, and how to rationalize the communication resource allocation for addressing the straggler problem. To tackle these challenges, we propose a novel optimization method based on coaLition formation gamE and grAdient Projection, called LEAP. Specifically, we combine edge data distribution with coalition formation game innovatively to adjust the correlations between clients and ESs dynamically, ensuring optimal correlations. We further capture the client heterogeneity to achieve the rational bandwidth allocation from coalition perception and determine the optimal transmission power within specified delay constraints at the client level. Experimental results on four real datasets show that LEAP is able to achieve 20.62% improvement in model accuracy compared to the state-of-the-art baselines. Moreover, LEAP effectively reduces transmission energy consumption by at least about 2.24 times.

RLJ Journal 2024 Journal Article

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

  • Haque Ishfaq
  • Yixin Tan
  • Yu Yang
  • Qingfeng Lan
  • Jianfeng Lu
  • A. Rupam Mahmood
  • Doina Precup
  • Pan Xu

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be intractable. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

RLC Conference 2024 Conference Paper

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

  • Haque Ishfaq
  • Yixin Tan
  • Yu Yang
  • Qingfeng Lan
  • Jianfeng Lu
  • A. Rupam Mahmood
  • Doina Precup
  • Pan Xu

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be intractable. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

IJCAI Conference 2024 Conference Paper

TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

  • Xiangyu Wu
  • Qing-Yuan Jiang
  • Yang Yang
  • Yi-Feng Wu
  • Qing-Guo Chen
  • Jianfeng Lu

The recent introduction of prompt tuning based on pre-trained vision-language models has dramatically improved the performance of multi-label image classification. However, some existing strategies that have been explored still have drawbacks, i.e., either exploiting massive labeled visual data at a high cost or using text data only for text prompt tuning and thus failing to learn the diversity of visual knowledge. Hence, the application scenarios of these methods are limited. In this paper, we propose a pseudo-visual prompt (PVP) module for implicit visual prompt tuning to address this problem. Specifically, we first learn the pseudo-visual prompt for each category, mining diverse visual knowledge by the well-aligned space of pre-trained vision-language models. Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities. Experimental results on VOC2007, MS-COCO, and NUS-WIDE datasets demonstrate that our method can surpass state-of-the-art (SOTA) methods across various settings for multi-label image classification tasks. The code is available at https://github.com/njustkmg/PVP.

NeurIPS Conference 2024 Conference Paper

What does guidance do? A fine-grained analysis in a simple setting

  • Muthu Chidambaram
  • Khashayar Gatmiry
  • Sitan Chen
  • Holden Lee
  • Jianfeng Lu

The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by rigorously proving that guidance fails to sample from the intended tilted distribution. Our main result is to give a fine-grained characterization of the dynamics of guidance in two cases, (1) mixtures of compactly supported distributions and (2) mixtures of Gaussians, which reflect salient properties of guidance that manifest on real-world data. In both cases, we prove that as the guidance parameter increases, the guided model samples more heavily from the boundary of the support of the conditional distribution. We also prove that for any nonzero level of score estimation error, sufficiently large guidance will result in sampling away from the support, theoretically justifying the empirical finding that large guidance results in distorted generations. In addition to verifying these results empirically in synthetic settings, we also show how our theoretical insights can offer useful prescriptions for practical deployment.

JBHI Journal 2023 Journal Article

A Structure-Guided Effective and Temporal-Lag Connectivity Network for Revealing Brain Disorder Mechanisms

  • Zhengwang Xia
  • Tao Zhou
  • Saqib Mamoon
  • Amani Alfakih
  • Jianfeng Lu

Brain network provides important insights for the diagnosis of many brain disorders, and how to effectively model the brain structure has become one of the core issues in the domain of brain imaging analysis. Recently, various computational methods have been proposed to estimate the causal relationship (i.e., effective connectivity) between brain regions. Compared with traditional correlation-based methods, effective connectivity can provide the direction of information flow, which may provide additional information for the diagnosis of brain diseases. However, existing methods either ignore the fact that there is a temporal-lag in the information transmission across brain regions, or simply set the temporal-lag value between all brain regions to a fixed value. To overcome these issues, we design an effective temporal-lag neural network (termed ETLN) to simultaneously infer the causal relationships and the temporal-lag values between brain regions, which can be trained in an end-to-end manner. In addition, we also introduce three mechanisms to better guide the modeling of brain networks. The evaluation results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database demonstrate the effectiveness of the proposed method.

NeurIPS Conference 2023 Conference Paper

Deep Equilibrium Based Neural Operators for Steady-State PDEs

  • Tanya Marwah
  • Ashwini Pokle
  • J. Zico Kolter
  • Zachary Lipton
  • Jianfeng Lu
  • Andrej Risteski

Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We seek to remedy this gap by studying the benefits of weight-tied neural network architectures for steady-state PDEs. To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer using a black-box root solver and differentiates analytically through this fixed point resulting in $\mathcal{O}(1)$ training memory. Our experiments indicate that FNO-DEQ-based architectures outperform FNO-based baselines with $4\times$ the number of parameters in predicting the solution to steady-state PDEs such as Darcy Flow and steady-state incompressible Navier-Stokes. Finally, we show FNO-DEQ is more robust when trained with datasets with more noisy observations than the FNO-based baselines, demonstrating the benefits of using appropriate inductive biases in architectural design for different neural network based PDE solvers. Further, we show a universal approximation result that demonstrates that FNO-DEQ can approximate the solution to any steady-state PDE that can be written as a fixed point equation.
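The idea that a steady-state PDE solution is a fixed point of an operator can be seen on a toy problem. The sketch below discretizes $-u'' = f$ on $[0,1]$ with zero boundary values and iterates the Jacobi update $u \leftarrow G(u)$ until it stops changing. It is a plain fixed-point iteration chosen for illustration, not FNO-DEQ and not the black-box root solver the abstract mentions; the function names and grid size are assumptions of this example.

```python
def jacobi_step(u, f, h):
    """Apply the fixed-point map G once for -u'' = f with u(0) = u(1) = 0."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0        # boundary value u(0) = 0
        right = u[i + 1] if i < n - 1 else 0.0   # boundary value u(1) = 0
        new[i] = 0.5 * (left + right + h * h * f[i])
    return new


def solve_fixed_point(f, h, tol=1e-12, max_iter=10000):
    """Iterate u <- G(u) from zero until successive iterates differ by < tol."""
    u = [0.0] * len(f)
    for _ in range(max_iter):
        nxt = jacobi_step(u, f, h)
        if max(abs(a - b) for a, b in zip(nxt, u)) < tol:
            return nxt
        u = nxt
    return u


n = 9                      # number of interior grid points (illustrative choice)
h = 1.0 / (n + 1)
f = [2.0] * n              # with f = 2 the exact solution is u(x) = x * (1 - x)
u = solve_fixed_point(f, h)
print([round(v, 4) for v in u])
```

For this right-hand side the discrete fixed point coincides with $x(1-x)$ at the grid points, which makes the result easy to check by hand.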

ICML Conference 2023 Conference Paper

Global optimality of Elman-type RNNs in the mean-field regime

  • Andrea Agazzi
  • Jianfeng Lu
  • Sayan Mukherjee 0001

We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.

AAAI Conference 2023 Conference Paper

Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction

  • Dong Wei
  • Huaijiang Sun
  • Bin Li
  • Jianfeng Lu
  • Weiqing Li
  • Xiaoning Sun
  • Shengxiang Hu

Stochastic human motion prediction aims to forecast multiple plausible future motions given a single pose sequence from the past. Most previous works focus on designing elaborate losses to improve the accuracy, while the diversity is typically characterized by randomly sampling a set of latent variables from the latent prior, which is then decoded into possible motions. This joint training of sampling and decoding, however, suffers from posterior collapse as the learned latent variables tend to be ignored by a strong decoder, leading to limited diversity. Alternatively, inspired by the diffusion process in nonequilibrium thermodynamics, we propose MotionDiff, a diffusion probabilistic model to treat the kinematics of human joints as heated particles, which will diffuse from original states to a noise distribution. This process not only offers a natural way to obtain the "whitened" latents without any trainable parameters, but also introduces a new noise in each diffusion step, both of which facilitate more diverse motions. Human motion prediction is then regarded as the reverse diffusion process that converts the noise distribution into realistic future motions conditioned on the observed sequence. Specifically, MotionDiff consists of two parts: a spatial-temporal transformer-based diffusion network to generate diverse yet plausible motions, and a flexible refinement network to further enable geometric losses and align with the ground truth. Experimental results on two datasets demonstrate that our model yields competitive performance in terms of both diversity and accuracy.

NeurIPS Conference 2023 Conference Paper

Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization

  • Zhenbo Song
  • ze xianghui
  • Jianfeng Lu
  • Yujiao Shi

This paper addresses the problem of estimating the 3-DoF camera pose for a ground-level image with respect to a satellite image that encompasses the local surroundings. We propose a novel end-to-end approach that leverages the learning of dense pixel-wise flow fields in pairs of ground and satellite images to calculate the camera pose. Our approach differs from existing methods by constructing the feature metric at the pixel level, enabling full-image supervision for learning distinctive geometric configurations and visual appearances across views. Specifically, our method employs two distinct convolution networks for ground and satellite feature extraction. Then, we project the ground feature map to the bird's eye view (BEV) using a fixed camera height assumption to achieve preliminary geometric alignment. To further establish the content association between the BEV and satellite features, we introduce a residual convolution block to refine the projected BEV feature. Optical flow estimation is performed on the refined BEV feature map and the satellite feature map using flow decoder networks based on RAFT. After obtaining dense flow correspondences, we apply the least square method to filter matching inliers and regress the ground camera pose. Extensive experiments demonstrate significant improvements compared to state-of-the-art methods. Notably, our approach reduces the median localization error by 89%, 19%, 80%, and 35% on the KITTI, Ford multi-AV, VIGOR, and Oxford RobotCar datasets, respectively.

AAAI Conference 2023 Conference Paper

Meta-Auxiliary Learning for Adaptive Human Pose Prediction

  • Qiongjie Cui
  • Huaijiang Sun
  • Jianfeng Lu
  • Bin Li
  • Weiqing Li

Predicting high-fidelity future human poses, from a historically observed sequence, is crucial for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, once encountering out-of-distributions, the predicted poses tend to be unreliable. Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically have a low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. Extensive experiments show that the proposed approach achieves higher accuracy and more realistic visualization.

JMLR Journal 2023 Journal Article

Single Timescale Actor-Critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

  • Mo Zhou
  • Jianfeng Lu

We propose a single timescale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem. A least squares temporal difference (LSTD) method is applied to the critic and a natural policy gradient method is used for the actor. We give a proof of convergence with sample complexity $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1})^2)$. The method in the proof is applicable to general single timescale bilevel optimization problems. We also numerically validate our theoretical results on the convergence.

NeurIPS Conference 2023 Conference Paper

The probability flow ODE is provably fast

  • Sitan Chen
  • Sinho Chewi
  • Holden Lee
  • Yuanzhi Li
  • Jianfeng Lu
  • Adil Salim

We provide the first polynomial-time convergence guarantees for the probabilistic flow ODE implementation (together with a corrector step) of score-based generative modeling. Our analysis is carried out in the wake of recent results obtaining such guarantees for the SDE-based implementation (i.e., denoising diffusion probabilistic modeling or DDPM), but requires the development of novel techniques for studying deterministic dynamics without contractivity. Through the use of a specially chosen corrector step based on the underdamped Langevin diffusion, we obtain better dimension dependence than prior works on DDPM ($O(\sqrt d)$ vs. $O(d)$, assuming smoothness of the data distribution), highlighting potential advantages of the ODE framework.

NeurIPS Conference 2022 Conference Paper

Convergence for score-based generative modeling with polynomial complexity

  • Holden Lee
  • Jianfeng Lu
  • Yixin Tan

Score-based generative modeling (SGM) is a highly successful approach for learning a probability distribution from data and generating further samples. We prove the first polynomial convergence guarantees for the core mechanic behind SGM: drawing samples from a probability density $p$ given a score estimate (an estimate of $\nabla \ln p$) that is accurate in $L^2(p)$. Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality. Our guarantee works for any smooth distribution and depends polynomially on its log-Sobolev constant. Using our guarantee, we give a theoretical analysis of score-based generative modeling, which transforms white-noise input into samples from a learned data distribution given score estimates at different noise scales. Our analysis gives theoretical grounding to the observation that an annealed procedure is required in practice to generate good samples, as our proof depends essentially on using annealing to obtain a warm start at each step. Moreover, we show that a predictor-corrector algorithm gives better convergence than using either portion alone.
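
The core mechanic named in the abstract above, drawing samples from a density $p$ given its score $\nabla \ln p$, can be sketched in a few lines. This is only an illustration under strong assumptions, not the paper's algorithm or analysis: it runs plain unadjusted Langevin dynamics in one dimension, with no annealing and no predictor step, and uses the exact score of a Gaussian $N(\mu, 1)$ in place of a learned estimate. The function names, step size, and chain counts are hypothetical choices.

```python
import math
import random


def score(x, mu=3.0):
    """Exact score d/dx ln p(x) of N(mu, 1); stands in for a learned estimate."""
    return -(x - mu)


def langevin_sample(rng, steps=200, h=0.05, x0=0.0):
    """Run one unadjusted Langevin chain and return its final state.

    Update rule: x <- x + h * score(x) + sqrt(2 * h) * standard normal noise.
    """
    x = x0
    for _ in range(steps):
        x = x + h * score(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [langevin_sample(rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With these settings the empirical mean should land close to 3 and the variance close to 1. The small discretization bias in the variance is the kind of error a corrector step or a smaller step size is meant to control.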

JBHI Journal 2022 Journal Article

Self-Supervised Multi-Modal Hybrid Fusion Network for Brain Tumor Segmentation

  • Feiyi Fang
  • Yazhou Yao
  • Tao Zhou
  • Guosen Xie
  • Jianfeng Lu

Accurate medical image segmentation of brain tumors is necessary for diagnosing, monitoring, and treating disease. In recent years, with the gradual emergence of multi-sequence magnetic resonance imaging (MRI), multi-modal MRI diagnosis has played an increasingly important role in the early diagnosis of brain tumors by providing complementary information for a given lesion. Different MRI modalities vary significantly in context, as well as in coarse and fine information. As the manual identification of brain tumors is very complicated, it usually requires the lengthy consultation of multiple experts. The automatic segmentation of brain tumors from MRI images can thus greatly reduce the workload of doctors and buy more time for treating patients. In this paper, we propose a multi-modal brain tumor segmentation framework that adopts the hybrid fusion of modality-specific features using a self-supervised learning strategy. The algorithm is based on a fully convolutional neural network. Firstly, we propose a multi-input architecture that learns independent features from multi-modal data, and can be adapted to different numbers of multi-modal inputs. Compared with single-modal multi-channel networks, our model provides a better feature extractor for segmentation tasks, which learns cross-modal information from multi-modal data. Secondly, we propose a new feature fusion scheme, named hybrid attentional fusion. This scheme enables the network to learn the hybrid representation of multiple features and capture the correlation information between them through an attention mechanism. Unlike popular methods, such as feature map concatenation, this scheme focuses on the complementarity between multi-modal data, which can significantly improve the segmentation results of specific regions. Thirdly, we propose a self-supervised learning strategy for brain tumor segmentation tasks. Our experimental results demonstrate the effectiveness of the proposed model against other state-of-the-art multi-modal medical segmentation methods.

ICLR Conference 2021 Conference Paper

Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

  • Andrea Agazzi
  • Jianfeng Lu

We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policy and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling e.g. the behavior of wide single hidden layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models is established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization.

NeurIPS Conference 2021 Conference Paper

On the Representation of Solutions to Elliptic PDEs in Barron Spaces

  • Ziang Chen
  • Jianfeng Lu
  • Yulong Lu

Numerical solutions to high-dimensional partial differential equations (PDEs) based on neural networks have seen exciting developments. This paper derives complexity estimates of the solutions of $d$-dimensional second-order elliptic PDEs in the Barron space, that is a set of functions admitting the integral of certain parametric ridge function against a probability measure on the parameters. We prove under some appropriate assumptions that if the coefficients and the source term of the elliptic PDE lie in Barron spaces, then the solution of the PDE is $\epsilon$-close with respect to the $H^1$ norm to a Barron function. Moreover, we prove dimension-explicit bounds for the Barron norm of this approximate solution, depending at most polynomially on the dimension $d$ of the PDE. As a direct consequence of the complexity estimates, the solution of the PDE can be approximated on any bounded domain by a two-layer neural network with respect to the $H^1$ norm with a dimension-explicit convergence rate.

NeurIPS Conference 2020 Conference Paper

A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions

  • Yulong Lu
  • Jianfeng Lu

This paper studies the universal approximation property of deep neural networks for representing probability distributions. Given a target distribution $\pi$ and a source distribution $p_z$ both defined on $\mathbb{R}^d$, we prove under some assumptions that there exists a deep neural network $g: \mathbb{R}^d \to \mathbb{R}$ with ReLU activation such that the push-forward measure $(\nabla g)_\# p_z$ of $p_z$ under the map $\nabla g$ is arbitrarily close to the target measure $\pi$. The closeness is measured by three classes of integral probability metrics between probability distributions: $1$-Wasserstein distance, maximum mean distance (MMD) and kernelized Stein discrepancy (KSD). We prove upper bounds for the size (width and depth) of the deep neural network in terms of the dimension $d$ and the approximation error $\varepsilon$ with respect to the three discrepancies. In particular, the size of neural network can grow exponentially in $d$ when $1$-Wasserstein distance is used as the discrepancy, whereas for both MMD and KSD the size of neural network only depends on $d$ at most polynomially. Our proof relies on convergence estimates of empirical measures under aforementioned discrepancies and semi-discrete optimal transport.

NeurIPS Conference 2020 Conference Paper

MPNet: Masked and Permuted Pre-training for Language Understanding

  • Kaitao Song
  • Xu Tan
  • Tao Qin
  • Jianfeng Lu
  • Tie-Yan Liu

BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence and thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB text corpora) and fine-tune on a variety of down-streaming tasks (GLUE, SQuAD, etc.). Experimental results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks compared with previous state-of-the-art pre-trained methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. We attach the code in the supplemental materials.

IJCAI Conference 2020 Conference Paper

Neural Machine Translation with Error Correction

  • Kaitao Song
  • Xu Tan
  • Jianfeng Lu

Neural machine translation (NMT) generates the next target token given as input the previous ground truth target tokens during training while the previous generated target tokens during inference, which causes discrepancy between training and inference as well as error propagation, and affects the translation accuracy. In this paper, we introduce an error correction mechanism into NMT, which corrects the error information in the previous generated tokens to better predict the next token. Specifically, we introduce two-stream self-attention from XLNet into NMT decoder, where the query stream is used to predict the next token, and meanwhile the content stream is used to correct the error information from the previous predicted tokens. We leverage scheduled sampling to simulate the prediction errors during training. Experiments on three IWSLT translation datasets and two WMT translation datasets demonstrate that our method achieves improvements over Transformer baseline and scheduled sampling. Further experimental analyses also verify the effectiveness of our proposed error correction mechanism to improve the translation quality.

AAAI Conference 2018 Conference Paper

Kill Two Birds With One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement

  • Junjie Zhang
  • Qi Wu
  • Jian Zhang
  • Chunhua Shen
  • Jianfeng Lu

The number of social images has exploded by the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags. However, it is well-known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as the image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. The deep neural network is utilized as the image feature learning and backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as the constraints at the batch level to alleviate the tag noise. Therefore, our model is highly flexible and stable to handle large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.

IJCAI Conference 2013 Conference Paper

Instance Selection and Instance Weighting for Cross-Domain Sentiment Classification via PU Learning

  • Rui Xia
  • Xuelei Hu
  • Jianfeng Lu
  • Jian Yang
  • Chengqing Zong

Due to the explosive growth of the Internet online reviews, we can easily collect a large amount of labeled reviews from different domains. But only some of them are beneficial for training a desired target-domain sentiment classifier. Therefore, it is important for us to identify those samples that are the most relevant to the target domain and use them as training data. To address this problem, a novel approach, based on instance selection and instance weighting via PU learning, is proposed. PU learning is used at first to learn an in-target-domain selector, which assigns an in-target-domain probability to each sample in the training set. For instance selection, the samples with higher in-target-domain probability are used as training data; for instance weighting, the calibrated in-target-domain probabilities are used as sampling weights for training an instance-weighted naive Bayes model, based on the principle of maximum weighted likelihood estimation. The experimental results prove the necessity and effectiveness of the approach, especially when the size of training data is large. It is also proved that the larger the Kullback-Leibler divergence between the training and test data is, the more effective the proposed approach will be.