Arrow Research search

Author name cluster

Peng Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

44 papers
2 author rows

Possible papers

44

JBHI Journal 2026 Journal Article

MVGF-DR: Multi-view Graph Feature Fusion Approach for Drug Repositioning

  • Bing Wang
  • Tongxin Wu
  • Jun Zhang
  • Peng Chen
  • Kun Lu

Drug repositioning, exploring new indications for existing drugs, is emerging as a promising approach to accelerate drug discovery and reduce the risk of research failure. Recent advances applying graph neural networks have enabled researchers to achieve significant results by extracting latent features from the original data. However, previous studies have not fully considered the distinctive information embedded within differently constructed graphs, which may lead to insufficient classification performance due to the lack of more detailed features. This work therefore proposes a novel approach, namely MVGF-DR, which leverages graph network construction and multi-view graph feature fusion for drug repositioning. MVGF-DR builds a comprehensive graph network from both similarity and association information, i.e., a similarity graph network is constructed with drug-drug and disease-disease similarities, where similarity information is extracted by graph isomorphism networks, and an association graph network is constructed with drug-disease associations, where drug-disease relationships are explored by graph convolutional networks. Additionally, a maximum value selection strategy is introduced to filter features from different channels for feature fusion and noise reduction. The average AUROC and AUPR achieved by MVGF-DR across the three datasets reached 95.38% and 51.20%, respectively, outperforming the other five state-of-the-art models. Multiple experiments further demonstrated the flexibility and practical applicability of MVGF-DR.

AAAI Conference 2026 Conference Paper

Towards Non-Stationary Time Series Forecasting with Temporal Stabilization and Frequency Differencing

  • Junkai Lu
  • Peng Chen
  • Chenjuan Guo
  • Yang Shu
  • Meng Wang
  • Bin Yang

Time series forecasting is critical for decision making across dynamic domains such as energy, finance, transportation, and cloud computing. However, real-world time series often exhibit non-stationarity, including temporal distribution shifts and spectral variability, which poses significant challenges for existing long-term time series forecasting methods. In this paper, we propose DTAF, a dual-branch framework that addresses non-stationarity in both the temporal and frequency domains. For the temporal domain, the Temporal Stabilizing Fusion (TFS) module employs a non-stationary mix of experts (MOE) filter to disentangle and suppress temporal non-stationary patterns while preserving long-term dependencies. For the frequency domain, the Frequency Wave Modeling (FWM) module applies frequency differencing to dynamically highlight components with significant spectral shifts. By fusing the complementary outputs of TFS and FWM, DTAF generates robust forecasts that adapt to both temporal and frequency domain non-stationarity. Extensive experiments on multiple real-world benchmarks demonstrate that DTAF outperforms state-of-the-art baselines, yielding significant improvements in forecasting accuracy under non-stationary conditions.

AAAI Conference 2026 Conference Paper

ViTCoP: Accelerating Large Vision-Language Models via Visual and Textual Semantic Collaborative Pruning

  • Wen Luo
  • Peng Chen
  • Xiaotao Huang
  • LiQun Huang

Large Vision-Language Models (LVLMs) incur high computational costs due to significant redundancy in their visual tokens. To effectively reduce this cost, researchers have proposed various visual token pruning methods. However, existing methods are generally limited, either losing critical visual information prematurely due to pruning in the vision encoder, or leading to information redundancy among the selected tokens due to pruning in the Large Language Models (LLMs). To address these challenges, we propose a Visual and Textual Semantic Collaborative Pruning framework (ViTCoP) that combines redundancy filtering in the vision encoder with step-wise co-pruning within the LLM based on its hierarchical characteristics, to efficiently preserve critical and informationally diverse visual tokens. Meanwhile, to ensure compatibility with acceleration techniques like FlashAttention, we introduce the L2 norm of K-vectors as the token saliency metric in the LLM. Extensive experiments on various Large Vision-Language Models demonstrate that ViTCoP not only achieves state-of-the-art performance surpassing existing methods on both image and video understanding tasks, but also significantly reduces model inference latency and GPU memory consumption. Notably, its performance advantage over other methods becomes even more pronounced under extreme pruning rates.
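The key-norm saliency idea in this abstract can be illustrated with a minimal sketch. The score depends only on the key vectors, so it does not need the attention matrix that fused kernels such as FlashAttention avoid materializing. The sketch assumes one key vector per visual token, assumes that a larger L2 norm means higher saliency (the abstract does not state the direction), and keeps a fixed number of tokens; the names `key_l2_norms`, `select_tokens_by_key_norm`, and `keep` are hypothetical and not taken from the paper.

```python
import math


def key_l2_norms(keys):
    """Return the L2 norm of each key vector (one vector per token)."""
    return [math.sqrt(sum(x * x for x in k)) for k in keys]


def select_tokens_by_key_norm(keys, keep):
    """Return the indices of the `keep` tokens with the largest key norm.

    Assumption (not from the abstract): larger norm means more salient.
    Set reverse=False below to rank in the opposite direction.
    Indices are returned in their original order so token order is preserved.
    """
    norms = key_l2_norms(keys)
    ranked = sorted(range(len(keys)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:keep])


# Four toy tokens with 2-dimensional keys.
keys = [[3.0, 4.0], [0.1, 0.2], [1.0, 0.0], [6.0, 8.0]]
kept = select_tokens_by_key_norm(keys, keep=2)
```

Here the first and last tokens have the largest key norms, so they are the two that survive pruning.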

IROS Conference 2025 Conference Paper

A Gait Phase Detection and Gait Spatio-temporal Features Extraction Method Based on the Inertial Measurement Unit

  • Shuai Fan 0011
  • Huiyong Luo
  • Yao Xiao
  • Ye Liang
  • Zelin Su
  • Guangkui Song
  • Peng Chen

The quantitative evaluation of improvement in physical function is crucial for patients with impaired motor function, such as stroke patients, when conducting rehabilitation training. Specifically, a practical and easy-to-operate gait feature detection and extraction system for home use is urgently needed. In this study, a home gait feature extraction method based on the inertial measurement unit is proposed. The subjects’ walking distance and speed are calculated using the double integral and the number of strides is calculated using the local maximum peak approach, while the stance phase and swing phase are calculated using the local trough approach. The comparison shows that the average walking distance accuracy is about 91.32% and the average stride accuracy is about 96.55%. The proportion of the stance period (59.01%) and swing period (40.99%) estimated by the inertial measurement unit is close to the ratio of the two at normal speed. The experimental results demonstrate that the gait spatio-temporal features are extracted with high accuracy. The proposed method facilitates gait evaluation in clinics and at home, including the extraction of gait features and real-time evaluation.
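The two computations named in this abstract, stride counting by local maximum peaks and distance by double integration, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes uniformly sampled forward acceleration, a subject starting at rest, and a hypothetical `min_height` threshold for accepting a peak. Raw double integration accumulates drift, so the sketch is illustrative only.

```python
def count_local_peaks(signal, min_height):
    """Count samples that exceed both neighbours and reach min_height."""
    count = 0
    for i in range(1, len(signal) - 1):
        is_peak = signal[i - 1] < signal[i] > signal[i + 1]
        if is_peak and signal[i] >= min_height:
            count += 1
    return count


def cumulative_trapezoid(values, dt):
    """Integrate uniformly sampled values with the trapezoidal rule, starting at 0."""
    out = [0.0]
    for prev, cur in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out


def distance_from_acceleration(accel, dt):
    """Double-integrate acceleration (m/s^2) to distance (m), starting at rest."""
    velocity = cumulative_trapezoid(accel, dt)
    position = cumulative_trapezoid(velocity, dt)
    return position[-1]


# Constant 2 m/s^2 for 1 s should give 0.5 * 2 * 1^2 = 1 m.
distance = distance_from_acceleration([2.0] * 101, dt=0.01)

# A toy signal with three clear peaks above the threshold.
strides = count_local_peaks([0, 1, 0, 2, 0, 3, 0], min_height=0.5)
```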

JBHI Journal 2025 Journal Article

An Eye Video Oriented and rPPG-Based Intraocular Pressure Detection Method

  • Kun Zheng
  • Xuejia Zhen
  • Boxiang Hu
  • Guang Chen
  • Xinming Peng
  • Jing Lu
  • Peng Chen

Current intraocular pressure (IOP) measurement methods still primarily rely on contact IOP measurement instruments, which are inconvenient for widespread use. This study proposed an innovative method for detecting and classifying IOP using eye videos, based on remote photoplethysmography (rPPG). The IOP-Net model was developed by extracting blood volume pulse (BVP) signals from three regions of interest (ROI), namely the pupil, iris, and sclera, and training a convolutional neural network (CNN) with four convolutional layers. This model can be used to detect the IOP and to classify it as normal or high. The root mean square errors (RMSE) on the EVIP-1 and EVIP-2 datasets were 3.14 mmHg and 4.19 mmHg, respectively. When the ground-truth IOP is more than 30 mmHg, the accuracy of the model in classifying high IOP reaches 80.25%. The results indicate that this method shows promising potential for video-based IOP detection and classification.

ECAI Conference 2025 Conference Paper

ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning

  • Xinyi Wang
  • Jiashui Wang
  • Jinbo Su
  • Ke Wang
  • Peng Chen
  • Yanming Liu
  • Long Liu
  • Xiang Li

Assembly code analysis and comprehension play critical roles in applications like reverse engineering, yet they face substantial challenges due to low information density and a lack of explicit syntactic structures. While traditional masked language modeling (MLM) approaches do not explicitly focus on natural language interaction, emerging decoder-focused large language models (LLMs) demonstrate partial success in binary analysis yet remain underexplored for holistic comprehension. We present Assembly Augmented Tuning (ASMA-Tune), an end-to-end structural-semantic instruction tuning framework that synergizes encoder architecture with decoder-based LLMs through a projector module, where the assembly encoder extracts hardware-level structural features, the projector bridges representations with the semantic space, and the instruction-tuned LLM preserves natural language capabilities. Experimental results demonstrate three key advantages: (1) State-of-the-art performance in assembly comprehension with +39.7% Recall@1 and +17.8% MRR improvements over GPT-4-Turbo, (2) Consistent enhancements across base models (24.6–107.4% Recall@1 and 15.2–106.3% MRR on Qwen2.5-Coder, Deepseek-Coder and CodeLlama variants), and (3) Superior instruction-following capabilities (41.5%–118% improvements) with controlled code generation degradation (–8.9% to –35% across architectures).

IJCAI Conference 2025 Conference Paper

Backdoor Attack on Vertical Federated Graph Neural Network Learning

  • Jirui Yang
  • Peng Chen
  • Zhihui Lu
  • Jianping Zeng
  • Qiang Duan
  • Xin Du
  • Ruijun Deng

Federated Graph Neural Networks (FedGNN) integrate federated learning (FL) with graph neural networks (GNNs) to enable privacy-preserving training on distributed graph data. Vertical Federated Graph Neural Network (VFGNN), a key branch of FedGNN, handles scenarios where data features and labels are distributed among participants. Despite the robust privacy-preserving design of VFGNN, we have found that it still faces the risk of backdoor attacks, even in situations where labels are inaccessible. This paper proposes BVG, a novel backdoor attack method that leverages multi-hop triggers and backdoor retention, requiring only four target-class nodes to execute effective attacks. Experimental results demonstrate that BVG achieves nearly 100% attack success rates across three commonly used datasets and three GNN models, with minimal impact on the main task accuracy. We also evaluated various defense methods, and the BVG method maintained high attack effectiveness even under existing defenses. This finding highlights the need for advanced defense mechanisms to counter sophisticated backdoor attacks in practical VFGNN applications.

IJCAI Conference 2025 Conference Paper

Enhancing Multimodal Protein Function Prediction Through Dual-Branch Dynamic Selection with Reconstructive Pre-Training

  • Xiaoling Luo
  • Peng Chen
  • Chengliang Liu
  • Xiaopeng Jin
  • Jie Wen
  • Yumeng Liu
  • Junsong Wang

Multimodal protein features play a crucial role in protein function prediction. However, these features encompass a wide range of information, ranging from structural data and sequence features to protein attributes and interaction networks, making it challenging to decipher their complex interconnections. In this work, we propose a multimodal protein function prediction method (DSRPGO) by utilizing dynamic selection and reconstructive pre-training mechanisms. To acquire complex protein information, we introduce reconstructive pre-training to mine more fine-grained information with low semantic levels. Moreover, we put forward the Bidirectional Interaction Module (BInM) to facilitate interactive learning among multimodal features. Additionally, to address the difficulty of hierarchical multi-label classification in this task, a Dynamic Selection Module (DSM) is designed to select the feature representation that is most conducive to current protein function prediction. Our proposed DSRPGO model improves significantly in BPO, MFO, and CCO on human datasets, thereby outperforming other benchmark models.

AAAI Conference 2025 Conference Paper

GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians

  • Xiaobao Wei
  • Peng Chen
  • Ming Lu
  • Hui Chen
  • Feng Tian

Rendering photorealistic head avatars from arbitrary viewpoints is crucial for various applications like virtual reality. Although previous methods based on Neural Radiance Fields (NeRF) can achieve impressive results, they lack fidelity and efficiency. Recent methods using 3D Gaussian Splatting (3DGS) have improved rendering quality and real-time performance but still require significant storage overhead. In this paper, we introduce a method called GraphAvatar that utilizes Graph Neural Networks (GNN) to generate 3D Gaussians for the head avatar. Specifically, GraphAvatar trains a geometric GNN and an appearance GNN to generate the attributes of the 3D Gaussians from the tracked mesh. Therefore, our method can store the GNN models instead of the 3D Gaussians, significantly reducing the storage overhead to just 10MB. To reduce the impact of face-tracking errors, we also present a novel graph-guided optimization module to refine face-tracking parameters during training. Finally, we introduce a 3D-aware enhancer for post-processing to enhance the rendering quality. We conduct comprehensive experiments to demonstrate the advantages of GraphAvatar, surpassing existing methods in visual fidelity and storage consumption. The ablation study sheds light on the trade-offs between rendering quality and model size.

IROS Conference 2025 Conference Paper

Hierarchical Decision-Making for Autonomous Navigation: Integrating Deep Reinforcement Learning and Fuzzy Logic in Four-Wheel Independent Steering and Driving Systems

  • Yizhi Wang
  • Degang Xu
  • Yongfang Xie
  • Shuzhong Tan
  • Xianan Zhou
  • Peng Chen

This paper presents a hierarchical decision-making framework for autonomous navigation in four-wheel independent steering and driving (4WISD) systems. The proposed approach integrates deep reinforcement learning (DRL) for high-level navigation with fuzzy logic for low-level control to ensure both task performance and physical feasibility. The DRL agent generates global motion commands, while the fuzzy logic controller enforces kinematic constraints to prevent mechanical strain and wheel slippage. Simulation experiments demonstrate that the proposed framework outperforms traditional navigation methods, offering enhanced training efficiency and stability and mitigating erratic behaviors compared to purely DRL-based solutions. Real-world validations further confirm the framework’s ability to navigate safely and effectively in dynamic industrial settings. Overall, this work provides a scalable and reliable solution for deploying 4WISD mobile robots in complex, real-world scenarios.

IROS Conference 2025 Conference Paper

Industrial-Grade Sensor Simulation via Gaussian Splatting: A Modular Framework for Scalable Editing and Full-Stack Validation

  • Xianming Zeng
  • Sicong Du
  • Qifeng Chen
  • Lizhe Liu
  • Haoyu Shu
  • Jiaxuan Gao
  • Jiarun Liu
  • Jiulong Xu

Sensor simulation is pivotal for scalable validation of autonomous driving systems, yet existing Neural Radiance Fields (NeRF) based methods face applicability and efficiency challenges in industrial workflows. This paper introduces a Gaussian Splatting (GS) based system to address these challenges: We first break down sensor simulator components and analyze the possible advantages of GS over NeRF. Then in practice, we refactor three crucial components through GS, to leverage its explicit scene representation and real-time rendering: (1) choosing the 2D neural Gaussian representation for physics-compliant scene and sensor modeling, (2) proposing a scene editing pipeline to leverage Gaussian primitives library for data augmentation, and (3) coupling a controllable diffusion model for scene expansion and harmonization. We implement this framework on a proprietary autonomous driving dataset supporting cameras and LiDAR sensors. We demonstrate through ablation studies that our approach reduces frame-wise simulation latency, achieves better geometric and photometric consistency, and enables interpretable explicit scene editing and expansion. Furthermore, we showcase how integrating such a GS-based sensor simulator with traffic and dynamic simulators enables full-stack testing of end-to-end autonomy algorithms. Our work provides both algorithmic insights and practical validation, establishing GS as a cornerstone for industrial-grade sensor simulation.

ICLR Conference 2025 Conference Paper

Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data

  • Phillip Si
  • Peng Chen

Accurate modeling and prediction of complex physical systems often rely on data assimilation techniques to correct errors inherent in model simulations. Traditional methods like the Ensemble Kalman Filter (EnKF) and its variants as well as the recently developed Ensemble Score Filters (EnSF) face significant challenges when dealing with high-dimensional and nonlinear Bayesian filtering problems with sparse observations, which are ubiquitous in real-world applications. In this paper, we propose a novel data assimilation method, Latent-EnSF, which leverages EnSF with efficient and consistent latent representations of the full states and sparse observations to address the joint challenges of high dimensionality in states and high sparsity in observations for nonlinear Bayesian filtering. We introduce a coupled Variational Autoencoder (VAE) with two encoders to encode the full states and sparse observations in a consistent way guaranteed by a latent distribution matching and regularization as well as a consistent state reconstruction. With comparison to several methods, we demonstrate the higher accuracy, faster convergence, and higher efficiency of Latent-EnSF for two challenging applications with complex models in shallow water wave propagation and medium-range weather forecasting, for highly sparse observations in both space and time.

NeurIPS Conference 2025 Conference Paper

Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Imaging Inverse Problems

  • Deliang Wei
  • Peng Chen
  • Haobo Xu
  • Jiale Yao
  • Fang Li
  • Tieyong Zeng

Plug-and-play (PnP) methods with deep denoisers have shown impressive results in imaging problems. They typically require strong convexity or smoothness of the fidelity term and a (residual) non-expansive denoiser for convergence. These assumptions, however, are violated in Poisson inverse problems, and non-expansiveness can hinder denoising performance. To address these challenges, we propose a cocoercive conservative (CoCo) denoiser, which may be (residual) expansive, leading to improved denoising performance. By leveraging the generalized Helmholtz decomposition, we introduce a novel training strategy that combines Hamiltonian regularization to promote conservativeness and spectral regularization to ensure cocoerciveness. We prove that CoCo denoiser is a proximal operator of a weakly convex function, enabling a restoration model with an implicit weakly convex prior. The global convergence of PnP methods to a stationary point of this restoration model is established. Extensive experimental results demonstrate that our approach outperforms closely related methods in both visual quality and quantitative metrics.

NeurIPS Conference 2025 Conference Paper

PDPO: Parametric Density Path Optimization

  • Sebastian Gutierrez Hernandez
  • Peng Chen
  • Hao-Min Zhou

We introduce Parametric Density Path Optimization (PDPO), a novel method for computing action-minimizing paths between probability densities. The core idea is to represent the target probability path as the pushforward of a reference density through a parametric map, transforming the original infinite-dimensional optimization over densities to a finite-dimensional one over the parameters of the map. We derive a static formulation of the dynamic problem of action minimization and propose cubic spline interpolation of the path in parameter space to solve the static problem. Theoretically, we establish an error bound of the action under proper assumptions on the regularity of the parameter path. Empirically, we find that using 3–5 control points of the spline interpolation suffices to accurately resolve both multimodal and high-dimensional problems. We demonstrate that PDPO can flexibly accommodate a wide range of potential terms, including those modeling obstacles, mean-field interactions, stochastic control, and higher-order dynamics. Our method outperforms existing state-of-the-art approaches in benchmark tasks, demonstrating superior computational efficiency and solution quality.

NeurIPS Conference 2025 Conference Paper

Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency

  • Peng Chen
  • Hailiang Zhao
  • Jiaji Zhang
  • Xueyan Tang
  • Yixuan Wang
  • Shuiguang Deng

The online caching problem aims to minimize cache misses when serving a sequence of requests under a limited cache size. While naive learning-augmented caching algorithms achieve ideal $1$-consistency, they lack robustness guarantees. Existing robustification methods either sacrifice $1$-consistency or introduce excessive computational overhead. In this paper, we introduce Guard, a lightweight robustification framework that enhances the robustness of a broad class of learning-augmented caching algorithms to $2H_{k-1} + 2$, while preserving their $1$-consistency. Guard achieves the current best-known trade-off between consistency and robustness, with only $\mathcal{O}(1)$ additional per-request overhead, thereby maintaining the original time complexity of the base algorithm. Extensive experiments across multiple real-world datasets and prediction models validate the effectiveness of Guard in practice.
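To give a feel for the size of the robustness bound $2H_{k-1} + 2$, the sketch below evaluates it for a few cache sizes. It assumes that $H_n$ denotes the $n$-th harmonic number and $k$ the cache size, as is conventional in caching analysis; the abstract does not define either symbol, and the function names are hypothetical.

```python
def harmonic(n):
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, n + 1))


def robustness_bound(k):
    """Evaluate 2 * H_{k-1} + 2 for a cache of size k >= 1.

    Assumption: k is the cache size and H is the harmonic number.
    """
    if k < 1:
        raise ValueError("cache size k must be at least 1")
    return 2.0 * harmonic(k - 1) + 2.0


# Bound for a few cache sizes.
bounds = {k: robustness_bound(k) for k in (1, 2, 3, 100)}
```

Because harmonic numbers grow logarithmically, the bound increases slowly with the cache size.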

NeurIPS Conference 2025 Conference Paper

SHF: Symmetrical Hierarchical Forest with Pretrained Vision Transformer Encoder for High-Resolution Medical Segmentation

  • Enzhi Zhang
  • Peng Chen
  • Rui Zhong
  • Du Wu
  • Jun Igarashi
  • Isaac Lyngaas
  • Xiao Wang
  • Masaharu Munetomo

This paper presents a novel approach to addressing the long-sequence problem in high-resolution medical images for Vision Transformers (ViTs). Using smaller patches as tokens can enhance ViT performance, but quadratically increases computation and memory requirements. Therefore, the common practice for applying ViTs to high-resolution images is either to: (a) employ complex sub-quadratic attention schemes or (b) use large to medium-sized patches and rely on additional mechanisms within the model to capture the spatial hierarchy of details. We propose Symmetrical Hierarchical Forest (SHF), a lightweight approach that adaptively patches the input image to increase token information density and encode hierarchical spatial structures into the input embedding. We then apply a reverse depatching scheme to the output embeddings of the transformer encoder, eliminating the need for convolution-based decoders. Unlike previous methods that modify attention mechanisms or use a complex hierarchy of interacting models, SHF can be retrofitted to any ViT model to allow it to learn the hierarchical structure of details in high-resolution images without requiring architectural changes. Experimental results demonstrate significant gains in computational efficiency and performance: on the PAIP WSI dataset, we achieved a 3$\sim$32$\times$ speedup or a 2.95% to 7.03% increase in accuracy (measured by Dice score) at a $64K^2$ resolution with the same computational budget, compared to state-of-the-art production models. On the 3D medical datasets BTCV and KiTS, training was 6$\times$ faster, with accuracy gains of 6.93% and 5.9%, respectively, compared to models without SHF.

JMLR Journal 2025 Journal Article

Simplex Constrained Sparse Optimization via Tail Screening

  • Peng Chen
  • Jin Zhu
  • Junxian Zhu
  • Xueqin Wang

We consider the probabilistic simplex-constrained sparse recovery problem. The commonly used Lasso-type penalty for promoting sparsity is ineffective in this context since it is constant within the simplex. Despite this challenge, the simplex constraint itself fortunately brings a self-regularization property, i.e., the empirical risk minimizer without any sparsity-promoting procedure obtains the usual Lasso-type estimation error. Moreover, we analyze the iterates of a projected gradient descent method and show its convergence to the ground truth sparse solution at a geometric rate until a satisfactory statistical precision is attained. Although the estimation error is statistically optimal, the resulting solution is usually denser than the sparse ground truth. To further sparsify the iterates, we propose a method called PERMITS via embedding a tail screening procedure, i.e., identifying negligible components and discarding them during iterations, into the projected gradient descent method. Furthermore, we combine tail screening and the special information criterion to balance the trade-off between fitness and complexity. Theoretically, the proposed PERMITS method can exactly recover the ground truth support set under mild conditions and thus obtain the oracle property. We demonstrate the statistical and computational efficiency of PERMITS with both synthetic and real data. The implementation of the proposed method can be found in https://github.com/abess-team/PERMITS.
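Two points in this abstract lend themselves to a small sketch. First, every point on the probability simplex has non-negative entries that sum to 1, so its L1 norm is always 1, which is why a Lasso-type penalty cannot distinguish sparse from dense points there. Second, projected gradient descent needs a projection onto the simplex, shown here with the standard sort-based Euclidean projection. The `screen_tail` step and its `tol` threshold are illustrative assumptions only and are not the screening rule used by PERMITS.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        candidate = (cumsum - 1.0) / j
        # Keep the shift from the largest j that leaves u_j strictly positive.
        if uj - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def screen_tail(x, tol):
    """Hypothetical screening step: zero entries below tol, then renormalize."""
    kept = [xi if xi >= tol else 0.0 for xi in x]
    total = sum(kept)
    if total == 0.0:
        return list(x)  # nothing survives the threshold; leave x unchanged
    return [xi / total for xi in kept]


point = project_to_simplex([1.0, 1.0, 1.0])    # lands on the uniform point
l1_norm = sum(abs(xi) for xi in point)         # equals 1 for any simplex point
sparser = screen_tail([0.6, 0.39, 0.01], tol=0.05)
```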

IJCAI Conference 2025 Conference Paper

Universal Backdoor Defense via Label Consistency in Vertical Federated Learning

  • Peng Chen
  • Haolong Xiang
  • Xin Du
  • Xiaolong Xu
  • Xuhao Jiang
  • Zhihui Lu
  • Jirui Yang
  • Qiang Duan

Backdoor attacks in vertical federated learning (VFL) are particularly concerning as they can covertly compromise VFL decision-making, posing a severe threat to critical applications of VFL. Existing defense mechanisms typically involve either label obfuscation during training or model pruning during inference. However, the inherent limitations on the defender's access to the global model and complete training data in VFL environments fundamentally constrain the effectiveness of these conventional methods. To address these limitations, we propose the Universal Backdoor Defense (UBD) framework. UBD leverages Label Consistent Clustering (LCC) to synthesize plausible latent triggers associated with the backdoor class. This synthesized information is then utilized for mitigating backdoor threats through Linear Probing (LP), guided by a constraint on Batch Normalization (BN) statistics. Positioned within a unified VFL backdoor defense paradigm, UBD offers a generalized framework for both detection and mitigation that critically does not necessitate access to the entire model or dataset. Extensive experiments across multiple datasets rigorously demonstrate the efficacy of the UBD framework, achieving state-of-the-art performance against diverse backdoor attack types in VFL, including both dirty-label and clean-label variants.

NeurIPS Conference 2024 Conference Paper

Derivative-enhanced Deep Operator Network

  • Yuan Qiu
  • Nolan Bridges
  • Peng Chen

The deep operator networks (DeepONet), a class of neural operators that learn mappings between function spaces, have recently been developed as surrogate models for parametric partial differential equations (PDEs). In this work we propose a derivative-enhanced deep operator network (DE-DeepONet), which leverages derivative information to enhance the solution prediction accuracy and provides a more accurate approximation of solution-to-parameter derivatives, especially when training data are limited. DE-DeepONet explicitly incorporates linear dimension reduction of high dimensional parameter input into DeepONet to reduce training cost and adds derivative loss in the loss function to reduce the number of required parameter-solution pairs. We further demonstrate that the use of derivative loss can be extended to enhance other neural operators, such as the Fourier neural operator (FNO). Numerical experiments validate the effectiveness of our approach.

ICML Conference 2024 Conference Paper

Learning Pseudo-Contractive Denoisers for Inverse Problems

  • Deliang Wei
  • Peng Chen
  • Fang Li 0004

Deep denoisers have shown excellent performance in solving inverse problems in signal and image processing. In order to guarantee the convergence, the denoiser needs to satisfy some Lipschitz conditions like non-expansiveness. However, enforcing such constraints inevitably compromises recovery performance. This paper introduces a novel training strategy that enforces a weaker constraint on the deep denoiser called pseudo-contractiveness. By studying the spectrum of the Jacobian matrix, relationships between different denoiser assumptions are revealed. Effective algorithms based on gradient descent and Ishikawa process are derived, and further assumptions of strict pseudo-contractiveness yield efficient algorithms using half-quadratic splitting and forward-backward splitting. The proposed algorithms theoretically converge strongly to a fixed point. A training strategy based on holomorphic transformation and functional calculi is proposed to enforce the pseudo-contractive denoiser assumption. Extensive experiments demonstrate superior performance of the pseudo-contractive denoiser compared to related denoisers. The proposed methods are competitive in terms of visual effects and quantitative values.

NeurIPS Conference 2024 Conference Paper

Learning-Augmented Algorithms for the Bahncard Problem

  • Hailiang Zhao
  • Xueyan Tang
  • Peng Chen
  • Shuiguang Deng

In this paper, we study learning-augmented algorithms for the Bahncard problem. The Bahncard problem is a generalization of the ski-rental problem, where a traveler needs to irrevocably and repeatedly decide between a cheap short-term solution and an expensive long-term one with an unknown future. Even though the problem is canonical, only a primal-dual-based learning-augmented algorithm was explicitly designed for it. We develop a new learning-augmented algorithm, named PFSUM, that incorporates both history and short-term future to improve online decision making. We derive the competitive ratio of PFSUM as a function of the prediction error and conduct extensive experiments to show that PFSUM outperforms the primal-dual-based algorithm.
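For context, the ski-rental special case mentioned in this abstract has a classic deterministic break-even strategy: keep renting until the rent paid would reach the purchase price, then buy. The sketch below illustrates only that baseline. It is not PFSUM and uses no predictions; the unit rent per day and the integer `buy_cost` are assumptions made for the example.

```python
def break_even_cost(days_skied, buy_cost):
    """Cost of renting for buy_cost - 1 days and buying on day buy_cost.

    Assumes renting costs 1 per day and buy_cost is a positive integer.
    """
    if days_skied < buy_cost:
        return days_skied  # the season ended before the break-even day
    return (buy_cost - 1) + buy_cost


def offline_optimal_cost(days_skied, buy_cost):
    """Cost with full knowledge of the season length: rent throughout or buy at once."""
    return min(days_skied, buy_cost)


# Worst observed ratio of online to offline cost over season lengths 1..50.
buy_cost = 10
worst_ratio = max(
    break_even_cost(d, buy_cost) / offline_optimal_cost(d, buy_cost)
    for d in range(1, 51)
)
```

With these assumptions the online cost never exceeds 2 - 1/buy_cost times the offline optimum, which is the gap that predictions about the future aim to close.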

NeurIPS Conference 2024 Conference Paper

LiT: Unifying LiDAR "Languages" with LiDAR Translator

  • Yixing Lao
  • Tao Tang
  • Xiaoyang Wu
  • Peng Chen
  • Kaicheng Yu
  • Hengshuang Zhao

LiDAR data exhibits significant domain gaps due to variations in sensors, vehicles, and driving environments, creating “language barriers” that limit the effective use of data across domains and the scalability of LiDAR perception models. To address these challenges, we introduce the LiDAR Translator (LiT), a framework that directly translates LiDAR data across domains, enabling both cross-domain adaptation and multi-domain joint learning. LiT integrates three key components: a scene modeling module for precise foreground and background reconstruction, a LiDAR modeling module that models LiDAR rays statistically and simulates ray-drop, and a fast, hardware-accelerated ray casting engine. LiT enables state-of-the-art zero-shot and unified domain detection across diverse LiDAR datasets, marking a step toward data-driven domain unification for autonomous driving systems. Source code and demos are available at: https://yxlao.github.io/lit.

JMLR Journal 2024 Journal Article

skscope: Fast Sparsity-Constrained Optimization in Python

  • Zezhi Wang
  • Junxian Zhu
  • Xueqin Wang
  • Jin Zhu
  • Huiyang Pen
  • Peng Chen
  • Anran Wang
  • Xiaoke Zhang

Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two examples in the paper, where sparse linear regression and trend filtering are addressed with just four lines of code. More importantly, skscope's efficient implementation allows state-of-the-art solvers to quickly attain the sparse solution regardless of the high dimensionality of parameter space. Numerical experiments reveal the available solvers in skscope can achieve up to 80x speedup on the competing relaxation solutions obtained via the benchmarked convex solver. skscope is published on the Python Package Index (PyPI) and Conda, and its source code is available at: https://github.com/abess-team/skscope.

IJCAI Conference 2024 Conference Paper

Task-Agnostic Self-Distillation for Few-Shot Action Recognition

  • Bin Zhang
  • Yuanjie Dang
  • Peng Chen
  • Ronghua Liang
  • Nan Gao
  • Ruohong Huan
  • Xiaofei He

Task-oriented matching is one of the core aspects of few-shot action recognition. Most previous works leverage the metric features within the support and query sets of individual tasks, without considering the metric information across different matching tasks. This oversight represents a significant limitation in this task. Specifically, the task-specific metric feature can decrease the generalization ability and ignore the general matching feature applicable across different tasks. To address these challenges, we propose a novel meta-distillation framework for few-shot action recognition that learns the task-agnostic metric features and generalizes them to different tasks. First, to extract the task-agnostic metric information, we design a task-based self-distillation framework to learn the metric features progressively during training. Additionally, to equip the model with fine-grained matching capabilities, we design a multi-dimensional distillation module that extracts more detailed relations from the temporal, spatial, and channel dimensions within video pairs and improves the representative performance of metric features for each individual task. After that, the few-shot predictions can be obtained by feeding the embedded task-agnostic metric features to a common feature matcher. Extensive experimental results on standard datasets demonstrate our method’s superior performance compared to existing state-of-the-art methods.

NeurIPS Conference 2023 Conference Paper

PRODIGY: Enabling In-context Learning Over Graphs

  • Qian Huang
  • Hongyu Ren
  • Peng Chen
  • Gregor Kržmanc
  • Daniel Zeng
  • Percy S. Liang
  • Jure Leskovec

In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. In this paper, we develop Pretraining Over Diverse In-Context Graph Systems (PRODIGY), the first pretraining framework that enables in-context learning over graphs. The key idea of our framework is to formulate in-context learning over graphs with a novel prompt graph representation, which connects prompt examples and queries. We then propose a graph neural network architecture over the prompt graph and a corresponding family of in-context pretraining objectives. With PRODIGY, the pretrained model can directly perform novel downstream classification tasks on unseen graphs via in-context learning. We provide empirical evidence of the effectiveness of our framework by showcasing its strong in-context learning performance on tasks involving citation networks and knowledge graphs. Our approach outperforms the in-context learning accuracy of contrastive pretraining baselines with hard-coded adaptation by 18% on average across all setups. Moreover, it also outperforms standard finetuning with limited data by 33% on average with in-context learning.

JBHI Journal 2022 Journal Article

Transformer Model for Functional Near-Infrared Spectroscopy Classification

  • Zenghui Wang
  • Jun Zhang
  • Xiaochu Zhang
  • Peng Chen
  • Bing Wang

Functional near-infrared spectroscopy (fNIRS) is a promising neuroimaging technology. The fNIRS classification problem has always been the focus of the brain-computer interface (BCI). Inspired by the success of Transformer based on self-attention mechanism in the fields of natural language processing and computer vision, we propose an fNIRS classification network based on Transformer, named fNIRS-T. We explore the spatial-level and channel-level representation of fNIRS signals to improve data utilization and network representation capacity. Besides, a preprocessing module, which consists of one-dimensional average pooling and layer normalization, is designed to replace filtering and baseline correction of data preprocessing. It makes fNIRS-T an end-to-end network, called fNIRS-PreT. Compared with traditional machine learning classifiers, convolutional neural network (CNN), and long short-term memory (LSTM), the proposed models obtain the best accuracy on three open-access datasets. Specifically, in the most extensive ternary classification task (30 subjects) that includes three types of overt movements, fNIRS-T, CNN, and LSTM obtain 75.49%, 72.89%, and 61.94% on test sets, respectively. Compared to traditional classifiers, fNIRS-T is at least 27.41% higher than statistical features and 6.79% higher than well-designed features. In the individual subject experiment of the ternary classification task, fNIRS-T achieves an average subject accuracy of 78.22% and surpasses CNN and LSTM by a large margin of +4.75% and +11.33%. fNIRS-PreT using raw data also achieves competitive performance to fNIRS-T. Therefore, the proposed models improve the performance of fNIRS-based BCI significantly.

ICLR Conference 2021 Conference Paper

Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs

  • Zhen Han 0003
  • Peng Chen
  • Yunpu Ma
  • Volker Tresp

Modeling time-evolving knowledge graphs (KGs) has recently gained increasing interest. Here, graph representation learning has become the dominant paradigm for link prediction on temporal KGs. However, the embedding-based approaches largely operate in a black-box fashion, lacking the ability to interpret their predictions. This paper provides a link forecasting framework that reasons over query-relevant subgraphs of temporal KGs and jointly models the structural dependencies and the temporal dynamics. In particular, we propose a temporal relational attention mechanism and a novel reverse representation update scheme to guide the extraction of an enclosing subgraph around the query. The subgraph is expanded by an iterative sampling of temporal neighbors and by attention propagation. Our approach provides human-understandable evidence explaining the forecast. We evaluate our model on four benchmark temporal knowledge graphs for the link forecasting task. While being more explainable, our model obtains a relative improvement of up to 20% on Hits@1 compared to the previous best temporal KG forecasting method. We also conduct a survey with 53 respondents, and the results show that the evidence extracted by the model for link forecasting is aligned with human understanding.

AAAI Conference 2021 Conference Paper

Locate Globally, Segment Locally: A Progressive Architecture With Knowledge Review Network for Salient Object Detection

  • Binwei Xu
  • Haoran Liang
  • Ronghua Liang
  • Peng Chen

Salient object location and segmentation are two different tasks in salient object detection (SOD). The former aims to globally find the most attractive objects in an image, whereas the latter can be achieved using only local regions that contain salient objects. However, previous methods mainly accomplish the two tasks simultaneously in a simple end-to-end manner, which ignores the differences between them. We assume that the human vision system locates and segments objects in order, so we propose a novel progressive architecture with knowledge review network (PA-KRN) for SOD. It consists of three parts. (1) A coarse locating module (CLM) uses body-attention labels to locate rough areas containing salient objects without boundary details. (2) An attention-based sampler highlights salient object regions with high resolution based on body-attention maps. (3) A fine segmenting module (FSM) finely segments salient objects. The networks applied in CLM and FSM are mainly based on our proposed knowledge review network (KRN) that utilizes the finest feature maps to reintegrate all previous layers, which can make up for the important information that is continuously diluted in the top-down path. Experiments on five benchmarks demonstrate that our single KRN can outperform state-of-the-art methods. Furthermore, our PA-KRN performs better and substantially surpasses the aforementioned methods.

AAAI Conference 2021 Conference Paper

SA-BNN: State-Aware Binary Neural Network

  • Chunlei Liu
  • Peng Chen
  • Bohan Zhuang
  • Chunhua Shen
  • Baochang Zhang
  • Wenrui Ding

Binary Neural Networks (BNNs) have recently received significant attention due to their memory and computation efficiency. However, the considerable accuracy gap between BNNs and their full-precision counterparts hinders the deployment of BNNs on resource-constrained platforms. One of the main reasons for the performance gap can be attributed to the frequent weight flip, which is caused by the misleading weight update in BNNs. To address this issue, we propose a state-aware binary neural network (SA-BNN) equipped with a well-designed state-aware gradient. Our SA-BNN is inspired by the observation that the frequent weight flip is more likely to occur when the gradient magnitude for all quantization states {−1, 1} is identical. Accordingly, we propose to employ independent gradient coefficients for different states when updating the weights. Furthermore, we also analyze the effectiveness of the state-aware gradient on suppressing the frequent weight flip problem. Experiments on ImageNet show that the proposed SA-BNN outperforms the current state of the art (e.g., Bi-Real Net) by more than 3% when using a ResNet architecture. Specifically, we achieve 61.7%, 65.5% and 68.7% Top-1 accuracy with ResNet-18, ResNet-34 and ResNet-50 on ImageNet, respectively.

JBHI Journal 2020 Journal Article

A Deep Learning-Based Chemical System for QSAR Prediction

  • ShanShan Hu
  • Peng Chen
  • Pengying Gu
  • Bing Wang

Research on quantitative structure-activity relationships (QSAR) provides an effective approach to determine new hits and promising lead compounds during drug discovery. In the past decades, various works have achieved good QSAR performance with the development of machine learning. The rise of deep learning, along with massive accessible chemical databases, has further improved QSAR performance. This article proposes a novel deep-learning-based method to implement QSAR prediction by concatenating an end-to-end encoder-decoder model and a convolutional neural network (CNN) architecture. The encoder-decoder model is mainly used to generate fixed-size latent features to represent chemical molecules; these features are then input into the CNN framework to train a robust and stable model and finally to predict active chemicals. Two models with different schemes are investigated to evaluate the validity of our proposed model on the same data sets. Experimental results showed that our proposed method outperforms other state-of-the-art methods in identifying whether a chemical molecule is active.

NeurIPS Conference 2020 Conference Paper

Projected Stein Variational Gradient Descent

  • Peng Chen
  • Omar Ghattas

The curse of dimensionality is a longstanding challenge in Bayesian inference in high dimensions. In this work, we propose a projected Stein variational gradient descent (pSVGD) method to overcome this challenge by exploiting the fundamental property of intrinsic low dimensionality of the data-informed subspace stemming from ill-posedness of such problems. We adaptively construct the subspace using a gradient information matrix of the log-likelihood, and apply pSVGD to the much lower-dimensional coefficients of the parameter projection. The method is demonstrated to be more accurate and efficient than SVGD. It is also shown to be more scalable with respect to the number of parameters, samples, data points, and processor cores via experiments with parameter dimensions ranging from the hundreds to the tens of thousands.
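For orientation, here is a minimal NumPy sketch of one plain SVGD update with an RBF kernel, the baseline that pSVGD is compared against. It does not implement the projection step described in the abstract. The function names, the fixed bandwidth `h`, and the step size are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(x, h):
    """RBF kernel matrix and its gradient with respect to the first argument."""
    diff = x[:, None, :] - x[None, :, :]          # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    k = np.exp(-sq_dist / h)                      # k[j, i] = k(x_j, x_i)
    grad_k = -2.0 / h * diff * k[:, :, None]      # gradient of k(x_j, x_i) w.r.t. x_j
    return k, grad_k


def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One SVGD update for particles x of shape (n, d)."""
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    g = grad_log_p(x)                             # score at each particle, shape (n, d)
    # phi_i = (1/n) * sum_j [ k(x_j, x_i) * g_j + grad_{x_j} k(x_j, x_i) ]
    phi = (k.T @ g + grad_k.sum(axis=0)) / n
    return x + step * phi


# Toy usage: move particles toward a standard normal target, whose score is -x.
rng = np.random.default_rng(0)
particles = rng.normal(5.0, 1.0, size=(50, 2))
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)
```

The first term in `phi` drives particles toward high-probability regions, and the kernel-gradient term pushes them apart so they do not collapse to a single point.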

IJCAI Conference 2019 Conference Paper

Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling

  • Pengcheng Yang
  • Fuli Luo
  • Peng Chen
  • Lei Li
  • Zhiyi Yin
  • Xiaodong He
  • Xu Sun

The visual storytelling (VST) task aims at generating a reasonable and coherent paragraph-level story with the image stream as input. Unlike a caption, which is a direct and literal description of image content, the story in the VST task tends to contain plenty of imaginary concepts that do not appear in the image. This requires the AI agent to reason and associate with the imaginary concepts based on implicit commonsense knowledge to generate a reasonable story describing the image stream. Therefore, in this work, we present a commonsense-driven generative model, which aims to introduce crucial commonsense from the external knowledge base for visual storytelling. Our approach first extracts a set of candidate knowledge graphs from the knowledge base. Then, an elaborately designed vision-aware directional encoding schema is adopted to effectively integrate the most informative commonsense. Besides, we strive to maximize the semantic similarity within the output during decoding to enhance the coherence of the generated text. Results show that our approach can outperform the state-of-the-art systems by a large margin, achieving a 29% relative improvement in CIDEr score. With the additional commonsense and the semantic-relevance-based objective, the generated stories are more diverse and coherent.

NeurIPS Conference 2019 Conference Paper

Projected Stein Variational Newton: A Fast and Scalable Bayesian Inference Method in High Dimensions

  • Peng Chen
  • Keyi Wu
  • Joshua Chen
  • Tom O'Leary-Roseberry
  • Omar Ghattas

We propose a projected Stein variational Newton (pSVN) method for high-dimensional Bayesian inference. To address the curse of dimensionality, we exploit the intrinsic low-dimensional geometric structure of the posterior distribution in the high-dimensional parameter space via its Hessian (of the log posterior) operator and perform a parallel update of the parameter samples projected into a low-dimensional subspace by an SVN method. The subspace is adaptively constructed using the eigenvectors of the averaged Hessian at the current samples. We demonstrate fast convergence of the proposed method, complexity independent of the parameter and sample dimensions, and parallel scalability.

IROS Conference 2018 Conference Paper

Design of a 2 Motor 2 Degrees-of-Freedom Coupled Tendon-driven Joint Module

  • Wenyang Li
  • Peng Chen
  • Dianchun Bai
  • Xiaoxiao Zhu
  • Shunta Togo
  • Hiroshi Yokoi
  • Yinlai Jiang

A 2 motor 2 degrees-of-freedom (2M2D) coupled tendon-driven joint module is proposed as a basic component for robot arms. Torque reallocation via tendon coupling can enhance the output torque of one single joint. According to the motor position, the joint module is classified into four types: the externally-actuated structure, the internally-coaxially-actuated structure, the internally-separately-actuated structure, and the hybrid-actuated structure. The four structures are analyzed and compared, and their implementation design examples are given. Experiments comparing the proposed joint module with a directly-actuated traditional joint suggested that the 2M2D coupled tendon-driven joint module can obtain high control accuracy, and that torque reallocation via tendon coupling is effective in improving output torque. Additionally, an anthropomorphic robot arm with low weight and high payload was developed to show the utility of the proposed joint module.

AAAI Conference 2017 Conference Paper

Building Task-Oriented Dialogue Systems for Online Shopping

  • Zhao Yan
  • Nan Duan
  • Peng Chen
  • Ming Zhou
  • Jianshe Zhou
  • Zhoujun Li

We present a general solution towards building task-oriented dialogue systems for online shopping, aiming to assist online customers in completing various purchase-related tasks, such as searching products and answering questions, in a natural language conversation manner. As a pioneering work, we show what and how existing natural language processing techniques, data resources, and crowdsourcing can be leveraged to build such task-oriented dialogue systems for E-commerce usage. To demonstrate its effectiveness, we integrate our system into a mobile online shopping application. To the best of our knowledge, this is the first time that a dialogue system in Chinese is practically used in an online shopping scenario with millions of real consumers. Interesting and insightful observations are shown in the experimental part, based on the analysis of human-bot conversation logs. Several current challenges are also pointed out as our future directions.