Arrow Research search

Author name cluster

Peng Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

JBHI Journal 2026 Journal Article

BLADE: Breast Lesion Analysis with Domain Expertise for DCE-MRI Diagnosis

  • Zhitao Wei
  • Yi Dai
  • Yanting Liang
  • Chinting Wong
  • Yanfen Cui
  • Xiaobo Chen
  • Zhihe Zhao
  • Xiaodong Zheng

Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) is pivotal in breast cancer diagnosis, yet radiologists face challenges in interpreting its complex data due to the lack of robust automated tools. Current lesion diagnosis systems struggle with limited datasets and insufficient integration of domain knowledge. To overcome these limitations, we propose Breast Lesion Analysis with Domain Expertise (BLADE), a novel diagnosis framework that synergizes deep learning with clinical expertise. BLADE leverages a pre-trained vertical foundation model (optimized via Momentum Contrast on 2.1 million MRI slices) as its encoder, ensuring robust feature extraction. Crucially, the system incorporates prior multi-phasic hemodynamic knowledge to emulate radiologists' diagnostic reasoning and introduces a Breast Imaging Reporting and Data System (BI-RADS)-based constraint during training to align predictions with clinical standards. Extensive experiments demonstrate that BLADE outperforms state-of-the-art methods, achieving an Area Under the Curve (AUC) of 0.9228 and 0.9553 on two external test datasets, respectively. Notably, BLADE significantly enhances clinical workflow; when used as an assistive tool, it improves diagnostic accuracy by 14.31%, surpassing the standalone performance of clinicians. This work bridges the gap between AI-driven analysis and clinical practice in breast MRI interpretation. The source code is available at https://github.com/GDPHMediaLab/BLADE.

JBHI Journal 2026 Journal Article

Decoding Decision-Making and Feedback Interactions: Insights From EEG Activation Network

  • Xucheng Liu
  • Lu Shen
  • Ze Wang
  • Wei Tao
  • Shun Liu
  • Fali Li
  • Peng Xu
  • Tzyy-Ping Jung

The interaction of the brain’s decision-making and feedback stages is crucial for guiding human behavior. Previous studies mainly focused on the interaction immediately after the feedback, resulting in a limited understanding of brain communication dynamics during the interaction process. This study examined the communication dynamics of the brain network during decision-feedback interaction under various feedback conditions by employing a newly developed activation network approach to reveal its underlying neural mechanism. Thirty participants completed a decision-feedback task that involved a sequence of cue-induced predictions with highly predictable, somewhat predictable, and unpredictable feedback conditions. We constructed the activation network for all experimental stages using source-level EEG data in the alpha band. Notably, the brain exhibited the highest communication efficiency ($p < 0.05$) in receiving and integrating feedback with decision-making information during the feedback stage. Furthermore, the network-behavior correlations indicated that the brain tends to evaluate unexpected feedback under highly predictable conditions and expected feedback under unpredictable conditions, suggesting distinct neural strategies of the decision-feedback interaction process. Finally, we decoded the optimization process of decision-feedback interaction across the entire task. Although network correlations between the decision and feedback stages decreased over time (highly predictable: $r = -0.447$, $p = 0.001$; unpredictable: $r = -0.305$, $p = 0.032$), classification accuracy significantly improved ($r = -0.448$, $p = 0.010$, best accuracy: 86.667%) under the highly predictable condition, corresponding with enhanced prediction behavior. These results indicate an optimization of cognitive resource allocation that supports more efficient interaction and improved predictive performance. Our findings advance the understanding of the mechanisms of decision-feedback interaction.

AAAI Conference 2026 Conference Paper

KCLNet: Electrically Equivalence-Oriented Graph Representation Learning for Analog Circuits

  • Peng Xu
  • Yapeng Li
  • Tinghuan Chen
  • Tsung-Yi Ho
  • Bei Yu

Digital circuit representation learning has made remarkable progress in electronic design automation, effectively supporting critical tasks such as testability analysis and logic reasoning. However, representation learning for analog circuits remains challenging due to their continuous electrical characteristics, in contrast to the discrete states of digital circuits. This paper presents a direct-current (DC) electrical-equivalence-oriented analog representation learning framework, named KCLNet. It comprises an asynchronous graph neural network structure with electrically-simulated message passing and a representation learning method inspired by Kirchhoff's Current Law (KCL). This method maintains the orderliness of the circuit embedding space by enforcing equality between the sums of outgoing and incoming current embeddings at each node, which significantly enhances the generalization ability of circuit embeddings. KCLNet offers a novel and effective solution for analog circuit representation learning with electrical constraints preserved. Experimental results demonstrate that our method achieves significant performance on a variety of downstream tasks, e.g., analog circuit classification, subcircuit detection, and circuit edit distance prediction. We will open-source the dataset and code upon publication.
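As a rough illustration of the KCL-inspired constraint described in this abstract (our own sketch, not the authors' code), one can penalize, at every node, the difference between the summed "current" embeddings of incoming and outgoing edges:

```python
# Hypothetical sketch of a KCL-style embedding regularizer: for every node,
# the summed embeddings of incoming edges should equal the summed embeddings
# of outgoing edges, mirroring Kirchhoff's Current Law.

def kcl_penalty(edges, embeddings):
    """edges: list of (src, dst) pairs; embeddings: dict edge -> list[float].
    Returns the total squared KCL residual summed over all nodes."""
    nodes = {n for e in edges for n in e}
    dim = len(next(iter(embeddings.values())))
    penalty = 0.0
    for n in nodes:
        residual = [0.0] * dim
        for e in edges:
            sign = (e[1] == n) - (e[0] == n)  # +1 incoming, -1 outgoing
            if sign:
                residual = [r + sign * v for r, v in zip(residual, embeddings[e])]
        penalty += sum(r * r for r in residual)
    return penalty
```

A balanced cycle (every node has one incoming and one outgoing edge with identical embeddings) has zero penalty; any imbalance yields a positive value that a training loss could minimize.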

JBHI Journal 2026 Journal Article

Multi-Site rs-fMRI Domain Alignment for Autism Spectrum Disorder Auxiliary Diagnosis Based on Hyperbolic Space

  • Yiqian Luo
  • Qiurong Chen
  • Fali Li
  • Peng Xu
  • Yangsong Zhang

Increasing the volume of training data can enable auxiliary diagnostic algorithms for Autism Spectrum Disorder (ASD) to learn more accurate and stable models. However, due to the significant heterogeneity and domain shift of rs-fMRI data across different sites, the accuracy of auxiliary diagnosis remains unsatisfactory. Moreover, there has been limited exploration of multi-source domain adaptation models for ASD recognition, and many existing models lack inherent interpretability, as they do not explicitly incorporate prior neurobiological knowledge such as the hierarchical structure of functional brain networks. To address these challenges, we propose a domain-adaptive algorithm based on hyperbolic space embedding. Hyperbolic space is naturally suited for representing the topology of complex networks such as brain functional networks. Therefore, we embed the brain functional network into hyperbolic space and construct the corresponding hyperbolic space community network to effectively extract latent representations. To address the heterogeneity of data across different sites and the issue of domain shift, we introduce a constraint loss function, Hyperbolic Maximum Mean Discrepancy (HMMD), to align the marginal distributions in hyperbolic space. Additionally, we employ class prototype alignment to mitigate discrepancies in conditional distributions across domains. Experimental results indicate that the proposed algorithm achieves superior classification performance for ASD compared to baseline models, with improved robustness to multi-site heterogeneity. Specifically, our method achieves an average accuracy improvement of 4.03%. Moreover, its generalization capability is further validated through experiments conducted on additional Major Depressive Disorder (MDD) datasets.
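The HMMD idea, an MMD computed with geodesic distances taken in hyperbolic rather than Euclidean space, can be sketched as follows. This is an illustrative reconstruction using the Poincaré ball distance and a Gaussian kernel, not the paper's implementation:

```python
import math

def poincare_dist(u, v):
    """Geodesic distance in the Poincaré ball model of hyperbolic space."""
    du = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = sum(a * a for a in u)   # squared norms must stay below 1
    nv = sum(b * b for b in v)
    return math.acosh(1 + 2 * du / ((1 - nu) * (1 - nv)))

def hmmd2(X, Y, sigma=1.0):
    """Biased squared MMD estimate with a Gaussian kernel on hyperbolic
    distances between two samples X and Y of points in the Poincaré ball."""
    k = lambda a, b: math.exp(-poincare_dist(a, b) ** 2 / (2 * sigma ** 2))
    kxx = sum(k(a, b) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(k(a, b) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(k(a, b) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

Identically distributed samples give a value near zero, while site-shifted samples give a positive value, which is what an alignment loss would drive down.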

JBHI Journal 2026 Journal Article

Nonparametric Dynamic Granger Causality based on Multi-Space Spectrum Fusion for Time-varying Directed Brain Network Construction

  • Chanlin Yi
  • Jiamin Zhang
  • Zihan Weng
  • Wanjun Chen
  • Dezhong Yao
  • Fali Li
  • Zehong Cao
  • Peiyang Li

Nonparametric estimation of time-varying directed networks can unveil the intricate transient organization of directed brain communication while circumventing constraints imposed by prescribed model-driven methods. A robust time-frequency representation, the foundation of its causality inference, is critical for enhancing its reliability. This study proposes a novel method, i.e., nonparametric dynamic Granger causality based on Multi-space Spectrum Fusion (ndGCMSF), which integrates complementary spectrum information from different spaces to generate enhanced spectral representations for estimating dynamic causalities across brain regions. Systematic simulations and validations demonstrate that ndGCMSF exhibits superior noise resistance and a powerful ability to capture subtle dynamic changes in directed brain networks. In particular, ndGCMSF revealed that during motor imagery, laterality in the hemisphere ipsilateral to the hemiplegic limb emerges at task onset and diminishes upon task accomplishment. These intrinsic variations further provide features for assessing motor functions. ndGCMSF offers powerful functional patterns to derive effective brain networks in dynamically changing operational settings and contributes to extensive areas involving dynamical and directed communications.

JBHI Journal 2026 Journal Article

OrthoDetNet: An Enhanced YOLO-Based Framework for Detection of Orthopedic Surgical Instruments

  • Peng Xu
  • Guangquan Zhou
  • Mengxing Liu
  • Chu Guo
  • Yixuan Qiu
  • Yang Chen
  • Ping Zhou

Accurate detection of surgical instruments is critical for both routine surgical procedures and surgical robotics research. To the best of our knowledge, there is a notable lack of datasets and dedicated detection studies specifically addressing orthopedic surgical instruments. Detecting orthopedic surgical instruments presents particular challenges, including significant size variations, highly similar shapes, and frequent, severe occlusions due to instrument intersections. To address these issues, we propose an orthopedic surgical instrument detection method (OrthoDetNet) incorporating three specialized modules. The FilterUnit mitigates occlusion effects via an adaptive feature filtering mechanism that dynamically adjusts its filtering strategy based on context, prioritizing features from key regions while suppressing distracting interference features. The DEUnit enhances fine-grained feature discrimination in local regions to distinguish instruments with high shape similarity, and the BDFusion module improves multi-scale detection performance through bi-directional feature fusion between deep- and shallow-level feature maps. A dataset for orthopedic surgical instrument detection is created based on the proximal femoral nail antirotation (PFNA) instrument package manufactured by Shenzhen Mindray Bio-Medical Electronics Co., Ltd. Images were captured in a controlled, simulated experimental environment, ensuring no patient privacy or ethical concerns, and we obtained explicit authorization from the manufacturer for instrument use. Experimental results on this dataset demonstrate the effectiveness of OrthoDetNet and its constituent modules.

AAAI Conference 2026 Conference Paper

SDNet: LiDAR Semantic Scene Completion with Sparse-Dense Fusion and Input-Aware Label Refinement

  • Tingming Bai
  • Zhiyu Xiang
  • Peng Xu
  • Tianyu Pu
  • Kai Wang
  • Eryun Liu

LiDAR Semantic Scene Completion (SSC) in autonomous driving requires predicting both dense occupancy and semantic labels from a sparse input point cloud. Existing methods typically adopt a cascaded architecture for feature dilation and semantic abstraction, which blurs distinctive geometric patterns and reduces feature discriminability. Moreover, given an input, conventional processing of the ground-truth labels overlooks voxel predictability in the target, resulting in ill-posed supervision and discarding informative voxels. To address these limitations, we propose Sparse-Dense Net (SDNet), a dual-branch architecture that processes the input points through parallel sparse and dense encoders. The complementary features are aligned and fused using a Sparse-Dense Feature Fusion (SDFF) module and further refined by a Feature Propagation (FP) module. Additionally, we introduce an input-aware label refinement strategy, including Sparse-Guided Filtering (SGF) to filter unpredictable targets and Ignored Voxel Recycling (IVR) to leverage informative ignored voxels for auxiliary supervision. These innovations enhance both feature learning and label quality. Extensive experiments on the SemanticKITTI and nuScenes OpenOccupancy datasets validate the effectiveness of our approach, with SDNet achieving state-of-the-art performance on both datasets and ranking 1st on the official SemanticKITTI benchmark with 42.1 mIoU, outperforming the previous best by 4.2 (+11.1%).

JBHI Journal 2026 Journal Article

Vision Sensing-Driven Intelligent Ocular Disease Detection Using Conformer-Based Dual Fusion

  • Zhiwei Guo
  • Qin Zhang
  • Peng Xu
  • Yu Shen
  • Chinmay Chakraborty
  • Osama Alfarraj
  • Keping Yu

Deep vision sensing has become a practical tool in early disease detection, and this work targets an important branch of it: ocular disease recognition. Although many researchers have paid attention to this problem in past years, fine-grained ocular feature extraction remains a challenge. To handle this issue, this work benefits from the comprehensive ability of the convolution-Transformer structure (Conformer) and proposes vision sensing-driven intelligent ocular disease detection using Conformer-based dual fusion. On the one hand, the proposal combines the technical advantages of convolution and the visual Transformer to more accurately fuse local subtle features and global representation information in images. On the other hand, the proposal significantly improves the accuracy and robustness of the model by optimizing its depth and width. Simulation experiments on real-world ocular disease image datasets show that the proposed model exhibits higher performance in ocular disease detection compared to other methods. Numerical results show that it improves detection accuracy by 1% to 3.7% compared to several mainstream baseline methods. This research not only promotes the development of ocular disease detection, but also provides more reliable technical support for accurate diagnosis of ophthalmic diseases.

NeurIPS Conference 2025 Conference Paper

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

  • Yang Chen
  • Zhuolin Yang
  • Zihan Liu
  • Chankyu Lee
  • Peng Xu
  • Mohammad Shoeybi
  • Bryan Catanzaro
  • Wei Ping

Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of frontier models, such as DeepSeek-R1, including data curation strategies and the RL training recipe, are often omitted. Moreover, recent research indicates distillation remains more effective than RL for smaller models. In this work, we demonstrate that large-scale RL can significantly enhance the reasoning capabilities of strong, small- and mid-sized models, achieving results that surpass those of state-of-the-art distillation-based models. We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts. Notably, we find that math-only RL not only significantly enhances the performance of strong distilled models on math benchmarks (e.g., +14.6% / +17.2% on AIME 2025 for the 7B / 14B models), but also on code reasoning tasks (e.g., +6.8% / +5.8% on LiveCodeBench for the 7B / 14B models). In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We develop a robust data curation pipeline to collect challenging prompts with high-quality, verifiable answers and test cases to enable verification-based RL across both domains. The dataset will be released to support open research. Finally, we identify key experimental insights, including curriculum learning with progressively increasing response lengths and the stabilizing effect of on-policy parameter updates. We find that RL not only elicits the foundational reasoning capabilities acquired during pretraining and supervised fine-tuning (SFT), but also pushes the limits of the model’s reasoning ability, enabling it to solve problems that were previously unsolvable.

ECAI Conference 2025 Conference Paper

PRACTISED: Pattern-Based Representation Alignment for Cross-Domain Time Series Anomaly Detection

  • Anyi Zhang
  • Peng Xu

Anomaly detection in multivariate time series is particularly challenging under heterogeneous domain settings, where data characteristics vary significantly across domains. While prior methods have primarily focused on homogeneous adaptation, the problem of learning transferable anomaly semantics across heterogeneous domains has received limited attention. To this end, we propose a pattern-based representation alignment method for cross-domain time series anomaly detection (PRACTISED). We introduce the notion of a pattern as an abstract semantic unit that captures diverse types of anomalies across heterogeneous domains. Instead of aligning input characteristics directly, PRACTISED learns domain-invariant semantic representations guided by transferable patterns, enabling more robust and generalizable anomaly detection across domains. Extensive experiments on both homogeneous and heterogeneous MTS-AD datasets demonstrate that PRACTISED not only maintains strong performance under homogeneous settings, but, more importantly, significantly outperforms existing methods in heterogeneous transfer scenarios. This work also contributes toward constructing large-scale pre-training datasets by aligning multiple datasets from different domains, thereby addressing a critical challenge in building large-scale pretrained anomaly detection models.

AAAI Conference 2025 Conference Paper

SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection

  • Ruoyu Xu
  • Zhiyu Xiang
  • Chenwei Zhang
  • Hanzhi Zhong
  • Xijun Zhao
  • Ruina Dang
  • Peng Xu
  • Tianyu Pu

3D object detection is one of the fundamental perception tasks for autonomous vehicles. Fulfilling such a task with a 4D millimeter-wave radar is very attractive since the sensor is able to acquire 3D point clouds similar to Lidar while maintaining robust measurements under adverse weather. However, due to the high sparsity and noise associated with the radar point clouds, the performance of the existing methods is still much lower than expected. In this paper, we propose a novel Semi-supervised Cross-modality Knowledge Distillation (SCKD) method for 4D radar-based 3D object detection. It characterizes the capability of learning the feature from a Lidar-radar-fused teacher network with semi-supervised distillation. We first propose an adaptive fusion module in the teacher network to boost its performance. Then, two feature distillation modules are designed to facilitate the cross-modality knowledge transfer. Finally, a semi-supervised output distillation is proposed to increase the effectiveness and flexibility of the distillation framework. With the same network structure, our radar-only student trained by SCKD boosts the mAP by 10.38% over the baseline and outperforms the state-of-the-art works on the VoD dataset. The experiment on ZJUODset also shows 5.12% mAP improvements on the moderate difficulty level over the baseline when extra unlabeled data are available.

NeurIPS Conference 2024 Conference Paper

AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

  • Jian Guan
  • Wei Wu
  • Zujie Wen
  • Peng Xu
  • Hongning Wang
  • Minlie Huang

The notable success of large language models (LLMs) has sparked an upsurge in building language agents to complete various complex tasks. We present AMOR, an agent framework based on open-source LLMs, which reasons with external knowledge bases and adapts to specific domains through human supervision of the reasoning process. AMOR builds its reasoning logic over a finite state machine (FSM) that solves problems through autonomous executions and transitions over disentangled modules. This allows humans to provide direct feedback to the individual modules, and thus naturally forms process supervision. Based on this reasoning and feedback framework, we develop AMOR through two-stage fine-tuning: warm-up and adaptation. The former fine-tunes the LLM with examples automatically constructed from various public datasets, enabling AMOR to generalize across different knowledge environments, while the latter tailors AMOR to specific domains using process feedback. Extensive experiments across multiple domains demonstrate the advantage of AMOR over strong baselines, thanks to its FSM-based reasoning and process feedback mechanism. The code and data are publicly available at https://github.com/JianGuanTHU/AMOR.
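The FSM-over-modules pattern can be illustrated with a toy agent (the module names and the tiny knowledge base below are hypothetical, not AMOR's actual modules): each module returns its output together with the next state, so every step is individually inspectable, which is what makes per-module process feedback possible.

```python
# Toy finite-state-machine agent: each state is a disentangled module that
# updates a shared context and names the next state to transition to.

def decompose(ctx):
    ctx["queries"] = [ctx["question"].lower()]
    return ctx, "retrieve"

def retrieve(ctx):
    kb = {"capital of france": "Paris"}        # stand-in knowledge base
    ctx["evidence"] = [kb.get(q, "") for q in ctx["queries"]]
    return ctx, "answer"

def answer(ctx):
    ctx["answer"] = ctx["evidence"][0] or "unknown"
    return ctx, "done"

MODULES = {"decompose": decompose, "retrieve": retrieve, "answer": answer}

def run_agent(question, start="decompose"):
    ctx, state = {"question": question}, start
    trace = []                                  # per-module trace enables targeted feedback
    while state != "done":
        ctx, next_state = MODULES[state](ctx)
        trace.append(state)
        state = next_state
    return ctx["answer"], trace
```

Because the trace records which module produced which intermediate result, a human can correct a single faulty transition instead of judging only the final answer.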

NeurIPS Conference 2024 Conference Paper

ChatQA: Surpassing GPT-4 on Conversational QA and RAG

  • Zihan Liu
  • Wei Ping
  • Rajarshi Roy
  • Peng Xu
  • Chankyu Lee
  • Mohammad Shoeybi
  • Bryan Catanzaro

In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA). To enhance generation, we propose a two-stage instruction tuning method that significantly boosts the performance of RAG. For effective retrieval, we introduce a dense retriever optimized for conversational QA, which yields results comparable to the alternative state-of-the-art query rewriting models, while substantially reducing deployment costs. We also present the ChatRAG Bench, which encompasses ten datasets covering comprehensive evaluations on RAG, table-related QA, arithmetic calculations, and scenarios involving unanswerable questions. Our ChatQA-1.0-70B (score: 54.14), built on Llama2, a weaker foundation model than GPT-4, can slightly outperform GPT-4-0613 (score: 53.90) and GPT-4-Turbo-2024-04-09 (score: 54.03) on the ChatRAG Bench, without relying on any synthetic data from OpenAI GPT models. Notably, the Llama3-ChatQA-1.5-70B model surpasses the accuracy of GPT-4-Turbo-2024-04-09 by a margin. These results demonstrate the exceptional quality of the proposed ChatQA recipe. To advance research in this field, we open-sourced the model weights, instruction tuning data, ChatRAG Bench, and retriever for the community.

JBHI Journal 2024 Journal Article

CroMAM: A Cross-Magnification Attention Feature Fusion Model for Predicting Genetic Status and Survival of Gliomas Using Histological Images

  • Jisen Guo
  • Peng Xu
  • Yuankui Wu
  • Yunyun Tao
  • Chu Han
  • Jiatai Lin
  • Ke Zhao
  • Zaiyi Liu

Predicting gene mutation status in whole slide images (WSIs) is crucial for the clinical treatment, cancer management, and research of gliomas. With advancements in CNN and Transformer algorithms, several promising models have been proposed. However, existing studies have paid little attention to fusing multi-magnification information, and their models require processing all patches from a whole slide image. In this paper, we propose a cross-magnification attention model called CroMAM for predicting the genetic status and survival of gliomas. CroMAM first utilizes a systematic patch extraction module to sample a subset of representative patches for downstream analysis. Next, it applies a Swin Transformer to extract local and global features from patches at different magnifications, followed by acquiring high-level features and dependencies among single-magnification patches through a Vision Transformer. Subsequently, CroMAM exchanges the integrated feature representations of different magnifications and encourages each representation to learn discriminative information from the other magnifications. Additionally, we design a cross-magnification attention analysis method to examine the effect of cross-magnification attention quantitatively and qualitatively, which increases the model's explainability. To validate the performance of the model, we compare the proposed model with other multi-magnification feature fusion models on three tasks in two datasets. Extensive experiments demonstrate that the proposed model achieves state-of-the-art performance in predicting the genetic status and survival of gliomas.

JBHI Journal 2024 Journal Article

Drug-Target Prediction Based on Dynamic Heterogeneous Graph Convolutional Network

  • Peng Xu
  • Zhitao Wei
  • Chuchu Li
  • Jiaqi Yuan
  • Zaiyi Liu
  • Wenbin Liu

Novel drug-target interaction (DTI) prediction is crucial in drug discovery and repositioning. Recently, graph neural networks (GNNs) have shown promising results in identifying DTIs by using thresholds to construct heterogeneous graphs. However, an empirically selected threshold can lead to loss of valuable information, especially in sparse networks, a common scenario in DTI prediction. To make full use of limited information, we propose a DTI prediction model based on a Dynamic Heterogeneous Graph (DT-DHG), and introduce progressive learning to adjust the receptive fields of nodes. The experimental results show that our method significantly improves the performance of the original GNNs and is robust to the choice of backbone. Meanwhile, DT-DHG outperforms state-of-the-art methods and effectively predicts novel DTIs.

JBHI Journal 2024 Journal Article

Nonparametric Dynamic Granger Causality based on Multi-Space Spectrum Fusion for Time-varying Directed Brain Network Construction

  • Chanlin Yi
  • Jiamin Zhang
  • Zihan Weng
  • Wanjun Chen
  • Dezhong Yao
  • Fali Li
  • Zehong Cao
  • Peiyang Li

Nonparametric estimation of time-varying directed networks can unveil the intricate transient organization of directed brain communication while circumventing constraints imposed by prescribed model-driven methods. A robust time-frequency representation, the foundation of its causality inference, is critical for enhancing its reliability. This study proposes a novel method, i.e., nonparametric dynamic Granger causality based on Multi-space Spectrum Fusion (ndGCMSF), which integrates complementary spectrum information from different spaces to generate reliable spectral representations for estimating dynamic causalities across brain regions. Systematic simulations and validations demonstrate that ndGCMSF exhibits superior noise resistance and a powerful ability to capture subtle dynamic changes in directed brain networks. In particular, ndGCMSF revealed that during instruction-response movements, laterality in the hemisphere ipsilateral to the hemiplegic limb emerges at instruction onset and diminishes upon task accomplishment. These intrinsic variations further provide reliable features for distinguishing two types of hemiplegia (left vs. right) and assessing motor functions. ndGCMSF offers powerful functional patterns to derive effective brain networks in dynamically changing operational settings and contributes to extensive areas involving dynamical and directed communications.

AAAI Conference 2024 Conference Paper

p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

  • Haoyuan Wu
  • Xinyun Zhang
  • Peng Xu
  • Peiyu Liao
  • Xufeng Yao
  • Bei Yu

Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. In light of the rapidly increasing size of pre-trained VLMs, parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning. One such approach is the adapter, which introduces a few trainable parameters into the pre-trained models while preserving the original parameters during adaptation. In this paper, we present a novel modeling framework that recasts adapter tuning after attention as a graph message passing process on attention graphs, where the projected query and value features and attention matrix constitute the node features and the graph adjacency matrix, respectively. Within this framework, tuning adapters in VLMs necessitates handling heterophilic graphs, owing to the disparity between the projected query and value space. To address this challenge, we propose a new adapter architecture, p-adapter, which employs p-Laplacian message passing in Graph Neural Networks (GNNs). Specifically, the attention weights are re-normalized based on the features, and the features are then aggregated using the calibrated attention matrix, enabling the dynamic exploitation of information with varying frequencies in the heterophilic attention graphs. We conduct extensive experiments on different pre-trained VLMs and multi-modal tasks, including visual question answering, visual entailment, and image captioning. The experimental results validate our method's significant superiority over other PETL methods. Our code is available at https://github.com/wuhy68/p-Adapter/.
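The framework's central recasting, that one attention step is message passing on a dense graph whose adjacency matrix is the attention matrix and whose node features are the projected value vectors, can be sketched as follows. This is our own illustration of standard scaled dot-product attention viewed as message passing, not the paper's p-Laplacian variant:

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_as_message_passing(Q, K, V):
    """One attention step read as GNN message passing: the row-normalized
    attention matrix A is the graph adjacency, the value vectors V are the
    node features, and each output node aggregates its neighbors' features."""
    d = len(Q[0])
    # adjacency: A[i][j] = softmax_j(q_i . k_j / sqrt(d))
    A = [softmax([sum(q * k for q, k in zip(Q[i], K[j])) / math.sqrt(d)
                  for j in range(len(K))]) for i in range(len(Q))]
    # message passing: node i aggregates value features weighted by A
    return [[sum(A[i][j] * V[j][t] for j in range(len(V)))
             for t in range(len(V[0]))] for i in range(len(Q))]
```

In this view, replacing the fixed softmax aggregation with a feature-dependent re-normalization (as the p-adapter does) amounts to changing the message-passing operator on a heterophilic graph.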

ICML Conference 2023 Conference Paper

Feature Expansion for Graph Neural Networks

  • Jiaqi Sun
  • Lin Zhang
  • Guangyi Chen 0002
  • Peng Xu
  • Kun Zhang 0001
  • Yujiu Yang 0001

Graph neural networks aim to learn representations for graph-structured data and show impressive performance in node classification. Recently, many methods have studied the representations of GNNs from the perspective of optimization goals and spectral graph theory. However, the feature space that dominates representation learning has not been systematically studied in graph neural networks. In this paper, we propose to fill this gap by analyzing the feature space of both spatial and spectral models. We decompose graph neural networks into determined feature spaces and trainable weights, providing the convenience of studying the feature space explicitly using matrix space analysis. In particular, we find theoretically that the feature space tends to be linearly correlated due to repeated aggregations. In this case, the feature space is bounded by the poor representation of shared weights or the limited dimensionality of node attributes in existing models, leading to poor performance. Motivated by these findings, we propose 1) feature subspaces flattening and 2) structural principal components to expand the feature space. Extensive experiments verify the effectiveness of our proposed more comprehensive feature space, with comparable inference time to the baseline, and demonstrate its efficient convergence capability.

NeurIPS Conference 2022 Conference Paper

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

  • Boxin Wang
  • Wei Ping
  • Chaowei Xiao
  • Peng Xu
  • Mostofa Patwary
  • Mohammad Shoeybi
  • Bo Li
  • Anima Anandkumar

Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study along three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we demonstrate that using self-generated datasets consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 3× smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3× larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more effort to unlearn the toxic content seen at pretraining. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves many parameters but also achieves a better trade-off between toxicity and perplexity than whole-model adaptation for large-scale models. Our code will be available at https://github.com/NVIDIA/Megatron-LM/.

NeurIPS Conference 2022 Conference Paper

Factuality Enhanced Language Models for Open-Ended Text Generation

  • Nayeon Lee
  • Wei Ping
  • Peng Xu
  • Mostofa Patwary
  • Pascale N Fung
  • Mohammad Shoeybi
  • Bryan Catanzaro

Pretrained language models (LMs) are prone to generating text with nonfactual information. In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation. We design the FactualityPrompts test set and metrics to measure the factuality of LM generations. Based on that, we study the factual accuracy of LMs with parameter sizes ranging from 126M to 530B. Interestingly, we find that larger LMs are more factual than smaller ones, although a previous study suggests that larger LMs can be less truthful in terms of misconceptions. In addition, popular sampling algorithms (e.g., top-p) in open-ended text generation can harm factuality due to the ``uniform randomness'' introduced at every sampling step. We propose the factual-nucleus sampling algorithm, which dynamically adapts the randomness to improve the factuality of generation while maintaining quality. Furthermore, we analyze the inefficiencies of the standard training method in learning correct associations between entities from a factual text corpus (e.g., Wikipedia). We propose a factuality-enhanced training method that uses TopicPrefix for better awareness of facts and sentence completion as the training objective, which can vastly reduce factual errors.
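The factual-nucleus idea can be sketched as a nucleus threshold that decays as a sentence proceeds, so later tokens are sampled less randomly. The decay form `max(omega, p * lam**t)` and all parameter values below are illustrative assumptions layered on top of standard top-p sampling.

```python
import numpy as np

def factual_nucleus_p(step_in_sentence, p=0.9, lam=0.9, omega=0.3):
    """Decayed nucleus threshold: p_t = max(omega, p * lam**t).

    Resetting step_in_sentence at each sentence boundary keeps
    early-sentence diversity while damping late-sentence randomness.
    """
    return max(omega, p * lam ** step_in_sentence)

def nucleus_sample(probs, p, rng):
    """Standard top-p (nucleus) sampling over a probability vector."""
    order = np.argsort(probs)[::-1]            # tokens by descending prob
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1       # smallest nucleus covering mass p
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=renorm)
```

A generation loop would call `factual_nucleus_p` with the token index within the current sentence and pass the result to `nucleus_sample`.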

NeurIPS Conference 2021 Conference Paper

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

  • Zhaojiang Lin
  • Andrea Madotto
  • Genta Winata
  • Peng Xu
  • Feijun Jiang
  • Yuxiang Hu
  • Chen Shi
  • Pascale N Fung

Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches. We provide state-of-the-art baselines under three evaluation settings (monolingual, bilingual, and cross-lingual). The analysis of our baselines in different settings highlights 1) the effectiveness of training a bilingual ToD system compared to two independent monolingual ToD systems, and 2) the potential of leveraging a bilingual knowledge base and cross-lingual transfer learning to improve system performance in low-resource conditions.

AAAI Conference 2020 Conference Paper

Attention-Informed Mixed-Language Training for Zero-Shot Cross-Lingual Task-Oriented Dialogue Systems

  • Zihan Liu
  • Genta Indra Winata
  • Zhaojiang Lin
  • Peng Xu
  • Pascale Fung

Recently, data-driven task-oriented dialogue systems have achieved promising performance in English. However, developing dialogue systems that support low-resource languages remains a long-standing challenge due to the absence of high-quality data. In order to circumvent the expensive and time-consuming data collection, we introduce Attention-Informed Mixed-Language Training (MLT), a novel zero-shot adaptation method for cross-lingual task-oriented dialogue systems. It leverages very few task-related parallel word pairs to generate code-switching sentences for learning the interlingual semantics across languages. Instead of manually selecting the word pairs, we propose to extract source words based on the scores computed by the attention layer of a trained English task-related model and then generate word pairs using existing bilingual dictionaries. Furthermore, intensive experiments with different cross-lingual embeddings demonstrate the effectiveness of our approach. Finally, with very few word pairs, our model achieves significant zero-shot adaptation performance improvements in both cross-lingual dialogue state tracking and natural language understanding (i.e., intent detection and slot filling) tasks compared to the current state-of-the-art approaches, which utilize a much larger amount of bilingual data.
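The attention-informed selection step can be sketched as: rank source words by attention score, then code-switch the top-ranked words that have bilingual dictionary entries. The function name, the cap `k`, and the toy inputs below are illustrative, not the paper's implementation.

```python
def mixed_language_sentence(tokens, attention_scores, bilingual_dict, k=2):
    """Code-switch the k most-attended words that have dictionary entries.

    A toy sketch of attention-informed word selection; the real method
    scores words with the attention layer of a trained task model.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: -attention_scores[i])
    swapped, out = 0, list(tokens)
    for i in ranked:
        if swapped < k and tokens[i] in bilingual_dict:
            out[i] = bilingual_dict[tokens[i]]   # replace with translation
            swapped += 1
    return out
```

Training on such code-switched sentences is what exposes the model to interlingual semantics without full parallel corpora.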

AAAI Conference 2020 System Paper

CAiRE: An End-to-End Empathetic Chatbot

  • Zhaojiang Lin
  • Peng Xu
  • Genta Indra Winata
  • Farhad Bin Siddique
  • Zihan Liu
  • Jamin Shin
  • Pascale Fung

We present CAiRE, an end-to-end generative empathetic chatbot designed to recognize user emotions and respond in an empathetic manner. Our system adapts the Generative Pre-trained Transformer (GPT) to the empathetic response generation task via transfer learning. CAiRE is built primarily to focus on empathy integration in fully data-driven generative dialogue systems. We create a web-based user interface which allows multiple users to asynchronously chat with CAiRE. CAiRE also collects user feedback and continues to improve its response quality by discarding undesirable generations via active learning and negative training.

ICML Conference 2020 Conference Paper

On Variational Learning of Controllable Representations for Text without Supervision

  • Peng Xu
  • Jackie C. K. Cheung
  • Yanshuai Cao

The variational autoencoder (VAE) can learn the manifold of natural images on certain datasets, as evidenced by meaningful interpolation or extrapolation in the continuous latent space. However, on discrete data such as text, it is unclear if unsupervised learning can discover a similar latent space that allows controllable manipulation. In this work, we find that sequence VAEs trained on text fail to properly decode when the latent codes are manipulated, because the modified codes often land in holes or vacant regions in the aggregated posterior latent space, where the decoding network fails to generalize. Both as a validation of the explanation and as a fix to the problem, we propose to constrain the posterior mean to a learned probability simplex, and perform manipulation within this simplex. Our proposed method mitigates the latent vacancy problem and achieves the first success in unsupervised learning of controllable representations for text. Empirically, our method outperforms unsupervised baselines and strong supervised approaches on text style transfer, and is capable of performing more flexible fine-grained control over text generation than existing methods.
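One way to read the simplex constraint: parameterize the posterior mean as a convex combination of K learned basis vectors, so any manipulated code necessarily stays inside the learned simplex. This is a minimal sketch under that reading; the variable names and shapes are mine.

```python
import numpy as np

def simplex_constrained_mean(logits, basis):
    """Posterior mean as a convex combination of learned basis vectors.

    softmax(logits) lies on the probability simplex, so the output
    basis.T @ w never leaves the convex hull of the K basis rows.
    basis has shape (K, d); logits has shape (K,).
    """
    w = np.exp(logits - logits.max())   # numerically stable softmax
    w /= w.sum()
    return basis.T @ w
```

Manipulation then amounts to moving the simplex weights `w` rather than the raw latent code, which avoids landing in vacant regions of the posterior.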

AAAI Conference 2019 Conference Paper

Multi-View Information-Theoretic Co-Clustering for Co-Occurrence Data

  • Peng Xu
  • Zhaohong Deng
  • Kup-Sze Choi
  • Longbing Cao
  • Shitong Wang

Multi-view clustering has received much attention recently. Most of the existing multi-view clustering methods only focus on one-sided clustering. As co-occurring data elements involve the counts of sample-feature co-occurrences, it is more efficient to conduct two-sided clustering along the samples and features simultaneously. To take advantage of two-sided clustering for co-occurrences in the setting of multi-view clustering, a two-sided multi-view clustering method is proposed, i.e., multi-view information-theoretic co-clustering (MV-ITCC). The proposed method realizes two-sided clustering for co-occurring multi-view data under the formulation of information theory. More specifically, it exploits the agreement and disagreement among views by sharing a common clustering result along the sample dimension and keeping the clustering results of each view specific along the feature dimension. In addition, a maximum-entropy mechanism is adopted to control the importance of different views, which can strike the right balance in leveraging the agreement and disagreement. Extensive experiments are conducted on text and image multi-view datasets. The results clearly demonstrate the superiority of the proposed method.
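The maximum-entropy weighting can be illustrated in closed form: minimizing a weighted sum of per-view losses plus a negative-entropy penalty over the simplex yields softmax-style weights. This closed form is a standard result for entropy-regularized weighting; the parameter `gamma` and the exact objective are illustrative, not necessarily the paper's.

```python
import numpy as np

def view_weights(view_losses, gamma=1.0):
    """Entropy-regularized view weights.

    Minimizing sum_v w_v * L_v + gamma * sum_v w_v * log(w_v) over the
    probability simplex gives w_v proportional to exp(-L_v / gamma).
    Large gamma pushes the weights toward uniform; small gamma
    concentrates weight on the best-fitting view.
    """
    z = np.exp(-np.asarray(view_losses) / gamma)
    return z / z.sum()
```

Here `gamma` plays the balancing role the abstract describes: it trades off trusting the views that agree against discarding the information in disagreeing views.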

NeurIPS Conference 2018 Conference Paper

GIANT: Globally Improved Approximate Newton Method for Distributed Optimization

  • Shusen Wang
  • Fred Roosta
  • Peng Xu
  • Michael Mahoney

For distributed computing environments, we consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, which is sent to the main driver. The main driver then averages all the ANT directions received from workers to form a Globally Improved ANT (GIANT) direction. GIANT is highly communication-efficient and naturally exploits the trade-offs between local computation and global communication, in that more local computation results in fewer overall rounds of communication. Theoretically, we show that GIANT enjoys an improved convergence rate compared with first-order methods and existing distributed Newton-type methods. Further, and in sharp contrast with many existing distributed Newton-type methods, as well as popular first-order methods, a highly advantageous practical feature of GIANT is that it involves only one tuning parameter. We conduct large-scale experiments on a computer cluster and, empirically, demonstrate the superior performance of GIANT.
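The iteration above can be sketched for ridge-regularized least squares: each "worker" solves its local Hessian against the shared global gradient (its ANT direction), and the "driver" averages the directions. This single-process sketch uses a unit step size for simplicity; the paper analyzes step-size selection and general losses.

```python
import numpy as np

def local_hessian(X_i, reg):
    """Local ridge least-squares Hessian on one worker's shard."""
    return X_i.T @ X_i / X_i.shape[0] + reg * np.eye(X_i.shape[1])

def giant_step(shards, w, reg):
    """One GIANT iteration for ridge regression (illustrative sketch).

    shards is a list of (X_i, y_i) pairs, one per worker. Each worker
    solves its local Hessian against the global gradient; the driver
    averages the resulting ANT directions.
    """
    n = sum(X.shape[0] for X, _ in shards)
    g = sum(X.T @ (X @ w - y) for X, y in shards) / n + reg * w  # global gradient
    ant = [np.linalg.solve(local_hessian(X, reg), g) for X, _ in shards]
    return w - np.mean(ant, axis=0)
```

Because each local Hessian approximates the global one when shards are large, the averaged direction behaves like a Newton step while communicating only gradients and d-dimensional directions.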

IROS Conference 2017 Conference Paper

Supervisory control of a DaVinci surgical robot

  • Der-Lin Chow
  • Peng Xu
  • Eser Tuna
  • Siqi Huang
  • Murat Cenk Çavusoglu
  • Wyatt S. Newman

This paper presents an approach to supervisory control of a DaVinci surgical robot. At present, such robots are controlled by teleoperation, with dissimilar kinematics of the operator interface vs. the robot. As a result, it can be difficult for the operator to visualize the kinematic restrictions on the robot, particularly for desired extended, precise trajectories, such as circular needle driving. The interface presented here constitutes a means to elevate the operator from teleoperation mode to supervisory mode. The operator interacts directly with a point-cloud display, allowing selection of task specifications from which the system automatically computes and executes precise trajectories to achieve the task goals. The intent is to allow the operator to focus on task specifications and rely on automation to achieve faster and more precise execution.

NeurIPS Conference 2016 Conference Paper

Sub-sampled Newton Methods with Non-uniform Sampling

  • Peng Xu
  • Jiyan Yang
  • Fred Roosta
  • Christopher Ré
  • Michael Mahoney

We consider the problem of finding the minimizer of a convex function $F: \mathbb{R}^d \rightarrow \mathbb{R}$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$, where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime where $n \gg d$. We propose randomized Newton-type algorithms that exploit non-uniform sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity; they are applicable to a wide range of problems in machine learning. Two non-uniform sampling distributions, based on block norm squares and block partial leverage scores, are considered. Under certain assumptions, we show that our algorithms inherit a linear-quadratic convergence rate in $w$ and achieve a lower computational complexity compared to similar existing methods. In addition, we show that our algorithms exhibit more robustness and better dependence on problem-specific quantities, such as the condition number. We numerically demonstrate the advantages of our algorithms on several real datasets.
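The block-norm-squares distribution can be sketched directly: sample Hessian blocks with probability proportional to their squared norms, and reweight so the sub-sampled Hessian is an unbiased estimate of the average Hessian. The function names and the Frobenius-norm choice are my own illustration of the general idea.

```python
import numpy as np

def block_norm_squares_probs(A_list):
    """Sampling probabilities proportional to squared Frobenius norms
    of the low-rank factors A_i (Hessian blocks H_i = A_i^T @ A_i)."""
    scores = np.array([np.linalg.norm(A, 'fro') ** 2 for A in A_list])
    return scores / scores.sum()

def subsampled_hessian(A_list, probs, s, rng):
    """Importance-weighted estimate of (1/n) * sum_i A_i^T @ A_i.

    Dividing each sampled block by (s * n * p_i) makes the estimator
    unbiased under the non-uniform sampling distribution probs.
    """
    n, d = len(A_list), A_list[0].shape[1]
    H = np.zeros((d, d))
    for i in rng.choice(n, size=s, p=probs):
        H += (A_list[i].T @ A_list[i]) / (s * n * probs[i])
    return H
```

A Newton-type method would then solve against `subsampled_hessian(...)` instead of the full Hessian, trading a little variance for much less per-iteration work when $n \gg d$.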

IROS Conference 2012 Conference Paper

A vision of the patient room as an architectural-robotic ecosystem

  • Anthony Threatt
  • Jessica Merino
  • Keith E. Green
  • Ian D. Walker
  • Johnell O. Brooks
  • Sean Ficht
  • Robert Kriener
  • Mary Mossey

Healthcare is becoming more digital and technological, but healthcare environments have not yet become embedded with digital technologies to support the most productive (physical) interaction between medical patients, clinical staff and the physical artifacts that surround and envelop them. This shortcoming is an opportunity for the architecture and robotics communities to interface with each other and the everyday users of healthcare environments. Our extended lab spent ten weeks sketching in hardware a robotic patient-room ecosystem we call home+ with the help of clinicians at the Roger C. Peace Rehabilitation Hospital of the Greenville Hospital System University Medical Center [GHS]. This early prototyping effort represents our vision for the larger robotic patient room, and identifies opportunities for more focused work on an Assistive Robotic Table (ART).

NeurIPS Conference 2008 Conference Paper

Short-Term Depression in VLSI Stochastic Synapse

  • Peng Xu
  • Timothy Horiuchi
  • Pamela Abshire

We report a compact realization of short-term depression (STD) in a VLSI stochastic synapse. The behavior of the circuit is based on a subtractive single release model of STD. Experimental results agree well with simulation and exhibit expected STD behavior: the transmitted spike train has negative autocorrelation and lower power spectral density at low frequencies which can remove redundancy in the input spike train, and the mean transmission probability is inversely proportional to the input spike rate which has been suggested as an automatic gain control mechanism in neural systems. The dynamic stochastic synapse could potentially be a powerful addition to existing deterministic VLSI spiking neural systems.
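The subtractive single-release model described above can be sketched in a few lines: each input spike is transmitted with probability p, a transmission subtracts a fixed amount from p, and p recovers exponentially toward its resting value between spikes. All parameter values and the exact recovery form are illustrative assumptions.

```python
import math
import random

def std_synapse(spike_times, p0=0.8, dp=0.2, tau=50.0, seed=0):
    """Subtractive single-release model of short-term depression (STD).

    Each input spike at time t (ms) is transmitted with probability p;
    a transmission subtracts dp from p, and p recovers toward p0
    between spikes with time constant tau (ms).
    """
    rng = random.Random(seed)
    p, last_t, out = p0, None, []
    for t in spike_times:
        if last_t is not None:
            p = p0 - (p0 - p) * math.exp(-(t - last_t) / tau)  # exponential recovery
        if rng.random() < p:
            out.append(t)                  # spike transmitted
            p = max(0.0, p - dp)           # subtractive depression
        last_t = t
    return out
```

Run at a high input rate, this model reproduces the behavior the abstract describes: the faster the input spike train, the less p recovers between spikes, so the mean transmission probability falls with input rate.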

NeurIPS Conference 2004 Conference Paper

Using Random Forests in the Structured Language Model

  • Peng Xu
  • Frederick Jelinek

In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.