Author name cluster

Peng Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

AAAI Conference 2026 Conference Paper

VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection

Jianhang Yao
Yongbin Zheng
Siqi Lu
Wanying Xu
Peng Sun

To identify objects beyond predefined categories, open-vocabulary aerial object detection (OVAD) leverages the zero-shot capabilities of visual-language models (VLMs) to generalize from base to novel categories. Existing approaches typically utilize self-learning mechanisms with weak text supervision to generate region-level pseudo-labels to align detectors with VLMs' semantic spaces. However, text dependence induces semantic bias, restricting open-vocabulary expansion to text-specified concepts. We propose VK-Det, a visual knowledge-guided open-vocabulary object detection framework without extra supervision. First, we discover and leverage the vision encoder's inherent informative region perception to attain fine-grained localization and adaptive distillation. Second, we introduce a novel prototype-aware pseudo-labeling strategy. It models inter-class decision boundaries through feature clustering and maps detection regions to latent categories via prototype matching. This enhances attention to novel objects while compensating for missing supervision. Extensive experiments show state-of-the-art performance, achieving 30.1 mAPᴺ on DIOR and 23.3 mAPᴺ on DOTA, outperforming even methods that use extra supervision.
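
The prototype-matching step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the prototypes are already available as plain feature vectors (for example, cluster centers) and that matching uses cosine similarity with a threshold. The function names, the `min_similarity` parameter, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_prototype(region_feature, prototypes, min_similarity=0.5):
    """Return the label of the most similar prototype, or None if none is close enough.

    `prototypes` maps a (hypothetical) latent category label to its prototype vector.
    """
    best_label = None
    best_sim = min_similarity
    for label, proto in prototypes.items():
        sim = cosine(region_feature, proto)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Toy prototypes for two latent categories (illustrative values only).
toy_prototypes = {"cluster_0": [1.0, 0.0], "cluster_1": [0.0, 1.0]}
pseudo_label = match_prototype([0.9, 0.1], toy_prototypes)
```

A region whose best similarity stays below the threshold receives no pseudo-label, which is one simple way to avoid forcing ambiguous regions into a category.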

AAAI Conference 2025 Conference Paper

DSRC: Learning Density-Insensitive and Semantic-Aware Collaborative Representation Against Corruptions

Jingyu Zhang
Yilei Wang
Lang Qian
Peng Sun
Zengwen Li
Sudong Jiang
Maolin Liu
Liang Song

As a potential application of Vehicle-to-Everything (V2X) communication, multi-agent collaborative perception has achieved significant success in 3D object detection. While these methods have demonstrated impressive results on standard benchmarks, the robustness of such approaches in the face of complex real-world environments requires additional verification. To bridge this gap, we introduce the first comprehensive benchmark designed to evaluate the robustness of collaborative perception methods in the presence of natural corruptions typical of real-world environments. Furthermore, we propose DSRC, a robustness-enhanced collaborative perception method aiming to learn Density-insensitive and Semantic-aware collaborative Representation against Corruptions. DSRC consists of two key designs: i) a semantic-guided sparse-to-dense distillation framework, which constructs multi-view dense objects painted by ground truth bounding boxes to effectively learn density-insensitive and semantic-aware collaborative representation; ii) a feature-to-point cloud reconstruction approach to better fuse critical collaborative representation across agents. To thoroughly evaluate DSRC, we conduct extensive experiments on real-world and simulated datasets. The results demonstrate that our method outperforms state-of-the-art collaborative perception methods in both clean and corrupted conditions.

ICLR Conference 2025 Conference Paper

GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost

Xinyi Shang
Peng Sun
Tao Lin

Recent advancements in dataset distillation have demonstrated the significant benefits of employing soft labels generated by pre-trained teacher models. In this paper, we introduce a novel perspective by emphasizing the full utilization of labels. We first conduct a comprehensive comparison of various loss functions for soft label utilization in dataset distillation, revealing that the model trained on the synthetic dataset exhibits high sensitivity to the choice of loss function for soft label utilization. This finding highlights the necessity of a universal loss function for training models on synthetic datasets. Building on these insights, we introduce an extremely simple yet surprisingly effective plug-and-play approach, GIFT, which encompasses soft label refinement and a cosine similarity-based loss function to efficiently leverage full label information. Extensive experiments indicate that GIFT consistently enhances state-of-the-art dataset distillation methods across various dataset scales without incurring additional computational costs. Importantly, GIFT significantly enhances cross-optimizer generalization, an area previously overlooked. For instance, on ImageNet-1K with IPC = 10, GIFT enhances the state-of-the-art method RDED by 30.8% in cross-optimizer generalization. Our code is available at https://github.com/LINs-lab/GIFT.
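
As a rough illustration of a cosine similarity-based loss on soft labels, the sketch below computes one minus the cosine similarity between a student's predicted distribution and a teacher's soft label. This is an assumed form for illustration only. The exact loss and the soft label refinement step used by GIFT are not specified in this abstract and are not reproduced here, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_soft_label_loss(student_logits, teacher_soft_label):
    """1 - cos(softmax(student_logits), teacher_soft_label); 0 when the directions agree."""
    probs = softmax(student_logits)
    return 1.0 - cosine_similarity(probs, teacher_soft_label)


# Toy example: the student reproduces the teacher's distribution exactly.
teacher = [0.7, 0.2, 0.1]
matching_logits = [math.log(p) for p in teacher]
loss_match = cosine_soft_label_loss(matching_logits, teacher)
loss_mismatch = cosine_soft_label_loss([0.0, 0.0, 5.0], teacher)
```

Because cosine similarity depends only on the direction of the two vectors, this form is insensitive to the overall scale of the soft labels.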

NeurIPS Conference 2025 Conference Paper

Multi-agent KTO: Enhancing Strategic Interactions of Large Language Model in Language Game

Rong Ye
Yongxin Zhang
Yikai Zhang
Haoyu Kuang
Peng Sun
Zhongyu Wei

Achieving Artificial General Intelligence (AGI) requires AI agents that can not only make strategic decisions but also engage in flexible and meaningful communication. Inspired by Wittgenstein's language game theory, we propose that language agents can learn through in-context interaction rather than traditional multi-stage frameworks that separate decision-making from language expression. Using Werewolf, a social deduction game that tests language understanding, strategic interaction, and adaptability, as a test bed, we develop the Multi-agent Kahneman-Tversky's Optimization (MaKTO). MaKTO engages diverse models in extensive gameplay to generate unpaired desirable and unacceptable responses, then employs KTO to refine the model's decision-making process. In 9-player Werewolf games, MaKTO achieves a 61% average win rate across various models, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. Notably, MaKTO also demonstrates human-like performance, winning 60% against expert players and showing only 48.9% detectability in Turing-style blind tests. Code and data are available at project page https://reneeye.github.io/MaKTO.html.

EAAI Journal 2025 Journal Article

Remaining useful life transfer prediction integrated with time-varying operation condition decoupling transformation

Lei Yang
Yuhe Liao
Tuojian Li
Tao Kang
Peng Sun

The Remaining Useful Life (RUL) prediction under time-varying operation conditions in both research and industry faces several challenges: (1) the speed signal is not always available in in-field equipment; (2) health indicators (HIs) under time-varying rotational frequencies exhibit fluctuations, which can lead to misjudgments of component failure; and (3) with fluctuating HIs resulting from time-varying operation conditions, accurately predicting the RUL becomes difficult. To address these issues, this paper proposes a novel RUL transfer prediction framework which integrates an Operation Condition Decoupling Transformation (OCDT) method. A Spectral Peak Query-based Harmonic Extraction method is first proposed to extract the rotational frequency harmonic. The OCDT method is then introduced to eliminate the impact of time-varying speed on the HIs and transform the HIs into an equivalent constant operation condition based on the estimated rotational frequency. A Domain Dual Adaption Transfer Prediction method with Adaptive Dynamic Weighted Loss Function is proposed to conduct cross-domain RUL prediction between the target domain of equivalent operation condition and the source domain of known operation condition. Experiments on real bearings are conducted to validate the effectiveness and superiority of the proposed framework.

AAAI Conference 2025 Conference Paper

Unlocking the Power of LSTM for Long Term Time Series Forecasting

Yaxuan Kong
Zepu Wang
Yuqi Nie
Tian Zhou
Stefan Zohren
Yuxuan Liang
Peng Sun
Qingsong Wen

Traditional recurrent neural network architectures, such as long short-term memory neural networks (LSTM), have historically held a prominent role in time series forecasting (TSF) tasks. While the recently introduced sLSTM for Natural Language Processing (NLP) introduces exponential gating and memory mixing that are beneficial for long term sequential learning, its potential short memory issue is a barrier to applying sLSTM directly in TSF. To address this, we propose a simple yet efficient algorithm named P-sLSTM, which is built upon sLSTM by incorporating patching and channel independence. These modifications substantially enhance sLSTM's performance in TSF, achieving state-of-the-art results. Furthermore, we provide theoretical justifications for our design, and conduct extensive comparative and analytical experiments to fully validate the efficiency and superior performance of our model.
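
The two ingredients named in this abstract, patching and channel independence, can be sketched as a preprocessing step. The sketch assumes a multivariate series stored as a list of channels, with each channel split separately into fixed-length windows. The function names and the `patch_len` and `stride` parameters are hypothetical, and the sLSTM model itself is not shown.

```python
def make_patches(series, patch_len, stride):
    """Split one univariate series into windows of length patch_len, stepping by stride."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]


def patch_channels(multivariate, patch_len, stride):
    """Channel independence: patch each channel on its own, never mixing channels."""
    return [make_patches(channel, patch_len, stride) for channel in multivariate]


# Toy input: two channels of four time steps each.
toy_series = [[1, 2, 3, 4], [5, 6, 7, 8]]
toy_patches = patch_channels(toy_series, patch_len=2, stride=2)
```

Each channel's sequence of patches would then be fed to the recurrent model as a shorter sequence of patch tokens, which is presumably how patching eases the short-memory issue the abstract describes.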

AAAI Conference 2024 Conference Paper

A Dual Stealthy Backdoor: From Both Spatial and Frequency Perspectives

Yudong Gao
Honglong Chen
Peng Sun
Junjian Li
Anqing Zhang
Zhibo Wang
Weifeng Liu

Backdoor attacks pose serious security threats to deep neural networks (DNNs). Backdoored models make arbitrary (targeted) incorrect predictions on inputs containing well-designed triggers, while behaving normally on clean inputs. Prior research has explored the invisibility of backdoor triggers to enhance attack stealthiness. However, most of these studies focus only on invisibility in the spatial domain, neglecting the generation of invisible triggers in the frequency domain. This limitation renders the generated poisoned images easily detectable by recent defense methods. To address this issue, we propose a DUal stealthy BAckdoor attack method named DUBA, which simultaneously considers the invisibility of triggers in both the spatial and frequency domains, to achieve desirable attack performance, while ensuring strong stealthiness. Specifically, we first use Wavelet Transform to embed the high-frequency information of the trigger image into the clean image to ensure attack effectiveness. Then, to attain strong stealthiness, we incorporate Fourier Transform and Cosine Transform to mix the poisoned image and clean image in the frequency domain. Moreover, DUBA adopts a novel attack strategy, training the model with weak triggers and attacking with strong triggers to further enhance attack performance and stealthiness. DUBA is evaluated extensively on four datasets against popular image classifiers, showing significant superiority over state-of-the-art backdoor attacks in attack success rate and stealthiness.

EAAI Journal 2024 Journal Article

BABE: Backdoor attack with bokeh effects via latent separation suppression

Junjian Li
Honglong Chen
Yudong Gao
Shaozhong Guo
Kai Lin
Yuping Liu
Peng Sun

The escalating menace of backdoor attacks constitutes a formidable obstacle to the ongoing advancement of deep neural networks (DNNs), particularly in security-sensitive applications such as face recognition and self-driving. Backdoored models render deliberately incorrect predictions on inputs with crafted triggers while behaving normally on benign ones. Despite demonstrating varying degrees of threat, existing backdoor attack strategies often prioritize stealthiness and defense evasion but neglect practical feasibility in real-world deployment scenarios. In this paper, we develop a backdoor attack leveraging bokeh effects (BABE), which introduces the bokeh effects as the trigger. Once the backdoored model is deployed in the vision application, the model’s malicious behaviors can be activated only by utilizing the captured bokeh images without any other modifications. Specifically, we employ the saliency and depth estimation maps to derive the bokeh images, which serve as the poisoned samples. Furthermore, to avoid the latent separation of the generated poisoned images, we propose distinct attack strategies on the basis of the adversary’s prior abilities. For the adversary with only data manipulation, we retain the original semantic labels for a subset of poisoned data during the training process. For the adversary with manipulation of both the data and models, we construct a reference model trained on the clean samples to impose constraints on the latent representations of the poisoned images. Extensive experiments demonstrate the attack effects of the proposed BABE, even on bokeh photos captured from Digital Still Cameras (DSC) and smartphones.

NeurIPS Conference 2024 Conference Paper

Efficiency for Free: Ideal Data Are Transportable Representations

Peng Sun
Yi Jiang
Tao Lin

Data, the seminal opportunity and challenge in modern machine learning, currently constrains the scalability of representation learning and impedes the pace of model evolution. In this work, we investigate the efficiency properties of data from both optimization and generalization perspectives. Our theoretical and empirical analysis reveals an unexpected finding: for a given task, utilizing a publicly available, task- and architecture-agnostic model (referred to as the 'prior model' in this paper) can effectively produce efficient data. Building on this insight, we propose the Representation Learning Accelerator (ReLA), which promotes the formation and utilization of efficient data, thereby accelerating representation learning. Utilizing a ResNet-18 pre-trained on CIFAR-10 as a prior model to inform ResNet-50 training on ImageNet-1K reduces computational costs by 50% while maintaining the same accuracy as the model trained with the original BYOL, which requires 100% cost. Our code is available at: https://github.com/LINs-lab/ReLA.

NeurIPS Conference 2023 Conference Paper

Hybrid Policy Optimization from Imperfect Demonstrations

Hanlin Yang
Chao Yu
Peng Sun
Siji Chen

Exploration is one of the main challenges in Reinforcement Learning (RL), especially in environments with sparse rewards. Learning from Demonstrations (LfD) is a promising approach to solving this problem by leveraging expert demonstrations. However, expert demonstrations of high quality are usually costly or even impossible to collect in real-world applications. In this work, we propose a novel RL algorithm called HYbrid Policy Optimization (HYPO), which uses a small number of imperfect demonstrations to accelerate an agent's online learning process. The key idea is to train an offline guider policy using imitation learning in order to instruct an online agent policy to explore efficiently. Through mutual update of the guider policy and the agent policy, the agent can leverage suboptimal demonstrations for efficient exploration while avoiding the conservative policy caused by imperfect demonstrations. Empirical results show that HYPO significantly outperforms several baselines in various challenging tasks, such as MuJoCo with sparse rewards, Google Research Football, and the AirSim drone simulation.

NeurIPS Conference 2022 Conference Paper

A Unified Diversity Measure for Multiagent Reinforcement Learning

Zongkai Liu
Chao Yu
Yaodong Yang
Peng Sun
Zifan Wu
Yuan Li

Promoting behavioural diversity is of critical importance in multi-agent reinforcement learning, since it helps the agent population maintain robust performance when encountering unfamiliar opponents at test time, or when the game is highly non-transitive in the strategy space (e.g., Rock-Paper-Scissors). While a myriad of diversity metrics have been proposed, there are no widely accepted or unified definitions in the literature, making the consequent diversity-aware learning algorithms difficult to evaluate and the insights elusive. In this work, we propose a novel metric called the Unified Diversity Measure (UDM) that offers a unified view for existing diversity metrics. Based on UDM, we design the UDM-Fictitious Play (UDM-FP) and UDM-Policy Space Response Oracle (UDM-PSRO) algorithms as efficient solvers for normal-form games and open-ended games. In theory, we prove that UDM-based methods can enlarge the gamescape by increasing the response capacity of the strategy pool, and have a convergence guarantee to two-player Nash equilibrium. We validate our algorithms on games that show strong non-transitivity, and empirical results show that our algorithms achieve better performance than strong PSRO baselines in terms of exploitability and population effectivity.

AAAI Conference 2022 Conference Paper

FedInv: Byzantine-Robust Federated Learning by Inversing Local Model Updates

Bo Zhao
Peng Sun
Tao Wang
Keyu Jiang

Federated learning (FL) is a privacy-preserving distributed machine learning paradigm that enables multiple clients to collaboratively train statistical models without disclosing raw training data. However, the inaccessible local training data and uninspectable local training process make FL susceptible to various Byzantine attacks (e.g., data poisoning and model poisoning attacks), aiming to manipulate the FL model training process and degrade the model performance. Most of the existing Byzantine-robust FL schemes cannot effectively defend against stealthy poisoning attacks that craft poisoned models statistically similar to benign models. Things worsen when many clients are compromised or data among clients are highly non-independent and identically distributed (non-IID). In this work, to address these issues, we propose FedInv, a novel Byzantine-robust FL framework by inversing local model updates. Specifically, in each round of local model aggregation in FedInv, the parameter server first inverses the local model updates submitted by each client to generate a corresponding dummy dataset. Then, the server identifies those dummy datasets with exceptional Wasserstein distances from others and excludes the related local model updates from model aggregation. We conduct an exhaustive experimental evaluation of FedInv. The results demonstrate that FedInv significantly outperforms the existing robust FL schemes in defending against stealthy poisoning attacks under highly non-IID data partitions.
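
The filtering idea in this abstract (exclude updates whose dummy datasets lie at exceptional Wasserstein distances from the others) can be sketched in a heavily simplified setting. The sketch assumes each dummy dataset is reduced to an equal-size list of scalars, for which the empirical 1-Wasserstein distance is the mean absolute difference of the sorted values. The outlier rule (mean distance to the others above `factor` times the median score) and all names are hypothetical, and the model inversion step is not shown.

```python
from statistics import median


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def select_benign(datasets, factor=2.0):
    """Keep the indices of clients whose mean distance to the others is not exceptional.

    Requires at least two datasets. The cutoff rule is an illustrative assumption.
    """
    n = len(datasets)
    scores = []
    for i in range(n):
        dists = [wasserstein_1d(datasets[i], datasets[j]) for j in range(n) if j != i]
        scores.append(sum(dists) / len(dists))
    cutoff = factor * median(scores)
    return [i for i, score in enumerate(scores) if score <= cutoff]


# Toy dummy datasets: three similar clients and one far-away client.
toy_datasets = [
    [0.0, 1.0, 2.0],
    [0.1, 1.1, 2.1],
    [0.0, 1.0, 2.2],
    [10.0, 11.0, 12.0],
]
kept_clients = select_benign(toy_datasets)
```

Only the updates of the kept clients would then enter model aggregation.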

JBHI Journal 2022 Journal Article

Learning COVID-19 Pneumonia Lesion Segmentation From Imperfect Annotations via Divergence-Aware Selective Training

Shuojue Yang
Guotai Wang
Hui Sun
Xiangde Luo
Peng Sun
Kang Li
Qijun Wang
Shaoting Zhang

Automatic segmentation of COVID-19 pneumonia lesions is critical for quantitative measurement for diagnosis and treatment management. For this task, deep learning is the state-of-the-art method, but it requires a large set of accurately annotated images for training, which is difficult to obtain due to limited access to experts and the time-consuming annotation process. To address this problem, we aim to train the segmentation network from imperfect annotations, where the training set consists of a small clean set of images accurately annotated by experts and a large noisy set of images inaccurately annotated by non-experts. To prevent labels of different qualities from corrupting the segmentation model, we propose a new approach to train segmentation networks to deal with noisy labels. We introduce a dual-branch network to separately learn from the accurate and noisy annotations. To fully exploit the imperfect annotations while suppressing the noise, we design a Divergence-Aware Selective Training (DAST) strategy, where a divergence-aware noisiness score is used to identify severely noisy annotations and slightly noisy annotations. For severely noisy samples, we use a regularization through dual-branch consistency between predictions from the two branches. We also refine slightly noisy samples and use them as supplementary data for the clean branch to avoid overfitting. Experimental results show that our method achieves higher performance than the standard training process for COVID-19 pneumonia lesion segmentation when learning from imperfect labels, and our framework significantly outperforms the state-of-the-art noise-tolerant methods with various clean label percentages.

YNICL Journal 2021 Journal Article

Diffusion basis spectrum imaging measures anti-inflammatory and neuroprotective effects of fingolimod on murine optic neuritis

Ruimeng Yang
Tsen-Hsuan Lin
Jie Zhan
Shengsheng Lai
Chunyu Song
Peng Sun
Zezhong Ye
Michael Wallendorf

OBJECTIVE: To prospectively determine whether diffusion basis spectrum imaging (DBSI) detects, differentiates and quantitates coexisting inflammation, demyelination, axonal injury and axon loss in mice with optic neuritis (ON) due to experimental autoimmune encephalomyelitis (EAE), and to determine if DBSI accurately measures effects of fingolimod on underlying pathology. METHODS: (putatively reflecting demyelination). Mice were killed immediately after the last DBSI scan for immunohistochemical assessment. RESULTS: increase were detected during Fingolimod treatment. DBSI-derived metrics assessed in vivo significantly correlated (p < 0.05) with the corresponding histological markers. CONCLUSION: DBSI was used to assess changes of the underlying optic nerve pathologies in EAE mice with ON, exhibiting great potential as a noninvasive outcome measure for monitoring disease progression and therapeutic efficacy for MS.

AAAI Conference 2020 Conference Paper

Attention-over-Attention Field-Aware Factorization Machine

Zhibo Wang
Jinxin Ma
Yongquan Zhang
Qian Wang
Ju Ren
Peng Sun

Factorization Machine (FM) has been a popular approach in supervised predictive tasks, such as click-through rate prediction and recommender systems, due to its great performance and efficiency. Recently, several variants of FM have been proposed to improve its performance. However, most of the state-of-the-art prediction algorithms neglected the field information of features, and they also failed to discriminate the importance of feature interactions due to the problem of redundant features. In this paper, we present a novel algorithm called Attention-over-Attention Field-aware Factorization Machine (AoAFFM) for better capturing the characteristics of feature interactions. Specifically, we propose the field-aware embedding layer to exploit the field information of features, and combine it with the attention-over-attention mechanism to learn both feature-level and interaction-level attention to estimate the weight of feature interactions. Experimental results show that the proposed AoAFFM improves on FM and FFM by a large margin, and outperforms state-of-the-art algorithms on three public benchmark datasets.

YNIMG Journal 2019 Journal Article

Incorporating non-linear alignment and multi-compartmental modeling for improved human optic nerve diffusion imaging

Joo-won Kim
Jesper LR. Andersson
Alan C. Seifert
Peng Sun
Sheng-Kwei Song
Courtney Dula
Robert T. Naismith
Junqian Xu

In vivo human optic nerve diffusion magnetic resonance imaging (dMRI) is technically challenging with two outstanding issues not yet well addressed: (i) non-linear optic nerve movement, independent of head motion, and (ii) effect from partial-volumed cerebrospinal fluid or interstitial fluid such as in edema. In this work, we developed a non-linear optic nerve registration algorithm for improved volume alignment in axial high resolution optic nerve dMRI. During eyes-closed dMRI data acquisition, optic nerve dMRI measurements by diffusion tensor imaging (DTI) with and without free water elimination (FWE), and by diffusion basis spectrum imaging (DBSI), as well as optic nerve motion, were characterized in healthy adults at various locations along the posterior-to-anterior dimension. Optic nerve DTI results showed consistent trends in microstructural parametric measurements along the posterior-to-anterior direction of the entire intraorbital optic nerve, while the anterior portion of the intraorbital optic nerve exhibited the largest spatial displacement. Multi-compartmental dMRI modeling, such as DTI with FWE or DBSI, was less subject to spatially dependent biases in diffusivity and anisotropy measurements in the optic nerve which corresponded to similar spatial distributions of the estimated fraction of isotropic diffusion components. DBSI results derived from our clinically feasible (∼10 min) optic nerve dMRI protocol in this study are consistent with those from small animal studies, which provides the basis for evaluating the utility of multi-compartmental dMRI modeling in characterizing coexisting pathophysiology in human optic neuropathies.

NeurIPS Conference 2018 Conference Paper

Exponentially Weighted Imitation Learning for Batched Historical Data

Qing Wang
Jiechao Xiong
Lei Han
Peng Sun
Han Liu
Tong Zhang

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action space. The method does not rely on knowledge of the behavior policy, and thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and outperforms most competing methods empirically. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology.
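
A minimal sketch of advantage-reweighted imitation follows, assuming the monotonic weighting is the exponential exp(beta * advantage) suggested by the paper's title and that the objective is a weighted negative log-likelihood over logged actions. The function names, the `beta` parameter, and the way advantages are obtained are assumptions rather than details taken from this abstract.

```python
import math


def exp_weights(advantages, beta=1.0):
    """Monotonic weights: larger advantage gives a larger imitation weight."""
    return [math.exp(beta * a) for a in advantages]


def weighted_imitation_loss(log_probs, advantages, beta=1.0):
    """Weighted negative log-likelihood of the logged actions.

    log_probs[i] is the current policy's log-probability of the i-th logged action.
    advantages[i] is an (assumed given) advantage estimate for that action.
    """
    weights = exp_weights(advantages, beta)
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


# Toy batch: two logged actions with their log-probabilities and advantage estimates.
toy_log_probs = [math.log(0.5), math.log(0.25)]
toy_advantages = [1.0, -1.0]
loss_bc = weighted_imitation_loss(toy_log_probs, toy_advantages, beta=0.0)
loss_weighted = weighted_imitation_loss(toy_log_probs, toy_advantages, beta=1.0)
```

With `beta=0.0` every weight equals 1 and the loss reduces to plain behavior cloning. Nothing in the loss refers to the behavior policy, which matches the abstract's remark that the method can learn from data generated by an unknown policy.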

YNIMG Journal 2014 Journal Article

Quantifying white matter tract diffusion parameters in the presence of increased extra-fiber cellularity and vasogenic edema

Chia-Wen Chiang
Yong Wang
Peng Sun
Tsen-Hsuan Lin
Kathryn Trinkaus
Anne H. Cross
Sheng-Kwei Song

The effect of extra-fiber structural and pathological components confounding diffusion tensor imaging (DTI) computation was quantitatively investigated using data generated by both Monte-Carlo simulations and tissue phantoms. Increased extent of vasogenic edema, by addition of various amounts of gel to fixed normal mouse trigeminal nerves or by increasing non-restricted isotropic diffusion tensor components in Monte-Carlo simulations, significantly decreased fractional anisotropy (FA) and increased radial diffusivity, while less significantly increasing axial diffusivity derived by DTI. Increased cellularity, mimicked by graded increase of the restricted isotropic diffusion tensor component in Monte-Carlo simulations, significantly decreased FA and axial diffusivity with limited impact on radial diffusivity derived by DTI. The Monte-Carlo simulation and tissue phantom data were also analyzed by the recently developed diffusion basis spectrum imaging (DBSI) to simultaneously distinguish and quantify the axon/myelin integrity and extra-fiber diffusion components. Results showed that increased cellularity or vasogenic edema did not affect the DBSI-derived fiber FA, axial or radial diffusivity. Importantly, the extent of extra-fiber cellularity and edema estimated by DBSI correlated with experimentally added gel and Monte-Carlo simulations. We also examined the feasibility of applying a 25-direction diffusion encoding scheme for DBSI analysis on coherent white matter tracts. Results from both phantom experiments and simulations suggested that the 25-direction diffusion scheme provided DBSI estimation of both fiber diffusion parameters and extra-fiber cellularity/edema extent comparable to that of the 99-direction scheme.
An in vivo 25-direction DBSI analysis was performed on experimental autoimmune encephalomyelitis (EAE, an animal model of human multiple sclerosis) optic nerve as an example to examine the validity of derived DBSI parameters with post-imaging immunohistochemistry verification. Results support that in vivo DBSI using the 25-direction diffusion scheme correctly reflects the underlying axonal injury, demyelination, and inflammation of optic nerves in EAE mice.
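
For readers unfamiliar with the DTI scalars discussed in this abstract, the sketch below computes fractional anisotropy (FA), axial diffusivity, and radial diffusivity from the three eigenvalues of a diffusion tensor, using the standard definitions. The function name and the example eigenvalues are illustrative assumptions. Tensor fitting and the DBSI model itself are not shown.

```python
import math


def dti_scalars(eigenvalues):
    """Return FA, axial diffusivity (AD), and radial diffusivity (RD) from 3 eigenvalues."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    axial = l1                   # diffusivity along the principal direction
    radial = (l2 + l3) / 2.0     # mean diffusivity across the principal direction
    denom = math.sqrt(l1 * l1 + l2 * l2 + l3 * l3)
    if denom == 0.0:
        fa = 0.0
    else:
        num = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
        fa = math.sqrt(0.5) * num / denom
    return {"FA": fa, "AD": axial, "RD": radial}


# Illustrative eigenvalues (arbitrary units): an anisotropic tensor, and the same
# tensor with a constant added to every eigenvalue.
anisotropic = dti_scalars((1.7, 0.3, 0.3))
shifted = dti_scalars((2.2, 0.8, 0.8))
```

Adding the same constant to all three eigenvalues leaves the numerator of FA unchanged while enlarging the denominator, so FA drops and radial diffusivity rises. This is only a numerical illustration of why an added isotropic component can lower DTI-derived FA, not a model of the experiments reported here.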
