Arrow Research search

Author name cluster

Peng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

71 papers
2 author rows

Possible papers (71)

AAAI Conference 2026 Conference Paper

AMS-KV: Adaptive KV Caching in Multi-Scale Visual Autoregressive Transformers

  • Boxun Xu
  • Yu Wang
  • Zihu Wang
  • Peng Li

Visual autoregressive modeling (VAR) via next-scale prediction has emerged as a scalable image generation paradigm. While Key and Value (KV) caching in large language models (LLMs) has been extensively studied, next-scale prediction presents unique challenges, and KV caching design for next-scale-based VAR transformers remains largely unexplored. A major bottleneck is the excessive KV memory growth with the increasing number of scales, which severely limits scalability. Our systematic investigation reveals that: (1) attending to tokens from local scales contributes significantly to generation quality; (2) allocating a small amount of memory for the coarsest scales, termed condensed scales, stabilizes multi-scale image generation; and (3) strong KV similarity across finer scales is predominantly observed in cache-efficient layers, whereas cache-demanding layers exhibit weaker inter-scale similarity. Based on these observations, we introduce AMS-KV, a scale-adaptive KV caching policy for next-scale prediction in VAR models. AMS-KV prioritizes storing KVs from condensed and local scales, preserving the most relevant tokens to maintain generation quality. It further optimizes KV cache utilization and computational efficiency by identifying cache-demanding layers through inter-scale similarity analysis. Compared to vanilla next-scale prediction-based VAR models, AMS-KV reduces KV cache usage by up to 84.83% and self-attention latency by 60.48%. Moreover, when the baseline VAR-d30 model encounters out-of-memory failures at a batch size of 128, AMS-KV enables stable scaling to a batch size of 256 with improved throughput.
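The retention policy described in the abstract can be sketched as a simple rule: keep KVs from the coarsest (condensed) scales plus a local window of recent scales, and drop the rest. The function name and the parameter defaults below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a scale-adaptive KV retention rule in the spirit of
# AMS-KV: retain KVs for the first `num_condensed` (coarsest) scales and for
# a window of `local_window` scales preceding the current one.

def scales_to_keep(current_scale: int, num_condensed: int = 2, local_window: int = 3):
    """Return the set of scale indices whose KV entries are retained
    when generating tokens at `current_scale`."""
    condensed = set(range(min(num_condensed, current_scale)))
    local = set(range(max(0, current_scale - local_window), current_scale))
    return condensed | local

# At scale 8, only the condensed scales {0, 1} and local scales {5, 6, 7}
# keep their KV cache; intermediate scales are evicted.
print(sorted(scales_to_keep(8)))
```

With growing scale count, the retained set stays bounded, which is the mechanism behind the reported memory savings.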

AAAI Conference 2026 Conference Paper

Khan-GCL: Kolmogorov–Arnold Network Based Graph Contrastive Learning with Hard Negatives

  • Zihu Wang
  • Boxun Xu
  • Hejia Geng
  • Peng Li

Graph contrastive learning (GCL) has demonstrated great promise for learning generalizable graph representations from unlabeled data. However, conventional GCL approaches face two critical limitations: (1) the restricted expressive capacity of multilayer perceptron (MLP) based encoders, and (2) suboptimal negative samples that are either drawn from random augmentations, which fail to provide effective 'hard negatives', or generated without addressing the semantic distinctions crucial for discriminating graph data. To address these limitations, we propose Khan-GCL, a novel framework that integrates the Kolmogorov–Arnold Network (KAN) into the GCL encoder architecture, substantially enhancing its representational capacity. Furthermore, we exploit the rich information embedded within KAN coefficient parameters to develop two novel critical-feature identification techniques that enable the generation of semantically meaningful hard negative samples for each graph representation. These strategically constructed hard negatives guide the encoder to learn more discriminative features by emphasizing critical semantic differences between graphs. Extensive experiments demonstrate that our approach achieves state-of-the-art performance compared to existing GCL methods across a variety of datasets and tasks.
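As context for why hard negatives matter here, an InfoNCE-style contrastive objective sharpens its gradient signal when a negative is similar to the positive. The sketch below is a generic, framework-free version of such a loss with toy similarity scores; it is not Khan-GCL's implementation, and the names and values are assumptions.

```python
# Generic InfoNCE-style loss: the positive pair competes against a set of
# negatives in a temperature-scaled softmax.
import math

def info_nce(pos_sim, neg_sims, temperature=0.5):
    """Negative log-softmax of the positive similarity against all negatives."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # stabilize the softmax
    z = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(z))

# Adding a harder (more similar) negative increases the loss, giving the
# encoder a stronger signal to separate semantically close graphs.
easy = info_nce(0.9, [0.1, 0.2])
hard = info_nce(0.9, [0.1, 0.2, 0.8])
print(hard > easy)
```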

TIST Journal 2026 Journal Article

LKAFormer: A Lightweight Kolmogorov-Arnold Transformer Model for Image Semantic Segmentation

  • Shoulin Yin
  • Liguo Wang
  • Tao Chen
  • Huafei Huang
  • Jing Gao
  • Jianing Zhang
  • Meng Liu
  • Peng Li

Transformer-based semantic segmentation methods have demonstrated outstanding performance by leveraging global self-attention to effectively capture long-range dependencies. However, two issues remain in existing works: (1) Most of them use a full-rank weight matrix to support the self-attention mechanism and feed-forward network when modeling long-range dependencies among patches/pixels, resulting in high computational cost during both training and inference. (2) Most of them ignore information interactions between high-level semantics and low-level structures during image resolution recovery, which degrades performance when segmenting objects with complex boundaries. To tackle these challenges, a lightweight Kolmogorov-Arnold Transformer model (LKAFormer) is proposed for image semantic segmentation, comprising a two-stream lightweight Transformer encoder and a graph feature pyramid aggregation KAN-decoder. The former constructs a hierarchical cross-scale feature fusion pipeline that obtains rich semantics with comprehensive multi-scale information by setting up coarse-grained and fine-grained streams over image patches of different sizes. In that pipeline, feature lightweight focusing modules model complex, long-range dependencies across patches/pixels to refine image semantics at lower computational cost through lightweight multi-head self-attention and lightweight feed-forward network designs. The latter leverages the learnable nonlinear transformation mechanism of the Kolmogorov-Arnold architecture to adaptively capture the spatial structure dependencies of distinct sub-regions of images. It then jointly performs intra-scale and cross-scale graph fusion during image resolution recovery to enhance information interactions between high-level semantics and low-level structures, achieving robust boundary localization and texture refinement of segmented objects. Finally, extensive experiments are conducted on three challenging datasets, and the results show that LKAFormer sets a new baseline for image semantic segmentation in comparison with 11 existing methods.

AAAI Conference 2026 Conference Paper

Visual-Friendly Concept Protection via Selective Adversarial Perturbations

  • Xiaoyue Mi
  • Fan Tang
  • You Wu
  • Juan Cao
  • Peng Li
  • Yang Liu

Personalized concept generation by tuning diffusion models with a few images raises potential legal and ethical concerns regarding privacy and intellectual property rights. Researchers attempt to prevent malicious personalization using adversarial perturbations. However, previous efforts have mainly focused on the effectiveness of protection while neglecting the visibility of perturbations. They utilize global adversarial perturbations, which introduce noticeable alterations to original images and significantly degrade visual quality. In this work, we propose the Visual-Friendly Concept Protection (VCPro) framework, which prioritizes the protection of key concepts chosen by the image owner through adversarial perturbations with lower perceptibility. To ensure these perturbations are as inconspicuous as possible, we introduce a relaxed optimization objective to identify the least perceptible yet effective adversarial perturbations, solved using the Lagrangian multiplier method. Qualitative and quantitative experiments validate that VCPro achieves a better trade-off between the visibility of perturbations and protection effectiveness, effectively prioritizing the protection of target concepts in images with less perceptible perturbations.

IROS Conference 2025 Conference Paper

Bench4Merge: A Comprehensive Benchmark for Merging in Realistic Dense Traffic with Micro-Interactive Vehicles

  • Zhengming Wang
  • Junli Wang
  • Pengfei Li
  • Zhaohan Li
  • Chunyang Liu
  • Bo Zhang
  • Peng Li
  • Yilun Chen

While the capabilities of autonomous driving have advanced rapidly, merging into dense traffic remains a significant challenge; many motion planning methods have been proposed for this scenario, but they are hard to evaluate. Most existing closed-loop simulators rely on rule-based controls for other vehicles, which results in a lack of diversity and randomness and thus fails to accurately assess motion planning capabilities in highly interactive scenarios. Moreover, traditional evaluation metrics are insufficient for comprehensively evaluating merging performance in dense traffic. In response, we propose a closed-loop evaluation benchmark for assessing motion planning capabilities in merging scenarios. Our approach involves other vehicles trained on large-scale datasets with micro-behavioral characteristics that significantly enhance complexity and diversity. Additionally, we have restructured the evaluation mechanism by leveraging Large Language Models (LLMs) to assess each autonomous vehicle merging onto the main lane. Extensive experiments and test-vehicle deployment have demonstrated the advantages of this benchmark. Through this benchmark, we have evaluated existing methods and identified common issues. The simulation environment and evaluation process can be accessed at https://github.com/WZM5853/Bench4Merge.

NeurIPS Conference 2025 Conference Paper

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

  • Peng Lai
  • Jianjie Zheng
  • Sijie Cheng
  • Yun Chen
  • Peng Li
  • Yang Liu
  • Guanhua Chen

The growing scale of evaluation tasks has led to the widespread adoption of automated evaluation using LLMs, a paradigm known as “LLM-as-a-judge”. However, improving its alignment with human preferences without complex prompts or fine-tuning remains challenging. Previous studies mainly optimize based on shallow outputs, overlooking rich cross-layer representations. In this work, motivated by preliminary findings that middle-to-upper layers encode semantically meaningful and task-relevant representations that are often more aligned with human judgments than the final layer, we propose LAGER, a post-hoc, plug-and-play framework for improving the alignment of LLM-as-a-Judge point-wise evaluations with human scores by leveraging internal representations. LAGER produces fine-grained judgment scores by aggregating cross-layer score-token logits and computing the expected score from a softmax-based distribution, while keeping the LLM backbone frozen and ensuring no impact on the inference process. LAGER fully leverages the complementary information across different layers, overcoming the limitations of relying solely on the final layer. We evaluate our method on the standard alignment benchmarks Flask, HelpSteer, and BIGGen using Spearman correlation, and find that LAGER achieves improvements of up to 7.5% over the best baseline across these benchmarks. Without reasoning steps, LAGER matches or outperforms reasoning-based methods. Experiments on downstream applications, such as data selection and emotional understanding, further show the generalization of LAGER.
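The scoring step the abstract describes (aggregate score-token logits across layers, softmax them, take the expectation) can be sketched in a few lines. The uniform layer averaging and the 1-5 score scale below are illustrative assumptions, not LAGER's exact configuration.

```python
# Minimal sketch of expected-score judging from cross-layer logits:
# average the per-layer logits over the score tokens, softmax, then
# compute the expectation of the discrete score distribution.
import math

def lager_style_score(layer_logits, scores=(1, 2, 3, 4, 5)):
    """layer_logits: list of per-layer logit vectors over the score tokens."""
    n = len(layer_logits)
    agg = [sum(layer[i] for layer in layer_logits) / n for i in range(len(scores))]
    m = max(agg)  # numerically stable softmax
    exps = [math.exp(a - m) for a in agg]
    z = sum(exps)
    probs = [e / z for e in exps]
    return sum(p * s for p, s in zip(probs, scores))

# Two layers that both favor the score token "4" yield an expectation near 4,
# smoother than a hard argmax over the final layer alone.
print(round(lager_style_score([[0.0, 0.5, 1.0, 3.0, 1.0],
                               [0.1, 0.4, 1.2, 2.8, 0.9]]), 2))
```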

ICLR Conference 2025 Conference Paper

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

  • Ruichen Shao
  • Bei Li
  • Gangao Liu
  • Yang Chen
  • ZhouXiang
  • Jingang Wang
  • Xunliang Cai
  • Peng Li

Direct Preference Optimization (DPO) has gained attention as an efficient alternative to reinforcement learning from human feedback (RLHF) for aligning large language models (LLMs) with human preferences. Despite its advantages, DPO suffers from a length bias, generating responses longer than those from the reference model. Existing solutions like SimPO and SamPO address this issue but uniformly treat the contribution of rewards across sequences, overlooking temporal dynamics. To this end, we propose an enhanced preference optimization method that incorporates a temporal decay factor controlled by a gamma parameter. This dynamic weighting mechanism adjusts the influence of each reward based on its position in the sequence, prioritizing earlier tokens that are more critical for alignment. By adaptively focusing on more relevant feedback, our approach mitigates overfitting to less pertinent data and remains responsive to evolving human preferences. Experimental results on several benchmarks show that our approach consistently outperforms vanilla DPO by 5.9-8.8 points on AlpacaEval 2 and 3.3-9.7 points on Arena-Hard across different model architectures and sizes. Furthermore, additional experiments on mathematical and reasoning benchmarks (MMLU, GSM8K, and MATH) confirm that our method enhances performance without compromising general capabilities. Our codebase is available at \url{https://github.com/LotuSrc/D2PO}.
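The temporal-decay weighting described above amounts to discounting each token-level log-probability ratio by gamma raised to the token position before summing. The function name and toy values below are illustrative, not taken from the released codebase.

```python
# Illustrative sketch of temporal-decay reward aggregation: token t's
# log-ratio (policy vs. reference) is weighted by gamma**t, so earlier
# tokens dominate the preference reward fed into the DPO-style loss.

def decayed_reward(token_log_ratios, gamma=0.95):
    """Sum per-token log-ratios with a temporal decay factor gamma."""
    return sum((gamma ** t) * r for t, r in enumerate(token_log_ratios))

# The same signal on the first token outweighs it placed on the last token.
early = decayed_reward([1.0, 0.0, 0.0, 0.0])
late = decayed_reward([0.0, 0.0, 0.0, 1.0])
print(early > late)
```

Setting gamma to 1 recovers the uniform treatment the abstract attributes to SimPO/SamPO-style aggregation.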

TMLR Journal 2025 Journal Article

HoSNNs: Adversarially-Robust Homeostatic Spiking Neural Networks with Adaptive Firing Thresholds

  • Hejia Geng
  • Peng Li

While spiking neural networks (SNNs) offer a promising neurally-inspired model of computation, they are vulnerable to adversarial attacks. We present the first study that draws inspiration from neural homeostasis to design a threshold-adapting leaky integrate-and-fire (TA-LIF) neuron model and utilize TA-LIF neurons to construct the adversarially robust homeostatic SNNs (HoSNNs) for improved robustness. The TA-LIF model incorporates a self-stabilizing dynamic thresholding mechanism, offering a local feedback control solution to the minimization of each neuron's membrane potential error caused by adversarial disturbance. Theoretical analysis demonstrates favorable dynamic properties of TA-LIF neurons in terms of the bounded-input bounded-output stability and suppressed time growth of membrane potential error, underscoring their superior robustness compared with the standard LIF neurons. When trained with weak FGSM attacks (\(\epsilon = 2/255\)), our HoSNNs significantly outperform conventionally trained LIF-based SNNs across multiple datasets. Furthermore, under significantly stronger PGD7 attacks (\(\epsilon = 8/255\)), HoSNN achieves notable improvements in accuracy, increasing from 30.90% to 74.91% on FashionMNIST, 0.44% to 36.82% on SVHN, 0.54% to 43.33% on CIFAR10, and 0.04% to 16.66% on CIFAR100.
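A threshold-adapting LIF neuron of the kind the abstract describes can be illustrated with a toy discrete-time simulation: the firing threshold jumps after each spike and decays back toward a baseline, a simple homeostatic feedback. The update rules and constants below are assumptions for illustration, not the paper's exact TA-LIF dynamics.

```python
# Toy threshold-adapting LIF neuron: leaky integration, spike-triggered
# threshold increase, and exponential decay of the threshold to baseline.

def simulate_ta_lif(inputs, tau_mem=0.9, theta0=1.0, theta_jump=0.5, tau_theta=0.95):
    v, theta, spikes = 0.0, theta0, []
    for x in inputs:
        v = tau_mem * v + x                      # leaky membrane integration
        if v >= theta:
            spikes.append(1)
            v = 0.0                              # reset membrane potential
            theta += theta_jump                  # homeostatic threshold increase
        else:
            spikes.append(0)
        theta = theta0 + tau_theta * (theta - theta0)  # decay toward baseline
    return spikes

# Under constant strong drive, the adapting threshold spaces spikes out,
# damping the neuron's response to sustained (e.g. adversarial) input.
print(simulate_ta_lif([0.6] * 10))
```

A fixed-threshold LIF under the same drive fires roughly every other step; the adaptation progressively lengthens the inter-spike intervals.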

IJCAI Conference 2025 Conference Paper

MASTER: A Multi-granularity Invariant Structure Clustering Scheme for Multi-view Clustering

  • Suixue Wang
  • Shilin Zhang
  • Qingchen Zhang
  • Peng Li
  • Weiliang Huo

Deep multi-view clustering has attracted increasing attention for mining patterns in data. However, most existing methods perform self-learning in a single feature space, ignoring the fruitful structural information hidden in feature spaces at different levels. Meanwhile, they impose a reconstruction constraint to learn generalized sample representations, failing to explore the discriminative power of complementary and consistent information. To address these challenges, a multi-granularity invariant structure clustering scheme (MASTER) is proposed that defines a bottom-up process extracting multi-level information at the sample, neighborhood, and category granularities from the low-level, high-level, and semantic feature spaces, respectively. Specifically, it leverages self-learning reconstruction with information-theoretic overclustering to capture invariant sample structure in the low-level feature space. Then, it models data diffusion of the clustering process in a reliable neighborhood to capture invariant local structure in the high-level feature space. Meanwhile, it defines dual divergences induced by the space geometry to capture invariant global structure in the semantic space. Finally, extensive experiments on 8 real-world datasets show that MASTER achieves state-of-the-art performance compared to 11 baselines.

IJCAI Conference 2025 Conference Paper

Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency

  • Ruixiao Li
  • Fahao Chen
  • Peng Li

Speculative decoding accelerates Large Language Model (LLM) inference by employing a small speculative model (SSM) to generate multiple candidate tokens and verify them using the LLM in parallel. This technique has been widely integrated into LLM inference serving systems. However, inference requests typically exhibit uncertain execution time, which poses a significant challenge of efficiently scheduling requests in these systems. Existing work estimates execution time based solely on predicted output length, which could be inaccurate because execution time depends on both output length and token acceptance rate of verification by the LLM. In this paper, we propose a semi-clairvoyant request scheduling algorithm called Least-Attained/Perceived-Service for Speculative Decoding (LAPS-SD). Given a number of inference requests, LAPS-SD can effectively minimize average inference latency by adaptively scheduling requests according to their features during decoding. When token acceptance rate is dynamic and execution time is difficult to estimate, LAPS-SD maintains multiple priority queues and allows request execution preemption across different queues. Once the token acceptance rate becomes stable, LAPS-SD can accurately estimate the execution time and schedule requests accordingly. Extensive experiments show that LAPS-SD reduces inference latency by approximately 39% compared to state-of-the-art scheduling methods.
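The "least-attained-service" idea underlying the scheduler can be shown with a classical single-server sketch: among pending requests, always run (for one quantum) the one that has received the least service so far, so short requests are never starved behind long ones. The request names and quantum are toy values; the real system schedules speculative-decoding steps and uses multiple priority queues with preemption.

```python
# Simplified least-attained-service (LAS) scheduling, the classical policy
# that LAPS-SD builds on for requests with uncertain execution time.

def las_schedule(remaining, quantum=1):
    """remaining: dict request_id -> remaining work. Returns completion order."""
    attained = {r: 0 for r in remaining}
    remaining = dict(remaining)
    order = []
    while remaining:
        # Run the pending request with the least attained service
        # (ties broken by id for determinism).
        r = min(remaining, key=lambda k: (attained[k], k))
        step = min(quantum, remaining[r])
        remaining[r] -= step
        attained[r] += step
        if remaining[r] == 0:
            del remaining[r]
            order.append(r)
    return order

# Short requests finish first even though the long request shares the server.
print(las_schedule({"long": 5, "short_a": 1, "short_b": 2}))
```

This minimizes average latency when lengths are unknown a priori, which is the regime the abstract describes before token acceptance rates stabilize.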

NeurIPS Conference 2025 Conference Paper

SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction

  • Wenyue Chen
  • Peng Li
  • Wangguandong Zheng
  • Chengfeng Zhao
  • Mengfei Li
  • Yaolong Zhu
  • Zhiyang Dou
  • Ronggang Wang

Photorealistic 3D full-body human reconstruction from a single image is a critical yet challenging task for applications in films and video games due to inherent ambiguities and severe self-occlusions. While recent approaches leverage SMPL estimation and SMPL-conditioned image generative models to hallucinate novel views, they suffer from inaccurate 3D priors estimated from SMPL meshes and have difficulty handling difficult human poses and reconstructing fine details. In this paper, we propose SyncHuman, a novel framework that, for the first time, combines a 2D multiview generative model with a 3D native generative model, enabling high-quality clothed human mesh reconstruction from single-view images even under challenging human poses. The multiview generative model excels at capturing fine 2D details but struggles with structural consistency, whereas the 3D native generative model produces coarse yet structurally consistent 3D shapes. By integrating the complementary strengths of these two approaches, we develop a more effective generation framework. Specifically, we first jointly fine-tune the multiview generative model and the 3D native generative model with a proposed pixel-aligned 2D-3D synchronization attention to produce geometrically aligned 3D shapes and 2D multiview images. To further improve details, we introduce a feature injection mechanism that lifts fine details from the 2D multiview images onto the aligned 3D shapes, enabling accurate and high-fidelity reconstruction. Extensive experiments demonstrate that SyncHuman achieves robust and photorealistic 3D human reconstruction, even for images with challenging poses. Our method outperforms baseline methods in geometric accuracy and visual fidelity, demonstrating a promising direction for future 3D generation models.

NeurIPS Conference 2025 Conference Paper

TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels

  • Jiahao Lu
  • Weitao Xiong
  • Jiacheng Deng
  • Peng Li
  • Tianyu Huang
  • Zhiyang Dou
  • Cheng Lin
  • Sai-Kit Yeung

Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video and has witnessed rapid progress in recent years. However, we argue that the existing monocular 3D tracking methods still fall short in separating the camera motion from foreground dynamic motion and cannot densely track newly emerging dynamic subjects in the videos. To address these two limitations, we propose TrackingWorld, a novel pipeline for dense 3D tracking of almost all pixels within a world-centric 3D coordinate system. First, we introduce a tracking upsampler that efficiently lifts the arbitrary sparse 2D tracks into dense 2D tracks. Then, to generalize the current tracking methods to newly emerging objects, we apply the upsampler to all frames and reduce the redundancy of 2D tracks by eliminating the tracks in overlapped regions. Finally, we present an efficient optimization-based framework to back-project dense 2D tracks into world-centric 3D trajectories by estimating the camera poses and the 3D coordinates of these 2D tracks. Extensive evaluations on both synthetic and real-world datasets demonstrate that our system achieves accurate and dense 3D tracking in a world-centric coordinate frame.

TMLR Journal 2024 Journal Article

Contrastive Learning with Consistent Representations

  • Zihu Wang
  • Yu Wang
  • Zhuotong Chen
  • Hanbin Hu
  • Peng Li

Contrastive learning demonstrates great promise for representation learning. Data augmentations play a critical role in contrastive learning by providing informative views of the data without necessitating explicit labels. Nonetheless, the efficacy of current methodologies heavily hinges on the quality of employed data augmentation (DA) functions, often chosen manually from a limited set of options. While exploiting diverse data augmentations is appealing, the complexities inherent in both DAs and representation learning can lead to performance deterioration. Addressing this challenge and facilitating the systematic incorporation of diverse data augmentations, this paper proposes Contrastive Learning with Consistent Representations (CoCor). At the heart of CoCor is a novel consistency metric termed DA consistency. This metric governs the mapping of augmented input data to the representation space. Moreover, we propose to learn the optimal mapping locations as a function of DA. Experimental results demonstrate that CoCor notably enhances the generalizability and transferability of learned representations in comparison to baseline methods. The implementation of CoCor can be found at https://github.com/zihuwang97/CoCor.

NeurIPS Conference 2024 Conference Paper

Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

  • Peng Li
  • Yuan Liu
  • Xiaoxiao Long
  • Feihu Zhang
  • Cheng Lin
  • Mengfei Li
  • Xingqun Qi
  • Shanghang Zhang

In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficiency, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images comply with a predefined camera type, e.g., a perspective camera with a fixed focal length, leading to distorted shapes when the assumption fails. Moreover, the full-image or dense multiview attention they employ leads to a dramatic explosion of computational complexity as image resolution increases, resulting in prohibitively expensive training costs. To bridge the gap between assumption and reality, Era3D first proposes a diffusion-based camera prediction module to estimate the focal length and elevation of the input image, which allows our method to generate images without shape distortions. Furthermore, a simple but efficient attention layer, named row-wise attention, is used to enforce epipolar priors in the multiview diffusion, facilitating efficient cross-view information fusion. Consequently, compared with state-of-the-art methods, Era3D generates high-quality multiview images at up to 512×512 resolution while reducing the computational complexity of multiview attention by 12×. Comprehensive experiments demonstrate the superior generation power of Era3D: it can reconstruct high-quality and detailed 3D meshes from diverse single-view input images, significantly outperforming baseline multiview diffusion methods.
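The asymptotic benefit of restricting cross-view attention to corresponding rows can be seen from a simple entry count: full attention over an H×W feature map scores (H·W)² pairs, while row-wise attention scores only H·W² pairs. This toy count only shows the scaling; the paper's reported 12× figure reflects its actual attention configuration, not this idealized ratio.

```python
# Counting attention-score entries for full vs. row-wise attention over an
# H x W feature map (per pair of views). Illustrative accounting only.

def full_attn_entries(h: int, w: int) -> int:
    return (h * w) ** 2  # every token attends to every token

def rowwise_attn_entries(h: int, w: int) -> int:
    return h * w * w     # each token attends only within its row

h, w = 512, 512
# The idealized reduction factor is exactly h for square maps.
print(full_attn_entries(h, w) // rowwise_attn_entries(h, w))
```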

TMLR Journal 2024 Journal Article

Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory

  • Karthik Somayaji NS
  • Yu Wang
  • Malachi Schram
  • Jan Drgona
  • Mahantesh M Halappanavar
  • Frank Liu
  • Peng Li

Risk-sensitive reinforcement learning (RL) has garnered significant attention in recent years due to the growing interest in deploying RL agents in real-world scenarios. A critical aspect of risk awareness involves modelling highly rare risk events (rewards) that could potentially lead to catastrophic outcomes. These infrequent occurrences present a formidable challenge for data-driven methods aiming to capture such risky events accurately. While risk-aware RL techniques do exist, they suffer from high-variance estimation due to the inherent data scarcity. Our work proposes to enhance the resilience of RL agents when faced with very rare and risky events by refining the predictions of the extreme values of the state-action value distribution. To achieve this, we formulate the extreme values of the state-action value function distribution as parameterized distributions, drawing inspiration from the principles of extreme value theory (EVT). We propose an EVT-based actor-critic approach, namely Extreme Valued Actor-Critic (EVAC), which effectively addresses the issue of infrequent occurrence by leveraging EVT-based parameterization. Importantly, we theoretically demonstrate the advantages of employing these parameterized distributions in contrast to other risk-averse algorithms. Our evaluations show that the proposed method outperforms other risk-averse RL algorithms on a diverse range of benchmark tasks, each encompassing distinct risk scenarios.
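The EVT building block invoked here is typically the generalized Pareto distribution (GPD), which models exceedances over a threshold u via P(X - u > y | X > u) = (1 + ξy/σ)^(-1/ξ). The parameter values below are illustrative; EVAC learns such tail parameterizations for the state-action value distribution rather than fixing them.

```python
# Survival function of the generalized Pareto distribution (xi != 0),
# the standard EVT model for tail exceedances over a threshold.

def gpd_exceedance(y: float, sigma: float = 1.0, xi: float = 0.2) -> float:
    """P(exceedance > y) under a GPD with scale sigma and shape xi."""
    return (1.0 + xi * y / sigma) ** (-1.0 / xi)

# A larger shape parameter xi gives a heavier tail: extreme exceedances
# retain more probability mass, which a Gaussian critic would underestimate.
print(gpd_exceedance(5.0, xi=0.5) > gpd_exceedance(5.0, xi=0.1))
```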

IROS Conference 2024 Conference Paper

Offline Meta-Reinforcement Learning with Evolving Gradient Agreement

  • Jiaxing Chen
  • Weilin Yuan
  • Shaofei Chen
  • Furong Liu
  • Ao Ma
  • Zhenzhen Hu
  • Peng Li

Meta-Reinforcement Learning (Meta-RL) is a machine learning paradigm aimed at learning reinforcement learning policies that can quickly adapt to unseen tasks with few-shot data. Nevertheless, applying Meta-RL to real-world applications faces challenges due to the cost of data acquisition. To address this problem, offline Meta-RL has emerged as a promising solution, focusing on learning policies from pre-collected data that can effectively and rapidly adapt to unseen tasks. In this paper, we propose a new offline Meta-RL method called Meta-Actor-Critic with Evolving Gradient Agreement (MACEGA). MACEGA utilizes an evolutionary approach to estimate meta-gradients conducive to generalization across unseen tasks. During meta-training, gradient evolution is utilized to meta-update the value network and policies. Moreover, we use gradient agreement as an optimization objective for meta-learning, thereby enhancing the generalization ability of the meta-policy. We experimentally demonstrate the robustness of MACEGA in handling offline data quality. Furthermore, extensive experiments on various benchmarks provide empirical evidence that MACEGA outperforms previous state-of-the-art methods in generalizing to unseen tasks, thus demonstrating its potential for real-world applications.

AAMAS Conference 2024 Conference Paper

Optimal Flash Loan Fee Function with Respect to Leverage Strategies

  • Chenmin Wang
  • Peng Li
  • Yulong Zeng
  • Xuepeng Fan

We investigate two decentralized methods for leveraging assets: Firstly, investors recurrently commit their target assets as collateral to secure loans, subsequently reinvesting the borrowed funds in the same assets. Secondly, investors pledge their assets once but are required to promptly borrow from a lender and repay the borrowed amount. This model is exemplified by recent Ethereum investment strategies, where investors must weigh the trade-off between gas fees associated with multiple pledging processes and fees charged by the lender, known as the Flash Loan project. Our comprehensive analysis encompasses game theory dynamics, determining optimal strategies for self-interested investors and deriving a unique nonlinear optimal fee structure for Flash Loans. This structure remains incentive-compatible, guarding against Sybil attacks and other deviations. Empirical results, under varying environmental parameters, consistently demonstrate the superior revenue performance of our optimal fee structure compared to the commonly used linear fee model within the Flash Loan project.
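The recurrent-pledging strategy in the first method has a simple closed form: with loan-to-value ratio r, repeatedly re-collateralizing the borrowed funds yields total exposure 1 + r + r² + … = 1/(1 - r) per unit of initial capital, before the gas fees that the paper trades off against flash-loan fees. The function below is an illustrative worked example, not the paper's model.

```python
# Total exposure after repeatedly pledging, borrowing at loan-to-value
# ratio r, and reinvesting in the same asset (gas fees ignored).

def looped_exposure(r: float, rounds: int) -> float:
    """Geometric-series exposure after `rounds` pledge-borrow iterations."""
    return sum(r ** k for k in range(rounds + 1))

# At r = 0.5, exposure approaches the 1 / (1 - r) = 2.0 limit as the number
# of (gas-costly) rounds grows, motivating a single flash-loan round instead.
print(looped_exposure(0.5, 10))
```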

AAAI Conference 2024 Conference Paper

Semi-supervised Learning of Dynamical Systems with Neural Ordinary Differential Equations: A Teacher-Student Model Approach

  • Yu Wang
  • Yuxuan Yin
  • Karthik Somayaji NS
  • Ján Drgoňa
  • Malachi Schram
  • Mahantesh Halappanavar
  • Frank Liu
  • Peng Li

Modeling dynamical systems is crucial for a wide range of tasks, but it remains challenging due to complex nonlinear dynamics, limited observations, or lack of prior knowledge. Recently, data-driven approaches such as Neural Ordinary Differential Equations (NODE) have shown promising results by leveraging the expressive power of neural networks to model unknown dynamics. However, these approaches often suffer from limited labeled training data, leading to poor generalization and suboptimal predictions. On the other hand, semi-supervised algorithms can utilize abundant unlabeled data and have demonstrated good performance in classification and regression tasks. We propose TS-NODE, the first semi-supervised approach to modeling dynamical systems with NODE. TS-NODE explores cheaply generated synthetic pseudo rollouts to broaden exploration in the state space and to tackle the challenges brought by the lack of ground-truth system data under a teacher-student model. TS-NODE employs a unified optimization framework that corrects the teacher model based on the student's feedback while mitigating the potential false system dynamics present in pseudo rollouts. TS-NODE demonstrates significant performance improvements over a baseline Neural ODE model on multiple dynamical system modeling tasks.

AAAI Conference 2024 Conference Paper

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

  • Yifan Lu
  • Ziqi Zhang
  • Chunfeng Yuan
  • Peng Li
  • Yan Wang
  • Bing Li
  • Weiming Hu

Diverse video captioning aims to generate a set of sentences to describe the given video in various aspects. Mainstream methods are trained with independent pairs of a video and a caption from its ground-truth set without exploiting the intra-set relationship, resulting in low diversity of generated captions. Different from them, we formulate diverse captioning into a semantic-concept-guided set prediction (SCG-SP) problem by fitting the predicted caption set to the ground-truth set, where the set-level relationship is fully captured. Specifically, our set prediction consists of two synergistic tasks, i.e., caption generation and an auxiliary task of concept combination prediction providing extra semantic supervision. Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction. Furthermore, we apply a diversity regularization term on concepts to encourage the model to generate semantically diverse captions with various concept combinations. These two tasks share multiple semantics-specific encodings as input, which are obtained by iterative interaction between visual features and conceptual queries. The correspondence between the generated captions and specific concept combinations further guarantees the interpretability of our model. Extensive experiments on benchmark datasets show that the proposed SCG-SP achieves state-of-the-art (SOTA) performance under both relevance and diversity metrics.

ICRA Conference 2024 Conference Paper

Statler: State-Maintaining Language Models for Embodied Reasoning

  • Takuma Yoneda
  • Jiading Fang
  • Peng Li
  • Huanyu Zhang
  • Tianchong Jiang
  • Shengjie Lin
  • Ben Picker
  • David Yunis

There has been significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and track its transition as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks. We release our code here.

AAAI Conference 2023 Conference Paper

AutoNF: Automated Architecture Optimization of Normalizing Flows with Unconstrained Continuous Relaxation Admitting Optimal Discrete Solution

  • Yu Wang
  • Ján Drgoňa
  • Jiaxin Zhang
  • Karthik Somayaji Nanjangud Suryanarayana
  • Malachi Schram
  • Frank Liu
  • Peng Li

Normalizing flows (NF) build upon invertible neural networks and have wide applications in probabilistic modeling. Currently, building a powerful yet computationally efficient flow model relies on empirical fine-tuning over a large design space. While introducing neural architecture search (NAS) to NF is desirable, the invertibility constraint of NF brings new challenges to existing NAS methods whose application is limited to unstructured neural networks. Developing efficient NAS methods specifically for NF remains an open problem. We present AutoNF, the first automated NF architectural optimization framework. First, we present a new mixture distribution formulation that allows efficient differentiable architecture search of flow models without violating the invertibility constraint. Second, under the new formulation, we convert the original NP-hard combinatorial NF architectural optimization problem to an unconstrained continuous relaxation admitting the discrete optimal architectural solution, circumventing the loss of optimality due to binarization in architectural optimization. We evaluate AutoNF with various density estimation datasets and show its superior performance-cost trade-offs over a set of existing hand-crafted baselines.

ICRA Conference 2023 Conference Paper

Efficient and Hybrid Decoder for Local Map Construction in Bird's-Eye-View

  • Kun Tian
  • Yun Ye
  • Zheng Zhu
  • Peng Li
  • Guan Huang 0003

High-definition maps are crucial perception elements for autonomous robot navigation systems, which can provide accurate scene layout and environment information for downstream motion prediction and planning control tasks. Traditional methods based on manual annotation or SLAM algorithms require massive labor efforts and time costs, which hinders the deployment of practical applications. Online construction of local maps from on-board cameras offers an alternative solution. Aiming at the unsatisfying precision and redundant computation of HDMapNet, we propose an efficient and hybrid decoder (EHD) that consists of a CNN-based segmentation (Seg) head and a query-based lane detection head (QLD). Specifically, the Seg head outputs pixel-level semantic maps, and QLD predicts an instance mask for each lane object through learnable query embeddings. The designed decoding method eliminates the cumulative error caused by inaccurate semantic maps and does not require an additional clustering algorithm for post-processing. Through combination with a variety of bird's-eye-view (BEV) encoders, the effectiveness and efficiency of our EHD is demonstrated by extensive experiments. For the segmentation task, the mIoU scores of the semantic map can be improved by 1.3%∼2.9%. Additionally, the accuracy of lane detection is also significantly increased (more than 10.2% mAP) under all evaluation criteria. Since our method discards redundant post-processing, the inference speed is up to 22.71 FPS, which is 32 times faster than HDMapNet.

NeurIPS Conference 2023 Conference Paper

Exploiting Contextual Objects and Relations for 3D Visual Grounding

  • Li Yang
  • Chunfeng Yuan
  • Ziqi Zhang
  • Zhongang Qi
  • Yan Xu
  • Wei Liu
  • Ying Shan
  • Bing Li

3D visual grounding, the task of identifying visual objects in 3D scenes based on natural language inputs, plays a critical role in enabling machines to understand and engage with the real-world environment. However, this task is challenging due to the necessity to capture 3D contextual information to distinguish target objects from complex 3D scenes. The absence of annotations for contextual objects and relations further exacerbates the difficulties. In this paper, we propose a novel model, CORE-3DVG, to address these challenges by explicitly learning about contextual objects and relations. Our method accomplishes 3D visual grounding via three sequential modular networks, including a text-guided object detection network, a relation matching network, and a target identification network. During training, we introduce a pseudo-label self-generation strategy and a weakly-supervised method to facilitate the learning of contextual objects and relations, respectively. The proposed techniques allow the networks to focus more effectively on referred objects within 3D scenes by understanding their context better. We validate our model on the challenging Nr3D, Sr3D, and ScanRefer datasets and demonstrate state-of-the-art performance. Our code will be public at https://github.com/yangli18/CORE-3DVG.

IROS Conference 2023 Conference Paper

Tightly-Coupled Visual-DVL Fusion For Accurate Localization of Underwater Robots

  • Yupei Huang
  • Peng Li
  • Shuaizheng Yan
  • Yaming Ou
  • Zhengxing Wu
  • Min Tan 0001
  • Junzhi Yu 0001

This paper proposes a tightly-coupled visual-Doppler-Velocity-Log (visual-DVL) fusion method for underwater robot localization that integrates the velocity measurements from a DVL into a visual odometry (VO). Considering that employing DVL measurements in dead-reckoning systems easily leads to error accumulation and suboptimal results in previous works, we directly integrate them into the visual tracking process. Specifically, the velocity measurements are utilized to improve the initial estimation of the camera pose during visual tracking, aiming to provide a better initial value for pose optimization. Thereafter, these velocity measurements are also directly employed to constrain the position change of the camera between two adjacent frames by constructing a novel DVL error term, which is optimized jointly with the visual constraints to obtain a more accurate camera pose. Various experiments are carried out on datasets collected from several scenarios of the underwater simulation environment HoloOcean, and the results illustrate that the proposed fusion method can effectively improve the localization accuracy for underwater robots by about 20% compared to pure visual odometry. The proposed method provides valuable guidance for the accurate localization of underwater robots.
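A velocity-based error term of the kind the abstract describes can be sketched as a residual between the visually tracked displacement and the DVL-predicted displacement. This is a hedged sketch, not the paper's formulation; the function name and frame conventions are assumptions.

```python
import numpy as np

def dvl_position_residual(p_prev, p_curr, R_prev, v_dvl, dt):
    """Hypothetical DVL error term: the camera's position change between
    two adjacent frames should match the body-frame DVL velocity rotated
    into the world frame and integrated over the inter-frame interval."""
    predicted_delta = R_prev @ (v_dvl * dt)   # displacement predicted by the DVL
    observed_delta = p_curr - p_prev          # displacement from visual tracking
    return observed_delta - predicted_delta   # residual driven toward zero in joint optimization

# Example: robot moving at 0.5 m/s forward for 0.1 s, identity orientation;
# visual tracking reports the matching 5 cm displacement, so the residual vanishes.
r = dvl_position_residual(np.zeros(3), np.array([0.05, 0.0, 0.0]),
                          np.eye(3), np.array([0.5, 0.0, 0.0]), 0.1)
print(np.allclose(r, 0.0))  # True
```

In a real pipeline such a residual would enter a nonlinear least-squares objective alongside the visual reprojection terms.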

TMLR Journal 2023 Journal Article

When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning

  • Wenkai Yang
  • Yankai Lin
  • Guangxiang Zhao
  • Peng Li
  • Jie Zhou
  • Xu Sun

Federated Learning has become a widely-used framework which allows learning a global model on decentralized local datasets under the condition of protecting local data privacy. However, federated learning faces severe optimization difficulty when training samples are not independently and identically distributed (non-i.i.d.). In this paper, we point out that the client sampling practice plays a decisive role in the aforementioned optimization difficulty. We find that the negative client sampling will cause the merged data distribution of currently sampled clients heavily inconsistent with that of all available clients, and further make the aggregated gradient unreliable. To address this issue, we propose a novel learning rate adaptation mechanism to adaptively adjust the server learning rate for the aggregated gradient in each round, according to the consistency between the merged data distribution of currently sampled clients and that of all available clients. Specifically, we make theoretical deductions to find a meaningful and robust indicator that is positively related to the optimal server learning rate, which is supposed to minimize the Euclidean distance between the aggregated gradient given currently sampled clients and that if all clients could participate in the current round. We show that our proposed indicator can effectively reflect the merged data distribution of sampled clients, thus we utilize it for the server learning rate adaptation. Extensive experiments on multiple image and text classification tasks validate the great effectiveness of our method in various settings. Our code is available at https://github.com/lancopku/FedGLAD.

IJCAI Conference 2022 Conference Paper

A Closed-Loop Perception, Decision-Making and Reasoning Mechanism for Human-Like Navigation

  • Wenqi Zhang
  • Kai Zhao
  • Peng Li
  • Xiao Zhu
  • Yongliang Shen
  • Yanna Ma
  • Yingfeng Chen
  • Weiming Lu

Reliable navigation systems have a wide range of applications in robotics and autonomous driving. Current approaches employ an open-loop process that converts sensor inputs directly into actions. However, these open-loop schemes struggle to handle complex and dynamic real-world scenarios due to their poor generalization. Imitating human navigation, we add a reasoning process to convert actions back to internal latent states, forming a two-stage closed loop of perception, decision-making, and reasoning. Firstly, VAE-Enhanced Demonstration Learning endows the model with an understanding of basic navigation rules. Then, two dual processes in RL-Enhanced Interaction Learning generate reward feedback for each other and collectively enhance obstacle avoidance capability. The reasoning model can substantially promote generalization and robustness, and facilitates the deployment of the algorithm to real-world robots without elaborate transfer. Experiments show our method is more adaptable to novel scenarios compared with state-of-the-art approaches.

AAAI Conference 2022 Conference Paper

I Can Find You! Boundary-Guided Separated Attention Network for Camouflaged Object Detection

  • Hongwei Zhu
  • Peng Li
  • Haoran Xie
  • Xuefeng Yan
  • Dong Liang
  • Dapeng Chen
  • Mingqiang Wei
  • Jing Qin

Can you find me? By simulating how humans discover the so-called 'perfectly' camouflaged object, we present a novel boundary-guided separated attention network (called BSA-Net). Beyond existing camouflaged object detection (COD) wisdom, BSA-Net utilizes two-stream separated attention modules to highlight the separator (i.e., the camouflaged object's boundary) between an image's background and foreground: the reverse attention stream helps erase the camouflaged object's interior to focus on the background, while the normal attention stream recovers the interior and thus pays more attention to the foreground; both streams are followed by a boundary guider module and combined to strengthen the understanding of the boundary. The core design of such separated attention is motivated by the COD procedure of humans: find the subtle difference between the foreground and background to delineate the boundary of a camouflaged object; the boundary can then further enhance COD accuracy. We validate on three benchmark datasets that our BSA-Net is very beneficial for detecting camouflaged objects with blurred boundaries and colors/patterns similar to their backgrounds. Extensive results exhibit very clear COD improvements of our BSA-Net over sixteen SOTAs.

IJCAI Conference 2022 Conference Paper

Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification

  • Deli Chen
  • Yankai Lin
  • Lei Li
  • Xuancheng Ren
  • Peng Li
  • Jie Zhou
  • Xu Sun

Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, and their underlying working mechanism remains underexplored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different from contrastive learning in other fields. However, existing GCL methods generally ignore this uneven distribution of annotated information and apply GCL evenly to the whole graph. To remedy this issue and further improve GCL in SSNC, we propose the Topology InFormation gain-Aware Graph Contrastive Learning (TIFA-GCL) framework that considers the annotated information distribution across the graph in GCL. Extensive experiments on six benchmark graph datasets, including the enormous OGB-Products graph, show that TIFA-GCL can bring a larger improvement than existing GCL methods in both transductive and inductive settings. Further experiments demonstrate the generalizability and interpretability of TIFA-GCL.

AAAI Conference 2021 Conference Paper

Aspect-Level Sentiment-Controllable Review Generation with Mutual Learning Framework

  • Huimin Chen
  • Yankai Lin
  • Fanchao Qi
  • Jinyi Hu
  • Peng Li
  • Jie Zhou
  • Maosong Sun

Review generation, aiming to automatically generate review text according to the given information, is proposed to assist with the unappealing task of review writing. However, most existing methods only consider the overall sentiments of reviews and cannot achieve aspect-level sentiment control. Even though some previous studies attempt to generate aspect-level sentiment-controllable reviews, they usually require large-scale human annotations which are unavailable in the real world. To address this issue, we propose a mutual learning framework that takes advantage of unlabeled data to assist aspect-level sentiment-controllable review generation. The framework consists of a generator and a classifier which utilize a confidence mechanism and a reconstruction reward to enhance each other. Experimental results show our model can achieve aspect-sentiment control accuracy up to 88% without losing generation quality.

IJCAI Conference 2021 Conference Paper

Deep Reinforcement Learning for Multi-contact Motion Planning of Hexapod Robots

  • Huiqiao Fu
  • Kaiqiang Tang
  • Peng Li
  • Wenqi Zhang
  • Xinpeng Wang
  • Guizhou Deng
  • Tao Wang
  • Chunlin Chen

Legged locomotion in a complex environment requires careful planning of the footholds of legged robots. In this paper, a novel Deep Reinforcement Learning (DRL) method is proposed to implement multi-contact motion planning for hexapod robots moving on uneven plum-blossom piles. First, the motion of hexapod robots is formulated as a Markov Decision Process (MDP) with a specified reward function. Second, a transition feasibility model is proposed for hexapod robots, which describes the feasibility of the state transition under the condition of satisfying kinematics and dynamics, and in turn determines the rewards. Third, the footholds and Center-of-Mass (CoM) sequences are sampled from a diagonal Gaussian distribution and the sequences are optimized through learning the optimal policies using the designed DRL algorithm. Both simulation and experimental results on physical systems demonstrate the feasibility and efficiency of the proposed method. Videos are shown at https://videoviewpage.wixsite.com/mcrl.

AAAI Conference 2021 Conference Paper

Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information

  • Qiu Ran
  • Yankai Lin
  • Peng Li
  • Jie Zhou

Non-autoregressive neural machine translation (NAT) generates each target word in parallel and has achieved promising inference acceleration. However, existing NAT models still have a big gap in translation quality compared to autoregressive neural machine translation models due to the multimodality problem: the target words may come from multiple feasible translations. To address this problem, we propose a novel NAT framework ReorderNAT which explicitly models the reordering information to guide the decoding of NAT. Specifically, ReorderNAT utilizes deterministic and nondeterministic decoding strategies that leverage reordering information as a proxy for the final translation to encourage the decoder to choose words belonging to the same translation. Experimental results on various widely-used datasets show that our proposed model achieves better performance compared to most existing NAT models, and even achieves comparable translation quality as autoregressive translation models with a significant speedup.

NeurIPS Conference 2021 Conference Paper

Topology-Imbalance Learning for Semi-Supervised Node Classification

  • Deli Chen
  • Yankai Lin
  • Guangxiang Zhao
  • Xuancheng Ren
  • Peng Li
  • Jie Zhou
  • Xu Sun

The class imbalance problem, as an important issue in learning node representations, has drawn increasing attention from the community. Although the imbalance considered by existing studies roots from the unequal quantity of labeled examples in different classes (quantity imbalance), we argue that graph data expose a unique source of imbalance from the asymmetric topological properties of the labeled nodes, i.e., labeled nodes are not equal in terms of their structural role in the graph (topology imbalance). In this work, we first probe the previously unknown topology-imbalance issue, including its characteristics, causes, and threats to semi-supervised node classification learning. We then provide a unified view to jointly analyze the quantity- and topology-imbalance issues by considering the node influence shift phenomenon with the Label Propagation algorithm. In light of our analysis, we devise an influence conflict detection–based metric Totoro to measure the degree of graph topology imbalance and propose a model-agnostic method ReNode to address the topology-imbalance issue by re-weighting the influence of labeled nodes adaptively based on their relative positions to class boundaries. Systematic experiments demonstrate the effectiveness and generalizability of our method in relieving the topology-imbalance issue and promoting semi-supervised node classification. Further analysis unveils the varied sensitivity of different graph neural networks (GNNs) to topology imbalance, which may serve as a new perspective in evaluating GNN architectures.

AAAI Conference 2020 Conference Paper

DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog

  • Feilong Chen
  • Fandong Meng
  • Jiaming Xu
  • Peng Li
  • Bo Xu
  • Jie Zhou

Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an appropriate response, drawing not only on the textual dialog history but also on the visually-grounded information. Previous models typically leverage single-hop or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and history-aware image features and the question- and image-aware dialog history features by a multi-hop reasoning process in each channel. Additionally, we also design an effective multimodal attention to further enhance the decoder to generate more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms compared models by a significant margin.

AAAI Conference 2020 Conference Paper

Hierarchical Knowledge Squeezed Adversarial Network Compression

  • Peng Li
  • Chang Shu
  • Yuan Xie
  • Yan Qu
  • Hui Kong

Deep network compression has achieved notable progress via knowledge distillation, where a teacher-student learning manner is adopted using a predetermined loss. Recently, more focus has shifted to employing adversarial training to minimize the discrepancy between the output distributions of the two networks. However, these methods emphasize result-oriented learning while neglecting process-oriented learning, losing the rich information contained in the whole network pipeline. In other (non-GAN-based) process-oriented methods, the knowledge has usually been transferred in a redundant manner. Observing that a small network cannot perfectly mimic a large one due to the huge gap in network scale, we propose a knowledge transfer method, involving effective intermediate supervision, under the adversarial training framework to learn the student network. Different from other intermediate supervision methods, we design the knowledge representation in a compact form by introducing a task-driven attention mechanism. Meanwhile, to improve the representation capability of the attention-based method, a hierarchical structure is utilized so that powerful but highly squeezed knowledge is realized and the knowledge from the teacher network can accommodate the size of the student network. Extensive experimental results on three typical benchmark datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, demonstrate that our method achieves highly superior performance against state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View

  • Deli Chen
  • Yankai Lin
  • Wei Li
  • Peng Li
  • Jie Zhou
  • Xu Sun

Graph Neural Networks (GNNs) have achieved promising performance on a wide range of graph-based tasks. Despite their success, one severe limitation of GNNs is the over-smoothing issue (indistinguishable representations of nodes in different classes). In this work, we present a systematic and quantitative study on the over-smoothing issue of GNNs. First, we introduce two quantitative metrics, MAD and MADGap, to measure the smoothness and over-smoothness of graph node representations, respectively. Then, we verify that smoothing is the nature of GNNs and that the critical factor leading to over-smoothness is the low information-to-noise ratio of the message received by the nodes, which is partially determined by the graph topology. Finally, we propose two methods to alleviate the over-smoothing issue from the topological view: (1) MADReg, which adds a MADGap-based regularizer to the training objective; (2) AdaEdge, which optimizes the graph topology based on the model predictions. Extensive experiments on 7 widely-used graph datasets with 10 typical GNN models show that the two proposed methods are effective for relieving the over-smoothing issue, thus improving the performance of various GNN models.
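A smoothness metric of the MAD flavor can be sketched as the mean pairwise cosine distance of node representations over a selected set of node pairs, with MADGap contrasting "remote" and "close" pairs. This is a hedged sketch; the toy neighbor/remote masks below are illustrative and do not reproduce the paper's exact hop-based pair definitions.

```python
import numpy as np

def mad(H, mask):
    """Mean Average Distance over the node pairs selected by `mask`.
    H: (n, d) node representations; mask: (n, n) boolean pair selector."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    D = 1.0 - Hn @ Hn.T                 # pairwise cosine distance
    row_sum = (D * mask).sum(axis=1)    # per-node sum over selected pairs
    row_cnt = mask.sum(axis=1)
    valid = row_cnt > 0                 # average only over nodes with pairs
    return (row_sum[valid] / row_cnt[valid]).mean()

# MADGap contrasts remote pairs with close pairs; small or negative
# values indicate over-smoothed (indistinguishable) representations.
H = np.random.default_rng(0).normal(size=(6, 4))
neighbor = np.eye(6, k=1, dtype=bool) | np.eye(6, k=-1, dtype=bool)  # toy "close" pairs
remote = ~neighbor & ~np.eye(6, dtype=bool)                          # toy "remote" pairs
mad_gap = mad(H, remote) - mad(H, neighbor)
```

Identical representations give a MAD of zero, which matches the intuition that full smoothing collapses all distances.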

AAAI Conference 2020 Conference Paper

Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos

  • Xiao-Yu Zhang
  • Haichao Shi
  • Changsheng Li
  • Peng Li

Weakly supervised action recognition and localization for untrimmed videos is a challenging problem with extensive applications. The overwhelming irrelevant background contents in untrimmed videos severely hamper effective identification of actions of interest. In this paper, we propose a novel multi-instance multi-label modeling network based on spatio-temporal pre-trimming to recognize actions and locate corresponding frames in untrimmed videos. Motivated by the fact that the person is the key factor in a human action, we spatially and temporally segment each untrimmed video into person-centric clips with pose estimation and tracking techniques. Given the bag-of-instances structure associated with video-level labels, action recognition is naturally formulated as a multi-instance multi-label learning problem. The network is optimized iteratively with selective coarse-to-fine pre-trimming based on instance-label activation. After convergence, temporal localization is further achieved with a local-global temporal class activation map. Extensive experiments are conducted on two benchmark datasets, i.e., THUMOS14 and ActivityNet1.3, and the experimental results clearly corroborate the efficacy of our method when compared with the state of the art.

NeurIPS Conference 2020 Conference Paper

Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks

  • Wenrui Zhang
  • Peng Li

Spiking neural networks (SNNs) are well suited for spatio-temporal learning and implementations on energy-efficient event-driven neuromorphic processors. However, existing SNN error backpropagation (BP) methods lack proper handling of spiking discontinuities and suffer from low performance compared with the BP methods for traditional artificial neural networks. In addition, a large number of time steps are typically required to achieve decent performance, leading to high latency and rendering spike-based computation unscalable to deep architectures. We present a novel Temporal Spike Sequence Learning Backpropagation (TSSL-BP) method for training deep SNNs, which breaks down error backpropagation across two types of inter-neuron and intra-neuron dependencies and leads to improved temporal learning precision. It captures inter-neuron dependencies through presynaptic firing times by considering the all-or-none characteristics of firing activities and captures intra-neuron dependencies by handling the internal evolution of each neuronal state in time. TSSL-BP efficiently trains deep SNNs within a much shortened temporal window of a few steps while improving the accuracy for various image classification datasets including CIFAR10.

IJCAI Conference 2019 Conference Paper

A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer

  • Fuli Luo
  • Peng Li
  • Jie Zhou
  • Pengcheng Yang
  • Baobao Chang
  • Xu Sun
  • Zhifang Sui

Unsupervised text style transfer aims to transfer the underlying style of text but keep its main content unchanged without parallel data. Most existing methods typically follow two steps: first separating the content from the original style, and then fusing the content with the desired style. However, the separation in the first step is challenging because the content and style interact in subtle ways in natural language. Therefore, in this paper, we propose a dual reinforcement learning framework to directly transfer the style of the text via a one-step mapping model, without any separation of content and style. Specifically, we consider the learning of the source-to-target and target-to-source mappings as a dual task, and two rewards are designed based on such a dual structure to reflect the style accuracy and content preservation, respectively. In this way, the two one-step mapping models can be trained via reinforcement learning, without any use of parallel data. Automatic evaluations show that our model outperforms the state-of-the-art systems by a large margin, especially with more than 10 BLEU points improvement averaged on two benchmark datasets. Human evaluations also validate the effectiveness of our model in terms of style accuracy, content preservation and fluency. Our code and data, including outputs of all baselines and our model are available at https://github.com/luofuli/DualRL.

IROS Conference 2019 Conference Paper

Self-modeling Tracking Control of Crawler Fire Fighting Robot Based on Causal Network

  • Wenkai Chang
  • Peng Li
  • Caiyun Yang
  • Tao Lu 0006
  • Yinghao Cai
  • Shuo Wang 0001

In this paper, a self-modeling method based on a causal network is proposed for the tracking control of the Crawler Fire Fighting Robot (CFFR). The method mainly consists of two parts. The first is a data-driven motion model that learns the correspondence between control-signal sequences and vehicle motion and estimates the motion state at the next moment from historical data, eliminating the need for complex CFFR modeling. The other is a tracking network, which learns the relationship between the target trajectory and the current control command based on simulation data from the motion model, simplifying the design and cumbersome tuning of a complex controller. The effectiveness of the proposed method is verified in both simulated and real-world environments. Qualitative and quantitative experimental results verify the accuracy of the tracking.

NeurIPS Conference 2019 Conference Paper

Spike-Train Level Backpropagation for Training Deep Recurrent Spiking Neural Networks

  • Wenrui Zhang
  • Peng Li

Spiking neural networks (SNNs) well support spatiotemporal learning and energy-efficient event-driven hardware neuromorphic processors. As an important class of SNNs, recurrent spiking neural networks (RSNNs) possess great computational power. However, the practical application of RSNNs is severely limited by challenges in training. Biologically-inspired unsupervised learning has limited capability in boosting the performance of RSNNs. On the other hand, existing backpropagation (BP) methods suffer from high complexity of unrolling in time, vanishing and exploding gradients, and approximate differentiation of discontinuous spiking activities when applied to RSNNs. To enable supervised training of RSNNs under a well-defined loss function, we present a novel Spike-Train level RSNNs Backpropagation (ST-RSBP) algorithm for training deep RSNNs. The proposed ST-RSBP directly computes the gradient of a rate-coded loss function defined at the output layer of the network w.r.t. tunable parameters. The scalability of ST-RSBP is achieved by the proposed spike-train level computation, during which temporal effects of the SNN are captured in both the forward and backward passes of BP. Our ST-RSBP algorithm can be broadly applied to RSNNs with a single recurrent layer or deep RSNNs with multiple feed-forward and recurrent layers. Based upon challenging speech and image datasets including TI46, N-TIDIGITS, Fashion-MNIST and MNIST, ST-RSBP is able to train RSNNs with an accuracy surpassing that of the current state-of-the-art SNN BP algorithms and conventional non-spiking deep learning models.

NeurIPS Conference 2018 Conference Paper

Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks

  • Yingyezhe Jin
  • Wenrui Zhang
  • Peng Li

Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low power event-driven neuromorphic hardware. However, SNNs have yet to reach the performance of conventional deep artificial neural networks (ANNs), a long-standing challenge due to complex dynamics and non-differentiable spike events encountered in training. The existing SNN error backpropagation (BP) methods are limited in terms of scalability, lack of proper handling of spiking discontinuities, and/or mismatch between the rate-coded loss function and computed gradient. We present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs. The temporal effects are precisely captured by the proposed spike-train level post-synaptic potential (S-PSP) at the microscopic level. The rate-coded errors are defined at the macroscopic level, computed and back-propagated across both macroscopic and microscopic levels. Different from existing BP methods, HM2-BP directly computes the gradient of the rate-coded loss function w.r.t. tunable parameters. We evaluate the proposed HM2-BP algorithm by training deep fully connected and convolutional SNNs based on the static MNIST [14] and dynamic neuromorphic N-MNIST [26]. HM2-BP achieves an accuracy level of 99.49% and 98.88% for MNIST and N-MNIST, respectively, outperforming the best reported performances obtained from the existing SNN BP algorithms. Furthermore, HM2-BP produces the highest accuracies based on SNNs for the EMNIST [3] dataset, and leads to high recognition accuracy for the 16-speaker spoken English letters of the TI46 Corpus [16], a challenging spatio-temporal speech recognition benchmark for which no prior success based on SNNs was reported. It also achieves competitive performance surpassing that of conventional deep learning models when dealing with asynchronous spiking streams.

AAAI Conference 2013 Conference Paper

An Extended GHKM Algorithm for Inducing Lambda-SCFG

  • Peng Li
  • Yang Liu
  • Maosong Sun

Semantic parsing, which aims at mapping a natural language (NL) sentence into its formal meaning representation (e.g., a logical form), has received increasing attention in recent years. While synchronous context-free grammar (SCFG) augmented with lambda calculus (λ-SCFG) provides an effective mechanism for semantic parsing, learning such λ-SCFG rules remains a challenge because of the difficulty of determining the correspondence between NL sentences and logical forms. To alleviate this structural divergence problem, we extend the GHKM algorithm, a state-of-the-art algorithm for learning synchronous grammars in statistical machine translation, to induce λ-SCFG from pairs of NL sentences and logical forms. By treating logical forms as trees, we reformulate the theory behind GHKM to give formal semantics to the alignment between NL words and logical form tokens. Experiments on the GEOQUERY dataset show that our semantic parser achieves an F-measure of 90.2%, the best result published to date.

JMLR Journal 2012 Journal Article

Distance Metric Learning with Eigenvalue Optimization

  • Yiming Ying
  • Peng Li

The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only require computing the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW).
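
The per-iteration primitive the abstract mentions, the largest eigenvector of a symmetric matrix, is cheap to obtain by power iteration. A minimal sketch of that primitive alone (not the DML-eig or LMNN algorithms themselves):

```python
import numpy as np

def largest_eigvec(A, iters=200):
    """Power iteration for the dominant eigenpair of a symmetric matrix.
    Assumes the start vector is not orthogonal to the top eigenvector."""
    v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        w = A @ v
        v = w / np.linalg.norm(w)       # renormalize each step
    return v @ A @ v, v                 # Rayleigh quotient, eigenvector

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues are 3 and 1
lam, v = largest_eigvec(A)
print(round(lam, 6))  # 3.0
```

Each iteration costs one matrix-vector product, which is what makes a first-order method built on this primitive scale to larger matrices than full eigendecomposition.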

IROS Conference 2009 Conference Paper

BEST: A real-time tracking method for scout robot

  • Diansheng Chen
  • Feng Bai
  • Peng Li
  • Tianmiao Wang

We propose BEST (Background subtraction and Enhanced camShift Tracking), a method for a scout robot to track a moving object in real time. A modified background subtraction method based on the time axis is used to segment the moving object in a complicated environment. The centroid and area are chosen as features to identify the target. We propose a novel method that combines CamShift, AWS (Adaptive Window Selecting) and a Kalman predicting algorithm to track the detected object. Experiments based on a DSP image processing system on a scout robot demonstrate the feasibility and robustness of our method.
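
A toy sketch of time-axis background subtraction with the abstract's centroid and area features. The median-over-time background model and threshold value are illustrative assumptions, not the paper's exact modification:

```python
import numpy as np

def detect_moving_object(frames, thresh=20):
    """Background subtraction along the time axis on grayscale frames:
    the per-pixel median of earlier frames approximates the background;
    the last frame is differenced against it and thresholded, and the
    foreground blob's area and centroid are returned as features."""
    frames = np.asarray(frames, dtype=float)
    background = np.median(frames[:-1], axis=0)
    mask = np.abs(frames[-1] - background) > thresh
    area = int(mask.sum())
    if area == 0:
        return 0, None                    # nothing moved
    ys, xs = np.nonzero(mask)
    return area, (float(ys.mean()), float(xs.mean()))

# toy sequence: static background, a bright 2x2 patch appears last
bg = np.zeros((8, 8))
last = bg.copy()
last[3:5, 4:6] = 255
area, centroid = detect_moving_object([bg, bg, bg, last])
print(area, centroid)  # 4 (3.5, 4.5)
```

In the full method these features seed a CamShift search window, with the Kalman filter predicting the window position in the next frame.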