Arrow Research search

Author name cluster

Meng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

NeurIPS Conference 2025 Conference Paper

CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing

  • Yifan Zhou
  • Tianshi Xu
  • Jue Hong
  • Ye Wu
  • Meng Li

Private large language model (LLM) inference based on cryptographic primitives offers a promising path towards privacy-preserving deep learning. However, existing frameworks only support dense LLMs like LLaMA-1 and struggle to scale to mixture-of-experts (MoE) architectures. The key challenge comes from securely evaluating the dynamic routing mechanism in MoE layers, which may reveal sensitive input information if not fully protected. In this paper, we propose CryptoMoE, the first framework that enables private, efficient, and accurate inference for MoE-based models. CryptoMoE balances expert loads to protect expert routing information and proposes novel protocols for secure expert dispatch and combine. CryptoMoE also develops a confidence-aware token selection strategy and a batch matrix multiplication protocol to improve accuracy and efficiency further. Extensive experiments on DeepSeekMoE-16.4B, OLMoE-6.9B, and QWenMoE-14.3B show that CryptoMoE achieves $2.8\sim3.5\times$ end-to-end latency reduction and $3\sim6\times$ communication reduction over a dense baseline with minimum accuracy loss. We also adapt CipherPrune (ICLR'25) for MoE inference and demonstrate CryptoMoE can reduce the communication by up to $4.3\times$.

NeurIPS Conference 2025 Conference Paper

EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval

  • Zebin Yang
  • Sunjian Zheng
  • Tong Xie
  • Tianshi Xu
  • Bo Yu
  • Fan Wang
  • Jie Tang
  • Shaoshan Liu

Object-goal navigation (ObjNav) tasks an agent with navigating to the location of a specific object in an unseen environment. Embodied agents equipped with large language models (LLMs) and online constructed navigation maps can perform ObjNav in a zero-shot manner. However, existing agents heavily rely on giant LLMs on the cloud, e.g., GPT-4, while directly switching to small LLMs, e.g., LLaMA3.2-11b, suffers from significant success rate drops due to limited model capacity for understanding complex navigation maps, which prevents deploying ObjNav on local devices. At the same time, the long prompt introduced by the navigation map description causes high planning latency on local devices. In this paper, we propose EfficientNav to enable on-device efficient LLM-based zero-shot ObjNav. To help the smaller LLMs better understand the environment, we propose semantics-aware memory retrieval to prune redundant information in navigation maps. To reduce planning latency, we propose discrete memory caching and attention-based memory clustering to efficiently save and re-use the KV cache. Extensive experimental results demonstrate that EfficientNav achieves 11.1\% improvement in success rate on the HM3D benchmark over GPT-4-based baselines, and demonstrates 6.7$\times$ real-time latency reduction and 4.7$\times$ end-to-end latency reduction over the GPT-4 planner. Our code is available at https://github.com/PKU-SEC-Lab/EfficientNav.

NeurIPS Conference 2025 Conference Paper

FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency

  • Yifei Su
  • Ning Liu
  • Dong Chen
  • Zhen Zhao
  • Kun Wu
  • Meng Li
  • Zhiyuan Xu
  • Zhengping Che

Generative modeling-based visuomotor policies have been widely adopted in robotic manipulation, attributed to their ability to model multimodal action distributions. However, the high inference cost of multi-step sampling limits its applicability in real-time robotic systems. Existing approaches accelerate sampling in generative modeling-based visuomotor policies by adapting techniques originally developed to speed up image generation. However, a major distinction exists: image generation typically produces independent samples without temporal dependencies, while robotic manipulation requires generating action trajectories with continuity and temporal coherence. To this end, we propose FreqPolicy, a novel approach that first imposes frequency consistency constraints on flow-based visuomotor policies. Our work enables the action model to capture temporal structure effectively while supporting efficient, high-quality one-step action generation. Concretely, we introduce a frequency consistency constraint objective that enforces alignment of frequency-domain action features across different timesteps along the flow, thereby promoting convergence of one-step action generation toward the target distribution. In addition, we design an adaptive consistency loss to capture structural temporal variations inherent in robotic manipulation tasks. We assess FreqPolicy on $53$ tasks across $3$ simulation benchmarks, proving its superiority over existing one-step action generators. We further integrate FreqPolicy into the vision-language-action (VLA) model and achieve acceleration without performance degradation on $40$ tasks of Libero. Besides, we show efficiency and effectiveness in real-world robotic scenarios with an inference frequency of $93.5$ Hz.

NeurIPS Conference 2025 Conference Paper

MPCache: MPC-Friendly KV Cache Eviction for Efficient Private LLM Inference

  • Wenxuan Zeng
  • Ye Dong
  • Jinjin Zhou
  • Jin Tan
  • Lei Wang
  • Tao Wei
  • Runsheng Wang
  • Meng Li

Private large language model (LLM) inference based on secure multi-party computation (MPC) achieves formal data privacy protection but suffers from significant latency overhead, especially for long input sequences. While key-value (KV) cache eviction and sparse attention algorithms have been proposed for efficient LLM inference in plaintext, they are not designed for MPC and cannot benefit private LLM inference directly. In this paper, we propose an accurate and MPC-friendly KV cache eviction framework, dubbed MPCache, building on the observation that historical tokens in a long sequence may have different effects on the downstream decoding. Hence, MPCache combines a look-once static eviction algorithm to discard unimportant KV cache and a query-aware dynamic selection algorithm to activate only a small subset of KV cache for attention computation. MPCache further incorporates a series of optimizations for efficient dynamic KV cache selection, including MPC-friendly similarity approximation, hierarchical KV cache clustering, and cross-layer index-sharing strategy. Extensive experiments demonstrate that MPCache consistently outperforms prior-art KV cache eviction baselines across different generation tasks and achieves 1.8~2.01x and 3.39~8.37x decoding latency and communication reduction on different sequence lengths, respectively.

AAAI Conference 2025 Conference Paper

SMamba: Sparse Mamba for Event-based Object Detection

  • Nan Yang
  • Yang Wang
  • Zhanwen Liu
  • Meng Li
  • Yisheng An
  • Xiangmo Zhao

Transformer-based methods have achieved remarkable performance in event-based object detection, owing to their global modeling ability. However, they neglect the influence of non-event and noisy regions and process them uniformly, leading to high computational overhead. To mitigate computation cost, some researchers propose window-attention-based sparsification strategies to discard unimportant regions, which sacrifices the global modeling ability and results in suboptimal performance. To achieve a better trade-off between accuracy and efficiency, we propose Sparse Mamba (SMamba), which performs adaptive sparsification to reduce computational effort while maintaining global modeling capability. Specifically, a Spatio-Temporal Continuity Assessment module is proposed to measure the information content of tokens and discard uninformative ones by leveraging the spatiotemporal distribution differences between activity and noise events. Based on the assessment results, an Information-Prioritized Local Scan strategy is designed to shorten the scan distance between high-information tokens, facilitating interactions among them in the spatial dimension. Furthermore, to extend the global interaction from 2D space to 3D representations, a Global Channel Interaction module is proposed to aggregate channel information from a global spatial perspective. Results on three datasets (Gen1, 1Mpx, and eTram) demonstrate that our model outperforms other methods in both performance and efficiency.

NeurIPS Conference 2024 Conference Paper

ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction

  • Renze Chen
  • Zhuofeng Wang
  • Beiquan Cao
  • Tong Wu
  • Size Zheng
  • Xiuhong Li
  • Xuechao Wei
  • Shengen Yan

Large Language Models (LLMs) are widely used in today's natural language processing tasks. To support applications like multi-turn chats, document understanding, and content generation, models with long context lengths are growing in importance. However, managing long contexts brings substantial challenges due to the expansion of the key-value cache (KV cache). A longer KV cache requires more memory, limiting the batch size and thus decreasing throughput. Also, computing attention over a long KV cache incurs more memory access, hurting the end-to-end latency. Prior works find that it is sufficient to use only the recent and high-impact tokens for attention computation, allowing the eviction of less vital tokens to shrink cache size. Nonetheless, we observe a dynamic shift in token importance across different decoding steps. Tokens initially evicted might regain importance after certain decoding steps. To address this, we propose ArkVale, a page-based KV cache manager that can recognize and recall currently important tokens evicted before. We asynchronously copy the filled page into external memory (e.g., CPU memory) as backup and summarize it into a much smaller digest by constructing the bounding-volume of its keys. Before attention computation, we measure all pages' importance based on their digests, recall the important ones, evict the unimportant ones, and select the top-ranked pages for attention computation. Experiment results show that ArkVale performs well on various long context tasks with negligible accuracy loss under 2k$\sim$4k cache budget and can improve decoding latency to $2.2\times$ and batching throughput to $4.6\times$ because it applies attention on only a small subset of pages and reduces per-sample memory usage of KV cache.
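
The digest idea in the abstract above can be illustrated with a small sketch. It is not ArkVale's implementation: the axis-aligned bounding box, the dot-product upper bound used as the importance score, and the names `page_digest`, `score_upper_bound`, and `select_pages` are all assumptions made for illustration, since the abstract only says that a digest is built from the bounding volume of a page's keys.

```python
# Illustrative sketch only: summarize each page of keys by a bounding box and
# rank pages by an upper bound on the query-key dot product (an assumed score).

def page_digest(keys):
    """Per-dimension (min, max) over all key vectors in one page."""
    dims = range(len(keys[0]))
    lo = [min(k[d] for k in keys) for d in dims]
    hi = [max(k[d] for k in keys) for d in dims]
    return lo, hi

def score_upper_bound(query, digest):
    """Largest possible q.k for any k inside the box described by the digest."""
    lo, hi = digest
    return sum(max(q * l, q * h) for q, l, h in zip(query, lo, hi))

def select_pages(query, digests, budget):
    """Indices of the `budget` pages with the highest upper-bound score."""
    ranked = sorted(
        range(len(digests)),
        key=lambda p: score_upper_bound(query, digests[p]),
        reverse=True,
    )
    return ranked[:budget]

# Three toy pages, each holding two 2-dimensional keys (made-up values).
pages = [
    [[0.1, 0.2], [0.0, -0.3]],
    [[2.0, 1.0], [1.5, 0.5]],
    [[-1.0, 0.4], [-0.5, 0.9]],
]
digests = [page_digest(p) for p in pages]
query = [1.0, -1.0]
top = select_pages(query, digests, budget=2)
print(top)
```

Because the box contains every key in the page, the bound never underestimates the best true score in that page, so a page holding a highly relevant key cannot be ranked low by this digest alone.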

JBHI Journal 2024 Journal Article

Hybrid Brain-Computer Interface Controlled Soft Robotic Glove for Stroke Rehabilitation

  • Ruoqing Zhang
  • Shanshan Feng
  • Nan Hu
  • Shunkang Low
  • Meng Li
  • Xiaogang Chen
  • Hongyan Cui

Soft robotic gloves controlled by a brain-computer interface (BCI) have demonstrated effectiveness in hand rehabilitation for stroke patients. Current systems rely on static visual representations for patients to perform motor imagination (MI) tasks, resulting in lower BCI performance. Therefore, this study used MI and high-frequency steady-state visual evoked potential (SSVEP) to construct a friendly and natural hybrid BCI paradigm. Specifically, the stimulation interface sequentially presented decomposed action pictures of the left and right hands gripping a ball, with the pictures flashing at specific stimulation frequencies (left: 34 Hz, right: 35 Hz). Integrating a soft robotic glove as feedback, we established a comprehensive “peripheral - central - peripheral” hand rehabilitation system to facilitate the hand rehabilitation of patients. Filter bank common spatial pattern (FBCSP) and filter bank canonical correlation analysis (FBCCA) algorithms were used to identify MI and SSVEP signals, respectively. Additionally, we proposed a novel fusion algorithm to decide the final output of the system. The feasibility of the proposed system was validated through online experiments involving 12 healthy subjects and 9 stroke patients, achieving accuracy rates of 95.83 ± 6.83% and 63.33 ± 10.38%, respectively. The accuracy of MI and SSVEP in 12 healthy subjects reached 81.67 ± 15.63% and 95.14 ± 7.47%, both lower than the accuracy after fusion; these results confirmed the effectiveness of the proposed algorithm. The accuracy rate was more than 50% in both healthy subjects and patients, confirming the effectiveness of the proposed system.

AAAI Conference 2024 Conference Paper

Intra- and Inter-group Optimal Transport for User-Oriented Fairness in Recommender Systems

  • Zhongxuan Han
  • Chaochao Chen
  • Xiaolin Zheng
  • Meng Li
  • Weiming Liu
  • Binhui Yao
  • Yuyuan Li
  • Jianwei Yin

Recommender systems are typically biased toward a small group of users, leading to severe unfairness in recommendation performance, i.e., User-Oriented Fairness (UOF) issue. Existing research on UOF exhibits notable limitations in two phases of recommendation models. In the training phase, current methods fail to tackle the root cause of the UOF issue, which lies in the unfair training process between advantaged and disadvantaged users. In the evaluation phase, the current UOF metric lacks the ability to comprehensively evaluate varying cases of unfairness. In this paper, we aim to address the aforementioned limitations and ensure recommendation models treat user groups of varying activity levels equally. In the training phase, we propose a novel Intra- and Inter-GrOup Optimal Transport framework (II-GOOT) to alleviate the data sparsity problem for disadvantaged users and narrow the training gap between advantaged and disadvantaged users. In the evaluation phase, we introduce a novel metric called?-UOF, which enables the identification and assessment of various cases of UOF. This helps prevent recommendation models from leading to unfavorable fairness outcomes, where both advantaged and disadvantaged users experience subpar recommendation performance. We conduct extensive experiments on three real-world datasets based on four backbone recommendation models to prove the effectiveness of?-UOF and the efficiency of our proposed II-GOOT.

IROS Conference 2024 Conference Paper

OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping

  • Meng Li
  • Qi Zhao 0037
  • Shuchang Lyu
  • Chunlei Wang
  • Yujing Ma
  • Guangliang Cheng
  • Chenguang Yang 0001

Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we propose a novel framework that seamlessly integrates open-vocabulary learning into the domain of robotic grasping, empowering robots with the capability to adeptly handle novel objects. Our contributions are threefold. Firstly, we present a large-scale benchmark dataset specifically tailored for evaluating the performance of open-vocabulary grasping tasks. Secondly, we propose a unified visual-linguistic framework that serves as a guide for robots in successfully grasping both base and novel objects. Thirdly, we introduce two alignment modules designed to enhance visual-linguistic perception in the robotic grasping process. Extensive experiments validate the efficacy and utility of our approach. Notably, our framework achieves an average accuracy of 71.2% and 64.4% on base and novel categories in our new dataset, respectively. Our code and dataset are available at https://github.com/cv516Buaa/OVGNet.

NeurIPS Conference 2024 Conference Paper

PrivCirNet: Efficient Private Inference via Block Circulant Transformation

  • Tianshi Xu
  • Lemeng Wu
  • Runsheng Wang
  • Meng Li

Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead. We observe transforming the DNN weights into circulant matrices converts general matrix-vector multiplications into HE-friendly 1-dimensional convolutions, drastically reducing the HE computation cost. Hence, in this paper, we propose PrivCirNet, a protocol/network co-optimization framework based on block circulant transformation. At the protocol level, PrivCirNet customizes the HE encoding algorithm that is fully compatible with the block circulant transformation and reduces the computation latency in proportion to the block size. At the network level, we propose a latency-aware formulation to search for the layer-wise block size assignment based on second-order information. PrivCirNet also leverages layer fusion to further reduce the inference cost. We compare PrivCirNet with the state-of-the-art HE-based framework Bolt (IEEE S\&P 2024) and HE-friendly pruning method SpENCNN (ICML 2023). For ResNet-18 and Vision Transformer (ViT) on Tiny ImageNet, PrivCirNet reduces latency by $5.0\times$ and $1.3\times$ with iso-accuracy over Bolt, respectively, and improves accuracy by $4.1$\% and $12$\% over SpENCNN, respectively. For MobileNetV2 on ImageNet, PrivCirNet achieves $1.7\times$ lower latency and $4.2$\% better accuracy over Bolt and SpENCNN, respectively. Our code and checkpoints are available on GitHub.
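
The observation in the abstract above, that a circulant weight matrix turns a matrix-vector product into a 1-dimensional convolution, can be checked in plaintext with a minimal sketch. Nothing here involves homomorphic encryption or the paper's encoding algorithm; the helper names and the toy values are hypothetical, and each block is represented by its first column as an assumed storage convention.

```python
# Plaintext illustration only: a block circulant matrix-vector product computed
# (a) densely and (b) as a sum of circular convolutions, one per block.

def circulant(c):
    """Dense n x n circulant matrix whose j-th column is c rotated down by j."""
    n = len(c)
    cols = [c[n - j:] + c[:n - j] for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def circular_conv(c, x):
    """Circular convolution: y[i] = sum_j c[(i - j) mod n] * x[j]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def block_circulant_matvec(blocks, x, b):
    """blocks[r][s] holds the length-b first column of circulant block (r, s)."""
    y = []
    for row in blocks:
        acc = [0] * b
        for s, c in enumerate(row):
            part = circular_conv(c, x[s * b:(s + 1) * b])
            acc = [u + v for u, v in zip(acc, part)]
        y.extend(acc)
    return y

def to_dense(blocks, b):
    """Expand the compact block representation into an ordinary dense matrix."""
    dense = []
    for row in blocks:
        mats = [circulant(c) for c in row]
        for i in range(b):
            dense.append([v for m in mats for v in m[i]])
    return dense

b = 2                                          # block size
blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # a 4 x 4 matrix stored as 8 numbers
x = [1, -1, 2, 0]

dense_y = [sum(w * v for w, v in zip(row, x)) for row in to_dense(blocks, b)]
conv_y = block_circulant_matvec(blocks, x, b)
print(dense_y, conv_y)  # both are [5, 9, 13, 17]
```

Each b x b block is described by b numbers rather than b squared, which is where the storage and computation savings of the block circulant structure come from.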

NeurIPS Conference 2023 Conference Paper

CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference

  • Wenxuan Zeng
  • Meng Li
  • Haichuan Yang
  • Wen-jie Lu
  • Runsheng Wang
  • Ru Huang

Deep neural network (DNN) inference based on secure 2-party computation (2PC) can offer cryptographically-secure privacy protection but suffers from orders of magnitude latency overhead due to enormous communication. Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead and focus on reducing the ReLUs to improve the communication efficiency. However, we observe these works achieve limited communication reduction for state-of-the-art (SOTA) 2PC protocols because they ignore other linear and non-linear operations, which now contribute to the majority of communication. In this work, we present CoPriv, a framework that jointly optimizes the 2PC inference protocol and the DNN architecture. CoPriv features a new 2PC protocol for convolution based on Winograd transformation and develops DNN-aware optimization to significantly reduce the inference communication. CoPriv further develops a 2PC-aware network optimization algorithm that is compatible with the proposed protocol and simultaneously reduces the communication for all the linear and non-linear operations. We compare CoPriv with the SOTA 2PC protocol, CrypTFlow2, and demonstrate 2.1× communication reduction for both ResNet-18 and ResNet-32 on CIFAR-100. We also compare CoPriv with SOTA network optimization methods, including SNL, MetaPruning, etc. CoPriv achieves 9.98× and 3.88× online and total communication reduction with a higher accuracy compared to SNL, respectively. CoPriv also achieves 3.87× online communication reduction with more than 3% higher accuracy compared to MetaPruning.

JMLR Journal 2023 Journal Article

On the Estimation of Derivatives Using Plug-in Kernel Ridge Regression Estimators

  • Zejian Liu
  • Meng Li

We study the problem of estimating the derivatives of a regression function, which has a wide range of applications as a key nonparametric functional of unknown functions. Standard analysis may be tailored to specific derivative orders, and parameter tuning remains a daunting challenge particularly for high-order derivatives. In this article, we propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression with random design that is broadly applicable for multi-dimensional support and arbitrary mixed-partial derivatives. We provide a non-asymptotic analysis to study the behavior of the proposed estimator in a unified manner that encompasses the regression function and its derivatives, leading to two error bounds for a general class of kernels under the strong $L_\infty$ norm. In a concrete example specialized to kernels with polynomially decaying eigenvalues, the proposed estimator recovers the minimax optimal rate up to a logarithmic factor for estimating derivatives of functions in Hölder and Sobolev classes. Interestingly, the proposed estimator achieves the optimal rate of convergence with the same choice of tuning parameter for any order of derivatives. Hence, the proposed estimator enjoys a plug-in property for derivatives in that it automatically adapts to the order of derivatives to be estimated, enabling easy tuning in practice. Our simulation studies show favorable finite sample performance of the proposed method relative to several existing methods and corroborate the theoretical findings on its minimax optimality.
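
A plug-in derivative estimator of the kind described above can be sketched in one dimension: fit KRR once, then differentiate the fitted function analytically using the same coefficients. This is a sketch under stated assumptions, not the paper's code. The Gaussian kernel, the `n * lam` ridge scaling, the bandwidth `h`, and the synthetic data are illustrative choices, and the Gaussian kernel is not claimed to belong to the kernel class analyzed in the paper.

```python
# Illustrative sketch only: 1-D kernel ridge regression with a Gaussian kernel,
# followed by a plug-in estimate of the first derivative of the fitted function.
import math
import random

def gauss_kernel(a, b, h):
    return math.exp(-(a - b) ** 2 / (2 * h ** 2))

def gauss_kernel_dx(a, b, h):
    """Partial derivative of the kernel with respect to its first argument."""
    return -(a - b) / h ** 2 * gauss_kernel(a, b, h)

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_krr(xs, ys, h, lam):
    """Coefficients alpha solving (K + n * lam * I) alpha = y."""
    n = len(xs)
    A = [[gauss_kernel(xs[i], xs[j], h) + (n * lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, ys)

def predict(x, xs, alpha, h):
    return sum(a * gauss_kernel(x, xi, h) for a, xi in zip(alpha, xs))

def predict_derivative(x, xs, alpha, h):
    """Plug-in estimate: differentiate the fitted function, reusing alpha."""
    return sum(a * gauss_kernel_dx(x, xi, h) for a, xi in zip(alpha, xs))

random.seed(0)
xs = sorted(random.uniform(0, 1) for _ in range(30))
ys = [math.sin(2 * math.pi * x) + 0.1 * random.gauss(0, 1) for x in xs]
h, lam = 0.3, 1e-3          # assumed tuning values, chosen only for the demo
alpha = fit_krr(xs, ys, h, lam)

grid = [0.1 * k for k in range(1, 10)]
d_plugin = [predict_derivative(x, xs, alpha, h) for x in grid]
```

The same `alpha` serves both `predict` and `predict_derivative`, which mirrors the plug-in property the abstract emphasizes: no separate fit or retuning is done for the derivative.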

AAAI Conference 2023 Conference Paper

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

  • Meng Li
  • Yahan Yu
  • Yi Yang
  • Guanghao Ren
  • Jian Wang

Stroke extraction of Chinese characters plays an important role in the field of character recognition and generation. Most existing character stroke extraction methods focus on image morphological features. These methods usually lead to errors in cross-stroke extraction and stroke matching because they rarely use stroke semantics and prior information. In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration. This method consists of three parts: image registration-based stroke registration that establishes the rough registration of the reference strokes and the target as prior information; image semantic segmentation-based stroke segmentation that preliminarily separates target strokes into seven categories; and high-precision extraction of single strokes. In the stroke registration, we propose a structure deformable image registration network to achieve structure-deformable transformation while maintaining the stable morphology of single strokes for character images with complex structures. In order to verify the effectiveness of the method, we construct two datasets, for calligraphy characters and regular handwriting characters respectively. The experimental results show that our method strongly outperforms the baselines. Code is available at https://github.com/MengLi-l1/StrokeExtraction.

NeurIPS Conference 2022 Conference Paper

BiT: Robustly Binarized Multi-distilled Transformer

  • Zechun Liu
  • Barlas Oguz
  • Aasish Pappu
  • Lin Xiao
  • Scott Yih
  • Meng Li
  • Raghuraman Krishnamoorthi
  • Yashar Mehdad

Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarization of the weights and activations of the network can significantly alleviate these issues, but is technically challenging from an optimization perspective. In this work, we identify a series of improvements that enable binary transformers at a much higher accuracy than what was possible previously. These include a two-set binarization scheme, a novel elastic binary activation function with learned parameters, and a method to quantize a network to its limit by successively distilling higher precision models into lower precision students. These approaches allow, for the first time, fully binarized transformer models that are at a practical level of accuracy, approaching a full-precision BERT baseline on the GLUE language understanding benchmark within as little as 5.9%. Code and models are available at: https://github.com/facebookresearch/bit.

IJCAI Conference 2022 Conference Paper

Decentralized Unsupervised Learning of Visual Representations

  • Yawen Wu
  • Zhepeng Wang
  • Dewen Zeng
  • Meng Li
  • Yiyu Shi
  • Jingtong Hu

Collaborative learning enables distributed clients to learn a shared model for prediction while keeping the training data local on each client. However, existing collaborative learning methods require fully-labeled data for training, which is inconvenient or sometimes infeasible to obtain due to the high labeling cost and the requirement of expertise. The lack of labels makes collaborative learning impractical in many realistic settings. Self-supervised learning can address this challenge by learning from unlabeled data. Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled image data. However, the distributed data collected on clients are usually not independent and identically distributed (non-IID) among clients, and each client may only have few classes of data, which degrades the performance of CL and learned representations. To tackle this problem, we propose a collaborative contrastive learning framework consisting of two approaches: feature fusion and neighborhood matching, by which a unified feature space among clients is learned for better data representations. Feature fusion provides remote features as accurate contrastive information to each client for better local learning. Neighborhood matching further aligns each client’s local features to the remote features such that well-clustered features among clients can be learned. Extensive experiments show the effectiveness of the proposed framework. It outperforms other methods by 11% on IID data and matches the performance of centralized learning.

JMLR Journal 2022 Journal Article

Double Spike Dirichlet Priors for Structured Weighting

  • Huiming Lin
  • Meng Li

Assigning weights to a large pool of objects is a fundamental task in a wide variety of applications. In this article, we introduce the concept of structured high-dimensional probability simplexes, in which most components are zero or near zero and the remaining ones are close to each other. Such structure is well motivated by (i) high-dimensional weights that are common in modern applications, and (ii) ubiquitous examples in which equal weights---despite their simplicity---often achieve favorable or even state-of-the-art predictive performance. This particular structure, however, presents unique challenges partly because, unlike high-dimensional linear regression, the parameter space is a simplex and pattern switching between partial constancy and sparsity is unknown. To address these challenges, we propose a new class of double spike Dirichlet priors to shrink a probability simplex to one with the desired structure. When applied to ensemble learning, such priors lead to a Bayesian method for structured high-dimensional ensembles that is useful for forecast combination and improving random forests, while enabling uncertainty quantification. We design efficient Markov chain Monte Carlo algorithms for implementation. Posterior contraction rates are established to study large sample behaviors of the posterior distribution. We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI).

AAAI Conference 2021 Conference Paper

NASGEM: Neural Architecture Search via Graph Embedding Method

  • Hsin-Pai Cheng
  • Tunhou Zhang
  • Yixing Zhang
  • Shiyu Li
  • Feng Liang
  • Feng Yan
  • Meng Li
  • Vikas Chandra

Neural Architecture Search (NAS) automates and prospers the design of neural networks. Estimator-based NAS has been proposed recently to model the relationship between architectures and their performance to enable scalable and flexible search. However, existing estimator-based methods encode the architecture into a latent space without considering graph similarity. Ignoring graph similarity in node-based search space may induce a large inconsistency between similar graphs and their distance in the continuous encoding space, leading to inaccurate encoding representation and/or reduced representation capacity that can yield sub-optimal search results. To preserve graph correlation information in encoding, we propose NASGEM which stands for Neural Architecture Search via Graph Embedding Method. NASGEM is driven by a novel graph embedding method equipped with similarity measures to capture the graph topology information. By precisely estimating the graph distance and using an auxiliary Weisfeiler-Lehman kernel to guide the encoding, NASGEM can utilize additional structural information to get more accurate graph representation to improve the search efficiency. GEMNet, a set of networks discovered by NASGEM, consistently outperforms networks crafted by existing search methods in classification tasks, i.e., with 0.4%-3.6% higher accuracy while having 11%-21% fewer Multiply-Accumulates. We further transfer GEMNet for COCO object detection. In both one-stage and two-stage detectors, our GEMNet surpasses its manually-crafted and automatically-searched counterparts.

TIST Journal 2021 Journal Article

PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning

  • Shilei Li
  • Meng Li
  • Jiongming Su
  • Shaofei Chen
  • Zhimin Yuan
  • Qing Ye

Efficient and stable exploration remains a key challenge for deep reinforcement learning (DRL) operating in high-dimensional action and state spaces. Recently, a more promising approach that combines exploration in the action space with exploration in the parameter space has been proposed to get the best of both methods. In this article, we propose a new iterative and closed-loop framework by combining the evolutionary algorithm (EA), which does explorations in a gradient-free manner directly in the parameter space with an actor-critic, and the deep deterministic policy gradient (DDPG) reinforcement learning algorithm, which does explorations in a gradient-based manner in the action space to make these two methods cooperate in a more balanced and efficient way. In our framework, the policies represented by the EA population (the parametric perturbation part) can evolve in a guided manner by utilizing the gradient information provided by the DDPG, and the policy gradient part (DDPG) is used only as a fine-tuning tool for the best individual in the EA population to improve the sample efficiency. In particular, we propose a criterion to determine the training steps required for the DDPG to ensure that useful gradient information can be generated from the EA-generated samples and the DDPG and EA parts can work together in a more balanced way during each generation. Furthermore, within the DDPG part, our algorithm can flexibly switch between fine-tuning the same previous RL-Actor and fine-tuning a new one generated by the EA according to different situations to further improve the efficiency. Experiments on a range of challenging continuous control benchmarks demonstrate that our algorithm outperforms related works and offers a satisfactory trade-off between stability and sample efficiency.

ICRA Conference 2016 Conference Paper

Control and experimental validation of robot-assisted automatic measurement system for Multi-Stud Tensioning Machine (MSTM)

  • Meng Li
  • Xingguang Duan
  • Haoyuan Li
  • Tengfei Cui
  • Liang Gao
  • Yue Zhan
  • Yan Xu

The Multi-Stud Tensioning Machine (MSTM) is specialized equipment used to open/seal the cover of the Reactor Pressure Vessel (RPV) during nuclear power plant maintenance. The tensioning residual values of the 58 studs are monitored for procedure evaluation. It is time-consuming for human operators to place the measurement meters into working positions. In order to reduce labor intensity and eliminate radiation exposure time, we develop a robot-assisted automatic measurement system to achieve meter placement and real-time data monitoring. The Field Programmable Gate Array (FPGA)-based distributed control scheme realizes high-speed data acquisition and coordinated control of the 58 node robots. The control software performs data analysis and sends emergency stop signals to the MSTM control PLC. The proposed system is validated at the China Nuclear Power Technology Research Institute. Total operation time decreases from over 580 s to less than 120 s.

ICRA Conference 2003 Conference Paper

Conceptual Design and Kinematic Analysis of a 3-DOF Robot Wrist

  • Meng Li
  • Tian Huang
  • Zhanxian Li

A novel plug-and-play robot wrist with three rotational degrees of freedom (DOF) is proposed in this paper. The wrist is composed of three independent kinematic chains, each of which rotates with respect to the fixed reference frame. Thus, the output of the wrist is the resultant of the differential motions of the three kinematic chains. The structure of the mechanism is simple and compact with a relatively large orientation capability. Various end-effectors, a CCD camera for instance, can be mounted on the wrist through a standard interface. The working principle and mechanical structure are described and the mathematical models for inverse and forward analyses are developed. The singularity of the wrist is also obtained.