
Author name cluster

Peng Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

50 papers
2 author rows

Possible papers


TAAS Journal 2026 Journal Article

An Imitative Reinforcement Learning Framework for Pursuit-Lock-Launch Missions

  • Siyuan Li
  • Rongchang Zuo
  • Bofei Liu
  • Yaoyu He
  • Peng Liu
  • Yingnan Zhao

Unmanned combat aerial vehicle (UCAV) within-visual-range (WVR) engagement, referring to a fight between two or more UCAVs at close quarters, plays a decisive role on the aerial battlefields. With the development of artificial intelligence, WVR engagement progressively advances toward intelligent and autonomous modes. However, autonomous WVR engagement policy learning is hindered by challenges such as weak exploration capabilities, low learning efficiency, and unrealistic simulated environments. To overcome these challenges, we propose a novel imitative reinforcement learning framework, which efficiently leverages expert data while enabling autonomous exploration. The proposed framework not only enhances learning efficiency through expert imitation but also ensures adaptability to dynamic environments via autonomous exploration with reinforcement learning. Therefore, the proposed framework can learn a successful policy of “pursuit-lock-launch” for UCAVs. To support data-driven learning, we establish an environment based on the Harfang3D sandbox. The extensive experimental results indicate that the proposed framework excels in this multistage task and significantly outperforms state-of-the-art reinforcement learning and imitation learning methods. Thanks to the ability of imitating experts and autonomous exploration, our framework can quickly learn the critical knowledge in complex aerial combat tasks, achieving up to a 100% success rate and demonstrating excellent robustness.

TMLR Journal 2026 Journal Article

High-Layer Attention Pruning with Rescaling

  • Songtao Liu
  • Peng Liu

Pruning is a highly effective approach for compressing large language models (LLMs), significantly reducing inference latency. However, conventional training-free structured pruning methods often employ a heuristic metric that indiscriminately removes some attention heads across all pruning layers, without considering their positions within the network architecture. In this work, we propose a novel pruning algorithm that strategically prunes attention heads in the model's higher layers. Since the removal of attention heads can alter the magnitude of token representations, we introduce an adaptive rescaling parameter that calibrates the representation scale post-pruning to counteract this effect. We conduct comprehensive experiments on a wide range of LLMs, including LLaMA3.1-8B, Mistral-7B-v0.3, Qwen2-7B, and Gemma2-9B. Our evaluation includes both generation and discriminative tasks across 27 datasets. The results consistently demonstrate that our method outperforms existing structured pruning methods. This improvement is particularly notable in generation tasks, where our approach significantly outperforms existing baselines. Code is available at https://github.com/SongtaoLiu0823/HARP.

YNIMG Journal 2026 Journal Article

Hippocampal subfields volumes as biomarkers for early diagnosis of asymptomatic manganese overexposure

  • Jiayu Wu
  • Yuli Gao
  • Xuying Ru
  • Sijia Fan
  • Ming Gao
  • Mengxue Sun
  • Yixin Cao
  • Mingyue Ma

Chronic occupational manganese (Mn) overexposure is associated with motor and cognitive deficits, but its effects on hippocampal subfields remain underexplored. Although neurodegeneration is known to involve the hippocampus, subfield-specific structural abnormalities have received limited attention. This study aimed to assess the diagnostic power of hippocampal subfield volumes in discriminating asymptomatic Mn-exposed welders from healthy controls (HCs). Mn-exposed welders and age-matched HCs were recruited and underwent high-resolution T1-weighted MRI scans. Volumes across 19 hippocampal subfields of each subject were estimated from automated tissue segmentations and surface-based reconstruction using FreeSurfer. The laterality value was defined as: (Right-Left)/(Right+Left)*100. Between-group differences in subfield volumes and laterality were assessed using cross-sectional analysis. Three machine learning classifiers, including logistic regression, K-nearest neighbors and support vector machine (SVM), were applied to differentiate welders from HCs. Compared to HCs, Mn-exposed welders had reduced volumes mainly in the fimbria, subiculum, and presubiculum, while showing higher volumes in the cornu ammonis area 3 (CA3). The welders group demonstrated significant rightward laterality in CA1 and CA4, and leftward laterality in the presubiculum. Among the three classifiers, the SVM classifier achieved the best performance (AUC = 0.96) in distinguishing welders from HCs using subfield volumes. Additionally, the exposure duration was non-linearly associated with left fimbria volume. These results revealed distinct volumetric and asymmetric patterns in hippocampal subfields among Mn-exposed welders, indicating regional vulnerability and potential compensatory responses. Notably, our findings underscored that hippocampal subfield volumes might serve as imaging biomarkers for early diagnosis in individuals with asymptomatic Mn overexposure.
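The laterality value defined in the abstract above can be computed as in the minimal sketch below. The function name and the example volumes are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def laterality_index(right: float, left: float) -> float:
    """Laterality value as defined in the abstract:
    (Right - Left) / (Right + Left) * 100.

    Positive values indicate rightward laterality, negative values leftward.
    """
    total = right + left
    if total == 0:
        raise ValueError("right + left must be non-zero")
    return (right - left) / total * 100.0


# Hypothetical subfield volumes (arbitrary units), for illustration only.
print(laterality_index(right=125.0, left=75.0))  # 25.0
```

A value of 0 means the two hemispheres have equal volume for that subfield.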

AILAW Journal 2026 Journal Article

LEAD: legal efficiency and diversity in fine-tuning data selection through dual-metric optimization and syntactic clustering

  • Peng Liu
  • Qingsheng Li
  • Qingwen Tu
  • Sidong Zhu
  • Sheng Bi

Existing fine-tuning data selection methods for large language models (LLMs) often struggle to balance two aspects: the evaluation of legal fine-tuning data quality and the maintenance of syntactic diversity. The assessment of data quality consists of two dimensions: legal content quality, encompassing logical rigor, legal basis traceability, fact coverage, and actionable guidance, and instructional complexity, which measures how challenging an instruction–response pair is for the model to learn. Syntactic diversity, in contrast, captures the variety of sentence structures and dependency patterns within the dataset, reflecting how sentences differ in grammatical organization, for example through variations between active and passive voice, clause nesting, or modifier placement. To address these issues, this paper proposes LEAD (Legal Efficiency and Diversity), a fine-tuning data selection framework for legal LLMs that integrates dual-objective optimization with syntactic structure clustering. LEAD consists of two modules: the Legal Efficiency Training Score (LETS), which evaluates each sample’s legal quality, and Syntactic Diversity Clustering (SDC), which applies graph-kernel methods to ensure syntactic diversity. Experiments show that using only 5% of the original dataset selected by LEAD, the resulting model outperforms full-data training by 14.9% and achieves an average 12.3% improvement across legal benchmarks, demonstrating strong potential for efficient legal LLM fine-tuning.

AAAI Conference 2026 Conference Paper

Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning

  • Yingnan Zhao
  • Xinmiao Wang
  • Dewei Wang
  • Xinzhe Liu
  • Dan Lu
  • Qilong Han
  • Peng Liu
  • Chenjia Bai

Humanoid robots show promise for learning a diverse set of human-like locomotion behaviors, including standing up, walking, running, and jumping. However, existing methods predominantly require training independent policies for each skill, yielding behavior-specific controllers that exhibit limited generalization and brittle performance when deployed on irregular terrains and in diverse situations. To address this challenge, we propose Adaptive Humanoid Control (AHC) that adopts a two-stage framework to learn an adaptive humanoid locomotion controller across different skills and terrains. Specifically, we first train several primary locomotion policies and perform a multi-behavior distillation process to obtain a basic multi-behavior controller, facilitating adaptive behavior switching based on the environment. Then, we perform reinforced fine-tuning by collecting online feedback in performing adaptive behaviors on more diverse terrains, enhancing terrain adaptability for the adaptive behavior controller. We conduct experiments both in simulation and in the real world on Unitree G1 robots. The results show that our method exhibits strong adaptability across various situations and terrains.

AAAI Conference 2025 Conference Paper

Explainable Neural Networks with Guarantee: A Sparse Estimation Approach

  • Antoine Ledent
  • Peng Liu

Balancing predictive power and interpretability has long been a challenging research area, particularly in powerful yet complex models like neural networks, where nonlinearity obstructs direct interpretation. This paper introduces a novel approach to constructing an explainable neural network that harmonizes predictiveness and explainability. Our model is designed as a linear combination of a sparse set of jointly learned features, each derived from a different trainable function applied to a single 1-dimensional input feature. Leveraging the ability to learn arbitrarily complex relationships, our neural network architecture enables automatic selection of a sparse set of important features, with the final prediction being a sum of rescaled versions of these features. We demonstrate the ability to select significant features while maintaining comparable predictive performance and direct interpretability through extensive experiments on synthetic and real-world datasets. We also provide theoretical analysis on the generalization bounds of our framework, which is favorably linear in the number of selected features and only logarithmic in the number of input features. We further lift any dependence of sample complexity on the number of parameters or the architectural details under very mild conditions. Our research paves the way for further research on sparse and explainable neural networks with guarantees.

YNIMG Journal 2025 Journal Article

Joint resting state and structural networks characterize pediatric bipolar patients compared to healthy controls: a multimodal fusion approach

  • Xiaoping Yi
  • Mingzhao Ma
  • Xueying Wang
  • Jinfan Zhang
  • Feifei Wu
  • Haimiao Huang
  • Qian Xiao
  • An Xie

Pediatric bipolar disorder (PBD) is a highly debilitating condition, characterized by alternating episodes of mania and depression, with intervening periods of remission. Limited information is available about the functional and structural abnormalities in PBD, particularly when comparing type I with type II subtypes. Resting-state brain activity and structural grey matter, assessed through MRI, may provide insight into the neurobiological biomarkers of this disorder. In this study, resting-state Regional Homogeneity (ReHo) and grey matter concentration (GMC) data of 58 PBD patients and 21 healthy controls (HC) matched for age, gender, education and IQ were analyzed in a data fusion unsupervised machine learning approach known as transposed Independent Vector Analysis. Two networks significantly differed between PBD and HC. The first network included fronto-medial regions, such as the medial and superior frontal gyrus and the cingulate, and displayed higher ReHo and GMC values in PBD compared to HC. The second network included temporo-posterior regions, as well as the insula, the caudate and the precuneus, and displayed lower ReHo and GMC values in PBD compared to HC. Additionally, two networks differed between type I and type II in PBD: an occipito-cerebellar network with increased ReHo and GMC in type I compared to type II, and a fronto-parietal network with decreased ReHo and GMC in type I compared to type II. Of note, the first network positively correlated with depression scores. These findings shed new light on the functional and structural abnormalities displayed by pediatric bipolar patients.

AAAI Conference 2025 Conference Paper

Radiology Report Generation via Multi-objective Preference Optimization

  • Ting Xiao
  • Lei Shi
  • Peng Liu
  • Zhe Wang
  • Chenjia Bai

Automatic Radiology Report Generation (RRG) is an important topic for alleviating the substantial workload of radiologists. Existing RRG approaches rely on supervised regression based on different architectures or additional knowledge injection, while the generated report may not align optimally with radiologists’ preferences. In particular, the preferences of radiologists are inherently heterogeneous and multi-dimensional; e.g., some may prioritize report fluency, while others emphasize clinical accuracy. To address this problem, we propose a new RRG method via Multi-objective Preference Optimization (MPO) to align the pre-trained RRG model with multiple human preferences, which can be formulated by multi-dimensional reward functions and optimized by multi-objective reinforcement learning (RL). Specifically, we use a preference vector to represent the weight of preferences and use it as a condition for the RRG model. Then, a linearly weighted reward is obtained via a dot product between the preference vector and the multi-dimensional reward. Next, the RRG model is optimized to align with the preference vector by optimizing such a reward via RL. In the training stage, we randomly sample diverse preference vectors from the preference space and align the model by optimizing the weighted multi-objective rewards, which leads to an optimal policy on the entire preference space. At inference time, our model can generate reports aligned with specific preferences without further fine-tuning. Extensive experiments on two public datasets show the proposed method can generate reports that cater to different preferences in a single model and achieve state-of-the-art performance.
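The reward scalarization described in the abstract above (a dot product between a preference vector and a multi-dimensional reward) can be sketched as follows. This is only an illustration, not the authors' implementation: the function names are hypothetical, the two reward dimensions are placeholders, and sampling preference vectors uniformly from the probability simplex is an assumption, since the abstract does not say how the preference space is sampled.

```python
import random


def sample_preference(rng: random.Random, dim: int) -> list:
    """Draw a preference vector of non-negative weights that sum to 1.

    Assumption: uniform sampling on the simplex, done by normalizing
    independent exponential draws. The abstract does not specify the scheme.
    """
    draws = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(preference: list, rewards: list) -> float:
    """Linearly weighted reward: dot product of preference and reward vectors."""
    if len(preference) != len(rewards):
        raise ValueError("preference and rewards must have the same length")
    return sum(w * r for w, r in zip(preference, rewards))


rng = random.Random(0)
# Placeholder reward dimensions, e.g. (fluency score, clinical accuracy score).
rewards = [0.8, 0.4]
preference = sample_preference(rng, dim=len(rewards))
print(preference, scalarize(preference, rewards))
```

In a training loop, a fresh preference vector would be sampled per batch or per episode, passed to the model as a condition, and the scalarized reward fed to whichever RL algorithm is in use.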

ICLR Conference 2025 Conference Paper

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

  • Peng Liu
  • Dongyang Dai
  • Zhiyong Wu 0001

Recent advancements in generative modeling have significantly enhanced the reconstruction of audio waveforms from various representations. While diffusion models are adept at this task, they are hindered by latency issues due to their operation at the individual sample point level and the need for numerous sampling steps. In this study, we introduce RFWave, a cutting-edge multi-band Rectified Flow approach designed to reconstruct high-fidelity audio waveforms from Mel-spectrograms or discrete acoustic tokens. RFWave uniquely generates complex spectrograms and operates at the frame level, processing all subbands simultaneously to boost efficiency. Leveraging Rectified Flow, which targets a straight transport trajectory, RFWave achieves reconstruction with just 10 sampling steps. Our empirical evaluations show that RFWave not only provides outstanding reconstruction quality but also offers vastly superior computational efficiency, enabling audio generation at speeds up to 160 times faster than real-time on a GPU. Both an online demonstration and the source code are accessible.

AAAI Conference 2025 Conference Paper

Safe Planner: Empowering Safety Awareness in Large Pre-Trained Models for Robot Task Planning

  • Siyuan Li
  • Feifan Liu
  • Lingfei Cui
  • Jiani Lu
  • Qinqin Xiao
  • Xirui Yang
  • Peng Liu
  • Kewu Sun

Robot task planning is an important problem for autonomous robots in long-horizon challenging tasks. As large pre-trained models have demonstrated superior planning ability, recent research investigates utilizing large models to achieve autonomous planning for robots in diverse tasks. However, since the large models are pre-trained with Internet data and lack the knowledge of real task scenes, large models as planners may make unsafe decisions that hurt the robots and the surrounding environments. To solve this challenge, we propose a novel Safe Planner framework, which empowers safety awareness in large pre-trained models to accomplish safe and executable planning. In this framework, we develop a safety prediction module to guide the high-level large model planner, and this safety module trained in a simulator can be effectively transferred to real-world tasks. The proposed Safe Planner framework is evaluated on both simulated environments and real robots. The experiment results demonstrate that Safe Planner not only achieves state-of-the-art task success rates, but also substantially improves safety during task execution.

AAAI Conference 2025 Conference Paper

SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

  • Yongyan Wen
  • Siyuan Li
  • Rongchang Zuo
  • Lei Yuan
  • Hangyu Mao
  • Peng Liu

Deep reinforcement learning (DRL) has achieved remarkable success in various domains, yet its reliance on neural networks results in a lack of transparency, which limits its practical applications in safety-critical and human-agent interaction domains. Decision trees, known for their notable explainability, have emerged as a promising alternative to neural networks. However, decision trees often struggle in long-horizon continuous control tasks with high-dimensional observation space due to their limited expressiveness. To address this challenge, we propose SkillTree, a novel hierarchical framework that reduces the complex continuous action space of challenging control tasks into discrete skill space. By integrating the differentiable decision tree within the high-level policy, SkillTree generates discrete skill embeddings that guide low-level policy execution. Furthermore, through distillation, we obtain a simplified decision tree model that improves performance while further reducing complexity. Experiment results validate SkillTree’s effectiveness across various robotic manipulation tasks, providing clear skill-level insights into the decision-making process. The proposed approach not only achieves performance comparable to neural network based methods in complex long-horizon control tasks but also significantly enhances the transparency and explainability of the decision-making process.

AAAI Conference 2025 Conference Paper

VERO: Verification and Zero-Shot Feedback Acquisition for Few-Shot Multimodal Aspect-Level Sentiment Classification

  • Kai Sun
  • Hao Wu
  • Bin Shi
  • Samuel Mensah
  • Peng Liu
  • Bo Dong

Deep learning approaches for multimodal aspect-level sentiment classification (MALSC) often require extensive data, which is costly and time-consuming to obtain. To mitigate this, current methods typically fine-tune small-scale pretrained models like BERT and BART with few-shot examples. While these models have shown success, Large Vision-Language Models (LVLMs) offer significant advantages due to their greater capacity and ability to understand nuanced language in both zero-shot and few-shot settings. However, there is limited work on fine-tuning LVLMs for MALSC. A major challenge lies in selecting few-shot examples that effectively capture the underlying patterns in data for these LVLMs. To bridge this research gap, we propose an acquisition function designed to select challenging samples for the few-shot learning of LVLMs for MALSC. We compare our approach, Verification and ZERO-shot feedback acquisition (VERO), with diverse acquisition functions for few-shot learning in MALSC. Our experiments show that VERO outperforms prior methods, achieving an F1 score improvement of up to 6.07% on MALSC benchmark datasets.

YNIMG Journal 2025 Journal Article

White matter hyperintensity tissue property spatial variations as a function of cognitive status in Parkinson’s disease

  • Mariyemuguli Reheman
  • Sagar Buch
  • Naying He
  • Pei Huang
  • Qiurong Yu
  • Xinhui Wang
  • Yu Liu
  • Youmin Zhang

BACKGROUND AND PURPOSE: The pathological relationship between white matter hyperintensities (WMH) and cognitive impairment in Parkinson's disease (PD) remains unclear due to their variable locations, heterogeneity, and limited assessment of underlying tissue properties. This study integrates T2-FLAIR and quantitative MRI (qMRI) to investigate burden, spatial distribution, and extent of tissue alterations in WMH, aiming to elucidate their role in cognitive decline among PD patients. METHODS: A total of 122 age- and sex-matched PD patients and 65 healthy controls (HC) were recruited, with PD patients grouped by Montreal Cognitive Assessment (MoCA) score including normal, mild cognitive impairment (MCI) or PD with dementia (PDD). WMH burden was compared across groups and cognitive status. Water content, T1, and T2* measures were derived from qMRI data and tissue property heatmaps and periventricular distance profiles were constructed for all groups to visualize location-dependent tissue alterations of WMH relative to the lateral ventricles. In addition, voxel-wise analysis was performed to examine the correlation between WMH lesion tissue properties and MoCA scores. RESULTS: WMH volume was significantly higher in PDD compared to other groups (p < 0.05) and negatively correlated with MoCA scores (r = -0.352, p < 0.001). WMH appeared predominantly around the lateral ventricles, with anterior horn involvement common to all groups and posterior horn involvement specific to PDD. qMRI measures were significantly elevated in WMH compared to normal appearing white matter (NAWM) (p < 0.001), with heatmaps showing a negative gradient of tissue property changes from the lateral ventricles to the NAWM. Voxel-wise analysis revealed a significant negative correlation between the qMRI tissue properties of periventricular WMH and MoCA scores, with the strongest association observed in the periventricular WM situated just beyond the boundary of the lateral ventricles. 
CONCLUSION: Over and above volume differences, the spatial distribution and tissue property variations of WMH were closely linked to cognitive impairment in PD patients, with distinct patterns across different cognitive stages.

YNIMG Journal 2024 Journal Article

Development and validation of a perivascular space segmentation method in multi-center datasets

  • Peiyu Huang
  • Lingyun Liu
  • Yao Zhang
  • Siyan Zhong
  • Peng Liu
  • Hui Hong
  • Shuyue Wang
  • Linyun Xie

BACKGROUND: Perivascular spaces (PVS) visible on magnetic resonance imaging (MRI) are significant markers associated with various neurological diseases. Although quantitative analysis of PVS may enhance sensitivity and improve consistency across studies, the field lacks a universally validated method for analyzing images from multi-center studies. METHODS: We annotated PVS on multi-center 3D T1-weighted (T1w) images acquired using scanners from three major vendors (Siemens, General Electric, and Philips). A neural network, mcPVS-Net (multi-center PVS segmentation network), was trained using data from 40 subjects and then tested in a separate cohort of 15 subjects. We assessed segmentation accuracy against ground truth masks tailored for each scanner vendor. Additionally, we evaluated the agreement between segmented PVS volumes and visual scores for each scanner. We also explored correlations between PVS volumes and various clinical factors such as age, hypertension, and white matter hyperintensities (WMH) in a larger sample of 1020 subjects. Furthermore, mcPVS-Net was applied to a new dataset comprising both T1w and T2-weighted (T2w) images from a United Imaging scanner to investigate if PVS volumes could discriminate between subjects with differing visual scores. We also compared the mcPVS-Net with a previously published method that segments PVS from T1 images. RESULTS: In the test dataset, mcPVS-Net achieved a mean DICE coefficient of 0.80, with an average Precision of 0.81 and Recall of 0.79, indicating good specificity and sensitivity. The segmented PVS volumes were significantly associated with visual scores in both the basal ganglia (r = 0.541, p < 0.001) and white matter regions (r = 0.706, p < 0.001), and PVS volumes were significantly different among subjects with varying visual scores. Segmentation performance was consistent across different scanner vendors. PVS volumes exhibited significant associations with age, hypertension, and WMH. 
In the United Imaging scanner dataset, PVS volumes showed good associations with PVS visual scores evaluated on either T1w or T2w images. Compared to a previously published method, mcPVS-Net showed a higher accuracy and improved PVS segmentation in the basal ganglia region. CONCLUSION: The mcPVS-Net demonstrated good accuracy for segmenting PVS from 3D T1w images. It may serve as a useful tool for future PVS research.
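The DICE coefficient, Precision, and Recall reported above are standard overlap metrics for binary segmentation masks. The sketch below shows how they are typically computed from flattened masks; it is not the evaluation code used in the paper, the function name is hypothetical, and the handling of empty masks is an assumption.

```python
def overlap_metrics(pred, truth):
    """Return (dice, precision, recall) for two equal-length binary masks.

    Masks are flat sequences of 0/1 (or False/True) voxel labels.
    Assumed conventions: Dice is 1.0 when both masks are empty;
    precision and recall are 0.0 when their denominators are zero.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall


# Toy 4-voxel example: one voxel agrees, one is over-segmented, one is missed.
print(overlap_metrics([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```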

AAAI Conference 2024 Conference Paper

EG-NAS: Neural Architecture Search with Fast Evolutionary Exploration

  • Zicheng Cai
  • Lei Chen
  • Peng Liu
  • Tongtao Ling
  • Yutao Lai

Differentiable Architecture Search (DARTS) has achieved a rapid search for excellent architectures by optimizing architecture parameters through gradient descent. However, this efficiency comes with a significant challenge: the risk of premature convergence to local optima, resulting in subpar performance that falls short of expectations. To address this issue, we propose a novel and effective method called Evolutionary Gradient-Based Neural Architecture Search (EG-NAS). Our approach combines the strengths of both gradient descent and evolutionary strategy, allowing for the exploration of various optimization directions during the architecture search process. To begin with, we continue to employ gradient descent for updating network parameters to ensure efficiency. Subsequently, to mitigate the risk of premature convergence, we introduce an evolutionary strategy with global search capabilities to optimize the architecture parameters. By leveraging the best of both worlds, our method strikes a balance between efficient exploration and exploitation of the search space. Moreover, we have redefined the fitness function to not only consider accuracy but also account for individual similarity. This inclusion enhances the diversity and accuracy of the optimized directions identified by the evolutionary strategy. Extensive experiments on various datasets and search spaces demonstrate that EG-NAS achieves highly competitive performance at significantly low search costs compared to state-of-the-art methods. The code is available at https://github.com/caicaicheng/EG-NAS.

AAAI Conference 2024 Conference Paper

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

  • Yiyang Yao
  • Peng Liu
  • Tiancheng Zhao
  • Qianqian Zhang
  • Jiajia Liao
  • Chunxin Fang
  • Kyusong Lee
  • Qing Wang

Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and accurate benchmark of OVD models' abilities. In this paper, we propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when benchmarking models on these fine-grained label datasets and propose a new metric called Non-Maximum Suppression Average Precision (NMS-AP) to address this issue. Extensive experimental results show that existing top OVD models all fail on the new tasks except for simple object types, demonstrating the value of the proposed dataset in pinpointing the weakness of current OVD models and guiding future research. Furthermore, the proposed NMS-AP metric is verified by experiments to provide a much more truthful evaluation of OVD models, whereas traditional AP metrics yield deceptive results. Data is available at https://github.com/om-ai-lab/OVDEval

ICLR Conference 2024 Conference Paper

Latent Trajectory Learning for Limited Timestamps under Distribution Shift over Time

  • Qiuhao Zeng
  • Changjian Shui
  • Long-Kai Huang
  • Peng Liu
  • Xi Chen 0009
  • Charles X. Ling
  • Boyu Wang 0004

Distribution shifts over time are common in real-world machine-learning applications. This scenario is formulated as Evolving Domain Generalization (EDG), where models aim to generalize well to unseen target domains in a time-varying system by learning and leveraging the underlying evolving pattern of the distribution shifts across domains. However, existing methods encounter challenges due to the limited number of timestamps (every domain corresponds to a timestamp) in EDG datasets, leading to difficulties in capturing evolving dynamics and risking overfitting to the sparse timestamps, which hampers their generalization and adaptability to new tasks. To address this limitation, we propose a novel approach SDE-EDG that collects the Infinitely Fined-Grid Evolving Trajectory (IFGET) of the data distribution with continuous-interpolated samples to bridge temporal gaps (intervals between two successive timestamps). Furthermore, by leveraging the inherent capacity of Stochastic Differential Equations (SDEs) to capture continuous trajectories, we propose their use to align SDE-modeled trajectories with IFGET across domains, thus enabling the capture of evolving distribution trends. We evaluate our approach on several benchmark datasets and demonstrate that it can achieve superior performance compared to existing state-of-the-art methods.

AAAI Conference 2024 Conference Paper

Model-Driven Deep Neural Network for Enhanced AoA Estimation Using 5G gNB

  • Shengheng Liu
  • Xingkang Li
  • Zihuan Mao
  • Peng Liu
  • Yongming Huang

High-accuracy positioning has become a fundamental enabler for intelligent connected devices. Nevertheless, the present wireless networks still rely on model-driven approaches to achieve positioning functionality, which are susceptible to performance degradation in practical scenarios, primarily due to hardware impairments. Integrating artificial intelligence into the positioning framework presents a promising solution to revolutionize the accuracy and robustness of location-based services. In this study, we address this challenge by reformulating the problem of angle-of-arrival (AoA) estimation into image reconstruction of spatial spectrum. To this end, we design a model-driven deep neural network (MoD-DNN), which can automatically calibrate the angular-dependent phase error. The proposed MoD-DNN approach employs an iterative optimization scheme between a convolutional neural network and a sparse conjugate gradient algorithm. Simulation and experimental results are presented to demonstrate the effectiveness of the proposed method in enhancing spectrum calibration and AoA estimation.

YNIMG Journal 2024 Journal Article

PENCIL imaging: A novel approach for neuromelanin sensitive MRI in Parkinson's disease

  • Peng Liu
  • Xinhui Wang
  • Youmin Zhang
  • Pei Huang
  • Zhijia Jin
  • Zenghui Cheng
  • Yongsheng Chen
  • Qiuyun Xu

BACKGROUND: Parkinson's disease (PD) is associated with the loss of neuromelanin (NM) and increased iron in the substantia nigra (SN). Magnetization transfer contrast (MTC) is widely used for NM visualization but has limitations in brain coverage and scan time. This study aimed to develop a new approach called Proton-density Enhanced Neuromelanin Contrast in Low flip angle gradient echo (PENCIL) imaging to visualize NM in the SN. METHODS: This study included 30 PD subjects and 50 healthy controls (HCs) scanned at 3T. PENCIL and MTC images were acquired. NM volume in the SN pars compacta (SNpc), normalized image contrast (Cnorm), and contrast-to-noise ratio (CNR) were calculated. The change of NM volume in the SNpc with age was analyzed using the HC data. A group analysis compared differences between PD subjects and HCs. Receiver operating characteristic (ROC) analysis and area under the curve (AUC) calculations were used to evaluate the diagnostic performance of NM volume and CNR in the SNpc. RESULTS: PENCIL provided similar visualization and structural information of NM compared to MTC. In HCs, PENCIL showed higher NM volume in the SNpc than MTC, but this difference was not observed in PD subjects. PENCIL had higher CNR, while MTC had higher Cnorm. Both methods revealed a similar pattern of NM volume in SNpc changes with age. There were no significant differences in AUCs between NM volume in SNpc measured by PENCIL and MTC. Both methods exhibited comparable diagnostic performance in this regard. CONCLUSIONS: PENCIL imaging provided improved CNR compared to MTC and showed similar diagnostic performance for differentiating PD subjects from HCs. The major advantage is PENCIL has rapid whole-brain coverage and, when using STAGE imaging, offers a one-stop quantitative assessment of tissue properties.

NeurIPS Conference 2024 Conference Paper

PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models

  • Lemei Zhang
  • Peng Liu
  • Marcus T. Henriksboe
  • Even W. Lauvrak
  • Jon A. Gulla
  • Heri Ramampiaro

With the rapid advancement of Natural Language Processing in recent years, numerous studies have shown that generic summaries generated by Large Language Models (LLMs) can sometimes surpass those annotated by experts, such as journalists, according to human evaluations. However, there is limited research on whether these generic summaries meet the individual needs of ordinary people. The biggest obstacle is the lack of human-annotated datasets from the general public. Existing work on personalized summarization often relies on pseudo datasets created from generic summarization datasets or controllable tasks that focus on specific named entities or other aspects, such as the length and specificity of generated summaries, collected from hypothetical tasks without the annotators' initiative. To bridge this gap, we propose a high-quality, personalized, manually annotated summarization dataset called PersonalSum. This dataset is the first to investigate whether the focus of public readers differs from the generic summaries generated by LLMs. It includes user profiles, personalized summaries accompanied by source sentences from given articles, and machine-generated generic summaries along with their sources. We investigate several personal signals (entities/topics, plot, and structure of articles) that may affect the generation of personalized summaries using LLMs in a few-shot in-context learning scenario. Our preliminary results and analysis indicate that entities/topics are merely one of the key factors that impact the diverse preferences of users, and personalized summarization remains a significant challenge for existing LLMs.

ICML Conference 2024 Conference Paper

Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

  • Songtao Liu
  • Hanjun Dai
  • Yue Zhao 0016
  • Peng Liu

Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-bottom manner. Despite their effective performance, these strategies face limitations in the molecule synthetic route generation due to a greedy selection of the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes based on possible criteria such as material costs, yields, and step count. In this work, we propose a general and principled framework via conditional residual energy-based models (EBMs) that focuses on the quality of the entire synthetic route based on the specific criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes (with higher probabilities) generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.

AAAI Conference 2024 Conference Paper

Robust Visual Imitation Learning with Inverse Dynamics Representations

  • Siyuan Li
  • Xun Wang
  • Rongchang Zuo
  • Kewu Sun
  • Lingfei Cui
  • Jishiyu Ding
  • Peng Liu
  • Zhe Ma

Imitation learning (IL) has achieved considerable success in solving complex sequential decision-making problems. However, current IL methods mainly assume that the environment for learning policies is the same as the environment for collecting expert datasets. Therefore, these methods may fail to work when there are slight differences between the learning and expert environments, especially for challenging problems with high-dimensional image observations. However, in real-world scenarios, it is rare to have the chance to collect expert trajectories precisely in the target learning environment. To address this challenge, we propose a novel robust imitation learning approach, where we develop an inverse dynamics state representation learning objective to align the expert environment and the learning environment. With the abstract state representation, we design an effective reward function, which thoroughly measures the similarity between behavior data and expert data not only element-wise, but also from the trajectory level. We conduct extensive experiments to evaluate the proposed approach under various visual perturbations and in diverse visual control tasks. Our approach can achieve a near-expert performance in most environments, and significantly outperforms the state-of-the-art visual IL methods and robust IL methods.

IROS Conference 2024 Conference Paper

Towards Electricity-free Pneumatic Miniature Rotation Actuator for Optical Coherence Tomography Endoscopy

  • Tinghua Zhang
  • Sishen Yuan
  • Chao Xu 0008
  • Peng Liu
  • Hongliang Ren 0001
  • Wu Yuan 0001

Miniature rotation actuators have been extensively developed and utilized in optical coherence tomography (OCT) endoscopy, enabling distortion-free OCT imaging in complex and tortuous environments. However, the use of electrical-driven rotation actuators raises safety concerns. Although magnetic-driven rotation actuators have been reported in OCT endoscopy, their use can potentially interfere with other medical devices in clinical settings. Here, we propose a pneumatic miniature rotation actuator that eliminates the electricity and magnetism concerns in circumferential imaging for OCT endoscopy. The rotor of the actuator is designed as a windmill, enabling it to convert air energy into rotation energy. In addition, to maintain the stable rotation, both a sliding bearing with two supporting points and a glass spindle with a half-ball end surface are developed. The rotation speed of our pneumatic actuator can be controlled from 66 to 97 revolutions per second by adjusting the airflow rate from 3.25 to 4.00 liters per minute. By OCT imaging of the human fingers, we demonstrate the feasibility of the pneumatic actuator in electricity-free distal scanning OCT endoscopy. Our pneumatic rotation actuator has wide-ranging potential in various fiber-imaging modalities, including not only OCT but also ultrasound imaging that requires similar rotation capabilities.

YNIMG Journal 2024 Journal Article

Two fundamentally different mechanisms by which unconscious information impairs behavioral performance: Evidence from fMRI and computational modeling

  • Yongchun Wang
  • Meilin Di
  • Ya Li
  • Peng Liu
  • Jingjing Zhao
  • Yonghui Wang

It is increasingly clear that unconscious information impairs the performance of the corresponding action when the instruction to act is delayed. However, whether this impairment occurs at the response level or at the perceptual level remains controversial. This study used fMRI and a computational model with a pre-post design to address this elusive issue. The fMRI results showed that when the unconscious information containing strong stimulus-response associations was irrelevant to subsequent stimuli, the precuneus in the parietal lobe, which is thought to be involved in sensorimotor processing, was activated. In contrast, when the unconscious information was relevant to subsequent stimuli, regardless of the strength of the stimulus-response associations, some regions in the occipital and temporal cortices, which are thought to be involved in visual perceptual processing, were activated. In addition, the percent signal change in the regions of interest associated with motor inhibition was modulated by compatibility in the irrelevant but not in the relevant stimuli conditions. Modeling of behavioral data further supported that the irrelevant and relevant stimuli conditions involved fundamentally different mechanisms. Our finding reconciles the debate about the mechanism by which unconscious information impairs action performance and has important implications for understanding of unconscious cognition.

IJCAI Conference 2024 Conference Paper

Zero-shot High-fidelity and Pose-controllable Character Animation

  • Bingwen Zhu
  • Fanyi Wang
  • Tianyi Lu
  • Peng Liu
  • Jingwen Su
  • Jinxiu Liu
  • Yanhao Zhang
  • Zuxuan Wu

Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address these limitations, we propose PoseAnimate, a novel zero-shot I2V framework for character animation. PoseAnimate contains three key components: 1) a Pose-Aware Control Module (PACM) that incorporates diverse pose signals into text embeddings, to preserve character-independent content and maintain precise alignment of actions. 2) a Dual Consistency Attention Module (DCAM) that enhances temporal consistency and retains character identity and intricate background details. 3) a Mask-Guided Decoupling Module (MGDM) that refines distinct feature perception abilities, improving animation fidelity by decoupling the character and background. We also propose a Pose Alignment Transition Algorithm (PATA) to ensure smooth action transition. Extensive experiment results demonstrate that our approach outperforms the state-of-the-art training-based methods in terms of character consistency and detail fidelity. Moreover, it maintains a high level of temporal coherence throughout the generated animations.

IJCAI Conference 2023 Conference Paper

Bayesian Optimization with Switching Cost: Regret Analysis and Lookahead Variants

  • Peng Liu
  • Haowei Wang
  • Wei Qiyu

Bayesian Optimization (BO) has recently received increasing attention due to its efficiency in optimizing expensive-to-evaluate functions. For some practical problems, it is essential to consider the path-dependent switching cost between consecutive sampling locations given a total traveling budget. For example, when using a drone to locate cracks in a building wall or search for lost survivors in the wild, the search path needs to be efficiently planned given the limited battery power of the drone. Tackling such problems requires a careful cost-benefit analysis of candidate locations and balancing exploration and exploitation. In this work, we formulate such a problem as a constrained Markov Decision Process (MDP) and solve it by proposing a new distance-adjusted multi-step look-ahead acquisition function, the distUCB, and using rollout approximation. We also provide a theoretical regret analysis of the distUCB-based Bayesian optimization algorithm. In addition, the empirical performance of the proposed algorithm is tested based on both synthetic and real data experiments, and it shows that our cost-aware non-myopic algorithm performs better than other popular alternatives.

JBHI Journal 2023 Journal Article

Boundary-Sensitive Loss Function With Location Constraint for Hard Region Segmentation

  • Jie Du
  • Kai Guan
  • Peng Liu
  • Yuanman Li
  • Tianfu Wang

In computer-aided diagnosis and treatment planning, accurate segmentation of medical images plays an essential role, especially for some hard regions including boundaries, small objects and background interference. However, existing segmentation loss functions including distribution-, region- and boundary-based losses cannot achieve satisfactory performances on these hard regions. In this paper, a boundary-sensitive loss function with location constraint is proposed for hard region segmentation in medical images, which provides three advantages: i) our Boundary-Sensitive loss (BS-loss) can automatically pay more attention to the hard-to-segment boundaries (e.g., thin structures and blurred boundaries), thus obtaining finer object boundaries; ii) BS-loss also can adjust its attention to small objects during training to segment them more accurately; and iii) our location constraint can alleviate the negative impact of the background interference, through the distribution matching of pixels between prediction and Ground Truth (GT) along each axis. By resorting to the proposed BS-loss and location constraint, the hard regions in both foreground and background are considered. Experimental results on three public datasets demonstrate the superiority of our method. Specifically, compared to the second-best method tested in this study, our method improves performance on hard regions in terms of Dice similarity coefficient (DSC) and 95% Hausdorff distance (95%HD) of up to 4.17% and 73% respectively. In addition, it also achieves the best overall segmentation performance. Hence, we can conclude that our method can accurately segment these hard regions and improve the overall segmentation performance in medical images.

JAAMAS Journal 2023 Journal Article

Classifying ambiguous identities in hidden-role Stochastic games with multi-agent reinforcement learning

  • Shijie Han
  • Siyuan Li
  • Peng Liu

Multi-agent reinforcement learning (MARL) is a prevalent learning paradigm for solving stochastic games. In most MARL studies, agents in a game are defined as teammates or enemies beforehand, and the relationships among the agents (i.e., their identities) remain fixed throughout the game. However, in real-world problems, the agent relationships are commonly unknown in advance or dynamically changing. Many multi-party interactions start off by asking: who is on my team? This question arises whether it is the first day at the stock exchange or the kindergarten. Therefore, training policies for such situations in the face of imperfect information and ambiguous identities is an important problem that needs to be addressed. In this work, we develop a novel identity detection reinforcement learning (IDRL) framework that allows an agent to dynamically infer the identities of nearby agents and select an appropriate policy to accomplish the task. In the IDRL framework, a relation network is constructed to deduce the identities of other agents by observing the behaviors of the agents. A danger network is optimized to estimate the risk of false-positive identifications. Beyond that, we propose an intrinsic reward that balances the need to maximize external rewards and accurate identification. After identifying the cooperation-competition pattern among the agents, IDRL applies one of the off-the-shelf MARL methods to learn the policy. To evaluate the proposed method, we conduct experiments on the Red-10 card-shedding game, and the results show that IDRL achieves superior performance over other state-of-the-art MARL methods. Impressively, the relation network performs on par with top human players in identifying the identities of agents; the danger network reasonably avoids the risk of imperfect identification. The code to reproduce all the reported results is available online at https://github.com/MR-BENjie/IDRL.

JBHI Journal 2023 Journal Article

Coarse-Refined Consistency Learning Using Pixel-Level Features for Semi-Supervised Medical Image Segmentation

  • Jie Du
  • Xiaoci Zhang
  • Peng Liu
  • Tianfu Wang

Pixel-level annotations are extremely expensive for medical image segmentation tasks as both expertise and time are needed to generate accurate annotations. Semi-supervised learning (SSL) for medical image segmentation has recently attracted growing attention because it can alleviate the exhausting manual annotations for clinicians by leveraging unlabeled data. However, most of the existing SSL methods do not take pixel-level information (e.g., pixel-level features) of labeled data into account, i.e., the labeled data are underutilized. Hence, in this work, an innovative Coarse-Refined Network with pixel-wise Intra-patch ranked loss and patch-wise Inter-patch ranked loss (CRII-Net) is proposed. It provides three advantages: i) it can produce stable targets for unlabeled data, as a simple yet effective coarse-refined consistency constraint is designed; ii) it is very effective for the extreme case where very scarce labeled data are available, as the pixel-level and patch-level features are extracted by our CRII-Net; and iii) it can output fine-grained segmentation results for hard regions (e.g., blurred object boundaries and low-contrast lesions), as the proposed Intra-Patch Ranked Loss (Intra-PRL) focuses on object boundaries and Inter-Patch Ranked loss (Inter-PRL) mitigates the adverse impact of low-contrast lesions. Experimental results on two common SSL tasks for medical image segmentation demonstrate the superiority of our CRII-Net. Specifically, when there are only 4% labeled data, our CRII-Net improves the Dice similarity coefficient (DSC) score by at least 7.49% when compared to five classical or state-of-the-art (SOTA) SSL methods. For hard samples/regions, our CRII-Net also significantly outperforms other compared methods in both quantitative and visualization results.

JBHI Journal 2023 Journal Article

Efficient Large-Scale Virtual Screening Based on Heterogeneous Many-Core Supercomputing System

  • Hao Liu
  • Cunji Wang
  • Peng Liu
  • Chengchao Liu
  • Zhuoya Wang
  • Zhiqiang Wei

With the rapid growth of virtual drug databases, the need for efficient molecular docking tools for large-scale screening is also growing. We have developed Vina@QNLM 2.0, a novel molecular docking system that leverages the logical processing units and computational processing arrays of heterogeneous multicore architecture processors. Compared to Vina@QNLM, the new version optimizes the docking speed without sacrificing accuracy. This greatly improves the scoring capability for large molecules (molecular weight > 500). Simultaneously, the new system provides enhanced support for applications such as reverse target finding through an improved parallel strategy. Vina@QNLM 2.0 achieves a speedup 20 times higher than that obtained using logical processing units only during a single docking process. Additionally, we successfully scaled the reverse target finding task to 122,401 kernel groups with a robust scalability of 80.01%. In practice, we completed reverse target finding for nine glycan molecules against 10,094 proteins within 1 hour.

YNICL Journal 2023 Journal Article

Locus coeruleus and substantia nigra neuromelanin magnetic resonance imaging differentiates Parkinson’s disease and essential tremor

  • Xinhui Wang
  • Pei Huang
  • Ewart Mark Haacke
  • Yu Liu
  • Youmin Zhang
  • Zhijia Jin
  • Yan Li
  • Qiuyun Xu

BACKGROUND: Differential diagnosis of essential tremor (ET) and Parkinson's disease (PD) can still be a challenge in clinical practice. These two tremor disorders may have different pathogenesis related to the substantia nigra (SN) and locus coeruleus (LC). Characterizing neuromelanin (NM) in these structures may help improve the differential diagnosis. METHODS: from ET was assessed with a receiver operative characteristic curve, and the area under the curve (AUC) was calculated. RESULTS: from ET. CONCLUSION: and ET, and the investigation of the underlying pathophysiology.

IJCAI Conference 2023 Conference Paper

Matting Moments: A Unified Data-Driven Matting Engine for Mobile AIGC in Photo Gallery

  • Yanhao Zhang
  • Fanyi Wang
  • Weixuan Sun
  • Jingwen Su
  • Peng Liu
  • Yaqian Li
  • Xinjie Feng
  • Zhengxia Zou

Image matting is a fundamental technique in visual understanding and has become one of the most significant capabilities in mobile phones. Despite the development of mobile storage and computing power, achieving diverse mobile Artificial Intelligence Generated Content (AIGC) applications remains a great challenge. To address this issue, we present an innovative demonstration of an automatic system called "Matting Moments" that enables automatic image editing based on matting models in different scenarios. Coupled with accurate and refined matting subjects, our system provides visual element editing abilities and backend services for distribution and recommendation that respond to emotional expressions. Our system comprises three components: 1) photo content structuring, 2) data-driven matting engine, and 3) AIGC functions for generation, which automatically achieve diverse photo beautification in the gallery. This system offers a unified framework that guides consumers to obtain intelligent recommendations with beautifully generated contents, helping them enjoy the moments and memories of their present life.

YNIMG Journal 2023 Journal Article

Right, but not left, posterior superior temporal gyrus is causally involved in vocal feedback control

  • Dongxu Liu
  • Yichen Chang
  • Guangyan Dai
  • Zhiqiang Guo
  • Jeffery A. Jones
  • Tingni Li
  • Xi Chen
  • Mingyun Chen

The posterior superior temporal gyrus (pSTG) has been implicated in the integration of auditory feedback and motor system for controlling vocal production. However, the question as to whether and how the pSTG is causally involved in vocal feedback control is currently unclear. To this end, the present study selectively stimulated the left or right pSTG with continuous theta burst stimulation (c-TBS) in healthy participants, then used event-related potentials to investigate neurobehavioral changes in response to altered auditory feedback during vocal pitch regulation. The results showed that, compared to control (vertex) stimulation, c-TBS over the right pSTG led to smaller vocal compensations for pitch perturbations accompanied by smaller cortical N1 and larger P2 responses. Enhanced P2 responses received contributions from the right-lateralized temporal and parietal regions as well as the insula, and were significantly correlated with suppressed vocal compensations. Surprisingly, these effects were not found when comparing c-TBS over the left pSTG with control stimulation. Our findings provide evidence, for the first time, that supports a causal relationship between right, but not left, pSTG and auditory-motor integration for vocal pitch regulation. This lends support to a right-lateralized contribution of the pSTG in not only the bottom-up detection of vocal feedback errors but also the involvement of driving motor commands for error correction in a top-down manner.

YNIMG Journal 2022 Journal Article

A causal link between left supplementary motor area and auditory-motor control of vocal production: Evidence by continuous theta burst stimulation

  • Guangyan Dai
  • Mingyun Chen
  • Xi Chen
  • Zhiqiang Guo
  • Tingni Li
  • Jeffery A. Jones
  • Xiuqin Wu
  • Jingting Li

The supplementary motor area (SMA) has been implicated in the feedforward control of speech production. Whether this region is involved in speech motor control through auditory feedback, however, remains uncertain. The present event-related potential (ERP) study examined the role of the left SMA in vocal pitch regulation in a causal manner by combining auditory feedback manipulations and neuronavigated continuous theta burst stimulation (c-TBS). After receiving c-TBS over the left SMA or the control site (vertex), twenty young adults vocalized the vowel sound /u/ while hearing their voice unexpectedly pitch-shifted -50 or -200 cents. Compared to the control stimulation, c-TBS over the left SMA led to decreased vocal compensations for pitch perturbations of -50 and -200 cents. A significant decrease of N1 and P2 responses to -200 cents perturbations was also found when comparing active and control stimulation. Major neural generators of decreased P2 responses included the right-lateralized superior and middle temporal gyrus and angular gyrus. Notably, a significant correlation was found between active-control differences in the vocal compensation and P2 responses for the -200 cents perturbations. These findings provide neurobehavioral evidence for a causal link between the left SMA and auditory-motor integration for vocal pitch regulation, suggesting that the left SMA receives auditory feedback information and mediates vocal compensations for feedback errors in a bottom-up manner.

JBHI Journal 2022 Journal Article

Handling Imbalanced Data: Uncertainty-Guided Virtual Adversarial Training With Batch Nuclear-Norm Optimization for Semi-Supervised Medical Image Classification

  • Peng Liu
  • Guoyan Zheng

In many clinical settings, a lot of medical image datasets suffer from imbalance problems, which biases the predictions of trained models toward majority classes. Semi-supervised Learning (SSL) algorithms trained with such imbalanced datasets become more problematic since the pseudo-supervision of unlabeled data is generated from the model’s biased predictions. To address these issues, in this work, we propose a novel semi-supervised deep learning method, i.e., uncertainty-guided virtual adversarial training (VAT) with batch nuclear-norm (BNN) optimization, for large-scale medical image classification. To effectively exploit useful information from both labeled and unlabeled data, we leverage VAT and BNN optimization to harness the underlying knowledge, which helps to improve discriminability, diversity and generalization of the trained models. More concretely, our network is trained by minimizing a combination of four types of losses, including a supervised cross-entropy loss, a BNN loss defined on the output matrix of labeled data batch (lBNN loss), a negative BNN loss defined on the output matrix of unlabeled data batch (uBNN loss), and a VAT loss on both labeled and unlabeled data. We additionally propose to use uncertainty estimation to filter out unlabeled samples near the decision boundary when computing the VAT loss. We conduct comprehensive experiments to evaluate the performance of our method on two publicly available datasets and one in-house collected dataset. The experimental results demonstrated that our method achieved better results than state-of-the-art SSL methods.

AAAI Conference 2022 Conference Paper

Sample Average Approximation for Stochastic Optimization with Dependent Data: Performance Guarantees and Tractability

  • Yafei Wang
  • Bo Pan
  • Wei Tu
  • Peng Liu
  • Bei Jiang
  • Chao Gao
  • Wei Lu
  • Shangling Jui

Sample average approximation (SAA), a popular method for tractably solving stochastic optimization problems, enjoys strong asymptotic performance guarantees in settings with independent training samples. However, these guarantees are not known to hold generally with dependent samples, such as in online learning with time series data or distributed computing with Markovian training samples. In this paper, we show that SAA remains tractable when the distribution of unknown parameters is only observable through dependent instances and still enjoys asymptotic consistency and finite sample guarantees. Specifically, we provide a rigorous probability error analysis to derive 1 - beta confidence bounds for the out-of-sample performance of SAA estimators and show that these estimators are asymptotically consistent. We then, using monotone operator theory, study the performance of a class of stochastic first-order algorithms trained on a dependent source of data. We show that approximation error for these algorithms is bounded and concentrates around zero, and establish deviation bounds for iterates when the underlying stochastic process is phi-mixing. The algorithms presented can be used to handle numerically inconvenient loss functions such as the sum of a smooth and non-smooth function or of non-smooth functions with constraints. To illustrate the usefulness of our results, we present several stochastic versions of popular algorithms such as stochastic proximal gradient descent (S-PGD) and stochastic relaxed Peaceman–Rachford splitting algorithms (S-rPRS), together with numerical experiments.

ICRA Conference 2022 Conference Paper

The Feedback Trajectory Control of a SMA-Driven Miniature Jumping Robot

  • Lingqi Tang
  • Xuelin Wu
  • Peng Liu
  • Yao Li 0014
  • Bing Li 0015

Jumping motion is an effective way to overcome large obstacles, especially for miniature robots. However, controlling the jumping trajectory of a centimeter-scale robot is not easy due to the limitations of size and payload. None of the jumping robots lighter than 90 g has achieved independent feedback control of jumping height and take-off angle. In this work, we propose a miniature 6 g jumping robot with feedback control of its jumping trajectory. Two simple PD controllers are used for take-off angle and jumping height control, respectively. The robot can control its jumping height from 0 to 73 cm and its take-off angle from −20° to +20° with respect to the vertical direction. The control errors of the jumping height and the take-off angle were less than 5 cm and 2°, respectively. The robot can hop precisely onto different obstacles, which greatly increases the controllability of the micro jumping robot.

ICRA Conference 2021 Conference Paper

A Multi-Level Network for Human Pose Estimation

  • Zhanpeng Shao
  • Peng Liu
  • You-Fu Li 0001
  • Jianyu Yang 0002
  • Xiaolong Zhou 0001

Although multi-person human pose estimation has made great progress in recent years, challenges such as various scales of persons, occluded keypoints, and crowded backgrounds in complex scenes remain to be solved. In this paper, we propose a novel multi-level pose estimation network (MLPE) to learn multi-level features that can preserve both the strong semantic clues and spatial resolution for keypoint prediction and location. More specifically, a multi-level prediction network with a feature enhancement strategy is first proposed to learn multi-level features to achieve a good trade-off between the global context information and spatial resolution. We then build a high-resolution fine network to restore high spatial resolution information based on transposed convolutions to accurately locate the keypoints. We have conducted extensive experiments on the challenging MS COCO dataset, which demonstrate the effectiveness of our proposed method. Code and the experimental results are publicly available online for further research.

NeurIPS Conference 2021 Conference Paper

Dynamic Bottleneck for Robust Self-Supervised Exploration

  • Chenjia Bai
  • Lingxiao Wang
  • Lei Han
  • Animesh Garg
  • Jianye Hao
  • Peng Liu
  • Zhaoran Wang

Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards. However, such methods are usually sensitive to environmental dynamics-irrelevant information, e.g., white noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle. Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) for the linear case, and the visiting count for the tabular case. We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.

ICRA Conference 2021 Conference Paper

Muscular stimulation based biological actuator from locust's hindleg

  • Songsong Ma
  • Peng Liu
  • Shen Liu
  • Yao Li 0014
  • Bing Li 0015

The development and control of biological actuators have been an active research field. Biological actuators exhibit high mobility with compact dimensions, which is critical for the design of microrobots. The powerful kicking motion of the locust is important for its quick jumping. Herein, we examined the kicking process of the locust’s hindleg and controlled the flexion and extension motions via exogenous stimulation. Rather than simply extending the leg, the locust uses co-contraction to store energy and increase jumping power. Thus, we imitated the co-contraction process and transformed the locust’s hindleg into a biological jumping actuator. By coordinating the kicking of the bilateral hindlegs, we achieved jumping control of a locust.

TAAS Journal 2012 Journal Article

A formalized, taxonomy-driven approach to cross-layer application adaptation

  • Razvan Popescu
  • Athanasios Staikopoulos
  • Antonio Brogi
  • Peng Liu
  • Siobhán Clarke

Advances in pervasive technology have made it possible to consider large-scale application types that potentially span heterogeneous organizations, technologies, and device types. This class of application will have a multilayer architecture, where each layer is likely to use languages and technologies appropriate to its own concerns. An example application is a geographically large-scale crisis management system. Typically, such applications are required to dynamically adapt their behavior based on current circumstances, with adaptations potentially affecting all layers of the application. Given the complexities involved, the dynamic adaptation of multilayer applications will benefit significantly from formal approaches to its specification. This article presents a new methodology for flexible, multilayer application adaptation, with layer-specific adaptation solution templates bound to application mismatches that are organized into hierarchical taxonomies. Templates can be linked either through direct invocations or through adaptation events, supporting flexible cross-layer adaptation. The methodology illustrates the use of different formalisms for different elements of its specification. In particular, we combine semiformal metamodeling techniques for the system model specification with formal Petri nets, which are used to capture template matchmaking using reachability analysis. This work demonstrates how existing formalisms can be used for the specification of a generic adaptation model for pervasive applications.

IROS Conference 2012 Conference Paper

The AmphiHex: A novel amphibious robot with transformable leg-flipper composite propulsion mechanism

  • Xu Liang
  • Min Xu 0006
  • Lichao Xu
  • Peng Liu
  • Xiaoshuang Ren
  • Ziwen Kong
  • Jie Yang 0004
  • Shiwu Zhang

The amphibious robot is both attractive and challenging because of its broad applications and its complex working environment. It should walk on rough ground, maneuver underwater, and pass through transitional terrain such as sand and mud. To tackle such a complex task, a novel amphibious robot (AmphiHex-I) with transformable leg-flipper composite propulsion is proposed and developed. This paper presents the detailed structural design of the transformable leg-flipper propulsion mechanism and its drive module, which enable the amphibious robot to pass through terrain, water, and the transitional zone between them. A preliminary theoretical analysis is conducted to study the interaction between the elliptic leg and a transitional environment such as a granular medium. An orthogonal experiment is designed to study leg locomotion in sandy and muddy terrain with different water content. Finally, basic propulsion experiments with AmphiHex-I are carried out, which verify that the transformable leg-flipper mechanism achieves locomotion capability both on land and underwater.