
Author name cluster

Bin Liang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

TMLR Journal 2026 Journal Article

Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning

  • Ziyu Cheng
  • Jinsheng Ren
  • Jun Yang
  • Zhouxian Jiang
  • Chenzhihang Li
  • Rongye Shi
  • Bin Liang

Effective communication is pivotal for addressing complex collaborative tasks in multi-agent reinforcement learning (MARL). Yet, limited communication bandwidth and dynamic, intricate environmental topologies present significant challenges in identifying high-value communication partners. Agents must consequently select collaborators under uncertainty, lacking a priori knowledge of which partners can deliver task-critical information. To this end, we propose Interference-Aware $K$-Step Reachable Communication (IA-KRC), a novel framework that enhances cooperation via two core components: (1) a $K$-Step reachability protocol that confines message passing to physically accessible neighbors, and (2) an interference-prediction module that optimizes partner choice by minimizing interference while maximizing utility. Compared to existing methods, IA-KRC enables substantially more persistent and efficient cooperation despite environmental interference. Comprehensive evaluations confirm that IA-KRC achieves superior performance compared to state-of-the-art baselines, while demonstrating enhanced robustness and scalability in complex topological and highly dynamic multi-agent scenarios.
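The K-step reachability constraint can be illustrated with a minimal Python sketch. It assumes a 4-connected grid world in which `#` marks an obstacle, and it treats another agent as a candidate partner only if a collision-free path of at most K moves reaches that agent's cell. The grid model and the helper names `k_step_reachable_cells` and `k_step_partners` are illustrative assumptions, not the paper's protocol, and the interference-prediction module is not modeled.

```python
from collections import deque


def k_step_reachable_cells(grid, start, k):
    """Breadth-first search returning {cell: moves} for free cells within k moves of start.

    grid: list of equal-length strings; '#' is an obstacle, anything else is free.
    Moves are 4-connected (an assumption of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if dist[(r, c)] == k:
            continue  # do not expand beyond the K-step horizon
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != '#' and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def k_step_partners(grid, positions, agent, k):
    """Hypothetical partner filter: other agents whose cells are K-step reachable."""
    reachable = k_step_reachable_cells(grid, positions[agent], k)
    return sorted(a for a, pos in positions.items() if a != agent and pos in reachable)
```

For example, on the grid `["....", ".##.", "...."]` with agents `a` at (0, 0), `b` at (0, 3), `c` at (2, 0) and `d` at (2, 3), `k_step_partners(grid, positions, "a", 3)` keeps `b` and `c` but drops `d`: it is only about 3.6 cells away in straight-line distance, yet 5 moves away around the wall. This is the difference between physical accessibility and plain proximity.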

AAAI Conference 2026 Conference Paper

MemGuide: Intent-Driven Memory Selection for Goal-Oriented Multi-Session LLM Agents

  • Yiming Du
  • Bingbing Wang
  • Yang He
  • Bin Liang
  • Baojun Wang
  • Zhongyang Li
  • Lin Gui
  • Jeff Z. Pan

Modern task-oriented dialogue (TOD) systems increasingly rely on large language model (LLM) agents, leveraging Retrieval-Augmented Generation (RAG) and long-context capabilities for long-term memory utilization. However, these methods prioritise semantic similarity over task intent, degrading multi-session coherence. We propose MemGuide, a two-stage intent-driven memory selection framework: (1) Intent‑Aligned Retrieval retrieves goal-consistent QA‑formatted memory units; (2) Missing‑Slot Guided Filtering reranks units by slot-completion gain via a chain‑of‑thought reasoner and a fine‑tuned LLaMA‑8B filter. We also introduce MS-TOD, the first multi-session TOD benchmark, with 132 diverse personas, 956 task goals, and annotated intent-aligned memory targets. Evaluations on MS-TOD show that MemGuide raises the task success rate by 11 percentage points (88% → 99%), reduces dialogue length by 2.84 turns, and matches single‑session performance.
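The notion of reranking by slot-completion gain can be sketched in a few lines of Python. This is a deliberately simplified stand-in: the paper scores gain with a chain-of-thought reasoner and a fine-tuned LLaMA-8B filter, whereas this sketch assumes each memory unit is annotated with the set of slots it provides and counts how many still-missing slots it would fill. The function name `rerank_by_slot_gain` and the data layout are hypothetical.

```python
def rerank_by_slot_gain(units, missing_slots):
    """Order retrieved memory units by how many still-missing slots each would fill.

    units: list of (unit_id, set_of_slots_the_unit_provides) in retrieval order.
    missing_slots: set of slot names the current task goal still needs.
    """
    def gain(unit):
        # Slot-completion gain approximated as the overlap with missing slots.
        return len(unit[1] & missing_slots)

    # sorted() is stable even with reverse=True, so units with equal gain
    # keep their original retrieval order.
    return [unit_id for unit_id, _ in sorted(units, key=gain, reverse=True)]
```

With units `m1` providing `{"date"}`, `m2` providing `{"hotel", "date"}`, `m3` providing `{"cuisine"}` and `m4` providing `{"hotel"}`, and missing slots `{"hotel", "date"}`, the order becomes `m2, m1, m4, m3`: the unit that closes both gaps comes first, and the one that fills none comes last.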

AAAI Conference 2026 Conference Paper

Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems

  • Xiaoqing Wang
  • Keman Huang
  • Bin Liang
  • Hongyu Li
  • Xiaoyong Du

The rapid advancement of Large Language Model (LLM)-driven multi-agent systems has significantly streamlined software development tasks, enabling users with little technical expertise to develop executable applications. While these systems democratize software creation through natural language requirements, they introduce significant security risks that remain largely unexplored. We identify two risky scenarios: Malicious User with Benign Agents (MU-BA) and Benign User with Malicious Agents (BU-MA). We introduce the Implicit Malicious Behavior Injection Attack (IMBIA), demonstrating how multi-agent systems can be manipulated to generate software with concealed malicious capabilities beneath seemingly benign applications, and propose Adv-IMBIA as a defense mechanism. Evaluations across ChatDev, MetaGPT, and AgentVerse frameworks reveal varying vulnerability patterns, with IMBIA achieving attack success rates of 93%, 45%, and 71% in MU-BA scenarios, and 71%, 84%, and 45% in BU-MA scenarios. Our defense mechanism significantly reduced attack success rates, particularly in the MU-BA scenario. Further analysis reveals that compromised agents in the coding and testing phases pose significantly greater security risks, and identifies critical agents that require protection against exploitation by malicious users. Our findings highlight the urgent need for robust security measures in multi-agent software development systems and provide practical guidelines for implementing targeted, resource-efficient defensive strategies.

AAAI Conference 2025 Conference Paper

A Comprehensive Evaluation on Event Reasoning of Large Language Models

  • Zhengwei Tao
  • Zhi Jin
  • Yifan Zhang
  • Xiancai Chen
  • Haiyan Zhao
  • Jia Li
  • Bin Liang
  • Chongyang Tao

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and must handle diverse inter-event relations and reasoning paradigms. The extent to which LLMs excel at event reasoning across various relations and reasoning paradigms has not been thoroughly investigated. Moreover, it remains unclear whether LLMs use event knowledge in the same way humans do. To address this gap, we comprehensively evaluate the event reasoning abilities of LLMs across different relations, paradigms, and levels of abstraction. We introduce EV2, a novel benchmark for EValuation of EVent reasoning. EV2 evaluates at two levels, schema and instance, and covers a comprehensive range of relations and reasoning paradigms. We conduct extensive experiments on EV2 and find that: 1) LLMs can accomplish event reasoning, but their performance is far from satisfactory; 2) their event reasoning abilities are imbalanced across relations and paradigms; and 3) LLMs possess event schema knowledge, but they are not aligned with humans in how they utilize it. Based on these findings, we guide LLMs to use event schema knowledge as memory, which improves event reasoning.

AAAI Conference 2025 Conference Paper

A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation

  • Bingbing Wang
  • Yiming Du
  • Bin Liang
  • Zhixin Bai
  • Min Yang
  • Baojun Wang
  • Kam-Fai Wong
  • Ruifeng Xu

Stickers are widely used in online chatting and can vividly express someone's intention, emotion, or attitude. Existing conversation research typically retrieves stickers based on a single session or on preceding textual information, and therefore cannot adapt to the multi-modal and multi-session nature of real-world conversation. To this end, we introduce MultiChat, a new dataset for sticker retrieval in multi-modal and multi-session conversation, comprising 1,542 sessions with 50,192 utterances and 2,182 stickers. Based on this dataset, we propose a novel Intent-Guided Sticker Retrieval (IGSR) framework that retrieves stickers for multi-modal and multi-session conversation history with the support of intent learning. Specifically, we introduce sticker attributes to better leverage sticker information in multi-modal conversation; these attributes are combined with utterances to construct a memory bank. We then extract memories relevant to the current conversation from the memory bank to identify its intent, and retrieve a sticker to respond as guided by that intent. Extensive experiments on our MultiChat dataset demonstrate the robustness and effectiveness of our IGSR approach in multi-session, multi-modal scenarios.

AAAI Conference 2025 Conference Paper

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

  • Bojia Zi
  • Shihao Zhao
  • Xianbiao Qi
  • Jianan Wang
  • Yukai Shi
  • Qianyu Chen
  • Bin Liang
  • Rong Xiao

Video inpainting is a crucial task with diverse applications, including fine-grained video editing, video recovery, and video dewatermarking. However, most existing video inpainting methods primarily focus on visual content completion while neglecting text information. There are only a limited number of text-guided video inpainting techniques, and these techniques struggle with maintaining visual quality and exhibit poor semantic representation capabilities. In this paper, we introduce CoCoCo, a text-guided video inpainting diffusion framework. To address the aforementioned challenges, we enhance both the training data and model structure. Specifically, we devise an instance-aware region selection strategy for masked area sampling and develop a novel motion block that incorporates efficient 3D full attention and textual cross attention. Additionally, our CoCoCo framework can be seamlessly integrated with various personalized text-to-image diffusion models through a delicate training-free transfer mechanism. Comprehensive experiments demonstrate that CoCoCo can create high-quality visual content with enhanced temporal consistency, improved text controllability, and better compatibility with personalized image models.

AAAI Conference 2025 Conference Paper

Correcting Large Language Model Behavior via Influence Function

  • Han Zhang
  • Zhuo Zhang
  • Yi Zhang
  • Yuanzhao Zhai
  • Hanyang Peng
  • Yu Lei
  • Yue Yu
  • Hui Wang

Recent advancements in AI alignment techniques have significantly improved the alignment of large language models (LLMs) with static human preferences. However, the dynamic nature of human preferences can render some prior training data outdated or even erroneous, ultimately causing LLMs to deviate from contemporary human preferences and societal norms. Existing methodologies, whether curating new data for continual alignment or manually correcting outdated data for re-alignment, demand costly human effort. To address this, we propose a novel approach, LLM BehAvior Correction with INfluence FunCtion REcall and Post-Training (LANCET), which requires no human involvement. LANCET consists of two phases: (1) using a new method, LinFAC, to efficiently identify the training data that significantly impact undesirable model outputs, and (2) applying a novel Influence-driven Bregman Optimization (IBO) technique to adjust the model’s outputs based on these influence distributions. Our experiments show that LANCET effectively and efficiently corrects inappropriate behaviors of LLMs while preserving model utility. Furthermore, LANCET exhibits stronger generalization ability than all baselines under out-of-distribution harmful prompts, offering better interpretability and compatibility with real-world applications of LLMs.

ICLR Conference 2025 Conference Paper

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

  • Xiaoshuai Song
  • Muxi Diao
  • Guanting Dong 0001
  • Zhengyang Wang
  • Yujia Fu
  • Runqi Qiao
  • Zhexu Wang
  • Dayuan Fu

Large language models (LLMs) have demonstrated significant potential in advancing various fields of research and society. However, the current LLM community focuses overly on benchmarks that analyze specific foundational skills (e.g., mathematics and code generation), neglecting an all-round evaluation of the computer science field. To bridge this gap, we introduce CS-Bench, the first multilingual (English, Chinese, French, German) benchmark dedicated to evaluating the performance of LLMs in computer science. CS-Bench comprises approximately 10K meticulously curated test samples, covering 26 subfields across 4 key areas of computer science, encompassing various task forms and divisions of knowledge and reasoning. Utilizing CS-Bench, we conduct a comprehensive evaluation of over 30 mainstream LLMs, revealing the relationship between CS performance and model scale. We also quantitatively analyze the reasons for failures in existing LLMs and highlight directions for improvement, including knowledge supplementation and CS-specific reasoning. Further cross-capability experiments show a high correlation between LLMs' capabilities in computer science and their abilities in mathematics and coding. Moreover, expert LLMs specialized in mathematics and coding also demonstrate strong performance in several CS subfields. Looking ahead, we envision CS-Bench serving as a cornerstone for LLM applications in the CS field and paving new avenues in assessing LLMs' diverse reasoning capabilities. Our project homepage is available at https://csbench.github.io/.

TMLR Journal 2025 Journal Article

Diffusion-RainbowPA: Improvements Integrated Preference Alignment for Diffusion-based Text-to-Image Generation

  • Haoyuan Sun
  • Bin Liang
  • Bo Xia
  • Jiaqi Wu
  • Yifei Zhao
  • Kai Qin
  • Yongzhe Chang
  • Xueqian Wang

Although the rapidly increasing capabilities of text-to-image (T2I) models have profound implications across various industries, these models still suffer from numerous shortcomings, necessitating effective strategies for aligning them with human preferences. Diffusion-DPO and SPO have emerged as robust approaches for aligning diffusion-based T2I models with human preference feedback. However, they tend to suffer from text-image misalignment, aesthetic overfitting, and low-quality generation. To tackle these issues, we improve the alignment paradigm from three perspectives: calibration enhancement (Calibration Enhanced Preference Alignment), overfitting mitigation (Identical Preference Alignment and a Jensen-Shannon Divergence Constraint), and performance optimization (Margin Strengthened Preference Alignment and SFT-like Regularization). Combining these with the step-aware preference alignment paradigm, we propose Diffusion-RainbowPA, a suite of six improvements in total that collectively improve the alignment performance of Diffusion-DPO. Comprehensive evaluation and comparison of alignment performance demonstrate that Diffusion-RainbowPA outperforms current state-of-the-art methods. Ablation studies on the introduced components further show that each one contributes positively to alignment performance.

IROS Conference 2025 Conference Paper

KD-RIEKF: Kinodynamic Right-Invariant EKF for Legged Robot State Estimation

  • Qi Yang
  • Bin Lan
  • Bingjie Chen
  • Jingjing Wang
  • Yi Cheng
  • Yizhe Li
  • Houde Liu
  • Bin Liang

We present KD-RIEKF, a novel state estimation framework that incorporates kinodynamic constraints into the Right-Invariant Extended Kalman Filter (RIEKF). Our framework integrates generalized momentum-based contact estimation, centroidal dynamics, and a noise-adaptive module, improving state estimation accuracy by probabilistically adjusting propagation noise to account for contact uncertainty and sensor noise. A key innovation is the expansion of the ground reaction force (GRF) into a state variable. By using GRF-based acceleration as a measurement, our method significantly reduces estimation errors in position, velocity, and orientation. The integration of contact-force-driven adaptive noise effectively boosts the stability of estimation, especially when the system is undergoing turning, acceleration, or deceleration processes. We validated our algorithm in simulation on highly uneven terrain, showing significant enhancements in z-axis position estimation compared to RIEKF. Further experiments on the Unitree Go2 robot across different speeds demonstrated that even in high-speed scenarios over 200 meters, our method reduced position estimation relative error (RE) by 47% and orientation estimation by 42%, confirming its robustness and accuracy under dynamic locomotion.

IROS Conference 2025 Conference Paper

MISCGrasp: Leveraging Multiple Integrated Scales and Contrastive Learning for Enhanced Volumetric Grasping

  • Qingyu Fan
  • Yinghao Cai
  • Chao Li
  • Chunting Jiao
  • Xudong Zheng
  • Tao Lu 0006
  • Bin Liang
  • Shuo Wang 0001

Robotic grasping faces challenges in adapting to objects with varying shapes and sizes. In this paper, we introduce MISCGrasp, a volumetric grasping method that integrates multi-scale feature extraction with contrastive feature enhancement for self-adaptive grasping. We propose a query-based interaction between high-level and low-level features through the Insight Transformer, while the Empower Transformer selectively attends to the highest-level features; together, they strike a balance between focusing on fine geometric details and overall geometric structure. Furthermore, MISCGrasp utilizes multi-scale contrastive learning to exploit similarities among positive grasp samples, ensuring consistency across multi-scale features. Extensive experiments in both simulated and real-world environments demonstrate that MISCGrasp outperforms baseline and variant methods in tabletop decluttering tasks. More details are available at https://miscgrasp.github.io/.

ICRA Conference 2025 Conference Paper

NeuGrasp: Generalizable Neural Surface Reconstruction with Background Priors for Material-Agnostic Object Grasp Detection

  • Qingyu Fan
  • Yinghao Cai
  • Chao Li
  • Wenzhe He
  • Xudong Zheng
  • Tao Lu 0006
  • Bin Liang
  • Shuo Wang 0001

Robotic grasping in scenes with transparent and specular objects presents great challenges for methods relying on accurate depth information. In this paper, we introduce NeuGrasp, a neural surface reconstruction method that leverages background priors for material-agnostic grasp detection. NeuGrasp integrates transformers and global prior volumes to aggregate multi-view features with spatial encoding, enabling robust surface reconstruction in narrow and sparse viewing conditions. By focusing on foreground objects through residual feature enhancement and refining spatial perception with an occupancy-prior volume, NeuGrasp excels in handling objects with transparent and specular surfaces. Extensive experiments in both simulated and real-world scenarios show that NeuGrasp outperforms state-of-the-art methods in grasping while maintaining comparable reconstruction quality. More details are available at https://neugrasp.github.io/.

IROS Conference 2025 Conference Paper

Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes

  • Meijun Guo
  • Yongliang Shi
  • Caiyun Liu 0004
  • Yixiao Feng
  • Ming Ma
  • Tinghai Yan
  • Weining Lu
  • Bin Liang

3D Gaussian Splatting (3DGS) has emerged as a key rendering pipeline for digital asset creation due to its balance between efficiency and visual quality. To address unstable pose estimation and scene representation distortion caused by geometric texture inconsistency in large outdoor scenes with weak or repetitive textures, we approach the problem from two aspects: pose estimation and scene representation. For pose estimation, we leverage LiDAR-IMU Odometry to provide prior poses for cameras in large-scale environments. These prior pose constraints are incorporated into COLMAP’s triangulation process, with pose optimization performed via bundle adjustment. Ensuring consistency between pixel data association and prior poses helps maintain both robustness and accuracy. For scene representation, we introduce normal vector constraints and effective rank regularization to enforce consistency in the direction and shape of Gaussian primitives. These constraints are jointly optimized with the existing photometric loss to enhance map quality. We evaluate our approach on both public and self-collected datasets. For pose optimization, our method requires only one-third of the time while maintaining accuracy and robustness across both datasets. For scene representation, the results show that our method significantly outperforms conventional 3DGS pipelines. Notably, on self-collected datasets characterized by weak or repetitive textures, our approach demonstrates enhanced visualization capabilities and achieves superior overall performance. Code and data will be publicly available at https://github.com/justinyeah/normaljshape.git.

IROS Conference 2025 Conference Paper

Semi-distributed Cross-modal Air-Ground Relative Localization

  • Weining Lu
  • Deer Bin
  • Lian Ma
  • Ming Ma
  • Zhihao Ma
  • Xiangyang Chen
  • Longfei Wang
  • Yixiao Feng

Efficient, accurate, and flexible relative localization is crucial in air-ground collaborative tasks. However, current approaches to robot relative localization are primarily realized as distributed multi-robot SLAM systems with identical sensor configurations, which are tightly coupled with the state estimation of all robots, limiting both flexibility and accuracy. To this end, we leverage the high payload capacity of the Unmanned Ground Vehicle (UGV) to integrate multiple sensors, enabling a semi-distributed cross-modal air-ground relative localization framework. In this work, both the UGV and the Unmanned Aerial Vehicle (UAV) independently perform SLAM while extracting deep learning-based keypoints and global descriptors, which decouples relative localization from the state estimation of all agents. The UGV employs a local Bundle Adjustment (BA) with LiDAR, camera, and an IMU to rapidly obtain accurate relative pose estimates. The BA process adopts sparse keypoint optimization and is divided into two stages: first, camera poses interpolated from LiDAR-Inertial Odometry (LIO) are optimized; then, the relative camera poses between the UGV and UAV are estimated. Additionally, we implement an incremental loop closure detection algorithm using deep learning-based descriptors to maintain and retrieve keyframes efficiently. Experimental results demonstrate that our method achieves outstanding performance in both accuracy and efficiency. Unlike traditional multi-robot SLAM approaches that transmit images or point clouds, our method transmits only keypoint pixels and their descriptors, keeping the communication bandwidth under 0.3 Mbps. Code and data will be publicly available at https://github.com/Ascbpiac/cross-model-relative-localization.git.

NeurIPS Conference 2025 Conference Paper

Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

  • Bojia Zi
  • Penghui Ruan
  • Marco Chen
  • Xianbiao Qi
  • Shaozhe Hao
  • Shihao Zhao
  • Youze Huang
  • Bin Liang

Video content editing has a wide range of applications. With the advancement of diffusion-based generative models, video editing techniques have made remarkable progress, yet they remain far from practical usability. Existing inversion-based video editing methods are time-consuming and struggle to maintain consistency in unedited regions. Although instruction-based methods have high theoretical potential, they face significant challenges in constructing high-quality training datasets: current datasets suffer from poor editing correctness, weak frame consistency, and limited sample diversity. To bridge these gaps, we introduce the Señorita-2M dataset, a large-scale, diverse, and high-quality video editing dataset. We systematically categorize editing tasks into 2 classes comprising 18 subcategories. To build this dataset, we design four new task specialists and employ or modify 14 existing task experts to generate data samples for each subclass. In addition, we design a filtering pipeline at both the visual content and instruction levels to further enhance data quality and ensure the reliability of the constructed data. The final Señorita-2M dataset comprises 2 million high-fidelity samples with diverse resolutions and frame counts. We trained multiple models on Señorita-2M using different base video models, i.e., Wan2.1 and CogVideoX-5B, and the results show that these models exhibit superior visual quality, robust frame-to-frame consistency, and strong instruction-following capability. More videos are available at https://senorita-2m-dataset.github.io.

IS Journal 2024 Journal Article

AdaCLF: An Adaptive Curriculum Learning Framework for Emotional Support Conversation

  • Geng Tu
  • Taiyu Niu
  • Ruifeng Xu
  • Bin Liang
  • Erik Cambria

Emotional support conversation (ESC) aims to alleviate emotional distress using data-driven approaches trained on human-generated responses. However, the subjective and open-ended nature of human conversations presents challenges in training ESC models due to uneven complexities in query–response pairs. This uneven complexity impedes the efficiency and effectiveness of learning in ESC models. Based on this, we propose an adaptive curriculum learning framework (AdaCLF) to dynamically choose courses of varying complexity according to the learning status of the ESC model. AdaCLF consists of two main components: the student model (referred to as the ESC model) and the teacher model (responsible for selecting appropriate data to enhance the student model’s training). The framework operates within the reinforcement learning paradigm, where the teacher model utilizes feedback from the student model to optimize its teaching strategy, fostering collaborative evolution. Both automatic and human evaluations on benchmark datasets demonstrate that our framework significantly improves existing ESC methods, generating more effective supportive responses.

AAAI Conference 2024 Conference Paper

Adaptive Graph Learning for Multimodal Conversational Emotion Detection

  • Geng Tu
  • Tian Xie
  • Bin Liang
  • Hongpeng Wang
  • Ruifeng Xu

Multimodal Emotion Recognition in Conversations (ERC) aims to identify the emotions conveyed by each utterance in a conversational video. Current efforts struggle to balance intra- and inter-speaker context dependencies when modeling intra-modal interactions. This balance is vital because it involves modeling both self-dependency (emotional inertia), where speakers are affected by their own emotions, and interpersonal dependency (empathy), where a counterpart's emotions influence a speaker. Furthermore, challenges arise in handling cross-modal interactions involving content with conflicting emotions across modalities. To address these issues, we introduce an adaptive interactive graph network (IGN) called AdaIGN that employs the Gumbel Softmax trick to adaptively select nodes and edges, enhancing intra- and cross-modal interactions. Instead of an undirected graph, we use a directed IGN to prevent future utterances from influencing the current one. Next, we propose Node- and Edge-level Selection Policies (NESP) to guide node and edge selection, along with a Graph-Level Selection Policy (GSP) to integrate the utterance representations from the original IGN and the NESP-enhanced IGN. Moreover, we design a task-specific loss function that prioritizes the text modality and intra-speaker context selection. To reduce computational complexity, we use pseudo labels pre-defined through self-supervised methods to mask unnecessary utterance nodes during selection. Experimental results show that AdaIGN outperforms state-of-the-art methods on two popular datasets. Our code will be available at https://github.com/TuGengs/AdaIGN.
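Two ingredients named in this abstract, Gumbel-Softmax selection and a directed utterance graph in which future utterances cannot influence earlier ones, can be illustrated with a minimal NumPy sketch. It assumes hypothetical per-edge `[drop, keep]` logits and shows only the standard Gumbel-Softmax relaxation followed by a lower-triangular (causal) mask; it is not the paper's NESP/GSP policies or network, and `select_edges` is an illustrative name.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau) over the last axis."""
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))  # standard Gumbel(0, 1) noise
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def select_edges(edge_logits, tau=0.5, seed=0):
    """Soft adjacency over n utterances from (n, n, 2) [drop, keep] logits for edge j -> i.

    Entry [i, j] is the relaxed 'keep' weight. Entries with j > i are zeroed,
    so a future utterance never feeds into an earlier one (directed graph).
    """
    rng = np.random.default_rng(seed)
    keep = gumbel_softmax(edge_logits, tau, rng)[..., 1]
    causal = np.tril(np.ones(keep.shape))  # 1 where j <= i, including self-loops
    return keep * causal
```

Lowering `tau` pushes the relaxed samples toward hard 0/1 choices while keeping them differentiable with respect to the logits, which is why the trick suits learned node and edge selection.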

IROS Conference 2024 Conference Paper

Highly Efficient Observation Process Based on FFT Filtering for Robot Swarm Collaborative Navigation in Unknown Environments

  • Chenxi Li
  • Weining Lu
  • Zhihao Ma
  • Litong Meng
  • Bin Liang

Collaborative path planning for robot swarms in complex, unknown environments without external positioning is a challenging problem. It requires robots to find safe directions based on real-time environmental observations and to efficiently transfer and fuse these observations within the swarm. This study presents a filtering method based on the Fast Fourier Transform (FFT) to address both issues. We treat the sensors' environmental observations as a digital sampling process, and then design two different types of filters, used for safe-direction extraction and for the compression and reconstruction of environmental data. The reconstructed data are mapped to the probabilistic domain, enabling efficient fusion of swarm observations and planning decisions. Computation time is on the order of microseconds, and the volume of data transmitted over the communication system is at the bit level. The performance of our algorithm in sensor data processing was validated in real-world experiments, and its effectiveness in swarm path optimization was demonstrated through extensive simulations.
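The general idea of FFT-based compression and reconstruction of an observation can be sketched with NumPy. This sketch assumes a hypothetical 1D range scan (one range value per beam) and a simple low-pass filter that keeps only the first few real-FFT coefficients; the abstract does not describe the paper's actual filter designs, safe-direction rule, or probabilistic mapping, and the function names below are illustrative.

```python
import numpy as np


def compress_scan(ranges, keep):
    """Low-pass 'compression': keep only the first `keep` real-FFT coefficients."""
    return np.fft.rfft(ranges)[:keep]


def reconstruct_scan(coeffs, n):
    """Zero-pad the kept coefficients to full length and invert the real FFT."""
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[: len(coeffs)] = coeffs
    return np.fft.irfft(full, n=n)


def safest_direction(ranges, keep):
    """Hypothetical rule: beam index with the largest low-pass-filtered range."""
    smooth = reconstruct_scan(compress_scan(ranges, keep), len(ranges))
    return int(np.argmax(smooth))
```

In this sketch, a 360-beam scan whose shape is dominated by low frequencies can be sent as 3 complex coefficients and rebuilt by the receiver, and high-frequency clutter is discarded before the direction is chosen. The number of kept coefficients trades reconstruction fidelity against message size.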

AAAI Conference 2024 Conference Paper

Learning Diverse Risk Preferences in Population-Based Self-Play

  • Yuhua Jiang
  • Qihan Liu
  • Xiaoteng Ma
  • Chenghao Li
  • Yiqin Yang
  • Jun Yang
  • Bin Liang
  • Qianchuan Zhao

Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win rate against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing the agent to break stalemates and enhance its robustness when facing different opponents. In this paper, we present a novel perspective to promote diversity by considering that agents could have diverse risk preferences in the face of uncertainty. To achieve this, we introduce a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning, enabling policy learning with desired risk preferences. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.

NeurIPS Conference 2024 Conference Paper

NeuralPlane: An Efficiently Parallelizable Platform for Fixed-wing Aircraft Control with Reinforcement Learning

  • Chuanyi Xue
  • Qihan Liu
  • Xiaoteng Ma
  • Yang Qi
  • Xinyao Qin
  • Yuhua Jiang
  • Ning Gui
  • Jinsheng Ren

Reinforcement learning (RL) demonstrates superior potential over traditional flight control methods for fixed-wing aircraft, particularly under extreme operational conditions. However, the high demand for training samples and the lack of efficient computation in existing simulators hinder its further application. In this paper, we introduce NeuralPlane, the first benchmark platform for large-scale parallel simulations of fixed-wing aircraft. NeuralPlane significantly boosts high-fidelity simulation via GPU-accelerated Flight Dynamics Model (FDM) computation, achieving a single-step simulation time of just 0.2 seconds at a parallel scale of $10^{6}$, far exceeding current platforms. We also provide clear code templates, comprehensive evaluation/visualization tools and hierarchical frameworks for integrating RL and traditional control methods. We believe that NeuralPlane can accelerate the development of RL-based fixed-wing flight control and serve as a new challenging benchmark for the RL community. Our NeuralPlane is open-source and accessible at https://github.com/xuecy22/NeuralPlane.

NeurIPS Conference 2022 Conference Paper

Domain Generalization by Learning and Removing Domain-specific Features

  • Yu Ding
  • Lei Wang
  • Bin Liang
  • Shuming Liang
  • Yang Wang
  • Fang Chen

Deep Neural Networks (DNNs) suffer from domain shift when the test dataset follows a distribution different from the training dataset. Domain generalization aims to tackle this issue by learning a model that can generalize to unseen domains. In this paper, we propose a new approach that aims to explicitly remove domain-specific features for domain generalization. Following this approach, we propose a novel framework called Learning and Removing Domain-specific features for Generalization (LRDG) that learns a domain-invariant model by tactically removing domain-specific features from the input images. Specifically, for each source domain we design a classifier that learns that domain's specific features. We then develop an encoder-decoder network to map each input image into a new image space where the learned domain-specific features are removed. Using the images output by the encoder-decoder network, another classifier is designed to learn the domain-invariant features for image classification. Extensive experiments demonstrate that our framework achieves superior performance compared with state-of-the-art methods.

IJCAI Conference 2018 Conference Paper

Deep Text Classification Can be Fooled

  • Bin Liang
  • Hongcheng Li
  • Miaoqiang Su
  • Pan Bian
  • Xirong Li
  • Wenchang Shi

In this paper, we present an effective method for crafting adversarial text samples, revealing an important yet underestimated fact: DNN-based text classifiers are also vulnerable to adversarial sample attacks. Specifically, depending on the adversarial scenario, the text items important for classification are identified either by computing the cost gradients of the input (white-box attack) or by generating a series of occluded test samples (black-box attack). Based on these items, we design three perturbation strategies, namely insertion, modification, and removal, to generate adversarial samples. Experimental results show that the adversarial samples generated by our method can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers. The adversarial samples can be perturbed into any desired class without compromising their utility. At the same time, the introduced perturbations are difficult to perceive.
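The black-box step of locating important text items by occlusion can be sketched generically in Python. This sketch assumes occlusion means deleting one token at a time and that the attacker can query a `score` callable returning the model's confidence in the currently predicted class; the paper's exact occlusion scheme and scoring may differ, and `occlusion_importance` is a hypothetical name.

```python
def occlusion_importance(tokens, score):
    """Rank token positions by how much deleting each one lowers score(tokens).

    tokens: list of strings.
    score: black-box callable mapping a token list to the model's confidence
           in the currently predicted class.
    """
    base = score(tokens)
    drops = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]  # delete the i-th token
        drops.append((base - score(occluded), i))
    # Largest confidence drop first; ties are broken by position.
    drops.sort(key=lambda d: (-d[0], d[1]))
    return [i for _, i in drops]
```

For instance, with a toy scorer that sums hypothetical word weights `{"terrible": 0.6, "boring": 0.3}`, the sentence `["a", "terrible", "and", "boring", "film"]` yields the ranking `[1, 3, 0, 2, 4]`: "terrible" and "boring" become the first candidates for insertion, modification, or removal perturbations.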

ICRA Conference 2004 Conference Paper

Multisensory Gripper and Local Autonomy of Extravehicular Mobile Robot

  • Yang Liu
  • Tao Mei
  • Xiaohua Wang
  • Bin Liang

This paper presents the development of a multisensory robot gripper for an extravehicular mobile robot (EMR) and its sensor-based local autonomy. To support stable extravehicular walking and delicate tasks in unstructured and complex environments, our EMR gripper employs a simple and reliable mechanism and is equipped with a multisensory apparatus. Local autonomy is an important requirement for on-orbit manipulation by space robots, and detecting the contact state between the robot gripper and its environment is essential to achieving it. However, the available sensory information is often insufficient to determine the contact state. We propose a new way to detect the contact state under inadequate sensory information: by combining force sensor information with gripper geometry and mechanical analysis, some spatial contact information between the robot and the trusswork can be derived. The robot can then adjust its position and orientation through fine motion displacement based on this contact information to achieve a steady grasp. The method is implemented on a walking/grasping task, a simple but important fundamental task for extravehicular space robots.