Author name cluster

Jun Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

34 papers

2 author rows

EAAI Journal 2026 Journal Article

A camera-light detection and ranging sensor online extrinsic calibration network based on mamba-like linear attention mechanism for unstructured off-road environments

Lang Wu
Ren Xiao
Huayan Pu
Gang Wang
Mingliang Zhou
Jun Luo

The fusion of heterogeneous data from cameras and Light Detection and Ranging (LiDAR) sensors is essential for robust and accurate environmental perception in robotics and autonomous vehicles. Accurate extrinsic calibration between the camera and LiDAR is a prerequisite for fusing camera images with three-dimensional (3D) point cloud data, ensuring spatial alignment between the two data modalities. The existing methods have focused primarily on the online calibration of camera-LiDAR systems in structured urban environments, while achieving accurate online calibration in unstructured, feature-degraded off-road settings remains a significant challenge. To address this, we propose a Mamba-Like Linear Attention Network (MLLANet) for camera-LiDAR extrinsic online calibration on the basis of the mamba-like linear attention model. A multilevel feature extraction module leveraging mamba-like linear attention is constructed to enhance the network's ability to represent complex terrain features. A multiscale feature fusion and matching module is then constructed to accurately perceive feature differences between two-dimensional (2D) images and LiDAR reprojected depth maps. Moreover, a hybrid loss function incorporating Huber depth map loss is designed to effectively suppress the influence of LiDAR point cloud outliers and accelerate network convergence in complex scenarios. Extensive experiments are conducted on one urban road dataset and two off-road datasets to validate the effectiveness of the proposed calibration network. The proposed method, MLLANet, achieves average translation errors of 0.289 cm, 2.161 cm, and 1.333 cm and average angular errors of 0.012 degrees, 0.057 degrees, and 0.192 degrees, respectively, on these three datasets, outperforming most existing learning-based calibration methods.
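
The role of a Huber term in suppressing point cloud outliers can be illustrated with a minimal sketch. The function name `huber_depth_loss`, the threshold `delta`, and the flat-list representation of depth maps are assumptions made for illustration only; the abstract does not specify how the paper's hybrid loss is built, so only the standard Huber definition is shown.

```python
def huber_depth_loss(pred, target, delta=1.0):
    """Mean Huber loss between two equally sized depth maps given as flat lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("depth maps must be non-empty and the same length")
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r                # quadratic for small residuals
        else:
            total += delta * (r - 0.5 * delta)  # linear for outliers
    return total / len(pred)
```

Because large residuals grow linearly rather than quadratically, a single outlier depth value contributes far less to the loss than it would under a squared-error term.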

AAAI Conference 2026 Conference Paper

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

Ziwei Liu
Borui Kang
Wei Li
Hangjie Yuan
Yanbing Yang
Wenbin Li
Yifan Zhu
Tao Feng

Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and the adoption of Parameter-Efficient Fine-Tuning (PEFT) strategies is enabling these models to achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially in the limited exploration subspace within PEFT. To overcome this challenge, this paper pioneers a systematic exploration of adopting Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify the incompatibility of naive full-ZO adoption in VLCL due to optimization process instability. We then investigate the application of ZO optimization from a modality branch-wise to a fine-grained layer-wise granularity across various training units to identify an optimal strategy. In addition, a key theoretical insight reveals that the vision modality exhibits higher variance than its language counterpart in VLCL during the ZO optimization process, and we propose a modality-aware stabilized ZO strategy, which adopts gradient sign normalization in ZO and constrains vision modality perturbation to further improve performance. Benefiting from the adoption of ZO optimization, PEFT-based VLCL gains a better ability to escape local minima during the optimization process. Extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art results.
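
As a rough illustration of zeroth-order optimization with sign normalization, the sketch below uses a generic two-point estimator along a random Gaussian direction. The names `zo_sign_step` and `quadratic`, the step size `lr`, the perturbation scale `mu`, and the toy objective are all assumptions for illustration; this is not the paper's modality-aware strategy, which the abstract does not spell out.

```python
import random

def zo_sign_step(loss, params, lr=0.05, mu=1e-3, rng=random):
    """One zeroth-order step: two-point estimate along a random direction,
    followed by element-wise sign normalization of the estimated gradient."""
    u = [rng.gauss(0.0, 1.0) for _ in params]
    loss_plus = loss([p + mu * d for p, d in zip(params, u)])
    loss_minus = loss([p - mu * d for p, d in zip(params, u)])
    slope = (loss_plus - loss_minus) / (2.0 * mu)  # directional derivative estimate
    grad_est = [slope * d for d in u]
    signs = [(g > 0) - (g < 0) for g in grad_est]  # keep only the sign of each entry
    return [p - lr * s for p, s in zip(params, signs)]

def quadratic(x):
    """Toy objective (hypothetical), minimized at the origin."""
    return sum(v * v for v in x)

rng = random.Random(0)
x = [3.0, -2.0]
for _ in range(500):
    x = zo_sign_step(quadratic, x, rng=rng)
```

Only loss evaluations are used, so no backward pass is needed; the sign step bounds the update magnitude regardless of how noisy the slope estimate is.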

AAAI Conference 2026 Conference Paper

Cheating Stereo Matching in Full-Scale: Physical Adversarial Attack Against Binocular Depth Estimation in Autonomous Driving

Kangqiao Zhao
Shuo Huai
Xurui Song
Jun Luo

Though deep neural models adopted to realize the perception of autonomous driving have proven vulnerable to adversarial examples, known attacks often leverage 2D patches and target mostly monocular perception. Therefore, the effectiveness of Physical Adversarial Examples (PAEs) on stereo-based binocular depth estimation remains largely unexplored. To this end, we propose the first texture-enabled physical adversarial attack against stereo matching models in the context of autonomous driving. Our method employs a 3D PAE with global camouflage texture rather than a local 2D patch-based one, ensuring both visual consistency and attack effectiveness across different viewpoints of stereo cameras. To cope with the disparity effect of these cameras, we also propose a new 3D stereo matching rendering module that allows the PAE to be aligned with real-world positions and headings in binocular vision. We further propose a novel merging attack that seamlessly blends the target into the environment through fine-grained PAE optimization. It offers significantly greater stealth and lethality than existing hiding attacks, which fail to merge seamlessly into the background. Extensive evaluations show that our PAEs can successfully fool the stereo models into producing erroneous depth information.

JBHI Journal 2026 Journal Article

DUR-Net+: Semi-Supervised Abdominal CT Pheochromocytoma Segmentation Via Dynamic Uncertainty Rectified and Prior Knowledge From SAM-Med3D

Chuanbo Qin
Zhuyuan Chen
Dong Wang
Bin Zheng
Jun Luo
Junying Zeng
Xudong Jia
Jin Wen

Pheochromocytoma is a rare urological adrenal tumor disease. Automated segmentation of pheochromocytomas from computed tomography (CT) is essential for diagnosis and treatment. However, this task is a challenging one due to issues such as blurred boundaries, irregular shapes, variations in location and size, and the lack of annotated images for training. To address these issues, we propose a semi-supervised framework for pheochromocytoma segmentation that primarily consists of a dynamic uncertainty rectification mechanism and a supervised strategy based on SAM-Med3D prior knowledge. First, we design a semi-supervised segmentation model comprising a shared encoder and multiple independent decoders that dynamically select pseudo labels from the different decoder outputs. To mitigate the risk of unreliable predictions caused by sparse annotations during training, we introduce uncertainty estimation to prioritize reliable outputs. Additionally, an Attentional Convolution Block (ACB) is designed in the encoding stage to fully utilize both global and local features, improving tumor recognition in segmentation. Furthermore, SAM-Med3D prior knowledge is incorporated into the framework as supplementary supervisory information, aiding the model in learning from limited labeled data. To eliminate the labor-intensive requirement for manual prompts in SAM-Med3D, we leverage pseudo labels to generate high-quality mask prompts, thus transforming the clinical workflow. Experiments on two pheochromocytoma datasets from different centers demonstrate that our proposed method achieves competitive performance.

EAAI Journal 2026 Journal Article

Shapley estimated explanation: A fast post-hoc attribution method for interpreting intelligent mechanical fault diagnosis

Qian Chen
Xingjian Dong
Shuai Gao
Jun Luo
Zhike Peng
Guang Meng

Despite significant progress in intelligent fault diagnosis (IFD), the lack of interpretability remains a critical barrier to practical industrial applications, driving the growth of interpretability research in IFD. Post-hoc interpretability has gained popularity due to its ability to preserve network flexibility and scalability without modifying model structures. However, these methods often yield suboptimal time-domain explanations. Recently, combining domain transform with SHapley Additive exPlanation (SHAP) has improved interpretability by extending explanations to more informative domains. Nonetheless, the computational expense of SHAP, exacerbated by increased dimensions from domain transforms, remains a major challenge. To address this, we propose patch-wise attribution and SHapley Estimated exPlanation (SHEP). Patch-wise attribution reduces feature dimensions at the cost of explanation granularity, while SHEP simplifies subset enumeration to approximate SHAP, reducing complexity from exponential to linear. Together, these methods significantly enhance SHAP’s computational efficiency, providing feasibility for real-time interpretation in monitoring tasks. Extensive experiments confirm SHEP’s efficiency, interpretability, and reliability in approximating SHAP. Additionally, with open-source code, SHEP has the potential to serve as a benchmark for post-hoc interpretability in IFD. The code is available at https://github.com/ChenQian0618/SHEP.
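
To make the exponential cost of exact Shapley attribution concrete, the sketch below enumerates every subset for a tiny set of features. It shows the textbook Shapley formula only, not SHEP's approximation (see the linked repository for that); the function name `shapley_values` and the toy value function `toy_value` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values for n players; value maps a frozenset of player
    indices to a number. The cost grows as 2**n subset evaluations per player."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_value(s):
    """Hypothetical model output: additive terms plus one pairwise interaction."""
    base = sum({0: 1.0, 1: 2.0, 2: 3.0}[j] for j in s)
    return base + (1.0 if {0, 1} <= s else 0.0)

phi = shapley_values(3, toy_value)
```

The interaction bonus between features 0 and 1 is split evenly between them, and the attributions sum to the value of the full feature set.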

AAAI Conference 2026 Conference Paper

Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position

Zhixin Xie
Xurui Song
Jun Luo

Diffusion Large Language Models (dLLMs) have recently emerged as a competitive non-autoregressive paradigm due to their unique training and inference approach. However, safety studies of this novel architecture are currently lacking. In this paper, we present the first analysis of dLLMs' safety performance and propose a novel safety alignment method tailored to their unique generation characteristics. Specifically, we identify a critical asymmetry between the defender and attacker in terms of security. For the defender, we reveal that the middle tokens of the response, rather than the initial ones, are more critical to the overall safety of dLLM outputs; this seems to suggest that aligning middle tokens can be more beneficial to the defender. The attacker, on the contrary, may have limited power to manipulate middle tokens, as we find dLLMs have a strong tendency towards a sequential generation order in practice, forcing the attack to meet this distribution and diverting it from influencing the critical middle tokens. Building on this asymmetry, we introduce Middle-tOken Safety Alignment (MOSA), a novel method that uses reinforcement learning to directly align the model's middle generation with safe refusals. We implement MOSA and compare its security performance against eight attack methods on two benchmarks. We also test the utility of MOSA-aligned dLLM on coding, math, and general reasoning. The results strongly prove the superiority of MOSA.

EAAI Journal 2025 Journal Article

A latent-coupled neural network for multiphysics long-term forecasting in reactor transients using sparse observations

Yu-Yan Xu
Jun Luo
Deng Pan
Wei Lu
Ting Liu
Guanghui Yuan
Minxiao Zhong
Qing Li

Complex dynamical systems in safety-critical applications like nuclear reactors involve strongly coupled physical fields evolving over space and time. Accurate prediction of these fields is vital for safety monitoring but is challenged by limited sensor placement and unobservable variables (e.g., xenon and iodine concentrations). This paper proposes the Sparse observation to High-dimensional coupled physical field Prediction Network (SHPNet), a deep learning framework that predicts and reconstructs multiple physical fields directly from sparse observations. SHPNet combines a three-branch autoencoder to extract shared latent representations with a neural operator that models temporal dynamics in latent space, enabling efficient long-term forecasting. Evaluated on the Hua-long Pressurized Reactor (HPR1000) under varying power and burnup conditions, SHPNet outperforms traditional frameworks and end-to-end models, achieving higher accuracy, robustness to observation sparsity, and effective reconstruction of unobservable fields. These results demonstrate SHPNet’s potential as a practical tool for real-time monitoring of complex coupled systems.

EAAI Journal 2025 Journal Article

A quantized subtraction-convolution network for industrial lightweight edge interpretable diagnosis

Qihang Wu
Jun Luo
Wenbin Huang
Xiaoxi Ding

To address the prevalent challenges of delayed fault diagnosis, deployment complexity of advanced deep learning models on low-cost edge devices, and limited model interpretability, this study proposes an edge-cloud collaborative interpretable diagnosis method based on a quantized subtraction-convolution network. Firstly, an interpretable quantized subtraction-convolution network with a lightweight three-layer structure is designed. Inspired by adaptive spectral subtraction, a learnable sparse pulse kernel is designed to extract the fault feature as the subtraction layer. Subsequently, the convolution and classification layers are integrated to produce interpretable results for efficient identification. To facilitate deployment and updates on edge devices, the quantized subtraction-convolution network is decomposed into a lightweight edge architecture and corresponding parameters. It can be deployed on edge devices, and an edge-cloud collaborative framework addresses its training and compression. Considering the sparse characteristics of the quantized subtraction-convolution network, a sparse pulse quantization strategy and a quantization-aware training technique are developed to compress the model parameters. Finally, a low-cost edge fault diagnosis node prototype with the quantized subtraction-convolution network is designed for real-time edge fault diagnosis. Experiments showed that the proposed method achieved an average accuracy of 99.88 percent with a compression ratio of 9.5. The memory usage, floating-point operations per second, and average power consumption are respectively only 54 kilo binary byte, 0.053 mega binary byte, and 6.67 mJ. Actual gear edge diagnosis experiments confirmed the effectiveness of the method, which can perform model inference in 0.045 s at the edge fault diagnosis node. It is anticipated that the proposed method will find extensive application in industrial edge interpretable diagnosis.

EAAI Journal 2025 Journal Article

Adversarial-Causal Representation Learning Networks for Machine fault diagnosis under unseen conditions based on vibration and acoustic signals

Fei Wu
Zhuohang Xiang
Dengyu Xiao
Yaodong Hao
Yi Qin
Huayan Pu
Jun Luo

To address the challenges of obtaining diverse data, domain generalization (DG) methods for fault diagnosis have been developed. Domain adversarial methods are currently the most popular, due to their ability to handle data from unknown domains without requiring target domain information. However, their capacity to extract domain-irrelevant features remains limited, often resulting in accuracy below 90% in many DG scenarios. This limitation stems from their inability to fully capture global dependencies, causing feature entanglement and redundant dependencies. To address these issues, we propose a novel intelligent fault diagnosis method called Adversarial-Causal Representation Learning Networks (ACRLN), which is based on causal learning. Through a spatial mask domain adversarial method, ACRLN can significantly enhance data utilization by fully capturing the global dependencies that are often ignored by domain adversarial algorithms. At the same time, causal learning is integrated into ACRLN to further accomplish feature decoupling and the reduction of redundant dependencies. This is achieved through a channel feature orthogonality method combined with a loss function rooted in correlation analysis. Moreover, it adeptly addresses the spill-over effect often encountered in causal learning. Finally, ACRLN achieves better results and proves its effectiveness by comparison with several state-of-the-art fault diagnosis and DG algorithms on multiple datasets.

AAAI Conference 2025 Conference Paper

An Evaluation Framework for Product Images Background Inpainting Based on Human Feedback and Product Consistency

Yuqi Liang
Jun Luo
Xiaoxi Guo
Jianqi Bi

In product advertising applications, the automated inpainting of backgrounds utilizing AI techniques in product images has emerged as a significant task. However, the techniques still suffer from issues such as inappropriate background and inconsistent product in generated product images, and existing approaches for evaluating the quality of generated product images are mostly inconsistent with human feedback causing the evaluation for this task to depend on manual annotation. To relieve the issues above, this paper proposes Human Feedback and Product Consistency (HFPC), which can automatically assess the generated product images based on two modules. Firstly, to solve inappropriate backgrounds, human feedback on 44,000 automated inpainting product images is collected to train a reward model based on multi-modal features extracted from BLIP and comparative learning. Secondly, to filter generated product images containing inconsistent products, a fine-tuned segmentation model is employed to segment the product of the original and generated product images and then compare the differences between the above two. Extensive experiments have demonstrated that HFPC can effectively evaluate the quality of generated product images and significantly reduce the expense of manual annotation. Moreover, HFPC achieves state-of-the-art (96.4% in precision) in comparison to other open-source visual-quality-assessment models.

NeurIPS Conference 2025 Conference Paper

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs

Zhixin Xie
Xurui Song
Jun Luo

Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak attacks. Among these attacks, finetuning-based ones that compromise LLMs’ safety alignment via fine-tuning stand out due to their stable jailbreak performance. In particular, a recent study indicates that fine-tuning with as few as 10 harmful question-answer (QA) pairs can lead to successful jailbreaking across various harmful questions. However, such malicious fine-tuning attacks are readily detectable and hence thwarted by moderation models. In this paper, we demonstrate that LLMs can be jailbroken by fine-tuning with only 10 benign QA pairs; our attack exploits the increased sensitivity of LLMs to fine-tuning data after being overfitted. Specifically, our fine-tuning process starts with overfitting an LLM via fine-tuning with benign QA pairs involving identical refusal answers. Further fine-tuning is then performed with standard benign answers, causing the overfitted LLM to forget the refusal attitude and thus provide compliant answers regardless of the harmfulness of a question. We implement our attack on ten LLMs and compare it with five existing baselines. Experiments demonstrate that our method achieves significant advantages in both attack effectiveness and attack stealth. Our findings expose previously unreported security vulnerabilities in current LLMs and provide a new perspective on understanding how LLMs’ security is compromised, even with benign fine-tuning. Our code is available at https://github.com/ZHIXINXIE/ten_benign.git.

NeurIPS Conference 2025 Conference Paper

Co-Reinforcement Learning for Unified Multimodal Understanding and Generation

Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma

This paper presents a pioneering exploration of reinforcement learning (RL) via group relative policy optimization for unified multimodal large language models (ULMs), aimed at simultaneously reinforcing generation and understanding capabilities. Through systematic pilot studies, we uncover the significant potential of ULMs to enable the synergistic co-evolution of dual capabilities within a shared policy optimization framework. Building on this insight, we introduce CoRL, a Co-Reinforcement Learning framework comprising a unified RL stage for joint optimization and a refined RL stage for task-specific enhancement. With the proposed CoRL, our resulting model, ULM-R1, achieves average improvements of 7% on three text-to-image generation datasets and 23% on nine multimodal understanding benchmarks. These results demonstrate the effectiveness of CoRL and highlight the substantial benefits of reinforcement learning in facilitating cross-task synergy and optimization for ULMs. Code is available at https://github.com/mm-vl/ULM-R1.

AAAI Conference 2025 Conference Paper

Do Not DeepFake Me: Privacy-Preserving Neural 3D Head Reconstruction Without Sensitive Images

Jiayi Kong
Xurui Song
Shuo Huai
Baixin Xu
Jun Luo
Ying He

While 3D head reconstruction is widely used for modeling, existing neural reconstruction approaches rely on high-resolution multi-view images, posing notable privacy issues. Individuals are particularly sensitive to facial features, and facial image leakage can lead to many malicious activities, such as unauthorized tracking and deepfake. In contrast, geometric data is less susceptible to misuse due to its complex processing requirements, and absence of facial texture features. In this paper, we propose a novel two-stage 3D facial reconstruction method aimed at avoiding exposure to sensitive facial information while preserving detailed geometric accuracy. Our approach first uses non-sensitive rear-head images for initial geometry and then refines this geometry using processed privacy-removed gradient images. Extensive experiments show that the resulting geometry is comparable to methods using full images, while the process is resistant to DeepFake applications and facial recognition (FR) systems, thereby proving its effectiveness in privacy protection.

AAAI Conference 2025 Conference Paper

GeoPro-Net: Learning Interpretable Spatiotemporal Prediction Models Through Statistically-Guided Geo-Prototyping

Bang An
Xun Zhou
Zirui Zhou
Ronilo Ragodos
Zenglin Xu
Jun Luo

The problem of forecasting spatiotemporal events such as crimes and accidents is crucial to public safety and city management. Besides accuracy, interpretability is also a key requirement for spatiotemporal forecasting models to justify the decisions. Merely presenting predicted scores fails to convince the public and does not contribute to future urban planning. Interpretation of the spatiotemporal forecasting mechanism is, however, challenging due to the complexity of multi-source spatiotemporal features, the non-intuitive nature of spatiotemporal patterns for non-expert users, and the presence of spatial heterogeneity in the data. Currently, no existing deep learning model intrinsically interprets the complex predictive process learned from multi-source spatiotemporal features. To bridge the gap, we propose GeoPro-Net, an intrinsically interpretable spatiotemporal model for spatiotemporal event forecasting problems. GeoPro-Net introduces a novel Geo-concept convolution operation, which employs statistical tests to extract predictive patterns in the input as "Geo-concepts", and condenses the "Geo-concept-encoded" input through interpretable channel fusion and geographic-based pooling. In addition, GeoPro-Net learns different sets of prototypes of concepts inherently, and projects them to real-world cases for interpretation. Comprehensive experiments and case studies on four real-world datasets demonstrate that GeoPro-Net provides better interpretability while still achieving competitive prediction performance compared with state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

LOPT: Learning Optimal Pigovian Tax in Sequential Social Dilemmas

Yun Hua
Shang Gao
Wenhao Li
Haosheng Chen
Bo Jin
Xiangfeng Wang
Jun Luo
Hongyuan Zha

Multi-agent reinforcement learning (MARL) has emerged as a powerful framework for modeling autonomous agents that independently optimize their individual objectives. However, in mixed-motive MARL environments, rational self-interested behaviors often lead to collectively suboptimal outcomes, situations commonly referred to as social dilemmas. A key challenge in addressing social dilemmas lies in accurately quantifying and representing them in a numerical form that captures how self-interested agent behaviors impact social welfare. To address this challenge, the economic concept of externalities is adopted and extended to denote the unaccounted-for impact of one agent's actions on others, as a means to rigorously quantify social dilemmas. Based on this measurement, a novel method, Learning Optimal Pigovian Tax (LOPT), is proposed. Inspired by Pigovian taxes, which are designed to internalize externalities by imposing a cost on negative societal impacts, LOPT employs an auxiliary tax agent that learns an optimal Pigovian tax policy to reshape individual rewards in alignment with social welfare, thereby promoting agent coordination and mitigating social dilemmas. We support LOPT with theoretical analysis and validate it on standard MARL benchmarks, including Escape Room and Cleanup. Results show that by effectively internalizing externalities that quantify social dilemmas, LOPT aligns individual objectives with collective goals, significantly improving social welfare over state-of-the-art baselines.

ICLR Conference 2025 Conference Paper

On the Performance Analysis of Momentum Method: A Frequency Domain Perspective

Xianliang Li
Jun Luo
Zhiwei Zheng
Hanxiao Wang
Li Luo
Lingkun Wen
Linlong Wu
Sheng Xu 0004

Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to momentum coefficients modify the filter characteristics. Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals the following significant findings: high-frequency gradient components are undesired in the late stages of training; preserving the original gradient in the early stages, and gradually amplifying low-frequency gradient components during training both enhance performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic with an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers.
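
The filter view of momentum can be checked numerically on the simplest case. The sketch below assumes the standard fixed-coefficient exponential-moving-average form, m_t = beta * m_{t-1} + (1 - beta) * g_t, whose transfer function is H(e^{jw}) = (1 - beta) / (1 - beta * e^{-jw}). The paper analyzes a time-variant filter, so this is a simplification, and the function name `ema_momentum_gain` is hypothetical.

```python
import cmath
import math

def ema_momentum_gain(beta, omega):
    """Magnitude response of m_t = beta * m_{t-1} + (1 - beta) * g_t
    at angular frequency omega (radians per step)."""
    h = (1.0 - beta) / (1.0 - beta * cmath.exp(-1j * omega))
    return abs(h)

low = ema_momentum_gain(0.9, 0.0)       # slowly varying gradient component
high = ema_momentum_gain(0.9, math.pi)  # component that flips sign every step
```

With beta = 0.9 the constant component passes with unit gain while the fastest oscillation is attenuated to (1 - beta) / (1 + beta), so a larger momentum coefficient acts as a stronger low-pass filter on the gradient sequence.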

IROS Conference 2025 Conference Paper

Reinforcement Learning-Based Energy-Efficient and Obstacle-Free Path Planning for Magnetic Microrobots in Dynamic Environments

Hongwei Wang
Mingxue Cai
Jun Luo
Mingguo Jiang
Chenyang Huang 0004
Haolan Shen
Tiantian Xu 0001

Online path planning for magnetic microrobots actuated by an electromagnetic system in a dynamic flow field presents significant challenges due to time-varying fluid dynamics, energy constraints, and collision risks. Traditional path planning approaches, which often rely on static flow assumptions or simplified geometric models, struggle to balance energy efficiency, path continuity, and adaptability in real-world scenarios. This paper introduces an end-to-end path planner for energy-efficient and collision-free navigation of magnetic helical microrobots, integrating flow field feature extraction and a reinforcement learning (RL) framework. Our method employs a transformer encoder to capture contextual correlations of the flow field and uses a Soft Actor-Critic (SAC) framework to optimize energy consumption while ensuring dynamic obstacle avoidance. Simulations and experiments in dynamic flow environments validate our approach, demonstrating 14.7% lower energy consumption and robust collision avoidance in several different test scenarios.

NeurIPS Conference 2025 Conference Paper

Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents

Yun Hua
Haosheng Chen
Shiqin Wang
Wenhao Li
Xiangfeng Wang
Jun Luo

Large Language Models (LLMs) are increasingly deployed as autonomous agents in multi-agent systems, and promising coordination has been demonstrated in handling complex tasks under predefined roles and scripted workflows. However, significant challenges remain in open-ended environments, where agents are inherently self-interested and explicit coordination guidelines are absent. In such scenarios, misaligned incentives frequently lead to social dilemmas and inefficient collective outcomes. Inspired by how human societies tackle similar coordination challenges through temporary collaborations like employment or subcontracting, a cooperative workflow, Shapley-Coop, is proposed. This workflow enables self-interested LLM agents to engage in emergent collaboration by using a fair credit allocation mechanism to ensure each agent’s contributions are appropriately recognized and rewarded. Shapley-Coop introduces structured negotiation protocols and Shapley-inspired reasoning to estimate agents’ marginal contributions, thereby enabling effective task-time coordination and equitable post-task outcome redistribution. This results in effective coordination that fosters collaboration while preserving agent autonomy, through a rational pricing mechanism that encourages cooperative behavior. Evaluated in two multi-agent games and a software engineering simulation, Shapley-Coop consistently enhances LLM agent collaboration and facilitates equitable outcome redistribution, accurately reflecting individual contributions during the task execution process.

EAAI Journal 2024 Journal Article

A noise suppression zeroing neural network for trajectory tracking with joint angle constraints of mobile manipulator

Zhongbo Sun
Yuzhe Fei
Shijun Tang
Xingtian Xiao
Jun Luo
Keping Liu

Trajectory tracking control (TTC) is an indispensable part of mobile manipulator (MM) applications. The actual usage of the MM can be affected by factors such as external noise interference and joint constraints. However, most current research on MM control considers only one of these factors. This paper presents a noise suppression zeroing neural network with joint angle constraints (NSZNN-JAC) model, guided by theoretical analysis, to solve the TTC problem of the MM under both noise interference and joint angle constraints. The TTC problem with joint angle constraints can be transformed into a time-varying nonlinear equations (TVNE) problem. The theoretical analyses verify that the NSZNN-JAC model is able to maintain convergence under noise interference. The effectiveness and superiority of the NSZNN-JAC model are demonstrated by comparison in simulations. Moreover, the NSZNN-JAC model is applied to a physical platform, substantiating that it can perform the TTC task on a real platform.

JMLR Journal 2024 Journal Article

Label Alignment Regularization for Distribution Shift

Ehsan Imani
Guojun Zhang
Runjia Li
Jun Luo
Pascal Poupart
Philip H.S. Torr
Yangchen Pan

Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this observation, we propose a regularization method for unsupervised domain adaptation that encourages alignment between the predictions in the target domain and its top singular vectors. Unlike conventional domain adaptation approaches that focus on regularizing representations, we instead regularize the classifier to align with the unsupervised target data, guided by the LAP in both the source and target domains. Theoretical analysis demonstrates that, under certain assumptions, our solution resides within the span of the top right singular vectors of the target domain data and aligns with the optimal solution. By removing the reliance on the commonly used optimal joint risk assumption found in classic domain adaptation theory, we showcase the effectiveness of our method on addressing problems where traditional domain adaptation methods often fall short due to high joint error. Additionally, we report improved performance over domain adaptation baselines in well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis. An implementation is available at https://github.com/EhsanEI/lar/.


TMLR Journal 2023 Journal Article

Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation

Qingfeng Lan
Yangchen Pan
Jun Luo
A. Rupam Mahmood

Artificial neural networks are promising for general function approximation but challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later. However, a large replay buffer results in a heavy memory burden, especially for onboard and edge devices with limited memory capacities. We propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm to alleviate this problem. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance in both feature-based and image-based tasks while easing the burden of large experience replay buffers.


TMLR Journal 2023 Journal Article

Reinforcement Teaching

Calarina Muslimani
Alex Lewandowski
Dale Schuurmans
Matthew E. Taylor
Jun Luo

Machine learning algorithms learn to solve a task, but are unable to improve their ability to learn. Meta-learning methods learn about machine learning algorithms and improve them so that they learn more quickly. However, existing meta-learning methods are either hand-crafted to improve one specific component of an algorithm or only work with differentiable algorithms. We develop a unifying meta-learning framework, called Reinforcement Teaching, to improve the learning process of any algorithm. Under Reinforcement Teaching, a teaching policy is learned, through reinforcement, to improve a student's learning algorithm. To learn an effective teaching policy, we introduce the parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior. We further use learning progress to shape the teacher's reward, allowing it to more quickly maximize the student's performance. To demonstrate the generality of Reinforcement Teaching, we conduct experiments in which a teacher learns to significantly improve both reinforcement and supervised learning algorithms. Reinforcement Teaching outperforms previous work using heuristic reward functions and state representations, as well as other parameter representations.


NeurIPS Conference 2022 Conference Paper

A Simple Decentralized Cross-Entropy Method

Zichen Zhang
Jun Jin
Martin Jagersand
Jun Luo
Dale Schuurmans

Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL), where a centralized approach is typically used to update the sampling distribution based only on the results of the top-$k$ operation on the samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM that uses an ensemble of CEM instances running independently from one another, each performing a local improvement of its own sampling distribution. We provide both theoretical and empirical analysis to demonstrate the effectiveness of this simple decentralized approach. We empirically show that, compared to the classical centralized approach using either a single Gaussian distribution or even a mixture of Gaussian distributions, our DecentCEM finds the global optimum much more consistently and thus improves the sample efficiency. Furthermore, we plug our DecentCEM into the planning problem of MBRL and evaluate our approach in several continuous control environments, in comparison with the state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show sample efficiency improvement by simply replacing the classical CEM module with our DecentCEM module, while only sacrificing a reasonable amount of computational cost. Lastly, we conduct ablation studies for more in-depth analysis. Code is available at https://github.com/vincentzhang/decentCEM.
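The idea of an ensemble of independent CEM instances can be sketched roughly as follows. This is an illustrative toy in one dimension, not the authors' implementation: the function names (`cem_step`, `decentralized_cem`, `two_peaks`), the hyperparameters, the toy objective, and the rule of returning the best final mean across instances are all assumptions made for this example.

```python
import math
import random


def cem_step(objective, mean, std, rng, n_samples=50, top_k=5):
    """One classical CEM update: sample, keep the top-k, refit the Gaussian."""
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    elites = sorted(samples, key=objective, reverse=True)[:top_k]
    new_mean = sum(elites) / top_k
    new_var = sum((x - new_mean) ** 2 for x in elites) / top_k
    # Floor the standard deviation so the Gaussian never collapses to a point.
    return new_mean, max(math.sqrt(new_var), 1e-3)


def decentralized_cem(objective, init_means, init_std=1.0, iterations=30, seed=0):
    """Run one independent CEM instance per initial mean (assumed setup)."""
    rng = random.Random(seed)
    instances = [(m, init_std) for m in init_means]
    for _ in range(iterations):
        # Each instance updates only its own sampling distribution.
        instances = [cem_step(objective, m, s, rng) for m, s in instances]
    # Assumption: report the final mean with the highest objective value.
    return max((m for m, _ in instances), key=objective)


def two_peaks(x):
    """Hypothetical objective: local peak near x = -2, global peak near x = 3."""
    return math.exp(-(x + 2) ** 2) + 2 * math.exp(-(x - 3) ** 2)


best = decentralized_cem(two_peaks, init_means=[-4.0, 0.0, 4.0])
print(round(best, 1))
```

In this toy setting, an instance started near the local peak stays there, while another instance started elsewhere can still reach the global peak, which is the intuition the abstract gives for why independent instances are less vulnerable to local optima.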


IJCAI Conference 2022 Conference Paper

Adapt to Adaptation: Learning Personalization for Cross-Silo Federated Learning

Jun Luo
Shandong Wu

Conventional federated learning (FL) trains one global model for a federation of clients with decentralized data, reducing the privacy risk of centralized training. However, the distribution shift across non-IID datasets often poses a challenge to this one-model-fits-all solution. Personalized FL aims to mitigate this issue systematically. In this work, we propose APPLE, a personalized cross-silo FL framework that adaptively learns how much each client can benefit from other clients’ models. We also introduce a method to flexibly control the focus of training APPLE between global and local objectives. We empirically evaluate our method's convergence and generalization behaviors, and perform extensive experiments on two benchmark datasets and two medical imaging datasets under two non-IID settings. The results show that the proposed personalized FL framework, APPLE, achieves state-of-the-art performance compared to several other personalized FL approaches in the literature. The code is publicly available at https://github.com/ljaiverson/pFL-APPLE.


AAMAS Conference 2022 Conference Paper

Multiagent Q-learning with Sub-Team Coordination

Wenhan Huang
Kai Li
Kun Shao
Tianze Zhou
Jun Luo
Dongge Wang
Hangyu Mao
Jianye Hao

For cooperative multiagent reinforcement learning tasks, we propose a novel value factorization framework in the popular centralized training with decentralized execution paradigm, called multiagent Q-learning with sub-team coordination (QSCAN). This framework can flexibly exploit local coordination within sub-teams for effective factorization while honoring the individual-global-max (IGM) condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Empirical results show that QSCAN’s performance dominates state-of-the-art methods in predator-prey tasks and the Switch challenge in MA-Gym.


NeurIPS Conference 2022 Conference Paper

Multiagent Q-learning with Sub-Team Coordination

Wenhan Huang
Kai Li
Kun Shao
Tianze Zhou
Matthew Taylor
Jun Luo
Dongge Wang
Hangyu Mao

In many real-world cooperative multiagent reinforcement learning (MARL) tasks, teams of agents can rehearse together before deployment, but then communication constraints may force individual agents to execute independently when deployed. Centralized training and decentralized execution (CTDE), which focuses mainly on this setting, has become increasingly popular in recent years. In the value-based MARL branch, a credit assignment mechanism is typically used to factorize the team reward into each individual’s reward; individual-global-max (IGM) is a condition on the factorization ensuring that agents’ action choices coincide with the team’s optimal joint action. However, current architectures fail to consider local coordination within sub-teams, which should be exploited for more effective factorization and thus faster learning. We propose a novel value factorization framework, called multiagent Q-learning with sub-team coordination (QSCAN), to flexibly represent sub-team coordination while honoring the IGM condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Experimental results show that QSCAN’s performance dominates state-of-the-art methods in matrix games, predator-prey tasks, and the Switch challenge in MA-Gym. Additionally, QSCAN achieves comparable performance to those methods in a selection of StarCraft II micro-management tasks.
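The IGM condition mentioned in the abstract can be illustrated with a small check on tabular Q-values. This is a hedged sketch and not part of QSCAN: the helper names (`greedy_joint_action`, `satisfies_igm`) and the two toy payoff tables are hypothetical, chosen only to show one factorization that satisfies the condition and one that does not.

```python
from itertools import product


def greedy_joint_action(individual_qs):
    """Each agent independently picks the action maximizing its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in individual_qs)


def satisfies_igm(q_total, individual_qs):
    """Check that the greedy per-agent choices also maximize the joint value."""
    joint_actions = product(*(range(len(q)) for q in individual_qs))
    best_value = max(q_total[a] for a in joint_actions)
    return q_total[greedy_joint_action(individual_qs)] == best_value


# Hypothetical two-agent example with an additive factorization.
q1 = [1.0, 3.0]
q2 = [2.0, 0.5]
q_additive = {(a1, a2): q1[a1] + q2[a2] for a1 in range(2) for a2 in range(2)}

# A joint table for which the same individual Q-values violate the condition.
q_mismatch = {(0, 0): 8.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 6.0}

print(satisfies_igm(q_additive, [q1, q2]))
print(satisfies_igm(q_mismatch, [q1, q2]))
```

The additive table passes because each agent's greedy choice maximizes its own term of the sum, whereas in the second table the jointly optimal action cannot be recovered from the individual greedy choices.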


TIST Journal 2022 Journal Article

Urban Traffic Dynamics Prediction—A Continuous Spatial-temporal Meta-learning Approach

Yingxue Zhang
Yanhua Li
Xun Zhou
Jun Luo
Zhi-Li Zhang

Urban traffic status (e.g., traffic speed and volume) is highly dynamic in nature, namely, varying across space and evolving over time. Thus, predicting such traffic dynamics is of great importance to urban development and transportation management. However, it is very challenging to solve this problem due to spatial-temporal dependencies and traffic uncertainties. In this article, we solve the traffic dynamics prediction problem from a Bayesian meta-learning perspective and propose a novel continuous spatial-temporal meta-learner (cST-ML), which is trained on a distribution of traffic prediction tasks segmented by historical traffic data with the goal of learning a strategy that can be quickly adapted to related but unseen traffic prediction tasks. cST-ML tackles the traffic dynamics prediction challenges by advancing the Bayesian black-box meta-learning framework in the following ways: (1) cST-ML captures the dynamics of traffic prediction tasks using variational inference, and to better capture the temporal uncertainties within tasks, cST-ML operates as a rolling window within each task; (2) cST-ML has a novel architecture design, where CNN and LSTM are embedded to capture the spatial-temporal dependencies between traffic status and traffic-related features; (3) novel training and testing algorithms for cST-ML are designed. We also conduct experiments on two real-world traffic datasets (taxi inflow and traffic speed) to evaluate our proposed cST-ML. The experimental results verify that cST-ML can significantly improve the urban traffic prediction performance and outperform all baseline models, especially when obvious traffic dynamics and temporal uncertainties are present.


AAMAS Conference 2021 Conference Paper

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

Yaodong Yang
Jun Luo
Ying Wen
Oliver Slumbers
Daniel Graves
Haitham Bou Ammar
Jun Wang
Matthew E. Taylor

Multiagent reinforcement learning (MARL) has achieved a remarkable amount of success in solving various types of video games. A cornerstone of this success is the auto-curriculum framework, which shapes the learning process by continually creating new challenging tasks for agents to adapt to, thereby facilitating the acquisition of new skills. In order to extend MARL methods to real-world domains outside of video games, we envision in this blue sky paper that maintaining a diversity-aware auto-curriculum is critical for successful MARL applications. Specifically, we argue that behavioural diversity is a pivotal, yet under-explored, component for real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum. We list four open challenges for auto-curriculum techniques, which we believe deserve more attention from this community. Towards validating our vision, we recommend modelling realistic interactive behaviours in autonomous driving as an important test bed, and recommend the SMARTS/ULTRA benchmark.


TIST Journal 2020 Journal Article

DHPA

Menghai Pan
Weixiao Huang
Yanhua Li
Xun Zhou
Zhenming Liu
Rui Song
Hui Lu
Zhihong Tian

Many real-world human behaviors can be modeled and characterized as sequential decision-making processes, such as a taxi driver’s choices of working regions and times. Each driver possesses unique preferences over the sequential choices and improves working efficiency over time. Understanding the dynamics of such preferences helps accelerate the learning process of taxi drivers. Prior works on taxi operation management mostly focus on finding optimal driving strategies or routes, lacking in-depth analysis of what the drivers learned during the process and how it affects the performance of the driver. In this work, we make the first attempt to establish Dynamic Human Preference Analytics. We inversely learn the taxi drivers’ preferences from data and characterize the dynamics of such preferences over time. We extract two types of features (i.e., profile features and habit features) to model the decision space of drivers. Then, through inverse reinforcement learning, we learn the preferences of drivers with respect to these features. The results illustrate that self-improving drivers tend to keep adjusting their preferences for habit features to increase their earning efficiency while keeping their preferences for profile features invariant. However, experienced drivers have stable preferences over time. The exploring drivers tend to randomly adjust their preferences over time.


AAAI Conference 2020 Conference Paper

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Hangyu Mao
Wulong Liu
Jianye Hao
Jun Luo
Dong Li
Zhengchao Zhang
Jun Wang
Zhen Xiao

Social psychology and real experiences show that cognitive consistency plays an important role to keep human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters because humans only interact directly with their neighbors. Inspired by these observations, we take the first step to introduce neighborhood cognitive consistency (NCC) into multi-agent reinforcement learning (MARL). Our NCC design is quite general and can be easily combined with existing MARL methods. As examples, we propose neighborhood cognition consistent deep Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperations. Extensive experiments on several challenging tasks (i.e., packet routing, wifi configuration and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.


IS Journal 2015 Journal Article

Parallel Service Management Framework and Application to Railway Station Layout Planning

Lefei Li
Chen Lyu
Jun Luo
Shuyun Yang
Chenxu Dai

People always say that the customer is god, but do they really understand this "god"? In practice, the nature of service, such as its intangibility and heterogeneity, makes the effective management of service operations difficult. To better incorporate human behaviors and other complex factors to facilitate service management decisions, the authors propose a new approach called parallel service management (PSM). PSM, which follows the ACP framework, is an emerging methodology in complex systems modeling and analysis. In PSM, customer agents, service employee agents, service organization agents, and the service environment construct an artificial service system, which can run in parallel with the real service system. The authors developed an agent-based artificial service system for the Chengdu East Railway Station, which let them make better layout planning decisions. The modification plan was then implemented in the station and achieved a significant increase in the store entering rate. Statistical comparison demonstrates PSM's potential to enhance the performance of real service systems.


TCS Journal 2014 Journal Article

Voronoi diagram with visual restriction

Chenglin Fan
Jun Luo
Wencheng Wang
Binhai Zhu

In a normal Voronoi diagram, each site is able to see all the points in the plane. In this paper, we study the case such that each site is only able to see a visually restricted region in the plane and construct the so-called Visual Restriction Voronoi Diagram (VRVD). We show that the visual restriction Voronoi cell of each site is not necessarily convex and it could consist of many disjoint regions. We prove that the combinatorial complexity of the VRVD on $n$ sites is $\Theta(n^2)$, and then show that the VRVD can be constructed in $O(n^2)$ time and $O(n^2)$ space. Besides that, we also give another algorithm with an extra $\log n$ factor of running time to compute VRVD, which is easy to implement in practice.


TCS Journal 2013 Journal Article

Preface

Binhai Zhu
Jun Luo


IROS Conference 2009 Conference Paper

Adaptive dynamic coupling control of human-symbiotic wheeled mobile manipulators with hybrid joints

Zhijun Li
Jun Luo
Lei Dai

In this paper, adaptive dynamic coupling control is considered for hybrid joints, which can be switched to either active (actuated) or passive (under-actuated) mode, in human-symbiotic wheeled mobile manipulators. Based on Lyapunov synthesis, the adaptive coupling control proposed for passive hybrid joints, which uses physical properties of wheeled mobile manipulators, ensures that the system outputs track the given bounded reference signals within a small neighborhood of zero, and guarantees semi-global uniform boundedness of all closed-loop signals. The effectiveness of the proposed controls is verified through extensive simulations.
