Arrow Research search

Author name cluster

Wenhao Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers

40

AAAI Conference 2026 Conference Paper

Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models

  • Kunhao Li
  • Wenhao Li
  • Di Wu
  • Lei Yang
  • Jun Bai
  • Ju Jia
  • Jason Xue

Multimodal Large Language Models (MLLMs) extend foundation models to real-world applications by integrating inputs such as text and vision. However, their broad knowledge capacity raises growing concerns about privacy leakage, toxicity mitigation, and intellectual property violations. Machine Unlearning (MU) offers a practical solution by selectively forgetting targeted knowledge while preserving overall model utility. When applied to MLLMs, existing neuron-editing-based MU approaches face two fundamental challenges: (i) forgetting becomes inconsistent across modalities because existing point-wise attribution methods fail to capture the structured, layer-by-layer information flow that connects different modalities; and (ii) general knowledge performance declines when sensitive neurons that also support important reasoning paths are pruned, as this disrupts the model's ability to generalize. To alleviate these limitations, we propose a multimodal influential neuron path editor (MIP-Editor) for MU. Our approach introduces modality-specific attribution scores to identify influential neuron paths responsible for encoding forget-set knowledge and applies influential-path-aware neuron editing via representation misdirection. This strategy enables effective and coordinated forgetting across modalities while preserving the model's general capabilities. Experimental results demonstrate that MIP-Editor achieves superior unlearning performance on multimodal tasks, with a maximum forgetting rate of 87.75% and up to 54.26% improvement in general knowledge retention. On textual tasks, MIP-Editor achieves up to 80.65% forgetting and preserves 77.90% of general performance.

AAAI Conference 2026 Conference Paper

KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction

  • Kangxiang Xia
  • Xinfa Zhu
  • Jixun Yao
  • Wenjie Tian
  • Wenhao Li
  • Lei Xie

We introduce KALL-E, a novel autoregressive (AR) language model for text-to-speech (TTS) synthesis that operates by predicting the next distribution of continuous speech frames. Unlike existing methods, KALL-E directly models the continuous speech distribution conditioned on text, eliminating the need for any diffusion-based components. Specifically, we utilize a Flow-VAE to extract a continuous latent speech representation from waveforms, instead of relying on discrete speech tokens. A single AR Transformer is then trained to predict these continuous speech distributions from text, optimizing a Kullback–Leibler divergence loss as its objective. Experimental results demonstrate that KALL-E achieves superior speech synthesis quality and can even adapt to a target speaker from just a single sample. Importantly, KALL-E provides a more direct and effective approach for utilizing continuous speech representations in TTS.
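The next-distribution objective above hinges on a Kullback–Leibler divergence between predicted and reference speech-frame distributions. As a hedged illustration only (the abstract does not give KALL-E's exact formulation), the sketch below shows the standard analytic KL between two diagonal Gaussians; all names are illustrative:

```python
import numpy as np

def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """Analytic KL(p || q) for diagonal Gaussians, summed over dimensions.

    A generic sketch of the kind of distribution-matching loss the
    abstract describes; KALL-E's actual objective may differ.
    """
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    kl_per_dim = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return kl_per_dim.sum()

# KL of a distribution against itself is zero; shifting the mean by 1
# with unit variance costs 0.5 nats per dimension.
d = 4
zeros = np.zeros(d)
print(kl_diag_gaussians(zeros, zeros, zeros, zeros))        # 0.0
print(kl_diag_gaussians(zeros, zeros, zeros + 1.0, zeros))  # 2.0 (0.5 * 4 dims)
```

In a model like the one described, `mu_q`/`logvar_q` would be the network's per-frame predictions and the loss would be minimized over a training corpus.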

AAAI Conference 2026 Conference Paper

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

  • Shuzheng Si
  • Haozhe Zhao
  • Cheng Gao
  • Yuzhuo Bai
  • Zhitong Wang
  • Bofei Gao
  • Kangyang Luo
  • Wenhao Li

Teaching large language models (LLMs) to be faithful to the provided context is crucial for building reliable information-seeking systems. Therefore, we propose a systematic framework, CANOE, to reduce faithfulness hallucinations of LLMs across different downstream tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data with four diverse tasks to construct high-quality and easily verifiable training data without human annotation. Also, we propose Dual-GRPO, a rule-based reinforcement learning method that includes three tailored rule-based rewards derived from synthesized short-form QA data, while simultaneously optimizing both short-form and long-form response generation. Notably, Dual-GRPO eliminates the need to manually label preference data to train reward models and avoids over-optimizing short-form generation when relying only on the synthesized short-form QA data. Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different tasks, even outperforming the most advanced LLMs, e.g., GPT-4o and OpenAI o1.

ICLR Conference 2025 Conference Paper

DICE: Data Influence Cascade in Decentralized Learning

  • Tongtian Zhu
  • Wenhao Li
  • Can Wang 0001
  • Fengxiang He

Decentralized learning offers a promising approach to crowdsourcing data consumption and computational workloads across geographically distributed compute resources interconnected through peer-to-peer networks, accommodating exponentially increasing demands. However, proper incentives are still absent, considerably discouraging participation. Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges arising from the localized connections that make influence "cascade" in a decentralized network. To overcome this, we design the first method to estimate Data Influence CascadE (DICE) in a decentralized environment. Theoretically, the framework derives tractable approximations of influence cascade over arbitrary neighbor hops, suggesting that the influence cascade is determined by an interplay of data, communication topology, and the curvature of the loss landscape. DICE also lays the foundations for applications including selecting suitable collaborators and identifying malicious behaviors. The project page is available at https://raiden-zhu.github.io/blog/2025/DICE.

AAMAS Conference 2025 Conference Paper

Dynamic Conservative Degree Allocation for Offline Multi-Agent Reinforcement Learning

  • Haosheng Chen
  • Yun Hua
  • Junjie Sheng
  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang

Offline Multi-agent Reinforcement Learning (MARL) is designed to learn policies from pre-collected datasets without real-time interaction in multi-agent systems. A primary concern in offline MARL is conservative degree allocation, which involves assigning different conservatism levels to agents based on their varying influence on the system. Current approaches frequently neglect this crucial aspect, resulting in suboptimal performance, particularly when agents have differing impacts on the environment. In this paper, we propose OMCDA, a novel offline MARL algorithm that addresses the issue of conservative degree allocation by assigning dynamic conservatism levels to each agent based on their individual influence on system performance. OMCDA decomposes the Q-function into two components: one for computing the return and another for capturing deviations from the behavior policy. Additionally, OMCDA employs a dynamic allocation mechanism that adjusts conservatism levels for agents based on their varying impacts, while maintaining coherent credit assignment and ensuring robust system performance throughout learning. We evaluate OMCDA on MuJoCo and SMAC, showing it outperforms existing offline MARL methods in challenging tasks by effectively addressing conservative degree allocation.

JBHI Journal 2025 Journal Article

Enhancing Automated Seizure Detection via Self-Calibrating Spatial-Temporal EEG Features with SC-LSTM

  • Wenhao Li
  • Qiran Chen
  • Zhenyu Hou
  • Shi Chang
  • Zhenhong Ye
  • Jiangping Chen
  • Guan Ning Lin

Epilepsy, a highly individualized neurological disorder, affects millions globally. Electroencephalography (EEG) remains the cornerstone for seizure diagnosis, yet manual interpretation is labor-intensive and often unreliable due to the complexity of multi-channel, high-dimensional data. Traditional machine learning models often struggle with overfitting and fail to fully capture the high-dimensional, temporal dynamics of EEG signals, restricting their clinical utility. In this study, we propose SC-LSTM, a novel hybrid deep learning architecture that integrates dynamic spatial and temporal feature extraction to enhance automated seizure detection. SC-LSTM comprises a Self-Calibrated Reconstruction Module (SCConvNet) for adaptive spatial feature representation and a Bidirectional Long Short-Term Memory (Bi-LSTM) network for modeling temporal dependency. This parallel processing framework captures patient-specific EEG variability more effectively than traditional sequential models, promoting robust and discriminative feature learning. Comprehensive evaluations on two real-world neonatal EEG datasets, using K-fold cross-validation and simulated single-channel signal loss, demonstrate that SC-LSTM achieved an accuracy of 97% and an area under the curve of 0.99, significantly surpassing the performance of CNN and CNN-LSTM models. Importantly, SC-LSTM maintained high diagnostic performance even under conditions of partial data loss from critical brain regions, underscoring its resilience to clinical variability and signal artifacts. By improving accuracy, stability, and adaptability in seizure detection, SC-LSTM exemplifies the application of artificial intelligence to support individualized diagnostics and embodies the core principles of precision medicine. The open-source availability of SC-LSTM further facilitates reproducibility, clinical translation, and future extensions across broader neurological disorder monitoring applications.

NeurIPS Conference 2025 Conference Paper

LOPT: Learning Optimal Pigovian Tax in Sequential Social Dilemmas

  • Yun Hua
  • Shang Gao
  • Wenhao Li
  • Haosheng Chen
  • Bo Jin
  • Xiangfeng Wang
  • Jun Luo
  • Hongyuan Zha

Multi-agent reinforcement learning (MARL) has emerged as a powerful framework for modeling autonomous agents that independently optimize their individual objectives. However, in mixed-motive MARL environments, rational self-interested behaviors often lead to collectively suboptimal outcomes, situations commonly referred to as social dilemmas. A key challenge in addressing social dilemmas lies in accurately quantifying and representing them in a numerical form that captures how self-interested agent behaviors impact social welfare. To address this challenge, the economic concept of externalities is adopted and extended to denote the unaccounted-for impact of one agent's actions on others, as a means to rigorously quantify social dilemmas. Based on this measurement, a novel method, Learning Optimal Pigovian Tax (LOPT), is proposed. Inspired by Pigovian taxes, which are designed to internalize externalities by imposing costs on negative societal impacts, LOPT employs an auxiliary tax agent that learns an optimal Pigovian tax policy to reshape individual rewards in alignment with social welfare, thereby promoting agent coordination and mitigating social dilemmas. We support LOPT with theoretical analysis and validate it on standard MARL benchmarks, including Escape Room and Cleanup. Results show that by effectively internalizing externalities that quantify social dilemmas, LOPT aligns individual objectives with collective goals, significantly improving social welfare over state-of-the-art baselines.

AAMAS Conference 2025 Conference Paper

Negotiated Reasoning: On Provably Addressing Relative Over-Generalization

  • Junjie Sheng
  • Wenhao Li
  • Bo Jin
  • Hongyuan Zha
  • Jun Wang
  • Xiangfeng Wang

We focus on the relative over-generalization (RO) issue in fully cooperative multi-agent reinforcement learning (MARL). Existing methods show that endowing agents with reasoning can help mitigate RO empirically, but there is little theoretical insight. We first prove that RO is avoided when agents satisfy a consistent reasoning requirement. We then propose a new negotiated reasoning framework connecting reasoning and RO with theoretical guarantees. Based on it, we develop an algorithm called Stein variational negotiated reasoning (SVNR), which uses Stein variational gradient descent to form a negotiation policy that provably bypasses RO under maximum-entropy policy iteration. SVNR is further parameterized with neural networks for computational efficiency. Experiments demonstrate that SVNR significantly outperforms baselines on RO-challenged tasks, confirming its advantage in achieving better cooperation.

JBHI Journal 2025 Journal Article

Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines

  • Wenhao Li
  • Hongkuan Zhang
  • Hongwei Zhang
  • Zhengxu Li
  • Zengjie Dong
  • Yafan Chen
  • Niranjan Bidargaddi
  • Hong Liu

Current medical language models, adapted from large language models, typically predict ICD code-based diagnoses from electronic health records (EHRs) because these labels are readily available. However, ICD codes do not capture the nuanced, context-rich reasoning clinicians use for diagnosis. Clinicians synthesize diverse patient data and reference clinical practice guidelines (CPGs) to make evidence-based decisions. This misalignment limits the clinical utility of existing models. We introduce GARMLE-G, a Generation-Augmented Retrieval framework that grounds medical language model outputs in authoritative CPGs. Unlike conventional Retrieval-Augmented Generation (RAG)-based approaches, GARMLE-G enables hallucination-free outputs by directly retrieving authoritative guideline content without relying on model-generated text. It (1) integrates LLM predictions with EHR data to create semantically rich queries, (2) retrieves relevant CPG knowledge snippets via embedding similarity, and (3) fuses guideline content with model output to generate clinically aligned recommendations. A prototype system for hypertension and coronary heart disease diagnosis was developed and evaluated on multiple metrics, demonstrating superior retrieval precision, semantic relevance, and clinical guideline adherence compared to RAG-based baselines, while maintaining a lightweight architecture suitable for localized healthcare deployment. This work provides a scalable, low-cost, and hallucination-free method for grounding medical language models in evidence-based clinical practice, with strong potential for broader clinical deployment.

ICML Conference 2025 Conference Paper

Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction

  • Yudong Xu 0001
  • Wenhao Li
  • Scott Sanner
  • Elias B. Khalil

We present a Transformer-based framework for Constraint Satisfaction Problems (CSPs). CSPs find use in many applications and thus accelerating their solution with machine learning is of wide interest. Most existing approaches rely on supervised learning from feasible solutions or reinforcement learning, paradigms that require either feasible solutions to these NP-Complete CSPs or large training budgets and a complex expert-designed reward signal. To address these challenges, we propose ConsFormer, a self-supervised framework that leverages a Transformer as a solution refiner. ConsFormer constructs a solution to a CSP iteratively in a process that mimics local search. Instead of using feasible solutions as labeled data, we devise differentiable approximations to the discrete constraints of a CSP to guide model training. Our model is trained to improve random assignments for a single step but is deployed iteratively at test time, circumventing the bottlenecks of supervised and reinforcement learning. Experiments on Sudoku, Graph Coloring, Nurse Rostering, and MAXCUT demonstrate that our method can tackle out-of-distribution CSPs simply through additional iterations.
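The "differentiable approximations to the discrete constraints" idea in the abstract can be illustrated with a toy surrogate: relax an all-different constraint by penalizing overlap between the per-cell softmax distributions over values. This is a generic stand-in with invented names and sizes, not ConsFormer's actual training loss:

```python
import numpy as np

def soft_all_different(logits):
    """Differentiable surrogate for an all-different constraint.

    Each row of `logits` is a cell's scores over candidate values.
    The penalty is the total pairwise overlap of the softmax
    distributions: near zero when the soft assignments do not collide,
    and large when cells concentrate on the same value. A toy sketch
    of the differentiable-constraint idea, not the paper's loss.
    """
    z = logits - logits.max(axis=1, keepdims=True)          # stabilized softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # (cells, values)
    gram = probs @ probs.T                                   # pairwise overlaps
    off_diag = gram.sum() - np.trace(gram)
    return off_diag / 2.0                                    # unordered pairs

distinct = soft_all_different(10.0 * np.eye(3))                      # near 0
colliding = soft_all_different(np.tile([10.0, 0.0, 0.0], (3, 1)))    # large
```

In a self-supervised setting, gradients of such a penalty with respect to the model's output logits would guide training without any feasible-solution labels.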

NeurIPS Conference 2025 Conference Paper

Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents

  • Yun Hua
  • Haosheng Chen
  • Shiqin Wang
  • Wenhao Li
  • Xiangfeng Wang
  • Jun Luo

Large Language Models (LLMs) are increasingly deployed as autonomous agents in multi-agent systems, and promising coordination has been demonstrated in handling complex tasks under predefined roles and scripted workflows. However, significant challenges remain in open-ended environments, where agents are inherently self-interested and explicit coordination guidelines are absent. In such scenarios, misaligned incentives frequently lead to social dilemmas and inefficient collective outcomes. Inspired by how human societies tackle similar coordination challenges through temporary collaborations such as employment or subcontracting, a cooperative workflow, Shapley-Coop, is proposed. This workflow enables self-interested LLM agents to engage in emergent collaboration by using a fair credit allocation mechanism to ensure each agent's contributions are appropriately recognized and rewarded. Shapley-Coop introduces structured negotiation protocols and Shapley-inspired reasoning to estimate agents' marginal contributions, thereby enabling effective task-time coordination and equitable post-task outcome redistribution. This results in effective coordination that fosters collaboration while preserving agent autonomy, through a rational pricing mechanism that encourages cooperative behavior. Evaluated in two multi-agent games and a software engineering simulation, Shapley-Coop consistently enhances LLM agent collaboration and facilitates equitable outcome redistribution, accurately reflecting individual contributions during the task execution process.
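Shapley-style credit assignment, as invoked above, averages each agent's marginal contribution over all join orders. A minimal sketch of exact Shapley values for a toy coalition game follows; the coalition values are invented for illustration, and the paper estimates contributions via LLM negotiation rather than exact enumeration:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    value(S + p) - value(S) over all orderings of the players.
    Exponential-time, so only viable for tiny toy games."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = frozenset(coalition | {p})
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical two-agent task: either agent alone earns 1, together they earn 4.
v = lambda s: {0: 0.0, 1: 1.0, 2: 4.0}[len(s)]
print(shapley_values(["A", "B"], v))   # symmetric agents -> 2.0 each
```

The efficiency property (shares sum to the grand-coalition value) is what makes this a natural basis for the "equitable post-task outcome redistribution" the abstract describes.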

IJCAI Conference 2025 Conference Paper

SkyRover: A Modular Simulator for Cross-Domain Pathfinding

  • Wenhui Ma
  • Wenhao Li
  • Bo Jin
  • Changhong Lu
  • Xiangfeng Wang

Unmanned Aerial Vehicles (UAVs) and Automated Guided Vehicles (AGVs) increasingly collaborate in logistics, surveillance, and inspection tasks. However, existing simulators often focus on a single domain, limiting cross-domain study. This paper presents SkyRover, a modular simulator for UAV-AGV multi-agent pathfinding (MAPF). SkyRover supports realistic agent dynamics, configurable 3D environments, and convenient APIs for external solvers and learning methods. By unifying ground and aerial operations, it facilitates cross-domain algorithm design, testing, and benchmarking. Experiments highlight SkyRover's capacity for efficient pathfinding and high-fidelity simulations in UAV-AGV coordination. We believe SkyRover fills a key gap in MAPF research. The project is available at https://sites.google.com/view/mapf3d/home.

NeurIPS Conference 2025 Conference Paper

Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval

  • Wenhao Li
  • Yuxin Zhang
  • Gen Luo
  • Haiyuan Wan
  • ZiYang Gong
  • Fei Chao
  • Rongrong Ji

Reducing the key-value (KV) cache burden in Large Language Models (LLMs) significantly accelerates inference. Dynamically selecting critical KV caches during decoding helps maintain performance. Existing methods use random linear hashing to identify important tokens, but this approach is inefficient due to the orthogonal distribution of queries and keys within two narrow cones in LLMs. We introduce Spotlight Attention, a novel method that employs non-linear hashing functions to optimize the embedding distribution of queries and keys, enhancing coding efficiency and robustness. We also developed a lightweight, stable training framework using a Bradley-Terry ranking-based loss, enabling optimization of the non-linear hashing module on GPUs with 16GB memory in 8 hours. Experimental results show that Spotlight Attention drastically improves retrieval precision while shortening the length of the hash code at least 5× compared to traditional linear hashing. Finally, we exploit the computational advantages of bitwise operations by implementing specialized CUDA kernels, achieving hashing retrieval for 512K tokens in under 100 µs on a single A100 GPU, with end-to-end throughput up to 3× higher than vanilla decoding.
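For intuition, the random linear hashing baseline that this abstract contrasts with can be sketched as sign-of-projection bit codes compared via bitwise XOR and popcount. This is a toy illustration with made-up sizes, not the paper's learned non-linear hashing module or its CUDA kernels:

```python
import numpy as np

def linear_hash(vectors, projection):
    """Sign-of-random-projection hashing: each vector becomes an n-bit
    integer code. This is the random *linear* baseline the abstract
    argues against; Spotlight Attention learns a non-linear module."""
    bits = (np.atleast_2d(vectors) @ projection) > 0               # (n, n_bits)
    weights = np.uint64(1) << np.arange(bits.shape[1], dtype=np.uint64)
    return (bits.astype(np.uint64) * weights).sum(axis=1)

def hamming_topk(query_code, key_codes, k):
    """Indices of the k codes closest to query_code in Hamming distance,
    using only XOR and bit counting (popcount-style)."""
    xor = np.bitwise_xor(key_codes, np.uint64(query_code))
    dists = np.array([bin(int(x)).count("1") for x in xor])
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
keys = rng.normal(size=(1000, 64))    # stand-in for cached key vectors
proj = rng.normal(size=(64, 16))      # 16-bit codes
key_codes = linear_hash(keys, proj)
top = hamming_topk(key_codes[42], key_codes, k=8)  # a code retrieves itself
```

The bitwise comparison is what makes hardware implementations fast; the paper's contribution is making the codes themselves far more discriminative so they can also be much shorter.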

JBHI Journal 2025 Journal Article

SSGraphDTI: A Drug-Target Interaction Prediction Method Integrated Structural and Dynamic Systemic Biology Attributes

  • Haotian Guan
  • Tian Bai
  • Jingtong Zhao
  • Wenhao Li
  • Han Wang

Drug-Target Interaction (DTI) is a crucial aspect of pharmaceutical development. However, biochemical experiments are prohibitively expensive for identifying these interactions on a large scale, while computational approaches have yet to deliver highly reliable predictions. To promote prediction accuracy, drug-related molecular networks have gradually been introduced to this task to furnish valuable information. We hypothesized that integrating structural and systemic biological attributes could effectively enhance the performance of DTI prediction and proposed a novel DTI prediction model, SSGraphDTI, which integrates the two aforementioned attributes. Specifically, the structural attributes of drugs and targets are extracted using independent convolutional neural network based models from the Simplified Molecular Input Line Entry System representations of drugs and the amino acid sequences of targets, respectively. Meanwhile, the systemic biological attributes of drug-target pairs are obtained through graph representation learning on the dynamically constructed heterogeneous drug-target interaction network. SSGraphDTI was meticulously trained and rigorously tested on the benchmark Dataset_DrugBank, achieving an improvement of approximately 1.0% across five metrics compared to recent comparable methods. These results underscore the potential of combining both structural and systemic information for accurate DTI prediction. Benefiting from the fact that the input consists solely of structural data without requiring interaction information, the model effectively addresses the “cold-start problem” in drug discovery. Furthermore, by extracting systemic attributes directly from the dynamically constructed DTI networks, the model maintains strong predictive performance even when data is limited. The source code is available at https://github.com/NENUBioCompute/SSGraphDTI.

AAAI Conference 2025 Conference Paper

SVTformer: Spatial-View-Temporal Transformer for Multi-View 3D Human Pose Estimation

  • Wanruo Zhang
  • Mengyuan Liu
  • Hong Liu
  • Wenhao Li

Recently, transformer-based methods have been introduced to estimate 3D human pose from multiple views by aggregating the spatial-temporal information of human joints to achieve the lifting of 2D to 3D. However, previous approaches cannot model the inter-frame correspondence of each view's joints individually, nor can they directly consider all view interactions at each time, leading to insufficient learning of multi-view associations. To address this issue, we propose a Spatial-View-Temporal transformer (SVTformer) to decouple spatial-view-temporal information in sequential order for correlation learning and model dependencies between them in a local-to-global manner. SVTformer includes an attended Spatial-View-Temporal (SVT) patch embedding to attentively capture the local features of the input poses and stacked SVT encoders to extract global spatial-view-temporal dependencies progressively. Specifically, SVT encoders perform three reconstructions sequentially on attended features, learning through view decoupling for temporal-enhanced spatial correlation, temporal decoupling for spatial-enhanced view correlation, and another view decoupling for the spatial-enhanced temporal relationship. This decoupling-coupling-decoupling multi-view scheme enables us to alternately model the inter-joint spatial relationships, cross-view dependencies, and temporal motion associations. We evaluate the proposed SVTformer on three popular 3D HPE datasets, and it yields state-of-the-art performance. It effectively deals with ill-posed problems and enhances the accuracy of 3D human pose estimation.

TMLR Journal 2025 Journal Article

Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects

  • Wenhao Li
  • Yudong Xu
  • Scott Sanner
  • Elias Boutros Khalil

The Abstraction and Reasoning Corpus (ARC) is a popular benchmark focused on visual reasoning in the evaluation of Artificial Intelligence systems. In its original framing, an ARC task requires solving a program synthesis problem over small 2D images using a few input-output training pairs. In this work, we adopt the recently popular data-driven approach to the ARC and ask whether a Vision Transformer (ViT) can learn the implicit mapping, from input image to output image, that underlies the task. We show that a ViT—otherwise a state-of-the-art model for images—fails dramatically on most ARC tasks even when trained on one million examples per task. This points to an inherent representational deficiency of the ViT architecture that makes it incapable of uncovering the simple structured mappings underlying the ARC tasks. Building on these insights, we propose VITARC, a ViT-style architecture that unlocks some of the visual reasoning capabilities required by the ARC. Specifically, we use a pixel-level input representation, design a spatially-aware tokenization scheme, and introduce a novel object-based positional encoding that leverages automatic segmentation, among other enhancements. Our task-specific VITARC models achieve a test solve rate close to 100% on more than half of the 400 public ARC tasks strictly through supervised learning from input-output grids. This calls attention to the importance of imbuing the powerful (Vision) Transformer with the correct inductive biases for abstract visual reasoning that are critical even when the training data is plentiful and the mapping is noise-free. Hence, VITARC provides a strong foundation for future research in visual reasoning using transformer-based architectures.

AAAI Conference 2025 Conference Paper

TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation

  • Jiajie Liu
  • Mengyuan Liu
  • Hong Liu
  • Wenhao Li

Recent multi-frame lifting methods have dominated 3D human pose estimation. However, previous methods ignore the intricate dependence within the 2D pose sequence and learn only a single temporal correlation. To alleviate this limitation, we propose TCPFormer, which leverages an implicit pose proxy as an intermediate representation. Each proxy within the implicit pose proxy can build one temporal correlation, thereby helping us learn a more comprehensive temporal correlation of human motion. Specifically, our method consists of three key components: Proxy Update Module (PUM), Proxy Invocation Module (PIM), and Proxy Attention Module (PAM). PUM first uses pose features to update the implicit pose proxy, enabling it to store representative information from the pose sequence. PIM then invokes and integrates the pose proxy with the pose sequence to enhance the motion semantics of each pose. Finally, PAM leverages the above mapping between the pose sequence and pose proxy to enhance the temporal correlation of the whole pose sequence. Experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our proposed TCPFormer outperforms the previous state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning

  • Wenhao Li
  • Qiangchang Wang
  • Xianjing Meng
  • Zhibin Wu
  • Yilong Yin

Few-shot learning (FSL) aims to recognize novel concepts from only a few labeled support samples. Recent studies enhance support features by incorporating additional semantic information (e.g., class descriptions) or designing complex semantic fusion modules. However, these methods still suffer from hallucinating semantics that contradict the visual evidence due to the lack of grounding in actual instances, resulting in noisy guidance and costly corrections. To address these issues, we propose a novel framework, bridging Vision and Text with LLMs for Few-Shot Learning (VT-FSL), which constructs precise cross-modal prompts conditioned on Large Language Models (LLMs) and support images, seamlessly integrating them through a geometry-aware alignment mechanism. It mainly consists of Cross-modal Iterative Prompting (CIP) and Cross-modal Geometric Alignment (CGA). Specifically, the CIP conditions an LLM on both class names and support images to generate precise class descriptions iteratively in a single structured reasoning pass. These descriptions not only enrich the semantic understanding of novel classes but also enable the zero-shot synthesis of semantically consistent images. The descriptions and synthetic images act respectively as complementary textual and visual prompts, providing high-level class semantics and low-level intra-class diversity to compensate for limited support data. Furthermore, the CGA jointly aligns the fused textual, support, and synthetic visual representations by minimizing the kernelized volume of the 3-dimensional parallelotope they span. It captures global and nonlinear relationships among all representations, enabling structured and consistent multimodal integration. The proposed VT-FSL method establishes new state-of-the-art performance across ten diverse benchmarks, including standard, cross-domain, and fine-grained few-shot learning scenarios. Code is available at https://github.com/peacelwh/VT-FSL.

IROS Conference 2024 Conference Paper

C³P-VoxelMap: Compact, Cumulative and Coalescible Probabilistic Voxel Mapping

  • Xu Yang
  • Wenhao Li
  • Qijie Ge
  • Lulu Suo
  • Weijie Tang
  • Zhengyu Wei
  • Longxiang Huang
  • Bo Wang

This work presents a compact, cumulative, and coalescible probabilistic voxel mapping method to enhance performance, accuracy, and memory efficiency in LiDAR odometry. Probabilistic voxel mapping requires storing past point clouds and re-iterating them to update the uncertainty at every iteration, which consumes large memory space and CPU cycles. To solve this problem, we propose a two-fold strategy. First, we introduce a compact point-free representation for probabilistic voxels and derive a cumulative update of the planar uncertainty without caching original point clouds. Our voxel structure only keeps track of a predetermined set of statistics for points that lie inside it. This method reduces the runtime complexity from O(MN) to O(N) and the space complexity from O(N) to O(1) where M is the number of iterations and N is the number of points. Second, to further minimize memory usage and enhance mapping accuracy, we provide a strategy to dynamically merge voxels associated with the same physical planes by taking advantage of the geometric features in the real world. Rather than constantly scanning for these coalescible voxels at every iteration, our merging strategy accumulates voxels in a locality-sensitive hash and triggers merging lazily. On-demand merging reduces memory footprint with minimal computational overhead and improves localization accuracy thanks to cross-voxel denoising. Experiments exhibit 20% higher accuracy, 20% faster performance, and 70% lower memory consumption than the state-of-the-art.
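The point-free cumulative update described above can be sketched with sufficient statistics: a voxel keeps only the point count, the sum of points, and the sum of outer products, from which the planar fit (and a merge of two voxels) is recovered in O(1) space. A minimal sketch under these assumptions, not the paper's exact uncertainty model:

```python
import numpy as np

class CumulativeVoxel:
    """Point-free voxel statistics: store only n, sum(p), sum(p p^T),
    so the planar fit updates without caching the original points.
    Illustrative sketch of the idea in the abstract."""

    def __init__(self, dim=3):
        self.n = 0
        self.s = np.zeros(dim)          # running sum of points
        self.ss = np.zeros((dim, dim))  # running sum of outer products

    def add(self, p):
        p = np.asarray(p, dtype=float)
        self.n += 1
        self.s += p
        self.ss += np.outer(p, p)

    def merge(self, other):
        """Coalesce two voxels by adding their sufficient statistics."""
        self.n += other.n
        self.s += other.s
        self.ss += other.ss

    def plane_normal(self):
        mean = self.s / self.n
        cov = self.ss / self.n - np.outer(mean, mean)
        # Eigenvector of the smallest eigenvalue approximates the plane normal.
        w, v = np.linalg.eigh(cov)      # eigh sorts eigenvalues ascending
        return v[:, 0]

voxel = CumulativeVoxel()
for x in range(5):
    for y in range(5):
        voxel.add((x, y, 0.0))          # points on the z = 0 plane
normal = voxel.plane_normal()           # close to (0, 0, ±1)
```

Because `merge` just sums statistics, lazily coalescing voxels on the same physical plane (as the abstract describes) costs a few vector additions rather than re-iterating any points.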

IJCAI Conference 2024 Conference Paper

Carbon Market Simulation with Adaptive Mechanism Design

  • Han Wang
  • Wenhao Li
  • Hongyuan Zha
  • Baoxiang Wang

A carbon market is a market-based tool that incentivizes economic agents to align individual profits with the global utility, i.e., reducing carbon emissions to tackle climate change. Cap and trade stands as a critical principle based on allocating and trading carbon allowances (carbon emission credits), enabling economic agents to follow planned emissions and penalizing excess emissions. A central authority is responsible for introducing and allocating those allowances in cap and trade. However, the complexity of carbon market dynamics makes accurate simulation intractable, which in turn hinders the design of effective allocation strategies. To address this, we propose an adaptive mechanism design framework, simulating the market using hierarchical, model-free multi-agent reinforcement learning (MARL). Government agents allocate carbon credits, while enterprises engage in economic activities and carbon trading. This framework illustrates agents' behavior comprehensively. Numerical results show that MARL enables government agents to balance productivity, equality, and carbon emissions. Our project is available at https://anonymous.4open.science/r/Carbon-Simulator.

NeurIPS Conference 2024 Conference Paper

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images

  • Zhecan Wang
  • Junzhang Liu
  • Chia-Wei Tang
  • Hani Alomari
  • Anushka Sivakumar
  • Rui Sun
  • Wenhao Li
  • Hammad Ayyubi

Existing vision-language understanding benchmarks largely consist of images of objects in their usual contexts. As a consequence, recent multimodal large language models can perform well with only a shallow visual understanding by relying on background language biases. Thus, strong performance on these benchmarks does not necessarily correlate with strong visual understanding. In this paper, we release JourneyBench, a comprehensive human-annotated benchmark of generated images designed to assess the model's fine-grained multimodal reasoning abilities across five tasks: complementary multimodal chain of thought, multi-image VQA, imaginary image captioning, VQA with hallucination triggers, and fine-grained retrieval with sample-specific distractors. Unlike existing benchmarks, JourneyBench explicitly requires fine-grained multimodal reasoning in unusual imaginary scenarios where language bias and holistic image gist are insufficient. We benchmark state-of-the-art models on JourneyBench and analyze performance along a number of fine-grained dimensions. Results across all five tasks show that JourneyBench is exceptionally challenging for even the best models, indicating that models' visual reasoning abilities are not as strong as they first appear. We discuss the implications of our findings and propose avenues for further research.

ICRA Conference 2024 Conference Paper

Learning Dual-arm Object Rearrangement for Cartesian Robots

  • Shishun Zhang
  • Qijin She
  • Wenhao Li
  • Chenyang Zhu 0002
  • Yongjun Wang
  • Ruizhen Hu
  • Kai Xu 0004

This work focuses on the dual-arm object rearrangement problem abstracted from a realistic industrial scenario of Cartesian robots. The goal of this problem is to transfer all the objects from sources to targets with the minimum total completion time. To achieve the goal, the core idea is to develop an effective object-to-arm task assignment strategy for minimizing the cumulative task execution time and maximizing the dual-arm cooperation efficiency. One of the difficulties in the task assignment is the scalability problem. As the number of objects increases, the computation time of traditional offline-search-based methods grows sharply due to their computational complexity. Encouraged by the adaptability of reinforcement learning (RL) in long-sequence task decisions, we propose an online task assignment decision method based on RL, whose computation time increases only linearly with the number of objects. Further, we design an attention-based network to model the dependencies between the input states during the whole task execution process, helping to find the most reasonable object-to-arm correspondence in each task assignment round. In the experimental part, we adapt some search-based methods to this specific setting and compare our method with them. Experimental results show that our approach outperforms search-based methods in total execution time and computational efficiency, and also verify the generalization of our method to different numbers of objects. In addition, we show the effectiveness of our method deployed on a real robot in the supplementary video.

TMLR Journal 2024 Journal Article

LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations

  • Yudong Xu
  • Wenhao Li
  • Pashootan Vaezipoor
  • Scott Sanner
  • Elias Boutros Khalil

Can a Large Language Model (LLM) solve simple abstract reasoning problems? We explore this broad question through a systematic analysis of GPT on the Abstraction and Reasoning Corpus (ARC), a representative benchmark of abstract reasoning ability from limited examples in which solutions require some "core knowledge" of concepts such as objects, goal states, counting, and basic geometry. GPT-4 solves only 13/50 of the most straightforward ARC tasks when using textual encodings for their two-dimensional input-output grids. Our failure analysis reveals that GPT-4's capacity to identify objects and reason about them is significantly influenced by the sequential nature of the text that represents an object within a text encoding of a task. To test this hypothesis, we design a new benchmark, the 1D-ARC, which consists of one-dimensional (array-like) tasks that are more conducive to GPT-based reasoning, and where it indeed performs better than on the (2D) ARC. To alleviate this issue, we propose an object-based representation that is obtained through an external tool, resulting in nearly doubling the performance on solved ARC tasks and near-perfect scores on the easier 1D-ARC. Although the state-of-the-art GPT-4 is unable to "reason" perfectly within non-language domains such as the 1D-ARC or a simple ARC subset, our study reveals that the use of object-based representations can significantly improve its reasoning ability. Visualizations, GPT logs, and data are available at https://khalil-research.github.io/LLM4ARC.
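The object-based representation credited above with nearly doubling performance can be approximated by a simple connected-component pass over an ARC grid (a hedged sketch, not the paper's external tool; the function name and grid conventions are assumptions): instead of feeding the model raw rows of digits, each same-colored connected region is extracted as a discrete object.

```python
# Sketch: extract 4-connected, same-colored objects from an ARC-style
# grid (list of rows of ints, 0 = background). Each object is returned
# as (color, sorted list of (row, col) cells).

def extract_objects(grid, background=0):
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background or (r, c) in seen:
                continue
            # Flood-fill one component of this color.
            color, stack, cells = grid[r][c], [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append((color, sorted(cells)))
    return objects
```

Serializing such object lists, rather than the grid's row-by-row text, removes the sequential-text artifact that the failure analysis identifies as the bottleneck for 2D reasoning.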

ICRA Conference 2024 Conference Paper

Synchronized Dual-arm Rearrangement via Cooperative mTSP

  • Wenhao Li
  • Shishun Zhang
  • Sisi Dai
  • Hui Huang 0004
  • Ruizhen Hu
  • Xiaohong Chen
  • Kai Xu 0004

Synchronized dual-arm rearrangement is widely studied as a common scenario in industrial applications. It often faces scalability challenges due to the computational complexity of robotic arm rearrangement and the high-dimensional nature of dual-arm planning. To address these challenges, we formulate the problem as cooperative mTSP, a variant of mTSP where agents share cooperative costs, and utilize reinforcement learning to solve it. Our approach represents rearrangement tasks using a task state graph that captures spatial relationships and a cooperative cost matrix that provides details about action costs. Taking these representations as observations, we design an attention-based network to effectively combine them and provide rational task scheduling. Furthermore, a cost predictor is also introduced to directly evaluate actions during both training and planning, significantly expediting the planning process. Our experimental results demonstrate that our approach outperforms existing methods in terms of both performance and planning efficiency.

JMLR Journal 2023 Journal Article

Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection

  • Wenhao Li
  • Ningyuan Chen
  • L. Jeff Hong

We consider a contextual online learning (multi-armed bandit) problem with high-dimensional covariate $x$ and decision $y$. The reward function to learn, $f(x,y)$, does not have a particular parametric form. The literature has shown that the optimal regret is $\tilde{O}(T^{(d_x\!+\!d_y\!+\!1)/(d_x\!+\!d_y\!+\!2)})$, where $d_x$ and $d_y$ are the dimensions of $x$ and $y$, and thus it suffers from the curse of dimensionality. In many applications, only a small subset of variables in the covariate affect the value of $f$, which is referred to as sparsity in statistics. To take advantage of the sparsity structure of the covariate, we propose a variable selection algorithm called BV-LASSO, which incorporates novel ideas such as binning and voting to apply LASSO to nonparametric settings. Using it as a subroutine, we can achieve the regret $\tilde{O}(T^{(d_x^*\!+\!d_y\!+\!1)/(d_x^*\!+\!d_y\!+\!2)})$, where $d_x^*$ is the effective covariate dimension. The regret matches the optimal regret when the covariate is $d^*_x$-dimensional and thus cannot be improved. Our algorithm may serve as a general recipe to achieve dimension reduction via variable selection in nonparametric settings.
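To make the dimension-reduction claim concrete (the numbers here are an illustration of ours, not from the paper): with covariate dimension $d_x = 10$, decision dimension $d_y = 1$, and only $d_x^* = 2$ relevant covariates, the regret exponent improves as

```latex
\tilde{O}\bigl(T^{(d_x+d_y+1)/(d_x+d_y+2)}\bigr)
  = \tilde{O}\bigl(T^{12/13}\bigr)
\quad\longrightarrow\quad
\tilde{O}\bigl(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)}\bigr)
  = \tilde{O}\bigl(T^{4/5}\bigr),
```

i.e., from roughly $T^{0.92}$ to $T^{0.80}$: once the irrelevant covariates are screened out by variable selection, the sublinear regret rate depends only on the effective dimension.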

AAMAS Conference 2023 Conference Paper

Diverse Policy Optimization for Structured Action Space

  • Wenhao Li
  • Baoxiang Wang
  • Shanchao Yang
  • Hongyuan Zha

Enhancing the diversity of policies is beneficial for robustness, exploration, and transfer in reinforcement learning (RL). In this paper, we aim to seek diverse policies in an under-explored setting, namely RL tasks with structured action spaces exhibiting the two properties of composability and local dependencies. The complex action structure, non-uniform reward landscape, and subtle hyperparameter tuning due to the properties of structured actions prevent existing approaches from scaling well. We propose a simple and effective RL method, Diverse Policy Optimization (DPO), which models the policies in a structured action space as energy-based models (EBMs), following the probabilistic RL framework. GFlowNet, a recently proposed and powerful generative model, is introduced as an efficient, diverse EBM-based policy sampler. DPO follows a joint optimization framework: the outer layer uses the diverse policies sampled by the GFlowNet to update the EBM-based policies, which in turn supports the GFlowNet training in the inner layer. Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies in challenging scenarios and substantially outperform existing state-of-the-art methods.

JMLR Journal 2023 Journal Article

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang
  • Junchi Yan
  • Hongyuan Zha

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications due to non-interactivity between agents, the curse of dimensionality, and computational complexity. This has motivated several decentralized MARL algorithms. However, existing decentralized methods only handle the fully cooperative setting, where massive amounts of information need to be transmitted during training. The block coordinate gradient descent scheme they use for successive independent actor and critic steps simplifies the calculation but causes serious bias. This paper proposes a flexible, fully decentralized actor-critic MARL framework, which can combine most actor-critic methods and handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, the proposed framework can achieve scalability and stability in large-scale environments. The framework also reduces information transmission through a parameter-sharing mechanism and novel modeling-other-agents methods based on theory of mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that the proposed decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.

NeurIPS Conference 2023 Conference Paper

Information Design in Multi-Agent Reinforcement Learning

  • Yue Lin
  • Wenhao Li
  • Hongyuan Zha
  • Baoxiang Wang

Reinforcement learning (RL) is inspired by the way human infants and animals learn from the environment. The setting is somewhat idealized because, in actual tasks, other agents in the environment have their own goals and behave adaptively to the ego agent. To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful. Research in computational economics distills two ways to influence others directly: by providing tangible goods (mechanism design) and by providing information (information design). This work investigates information design problems for a group of RL agents. The main challenges are two-fold. One is that the information provided will immediately affect the transition of the agent trajectories, which introduces additional non-stationarity. The other is that the information can be ignored, so the sender must provide information that the receiver is willing to respect. We formulate the Markov signaling game, and develop the notions of the signaling gradient and the extended obedience constraints that address these challenges. Our algorithm is efficient on various mixed-motive tasks and provides further insights into computational economics. Our code is publicly available at https://github.com/YueLin301/InformationDesignMARL.

AAMAS Conference 2023 Conference Paper

Learning Optimal "Pigovian Tax" in Sequential Social Dilemmas

  • Yun Hua
  • Shang Gao
  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang
  • Hongyuan Zha

In multi-agent reinforcement learning (MARL), each agent acts to maximize its individual accumulated rewards. Nevertheless, individual accumulated rewards cannot fully reflect how others perceive an agent, resulting in selfish behaviors that undermine global performance and give rise to social dilemmas. This paper adapts the well-known externality theory from economics to analyze social dilemmas in MARL, and proposes a method called Learning Optimal Pigovian Tax (LOPT) to internalize the externalities in MARL. Furthermore, a reward shaping mechanism based on the approximated optimal “Pigovian Tax” is applied to reduce the social cost of each agent and alleviate the social dilemmas. Compared with existing state-of-the-art methods, the proposed LOPT leads to higher collective social welfare in both the Escape Room and Cleanup environments, demonstrating the superiority of our method in solving social dilemmas.

AAMAS Conference 2023 Conference Paper

Learning Structured Communication for Multi-Agent Reinforcement Learning

  • Junjie Sheng
  • Xiangfeng Wang
  • Bo Jin
  • Wenhao Li
  • Jun Wang
  • Junchi Yan
  • Tsung-Hui Chang
  • Hongyuan Zha

This paper investigates multi-agent reinforcement learning (MARL) communication mechanisms in large-scale scenarios. We propose a novel framework, Learning Structured Communication (LSC), that leverages a flexible and efficient communication topology. LSC enables adaptive agent grouping to create diverse hierarchical formations over episodes generated through an auxiliary task and a hierarchical routing protocol. We learn a hierarchical graph neural network with the formed topology that facilitates effective message generation and propagation between inter- and intra-group communications. Unlike state-of-the-art communication mechanisms, LSC possesses a detailed and learnable design for hierarchical communication. Numerical experiments on challenging tasks demonstrate that the proposed LSC exhibits high communication efficiency and global cooperation capability.

AAMAS Conference 2023 Conference Paper

Model-Based Reinforcement Learning for Auto-bidding in Display Advertising

  • Shuang Chen
  • Qisen Xu
  • Liang Zhang
  • Yongbo Jin
  • Wenhao Li
  • Linjian Mo

Real-time bidding (RTB) has achieved outstanding success in online display advertising, which has become one of the most influential businesses. Given historical ad impressions under the second-price auction mechanism, the advertiser’s optimal bidding strategy is determined by a core parameter corresponding to the optimal solution of a constrained optimization problem. However, the sequentially arriving impressions in online display advertising make it highly non-trivial to obtain the optimal core parameter in advance without knowing the complete impression set. For this reason, recent methods have generally transformed the core-parameter determination problem into a sequential parameter-adjustment problem and solved it using reinforcement learning (RL). This paper proposes a simple and effective Model-Based Automatic Bidding algorithm, MBAB, which explicitly models the uncertainty of the dynamic auction environment and then uses dynamic programming to obtain the current optimal adjustment of the core parameter. MBAB avoids burdensome simulated-environment construction and, free of the thorny sim-to-real issue of model-free methods, is more suitable for production deployment. Furthermore, MBAB uses the optimal bidding formula to carry out coarse-grained modeling of the online market environment, alleviating the scalability problem caused by the fine-grained environment modeling of previous model-based methods. To accurately describe the impression distribution and non-stationarity of the online market environment, we introduce a probabilistic modeling method and propose a novel monotonicity constraint to regulate the model output. Numerical experiments show that the proposed MBAB substantially outperforms existing baselines on various constrained RTB tasks in the production environment.

IJCAI Conference 2022 Conference Paper

Clickbait Detection via Contrastive Variational Modelling of Text and Label

  • Xiaoyuan Yi
  • Jiarui Zhang
  • Wenhao Li
  • Xiting Wang
  • Xing Xie

Clickbait refers to deliberately created sensational or deceptive text for tricking readers into clicking, which severely hurts the web ecosystem. With a growing number of clickbaits on social media, developing automatic detection methods becomes essential. Nonetheless, the performance of existing neural classifiers is limited due to the underutilization of small labelled datasets. Inspired by related pedagogy theories that learning to write can promote comprehension ability, we propose a novel Contrastive Variational Modelling (CVM) framework to exploit the labelled data better. CVM models the conditional distributions of text and clickbait labels by predicting labels from text and generating text from labels simultaneously with Variational AutoEncoder and further differentiates the learned spaces under each label by a mixed contrastive learning loss. In this way, CVM can capture more underlying textual properties and hence utilize label information to its full potential, boosting detection performance. We theoretically demonstrate CVM as learning a joint distribution of text, clickbait label, and latent variable. Experiments on three clickbait detection datasets show our method's robustness to inadequate and biased labels, outperforming several recent strong baselines.

IJCAI Conference 2022 Conference Paper

VMAgent: A Practical Virtual Machine Scheduling Platform

  • Junjie Sheng
  • Shengliang Cai
  • Haochuan Cui
  • Wenhao Li
  • Yun Hua
  • Bo Jin
  • Wenli Zhou
  • Yiqiu Hu

Virtual machine (VM) scheduling is one of the critical tasks in cloud computing. Many works have attempted to incorporate machine learning, especially reinforcement learning, to empower VM scheduling procedures. Although improved results are shown in several demo simulators, the performance in real-world scenarios is still underexploited. In this paper, we design a practical VM scheduling platform, i.e., VMAgent, to assist researchers in developing their methods for the VM scheduling problem. VMAgent consists of three components: simulator, scheduler, and visualizer. The simulator abstracts three general realistic scheduling scenarios (fading, recovering, and expansion) based on Huawei Cloud’s scheduling data, and is the core of our platform. Flexible configurations are further provided to make the simulator compatible with practical cloud computing architectures (i.e., Multi Non-Uniform Memory Access) and scenarios. Researchers then instantiate the scheduler to interact with the simulator, which also comes pre-built with various types (e.g., heuristic, machine learning, and operations research) of scheduling algorithms to speed up algorithm design. The visualizer, as an auxiliary component of the simulator and scheduler, helps researchers conduct an in-depth analysis of the scheduling procedure and comprehensively compare different scheduling algorithms. We believe that VMAgent will shed light on AI for the VM scheduling community, and a demo video is presented at https://bit.ly/vmagent-demo-video.

AAMAS Conference 2021 Conference Paper

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

  • Wenhao Li
  • Xiangfeng Wang
  • Bo Jin
  • Junjie Sheng
  • Yun Hua
  • Hongyuan Zha

When solving a complex task, humans spontaneously form teams to complete different parts of the whole task, and cooperation between teammates improves efficiency. However, in current cooperative MARL methods, the cooperation team is constructed through either heuristics or end-to-end black-box optimization. To improve the efficiency of cooperation and exploration, we propose a structured diversification emergence MARL framework named Rochico, based on reinforced organization control and hierarchical consensus learning. Rochico first learns an adaptive grouping policy through the organization control module, which is established by independent multi-agent reinforcement learning. Further, the hierarchical consensus module, based on hierarchical intentions with a consensus constraint, is introduced after team formation. Utilizing the hierarchical consensus module and a self-supervised intrinsic-reward-enhanced decision module, the proposed cooperative MARL algorithm Rochico outputs the final diversified multi-agent cooperative policy. All three modules are organically combined to promote structured diversification emergence. Comparative experiments on four large-scale cooperation tasks show that Rochico significantly outperforms current SOTA algorithms in terms of exploration efficiency and cooperation strength.

AAAI Conference 2020 Conference Paper

MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space

  • Xiaoyuan Yi
  • Ruoyu Li
  • Cheng Yang
  • Wenhao Li
  • Maosong Sun

As an essential step towards computer creativity, automatic poetry generation has gained increasing attention in recent years. Though recent neural models make prominent progress on some criteria of poetry quality, generated poems still suffer from poor diversity. Studies in the related literature show that different factors, such as life experience and historical background, influence the composition styles of poets, which considerably contributes to the high diversity of human-authored poetry. Inspired by this, we propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity. Based on a semi-supervised variational autoencoder, our model disentangles the latent space into subspaces, each conditioned on one influence factor by adversarial training. In this way, the model learns a controllable latent variable to capture and mix generalized factor-related properties. Different factor mixtures lead to diverse styles and hence further differentiate generated poems from each other. Experimental results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.

IJCAI Conference 2020 Conference Paper

Text Style Transfer via Learning Style Instance Supported Latent Space

  • Xiaoyuan Yi
  • Zhenghao Liu
  • Wenhao Li
  • Maosong Sun

Text style transfer pursues altering the style of a sentence while keeping its main content unchanged. Due to the lack of parallel corpora, most recent work focuses on unsupervised methods and has achieved noticeable progress. Nonetheless, the intractability of completely disentangling content from style in text leads to a contradiction between content preservation and style transfer accuracy. To address this problem, we propose a style-instance-supported method, StyIns. Instead of representing styles with embeddings or latent variables learned from single sentences, our model leverages the generative flow technique to extract underlying stylistic properties from multiple instances of each style, which form a more discriminative and expressive latent style space. By combining such a space with an attention-based structure, our model can better maintain the content and simultaneously achieve high transfer accuracy. Furthermore, the proposed method can be flexibly extended to semi-supervised learning so as to utilize the limited paired data available. Experiments on three transfer tasks, sentiment modification, formality rephrasing, and poeticness generation, show that StyIns obtains a better balance between content and style, outperforming several recent baselines.

IJCAI Conference 2019 Conference Paper

Sentiment-Controllable Chinese Poetry Generation

  • Huimin Chen
  • Xiaoyuan Yi
  • Maosong Sun
  • Wenhao Li
  • Cheng Yang
  • Zhipeng Guo

Expressing diverse sentiments is one of the main purposes of human poetry creation. Existing Chinese poetry generation models have made great progress in poetry quality, but they all neglect to endow generated poems with specific sentiments. Such a defect leads to strong sentiment collapse or bias and thus hurts the diversity and semantics of generated poems. Meanwhile, there are few sentimental Chinese poetry resources available for study. To address this problem, we first collect a manually-labelled sentimental poetry corpus with fine-grained sentiment labels. Then we propose a novel semi-supervised conditional Variational Auto-Encoder model for sentiment-controllable poetry generation. Besides, since poetry is discourse-level text in which the polarity and intensity of sentiment can shift between lines, we incorporate a temporal module to capture sentiment transition patterns across lines. Experimental results show our model can control the sentiment of not only a whole poem but also each line, and improves poetry diversity against state-of-the-art models without losing quality.