Arrow Research search

Author name cluster

Wenhao Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers

40

AAAI Conference 2026 Conference Paper

Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models

  • Kunhao Li
  • Wenhao Li
  • Di Wu
  • Lei Yang
  • Jun Bai
  • Ju Jia
  • Jason Xue

Multimodal Large Language Models (MLLMs) extend foundation models to real-world applications by integrating inputs such as text and vision. However, their broad knowledge capacity raises growing concerns about privacy leakage, toxicity mitigation, and intellectual property violations. Machine Unlearning (MU) offers a practical solution by selectively forgetting targeted knowledge while preserving overall model utility. When applied to MLLMs, existing neuron-editing-based MU approaches face two fundamental challenges: (i) forgetting becomes inconsistent across modalities because existing point-wise attribution methods fail to capture the structured, layer-by-layer information flow that connects different modalities; and (ii) general knowledge performance declines when sensitive neurons that also support important reasoning paths are pruned, as this disrupts the model's ability to generalize. To alleviate these limitations, we propose a multimodal influential neuron path editor (MIP-Editor) for MU. Our approach introduces modality-specific attribution scores to identify influential neuron paths responsible for encoding forget-set knowledge and applies influential-path-aware neuron editing via representation misdirection. This strategy enables effective and coordinated forgetting across modalities while preserving the model's general capabilities. Experimental results demonstrate that MIP-Editor achieves superior unlearning performance on multimodal tasks, with a maximum forgetting rate of 87.75% and up to 54.26% improvement in general knowledge retention. On textual tasks, MIP-Editor achieves up to 80.65% forgetting and preserves 77.90% of general performance.

AAAI Conference 2026 Conference Paper

KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction

  • Kangxiang Xia
  • Xinfa Zhu
  • Jixun Yao
  • Wenjie Tian
  • Wenhao Li
  • Lei Xie

We introduce KALL-E, a novel autoregressive (AR) language model for text-to-speech (TTS) synthesis that operates by predicting the next distribution of continuous speech frames. Unlike existing methods, KALL-E directly models the continuous speech distribution conditioned on text, eliminating the need for any diffusion-based components. Specifically, we utilize a Flow-VAE to extract a continuous latent speech representation from waveforms, instead of relying on discrete speech tokens. A single AR Transformer is then trained to predict these continuous speech distributions from text, optimizing a Kullback–Leibler divergence loss as its objective. Experimental results demonstrate that KALL-E achieves superior speech synthesis quality and can even adapt to a target speaker from just a single sample. Importantly, KALL-E provides a more direct and effective approach for utilizing continuous speech representations in TTS.
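The next-distribution objective above hinges on a Kullback–Leibler divergence between predicted and reference speech-frame distributions. As a hedged illustration only (the abstract does not give KALL-E's exact formulation), the sketch below shows the standard analytic KL between two diagonal Gaussians; all names are illustrative:

```python
import numpy as np

def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """Analytic KL(p || q) for diagonal Gaussians, summed over dimensions.

    A generic sketch of the kind of distribution-matching loss the
    abstract describes; KALL-E's actual objective may differ.
    """
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    kl_per_dim = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return kl_per_dim.sum()

# KL of a distribution against itself is zero; shifting the mean by 1
# with unit variance costs 0.5 nats per dimension.
d = 4
zeros = np.zeros(d)
print(kl_diag_gaussians(zeros, zeros, zeros, zeros))        # 0.0
print(kl_diag_gaussians(zeros, zeros, zeros + 1.0, zeros))  # 2.0 (0.5 * 4 dims)
```

In a model like the one described, `mu_q`/`logvar_q` would be the network's per-frame predictions and the loss would be minimized over a training corpus.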

AAAI Conference 2026 Conference Paper

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

  • Shuzheng Si
  • Haozhe Zhao
  • Cheng Gao
  • Yuzhuo Bai
  • Zhitong Wang
  • Bofei Gao
  • Kangyang Luo
  • Wenhao Li

Teaching large language models (LLMs) to be faithful to the provided context is crucial for building reliable information-seeking systems. Therefore, we propose a systematic framework, CANOE, to reduce faithfulness hallucinations of LLMs across different downstream tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data with four diverse tasks to construct high-quality and easily verifiable training data without human annotation. Also, we propose Dual-GRPO, a rule-based reinforcement learning method that includes three tailored rule-based rewards derived from synthesized short-form QA data, while simultaneously optimizing both short-form and long-form response generation. Notably, Dual-GRPO eliminates the need to manually label preference data to train reward models and avoids over-optimizing short-form generation when relying only on the synthesized short-form QA data. Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different tasks, even outperforming the most advanced LLMs, e.g., GPT-4o and OpenAI o1.

ICLR Conference 2025 Conference Paper

DICE: Data Influence Cascade in Decentralized Learning

  • Tongtian Zhu
  • Wenhao Li
  • Can Wang 0001
  • Fengxiang He

Decentralized learning offers a promising approach to crowdsourcing data consumption and computational workloads across geographically distributed compute resources interconnected through peer-to-peer networks, accommodating exponentially increasing demands. However, proper incentives are still absent, considerably discouraging participation. Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges arising from the localized connections that make influence "cascade" in a decentralized network. To overcome this, we design the first method to estimate Data Influence CascadE (DICE) in a decentralized environment. Theoretically, the framework derives tractable approximations of influence cascade over arbitrary neighbor hops, suggesting that the influence cascade is determined by an interplay of data, communication topology, and the curvature of the loss landscape. DICE also lays the foundations for applications including selecting suitable collaborators and identifying malicious behaviors. The project page is available at https://raiden-zhu.github.io/blog/2025/DICE.

AAMAS Conference 2025 Conference Paper

Dynamic Conservative Degree Allocation for Offline Multi-Agent Reinforcement Learning

  • Haosheng Chen
  • Yun Hua
  • Junjie Sheng
  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang

Offline Multi-agent Reinforcement Learning (MARL) is designed to learn policies from pre-collected datasets without real-time interaction in multi-agent systems. A primary concern in offline MARL is conservative degree allocation, which involves assigning different conservatism levels to agents based on their varying influence on the system. Current approaches frequently neglect this crucial aspect, resulting in suboptimal performance, particularly when agents have differing impacts on the environment. In this paper, we propose OMCDA, a novel offline MARL algorithm that addresses the issue of conservative degree allocation by assigning dynamic conservatism levels to each agent based on their individual influence on system performance. OMCDA decomposes the Q-function into two components: one for computing the return and another for capturing deviations from the behavior policy. Additionally, OMCDA employs a dynamic allocation mechanism that adjusts conservatism levels for agents based on their varying impacts, while maintaining coherent credit assignment and ensuring robust system performance throughout learning. We evaluate OMCDA on MuJoCo and SMAC, showing it outperforms existing offline MARL methods in challenging tasks by effectively addressing conservative degree allocation.

JBHI Journal 2025 Journal Article

Enhancing Automated Seizure Detection via Self-Calibrating Spatial-Temporal EEG Features with SC-LSTM

  • Wenhao Li
  • Qiran Chen
  • Zhenyu Hou
  • Shi Chang
  • Zhenhong Ye
  • Jiangping Chen
  • Guan Ning Lin

Epilepsy, a highly individualized neurological disorder, affects millions globally. Electroencephalography (EEG) remains the cornerstone for seizure diagnosis, yet manual interpretation is labor-intensive and often unreliable due to the complexity of multi-channel, high-dimensional data. Traditional machine learning models often struggle with overfitting and fail to fully capture the high-dimensional, temporal dynamics of EEG signals, restricting their clinical utility. In this study, we propose SC-LSTM, a novel hybrid deep learning architecture that integrates dynamic spatial and temporal feature extraction to enhance automated seizure detection. SC-LSTM comprises a Self-Calibrated Reconstruction Module (SCConvNet) for adaptive spatial feature representation and a Bidirectional Long Short-Term Memory (Bi-LSTM) network for modeling temporal dependency. This parallel processing framework captures patient-specific EEG variability more effectively than traditional sequential models, promoting robust and discriminative feature learning. Comprehensive evaluations on two real-world neonatal EEG datasets, using K-fold cross-validation and simulated single-channel signal loss, demonstrate that SC-LSTM achieved an accuracy of 97% and an area under the curve of 0.99, significantly surpassing the performance of CNN and CNN-LSTM models. Importantly, SC-LSTM maintained high diagnostic performance even under conditions of partial data loss from critical brain regions, underscoring its resilience to clinical variability and signal artifacts. By improving accuracy, stability, and adaptability in seizure detection, SC-LSTM exemplifies the application of artificial intelligence to support individualized diagnostics and embodies the core principles of precision medicine. The open-source availability of SC-LSTM further facilitates reproducibility, clinical translation, and future extensions across broader neurological disorder monitoring applications.

NeurIPS Conference 2025 Conference Paper

LOPT: Learning Optimal Pigovian Tax in Sequential Social Dilemmas

  • Yun Hua
  • Shang Gao
  • Wenhao Li
  • Haosheng Chen
  • Bo Jin
  • Xiangfeng Wang
  • Jun Luo
  • Hongyuan Zha

Multi-agent reinforcement learning (MARL) has emerged as a powerful framework for modeling autonomous agents that independently optimize their individual objectives. However, in mixed-motive MARL environments, rational self-interested behaviors often lead to collectively suboptimal outcomes, situations commonly referred to as social dilemmas. A key challenge in addressing social dilemmas lies in accurately quantifying and representing them in a numerical form that captures how self-interested agent behaviors impact social welfare. To address this challenge, the economic concept of externalities is adopted and extended to denote the unaccounted-for impact of one agent's actions on others, as a means to rigorously quantify social dilemmas. Based on this measurement, a novel method, Learning Optimal Pigovian Tax (LOPT), is proposed. Inspired by Pigovian taxes, which are designed to internalize externalities by imposing costs on negative societal impacts, LOPT employs an auxiliary tax agent that learns an optimal Pigovian tax policy to reshape individual rewards in alignment with social welfare, thereby promoting agent coordination and mitigating social dilemmas. We support LOPT with theoretical analysis and validate it on standard MARL benchmarks, including Escape Room and Cleanup. Results show that by effectively internalizing externalities that quantify social dilemmas, LOPT aligns individual objectives with collective goals, significantly improving social welfare over state-of-the-art baselines.

AAMAS Conference 2025 Conference Paper

Negotiated Reasoning: On Provably Addressing Relative Over-Generalization

  • Junjie Sheng
  • Wenhao Li
  • Bo Jin
  • Hongyuan Zha
  • Jun Wang
  • Xiangfeng Wang

We focus on the relative over-generalization (RO) issue in fully cooperative multi-agent reinforcement learning (MARL). Existing methods show that endowing agents with reasoning can help mitigate RO empirically, but there is little theoretical insight. We first prove that RO is avoided when agents satisfy a consistent reasoning requirement. We then propose a new negotiated reasoning framework connecting reasoning and RO with theoretical guarantees. Based on it, we develop an algorithm called Stein variational negotiated reasoning (SVNR), which uses Stein variational gradient descent to form a negotiation policy that provably bypasses RO under maximum-entropy policy iteration. SVNR is further parameterized with neural networks for computational efficiency. Experiments demonstrate that SVNR significantly outperforms baselines on RO-challenged tasks, confirming its advantage in achieving better cooperation.

JBHI Journal 2025 Journal Article

Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines

  • Wenhao Li
  • Hongkuan Zhang
  • Hongwei Zhang
  • Zhengxu Li
  • Zengjie Dong
  • Yafan Chen
  • Niranjan Bidargaddi
  • Hong Liu

Current medical language models, adapted from large language models, typically predict ICD code-based diagnoses from electronic health records (EHRs) because these labels are readily available. However, ICD codes do not capture the nuanced, context-rich reasoning clinicians use for diagnosis. Clinicians synthesize diverse patient data and reference clinical practice guidelines (CPGs) to make evidence-based decisions. This misalignment limits the clinical utility of existing models. We introduce GARMLE-G, a Generation-Augmented Retrieval framework that grounds medical language model outputs in authoritative CPGs. Unlike conventional Retrieval-Augmented Generation (RAG)-based approaches, GARMLE-G enables hallucination-free outputs by directly retrieving authoritative guideline content without relying on model-generated text. It (1) integrates LLM predictions with EHR data to create semantically rich queries, (2) retrieves relevant CPG knowledge snippets via embedding similarity, and (3) fuses guideline content with model output to generate clinically aligned recommendations. A prototype system for hypertension and coronary heart disease diagnosis was developed and evaluated on multiple metrics, demonstrating superior retrieval precision, semantic relevance, and clinical guideline adherence compared to RAG-based baselines, while maintaining a lightweight architecture suitable for localized healthcare deployment. This work provides a scalable, low-cost, and hallucination-free method for grounding medical language models in evidence-based clinical practice, with strong potential for broader clinical deployment.

ICML Conference 2025 Conference Paper

Self-Supervised Transformers as Iterative Solution Improvers for Constraint Satisfaction

  • Yudong Xu 0001
  • Wenhao Li
  • Scott Sanner
  • Elias B. Khalil

We present a Transformer-based framework for Constraint Satisfaction Problems (CSPs). CSPs find use in many applications and thus accelerating their solution with machine learning is of wide interest. Most existing approaches rely on supervised learning from feasible solutions or reinforcement learning, paradigms that require either feasible solutions to these NP-Complete CSPs or large training budgets and a complex expert-designed reward signal. To address these challenges, we propose ConsFormer, a self-supervised framework that leverages a Transformer as a solution refiner. ConsFormer constructs a solution to a CSP iteratively in a process that mimics local search. Instead of using feasible solutions as labeled data, we devise differentiable approximations to the discrete constraints of a CSP to guide model training. Our model is trained to improve random assignments for a single step but is deployed iteratively at test time, circumventing the bottlenecks of supervised and reinforcement learning. Experiments on Sudoku, Graph Coloring, Nurse Rostering, and MAXCUT demonstrate that our method can tackle out-of-distribution CSPs simply through additional iterations.
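The "differentiable approximations to the discrete constraints" idea in the abstract can be illustrated with a toy surrogate: relax an all-different constraint by penalizing overlap between the per-cell softmax distributions over values. This is a generic stand-in with invented names and sizes, not ConsFormer's actual training loss:

```python
import numpy as np

def soft_all_different(logits):
    """Differentiable surrogate for an all-different constraint.

    Each row of `logits` is a cell's scores over candidate values.
    The penalty is the total pairwise overlap of the softmax
    distributions: near zero when the soft assignments do not collide,
    and large when cells concentrate on the same value. A toy sketch
    of the differentiable-constraint idea, not the paper's loss.
    """
    z = logits - logits.max(axis=1, keepdims=True)          # stabilized softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # (cells, values)
    gram = probs @ probs.T                                   # pairwise overlaps
    off_diag = gram.sum() - np.trace(gram)
    return off_diag / 2.0                                    # unordered pairs

distinct = soft_all_different(10.0 * np.eye(3))                      # near 0
colliding = soft_all_different(np.tile([10.0, 0.0, 0.0], (3, 1)))    # large
```

In a self-supervised setting, gradients of such a penalty with respect to the model's output logits would guide training without any feasible-solution labels.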

NeurIPS Conference 2025 Conference Paper

Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents

  • Yun Hua
  • Haosheng Chen
  • Shiqin Wang
  • Wenhao Li
  • Xiangfeng Wang
  • Jun Luo

Large Language Models (LLMs) are increasingly deployed as autonomous agents in multi-agent systems, and promising coordination has been demonstrated in handling complex tasks under predefined roles and scripted workflows. However, significant challenges remain in open-ended environments, where agents are inherently self-interested and explicit coordination guidelines are absent. In such scenarios, misaligned incentives frequently lead to social dilemmas and inefficient collective outcomes. Inspired by how human societies tackle similar coordination challenges through temporary collaborations such as employment or subcontracting, a cooperative workflow, Shapley-Coop, is proposed. This workflow enables self-interested LLM agents to engage in emergent collaboration by using a fair credit allocation mechanism to ensure each agent's contributions are appropriately recognized and rewarded. Shapley-Coop introduces structured negotiation protocols and Shapley-inspired reasoning to estimate agents' marginal contributions, thereby enabling effective task-time coordination and equitable post-task outcome redistribution. This results in effective coordination that fosters collaboration while preserving agent autonomy, through a rational pricing mechanism that encourages cooperative behavior. Evaluated in two multi-agent games and a software engineering simulation, Shapley-Coop consistently enhances LLM agent collaboration and facilitates equitable outcome redistribution, accurately reflecting individual contributions during the task execution process.
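Shapley-style credit assignment, as invoked above, averages each agent's marginal contribution over all join orders. A minimal sketch of exact Shapley values for a toy coalition game follows; the coalition values are invented for illustration, and the paper estimates contributions via LLM negotiation rather than exact enumeration:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    value(S + p) - value(S) over all orderings of the players.
    Exponential-time, so only viable for tiny toy games."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = frozenset(coalition | {p})
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical two-agent task: either agent alone earns 1, together they earn 4.
v = lambda s: {0: 0.0, 1: 1.0, 2: 4.0}[len(s)]
print(shapley_values(["A", "B"], v))   # symmetric agents -> 2.0 each
```

The efficiency property (shares sum to the grand-coalition value) is what makes this a natural basis for the "equitable post-task outcome redistribution" the abstract describes.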

IJCAI Conference 2025 Conference Paper

SkyRover: A Modular Simulator for Cross-Domain Pathfinding

  • Wenhui Ma
  • Wenhao Li
  • Bo Jin
  • Changhong Lu
  • Xiangfeng Wang

Unmanned Aerial Vehicles (UAVs) and Automated Guided Vehicles (AGVs) increasingly collaborate in logistics, surveillance, and inspection tasks. However, existing simulators often focus on a single domain, limiting cross-domain study. This paper presents SkyRover, a modular simulator for UAV-AGV multi-agent pathfinding (MAPF). SkyRover supports realistic agent dynamics, configurable 3D environments, and convenient APIs for external solvers and learning methods. By unifying ground and aerial operations, it facilitates cross-domain algorithm design, testing, and benchmarking. Experiments highlight SkyRover's capacity for efficient pathfinding and high-fidelity simulations in UAV-AGV coordination. We believe SkyRover fills a key gap in MAPF research. The project is available at https://sites.google.com/view/mapf3d/home.

NeurIPS Conference 2025 Conference Paper

Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval

  • Wenhao Li
  • Yuxin Zhang
  • Gen Luo
  • Haiyuan Wan
  • ZiYang Gong
  • Fei Chao
  • Rongrong Ji

Reducing the key-value (KV) cache burden in Large Language Models (LLMs) significantly accelerates inference. Dynamically selecting critical KV caches during decoding helps maintain performance. Existing methods use random linear hashing to identify important tokens, but this approach is inefficient due to the orthogonal distribution of queries and keys within two narrow cones in LLMs. We introduce Spotlight Attention, a novel method that employs non-linear hashing functions to optimize the embedding distribution of queries and keys, enhancing coding efficiency and robustness. We also developed a lightweight, stable training framework using a Bradley-Terry ranking-based loss, enabling optimization of the non-linear hashing module on GPUs with 16GB memory in 8 hours. Experimental results show that Spotlight Attention drastically improves retrieval precision while shortening the length of the hash code at least 5× compared to traditional linear hashing. Finally, we exploit the computational advantages of bitwise operations by implementing specialized CUDA kernels, achieving hashing retrieval for 512K tokens in under 100 µs on a single A100 GPU, with end-to-end throughput up to 3× higher than vanilla decoding.
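For intuition, the random linear hashing baseline that this abstract contrasts with can be sketched as sign-of-projection bit codes compared via bitwise XOR and popcount. This is a toy illustration with made-up sizes, not the paper's learned non-linear hashing module or its CUDA kernels:

```python
import numpy as np

def linear_hash(vectors, projection):
    """Sign-of-random-projection hashing: each vector becomes an n-bit
    integer code. This is the random *linear* baseline the abstract
    argues against; Spotlight Attention learns a non-linear module."""
    bits = (np.atleast_2d(vectors) @ projection) > 0               # (n, n_bits)
    weights = np.uint64(1) << np.arange(bits.shape[1], dtype=np.uint64)
    return (bits.astype(np.uint64) * weights).sum(axis=1)

def hamming_topk(query_code, key_codes, k):
    """Indices of the k codes closest to query_code in Hamming distance,
    using only XOR and bit counting (popcount-style)."""
    xor = np.bitwise_xor(key_codes, np.uint64(query_code))
    dists = np.array([bin(int(x)).count("1") for x in xor])
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
keys = rng.normal(size=(1000, 64))    # stand-in for cached key vectors
proj = rng.normal(size=(64, 16))      # 16-bit codes
key_codes = linear_hash(keys, proj)
top = hamming_topk(key_codes[42], key_codes, k=8)  # a code retrieves itself
```

The bitwise comparison is what makes hardware implementations fast; the paper's contribution is making the codes themselves far more discriminative so they can also be much shorter.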

JBHI Journal 2025 Journal Article

SSGraphDTI: A Drug-Target Interaction Prediction Method Integrated Structural and Dynamic Systemic Biology Attributes

  • Haotian Guan
  • Tian Bai
  • Jingtong Zhao
  • Wenhao Li
  • Han Wang

Drug-Target Interaction (DTI) is a crucial aspect of pharmaceutical development. However, biochemical experiments are prohibitively expensive for identifying these interactions on a large scale, while computational approaches have yet to deliver highly reliable predictions. To promote prediction accuracy, drug-related molecular networks have gradually been introduced to this task to furnish valuable information. We hypothesized that integrating structural and systemic biological attributes could effectively enhance the performance of DTI prediction and proposed a novel DTI prediction model, SSGraphDTI, which integrates the two aforementioned attributes. Specifically, the structural attributes of drugs and targets are extracted using independent convolutional neural network based models from the Simplified Molecular Input Line Entry System representations of drugs and the amino acid sequences of targets, respectively. Meanwhile, the systemic biological attributes of drug-target pairs are obtained through graph representation learning on the dynamically constructed heterogeneous drug-target interaction network. SSGraphDTI was meticulously trained and rigorously tested on the benchmark Dataset_DrugBank, achieving an improvement of approximately 1.0% across five metrics compared to recent comparable methods. These results underscore the potential of combining both structural and systemic information for accurate DTI prediction. Benefiting from the fact that the input consists solely of structural data without requiring interaction information, the model effectively addresses the “cold-start problem” in drug discovery. Furthermore, by extracting systemic attributes directly from the dynamically constructed DTI networks, the model maintains strong predictive performance even when data is limited. The source code is available at https://github.com/NENUBioCompute/SSGraphDTI.

AAAI Conference 2025 Conference Paper

SVTformer: Spatial-View-Temporal Transformer for Multi-View 3D Human Pose Estimation

  • Wanruo Zhang
  • Mengyuan Liu
  • Hong Liu
  • Wenhao Li

Recently, transformer-based methods have been introduced to estimate 3D human pose from multiple views by aggregating the spatial-temporal information of human joints to achieve the lifting of 2D to 3D. However, previous approaches cannot model the inter-frame correspondence of each view's joints individually, nor can they directly consider all view interactions at each time, leading to insufficient learning of multi-view associations. To address this issue, we propose a Spatial-View-Temporal transformer (SVTformer) to decouple spatial-view-temporal information in sequential order for correlation learning and model dependencies between them in a local-to-global manner. SVTformer includes an attended Spatial-View-Temporal (SVT) patch embedding to attentively capture the local features of the input poses and stacked SVT encoders to extract global spatial-view-temporal dependencies progressively. Specifically, SVT encoders perform three reconstructions sequentially on attended features, learning through view decoupling for temporal-enhanced spatial correlation, temporal decoupling for spatial-enhanced view correlation, and another view decoupling for the spatial-enhanced temporal relationship. This decoupling-coupling-decoupling multi-view scheme enables us to alternately model the inter-joint spatial relationships, cross-view dependencies, and temporal motion associations. We evaluate the proposed SVTformer on three popular 3D HPE datasets, and it yields state-of-the-art performance. It effectively deals with ill-posed problems and enhances the accuracy of 3D human pose estimation.

TMLR Journal 2025 Journal Article

Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects

  • Wenhao Li
  • Yudong Xu
  • Scott Sanner
  • Elias Boutros Khalil

The Abstraction and Reasoning Corpus (ARC) is a popular benchmark focused on visual reasoning in the evaluation of Artificial Intelligence systems. In its original framing, an ARC task requires solving a program synthesis problem over small 2D images using a few input-output training pairs. In this work, we adopt the recently popular data-driven approach to the ARC and ask whether a Vision Transformer (ViT) can learn the implicit mapping, from input image to output image, that underlies the task. We show that a ViT—otherwise a state-of-the-art model for images—fails dramatically on most ARC tasks even when trained on one million examples per task. This points to an inherent representational deficiency of the ViT architecture that makes it incapable of uncovering the simple structured mappings underlying the ARC tasks. Building on these insights, we propose VITARC, a ViT-style architecture that unlocks some of the visual reasoning capabilities required by the ARC. Specifically, we use a pixel-level input representation, design a spatially-aware tokenization scheme, and introduce a novel object-based positional encoding that leverages automatic segmentation, among other enhancements. Our task-specific VITARC models achieve a test solve rate close to 100% on more than half of the 400 public ARC tasks strictly through supervised learning from input-output grids. This calls attention to the importance of imbuing the powerful (Vision) Transformer with the correct inductive biases for abstract visual reasoning that are critical even when the training data is plentiful and the mapping is noise-free. Hence, VITARC provides a strong foundation for future research in visual reasoning using transformer-based architectures.

AAAI Conference 2025 Conference Paper

TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation

  • Jiajie Liu
  • Mengyuan Liu
  • Hong Liu
  • Wenhao Li

Recent multi-frame lifting methods have dominated 3D human pose estimation. However, previous methods ignore the intricate dependence within the 2D pose sequence and learn only a single temporal correlation. To alleviate this limitation, we propose TCPFormer, which leverages an implicit pose proxy as an intermediate representation. Each proxy within the implicit pose proxy can build one temporal correlation, thereby helping us learn a more comprehensive temporal correlation of human motion. Specifically, our method consists of three key components: Proxy Update Module (PUM), Proxy Invocation Module (PIM), and Proxy Attention Module (PAM). PUM first uses pose features to update the implicit pose proxy, enabling it to store representative information from the pose sequence. PIM then invokes and integrates the pose proxy with the pose sequence to enhance the motion semantics of each pose. Finally, PAM leverages the above mapping between the pose sequence and pose proxy to enhance the temporal correlation of the whole pose sequence. Experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our proposed TCPFormer outperforms the previous state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning

  • Wenhao Li
  • Qiangchang Wang
  • Xianjing Meng
  • Zhibin Wu
  • Yilong Yin

Few-shot learning (FSL) aims to recognize novel concepts from only a few labeled support samples. Recent studies enhance support features by incorporating additional semantic information (e.g., class descriptions) or designing complex semantic fusion modules. However, these methods still suffer from hallucinating semantics that contradict the visual evidence due to the lack of grounding in actual instances, resulting in noisy guidance and costly corrections. To address these issues, we propose a novel framework, bridging Vision and Text with LLMs for Few-Shot Learning (VT-FSL), which constructs precise cross-modal prompts conditioned on Large Language Models (LLMs) and support images, seamlessly integrating them through a geometry-aware alignment mechanism. It mainly consists of Cross-modal Iterative Prompting (CIP) and Cross-modal Geometric Alignment (CGA). Specifically, the CIP conditions an LLM on both class names and support images to generate precise class descriptions iteratively in a single structured reasoning pass. These descriptions not only enrich the semantic understanding of novel classes but also enable the zero-shot synthesis of semantically consistent images. The descriptions and synthetic images act respectively as complementary textual and visual prompts, providing high-level class semantics and low-level intra-class diversity to compensate for limited support data. Furthermore, the CGA jointly aligns the fused textual, support, and synthetic visual representations by minimizing the kernelized volume of the 3-dimensional parallelotope they span. It captures global and nonlinear relationships among all representations, enabling structured and consistent multimodal integration. The proposed VT-FSL method establishes new state-of-the-art performance across ten diverse benchmarks, including standard, cross-domain, and fine-grained few-shot learning scenarios. Code is available at https://github.com/peacelwh/VT-FSL.

IROS Conference 2024 Conference Paper

C³P-VoxelMap: Compact, Cumulative and Coalescible Probabilistic Voxel Mapping

  • Xu Yang
  • Wenhao Li
  • Qijie Ge
  • Lulu Suo
  • Weijie Tang
  • Zhengyu Wei
  • Longxiang Huang
  • Bo Wang

This work presents a compact, cumulative, and coalescible probabilistic voxel mapping method to enhance performance, accuracy, and memory efficiency in LiDAR odometry. Probabilistic voxel mapping requires storing past point clouds and re-iterating them to update the uncertainty at every iteration, which consumes large memory space and CPU cycles. To solve this problem, we propose a two-fold strategy. First, we introduce a compact point-free representation for probabilistic voxels and derive a cumulative update of the planar uncertainty without caching original point clouds. Our voxel structure only keeps track of a predetermined set of statistics for points that lie inside it. This method reduces the runtime complexity from O(MN) to O(N) and the space complexity from O(N) to O(1) where M is the number of iterations and N is the number of points. Second, to further minimize memory usage and enhance mapping accuracy, we provide a strategy to dynamically merge voxels associated with the same physical planes by taking advantage of the geometric features in the real world. Rather than constantly scanning for these coalescible voxels at every iteration, our merging strategy accumulates voxels in a locality-sensitive hash and triggers merging lazily. On-demand merging reduces memory footprint with minimal computational overhead and improves localization accuracy thanks to cross-voxel denoising. Experiments exhibit 20% higher accuracy, 20% faster performance, and 70% lower memory consumption than the state-of-the-art.
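The point-free cumulative update described above can be sketched with sufficient statistics: a voxel keeps only the point count, the sum of points, and the sum of outer products, from which the planar fit (and a merge of two voxels) is recovered in O(1) space. A minimal sketch under these assumptions, not the paper's exact uncertainty model:

```python
import numpy as np

class CumulativeVoxel:
    """Point-free voxel statistics: store only n, sum(p), sum(p p^T),
    so the planar fit updates without caching the original points.
    Illustrative sketch of the idea in the abstract."""

    def __init__(self, dim=3):
        self.n = 0
        self.s = np.zeros(dim)          # running sum of points
        self.ss = np.zeros((dim, dim))  # running sum of outer products

    def add(self, p):
        p = np.asarray(p, dtype=float)
        self.n += 1
        self.s += p
        self.ss += np.outer(p, p)

    def merge(self, other):
        """Coalesce two voxels by adding their sufficient statistics."""
        self.n += other.n
        self.s += other.s
        self.ss += other.ss

    def plane_normal(self):
        mean = self.s / self.n
        cov = self.ss / self.n - np.outer(mean, mean)
        # Eigenvector of the smallest eigenvalue approximates the plane normal.
        w, v = np.linalg.eigh(cov)      # eigh sorts eigenvalues ascending
        return v[:, 0]

voxel = CumulativeVoxel()
for x in range(5):
    for y in range(5):
        voxel.add((x, y, 0.0))          # points on the z = 0 plane
normal = voxel.plane_normal()           # close to (0, 0, ±1)
```

Because `merge` just sums statistics, lazily coalescing voxels on the same physical plane (as the abstract describes) costs a few vector additions rather than re-iterating any points.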

IJCAI Conference 2024 Conference Paper

Carbon Market Simulation with Adaptive Mechanism Design

  • Han Wang
  • Wenhao Li
  • Hongyuan Zha
  • Baoxiang Wang

A carbon market is a market-based tool that incentivizes economic agents to align individual profits with the global utility, i.e., reducing carbon emissions to tackle climate change. Cap and trade stands as a critical principle based on allocating and trading carbon allowances (carbon emission credits), enabling economic agents to follow planned emissions and penalizing excess emissions. A central authority is responsible for introducing and allocating those allowances in cap and trade. However, the complexity of carbon market dynamics makes accurate simulation intractable, which in turn hinders the design of effective allocation strategies. To address this, we propose an adaptive mechanism design framework, simulating the market using hierarchical, model-free multi-agent reinforcement learning (MARL). Government agents allocate carbon credits, while enterprises engage in economic activities and carbon trading. This framework illustrates agents' behavior comprehensively. Numerical results show that MARL enables government agents to balance productivity, equality, and carbon emissions. Our project is available at https://anonymous.4open.science/r/Carbon-Simulator.

NeurIPS Conference 2024 Conference Paper

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images

  • Zhecan Wang
  • Junzhang Liu
  • Chia-Wei Tang
  • Hani Alomari
  • Anushka Sivakumar
  • Rui Sun
  • Wenhao Li
  • Hammad Ayyubi

Existing vision-language understanding benchmarks largely consist of images of objects in their usual contexts. As a consequence, recent multimodal large language models can perform well with only a shallow visual understanding by relying on background language biases. Thus, strong performance on these benchmarks does not necessarily correlate with strong visual understanding. In this paper, we release JourneyBench, a comprehensive human-annotated benchmark of generated images designed to assess the model's fine-grained multimodal reasoning abilities across five tasks: complementary multimodal chain of thought, multi-image VQA, imaginary image captioning, VQA with hallucination triggers, and fine-grained retrieval with sample-specific distractors. Unlike existing benchmarks, JourneyBench explicitly requires fine-grained multimodal reasoning in unusual imaginary scenarios where language bias and holistic image gist are insufficient. We benchmark state-of-the-art models on JourneyBench and analyze performance along a number of fine-grained dimensions. Results across all five tasks show that JourneyBench is exceptionally challenging for even the best models, indicating that models' visual reasoning abilities are not as strong as they first appear. We discuss the implications of our findings and propose avenues for further research.

ICRA Conference 2024 Conference Paper

Learning Dual-arm Object Rearrangement for Cartesian Robots

  • Shishun Zhang
  • Qijin She
  • Wenhao Li
  • Chenyang Zhu 0002
  • Yongjun Wang
  • Ruizhen Hu
  • Kai Xu 0004

This work focuses on the dual-arm object rearrangement problem abstracted from a realistic industrial scenario of Cartesian robots. The goal of this problem is to transfer all the objects from sources to targets with the minimum total completion time. To achieve the goal, the core idea is to develop an effective object-to-arm task assignment strategy for minimizing the cumulative task execution time and maximizing the dual-arm cooperation efficiency. One of the difficulties in the task assignment is the scalability problem. As the number of objects increases, the computation time of traditional offline-search-based methods grows sharply due to their computational complexity. Encouraged by the adaptability of reinforcement learning (RL) in long-sequence task decisions, we propose an online task assignment decision method based on RL, whose computation time increases only linearly with the number of objects. Further, we design an attention-based network to model the dependencies between the input states during the whole task execution process, helping to find the most reasonable object-to-arm correspondence in each task assignment round. In the experimental part, we adapt some search-based methods to this specific setting and compare our method with them. Experimental results show that our approach outperforms search-based methods in total execution time and computational efficiency, and also verify the generalization of our method to different numbers of objects. In addition, we show the effectiveness of our method deployed on a real robot in the supplementary video.

TMLR Journal 2024 Journal Article

LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations

  • Yudong Xu
  • Wenhao Li
  • Pashootan Vaezipoor
  • Scott Sanner
  • Elias Boutros Khalil

Can a Large Language Model (LLM) solve simple abstract reasoning problems? We explore this broad question through a systematic analysis of GPT on the Abstraction and Reasoning Corpus (ARC), a representative benchmark of abstract reasoning ability from limited examples in which solutions require some "core knowledge" of concepts such as objects, goal states, counting, and basic geometry. GPT-4 solves only 13/50 of the most straightforward ARC tasks when using textual encodings for their two-dimensional input-output grids. Our failure analysis reveals that GPT-4's capacity to identify objects and reason about them is significantly influenced by the sequential nature of the text that represents an object within a text encoding of a task. To test this hypothesis, we design a new benchmark, the 1D-ARC, which consists of one-dimensional (array-like) tasks that are more conducive to GPT-based reasoning, and where it indeed performs better than on the (2D) ARC. To alleviate this issue, we propose an object-based representation that is obtained through an external tool, resulting in nearly doubling the performance on solved ARC tasks and near-perfect scores on the easier 1D-ARC. Although the state-of-the-art GPT-4 is unable to "reason" perfectly within non-language domains such as the 1D-ARC or a simple ARC subset, our study reveals that the use of object-based representations can significantly improve its reasoning ability. Visualizations, GPT logs, and data are available at https://khalil-research.github.io/LLM4ARC.
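The object-based representation credited above with nearly doubling performance can be approximated by a simple connected-component pass over an ARC grid (a hedged sketch, not the paper's external tool; the function name and grid conventions are assumptions): instead of feeding the model raw rows of digits, each same-colored connected region is extracted as a discrete object.

```python
# Sketch: extract 4-connected, same-colored objects from an ARC-style
# grid (list of rows of ints, 0 = background). Each object is returned
# as (color, sorted list of (row, col) cells).

def extract_objects(grid, background=0):
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background or (r, c) in seen:
                continue
            # Flood-fill one component of this color.
            color, stack, cells = grid[r][c], [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append((color, sorted(cells)))
    return objects
```

Serializing such object lists, rather than the grid's row-by-row text, removes the sequential-text artifact that the failure analysis identifies as the bottleneck for 2D reasoning.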

ICRA Conference 2024 Conference Paper

Synchronized Dual-arm Rearrangement via Cooperative mTSP

  • Wenhao Li
  • Shishun Zhang
  • Sisi Dai
  • Hui Huang 0004
  • Ruizhen Hu
  • Xiaohong Chen
  • Kai Xu 0004

Synchronized dual-arm rearrangement is widely studied as a common scenario in industrial applications. It often faces scalability challenges due to the computational complexity of robotic arm rearrangement and the high-dimensional nature of dual-arm planning. To address these challenges, we formulate the problem as cooperative mTSP, a variant of mTSP where agents share cooperative costs, and utilize reinforcement learning to solve it. Our approach represents rearrangement tasks using a task state graph that captures spatial relationships and a cooperative cost matrix that provides details about action costs. Taking these representations as observations, we design an attention-based network to effectively combine them and provide rational task scheduling. Furthermore, a cost predictor is also introduced to directly evaluate actions during both training and planning, significantly expediting the planning process. Our experimental results demonstrate that our approach outperforms existing methods in terms of both performance and planning efficiency.

JMLR Journal 2023 Journal Article

Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection

  • Wenhao Li
  • Ningyuan Chen
  • L. Jeff Hong

We consider a contextual online learning (multi-armed bandit) problem with high-dimensional covariate $x$ and decision $y$. The reward function to learn, $f(x,y)$, does not have a particular parametric form. The literature has shown that the optimal regret is $\tilde{O}(T^{(d_x\!+\!d_y\!+\!1)/(d_x\!+\!d_y\!+\!2)})$, where $d_x$ and $d_y$ are the dimensions of $x$ and $y$, and thus it suffers from the curse of dimensionality. In many applications, only a small subset of variables in the covariate affect the value of $f$, which is referred to as sparsity in statistics. To take advantage of the sparsity structure of the covariate, we propose a variable selection algorithm called BV-LASSO, which incorporates novel ideas such as binning and voting to apply LASSO to nonparametric settings. Using it as a subroutine, we can achieve the regret $\tilde{O}(T^{(d_x^*\!+\!d_y\!+\!1)/(d_x^*\!+\!d_y\!+\!2)})$, where $d_x^*$ is the effective covariate dimension. The regret matches the optimal regret when the covariate is $d^*_x$-dimensional and thus cannot be improved. Our algorithm may serve as a general recipe to achieve dimension reduction via variable selection in nonparametric settings.
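To make the dimension-reduction claim concrete (the numbers here are an illustration of ours, not from the paper): with covariate dimension $d_x = 10$, decision dimension $d_y = 1$, and only $d_x^* = 2$ relevant covariates, the regret exponent improves as

```latex
\tilde{O}\bigl(T^{(d_x+d_y+1)/(d_x+d_y+2)}\bigr)
  = \tilde{O}\bigl(T^{12/13}\bigr)
\quad\longrightarrow\quad
\tilde{O}\bigl(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)}\bigr)
  = \tilde{O}\bigl(T^{4/5}\bigr),
```

i.e., from roughly $T^{0.92}$ to $T^{0.80}$: once the irrelevant covariates are screened out by variable selection, the sublinear regret rate depends only on the effective dimension.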

AAMAS Conference 2023 Conference Paper

Diverse Policy Optimization for Structured Action Space

  • Wenhao Li
  • Baoxiang Wang
  • Shanchao Yang
  • Hongyuan Zha

Enhancing the diversity of policies is beneficial for robustness, exploration, and transfer in reinforcement learning (RL). In this paper, we aim to seek diverse policies in an under-explored setting, namely RL tasks with structured action spaces exhibiting the two properties of composability and local dependencies. The complex action structure, non-uniform reward landscape, and subtle hyperparameter tuning due to the properties of structured actions prevent existing approaches from scaling well. We propose a simple and effective RL method, Diverse Policy Optimization (DPO), which models the policies in a structured action space as energy-based models (EBMs), following the probabilistic RL framework. GFlowNet, a recently proposed and powerful generative model, is introduced as an efficient, diverse EBM-based policy sampler. DPO follows a joint optimization framework: the outer layer uses the diverse policies sampled by the GFlowNet to update the EBM-based policies, which in turn supports the GFlowNet training in the inner layer. Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies in challenging scenarios and substantially outperform existing state-of-the-art methods.

JMLR Journal 2023 Journal Article

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang
  • Junchi Yan
  • Hongyuan Zha

Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications due to non-interactivity between agents, the curse of dimensionality, and computational complexity. This has motivated several decentralized MARL algorithms. However, existing decentralized methods only handle the fully cooperative setting, where massive amounts of information need to be transmitted during training. The block coordinate gradient descent scheme they use for successive independent actor and critic steps simplifies the calculation but causes serious bias. This paper proposes a flexible, fully decentralized actor-critic MARL framework, which can combine most actor-critic methods and handle large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, the proposed framework can achieve scalability and stability in large-scale environments. The framework also reduces information transmission through a parameter-sharing mechanism and novel modeling-other-agents methods based on theory of mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that the proposed decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.

NeurIPS Conference 2023 Conference Paper

Information Design in Multi-Agent Reinforcement Learning

  • Yue Lin
  • Wenhao Li
  • Hongyuan Zha
  • Baoxiang Wang

Reinforcement learning (RL) is inspired by the way human infants and animals learn from the environment. The setting is somewhat idealized because, in actual tasks, other agents in the environment have their own goals and behave adaptively to the ego agent. To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful. Research in computational economics distills two ways to influence others directly: by providing tangible goods (mechanism design) and by providing information (information design). This work investigates information design problems for a group of RL agents. The main challenges are two-fold. One is that the information provided will immediately affect the transition of the agent trajectories, which introduces additional non-stationarity. The other is that the information can be ignored, so the sender must provide information that the receiver is willing to respect. We formulate the Markov signaling game, and develop the notions of the signaling gradient and the extended obedience constraints that address these challenges. Our algorithm is efficient on various mixed-motive tasks and provides further insights into computational economics. Our code is publicly available at https://github.com/YueLin301/InformationDesignMARL.

AAMAS Conference 2023 Conference Paper

Learning Optimal "Pigovian Tax" in Sequential Social Dilemmas

  • Yun Hua
  • Shang Gao
  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang
  • Hongyuan Zha

In multi-agent reinforcement learning (MARL), each agent acts to maximize its individual accumulated rewards. Nevertheless, individual accumulated rewards cannot fully reflect how others perceive an agent, resulting in selfish behaviors that undermine global performance and give rise to social dilemmas. This paper adapts the well-known externality theory from economics to analyze social dilemmas in MARL, and proposes a method called Learning Optimal Pigovian Tax (LOPT) to internalize the externalities in MARL. Furthermore, a reward shaping mechanism based on the approximated optimal “Pigovian Tax” is applied to reduce the social cost of each agent and alleviate the social dilemmas. Compared with existing state-of-the-art methods, the proposed LOPT leads to higher collective social welfare in both the Escape Room and Cleanup environments, demonstrating the superiority of our method in solving social dilemmas.

AAMAS Conference 2023 Conference Paper

Learning Structured Communication for Multi-Agent Reinforcement Learning

  • Junjie Sheng
  • Xiangfeng Wang
  • Bo Jin
  • Wenhao Li
  • Jun Wang
  • Junchi Yan
  • Tsung-Hui Chang
  • Hongyuan Zha

This paper investigates multi-agent reinforcement learning (MARL) communication mechanisms in large-scale scenarios. We propose a novel framework, Learning Structured Communication (LSC), that leverages a flexible and efficient communication topology. LSC enables adaptive agent grouping to create diverse hierarchical formations over episodes generated through an auxiliary task and a hierarchical routing protocol. We learn a hierarchical graph neural network with the formed topology that facilitates effective message generation and propagation between inter- and intra-group communications. Unlike state-of-the-art communication mechanisms, LSC possesses a detailed and learnable design for hierarchical communication. Numerical experiments on challenging tasks demonstrate that the proposed LSC exhibits high communication efficiency and global cooperation capability.

AAMAS Conference 2023 Conference Paper

Model-Based Reinforcement Learning for Auto-bidding in Display Advertising

  • Shuang Chen
  • Qisen Xu
  • Liang Zhang
  • Yongbo Jin
  • Wenhao Li
  • Linjian Mo

Real-time bidding (RTB) has achieved outstanding success in online display advertising, which has become one of the most influential businesses. Given historical ad impressions under the second-price auction mechanism, the advertiser’s optimal bidding strategy is determined by a core parameter corresponding to the optimal solution of a constrained optimization problem. However, the sequentially arriving impressions in online display advertising make it highly non-trivial to obtain the optimal core parameter in advance without knowing the complete impression set. For this reason, recent methods have generally transformed the core-parameter determination problem into a sequential parameter-adjustment problem and solved it using reinforcement learning (RL). This paper proposes a simple and effective Model-Based Automatic Bidding algorithm, MBAB, which explicitly models the uncertainty of the dynamic auction environment and then uses dynamic programming to obtain the current optimal adjustment of the core parameter. MBAB avoids burdensome simulated-environment construction and, free of the thorny sim-to-real issue of model-free methods, is more suitable for production deployment. Furthermore, MBAB uses the optimal bidding formula to carry out coarse-grained modeling of the online market environment, alleviating the scalability problem caused by the fine-grained environment modeling of previous model-based methods. To accurately describe the impression distribution and non-stationarity of the online market environment, we introduce a probabilistic modeling method and propose a novel monotonicity constraint to regulate the model output. Numerical experiments show that the proposed MBAB substantially outperforms existing baselines on various constrained RTB tasks in the production environment.

IJCAI Conference 2022 Conference Paper

Clickbait Detection via Contrastive Variational Modelling of Text and Label

  • Xiaoyuan Yi
  • Jiarui Zhang
  • Wenhao Li
  • Xiting Wang
  • Xing Xie

Clickbait refers to deliberately created sensational or deceptive text for tricking readers into clicking, which severely hurts the web ecosystem. With a growing number of clickbaits on social media, developing automatic detection methods becomes essential. Nonetheless, the performance of existing neural classifiers is limited due to the underutilization of small labelled datasets. Inspired by related pedagogy theories that learning to write can promote comprehension ability, we propose a novel Contrastive Variational Modelling (CVM) framework to exploit the labelled data better. CVM models the conditional distributions of text and clickbait labels by predicting labels from text and generating text from labels simultaneously with Variational AutoEncoder and further differentiates the learned spaces under each label by a mixed contrastive learning loss. In this way, CVM can capture more underlying textual properties and hence utilize label information to its full potential, boosting detection performance. We theoretically demonstrate CVM as learning a joint distribution of text, clickbait label, and latent variable. Experiments on three clickbait detection datasets show our method's robustness to inadequate and biased labels, outperforming several recent strong baselines.

IJCAI Conference 2022 Conference Paper

VMAgent: A Practical Virtual Machine Scheduling Platform

  • Junjie Sheng
  • Shengliang Cai
  • Haochuan Cui
  • Wenhao Li
  • Yun Hua
  • Bo Jin
  • Wenli Zhou
  • Yiqiu Hu

Virtual machine (VM) scheduling is one of the critical tasks in cloud computing. Many works have attempted to incorporate machine learning, especially reinforcement learning, to empower VM scheduling procedures. Although improved results are shown in several demo simulators, the performance in real-world scenarios is still underexploited. In this paper, we design a practical VM scheduling platform, i.e., VMAgent, to assist researchers in developing their methods for the VM scheduling problem. VMAgent consists of three components: simulator, scheduler, and visualizer. The simulator abstracts three general realistic scheduling scenarios (fading, recovering, and expansion) based on Huawei Cloud’s scheduling data, and is the core of our platform. Flexible configurations are further provided to make the simulator compatible with practical cloud computing architectures (i.e., Multi Non-Uniform Memory Access) and scenarios. Researchers then instantiate the scheduler to interact with the simulator, which also comes pre-built with various types (e.g., heuristic, machine learning, and operations research) of scheduling algorithms to speed up algorithm design. The visualizer, as an auxiliary component of the simulator and scheduler, helps researchers conduct an in-depth analysis of the scheduling procedure and comprehensively compare different scheduling algorithms. We believe that VMAgent will shed light on AI for the VM scheduling community, and a demo video is presented at https://bit.ly/vmagent-demo-video.

AAMAS Conference 2021 Conference Paper

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

  • Wenhao Li
  • Xiangfeng Wang
  • Bo Jin
  • Junjie Sheng
  • Yun Hua
  • Hongyuan Zha

When solving a complex task, humans spontaneously form teams to complete different parts of the whole task, and cooperation between teammates improves efficiency. However, in current cooperative MARL methods, the cooperation team is constructed through either heuristics or end-to-end black-box optimization. To improve the efficiency of cooperation and exploration, we propose a structured diversification emergence MARL framework named Rochico, based on reinforced organization control and hierarchical consensus learning. Rochico first learns an adaptive grouping policy through the organization control module, which is established by independent multi-agent reinforcement learning. Further, the hierarchical consensus module, based on hierarchical intentions with a consensus constraint, is introduced after team formation. Utilizing the hierarchical consensus module and a self-supervised intrinsic-reward-enhanced decision module, the proposed cooperative MARL algorithm Rochico outputs the final diversified multi-agent cooperative policy. All three modules are organically combined to promote structured diversification emergence. Comparative experiments on four large-scale cooperation tasks show that Rochico significantly outperforms current SOTA algorithms in terms of exploration efficiency and cooperation strength.

AAAI Conference 2020 Conference Paper

MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space

  • Xiaoyuan Yi
  • Ruoyu Li
  • Cheng Yang
  • Wenhao Li
  • Maosong Sun

As an essential step towards computer creativity, automatic poetry generation has gained increasing attention in recent years. Though recent neural models make prominent progress on some criteria of poetry quality, generated poems still suffer from poor diversity. Studies in the related literature show that different factors, such as life experience and historical background, influence the composition styles of poets, which considerably contributes to the high diversity of human-authored poetry. Inspired by this, we propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity. Based on a semi-supervised variational autoencoder, our model disentangles the latent space into subspaces, each conditioned on one influence factor by adversarial training. In this way, the model learns a controllable latent variable to capture and mix generalized factor-related properties. Different factor mixtures lead to diverse styles and hence further differentiate generated poems from each other. Experimental results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.

IJCAI Conference 2020 Conference Paper

Text Style Transfer via Learning Style Instance Supported Latent Space

  • Xiaoyuan Yi
  • Zhenghao Liu
  • Wenhao Li
  • Maosong Sun

Text style transfer pursues altering the style of a sentence while keeping its main content unchanged. Due to the lack of parallel corpora, most recent work focuses on unsupervised methods and has achieved noticeable progress. Nonetheless, the intractability of completely disentangling content from style in text leads to a contradiction between content preservation and style transfer accuracy. To address this problem, we propose a style-instance-supported method, StyIns. Instead of representing styles with embeddings or latent variables learned from single sentences, our model leverages the generative flow technique to extract underlying stylistic properties from multiple instances of each style, which form a more discriminative and expressive latent style space. By combining such a space with an attention-based structure, our model can better maintain the content and simultaneously achieve high transfer accuracy. Furthermore, the proposed method can be flexibly extended to semi-supervised learning so as to utilize the limited paired data available. Experiments on three transfer tasks, sentiment modification, formality rephrasing, and poeticness generation, show that StyIns obtains a better balance between content and style, outperforming several recent baselines.

IJCAI Conference 2019 Conference Paper

Sentiment-Controllable Chinese Poetry Generation

  • Huimin Chen
  • Xiaoyuan Yi
  • Maosong Sun
  • Wenhao Li
  • Cheng Yang
  • Zhipeng Guo

Expressing diverse sentiments is one of the main purposes of human poetry creation. Existing Chinese poetry generation models have made great progress in poetry quality, but they all neglect to endow generated poems with specific sentiments. Such a defect leads to strong sentiment collapse or bias and thus hurts the diversity and semantics of generated poems. Meanwhile, there are few sentimental Chinese poetry resources available for study. To address this problem, we first collect a manually-labelled sentimental poetry corpus with fine-grained sentiment labels. Then we propose a novel semi-supervised conditional Variational Auto-Encoder model for sentiment-controllable poetry generation. Besides, since poetry is discourse-level text in which the polarity and intensity of sentiment can shift between lines, we incorporate a temporal module to capture sentiment transition patterns across lines. Experimental results show our model can control the sentiment of not only a whole poem but also each line, and improves poetry diversity against state-of-the-art models without losing quality.