Arrow Research search

Author name cluster

Lu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

49 papers
2 author rows

Possible papers

49

AAAI Conference 2026 Conference Paper

Blur-Robust Detection via Feature Restoration: An End-to-End Framework for Prior-Guided Infrared UAV Target Detection

  • Xiaolin Wang
  • Houzhang Fang
  • Qingshan Li
  • Lu Wang
  • Yi Chang
  • Luxin Yan

Infrared unmanned aerial vehicle (UAV) target images often suffer from motion blur degradation caused by rapid sensor movement, significantly reducing contrast between target and background. Generally, detection performance heavily depends on the discriminative feature representation between target and background. Existing methods typically treat deblurring as a preprocessing step focused on visual quality, while neglecting the enhancement of task-relevant features crucial for detection. Improving feature representation for detection under blur conditions remains challenging. In this paper, we propose a novel Joint Feature-Domain Deblurring and Detection end-to-end framework, dubbed JFD³. We design a dual-branch architecture with shared weights, where the clear branch guides the blurred branch to enhance discriminative feature representation. Specifically, we first introduce a lightweight feature restoration network, where features from the clear branch serve as feature-level supervision to guide the blurred branch, thereby enhancing its distinctive capability for detection. We then propose a frequency structure guidance module that refines the structure prior from the restoration network and integrates it into shallow detection layers to enrich target structural information. Finally, a feature consistency self-supervised loss is imposed between the dual-branch detection backbones, driving the blurred branch to approximate the feature representations of the clear one. We also construct a benchmark, named IRBlurUAV, containing 30,000 simulated and 4,118 real infrared UAV target images with diverse motion blur. Extensive experiments on IRBlurUAV demonstrate that JFD³ achieves superior detection performance while maintaining real-time efficiency.
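The feature consistency loss described in this abstract can be sketched minimally; the mean-squared distance below is an assumed stand-in for whatever distance JFD³ actually uses, and all names are illustrative:

```python
import numpy as np

def feature_consistency_loss(feat_blur, feat_clear):
    """Self-supervised consistency sketch: penalize the blurred
    branch's features for deviating from the clear branch's
    features (which would be detached from the gradient in a real
    dual-branch setup). MSE is an assumed choice of distance."""
    return float(np.mean((feat_blur - feat_clear) ** 2))
```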

AAAI Conference 2026 Conference Paper

Heterogeneous Complementary Distillation

  • Liuchi Xu
  • Hao Zheng
  • Lu Wang
  • Lisheng Xu
  • Jun Cheng

Knowledge distillation (KD) transfers the "dark knowledge" from a complex teacher model to a compact student model. However, heterogeneous architecture distillation, such as Vision Transformer (ViT) to ResNet18, faces challenges due to differences in spatial feature representations. Traditional KD methods are mostly designed for homogeneous architectures and hence struggle to effectively address this disparity. Although heterogeneous KD approaches have been developed recently to solve these issues, they often incur high computational costs and complex designs, or rely too heavily on logit alignment, which limits their ability to leverage complementary features. To overcome these limitations, we propose Heterogeneous Complementary Distillation (HCD), a simple yet effective framework that integrates complementary teacher and student features to align representations in shared logits. These logits are decomposed and constrained to facilitate diverse knowledge transfer to the student. Specifically, HCD processes the student's intermediate features through a convolutional projector and adaptive pooling, concatenates them with the teacher's features from the penultimate layer, and then maps them via the Complementary Feature Mapper (CFM) module, comprising a fully connected layer, to produce shared logits. We further introduce Sub-logit Decoupled Distillation (SDD), which partitions the shared logits into n sub-logits that are fused with the teacher's logits to rectify classification. To ensure sub-logit diversity and reduce redundant knowledge transfer, we propose an Orthogonality Loss (OL). By preserving student-specific strengths and leveraging teacher knowledge, HCD enhances robustness and generalization in students. Extensive experiments on the CIFAR-100, fine-grained (e.g., CUB200, Aircraft) and ImageNet-1K datasets demonstrate that HCD outperforms state-of-the-art KD methods, establishing it as an effective solution for heterogeneous KD.
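The sub-logit diversity idea can be illustrated with a toy orthogonality penalty; the pairwise-cosine formulation below is an assumption for illustration, not the authors' implementation:

```python
import numpy as np

def sub_logit_orthogonality_loss(shared_logits, n_sub):
    """Toy sketch: split shared logits into n_sub sub-logits and
    penalize pairwise overlap between them, a stand-in for the
    Orthogonality Loss named in the abstract."""
    B, D = shared_logits.shape
    assert D % n_sub == 0
    subs = shared_logits.reshape(B, n_sub, D // n_sub)
    # Normalize each sub-logit, then penalize squared cosine
    # similarity between every pair so they encode distinct knowledge.
    norm = subs / (np.linalg.norm(subs, axis=-1, keepdims=True) + 1e-8)
    gram = np.einsum('bnd,bmd->bnm', norm, norm)   # pairwise cosines
    off_diag = gram - np.eye(n_sub)                # drop self-similarity
    return float(np.mean(off_diag ** 2))
```

Orthogonal sub-logits incur no penalty; identical sub-logits incur the maximum.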

TMLR Journal 2026 Journal Article

Process Reward Models That Think

  • Muhammad Khalifa
  • Rishabh Agarwal
  • Lajanugen Logeswaran
  • Jaekyeom Kim
  • Hao Peng
  • Moontae Lee
  • Honglak Lee
  • Lu Wang

Step-by-step verifiers—also known as process reward models (PRMs)—are a key ingredient for test-time scaling, but training them requires expensive step-level supervision. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs. Our approach capitalizes on the inherent reasoning abilities of long CoT models, and outperforms LLM-as-a-Judge and discriminative verifiers—using only 1% of the process labels in PRM800K—across several challenging benchmarks. Specifically, ThinkPRM beats the baselines on ProcessBench, MATH-500, and AIME ’24 under best-of-N selection and reward-guided search. In an out-of-domain evaluation over subsets of GPQA-Diamond and LiveCodeBench, our PRM surpasses discriminative verifiers trained with the full PRM800K by 8% and 4.5%, respectively. Lastly, under the same token budget, ThinkPRM scales up verification compute more effectively compared to LLM-as-a-Judge, outperforming it by 7.2% on a subset of ProcessBench. This work highlights the value of generative, long CoT PRMs that can scale test-time compute for verification while requiring minimal supervision for training.
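Best-of-N selection with a step-wise verifier, as evaluated above, reduces to a simple ranking rule once per-step scores exist; the product aggregation below is one common choice, not necessarily ThinkPRM's, and the PRM itself is assumed external:

```python
import math

def best_of_n(candidates, step_scores):
    """Hedged sketch of best-of-N selection with a process reward
    model (PRM): each candidate solution has per-step correctness
    scores in [0, 1]; rank candidates by the product of their step
    scores (every step must look correct) and return the winner."""
    def aggregate(scores):
        return math.prod(scores)
    best = max(zip(candidates, step_scores), key=lambda cs: aggregate(cs[1]))
    return best[0]
```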

TMLR Journal 2026 Journal Article

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

  • Mengzhuo Chen
  • Jiani Zheng
  • Lu Wang
  • Fangkai Yang
  • Chaoyun Zhang
  • Lingrui Mei
  • Wenjie Yin
  • Qingwei Lin

Training Vision-Language Models (VLMs) for Graphical User Interface (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples action utility learning from policy optimization by leveraging a pretrained Value Environment Model (VEM), which requires no live environment interaction during policy optimization. VEM predicts value-aligned action utilities directly from offline data, distilling human-like priors about GUI interaction outcomes without requiring next-state prediction or environmental feedback. This avoids compounding errors and enhances resilience to UI changes by focusing on semantic reasoning (e.g., “Does this action advance the user’s goal?”). The framework operates in two stages: (1) pretraining VEM to learn action-level utility signals and (2) guiding policy exploration with frozen VEM signals, enabling layout-agnostic GUI automation. Evaluated across diverse benchmarks including Android-in-the-Wild for mobile apps and Multimodal-Mind2Web for web environments, VEM achieves state-of-the-art or highly competitive performance in both offline and online settings. It significantly outperforms environment-free baselines and matches or exceeds environment-based approaches, crucially without incurring interaction costs. Importantly, VEM demonstrates that robust, generalizable GUI agents can be trained efficiently using semantic-aware action utility prediction, proving effective across distinct interaction platforms like mobile and web. The code is available at https://github.com/microsoft/GUI-Agent-RL.
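Stage (2) of the abstract, exploration guided by a frozen value model, can be sketched as greedy selection over predicted utilities; `value_model`, `state_features`, and the action strings are all illustrative placeholders:

```python
import numpy as np

def choose_action(value_model, state_features, candidate_actions):
    """Sketch of value-guided action selection: a frozen value
    environment model scores each candidate GUI action for the
    current state, and the agent acts greedily over those
    utilities. value_model is any callable (state, action) -> float."""
    utilities = [value_model(state_features, a) for a in candidate_actions]
    return candidate_actions[int(np.argmax(utilities))]
```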

JBHI Journal 2025 Journal Article

A Blockchain-Enabled AI-Driven Secure Searchable Encryption Framework for Medical IoT Systems

  • Salabat Khan
  • Mansoor Khan
  • Muhammad Asghar Khan
  • Muhammad Attique Khan
  • Lu Wang
  • Kaishun Wu

Blockchain technology is widely adopted in the Internet of Medical Things (IoMT) for information storage and retrieval. The integration of blockchain with IoMT systems enhances security; however, it raises privacy and security concerns in data searching and storage. This study proposes a novel Binary Spring Search (BSS) technique based on group theory and integrated with a hybrid deep neural network approach to enhance the security and trustworthiness of IoMT. The proposed method incorporates secure key revocation and dynamic policy updates. The proposed framework leverages blockchain technology for immutable and decentralized data management, Artificial Intelligence (AI) for dynamic data analysis and threat detection, and advanced searchable encryption techniques to facilitate secure and efficient data queries. The proposed patient-centered data access model, which combines blockchain technology with trust chains, makes our method safer and more efficient and demonstrates a return on investment. Furthermore, our blockchain-based architecture ensures the integrity and immutability of medical data generated by IoMT devices, allowing for decentralized and tamper-proof storage. We used the Hyperledger Fabric tool, known as OrigionLab, for simulations in a blockchain context. Our findings indicate that the suggested framework provides a more searchable and secure solution for the healthcare system than comparable methods. The simulation results show that our algorithm significantly reduces transaction time while maintaining high levels of security, making it a robust solution for managing Patient Health Records (PHRs) in a decentralized manner.

JBHI Journal 2025 Journal Article

Advancing Medical Innovation Through Blockchain-Secured Federated Learning for Smart Health

  • Salabat Khan
  • Mansoor Khan
  • Muhammad Asghar Khan
  • Lu Wang
  • Kaishun Wu

The rapid digitization of healthcare systems has led to a vast accumulation of electronic medical records (EMRs), offering an invaluable source of patient data that can significantly advance medical research and improve patient care. However, sharing EMRs for research purposes presents challenges, particularly concerning data privacy, security, and the limitations of traditional centralized data-sharing models. This paper introduces a novel approach that leverages blockchain technology to facilitate federated learning with EMRs, thereby addressing these challenges. Federated learning enables multiple institutions to collaboratively train a robust machine learning model without sharing raw data, preserving privacy and security. By integrating blockchain, this framework enhances data integrity, immutability, and trust, all in a decentralized environment. Blockchain serves as a transparent and secure ledger, recording model updates and aggregating them through a consensus-based mechanism. Smart contracts further enforce data usage policies, allowing only authorized access and maintaining control over data ownership and sharing. This approach empowers medical researchers and institutions to collaborate more effectively, accelerating the discovery of treatments, advancements in personalized medicine, and insights into rare diseases. It also enables patients to contribute to medical research while retaining control over their personal data, fostering a patient-centered approach to healthcare innovation. Experimental results confirm the efficacy and efficiency of this blockchain-enabled federated learning framework, highlighting its potential to transform medical research and adhere to stringent privacy and security standards. This study emphasizes the pivotal role of blockchain in enhancing Big Data analytics within healthcare, paving the way for improved collaboration, innovation, and patient outcomes.

AAAI Conference 2025 Conference Paper

Debiased Distillation for Consistency Regularization

  • Lu Wang
  • Liuchi Xu
  • Xiong Yang
  • Zhenhua Huang
  • Jun Cheng

Knowledge distillation transfers "dark knowledge" from a large teacher model to a smaller student model, yielding a highly efficient network. To improve the network's generalization ability, existing works use a larger temperature coefficient for knowledge distillation. Nevertheless, these methods may lower the target category's confidence and lead to ambiguous recognition of similar samples. To mitigate this issue, some studies introduce intra-batch distillation to reduce prediction discrepancy. However, these methods overlook the inconsistency between background information and the target category, which may increase prediction bias due to noise disturbance. Additionally, label imbalance from random sampling and batch size can undermine the network's generalization reliability. To tackle these challenges, we propose a simple yet effective Intra-class Knowledge Distillation (IKD) method that facilitates knowledge sharing within the same class to ensure consistent predictions. First, we initialize a matrix and a vector to store the logits and class counts provided by the teacher, respectively. Then, in the first epoch, we calculate the sum of logits and sample counts per class and perform KD to prevent knowledge omission. Finally, in subsequent training, we update the matrix to obtain the average logits and compute the KL divergence between the student's output and the updated matrix according to the label index. This process ensures intra-class consistency, theoretically reduces prediction bias, and improves the student's performance. Extensive experiments on the CIFAR-100, ImageNet-1K, and Tiny-ImageNet datasets validate the superiority of IKD.
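The matrix-and-vector bookkeeping described above can be sketched directly; the update rule and the KL direction below are plausible readings of the abstract, with illustrative variable names:

```python
import numpy as np

def update_class_logits(matrix, counts, teacher_logits, labels):
    """Running per-class logit store: matrix[c] accumulates teacher
    logits for class c and counts[c] the samples seen, so
    matrix[c] / counts[c] is the class-average logit vector."""
    for logit, y in zip(teacher_logits, labels):
        matrix[y] += logit
        counts[y] += 1
    return matrix, counts

def intra_class_kl(student_logits, labels, matrix, counts, tau=1.0):
    """KL(class-average teacher || student), averaged over samples."""
    def softmax(z):
        z = z - z.max()
        e = np.exp(z / tau)
        return e / e.sum()
    total = 0.0
    for logit, y in zip(student_logits, labels):
        p = softmax(matrix[y] / counts[y])   # class-average teacher dist
        q = softmax(logit)
        total += float(np.sum(p * (np.log(p) - np.log(q))))
    return total / len(labels)
```

A student that matches its class average incurs zero divergence, which is the intra-class consistency the abstract targets.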

ICML Conference 2025 Conference Paper

Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training

  • Mozhi Zhang
  • Howe Tissue
  • Lu Wang
  • Xipeng Qiu

We introduce Domain2Vec, a novel approach that decomposes any dataset into a linear combination of several meta-domains, a new concept designed to capture the key underlying features of datasets. Domain2Vec maintains a vocabulary of meta-domains and uses a classifier to decompose any given dataset into a domain vector that corresponds to a distribution over this vocabulary. These domain vectors enable the identification of the optimal data mixture for language model (LM) pretraining in a training-free manner under the Distribution Alignment Assumption (DA$^{2}$), which suggests that when the data distributions of the training set and the validation set are more aligned, a lower validation loss is achieved. Moreover, Domain2Vec can be seamlessly integrated into previous works to model the relationship between domain vectors and LM performance, greatly enhancing the efficiency and scalability of previous methods. Extensive experiments demonstrate that Domain2Vec helps find the data mixture that enhances downstream task performance with minimal computational overhead. Specifically, Domain2Vec achieves the same validation loss on Pile-CC using only 51.5% of the compute required when training on the original mixture of The Pile dataset. Under an equivalent compute budget, Domain2Vec improves downstream performance by an average of 2.83%.
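The Distribution Alignment idea admits a tiny sketch: pick the mixture whose combined domain vector lies closest to the validation set's vector. The two-dataset grid search below is a simplification for illustration; the actual method is more general and all names are assumptions:

```python
import numpy as np

def best_mixture(dataset_vecs, val_vec, grid=101):
    """Toy Distribution Alignment search: each training dataset is a
    domain vector (a distribution over meta-domains); the mixture
    whose weighted-average vector is nearest the validation vector
    is predicted to give the lowest validation loss."""
    d1, d2 = dataset_vecs
    best_w, best_dist = None, float('inf')
    for w in np.linspace(0.0, 1.0, grid):
        mix = w * d1 + (1 - w) * d2          # mixture's domain vector
        dist = float(np.linalg.norm(mix - val_vec))
        if dist < best_dist:
            best_w, best_dist = float(w), dist
    return best_w
```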

NeurIPS Conference 2025 Conference Paper

FedWMSAM: Fast and Flat Federated Learning via Weighted Momentum and Sharpness-Aware Minimization

  • Tianle Li
  • Yongzhi Huang
  • Linshan Jiang
  • Chang Liu
  • Qipeng Xie
  • Wenfeng Du
  • Lu Wang
  • Kaishun Wu

In federated learning (FL), models must converge quickly under tight communication budgets while generalizing across non-IID client distributions. These twin requirements have naturally led to two widely used techniques: client/server momentum to accelerate progress, and sharpness-aware minimization (SAM) to prefer flat solutions. However, simply combining momentum and SAM leaves two structural issues unresolved in non-IID FL. We identify and formalize two failure modes: local–global curvature misalignment (local SAM directions need not reflect the global loss geometry) and momentum-echo oscillation (late-stage instability caused by accumulated momentum). To our knowledge, these failure modes have not been jointly articulated and addressed in the FL literature. We propose FedWMSAM to address both failure modes. First, we construct a momentum-guided global perturbation from server-aggregated momentum to align clients' SAM directions with the global descent geometry, enabling a single-backprop SAM approximation that preserves efficiency. Second, we couple momentum and SAM via a cosine-similarity adaptive rule, yielding an early-momentum, late-SAM two-phase training schedule. On the theory side, we provide a non-IID convergence bound that explicitly models the perturbation-induced variance $\sigma_\rho^2=\sigma^2+(L\rho)^2$ and its dependence on $(S, K, R, N)$. We conduct extensive experiments on multiple datasets and model architectures, and the results validate the effectiveness, adaptability, and robustness of our method, demonstrating its superiority in addressing the optimization challenges of federated learning. Our code is available at https://github.com/Li-Tian-Le/NeurlPS_FedWMSAM.
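The momentum-guided, single-backprop SAM step can be sketched in a few lines; this is a simplification of the idea in the abstract, and every name and default here is illustrative:

```python
import numpy as np

def momentum_sam_step(w, grad_fn, global_momentum, lr=0.1, rho=0.05):
    """Sketch of a momentum-guided SAM update: perturb the weights
    along the normalized server momentum (instead of the local
    gradient, saving one backprop), evaluate the gradient at the
    perturbed point, then descend."""
    m = np.asarray(global_momentum, dtype=float)
    eps = rho * m / (np.linalg.norm(m) + 1e-12)   # perturbation direction
    g = grad_fn(w + eps)                          # single backprop at w + eps
    return w - lr * g
```

On a quadratic loss with gradient `grad_fn = lambda w: w`, the step moves toward the flat minimum at the origin.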

IJCAI Conference 2025 Conference Paper

HARMONY: A Privacy-preserving and Sensor-agnostic Tele-monitoring system

  • Qipeng Xie
  • Hao Guo
  • Weizheng Wang
  • Yongzhi Huang
  • Linshan Jiang
  • Jiafei Wu
  • Shuxin Zhong
  • Lu Wang

Global aging necessitates tele-monitoring systems to provide real-time tracking and timely assistance for older adults living independently. While pervasive wireless devices (e.g., CSI, IMU, UWB) enable cost-effective, non-intrusive monitoring, existing systems lack flexibility, limiting their adaptability to different environments. In this work, we posit that the motion dynamics of human movement are invariant across sensing modalities, inspiring the design of HARMONY, a privacy-preserving, sensor-agnostic system that supports multi-modal inputs and diverse tele-monitoring tasks. HARMONY incorporates Modality-agnostic Data Processing to uniformly encrypt multi-modal signals and Task-specific Activity Recognition for seamless task adaptation. A novel Encrypted-processing Engine then significantly accelerates computations on encrypted data by optimizing matrix and convolution operations. Evaluations across five different sensing modalities show that HARMONY consistently achieves high accuracy while delivering 3.5× to 130× speedups over state-of-the-art baselines. Our results demonstrate that HARMONY is a practical, scalable, and privacy-centric prototype for next-generation remote healthcare.

TMLR Journal 2025 Journal Article

Large Action Models: From Inception to Implementation

  • Lu Wang
  • Fangkai Yang
  • Chaoyun Zhang
  • Junting Lu
  • Jiaxu Qian
  • Shilin He
  • Pu Zhao
  • Bo Qiao

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications.

ICML Conference 2025 Conference Paper

Learning Dynamics in Continual Pre-Training for Large Language Models

  • Xingjin Wang
  • Howe Tissue
  • Lu Wang
  • Linjing Li
  • Daniel Dajun Zeng

Continual Pre-Training (CPT) has become a popular and effective method to apply strong foundation models to specific downstream tasks. In this work, we explore the learning dynamics throughout the CPT process for large language models (LLMs). We specifically focus on how general and downstream domain performance evolves at each training step, with domain performance measured via validation losses. We have observed that the CPT loss curve fundamentally characterizes the transition from one curve to another hidden curve, and could be described by decoupling the effects of distribution shift and learning rate (LR) annealing. We derive a CPT scaling law that combines the two factors, enabling the prediction of loss at any (continual) training steps and across learning rate schedules (LRS) in CPT. Our formulation presents a comprehensive understanding of several critical factors in CPT, including the learning rate, the training steps, and the distribution distance between PT and CPT datasets. Moreover, our approach can be adapted to customize training hyper-parameters to different CPT goals such as balancing general and domain-specific performance. Extensive experiments demonstrate that our scaling law holds across various CPT datasets and training hyper-parameters.

NeurIPS Conference 2025 Conference Paper

MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?

  • Yunxiang Zhang
  • Muhammad Khalifa
  • Shitanshu Bhushan
  • Grant Murphy
  • Lajanugen Logeswaran
  • Jaekyeom Kim
  • Moontae Lee
  • Honglak Lee

We introduce MLRC-Bench, a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions, with a focus on open research problems that demand novel methodologies. Unlike prior work, e.g., AI Scientist, which evaluates the end-to-end agentic pipeline by using LLM-as-a-judge, MLRC-Bench measures the key steps of proposing and implementing novel research methods and evaluates them with a rigorous protocol and objective metrics. Our curated suite of 7 competition tasks reveals significant challenges for LLM agents. Even the best-performing tested agent (gemini-exp-1206 under MLAB) closes only 9.3% of the gap between baseline and top human participant scores. Furthermore, our analysis reveals a misalignment between LLM-judged innovation and actual performance on cutting-edge ML research problems. MLRC-Bench is a dynamic benchmark, designed to continually grow with new ML competitions to encourage rigorous and objective evaluations of AI’s research capabilities. Our leaderboard and code are publicly available at https://huggingface.co/spaces/launch/MLRC_Bench.
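The "closes only 9.3% of the gap" figure implies a simple relative-improvement metric; the function below is an assumed reading of that phrasing, not a confirmed detail of the benchmark's scoring code:

```python
def gap_closed(agent_score, baseline_score, human_score):
    """Fraction of the baseline-to-top-human gap that an agent
    closes; 0.0 means no improvement over baseline, 1.0 means
    matching the top human participant."""
    return (agent_score - baseline_score) / (human_score - baseline_score)
```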

NeurIPS Conference 2025 Conference Paper

Scaling Law with Learning Rate Annealing

  • Howe Tissue
  • Venus Wang
  • Lu Wang

We find that the cross-entropy loss curves of neural language models empirically adhere to a scaling law with learning rate (LR) annealing over training steps: $$L(s) = L_0 + A\cdot S_1^{-\alpha} - C\cdot S_2,$$ where $L(s)$ is the validation loss at step $s$, $S_1$ is the area under the LR curve, $S_2$ is the LR annealing area, and $L_0$, $A$, $C$, $\alpha$ are constant parameters. This formulation accounts for two main effects: (1) power-law scaling over data size, and (2) the additional loss reduction during LR annealing. Unlike previous studies that only fit losses at final steps, our formulation captures the entire training curve, allowing for parameter fitting using losses from any training step. Applying the scaling law with LR annealing and fitting only one or two training curves, we can accurately predict the loss at any given step under any learning rate scheduler (LRS). This approach significantly reduces the computational cost of formulating scaling laws while providing more accuracy and expressiveness. Extensive experiments demonstrate that our findings hold across a range of hyper-parameters and model architectures and extend to the scaling effect of model size. Moreover, our formulation provides accurate theoretical insights into empirical results observed in numerous previous studies, particularly those focusing on LR schedules and annealing. We believe this work can deepen the understanding of LLM training dynamics while democratizing scaling laws, and can help both research and industrial practitioners refine training strategies for future LLMs.
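Given fitted constants, the formula above predicts the whole loss curve from the LR schedule alone. The sketch below treats $S_2$ as the accumulated drop below the running-peak LR, which is one plausible reading of "annealing area"; the paper defines it precisely, and the constants here are placeholders, not fitted values:

```python
import numpy as np

def predict_loss(lrs, L0, A, C, alpha):
    """Evaluate L(s) = L0 + A * S1^-alpha - C * S2 at every step,
    where S1 is the cumulative area under the LR curve and S2 an
    assumed annealing area (cumulative gap to the running-peak LR)."""
    lrs = np.asarray(lrs, dtype=float)
    S1 = np.cumsum(lrs)                  # area under LR curve up to s
    peak = np.maximum.accumulate(lrs)
    S2 = np.cumsum(peak - lrs)           # annealing area (assumed form)
    return L0 + A * S1 ** (-alpha) - C * S2
```

Under a constant LR the annealing term vanishes and the curve reduces to pure power-law decay in $S_1$.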

AAAI Conference 2025 Conference Paper

SSC-VAE: Structured Sparse Coding Based Variational Autoencoder for Detail Preserved Image Reconstruction

  • Hao Wang
  • Lu Wang
  • Zhongyu Wang
  • Lixin Ma
  • Ye Luo

Discrete latent representation techniques, such as Vector Quantization (VQ) and Sparse Coding (SC), have demonstrated superior image reconstruction and generation quality compared to continuous representation methods in Variational Autoencoders (VAEs). However, existing approaches often treat the latent representations of an image independently in their discrete representation space, neglecting both the inherent structural information within each representation and the correlations among them. This oversight leads to coarse representations and suboptimal generated results. In this paper, we address these limitations by introducing correlations among and within the latent representations of individual images in the latent discrete space of VAEs using sparse coding. We impose two-dimensional structural information through adaptive thresholding, enhancing local structure in image representations while suppressing noise via parsimonious representation with a learned dictionary. Empirical studies on three real benchmark datasets, including a clinical Ultrasound dataset, BSDS500, and mini-Imagenet, demonstrate that our proposed model preserves fine-grained details in image reconstruction and significantly outperforms baseline models of SC-VAE and VQ-VAE across objective and subjective image quality metrics. Particularly noteworthy are the substantial performance improvements observed on the ultrasound dataset, where structure information is crucial. Specifically, we observe significant performance improvements of 7.68% and 17.03% in SSIM, 3.25 dB and 6.58 dB in PSNR, 0.15 and 0.24 in LPIPS, 45.38 and 84.05 in FID over SC-VAE and VQ-VAE, respectively, indicating the superiority of our method in terms of image reconstruction quality and fidelity.
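The thresholding step that imposes sparsity can be illustrated with the standard per-element soft-threshold operator; the paper's adaptive, two-dimensional thresholding is more structured than this, so treat the snippet as background, not the method:

```python
import numpy as np

def soft_threshold(codes, thresh):
    """Classic shrinkage operator used in sparse coding: zero out
    small coefficients and shrink larger ones toward zero, which is
    how sparsity (and noise suppression) is typically imposed."""
    return np.sign(codes) * np.maximum(np.abs(codes) - thresh, 0.0)
```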

IROS Conference 2025 Conference Paper

UVS: A Novel Underwater Vehicle with Integrated VCMS-Thrusters Hybrid Architecture for Enhanced Attitude Regulation

  • Suohang Zhang
  • Shipang Qian
  • Ruiheng Liu
  • Lu Wang
  • Xinyu Fei
  • Yanhu Chen

Autonomous Underwater Vehicles (AUVs) require energy-efficient and responsive attitude control for underwater operations. We present UVS, a novel underwater vehicle that combines Variable Center of Mass System (VCMS) and thrusters for hybrid attitude regulation. Through multi-objective optimization of the VCMS structure, we achieved a 5. 19% larger pitch angle range while reducing space occupation by 15. 72%. Pool experiments demonstrated near-linear pitch control from 17. 5° to 172. 5° with stable horizontal-vertical mode transitions. Our proposed collaborative control method integrates VCMS and thruster advantages, enabling rapid convergence to target attitudes with long-term stability. The results show UVS’s potential for energy-efficient, wide-range attitude control in mobile ocean sensing applications.

NeurIPS Conference 2025 Conference Paper

VCM: Vision Concept Modeling with Adaptive Vision Token Compression via Instruction Fine-Tuning

  • Run Luo
  • Renke Shan
  • Longze Chen
  • Ziqiang Liu
  • Lu Wang
  • Min Yang
  • Xiaobo Xia

Large vision-language models (LVLMs) have emerged as foundational tools for real-world AI applications. Despite their remarkable capabilities, current LVLMs process entire images at the token level, leading to significant inefficiencies compared to human cognition, which selectively focuses on high-level vision concepts. This token-level redundancy becomes increasingly problematic for high-resolution images and long video sequences, resulting in large computational costs and limited scalability in practical applications. To address this limitation, we introduce the concept of a vision concept model, a novel paradigm that enables LVLMs to dynamically extract the most relevant vision concepts from complex inputs, based on task-specific instructions. To optimize this vision concept modeling process, we propose VCM, a self-supervised framework that leverages vision-language correlations across diverse instances. VCM is designed to learn meaningful vision concepts without the need for expensive concept-level annotations. At its core, it employs a forward-backward optimization algorithm that supports LVLMs to adjust concept granularity and spatial alignment dynamically. Experiments demonstrate that VCM remarkably reduces computational costs (e.g., achieving up to 85% fewer FLOPs for LLaVA-1.5-7B), while maintaining strong performance across a series of vision-language tasks. The codebase is available at https://github.com/RainBowLuoCS/VCM.
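The compute savings come from keeping only instruction-relevant vision tokens; the top-k selection below is a toy stand-in for VCM's learned extraction, with the relevance scores assumed to come from elsewhere:

```python
import numpy as np

def select_tokens(tokens, relevance, keep_ratio=0.15):
    """Toy instruction-conditioned token compression: keep only the
    highest-scoring vision tokens (relevance is an external score
    per token), mirroring how dropping ~85% of tokens would cut
    FLOPs roughly proportionally."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    idx = np.argsort(relevance)[::-1][:n_keep]   # top-k by relevance
    return [tokens[i] for i in sorted(idx)]      # preserve spatial order
```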

ICRA Conference 2025 Conference Paper

VIP-Dock: Vision, Inertia, and Pressure Sensor Fusion for Underwater Docking with Optical Beacon Guidance

  • Suohang Zhang
  • Shipang Qian
  • Lu Wang
  • Xinyu Fei
  • Yanhu Chen

Underwater docking enhances the operational capabilities of Autonomous Underwater Vehicles (AUVs) by facilitating energy and data transfer. Optical beacons serve as the primary guidance method for AUVs to localize and track docking stations. This paper presents VIP-Dock, a novel optical beacon tracking algorithm for robust underwater docking of AUVs. VIP-Dock addresses the challenge of maintaining accurate beacon tracking under visual interference by integrating visual, inertial, and pressure perception. Employing an unscented Kalman filter framework, the VIP-Dock algorithm provides continuous optimal estimation of beacon positions. Experimental results demonstrated VIP-Dock's real-time tracking performance in actual docking scenarios and its ability to maintain accuracy during visual input failure. Implementation in a simulation platform for an underwater vertical shuttle showed significant improvement, increasing docking success rates from 62% to 84% across 100 trials under simulated current disturbances.
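The fusion logic behind continuing to track through visual dropout can be sketched with a minimal 1-D Kalman-style update; this is a stand-in for the unscented Kalman filter the paper actually uses, and all names are illustrative:

```python
def fuse(beacon_est, var_est, meas, var_meas):
    """Minimal 1-D Kalman-style fusion: blend the predicted beacon
    position with a new visual measurement, weighting by variances.
    When the visual measurement drops out (None), the prediction
    simply carries forward, which is what keeps tracking alive
    during visual input failure."""
    if meas is None:                      # visual input failure
        return beacon_est, var_est
    k = var_est / (var_est + var_meas)    # Kalman gain
    est = beacon_est + k * (meas - beacon_est)
    var = (1 - k) * var_est
    return est, var
```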

AAAI Conference 2024 Conference Paper

Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation

  • Shangding Gu
  • Bilgehan Sel
  • Yuhao Ding
  • Lu Wang
  • Qingwei Lin
  • Ming Jin
  • Alois Knoll

Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. Initially, we analyze the conflict between reward and safety gradients. Subsequently, we tackle the balance between reward and safety optimization by proposing a soft switching policy optimization method, for which we provide convergence analysis. Based on our theoretical examination, we provide a safe RL framework to overcome the aforementioned challenge, and we develop a Safety-MuJoCo Benchmark to assess the performance of safe RL algorithms. Finally, we evaluate the effectiveness of our method on the Safety-MuJoCo Benchmark and a popular safe benchmark, Omnisafe. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.
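Gradient manipulation for conflicting objectives is often illustrated with the classic projection trick: when the reward and safety gradients point against each other, project one onto the normal plane of the other. This is generic background for the idea, not necessarily the paper's soft-switching policy optimization:

```python
import numpy as np

def deconflict(g_reward, g_safety):
    """If the reward and safety gradients conflict (negative inner
    product), remove the component of the reward gradient that
    opposes the safety gradient, so the update no longer degrades
    safety; otherwise leave it untouched."""
    dot = float(np.dot(g_reward, g_safety))
    if dot >= 0:                          # no conflict: keep as-is
        return g_reward
    return g_reward - dot / float(np.dot(g_safety, g_safety)) * g_safety
```

After projection, the returned gradient is orthogonal to the safety gradient, so a small step along it leaves the safety objective unchanged to first order.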

AAAI Conference 2024 Conference Paper

Parallel Ranking of Ads and Creatives in Real-Time Advertising Systems

  • Zhiguang Yang
  • Liufang Sang
  • Haoran Wang
  • Wenlong Chen
  • Lu Wang
  • Jie He
  • Changping Peng
  • Zhangang Lin

Creativity is the heart and soul of advertising services. Effective creatives can create a win-win scenario: advertisers can target users and achieve marketing objectives more effectively, users more quickly find products of interest, and platforms generate more advertising revenue. With the advent of AI-Generated Content, advertisers can now produce vast amounts of creative content at minimal cost. The current challenge lies in how advertising systems can select the most pertinent creative in real-time for each user personally. Existing methods typically perform serial ranking of ads or creatives, limiting the creative module in terms of both effectiveness and efficiency. In this paper, we propose for the first time a novel architecture for online parallel estimation of ads and creatives ranking, as well as the corresponding offline joint optimization model. The online architecture enables sophisticated personalized creative modeling while reducing overall latency. The offline joint model for CTR estimation allows mutual awareness and collaborative optimization between ads and creatives. Additionally, we optimize the offline evaluation metrics for the implicit feedback sorting task involved in ad creative ranking. We conduct extensive experiments to compare our approach with two state-of-the-art approaches. The results demonstrate the effectiveness of our approach in both offline evaluations and on real-world advertising platforms online in terms of response time, CTR, and CPM.

JBHI Journal 2024 Journal Article

Spatio-Temporal Classification of Lung Ventilation Patterns Using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

  • Shuzhe Chen
  • Li Li
  • Zhichao Lin
  • Ke Zhang
  • Ying Gong
  • Lu Wang
  • Xu Wu
  • Maokun Li

The Pulmonary Function Test (PFT) is a widely utilized and rigorous classification test for evaluating lung function, serving as a comprehensive diagnostic tool for lung conditions. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventilation beyond traditional PFT. However, relying solely on conventional isolated interpretations of PFT results and EIT images overlooks the continuous dynamic aspects of lung ventilation. This study aims to classify lung ventilation patterns by extracting spatial and temporal features from the 3D EIT image series. The study uses a Variational Autoencoder (VAE) with a MultiRes block to compress the spatial distribution in a 3D image into a one-dimensional vector. These vectors are then stacked to create a feature map that represents the temporal features. A simple convolutional neural network is used for classification. Data from 137 subjects were utilized for the training phase. Initially, the model underwent validation through a leave-one-out cross-validation process. During this validation, the model achieved an accuracy and sensitivity of 0.96 and 1.00, respectively, with an f1-score of 0.98 when identifying the normal subjects. To assess pipeline reliability and feasibility, we tested it on 9 newly recruited subjects, with accurate ventilation mode predictions for 8 out of 9. In addition, we included 2D EIT results for comparison and conducted ablation experiments to validate the effectiveness of the VAE. The study demonstrates the potential of using image series for lung ventilation mode classification, providing a feasible method for patient prescreening and presenting an alternative form of PFT.

NeurIPS Conference 2023 Conference Paper

Conservative State Value Estimation for Offline Reinforcement Learning

  • Liting Chen
  • Jie Yan
  • Zhengdao Shao
  • Lu Wang
  • Qingwei Lin
  • Saravanakumar Rajmohan
  • Thomas Moscibroda
  • Dongmei Zhang

Offline reinforcement learning faces a significant challenge of value over-estimation due to the distributional drift between the dataset and the current learned policy, leading to learning failure in practice. The common approach is to incorporate a penalty term into reward or value estimation in the Bellman iterations. Meanwhile, to avoid extrapolation on out-of-distribution (OOD) states and actions, existing methods focus on conservative Q-function estimation. In this paper, we propose Conservative State Value Estimation (CSVE), a new approach that learns a conservative V-function by directly imposing a penalty on OOD states. Compared to prior work, CSVE allows more effective state value estimation with conservative guarantees and further better policy optimization. Further, we apply CSVE and develop a practical actor-critic algorithm in which the critic does the conservative value estimation by additionally sampling and penalizing the states around the dataset, and the actor applies advantage-weighted updates extended with state exploration to improve the policy. We evaluate on classic continuous control tasks of D4RL, showing that our method performs better than the conservative Q-function learning methods and is strongly competitive among recent SOTA methods.
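The penalty structure described above can be sketched as a two-term loss: a Bellman regression term on dataset states plus a term that pushes the value of sampled OOD states down. This is a schematic illustration of the conservative-V idea with assumed names and weighting, not the exact CSVE objective:

```python
import numpy as np

def csve_loss(v_pred_data, td_targets, v_pred_ood, beta=0.5):
    """Conservative state-value loss sketch: fit V on in-dataset states
    via TD regression, while penalizing (lowering) V on states sampled
    around / outside the dataset to prevent value over-estimation."""
    bellman = np.mean((np.asarray(v_pred_data) - np.asarray(td_targets)) ** 2)
    penalty = np.mean(v_pred_ood)          # minimizing this pushes OOD V down
    return float(bellman + beta * penalty)
```

Minimizing this loss trades off accuracy on the data distribution against conservatism off it; `beta` controls how aggressively OOD state values are suppressed.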

JBHI Journal 2022 Journal Article

Automatic Coronary Artery Segmentation of CCTA Images With an Efficient Feature-Fusion-and-Rectification 3D-UNet

  • Along Song
  • Lisheng Xu
  • Lu Wang
  • Bin Wang
  • Xiaofan Yang
  • Bu Xu
  • Benqiang Yang
  • Stephen E. Greenwald

Automatic coronary artery segmentation is of great value in diagnosing coronary disease. In this paper, we propose an automatic coronary artery segmentation method for coronary computerized tomography angiography (CCTA) images based on a deep convolutional neural network. The proposed method consists of three steps. First, to improve the efficiency and effectiveness of the segmentation, a 2D DenseNet classification network is utilized to screen out the non-coronary-artery slices. Second, we propose a coronary artery segmentation network based on the 3D-UNet, which is capable of extracting, fusing and rectifying features efficiently for accurate coronary artery segmentation. Specifically, in the encoding process of the 3D-UNet network, we adapt the dense block into the 3D-UNet so that it can extract rich and representative features for coronary artery segmentation; in the decoding process, 3D residual blocks with feature rectification capability are applied to improve the segmentation quality further. Third, we introduce a Gaussian weighting method to obtain the final segmentation results. This operation can highlight the more reliable segmentation results at the center of the 3D data blocks while weakening the less reliable segmentations at the block boundary when merging the segmentation results of spatially overlapping data blocks. Experiments demonstrate that our proposed method achieves a Dice Similarity Coefficient (DSC) value of 0.826 on a CCTA dataset constructed by us. The code of the proposed method is available at https://github.com/alongsong/3D_CAS.
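The Gaussian-weighted merging of overlapping blocks described in the third step can be illustrated in one dimension: each block's predictions are weighted by a Gaussian centered on the block, so centers dominate where blocks overlap. A simplified 1-D analogue of the 3-D operation (function name, `sigma_frac`, and the 1-D setting are illustrative assumptions):

```python
import numpy as np

def merge_blocks(blocks, starts, total_len, sigma_frac=0.25):
    """Merge overlapping 1-D prediction blocks with Gaussian weights
    that emphasize block centers over block boundaries."""
    acc = np.zeros(total_len)                       # weighted prediction sum
    wsum = np.zeros(total_len)                      # weight sum per position
    for block, s in zip(blocks, starts):
        n = len(block)
        centre = (n - 1) / 2.0
        # Gaussian weight: 1 at the block centre, decaying toward edges
        w = np.exp(-((np.arange(n) - centre) ** 2)
                   / (2 * (sigma_frac * n) ** 2))
        acc[s:s + n] += w * np.asarray(block)
        wsum[s:s + n] += w
    return acc / np.maximum(wsum, 1e-12)            # normalized merge
```

In the overlap region the result is a weighted average biased toward whichever block's center is nearer, exactly the "trust the center, discount the boundary" behavior the abstract describes.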

ICLR Conference 2022 Conference Paper

LoRA: Low-Rank Adaptation of Large Language Models

  • Edward J. Hu
  • Yelong Shen
  • Phillip Wallis
  • Zeyuan Allen-Zhu
  • Yuanzhi Li
  • Shean Wang
  • Lu Wang
  • Weizhu Chen

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
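The mechanism in the abstract is concrete enough to sketch directly: freeze the pre-trained weight W and learn only a rank-r update B @ A, scaled by alpha/r. A minimal NumPy illustration of the idea (not the released microsoft/LoRA package; the class name and initialization constants are assumptions, though the zero-init of B matching the pre-trained model at the start is the standard LoRA choice):

```python
import numpy as np

class LoRALinear:
    """Linear layer with a frozen weight W plus a trainable low-rank
    update, so the effective weight is W + (alpha/r) * B @ A."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                 # frozen pre-trained weight
        self.A = rng.normal(0, 0.01, (r, d_in))    # trainable, small random init
        self.B = np.zeros((d_out, r))              # trainable, zero init:
        self.scale = alpha / r                     # output starts == base model

    def forward(self, x):
        # base path + low-rank adaptation path; only A and B get gradients
        return self.W @ x + self.scale * (self.B @ (self.A @ x))
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, and the trainable parameter count is r*(d_in + d_out) instead of d_in*d_out.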

IJCAI Conference 2022 Conference Paper

T-SMOTE: Temporal-oriented Synthetic Minority Oversampling Technique for Imbalanced Time Series Classification

  • Pu Zhao
  • Chuan Luo
  • Bo Qiao
  • Lu Wang
  • Saravan Rajmohan
  • Qingwei Lin
  • Dongmei Zhang

Time series classification is a popular and important topic in machine learning, and it suffers from the class imbalance problem in many real-world applications. In this paper, to address the class imbalance problem, we propose a novel and practical oversampling method named T-SMOTE, which can make full use of the temporal information of time-series data. In particular, for each sample of the minority class, T-SMOTE generates multiple samples that are close to the class border. Then, based on those samples near the class border, T-SMOTE synthesizes more samples. Finally, a weighted sampling method is applied to both the generated samples near the class border and the synthetic samples. Extensive experiments on a diverse set of both univariate and multivariate time-series datasets demonstrate that T-SMOTE consistently outperforms the current state-of-the-art methods on imbalanced time series classification. More encouragingly, our empirical evaluations show that T-SMOTE performs better in the scenario of early prediction, an important application scenario in industry, which indicates that T-SMOTE could bring benefits in practice.
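The synthesis step that T-SMOTE builds on is classic SMOTE interpolation: new minority samples are drawn on the segment between a sample and one of its nearest minority neighbours. A plain SMOTE sketch for context (the temporal weighting and border-sample generation that distinguish T-SMOTE are omitted; the function name and parameters are assumptions):

```python
import numpy as np

def smote_like(minority, n_new, k=2, seed=0):
    """Generate synthetic minority-class samples by interpolating each
    chosen sample toward one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)   # distances to all minority samples
        nbrs = np.argsort(d)[1:k + 1]          # k nearest, skipping the sample itself
        j = rng.choice(nbrs)
        lam = rng.random()                     # position along the segment
        out.append(X[i] + lam * (X[j] - X[i]))
    return np.array(out)
```

Every synthetic point lies on a segment between two real minority samples, so it stays inside the minority class's convex hull.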

AAAI Conference 2020 Conference Paper

Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning

  • Liqiang Xiao
  • Lu Wang
  • Hao He
  • Yaohui Jin

Jointly using the extractive and abstractive summarization methods can combine their complementary advantages, generating summaries that are both informative and concise. Existing methods that adopt an extract-then-abstract strategy have achieved impressive results, yet they suffer from information loss in the abstraction step because they compress all the selected sentences without distinction. Especially when the whole sentence is summary-worthy, salient content would be lost by compression. To address this problem, we propose HYSUM, a hybrid framework for summarization that can flexibly switch between copying a sentence and rewriting a sentence according to the degree of redundancy. In this way, our approach can effectively combine the advantages of the two branches of summarization, balancing informativeness and conciseness. Moreover, based on Hierarchical Reinforcement Learning, we propose an end-to-end reinforcing method to bridge the extraction module and the rewriting module, which can enhance the cooperation between them. Automatic evaluation shows that our approach significantly outperforms the state-of-the-art methods on the CNN/DailyMail corpus. Human evaluation also demonstrates that our generated summaries are more informative and concise than those of popular models.

NeurIPS Conference 2020 Conference Paper

Provably Robust Metric Learning

  • Lu Wang
  • Xuanqing Liu
  • Jinfeng Yi
  • Yuan Jiang
  • Cho-Jui Hsieh

Metric learning is an important family of algorithms for classification and similarity search, but the robustness of learned metrics against small adversarial perturbations is less studied. In this paper, we show that existing metric learning algorithms, which focus on boosting the clean accuracy, can result in metrics that are less robust than the Euclidean distance. To overcome this problem, we propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations, and the robustness of the resulting model is certifiable. Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors (errors under adversarial attacks). Furthermore, unlike neural network defenses which usually encounter a trade-off between clean and robust errors, our method does not sacrifice clean errors compared with previous metric learning methods.
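The object being learned in the abstract is a Mahalanobis distance, parameterized by a positive semidefinite matrix M. Computing the distance itself is a one-liner; finding an M that is certifiably robust to adversarial perturbations is the paper's contribution and is not shown here (function names are assumptions):

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance sqrt((x - y)^T M (x - y)); M must be
    positive semidefinite. M = I recovers the Euclidean distance."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))
```

With M = I this reduces to the Euclidean baseline the paper compares against; a learned M reshapes the space so that nearest-neighbour decisions become harder to flip with small perturbations.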

AAAI Conference 2019 Conference Paper

Learning Segmentation Masks with the Independence Prior

  • Songmin Dai
  • Xiaoqiang Li
  • Lu Wang
  • Pin Wu
  • Weiqin Tong
  • Yimin Chen

An instance with a bad mask might make a composite image that uses it look fake. This encourages us to learn segmentation by generating realistic composite images. To achieve this, we propose a novel framework that exploits a newly proposed prior called the independence prior, based on Generative Adversarial Networks (GANs). The generator produces an image with multiple category-specific instance providers, a layout module and a composition module. Firstly, each provider independently outputs a category-specific instance image with a soft mask. Then the provided instances' poses are corrected by the layout module. Lastly, the composition module combines these instances into a final image. Training with an adversarial loss and a penalty on mask area, each provider learns a mask that is as small as possible but large enough to cover a complete category-specific instance. Weakly supervised semantic segmentation methods widely use grouping cues modeling the association between image parts, which are either artificially designed, learned with costly segmentation labels, or only modeled on local pairs. Unlike them, our method automatically models the dependence between any parts and learns instance segmentation. We apply our framework in two cases: (1) Foreground segmentation on category-specific images with box-level annotation. (2) Unsupervised learning of instance appearances and masks with only one image of a homogeneous object cluster (HOC). We obtain appealing results in both tasks, which shows the independence prior is useful for instance segmentation and that it is possible to learn instance masks without supervision from only one image.

JBHI Journal 2019 Journal Article

Local Motion Intensity Clustering (LMIC) Model for Segmentation of Right Ventricle in Cardiac MRI Images

  • Zengzhi Guo
  • Wenjun Tan
  • Lu Wang
  • Lisheng Xu
  • Xinhui Wang
  • Benqiang Yang
  • Yudong Yao

Analysis of the morphology and function of the right ventricle (RV) can be used for the prediction and diagnosis of cardiovascular disease. An accurate description of the structure and function of the heart can be provided by analyzing cardiac magnetic resonance imaging (MRI) images. Noise interference and intensity inhomogeneity of MRI images can be addressed by using a local intensity clustering (LIC) model. However, the segmentation of the RV in MRI images still remains a challenge, mainly due to its ill-defined borders. To address this challenge, an algorithm for segmenting the RV based on a local motion intensity clustering (LMIC) model is proposed in this paper. The LMIC model combines the LIC model with motion intensity information arising from cardiac motion and blood flow. The motion intensity is calculated using the Lucas-Kanade optical flow method and utilized in the LMIC model as an energy parameter. Because the motion intensity of the RV region is stronger than that of other areas, the RV can be accurately segmented by this approach. Experimental results demonstrate that the LMIC model is able to address the challenge of the ill-defined RV borders in cardiac MRI images, with improved RV segmentation accuracy over existing methods.

IJCAI Conference 2016 Conference Paper

Cost-Saving Effect of Crowdsourcing Learning

  • Lu Wang
  • Zhi-Hua Zhou

Crowdsourcing is widely adopted in many domains as a popular paradigm to outsource work to individuals. In the machine learning community, crowdsourcing is commonly used as a cost-saving way to collect labels for training data. While a lot of effort has been spent on developing methods for inferring labels from a crowd, little work concentrates on the theoretical foundation of crowdsourcing learning. In this paper, we theoretically study the cost-saving effect of crowdsourcing learning, and present an upper bound on the minimally sufficient number of crowd labels for effective crowdsourcing learning. Our results provide an understanding of how to allocate crowd labels efficiently, and are verified empirically.
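The label-inference step that such cost analyses typically build on is the simplest aggregation rule, majority vote over the noisy labels collected for each instance. A minimal sketch for context (the paper's contribution is the bound on how many such labels are needed, not this rule; the function name is an assumption):

```python
import numpy as np

def majority_vote(crowd_labels):
    """Aggregate noisy crowd labels per instance by majority vote.
    Each row holds the integer labels different workers gave one
    instance; the most frequent label wins."""
    return [int(np.bincount(np.asarray(row)).argmax())
            for row in crowd_labels]
```

With workers who are each correct with probability above 1/2, the probability that the majority vote is wrong shrinks exponentially in the number of labels per instance, which is why a modest label budget can suffice.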

AAAI Conference 2016 Conference Paper

Risk Minimization in the Presence of Label Noise

  • Wei Gao
  • Lu Wang
  • Yu-Feng Li
  • Zhi-Hua Zhou

Matrix concentration inequalities have attracted much attention in diverse applications such as linear algebra, statistical estimation, combinatorial optimization, etc. In this paper, we present new Bernstein concentration inequalities depending only on the first moments of random matrices, whereas previous Bernstein inequalities depend heavily on both the first and second moments. Based on those results, we analyze empirical risk minimization in the presence of label noise. We find that many popular losses used in risk minimization can be decomposed into two parts, where the first part is unaffected and only the second part is affected by noisy labels. We show that the influence of noisy labels on the second part can be reduced by our proposed LICS (Labeled Instance Centroid Smoothing) approach. The effectiveness of the LICS algorithm is justified both theoretically and empirically.

JBHI Journal 2015 Journal Article

A Novel Classification Method for Prediction of Rectal Bleeding in Prostate Cancer Radiotherapy Based on a Semi-Nonnegative ICA of 3D Planned Dose Distributions

  • Julie Coloigner
  • Aureline Fargeas
  • Amar Kachenoura
  • Lu Wang
  • Gael Drean
  • Caroline Lafond
  • Lotfi Senhadji
  • Renaud de Crevoisier

The understanding of dose/side-effect relationships in prostate cancer radiotherapy is crucial to defining appropriate individual constraints for therapy planning. Most existing methods to predict side effects do not fully exploit the rich spatial information conveyed by the three-dimensional planned dose distributions. We propose a new classification method for individual three-dimensional dose distributions, based on a new semi-nonnegative ICA algorithm, to identify patients at risk of presenting rectal bleeding from a population treated for prostate cancer. The method first determines two bases of vectors from the population data: the two bases span vector subspaces that characterize patients with and without rectal bleeding, respectively. The classification is then achieved by calculating the distance of a given patient to the two subspaces. The results, obtained on a cohort of 87 patients (at two-year follow-up) treated with radiotherapy, showed high performance in terms of sensitivity and specificity.
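The classification step described above, assigning a patient to whichever learned subspace is nearer, can be sketched with least-squares projection residuals. This shows only the distance-to-subspace decision; the semi-nonnegative ICA that produces the two bases is not shown, and the function and label names are assumptions:

```python
import numpy as np

def subspace_distance(x, basis):
    """Distance from vector x to the span of the basis columns,
    computed as the norm of the least-squares projection residual."""
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return float(np.linalg.norm(x - basis @ coef))

def classify(x, basis_bleeding, basis_safe):
    """Assign a patient's dose feature vector to the closer subspace."""
    if subspace_distance(x, basis_bleeding) < subspace_distance(x, basis_safe):
        return "bleeding"
    return "non-bleeding"
```

A vector lying (nearly) inside one subspace has a (nearly) zero residual against that basis and a large residual against the other, which is the geometric content of the decision rule.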

ICRA Conference 2014 Conference Paper

Switching control of attitude tracking on a quadrotor UAV for large-angle rotational maneuvers

  • Lu Wang
  • Jianbo Su

This paper studies an attitude tracking control system for a quadrotor unmanned aerial vehicle (UAV) under the condition of large-angle rotational maneuvers. We first establish the attitude error model, taking both external disturbances and internal uncertainties into account. Thereafter, a switching control strategy is proposed for both high tracking accuracy and velocity constraints. Experiments on attitude tracking validate the higher control accuracy of the proposed method. Flight tasks at unknown initial attitude and flip maneuvers are also presented to verify the effectiveness of this method under large-angle rotational maneuvers.