Arrow Research search

Author name cluster

Lu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

49 papers
2 author rows

Possible papers


EAAI Journal 2026 Journal Article

Accurate detection and characterization of sub-millimeter cracks using nonlinear ultrasonics-informed parallel multi-branch convolutional neural network

  • Lu Wang
  • Zhengpan Qi
  • Xiangyan Ding
  • Ning Hu
  • Xiaoyang Bi
  • Han Zhang
  • Libin Zhao

Conventional ultrasonic testing struggles to inspect sub-millimeter cracks and to identify multiple crack characteristics. To overcome these limitations, this study proposes a parallel multi-branch convolutional neural network (PMCNN) to simultaneously and accurately quantify the depth, length, and orientation of sub-millimeter cracks. The artificial intelligence (AI) innovation lies in the PMCNN's branch-specific kernels and cross-task isolation layers, which effectively decouple overlapping nonlinear ultrasonic signals. First, ultrasonic non-destructive testing was performed on micro-crack specimens to obtain the data needed to verify both the finite element (FE) model and the PMCNN. Subsequently, an FE model was established to systematically analyze the coupling effects of depth, length, and orientation on the signals and to generate a comprehensive dataset for PMCNN training. The primary engineering contribution is an effective solution for quantitative micro-crack assessment in complex operational environments, validated experimentally with practical inspection signals. Results reveal that although harmonic amplitudes correlate with variations in individual parameters, their sensitivity decreases significantly under multiparameter conditions. Interpretability analysis confirms the distinct feature selectivity of each branch network, while a hybrid data training strategy maintains robust accuracy (above 90 %) under noisy conditions. Experimental validation demonstrates that the proposed method achieves stable and reliable performance, bridging advanced AI techniques with practical structural health monitoring needs.

YNIMG Journal 2026 Journal Article

An fMRI-based study of the effect of audiovisual stimulus temporal pacing on brain responses

  • Lu Wang
  • Xingwei An
  • Liang Zhao
  • Shuang Liu
  • Dong Ming

Research on the effect of stimulus temporal pacing on brain states is a central topic in neuroscience and psychology. Studies of audiovisual integration (AVI) in the fields of Brain-Computer Interfaces (BCIs) and neuropsychology have often yielded inconsistent findings, potentially due to variations in stimulus temporal pacing. Although a number of psychological experiments have investigated the effects of stimulus temporal pacing on brain activity, the underlying neural mechanisms remain poorly understood. This study investigates how stimulus temporal pacing modulates the dynamic reconfiguration of brain activity and connectivity using functional magnetic resonance imaging (fMRI). A multimodal audiovisual oddball paradigm was employed, presenting stimuli at two temporal pacing conditions (rapid and slow) across three sensory modalities (visual, auditory, and audiovisual) to compare brain activation and functional connectivity across conditions. Results showed that in the unimodal conditions, rapid stimuli preferentially engaged primary sensory cortices, indicating efficient perceptual encoding under high temporal pressure. In contrast, slow stimuli shifted processing toward higher-order cognitive regions, suggesting deeper cognitive engagement and enhanced global network efficiency. In the audiovisual condition, rapid and slow stimuli elicited comparable functional connectivity patterns, whereas slow stimuli showed stronger connectivity in specific regions (e.g., occipital-motor areas, STG-DMN nodes), suggesting that the core audiovisual network and the extended whole-brain networks act in concert, forming a dual-layer processing mechanism. These findings provide a neural basis for understanding how stimulus temporal pacing acts as a modulator, shaping the dynamic balance between localized sensory analysis and integrated global processing.
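The abstract mentions global network efficiency without defining it. A common graph-theoretic definition, taken here as an assumption rather than the study's stated method, is the mean inverse shortest-path length over ordered node pairs, $E = \frac{1}{N(N-1)}\sum_{i \neq j} 1/d_{ij}$, computed on a binary graph built by thresholding a symmetric region-by-region connectivity matrix. The helper names and the threshold `tau` below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence


def threshold_graph(corr: Sequence[Sequence[float]], tau: float) -> Dict[int, List[int]]:
    """Binary undirected graph: connect regions i and j when corr[i][j] >= tau.

    Assumes corr is symmetric, so the adjacency lists are symmetric too.
    """
    n = len(corr)
    return {i: [j for j in range(n) if j != i and corr[i][j] >= tau] for i in range(n)}


def global_efficiency(adj: Dict[int, List[int]]) -> float:
    """Mean of 1/d(i, j) over ordered pairs i != j; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for src in adj:
        # Breadth-first search yields hop distances from src in an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != src)
    return total / (n * (n - 1))
```

For a three-region chain the two end regions are two hops apart, giving E = 5/6; a fully connected triangle gives E = 1.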

AAAI Conference 2026 Conference Paper

Blur-Robust Detection via Feature Restoration: An End-to-End Framework for Prior-Guided Infrared UAV Target Detection

  • Xiaolin Wang
  • Houzhang Fang
  • Qingshan Li
  • Lu Wang
  • Yi Chang
  • Luxin Yan

Infrared unmanned aerial vehicle (UAV) target images often suffer from motion blur degradation caused by rapid sensor movement, significantly reducing contrast between target and background. Generally, detection performance heavily depends on the discriminative feature representation between target and background. Existing methods typically treat deblurring as a preprocessing step focused on visual quality, while neglecting the enhancement of task-relevant features crucial for detection. Improving feature representation for detection under blur conditions remains challenging. In this paper, we propose a novel Joint Feature-Domain Deblurring and Detection end-to-end framework, dubbed JFD³. We design a dual-branch architecture with shared weights, where the clear branch guides the blurred branch to enhance discriminative feature representation. Specifically, we first introduce a lightweight feature restoration network, where features from the clear branch serve as feature-level supervision to guide the blurred branch, thereby enhancing its discriminative capability for detection. We then propose a frequency structure guidance module that refines the structure prior from the restoration network and integrates it into shallow detection layers to enrich target structural information. Finally, a feature consistency self-supervised loss is imposed between the dual-branch detection backbones, driving the blurred branch to approximate the feature representations of the clear one. We also construct a benchmark, named IRBlurUAV, containing 30,000 simulated and 4,118 real infrared UAV target images with diverse motion blur. Extensive experiments on IRBlurUAV demonstrate that JFD³ achieves superior detection performance while maintaining real-time efficiency.
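As a rough illustration of the feature consistency self-supervised loss, the sketch below assumes a mean squared error between flattened feature maps at matching backbone stages, averaged over stages, with the clear branch acting as a fixed target. The distance function, the stage averaging, and the name `feature_consistency_loss` are assumptions, not details given in the abstract.

```python
from typing import Sequence


def feature_consistency_loss(
    blurred_stages: Sequence[Sequence[float]],
    clear_stages: Sequence[Sequence[float]],
) -> float:
    """Stage-averaged mean squared error between blurred- and clear-branch features.

    The clear-branch features are treated as fixed targets; in an autodiff
    framework they would be detached so gradients reach only the blurred branch.
    """
    if len(blurred_stages) != len(clear_stages) or not blurred_stages:
        raise ValueError("both branches must expose the same, non-zero number of stages")
    stage_losses = []
    for fb, fc in zip(blurred_stages, clear_stages):
        if len(fb) != len(fc) or not fb:
            raise ValueError("feature maps must have matching, non-empty sizes per stage")
        # Per-stage MSE over the flattened feature map.
        stage_losses.append(sum((b - c) ** 2 for b, c in zip(fb, fc)) / len(fb))
    return sum(stage_losses) / len(stage_losses)
```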

AAAI Conference 2026 Conference Paper

Heterogeneous Complementary Distillation

  • Liuchi Xu
  • Hao Zheng
  • Lu Wang
  • Lisheng Xu
  • Jun Cheng

Knowledge distillation (KD) transfers "dark knowledge" from a complex teacher model to a compact student model. However, heterogeneous architecture distillation, such as from a Vision Transformer (ViT) to ResNet18, faces challenges due to differences in spatial feature representations. Traditional KD methods are mostly designed for homogeneous architectures and hence struggle to address this disparity. Although heterogeneous KD approaches have recently been developed to solve these issues, they often incur high computational costs and complex designs, or rely too heavily on logit alignment, which limits their ability to leverage complementary features. To overcome these limitations, we propose Heterogeneous Complementary Distillation (HCD), a simple yet effective framework that integrates complementary teacher and student features to align representations in shared logits. These logits are decomposed and constrained to facilitate diverse knowledge transfer to the student. Specifically, HCD processes the student's intermediate features through a convolutional projector and adaptive pooling, concatenates them with the teacher's features from the penultimate layer, and then maps them via the Complementary Feature Mapper (CFM) module, which consists of a fully connected layer, to produce shared logits. We further introduce Sub-logit Decoupled Distillation (SDD), which partitions the shared logits into n sub-logits that are fused with the teacher's logits to rectify classification. To ensure sub-logit diversity and reduce redundant knowledge transfer, we propose an Orthogonality Loss (OL). By preserving student-specific strengths and leveraging teacher knowledge, HCD enhances the robustness and generalization of students. Extensive experiments on CIFAR-100, fine-grained datasets (e.g., CUB200, Aircraft), and ImageNet-1K demonstrate that HCD outperforms state-of-the-art KD methods, establishing it as an effective solution for heterogeneous KD.
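The abstract does not give the form of the Orthogonality Loss. One plausible reading, sketched here purely as an assumption, splits the shared logits into n equal chunks and penalizes the mean squared cosine similarity between every pair of chunks, so the loss is zero only when the sub-logits are mutually orthogonal. `split_sub_logits` and `orthogonality_loss` are hypothetical names.

```python
import math
from typing import List, Sequence


def split_sub_logits(shared: Sequence[float], n: int) -> List[List[float]]:
    """Split a shared logit vector of length n * C into n consecutive chunks of length C."""
    if n <= 0 or len(shared) % n != 0:
        raise ValueError("shared logit length must be a positive multiple of n")
    c = len(shared) // n
    return [list(shared[i * c:(i + 1) * c]) for i in range(n)]


def orthogonality_loss(sub_logits: Sequence[Sequence[float]], eps: float = 1e-12) -> float:
    """Mean squared cosine similarity over all pairs of sub-logits (0 when mutually orthogonal)."""
    normed = []
    for z in sub_logits:
        norm = math.sqrt(sum(v * v for v in z)) + eps  # eps guards against all-zero chunks
        normed.append([v / norm for v in z])
    total, pairs = 0.0, 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            cos = sum(a * b for a, b in zip(normed[i], normed[j]))
            total += cos * cos
            pairs += 1
    return total / pairs if pairs else 0.0
```

Squaring the cosine penalizes both aligned and anti-aligned sub-logits, which matches the stated goal of reducing redundant knowledge transfer.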

TMLR Journal 2026 Journal Article

Process Reward Models That Think

  • Muhammad Khalifa
  • Rishabh Agarwal
  • Lajanugen Logeswaran
  • Jaekyeom Kim
  • Hao Peng
  • Moontae Lee
  • Honglak Lee
  • Lu Wang

Step-by-step verifiers—also known as process reward models (PRMs)—are a key ingredient for test-time scaling, but training them requires expensive step-level supervision. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs. Our approach capitalizes on the inherent reasoning abilities of long CoT models, and outperforms LLM-as-a-Judge and discriminative verifiers—using only 1% of the process labels in PRM800K—across several challenging benchmarks. Specifically, ThinkPRM beats the baselines on ProcessBench, MATH-500, and AIME ’24 under best-of-N selection and reward-guided search. In an out-of-domain evaluation over subsets of GPQA-Diamond and LiveCodeBench, our PRM surpasses discriminative verifiers trained with the full PRM800K by 8% and 4.5%, respectively. Lastly, under the same token budget, ThinkPRM scales up verification compute more effectively compared to LLM-as-a-Judge, outperforming it by 7.2% on a subset of ProcessBench. This work highlights the value of generative, long CoT PRMs that can scale test-time compute for verification while requiring minimal supervision for training.
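To make best-of-N selection with a step-wise verifier concrete, the following sketch scores each candidate solution from per-step correctness probabilities and keeps the highest-scoring one. The `verify` callback stands in for a PRM such as ThinkPRM, and the min/product aggregation rules are common choices assumed here, not necessarily the paper's.

```python
from typing import Callable, List, Sequence


def solution_score(step_scores: Sequence[float], agg: str = "min") -> float:
    """Collapse per-step correctness probabilities into a single solution score."""
    if not step_scores:
        raise ValueError("a solution needs at least one scored step")
    if agg == "min":  # a chain is only as strong as its weakest step
        return min(step_scores)
    if agg == "prod":  # treat steps as independent chances of being correct
        score = 1.0
        for s in step_scores:
            score *= s
        return score
    raise ValueError(f"unknown aggregation: {agg}")


def best_of_n(candidates: Sequence[str], verify: Callable[[str], List[float]], agg: str = "min") -> str:
    """Return the candidate solution with the highest aggregated verifier score."""
    scores = [solution_score(verify(c), agg) for c in candidates]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```

The choice of aggregation matters: a solution with one weak step loses under `min` even if its other steps are very confident.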

EAAI Journal 2026 Journal Article

The research on railway inspection visual localization method based on multilevel feature matching

  • Peigang Li
  • Zexuan Liu
  • Qing Zhang
  • Zhipeng Zhang
  • Lu Wang
  • Jingqi Li
  • Jianjun Chen

Accurate localization is essential for railway inspection operations. This study addresses the limitations of existing localization methods concerning complexity, accuracy, and cost by proposing a two-stage Visual Position Recognition (VPR) method suitable for railway personnel and train passengers. This method is based on deep learning with a 34-layer residual network (ResNet34) and an epipolar geometry constraint model. Initially, a “visual-mileage” database is constructed by linking “far-near” trackside scenes to their corresponding mileage. The proposed two-stage localization framework consists of: (1) a preliminary localization stage that utilizes an improved ResNet34 and Network Vector of Locally Aggregated Descriptors (NetVLAD) for feature aggregation to extract visual features, enhanced through a triplet loss function for better discrimination; (2) a precise localization stage that employs epipolar geometry to estimate relative pose for high-accuracy localization. Experimental results from a 2.38-km campus test route and a 30-km railway field test demonstrate a retrieval accuracy of 87.5 % for the correct target location within the top 5 candidate images (TOP-5) and 83.9 % within the top 1 candidate image (TOP-1). The average positioning error is 5.67 m, with 90 % of errors being less than or equal to 10 m, thereby meeting the precision requirements for railway inspections. Our research represents the first application of Artificial Intelligence-driven visual localization in railway inspection positioning, innovatively incorporating train passengers into the railway inspection process and providing a novel solution for intelligent railway system inspections.
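The preliminary localization stage relies on a triplet loss and is evaluated with TOP-K retrieval accuracy. Below is a minimal sketch of both, assuming Euclidean distance between global image descriptors (for example, NetVLAD outputs) and a hypothetical margin of 0.5; descriptor extraction itself is not shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.5) -> float:
    """Hinge on d(a, p) - d(a, n) + margin: pull same-place pairs together, push others apart."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def top_k_hit(query: Vector, database: Sequence[Vector], true_index: int, k: int) -> bool:
    """True if the correct database image is among the k nearest descriptors."""
    ranked = sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))
    return true_index in ranked[:k]


def top_k_accuracy(queries: Sequence[Vector], database: Sequence[Vector],
                   true_indices: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct location appears in the top-k retrieved images."""
    hits = sum(top_k_hit(q, database, t, k) for q, t in zip(queries, true_indices))
    return hits / len(queries)
```

TOP-5 accuracy is always at least TOP-1 accuracy, which is consistent with the 87.5 % versus 83.9 % figures reported above.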

TMLR Journal 2026 Journal Article

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

  • Mengzhuo Chen
  • Jiani Zheng
  • Lu Wang
  • Fangkai Yang
  • Chaoyun Zhang
  • Lingrui Mei
  • Wenjie Yin
  • Qingwei Lin

Training Vision-Language Models (VLMs) as Graphical User Interface (GUI) agents via Reinforcement Learning (RL) faces critical challenges: environment-based RL requires costly interactions, while environment-free methods struggle with distribution shift and reward generalization. We propose an environment-free RL framework that decouples action utility learning from policy optimization by leveraging a pretrained Value Environment Model (VEM), which requires no live environment interaction during policy optimization. VEM predicts value-aligned action utilities directly from offline data, distilling human-like priors about GUI interaction outcomes without requiring next-state prediction or environmental feedback. This avoids compounding errors and enhances resilience to UI changes by focusing on semantic reasoning (e.g., “Does this action advance the user’s goal?”). The framework operates in two stages: (1) pretraining VEM to learn action-level utility signals and (2) guiding policy exploration with frozen VEM signals, enabling layout-agnostic GUI automation. Evaluated across diverse benchmarks including Android-in-the-Wild for mobile apps and Multimodal-Mind2Web for web environments, VEM achieves state-of-the-art or highly competitive performance in both offline and online settings. It significantly outperforms environment-free baselines and matches or exceeds environment-based approaches, crucially without incurring interaction costs. Importantly, VEM demonstrates that robust, generalizable GUI agents can be trained efficiently using semantic-aware action utility prediction, proving effective across distinct interaction platforms like mobile and web. The code is available at https://github.com/microsoft/GUI-Agent-RL.

JBHI Journal 2025 Journal Article

A Blockchain-Enabled AI-Driven Secure Searchable Encryption Framework for Medical IoT Systems

  • Salabat Khan
  • Mansoor Khan
  • Muhammad Asghar Khan
  • Muhammad Attique Khan
  • Lu Wang
  • Kaishun Wu

Blockchain technology is widely adopted in the Internet of Medical Things (IoMT) for information storage and retrieval. Integrating blockchain with IoMT systems enhances security; however, it raises privacy and security concerns in data searching and storage. This study proposes a novel Binary Spring Search (BSS) technique based on group theory and integrated with a hybrid deep neural network approach to enhance the security and trustworthiness of IoMT. The proposed method incorporates secure key revocation and dynamic policy updates. The proposed framework leverages blockchain technology for immutable and decentralized data management, Artificial Intelligence (AI) for dynamic data analysis and threat detection, and advanced searchable encryption techniques to facilitate secure and efficient data queries. The proposed patient-centered data access model, which combines blockchain technology with trust chains, makes our method safer and more efficient and demonstrates a return on investment. Furthermore, our blockchain-based architecture ensures the integrity and immutability of medical data generated by IoMT devices, allowing for decentralized and tamper-proof storage. We used Hyperledger Fabric and OriginLab for simulations in a blockchain context. Our findings indicate that the proposed framework provides a more searchable and secure solution for healthcare systems than the other methods compared. The simulation results show that our algorithm significantly reduces transaction time while maintaining high levels of security, making it a robust solution for managing Patient Health Records (PHR) in a decentralized manner.

JBHI Journal 2025 Journal Article

Advancing Medical Innovation Through Blockchain-Secured Federated Learning for Smart Health

  • Salabat Khan
  • Mansoor Khan
  • Muhammad Asghar Khan
  • Lu Wang
  • Kaishun Wu

The rapid digitization of healthcare systems has led to a vast accumulation of electronic medical records (EMRs), offering an invaluable source of patient data that can significantly advance medical research and improve patient care. However, sharing EMRs for research purposes presents challenges, particularly concerning data privacy, security, and the limitations of traditional centralized data-sharing models. This paper introduces a novel approach that leverages blockchain technology to facilitate federated learning with EMRs, thereby addressing these challenges. Federated learning enables multiple institutions to collaboratively train a robust machine learning model without sharing raw data, preserving privacy and security. By integrating blockchain, this framework enhances data integrity, immutability, and trust, all in a decentralized environment. Blockchain serves as a transparent and secure ledger, recording model updates and aggregating them through a consensus-based mechanism. Smart contracts further enforce data usage policies, allowing only authorized access and maintaining control over data ownership and sharing. This approach empowers medical researchers and institutions to collaborate more effectively, accelerating the discovery of treatments, advancements in personalized medicine, and insights into rare diseases. It also enables patients to contribute to medical research while retaining control over their personal data, fostering a patient-centered approach to healthcare innovation. Experimental results confirm the efficacy and efficiency of this blockchain-enabled federated learning framework, highlighting its potential to transform medical research and adhere to stringent privacy and security standards. This study emphasizes the pivotal role of blockchain in enhancing Big Data analytics within healthcare, paving the way for improved collaboration, innovation, and patient outcomes.
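The ledger-plus-aggregation idea can be illustrated with a toy, single-process sketch: each client update is appended to a SHA-256 hash chain, and the server verifies the chain before computing a sample-weighted average (FedAvg). This is an assumption-laden simplification with no consensus protocol, networking, or smart-contract logic, and `UpdateLedger` and `federated_average` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

GENESIS = "0" * 64


def _digest(body: Dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class UpdateLedger:
    """Append-only, hash-chained log of client model updates (no consensus or networking)."""
    blocks: List[Dict] = field(default_factory=list)

    def append(self, client_id: str, weights: List[float], num_samples: int) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"client": client_id, "weights": list(weights), "n": num_samples, "prev": prev}
        block = dict(body, hash=_digest(body))
        self.blocks.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every hash and check each block points at its predecessor."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: block[k] for k in ("client", "weights", "n", "prev")}
            if block["prev"] != prev or _digest(body) != block["hash"]:
                return False
            prev = block["hash"]
        return True


def federated_average(ledger: UpdateLedger) -> List[float]:
    """FedAvg: sample-weighted mean of the recorded client weights, after an integrity check."""
    if not ledger.blocks or not ledger.verify():
        raise ValueError("ledger is empty or has been tampered with")
    total = sum(b["n"] for b in ledger.blocks)
    dim = len(ledger.blocks[0]["weights"])
    return [sum(b["weights"][i] * b["n"] for b in ledger.blocks) / total for i in range(dim)]
```

Only model weights and sample counts enter the ledger; raw records never leave the client, which is the privacy property the abstract attributes to federated learning.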

TCS Journal 2025 Journal Article

Approximation algorithms for facility location and k-median with differential privacy

  • Lu Wang
  • Qilong Feng
  • Jianxin Wang

In this paper, we consider the problems of facility location and k-median with differential privacy in metric space, where a local search-based framework is proposed to solve the differential privacy issues. The approximation algorithm given for the facility location problem has a multiplicative error of 4 and an additive error of $O(\Delta n^{2} \log n \log(n + f_{\max}\Delta^{-1})\,\varepsilon^{-1})$, where $f_{\max}$ is the maximum facility-opening cost, $n$ is the number of clients, and $\Delta$ is the maximum distance between any two input points. For the k-median problem, our local search-based framework yields an approximation algorithm with a multiplicative error of $4 + \varepsilon$ and an additive error of $O(\Delta k^{2} \log^{2} n\,\varepsilon^{-2})$.
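Local search is usually made private with the exponential mechanism, which samples a move with probability that decays exponentially in the resulting cost. The sketch below shows one such k-median swap step in that textbook style; it is not the paper's algorithm. It assumes the cost's sensitivity equals $\Delta$ (adding or removing one client changes any solution's cost by at most the maximum distance), and a complete private algorithm would also split the privacy budget across iterations and privately select the final solution.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float]
Solution = FrozenSet[Point]


def kmedian_cost(clients: Sequence[Point], centers: Solution) -> float:
    """Sum of distances from each client to its nearest open center."""
    return sum(min(math.dist(c, f) for f in centers) for c in clients)


def exponential_mechanism(options: List[Solution], costs: List[float], eps: float,
                          sensitivity: float, rng: random.Random) -> Solution:
    """Pick an option with probability proportional to exp(-eps * cost / (2 * sensitivity))."""
    best = min(costs)  # shifting by the minimum avoids overflow and leaves the distribution unchanged
    weights = [math.exp(-eps * (c - best) / (2.0 * sensitivity)) for c in costs]
    return rng.choices(options, weights=weights, k=1)[0]


def dp_local_search_step(clients: Sequence[Point], facilities: Sequence[Point],
                         centers: Solution, eps: float, delta: float,
                         rng: random.Random) -> Solution:
    """One private local-search move: keep the current solution or perform a single swap."""
    candidates: List[Solution] = [centers]
    for out in centers:
        for inc in facilities:
            if inc not in centers:
                candidates.append((centers - {out}) | {inc})
    costs = [kmedian_cost(clients, s) for s in candidates]
    return exponential_mechanism(candidates, costs, eps, delta, rng)
```

Smaller `eps` flattens the sampling weights, trading solution quality for privacy; this is the source of the additive error terms that scale with $\varepsilon^{-1}$ or $\varepsilon^{-2}$.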

YNIMG Journal 2025 Journal Article

Brain development during the lifespan of cynomolgus monkeys

  • Zhiqiang Tan
  • Binbin Nie
  • Huanhua Wu
  • Bang Li
  • Jingjie Shang
  • Tianhao Zhang
  • Zeyu Xiao
  • Chenchen Dong

[18F]FDG PET-MRI data from 228 healthy cynomolgus monkeys spanning the age range of 0.5-29.5 years to construct an age-specific multimodal brain image template toolset tailored to cynomolgus monkeys. Brain volume and glucose metabolism were quantitatively analyzed using an individualized spatial segmentation algorithm. Our findings characterize the growth and development trends, sex differences, and asymmetric variations in brain volume and glucose metabolism in cynomolgus monkeys, as well as the correlation between brain volume and glucose metabolism. This endeavor enhances our capacity to leverage the cynomolgus monkey model in neuroscience research by providing a valuable resource for researchers. The age-specific brain template toolset and associated data offer a robust foundation for future investigations, facilitating a nuanced understanding of brain development in this primate species and, consequently, informing and advancing neuroscience research employing cynomolgus monkeys.

AAAI Conference 2025 Conference Paper

Debiased Distillation for Consistency Regularization

  • Lu Wang
  • Liuchi Xu
  • Xiong Yang
  • Zhenhua Huang
  • Jun Cheng

Knowledge distillation transfers "dark knowledge" from a large teacher model to a smaller student model, yielding a highly efficient network. To improve the network's generalization ability, existing works use a larger temperature coefficient for knowledge distillation. Nevertheless, these methods may lower the target category's confidence and lead to ambiguous recognition of similar samples. To mitigate this issue, some studies introduce intra-batch distillation to reduce prediction discrepancy. However, these methods overlook the inconsistency between background information and the target category, which may increase prediction bias due to noise disturbance. Additionally, label imbalance arising from random sampling and batch size can undermine the reliability of network generalization. To tackle these challenges, we propose a simple yet effective Intra-class Knowledge Distillation (IKD) method that facilitates knowledge sharing within the same class to ensure consistent predictions. First, we initialize a matrix and a vector to store the teacher's logits and the class counts, respectively. Then, in the first epoch, we calculate the sum of logits and the sample count per class and perform KD to prevent knowledge omission. Finally, in subsequent training, we update the matrix to obtain the average logits and compute the KL divergence between the student's output and the row of the updated matrix indexed by the label. This process ensures intra-class consistency and improves the student's performance. Furthermore, this method theoretically reduces prediction bias by ensuring intra-class consistency. Extensive experiments on the CIFAR-100, ImageNet-1K, and Tiny-ImageNet datasets validate the superiority of IKD.
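The three IKD steps (store teacher logits and class counts, accumulate them during the first epoch, then distill from class-average logits indexed by the label) can be sketched as follows. The temperature of 4.0, the KL direction (teacher to student), and the class name `IntraClassBank` are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float], t: float = 1.0) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp((v - m) / t) for v in z]
    s = sum(e)
    return [v / s for v in e]


def kl_div(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


class IntraClassBank:
    """Per-class running sums of teacher logits (matrix) and sample counts (vector)."""

    def __init__(self, num_classes: int):
        self.sums = [[0.0] * num_classes for _ in range(num_classes)]
        self.counts = [0] * num_classes

    def accumulate(self, teacher_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
        # Called during the first epoch, alongside ordinary KD.
        for z, y in zip(teacher_logits, labels):
            row = self.sums[y]
            for j, v in enumerate(z):
                row[j] += v
            self.counts[y] += 1

    def mean_logits(self, y: int) -> List[float]:
        if self.counts[y] == 0:
            raise ValueError(f"no teacher logits recorded for class {y}")
        return [v / self.counts[y] for v in self.sums[y]]

    def intra_class_kd_loss(self, student_logits: Sequence[Sequence[float]],
                            labels: Sequence[int], t: float = 4.0) -> float:
        """Average KL between the class-mean teacher distribution and the student's."""
        terms = [kl_div(softmax(self.mean_logits(y), t), softmax(z, t))
                 for z, y in zip(student_logits, labels)]
        return sum(terms) / len(terms)
```

Because every sample of a class is compared with the same class-mean target, two images with the same label receive identical soft targets, which is the intra-class consistency the abstract describes.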

ICML Conference 2025 Conference Paper

Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training

  • Mozhi Zhang
  • Howe Tissue
  • Lu Wang
  • Xipeng Qiu

We introduce Domain2Vec, a novel approach that decomposes any dataset into a linear combination of several meta-domains, a new concept designed to capture the key underlying features of datasets. Domain2Vec maintains a vocabulary of meta-domains and uses a classifier to decompose any given dataset into a domain vector that corresponds to a distribution over this vocabulary. These domain vectors enable the identification of the optimal data mixture for language model (LM) pretraining in a training-free manner under the Distribution Alignment Assumption (DA²), which posits that the more closely the data distribution of the training set aligns with that of the validation set, the lower the validation loss. Moreover, Domain2Vec can be seamlessly integrated into previous works to model the relationship between domain vectors and LM performance, greatly enhancing the efficiency and scalability of previous methods. Extensive experiments demonstrate that Domain2Vec helps find a data mixture that enhances downstream task performance with minimal computational overhead. Specifically, Domain2Vec achieves the same validation loss on Pile-CC using only 51.5% of the compute required when training on the original mixture of The Pile dataset. Under an equivalent compute budget, Domain2Vec improves downstream performance by an average of 2.83%.
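Under DA², a training-free mixture search amounts to choosing weights whose combined domain vector best matches the validation set's domain vector. The sketch assumes "matches" means the smallest squared Euclidean distance and solves for the weights with projected gradient descent onto the probability simplex; the meta-domain classifier that produces the vectors is out of scope, and all function names are hypothetical.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_i >= 0, sum_i w_i = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0.0:
            theta = t  # the last index satisfying this condition fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def mix(weights: Sequence[float], domain_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Domain vector of a mixture: weighted sum of the per-dataset domain vectors."""
    dim = len(domain_vectors[0])
    return [sum(w * dv[k] for w, dv in zip(weights, domain_vectors)) for k in range(dim)]


def find_mixture(domain_vectors: Sequence[Sequence[float]], target: Sequence[float],
                 steps: int = 2000, lr: float = 0.1) -> List[float]:
    """Mixture weights minimizing ||mix(w) - target||^2 over the probability simplex."""
    n = len(domain_vectors)
    w = [1.0 / n] * n
    for _ in range(steps):
        residual = [m - t for m, t in zip(mix(w, domain_vectors), target)]
        # d/dw_i of ||sum_j w_j v_j - target||^2 is 2 * <v_i, residual>.
        grad = [2.0 * sum(dv[k] * r for k, r in enumerate(residual)) for dv in domain_vectors]
        w = project_to_simplex([wi - lr * g for wi, g in zip(w, grad)])
    return w
```

No language model is trained in this loop: only small vectors are compared, which is what makes the search cheap relative to training-based mixture tuning.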

YNIMG Journal 2025 Journal Article

Fast fluid-attenuated T2 mapping via multiple overlapping-echo detachment acquisition enhances preoperative histological classification of meningiomas

  • Qizhi Yang
  • Yijie Yang
  • Lu Wang
  • Xiao Wang
  • Linyu Fan
  • Weijian Wang
  • Qinqin Yang
  • Jianhui Zhong

Fluid-attenuated inversion recovery (FLAIR) is indispensable in MRI-based head-and-neck assessments, but its quantitative counterpart remains clinically absent because of the influence of cerebrospinal fluid (CSF) dynamics and the lengthy time needed to acquire a series of images with increasing weighting. This work implements and validates fast fluid-attenuated T2 (FLA-T2) mapping via inversion-recovery-prepared multiple overlapping-echo detachment imaging (IR-MOLED). Its clinical value is prospectively investigated in a cohort of 54 meningioma patients (mean age: 56 years ± 11 [standard deviation]; 19 men). Fluid-attenuated proton density mapping was obtained simultaneously and is therefore intrinsically co-registered, revealing notable benefits in identifying CSF inflow. In quantifying parenchymal T2, IR-MOLED yielded a mean absolute error of 1.22 ms relative to spin-echo, and in fluid suppression, IR-MOLED showed high radiographic consistency with conventional FLAIR imaging. Using first-order histogram analysis, the meningioma investigation showed for the first time that: (1) in grading meningiomas, FLA-T2 mapping (AUC = 0.814) outperformed FLAIR imaging (AUC = 0.685), contrast-enhanced T1-weighted imaging (not significant), and T2 mapping (not significant); and (2) in typing meningiomas, FLA-T2 distinguished transitional meningiomas from meningothelial and/or fibrous meningiomas, complementing the predictive ability of T2 mapping. In conclusion, by excluding the parametric contribution of free water and standardizing voxel value scales, FLA-T2 mapping permits a more precise description of brain parenchyma, in both structural morphology and relaxation variables, than T2 mapping, and is superior to FLAIR imaging in preoperatively predicting the histopathologic heterogeneity of meningiomas.

NeurIPS Conference 2025 Conference Paper

FedWMSAM: Fast and Flat Federated Learning via Weighted Momentum and Sharpness-Aware Minimization

  • Tianle Li
  • Yongzhi Huang
  • Linshan Jiang
  • Chang Liu
  • Qipeng Xie
  • Wenfeng Du
  • Lu Wang
  • Kaishun Wu

In federated learning (FL), models must converge quickly under tight communication budgets while generalizing across non-IID client distributions. These twin requirements have naturally led to two widely used techniques: client/server momentum to accelerate progress, and sharpness-aware minimization (SAM) to prefer flat solutions. However, simply combining momentum and SAM leaves two structural issues unresolved in non-IID FL. We identify and formalize two failure modes: local–global curvature misalignment (local SAM directions need not reflect the global loss geometry) and momentum-echo oscillation (late-stage instability caused by accumulated momentum). To our knowledge, these failure modes have not been jointly articulated and addressed in the FL literature. We propose FedWMSAM to address both failure modes. First, we construct a momentum-guided global perturbation from server-aggregated momentum to align clients' SAM directions with the global descent geometry, enabling a single-backprop SAM approximation that preserves efficiency. Second, we couple momentum and SAM via a cosine-similarity adaptive rule, yielding a two-phase training schedule that emphasizes momentum early and SAM late. On the theory side, we provide a non-IID convergence bound that explicitly models the perturbation-induced variance $\sigma_\rho^2=\sigma^2+(L\rho)^2$ and its dependence on $(S, K, R, N)$. Extensive experiments on multiple datasets and model architectures validate the effectiveness, adaptability, and robustness of our method, demonstrating its superiority in addressing the optimization challenges of federated learning. Our code is available at https://github.com/Li-Tian-Le/NeurlPS_FedWMSAM.

IJCAI Conference 2025 Conference Paper

HARMONY: A Privacy-preserving and Sensor-agnostic Tele-monitoring system

  • Qipeng Xie
  • Hao Guo
  • Weizheng Wang
  • Yongzhi Huang
  • Linshan Jiang
  • Jiafei Wu
  • Shuxin Zhong
  • Lu Wang

Global aging necessitates tele-monitoring systems that provide real-time tracking and timely assistance for older adults living independently. While pervasive wireless devices (e.g., CSI, IMU, UWB) enable cost-effective, non-intrusive monitoring, existing systems lack flexibility, limiting their adaptability to different environments. In this work, we posit that the motion dynamics of human movement are invariant across sensing modalities, which inspires the design of HARMONY, a privacy-preserving, sensor-agnostic system that supports multi-modal inputs and diverse tele-monitoring tasks. HARMONY incorporates Modality-agnostic Data Processing to uniformly encrypt multi-modal signals and Task-specific Activity Recognition for seamless task adaptation. A novel Encrypted-processing Engine then significantly accelerates computations on encrypted data by optimizing matrix and convolution operations. Evaluations across five different sensing modalities show that HARMONY consistently achieves high accuracy while delivering 3.5× to 130× speedups over state-of-the-art baselines. Our results demonstrate that HARMONY is a practical, scalable, and privacy-centric prototype for next-generation remote healthcare.

TMLR Journal 2025 Journal Article

Large Action Models: From Inception to Implementation

  • Lu Wang
  • Fangkai Yang
  • Chaoyun Zhang
  • Junting Lu
  • Jiaxu Qian
  • Shilin He
  • Pu Zhao
  • Bo Qiao

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications.

ICML Conference 2025 Conference Paper

Learning Dynamics in Continual Pre-Training for Large Language Models

  • Xingjin Wang
  • Howe Tissue
  • Lu Wang
  • Linjing Li
  • Daniel Dajun Zeng

Continual Pre-Training (CPT) has become a popular and effective method for applying strong foundation models to specific downstream tasks. In this work, we explore the learning dynamics throughout the CPT process for large language models (LLMs). We focus specifically on how general and downstream domain performance evolve at each training step, with domain performance measured via validation losses. We observe that the CPT loss curve fundamentally characterizes a transition from one curve to another, hidden curve, and can be described by decoupling the effects of distribution shift and learning rate (LR) annealing. We derive a CPT scaling law that combines the two factors, enabling prediction of the loss at any (continual) training step and across learning rate schedules (LRS) in CPT. Our formulation provides a comprehensive understanding of several critical factors in CPT, including the learning rate, the number of training steps, and the distribution distance between the PT and CPT datasets. Moreover, our approach can be adapted to customize training hyper-parameters for different CPT goals, such as balancing general and domain-specific performance. Extensive experiments demonstrate that our scaling law holds across various CPT datasets and training hyper-parameters.

NeurIPS Conference 2025 Conference Paper

MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?

  • Yunxiang Zhang
  • Muhammad Khalifa
  • Shitanshu Bhushan
  • Grant Murphy
  • Lajanugen Logeswaran
  • Jaekyeom Kim
  • Moontae Lee
  • Honglak Lee

We introduce MLRC-Bench, a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions, with a focus on open research problems that demand novel methodologies. Unlike prior work, e.g., AI Scientist, which evaluates the end-to-end agentic pipeline by using LLM-as-a-judge, MLRC-Bench measures the key steps of proposing and implementing novel research methods and evaluates them with a rigorous protocol and objective metrics. Our curated suite of 7 competition tasks reveals significant challenges for LLM agents. Even the best-performing tested agent (gemini-exp-1206 under MLAB) closes only 9.3% of the gap between baseline and top human participant scores. Furthermore, our analysis reveals a misalignment between LLM-judged innovation and actual performance on cutting-edge ML research problems. MLRC-Bench is a dynamic benchmark designed to continually grow with new ML competitions to encourage rigorous and objective evaluations of AI’s research capabilities. Our leaderboard and code are publicly available at https://huggingface.co/spaces/launch/MLRC_Bench.

EAAI Journal 2025 Journal Article

Open-pit mine occlusion object detection for unmanned transport vehicles

  • Chao Zheng
  • Guoxing Bai
  • Yu Meng
  • Lu Wang
  • Xianyao Jiang
  • Li Liu

Accurate object recognition in open-pit mine environments is crucial for the safety of autonomous transport vehicles. Existing autonomous driving perception mostly focuses on urban structured road traffic and is hard to adapt to the challenging open-pit mine environment. The lack of datasets further limits progress in this area. In this paper, we propose an object detection dataset for open-pit mine autonomous driving applications. This dataset encompasses data from several mines and includes different periods such as day, dusk, and night. It provides detailed annotations for diverse objects in the open-pit mines and incorporates additional attributes for evaluating occlusion detection. In addition, to address the multi-scale changes of objects in open-pit mines and the occlusion problems caused by dust, we propose a novel occlusion mine object general distribution detection method, utilizing soft labels and vehicle attribute location to reduce positioning ambiguity in difficult backgrounds and achieve specific object detection in harsh open-pit mine environments. Our work establishes a benchmark for open-pit mine object recognition involving occlusion. Comparison with mainstream techniques on the benchmark demonstrates that our approach outperforms existing state-of-the-art methods, achieving 82.2%, 81.7%, and 76.7% average precision in easy, moderate, and hard modes, respectively.

NeurIPS Conference 2025 Conference Paper

Scaling Law with Learning Rate Annealing

  • Howe Tissue
  • Venus Wang
  • Lu Wang

We find that the cross-entropy loss curves of neural language models empirically adhere to a scaling law with learning rate (LR) annealing over training steps: $$L(s) = L_0 + A\cdot S_1^{-\alpha} - C\cdot S_2,$$ where $L(s)$ is the validation loss at step $s$, $S_1$ is the area under the LR curve, $S_2$ is the LR annealing area, and $L_0$, $A$, $C$, $\alpha$ are constant parameters. This formulation accounts for two main effects: (1) power-law scaling over data size, and (2) the additional loss reduction during LR annealing. Unlike previous studies that only fit losses at final steps, our formulation captures the entire training curve, allowing for parameter fitting using losses from any training step. By applying the scaling law with LR annealing and fitting only one or two training curves, we can accurately predict the loss at any given step under any learning rate scheduler (LRS). This approach significantly reduces the computational cost of formulating scaling laws while improving accuracy and expressiveness. Extensive experiments demonstrate that our findings hold across a range of hyper-parameters and model architectures and can extend to the scaling effect of model size. Moreover, our formulation provides accurate theoretical insights into empirical results observed in numerous previous studies, particularly those focusing on LR schedules and annealing. We believe that this work is promising for enhancing the understanding of LLM training dynamics while democratizing scaling laws, and it can help guide both research and industrial participants in refining training strategies for future LLMs.
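As a rough illustration of how the formula above is evaluated, the sketch below computes $S_1$ and $S_2$ step by step from a per-step LR schedule and plugs them into $L(s)$. The abstract only calls $S_2$ the "LR annealing area"; here we assume it is a momentum-style running sum of LR decreases with a hypothetical decay factor `lam` (with `lam=1` it reduces to the area between the initial LR and the schedule). The constants `L0`, `A`, `C`, `alpha` would have to be fitted to real loss curves; any values passed in are placeholders.

```python
import numpy as np

def lr_areas(lrs, lam=0.999):
    """Return per-step (S1, S2) for a learning-rate schedule.

    S1: cumulative sum of the LR (area under the LR curve).
    S2: ASSUMED definition, a running sum of LR decreases where each
        decrease is carried forward with a hypothetical decay factor lam.
    """
    lrs = np.asarray(lrs, dtype=float)
    s1 = np.cumsum(lrs)
    s2 = np.zeros_like(lrs)
    m = 0.0    # decayed accumulation of recent LR drops
    acc = 0.0  # running annealing area
    for i in range(1, len(lrs)):
        m = lam * m + (lrs[i - 1] - lrs[i])
        acc += m
        s2[i] = acc
    return s1, s2

def predicted_loss(lrs, L0, A, C, alpha, lam=0.999):
    """L(s) = L0 + A * S1^(-alpha) - C * S2 at every step s.

    Requires lrs[0] > 0 so that S1 is strictly positive at every step.
    """
    s1, s2 = lr_areas(lrs, lam)
    return L0 + A * s1 ** (-alpha) - C * s2
```

Comparing a constant schedule with a decaying one shows the split between the two effects: a constant LR only grows $S_1$ (so $S_2$ stays at zero), while annealing trades some $S_1$ growth for a positive $S_2$ term.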

AAAI Conference 2025 Conference Paper

SSC-VAE: Structured Sparse Coding Based Variational Autoencoder for Detail Preserved Image Reconstruction

  • Hao Wang
  • Lu Wang
  • Zhongyu Wang
  • Lixin Ma
  • Ye Luo

Discrete latent representation techniques, such as Vector Quantization (VQ) and Sparse Coding (SC), have demonstrated superior image reconstruction and generation quality compared to continuous representation methods in Variational Autoencoders (VAEs). However, existing approaches often treat the latent representations of an image independently in their discrete representation space, neglecting both the inherent structural information within each representation and the correlations among them. This oversight leads to coarse representations and suboptimal generated results. In this paper, we address these limitations by introducing correlations among and within the latent representations of individual images in the latent discrete space of VAEs using sparse coding. We impose two-dimensional structural information through adaptive thresholding, enhancing local structure in image representations while suppressing noise via parsimonious representation with a learned dictionary. Empirical studies on three real benchmark datasets, including a clinical ultrasound dataset, BSDS500, and mini-ImageNet, demonstrate that our proposed model preserves fine-grained details in image reconstruction and significantly outperforms the baseline SC-VAE and VQ-VAE models across objective and subjective image quality metrics. Particularly noteworthy are the substantial performance improvements observed on the ultrasound dataset, where structural information is crucial. Specifically, we observe significant performance improvements of 7.68% and 17.03% in SSIM, 3.25 dB and 6.58 dB in PSNR, 0.15 and 0.24 in LPIPS, and 45.38 and 84.05 in FID over SC-VAE and VQ-VAE, respectively, indicating the superiority of our method in terms of image reconstruction quality and fidelity.

IROS Conference 2025 Conference Paper

UVS: A Novel Underwater Vehicle with Integrated VCMS-Thrusters Hybrid Architecture for Enhanced Attitude Regulation

  • Suohang Zhang
  • Shipang Qian
  • Ruiheng Liu
  • Lu Wang
  • Xinyu Fei
  • Yanhu Chen

Autonomous Underwater Vehicles (AUVs) require energy-efficient and responsive attitude control for underwater operations. We present UVS, a novel underwater vehicle that combines a Variable Center of Mass System (VCMS) and thrusters for hybrid attitude regulation. Through multi-objective optimization of the VCMS structure, we achieved a 5.19% larger pitch angle range while reducing space occupation by 15.72%. Pool experiments demonstrated near-linear pitch control from 17.5° to 172.5° with stable horizontal-vertical mode transitions. Our proposed collaborative control method integrates the advantages of the VCMS and thrusters, enabling rapid convergence to target attitudes with long-term stability. The results show UVS’s potential for energy-efficient, wide-range attitude control in mobile ocean sensing applications.

NeurIPS Conference 2025 Conference Paper

VCM: Vision Concept Modeling with Adaptive Vision Token Compression via Instruction Fine-Tuning

  • Run Luo
  • Renke Shan
  • Longze Chen
  • Ziqiang Liu
  • Lu Wang
  • Min Yang
  • Xiaobo Xia

Large vision-language models (LVLMs) have emerged as foundational tools for real-world AI applications. Despite their remarkable capabilities, current LVLMs process entire images at the token level, leading to significant inefficiencies compared to human cognition, which selectively focuses on high-level vision concepts. This token-level redundancy becomes increasingly problematic for high-resolution images and long video sequences, resulting in large computational costs and limited scalability in practical applications. To address this limitation, we introduce the concept of a vision concept model, a novel paradigm that enables LVLMs to dynamically extract the most relevant vision concepts from complex inputs, based on task-specific instructions. To optimize this vision concept modeling process, we propose VCM, a self-supervised framework that leverages vision-language correlations across diverse instances. VCM is designed to learn meaningful vision concepts without the need for expensive concept-level annotations. At its core, it employs a forward-backward optimization algorithm that enables LVLMs to adjust concept granularity and spatial alignment dynamically. Experiments demonstrate that VCM remarkably reduces computational costs (e.g., achieving up to 85% fewer FLOPs for LLaVA-1.5-7B) while maintaining strong performance across a series of vision-language tasks. The codebase is available at https://github.com/RainBowLuoCS/VCM.

ICRA Conference 2025 Conference Paper

VIP-Dock: Vision, Inertia, and Pressure Sensor Fusion for Underwater Docking with Optical Beacon Guidance

  • Suohang Zhang
  • Shipang Qian
  • Lu Wang
  • Xinyu Fei
  • Yanhu Chen

Underwater docking enhances the operational capabilities of Autonomous Underwater Vehicles (AUVs) by facilitating energy and data transfer. Optical beacons serve as the primary guidance method for AUVs to localize and track docking stations. This paper presents VIP-Dock, a novel optical beacon tracking algorithm for robust underwater docking of AUVs. VIP-Dock addresses the challenge of maintaining accurate beacon tracking under visual interference by integrating visual, inertial, and pressure perception. Employing an unscented Kalman filter framework, the VIP-Dock algorithm provides continuous optimal estimation of beacon positions. Experimental results demonstrated VIP-Dock's real-time tracking performance in actual docking scenarios and its ability to maintain accuracy during visual input failure. Implementation in a simulation platform for an underwater vertical shuttle showed significant improvement, increasing docking success rates from 62% to 84% across 100 trials under simulated current disturbances.

AAAI Conference 2024 Conference Paper

Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation

  • Shangding Gu
  • Bilgehan Sel
  • Yuhao Ding
  • Lu Wang
  • Qingwei Lin
  • Ming Jin
  • Alois Knoll

Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. Initially, we analyze the conflict between reward and safety gradients. Subsequently, we tackle the balance between reward and safety optimization by proposing a soft switching policy optimization method, for which we provide convergence analysis. Based on our theoretical examination, we provide a safe RL framework to overcome the aforementioned challenge, and we develop a Safety-MuJoCo Benchmark to assess the performance of safe RL algorithms. Finally, we evaluate the effectiveness of our method on the Safety-MuJoCo Benchmark and a popular safe benchmark, Omnisafe. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.

AAAI Conference 2024 Conference Paper

Parallel Ranking of Ads and Creatives in Real-Time Advertising Systems

  • Zhiguang Yang
  • Liufang Sang
  • Haoran Wang
  • Wenlong Chen
  • Lu Wang
  • Jie He
  • Changping Peng
  • Zhangang Lin

Creativity is the heart and soul of advertising services. Effective creatives can create a win-win scenario: advertisers reach target users and achieve marketing objectives more effectively, users find products of interest more quickly, and platforms generate more advertising revenue. With the advent of AI-Generated Content, advertisers can now produce vast amounts of creative content at minimal cost. The current challenge lies in how advertising systems can select the most pertinent creative in real time for each individual user. Existing methods typically perform serial ranking of ads or creatives, limiting the creative module in terms of both effectiveness and efficiency. In this paper, we propose, for the first time, a novel architecture for online parallel estimation of ad and creative rankings, as well as a corresponding offline joint optimization model. The online architecture enables sophisticated personalized creative modeling while reducing overall latency. The offline joint model for CTR estimation allows mutual awareness and collaborative optimization between ads and creatives. Additionally, we optimize the offline evaluation metrics for the implicit feedback sorting task involved in ad creative ranking. We conduct extensive experiments to compare our approach with two state-of-the-art approaches. The results demonstrate the effectiveness of our approach both in offline evaluations and online on real-world advertising platforms in terms of response time, CTR, and CPM.

JBHI Journal 2024 Journal Article

Spatio-Temporal Classification of Lung Ventilation Patterns Using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

  • Shuzhe Chen
  • Li Li
  • Zhichao Lin
  • Ke Zhang
  • Ying Gong
  • Lu Wang
  • Xu Wu
  • Maokun Li

The Pulmonary Function Test (PFT) is a widely utilized and rigorous classification test for evaluating lung function, serving as a comprehensive diagnostic tool for lung conditions. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes the conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventilation beyond traditional PFT. However, relying solely on conventional isolated interpretations of PFT results and EIT images overlooks the continuous dynamic aspects of lung ventilation. This study aims to classify lung ventilation patterns by extracting spatial and temporal features from the 3D EIT image series. The study uses a Variational Autoencoder (VAE) with a MultiRes block to compress the spatial distribution in a 3D image into a one-dimensional vector. These vectors are then stacked to create a feature map that exhibits temporal features. A simple convolutional neural network is used for classification. Data from 137 subjects were utilized for the training phase. Initially, the model underwent validation through a leave-one-out cross-validation process. During this validation, the model achieved an accuracy and sensitivity of 0.96 and 1.00, respectively, with an F1-score of 0.98 when identifying the normal subjects. To assess pipeline reliability and feasibility, we tested it on 9 newly recruited subjects, with accurate ventilation mode predictions for 8 out of 9. In addition, we included 2D EIT results for comparison and conducted ablation experiments to validate the effectiveness of the VAE. The study demonstrates the potential of using image series for lung ventilation mode classification, providing a feasible method for patient prescreening and presenting an alternative form of PFT.

YNICL Journal 2024 Journal Article

The brain topological alterations in the structural connectome and correlations with clinical characteristics in type 1 narcolepsy

  • Huiqin Zhang
  • Lin Xu
  • Zhu Ai
  • Linlin Wang
  • Lu Wang
  • Lili Li
  • Ruilin Zhang
  • Rong Xue

OBJECTIVE: To explore topological alterations of the white matter (WM) structural connectome and their associations with clinical characteristics in type 1 narcolepsy (NT1). METHODS: 46 NT1 patients and 34 age- and sex-matched healthy controls were recruited for clinical data and diffusion tensor imaging collection. Using graph theory analysis, the topology metrics of the structural connectome, rich club organization, and connectivity properties were compared between the two groups. Furthermore, partial correlation analysis was performed between the network characteristics of 90 nodes or weakened edges and clinical data using Pearson or Spearman correlation, controlling for age and sex. RESULTS: Between-group comparison showed that NT1 patients exhibited sleep disorders with comorbid cognitive impairment and psychological problems. In patients, the global efficiency, local efficiency, and average clustering coefficient were significantly lower, whereas the characteristic path length was larger compared to healthy controls. Notably, the nodal path length of the left middle frontal gyrus was positively correlated with Pittsburgh Sleep Quality Index scores. The rich club analysis identified six affected nodes: bilateral dorsolateral superior frontal gyrus, bilateral supplementary motor area, left hippocampus, and left pallidum. Furthermore, six significantly weakened structural connections seeded from these rich club nodes showed significant correlations with clinical indices or polysomnography parameters. CONCLUSION: In NT1 patients, the WM structural connectome was disrupted, primarily in the frontal-parietal cortex, subcortical regions, and particularly the cingulate, potentially affecting their clinical manifestations.

NeurIPS Conference 2023 Conference Paper

Conservative State Value Estimation for Offline Reinforcement Learning

  • Liting Chen
  • Jie Yan
  • Zhengdao Shao
  • Lu Wang
  • Qingwei Lin
  • Saravanakumar Rajmohan
  • Thomas Moscibroda
  • Dongmei Zhang

Offline reinforcement learning faces a significant challenge of value over-estimation due to the distributional drift between the dataset and the current learned policy, leading to learning failure in practice. The common approach is to incorporate a penalty term into the reward or value estimation in the Bellman iterations. Meanwhile, to avoid extrapolation on out-of-distribution (OOD) states and actions, existing methods focus on conservative Q-function estimation. In this paper, we propose Conservative State Value Estimation (CSVE), a new approach that learns a conservative V-function by directly imposing a penalty on OOD states. Compared to prior work, CSVE allows more effective state value estimation with conservative guarantees and, in turn, better policy optimization. Further, we apply CSVE and develop a practical actor-critic algorithm in which the critic performs conservative value estimation by additionally sampling and penalizing states around the dataset, and the actor applies advantage-weighted updates extended with state exploration to improve the policy. We evaluate on classic continuous control tasks from D4RL, showing that our method performs better than conservative Q-function learning methods and is strongly competitive with recent SOTA methods.

EAAI Journal 2023 Journal Article

Heart action monitoring from pulse signals using a growing hybrid polynomial network

  • Lu Wang
  • Chunhui Zhao
  • P. Takis Mathiopoulos
  • Tomoaki Ohtsuki

Electrocardiogram (ECG) is a commonly used noninvasive test to quickly detect heart problems with high precision. However, devices used to collect continuous ECG waveforms under free-living conditions pose several operational difficulties. As an alternative, photoplethysmography (PPG) signals can be conveniently collected by pulse sensors, which can be mounted onto wearable devices. In this paper, we study the relation between ECG waveforms and PPG signals by proposing a novel growing hybrid polynomial network (GHPN). Based on the projections between adjacent layers, the network is designed to approximate the target signals from the recorded inputs, and the multi-scale dynamics of the input series are explored gradually as layers are added. Two public datasets are employed to evaluate the proposed approach on the accuracy of waveform reconstruction and heart rate (HR) detection using widely used metrics. Compared with the reference ECG waveforms, the normalized mean square error (NMSE) of the proposed approach is 0.248 and 0.216 on the PPG-Dalia and CapnoBase datasets, respectively, which is smaller than that of the comparison approaches. The average absolute error (AAE) between the detected HR and reference HR is 0.93 and 1.05 on the PPG-Dalia and CapnoBase datasets, respectively, indicating higher HR detection accuracy. Experimental results on benchmark datasets clearly show that the proposed method achieves higher similarity in the reconstructed morphology with higher HR detection accuracy. Moreover, the proposed approach makes it possible to employ PPG sensors for long-term monitoring of heart actions with higher precision.

JBHI Journal 2022 Journal Article

Automatic Coronary Artery Segmentation of CCTA Images With an Efficient Feature-Fusion-and-Rectification 3D-UNet

  • Along Song
  • Lisheng Xu
  • Lu Wang
  • Bin Wang
  • Xiaofan Yang
  • Bu Xu
  • Benqiang Yang
  • Stephen E. Greenwald

Automatic coronary artery segmentation is of great value in diagnosing coronary disease. In this paper, we propose an automatic coronary artery segmentation method for coronary computerized tomography angiography (CCTA) images based on a deep convolutional neural network. The proposed method consists of three steps. First, to improve the efficiency and effectiveness of the segmentation, a 2D DenseNet classification network is utilized to screen out the non-coronary-artery slices. Second, we propose a coronary artery segmentation network based on the 3D-UNet, which is capable of extracting, fusing and rectifying features efficiently for accurate coronary artery segmentation. Specifically, in the encoding process of the 3D-UNet network, we adapt the dense block into the 3D-UNet so that it can extract rich and representative features for coronary artery segmentation; in the decoding process, 3D residual blocks with feature rectification capability are applied to further improve the segmentation quality. Third, we introduce a Gaussian weighting method to obtain the final segmentation results. This operation highlights the more reliable segmentation results at the center of the 3D data blocks while weakening the less reliable segmentations at the block boundary when merging the segmentation results of spatially overlapping data blocks. Experiments demonstrate that our proposed method achieves a Dice Similarity Coefficient (DSC) of 0.826 on a CCTA dataset we constructed. The code of the proposed method is available at https://github.com/alongsong/3D_CAS.
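The Gaussian-weighted merging step described above can be sketched in a few lines of NumPy. This is not the released 3D_CAS code: it assumes each block prediction is a probability map and uses a separable Gaussian weight whose width (`sigma_scale`) is a hypothetical choice. Voxels covered by several blocks receive a weighted average in which predictions near a block's centre count more than those near its boundary.

```python
import numpy as np

def gaussian_weight(shape, sigma_scale=0.125):
    """Separable 3D Gaussian weight map peaking at the block centre.

    sigma_scale is a hypothetical parameter: sigma = sigma_scale * axis length.
    """
    axes = []
    for n in shape:
        coords = np.arange(n) - (n - 1) / 2.0  # distance from the block centre
        sigma = sigma_scale * n
        axes.append(np.exp(-0.5 * (coords / sigma) ** 2))
    w = axes[0][:, None, None] * axes[1][None, :, None] * axes[2][None, None, :]
    return w / w.max()

def merge_blocks(volume_shape, blocks, origins):
    """Blend overlapping per-block probability maps into one volume.

    blocks:  list of 3D arrays (per-block predictions)
    origins: list of (z, y, x) corner positions of each block in the volume
    """
    acc = np.zeros(volume_shape, dtype=float)   # weighted sum of predictions
    wsum = np.zeros(volume_shape, dtype=float)  # sum of weights per voxel
    for block, (z, y, x) in zip(blocks, origins):
        w = gaussian_weight(block.shape)
        dz, dy, dx = block.shape
        acc[z:z + dz, y:y + dy, x:x + dx] += w * block
        wsum[z:z + dz, y:y + dy, x:x + dx] += w
    # Guard against division by zero for voxels no block covers.
    return acc / np.maximum(wsum, 1e-8)
```

For example, two overlapping blocks predicting 0 and 1 respectively produce a merged volume that follows whichever block's centre is closer, instead of a hard seam at the block edge. The merged map would still be thresholded to obtain the final binary segmentation.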

ICLR Conference 2022 Conference Paper

LoRA: Low-Rank Adaptation of Large Language Models

  • Edward J. Hu
  • Yelong Shen
  • Phillip Wallis
  • Zeyuan Allen-Zhu
  • Yuanzhi Li
  • Shean Wang
  • Lu Wang
  • Weizhu Chen

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3. LoRA performs on par with or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
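The core idea described above, freezing the pre-trained weight $W_0$ and learning a low-rank update $BA$, can be sketched as a toy NumPy layer. This is an illustration only, not the package released in the repository above; the zero initialisation of `B` and the `alpha / r` scaling follow the LoRA paper, while the class name, default rank, and init scale for `A` are our own assumptions.

```python
import numpy as np

class LoRALinear:
    """Toy linear layer: h = x W0^T + (alpha / r) * x A^T B^T."""

    def __init__(self, W0, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W0.shape
        self.W0 = W0                                    # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable, zero init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); the low-rank path never forms a d_out x d_in matrix
        return x @ self.W0.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Folding BA into W0 gives a plain linear layer, so inference adds no latency.
        return self.W0 + self.scale * self.B @ self.A

    def n_trainable(self):
        return self.A.size + self.B.size
```

With `d_in = d_out = 1024` and `r = 4`, the adapter has 4 × 1024 + 1024 × 4 = 8,192 trainable parameters versus 1,048,576 for the full matrix. Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly.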

IJCAI Conference 2022 Conference Paper

T-SMOTE: Temporal-oriented Synthetic Minority Oversampling Technique for Imbalanced Time Series Classification

  • Pu Zhao
  • Chuan Luo
  • Bo Qiao
  • Lu Wang
  • Saravan Rajmohan
  • Qingwei Lin
  • Dongmei Zhang

Time series classification is a popular and important topic in machine learning, and it suffers from the class imbalance problem in many real-world applications. In this paper, to address the class imbalance problem, we propose a novel and practical oversampling method named T-SMOTE, which can make full use of the temporal information of time-series data. In particular, for each sample of the minority class, T-SMOTE generates multiple samples that are close to the class border. Then, based on those samples near the class border, T-SMOTE synthesizes more samples. Finally, a weighted sampling method is applied to both the generated samples near the class border and the synthetic samples. Extensive experiments on a diverse set of both univariate and multivariate time-series datasets demonstrate that T-SMOTE consistently outperforms the current state-of-the-art methods on imbalanced time series classification. More encouragingly, our empirical evaluations show that T-SMOTE performs better in the scenario of early prediction, an important application scenario in industry, which indicates that T-SMOTE could bring benefits in practice.

YNICL Journal 2021 Journal Article

Increased risk for cerebral small vessel disease is associated with quantitative susceptibility mapping in HIV infected and uninfected individuals

  • Kyle D. Murray
  • Md Nasir Uddin
  • Madalina E. Tivarus
  • Bogachan Sahin
  • Henry Z. Wang
  • Meera V. Singh
  • Xing Qiu
  • Lu Wang

The aim of this study was to assess, in the context of cerebral small vessel disease (CSVD), whether cardiovascular risk factors and white matter hyperintensities (WMHs) were associated with brain tissue susceptibility as measured by quantitative susceptibility mapping (QSM). Given that CSVD is diagnosed by the presence of lacunar strokes, periventricular and deep WMHs, increased perivascular spaces, and microbleeds, we expected that QSM could capture changes in brain tissue due to underlying CSVD pathology. We compared a cohort of 101 HIV-infected individuals (mean age ± SD = 53.2 ± 10.9 years) with mild to moderate cardiovascular risk scores, as measured by the Reynolds risk score, to 102 age-matched controls (mean age ± SD = 50.3 ± 15.7 years) with similar Reynolds scores. We performed brain MRI to assess CSVD burden by acquiring 3D T1-MPRAGE, 3D FLAIR, 2D T2-TSE, and mGRE for QSM. We found that signs of CSVD are significantly higher in individuals with HIV infection compared to controls and that WMH volumes are significantly correlated with age and cardiovascular risk scores. Regional QSM was associated with cardiovascular risk factors, age, sex, and WMH volumes but not HIV status. These results suggest that QSM may be an early imaging marker reflective of alterations in brain microcirculation.

YNIMG Journal 2021 Journal Article

Learning Clique Subgraphs in Structural Brain Network Classification with Application to Crystallized Cognition

  • Lu Wang
  • Feng Vankee Lin
  • Martin Cole
  • Zhengwu Zhang

Structural brain networks constructed from diffusion MRI are important biomarkers for understanding human brain structure and its relation to cognitive functioning. There is increasing interest in learning differences in structural brain networks between groups of subjects in neuroimaging studies, leading to a variable selection problem in network classification. Traditional methods often use independent edgewise tests or unstructured generalized linear models (GLMs) with regularization on vectorized networks to select edges distinguishing the groups, which ignore the network structure and make the results hard to interpret. In this paper, we develop a symmetric bilinear logistic regression (SBLR) with elastic-net penalty to identify a set of small clique subgraphs in network classification. Clique subgraphs, consisting of all the interconnections among a subset of brain regions, have appealing neurological interpretations, as they may correspond to anatomical circuits in the brain related to the outcome. We apply this method to study differences in the structural connectome between adolescents with high and low crystallized cognitive ability, using the crystallized cognition composite score and the picture vocabulary and oral reading recognition tests from the NIH Toolbox. A few clique subgraphs containing several small sets of brain regions are identified between different levels of functioning, indicating their importance in crystallized cognition.
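To see why a symmetric bilinear model yields clique subgraphs, the sketch below assumes the logit takes the form $b + \sum_k \lambda_k \beta_k^\top X \beta_k$ for a symmetric connectivity matrix $X$. This parameterization and the function names are our assumptions based on the method's name, and fitting with the elastic-net penalty is omitted. Each rank-one term $\lambda_k \beta_k \beta_k^\top$ touches only edges among the nonzero entries of $\beta_k$, so a sparse $\beta_k$ selects all interconnections within a small set of regions, i.e., a clique.

```python
import numpy as np

def sblr_logit(X, b, lambdas, betas):
    """ASSUMED SBLR logit: b + sum_k lambda_k * beta_k^T X beta_k.

    X is a symmetric (n_regions x n_regions) connectivity matrix.
    """
    return b + sum(lam * beta @ X @ beta for lam, beta in zip(lambdas, betas))

def sblr_prob(X, b, lambdas, betas):
    """Logistic link applied to the bilinear logit."""
    return 1.0 / (1.0 + np.exp(-sblr_logit(X, b, lambdas, betas)))

def clique_edges(beta, tol=0.0):
    """Edges (i < j) that receive a nonzero coefficient from beta beta^T.

    They are exactly all pairs within the support of beta, i.e. a clique.
    """
    nodes = np.flatnonzero(np.abs(beta) > tol)
    return {(int(i), int(j)) for i in nodes for j in nodes if i < j}
```

For instance, `beta = [0, 1, 0, 2, 0.5]` involves regions 1, 3, and 4, and its rank-one term weights exactly the three edges among them. Equivalently, the logit equals $b + \langle B, X \rangle$ with coefficient matrix $B = \sum_k \lambda_k \beta_k \beta_k^\top$.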

YNICL Journal 2021 Journal Article

Mitochondrial toxicity before and after combination antiretroviral therapy, a Magnetic Resonance Spectroscopy study

  • Madalina E. Tivarus
  • Yuchuan Zhuang
  • Lu Wang
  • Kyle D. Murray
  • Arun Venkataraman
  • Miriam T. Weber
  • Jianhui Zhong
  • Xing Qiu

The aim of this study was to quantify, via Magnetic Resonance Spectroscopy (MRS), the effect of combination antiretroviral therapy (cART) on brain metabolites and characterize any possible associations between changes in metabolites, age, blood biomarkers of neuronal damage, functional connectivity and cognitive performance. While cART has dramatically increased the life expectancy of HIV-infected (HIV+) individuals and unmasked an increase in HIV-associated neurocognitive disorders, it is still not clear whether cART neurotoxicity contributes to these disorders. We hypothesized a bimodal effect, with early cART treatment of HIV infection decreasing inflammation as measured by MRS metabolites and improving cognitive performance, and chronic exposure to cART contributing to persistence of cognitive impairment via its effect on mitochondrial function. Basal ganglia metabolites, functional connectivity, cognitive scores, as well as plasma levels of neurofilament light chain (NfL) and tau protein were measured before and after 12 weeks, 1 year and 2 years of cART in a cohort of 50 cART-naïve HIV+ subjects and 72 age-matched HIV- healthy controls. Glutamate (Glu) levels were lower in the cART-naïve patients than in healthy controls and were inversely correlated with plasma levels of NfL. There were no other significant metabolite differences between HIV+ and uninfected individuals. Treatment improved Glu levels in HIV+ individuals; however, no associations were found between Glu, functional connectivity and cognitive performance. Stable brain metabolites and plasma levels of NfL and tau over two years of follow-up suggest there are no signs of cART neurotoxicity in this relatively young cohort of HIV+ individuals.

AAAI Conference 2020 Conference Paper

Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning

  • Liqiang Xiao
  • Lu Wang
  • Hao He
  • Yaohui Jin

Jointly using extractive and abstractive summarization methods can combine their complementary advantages, generating summaries that are both informative and concise. Existing methods that adopt an extract-then-abstract strategy have achieved impressive results, yet they suffer from information loss in the abstraction step because they compress all the selected sentences indiscriminately. Especially when a whole sentence is summary-worthy, salient content can be lost through compression. To address this problem, we propose HYSUM, a hybrid framework for summarization that can flexibly switch between copying and rewriting a sentence according to its degree of redundancy. In this way, our approach can effectively combine the advantages of both branches of summarization, balancing informativeness and conciseness. Moreover, based on Hierarchical Reinforcement Learning, we propose an end-to-end reinforcement method to bridge the extraction module and the rewriting module, which enhances the cooperation between them. Automatic evaluation shows that our approach significantly outperforms state-of-the-art methods on the CNN/DailyMail corpus. Human evaluation also demonstrates that our generated summaries are more informative and concise than those of popular models.

NeurIPS Conference 2020 Conference Paper

Provably Robust Metric Learning

  • Lu Wang
  • Xuanqing Liu
  • Jinfeng Yi
  • Yuan Jiang
  • Cho-Jui Hsieh

Metric learning is an important family of algorithms for classification and similarity search, but the robustness of learned metrics against small adversarial perturbations is less studied. In this paper, we show that existing metric learning algorithms, which focus on boosting the clean accuracy, can result in metrics that are less robust than the Euclidean distance. To overcome this problem, we propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations, and the robustness of the resulting model is certifiable. Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors (errors under adversarial attacks). Furthermore, unlike neural network defenses which usually encounter a trade-off between clean and robust errors, our method does not sacrifice clean errors compared with previous metric learning methods.
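For reference, a Mahalanobis distance is parameterized by a positive semidefinite matrix $M = L^\top L$, so that $d_M(x, y) = \sqrt{(x-y)^\top M (x-y)} = \|L(x-y)\|_2$; with $L$ the identity it reduces to the Euclidean distance that the abstract uses as a baseline. The minimal sketch below assumes the factor `L` is already given (learning a certifiably robust `L` is the paper's contribution and is not reproduced here); `nearest_neighbor_label` is a hypothetical helper showing how such a metric is typically used for classification.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]
Vector = Sequence[float]


def mahalanobis(x: Vector, y: Vector, L: Matrix) -> float:
    """Distance sqrt((x-y)^T L^T L (x-y)), computed as ||L (x - y)||_2."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    # Map the difference through L; M = L^T L is PSD by construction.
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in L]
    return math.sqrt(sum(p * p for p in projected))


def nearest_neighbor_label(query: Vector, points: Sequence[Vector],
                           labels: Sequence[int], L: Matrix) -> int:
    """1-nearest-neighbor classification under the metric defined by L."""
    best = min(range(len(points)), key=lambda i: mahalanobis(query, points[i], L))
    return labels[best]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    stretch = [[2.0, 0.0], [0.0, 0.5]]
    print(mahalanobis([0.0, 0.0], [3.0, 4.0], identity))  # 5.0 (Euclidean)
    print(mahalanobis([0.0, 0.0], [3.0, 4.0], stretch))   # sqrt(40), about 6.3246
```

Changing `L` can change which training point is nearest, which is exactly why a learned metric can be more or less robust to small input perturbations than the Euclidean one.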

YNIMG Journal 2019 Journal Article

A population stereotaxic positron emission tomography brain template for the macaque and its application to ischemic model

  • Binbin Nie
  • Lu Wang
  • Yichao Hu
  • Shengxiang Liang
  • Zhiqiang Tan
  • Pei Chai
  • Yongjin Tang
  • Jingjie Shang

Purpose: Positron emission tomography (PET) is a non-invasive imaging tool for evaluating brain function and neuronal activity in normal and diseased conditions with high sensitivity. The macaque monkey serves as a valuable model system in translational medicine because of its phylogenetic proximity to humans. To facilitate the translation of non-human primate neuro-PET studies, an effective and objective data analysis platform is needed. Materials and methods: A set of stereotaxic templates of the macaque brain, namely the Institute of High Energy Physics & Jinan University Macaque Template (HJT), was constructed by iterative registration and averaging based on 30 healthy rhesus monkeys. A brain atlas image was created in HJT space by combining sub-anatomical regions and defining 88 new bilateral functional regions, with a unique integer assigned to each sub-anatomical region. Results: The HJT comprises a structural T1-weighted MRI (T1WI) template image, a functional FDG-PET template image and intracranial tissue segmentations, accompanied by a digital macaque brain atlas image. It is compatible with various commercially available software tools, such as SPM and PMOD. To demonstrate the use of HJT, data from a stroke model were analyzed and compared with a group of healthy controls. Conclusion: We have constructed a stereotaxic template set of the macaque brain, named HJT, which standardizes macaque neuroimaging data analysis, supports novel radiotracer development and facilitates translational research on neurological disorders.

AAAI Conference 2019 Conference Paper

Learning Segmentation Masks with the Independence Prior

  • Songmin Dai
  • Xiaoqiang Li
  • Lu Wang
  • Pin Wu
  • Weiqin Tong
  • Yimin Chen

An instance with a bad mask can make a composite image that uses it look fake. This observation encourages us to learn segmentation by generating realistic composite images. To achieve this, we propose a novel framework based on Generative Adversarial Networks (GANs) that exploits a newly proposed prior called the independence prior. The generator produces an image using multiple category-specific instance providers, a layout module and a composition module. First, each provider independently outputs a category-specific instance image with a soft mask. Then the layout module corrects the poses of the provided instances. Lastly, the composition module combines these instances into a final image. Trained with an adversarial loss and a penalty on mask area, each provider learns a mask that is as small as possible but large enough to cover a complete category-specific instance. Weakly supervised semantic segmentation methods widely use grouping cues that model the association between image parts; these cues are either hand-designed, learned with costly segmentation labels, or modeled only on local pairs. Unlike these methods, our method automatically models the dependence between arbitrary parts and learns instance segmentation. We apply our framework in two settings: (1) foreground segmentation on category-specific images with box-level annotation, and (2) unsupervised learning of instance appearances and masks from only one image of a homogeneous object cluster (HOC). We obtain appealing results in both tasks, which shows that the independence prior is useful for instance segmentation and that instance masks can be learned without supervision from only one image.
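One common way to realize a composition module of this kind is back-to-front alpha compositing of instance images with their soft masks. The sketch below assumes that form (the abstract does not specify the operator) and pairs it with a mean-mask-area penalty of the kind the abstract describes. Images are flattened to 1-D lists of grayscale intensities for brevity, and all function names are hypothetical.

```python
from typing import List, Sequence


def composite(background: Sequence[float],
              instances: Sequence[Sequence[float]],
              masks: Sequence[Sequence[float]]) -> List[float]:
    """Alpha-composite instances over a background in list order (last on top).

    Each mask value in [0, 1] is the soft opacity of its instance at that pixel.
    """
    out = list(background)
    for image, mask in zip(instances, masks):
        # Blend this instance over everything composited so far.
        out = [m * v + (1.0 - m) * o for v, m, o in zip(image, mask, out)]
    return out


def mask_area_penalty(masks: Sequence[Sequence[float]]) -> float:
    """Mean soft-mask area per instance; minimizing it favors small masks."""
    return sum(sum(m) / len(m) for m in masks) / len(masks)


if __name__ == "__main__":
    bg = [0.0, 0.0, 0.0, 0.0]
    inst = [[1.0, 1.0, 1.0, 1.0]]
    soft = [[0.0, 0.5, 1.0, 0.0]]
    print(composite(bg, inst, soft))  # [0.0, 0.5, 1.0, 0.0]
    print(mask_area_penalty(soft))    # 0.375
```

In a GAN setup, the adversarial loss would push the composite toward realism while this penalty keeps each mask tight; a mask that misses part of the instance produces an implausible composite, which is the intuition behind the independence prior.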

JBHI Journal 2019 Journal Article

Local Motion Intensity Clustering (LMIC) Model for Segmentation of Right Ventricle in Cardiac MRI Images

  • Zengzhi Guo
  • Wenjun Tan
  • Lu Wang
  • Lisheng Xu
  • Xinhui Wang
  • Benqiang Yang
  • Yudong Yao

Analysis of the morphology and function of the right ventricle (RV) can be used for the prediction and diagnosis of cardiovascular disease. Analyzing cardiac magnetic resonance imaging (MRI) images can provide an accurate description of the structure and function of the heart. A local intensity clustering (LIC) model can address the noise interference and intensity inhomogeneity of MRI images. However, segmentation of the RV in MRI images remains challenging, mainly because of its ill-defined borders. To address this challenge, this paper proposes an RV segmentation algorithm based on a local motion intensity clustering (LMIC) model. The LMIC model combines the LIC model with motion intensity information arising from cardiac motion and blood flow. The motion intensity is calculated with the Lucas-Kanade optical flow method and used in the LMIC model as an energy parameter. Because the motion intensity of the RV region is stronger than that of other areas, this approach can segment the RV accurately. Experimental results demonstrate that the LMIC model addresses the challenge of ill-defined RV borders in cardiac MRI images and improves RV segmentation accuracy over existing methods.
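As a rough illustration of the optical-flow step, the sketch below estimates one Lucas-Kanade flow vector $(u, v)$ for a small window by solving the classic 2×2 least-squares system built from spatial gradients $I_x, I_y$ and the temporal difference $I_t$, then takes the flow magnitude as a motion-intensity value. The central-difference gradients, the window size, and the use of flow magnitude as "motion intensity" are assumptions for illustration, not the paper's exact formulation.

```python
import math
from typing import Sequence, Tuple

Image = Sequence[Sequence[float]]


def lucas_kanade_window(prev: Image, curr: Image, r0: int, c0: int,
                        half: int = 1) -> Tuple[float, float]:
    """Estimate flow (u, v) in the (2*half+1)^2 window centered at (r0, c0).

    Solves [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] [u, v]^T = -[sum IxIt, sum IyIt]^T.
    u is motion along columns (x), v along rows (y).
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(r0 - half, r0 + half + 1):
        for c in range(c0 - half, c0 + half + 1):
            ix = (prev[r][c + 1] - prev[r][c - 1]) / 2.0  # central difference in x
            iy = (prev[r + 1][c] - prev[r - 1][c]) / 2.0  # central difference in y
            it = curr[r][c] - prev[r][c]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # Aperture problem: the system is singular, so report no motion.
        return 0.0, 0.0
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def motion_intensity(prev: Image, curr: Image, r0: int, c0: int) -> float:
    """Flow magnitude at (r0, c0), used here as a motion-intensity proxy."""
    u, v = lucas_kanade_window(prev, curr, r0, c0)
    return math.hypot(u, v)
```

In a segmentation pipeline this value would be computed per pixel between consecutive cine frames; how it enters the LMIC energy is specific to the paper and is not reproduced here.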

YNICL Journal 2018 Journal Article

Alteration of brain network topology in HIV-associated neurocognitive disorder: A novel functional connectivity perspective

  • Anas Z. Abidin
  • Adora M. DSouza
  • Mahesh B. Nagarajan
  • Lu Wang
  • Xing Qiu
  • Giovanni Schifitto
  • Axel Wismüller

HIV is capable of invading the brain soon after seroconversion. This can ultimately lead to deficits in multiple cognitive domains, commonly referred to as HIV-associated neurocognitive disorders (HAND). Clinical diagnosis of such deficits requires detailed neuropsychological assessment, but clinical signs may be difficult to detect during asymptomatic injury of the central nervous system (CNS). Therefore, neuroimaging biomarkers are of particular interest in HAND. In this study, we constructed brain connectivity profiles of 40 subjects (20 HIV-positive subjects and 20 age-matched seronegative controls) using two different methods: a non-linear mutual connectivity analysis approach and a conventional method based on Pearson's correlation. These profiles were then summarized using graph-theoretic methods characterizing their topological network properties. Standard clinical and laboratory assessments were performed, and a battery of neuropsychological (NP) tests was administered to all participating subjects. Based on NP testing, 14 of the seropositive subjects exhibited mild neurologic impairment. Subsequently, we analyzed associations between the network-derived measures and NP assessment scores, as well as common clinical laboratory plasma markers (CD4 cell count, HIV RNA), after adjusting for age and gender. Two graph-theoretic measures derived from mutual connectivity analysis, modularity and small-worldness, were significantly (p < 0.05, FDR-adjusted) associated with the executive and overall z-scores of NP performance. In contrast, network measures derived from conventional correlation-based connectivity did not yield any significant results. Thus, changes in connectivity can be captured using advanced time-series analysis techniques.
The demonstrated associations between imaging-derived graph-theoretic properties of brain networks and neuropsychological performance provide opportunities to further investigate the evolution of HAND in larger, longitudinal studies. Our analysis approach, which combines non-linear time-series analysis with graph theory, is promising and may prove useful not only in HAND but also in other neurodegenerative disorders.
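To make the conventional, correlation-based pipeline concrete: a Pearson-correlation connectivity matrix can be thresholded into a binary graph, and Newman's modularity $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$ scores a given partition of nodes into modules. The sketch below assumes an absolute-correlation threshold and a caller-supplied partition (real analyses usually optimize the partition, for example with the Louvain method); it does not implement the study's non-linear mutual connectivity analysis.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_graph(series: Sequence[Sequence[float]],
                      threshold: float) -> List[List[int]]:
    """Binary undirected adjacency: edge where |r| >= threshold, no self-loops."""
    n = len(series)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


def modularity(adj: Sequence[Sequence[int]], communities: Sequence[int]) -> float:
    """Newman modularity Q; communities[i] is the module id of node i."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # Observed edge minus the edge expected under a degree-preserving null.
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

For two disconnected triangles split into their natural modules this gives $Q = 0.5$; a partition that mixes the triangles scores lower, which is the sense in which higher modularity indicates more segregated network structure.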

YNIMG Journal 2018 Journal Article

First demonstration of in vivo mapping for regional brain monoacylglycerol lipase using PET with [11C]SAR127303

  • Tomoteru Yamasaki
  • Wakana Mori
  • Yiding Zhang
  • Akiko Hatori
  • Masayuki Fujinaga
  • Hidekatsu Wakizaka
  • Yusuke Kurihara
  • Lu Wang

Monoacylglycerol lipase (MAGL) is a main regulator of the endocannabinoid system within the central nervous system (CNS). Recently, [11C]SAR127303 was developed as a promising radioligand for MAGL imaging. In this study, we aimed to quantify regional MAGL concentrations in the rat brain using positron emission tomography (PET) with [11C]SAR127303. An irreversible two-tissue compartment model (2-TCMi, $k_4 = 0$) analysis was conducted to estimate quantitative parameters ($k_3$, $K_i^{\text{2-TCMi}}$, and $\lambda k_3$). These parameters were successfully obtained with high identifiability (COV < 10%) for the following regions, ranked from highest to lowest: cingulate cortex > striatum > hippocampus > thalamus > cerebellum > hypothalamus ≈ pons. In vitro autoradiographs using [11C]SAR127303 showed a heterogeneous distribution of radioactivity, as seen in the PET images. The $K_i^{\text{2-TCMi}}$ and $\lambda k_3$ values correlated relatively highly with in vitro binding (r > 0.4, P < 0.005). The $K_i^{\text{2-TCMi}}$ values showed high correlation with, and low underestimation (<10%) relative to, the slope of a Patlak plot analysis with linear regression ($K_i^{\text{Patlak}}$). In conclusion, we successfully estimated, for the first time, regional net uptake values of [11C]SAR127303 reflecting MAGL concentrations in rat brain regions.
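The Patlak comparison rests on the graphical relation $\frac{C_T(t)}{C_p(t)} = K_i \frac{\int_0^t C_p(\tau)\,d\tau}{C_p(t)} + V_0$, which holds at late times for an irreversibly trapped tracer; the Patlak estimate of $K_i$ is the slope of a straight-line fit. (Under the 2-TCMi model the net influx rate is $K_i = K_1 k_3 / (k_2 + k_3)$.) A minimal sketch, assuming sampled plasma ($C_p$) and tissue ($C_T$) curves on a shared time grid and trapezoidal integration; the `t_start` cutoff for the linear phase and the synthetic curves in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def cumulative_trapezoid(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Running integral of y over t, starting at 0, by the trapezoidal rule."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out


def patlak_fit(t: Sequence[float], cp: Sequence[float], ct: Sequence[float],
               t_start: float) -> Tuple[float, float]:
    """Return (Ki, V0) from an ordinary least-squares line on the Patlak plot.

    Only frames with t >= t_start (the assumed linear phase) are used.
    """
    integral = cumulative_trapezoid(t, cp)
    xs, ys = [], []
    for ti, cpi, cti, ii in zip(t, cp, ct, integral):
        if ti >= t_start and cpi > 0:
            xs.append(ii / cpi)   # "normalized time" on the Patlak x-axis
            ys.append(cti / cpi)  # tissue-to-plasma ratio on the y-axis
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

As a sanity check, a synthetic tissue curve built as `0.05 * integral + 0.3 * cp` returns a slope of 0.05 and an intercept of 0.3; with real data the fitted slope depends on choosing `t_start` after the plot has become linear.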

IJCAI Conference 2016 Conference Paper

Cost-Saving Effect of Crowdsourcing Learning

  • Lu Wang
  • Zhi-Hua Zhou

Crowdsourcing is widely adopted in many domains as a popular paradigm for outsourcing work to individuals. In the machine learning community, crowdsourcing is commonly used as a cost-saving way to collect labels for training data. While much effort has been spent on developing methods for inferring labels from a crowd, little work has focused on the theoretical foundation of crowdsourcing learning. In this paper, we theoretically study the cost-saving effect of crowdsourcing learning and present an upper bound on the minimally sufficient number of crowd labels for effective crowdsourcing learning. Our results provide insight into how to allocate crowd labels efficiently, and they are verified empirically.

AIJ Journal 2016 Journal Article

One-pass AUC optimization

  • Wei Gao
  • Lu Wang
  • Rong Jin
  • Shenghuo Zhu
  • Zhi-Hua Zhou

AUC is an important performance measure that has been used in diverse tasks, such as class-imbalanced learning, cost-sensitive learning and learning to rank. In this work, we focus on one-pass AUC optimization, which requires going through the training data only once without storing the entire training dataset. Conventional online learning algorithms cannot be applied directly to one-pass AUC optimization because AUC is measured by a sum of losses defined over pairs of instances from different classes. We develop a regression-based algorithm that only needs to maintain the first- and second-order statistics of the training data in memory, resulting in a storage requirement independent of the number of training examples. To efficiently handle high-dimensional data, we develop two deterministic algorithms that approximate the covariance matrices. We verify the effectiveness of the proposed algorithms both theoretically and empirically.
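Why first- and second-order statistics suffice can be seen with a pairwise square loss. For a new example $x_t$ with label $y_t \in \{+1, -1\}$, averaging $(1 - y_t (x_t - x_i)^\top w)^2$ over all earlier examples $x_i$ of the opposite class equals $(1 - y_t (x_t - c)^\top w)^2 + w^\top S w$, where $c$ and $S$ are that class's mean and population covariance (the cross term vanishes because deviations from the mean average to zero). Per-class running means and covariances therefore replace the stored pairs, using $O(d^2)$ memory regardless of stream length. The sketch below is a simplified, assumed reading of this idea (plain SGD with a hypothetical fixed step `eta` and regularizer `lam`, full covariances); it is not the paper's exact algorithm and omits its high-dimensional approximations.

```python
from typing import Dict, List, Sequence


class ClassStats:
    """Running mean and population covariance of one class (Welford update)."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x: Sequence[float]) -> None:
        self.n += 1
        delta = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, self.mean)]
        for a in range(len(x)):
            for b in range(len(x)):
                self.m2[a][b] += delta[a] * delta2[b]

    def covariance(self) -> List[List[float]]:
        return [[v / self.n for v in row] for row in self.m2]


def one_pass_auc(stream, dim: int, eta: float = 0.05, lam: float = 1e-3) -> List[float]:
    """Single pass over (x, y) pairs with y in {+1, -1}; returns a linear scorer w."""
    w = [0.0] * dim
    stats: Dict[int, ClassStats] = {+1: ClassStats(dim), -1: ClassStats(dim)}
    for x, y in stream:
        other = stats[-y]
        if other.n > 0:
            c, s = other.mean, other.covariance()
            diff = [xi - ci for xi, ci in zip(x, c)]
            margin = 1.0 - y * sum(di * wi for di, wi in zip(diff, w))
            # Gradient of 0.5*(1 - y*diff.w)^2 + 0.5*w^T S w + 0.5*lam*|w|^2.
            grad = [-y * di * margin + sum(s[a][b] * w[b] for b in range(dim)) + lam * w[a]
                    for a, di in enumerate(diff)]
            w = [wi - eta * gi for wi, gi in zip(w, grad)]
        stats[y].update(x)  # only O(d^2) state is kept, never past examples
    return w


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for p in scores_pos:
        for q in scores_neg:
            total += 1.0 if p > q else 0.5 if p == q else 0.0
    return total / (len(scores_pos) * len(scores_neg))
```

The `auc` helper is quadratic in the number of examples and is meant only for evaluating the learned scorer offline, not as part of the one-pass learner.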

AAAI Conference 2016 Conference Paper

Risk Minimization in the Presence of Label Noise

  • Wei Gao
  • Lu Wang
  • Yu-Feng Li
  • Zhi-Hua Zhou

Matrix concentration inequalities have attracted much attention in diverse applications such as linear algebra, statistical estimation and combinatorial optimization. In this paper, we present new Bernstein concentration inequalities that depend only on the first moments of random matrices, whereas previous Bernstein inequalities depend heavily on both the first and second moments. Based on these results, we analyze empirical risk minimization in the presence of label noise. We find that many popular losses used in risk minimization can be decomposed into two parts, where the first part is unaffected by noisy labels and only the second part is affected. We show that the influence of noisy labels on the second part can be reduced by our proposed LICS (Labeled Instance Centroid Smoothing) approach. The effectiveness of the LICS algorithm is justified both theoretically and empirically.
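The square loss gives a concrete instance of such a decomposition (used here as an assumed example; the paper covers a broader family of losses). Because $y^2 = 1$ for $y \in \{-1, +1\}$, $\frac{1}{n}\sum_i (1 - y_i w^\top x_i)^2 = 1 + \frac{1}{n}\sum_i (w^\top x_i)^2 - 2\, w^\top \mu$ with $\mu = \frac{1}{n}\sum_i y_i x_i$, so labels enter only through the labeled-instance centroid $\mu$. The sketch checks this identity numerically and shows that changing labels alters the loss only through $\mu$; it does not implement the paper's LICS smoothing itself.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def square_loss(w: Sequence[float], xs: Sequence[Sequence[float]],
                ys: Sequence[int]) -> float:
    """Direct empirical square loss: mean of (1 - y * w.x)^2."""
    return sum((1.0 - y * dot(w, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def centroid(xs: Sequence[Sequence[float]], ys: Sequence[int]) -> List[float]:
    """Labeled-instance centroid mu = (1/n) * sum_i y_i x_i."""
    n, d = len(xs), len(xs[0])
    return [sum(y * x[k] for x, y in zip(xs, ys)) / n for k in range(d)]


def decomposed_loss(w: Sequence[float], xs: Sequence[Sequence[float]],
                    ys: Sequence[int]) -> float:
    """Same loss split into a label-free part and a label-dependent part."""
    label_free = 1.0 + sum(dot(w, x) ** 2 for x in xs) / len(xs)
    label_dependent = -2.0 * dot(w, centroid(xs, ys))
    return label_free + label_dependent
```

Since noisy labels can only perturb `centroid(xs, ys)`, a method that estimates this centroid more robustly limits the damage label noise can do to the whole loss.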

JBHI Journal 2015 Journal Article

A Novel Classification Method for Prediction of Rectal Bleeding in Prostate Cancer Radiotherapy Based on a Semi-Nonnegative ICA of 3D Planned Dose Distributions

  • Julie Coloigner
  • Aureline Fargeas
  • Amar Kachenoura
  • Lu Wang
  • Gael Drean
  • Caroline Lafond
  • Lotfi Senhadji
  • Renaud de Crevoisier

Understanding dose/side-effect relationships in prostate cancer radiotherapy is crucial for defining appropriate individual constraints in therapy planning. Most existing methods for predicting side effects do not fully exploit the rich spatial information conveyed by three-dimensional planned dose distributions. We propose a new classification method for individual three-dimensional dose distributions, based on a new semi-nonnegative ICA algorithm, to identify patients at risk of rectal bleeding in a population treated for prostate cancer. The method first determines two bases of vectors from the population data; these bases span vector subspaces that characterize patients with and without rectal bleeding, respectively. Classification is then achieved by calculating the distance of a given patient to the two subspaces. The results, obtained on a cohort of 87 patients treated with radiotherapy (at two-year follow-up), showed high performance in terms of sensitivity and specificity.
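The final classification step, assigning a patient to whichever of the two subspaces is nearer, can be sketched as follows. The distance from a vector to a subspace is the norm of its residual after orthogonal projection; here the basis is first orthonormalized with Gram-Schmidt. The two bases are assumed to be given (in the paper they come from the semi-nonnegative ICA, which is not reproduced), and the flattened dose vectors and the output labels are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gram_schmidt(basis: Sequence[Vector], tol: float = 1e-12) -> List[List[float]]:
    """Orthonormal basis spanning the same subspace (dependent vectors dropped)."""
    ortho: List[List[float]] = []
    for v in basis:
        w = list(v)
        for q in ortho:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:
            ortho.append([a / norm for a in w])
    return ortho


def distance_to_subspace(x: Vector, basis: Sequence[Vector]) -> float:
    """Euclidean norm of x minus its orthogonal projection onto span(basis)."""
    residual = list(x)
    for q in gram_schmidt(basis):
        coeff = sum(a * b for a, b in zip(x, q))
        residual = [r - coeff * b for r, b in zip(residual, q)]
    return math.sqrt(sum(r * r for r in residual))


def classify(x: Vector, basis_bleeding: Sequence[Vector],
             basis_no_bleeding: Sequence[Vector]) -> str:
    """Assign x to the nearer of the two subspaces."""
    d_pos = distance_to_subspace(x, basis_bleeding)
    d_neg = distance_to_subspace(x, basis_no_bleeding)
    return "at risk" if d_pos < d_neg else "not at risk"
```

With real planned dose distributions, each 3D dose grid would be flattened (after spatial normalization) into one vector `x`; in that high-dimensional setting a numerically stabler QR factorization would typically replace plain Gram-Schmidt.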

ICRA Conference 2014 Conference Paper

Switching control of attitude tracking on a quadrotor UAV for large-angle rotational maneuvers

  • Lu Wang
  • Jianbo Su

This paper studies an attitude tracking control system for a quadrotor unmanned aerial vehicle (UAV) performing large-angle rotational maneuvers. We first establish an attitude error model that takes both external disturbances and internal uncertainties into account. We then propose a switching control strategy that addresses both high tracking accuracy and velocity constraints. Attitude tracking experiments confirm the higher control accuracy of the proposed method. Flights from an unknown initial attitude and flip maneuvers are also presented to verify the effectiveness of the method under large-angle rotational maneuvers.