Author name cluster

Yu Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

JBHI Journal 2026 Journal Article

Fog/Edge-Aware State Space Models for Multi-Task Chest X-ray Report Generation and Lesion Detection

Wenbin Feng
Yu Lu
Xiaoqing Li
Kai Leung Yung
Wai Hung Ip

Artificial intelligence (AI) is transforming radiology, particularly in automating medical report generation and abnormality detection. Although recent AI systems have shown clear potential in reducing radiologists' workloads and improving diagnostic accuracy, they still suffer from high computational cost and limited efficiency when modeling long-range dependencies. To address these challenges, we propose two state space model (SSM)-based frameworks: MambaXray-CTL for medical report generation, and MambaXray-MTL for unified report generation and abnormality localization. Both frameworks integrate a lightweight Mamba-based vision encoder with a large language model (LLM) decoder and incorporate multi-stage contrastive learning to align visual and textual representations. MambaXray-CTL achieves state-of-the-art performance on the IU X-ray and CheXpertPlus datasets while substantially reducing computational overhead compared with Vision Transformer models. MambaXray-MTL further extends this capability through a multi-task learning design that produces clinically coherent reports and accurately localizes abnormalities. Experimental results demonstrate the effectiveness of combining state space models with contrastive learning to deliver efficient, interpretable, and deployable AI solutions for chest radiograph analysis.

AAAI Conference 2025 Conference Paper

Domain Generalized Medical Landmark Detection via Robust Boundary-Aware Pre-Training

Haifan Gong
Yu Lu
Xiang Wan
Haofeng Li

In recent years, deep learning has driven substantial progress in automated medical landmark detection. Nonetheless, prevailing research in this field predominantly addresses single-center scenarios or domain adaptation settings. In practical environments, the acquisition of multi-center data faces privacy concerns, coupled with the time-intensive and costly nature of data collection and annotation. These challenges substantially impede the broader application of deep learning-based medical landmark detection. To mitigate these issues, we propose a novel domain-generalized medical landmark detection framework that relies solely on single-center data for training. Considering the availability of numerous public medical segmentation datasets, we design a simple yet effective method that utilizes single-center segmentation to enhance the domain generalization capabilities of the landmark detection task. Specifically, we introduce a novel boundary-aware pre-training approach to focus the model on regions pertinent to landmarks. To further enhance the robustness and generalization capabilities during pre-training, we derive a mixing loss term and prove its effectiveness in theory and practice. Extensive experiments conducted on our new domain generalization benchmark for medical landmark detection demonstrate the superiority of our approach.

NeurIPS Conference 2025 Conference Paper

Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Zechuan Zhang
Ji Xie
Yu Lu
Zongxin Yang
Yi Yang

Instruction-based image editing enables precise modifications via natural language prompts, but existing methods face a precision-efficiency tradeoff: fine-tuning demands massive datasets (>10M) and computational resources, while training-free approaches suffer from weak instruction comprehension. We address this by proposing ICEdit, which leverages the inherent comprehension and generation abilities of large-scale Diffusion Transformers (DiTs) through three key innovations: (1) an in-context editing paradigm without architectural modifications; (2) minimal parameter-efficient fine-tuning for quality improvement; (3) Early Filter Inference-Time Scaling, which uses VLMs to select high-quality noise samples for efficiency. Experiments show that ICEdit achieves state-of-the-art editing performance with only 0.1% of the training data and 1% of the trainable parameters compared to previous methods. Our approach establishes a new paradigm for balancing precision and efficiency in instructional image editing.

AAAI Conference 2025 Short Paper

Extended LSTMs for Knowledge Tracing: Peeking Inside the Black Box (Student Abstract)

Deliang Wang
Yu Lu
Gaowei Chen

This paper proposes extended Long Short-Term Memory (LSTM) networks for the knowledge tracing task and employs explainable AI methods to address interpretability issues. Specifically, we developed an extended LSTM-based model to automatically diagnose students' knowledge states. We then leveraged three interpreting methods—gradient sensitivity, gradient*input, and Deep SHAP—to explain the model's predictions by computing input contributions. The results demonstrate that the proposed model outperforms DKT, and the three methods effectively explain its predictions. Additionally, we identified three key insights into the model's working mechanisms.

NeurIPS Conference 2025 Conference Paper

FlexSelect: Flexible Token Selection for Efficient Long Video Understanding

Yunzhu Zhang
Yu Lu
Tianyi Wang
Fengyun Rao
Yi Yang
Linchao Zhu

Long-form video understanding poses a significant challenge for video large language models (VideoLLMs) due to prohibitively high computational and memory demands. In this paper, we propose FlexSelect, a flexible and efficient token selection strategy for processing long videos. FlexSelect identifies and retains the most semantically relevant content by leveraging cross-modal attention patterns from a reference transformer layer. It comprises two key components: (1) a training-free token ranking pipeline that leverages faithful cross-modal attention weights to estimate each video token's importance, and (2) a rank-supervised lightweight selector that is trained to replicate these rankings and filter redundant tokens. This generic approach can be seamlessly integrated into various VideoLLM architectures, such as LLaVA-Video, InternVL and Qwen-VL, serving as a plug-and-play module to extend their temporal context length. Empirically, FlexSelect delivers strong gains across multiple long-video benchmarks, including VideoMME, MLVU, LongVB, and LVBench. Moreover, it achieves significant speed-ups (e.g., up to 9× on a LLaVA-Video-7B model), highlighting FlexSelect's promise for efficient long-form video understanding. Project page: https://flexselect.github.io

JBHI Journal 2025 Journal Article

Non-invasive Detection of Adenoid Hypertrophy Using Deep Learning Based on Heart-Lung Sounds

Shengchang Xiao
Xueshuai Zhang
Yu Lu
Pengfei Ye
Yanfen Tang
Pengyuan Zhang
Yonghong Yan
Jun Tai

Adenoid hypertrophy is one of the most common upper respiratory tract disorders during childhood, leading to a range of symptoms such as nasal congestion, mouth breathing and obstructive sleep apnea. Current diagnostic methods, including computerized tomography scans and nasal endoscopy, are invasive or involve ionizing radiation, rendering them unsuitable for long-term assessments. To address these clinical challenges, this paper proposes a novel deep learning approach for the non-invasive detection of adenoid hypertrophy using heart-lung sounds. First, we established a heart-lung sound database with corresponding labels indicating adenoid size. Subsequently, we employed three different deep learning tasks to explore the association between heart-lung sounds and adenoid size: binary classification to distinguish between normal and abnormal cases, four-grade classification to assess the severity of adenoid hypertrophy, and regression models to predict the actual size of the adenoids. The experimental results demonstrate that the deep learning models can effectively predict the condition of adenoid hypertrophy based on heart-lung sounds. In resource-constrained clinical environments, the proposed methods for automatic detection of adenoid hypertrophy provide a simple and non-invasive approach, which can reduce healthcare costs and facilitate remote self-screening.

NeurIPS Conference 2024 Conference Paper

Automated Multi-level Preference for MLLMs

Mengxi Zhang
Wenhao Wu
Yu Lu
YuXin Song
Kang Rong
Huanjin Yao
Jianbo Zhao
Fanglong Liu

Current multimodal Large Language Models (MLLMs) suffer from "hallucination", occasionally generating responses that are not grounded in the input images. To tackle this challenge, one promising path is to utilize reinforcement learning from human feedback (RLHF), which steers MLLMs towards learning superior responses while avoiding inferior ones. We rethink the common practice of using binary preferences (i.e., superior, inferior), and find that adopting multi-level preferences (e.g., superior, medium, inferior) offers two benefits: 1) It narrows the gap between adjacent levels, thereby encouraging MLLMs to discern subtle differences. 2) It further integrates cross-level comparisons (beyond adjacent-level comparisons), thus providing a broader range of comparisons with hallucination examples. To verify our viewpoint, we present the Automated Multi-level Preference (AMP) framework for MLLMs. To facilitate this framework, we first develop an automated dataset generation pipeline that provides high-quality multi-level preference datasets without any human annotators. Furthermore, we design the Multi-level Direct Preference Optimization (MDPO) algorithm to robustly conduct complex multi-level preference learning. Additionally, we propose a new hallucination benchmark, MRHal-Bench. Extensive experiments across public hallucination and general benchmarks, as well as our MRHal-Bench, demonstrate the effectiveness of our proposed method. Code is available at https://github.com/takomc/amp.

NeurIPS Conference 2024 Conference Paper

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Yu Lu
Yuanzhi Liang
Linchao Zhu
Yi Yang

Video diffusion models have made substantial progress in various video generation applications. However, training models for long video generation tasks requires significant computational and data resources, posing a challenge to developing long video diffusion models. This paper investigates a straightforward and training-free approach to extend an existing short video diffusion model (e.g., pre-trained on 16-frame videos) for consistent long video generation (e.g., 128 frames). Our preliminary observation is that directly applying the short video diffusion model to generate long videos can lead to severe video quality degradation. Further investigation reveals that this degradation is primarily due to the distortion of high-frequency components in long videos, characterized by a decrease in spatial high-frequency components and an increase in temporal high-frequency components. Motivated by this, we propose a novel solution named FreeLong to balance the frequency distribution of long video features during the denoising process. FreeLong blends the low-frequency components of global video features, which encapsulate the entire video sequence, with the high-frequency components of local video features that focus on shorter subsequences of frames. This approach maintains global consistency while incorporating diverse and high-quality spatiotemporal details from local videos, enhancing both the consistency and fidelity of long video generation. We evaluated FreeLong on multiple base video diffusion models and observed significant improvements. Additionally, our method supports coherent multi-prompt generation, ensuring both visual coherence and seamless transitions between scenes. Our project page is at: https://yulu.net.cn/freelong.
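
The blending of low-frequency global content with high-frequency local content described in this abstract can be illustrated with a toy one-dimensional sketch. This is not the paper's implementation: the abstract describes blending video features during denoising, whereas the code below assumes a single 1-D temporal signal, a naive DFT, and a hard frequency cutoff. The names `dft`, `idft`, `blend_frequencies`, and `cutoff` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def blend_frequencies(global_feat, local_feat, cutoff):
    """Keep low frequencies from global_feat and high frequencies from local_feat.

    A frequency bin k counts as "low" when min(k, n - k) <= cutoff, which keeps
    the mask symmetric so the blended signal stays real-valued.
    """
    if len(global_feat) != len(local_feat):
        raise ValueError("both signals must have the same length")
    n = len(global_feat)
    g_spec, l_spec = dft(global_feat), dft(local_feat)
    mixed = [g_spec[k] if min(k, n - k) <= cutoff else l_spec[k] for k in range(n)]
    return [value.real for value in idft(mixed)]


# Assumed toy inputs: a smooth "global" signal and a rapidly alternating "local" one.
global_feat = [2.0] * 8
local_feat = [(-1.0) ** t for t in range(8)]
blended = blend_frequencies(global_feat, local_feat, cutoff=1)
# blended is roughly 3, 1, 3, 1, ...: constant level from global, alternation from local.
```

In this toy setting the constant level survives from the global signal while the fast alternation comes from the local one, mirroring the consistency-plus-detail intuition in the abstract.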

EAAI Journal 2024 Journal Article

Unmanned Aerial Vehicles anomaly detection model based on sensor information fusion and hybrid multimodal neural network

Hongli Deng
Yu Lu
Tao Yang
Ziyu Liu
JiangChuan Chen

The use of Unmanned Aerial Vehicles (UAVs) in various industries is increasing, which places higher requirements on UAV reliability. One way to ensure the safety of UAV flights is to detect anomalies during flight. However, traditional UAV anomaly detection models have some shortcomings. First, they fail to integrate data from multiple sensors across time and frequency domains, hampering the anomaly detection model's ability to accurately assess the UAV's status. Second, they apply the same prediction error loss to all classes, which results in excessive false positives in some key classes. Finally, most of them use unimodal classification models to process data from multiple heterogeneous sensors, which makes it difficult for the models to extract targeted features. This paper proposes a UAV anomaly detection model based on sensor information fusion and a hybrid multimodal neural network (IF-HMNN). First, facilitated by the newly devised Multi-source Heterogeneous UAV Sensor Information Alignment algorithm (MHSIA), IF-HMNN can realize information fusion from multiple sensors. Second, a class weight assignment mechanism is designed to increase IF-HMNN's focus on key classes. Finally, the neural networks of two modalities are trained separately according to different time-frequency domain features, and their classification outcomes are combined through a hybrid soft voting mechanism. Experimental results show that IF-HMNN achieves accuracies of 0.99, 0.9991, and 0.9967 on three datasets, respectively. The accuracy of the IF-HMNN model on the test set is about 2% to 3% higher than that of similar models. We will publish our code as well as the dataset here: https://github.com/FishLuYu/IF-HMNN.
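
The abstract does not specify how its hybrid soft voting mechanism works, so the following is only a minimal sketch of plain weighted soft voting between two classifiers, assuming each produces a class-probability list. The function name `soft_vote` and the parameter `weight_a` are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Combine two class-probability vectors by weighted averaging.

    Returns the index of the winning class and the combined probabilities.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    combined = [weight_a * pa + weight_b * pb for pa, pb in zip(prob_a, prob_b)]
    # Pick the class whose averaged probability is largest.
    label = max(range(len(combined)), key=combined.__getitem__)
    return label, combined


# Assumed example: a time-domain model and a frequency-domain model disagree.
label, combined = soft_vote([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
```

Unlike hard voting, which counts only each model's top class, soft voting lets a confident model outweigh a hesitant one.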

AAAI Conference 2021 System Paper

An Intelligent Assistant for Problem Behavior Management

Penghe Chen
Yu Lu
Jiefei Liu
Qi Xu

We design and implement an intelligent assistant, called PB-Advisor, to advise teachers and parents on students' problem behaviors. It utilizes a task-oriented dialogue system to identify the need deficiency underlying students' problem behaviors, and relies on a community question answering system to provide advice on typical problem behavior management. In addition, it provides various learning resources and illustrates the relations between influential factors on typical problem behaviors through data analysis. With PB-Advisor, teachers and parents without psychological expertise can easily find proper advice on students' problem behaviors.

AAAI Conference 2021 System Paper

RadarMath: An Intelligent Tutoring System for Math Education

Yu Lu
Yang Pian
Penghe Chen
Qinggang Meng
Yunbo Cao

We propose and implement a novel intelligent tutoring system, called RadarMath, to support intelligent and personalized learning for math education. The system provides services including automatic grading and personalized learning guidance. Specifically, two automatic grading models are designed to score text-answer and formula-answer questions, respectively. An education-oriented knowledge graph with the individual learner's knowledge state is used as the key tool for guiding the personalized learning process. The system demonstrates how the relevant AI techniques could be applied in today's intelligent tutoring systems.

EAAI Journal 2021 Journal Article

The object-oriented dynamic task assignment for unmanned surface vessels

Bin Du
Yu Lu
Xiaotong Cheng
Weidong Zhang
Xuesong Zou

This paper investigates the task assignment and guidance issues of unmanned surface vessel (USV) interception. When a USV formation is invaded by moving objects during its escort, the unmanned systems must assign defenders to prevent attackers from approaching the vulnerable target in antagonistic scenarios. This action requires efficient guidance and task assignment strategies. With this in mind, this paper presents Integral Proportional Navigation Guidance (IPNG) with the Tabu Dynamic Consensus-Based Auction Algorithm (TDCBAA) in a marine interception scenario. First, IPNG is introduced in the interception game considering the USV kinematic model, which can effectively reduce the individual interception time. Second, a new bidding function is designed for moving object interception with consideration of the attackers' types, positions and interception time. Finally, a TDCBAA is designed to solve the task assignment subproblem, resulting in a shorter overall interception time and a higher interception success rate. Simulations demonstrate that the proposed algorithm can optimize the allocation of defenders in real time and intercept the attackers more quickly than other classical algorithms, making it more suitable in situations where attackers are approaching from all directions.

AAAI Conference 2020 Conference Paper

Object Instance Mining for Weakly Supervised Object Detection

Chenhao Lin
Siwen Wang
Dongqi Xu
Yu Lu
Wayne Zhang

Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Existing approaches using multiple instance learning easily fall into local optima, because such a mechanism tends to learn from the most discriminative object in an image for each category. Therefore, these methods suffer from missing object instances, which degrades the performance of WSOD. To address this problem, this paper introduces an end-to-end object instance mining (OIM) framework for weakly supervised object detection. OIM attempts to detect all possible object instances existing in each image by introducing information propagation on the spatial and appearance graphs, without any additional annotations. During the iterative learning process, the less discriminative object instances from the same class can be gradually detected and utilized for training. In addition, we design an object instance reweighted loss to learn a larger portion of each object instance to further improve the performance. The experimental results on two publicly available databases, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach.

AIIM Journal 2020 Journal Article

Prediction of fetal weight at varying gestational age in the absence of ultrasound examination using ensemble learning

Yu Lu
Xianghua Fu
Fangxiong Chen
Kelvin K.L. Wong

Obstetric ultrasound examination of physiological parameters has been mainly used to estimate the fetal weight during pregnancy and baby weight before labour to monitor fetal growth and reduce prenatal morbidity and mortality. However, ultrasound estimation of fetal weight is subject to population differences, strict operating requirements for sonographers, and poor access to ultrasound in low-resource areas. Inaccurate estimations may lead to negative perinatal outcomes. This study aims to predict fetal weight at varying gestational age in the absence of ultrasound examination within a certain accuracy. We consider that machine learning can provide an accurate estimation for obstetricians alongside traditional clinical practices, as well as an efficient and effective support tool for pregnant women for self-monitoring. We present a robust methodology using a data set comprising 4212 intrapartum recordings. The cubic spline function is used to fit the curves of several key characteristics that are extracted from ultrasound reports. A number of simple and powerful machine learning algorithms are trained, and their performance is evaluated with real test data. We also propose a novel evaluation performance index called the intersection-over-union (IoU) for our study. The results are encouraging using an ensemble model consisting of Random Forest, XGBoost, and LightGBM algorithms. The experimental results show the IoU between the predicted range of fetal weight at any gestational age given by the ensemble model and that given by ultrasound. The machine learning based approach applied in our study is able to predict, with high accuracy, fetal weight at varying gestational age in the absence of ultrasound examination.
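
The abstract does not give the exact definition of its intersection-over-union index. A minimal sketch, assuming each predicted or ultrasound-derived weight range is a closed interval in grams, could look like this. The function name `interval_iou` and the example ranges are hypothetical.

```python
def interval_iou(range_a, range_b):
    """Intersection-over-union of two closed intervals given as (low, high)."""
    a_low, a_high = range_a
    b_low, b_high = range_b
    if a_low > a_high or b_low > b_high:
        raise ValueError("each range must satisfy low <= high")
    # Overlap length is zero when the intervals are disjoint.
    intersection = max(0.0, min(a_high, b_high) - max(a_low, b_low))
    union = (a_high - a_low) + (b_high - b_low) - intersection
    return intersection / union if union > 0 else 0.0


# Assumed example: a model range versus an ultrasound range, both in grams.
score = interval_iou((3000.0, 3400.0), (3200.0, 3600.0))
```

A score of 1 means the two ranges coincide and 0 means they do not overlap. In the assumed example the ranges share 200 g of a 600 g combined span, giving an IoU of one third.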

ICML Conference 2017 Conference Paper

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Lihong Li
Yu Lu
Dengyong Zhou

Contextual bandits are widely used in Internet services, from news recommendation to advertising and Web search. Generalized linear models (logistic regression in particular) have demonstrated stronger performance than linear models in many applications where rewards are binary. However, most theoretical analyses on contextual bandits so far are on linear bandits. In this work, we propose an upper confidence bound based algorithm for generalized linear contextual bandits, which achieves an $\tilde{O}(\sqrt{dT})$ regret over $T$ rounds with $d$-dimensional feature vectors. This regret matches the minimax lower bound, up to logarithmic terms, and improves on the best previous result by a $\sqrt{d}$ factor, assuming the number of arms is fixed. A key component in our analysis is to establish a new, sharp finite-sample confidence bound for maximum likelihood estimates in generalized linear models, which may be of independent interest. We also analyze a simpler upper confidence bound algorithm, which is useful in practice, and prove it to have optimal regret for certain cases.

IJCAI Conference 2016 Conference Paper

An Intelligent System for Taxi Service Monitoring, Analytics and Visualization

Yu Lu
Gim Guan Chua
Huayu Wu
Clement Shi Qi Ong

The fast advancement in sensor data acquisition and communication technology greatly facilitates the collection of data from taxis, and thus enables analysis of the citywide taxi service system. In this paper, we present a novel and practical system for taxi service monitoring, analytics and visualization. By utilizing both buffered streaming data and large-scale historical taxi data, the system focuses on wait time estimation (for both passengers and taxi drivers), citywide taxi pickup/dropoff hotspots, and taxi trip distributions. A three-dimensional (3D) visualization is designed for users to access the analytics results and understand the characteristics of the taxi service.

ICML Conference 2016 Conference Paper

Exact Exponent in Optimal Rates for Crowdsourcing

Chao Gao
Yu Lu
Dengyong Zhou

Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve an $\epsilon$ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
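
As a worked illustration of the sample size requirement stated in this abstract, the sketch below simply evaluates the bound: the number of workers m must be at least log(1/ε) divided by I(π). The function name `min_workers` and the example values of I(π) and ε are assumptions for illustration. The paper's result concerns error exponents, so this arithmetic should be read as a rough guide and not as a finite-sample guarantee.

```python
import math


def min_workers(chernoff_info, eps):
    """Smallest integer m satisfying m >= log(1 / eps) / chernoff_info."""
    if chernoff_info <= 0:
        raise ValueError("average Chernoff information must be positive")
    if not 0 < eps < 1:
        raise ValueError("target error eps must lie in (0, 1)")
    return math.ceil(math.log(1.0 / eps) / chernoff_info)


# Assumed example values: I(pi) = 0.1 and a target error of 1%.
m = min_workers(0.1, 0.01)
```

With these assumed values, log(100) is about 4.605, so the bound evaluates to about 46.05 and rounds up to 47 workers. Halving I(π) doubles the requirement, which reflects how weaker collective ability demands more workers.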

JMLR Journal 2016 Journal Article

Optimal Estimation and Completion of Matrices with Biclustering Structures

Chao Gao
Yu Lu
Zongming Ma
Harrison H. Zhou

Biclustering structures in data matrices were first formalized in a seminal paper by John Hartigan (Hartigan, 1972) where one seeks to cluster cases and variables simultaneously. Such structures are also prevalent in block modeling of networks. In this paper, we develop a theory for the estimation and completion of matrices with biclustering structures, where the data is a partially observed and noise contaminated matrix with a certain underlying biclustering structure. In particular, we show that a constrained least squares estimator achieves minimax rate-optimal performance in several of the most important scenarios. To this end, we derive unified high probability upper bounds for all sub-Gaussian data and also provide matching minimax lower bounds in both Gaussian and binary cases. Due to the close connection of graphon to stochastic block models, an immediate consequence of our general results is a minimax rate-optimal estimator for sparse graphons.
