Author name cluster

Lu Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

AIIM Journal 2026 Journal Article

Application research of dynamic chaotic sequence generation mechanism in pre-hospital emergency data encryption

Wei Han
Lu Lu
Jingtao Ma
Qin Li
Zhuang Li

Background and objectives: In the context of pre-hospital emergency care, the security of patients' physiological data has become increasingly important due to the widespread use of portable and wearable devices. This study aims to explore the application of dynamic chaotic sequence generation mechanisms for data encryption in pre-hospital emergency care. Methods: In this study, a chaotic encryption method is proposed in which the initial key is generated using characteristic waveforms and iterative counting, and the dynamic key update is performed using the time-varying properties of pulse-wave signals and the sensitivity of chaotic sequences. The algorithm can adapt its complexity to the required security level while managing energy consumption. The chaotic encryption scheme normalizes the pulse waveform and iteratively applies Tent and Logistic mappings to generate pseudo-random sequences. Results: The system was subjected to autocorrelation, SEN, and NIST (National Institute of Standards and Technology) stochasticity tests, as well as cryptographic security evaluations. The results of these tests demonstrate that the combined chaotic system effectively mitigates the cyclic effect and enhances stochasticity. The analysis revealed that the cryptographic performance of the 32-bit fixed-point system in image encryption is comparable to that of the floating-point system, ensuring high efficiency and security of encryption. Conclusions: This study highlights the potential of dynamic chaotic sequence generation for secure data transmission in emergency medical environments and paves the way for further exploration and optimization in real-time applications.
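The abstract above mentions iterating Tent and Logistic mappings to produce pseudo-random sequences. The following is only a minimal sketch of that general idea, not the authors' scheme: the way the two maps are composed, the parameters `mu` and `r`, the `burn_in` length, the byte quantization, and the use of a plain float `seed` in place of a key derived from a pulse waveform are all assumptions made for illustration.

```python
def tent(x, mu=1.99):
    """Tent map on [0, 1]; mu is an assumed parameter."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def logistic(x, r=3.99):
    """Logistic map on [0, 1]; r is an assumed parameter."""
    return r * x * (1.0 - x)


def keystream(seed, n, burn_in=100):
    """Return n keystream bytes from a seed strictly between 0 and 1."""
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie strictly between 0 and 1")
    x = seed
    out = bytearray()
    for i in range(burn_in + n):
        # Assumed coupling: apply the tent map, then the logistic map.
        x = logistic(tent(x))
        if i >= burn_in:
            # Quantize the state (always below 1) to a single byte.
            out.append(int(x * 256) % 256)
    return bytes(out)


def xor_cipher(data, seed):
    """XOR data with the keystream; applying it twice restores the input."""
    ks = keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))
```

Because XOR is its own inverse, calling `xor_cipher` a second time with the same `seed` recovers the original bytes. This toy construction has not been evaluated for security and should not be treated as a vetted cipher.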


EAAI Journal 2026 Journal Article

Learning multi-physical system on a unified manifold by collaboratively fused features

Linzheng Wang
Zituo Chen
Yaojun Li
Ruiqu Deng
Ruizhi Zhang
Yonghao Luo
Lu Lu
Sili Deng

Understanding and controlling the spatio-temporal dynamics of multi-physics systems is critical to both natural and industrial processes, yet partial and sparse measurements often impede accurate characterization. We propose the Collaboratively Fused Feature (CoFFe) framework, a self-supervised pre-training strategy that unifies disparate physical fields onto a low-dimensional manifold. CoFFe flexibly integrates a self-attention-based field encoder module with a DeepONet-based field decoder module, forming the backbone of an efficient multi-field sparse reconstruction workflow. By employing an iterative random sparse sampling pre-training strategy, CoFFe seamlessly adapts to dynamic meshes and irregular geometries, and circumvents the computational burden associated with full-order model processing. When fine-tuned for downstream tasks with only sparse sensors from partial physical fields, it reliably extracts the system-level feature, enabling the rapid and accurate reconstruction of all physical fields. We validate CoFFe on three complex multi-physics engineering problems, including the thermal degradation of fuel particles, the operation of an alkaline water electrolyzer, and combustion in a biomass grate furnace. Our results demonstrate that, whether addressing temporally confined signals, spatially restricted sensor layouts, or randomly distributed sparse measurements in irregular computational domains, pre-trained CoFFe consistently achieves stable and fast multi-field reconstructions. Moreover, its ability to integrate information from different physical fields leads to enhanced reconstruction accuracy even under sparse measurement conditions. Additionally, CoFFe’s robust sparse pattern recognition empowers diverse downstream tasks, including parameter inversion, sensor optimization, and few-shot learning for previously unseen variables. These capabilities demonstrate its transformative potential to advance intelligent industrial systems and reshape the analysis of multi-physics phenomena.


AAAI Conference 2026 Conference Paper

Q Cache: Visual Attention Is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model

Jiedong Zhuang
Lu Lu
Ming Dai
Rui Hu
Jian Chen
Qiang Liu
Haoji Hu

Multimodal large language models (MLLMs) are plagued by exorbitant inference costs attributable to the profusion of visual tokens within the vision encoder. The redundant visual tokens engender a substantial computational load and key-value (KV) cache footprint bottleneck. Existing approaches focus on token-wise optimization, leveraging diverse intricate token pruning techniques to eliminate non-crucial visual tokens. Nevertheless, these methods often unavoidably undermine the integrity of the KV cache, resulting in failures in long-text generation tasks. To this end, we conduct an in-depth investigation of the attention mechanism of the model from a new perspective, and discern that attention within more than half of all decode layers is semantically similar. Upon this finding, we contend that the attention in certain layers can be streamlined by inheriting the attention from their preceding layers. Consequently, we propose Lazy Attention, an efficient attention mechanism that enables cross-layer sharing of similar attention patterns. It ingeniously reduces layer-wise redundant computation in attention. In Lazy Attention, we develop a novel layer-shared cache, Q Cache, tailored for MLLMs, which facilitates the reuse of queries across adjacent layers. In particular, Q Cache is lightweight and fully compatible with existing inference frameworks, including Flash Attention and KV cache. Additionally, our method is highly flexible as it is orthogonal to existing token-wise techniques and can be deployed independently or combined with token pruning approaches. Empirical evaluations on multiple benchmarks demonstrate that our method can reduce KV cache usage by over 35% and achieve 1.5x throughput improvement, while sacrificing only approximately 1% of performance on various MLLMs. Compared with SOTA token-wise methods, our technique achieves superior accuracy preservation.


NeurIPS Conference 2025 Conference Paper

SALMONN-omni: A Standalone Speech LLM without Codec Injection for Full-duplex Conversation

Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
Jun Zhang
Guangzhi Sun
Lu Lu

In order to enable fluid and natural human-machine speech interaction, existing full-duplex conversational systems often adopt modular architectures with auxiliary components such as voice activity detectors, interrupters, conversation state predictors, or multiple LLMs. These systems, however, suffer from error accumulation across modules and struggle with key challenges such as context-dependent barge-in and echo cancellation. Recent approaches, most notably Moshi, simplify the pipeline by injecting audio codecs into the token space of a single LLM. However, such methods still incur significant performance degradation when operating on the speech rather than text modality. In this paper, we introduce SALMONN-omni, the first single, standalone full-duplex speech LLM that operates without audio codecs in its token space. It features a novel dynamic thinking mechanism within the LLM backbone, enabling the model to learn when to transition between speaking and listening states. Experiments on widely used benchmarks for spoken question answering and open-domain dialogue show that SALMONN-omni achieves at least 30% relative performance improvement over existing open-source full-duplex models and performs highly competitively to half-duplex and turn-based systems, despite using substantially less training data. Moreover, SALMONN-omni demonstrates strong performance in complex conversational scenarios, including turn-taking, backchanneling, echo cancellation and context-dependent barge-in, with further improvements achieved through reinforcement learning. Some demo conversations between user and SALMONN-omni are provided in the following repository: https://github.com/bytedance/SALMONN.


AAAI Conference 2025 Conference Paper

ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming

Jiedong Zhuang
Lu Lu
Ming Dai
Rui Hu
Jian Chen
Qiang Liu
Haoji Hu

Multimodal large language models (MLLMs) enhance their perceptual capabilities by integrating visual and textual information. However, processing the massive number of visual tokens incurs a significant computational cost. Existing analysis of the MLLM attention mechanisms remains shallow, leading to coarse-grain token pruning strategies that fail to effectively balance speed and accuracy. In this paper, we conduct a comprehensive investigation of MLLM attention mechanisms with LLaVA. We find that numerous visual tokens and partial attention computations are redundant during the decoding process. Based on this insight, we propose Spatial-Temporal Visual Token Trimming (ST3), a framework designed to accelerate MLLM inference without retraining. ST3 consists of two primary components: 1) Progressive Visual Token Pruning (PVTP), which eliminates inattentive visual tokens across layers, and 2) Visual Token Annealing (VTA), which dynamically reduces the number of visual tokens in each layer as the generated tokens grow. Together, these techniques deliver around 2x faster inference with only about 30% KV cache memory compared to the original LLaVA, while maintaining consistent performance across various datasets. Crucially, ST3 can be seamlessly integrated into existing pre-trained MLLMs, providing a plug-and-play solution for efficient inference.
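The abstract above describes eliminating inattentive visual tokens progressively across layers. As a rough sketch of what layer-by-layer pruning by attention score could look like, the snippet below keeps a fixed fraction of the highest-scoring tokens at each layer. The function names, the single `keep_ratio` shared by all layers, and the use of one scalar score per token are illustrative assumptions; the abstract does not specify ST3's actual selection criterion or schedule.

```python
def prune_visual_tokens(attn_scores, keep_ratio):
    """Return sorted indices of the highest-scoring tokens (at least one)."""
    k = max(1, int(len(attn_scores) * keep_ratio))
    ranked = sorted(range(len(attn_scores)),
                    key=lambda i: attn_scores[i], reverse=True)
    return sorted(ranked[:k])


def progressive_prune(per_layer_scores, keep_ratio):
    """Prune layer by layer; each row scores every original token.

    Returns, for each layer, the original token indices still kept.
    """
    kept = list(range(len(per_layer_scores[0])))
    history = []
    for scores in per_layer_scores:
        # Only tokens that survived earlier layers compete at this layer.
        local = prune_visual_tokens([scores[i] for i in kept], keep_ratio)
        kept = [kept[j] for j in local]
        history.append(kept)
    return history
```

A token dropped at an early layer is never reconsidered later, which is what makes the pruning progressive in this sketch.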


NeurIPS Conference 2024 Conference Paper

PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs

Zhongkai Hao
Jiachen Yao
Chang Su
Hang Su
Ziao Wang
Fanzhi Lu
Zeyu Xia
Yichi Zhang

While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. To the best of our knowledge, it is the largest benchmark with a diverse and comprehensive evaluation that will undoubtedly foster further research in PINNs.


NeurIPS Conference 2024 Conference Paper

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jun Zhang
Lu Lu
Yuxuan Wang
Haizhou Li

Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information. This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction. Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, including speech. Although these models can be adept at recognizing and analyzing speech, they often fall short of generating appropriate responses. We argue that this is due to the lack of principles on task definition and model development, which requires open-source datasets and metrics suitable for model evaluation. To bridge the gap, we present SD-Eval, a benchmark dataset aimed at multidimensional evaluation of spoken dialogue understanding and generation. SD-Eval focuses on paralinguistic and environmental information and includes 7,303 utterances, amounting to 8.76 hours of speech data. The data is aggregated from eight public datasets, representing four perspectives: emotion, accent, age, and background sound. To assess the SD-Eval benchmark dataset, we implement three different models and construct a training set following a process similar to that of SD-Eval. The training set contains 1,052.72 hours of speech data and 724.4k utterances. We also conduct a comprehensive evaluation using objective evaluation methods (e.g., BLEU and ROUGE), subjective evaluations and LLM-based metrics for the generated responses. Models conditioned with paralinguistic and environmental information outperform their counterparts in both objective and subjective measures. Moreover, experiments demonstrate that LLM-based metrics show a higher correlation with human evaluation compared to traditional metrics. We open-source SD-Eval at https://github.com/amphionspace/SD-Eval.


IJCAI Conference 2023 Conference Paper

AudioQR: Deep Neural Audio Watermarks For QR Code

Xinghua Qu
Xiang Yin
Pengfei Wei
Lu Lu
Zejun Ma

Image-based quick response (QR) code is frequently used, but creates barriers for visually impaired people. With the goal of "AI for good", this paper proposes AudioQR, a barrier-free QR coding mechanism for the visually impaired population via deep neural audio watermarks. Previous audio watermarking approaches are mainly based on handcrafted pipelines, which are less secure and difficult to apply in large-scale scenarios. In contrast, AudioQR is the first comprehensive end-to-end pipeline that hides watermarks in audio imperceptibly and robustly. To achieve this, we jointly train an encoder and decoder, where the encoder is structured as a concatenation of transposed convolutions and multi-receptive field fusion modules. Moreover, we customize the decoder training with a stochastic data augmentation chain to make the watermarked audio robust towards different audio distortions, such as environment background, room impulse response when playing through the air, music surrounding, and Gaussian noise. Experiment results indicate that AudioQR can efficiently hide arbitrary information into audio without introducing significant perceptible difference. Our code is available at https://github.com/xinghua-qu/AudioQR.


EAAI Journal 2023 Journal Article

Feature selection using a sinusoidal sequence combined with mutual information

Gaoteng Yuan
Lu Lu
Xiaofeng Zhou

Data classification is the most common task in machine learning, and feature selection is the key step in the classification task. Common feature selection methods mainly analyze the maximum correlation and minimum redundancy between feature factors and tags while ignoring the impact of the number of key features, which will inevitably lead to waste in subsequent classification training. To solve this problem, a feature selection algorithm (SSMI) based on the combination of sinusoidal sequences and mutual information is proposed. First, the mutual information between each feature and tag is calculated, and the interference information in high-dimensional data is removed according to the mutual information value. Second, a sine function is constructed, and sine ordering is carried out according to the mutual information value and feature mean value between different categories of the same feature. By adjusting the period and phase value of the sequence, the feature set with the largest difference is found, and the subset of key features is obtained. Finally, three machine learning classifiers (KNN, RF, SVM) are used to classify key feature subsets, and several feature selection algorithms (JMI, mRMR, CMIM, SFS, etc.) are compared to verify the advantages and disadvantages of different algorithms. Compared with other feature selection methods, the SSMI algorithm obtains the least number of key features, with an average reduction of 15 features. The average classification accuracy has been improved by 3% on the KNN classifier. On the HBV and SDHR datasets, the SSMI algorithm achieved classification accuracy of 81.26% and 83.12%, with sensitivity and specificity results of 76.28%, 87.39% and 68.14%, 86.11%, respectively. This shows that the SSMI algorithm can achieve higher classification accuracy with a smaller feature subset.
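The first step described in the abstract above, computing the mutual information between each feature and the tag and discarding low-information features, can be sketched for discrete data as follows. This covers only that filtering step and is not the SSMI algorithm itself: the sine-ordering stage is not reproduced, and the function names, the dictionary input format, and the `threshold` parameter are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y]).
        mi += p_xy * log2(count * n / (px[x] * py[y]))
    return mi


def select_by_mi(features, labels, threshold):
    """Return feature names with MI >= threshold, highest MI first."""
    scored = {name: mutual_information(col, labels)
              for name, col in features.items()}
    kept = [name for name, mi in scored.items() if mi >= threshold]
    return sorted(kept, key=lambda name: scored[name], reverse=True)
```

For continuous features, the values would first need to be discretized or a different estimator used; this sketch leaves that choice open.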


YNICL Journal 2020 Journal Article

Neural primacy of the dorsolateral prefrontal cortex in patients with obsessive-compulsive disorder

Hailong Li
Xinyu Hu
Yingxue Gao
Lingxiao Cao
Lianqing Zhang
Xuan Bu
Lu Lu
Yanlin Wang

The dorsolateral prefrontal cortex (DLPFC), a key structure in the executive system, has consistently emerged as a crucial element in the pathophysiology of obsessive-compulsive disorder (OCD). However, the neural primacy of the DLPFC remains elusive in this disorder. We investigated the causal interaction (measured by effective connectivity) between the DLPFC and the remaining brain areas using bivariate Granger causality analysis of resting-state fMRI collected from 88 medication-free OCD patients and 88 matched healthy controls. Additionally, we conducted seed-based functional connectivity (FC) analyses to identify network-level neural functional alterations using the bilateral DLPFC as seeds. OCD patients demonstrated reduced FC between the right DLPFC and right orbitofrontal cortex (OFC), and activity in the right OFC had an inhibitory effect on the right DLPFC. Additionally, we observed alterations in both feedforward and reciprocal influences between the inferior temporal gyrus (ITG) and the DLPFC in patients. Furthermore, activity in the cerebellum had an excitatory influence on the right DLPFC in OCD patients. These findings may help to elucidate the psychopathology of OCD by detailing the directional connectivity between the DLPFC and the rest of the brain, ultimately helping to identify regions that could serve as treatment targets in OCD.


ICRA Conference 2020 Conference Paper

Velocity Field based Active-Assistive Control for Upper Limb Rehabilitation Exoskeleton Robot

En-Yu Chia
Yi-Lian Chen
Tzu-Chieh Chien
Ming-Li Chiang
Li-Chen Fu
Jin-Shin Lai
Lu Lu

There are limitations of conventional active-assistive control for upper limb rehabilitation exoskeleton robots, such as: 1) prior time-dependent trajectories are generally required, 2) task-based rehabilitation exercise involving multi-joint motion is hard to implement, and 3) the assistive mechanism is normally so inflexible that the resulting exercise performed by the subjects becomes inefficient. In this paper, we propose a novel velocity field based active-assistive control system to address these issues. First, we design a Kalman filter based interactive torque observer to obtain subjects' active intention of motion. Next, a joint-position-dependent velocity field which can be automatically generated via the task motion pattern is proposed to provide the time-independent assistance to the subjects. We further propose a novel integration method that combines the active and assistive motions based on the performance and the involvement of subjects to guide them to perform the task more voluntarily and precisely. The experiment results show that both the execution time and the subjects' torque exertion are reduced while performing both given single joint tasks and task-oriented multi-joint tasks as compared with the related work in the literature. To sum up, the proposed system not only can efficiently retain subjects' active intention but also can assist them to accomplish the rehabilitation task more precisely.


YNICL Journal 2019 Journal Article

Characteristic alteration of subcortical nuclei shape in medication-free patients with obsessive-compulsive disorder

Lianqing Zhang
Xinyu Hu
Hailong Li
Lu Lu
Bin Li
Xiaoxiao Hu
Xuan Bu
Shi Tang

BACKGROUND: Subcortical nuclei are important components in the pathology model of obsessive-compulsive disorder (OCD), and subregions of these structures subserve different functions that may distinctively contribute to OCD symptoms. Exploration of the subregional-level profile of structural abnormalities of these nuclei is needed to develop a better understanding of the neural mechanism of OCD. METHODS: A total of 83 medication-free, non-comorbid OCD patients and 93 age- and sex-matched healthy controls were recruited, and high-resolution T1-weighted MR images were obtained for all participants. The volume and shape of the subcortical nuclei (including the nucleus accumbens, amygdala, caudate, pallidum, putamen and thalamus) were quantified and compared with an automated parcellation approach and vertex-wise shape analysis using FSL-FIRST software. Sex differences in these measurements were also explored with an exploratory subgroup analysis. RESULTS: Volumetric analysis showed no significant differences between patients and healthy control subjects. Relative to healthy control subjects, the OCD patients showed an expansion of the lateral amygdala (right hemisphere) and right pallidum. These deformities were associated with illness duration and symptom severity of OCD. Exploratory subgroup analysis by sex revealed amygdala deformity in male patients and caudate deformity in female patients. CONCLUSIONS: The lateral amygdala and the dorsal pallidum were associated with OCD. Neuroanatomic evidence of sexual dimorphism was also found in OCD. Our study not only provides deeper insight into how these structures contribute to OCD symptoms by revealing these subregional-level deformities but also suggests that gender effects may be important in OCD studies.
