Author name cluster

Jin Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

Qitao Tan
Jun Liu
Zheng Zhan
Caiwen Ding
Yanzhi Wang
Xiaolong Ma
Jaewoo Lee
Jin Lu

Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. Recently, zeroth-order (ZO) optimization has stood out as a promising memory-efficient training paradigm: it avoids backward passes and relies solely on forward passes for gradient estimation, making it attractive for resource-constrained scenarios. However, ZO methods lag far behind FO methods in both convergence speed and accuracy. To bridge the gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update patterns of FO and ZO optimization. Building on these findings and aiming to match the learning capacity of FO methods, we propose Divergence-driven Zeroth-Order (DiZO) optimization. DiZO conducts divergence-driven layer adaptation by incorporating projections into ZO updates, generating diverse-magnitude updates precisely scaled to the optimization needs of each layer. Our results demonstrate that DiZO significantly reduces the number of iterations needed for convergence without sacrificing throughput, cutting training GPU hours by up to 48% on various datasets. Moreover, DiZO consistently outperforms representative ZO baselines in fine-tuning RoBERTa-large, OPT-series, and Llama-series models on downstream tasks and, in some cases, even surpasses memory-intensive FO fine-tuning. Our code is released at https://github.com/Skilteee/DiZO.
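As a rough illustration of what "relying solely on forward passes for gradient estimation" can mean, the sketch below shows a generic two-point (SPSA-style) zeroth-order estimator on a toy quadratic loss. It is not DiZO and is not taken from the authors' repository: the function names `zo_gradient` and `quadratic`, the step size, and the perturbation scale `eps` are all assumptions chosen for illustration, and the layer-wise projection step described in the abstract is not modeled.

```python
import numpy as np


def zo_gradient(loss, theta, rng, eps=1e-3):
    """Generic two-point ZO estimate (hypothetical helper, not the DiZO method)."""
    # Sample a random Gaussian direction with the same shape as the parameters.
    z = rng.standard_normal(theta.shape)
    # Two forward evaluations give a finite-difference slope along z.
    slope = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    # Scale the direction by the slope; no backward pass is needed.
    return slope * z


def quadratic(theta):
    """Toy loss standing in for a model's forward pass; its true gradient is theta."""
    return 0.5 * float(theta @ theta)


rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 3.0])
initial_loss = quadratic(theta)

# Plain ZO-SGD loop: every update uses only forward evaluations of the loss.
for _ in range(200):
    theta = theta - 0.05 * zo_gradient(quadratic, theta, rng)

final_loss = quadratic(theta)
```

Because each step needs only two loss evaluations and a random direction, nothing has to be stored for backpropagation, which is the memory advantage the abstract refers to. The price is a noisy estimate, which is consistent with the slower convergence the abstract attributes to ZO methods.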

AAAI Conference 2024 Conference Paper

LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer

Yuxin Cao
Ziyu Zhao
Xi Xiao
Derui Wang
Minhui Xue
Jin Lu

Video recognition systems are vulnerable to adversarial examples. Recent studies show that style transfer-based and patch-based unrestricted perturbations can effectively improve attack efficiency. These attacks, however, face two main challenges: 1) Adding large stylized perturbations to all pixels reduces the naturalness of the video and such perturbations can be easily detected. 2) Patch-based video attacks are not extensible to targeted attacks due to the limited search space of reinforcement learning that has been widely used in video attacks recently. In this paper, we focus on the video black-box setting and propose a novel attack framework named LogoStyleFool by adding a stylized logo to the clean video. We separate the attack into three stages: style reference selection, reinforcement-learning-based logo style transfer, and perturbation optimization. We solve the first challenge by scaling down the perturbation range to a regional logo, while the second challenge is addressed by complementing an optimization stage after reinforcement learning. Experimental results substantiate the overall superiority of LogoStyleFool over three state-of-the-art patch-based attacks in terms of attack performance and semantic preservation. Meanwhile, LogoStyleFool still maintains its performance against two existing patch-based defense methods. We believe that our research is beneficial in increasing the attention of the security community to such subregional style transfer attacks.

ICRA Conference 2024 Conference Paper

STNet: Spatio-Temporal Fusion-Based Self-Attention for Slip Detection in Visuo-Tactile Sensors

Jin Lu
Bangyan Niu
Huan Ma
Jiafeng Zhu
Jingjing Ji

Slip detection plays a pivotal role in the dexterity of robotics, not only improving the reliability and precision of manipulations but also contributing to safety, efficiency, and adaptability. Deep learning-based slip detection algorithms commonly find it difficult to concentrate on key features when faced with dense 3D shape data obtained by visuo-tactile sensors. Data from noncontact locations can interfere with slip judgements, and ignoring interframe linkage can also lead to slip detection failure. In this paper, a new self-attention architecture based on spatio-temporal sequence fusion, STNet, is proposed to perform slip detection by allocating more attention to the object-sensor contact area when processing complex 3D shape data. A binocular visuo-tactile system (BVTS) is designed and fabricated for dataset construction. The entire 3D shape dataset contains 4 motion patterns: stationary, pressing, rolling and slipping. Self-attention architectures with and without the spatio-temporal sequence fusion mechanism (denoted as STNet and TemNet, respectively) are trained on the same dataset. The experiments show the validity of STNet, which can reach 98.91% slip detection accuracy. Meanwhile, the ablation studies confirm the effectiveness of the spatio-temporal sequence fusion mechanism.

EAAI Journal 2023 Journal Article

A new hybrid prediction model of COVID-19 daily new case data

Guohui Li
Jin Lu
Kang Chen
Hong Yang

With the emergence of new mutant coronavirus disease 2019 (COVID-19) strains such as Delta and Omicron, the number of infected people in various countries has reached a new high. Accurate prediction of the number of infected people is of far-reaching significance to epidemiological prevention in all countries of the world. In order to improve the prediction accuracy of COVID-19 daily new case data, a new hybrid prediction model of COVID-19 is proposed, which consists of four modules: decomposition, complexity judgment, prediction and error correction. Firstly, singular spectrum decomposition is used to decompose the COVID-19 data into singular spectrum components (SSC). Secondly, in the complexity judgment, the SSCs are divided into high-complexity SSC and low-complexity SSC by neural network estimation time entropy. Thirdly, an LSSVM improved by the GODLIKE optimization algorithm, named GLSSVM, is proposed to improve prediction accuracy. Then, each low-complexity SSC is predicted by ARIMA, each high-complexity SSC is predicted by GLSSVM, and the prediction error of each high-complexity SSC is also predicted by GLSSVM. Finally, the predicted results are combined and reconstructed. Simulation experiments in Japan, Germany and Russia show that the proposed model has the highest prediction accuracy and the lowest prediction error. The Diebold-Mariano (DM) test is introduced to evaluate the model comprehensively. Taking Japan as an example, compared with the ARIMA prediction model, the RMSE, average error and MAPE of the proposed model are reduced by 93.17%, 91.42% and 81.20%, respectively.

NeurIPS Conference 2023 Conference Paper

Mobilizing Personalized Federated Learning in Infrastructure-Less and Heterogeneous Environments via Random Walk Stochastic ADMM

Ziba Parsons
Fei Dou
Houyi Du
Zheng Song
Jin Lu

This paper explores the challenges of implementing Federated Learning (FL) in practical scenarios featuring isolated nodes with data heterogeneity, which can only be connected to the server through wireless links in an infrastructure-less environment. To overcome these challenges, we propose a novel mobilizing personalized FL approach, which aims to facilitate mobility and resilience. Specifically, we develop a novel optimization algorithm called Random Walk Stochastic Alternating Direction Method of Multipliers (RWSADMM). RWSADMM capitalizes on the server's random movement toward clients and formulates local proximity among their adjacent clients based on hard inequality constraints rather than requiring consensus updates or introducing bias via regularization methods. To mitigate the computational burden on the clients, an efficient stochastic solver of the approximated optimization problem is designed in RWSADMM, which provably converges to the stationary point almost surely in expectation. Our theoretical and empirical results demonstrate the provable fast convergence and substantial accuracy improvements achieved by RWSADMM compared to baseline methods, along with its benefits of reduced communication costs and enhanced scalability.

NeurIPS Conference 2023 Conference Paper

Polyhedron Attention Module: Learning Adaptive-order Interactions

Tan Zhu
Fei Dou
Xinyu Wang
Jin Lu
Jinbo Bi

Learning feature interactions can be the key for multivariate predictive modeling. ReLU-activated neural networks create piecewise linear prediction models, and other nonlinear activation functions lead to models with only high-order feature interactions. Recent methods incorporate candidate polynomial terms of fixed orders into deep learning, which is subject to the issue of combinatorial explosion, or learn the orders that are difficult to adapt to different regions of the feature space. We propose a Polyhedron Attention Module (PAM) to create piecewise polynomial models where the input space is split into polyhedrons which define the different pieces and on each piece the hyperplanes that define the polyhedron boundary multiply to form the interactive terms, resulting in interactions of adaptive order to each piece. PAM is interpretable to identify important interactions in predicting a target. Theoretic analysis shows that PAM has stronger expression capability than ReLU-activated networks. Extensive experimental results demonstrate the superior classification performance of PAM on massive datasets of the click-through rate prediction and PAM can learn meaningful interaction effects in a medical problem.

NeurIPS Conference 2016 Conference Paper

A Sparse Interactive Model for Matrix Completion with Side Information

Jin Lu
Guannan Liang
Jiangwen Sun
Jinbo Bi

Matrix completion methods can benefit from side information besides the partially observed matrix. The use of side features describing the row and column entities of a matrix has been shown to reduce the sample complexity for completing the matrix. We propose a novel sparse formulation that explicitly models the interaction between the row and column side features to approximate the matrix entries. Unlike early methods, this model does not require the low-rank condition on the model parameter matrix. We prove that when the side features can span the latent feature space of the matrix to be recovered, the number of observed entries needed for an exact recovery is $O(\log N)$ where $N$ is the size of the matrix. When the side features are corrupted latent features of the matrix with a small perturbation, our method can achieve an $\epsilon$-recovery with $O(\log N)$ sample complexity, and maintains an $O(N^{3/2})$ rate similar to classic methods with no side information. An efficient linearized Lagrangian algorithm is developed with a strong guarantee of convergence. Empirical results show that our approach outperforms three state-of-the-art methods both in simulations and on real-world datasets.

YNICL Journal 2016 Journal Article

Changes of grey matter volume in first-episode drug-naive adult major depressive disorder patients with different age-onset

Zonglin Shen
Yuqi Cheng
Shuran Yang
Nan Dai
Jing Ye
Xiaoyan Liu
Jin Lu
Na Li

OBJECTIVE: Little is known about the pathological mechanism of early adult onset depression (EOD) and later adult onset depression (LOD). We seek to determine whether grey matter volume (GMV) changes in EOD and LOD are different, which could also delineate EOD and LOD. METHODS: In the present study, 147 first-episode, drug-naive patients with major depressive disorder (MDD), aged between 18 and 45, were divided into two groups on the basis of age of MDD onset: the early adult onset group (age 18-29) and the later adult onset group (age 30-44). A total of 130 gender- and age-matched healthy controls (HC) were also divided into two groups, one matched to each patient group. Magnetic resonance imaging was conducted on all subjects. The voxel-based morphometry (VBM) approach was employed to analyze the images. RESULTS: Widespread abnormalities of GMV throughout parietal, temporal, limbic regions, occipital cortex and cerebellum were observed in MDD patients. Compared to young HC, reduced GMV in the right fusiform gyrus, right middle temporal gyrus and vermis III, and increased GMV in the right middle occipital gyrus, were seen in the EOD group. In contrast, relative to old HC, decreased GMV in the right hippocampus and increased GMV in the left middle temporal gyrus were observed in the LOD group. Compared to the LOD group, the EOD group had smaller GMV in the right posterior cingulate cortex. There was no significant correlation between GMV of the right posterior cingulate cortex and the depression rating scale score in the patient group. CONCLUSIONS: The GMV of brain areas related to mood regulation was decreased in first-episode, drug-naive adult patients with MDD. Adult patients with EOD and LOD exhibited different GMV changes relative to each age-matched comparison group, suggesting that depressed adult patients with different ages of onset might have different pathological mechanisms.