Author name cluster

Junhong Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

HeroFilter: Adaptive Spectral Graph Filter for Varying Heterophilic Relations

Shuaicheng Zhang
Haohui Wang
Junhong Lin
Xiaojie Guo
Yada Zhu
Si Zhang
Dongqi Fu
Dawei Zhou

Graph heterophily, where connected nodes have different labels, has attracted significant interest recently. Most existing works adopt a simplified approach - using low-pass filters for homophilic graphs and high-pass filters for heterophilic graphs. However, we discover that the relationship between graph heterophily and spectral filters is more complex - the optimal filter response varies across frequency components and does not follow a strict monotonic correlation with heterophily degree. This finding challenges conventional fixed filter designs and suggests the need for adaptive filtering to preserve expressiveness in graph embeddings. Formally, natural questions arise: Given a heterophilic graph $\mathcal{G}$, how and to what extent will the varying heterophily degree of $\mathcal{G}$ affect the performance of GNNs? How can we design adaptive filters to fit those varying heterophilic connections? Our theoretical analysis reveals that the average frequency response of GNNs and graph heterophily degree do not follow a strict monotonic correlation, necessitating adaptive graph filters to guarantee good generalization performance. Hence, we propose HeroFilter, a simple yet powerful GNN, which extracts information across the heterophily spectrum and combines salient representations through adaptive mixing. HeroFilter's superior performance achieves up to 9.2% accuracy improvement over leading baselines across homophilic and heterophilic graphs.

ICML Conference 2025 Conference Paper

LensLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu 0019
Tyler Cody
Dawei Zhou 0003

The proliferation of open-sourced Large Language Models (LLMs) and diverse downstream tasks necessitates efficient model selection, given the impracticality of fine-tuning all candidates due to computational constraints. Despite the recent advances in LLM selection, a fundamental research question largely remains nascent: how can we model the dynamic behaviors of LLMs during fine-tuning, thereby enhancing our understanding of their generalization performance across diverse downstream tasks? In this work, we propose a novel theoretical framework that provides a proper lens to assess the generalization capabilities of LLMs, thereby enabling accurate and efficient LLM selection for downstream applications. In particular, we first derive a PAC-Bayesian Generalization Bound that unveils fine-tuning dynamics of LLMs and then introduce LensLLM, a Neural Tangent Kernel (NTK)-based Rectified Scaling Model that enables accurate performance predictions across diverse tasks while maintaining computational efficiency. Extensive empirical results on 3 large-scale benchmarks demonstrate that our model achieves up to 91.1% accuracy and reduces up to 88.5% computational cost in LLM selection, outperforming 5 state-of-the-art methods. We open-source our proposed LensLLM model and corresponding results at LensLLM.io.

ICLR Conference 2025 Conference Paper

Reasoning of Large Language Models over Knowledge Graphs with Super-Relations

Song Wang 0013
Junhong Lin
Xiaojie Guo 0002
Julian Shun
Jundong Li
Yada Zhu

While large language models (LLMs) have made significant progress in processing and reasoning over knowledge graphs, current methods suffer from a high non-retrieval rate. This limitation reduces the accuracy of answering questions based on these graphs. Our analysis reveals that the combination of greedy search and forward reasoning is a major contributor to this issue. To overcome these challenges, we introduce the concept of super-relations, which enables both forward and backward reasoning by summarizing and connecting various relational paths within the graph. This holistic approach not only expands the search space, but also significantly improves retrieval efficiency. In this paper, we propose the ReKnoS framework, which aims to Reason over Knowledge Graphs with Super-Relations. Our framework’s key advantages include the inclusion of multiple relation paths through super-relations, enhanced forward and backward reasoning capabilities, and increased efficiency in querying LLMs. These enhancements collectively lead to a substantial improvement in the successful retrieval rate and overall reasoning performance. We conduct extensive experiments on a variety of datasets to evaluate ReKnoS, and the results demonstrate the superior performance of ReKnoS over existing state-of-the-art baselines, with an average accuracy gain of 2.92% across nine real-world datasets.

NeurIPS Conference 2025 Conference Paper

Theoretical Investigation of Adafactor for Non-Convex Smooth Optimization

Yusu Hong
Junhong Lin

Adafactor is an early memory-efficient optimization algorithm proposed as an alternative to Adam. By eliminating first-order momentum and employing a rank-$1$ matrix factorization to approximate the second-moment matrix, Adafactor achieves near-zero memory overhead compared to traditional gradient descent methods. Despite its practical suitability for large-scale training tasks where memory efficiency is critical, its theoretical convergence analysis remains unexplored, largely due to the challenges posed by its matrix factorization and update clipping mechanisms. In this work, we provide a convergence analysis of Adafactor for non-convex smooth optimization. We establish optimal convergence rates (up to logarithmic factors) for finding stationary points in both deterministic and stochastic settings, the latter under sub-Gaussian noise. Central to our analysis is viewing Adafactor as an approximation of Adam, and the use of a new proxy step-size to approximate the unique adaptive step-size induced by Adafactor's matrix factorization and update clipping, along with an induction argument to control the gradient magnitude. Our findings may theoretically suggest that involving rank-$1$ matrix approximation of the second-moment matrix in Adam does not fundamentally hinder the convergence.
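The rank-1 factorization mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly described Adafactor idea of keeping only the row sums and column sums of the squared-gradient matrix and rebuilding an approximation from their outer product. The function names are hypothetical, and the sketch omits the moving averages and update clipping that the paper's analysis covers.

```python
def row_and_column_sums(squared_grad):
    """Return the row sums and column sums of a nonnegative matrix.

    For an n x m matrix this stores n + m numbers instead of n * m.
    """
    row_sums = [sum(row) for row in squared_grad]
    col_sums = [sum(col) for col in zip(*squared_grad)]
    return row_sums, col_sums


def rank_one_second_moment(row_sums, col_sums):
    """Rebuild a rank-1 approximation from the stored statistics.

    Entry (i, j) is row_sums[i] * col_sums[j] / total, where total is the
    sum of all matrix entries. Assumes total > 0.
    """
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


# Illustrative squared-gradient matrix (hypothetical values, exactly rank 1).
squared_grad = [[3.0, 4.0, 5.0],
                [6.0, 8.0, 10.0]]
rows, cols = row_and_column_sums(squared_grad)
approx = rank_one_second_moment(rows, cols)
```

When the input matrix is itself rank 1, as in this example, the reconstruction recovers it; for general matrices it is only an approximation.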

NeurIPS Conference 2024 Conference Paper

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

Yusu Hong
Junhong Lin

In this paper, we study Adam in non-convex smooth scenarios with potential unbounded gradients and affine variance noise. We consider a general noise model which governs affine variance noise, bounded noise, and sub-Gaussian noise. We show that Adam with a specific hyper-parameter setup can find a stationary point at a $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate in high probability under this general noise model, where $T$ denotes the total number of iterations, matching the lower rate of stochastic first-order algorithms up to logarithmic factors. We also provide a probabilistic convergence result for Adam under a generalized smooth condition which allows unbounded smoothness parameters and has been illustrated empirically to capture the smooth property of many practical objective functions more accurately.
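For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of one Adam update in its widely used textbook form with bias correction. The default hyper-parameters below are the common defaults, not the "specific hyper-parameter setup" the abstract refers to, which is not given on this page; the function name is hypothetical.

```python
import math


def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to parameter list x and return (x, m, v).

    m and v are the running first- and second-moment estimates;
    t is the 1-based iteration counter used for bias correction.
    """
    new_m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    new_v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in new_m]
    v_hat = [vi / (1 - beta2 ** t) for vi in new_v]
    new_x = [
        xi - lr * mh / (math.sqrt(vh) + eps)
        for xi, mh, vh in zip(x, m_hat, v_hat)
    ]
    return new_x, new_m, new_v


# First step from zero state: each coordinate moves by roughly lr in the
# direction opposite to the sign of its gradient.
x1, m1, v1 = adam_step([0.0, 0.0], [1.0, -2.0], [0.0, 0.0], [0.0, 0.0], t=1, lr=0.1)
```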

UAI Conference 2024 Conference Paper

Revisiting Convergence of AdaGrad with Relaxed Assumptions

Yusu Hong
Junhong Lin

In this study, we revisit the convergence of AdaGrad with momentum (covering AdaGrad as a special case) on non-convex smooth optimization problems. We consider a general noise model where the noise magnitude is controlled by the function value gap together with the gradient magnitude. This model encompasses a broad range of noises including bounded noise, sub-Gaussian noise, affine variance noise and the expected smoothness, and it has been shown to be more realistic in many practical applications. Our analysis yields a probabilistic convergence rate which, under the general noise, could reach $\tilde{\mathcal{O}}(1/\sqrt{T})$. This rate does not rely on prior knowledge of problem parameters and could accelerate to $\tilde{\mathcal{O}}(1/T)$, where $T$ denotes the total number of iterations, when the noise parameters related to the function value gap and noise level are sufficiently small. The convergence rate thus matches the lower rate for stochastic first-order methods over non-convex smooth landscape up to logarithmic terms [Arjevani et al., 2023]. We further derive a convergence bound for AdaGrad with momentum, considering the generalized smoothness where the local smoothness is controlled by a first-order function of the gradient norm.

JMLR Journal 2020 Journal Article

Convergences of Regularized Algorithms and Stochastic Gradient Methods with Random Projections

Junhong Lin
Volkan Cevher

We study the least-squares regression problem over a Hilbert space, covering nonparametric regression over a reproducing kernel Hilbert space as a special case. We first investigate regularized algorithms adapted to a projection operator on a closed subspace of the Hilbert space. We prove convergence results with respect to variants of norms, under a capacity assumption on the hypothesis space and a regularity condition on the target function. As a result, we obtain optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. As a byproduct, we obtain similar results for Nyström regularized algorithms. Our results provide optimal, distribution-dependent rates that do not have any saturation effect for sketched/Nyström regularized algorithms, considering both the attainable and non-attainable cases, in the well-conditioned regimes. We then study stochastic gradient methods with projection over the subspace, allowing multi-pass over the data and minibatches, and we derive similar optimal statistical convergence results.

JMLR Journal 2020 Journal Article

Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms

Junhong Lin
Volkan Cevher

We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We first investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds (up to logarithmic factor) can be retained for distributed SGM provided that the partition level is not too large. We then extend our results to spectral algorithms (SA), including kernel ridge regression (KRR), kernel principal component regression, and gradient methods. Our results show that distributed SGM has a smaller theoretical computational complexity, compared with distributed KRR and classic SGM. Moreover, even for a general non-distributed SA, they provide optimal, capacity-dependent convergence rates, for the case that the regression function may not be in the RKHS in the well-conditioned regimes.

ICML Conference 2018 Conference Paper

Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods

Junhong Lin
Volkan Cevher

We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not too large. Our results are superior to the state-of-the-art theory, covering the cases that the regression function may not be in the hypothesis spaces. Particularly, our results show that distributed SGM has a smaller theoretical computational complexity, compared with distributed kernel ridge regression (KRR) and classic SGM.

ICML Conference 2018 Conference Paper

Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces

Junhong Lin
Volkan Cevher

We investigate regularized algorithms combined with projection for the least-squares regression problem over a Hilbert space, covering nonparametric regression over a reproducing kernel Hilbert space. We prove convergence results with respect to variants of norms, under a capacity assumption on the hypothesis space and a regularity condition on the target function. As a result, we obtain optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. As a byproduct, we obtain similar results for Nyström regularized algorithms. Our results provide optimal, distribution-dependent rates for sketched/Nyström regularized algorithms, considering both the attainable and non-attainable cases.

JMLR Journal 2017 Journal Article

Optimal Rates for Multi-pass Stochastic Gradient Methods

Junhong Lin
Lorenzo Rosasco

We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. We study how regularization properties are controlled by the step-size, the number of passes and the mini-batch size. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases. As a byproduct, we derive optimal convergence results for batch gradient methods (even in the non-attainable cases).
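The setting in this abstract can be sketched as follows: a minimal multi-pass, mini-batch stochastic gradient method for the square loss with a linear model, where the number of passes is the quantity one would tune for early stopping. This is an illustrative sketch under simplifying assumptions, not the paper's exact scheme: it reshuffles the data once per pass, works in a finite-dimensional space rather than a general Hilbert space, and all function names and the synthetic data are hypothetical.

```python
import random


def multipass_sgm(xs, ys, passes, batch_size, step_size, seed=0):
    """Run mini-batch SGD on the square loss for a fixed number of passes."""
    rng = random.Random(seed)
    dim = len(xs[0])
    w = [0.0] * dim
    indices = list(range(len(xs)))
    for _ in range(passes):
        rng.shuffle(indices)  # assumption: one reshuffle per pass
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad = [0.0] * dim
            for i in batch:
                residual = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
                for j in range(dim):
                    grad[j] += residual * xs[i][j] / len(batch)
            w = [wj - step_size * gj for wj, gj in zip(w, grad)]
    return w


def mean_squared_error(w, xs, ys):
    """Average squared residual of the linear predictor w on (xs, ys)."""
    errors = [(sum(wj * xj for wj, xj in zip(w, x)) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def make_linear_data(n, noise=0.0, seed=1):
    """Synthetic 2-D data with target weights (1, -2); purely illustrative."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    ys = [x[0] - 2.0 * x[1] + rng.gauss(0.0, noise) for x in xs]
    return xs, ys
```

In practice one would evaluate `mean_squared_error` on held-out data after each pass and stop at the pass count that minimizes it; with noisy targets, running more passes is not always better.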

ICML Conference 2016 Conference Paper

Generalization Properties and Implicit Regularization for Multiple Passes SGM

Junhong Lin
Raffaello Camoriano
Lorenzo Rosasco

We study the generalization properties of stochastic gradient methods for learning with convex loss functions and linearly parameterized functions. We show that, in the absence of penalizations or constraints, the stability and approximation properties of the algorithm can be controlled by tuning either the step-size or the number of passes over the data. In this view, these parameters can be seen to control a form of implicit regularization. Numerical results complement the theoretical findings.

JMLR Journal 2016 Journal Article

Iterative Regularization for Learning with Convex Loss Functions

Junhong Lin
Lorenzo Rosasco
Ding-Xuan Zhou

We consider the problem of supervised learning with convex loss functions and propose a new form of iterative regularization based on the subgradient method. Unlike other regularization approaches, in iterative regularization no constraint or penalization is considered, and generalization is achieved by (early) stopping an empirical iteration. We consider a nonparametric setting, in the framework of reproducing kernel Hilbert spaces, and prove consistency and finite sample bounds on the excess risk under general regularity conditions. Our study provides a new class of efficient regularized learning algorithms and gives insights on the interplay between statistics and optimization in machine learning.

NeurIPS Conference 2016 Conference Paper

Optimal Learning for Multi-pass Stochastic Gradient Methods

Junhong Lin
Lorenzo Rosasco

We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases.

JMLR Journal 2015 Journal Article

Learning Theory of Randomized Kaczmarz Algorithm

Junhong Lin
Ding-Xuan Zhou

A relaxed randomized Kaczmarz algorithm is investigated in a least squares regression setting by a learning theory approach. When the sampling values are accurate and the regression function (conditional means) is linear, such an algorithm has been well studied in the community of non-uniform sampling. In this paper, we are mainly interested in the different case of either noisy random measurements or a nonlinear regression function. In this case, we show that relaxation is needed. A necessary and sufficient condition on the sequence of relaxation parameters or step sizes for the convergence of the algorithm in expectation is presented. Moreover, polynomial rates of convergence, both in expectation and in probability, are provided explicitly. As a result, the almost sure convergence of the algorithm is proved by applying the Borel-Cantelli Lemma.
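A minimal sketch of a relaxed randomized Kaczmarz iteration may help make the abstract concrete. It assumes one common form of the update, in which the iterate moves a fraction `eta_t` of the way toward the hyperplane defined by a randomly drawn sample; with `eta_t = 1` this is the classical projection step. Uniform sampling, the function name, and the example system are assumptions for illustration and may differ from the paper's exact setup.

```python
import random


def relaxed_randomized_kaczmarz(rows, targets, steps, relaxation, seed=0):
    """Iterate w <- w + eta_t * (y - <w, x>) * x / ||x||^2 on random samples.

    relaxation is a function mapping the 1-based step index t to eta_t.
    """
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(rows))  # assumption: uniform sampling of rows
        x, y = rows[i], targets[i]
        norm_sq = sum(xj * xj for xj in x)
        residual = y - sum(wj * xj for wj, xj in zip(w, x))
        eta = relaxation(t)
        w = [wj + eta * residual * xj / norm_sq for wj, xj in zip(w, x)]
    return w


# Noise-free, consistent system whose solution is (1, -1); eta_t = 1 suffices here.
rows = [[2.0, 1.0], [1.0, 3.0]]
targets = [1.0, -2.0]
solution = relaxed_randomized_kaczmarz(rows, targets, steps=200, relaxation=lambda t: 1.0)
```

For noisy measurements, one would pass a decaying sequence instead, for example `relaxation=lambda t: 1.0 / t ** 0.5`; that particular choice is only an example of the kind of step-size sequence the abstract discusses.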
