Author name cluster

Junfan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

DOTA: Distributional Test-time Adaptation of Vision-Language Models

Zongbo Han
Jialong Yang
Guangyu Wang
Junfan Li
Qianli Xu
Mike Zheng Shou
Changqing Zhang

Vision-language foundation models (VLMs), such as CLIP, exhibit remarkable performance across a wide range of tasks. However, deploying these models can be unreliable when significant distribution gaps exist between training and test data, while fine-tuning for diverse scenarios is often costly. Cache-based test-time adapters offer an efficient alternative by storing representative test samples to guide subsequent classifications. Yet, these methods typically employ naive cache management with limited capacity, leading to severe catastrophic forgetting when samples are inevitably dropped during updates. In this paper, we propose DOTA (DistributiOnal Test-time Adaptation), a simple yet effective method addressing this limitation. Crucially, instead of merely memorizing individual test samples, DOTA continuously estimates the underlying distribution of the test data stream. Test-time posterior probabilities are then computed using these dynamically estimated distributions via Bayes' theorem for adaptation. This distribution-centric approach enables the model to continually learn and adapt to the deployment environment. Extensive experiments validate that DOTA significantly mitigates forgetting and achieves state-of-the-art performance compared to existing methods.
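The idea of estimating class distributions from the test stream and classifying with Bayes' theorem can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not name a distribution family, so the diagonal Gaussian class-conditionals, the class name `StreamingGaussianClassifier`, the `var_floor` parameter and the smoothed prior are all assumptions made for illustration.

```python
import math


class StreamingGaussianClassifier:
    """Hypothetical sketch: per-class diagonal Gaussians estimated online."""

    def __init__(self, num_classes, dim, var_floor=1e-3):
        self.weight = [0.0] * num_classes                     # soft sample counts
        self.mean = [[0.0] * dim for _ in range(num_classes)]
        self.m2 = [[0.0] * dim for _ in range(num_classes)]   # weighted squared deviations
        self.var_floor = var_floor                            # assumed lower bound on variance

    def update(self, x, probs):
        """Fold feature x into each class estimate, weighted by probs[c]."""
        for c, w in enumerate(probs):
            if w <= 0.0:
                continue
            self.weight[c] += w
            for j, xj in enumerate(x):
                delta = xj - self.mean[c][j]
                self.mean[c][j] += (w / self.weight[c]) * delta
                self.m2[c][j] += w * delta * (xj - self.mean[c][j])

    def _variance(self, c, j):
        if self.weight[c] <= 0.0:
            return 1.0  # assumed default before any sample is seen
        return max(self.m2[c][j] / self.weight[c], self.var_floor)

    def posterior(self, x):
        """Return p(class | x) via Bayes' theorem with the current estimates."""
        num_classes = len(self.weight)
        total = sum(self.weight)
        log_joint = []
        for c in range(num_classes):
            prior = (self.weight[c] + 1.0) / (total + num_classes)  # smoothed prior
            log_lik = 0.0
            for j, xj in enumerate(x):
                var = self._variance(c, j)
                log_lik -= 0.5 * (math.log(2.0 * math.pi * var)
                                  + (xj - self.mean[c][j]) ** 2 / var)
            log_joint.append(math.log(prior) + log_lik)
        top = max(log_joint)                       # subtract the max for stability
        exps = [math.exp(v - top) for v in log_joint]
        norm = sum(exps)
        return [e / norm for e in exps]
```

In a test-time setting one might call `update(x, probs)` with the base model's predicted probabilities for each incoming feature and then combine `posterior(x)` with those predictions. Because only running statistics are stored, no individual sample has to be evicted from a cache.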


AAAI Conference 2024 Conference Paper

Ahpatron: A New Budgeted Online Kernel Learning Machine with Tighter Mistake Bound

Yun Liao
Junfan Li
Shizhong Liao
Qinghua Hu
Jianwu Dang

In this paper, we study the mistake bound of online kernel learning on a budget. We propose a new budgeted online kernel learning model, called Ahpatron, which significantly improves the mistake bound of previous work and resolves an open problem related to upper bounds of hypothesis space constraints. We first present an aggressive variant of Perceptron, named AVP, a model without a budget, which uses an active updating rule. Then we design a new budget maintenance mechanism, which removes half of the examples and projects the removed examples onto the hypothesis space spanned by the remaining examples. Ahpatron adopts this mechanism to approximate AVP. Theoretical analyses prove that Ahpatron has tighter mistake bounds, and experimental results show that Ahpatron outperforms the state-of-the-art algorithms on the same or a smaller budget.
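The budget maintenance step described above (drop half of the examples, then project the dropped part onto the span of the rest) can be sketched for a kernel expansion f = Σ a_i k(x_i, ·). This is a minimal illustration, not the paper's algorithm. The rule for choosing which half to drop (smallest |a_i|), the RBF kernel, the `ridge` term and all function names are assumptions.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian kernel, assumed here for illustration."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def halve_and_project(points, coeffs, kernel=rbf, ridge=1e-8):
    """Keep half of the support set and absorb the rest by projection."""
    # Assumed selection rule: keep the examples with the largest |coefficient|.
    order = sorted(range(len(points)), key=lambda i: abs(coeffs[i]), reverse=True)
    n_keep = (len(points) + 1) // 2
    keep, drop = sorted(order[:n_keep]), order[n_keep:]
    # Projection coefficients solve K_keep,keep * beta = K_keep,drop * a_drop.
    gram = [[kernel(points[i], points[j]) + (ridge if i == j else 0.0)
             for j in keep] for i in keep]
    rhs = [sum(coeffs[r] * kernel(points[i], points[r]) for r in drop)
           for i in keep]
    beta = solve(gram, rhs)
    new_points = [points[i] for i in keep]
    new_coeffs = [coeffs[i] + b for i, b in zip(keep, beta)]
    return new_points, new_coeffs
```

When the dropped examples lie in the span of the kept ones (for instance, exact duplicates), the projected hypothesis is essentially unchanged. Otherwise the projection is the closest approximation within that span in the RKHS norm.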


NeurIPS Conference 2024 Conference Paper

On the Necessity of Collaboration for Online Model Selection with Decentralized Data

Junfan Li
Zheshun Wu
Zenglin Xu
Irwin King

We consider online model selection with decentralized data over $M$ clients, and study the necessity of collaboration among clients. Previous work proposed various federated algorithms without demonstrating their necessity, while we answer the question from a novel perspective of computational constraints. We prove lower bounds on the regret, propose a federated algorithm, and analyze its upper bound. Our results show that (i) collaboration is unnecessary in the absence of computational constraints on clients, and (ii) collaboration is necessary if the computational cost on each client is limited to $o(K)$, where $K$ is the number of candidate hypothesis spaces. We clarify the unnecessary nature of collaboration in previous federated algorithms for distributed online multi-kernel learning, and improve the regret bounds at a smaller computational and communication cost. Our algorithm relies on three new techniques, namely an improved Bernstein's inequality for martingales, a federated online mirror descent framework, and the decoupling of model selection and prediction, which might be of independent interest.


AAAI Conference 2023 Conference Paper

Improved Kernel Alignment Regret Bound for Online Kernel Learning

Junfan Li
Shizhong Liao

In this paper, we improve the kernel alignment regret bound for online kernel learning in the regime of the hinge loss function. The previous algorithm achieves a regret of $O((A_T T \ln T)^{1/4})$ at a computational complexity (space and per-round time) of $O((A_T T \ln T)^{1/2})$, where $A_T$ is called kernel alignment. We propose an algorithm whose regret bound and computational complexity are better than previous results. Our results depend on the decay rate of the eigenvalues of the kernel matrix. If the eigenvalues of the kernel matrix decay exponentially, then our algorithm enjoys a regret of $O(A_T^{1/2})$ at a computational complexity of $O(\ln^2 T)$. Otherwise, our algorithm enjoys a regret of $O((A_T T)^{1/4})$ at a computational complexity of $O((A_T T)^{1/2})$. We extend our algorithm to batch learning and obtain an $O(T^{-1}(\mathbb{E}[A_T])^{1/2})$ excess risk bound, which improves the previous $O(T^{-1/2})$ bound.


ICML Conference 2023 Conference Paper

Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression

Junfan Li
Shizhong Liao

The trade-off between regret and computational cost is a fundamental problem in online kernel regression, and previous algorithms addressing this trade-off cannot keep optimal regret bounds at a sublinear computational complexity. In this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which keep nearly optimal regret bounds at a sublinear computational complexity, and give sufficient conditions under which our algorithms work. Both algorithms dynamically maintain a set of nearly orthogonal basis elements used to approximate the kernel mapping, and keep nearly optimal regret bounds by controlling the approximation error. The number of basis elements depends on the approximation error and the decay rate of the eigenvalues of the kernel matrix. If the eigenvalues decay exponentially, then AOGD-ALD and NONS-ALD achieve regrets of $O(\sqrt{L(f)})$ and $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$, respectively, at a computational complexity in $O(\ln^2{T})$. If the eigenvalues decay polynomially with degree $p\geq 1$, then our algorithms keep the same regret bounds at a computational complexity in $o(T)$ in the case of $p>4$ and $p\geq 10$, respectively. Here $L(f)$ is the cumulative loss of $f$ and $\mathrm{d}_{\mathrm{eff}}(\mu)$ is the effective dimension of the problem. The two regret bounds are nearly optimal and are not comparable.
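One common way to maintain a small, nearly orthogonal basis from a stream is an approximate linear dependence test: a new point joins the dictionary only if its feature-space distance to the span of the current dictionary exceeds a threshold. The sketch below assumes that "ALD" in the algorithm names refers to this condition, which the abstract does not spell out. The RBF kernel, the `threshold` and `ridge` values and all function names are illustrative assumptions, not the paper's procedure.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian kernel, assumed here for illustration."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def ald_residual(dictionary, x, kernel=rbf, ridge=1e-10):
    """Squared distance from phi(x) to the span of the dictionary features."""
    if not dictionary:
        return kernel(x, x)
    gram = [[kernel(u, v) + (ridge if i == j else 0.0)
             for j, v in enumerate(dictionary)]
            for i, u in enumerate(dictionary)]
    k_x = [kernel(u, x) for u in dictionary]
    a = solve(gram, k_x)  # best coefficients for approximating phi(x)
    return kernel(x, x) - sum(ai * ki for ai, ki in zip(a, k_x))


def build_dictionary(stream, threshold=0.1, kernel=rbf):
    """Add a point only when the dictionary cannot approximate it well."""
    dictionary = []
    for x in stream:
        if ald_residual(dictionary, x, kernel) > threshold:
            dictionary.append(x)
    return dictionary
```

A larger `threshold` yields a smaller dictionary and a coarser approximation of the kernel mapping, which mirrors the dependence of the basis size on the approximation error noted in the abstract.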
