Author name cluster

Tomer Koren

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

61 papers

2 author rows

EWRL Workshop 2025 Workshop Paper

Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning

Uri Sherman
Tomer Koren
Yishay Mansour

We study reinforcement learning (RL) in the agnostic policy learning setting, where the goal is to find a policy whose performance is competitive with the best policy in a given class of interest $\Pi$---crucially, without assuming that $\Pi$ contains the optimal policy. We propose a general policy learning framework that reduces this problem to first-order optimization in a non-Euclidean space, leading to new algorithms as well as shedding light on the convergence properties of existing ones. Specifically, under the assumption that $\Pi$ is convex and satisfies a variational gradient dominance (VGD) condition---an assumption known to be strictly weaker than more standard completeness and coverability conditions---we obtain sample complexity upper bounds for three policy learning algorithms: \emph{(i)} Steepest Descent Policy Optimization, derived from a constrained steepest descent method for non-convex optimization; \emph{(ii)} the classical Conservative Policy Iteration algorithm \citep{kakade2002approximately} reinterpreted through the lens of the Frank-Wolfe method, which leads to improved convergence results; and \emph{(iii)} an on-policy instantiation of the well-studied Policy Mirror Descent algorithm. Finally, we empirically evaluate the VGD condition across several standard environments, demonstrating the practical relevance of our key assumption.

PDF

ICML Conference 2025 Conference Paper

Convergence of Policy Mirror Descent Beyond Compatible Function Approximation

Uri Sherman
Tomer Koren
Yishay Mansour

Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments. In this work, we develop a theoretical framework for PMD for general policy classes where we replace the closure conditions with a generally weaker variational gradient dominance assumption, and obtain upper bounds on the rate of convergence to the best-in-class policy. Our main result leverages a novel notion of smoothness with respect to a local norm induced by the occupancy measure of the current policy, and casts PMD as a particular instance of smooth non-convex optimization in non-Euclidean space.

Details

ICML Conference 2025 Conference Paper

Dueling Convex Optimization with General Preferences

Aadirupa Saha
Tomer Koren
Yishay Mansour

We address the problem of convex optimization with dueling feedback, where the goal is to minimize a convex function given a weaker form of dueling feedback. Each query consists of two points and the dueling feedback returns a (noisy) single-bit binary comparison of the function values of the two queried points. The translation of the function values to the single comparison bit is through a transfer function. This problem has been addressed previously for some restricted classes of transfer functions, but here we consider a very general transfer function class which includes all functions that admit a series expansion about the origin. Our main contribution is an efficient algorithm with convergence rate of $O(\epsilon^{-4p})$ for smooth convex functions, and an optimal rate of $\widetilde O(\epsilon^{-2p})$ when the objective is both smooth and strongly convex, where $p$ is the minimal degree (with a non-zero coefficient) in the transfer’s series expansion about the origin.

Details

NeurIPS Conference 2025 Conference Paper

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Amit Attia
Matan Schliserman
Uri Sherman
Tomer Koren

We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting---particularly with large (constant) stepsizes---has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems. We establish that after $T$ steps of SGD on $\beta$-smooth convex loss functions with stepsize $0 < \eta < 2/\beta$, the last iterate exhibits expected excess risk $\widetilde{O}(\tfrac{1}{\eta (2-\beta \eta) T^{1-\beta\eta/2}} + \tfrac{\eta}{(2-\beta\eta)^2} T^{\beta\eta/2} \sigma_\star^2)$, where $\sigma_\star^2$ denotes the variance of the stochastic gradients at the optimum. In particular, for a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + \sigma_\star/\sqrt T)$ rate for the last iterate, extending the results of Varre et al. (2021) beyond least squares regression; and when $\sigma_\star=0$ we obtain a rate of $O(1/\sqrt T)$ with $\eta=1/\beta$, improving upon the best-known $O(T^{-1/4})$ rate recently established by Evron et al. (2025) in the special case of realizable linear regression.