Arrow Research search

Author name cluster

Diego Klabjan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers

25

UAI Conference 2025 Conference Paper

A Mirror Descent Perspective of Smoothed Sign Descent

  • Shuyang Wang
  • Diego Klabjan

The optimization dynamics of gradient descent for overparameterized problems can be viewed as low-dimensional dual dynamics induced by a mirror map, providing a mirror descent perspective on the implicit regularization phenomenon. However, the dynamics of adaptive gradient descent methods that are widely used in practice remain less understood. Meanwhile, empirical evidence of performance gaps suggests fundamental differences in their underlying dynamics. In this work, we introduce the dual dynamics of smoothed sign descent with stability constant $\varepsilon$ for regression problems, formulated using the mirror descent framework. Unlike prior approaches, ours applies to algorithms, such as Adam, whose update directions deviate from the true gradient. We propose a mirror map that reveals the equivalent dual dynamics under some assumptions. By studying the dual dynamics, we characterize the convergent solution as approximately minimizing a Bregman-divergence-style function closely related to the $l_{3/2}$ norm. Furthermore, we demonstrate the role of the stability constant $\varepsilon$ in shaping the convergent solution. Our analyses offer new insights into the distinct properties of the smoothed sign descent algorithm and show the potential of applying the mirror descent framework to study complex dynamics beyond gradient descent.
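As a toy illustration of the update being analyzed, the sketch below runs the elementwise smoothed sign descent step $w \leftarrow w - \eta\, g/(|g| + \varepsilon)$ on an overparameterized least-squares problem. The hyperparameters and problem sizes are illustrative choices, not the paper's:

```python
import numpy as np

def smoothed_sign_descent(X, y, lr=0.005, eps=1.0, steps=20000):
    # w <- w - lr * g / (|g| + eps), elementwise. As eps grows the step
    # approaches (scaled) gradient descent; as eps -> 0 it approaches
    # sign descent.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = X.T @ (X @ w - y)            # least-squares gradient
        w -= lr * g / (np.abs(g) + eps)  # smoothed sign step
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))        # overparameterized: 50 params, 20 samples
w_star = np.zeros(50)
w_star[:3] = [1.0, -2.0, 0.5]            # sparse ground truth
y = X @ w_star
w = smoothed_sign_descent(X, y)
print(np.linalg.norm(X @ w - y))         # residual; should be near zero
```

Among the many interpolating solutions, which one this dynamic selects (and how $\varepsilon$ shapes it) is exactly what the paper's dual analysis characterizes.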

NeurIPS Conference 2025 Conference Paper

Investigating Hallucinations of Time Series Foundation Models through Signal Subspace Analysis

  • Yufeng Zou
  • Zijian Wang
  • Diego Klabjan
  • Han Liu

Time series foundation models (TSFMs) have emerged as a promising paradigm for time series analysis and forecasting, showing remarkable generalization performance across different domains. Despite the efforts devoted to hallucinations of foundation models, hallucinations of TSFMs have been underexplored in the existing literature. In this paper, we formally define TSFM hallucinations in the zero-shot forecasting setting by examining whether a generated forecast exhibits different dynamics from those of the context. Our study reveals that TSFM hallucinations are associated with the loss of context information in hidden states during forward propagation. As such, we propose a methodology to identify signal subspaces of TSFMs and magnify the information through intervention. Experiments demonstrate that our proposed intervention approach effectively mitigates hallucinations and improves forecasting performance. The signal strength measure computed from signal subspaces shows strong predictive power for hallucinations and forecasting performance of the model. Our work contributes to a deeper understanding of TSFM trustworthiness that could foster future research in this direction.

NeurIPS Conference 2025 Conference Paper

Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

  • Siqiao Mu
  • Diego Klabjan

Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, in order to remove corrupted or outdated data or respect a user's "right to be forgotten." Certified machine unlearning is a strong theoretical guarantee based on differential privacy that quantifies the extent to which an algorithm erases data from the model weights. In contrast to existing works on certified unlearning for convex or strongly convex loss functions, or nonconvex objectives with limiting assumptions, we propose the first first-order, black-box (i.e., applicable to models pretrained with vanilla gradient descent) algorithm for unlearning on general nonconvex loss functions, which unlearns by "rewinding" to an earlier step during the learning process before performing gradient descent on the loss function of the retained data points. We prove $(\epsilon, \delta)$ certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for nonconvex functions that satisfy the Polyak-Lojasiewicz inequality. Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice.
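A minimal sketch of the rewind idea, with all constants illustrative: save periodic checkpoints during full-batch gradient descent, then rewind to one and continue descent on the retained data only. The noise injection that yields the differential-privacy-style certification is omitted here:

```python
import numpy as np

def gd_train(X, y, w, lr=0.1, steps=100, checkpoints=None, every=20):
    # Full-batch gradient descent on least squares, optionally saving
    # periodic checkpoints of the weights.
    for t in range(steps):
        if checkpoints is not None and t % every == 0:
            checkpoints.append(w.copy())
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
w_true = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(100)

ckpts = []
w_full = gd_train(X, y, np.zeros(5), checkpoints=ckpts)

# "Unlearn" the last 10 points: rewind to an earlier checkpoint,
# then continue gradient descent on the retained data only.
keep = slice(0, 90)
w_unlearned = gd_train(X[keep], y[keep], ckpts[2].copy(), steps=60)

# Reference: retraining from scratch on the retained data.
w_retrain = gd_train(X[keep], y[keep], np.zeros(5), steps=160)
print(np.linalg.norm(w_unlearned - w_retrain))
```

The rewind point trades off compute against how much of the deleted data's influence remains, which is the privacy-utility-complexity tradeoff the abstract refers to.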

NeurIPS Conference 2025 Conference Paper

Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

  • Taejong Joo
  • Diego Klabjan

Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting to new tasks by simply conditioning on demonstrations without parameter updates. Compelling empirical and theoretical evidence suggests that ICL, as a general-purpose learner, could outperform task-specific models. However, it remains unclear to what extent the transformers optimally learn in-context compared to principled learning algorithms. To investigate this, we employ a meta ICL framework in which each prompt defines a distinctive regression task whose target function is drawn from a hierarchical distribution, requiring inference over both the latent model class and task-specific parameters. Within this setup, we benchmark sample complexity of ICL against principled learning algorithms, including the Bayes optimal estimator, under varying performance requirements. Our findings reveal a striking dichotomy: while ICL initially matches the efficiency of a Bayes optimal estimator, its efficiency significantly deteriorates in long context. Through an information-theoretic analysis, we show that the diminishing efficiency is inherent to ICL. These results clarify the trade-offs in adopting ICL as a universal problem solver, motivating a new generation of on-the-fly adaptive methods without the diminishing efficiency.

AAAI Conference 2024 Conference Paper

A Primal-Dual Algorithm for Hybrid Federated Learning

  • Tom Overman
  • Garrett Blum
  • Diego Klabjan

Few methods exist for hybrid federated learning, where clients hold only subsets of both features and samples. Yet this scenario is very important in practical settings. We provide a fast, robust algorithm for hybrid federated learning that hinges on Fenchel duality. We prove that the algorithm converges to the same solution as if the model were trained centrally, in a variety of practical regimes. Furthermore, we provide experimental results demonstrating the performance improvements of the algorithm over a commonly used federated learning method, FedAvg, and an existing hybrid FL algorithm, HyFEM. We also discuss privacy considerations and the steps necessary to protect client data.

NeurIPS Conference 2024 Conference Paper

Improving self-training under distribution shifts via anchored confidence with theoretical guarantees

  • Taejong Joo
  • Diego Klabjan

Self-training often falls short under distribution shifts due to an increased discrepancy between prediction confidence and actual accuracy. This typically necessitates computationally demanding methods such as neighborhood- or ensemble-based label correction. Drawing inspiration from insights on early learning regularization, we develop a principled method to improve self-training under distribution shifts based on temporal consistency. Specifically, we build an uncertainty-aware temporal ensemble with simple relative thresholding. This ensemble then smooths noisy pseudo labels to promote selective temporal consistency. We show that our temporal ensemble is asymptotically correct and that our label smoothing technique can reduce the optimality gap of self-training. Our extensive experiments validate that our approach consistently improves self-training performance by 8% to 16% across diverse distribution shift scenarios without computational overhead. In addition, our method exhibits attractive properties, such as improved calibration and robustness to different hyperparameter choices.
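The temporal-ensemble idea can be sketched as follows; the class name, momentum value, and the specific relative-thresholding rule (a batch quantile on ensemble confidence) are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

class TemporalEnsemble:
    # Keep an exponential moving average of per-sample predicted
    # probabilities across self-training rounds; trust a smoothed pseudo
    # label only when its confidence is high *relative to the batch*.
    def __init__(self, n_samples, n_classes, momentum=0.9):
        self.momentum = momentum
        self.probs = np.full((n_samples, n_classes), 1.0 / n_classes)

    def update(self, new_probs):
        self.probs = self.momentum * self.probs + (1 - self.momentum) * new_probs

    def pseudo_labels(self, rel_quantile=0.5):
        conf = self.probs.max(axis=1)
        thresh = np.quantile(conf, rel_quantile)  # relative, not absolute
        mask = conf >= thresh
        return self.probs.argmax(axis=1), mask

ens = TemporalEnsemble(n_samples=4, n_classes=3)
round1 = np.array([[.9, .05, .05],
                   [.1, .8, .1],
                   [.34, .33, .33],
                   [.2, .2, .6]])
ens.update(round1)
labels, mask = ens.pseudo_labels()
print(labels, mask)  # confident samples keep their smoothed pseudo labels
```

A relative threshold adapts to the overall confidence level of the shifted domain, whereas a fixed absolute threshold can select almost nothing (or everything) after a shift.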

ICML Conference 2024 Conference Paper

IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation

  • Taejong Joo
  • Diego Klabjan

Distribution shifts pose significant challenges for model calibration and model selection in the unsupervised domain adaptation problem—a scenario where the goal is to perform well in a distribution-shifted domain without labels. In this work, we tackle the difficulties coming from distribution shifts by developing a novel importance weighted group accuracy estimator. Specifically, we present a new perspective on the model calibration and model selection tasks by estimating group accuracy. Then, we formulate an optimization problem for finding an importance weight that leads to accurate group accuracy estimation, with theoretical analyses. Our extensive experiments show that our approach improves state-of-the-art performance by 22% in the model calibration task and 14% in the model selection task.

ICML Conference 2024 Conference Paper

On the Second-Order Convergence of Biased Policy Gradient Algorithms

  • Siqiao Mu
  • Diego Klabjan

Since the objective functions of reinforcement learning problems are typically highly nonconvex, it is desirable that policy gradient, the most popular algorithm, escapes saddle points and arrives at second-order stationary points. Existing results only consider vanilla policy gradient algorithms with unbiased gradient estimators, but practical implementations under the infinite-horizon discounted reward setting are biased due to finite-horizon sampling. Moreover, actor-critic methods, whose second-order convergence has not yet been established, are also biased due to the critic approximation of the value function. We provide a novel second-order analysis of biased policy gradient methods, including the vanilla gradient estimator computed from Monte-Carlo sampling of trajectories as well as the double-loop actor-critic algorithm, where in the inner loop the critic improves the approximation of the value function via TD(0) learning. Separately, we also establish the convergence of TD(0) on Markov chains irrespective of initial state distribution.
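The critic's inner loop is TD(0); a tabular sketch on a two-state Markov chain whose value function has a closed form (the step size, horizon, and chain are illustrative):

```python
import numpy as np

def td0(P, r, gamma=0.9, alpha=0.05, steps=50000, seed=0):
    # Tabular TD(0): V(s) <- V(s) + alpha * (r(s) + gamma * V(s') - V(s)),
    # sampling s' from the chain's transition matrix P.
    rng = np.random.default_rng(seed)
    n = len(r)
    V = np.zeros(n)
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])
        V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
        s = s_next
    return V

P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([1.0, 0.0])
V = td0(P, r)
V_exact = np.linalg.solve(np.eye(2) - 0.9 * P, r)  # V = (I - gamma P)^{-1} r
print(V, V_exact)  # TD(0) estimate should be approximately [5.5, 4.5]
```

With a constant step size the iterates hover in a noise ball around the exact solution; the paper's analysis handles the bias this critic approximation introduces into the policy gradient.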

NeurIPS Conference 2023 Conference Paper

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

  • Mengfan Xu
  • Diego Klabjan

We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment. The reward distributions of each arm vary across clients, and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-Gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaboration. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional double stochasticity assumption, and only require the knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ in both sub-Gaussian and sub-exponential environments, and a nearly optimal instance-free regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.
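The per-client building block is a UCB index. A single-agent UCB1 sketch with Gaussian rewards is shown below; the paper's consensus averaging, graph weighting, and multi-client coordination are omitted:

```python
import numpy as np

def ucb(means, T, seed=0):
    # UCB1: pull each arm once, then pull the arm maximizing
    # empirical mean + sqrt(2 log t / n_pulls).
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for a in range(K):                       # initial pull of each arm
        counts[a] += 1
        sums[a] += rng.normal(means[a])
    for t in range(K, T):
        idx = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        a = int(np.argmax(idx))
        counts[a] += 1
        sums[a] += rng.normal(means[a])
    return counts

counts = ucb([0.1, 0.5, 0.9], T=2000)
print(counts)  # the best arm (index 2) should dominate
```

In the decentralized setting each client runs an index of this flavor on statistics mixed with its neighbors over the random graph, which is where the proposed weighting technique enters.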

ICML Conference 2023 Conference Paper

Pareto Regret Analyses in Multi-objective Multi-armed Bandit

  • Mengfan Xu
  • Diego Klabjan

We study Pareto optimality in the multi-objective multi-armed bandit by providing a formulation of the adversarial multi-objective multi-armed bandit and defining its Pareto regrets, which can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization functions and reflect Pareto optimality, unlike scalarized regrets. We also present new algorithms for settings both with and without prior information about the multi-objective multi-armed bandit. Our upper and lower bounds on Pareto regrets show that the algorithms are optimal in adversarial settings and nearly optimal, up to a logarithmic factor, in stochastic settings. Moreover, the lower bound analyses show that the new regrets are consistent with the existing Pareto regret for stochastic settings and extend an adversarial attack mechanism from single-objective to multi-objective bandits.

JMLR Journal 2023 Journal Article

Scale Invariant Power Iteration

  • Cheolmin Kim
  • Youngseok Kim
  • Diego Klabjan

We introduce a new class of optimization problems called scale invariant problems that cover interesting problems in machine learning and statistics, and we show that they are efficiently solved by a general form of power iteration called scale invariant power iteration (SCI-PI). SCI-PI is a special case of the generalized power method (GPM) (Journée et al., 2010) where the constraint set is the unit sphere. In this work, we provide a convergence analysis of SCI-PI for scale invariant problems which yields a better rate than the analysis of GPM. Specifically, we prove that it attains local linear convergence with a generalized rate of power iteration to find an optimal solution for scale invariant problems. Moreover, we discuss some extended settings of scale invariant problems and provide similar convergence results. In numerical experiments, we introduce applications to independent component analysis, Gaussian mixtures, and non-negative matrix factorization with the KL-divergence. Experimental results demonstrate that SCI-PI is competitive with application-specific state-of-the-art algorithms and often yields better solutions.
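A minimal sketch of the iteration: SCI-PI repeatedly sets $x \leftarrow \nabla f(x)/\|\nabla f(x)\|$. For the scale invariant objective $f(x) = x^\top A x / 2$ the gradient is $Ax$, so the iteration reduces to classical power iteration and recovers the leading eigenvector:

```python
import numpy as np

def sci_pi(grad_f, x0, iters=200):
    # Scale invariant power iteration: normalize the gradient each step.
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = g / np.linalg.norm(g)
    return x

# f(x) = x^T A x / 2, so grad f(x) = A x: classical power iteration.
A = np.diag([3.0, 1.0, 0.5])
x = sci_pi(lambda v: A @ v, np.array([1.0, 1.0, 1.0]))
print(x)  # converges to the top eigenvector [1, 0, 0]
```

The paper's contribution is the convergence rate of this scheme for the broader class of scale invariant objectives (ICA, Gaussian mixtures, KL-NMF), not just the quadratic case sketched here.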

ICML Conference 2021 Conference Paper

A Probabilistic Approach to Neural Network Pruning

  • Xin Qian
  • Diego Klabjan

Neural network pruning techniques reduce the number of parameters without compromising predicting ability of a network. Many algorithms have been developed for pruning both over-parameterized fully-connected networks (FCN) and convolutional neural networks (CNN), but analytical studies of capabilities and compression ratios of such pruned sub-networks are lacking. We theoretically study the performance of two pruning techniques (random and magnitude-based) on FCN and CNN. Given a target network, we provide a universal approach to bound the gap between a pruned and the target network in a probabilistic sense, which is the first study of this nature. The results establish that there exist pruned networks with expressive power within any specified bound from the target network and with a significant compression ratio.
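Magnitude-based pruning, one of the two schemes the paper analyzes, can be sketched per weight matrix as follows (the strict-inequality tie-breaking at the threshold is an arbitrary choice):

```python
import numpy as np

def magnitude_prune(W, ratio):
    # Zero out the fraction `ratio` of entries with smallest |value|.
    flat = np.abs(W).ravel()
    k = int(ratio * flat.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(W) > thresh
    return W * mask

W = np.array([[0.1, -2.0],
              [0.5, -0.05]])
W_pruned = magnitude_prune(W, 0.5)
print(W_pruned)  # keeps only the two largest-magnitude weights
```

Random pruning, the other scheme analyzed, replaces the magnitude threshold with a Bernoulli mask of the same density; the paper bounds the gap between such pruned networks and the target network in probability.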

IJCAI Conference 2021 Conference Paper

k-Nearest Neighbors by Means of Sequence to Sequence Deep Neural Networks and Memory Networks

  • Yiming Xu
  • Diego Klabjan

k-Nearest Neighbors is one of the most fundamental yet effective classification models. In this paper, we propose two families of models, built on a sequence to sequence model and a memory network model, to mimic the k-Nearest Neighbors model. They generate a sequence of labels, a sequence of out-of-sample feature vectors, and a final label for classification, and thus they can also function as oversamplers. We also propose 'out-of-core' versions of our models which assume that only a small portion of data can be loaded into memory. Computational experiments show that our models on structured datasets outperform k-Nearest Neighbors, a feed-forward neural network, XGBoost, LightGBM, random forest, and a memory network, due to the fact that our models must produce additional output and not just the label. On image and text datasets, the performance of our model is close to many state-of-the-art deep models. As an oversampler on imbalanced datasets, the sequence to sequence kNN model often outperforms the Synthetic Minority Over-sampling Technique and Adaptive Synthetic Sampling.
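For reference, the plain k-Nearest Neighbors classifier that the proposed sequence to sequence and memory network models are trained to mimic (Euclidean distance and majority vote; k and the toy data are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Classify x by majority vote among its k nearest training points.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, votes = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(votes)]

X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.2])))  # -> 0
print(knn_predict(X, y, np.array([5.5, 5.5])))  # -> 1
```

The paper's models additionally emit the neighbor labels and synthetic neighbor feature vectors as intermediate outputs, which is what lets them double as oversamplers.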

AAAI Conference 2021 Conference Paper

Open-Set Recognition with Gaussian Mixture Variational Autoencoders

  • Alexander Cao
  • Yuan Luo
  • Diego Klabjan

At inference, open-set classification either classifies a sample into a known class from training or rejects it as an unknown class. Existing deep open-set classifiers train explicit closed-set classifiers, in some cases disjointly utilizing reconstruction, which we find dilutes the latent representation’s ability to distinguish unknown classes. In contrast, we train our model to cooperatively learn reconstruction and perform class-based clustering in the latent space. With this, our Gaussian mixture variational autoencoder (GMVAE) achieves more accurate and robust open-set classification results, with an average F1 increase of 0.26, through extensive experiments aided by analytical results.

ICML Conference 2018 Conference Paper

Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations

  • Xingyu Wang 0003
  • Diego Klabjan

This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be suboptimal. Compared to previous works that decouple agents in the game by assuming optimality in expert policies, we introduce a new objective function that directly pits experts against Nash Equilibrium policies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. To find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to existing benchmark algorithms. Moreover, our algorithm successfully recovers reward and policy functions regardless of the quality of the sub-optimal expert demonstration set.

JMLR Journal 2017 Journal Article

Bayesian Network Learning via Topological Order

  • Young Woong Park
  • Diego Klabjan

We propose a mixed integer programming (MIP) model and iterative algorithms based on topological orders to solve optimization problems with acyclic constraints on a directed graph. The proposed MIP model has a significantly lower number of constraints compared to popular MIP models based on cycle elimination constraints and triangular inequalities. The proposed iterative algorithms use gradient descent and iterative reordering approaches, respectively, for searching topological orders. A computational experiment is presented for the Gaussian Bayesian network learning problem, an optimization problem minimizing the sum of squared errors of regression models with an L1 penalty over a feature network, with an application to gene network inference in bioinformatics.
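The decomposition that makes topological orders useful can be sketched as follows: once an order is fixed, acyclicity is automatic, and each node can be regressed on its predecessors independently (plain least squares here; the L1 penalty and the order search itself are omitted):

```python
import numpy as np

def fit_given_order(X, order):
    # Given a topological order, the Gaussian Bayesian network
    # decomposes: regress each node on its predecessors in the order.
    # The resulting coefficient matrix B is acyclic by construction.
    n, d = X.shape
    B = np.zeros((d, d))  # B[i, j] = weight of parent i for child j
    for idx, j in enumerate(order):
        parents = order[:idx]
        if parents:
            coef, *_ = np.linalg.lstsq(X[:, parents], X[:, j], rcond=None)
            B[parents, j] = coef
    return B

rng = np.random.default_rng(2)
x0 = rng.standard_normal(500)
x1 = 2.0 * x0 + 0.1 * rng.standard_normal(500)  # x0 -> x1 with weight 2
X = np.column_stack([x0, x1])
B = fit_given_order(X, order=[0, 1])
print(B[0, 1])  # estimated edge weight, close to 2.0
```

The hard part, which the MIP model and the iterative reordering algorithms address, is searching over the orders themselves.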

NeurIPS Conference 2017 Conference Paper

Improving the Expected Improvement Algorithm

  • Chao Qin
  • Diego Klabjan
  • Daniel Russo

The expected improvement (EI) algorithm is a popular strategy for information collection in optimization under uncertainty. The algorithm is widely known to be too greedy, but nevertheless enjoys wide use due to its simplicity and ability to handle uncertainty and noise in a coherent decision theoretic framework. To provide rigorous insight into EI, we study its properties in a simple setting of Bayesian optimization where the domain consists of a finite grid of points. This is the so-called best-arm identification problem, where the goal is to allocate measurement effort wisely to confidently identify the best arm using a small number of measurements. In this framework, one can show formally that EI is far from optimal. To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm. Surprisingly, this simple change results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.
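A sketch of the EI score in this finite-grid setting, assuming unit observation noise and flat priors so arm $i$'s posterior is $N(\hat\mu_i, 1/n_i)$ (these modeling choices are illustrative, not the paper's exact setup):

```python
import math

def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_improvement(means, counts):
    # EI_i = sigma_i * (z * Phi(z) + phi(z)) with z = (mu_i - mu*) / sigma_i,
    # where mu* is the current best posterior mean and sigma_i = 1/sqrt(n_i).
    best = max(means)
    scores = []
    for mu, n in zip(means, counts):
        sigma = 1.0 / math.sqrt(n)
        z = (mu - best) / sigma
        scores.append(sigma * (z * norm_cdf(z) + norm_pdf(z)))
    return scores

scores = expected_improvement(means=[0.9, 0.5, 0.1], counts=[10, 10, 10])
print(scores)  # the incumbent best arm scores highest
```

Note that the incumbent (z = 0) always has a strictly positive score sigma * phi(0), so EI keeps resampling the apparent best arm; this greediness is exactly the behavior the paper's modification corrects to obtain asymptotic optimality for best-arm identification.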

RLDM Conference 2017 Conference Abstract

Improving the Expected Improvement Algorithm*

  • Chao Qin
  • Diego Klabjan
  • Daniel Russo

The expected improvement (EI) algorithm is a popular strategy for information collection in optimization under uncertainty. The algorithm is widely known to be too greedy, but nevertheless enjoys wide use due to its simplicity and ability to handle uncertainty and noise in a coherent decision theoretic framework. To provide rigorous insight into EI, we study its properties in a simple setting of Bayesian optimization where the domain consists of a finite grid of points. This is the so-called best-arm identification problem, where the goal is to allocate measurement effort wisely to confidently identify the best arm using a small number of measurements. In this framework, one can show formally that EI is far from optimal. To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm. Surprisingly, this simple change results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.

AAAI Conference 2017 Conference Paper

Regularization for Unsupervised Deep Neural Nets

  • Baiyang Wang
  • Diego Klabjan

Unsupervised neural networks, such as restricted Boltzmann machines (RBMs) and deep belief networks (DBNs), are powerful tools for feature selection and pattern recognition tasks. We demonstrate that overfitting occurs in such models just as in deep feedforward neural networks, and discuss possible regularization methods to reduce overfitting. We also propose a “partial” approach to improve the efficiency of Dropout/DropConnect in this scenario, and discuss the theoretical justification of these methods from model convergence and likelihood bounds. Finally, we compare the performance of these methods based on their likelihood and classification error rates for various pattern recognition data sets.

AAAI Conference 2016 Conference Paper

Temporal Topic Analysis with Endogenous and Exogenous Processes

  • Baiyang Wang
  • Diego Klabjan

We consider the problem of modeling temporal textual data taking endogenous and exogenous processes into account. Such text documents arise in real world applications, including job advertisements and economic news articles, which are influenced by the fluctuations of the general economy. We propose a hierarchical Bayesian topic model which imposes a “group-correlated” hierarchical structure on the evolution of topics over time incorporating both processes, and show that this model can be estimated via Markov chain Monte Carlo sampling methods. We further demonstrate that this model captures the intrinsic relationships between the topic distribution and the time-dependent factors, and compare its performance with latent Dirichlet allocation (LDA) and two other related models. The model is applied to two collections of documents to illustrate its empirical performance: online job advertisements from DirectEmployers Association and journalists’ postings on BusinessInsider.com.