Arrow Research search

Author name cluster

Stratis Ioannidis

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers (17)

TMLR Journal 2025 Journal Article

Dependency-aware Maximum Likelihood Estimation for Active Learning

  • Beyza Kalkanli
  • Tales Imbiriba
  • Stratis Ioannidis
  • Deniz Erdogmus
  • Jennifer Dy

Active learning aims to efficiently build a labeled training set by strategically selecting samples to query labels from annotators. In this sequential process, each sample acquisition influences subsequent selections, causing dependencies among samples in the labeled set. However, these dependencies are overlooked during the model parameter estimation stage when updating the model using Maximum Likelihood Estimation (MLE), a conventional method that assumes independent and identically distributed (i.i.d.) data. We propose Dependency-aware MLE (DMLE), which corrects MLE within the active learning framework by addressing sample dependencies typically neglected due to the i.i.d. assumption, ensuring consistency with active learning principles in the model parameter estimation process. This improved method achieves superior performance across multiple benchmark datasets, reaching higher performance in earlier cycles compared to conventional MLE. Specifically, we observe average accuracy improvements of 6%, 8.6%, and 10.5% for k=1, k=5, and k=10 respectively, after collecting the first 100 samples, where entropy is the acquisition function and k is the query batch size acquired at every active learning cycle.
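
For context, a minimal sketch of the entropy acquisition rule referenced above (query the k most uncertain samples). This illustrates the acquisition step only, not the DMLE parameter-estimation correction, and the function and argument names are illustrative.

```python
import numpy as np

def entropy_acquisition(probs: np.ndarray, k: int = 5) -> np.ndarray:
    """Pick the k unlabeled samples with the highest predictive entropy.

    probs: (n_samples, n_classes) predicted class probabilities.
    Returns the indices of the k samples to send to the annotator.
    """
    eps = 1e-12                                            # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[-k:]                        # k most uncertain samples
```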

NeurIPS Conference 2025 Conference Paper

H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition

  • Lukas Miklautz
  • Chengzhi Shi
  • Andrii Shkabrii
  • Theodoros Thirimachos Davarakis
  • Prudence Lam
  • Claudia Plant
  • Jennifer Dy
  • Stratis Ioannidis

We introduce H-SPLID, a novel algorithm for learning salient feature representations through the explicit decomposition of salient and non-salient features into separate spaces. We show that H-SPLID promotes learning low-dimensional, task-relevant features. We prove that the expected prediction deviation under input perturbations is upper-bounded by the dimension of the salient subspace and the Hilbert-Schmidt Independence Criterion (HSIC) between inputs and representations. This establishes a link between robustness and latent representation compression in terms of the dimensionality and information preserved. Empirical evaluations on image classification tasks show that models trained with H-SPLID primarily rely on salient input components, as indicated by reduced sensitivity to perturbations affecting non-salient features, such as image backgrounds.
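
As a reference point for the HSIC criterion used in this and several papers below, here is the standard biased empirical HSIC estimator with RBF kernels. The kernel choice and bandwidth are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def rbf_gram(X: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian (RBF) Gram matrix over the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(X: np.ndarray, Y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC estimate between paired samples X and Y."""
    n = X.shape[0]
    K, L = rbf_gram(X, sigma), rbf_gram(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```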

AAAI Conference 2025 Conference Paper

Learning Set Functions with Implicit Differentiation

  • Gözde Özcan
  • Chengzhi Shi
  • Stratis Ioannidis

A recent work introduces the problem of learning set functions from data generated by a so-called optimal subset oracle. That work approximates the underlying utility function with an energy-based model, whose parameters are estimated via mean-field variational inference. This approximation reduces to fixed-point iterations; however, as the number of iterations increases, automatic differentiation quickly becomes computationally prohibitive due to the size of the Jacobians that are stacked during backpropagation. We address this challenge with implicit differentiation and examine the convergence conditions for the fixed-point iterations. We empirically demonstrate the efficiency of our method on synthetic and real-world subset selection applications, including product recommendation, set anomaly detection, and compound selection tasks.
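
A generic sketch of implicit differentiation through a fixed point, the mechanism the abstract relies on: differentiate the fixed-point condition z = f(z, θ) via the implicit function theorem instead of unrolling iterations. The function and Jacobian arguments below are hypothetical placeholders, not the paper's API.

```python
import numpy as np

def fixed_point(f, z0, theta, tol=1e-8, max_iter=500):
    """Iterate z <- f(z, theta) until (approximate) convergence."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, theta)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z_next

def implicit_grad(f, z_star, theta, df_dz, df_dtheta):
    """d z*/d theta via the implicit function theorem:
    (I - df/dz) dz*/dtheta = df/dtheta at the fixed point, so a single
    linear solve replaces backpropagating through all iterations."""
    A = np.eye(len(z_star)) - df_dz(z_star, theta)
    return np.linalg.solve(A, df_dtheta(z_star, theta))
```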

NeurIPS Conference 2025 Conference Paper

Online Two-Stage Submodular Maximization

  • Iasonas Nikolaou
  • Miltiadis Stouras
  • Stratis Ioannidis
  • Evimaria Terzi

Given a collection of monotone submodular functions, the goal of Two-Stage Submodular Maximization (2SSM) (Balkanski et al. 2016) is to restrict the ground set so that an objective selected uniformly at random from the collection attains a high maximal value, on average, when optimized over the restricted ground set. We introduce the Online Two-Stage Submodular Maximization (O2SSM) problem, in which the submodular objectives are revealed in an online fashion. We study this problem for weighted threshold potential functions, a large and important subclass of monotone submodular functions that includes influence maximization, data summarization, and facility location, to name a few. We design an algorithm that achieves sublinear $(1 - 1/e)^2$-regret under general matroid constraints and $(1 - 1/e)(1-e^{-k}k^k/k!)$-regret in the case of uniform matroids of rank $k$; the latter also yields a state-of-the-art bound for the (offline) 2SSM problem. We empirically validate the performance of our online algorithm with experiments on real datasets.

NeurIPS Conference 2024 Conference Paper

Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability

  • Ming Xiang
  • Stratis Ioannidis
  • Edmund Yeh
  • Carlee Joe-Wong
  • Lili Su

Addressing intermittent client availability is critical for the real-world deployment of federated learning algorithms. Most prior work either overlooks the potential non-stationarity in the dynamics of client unavailability or requires substantial memory/computation overhead. We study federated learning in the presence of heterogeneous and non-stationary client availability, which may occur when the deployment environments are uncertain, or the clients are mobile. The impacts of heterogeneity and non-stationarity on client unavailability can be significant, as we illustrate using FedAvg, the most widely adopted federated learning algorithm. We propose FedAWE, which includes novel algorithmic structures that (i) compensate for missed computations due to unavailability with only $O(1)$ additional memory and computation with respect to standard FedAvg, and (ii) evenly diffuse local updates within the federated learning system through implicit gossiping, despite being agnostic to non-stationary dynamics. We show that FedAWE converges to a stationary point of even non-convex objectives while achieving the desired linear speedup property. We corroborate our analysis with numerical experiments over diversified client unavailability dynamics on real-world data sets.
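
For reference, a bare-bones FedAvg round under intermittent client availability, the baseline the abstract contrasts with. This is not FedAWE's compensation or implicit-gossiping mechanism, and all names here are illustrative.

```python
import numpy as np

def fedavg_round(global_w, client_data, available, local_step, lr=0.1):
    """One FedAvg round: only currently available clients take a local
    step from the global model and are averaged; unavailable clients
    contribute nothing this round."""
    updates = []
    for data, is_up in zip(client_data, available):
        if not is_up:
            continue                          # client missed this round
        w = global_w.copy()
        w -= lr * local_step(w, data)         # one local (stochastic) gradient step
        updates.append(w)
    return np.mean(updates, axis=0) if updates else global_w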

NeurIPS Conference 2024 Conference Paper

Exploring Token Pruning in Vision State Space Models

  • Zheng Zhan
  • Zhenglun Kong
  • Yifan Gong
  • Yushu Wu
  • Zichong Meng
  • Hangyu Zheng
  • Xuan Shen
  • Stratis Ioannidis

State Space Models (SSMs) have the advantage of keeping linear computational complexity compared to attention modules in transformers, and have been applied to vision tasks as a new type of powerful vision foundation model. Inspired by the observations that the final prediction in vision transformers (ViTs) is only based on a subset of most informative tokens, we take the novel step of enhancing the efficiency of SSM-based vision models through token-based pruning. However, direct applications of existing token pruning techniques designed for ViTs fail to deliver good performance, even with extensive fine-tuning. To address this issue, we revisit the unique computational characteristics of SSMs and discover that naive application disrupts the sequential token positions. This insight motivates us to design a novel and general token pruning method specifically for SSM-based vision models. We first introduce a pruning-aware hidden state alignment method to stabilize the neighborhood of remaining tokens for performance enhancement. Besides, based on our detailed analysis, we propose a token importance evaluation method adapted for SSM models, to guide the token pruning. With efficient implementation and practical acceleration methods, our method brings actual speedup. Extensive experiments demonstrate that our approach can achieve significant computation reduction with minimal impact on performance across different tasks. Notably, we achieve 81.7% accuracy on ImageNet with a 41.6% reduction in the FLOPs for pruned PlainMamba-L3. Furthermore, our work provides deeper insights into understanding the behavior of SSM-based vision models for future research.
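
A minimal, generic illustration of order-preserving token pruning: keep the top-k tokens by an importance score but retain their original sequence order, since SSMs process tokens sequentially. The paper's hidden-state alignment and SSM-specific importance measure are more involved and not shown here.

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, scores: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` highest-scoring tokens while preserving their
    original sequence order (SSMs are order-sensitive, unlike ViT attention).

    tokens: (seq_len, dim) token embeddings; scores: (seq_len,) importance scores.
    """
    idx = np.sort(np.argsort(scores)[-keep:])   # top-k indices, restored to sequence order
    return tokens[idx]
```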

AAAI Conference 2024 Conference Paper

Online Submodular Maximization via Online Convex Optimization

  • Tareq Si Salem
  • Gözde Özcan
  • Iasonas Nikolaou
  • Evimaria Terzi
  • Stratis Ioannidis

We study monotone submodular maximization under general matroid constraints in the online setting. We prove that online optimization of a large class of submodular functions, namely, threshold potential functions, reduces to online convex optimization (OCO). This is precisely because functions in this class admit a concave relaxation; as a result, OCO policies, coupled with an appropriate rounding scheme, can be used to achieve sublinear regret in the combinatorial setting. We also show that our reduction extends to many different versions of the online learning problem, including the dynamic regret, bandit, and optimistic-learning settings.
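
For concreteness, one common form of a weighted threshold potential and its natural concave relaxation; the notation here is generic and may differ from the paper's:

$f(S) = \sum_{\ell} w_\ell \min\big\{ b_\ell, \textstyle\sum_{i \in S} c_{\ell i} \big\}, \qquad \hat{f}(x) = \sum_{\ell} w_\ell \min\big\{ b_\ell, \textstyle\sum_{i} c_{\ell i} x_i \big\}, \quad x \in [0,1]^n.$

Each summand is a minimum of affine functions, so $\hat{f}$ is concave, and $\hat{f}$ agrees with $f$ on indicator vectors; this is what makes the reduction to online convex optimization possible.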

ICML Conference 2023 Conference Paper

DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning

  • Zifeng Wang 0002
  • Zheng Zhan 0001
  • Yifan Gong 0004
  • Yucai Shao
  • Stratis Ioannidis
  • Yanzhi Wang 0001
  • Jennifer G. Dy

Rehearsal-based approaches are a mainstay of continual learning (CL). They mitigate the catastrophic forgetting problem by maintaining a small fixed-size buffer with a subset of data from past tasks. While most rehearsal-based approaches exploit the knowledge from buffered past data, little attention is paid to inter-task relationships and to critical task-specific and task-invariant knowledge. By appropriately leveraging inter-task relationships, we propose a novel CL method, named DualHSIC, to boost the performance of existing rehearsal-based methods in a simple yet effective way. DualHSIC consists of two complementary components that stem from the so-called Hilbert-Schmidt independence criterion (HSIC): HSIC-Bottleneck for Rehearsal (HBR) lessens inter-task interference, and HSIC Alignment (HA) promotes task-invariant knowledge sharing. Extensive experiments show that DualHSIC can be seamlessly plugged into existing rehearsal-based methods for consistent performance improvements, outperforming recent state-of-the-art regularization-enhanced rehearsal methods.

NeurIPS Conference 2023 Conference Paper

SmoothHess: ReLU Network Feature Interactions via Stein's Lemma

  • Max Torop
  • Aria Masoomi
  • Davin Hill
  • Kivanc Kose
  • Stratis Ioannidis
  • Jennifer Dy

Several recent methods for interpretability model feature interactions by looking at the Hessian of a neural network. This poses a challenge for ReLU networks, which are piecewise-linear and thus have a zero Hessian almost everywhere. We propose SmoothHess, a method of estimating second-order interactions through Stein's Lemma. In particular, we estimate the Hessian of the network convolved with a Gaussian through an efficient sampling algorithm, requiring only network gradient calls. SmoothHess is applied post-hoc, requires no modifications to the ReLU network architecture, and the extent of smoothing can be controlled explicitly. We provide a non-asymptotic bound on the sample complexity of our estimation procedure. We validate the superior ability of SmoothHess to capture interactions on benchmark datasets and a real-world medical spirometry dataset.
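
A minimal Monte Carlo sketch of the Stein-based estimator described in the abstract: the Hessian of the Gaussian-smoothed network is estimated from gradient calls alone. The paper's estimator may use variance reduction and a different parameterization; this is the bare identity $\nabla^2 f_\sigma(x) = \frac{1}{\sigma^2}\,\mathbb{E}[\delta\,\nabla f(x+\delta)^\top]$ for $\delta \sim \mathcal{N}(0, \sigma^2 I)$.

```python
import numpy as np

def smoothed_hessian(grad_f, x, sigma=0.1, n_samples=1000, rng=None):
    """Monte Carlo estimate of the Hessian of the Gaussian-smoothed function
    at x, using only gradient evaluations (Stein's lemma)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    H = np.zeros((d, d))
    for _ in range(n_samples):
        delta = sigma * rng.standard_normal(d)
        H += np.outer(delta, grad_f(x + delta))   # delta * grad^T accumulates the estimate
    H /= n_samples * sigma**2
    return 0.5 * (H + H.T)                        # symmetrize the finite-sample estimate
```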

ICLR Conference 2022 Conference Paper

Explanations of Black-Box Models based on Directional Feature Interactions

  • Aria Masoomi
  • Davin Hill
  • Zhonghui Xu
  • Craig P. Hersh
  • Edwin K. Silverman
  • Peter J. Castaldi
  • Stratis Ioannidis
  • Jennifer G. Dy

As machine learning algorithms are deployed ubiquitously to a variety of domains, it is imperative to make these often black-box models transparent. Several recent works explain black-box models by capturing the most influential features for prediction per instance; such explanation methods are univariate, as they characterize importance per feature. We extend univariate explanation to a higher-order; this enhances explainability, as bivariate methods can capture feature interactions in black-box models, represented as a directed graph. Analyzing this graph enables us to discover groups of features that are equally important (i.e., interchangeable), while the notion of directionality allows us to identify the most influential features. We apply our bivariate method on Shapley value explanations, and experimentally demonstrate the ability of directional explanations to discover feature interactions. We show the superiority of our method against state-of-the-art on CIFAR10, IMDB, Census, Divorce, Drug, and gene data.

NeurIPS Conference 2022 Conference Paper

SparCL: Sparse Continual Learning on the Edge

  • Zifeng Wang
  • Zheng Zhan
  • Yifan Gong
  • Geng Yuan
  • Wei Niu
  • Tong Jian
  • Bin Ren
  • Stratis Ioannidis

Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), which is the first study that leverages sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods, requiring up to 23× fewer training FLOPs, and, surprisingly, further improves SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained from adapting SOTA sparse training methods to the CL setting in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
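
A generic illustration of gradient masking (keep only the largest-magnitude fraction of gradient entries); the paper's DGM uses its own criterion, so this is only a sketch of the general idea.

```python
import numpy as np

def mask_gradient(grad: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
    """Zero out all but the top (1 - sparsity) fraction of gradient
    entries by magnitude, sparsifying the update."""
    k = int(round((1.0 - sparsity) * grad.size))
    if k == 0:
        return np.zeros_like(grad)
    thresh = np.partition(np.abs(grad).ravel(), -k)[-k]   # k-th largest magnitude
    return np.where(np.abs(grad) >= thresh, grad, 0.0)
```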

NeurIPS Conference 2021 Conference Paper

Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness

  • Zifeng Wang
  • Tong Jian
  • Aria Masoomi
  • Stratis Ioannidis
  • Jennifer Dy

We investigate the HSIC (Hilbert-Schmidt independence criterion) bottleneck as a regularizer for learning an adversarially robust deep neural network classifier. In addition to the usual cross-entropy loss, we add regularization terms for every intermediate layer to ensure that the latent representations retain useful information for output prediction while reducing redundant information. We show that the HSIC bottleneck enhances robustness to adversarial attacks both theoretically and experimentally. In particular, we prove that the HSIC bottleneck regularizer reduces the sensitivity of the classifier to adversarial examples. Our experiments on multiple benchmark datasets and architectures demonstrate that incorporating an HSIC bottleneck regularizer attains competitive natural accuracy and improves adversarial robustness, both with and without adversarial examples during training. Our code and adversarially robust models are publicly available.
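
One common way to write an HSIC-bottleneck regularized objective of the kind described above; the per-layer weights and exact form are assumptions, and the paper's precise variant may differ:

$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \sum_{\ell} \big( \lambda_x\, \mathrm{HSIC}(X, Z_\ell) - \lambda_y\, \mathrm{HSIC}(Y, Z_\ell) \big),$

where $Z_\ell$ is the representation at intermediate layer $\ell$: the first regularization term penalizes retaining redundant input information, while the second encourages dependence between the representation and the labels.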

IJCAI Conference 2018 Conference Paper

Distributing Frank-Wolfe via Map-Reduce

  • Armin Moharrer
  • Stratis Ioannidis

We identify structural properties under which a convex optimization over the simplex can be massively parallelized via map-reduce operations using the Frank-Wolfe (FW) algorithm. A broad class of problems, e.g., Convex Approximation, Experimental Designs, and Adaboost, can be tackled this way. We implement FW over Apache Spark, and solve problems with 20 million variables using 350 cores in 79 minutes; the same operation takes 165 hours when executed serially.
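
A serial sketch of Frank-Wolfe over the probability simplex, for orientation: the linear minimization oracle reduces to a single coordinate argmin, and the gradient evaluation is the part that can be distributed (e.g., as a map-reduce over data), which is the structure the paper exploits. Names and the step-size schedule are standard choices, not the paper's implementation.

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, n_iters=100):
    """Minimize a convex function over the probability simplex with Frank-Wolfe."""
    x = x0.copy()
    for t in range(n_iters):
        g = grad_f(x)                # in the distributed setting, a map-reduce job
        i = int(np.argmin(g))        # LMO over the simplex: best vertex e_i
        gamma = 2.0 / (t + 2.0)      # standard FW step size
        x *= (1.0 - gamma)
        x[i] += gamma                # move toward the selected vertex
    return x

# Usage: start from the uniform distribution over n coordinates.
# x = frank_wolfe_simplex(grad_f, np.full(n, 1.0 / n))
```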

IJCAI Conference 2018 Conference Paper

Experimental Design under the Bradley-Terry Model

  • Yuan Guo
  • Peng Tian
  • Jayashree Kalpathy-Cramer
  • Susan Ostmo
  • J. Peter Campbell
  • Michael F. Chiang
  • Deniz Erdogmus
  • Jennifer Dy

Labels generated by human experts via comparisons exhibit smaller variance compared to traditional sample labels. Collecting comparison labels is challenging over large datasets, as the number of comparisons grows quadratically with the dataset size. We study the following experimental design problem: given a budget of expert comparisons, and a set of existing sample labels, we determine the comparison labels to collect that lead to the highest classification improvement. We study several experimental design objectives motivated by the Bradley-Terry model. The resulting optimization problems amount to maximizing submodular functions. We experimentally evaluate the performance of these methods over synthetic and real-life datasets.
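
For reference, the Bradley-Terry comparison model the abstract builds on, in its standard form; this is a generic sketch of the model, not the paper's experimental design objective.

```python
import numpy as np

def bt_prob(s_i: float, s_j: float) -> float:
    """Bradley-Terry: probability that item i is preferred to item j,
    given positive scores s_i and s_j."""
    return s_i / (s_i + s_j)

def bt_neg_log_likelihood(scores, comparisons):
    """Negative log-likelihood of observed comparisons (i, j),
    each meaning 'i was preferred to j'."""
    return -sum(np.log(bt_prob(scores[i], scores[j])) for i, j in comparisons)
```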

ICML Conference 2014 Conference Paper

Learning Mixtures of Linear Classifiers

  • Yuekai Sun
  • Stratis Ioannidis
  • Andrea Montanari

We consider a discriminative learning (regression) problem, whereby the regression function is a convex combination of k linear classifiers. Existing approaches are based on the EM algorithm, or similar techniques, without provable guarantees. We develop a simple method based on spectral techniques and a ‘mirroring’ trick, that discovers the subspace spanned by the classifiers’ parameter vectors. Under a probabilistic assumption on the feature vector distribution, we prove that this approach has nearly optimal statistical efficiency.

UAI Conference 2012 Conference Paper

Guess Who Rated This Movie: Identifying Users Through Subspace Clustering

  • Amy Zhang 0001
  • Nadia Fawaz
  • Stratis Ioannidis
  • Andrea Montanari

It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.
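
A naive k-subspaces sketch of the clustering step the abstract describes: alternately assign each rating vector to the subspace with the smallest projection residual, then refit each subspace by truncated SVD. This is a simplified stand-in; the paper's model handles missing ratings, which this sketch does not.

```python
import numpy as np

def k_subspaces(X, k=2, dim=3, n_iters=20, rng=None):
    """Cluster rows of X (n, d) into k low-dimensional linear subspaces."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    bases = [np.linalg.qr(rng.standard_normal((d, dim)))[0] for _ in range(k)]
    labels = rng.integers(k, size=n)
    for _ in range(n_iters):
        # residual of projecting each point onto each candidate subspace
        resid = np.stack(
            [np.linalg.norm(X - (X @ B) @ B.T, axis=1) for B in bases], axis=1
        )
        labels = resid.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) >= dim:                       # refit only non-degenerate clusters
                _, _, Vt = np.linalg.svd(pts, full_matrices=False)
                bases[j] = Vt[:dim].T                 # top right singular vectors as basis
    return labels
```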