Arrow Research search

Author name cluster

Gal Vardi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers (30)

NeurIPS Conference 2025 Conference Paper

Benign Overfitting in Single-Head Attention

  • Roey Magen
  • Shuning Shang
  • Zhiwei Xu
  • Spencer Frei
  • Wei Hu
  • Gal Vardi

The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/convolutional networks. In this work, we study benign overfitting in a single-head softmax attention model, which is the fundamental building block of Transformers. We prove that under appropriate conditions, the model exhibits benign overfitting in a classification setting already after two steps of gradient descent. Moreover, we show conditions where a minimum-norm/maximum-margin interpolator exhibits benign overfitting. We study how the overfitting behavior depends on the signal-to-noise ratio (SNR) of the data distribution, namely, the ratio between norms of signal and noise tokens, and prove that a sufficiently large SNR is both necessary and sufficient for benign overfitting.
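
A minimal numpy sketch of the kind of model described above: a single-head softmax attention classifier applied to a sequence containing one signal token and several noise tokens. The parameterization, data, and initialization scale here are illustrative assumptions, not the paper's exact setup.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_classifier(X, p, v):
    # X: (T, d) tokens of one example; p: query/key parameter; v: value/readout parameter.
    attn = softmax(X @ p)   # attention weights over the T tokens
    pooled = attn @ X       # attention-weighted token average, shape (d,)
    return np.sign(pooled @ v)

# Illustrative data: the first token carries the class signal, the rest are noise.
rng = np.random.default_rng(0)
d, T, snr = 50, 5, 3.0
X = rng.normal(size=(T, d)) / np.sqrt(d)
X[0] = 0.0
X[0, 0] = snr                    # signal token with norm snr (the SNR knob in the abstract)
p = rng.normal(size=d) * 0.01    # small initialization; the paper's setting trains for two GD steps
v = rng.normal(size=d) * 0.01
print(attention_classifier(X, p, v))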

ICLR Conference 2025 Conference Paper

Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

  • Nikolaos Tsilivis 0002
  • Gal Vardi
  • Julia Kempe

We study the implicit bias of the family of steepest descent algorithms with infinitesimal learning rate, including gradient descent, sign gradient descent and coordinate descent, in deep homogeneous neural networks. We prove that an algorithm-dependent geometric margin increases during training and characterize the late-stage bias of the algorithms. In particular, we define a generalized notion of stationarity for optimization problems and show that the algorithms progressively reduce a (generalized) Bregman divergence, which quantifies proximity to such stationary points of a margin-maximization problem. We then experimentally zoom into the trajectories of neural networks optimized with various steepest descent algorithms, highlighting connections to the implicit bias of Adam.
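
As a point of reference, the three algorithms named above correspond to steepest descent with respect to the $\ell_2$, $\ell_\infty$ and $\ell_1$ norms, respectively; a minimal sketch of the corresponding discrete-time update rules (the learning rate and the use of unnormalized gradients in the $\ell_2$ case are simplifying assumptions):

import numpy as np

def gd_step(w, grad, lr):
    # Gradient descent: steepest descent w.r.t. the l2 norm.
    return w - lr * grad

def sign_gd_step(w, grad, lr):
    # Sign gradient descent: steepest descent w.r.t. the l_inf norm.
    return w - lr * np.sign(grad)

def coordinate_descent_step(w, grad, lr):
    # Coordinate descent: steepest descent w.r.t. the l1 norm;
    # move only the coordinate with the largest-magnitude gradient entry.
    step = np.zeros_like(w)
    i = np.argmax(np.abs(grad))
    step[i] = np.sign(grad[i])
    return w - lr * step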

NeurIPS Conference 2025 Conference Paper

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes

  • Itamar Harel
  • Yonathan Wolanowsky
  • Gal Vardi
  • Nati Srebro
  • Daniel Soudry

We analyze the generalization gap (gap between the training and test errors) when training a potentially over-parametrized model using a Markovian stochastic training algorithm, initialized from some distribution $\theta_0 \sim p_0$. We focus on Langevin dynamics with a positive temperature $\beta^{-1}$, i.e. gradient descent on a training loss $L$ with infinitesimal step size, perturbed with $\beta^{-1}$-variance Gaussian noise, and lightly regularized or bounded. There, we bound the generalization gap, *at any time during training*, by $\sqrt{(\beta\mathbb{E} L (\theta_0) + \log(1/\delta))/N}$ with probability $1-\delta$ over the dataset, where $N$ is the sample size, and $\mathbb{E} L(\theta_0)=O(1)$ with standard initialization scaling. In contrast to previous guarantees, our bound has no dependence on training time, does not rely on mixing, and does not depend on dimensionality, gradient norms, or any other properties of the loss or model. This guarantee follows from a general analysis of any Markov process-based training that has a Gibbs-style stationary distribution. The proof is surprisingly simple, once we observe that the marginal distribution divergence from initialization remains bounded, as implied by a generalized second law of thermodynamics.
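
A minimal sketch of the training process and the bound stated above: one Euler-Maruyama step of Langevin dynamics at temperature $\beta^{-1}$, and a helper evaluating $\sqrt{(\beta\mathbb{E} L(\theta_0) + \log(1/\delta))/N}$. The loss, step size, and example numbers are illustrative assumptions.

import numpy as np

def langevin_step(theta, grad, lr, beta, rng):
    # One discretized Langevin step: gradient step on the training loss
    # plus Gaussian noise with variance 2 * lr / beta (temperature 1/beta).
    noise = rng.normal(size=theta.shape)
    return theta - lr * grad + np.sqrt(2.0 * lr / beta) * noise

def generalization_gap_bound(beta, expected_init_loss, delta, N):
    # The bound quoted in the abstract, holding at any time during training.
    return np.sqrt((beta * expected_init_loss + np.log(1.0 / delta)) / N)

# Example: beta = 100, E[L(theta_0)] = 1 (O(1) under standard initialization), delta = 0.01, N = 10000.
print(generalization_gap_bound(beta=100.0, expected_init_loss=1.0, delta=0.01, N=10000))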

ICLR Conference 2025 Conference Paper

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

  • Spencer Frei
  • Gal Vardi

Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training (''in-context'') examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that in some settings, these trained transformers can exhibit ''benign overfitting in-context'': when in-context examples are corrupted by label flipping noise, the transformer memorizes all of its in-context examples (including those with noisy labels) yet still generalizes near-optimally for clean test examples.
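
A minimal sketch of the encoding described above and of a single linear self-attention layer producing an in-context prediction; the specific token layout and parameter shapes are illustrative assumptions rather than the paper's exact construction.

import numpy as np

def encode_prompt(X_ctx, y_ctx, x_query):
    # Stack the labeled in-context examples (x_i, y_i) and the unlabeled query (x_query, 0)
    # as rows of dimension d + 1 (the last coordinate holds the label slot).
    n, d = X_ctx.shape
    tokens = np.zeros((n + 1, d + 1))
    tokens[:n, :d] = X_ctx
    tokens[:n, d] = y_ctx
    tokens[n, :d] = x_query
    return tokens

def linear_attention_predict(tokens, W_kq, W_v):
    # One linear self-attention layer (no softmax: the linear-attention variant),
    # reading the prediction off the label slot of the query token after the update.
    scores = tokens @ W_kq @ tokens.T
    out = tokens + scores @ (tokens @ W_v)
    return out[-1, -1]

rng = np.random.default_rng(0)
d, n = 5, 20
X_ctx = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y_ctx = np.sign(X_ctx @ w_star)                  # a random linear classification task
tokens = encode_prompt(X_ctx, y_ctx, rng.normal(size=d))
W_kq = rng.normal(size=(d + 1, d + 1)) * 0.01    # untrained, illustrative parameters
W_v = rng.normal(size=(d + 1, d + 1)) * 0.01
print(linear_attention_predict(tokens, W_kq, W_v))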

NeurIPS Conference 2025 Conference Paper

Transformers are almost optimal metalearners for linear classification

  • Roey Magen
  • Gal Vardi

Transformers have demonstrated impressive in-context learning (ICL) capabilities, raising the question of whether they can serve as metalearners that adapt to new tasks using only a small number of in-context examples, without any further training. While recent theoretical work has studied transformers' ability to perform ICL, most of these analyses do not address the formal metalearning setting, where the objective is to solve a collection of related tasks more efficiently than would be possible by solving each task individually. In this paper, we provide the first theoretical analysis showing that a simplified transformer architecture trained via gradient descent can act as a near-optimal metalearner in a linear classification setting. We consider a natural family of tasks where each task corresponds to a class-conditional Gaussian mixture model, with the mean vectors lying in a shared $k$-dimensional subspace of $\mathbb{R}^d$. After training on a sufficient number of such tasks, we show that the transformer can generalize to a new task using only $\widetilde{O}(k / \widetilde{R}^4)$ in-context examples, where $\widetilde{R}$ denotes the signal strength at test time. This performance (almost) matches that of an optimal learner that knows exactly the shared subspace and significantly outperforms any learner that only has access to the in-context data, which requires $\Omega(d / \widetilde{R}^4)$ examples to generalize.
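
A back-of-the-envelope comparison of the two sample-complexity scalings above, ignoring constants and logarithmic factors; the numbers for $d$, $k$ and $\widetilde{R}$ are illustrative:

# In-context examples needed at test time, up to constants and log factors:
#   trained transformer (shared k-dimensional subspace): ~ k / R**4
#   learner with access only to the in-context data:     ~ d / R**4
d, k, R = 1000, 10, 1.0
print("transformer (approx):", k / R**4)            # ~10 examples
print("in-context data only (approx):", d / R**4)   # ~1000 examples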

ICLR Conference 2024 Conference Paper

An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression

  • Lijia Zhou
  • James B. Simon
  • Gal Vardi
  • Nathan Srebro

We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an ``agnostic'' view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or the target is outside the RKHS. We analyze the cost of overfitting under a Gaussian universality ansatz using recently derived (non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis provides a more refined characterization of benign, tempered and catastrophic overfitting (cf. Mallinar et al. 2022).
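
A minimal numpy sketch of the quantity being studied: the ratio between the test error of the (nearly) ridgeless interpolating solution and that of an optimally tuned ridge, for an illustrative one-dimensional Gaussian-kernel task. The target function, noise level, and bandwidth are assumptions made only for this illustration.

import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def krr_test_mse(x_tr, y_tr, x_te, f_te, lam):
    # Kernel ridge regression with ridge lam; a tiny lam stands in for the ridgeless limit.
    K = gaussian_kernel(x_tr, x_tr)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_tr)), y_tr)
    preds = gaussian_kernel(x_te, x_tr) @ alpha
    return np.mean((preds - f_te) ** 2)

rng = np.random.default_rng(0)
n, noise = 50, 0.5
x_tr = rng.uniform(-3, 3, n)
y_tr = np.sin(x_tr) + noise * rng.normal(size=n)   # noisy training labels
x_te = np.linspace(-3, 3, 400)
f_te = np.sin(x_te)                                # clean targets for the test error

ridgeless = krr_test_mse(x_tr, y_tr, x_te, f_te, lam=1e-8)
tuned = min(krr_test_mse(x_tr, y_tr, x_te, f_te, lam=l) for l in np.logspace(-6, 2, 30))
print("cost of overfitting (approx):", ridgeless / tuned)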

ICLR Conference 2024 Conference Paper

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

  • Zhiwei Xu
  • Yutong Wang 0002
  • Spencer Frei
  • Gal Vardi
  • Wei Hu

Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting---achieving a perfect fit to training data with near-random performance on test data---before transitioning (''grokking'') to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a ''grokking'' phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps.
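
A minimal sketch of the data model and architecture from the abstract: XOR cluster data with a constant fraction of flipped labels, and the forward pass of a two-layer ReLU network. The cluster separation, flip rate, width, and initialization scale are illustrative assumptions.

import numpy as np

def xor_cluster_data(n, d, flip_frac=0.15, sep=3.0, seed=0):
    # Four Gaussian clusters centered at +/- mu1 (label +1) and +/- mu2 (label -1),
    # i.e. an XOR-type, non-linearly-separable distribution, with some labels flipped.
    rng = np.random.default_rng(seed)
    mu1 = np.zeros(d); mu1[0] = sep
    mu2 = np.zeros(d); mu2[1] = sep
    centers = np.stack([mu1, -mu1, mu2, -mu2])
    labels = np.array([1.0, 1.0, -1.0, -1.0])
    idx = rng.integers(0, 4, size=n)
    X = centers[idx] + rng.normal(size=(n, d))
    y = labels[idx]
    flip = rng.random(n) < flip_frac
    y[flip] *= -1.0
    return X, y

def two_layer_relu(X, W, a):
    # f(x) = sum_j a_j * relu(<w_j, x>)
    return np.maximum(X @ W.T, 0.0) @ a

n, d, width = 200, 50, 100
X, y = xor_cluster_data(n, d)
rng = np.random.default_rng(1)
W = rng.normal(size=(width, d)) * 0.01
a = rng.choice([-1.0, 1.0], size=width) / width
print(two_layer_relu(X, W, a).shape)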

ICLR Conference 2024 Conference Paper

Noisy Interpolation Learning with Shallow Univariate ReLU Networks

  • Nirmit Joshi
  • Gal Vardi
  • Nathan Srebro

Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et al. (2022) noted that neural networks seem to often exhibit ``tempered overfitting'', wherein the population risk does not converge to the Bayes optimal error, but neither does it approach infinity, yielding non-trivial generalization. However, this has not been studied rigorously. We provide the first rigorous analysis of the overfitting behaviour of regression with minimum norm ($\ell_2$ of weights), focusing on univariate two-layer ReLU networks. We show overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al., and overfitting is catastrophic with respect to the $L_2$ loss, or when taking an expectation over the training set.

NeurIPS Conference 2024 Conference Paper

Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality

  • Marko Medvedev
  • Gal Vardi
  • Nathan Srebro

We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, we show that even with varying or tuned bandwidth, the ridgeless solution is never consistent and, at least with large enough noise, always worse than the null predictor. For increasing dimension, we give a generic characterization of the overfitting behavior for any scaling of the dimension with sample size. We use this to provide the first example of benign overfitting using the Gaussian kernel with sub-polynomial scaling of the dimension. All our results are under the Gaussian universality ansatz and the (non-rigorous) risk predictions in terms of the kernel eigenstructure.

NeurIPS Conference 2024 Conference Paper

Provable Tempered Overfitting of Minimal Nets and Typical Nets

  • Itamar Harel
  • William M. Hoza
  • Gal Vardi
  • Itay Evron
  • Nathan Srebro
  • Daniel Soudry

We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension.

NeurIPS Conference 2023 Conference Paper

Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces

  • Odelia Melamed
  • Gilad Yehudai
  • Gal Vardi

Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained using data which lie on a low dimensional linear subspace. We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogonal to the data subspace, and are susceptible to small adversarial $L_2$-perturbations in these directions. Moreover, we show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.

NeurIPS Conference 2023 Conference Paper

Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy

  • Amit Daniely
  • Nati Srebro
  • Gal Vardi

Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtaining efficient algorithms. Moreover, it was previously shown that depth-$2$ networks can be efficiently learned under the assumptions that the input distribution is Gaussian and the weight matrix is non-degenerate. In this work, we study whether such assumptions may suffice for learning deeper networks and prove negative results. We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework, where random noise is added to the network's parameters. This implies that learning depth-$3$ ReLU networks under the Gaussian distribution is hard even if the weight matrices are non-degenerate. Moreover, we consider depth-$2$ networks, and show hardness of learning in the smoothed-analysis framework, where both the network parameters and the input distribution are smoothed. Our hardness results are under a well-studied assumption on the existence of local pseudorandom generators.

NeurIPS Conference 2023 Conference Paper

Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses

  • Gon Buzaglo
  • Niv Haim
  • Gilad Yehudai
  • Gal Vardi
  • Yakir Oz
  • Yaniv Nikankin
  • Michal Irani

Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. 2022 proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability. Code: https://github.com/gonbuzaglo/decoreco

ICLR Conference 2023 Conference Paper

Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

  • Spencer Frei
  • Gal Vardi
  • Peter L. Bartlett
  • Nathan Srebro
  • Wei Hu

The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an $\ell_2$-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.

NeurIPS Conference 2023 Conference Paper

Most Neural Networks Are Almost Learnable

  • Amit Daniely
  • Nati Srebro
  • Gal Vardi

We present a PTAS for learning random constant-depth networks. We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier networks of depth $i$, up to an additive error of $\epsilon$. The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar d$ is the size of the network. For some cases of sigmoid and ReLU-like activations the bound can be improved to $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$, resulting in a quasi-poly-time algorithm for learning constant depth random networks.

NeurIPS Conference 2023 Conference Paper

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

  • Spencer Frei
  • Gal Vardi
  • Peter Bartlett
  • Nati Srebro

In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are vulnerable to adversarial examples. Our results hold even in cases where the network is highly overparameterized. Despite the potential for harmful overfitting in such settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial $\ell_2$-perturbations), even though robust networks that fit the data exist.

NeurIPS Conference 2022 Conference Paper

Gradient Methods Provably Converge to Non-Robust Networks

  • Gal Vardi
  • Gilad Yehudai
  • Ohad Shamir

Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-$2$ ReLU networks trained with gradient flow are provably non-robust (susceptible to small adversarial $\ell_2$-perturbations), even when robust networks that classify the training dataset correctly exist. Perhaps surprisingly, we show that the well-known implicit bias towards margin maximization induces bias towards non-robust networks, by proving that every network which satisfies the KKT conditions of the max-margin problem is non-robust.

NeurIPS Conference 2022 Conference Paper

On Margin Maximization in Linear and ReLU Networks

  • Gal Vardi
  • Ohad Shamir
  • Nati Srebro

The implicit bias of neural networks has been extensively studied in recent years. Lyu and Li (2019) showed that in homogeneous networks trained with the exponential or the logistic loss, gradient flow converges to a KKT point of the max margin problem in parameter space. However, that leaves open the question of whether this point will generally be an actual optimum of the max margin problem. In this paper, we study this question in detail, for several neural network architectures involving linear and ReLU activations. Perhaps surprisingly, we show that in many cases, the KKT point is not even a local optimum of the max margin problem. On the flip side, we identify multiple settings where a local or global optimum can be guaranteed.
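
For reference, an informal restatement (following Lyu and Li (2019), and not part of the abstract) of the max-margin problem in parameter space for a homogeneous network $f(\theta; x)$ with labels $y_i \in \{\pm 1\}$:

$\min_{\theta} \tfrac{1}{2}\|\theta\|^2 \quad \text{subject to} \quad y_i f(\theta; x_i) \ge 1 \text{ for all } i.$

A KKT point of this problem is a feasible $\theta$ admitting multipliers $\lambda_i \ge 0$ with

$\theta = \sum_i \lambda_i y_i \nabla_\theta f(\theta; x_i) \quad \text{and} \quad \lambda_i \big(y_i f(\theta; x_i) - 1\big) = 0 \text{ for all } i,$

where the gradients are read as Clarke subgradients in the non-smooth ReLU case.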

NeurIPS Conference 2022 Conference Paper

On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias

  • Itay Safran
  • Gal Vardi
  • Jason D. Lee

We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting. We show that when the labels are determined by the sign of a target network with $r$ neurons, with high probability over the initialization of the network and the sampling of the dataset, GF converges in direction (suitably defined) to a network achieving perfect training accuracy and having at most $\mathcal{O}(r)$ linear regions, implying a generalization bound. Unlike many other results in the literature, under an additional assumption on the distribution of the data, our result holds even for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.

ICLR Conference 2022 Conference Paper

On the Optimal Memorization Power of ReLU Neural Networks

  • Gal Vardi
  • Gilad Yehudai
  • Ohad Shamir

We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any $N$ points that satisfy a mild separability assumption using $\tilde{O}\left(\sqrt{N}\right)$ parameters. Known VC-dimension upper bounds imply that memorizing $N$ samples requires $\Omega(\sqrt{N})$ parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by $1 \leq L \leq \sqrt{N}$, for memorizing $N$ samples using $\tilde{O}(N/L)$ parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.

NeurIPS Conference 2022 Conference Paper

Reconstructing Training Data From Trained Neural Networks

  • Niv Haim
  • Gal Vardi
  • Gilad Yehudai
  • Ohad Shamir
  • Michal Irani

Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some cases a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier. We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods. To the best of our knowledge, our results are the first to show that reconstructing a large portion of the actual training samples from a trained neural network classifier is generally possible. This has negative implications on privacy, as it can be used as an attack for revealing sensitive training data. We demonstrate our method for binary MLP classifiers on a few standard computer vision datasets.

NeurIPS Conference 2022 Conference Paper

The Sample Complexity of One-Hidden-Layer Neural Networks

  • Gal Vardi
  • Ohad Shamir
  • Nati Srebro

We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm. We begin by proving that in general, controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees (independent of the network width), while a stronger Frobenius norm control is sufficient, extending and improving on previous work. Motivated by the proof constructions, we identify and analyze two important settings where (perhaps surprisingly) a mere spectral norm control turns out to be sufficient: First, when the network's activation functions are sufficiently smooth (with the result extending to deeper networks); and second, for certain types of convolutional networks. In the latter setting, we study how the sample complexity is additionally affected by parameters such as the amount of overlap between patches and the overall number of patches.

NeurIPS Conference 2021 Conference Paper

Learning a Single Neuron with Bias Using Gradient Descent

  • Gal Vardi
  • Gilad Yehudai
  • Ohad Shamir

We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x}\mapsto \sigma(\langle\mathbf{w}, \mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent. Perhaps surprisingly, we show that this is a significantly different and more challenging problem than the bias-less case (which was the focus of previous works on single neurons), both in terms of the optimization geometry as well as the ability of gradient methods to succeed in some scenarios. We provide a detailed study of this problem, characterizing the critical points of the objective, demonstrating failure cases, and providing positive convergence guarantees under different sets of assumptions. To prove our results, we develop some tools which may be of independent interest, and improve previous results on learning single neurons.
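
A minimal sketch of the learning problem above: gradient descent on the empirical squared loss of a ReLU neuron with a bias term, on realizable data generated by a ground-truth neuron. The loss, step size, and data distribution are illustrative assumptions; convergence is not guaranteed, which is part of what the paper analyzes.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gd_single_neuron(X, y, steps=1000, lr=0.05, seed=0):
    # Gradient descent on (1 / 2n) * sum_i (relu(<w, x_i> + b) - y_i)^2.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = rng.normal(size=d) * 0.1, 0.0
    for _ in range(steps):
        pre = X @ w + b
        err = relu(pre) - y
        act = (pre > 0).astype(float)        # ReLU derivative (0 used at the kink)
        grad_w = X.T @ (err * act) / n
        grad_b = np.mean(err * act)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

rng = np.random.default_rng(1)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_star, b_star = rng.normal(size=d), -0.5    # ground-truth neuron (realizable setting)
y = relu(X @ w_star + b_star)
w, b = gd_single_neuron(X, y)
print("final training loss:", 0.5 * np.mean((relu(X @ w + b) - y) ** 2))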

NeurIPS Conference 2020 Conference Paper

Hardness of Learning Neural Networks with Natural Weights

  • Amit Daniely
  • Gal Vardi

Neural networks are nowadays highly successful despite strong hardness results. The existing hardness results focus on the network architecture, and assume that the network's weights are arbitrary. A natural approach to settle the discrepancy is to assume that the network's weights are ``well-behaved'' and possess some generic properties that may allow efficient learning. This approach is supported by the intuition that the weights in real-world networks are not arbitrary, but exhibit some ``random-like'' properties with respect to some ``natural'' distributions. We prove negative results in this regard, and show that for depth-$2$ networks, and many ``natural'' weight distributions such as the normal and the uniform distribution, most networks are hard to learn. Namely, there is no efficient learning algorithm that is provably successful for most weights, and every input distribution. This implies that there is no generic property that holds with high probability in such random networks and allows efficient learning.

NeurIPS Conference 2020 Conference Paper

Neural Networks with Small Weights and Depth-Separation Barriers

  • Gal Vardi
  • Ohad Shamir

In studying the expressiveness of neural networks, an important question is whether there are functions which can only be approximated by sufficiently deep networks, assuming their size is bounded. However, for constant depths, existing results are limited to depths $2$ and $3$, and achieving results for higher depths has been an important open question. In this paper, we focus on feedforward ReLU networks, and prove fundamental barriers to proving such results beyond depth $4$, by reduction to open problems and natural-proof barriers in circuit complexity. To show this, we study a seemingly unrelated problem of independent interest: Namely, whether there are polynomially-bounded functions which require super-polynomial weights in order to approximate with constant-depth neural networks. We provide a negative and constructive answer to that question, by showing that if a function can be approximated by a polynomially-sized, constant depth $k$ network with arbitrarily large weights, it can also be approximated by a polynomially-sized, depth $3k+3$ network, whose weights are polynomially bounded.

JAAMAS Journal 2019 Journal Article

Multi-player flow games

  • Shibashis Guha
  • Orna Kupferman
  • Gal Vardi

In the traditional maximum-flow problem, the goal is to transfer maximum flow in a network by directing, in each vertex in the network, incoming flow to outgoing edges. The problem corresponds to settings in which a central authority has control over all vertices of the network. Today’s computing environment, however, involves systems with no central authority. In particular, in many applications of flow networks, the vertices correspond to decision-points controlled by different and selfish entities. For example, in communication networks, routers may belong to different companies, with different destination objectives. This suggests that the maximum-flow problem should be revisited, and examined from a game-theoretic perspective. We introduce and study multi-player flow games (MFGs, for short). Essentially, the vertices of an MFG are partitioned among the players, and a player that owns a vertex directs the flow that reaches it. Each player has a different target vertex, and the objective of each player is to maximize the flow that reaches her target vertex. We study the stability of MFGs and show that, unfortunately, an MFG need not have a Nash Equilibrium. Moreover, the price of anarchy and even the price of stability of MFGs are unbounded. That is, the reduction in the flow due to selfish behavior is unbounded. We study the problem of deciding whether a given MFG has a Nash Equilibrium and show that it is $\Sigma_2^P$-complete, as well as the problem of finding optimal strategies for the players (that is, best-response moves), which we show to be NP-complete. We continue with some good news and consider a variant of MFGs in which flow may be swallowed; for example, routers in a communication network may drop messages. We show that, surprisingly, while this model seems to incentivize selfish behavior, a Nash Equilibrium that achieves the maximum flow always exists, and can be found in polynomial time. Finally, we consider MFGs in which the strategies of the players may use non-integral flows, which we show to be stronger.

AAMAS Conference 2018 Conference Paper

Multi-Player Flow Games

  • Shibashis Guha
  • Orna Kupferman
  • Gal Vardi

In the traditional maximum-flow problem, the goal is to transfer maximum flow in a network by directing, in each vertex in the network, incoming flow to outgoing edges. The problem corresponds to settings in which a central authority has control over all vertices of the network. Today’s computing environment, however, involves systems with no central authority. In particular, in many applications of flow networks, the vertices correspond to decision-points controlled by different and selfish entities. For example, in communication networks, routers may belong to different companies, with different destination objectives. This suggests that the maximum-flow problem should be revisited, and examined from a game-theoretic perspective. We introduce and study multi-player flow games (MFGs, for short). Essentially, the vertices of an MFG are partitioned among the players, and a player that owns a vertex directs the flow that reaches it. Each player has a different target vertex, and the objective of each player is to maximize the flow that reaches her target vertex. We study the stability of MFGs and show that, unfortunately, an MFG need not have a Nash Equilibrium. Moreover, the Price of Anarchy and even the Price of Stability of MFGs are unbounded. That is, the reduction in the flow due to selfish behavior is unbounded. We study the problem of deciding whether a given MFG has a Nash Equilibrium and show that it is $\Sigma_2^P$-complete, as well as the problem of finding optimal strategies for the players (that is, best-response moves), which we show to be NP-complete. We continue with some good news and consider a variant of MFGs in which flow may be swallowed; for example, routers in a communication network may drop messages. We show that, surprisingly, while this model seems to incentivize selfish behavior, a Nash Equilibrium that achieves the maximum flow always exists, and can be found in polynomial time. Finally, we consider MFGs in which the strategies of the players may use non-integral flows, which we show to be stronger.

MFCS Conference 2018 Conference Paper

Spanning-Tree Games

  • Dan Hefetz
  • Orna Kupferman
  • Amir Lellouche
  • Gal Vardi

We introduce and study a game variant of the classical spanning-tree problem. Our spanning-tree game is played between two players, min and max, who alternate turns in jointly constructing a spanning tree of a given connected weighted graph G. Starting with the empty graph, in each turn a player chooses an edge that does not close a cycle in the forest that has been generated so far and adds it to that forest. The game ends when the chosen edges form a spanning tree in G. The goal of min is to minimize the weight of the resulting spanning tree and the goal of max is to maximize it. A strategy for a player is a function that maps each forest in G to an edge that is not yet in the forest and does not close a cycle. We show that while in the classical setting a greedy approach is optimal, the game setting is more complicated: greedy strategies, namely ones that choose in each turn the lightest (min) or heaviest (max) legal edge, are not necessarily optimal, and calculating their values is NP-hard. We study the approximation ratio of greedy strategies. We show that while a greedy strategy for min guarantees nothing, the performance of a greedy strategy for max is satisfactory: it guarantees that the weight of the generated spanning tree is at least w(MST(G))/2, where w(MST(G)) is the weight of a maximum spanning tree in G, and its approximation ratio with respect to an optimal strategy for max is 1.5+1/w(MST(G)), assuming weights in [0, 1]. We also show that these bounds are tight. Moreover, in a stochastic setting, where weights for the complete graph K_n are chosen at random from [0, 1], the expected performance of greedy strategies is asymptotically optimal. Finally, we study some variants of the game and study an extension of our results to games on general matroids.

MFCS Conference 2016 Conference Paper

Eulerian Paths with Regular Constraints

  • Orna Kupferman
  • Gal Vardi

Labeled graphs, in which edges are labeled by letters from some alphabet $\Sigma$, are extensively used to model many types of relations associated with actions, costs, owners, or other properties. Each path in a labeled graph induces a word in $\Sigma^*$ -- the one obtained by concatenating the letters along the edges in the path. Classical graph-theory problems give rise to new problems that take these words into account. We introduce and study the constrained Eulerian path problem. The input to the problem is a $\Sigma$-labeled graph $G$ and a specification $L \subseteq \Sigma^*$. The goal is to find an Eulerian path in $G$ that satisfies $L$. We consider several classes of the problem, defined by the classes of $G$ and $L$. We focus on the case where $L$ is regular and show that while the problem is in general NP-complete, even for very simple graphs and specifications, there are classes that can be solved efficiently. Our results extend work on Eulerian paths with edge-order constraints. We also study the constrained Chinese postman problem, where edges have costs and the goal is to find a cheapest path that contains each edge at least once and satisfies the specification. Finally, we define and study the Eulerian language of a graph, namely the set of words along its Eulerian paths.

CSL Conference 2015 Conference Paper

On Relative and Probabilistic Finite Counterability

  • Orna Kupferman
  • Gal Vardi

A counterexample to the satisfaction of a linear property $\psi$ in a system $S$ is an infinite computation of $S$ that violates $\psi$. Counterexamples are of great help in detecting design errors and in modeling methodologies such as CEGAR. When $\psi$ is a safety property, a counterexample to its satisfaction need not be infinite. Rather, it is a bad-prefix for $\psi$: a finite word all of whose extensions violate $\psi$. The existence of finite counterexamples is very helpful in practice. Liveness properties do not have bad-prefixes and thus do not have finite counterexamples. We extend the notion of finite counterexamples to non-safety properties. We study counterable languages - ones that have at least one bad-prefix. Thus, a language is counterable iff it is not a liveness language. Three natural problems arise: (1) Given a language, decide whether it is counterable, (2) study the length of minimal bad-prefixes for counterable languages, and (3) develop algorithms for detecting bad-prefixes for counterable languages. We solve the problems for languages given by means of LTL formulas or nondeterministic Büchi automata. In particular, our EXPSPACE-completeness proof for the problem of deciding whether a given LTL formula is counterable, and hence also for deciding whether it is liveness, settles a long-standing open problem. We also make finite counterexamples more relevant and helpful by introducing two variants of the traditional definition of bad-prefixes. The first adds a probabilistic component to the definition. There, a prefix is bad if almost all its extensions violate the property. The second makes it relative to the system. There, a prefix is bad if all its extensions in the system violate the property. We also study the combination of the probabilistic and relative variants. Our framework also suggests new variants of safety and liveness languages. We solve the above three problems for the different variants. Interestingly, the probabilistic variant not only increases the chances of returning finite counterexamples, but also makes the solution of the three problems exponentially easier.