SODA Conference 2025 Conference Paper
Lipschitz Continuous Algorithms for Covering Problems
- Soh Kumabe
- Yuichi Yoshida
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
SODA Conference 2025 Conference Paper
NeurIPS Conference 2023 Conference Paper
We introduce a transformation framework that can be utilized to develop online algorithms with low $\epsilon$-approximate regret in the random-order model from offline approximation algorithms. We first give a general reduction theorem that transforms an offline approximation algorithm with low average sensitivity to an online algorithm with low $\epsilon$-approximate regret. We then demonstrate that offline approximation algorithms can be transformed into a low-sensitivity version using a coreset construction method. To showcase the versatility of our approach, we apply it to various problems, including online $(k, z)$-clustering, online matrix approximation, and online regression, and successfully achieve polylogarithmic $\epsilon$-approximate regret for each problem. Moreover, we show that in all three cases, our algorithm also enjoys low inconsistency, which may be desired in some online applications.
ICLR Conference 2023 Conference Paper
A decision tree is a fundamental model used in data mining and machine learning. In practice, the training data used to construct a decision tree may change over time or contain noise, and a drastic change in the learned tree structure owing to such data perturbation is unfavorable. For example, in data mining, a change in the tree implies a change in the extracted knowledge, which raises the question of whether the extracted knowledge is truly reliable or is only a noisy artifact. To alleviate this issue, we design decision tree learning algorithms that are stable against insignificant perturbations in the training data. Specifically, we adopt the notion of average sensitivity as a stability measure, and design an algorithm with low average sensitivity that outputs a decision tree whose accuracy is close to the optimal decision tree. The experimental results on real-world datasets demonstrate that the proposed algorithm enables users to select suitable decision trees considering the trade-off between average sensitivity and accuracy.
ICML Conference 2023 Conference Paper
Variational autoencoders (VAEs) are one of the deep generative models that have experienced enormous success over the past decades. However, in practice, they suffer from a problem called posterior collapse, which occurs when the posterior distribution coincides, or collapses, with the prior taking no information from the latent structure of the input data into consideration. In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee. We also illustrate the effectiveness of our method through several numerical experiments.
FOCS Conference 2023 Conference Paper
Graph algorithms are widely used for decision making and knowledge discovery. To ensure their effectiveness, it is essential that their output remains stable even when subjected to small perturbations to the input because frequent output changes can result in costly decisions, reduced user trust, potential security concerns, and lack of replicability. In this study, we consider the Lipschitz continuity of algorithms as a stability measure and initiate a systematic study of the Lipschitz continuity of algorithms for (weighted) graph problems. Depending on how we embed the output solution to a metric space, we can think of several Lipschitzness notions. We mainly consider the one that is invariant under scaling of weights, and we provide Lipschitz continuous algorithms and lower bounds for the minimum spanning tree problem, the shortest path problem, and the maximum weight matching problem. In particular, our shortest path algorithm is obtained by first designing an algorithm for unweighted graphs that are robust against edge contractions and then applying it to the unweighted graph constructed from the original weighted graph. Then, we consider another Lipschitzness notion induced by a natural mapping from the output solution to its characteristic vector. It turns out that no Lipschitz continuous algorithm exists for this Lipschitz notion, and we instead design algorithms with bounded pointwise Lipschitz constants for the minimum spanning tree problem and the maximum weight bipartite matching problem. Our algorithm for the latter problem is based on an LP relaxation with entropy regularization.
SODA Conference 2023 Conference Paper
SODA Conference 2023 Conference Paper
SODA Conference 2022 Conference Paper
When processing data with uncertainty, it is desirable that the output of the algorithm is stable against small perturbations in the input. Varma and Yoshida [SODA'21] recently formalized this idea and proposed the notion of average sensitivity of algorithms, which is roughly speaking, the average Hamming distance between solutions for the original input and that obtained by deleting one element from the input, where the average is taken over the deleted element. In this work, we consider average sensitivity of algorithms for problems that can be solved by dynamic programming. We first present a (1– δ )-approximation algorithm for finding a maximum weight chain (MWC) in a transitive directed acyclic graph with average sensitivity O ( δ –1 log 3 n ), where n is the number of vertices in the graph. We then show algorithms with small average sensitivity for various dynamic programming problems by reducing them to the MWC problem while preserving average sensitivity, including the longest increasing subsequence problem, the interval scheduling problem, the longest common subsequence problem, the longest palindromic subsequence problem, the knapsack problem with integral weight, and the RNA folding problem. For the RNA folding problem, our reduction is highly nontrivial because a naive reduction generates an exponentially large graph, which only provides a trivial average sensitivity bound.
NeurIPS Conference 2022 Conference Paper
Given a set of $n$ points in $\mathbb{R}^d$, the goal of Euclidean $(k, \ell)$-clustering is to find $k$ centers that minimize the sum of the $\ell$-th powers of the Euclidean distance of each point to the closest center. In practical situations, the clustering result must be stable against points missing in the input data so that we can make trustworthy and consistent decisions. To address this issue, we consider the average sensitivity of Euclidean $(k, \ell)$-clustering, which measures the stability of the output in total variation distance against deleting a random point from the input data. We first show that a popular algorithm \textsc{$k$-means++} and its variant called \textsc{$D^\ell$-sampling} have low average sensitivity. Next, we show that any approximation algorithm for Euclidean $(k, \ell)$-clustering can be transformed to an algorithm with low average sensitivity while almost preserving the approximation guarantee. As byproducts of our results, we provide several algorithms for consistent $(k, \ell)$-clustering and dynamic $(k, \ell)$-clustering in the random-order model, where the input points are randomly permuted and given in an online manner. The goal of the consistent setting is to maintain a good solution while minimizing the number of changes to the solution during the process, and that of the dynamic setting is to maintain a good solution while minimizing the (amortized) update time.
TCS Journal 2022 Journal Article
AAAI Conference 2022 Conference Paper
Submodular functions are at the core of many machine learning and data mining tasks. The underlying submodular functions for many of these tasks are decomposable, i. e. , they are sum of several simple submodular functions. In many data intensive applications, however, the number of underlying submodular functions in the original function is so large that we need prohibitively large amount of time to process it and/or it does not even fit in the main memory. To overcome this issue, we introduce the notion of sparsification for decomposable submodular functions whose objective is to obtain an accurate approximation of the original function that is a (weighted) sum of only a few submodular functions. Our main result is a polynomial-time randomized sparsification algorithm such that the expected number of functions used in the output is independent of the number of underlying submodular functions in the original function. We also study the effectiveness of our algorithm under various constraints such as matroid and cardinality constraints. We complement our theoretical analysis with an empirical study of the performance of our algorithm.
SODA Conference 2021 Conference Paper
In modern applications of graph algorithms, where the graphs of interest are large and dynamic, it is unrealistic to assume that an input representation contains the full information of a graph being studied. Hence, it is desirable to use algorithms that, even when provided with only a (large) subgraph, output solutions that are close to the solutions output when the whole graph is available. We formalize this feature by introducing the notion of average sensitivity of graph algorithms, which is the average earth mover's distance between the output distributions of an algorithm on a graph and its subgraph obtained by removing an edge, where the average is over the edges removed and the distance between two outputs is the Hamming distance. In this work, we initiate a systematic study of average sensitivity. After deriving basic properties of average sensitivity such as composition, we provide efficient approximation algorithms with low average sensitivities for concrete graph problems, including the minimum spanning forest problem, the global minimum cut problem, the minimum s-t cut problem, and the maximum matching problem. In addition, we prove that the average sensitivity of our global minimum cut algorithm is almost optimal, by showing a nearly matching lower bound. We also show that every algorithm for the 2-coloring problem has average sensitivity linear in the number of vertices. One of the main ideas involved in designing our algorithms with low average sensitivity is the following fact; if the presence of a vertex or an edge in the solution output by an algorithm can be decided locally, then the algorithm has a low average sensitivity, allowing us to reuse the analyses of known sublinear-time algorithms and local computation algorithms. Using this fact in conjunction with our average sensitivity lower bound for 2-coloring, we show that every local computation algorithm for 2-coloring has query complexity linear in the number of vertices, thereby answering an open question.
IJCAI Conference 2021 Conference Paper
We present a polynomial-time online algorithm for maximizing the conditional value at risk (CVaR) of a monotone stochastic submodular function. Given T i. i. d. samples from an underlying distribution arriving online, our algorithm produces a sequence of solutions that converges to a (1−1/e)-approximate solution with a convergence rate of O(T −1/4 ) for monotone continuous DR-submodular functions. Compared with previous offline algorithms, which require Ω(T) space, our online algorithm only requires O( √ T) space. We extend our on- line algorithm to portfolio optimization for mono- tone submodular set functions under a matroid constraint. Experiments conducted on real-world datasets demonstrate that our algorithm can rapidly achieve CVaRs that are comparable to those obtained by existing offline algorithms.
TCS Journal 2021 Journal Article
FOCS Conference 2021 Conference Paper
Graph sparsification has been studied extensively over the past two decades, culminating in spectral sparsifiers of optimal size (up to constant factors). Spectral hypergraph sparsification is a natural analogue of this problem, for which optimal bounds on the sparsifier size are not known, mainly because the hypergraph Laplacian is non-linear, and thus lacks the linear-algebraic structure and tools that have been so effective for graphs. Our main contribution is the first algorithm for constructing $\epsilon$ -spectral sparsifiers for hypergraphs with $O^{\ast}(n)$ hyperedges, where $O^{\ast}$ suppresses $(\epsilon^{-1}\log n)^{O(1)}$ factors. This bound is independent of the rank $r$ (maximum cardinality of a hyperedge), and is essentially best possible due to a recent bit complexity lower bound of $\Omega(nr)$ for hypergraph sparsification. This result is obtained by introducing two new tools. First, we give a new proof of spectral concentration bounds for sparsifiers of graphs; it avoids linear-algebraic methods, replacing e. g. the usual application of the matrix Bernstein inequality and therefore applies to the (non-linear) hypergraph setting. To achieve the result, we design a new sequence of hypergraph-dependent $\epsilon$ -nets on the unit sphere in $\mathbb{R}^{n}$. Second, we extend the weight-assignment technique of Chen, Khanna and Nagda [FOCS'20] to the spectral sparsification setting. Surprisingly, the number of spanning trees after the weight assignment can serve as a potential function guiding the reweighting process in the spectral setting.
STOC Conference 2021 Conference Paper
Cut and spectral sparsification of graphs have numerous applications, including e.g. speeding up algorithms for cuts and Laplacian solvers. These powerful notions have recently been extended to hypergraphs, which are much richer and may offer new applications. However, the current bounds on the size of hypergraph sparsifiers are not as tight as the corresponding bounds for graphs. Our first result is a polynomial-time algorithm that, given a hypergraph on n vertices with maximum hyperedge size r , outputs an є-spectral sparsifier with O * ( nr ) hyperedges, where O * suppresses (є −1 log n ) O (1) factors. This size bound improves the two previous bounds: O * ( n 3 ) [Soma and Yoshida, SODA’19] and O * ( nr 3 ) [Bansal, Svensson and Trevisan, FOCS’19]. Our main technical tool is a new method for proving concentration of the nonlinear analogue of the quadratic form of the Laplacians for hypergraph expanders. We complement this with lower bounds on the bit complexity of any compression scheme that (1+є)-approximates all the cuts in a given hypergraph, and hence also on the bit complexity of every є-cut/spectral sparsifier. These lower bounds are based on Ruzsa-Szemerédi graphs, and a particular instantiation yields an Ω( nr ) lower bound on the bit complexity even for fixed constant є. In the case of hypergraph cut sparsifiers, this is tight up to polylogarithmic factors in n , due to recent result of [Chen, Khanna and Nagda, FOCS’20]. For spectral sparsifiers it narrows the gap to O * ( r ). Finally, for directed hypergraphs, we present an algorithm that computes an є-spectral sparsifier with O * ( n 2 r 3 ) hyperarcs, where r is the maximum size of a hyperarc. For small r , this improves over O * ( n 3 ) known from [Soma and Yoshida, SODA’19], and is getting close to the trivial lower bound of Ω( n 2 ) hyperarcs.
ICML Conference 2020 Conference Paper
The problem of maximizing nonnegative monotone submodular functions under a certain constraint has been intensively studied in the last decade, and a wide range of efficient approximation algorithms have been developed for this problem. Many machine learning problems, including data summarization and influence maximization, can be naturally modeled as the problem of maximizing monotone submodular functions. However, when such applications involve sensitive data about individuals, their privacy concerns should be addressed. In this paper, we study the problem of maximizing monotone submodular functions subject to matroid constraints in the framework of differential privacy. We provide $(1-\frac{1}{\mathrm{e}})$-approximation algorithm which improves upon the previous results in terms of approximation guarantee. This is done with an almost cubic number of function evaluations in our algorithm. Moreover, we study $k$-submodularity, a natural generalization of submodularity. We give the first $\frac{1}{2}$-approximation algorithm that preserves differential privacy for maximizing monotone $k$-submodular functions subject to matroid constraints. The approximation ratio is asymptotically tight and is obtained with an almost linear number of function evaluations.
NeurIPS Conference 2020 Conference Paper
We propose novel algorithms with first- and second-order regret bounds for adversarial linear bandits. These regret bounds imply that our algorithms perform well when there is an action achieving a small cumulative loss or the loss has a small variance. In addition, we need only assumptions weaker than those of existing algorithms; our algorithms work on discrete action sets as well as continuous ones without a priori knowledge about losses, and they run efficiently if a linear optimization oracle for the action set is available. These results are obtained by combining optimistic online optimization, continuous multiplicative weight update methods, and a novel technique that we refer to as distribution truncation. We also show that the regret bounds of our algorithms are tight up to polylogarithmic factors.
SODA Conference 2019 Conference Paper
The Cheeger inequality for undirected graphs, which relates the conductance of an undirected graph and the second smallest eigenvalue of its normalized Laplacian, is a cornerstone of spectral graph theory. The Cheeger inequality has been extended to directed graphs and hypergraphs using normalized Laplacians for those, that are no longer linear but piecewise linear transformations. In this paper, we introduce the notion of a submodular transformation F: {0, 1} n → ℝ m, which applies m submodular functions to the n -dimensional input vector, and then introduce the notions of its Laplacian and normalized Laplacian. With these notions, we unify and generalize the existing Cheeger inequalities by showing a Cheeger inequality for submodular transformations, which relates the conductance of a submodular transformation and the smallest non-trivial eigenvalue of its normalized Laplacian. This result recovers the Cheeger inequalities for undirected graphs, directed graphs, and hypergraphs, and derives novel Cheeger inequalities for mutual information and directed information. Computing the smallest non-trivial eigenvalue of a normalized Laplacian of a submodular transformation is NP-hard under the small set expansion hypothesis. In this paper, we present a polynomial-time O (log n )-approximation algorithm for the symmetric case, which is tight, and a polynomial-time O (log 2 n + log n · log m )-approximation algorithm for the general case. We expect the algebra concerned with submodular transformations, or submodular algebra, to be useful in the future not only for generalizing spectral graph theory but also for analyzing other problems that involve piecewise linear transformations, e. g. , deep learning.
SODA Conference 2019 Conference Paper
For an undirected/directed hypergraph G = ( V, E ), its Laplacian L G: ℝ v → ℝ v is defined such that its “quadratic form” x ⊺ L G ( x ) captures the cut information of G. In particular, 1 S ⊺ L G (1 S ) coincides with the cut size of S ⊆ V, where 1 S ∊ ℝ V is the characteristic vector of S. A weighted subgraph H of a hypergraph G on a vertex set V is said to be an ∊-spectral sparsifier of G if (1 – ∊ ) x ⊺ L H ( x ) ≤ x ⊺ L G ( x ) ≤ (1 + ∊ ) x ⊺ L H ( x ) holds for every x ∊ ℝ V. In this paper, we present a polynomial-time algorithm that, given an undirected/directed hypergraph G on n vertices, constructs an ∊ -spectral sparsifier of G with O ( n 3 log n / ∊ 2 ) hyperedges/hyperarcs. The proposed spectral sparsification can be used to improve the time and space complexities of algorithms for solving problems that involve the quadratic form, such as computing the eigenvalues of L G, computing the effective resistance between a pair of vertices in G, semi-supervised learning based on L G, and cut problems on G. In addition, our sparsification result implies that any nonnegative hypernetwork type submodular function can be concisely represented by a directed hypergraph of polynomial size, even if the original representation is of exponential size. Accordingly, we show that, for any distribution, we can properly and agnostically learn nonnegative hypernetwork type submodular functions with O ( n 4 log( n / ∊ )/ ∊ 4 ) samples.
UAI Conference 2019 Conference Paper
Various regularizers inducing structured-sparsity are constructed as Lovász extensions of submodular functions. In this paper, we consider a hierarchical probabilistic model of linear regression and its kernel extension with this type of regularization, and develop a variational inference scheme for the posterior estimate on this model. We derive an upper bound on the partition function with an approximation guarantee, and then show that minimizing this bound is equivalent to the minimization of a quadratic function over the polyhedron determined by the corresponding submodular function, which can be solved efficiently by the proximal gradient algorithm. Our scheme gives a natural extension of the Bayesian Lasso model for the maximum a posteriori (MAP) estimation to a variety of regularizers inducing structured sparsity, and thus this work provides a principled way to transfer the advantages of the Bayesian formulation into those models. Finally, we investigate the empirical performance of our scheme with several Bayesian variants of widely known models such as Lasso, generalized fused Lasso, and non-overlapping group Lasso.
FOCS Conference 2018 Conference Paper
A recent trend in the design of FPT algorithms is exploiting the half-integrality of LP relaxations. In other words, starting with a half-integral optimal solution to an LP relaxation, we assign integral values to variables one-by-one by branch and bound. This technique is general and the resulting time complexity has a low dependency on the parameter. However, the time complexity often becomes a large polynomial in the input size because we need to compute half-integral optimal LP solutions. In this paper, we address this issue by providing an O(km)-time algorithm for solving the LPs arising from various FPT problems, where k is the optimal value and m is the number of edges/constraints. Our algorithm is based on interesting connections among 0/1/all constraints, which has been studied in the field of constraints satisfaction, A-path packing, which has been studied in the field of combinatorial optimization, and the LPs used in FPT algorithms. With the aid of this algorithm, we obtain linear-time FPT algorithms for various problems. The obtained running time for each problem is linear in the input size and has the current smallest dependency on the parameter. Most importantly, instead of using problem-specific approaches, we obtain all of these results by a unified approach, i. e. , the branch-and-bound framework combined with the efficient computation of half-integral LPs, which demonstrates its generality.
IJCAI Conference 2018 Conference Paper
In a cooperative game, the utility of a coalition of players is given by the characteristic function, and the goal is to find a stable value division of the total utility to the players. In real-world applications, however, multiple scenarios could exist, each of which determines a characteristic function, and which scenario is more important is unknown. To handle such situations, the notion of multi-scenario cooperative games and several solution concepts have been proposed. However, computing the value divisions in those solution concepts is intractable in general. To resolve this issue, we focus on supermodular two-scenario cooperative games in which the number of scenarios is two and the characteristic functions are supermodular and study the computational aspects of a major solution concept called the preference core. First, we show that we can compute the value division in the preference core of a supermodular two-scenario game in polynomial time. Then, we reveal the relations among preference cores with different parameters. Finally, we provide more efficient algorithms for deciding the non-emptiness of the preference core for several specific supermodular two-scenario cooperative games such as the airport game, multicast tree game, and a special case of the generalized induced subgraph game.
ICLR Conference 2018 Conference Paper
One of the challenges in the study of generative adversarial networks is the instability of its training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on CIFAR10, STL-10, and ILSVRC2012 dataset, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) is capable of generating images of better or equal quality relative to the previous training stabilization techniques.
AAAI Conference 2018 Conference Paper
Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior work on word embedding learning has used cooccurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and cooccurring in the same context. We extend the notion of cooccurrences to cover k(≥2)-way co-occurrences among a set of k-words. Specifically, we prove a theoretical relationship between the joint probability of k(≥2) words, and the sum of 2 norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises k-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller k(≤5) values, k-way embeddings perform comparably or better than 2-way embeddings in a range of tasks.
AAAI Conference 2017 Conference Paper
One of the goals of a cooperative game is to compute a value division to the players from which they have no incentive to deviate. This concept is formalized as the notion of the core. To obtain a value division that motivates players to cooperate to a greater extent or that is more robust under noise, the notions of the strong least core and the weak least core have been considered. In this paper, we characterize the strong and the weak least cores of supermodular cooperative games using the theory of minimizing crossing submodular functions. We then apply our characterizations to two representative supermodular cooperative games, namely, the induced subgraph game generalized to hypergraphs and the airport game. For these games, we derive explicit forms of the strong and weak least core values, and provide polynomial-time algorithms that compute value divisions in the strong and weak least cores.
NeurIPS Conference 2017 Conference Paper
In this paper, we develop an algorithm that approximates the residual error of Tucker decomposition, one of the most popular tensor decomposition methods, with a provable guarantee. Given an order-$K$ tensor $X\in\mathbb{R}^{N_1\times\cdots\times N_K}$, our algorithm randomly samples a constant number $s$ of indices for each mode and creates a ``mini'' tensor $\tilde{X}\in\mathbb{R}^{s\times\cdots\times s}$, whose elements are given by the intersection of the sampled indices on $X$. Then, we show that the residual error of the Tucker decomposition of $\tilde{X}$ is sufficiently close to that of $X$ with high probability. This result implies that we can figure out how much we can fit a low-rank tensor to $X$ \emph{in constant time}, regardless of the size of $X$. This is useful for guessing the favorable rank of Tucker decomposition. Finally, we demonstrate how the sampling method works quickly and accurately using multiple real datasets.
AAAI Conference 2017 Conference Paper
We consider non-monotone DR-submodular function maximization, where DR-submodularity (diminishing return submodularity) is an extension of submodularity for functions over the integer lattice based on the concept of the diminishing return property. Maximizing non-monotone DRsubmodular functions has many applications in machine learning that cannot be captured by submodular set functions. In this paper, we present a 1 2+ -approximation algorithm with a running time of roughly O(n log2 B), where n is the size of the ground set, B is the maximum value of a coordinate, and > 0 is a parameter. The approximation ratio is almost tight and the dependency of running time on B is exponentially smaller than the naive greedy algorithm. Experiments on synthetic and real-world datasets demonstrate that our algorithm outputs almost the best solution compared to other baseline algorithms, whereas its running time is several orders of magnitude faster.
AAAI Conference 2017 Conference Paper
In the analysis of real-world complex networks, identifying important vertices is one of the most fundamental operations. A variety of centrality measures have been proposed and extensively studied in various research areas. Many of distancebased centrality measures embrace some issues in treating disconnected networks, which are resolved by the recently emerged harmonic centrality. This paper focuses on a family of centrality measures including the harmonic centrality and its variants, and addresses their computational difficulty on very large graphs by presenting a new estimation algorithm named the random-radius ball (RRB) method. The RRB method is easy to implement, and a theoretical analysis, which includes the time complexity and error bounds, is also provided. The effectiveness of the RRB method over existing algorithms is demonstrated through experiments on real-world networks.
AAAI Conference 2017 Conference Paper
Submodular function maximization has numerous applications in machine learning and artificial intelligence. Many real applications require multiple submodular objective functions to be maximized, and which function is regarded as important by a user is not known in advance. In such cases, it is desirable to have a small family of representative solutions that would satisfy any user’s preference. A traditional approach for solving such a problem is to enumerate the Pareto optimal solutions. However, owing to the massive number of Pareto optimal solutions (possibly exponentially many), it is difficult for a user to select a solution. In this paper, we propose two efficient methods for finding a small family of representative solutions, based on the notion of regret ratio. The first method outputs a family of fixed size with a non-trivial regret ratio. The second method enables us to choose the size of the output family, and in the biobjective case, it has a provable trade-off between the size and the regret ratio. Using real and synthetic data, we empirically demonstrate that our methods achieve a small regret ratio.
IJCAI Conference 2016 Conference Paper
In a connected graph, spanning tree centralities of a vertex and an edge measure how crucial they are for the graph to be connected. In this paper, we propose efficient algorithms for estimating spanning tree centralities with theoretical guarantees on their accuracy. We experimentally demonstrate that our methods are orders of magnitude faster than previous methods. Then, we propose a novel centrality notion of a vertex, called aggregated spanning tree centrality, which also considers the number of connected components obtained by removing the vertex. We also give an efficient algorithm for estimating aggregated spanning tree centrality. Finally, we experimentally show that those spanning tree centralities are useful to identify vulnerable edges and vertices in infrastructure networks.
SODA Conference 2016 Conference Paper
SODA Conference 2016 Conference Paper
NeurIPS Conference 2016 Conference Paper
A sampling-based optimization method for quadratic functions is proposed. Our method approximately solves the following $n$-dimensional quadratic minimization problem in constant time, which is independent of $n$: $z^*=\min_{\bv \in \bbR^n}\bracket{\bv}{A \bv} + n\bracket{\bv}{\diag(\bd)\bv} + n\bracket{\bb}{\bv}$, where $A \in \bbR^{n \times n}$ is a matrix and $\bd, \bb \in \bbR^n$ are vectors. Our theoretical analysis specifies the number of samples $k(\delta, \epsilon)$ such that the approximated solution $z$ satisfies $|z - z^*| = O(\epsilon n^2)$ with probability $1-\delta$. The empirical performance (accuracy and runtime) is positively confirmed by numerical experiments.
SODA Conference 2016 Conference Paper
FOCS Conference 2016 Conference Paper
For a finite relational structure A, let CSP(A) denote the CSP instances whose constraint relations are taken from A. The resulting family of problems CSP(A) has been considered heavily in a variety of computational contexts. In this article, we consider this family from the perspective of property testing: given an instance of a CSP and query access to an assignment, one wants to decide whether the assignment satisfies the instance, or is far from so doing. While previous work on this scenario studied concrete templates or restricted classes of structures, this article presents comprehensive classification theorems. Our first contribution is a dichotomy theorem completely characterizing the structures A such that CSP(A) is constant-query testable: (i) If A has a majority polymorphism and a Maltsev polymorphism, then CSP(A) is constant-query testable with one-sided error. (ii) Else, testing CSP(A) requires a super-constant number of queries. Let ∃CSP(A) denote the extension of CSP(A) to instances which may include existentially quantified variables. Our second contribution is to classify all structures A in terms of the number of queries needed to test assignments to instances of ∃CSP(A), with one-sided error. More specifically, we show the following trichotomy (i) If A has a majority polymorphism and a Maltsev polymorphism, then ∃CSP(A) is constant-query testable with one-sided error. (ii) Else, if A has a (k + 1)-ary near-unanimity polymorphism for some k ≥ 2, and no Maltsev polymorphism then ∃CSP(A) is not constant-query testable (even with two-sided error) but is sublinear-query testable with one-sided error. (iii) Else, testing ∃CSP(A) with one-sided error requires a linear number of queries.
NeurIPS Conference 2015 Conference Paper
We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly $O(n\log (nr) \log{r})$, where $n$ is the size of the ground set and $r$ is the maximum value of a coordinate. The dependency on $r$ is exponentially better than the naive reduction algorithms. Several experiments on real and artificial datasets demonstrate that the solution quality of our algorithm is comparable to naive algorithms, while the running time is several orders of magnitude faster.
AAAI Conference 2015 Conference Paper
We introduce a new framework for solving distributed constraint optimization problems that extend the domain of each variable into a simplex. We propose two methods for searching the extended domain for good assignments. The first one relaxes the problem using linear programming, finds the optimum LP solution, and rounds it to an assignment. The second one plays a cost-minimization game, finds a certain kind of equilibrium, and rounds it to an assignment. Both methods are realized by performing the multiplicative weights method in a distributed manner. We experimentally demonstrate that our methods have good scalability, and in particular, the second method outperforms existing algorithms in terms of solution quality and efficiency.
AAAI Conference 2015 Conference Paper
We propose an indexing scheme for top-k shortestpath distance queries on graphs, which is useful in a wide range of important applications such as networkaware searches and link prediction. While many efficient methods for answering standard (top-1) distance queries have been developed, none of these methods are directly extensible to top-k distance queries. We develop a new framework for top-k distance queries based on 2-hop cover and then present an efficient indexing algorithm based on the recently proposed pruned landmark labeling scheme. The scalability, efficiency and robustness of our method is demonstrated in extensive experimental results. Moreover, we demonstrate the usefulness of top-k distance queries by applying them to link prediction, the most fundamental graph problem in the AI and Web communities.
AAAI Conference 2015 Conference Paper
Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are interconnected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic semantics between words.
NeurIPS Conference 2015 Conference Paper
A $k$-submodular function is a generalization of a submodular function, where the input consists of $k$ disjoint subsets, instead of a single subset, of the domain. Many machine learning problems, including influence maximization with $k$ kinds of topics and sensor placement with $k$ kinds of sensors, can be naturally modeled as the problem of maximizing monotone $k$-submodular functions. In this paper, we give constant-factor approximation algorithms for maximizing monotone $k$-submodular functions subject to several size constraints. The running time of our algorithms are almost linear in the domain size. We experimentally demonstrate that our algorithms outperform baseline algorithms in terms of the solution quality.
STOC Conference 2014 Conference Paper
AAAI Conference 2014 Conference Paper
Influence maximization is a problem to find small sets of highly influential individuals in a social network to maximize the spread of influence under stochastic cascade models of propagation. Although the problem has been well-studied, it is still highly challenging to find solutions of high quality in large-scale networks of the day. While Monte-Carlo-simulation-based methods produce near-optimal solutions with a theoretical guarantee, they are prohibitively slow for large graphs. As a result, many heuristic methods without any theoretical guarantee have been developed, but all of them substantially compromise solution quality. To address this issue, we propose a new method for the influence maximization problem. Unlike other recent heuristic methods, the proposed method is a Monte-Carlo-simulationbased method, and thus it consistently produces solutions of high quality with the theoretical guarantee. On the other hand, unlike other previous Monte-Carlosimulation-based methods, it runs as fast as other stateof-the-art methods, and can be applied to large networks of the day. Through our extensive experiments, we demonstrate the scalability and the solution quality of the proposed method.
SODA Conference 2014 Conference Paper
In the area of parameterized complexity, to cope with NP-Hard problems, we introduce a parameter k besides the input size n, and we aim to design algorithms (called FPT algorithms) that run in O ( f ( k ) n d ) time for some function f ( k ) and constant d. Though FPT algorithms have been successfully designed for many problems, typically they are not sufficiently fast because of huge f ( k ) and d. In this paper, we give FPT algorithms with small f ( k ) and d for many important problems including Odd Cycle Transversal and Almost 2-SAT. More specifically, we can choose f ( k ) as a single exponential (4 k ) and d as one, that is, linear in the input size. To the best of our knowledge, our algorithms achieve linear time complexity for the first time for these problems. To obtain our algorithms for these problems, we consider a large class of integer programs, called BIP2. Then we show that, in linear time, we can reduce BIP2 to Vertex Cover Above LP preserving the parameter k, and we can compute an optimal LP solution for Vertex Cover Above LP using network flow. Then, we perform an exaustive search by fixing half-integral values in the optimal LP solution for Vertex Cover Above LP. A bottleneck here is that we need to recompute an LP optimal solution after branching. To address this issue, we exploit network flow to update the optimal LP solution in linear time.
IJCAI Conference 2013 Conference Paper
The ability to recognize analogies is an important factor that is closely related to human intelligence. Verbal analogies have been used for evaluating both examinees at university entrance exams as well as algorithms for measuring relational similarity. However, relational similarity measures proposed so far are confined to measuring the similarity between pairs of words. Unfortunately, such pairwise approaches ignore the rich relational structure that exists in real-world knowledge bases containing millions of entities and semantic relations. We propose a method to efficiently identify analogous entity tuples from a given entity-relation graph. First, we present an efficient approach for extracting potential analogous tuples from a given entityrelation graph. Second, to measure the structural similarity between two tuples, we propose two types of kernel functions: vertex-feature kernels, and edge-feature kernels. Moreover, we combine those kernels to construct composite kernels that simultaneously consider both vertex and edge features. Experimental results show that our proposed method accurately identifies analogous tuples and significantly outperforms a state-of-the-art pairwise relational similarity measure, extended to tuples.
STOC Conference 2013 Conference Paper
FOCS Conference 2012 Conference Paper
Given a Boolean function f, the f-isomorphism testing problem requires a randomized algorithm to distinguish functions that are identical to f up to relabeling of the input variables from functions that are far from being so. An important open question in property testing is to determine for which functions f we can test f-isomorphism with a constant number of queries. Despite much recent attention to this question, essentially only two classes of functions were known to be efficiently isomorphism testable: symmetric functions and juntas. We unify and extend these results by showing that all partially symmetric functions -- functions invariant to the reordering of all but a constant number of their variables -- are efficiently isomorphism-testable. This class of functions, first introduced by Shannon, includes symmetric functions, juntas, and many other functions as well. We conjecture that these functions are essentially the only functions efficiently isomorphism-testable. To prove our main result, we also show that partial symmetry is efficiently testable. In turn, to prove this result we had to revisit the junta testing problem. We provide a new proof of correctness of the nearly-optimal junta tester. Our new proof replaces the Fourier machinery of the original proof with a purely combinatorial argument that exploits the connection between sets of variables with low influence and intersecting families. Another important ingredient in our proofs is a new notion of symmetric influence. We use this measure of influence to prove that partial symmetry is efficiently testable and also to construct an efficient sample extractor for partially symmetric functions. We then combine the sample extractor with the testing-by-implicit-learning approach to complete the proof that partially symmetric functions are efficiently isomorphism-testable.
TCS Journal 2012 Journal Article
STOC Conference 2011 Conference Paper
STOC Conference 2009 Conference Paper