Arrow Research search

Author name cluster

Mikko Koivisto

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers

42

UAI Conference 2025 Conference Paper

Quantum Speedups for Bayesian Network Structure Learning

  • Juha Harviainen
  • Kseniya Rychkova
  • Mikko Koivisto

The Bayesian network structure learning (BNSL) problem asks for a directed acyclic graph that maximizes a given score function. For networks with $n$ nodes, the fastest known algorithms run in time $O(2^n n^2)$ in the worst case, with no improvement in the asymptotic bound for two decades. Inspired by recent advances in quantum computing, we ask whether BNSL admits a polynomial quantum speedup, that is, whether the problem can be solved by a quantum algorithm in time $O(c^n)$ for some constant $c$ less than $2$. We answer the question in the affirmative by giving two algorithms achieving $c \leq 1.817$ and $c \leq 1.982$ assuming the number of potential parent sets is, respectively, subexponential and $O(1.453^n)$. Both algorithms assume the availability of a quantum random access memory. We also prove that one presumably cannot lower the base $2$ for any classical algorithm, as that would refute the strong exponential time hypothesis.

JAIR Journal 2024 Journal Article

Approximate Counting of Linear Extensions in Practice

  • Topi Talvitie
  • Mikko Koivisto

We investigate the problem of computing the number of linear extensions of a given partial order on n elements. The problem has applications in numerous areas, such as sorting, planning, and learning graphical models. The problem is #P-hard but admits fully polynomial-time approximation schemes. However, the polynomial complexity bounds of the known schemes involve high degrees and large constant factors, rendering the schemes feasible only when n is at most a few dozen. We present novel schemes, which stem from the idea of not requiring provable polynomial worst-case running time bounds. Using various new algorithmic techniques and implementation optimizations, we discover schemes that yield speedups by several orders of magnitude, enabling accurate approximations even when n is in the several hundreds.
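For background on why exact counting is confined to moderate n, the classical dynamic program over downsets, which the exponential-time schemes in this cluster build on, runs in O(2^n n) time. A minimal sketch of that standard technique (not the paper's approximation scheme):

```python
from functools import lru_cache

def count_linear_extensions(n, less_than):
    """Count linear extensions of a partial order on elements 0..n-1.

    `less_than` is a set of (a, b) pairs meaning a < b in the partial order.
    Classical O(2^n * n) dynamic program over downsets: a linear extension
    is built backwards by repeatedly removing a maximal element of the
    remaining set.
    """
    # preds[b] = bitmask of elements that must precede b
    preds = [0] * n
    for a, b in less_than:
        preds[b] |= 1 << a

    @lru_cache(maxsize=None)
    def count(mask):
        # number of linear extensions of the suborder induced on `mask`,
        # where `mask` is always a downset (closed under predecessors)
        if mask == 0:
            return 1
        total = 0
        for b in range(n):
            bit = 1 << b
            # b can come last iff it is in mask and no element of mask
            # is required to come after b
            if mask & bit and not any(
                (mask >> c) & 1 and (preds[c] >> b) & 1 for c in range(n)
            ):
                total += count(mask ^ bit)
        return total

    return count((1 << n) - 1)
```

For example, the three-element chain has a single linear extension, while the empty order on three elements has all 3! = 6.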

ICML Conference 2024 Conference Paper

Estimating the Permanent by Nesting Importance Sampling

  • Juha Harviainen
  • Mikko Koivisto

Sequential importance sampling (SIS) is one of the prominent methods for estimating high-dimensional integrals. For example, it is empirically the most efficient method known for estimating the permanent of nonnegative matrices, a notorious problem with numerous applications in computer science, statistics, and other fields. Unfortunately, SIS typically fails to provide accuracy guarantees due to difficulties in bounding the variance of the importance weights; for estimating the permanent with accuracy guarantees, the most efficient practical methods known are based on rejection sampling. Taking the best of both worlds, we give a variant of SIS, in which sampling is proportional to the upper bound used in rejection sampling. We show that this method is provably more efficient than its rejection sampling counterpart, particularly in high accuracy regimes. On estimating the permanent, we empirically obtain up to two orders-of-magnitude speedups over a state-of-the-art rejection sampling method.
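As a concrete baseline for what SIS means here, the classical sequential estimator (in the style of Rasmussen; this sketch is illustrative background, not the paper's nested scheme) samples a permutation row by row and weights each sample by the product of row sums over the still-available columns, which makes it unbiased for the permanent:

```python
import random

def perm_sis_estimate(A, num_samples=10000, rng=None):
    """Unbiased SIS estimate of the permanent of a nonnegative matrix A.

    Rows are processed in order; in each row a column is drawn among the
    still-unused columns with positive entries, proportionally to the
    entry, and the importance weight accumulates the corresponding row
    sums. A dead end (no admissible column) contributes weight zero.
    """
    rng = rng or random.Random(0)
    n = len(A)
    total = 0.0
    for _ in range(num_samples):
        used = set()
        weight = 1.0
        for i in range(n):
            cols = [j for j in range(n) if j not in used and A[i][j] > 0]
            if not cols:
                weight = 0.0
                break
            s = sum(A[i][j] for j in cols)
            # sample column j with probability A[i][j] / s
            r = rng.random() * s
            for j in cols:
                r -= A[i][j]
                if r <= 0:
                    break
            used.add(j)
            weight *= s
        total += weight
    return total / num_samples
```

On a permutation matrix or the all-ones matrix every sample has the same weight, so the estimate is exact; the variance difficulties the abstract mentions arise on less structured inputs.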

UAI Conference 2024 Conference Paper

Faster Perfect Sampling of Bayesian Network Structures

  • Juha Harviainen
  • Mikko Koivisto

Bayesian inference of a Bayesian network structure amounts to averaging over directed acyclic graphs (DAGs) on a given set of $n$ variables, each DAG weighted by its posterior probability. In practice, save for some special inference tasks, one averages over a sample of DAGs generated perfectly or approximately from the posterior. For the hard problem of perfect sampling, we give an algorithm that runs in $O(2.829^n)$ expected time, getting below $O(3^n)$ for the first time. Our algorithm reduces the problem to two smaller sampling problems whose outputs are combined; a simple rejection step then yields perfect samples. Subsequent samples can be generated considerably faster. Empirically, we observe speedups of several orders of magnitude over the state of the art.

AAAI Conference 2023 Conference Paper

A Faster Practical Approximation Scheme for the Permanent

  • Juha Harviainen
  • Mikko Koivisto

The permanent of a matrix has numerous applications but is notoriously hard to compute. While nonnegative matrices admit polynomial approximation schemes based on rapidly mixing Markov chains, the known practical estimators of the permanent rely on importance or rejection sampling. We advance the rejection sampling approach, which provides probabilistic accuracy guarantees, unlike importance sampling. Specifically, we give a novel class of nesting upper bounds and a simple preprocessing method that, in comparison to previous works, enable faster sampling with better acceptance rate; we demonstrate order-of-magnitude improvements with both theoretical and empirical analyses. In addition, we display instances on which our approximation scheme is competitive against state-of-the-art importance sampling based estimators.

UAI Conference 2023 Conference Paper

On inference and learning with probabilistic generating circuits

  • Juha Harviainen
  • Vaidyanathan Peruvemba Ramaswamy
  • Mikko Koivisto

Probabilistic generating circuits (PGCs) are economical representations of multivariate probability generating polynomials (PGPs). They unify and extend decomposable probabilistic circuits and determinantal point processes, admitting tractable computation of marginal probabilities. However, the need for addition and multiplication of high-degree polynomials incurs a significant additional factor in the complexity of inference. Here, we give a new inference algorithm that eliminates this extra factor. Specifically, we show that it suffices to keep track of the highest degree coefficients of the computed polynomials, rendering the algorithm linear in the circuit size. In addition, we show that determinant-based circuits need not be expanded to division-free circuits, but can be handled by division-based fast algorithms. While these advances enhance the appeal of PGCs, we also discover an obstacle to learning them from data: it is NP-hard to recognize whether a given PGC encodes a PGP. We discuss the implications of our ambivalent findings and sketch a method, in which learning is restricted to PGCs that are composed of moderate-size subcircuits.

UAI Conference 2023 Conference Paper

Revisiting Bayesian network learning with small vertex cover

  • Juha Harviainen
  • Mikko Koivisto

The problem of structure learning in Bayesian networks asks for a directed acyclic graph (DAG) that maximizes a given scoring function. Since the problem is NP-hard, research effort has been put into discovering restricted classes of DAGs for which the search problem can be solved in polynomial time. Here, we initiate investigation of questions that have received less attention thus far: Are the known polynomial algorithms close to the best possible, or is there room for significant improvements? If the interest is in Bayesian learning, that is, in sampling or weighted counting of DAGs, can we obtain similar complexity results? Focusing on DAGs with bounded vertex cover number—a class studied in Korhonen and Parviainen’s seminal work (NIPS 2015)—we answer the questions in the affirmative. We also give, apparently for the first time, a proof that the counting problem is #P-hard in general. In addition, we show that under the vertex-cover constraint counting is #W[1]-hard.

NeurIPS Conference 2022 Conference Paper

Trustworthy Monte Carlo

  • Juha Harviainen
  • Mikko Koivisto
  • Petteri Kaski

Monte Carlo integration is a key technique for designing randomized approximation schemes for counting problems, with applications, e.g., in machine learning and statistical physics. The technique typically enables massively parallel computation, however, with the risk that some of the delegated computations contain spontaneous or adversarial errors. We present an orchestration of the computations such that the outcome is accompanied with a proof of correctness that can be verified with substantially less computational resources than it takes to run the computations from scratch with state-of-the-art algorithms. Specifically, we adopt an algebraic proof system developed in computational complexity theory, in which the proof is represented by a polynomial; evaluating the polynomial at a random point amounts to a verification of the proof with probabilistic guarantees. We give examples of known Monte Carlo estimators that admit verifiable extensions with moderate computational overhead: for the permanent of zero-one matrices, for the model count of disjunctive normal form formulas, and for the gradient of logistic regression models. We also discuss the prospects and challenges of engineering efficient verifiable approximation schemes more generally.

NeurIPS Conference 2021 Conference Paper

Approximating the Permanent with Deep Rejection Sampling

  • Juha Harviainen
  • Antti Röyskö
  • Mikko Koivisto

We present a randomized approximation scheme for the permanent of a matrix with nonnegative entries. Our scheme extends a recursive rejection sampling method of Huber and Law (SODA 2008) by replacing the permanent upper bound with a linear combination of the subproblem bounds at a moderately large depth of the recursion tree. This method, which we call deep rejection sampling, is empirically shown to outperform the basic, depth-zero variant, as well as a related method by Kuck et al. (NeurIPS 2019). We analyze the expected running time of the scheme on random $(0, 1)$-matrices where each entry is independently $1$ with probability $p$. Our bound is superior to a previous one for $p$ less than $1/5$, matching another bound that was only known to hold when every row and column has density exactly $p$.

AAAI Conference 2020 Conference Paper

A Bayesian Approach for Estimating Causal Effects from Observational Data

  • Johan Pensar
  • Topi Talvitie
  • Antti Hyttinen
  • Mikko Koivisto

We present a novel Bayesian method for the challenging task of estimating causal effects from passively observed data when the underlying causal DAG structure is unknown. To rigorously capture the inherent uncertainty associated with the estimate, our method builds a Bayesian posterior distribution of the linear causal effect, by integrating Bayesian linear regression and averaging over DAGs. For computing the exact posterior for all cause-effect variable pairs, we give an algorithm that runs in time O(3^d d) for d variables, being feasible up to 20 variables. We also give a variant that computes the posterior probabilities of all pairwise ancestor relations within the same time complexity, significantly improving the fastest previous algorithm. In simulations, our Bayesian method outperforms previous methods in estimation accuracy, especially for small sample sizes. We further show that our method for effect estimation is well-adapted for detecting strong causal effects markedly deviating from zero, while our variant for computing posteriors of ancestor relations is the method of choice for detecting the mere existence of a causal relation. Finally, we apply our method on observational flow cytometry data, detecting several causal relations that concur with previous findings from experimental data.

AAAI Conference 2020 Conference Paper

Error-Correcting and Verifiable Parallel Inference in Graphical Models

  • Negin Karimi
  • Petteri Kaski
  • Mikko Koivisto

We present a novel framework for parallel exact inference in graphical models. Our framework supports error-correction during inference and enables fast verification that the result of inference is correct, with probabilistic soundness. The computational complexity of inference essentially matches the cost of w-cutset conditioning, a known generalization of Pearl’s classical loop-cutset conditioning for inference. Verifying the result for correctness can be done with as little as essentially the square root of the cost of inference. Our main technical contribution amounts to designing a low-degree polynomial extension of the cutset approach, and then reducing to a univariate polynomial employing techniques recently developed for noninteractive probabilistic proof systems.

UAI Conference 2020 Conference Paper

Layering-MCMC for Structure Learning in Bayesian Networks

  • Jussi Viinikka
  • Mikko Koivisto

Bayesian inference of the Bayesian network structure requires averaging over all possible directed acyclic graphs, DAGs, each weighted by its posterior probability. For approximate averaging, the most popular method has been Markov chain Monte Carlo, MCMC. It was recently shown that collapsing the sampling space from DAGs to suitably defined ordered partitions of the nodes substantially expedites the chain’s convergence; this partition-MCMC is similar to order-MCMC on node orderings, but it avoids biasing the sampling distribution. Here, we further collapse the state space by merging some number of adjacent members of a partition into layers. This renders the computation of the (unnormalized) posterior probability of a state, called layering, more involved, for which task we give an efficient dynamic programming algorithm. Our empirical studies suggest that the resulting layering-MCMC is superior to partition-MCMC in terms of mixing time and estimation accuracy.

NeurIPS Conference 2020 Conference Paper

Towards Scalable Bayesian Learning of Causal DAGs

  • Jussi Viinikka
  • Antti Hyttinen
  • Johan Pensar
  • Mikko Koivisto

We give methods for Bayesian inference of directed acyclic graphs, DAGs, and the induced causal effects from passively observed complete data. Our methods build on a recent Markov chain Monte Carlo scheme for learning Bayesian networks, which enables efficient approximate sampling from the graph posterior, provided that each node is assigned a small number K of candidate parents. We present algorithmic techniques to significantly reduce the space and time requirements, which make the use of substantially larger values of K feasible. Furthermore, we investigate the problem of selecting the candidate parents per node so as to maximize the covered posterior mass. Finally, we combine our sampling method with a novel Bayesian approach for estimating causal effects in linear Gaussian DAG models. Numerical experiments demonstrate the performance of our methods in detecting ancestor–descendant relations, and in causal effect estimation our Bayesian method is shown to outperform previous approaches.

AAAI Conference 2019 Conference Paper

Counting and Sampling Markov Equivalent Directed Acyclic Graphs

  • Topi Talvitie
  • Mikko Koivisto

Exploring directed acyclic graphs (DAGs) in a Markov equivalence class is pivotal to infer causal effects or to discover the causal DAG via appropriate interventional data. We consider counting and uniform sampling of DAGs that are Markov equivalent to a given DAG. These problems efficiently reduce to counting the moral acyclic orientations of a given undirected connected chordal graph on n vertices, for which we give two algorithms. Our first algorithm requires O(2^n n^4) arithmetic operations, improving a previous superexponential upper bound. The second requires O(k! 2^k k^2 n) operations, where k is the size of the largest clique in the graph; for bounded-degree graphs this bound is linear in n. After a single run, both algorithms enable uniform sampling from the equivalence class at a computational cost linear in the graph size. Empirical results indicate that our algorithms are superior to previously presented algorithms over a range of inputs; graphs with hundreds of vertices and thousands of edges are processed in a second on a desktop computer.

UAI Conference 2019 Conference Paper

Exact Sampling of Directed Acyclic Graphs from Modular Distributions

  • Topi Talvitie
  • Aleksis Vuoksenmaa
  • Mikko Koivisto

We consider the problem of sampling directed acyclic graphs (DAGs) from a given distribution. We assume the sampling distribution is modular, i.e., the probability of a DAG is a product of local factors, each of which only depends on a node and its parents in the DAG. Using inclusion–exclusion recurrence relations, we give an exact sampler that requires Õ(3^n) time for preprocessing and Õ(2^n) per sample, where n is the number of nodes and Õ suppresses polylogarithmic factors. We also consider the symmetric special case where the factors only depend on the size of the parent set—this covers uniform sampling under indegree constraints. In this case, our exact sampler requires O(n^3) time for preprocessing and O(n^2) per sample; this outperforms the previous best bound even for the uniform distribution. We demonstrate the performance of both samplers also empirically.
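To give a sense of the size of the space these samplers operate over, the number of labeled DAGs on n nodes obeys Robinson's classical inclusion–exclusion recurrence, a standard fact (not this paper's algorithm) that can be evaluated directly:

```python
from math import comb

def count_labeled_dags(n):
    """Number of labeled DAGs on n nodes via Robinson's recurrence:

        a_m = sum_{k=1}^{m} (-1)^(k+1) C(m, k) 2^(k(m-k)) a_{m-k},

    where k ranges over the possible numbers of in-degree-zero nodes
    being peeled off, and inclusion–exclusion corrects for overcounting.
    """
    a = [1]  # a_0 = 1 (the empty DAG)
    for m in range(1, n + 1):
        a.append(sum(
            (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
            for k in range(1, m + 1)
        ))
    return a[n]
```

The sequence starts 1, 1, 3, 25, 543, … and grows superexponentially, which is why exponential-time exact samplers such as the one above are already a nontrivial achievement.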

IJCAI Conference 2018 Conference Paper

A Scalable Scheme for Counting Linear Extensions

  • Topi Talvitie
  • Kustaa Kangas
  • Teppo Niinimäki
  • Mikko Koivisto

Counting the linear extensions of a given partial order not only has several applications in artificial intelligence but also represents a hard problem that challenges modern paradigms for approximate counting. Recently, Talvitie et al. (AAAI 2018) showed that an exponential time scheme beats the fastest known polynomial time schemes in practice, even if allowing hours of running time. Here, we present a novel scheme, relaxation Tootsie Pop, which in our experiments exhibits polynomial characteristics and significantly outperforms previous schemes. We also instantiate state-of-the-art model counters for CNF formulas via two natural encodings; the resulting schemes, however, are inferior to the more specialized schemes.

AAAI Conference 2018 Conference Paper

Counting Linear Extensions in Practice: MCMC Versus Exponential Monte Carlo

  • Topi Talvitie
  • Kustaa Kangas
  • Teppo Niinimäki
  • Mikko Koivisto

Counting the linear extensions of a given partial order is a #P-complete problem that arises in numerous applications. For polynomial-time approximation, several Markov chain Monte Carlo schemes have been proposed; however, little is known of their efficiency in practice. This work presents an empirical evaluation of the state-of-the-art schemes and investigates a number of ideas to enhance their performance. In addition, we introduce a novel approximation scheme, adaptive relaxation Monte Carlo (ARMC), that leverages exact exponential-time counting algorithms. We show that approximate counting is feasible up to a few hundred elements on various classes of partial orders, and within this range ARMC typically outperforms the other schemes.

IJCAI Conference 2017 Conference Paper

The Mixing of Markov Chains on Linear Extensions in Practice

  • Topi Talvitie
  • Teppo Niinimäki
  • Mikko Koivisto

We investigate almost uniform sampling from the set of linear extensions of a given partial order. The most efficient schemes stem from Markov chains whose mixing time bounds are polynomial, yet impractically large. We show that, on instances one encounters in practice, the actual mixing times can be much smaller than the worst-case bounds, and particularly so for a novel Markov chain we put forward. We circumvent the inherent hardness of estimating standard mixing times by introducing a refined notion, which admits estimation for moderate-size partial orders. Our empirical results suggest that the Markov chain approach to sample linear extensions can be made to scale well in practice, provided that the actual mixing times can be realized by instance-sensitive upper bounds or termination rules. Examples of the latter include existing perfect simulation algorithms, whose running times in our experiments follow the actual mixing times of certain chains, albeit with significant overhead.

IJCAI Conference 2016 Conference Paper

Counting Linear Extensions of Sparse Posets

  • Kustaa Kangas
  • Teemu Hankala
  • Teppo Niinimäki
  • Mikko Koivisto

We present two algorithms for computing the number of linear extensions of a given n-element poset. Our first approach builds upon an O(2^n n)-time dynamic programming algorithm by splitting subproblems into connected components and recursing on them independently. The recursion may run over two alternative subproblem spaces, and we provide heuristics for choosing the more efficient one. Our second algorithm is based on variable elimination via inclusion-exclusion and runs in time O(n^{t+4}), where t is the treewidth of the cover graph. We demonstrate experimentally that these new algorithms outperform previously suggested ones for a wide range of posets, in particular when the posets are sparse.

UAI Conference 2016 Conference Paper

Pruning Rules for Learning Parsimonious Context Trees

  • Ralf Eggeling
  • Mikko Koivisto

We give a novel algorithm for finding a parsimonious context tree (PCT) that best fits a given data set. PCTs extend traditional context trees by allowing context-specific grouping of the states of a context variable, also enabling skipping the variable. However, they gain statistical efficiency at the cost of computational efficiency, as the search space of PCTs is of tremendous size. We propose pruning rules based on efficiently computable score upper bounds with the aim of reducing this search space significantly. While our concrete bounds exploit properties of the BIC score, the ideas apply also to other scoring functions. Empirical results show that our algorithm is typically an order-of-magnitude faster than a recently proposed memory-intensive algorithm, or alternatively, about equally fast but using dramatically less memory.

JMLR Journal 2016 Journal Article

Structure Discovery in Bayesian Networks by Sampling Partial Orders

  • Teppo Niinimäki
  • Pekka Parviainen
  • Mikko Koivisto

We present methods based on Metropolis-coupled Markov chain Monte Carlo (MC3) and annealed importance sampling (AIS) for estimating the posterior distribution of Bayesian networks. The methods draw samples from an appropriate distribution of partial orders on the nodes, continued by sampling directed acyclic graphs (DAGs) conditionally on the sampled partial orders. We show that the computations needed for the sampling algorithms are feasible as long as the encountered partial orders have relatively few down-sets. While the algorithms assume suitable modularity properties of the priors, arbitrary priors can be handled by dividing the importance weight of each sampled DAG by the number of topological sorts it has---we give a practical dynamic programming algorithm to compute these numbers. Our empirical results demonstrate that the presented partial-order-based samplers are superior to previous Markov chain Monte Carlo methods, which sample DAGs either directly or via linear orders on the nodes. The results also suggest that the convergence rate of the estimators based on AIS is competitive with those of MC3. Thus AIS is the preferred method, as it enables easier large-scale parallelization and, in addition, supplies good probabilistic lower bound guarantees for the marginal likelihood of the model.

UAI Conference 2015 Conference Paper

Averaging of Decomposable Graphs by Dynamic Programming and Sampling

  • Kustaa Kangas
  • Teppo Mikael Niinimäki
  • Mikko Koivisto

We give algorithms for Bayesian learning of decomposable graphical models from complete data. We build on a recently proposed dynamic programming algorithm that finds optimal graphs of n nodes in O(4^n) time and O(3^n) space (Kangas et al., NIPS 2014), and show how it can be turned into accurate averaging algorithms. Specifically, we show that certain marginals of the posterior distribution, like the posterior probability of an edge, can be computed in O(n^3 3^n) time, provided that the prior over the graphs is of an appropriate form. To overcome some limitations of the exact approach, we also give sampling schemes that—using essentially no extra space—can draw up to 3^n independent graphs from the posterior in O(n 4^n) time. Through importance sampling, this enables accurate Bayesian inference with a broader class of priors. Using benchmark datasets, we demonstrate the method’s performance and the advantage of averaging over optimization when learning from little data.

ICML Conference 2015 Conference Paper

Dealing with small data: On the generalization of context trees

  • Ralf Eggeling
  • Mikko Koivisto
  • Ivo Grosse

Context trees (CT) are a widely used tool in machine learning for representing context-specific independences in conditional probability distributions. Parsimonious context trees (PCTs) are a recently proposed generalization of CTs that can enable statistically more efficient learning due to a higher structural flexibility, which is particularly useful for small-data settings. However, this comes at the cost of a computationally expensive structure learning algorithm, which is feasible only for domains with small alphabets and tree depths. In this work, we investigate to what degree CTs can be generalized to increase statistical efficiency while still keeping the learning computationally feasible. Approaching this goal from two different angles, we (i) propose algorithmic improvements to the PCT learning algorithm, and (ii) study further generalizations of CTs, which are inspired by PCTs, but trade structural flexibility for computational efficiency. By empirical studies both on simulated and real-world data, we demonstrate that combining both orthogonal approaches yields a substantial improvement in obtaining statistically efficient and computationally feasible generalizations of CTs.

NeurIPS Conference 2014 Conference Paper

Learning Chordal Markov Networks by Dynamic Programming

  • Kustaa Kangas
  • Mikko Koivisto
  • Teppo Niinimäki

We present an algorithm for finding a chordal Markov network that maximizes any given decomposable scoring function. The algorithm is based on a recursive characterization of clique trees, and it runs in O(4^n) time for n vertices. On an eight-vertex benchmark instance, our implementation turns out to be about ten million times faster than a recently proposed, constraint satisfaction based algorithm (Corander et al., NIPS 2013). Within a few hours, it is able to solve instances up to 18 vertices, and beyond if we restrict the maximum clique size. We also study the performance of a recent integer linear programming algorithm (Bartlett and Cussens, UAI 2013). Our results suggest that, unless we bound the clique sizes, currently only the dynamic programming algorithm is guaranteed to solve instances with around 15 or more vertices.

AAAI Conference 2014 Conference Paper

Predicting the Hardness of Learning Bayesian Networks

  • Brandon Malone
  • Kustaa Kangas
  • Matti Järvisalo
  • Mikko Koivisto
  • Petri Myllymäki

There are various algorithms for finding a Bayesian network structure (BNS) that is optimal with respect to a given scoring function. No single algorithm dominates the others in speed, and, given a problem instance, it is a priori unclear which algorithm will perform best and how fast it will solve the problem. Estimating the runtimes directly is extremely difficult as they are complicated functions of the instance. The main contribution of this paper is a characterization of the empirical hardness of an instance for a given algorithm based on a novel collection of non-trivial, yet efficiently computable features. Our empirical results, based on the largest evaluation of state-of-the-art BNS learning algorithms to date, demonstrate that we can predict the runtimes to a reasonable degree of accuracy, and effectively select algorithms that perform well on a particular instance. Moreover, we also show how the results can be utilized in building a portfolio algorithm that combines several individual algorithms in an almost optimal manner.

IJCAI Conference 2013 Conference Paper

Annealed Importance Sampling for Structure Learning in Bayesian Networks

  • Teppo Niinimäki
  • Mikko Koivisto

We present a new sampling approach to Bayesian learning of the Bayesian network structure. Like some earlier sampling methods, we sample linear orders on nodes rather than directed acyclic graphs (DAGs). The key difference is that we replace the usual Markov chain Monte Carlo (MCMC) method by the method of annealed importance sampling (AIS). We show that AIS is not only competitive to MCMC in exploring the posterior, but also superior to MCMC in two ways: it enables easy and efficient parallelization, due to the independence of the samples, and lower-bounding of the marginal likelihood of the model with good probabilistic guarantees. We also provide a principled way to correct the bias due to order-based sampling, by implementing a fast algorithm for counting the linear extensions of a given partial order.

JMLR Journal 2013 Journal Article

Finding Optimal Bayesian Networks Using Precedence Constraints

  • Pekka Parviainen
  • Mikko Koivisto

We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space $2^n$, to within a factor polynomial in the number of nodes $n$. In practice, these algorithms are feasible for networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order $P$ on the nodes, an optimal DAG compatible with $P$ can be found in time and space roughly proportional to the number of ideals of $P$, which can be significantly less than $2^n$. Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders.
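The baseline dynamic programming across node subsets that the abstract generalizes can be sketched as follows. This minimal version uses a hypothetical `local_score` oracle and has none of the paper's space-time tradeoffs, parallelization, or precedence constraints:

```python
from itertools import combinations

def optimal_dag(n, local_score):
    """Exact structure search by the classical dynamic programming over
    the 2^n node subsets. `local_score(v, parents)` is assumed to return
    the decomposable score of node v with a frozenset of parents; higher
    is better. Returns (best total score, parent sets of an optimal DAG).
    """
    nodes = range(n)

    def best_parents(v, candidates):
        # brute-force best parent set for v chosen among `candidates`
        return max(
            ((local_score(v, frozenset(ps)), frozenset(ps))
             for r in range(len(candidates) + 1)
             for ps in combinations(candidates, r)),
            key=lambda t: t[0],
        )

    # best[S] = (optimal score of a DAG on subset S, chosen sink, its parents)
    best = {frozenset(): (0.0, None, None)}
    for size in range(1, n + 1):
        for S in map(frozenset, combinations(nodes, size)):
            options = []
            for v in S:                  # try each v as the sink of S
                rest = S - {v}
                sc, ps = best_parents(v, tuple(rest))
                options.append((best[rest][0] + sc, v, ps))
            best[S] = max(options)

    # reconstruct an optimal DAG by repeatedly peeling off the chosen sink
    parents, S = {}, frozenset(nodes)
    while S:
        _, v, ps = best[S]
        parents[v] = ps
        S = S - {v}
    return best[frozenset(nodes)][0], parents
```

The ideals-of-a-partial-order observation in the abstract amounts to restricting which subsets S ever need to be tabulated; this sketch tabulates all of them.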

UAI Conference 2013 Conference Paper

Treedy: A Heuristic for Counting and Sampling Subsets

  • Teppo Mikael Niinimäki
  • Mikko Koivisto

Consider a collection of weighted subsets of a ground set N. Given a query subset Q of N, how fast can one (1) find the weighted sum over all subsets of Q, and (2) sample a subset of Q proportionally to the weights? We present a tree-based greedy heuristic, Treedy, that for a given positive tolerance d answers such counting and sampling queries to within a guaranteed relative error d and total variation distance d, respectively. Experimental results on artificial instances and in application to Bayesian structure discovery in Bayesian networks show that approximations yield dramatic savings in running time compared to exact computation, and that Treedy typically outperforms a previously proposed sorting-based heuristic.

SAT Conference 2012 Conference Paper

Finding Efficient Circuits for Ensemble Computation

  • Matti Järvisalo
  • Petteri Kaski
  • Mikko Koivisto
  • Janne H. Korhonen

Given a Boolean function as input, a fundamental problem is to find a Boolean circuit with the least number of elementary gates (AND, OR, NOT) that computes the function. The problem generalises naturally to the setting of multiple Boolean functions: find the smallest Boolean circuit that computes all the functions simultaneously. We study an NP-complete variant of this problem titled Ensemble Computation and, especially, its relationship to the Boolean satisfiability (SAT) problem from both the theoretical and practical perspectives, under the two monotone circuit classes: OR-circuits and SUM-circuits. Our main result relates the existence of nontrivial algorithms for CNF-SAT with the problem of rewriting in subquadratic time a given OR-circuit to a SUM-circuit. Furthermore, by developing a SAT encoding for the ensemble computation problem and by employing state-of-the-art SAT solvers, we search for concrete instances that would witness a substantial separation between the size of optimal OR-circuits and optimal SUM-circuits. Our encoding allows for exhaustively checking all small witness candidates. Searching over larger witness candidates presents an interesting challenge for current SAT solver technology.

AAAI Conference 2012 Conference Paper

On Finding Optimal Polytrees

  • Serge Gaspers
  • Mikko Koivisto
  • Mathieu Liedloff
  • Sebastian Ordyniak
  • Stefan Szeider

Inferring probabilistic networks from data is a notoriously difficult task. Under various goodness-of-fit measures, finding an optimal network is NP-hard, even if restricted to polytrees of bounded in-degree. Polynomial-time algorithms are known only for rare special cases, perhaps most notably for branchings, that is, polytrees in which the in-degree of every node is at most one. Here, we study the complexity of finding an optimal polytree that can be turned into a branching by deleting some number of arcs or nodes, treated as a parameter. We show that the problem can be solved via a matroid intersection formulation in polynomial time if the number of deleted arcs is bounded by a constant. The order of the polynomial time bound depends on this constant, hence the algorithm does not establish fixed-parameter tractability when parameterized by the number of deleted arcs. We show that a restricted version of the problem allows fixed-parameter tractability and hence scales well with the parameter. We contrast this positive result by showing that if we parameterize by the number of deleted nodes, a somewhat more powerful parameter, the problem is not fixed-parameter tractable, subject to a complexity-theoretic assumption.

UAI Conference 2011 Conference Paper

Partial Order MCMC for Structure Discovery in Bayesian Networks

  • Teppo Mikael Niinimäki
  • Pekka Parviainen
  • Mikko Koivisto

We present a new Markov chain Monte Carlo method for estimating posterior probabilities of structural features in Bayesian networks. The method draws samples from the posterior distribution of partial orders on the nodes; for each sampled partial order, the conditional probabilities of interest are computed exactly. We give both analytical and empirical results that suggest the superiority of the new method compared to previous methods, which sample either directed acyclic graphs or linear orders on the nodes.

SODA Conference 2010 Conference Paper

A Space-Time Tradeoff for Permutation Problems

  • Mikko Koivisto
  • Pekka Parviainen

Many combinatorial problems, such as the traveling salesman, feedback arc set, cutwidth, and treewidth problems, can be formulated as finding a feasible permutation of n elements. Typically, such problems can be solved by dynamic programming in time and space O*(2^n), by divide and conquer in time O*(4^n) and polynomial space, or by a combination of the two in time O*(4^n 2^{-s}) and space O*(2^s) for s = n, n/2, n/4, …. Here, we show that one can improve the tradeoff to time O*(T^n) and space O*(S^n) with TS < 4. The idea is to find a small family of "thin" partial orders on the n elements such that every linear order is an extension of one member of the family. Our construction is optimal within a natural class of partial order families.
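The O*(2^n)-time-and-space dynamic programming taken as the baseline above is, for the traveling salesman problem, the Held–Karp recurrence over vertex subsets; a minimal sketch on an assumed toy distance matrix:

```python
from itertools import combinations

def tsp_held_karp(dist):
    """Held-Karp dynamic programming: O*(2^n) time and space for TSP.

    best[(S, j)] = cheapest path that starts at vertex 0, visits exactly
    the vertices in S, and ends at j in S.
    """
    n = len(dist)
    best = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for S in combinations(range(1, n), size):
            fs = frozenset(S)
            for j in S:
                rest = fs - {j}
                best[(fs, j)] = min(best[(rest, k)] + dist[k][j] for k in rest)
    full = frozenset(range(1, n))
    # Close the tour by returning to vertex 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

# Toy symmetric instance; the optimal tour 0-1-3-2-0 costs 10+25+30+15 = 80
dist = [[0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]]
print(tsp_held_karp(dist))  # 80
```

The divide-and-conquer alternative splits the permutation in half and recurses, trading the exponential table for O*(4^n) time in polynomial space; the paper's family of thin partial orders interpolates between these two extremes.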

UAI Conference 2009 Conference Paper

Exact Structure Discovery in Bayesian Networks with Less Space

  • Pekka Parviainen
  • Mikko Koivisto

The fastest known exact algorithms for score-based structure discovery in Bayesian networks on n nodes run in time and space 2^n n^{O(1)}. The usage of these algorithms is limited to networks on at most around 25 nodes, mainly due to the space requirement. Here, we study space–time tradeoffs for finding an optimal network structure. When little space is available, we apply the Gurevich–Shelah recurrence, originally proposed for the Hamiltonian path problem, and obtain time 2^{2n-s} n^{O(1)} in space 2^s n^{O(1)} for any s = n/2, n/4, n/8, …; we assume the indegree of each node is bounded by a constant. For the more practical setting with moderate amounts of space, we present a novel scheme. It yields running time 2^n (3/2)^p n^{O(1)} in space 2^n (3/4)^p n^{O(1)} for any p = 0, 1, …, n/2; these bounds hold as long as the indegrees are at most 0.238n. Furthermore, the latter scheme allows easy and efficient parallelization beyond previous algorithms. We also explore empirically the potential of the presented techniques.

FOCS Conference 2008 Conference Paper

Computing the Tutte Polynomial in Vertex-Exponential Time

  • Andreas Björklund
  • Thore Husfeldt
  • Petteri Kaski
  • Mikko Koivisto

The deletion–contraction algorithm is perhaps the most popular method for computing a host of fundamental graph invariants such as the chromatic, flow, and reliability polynomials in graph theory, the Jones polynomial of an alternating link in knot theory, and the partition functions of the models of Ising, Potts, and Fortuin–Kasteleyn in statistical physics. Prior to this work, deletion–contraction was also the fastest known general-purpose algorithm for these invariants, running in time roughly proportional to the number of spanning trees in the input graph. Here, we give a substantially faster algorithm that computes the Tutte polynomial, and hence all the aforementioned invariants and more, of an arbitrary graph in time within a polynomial factor of the number of connected vertex sets. The algorithm actually evaluates a multivariate generalization of the Tutte polynomial by making use of an identity due to Fortuin and Kasteleyn. We also provide a polynomial-space variant of the algorithm and give an analogous result for Chung and Graham's cover polynomial.

STOC Conference 2007 Conference Paper

Fourier meets Möbius: fast subset convolution

  • Andreas Björklund
  • Thore Husfeldt
  • Petteri Kaski
  • Mikko Koivisto

We present a fast algorithm for the subset convolution problem: given functions f and g defined on the lattice of subsets of an n-element set N, compute their subset convolution f ∗ g, defined for S ⊆ N by (f ∗ g)(S) = Σ_{T ⊆ S} f(T) g(S ∖ T), where addition and multiplication are carried out in an arbitrary ring. Via Möbius transform and inversion, our algorithm evaluates the subset convolution in O(n^2 2^n) additions and multiplications, substantially improving upon the straightforward O(3^n) algorithm. Specifically, if the input functions have an integer range {−M, −M+1, …, M}, their subset convolution over the ordinary sum–product ring can be computed in Õ(2^n log M) time; the notation Õ suppresses polylogarithmic factors. Furthermore, using a standard embedding technique, we can compute the subset convolution over the max–sum or min–sum semiring in Õ(2^n M) time. To demonstrate the applicability of fast subset convolution, we present the first Õ(2^k n^2 + nm) algorithm for the Steiner tree problem in graphs with n vertices, k terminals, and m edges with bounded integer weights, improving upon the Õ(3^k n + 2^k n^2 + nm) time bound of the classical Dreyfus–Wagner algorithm. We also discuss extensions to recent Õ(2^n)-time algorithms for covering and partitioning problems (Björklund and Husfeldt, FOCS 2006; Koivisto, FOCS 2006).
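The O(n^2 2^n) scheme can be sketched as ranked zeta transforms, pointwise products per rank, and an inverse Möbius transform; a minimal integer-ring version, with the assumed representation that list index S is a subset bitmask:

```python
def subset_convolution(f, g, n):
    """(f*g)(S) = sum over T subset of S of f(T) * g(S \\ T), in O(2^n n^2)."""
    N = 1 << n
    popcount = [bin(S).count("1") for S in range(N)]
    # Rank the inputs: layer k holds the values of rank-k subsets, zero elsewhere
    fr = [[f[S] if popcount[S] == k else 0 for S in range(N)] for k in range(n + 1)]
    gr = [[g[S] if popcount[S] == k else 0 for S in range(N)] for k in range(n + 1)]
    # Fast zeta transform (Yates's algorithm) of every rank layer, in place
    for layer in fr + gr:
        for i in range(n):
            for S in range(N):
                if S >> i & 1:
                    layer[S] += layer[S ^ (1 << i)]
    # Pointwise products, convolved over ranks
    h = [[0] * N for _ in range(n + 1)]
    for k in range(n + 1):
        for j in range(k + 1):
            for S in range(N):
                h[k][S] += fr[j][S] * gr[k - j][S]
    # Inverse (Möbius) transform, then read off the rank matching |S|
    for layer in h:
        for i in range(n):
            for S in range(N):
                if S >> i & 1:
                    layer[S] -= layer[S ^ (1 << i)]
    return [h[popcount[S]][S] for S in range(N)]

f = [1, 2, 3, 4]  # n = 2: values on {}, {0}, {1}, {0,1}
g = [5, 6, 7, 8]
print(subset_convolution(f, g, 2))  # [5, 16, 22, 60]
```

The ranking step is what keeps the transform honest: after zeta-transforming each rank layer separately, the pointwise rank-convolution counts exactly the pairs (T, S ∖ T) whose sizes add up to |S|.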

UAI Conference 2006 Conference Paper

Advances in Exact Bayesian Structure Discovery in Bayesian Networks

  • Mikko Koivisto

We consider a Bayesian method for learning the Bayesian network structure from complete data. Recently, Koivisto and Sood (2004) presented an algorithm that for any single edge computes its marginal posterior probability in O(n 2^n) time, where n is the number of attributes; the number of parents per attribute is bounded by a constant. In this paper we show that the posterior probabilities for all the n(n − 1) potential edges can be computed in O(n 2^n) total time. This result is achieved by a forward–backward technique and fast Möbius transform algorithms, which are of independent interest. The resulting speedup by a factor of about n^2 allows us to experimentally study the statistical power of learning moderate-size networks. We report results from a simulation study that covers data sets with 20 to 10,000 records over 5 to 25 discrete attributes.

FOCS Conference 2006 Conference Paper

An O*(2^n) Algorithm for Graph Coloring and Other Partitioning Problems via Inclusion–Exclusion

  • Mikko Koivisto

We use the principle of inclusion and exclusion, combined with polynomial-time segmentation and the fast Möbius transform, to solve the generic problem of summing or optimizing over the partitions of n elements into a given number of weighted subsets. This problem subsumes various classical graph partitioning problems, such as graph coloring, domatic partitioning, and MAX k-CUT, as well as machine learning problems like decision graph learning and model-based data clustering. Our algorithm runs in O*(2^n) time, thus substantially improving on the usual O*(3^n)-time dynamic programming algorithm; the notation O* suppresses factors polynomial in n. This result improves, e.g., Byskov's recent record for graph coloring from O*(2.4023^n) to O*(2^n). We note that twenty-five years ago, R. M. Karp used inclusion–exclusion in a similar fashion to reduce the space requirement of the usual dynamic programming algorithms from exponential to polynomial.

JMLR Journal 2004 Journal Article

Exact Bayesian Structure Discovery in Bayesian Networks

  • Mikko Koivisto
  • Kismat Sood

Learning a Bayesian network structure from data is a well-motivated but computationally hard task. We present an algorithm that computes the exact posterior probability of a subnetwork, e.g., a directed edge; a modified version of the algorithm finds one of the most probable network structures. This algorithm runs in time O(n 2^n + n^{k+1} C(m)), where n is the number of network variables, k is a constant maximum in-degree, and C(m) is the cost of computing a single local marginal conditional likelihood for m data instances. This is the first algorithm with less than super-exponential complexity with respect to n. Exact computation allows us to tackle complex cases where existing Monte Carlo methods and local search procedures potentially fail. We show that also in domains with a large number of variables, exact computation is feasible, given suitable a priori restrictions on the structures; combining exact and inexact methods is also possible. We demonstrate the applicability of the presented algorithm on four synthetic data sets with 17, 22, 37, and 100 variables.
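A standard subset dynamic program in this spirit finds an optimal structure by repeatedly choosing a sink for each variable subset; the sketch below is illustrative, with a made-up local score function standing in for the Bayesian score:

```python
from itertools import combinations

def best_parent_score(v, candidates, local_score, max_indegree):
    """Best local score for variable v with parents drawn from `candidates`."""
    return max(
        local_score(v, frozenset(P))
        for k in range(min(max_indegree, len(candidates)) + 1)
        for P in combinations(sorted(candidates), k)
    )

def best_network_score(n, local_score, max_indegree):
    """Score of an optimal DAG over variables 0..n-1, via the sink recurrence
    best(S) = max over v in S of best(S - {v}) + best_parent_score(v, S - {v})."""
    best = {frozenset(): 0.0}
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            fs = frozenset(S)
            best[fs] = max(
                best[fs - {v}]
                + best_parent_score(v, fs - {v}, local_score, max_indegree)
                for v in fs
            )
    return best[frozenset(range(n))]

# Made-up toy score: variable 1 prefers parent {0}; everything else is neutral
def local_score(v, parents):
    return 1.0 if (v == 1 and parents == frozenset({0})) else 0.0

print(best_network_score(2, local_score, max_indegree=1))  # 1.0
```

The sink recurrence is sound because every DAG over S has at least one sink v, and removing v leaves an optimal sub-DAG over S − {v}; the in-degree bound k keeps the parent-set enumeration polynomial per variable.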