Author name cluster

Manuel Sorge

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers

2 author rows

AAAI Conference 2026 Conference Paper

How Hard Is It to Explain Preferences Using Few Boolean Attributes?

Clemens Anzinger
Jiehua Chen
Christian Hatschka
Manuel Sorge
Alexander Temper

We study the computational complexity of explaining preference data through Boolean attribute models (BAMs), motivated by extensive research involving attribute models and their promise in understanding preference structure and enabling more efficient decision-making processes. In a BAM, each alternative possesses a subset of binary attributes, each voter cares about a subset of attributes, and voters prefer alternatives with more of their desired attributes. In the BAM problem, we are given a preference profile and want to know whether there is a k-attribute model explaining the profile. We establish a complexity dichotomy for the number of attributes k: BAM is linear-time solvable for k≤2 but NP-complete for k≥3. The problem remains hard even when preference orders have length two. On the positive side, BAM becomes fixed-parameter tractable when parameterized by the number of alternatives m. For the special case of two voters, we provide a linear-time algorithm. We also analyze variants where partial information is given: When voter preferences over attributes are known (BAM With Cares) or when alternative attributes are specified (BAM With Has), showing that for most parameters BAM With Cares is more difficult whereas BAM With Has is more tractable except for being NP-hard even for one voter.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Improving Decision Trees through the Lens of Parameterized Local Search

Juha Harviainen
Frank Sommer
Manuel Sorge

Algorithms for learning decision trees often include heuristic local-search operations such as (1) adjusting the threshold of a cut or (2) also exchanging the feature of that cut. We study minimizing the number of classification errors by performing a fixed number of a single type of these operations. Although we discover that the corresponding problems are NP-complete in general, we provide a comprehensive parameterized-complexity analysis with the aim of determining those properties of the problems that explain the hardness and those that make the problems tractable. For instance, we show that the problems remain hard for a small number $d$ of features or small domain size $D$ but the combination of both yields fixed-parameter tractability. That is, the problems are solvable in $(D + 1)^{2d} \cdot |\mathcal{I}|^{O(1)}$ time, where $|\mathcal{I}|$ is the size of the input. We also provide a proof-of-concept implementation of this algorithm and report on empirical results.

PDF Details

ICML Conference 2025 Conference Paper

Learning Minimum-Size BDDs: Towards Efficient Exact Algorithms

Christian Komusiewicz
André Schidler
Frank Sommer
Manuel Sorge
Luca Pascal Staus

Binary decision diagrams (BDDs) are widely applied tools to compactly represent labeled data as directed acyclic graphs; for efficiency and interpretability reasons small BDDs are preferred. Given labeled data, minimizing BDDs is NP-complete and thus recent research focused on the influence of parameters such as the solution size $s$ on the complexity [Ordyniak et al. , AAAI 2024]. Our main positive result is an algorithm that is efficient if in particular $s$, the domain size $D$, and the Hamming distance between any two data points is small, improving on previous running-time bounds. This algorithm is inspired by the witness-tree paradigm that was recently successful for computing decision trees [Komusiewicz et al. , ICML 2023], whose extension to BDDs was open. We extend our algorithmic results to the case where we allow a small number of misclassified data points and complement them with lower bounds that show that the running times are tight from multiple points of view. We show that our main algorithm holds practical promise by providing a proof-of-concept implementation.

Details

ICML Conference 2025 Conference Paper

Optimal Decision Tree Pruning Revisited: Algorithms and Complexity

Juha Harviainen
Frank Sommer
Manuel Sorge
Stefan Szeider

We present a comprehensive classical and parameterized complexity analysis of decision tree pruning operations, extending recent research on the complexity of learning small decision trees. Thereby, we offer new insights into the computational challenges of decision tree simplification, a crucial aspect of developing interpretable and efficient machine learning models. We focus on fundamental pruning operations of subtree replacement and raising, which are used in heuristics. Surprisingly, while optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. Therefore, we identify parameters and combinations thereof that lead to fixed-parameter tractability or hardness, establishing a precise borderline between these complexity classes. For example, while subtree raising is hard for small domain size $D$ or number $d$ of features, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size. We complement our theoretical findings with preliminary experimental results, demonstrating the practical implications of our analysis.

Details

AIJ Journal 2025 Journal Article

The influence of dimensions on the complexity of computing decision trees

Stephen Kobourov
Maarten Löffler
Fabrizio Montecchiani
Marcin Pilipczuk
Ignaz Rutter
Raimund Seidel
Manuel Sorge
Jules Wulms

A decision tree recursively splits a feature space $\mathbb{R}^{d}$ and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work treats heuristic algorithms to compute a decision tree from training data, usually aiming to minimize in particular the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number $d$ of dimensions of the feature space. We show that it can be solved in $O(n^{2d + 1}d)$ time, but under reasonable complexity-theoretic assumptions it is not possible to achieve $f(d) \cdot n^{o(d / \log d)}$ running time, where $n$ is the number of training examples. The problem is solvable in $(dR)^{O(dR)} \cdot n^{1+o(1)}$ time, if there are exactly two classes and $R$ is an upper bound on the number of tree leaves labeled with the first~class.

Details DOI

AAAI Conference 2025 Conference Paper

Witty: An Efficient Solver for Computing Minimum-Size Decision Trees

Luca Pascal Staus
Christian Komusiewicz
Frank Sommer
Manuel Sorge

Decision trees are a classic model for summarizing and classifying data. To enhance interpretability and generalization properties, it has been proposed to favor small decision trees. Accordingly, in the minimum-size decision tree training problem (MSDT), the input is a set of training examples in $\mathbb{R}^d$ with class labels and we aim to find a decision tree that classifies all training examples correctly and has a minimum number of nodes. MSDT is NP-hard and therefore presumably not solvable in polynomial time. Nevertheless, a promising algorithmic paradigm called witness trees which solves MSDT efficiently if the solution tree is small has been developed. In this work, we test this paradigm empirically. We provide an implementation, augment it with extensive heuristic improvements, and scrutinize it on standard benchmark instances. The augmentations achieve a mean 324-fold (median 84-fold) speedup over the naive implementation. Compared to the state of the art they achieve a mean 32-fold (median 7-fold) speedup over the dynamic programming based MurTree solver and a mean 61-fold (median 25-fold) speedup over SAT-based implementations. As a theoretical result we obtain an improved worst-case running-time bound for MSDT.

PDF Details DOI

SODA Conference 2023 Conference Paper

Fixed-parameter tractability of DIRECTED MULTICUT with three terminal pairs parameterized by the size of the cutset: twin-width meets flow-augmentation

Meike Hatzel
Lars Jaffke
Paloma T. Lima
Tomás Masarík
Marcin Pilipczuk
Roohani Sharma
Manuel Sorge

Details

AAAI Conference 2023 Conference Paper

Game Implementation: What Are the Obstructions?

Jiehua Chen
Seyedeh Negar Layegh Khavidaki
Sebastian Vincent Haydn
Sofia Simola
Manuel Sorge

In many applications, we want to influence the decisions of independent agents by designing incentives for their actions. We revisit a fundamental problem in this area, called GAME IMPLEMENTATION: Given a game in standard form and a set of desired strategies, can we design a set of payment promises such that if the players take the payment promises into account, then all undominated strategies are desired? Furthermore, we aim to minimize the cost, that is, the worst-case amount of payments. We study the tractability of computing such payment promises and determine more closely what obstructions we may have to overcome in doing so. We show that GAME IMPLEMENTATION is NP-hard even for two players, solving in particular a long-standing open question and suggesting more restrictions are necessary to obtain tractability results. We thus study the regime in which players have only a small constant number of strategies and obtain the following. First, this case remains NP-hard even if each player’s utility depends only on three others. Second, we repair a flawed efficient algorithm for the case of both small number of strategies and small number of players. Among further results, we characterize sets of desired strategies that can be implemented at zero cost as a generalization of Nash equilibria.

PDF Details DOI

ICML Conference 2023 Conference Paper

On Computing Optimal Tree Ensembles

Christian Komusiewicz
Pascal Kunz 0001
Frank Sommer
Manuel Sorge

Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression. Recent algorithmic advances allow to compute decision trees that are optimal for various measures such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area. Mainly, we provide two novel algorithms and corresponding lower bounds. First, we are able to carry over and substantially improve on tractability results for decision trees, obtaining a $(6\delta D S)^S \cdot \mathrm{poly}$-time algorithm, where $S$ is the number of cuts in the tree ensemble, $D$ the largest domain size, and $\delta$ is the largest number of features in which two examples differ. To achieve this, we introduce the witness-tree technique which also seems promising for practice. Second, we show that dynamic programming, which has been successful for decision trees, may also be viable for tree ensembles, providing an $\ell^n \cdot \mathrm{poly}$-time algorithm, where $\ell$ is the number of trees and $n$ the number of examples. Finally, we compare the number of cuts necessary to classify training data sets for decision trees and tree ensembles, showing that ensembles may need exponentially fewer cuts for increasing number of trees.

Details

AAAI Conference 2023 Conference Paper

The Influence of Dimensions on the Complexity of Computing Decision Trees

Stephen G. Kobourov
Maarten Löffler
Fabrizio Montecchiani
Marcin Pilipczuk
Ignaz Rutter
Raimund Seidel
Manuel Sorge
Jules Wulms

A decision tree recursively splits a feature space \mathbb{R}^d and then assigns class labels based on the resulting partition. Decision trees have been part of the basic machine-learning toolkit for decades. A large body of work considers heuristic algorithms that compute a decision tree from training data, usually aiming to minimize in particular the size of the resulting tree. In contrast, little is known about the complexity of the underlying computational problem of computing a minimum-size tree for the given training data. We study this problem with respect to the number d of dimensions of the feature space \mathbb{R}^d, which contains n training examples. We show that it can be solved in O(n^(2d + 1)) time, but under reasonable complexity-theoretic assumptions it is not possible to achieve f(d) * n^o(d / log d) running time. The problem is solvable in (dR)^O(dR) * n^(1+o(1)) time, if there are exactly two classes and R is an upper bound on the number of tree leaves labeled with the first class.

PDF Details DOI

JAIR Journal 2022 Journal Article

Threshold Treewidth and Hypertree Width

Robert Ganian
Andre Schidler
Manuel Sorge
Stefan Szeider

Treewidth and hypertree width have proven to be highly successful structural parameters in the context of the Constraint Satisfaction Problem (CSP). When either of these parameters is bounded by a constant, then CSP becomes solvable in polynomial time. However, here the order of the polynomial in the running time depends on the width, and this is known to be unavoidable; therefore, the problem is not fixed-parameter tractable parameterized by either of these width measures. Here we introduce an enhancement of tree and hypertree width through a novel notion of thresholds, allowing the associated decompositions to take into account information about the computational costs associated with solving the given CSP instance. Aside from introducing these notions, we obtain efficient theoretical as well as empirical algorithms for computing threshold treewidth and hypertree width and show that these parameters give rise to fixed-parameter algorithms for CSP as well as other, more general problems. We complement our theoretical results with experimental evaluations in terms of heuristics as well as exact methods based on SAT/SMT encodings.

PDF Details DOI

SODA Conference 2021 Conference Paper

Efficient fully dynamic elimination forests with applications to detecting long paths and cycles

Jiehua Chen 0001
Wojciech Czerwinski
Yann Disser
Andreas Emil Feldmann
Danny Hermelin
Wojciech Nadara
Marcin Pilipczuk
Michal Pilipczuk

We present a data structure that in a dynamic graph of treedepth at most d, which is modified over time by edge insertions and deletions, maintains an optimum-height elimination forest. The data structure achieves worst-case update time, which matches the best known parameter dependency in the running time of a static fpt algorithm for computing the treedepth of a graph. This improves a result of Dvořák et al. [ESA 2014], who for the same problem achieved update time f ( d ) for some non-elementary (i. e. tower-exponential) function f. As a by-product, we improve known upper bounds on the sizes of minimal obstructions for having treedepth d from doubly-exponential in d to d O ( d ). As applications, we design new fully dynamic parameterized data structures for detecting long paths and cycles in general graphs. More precisely, for a fixed parameter k and a dynamic graph G, modified over time by edge insertions and deletions, our data structures maintain answers to the following queries: Does G contain a simple path on k vertices? Does G contain a simple cycle on at least k vertices? In the first case, the data structure achieves amortized update time. In the second case, the amortized update time is. In both cases we assume access to a dictionary on the edges of G.

Details

IJCAI Conference 2021 Conference Paper

Fractional Matchings under Preferences: Stability and Optimality

Jiehua Chen
Sanjukta Roy
Manuel Sorge

We study generalizations of stable matching in which agents may be matched fractionally; this models time-sharing assignments. We focus on the so-called ordinal stability and cardinal stability, and investigate the computational complexity of finding an ordinally stable or cardinally stable fractional matching which either maximizes the social welfare (i. e. , the overall utilities of the agents) or the number of fully matched agents (i. e. , agents whose matching values sum up to one). We complete the complexity classification of both optimization problems for both ordinal stability and cardinal stability, distinguishing between the marriage (bipartite) and roommates (non-bipartite) cases and the presence or absence of ties in the preferences. In particular, we prove a surprising result that finding a cardinally stable fractional matching with maximum social welfare is NP-hard even for the marriage case without ties. This answers an open question and exemplifies a rare variant of stable marriage that remains hard for preferences without ties. We also complete the picture of the relations of the stability notions and derive structural properties.

PDF Details DOI

SODA Conference 2021 Conference Paper

Optimal Discretization is Fixed-parameter Tractable

Stefan Kratsch
Tomás Masarík
Irene Muzi
Marcin Pilipczuk
Manuel Sorge

Given two disjoint sets W 1 and W 2 of points in the plane, the O ptimal D iscretization problem asks for the minimum size of a family of horizontal and vertical lines that separate W 1 from W 2, that is, in every region into which the lines partition the plane there are either only points of W 1, or only points of W 2, or the region is empty. Equivalently, O ptimal D iscretization can be phrased as a task of discretizing continuous variables: We would like to discretize the range of x -coordinates and the range of y -coordinates into as few segments as possible, maintaining that no pair of points from W 1 × W 2 are projected onto the same pair of segments under this discretization. We provide a fixed-parameter algorithm for the problem, parameterized by the number of lines in the solution. Our algorithm works in time, where k is the bound on the number of lines to find and n is the number of points in the input. Our result answers in positive a question of Bonnet, Giannopolous, and Lampis [IPEC 2017] and of Froese (PhD thesis, 2018) and is in contrast with the known intractability of two closely related generalizations: the Rectangle Stabbing problem and the generalization in which the selected lines are not required to be axis-parallel.

Details

IJCAI Conference 2020 Conference Paper

Threshold Treewidth and Hypertree Width

Robert Ganian
Andre Schidler
Manuel Sorge
Stefan Szeider

PDF Details DOI

AIJ Journal 2016 Journal Article

H-index manipulation by merging articles: Models, theory, and experiments

René van Bevern
Christian Komusiewicz
Rolf Niedermeier
Manuel Sorge
Toby Walsh

An author's profile on Google Scholar consists of indexed articles and associated data, such as the number of citations and the H-index. The author is allowed to merge articles; this may affect the H-index. We analyze the (parameterized) computational complexity of maximizing the H-index using article merges. Herein, to model realistic manipulation scenarios, we define a compatibility graph whose edges correspond to plausible merges. Moreover, we consider several different measures for computing the citation count of a merged article. For the measure used by Google Scholar, we give an algorithm that maximizes the H-index in linear time if the compatibility graph has constant-size connected components. In contrast, if we allow to merge arbitrary articles (that is, for compatibility graphs that are cliques), then already increasing the H-index by one is NP-hard. Experiments on Google Scholar profiles of AI researchers show that the H-index can be manipulated substantially only if one merges articles with highly dissimilar titles.

Details DOI

ECAI Conference 2016 Conference Paper

h-Index Manipulation by Undoing Merges

René van Bevern
Christian Komusiewicz
Hendrik Molter
Rolf Niedermeier
Manuel Sorge
Toby Walsh

The h-index is an important bibliographic measure used to assess the performance of researchers. Van Bevern et al. [Artif. Intel. , to appear] showed that, despite computational worst-case hardness results, substantial manipulation of the h-index of Google Scholar author profiles is possible by merging articles. Complementing this work, we study the opposite operation, the splitting of articles, which is arguably the more natural operation for manipulation and which is also allowed within Google Scholar. We present numerous results on computational complexity (from linear-time algorithms to parameterized computational hardness results) and empirically indicate that at least small improvements of the h-index by splitting merged articles are easily achievable.

Details

IJCAI Conference 2015 Conference Paper

H-Index Manipulation by Merging Articles: Models, Theory, and Experiments

Ren
eacute; van Bevern
Christian Komusiewicz
Rolf Niedermeier
Manuel Sorge
Toby Walsh

An author’s profile on Google Scholar consists of indexed articles and associated data, such as the number of citations and the H-index. The author is allowed to merge articles, which may affect the H-index. We analyze the parameterized complexity of maximizing the H-index using article merges. Herein, to model realistic manipulation scenarios, we define a compatability graph whose edges correspond to plausible merges. Moreover, we consider multiple possible measures for computing the citation count of a merged article. For the measure used by Google Scholar, we give an algorithm that maximizes the H-index in linear time if the compatibility graph has constant-size connected components. In contrast, if we allow to merge arbitrary articles, then already increasing the H-index by one is NP-hard. Experiments on Google Scholar profiles of AI researchers show that the H-index can be manipulated substantially only by merging articles with highly dissimilar titles, which would be easy to discover.

PDF Details

MFCS Conference 2013 Conference Paper

A Parameterized Complexity Analysis of Combinatorial Feature Selection Problems

Vincent Froese
René van Bevern
Rolf Niedermeier
Manuel Sorge

Abstract We examine the algorithmic tractability of NP-hard combinatorial feature selection problems in terms of parameterized complexity theory. In combinatorial feature selection, one seeks to discard dimensions from high-dimensional data such that the resulting instances fulfill a desired property. In parameterized complexity analysis, one seeks to identify relevant problem-specific quantities and tries to determine their influence on the computational complexity of the considered problem. In this paper, for various combinatorial feature selection problems, we identify parameterizations and reveal to what extent these govern computational complexity. We provide tractability as well as intractability results; for example, we show that the Distinct Vectors problem on binary points is polynomial-time solvable if each pair of points differs in at most three dimensions, whereas it is NP-hard otherwise.

Details