Arrow Research search

Author name cluster

Lukasz Kaiser

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

ICLR Conference 2021 Conference Paper

Rethinking Attention with Performers

  • Krzysztof Choromanski
  • Valerii Likhosherstov
  • David Dohan
  • Xingyou Song
  • Andreea Gane
  • Tamás Sarlós
  • Peter Hawkins
  • Jared Quincy Davis

We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can also be used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and investigate optimal attention-kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel-prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing effectiveness of the novel attention-learning paradigm leveraged by Performers.

NeurIPS Conference 2021 Conference Paper

Sparse is Enough in Scaling Transformers

  • Sebastian Jaszczur
  • Aakanksha Chowdhery
  • Afroz Mohiuddin
  • Lukasz Kaiser
  • Wojciech Gajewski
  • Henryk Michalewski
  • Jonni Kanerva

Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study becomes out of reach. We address this problem by leveraging sparsity. We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next generation Transformer models that use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as we scale up the model size. Surprisingly, the sparse layers are enough to obtain the same perplexity as the standard Transformer with the same number of parameters. We also integrate with prior sparsity approaches to attention and enable fast inference on long sequences even with limited memory. This results in performance competitive to the state-of-the-art on long text summarization.

ICLR Conference 2020 Conference Paper

Model Based Reinforcement Learning for Atari

  • Lukasz Kaiser
  • Mohammad Babaeizadeh
  • Piotr Milos
  • Blazej Osinski
  • Roy H. Campbell
  • Konrad Czechowski
  • Dumitru Erhan
  • Chelsea Finn

Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a range of Atari games in low data regime of 100k interactions between the agent and the environment, which corresponds to two hours of real-time play. In most games SimPLe outperforms state-of-the-art model-free algorithms, in some games by over an order of magnitude.

ICLR Conference 2020 Conference Paper

Reformer: The Efficient Transformer

  • Nikita Kitaev
  • Lukasz Kaiser
  • Anselm Levskaya

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O($L^2$) to O($L \log L$), where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.

ICML Conference 2019 Conference Paper

Area Attention

  • Yang Li 0058
  • Lukasz Kaiser
  • Samy Bengio
  • Si Si

Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. These improvements are obtainable with a basic form of area attention that is parameter free.

ICML Conference 2018 Conference Paper

Fast Decoding in Sequence Models Using Discrete Latent Variables

  • Lukasz Kaiser
  • Samy Bengio
  • Aurko Roy
  • Ashish Vaswani
  • Niki Parmar
  • Jakob Uszkoreit
  • Noam Shazeer

Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and Transformer, are the state-of-the-art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and Transformer are much more parallel during training, but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation.

ICML Conference 2018 Conference Paper

Image Transformer

  • Niki Parmar
  • Ashish Vaswani
  • Jakob Uszkoreit
  • Lukasz Kaiser
  • Noam Shazeer
  • Alexander Ku
  • Dustin Tran

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art.

CSL Conference 2015 Conference Paper

A Unified Approach to Boundedness Properties in MSO

  • Lukasz Kaiser
  • Martin Lang 0001
  • Simon Leßenich
  • Christof Löding

In the past years, extensions of monadic second-order logic (MSO) that can specify boundedness properties by the use of operators referring to the sizes of sets have been considered. In particular, the logics costMSO introduced by T. Colcombet and MSO+U by M. Bojanczyk were analyzed and connections to automaton models have been established to obtain decision procedures for these logics. In this work, we propose the logic quantitative counting MSO (qcMSO for short), which combines aspects from both costMSO and MSO+U. We show that both logics can be embedded into qcMSO in a natural way. Moreover, we provide a decidability proof for the theory of its weak variant (quantification only over finite sets) for the natural numbers with order and the infinite binary tree. These decidability results are obtained using a regular cost function extension of automatic structures called resource-automatic structures.

SAT Conference 2014 Conference Paper

MPIDepQBF: Towards Parallel QBF Solving without Knowledge Sharing

  • Charles Jordan
  • Lukasz Kaiser
  • Florian Lonsing
  • Martina Seidl

Inspired by recent work on parallel SAT solving, we present a lightweight approach for solving quantified Boolean formulas (QBFs) in parallel. In particular, our approach uses a sequential state-of-the-art QBF solver to evaluate subformulas in working processes. It abstains from globally exchanging information between the workers, but keeps learnt information only locally. To this end, we equipped the state-of-the-art QBF solver DepQBF with assumption-based reasoning and integrated it in our novel solver MPIDepQBF as backend solver. Extensive experiments on standard computers as well as on the supercomputer Tsubame show the impact of our approach.

SAT Conference 2013 Conference Paper

Experiments with Reduction Finding

  • Charles Jordan
  • Lukasz Kaiser

Reductions are perhaps the most useful tool in complexity theory and, naturally, it is in general undecidable to determine whether a reduction exists between two given decision problems. However, asking for a reduction on inputs of bounded size is essentially a \(\Sigma^p_2\) problem and can in principle be solved by ASP, QBF, or by iterated calls to SAT solvers. We describe our experiences developing and benchmarking automatic reduction finders. We created a dedicated reduction finder that does counter-example guided abstraction refinement by iteratively calling either a SAT solver or BDD package. We benchmark its performance with different SAT solvers and report the tradeoffs between the SAT and BDD approaches. Further, we compare this reduction finder with the direct approach using a number of QBF and ASP solvers. We describe the tradeoffs between the QBF and ASP approaches and show which solvers perform best on our \(\Sigma^p_2\) instances. It turns out that even state-of-the-art solvers leave a large room for improvement on problems of this kind. We thus provide our instances as a benchmark for future work on \(\Sigma^p_2\) solvers.

CSL Conference 2012 Conference Paper

A Counting Logic for Structure Transition Systems

  • Lukasz Kaiser
  • Simon Leßenich

Quantitative questions such as "what is the maximum number of tokens in a place of a Petri net?" or "what is the maximal reachable height of the stack of a pushdown automaton?" play a significant role in understanding models of computation. To study such problems in a systematic way, we introduce structure transition systems on which one can define logics that mix temporal expressions (e.g. reachability) with properties of a state (e.g. the height of the stack). We propose a counting logic Qmu[#MSO] which allows one to express questions like the ones above, and also many boundedness problems studied so far. We show that Qmu[#MSO] has good algorithmic properties, in particular we generalize two standard methods in model checking, decomposition on trees and model checking through parity games, to this quantitative logic. These properties are used to prove decidability of Qmu[#MSO] on tree-producing pushdown systems, a generalization of both pushdown systems and regular tree grammars.

AAAI Conference 2012 Conference Paper

Learning Games from Videos Guided by Descriptive Complexity

  • Lukasz Kaiser

In recent years, several systems have been proposed that learn the rules of a simple card or board game solely from visual demonstration. These systems were constructed for specific games and rely on substantial background knowledge. We introduce a general system for learning board game rules from videos and demonstrate it on several well-known games. The presented algorithm requires only a few demonstrations and minimal background knowledge, and, having learned the rules, automatically derives position evaluation functions and can play the learned games competitively. Our main technique is based on descriptive complexity, i.e. the logical means necessary to define a set of interest. We compute formulas defining allowed moves and final positions in a game in different logics and select the most adequate ones. We show that this method is well-suited for board games and there is strong theoretical evidence that it will generalize to other problems.

MFCS Conference 2012 Conference Paper

Solving Counter Parity Games

  • Dietmar Berwanger
  • Lukasz Kaiser
  • Simon Leßenich

We study a class of parity games equipped with counters that evolve according to arbitrary non-negative affine functions. These games capture several cost models for dynamic systems from the literature. We present an elementary algorithm for computing the exact value of a counter parity game, which both generalizes previous results and improves their complexity. To this end, we introduce a class of ω-regular games with imperfect information and imperfect recall, solve them using automata-based techniques, and prove a correspondence between finite-memory strategies in such games and strategies in counter parity games.

AAAI Conference 2011 Conference Paper

First-Order Logic with Counting for General Game Playing

  • Lukasz Kaiser
  • Lukasz Stafiniak

General Game Players (GGPs) are programs which can play an arbitrary game given only its rules, and the Game Description Language (GDL) is a variant of Datalog used in GGP competitions to specify the rules. GDL inherits from Datalog the use of Horn clauses as rules and recursion, but it too requires stratification and does not allow the use of quantifiers. We present an alternative formalism for game description which is based on first-order logic (FO). States of the game are represented by relational structures, legal moves by structure rewriting rules guarded by FO formulas, and the goals of the players by formulas which extend FO with counting. The advantage of our formalism comes from more explicit state representation and from the use of quantifiers in formulas. We show how to exploit existential quantification in players’ goals to generate heuristics for evaluating positions in the game. The derived heuristics are good enough for a basic alpha-beta agent to win against state of the art GGP.

CSL Conference 2010 Conference Paper

New Algorithm for Weak Monadic Second-Order Logic on Inductive Structures

  • Tobias Ganzow
  • Lukasz Kaiser

We present a new algorithm for model-checking weak monadic second-order logic on inductive structures, a class of structures of bounded clique width. Our algorithm directly manipulates formulas and checks them on the structure of interest, thus avoiding both the use of automata and the need to interpret the structure in the binary tree. In addition to the algorithm, we give a new proof of decidability of weak MSO on inductive structures which follows Shelah’s composition method. Generalizing this proof technique, we obtain decidability of weak MSO extended with the unbounding quantifier on the binary tree, which was open before.

CSL Conference 2009 Conference Paper

Cardinality Quantifiers in MLO over Trees

  • Vince Bárány
  • Lukasz Kaiser
  • Alexander Rabinovich

We study an extension of monadic second-order logic of order with the uncountability quantifier “there exist uncountably many sets”. We prove that, over the class of finitely branching trees, this extension is equally expressive to plain monadic second-order logic of order. Additionally we find that the continuum hypothesis holds for classes of sets definable in monadic second-order logic over finitely branching trees, which is notable because not all of these classes are analytic. Our approach is based on Shelah’s composition method and uses basic results from descriptive set theory. The elimination result is constructive, yielding a decision procedure for the extended logic. Furthermore, by the well-known correspondence between monadic second-order logic and tree automata, our findings translate to analogous results on the extension of first-order logic by cardinality quantifiers over injectively presentable Rabin-automatic structures, generalizing the work of Kuske and Lohrey.

MFCS Conference 2009 Conference Paper

Synthesis for Structure Rewriting Systems

  • Lukasz Kaiser

The description of a single state of a modelled system is often complex in practice, but few procedures for synthesis address this problem in depth. We study systems in which a state is described by an arbitrary finite structure, and changes of the state are represented by structure rewriting rules, a generalisation of term and graph rewriting. Both the environment and the controller are allowed to change the structure in this way, and the question we ask is how a strategy for the controller that ensures a given property can be synthesised. We focus on one particular class of structure rewriting rules, namely on separated structure rewriting, a limited syntactic class of rules. To counter this restrictiveness, we allow the property to be ensured by the controller to be specified in a very expressive logic: a combination of monadic second-order logic evaluated on states and the modal μ-calculus for the temporal evolution of the whole system. We show that for the considered class of rules and this logic, it can be decided whether the controller has a strategy ensuring a given property, and in such case a finite-memory strategy can be synthesised. Additionally, we prove that the same holds if the property is given by a monadic second-order formula to be evaluated on the limit of the evolution of the system.

CSL Conference 2006 Conference Paper

Game Quantification on Automatic Structures and Hierarchical Model Checking Games

  • Lukasz Kaiser

Game quantification is an expressive concept and has been studied in model theory and descriptive set theory, especially in relation to infinitary logics. Automatic structures on the other hand appear very often in computer science, especially in program verification. We extend first-order logic on structures on words by allowing the use of an infinite string of alternating quantifiers on letters of a word, the game quantifier. This extended logic is decidable and preserves regularity on automatic structures, but can be undecidable on other structures even with decidable first-order theory. We show that in the presence of the game quantifier any relation that allows one to distinguish successors is enough to define all regular relations and therefore the game quantifier is strictly more expressive than first-order logic in such cases. Conversely, if there is an automorphism of atomic relations that swaps some successors then we prove that it can be extended to any relations definable with the game quantifier. After investigating its expressiveness, we use game quantification to introduce a new type of combinatorial games with multiple players and imperfect information exchanged with respect to a hierarchical constraint. It is shown that these games on finite arenas exactly capture the logic with the game quantifier when players alternate their moves but are undecidable and not necessarily determined in the other case. In this way we define the first model checking games with finite arenas that can be used for model checking first-order logic on automatic structures.