Arrow Research search

Author name cluster

Chris Jones

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

7 papers
2 author rows

Possible papers (7)

ICML 2022 (Conference Paper)

Improving Language Models by Retrieving from Trillions of Tokens

  • Sebastian Borgeaud
  • Arthur Mensch
  • Jordan Hoffmann
  • Trevor Cai
  • Eliza Rutherford
  • Katie Millican
  • George van den Driessche 0002
  • Jean-Baptiste Lespiau

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
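The retrieval step the abstract describes — embed the preceding tokens, fetch the most similar chunks from a database — can be sketched as a toy nearest-neighbour lookup. This is a minimal illustration only: RETRO's actual retriever uses frozen BERT embeddings over a 2-trillion-token database, whereas here `embed` is a hypothetical bag-of-words stand-in over a tiny in-memory list.

```python
# Toy sketch of retrieval by local similarity with preceding tokens.
# Assumption: bag-of-words cosine similarity stands in for the frozen
# BERT embeddings used by the actual RETRO retriever.
from collections import Counter
import math

def embed(text):
    # Hypothetical stand-in for a frozen neural encoder: word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_chunk, database, k=2):
    # Return the k database chunks most similar to the query chunk.
    q = embed(query_chunk)
    scored = sorted(database, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:k]

database = [
    "the eiffel tower is in paris",
    "transformers use attention mechanisms",
    "retrieval augments language models with external memory",
]
neighbors = retrieve("language models and retrieval", database, k=1)
```

In RETRO the retrieved chunks are then fed through an encoder and attended to via chunked cross-attention; this sketch covers only the lookup itself.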

ICML 2022 (Conference Paper)

Unified Scaling Laws for Routed Language Models

  • Aidan Clark
  • Diego de Las Casas
  • Aurelia Guy
  • Arthur Mensch
  • Michela Paganini
  • Jordan Hoffmann
  • Bogdan Damoc
  • Blake A. Hechtman

The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. Afterwards we provide two applications of these laws: first deriving an Effective Parameter Count along which all models scale at the same rate, and then using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.
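The basic power-law fit underlying such scaling laws can be sketched in a few lines. This is a simplified single-axis version (the paper's laws treat parameter count and compute as independent axes and cover routed architectures); the function name, exponent, and synthetic data below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: fit L(N) = a * N^(-alpha) to (parameter count, loss)
# pairs. In log space the power law is linear, so a degree-1 polyfit
# on (log N, log L) recovers (a, alpha).
import numpy as np

def fit_power_law(params, losses):
    slope, intercept = np.polyfit(np.log(params), np.log(losses), 1)
    return np.exp(intercept), -slope  # (a, alpha)

# Synthetic losses generated from a known law, to check the fit.
N = np.array([1e6, 1e7, 1e8, 1e9])
L = 5.0 * N ** -0.07
a, alpha = fit_power_law(N, L)
```

A two-variable version of this fit, over parameter count and compute jointly, is the shape of analysis the paper generalizes to routed models.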

FOCS 2021 (Conference Paper)

Sum-of-Squares Lower Bounds for Sparse Independent Set

  • Chris Jones
  • Aaron Potechin
  • Goutham Rajendran
  • Madhur Tulsiani
  • Jeff Xu

The Sum-of-Squares (SoS) hierarchy of semidefinite programs is a powerful algorithmic paradigm which captures state-of-the-art algorithmic guarantees for a wide array of problems. In the average case setting, SoS lower bounds provide strong evidence of algorithmic hardness or information-computation gaps. Prior to this work, SoS lower bounds had been obtained for problems in the “dense” input regime, where the input is a collection of independent Rademacher or Gaussian random variables, while the sparse regime has remained out of reach. We make the first progress in this direction by obtaining strong SoS lower bounds for the problem of Independent Set on sparse random graphs. We prove that with high probability over an Erdős–Rényi random graph $G \sim G_{n,\frac{d}{n}}$ with average degree $d > \log^{2} n$, degree-$D_{SoS}$ SoS fails to refute the existence of an independent set of size $k = \Omega\left(\frac{n}{\sqrt{d}\,(\log n)\,D_{SoS}^{c_{0}}}\right)$ in $G$ (where $c_{0}$ is an absolute constant), whereas the true size of the largest independent set in $G$ is $O\left(\frac{n \log d}{d}\right)$. Our proof involves several significant extensions of the techniques used for proving SoS lower bounds in the dense setting. Previous lower bounds are based on the pseudo-calibration heuristic of Barak et al. [FOCS 2016], which produces a candidate SoS solution using a planted distribution indistinguishable from the input distribution via low-degree tests. In the sparse case the natural planted distribution does admit low-degree distinguishers, and we show how to adapt the pseudo-calibration heuristic to overcome this. Another notorious technical challenge for the sparse regime is the quest for matrix norm bounds. In this paper, we obtain new norm bounds for graph matrices in the sparse setting. While in the dense setting the norms of graph matrices are characterized by the size of the minimum vertex separator of the corresponding graph, this turns out not to be the case for sparse graph matrices. Another contribution of our work is a new combinatorial understanding of the structures needed to bound the norms of sparse graph matrices.
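The objects in the abstract are easy to experiment with: a sparse Erdős–Rényi graph $G_{n,d/n}$ and an independent set found by a simple heuristic. The sampler and minimum-degree greedy routine below are illustrative toys for the problem setup, not the paper's machinery; the greedy set only lower-bounds the true maximum, which is $O(n \log d / d)$.

```python
# Toy setup for Independent Set on a sparse Erdős–Rényi graph G(n, d/n).
# Assumption: small n and d chosen for a quick runnable demonstration.
import random

def erdos_renyi(n, p, seed=0):
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for w in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(w)
                adj[w].add(u)
    return adj

def greedy_independent_set(adj):
    # Repeatedly pick a minimum-degree surviving vertex and delete
    # its closed neighborhood; the chosen vertices are independent.
    alive = set(adj)
    chosen = []
    while alive:
        u = min(alive, key=lambda v: len(adj[v] & alive))
        chosen.append(u)
        alive -= adj[u] | {u}
    return chosen

n, d = 400, 16
adj = erdos_renyi(n, d / n)
indep = greedy_independent_set(adj)
```

The lower-bound statement says SoS cannot refute sets polynomially larger than what any such heuristic (or the true optimum) achieves in this regime.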

FOCS 2020 (Conference Paper)

Sum-of-Squares Lower Bounds for Sherrington-Kirkpatrick via Planted Affine Planes

  • Mrinalkanti Ghosh
  • Fernando Granha Jeronimo
  • Chris Jones
  • Aaron Potechin
  • Goutham Rajendran

The Sum-of-Squares (SoS) hierarchy is a semi-definite programming meta-algorithm that captures state-of-the-art polynomial time guarantees for many optimization problems such as Max-$k$-CSPs and Tensor PCA. On the flip side, a SoS lower bound provides evidence of hardness, which is particularly relevant to average-case problems for which NP-hardness may not be available. In this paper, we consider the following average case problem, which we call the Planted Affine Planes (PAP) problem: Given $m$ random vectors $d_{1}, \ldots, d_{m}$ in $\mathbb{R}^{n}$, can we prove that there is no vector $v \in \mathbb{R}^{n}$ such that for all $u \in [m]$, $\langle v, d_{u}\rangle^{2} = 1$? In other words, can we prove that $m$ random vectors are not all contained in two parallel hyperplanes at equal distance from the origin? We prove that for $m \leq n^{3/2-\varepsilon}$, with high probability, degree-$n^{\Omega(\varepsilon)}$ SoS fails to refute the existence of such a vector $v$. When the vectors $d_{1}, \ldots, d_{m}$ are chosen from the multivariate normal distribution, the PAP problem is equivalent to the problem of proving that a random $n$-dimensional subspace of $\mathbb{R}^{m}$ does not contain a boolean vector. As shown by Mohanty–Raghavendra–Xu [STOC 2020], a lower bound for this problem implies a lower bound for the problem of certifying energy upper bounds on the Sherrington-Kirkpatrick Hamiltonian, and so our lower bound implies a degree-$n^{\Omega(\varepsilon)}$ SoS lower bound for the certification version of the Sherrington-Kirkpatrick problem. The full version of the paper is available at http://arxiv.org/abs/2009.01874.
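The PAP condition itself is easy to state in code: a planted instance in which every $\langle v, d_{u}\rangle = \pm 1$ satisfies it, while fresh Gaussian vectors almost surely do not. The construction below illustrates only the problem statement, with toy sizes chosen for the demonstration — it has nothing to do with the SoS lower-bound proof.

```python
# Sketch of the Planted Affine Planes (PAP) condition: do all m vectors
# d_u satisfy <v, d_u>^2 = 1, i.e. lie on two parallel hyperplanes at
# equal distance from the origin? Assumption: toy n, m for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 15

def satisfies_pap(v, D, tol=1e-9):
    # True iff <v, d_u>^2 = 1 for every row d_u of D (up to tolerance).
    return np.allclose((D @ v) ** 2, 1.0, atol=tol)

# Planted instance: shift each Gaussian vector along v so <v, d_u> = +/-1.
v = rng.normal(size=n)
G = rng.normal(size=(m, n))
signs = rng.choice([-1.0, 1.0], size=m)
D_planted = G + np.outer((signs - G @ v) / (v @ v), v)

# A fresh Gaussian instance almost surely violates the condition.
D_random = rng.normal(size=(m, n))
```

The refutation task the paper studies is certifying that no such $v$ exists for a random instance; the theorem says low-degree SoS cannot do this when $m \leq n^{3/2-\varepsilon}$.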

AAAI 2004 (Short Paper)

Utilizing Internal State in Multi-Robot Coordination Tasks

  • Chris Jones
  • Maja J. Mataric

In this paper, we present a principled framework suitable for describing and reasoning about the intertwined entities involved in any task-achieving multi-robot system: the task environment, the task definition, and the capabilities of the robots themselves. Using this framework, we present a systematic procedure by which to synthesize controllers for robots in a multi-robot system such that a given sequential task is correctly executed.