Arrow Research search

Author name cluster

David Luan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
2 author rows

Possible papers


JMLR Journal 2023 Journal Article

PaLM: Scaling Language Modeling with Pathways

  • Aakanksha Chowdhery
  • Sharan Narang
  • Jacob Devlin
  • Maarten Bosma
  • Gaurav Mishra
  • Adam Roberts
  • Paul Barham
  • Hyung Won Chung

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies. © JMLR 2023.

ICML Conference 2020 Conference Paper

Generative Pretraining From Pixels

  • Mark Chen 0003
  • Alec Radford
  • Rewon Child
  • Jeffrey Wu 0003
  • Heewoo Jun
  • David Luan
  • Ilya Sutskever

Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. We are also competitive with self-supervised benchmarks on ImageNet when substituting pixels for a VQVAE encoding, achieving 69.0% top-1 accuracy on a linear probe of our features.