Arrow Research search

Author name cluster

Arvind Krishnamurthy

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
2 author rows

Possible papers

ICLR Conference 2021 Conference Paper

AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

  • Yuchen Jin
  • Tianyi Zhou
  • Liangyu Zhao
  • Yibo Zhu
  • Chuanxiong Guo
  • Marco Canini
  • Arvind Krishnamurthy

The learning rate (LR) schedule is one of the most important hyperparameters needing careful tuning in training DNNs. However, it is also one of the least automated parts of machine learning systems and usually costs significant manual effort and computing. Though there are pre-defined LR schedules and optimizers with adaptive LR, they introduce new hyperparameters that need to be tuned separately for different tasks/datasets. In this paper, we consider the question: Can we automatically tune the LR over the course of training without human involvement? We propose an efficient method, AutoLRS, which automatically optimizes the LR for each training stage by modeling training dynamics. Every $\tau$ steps, AutoLRS aims to find an LR that minimizes the validation loss. We formulate it as black-box optimization and solve it by Bayesian optimization (BO). However, collecting training instances for BO requires a system to evaluate each LR queried by BO's acquisition function for $\tau$ steps, which is prohibitively expensive in practice. Instead, we apply each candidate LR for only $\tau'\ll\tau$ steps and train an exponential model to predict the validation loss after $\tau$ steps. This mutual-training process between BO and the exponential model allows us to bound the number of training steps invested in the BO search. We demonstrate the advantages and the generality of AutoLRS through extensive experiments of training DNNs from diverse domains and using different optimizers. The LR schedules auto-generated by AutoLRS lead to a speedup of $1.22\times$, $1.43\times$, and $1.5\times$ when training ResNet-50, Transformer, and BERT, respectively, compared to the LR schedules in their original papers, and an average speedup of $1.31\times$ over state-of-the-art highly tuned LR schedules.
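The forecasting step described in the abstract (observe a candidate LR for $\tau'$ steps, then extrapolate the validation loss to step $\tau$) can be illustrated with a small sketch. This is not the authors' implementation. The functional form loss(t) ≈ a·exp(−b·t) + c, the grid scan over b, and the names `fit_exponential`, `predict_loss`, and `pick_learning_rate` are all assumptions made for illustration, and a plain minimum over a fixed candidate set stands in for the Bayesian optimization loop.

```python
import math


def fit_exponential(steps, losses, rates=None):
    """Fit loss(t) ~= a * exp(-b * t) + c (an assumed form, not taken from the paper).

    For each decay rate b on a grid, a and c are solved by ordinary least
    squares; the b with the smallest squared error wins.
    """
    if rates is None:
        rates = [10 ** (k / 20.0) for k in range(-80, 21)]  # b from 1e-4 to 10
    n = len(steps)
    mean_y = sum(losses) / n
    best = None
    for b in rates:
        x = [math.exp(-b * t) for t in steps]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx < 1e-12:  # basis is numerically constant, cannot separate a from c
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, losses))
        a = sxy / sxx
        c = mean_y - a * mean_x
        sse = sum((a * xi + c - yi) ** 2 for xi, yi in zip(x, losses))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("could not fit: need losses at two or more distinct steps")
    _, a, b, c = best
    return a, b, c


def predict_loss(params, step):
    """Extrapolate the fitted curve to a later step (tau in the abstract)."""
    a, b, c = params
    return a * math.exp(-b * step) + c


def pick_learning_rate(short_runs, horizon):
    """Choose the candidate LR with the lowest forecast loss at `horizon`.

    `short_runs` maps each candidate LR to (steps, losses) observed over a
    short window (tau' steps). A plain argmin replaces the BO search here.
    """
    forecasts = {
        lr: predict_loss(fit_exponential(steps, losses), horizon)
        for lr, (steps, losses) in short_runs.items()
    }
    return min(forecasts, key=forecasts.get), forecasts
```

As a usage example, passing `{0.1: (steps, losses_a), 0.01: (steps, losses_b)}` with 20 observed steps each and `horizon=200` returns the LR whose fitted curve flattens at the lower loss, along with the per-LR forecasts. Only the 20 short-window steps per candidate are actually trained, which is the cost saving the abstract points to.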

NeurIPS Conference 2018 Conference Paper

Learning to Optimize Tensor Programs

  • Tianqi Chen
  • Lianmin Zheng
  • Eddie Yan
  • Ziheng Jiang
  • Thierry Moreau
  • Luis Ceze
  • Carlos Guestrin
  • Arvind Krishnamurthy

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high-dimensional convolution, are key enablers of effective deep learning systems. However, existing systems rely on manually optimized libraries such as cuDNN, where only a narrow range of server-class GPUs is well supported. The reliance on hardware-specific operator libraries limits the applicability of high-level graph optimizations and incurs significant engineering costs when deploying to new hardware targets. We use learning to remove this engineering burden. We learn domain-specific statistical cost models to guide the search of tensor operator implementations over billions of possible program variants. We further accelerate the search by effective model transfer across workloads. Experimental results show that our framework delivers performance competitive with state-of-the-art hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPU.
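The idea of a learned cost model guiding the search can be sketched on a toy problem. Everything below is hypothetical: the two-parameter tile space, the synthetic `measure` function standing in for a hardware run, and the k-nearest-neighbour predictor are assumptions chosen to keep the example short. The abstract does not specify the model or the search procedure, so this shows only the general loop of fitting on measured variants, ranking unmeasured ones by predicted cost, and measuring the most promising.

```python
import itertools
import math

TILES = [1, 2, 4, 8, 16, 32, 64]  # hypothetical tile sizes for two loop axes


def measure(config):
    """Stand-in for a real hardware measurement (synthetic, minimum at (8, 16))."""
    tx, ty = config
    return (math.log2(tx) - 3) ** 2 + (math.log2(ty) - 4) ** 2 + 1.0


def features(config):
    """Represent a config by the log2 of each tile size."""
    return tuple(math.log2(v) for v in config)


def predict(config, measured, k=3):
    """k-NN cost model: mean measured cost of the k closest measured configs."""
    f = features(config)
    nearest = sorted(measured, key=lambda c: math.dist(f, features(c)))[:k]
    return sum(measured[c] for c in nearest) / len(nearest)


def guided_search(seeds, rounds=4, batch=4):
    """Alternate between ranking unmeasured configs with the model and measuring the top few."""
    measured = {c: measure(c) for c in seeds}
    space = list(itertools.product(TILES, TILES))
    for _ in range(rounds):
        pending = [c for c in space if c not in measured]
        pending.sort(key=lambda c: predict(c, measured))  # cheapest predicted first
        for c in pending[:batch]:
            measured[c] = measure(c)  # only these configs pay the measurement cost
    best = min(measured, key=measured.get)
    return best, measured
```

Calling `guided_search([(1, 1), (64, 64), (1, 64), (64, 1)])` measures 20 of the 49 configurations and returns the cheapest one it saw. The sketch makes no claim about finding the global optimum; the point is that the model decides where the limited measurement budget goes.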

TCS Journal 2003 Journal Article

Hardness results for multicast cost sharing

  • Joan Feigenbaum
  • Arvind Krishnamurthy
  • Rahul Sami
  • Scott Shenker

We continue the study of multicast cost sharing from the viewpoints of both computational complexity and economic mechanism design. We provide fundamental lower bounds on the network complexity of group-strategyproof, budget-balanced mechanisms. We also extend a classical impossibility result in game theory to show that no strategyproof mechanism can be both approximately efficient and approximately budget-balanced. Our results show that one important and natural case of multicast cost sharing is an example of a canonical hard problem in distributed, algorithmic mechanism design; in this sense, they represent progress toward the development of a complexity theory of Internet computation.