Author name cluster

Yifei Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers (2)

AAAI Conference 2021 · Conference Paper

STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

  • Shuheng Shen
  • Yifei Cheng
  • Jingchang Liu
  • Linli Xu

Distributed parallel stochastic gradient descent algorithms are workhorses for large-scale machine learning tasks. Among them, local stochastic gradient descent (Local SGD) has attracted significant attention due to its low communication complexity. Previous studies prove that the communication complexity of Local SGD with a fixed or an adaptive communication period is on the order of O(N^{3/2} T^{1/2}) and O(N^{3/4} T^{3/4}) when the data distributions on clients are identical (IID) or otherwise (Non-IID), where N is the number of clients and T is the number of iterations. In this paper, to accelerate convergence by reducing the communication complexity, we propose STagewise Local SGD (STL-SGD), which gradually increases the communication period along with a decreasing learning rate. We prove that STL-SGD can keep the same convergence rate and linear speedup as mini-batch SGD. In addition, as a benefit of the increasing communication period, when the objective is strongly convex or satisfies the Polyak-Łojasiewicz condition, the communication complexity of STL-SGD is O(N log T) and O(N^{1/2} T^{1/2}) for the IID case and the Non-IID case respectively, achieving significant improvements over Local SGD. Experiments on both convex and non-convex problems demonstrate the superior performance of STL-SGD.
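
Below is a minimal sketch of the stagewise schedule described in the abstract: clients run Local SGD and average their models every few steps, and each stage halves the learning rate while doubling the communication period. The toy quadratic objectives, stage lengths, and constants are illustrative assumptions, not the authors' code or experimental setup.

```python
# Illustrative sketch of stagewise Local SGD (not the authors' code):
# N clients run local SGD and average their models every `period` steps;
# after each stage the learning rate is halved and the period is doubled.
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 10                                  # number of clients, model dimension
A = [rng.standard_normal((d, d)) for _ in range(N)]
b = [rng.standard_normal(d) for _ in range(N)]

def stochastic_grad(i, x):
    """Noisy gradient of client i's least-squares loss (assumed toy objective)."""
    return A[i].T @ (A[i] @ x - b[i]) / d + 0.01 * rng.standard_normal(d)

x = np.zeros(d)                               # model after the last synchronization
lr, period = 0.1, 2                           # initial learning rate / comm. period
for stage in range(5):
    local = [x.copy() for _ in range(N)]
    for t in range(64):                       # local iterations within this stage
        for i in range(N):
            local[i] -= lr * stochastic_grad(i, local[i])
        if (t + 1) % period == 0:             # communication round: average models
            x = np.mean(local, axis=0)
            local = [x.copy() for _ in range(N)]
    lr *= 0.5                                 # stagewise: decrease the learning rate
    period *= 2                               # ... and increase the communication period
print("synchronized model norm:", round(float(np.linalg.norm(x)), 4))
```

Because the period doubles while the stage length stays fixed, the number of averaging rounds per stage shrinks geometrically, which is the mechanism behind the reduced communication complexity claimed in the abstract.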

IJCAI Conference 2019 · Conference Paper

Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent

  • Shuheng Shen
  • Linli Xu
  • Jingchang Liu
  • Xianfeng Liang
  • Yifei Cheng

With the increase in the amount of data and the expansion of model scale, distributed parallel training has become an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, they are limited significantly in practice by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm that runs computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD has a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has lower communication complexity and a better time speedup compared with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 GeForce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3× faster than traditional synchronous SGD.
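
A rough sketch of the decoupling idea follows: averaging of the previous block of local models runs in a background thread while the next block of local updates is computed, and those local updates are then applied on top of the freshly averaged model. The toy least-squares objective, the threading scheme, and all constants are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch of computation/communication decoupling (not the authors'
# implementation): the all-reduce of the previous block of local models runs in a
# background thread while the next block of local SGD steps is computed; the
# locally computed updates are then applied on top of the freshly averaged model.
import threading
import numpy as np

rng = np.random.default_rng(1)
N, d = 4, 10                                   # workers, model dimension
data = [rng.standard_normal((32, d)) for _ in range(N)]

def local_block(x, i, steps=10, lr=0.05):
    """One block of local SGD steps on worker i's toy least-squares loss."""
    x = x.copy()
    for _ in range(steps):
        x -= lr * data[i].T @ (data[i] @ x) / len(data[i])
    return x

models = [np.ones(d) for _ in range(N)]
averaged = {"model": np.mean(models, axis=0)}

def all_reduce(snapshot):
    """Simulated all-reduce (communication) over a snapshot of the local models."""
    averaged["model"] = np.mean(snapshot, axis=0)

for _ in range(5):
    snapshot = [m.copy() for m in models]
    comm = threading.Thread(target=all_reduce, args=(snapshot,))
    comm.start()                               # communication of the old models ...
    updated = [local_block(models[i], i) for i in range(N)]   # ... overlaps compute
    deltas = [updated[i] - snapshot[i] for i in range(N)]
    comm.join()
    # Apply each worker's locally computed update on top of the averaged model.
    models = [averaged["model"] + deltas[i] for i in range(N)]
print("average model norm:", round(float(np.linalg.norm(np.mean(models, axis=0))), 4))
```

In this sketch the communication latency is hidden behind a block of local computation, which is the source of the improved time speedup described in the abstract; the exact synchronization rule used by CoCoD-SGD should be taken from the paper itself.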