
Author name cluster

An Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers (5)

AAAI 2022 · Conference Paper

Coordinating Momenta for Cross-Silo Federated Learning

  • An Xu
  • Heng Huang

Communication efficiency is crucial for federated learning (FL). Conducting local training steps in clients to reduce the communication frequency between clients and the server is a common method to address this issue. However, this strategy leads to the client drift problem due to non-i.i.d. data distributions in different clients, which severely deteriorates performance. In this work, we propose a new method to improve the training performance in cross-silo FL by maintaining double momentum buffers. In our algorithm, one momentum buffer tracks the server model updating direction, and the other tracks the local model updating direction. More importantly, we introduce a novel momentum fusion technique to coordinate the server and local momentum buffers. We also derive the first theoretical convergence analysis involving both the server and local standard momentum SGD. Extensive deep FL experimental results verify that our new approach achieves better training performance than FedAvg and existing standard momentum SGD variants.
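
A rough NumPy sketch of the double-momentum idea described in this abstract: the server keeps one momentum buffer, each client seeds its local buffer from a fraction of the server momentum (our reading of "momentum fusion"), and the averaged client update drives the server buffer. Names and coefficients (`fuse`, `server_lr`, the seeding rule) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def local_step(w, m_local, grad, lr=0.1, beta=0.9):
    """One local momentum-SGD step on a client."""
    m_local = beta * m_local + grad
    return w - lr * m_local, m_local

def cross_silo_round(w_server, m_server, client_grads,
                     lr=0.1, beta=0.9, server_lr=1.0, fuse=0.1):
    """One communication round with double momentum buffers.

    client_grads: one list of local gradients per client (its local steps).
    m_server tracks the server update direction; each client keeps its
    own local buffer, seeded here with fuse * m_server (assumed form
    of the fusion, for illustration only).
    """
    new_models = []
    for grads in client_grads:
        w, m_local = w_server.copy(), fuse * m_server  # momentum fusion (assumed)
        for g in grads:
            w, m_local = local_step(w, m_local, g, lr, beta)
        new_models.append(w)
    delta = w_server - np.mean(new_models, axis=0)  # server pseudo-gradient
    m_server = beta * m_server + delta              # server momentum buffer
    return w_server - server_lr * m_server, m_server
```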

ICML 2022 · Conference Paper

Detached Error Feedback for Distributed SGD with Random Sparsification

  • An Xu
  • Heng Huang 0001

The communication bottleneck has been a critical problem in large-scale distributed deep learning. In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this important issue, we improve communication-efficient distributed SGD from a novel aspect, namely the trade-off between the variance and the second moment of the gradient. With this motivation, we propose a new detached error feedback (DEF) algorithm, which shows a better convergence bound than error feedback for non-convex problems. We also propose DEF-A to accelerate the generalization of DEF in the early stages of training, which shows better generalization bounds than DEF. Furthermore, we establish the connection between communication-efficient distributed SGD and SGD with iterate averaging (SGD-IA) for the first time. Extensive deep learning experiments show significant empirical improvement of the proposed methods under various settings. Our reproducible code and scripts for all experiments in this work will be made publicly available.
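
For concreteness, a small sketch of the setting this abstract works in: random block-wise sparsification as the compressor, wrapped in classic error feedback. The detached variant (DEF) changes how the residual interacts with the model, which the abstract does not spell out, so only the baseline it improves on is sketched; the block count and step size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_block_sparsify(v, num_blocks=8):
    """Keep one randomly chosen contiguous block of the 1-D vector v,
    zero the rest. With a shared random seed, all workers keep the same
    block, which is what makes the scheme ring-allreduce compatible."""
    out = np.zeros_like(v)
    blocks = np.array_split(np.arange(v.size), num_blocks)
    keep = blocks[rng.integers(num_blocks)]
    out[keep] = v[keep]            # biased compressor; error feedback compensates
    return out

def ef_step(w, e, grad, lr=0.05):
    """Classic error-feedback step (the baseline that DEF modifies)."""
    p = grad + e                   # reinject accumulated compression error
    p_hat = random_block_sparsify(p)
    e = p - p_hat                  # residual carried to the next step
    return w - lr * p_hat, e
```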

AAAI 2021 · Conference Paper

On the Convergence of Communication-Efficient Local SGD for Federated Learning

  • Hongchang Gao
  • An Xu
  • Heng Huang

Federated Learning (FL) has attracted increasing attention in recent years. A leading training algorithm in FL is local SGD, which updates the model parameters on each worker and averages model parameters across different workers only once in a while. Although it has fewer communication rounds than classical parallel SGD, local SGD still incurs a large communication overhead in each communication round for large machine learning models, such as deep neural networks. To address this issue, we propose a new communication-efficient distributed SGD method, which can significantly reduce the communication cost via an error-compensated double compression mechanism. Under the non-convex setting, our theoretical results show that our approach has better communication complexity than existing methods and enjoys the same linear speedup with respect to the number of workers as full-precision local SGD. Moreover, we propose a communication-efficient distributed SGD with momentum, which also has better communication complexity than existing methods and enjoys a linear speedup with respect to the number of workers. Finally, extensive experiments are conducted to verify the performance of our proposed methods.
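
A compact sketch of what error-compensated double compression plausibly looks like per round: compression with an error buffer on both the uplink (worker to server) and the downlink (server broadcast). Top-k is used here as a stand-in compressor, and each worker's update stands for its accumulated local-SGD steps; the paper's actual compressors and buffer placement may differ.

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries of a 1-D vector (stand-in compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def double_compressed_round(w, local_updates, e_workers, e_server, k):
    """One round with error compensation in both communication directions.

    local_updates: each worker's accumulated local-SGD update for the round.
    e_workers / e_server: per-worker and server error buffers (illustrative).
    """
    msgs = []
    for i, u in enumerate(local_updates):
        p = u + e_workers[i]          # uplink: compensate, compress, keep residual
        m = topk(p, k)
        e_workers[i] = p - m
        msgs.append(m)
    avg = np.mean(msgs, axis=0) + e_server
    bcast = topk(avg, k)              # downlink: second compression, own buffer
    e_server = avg - bcast
    return w - bcast, e_workers, e_server
```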

AAAI 2021 · Conference Paper

Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

  • An Xu
  • Zhouyuan Huo
  • Heng Huang

Although distributed machine learning methods can speed up the training of large deep neural networks, the communication cost has become a non-negligible bottleneck constraining performance. To address this challenge, gradient-compression-based communication-efficient distributed learning methods were designed to reduce the communication cost, and more recently local error feedback was incorporated to compensate for the corresponding performance loss. However, in this paper we show that a new “gradient mismatch” problem is raised by local error feedback in centralized distributed training and can lead to degraded performance compared with full-precision training. To solve this critical problem, we propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis. Both our theoretical and empirical results show that our new methods can handle the “gradient mismatch” problem. The experimental results show that, with common gradient compression schemes, we can even train faster than both full-precision training and local error feedback in terms of training epochs, without performance loss.
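
The two techniques named in this abstract can be sketched abstractly. Below, "step ahead" is rendered as evaluating the gradient at a residual-shifted point so the compressed trajectory anticipates its own correction, and "error averaging" as syncing the workers' residual buffers. Both are our guesses at the mechanics from the abstract alone; the paper's exact formulas will differ.

```python
import numpy as np

def step_ahead_ef(w, e, grad_fn, compressor, lr=0.1):
    """Step-ahead error feedback (our reading): take the gradient at a
    point shifted by the residual, so training anticipates the error-
    feedback correction instead of lagging behind it."""
    g = grad_fn(w - lr * e)        # "step ahead" (assumed form of the shift)
    p = g + e
    p_hat = compressor(p)
    e = p - p_hat
    return w - lr * p_hat, e

def average_errors(errors):
    """Error averaging: replace each worker's residual buffer with the
    mean so the local corrections stay consistent across workers."""
    mean_e = np.mean(errors, axis=0)
    return [mean_e.copy() for _ in errors]
```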

IROS 2018 · Conference Paper

Map-based Deep Imitation Learning for Obstacle Avoidance

  • Yuejiang Liu
  • An Xu
  • Zichong Chen

Making an optimal decision to avoid obstacles while heading to the goal is one of the fundamental challenges for mobile robots equipped with limited computational resources. In this paper, we present a deep imitation learning algorithm that develops a computationally efficient obstacle avoidance policy based on egocentric local occupancy maps. The trained model, embedded with a variant of the value iteration networks, is able to provide near-optimal continuous action commands through fast feed-forward inference and generalizes well to unseen planning-based scenarios. To improve policy robustness, we augment the training data set with artificially generated maps, which effectively alleviates the shortage of catastrophic samples in normal demonstrations. Extensive experiments on a Segway robot show the effectiveness of the proposed approach in terms of solution optimality, robustness, and computation time.
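
The one concrete, easily reproduced ingredient here is the map augmentation: padding the demonstration set with artificial occupancy maps so that rare near-collision cases are represented. A toy generator in that spirit, where the obstacle shapes, sizes, and the free/occupied encoding are all our assumptions rather than the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_occupancy_map(size=64, n_obstacles=6, max_radius=8):
    """Toy egocentric occupancy map (0 = free, 1 = occupied) with random
    circular obstacles, standing in for the paper's artificially
    generated training maps."""
    grid = np.zeros((size, size), dtype=np.float32)
    ys, xs = np.mgrid[0:size, 0:size]
    for _ in range(n_obstacles):
        cy, cx = rng.integers(0, size, size=2)   # obstacle center
        r = rng.integers(2, max_radius)          # obstacle radius
        grid[(ys - cy) ** 2 + (xs - cx) ** 2 <= r ** 2] = 1.0
    return grid
```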