Arrow Research search

Author name cluster

Afshin Abdi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers (4)

AAAI Conference 2023 · Conference Paper

Efficient Distributed Inference of Deep Neural Networks via Restructuring and Pruning

  • Afshin Abdi
  • Saeed Rashidi
  • Faramarz Fekri
  • Tushar Krishna

In this paper, we consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers). Specifically, we investigate how a deep model should be divided into several parallel sub-models, each of which is executed efficiently by a worker. Since latency due to synchronization and data transfer among workers negatively impacts the performance of the parallel implementation, it is desirable to have minimum interdependency among parallel sub-models. To achieve this goal, we propose to rearrange the neurons in the neural network, partition them (without changing the general topology of the neural network), and modify the weights so that the interdependency among sub-models is minimized under the computation and communication constraints of the workers, while minimizing the impact on the model's performance. We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model. To apply RePurpose efficiently, we propose an approach based on L0 optimization and the Munkres assignment algorithm. We show that, compared to existing methods, RePurpose significantly improves the efficiency of distributed inference via parallel implementation, in terms of both communication and computational complexity.
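The neuron-rearrangement idea can be illustrated with a toy sketch. This is not the paper's L0/Munkres formulation: here a hypothetical coupling matrix `C` stands in for the inter-neuron dependencies, and a brute-force search over orderings finds the two-way split with the least cross-worker coupling.

```python
from itertools import permutations

# Hypothetical coupling matrix: C[i][j] ~ how much data neurons i and j
# must exchange if they are placed on different workers (illustrative values).
C = [
    [0, 9, 1, 0],
    [9, 0, 0, 1],
    [1, 0, 0, 8],
    [0, 1, 8, 0],
]

def cross_cost(order, split=2):
    """Total coupling between the first `split` neurons and the rest."""
    left, right = order[:split], order[split:]
    return sum(C[i][j] for i in left for j in right)

# Exhaustive search over neuron orderings (feasible only for this toy size).
best = min(permutations(range(4)), key=cross_cost)
print(sorted(best[:2]), sorted(best[2:]), cross_cost(best))
# Groups the strongly coupled pairs {0,1} and {2,3} together.
```

For realistic layer widths, the assignment step would be solved with a polynomial-time method such as the Munkres (Hungarian) algorithm, e.g. `scipy.optimize.linear_sum_assignment`, rather than brute force.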

AAAI Conference 2020 · Conference Paper

Indirect Stochastic Gradient Quantization and Its Application in Distributed Deep Learning

  • Afshin Abdi
  • Faramarz Fekri

Transmitting the gradients or model parameters is a critical bottleneck in distributed training of large models. To mitigate this issue, we propose an indirect quantization and compression of stochastic gradients (SG) via factorization. The gist of the idea is that, in contrast to direct compression methods, we focus on the factors in SGs, i.e., the forward and backward signals in the backpropagation algorithm. We observe that these factors are correlated and generally sparse in most deep models. This motivates rethinking the approaches to quantization and compression of gradients, with the ultimate goal of minimizing the error in the final computed gradients subject to the desired communication constraints. We propose and theoretically analyze different indirect SG quantization (ISGQ) methods. The proposed ISGQ reduces the reconstruction error in SGs compared to direct quantization methods with the same number of quantization bits. Moreover, it can achieve compression gains of more than 100, while existing traditional quantization schemes can achieve a compression ratio of at most 32 (quantizing to 1 bit). Further, for a fixed total batch size, the required transmission bit-rate per worker decreases in ISGQ as the number of workers increases.
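The factorization idea can be sketched in a few lines. For a dense layer, the per-sample stochastic gradient is the outer product of the backward error signal and the forward activation, so one can quantize the two factors (far fewer values) and reconstruct the gradient from them, rather than quantizing every gradient entry directly. The uniform quantizer, sizes, and bit width below are illustrative assumptions, not the paper's ISGQ schemes:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, bits=4):
    """Uniform quantizer over the dynamic range of x (illustrative)."""
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((x - lo) / step) * step

# In backprop, a dense layer's per-sample gradient factors as g = d a^T,
# where a is the forward input and d is the backward error signal.
a = rng.standard_normal(64)    # forward factor (64 values)
d = rng.standard_normal(32)    # backward factor (32 values)
g = np.outer(d, a)             # exact stochastic gradient (2048 values)

g_direct = quantize(g)                                # quantize all 2048 entries
g_indirect = np.outer(quantize(d), quantize(a))       # quantize only 96 entries

err = lambda q: np.linalg.norm(q - g) / np.linalg.norm(g)
print(f"direct: {err(g_direct):.4f}  indirect: {err(g_indirect):.4f}")
```

Note the bit budgets differ sharply: the indirect route transmits 96 quantized values here versus 2048 for direct quantization, which is the source of the large compression gains the abstract describes.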

AAAI Conference 2020 · Conference Paper

Quantized Compressive Sampling of Stochastic Gradients for Efficient Communication in Distributed Deep Learning

  • Afshin Abdi
  • Faramarz Fekri

In distributed training of deep models, the transmission volume of stochastic gradients (SG) imposes a bottleneck in scaling up the number of processing nodes. Moreover, existing methods for compressing SGs have two major drawbacks. First, because compression increases the overall variance of the SG, the hyperparameters of the learning algorithm must be readjusted to ensure convergence of the training, and the convergence rate of the resulting algorithm is still adversely affected. Second, for approaches in which the compressed SG values are biased, there is no guarantee of learning convergence, so an error feedback is often required. We propose Quantized Compressive Sampling (QCS) of SG, which addresses both issues while achieving an arbitrarily large compression gain. We introduce two variants of the algorithm, Unbiased-QCS and MMSE-QCS, and show their superior performance with respect to other approaches. Specifically, we show that for the same number of communication bits, the convergence rate is improved by a factor of 2 relative to the state of the art. Next, we propose to improve the convergence rate of the distributed training algorithm via a weighted error feedback. Specifically, we develop and analyze a method that both controls the overall variance of the compressed SG and prevents staleness of the updates. Finally, through simulations, we validate our theoretical results and establish the superior performance of the proposed SG compression in the distributed training of deep models. Our simulations also demonstrate that the proposed compression method substantially expands the range of step-size values for which the learning algorithm converges.
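A rough sketch of the compress-then-quantize pipeline follows. A random sensing matrix projects the gradient to a lower dimension and the measurements are quantized to a few bits; the receiver reconstructs linearly via the transpose. All parameters (dimensions, bit width, the naive uniform quantizer, the transpose decoder) are illustrative assumptions, not the paper's Unbiased-QCS or MMSE-QCS estimators:

```python
import numpy as np

rng = np.random.default_rng(1)

def qcs_encode(g, m, bits=4):
    """Compressively sample g with a random matrix, then uniformly quantize."""
    Phi = rng.standard_normal((m, g.size)) / np.sqrt(m)  # random sensing matrix
    y = Phi @ g                                          # m << len(g) measurements
    scale = np.abs(y).max() / (2 ** (bits - 1) - 1)
    q = np.round(y / scale).astype(int)                  # low-bit integer codes
    return Phi, q, scale

def qcs_decode(Phi, q, scale):
    """Linear (transpose) reconstruction of the gradient estimate."""
    return Phi.T @ (q * scale)

g = rng.standard_normal(256)                             # a flattened gradient
Phi, q, scale = qcs_encode(g, m=64)
g_hat = qcs_decode(Phi, q, scale)

# The estimate is noisy but correlated with the true gradient direction.
cos = g @ g_hat / (np.linalg.norm(g) * np.linalg.norm(g_hat))
print(f"compression {g.size / 64:.0f}x in dimension, cosine similarity {cos:.3f}")
```

In practice the sensing matrix would be generated from a shared seed rather than transmitted, so only the low-bit codes and the scale cross the network.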

NeurIPS Conference 2017 · Conference Paper

Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

  • Alireza Aghasi
  • Afshin Abdi
  • Nam Nguyen
  • Justin Romberg

We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length $N$ as inputs, we show that if the network response can be described using a maximum number of $s$ non-zero weights per node, these weights can be learned from $\mathcal{O}(s\log N)$ samples.
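The layer-wise sparsification idea can be sketched with a simplified surrogate. Net-Trim's actual convex program enforces consistency of the ReLU outputs through constraints; the toy below instead solves an L1-regularized least-squares fit on the pre-activations with ISTA (iterative soft-thresholding), which still shows the core effect: a much sparser weight matrix whose layer outputs stay close to the trained one. All sizes and the regularization weight are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(x):
    return np.maximum(x, 0)

def trim_layer(X, W, lam=5.0, iters=500):
    """Sparsify one layer's weights while keeping its response to the
    training inputs X close to the original (Lasso surrogate via ISTA)."""
    T = X @ W                                  # target pre-activations
    W_s = np.zeros_like(W)
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the quadratic
    for _ in range(iters):
        grad = X.T @ (X @ W_s - T)             # least-squares gradient
        Z = W_s - grad / L
        W_s = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0)  # soft threshold
    return W_s

X = rng.standard_normal((200, 20))             # layer inputs seen in training
W = rng.standard_normal((20, 10))
W[np.abs(W) < 0.8] = 0                         # a genuinely sparse "true" layer

W_s = trim_layer(X, W)
sparsity = np.mean(W_s == 0)
err = np.linalg.norm(relu(X @ W_s) - relu(X @ W)) / np.linalg.norm(relu(X @ W))
print(f"sparsity {sparsity:.2f}, relative output error {err:.3f}")
```

The convexity of each layer's subproblem is what lets the paper prove performance guarantees; the parallel variant trims all layers independently against the original activations, while the cascade variant feeds each trimmed layer's outputs forward.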