Arrow Research search
Back to NeurIPS

NeurIPS 2020

Stochastic Normalization

Conference Paper Artificial Intelligence ยท Machine Learning

Abstract

Fine-tuning pre-trained deep networks on a small dataset is an important component in the deep learning pipeline. A critical problem in fine-tuning is how to avoid over-fitting when data are limited. Existing efforts work from two aspects: (1) impose regularization on parameters or features; (2) transfer prior knowledge to fine-tuning by reusing pre-trained parameters. In this paper, we take an alternative approach by refactoring the widely used Batch Normalization (BN) module to mitigate over-fitting. We propose a two-branch design with one branch normalized by mini-batch statistics and the other branch normalized by moving statistics. During training, two branches are stochastically selected to avoid over-depending on some sample statistics, resulting in a strong regularization effect, which we interpret as ``architecture regularization. '' The resulting method is dubbed stochastic normalization (\textbf{StochNorm}). With the two-branch architecture, it naturally incorporates pre-trained moving statistics in BN layers during fine-tuning, exploiting more prior knowledge of pre-trained networks. Extensive empirical experiments show that StochNorm is a powerful tool to avoid over-fitting in fine-tuning with small datasets. Besides, StochNorm is readily pluggable in modern CNN backbones. It is complementary to other fine-tuning methods and can work together to achieve stronger regularization effect.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
339316840823936818