Arrow Research search

Author name cluster

Noboru Murata

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers

JMLR Journal 2019 Journal Article

Transport Analysis of Infinitely Deep Neural Network

  • Sho Sonoda
  • Noboru Murata

We investigated the feature map inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth---why do DNNs perform better than shallow models?---and the interpretation of DNNs---what do intermediate layers do? Despite the rapid development in their application, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are not faithful. Inspired by the integral representation of shallow NNs, which is the continuum limit of the width, or the hidden unit number, we developed the flow representation and transport analysis of DNNs. The flow representation is the continuum limit of the depth, or the hidden layer number, and it is specified by an ordinary differential equation (ODE) with a vector field. We interpret an ordinary DNN as a transport map or an Euler broken line approximation of the flow. Technically speaking, a dynamical system is a natural model for the nested feature maps. In addition, it opens a new way to the coordinate-free treatment of DNNs by avoiding the redundant parametrization of DNNs. Following Wasserstein geometry, we analyze a flow in three aspects: dynamical system, continuity equation, and Wasserstein gradient flow. A key finding is that we specified a series of transport maps of the denoising autoencoder (DAE), which is a cornerstone for the development of deep learning. Starting from the shallow DAE, this paper develops three topics: the transport map of the deep DAE, the equivalence between the stacked DAE and the composition of DAEs, and the development of the double continuum limit or the integral representation of the flow representation. As partial answers to the research questions, we found that deeper DAEs converge faster and the extracted features are better; in addition, a deep Gaussian DAE transports mass to decrease the Shannon entropy of the data distribution. 
We expect that further investigations on these questions will lead to the development of interpretable and principled alternatives to DNNs.
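As a rough illustration of the "Euler broken line approximation of the flow" mentioned in the abstract above, the following minimal sketch treats each layer as one explicit Euler step of an ODE. The vector field, the function names, and the step sizes are all assumptions chosen for illustration and are not taken from the paper; the field simply contracts points toward the origin so that the exact flow is known in closed form.

```python
import math

def vector_field(x, t):
    # Hypothetical vector field (an assumption, not from the paper):
    # dx/dt = -x, whose exact flow is x(t) = exp(-t) * x(0).
    return [-xi for xi in x]

def euler_flow(x0, total_time, num_layers, field=vector_field):
    """Apply num_layers residual-style updates x <- x + h * field(x, t)."""
    h = total_time / num_layers          # step size shrinks as depth grows
    x = list(x0)
    for layer in range(num_layers):
        t = layer * h
        v = field(x, t)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x

x0 = (1.0, -2.0)
exact = [math.exp(-1.0) * xi for xi in x0]   # exact flow at time 1

shallow = euler_flow(x0, total_time=1.0, num_layers=4)
deep = euler_flow(x0, total_time=1.0, num_layers=64)

err_shallow = max(abs(a - b) for a, b in zip(shallow, exact))
err_deep = max(abs(a - b) for a, b in zip(deep, exact))
print("error with 4 layers: ", err_shallow)
print("error with 64 layers:", err_deep)
```

In this toy setting, increasing the number of layers at fixed total time brings the composed map closer to the continuous flow, which is the sense in which the depth limit is a continuum limit.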

NeurIPS Conference 1999 Conference Paper

Population Decoding Based on an Unfaithful Model

  • Si Wu
  • Hiroyuki Nakahara
  • Noboru Murata
  • Shun-ichi Amari

We study a population decoding paradigm in which the maximum likelihood inference is based on an unfaithful decoding model (UMLI). This is usually the case for neural population decoding because the encoding process of the brain is not exactly known, or because a simplified decoding model is preferred for saving computational cost. We consider an unfaithful decoding model which neglects the pair-wise correlation between neuronal activities, and prove that UMLI is asymptotically efficient when the neuronal correlation is uniform or of limited range. The performance of UMLI is compared with that of the maximum likelihood inference based on a faithful model and that of the center of mass decoding method. It turns out that UMLI has the advantages of decreasing the computational complexity remarkably and maintaining a high level of decoding accuracy at the same time. The effect of correlation on the decoding accuracy is also discussed.
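The two decoders compared in the abstract above can be sketched on a toy population. Everything here is an assumption made for illustration: Gaussian tuning curves, evenly spaced preferred stimuli, noise-free responses, and a grid search for the likelihood maximum. The sketch does not model correlated noise and does not reproduce the paper's analysis; it only shows what "maximum likelihood under a model that ignores correlations" and "center of mass" compute.

```python
import math

def tuning(x, center, width=1.0):
    # Hypothetical Gaussian tuning curve for a neuron preferring `center`.
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def center_of_mass(responses, centers):
    # Response-weighted average of the preferred stimuli.
    total = sum(responses)
    return sum(r * c for r, c in zip(responses, centers)) / total

def independent_ml(responses, centers, grid):
    # Under independent, equal-variance Gaussian noise (correlations ignored),
    # maximizing the likelihood is the same as minimizing squared error.
    def sq_error(x):
        return sum((r - tuning(x, c)) ** 2 for r, c in zip(responses, centers))
    return min(grid, key=sq_error)

centers = [i * 0.5 for i in range(-10, 11)]        # preferred stimuli in [-5, 5]
true_x = 0.7
responses = [tuning(true_x, c) for c in centers]   # noise-free toy responses
grid = [i * 0.01 for i in range(-500, 501)]        # candidate stimulus values

x_ml = independent_ml(responses, centers, grid)
x_com = center_of_mass(responses, centers)
print("independent-model ML estimate:", x_ml)
print("center-of-mass estimate:      ", x_com)
```

With noise-free responses both estimates land near the true stimulus; the differences the abstract discusses only appear once correlated noise is added.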

NeurIPS Conference 1996 Conference Paper

Adaptive On-line Learning in Changing Environments

  • Noboru Murata
  • Klaus-Robert Müller
  • Andreas Ziehe
  • Shun-ichi Amari

An adaptive on-line algorithm extending the learning of learning idea is proposed and theoretically motivated. Relying only on gradient flow information, it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.

NeurIPS Conference 1995 Conference Paper

Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

  • Shun-ichi Amari
  • Noboru Murata
  • Klaus-Robert Müller
  • Michael Finke
  • Howard Yang

A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided into training and testing sets in order to obtain the optimum performance? In the non-asymptotic region, cross-validated early stopping always decreases the generalization error. Our large-scale simulations done on a CM5 are in nice agreement with our analytical findings.

NeurIPS Conference 1992 Conference Paper

Learning Curves, Model Selection and Complexity of Neural Networks

  • Noboru Murata
  • Shuji Yoshizawa
  • Shun-ichi Amari

Learning curves show how a neural network is improved as the number of training examples increases and how it is related to the network complexity. The present paper clarifies asymptotic properties and their relation of two learning curves, one concerning the predictive loss or generalization loss and the other the training loss. The result gives a natural definition of the complexity of a neural network. Moreover, it provides a new criterion of model selection.