Arrow Research search

Author name cluster

Geoffrey E. Hinton

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

40 papers
2 author rows

Possible papers

40

ICLR Conference 2023 Conference Paper

Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning

  • Ting Chen 0007
  • Ruixiang Zhang
  • Geoffrey E. Hinton

We present Bit Diffusion: a simple and generic approach for generating discrete data with continuous state and continuous time diffusion models. The main idea behind our approach is to first represent the discrete data as binary bits, and then train a continuous diffusion model to model these bits as real numbers which we call analog bits. To generate samples, the model first generates the analog bits, which are then thresholded to obtain the bits that represent the discrete variables. We further propose two simple techniques, namely Self-Conditioning and Asymmetric Time Intervals, which lead to a significant improvement in sample quality. Despite its simplicity, the proposed approach can achieve strong performance in both discrete image generation and image captioning tasks. For discrete image generation, we significantly improve previous state-of-the-art on both CIFAR-10 (which has 3K discrete 8-bit tokens) and ImageNet-64x64 (which has 12K discrete 8-bit tokens), outperforming the best autoregressive model in both sample quality (measured by FID) and efficiency. For image captioning on the MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
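
The core bit encoding and thresholding step can be made concrete. Below is a minimal numpy sketch, with function names of my own choosing, that maps integers to analog bits in {-1, 1} and thresholds generated values back to integers; the diffusion model that actually produces the analog bits is omitted.

```python
import numpy as np

def ints_to_analog_bits(x, num_bits=8, scale=1.0):
    """Encode integers as analog bits: binary bits mapped to {-1, 1} and scaled."""
    bits = ((x[..., None] >> np.arange(num_bits)) & 1).astype(np.float32)
    return (bits * 2.0 - 1.0) * scale  # real-valued targets for a continuous diffusion model

def analog_bits_to_ints(analog_bits):
    """Threshold generated analog bits at zero and reassemble the integers."""
    bits = (analog_bits > 0).astype(np.int64)
    return (bits << np.arange(bits.shape[-1])).sum(axis=-1)

x = np.array([0, 3, 200, 255])
assert np.array_equal(analog_bits_to_ints(ints_to_analog_bits(x)), x)
```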

ICLR Conference 2023 Conference Paper

Scaling Forward Gradient With Local Losses

  • Mengye Ren
  • Simon Kornblith
  • Renjie Liao 0001
  • Geoffrey E. Hinton

Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. The standard forward gradient algorithm suffers from the curse of dimensionality in the number of parameters. In this paper, we propose to scale forward gradient by adding a large number of local greedy loss functions. We consider block-wise, patch-wise, and channel group-wise local losses, and show that activity perturbation reduces variance compared to weight perturbation. Inspired by MLPMixer, we also propose a new architecture, LocalMixer, that is more suitable for local learning. We find local learning can work well with both supervised classification and self-supervised contrastive learning. Empirically, it can match backprop on MNIST and CIFAR-10 and significantly outperform backprop-free algorithms on ImageNet.
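
As a rough illustration of the forward gradient estimator the paper builds on, the sketch below estimates a gradient as (directional derivative along a random vector v) times v. Forward-mode automatic differentiation would compute the directional derivative exactly; a finite difference stands in for it here to keep the example dependency-free, and the toy quadratic loss is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    return 0.5 * np.sum(theta ** 2)  # toy stand-in for a local greedy loss

def forward_gradient(loss_fn, theta, eps=1e-4):
    """Weight-perturbation forward gradient: (directional derivative along v) * v."""
    v = rng.standard_normal(theta.shape)
    dir_deriv = (loss_fn(theta + eps * v) - loss_fn(theta - eps * v)) / (2 * eps)
    return dir_deriv * v  # unbiased estimate of the true gradient, but high variance

theta = rng.standard_normal(10)
estimate = np.mean([forward_gradient(loss, theta) for _ in range(5000)], axis=0)
print(np.abs(estimate - theta).max())  # averages toward the true gradient (theta itself)
```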

NeurIPS Conference 2022 Conference Paper

A Unified Sequence Interface for Vision Tasks

  • Ting Chen
  • Saurabh Saxena
  • Lala Li
  • Tsung-Yi Lin
  • David J. Fleet
  • Geoffrey E. Hinton

While language tasks are naturally expressed in a single, unified, modeling framework, i.e., generating sequences of tokens, this has not been the case in computer vision. As a result, there is a proliferation of distinct architectures and loss functions for different vision tasks. In this work we show that a diverse set of "core" computer vision tasks can also be unified if formulated in terms of a shared pixel-to-sequence interface. We focus on four tasks, namely, object detection, instance segmentation, keypoint detection, and image captioning, all with diverse types of outputs, e.g., bounding boxes or dense masks. Despite that, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization. To solve a specific task, we use a short prompt as task description, and the sequence output adapts to the prompt so it can produce task-specific output. We show that such a model can achieve competitive performance compared to well-established task-specific models.

ICLR Conference 2022 Conference Paper

Pix2seq: A Language Modeling Framework for Object Detection

  • Ting Chen 0007
  • Saurabh Saxena
  • Lala Li
  • David J. Fleet
  • Geoffrey E. Hinton

We present Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural network to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural network knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.
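
The tokenization of a bounding box can be sketched as follows; the bin count, token layout, and helper name are assumptions made for illustration rather than the paper's exact vocabulary.

```python
import numpy as np

NUM_BINS = 1000          # coordinate vocabulary size, assumed for illustration
CLASS_OFFSET = NUM_BINS  # class tokens placed after the coordinate tokens

def box_to_tokens(box, label, img_h, img_w, num_bins=NUM_BINS):
    """Quantize a (ymin, xmin, ymax, xmax) box into five discrete tokens."""
    ymin, xmin, ymax, xmax = box
    norm = np.array([ymin / img_h, xmin / img_w, ymax / img_h, xmax / img_w])
    coords = np.clip((norm * (num_bins - 1)).round().astype(int), 0, num_bins - 1)
    return coords.tolist() + [CLASS_OFFSET + label]

print(box_to_tokens((10, 20, 200, 300), label=5, img_h=480, img_w=640))
```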

NeurIPS Conference 2021 Conference Paper

Canonical Capsules: Self-Supervised Capsules in Canonical Pose

  • Weiwei Sun
  • Andrea Tagliasacchi
  • Boyang Deng
  • Sara Sabour
  • Soroosh Yazdani
  • Geoffrey E. Hinton
  • Kwang Moo Yi

We propose a self-supervised capsule architecture for 3D point clouds. We compute capsule decompositions of objects through permutation-equivariant attention, and self-supervise the process by training with pairs of randomly rotated objects. Our key idea is to aggregate the attention masks into semantic keypoints, and use these to supervise a decomposition that satisfies the capsule invariance/equivariance properties. This not only enables the training of a semantically consistent decomposition, but also allows us to learn a canonicalization operation that enables object-centric reasoning. To train our neural network we require neither classification labels nor manually-aligned training datasets. Yet, by learning an object-centric representation in a self-supervised manner, our method outperforms the state-of-the-art on 3D point cloud reconstruction, canonicalization, and unsupervised classification.

NeurIPS Conference 2021 Conference Paper

Neural Additive Models: Interpretable Machine Learning with Neural Nets

  • Rishabh Agarwal
  • Levi Melnick
  • Nicholas Frosst
  • Xuezhou Zhang
  • Ben Lengerich
  • Rich Caruana
  • Geoffrey E. Hinton

Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. However, their accuracy comes at the cost of intelligibility: it is usually unclear how they make their decisions. This hinders their applicability to high stakes decision-making domains such as healthcare. We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature. These networks are trained jointly and can learn arbitrarily complex relationships between their input feature and the output. Our experiments on regression and classification datasets show that NAMs are more accurate than widely used intelligible models such as logistic regression and shallow decision trees. They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees. To demonstrate this, we show how NAMs can be used for multitask learning on synthetic data and on the COMPAS recidivism data due to their composability, and demonstrate that the differentiability of NAMs allows them to train more complex interpretable models for COVID-19.
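
A bare-bones forward pass of this idea looks like the sketch below: one tiny MLP per input feature, with the per-feature outputs summed plus a bias. It uses plain ReLU units rather than the ExU units described in the paper, and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyNAM:
    """Minimal Neural Additive Model sketch: one small MLP per input feature,
    with the per-feature outputs summed (plus a bias) to form the prediction."""

    def __init__(self, num_features, hidden=16):
        self.params = [
            {"w1": rng.standard_normal((1, hidden)) * 0.1,
             "b1": np.zeros(hidden),
             "w2": rng.standard_normal((hidden, 1)) * 0.1}
            for _ in range(num_features)
        ]
        self.bias = 0.0

    def feature_nets(self, X):
        # f_j(x_j) for each feature j; each net sees only its own column
        outs = []
        for j, p in enumerate(self.params):
            h = np.maximum(X[:, [j]] @ p["w1"] + p["b1"], 0.0)  # ReLU hidden layer
            outs.append(h @ p["w2"])
        return np.concatenate(outs, axis=1)  # shape (batch, num_features)

    def predict(self, X):
        return self.feature_nets(X).sum(axis=1) + self.bias

model = TinyNAM(num_features=4)
print(model.predict(rng.standard_normal((3, 4))).shape)  # (3,)
```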

ICLR Conference 2021 Conference Paper

Teaching with Commentaries

  • Aniruddh Raghu
  • Maithra Raghu
  • Simon Kornblith
  • David Duvenaud
  • Geoffrey E. Hinton

Effective training of deep neural networks can be challenging, and there remain many open questions on how to best learn these models. Recently developed methods to improve neural network training examine teaching: providing learned information during the training process to improve downstream model performance. In this paper, we take steps towards extending the scope of teaching. We propose a flexible teaching framework using commentaries, learned meta-information helpful for training on a particular task. We present gradient-based methods to learn commentaries, leveraging recent work on implicit differentiation for scalability. We explore diverse applications of commentaries, from weighting training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process. We also observe that commentaries generalise: they can be reused when training new models to obtain performance benefits, suggesting a use-case where commentaries are stored with a dataset and leveraged in future for improved model training.

ICML Conference 2021 Conference Paper

Unsupervised Part Representation by Flow Capsules

  • Sara Sabour
  • Andrea Tagliasacchi
  • Soroosh Yazdani
  • Geoffrey E. Hinton
  • David J. Fleet

Capsule networks aim to parse images into a hierarchy of objects, parts and relations. While promising, they remain limited by an inability to learn effective low level part descriptions. To address this issue we propose a way to learn primary capsule encoders that detect atomic parts from a single image. During training we exploit motion as a powerful perceptual cue for part definition, with an expressive decoder for part generation within a layered image model with occlusion. Experiments demonstrate robust part discovery in the presence of multiple objects, cluttered backgrounds, and occlusion. The learned part decoder is shown to infer the underlying shape masks, effectively filling in occluded regions of the detected shapes. We evaluate FlowCapsules on unsupervised part segmentation and unsupervised image classification.

ICML Conference 2020 Conference Paper

A Simple Framework for Contrastive Learning of Visual Representations

  • Ting Chen 0007
  • Simon Kornblith
  • Mohammad Norouzi 0002
  • Geoffrey E. Hinton

This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
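
The contrastive objective (the NT-Xent loss) reduces to a few lines. The numpy sketch below assumes two batches of projected representations from two augmented views of the same images; it is an illustrative reimplementation, not the authors' code.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss on two augmented views (2N examples in total)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive indices
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()

rng = np.random.default_rng(0)
print(nt_xent(rng.standard_normal((8, 32)), rng.standard_normal((8, 32))))
```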

NeurIPS Conference 2020 Conference Paper

Big Self-Supervised Models are Strong Semi-Supervised Learners

  • Ting Chen
  • Simon Kornblith
  • Kevin Swersky
  • Mohammad Norouzi
  • Geoffrey E. Hinton

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels (≤13 labeled images per class) using ResNet-50, a 10X improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
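
The third, distillation step can be illustrated with a standard soft-label cross-entropy between teacher and student predictions on unlabeled data; the temperature and function names below are illustrative, not taken from the paper's code.

```python
import numpy as np

def softmax(logits, T):
    logits = logits / T
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=1.0):
    """Cross-entropy of student predictions against teacher soft labels on unlabeled data."""
    teacher = softmax(teacher_logits, T)
    log_student = np.log(softmax(student_logits, T) + 1e-12)
    return -(teacher * log_student).sum(axis=1).mean()

rng = np.random.default_rng(0)
print(distillation_loss(rng.standard_normal((4, 10)), rng.standard_normal((4, 10))))
```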

ICLR Conference 2020 Conference Paper

Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions

  • Yao Qin 0001
  • Nicholas Frosst
  • Sara Sabour
  • Colin Raffel
  • Garrison W. Cottrell
  • Geoffrey E. Hinton

Adversarial examples raise questions about whether neural network models are sensitive to the same visual features as humans. In this paper, we first detect adversarial examples or otherwise corrupted images based on a class-conditional reconstruction of the input. To specifically attack our detection mechanism, we propose the Reconstructive Attack which seeks both to cause a misclassification and a low reconstruction error. This reconstructive attack produces undetected adversarial examples but with a much smaller success rate. Among all these attacks, we find that CapsNets always perform better than convolutional networks. Then, we diagnose the adversarial examples for CapsNets and find that the success of the reconstructive attack is highly related to the visual similarity between the source and target class. Additionally, the resulting perturbations can cause the input image to appear visually more like the target class and hence become non-adversarial. This suggests that CapsNets use features that are more aligned with human perception and have the potential to address the central issue raised by adversarial examples.

ICML Conference 2020 Conference Paper

Imputer: Sequence Modelling via Imputation and Dynamic Programming

  • William Chan
  • Chitwan Saharia
  • Geoffrey E. Hinton
  • Mohammad Norouzi 0002
  • Navdeep Jaitly

This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations. The Imputer is an iterative generation model, requiring only a constant number of generation steps independent of the number of input or output tokens. The Imputer can be trained to approximately marginalize over all possible alignments between the input and output sequences, and all possible generation orders. We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood. When applied to end-to-end speech recognition, the Imputer outperforms prior non-autoregressive models and achieves competitive results to autoregressive models. On LibriSpeech test-other, the Imputer achieves 11.1 WER, outperforming CTC at 13.0 WER and seq2seq at 12.5 WER.

ICML Conference 2019 Conference Paper

Analyzing and Improving Representations with the Soft Nearest Neighbor Loss

  • Nicholas Frosst
  • Nicolas Papernot
  • Geoffrey E. Hinton

We explore and expand the Soft Nearest Neighbor Loss to measure the entanglement of class manifolds in representation space: i.e., how close pairs of points from the same class are relative to pairs of points from different classes. We demonstrate several use cases of the loss. As an analytical tool, it provides insights into the evolution of class similarity structures during learning. Surprisingly, we find that maximizing the entanglement of representations of different classes in the hidden layers is beneficial for discrimination in the final layer, possibly because it encourages representations to identify class-independent similarity structures. Maximizing the soft nearest neighbor loss in the hidden layers leads not only to better-calibrated estimates of uncertainty on outlier data but also marginally improved generalization. Data that is not from the training distribution can be recognized by observing that in the hidden layers, it has fewer than the normal number of neighbors from the predicted class.
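
For reference, the loss itself can be written compactly: for each point, take the negative log of the softmax-weighted fraction of its neighbors (excluding itself) that share its class, and average over points. A minimal numpy sketch, with illustrative names:

```python
import numpy as np

def soft_nearest_neighbor_loss(x, y, temperature=1.0):
    """Soft nearest neighbor loss: -log of the softmax-weighted fraction of each
    point's neighbors (self excluded) that share its class, averaged over points."""
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    w = np.exp(-d / temperature)
    np.fill_diagonal(w, 0.0)                             # exclude the point itself
    same = (y[:, None] == y[None, :]).astype(float)
    frac = (w * same).sum(1) / w.sum(1)
    return -np.log(frac + 1e-12).mean()

rng = np.random.default_rng(0)
x, y = rng.standard_normal((6, 3)), np.array([0, 0, 0, 1, 1, 1])
print(soft_nearest_neighbor_loss(x, y))
```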

ICML Conference 2019 Conference Paper

Similarity of Neural Network Representations Revisited

  • Simon Kornblith
  • Mohammad Norouzi 0002
  • Honglak Lee
  • Geoffrey E. Hinton

Recent work has sought to understand the behavior of neural networks by comparing representations between layers and between different trained models. We examine methods for comparing neural network representations based on canonical correlation analysis (CCA). We show that CCA belongs to a family of statistics for measuring multivariate similarity, but that neither CCA nor any other statistic that is invariant to invertible linear transformation can measure meaningful similarities between representations of higher dimension than the number of data points. We introduce a similarity index that measures the relationship between representational similarity matrices and does not suffer from this limitation. This similarity index is equivalent to centered kernel alignment (CKA) and is also closely connected to CCA. Unlike CCA, CKA can reliably identify correspondences between representations in networks trained from different initializations.
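
Linear CKA in particular is simple to compute from centered representation matrices; the sketch below is a common formulation rather than the authors' exact code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (rows = examples, cols = features)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
print(linear_cka(A, A @ Q))                           # ~1: invariant to orthogonal transforms
print(linear_cka(A, rng.standard_normal((100, 30))))  # much lower for unrelated representations
```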

UAI Conference 2013 Conference Paper

Modeling Documents with Deep Boltzmann Machines

  • Nitish Srivastava
  • Ruslan Salakhutdinov
  • Geoffrey E. Hinton

We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This enables an efficient pretraining algorithm and a state initialization scheme for fast inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

ICML Conference 2013 Conference Paper

On the importance of initialization and momentum in deep learning

  • Ilya Sutskever
  • James Martens
  • George E. Dahl
  • Geoffrey E. Hinton

Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. We find that both the initialization and the momentum are crucial since poorly initialized networks cannot be trained with momentum and well-initialized networks perform markedly worse when the momentum is absent or poorly tuned. Our success training these models suggests that previous attempts to train deep and recurrent neural networks from random initializations have likely failed due to poor initialization schemes. Furthermore, carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives without the need for sophisticated second-order methods.
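
A minimal sketch of gradient descent with classical momentum and a slowly increasing momentum schedule, run on a toy quadratic loss; the specific ramp below is one plausible instance of such a schedule rather than a quote of the paper's experimental settings.

```python
import numpy as np

def grad(theta):
    return theta  # gradient of a toy quadratic loss 0.5 * ||theta||^2

def sgd_momentum(theta, steps=500, lr=0.01, mu_max=0.99):
    v = np.zeros_like(theta)
    for t in range(steps):
        # momentum coefficient ramps up slowly toward mu_max as training progresses
        mu = min(1.0 - 2.0 ** (-1 - np.log2(t // 250 + 1)), mu_max)
        v = mu * v - lr * grad(theta)
        theta = theta + v
    return theta

rng = np.random.default_rng(0)
print(np.linalg.norm(sgd_momentum(rng.standard_normal(10))))  # shrinks toward 0
```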

ICML Conference 2013 Conference Paper

Tensor Analyzers

  • Yichuan Tang
  • Ruslan Salakhutdinov
  • Geoffrey E. Hinton

Factor Analysis is a statistical method that seeks to explain linear variations in data by using unobserved latent variables. Due to its additive nature, it is not suitable for modeling data that is generated by multiple groups of latent factors which interact multiplicatively. In this paper, we introduce Tensor Analyzers which are a multilinear generalization of Factor Analyzers. We describe an efficient way of sampling from the posterior distribution over factor values and we demonstrate that these samples can be used in the EM algorithm for learning interesting mixture models of natural image patches. Tensor Analyzers can also accurately recognize a face under significant pose and illumination variations when given only one previous image of that face. We also show that Tensor Analyzers can be trained in unsupervised, semi-supervised, or fully supervised settings.

UAI Conference 2011 Conference Paper

Conditional Restricted Boltzmann Machines for Structured Output Prediction

  • Volodymyr Mnih
  • Hugo Larochelle
  • Geoffrey E. Hinton

Conditional Restricted Boltzmann Machines (CRBMs) are rich probabilistic models that have recently been applied to a wide range of problems, including collaborative filtering, classification, and modeling motion capture data. While much progress has been made in training non-conditional RBMs, these algorithms are not applicable to conditional models and there has been almost no work on training and generating predictions from conditional RBMs for structured output problems. We first argue that standard Contrastive Divergence-based learning may not be suitable for training CRBMs. We then identify two distinct types of structured output prediction problems and propose an improved learning algorithm for each. The first problem type is one where the output space has arbitrary structure but the set of likely output configurations is relatively small, such as in multi-label classification. The second problem is one where the output space is arbitrarily structured but where the output space variability is much greater, such as in image denoising or pixel labeling. We show that the new learning algorithms can work much better than Contrastive Divergence on both types of problems.

JMLR Journal 2011 Journal Article

Two Distributed-State Models For Generating High-Dimensional Time Series

  • Graham W. Taylor
  • Geoffrey E. Hinton
  • Sam T. Roweis

In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued "visible" variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This "conditional" RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/~gwtaylor/publications/jmlr2011.

ICML Conference 2009 Conference Paper

Factored conditional restricted Boltzmann Machines for modeling motion style

  • Graham W. Taylor
  • Geoffrey E. Hinton

The Conditional Restricted Boltzmann Machine (CRBM) is a recently proposed model for time series that has a rich, distributed hidden state and permits simple, exact inference. We present a new model, based on the CRBM, that preserves its most important computational properties and includes multiplicative three-way interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. We factor the three-way weight tensor implied by the multiplicative model, reducing the number of parameters from O(N^3) to O(N^2). The result is an efficient, compact model whose effectiveness we demonstrate by modeling human motion. Like the CRBM, our model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve the model's ability to blend motion styles or to transition smoothly among them.
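
The factorization can be checked numerically: writing the three-way tensor as W_ijk = Σ_f A_if B_jf C_kf lets the pairwise-gated interaction be computed with three small matrix products instead of a cubic contraction. The sketch below, with illustrative sizes, verifies the equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F = 20, 8  # N units per group, F factors (illustrative sizes)
A, B, C = (rng.standard_normal((N, F)) for _ in range(3))

# Full three-way tensor: O(N^3) parameters
W = np.einsum("if,jf,kf->ijk", A, B, C)

x, y = rng.standard_normal(N), rng.standard_normal(N)

# Interaction term computed two ways: via the full tensor vs. the factored form,
# which only touches the O(N*F) parameters of each factor matrix.
full = np.einsum("ijk,i,j->k", W, x, y)
factored = C @ ((A.T @ x) * (B.T @ y))
print(np.allclose(full, factored))  # True
```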

UAI Conference 2009 Conference Paper

Products of Hidden Markov Models: It Takes N>1 to Tango

  • Graham W. Taylor
  • Geoffrey E. Hinton

Products of Hidden Markov Models (PoHMMs) are an interesting class of generative models which have received little attention since their introduction. This may be in part due to their more computationally expensive gradient-based learning algorithm, and the intractability of computing the log likelihood of sequences under the model. In this paper, we demonstrate how the partition function can be estimated reliably via Annealed Importance Sampling. We perform experiments using contrastive divergence learning on rainfall data and data captured from pairs of people dancing. Our results suggest that advances in learning and evaluation for undirected graphical models and recent increases in available computing power make PoHMMs worth considering for complex time-series modeling tasks.

ICML Conference 2007 Conference Paper

Restricted Boltzmann machines for collaborative filtering

  • Ruslan Salakhutdinov
  • Andriy Mnih
  • Geoffrey E. Hinton

Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM's), can be used to model tabular data, such as user's ratings of movies. We present efficient learning and inference procedures for this class of models and demonstrate that RBM's can be successfully applied to the Netflix data set, containing over 100 million user/movie ratings. We also show that RBM's slightly outperform carefully-tuned SVD models. When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix's own system.
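
For orientation, a single contrastive-divergence (CD-1) update for a plain binary RBM looks like the sketch below; the paper's model additionally uses softmax visible units over rating values and handles missing ratings, which this illustration omits.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_h, b_v, lr=0.05):
    """One CD-1 update for a binary RBM on a batch of binarized rating vectors."""
    h_prob0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(h_prob0.shape) < h_prob0).astype(float)  # sample hidden states
    v_prob1 = sigmoid(h0 @ W.T + b_v)                          # one reconstruction step
    h_prob1 = sigmoid(v_prob1 @ W + b_h)
    W += lr * (v0.T @ h_prob0 - v_prob1.T @ h_prob1) / len(v0)
    b_h += lr * (h_prob0 - h_prob1).mean(axis=0)
    b_v += lr * (v0 - v_prob1).mean(axis=0)
    return W, b_h, b_v

n_visible, n_hidden = 50, 10
W = rng.standard_normal((n_visible, n_hidden)) * 0.01
b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
batch = (rng.random((16, n_visible)) < 0.3).astype(float)  # toy binarized ratings
W, b_h, b_v = cd1_step(batch, W, b_h, b_v)
```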

ICML Conference 2007 Conference Paper

Three new graphical models for statistical language modelling

  • Andriy Mnih
  • Geoffrey E. Hinton

The supremacy of n-gram models in statistical language modelling has recently been challenged by parametric models that use distributed representations to counteract the difficulties caused by data sparsity. We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. We show how real-valued distributed representations for words can be learned at the same time as learning a large set of stochastic binary hidden features that are used to predict the distributed representation of the next word from previous distributed representations. Adding connections from the previous states of the binary hidden features improves performance as does adding direct connections between the real-valued distributed representations. One of our models significantly outperforms the very best n-gram models.

JMLR Journal 2004 Journal Article

Reinforcement Learning with Factored States and Actions

  • Brian Sallans
  • Geoffrey E. Hinton

A novel approximation method is presented for approximating the value function and selecting good actions for Markov decision processes with large state and action spaces. The method approximates state-action values as negative free energies in an undirected graphical model called a product of experts. The model parameters can be learned efficiently because values and derivatives can be efficiently computed for a product of experts. Actions can be found even in large factored action spaces by the use of Markov chain Monte Carlo sampling. Simulation results show that the product of experts approximation can be used to solve large problems. In one simulation it is used to find actions in action spaces of size 2^40.

UAI Conference 2003 Conference Paper

Efficient Parametric Projection Pursuit Density Estimation

  • Max Welling
  • Richard S. Zemel
  • Geoffrey E. Hinton

Product models of low dimensional experts are a powerful way to avoid the curse of dimensionality. We present the "under-complete product of experts" (UPoE), where each expert models a one dimensional projection of the data. The UPoE is fully tractable and may be interpreted as a parametric probabilistic model for projection pursuit. Its ML learning rules are identical to the approximate learning rules proposed before for under-complete ICA. We also derive an efficient sequential learning algorithm and discuss its relationship to projection pursuit density estimation and feature induction algorithms for additive random field models.

JMLR Journal 2003 Journal Article

Energy-Based Models for Sparse Overcomplete Representations

  • Yee Whye Teh
  • Max Welling
  • Simon Osindero
  • Geoffrey E. Hinton

We present a new way of extending independent components analysis (ICA) to overcomplete representations. In contrast to the causal generative extensions of ICA which maintain marginal independence of sources, we define features as deterministic (linear) functions of the inputs. This assumption results in marginal dependencies among the features, but conditional independence of the features given the inputs. By assigning energies to the features a probability distribution over the input states is defined through the Boltzmann distribution. Free parameters of this model are trained using the contrastive divergence objective (Hinton, 2002). When the number of features is equal to the number of input dimensions this energy-based model reduces to noiseless ICA and we show experimentally that the proposed learning algorithm is able to perform blind source separation on speech data. In additional experiments we train overcomplete energy-based models to extract features from various standard data-sets containing speech, natural images, hand-written digits and faces.

UAI Conference 2001 Conference Paper

Discovering Multiple Constraints that are Frequently Approximately Satisfied

  • Geoffrey E. Hinton
  • Yee Whye Teh

Some high-dimensional data sets can be modelled by assuming that there are many different linear constraints, each of which is Frequently Approximately Satisfied (FAS) by the data. The probability of a data vector under the model is then proportional to the product of the probabilities of its constraint violations. We describe three methods of learning products of constraints using a heavy-tailed probability distribution for the violations.