Abstract

We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
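The greedy, one-layer-at-a-time procedure described in the abstract can be illustrated with a short sketch. It assumes binary stochastic units, trains each layer as a restricted Boltzmann machine (RBM) with one step of contrastive divergence (CD-1), and passes each layer's hidden probabilities upward as the training data for the next layer. The helper names `fit_rbm_layer` and `greedy_pretrain`, the learning rate, the epoch count, and the toy data are illustrative assumptions, not the authors' code. The sketch also leaves out the label units attached to the top-level associative memory and the contrastive wake-sleep fine-tuning stage.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_rbm_layer(data, n_hidden, epochs, lr, rng):
    """Hypothetical helper: fit a binary RBM to `data` (samples x visibles) with full-batch CD-1."""
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and a binary sample given the data
        ph0 = sigmoid(data @ W + b_hid)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one reconstruction of the visibles, then the hiddens again
        pv1 = sigmoid(h0 @ W.T + b_vis)
        ph1 = sigmoid(pv1 @ W + b_hid)
        # CD-1 approximation to the log-likelihood gradient
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / n_samples
        b_vis += lr * (data - pv1).mean(axis=0)
        b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, b_vis, b_hid


def greedy_pretrain(data, layer_sizes, epochs=500, lr=0.1, seed=0):
    """Hypothetical helper: train a stack of RBMs bottom-up, one layer at a time."""
    rng = np.random.default_rng(seed)
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = fit_rbm_layer(x, n_hidden, epochs, lr, rng)
        layers.append((W, b_vis, b_hid))
        # Freeze this layer; its hidden probabilities become the next layer's "data"
        x = sigmoid(x @ W + b_hid)
    return layers, x


# Toy binary data: two repeated 6-pixel patterns (an assumption, not MNIST digits)
toy = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
layers, top_codes = greedy_pretrain(toy, layer_sizes=[5, 4, 3])
print([W.shape for W, _, _ in layers])  # prints [(6, 5), (5, 4), (4, 3)]
```

As the abstract notes, the top two layers of a deep belief net remain an undirected associative memory, while the lower layers act as directed connections; this sketch only stacks the RBMs and does not model that distinction or the later fine-tuning.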

Most cited references

Gradient-based learning applied to document recognition

Y. LeCun, L. Bottou, Y. Bengio … (1998)

Training products of experts by minimizing contrastive divergence

G. E. Hinton (2002)

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.

Boosting a weak learning algorithm by majority

Y. Freund (1995)

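The contrastive divergence rule summarized in the Hinton (2002) abstract above can be made concrete for one common kind of expert, a binary restricted Boltzmann machine, where each hidden unit plays the role of an expert. The sketch assumes CD-1 (a single step of Gibbs sampling) with full-batch updates, and it uses reconstruction probabilities rather than sampled visible states in the negative statistics, a common practical choice rather than something the abstract specifies. The function names, hyperparameters, and toy data are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(W, b_vis, b_hid, v0, lr, rng):
    """Hypothetical helper: one in-place CD-1 update; returns mean squared reconstruction error."""
    n_samples = v0.shape[0]
    # Data-driven statistics: hidden probabilities given the training vectors
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Model-driven statistics after one Gibbs step (the "reconstruction")
    pv1 = sigmoid(h0 @ W.T + b_vis)
    ph1 = sigmoid(pv1 @ W + b_hid)
    # Contrastive divergence update: <v h>_data - <v h>_reconstruction
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n_samples
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return float(((v0 - pv1) ** 2).mean())


def train_rbm(data, n_hidden, epochs=1000, lr=0.1, seed=0):
    """Hypothetical helper: fit a binary RBM and record the reconstruction error per epoch."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    errors = [cd1_step(W, b_vis, b_hid, data, lr, rng) for _ in range(epochs)]
    return W, b_vis, b_hid, errors


# Toy binary data: two repeated 6-pixel patterns (an assumption for illustration)
toy = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
W, b_vis, b_hid, errors = train_rbm(toy, n_hidden=4)
print(f"reconstruction error: first epoch {errors[0]:.3f}, last epoch {errors[-1]:.3f}")
```

Reconstruction error is used here only as a rough progress signal; it is not the contrastive divergence objective itself, whose derivatives the update approximates.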

Author and article information

Journal: Neural Computation

PubMed ID: 16764513

DOI: 10.1162/neco.2006.18.7.1527

ScienceOpen disciplines: Chemistry

Keywords: Algorithms, Animals, Humans, Learning, physiology, Neural Networks (Computer), Neurons

Title: A fast learning algorithm for deep belief nets
