Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition


Abstract

The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher-performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. A major difficulty in making such a comparison accurately has been the lack of a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and for computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of “kernel analysis” that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
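The abstract describes the kernel-analysis extension only in outline. As a rough illustration of "generalization accuracy as a function of representational complexity", the Python sketch below assumes one-vs-all Gaussian-kernel ridge regression whose complexity is read as 1/λ (the inverse ridge penalty), scored on held-out images. The function names, the Gaussian kernel, the bandwidth `sigma`, and the 1/λ complexity definition are all hypothetical choices made for this example, not the authors' published procedure.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    # Pairwise squared Euclidean distances between rows of X and rows of Y,
    # mapped through a Gaussian (RBF) kernel with bandwidth sigma.
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def accuracy_vs_complexity(X_train, y_train, X_test, y_test, lambdas, sigma):
    """Hypothetical sketch: held-out accuracy of one-vs-all kernel ridge
    regression for each ridge penalty lambda. Complexity is assumed to be
    1/lambda (weaker regularization allows a more complex decision boundary).
    Returns a list of (complexity, accuracy) pairs."""
    classes = np.unique(y_train)
    # One-vs-all regression targets in {-1, +1}, one column per category.
    T = np.where(y_train[:, None] == classes[None, :], 1.0, -1.0)
    K = gaussian_kernel(X_train, X_train, sigma)
    K_test = gaussian_kernel(X_test, X_train, sigma)
    n = len(X_train)
    results = []
    for lam in lambdas:
        # Dual ridge solution: (K + lam * n * I) alpha = T.
        alpha = np.linalg.solve(K + lam * n * np.eye(n), T)
        # Predict the category whose one-vs-all score is largest.
        pred = classes[np.argmax(K_test @ alpha, axis=1)]
        results.append((1.0 / lam, float(np.mean(pred == y_test))))
    return results
```

Plotting the returned accuracies against complexity yields a curve per representation (for example, IT multi-unit features or a DNN layer). Under this reading, a representation in which categories are easier to separate reaches high held-out accuracy at lower complexity.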

Author Summary

Primates are remarkable at determining the category of a visually presented object even in brief presentations, and under changes to object exemplar, position, pose, scale, and background. To date, this behavior has been unmatched by artificial computational systems. However, the field of machine learning has made great strides in producing artificial deep neural network systems that perform well on object recognition benchmarks. In this study, we measured the responses of neural populations in inferior temporal (IT) cortex across thousands of images and compared the performance of neural features to features derived from the latest deep neural networks. Remarkably, we found that the latest artificial deep neural networks achieve performance equal to that of IT cortex. Both deep neural networks and IT cortex create representational spaces in which images with objects of the same category are close, and images with objects of different categories are far apart, even in the presence of large variations in object exemplar, position, pose, scale, and background. Furthermore, we show that the top-level features in these models exceed previous models in predicting the IT neural responses themselves. This result indicates that the latest deep neural networks may provide insight into understanding primate visual processing.
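The idea of a representational space in which same-category images are close and different-category images are far apart can be made concrete with a representational dissimilarity matrix (RDM). The Python sketch below assumes 1 minus the Pearson correlation as the dissimilarity and compares two RDMs by correlating their upper triangles; these choices, and the function names `rdm`, `rdm_similarity`, and `category_separation`, are illustrative assumptions rather than the exact metrics used in the paper.

```python
import numpy as np


def rdm(features):
    """Representational dissimilarity matrix for an (n_images, n_features)
    array: 1 - Pearson correlation between every pair of image rows."""
    return 1.0 - np.corrcoef(features)


def rdm_similarity(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles (excluding the
    diagonal) of two RDMs computed over the same images."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])


def category_separation(features, labels):
    """Mean between-category dissimilarity minus mean within-category
    dissimilarity; positive values mean categories form tighter clusters."""
    d = rdm(features)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = d[same & off_diag].mean()
    between = d[~same].mean()
    return float(between - within)
```

Because 1 minus correlation ignores the overall magnitude of each image's response vector, the comparison depends only on the pattern of responses across units or model features. Computing `rdm` once from IT responses and once from a model layer, then passing both to `rdm_similarity`, gives one number summarizing how similar the two representational spaces are.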


Author and article information

Contributors

Matthias Bethge (Editor)

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (iso-abbrev): PLoS Comput. Biol

Journal ID (publisher-id): plos

Journal ID (pmc): ploscomp

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date (Collection): December 2014

Publication date (Electronic): 18 December 2014

Volume: 10

Issue: 12

Electronic Location Identifier: e1003963

Affiliations

[1] Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

[2] Harvard–MIT Division of Health Sciences and Technology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

University of Tübingen and Max Planck Institute for Biological Cybernetics, Germany

Author notes

* E-mail: cadieu@mit.edu

The authors have declared that no competing interests exist.

Conceived and designed the experiments: CFC HH NP NJM JJD. Performed the experiments: CFC HH DLKY DA EAS NJM. Analyzed the data: CFC HH DLKY DA EAS. Contributed reagents/materials/analysis tools: CFC HH DLKY NP DA EAS. Wrote the paper: CFC HH JJD.

Article

Publisher ID: PCOMPBIOL-D-14-01126

DOI: 10.1371/journal.pcbi.1003963

PMC ID: 4270441

PubMed ID: 25521294

SO-VID: 0d816da9-11d5-4510-a3a2-c3663f6db0fc

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 23 June 2014

Date accepted: 3 October 2014

Page count

Pages: 18

Funding

This work was supported by the U.S. National Eye Institute (NIH NEI: 5R01EY014970-09), the National Science Foundation (NSF: 0964269), and the Defense Advanced Research Projects Agency (DARPA: HR0011-10-C-0032). CFC was supported by the U.S. National Eye Institute (NIH: F32 EY022845-01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Custom metadata

Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are available from http://dicarlolab.mit.edu/.

ScienceOpen disciplines: Quantitative & Systems biology
