Neurons in sensory cortex are tuned to diverse features in natural scenes. But what determines which features neurons become selective to? Here we explore the idea that neuronal selectivity is optimized to represent features in the recent sensory past that best predict immediate future inputs. We tested this hypothesis using simple feedforward neural networks, which were trained to predict the next few moments of video or audio in clips of natural scenes. The networks developed receptive fields that closely matched those of real cortical neurons in different mammalian species, including the oriented spatial tuning of primary visual cortex, the frequency selectivity of primary auditory cortex and, most notably, their temporal tuning properties. Furthermore, the better a network predicted future inputs, the more closely its receptive fields resembled those in the brain. This suggests that sensory processing is optimized to extract those features with the most capacity to predict future input.
A large part of our brain is devoted to processing the sensory inputs that we receive from the world. This allows us to tell, for example, whether we are looking at a cat or a dog, and whether we are hearing a bark or a meow. Neurons in the sensory cortex respond to these stimuli by generating spikes of activity. Within each sensory area, neurons respond best to stimuli with precise properties: those in the primary visual cortex prefer edge-like structures that move in a certain direction at a given speed, while neurons in the primary auditory cortex favour sounds that change in loudness over a particular range of frequencies.
Singer et al. sought to understand why neurons respond to the particular features of stimuli that they do. Why do visual neurons react more to moving edges than to, say, rotating hexagons? And why do auditory neurons respond more to certain changing sounds than to, say, constant tones? One leading idea is that the brain tries to use as few spikes as possible to represent real-world stimuli. Known as sparse coding, this principle can account for much of the behaviour of sensory neurons.
Another possibility is that sensory areas respond the way they do because it enables them to best predict future sensory input. To test this idea, Singer et al. used a computer to simulate a network of neurons and trained this network to predict the next few frames of video clips using the previous few frames. Once the network had learned this task, Singer et al. examined the neurons’ preferred stimuli. Like neurons in primary visual cortex, the simulated neurons typically responded most to edges that moved over time.
The same network was also trained in a similar way, but this time using sound. As with neurons in primary auditory cortex, the simulated neurons preferred sounds that changed in loudness at particular frequencies. Notably, for both vision and audition, the simulated neurons favoured recent inputs over those further into the past. In this respect and others, they resembled real neurons more closely than did simulated neurons based on sparse coding.
Artificial networks trained to predict sensory input and the brain therefore favour the same types of stimuli: those that are most useful for predicting what comes next. This suggests that the brain represents the sensory world so as to be able to best predict the future.
Knowing how the brain handles information from our senses may help us to understand disorders associated with sensory processing, such as dyslexia and tinnitus. It may also inspire new approaches for training machines to process sensory inputs, which could improve artificial intelligence.