We begin this section with a description of the early processing of visual information, which has served both explicitly and implicitly as a model for cognitive processing. Pioneering experiments performed by Kuffler in the 1950s revealed that the output neurons of the retina (ganglion cells) are responsive to very particular patterns of light [Kuffler 53]. A ganglion cell receives input from a limited group of receptor cells, which determines the region of visual space that the cell will "see." This region is called the cell's receptive field. An important determinant of ganglion cell function is the antagonism between the effects of light falling on the center region and on the surround region of the receptive field; some cells respond to light in the center and are silenced by light in the periphery, while others do the opposite. Retinal ganglion cells project to the thalamus, which in turn projects to the primary visual cortex. The receptive fields of cells in the thalamus have the same center-surround organization as those of the retinal ganglion cells. The receptive fields of cortical neurons, however, are not circularly symmetric. They are elongated along one axis, which gives the cell the property of orientation selectivity []. It was proposed that by properly aligning the circular receptive fields of the input cells one can build a receptive field with an elongated center and surround [Hubel 77a]. Other properties of cortical cells, such as disparity selectivity, can also be explained by having units from lower stages converge on the cell of interest, so that its receptive field is essentially the sum of the lower-stage receptive fields. This led to models which, to use the phrasing of Horace Barlow, assume that the basic function of the cortex is the detection of correlations [Barlow 72]. The outputs of primary sensory cortices would converge onto cells in secondary cortices, and so on, in a hierarchy of processing.
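The convergence scheme described above can be made concrete with a toy calculation. The following is only a sketch under assumptions that do not come from the text: each center-surround unit is modeled as a difference of two Gaussians (a common simplification), the grid size, widths, and spacing are arbitrary, and all function names are hypothetical. Summing several such units whose centers lie along one axis yields a model cell that responds more strongly to a bar aligned with that axis than to a bar perpendicular to it.

```python
import math

SIZE = 21  # side length of the square "visual field" grid (arbitrary assumption)

def dog(x, y, cx, cy, sigma_c=1.0, sigma_s=2.0):
    """Weight at (x, y) of an on-center unit centered at (cx, cy).

    Assumed difference-of-Gaussians model: a narrow excitatory center
    minus a broader inhibitory surround.
    """
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    center = math.exp(-d2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
    surround = math.exp(-d2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2)
    return center - surround

def summed_field(centers):
    """Receptive field of a model cell that sums its input units."""
    return [[sum(dog(x, y, cx, cy) for cx, cy in centers)
             for x in range(SIZE)]
            for y in range(SIZE)]

def bar(vertical, half_width=1):
    """Binary image of a bright bar through the middle of the grid."""
    mid = SIZE // 2
    return [[1.0 if abs((x if vertical else y) - mid) <= half_width else 0.0
             for x in range(SIZE)]
            for y in range(SIZE)]

def response(field, image):
    """Rectified weighted sum of the image under the receptive field."""
    total = sum(field[y][x] * image[y][x]
                for y in range(SIZE) for x in range(SIZE))
    return max(0.0, total)

# Five input units with centers aligned along the vertical axis (x = 10).
centers = [(10, cy) for cy in range(4, 17, 3)]
field = summed_field(centers)

r_vertical = response(field, bar(vertical=True))
r_horizontal = response(field, bar(vertical=False))
print("vertical bar:", r_vertical)
print("horizontal bar:", r_horizontal)
```

In this sketch the vertical bar covers the centers of all five input units, whereas the horizontal bar covers the center of one unit and the surrounds of the others, so the excitation it produces is largely cancelled. The orientation preference arises purely from the alignment of the inputs, which is the point of the proposal; nothing here is meant as a quantitative model of cortical cells.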
Following this line of thinking one can imagine building cells with more and more complex response properties. This notion was given substantial support by the report of cells in inferotemporal cortex that respond only to particular objects [Gross 72]. Significantly, certain regions were identified that contain cells which are very selective for particular classes of complex stimuli with clear ethological relevance, such as the famous hand [Gross 72] and face [Perrett 82] cells. Unfortunately, there are some serious problems with this view, and over the last couple of decades evidence has accumulated that the system is considerably more complex than it implies. What is wrong? Firstly, the hierarchical model relies primarily on feedforward connections. However, it is well known that there are substantial feedback connections both between adjacent areas of cortical processing and between areas that are widely separated. Feedback occurs even at the earliest stages of processing. There is a substantial feedback projection, for example, from the sub-granular layers of the primary visual cortex to the thalamus. The role of these feedback connections is even more poorly understood than that of the feedforward connections; however, given their substantial density, they are certainly involved in important aspects of brain function.
Secondly, implicit in the hierarchical model is the concept of a labeled line, according to which each single neuron signals unambiguously the presence of a particular stimulus feature and eventually, at the top of the hierarchy, a whole object. This hypothesis has been called the "Neuron Doctrine for Perception" [Barlow 72], and is more popularly known as the "grandmother cell" theory. However, in addition to the fact that response properties reflect feedback from "higher" levels of the system, there are extensive networks of lateral connections within and between cortical areas. This means that the output of any given neuron is highly dependent on the context in which stimulation occurs. As a consequence, the response properties of a single cell depend on the activity of other cells in the same area and in other areas which project to the location of the cell. A single cell cannot unambiguously signal the presence of even a simple stimulus because, in order to interpret its output, one must know the activity of a large number of other cells. Thirdly, more recent evidence has challenged the notion that the neurons of the inferotemporal cortex are actually selective for particular objects. Careful experiments have shown that these neurons may in fact respond preferentially to much simpler conjunctions of stimulus features that may occur in the object to which the neuron was thought to respond [Kobatake 94, Tanaka 91]. Thus, like neurons in the primary sensory areas, neurons in higher stages of processing must participate in the representation of a large number of different stimuli.
Fourthly, the hierarchical model implies that each subsequent stage of the hierarchy is involved in more complex or abstract processing than the preceding stages. In fact, recent investigations have challenged this notion. We will discuss a few examples.
Human beings and other animals are capable of substantial improvement in their performance of perceptual discriminations. Perceptual learning is now thought to involve mechanisms operating in the primary sensory cortices. Improvement on tactile discrimination tasks has been shown to involve changes in somatosensory cortex [Recanzone 92a, Recanzone 92b, Recanzone 92c, Recanzone 93]. The learning of certain types of auditory discriminations involves changes in the response properties of neurons in the primary auditory cortex []. A wide variety of psychophysical results have indicated that the learning of visual discriminations involves the lower levels of cortical visual processing. This evidence has been bolstered recently by the demonstration of changes in the response of primary visual cortical neurons to trained stimuli [Crist 97, Ghose 97, Schoups 97]. These results imply that the representation of trained stimuli depends on activity in the primary sensory cortices.
In addition to learning, the deployment of attention to particular features of the environment is now thought to depend on mechanisms operating in the lower stages of the sensory processing hierarchy. Differential responses of neurons in the primary visual cortex have been reported which correspond to different attentional states [Ito 99]. In addition, in the visual system, the effects of attention have been shown in various other areas related to processing particular components of visual images [Kastner 99, McAdams 99, Moran 85, Reynolds 99, Treue 96].
Finally, and perhaps most suggestively, it has been proposed that the experience of visual imagery depends on the initial, retinotopically organized stages of the cortical visual processing system. Using a combination of psychophysical techniques and positron emission tomography, Stephen Kosslyn and colleagues demonstrated that when human subjects are asked to perform certain kinds of cognitive tasks (such as mentally rotating an object) these areas are differentially activated [Kosslyn 93]. Recent fMRI studies have also indicated that visual mental imagery may involve even the primary visual cortex [Le Bihan 93, Menon 93].

Implications for Representation

These and other examples of complex processing by the lower stages of the sensory processing systems serve to undermine the notion that there is a simple increase in the level of abstraction as one moves from more peripheral to more central brain locations. It seems that even the representation of an external object in a single modality (say, the visual image of a tree) requires the activity of a large number of neurons in several different cortical areas. Representations which are multi-modal will require activity over an even broader range of cortical territory. Thus it is clear that representations are extensive in space. However, communication between neurons takes time. The coordination of the neurons participating in such a representation requires time for signaling between the cells involved. Therefore, representations are also extensive in time. These observations about the basic requirements for neuronal representations are not new. In fact, they have already been discussed by some members of the neurobiology community who have ventured to speculate publicly about the nature of cognition [Changeux 85, Crick 94, Edelman 87]. Changeux, for example, in his now classic book, "Neuronal Man", describes what he called "mental objects" in the following terms:
The concept of assemblies or cooperative groups of neurons leads directly from one level of organization to another, from the individual neuron to a population of neurons. The number of neurons engaged in the graph of a mental object is not known: hundreds of thousands, millions maybe. It is conceivable that these assemblies possess some kind of autonomy, and that within them new properties can appear, explicable in terms of intrinsic properties of neurons-just as the properties of a molecule can be explained on the basis of those of its atoms. (...) Only states of activity of cortical regions or of generalized parts of the human brain have been observed with the positron camera. But I have high hopes that this technique, and others in the future, will allow us to follow mental objects themselves in spite of their fleetingness and their dispersion throughout the brain. To do this, it will be necessary to identify large populations of neurons distributed over wide regions of the cortex and probably other parts of the brain [Changeux 85].
Thus, though it is rarely expressed, there seems to be some underlying consensus among neurobiologists, at least as far as the distributed nature of representations in the nervous system is concerned. We want to see whether these insights about the requirements of neuronal representations can be pushed any further. In the following sections we describe our attempt to define an object of physiological inquiry that may correspond to discrete mental representations. For lack of a better word, we will call this a Neuronal Object.