Introduction

Color is the perception associated with the spectral composition of light. The question addressed here is what circuitry is responsible for conscious color 'perception'. We are concerned with the neural machinery responsible for the hues red, green, blue, and yellow, and how they are separated from black and white. Countless ideas have been proposed relating to the neural underpinnings of human color perception. The goal here is not to add new ideas, but rather to examine evidence from experiments, in combination with constraints from evolution, to determine which ideas are most likely to be true. From those we attempt to synthesize the best possible current explanation of the physiological mechanisms underlying human color perception.

Understanding the circuitry for color vision is important because it helps explain our conscious and unconscious reactions to colored stimuli. It can also explain the remarkable agreement across people about the appearance of some colors in the face of differences in our physiology and the exceptional disagreement we have about other colors. In addition, we can better understand how color vision deficiencies differ from normal vision and consider the prospects for curing them with gene therapy.

Taking an empirical approach to differentiating theories of color vision was a theme of the work of Frederick William Edridge-Green (1863–1953) as he performed experiments illuminating what it means to be color blind and tried to discover the best ways color vision deficiencies should be tested. We were very privileged to give the Edridge-Green lecture at The Royal College of Ophthalmologists Annual Congress at Birmingham in May 2016. This article is based on the material presented at that occasion.

The world is not colorful for lower animals even though some have color vision

A theme of this article is that there have been a huge number of often contradictory ideas about color vision and that evidence is required to determine which are most likely true. As an example, consider the discovery that the mantis shrimp has 12 spectrally different receptors, which led to speculation about their ‘exquisite’ color perception. In their article entitled ‘The colourful world of the mantis shrimp,’ Marshall and Oberwinkler1 speculate that ‘the remarkable colour-vision…befits their habitat of kaleidoscopically colourful tropical coral reefs’. However, evidence from recent experiments2 demonstrates that these ideas are not true. Wavelength (λ) discrimination was remarkably poor in mantis shrimp, with values of Δλ, the separation between wavelengths required for discrimination, ranging between 12 and 25 nm, whereas humans, with three cone photoreceptor types, require Δλ values of only 2 nm or less across much of the spectrum.

This illustrates that visual capacities cannot be inferred from the physiological properties of the detectors and how misled we can be by our intuitions. Color vision capacities cannot be determined from knowing the number of spectrally different receptors. For example, humans have at least six spectrally different light sensors in our eye, three types of cones, peaking at 419, 530, and 559 nm, rods at 500 nm, melanopsin ganglion cells with a 480 nm spectral peak, and the newly discovered OPN5-expressing3 ganglion cells peaking at 380 nm. However, we do not have six-dimensional color vision. Our different light sensors evolved at different times for different functions. Although the spectral peaks of the individual receptor types were undoubtedly shaped by evolutionary pressure, the differences between melanopsin at 480 nm, rhodopsin at 500 nm, and M cone opsin at 530 nm did not evolve to provide color discrimination. Similarly, different receptors in the mantis shrimp eye presumably evolved to serve different functions, even some we lack—such as polarization detection—and the differences in spectral sensitivity evidently evolved as the result of other pressures, not to provide extra dimensions of color vision.

Two photoreceptors with spectrally different photopigments do collectively have information for color vision; however, the organism must have the biological machinery to extract it. People tend to have a strong intuition that organisms must be making use of all the information available to them without consideration of the evolutionary demands on the system. The most important lesson is that the only way we can tell if an organism is making use of the information carried by a particular neuron is to test it experimentally.

‘Vision for conscious perception’ vs ‘Vision for action’

Finally, mantis shrimp likely don’t ‘see’ at all in the sense we mean when we refer to our own vision. Our conscious vision is based on the neural pathway that projects through the lateral geniculate nucleus (LGN) of the thalamus. In humans, interruption of this pathway results in the loss of all conscious vision. There can be no conscious color perception without brain structures to mediate it. The primate LGN is homologous to the dorsal LGN in lower vertebrates and this structure did not evolve until after the appearance of amphibians,4 hundreds of millions of years after the appearance of the first eyes.

Thus, even though lights with sufficiently large differences in spectral distribution can drive behavioral responses in the mantis shrimp, and they can thus be said to exhibit color vision, their ‘world is not coloured.’ The title of this article refers to ‘conscious color vision’ to emphasize evidence presented here that humans have completely separate circuitry, originating at the first synapse in the retina, dedicated to conscious color ‘vision-for-perception,’ and that this circuitry evolved independently of, and much later than, the color vision circuitry that evolved to directly mediate action.5

The human brain builds up a representation of the world around us in our cortex

When we look at a scene, a visual representation is built up in our cortex. The brains of early vertebrates are missing the brain structures that we use when we say ‘the world is colored.’ For example, when an animal like a frog looks out at a static, unmoving scene, it is completely blind to the world around it. A human can peruse the visual representation of the world inside his or her brain and plot a course of action. Without moving at all, a person can decide to walk to the other side of a room and, based on an internal visual representation, plan a route to navigate around obstacles. It may be difficult to imagine, but a frog has little or no ability to use vision to do this. In lower vertebrates, the guidance of goal-directed and navigating behaviors is dominated by other sensory modalities, particularly olfaction. They do have a representation of space; it is just not a visual one. They use information from external, non-visual stimuli along with internal vestibular and proprioceptive inputs to determine their position and heading in space. In contrast to olfaction, vision in lower vertebrates is primarily concerned with operating in real time, mediating reflexive movements to pursue prey and to avoid obstacles and predators as they are encountered. To serve this function, retinal ganglion cells of lower animals have evolved as detectors that are narrowly tuned to respond to trigger features that elicit particular actions moment-to-moment.

Examples of color coding in lower animals

Size, shape, and pattern of movement are all stimulus parameters that can be used as criteria required to make a ‘detector’ ganglion cell fire. Adding color requirements can narrow the possible stimuli that will trigger a ganglion cell and drive a particular movement. Ganglion cells with transient responses can be ‘change detectors.’ For example, we have shown, in primates, that a certain type of ganglion cell that responds transiently to both the onset and offset of a stimulus (ie, a type of ON-OFF ganglion cell) can act as a ‘pursuit-error detector’ in guiding movements.6 The predecessors to pursuit-error detecting ganglion cells in primates are presumably phylogenetically ancient; for example, ‘schooling’ is an important pursuit behavior used as a survival tactic by most fish. They display an amazing ability to keep the school close, moving in nearly perfect unison. It is common for schooling fish to have yellow tails, which are in striking chromatic contrast to the surrounding blue water (Figure 1). Transient blue-ON, yellow-OFF ganglion cells have been recorded from fish;7 these cells fire to the onset of short-to-middle-wavelength light, such as that from the blue backdrop of their liquid world, and they fire to the offset of longer-wavelength light reflected from the yellow tail of a conspecific. Thus, such cells are ideally suited for triggering movements that keep a schooling fish in pursuit of the fish in front of it. The cell is silenced when the image of a small part of the yellow tail covers its receptive field, but it fires vigorously, and could drive corrective motor movements, when the tail moves, exposing the receptive field to the blue background surrounding the fish. At times, light from the blue background could be exactly ‘equiluminant’ to the yellow light reflected from the tail, and responses in the fish would be driven purely by differences in the spectral distribution of the light.
By definition, color vision is the ability of an organism to distinguish lights based on wavelength independent of intensity; thus, such ganglion cells do provide the animal with a kind of color vision. The important point is that these neurons do not participate in any way in producing an internal representation of a scene; thus, they are not involved in ‘seeing’ or ‘seeing color’ as we usually think about it. The vision-for-perception system responsible for the type of conscious ‘seeing’ that humans do requires the dorsal LGN of the thalamus, which is absent in teleost fishes.8
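The equiluminant case described above can be made concrete with a minimal sketch. It is a toy model, not a description of recorded cells: the two-band description of a light (`short`, `long`), the unit weights, the definition of luminance as the sum of the two bands, and the function names are all assumptions made for illustration.

```python
def on_off_response(before, after):
    """Toy transient blue-ON, yellow-OFF cell.

    Fires to an increase in short-wavelength light (ON) and to a
    decrease in long-wavelength light (OFF). Units are arbitrary.
    """
    d_short = after["short"] - before["short"]
    d_long = after["long"] - before["long"]
    return max(0, d_short) + max(0, -d_long)


def luminance_change(before, after):
    """Change in total intensity, assuming luminance = short + long."""
    return (after["short"] + after["long"]) - (before["short"] + before["long"])


# Hypothetical lights with the same total intensity (10 units each).
yellow_tail = {"short": 2, "long": 8}
blue_water = {"short": 8, "long": 2}

# The tail moves off the receptive field, exposing the blue background.
pursuit_error_signal = on_off_response(yellow_tail, blue_water)   # 12
intensity_signal = luminance_change(yellow_tail, blue_water)      # 0
```

In this sketch a detector that only monitored total intensity would register nothing when the tail moves away, whereas the opponent cell responds strongly, which is the sense in which such a cell provides color vision without contributing to any internal representation of the scene.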

Figure 1

It is very common for schooling fish to have yellow tails that are in striking chromatic contrast to the surrounding blue water. Transient blue-ON, yellow-OFF ganglion cells fire to the onset of short-to-middle-wavelength light such as that from the blue backdrop of their world and they fire to the offset of longer-wavelength light reflected from the yellow tail of a conspecific. Thus, such cells are ideally suited for triggering movements to keep a schooling fish on track in its pursuit of the fish in front of it. The cell is silenced when the image of part of the yellow tail covers its receptive field as shown. However, it fires vigorously, and could signal corrective motor movements, when the tail moves, exposing the receptive field to the blue background surrounding the fish.

The evolutionary origins of color vision-for-action go back a billion years before photopigments and photoreceptors served the function of vision.9 A primitive form of blue–yellow color vision evolved to drive circadian vertical migration in one-celled organisms.10, 11 UV light triggered archaebacteria to descend away from the damaging UV rays of midday, and the orange light of dusk and dawn resulted in upward migration to collect longer wavelengths to serve a form of photosynthesis mediated by bacteriorhodopsin. Emerging hundreds of millions of years later, the hagfish ‘eye’ continued to function as a circadian organ with ganglion cells that project predominantly to the hypothalamus,12 just as do their likely mammalian homologues, the melanopsin-containing retinal ganglion cells.13, 14, 15, 16, 17 Thus, a form of ‘blue-yellow’ chromatic opponency may be one of the oldest sensory capacities, having originally evolved to signal the large spectral changes in the sky at dawn and dusk.

Chromatic responsivity of systems responsible for entraining circadian rhythms and driving circadian activity patterns has been maintained through the evolution of vertebrates. For example, we measured the responses of the retinohypothalamic tract in fish to colored stimuli.18 Under natural conditions, in which day/night patterns of change in color and intensity were simulated, the fish were most active at dawn and dusk, with lower activity during the rest of the day and at night. The fish were then tested under conditions in which the total intensity of light was kept constant across all 24 h, whereas only the color composition of the lighting changed following the natural pattern. When the artificial ‘sky’ changed only in color but not luminance, the fish showed the same pattern of increases in activity at dawn and dusk as under the regime that mimicked natural light. The results were consistent with the activity of color opponent ganglion cells driving activity patterns and circadian entrainment. As we have seen, ganglion cells evolved as ‘detectors’ that are responsive to some combination of spatial, temporal, and chromatic properties of the stimulus. Ganglion cells that are excited by middle–long-wavelength lights but inhibited by lights absorbed by short-wavelength sensitive (S) cones act as ‘dawn and dusk detectors’ driving crepuscular activity patterns in animals. These ganglion cells mediate a kind of color vision in that wavelength, independent of intensity, can drive changes in behavior but this is nothing like the conscious color vision humans experience.

Cracking the chromatic code in the human visual system

Our visual system evolved from the visual system of fish and we inherited the basic plan of our retina from them. Our ganglion cells are specifically tuned ‘detectors’ just like those of our vertebrate ancestors and this provides a perspective for understanding our own circuitry for color vision.

Nearly 40 years ago, in his book Human Color Vision, Boynton19 wrote ‘The chromatic code of the visual nervous system is incomplete and difficult to interpret.’ Boynton19 described a model for conscious color vision that he characterized as ‘one that seems reasonable to many color vision experts.’ This model has been very influential in spite of his cautioning about the difficulty of interpreting the physiological data in terms of our perceptions. In its simplest form, the model has just two color opponent channels, a red–green (RG) one comparing long-wavelength (L) vs middle-wavelength sensitive (M) cones, and a blue–yellow (BY) one comparing S-cones to the other two types. The channels described by Boynton19 were hypothetical but they may have been chosen because they aligned with the RG and BY cells recorded in the primate LGN by DeValois et al.20 Later, the retinal substrate for the blue-ON cells recorded by DeValois et al20 was shown to be the small bistratified ganglion cells that receive S−(M+L) cone inputs21 and the retinal substrate for the RG cells in the LGN has been shown to be the L vs M opponent midget ganglion cells in the retina. Now, these midget and small bistratified ganglion cells are usually assumed to be the basis for conscious color vision in humans, even though 40 years ago, evidence to the contrary sparked Boynton’s19 remarks about the connection between color vision and physiology being difficult to interpret. Considerably more evidence to the contrary has accumulated since.
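In its simplest form, the two-channel model described above amounts to two subtractions. The following sketch is only illustrative: the unit weight on the RG comparison, the averaging of L and M in the BY comparison, and the function name `opponent_channels` are assumptions, not values taken from Boynton or from the physiological recordings.

```python
def opponent_channels(l, m, s):
    """Toy version of the two-channel opponent model.

    l, m, s are cone activations in arbitrary units.
    Returns (rg, by): red-green compares L against M,
    blue-yellow compares S against the mean of L and M.
    """
    rg = l - m
    by = s - (l + m) / 2
    return rg, by


# A light that drives all three cone types equally is balanced in both channels.
balanced = opponent_channels(5, 5, 5)          # (0, 0.0)

# A hypothetical long-wavelength light: positive RG, negative BY.
long_wavelength = opponent_channels(8, 4, 1)   # (4, -5.0)
```

The point of the sketch is how little the model specifies: any ganglion cell with inputs of the right signs fits it, which is part of why the physiological data have been hard to interpret.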

In contrast to the simple idea of just two chromatic channels, there are many ganglion cell types and subtypes in the primate retina that carry color information. Not surprisingly, given the discussion above, these have appeared at vastly different times over the history of the evolution of the vertebrate visual system. They project to several different places in the brain and serve a variety of purposes besides the perception of hue.

When neurons that respond to different wavelengths of light are found in the LGN, as they were by DeValois et al,20 it is natural to assume that those cells mediate the conscious perception of hue. However, here we present results from experiments that lead us to conclude that it is improbable that either the bistratified ganglion cells or midget ganglion cells with L/M opponency are involved in the circuitry for conscious hue perception in humans. This is something some color vision experts may have a hard time accepting. However, we do not believe that it was the intention of the originators of these ideas that their models become set in stone. Boynton19 in his 1979 book, rather than promoting the color model he outlined, said that work on understanding the neural code of the nervous system has ‘barely begun.’ DeValois et al20 also understood that the ideas introduced in their 1966 paper had a high probability of being wrong. In an interview in 1981 DeValois22 said ‘I would not now, 15 years later, want to bet very much on the validity of the model we put forth’. The value of the models has been to guide experiments, the results of which, in turn, inform us about which ideas are most likely to be true.

Synaptic inputs that introduce S-cone signals into the retinal circuitry

S-cones are one of the keys to wavelength encoding. At least three synaptic mechanisms have evolved to introduce S-cone signals into the retinal circuitry (Figure 2). First, there is the direct pathway from S-cones to S-cone ON-bipolar cells. This can be blocked by application of L-2-amino-4-phosphonobutyric acid (L-AP4), which acts as an agonist for the metabotropic glutamate receptor (mGluR6) on the tips of the ON-bipolar cell dendrites. In primates, one ganglion cell that receives S-cone input is the intrinsically photosensitive retinal ganglion cell (ipRGC). Although these cells may be best known for their intrinsic light sensitivity, in primates they have cone opponent inputs configured as (L+M)−S.23 We found that the S-cone input to these cells can be blocked by L-AP4. Presumably, the sign of the S-cone input is inverted by transmission through a glycinergic inhibitory S-cone amacrine cell similar to those responsible for S-OFF inputs to ganglion cells that have been characterized in ground squirrels.24, 25 The ‘blue-OFF’ color opponency of these cells was discovered before they were recognized as the homologs of the ipRGCs identified in rodents. Dacey and Packer26 originally named them ‘large sparse monostratified’ cells and suggested that they might be the S-OFF counterpart to the small bistratified cell and the retinal basis for S-OFF chromatically opponent cells recorded in the primate LGN. Indeed, ipRGCs may be the +Y-B cells originally observed by DeValois et al20 in their LGN recordings; however, instead of being associated with the perception of yellow, these are the dawn- and dusk-detecting ganglion cells that primates inherited from their osteichthyoid ancestors. The color opponent input presumably still serves the purpose it did in the fish, allowing wavelength to contribute to our level of arousal, mood, and circadian entrainment.

Figure 2

Three different types of synaptic mechanisms provide S-cone input to a diversity of primate ganglion cells. (a) Glycinergic inhibitory S-cone amacrine cells are proposed to be responsible for both transmitting and sign inverting S-ON-bipolar cell signals to melanopsin ganglion cells, allowing changes in the color of light to contribute cues to circadian activity. (b) Synapses between S-cones and S-cone specific ON-bipolar cells are responsible for small and large bistratified ganglion cells having cone inputs configured as S−(L+M). (c) In the retina, L/M opponent cells with S-cone inputs could result from GABA-mediated feedforward from S-cones to L/M midget bipolar cells. L/M opponent cells with S-cone inputs have been observed in four different configurations: (S+M)−L, L−(S+M), (S+L)−M, and M−(S+L), exactly the same as the mechanisms known to underlie conscious hue perception. Midget ganglion cells without S-cone input carry color information but they also are extremely well suited to mediating very-high acuity achromatic spatial vision. In ancestors to modern primates in which the majority of individuals were dichromatic, the homologs to the L/M opponent cells can only serve achromatic vision. Similarly, L/M opponent cells without S-cone inputs may only function to serve achromatic form vision in trichromats. Abbreviations: dMGC, depolarizing midget ganglion cell; hMGC, hyperpolarizing midget ganglion cell; ipRGC, intrinsically photosensitive ganglion cell; LBC, large bistratified ganglion cell; SBC, small bistratified ganglion cell.

A second target of the S-ON-bipolar cell is the small bistratified ganglion cell. Not only is the direct pathway to S-cone ON-bipolar cells blocked by the application of L-AP4, it is disrupted in patients with mutations in the GRM6 gene. Experiments that have examined the vision of patients in which mGluR6 is completely disrupted provide insights into the role of small bistratified cells in vision. In addition to lacking S-cone input to small bistratified cells, patients who lack mGluR6 function have complete congenital stationary night blindness (CSNB1). Terasaki et al27 studied patients with CSNB1 performing standard color vision tests that assay central vision and blue-on-yellow perimetry to assess S-cone-based peripheral vision. In the peripheral retina, S-cone-based detection was deficient. Remarkably, however, S-cone vision was normal in the central 10–15° of the visual field and the patients performed normally on conventional color vision tests. An obvious hypothesis is that detection by S-cones is primarily mediated by small bistratified cells in the periphery, but some other pathway mediates S-cone-based hue perception in central retina. The notion that small bistratified cells are mainly involved in peripheral visual functions is consistent with human retinal anatomy as they make up about 6–10% of all ganglion cells in the far periphery but only about 1% near the fovea.

Vision in the far periphery is thought to be more concerned with directing shifts in attention, directing reflexive movements, and measuring optical flow. The small bistratified ganglion cells are ON-OFF cells, firing both to the onset of short-wavelength light and to the offset of long-wavelength light, much like the blue–yellow ON-OFF ganglion cells described in fish above. We presumably inherited these change-detecting cells from our osteichthyoid ancestors, and the changes in peripheral vision associated with their loss of S-cone input indicate that they are particularly well suited to alerting us to changes in the position of long-wavelength-reflecting objects against the blue sky in our peripheral vision.

We tested three night-blind subjects with mutations in GRM6 and everything about their central BY color vision was normal. Subjects were tested using the HRR pseudoisochromatic plates, the Cambridge Colour Test, and the saturated and desaturated D-15; they also performed color naming, and we measured their unique hues. In all respects their central photopic vision, including color vision, was indistinguishable from normal. Collectively, these results, suggesting that small bistratified cells are not the basis for conscious hue perception, are compelling because (1) the loss of peripheral S-cone sensitivity, the profound night blindness, and abnormalities in the S-cone-isolated ERG b-wave all confirm the complete absence of direct signaling from S-cones to S-cone bipolar cells; however, (2) in every detail, central S-cone-based color vision is normal, indicating that it is based on a different synaptic mechanism that is unaffected by disruption of mGluR6 signaling. Thus, we conclude that the S-cone ON-bipolar cells serve phylogenetically ancient color vision circuits, including one involving melanopsin ganglion cells and a second involving small bistratified ganglion cells, neither of which is involved in conscious hue perception. Rather, they are responsible for similar functions in modern humans to those they had in our primitive ancestors.

Other types of ganglion cells receiving S-cone input

There is a +S−(L+M) large sparse bistratified cell.26 The S-ON component is certainly derived from the S-ON-bipolar cells that are interrupted in mGluR6 patients. Thus, the preserved color vision in those patients, along with mismatches between the anatomy and physiology of the cell and the requirements for hue perception, makes it unlikely that this cell type is involved in seeing blue or yellow.

On the basis of a purely anatomical study, one group has proposed the existence of an S-OFF-midget ganglion cell in macaque monkeys.28 There is physiological evidence for all the chromatically opponent ganglion cell types discussed in this article, except for this type. Members of our laboratory have recorded responses from large and small bistratified ganglion cells and melanopsin ganglion cells but we have never encountered a +(L+M)-S midget ganglion cell near the central retina and no such cells have been reported in the literature, making their existence unlikely.

Olfaction vs vision and the origins of an internal visual representation in the brain

In fishes, the brain processes sensory information in two largely separate brain centers. The optic tectum, as in other vertebrates, receives moment-to-moment input from the immediate surroundings, primarily from the eyes, and is connected with motor centers, enabling fish to react swiftly in an appropriate direction to capture prey and to avoid objects or predators. The importance of the tectum for vision in the teleost fish is demonstrated by the fact that complete ablation renders the animal effectively blind. In contrast, a second brain center, the telencephalon of the fish, the predecessor of our cerebral cortex, is the major processing center for olfactory information, and lesioning it leaves most visual functions, including color vision, unimpaired.

In fish, although the visual system is primarily responsible for moment-to-moment real-time control of movements, the olfactory system is concerned with the enduring characteristics of objects and their relationships. Such representations have an essential role in the identification of objects and enable the organism to classify and attach meaning to them. Odors carry information about social cues, sex pheromones, and food. Even more important for the discussion here, olfaction in fish works as a reference system forming a representation of the outside world for spatial navigation, guiding the animal to locate food sources, spawning grounds, and mates.

The origin of what became the cerebral cortex was driven by behavioral adaptations involving olfactory goal-directed and navigating behaviors. Sensory inputs from other modalities, including vision, were subsequently recruited into this expanding region. The initial expansion of the cortex in early mammals was associated with olfactory navigation. However, the major expansion of the isocortex took place in later stages of evolution when vision began to provide an internal representation of the outside world. Primates with large brains for their body size have relatively expanded visual brain areas, including the primary visual cortex and LGN, and within the visual system, evolution has acted primarily on the number of neurons in parvocellular (P) layers of the LGN.29

The evolutionary origins of the P pathway are difficult to trace, partly because there are not good examples of transitional forms between the visual system of lower mammals and primates. For example, there are large anatomical differences between even cat and monkey in the central retina, LGN, and cortex. The ganglion cells in the mouse that project to the dLGN identified so far have properties more like those expected for the primate koniocellular (K) or magnocellular (M) LGN than P-cell properties.30, 31 The K-projecting cells in primates are heterogeneous in their response properties; these include rapid responsiveness as required for visual processing of motion, and the cells are thought to have a role in attention, arousal, and mediating orienting responses. The M pathway neurons are specialized for motion processing related to the visual control of forelimb movements involved in reaching or grasping. In lower mammals, cortical processing of visual inputs seems mostly associated with vision-for-action functions such as those served by the M and K pathways of the primate. The most distant relatives in which a clear homolog of the primate P LGN is seen are their nearest cousins, the prosimians.

The precursor of the midget ganglion cell is presumably present in lower mammals. For example, nine types of cone bipolar cells are consistently observed across mammals,32 so two of the bipolar cell types in the mouse probably correspond to the homologs of the ON- and OFF-midget bipolar cells of primates, which have expanded in number such that the corresponding midget types comprise 70–80% of the ganglion cells in the primate retina.33 With the massive expansion of the P LGN there was an invasion of olfactory cortex by visual input such that brain areas serving olfaction in lower mammals have become visual areas responsible for vision-for-perception in primates. The primate ‘perirhinal’ cortex (literally meaning ‘around the nose’) receives a majority of its input from high-level visual areas, whereas in lower mammals, its inputs are primarily olfactory.34

Thus, brain areas responsible for forming a representation of the outside world for spatial navigation and concerned with the identification of objects, and attaching meaning to them based primarily on olfactory input in lower animals were taken over by visual inputs originating in the P LGN. Subsequently, these areas were greatly expanded and elaborated. Thus, a visual representation of the outside world that we associate with conscious perception is built up in the ‘ventral stream’ visual pathway in the cortex. These brain structures were elaborated in primate evolution and they get their main input from the P-layers of the LGN. These neurons project to specific V1 sublayers and from there, the ventral pathway goes through V2 and V4 to areas of the inferior temporal lobe. The final visual representation that is produced combines input originating from the P-cells with a great deal of stored information about the outside world, much of which is learned. Within this representation, we can identify objects, attach meaning and significance to them, and establish their causal relations. Unlike lower animals, we can mentally navigate within our internal visual representation, which not only comprises our immediate visual experience but allows us to recognize and interpret subsequent visual inputs, and to plan our actions ‘off-line’.35, 36

A parvocellular-based achromatic visual system for conscious vision in primates

When the newly elaborated ventral stream visual system responsible for conscious perception arose in an ancestor to modern primates, it is likely that the internal representation of the world it produced was not very colorful, if it had any color at all. Routine trichromacy did not evolve until after the divergence of Old and New World primates.37 Thus, the ancestor to primates in which the ventral stream was elaborated would have had S-cones and a single class of M/L cone. By way of the S-cone ON-bipolar cells, S-cones would have provided input to the homologs of the ipRGCs, and the large and small bistratified ganglion cells, none of which evolved to mediate the conscious perception of hue. In order to carry color vision information, at least some P-cells projecting to the ventral stream visual representation would have to carry signals comparing S and L/M cone outputs. In the case of the nascent P-cell-based ventral stream visual system there would not be much opportunity for the required interactions, as S-cone inputs have been demonstrated to avoid midget ganglion cells.38 Thus, a P-cell-dominated visual stream would have been mostly monochromatic and the P-cells would have served achromatic vision.

Every retinal ganglion cell evolved as a ‘detector’ and the midget ganglion cells with their center-surround receptive field and excitatory and inhibitory subfields organized into circularly symmetric regions are ‘edge detectors.’ The images of boundaries of all objects in a visual scene form edges against their backgrounds. The overall shapes defined by boundaries are extremely important in identifying objects, which is a central function of our vision-for-perception system. Thus, our conscious visual representation of the world is based on a fine-grained system of edge detectors capable of extracting contours, which produces essentially a line drawing of the world around us.

The midget ganglion cell acts as a mathematical operator that filters the image, thereby performing the first step in extracting the edges. It can also perform a second function of working to help separate the intensity of the illumination falling on the scene from the achromatic reflectance of the objects. Extracting object reflectance is important for the ventral stream function of recognizing objects and giving them meaning. For decades, it has been recognized that the reflectance of objects and the intensity of the illumination differ in their spatial distribution across a scene. Incident light intensity usually varies smoothly, with no discontinuities, whereas reflectance will have discontinuities at edges where objects adjoin.39, 40 Thus, by taking the ratio of light intensity between an object and its background, the midget ganglion cell extracts information about its reflectance. The operation of edge detection initiated in the midget ganglion cells is extended in the cortex to complete the ‘line drawing’ internal representation. The filtering by the cortex results in neurons that respond only to moving edges across their receptive field. In fact, the edge-filtering system works so well that, in order to keep the image refreshed as they fixate their gaze, primates have evolved microsaccades, tiny involuntary saccades that occur spontaneously during intended fixation, to move edges in the scene across the receptive fields of the cortical detectors. Lower animals make eye movements to exactly compensate for movements of the head, effectively stabilizing an image of the outside world on the retina. This allows vision-for-action systems to separate the animal's own head movements from movements of things in the external world relative to the animal. Microsaccades evolved specifically for vision-for-perception systems, allowing them to extract edges from the scene while also keeping the image refreshed during fixation.41, 42, 43
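The ratio argument above can be illustrated with a minimal numerical sketch. It is not a model of a real midget ganglion cell: the one-dimensional scene, the reflectance values, the linear illumination gradient, and the function name center_surround_ratio are all assumptions made for illustration. The point is only that a center-to-surround ratio stays near 1 where illumination changes smoothly, departs from 1 at a reflectance edge, and is unchanged when the overall light level is scaled.

```python
def center_surround_ratio(luminance, radius=1):
    """Ratio of each sample to the mean of its neighbours within `radius`.

    A deliberately crude stand-in for a center-surround comparison; the
    surround excludes the center sample itself.
    """
    n = len(luminance)
    ratios = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        surround = [luminance[j] for j in range(lo, hi) if j != i]
        ratios.append(luminance[i] / (sum(surround) / len(surround)))
    return ratios


# Hypothetical scene: a dark surface (reflectance 0.2) adjoining a light one (0.8).
reflectance = [0.2] * 10 + [0.8] * 10
# Assumed illumination: a smooth linear gradient with no discontinuities.
illumination = [100.0 + 2.0 * i for i in range(20)]
luminance = [r * e for r, e in zip(reflectance, illumination)]

ratios = center_surround_ratio(luminance)
# The same scene under ten times more light.
bright_ratios = center_surround_ratio([10.0 * v for v in luminance])

# Values around the reflectance edge (between samples 9 and 10).
print([round(r, 2) for r in ratios[4:14]])
```

In this toy scene, the ratios for the dim and bright versions agree to within rounding, which is the sense in which a ratio separates reflectance from illumination intensity.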

There is evidence that surface reflectances are reconstructed in the final representation in the cortex by a process of filling-in from the edges involving visual signals transmitted horizontally.44 This could occur within separate ON and OFF channels within an array of neural elements in the cortex representing the visual scene.45, 46, 47 The idea is that contrast signals extracted at edges undergo a process of lateral spreading until they are stopped at a light–dark boundary marked by firing of ON cells on the lighter side and OFF cells at the dark side of the boundary. Accordingly, the filling-in process depends on excitatory lateral interactions between neighboring ON elements and excitatory lateral interactions between OFF elements forming separate parallel ON and OFF networks within the cortical visual representation array. Firing of ON elements signals ‘lightness;’ firing of OFF elements signals ‘darkness.’ Finally, to stop the spreading at the boundary requires that the ON and OFF networks are mutually inhibitory. The set-up of the separate light and dark networks could be the result of a process of ‘Hebbian learning’ in which neighboring ON-driven neurons would ‘wire together’ because they ‘fire together’ being highly correlated in their responses.48 The same would be true of the OFF-network. Thus, according to this idea, the scene is encoded by two mutually inhibitory parallel representations representing light and dark in the visual scene.

In summary, it is likely that a separate visual system for the specific purpose of producing a visually based internal representation of the world became elaborated during the evolution of primates. However, lacking an integral system for comparing different cone types for extracting wavelength information and incorporating it into the achromatic representation, for the ancestor to modern primates, the world was not colored.

How the world became colored

Mollon49 described the small bistratified ganglion cell in primates as being the basis for a primordial color vision subsystem. Now we know that humans have a number of different phylogenetically ancient S-cone opponent subsystems, which all derive their S-cone signals from S-cone ON-bipolar cells that are disrupted in patients with GRM6 mutations. As introduced above, there is a weight of evidence that none of these are likely the basis for conscious hue perception. In order to serve conscious vision, the circuitry for hue perception must be integral to the P-cell-based vision-for-perception system that has evolved in primates. Recently, we made the surprising discovery of a newly evolved synaptic pathway for carrying S-cone signals in the primate retina that has all the characteristics required to fulfill the function of mediating human conscious hue perception.

As a way of identifying novel synapses in macaque retinae, we used an antibody to the SNARE protein, syntaxin-4, which is an indicator of vesicular transmitter release. At cone terminals, syntaxin-4 was clustered in two bands, one band at horizontal cell dendritic tips and a second band beneath the cone pedicle base where horizontal cells directly contact bipolar cells.50, 51, 52 Strikingly, in the lower band, syntaxin-4 was highly enriched beneath S-cones and co-localized with the HII horizontal cell marker, calbindin. The enrichment at S-cones was not observed in mouse or ground squirrel. This demonstrates a previously undiscovered, enhanced feedforward signaling mechanism between HII horizontal cells and cone bipolar cells that has evolved in primates for the purposes of color vision.53 In subsequent experiments, as predicted by this hypothesis, the existence of other components of a GABA-mediated pathway was verified, including GABA receptors and the concomitant enrichment of the Na-K-Cl co-transporter with syntaxin-4.54 The enrichment beneath S-cones reveals synapses for feedforward signaling between HII horizontal cells and S-cone bipolar cells. However, co-localization with the HII horizontal cell marker indicates a general elaboration of HII horizontal-to-bipolar cell feedforward synapses providing a synaptic pathway for S-cone signals to be introduced into a subset of L/M cone midget bipolar cells.

To corroborate the anatomical observation with physiology, we examined S-cone-mediated signals in the outer retina by recording in vivo and ex vivo L/M cone and S-cone isolating ON-OFF ERGs.55 Using S-cone isolating stimuli, we were able to investigate the presence of a GABA-mediated feedforward pathway for S-cone signals. Predictions made by the hypothesized GABA-mediated pathway were confirmed. These include the presence of a residual depolarizing S-cone-mediated ON response in L-AP4-treated retinas and humans with ON-pathway defects, and the loss of the residual S-cone-mediated b-wave in the presence of GABA blockers in the primate ex vivo ERG.56

The GABA feedforward synaptic pathway from S-cones to ON and OFF L/M midget bipolar cells via HII horizontal cells is illustrated in Figure 2. This provides a synaptic pathway from S-cones to a small subset of midget bipolar cells for mediating conscious hue perception, and it bypasses the synapse that is interrupted in patients with GRM6 mutations explaining why hue perception is preserved in those individuals.

In a dichromatic retina, containing S and L cones, as may have characterized the ancestor to modern primates in which the feedforward mechanism evolved, ON-midget ganglion cells would have a +L center −L surround, whereas OFF-midget ganglion cells would have −L center +L surround receptive field structures. These achromatic ganglion cells would respond well to black and white stimuli. The evolution of an enhanced GABA-mediated S-cone feedforward input to a small subset of these two types of bipolar cells would have created two new subtypes of midget ganglion cells capable of mediating perception of two elemental hues, as shown in Figure 3: L-ON-center for yellow and L-OFF-center for blue. In each case, the center L cone has an opposed feedforward input from an S-cone. In Figure 3b, the L-OFF-center bipolar cells get an S-ON input from the GABA-mediated feedforward, making the center (+S-L). However, as shown in Figure 3a, the S-cones also receive conventional inhibitory feedback from their surround, which is S-cone dominated. Finally, the L cone also gets conventional inhibitory feedback from the HI horizontal cell surround. Thus, S-cone opponency is center-surround (Figure 3b) and the L cone opponency is also center-surround, but of opposite sign, making these cells double opponent and unresponsive to either black or white stimuli presented in the center or the surround. Thus, these are pure color cells capable of serving the hue sensations of yellow and blue, respectively. A very important point here is that these hue-encoding cells can only make up a small fraction of the total number of midget ganglion cells because the number receiving S-cone input via synapses from HII horizontal cell axons must be relatively small. HII horizontal cells capable of carrying S-cone signals from S-cones to L/M bipolar cells via an enhanced GABA feedforward pathway are not found in lower mammals, indicating that a submosaic of midget ganglion cells carrying S-cone signals evolved in an ancestor to Old and New World primates for the purpose of mediating conscious color vision.
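The receptive field logic described above can be written as a small linear sketch. The class MidgetCell, the unit cone weights, and the idealized stimuli are assumptions for illustration only. In particular, the exact balance between L and S weights is assumed here, not measured. Under those assumptions, a double-opponent cell gives no response to an achromatic stimulus in either center or surround, whereas the achromatic +L center −L surround cell responds to a white spot in its center.

```python
from dataclasses import dataclass


@dataclass
class MidgetCell:
    center: dict    # cone weights in the center, e.g. {"L": 1.0, "S": -1.0}
    surround: dict  # cone weights in the surround

    def response(self, center_contrast, surround_contrast):
        """Linear sum of weighted cone contrasts in center and surround."""
        c = sum(w * center_contrast.get(cone, 0.0) for cone, w in self.center.items())
        s = sum(w * surround_contrast.get(cone, 0.0) for cone, w in self.surround.items())
        return c + s


# Idealized stimuli expressed as cone contrasts (hypothetical values).
NONE = {"L": 0.0, "S": 0.0}
WHITE = {"L": 1.0, "S": 1.0}       # achromatic: drives L and S equally
LONG_WAVE = {"L": 1.0, "S": 0.0}   # drives L but not S
SHORT_WAVE = {"L": 0.0, "S": 1.0}  # drives S but not L

# Achromatic ON midget: +L center, -L surround.
achromatic_on = MidgetCell(center={"L": 1.0}, surround={"L": -1.0})
# Proposed 'yellow' cell: (+L-S) center, signs reversed in the surround.
yellow_on = MidgetCell(center={"L": 1.0, "S": -1.0}, surround={"L": -1.0, "S": 1.0})
# Proposed 'blue' cell: (+S-L) center, signs reversed in the surround.
blue_off = MidgetCell(center={"S": 1.0, "L": -1.0}, surround={"S": -1.0, "L": 1.0})

print(achromatic_on.response(WHITE, NONE))  # white spot in the center
print(yellow_on.response(WHITE, NONE))      # same spot on the double-opponent cell
print(yellow_on.response(LONG_WAVE, NONE))
print(blue_off.response(SHORT_WAVE, NONE))
```

Any imbalance between the opposed weights would leave a residual achromatic response, so the silence to black and white in this sketch depends on the equal-weight assumption.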

Figure 3

Proposed circuit for producing a subclass of blue–yellow double-opponent midget ganglion cells in which S-cone signals are added to L/M opponent signals. (a) Feedback- and GABA-mediated feedforward from S-cone inputs are proposed to produce the center and surround organization as illustrated. (b) Full circuit showing S-cone and L cone inputs. In this example, the ON- and OFF-midget ganglion cells (colored light and dark gray, respectively) receive their direct center input from an L cone. This is the basis for a blue–yellow system in which the OFF bipolar with S-L inputs is responsible for the percepts of blueness. The ON-bipolar cell, responsible for yellowness, has L-S inputs. Red–green percepts are proposed to arise later in evolution by addition of M-cones to the retina and thus forming two subtypes of color opponent midget ganglion cells. L cone centers serve blue–yellow color vision and M cone centers serve red–green color vision.

Separating chromatic and achromatic percepts

A dichromat has four ‘color’ sensations: black, white, blue, and yellow. Thus, cones must serve both chromatic and achromatic sensations. A central problem in modern color science is how chromatic and achromatic sensations are separated in the nervous system. Above, we introduced the idea that the neural mechanisms mediating dark and light sensations could be easily separated by the fact that there is a high degree of correlation within the population of neurons receiving ON inputs and within the OFF recipient neurons, but between ONs and OFFs there are negative correlations. It has been demonstrated that correlations obtained in response to natural images can be used to classify individual cone types,57 making it plausible that parallel ON and OFF sub-representations could arise by unsupervised Hebbian learning and be interspersed in an array of neural elements representing the visual scene. The proposed double-opponent midget retinal ganglion cells receiving GABA-mediated S-cone input would not respond to achromatic stimuli, making their responses highly negatively correlated with the larger population of spatially opponent achromatic midget ganglion cells. Moreover, the +S −L midget ganglion cells would be highly negatively correlated in their responses to the −S+L cells. The resulting positive and negative correlations could be the basis for separating two more sub-representations, one for blue and one for yellow.
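The correlation statistic that such an unsupervised rule would exploit can be sketched for the simplest case, the ON and OFF achromatic populations. This is not an implementation of Hebbian learning. The rectified response model, the noise level, the sample count, and the names simulate and pearson are assumptions for illustration. Neighboring cells are assumed to see the same local contrast.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
contrasts = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # shared local contrast


def simulate(polarity):
    """Rectified response to the shared contrast plus private noise.

    polarity = +1 for an ON cell, -1 for an OFF cell.
    """
    return [max(0.0, polarity * c) + rng.gauss(0.0, 0.1) for c in contrasts]


cells = {
    "ON_a": simulate(+1), "ON_b": simulate(+1),
    "OFF_a": simulate(-1), "OFF_b": simulate(-1),
}

# Put each cell in the first group whose founding member it correlates with positively.
groups = []
for name, resp in cells.items():
    for group in groups:
        if pearson(resp, cells[group[0]]) > 0:
            group.append(name)
            break
    else:
        groups.append([name])

print(groups)
```

In this toy setting, same-polarity cells are positively correlated and opposite-polarity cells negatively correlated, so the sign of the correlation alone is enough to sort them into two populations.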

The retina of the primate ancestor in which the nascent conscious color vision emerged would have a mosaic of midget ganglion cells with four different types interspersed. The idea is that somewhere in the ventral stream, the retina is mapped to an array of neural elements in which components receiving input from each one of the four P-cell types, through a process of unsupervised learning, forms four webs of mutually excitatory interconnections. In turn, connections between the four cortical submosaics would be mutually inhibitory such that each forms a separate representation of the scene corresponding to four different sensations, dark and light, blue and yellow. The two achromatic representations (dark and light) are derived from the majority of midget ganglion cells sampling the scene at very high spatial resolution forming a detailed ‘line drawing’ of the scene, which is subsequently filled-in from the edges in the cortex, whereas in the midget ganglion cell-based blue–yellow system, ‘blue’ and ‘yellow’ are sampled at much lower spatial resolution but nonetheless the much more sparse cortical representations of hue are able to crudely—more like a watercolor—fill in the colors of objects from the edges providing the ventral stream visual system with labels useful in identifying objects, providing information about their contents and giving them meaning.

In the central retina of primates, each cone is connected through a midget bipolar cell to a midget ganglion cell, establishing a private line to the brain. It has been suggested that only after this 1 : 1 connection in the central retina had evolved, did a subsequent mutation create a somewhat random mosaic of separate L- and M-cones.58, 59 On this view, because midget ganglion cells have a pure L or M cone center, random wiring would make them L/M opponent and capable of serving red–green color vision.60 However, as explained here, the evolution of conscious color vision involved the elaboration of the central components of the entire ventral stream visual system. This midget ganglion cell-based system is responsible for an internal representation of the outside world that evolved specifically for the purpose of transforming visual inputs into conscious perceptual representations that embody the enduring characteristics of objects and their spatial relations.61

Trichromacy evolved after the ventral stream visual system was elaborated and the proposed midget ganglion cell-based blue–yellow system evolved. Apparently, only then were conditions ripe for the addition of a red–green dimension of color vision. In the dichromatic system of our primate ancestor, there were presumably four submosaics of midget ganglion cells mediating light, dark, blue, and yellow sensations, respectively. After the mutation that created a mosaic of separate L- and M-cones in the central retina, each midget bipolar cell would connect to a single L or M cone. Thus, there would be two types of midget bipolar cells with respect to their center input, L or M, and each of these would be of two types, ON or OFF, making a total of four combinations: L-ON, L-OFF, M-ON, and M-OFF. For the submosaics of midget bipolar cells with GABA-mediated S-cone feedforward input, these four types have cone inputs, as discussed below, that correspond exactly to those responsible for the four unique hues in modern humans:62, 63 L-ON-center for yellow, L-OFF-center for blue, M-ON-center for green, and M-OFF-center for red. In each case, the center L or M cone has an opposed feedforward input from an S-cone. The center mechanisms are thus +L-S, +S-L, +M-S, and +S-M for yellow, blue, green, and red, respectively. As before, these midget ganglion cells would behave as pure color cells, being double opponent, as shown in Figure 3, with S-cone input opposing the sign of the central cone in the receptive field center, and the signs in the surround reversed compared with the center. As such, they would not respond to black–white edges anywhere in the receptive field. In comparison, the large class of L/M midget ganglion cells without S-cone input respond well to achromatic boundaries. This difference would make the ‘pure color’ midget ganglion cells, with S-cone input, negatively correlated with the achromatic L/M midget ganglion cells, providing a substrate for an unsupervised learning mechanism in the cortex to separate submosaics of achromatic and chromatic elements into separate representations. Each of these could be further separated, with the achromatic elements separated into ON and OFF types constituting a dark and light representation. Finally, negative correlations between inputs from the four chromatic types of midget ganglion cells could produce representations corresponding to the four elemental hues, red, green, blue, and yellow.

Unique hues

Here we return to the color opponent model Boynton19 wrote about in his book, in which the cone inputs to color opponent channels correspond to the configuration of the predominant B-Y and R-G neurons in the LGN.20 Today, however, we know that neurons comparing L vs M and S vs (L+M) do not correspond colorimetrically to red–green and yellow–blue processes of human color vision. This lack of correspondence between the axes of phenomenological color space defined by the unique hues and the two channels of Boynton’s19 book has been a central problem in modern color science.64 This problem is solved by having S-cone input to a subset of L/M opponent midget ganglion cells as follows.

DeValois and DeValois65 describe a modern view of the actual two color opponent mechanisms corresponding to human hue perception as both being based on L vs M opponency; however, for the red–green axis, S-cone signals are added to the L side of the opponent circuit, and for the blue–yellow axis, S-cone signals are added to the M side of the L vs M opponent circuit. In color space, adding S-cone input in these two different ways rotates the L vs M axes in opposite directions. The two types of neurons so produced have chromatic responses aligned along roughly orthogonal axes. Now, as a solution to the problem of the origins of the unique hues, it is clear that feedforward from S-cones, via HII horizontal cells to L/M opponent midget bipolar cells,53, 54 is capable of producing a subset of midget ganglion cells with responses that can account for human hue perception. The combinations of S-cone inputs to L/M opponent midget ganglion cells produce exactly the hue axes corresponding to phenomenological hues.62, 63
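The opposite rotations can be made concrete with a small geometric sketch. It assumes a plane whose horizontal axis is the L−M signal and whose vertical axis is the S signal, with a mechanism modeled as lm_weight × (L−M) + s_weight × S. The function axis_angle_deg and the value of S_WEIGHT are hypothetical. The real angles depend on the relative strength of the S-cone input and on how the two axes are scaled, neither of which is specified here.

```python
import math


def axis_angle_deg(lm_weight, s_weight):
    """Direction of a mechanism in a plane with x = L-M and y = S (assumed units)."""
    return math.degrees(math.atan2(s_weight, lm_weight))


S_WEIGHT = 0.5  # hypothetical relative strength of the S-cone input

pure_lm = axis_angle_deg(1.0, 0.0)         # L vs M, no S-cone input
s_with_l = axis_angle_deg(1.0, +S_WEIGHT)  # (S+L) vs M: S added to the L side
s_with_m = axis_angle_deg(1.0, -S_WEIGHT)  # L vs (S+M): S added to the M side

print(pure_lm, round(s_with_l, 1), round(s_with_m, 1))
```

In this parameterization the two mechanisms are rotated by equal and opposite angles from the L vs M axis, and they would be exactly orthogonal only if the S weight equaled the L−M weight in the chosen units.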

As evidence for the proposed hue-encoding midget ganglion cells, a small number of L/M opponent ganglion cells with S-cone input have consistently been reported at different levels of the visual pathway: in the retina,66, 67 LGN,69, 70 primary visual cortex,71, 72 and at higher centers in the ventral visual processing stream, in inferior temporal cortex.73 In our own laboratory, recording from an ex vivo preparation of macaque retina,6, 74 we have confirmed the presence of a small subset of L/M opponent midget ganglion cells with S-cone input.75

In the past, being small in number and having relatively weak S-cone input, the neurons having the correct chromatic input signature required to mediate unique hues seemed insignificant. However, the new discoveries make it clear why these neurons are significant and have an important role in hue perception and in the evolution of conscious color vision.

The achromatic pathway of the ventral stream: why opponent cells don’t need to do double duty

We started this article with a lesson from the mantis shrimp. Just because information exists across an array of detectors we can never assume that an organism has the circuitry required to do the computations necessary to extract it. Moreover, we can’t assume that an organism has any need, from an evolutionary perspective, to do the necessary computations.

The idea that red–green color vision has ‘piggy-backed’ on the high-acuity midget system of primates has led to an idea, promoted for more than 30 years, that the midget bipolar cells perform a ‘double duty’ in visual signaling.76 Information about the spatial structure of stimulus intensity and its spectral content are confounded in the responses of a single L/M opponent midget bipolar cell. Thus, it is impossible to extract color information from a single L/M opponent midget ganglion cell. This fact goes back to Wiesel and Hubel.77 L/M opponent midget ganglion cells would have been categorized as their ‘type I’ cells, whereas parasol ganglion cells would be their ‘type III’ cells. Wiesel and Hubel77 explained that type I cells may not respond to diffuse white light that covers both center and surround, but if the white light covers the center and only part of the surround then the cell is perfectly capable of mediating black and white sensations, leading them to say that ‘the type I cell is as good a candidate as the type III for the mediation of black-white contrast mechanisms.’ This problem was echoed by Boynton19 when he wrote about the chromatic code being ‘difficult to interpret.’ He noted that ‘many cells that appear to exhibit chromatic specificity may be as much or more concerned with spatial vision.’

It has been proposed that the signals for color and achromatic contrast might be de-confounded by circuitry that makes comparisons across P-cell inputs. DeValois and DeValois65 have, for example, outlined a specific model for how this could be done. Alternatively, other vision scientists including ourselves59, 62, 78, 79, 80, 81 have pointed out that if there is a population of ganglion cells specifically dedicated to color, elaborate anatomical circuits to separate luminance from color signals from the midget ganglion cells are rendered unnecessary. Indeed, this appears to be the case and the system presented here is a beautiful example of efficient coding in the nervous system. We introduced the logic of the retina, above, that each ganglion cell type is a detector. The L/M midget ganglion cells are ‘edge detectors’ and they can do their job most efficiently if they can detect the edges that define objects no matter if the edge is produced by achromatic contrast, equiluminant color contrast, or some combination of both. The four smaller subpopulations of midget ganglion cells, after some cortical conditioning, are low spatial resolution chromatic edge detectors. However, each is tuned through visual experience by a normalization process to be insensitive to the average spectral distribution from the environment, that is, the colors we call white (or gray).62, 82, 83, 84 Since they are encoding spectral contrast across a boundary, each one evolved as a ‘spectral reflectance’ detector. M-(S+L) cells are detectors of surfaces (which we call ‘green’) that reflect more middle-wavelength light compared with the average spectrum. L-(S+M) cells are detectors of surfaces (which we call ‘yellow’) that reflect more long-wavelength light compared with the average spectrum. Conversely, (S+L)-M cells are detectors of surfaces (called ‘red’) that absorb more middle-wavelength light, whereas (S+M)-L cells detect surfaces (called ‘blue’) that absorb more long-wavelength light. With the four types of spectral reflectance detectors in place, the much larger population of L/M opponent achromatic edge detectors can be completely agnostic as to whether an edge is produced by a chromatic or achromatic change.

The term ‘double-duty’ could be used to mean the ability to detect a change including purely achromatic and purely chromatic ones across a boundary, and in this sense, L/M midget ganglion cells can be described as doing ‘double-duty’ signaling both chromatic and achromatic edges indiscriminately. If the edge is purely an achromatic change then none of the ‘surface color reflectance detectors’ are activated, and by default the brain could assign the color white or gray to such a surface outlined by the edge detectors. White and gray are thus the absence of activity in the color system, that is, they correspond to the average spectral distribution. If the surface is achromatic but darker than its surround it is assigned the color dark or black. If the edge includes a chromatic change the appropriate color reflectance detectors are activated and the object becomes filled in with the appropriate hue. The whole system would be very efficient by not signaling hues for surfaces that reflect the average and only expending energy on signaling deviations away from the mean. Finally, we want to make it clear that we don’t mean to imply that the low-level local information provided by the midget ganglion cell system can explain our perception of surfaces in the final representation of the scene, which is built up in the ventral stream. It is well understood that the visual input provides useful information that is combined with a huge amount of stored information about the world in producing the final ventral stream visual representation.
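The default assignment described above amounts to a simple decision rule, sketched here under stated assumptions. The function label_surface, its threshold parameter, and the numerical detector outputs are hypothetical. As the paragraph notes, this low-level rule is not meant to account for surface perception in the final ventral stream representation.

```python
def label_surface(hue_responses, achromatic_contrast, threshold=0.1):
    """Return a label for a surface from hypothetical detector outputs.

    hue_responses: dict mapping 'red', 'green', 'blue', 'yellow' to activity.
    achromatic_contrast: > 0 if the surface is lighter than its surround,
        < 0 if it is darker.
    threshold: assumed activity below which a hue detector counts as silent.
    """
    hue, strength = max(hue_responses.items(), key=lambda kv: kv[1])
    if strength > threshold:
        return hue            # a chromatic change: fill in with that hue
    if achromatic_contrast < 0:
        return "dark/black"   # achromatic and darker than the surround
    return "white/gray"       # default when no hue detector is active


silent = {"red": 0.0, "green": 0.0, "blue": 0.0, "yellow": 0.0}
leafy = {"red": 0.0, "green": 0.8, "blue": 0.0, "yellow": 0.05}

print(label_surface(silent, 0.5))
print(label_surface(silent, -0.5))
print(label_surface(leafy, 0.5))
```

The sketch spends a hue label only on deviations from the average, mirroring the efficiency argument in the text.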

Evidence for the parallel representation of color and high-resolution achromatic form within the midget ganglion cell mosaic

It is now technically possible, using adaptive optics, to confine a targeted spot of light to an individual cone,85 and this has been done in Austin Roorda’s lab in retinas in which cones have been classified as L, M, and S.81 Above, we have presented the idea that most midget ganglion cells are edge detectors that are agnostic to color, with four types of hue detector much more sparsely represented in parallel. This predicts that in the central retina a majority of L- and M-cones, when stimulated individually, even those surrounded by cones of the opposite type and thus at the centers of strongly L/M opponent midget ganglion cells, will give rise to the achromatic sensation of white, not color. Moreover, there should be small clusters of cones that are nearly always associated with strong chromatic percepts. These would be the cones providing input to the small subset of color-coded midget ganglion cells that receive S-cone feedforward input.

Sabesan et al81 stimulated human cones with tiny flashes of 543 nm light using adaptive optics together with precise eye-tracking. Subjects indicated the color of each flash. Stimuli were ideal for producing strong responses in midget ganglion cells and exactly as predicted by parallel processing within the midget system, described above, only a small fraction of stimulated L- and M-cones elicited color sensations, whereas the majority gave rise to the sensation of ‘white.’

As predicted by the idea of parallel processing of hue and achromatic sensations within the midget ganglion cell mosaic, there was a distinct separation of cones into populations that elicit achromatic and chromatic sensations. Moreover, contrary to conventional ideas, cones surrounded by cones of the opposite spectral type, for example, an M cone with six L cones in the immediate surround, which would be expected to have the very strongest M/L opponency, were as likely to mediate ‘white’ responses as cones with more mixed surrounds.81 By far, the simplest and most compelling explanation for these results is that the midget ganglion cells associated with most cones serve as agnostic edge detectors and they give rise to achromatic sensations when stimulated. While they do, indeed, encode wavelength information, making them responsive to equiluminant edges, their main function is to provide high-resolution edge signals and they do not contribute to hue perception. However, a much smaller subset of cones serves the subclasses of midget ganglion cells that are color coded.

According to these ideas, in the central retina, every cone mediates two sensations via the midget pathway by virtue of the fact that every cone connects to one ON and one OFF-midget ganglion cell. For the achromatic pathway, the ON midget is excited when the central cone catches more photons than the average quantal catch of the surround and the associated sensation is ‘lightness.’ Conversely, when the average quantal catch in the surround is greater than in the center, the OFF pathway is excited and the associated sensation is darkness (or blackness). Every cone is in the center of one ON midget in which stimulus increments mediate lightness sensations via the ON pathway and it is in the surround of OFF midgets in which stimulus increments mediate dark sensations.

Each cone associated with a color-coded midget is also predicted to mediate two sensations. Increments of an L cone in the center of a color-coded ON midget mediate sensations of yellowness. Increments of the same L cone in the surround of a color-coded OFF midget mediate red sensations. Similarly, the population of M-cones that are associated with color-coded midget ganglion cells are predicted to serve sensations of greenness through the ON pathway and blueness through the OFF pathway. Consistent with these predictions, we found non-white signaling M-cones elicited either green or blue depending upon whether the background chromaticity favored signaling in the ON or OFF pathway.86

A gene therapy ‘cure’ of color blindness may recapitulate the evolution of color vision in primates

We showed that the addition of a third opsin via gene therapy in adult red–green color-deficient primates was sufficient to produce trichromatic color vision behavior.87, 88 Some ‘double-duty’ theories have supposed that the appropriate circuits with the combination of cone inputs to explain hue perception and the separation of achromatic and chromatic sensations all occur in the cortex. In contrast, according to the picture of the evolution of primate color vision presented here, achromatic and chromatic sensations are separated and the (S+M) vs L and (S+L) vs M circuitry required for hue perception all arises in the retina automatically and it does not require any kind of neural plasticity or developmental process. According to this idea, after treatment for color blindness, all the midget ganglion cells that served achromatic spatial contrast before treatment continue to serve their same role. They would become L/M opponent, extending their edge-detecting capacity to include red–green equiluminant edges, but they still serve to provide a highly detailed ‘line drawing’ without concern for separating chromatic from achromatic contrast at the borders of objects. Before treatment, a subset of midget ganglion cells receiving S-cone feedforward input would have served blue–yellow color vision, OFF midgets for blue, and ON midgets for yellow. After treatment, the OFF midgets serving blue sensations would be split into two classes, one with L cone centers and one with M cone centers. The ON midgets would be similarly divided. The new spectral sensitivities would permit red–green and blue–yellow color discrimination. The final requirement is that the organism be able to learn that the ‘labeled line’ formerly signaling blue has been split into two lines, one signaling red and the other blue, and that the former yellow ‘labeled line’ is split into two, one for yellow and one for green.