Our perception of the world seems so effortless that we take it for granted. But think of what is involved when you look at even the simplest visual scene. You are given two tiny, upside-down images in your eyeballs, yet what you see is a unified three-dimensional world. This phenomenon, as the late neuropsychologist Richard Gregory once said, is “nothing short of a miracle.”

In practice, this “miraculous” process involves our brain making use of a number of different cues. These can include occlusion (if A covers some part of B, A must be in front), motion parallax (in which objects closer to us appear to move faster than those farther away) and shapes discerned from shading—the main topic of this article. Far from being a mere device employed by artists to convey the impression of depth, shading is a powerful source of information about the 3-D layout of the external world. This information is extracted by using a compact set of simple rules that we have been investigating in our laboratory.

As perception scientists, we study unconscious assumptions that people make about the world and the manner in which the brain uses those ideas to predict what it will encounter. To do so, we work in parallel with a number of vision scientist colleagues, including Heinrich H. Bülthoff of the Max Planck Institute for Biological Cybernetics in Tübingen, Germany, Daniel J. Kersten of the University of Minnesota, James Todd of Ohio State University and Patrick Cavanagh of Harvard University. Together we aim to uncover the perceptual rules that enable the resolution of ambiguity when interpreting shapes from shading and to explore the stages of cognitive processing involved. Such investigations can provide insight into the “rules” used by the brain in perceiving the world, many of which reflect our evolutionary history.

There are not many areas of science in which you can spend just a few hours doodling on your laptop and make surprising new observations in a field that is more than 150 years old. In most scientific disciplines, such as physics or chemistry, the goal is to describe laws that are “objective,” in that they deliberately exclude the subjectivity of the observer. The study of perception is unique in the sense that the object is the subject, which gives the enterprise a curious recursive quality. Thus, the demonstrations that follow are each a unique experiment in which you the reader can participate.

It should be noted that our informal observations need to be followed up with careful measurements and that many questions remain to be answered. But we hope to convince readers that visual illusions are more than amusing curiosities. They allow us to measure the “IQ” of the visual system. Its processing strategies are often surprisingly sophisticated, but equally often it uses heuristics and shortcuts.


The Basic Rules of Shading

Consider a simple circle with a gradient suggesting one side is illuminated and the other is in shadow (1). Such an illustration is usually seen as a sphere or ball lit from the left, although with a bit of effort you can see it as a cavity lit from the right. This demonstration uncovers the first rule of shape from shading: other things being equal, convexity is preferable. We may have this preference because the objects we encounter in nature are usually convex. A creature that has evolved on Venus, which has no solid objects, would not show this preference.
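The ambiguity behind this rule can be made concrete with a toy Lambertian shading model (our own illustrative sketch, not a model the visual system is known to use). In a 2-D cross-section, a convex bump lit from one side produces exactly the same luminance profile as a concave dent lit from the opposite side, so the image alone cannot decide between the two:

```python
import numpy as np

def lambertian_profile(xs, light, concave=False):
    """Luminance across a unit bump under Lambertian shading, I = max(0, n.l).
    A cavity has the same cross-section with every surface normal flipped."""
    zs = np.sqrt(1.0 - xs**2)                 # height of the bump's cross-section
    normals = np.stack([xs, zs], axis=1)      # unit normals of the convex bump
    if concave:
        normals = -normals                    # interior surface of a cavity
    light = np.asarray(light, float)
    light /= np.linalg.norm(light)
    return np.clip(normals @ light, 0.0, None)

xs = np.linspace(-0.99, 0.99, 7)
sphere_lit_from_left = lambertian_profile(xs, light=[-1.0, 0.5])
cavity_lit_from_right = lambertian_profile(xs, light=[1.0, -0.5], concave=True)
# The two luminance profiles are numerically identical, so shading alone leaves
# "sphere lit from the left" and "cavity lit from the right" ambiguous.
assert np.allclose(sphere_lit_from_left, cavity_lit_from_right)
```

Flipping both the surface normals and the light direction leaves every dot product unchanged, which is why the convexity preference must be supplied by the observer rather than by the image.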

 

Now examine the illustration at the right (2), and you will notice something strange: when the top row is seen as spheres, there is a strong tendency to see the bottom row as cavities, and vice versa. This observation demonstrates the single-light-source rule: in interpreting shaded images, the brain assumes that the entire scene is illuminated by a single light source. You never see both rows as convex at the same time, because that would require them to be illuminated from opposite directions. This particular bias makes sense, given that our planet has a single sun.

Next look at 3a. Notice that the disks that are light on top invariably look like spheres, whereas the ones that are light below look like cavities. This demonstrates the third principle: the brain assumes that, in addition to having only one light source, the source must be shining from above (again this is because the sun shines from above, not below). Scottish physicist Sir David Brewster noticed this overhead lighting bias more than 100 years ago when viewing cameos lit from different directions. Our multiple shaded disks amplify the effect considerably and strip the illusion down to its bare essentials.

 

Perception does not involve faithfully transmitting the retinal image to the visual areas of the brain. The process is more complex. Different attributes in the image—called elementary features—are extracted by neurons early in visual processing before activating a cascade of events that culminates in your final act of perception. Examples of such features include edges (especially their orientation), motion and color, all of which are extracted early—quite possibly in area 17, the first visual-processing area of the brain's cortex. More complex features such as facial expression, on the other hand, are computed much later in the process.

One characteristic of elementary features is the fact that they segregate clearly into different groups even when they are intermixed. Shading follows this pattern. Most people viewing 3a, for example, can effortlessly group the spheres and segregate them from the cavities. But the same cannot be said for 3b. This comparison suggests that shading—but not the mere variation of light intensity (known as luminance) across disks—is probably an elementary feature extracted early in the processing stream. Indeed, in 1997 a team of researchers at the University of Western Ontario confirmed our speculation that shading is extracted early in visual processing by measuring the brain activity of six observers using functional magnetic resonance imaging.

But how does the brain put together different depth cues to construct a holistic three-dimensional representation of the world? As discussed, there are many different sources of information about depth, so it stands to reason that the brain initially handles each of these features independently. Is it possible that the signals from different depth cues converge onto a master depth map farther up in the brain?

The answer can be seen in 4. Even on casual inspection, it is obvious that segregation is powerful in 4b but far less vivid in 4a—in other words, it is much easier to perceive different planes of disks in 4b. In 4a, the thin horizontal lines cover the spheres and run behind the cavities, which feels wrong because we expect concave cavities to fall behind convex spheres. What these illustrations reveal is that our brain looks for consistency when combining cues to construct a 3-D reality—otherwise we would not detect this dissonance.

 

The next question is, How does the visual system “know” where the light is coming from? To solve this puzzle, we created vertical “worms,” which always appear convex and never concave in this illustration (5a). Simple, shaded disks, however, are more ambiguous (as we have established, they become convex or concave based on our assumptions about lighting). When we disperse these disks among the worms in the rightmost illustration (5b), they tend to be seen as convex to conform to the light source from the left, as implied by the worms. (The reverse occurs in the left part of this demonstration.) The brain therefore uses the presence of unambiguous objects—our worms—to decipher where the light is coming from and then interprets the more ambiguous details of the image.
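One way to make this inference concrete (a toy sketch under our own assumptions, not the brain's actual computation) is to recover the light direction by least squares from an object whose shape is known to be convex, and only then interpret the ambiguous disks:

```python
import numpy as np

def render(normals, light):
    """Lambertian luminance I = max(0, n.l) for unit surface normals."""
    light = np.asarray(light, float) / np.linalg.norm(light)
    return np.clip(normals @ light, 0.0, None)

# Surface normals across the cross-section of a "worm" known to be convex.
xs = np.linspace(-0.9, 0.9, 9)
worm_normals = np.stack([xs, np.sqrt(1.0 - xs**2)], axis=1)

true_light = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # light from the upper left
observed = render(worm_normals, true_light)

# Estimate the light direction from the shading, using only lit samples
# (where the clipping in the model was inactive).
lit = observed > 0
l_hat, *_ = np.linalg.lstsq(worm_normals[lit], observed[lit], rcond=None)
l_hat /= np.linalg.norm(l_hat)
# l_hat now matches true_light; an ambiguous disk can then be labeled
# convex or concave by whichever interpretation agrees with l_hat.
```

Having fixed the light estimate from the unambiguous worms, each shaded disk can be read as convex if its bright side faces that direction and concave otherwise, which is the disambiguation seen in 5b.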

 

Shapes and Shadows

Our next display (6a and 6b) is yet another demonstration of the constraint of the single light source. But this time we use shadow rather than shading. In 6a, what are initially seen as random black fragments soon crystallize into 3-D letters of the alphabet. In 6b, on the other hand, the same letters are harder to perceive as 3-D because they are randomly lit from below left or above right. This is true despite the fact that one can cognitively infer the letters individually. The difference is especially clear if the alphabet clusters are viewed in a holistic manner. The effect is also amplified if you tilt any edge of the paper by more than 60 degrees.

 

In the previous illustration, the 3-D letters have what are called attached shadows, in which shading appears on an object. We now turn to what graphic designers and artists use intuitively: cast shadows, which are not attached to their source (7a and 7b). Our next question is, How intelligent are the systems that our brain relies on to determine depth using shadows?

First notice that shadows with penumbrae—the softer-edged shading in 7a—are more realistic than those with sharp edges, such as 7b. German physiologist Ewald Hering made this observation in the 19th century. In 7, you can see that even though the shadow area is located at the same distance from the square in both 7a and 7b, the squares with blurred-edged shadows appear nearer to the observer than those with sharp-edged shadows.

The next illustration shows that the distance between the square and the shadow matters (compare 8a and 8b). The shadows can signal not only the presence but also the magnitude of depth. Yet this is no longer true if the shadow is completely detached from the object (8c). Even though this happens in the real world, it does not happen often enough to be incorporated, as a rule of thumb, into visual processing.

 

When the Systems Fail

There are limits to how sophisticated our perception truly is. We observe that the shape of a shadow does not inhibit our ability to link an object to its shadow (9). The system is smart but clearly not smart enough. More intensive investigation might reveal limits to this tolerance of shape mismatch between a shadow and its source.

 

Another example of our perceptual limits comes from considering how some rules may overturn others. In addition to the constraints of having a single light source and light from above, for example, there is a weaker assumption that a single, isolated shaded disk is most likely to be convex even when lit from below (rather than a cavity lit from above). This bias is especially strong when multiple disks are used, and most naive subjects—as a default—see them as a clutch of spheres (10a).

Yet if a single sphere lit from above is inserted among them (10b), the other disks instantly transform into cavities because of the new information provided by the single sphere. This change is a striking example of how a single but strong cue can veto the effect of multiple ambiguous inputs.

 

The important role of attention in light-source interpretation can be seen in the next illustration. If you fixate on the “X” in the middle of the display in 11 and focus your attention on just the cluster on the right, you will see it is made of spheres (lit from below). But if you let your attention expand to include the single sphere on the left, instantly the disks on the right start to look like cavities. We may conclude that the light-source rule applies not to the entire visual field but only to the portion that is encompassed by the window of attention.

 

By conveying depth using other cues, we can discover new ways to test our perceptual intelligence. Although different aspects of the visual image (such as color and shading) are initially extracted by separate neural channels early in visual processing, they are eventually put together to form a coherent object or event in the visual scene. We have begun doing experiments to explore how the different sources of information interact.

In an unpublished study, we investigated the interaction between shading and movement by creating an animation using the two frames shown in 12a. A sphere and a cavity were presented simultaneously, side by side, in frame 1 of the movie sequence. This was followed by the sphere and cavity appearing in the reversed locations in frame 2. In our demonstration, the two frames cycled continuously. Theoretically, there were at least three ways in which one could see the display:

1. Two flat, shaded disks reversing the polarity (direction) of luminance.

2. A stationary sphere transforming into a cavity on the left, while a cavity transforms into a sphere on the right.

3. The sphere and cavity trading places.

 

What more than two thirds of our 15 participants actually saw was something completely different and unexpected: a single ball jumping left and right—filling and emptying two stationary cavities in the background! In the control setup, which did not employ a shading gradient (12a, rightmost panels), people did not see any such movement. This experiment demonstrates that the visual system, even early in processing, deploys surprisingly sophisticated knowledge about moving objects—namely that in the real world, cavities do not move, but balls or spheres do.

Remarkably the entire perception of the display changes if the lighting is reversed for only one disk and not the other (12b). This time the disk on the left is seen to pulse inward and outward, morphing between sphere and cavity. The brain is willing to accept the deforming sphere, in the interest of obeying the single-light-source rule.

On the other hand, if there is no overhead lighting, the visual system reverts to the single-light-source rule, as shown in 13. Here half the disks are left-right shaded, and half are shaded from right to left.

 

Now have someone hold the page upright in relation to gravity (as most people would naturally do to read the words on the page) while you tilt your head sideways 90 degrees so that it is parallel with the ground. (You might find it easier if you lie down on your side.) You will discover that half the disks—the ones lit on the left—suddenly transform into spectacular spheres and the rest into cavities. So “light above” refers to “above” in relation to the head rather than the world!

Although you, as the conscious observer, know the sun is still overhead, your visual system, which is on autopilot, does not know. It makes the silly assumption that the sun is still above—as though it were stuck to your head—even when your head tilts, probably because our ancestors did not walk around with their head to the side often enough to require a mechanism that would correct for this tilt using vestibular feedback. The computational burden of doing so was avoided altogether by using a quick and dirty shortcut. The penalty you pay is vulnerability to false interpretation—your ancestors may have seen concave oranges when their head tilted accidentally. But so long as people could continue surviving long enough to have babies, this cost was not an issue in evolutionary terms.

So how does the brain get away with using such shortcuts? The goal of evolution is adequacy—not optimality—and scientists working in AI, robotics and computer vision would do well to follow in nature's footsteps. As our colleague Francis Crick said, “God is a hacker.”

Whenever our brain missteps and we perceive something incorrectly, we are experiencing an illusion. Such demonstrations also have an aesthetic component, not just because they are appealing visually but also because the researcher's scientific inference is based directly on observation. (Our observations are therefore not many steps removed from the data, as is often the case in other areas of science.) There is beauty in working so closely with nature.

Finally, these illusions have implications for other aspects of vision beyond depth perception. For example, our studies provide insight into how we perceive lightness and brightness. Consider the trio of left-right shaded disks compared with three top-lit spheres in 14. This demonstration provides insight into how we perceive the steepness of the luminance gradient—that is, the perceived contrast of brightness from one side of a disk to the other. Despite the fact that these shapes are physically identical, you probably see greater contrast in the left-right shaded set. We perceive a difference because—given the overhead-lighting rule—the top-lit spheres appear to bulge out more, and the visual system ascribes the lion's share of their luminance variation to surface curvature. In the case of the left-right shaded disks, the brain attributes the difference in luminance to the surface itself, a property called reflectance.
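The underlying ambiguity can be written out in the same toy Lambertian model (again our own illustrative sketch): an image with luminance I = R · max(0, n·l) can be explained either by a curved surface with uniform reflectance or by a flat surface whose reflectance, in effect its paint, varies across it:

```python
import numpy as np

light = np.array([0.0, 1.0])                  # single overhead source
xs = np.linspace(-0.9, 0.9, 5)                # positions across the cross-section

# Interpretation 1: a lit bump with uniform reflectance R = 1; the second
# normal component points toward the light, so luminance follows curvature.
bump_normals = np.stack([xs, np.sqrt(1.0 - xs**2)], axis=1)
I_curved = 1.0 * np.clip(bump_normals @ light, 0.0, None)

# Interpretation 2: a flat surface whose reflectance varies across it.
flat_normal = np.array([0.0, 1.0])
reflectance = I_curved / max(flat_normal @ light, 1e-9)
I_flat = reflectance * np.clip(flat_normal @ light, 0.0, None)

# Both interpretations reproduce the same image, so the split between
# curvature and reflectance must be decided by the lighting rules.
assert np.allclose(I_curved, I_flat)
```

When the gradient is consistent with overhead lighting, the visual system charges it to curvature; when it is not, the gradient is charged to reflectance and is therefore seen as higher surface contrast.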

Using such demonstrations, one can play Sherlock Holmes to unravel perception's mysteries. We invite readers to create their own images and then write to us at vramacha@ucsd.edu or cchunharas@ucsd.edu about their discoveries.