HOW MANY TIMES have you heard people say that something is “black and white,” meaning it is simple or crystal clear? And because black and white are so obviously distinct, it would be only natural for us to assume that understanding how we see them must be equally straightforward.

We would be wrong. The seeming ease of perceiving the two color extremes hides a formidable challenge confronting the brain every time we look at a surface. For instance, under the same illumination, white reflects much more light to the eye than black does. But a white surface in shadow often reflects less light to the eye than a black surface in sun. Nevertheless, somehow we can accurately discern which is which. How? Clearly, the brain uses the surrounding context to make such judgments. The specific program used to interpret that context is fraught with mystery for neuroscientists like me.

Recent studies of how we see black and white have provided insights into how the human visual system analyzes the incoming pattern of light and computes object shades correctly. In addition to explaining more about how our own brains work, such research could help us in the design of artificial visual systems for robots. Computers are notoriously horrible at the kind of pattern recognition that comes so naturally to people. If computers could “see” better, they could provide more services: they could recognize our faces for keyless locks, chauffeur us around town, bring us the newspaper or pick up the trash.

Ask the Brains

Vision scientists force the brain to reveal its secrets using a method called psychophysics. Of course, the brain is not going to talk to us in lucid prose. Rather, the process is like a game of 20 questions: we ask the brain only yes-or-no questions. Do you work this way or that way? To get a clear answer, we must start with at least two competing hypotheses. Then we must carefully construct a test image that contains a critical “target” surface that should appear, let us say, light gray according to one hypothesis but dark gray according to the competing one. Often these test images turn out to be delightful illusions, such as those you will see in this article.

To appreciate the complexities of seeing a surface as black, white or gray, it helps to start with some basic physics. White surfaces reflect most of the light that strikes them—roughly 90 percent. In contrast (no pun intended), black surfaces reflect only about 3 percent of that light. When this reflected light enters the eye opening called the pupil, the lens focuses it onto the inner rear surface, or retina, much as light enters a simple box camera through a lens and then strikes the film. Photoreceptors in the retina can measure the amount of incoming light striking them.

So far, so good. But the light reflected from an object, by itself, carries no hint of the shade of gray of the surface that reflected it. Here is where things get interesting.

The total amount of light reaching the eye depends far more on the level of illumination in any scene than it does on the percentage of light that any given surface reflects. Although a white surface reflects about 30 times as much light as a neighboring black shape in the same illumination, in bright sunlight that same white surface can reflect millions of times more light than it does in moonlight. Indeed, a black surface in bright light can easily send more light to the eye than a white surface in shadow. (This fact is why no robot today can identify the gray shade of an object in its field of view. The robot can measure only the amount of light that a given object reflects, called luminance. But, as is now clear, any luminance can come from any surface.)
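The point is easy to state in a few lines of code. The following Python sketch multiplies illumination by reflectance to get luminance; the illumination values are invented for illustration, but the 90 percent and 3 percent reflectances come from the text:

```python
# Toy model of the physics described above: the luminance reaching the
# eye from a matte surface is the product of the illumination falling on
# it and the fraction of that light the surface reflects.

WHITE_REFLECTANCE = 0.90   # white reflects ~90% of incident light
BLACK_REFLECTANCE = 0.03   # black reflects ~3%

def luminance(illumination, reflectance):
    """Light reaching the eye from a matte surface (arbitrary units)."""
    return illumination * reflectance

# A black surface in bright sun vs. a white surface in deep shadow.
# The illumination values (1000 vs. 20 units) are made up.
black_in_sun = luminance(1000.0, BLACK_REFLECTANCE)    # about 30 units
white_in_shadow = luminance(20.0, WHITE_REFLECTANCE)   # about 18 units

# The "black" surface sends MORE light to the eye than the "white" one,
# so luminance alone cannot identify the surface shade.
print(black_in_sun > white_in_shadow)
```

This is exactly the robot's predicament: measuring luminance at the sensor tells it nothing definite about the paint.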

Recognizing that the light reflected by the object itself contains insufficient information, psychologist Hans Wallach suggested in 1948 that the brain determines a surface’s shade of gray by comparing the light received from neighboring surfaces. Wallach, a cousin of Albert Einstein, contributed a great deal to our knowledge of visual and auditory perception in studies he conducted during his long tenure at Swarthmore College. He showed that a homogeneous disk could appear as any shade between black and white simply by changing the brightness of the light surrounding it, even though the disk itself never changes.

In a classic illusion, a gray square sits on a white background, and an identical gray square sits on an adjacent black background [see top illustration on opposite page]. If perceived lightness depended solely on the amount of light a surface reflects, the two squares would look identical. Yet the square on the black background looks lighter, which shows that the brain judges a shade by comparing neighboring surfaces.

More recent evidence has shown that this comparison of neighboring surfaces may be even simpler than Wallach imagined. Instead of measuring the intensity of light at each point in the scene, the eye seems to start by measuring only the change in luminance at each border in the scene.
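One way to picture this border-based encoding is a simplified, Retinex-style sketch (an illustration of the idea, not the author's model): record only the luminance ratio at each border, then chain those ratios together. The absolute light level drops out entirely. All numbers below are invented:

```python
# Edge-based encoding sketch: keep only the luminance ratio at each
# border between adjacent patches, then chain the ratios to recover the
# patches' relative shades.

def edge_ratios(luminances):
    """Luminance ratio across each border between adjacent patches."""
    return [b / a for a, b in zip(luminances, luminances[1:])]

def relative_shades(ratios):
    """Chain the border ratios; the first patch is the unit reference."""
    shades = [1.0]
    for r in ratios:
        shades.append(shades[-1] * r)
    return shades

dim = [9.0, 3.0, 0.3]          # three patches under dim light
bright = [900.0, 300.0, 30.0]  # the same patches, 100x the light

# Both scenes yield the same relative shades, because the overall
# illumination level never enters the computation.
print(relative_shades(edge_ratios(dim)))
print(relative_shades(edge_ratios(bright)))
```

Notice, though, that this recovers only *relative* shades, which is precisely the limitation the next paragraphs address.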

Wallach’s work showed that the relative luminance of two surfaces is an important piece of the puzzle. But knowing just that property would still leave a lot of ambiguity. Put another way, if one patch of a scene is five times brighter than a neighboring patch, what does that tell the eye? The two patches might be a medium gray and black. Or they could just as well be white and gray. Thus, by itself, relative luminance can tell you only how different two shades are from each other but not the specific shade of either. To compute the exact gray of a surface, the brain requires something more: a standard against which it can measure the various shades, which researchers now call an anchoring rule.
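The ambiguity of a bare ratio is easy to demonstrate. In this sketch (the reflectance values are illustrative), two very different pairs of surfaces produce the same 5:1 luminance ratio under a common light:

```python
# Two different pairs of surfaces yield the same 5:1 luminance ratio,
# so the ratio alone cannot pin down either shade.

ILLUMINATION = 100.0  # arbitrary common light level

pairs = {
    "medium gray vs. black": (0.15, 0.03),
    "white vs. darker gray": (0.90, 0.18),
}

for name, (lighter, darker) in pairs.items():
    ratio = (ILLUMINATION * lighter) / (ILLUMINATION * darker)
    print(name, ratio)  # the ratio is 5 in both cases (up to rounding)
```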

Two anchoring rules have been proposed. Wallach himself, and later Edwin Land, inventor of instant photography, suggested that the highest luminance in a given scene automatically appears white. If this rule were true, it would serve as the standard by which the brain compared all lower luminances. Adaptation-level theory, created in the 1940s by psychologist Harry Helson, implied that the average luminance in a scene always appears middle gray. Lighter and darker gray shades would then be identified by comparing other luminances to this middle value. Researchers working in machine vision called this the “gray world assumption.”

Which was right? In my laboratory we sought to find out in 1994. My colleagues and I at Rutgers University devised a way to test these rules under the simplest possible conditions: two gray surfaces that fill the entire visual field of an observer. We asked volunteers to place their head inside a large opaque hemisphere with its interior painted a medium shade of gray on the left and black on the right. We suspended the hemisphere within a larger rectangular chamber with lamps that created diffuse lighting for the viewer.

Remember, the brain does not yet know what these two shades of gray are—it has only relative luminance. If the brain’s anchoring rule is based on the highest luminance, then the middle gray half should appear white and the black half should appear middle gray. But if the rule is based on the average luminance, then the middle gray half should appear light gray, whereas the black half should appear dark gray. The viewer would not see either side as being black or white.

The results were clear. The middle gray half appeared totally white; the black half, middle gray. Thus, our perceived gray scale is anchored at the “top,” not in the middle. This finding tells us much about how the brain computes gray shades in simple scenes. The highest luminance appears white, whereas the perceived shade of gray of a darker surface depends on the difference—or, more precisely, the ratio—between its own luminance and that of the surface with the highest luminance.
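A minimal sketch of this highest-luminance anchoring rule follows. Treating "white" as roughly 90 percent reflectance comes from the text; the luminance values are assumptions for illustration:

```python
# Highest-luminance anchoring: the brightest surface in view is assigned
# white; every other surface gets a shade in proportion to its luminance
# ratio with that anchor.

WHITE_REFLECTANCE = 0.90  # assume "white" means ~90% reflectance

def anchored_shades(luminances):
    """Map a list of luminances to perceived reflectances under the
    highest-luminance anchoring rule."""
    anchor = max(luminances)
    return [WHITE_REFLECTANCE * lum / anchor for lum in luminances]

# The two halves of the dome: the middle-gray half reflects more light
# than the black half, so it is assigned white regardless of its paint.
print(anchored_shades([30.0, 3.0]))  # brighter half -> 0.9 (white)
```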

Different Anchors

What about the much more complex scenes typical of everyday life? Does this simple algorithm work? At this point, the reader may not be surprised to learn that the answer is, “No, it is more complicated.” If the brain compared only the luminance of each surface with the highest luminance in the entire scene, then a black surface in bright light would appear as the same shade as a white surface in shadow, given only that both have the same luminance, as often happens. But they do not: we can discern the difference between them. The visual system must, then, apply a different anchor within each region of illumination.

And indeed, research with many illusions shows that the anchor does vary. If I paste several identical gray disks onto a photograph with lots of brighter areas and shadows, the disks in the shadowed regions will appear much lighter than those in the sunlight [see illustration on opposite page]. I call these “probe disks,” because they allow us to probe how the visual system computes gray shades at any location in the scene. Within any given region of illumination, the precise location of the disk matters little; the disk appears roughly the same shade of gray throughout the region.

Functionally, each region seems to have its own anchor—the luminance at which the brain perceives that a surface appears white. But programming a robot to process the image in this way presents a big challenge. Segmenting the picture into separate regions of illumination requires the visual system to determine which edges in the image represent a change in the pigment of the surface and which, like the outline of a shadow, mark a change in the illumination level. Such a program might, for example, classify an edge as a boundary between regions of illumination if the edge is blurred or if it coincides with a planar break, such as a corner.

Theorists such as Barbara Blakeslee and Mark McCourt of North Dakota State University argue that the human visual system need not use this kind of edge classification at all. They favor a less sophisticated process called spatial filtering. In our picture with gray disks, for instance, they would hold that the apparent shade of each disk depends simply on the direction and strength of the luminance contrast between the disk and its immediate background (much as in Wallach’s earlier proposal).
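A deliberately crude caricature of that local-contrast prediction, with invented luminance values, shows how the rival account explains the probe disks:

```python
# Caricature of the spatial-filtering claim: the apparent shade of a
# probe disk is driven by the luminance ratio between the disk and its
# immediate surround. Numbers are invented for illustration.

def local_contrast(disk_lum, surround_lum):
    """Ratio of disk to surround luminance; a ratio above 1 predicts
    'looks lighter than surround,' below 1 predicts 'looks darker.'"""
    return disk_lum / surround_lum

# Identical disks (luminance 20) pasted into a shadowed region
# (surround 10) and into a sunlit region (surround 80):
print(local_contrast(20.0, 10.0))  # 2.0  -> predicted to look light
print(local_contrast(20.0, 80.0))  # 0.25 -> predicted to look dark
```

The checkerboard test described next probes where this prediction breaks down.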

We can test whether this simple idea works by placing some probe disks on a checkerboard with a shadow falling across it [see illustration above]. We find that disks with identical local contrasts will appear to have different shades. On the other hand, disks with different local contrasts may share the same shade of gray.

All Together Now

Consider another visual trick, which sheds light on how the brain decides what elements to group together when it is sorting out patterns of light. Imagine a black “plus” sign, with two gray triangles [see top right in box on next page]. One of the triangles fits into the crook of the white area formed by the “elbow” of the plus; the other pokes inside the black area of one of the black bars. Here the two gray triangles are identical, and their immediate surroundings are identical. Each triangle borders white along its hypotenuse (the longest side) and black along the other two, equal-length sides. But the lower triangle, inside the black bar, “belongs” to the black cross, whereas the upper triangle seems to be part of its white background. Notice the boundary intersections. When the borders come together to form a kind of T junction, the brain seems to define the regions divided by the stem of the T as belonging together, but not the regions divided by the top of the T.

This interpretation of T junctions as a way for the brain to establish groups holds for another illusion, created by Australian artist Michael White. It has a series of horizontal black bars stacked with white spaces between them. In it, gray bars that are neighbored by black more than by white [see top left in box on opposite page] appear darker (not lighter) than the gray bars that are neighbored mostly by white. Here the T junctions at the corners of the gray bars suggest that the gray bars on the left lie in the same plane as the white background, whereas those on the right lie in the same plane as the black bars.

Paola Bressan in the psychology department at the University of Padua in Italy created a “dungeon” illusion, which further details the brain’s grouping mechanisms. The gray squares at the middle right in the box on the opposite page, which are surrounded by black, appear darker than those at the middle left, which are enclosed by white.

This effect may occur because the gray elements on the right appear to lie in the same plane with the white background, rather than the black bars of the dungeon window. A reverse contrast illusion by University of Crete perception researcher Elias Economou makes the same point. The gray bar [see bottom right in box on opposite page], even though it is completely bordered by black, appears darker, apparently because it is a member of the group of white bars.

These fun illusions have a serious side. They show that the brain cannot compute the gray levels we perceive by simply comparing the luminances of two neighboring surfaces alone. Rather the surrounding context comes into play in a very sophisticated way. The fact that most people are unaware of the difficulty of the problem testifies to the remarkable achievement of the human visual system.

The Big Picture

Scientific consensus on how the brain computes black and white remains further down the road. Current theories fall into three classes: low, middle and high level. Low-level theories, based on neural spatial-filtering mechanisms that encode local contrast, fail to predict the gray shades that people see. High-level theories treat the computation of surface gray shades as a kind of unconscious intellectual process in which the intensity of light illuminating a surface is automatically taken into account. Such processes might be intuitively appealing but tell us neither what to look for in the brain nor how to program a robot. Middle-level theories parse each scene into multiple frames of reference, each containing its own anchor. These theories specify the operations by which black, white and gray shades are computed better than the high-level theories do, while accounting for human perception of gray surfaces better than the low-level theories do.

But before we can truly comprehend this aspect of vision—or program a robot to do what our human system does—we will need a better understanding of how boundaries are processed. The human eye, like the robot, starts with a two-dimensional picture of the scene. How does it determine which regions of the picture should be grouped together and assigned a common anchor? Vision scientists will continue to propose hypotheses and test them with experiments. Step by step, we will force the visual system to give up its secrets.

Decoding human visual computing may be the best way to build robots that can see. But more important, it may be the best way to get a grip on how the brain works.