SPANISH PAINTER EL GRECO often depicted elongated human figures and objects in his work. Some art historians have suggested that he might have been astigmatic—that is, his eyes’ corneas or lenses may have been more curved horizontally than vertically, causing the image on the retina at the back of the eye to be stretched vertically. But surely this idea is absurd. If it were true, then we should all be drawing the world upside down, because the retinal image is upside down! (The lens flips the incoming image, and the brain interprets the image on the retina as being right-side up.) The fallacy arises from the flawed reasoning that we literally “see” a picture on the retina, as if we were scanning it with some inner eye.

No such inner eye exists. We need to think, instead, of innumerable visual mechanisms that extract information from the image in parallel and process it stage by stage, before their activity culminates in perceptual experience. As always, we will use some striking illusions to help illuminate the workings of the brain in this processing.

Angry and Calm
Compare the two faces shown in a. If you hold the page about nine to 12 inches away, you will see that the face on the right is frowning and the one on the left has a placid expression.

But if you move the figure, so that it is about six or eight feet away, the expressions change. The left one now smiles, and the right one looks calm.

How is this switch possible? It seems almost magical. To help you understand it, we need to explain how the images were constructed by Philippe G. Schyns of the University of Glasgow and Aude Oliva of the Massachusetts Institute of Technology.

A normal portrait (photographic or painted) contains variations in what neuroscientists such as ourselves term “spatial frequency.” We will discuss two types of spatial frequency: The first is “high”—with sharp, fine lines or details present in the picture. The second is “low”—conveyed by blurred edges or large objects. (In fact, most images contain a spectrum of frequencies ranging from high to low, in varying ratios and contrasts, but that is not important for the purposes of this column.)

Using computer algorithms, we can process a normal portrait to remove either high or low spatial frequencies. For instance, if we remove high frequencies, we get a blurred image that is said to contain “low spatial frequencies in the Fourier space.” (This mathematical description need not concern us further here.) This blurring procedure is called low-pass filtering, because it filters out the high spatial frequencies (sharp edges or fine lines) and lets through only low frequencies. High-pass filtering, the opposite procedure, retains sharp edges and outlines but removes large-scale variations. The result looks a bit like an outline drawing without shading.
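
For readers who like to tinker, the two filters can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the image is a small grid of brightness values, the low-pass step is a plain neighborhood average (a box blur, a crude stand-in for filtering in Fourier space), and the high-pass result is whatever remains after the blur is subtracted. The function names `box_blur` and `high_pass` are our own inventions, not part of any published method.

```python
def box_blur(image, radius):
    """Low-pass filter: replace each pixel with the mean of its neighborhood.

    `image` is a list of rows of floats. Neighborhoods are clipped at the borders.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            row.append(total / count)
        blurred.append(row)
    return blurred


def high_pass(image, radius):
    """High-pass filter (assumed form): the image minus its blurred version."""
    low = box_blur(image, radius)
    return [[p - l for p, l in zip(img_row, low_row)]
            for img_row, low_row in zip(image, low)]


# A tiny test image: dark left half, bright right half (one sharp vertical edge).
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
low = box_blur(image, 1)    # the edge is smeared into a gradual ramp
high = high_pass(image, 1)  # nonzero only near the edge, like an outline
```

Because the high-pass image is defined here as the leftover from blurring, adding the two outputs pixel by pixel gives back the original image. A larger `radius` removes more of the fine detail from the low-pass image and moves it into the high-pass one.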

These two types of computer-processed images are combined, in an atypical manner, to create the mysterious faces shown in a. The researchers began with normal photographs of three faces: one calm, one angry and one smiling. They filtered each face to obtain both high-pass (containing sharp, fine lines) and low-pass (blurred, so as to capture large-scale luminance variations) images. They then combined the high-pass calm face with the low-pass smiling face to obtain the left image. For the right image, they overlaid the high-pass angry (frowning) face with the low-pass calm face.
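
The recipe for such a hybrid can also be sketched in Python. The code below is a toy version, not the researchers' actual procedure: it assumes two brightness grids of the same size, uses a simple box blur as the low-pass filter, and treats "image minus blur" as the high-pass filter. The names `box_blur` and `make_hybrid`, and the striped and ramp test patterns, are hypothetical stand-ins for real photographs.

```python
def box_blur(image, radius):
    """Low-pass filter: mean of each pixel's neighborhood, clipped at borders."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            total = sum(image[ny][nx] for ny in ys for nx in xs)
            row.append(total / (len(ys) * len(xs)))
        blurred.append(row)
    return blurred


def make_hybrid(fine_source, coarse_source, radius):
    """Overlay the fine detail of one image on the blurred version of another."""
    fine_low = box_blur(fine_source, radius)
    coarse_low = box_blur(coarse_source, radius)
    return [
        # (pixel - its blur) is the high-pass part; add the other image's blur.
        [(f - fl) + cl for f, fl, cl in zip(f_row, fl_row, cl_row)]
        for f_row, fl_row, cl_row in zip(fine_source, fine_low, coarse_low)
    ]


# Toy stand-ins for two same-size photographs (not real faces).
stripes = [[float(x % 2) for x in range(6)] for _ in range(6)]  # fine detail
ramp = [[x / 5.0 for x in range(6)] for _ in range(6)]          # slow gradient
hybrid = make_hybrid(stripes, ramp, 1)
```

In this sketch the hybrid's values can fall outside the original brightness range, so a real image would need rescaling before display. Passing the same image as both sources simply returns that image, which is a handy sanity check.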

What happens when the figures are viewed close-up? And why do the expressions change when you move the page away? To answer these questions, we need to tell you two more things about visual processing. First, the image needs to be close for you to see the sharp features. Second, sharp features, when visible, “mask”—or deflect attention away from—the large-scale objects (low spatial frequencies).

So when you bring the picture near, the sharp features become more visible, masking the coarse features. As a result, the face on the right looks like it is frowning and the one on the left, like it is relaxed. You simply do not notice the opposite emotions that the low spatial frequencies convey. Then, when you move the page farther away, your visual system is no longer able to resolve the fine details. So the expression conveyed by these fine features disappears, and the expression conveyed by low frequencies is unmasked and perceived.

The experiment shows vividly an idea originally postulated by Fergus Campbell and John Robson of the University of Cambridge: information from different spatial scales is extracted in parallel by various neural channels, which have wide ranges of receptive field sizes. (The receptive field of a visual neuron is the part of the visual field and corresponding tiny patch of retina to which a stimulus needs to be presented to activate it.) It also shows that the channels do not work in isolation from one another. Rather they interact in interesting ways (for example, the sharp edges picked up by small receptive fields mask the blurred large-scale variations signaled by large receptive fields).

Honest Abe
Experiments of this kind go back to the early 1960s, when Leon Harmon, then working at Bell Laboratories, devised the famous Abraham Lincoln effect. Harmon produced the picture of Honest Abe by taking a regular picture and digitizing it into coarse pixels (picture elements). Even when viewed close-up, there is enough information in the blocky brightness variations to recognize Lincoln. But these data, as we noted already, are masked by the sharp edges of the pixels. When you move far away from the photograph or squint, the image blurs, eliminating the sharp edges. Presto! Lincoln becomes instantly recognizable. The great artist Salvador Dalí was sufficiently inspired by this illusion to use it as a basis for his paintings, an unusual juxtaposition of art and science.
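
Coarse digitizing of this kind is easy to imitate. The Python sketch below assumes that each block of pixels is replaced by its average brightness, which is one common way to pixelate an image. The text does not say exactly how Harmon built his picture, so treat the function `pixelate` and its `block` parameter as our own illustrative choices.

```python
def pixelate(image, block):
    """Replace each block-by-block tile of `image` with its mean brightness.

    `image` is a list of rows of floats. Tiles at the right and bottom edges
    may be smaller when the size is not a multiple of `block`.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            ys = range(top, min(top + block, height))
            xs = range(left, min(left + block, width))
            mean = sum(image[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


# A smooth left-to-right gradient becomes two flat columns of tiles.
gradient = [[float(x) for x in range(4)] for _ in range(4)]
blocky = pixelate(gradient, 2)
```

The averaging keeps the large-scale brightness pattern but introduces abrupt steps at the tile borders. Those steps are the sharp edges that hide the face until blurring or distance removes them.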

Mysterious Mona Lisa
Finally, consider the mysterious smile of Leonardo da Vinci’s Mona Lisa. Philosophers and art historians who specialize in aesthetics often refer to her expression as “enigmatic” or “elusive,” mainly because they do not understand it. Indeed, we wonder whether they prefer not to understand it, because they seem to resent any attempts to explain it scientifically, apparently for fear that such analysis might detract from its beauty.

But recently neurobiologist Margaret Livingstone of Harvard Medical School made an intriguing observation; she cracked the da Vinci code, you might say. She noticed that when she looked directly at Mona Lisa’s mouth, the smile was not apparent (quite a disappointment). Yet as she moved her gaze away from the mouth, the smile appeared, beckoning her eyes back. Looking again at the mouth, she saw that the smile disappeared again. In fact, she noted, the elusive smile can be seen only when you look away from the mouth. You have to attend to it out of the corner of your eye, rather than fixating on it directly. Because of the unique shading (placement of low spatial frequencies) at the corners of the mouth, a smile is perceived only when the low spatial frequencies are dominant—that is, when you look indirectly at the masterpiece.

To confirm this notion, she performed a low-pass filtering and a high-pass filtering of the Mona Lisa. Notice that with the low-pass (blurred) image the smile is more obvious than in the original—it can be seen even if you look directly at the mouth. With the high-pass (outlinelike) image, however, no smile is apparent, even if you look away from the mouth. Putting these two images back together restores the original masterpiece and the elusive nature of the smile. As with the changing faces, we can now better appreciate what Leonardo seems to have stumbled on and fallen in love with—a portrait that seems alive because its fleeting expression (thanks to quirks of our visual system) perpetually tantalizes the viewer.

Taken collectively, these experiments show that there is more to perception than what meets the eye. More specifically, they demonstrate that information at different scales, such as fine versus coarse, may be extracted initially from an image by separate neural channels and recombined at different stages of processing to create the final impression of a single unified picture in your mind.