HUMANS enjoy stereoscopic vision (a). As we mentioned in our essay last issue, because our eyes are separated horizontally, the images we see in the two eyes are slightly different, and the difference is proportional to the relative depth (b). The visual areas in the brain measure these differences, and we experience the result as stereo—what we all enjoyed as children playing with View-Master toys.
Visual-image processing from the eye to the brain happens in stages. Rudimentary features such as the orientation of edges, direction of motion, color, and so on are extracted early on, in areas called V1 and V2, before the signals reach the next stages in the visual-processing hierarchy for progressively more refined analysis. This stage-by-stage description is a caricature, though; many pathways project "back" from later stages to earlier ones, allowing the brain to play a kind of 20-questions game, arriving at a solution after successive iterations.
Returning to the concept of stereo, we can ask: At what stage is the comparison of the two eyes’ images made? If you are looking at a scene with hundreds of features, how do you know which feature in one eye matches with which feature in the other eye? How do you avoid false matches? Until the correct matching is achieved, you cannot measure differences. In stereopsis, this conundrum is called the correspondence problem.
Questions about Boundaries
To address this issue, the great 19th-century German physicist, ophthalmologist and physiologist Hermann von Helmholtz asked: Is the comparison done very early, before object boundaries are recognized, or does the brain first extract contours separately in each eye before comparing them? He concluded, without a great deal of evidence, that form perception of outlines in each eye occurs prior to interocular comparison. "Monocular form perception precedes stereopsis," he said, arguing that the task of comparing the images in the two eyes is horrendously complex and must happen very high up. In his view, the brain solves the correspondence problem by first recognizing forms and then comparing the forms' extended outlines, a strategy that allows it to avoid (or at least minimize) false matches.
This idea was challenged nearly 100 years later by the late Hungarian scientist Béla Julesz, a non-self-effacing man of unparalleled genius, while he was working at Bell Labs. He employed a different kind of stereogram (c), using computer-generated random-dot patterns rather than photographs or line drawings. In neither the left nor the right eye's image is there any recognizable contour or form at all. Although such patterns are made with a computer (as schematized in d), the principle can be understood using a digital camera and random-dot images. Begin with a random-dot pattern about five square centimeters in size. Use a pair of scissors to cut a one-centimeter-square patch (call it S, for square) from another random-dot pattern. Center this square atop the first pattern and take a photo to produce the left eye's image (L). If S is correctly positioned, it becomes virtually invisible, camouflaged by the background dots. Now shift S slightly to the right (taking care that no telltale edge of the small patch stands out against the surrounding dots) and take another picture to make the right eye's image, R.
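For readers who would rather use pixels than scissors, here is a minimal sketch of the same construction in Python with NumPy; the image size, square size and six-pixel shift are arbitrary illustrative choices, not part of Julesz's recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 200, 200        # size of the background pattern (arbitrary)
side, shift = 60, 6    # side of the hidden square S and its horizontal shift, in pixels

# The left eye's image L: a field of random black (0) and white (1) dots.
left = rng.integers(0, 2, size=(H, W))
right = left.copy()

# The region occupied by S in the left eye's image.
top, lft = (H - side) // 2, (W - side) // 2
square = left[top:top + side, lft:lft + side]

# In the right eye's image R, paste the same patch of dots shifted to the right...
right[top:top + side, lft + shift:lft + side + shift] = square

# ...and refill the strip uncovered by the shift with fresh random dots, so no
# monocular edge gives the square away (the columns of Xs and Ys in panel d).
right[top:top + side, lft:lft + shift] = rng.integers(0, 2, size=(side, shift))
```

Neither array, viewed on its own, contains any visible square; free-fused or viewed in a stereoscope, the shifted patch should stand out in depth.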
Julesz presented one image from his random-dot stereogram to each eye and was astonished to see a small square float out so vividly that he was almost tempted to grab it, even though no square is visible to either eye alone. The original experiment was done with digitally generated pixels rather than bits of paper, and the shift was likewise exact, pixel by pixel. So it is not as if a square were hidden in each eye's image; mathematically, it does not even exist in either eye alone. It is defined exclusively by the difference—the horizontal shift of S (shown by the column of Xs and Ys in d). Julesz concluded that von Helmholtz was wrong. Because the square emerges only as a result of stereoscopic fusion, stereo matching must be a point-to-point (or pixel-to-pixel) measurement of displacement, and the outline of the square emerges solely from this comparison. Stereo precedes the detection of form ("form" being used here interchangeably with extended outlines and boundaries).
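Julesz's conclusion, that the shift can be recovered by matching local patches point for point across the two eyes, can be mimicked crudely on a computer. The toy block-matching sketch below (our own illustration, not a model of the physiology) slides the right image from the previous snippet over the left one and, for every small window, keeps the horizontal shift that minimizes the difference; the resulting map is roughly equal to the shift inside the hidden square and roughly zero elsewhere.

```python
import numpy as np

def box_sum(a, win):
    # Sum of a over every win-by-win window, computed with 2-D cumulative sums.
    c = np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1), ((1, 0), (1, 0)))
    return c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]

def disparity_map(left, right, max_shift=10, win=5):
    """For each win-by-win patch of the left image, find the horizontal shift
    of the right image that matches it best (smallest summed absolute
    difference). A toy correspondence solver, nothing more."""
    best_d, best_err = None, None
    for d in range(max_shift + 1):
        shifted = np.roll(right, -d, axis=1)   # slide the right image left by d pixels
        err = box_sum(np.abs(left.astype(int) - shifted.astype(int)), win)
        if best_err is None:
            best_d, best_err = np.zeros_like(err), err.copy()
        else:
            better = err < best_err
            best_d[better] = d
            best_err[better] = err[better]
    return best_d   # about the value of shift inside the hidden square, about 0 elsewhere

# dmap = disparity_map(left, right)   # using the arrays built in the previous sketch
```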
Julesz’s demo inspired a young medical student, Jack Pettigrew (then at the University of California, Berkeley), to look at the physiology of binocular nerve cells in the earliest stage of binocular processing. Until then, the problem of stereoscopic vision seemed intractable because, if von Helmholtz were right, researchers would have had to tackle the physiology of form perception first—about which no one had the foggiest idea how to proceed. Pettigrew found, however, that his hunch was right—these cells were extracting the horizontal shifts and signaling stereo (as we discussed in our previous column).
That is the simple story, but the picture got more complicated when a student (Ramachandran) from India found that in some circumstances form perception preceded stereo, showing the flexibility of the brain’s visual centers. He created a stereogram that had a texture-defined square in each eye. He then shifted this entire square instead of shifting the dots that defined the textures (e).
He made two random-dot patterns, one for each eye. But this time a square is visible in each eye separately—unlike in Julesz's patterns. The square is still made of random dots, yet because of a difference in texture it can be seen in each eye on its own. At the same time, the dots that make up the two eyes' images (including S) are completely different; unlike Julesz's pictures, the two are uncorrelated. This stereogram is thus the converse of Julesz's: a square is visible in each eye, but the dots that constitute it (and its background) are unrelated across the two eyes.
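A rough sketch of such a display in the same Python-and-NumPy style follows; the dot-density difference used here as the texture cue is just one of many possible texture differences, and the sizes are again arbitrary. The key point is that the dots are generated independently for each eye, so nothing matches pixel for pixel, yet the texture-defined square appears at horizontally shifted positions.

```python
import numpy as np

rng = np.random.default_rng()

H, W = 200, 200
side, shift = 60, 6
top = (H - side) // 2

def eye_image(square_left_edge):
    # Sparse dots in the background, denser dots inside the square, so the square
    # is visible from texture alone. The dots are drawn fresh on every call and
    # are therefore completely uncorrelated between the two eyes.
    img = (rng.random((H, W)) < 0.10).astype(np.uint8)
    img[top:top + side, square_left_edge:square_left_edge + side] = \
        (rng.random((side, side)) < 0.40).astype(np.uint8)
    return img

left_tex = eye_image((W - side) // 2)            # square centered in the left eye's image
right_tex = eye_image((W - side) // 2 + shift)   # the whole square shifted to the right
```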
Ramachandran found that when he viewed this image through a stereoscope, the central square floated out. Because the dots defining the squares were uncorrelated in the two eyes, he and his colleagues concluded that, in this case, form perception occurred prior to stereo. The square was recognized separately in each eye before the shift across the eyes was measured. The Julesz rule could be violated. The brain often uses multiple tricks to achieve the same goal. In a noisy camouflaged environment, it makes sense to use both strategies.
The second display he invented makes the same point. It takes advantage of a curious visual effect dubbed illusory contours (f). Four "pacmen" are made from black disks with pie-shaped wedges cut out of each. What you see, though, is not pacpeople facing one another; you see an opaque illusory white square occluding four black disks in the background. The brain says, in effect, "What is the likelihood that an evil scientist has precisely aligned these disks? More likely it is an opaque square, so that is what I will see." You hallucinate the square's edges; this carving of the scene into objects is called image segmentation.
Now can these illusory edges provide an input for stereo? Begin with the left eye's picture in f and shift the illusory square to the left to create the right eye's image. (This shift entails taking bigger bites out of some of the disks and smaller bites out of the others.) When you view the pair through a stereoscope—lo and behold—the illusory square floats out! Again, form processing and image segmentation occur prior to stereo.
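One way to draw such a pair on a computer (a sketch assuming the Pillow imaging library; the sizes and the 12-pixel shift are arbitrary) is to place four full black disks at the corners of an invisible square and then paint that square region white. The white paint takes the bites out of the disks, and shifting only the painted square between the two images shifts the illusory contour while leaving the disks in place.

```python
from PIL import Image, ImageDraw

W, H = 400, 300
R = 40                                                      # radius of the inducing disks
corners = [(130, 80), (270, 80), (130, 220), (270, 220)]    # corners of the invisible square

def kanizsa(shift):
    im = Image.new("L", (W, H), 255)                        # white canvas
    draw = ImageDraw.Draw(im)
    for cx, cy in corners:                                  # four full black disks
        draw.ellipse([cx - R, cy - R, cx + R, cy + R], fill=0)
    # Painting the square region white cuts a quarter out of each disk, leaving
    # four "pacmen"; the square itself has no drawn edges of its own.
    (x0, y0), (x1, y1) = corners[0], corners[3]
    draw.rectangle([x0 + shift, y0, x1 + shift, y1], fill=255)
    return im

kanizsa(0).save("kanizsa_left.png")      # left eye: illusory square centered on the disks
kanizsa(-12).save("kanizsa_right.png")   # right eye: illusory square shifted to the left
```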
It gets better. Let us take a template of this stereogram and paste it on repeating wallpaper made of columns of dots (g). The dots are identical in the two eyes; they convey no disparity information. Yet amazingly, the dots inside the illusory square float out along with it—an illusion we call stereo capture; the dots are captured by the illusory square and dragged forward even though they themselves are not shifted.
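Continuing the previous sketch (same canvas, corner positions and disk radius; the dot spacing is again arbitrary), the capture display can be mimicked by drawing identical columns of dots in both images and then painting the pacmen over them through a mask. The wallpaper dots therefore carry zero disparity; only the illusory square is shifted between the eyes.

```python
def wallpaper():
    # Columns of small dots, drawn identically for both eyes (zero disparity).
    im = Image.new("L", (W, H), 255)
    draw = ImageDraw.Draw(im)
    for x in range(20, W, 20):
        for y in range(12, H, 16):
            draw.ellipse([x - 2, y - 2, x + 2, y + 2], fill=0)
    return im

def capture(shift):
    im = wallpaper()
    # Mask of the four pacmen: full disks minus the (shifted) square region.
    mask = Image.new("1", (W, H), 0)
    md = ImageDraw.Draw(mask)
    for cx, cy in corners:
        md.ellipse([cx - R, cy - R, cx + R, cy + R], fill=1)
    (x0, y0), (x1, y1) = corners[0], corners[3]
    md.rectangle([x0 + shift, y0, x1 + shift, y1], fill=0)
    im.paste(0, (0, 0, W, H), mask)   # paint black pacmen over the unshifted wallpaper
    return im

capture(0).save("capture_left.png")
capture(-12).save("capture_right.png")
```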
This stereo-capture result suggests that Julesz's claim was not entirely correct: stereo involves more than comparing pixels across the two eyes. Even Pettigrew's disparity cells must be extracting tiny oriented clusters of dots (not single points) and "looking for" identical clusters to match. But the experiments of Ramachandran (and very similar results from psychologist Lloyd Kaufman of New York University) showed that the mechanism is even more sophisticated than that; it can segment the image based on implied occlusion and "hallucinate" illusory contours that then serve as tokens for stereoscopic matching. Once this information has been extracted and the disparity measured, the brain constructs a 3-D illusory surface. The fact that the enclosed dots are dragged forward implies that this 3-D surface representation is fed back and imposed on the dots.
Thus, we may conclude that von Helmholtz, Julesz, Pettigrew and Ramachandran are all right; the visual processing of stereo is more complex than we thought. We have no inkling of the physiological mechanisms underlying these interactions. Cells signaling disparity are found in V1 (as shown by Pettigrew), but illusory contours (from implied occlusion) are extracted by cells in area V2, the next stage up, as shown by Rüdiger von der Heydt of Johns Hopkins University. These findings imply that messages from V2 must be fed back to V1 to modulate the processing of smaller features. This idea has yet to be tested.
Note: This article was originally printed with the title, "Two Eyes, Two Views."