Ambiguity also arises in motion perception. In f, we begin with two light spots flashed simultaneously on diagonally opposite corners of an imaginary square, shown at 1. The lights are then switched off and replaced by spots appearing on the remaining two corners, at 2. The two frames are then cycled continuously. In this display, which we call a bistable quartet, the spots can be seen as oscillating vertically (dashed arrows) or horizontally (solid arrows) but never as both simultaneously—another example of ambiguity. It takes greater effort, but as with the cube, you can intentionally flip between these alternate percepts.
We asked ourselves what would happen if you scattered several such bistable-quartet stimuli across a computer screen. Would they all flip together when you mentally flipped one? Or, given that any one of them has an equal chance of being seen as vertical or horizontal, would each flip separately? That is, is the resolution of ambiguity global (all the quartets look the same), or does the process occur piecemeal for different parts of the visual field?
The answer is clear: they all flip together. There must be global fieldlike effects in the resolution of ambiguity. You might want to try experimenting with this on your computer. You could also ask, Does the same rule apply for the mother-in-law/wife illusion? How about the Necker cube? It is remarkable how much you can learn about perception using such simple displays; it is what makes the field so seductive.
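For readers who want to try this on a computer, here is a minimal Python sketch of the geometry of the display, not the program we used. It computes only where the spots go in each of the two frames; drawing them and choosing the frame timing is left to whatever graphics library you prefer. The function names, the screen size, and the choice to keep every quartet alternating in step are all assumptions made for illustration.

```python
import random


def quartet_frames(cx, cy, half_side):
    """Return the two frames of one bistable quartet centred at (cx, cy).

    Each frame is a pair of (x, y) spot positions on diagonally opposite
    corners of an imaginary square with side 2 * half_side.
    Screen convention assumed here: y grows downward.
    """
    left, right = cx - half_side, cx + half_side
    top, bottom = cy - half_side, cy + half_side
    frame_1 = [(left, top), (right, bottom)]   # one diagonal
    frame_2 = [(right, top), (left, bottom)]   # the remaining two corners
    return frame_1, frame_2


def scattered_quartets(n, width, height, half_side, seed=0):
    """Place n quartets at random centres that fit inside the screen.

    Overlap between quartets is not prevented in this sketch.
    """
    rng = random.Random(seed)
    quartets = []
    for _ in range(n):
        cx = rng.uniform(half_side, width - half_side)
        cy = rng.uniform(half_side, height - half_side)
        quartets.append(quartet_frames(cx, cy, half_side))
    return quartets


def frame_sequence(quartets, n_frames):
    """Yield, for each time step, every spot that is lit on the screen.

    All quartets show frame 1 on even steps and frame 2 on odd steps,
    so the two frames are cycled continuously and in step.
    """
    for step in range(n_frames):
        which = step % 2
        yield [spot for quartet in quartets for spot in quartet[which]]


if __name__ == "__main__":
    quartets = scattered_quartets(n=5, width=800, height=600, half_side=20)
    for step, spots in enumerate(frame_sequence(quartets, n_frames=4)):
        print(step, len(spots), "spots lit")
```

To turn this into a working display, draw the spots from each step, hold them for a fraction of a second, clear the screen, and move on to the next step. The frame duration and spot spacing that give the most compelling flips are best found by trial.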
We must be careful not to say that top-down influences play no role at all. In some of the figures, you can get stuck in one interpretation but can switch once you are told that there is an alternative. It is as if your visual system, tapping into high-level memory, “projects” a template (for example, an old or young face) onto the fragments to facilitate their perception. One could argue that the recognition of objects can benefit from top-down processes that tap into attentional selection and memory. In contrast, seeing contours, surfaces, motion and depth is driven mainly from the bottom up (you can “see” all the surfaces and corners of a cube, even reach out and grab it physically, and yet not know or recognize it as a cube). Top-down effects can nonetheless be vivid: we have both had the experience of peering at neurons all day through a microscope and then the next day “hallucinating” neurons everywhere: in trees, leaves and clouds. The extreme example of this effect is seen in patients who become completely blind and start hallucinating elves, circus animals and other objects, a condition called Charles Bonnet syndrome. In these individuals, only top-down inputs contribute to perception; the bottom-up processes, missing because the patients are blind (from macular degeneration or cataracts), can no longer limit their hallucinations. It is almost as though we are all hallucinating all the time and what we call object perception merely involves selecting the one hallucination that best matches the current sensory input, however fragmentary. Vision, in short, is controlled hallucination.
But doesn’t this statement contradict what we said earlier about vision being largely bottom-up? The answer to this riddle is that “vision” is not a single process. Perception of objectness (an object’s outline, surface depth and so on, as when you see a cube as cuboid) is largely bottom-up, whereas higher-level identification and categorization of objects into neurons or umbrellas do indeed benefit enormously from top-down, memory-based influences.